Commit 8dd9dd42 authored by littletomatodonkey

polish dygraph doc and deploy

Parent fa24f823
#!/bin/bash
set -e
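# Require that the installed clang-format version string contains VERSION below before formatting the files passed as arguments.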
readonly VERSION="3.8"
version=$(clang-format -version)
if ! [[ $version == *"$VERSION"* ]]; then
echo "clang-format version check failed."
echo "a version contains '$VERSION' is needed, but get '$version'"
echo "you can install the right version, and make an soft-link to '\$PATH' env"
exit -1
fi
clang-format "$@"
@@ -6,6 +6,8 @@ dataset/
checkpoints/
output/
pretrained/
.ipynb_checkpoints/
*.ipynb*
_build/
build/
nohup.out
# PaddleClas
**文档教程**: https://paddleclas.readthedocs.io
**30分钟玩转PaddleClas**: https://paddleclas.readthedocs.io/zh_CN/latest/tutorials/quick_start.html
## 简介
飞桨图像分类套件PaddleClas是飞桨为工业界和学术界所准备的一个图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。
<div align="center">
<img src="./docs/images/main_features_s.png" width="700">
</div>
## 丰富的模型库
基于ImageNet1k分类数据集,PaddleClas提供ResNet、ResNet_vd、Res2Net、HRNet、MobileNetV3等23种系列的分类网络结构的简单介绍、论文指标复现配置,以及在复现过程中的训练技巧。与此同时,也提供了对应的117个图像分类预训练模型,并且基于TensorRT评估了服务器端模型的GPU预测时间,以及在骁龙855(SD855)上评估了移动端模型的CPU预测时间和存储大小。支持的***预训练模型列表、下载地址以及更多信息***请见文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。
[简体中文](README_cn.md) | English
<div align="center">
<img src="./docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg" width="700">
</div>
上图对比了一些最新的面向服务器端应用场景的模型,在使用V100,FP32和TensorRT,batch size为1时的预测时间及其准确率,图中准确率82.4%的ResNet50_vd_ssld和83.7%的ResNet101_vd_ssld,是采用PaddleClas提供的SSLD知识蒸馏方案训练的模型。图中相同颜色和符号的点代表同一系列不同规模的模型。不同模型的简介、FLOPS、Parameters以及详细的GPU预测时间(包括不同batchsize的T4卡预测速度)请参考文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)
<div align="center">
<img
src="./docs/images/models/mobile_arm_top1.png" width="700">
</div>
上图对比了一些最新的面向移动端应用场景的模型,在骁龙855(SD855)上预测一张图像的时间和其准确率,包括MobileNetV1系列、MobileNetV2系列、MobileNetV3系列和ShuffleNetV2系列。图中准确率79%的MV3_large_x1_0_ssld(M是MobileNet的简称),71.3%的MV3_small_x1_0_ssld、76.74%的MV2_ssld和77.89%的MV1_ssld,是采用PaddleClas提供的SSLD蒸馏方法训练的模型。MV3_large_x1_0_ssld_int8是进一步进行INT8量化的模型。不同模型的简介、FLOPS、Parameters和模型存储大小请参考文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)
- TODO
- [ ] EfficientLite、GhostNet、RegNet论文指标复现和性能评估
## 高阶优化支持
除了提供丰富的分类网络结构和预训练模型,PaddleClas也支持了一系列有助于图像分类任务效果和效率提升的算法或工具。
### SSLD知识蒸馏
知识蒸馏是指使用教师模型(teacher model)去指导学生模型(student model)学习特定任务,保证小模型在参数量不变的情况下,得到比较大的效果提升,甚至获得与大模型相似的精度指标。PaddleClas提供了一种简单的半监督标签知识蒸馏方案(SSLD,Simple Semi-supervised Label Distillation),使用该方案,模型效果普遍提升3%以上,一些蒸馏模型提升效果如下图所示:
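As a minimal illustration of the teacher-student idea described above, here is a generic soft-label distillation loss in plain NumPy. It is only a sketch: the exact loss form, temperature and training schedule used by SSLD are documented in the knowledge distillation chapter of the documentation and may differ from this simplification.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_label_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Cross-entropy between the teacher's softened distribution (soft labels)
    # and the student's prediction -- the generic teacher -> student signal.
    teacher_probs = softmax(teacher_logits / temperature)
    student_log_probs = np.log(softmax(student_logits / temperature) + 1e-12)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())

# toy usage: a batch of 2 samples over 5 classes
rng = np.random.default_rng(0)
print(soft_label_distillation_loss(rng.normal(size=(2, 5)), rng.normal(size=(2, 5))))
```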
# PaddleClas
<div align="center">
<img
src="./docs/images/distillation/distillation_perform_s.jpg" width="700">
</div>
## Introduction
以在ImageNet1K蒸馏模型为例,SSLD知识蒸馏方案框架图如下,该方案的核心关键点包括教师模型的选择、loss计算方式、迭代轮数、无标签数据的使用、以及ImageNet1k蒸馏finetune,每部分的详细介绍以及实验介绍请参考文档教程中的[**知识蒸馏章节**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/distillation/index.html)
PaddleClas is a toolset for image classification tasks prepared for the industry and academia. It helps users train better computer vision models and apply them in real scenarios.
<div align="center">
<img
src="./docs/images/distillation/ppcls_distillation_s.jpg" width="700">
</div>
### 数据增广
**Recent update**
- 2020.10.12 Add Paddle-Lite demo.
- 2020.10.10 Add cpp inference demo and improve FAQ tutorial.
- 2020.09.17 Add `HRNet_W48_C_ssld` pretrained model, whose Top-1 Acc on ImageNet-1k dataset reaches 83.62%. Add `ResNet34_vd_ssld` pretrained model, whose Top-1 Acc on ImageNet-1k dataset reaches 79.72%.
- 2020.09.07 Add `HRNet_W18_C_ssld` pretrained model, whose Top-1 Acc on ImageNet-1k dataset reaches 81.16%.
- 2020.07.14 Add `Res2Net200_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet-1k dataset reaches 85.13%. Add `Fix_ResNet50_vd_ssld_v2` pretrained model, whose Top-1 Acc on ImageNet-1k dataset reaches 84.00%.
- 2020.06.17 Add English documents.
- 2020.06.12 Add support for training and evaluation on Windows or CPU.
- [more](./docs/en/update_history_en.md)
在图像分类任务中,图像数据的增广是一种常用的正则化方法,可以有效提升图像分类的效果,尤其对于数据量不足或者模型网络较大的场景。常用的数据增广可以分为3类,图像变换类、图像裁剪类和图像混叠类,如下图所示。图像变换类是指对全图进行一些变换,例如AutoAugment,RandAugment。图像裁剪类是指对图像以一定的方式遮挡部分区域的变换,例如CutOut,RandErasing,HideAndSeek,GridMask。图像混叠类是指多张图进行混叠一张新图的变换,例如Mixup,Cutmix。
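As a concrete, framework-agnostic illustration of the image-mixing category, here is a minimal Mixup sketch in NumPy. Drawing a single lambda per batch and the choice of `alpha` are simplifications; the actual PaddleClas implementation and comparison settings are described in the data augmentation chapter of the documentation.

```python
import numpy as np

def mixup_batch(images, onehot_labels, alpha=0.2, rng=None):
    # Blend every sample with a randomly chosen partner from the same batch;
    # the same convex weight lam is applied to both images and labels.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * onehot_labels + (1.0 - lam) * onehot_labels[perm]
    return mixed_images, mixed_labels

# toy usage: 4 images, 10 classes
images = np.random.rand(4, 3, 224, 224).astype("float32")
labels = np.eye(10, dtype="float32")[[0, 3, 5, 7]]
mixed_images, mixed_labels = mixup_batch(images, labels, alpha=0.2)
```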
<div align="center">
<img
src="./docs/images/image_aug/image_aug_samples_s.jpg" width="800">
</div>
## Features
PaddleClas提供了上述8种数据增广算法的复现和在统一实验环境下的效果评估。下图展示了不同数据增广方式在ResNet50上的表现, 与标准变换相比,采用数据增广,识别准确率最高可以提升1%。每种数据增广方法的详细介绍、对比的实验环境请参考文档教程中的[**数据增广章节**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/index.html)
- Rich model zoo. Based on the ImageNet-1k classification dataset, PaddleClas provides 24 series of classification network structures and training configurations, 122 models' pretrained weights and their evaluation metrics.
<div align="center">
<img
src="./docs/images/image_aug/main_image_aug_s.jpg" width="600">
</div>
- SSLD Knowledge Distillation. Based on this SSLD distillation strategy, the top-1 acc of the distilled model is generally increased by more than 3%.
## 30分钟玩转PaddleClas
- Data augmentation: PaddleClas provides a detailed introduction to 8 data augmentation algorithms such as AutoAugment, Cutout and Cutmix, together with code reproduction and effect evaluation in a unified experimental environment.
基于flowers102数据集,30分钟体验PaddleClas不同骨干网络的模型训练、不同预训练模型、SSLD知识蒸馏方案和数据增广的效果。详情请参考文档教程中的[**30分钟玩转PaddleClas**](https://paddleclas.readthedocs.io/zh_CN/latest/tutorials/quick_start.html)
- Pretrained model with 100,000 categories: Based on `ResNet50_vd` model, Baidu open sourced the `ResNet50_vd` pretrained model trained on a 100,000-category dataset. In some practical scenarios, the accuracy based on the pretrained weights can be increased by up to 30%.
## 开始使用
- A variety of training modes, including multi-machine training, mixed precision training, etc.
PaddleClas的安装说明、模型训练、预测、评估以及模型微调(finetune)请参考文档教程中的[**初级使用章节**](https://paddleclas.readthedocs.io/zh_CN/latest/tutorials/index.html)
- A variety of inference and deployment solutions, including TensorRT inference, Paddle-Lite inference, model service deployment, model quantization, Paddle Hub, etc.
- Support Linux, Windows, macOS and other systems.
## 特色拓展应用
### 10万类图像分类预训练模型
在实际应用中,由于训练数据匮乏,往往将ImageNet1K数据集训练的分类模型作为预训练模型,进行图像分类的迁移学习。然而ImageNet1K数据集的类别只有1000种,预训练模型的特征迁移能力有限。因此百度自研了一个有语义体系的、粒度有粗有细的10w级别的Tag体系,通过人工或半监督方式,至今收集到 5500w+图片训练数据;该系统是国内甚至世界范围内最大规模的图片分类体系和训练集合。PaddleClas提供了在该数据集上训练的ResNet50_vd的模型。下表显示了一些实际应用场景中,使用ImageNet预训练模型和上述10万类图像分类预训练模型的效果比对,使用10万类图像分类预训练模型,识别准确率最高可以提升30%。
## Tutorials
| 数据集 | 数据统计 | ImageNet预训练模型 | 10万类图像分类预训练模型 |
|:--:|:--:|:--:|:--:|
| 花卉 | class_num:102<br/>train/val:5789/2396 | 0.7779 | 0.9892 |
| 手绘简笔画 | class_num:18<br/>train/val:1007/432 | 0.8785 | 0.9107 |
| 植物叶子 | class_num:6<br/>train/val:5256/2278 | 0.8212 | 0.8385 |
| 集装箱车辆 | class_num:115<br/>train/val:4879/2094 | 0.623 | 0.9524 |
| 椅子 | class_num:5<br/>train/val:169/78 | 0.8557 | 0.9077 |
| 地质 | class_num:4<br/>train/val:671/296 | 0.5719 | 0.6781 |
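The sketch below shows how such a checkpoint is typically reused for transfer learning. It is a plain Paddle dygraph illustration, not the PaddleClas training pipeline: `build_backbone`, the checkpoint filename and the 2048-d feature size are placeholders.

```python
import paddle
import paddle.nn as nn

class TransferModel(nn.Layer):
    # Reuse a pretrained feature extractor and attach a new head sized for the target task.
    def __init__(self, backbone, feat_dim=2048, num_classes=102):
        super().__init__()
        self.backbone = backbone                      # assumed to output (N, feat_dim) features
        self.head = nn.Linear(feat_dim, num_classes)  # replaces the original 100,000-way head

    def forward(self, x):
        return self.head(self.backbone(x))

# backbone = build_backbone()                                        # placeholder constructor
# backbone.set_state_dict(paddle.load("ResNet50_vd_10w.pdparams"))   # illustrative weight path
# model = TransferModel(backbone, feat_dim=2048, num_classes=102)    # e.g. a 102-class flowers dataset
```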
- [Installation](./docs/en/tutorials/install_en.md)
- [Quick start PaddleClas in 30 minutes](./docs/en/tutorials/quick_start_en.md)
- [Model introduction and model zoo](./docs/en/models/models_intro_en.md)
- [Model zoo overview](#Model_zoo_overview)
- [ResNet and Vd series](#ResNet_and_Vd_series)
- [Mobile series](#Mobile_series)
- [SEResNeXt and Res2Net series](#SEResNeXt_and_Res2Net_series)
- [DPN and DenseNet series](#DPN_and_DenseNet_series)
- [HRNet series](#HRNet_series)
- [Inception series](#Inception_series)
- [EfficientNet and ResNeXt101_wsl series](#EfficientNet_and_ResNeXt101_wsl_series)
- [ResNeSt and RegNet series](#ResNeSt_and_RegNet_series)
- Model training/evaluation
- [Data preparation](./docs/en/tutorials/data_en.md)
- [Model training and finetuning](./docs/en/tutorials/getting_started_en.md)
- [Model evaluation](./docs/en/tutorials/getting_started_en.md)
- Model prediction/inference
- [Prediction based on training engine](./docs/en/extension/paddle_inference_en.md)
- [Python inference](./docs/en/extension/paddle_inference_en.md)
- [C++ inference](./deploy/cpp_infer/readme_en.md)
- [Serving deployment](./docs/en/extension/paddle_serving_en.md)
- [Mobile](./deploy/lite/readme.md)
- [Model Quantization and Compression](docs/en/extension/paddle_quantization_en.md)
- Advanced tutorials
- [Knowledge distillation](./docs/en/advanced_tutorials/distillation/distillation_en.md)
- [Data augmentation](./docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md)
- Applications
- [Transfer learning](./docs/en/application/transfer_learning_en.md)
- [Pretrained model with 100,000 categories](./docs/en/application/transfer_learning_en.md)
- [Generic object detection](./docs/en/application/object_detection_en.md)
- FAQ
- [General image classification problems](./docs/en/faq_en.md)
- [PaddleClas FAQ](./docs/en/faq_en.md)
- [Competition support](./docs/en/competition_support_en.md)
- [License](#License)
- [Contribution](#Contribution)
<a name="Model_zoo_overview"></a>
### Model zoo overview
Based on the ImageNet-1k classification dataset, the 24 classification network structures supported by PaddleClas and the corresponding 122 image classification pretrained models are listed below. Training tricks, a brief introduction to each series of network structures, and performance evaluations are given in the corresponding chapters. The evaluation environment is as follows.
* The CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU inference time is measured on a T4 GPU by running each model 500 times under the FP32+TensorRT configuration (excluding the warmup time of the first 10 runs); a minimal timing sketch is given below.
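The reported numbers come from the TensorRT benchmark pipeline above; the snippet below is only a minimal sketch of the warmup-then-average timing protocol, applied to an arbitrary `predict` callable.

```python
import time

def measure_latency_ms(predict, warmup=10, runs=500):
    # Average per-call latency in milliseconds, excluding the warmup iterations.
    for _ in range(warmup):
        predict()
    start = time.perf_counter()
    for _ in range(runs):
        predict()
    return (time.perf_counter() - start) * 1000.0 / runs

# toy usage with a dummy workload standing in for model inference
print(measure_latency_ms(lambda: sum(i * i for i in range(10000))))
```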
Curves of accuracy versus inference time for common server-side models are shown below.
![](./docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)
Curves of accuracy versus inference time and storage size for common mobile-side models are shown below.
![](./docs/images/models/mobile_arm_storage.png)
![](./docs/images/models/mobile_arm_top1.png)
<a name="ResNet_and_Vd_series"></a>
### ResNet and Vd series
Accuracy and inference time metrics of the ResNet and Vd series models are shown below. More detailed information can be found in the [ResNet and Vd series tutorial](./docs/en/models/ResNet_and_vd_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|
| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar) |
| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar) |
| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar) |
| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar) |
| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_ssld_pretrained.tar) |
| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar) |
| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vc_pretrained.tar) |
| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar) |
| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_v2_pretrained.tar) |
| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar) |
| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar) |
| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.tar) |
| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_vd_pretrained.tar) |
| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar) |
| ResNet50_vd_<br>ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar) |
| ResNet50_vd_<br>ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar) |
| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar) |
<a name="Mobile_series"></a>
### Mobile series
Accuracy and inference time metrics of the Mobile series models are shown below. More detailed information can be found in the [Mobile series tutorial](./docs/en/models/Mobile_en.md).
| Model | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model storage size(M) | Download Address |
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|
| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_25_pretrained.tar) |
| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_5_pretrained.tar) |
| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_75_pretrained.tar) |
| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar) |
| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_ssld_pretrained.tar) |
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar) |
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar) |
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_75_pretrained.tar) |
| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar) |
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar) |
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar) |
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_ssld_pretrained.tar) |
| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_25_pretrained.tar) |
| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar) |
| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_75_pretrained.tar) |
| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar) |
| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_35_pretrained.tar) |
| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_25_pretrained.tar) |
| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_pretrained.tar) |
| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_75_pretrained.tar) |
| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_5_pretrained.tar) |
| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_35_pretrained.tar) |
| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_35_ssld_pretrained.tar) |
| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar) |
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.7605 | - | 14.395 | - | - | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_int8_pretrained.tar) |
| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar) |
| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar) |
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_25_pretrained.tar) |
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_33_pretrained.tar) |
| ShuffleNetV2_<br>x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_5_pretrained.tar) |
| ShuffleNetV2_<br>x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x1_5_pretrained.tar) |
| ShuffleNetV2_<br>x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x2_0_pretrained.tar) |
| ShuffleNetV2_<br>swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_swish_pretrained.tar) |
| DARTS_GS_4M | 0.7523 | 0.9215 | 47.204948 | 1.04 | 4.77 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DARTS_GS_4M_pretrained.tar) |
| DARTS_GS_6M | 0.7603 | 0.9279 | 53.720802 | 1.22 | 5.69 | 24 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DARTS_GS_6M_pretrained.tar) |
| GhostNet_<br>x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x0_5_pretrained.pdparams) |
| GhostNet_<br>x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_0_pretrained.pdparams) |
| GhostNet_<br>x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_3_pretrained.pdparams) |
<a name="SEResNeXt_and_Res2Net_series"></a>
### SEResNeXt and Res2Net series
Accuracy and inference time metrics of the SEResNeXt and Res2Net series models are shown below. More detailed information can be found in the [SEResNeXt and Res2Net series tutorial](./docs/en/models/SEResNext_and_Res2Net_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| Res2Net50_<br>26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar) |
| Res2Net50_vd_<br>26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar) |
| Res2Net50_<br>14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_14w_8s_pretrained.tar) |
| Res2Net101_vd_<br>26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net101_vd_26w_4s_pretrained.tar) |
| Res2Net200_vd_<br>26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_pretrained.tar) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_ssld_pretrained.tar) |
| ResNeXt50_<br>32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_32x4d_pretrained.tar) |
| ResNeXt50_vd_<br>32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_32x4d_pretrained.tar) |
| ResNeXt50_<br>64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar) |
| ResNeXt50_vd_<br>64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_64x4d_pretrained.tar) |
| ResNeXt101_<br>32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x4d_pretrained.tar) |
| ResNeXt101_vd_<br>32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_32x4d_pretrained.tar) |
| ResNeXt101_<br>64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar) |
| ResNeXt101_vd_<br>64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar) |
| ResNeXt152_<br>32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_32x4d_pretrained.tar) |
| ResNeXt152_vd_<br>32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_32x4d_pretrained.tar) |
| ResNeXt152_<br>64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_64x4d_pretrained.tar) |
| ResNeXt152_vd_<br>64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_64x4d_pretrained.tar) |
| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet18_vd_pretrained.tar) |
| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet34_vd_pretrained.tar) |
| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet50_vd_pretrained.tar) |
| SE_ResNeXt50_<br>32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_32x4d_pretrained.tar) |
| SE_ResNeXt50_vd_<br>32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_vd_32x4d_pretrained.tar) |
| SE_ResNeXt101_<br>32x4d | 0.7912 | 0.9420 | 18.82604 | 25.31814 | 15.02 | 46.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.tar) |
| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_pretrained.tar) |
<a name="DPN_and_DenseNet_series"></a>
### DPN and DenseNet series
Accuracy and inference time metrics of the DPN and DenseNet series models are shown below. More detailed information can be found in the [DPN and DenseNet series tutorial](./docs/en/models/DPN_DenseNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------|
| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar) |
| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet161_pretrained.tar) |
| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet169_pretrained.tar) |
| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet201_pretrained.tar) |
| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet264_pretrained.tar) |
| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DPN68_pretrained.tar) |
| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DPN92_pretrained.tar) |
| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DPN98_pretrained.tar) |
| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DPN107_pretrained.tar) |
| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/DPN131_pretrained.tar) |
<a name="HRNet_series"></a>
### HRNet series
Accuracy and inference time metrics of the HRNet series models are shown below. More detailed information can be found in the [HRNet series tutorial](./docs/en/models/HRNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|
| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar) |
| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_ssld_pretrained.tar) |
| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W30_C_pretrained.tar) |
| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W32_C_pretrained.tar) |
| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W40_C_pretrained.tar) |
| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W44_C_pretrained.tar) |
| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_pretrained.tar) |
| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_pretrained.tar) |
| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W64_C_pretrained.tar) |
<a name="Inception_series"></a>
### Inception series
Accuracy and inference time metrics of the Inception series models are shown below. More detailed information can be found in the [Inception series tutorial](./docs/en/models/Inception_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------|
| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/GoogLeNet_pretrained.tar) |
| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_pretrained.tar) |
| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar) |
| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_pretrained.tar) |
| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_deeplab_pretrained.tar) |
| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Xception71_pretrained.tar) |
| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar) |
<a name="EfficientNet_and_ResNeXt101_wsl_series"></a>
### EfficientNet and ResNeXt101_wsl series
Accuracy and inference time metrics of the EfficientNet and ResNeXt101_wsl series models are shown below. More detailed information can be found in the [EfficientNet and ResNeXt101_wsl series tutorial](./docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md).
10万类图像分类预训练模型下载地址如下,更多的相关内容请参考文档教程中的[**图像分类迁移学习章节**](https://paddleclas.readthedocs.io/zh_CN/latest/application/transfer_learning.html#id1)
- **10万类预训练模型:**[**下载地址**](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar)
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| ResNeXt101_<br>32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x8d_wsl_pretrained.tar) |
| ResNeXt101_<br>32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x16d_wsl_pretrained.tar) |
| ResNeXt101_<br>32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x32d_wsl_pretrained.tar) |
| ResNeXt101_<br>32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x48d_wsl_pretrained.tar) |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNeXt101_32x48d_wsl_pretrained.tar) |
| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_pretrained.tar) |
| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB1_pretrained.tar) |
| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB2_pretrained.tar) |
| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB3_pretrained.tar) |
| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB4_pretrained.tar) |
| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB5_pretrained.tar) |
| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB6_pretrained.tar) |
| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB7_pretrained.tar) |
| EfficientNetB0_<br>small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_small_pretrained.tar) |
### 通用目标检测
<a name="ResNeSt_and_RegNet_series"></a>
### ResNeSt and RegNet series
近年来,学术界和工业界广泛关注图像中目标检测任务,而图像分类的网络结构以及预训练模型效果直接影响目标检测的效果。PaddleDetection使用PaddleClas的82.39%的ResNet50_vd的预训练模型,结合自身丰富的检测算子,提供了一种面向服务器端应用的目标检测方案,PSS-DET (Practical Server Side Detection)。该方案融合了多种只增加少许计算量,但是可以有效提升两阶段Faster RCNN目标检测效果的策略,包括检测模型剪裁、使用分类效果更优的预训练模型、DCNv2、Cascade RCNN、AutoAugment、Libra sampling以及多尺度训练。其中基于82.39%的R50_vd_ssld预训练模型,与79.12%的R50_vd的预训练模型相比,检测效果可以提升1.5%。在COCO目标检测数据集上测试PSS-DET,当V100单卡预测速度为61FPS时,mAP是41.6%,预测速度为20FPS时,mAP是47.8%。详情请参考[**通用目标检测章节**](https://paddleclas.readthedocs.io/zh_CN/latest/application/object_detection.html)
Accuracy and inference time metrics of the ResNeSt and RegNet series models are shown below. More detailed information can be found in the [ResNeSt and RegNet series tutorial](./docs/en/models/ResNeSt_RegNet_en.md).
- TODO
- [ ] PaddleClas在OCR任务中的应用
- [ ] PaddleClas在人脸检测和识别中的应用
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| ResNeSt50_<br>fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_fast_1s1x64d_pretrained.pdparams) |
| ResNeSt50 | 0.8102 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_pretrained.pdparams) |
| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/RegNetX_4GF_pretrained.pdparams) |
## 工业级应用部署工具
PaddlePaddle提供了一系列实用工具,便于工业应用部署PaddleClas,具体请参考文档教程中的[**实用工具章节**](https://paddleclas.readthedocs.io/zh_CN/latest/extension/index.html)
- TensorRT预测
- Paddle-Lite
- 模型服务化部署
- 模型量化
- 多机训练
- Paddle Hub
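For orientation, here is a minimal Python prediction sketch using the Paddle Inference API, assuming Paddle 2.x and an already-exported inference model. The file paths, GPU memory pool size and 1x3x224x224 input are illustrative only; the TensorRT, Paddle-Lite and serving flows listed above each have dedicated tutorials.

```python
import numpy as np
from paddle.inference import Config, create_predictor

config = Config("inference/model.pdmodel", "inference/model.pdiparams")  # illustrative paths
config.enable_use_gpu(100, 0)                 # 100 MB initial memory pool on GPU 0
predictor = create_predictor(config)

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))

predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)      # class scores for the batch
```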
<a name="License"></a>
## License
## 赛事支持
PaddleClas的建设源于百度实际视觉业务应用的淬炼和视觉前沿能力的探索,助力多个视觉重点赛事取得领先成绩,并且持续推进更多的前沿视觉问题的解决和落地应用。更多内容请关注文档教程中的[**赛事支持章节**](https://paddleclas.readthedocs.io/zh_CN/latest/competition_support.html)
PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a>.
- 2018年Kaggle Open Images V4图像目标检测挑战赛冠军
- 首届多媒体信息识别技术竞赛中印刷文本OCR、人脸识别和地标识别三项任务A级证书
- 2019年Kaggle Open Images V5图像目标检测挑战赛亚军
- 2019年Kaggle地标检索挑战赛亚军
- 2019年Kaggle地标识别挑战赛亚军
## 许可证书
本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a>许可认证。
<a name="Contribution"></a>
## Contribution
## 版本更新
Contributions are highly welcomed and we would really appreciate your feedback!
## 如何贡献代码
我们非常欢迎你为PaddleClas贡献代码,也十分感谢你的反馈。
- Thanks to [nblib](https://github.com/nblib) for fixing a bug in RandErasing.
- Thanks to [chenpy228](https://github.com/chenpy228) for fixing some typos in PaddleClas.
简体中文 | [English](README.md)
# PaddleClas
## 简介
飞桨图像分类套件PaddleClas是飞桨为工业界和学术界所准备的一个图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。
**近期更新**
- 2020.10.12 添加Paddle-Lite demo。
- 2020.10.10 添加cpp inference demo,完善`FAQ 30问`教程。
- 2020.09.17 添加 `HRNet_W48_C_ssld` 模型,在ImageNet-1k上Top-1 Acc可达83.62%;添加 `ResNet34_vd_ssld` 模型,在ImageNet-1k上Top-1 Acc可达79.72%。
- 2020.09.07 添加 `HRNet_W18_C_ssld` 模型,在ImageNet-1k上Top-1 Acc可达81.16%;添加 `MobileNetV3_small_x0_35_ssld` 模型,在ImageNet-1k上Top-1 Acc可达55.55%。
- 2020.07.14 添加 `Res2Net200_vd_26w_4s_ssld` 模型,在ImageNet-1k上Top-1 Acc可达85.13%;添加 `Fix_ResNet50_vd_ssld_v2` 模型,在ImageNet-1k上Top-1 Acc可达84.0%。
- 2020.06.17 添加英文文档。
- 2020.06.12 添加对windows和CPU环境的训练与评估支持。
- [more](./docs/zh_CN/update_history.md)
## 特性
- 丰富的模型库:基于ImageNet1k分类数据集,PaddleClas提供了24个系列的分类网络结构和训练配置,122个预训练模型和性能评估。
- SSLD知识蒸馏:基于该方案蒸馏模型的识别准确率普遍提升3%以上。
- 数据增广:支持AutoAugment、Cutout、Cutmix等8种数据增广算法详细介绍、代码复现和在统一实验环境下的效果评估。
- 10万类图像分类预训练模型:百度自研并开源了基于10万类数据集训练的 `ResNet50_vd` 模型,在一些实际场景中,使用该预训练模型的识别准确率最多可以提升30%。
- 多种训练方案,包括多机训练、混合精度训练等。
- 多种预测推理、部署方案,包括TensorRT预测、Paddle-Lite预测、模型服务化部署、模型量化、Paddle Hub等。
- 可运行于Linux、Windows、MacOS等多种系统。
## 文档教程
- [快速安装](./docs/zh_CN/tutorials/install.md)
- [30分钟玩转PaddleClas](./docs/zh_CN/tutorials/quick_start.md)
- [模型库介绍和预训练模型](./docs/zh_CN/models/models_intro.md)
- [模型库概览图](#模型库概览图)
- [ResNet及其Vd系列](#ResNet及其Vd系列)
- [移动端系列](#移动端系列)
- [SEResNeXt与Res2Net系列](#SEResNeXt与Res2Net系列)
- [DPN与DenseNet系列](#DPN与DenseNet系列)
    - [HRNet系列](#HRNet系列)
- [Inception系列](#Inception系列)
- [EfficientNet与ResNeXt101_wsl系列](#EfficientNet与ResNeXt101_wsl系列)
- [ResNeSt与RegNet系列](#ResNeSt与RegNet系列)
- 模型训练/评估
- [数据准备](./docs/zh_CN/tutorials/data.md)
- [模型训练与微调](./docs/zh_CN/tutorials/getting_started.md)
- [模型评估](./docs/zh_CN/tutorials/getting_started.md)
- 模型预测
- [基于训练引擎预测推理](./docs/zh_CN/extension/paddle_inference.md)
- [基于Python预测引擎预测推理](./docs/zh_CN/extension/paddle_inference.md)
- [基于C++预测引擎预测推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./docs/zh_CN/extension/paddle_serving.md)
- [端侧部署](./deploy/lite/readme.md)
- [模型量化压缩](docs/zh_CN/extension/paddle_quantization.md)
- 高阶使用
- [知识蒸馏](./docs/zh_CN/advanced_tutorials/distillation/distillation.md)
- [数据增广](./docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md)
- 特色拓展应用
- [迁移学习](./docs/zh_CN/application/transfer_learning.md)
- [10万类图像分类预训练模型](./docs/zh_CN/application/transfer_learning.md)
- [通用目标检测](./docs/zh_CN/application/object_detection.md)
- FAQ
- [图像分类通用问题](./docs/zh_CN/faq.md)
- [PaddleClas实战FAQ](./docs/zh_CN/faq.md)
- [赛事支持](./docs/zh_CN/competition_support.md)
- [许可证书](#许可证书)
- [贡献代码](#贡献代码)
## 模型库
<a name="模型库概览图"></a>
### 模型库概览图
基于ImageNet1k分类数据集,PaddleClas支持24种系列分类网络结构以及对应的122个图像分类预训练模型,训练技巧、每个系列网络结构的简单介绍和性能评估将在相应章节展现。所有速度指标的评估环境如下:
* CPU的评估环境基于骁龙855(SD855)。
* GPU评估环境基于T4机器,在FP32+TensorRT配置下运行500次测得(去除前10次的warmup时间)。
常见服务器端模型的精度指标与其预测耗时的变化曲线如下图所示。
![](./docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)
常见移动端模型的精度指标与其预测耗时、模型存储大小的变化曲线如下图所示。
![](./docs/images/models/mobile_arm_storage.png)
![](./docs/images/models/mobile_arm_top1.png)
<a name="ResNet及其Vd系列"></a>
### ResNet及其Vd系列
ResNet及其Vd系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[ResNet及其Vd系列模型文档](./docs/zh_CN/models/ResNet_and_vd.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|
| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar) |
| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar) |
| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar) |
| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar) |
| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_ssld_pretrained.tar) |
| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar) |
| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vc_pretrained.tar) |
| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar) |
| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_v2_pretrained.tar) |
| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar) |
| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar) |
| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.tar) |
| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_vd_pretrained.tar) |
| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar) |
| ResNet50_vd_<br>ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar) |
| ResNet50_vd_<br>ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar) |
| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar) |
<a name="移动端系列"></a>
### 移动端系列
移动端系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[移动端系列模型文档](./docs/zh_CN/models/Mobile.md)
| 模型 | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 |
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|
| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_25_pretrained.tar) |
| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_5_pretrained.tar) |
| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_75_pretrained.tar) |
| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar) |
| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_ssld_pretrained.tar) |
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar) |
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar) |
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_75_pretrained.tar) |
| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar) |
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar) |
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar) |
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_ssld_pretrained.tar) |
| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_25_pretrained.tar) |
| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar) |
| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_75_pretrained.tar) |
| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar) |
| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_35_pretrained.tar) |
| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_25_pretrained.tar) |
| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_pretrained.tar) |
| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_75_pretrained.tar) |
| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_5_pretrained.tar) |
| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_35_pretrained.tar) |
| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_35_ssld_pretrained.tar) |
| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar) |
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.7605 | - | 14.395 | - | - | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_int8_pretrained.tar) |
| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar) |
| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar) |
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_25_pretrained.tar) |
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_33_pretrained.tar) |
| ShuffleNetV2_<br>x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_5_pretrained.tar) |
| ShuffleNetV2_<br>x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x1_5_pretrained.tar) |
| ShuffleNetV2_<br>x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x2_0_pretrained.tar) |
| ShuffleNetV2_<br>swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_swish_pretrained.tar) |
| DARTS_GS_4M | 0.7523 | 0.9215 | 47.204948 | 1.04 | 4.77 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DARTS_GS_4M_pretrained.tar) |
| DARTS_GS_6M | 0.7603 | 0.9279 | 53.720802 | 1.22 | 5.69 | 24 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DARTS_GS_6M_pretrained.tar) |
| GhostNet_<br>x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x0_5_pretrained.pdparams) |
| GhostNet_<br>x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_0_pretrained.pdparams) |
| GhostNet_<br>x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_3_pretrained.pdparams) |
<a name="SEResNeXt与Res2Net系列"></a>
### SEResNeXt与Res2Net系列
SEResNeXt与Res2Net系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[SEResNeXt与Res2Net系列模型文档](./docs/zh_CN/models/SEResNext_and_Res2Net.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| Res2Net50_<br>26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar) |
| Res2Net50_vd_<br>26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar) |
| Res2Net50_<br>14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_14w_8s_pretrained.tar) |
| Res2Net101_vd_<br>26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net101_vd_26w_4s_pretrained.tar) |
| Res2Net200_vd_<br>26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_pretrained.tar) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_ssld_pretrained.tar) |
| ResNeXt50_<br>32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_32x4d_pretrained.tar) |
| ResNeXt50_vd_<br>32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_32x4d_pretrained.tar) |
| ResNeXt50_<br>64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar) |
| ResNeXt50_vd_<br>64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_64x4d_pretrained.tar) |
| ResNeXt101_<br>32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x4d_pretrained.tar) |
| ResNeXt101_vd_<br>32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_32x4d_pretrained.tar) |
| ResNeXt101_<br>64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar) |
| ResNeXt101_vd_<br>64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar) |
| ResNeXt152_<br>32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_32x4d_pretrained.tar) |
| ResNeXt152_vd_<br>32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_32x4d_pretrained.tar) |
| ResNeXt152_<br>64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_64x4d_pretrained.tar) |
| ResNeXt152_vd_<br>64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_64x4d_pretrained.tar) |
| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet18_vd_pretrained.tar) |
| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet34_vd_pretrained.tar) |
| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet50_vd_pretrained.tar) |
| SE_ResNeXt50_<br>32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_32x4d_pretrained.tar) |
| SE_ResNeXt50_vd_<br>32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_vd_32x4d_pretrained.tar) |
| SE_ResNeXt101_<br>32x4d | 0.7912 | 0.9420 | 18.82604 | 25.31814 | 15.02 | 46.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.tar) |
| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_pretrained.tar) |
<a name="DPN与DenseNet系列"></a>
### DPN与DenseNet系列
DPN与DenseNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[DPN与DenseNet系列模型文档](./docs/zh_CN/models/DPN_DenseNet.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------|
| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar) |
| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet161_pretrained.tar) |
| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet169_pretrained.tar) |
| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet201_pretrained.tar) |
| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet264_pretrained.tar) |
| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DPN68_pretrained.tar) |
| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DPN92_pretrained.tar) |
| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DPN98_pretrained.tar) |
| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DPN107_pretrained.tar) |
| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/DPN131_pretrained.tar) |
<a name="HRNet系列"></a>
### HRNet系列
HRNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[HRNet系列模型文档](./docs/zh_CN/models/HRNet.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|
| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar) |
| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_ssld_pretrained.tar) |
| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W30_C_pretrained.tar) |
| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W32_C_pretrained.tar) |
| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W40_C_pretrained.tar) |
| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W44_C_pretrained.tar) |
| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_pretrained.tar) |
| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_ssld_pretrained.tar) |
| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W64_C_pretrained.tar) |
<a name="Inception系列"></a>
### Inception系列
Inception系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[Inception系列模型文档](./docs/zh_CN/models/Inception.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------|
| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/GoogLeNet_pretrained.tar) |
| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_pretrained.tar) |
| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar) |
| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_pretrained.tar) |
| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_deeplab_pretrained.tar) |
| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Xception71_pretrained.tar) |
| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar) |
<a name="EfficientNet与ResNeXt101_wsl系列"></a>
### EfficientNet与ResNeXt101_wsl系列
EfficientNet与ResNeXt101_wsl系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[EfficientNet与ResNeXt101_wsl系列模型文档](./docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| ResNeXt101_<br>32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x8d_wsl_pretrained.tar) |
| ResNeXt101_<br>32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x16d_wsl_pretrained.tar) |
| ResNeXt101_<br>32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x32d_wsl_pretrained.tar) |
| ResNeXt101_<br>32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x48d_wsl_pretrained.tar) |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNeXt101_32x48d_wsl_pretrained.tar) |
| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_pretrained.tar) |
| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB1_pretrained.tar) |
| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB2_pretrained.tar) |
| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB3_pretrained.tar) |
| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB4_pretrained.tar) |
| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB5_pretrained.tar) |
| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB6_pretrained.tar) |
| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB7_pretrained.tar) |
| EfficientNetB0_<br>small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_small_pretrained.tar) |
<a name="ResNeSt与RegNet系列"></a>
### ResNeSt与RegNet系列
ResNeSt与RegNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[ResNeSt与RegNet系列模型文档](./docs/zh_CN/models/ResNeSt_RegNet.md)
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| ResNeSt50_<br>fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_fast_1s1x64d_pretrained.pdparams) |
| ResNeSt50 | 0.8102 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_pretrained.pdparams) |
| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/RegNetX_4GF_pretrained.pdparams) |
<a name="许可证书"></a>
## 许可证书
本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>许可认证。
<a name="贡献代码"></a>
## 贡献代码
我们非常欢迎你为PaddleClas贡献代码,也十分感谢你的反馈。
- 非常感谢[nblib](https://github.com/nblib)修正了PaddleClas中RandErasing的数据增广配置文件。
- 非常感谢[chenpy228](https://github.com/chenpy228)修正了PaddleClas文档中的部分错别字。
project(clas_system CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(WITH_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_LIB "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
SET(TENSORRT_DIR "" CACHE PATH "Compile demo with TensorRT")
set(DEMO_NAME "clas_system")
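# A typical configuration looks like the following (the paths are illustrative; see tools/build.sh for a complete example):
#   cmake .. -DPADDLE_LIB=/path/to/fluid_inference -DOPENCV_DIR=/path/to/opencv3 \
#            -DWITH_MKL=ON -DWITH_GPU=OFF -DWITH_STATIC_LIB=OFF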
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED OPENCV_DIR)
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
if (WIN32)
include_directories("${PADDLE_LIB}/paddle/fluid/inference")
include_directories("${PADDLE_LIB}/paddle/include")
link_directories("${PADDLE_LIB}/paddle/fluid/inference")
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
else ()
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
include_directories("${PADDLE_LIB}/paddle/include")
link_directories("${PADDLE_LIB}/paddle/lib")
endif ()
include_directories(${OpenCV_INCLUDE_DIRS})
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O3 -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
message("flags" ${CMAKE_CXX_FLAGS})
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
include_directories("${PADDLE_LIB}/third_party/install/xxhash/include")
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
include_directories("${PADDLE_LIB}/third_party/boost")
include_directories("${PADDLE_LIB}/third_party/eigen3")
include_directories("${CMAKE_SOURCE_DIR}/")
if (NOT WIN32)
if (WITH_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_DIR}/include")
link_directories("${TENSORRT_DIR}/lib")
endif()
endif(NOT WIN32)
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.lib
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
execute_process(COMMAND cp -r ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib)
endif ()
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
# Note: libpaddle_inference_api.so/a must be put before libpaddle_fluid.so/a
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if (NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
)
if(EXISTS "${PADDLE_LIB}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_LIB}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash)
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_LIB}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if(EXISTS "${PADDLE_LIB}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (WITH_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread")
set(DEPS ${DEPS} ${EXTERNAL_LIB})
endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
AUX_SOURCE_DIRECTORY(./src SRCS)
add_executable(${DEMO_NAME} ${SRCS})
target_link_libraries(${DEMO_NAME} ${DEPS})
if (WIN32 AND WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
endif()
# Visual Studio 2019 Community CMake 编译指南
PaddleClas 在 Windows 平台下基于 `Visual Studio 2019 Community` 进行了测试。微软从 `Visual Studio 2017` 开始即支持直接管理 `CMake` 跨平台编译项目,但直到 `Visual Studio 2019` 才提供了稳定且完整的支持,因此如果希望使用 CMake 管理项目编译构建,推荐使用 `Visual Studio 2019`。如果希望通过生成 `sln` 解决方案的方式进行编译,可以参考该文档:[https://zhuanlan.zhihu.com/p/145446681](https://zhuanlan.zhihu.com/p/145446681)
## 前置条件
* Visual Studio 2019
* CUDA 9.0 / CUDA 10.0,cudnn 7.6+ (仅在使用GPU版本的预测库时需要)
* CMake 3.0+
请确保系统已经安装好上述基本软件,以下测试基于`Visual Studio 2019 Community`版本。
**下面所有示例以工作目录为 `D:\projects`演示**
### Step1: 下载PaddlePaddle C++ 预测库 fluid_inference
PaddlePaddle C++ 预测库针对不同的 `CPU` 和 `CUDA` 版本提供了不同的预编译版本,请根据实际情况下载:[C++预测库下载列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/windows_cpp_inference.html)
解压后`D:\projects\fluid_inference`目录包含内容为:
```
fluid_inference
├── paddle # paddle核心库和头文件
|
├── third_party # 第三方依赖库和头文件
|
└── version.txt # 版本和编译信息
```
### Step2: 安装配置OpenCV
1. 在OpenCV官网下载适用于Windows平台的3.4.6版本, [下载地址](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. 运行下载的可执行文件,将OpenCV解压至指定目录,如`D:\projects\opencv`
3. 配置环境变量,如下流程所示
- 此电脑(我的电脑)-> 属性 -> 高级系统设置 -> 环境变量
- 在系统变量中找到Path(如没有,自行创建),并双击编辑
- 新建,将OpenCV路径填入并保存,如 `D:\projects\opencv\build\x64\vc14\bin`
### Step3: 使用Visual Studio 2019直接编译CMake
1. 打开Visual Studio 2019 Community,点击 `继续但无需代码`
![step2](./imgs/vs2019_step1.png)
2. 点击: `文件`->`打开`->`CMake`
![step2.1](./imgs/vs2019_step2.png)
选择项目代码所在路径,并打开`CMakeList.txt`
![step2.2](./imgs/vs2019_step3.png)
3. 点击:`项目`->`cpp_inference_demo的CMake设置`
![step3](./imgs/vs2019_step4.png)
4. 请设置以下参数的值
| 名称 | 值 | 保存到 JSON |
| ----------------------------- | ------------------ | ----------- |
| CMAKE_BACKWARDS_COMPATIBILITY | 3.17 | [√] |
| CMAKE_BUILD_TYPE | RelWithDebInfo | [√] |
| CUDA_LIB | CUDA的库路径 | [√] |
| CUDNN_LIB | CUDNN的库路径 | [√] |
| OPENCV_DIR | OpenCV的安装路径 | [√] |
| PADDLE_LIB | Paddle预测库的路径 | [√] |
| WITH_GPU | [√] | [√] |
| WITH_MKL | [√] | [√] |
| WITH_STATIC_LIB | [√] | [√] |
**注意**
1. `CMAKE_BACKWARDS_COMPATIBILITY` 的值请根据自己 `cmake` 版本设置,`cmake` 版本可以通过命令:`cmake --version` 查询;
2. `CUDA_LIB` 和 `CUDNN_LIB` 的值仅需在使用**GPU版本**预测库时指定,其中 CUDA 库版本尽量对齐,**使用 9.0、10.0 版本,不使用 9.2、10.1 等版本 CUDA 库**;
3. 在设置 `CUDA_LIB`、`CUDNN_LIB`、`OPENCV_DIR`、`PADDLE_LIB` 时,点击 `浏览`,分别设置相应的路径;
4. 在使用`CPU`版预测库时,请把 `WITH_GPU` 的勾去掉。
![step4](./imgs/vs2019_step5.png)
**设置完成后**, 点击上图中 `保存并生成CMake缓存以加载变量`
5. 点击`生成`->`全部生成`
![step6](./imgs/vs2019_step6.png)
### Step4: 预测及可视化
在完成上述操作后,`Visual Studio 2019` 编译产出的可执行文件 `clas_system.exe` 位于 `out\build\x64-Release` 目录下。打开 `cmd`,并切换到该目录:
```
cd D:\projects\PaddleClas\deploy\cpp_infer\out\build\x64-Release
```
可执行文件 `clas_system.exe` 即为编译产出的预测程序,其使用方法如下:
```shell
#预测图片 `.\docs\ILSVRC2012_val_00008306.JPEG`
.\clas_system.exe D:\projects\PaddleClas\deploy\cpp_infer\tools\config.txt .\docs\ILSVRC2012_val_00008306.JPEG
```
上述命令中,第一个参数为配置文件路径,第二个参数为需要预测的图片路径。
### 注意
* 在Windows下的终端中执行文件exe时,可能会发生乱码的现象,此时需要在终端中输入`CHCP 65001`,将终端的编码方式由GBK编码(默认)改为UTF-8编码,更加具体的解释可以参考这篇博客:[https://blog.csdn.net/qq_35038153/article/details/78430359](https://blog.csdn.net/qq_35038153/article/details/78430359)
* 如果需要使用CPU预测,PaddlePaddle在Windows上仅支持avx的CPU预测,目前不支持noavx的CPU预测。
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "paddle_api.h"
#include "paddle_inference_api.h"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
#include <include/preprocess_op.h>
namespace PaddleClas {
class Classifier {
public:
explicit Classifier(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const bool &use_zero_copy_run,
const int &resize_short_size, const int &crop_size) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->resize_short_size_ = resize_short_size;
this->crop_size_ = crop_size;
LoadModel(model_dir);
}
// Load Paddle inference model
void LoadModel(const std::string &model_dir);
// Run predictor
void Run(cv::Mat &img);
private:
std::shared_ptr<PaddlePredictor> predictor_;
bool use_gpu_ = false;
int gpu_id_ = 0;
int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
std::vector<float> scale_ = {1 / 0.229f, 1 / 0.224f, 1 / 0.225f};
bool is_scale_ = true;
int resize_short_size_ = 256;
int crop_size_ = 224;
// pre-process
ResizeImg resize_op_;
Normalize normalize_op_;
Permute permute_op_;
CenterCropImg crop_op_;
};
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <iomanip>
#include <iostream>
#include <map>
#include <ostream>
#include <string>
#include <vector>
#include "include/utility.h"
namespace PaddleClas {
class Config {
public:
explicit Config(const std::string &config_file) {
config_map_ = LoadConfig(config_file);
this->use_gpu = bool(stoi(config_map_["use_gpu"]));
this->gpu_id = stoi(config_map_["gpu_id"]);
this->gpu_mem = stoi(config_map_["gpu_mem"]);
this->cpu_math_library_num_threads =
stoi(config_map_["cpu_math_library_num_threads"]);
this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"]));
this->use_zero_copy_run = bool(stoi(config_map_["use_zero_copy_run"]));
this->cls_model_dir.assign(config_map_["cls_model_dir"]);
this->resize_short_size = stoi(config_map_["resize_short_size"]);
this->crop_size = stoi(config_map_["crop_size"]);
}
bool use_gpu = false;
int gpu_id = 0;
int gpu_mem = 4000;
int cpu_math_library_num_threads = 1;
bool use_mkldnn = false;
bool use_zero_copy_run = false;
std::string cls_model_dir;
int resize_short_size = 256;
int crop_size = 224;
void PrintConfigInfo();
private:
// Load configuration
std::map<std::string, std::string> LoadConfig(const std::string &config_file);
std::vector<std::string> split(const std::string &str,
const std::string &delim);
std::map<std::string, std::string> config_map_;
};
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
using namespace std;
using namespace paddle;
namespace PaddleClas {
class Normalize {
public:
virtual void Run(cv::Mat *im, const std::vector<float> &mean,
const std::vector<float> &scale, const bool is_scale = true);
};
// HWC -> CHW
class Permute {
public:
virtual void Run(const cv::Mat *im, float *data);
};
class CenterCropImg {
public:
virtual void Run(cv::Mat &im, const int crop_size = 224);
};
class ResizeImg {
public:
virtual void Run(const cv::Mat &img, cv::Mat &resize_img, int max_size_len);
};
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <stdlib.h>
#include <vector>
#include <algorithm>
#include <cstring>
#include <fstream>
#include <numeric>
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
namespace PaddleClas {
class Utility {
public:
static std::vector<std::string> ReadDict(const std::string &path);
// template <class ForwardIterator>
// inline static size_t argmax(ForwardIterator first, ForwardIterator last)
// {
// return std::distance(first, std::max_element(first, last));
// }
};
} // namespace PaddleClas
# 服务器端C++预测
本教程将介绍在服务器端部署PaddleClas模型的详细步骤。
## 1. 准备环境
### 运行准备
- Linux环境,推荐使用docker。
- Windows环境,目前支持基于`Visual Studio 2019 Community`进行编译;此外,如果您希望通过生成`sln解决方案`的方式进行编译,可以参考该文档:[https://zhuanlan.zhihu.com/p/145446681](https://zhuanlan.zhihu.com/p/145446681)
* 该文档主要介绍 Linux 环境下的 PaddleClas C++ 预测流程。如果需要在 Windows 环境下使用预测库进行 C++ 预测,具体编译方法请参考[Windows下编译教程](./docs/windows_vs2019_build.md)
### 1.1 编译opencv库
* 首先需要从opencv官网上下载在Linux环境下源码编译的包,以3.4.7版本为例,下载及解压缩命令如下:
```
wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
tar -xvf 3.4.7.tar.gz
```
最终可以在当前目录下看到`opencv-3.4.7/`的文件夹。
* 编译opencv,首先设置opencv源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的opencv源码路径,`install_path`为opencv的安装路径。在本例中,源码路径即为当前目录下的`opencv-3.4.7/`
```shell
cd ./opencv-3.4.7
export root_path=$PWD
export install_path=${root_path}/opencv3
```
* 然后在opencv源码路径下,按照下面的方式进行编译。
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
```
* `make install`完成之后,会在该文件夹下生成opencv头文件和库文件,用于后面的PaddleClas代码编译。
以opencv3.4.7版本为例,最终在安装路径下的文件结构如下所示。**注意**:不同的opencv版本,下述的文件结构可能不同。
```
opencv3/
|-- bin
|-- include
|-- lib64
|-- share
```
### 1.2 下载或者编译Paddle预测库
* 有2种方式获取Paddle预测库,下面进行详细介绍。
#### 1.2.1 预测库源码编译
* 如果希望获取最新预测库特性,可以从Paddle github上克隆最新代码,源码编译预测库。
* 可以参考[Paddle预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)的说明,从github上获取Paddle代码,然后进行编译,生成最新的预测库。使用git获取代码方法如下。
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
```
* 进入Paddle目录后,使用如下方法编译。
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON
make -j
make inference_lib_dist
```
更多编译参数选项可以参考Paddle C++预测库官网:[https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)
* 编译完成之后,可以在`build/fluid_inference_install_dir/`文件下看到生成了以下文件及文件夹。
```
build/fluid_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
#### 1.2.2 直接下载安装
* [Paddle预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)上提供了不同cuda版本的Linux预测库,可以在官网查看并选择合适的预测库版本。
`ubuntu14.04_cuda9.0_cudnn7_avx_mkl``1.8.4`版本为例,使用下述命令下载并解压:
```shell
wget https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz
tar -xvf fluid_inference.tgz
```
最终会在当前的文件夹中生成`fluid_inference/`的子文件夹。
## 2 开始运行
### 2.1 将模型导出为inference model
* 可以参考[模型导出](../../tools/export_model.py),导出`inference model`,用于模型预测。得到预测模型后,假设模型文件放在`inference`目录下,则目录结构如下。
```
inference/
|--model
|--params
```
**注意**:上述文件中,`model` 文件存储了模型结构信息,`params` 文件存储了模型参数信息。因此,在模型导出时,需将导出的 `__model__` 文件重命名为 `model`,将 `__variables__` 文件重命名为 `params`。
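例如,假设导出的模型文件位于 `inference/` 目录下,可以参考下面的命令完成重命名(仅为示意):
```shell
cd inference/
mv __model__ model
mv __variables__ params
```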
### 2.2 编译PaddleClas C++预测demo
* 编译命令如下,其中Paddle C++预测库、opencv等其他依赖库的地址需要换成自己机器上的实际地址。
```shell
sh tools/build.sh
```
具体地,`tools/build.sh`中内容如下。
```shell
OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DDEMO_NAME=clas_system \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR}
make -j
```
上述命令中,
* `OPENCV_DIR`为opencv编译安装的地址(本例中为`opencv-3.4.7/opencv3`文件夹的路径);
* `LIB_DIR`为下载的Paddle预测库(`fluid_inference`文件夹),或编译生成的Paddle预测库(`build/fluid_inference_install_dir`文件夹)的路径;
* `CUDA_LIB_DIR`为cuda库文件地址,在docker中为`/usr/local/cuda/lib64`
* `CUDNN_LIB_DIR`为cudnn库文件地址,在docker中为`/usr/lib/x86_64-linux-gnu/`
执行上述命令、编译完成之后,会在当前路径下生成 `build` 文件夹,其中包含一个名为 `clas_system` 的可执行文件。
### 运行demo
* 执行以下命令,完成对一幅图像的分类。
```shell
sh tools/run.sh
```
* 最终屏幕上会输出结果,如下图所示。
<div align="center">
<img src="./docs/imgs/cpp_infer_result.png" width="600">
</div>
其中 `class id` 表示置信度最高的类别对应的 id,`score` 表示图片属于该类别的概率。
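`tools/run.sh` 中实际只封装了一条预测命令,也可以参考下面的方式直接运行编译得到的可执行文件(其中配置文件与图片路径仅为示例,请替换为实际路径):
```shell
./build/clas_system ./tools/config.txt ./docs/imgs/ILSVRC2012_val_00000666.JPEG
```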
# Server-side C++ inference
In this tutorial, we will introduce the detailed steps of deploying PaddleClas models on the server side.
## 1. Prepare the environment
### Environment
- Linux: docker is recommended.
- Windows: compilation based on `Visual Studio 2019 Community` is supported. In addition, if you prefer to compile by generating an `sln` solution, you can refer to this document: [https://zhuanlan.zhihu.com/p/145446681](https://zhuanlan.zhihu.com/p/145446681).
- This document mainly introduces the compilation and inference of the PaddleClas C++ demo in the Linux environment.
- If you need to use the inference library in the Windows environment, please refer to [the Windows compilation tutorial](./docs/windows_vs2019_build.md) for details.
### 1.1 Compile opencv
* First, download the opencv source package from the opencv official website and compile it in the Linux environment. Taking opencv 3.4.7 as an example, the download and decompression commands are as follows.
```
wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
tar -xf 3.4.7.tar.gz
```
Finally, you can see the folder of `opencv-3.4.7/` in the current directory.
* To compile opencv, first set the opencv source path (`root_path`) and installation path (`install_path`): `root_path` is the path of the downloaded opencv source code, and `install_path` is the path where opencv will be installed. In this example, the opencv source is `./opencv-3.4.7`.
```shell
cd ./opencv-3.4.7
export root_path=$PWD
export install_path=${root_path}/opencv3
```
* After entering the opencv source code path, you can compile it in the following way.
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
```
* After `make install` is completed, the opencv header files and library files will be generated in the installation path for the later PaddleClas compilation.
Taking opencv 3.4.7 as an example, the final file structure under the installation path is as follows. **NOTICE**: the file structure may differ for different opencv versions.
```
opencv3/
|-- bin
|-- include
|-- lib64
|-- share
```
### 1.2 Compile or download the Paddle Inference Library
* There are 2 ways to obtain the Paddle Inference Library, described in detail below.
#### 1.2.1 Compile from the source code
* If you want to get the latest Paddle Inference Library features, you can download the latest code from Paddle GitHub repository and compile the inference library from the source code.
* You can refer to the [Paddle Inference Library documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html) to get the Paddle source code from GitHub and compile it to generate the latest inference library. The method of using git to get the code is as follows.
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
```
* After entering the Paddle directory, the compilation method is as follows.
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON
make -j
make inference_lib_dist
```
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html).
* After the compilation process, you can see the following files in the folder of `build/fluid_inference_install_dir/`.
```
build/fluid_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
Among them, `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
#### 1.2.2 Direct download and installation
* Different cuda versions of the Linux inference library (based on GCC 4.8.2) are provided on the
[Paddle Inference Library official website](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html). You can view and select the appropriate version of the inference library on the official website.
* Taking the `ubuntu14.04_cuda9.0_cudnn7_avx_mkl` library of version `1.8.4` as an example, it can be downloaded and decompressed as follows.
```
wget https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz
tar -xf fluid_inference.tgz
```
Finally you will see the `fluid_inference/` folder in the current directory.
## 2. Compile and run the demo
### 2.1 Export the inference model
* You can refer to the [model export script](../../tools/export_model.py) to export the inference model for prediction. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
```
inference/
|--model
|--params
```
**NOTICE**: In the above files, the `model` file stores the model structure information and the `params` file stores the model parameter information. Therefore, after exporting the model with the [export script](../../tools/export_model.py), you need to rename the exported `__model__` file to `model` and the `__variables__` file to `params`.
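For example, assuming the exported files are placed in the `inference/` directory, the renaming can be done as follows (for illustration only):
```shell
cd inference/
mv __model__ model
mv __variables__ params
```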
### 2.2 Compile PaddleClas C++ inference demo
* The compilation commands are as follows. The paths of the Paddle C++ inference library, opencv and other dependencies need to be replaced with the actual paths on your own machine.
```shell
sh tools/build.sh
```
Specifically, the content in `tools/build.sh` is as follows.
```shell
OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DDEMO_NAME=clas_system \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR}
make -j
```
In the above command:
* `OPENCV_DIR` is the opencv installation path;
* `LIB_DIR` is the path of the downloaded Paddle Inference Library (the `fluid_inference` folder) or the compiled one (the `build/fluid_inference_install_dir` folder);
* `CUDA_LIB_DIR` is the cuda library file path; in docker it is `/usr/local/cuda/lib64`;
* `CUDNN_LIB_DIR` is the cudnn library file path; in docker it is `/usr/lib/x86_64-linux-gnu/`.
After the compilation is completed, an executable file named `clas_system` will be generated in the `build` folder.
### Run the demo
* Execute the following command to complete the classification of an image.
```shell
sh tools/run.sh
```
* The classification results will be shown on the screen, as follows.
<div align="center">
<img src="./docs/imgs/cpp_infer_result.png" width="600">
</div>
* In the above results, `class id` represents the id of the category with the highest confidence, and `score` represents the probability that the image belongs to that category.
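`tools/run.sh` only wraps a single prediction command, so the compiled executable can also be run directly. The config file and image below are just examples and should be replaced with your own paths:
```shell
./build/clas_system ./tools/config.txt ./docs/imgs/ILSVRC2012_val_00000666.JPEG
```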
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <include/cls.h>
namespace PaddleClas {
void Classifier::LoadModel(const std::string &model_dir) {
AnalysisConfig config;
config.SetModel(model_dir + "/model", model_dir + "/params");
if (this->use_gpu_) {
config.EnableUseGpu(this->gpu_mem_, this->gpu_id_);
} else {
config.DisableGpu();
if (this->use_mkldnn_) {
config.EnableMKLDNN();
// cache 10 different shapes for mkldnn to avoid memory leak
config.SetMkldnnCacheCapacity(10);
}
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
}
// false for zero copy tensor
// true for commom tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input
config.SwitchSpecifyInputNames(true);
config.SwitchIrOptim(true);
config.EnableMemoryOptim();
config.DisableGlogInfo();
this->predictor_ = CreatePaddlePredictor(config);
}
void Classifier::Run(cv::Mat &img) {
cv::Mat srcimg;
cv::Mat resize_img;
img.copyTo(srcimg);
this->resize_op_.Run(img, resize_img, this->resize_short_size_);
this->crop_op_.Run(resize_img, this->crop_size_);
this->normalize_op_.Run(&resize_img, this->mean_, this->scale_,
this->is_scale_);
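// Pack the normalized image into a CHW float buffer of shape 1 x 3 x H x W,
// which is the layout expected by the inference input tensor.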
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
this->permute_op_.Run(&resize_img, input.data());
// Inference.
if (this->use_zero_copy_run_) {
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<float> out_data;
auto output_names = this->predictor_->GetOutputNames();
auto output_t = this->predictor_->GetOutputTensor(output_names[0]);
std::vector<int> output_shape = output_t->shape();
int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1,
std::multiplies<int>());
out_data.resize(out_num);
output_t->copy_to_cpu(out_data.data());
int maxPosition =
max_element(out_data.begin(), out_data.end()) - out_data.begin();
std::cout << "result: " << std::endl;
std::cout << "\tclass id: " << maxPosition << std::endl;
std::cout << std::fixed << std::setprecision(10)
<< "\tscore: " << double(out_data[maxPosition]) << std::endl;
}
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <include/config.h>
namespace PaddleClas {
std::vector<std::string> Config::split(const std::string &str,
const std::string &delim) {
std::vector<std::string> res;
if ("" == str)
return res;
char *strs = new char[str.length() + 1];
std::strcpy(strs, str.c_str());
char *d = new char[delim.length() + 1];
std::strcpy(d, delim.c_str());
char *p = std::strtok(strs, d);
while (p) {
std::string s = p;
res.push_back(s);
p = std::strtok(NULL, d);
}
return res;
}
std::map<std::string, std::string>
Config::LoadConfig(const std::string &config_path) {
auto config = Utility::ReadDict(config_path);
std::map<std::string, std::string> dict;
for (int i = 0; i < config.size(); i++) {
// pass for empty line or comment
if (config[i].size() <= 1 || config[i][0] == '#') {
continue;
}
std::vector<std::string> res = split(config[i], " ");
dict[res[0]] = res[1];
}
return dict;
}
void Config::PrintConfigInfo() {
std::cout << "=======Paddle Class inference config======" << std::endl;
for (auto iter = config_map_.begin(); iter != config_map_.end(); iter++) {
std::cout << iter->first << " : " << iter->second << std::endl;
}
std::cout << "=======End of Paddle Class inference config======" << std::endl;
}
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
#include <include/cls.h>
#include <include/config.h>
using namespace std;
using namespace cv;
using namespace PaddleClas;
int main(int argc, char **argv) {
if (argc < 3) {
std::cerr << "[ERROR] usage: " << argv[0]
<< " configure_filepath image_path\n";
exit(1);
}
Config config(argv[1]);
config.PrintConfigInfo();
std::string img_path(argv[2]);
cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR);
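// OpenCV reads images in BGR order; convert to RGB to match the mean/scale used in preprocessing.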
cv::cvtColor(srcimg, srcimg, cv::COLOR_BGR2RGB);
Classifier classifier(config.cls_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.use_zero_copy_run,
config.resize_short_size, config.crop_size);
auto start = std::chrono::system_clock::now();
classifier.Run(srcimg);
auto end = std::chrono::system_clock::now();
auto duration =
std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "Cost "
<< double(duration.count()) *
std::chrono::microseconds::period::num /
std::chrono::microseconds::period::den
<< " s" << std::endl;
return 0;
}
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "paddle_api.h"
#include "paddle_inference_api.h"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <math.h>
#include <numeric>
#include <include/preprocess_op.h>
namespace PaddleClas {
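// Copy each channel of the HWC image into a contiguous plane, producing CHW data for the input tensor.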
void Permute::Run(const cv::Mat *im, float *data) {
int rh = im->rows;
int rw = im->cols;
int rc = im->channels();
for (int i = 0; i < rc; ++i) {
cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i);
}
}
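// Convert the image to float32 (dividing by 255 when is_scale is true) and apply (x - mean) * scale to every channel.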
void Normalize::Run(cv::Mat *im, const std::vector<float> &mean,
const std::vector<float> &scale, const bool is_scale) {
double e = 1.0;
if (is_scale) {
e /= 255.0;
}
(*im).convertTo(*im, CV_32FC3, e);
for (int h = 0; h < im->rows; h++) {
for (int w = 0; w < im->cols; w++) {
im->at<cv::Vec3f>(h, w)[0] =
(im->at<cv::Vec3f>(h, w)[0] - mean[0]) * scale[0];
im->at<cv::Vec3f>(h, w)[1] =
(im->at<cv::Vec3f>(h, w)[1] - mean[1]) * scale[1];
im->at<cv::Vec3f>(h, w)[2] =
(im->at<cv::Vec3f>(h, w)[2] - mean[2]) * scale[2];
}
}
}
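// Crop a crop_size x crop_size patch from the center of the (already resized) image, in place.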
void CenterCropImg::Run(cv::Mat &img, const int crop_size) {
int resize_w = img.cols;
int resize_h = img.rows;
int w_start = int((resize_w - crop_size) / 2);
int h_start = int((resize_h - crop_size) / 2);
cv::Rect rect(w_start, h_start, crop_size, crop_size);
img = img(rect);
}
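// Resize the image so that its shorter side equals resize_short_size while keeping the aspect ratio.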
void ResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
int resize_short_size) {
int w = img.cols;
int h = img.rows;
float ratio = 1.f;
if (h < w) {
ratio = float(resize_short_size) / float(h);
} else {
ratio = float(resize_short_size) / float(w);
}
int resize_h = round(float(h) * ratio);
int resize_w = round(float(w) * ratio);
cv::resize(img, resize_img, cv::Size(resize_w, resize_h));
}
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <iostream>
#include <ostream>
#include <vector>
#include <include/utility.h>
namespace PaddleClas {
std::vector<std::string> Utility::ReadDict(const std::string &path) {
std::ifstream in(path);
std::string line;
std::vector<std::string> m_vec;
if (in) {
while (getline(in, line)) {
m_vec.push_back(line);
}
} else {
std::cout << "no such label file: " << path << ", exit the program..."
<< std::endl;
exit(1);
}
return m_vec;
}
} // namespace PaddleClas
OPENCV_DIR=/PaddleClas/PaddleOCR/opencv-3.4.7/opencv3/
LIB_DIR=/PaddleClas/PaddleOCR/fluid_inference/
CUDA_LIB_DIR=/usr/local/cuda/lib64
CUDNN_LIB_DIR=/usr/lib/x86_64-linux-gnu/
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR}
make -j
# model load config
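# Each non-comment line below is parsed as "<key> <value>", separated by a single space;
# lines starting with '#' and empty lines are ignored (see Config::LoadConfig).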
use_gpu 0
gpu_id 0
gpu_mem 4000
cpu_math_library_num_threads 10
use_mkldnn 1
use_zero_copy_run 1
# cls config
cls_model_dir ./inference/
resize_short_size 256
crop_size 224
./build/clas_system ./tools/config.txt ./docs/imgs/ILSVRC2012_val_00000666.JPEG
ARM_ABI = arm8
export ARM_ABI
include ../Makefile.def
LITE_ROOT=../../../
THIRD_PARTY_DIR=${LITE_ROOT}/third_party
OPENCV_VERSION=opencv4.1.0
OPENCV_LIBS = ${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/libs/libopencv_imgcodecs.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/libs/libopencv_imgproc.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/libs/libopencv_core.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/libtegra_hal.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/liblibjpeg-turbo.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/liblibwebp.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/liblibpng.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/liblibjasper.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/liblibtiff.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/libIlmImf.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/libtbb.a \
${THIRD_PARTY_DIR}/${OPENCV_VERSION}/arm64-v8a/3rdparty/libs/libcpufeatures.a
OPENCV_INCLUDE = -I../../../third_party/${OPENCV_VERSION}/arm64-v8a/include
CXX_INCLUDES = $(INCLUDES) ${OPENCV_INCLUDE} -I$(LITE_ROOT)/cxx/include
CXX_LIBS = ${OPENCV_LIBS} -L$(LITE_ROOT)/cxx/lib/ -lpaddle_light_api_shared $(SYSTEM_LIBS)
###############################################################
# How to use one of the static libraries:                     #
# `libpaddle_api_full_bundled.a` #
# `libpaddle_api_light_bundled.a` #
###############################################################
# Note: default use lite's shared library. #
###############################################################
# 1. Comment out the line above that uses `libpaddle_light_api_shared.so`
# 2. Uncomment the line below that uses `libpaddle_api_light_bundled.a`
#CXX_LIBS = $(LITE_ROOT)/cxx/lib/libpaddle_api_light_bundled.a $(SYSTEM_LIBS)
clas_system: fetch_opencv clas_system.o
$(CC) $(SYSROOT_LINK) $(CXXFLAGS_LINK) clas_system.o -o clas_system $(CXX_LIBS) $(LDFLAGS)
clas_system.o: image_classfication.cpp
$(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o clas_system.o -c image_classfication.cpp
fetch_opencv:
@ test -d ${THIRD_PARTY_DIR} || mkdir ${THIRD_PARTY_DIR}
@ test -e ${THIRD_PARTY_DIR}/${OPENCV_VERSION}.tar.gz || \
(echo "fetch opencv libs" && \
wget -P ${THIRD_PARTY_DIR} https://paddle-inference-dist.bj.bcebos.com/${OPENCV_VERSION}.tar.gz)
@ test -d ${THIRD_PARTY_DIR}/${OPENCV_VERSION} || \
tar -zxvf ${THIRD_PARTY_DIR}/${OPENCV_VERSION}.tar.gz -C ${THIRD_PARTY_DIR}
.PHONY: clean
clean:
rm -f clas_system.o
rm -f clas_system
clas_model_file ./MobileNetV3_large_x1_0.nb
label_path ./imagenet1k_label_list.txt
resize_short_size 256
crop_size 224
visualize 0
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle_api.h" // NOLINT
#include <arm_neon.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <math.h>
#include <opencv2/opencv.hpp>
#include <sys/time.h>
#include <vector>
using namespace paddle::lite_api; // NOLINT
using namespace std;
struct RESULT {
std::string class_name;
int class_id;
float score;
};
std::vector<RESULT> PostProcess(const float *output_data, int output_size,
const std::vector<std::string> &word_labels,
cv::Mat &output_image) {
const int TOPK = 5;
int max_indices[TOPK];
double max_scores[TOPK];
for (int i = 0; i < TOPK; i++) {
max_indices[i] = 0;
max_scores[i] = 0;
}
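// Insert each score into the descending top-K lists; the add/subtract sequence below swaps
// (score, index) with the current slot without using temporary variables.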
for (int i = 0; i < output_size; i++) {
float score = output_data[i];
int index = i;
for (int j = 0; j < TOPK; j++) {
if (score > max_scores[j]) {
index += max_indices[j];
max_indices[j] = index - max_indices[j];
index -= max_indices[j];
score += max_scores[j];
max_scores[j] = score - max_scores[j];
score -= max_scores[j];
}
}
}
std::vector<RESULT> results(TOPK);
for (int i = 0; i < results.size(); i++) {
results[i].class_name = "Unknown";
if (max_indices[i] >= 0 && max_indices[i] < word_labels.size()) {
results[i].class_name = word_labels[max_indices[i]];
}
results[i].score = max_scores[i];
results[i].class_id = max_indices[i];
cv::putText(output_image,
"Top" + std::to_string(i + 1) + "." + results[i].class_name +
":" + std::to_string(results[i].score),
cv::Point2d(5, i * 18 + 20), cv::FONT_HERSHEY_PLAIN, 1,
cv::Scalar(51, 255, 255));
}
return results;
}
// Fill the tensor with normalized (x - mean) * scale values and transform the layout from NHWC to NCHW, accelerated with NEON.
void NeonMeanScale(const float *din, float *dout, int size,
const std::vector<float> mean,
const std::vector<float> scale) {
if (mean.size() != 3 || scale.size() != 3) {
std::cerr << "[ERROR] mean or scale size must equal to 3\n";
exit(1);
}
float32x4_t vmean0 = vdupq_n_f32(mean[0]);
float32x4_t vmean1 = vdupq_n_f32(mean[1]);
float32x4_t vmean2 = vdupq_n_f32(mean[2]);
float32x4_t vscale0 = vdupq_n_f32(scale[0]);
float32x4_t vscale1 = vdupq_n_f32(scale[1]);
float32x4_t vscale2 = vdupq_n_f32(scale[2]);
float *dout_c0 = dout;
float *dout_c1 = dout + size;
float *dout_c2 = dout + size * 2;
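// Vectorized loop: vld3q_f32 de-interleaves 4 HWC pixels into per-channel registers,
// then (x - mean) * scale is computed with NEON; the scalar loop afterwards handles the tail.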
int i = 0;
for (; i < size - 3; i += 4) {
float32x4x3_t vin3 = vld3q_f32(din);
float32x4_t vsub0 = vsubq_f32(vin3.val[0], vmean0);
float32x4_t vsub1 = vsubq_f32(vin3.val[1], vmean1);
float32x4_t vsub2 = vsubq_f32(vin3.val[2], vmean2);
float32x4_t vs0 = vmulq_f32(vsub0, vscale0);
float32x4_t vs1 = vmulq_f32(vsub1, vscale1);
float32x4_t vs2 = vmulq_f32(vsub2, vscale2);
vst1q_f32(dout_c0, vs0);
vst1q_f32(dout_c1, vs1);
vst1q_f32(dout_c2, vs2);
din += 12;
dout_c0 += 4;
dout_c1 += 4;
dout_c2 += 4;
}
for (; i < size; i++) {
*(dout_c0++) = (*(din++) - mean[0]) * scale[0];
*(dout_c1++) = (*(din++) - mean[1]) * scale[1];
*(dout_c2++) = (*(din++) - mean[2]) * scale[2];
}
}
cv::Mat ResizeImage(const cv::Mat &img, const int &resize_short_size) {
int w = img.cols;
int h = img.rows;
cv::Mat resize_img;
float ratio = 1.f;
if (h < w) {
ratio = float(resize_short_size) / float(h);
} else {
ratio = float(resize_short_size) / float(w);
}
int resize_h = round(float(h) * ratio);
int resize_w = round(float(w) * ratio);
cv::resize(img, resize_img, cv::Size(resize_w, resize_h));
return resize_img;
}
cv::Mat CenterCropImg(const cv::Mat &img, const int &crop_size) {
int resize_w = img.cols;
int resize_h = img.rows;
int w_start = int((resize_w - crop_size) / 2);
int h_start = int((resize_h - crop_size) / 2);
cv::Rect rect(w_start, h_start, crop_size, crop_size);
cv::Mat crop_img = img(rect);
return crop_img;
}
std::vector<RESULT>
RunClasModel(std::shared_ptr<PaddlePredictor> predictor, const cv::Mat &img,
const std::map<std::string, std::string> &config,
const std::vector<std::string> &word_labels) {
// Read img
int resize_short_size = stoi(config.at("resize_short_size"));
int crop_size = stoi(config.at("crop_size"));
int visualize = stoi(config.at("visualize"));
cv::Mat resize_image = ResizeImage(img, resize_short_size);
cv::Mat crop_image = CenterCropImg(resize_image, crop_size);
cv::Mat img_fp;
double e = 1.0 / 255.0;
crop_image.convertTo(img_fp, CV_32FC3, e);
// Prepare input data from image
std::unique_ptr<Tensor> input_tensor(std::move(predictor->GetInput(0)));
input_tensor->Resize({1, 3, img_fp.rows, img_fp.cols});
auto *data0 = input_tensor->mutable_data<float>();
std::vector<float> mean = {0.485f, 0.456f, 0.406f};
std::vector<float> scale = {1 / 0.229f, 1 / 0.224f, 1 / 0.225f};
const float *dimg = reinterpret_cast<const float *>(img_fp.data);
NeonMeanScale(dimg, data0, img_fp.rows * img_fp.cols, mean, scale);
// Run predictor
predictor->Run();
// Get output and post process
std::unique_ptr<const Tensor> output_tensor(
std::move(predictor->GetOutput(0)));
auto *output_data = output_tensor->data<float>();
int output_size = 1;
for (auto dim : output_tensor->shape()) {
output_size *= dim;
}
// The input image was converted to RGB in `main`, so convert a copy back to
// BGR here; otherwise the visualization saved by cv::imwrite has swapped channels.
cv::Mat output_image;
cv::cvtColor(img, output_image, cv::COLOR_RGB2BGR);
auto results =
PostProcess(output_data, output_size, word_labels, output_image);
if (visualize) {
std::string output_image_path = "./clas_result.png";
cv::imwrite(output_image_path, output_image);
std::cout << "save output image into " << output_image_path << std::endl;
}
return results;
}
std::shared_ptr<PaddlePredictor> LoadModel(std::string model_file) {
MobileConfig config;
config.set_model_from_file(model_file);
std::shared_ptr<PaddlePredictor> predictor =
CreatePaddlePredictor<MobileConfig>(config);
return predictor;
}
std::vector<std::string> split(const std::string &str,
const std::string &delim) {
std::vector<std::string> res;
if ("" == str)
return res;
char *strs = new char[str.length() + 1];
std::strcpy(strs, str.c_str());
char *d = new char[delim.length() + 1];
std::strcpy(d, delim.c_str());
char *p = std::strtok(strs, d);
while (p) {
std::string s = p;
res.push_back(s);
p = std::strtok(NULL, d);
}
// free the temporary buffers allocated above to avoid leaking memory on every call
delete[] strs;
delete[] d;
return res;
}
std::vector<std::string> ReadDict(std::string path) {
std::ifstream in(path);
std::string filename;
std::string line;
std::vector<std::string> m_vec;
if (in) {
while (getline(in, line)) {
m_vec.push_back(line);
}
} else {
std::cout << "no such file" << std::endl;
}
return m_vec;
}
std::map<std::string, std::string> LoadConfigTxt(std::string config_path) {
auto config = ReadDict(config_path);
std::map<std::string, std::string> dict;
for (size_t i = 0; i < config.size(); i++) {
std::vector<std::string> res = split(config[i], " ");
// skip blank or malformed lines instead of indexing past the end of `res`
if (res.size() < 2) {
continue;
}
dict[res[0]] = res[1];
}
return dict;
}
void PrintConfig(const std::map<std::string, std::string> &config) {
std::cout << "=======PaddleClas lite demo config======" << std::endl;
for (auto iter = config.begin(); iter != config.end(); iter++) {
std::cout << iter->first << " : " << iter->second << std::endl;
}
std::cout << "=======End of PaddleClas lite demo config======" << std::endl;
}
std::vector<std::string> LoadLabels(const std::string &path) {
std::ifstream file;
std::vector<std::string> labels;
file.open(path);
std::string line;
while (std::getline(file, line)) {
std::string::size_type pos = line.find(" ");
if (pos != std::string::npos) {
// drop the leading class id (and the following space) so only the class name is kept
line = line.substr(pos + 1);
}
labels.push_back(line);
}
file.clear();
file.close();
return labels;
}
int main(int argc, char **argv) {
if (argc < 3) {
std::cerr << "[ERROR] usage: " << argv[0] << " config_path img_path\n";
exit(1);
}
std::string config_path = argv[1];
std::string img_path = argv[2];
// load config
auto config = LoadConfigTxt(config_path);
PrintConfig(config);
std::string clas_model_file = config.at("clas_model_file");
std::string label_path = config.at("label_path");
// Load Labels
std::vector<std::string> word_labels = LoadLabels(label_path);
auto clas_predictor = LoadModel(clas_model_file);
auto start = std::chrono::system_clock::now();
cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR);
if (srcimg.empty()) {
std::cerr << "[ERROR] failed to read image: " << img_path << std::endl;
exit(1);
}
cv::cvtColor(srcimg, srcimg, cv::COLOR_BGR2RGB);
std::vector<RESULT> results =
RunClasModel(clas_predictor, srcimg, config, word_labels);
std::cout << "===clas result for image: " << img_path << "===" << std::endl;
for (int i = 0; i < results.size(); i++) {
std::cout << "\t"
<< "Top-" << i + 1 << ", class_id: " << results[i].class_id
<< ", class_name: " << results[i].class_name
<< ", score: " << results[i].score << std::endl;
}
auto end = std::chrono::system_clock::now();
auto duration =
std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "Cost "
<< double(duration.count()) *
std::chrono::microseconds::period::num /
std::chrono::microseconds::period::den
<< " s" << std::endl;
return 0;
}
#!/bin/bash
if [ $# != 1 ] ; then
echo "USAGE: $0 your_inference_lite_lib_path"
exit 1;
fi
mkdir -p $1/demo/cxx/clas/debug/
cp ../../ppcls/utils/imagenet1k_label_list.txt $1/demo/cxx/clas/debug/
cp -r ./* $1/demo/cxx/clas/
cp ./config.txt $1/demo/cxx/clas/debug/
cp ./imgs/tabby_cat.jpg $1/demo/cxx/clas/debug/
echo "Prepare Done"
# 端侧部署
本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleClas分类模型的详细步骤。
Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。如果希望直接测试速度,可以参考[Paddle-Lite移动端benchmark测试教程](../../docs/zh_CN/extension/paddle_mobile_inference.md)
## 1. 准备环境
### 运行准备
- 电脑(编译Paddle Lite)
- 安卓手机(armv7或armv8)
### 1.1 准备交叉编译环境
交叉编译环境用于编译 Paddle Lite 和 PaddleClas 的C++ demo。
支持多种开发环境,不同开发环境的编译流程请参考对应文档。
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
### 1.2 准备预测库
预测库有两种获取方式:
1. [建议]直接下载,预测库下载链接如下:
|平台|预测库下载链接|
|-|-|
|Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.CV_ON.tar.gz)|
|iOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios.armv7.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios64.armv8.with_extra.CV_ON.tar.gz)|
注:1. 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/release_lib.html#android-toolchain-gcc)下载的预测库,
注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 如果使用量化的模型部署在端侧,建议使用Paddle-Lite develop分支编译预测库。
2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
```shell
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
# 如果使用编译方式,建议使用develop分支编译预测库
git checkout develop
./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON
```
**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/user_guides/Compile/Android.html#id2)
直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。
预测库的文件目录如下:
```
inference_lite_lib.android.armv8/
|-- cxx C++ 预测库和头文件
| |-- include C++ 头文件
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++预测库
| |-- libpaddle_api_light_bundled.a C++静态库
| `-- libpaddle_light_api_shared.so C++动态库
|-- java Java预测库
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++和Java示例代码
| |-- cxx C++ 预测库demo
| `-- java Java 预测库demo
```
## 2 开始运行
### 2.1 模型优化
Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。
**注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。
#### 2.1.1 [建议]pip安装paddlelite并进行转换
Python下安装 `paddlelite`,目前最高支持`Python3.7`
```shell
pip install paddlelite
```
之后使用`paddle_lite_opt`工具可以进行inference模型的转换。`paddle_lite_opt`的部分参数如下
|选项|说明|
|-|-|
|--model_dir|待优化的PaddlePaddle模型(非combined形式)的路径|
|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径|
|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径|
|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现。若您需要在mobile端执行模型预测,请将此选项设置为naive_buffer。默认为protobuf|
|--optimize_out|优化模型的输出路径|
|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm|
|--record_tailoring_info|当使用 根据模型裁剪库文件 功能时,则设置该选项为true,以记录优化后模型含有的kernel和OP信息,默认为false|
`--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。
#### 2.1.2 源码编译Paddle-Lite生成opt工具
模型优化需要Paddle-Lite的`opt`可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下:
```shell
# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout develop
# 启动编译
./lite/tools/build.sh build_optimize_tool
```
编译完成后,`opt`文件位于`build.opt/lite/api/`下,可通过如下方式查看`opt`的运行选项和使用方式;
```shell
cd build.opt/lite/api/
./opt
```
`opt`的使用方式与参数与上面的`paddle_lite_opt`完全一致。
<a name="2.1.3"></a>
#### 2.1.3 转换示例
下面以PaddleClas的 `MobileNetV3_large_x1_0` 模型为例,介绍使用`paddle_lite_opt`完成预训练模型到inference模型,再到Paddle-Lite优化模型的转换。
```shell
# 进入PaddleClas根目录
cd PaddleClas_root_path
export PYTHONPATH=$PWD
# 下载并解压预训练模型
wget https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar
tar -xf MobileNetV3_large_x1_0_pretrained.tar
# 将预训练模型导出为inference模型
python tools/export_model.py -m MobileNetV3_large_x1_0 -p ./MobileNetV3_large_x1_0_pretrained/ -o ./MobileNetV3_large_x1_0_inference/
# 将inference模型转化为Paddle-Lite优化模型
paddle_lite_opt --model_file=./MobileNetV3_large_x1_0_inference/model --param_file=./MobileNetV3_large_x1_0_inference/params --optimize_out=./MobileNetV3_large_x1_0
```
最终在当前文件夹下生成`MobileNetV3_large_x1_0.nb`的文件。
**注意**:`--optimize_out` 参数为优化后模型的保存路径,无需加后缀`.nb`;`--model_file` 参数为模型结构信息文件的路径,`--param_file` 参数为模型权重信息文件的路径,请注意文件名。
<a name="2.2与手机联调"></a>
### 2.2 与手机联调
首先需要进行一些准备工作。
1. 准备一台arm8的安卓手机,如果编译的预测库和opt文件是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI = arm7`
2. 电脑上安装ADB工具,用于调试。 ADB安装方式如下:
2.1. MAC电脑安装ADB:
```shell
brew cask install android-platform-tools
```
2.2. Linux安装ADB
```shell
sudo apt update
sudo apt install -y wget adb
```
2.3. Windows安装ADB
Windows上安装ADB需要去谷歌的安卓平台下载ADB软件包进行安装:[链接](https://developer.android.com/studio)
4. 手机连接电脑后,开启手机`USB调试`选项,选择`文件传输`模式,在电脑终端中输入:
```shell
adb devices
```
如果有device输出,则表示安装成功,如下所示:
```
List of devices attached
744be294 device
```
5. 准备优化后的模型、预测库文件、测试图像和类别映射文件。
```shell
cd PaddleClas_root_path
cd deploy/lite/
# 运行prepare.sh
# prepare.sh 会将预测库文件、测试图像和使用的字典文件放置在预测库中的demo/cxx/clas文件夹下
sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8
# 进入lite demo的工作目录
cd /{lite prediction library path}/inference_lite_lib.android.armv8/
cd demo/cxx/clas/
# 将C++预测动态库so文件复制到debug文件夹中
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
```
`prepare.sh` 以 `PaddleClas/deploy/lite/imgs/tabby_cat.jpg` 作为测试图像,将其复制到 `demo/cxx/clas/debug/` 文件夹下。
将 `paddle_lite_opt` 工具优化后的模型文件放置到 `/{lite prediction library path}/inference_lite_lib.android.armv8/demo/cxx/clas/debug/` 文件夹下。本例中,使用[2.1.3](#2.1.3)生成的 `MobileNetV3_large_x1_0.nb` 模型文件。
执行完成后,clas文件夹下将有如下文件格式:
```
demo/cxx/clas/
|-- debug/
| |--MobileNetV3_large_x1_0.nb 优化后的分类器模型文件
| |--tabby_cat.jpg 待测试图像
| |--imagenet1k_label_list.txt 类别映射文件
| |--libpaddle_light_api_shared.so C++预测库文件
| |--config.txt 分类预测超参数配置
|-- config.txt 分类预测超参数配置
|-- image_classfication.cpp 图像分类代码文件
|-- Makefile 编译文件
```
#### 注意:
* 上述文件中,`imagenet1k_label_list.txt` 是ImageNet1k数据集的类别映射文件,如果使用自定义的类别,需要更换该类别映射文件。
* `config.txt` 包含了分类器的超参数,如下:
```shell
clas_model_file ./MobileNetV3_large_x1_0.nb # 模型文件地址
label_path ./imagenet1k_label_list.txt # 类别映射文本文件
resize_short_size 256 # resize之后的短边边长
crop_size 224 # 裁剪后用于预测的边长
visualize 0 # 是否进行可视化,如果选择的话,会在当前文件夹下生成名为clas_result.png的图像文件。
```
6. 启动调试,上述步骤完成后就可以使用ADB将文件夹 `debug/` push到手机上运行,步骤如下:
```shell
# 执行编译,得到可执行文件clas_system
make -j
# 将编译得到的可执行文件移动到debug文件夹中
mv clas_system ./debug/
# 将上述debug文件夹push到手机上
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH
# clas_system可执行文件的使用方式为:
# ./clas_system 配置文件路径 测试图像路径
./clas_system ./config.txt ./tabby_cat.jpg
```
如果对代码做了修改,则需要重新编译并push到手机上。
运行效果如下:
<div align="center">
<img src="./imgs/lite_demo_result.png" width="600">
</div>
## FAQ
Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗?
A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件即可,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。
Q2:换一个图测试怎么做?
A2:替换 debug 下的测试图像为你想要测试的图像,使用 ADB 再次 push 到手机上即可。
# Tutorial of PaddleClas Mobile Deployment
This tutorial will introduce how to use [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) to deploy PaddleClas models on mobile phones.
Paddle-Lite is a lightweight inference engine for PaddlePaddle. It provides efficient inference capabilities for mobile phones and IoT devices, and extensively integrates cross-platform hardware to provide a lightweight deployment solution for mobile-side applications.
If you only want to test speed, please refer to [The tutorial of Paddle-Lite mobile-side benchmark test](../../docs/zh_CN/extension/paddle_mobile_inference.md).
## 1. Preparation
- Computer (for compiling Paddle-Lite)
- Mobile phone (arm7 or arm8)
## 2. Build Paddle-Lite library
The cross-compilation environment is used to compile the C++ demos of Paddle-Lite and PaddleClas.
For the detailed compilation directions of different development environments, please refer to the corresponding documents.
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [macOS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
## 3. Download inference library for Android or iOS
|Platform|Inference Library Download Link|
|-|-|
|Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.CV_ON.tar.gz)|
|iOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios.armv7.with_extra.CV_ON.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.6.1/iOS/inference_lite_lib.ios64.armv8.with_extra.CV_ON.tar.gz)|
**NOTE**:
1. If you download the inference library from the [Paddle-Lite official document](https://paddle-lite.readthedocs.io/zh/latest/user_guides/release_lib.html#android-toolchain-gcc), please choose the links with `with_extra=ON` and `with_cv=ON`.
2. It is recommended to build the inference library from the [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) develop branch if you want to deploy a [quantized](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README_en.md) model to mobile phones. Please refer to this [link](https://paddle-lite.readthedocs.io/zh/latest/user_guides/Compile/Android.html#id2) for more details about compiling.
The structure of the inference library is as follows:
```
inference_lite_lib.android.armv8/
|-- cxx C++ inference library and header files
| |-- include C++ header files
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++ inference library
| |-- libpaddle_api_light_bundled.a C++ static library
| `-- libpaddle_light_api_shared.so C++ dynamic library
|-- java Java inference library
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++ and java demos
| |-- cxx C++ demos
| `-- java Java demos
```
## 4. Inference Model Optimization
Paddle-Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. In order to make the optimization process more convenient and easy to use, Paddle-Lite provides `opt` tool to automatically complete the optimization steps and output a lightweight, optimal executable model.
**NOTE**: If you have already got the `.nb` file, you can skip this step.
<a name="4.1"></a>
### 4.1 [RECOMMEND] Use `pip` to install Paddle-Lite and optimize model
* Use pip to install Paddle-Lite. Python 3.7 is the highest Python version currently supported:
```shell
pip install paddlelite
```
* Use `paddle_lite_opt` to optimize the inference model. Part of the parameters of `paddle_lite_opt` are as follows:
| Parameters | Explanation |
| ----------------------- | ------------------------------------------------------------ |
| --model_dir | Path to the PaddlePaddle model (no-combined) file to be optimized. |
| --model_file | Path to the net structure file of PaddlePaddle model (combined) to be optimized. |
| --param_file | Path to the net weight files of PaddlePaddle model (combined) to be optimized. |
| --optimize_out_type | Type of output model, `protobuf` by default. Supports `protobuf` and `naive_buffer`. Compared with `protobuf`, `naive_buffer` gives a more lightweight serialization/deserialization model. If you need to run inference on the mobile side, please set it to `naive_buffer`. |
| --optimize_out | Path of the output model; there is no need to add the `.nb` suffix. |
| --valid_targets | The executable backends of the model, `arm` by default. Supports one or more of `x86`, `arm`, `opencl`, `npu`, `xpu`. If more than one is set, separate them with spaces, and the `opt` tool will choose the best way automatically. To support Huawei NPU (the DaVinci architecture NPU carried by the Kirin 810/990 SoC), set it to `npu, arm`. |
| --record_tailoring_info | Whether to enable `Cut the Library Files According To the Model`, `false` by default. If you need to record the kernel and OP info of the optimized model, set it to `true`. |
In addition, you can simply run `paddle_lite_opt` with no arguments to get more detailed information about its usage.
### 4.2 Compile Paddle-Lite to generate `opt` tool
Optimizing the model requires Paddle-Lite's `opt` executable file, which can be obtained by compiling Paddle-Lite from source. The steps are as follows:
```shell
# get the Paddle-Lite source code; skip this step if it has already been cloned
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout develop
# compile
./lite/tools/build.sh build_optimize_tool
```
After the compilation is complete, the `opt` file is located under `build.opt/lite/api/`.
The `opt` tool is used in the same way as `paddle_lite_opt`; please refer to [4.1](#4.1).
<a name="4.3"></a>
### 4.3 Example of getting the optimized model
Taking the `MobileNetV3_large_x1_0` model of PaddleClas as an example, we will introduce how to use `paddle_lite_opt` to complete the conversion from the pre-trained model to the inference model, and then to the Paddle-Lite optimized model.
```shell
# enter PaddleClas root directory
cd PaddleClas_root_path
export PYTHONPATH=$PWD
# download and uncompress the pre-trained model
wget https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar
tar -xf MobileNetV3_large_x1_0_pretrained.tar
# export the pre-trained model as an inference model
python tools/export_model.py -m MobileNetV3_large_x1_0 -p ./MobileNetV3_large_x1_0_pretrained/ -o ./MobileNetV3_large_x1_0_inference/
# convert inference model to Paddle-Lite optimized model
paddle_lite_opt --model_file=./MobileNetV3_large_x1_0_inference/model --param_file=./MobileNetV3_large_x1_0_inference/params --optimize_out=./MobileNetV3_large_x1_0
```
After the above commands complete, a file named `MobileNetV3_large_x1_0.nb` will be generated in the current directory, which is the converted model file.
## 5. Run optimized model on Phone
1. Prepare an Android phone with `arm8`. If the compiled inference library and `opt` file are `armv7`, you need an `arm7` phone and modify `ARM_ABI = arm7` in the Makefile.
2. Install the ADB tool on the computer.
* Install ADB for MAC
Homebrew is recommended for the installation:
```shell
brew cask install android-platform-tools
```
* Install ADB for Linux
```shell
sudo apt update
sudo apt install -y wget adb
```
* Install ADB for Windows
To install ADB on Windows, download the package from Google's Android platform: [Download Link](https://developer.android.com/studio).
3. Connect the phone to the computer, turn on the phone's `USB debugging` option, and select the `file transfer` mode. Then verify that ADB is installed successfully as follows:
```shell
$ adb devices
List of devices attached
744be294 device
```
If there is `device` output like the above, it means the installation was successful.
4. Prepare optimized model, inference library files, test image and dictionary file used.
```shell
cd PaddleClas_root_path
cd deploy/lite/
# prepare.sh will put the inference library files, the test image and the dictionary files in demo/cxx/clas
sh prepare.sh /{lite inference library path}/inference_lite_lib.android.armv8
# enter the working directory of lite demo
cd /{lite inference library path}/inference_lite_lib.android.armv8/
cd demo/cxx/clas/
# copy the C++ inference dynamic library file (ie. .so) to the debug folder
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
```
`prepare.sh` takes `PaddleClas/deploy/lite/imgs/tabby_cat.jpg` as the test image and copies it to the `demo/cxx/clas/debug/` directory.
You should also put the model optimized by `paddle_lite_opt` under the `demo/cxx/clas/debug/` directory. In this example, we use the `MobileNetV3_large_x1_0.nb` model file generated in [4.3](#4.3).
The structure of the clas demo is as follows after the above command is completed:
```
demo/cxx/clas/
|-- debug/
| |--MobileNetV3_large_x1_0.nb class model
| |--tabby_cat.jpg test image
| |--imagenet1k_label_list.txt dictionary file
| |--libpaddle_light_api_shared.so C++ .so file
| |--config.txt config file
|-- config.txt config file
|-- image_classfication.cpp source code
|-- Makefile compile file
```
**NOTE**:
* `imagenet1k_label_list.txt` is the category mapping file of the ImageNet1k dataset. If you use custom categories, you need to replace this mapping file.
* `config.txt` contains the hyperparameters, as follows:
```shell
clas_model_file ./MobileNetV3_large_x1_0.nb # path of model file
label_path ./imagenet1k_label_list.txt # path of category mapping file
resize_short_size 256 # the short side length after resize
crop_size 224 # side length used for inference after cropping
visualize 0 # whether to visualize. If you set it to 1, an image file named 'clas_result.png' will be generated in the current directory.
```
5. Run Model on Phone
```shell
# run compile to get the executable file 'clas_system'
make -j
# move the compiled executable file to the debug folder
mv clas_system ./debug/
# push the debug folder to Phone
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH
# the usage of clas_system is as follows:
# ./clas_system "path of config file" "path of test image"
./clas_system ./config.txt ./tabby_cat.jpg
```
**NOTE**: If you make changes to the code, you need to recompile and push the `debug` folder to the phone again.
The result is as follows:
<div align="center">
<img src="./imgs/lite_demo_result.png" width="600">
</div>
## FAQ
Q1: If I want to change the model, do I need to go through the whole process again?
A1: No. If you have completed the above steps, you only need to replace the `.nb` model file. Remember to update the `.nb` file path in the config file and, if necessary, replace the category mapping file so that it matches the new model.
Q2: How do I test with a different image?
A2: Replace the test image under the `debug` folder with the image you want to test, and then push the folder to the phone again with ADB.
# Introduction of model compression methods
In recent years, deep neural networks have been proven to be an extremely effective method for solving problems in computer vision and natural language processing. Deep learning methods perform better than traditional methods given a suitable network structure and training process.
With enough training data, increasing the number of parameters of the neural network with a reasonable network design can significantly improve model performance, but it also increases model complexity and computation cost, which can be prohibitive in real scenarios.
Parameter redundancy exists in deep neural networks, and there are several methods to compress a model, such as pruning, quantization, knowledge distillation, etc. Knowledge distillation refers to using a teacher model to guide a student model to learn a specific task, so that the small model obtains a relatively large accuracy improvement with its computation cost unchanged, and can even reach an accuracy similar to that of the large model [1]. Combining some of the existing distillation methods [2,3], PaddleClas provides a simple semi-supervised label knowledge distillation solution (SSLD). The Top-1 accuracy on the ImageNet1k dataset improves by more than 3% for the ResNet_vd and MobileNet series, as shown below.
![](../../../images/distillation/distillation_perform_s.jpg)
# SSLD
## Introduction
The following figure shows the framework of SSLD.
![](../../../images/distillation/ppcls_distillation.png)
First, we select nearly 4 million images from the ImageNet22k dataset and merge them with the ImageNet-1k training set to obtain a new dataset containing 5 million images. Then, the student model and the teacher model are combined into a new network, which outputs the predictions of the student model and the teacher model respectively; the weights of the teacher model are frozen during training. Finally, we use the JS divergence loss as the loss function of the training process. Here we take the MobileNetV3 distillation task as an example and introduce the key points of SSLD.
* Choice of the teacher model. During knowledge distillation, it may not be optimal if the structures of the teacher model and the student model differ too much. With the same structure, a teacher model with higher accuracy leads to better performance of the student model. Compared with the 79.12% ResNet50_vd teacher model, using the 82.4% teacher model brings a 0.4% improvement in Top-1 accuracy (`75.6%-> 76.0%`).
* Improvement of the loss function. The most commonly used loss function for classification is the cross entropy loss. We find that when training with soft labels, the KL divergence loss brings almost no improvement over the cross entropy loss, while the JS divergence loss improves the accuracy by 0.2% (`76.0%-> 76.2%`). Therefore, the loss function used in SSLD is the JS divergence loss (a minimal sketch is given after this list).
* More training epochs. The baseline experiment uses only 120 epochs; increasing this to 360 brings a 0.9% improvement (`76.2%-> 77.1%`).
* No need for labeled data in SSLD, which makes it convenient to expand the training data. The ground-truth label is not used when computing the loss function, so unlabeled data can also be used to train the network. This label-free distillation strategy greatly raises the upper performance limit of the student models (`77.1%-> 78.5%`).
* ImageNet1k finetuning. The ImageNet1k training set is used for finetuning, which brings a further 0.4% accuracy improvement (`78.5%-> 78.9%`).
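For reference, the JS divergence loss mentioned above can be written directly in terms of the softmax outputs of the two sub-networks. The snippet below is only a NumPy sketch of the formula, not the PaddleClas implementation, and all names in it are illustrative.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def js_divergence_loss(student_logits, teacher_logits, eps=1e-8):
    """Jensen-Shannon divergence between student and teacher soft labels.

    A minimal sketch of the loss described above; the teacher output is
    treated as a fixed soft label (its gradient is stopped in practice).
    """
    p = softmax(student_logits)   # student prediction
    q = softmax(teacher_logits)   # teacher soft label
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)), axis=1)
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)), axis=1)
    return np.mean(0.5 * (kl_pm + kl_qm))

# toy usage: a batch of 2 samples with 5 classes
student = np.random.randn(2, 5)
teacher = np.random.randn(2, 5)
print(js_divergence_loss(student, teacher))
```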
## Data selection
* An important feature of the SSLD distillation scheme is that it does not need labeled images, so the dataset can be expanded arbitrarily. Considering the limitation of computing resources, we only expand the training set of the distillation task based on the ImageNet22k dataset. For SSLD, we use the `Top-k per class` data sampling scheme [3]. The specific steps are as follows.
    * Deduplication of the training set. We first deduplicate the ImageNet22k dataset against the ImageNet1k validation set based on SIFT feature similarity matching, to prevent the added ImageNet22k training images from containing ImageNet1k validation images. In the end, 4511 similar images were removed. Some of the filtered similar images are shown below.
![](../../../images/distillation/22k_1k_val_compare_w_sift.png)
    * Obtaining the soft labels of the ImageNet22k dataset. For the deduplicated ImageNet22k dataset, we use the `ResNeXt101_32x16d_wsl` model to predict the soft label of each image.
    * Top-k data selection. The ImageNet1k dataset contains 1000 categories. For each category, we find the images with the Top-k highest scores on that category, and finally obtain a dataset whose image number does not exceed `1000 * k` (some categories may have fewer than k images). A minimal sketch of this step is given after this list.
    * The selected images are merged with the ImageNet1k training set to form the dataset used for the final distillation training, which contains about 5 million images in total.
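The `Top-k per class` selection step can be sketched as follows. This is an illustrative NumPy snippet rather than the actual data-selection script; `soft_labels` is assumed to be an `N x 1000` array of teacher predictions on the deduplicated ImageNet22k images.

```python
import numpy as np

def select_top_k_per_class(soft_labels, k):
    """Return the indices of the images kept over all classes.

    soft_labels: (num_images, num_classes) teacher scores.
    For every class, keep at most the k images with the highest score on that
    class; the union of all classes therefore contains at most num_classes * k images.
    """
    num_images, num_classes = soft_labels.shape
    selected = set()
    for c in range(num_classes):
        order = np.argsort(-soft_labels[:, c])  # descending by score on class c
        selected.update(order[:k].tolist())
    return sorted(selected)

# toy usage: 20 images, 4 classes, keep the top-2 per class
scores = np.random.rand(20, 4)
print(select_top_k_per_class(scores, k=2))
```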
# Experiments
The distillation solution provided by PaddleClas combines common training with finetuning: given a suitable teacher model, the large-scale dataset (5 million images) is used for common training and the ImageNet1k dataset is used for finetuning.
## Choice of teacher model
In order to verify the influence of the model size difference between the teacher model and the student model on the distillation results as well as the teacher model accuracy, we conducted several experiments. The training strategy is unified as follows: `cosine_decay_warmup, lr = 1.3, epoch = 120, bs = 2048`, and the student models are all trained from scratch.
|Teacher Model | Teacher Top1 | Student Model | Student Top1|
|- |:-: |:-: | :-: |
| ResNeXt101_32x16d_wsl | 84.2% | MobileNetV3_large_x1_0 | 75.78% |
| ResNet50_vd | 79.12% | MobileNetV3_large_x1_0 | 75.60% |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% |
It can be shown from the table that:
> When the teacher model structure is the same, the higher the teacher model accuracy, the better the final student model will be.
>
> The size difference between the teacher model and the student model should not be too large, otherwise it will decrease the accuracy of the distillation results.
Therefore, during distillation, for the ResNet series student models, we use `ResNeXt101_32x16d_wsl` as the teacher model; for the MobileNet series student models, we use `ResNet50_vd_ssld` as the teacher model.
## Distillation using large-scale dataset
Training process is carried out on the large-scale dataset with 5 million images. Specifically, the following table shows more details of different models.
|Student Model | num_epoch | l2_decay | batch size/gpu cards | base lr | learning rate decay | top1 acc |
| - |:-: |:-: | :-: |:-: |:-: |:-: |
| MobileNetV1 | 360 | 3e-5 | 4096/8 | 1.6 | cosine_decay_warmup | 77.65% |
| MobileNetV2 | 360 | 1e-5 | 3072/8 | 0.54 | cosine_decay_warmup | 76.34% |
| MobileNetV3_large_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 78.54% |
| MobileNetV3_small_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 70.11% |
| ResNet50_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 82.07% |
| ResNet101_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 83.41% |
| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 84.82% |
## finetuning using ImageNet1k
Finetuning is carried out on the ImageNet1k dataset to alleviate the distribution mismatch between the enlarged training set and the ImageNet1k test set. The following table shows more details of finetuning.
|Student Model | num_epoch | l2_decay | batch size/gpu cards | base lr | learning rate decay | top1 acc |
| - |:-: |:-: | :-: |:-: |:-: |:-: |
| MobileNetV1 | 30 | 3e-5 | 4096/8 | 0.016 | cosine_decay_warmup | 77.89% |
| MobileNetV2 | 30 | 1e-5 | 3072/8 | 0.0054 | cosine_decay_warmup | 76.73% |
| MobileNetV3_large_x1_0 | 30 | 1e-5 | 2048/8 | 0.008 | cosine_decay_warmup | 78.96% |
| MobileNetV3_small_x1_0 | 30 | 1e-5 | 6400/32 | 0.025 | cosine_decay_warmup | 71.28% |
| ResNet50_vd | 60 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 82.39% |
| ResNet101_vd | 30 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 83.73% |
| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 85.13% |
## Data augmentation and the Fix strategy
* Based on the experiments mentioned above, we add AutoAugment [4] to the training process and reduce l2_decay from 4e-5 to 2e-5. Finally, the Top-1 accuracy on the ImageNet1k dataset reaches 82.99%, a 0.6% improvement over the standard SSLD distillation strategy.
* For image classification tasks, the model accuracy can be further improved when the test scale is 1.15 times the training scale [5]. For the 82.99% ResNet50_vd pretrained model, evaluating at 320x320 raises the accuracy to 83.7%. We then use the Fix strategy to finetune the model with the training scale set to 320x320, keeping the pre-processing pipeline identical for training and test and freezing all weights except those of the fully connected layer. The final Top-1 accuracy reaches **84.0%**.
# Application of the distillation model
## Instructions
* Adjust the learning rate of the middle layers. The middle-layer feature maps of the distilled model are more refined, so when the distilled model is used as a pretrained model for other tasks, keeping the previous learning rate can easily destroy these features, while lowering the learning rate of the whole model slows down convergence. We therefore adjust the learning rates of the middle layers. Specifically:
    * For ResNet50_vd, we set up a learning-rate list: the three conv2d layers before the residual blocks share one learning-rate multiplier, and each of the four residual stages has its own multiplier, so 5 values need to be set (see the sketch after this list). Experiments show that for transfer-learning finetuning of classification models, the list `[0.1, 0.1, 0.2, 0.2, 0.3]` performs better in most tasks, while in object detection tasks `[0.05, 0.05, 0.05, 0.1, 0.15]` brings larger accuracy gains.
    * For MobileNetV3_large_x1_0, which contains 15 blocks, we let every 3 blocks share one learning-rate multiplier, so 5 values are also required. We find that in classification and detection tasks, the list `[0.25, 0.25, 0.5, 0.5, 0.75]` performs better in most tasks.
* Appropriate l2 decay. Different l2 decay values are set for different models during training. To prevent overfitting, l2 decay is usually set larger for large models: it is `1e-4` for ResNet50 and `1e-5 ~ 4e-5` for the MobileNet series. The l2 decay also needs to be adjusted when the model is applied to other tasks. Taking Faster_RCNN_MobileNetV3_FPN as an example, we found that adjusting l2 decay alone brings up to a 0.5% accuracy (mAP) improvement on the COCO2017 dataset.
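To make the idea of the learning-rate list concrete, the following sketch shows how such per-stage multipliers could be attached to parameters grouped by stage. It is only a schematic: the parameter-naming convention in `stage_of` is hypothetical, and PaddleClas applies the multipliers through its configuration files rather than through a helper like this.

```python
# Schematic per-stage learning-rate multipliers for a ResNet50_vd-style model.
# The naming convention below is hypothetical and only used for illustration.
LR_MULT_CLS = [0.1, 0.1, 0.2, 0.2, 0.3]      # finetuning classification tasks
LR_MULT_DET = [0.05, 0.05, 0.05, 0.1, 0.15]  # object detection tasks

def stage_of(param_name):
    """Map a parameter name to one of 5 groups: group 0 is the stem
    convolutions before the residual blocks, groups 1-4 are the residual stages."""
    for i in range(1, 5):
        if param_name.startswith("res%d" % (i + 1)):  # hypothetical naming scheme
            return i
    return 0

def lr_for(param_name, base_lr, multipliers):
    return base_lr * multipliers[stage_of(param_name)]

print(lr_for("res5_branch2a_weights", base_lr=0.01, multipliers=LR_MULT_CLS))
print(lr_for("conv1_weights", base_lr=0.01, multipliers=LR_MULT_CLS))
```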
## Transfer learning
* To verify the effect of the SSLD pretrained models in transfer learning, we carried out experiments on 10 small datasets. To ensure the comparability of the experiments, we use the standard ImageNet1k preprocessing pipeline. For the distilled models, we also add a simple search over the learning rates of the middle layers of the pretrained model.
* For ResNet50_vd, the baseline pretrained model has a Top-1 accuracy of 79.12%, and the other hyperparameters are obtained by grid search. For the distilled pretrained model, we add the middle-layer learning rates to the search space. The following table shows the results.
| Dataset | Model | Baseline Top1 Acc | Distillation Model Finetune |
|- |:-: |:-: | :-: |
| Oxford102 flowers | ResNet50_vd | 97.18% | 97.41% |
| caltech-101 | ResNet50_vd | 92.57% | 93.21% |
| Oxford-IIIT-Pets | ResNet50_vd | 94.30% | 94.76% |
| DTD | ResNet50_vd | 76.48% | 77.71% |
| fgvc-aircraft-2013b | ResNet50_vd | 88.98% | 90.00% |
| Stanford-Cars | ResNet50_vd | 92.65% | 92.76% |
| SUN397 | ResNet50_vd | 64.02% | 68.36% |
| cifar100 | ResNet50_vd | 86.50% | 87.58% |
| cifar10 | ResNet50_vd | 97.72% | 97.94% |
| Food-101 | ResNet50_vd | 89.58% | 89.99% |
* It can be seen that on the above 10 datasets, combined with the appropriate middle layer learning rate, the distillation pretrained model can bring an average accuracy improvement of more than 1%.
## Object detection
Based on the two-stage Faster/Cascade RCNN model, we verify the effect of the pretrained model obtained by distillation.
* ResNet50_vd
Training scale and test scale are set as 640x640, and some of the ablation studies are as follows.
| Model | train/test scale | pretrain top1 acc | feature map lr | coco mAP |
|- |:-: |:-: | :-: | :-: |
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [1.0,1.0,1.0,1.0,1.0] | 34.8% |
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [0.05,0.05,0.1,0.1,0.15] | 34.3% |
| Faster RCNN R50_vd FPN | 640/640 | 82.18% | [0.05,0.05,0.1,0.1,0.15] | 36.3% |
It can be seen here that for the baseline pretrained model, excessively lowering the middle-layer learning rates actually reduces the performance of the detection model. Based on this distillation model, we also provide a practical server-side detection solution. The detailed configuration and training code are open source; for more details please refer to [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_enhance).
# Practice
This section describes the SSLD distillation experiments in detail based on the ImageNet-1K dataset. If you want to experience this method quickly, you can refer to [**Quick start PaddleClas in 30 minutes**](../../tutorials/quick_start.md), which uses the Flowers102 dataset.
## Configuration
### Distill ResNet50_vd using ResNeXt101_32x16d_wsl
Configuration of distilling `ResNet50_vd` using `ResNeXt101_32x16d_wsl` is as follows.
```yaml
ARCHITECTURE:
name: 'ResNeXt101_32x16d_wsl_distill_ResNet50_vd'
pretrained_model: "./pretrained/ResNeXt101_32x16d_wsl_pretrained/"
# pretrained_model:
# - "./pretrained/ResNeXt101_32x16d_wsl_pretrained/"
# - "./pretrained/ResNet50_vd_pretrained/"
use_distillation: True
```
### Distill MobileNetV3_large_x1_0 using ResNet50_vd_ssld
The detailed configuration is as follows.
```yaml
ARCHITECTURE:
name: 'ResNet50_vd_distill_MobileNetV3_large_x1_0'
pretrained_model: "./pretrained/ResNet50_vd_ssld_pretrained/"
# pretrained_model:
# - "./pretrained/ResNet50_vd_ssld_pretrained/"
# - "./pretrained/ResNet50_vd_pretrained/"
use_distillation: True
```
## Begin to train the network
If everything is ready, users can begin to train the network using the following command.
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
--log_dir=R50_vd_distill_MV3_large_x1_0 \
tools/train.py \
-c ./configs/Distillation/R50_vd_distill_MV3_large_x1_0.yaml
```
## Note
* Before using SSLD, users first need to train a teacher model on the target dataset. The teacher model is then used to guide the training of the student model.
* When using SSLD, users need to set `use_distillation` in the configuration file to `True`. In addition, because the student model learns from soft labels that already carry knowledge information, the `label_smoothing` option needs to be turned off.
* If the student model is not initialized with a pretrained model, the other training hyperparameters can follow those used to train the student model on ImageNet-1k. If the student model is initialized with a pretrained model, the learning rate can be reduced to `1/100~1/10` of the standard learning rate.
* In SSLD distillation, the student model only learns from the soft labels, which makes the training more difficult. It is therefore recommended to decrease the value of `l2_decay` appropriately to obtain higher validation accuracy.
* To add unlabeled training data, only the training list text file needs to be extended with the new data.
> If this document is helpful to you, welcome to star our project: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
# Reference
[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.
......@@ -4,4 +4,4 @@ distillation
.. toctree::
:maxdepth: 3
distillation.md
distillation_en.md
# Image Augmentation
Image augmentation is a commonly used regularization method in image classification tasks, often used in scenarios with insufficient data or large models. In this chapter, we mainly introduce 8 image augmentation methods beyond the standard augmentation pipeline. Users can apply these methods in their own tasks for better model performance. Under the same conditions, the performance of these augmentation methods on the ImageNet1k dataset is shown as follows.
![](../../../images/image_aug/main_image_aug.png)
# Common image augmentation methods
Unless otherwise specified, all the examples and experiments in this chapter are based on the ImageNet1k dataset with the network input image size set to 224.
The standard data augmentation pipeline in ImageNet classification tasks contains the following steps (a short sketch using the PaddleClas operators is given after the list).
1. Decode image, abbreviated as `ImageDecode`.
2. Randomly crop the image to size with 224x224, abbreviated as `RandCrop`.
3. Randomly flip the image horizontally, abbreviated as `RandFlip`.
4. Normalize the image pixel values, abbreviated as `Normalize`.
5. Transpose the image from `[224, 224, 3]`(HWC) to `[3, 224, 224]`(CHW), abbreviated as `Transpose`.
6. Group the image data(`[3, 224, 224]`) into a batch(`[N, 3, 224, 224]`), where `N` is the batch size. It is abbreviated as `Batch`.
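As a reference, the standard pipeline above can be sketched with the PaddleClas preprocessing operators. The operator names and arguments below follow the configuration files shown later in this chapter and may differ slightly between versions, so treat the snippet as illustrative rather than as a verbatim training pipeline.

```python
# A sketch of the standard ImageNet pipeline (steps 1-5 above) using the
# operators that appear in the configuration files of this chapter.
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import RandCropImage
from ppcls.data.imaug import RandFlipImage
from ppcls.data.imaug import NormalizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform

ops = [
    DecodeImage(to_rgb=True),              # 1. ImageDecode
    RandCropImage(size=224),               # 2. RandCrop to 224x224
    RandFlipImage(flip_code=1),            # 3. RandFlip (horizontal)
    NormalizeImage(scale=1.0 / 255.0,
                   mean=[0.485, 0.456, 0.406],
                   std=[0.229, 0.224, 0.225],
                   order=''),              # 4. Normalize
    ToCHWImage(),                          # 5. Transpose HWC -> CHW
]

data = open("demo.jpg", "rb").read()       # any test image
img = transform(data, ops)                 # 6. batching is done by the data loader
```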
Compared with the above standard pipeline, researchers have also proposed many improved image augmentation strategies. These strategies insert certain operations at different stages of the standard pipeline; based on the stage at which they operate, we divide them into the following three categories.
1. Transformation. Perform some transformations on the image after `RandCrop`, such as AutoAugment and RandAugment.
2. Cropping. Perform some transformations on the image after `Transpose`, such as CutOut, RandErasing, HideAndSeek and GridMask.
3. Aliasing. Perform some transformations on the image after `Batch`, such as Mixup and Cutmix.
Visualization results of some images after augmentation are shown as follows.
![](../../../images/image_aug/image_aug_samples_s_en.jpg)
The following table shows more detailed information of the transformations.
| Method | Input | Output | Auto-<br>Augment\[1\] | Rand-<br>Augment\[2\] | CutOut\[3\] | Rand<br>Erasing\[4\] | HideAnd-<br>Seek\[5\] | GridMask\[6\] | Mixup\[7\] | Cutmix\[8\] |
|-------------|---------------------------|---------------------------|------------------|------------------|-------------|------------------|------------------|---------------|------------|------------|
| Image<br>Decode | Binary | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| RandCrop | (:, :, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | \- | \- | \- | \- | \- | \- |
| RandFlip | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| Normalize | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| Transpose | (224, 224, 3)<br>float32 | (3, 224, 224)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (3, 224, 224)<br>float32 | (3, 224, 224)<br>float32 | \- | \- | Y | Y | Y | Y | \- | \- |
| Batch | (3, 224, 224)<br>float32 | (N, 3, 224, 224)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (N, 3, 224, 224)<br>float32 | (N, 3, 224, 224)<br>float32 | \- | \- | \- | \- | \- | \- | Y | Y |
PaddleClas integrates all the above data augmentation strategies. More details, including the principles and usage of each strategy, are introduced in the following chapters. For better visualization, we use the following figure to show the changes after the transformations, and `RandCrop` is replaced with `Resize` for simplification.
![](../../../images/image_aug/test_baseline.jpeg)
# Image Transformation
Transformation means performing some transformations on the image after `RandCrop`. It mainly contains AutoAugment and RandAugment.
## AutoAugment
Address:[https://arxiv.org/abs/1805.09501v1](https://arxiv.org/abs/1805.09501v1)
Github repo:[https://github.com/DeepVoltaire/AutoAugment](https://github.com/DeepVoltaire/AutoAugment)
Unlike conventional, manually designed image augmentation methods, AutoAugment is an augmentation policy found, for a specific dataset, by a search algorithm over a search space of image augmentation sub-policies. For the ImageNet dataset, the final policy contains 25 sub-policy combinations. Each sub-policy contains two transformations; for each image, a sub-policy combination is randomly selected and then each transformation in the sub-policy is applied with a certain probability.
In PaddleClas, `AutoAugment` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ImageNetPolicy
from ppcls.data.imaug import transform
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
autoaugment_op = ImageNetPolicy()
ops = [decode_op, resize_op, autoaugment_op]
imgs_dir = image_path
fnames = os.listdir(imgs_dir)
for f in fnames:
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
```
The images after `AutoAugment` are as follows.
![][test_autoaugment]
## RandAugment
Address: [https://arxiv.org/pdf/1909.13719.pdf](https://arxiv.org/pdf/1909.13719.pdf)
Github repo: [https://github.com/heartInsert/randaugment](https://github.com/heartInsert/randaugment)
The search method of `AutoAugment` is relatively brute-force: searching for the optimal policy for a dataset directly on that dataset requires a lot of computation. In `RandAugment`, the authors found that, on the one hand, for larger models and larger datasets, the gains of the policy searched with `AutoAugment` become smaller; on the other hand, the searched policy is limited to a specific dataset, so it generalizes poorly and is not suitable for other datasets.
In `RandAugment`, the author proposes a random augmentation method. Instead of using a specific probability to determine whether to use a certain sub-strategy, all sub-strategies are selected with the same probability. The experiments in the paper also show that this method performs well even for large models.
In PaddleClas, `RandAugment` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import RandAugment
from ppcls.data.imaug import transform
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
randaugment_op = RandAugment()
ops = [decode_op, resize_op, randaugment_op]
imgs_dir = image_path
fnames = os.listdir(imgs_dir)
for f in fnames:
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
```
The images after `RandAugment` are as follows.
![][test_randaugment]
# Image Cropping
Cropping means performing some transformations on the image after `Transpose`, setting pixels of the cropped area as certain constant. It mainly contains CutOut, RandErasing, HideAndSeek and GridMask.
Image cropping methods can be applied before or after normalization. The difference is that if we crop the image before normalization and fill the cropped areas with 0, the pixel values of those areas will no longer be 0 after normalization, which changes the grayscale distribution of the data. A small numeric illustration is given below.
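A tiny NumPy illustration of this point: a pixel filled with 0 before normalization ends up at `-mean/std` after normalization, rather than at 0.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

pixel_filled_before = np.zeros(3)                # crop before normalization, fill with 0
value_after_norm = (pixel_filled_before - mean) / std
print(value_after_norm)                          # roughly [-2.12, -2.04, -1.80], not 0

pixel_filled_after = np.zeros(3)                 # crop after normalization, fill with 0
print(pixel_filled_after)                        # exactly 0
```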
The ideas behind these cropping transformations are similar: they all aim to improve the generalization ability of the trained model on occluded images; the difference lies in the cropping details.
## Cutout
Address: [https://arxiv.org/abs/1708.04552](https://arxiv.org/abs/1708.04552)
Github repo: [https://github.com/uoguelph-mlrg/Cutout](https://github.com/uoguelph-mlrg/Cutout)
Cutout is a kind of dropout, but it occludes the input image rather than the feature map, which makes it more robust to noise. Cutout has two advantages: (1) it can simulate the situation in which the subject is partially occluded; (2) it encourages the model to make full use of more content in the image for classification and prevents the network from focusing only on the saliency area, thereby reducing overfitting.
In PaddleClas, `Cutout` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import Cutout
from ppcls.data.imaug import transform
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
cutout_op = Cutout(n_holes=1, length=112)
ops = [decode_op, resize_op, cutout_op]
imgs_dir = image_path
fnames = os.listdir(imgs_dir)
for f in fnames:
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
```
The images after `Cutout` are as follows.
![][test_cutout]
## RandomErasing
Address: [https://arxiv.org/pdf/1708.04896.pdf](https://arxiv.org/pdf/1708.04896.pdf)
Github repo: [https://github.com/zhunzhong07/Random-Erasing](https://github.com/zhunzhong07/Random-Erasing)
RandomErasing is similar to Cutout: it also aims to improve the generalization ability of the trained model on occluded images. The authors point out in the paper that random erasing is complementary to random horizontal flipping, and they also verified the effectiveness of the method on pedestrian re-identification (ReID). Unlike `Cutout`, `RandomErasing` is applied to the image with a certain probability, and the size and aspect ratio of the generated mask are also randomly chosen according to pre-defined hyperparameters.
In PaddleClas, `RandomErasing` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import RandomErasing
from ppcls.data.imaug import transform
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()  # define the operator used in `ops` below
randomerasing_op = RandomErasing()
ops = [decode_op, resize_op, tochw_op, randomerasing_op]
imgs_dir = image_path
fnames = os.listdir(imgs_dir)
for f in fnames:
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
img = img.transpose((1, 2, 0))
```
The images after `RandomErasing` are as follows.
![][test_randomerassing]
## HideAndSeek
Address: [https://arxiv.org/pdf/1811.02545.pdf](https://arxiv.org/pdf/1811.02545.pdf)
Github repo: [https://github.com/kkanshul/Hide-and-Seek](https://github.com/kkanshul/Hide-and-Seek)
`HideAndSeek` divides the image into patches and generates a mask for each patch with a certain probability. The meaning of the masks in different areas is shown in the figure below.
![][hide_and_seek_mask_expanation]
In PaddleClas, `HideAndSeek` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import HideAndSeek
from ppcls.data.imaug import transform
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()  # define the operator used in `ops` below
hide_and_seek_op = HideAndSeek()
ops = [decode_op, resize_op, tochw_op, hide_and_seek_op]
imgs_dir = image_path
fnames = os.listdir(imgs_dir)
for f in fnames:
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
img = img.transpose((1, 2, 0))
```
The images after `HideAndSeek` are as follows.
![][test_hideandseek]
## GridMask
Address:[https://arxiv.org/abs/2001.04086](https://arxiv.org/abs/2001.04086)
Github repo:[https://github.com/akuxcw/GridMask](https://github.com/akuxcw/GridMask)
The author points out that the previous method based on image cropping has two problems, as shown in the following figure:
1. Excessive deletion of the area may cause most or all of the target subject to be deleted, or cause the context information loss, resulting in the images after enhancement becoming noisy data.
2. Reserving too much area has little effect on the object and context.
![][gridmask-0]
Therefore, how to avoid over-deletion or over-retention becomes the core problem to be solved.
`GridMask` is to generate a mask with the same resolution as the original image and multiply it with the original image. The mask grid and size are adjusted by the hyperparameters.
In the training process, there are two methods to use:
1. Set a probability p and use the GridMask to augment the image with probability p from the beginning of training.
2. Initially set the augmentation probability to 0 and increase it linearly with the number of iterations until it reaches p.
Experiments show that the second method is better.
The usage of `GridMask` in PaddleClas is shown below.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import GridMask
from ppcls.data.imaug import transform
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
gridmask_op = GridMask(d1=96, d2=224, rotate=1, ratio=0.6, mode=1, prob=0.8)
ops = [decode_op, resize_op, tochw_op, gridmask_op]
imgs_dir = image_path
fnames = os.listdir(imgs_dir)
for f in fnames:
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
img = img.transpose((1, 2, 0))
```
The images after `GridMask` are as follows.
![][test_gridmask]
# Image aliasing
Aliasing means performing some transformations on the images after `Batch`; it includes Mixup and Cutmix.
The data augmentation methods introduced before operate on a single image, while aliasing operates on a batch to generate a new batch.
## Mixup
Address: [https://arxiv.org/pdf/1710.09412.pdf](https://arxiv.org/pdf/1710.09412.pdf)
Github repo: [https://github.com/facebookresearch/mixup-cifar10](https://github.com/facebookresearch/mixup-cifar10)
Mixup is the first aliasing-based solution. It is easy to implement and performs well not only on image classification but also on object detection. For simplicity, Mixup is usually carried out within a batch, and so is `Cutmix`.
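The core of Mixup is a convex combination of two samples and of their labels, with the mixing weight drawn from a Beta distribution. The following is a minimal NumPy sketch of the idea, independent of the `MixupOperator` used below.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mix two samples and their (one-hot) labels with lam ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2   # the label is mixed with the same weight
    return x, y

# toy usage with random images and one-hot labels
x1, x2 = np.random.rand(224, 224, 3), np.random.rand(224, 224, 3)
y1, y2 = np.eye(1000)[1], np.eye(1000)[2]
mixed_x, mixed_y = mixup(x1, y1, x2, y2)
```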
The usage of `Mixup` in PaddleClas is shown below.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import MixupOperator
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
mixup_op = MixupOperator()
ops = [decode_op, resize_op, tochw_op]
imgs_dir = image_path
batch = []
fnames = os.listdir(imgs_dir)
for idx, f in enumerate(fnames):
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
batch.append( (img, idx) ) # fake label
new_batch = mixup_op(batch)
```
The images after `Mixup` are as follows.
![][test_mixup]
## Cutmix
Address: [https://arxiv.org/pdf/1905.04899v2.pdf](https://arxiv.org/pdf/1905.04899v2.pdf)
Github repo: [https://github.com/clovaai/CutMix-PyTorch](https://github.com/clovaai/CutMix-PyTorch)
Unlike `Mixup`, which directly blends two whole images, Cutmix randomly cuts an `ROI` out of one image and pastes it onto the corresponding area of another image. The usage of `Cutmix` in PaddleClas is shown below.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import CutmixOperator
size = 224
decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
cutmix_op = CutmixOperator()
ops = [decode_op, resize_op, tochw_op]
imgs_dir = image_path
batch = []
fnames = os.listdir(imgs_dir)
for idx, f in enumerate(fnames):
data = open(os.path.join(imgs_dir, f)).read()
img = transform(data, ops)
batch.append( (img, idx) ) # fake label
new_batch = cutmix_op(batch)
```
The images after `Cutmix` are as follows.
![][test_cutmix]
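For reference, the ROI used by Cutmix is usually derived from a Beta-sampled coefficient: the cut width and height are proportional to sqrt(1 - lambda), the box is placed at a random center and clipped to the image, and lambda is then corrected to the actual pasted area. The NumPy sketch below only illustrates this computation and is not the PaddleClas implementation.
```python
import numpy as np

def rand_bbox(height, width, lam):
    # Sample a cut box whose area is roughly (1 - lam) of the image.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = np.random.randint(height), np.random.randint(width)
    y1, y2 = np.clip(cy - cut_h // 2, 0, height), np.clip(cy + cut_h // 2, 0, height)
    x1, x2 = np.clip(cx - cut_w // 2, 0, width), np.clip(cx + cut_w // 2, 0, width)
    return y1, y2, x1, x2

def cutmix_batch(images, labels, alpha=0.2):
    # images: [N, C, H, W]; paste a random ROI from a shuffled copy of the batch.
    lam = np.random.beta(alpha, alpha)
    index = np.random.permutation(images.shape[0])
    _, _, h, w = images.shape
    y1, y2, x1, x2 = rand_bbox(h, w, lam)
    mixed = images.copy()
    mixed[:, :, y1:y2, x1:x2] = images[index, :, y1:y2, x1:x2]
    # Correct lam to the area actually kept from the original image.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / float(h * w)
    return mixed, labels, labels[index], lam
```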
# Experiments
Based on PaddleClas, the metrics of different augmentation methods on the ImageNet1k dataset are as follows.
| Model | Learning strategy | l2 decay | batch size | epoch | Augmentation method | Top1 Acc | Reference |
|-------------|------------------|--------------|------------|-------|----------------|------------|----|
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | Standard transform | 0.7731 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | AutoAugment | 0.7795 | 0.7763 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | mixup | 0.7828 | 0.7790 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutmix | 0.7839 | 0.7860 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutout | 0.7801 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | gridmask | 0.7785 | 0.7790 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random-augment | 0.7770 | 0.7760 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random erasing | 0.7791 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | hide and seek | 0.7743 | 0.7720 |
**Note**:
* In the experiments here, for better comparison, we fixed the l2 decay to 1e-4. To achieve higher accuracy, we recommend trying a smaller l2 decay. Combined with data augmentation, we found that reducing l2 decay from 1e-4 to 7e-5 brings at least 0.3%~0.5% accuracy improvement.
* We have not yet combined or verified different strategies together, which is left as future work.
## Data augmentation practice
Experiments about data augmentation will be introduced in detail in this section. If you want to quickly experience these methods, please refer to [**Quick start PaddleClas in 30 minutes**](../../tutorials/quick_start_en.md).
## Configurations
Since the hyperparameters differ between augmentation methods, for better understanding we provide 8 augmentation configuration files in `configs/DataAugment` based on ResNet50. Users can train the model with `tools/run.sh`. Three of them are shown below.
### RandAugment
Configuration of `RandAugment` is shown as follows. `num_layers` (default: 2) and `magnitude` (default: 5) are its two hyperparameters.
```yaml
transforms:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- RandAugment:
num_layers: 2
magnitude: 5
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
```
### Cutout
Configuration of `Cutout` is shown as follows. `n_holes` (default: 1) and `length` (default: 112) are its two hyperparameters.
```yaml
transforms:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- Cutout:
n_holes: 1
length: 112
- ToCHWImage:
```
### Mixup
Configuration of `Mixup` is shown as follows. `alpha` (default: 0.2) is the hyperparameter users need to care about. In addition, `use_mix` needs to be set to `True` at the root of the configuration.
```yaml
transforms:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
```
## Training command
Users can start the training process with the following command, which is also provided in `tools/run.sh`.
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/DataAugment/ResNet50_Cutout.yaml
```
## Note
* When using augmentation methods based on image aliasing, users need to set `use_mix` in the configuration file to `True`. In addition, because the labels are mixed together with the images, the training accuracy cannot be computed, so it is not printed during training.
* Data augmentation makes the training data harder, so the training loss may be larger and the training accuracy lower, but the model generalizes better, so the validation accuracy is usually higher.
* With data augmentation the model may tend to underfit. It is recommended to reduce `l2_decay` for better performance on the validation set.
* Almost all augmentation methods have hyperparameters. The hyperparameters given here are for the ImageNet1k dataset; users may need to fine-tune them on their own datasets. More training tricks can be found in [**Tricks**](../../../zh_CN/models/Tricks.md).
> If this document is helpful to you, welcome to star our project: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
# Reference
[1] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[2] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: Practical automated data augmentation with a reduced search space[J]. arXiv preprint arXiv:1909.13719, 2019.
[3] DeVries T, Taylor G W. Improved regularization of convolutional neural networks with cutout[J]. arXiv preprint arXiv:1708.04552, 2017.
[4] Zhong Z, Zheng L, Kang G, et al. Random erasing data augmentation[J]. arXiv preprint arXiv:1708.04896, 2017.
[5] Singh K K, Lee Y J. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization[C]//2017 IEEE international conference on computer vision (ICCV). IEEE, 2017: 3544-3553.
[6] Chen P. GridMask Data Augmentation[J]. arXiv preprint arXiv:2001.04086, 2020.
[7] Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond empirical risk minimization[J]. arXiv preprint arXiv:1710.09412, 2017.
[8] Yun S, Han D, Oh S J, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 6023-6032.
[test_baseline]: ../../../images/image_aug/test_baseline.jpeg
[test_autoaugment]: ../../../images/image_aug/test_autoaugment.jpeg
[test_cutout]: ../../../images/image_aug/test_cutout.jpeg
[test_gridmask]: ../../../images/image_aug/test_gridmask.jpeg
[gridmask-0]: ../../../images/image_aug/gridmask-0.png
[test_hideandseek]: ../../../images/image_aug/test_hideandseek.jpeg
[test_randaugment]: ../../../images/image_aug/test_randaugment.jpeg
[test_randomerassing]: ../../../images/image_aug/test_randomerassing.jpeg
[hide_and_seek_mask_expanation]: ../../../images/image_aug/hide-and-seek-visual.png
[test_mixup]: ../../../images/image_aug/test_mixup.png
[test_cutmix]: ../../../images/image_aug/test_cutmix.png
......@@ -4,4 +4,4 @@ image_augmentation
.. toctree::
:maxdepth: 3
ImageAugment.md
ImageAugment_en.md
......@@ -4,5 +4,5 @@ application
.. toctree::
:maxdepth: 2
transfer_learning.md
object_detection.md
transfer_learning_en.md
object_detection_en.md
# General object detection
## Practical Server-side detection method based on RCNN
### Introduction
* In recent years, object detection tasks have attracted widespread attention. [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) open-sourced the ResNet50_vd_SSLD pretrained model based on ImageNet (Top-1 Acc 82.4%). Based on this pretrained model and the rich operators in PaddleDetection, PaddleDetection provides PSS-DET (Practical Server-side Detection). The inference speed reaches 61 FPS on a single V100 GPU at a COCO mAP of 41.6%, and 20 FPS at a COCO mAP of 47.8%.
* We take the standard `Faster RCNN ResNet50_vd FPN` as an example. The following table shows the ablation study of PSS-DET.
| Trick | Train scale | Test scale | COCO mAP | Infer speed/FPS |
|- |:-: |:-: | :-: | :-: |
| `baseline` | 640x640 | 640x640 | 36.4% | 43.589 |
| +`test proposal=pre/post topk 500/300` | 640x640 | 640x640 | 36.2% | 52.512 |
| +`fpn channel=64` | 640x640 | 640x640 | 35.1% | 67.450 |
| +`ssld pretrain` | 640x640 | 640x640 | 36.3% | 67.450 |
| +`ciou loss` | 640x640 | 640x640 | 37.1% | 67.450 |
| +`DCNv2` | 640x640 | 640x640 | 39.4% | 60.345 |
| +`3x, multi-scale training` | 640x640 | 640x640 | 41.0% | 60.345 |
| +`auto augment` | 640x640 | 640x640 | 41.4% | 60.345 |
| +`libra sampling` | 640x640 | 640x640 | 41.6% | 60.345 |
Based on the ablation experiments, Cascade RCNN and a larger inference scale (1000x1500) are used for better performance. The final COCO mAP is 47.8%,
and the following figure shows the `mAP-Speed` curves for some common detectors.
![pssdet](../../images/det/pssdet.png)
**Note**
> For a fair comparison, the inference time of the PSS-DET models measured on a V100 GPU is converted to Titan V GPU time by multiplying by a factor of 1.2.
For more detailed information, you can refer to [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det).
## Practical Mobile-side detection method based on RCNN
* This part is coming soon!
# Transfer learning in image classification
Transfer learning is an important part of machine learning and is widely used in fields such as text and images. Here we mainly introduce transfer learning in image classification, often called domain transfer, for example transferring a model trained on ImageNet to a specific classification task such as flower classification.
## Hyperparameter search
ImageNet is the most widely used dataset for image classification, and a set of empirical hyperparameters has been summarized for it that yields high accuracy. However, when these hyperparameters are applied to a specific dataset, they may not be optimal. Two commonly used hyperparameter search methods can help us obtain better hyperparameters.
### Grid search
Grid search, also called exhaustive search, determines the optimal value by evaluating every solution in the search space and choosing the best one. The method is simple and effective, but when the search space is large it consumes huge computing resources.
### Bayesian search
Bayesian search, also called Bayesian optimization, starts by randomly selecting a group of hyperparameters in the search space. A Gaussian process is then used to compute the expected mean and variance of candidate hyperparameters based on the performance of the previously evaluated ones. The larger the expected mean, the greater the probability of being close to the optimal solution; the larger the expected variance, the greater the uncertainty. Selecting the point with a large expected mean is usually called `exploitation`, while selecting the point with a large variance is called `exploration`. An acquisition function is defined to balance the expected mean and variance, and the selected hyperparameter point is viewed as the position most likely to be optimal. One common acquisition function is sketched below.
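As an illustration of how an acquisition function trades off the expected mean against the variance, a common choice is the upper confidence bound (UCB), which picks the candidate maximizing mean + kappa * std. The sketch below is generic and is not the search tool used in these experiments.
```python
import numpy as np

def ucb_select(candidates, mean, std, kappa=2.0):
    # mean/std are the Gaussian-process posterior mean and standard deviation
    # predicted for each candidate; a larger kappa favours exploration.
    scores = mean + kappa * std
    return candidates[int(np.argmax(scores))]

# Toy usage with made-up posterior estimates for three learning rates.
lrs = [0.1, 0.01, 0.001]
print(ucb_select(lrs, mean=np.array([0.70, 0.78, 0.75]),
                 std=np.array([0.02, 0.01, 0.05])))
```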
Following the experimental scheme in [1], we ran experiments with a fixed parameter setting and with the two search schemes on 8 open-source datasets. We search over 4 hyperparameters; the search space and the experimental results are as follows:
- Fixed scheme.
```
lr=0.003,l2 decay=1e-4,label smoothing=False,mixup=False
```
- Search space of the hyperparameters.
```
lr: [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001]
l2 decay: [1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]
label smoothing: [False, True]
mixup: [False, True]
```
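The grid above contains 7 x 7 x 2 x 2 = 196 combinations, which is where the 196 grid-search trials below come from. A short sketch of how the grid can be enumerated:
```python
import itertools

lrs = [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001]
l2_decays = [1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]
label_smoothing = [False, True]
mixup = [False, True]

# every combination has to be trained once in exhaustive grid search
grid = list(itertools.product(lrs, l2_decays, label_smoothing, mixup))
print(len(grid))  # 196
```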
Grid search therefore requires 196 trials, while Bayesian search needs roughly 10 times fewer. The baseline uses the ResNet50_vd model pretrained on ImageNet1k with the fixed scheme. The results are as follows.
| Dataset | Fix scheme | Grid search | Grid search time | Bayesian search | Bayesian search time|
| ------------------ | -------- | -------- | -------- | -------- | ---------- |
| Oxford-IIIT-Pets | 93.64% | 94.55% | 196 | 94.04% | 20 |
| Oxford-102-Flowers | 96.08% | 97.69% | 196 | 97.49% | 20 |
| Food101 | 87.07% | 87.52% | 196 | 87.33% | 23 |
| SUN397 | 63.27% | 64.84% | 196 | 64.55% | 20 |
| Caltech101 | 91.71% | 92.54% | 196 | 92.16% | 14 |
| DTD | 76.87% | 77.53% | 196 | 77.47% | 13 |
| Stanford Cars | 85.14% | 92.72% | 196 | 92.72% | 25 |
| FGVC Aircraft | 80.32% | 88.45% | 196 | 88.36% | 20 |
- The above experiments show that, compared with grid search, Bayesian search reduces the number of trials by about 10 times while lowering accuracy by only 0% to 0.4%.
- The search space can be expanded easily when using Bayesian search.
## Large-scale image classification
In practical applications, due to the lack of training data, the classification model trained on the ImageNet1k dataset is often used as the pretrained model for other image classification tasks. To further help solve practical problems, Baidu open-sourced a self-developed large-scale classification pretrained model based on ResNet50_vd, whose training data contains 100,000 categories and 43 million images. The pretrained model can be downloaded here: [**download link**](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar)
We conducted transfer learning experiments on 6 self-collected datasets using both a fixed parameter setting and grid search. The number of training epochs was set to 20, the ResNet50_vd model was used, and the ImageNet pretrained model has a Top-1 accuracy of 79.12%. The dataset statistics and model accuracies are compared below.
Fixed scheme:
```
lr=0.001,l2 decay=1e-4,label smoothing=False,mixup=False
```
| Dataset | Statistics | **Pretrained model on ImageNet <br />Top-1(fixed)/Top-1(search)** | **Pretrained model on large-scale dataset<br />Top-1(fixed)/Top-1(search)** |
| --------------- | ----------------------------------------- | -------------------------------------------------------- | --------------------------------------------------------- |
| Flowers | class:102<br />train:5789<br />valid:2396 | 0.7779/0.9883 | 0.9892/0.9954 |
| Hand-painted stick figures | Class:18<br />train:1007<br />valid:432 | 0.8795/0.9196 | 0.9107/0.9219 |
| Leaves | class:6<br />train:5256<br />valid:2278 | 0.8212/0.8482 | 0.8385/0.8659 |
| Container vehicle | Class:115<br />train:4879<br />valid:2094 | 0.6230/0.9556 | 0.9524/0.9702 |
| Chair | class:5<br />train:169<br />valid:78 | 0.8557/0.9688 | 0.9077/0.9792 |
| Geology | class:4<br />train:671<br />valid:296 | 0.5719/0.8094 | 0.6781/0.8219 |
- The above experiments verify that, with fixed parameters, using the large-scale classification model as the pretrained model improves the performance on a new dataset in most cases compared with the ImageNet pretrained model. Parameter search can further improve the model performance.
## Reference
[1] Kornblith, Simon, Jonathon Shlens, and Quoc V. Le. "Do better imagenet models transfer better?." *Proceedings of the IEEE conference on computer vision and pattern recognition*. 2019.
[2] Kolesnikov, Alexander, et al. "Large Scale Learning of General Visual Representations for Transfer." *arXiv preprint arXiv:1912.11370* (2019).
### Competition Support
PaddleClas stems from Baidu's visual business applications and its exploration of frontier visual capabilities. It has helped us achieve leading results in many key competitions, and continues to promote more frontier visual solutions and real-world applications.
* 1st place in 2018 Kaggle Open Images V4 object detection challenge
* 2nd place in 2019 Kaggle Open Images V5 object detection challenge
    * The report is available here: [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf)
    * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/featured_model/OIDV5_BASELINE_MODEL.md)
* 2nd place in Kaggle Landmark Retrieval Challenge 2019
    * The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
    * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
* 2nd place in Kaggle Landmark Recognition Challenge 2019
    * The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
    * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
* A-level certificates in three tasks (printed text OCR, face recognition and landmark recognition) in the first multimedia information recognition technology competition
......@@ -4,9 +4,9 @@ extension
.. toctree::
:maxdepth: 1
paddle_inference.md
paddle_mobile_inference.md
paddle_quantization.md
multi_machine_training.md
paddle_hub.md
paddle_serving.md
paddle_inference_en.md
paddle_mobile_inference_en.md
paddle_quantization_en.md
multi_machine_training_en.md
paddle_hub_en.md
paddle_serving_en.md
# Distributed Training
Distributed training of deep neural networks is highly efficient in PaddlePaddle,
and it is one of PaddlePaddle's core advantages.
On image classification tasks, distributed training can achieve almost linear acceleration ratio.
[Fleet](https://github.com/PaddlePaddle/Fleet) is High-Level API for distributed training in PaddlePaddle.
By using Fleet, a user can easily shift from single-machine PaddlePaddle code to distributed code.
In order to support both single-machine training and multi-machine training,
[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) uses the Fleet API interface.
For more information about distributed training,
please refer to [Fleet API documentation](https://github.com/PaddlePaddle/Fleet/blob/develop/README.md).
# Paddle Hub
[PaddleHub](https://github.com/PaddlePaddle/PaddleHub) is a pre-trained model application tool for PaddlePaddle.
Developers can conveniently use the high-quality pre-trained model combined with Fine-tune API to quickly complete the whole process from model migration to deployment.
All the pre-trained models of [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) have been collected by PaddleHub.
For further details, please refer to [PaddleHub website](https://www.paddlepaddle.org.cn/hub).
# Prediction Framework
## Introduction
Models for Paddle are stored in many different forms, which can be roughly divided into two categories:
1. persistable model (saved by `fluid.save_persistables`)
The weights are saved as a checkpoint that can be loaded for retraining. Each scattered weight file corresponds to one persistable variable in the model. The files contain no structure information, so the weights must be used together with the model structure.
```
resnet50-vd-persistable/
├── bn2a_branch1_mean
├── bn2a_branch1_offset
├── bn2a_branch1_scale
├── bn2a_branch1_variance
├── bn2a_branch2a_mean
├── bn2a_branch2a_offset
├── bn2a_branch2a_scale
├── ...
└── res5c_branch2c_weights
```
2. inference model (saved by `fluid.io.save_inference_model`)
The model saved by this function can be used for inference directly. Compared with the persistable format, the model structure is additionally saved, so the model with trained weights can be reconstructed from the saved files alone. As shown below, the structure information is saved in `model`.
```
resnet50-vd-persistable/
├── bn2a_branch1_mean
├── bn2a_branch1_offset
├── bn2a_branch1_scale
├── bn2a_branch1_variance
├── bn2a_branch2a_mean
├── bn2a_branch2a_offset
├── bn2a_branch2a_scale
├── ...
├── res5c_branch2c_weights
└── model
```
For convenience, all weight files will be saved into a `params` file when saving the inference model on Paddle, as shown below:
```
resnet50-vd
├── model
└── params
```
Both the training engine and the prediction engine in Paddle can run inference, but back propagation is not performed during inference, so customized optimizations (such as layer fusion and kernel selection) can be applied to achieve low latency and high throughput. The training engine supports both the persistable model and the inference model, while the prediction engine only supports the inference model, so three different inference modes are derived:
1. prediction engine + inference model
2. training engine + persistable model
3. training engine + inference model
Regardless of the inference method, it basically includes the following main steps:
+ Engine Build
+ Make Data to Be Predicted
+ Perform Predictions
+ Result Analysis
The main differences between these inference modes lie in two steps: building the engine and executing the prediction. The following sections introduce them in detail.
## Model Transformation
During training, we usually save some checkpoints (persistable models). These are just model weight files and cannot be directly loaded by the prediction engine for prediction, so we usually pick suitable checkpoints after training and convert them to an inference model. There are two main steps: 1. build a training engine, 2. save the inference model, as shown below.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)

startup_prog = fluid.Program()
infer_prog = fluid.Program()

with fluid.program_guard(infer_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        out = ResNet50_vd.net(input=image, class_dim=1000)

infer_prog = infer_prog.clone(for_test=True)
# load the trained persistable weights into the inference program
fluid.load(program=infer_prog, model_path="the path of persistable model", executor=exe)

# save the inference model (structure in `model`, all weights in `params`)
fluid.io.save_inference_model(
    dirname='./output/',
    feeded_var_names=[image.name],
    main_program=infer_prog,
    target_vars=out,
    executor=exe,
    model_filename='model',
    params_filename='params')
```
A complete example is provided in `tools/export_model.py`; just execute the following command to complete the conversion:
```shell
python tools/export_model.py \
    --m=the name of model \
    --p=the path of persistable model \
    --o=the saved path of model and params
```
## Prediction engine + inference model
The complete example is provided in `tools/infer/predict.py`; just execute the following command to complete the prediction:
```shell
python ./tools/infer/predict.py \
-i=./test.jpeg \
-m=./resnet50-vd/model \
-p=./resnet50-vd/params \
--use_gpu=1 \
--use_tensorrt=True
```
Parameter Description:
+ `image_file` (shorthand: `-i`): the path of the image to predict, such as `./test.jpeg`.
+ `model_file` (shorthand: `-m`): the path of the model structure file, such as `./resnet50-vd/model`.
+ `params_file` (shorthand: `-p`): the path of the weights file, such as `./resnet50-vd/params`.
+ `batch_size` (shorthand: `-b`): batch size, such as `1`.
+ `ir_optim`: whether to use `IR` optimization, default: True.
+ `use_tensorrt`: whether to use the TensorRT prediction engine, default: True.
+ `gpu_mem`: initial allocation of GPU memory, in MB.
+ `use_gpu`: whether to use GPU, default: True.
+ `enable_benchmark`: whether to run in benchmark mode, default: False.
+ `model_name`: the name of the model.
NOTE:
When benchmark is enabled, TensorRT is used by default for prediction on Paddle.
Building prediction engine:
```python
from paddle.fluid.core import AnalysisConfig
from paddle.fluid.core import create_paddle_predictor

config = AnalysisConfig("the path of model file", "the path of params file")
config.enable_use_gpu(8000, 0)
config.disable_glog_info()
config.switch_ir_optim(True)
config.enable_tensorrt_engine(
    precision_mode=AnalysisConfig.Precision.Float32,
    max_batch_size=1)
# the feed and fetch ops must be removed when using the zero-copy API
config.switch_use_feed_fetch_ops(False)
predictor = create_paddle_predictor(config)
```
Prediction Execution:
```python
import numpy as np

# feed a random image-shaped tensor through the zero-copy input tensor
input_names = predictor.get_input_names()
input_tensor = predictor.get_input_tensor(input_names[0])
input = np.random.randn(1, 3, 224, 224).astype("float32")
input_tensor.reshape([1, 3, 224, 224])
input_tensor.copy_from_cpu(input)
predictor.zero_copy_run()
```
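After `zero_copy_run()` returns, the result can be read back through the zero-copy output tensors. A minimal sketch, assuming the standard zero-copy API of the Paddle 1.x `AnalysisPredictor` and that the first output is the class-score tensor:
```python
# Fetch the result through the zero-copy output tensor and take the top-1 class.
output_names = predictor.get_output_names()
output_tensor = predictor.get_output_tensor(output_names[0])
output = output_tensor.copy_to_cpu()  # numpy array, e.g. shape [1, class_dim]
print("top-1 class id:", output[0].argmax())
```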
More parameter information can be found in the [Paddle Python prediction API](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html). If you need to run prediction in a production environment, we recommend using the [Paddle C++ prediction API](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html); a rich set of pre-compiled prediction libraries is provided on the official website [Paddle C++ prediction library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
By default, Paddle's wheel package does not include the TensorRT prediction engine. If you need to use TensorRT for prediction optimization, you need to compile the corresponding wheel package yourself; for the compilation method, please refer to Paddle's compilation guide: [Paddle compilation](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/fromsource.html)
## Training engine + persistable model prediction
A complete example is provided in `tools/infer/infer.py`; just execute the following command to complete the prediction:
```shell
python tools/infer/infer.py \
    --i=the path of images which are needed to predict \
    --m=the name of model \
    --p=the path of persistable model \
    --use_gpu=True
```
Parameter Description:
+ `image_file` (shorthand: `-i`): the path of the image to predict, such as `./test.jpeg`
+ `model_file` (shorthand: `-m`): the name of the model
+ `params_file` (shorthand: `-p`): the path of the persistable model weights
+ `use_gpu`: whether to use GPU, default: True.
Training Engine Construction:
Since the persistable model does not contain the structural information of the model, it is necessary to construct the network structure first, and then load the weights to build the training engine.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)

startup_prog = fluid.Program()
infer_prog = fluid.Program()

with fluid.program_guard(infer_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        out = ResNet50_vd.net(input=image, class_dim=1000)

infer_prog = infer_prog.clone(for_test=True)
# load the trained persistable weights into the program
fluid.load(program=infer_prog, model_path="the path of persistable model", executor=exe)
```
Perform inference:
```python
outputs = exe.run(infer_prog,
feed={image.name: data},
fetch_list=[out.name],
return_numpy=False)
```
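Because `return_numpy=False`, the fetched `outputs` are LoD tensors. A minimal post-processing sketch, assuming the fetched variable is the class-score tensor, converts them to NumPy and takes the top-1 class:
```python
import numpy as np

# convert the fetched LoD tensor to a numpy array and take the top-1 class
prob = np.array(outputs[0])
print("top-1 class id:", prob[0].argmax())
```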
For the above parameter descriptions, please refer to the official website [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)
## Training engine + inference model prediction
A complete example is provided in `tools/infer/py_infer.py`, just execute the following command to complete the prediction:
```shell
python tools/infer/py_infer.py \
--i=the path of images \
--d=the path of saved model \
--m=the path of saved model file \
--p=the path of saved weight file \
--use_gpu=True
```
+ `image_file` (shorthand: `-i`): the path of the image to predict, such as `./test.jpeg`
+ `model_file` (shorthand: `-m`): the path of the model file, such as `./resnet50_vd/model`
+ `params_file` (shorthand: `-p`): the path of the weights file, such as `./resnet50_vd/params`
+ `model_dir` (shorthand: `-d`): the folder of the model, such as `./resnet50_vd`
+ `use_gpu`: whether to use GPU, default: True
Training Engine Construction:
Since the inference model contains the model structure, we do not need to construct the network first; the model file and weights file can be loaded directly to build the training engine.
```python
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# load the model structure and weights directly from the inference model
[program, feed_names, fetch_lists] = fluid.io.load_inference_model(
    "the path of saved model",
    exe,
    model_filename="the path of model file",
    params_filename="the path of weights file")
compiled_program = fluid.compiler.CompiledProgram(program)
```
> `load_inference_model` supports not only a collection of scattered weight files but also a single combined weights file.
Perform inference:
```python
outputs = exe.run(compiled_program,
feed={feed_names[0]: data},
fetch_list=fetch_lists,
return_numpy=False)
```
For the above parameter descriptions, please refer to the official website [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)
# Paddle-Lite
## Introduction
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a lightweight inference engine that is fully functional, easy to use and performs well. Its lightweight design uses fewer bits to represent the weights and activations of the neural network, which greatly reduces the model size, alleviates the limited storage space of mobile devices, and gives inference speed that is overall better than other frameworks.
In [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), we use Paddle-Lite to [evaluate the performance on mobile devices](../models/Mobile.md). In this section we take the `MobileNetV1` model trained on the `ImageNet1k` dataset as an example to introduce how to use Paddle-Lite to evaluate the model speed on mobile (evaluated on SD855).
## Evaluation Steps
### Export the Inference Model
* First, the model saved during training should be transformed into an inference model, which can be exported with `tools/export_model.py`. The transformation command is as follows.
```shell
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
```
Finally, the `model` and `params` files will be saved in `inference/MobileNetV1`.
### Download Benchmark Binary File
* Use the adb (Android Debug Bridge) tool to connect the Android phone and the PC, then develop and debug. After installing adb and ensuring that the PC and the phone are successfully connected, use the following command to view the ARM version of the phone and select the pre-compiled library based on ARM version.
```shell
adb shell getprop ro.product.cpu.abi
```
* Download Benchmark_bin File
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
```
If the ARM version is v7, the v7 benchmark_bin file should be downloaded with the following command.
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
```
### Inference benchmark
After the PC and mobile phone are successfully connected, use the following command to start the model evaluation.
```
sh tools/lite/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
```
Where `./benchmark_bin_v8` is the path of the benchmark binary file, `./inference` is the path of all the models that need to be evaluated, `result_armv8.txt` is the result file, and the final parameter `true` means that the model will be optimized before evaluation. Eventually, the evaluation result file of `result_armv8.txt` will be saved in the current folder. The specific performances are as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
Threads=4 Warmup=10 Repeats=30
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
```
The output above shows the model inference speed under different numbers of threads, in FPS. Taking the single-thread case as an example, the average speed of MobileNetV1 on SD855 is `30.79750FPS`.
### Model Optimization and Speed Evaluation
* In the previous section, we mentioned that the model is optimized before evaluation. Here you can optimize the model first, and then directly load the optimized model for speed evaluation.
* Paddle-Lite provides multiple strategies to automatically optimize the original training model, including quantization, subgraph fusion, hybrid scheduling, kernel optimization and so on. To make the optimization more convenient, the opt tool automatically completes these steps and outputs a lightweight, optimized model executable by Paddle-Lite; it can be downloaded from the [Paddle-Lite Model Optimization Page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Here we take `MacOS` as the development environment, download the [opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) model optimization tool and use the following commands to optimize the model.
```shell
model_file="../MobileNetV1/model"
param_file="../MobileNetV1/params"
opt_models_dir="./opt_models"
mkdir ${opt_models_dir}
./opt_mac --model_file=${model_file} \
--param_file=${param_file} \
--valid_targets=arm \
--optimize_out_type=naive_buffer \
--prefer_int8_kernel=false \
--optimize_out=${opt_models_dir}/MobileNetV1
```
Here `model_file` and `param_file` are the paths of the exported model file and parameter file respectively. After the transformation succeeds, `MobileNetV1.nb` will be saved in `opt_models`.
Use the benchmark_bin file to load the optimized model for evaluation. The commands are as follows.
```shell
bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
```
Finally, the result is saved in `result_armv8.txt` and shown as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
Threads=2 Warmup=10 Repeats=30
MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
Threads=4 Warmup=10 Repeats=30
MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
```
Taking the single-thread case as an example, the average speed of MobileNetV1 on SD855 is `30.84173FPS`.
For more detailed parameter explanations and Paddle-Lite usage, please refer to the [Paddle-Lite docs](https://paddle-lite.readthedocs.io/zh/latest/).
# Model Quantization
Int8 quantization is one of the key features in [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
It supports two kinds of quantization-aware training strategies, a **Dynamic strategy** and a **Static strategy**,
as well as layer-wise and channel-wise quantization,
and models generated by PaddleSlim can be deployed with Paddle-Lite.
By using this toolkit, [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) quantized the mobilenet_v3_large_x1_0 model, whose accuracy after distillation is 78.9%.
After quantized, the prediction speed is accelerated from 19.308ms to 14.395ms on SD855.
The storage size is reduced from 21M to 10M.
The top1 recognition accuracy rate is 75.9%.
For specific training methods, please refer to [PaddleSlim quant aware](https://paddlepaddle.github.io/PaddleSlim/quick_start/quant_aware_tutorial.html)
# Model Service Deployment
## Overview
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep-learning researchers easily deploy online inference services. It supports one-click industrial deployment, high concurrency and efficient communication between client and server, and supports developing clients in multiple programming languages.
This section takes HTTP inference service deployment as an example to introduce how to use Paddle Serving to deploy model services in PaddleClas.
## Serving Install
The Serving official website recommends using docker to install and deploy the Serving environment. First, you need to pull the docker image and create a Serving-based docker container.
```shell
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker exec -it test bash
```
Inside docker, you need to install the packages related to Serving.
```shell
pip install paddlepaddle-gpu
pip install paddle-serving-client
pip install paddle-serving-server-gpu
```
* If the installation is too slow, you can append `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip commands to speed up the process.
* If you want to deploy a CPU service, you can install the CPU version of Serving with the following command.
```shell
pip install paddle-serving-server
```
### Export Model
Export the Serving model using `tools/export_serving_model.py`. Taking ResNet50_vd as an example, the command is as follows.
```shell
python tools/export_serving_model.py -m ResNet50_vd -p ./pretrained/ResNet50_vd_pretrained/ -o serving
```
Finally, the client configuration, model parameters and structure files will be saved in `ppcls_client_conf` and `ppcls_model`.
### Service Deployment and Request
* Use the following command to start the Serving service.
```shell
python tools/serving/image_service_gpu.py serving/ppcls_model workdir 9292
```
`serving/ppcls_model` is the address of the Serving model just saved, `workdir` is the work directory, and `9292` is the port of the service.
* Use the following script to send a recognition request to the Serving service and get the result.
```
python tools/serving/image_http_client.py 9292 ./docs/images/logo.png
```
`9292` is the port for sending the request, which must be consistent with the port the Serving service was started on, and `./docs/images/logo.png` is the test image; the final top-1 label and its probability are returned.
* For more Serving deployment options, such as the RPC inference service, you can refer to the Serving official website: [https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet)
# FAQ
>>
* Q: Why are the metrics different when evaluating with different numbers of cards?
* A: Fleet is used by default in PaddleClas. Each GPU card is taken as a single trainer and deals with different images, which causes small differences in the final metrics. Single-card evaluation with `tools/eval.py` is suggested to get accurate results. You can also use `tools/eval_multi_platform.py` to evaluate models on multiple GPU cards, which is also supported on Windows and CPU.
>>
* Q: Why is `Mixup` or `Cutmix` not used even though I have already added the data operation in the configuration file?
* A: When using `Mixup` or `Cutmix`, you also need to add `use_mix: True` in the configuration file to make it work properly.
>>
* Q: During evaluation and inference, the pretrained model address is assigned, but the weights cannot be imported. Why?
* A: The prefix of the pretrained model is needed. For example, if the pretrained weights are located in `output/ResNet50_vd/19` with the filename `output/ResNet50_vd/19/ppcls.pdparams`, then `pretrained_model` in the configuration file needs to be set to `output/ResNet50_vd/19/ppcls`.
>>
* Q: Why are the metrics 0.3% lower than those shown in the model zoo for the `EfficientNet` series of models?
* A: The resize method for `EfficientNet` is `Cubic` (interpolation is set to 2 in OpenCV), while other models use `Bilinear` (interpolation is set to None in OpenCV). Therefore, you need to set the interpolation explicitly in `ResizeImage`. Specifically, the following configuration is a demo for EfficientNet.
```
VALID:
batch_size: 16
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: 2
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
```
>>
* Q: What should I do if I want to transform the weights from the `pdparams` format to the earlier format (before Paddle 1.7.0), which consists of scattered files?
* A: You can use `fluid.load` to load the `pdparams` weights and `fluid.io.save_vars` to save them as scattered files, as shown in the demo below. Finally, all the scattered files will be saved in `path_to_save_var`.
```
fluid.load(
program=infer_prog, model_path=args.pretrained_model, executor=exe)
state = fluid.io.load_program_state(args.pretrained_model)
def exists(var):
return var.name in state
fluid.io.save_vars(exe, "./path_to_save_var", infer_prog, predicate=exists)
```
>>
* Q: When using VisualDL under Python 2, the following error occurs: `TypeError: __init__() missing 1 required positional argument: 'sync_cycle'`.
* A: VisualDL currently only supports Python 3, and the VisualDL version needs to be 2.0 or higher. If your VisualDL version is lower than 2.0, you can install VisualDL 2.0 with `pip3 install visualdl==2.0.0b8 -i https://mirror.baidu.com/pypi/simple`.
......@@ -11,9 +11,7 @@ Welcome to PaddleClas!
advanced_tutorials/index
application/index
extension/index
competition_support.md
model_zoo.md
change_log.md
faq.md
competition_support_en.md
update_history_en.md
faq_en.md
:math:`PaddlePaddle2020`
# DPN and DenseNet series
## Overview
DenseNet is a network structure proposed in 2017 that won the CVPR best paper award. The network designs a new cross-layer connected block called dense-block. Compared with the bottleneck in ResNet, the dense-block uses a more aggressive dense connection scheme: all layers are connected to each other, and each layer takes all layers before it as additional input. DenseNet stacks dense-blocks into a densely connected network. The dense connections make gradients easier to backpropagate, so the network is easier to train and converge. DPN stands for Dual Path Networks and combines DenseNet and ResNeXt. It shows that DenseNet can extract new features from previous levels, while ResNeXt essentially reuses the already extracted features. The authors further find that ResNeXt has a high feature reuse rate but low redundancy, while DenseNet creates new features but with high redundancy. Combining the advantages of the two structures, the authors designed the DPN network, which finally achieves better results than both ResNeXt and DenseNet under the same FLOPS and parameters.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)
At present, PaddleClas open-sources 10 pretrained models of these two series, whose metrics are shown in the figure above. It is easy to observe that, under the same FLOPS and parameters, DPN achieves higher accuracy than DenseNet. However, because DPN has more branches, its inference speed is slower than DenseNet. Since DenseNet264 is the deepest DenseNet, it has the most parameters; DenseNet161 has the largest width, resulting in the largest FLOPS and the highest accuracy in this series. In terms of inference speed, DenseNet161 is faster than DenseNet264 despite its large FLOPS and high accuracy, so it has a greater advantage than DenseNet264.
For DPN series networks, the larger the model's FLOPs and parameters, the higher the model's accuracy. Among them, since the width of DPN107 is the largest, it has the largest number of parameters and FLOPs in this series of networks.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
| DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
| DenseNet169 | 0.768 | 0.933 | 0.764 | | 6.740 | 14.150 |
| DenseNet201 | 0.776 | 0.937 | 0.775 | | 8.610 | 20.010 |
| DenseNet264 | 0.780 | 0.939 | 0.779 | | 11.540 | 33.370 |
| DPN68 | 0.768 | 0.934 | 0.764 | 0.931 | 4.030 | 10.780 |
| DPN92 | 0.799 | 0.948 | 0.793 | 0.946 | 12.540 | 36.290 |
| DPN98 | 0.806 | 0.951 | 0.799 | 0.949 | 22.220 | 58.460 |
| DPN107 | 0.809 | 0.953 | 0.802 | 0.951 | 35.060 | 82.970 |
| DPN131 | 0.807 | 0.951 | 0.801 | 0.949 | 30.510 | 75.360 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| DenseNet121 | 224 | 256 | 4.371 |
| DenseNet161 | 224 | 256 | 8.863 |
| DenseNet169 | 224 | 256 | 6.391 |
| DenseNet201 | 224 | 256 | 8.173 |
| DenseNet264 | 224 | 256 | 11.942 |
| DPN68 | 224 | 256 | 11.805 |
| DPN92 | 224 | 256 | 17.840 |
| DPN98 | 224 | 256 | 21.057 |
| DPN107 | 224 | 256 | 28.685 |
| DPN131 | 224 | 256 | 28.083 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 |
| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 |
| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 |
| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 |
| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 |
| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 |
| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 |
| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 |
| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 |
| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 |
# EfficientNet and ResNeXt101_wsl series
## Overview
EfficientNet is a lightweight NAS-based network released by Google in 2019. EfficientNetB7 refreshed the classification accuracy of ImageNet-1k at that time. In this paper, the author points out that the traditional methods to improve the performance of neural networks mainly start with the width of the network, the depth of the network, and the resolution of the input picture.
However, through experiments the author found that balancing these three dimensions is essential for improving both accuracy and efficiency.
Therefore, the author summarized how to balance the three dimensions at the same time through a series of experiments.
At the same time, based on this scaling method, the author built a total of 7 networks B1-B7 in the EfficientNet series on the basis of EfficientNetB0, and with the same FLOPS and parameters, the accuracy reached state-of-the-art effect.
ResNeXt is an improved version of ResNet proposed by Facebook in 2016. In 2019, Facebook researchers studied the accuracy limit of this series of networks on ImageNet through weakly supervised learning. To distinguish them from the previous ResNeXt networks, the models of this series carry the suffix wsl, short for weakly-supervised-learning. To obtain stronger feature extraction capability, the researchers further enlarged the network width; the largest ResNeXt101_32x48d_wsl has 800 million parameters. It was trained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k. Its Top-1 accuracy on ImageNet-1k reaches 85.4%, the highest so far at a resolution of 224x224 on ImageNet-1k. In Fix-ResNeXt, the authors used a larger image resolution and a special Fix strategy to handle the inconsistency between training and testing image preprocessing, which gives ResNeXt101_32x48d_wsl even higher accuracy. Since it uses the Fix strategy, it is named Fix-ResNeXt101_32x48d_wsl.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png)
At present, PaddleClas open-sources a total of 14 pretrained models of these two series. It can be seen from the figure above that the advantages of the EfficientNet series are very obvious. The ResNeXt101_wsl series uses more data, so its final accuracy is also higher. EfficientNet_B0_small removes the SE_block from EfficientNet_B0, so it has faster inference speed.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeXt101_<br>32x8d_wsl | 0.826 | 0.967 | 0.822 | 0.964 | 29.140 | 78.440 |
| ResNeXt101_<br>32x16d_wsl | 0.842 | 0.973 | 0.842 | 0.972 | 57.550 | 152.660 |
| ResNeXt101_<br>32x32d_wsl | 0.850 | 0.976 | 0.851 | 0.975 | 115.170 | 303.110 |
| ResNeXt101_<br>32x48d_wsl | 0.854 | 0.977 | 0.854 | 0.976 | 173.580 | 456.200 |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.863 | 0.980 | 0.864 | 0.980 | 354.230 | 456.200 |
| EfficientNetB0 | 0.774 | 0.933 | 0.773 | 0.935 | 0.720 | 5.100 |
| EfficientNetB1 | 0.792 | 0.944 | 0.792 | 0.945 | 1.270 | 7.520 |
| EfficientNetB2 | 0.799 | 0.947 | 0.803 | 0.950 | 1.850 | 8.810 |
| EfficientNetB3 | 0.812 | 0.954 | 0.817 | 0.956 | 3.430 | 11.840 |
| EfficientNetB4 | 0.829 | 0.962 | 0.830 | 0.963 | 8.290 | 18.760 |
| EfficientNetB5 | 0.836 | 0.967 | 0.837 | 0.967 | 19.510 | 29.610 |
| EfficientNetB6 | 0.840 | 0.969 | 0.842 | 0.968 | 36.270 | 42.000 |
| EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------------------------|-----------|-------------------|--------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 19.127 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 23.629 |
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 40.214 |
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 59.714 |
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 82.431 |
| EfficientNetB0 | 224 | 256 | 2.449 |
| EfficientNetB1 | 240 | 272 | 3.547 |
| EfficientNetB2 | 260 | 292 | 3.908 |
| EfficientNetB3 | 300 | 332 | 5.145 |
| EfficientNetB4 | 380 | 412 | 7.609 |
| EfficientNetB5 | 456 | 488 | 12.078 |
| EfficientNetB6 | 528 | 560 | 18.381 |
| EfficientNetB7 | 600 | 632 | 27.817 |
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 |
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 |
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 |
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 |
| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 |
| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 |
| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 |
| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 |
| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 |
| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - |
| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - |
| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - |
| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 |
# HRNet series
## Overview
HRNet is a neural network proposed by Microsoft Research Asia in 2019. Different from previous convolutional neural networks, this network maintains high resolution in the deep layers of the network, so the predicted keypoint heatmaps are more accurate and more precise in space. In addition, the network performs particularly well in other vision tasks sensitive to resolution, such as detection and segmentation.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.HRNet.png)
At present, PaddleClas open-sources 7 pretrained models of this series, and their metrics are shown in the figure. The abnormal accuracy of HRNet_W48_C may be due to fluctuations in training.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W18_C_ssld | 0.816 | 0.958 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W30_C | 0.780 | 0.940 | 0.782 | 0.942 | 16.230 | 37.710 |
| HRNet_W32_C | 0.783 | 0.942 | 0.785 | 0.942 | 17.860 | 41.230 |
| HRNet_W40_C | 0.788 | 0.945 | 0.789 | 0.945 | 25.410 | 57.550 |
| HRNet_W44_C | 0.790 | 0.945 | 0.789 | 0.944 | 29.790 | 67.060 |
| HRNet_W48_C | 0.790 | 0.944 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W48_C_ssld | 0.836 | 0.968 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| HRNet_W18_C | 224 | 256 | 7.368 |
| HRNet_W18_C_ssld | 224 | 256 | 7.368 |
| HRNet_W30_C | 224 | 256 | 9.402 |
| HRNet_W32_C | 224 | 256 | 9.467 |
| HRNet_W40_C | 224 | 256 | 10.739 |
| HRNet_W44_C | 224 | 256 | 11.497 |
| HRNet_W48_C | 224 | 256 | 12.165 |
| HRNet_W48_C_ssld | 224 | 256 | 12.165 |
| HRNet_W64_C | 224 | 256 | 15.003 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W18_C_ssld | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 |
| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 |
| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 |
| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 |
| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W48_C_ssld | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 |
# Inception series
## Overview
GoogLeNet is a neural network structure designed by Google in 2014. Together with VGG, it dominated the ImageNet challenge that year. GoogLeNet introduced the Inception module for the first time and stacked it throughout the network, bringing the depth to 22 layers, which was also the first time a convolutional network exceeded 20 layers. Because 1x1 convolutions are used in the Inception module to reduce the channel dimension, and global pooling replaces the traditional stack of fully connected layers for processing the final features, GoogLeNet has far fewer FLOPS and parameters than VGG, making it a landmark of neural network design at that time.
Xception is another improvement on InceptionV3 that Google proposed after Inception. In Xception, the author replaced the traditional convolution operation with depthwise separable convolution, which greatly reduces the network's FLOPS and number of parameters while improving accuracy. In DeepLabV3+, the author further improved Xception and increased its depth, designing the Xception65 and Xception71 networks.
InceptionV4 is a neural network designed by Google in 2016, at a time when residual structures were all the rage; the authors, however, believed that high performance could be achieved using the Inception structure alone. InceptionV4 uses more Inception modules and achieves even higher accuracy on ImageNet-1k.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.Inception.png)
The figure above shows the relationship between the accuracy of the Xception series and InceptionV4 and other metrics. Among them, Xception_deeplab is consistent with the structure in the paper, while Xception is an improved model developed by PaddleClas that raises accuracy by about 0.6% with essentially unchanged inference speed. Details of the improved model are being updated, so stay tuned.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| GoogLeNet | 0.707 | 0.897 | 0.698 | | 2.880 | 8.460 |
| Xception41 | 0.793 | 0.945 | 0.790 | 0.945 | 16.740 | 22.690 |
| Xception41<br>_deeplab | 0.796 | 0.944 | | | 18.160 | 26.730 |
| Xception65 | 0.810 | 0.955 | | | 25.950 | 35.480 |
| Xception65<br>_deeplab | 0.803 | 0.945 | | | 27.370 | 39.520 |
| Xception71 | 0.811 | 0.955 | | | 31.770 | 37.280 |
| InceptionV4 | 0.808 | 0.953 | 0.800 | 0.950 | 24.570 | 42.680 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------------|-----------|-------------------|--------------------------|
| GoogLeNet | 224 | 256 | 1.807 |
| Xception41 | 299 | 320 | 3.972 |
| Xception41_<br>deeplab | 299 | 320 | 4.408 |
| Xception65 | 299 | 320 | 6.174 |
| Xception65_<br>deeplab | 299 | 320 | 6.464 |
| Xception71 | 299 | 320 | 6.782 |
| InceptionV4 | 299 | 320 | 11.141 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| GoogLeNet | 299 | 320 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 |
| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 |
| Xception41_<br>deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 |
| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 |
| Xception65_<br>deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 |
| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 |
| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 |
# Mobile and Embedded Vision Applications Network series
## Overview
MobileNetV1 is a network launched by Google in 2017 for use on mobile or embedded devices. The network replaces the traditional convolution operation with depthwise separable convolution, that is, the combination of a depthwise convolution and a pointwise convolution. Compared with the traditional convolution operation, this combination greatly reduces the number of parameters and the amount of computation. At the same time, MobileNetV1 can also be used for object detection, image segmentation and other visual tasks.
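To make the saving concrete, the following is a minimal sketch (plain Python, with illustrative channel numbers) comparing the parameter count of a standard 3x3 convolution with that of its depthwise separable counterpart.

```python
# Parameter counts for one layer, bias terms omitted (illustrative numbers).
in_channels, out_channels, k = 128, 256, 3

standard_conv = in_channels * out_channels * k * k   # dense 3x3 convolution
depthwise = in_channels * k * k                      # one 3x3 filter per input channel
pointwise = in_channels * out_channels               # 1x1 convolution mixing channels
separable = depthwise + pointwise

print(standard_conv, separable, round(standard_conv / separable, 1))  # ~8.7x fewer parameters
```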
MobileNetV2 is a lightweight network proposed by Google following MobileNetV1. Compared with MobileNetV1, MobileNetV2 introduces linear bottlenecks and inverted residual blocks as basic building blocks, and the MobileNetV2 architecture is built by stacking many of these blocks. In the end, it achieves higher classification accuracy with only half the FLOPS of MobileNetV1.
The ShuffleNet series is a family of lightweight network structures proposed by MEGVII. So far, there are two typical structures in this series, namely ShuffleNetV1 and ShuffleNetV2. The channel shuffle operation in ShuffleNet exchanges information between groups while keeping the network end-to-end trainable. In the ShuffleNetV2 paper, the author proposes four criteria for designing lightweight networks and designs ShuffleNetV2 according to these criteria and the shortcomings of ShuffleNetV1.
MobileNetV3 is a lightweight network based on NAS proposed by Google in 2019. To further improve the results, the relu and sigmoid activation functions were replaced with hard_swish and hard_sigmoid, and several other strategies were introduced to reduce the amount of computation.
GhostNet is a lightweight network structure proposed by Huawei in 2020. By introducing the ghost module, it greatly alleviates the redundant computation of features in traditional deep networks, which substantially reduces the number of parameters and the amount of computation.
![](../../images/models/mobile_arm_top1.png)
![](../../images/models/mobile_arm_storage.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png)
Currently PaddleClas has open-sourced 32 pretrained models of the mobile series, and their metrics are shown in the figure below. As can be seen from the figure, newer lightweight models tend to perform better, and MobileNetV3 represents the latest lightweight neural network architecture. In MobileNetV3, the author used a 1x1 convolution after global average pooling to obtain higher accuracy; this operation significantly increases the number of parameters but has little impact on the amount of computation, so from a storage point of view MobileNetV3 does not have much of an advantage, but thanks to its smaller computation it has a faster inference speed. In addition, the SSLD distillation models in our model library perform excellently, refreshing the accuracy of current lightweight models from various perspectives. Because the MobileNetV3 model has a complex structure with many branches, which is not GPU friendly, its GPU inference speed is not as good as that of MobileNetV1.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 0.514 | 0.755 | 0.506 | | 0.070 | 0.460 |
| MobileNetV1_x0_5 | 0.635 | 0.847 | 0.637 | | 0.280 | 1.310 |
| MobileNetV1_x0_75 | 0.688 | 0.882 | 0.684 | | 0.630 | 2.550 |
| MobileNetV1 | 0.710 | 0.897 | 0.706 | | 1.110 | 4.190 |
| MobileNetV1_ssld | 0.779 | 0.939 | | | 1.110 | 4.190 |
| MobileNetV2_x0_25 | 0.532 | 0.765 | | | 0.050 | 1.500 |
| MobileNetV2_x0_5 | 0.650 | 0.857 | 0.654 | 0.864 | 0.170 | 1.930 |
| MobileNetV2_x0_75 | 0.698 | 0.890 | 0.698 | 0.896 | 0.350 | 2.580 |
| MobileNetV2 | 0.722 | 0.907 | 0.718 | 0.910 | 0.600 | 3.440 |
| MobileNetV2_x1_5 | 0.741 | 0.917 | | | 1.320 | 6.760 |
| MobileNetV2_x2_0 | 0.752 | 0.926 | | | 2.320 | 11.130 |
| MobileNetV2_ssld | 0.7674 | 0.9339 | | | 0.600 | 3.440 |
| MobileNetV3_large_<br>x1_25 | 0.764 | 0.930 | 0.766 | | 0.714 | 7.440 |
| MobileNetV3_large_<br>x1_0 | 0.753 | 0.923 | 0.752 | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x0_75 | 0.731 | 0.911 | 0.733 | | 0.296 | 3.910 |
| MobileNetV3_large_<br>x0_5 | 0.692 | 0.885 | 0.688 | | 0.138 | 2.670 |
| MobileNetV3_large_<br>x0_35 | 0.643 | 0.855 | 0.642 | | 0.077 | 2.100 |
| MobileNetV3_small_<br>x1_25 | 0.707 | 0.895 | 0.704 | | 0.195 | 3.620 |
| MobileNetV3_small_<br>x1_0 | 0.682 | 0.881 | 0.675 | | 0.123 | 2.940 |
| MobileNetV3_small_<br>x0_75 | 0.660 | 0.863 | 0.654 | | 0.088 | 2.370 |
| MobileNetV3_small_<br>x0_5 | 0.592 | 0.815 | 0.580 | | 0.043 | 1.900 |
| MobileNetV3_small_<br>x0_35 | 0.530 | 0.764 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_small_<br>x0_35_ssld | 0.556 | 0.777 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_large_<br>x1_0_ssld | 0.790 | 0.945 | | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.761 | | | | | |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.901 | | | 0.123 | 2.940 |
| ShuffleNetV2 | 0.688 | 0.885 | 0.694 | | 0.280 | 2.260 |
| ShuffleNetV2_x0_25 | 0.499 | 0.738 | | | 0.030 | 0.600 |
| ShuffleNetV2_x0_33 | 0.537 | 0.771 | | | 0.040 | 0.640 |
| ShuffleNetV2_x0_5 | 0.603 | 0.823 | 0.603 | | 0.080 | 1.360 |
| ShuffleNetV2_x1_5 | 0.716 | 0.902 | 0.726 | | 0.580 | 3.470 |
| ShuffleNetV2_x2_0 | 0.732 | 0.912 | 0.749 | | 1.120 | 7.320 |
| ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 |
| GhostNet_x0_5 | 0.668 | 0.869 | 0.662 | 0.866 | 0.082 | 2.600 |
| GhostNet_x1_0 | 0.740 | 0.916 | 0.739 | 0.914 | 0.294 | 5.200 |
| GhostNet_x1_3 | 0.757 | 0.925 | 0.757 | 0.927 | 0.440 | 7.300 |
## Inference speed and storage size based on SD855
| Models | Batch Size=1(ms) | Storage Size(M) |
|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 3.220 | 1.900 |
| MobileNetV1_x0_5 | 9.580 | 5.200 |
| MobileNetV1_x0_75 | 19.436 | 10.000 |
| MobileNetV1 | 32.523 | 16.000 |
| MobileNetV1_ssld | 32.523 | 16.000 |
| MobileNetV2_x0_25 | 3.799 | 6.100 |
| MobileNetV2_x0_5 | 8.702 | 7.800 |
| MobileNetV2_x0_75 | 15.531 | 10.000 |
| MobileNetV2 | 23.318 | 14.000 |
| MobileNetV2_x1_5 | 45.624 | 26.000 |
| MobileNetV2_x2_0 | 74.292 | 43.000 |
| MobileNetV2_ssld | 23.318 | 14.000 |
| MobileNetV3_large_x1_25 | 28.218 | 29.000 |
| MobileNetV3_large_x1_0 | 19.308 | 21.000 |
| MobileNetV3_large_x0_75 | 13.565 | 16.000 |
| MobileNetV3_large_x0_5 | 7.493 | 11.000 |
| MobileNetV3_large_x0_35 | 5.137 | 8.600 |
| MobileNetV3_small_x1_25 | 9.275 | 14.000 |
| MobileNetV3_small_x1_0 | 6.546 | 12.000 |
| MobileNetV3_small_x0_75 | 5.284 | 9.600 |
| MobileNetV3_small_x0_5 | 3.352 | 7.800 |
| MobileNetV3_small_x0_35 | 2.635 | 6.900 |
| MobileNetV3_small_x0_35_ssld | 2.635 | 6.900 |
| MobileNetV3_large_x1_0_ssld | 19.308 | 21.000 |
| MobileNetV3_large_x1_0_ssld_int8 | 14.395 | 10.000 |
| MobileNetV3_small_x1_0_ssld | 6.546 | 12.000 |
| ShuffleNetV2 | 10.941 | 9.000 |
| ShuffleNetV2_x0_25 | 2.329 | 2.700 |
| ShuffleNetV2_x0_33 | 2.643 | 2.800 |
| ShuffleNetV2_x0_5 | 4.261 | 5.600 |
| ShuffleNetV2_x1_5 | 19.352 | 14.000 |
| ShuffleNetV2_x2_0 | 34.770 | 28.000 |
| ShuffleNetV2_swish | 16.023 | 9.100 |
| GhostNet_x0_5 | 5.714 | 10.000 |
| GhostNet_x1_0 | 13.558 | 20.000 |
| GhostNet_x1_3 | 19.982 | 29.000 |
## Inference speed based on T4 GPU
| Models | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 |
| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 |
| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 |
| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 |
| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 |
| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 |
| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 |
| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 |
| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 |
| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 |
| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 |
| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 |
| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 |
| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 |
| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 |
| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_small_x0_35_ssld | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 |
| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 |
| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 |
| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 |
| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 |
| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 |
| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 |
# Other networks
## Overview
In 2012, the AlexNet network proposed by Alex Krizhevsky et al. won the ImageNet competition by far surpassing the second place, and convolutional neural networks, and even deep learning, attracted wide attention. AlexNet used relu as the activation function of the CNN to alleviate the vanishing gradient problem of sigmoid when the network is deep. During training, Dropout was used to randomly drop a portion of the neurons, avoiding overfitting of the model. In the network, overlapping max pooling was used to replace the average pooling commonly used in CNNs, which avoids the blurring effect of average pooling and improves feature richness. In a sense, AlexNet ignited the research and application of neural networks.
SqueezeNet achieved the same accuracy as AlexNet on ImageNet-1k with only 1/50 of the parameters. The core of the network is the Fire module, which uses 1x1 convolutions for channel dimensionality reduction, thus greatly reducing the number of parameters. The author built SqueezeNet by stacking a large number of Fire modules.
VGG is a convolutional neural network developed by researchers at Oxford University's Visual Geometry Group and DeepMind. The network explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking 3x3 convolutional kernels and 2x2 max pooling layers, a deep convolutional neural network is successfully constructed and achieves good convergence accuracy. In the end, VGG won the runner-up of the ILSVRC 2014 classification task and the champion of the localization task.
DarkNet53 was designed by the YOLO authors in their paper as a backbone for object detection. The network is basically composed of 1x1 and 3x3 convolutional kernels, with 53 layers in total, hence the name DarkNet53.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| AlexNet | 0.567 | 0.792 | 0.5720 | | 1.370 | 61.090 |
| SqueezeNet1_0 | 0.596 | 0.817 | 0.575 | | 1.550 | 1.240 |
| SqueezeNet1_1 | 0.601 | 0.819 | | | 0.690 | 1.230 |
| VGG11 | 0.693 | 0.891 | | | 15.090 | 132.850 |
| VGG13 | 0.700 | 0.894 | | | 22.480 | 133.030 |
| VGG16 | 0.720 | 0.907 | 0.715 | 0.901 | 30.810 | 138.340 |
| VGG19 | 0.726 | 0.909 | | | 39.130 | 143.650 |
| DarkNet53 | 0.780 | 0.941 | 0.772 | 0.938 | 18.580 | 41.600 |
| ResNet50_ACNet | 0.767 | 0.932 | | | 10.730 | 33.110 |
| ResNet50_ACNet<br>_deploy | 0.767 | 0.932 | | | 8.190 | 25.550 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|---------------------------|-----------|-------------------|----------------------|
| AlexNet | 224 | 256 | 1.176 |
| SqueezeNet1_0 | 224 | 256 | 0.860 |
| SqueezeNet1_1 | 224 | 256 | 0.763 |
| VGG11 | 224 | 256 | 1.867 |
| VGG13 | 224 | 256 | 2.148 |
| VGG16 | 224 | 256 | 2.616 |
| VGG19 | 224 | 256 | 3.076 |
| DarkNet53 | 256 | 256 | 3.139 |
| ResNet50_ACNet<br>_deploy | 224 | 256 | 5.626 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 |
| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 |
| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 |
| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 |
| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 |
| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 |
| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 |
| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 |
| ResNet50_ACNet | 256 | 256 | 3.89002 | 4.58195 | 9.01095 | 5.33395 | 10.96843 | 18.70368 |
| ResNet50_ACNet_deploy | 224 | 256 | 2.6823 | 5.944 | 7.16655 | 3.49161 | 7.78374 | 13.94361 |
# ResNeSt and RegNet series
## Overview
The ResNeSt series was proposed in 2020. It improves the original ResNet structure by introducing K groups and adding an attention module similar to the SE block across the different groups; its accuracy is higher than that of the basic ResNet, while the number of parameters and FLOPS are almost the same as the basic ResNet.
RegNet was proposed by Facebook in 2020 to deepen the concept of design space. Based on AnyNetX, the model performance is gradually improved through strategies such as a shared bottleneck ratio, a shared group width, and adjusting the network depth or width. More importantly, the design space structure is simplified and its interpretability is improved: the quality of the design space is improved while its diversity is maintained. Under similar conditions, the designed RegNet models perform better than EfficientNet and are up to 5 times faster than EfficientNet.
## Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeSt50_fast_1s1x64d | 0.8035 | 0.9528| 0.8035 | -| 8.68 | 26.3 |
| ResNeSt50 | 0.8102 | 0.9542| 0.8113 | -| 10.78 | 27.5 |
| RegNetX_4GF | 0.7850 | 0.9416| 0.7860 | -| 8.0 | 22.1 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNeSt50_fast_1s1x64d | 224 | 256 | 3.46466 | 5.56647 | 9.11848 | 3.45405 | 8.72680 | 15.48710 |
| ResNeSt50 | 224 | 256 | 7.05851 | 8.97676 | 13.34704 | 6.16248 | 12.0633 | 21.49936 |
| RegNetX_4GF | 224 | 256 | 6.69042 | 8.01664 | 11.60608 | 6.46478 | 11.19862 | 16.89089 |
# ResNet and ResNet_vd series
## Overview
The ResNet series was proposed in 2015 and won the ILSVRC 2015 classification competition with a top-5 error rate of 3.57%. The network innovatively proposed the residual structure and built ResNet by stacking multiple residual blocks. Experiments show that using residual blocks can effectively improve both convergence speed and accuracy.
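As a minimal illustration of the residual idea (not PaddleClas code), a residual block learns a correction F(x) that is added back to its input through a shortcut connection; the `transform` below is a stand-in for the stacked conv/BN/ReLU layers.

```python
import numpy as np

def transform(x):
    # Stand-in for the conv/BN/ReLU stack inside a real residual block.
    return 0.1 * x

def residual_block(x):
    # Output = F(x) + x: the identity shortcut lets gradients flow directly
    # through the addition, which eases the training of very deep networks.
    return transform(x) + x

x = np.ones((1, 64, 56, 56), dtype="float32")
print(residual_block(x).shape)
```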
Joyce Xu of Stanford University calls ResNet one of three architectures that "really redefine the way we think about neural networks." Due to the outstanding performance of ResNet, more and more scholars and engineers from academia and industry have improved its structure; well-known variants include Wide-ResNet, ResNet-vc, ResNet-vd and Res2Net. Since the numbers of parameters and FLOPs of ResNet-vc and ResNet-vd are almost the same as those of ResNet, we unify them into the ResNet series here.
The ResNet series released this time includes 14 pretrained models, such as ResNet50, ResNet50_vd, ResNet50_vd_ssld and ResNet200_vd. At the training level, ResNet adopted the standard ImageNet training procedure, while the improved models adopted additional training strategies: cosine decay for the learning rate schedule, label smoothing as regularization, mixup in the data preprocessing, and an increase in the total number of iterations from 120 epochs to 200 epochs.
Among them, ResNet50_vd_v2 and ResNet50_vd_ssld adopted knowledge distillation, which further improves accuracy while keeping the structure unchanged. Specifically, the teacher model of ResNet50_vd_v2 is ResNet152_vd (top-1 accuracy 80.59%) and the training set is ImageNet-1k, while the teacher model of ResNet50_vd_ssld is ResNeXt101_32x16d_wsl (top-1 accuracy 84.2%) and the training set is the combination of 4 million images mined from ImageNet-22k and the ImageNet-1k training set. The specific methods of knowledge distillation are being continuously updated.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.ResNet.png)
As can be seen from the above curves, the higher the number of layers, the higher the accuracy, but the corresponding number of parameters, calculation and latency will increase. ResNet50_vd_ssld further improves the accuracy of top-1 of the ImageNet-1k validation set by using stronger teachers and more data, reaching 82.39%, refreshing the accuracy of ResNet50 series models.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNet18 | 0.710 | 0.899 | 0.696 | 0.891 | 3.660 | 11.690 |
| ResNet18_vd | 0.723 | 0.908 | | | 4.140 | 11.710 |
| ResNet34 | 0.746 | 0.921 | 0.732 | 0.913 | 7.360 | 21.800 |
| ResNet34_vd | 0.760 | 0.930 | | | 7.390 | 21.820 |
| ResNet34_vd_ssld | 0.797 | 0.949 | | | 7.390 | 21.820 |
| ResNet50 | 0.765 | 0.930 | 0.760 | 0.930 | 8.190 | 25.560 |
| ResNet50_vc | 0.784 | 0.940 | | | 8.670 | 25.580 |
| ResNet50_vd | 0.791 | 0.944 | 0.792 | 0.946 | 8.670 | 25.580 |
| ResNet50_vd_v2 | 0.798 | 0.949 | | | 8.670 | 25.580 |
| ResNet101 | 0.776 | 0.936 | 0.776 | 0.938 | 15.520 | 44.550 |
| ResNet101_vd | 0.802 | 0.950 | | | 16.100 | 44.570 |
| ResNet152 | 0.783 | 0.940 | 0.778 | 0.938 | 23.050 | 60.190 |
| ResNet152_vd | 0.806 | 0.953 | | | 23.530 | 60.210 |
| ResNet200_vd | 0.809 | 0.953 | | | 30.530 | 74.740 |
| ResNet50_vd_ssld | 0.824 | 0.961 | | | 8.670 | 25.580 |
| ResNet50_vd_ssld_v2 | 0.830 | 0.964 | | | 8.670 | 25.580 |
| Fix_ResNet50_vd_ssld_v2 | 0.840 | 0.970 | | | 17.696 | 25.580 |
| ResNet101_vd_ssld | 0.837 | 0.967 | | | 16.100 | 44.570 |
* Note: `ResNet50_vd_ssld_v2` is obtained by adding AutoAugment to the training process on the basis of the `ResNet50_vd_ssld` training strategy. `Fix_ResNet50_vd_ssld_v2` freezes all parameters of `ResNet50_vd_ssld_v2` except the FC layer and fine-tunes on the ImageNet-1k dataset at a resolution of 320x320.
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| ResNet18 | 224 | 256 | 1.499 |
| ResNet18_vd | 224 | 256 | 1.603 |
| ResNet34 | 224 | 256 | 2.272 |
| ResNet34_vd | 224 | 256 | 2.343 |
| ResNet34_vd_ssld | 224 | 256 | 2.343 |
| ResNet50 | 224 | 256 | 2.939 |
| ResNet50_vc | 224 | 256 | 3.041 |
| ResNet50_vd | 224 | 256 | 3.165 |
| ResNet50_vd_v2 | 224 | 256 | 3.165 |
| ResNet101 | 224 | 256 | 5.314 |
| ResNet101_vd | 224 | 256 | 5.252 |
| ResNet152 | 224 | 256 | 7.205 |
| ResNet152_vd | 224 | 256 | 7.200 |
| ResNet200_vd | 224 | 256 | 8.885 |
| ResNet50_vd_ssld | 224 | 256 | 3.165 |
| ResNet101_vd_ssld | 224 | 256 | 5.252 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
| ResNet34_vd_ssld | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_ssld_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| Fix_ResNet50_vd_ssld_v2 | 320 | 320 | 3.42818 | 7.51534 | 13.19370 | 5.07696 | 14.64218 | 27.01453 |
| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
# SEResNeXt and Res2Net series
## Overview
ResNeXt, one of the typical variants of ResNet, was presented at the CVPR conference in 2017. Prior to this, methods to improve model accuracy mainly focused on deepening or widening the network, which increased the number of parameters and the amount of computation and slowed down inference accordingly. The ResNeXt structure proposed the concept of cardinality. Through experiments, the author found that increasing the number of channel groups is more effective than increasing depth or width: it improves accuracy without increasing the parameter complexity and even reduces the number of parameters, making ResNeXt a very successful variant of ResNet.
SENet is the winner of the 2017 ImageNet classification competition. It proposes a new SE block that can be transplanted into any other network. By learning per-channel scaling factors, it strengthens the important features in each channel and suppresses the unimportant ones, so that the extracted features are more discriminative.
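As a rough sketch of the SE operation (global pooling, two small fully connected layers, then a per-channel rescaling), assuming random illustrative weights and a reduction ratio of 16 rather than the released model's parameters:

```python
import numpy as np

def se_block(x, w1, w2):
    # x: (C, H, W); w1: (C // r, C); w2: (C, C // r)
    squeeze = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # FC + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # FC + sigmoid -> per-channel weights
    return x * scale[:, None, None]                # enhance/suppress each channel

C, r = 64, 16
x = np.random.rand(C, 32, 32).astype("float32")
w1 = 0.01 * np.random.randn(C // r, C)
w2 = 0.01 * np.random.randn(C, C // r)
print(se_block(x, w1, w2).shape)                   # (64, 32, 32)
```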
Res2Net is a brand-new improvement of ResNet proposed in 2019. The solution can be easily integrated with other excellent modules. Without increasing the amount of calculation, the performance on ImageNet, CIFAR-100 and other data sets exceeds ResNet. Res2Net, with its simple structure and superior performance, further explores the multi-scale representation capability of CNN at a more fine-grained level. Res2Net reveals a new dimension to improve model accuracy, called scale, which is an essential and more effective factor in addition to the existing dimensions of depth, width, and cardinality. The network also performs well in other visual tasks such as object detection and image segmentation.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png)
At present, PaddleClas has open-sourced a total of 24 pretrained models of these three families, and their metrics are shown in the figure. It can be seen from the figure that, at the same FLOPS and Params, the improved models tend to have higher accuracy, but their inference speed is often inferior to the ResNet series. On the other hand, Res2Net performs better: compared with the group operation in ResNeXt and the SE operation in SEResNet, Res2Net tends to have better accuracy at the same FLOPS, Params and inference speed.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Res2Net50_26w_4s | 0.793 | 0.946 | 0.780 | 0.936 | 8.520 | 25.700 |
| Res2Net50_vd_26w_4s | 0.798 | 0.949 | | | 8.370 | 25.060 |
| Res2Net50_14w_8s | 0.795 | 0.947 | 0.781 | 0.939 | 9.010 | 25.720 |
| Res2Net101_vd_26w_4s | 0.806 | 0.952 | | | 16.670 | 45.220 |
| Res2Net200_vd_26w_4s | 0.812 | 0.957 | | | 31.490 | 76.210 |
| Res2Net200_vd_26w_4s_ssld | **0.851** | 0.974 | | | 31.490 | 76.210 |
| ResNeXt50_32x4d | 0.778 | 0.938 | 0.778 | | 8.020 | 23.640 |
| ResNeXt50_vd_32x4d | 0.796 | 0.946 | | | 8.500 | 23.660 |
| ResNeXt50_64x4d | 0.784 | 0.941 | | | 15.060 | 42.360 |
| ResNeXt50_vd_64x4d | 0.801 | 0.949 | | | 15.540 | 42.380 |
| ResNeXt101_32x4d | 0.787 | 0.942 | 0.788 | | 15.010 | 41.540 |
| ResNeXt101_vd_32x4d | 0.803 | 0.951 | | | 15.490 | 41.560 |
| ResNeXt101_64x4d | 0.784 | 0.945 | 0.796 | | 29.050 | 78.120 |
| ResNeXt101_vd_64x4d | 0.808 | 0.952 | | | 29.530 | 78.140 |
| ResNeXt152_32x4d | 0.790 | 0.943 | | | 22.010 | 56.280 |
| ResNeXt152_vd_32x4d | 0.807 | 0.952 | | | 22.490 | 56.300 |
| ResNeXt152_64x4d | 0.795 | 0.947 | | | 43.030 | 107.570 |
| ResNeXt152_vd_64x4d | 0.811 | 0.953 | | | 43.520 | 107.590 |
| SE_ResNet18_vd | 0.733 | 0.914 | | | 4.140 | 11.800 |
| SE_ResNet34_vd | 0.765 | 0.932 | | | 7.840 | 21.980 |
| SE_ResNet50_vd | 0.795 | 0.948 | | | 8.670 | 28.090 |
| SE_ResNeXt50_32x4d | 0.784 | 0.940 | 0.789 | 0.945 | 8.020 | 26.160 |
| SE_ResNeXt50_vd_32x4d | 0.802 | 0.949 | | | 10.760 | 26.280 |
| SE_ResNeXt101_32x4d | 0.791 | 0.942 | 0.793 | 0.950 | 15.020 | 46.280 |
| SENet154_vd | 0.814 | 0.955 | | | 45.830 | 114.290 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-----------------------|-----------|-------------------|--------------------------|
| Res2Net50_26w_4s | 224 | 256 | 4.148 |
| Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
| Res2Net50_14w_8s | 224 | 256 | 5.113 |
| Res2Net101_vd_26w_4s | 224 | 256 | 7.327 |
| Res2Net200_vd_26w_4s | 224 | 256 | 12.806 |
| ResNeXt50_32x4d | 224 | 256 | 10.964 |
| ResNeXt50_vd_32x4d | 224 | 256 | 7.566 |
| ResNeXt50_64x4d | 224 | 256 | 13.905 |
| ResNeXt50_vd_64x4d | 224 | 256 | 14.321 |
| ResNeXt101_32x4d | 224 | 256 | 14.915 |
| ResNeXt101_vd_32x4d | 224 | 256 | 14.885 |
| ResNeXt101_64x4d | 224 | 256 | 28.716 |
| ResNeXt101_vd_64x4d | 224 | 256 | 28.398 |
| ResNeXt152_32x4d | 224 | 256 | 22.996 |
| ResNeXt152_vd_32x4d | 224 | 256 | 22.729 |
| ResNeXt152_64x4d | 224 | 256 | 46.705 |
| ResNeXt152_vd_64x4d | 224 | 256 | 46.395 |
| SE_ResNet18_vd | 224 | 256 | 1.694 |
| SE_ResNet34_vd | 224 | 256 | 2.786 |
| SE_ResNet50_vd | 224 | 256 | 3.749 |
| SE_ResNeXt50_32x4d | 224 | 256 | 8.924 |
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
| SENet154_vd | 224 | 256 | 50.406 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 |
| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 |
| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 |
| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 |
| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 |
| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 |
| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 |
| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 |
| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 |
| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 |
| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 |
| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 |
| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 |
| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 |
| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 |
| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 |
| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 |
| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 |
| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 |
| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 |
| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 |
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 |
| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 |
# Tricks for Training
## Choice of Optimizers:
Since the development of deep learning, many researchers have worked on optimizers. The purpose of an optimizer is to make the loss function as small as possible, so as to find suitable parameters for a given task. At present, the main optimizers used in model training are SGD, RMSProp, Adam, AdaDelta and so on. The SGD optimizer with momentum is widely used in academia and industry, so most of the models we release are trained with it. The SGD optimizer with momentum has two disadvantages: its convergence speed is slow and its initial learning rate is difficult to set; however, if the initial learning rate is set properly and the model is trained for enough iterations, models trained by SGD with momentum can reach higher accuracy than those trained by other optimizers. Optimizers with an adaptive learning rate, such as Adam and RMSProp, tend to converge faster, but their final convergence accuracy is slightly worse. If you want a model to converge faster, we recommend the optimizers with an adaptive learning rate; if you want a model with higher accuracy, we recommend the SGD optimizer with momentum.
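For reference, the update rule of SGD with momentum can be written in a few lines; this is a generic sketch of the update, not the exact Paddle implementation.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    # Accumulate an exponentially decaying history of gradients, then move along it.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w, v = 1.0, 0.0
for g in (0.5, 0.4, 0.3):  # toy gradients for a single scalar parameter
    w, v = sgd_momentum_step(w, g, v)
print(w, v)
```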
## Choice of Learning Rate and Learning Rate Declining Strategy:
The choice of learning rate is related to the optimizer, data set and tasks. Here we mainly introduce the learning rate of training ImageNet-1K with momentum + SGD as the optimizer and the choice of learning rate decline.
### Concept of Learning Rate:
The learning rate is the hyperparameter that controls the learning speed: the lower the learning rate, the slower the loss value changes. Although using a low learning rate ensures that you will not miss any local minimum, it also means that convergence is slow, especially when the gradient is trapped in a plateau region.
### Learning Rate Decline Strategy:
During training, if we always use the same learning rate, we cannot obtain the model with the highest accuracy, so the learning rate should be adjusted during training. In the early stage of training, the weights are in a randomly initialized state and the gradients are relatively large, so a relatively large learning rate can be used for faster convergence. In the late stage of training, the weights are close to the optimal values, which cannot be reached with a large learning rate, so a relatively small learning rate should be used. During training, many researchers use the piecewise_decay strategy, which reduces the learning rate in steps. For example, when training ResNet50, the initial learning rate is set to 0.1 and dropped to 1/10 every 30 epochs, with 120 epochs of training in total. Besides piecewise_decay, researchers have also proposed other schedules such as polynomial_decay, exponential_decay and cosine_decay; among them, cosine_decay has become the preferred schedule for improving model accuracy because it requires no extra hyperparameter tuning and is relatively robust. The learning rate curves of cosine_decay and piecewise_decay are shown in the following figure: during the entire training process, cosine_decay keeps a relatively large learning rate, so its convergence is slower, but its final convergence accuracy is better than that of piecewise_decay.
![](../../images/models/lr_decay.jpeg)
In addition, we can also see from the figure that cosine_decay spends fewer epochs at a small learning rate, which affects the final accuracy, so to make cosine_decay work well it is recommended to use it with a larger number of epochs, such as 200.
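As a concrete reference, the two schedules can be written as simple functions of the current epoch; this is a minimal sketch using the ResNet50-style settings mentioned above (initial learning rate 0.1, a drop every 30 epochs or a 200-epoch cosine), not the PaddleClas implementation.

```python
import math

def piecewise_decay(epoch, base_lr=0.1, step=30, gamma=0.1):
    # Drop the learning rate to 1/10 of its value every `step` epochs.
    return base_lr * (gamma ** (epoch // step))

def cosine_decay(epoch, base_lr=0.1, total_epochs=200):
    # Anneal the learning rate from base_lr to 0 along half a cosine curve.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

print(piecewise_decay(35), cosine_decay(35))  # 0.01 vs. ~0.093 at epoch 35
```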
### Warmup Strategy
If a large batch_size is adopted to train the neural network, we recommend the warmup strategy. As the name suggests, the warmup strategy lets the model warm up first: instead of using the initial learning rate directly at the beginning of training, the model is trained with a gradually increasing learning rate, and once it reaches the initial learning rate, the schedule described in the learning rate decay strategy takes over. Experiments show that when the batch size is large, warmup can improve accuracy. For models trained with a large batch_size such as MobileNetV3, we set the number of warmup epochs to 5 by default, that is, during the first 5 epochs the learning rate increases from 0 to the initial learning rate, after which the learning rate decay begins.
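Combining linear warmup with a decay schedule can be sketched as below (5 warmup epochs as in the MobileNetV3 example above, reusing the `cosine_decay` helper from the previous sketch; the names are ours, not PaddleClas APIs).

```python
def lr_with_warmup(epoch, base_lr=0.1, warmup_epochs=5, total_epochs=200):
    if epoch < warmup_epochs:
        # Linearly grow the learning rate from 0 to base_lr during warmup.
        return base_lr * epoch / warmup_epochs
    # After warmup, hand over to the normal decay schedule (cosine decay here).
    return cosine_decay(epoch - warmup_epochs, base_lr, total_epochs - warmup_epochs)
```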
## Choice of Batch_size
Batch_size is an important hyperparameter in training neural networks; it determines how much data is fed to the neural network for training at a time. In paper [1], the authors found experimentally that when batch_size is scaled linearly with the learning rate, the convergence accuracy is hardly affected. When training on ImageNet, an initial learning rate of 0.1 is commonly chosen with a batch_size of 256, so depending on the actual model size and memory, you can set the learning rate to 0.1\*k and batch_size to 256\*k.
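For example, with the linear scaling rule from [1], quadrupling the batch size quadruples the initial learning rate; a short sketch:

```python
base_lr, base_batch_size = 0.1, 256

def scaled_lr(batch_size):
    # Keep the ratio lr / batch_size constant when scaling the batch size.
    return base_lr * batch_size / base_batch_size

print(scaled_lr(1024))  # 0.4 for batch_size = 1024
```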
## Choice of Weight_decay
Overfitting is a common term in machine learning: the model performs well on the training data but poorly on the test data. Convolutional neural networks also suffer from overfitting, and many regularization methods have been proposed to avoid it. Among them, weight_decay is one of the most widely used. L2 regularization (weight_decay) is added to the final loss function; with its help, the network weights tend toward smaller values and the parameters of the entire network are pushed toward 0, so the generalization performance of the model improves accordingly. In different deep learning frameworks, this value is the coefficient of the L2 regularization term; in Paddle it is named L2_decay, so it is called L2_decay in the following. The larger the coefficient, the more the model tends to underfit. For training ImageNet, this parameter is set to 1e-4 for most networks. For some small networks such as the MobileNet series, the value is set to 1e-5 ~ 4e-5 to avoid underfitting. The setting of this value is also related to the specific dataset: when the dataset is large, the network tends to underfit and the value can be appropriately reduced; when the dataset is small, the network tends to overfit and the value can be appropriately increased. The following table shows the accuracy of MobileNetV1_x0_25 with different L2_decay values on ImageNet-1k. Since MobileNetV1_x0_25 is a relatively small network, a large L2_decay makes it underfit, so for this network 3e-5 is a better choice than 1e-4.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
In addition, the setting of L2_decay is also related to whether other regularization is used during training. If the data augmentation used during training is more complicated, which means that training becomes more difficult, L2_decay can be appropriately reduced. The following table shows the accuracy of ResNet50 with different L2_decay values on ImageNet-1k. It is easy to observe that when training becomes harder, using a smaller L2_decay helps to improve the accuracy of the model.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| ResNet50 | 1e-4 | 75.13%/90.42% | 77.65%/93.79% |
| ResNet50 | 7e-5 | 75.56%/90.55% | 78.04%/93.74% |
In summary, L2_decay can be adjusted according to the specific task and model. Simple tasks or larger models are usually recommended to use a larger L2_decay, while complex tasks or smaller models are recommended to use a smaller L2_decay.
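To make the role of L2_decay explicit, the sketch below shows how the L2 penalty enters the total loss; this is a generic formulation for illustration, not the Paddle implementation.

```python
import numpy as np

def loss_with_l2(task_loss, weights, l2_decay=1e-4):
    # Total loss = task loss + l2_decay * 0.5 * sum of squared weights.
    # A larger l2_decay pulls the weights toward smaller values (stronger regularization).
    l2_penalty = 0.5 * sum(np.sum(w ** 2) for w in weights)
    return task_loss + l2_decay * l2_penalty

weights = [np.ones((3, 3)), np.ones(10)]
print(loss_with_l2(task_loss=2.3, weights=weights))  # 2.3 + 1e-4 * 0.5 * 19
```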
## Choice of Label_smoothing
Label smoothing is a regularization method in deep learning; its full name is Label Smoothing Regularization (LSR). In a traditional classification task, the cross-entropy loss is computed between the real one-hot label and the output of the neural network. Label smoothing turns the hard one-hot label into a soft label, so that the network no longer learns from hard labels but from soft labels with probability values, where the position corresponding to the true category has the largest probability and the other positions have very small probabilities; the specific calculation method can be found in paper [2]. In label smoothing, an epsilon parameter describes the degree to which the label is softened: the larger epsilon, the smoother the label; conversely, the label tends toward a hard label. When training on ImageNet-1k, the parameter is usually set to 0.1. In experiments training ResNet50, the accuracy with label smoothing is higher than without; the following table shows the performance of ResNet50_vd with and without label smoothing.
| Model | Use_label_smoothing | Test acc1 |
|:--:|:--:|:--:|
| ResNet50_vd | 0 | 77.9% |
| ResNet50_vd | 1 | 78.4% |
However, because label smoothing can be regarded as a regularization method, on relatively small models the accuracy improvement is not obvious or may even decrease. The following table shows the accuracy of ResNet18 with and without label smoothing on ImageNet-1k; it can be clearly seen that after using label smoothing, the accuracy of ResNet18 decreased.
| Model | Use_label_smoothing | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| ResNet18 | 0 | 69.81%/87.70% | 70.98%/89.92% |
| ResNet18 | 1 | 68.00%/86.56% | 70.81%/89.89% |
In summary, using label_smoothing on larger models can effectively improve accuracy, while using it on smaller models may reduce accuracy, so before deciding whether to use label_smoothing, you need to evaluate the size of the model and the difficulty of the task.
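For reference, a minimal sketch of how a hard one-hot label is softened with the epsilon described above:

```python
import numpy as np

def smooth_label(label, num_classes, epsilon=0.1):
    # The true class gets 1 - epsilon + epsilon / num_classes,
    # every other class gets epsilon / num_classes.
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

print(smooth_label(label=3, num_classes=10))  # true class 0.91, others 0.01
```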
## Change the Crop Area and Stretch Transformation Degree of the Images for Small Models
In the standard preprocessing of ImageNet-1k data, two values, scale and ratio, are defined in the random_crop function. They determine the size of the image crop and the degree of stretching of the image, respectively. The default range of scale is 0.08-1 (lower_scale-upper_scale), and the default range of ratio is 3/4-4/3 (lower_ratio-upper_ratio). When training small networks, such data augmentation makes the network underfit, resulting in lower accuracy. To improve accuracy, the data augmentation can be made weaker, that is, increase the crop area of the images or weaken the degree of stretching; weaker image transformation can be achieved by increasing the value of lower_scale or narrowing the gap between lower_ratio and upper_ratio. The following table lists the accuracy of MobileNetV2_x0_25 trained with different lower_scale values. It can be seen that both training accuracy and validation accuracy improve after increasing the crop area of the images.
| Model | Scale Range | Train_acc1/acc5 | Test_acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
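As a sketch, weakening the augmentation simply means raising the lower bound of the crop scale passed to the random-crop transform; the settings below follow the common RandomResizedCrop convention, and the exact operator and configuration keys in PaddleClas may differ.

```python
# Standard ImageNet setting vs. the weaker augmentation used for small models above.
default_crop = dict(size=224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3))
weaker_crop = dict(size=224, scale=(0.20, 1.0), ratio=(3 / 4, 4 / 3))  # larger minimum crop area
```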
## Use Data Augmentation to Improve Accuracy
In general, the size of the dataset is critical to performance, but annotating images is often expensive, so annotated images are often scarce. In this case, data augmentation is particularly important. In the standard data augmentation for training on ImageNet-1k, two methods, random_crop and random_flip, are mainly used. However, in recent years, more and more data augmentation methods have been proposed, such as cutout, mixup, cutmix, AutoAugment, etc. Experiments show that these methods can effectively improve model accuracy. The following table lists the performance of ResNet50 with 8 different data augmentation methods. It can be seen that, compared with the baseline, all of them improve the accuracy of ResNet50, and cutmix is currently the most effective one. More data augmentation methods can be found here: [**Data Augmentation**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/ImageAugment.html).
| Model | Data Augmentation | Test top-1 |
|:--:|:--:|:--:|
| ResNet50 | Baseline | 77.31% |
| ResNet50 | Auto-Augment | 77.95% |
| ResNet50 | Mixup | 78.28% |
| ResNet50 | Cutmix | 78.39% |
| ResNet50 | Cutout | 78.01% |
| ResNet50 | Gridmask | 77.85% |
| ResNet50 | Random-Augment | 77.70% |
| ResNet50 | Random-Erasing | 77.91% |
| ResNet50 | Hide-and-Seek | 77.43% |
## Determine the Tuning Strategy by Train_acc and Test_acc
In the process of training the network, the training-set accuracy and validation-set accuracy of each epoch are usually printed. Generally speaking, it is a good sign when the training-set accuracy is slightly higher than or equal to the validation-set accuracy. If the training-set accuracy is much higher than that of the validation set, it means your task is overfitting and more regularization is needed, such as increasing the value of L2_decay, using more data augmentation, or using label smoothing. If the training-set accuracy is lower than that of the validation set, it means your task is underfitting, in which case we recommend decreasing the value of L2_decay, using less data augmentation, increasing the crop area of the images, weakening the stretching transformation of the images, removing label_smoothing, etc.
## Improve the Accuracy of Your Own Data Set with Existing Pre-trained Models
In the field of computer vision, it has become common to load pretrained models when training one's own tasks. Compared with training from random initialization, loading a pretrained model can often improve the accuracy of the specific task. In general, the pretrained models widely used in industry are obtained on the ImageNet-1k dataset. The fc layer weight of the pretrained model is a k\*1000 matrix, where k is the number of input neurons of the fc layer; since the downstream task usually has a different number of classes, the fc layer weights do not need to be loaded. In terms of learning rate, if your training dataset is particularly small (e.g. fewer than 1,000 images), we recommend a smaller initial learning rate, such as 0.001 (batch_size: 256, the same below), to avoid a large learning rate destroying the pretrained weights; if your training dataset is relatively large (more than 100,000 images), we recommend trying a larger initial learning rate, such as 0.01 or higher.
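A minimal sketch of this idea, assuming the pretrained weights are available as a plain dict of arrays; the `'fc'` substring used to skip the classification head is an assumption about the parameter naming, not the exact PaddleClas names.

```python
def filter_pretrained_weights(pretrained_state, model_state):
    # Copy every weight whose name and shape match the new model, except the final
    # fc layer, whose shape depends on the old 1000-class task.
    kept = {}
    for name, value in pretrained_state.items():
        if "fc" in name:  # assumed naming of the classification head
            continue
        if name in model_state and model_state[name].shape == value.shape:
            kept[name] = value
    return kept
```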
> If you think this guide is helpful to you, welcome to star our repo:[https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
## Reference
[1] P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: training ImageNet in 1 hour. CoRR, abs/1706.02677, 2017.
[2] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.
......@@ -4,13 +4,13 @@ models
.. toctree::
:maxdepth: 1
models_intro.md
Tricks.md
ResNet_and_vd.md
Mobile.md
SEResNext_and_Res2Net.md
Inception.md
HRNet.md
DPN_DenseNet.md
EfficientNet_and_ResNeXt101_wsl.md
Others.md
models_intro_en.md
Tricks_en.md
ResNet_and_vd_en.md
Mobile_en.md
SEResNext_and_Res2Net_en.md
Inception_en.md
HRNet_en.md
DPN_DenseNet_en.md
EfficientNet_and_ResNeXt101_wsl_en.md
Others_en.md
# Model Library Overview
## Overview
Based on the ImageNet1k classification dataset, the 23 classification network structures supported by PaddleClas and the corresponding 117 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluations are given in the corresponding chapters.
## Evaluation environment
* CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU evaluation environment is based on V100 and TensorRT, and the evaluation script is as follows.
```shell
#!/usr/bin/env bash
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/infer/predict.py \
--model_file='pretrained/infer/model' \
--params_file='pretrained/infer/params' \
--enable_benchmark=True \
--model_name=ResNet50_vd \
--use_tensorrt=True \
--use_fp16=False \
--batch_size=1
```
![](../../images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)
![](../../images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg)
![](../../images/models/mobile_arm_top1.png)
> If you think this document is helpful to you, welcome to give a star to our project:[https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
## Pretrained model list and download address
- ResNet and ResNet_vd series
- ResNet series<sup>[[1](#ref1)]</sup>([paper link](http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html))
- [ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar)
- [ResNet34](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar)
- [ResNet50](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar)
- [ResNet101](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar)
- [ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.tar)
- ResNet_vc、ResNet_vd series<sup>[[2](#ref2)]</sup>([paper link](https://arxiv.org/abs/1812.01187))
- [ResNet50_vc](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vc_pretrained.tar)
- [ResNet18_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar)
- [ResNet34_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar)
- [ResNet34_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_ssld_pretrained.tar)
- [ResNet50_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar)
- [ResNet50_vd_v2](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_v2_pretrained.tar)
- [ResNet101_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar)
- [ResNet152_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_vd_pretrained.tar)
- [ResNet200_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar)
- [ResNet50_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar)
- [ResNet50_vd_ssld_v2](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar)
- [Fix_ResNet50_vd_ssld_v2](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNet50_vd_ssld_v2_pretrained.tar)
- [ResNet101_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar)
- Mobile and Embedded Vision Applications Network series
- MobileNetV3 series<sup>[[3](#ref3)]</sup>([paper link](https://arxiv.org/abs/1905.02244))
- [MobileNetV3_large_x0_35](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_35_pretrained.tar)
- [MobileNetV3_large_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar)
- [MobileNetV3_large_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_75_pretrained.tar)
- [MobileNetV3_large_x1_0](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar)
- [MobileNetV3_large_x1_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_25_pretrained.tar)
- [MobileNetV3_small_x0_35](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_35_pretrained.tar)
- [MobileNetV3_small_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_5_pretrained.tar)
- [MobileNetV3_small_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_75_pretrained.tar)
- [MobileNetV3_small_x1_0](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_pretrained.tar)
- [MobileNetV3_small_x1_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_25_pretrained.tar)
- [MobileNetV3_large_x1_0_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar)
- [MobileNetV3_large_x1_0_ssld_int8](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_int8_pretrained.tar)
- [MobileNetV3_small_x1_0_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar)
- MobileNetV2 series<sup>[[4](#ref4)]</sup>([paper link](https://arxiv.org/abs/1801.04381))
- [MobileNetV2_x0_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar)
- [MobileNetV2_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar)
- [MobileNetV2_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_75_pretrained.tar)
- [MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar)
- [MobileNetV2_x1_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar)
- [MobileNetV2_x2_0](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar)
- [MobileNetV2_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_ssld_pretrained.tar)
- MobileNetV1 series<sup>[[5](#ref5)]</sup>([paper link](https://arxiv.org/abs/1704.04861))
- [MobileNetV1_x0_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_25_pretrained.tar)
- [MobileNetV1_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_5_pretrained.tar)
- [MobileNetV1_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_75_pretrained.tar)
- [MobileNetV1](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar)
- [MobileNetV1_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_ssld_pretrained.tar)
- ShuffleNetV2 series<sup>[[6](#ref6)]</sup>([paper link](https://arxiv.org/abs/1807.11164))
- [ShuffleNetV2_x0_25](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_25_pretrained.tar)
- [ShuffleNetV2_x0_33](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_33_pretrained.tar)
- [ShuffleNetV2_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_5_pretrained.tar)
- [ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar)
- [ShuffleNetV2_x1_5](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x1_5_pretrained.tar)
- [ShuffleNetV2_x2_0](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x2_0_pretrained.tar)
- [ShuffleNetV2_swish](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_swish_pretrained.tar)
- GhostNet series<sup>[[23](#ref23)]</sup>([paper link](https://arxiv.org/pdf/1911.11907.pdf))
- [GhostNet_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x0_5_pretrained.pdparams)
- [GhostNet_x1_0](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_0_pretrained.pdparams)
- [GhostNet_x1_3](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_3_pretrained.pdparams)
- SEResNeXt and Res2Net series
- ResNeXt series<sup>[[7](#ref7)]</sup>([paper link](https://arxiv.org/abs/1611.05431))
- [ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_32x4d_pretrained.tar)
- [ResNeXt50_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar)
- [ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x4d_pretrained.tar)
- [ResNeXt101_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar)
- [ResNeXt152_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_32x4d_pretrained.tar)
- [ResNeXt152_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_64x4d_pretrained.tar)
- ResNeXt_vd series
- [ResNeXt50_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_32x4d_pretrained.tar)
- [ResNeXt50_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_64x4d_pretrained.tar)
- [ResNeXt101_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_32x4d_pretrained.tar)
- [ResNeXt101_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar)
- [ResNeXt152_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_32x4d_pretrained.tar)
- [ResNeXt152_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_64x4d_pretrained.tar)
- SE_ResNet_vd series<sup>[[8](#ref8)]</sup>([paper link](https://arxiv.org/abs/1709.01507))
- [SE_ResNet18_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet18_vd_pretrained.tar)
- [SE_ResNet34_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet34_vd_pretrained.tar)
- [SE_ResNet50_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet50_vd_pretrained.tar)
- SE_ResNeXt series
- [SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_32x4d_pretrained.tar)
- [SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.tar)
- SE_ResNeXt_vd series
- [SE_ResNeXt50_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_vd_32x4d_pretrained.tar)
- [SENet154_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_pretrained.tar)
- Res2Net series<sup>[[9](#ref9)]</sup>([paper link](https://arxiv.org/abs/1904.01169))
- [Res2Net50_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar)
- [Res2Net50_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar)
- [Res2Net50_14w_8s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_14w_8s_pretrained.tar)
- [Res2Net101_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net101_vd_26w_4s_pretrained.tar)
- [Res2Net200_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_pretrained.tar)
- [Res2Net200_vd_26w_4s_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_ssld_pretrained.tar)
- Inception series
- GoogLeNet series<sup>[[10](#ref10)]</sup>([paper link](https://arxiv.org/pdf/1409.4842.pdf))
- [GoogLeNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogLeNet_pretrained.tar)
- Inception series<sup>[[11](#ref11)]</sup>([paper link](https://arxiv.org/abs/1602.07261))
- [InceptionV4](https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar)
- Xception series<sup>[[12](#ref12)]</sup>([paper link](http://openaccess.thecvf.com/content_cvpr_2017/html/Chollet_Xception_Deep_Learning_CVPR_2017_paper.html))
- [Xception41](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_pretrained.tar)
- [Xception41_deeplab](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar)
- [Xception65](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_pretrained.tar)
- [Xception65_deeplab](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_deeplab_pretrained.tar)
- [Xception71](https://paddle-imagenet-models-name.bj.bcebos.com/Xception71_pretrained.tar)
- HRNet series
- HRNet series<sup>[[13](#ref13)]</sup>([paper link](https://arxiv.org/abs/1908.07919))
- [HRNet_W18_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar)
- [HRNet_W18_C_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_ssld_pretrained.tar)
- [HRNet_W30_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W30_C_pretrained.tar)
- [HRNet_W32_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W32_C_pretrained.tar)
- [HRNet_W40_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W40_C_pretrained.tar)
- [HRNet_W44_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W44_C_pretrained.tar)
- [HRNet_W48_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_pretrained.tar)
- [HRNet_W48_C_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_ssld_pretrained.tar)
- [HRNet_W64_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W64_C_pretrained.tar)
- DPN and DenseNet series
- DPN series<sup>[[14](#ref14)]</sup>([paper link](https://arxiv.org/abs/1707.01629))
- [DPN68](https://paddle-imagenet-models-name.bj.bcebos.com/DPN68_pretrained.tar)
- [DPN92](https://paddle-imagenet-models-name.bj.bcebos.com/DPN92_pretrained.tar)
- [DPN98](https://paddle-imagenet-models-name.bj.bcebos.com/DPN98_pretrained.tar)
- [DPN107](https://paddle-imagenet-models-name.bj.bcebos.com/DPN107_pretrained.tar)
- [DPN131](https://paddle-imagenet-models-name.bj.bcebos.com/DPN131_pretrained.tar)
- DenseNet series<sup>[[15](#ref15)]</sup>([paper link](https://arxiv.org/abs/1608.06993))
- [DenseNet121](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar)
- [DenseNet161](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet161_pretrained.tar)
- [DenseNet169](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet169_pretrained.tar)
- [DenseNet201](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet201_pretrained.tar)
- [DenseNet264](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet264_pretrained.tar)
- EfficientNet and ResNeXt101_wsl series
- EfficientNet series<sup>[[16](#ref16)]</sup>([paper link](https://arxiv.org/abs/1905.11946))
- [EfficientNetB0_small](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_small_pretrained.tar)
- [EfficientNetB0](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_pretrained.tar)
- [EfficientNetB1](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB1_pretrained.tar)
- [EfficientNetB2](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB2_pretrained.tar)
- [EfficientNetB3](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB3_pretrained.tar)
- [EfficientNetB4](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB4_pretrained.tar)
- [EfficientNetB5](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB5_pretrained.tar)
- [EfficientNetB6](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB6_pretrained.tar)
- [EfficientNetB7](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB7_pretrained.tar)
- ResNeXt101_wsl series<sup>[[17](#ref17)]</sup>([paper link](https://arxiv.org/abs/1805.00932))
- [ResNeXt101_32x8d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x8d_wsl_pretrained.tar)
- [ResNeXt101_32x16d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x16d_wsl_pretrained.tar)
- [ResNeXt101_32x32d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x32d_wsl_pretrained.tar)
- [ResNeXt101_32x48d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x48d_wsl_pretrained.tar)
- [Fix_ResNeXt101_32x48d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNeXt101_32x48d_wsl_pretrained.tar)
- ResNeSt and RegNet series
- ResNeSt series<sup>[[24](#ref24)]</sup>([paper link](https://arxiv.org/abs/2004.08955))
- [ResNeSt50_fast_1s1x64d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_fast_1s1x64d_pretrained.pdparams)
- [ResNeSt50](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_pretrained.pdparams)
- RegNet series<sup>[[25](#ref25)]</sup>([paper link](https://arxiv.org/abs/2003.13678))
- [RegNetX_4GF](https://paddle-imagenet-models-name.bj.bcebos.com/RegNetX_4GF_pretrained.pdparams)
- Other models
- AlexNet series<sup>[[18](#ref18)]</sup>([paper link](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf))
- [AlexNet](https://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.tar)
- SqueezeNet series<sup>[[19](#ref19)]</sup>([paper link](https://arxiv.org/abs/1602.07360))
- [SqueezeNet1_0](https://paddle-imagenet-models-name.bj.bcebos.com/SqueezeNet1_0_pretrained.tar)
- [SqueezeNet1_1](https://paddle-imagenet-models-name.bj.bcebos.com/SqueezeNet1_1_pretrained.tar)
- VGG series<sup>[[20](#ref20)]</sup>([paper link](https://arxiv.org/abs/1409.1556))
- [VGG11](https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.tar)
- [VGG13](https://paddle-imagenet-models-name.bj.bcebos.com/VGG13_pretrained.tar)
- [VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.tar)
- [VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.tar)
- DarkNet series<sup>[[21](#ref21)]</sup>([paper link](https://arxiv.org/abs/1506.02640))
- [DarkNet53](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_ImageNet1k_pretrained.tar)
- ACNet series<sup>[[22](#ref22)]</sup>([paper link](https://arxiv.org/abs/1908.03930))
- [ResNet50_ACNet_deploy](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_ACNet_deploy_pretrained.tar)
**Note**: The pretrained models of EfficientNetB1-B7 in the above models are transferred from [pytorch version of EfficientNet](https://github.com/lukemelas/EfficientNet-PyTorch), and the ResNeXt101_wsl series of pretrained models are transferred from [Official repo](https://github.com/facebookresearch/WSL-Images), the remaining pretrained models are obtained by training with the PaddlePaddle framework, and the corresponding training hyperparameters are given in configs.
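For convenience, any of the `.tar` archives listed above can be fetched and unpacked with a few lines of Python. The sketch below is only an illustration (the `download_pretrained` helper and the `./pretrained` target directory are made up here); the `tools/download.py` script shipped with PaddleClas, shown in the quick start tutorial, does the same job.
```python
# Illustrative helper (not part of PaddleClas): download one of the pretrained
# archives listed above and extract it into a local directory.
import tarfile
import urllib.request
from pathlib import Path

def download_pretrained(url, target_dir="./pretrained"):
    """Download a *_pretrained.tar archive and extract it into target_dir."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    tar_path = Path(target_dir) / url.split("/")[-1]
    urllib.request.urlretrieve(url, str(tar_path))   # fetch the archive
    with tarfile.open(tar_path) as tar:
        tar.extractall(path=target_dir)              # creates one folder per model
    return Path(target_dir) / tar_path.stem

if __name__ == "__main__":
    model_dir = download_pretrained(
        "https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar")
    print("extracted to", model_dir)
```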
## References
<a name="ref1">[1]</a> He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
<a name="ref2">[2]</a> He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 558-567.
<a name="ref3">[3]</a> Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314-1324.
<a name="ref4">[4]</a> Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
<a name="ref5">[5]</a> Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
<a name="ref6">[6]</a> Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 116-131.
<a name="ref7">[7]</a> Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.
<a name="ref8">[8]</a> Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
<a name="ref9">[9]</a> Gao S, Cheng M M, Zhao K, et al. Res2net: A new multi-scale backbone architecture[J]. IEEE transactions on pattern analysis and machine intelligence, 2019.
<a name="ref10">[10]</a> Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
<a name="ref11">[11]</a> Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017.
<a name="ref12">[12]</a> Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.
<a name="ref13">[13]</a> Wang J, Sun K, Cheng T, et al. Deep high-resolution representation learning for visual recognition[J]. arXiv preprint arXiv:1908.07919, 2019.
<a name="ref14">[14]</a> Chen Y, Li J, Xiao H, et al. Dual path networks[C]//Advances in neural information processing systems. 2017: 4467-4475.
<a name="ref15">[15]</a> Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
<a name="ref16">[16]</a> Tan M, Le Q V. Efficientnet: Rethinking model scaling for convolutional neural networks[J]. arXiv preprint arXiv:1905.11946, 2019.
<a name="ref17">[17]</a> Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
<a name="ref18">[18]</a> Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. 2012: 1097-1105.
<a name="ref19">[19]</a> Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.
<a name="ref20">[20]</a> Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
<a name="ref21">[21]</a> Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.
<a name="ref22">[22]</a> Ding X, Guo Y, Ding G, et al. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1911-1920.
<a name="ref23">[23]</a> Han K, Wang Y, Tian Q, et al. GhostNet: More features from cheap operations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
<a name="ref24">[24]</a> Zhang H, Wu C, Zhang Z, et al. Resnest: Split-attention networks[J]. arXiv preprint arXiv:2004.08955, 2020.
<a name="ref25">[25]</a> Radosavovic I, Kosaraju R P, Girshick R, et al. Designing network design spaces[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10428-10436.
# Configuration
---
## Introduction
This document introduces the configuration fields (defined in `configs/*.yaml`) of PaddleClas.
### Basic
| name | detail | default value | optional value |
|:---:|:---:|:---:|:---:|
| mode | running mode | "train" | ["train", "valid"] |
| architecture | model name | "ResNet50_vd" | one of 23 architectures |
| pretrained_model | pretrained model path | "" | Str |
| model_save_dir | model stored path | "" | Str |
| classes_num | class number | 1000 | int |
| total_images | total images | 1281167 | int |
| save_interval | save interval | 1 | int |
| validate | whether to validate when training | TRUE | bool |
| valid_interval | valid interval | 1 | int |
| epochs | total training epochs | | int |
| topk | K value | 5 | int |
| image_shape | image size | [3,224,224] | list, shape: (3,) |
| use_mix | whether to use mixup | False | ['True', 'False'] |
| ls_epsilon | label_smoothing epsilon value| 0 | float |
### Optimizer & Learning rate
learning rate
| name | detail | default value |Optional value |
|:---:|:---:|:---:|:---:|
| function | decay type | "Linear" | ["Linear", "Cosine", <br> "Piecewise", "CosineWarmup"] |
| params.lr | initial learning rate | 0.1 | float |
| params.decay_epochs | milestone in piecewisedecay | | list |
| params.gamma | gamma in piecewisedecay | 0.1 | float |
| params.warmup_epoch | warmup epoch | 5 | int |
| params.steps | decay steps in lineardecay | 100 | int |
| params.end_lr | end lr in lineardecay | 0 | float |
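To make the relationship between these fields concrete, the sketch below computes a per-epoch learning rate for the `CosineWarmup` case from `params.lr` and `params.warmup_epoch`. It is a simplified approximation for illustration only, not the exact schedule implemented in PaddleClas.
```python
# Simplified sketch of a cosine-decay schedule with linear warmup, driven by the
# params.lr / params.warmup_epoch fields described above (illustration only).
import math

def cosine_warmup_lr(epoch, total_epochs, base_lr=0.1, warmup_epoch=5):
    if epoch < warmup_epoch:                       # linear warmup from 0 to base_lr
        return base_lr * (epoch + 1) / warmup_epoch
    progress = (epoch - warmup_epoch) / max(1, total_epochs - warmup_epoch)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))   # cosine decay to 0

if __name__ == "__main__":
    for e in (0, 4, 5, 60, 119):
        print(e, round(cosine_warmup_lr(e, total_epochs=120), 5))
```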
optimizer
| name | detail | default value | optional value |
|:---:|:---:|:---:|:---:|
| function | optimizer name | "Momentum" | ["Momentum", "RmsProp"] |
| params.momentum | momentum value | 0.9 | float |
| regularizer.function | regularizer method name | "L2" | ["L1", "L2"] |
| regularizer.factor | regularizer factor | 0.0001 | float |
### reader
| name | detail |
|:---:|:---:|
| batch_size | batch size |
| num_workers | worker number |
| file_list | train list path |
| data_dir | train dataset path |
| shuffle_seed | seed |
processing
| function name | attribute name | detail |
|:---:|:---:|:---:|
| DecodeImage | to_rgb | decode to RGB |
| | to_np | to numpy |
| | channel_first | Channel first |
| RandCropImage | size | random crop |
| RandFlipImage | | random flip |
| NormalizeImage | scale | normalize image |
| | mean | mean |
| | std | std |
| | order | order |
| ToCHWImage | | to CHW |
| CropImage | size | crop size |
| ResizeImage | resize_short | resize according to short size |
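For evaluation, these operators are typically chained as decode → resize by the short side → center crop → normalize → transpose to CHW. The sketch below shows an equivalent standalone pipeline; the mean/std values are the common ImageNet statistics and are assumed here rather than read from a configuration file.
```python
# Standalone sketch of the evaluation-time preprocessing chain listed above
# (DecodeImage -> ResizeImage -> CropImage -> NormalizeImage -> ToCHWImage).
import numpy as np
from PIL import Image

def preprocess(img_path, resize_short=256, crop_size=224,
               mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    img = Image.open(img_path).convert("RGB")            # DecodeImage(to_rgb)
    w, h = img.size
    scale = resize_short / min(w, h)                      # ResizeImage(resize_short)
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size                                       # CropImage(size): center crop
    left, top = (w - crop_size) // 2, (h - crop_size) // 2
    img = img.crop((left, top, left + crop_size, top + crop_size))
    arr = np.asarray(img).astype("float32") / 255.0       # NormalizeImage(scale, mean, std)
    arr = (arr - np.array(mean)) / np.array(std)
    return arr.transpose((2, 0, 1))                       # ToCHWImage: HWC -> CHW
```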
mix preprocessing
| name| detail|
|:---:|:---:|
| MixupOperator.alpha | alpha value in mixup|
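The sketch below illustrates what `MixupOperator.alpha` controls: it is the parameter of the Beta distribution used to blend pairs of training samples. This is a generic illustration of mixup, not the PaddleClas operator itself.
```python
# Generic mixup sketch: blend a batch with a shuffled copy of itself, and blend
# the loss with the same coefficient lam ~ Beta(alpha, alpha).
import numpy as np

def mixup_batch(images, labels, alpha=0.2):
    """images: (N, C, H, W) float array, labels: (N,) int array."""
    lam = np.random.beta(alpha, alpha)                # mixing coefficient
    idx = np.random.permutation(len(images))          # partner samples
    mixed = lam * images + (1.0 - lam) * images[idx]  # pixel-wise blend
    # During training the loss becomes:
    #   lam * CE(pred, labels) + (1 - lam) * CE(pred, labels[idx])
    return mixed, labels, labels[idx], lam
```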
# Data
---
## Introduction
This document introduces the preparation of the ImageNet1k and flowers102 datasets.
## Dataset
Dataset | train dataset size | valid dataset size | category |
:------:|:---------------:|:---------------------:|:--------:|
[flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)|1k | 6k | 102 |
[ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/)|1.2M| 50k | 1000 |
* Data format
Please organize the data as described below, including `train_list.txt` and `val_list.txt`:
```shell
# delimiter: "space"
ILSVRC2012_val_00000001.JPEG 65
...
```
### ImageNet1k
After downloading data, please organize the data dir as below
```bash
PaddleClas/dataset/imagenet/
|_ train/
| |_ n01440764
| | |_ n01440764_10026.JPEG
| | |_ ...
| |_ ...
| |
| |_ n15075141
| |_ ...
| |_ n15075141_9993.JPEG
|_ val/
| |_ ILSVRC2012_val_00000001.JPEG
| |_ ...
| |_ ILSVRC2012_val_00050000.JPEG
|_ train_list.txt
|_ val_list.txt
```
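If you need to build `train_list.txt` yourself from the directory layout above, a small script like the following can be used. It is only a sketch: it assumes labels are assigned to the class folders in sorted order and writes paths relative to the dataset root; check the `file_list`/`data_dir` settings in your configuration for the exact path convention expected.
```python
# Hypothetical helper: build train_list.txt ("relative/path label") from the
# ImageNet-style directory layout shown above.
from pathlib import Path

def build_train_list(train_dir="dataset/imagenet/train",
                     output="dataset/imagenet/train_list.txt"):
    classes = sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())
    label_map = {name: idx for idx, name in enumerate(classes)}   # n01440764 -> 0, ...
    with open(output, "w") as f:
        for name in classes:
            for img in sorted((Path(train_dir) / name).glob("*.JPEG")):
                f.write(f"train/{name}/{img.name} {label_map[name]}\n")

if __name__ == "__main__":
    build_train_list()
```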
### Flowers102 Dataset
Download [Data](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) then decompress:
```shell
jpg/
setid.mat
imagelabels.mat
```
Please put all the files under ```PaddleClas/dataset/flowers102```
Generate `train_list.txt` and `val_list.txt` using `generate_flowers102_list.py`:
```bash
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
```
Please organize data dir as below
```bash
PaddleClas/dataset/flowers102/
|_ jpg/
| |_ image_03601.jpg
| |_ ...
| |_ image_02355.jpg
|_ train_list.txt
|_ val_list.txt
```
# Getting Started
---
Please refer to [Installation](install.md) to set up the environment first, and prepare the ImageNet1k data by following the instructions in [data](data.md).
## 1. Training and Evaluation on Windows or CPU
If training and evaluation are performed on Windows system or CPU, it is recommended to use the `tools/train_multi_platform.py` and `tools/eval_multi_platform.py` scripts.
### 1.1 Model training
After preparing the configuration file, the training process can be started in the following way.
```
python tools/train_multi_platform.py \
-c configs/ResNet/ResNet50.yaml \
-o model_save_dir=./output/ \
-o use_gpu=True
```
Among them, `-c` specifies the path of the configuration file, and `-o` specifies the parameters to be modified or added. For example, `-o model_save_dir=./output/` changes `model_save_dir` in the configuration file to `./output/`, and `-o use_gpu=True` means training on GPU. If you want to train on CPU, set `use_gpu` to `False`.
Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config.md).
* The output log examples are as follows:
* If mixup or cutmix is used in training, only loss, lr (learning rate) and training time of the minibatch will be printed in the log.
```
train step:890 loss: 6.8473 lr: 0.100000 elapse: 0.157s
```
* If mixup or cutmix is not used during training, in addition to loss, lr (learning rate) and the training time of the minibatch, top-1 and top-k accuracy (k defaults to 5) will also be printed in the log.
```
epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193s
```
During training, you can view loss changes in real time through `VisualDL`. The command is as follows.
```bash
visualdl --logdir ./scalar --host <host_IP> --port <port_num>
```
### 1.2 Model finetuning
* After preparing the configuration file, you can finetune the model by loading the pretrained weights. The command is shown below.
```
python tools/train_multi_platform.py \
-c configs/ResNet/ResNet50.yaml \
-o pretrained_model="./pretrained/ResNet50_pretrained"
```
Among them, `pretrained_model` is used to specify the path of the pretrained weights to load. When using it, replace it with your own pretrained weights' path, or modify the path directly in the configuration file.
### 1.3 Resume Training
* If the training process is terminated for some reasons, you can also load the checkpoints to continue training.
```
python tools/train_multi_platform.py \
-c configs/ResNet/ResNet50.yaml \
-o checkpoints="./output/ResNet/0/ppcls"
```
The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
### 1.4 Model evaluation
* The model evaluation process can be started as follows.
```bash
python tools/eval_multi_platform.py \
-c ./configs/eval.yaml \
-o ARCHITECTURE.name="ResNet50_vd" \
-o pretrained_model=path_to_pretrained_models
```
You can modify the `ARCHITECTURE.name` field and `pretrained_model` field in `configs/eval.yaml` to configure the evaluation model, and you can also update the configuration through the `-o` parameter.
**Note:** When loading the pretrained model, you need to specify the prefix of the weight file. For example, if the pretrained weights are saved as `output/ResNet50_vd/19/ppcls.pdparams`, then `pretrained_model` should be specified as `output/ResNet50_vd/19/ppcls`; PaddleClas will automatically append the `.pdparams` suffix.
## 2. Training and evaluation on Linux+GPU
If you want to run PaddleClas on Linux with GPU, it is highly recommended to use the model training and evaluation scripts provided by PaddleClas: `tools/train.py` and `tools/eval.py`.
### 2.1 Model training
After preparing the configuration file, the training process can be started in the following way.
```bash
# PaddleClas starts multi-card and multi-process training through launch
# Specify the GPU cards to use via --selected_gpus
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml
```
The configuration can be updated by adding the `-o` parameter.
```bash
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml \
-o use_mix=1 \
--vdl_dir=./scalar/
```
The format of output log information is the same as above.
### 2.2 Model finetuning
* After preparing the configuration file, you can finetune the model by loading the pretrained weights. The command is shown below.
```
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c configs/ResNet/ResNet50.yaml \
-o pretrained_model="./pretrained/ResNet50_pretrained"
```
Among them, `pretrained_model` is used to specify the path of the pretrained weights to load. When using it, replace it with your own pretrained weights' path, or modify the path directly in the configuration file.
* There are many examples of model finetuning in [the quick start tutorial](./quick_start_en.md). You can refer to it to finetune the model on a specific dataset.
### 2.3 Resume Training
* If the training process is terminated for some reasons, you can also load the checkpoints to continue training.
```
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c configs/ResNet/ResNet50.yaml \
-o checkpoints="./output/ResNet/0/ppcls"
```
The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
### 2.4 Model evaluation
* The model evaluation process can be started as follows.
```bash
python tools/eval_multi_platform.py \
-c ./configs/eval.yaml \
-o ARCHITECTURE.name="ResNet50_vd" \
-o pretrained_model=path_to_pretrained_models
```
You can modify the `ARCHITECTURE.name` field and `pretrained_model` field in `configs/eval.yaml` to configure the evaluation model, and you can also update the configuration through the `-o` parameter.
## 3. Model inference
PaddlePaddle provides three ways to perform model inference. The following introduces how to use the inference engine for model inference.
Firstly, you should export inference model using `tools/export_model.py`.
```bash
python tools/export_model.py \
--model=model_name \
--pretrained_model=pretrained_model_dir \
--output_path=save_inference_dir
```
Secondly, the inference engine can be started using the following commands.
```bash
python tools/infer/predict.py \
-m model_path \
-p params_path \
-i image path \
--use_gpu=1 \
--use_tensorrt=True
```
Please refer to [inference](../extension/paddle_inference_en.md) for more details.
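As a rough illustration of what happens behind these commands, the sketch below loads the exported model/params files with the static-graph `fluid` API and runs a single input through the network. It is only a sketch: the file names follow the `export_model.py` output shown above, the input is a random tensor standing in for a preprocessed image, and the repository's `tools/infer/predict.py` (which wraps the dedicated inference engine) should be preferred for real deployment and benchmarking.
```python
# Minimal sketch: run the exported inference model with the fluid executor API.
import numpy as np
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)          # or fluid.CPUPlace() for CPU-only inference
exe = fluid.Executor(place)

program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname="save_inference_dir", executor=exe,
    model_filename="model", params_filename="params")

# Replace this random tensor with a real preprocessed image of shape (1, 3, 224, 224).
img = np.random.rand(1, 3, 224, 224).astype("float32")
outputs = exe.run(program, feed={feed_names[0]: img}, fetch_list=fetch_targets)
print("top-1 class:", np.argmax(outputs[0][0]))
```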
......@@ -4,6 +4,8 @@ tutorials
.. toctree::
:maxdepth: 1
install.md
getting_started.md
config.md
install_en.md
quick_start_en.md
data_en.md
getting_started_en.md
config_en.md
# Installation
---
## Introduction
This document introduces how to install PaddleClas and its requirements.
## Install PaddlePaddle
Python 3.5 or later, CUDA 9.0 or later, cuDNN 7.0 or later, and NCCL 2.1.2 or later are required. For now, PaddleClas only supports training on GPU devices. Please follow the instructions in [Installation](http://www.paddlepaddle.org.cn/install/quick) if the PaddlePaddle version on the device is lower than v1.7.
Install PaddlePaddle
```bash
pip install paddlepaddle-gpu --upgrade
```
or compile from source code, please refer to [Installation](http://www.paddlepaddle.org.cn/install/quick).
Verify Installation
```python
import paddle.fluid as fluid
fluid.install_check.run_check()
```
Check PaddlePaddle version:
```bash
python -c "import paddle; print(paddle.__version__)"
```
Note:
- Make sure the compiled version is later than v1.7
- Specify **WITH_DISTRIBUTE=ON** when compiling. Please refer to [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for more details.
## Install PaddleClas
**Clone PaddleClas:**
```
cd path_to_clone_PaddleClas
git clone https://github.com/PaddlePaddle/PaddleClas.git
```
**Install requirements**
```
pip install --upgrade -r requirements.txt
```
If the installation of visualdl fails, you can try the following command.
```
pip3 install --upgrade visualdl==2.0.0b3 -i https://mirror.baidu.com/pypi/simple
```
Note that visualdl is only supported in python3, so python3 is needed if you want to use visualdl.
# Trial in 30mins
Based on the flowers102 dataset, it takes only 30 minutes to experience PaddleClas, including training various backbones from pretrained models, SSLD distillation, and multiple data augmentation methods. Please refer to [Installation](install.md) to set up the environment first.
## Preparation
* enter the installation directory
```
cd path_to_PaddleClas
```
* enter `dataset/flowers102`, download and decompress flowers102 dataset.
```shell
cd dataset/flowers102
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat
tar -xf 102flowers.tgz
```
* create train/val/test label files
```shell
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
python generate_flowers102_list.py jpg test > extra_list.txt
cat train_list.txt extra_list.txt > train_extra_list.txt
```
**Note:** In order to offer more data to the SSLD training task, `train_list.txt` and `extra_list.txt` are merged into `train_extra_list.txt`.
* return `PaddleClas` dir
```
cd ../../
```
## Environment
### Set PYTHONPATH
```bash
export PYTHONPATH=./:$PYTHONPATH
```
### Download pretrained model
```bash
python tools/download.py -a ResNet50_vd -p ./pretrained -d True
python tools/download.py -a ResNet50_vd_ssld -p ./pretrained -d True
python tools/download.py -a MobileNetV3_large_x1_0 -p ./pretrained -d True
```
Parameters:
+ `architecture` (shortname: a): model name.
+ `path` (shortname: p): download path.
+ `decompress` (shortname: d): whether to decompress.
* All experiments are run on a single NVIDIA® Tesla® V100 card.
## Training
### Train from scratch
* Train ResNet50_vd
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd.yaml
```
The validation `Top1 Acc` curve is shown below.
![](../../images/quick_start/r50_vd_acc.png)
### Finetune - ResNet50_vd pretrained model (Acc 79.12\%)
* finetune the ResNet50_vd model pretrained on the 1000-class ImageNet dataset
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd_finetune.yaml
```
The validation `Top1 Acc` curve is shown below
![](../../images/quick_start/r50_vd_pretrained_acc.png)
Compared with training from scratch, the accuracy improves by about 65\% to 94.02\%.
### SSLD finetune - ResNet50_vd_ssld pretrained model (Acc 82.39\%)
Note: when finetuning a model that has been trained with SSLD, please use a smaller learning rate for the middle layers of the network.
```yaml
ARCHITECTURE:
name: 'ResNet50_vd'
params:
lr_mult_list: [0.1, 0.1, 0.2, 0.2, 0.3]
pretrained_model: "./pretrained/ResNet50_vd_ssld_pretrained"
```
Training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd_ssld_finetune.yaml
```
Compared with finetuning from the 79.12\% pretrained model, the accuracy improves by 0.98\% to 95\%.
### More architecture - MobileNetV3
Training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
```
Compared with the ResNet50_vd pretrained model, the accuracy decreases by 5\% to 90\%. Different architectures yield different performance; choosing the best model is a task-oriented decision that should take inference time, storage size, heterogeneous devices, etc. into account.
### RandomErasing
Data augmentation helps when the amount of training data is small.
Training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd_ssld_random_erasing_finetune.yaml
```
The accuracy improves by 1.27\% to 96.27\%.
* Save the finetuned ResNet50_vd model for use in the next section.
```shell
cp -r output/ResNet50_vd/19/ ./pretrained/flowers102_R50_vd_final/
```
### Distillation
* Use `extra_list.txt` as unlabeled data. Note:
* Samples in `extra_list.txt` and `val_list.txt` have no intersection.
* Because the label information is unused in the source code, this is still label-free distillation.
* The teacher model uses the weights finetuned on the flowers102 dataset, and the student model uses the MobileNetV3_large_x1_0 model (Acc 75.32\%) pretrained on the ImageNet1k dataset.
```yaml
total_images: 7169
ARCHITECTURE:
name: 'ResNet50_vd_distill_MobileNetV3_large_x1_0'
pretrained_model:
- "./pretrained/flowers102_R50_vd_final/ppcls"
- "./pretrained/MobileNetV3_large_x1_0_pretrained/”
TRAIN:
file_list: "./dataset/flowers102/train_extra_list.txt"
```
Final training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/R50_vd_distill_MV3_large_x1_0.yaml
```
The accuracy significantly improves by 6.47\% to 96.47\% with more unlabeled data and the teacher model.
### All accuracy
|Configuration | Top1 Acc |
|- |:-: |
| ResNet50_vd.yaml | 0.2735 |
| MobileNetV3_large_x1_0_finetune.yaml | 0.9000 |
| ResNet50_vd_finetune.yaml | 0.9402 |
| ResNet50_vd_ssld_finetune.yaml | 0.9500 |
| ResNet50_vd_ssld_random_erasing_finetune.yaml | 0.9627 |
| R50_vd_distill_MV3_large_x1_0.yaml | 0.9647 |
The whole accuracy curves are shown below
![](../../images/quick_start/all_acc.png)
* **NOTE**: As flowers102 is a small dataset, the validation accuracy may fluctuate by around 1\%.
* Please refer to [Getting_started](./getting_started) for more details
# Release Notes
- 2020.10.12
* Add Paddle-Lite demo.
- 2020.10.10
* Add cpp inference demo.
* Improve FAQ tutorials.
* 2020.09.17
* Add `HRNet_W48_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.62%.
* Add `ResNet34_vd_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.72%.
* 2020.09.07
* Add `HRNet_W18_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 81.16%.
* Add `MobileNetV3_small_x0_35_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 55.55%.
* 2020.07.14
* Add `Res2Net200_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 85.13%.
* Add `Fix_ResNet50_vd_ssld_v2` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 84.00%.
* 2020.06.17
* Add English documents.
* 2020.06.12
* Add support for training and evaluation on Windows or CPU.
* 2020.05.17
* Add support for mixed precision training.
* 2020.05.09
* Add user guide about Paddle Serving and Paddle-Lite.
* Add benchmark about FP16/FP32 on T4 GPU.
* 2020.04.14
* First commit.
docs/images/main_features_s.png (image updated: 154.4 KB → 275.4 KB)
docs/images/models/mobile_arm_top1.png (image updated: 148.0 KB → 140.6 KB)
......@@ -8,7 +8,7 @@
深度神经网络一般有较多的参数冗余,目前有几种主要的方法对模型进行压缩,减小其参数量。如裁剪、量化、知识蒸馏等,其中知识蒸馏是指使用教师模型(teacher model)去指导学生模型(student model)学习特定任务,保证小模型在参数量不变的情况下,得到比较大的性能提升,甚至获得与大模型相似的精度指标[1]。PaddleClas融合已有的蒸馏方法[2,3],提供了一种简单的半监督标签知识蒸馏方案(SSLD,Simple Semi-supervised Label Distillation),基于ImageNet1k分类数据集,在ResNet_vd以及MobileNet系列上的精度均有超过3%的绝对精度提升,具体指标如下图所示。
![](../../../images/distillation/distillation_perform.png)
![](../../../images/distillation/distillation_perform_s.jpg)
# 二、SSLD 蒸馏策略
......@@ -17,10 +17,8 @@
SSLD的流程图如下图所示。
![](../../../images/distillation/ppcls_distillation.png)
首先,我们从ImageNet22k中挖掘出了近400万张图片,同时与ImageNet-1k训练集整合在一起,得到了一个新的包含500万张图片的数据集。然后,我们将学生模型与教师模型组合成一个新的网络,该网络分别输出学生模型和教师模型的预测分布,与此同时,固定教师模型整个网络的梯度,而学生模型可以做正常的反向传播。最后,我们将两个模型的logits经过softmax激活函数转换为soft label,并将二者的soft label做JS散度作为损失函数,用于蒸馏模型训练。下面以MobileNetV3(该模型直接训练,精度为75.3%)的知识蒸馏为例,介绍该方案的核心关键点(baseline为79.12%的ResNet50_vd模型蒸馏MobileNetV3,训练集为ImageNet1k训练集,loss为cross entropy loss,迭代轮数为120epoch,精度指标为75.6%)。
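上述soft label与JS散度损失的计算方式可以用下面的numpy代码示意(仅为示意,并非PaddleClas中的实际实现):
```python
# JS散度蒸馏损失的numpy示意:teacher/student的logits先经过softmax得到soft label,
# 再计算两者之间的Jensen-Shannon散度作为训练loss(示意代码,非实际实现)。
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def js_divergence_loss(student_logits, teacher_logits, eps=1e-8):
    p = softmax(student_logits)             # 学生模型的soft label
    q = softmax(teacher_logits)             # 教师模型的soft label(教师梯度被固定)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)), axis=-1)
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)), axis=-1)
    return np.mean(0.5 * (kl_pm + kl_qm))   # 对batch取平均
```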
* 教师模型的选择。在进行知识蒸馏时,如果教师模型与学生模型的结构差异太大,蒸馏得到的结果反而不会有太大收益。相同结构下,精度更高的教师模型对结果也有很大影响。相比于79.12%的ResNet50_vd教师模型,使用82.4%的ResNet50_vd教师模型可以带来0.4%的绝对精度收益(`75.6%->76.0%`)。
......@@ -31,7 +29,7 @@ SSLD的流程图如下图所示。
* 无需数据集的真值标签,很容易扩展训练集。SSLD的loss在计算过程中,仅涉及到教师和学生模型对于相同图片的处理结果(经过softmax激活函数处理之后的soft label),因此即使图片数据不包含真值标签,也可以用来进行训练并提升模型性能。该蒸馏方案的无标签蒸馏策略也大大提升了学生模型的性能上限(`77.1%->78.5%`)。
* ImageNet1k蒸馏finetune。我们仅使用ImageNet1k数据,使用蒸馏方法对上述模型进行finetune,最终仍然可以获得0.4%的性能提升(`75.8%->78.9%`)。
* ImageNet1k蒸馏finetune。我们仅使用ImageNet1k数据,使用蒸馏方法对上述模型进行finetune,最终仍然可以获得0.4%的性能提升(`78.5%->78.9%`)。
......@@ -87,6 +85,7 @@ SSLD的流程图如下图所示。
| MobileNetV3_small_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 70.11% |
| ResNet50_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 82.07% |
| ResNet101_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 83.41% |
| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 84.82% |
## 3.3 ImageNet1k训练集finetune
......@@ -101,6 +100,15 @@ SSLD的流程图如下图所示。
| MobileNetV3_small_x1_0 | 30 | 1e-5 | 6400/32 | 0.025 | cosine_decay_warmup | 71.28% |
| ResNet50_vd | 60 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 82.39% |
| ResNet101_vd | 30 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 83.73% |
| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 85.13% |
## 3.4 数据增广以及基于Fix策略的微调
* 基于前文所述的实验结论,我们在训练的过程中加入自动增广(AutoAugment)[4],同时进一步减小了l2_decay(4e-5->2e-5),最终ResNet50_vd经过SSLD蒸馏策略,在ImageNet1k上的精度可以达到82.99%,相比之前不加数据增广的蒸馏策略再次增加了0.6%。
* 对于图像分类任务,在测试的时候,测试尺度为训练尺度的1.15倍左右时,往往在不需要重新训练模型的情况下,模型的精度指标就可以进一步提升[5],对于82.99%的ResNet50_vd在320x320的尺度下测试,精度可达83.7%,我们进一步使用Fix策略,即在320x320的尺度下进行训练,使用与预测时相同的数据预处理方法,同时固定除FC层以外的所有参数,最终在320x320的预测尺度下,精度可以达到**84.0%**
## 3.5 实验过程中的一些问题
......@@ -182,7 +190,7 @@ for var in ./*_student; do cp "$var" "../student_model/${var%_student}"; done #
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [0.05,0.05,0.1,0.1,0.15] | 34.3% |
| Faster RCNN R50_vd FPN | 640/640 | 82.18% | [0.05,0.05,0.1,0.1,0.15] | 36.3% |
在这里可以看出,对于未蒸馏模型,过度调整中间层学习率反而降低最终检测模型的性能指标。基于该蒸馏模型,我们也提供了领先的服务端实用目标检测方案,详细的配置与训练代码均已开源,可以参考[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det)
在这里可以看出,对于未蒸馏模型,过度调整中间层学习率反而降低最终检测模型的性能指标。基于该蒸馏模型,我们也提供了领先的服务端实用目标检测方案,详细的配置与训练代码均已开源,可以参考[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_enhance)
# 五、SSLD实战
......@@ -266,3 +274,7 @@ sh tools/run.sh
[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.
......@@ -4,7 +4,6 @@
![](../../../images/image_aug/main_image_aug.png)
# 二、常用数据增广方法
如果没有特殊说明,本章节中所有示例为 ImageNet 分类,并且假设最终输入网络的数据维度为:`[batch-size, 3, 224, 224]`
......@@ -24,6 +23,9 @@
2.`Transpose` 后的 224 的图像进行一些裁剪: CutOut,RandErasing,HideAndSeek,GridMask
3.`Batch` 后的数据进行混合: Mixup,Cutmix
增广后的可视化效果如下所示。
![](../../../images/image_aug/image_aug_samples_s.jpg)
具体如下表所示:
......
......@@ -57,7 +57,7 @@ Mixup: [False, True]
## 二、 大规模分类模型
在实际应用中,由于训练数据的匮乏,往往将ImageNet1k数据集训练的分类模型作为预训练模型,进行图像分类的迁移学习。为了进一步助力解决实际问题,基于ResNet50_vd, 百度开源了自研的大规模分类预训练模型,其中训练数据为10万个类别,4300万张图片。
在实际应用中,由于训练数据的匮乏,往往将ImageNet1k数据集训练的分类模型作为预训练模型,进行图像分类的迁移学习。为了进一步助力解决实际问题,基于ResNet50_vd, 百度开源了自研的大规模分类预训练模型,其中训练数据为10万个类别,4300万张图片。10万类预训练模型的下载地址:[**下载地址**](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar)
我们在6个自有采集的数据集上进行迁移学习实验,采用一组固定参数以及网格搜索方式,其中训练轮数设置为20epochs,选用ResNet50_vd模型,ImageNet预训练精度为79.12%。实验数据集参数以及模型精度的对比结果如下:
......
......@@ -100,7 +100,7 @@ python tools/export_model.py \
在模型库的 `tools/infer/predict.py` 中提供了完整的示例,只需执行下述命令即可完成预测:
```
python ./predict.py \
python ./tools/infer/predict.py \
-i=./test.jpeg \
-m=./resnet50-vd/model \
-p=./resnet50-vd/params \
......@@ -122,7 +122,7 @@ python ./predict.py \
注意:
当启用benchmark时,默认开启tensorrt进行预测
构建预测引擎:
......@@ -259,4 +259,3 @@ outputs = exe.run(compiled_program,
```
上述执行预测时候的参数说明可以参考官网 [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)
# FAQ
## 写在前面
* 我们收集整理了开源以来在issues和用户群中的常见问题并且给出了简要解答,旨在为图像分类的开发者提供一些参考,也希望帮助大家少走一些弯路。
* 图像分类领域大佬众多,模型和论文更新速度也很快,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也希望有识之士帮忙补充和修正,万分感谢。
## PaddleClas常见问题汇总(持续更新)
* [图像分类30个问题](#图像分类30个问题)
* [基础知识](#基础知识)
* [模型训练相关](#模型训练相关)
* [数据相关](#数据相关)
* [模型推理与预测相关](#模型推理与预测相关)
* [PaddleClas使用问题](#PaddleClas使用问题)
<a name="图像分类30个问题"></a>
## 图像分类30个问题
<a name="基础知识"></a>
### 基础知识
>>
* Q: 图像分类领域常用的分类指标有几种
* A:
* 对于单个标签的图像分类问题(仅包含1个类别与背景),评估指标主要有Accuracy,Precision,Recall,F-score等,令TP(True Positive)表示将正类预测为正类,FP(False Positive)表示将负类预测为正类,TN(True Negative)表示将负类预测为负类,FN(False Negative)表示将正类预测为负类。那么Accuracy=(TP + TN) / NUM,Precision=TP /(TP + FP),Recall=TP /(TP + FN)。
* 对于类别数大于1的图像分类问题,评估指标主要有Accuary和Class-wise Accuracy,Accuary表示所有类别预测正确的图像数量占总图像数量的百分比;Class-wise Accuracy是对每个类别的图像计算Accuracy,然后再对所有类别的Accuracy取平均得到。
>>
* Q: 怎样根据自己的任务选择合适的模型进行训练?
* A: 如果希望在服务器部署,或者希望精度尽可能地高,对模型存储大小或者预测速度的要求不是很高,那么推荐使用ResNet_vd、Res2Net_vd、DenseNet、Xception等适合于服务器端的系列模型;如果希望在移动端侧部署,则推荐使用MobileNetV3、GhostNet等适合于移动端的系列模型。同时,我们推荐在选择模型的时候可以参考[模型库](https://github.com/PaddlePaddle/PaddleClas/tree/master/docs/zh_CN/models)中的速度-精度指标图。
>>
* Q: 如何进行参数初始化,什么样的初始化可以加快模型收敛?
* A: 众所周知,参数的初始化可以影响模型的最终性能。一般来说,如果目标数据集不是很大,建议使用ImageNet-1k训练得到的预训练模型进行初始化。如果是自己手动设计的网络或者暂时没有基于ImageNet-1k训练得到的预训练权重,可以使用Xavier初始化或者MSRA初始化,其中Xavier初始化是针对Sigmoid函数提出的,对RELU函数不太友好,网络越深,各层输入的方差越小,网络越难训练,所以当神经网络中使用较多RELU激活函数时,推荐使用MSRA初始化。
>>
* Q: 针对深度神经网络参数冗余的问题,目前有哪些比较好的解决办法?
* A: 目前有几种主要的方法对模型进行压缩,减少模型参数冗余的问题,如剪枝、量化、知识蒸馏等。模型剪枝指的是将权重矩阵中相对不重要的权值剔除,然后再重新对网络进行微调;模型量化指的是一种将浮点计算转成低比特定点计算的技术,如8比特、4比特等,可以有效的降低模型计算强度、参数大小和内存消耗。知识蒸馏是指使用教师模型(teacher model)去指导学生模型(student model)学习特定任务,保证小模型在参数量不变的情况下,性能有较大的提升,甚至获得与大模型相似的精度指标。
>>
* Q: 怎样在其他任务,如目标检测、图像分割、关键点检测等任务中选择比较合适的分类模型作为骨干网络?
* A: 在不考虑速度的情况下,在大部分的任务中,推荐使用精度更高的预训练模型和骨干网络,PaddleClas中开源了一系列的SSLD知识蒸馏预训练模型,如ResNet50_vd_ssld, Res2Net200_vd_26w_4s_ssld等,在模型精度和速度方面都是非常有优势的,推荐大家使用。对于一些特定的任务,如图像分割或者关键点检测等任务,对图像分辨率的要求比较高,那么更推荐使用HRNet等能够同时兼顾网络深度和分辨率的神经网络模型,PaddleClas也提供了HRNet_W18_C_ssld、HRNet_W48_C_ssld等精度非常高的HRNet SSLD蒸馏系列预训练模型,大家可以使用这些精度更高的预训练模型与骨干网络,提升自己在其他任务上的模型精度。
>>
* Q: 注意力机制是什么?目前有哪些比较常用的注意力机制方法?
* A: 注意力机制(Attention Mechanism)源于对人类视觉的研究。将注意力机制用在计算机视觉任务上,可以有效捕捉图片中有用的区域,从而提升整体网络性能。目前比较常用的有[SE block](https://arxiv.org/abs/1709.01507)[SK-block](https://arxiv.org/abs/1903.06586)[Non-local block](https://arxiv.org/abs/1711.07971)[GC block](https://arxiv.org/abs/1904.11492)[CBAM](https://arxiv.org/abs/1807.06521)等,核心思想就是去学习特征图在不同区域或者不同通道中的重要性,从而让网络更加注意显著性的区域。
<a name="模型训练相关"></a>
### 模型训练相关
>>
* Q: 使用深度卷积网络做图像分类如果训练一个拥有1000万个类的模型会碰到什么问题?
* A: 因为FC层参数很多,内存/显存/模型的存储占用都会大幅增大;模型收敛速度也会变慢一些。建议在这种情况下,在最后的FC层前加一层维度较小的FC,这样可以大幅减少模型的存储大小。
>>
* Q: 训练过程中,如果模型收敛效果很差,可能的原因有哪些呢?
* A: 主要有以下几个可以排查的地方:(1)应该检查数据标注,确保训练集和验证集的数据标注没有问题。(2)可以试着调整一下学习率(初期可以以10倍为单位进行调节),过大(训练震荡)或者过小(收敛太慢)的学习率都可能导致收敛效果差。(3)数据量太大,选择的模型太小,难以学习所有数据的特征。(4)可以看下数据预处理的过程中是否使用了归一化,如果没有使用归一化操作,收敛速度可能会比较慢。(5)如果数据量比较小,可以试着加载PaddleClas中提供的基于ImageNet-1k数据集的预训练模型,这可以大大提升训练收敛速度。(6)数据集存在长尾问题,可以参考[数据长尾问题解决方案](#jump)
>>
* Q: 训练图像分类任务时,该怎么选择合适的优化器?
* A: 优化器的目的是为了让损失函数尽可能的小,从而找到合适的参数来完成某项任务。目前业界主要用到的优化器有SGD、RMSProp、Adam、AdaDelt等,其中由于带momentum的SGD优化器广泛应用于学术界和工业界,所以我们发布的模型也大都使用该优化器来实现损失函数的梯度下降。带momentum的SGD优化器有两个劣势,其一是收敛速度慢,其二是初始学习率的设置需要依靠大量的经验,然而如果初始学习率设置得当并且迭代轮数充足,该优化器也会在众多的优化器中脱颖而出,使得其在验证集上获得更高的准确率。一些自适应学习率的优化器如Adam、RMSProp等,收敛速度往往比较快,但是最终的收敛精度会稍差一些。如果追求更快的收敛速度,我们推荐使用这些自适应学习率的优化器,如果追求更高的收敛精度,我们推荐使用带momentum的SGD优化器。
>>
* Q: 当前主流的学习率下降策略有哪些?一般需要怎么选择呢?
* A: 学习率是通过损失函数的梯度调整网络权重的超参数的速度。学习率越低,损失函数的变化速度就越慢。虽然使用低学习率可以确保不会错过任何局部极小值,但也意味着将花费更长的时间来进行收敛,特别是在被困在高原区域的情况下。在整个训练过程中,我们不能使用同样的学习率来更新权重,否则无法到达最优点,所以需要在训练过程中调整学习率的大小。在训练初始阶段,由于权重处于随机初始化的状态,损失函数相对容易进行梯度下降,所以可以设置一个较大的学习率。在训练后期,由于权重参数已经接近最优值,较大的学习率无法进一步寻找最优值,所以需要设置一个较小的学习率。在训练整个过程中,很多研究者使用的学习率下降方式是piecewise_decay,即阶梯式下降学习率,如在ResNet50标准的训练中,我们设置的初始学习率是0.1,每30epoch学习率下降到原来的1/10,一共迭代120epoch。除了piecewise_decay,很多研究者也提出了学习率的其他下降方式,如polynomial_decay(多项式下降)、exponential_decay(指数下降),cosine_decay(余弦下降)等,其中cosine_decay无需调整超参数,鲁棒性也比较高,所以成为现在提高模型精度首选的学习率下降方式。Cosine_decay和piecewise_decay的学习率变化曲线如下图所示,容易观察到,在整个训练过程中,cosine_decay都保持着较大的学习率,所以其收敛较为缓慢,但是最终的收敛效果较peicewise_decay更好一些。
![](../images/models/lr_decay.jpeg)
>>
* Q: Warmup学习率策略是什么?一般用在什么样的场景中?
* A: Warmup策略顾名思义就是让学习率先预热一下,在训练初期我们不直接使用最大的学习率,而是用一个逐渐增大的学习率去训练网络,当学习率增大到最高点时,再使用学习率下降策略中提到的学习率下降方式衰减学习率的值。如果使用较大的batch_size训练神经网络时,我们建议您使用warmup策略。实验表明,在batch_size较大时,warmup可以稳定提升模型的精度。在训练MobileNetV3等batch_size较大的实验中,我们默认将warmup中的epoch设置为5,即先用5epoch将学习率从0增加到最大值,再去做相应的学习率衰减。
>>
* Q: 什么是`batch size`?在模型训练中,怎么选择合适的`batch size`
* A: `batch size`是训练神经网络中的一个重要的超参数,该值决定了一次将多少数据送入神经网络参与训练。论文[Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677)中指出,当`batch size`的值与学习率的值呈线性关系时,收敛精度几乎不受影响。在训练ImageNet数据时,大部分的神经网络选择的初始学习率为0.1,`batch size`是256,所以根据实际的模型大小和显存情况,可以将学习率设置为0.1*k,batch_size设置为256*k。在实际任务中,也可以将该设置作为初始参数,进一步调节学习率参数并获得更优的性能。
>>
* Q: weight_decay是什么?怎么选择合适的weight_decay呢?
* A: 过拟合是机器学习中常见的一个名词,简单理解即为模型在训练数据上表现很好,但在测试数据上表现较差,在卷积神经网络中,同样存在过拟合的问题,为了避免过拟合,很多正则方式被提出,其中,weight_decay是其中一个广泛使用的避免过拟合的方式。在使用SGD优化器时,weight_decay等价于在最终的损失函数后添加L2正则化,L2正则化使得网络的权重倾向于选择更小的值,最终整个网络中的参数值更趋向于0,模型的泛化性能相应提高。在各大深度学习框架的实现中,该值表达的含义是L2正则前的系数,在paddle框架中,该值的名称是l2_decay,所以以下都称其为l2_decay。该系数越大,表示加入的正则越强,模型越趋于欠拟合状态。在训练ImageNet的任务中,大多数的网络将该参数值设置为1e-4,在一些小的网络如MobileNet系列网络中,为了避免网络欠拟合,该值设置为1e-5~4e-5之间。当然,该值的设置也和具体的数据集有关系,当任务的数据集较大时,网络本身趋向于欠拟合状态,可以将该值适当减小,当任务的数据集较小时,网络本身趋向于过拟合状态,可以将该值适当增大。下表展示了MobileNetV1_x0_25在ImageNet-1k上使用不同l2_decay的精度情况。由于MobileNetV1_x0_25是一个比较小的网络,所以l2_decay过大会使网络趋向于欠拟合状态,所以在该网络中,相对1e-4,3e-5是更好的选择。
| 模型 | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
>>
* Q: 标签平滑(label_smoothing)指的是什么?有什么效果呢?一般适用于什么样的场景中?
* A: Label_smoothing是深度学习中的一种正则化方法,其全称是 Label Smoothing Regularization(LSR),即标签平滑正则化。在传统的分类任务计算损失函数时,是将真实的one hot标签与神经网络的输出做相应的交叉熵计算,而label_smoothing是将真实的one hot标签做一个标签平滑的处理,使得网络学习的标签不再是一个hard label,而是一个有概率值的soft label,其中在类别对应的位置的概率最大,其他位置概率是一个非常小的数。具体的计算方式参见论文[2]。在label_smoothing里,有一个epsilon的参数值,该值描述了将标签软化的程度,该值越大,经过label smoothing后的标签向量的标签概率值越小,标签越平滑,反之,标签越趋向于hard label,在训练ImageNet-1k的实验里通常将该值设置为0.1。
在训练ImageNet-1k的实验中,我们发现,ResNet50大小级别及其以上的模型在使用label_smoothing后,精度有稳定的提升。下表展示了ResNet50_vd在使用label_smoothing前后的精度指标。同时,由于label_smoothing相当于一种正则方式,在相对较小的模型上,精度提升不明显甚至会有所下降,下表展示了ResNet18在ImageNet-1k上使用label_smoothing前后的精度指标。可以明显看到,在使用label_smoothing后,精度有所下降。
| 模型 | Use_label_smoothing | Test acc1 |
|:--:|:--:|:--:|
| ResNet50_vd | 0 | 77.9% |
| ResNet50_vd | 1 | 78.4% |
| ResNet18 | 0 | 71.0% |
| ResNet18 | 1 | 70.8% |
>>
* Q: 在训练的时候怎么通过训练集和验证集的准确率或者loss确定进一步的调优策略呢?
* A: 在训练网络的过程中,通常会打印每一个epoch的训练集准确率和验证集准确率,二者刻画了该模型在两个数据集上的表现。通常来说,训练集的准确率比验证集准确率微高或者二者相当是比较不错的状态。如果发现训练集的准确率比验证集高很多,说明在这个任务上已经过拟合,需要在训练过程中加入更多的正则,如增大l2_decay的值,加入更多的数据增广策略,加入label_smoothing策略等;如果发现训练集的准确率比验证集低一些,说明在这个任务上可能欠拟合,需要在训练过程中减弱正则效果,如减小l2_decay的值,减少数据增广方式,增大图片crop区域面积,减弱图片拉伸变换,去除label_smoothing等。
>>
* Q: 怎么使用已有的预训练模型提升自己的数据集的精度呢?
* A: 在现阶段计算机视觉领域中,加载预训练模型来训练自己的任务已成为普遍的做法,相比从随机初始化开始训练,加载预训练模型往往可以提升特定任务的精度。一般来说,业界广泛使用的预训练模型是通过训练128万张图片1000类的ImageNet-1k数据集得到的,该预训练模型的fc层权重是一个k\*1000的矩阵,其中k是fc层以前的神经元数,在加载预训练权重时,无需加载fc层的权重。在学习率方面,如果您的任务训练的数据集特别小(如小于1千张),我们建议你使用较小的初始学习率,如0.001(batch_size:256,下同),以免较大的学习率破坏预训练权重。如果您的训练数据集规模相对较大(大于10万),我们建议你尝试更大的初始学习率,如0.01或者更大。
<a name="数据相关"></a>
### 数据相关
>>
* Q: 图像分类的数据预处理过程一般包括哪些步骤?
* A: 以在ImageNet-1k数据集上训练ResNet50为例,一张图片被输入进网络,主要有图像解码、随机裁剪、随机水平翻转、标准化、数据重排,组batch并送进网络这几个步骤。图像解码指的是将图片文件读入到内存中,随机裁剪指的是将读入的图像随机拉伸并裁剪到长宽均为224的图像,随机水平翻转指的是对裁剪后的图片以0.5的概率进行水平翻转,标准化指的是将图片每个通道的数据通过去均值实现中心化的处理,使得数据尽可能符合`N(0,1)`的正态分布,数据重排指的是将数据由`[224,224,3]`的格式变为`[3,224,224]`的格式,组batch指的是将多幅图像组成一个批数据,送进网络进行训练。
>>
* Q: 随机裁剪是怎么影响小模型训练的性能的?
* A: 在ImageNet-1k数据的标准预处理中,随机裁剪函数中定义了scale和ratio两个值,两个值分别确定了图片crop的大小和图片的拉伸程度,其中scale的默认取值范围是0.08-1(lower_scale-upper_scale),ratio的默认取值范围是3/4-4/3(lower_ratio-upper_ratio)。在非常小的网络训练中,此类数据增强会使得网络欠拟合,导致精度有所下降。为了提升网络的精度,可以使其数据增强变的更弱,即增大图片的crop区域或者减弱图片的拉伸变换程度。我们可以分别通过增大lower_scale的值或缩小lower_ratio与upper_scale的差距来实现更弱的图片变换。下表列出了使用不同lower_scale训练MobileNetV2_x0_25的精度,可以看到,增大图片的crop区域面积后训练精度和验证精度均有提升。
| 模型 | Scale取值范围 | Train_acc1/acc5 | Test_acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
>>
* Q: 数据量不足的情况下,目前有哪些常见的数据增广方法来增加训练样本的丰富度呢?
* A: PaddleClas中将目前比较常见的数据增广方法分为了三大类,分别是图像变换类、图像裁剪类和图像混叠类,图像变换类主要包括AutoAugment和RandAugment,图像裁剪类主要包括CutOut、RandErasing、HideAndSeek和GridMask,图像混叠类主要包括Mixup和Cutmix,更详细的关于数据增广的介绍可以参考:[数据增广章节](./advanced_tutorials/image_augmentation/ImageAugment.md)
>>
* Q: 对于遮挡情况比较常见的图像分类场景,该使用什么数据增广方法去提升模型的精度呢?
* A: 在训练的过程中可以尝试对训练集使用CutOut、RandErasing、HideAndSeek和GridMask等裁剪类数据增广方法,让模型也能够不止学习到显著区域,也能关注到非显著性区域,从而在遮挡的情况下,也能较好地完成识别任务。
>>
* Q: 对于色彩变换情况比较复杂的情况下,应该使用哪些数据增广方法提升模型精度呢?
* A: 可以考虑使用AutoAugment或者RandAugment的数据增广策略,这两种策略中都包括了锐化、直方图均衡化等丰富的颜色变换,可以让模型在训练的过程中对这些变换更加鲁棒。
>>
* Q: Mixup和Cutmix的工作原理是什么?为什么它们也是非常有效的数据增广方法?
* A: Mixup通过线性叠加两张图片生成新的图片,对应label也进行线性叠加用以训练,Cutmix则是从一幅图中随机裁剪出一个 感兴趣区域(ROI),然后覆盖当前图像中对应的区域,label也按照图像面积比例进行线性叠加。它们其实也是生成了和训练集不同的样本和label并让网络去学习,从而扩充了样本的丰富度。
>>
* Q: 对于精度要求不是那么高的图像分类任务,大概需要准备多大的训练数据集呢?
* A: 训练数据的数量和需要解决问题的复杂度有关系。难度越大,精度要求越高,则数据集需求越大,而且一般情况实际中的训练数据越多效果越好。当然,一般情况下,在加载预训练模型的情况下,每个类别包括10-20张图像即可保证基本的分类效果;不加载预训练模型的情况下,每个类别需要至少包含100-200张图像以保证基本的分类效果。
>>
* Q: <span id="jump">对于长尾分布的数据集,目前有哪些比较常用的方法?</span>
* A: (1)可以对数据量比较少的类别进行重采样,增加其出现的概率;(2)可以修改loss,增加图像较少对应的类别的图片的loss权重;(3)可以借鉴迁移学习的方法,从常见类别中学习通用知识,然后迁移到少样本的类别中。
<a name="模型推理与预测相关"></a>
### 模型推理与预测相关
>>
* Q: 有时候图像中只有小部分区域是所关注的前景物体,直接拿原图来进行分类的话,识别效果很差,这种情况要怎么做呢?
* A: 可以在分类之前先加一个主体检测的模型,将前景物体检测出来之后再进行分类,可以大大提升最终的识别效果。如果不考虑时间成本,也可以使用multi-crop的方式对所有的预测做融合来决定最终的类别。
>>
* Q: 目前推荐的,模型预测方式有哪些?
* A: 在模型训练完成之后,推荐使用导出的固化模型(inference model),基于Paddle预测引擎进行预测,目前支持python inference与cpp inference。如果希望基于服务化部署预测模型,那么推荐使用HubServing的部署方式。
>>
* Q: 模型训练完成之后,有哪些比较合适的预测方法进一步提升模型精度呢?
* A: (1)可以使用更大的预测尺度,比如说训练的时候使用的是224,那么预测的时候可以考虑使用288或者320,这会直接带来0.5%左右的精度提升。(2)可以使用测试时增广的策略(Test Time Augmentation, TTA),将测试集通过旋转、翻转、颜色变换等策略,创建多个副本,并分别预测,最后将所有的预测结果进行融合,这可以大大提升预测结果的精度和鲁棒性。(3)当然,也可以使用多模型融合的策略,将多个模型针对相同图片的预测结果进行融合。
>>
* Q: 多模型融合的时候,该怎么选择合适的模型进行融合呢?
* A: 在不考虑预测速度的情况下,建议选择精度尽量高的模型;同时建议选择不同结构或者系列的模型进行融合,比如在精度相似的情况下,ResNet50_vd与Xception65的模型融合结果往往比ResNet50_vd与ResNet101_vd的模型融合结果要好一些。
>>
* Q: 使用固定的模型进行预测时有哪些比较常用的加速方法?
* A: (1)使用性能更优的GPU进行预测;(2)增大预测的batch size;(3)使用TenorRT以及FP16半精度浮点数等方法进行预测。
<a name="PaddleClas使用问题"></a>
## PaddleClas使用问题
>>
* Q: 多卡评估时,为什么每张卡输出的精度指标不相同?
* A: 目前PaddleClas基于fleet api使用多卡,在多卡评估时,每张卡都是单独读取各自part的数据,不同卡中计算的图片是不同的,因此最终指标也会有微量差异,如果希望得到准确的评估指标,可以使用单卡评估。
>>
* Q: 在配置文件的`TRAIN`字段中配置了`mix`的参数,为什么`mixup`的数据增广预处理没有生效呢?
* A: 使用mixup时,数据预处理部分与模型输入部分均需要修改,因此还需要在配置文件中显式地配置`use_mix: True`,才能使得`mixup`生效。
......@@ -45,4 +212,63 @@ VALID:
>>
* Q: 如果想将保存的`pdparams`模型参数文件转换为早期版本(Paddle1.7.0之前)的零碎文件(每个文件均为一个单独的模型参数),该怎么实现呢?
* A: 可以首先导入`pdparams`模型,之后使用`fluid.io.save_vars`函数将模型保存为零散的碎文件。
* A: 可以首先导入`pdparams`模型,之后使用`fluid.io.save_vars`函数将模型保存为零散的碎文件。示例代码如下,最终所有零散文件会被保存在`path_to_save_var`目录下。
```
fluid.load(
program=infer_prog, model_path=args.pretrained_model, executor=exe)
state = fluid.io.load_program_state(args.pretrained_model)
def exists(var):
return var.name in state
fluid.io.save_vars(exe, "./path_to_save_var", infer_prog, predicate=exists)
```
>>
* Q: python2下,使用visualdl的时候,报出以下错误,`TypeError: __init__() missing 1 required positional argument: 'sync_cycle'`,这是为什么呢?
* A: 目前VisualDL仅支持在python3下运行,且需要2.0以上的版本,如果版本不符合要求,可以通过以下方式进行安装:`pip3 install visualdl==2.0.0b8 -i https://mirror.baidu.com/pypi/simple`
>>
* Q: 自己在测ResNet50_vd预测单张图片速度的时候发现比官网提供的速度benchmark慢了很多,而且CPU速度比GPU速度快很多,这个是为什么呢?
* A: 模型预测需要初始化,初始化的过程比较耗时,因此在统计预测速度的时候,需要批量跑一批图片,去除前若干张图片的预测耗时,再统计平均时间。测试单张图片时GPU反而比CPU慢,是因为GPU的初始化比CPU耗时长得多。
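统计预测速度时可以参考下面的计时方式示意(`predict_fn`、warmup、repeats等均为示例名称与参数):先空跑若干次预热,初始化与显存分配等耗时不计入统计,再对后续若干次预测取平均。
```
import time

def benchmark(predict_fn, img, warmup=10, repeats=100):
    """先预热warmup次,再统计repeats次预测的平均耗时(单位:ms)。"""
    for _ in range(warmup):
        predict_fn(img)
    start = time.time()
    for _ in range(repeats):
        predict_fn(img)
    return (time.time() - start) / repeats * 1000

# 用法示意:avg_ms = benchmark(my_predict_fn, my_img)
```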
>>
* Q: 在动态图中加载静态图预训练模型的时候,需要注意哪些问题?
* A: 在使用infer.py预测单张图片或者文件夹中的图片时,需要注意指定[infer.py](https://github.com/PaddlePaddle/PaddleClas/blob/53c5850df7c49a1bfcd8d989e6ccbea61f406a1d/tools/infer/infer.py#L40)中的`load_static_weights`为True,在finetune或者评估的时候需要添加`-o load_static_weights=True`的参数。
>>
* Q: 灰度图可以用于模型训练吗?
* A: 灰度图也可以用于模型训练,不过需要修改模型的输入shape为`[1, 224, 224]`,此外数据增广部分也需要注意适配一下。不过为了更好地使用PaddleClas代码的话,即使是灰度图,也建议调整为3通道的图片进行训练(RGB通道的像素值相等)。
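将灰度图转换为3通道图片可以参考下面的示意代码(使用PIL与numpy,函数名为示例):
```
import numpy as np
from PIL import Image

def gray_to_rgb(path):
    """将灰度图转换为3通道RGB图(三个通道像素值相同),以复用默认的预处理流程。"""
    img = Image.open(path).convert("L")       # 确保为单通道灰度图
    arr = np.asarray(img)
    rgb = np.stack([arr, arr, arr], axis=-1)  # HWC,三个通道相同
    return Image.fromarray(rgb)
```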
>>
* Q: 怎么在Windows或者CPU上进行模型训练呢?
* A: 可以参考[PaddleClas开始使用教程](https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/tutorials/getting_started.md),其中详细介绍了在Linux、Windows、CPU等环境中进行模型训练、评估与预测的方法。
>>
* Q: 怎样在模型训练的时候使用label smoothing呢?
* A: 可以在配置文件中设置label smoothing的epsilon值,如`ls_epsilon: 0.1`表示将该值设置为0.1;若该值为-1,则表示不使用label smoothing。
>>
* Q: PaddleClas提供的10W类图像分类预训练模型能否用于模型推断呢?
* A: 该10W类图像分类预训练模型没有提供fc全连接层的参数,无法用于模型推断,目前可以用于模型微调。
>>
* Q: 在使用`tools/infer/predict.py`进行模型预测的时候,报了这个错误:`Error: Pass tensorrt_subgraph_pass has not been registered`,这是为什么呢?
* A: 如果希望使用TensorRT进行模型预测推理的话,需要编译带TensorRT的PaddlePaddle,编译的时候参考以下的编译方式,其中`TENSORRT_ROOT`表示TensorRT的路径。
```
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON \
-DPY_VERSION=2.7 \
-DTENSORRT_ROOT=/usr/local/TensorRT6-cuda10.0-cudnn7/
make -j16
make inference_lib_dist
```
>>
* Q: 怎样在训练的时候使用自动混合精度(Automatic Mixed Precision, AMP)训练呢?
* A: 可以参考[ResNet50_fp16.yml](https://github.com/PaddlePaddle/PaddleClas/blob/master/configs/ResNet/ResNet50_fp16.yml)这个配置文件;具体地,如果希望自己的配置文件在模型训练的时候也支持自动混合精度,可以在配置文件中添加下面的配置信息。
```
use_fp16: True
amp_scale_loss: 128.0
use_dynamic_loss_scaling: True
```
......@@ -21,11 +21,13 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W18_C_ssld | 0.816 | 0.958 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W30_C | 0.780 | 0.940 | 0.782 | 0.942 | 16.230 | 37.710 |
| HRNet_W32_C | 0.783 | 0.942 | 0.785 | 0.942 | 17.860 | 41.230 |
| HRNet_W40_C | 0.788 | 0.945 | 0.789 | 0.945 | 25.410 | 57.550 |
| HRNet_W44_C | 0.790 | 0.945 | 0.789 | 0.944 | 29.790 | 67.060 |
| HRNet_W48_C | 0.790 | 0.944 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W48_C_ssld | 0.836 | 0.968 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
......@@ -34,11 +36,13 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| HRNet_W18_C | 224 | 256 | 7.368 |
| HRNet_W18_C_ssld | 224 | 256 | 7.368 |
| HRNet_W30_C | 224 | 256 | 9.402 |
| HRNet_W32_C | 224 | 256 | 9.467 |
| HRNet_W40_C | 224 | 256 | 10.739 |
| HRNet_W44_C | 224 | 256 | 11.497 |
| HRNet_W48_C | 224 | 256 | 12.165 |
| HRNet_W48_C_ssld | 224 | 256 | 12.165 |
| HRNet_W64_C | 224 | 256 | 15.003 |
......@@ -49,9 +53,11 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W18_C_ssld | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 |
| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 |
| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 |
| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 |
| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W48_C_ssld | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 |
......@@ -9,6 +9,8 @@ ShuffleNet系列网络是旷视提出的轻量化网络结构,到目前为止
MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络,为了进一步提升效果,将relu和sigmoid激活函数分别替换为hard_swish与hard_sigmoid激活函数,同时引入了一些专门减小网络计算量的改进策略。
GhostNet是华为于2020年提出的一种全新的轻量化网络结构,通过引入ghost module,大大减缓了传统深度网络中特征的冗余计算问题,使得网络的参数量和计算量大大降低。
![](../../images/models/mobile_arm_top1.png)
![](../../images/models/mobile_arm_storage.png)
......@@ -18,7 +20,7 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png)
目前PaddleClas开源的的移动端系列的预训练模型一共有32个,其指标如图所示。从图片可以看出,越新的轻量级模型往往有更优的表现,MobileNetV3代表了目前最新的轻量级神经网络结构。在MobileNetV3中,作者为了获得更高的精度,在global-avg-pooling后使用了1x1的卷积。该操作大幅提升了参数量但对计算量影响不大,所以如果从存储角度评价模型的优异程度,MobileNetV3优势不是很大,但由于其更小的计算量,使得其有更快的推理速度。此外,我们模型库中的ssld蒸馏模型表现优异,从各个考量角度下,都刷新了当前轻量级模型的精度。由于MobileNetV3模型结构复杂,分支较多,对GPU并不友好,GPU预测速度不如MobileNetV1
目前PaddleClas开源的移动端系列的预训练模型一共有35个,其指标如图所示。从图中可以看出,越新的轻量级模型往往有更优的表现,MobileNetV3代表了目前主流的轻量级神经网络结构。在MobileNetV3中,作者为了获得更高的精度,在global-avg-pooling后使用了1x1的卷积。该操作大幅提升了参数量但对计算量影响不大,所以如果从存储角度评价模型的优异程度,MobileNetV3优势不是很大,但由于其更小的计算量,其推理速度更快。此外,我们模型库中的ssld蒸馏模型表现优异,在各个考量角度下都刷新了当前轻量级模型的精度。由于MobileNetV3模型结构复杂、分支较多,对GPU并不友好,其GPU预测速度不如MobileNetV1。GhostNet于2020年提出,通过引入ghost的网络设计理念,大大降低了计算量和参数量,同时在精度上也超过了此前精度最高的MobileNetV3网络结构。
## 精度、FLOPS和参数量
......@@ -47,6 +49,7 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| MobileNetV3_small_<br>x0_75 | 0.660 | 0.863 | 0.654 | | 0.088 | 2.370 |
| MobileNetV3_small_<br>x0_5 | 0.592 | 0.815 | 0.580 | | 0.043 | 1.900 |
| MobileNetV3_small_<br>x0_35 | 0.530 | 0.764 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_small_<br>x0_35_ssld | 0.556 | 0.777 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_large_<br>x1_0_ssld | 0.790 | 0.945 | | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.761 | | | | | |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.901 | | | 0.123 | 2.940 |
......@@ -57,6 +60,9 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| ShuffleNetV2_x1_5 | 0.716 | 0.902 | 0.726 | | 0.580 | 3.470 |
| ShuffleNetV2_x2_0 | 0.732 | 0.912 | 0.749 | | 1.120 | 7.320 |
| ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 |
| GhostNet_x0_5 | 0.668 | 0.869 | 0.662 | 0.866 | 0.082 | 2.600 |
| GhostNet_x1_0 | 0.740 | 0.916 | 0.739 | 0.914 | 0.294 | 5.200 |
| GhostNet_x1_3 | 0.757 | 0.925 | 0.757 | 0.927 | 0.440 | 7.300 |
## 基于SD855的预测速度和存储大小
......@@ -85,6 +91,7 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| MobileNetV3_small_x0_75 | 5.284 | 9.600 |
| MobileNetV3_small_x0_5 | 3.352 | 7.800 |
| MobileNetV3_small_x0_35 | 2.635 | 6.900 |
| MobileNetV3_small_x0_35_ssld | 2.635 | 6.900 |
| MobileNetV3_large_x1_0_ssld | 19.308 | 21.000 |
| MobileNetV3_large_x1_0_ssld_int8 | 14.395 | 10.000 |
| MobileNetV3_small_x1_0_ssld | 6.546 | 12.000 |
......@@ -95,6 +102,9 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| ShuffleNetV2_x1_5 | 19.352 | 14.000 |
| ShuffleNetV2_x2_0 | 34.770 | 28.000 |
| ShuffleNetV2_swish | 16.023 | 9.100 |
| GhostNet_x0_5 | 5.714 | 10.000 |
| GhostNet_x1_0 | 13.558 | 20.000 |
| GhostNet_x1_3 | 19.982 | 29.000 |
## 基于T4 GPU的预测速度
......@@ -123,6 +133,7 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 |
| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 |
| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_small_x0_35_ssld | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 |
......
# ResNeSt与RegNet系列
## 概述
ResNeSt系列模型是在2020年提出的,在原有的ResNet网络结构上做了改进,通过引入K个Group、并在不同Group中加入类似于SEBlock的attention模块,使得精度相比于基础模型ResNet有了大幅度的提高,且参数量和FLOPS与基础的ResNet基本保持一致。
RegNet是由Facebook于2020年提出,旨在深化设计空间(design space)的设计理念。其在AnyNetX的基础上逐步改进,通过加入共享瓶颈比例(bottleneck ratio)、共享组宽度(group width)、调整网络深度与宽度等策略,最终实现简化设计空间结构、提高设计空间的可解释性、改善设计空间的质量,并保持设计空间的模型多样性的目的。最终设计出的模型在类似的条件下性能优于EfficientNet,并且在GPU上的速度提高了5倍。
## 精度、FLOPS和参数量
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeSt50_fast_1s1x64d | 0.8035 | 0.9528 | 0.8035 | - | 8.68 | 26.3 |
| ResNeSt50 | 0.8102 | 0.9542 | 0.8113 | - | 10.78 | 27.5 |
| RegNetX_4GF | 0.7850 | 0.9416 | 0.7860 | - | 8.0 | 22.1 |
## 基于T4 GPU的预测速度
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNeSt50_fast_1s1x64d | 224 | 256 | 3.46466 | 5.56647 | 9.11848 | 3.45405 | 8.72680 | 15.48710 |
| ResNeSt50 | 224 | 256 | 7.05851 | 8.97676 | 13.34704 | 6.16248 | 12.0633 | 21.49936 |
| RegNetX_4GF | 224 | 256 | 6.69042 | 8.01664 | 11.60608 | 6.46478 | 11.19862 | 16.89089 |
......@@ -32,6 +32,7 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠
| ResNet18_vd | 0.723 | 0.908 | | | 4.140 | 11.710 |
| ResNet34 | 0.746 | 0.921 | 0.732 | 0.913 | 7.360 | 21.800 |
| ResNet34_vd | 0.760 | 0.930 | | | 7.390 | 21.820 |
| ResNet34_vd_ssld | 0.797 | 0.949 | | | 7.390 | 21.820 |
| ResNet50 | 0.765 | 0.930 | 0.760 | 0.930 | 8.190 | 25.560 |
| ResNet50_vc | 0.784 | 0.940 | | | 8.670 | 25.580 |
| ResNet50_vd | 0.791 | 0.944 | 0.792 | 0.946 | 8.670 | 25.580 |
......@@ -58,6 +59,7 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠
| ResNet18_vd | 224 | 256 | 1.603 |
| ResNet34 | 224 | 256 | 2.272 |
| ResNet34_vd | 224 | 256 | 2.343 |
| ResNet34_vd_ssld | 224 | 256 | 2.343 |
| ResNet50 | 224 | 256 | 2.939 |
| ResNet50_vc | 224 | 256 | 3.041 |
| ResNet50_vd | 224 | 256 | 3.165 |
......@@ -79,6 +81,7 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠
| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
| ResNet34_vd_ssld | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
......
......@@ -32,6 +32,7 @@ Res2Net是2019年提出的一种全新的对ResNet的改进方案,该方案可
| Res2Net50_14w_8s | 0.795 | 0.947 | 0.781 | 0.939 | 9.010 | 25.720 |
| Res2Net101_vd_26w_4s | 0.806 | 0.952 | | | 16.670 | 45.220 |
| Res2Net200_vd_26w_4s | 0.812 | 0.957 | | | 31.490 | 76.210 |
| Res2Net200_vd_26w_4s_ssld | **0.851** | 0.974 | | | 31.490 | 76.210 |
| ResNeXt50_32x4d | 0.778 | 0.938 | 0.778 | | 8.020 | 23.640 |
| ResNeXt50_vd_32x4d | 0.796 | 0.946 | | | 8.500 | 23.660 |
| ResNeXt50_64x4d | 0.784 | 0.941 | | | 15.060 | 42.360 |
......
......@@ -45,6 +45,7 @@ python tools/infer/predict.py \
- [ResNet50_vc](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vc_pretrained.tar)
- [ResNet18_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar)
- [ResNet34_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar)
- [ResNet34_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_ssld_pretrained.tar)
- [ResNet50_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar)
- [ResNet50_vd_v2](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_v2_pretrained.tar)
- [ResNet101_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar)
......@@ -93,6 +94,10 @@ python tools/infer/predict.py \
- [ShuffleNetV2_x1_5](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x1_5_pretrained.tar)
- [ShuffleNetV2_x2_0](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x2_0_pretrained.tar)
- [ShuffleNetV2_swish](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_swish_pretrained.tar)
- GhostNet系列<sup>[[23](#ref23)]</sup>([论文地址](https://arxiv.org/pdf/1911.11907.pdf))
- [GhostNet_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x0_5_pretrained.pdparams)
- [GhostNet_x1_0](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_0_pretrained.pdparams)
- [GhostNet_x1_3](https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_3_pretrained.pdparams)
- SEResNeXt与Res2Net系列
......@@ -126,6 +131,7 @@ python tools/infer/predict.py \
- [Res2Net50_14w_8s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_14w_8s_pretrained.tar)
- [Res2Net101_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net101_vd_26w_4s_pretrained.tar)
- [Res2Net200_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_pretrained.tar)
- [Res2Net200_vd_26w_4s_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_ssld_pretrained.tar)
- Inception系列
......@@ -144,11 +150,13 @@ python tools/infer/predict.py \
- HRNet系列
- HRNet系列<sup>[[13](#ref13)]</sup>([论文地址](https://arxiv.org/abs/1908.07919))
- [HRNet_W18_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar)
- [HRNet_W18_C_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_ssld_pretrained.tar)
- [HRNet_W30_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W30_C_pretrained.tar)
- [HRNet_W32_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W32_C_pretrained.tar)
- [HRNet_W40_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W40_C_pretrained.tar)
- [HRNet_W44_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W44_C_pretrained.tar)
- [HRNet_W48_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_pretrained.tar)
- [HRNet_W48_C_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_ssld_pretrained.tar)
- [HRNet_W64_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W64_C_pretrained.tar)
......@@ -185,6 +193,13 @@ python tools/infer/predict.py \
- [ResNeXt101_32x48d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x48d_wsl_pretrained.tar)
- [Fix_ResNeXt101_32x48d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNeXt101_32x48d_wsl_pretrained.tar)
- ResNeSt与RegNet系列
- ResNeSt系列<sup>[[24](#ref24)]</sup>([论文地址](https://arxiv.org/abs/2004.08955))
- [ResNeSt50_fast_1s1x64d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_fast_1s1x64d_pretrained.pdparams)
- [ResNeSt50](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_pretrained.pdparams)
- RegNet系列<sup>[[25](#ref25)]</sup>([paper link](https://arxiv.org/abs/2003.13678))
- [RegNetX_4GF](https://paddle-imagenet-models-name.bj.bcebos.com/RegNetX_4GF_pretrained.pdparams)
- 其他模型
- AlexNet系列<sup>[[18](#ref18)]</sup>([论文地址](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf))
......@@ -254,3 +269,9 @@ python tools/infer/predict.py \
<a name="ref21">[21]</a> Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.
<a name="ref22">[22]</a> Ding X, Guo Y, Ding G, et al. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1911-1920.
<a name="ref23">[23]</a> Han K, Wang Y, Tian Q, et al. GhostNet: More features from cheap operations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
<a name="ref24">[24]</a> Zhang H, Wu C, Zhang Z, et al. Resnest: Split-attention networks[J]. arXiv preprint arXiv:2004.08955, 2020.
<a name="ref25">[25]</a> Radosavovic I, Kosaraju R P, Girshick R, et al. Designing network design spaces[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10428-10436.
......@@ -2,17 +2,89 @@
---
请事先参考[安装指南](install.md)配置运行环境,并根据[数据说明](./data.md)文档准备ImageNet1k数据,本章节下面所有的实验均以ImageNet1k数据集为例。
## 一、设置环境变量
## 1. Windows或者CPU上训练与评估
**设置PYTHONPATH环境变量:**
如果在Windows系统或者CPU上进行训练与评估,推荐使用`tools/train_multi_platform.py`与`tools/eval_multi_platform.py`脚本。
### 1.1 模型训练
准备好配置文件之后,可以使用下面的方式启动训练。
```
python tools/train_multi_platform.py \
-c configs/ResNet/ResNet50.yaml \
-o model_save_dir=./output/ \
-o use_gpu=True
```
其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,如`-o model_save_dir=./output/`表示将配置文件中的`model_save_dir`修改为`./output/`,`-o use_gpu=True`表示使用GPU进行训练。如果希望使用CPU进行训练,则需要将`use_gpu`设置为`False`。
也可以直接修改模型对应的配置文件来更新配置。具体配置参数可参考[配置文档](config.md)。
* 输出日志示例如下:
* 如果在训练使用了mixup或者cutmix的数据增广方式,那么日志中只会打印出loss(损失)、lr(学习率)以及该minibatch的训练时间。
```
train step:890 loss: 6.8473 lr: 0.100000 elapse: 0.157s
```
* 如果训练过程中没有使用mixup或者cutmix的数据增广,那么除了loss(损失)、lr(学习率)以及该minibatch的训练时间之外,日志中也会打印出top-1与top-k(默认为5)的信息。
```
epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193s
```
训练期间可以通过VisualDL实时观察loss变化,启动命令如下:
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
visualdl --logdir ./scalar --host <host_IP> --port <port_num>
```
## 二、模型训练与评估
### 1.2 模型微调
PaddleClas 提供模型训练与评估脚本:`tools/train.py``tools/eval.py`
* 根据自己的数据集配置好配置文件之后,可以通过加载预训练模型进行微调,如下所示。
```
python tools/train_multi_platform.py \
-c configs/ResNet/ResNet50.yaml \
-o pretrained_model="./pretrained/ResNet50_pretrained"
```
其中`pretrained_model`用于设置加载预训练权重的地址,使用时需要换成自己的预训练模型权重路径,也可以直接在配置文件中修改该路径。
### 1.3 模型恢复训练
* 如果训练任务因为其他原因被终止,也可以加载断点权重继续训练。
```
python tools/train_multi_platform.py \
-c configs/ResNet/ResNet50.yaml \
-o checkpoints="./output/ResNet/0/ppcls"
```
其中配置文件不需要做任何修改,只需要在训练时添加`checkpoints`参数即可,表示加载的断点权重路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
### 1.4 模型评估
* 可以通过以下命令完成模型评估。
```bash
python tools/eval_multi_platform.py \
-c ./configs/eval.yaml \
-o ARCHITECTURE.name="ResNet50_vd" \
-o pretrained_model=path_to_pretrained_models
```
可以更改`configs/eval.yaml`中的`ARCHITECTURE.name`字段和`pretrained_model`字段来配置评估模型,也可以通过-o参数更新配置。
**注意:** 加载预训练模型时,需要指定预训练模型的前缀,例如预训练模型参数所在的文件夹为`output/ResNet50_vd/19`,预训练模型参数的名称为`output/ResNet50_vd/19/ppcls.pdparams`,则`pretrained_model`参数需要指定为`output/ResNet50_vd/19/ppcls`,PaddleClas会自动补齐`.pdparams`的后缀。
## 2. 基于Linux+GPU的模型训练与评估
如果机器环境为Linux+GPU,那么推荐使用PaddleClas提供的模型训练与评估脚本:`tools/train.py`与`tools/eval.py`,可以更快地完成训练与评估任务。
### 2.1 模型训练
......@@ -28,12 +100,6 @@ python -m paddle.distributed.launch \
-c ./configs/ResNet/ResNet50_vd.yaml
```
- 输出日志示例如下:
```
epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193
```
可以通过添加-o参数来更新配置:
```bash
......@@ -42,41 +108,58 @@ python -m paddle.distributed.launch \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml \
-o use_mix=1 \
--vdl_dir=./scalar/
--vdl_dir=./scalar/
```
- 输出日志示例如下:
输出日志信息的格式同上。
### 2.2 模型微调
* 根据自己的数据集配置好配置文件之后,可以通过加载预训练模型进行微调,如下所示。
```
epoch:0 train step:522 loss:1.6330 lr:0.100000 elapse:0.210
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c configs/ResNet/ResNet50.yaml \
-o pretrained_model="./pretrained/ResNet50_pretrained"
```
也可以直接修改模型对应的配置文件更新配置。具体配置参数参考[配置文档](config.md)
其中`pretrained_model`用于设置加载预训练权重的地址,使用时需要换成自己的预训练模型权重路径,也可以直接在配置文件中修改该路径
训练期间可以通过VisualDL实时观察loss变化,启动命令如下:
* [30分钟玩转PaddleClas教程](./quick_start.md)中包含大量模型微调的示例,可以参考该章节在特定的数据集上进行模型微调。
```bash
visualdl --logdir ./scalar --host <host_IP> --port <port_num>
### 2.3 模型恢复训练
* 如果训练任务因为其他原因被终止,也可以加载断点权重继续训练。
```
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c configs/ResNet/ResNet50.yaml \
-o checkpoints="./output/ResNet/0/ppcls"
```
其中配置文件不需要做任何修改,只需要在训练时添加`checkpoints`参数即可,表示加载的断点权重路径,使用该参数会同时加载保存的模型参数权重和学习率、优化器等信息。
### 2.2 模型微调
* [30分钟玩转PaddleClas](./quick_start.md)中包含大量模型微调的示例,可以参考该章节在特定的数据集上进行模型微调。
### 2.4 模型评估
### 2.3 模型评估
* 可以通过以下命令完成模型评估。
```bash
python tools/eval.py \
-c ./configs/eval.yaml \
-o ARCHITECTURE.name="ResNet50_vd" \
-o pretrained_model=path_to_pretrained_models
python -m paddle.distributed.launch \
--selected_gpus="0" \
tools/eval.py \
-c ./configs/eval.yaml \
-o ARCHITECTURE.name="ResNet50_vd" \
-o pretrained_model=path_to_pretrained_models
```
可以更改configs/eval.yaml中的`ARCHITECTURE.name`字段和pretrained_model字段来配置评估模型,也可以通过-o参数更新配置。
**注意:** 加载预训练模型时,需要指定预训练模型的前缀,例如预训练模型参数所在的文件夹为`output/ResNet50_vd/19`,预训练模型参数的名称为`output/ResNet50_vd/19/ppcls.pdparams`,则`pretrained_model`参数需要指定为`output/ResNet50_vd/19/ppcls`,PaddleClas会自动补齐`.pdparams`的后缀。
## 三、模型推理
......@@ -97,6 +180,6 @@ python tools/infer/predict.py \
-p params文件路径 \
-i 图片路径 \
--use_gpu=1 \
--use_tensorrt=True
--use_tensorrt=False
```
更多使用方法和推理方式请参考[分类预测框架](../extension/paddle_inference.md)
......@@ -67,3 +67,5 @@ visualdl可能出现安装失败,请尝试
pip3 install --upgrade visualdl==2.0.0b3 -i https://mirror.baidu.com/pypi/simple
```
此外,visualdl目前只支持在python3下运行,因此如果希望使用visualdl,需要使用python3。
# 更新日志
- 2020.10.12
* 添加Paddle-Lite demo。
- 2020.10.10
* 添加cpp inference demo。
* 添加FAQ30问。
- 2020.09.17
* 添加HRNet_W48_C_ssld模型,在ImageNet上Top-1 Acc可达0.836;添加ResNet34_vd_ssld模型,在ImageNet上Top-1 Acc可达0.797。
- 2020.09.07
* 添加HRNet_W18_C_ssld模型,在ImageNet上Top-1 Acc可达0.81162;添加MobileNetV3_small_x0_35_ssld模型,在ImageNet上Top-1 Acc可达0.5555。
- 2020.07.14
* 添加Res2Net200_vd_26w_4s_ssld模型,在ImageNet上Top-1 Acc可达85.13%。
* 添加Fix_ResNet50_vd_ssld_v2模型,在ImageNet上Top-1 Acc可达84.0%。
- 2020.06.17
* 添加英文文档。
- 2020.06.12
* 添加对windows和CPU环境的训练与评估支持。
- 2020.05.17
* 添加混合精度训练。
......