diff --git a/.gitignore b/.gitignore index d8f8bca6ec5faa7f12c78c2dbeb9c7d11e903b97..f1e7651dabdb487f76efa9c992407bb077feac35 100644 --- a/.gitignore +++ b/.gitignore @@ -12,3 +12,4 @@ build/ log/ nohup.out .DS_Store +.idea diff --git a/MANIFEST.in b/MANIFEST.in index b0a4f6dc151b0e11d83655d3f7ef40c200a88ee8..97372da0035488913c83dfe6f2ddfb8fe0c906c3 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,7 +1,8 @@ include LICENSE.txt include README.md include docs/en/whl_en.md -recursive-include deploy/python predict_cls.py preprocess.py postprocess.py det_preprocess.py +recursive-include deploy/python *.py +recursive-include deploy/configs *.yaml recursive-include deploy/utils get_image_list.py config.py logger.py predictor.py recursive-include ppcls/ *.py *.txt \ No newline at end of file diff --git a/README.md b/README.md index 44885f554afdc7e00188fae2987e7fbbb4278fcc..13c4f964bb9063f28d6e08dfb8c6b828a81d2536 120000 --- a/README.md +++ b/README.md @@ -1 +1 @@ -README_ch.md \ No newline at end of file +README_en.md \ No newline at end of file diff --git a/README_ch.md b/README_ch.md index 74f02ecca839b53217b2189a65afaf0b012b3261..fbc7aa6fcf1180d6ab733e3d739dca0f3861e149 100644 --- a/README_ch.md +++ b/README_ch.md @@ -4,106 +4,130 @@ ## 简介 -飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别任务的工具集,助力使用者训练出更好的视觉模型和应用落地。 +飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。 -**近期更新** -- 🔥️ 2022.5.26 [飞桨产业实践范例直播课](http://aglc.cn/v-c4FAR),解读**超轻量重点区域人员出入管理方案**,欢迎报名来交流。 -
-- 2022.5.23 新增[人员出入管理范例库](https://aistudio.baidu.com/aistudio/projectdetail/4094475),具体内容可以在 AI Stuio 上体验。 -- 2022.5.20 上线[PP-HGNet](./docs/zh_CN/models/PP-HGNet.md), [PP-LCNet v2](./docs/zh_CN/models/PP-LCNetV2.md) -- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)。 -- 2022.1.27 全面升级文档;新增[PaddleServing C++ pipeline部署方式](./deploy/paddleserving)和[18M图像识别安卓部署Demo](./deploy/lite_shitu)。 -- 2021.11.1 发布[PP-ShiTu技术报告](https://arxiv.org/pdf/2111.00775.pdf),新增饮料识别demo -- 2021.10.23 发布轻量级图像识别系统PP-ShiTu,CPU上0.2s即可完成在10w+库的图像识别。 -[点击这里](./docs/zh_CN/quick_start/quick_start_recognition.md)立即体验 -- 2021.09.17 发布PP-LCNet系列超轻量骨干网络模型, 在Intel CPU上,单张图像预测速度约5ms,ImageNet-1K数据集上Top1识别准确率达到80.82%,超越ResNet152的模型效果。PP-LCNet的介绍可以参考[论文](https://arxiv.org/pdf/2109.15099.pdf), 或者[PP-LCNet模型介绍](docs/zh_CN/models/PP-LCNet.md),相关指标和预训练权重可以从 [这里](docs/zh_CN/algorithm_introduction/ImageNet_models.md)下载。 -- [more](./docs/zh_CN/others/update_history.md) - -## 特性 - -- PP-ShiTu轻量图像识别系统:集成了目标检测、特征学习、图像检索等模块,广泛适用于各类图像识别任务。cpu上0.2s即可完成在10w+库的图像识别。 +
+<p align="center">PULC实用图像分类模型效果展示</p>
+  -- PP-LCNet轻量级CPU骨干网络:专门为CPU设备打造轻量级骨干网络,速度、精度均远超竞品。 -- 丰富的预训练模型库:提供了36个系列共175个ImageNet预训练模型,其中7个精选系列模型支持结构快速修改。 +
+<p align="center">PP-ShiTu图像识别系统效果展示</p>
-- 全面易用的特征学习组件:集成arcmargin, triplet loss等12度量学习方法,通过配置文件即可随意组合切换。 -- SSLD知识蒸馏:14个分类预训练模型,精度普遍提升3%以上;其中ResNet50_vd模型在ImageNet-1k数据集上的Top-1精度达到了84.0%, -Res2Net200_vd预训练模型Top-1精度高达85.1%。 +## 近期更新 +- 📢将于**6月15-6月17日晚20:30** 进行为期三天的课程直播,详细介绍超轻量图像分类方案,对各场景模型优化原理及使用方式进行拆解,之后还有产业案例全流程实操,对各类痛难点解决方案进行手把手教学,加上现场互动答疑,抓紧扫码上车吧!
+- 🔥️ 2022.6.15 发布[PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md),CPU推理3ms,精度比肩SwinTransformer,覆盖人、车、OCR场景九大常见任务。
+
+- 2022.5.26 [飞桨产业实践范例直播课](http://aglc.cn/v-c4FAR),解读**超轻量重点区域人员出入管理方案**。
+
+- 2022.5.23 新增[人员出入管理范例库](https://aistudio.baidu.com/aistudio/projectdetail/4094475),具体内容可以在 AI Studio 上体验。
+
+- 2022.5.20 上线[PP-HGNet](./docs/zh_CN/models/PP-HGNet.md), [PP-LCNetv2](./docs/zh_CN/models/PP-LCNetV2.md)。
+
+- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)。
+
+- [more](./docs/zh_CN/others/update_history.md)
+
+## 特性
+
+PaddleClas发布了[PP-HGNet](docs/zh_CN/models/PP-HGNet.md)、[PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md)、[PP-LCNet](docs/zh_CN/models/PP-LCNet.md)和[SSLD半监督知识蒸馏方案](docs/zh_CN/advanced_tutorials/ssld.md)等算法,
+并支持多种图像分类、识别相关算法,在此基础上打造[PULC超轻量图像分类方案](docs/zh_CN/PULC/PULC_quickstart.md)和[PP-ShiTu图像识别系统](./docs/zh_CN/quick_start/quick_start_recognition.md)。
+![](https://user-images.githubusercontent.com/19523330/173273046-239a42da-c88d-4c2c-94b1-2134557afa49.png)
+
 ## 欢迎加入技术交流群
-* 您可以扫描下面的QQ/微信二维码(添加小助手微信并回复“C”),加入PaddleClas微信交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
+* 您可以扫描下面的微信/QQ二维码(添加小助手微信并回复“C”),加入PaddleClas微信交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
## 快速体验 +PULC超轻量图像分类方案快速体验:[点击这里](docs/zh_CN/PULC/PULC_quickstart.md) + PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick_start_recognition.md) ## 文档教程 -- 安装说明 - - [安装Paddle](./docs/zh_CN/installation/install_paddle.md) - - [安装PaddleClas](./docs/zh_CN/installation/install_paddleclas.md) -- 快速体验 - - [PP-ShiTu图像识别快速体验](./docs/zh_CN/quick_start/quick_start_recognition.md) - - 图像分类快速体验 - - [尝鲜版](./docs/zh_CN/quick_start/quick_start_classification_new_user.md) - - [进阶版](./docs/zh_CN/quick_start/quick_start_classification_professional.md) - - [多标签分类](./docs/zh_CN/quick_start/quick_start_multilabel_classification.md) +- [环境准备](docs/zh_CN/installation/install_paddleclas.md) +- [PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md) + - [超轻量图像分类快速体验](docs/zh_CN/PULC/PULC_quickstart.md) + - [超轻量图像分类模型库](docs/zh_CN/PULC/PULC_model_list.md) + - [PULC有人/无人分类模型](docs/zh_CN/PULC/PULC_person_exists.md) + - [PULC人体属性识别模型](docs/zh_CN/PULC/PULC_person_attribute.md) + - [PULC佩戴安全帽分类模型](docs/zh_CN/PULC/PULC_safety_helmet.md) + - [PULC交通标志分类模型](docs/zh_CN/PULC/PULC_traffic_sign.md) + - [PULC车辆属性识别模型](docs/zh_CN/PULC/PULC_vehicle_attribute.md) + - [PULC有车/无车分类模型](docs/zh_CN/PULC/PULC_car_exists.md) + - [PULC含文字图像方向分类模型](docs/zh_CN/PULC/PULC_text_image_orientation.md) + - [PULC文本行方向分类模型](docs/zh_CN/PULC/PULC_textline_orientation.md) + - [PULC语种分类模型](docs/zh_CN/PULC/PULC_language_classification.md) + - [模型训练](docs/zh_CN/PULC/PULC_train.md) + - 推理部署 + - [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#1) + - [基于C++预测引擎推理](docs/zh_CN/inference_deployment/cpp_deploy.md) + - [服务化部署](docs/zh_CN/inference_deployment/paddle_serving_deploy.md) + - [端侧部署](docs/zh_CN/inference_deployment/paddle_lite_deploy.md) + - [Paddle2ONNX模型转化与预测](deploy/paddle2onnx/readme.md) + - [模型压缩](deploy/slim/README.md) - [PP-ShiTu图像识别系统介绍](#图像识别系统介绍) + - [图像识别快速体验](docs/zh_CN/quick_start/quick_start_recognition.md) + - 模块介绍 - [主体检测](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md) - - [特征提取](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md) + - [特征提取模型](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md) - [向量检索](./docs/zh_CN/image_recognition_pipeline/vector_search.md) -- [骨干网络和预训练模型库](./docs/zh_CN/algorithm_introduction/ImageNet_models.md) -- 数据准备 - - [图像分类数据集介绍](./docs/zh_CN/data_preparation/classification_dataset.md) - - [图像识别数据集介绍](./docs/zh_CN/data_preparation/recognition_dataset.md) -- 模型训练 - - [图像分类任务](./docs/zh_CN/models_training/classification.md) - - [图像识别任务](./docs/zh_CN/models_training/recognition.md) - - [训练参数调整策略](./docs/zh_CN/models_training/train_strategy.md) - - [配置文件说明](./docs/zh_CN/models_training/config_description.md) -- 模型预测部署 - - [模型导出](./docs/zh_CN/inference_deployment/export_model.md) - - Python/C++ 预测引擎 - - [基于Python预测引擎预测推理](./docs/zh_CN/inference_deployment/python_deploy.md) - - [基于C++分类预测引擎预测推理](./docs/zh_CN/inference_deployment/cpp_deploy.md)、[基于C++的PP-ShiTu预测引擎预测推理](deploy/cpp_shitu/readme.md) - - 服务化部署 - - [Paddle Serving服务化部署(推荐)](./docs/zh_CN/inference_deployment/paddle_serving_deploy.md) - - [Hub serving服务化部署](./docs/zh_CN/inference_deployment/paddle_hub_serving_deploy.md) - - [端侧部署](./deploy/lite/readme.md) - - [whl包预测](./docs/zh_CN/inference_deployment/whl_deploy.md) -- 算法介绍 - - [图像分类任务介绍](./docs/zh_CN/algorithm_introduction/image_classification.md) - - [度量学习介绍](./docs/zh_CN/algorithm_introduction/metric_learning.md) -- 高阶使用 - - [数据增广](./docs/zh_CN/advanced_tutorials/DataAugmentation.md) - - 
[模型量化](./docs/zh_CN/advanced_tutorials/model_prune_quantization.md) - - [知识蒸馏](./docs/zh_CN/advanced_tutorials/knowledge_distillation.md) - - [PaddleClas结构解析](./docs/zh_CN/advanced_tutorials/code_overview.md) - - [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md) + - [哈希编码](docs/zh_CN/image_recognition_pipeline/) + - [模型训练](docs/zh_CN/models_training/recognition.md) + - 推理部署 + - [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#2) + - [基于C++预测引擎推理](deploy/cpp_shitu/readme.md) + - [服务化部署](docs/zh_CN/inference_deployment/paddle_serving_deploy.md) + - [端侧部署](deploy/lite_shitu/README.md) +- PP系列骨干网络模型 + - [PP-HGNet](docs/zh_CN/models/PP-HGNet.md) + - [PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md) + - [PP-LCNet](docs/zh_CN/models/PP-LCNet.md) +- [SSLD半监督知识蒸馏方案](docs/zh_CN/advanced_tutorials/ssld.md) +- 前沿算法 + - [骨干网络和预训练模型库](docs/zh_CN/algorithm_introduction/ImageNet_models.md) + - [度量学习](docs/zh_CN/algorithm_introduction/metric_learning.md) + - [模型压缩](docs/zh_CN/algorithm_introduction/model_prune_quantization.md) + - [模型蒸馏](docs/zh_CN/algorithm_introduction/knowledge_distillation.md) + - [数据增强](docs/zh_CN/advanced_tutorials/DataAugmentation.md) +- [产业实用范例库](docs/zh_CN/samples) +- [30分钟快速体验图像分类](docs/zh_CN/quick_start/quick_start_classification_new_user.md) - FAQ - - [图像识别精选问题](docs/zh_CN/faq_series/faq_2021_s2.md) - - [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md) - - [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md) - - [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md) + - [图像识别精选问题](docs/zh_CN/faq_series/faq_2021_s2.md) + - [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md) + - [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md) + - [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md) +- [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md) - [许可证书](#许可证书) - [贡献代码](#贡献代码) + + +## PULC超轻量图像分类方案 +
+PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。 +PaddleClas提供了覆盖人、车、OCR场景九大常见任务的分类模型,CPU推理3ms,精度比肩SwinTransformer。 + -## PP-ShiTu图像识别系统介绍 +## PP-ShiTu图像识别系统
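The PP-ShiTu pipeline introduced in the hunk below can be tried end-to-end from the `deploy/` directory; a minimal sketch follows. The `predict_system.py` entry point lives in `deploy/python`, while the config name and demo image are assumptions taken from the quick-start document linked in this README — adjust them to whatever the downloaded demo actually provides.

```python
# Minimal sketch: run the PP-ShiTu pipeline (mainbody detection -> feature
# extraction -> vector search) from PaddleClas/deploy. The config file name
# and demo image path are assumptions based on the quick-start doc.
import subprocess

subprocess.run(
    [
        "python3.7", "python/predict_system.py",
        "-c", "configs/inference_general.yaml",          # assumed PP-ShiTu config
        "-o", "Global.infer_imgs=./images/wangzai.jpg",  # assumed demo image
    ],
    check=True,
)
```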
@@ -111,6 +135,11 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick PP-ShiTu是一个实用的轻量级通用图像识别系统,主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化8个方面,采用多种策略,对各个模块的模型进行优化,最终得到在CPU上仅0.2s即可完成10w+库的图像识别的系统。更多细节请参考[PP-ShiTu技术方案](https://arxiv.org/pdf/2111.00775.pdf)。 + +## PULC实用图像分类模型效果展示 +
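As a concrete way to reproduce the demo results this section refers to, here is a minimal sketch of calling one PULC model through the `paddleclas` whl package. It assumes a paddleclas release that ships the PULC models (>= 2.4) and uses the `person_exists` model name and demo image introduced in this PR.

```python
# Minimal sketch: PULC "person_exists" prediction via the paddleclas whl
# package (assumes `pip install paddleclas` with PULC support, >= 2.4).
from paddleclas import PaddleClas

model = PaddleClas(model_name="person_exists")
# predict() accepts an image path or ndarray and yields result batches
for batch in model.predict(input_data="deploy/images/PULC/person_exists/objects365_02035329.jpg"):
    print(batch)  # e.g. [{'class_ids': [1], 'scores': [...], 'label_names': ['someone']}]
```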
## PP-ShiTu图像识别系统效果展示 diff --git a/README_en.md b/README_en.md index 9b0d7c85d76cf06eac8fb265abb85c3bb98a275f..4bf960e57f2e56972f889c4bcf6a6d715b903477 100644 --- a/README_en.md +++ b/README_en.md @@ -4,39 +4,41 @@ ## Introduction -PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios. +PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios. -**Recent updates** - -- 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf). - -- 2021.09.17 Add PP-LCNet series model developed by PaddleClas, these models show strong competitiveness on Intel CPUs. -For the introduction of PP-LCNet, please refer to [paper](https://arxiv.org/pdf/2109.15099.pdf) or [PP-LCNet model introduction](docs/en/models/PP-LCNet_en.md). The metrics and pretrained model are available [here](docs/en/ImageNet_models_en.md). - -- 2021.06.29 Add Swin-transformer series model,Highest top1 acc on ImageNet1k dataset reaches 87.2%, training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md). -- 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper. -- [more](./docs/en/update_history_en.md) +
+<p align="center">PULC demo images</p>
 
 ## Features
 
-- A practical image recognition system consist of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
-Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
-- Rich library of pre-trained models: Provide a total of 164 ImageNet pre-trained models in 35 series, among which 6 selected series of models support fast structural modification.
+
+<p align="center">PP-ShiTu demo images</p>
-
-- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
-- SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the Image-Net-1k dataset and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
+**Recent updates**
+- 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models run inference within 3 ms on CPU devices, with accuracy on par with SwinTransformer. We also release 9 practical classification models covering pedestrian, vehicle and OCR scenarios.
+- 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
-- Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with detailed introduction, code replication and evaluation of effectiveness in a unified experimental environment.
+- 2021.09.17 Add the PP-LCNet series models developed by PaddleClas; these models show strong competitiveness on Intel CPUs.
+For the introduction of PP-LCNet, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf) or the [PP-LCNet model introduction](docs/en/models/PP-LCNet_en.md). The metrics and pretrained models are available [here](docs/en/algorithm_introduction/ImageNet_models_en.md).
+- 2021.06.29 Add the [Swin-transformer](docs/en/models/SwinTransformer_en.md) series models. The highest top-1 accuracy on the ImageNet-1k dataset reaches 87.2%; training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/algorithm_introduction/ImageNet_models_en.md#16).
+- 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
+- [more](./docs/en/others/update_history_en.md)
+## Features
+PaddleClas releases PP-HGNet, PP-LCNetv2, PP-LCNet and **S**imple **S**emi-supervised **L**abel **D**istillation algorithms, and supports plenty of
+image classification and image recognition algorithms.
+Based on the algorithms above, PaddleClas releases the PP-ShiTu image recognition system and [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](docs/en/PULC/PULC_quickstart_en.md).
+![](https://user-images.githubusercontent.com/19523330/173539361-68cf7ab1-7e3b-4e5e-b00f-1500719bd2a2.png) ## Welcome to Join the Technical Exchange Group @@ -48,41 +50,57 @@ Four sample solutions are provided, including product recognition, vehicle recog
## Quick Start
-Quick experience of image recognition:[Link](./docs/en/tutorials/quick_start_recognition_en.md)
+Quick experience of PP-ShiTu image recognition system: [Link](./docs/en/quick_start/quick_start_recognition_en.md)
+
+Quick experience of **P**ractical **U**ltra **L**ight-weight image **C**lassification models: [Link](docs/en/PULC/PULC_quickstart_en.md)
 
 ## Tutorials
-- [Quick Installation](./docs/en/tutorials/install_en.md)
-- [Quick Start of Recognition](./docs/en/tutorials/quick_start_recognition_en.md)
+- [Install Paddle](./docs/en/installation/install_paddle_en.md)
+- [Install PaddleClas Environment](./docs/en/installation/install_paddleclas_en.md)
+- [Practical Ultra Light-weight image Classification solutions](./docs/en/PULC/PULC_train_en.md)
+  - [PULC Quick Start](docs/en/PULC/PULC_quickstart_en.md)
+  - [PULC Model Zoo](docs/en/PULC/PULC_model_list_en.md)
+  - [PULC Classification Model of Someone or Nobody](docs/en/PULC/PULC_person_exists_en.md)
+  - [PULC Recognition Model of Person Attribute](docs/en/PULC/PULC_person_attribute_en.md)
+  - [PULC Classification Model of Wearing or Unwearing Safety Helmet](docs/en/PULC/PULC_safety_helmet_en.md)
+  - [PULC Classification Model of Traffic Sign](docs/en/PULC/PULC_traffic_sign_en.md)
+  - [PULC Recognition Model of Vehicle Attribute](docs/en/PULC/PULC_vehicle_attribute_en.md)
+  - [PULC Classification Model of Containing or Uncontaining Car](docs/en/PULC/PULC_car_exists_en.md)
+  - [PULC Classification Model of Text Image Orientation](docs/en/PULC/PULC_text_image_orientation_en.md)
+  - [PULC Classification Model of Textline Orientation](docs/en/PULC/PULC_textline_orientation_en.md)
+  - [PULC Classification Model of Language](docs/en/PULC/PULC_language_classification_en.md)
+- [Quick Start of Recognition](./docs/en/quick_start/quick_start_recognition_en.md)
 - [Introduction to Image Recognition Systems](#Introduction_to_Image_Recognition_Systems)
-- [Demo images](#Demo_images)
+- [Image Recognition Demo images](#Rec_Demo_images)
+- [PULC demo images](#Clas_Demo_images)
 - Algorithms Introduction
-  - [Backbone Network and Pre-trained Model Library](./docs/en/ImageNet_models_en.md)
-  - [Mainbody Detection](./docs/en/application/mainbody_detection_en.md)
-  - [Image Classification](./docs/en/tutorials/image_classification_en.md)
-  - [Feature Learning](./docs/en/application/feature_learning_en.md)
-  - [Product Recognition](./docs/en/application/product_recognition_en.md)
-  - [Vehicle Recognition](./docs/en/application/vehicle_recognition_en.md)
-  - [Logo Recognition](./docs/en/application/logo_recognition_en.md)
-  - [Animation Character Recognition](./docs/en/application/cartoon_character_recognition_en.md)
+  - [Backbone Network and Pre-trained Model Library](./docs/en/algorithm_introduction/ImageNet_models_en.md)
+  - [Mainbody Detection](./docs/en/image_recognition_pipeline/mainbody_detection_en.md)
+  - [Feature Learning](./docs/en/image_recognition_pipeline/feature_extraction_en.md)
   - [Vector Search](./deploy/vector_search/README.md)
-- Models Training/Evaluation
-  - [Image Classification](./docs/en/tutorials/getting_started_en.md)
-  - [Feature Learning](./docs/en/tutorials/getting_started_retrieval_en.md)
 - Inference Model Prediction
-  - [Python Inference](./docs/en/inference.md)
+  - [Python Inference](./docs/en/inference_deployment/python_deploy_en.md)
   - [C++ Classification Inference](./deploy/cpp/readme_en.md), [C++ PP-ShiTu Inference](deploy/cpp_shitu/readme_en.md)
 - Model Deploy (only supports classification for now, recognition coming
soon) - [Hub Serving Deployment](./deploy/hubserving/readme_en.md) - [Mobile Deployment](./deploy/lite/readme_en.md) - - [Inference Using whl](./docs/en/whl_en.md) + - [Inference Using whl](./docs/en/inference_deployment/whl_deploy_en.md) - Advanced Tutorial - [Knowledge Distillation](./docs/en/advanced_tutorials/distillation/distillation_en.md) - - [Model Quantization](./docs/en/extension/paddle_quantization_en.md) - - [Data Augmentation](./docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md) + - [Model Quantization](./docs/en/algorithm_introduction/model_prune_quantization_en.md) + - [Data Augmentation](./docs/en/advanced_tutorials/DataAugmentation_en.md) - [License](#License) - [Contribution](#Contribution) + +## Introduction to Practical Ultra Light-weight image Classification solutions +
+PULC solutions consist of the PP-LCNet light-weight backbone, SSLD pretrained models, Ensemble of Data Augmentation strategies and SKL-UGI knowledge distillation.
+PULC models run inference within 3 ms on CPU devices, with accuracy comparable to SwinTransformer. We also release 9 practical models covering pedestrian, vehicle and OCR scenarios.
+
 ## Introduction to Image Recognition Systems
@@ -97,8 +115,14 @@ Image recognition can be divided into three steps:
 For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised.
-
-## Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images)
+
+## PULC demo images
+
+ + +## Image Recognition Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images) - Product recognition
diff --git a/deploy/configs/PULC/car_exists/inference_car_exists.yaml b/deploy/configs/PULC/car_exists/inference_car_exists.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b6733069d99b5622c83321bc628f3d70274ce8d4 --- /dev/null +++ b/deploy/configs/PULC/car_exists/inference_car_exists.yaml @@ -0,0 +1,36 @@ +Global: + infer_imgs: "./images/PULC/car_exists/objects365_00001507.jpeg" + inference_model_dir: "./models/car_exists_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: False + cpu_num_threads: 10 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: ThreshOutput + ThreshOutput: + threshold: 0.5 + label_0: no_car + label_1: contains_car + SavePreLabel: + save_dir: ./pre_label/ diff --git a/deploy/configs/PULC/language_classification/inference_language_classification.yaml b/deploy/configs/PULC/language_classification/inference_language_classification.yaml new file mode 100644 index 0000000000000000000000000000000000000000..fb9fb6b6631e774e7486bcdb31c25621e2b7d790 --- /dev/null +++ b/deploy/configs/PULC/language_classification/inference_language_classification.yaml @@ -0,0 +1,33 @@ +Global: + infer_imgs: "./images/PULC/language_classification/word_35404.png" + inference_model_dir: "./models/language_classification_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: False + cpu_num_threads: 10 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: Topk + Topk: + topk: 2 + class_id_map_file: "../ppcls/utils/PULC_label_list/language_classification_label_list.txt" + SavePreLabel: + save_dir: ./pre_label/ diff --git a/deploy/configs/PULC/person_attribute/inference_person_attribute.yaml b/deploy/configs/PULC/person_attribute/inference_person_attribute.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d5be2a3568291d0a31a7026974fc22ecf54a8f4c --- /dev/null +++ b/deploy/configs/PULC/person_attribute/inference_person_attribute.yaml @@ -0,0 +1,32 @@ +Global: + infer_imgs: "./images/PULC/person_attribute/090004.jpg" + inference_model_dir: "./models/person_attribute_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: True + cpu_num_threads: 10 + benchmark: False + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: PersonAttribute + PersonAttribute: + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold diff --git a/deploy/configs/PULC/person/inference_person_cls.yaml b/deploy/configs/PULC/person_exists/inference_person_exists.yaml similarity index 81% rename from deploy/configs/PULC/person/inference_person_cls.yaml rename to 
deploy/configs/PULC/person_exists/inference_person_exists.yaml index a70f663a792fcdcab3b7d45059f2afe0b1efbf07..3df94a80c7c75814e778e5320a31b20a8a7eb742 100644 --- a/deploy/configs/PULC/person/inference_person_cls.yaml +++ b/deploy/configs/PULC/person_exists/inference_person_exists.yaml @@ -1,6 +1,6 @@ Global: - infer_imgs: "./images/PULC/person/objects365_02035329.jpg" - inference_model_dir: "./models/person_cls_infer" + infer_imgs: "./images/PULC/person_exists/objects365_02035329.jpg" + inference_model_dir: "./models/person_exists_infer" batch_size: 1 use_gpu: True enable_mkldnn: False @@ -29,7 +29,7 @@ PreProcess: PostProcess: main_indicator: ThreshOutput ThreshOutput: - threshold: 0.9 + threshold: 0.5 label_0: nobody label_1: someone SavePreLabel: diff --git a/deploy/configs/PULC/safety_helmet/inference_safety_helmet.yaml b/deploy/configs/PULC/safety_helmet/inference_safety_helmet.yaml new file mode 100644 index 0000000000000000000000000000000000000000..66a4cebb359a9b1f03a205ee6a031ca6464cffa8 --- /dev/null +++ b/deploy/configs/PULC/safety_helmet/inference_safety_helmet.yaml @@ -0,0 +1,36 @@ +Global: + infer_imgs: "./images/PULC/safety_helmet/safety_helmet_test_1.png" + inference_model_dir: "./models/safety_helmet_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: False + cpu_num_threads: 10 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: ThreshOutput + ThreshOutput: + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet + SavePreLabel: + save_dir: ./pre_label/ diff --git a/deploy/configs/PULC/text_image_orientation/inference_text_image_orientation.yaml b/deploy/configs/PULC/text_image_orientation/inference_text_image_orientation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c6c3969ffa627288fe58fab28b3fe1cbffe9dd03 --- /dev/null +++ b/deploy/configs/PULC/text_image_orientation/inference_text_image_orientation.yaml @@ -0,0 +1,35 @@ +Global: + infer_imgs: "./images/PULC/text_image_orientation/img_rot0_demo.jpg" + inference_model_dir: "./models/text_image_orientation_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: False + cpu_num_threads: 10 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: Topk + Topk: + topk: 2 + class_id_map_file: "../ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt" + SavePreLabel: + save_dir: ./pre_label/ diff --git a/deploy/configs/PULC/textline_orientation/inference_textline_orientation.yaml b/deploy/configs/PULC/textline_orientation/inference_textline_orientation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..108b3dd53a95c06345bdd7ccd34b2e5252d2df19 --- /dev/null +++ b/deploy/configs/PULC/textline_orientation/inference_textline_orientation.yaml @@ -0,0 +1,33 @@ +Global: + infer_imgs: "./images/PULC/textline_orientation/textline_orientation_test_0_0.png" + inference_model_dir: 
"./models/textline_orientation_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: True + cpu_num_threads: 10 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: Topk + Topk: + topk: 1 + class_id_map_file: "../ppcls/utils/PULC_label_list/textline_orientation_label_list.txt" + SavePreLabel: + save_dir: ./pre_label/ diff --git a/deploy/configs/PULC/traffic_sign/inference_traffic_sign.yaml b/deploy/configs/PULC/traffic_sign/inference_traffic_sign.yaml new file mode 100644 index 0000000000000000000000000000000000000000..53699718b4fdd38da86eaee4cccc584dcc87d2b7 --- /dev/null +++ b/deploy/configs/PULC/traffic_sign/inference_traffic_sign.yaml @@ -0,0 +1,35 @@ +Global: + infer_imgs: "./images/PULC/traffic_sign/99603_17806.jpg" + inference_model_dir: "./models/traffic_sign_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: True + cpu_num_threads: 10 + benchmark: False + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: Topk + Topk: + topk: 5 + class_id_map_file: "../ppcls/utils/PULC_label_list/traffic_sign_label_list.txt" + SavePreLabel: + save_dir: ./pre_label/ diff --git a/deploy/configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml b/deploy/configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml new file mode 100644 index 0000000000000000000000000000000000000000..14ae348d09faca113d5863fbb57f066675b3f447 --- /dev/null +++ b/deploy/configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml @@ -0,0 +1,32 @@ +Global: + infer_imgs: "./images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg" + inference_model_dir: "./models/vehicle_attribute_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: True + cpu_num_threads: 10 + benchmark: False + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: VehicleAttribute + VehicleAttribute: + color_threshold: 0.5 + type_threshold: 0.5 + diff --git a/deploy/configs/inference_attr.yaml b/deploy/configs/inference_attr.yaml new file mode 100644 index 0000000000000000000000000000000000000000..88f73db5419414812450b768ac783982386f0a78 --- /dev/null +++ b/deploy/configs/inference_attr.yaml @@ -0,0 +1,33 @@ +Global: + infer_imgs: "./images/Pedestrain_Attr.jpg" + inference_model_dir: "../inference/" + batch_size: 1 + use_gpu: True + enable_mkldnn: False + cpu_num_threads: 10 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: 
PersonAttribute + PersonAttribute: + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + diff --git a/deploy/configs/inference_cls.yaml b/deploy/configs/inference_cls.yaml index fc0f0fe67aa628e504bb6fcb743f29fd020548cc..d9181278cc617822f98e4966abf0d12ceca498a4 100644 --- a/deploy/configs/inference_cls.yaml +++ b/deploy/configs/inference_cls.yaml @@ -1,5 +1,5 @@ Global: - infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg" + infer_imgs: "./images/ImageNet/ILSVRC2012_val_00000010.jpeg" inference_model_dir: "./models" batch_size: 1 use_gpu: True @@ -32,4 +32,4 @@ PostProcess: topk: 5 class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt" SavePreLabel: - save_dir: ./pre_label/ \ No newline at end of file + save_dir: ./pre_label/ diff --git a/deploy/configs/inference_cls_ch4.yaml b/deploy/configs/inference_cls_ch4.yaml index 9b740ed8293c3d66a325682cafc42e2b1415df4d..85f9acb29a88772da63abe302354f5e17a9c3e59 100644 --- a/deploy/configs/inference_cls_ch4.yaml +++ b/deploy/configs/inference_cls_ch4.yaml @@ -1,5 +1,5 @@ Global: - infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg" + infer_imgs: "./images/ImageNet/ILSVRC2012_val_00000010.jpeg" inference_model_dir: "./models" batch_size: 1 use_gpu: True @@ -32,4 +32,4 @@ PostProcess: topk: 5 class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt" SavePreLabel: - save_dir: ./pre_label/ \ No newline at end of file + save_dir: ./pre_label/ diff --git a/deploy/images/ILSVRC2012_val_00000010.jpeg b/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg similarity index 100% rename from deploy/images/ILSVRC2012_val_00000010.jpeg rename to deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg diff --git a/deploy/images/ILSVRC2012_val_00010010.jpeg b/deploy/images/ImageNet/ILSVRC2012_val_00010010.jpeg similarity index 100% rename from deploy/images/ILSVRC2012_val_00010010.jpeg rename to deploy/images/ImageNet/ILSVRC2012_val_00010010.jpeg diff --git a/deploy/images/ILSVRC2012_val_00020010.jpeg b/deploy/images/ImageNet/ILSVRC2012_val_00020010.jpeg similarity index 100% rename from deploy/images/ILSVRC2012_val_00020010.jpeg rename to deploy/images/ImageNet/ILSVRC2012_val_00020010.jpeg diff --git a/deploy/images/ILSVRC2012_val_00030010.jpeg b/deploy/images/ImageNet/ILSVRC2012_val_00030010.jpeg similarity index 100% rename from deploy/images/ILSVRC2012_val_00030010.jpeg rename to deploy/images/ImageNet/ILSVRC2012_val_00030010.jpeg diff --git a/deploy/images/PULC/car_exists/objects365_00001507.jpeg b/deploy/images/PULC/car_exists/objects365_00001507.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..9959954b6b8bf27589e1d2081f86c6078d16e2c1 Binary files /dev/null and b/deploy/images/PULC/car_exists/objects365_00001507.jpeg differ diff --git a/deploy/images/PULC/car_exists/objects365_00001521.jpeg b/deploy/images/PULC/car_exists/objects365_00001521.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..ea65b3108ec0476ce952b3221c31ac54fcef161d Binary files /dev/null and b/deploy/images/PULC/car_exists/objects365_00001521.jpeg differ diff --git a/deploy/images/PULC/language_classification/word_17.png b/deploy/images/PULC/language_classification/word_17.png new file mode 100644 index 0000000000000000000000000000000000000000..c0cd74632460e01676fbc5a43b220c0a7f7d0474 Binary files /dev/null and b/deploy/images/PULC/language_classification/word_17.png differ diff --git a/deploy/images/PULC/language_classification/word_20.png 
b/deploy/images/PULC/language_classification/word_20.png new file mode 100644 index 0000000000000000000000000000000000000000..f9149670e8a2aa086c91451442f63a727661fd7d Binary files /dev/null and b/deploy/images/PULC/language_classification/word_20.png differ diff --git a/deploy/images/PULC/language_classification/word_35404.png b/deploy/images/PULC/language_classification/word_35404.png new file mode 100644 index 0000000000000000000000000000000000000000..9e1789ab47aefecac8eaf1121decfc6a8cfb1e8b Binary files /dev/null and b/deploy/images/PULC/language_classification/word_35404.png differ diff --git a/deploy/images/PULC/person_attribute/090004.jpg b/deploy/images/PULC/person_attribute/090004.jpg new file mode 100644 index 0000000000000000000000000000000000000000..140694eeec3d2925303e8c0d544ef5979cd78219 Binary files /dev/null and b/deploy/images/PULC/person_attribute/090004.jpg differ diff --git a/deploy/images/PULC/person_attribute/090007.jpg b/deploy/images/PULC/person_attribute/090007.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9fea2e7c9e0047a8b59606877ad41fe24bf2e24c Binary files /dev/null and b/deploy/images/PULC/person_attribute/090007.jpg differ diff --git a/deploy/images/PULC/person/objects365_01780782.jpg b/deploy/images/PULC/person_exists/objects365_01780782.jpg similarity index 100% rename from deploy/images/PULC/person/objects365_01780782.jpg rename to deploy/images/PULC/person_exists/objects365_01780782.jpg diff --git a/deploy/images/PULC/person/objects365_02035329.jpg b/deploy/images/PULC/person_exists/objects365_02035329.jpg similarity index 100% rename from deploy/images/PULC/person/objects365_02035329.jpg rename to deploy/images/PULC/person_exists/objects365_02035329.jpg diff --git a/deploy/images/PULC/safety_helmet/safety_helmet_test_1.png b/deploy/images/PULC/safety_helmet/safety_helmet_test_1.png new file mode 100644 index 0000000000000000000000000000000000000000..c28f54f77d54df6e68e471538846b01db4387e08 Binary files /dev/null and b/deploy/images/PULC/safety_helmet/safety_helmet_test_1.png differ diff --git a/deploy/images/PULC/safety_helmet/safety_helmet_test_2.png b/deploy/images/PULC/safety_helmet/safety_helmet_test_2.png new file mode 100644 index 0000000000000000000000000000000000000000..8e784af808afb58d67fdb3e277dfeebd134ee846 Binary files /dev/null and b/deploy/images/PULC/safety_helmet/safety_helmet_test_2.png differ diff --git a/deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg b/deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..412d41956ba48c8e3243bdeff746d389be7e762b Binary files /dev/null and b/deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg differ diff --git a/deploy/images/PULC/text_image_orientation/img_rot180_demo.jpg b/deploy/images/PULC/text_image_orientation/img_rot180_demo.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f4725b96698e2ac222ae9d4830d8f29a33322443 Binary files /dev/null and b/deploy/images/PULC/text_image_orientation/img_rot180_demo.jpg differ diff --git a/deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png b/deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png new file mode 100644 index 0000000000000000000000000000000000000000..4b8d24d29ff0f8b4befff6bf943d506c36061d4d Binary files /dev/null and b/deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png differ diff --git 
a/deploy/images/PULC/textline_orientation/textline_orientation_test_0_1.png b/deploy/images/PULC/textline_orientation/textline_orientation_test_0_1.png new file mode 100644 index 0000000000000000000000000000000000000000..42ad5234973679e65be6054f90c1cc7c0f989bd2 Binary files /dev/null and b/deploy/images/PULC/textline_orientation/textline_orientation_test_0_1.png differ diff --git a/deploy/images/PULC/textline_orientation/textline_orientation_test_1_0.png b/deploy/images/PULC/textline_orientation/textline_orientation_test_1_0.png new file mode 100644 index 0000000000000000000000000000000000000000..ac2447842dd0fac260c0d3c6e0d156dda9890923 Binary files /dev/null and b/deploy/images/PULC/textline_orientation/textline_orientation_test_1_0.png differ diff --git a/deploy/images/PULC/textline_orientation/textline_orientation_test_1_1.png b/deploy/images/PULC/textline_orientation/textline_orientation_test_1_1.png new file mode 100644 index 0000000000000000000000000000000000000000..7d5b75f7e5bbeabded56eba1b4b566c4ca019590 Binary files /dev/null and b/deploy/images/PULC/textline_orientation/textline_orientation_test_1_1.png differ diff --git a/deploy/images/PULC/traffic_sign/100999_83928.jpg b/deploy/images/PULC/traffic_sign/100999_83928.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6f32ed5ae2d8483d29986e3a45db1789da2a4d43 Binary files /dev/null and b/deploy/images/PULC/traffic_sign/100999_83928.jpg differ diff --git a/deploy/images/PULC/traffic_sign/99603_17806.jpg b/deploy/images/PULC/traffic_sign/99603_17806.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c792fdf6eb64395fffaf8289a1ec14d47279860e Binary files /dev/null and b/deploy/images/PULC/traffic_sign/99603_17806.jpg differ diff --git a/deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg b/deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg new file mode 100644 index 0000000000000000000000000000000000000000..bb5de9fc6ff99550bf9bff8d4a9f0d0e0fe18c06 Binary files /dev/null and b/deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg differ diff --git a/deploy/images/PULC/vehicle_attribute/0014_c012_00040750_0.jpg b/deploy/images/PULC/vehicle_attribute/0014_c012_00040750_0.jpg new file mode 100644 index 0000000000000000000000000000000000000000..76207d43ce597a1079c523dca0c32923bf15db19 Binary files /dev/null and b/deploy/images/PULC/vehicle_attribute/0014_c012_00040750_0.jpg differ diff --git a/deploy/images/Pedestrain_Attr.jpg b/deploy/images/Pedestrain_Attr.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6a87e856af8c17a3b93617b93ea517b91c508619 Binary files /dev/null and b/deploy/images/Pedestrain_Attr.jpg differ diff --git a/deploy/paddle2onnx/readme.md b/deploy/paddle2onnx/readme.md index d1307ea84e3d7a1465c7c464d3b41dfa7613a046..bacc202806bf1a60e85790969edcb70f1489f7df 100644 --- a/deploy/paddle2onnx/readme.md +++ b/deploy/paddle2onnx/readme.md @@ -1,53 +1,59 @@ # paddle2onnx 模型转化与预测 -本章节介绍 ResNet50_vd 模型如何转化为 ONNX 模型,并基于 ONNX 引擎预测。 +## 目录 + +- [paddle2onnx 模型转化与预测](#paddle2onnx-模型转化与预测) + - [1. 环境准备](#1-环境准备) + - [2. 模型转换](#2-模型转换) + - [3. onnx 预测](#3-onnx-预测) ## 1. 
环境准备
 
 需要准备 Paddle2ONNX 模型转化环境,和 ONNX 模型预测环境。
 
-Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式,算子目前稳定支持导出 ONNX Opset 9~11,部分Paddle算子支持更低的ONNX Opset转换。
-更多细节可参考 [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
+Paddle2ONNX 支持将 PaddlePaddle inference 模型格式转化到 ONNX 模型格式,算子目前稳定支持导出 ONNX Opset 9~11。
+更多细节可参考 [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX#paddle2onnx)
 
 - 安装 Paddle2ONNX
-```
-python3.7 -m pip install paddle2onnx
-```
+  ```shell
+  python3.7 -m pip install paddle2onnx
+  ```
 
-- 安装 ONNX 运行时
-```
-python3.7 -m pip install onnxruntime
-```
+- 安装 ONNX 推理引擎
+  ```shell
+  python3.7 -m pip install onnxruntime
+  ```
+下面以 ResNet50_vd 为例,介绍如何将 PaddlePaddle inference 模型转换为 ONNX 模型,并基于 ONNX 引擎预测。
 
 ## 2. 模型转换
 
 - ResNet50_vd inference模型下载
-```
-cd deploy
-mkdir models && cd models
-wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
-cd ..
-```
+  ```shell
+  cd deploy
+  mkdir models && cd models
+  wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
+  cd ..
+  ```
 
 - 模型转换
-使用 Paddle2ONNX 将 Paddle 静态图模型转换为 ONNX 模型格式:
-```
-paddle2onnx --model_dir=./models/ResNet50_vd_infer/ \
---model_filename=inference.pdmodel \
---params_filename=inference.pdiparams \
---save_file=./models/ResNet50_vd_infer/inference.onnx \
---opset_version=10 \
---enable_onnx_checker=True
-```
+  使用 Paddle2ONNX 将 Paddle 静态图模型转换为 ONNX 模型格式:
+  ```shell
+  paddle2onnx --model_dir=./models/ResNet50_vd_infer/ \
+  --model_filename=inference.pdmodel \
+  --params_filename=inference.pdiparams \
+  --save_file=./models/ResNet50_vd_infer/inference.onnx \
+  --opset_version=10 \
+  --enable_onnx_checker=True
+  ```
 
-执行完毕后,ONNX 模型 `inference.onnx` 会被保存在 `./models/ResNet50_vd_infer/` 路径下
+转换完毕后,生成的 ONNX 模型 `inference.onnx` 会被保存在 `./models/ResNet50_vd_infer/` 路径下。
 
 ## 3. onnx 预测
 
 执行如下命令:
-```
+```shell
 python3.7 python/predict_cls.py \
 -c configs/inference_cls.yaml \
 -o Global.use_onnx=True \
diff --git a/deploy/paddle2onnx/readme_en.md b/deploy/paddle2onnx/readme_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..6df13e5fe31805d642432dea8526661e82b6e95b
--- /dev/null
+++ b/deploy/paddle2onnx/readme_en.md
@@ -0,0 +1,59 @@
+# Paddle2ONNX: Converting To ONNX and Deployment
+
+This section introduces how to convert the Paddle inference model ResNet50_vd to an ONNX model and deploy it based on the ONNX engine.
+
+## 1. Installation
+
+First, you need to install Paddle2ONNX and onnxruntime. Paddle2ONNX is a toolkit to convert Paddle inference models to ONNX models. Please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_en.md) for more information.
+
+- Paddle2ONNX Installation
+```
+python3.7 -m pip install paddle2onnx
+```
+
+- ONNX Runtime Installation
+```
+python3.7 -m pip install onnxruntime
+```
+
+## 2. Converting to ONNX
+
+Download the Paddle inference model ResNet50_vd:
+
+```
+cd deploy
+mkdir models && cd models
+wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
+cd ..
+```
+
+Converting to ONNX model:
+
+```
+paddle2onnx --model_dir=./models/ResNet50_vd_infer/ \
+--model_filename=inference.pdmodel \
+--params_filename=inference.pdiparams \
+--save_file=./models/ResNet50_vd_infer/inference.onnx \
+--opset_version=10 \
+--enable_onnx_checker=True
+```
+
+After running the above command, the converted ONNX model file would be saved in `./models/ResNet50_vd_infer/`.
+
+## 3. Deployment
+
+To deploy with the ONNX model, run the command shown below.
+
+```
+python3.7 python/predict_cls.py \
+-c configs/inference_cls.yaml \
+-o Global.use_onnx=True \
+-o Global.use_gpu=False \
+-o Global.inference_model_dir=./models/ResNet50_vd_infer
+```
+
+The prediction results:
+
+```
+ILSVRC2012_val_00000010.jpeg: class id(s): [153, 204, 229, 332, 155], score(s): [0.69, 0.10, 0.02, 0.01, 0.01], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'Lhasa, Lhasa apso', 'Old English sheepdog, bobtail', 'Angora, Angora rabbit', 'Shih-Tzu']
+```
diff --git a/deploy/paddleserving/build_server.sh b/deploy/paddleserving/build_server.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1329a3684ff72862858ee25c0a938bd61ff654ae
--- /dev/null
+++ b/deploy/paddleserving/build_server.sh
@@ -0,0 +1,88 @@
+# 使用镜像:
+# registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82
+
+# 编译Serving Server:
+
+# client和app可以直接使用release版本
+
+# server因为加入了自定义OP,需要重新编译
+
+# 默认编译时的${PWD}=PaddleClas/deploy/paddleserving/
+
+python_name=${1:-'python'}
+
+apt-get update
+apt install -y libcurl4-openssl-dev libbz2-dev
+wget -nc https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar
+tar xf centos_ssl.tar
+rm -rf centos_ssl.tar
+mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k
+mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k
+ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10
+ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10
+ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so
+ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
+
+# 安装go依赖
+rm -rf /usr/local/go
+wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | tar -xz -C /usr/local
+export GOROOT=/usr/local/go
+export GOPATH=/root/gopath
+export PATH=$PATH:$GOPATH/bin:$GOROOT/bin
+go env -w GO111MODULE=on
+go env -w GOPROXY=https://goproxy.cn,direct
+go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway@v1.15.2
+go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger@v1.15.2
+go install github.com/golang/protobuf/protoc-gen-go@v1.4.3
+go install google.golang.org/grpc@v1.33.0
+go env -w GO111MODULE=auto
+
+# 下载opencv库
+wget https://paddle-qa.bj.bcebos.com/PaddleServing/opencv3.tar.gz
+tar -xvf opencv3.tar.gz
+rm -rf opencv3.tar.gz
+export OPENCV_DIR=$PWD/opencv3
+
+# clone Serving
+git clone https://github.com/PaddlePaddle/Serving.git -b develop --depth=1
+
+cd Serving # PaddleClas/deploy/paddleserving/Serving
+export Serving_repo_path=$PWD
+git submodule update --init --recursive
+${python_name} -m pip install -r python/requirements.txt
+
+# set env
+export PYTHON_INCLUDE_DIR=$(${python_name} -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())")
+export PYTHON_LIBRARIES=$(${python_name} -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
+export PYTHON_EXECUTABLE=`which ${python_name}`
+
+export CUDA_PATH='/usr/local/cuda'
+export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
+export CUDA_CUDART_LIBRARY='/usr/local/cuda/lib64/'
+export
TENSORRT_LIBRARY_PATH='/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/' + +# cp 自定义OP代码 +\cp ../preprocess/general_clas_op.* ${Serving_repo_path}/core/general-server/op +\cp ../preprocess/preprocess_op.* ${Serving_repo_path}/core/predictor/tools/pp_shitu_tools + +# 编译Server +mkdir server-build-gpu-opencv +cd server-build-gpu-opencv +cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \ +-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \ +-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \ +-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \ +-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \ +-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \ +-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \ +-DOPENCV_DIR=${OPENCV_DIR} \ +-DWITH_OPENCV=ON \ +-DSERVER=ON \ +-DWITH_GPU=ON .. +make -j32 + +${python_name} -m pip install python/dist/paddle* + +# export SERVING_BIN +export SERVING_BIN=$PWD/core/general-server/serving +cd ../../ \ No newline at end of file diff --git a/deploy/paddleserving/config.yml b/deploy/paddleserving/config.yml index d9f464dd093d5a3d0ac34a61f4af17e3792fcd86..92d8297f9f23a4082cb0a499ca4c172e71d79caf 100644 --- a/deploy/paddleserving/config.yml +++ b/deploy/paddleserving/config.yml @@ -30,4 +30,4 @@ op: client_type: local_predictor #Fetch结果列表,以client_config中fetch_var的alias_name为准 - fetch_list: ["prediction"] + fetch_list: ["prediction"] diff --git a/deploy/paddleserving/preprocess/general_clas_op.cpp b/deploy/paddleserving/preprocess/general_clas_op.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e0ab48fa52da70a558b34e7ab1deda52675e99bc --- /dev/null +++ b/deploy/paddleserving/preprocess/general_clas_op.cpp @@ -0,0 +1,206 @@ +// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+
+#include "core/general-server/op/general_clas_op.h"
+#include "core/predictor/framework/infer.h"
+#include "core/predictor/framework/memory.h"
+#include "core/predictor/framework/resource.h"
+#include "core/util/include/timer.h"
+#include
+#include
+#include
+#include
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+using baidu::paddle_serving::Timer;
+using baidu::paddle_serving::predictor::MempoolWrapper;
+using baidu::paddle_serving::predictor::general_model::Tensor;
+using baidu::paddle_serving::predictor::general_model::Response;
+using baidu::paddle_serving::predictor::general_model::Request;
+using baidu::paddle_serving::predictor::InferManager;
+using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
+
+int GeneralClasOp::inference() {
+  VLOG(2) << "Going to run inference";
+  const std::vector<std::string> pre_node_names = pre_names();
+  if (pre_node_names.size() != 1) {
+    LOG(ERROR) << "This op(" << op_name()
+               << ") can only have one predecessor op, but received "
+               << pre_node_names.size();
+    return -1;
+  }
+  const std::string pre_name = pre_node_names[0];
+
+  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
+  if (!input_blob) {
+    LOG(ERROR) << "input_blob is nullptr,error";
+    return -1;
+  }
+  uint64_t log_id = input_blob->GetLogId();
+  VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name;
+
+  GeneralBlob *output_blob = mutable_data<GeneralBlob>();
+  if (!output_blob) {
+    LOG(ERROR) << "output_blob is nullptr,error";
+    return -1;
+  }
+  output_blob->SetLogId(log_id);
+
+  if (!input_blob) {
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed mutable depended argument, op:" << pre_name;
+    return -1;
+  }
+
+  const TensorVector *in = &input_blob->tensor_vector;
+  TensorVector *out = &output_blob->tensor_vector;
+
+  int batch_size = input_blob->_batch_size;
+  output_blob->_batch_size = batch_size;
+  VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size;
+
+  Timer timeline;
+  int64_t start = timeline.TimeStampUS();
+  timeline.Start();
+
+  // only support string type
+  char *total_input_ptr = static_cast<char *>(in->at(0).data.data());
+  std::string base64str = total_input_ptr;
+
+  cv::Mat img = Base2Mat(base64str);
+
+  // RGB2BGR
+  cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
+
+  // Resize
+  cv::Mat resize_img;
+  resize_op_.Run(img, resize_img, resize_short_size_);
+
+  // CenterCrop
+  crop_op_.Run(resize_img, crop_size_);
+
+  // Normalize
+  normalize_op_.Run(&resize_img, mean_, scale_, is_scale_);
+
+  // Permute
+  std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
+  permute_op_.Run(&resize_img, input.data());
+  float maxValue = *max_element(input.begin(), input.end());
+  float minValue = *min_element(input.begin(), input.end());
+
+  TensorVector *real_in = new TensorVector();
+  if (!real_in) {
+    LOG(ERROR) << "real_in is nullptr,error";
+    return -1;
+  }
+
+  std::vector<int> input_shape;
+  int in_num = 0;
+  void *databuf_data = NULL;
+  char *databuf_char = NULL;
+  size_t databuf_size = 0;
+
+  input_shape = {1, 3, resize_img.rows, resize_img.cols};
+  in_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
+                           std::multiplies<int>());
+
+  databuf_size = in_num * sizeof(float);
+  databuf_data = MempoolWrapper::instance().malloc(databuf_size);
+  if (!databuf_data) {
+    LOG(ERROR) << "Malloc failed, size: " << databuf_size;
+    return -1;
+  }
+
+  memcpy(databuf_data, input.data(), databuf_size);
+  databuf_char = reinterpret_cast<char *>(databuf_data);
+  paddle::PaddleBuf paddleBuf(databuf_char, databuf_size);
+  paddle::PaddleTensor tensor_in;
+  tensor_in.name = in->at(0).name;
+  tensor_in.dtype = paddle::PaddleDType::FLOAT32;
+  tensor_in.shape = {1, 3, resize_img.rows, resize_img.cols};
+  tensor_in.lod = in->at(0).lod;
+  tensor_in.data = paddleBuf;
+  real_in->push_back(tensor_in);
+
+  if (InferManager::instance().infer(engine_name().c_str(), real_in, out,
+                                     batch_size)) {
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed do infer in fluid model: " << engine_name().c_str();
+    return -1;
+  }
+
+  int64_t end = timeline.TimeStampUS();
+  CopyBlobInfo(input_blob, output_blob);
+  AddBlobInfo(output_blob, start);
+  AddBlobInfo(output_blob, end);
+  return 0;
+}
+
+cv::Mat GeneralClasOp::Base2Mat(std::string &base64_data) {
+  cv::Mat img;
+  std::string s_mat;
+  s_mat = base64Decode(base64_data.data(), base64_data.size());
+  std::vector<char> base64_img(s_mat.begin(), s_mat.end());
+  img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR
+  return img;
+}
+
+std::string GeneralClasOp::base64Decode(const char *Data, int DataByte) {
+  const char DecodeTable[] = {
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+      0, 0, 0, 0, 0, 0, 0, 0, 0,
+      62, // '+'
+      0, 0, 0,
+      63, // '/'
+      52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9'
+      0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
+      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z'
+      0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
+      37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z'
+  };
+
+  std::string strDecode;
+  int nValue;
+  int i = 0;
+  while (i < DataByte) {
+    if (*Data != '\r' && *Data != '\n') {
+      nValue = DecodeTable[*Data++] << 18;
+      nValue += DecodeTable[*Data++] << 12;
+      strDecode += (nValue & 0x00FF0000) >> 16;
+      if (*Data != '=') {
+        nValue += DecodeTable[*Data++] << 6;
+        strDecode += (nValue & 0x0000FF00) >> 8;
+        if (*Data != '=') {
+          nValue += DecodeTable[*Data++];
+          strDecode += nValue & 0x000000FF;
+        }
+      }
+      i += 4;
+    } else // 回车换行,跳过
+    {
+      Data++;
+      i++;
+    }
+  }
+  return strDecode;
+}
+DEFINE_OP(GeneralClasOp);
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
diff --git a/deploy/paddleserving/preprocess/general_clas_op.h b/deploy/paddleserving/preprocess/general_clas_op.h
new file mode 100644
index 0000000000000000000000000000000000000000..69b7a8e005872d7b66b9a61265ca5798b4ac8bab
--- /dev/null
+++ b/deploy/paddleserving/preprocess/general_clas_op.h
@@ -0,0 +1,70 @@
+// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "core/predictor/tools/pp_shitu_tools/preprocess_op.h"
+#include "paddle_inference_api.h" // NOLINT
+#include <string>
+#include <vector>
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class GeneralClasOp
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(GeneralClasOp);
+
+  int inference();
+
+private:
+  // clas preprocess
+  std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
+  std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
+  bool is_scale_ = true;
+
+  int resize_short_size_ = 256;
+  int crop_size_ = 224;
+
+  PaddleClas::ResizeImg resize_op_;
+  PaddleClas::Normalize normalize_op_;
+  PaddleClas::Permute permute_op_;
+  PaddleClas::CenterCropImg crop_op_;
+
+  // read pics
+  cv::Mat Base2Mat(std::string &base64_data);
+  std::string base64Decode(const char *Data, int DataByte);
+};
+
+} // namespace serving
+} // namespace paddle_serving
+} // namespace baidu
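For reference, a minimal Python sketch of the preprocessing chain this op performs, using the defaults declared in `general_clas_op.h` (resize the short side to 256, center-crop 224, scale by 1/255, normalize with ImageNet mean/std, HWC to CHW). It is an illustration for checking client-side preprocessing against the serving op, under those assumptions:

```python
import cv2
import numpy as np

def clas_preprocess(img_bgr, resize_short=256, crop=224,
                    mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    # BGR -> RGB, as the op does right after decoding
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    # resize so that the short side equals resize_short
    h, w = img.shape[:2]
    ratio = float(resize_short) / min(h, w)
    img = cv2.resize(img, (int(round(w * ratio)), int(round(h * ratio))))
    # center crop (assumes the resized image is at least crop x crop)
    h, w = img.shape[:2]
    y0, x0 = (h - crop) // 2, (w - crop) // 2
    img = img[y0:y0 + crop, x0:x0 + crop].astype("float32") / 255.0
    # normalize, then convert HWC -> CHW and add a batch dimension
    img = (img - np.array(mean, dtype="float32")) / np.array(std, dtype="float32")
    return img.transpose((2, 0, 1))[np.newaxis, :]
```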
diff --git a/deploy/paddleserving/preprocess/preprocess_op.cpp b/deploy/paddleserving/preprocess/preprocess_op.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..9c79342ceda115fe3c213bb6f5d32c6e56f2380a
--- /dev/null
+++ b/deploy/paddleserving/preprocess/preprocess_op.cpp
@@ -0,0 +1,149 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include "paddle_api.h"
+#include "paddle_inference_api.h"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+#include <vector>
+
+#include "preprocess_op.h"
+
+namespace Feature {
+
+void Permute::Run(const cv::Mat *im, float *data) {
+  int rh = im->rows;
+  int rw = im->cols;
+  int rc = im->channels();
+  for (int i = 0; i < rc; ++i) {
+    cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i);
+  }
+}
+
+void Normalize::Run(cv::Mat *im, const std::vector<float> &mean,
+                    const std::vector<float> &std, float scale) {
+  (*im).convertTo(*im, CV_32FC3, scale);
+  for (int h = 0; h < im->rows; h++) {
+    for (int w = 0; w < im->cols; w++) {
+      im->at<cv::Vec3f>(h, w)[0] =
+          (im->at<cv::Vec3f>(h, w)[0] - mean[0]) / std[0];
+      im->at<cv::Vec3f>(h, w)[1] =
+          (im->at<cv::Vec3f>(h, w)[1] - mean[1]) / std[1];
+      im->at<cv::Vec3f>(h, w)[2] =
+          (im->at<cv::Vec3f>(h, w)[2] - mean[2]) / std[2];
+    }
+  }
+}
+
+void CenterCropImg::Run(cv::Mat &img, const int crop_size) {
+  int resize_w = img.cols;
+  int resize_h = img.rows;
+  int w_start = int((resize_w - crop_size) / 2);
+  int h_start = int((resize_h - crop_size) / 2);
+  cv::Rect rect(w_start, h_start, crop_size, crop_size);
+  img = img(rect);
+}
+
+void ResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
+                    int resize_short_size, int size) {
+  int resize_h = 0;
+  int resize_w = 0;
+  if (size > 0) {
+    resize_h = size;
+    resize_w = size;
+  } else {
+    int w = img.cols;
+    int h = img.rows;
+
+    float ratio = 1.f;
+    if (h < w) {
+      ratio = float(resize_short_size) / float(h);
+    } else {
+      ratio = float(resize_short_size) / float(w);
+    }
+    resize_h = round(float(h) * ratio);
+    resize_w = round(float(w) * ratio);
+  }
+  cv::resize(img, resize_img, cv::Size(resize_w, resize_h));
+}
+
+} // namespace Feature
+
+namespace PaddleClas {
+void Permute::Run(const cv::Mat *im, float *data) {
+  int rh = im->rows;
+  int rw = im->cols;
+  int rc = im->channels();
+  for (int i = 0; i < rc; ++i) {
+    cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i);
+  }
+}
+
+void Normalize::Run(cv::Mat *im, const std::vector<float> &mean,
+                    const std::vector<float> &scale, const bool is_scale) {
+  double e = 1.0;
+  if (is_scale) {
+    e /= 255.0;
+  }
+  (*im).convertTo(*im, CV_32FC3, e);
+  for (int h = 0; h < im->rows; h++) {
+    for (int w = 0; w < im->cols; w++) {
+      im->at<cv::Vec3f>(h, w)[0] =
+          (im->at<cv::Vec3f>(h, w)[0] - mean[0]) / scale[0];
+      im->at<cv::Vec3f>(h, w)[1] =
+          (im->at<cv::Vec3f>(h, w)[1] - mean[1]) / scale[1];
+      im->at<cv::Vec3f>(h, w)[2] =
+          (im->at<cv::Vec3f>(h, w)[2] - mean[2]) / scale[2];
+    }
+  }
+}
+
+void CenterCropImg::Run(cv::Mat &img, const int crop_size) {
+  int resize_w = img.cols;
+  int resize_h = img.rows;
+  int w_start = int((resize_w - crop_size) / 2);
+  int h_start = int((resize_h - crop_size) / 2);
+  cv::Rect rect(w_start, h_start, crop_size, crop_size);
+  img = img(rect);
+}
+
+void ResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
+                    int resize_short_size) {
+  int w = img.cols;
+  int h = img.rows;
+
+  float ratio = 1.f;
+  if (h < w) {
+    ratio = float(resize_short_size) / float(h);
+  } else {
+    ratio = float(resize_short_size) / float(w);
+  }
+
+  int resize_h = round(float(h) * ratio);
+  int resize_w = round(float(w) * ratio);
+
+  cv::resize(img, resize_img, cv::Size(resize_w, resize_h));
+}
+
+} // namespace PaddleClas
diff --git a/deploy/paddleserving/preprocess/preprocess_op.h b/deploy/paddleserving/preprocess/preprocess_op.h
new file mode 100644
index 0000000000000000000000000000000000000000..0ea9d2e14a525365bb049a13358660a2567dadc8
--- /dev/null
+++ b/deploy/paddleserving/preprocess/preprocess_op.h
@@ -0,0 +1,81 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgcodecs.hpp"
+#include "opencv2/imgproc.hpp"
+#include <chrono>
+#include <iomanip>
+#include <iostream>
+#include <ostream>
+#include <vector>
+
+#include <cstring>
+#include <fstream>
+#include <numeric>
+
+namespace Feature {
+
+class Normalize {
+public:
+  virtual void Run(cv::Mat *im, const std::vector<float> &mean,
+                   const std::vector<float> &std, float scale);
+};
+
+// RGB -> CHW
+class Permute {
+public:
+  virtual void Run(const cv::Mat *im, float *data);
+};
+
+class CenterCropImg {
+public:
+  virtual void Run(cv::Mat &im, const int crop_size = 224);
+};
+
+class ResizeImg {
+public:
+  virtual void Run(const cv::Mat &img, cv::Mat &resize_img, int max_size_len,
+                   int size = 0);
+};
+
+} // namespace Feature
+
+namespace PaddleClas {
+
+class Normalize {
+public:
+  virtual void Run(cv::Mat *im, const std::vector<float> &mean,
+                   const std::vector<float> &scale, const bool is_scale = true);
+};
+
+// RGB -> CHW
+class Permute {
+public:
+  virtual void Run(const cv::Mat *im, float *data);
+};
+
+class CenterCropImg {
+public:
+  virtual void Run(cv::Mat &im, const int crop_size = 224);
+};
+
+class ResizeImg {
+public:
+  virtual void Run(const cv::Mat &img, cv::Mat &resize_img, int max_size_len);
+};
+
+} // namespace PaddleClas
diff --git a/deploy/paddleserving/recognition/config.yml b/deploy/paddleserving/recognition/config.yml
index 6ecc32e22435f07a549ffcdeb6a435b33c4901f1..e4108006e6f2ea1a3698e4fdf9c32f25dcbfbeb0 100644
--- a/deploy/paddleserving/recognition/config.yml
+++ b/deploy/paddleserving/recognition/config.yml
@@ -31,7 +31,7 @@ op:
 
     #Fetch结果列表,以client_config中fetch_var的alias_name为准
     fetch_list: ["features"]
-    
+
   det:
     concurrency: 1
     local_service_conf:
diff --git a/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt b/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt
new file mode 100644
index 0000000000000000000000000000000000000000..c781eb6f449fe06afbba7f96e01798c974bccf54
--- /dev/null
+++ b/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt
@@ -0,0 +1,32 @@
+feed_var {
+  name: "x"
+  alias_name: "x"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 224
+  shape: 224
+}
+feed_var {
+  name: "boxes"
+  alias_name: "boxes"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 6
+}
+fetch_var {
+  name: "save_infer_model/scale_0.tmp_1"
+  alias_name: "features"
+  is_lod_tensor: false
+  fetch_type: 1
+  shape: 512
+}
+fetch_var {
+  name: "boxes"
+  alias_name: "boxes"
+  is_lod_tensor: false
+  fetch_type: 1
+  shape: 6
+}
+
+
diff --git a/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt b/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt
new 
file mode 100644 index 0000000000000000000000000000000000000000..04812f42ed90fbbd47c73b9ec706d57c04b4c571 --- /dev/null +++ b/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt @@ -0,0 +1,30 @@ +feed_var { + name: "x" + alias_name: "x" + is_lod_tensor: false + feed_type: 1 + shape: 3 + shape: 224 + shape: 224 +} +feed_var { + name: "boxes" + alias_name: "boxes" + is_lod_tensor: false + feed_type: 1 + shape: 6 +} +fetch_var { + name: "save_infer_model/scale_0.tmp_1" + alias_name: "features" + is_lod_tensor: false + fetch_type: 1 + shape: 512 +} +fetch_var { + name: "boxes" + alias_name: "boxes" + is_lod_tensor: false + fetch_type: 1 + shape: 6 +} diff --git a/deploy/paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/serving_client_conf.prototxt b/deploy/paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/serving_client_conf.prototxt new file mode 100644 index 0000000000000000000000000000000000000000..d9ab81a8b3c275f638f314489a84deef46011d73 --- /dev/null +++ b/deploy/paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/serving_client_conf.prototxt @@ -0,0 +1,29 @@ +feed_var { + name: "im_shape" + alias_name: "im_shape" + is_lod_tensor: false + feed_type: 1 + shape: 2 +} +feed_var { + name: "image" + alias_name: "image" + is_lod_tensor: false + feed_type: 7 + shape: -1 + shape: -1 + shape: 3 +} +fetch_var { + name: "save_infer_model/scale_0.tmp_1" + alias_name: "save_infer_model/scale_0.tmp_1" + is_lod_tensor: true + fetch_type: 1 + shape: -1 +} +fetch_var { + name: "save_infer_model/scale_1.tmp_1" + alias_name: "save_infer_model/scale_1.tmp_1" + is_lod_tensor: false + fetch_type: 2 +} diff --git a/deploy/paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/serving_server_conf.prototxt b/deploy/paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/serving_server_conf.prototxt new file mode 100644 index 0000000000000000000000000000000000000000..d9ab81a8b3c275f638f314489a84deef46011d73 --- /dev/null +++ b/deploy/paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/serving_server_conf.prototxt @@ -0,0 +1,29 @@ +feed_var { + name: "im_shape" + alias_name: "im_shape" + is_lod_tensor: false + feed_type: 1 + shape: 2 +} +feed_var { + name: "image" + alias_name: "image" + is_lod_tensor: false + feed_type: 7 + shape: -1 + shape: -1 + shape: 3 +} +fetch_var { + name: "save_infer_model/scale_0.tmp_1" + alias_name: "save_infer_model/scale_0.tmp_1" + is_lod_tensor: true + fetch_type: 1 + shape: -1 +} +fetch_var { + name: "save_infer_model/scale_1.tmp_1" + alias_name: "save_infer_model/scale_1.tmp_1" + is_lod_tensor: false + fetch_type: 2 +} diff --git a/deploy/paddleserving/recognition/test_cpp_serving_client.py b/deploy/paddleserving/recognition/test_cpp_serving_client.py index a2bf1ae3e9d0a69628319b9f845a1e6f7701b391..e2cd17e855ebfe8fb286ebaeff8ab63874e2e972 100644 --- a/deploy/paddleserving/recognition/test_cpp_serving_client.py +++ b/deploy/paddleserving/recognition/test_cpp_serving_client.py @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -import sys import numpy as np from paddle_serving_client import Client @@ -22,181 +21,101 @@ import faiss import os import pickle - -class MainbodyDetect(): - """ - pp-shitu mainbody detect. 
- include preprocess, process, postprocess - return detect results - Attention: Postprocess include num limit and box filter; no nms - """ - - def __init__(self): - self.preprocess = DetectionSequential([ - DetectionFile2Image(), DetectionNormalize( - [0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True), - DetectionResize( - (640, 640), False, interpolation=2), DetectionTranspose( - (2, 0, 1)) - ]) - - self.client = Client() - self.client.load_client_config( - "../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/serving_client_conf.prototxt" - ) - self.client.connect(['127.0.0.1:9293']) - - self.max_det_result = 5 - self.conf_threshold = 0.2 - - def predict(self, imgpath): - im, im_info = self.preprocess(imgpath) - im_shape = np.array(im.shape[1:]).reshape(-1) - scale_factor = np.array(list(im_info['scale_factor'])).reshape(-1) - - fetch_map = self.client.predict( - feed={ - "image": im, - "im_shape": im_shape, - "scale_factor": scale_factor, - }, - fetch=["save_infer_model/scale_0.tmp_1"], - batch=False) - return self.postprocess(fetch_map, imgpath) - - def postprocess(self, fetch_map, imgpath): - #1. get top max_det_result - det_results = fetch_map["save_infer_model/scale_0.tmp_1"] - if len(det_results) > self.max_det_result: - boxes_reserved = fetch_map[ - "save_infer_model/scale_0.tmp_1"][:self.max_det_result] - else: - boxes_reserved = det_results - - #2. do conf threshold - boxes_list = [] - for i in range(boxes_reserved.shape[0]): - if (boxes_reserved[i, 1]) > self.conf_threshold: - boxes_list.append(boxes_reserved[i, :]) - - #3. add origin image box - origin_img = cv2.imread(imgpath) - boxes_list.append( - np.array([0, 1.0, 0, 0, origin_img.shape[1], origin_img.shape[0]])) - return np.array(boxes_list) - - -class ObjectRecognition(): - """ - pp-shitu object recognion for all objects detected by MainbodyDetect. - include preprocess, process, postprocess - preprocess include preprocess for each image and batching. - Batch process - postprocess include retrieval and nms - """ - - def __init__(self): - self.client = Client() - self.client.load_client_config( - "../../models/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt" - ) - self.client.connect(["127.0.0.1:9294"]) - - self.seq = Sequential([ - BGR2RGB(), Resize((224, 224)), Div(255), - Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], - False), Transpose((2, 0, 1)) - ]) - - self.searcher, self.id_map = self.init_index() - - self.rec_nms_thresold = 0.05 - self.rec_score_thres = 0.5 - self.feature_normalize = True - self.return_k = 1 - - def init_index(self): - index_dir = "../../drink_dataset_v1.0/index" - assert os.path.exists(os.path.join( - index_dir, "vector.index")), "vector.index not found ..." - assert os.path.exists(os.path.join( - index_dir, "id_map.pkl")), "id_map.pkl not found ... " - - searcher = faiss.read_index(os.path.join(index_dir, "vector.index")) - - with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd: - id_map = pickle.load(fd) - return searcher, id_map - - def predict(self, det_boxes, imgpath): - #1. preprocess - batch_imgs = [] - origin_img = cv2.imread(imgpath) - for i in range(det_boxes.shape[0]): - box = det_boxes[i] - x1, y1, x2, y2 = [int(x) for x in box[2:]] - cropped_img = origin_img[y1:y2, x1:x2, :].copy() - tmp = self.seq(cropped_img) - batch_imgs.append(tmp) - batch_imgs = np.array(batch_imgs) - - #2. process - fetch_map = self.client.predict( - feed={"x": batch_imgs}, fetch=["features"], batch=True) - batch_features = fetch_map["features"] - - #3. 
postprocess - if self.feature_normalize: - feas_norm = np.sqrt( - np.sum(np.square(batch_features), axis=1, keepdims=True)) - batch_features = np.divide(batch_features, feas_norm) - scores, docs = self.searcher.search(batch_features, self.return_k) - - results = [] - for i in range(scores.shape[0]): - pred = {} - if scores[i][0] >= self.rec_score_thres: - pred["bbox"] = [int(x) for x in det_boxes[i, 2:]] - pred["rec_docs"] = self.id_map[docs[i][0]].split()[1] - pred["rec_scores"] = scores[i][0] - results.append(pred) - return self.nms_to_rec_results(results) - - def nms_to_rec_results(self, results): - filtered_results = [] - x1 = np.array([r["bbox"][0] for r in results]).astype("float32") - y1 = np.array([r["bbox"][1] for r in results]).astype("float32") - x2 = np.array([r["bbox"][2] for r in results]).astype("float32") - y2 = np.array([r["bbox"][3] for r in results]).astype("float32") - scores = np.array([r["rec_scores"] for r in results]) - - areas = (x2 - x1 + 1) * (y2 - y1 + 1) - order = scores.argsort()[::-1] - while order.size > 0: - i = order[0] - xx1 = np.maximum(x1[i], x1[order[1:]]) - yy1 = np.maximum(y1[i], y1[order[1:]]) - xx2 = np.minimum(x2[i], x2[order[1:]]) - yy2 = np.minimum(y2[i], y2[order[1:]]) - - w = np.maximum(0.0, xx2 - xx1 + 1) - h = np.maximum(0.0, yy2 - yy1 + 1) - inter = w * h - ovr = inter / (areas[i] + areas[order[1:]] - inter) - inds = np.where(ovr <= self.rec_nms_thresold)[0] - order = order[inds + 1] - filtered_results.append(results[i]) - return filtered_results - - +rec_nms_thresold = 0.05 +rec_score_thres = 0.5 +feature_normalize = True +return_k = 1 +index_dir = "../../drink_dataset_v1.0/index" + + +def init_index(index_dir): + assert os.path.exists(os.path.join( + index_dir, "vector.index")), "vector.index not found ..." + assert os.path.exists(os.path.join( + index_dir, "id_map.pkl")), "id_map.pkl not found ... 
" + + searcher = faiss.read_index(os.path.join(index_dir, "vector.index")) + + with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd: + id_map = pickle.load(fd) + return searcher, id_map + + +#get box +def nms_to_rec_results(results, thresh=0.1): + filtered_results = [] + + x1 = np.array([r["bbox"][0] for r in results]).astype("float32") + y1 = np.array([r["bbox"][1] for r in results]).astype("float32") + x2 = np.array([r["bbox"][2] for r in results]).astype("float32") + y2 = np.array([r["bbox"][3] for r in results]).astype("float32") + scores = np.array([r["rec_scores"] for r in results]) + + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + while order.size > 0: + i = order[0] + xx1 = np.maximum(x1[i], x1[order[1:]]) + yy1 = np.maximum(y1[i], y1[order[1:]]) + xx2 = np.minimum(x2[i], x2[order[1:]]) + yy2 = np.minimum(y2[i], y2[order[1:]]) + + w = np.maximum(0.0, xx2 - xx1 + 1) + h = np.maximum(0.0, yy2 - yy1 + 1) + inter = w * h + ovr = inter / (areas[i] + areas[order[1:]] - inter) + inds = np.where(ovr <= thresh)[0] + order = order[inds + 1] + filtered_results.append(results[i]) + return filtered_results + + +def postprocess(fetch_dict, feature_normalize, det_boxes, searcher, id_map, + return_k, rec_score_thres, rec_nms_thresold): + batch_features = fetch_dict["features"] + + #do feature norm + if feature_normalize: + feas_norm = np.sqrt( + np.sum(np.square(batch_features), axis=1, keepdims=True)) + batch_features = np.divide(batch_features, feas_norm) + + scores, docs = searcher.search(batch_features, return_k) + + results = [] + for i in range(scores.shape[0]): + pred = {} + if scores[i][0] >= rec_score_thres: + pred["bbox"] = [int(x) for x in det_boxes[i, 2:]] + pred["rec_docs"] = id_map[docs[i][0]].split()[1] + pred["rec_scores"] = scores[i][0] + results.append(pred) + + #do nms + results = nms_to_rec_results(results, rec_nms_thresold) + return results + + +#do client if __name__ == "__main__": - det = MainbodyDetect() - rec = ObjectRecognition() - - #1. get det_results - imgpath = "../../drink_dataset_v1.0/test_images/001.jpeg" - det_results = det.predict(imgpath) - - #2. get rec_results - rec_results = rec.predict(det_results, imgpath) - print(rec_results) + client = Client() + client.load_client_config([ + "../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client", + "../../models/general_PPLCNet_x2_5_lite_v1.0_client" + ]) + client.connect(['127.0.0.1:9400']) + + im = cv2.imread("../../drink_dataset_v1.0/test_images/001.jpeg") + im_shape = np.array(im.shape[:2]).reshape(-1) + fetch_map = client.predict( + feed={"image": im, + "im_shape": im_shape}, + fetch=["features", "boxes"], + batch=False) + + #add retrieval procedure + det_boxes = fetch_map["boxes"] + searcher, id_map = init_index(index_dir) + results = postprocess(fetch_map, feature_normalize, det_boxes, searcher, + id_map, return_k, rec_score_thres, rec_nms_thresold) + print(results) diff --git a/deploy/paddleserving/test_cpp_serving_client.py b/deploy/paddleserving/test_cpp_serving_client.py index 50794b363767c8236ccca1001a441b535a9f9db3..ba5399c90dcd5e0701df26e2d2f8337a4105ab51 100644 --- a/deploy/paddleserving/test_cpp_serving_client.py +++ b/deploy/paddleserving/test_cpp_serving_client.py @@ -12,16 +12,20 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import sys +import base64 +import time + from paddle_serving_client import Client -#app -from paddle_serving_app.reader import Sequential, URL2Image, Resize -from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize -import time + +def bytes_to_base64(image: bytes) -> str: + """encode bytes into base64 string + """ + return base64.b64encode(image).decode('utf8') + client = Client() -client.load_client_config("./ResNet50_vd_serving/serving_server_conf.prototxt") +client.load_client_config("./ResNet50_client/serving_client_conf.prototxt") client.connect(["127.0.0.1:9292"]) label_dict = {} @@ -31,22 +35,17 @@ with open("imagenet.label") as fin: label_dict[label_idx] = line.strip() label_idx += 1 -#preprocess -seq = Sequential([ - URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)), - Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True) -]) - -start = time.time() -image_file = "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg" +image_file = "./daisy.jpg" for i in range(1): - img = seq(image_file) - fetch_map = client.predict( - feed={"inputs": img}, fetch=["prediction"], batch=False) - - prob = max(fetch_map["prediction"][0]) - label = label_dict[fetch_map["prediction"][0].tolist().index(prob)].strip( - ).replace(",", "") - print("prediction: {}, probability: {}".format(label, prob)) -end = time.time() -print(end - start) + start = time.time() + with open(image_file, 'rb') as img_file: + image_data = img_file.read() + image = bytes_to_base64(image_data) + fetch_dict = client.predict( + feed={"inputs": image}, fetch=["prediction"], batch=False) + prob = max(fetch_dict["prediction"][0]) + label = label_dict[fetch_dict["prediction"][0].tolist().index( + prob)].strip().replace(",", "") + print("prediction: {}, probability: {}".format(label, prob)) + end = time.time() + print(end - start) diff --git a/deploy/python/postprocess.py b/deploy/python/postprocess.py index 4f4d005fdff2bf17e04265e136443d0cd837f10e..23a803e284361e98b60f193c450318536d992937 100644 --- a/deploy/python/postprocess.py +++ b/deploy/python/postprocess.py @@ -64,9 +64,17 @@ class ThreshOutput(object): for idx, probs in enumerate(x): score = probs[1] if score < self.threshold: - result = {"class_ids": [0], "scores": [1 - score], "label_names": [self.label_0]} + result = { + "class_ids": [0], + "scores": [1 - score], + "label_names": [self.label_0] + } else: - result = {"class_ids": [1], "scores": [score], "label_names": [self.label_1]} + result = { + "class_ids": [1], + "scores": [score], + "label_names": [self.label_1] + } if file_names is not None: result["file_name"] = file_names[idx] y.append(result) @@ -179,3 +187,136 @@ class Binarize(object): byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit) return byte + + +class PersonAttribute(object): + def __init__(self, + threshold=0.5, + glasses_threshold=0.3, + hold_threshold=0.6): + self.threshold = threshold + self.glasses_threshold = glasses_threshold + self.hold_threshold = hold_threshold + + def __call__(self, batch_preds, file_names=None): + # postprocess output of predictor + age_list = ['AgeLess18', 'Age18-60', 'AgeOver60'] + direct_list = ['Front', 'Side', 'Back'] + bag_list = ['HandBag', 'ShoulderBag', 'Backpack'] + upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice'] + lower_list = [ + 'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts', + 'Skirt&Dress' + ] + batch_res = [] + for res in batch_preds: + res = res.tolist() + label_res = [] + # gender 
+ gender = 'Female' if res[22] > self.threshold else 'Male' + label_res.append(gender) + # age + age = age_list[np.argmax(res[19:22])] + label_res.append(age) + # direction + direction = direct_list[np.argmax(res[23:])] + label_res.append(direction) + # glasses + glasses = 'Glasses: ' + if res[1] > self.glasses_threshold: + glasses += 'True' + else: + glasses += 'False' + label_res.append(glasses) + # hat + hat = 'Hat: ' + if res[0] > self.threshold: + hat += 'True' + else: + hat += 'False' + label_res.append(hat) + # hold obj + hold_obj = 'HoldObjectsInFront: ' + if res[18] > self.hold_threshold: + hold_obj += 'True' + else: + hold_obj += 'False' + label_res.append(hold_obj) + # bag + bag = bag_list[np.argmax(res[15:18])] + bag_score = res[15 + np.argmax(res[15:18])] + bag_label = bag if bag_score > self.threshold else 'No bag' + label_res.append(bag_label) + # upper + upper_res = res[4:8] + upper_label = 'Upper:' + sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve' + upper_label += ' {}'.format(sleeve) + for i, r in enumerate(upper_res): + if r > self.threshold: + upper_label += ' {}'.format(upper_list[i]) + label_res.append(upper_label) + # lower + lower_res = res[8:14] + lower_label = 'Lower: ' + has_lower = False + for i, l in enumerate(lower_res): + if l > self.threshold: + lower_label += ' {}'.format(lower_list[i]) + has_lower = True + if not has_lower: + lower_label += ' {}'.format(lower_list[np.argmax(lower_res)]) + + label_res.append(lower_label) + # shoe + shoe = 'Boots' if res[14] > self.threshold else 'No boots' + label_res.append(shoe) + + threshold_list = [0.5] * len(res) + threshold_list[1] = self.glasses_threshold + threshold_list[18] = self.hold_threshold + pred_res = (np.array(res) > np.array(threshold_list) + ).astype(np.int8).tolist() + batch_res.append({"attributes": label_res, "output": pred_res}) + return batch_res + + +class VehicleAttribute(object): + def __init__(self, color_threshold=0.5, type_threshold=0.5): + self.color_threshold = color_threshold + self.type_threshold = type_threshold + self.color_list = [ + "yellow", "orange", "green", "gray", "red", "blue", "white", + "golden", "brown", "black" + ] + self.type_list = [ + "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", + "truck", "estate" + ] + + def __call__(self, batch_preds, file_names=None): + # postprocess output of predictor + batch_res = [] + for res in batch_preds: + res = res.tolist() + label_res = [] + color_idx = np.argmax(res[:10]) + type_idx = np.argmax(res[10:]) + if res[color_idx] >= self.color_threshold: + color_info = f"Color: ({self.color_list[color_idx]}, prob: {res[color_idx]})" + else: + color_info = "Color unknown" + + if res[type_idx + 10] >= self.type_threshold: + type_info = f"Type: ({self.type_list[type_idx]}, prob: {res[type_idx + 10]})" + else: + type_info = "Type unknown" + + label_res = f"{color_info}, {type_info}" + + threshold_list = [self.color_threshold + ] * 10 + [self.type_threshold] * 9 + pred_res = (np.array(res) > np.array(threshold_list) + ).astype(np.int8).tolist() + batch_res.append({"attributes": label_res, "output": pred_res}) + return batch_res diff --git a/deploy/python/predict_cls.py b/deploy/python/predict_cls.py index 64c07ea875eaa2c456393328183b7270080a64d1..49bf62fa3060b9336a3438b2ee5c25b2bac49667 100644 --- a/deploy/python/predict_cls.py +++ b/deploy/python/predict_cls.py @@ -138,13 +138,20 @@ def main(config): continue batch_results = cls_predictor.predict(batch_imgs) for number, result_dict in enumerate(batch_results): - filename = 
batch_names[number]
-            clas_ids = result_dict["class_ids"]
-            scores_str = "[{}]".format(", ".join("{:.2f}".format(
-                r) for r in result_dict["scores"]))
-            label_names = result_dict["label_names"]
-            print("{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".
-                  format(filename, clas_ids, scores_str, label_names))
+            if "PersonAttribute" in config[
+                    "PostProcess"] or "VehicleAttribute" in config[
+                        "PostProcess"]:
+                filename = batch_names[number]
+                print("{}:\t {}".format(filename, result_dict))
+            else:
+                filename = batch_names[number]
+                clas_ids = result_dict["class_ids"]
+                scores_str = "[{}]".format(", ".join("{:.2f}".format(
+                    r) for r in result_dict["scores"]))
+                label_names = result_dict["label_names"]
+                print(
+                    "{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".
+                    format(filename, clas_ids, scores_str, label_names))
         batch_imgs = []
         batch_names = []
     if cls_predictor.benchmark:
diff --git a/deploy/slim/quant_post_static.py b/deploy/slim/quant_post_static.py
index 5c8469794ad29e18dad15f985b611e423fd4b474..20507c66ad1ed583c2baf1bae7e812e0364e015e 100644
--- a/deploy/slim/quant_post_static.py
+++ b/deploy/slim/quant_post_static.py
@@ -43,6 +43,7 @@ def main():
                      'inference.pdiparams'))
     config["DataLoader"]["Eval"]["sampler"]["batch_size"] = 1
     config["DataLoader"]["Eval"]["loader"]["num_workers"] = 0
+    init_logger()
 
     device = paddle.set_device("cpu")
     train_dataloader = build_dataloader(config["DataLoader"], "Eval", device,
@@ -67,6 +68,7 @@ def main():
         quantize_model_path=os.path.join(
             config["Global"]["save_inference_dir"], "quant_post_static_model"),
         sample_generator=sample_generator(train_dataloader),
+        batch_size=config["DataLoader"]["Eval"]["sampler"]["batch_size"],
         batch_nums=10)
diff --git a/docs/en/PULC/PULC_car_exists_en.md b/docs/en/PULC/PULC_car_exists_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..33c0932e6f118d7f9e31650e7d1e9754af19ec17
--- /dev/null
+++ b/docs/en/PULC/PULC_car_exists_en.md
@@ -0,0 +1,457 @@
+# PULC Classification Model for Car Presence
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+<a name="1"></a>
+
+## 1. Introduction
+
+This case shows how to quickly build a lightweight, high-accuracy, practical car-presence classification model with PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in surveillance and large-scale data filtering scenarios.
+
+The following table lists the relevant metrics of the model. 
The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. In the third to sixth rows, the backbone is replaced by PPLCNet, with the SSLD pretrained model, the EDA strategy and the SKL-UGI knowledge distillation strategy added in turn.
+
+| Backbone | Tpr(%) | Latency(ms) | Size(M)| Training Strategy |
+|-------|----------------|----------|---------------|---------------|
+| SwinTransformer_tiny | 97.71 | 95.30 | 111 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 81.23 | 2.85 | 2.7 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 94.72 | 2.12 | 7.1 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 95.48 | 2.12 | 7.1 | using SSLD pretrained model |
+| PPLCNet_x1_0 | 95.48 | 2.12 | 7.1 | using SSLD pretrained model + EDA strategy |
+| PPLCNet_x1_0 | 95.92 | 2.12 | 7.1 | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
+
+The table shows that the Tpr is high when the backbone is SwinTransformer_tiny, but inference is slow. Replacing the backbone with the lightweight MobileNetV3_small_x0_35 greatly improves the speed, at the cost of a much lower Tpr. Switching to the faster backbone PPLCNet_x1_0 raises the Tpr by more than 13 percentage points over MobileNetV3_small_x0_35 while also running more than 20% faster. On top of that, using the SSLD pretrained model improves the Tpr by about 0.7 percentage points without affecting the inference speed, and adding SKL-UGI knowledge distillation gains a further 0.44 percentage points. At this point, the Tpr is close to that of SwinTransformer_tiny, while inference is more than 40 times faster. The training method and deployment instructions of PULC are introduced in detail below.
+
+**Note**:
+
+* About the `Tpr` metric, please refer to the note in [section 3.3](#3.3) for more information.
+* The latency is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+<a name="2"></a>
+
+## 2. Quick Start
+
+<a name="2.1"></a>
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more installation options, such as other versions.
+
+<a name="2.2"></a>
+
+### 2.2 PaddleClas wheel Installation
+
+Install PaddleClas with the following command:
+
+```bash
+pip3 install paddleclas
+```
+
+<a name="2.3"></a>
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=car_exists --infer_imgs=pulc_demo_imgs/car_exists/objects365_00001507.jpeg
+```
+
+Results:
+
+```
+>>> result
+class_ids: [1], scores: [0.9871138], label_names: ['contains_car'], filename: pulc_demo_imgs/car_exists/objects365_00001507.jpeg
+Predict complete!
+```
+
+**Note**: If you want to test other images, just change the `--infer_imgs` argument; a directory of images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="car_exists")
+result = model.predict(input_data="pulc_demo_imgs/car_exists/objects365_00001507.jpeg")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to call it with `next()` or iterate over it with a `for` loop. It predicts a batch of `batch_size` images per call and returns the corresponding results. The default `batch_size` is 1, and you can also specify it when instantiating the model, e.g. `model = paddleclas.PaddleClas(model_name="car_exists", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [1], 'scores': [0.9871138], 'label_names': ['contains_car'], 'filename': 'pulc_demo_imgs/car_exists/objects365_00001507.jpeg'}]
+```
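+For directory input, iterating over the generator is the convenient pattern. A minimal sketch (the demo directory and `batch_size=2` are just examples):
+
+```python
+import paddleclas
+
+model = paddleclas.PaddleClas(model_name="car_exists", batch_size=2)
+# predict() yields one list of result dicts per batch
+for batch_results in model.predict(input_data="pulc_demo_imgs/car_exists/"):
+    for res in batch_results:
+        print(res["filename"], res["label_names"], res["scores"])
+```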
+
+<a name="3"></a>
+
+## 3. Training, Evaluation and Inference
+
+<a name="3.1"></a>
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for installation instructions.
+
+<a name="3.2"></a>
+
+### 3.2 Dataset
+
+<a name="3.2.1"></a>
+
+#### 3.2.1 Dataset Introduction
+
+All datasets used in this case are open source. The training and validation data are subsets of the [Objects365](https://www.objects365.org/overview.html) dataset, and ImageNet_val is the validation set of [ImageNet-1k](https://www.image-net.org/).
+
+<a name="3.2.2"></a>
+
+#### 3.2.2 Getting Dataset
+
+The data used in this case is obtained by processing the open-source data, as follows:
+
+- Training data: generated from the annotation file of the Objects365 training set. If an image contains a box labeled "car" whose area exceeds 10% of the whole image, the image is treated as containing a car. If an image contains no vehicle label of any kind (car, bus and so on), it is treated as not containing a car. After processing, 108629 images were obtained: 27422 containing a car and 81207 without.
+- Validation data: processed in the same way as the training data, then checked manually to remove mislabeled images.
+
+**Note**: The labels of Objects365 are not completely mutually exclusive. For example, an F1 racing car may be labeled "F1 formula" or "car". To reduce this interference, we only keep images with the "car" label as positives, and images without any vehicle label as negatives.
+
+Some images of the processed dataset are shown below:
+
+![](../../images/PULC/docs/car_exists_data_demo.jpeg)
+
+You can also download the processed data directly.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/car_exists.tar
+tar -xf car_exists.tar
+cd ../
+```
+
+The contents of the `car_exists` directory:
+
+```
+
+├── objects365_car
+│   ├── objects365_00000039.jpg
+│   ├── objects365_00000099.jpg
+├── ImageNet_val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+...
+├── train_list.txt
+├── train_list.txt.debug
+├── train_list_for_distill.txt
+├── val_list.txt
+└── val_list.txt.debug
+```
+
+Here `objects365_car/` holds the training and validation images, and `train_list.txt` and `val_list.txt` are the label files of the training data and validation data respectively. The files `train_list.txt.debug` and `val_list.txt.debug` are small subsets of `train_list.txt` and `val_list.txt` for debugging. `ImageNet_val/` is the validation data of ImageNet-1k, which is used for SKL-UGI knowledge distillation; its label file is `train_list_for_distill.txt`. An illustrative excerpt of the label-file format follows.
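+Each line of these label files pairs an image path (relative to the dataset root) with a class id, where `1` corresponds to `contains_car` and `0` to `no_car`, matching the prediction output shown earlier. The labels below are made up for illustration:
+
+```
+objects365_car/objects365_00000039.jpg 1
+objects365_car/objects365_00000099.jpg 0
+```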
+
+**Note**:
+
+* About the contents format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+* About `train_list_for_distill.txt`, please refer to [Knowledge Distillation Label](../advanced_tutorials/distillation/distillation_en.md).
+
+<a name="3.3"></a>
+
+### 3.3 Training
+
+The training configuration is described in detail in `ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml`. Start training with the following command:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml
+```
+
+The best metric on the validation data is between `0.95` and `0.96`. The result fluctuates because the dataset is small.
+
+**Note**:
+
+* The metric Tpr, which describes the True Positive Rate when the False Positive Rate is below a given threshold (1/100 in this case), is a commonly used metric for binary classification. For details about Fpr and Tpr, please refer to [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
+* During evaluation, the best TprAtFpr metric is printed, including the `Fpr`, the `Tpr` and the current `threshold`. `Tpr` is the recall under the current `Fpr`; the higher the `Tpr`, the better the model. The printed `threshold` is the classification threshold at the best metric and is the value to use at deployment time. A reference sketch of this computation follows.
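+For reference, a minimal NumPy sketch of the TprAtFpr computation (assuming `scores` holds the predicted probabilities of the positive class and `labels` the ground-truth 0/1 values; an illustration, not the evaluator shipped with PaddleClas):
+
+```python
+import numpy as np
+
+def tpr_at_fpr(scores, labels, max_fpr=0.01):
+    """Return the best Tpr whose Fpr stays below max_fpr, and its threshold."""
+    scores, labels = np.asarray(scores), np.asarray(labels)
+    n_pos = max((labels == 1).sum(), 1)
+    n_neg = max((labels == 0).sum(), 1)
+    best_tpr, best_thr = 0.0, 1.0
+    for thr in np.unique(scores):
+        pred = scores >= thr
+        fpr = (pred & (labels == 0)).sum() / n_neg
+        tpr = (pred & (labels == 1)).sum() / n_pos
+        if fpr <= max_fpr and tpr > best_tpr:
+            best_tpr, best_thr = tpr, float(thr)
+    return best_tpr, best_thr
+```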
+
+<a name="3.4"></a>
+
+### 3.4 Evaluation
+
+After training, you can evaluate the model with the following command:
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+The argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights; change it if your weights are stored elsewhere.
+
+<a name="3.5"></a>
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+The results:
+
+```
+[{'class_ids': [1], 'scores': [0.9871138], 'label_names': ['contains_car'], 'filename': 'deploy/images/PULC/car_exists/objects365_00001507.jpeg'}]
+```
+
+**Note**:
+
+* The argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights; change it if needed.
+* The default test image is `deploy/images/PULC/car_exists/objects365_00001507.jpeg`. To test another image, specify `-o Infer.infer_imgs=path_to_test_image`.
+* The default threshold is `0.5`. If needed, override it with `-o Infer.PostProcess.threshold=0.9794`; the threshold should be chosen for your specific case, and `0.9794` is the best threshold when `Fpr` is less than `1/100` on this validation dataset.
+
+<a name="4"></a>
+
+## 4. Model Compression
+
+<a name="4.1"></a>
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple yet effective knowledge distillation algorithm proposed by PaddleClas.
+
+<a name="4.1.1"></a>
+
+#### 4.1.1 Teacher Model Training
+
+Train the teacher model with the hyperparameters specified in `ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml`:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+The best metric on the validation data is between `0.96` and `0.98`. The best teacher model weights are saved in `output/ResNet101_vd/best_model.pdparams`.
+
+<a name="4.1.2"></a>
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml`: the teacher model is `ResNet101_vd`, the student model is `PPLCNet_x1_0`, and the validation set of ImageNet-1k serves as additional unlabeled training data. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric is between `0.95` and `0.97`. The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+<a name="5"></a>
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.3](#3.3) and [section 4.1](#4.1) come from the hyperparameters searching strategy in PaddleClas. To get better results on your own dataset, please refer to [Hyperparameters Searching](PULC_train_en.md#4).
+
+**Note**: This section is optional. The search takes a long time, so run it selectively as your task requires; if you do not replace the dataset, you can skip it.
+
+<a name="6"></a>
+
+## 6. Inference Deployment
+
+<a name="6.1"></a>
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with predicting directly from a pretrained model, Paddle Inference can apply additional optimizations to accelerate prediction and achieve better performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference requires a Paddle Inference model; the two ways of getting one are described below. To use the model provided by PaddleClas, download it directly, see [Downloading Inference Model](#6.1.2).
+
+<a name="6.1.1"></a>
+
+### 6.1.1 Exporting Paddle Inference Model
+
+Export the Paddle Inference model with the following command:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_car_exists_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_car_exists_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_car_exists_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here comes from knowledge distillation training. 
If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams` instead.
+
+<a name="6.1.2"></a>
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the exported model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/car_exists_infer.tar && tar -xf car_exists_infer.tar
+```
+
+After decompression, the `models` directory should look like this:
+
+```
+├── car_exists_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+<a name="6.2"></a>
+
+### 6.2 Prediction with Python
+
+<a name="6.2.1"></a>
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify whether there is a car in the image `./images/PULC/car_exists/objects365_00001507.jpeg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/car_exists/inference_car_exists.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/car_exists/inference_car_exists.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+objects365_00001507.jpeg: class id(s): [1], score(s): [0.99], label_name(s): ['contains_car']
+```
+
+**Note**: The default threshold is `0.5`. If needed, override it with `-o Infer.PostProcess.threshold=0.9794`; the threshold should be chosen for your specific case, and `0.9794` is the best threshold when `Fpr` is less than `1/100` on this validation dataset. Please refer to [section 3.3](#3.3) for details.
+
+<a name="6.2.2"></a>
+
+#### 6.2.2 Images Prediction
+
+To predict all images in a directory, set `Global.infer_imgs` to the directory path with `-o Global.infer_imgs`:
+
+```shell
+# Use the following command to predict with GPU. To use the CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/car_exists/inference_car_exists.yaml -o Global.infer_imgs="./images/PULC/car_exists/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+objects365_00001507.jpeg: class id(s): [1], score(s): [0.99], label_name(s): ['contains_car']
+objects365_00001521.jpeg: class id(s): [0], score(s): [0.99], label_name(s): ['no_car']
+```
+
+In the results above, `contains_car` means there is a car in the image and `no_car` means there is not.
+
+<a name="6.3"></a>
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of deployment with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+<a name="6.4"></a>
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance serving framework for machine learning models. It supports several protocols, such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating-system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of deployment as a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+<a name="6.5"></a>
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open-source deep learning framework designed to make inference easy on mobile, embedded, and IoT devices. 
Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of mobile deployment with Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+<a name="6.6"></a>
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX converts Paddle Inference models to ONNX models, which can then be deployed on inference engines such as TensorRT, OpenVINO, MNN/TNN and NCNN. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of converting a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and predicting with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_language_classification_en.md b/docs/en/PULC/PULC_language_classification_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..c7cd5f5db9c01f01c4fbb2299086bc1adcfc98d1
--- /dev/null
+++ b/docs/en/PULC/PULC_language_classification_en.md
@@ -0,0 +1,470 @@
+# PULC Language Classification Model
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+<a name="1"></a>
+
+## 1. Introduction
+
+This case shows how to quickly build a lightweight, high-accuracy, practical model for classifying the language of text in an image with PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in scenarios involving multilingual OCR processing, such as finance and government affairs.
+
+The following table lists the relevant metrics of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. In the third to sixth rows, the backbone is replaced by PPLCNet, with the SSLD pretrained model, the EDA strategy and the SKL-UGI knowledge distillation strategy added in turn. When the backbone is replaced with PPLCNet_x1_0, the input shape of the model is changed to [192, 48] and the stride of the network is changed to [2, [2, 1], [2, 1], [2, 1]], which better fits the wide, short shape of text-line crops.
+
+| Backbone | Top1-Acc(%) | Latency(ms) | Size(M)| Training Strategy |
+| ----------------------- | --------- | -------- | ------- | ---------------------------------------------- |
+| SwinTransformer_tiny | 98.12 | 89.09 | 111 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 95.92 | 2.98 | 3.7 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 98.35 | 2.58 | 7.1 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 98.70 | 2.58 | 7.1 | using SSLD pretrained model |
+| PPLCNet_x1_0 | 99.12 | 2.58 | 7.1 | using SSLD pretrained model + EDA strategy |
+| **PPLCNet_x1_0** | **99.26** | **2.58** | **7.1** | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
+
+The table shows that the accuracy is high when the backbone is SwinTransformer_tiny, but inference is slow. Replacing the backbone with the lightweight MobileNetV3_small_x0_35 greatly improves the speed, at the cost of much lower accuracy. Switching to the faster backbone PPLCNet_x1_0 and changing the input shape and stride of the network raises the accuracy by 2.43 percentage points over MobileNetV3_small_x0_35 while running more than 20% faster. On top of that, using the SSLD pretrained model improves the accuracy by about 0.35 percentage points without affecting the inference speed, the EDA strategy adds a further 0.42 percentage points, and SKL-UGI knowledge distillation another 0.14 percentage points. At this point, the accuracy exceeds that of SwinTransformer_tiny, while inference is far faster. The training method and deployment instructions of PULC are introduced in detail below.
+
+**Note**:
+
+* The latency is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+<a name="2"></a>
+
+## 2. Quick Start
+
+<a name="2.1"></a>
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more installation options, such as other versions.
+
+<a name="2.2"></a>
+
+### 2.2 PaddleClas wheel Installation
+
+Install PaddleClas with the following command:
+
+```bash
+pip3 install paddleclas
+```
+
+<a name="2.3"></a>
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=language_classification --infer_imgs=pulc_demo_imgs/language_classification/word_35404.png
+```
+
+Results:
+
+```
+>>> result
+class_ids: [4, 6], scores: [0.88672, 0.01434], label_names: ['japan', 'korean'], filename: pulc_demo_imgs/language_classification/word_35404.png
+Predict complete!
+```
+
+**Note**: If you want to test other images, just change the `--infer_imgs` argument; a directory of images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="language_classification")
+result = model.predict(input_data="pulc_demo_imgs/language_classification/word_35404.png")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to call it with `next()` or iterate over it with a `for` loop. It predicts a batch of `batch_size` images per call and returns the corresponding results. The default `batch_size` is 1, and you can also specify it when instantiating the model, e.g. `model = paddleclas.PaddleClas(model_name="language_classification", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [4, 6], 'scores': [0.88672, 0.01434], 'label_names': ['japan', 'korean'], 'filename': 'pulc_demo_imgs/language_classification/word_35404.png'}]
+```
+
+<a name="3"></a>
+
+## 3. Training, Evaluation and Inference
+
+<a name="3.1"></a>
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for installation instructions.
+
+<a name="3.2"></a>
+
+### 3.2 Dataset
+
+<a name="3.2.1"></a>
+
+#### 3.2.1 Dataset Introduction
+
+The models we provide are trained with internal data that has not been open-sourced yet, so it is suggested to construct a dataset from the open-source [Multi-lingual scene text detection and recognition](https://rrc.cvc.uab.es/?ch=15&com=downloads) dataset to experience this case.
+
+Some images of the processed dataset are shown below:
+
+![](../../images/PULC/docs/language_classification_original_data.png)
+
+<a name="3.2.2"></a>
+
+#### 3.2.2 Getting Dataset
+
+The provided models can classify 10 languages, shown in the following list:
+
+`0`: Arabic
+`1`: chinese_cht (traditional Chinese)
+`2`: Cyrillic
+`3`: Devanagari
+`4`: Japanese
+`5`: ka (Kannada)
+`6`: Korean
+`7`: ta (Tamil)
+`8`: te (Telugu)
+`9`: Latin
+
+The `Multi-lingual scene text detection and recognition` dataset only includes Arabic, Japanese, Korean and Latin data. For each of the four languages, 1600 images are taken as training data, 300 as evaluation data, and 400 as supplementary data for SKL-UGI knowledge distillation.
+
+Therefore, the language categories of the demo dataset in this case are:
+
+`0`: arabic
+`4`: japan
+`6`: korean
+`9`: latin
+
+**Note**: The images used in this task should be text-line crops from the original images; only the text-line region is used as the image data.
+
+If you want to create your own dataset, collect and sort the data of the languages required by your task. You can also download the processed data directly.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/language_classification.tar
+tar -xf language_classification.tar
+cd ../
+```
+
+The contents of the `language_classification` directory:
+
+```
+├── img
+│   ├── word_1.png
+│   ├── word_2.png
+...
+├── train_list.txt
+├── train_list_for_distill.txt
+├── test_list.txt
+└── label_list.txt
+```
+
+Here `img/` contains 9200 images of the 4 languages. `train_list.txt` and `test_list.txt` are the label files of the training data and validation data respectively, `label_list.txt` is the mapping file of the four languages, and `train_list_for_distill.txt` is the label list of the images used for SKL-UGI knowledge distillation. Illustrative excerpts follow.
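+For reference, each line of the label files pairs a relative image path with a class id, and `label_list.txt` maps the ids to language names. The entries below are illustrative, not copied from the actual files:
+
+```
+# train_list.txt (illustrative)
+img/word_1.png 4
+img/word_2.png 9
+
+# label_list.txt (illustrative)
+0 arabic
+4 japan
+6 korean
+9 latin
+```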
+
+**Note**:
+
+* For the content format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+* For `train_list_for_distill.txt`, please refer to [Knowledge Distillation Label](../advanced_tutorials/distillation/distillation_en.md).
+
+
+### 3.3 Training
+
+The details of the training config are in `ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+        -o Arch.class_num=4
+```
+
+**Note**: Because the demo dataset has 4 classes, the argument `-o Arch.class_num=4` must be specified to change the number of prediction classes of the model to 4.
+
+
+### 3.4 Evaluation
+
+After training, you can use the following command to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" \
+    -o Arch.class_num=4
+```
+
+In the command above, `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file; you can specify another path if needed.
+
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" \
+    -o Arch.class_num=4
+```
+
+The results:
+
+```
+[{'class_ids': [4, 9], 'scores': [0.96809, 0.01001], 'file_name': 'deploy/images/PULC/language_classification/word_35404.png', 'label_names': ['japan', 'latin']}]
+```
+
+**Note**:
+
+* In the command above, `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file; you can specify another path if needed.
+* The default test image is `deploy/images/PULC/language_classification/word_35404.png`. You can test another image by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+* In the prediction results, `japan` means Japanese and `korean` means Korean.
+
+
+## 4. Model Compression
+
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple yet effective knowledge distillation algorithm proposed by PaddleClas.
+
+
+#### 4.1.1 Teacher Model Training
+
+Train the teacher model with the hyperparameters specified in `ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd \
+        -o Arch.class_num=4
+```
+
+The best teacher model weights are saved in `output/ResNet101_vd/best_model.pdparams`.
+
+**Note**: Training the ResNet101_vd model requires more GPU memory. If memory is insufficient, you can reduce the learning rate and batch size in the same proportion.
+
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml`; the teacher model is `ResNet101_vd` and the student model is `PPLCNet_x1_0`. A simplified sketch of how such a config wires the two models together is shown below.
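+
+The YAML fragment below is only an illustrative sketch of the typical teacher/student layout in a PaddleClas distillation config; the field nesting is inferred from the `-o Arch.models.0.Teacher.pretrained` override used in the command that follows, and everything else is an assumption. Refer to the actual config file for the authoritative contents.
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  models:
+    - Teacher:
+        name: ResNet101_vd     # trained in section 4.1.1
+        pretrained: null       # set at launch time via -o Arch.models.0.Teacher.pretrained
+    - Student:
+        name: PPLCNet_x1_0
+        pretrained: True       # assumption: start from pretrained weights
+```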
+The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model \
+        -o Arch.class_num=4
+```
+
+The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` procedure in PaddleClas. If you want better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to obtain better hyperparameters.
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your own situation. If you do not change the dataset, you can skip this section.
+
+
+## 6. Inference Deployment
+
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with making predictions directly from the pretrained model, Paddle Inference can apply tools to accelerate prediction and thus achieve better performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference needs a Paddle Inference model for prediction, and two ways to get such a model are described here. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle Inference model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_language_classification_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_language_classification_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_language_classification_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here comes from knowledge distillation training. If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/language_classification_infer.tar && tar -xf language_classification_infer.tar
+```
+
+After decompression, the `models` directory should look as shown below.
+
+```
+├── language_classification_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+### 6.2 Prediction with Python
+
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify the language of the image `./images/PULC/language_classification/word_35404.png`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+word_35404.png: class id(s): [4, 6], score(s): [0.89, 0.01], label_name(s): ['japan', 'korean']
+```
+
+**Note**: In the prediction results, `japan` means Japanese and `korean` means Korean.
+
+
+#### 6.2.2 Images Prediction
+
+To predict the images in a directory, specify the directory path through the `Global.infer_imgs` argument with `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU; to predict with CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml -o Global.infer_imgs="./images/PULC/language_classification/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+word_17.png: class id(s): [9, 4], score(s): [0.80, 0.09], label_name(s): ['latin', 'japan']
+word_20.png: class id(s): [0, 4], score(s): [0.91, 0.02], label_name(s): ['arabic', 'japan']
+word_35404.png: class id(s): [4, 6], score(s): [0.89, 0.01], label_name(s): ['japan', 'korean']
+```
+
+In the prediction results above, `japan` means Japanese, `latin` means Latin, `arabic` means Arabic and `korean` means Korean.
+
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance serving framework for machine learning models. It supports different protocols such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting Paddle Inference models to ONNX models, which can then be deployed on different inference engines such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and run prediction with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
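+
+For orientation, the conversion step itself typically boils down to a single paddle2onnx call along the following lines. This is a sketch only: the paths assume the inference model exported in section 6.1, and you should check the linked paddle2onnx documentation for the authoritative flags.
+
+```shell
+# convert the exported Paddle Inference model to ONNX (illustrative sketch)
+paddle2onnx --model_dir ./deploy/models/PPLCNet_x1_0_language_classification_infer \
+    --model_filename inference.pdmodel \
+    --params_filename inference.pdiparams \
+    --save_file ./language_classification.onnx
+```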
diff --git a/docs/en/PULC/PULC_model_list_en.md b/docs/en/PULC/PULC_model_list_en.md new file mode 100644 index 0000000000000000000000000000000000000000..a7de0ce2c996132e6c882a10f5fcecd22398cc22 --- /dev/null +++ b/docs/en/PULC/PULC_model_list_en.md @@ -0,0 +1,25 @@ +# PULC Model Zoo + +------ + +The PULC model zoo is provided here, mainly providing indicators, model storage size, and download links of the model. The pre-trained model can be used for fine-tuning training, and the inference model can be directly used for prediction and deployment. + + +|Model name| Model Description | Metrics |Storage Size| Latency| Download Address| +| --- | --- | --- | --- | --- | --- | +| person_exists |[Human Exists Classification](PULC_person_exists_en.md)| 96.23 |7.0M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_exists_pretrained.pdparams)| +| person_attribute |[Pedestrian Attribute Classification](PULC_person_attribute_en.md)| 78.59 |7.2M|2.01ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_attribute_pretrained.pdparams)| +| safety_helmet |[Classification of Wheather Wearing Safety Helmet](PULC_safety_helmet_en.md)| 99.38 |7.1M|2.03ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/safety_helmet_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/safety_helmet_pretrained.pdparams)| +| traffic_sign |[Traffic Sign Classification](PULC_traffic_sign_en.md)| 98.35 |8.2M|2.10ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/traffic_sign_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/traffic_sign_pretrained.pdparams)| +| vehicle_attribute |[Vehicle Attribute Classification](PULC_vehicle_attribute_en.md)| 90.81 |7.2M|2.36ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/vehicle_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/vehicle_attribute_pretrained.pdparams)| +| car_exists |[Car Exists Classification](PULC_car_exists_en.md) | 95.92 | 7.1M | 2.38ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/car_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/car_exists_pretrained.pdparams)| +| text_image_orientation |[Text Image Orientation Classification](PULC_text_image_orientation_en.md)| 99.06 | 7.1M | 2.16ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/text_image_orientation_pretrained.pdparams)| +| textline_orientation |[Text-line Orientation Classification](PULC_textline_orientation_en.md)| 96.01 |7.0M|2.72ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/textline_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/textline_orientation_pretrained.pdparams)| +| language_classification |[Language Classification](PULC_language_classification_en.md)| 99.26 |7.1M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/language_classification_infer.tar) / [pretrained 
+
+
+**Note:**
+
+* The backbone of all the above models is PPLCNet_x1_0. The differing sizes of some models are caused by the different output sizes of the classification layer. The inference time is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with the MKLDNN acceleration strategy enabled and 10 threads; slight fluctuations are to be expected during speed tests.
+
+* The evaluation metric of person_exists, safety_helmet, and car_exists is TprAtFpr; the evaluation metric of person_attribute and vehicle_attribute is ma; and the evaluation metric of traffic_sign, text_image_orientation, textline_orientation and language_classification is Top-1 Acc.
diff --git a/docs/en/PULC/PULC_person_attribute_en.md b/docs/en/PULC/PULC_person_attribute_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..173313aad1a684289f3a6825cdf73ea01493847d
--- /dev/null
+++ b/docs/en/PULC/PULC_person_attribute_en.md
@@ -0,0 +1,448 @@
+# PULC Recognition Model of Person Attribute
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+
+## 1. Introduction
+
+This case provides a way for users to quickly build a lightweight, high-precision and practical person-attribute classification model using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in pedestrian analysis, pedestrian tracking and similar scenarios.
+
+The following table lists the relevant metrics of the models. The first three rows are the results of training with Res2Net200_vd_26w_4s, SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The fourth to seventh rows show the results of replacing the backbone with PPLCNet_x1_0 and then additionally applying the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
+
+
+| Backbone | ma(%) | Latency(ms) | Size(M) | Training Strategy |
+|-------|-----------|----------|---------------|---------------|
+| Res2Net200_vd_26w_4s | 81.25 | 77.51 | 293 | using ImageNet pretrained |
+| SwinTransformer_tiny | 80.17 | 89.51 | 111 | using ImageNet pretrained |
+| MobileNetV3_small_x0_35 | 70.79 | 2.90 | 1.7 | using ImageNet pretrained |
+| PPLCNet_x1_0 | 76.31 | 2.01 | 7.1 | using ImageNet pretrained |
+| PPLCNet_x1_0 | 77.31 | 2.01 | 7.1 | using SSLD pretrained |
+| PPLCNet_x1_0 | 77.71 | 2.01 | 7.1 | using SSLD pretrained + EDA strategy|
+| PPLCNet_x1_0 | 78.59 | 2.01 | 7.1 | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy|
+
+As the table shows, the ma metric is high when the backbone is Res2Net200_vd_26w_4s or SwinTransformer_tiny, but inference is slow in both cases. Replacing the backbone with the lightweight MobileNetV3_small_x0_35 greatly improves speed, but the ma metric drops sharply. Switching to the faster PPLCNet_x1_0 backbone yields an ma about 5.5 percentage points higher than MobileNetV3_small_x0_35 while also running more than 20% faster. On top of that, using the SSLD pretrained model improves ma by about 1 percentage point without affecting inference speed, adding the EDA strategy increases it by another 0.4 percentage points, and SKL-UGI knowledge distillation raises it by a further 0.88 percentage points. At this point, the ma of PPLCNet_x1_0 is only 1.58 percentage points below SwinTransformer_tiny, while inference is more than 44 times faster. The PULC training method and deployment instructions are introduced in detail below.
+
+**Note**:
+
+* The latency is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+
+## 2. Quick Start
+
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more installation information, for example for other versions.
+
+
+### 2.2 PaddleClas wheel Installation
+
+Install PaddleClas with the following command:
+
+```bash
+pip3 install paddleclas
+```
+
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=person_attribute --infer_imgs=pulc_demo_imgs/person_attribute/090004.jpg
+```
+
+Results:
+
+```
+>>> result
+attributes: ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], output: [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], filename: pulc_demo_imgs/person_attribute/090004.jpg
+Predict complete!
+```
+
+**Note**: To test other images, simply specify the `--infer_imgs` argument; a directory containing images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="person_attribute")
+result = model.predict(input_data="pulc_demo_imgs/person_attribute/090004.jpg")
+print(next(result))
+```
+
+**Note**: `model.predict()` returns a generator, so you need to call `next()` on it or iterate over it with a `for` loop. Each call predicts a batch of `batch_size` images and returns the results. The default `batch_size` is 1, and you can also set `batch_size` when instantiating the model, e.g. `model = paddleclas.PaddleClas(model_name="person_attribute", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], 'filename': 'pulc_demo_imgs/person_attribute/090004.jpg'}]
+```
+
+
+## 3. Training, Evaluation and Inference
+
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+
+### 3.2 Dataset
+
+
+#### 3.2.1 Dataset Introduction
+
+The data used in this case is the [pa100k dataset](https://www.v7labs.com/open-datasets/pa-100k).
+
+
+#### 3.2.2 Getting Dataset
+
+Some images of the processed dataset are shown below:
+
+![](../../images/PULC/docs/person_attribute_data_demo.png)
+
+We converted the data into PaddleClas's multi-label data format, which can be downloaded directly.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/pa100k.tar
+tar -xf pa100k.tar
+cd ../
+```
+
+The contents of the `pa100k` directory:
+
+```
+pa100k
+├── train
+│   ├── 000001.jpg
+│   ├── 000002.jpg
+...
+├── val
+│   ├── 080001.jpg
+│   ├── 080002.jpg
+...
+├── test
+│   ├── 090001.jpg
+│   ├── 090002.jpg
+...
+...
+├── train_list.txt
+├── train_val_list.txt
+├── val_list.txt
+├── test_list.txt
+```
+
+Here `train/`, `val/` and `test/` are the training, validation and test sets respectively, and `train_list.txt`, `val_list.txt` and `test_list.txt` are their label files. `test_list.txt` is not used in this example for now. For illustration, a hypothetical snippet of the multi-label format is shown below.
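+
+As a purely hypothetical illustration of the multi-label format (this example line is made up; the real values come from the downloaded files), each line of the list files pairs an image path with a comma-separated 0/1 vector, one element per attribute (26 in this dataset):
+
+```
+train/000001.jpg 0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,1
+```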
+
+
+### 3.3 Training
+
+The details of the training config are in `./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
+```
+
+The best metric on the validation set is around `77.71%` (the dataset is small, so the metric generally fluctuates by about 0.3%).
+
+
+### 3.4 Evaluation
+
+After training, you can use the following command to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+In the command above, `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file; you can specify another path if needed.
+
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+The results:
+
+```
+[{'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}]
+```
+
+**Note**:
+
+* In the command above, `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file; you can specify another path if needed.
+* The default test image is `deploy/images/PULC/person_attribute/090004.jpg`. You can test another image by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+
+
+## 4. Model Compression
+
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple yet effective knowledge distillation algorithm proposed by PaddleClas.
+
+
+#### 4.1.1 Teacher Model Training
+
+Train the teacher model with the hyperparameters specified in `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+The best metric on the validation set is around `80.10%`. The best teacher model weights are saved in `output/ResNet101_vd/best_model.pdparams`.
+
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml`; the teacher model is `ResNet101_vd` and the student model is `PPLCNet_x1_0`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric on the validation set is around `78.5%`. The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` procedure in PaddleClas. If you want better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to obtain better hyperparameters.
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your own situation. If you do not change the dataset, you can skip this section.
+
+
+## 6. Inference Deployment
+
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with making predictions directly from the pretrained model, Paddle Inference can apply tools to accelerate prediction and thus achieve better performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference needs a Paddle Inference model for prediction, and two ways to get such a model are described here. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle Inference model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_person_attribute_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_person_attribute_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here comes from knowledge distillation training. If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/person_attribute_infer.tar && tar -xf person_attribute_infer.tar
+```
+
+After decompression, the `models` directory should look as shown below.
+
+```
+├── person_attribute_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+### 6.2 Prediction with Python
+
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to predict the attributes of the person in the image `./images/PULC/person_attribute/090004.jpg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.use_gpu=True
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+090004.jpg: {'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}
+```
+
+
+#### 6.2.2 Images Prediction
+
+To predict the images in a directory, specify the directory path through the `Global.infer_imgs` argument with `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU; to predict with CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.infer_imgs="./images/PULC/person_attribute/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+090004.jpg: {'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}
+090007.jpg: {'attributes': ['Female', 'Age18-60', 'Side', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'No bag', 'Upper: ShortSleeve', 'Lower: Skirt&Dress', 'No boots'], 'output': [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]}
+```
+
+In the prediction results above, `attributes` is the human-readable attribute description and `output` is the raw multi-label prediction vector of the model.
+
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance serving framework for machine learning models. It supports different protocols such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting Paddle Inference models to ONNX models, which can then be deployed on different inference engines such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and run prediction with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_person_exists_en.md b/docs/en/PULC/PULC_person_exists_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..baf5ce3e4c295a57d928853f5a0b3da1d3c7b366
--- /dev/null
+++ b/docs/en/PULC/PULC_person_exists_en.md
@@ -0,0 +1,458 @@
+# PULC Classification Model of Someone or Nobody
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+
+## 1. Introduction
+
+This case provides a way for users to quickly build a lightweight, high-precision and practical human-exists classification model using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in monitoring scenarios, personnel access control scenarios, massive data filtering scenarios, etc.
+
+The following table lists the relevant metrics of the models. The first two rows are the results of training with SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to sixth rows show the results of replacing the backbone with PPLCNet_x1_0 and then additionally applying the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
+
+| Backbone | Tpr(%) | Latency(ms) | Size(M) | Training Strategy |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 95.69 | 95.30 | 111 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 2.6 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 89.57 | 2.12 | 7.0 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 92.10 | 2.12 | 7.0 | using SSLD pretrained model |
+| PPLCNet_x1_0 | 93.43 | 2.12 | 7.0 | using SSLD pretrained model + EDA strategy |
+| PPLCNet_x1_0 | 96.23 | 2.12 | 7.0 | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
+
+As the table shows, the SwinTransformer_tiny backbone achieves a high Tpr, but its inference is slow. Replacing the backbone with the lightweight MobileNetV3_small_x0_35 greatly improves speed, but the Tpr drops sharply. Switching to the faster PPLCNet_x1_0 backbone yields a Tpr about 20 percentage points higher than MobileNetV3_small_x0_35 while also running more than 20% faster. On top of that, using the SSLD pretrained model improves the Tpr by about 2.6 percentage points without affecting inference speed, adding the EDA strategy increases it by another 1.3 percentage points, and SKL-UGI knowledge distillation raises it by a further 2.8 percentage points. At this point, the Tpr is close to that of SwinTransformer_tiny, while inference is more than 40 times faster. The PULC training method and deployment instructions are introduced in detail below.
+
+**Note**:
+
+* About the `Tpr` metric, please refer to [section 3.3](#3.3) for more information.
+* The latency is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+
+## 2. Quick Start
+
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more installation information, for example for other versions.
+
+
+### 2.2 PaddleClas wheel Installation
+
+Install PaddleClas with the following command:
+
+```bash
+pip3 install paddleclas
+```
+
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg
+```
+
+Results:
+
+```
+>>> result
+class_ids: [0], scores: [0.9955421453341842], label_names: ['nobody'], filename: pulc_demo_imgs/person_exists/objects365_01780782.jpg
+Predict complete!
+```
+
+**Note**: To test other images, simply specify the `--infer_imgs` argument; a directory containing images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="person_exists")
+result = model.predict(input_data="pulc_demo_imgs/person_exists/objects365_01780782.jpg")
+print(next(result))
+```
+
+**Note**: `model.predict()` returns a generator, so you need to call `next()` on it or iterate over it with a `for` loop. Each call predicts a batch of `batch_size` images and returns the results. The default `batch_size` is 1, and you can also set `batch_size` when instantiating the model, e.g. `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [0], 'scores': [0.9955421453341842], 'label_names': ['nobody'], 'filename': 'pulc_demo_imgs/person_exists/objects365_01780782.jpg'}]
+```
+
+
+## 3. Training, Evaluation and Inference
+
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+
+### 3.2 Dataset
+
+
+#### 3.2.1 Dataset Introduction
+
+All datasets used in this case are open source. The training data is a subset of the [MS-COCO](https://cocodataset.org/#overview) training data, the validation data is a subset of the [Object365](https://www.objects365.org/overview.html) training data, and ImageNet_val is the [ImageNet-1k](https://www.image-net.org/) validation data.
+
+
+#### 3.2.2 Getting Dataset
+
+The data used in this case can be obtained by processing the open source data. The detailed process is as follows:
+
+- Training data: the annotation files of the MS-COCO training data are processed. If an image contains a "person" label and the corresponding box covers more than 10% of the whole image, the image is considered to contain a human; if an image contains no "person" label, it is considered to contain no human. After processing, 92964 usable images were obtained, of which 39813 contain humans and 53151 do not.
+- Validation data: a small portion of the Object365 data is randomly selected, a strong model trained on MS-COCO is used to predict these images, the predictions are intersected with the data annotation files, and the intersection is filtered into the validation set in the same way the training set was obtained.
+After processing, 27820 usable images were obtained, of which 2255 contain humans and 25565 do not.
+
+Some images of the processed dataset are shown below:
+
+![](../../images/PULC/docs/person_exists_data_demo.png)
+
+You can also directly download the processed data.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/person_exists.tar
+tar -xf person_exists.tar
+cd ../
+```
+
+The contents of the `person_exists` directory:
+
+```
+├── train
+│   ├── 000000000009.jpg
+│   ├── 000000000025.jpg
+...
+├── val
+│   ├── objects365_01780637.jpg
+│   ├── objects365_01780640.jpg
+...
+├── ImageNet_val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+...
+├── train_list.txt
+├── train_list.txt.debug
+├── train_list_for_distill.txt
+├── val_list.txt
+└── val_list.txt.debug
+```
+
+Here `train/` and `val/` are the training and validation sets respectively, and `train_list.txt` and `val_list.txt` are their label files. `train_list.txt.debug` and `val_list.txt.debug` are small subsets of `train_list.txt` and `val_list.txt`. `ImageNet_val/` is the ImageNet-1k validation data, which is used as unlabeled data for SKL-UGI knowledge distillation; its label file is `train_list_for_distill.txt`.
+
+**Note**:
+
+* For the content format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+* For `train_list_for_distill.txt`, please refer to [Knowledge Distillation Label](../advanced_tutorials/distillation/distillation_en.md).
+
+
+### 3.3 Training
+
+The details of the training config are in `ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml
+```
+
+The best metric on the validation data is between `0.94` and `0.95`; there are fluctuations because the dataset is small.
+
+**Note**:
+
+* The metric Tpr describes the True Positive Rate when the False Positive Rate is below a given threshold (1/1000 in this case); it is one of the commonly used metrics for binary classification. For details about Fpr and Tpr, please refer to [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic). A minimal sketch of how such a metric can be computed is shown after this note.
+* During evaluation, the best TprAtFpr metric is printed, including the current `Fpr`, `Tpr` and `threshold`. `Tpr` is the recall under the current `Fpr`: the higher the `Tpr`, the better the model. The `threshold` is used in deployment; it is the classification threshold under the best `Fpr` metric.
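+
+The following NumPy sketch illustrates how a TprAtFpr-style metric can be computed in principle. It is not PaddleClas's implementation; the function name and the scan over thresholds are illustrative assumptions.
+
+```python
+import numpy as np
+
+def tpr_at_fpr(scores, labels, max_fpr=1e-3):
+    """Best recall (Tpr) achievable while keeping Fpr at or below max_fpr.
+
+    scores: predicted probability of the positive class ("someone").
+    labels: ground truth, 1 = someone, 0 = nobody.
+    Returns (tpr, threshold)."""
+    scores, labels = np.asarray(scores), np.asarray(labels)
+    n_pos = max((labels == 1).sum(), 1)
+    n_neg = max((labels == 0).sum(), 1)
+    best_tpr, best_thr = 0.0, 1.0
+    # scan candidate thresholds from strict to lenient
+    for thr in np.sort(np.unique(scores))[::-1]:
+        pred = scores >= thr
+        fpr = np.logical_and(pred, labels == 0).sum() / n_neg
+        tpr = np.logical_and(pred, labels == 1).sum() / n_pos
+        if fpr <= max_fpr and tpr > best_tpr:
+            best_tpr, best_thr = tpr, thr
+    return best_tpr, best_thr
+```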
+
+
+### 3.4 Evaluation
+
+After training, you can use the following command to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+In the command above, `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file; you can specify another path if needed.
+
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+The results:
+
+```
+[{'class_ids': [1], 'scores': [0.9999976], 'label_names': ['someone'], 'file_name': 'deploy/images/PULC/person_exists/objects365_02035329.jpg'}]
+```
+
+**Note**:
+
+* In the command above, `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file; you can specify another path if needed.
+* The default test image is `deploy/images/PULC/person_exists/objects365_02035329.jpg`. You can test another image by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+* The default threshold is `0.5`. If needed, you can specify the `Infer.PostProcess.threshold` argument, e.g. `-o Infer.PostProcess.threshold=0.9794`; the threshold should be chosen according to the specific case. `0.9794` is the best threshold when `Fpr` is less than `1/1000` on this validation dataset.
+
+
+## 4. Model Compression
+
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple yet effective knowledge distillation algorithm proposed by PaddleClas.
+
+
+#### 4.1.1 Teacher Model Training
+
+Train the teacher model with the hyperparameters specified in `ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+The best metric on the validation data is between `0.96` and `0.98`. The best teacher model weights are saved in `output/ResNet101_vd/best_model.pdparams`.
+
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml`; the teacher model is `ResNet101_vd`, the student model is `PPLCNet_x1_0`, and the additional unlabeled training data is the ImageNet-1k validation data. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric is between `0.95` and `0.97`. The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` procedure in PaddleClas. If you want better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to obtain better hyperparameters.
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your own situation. If you do not change the dataset, you can skip this section.
+
+
+## 6. Inference Deployment
+
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with making predictions directly from the pretrained model, Paddle Inference can apply tools to accelerate prediction and thus achieve better performance.
+Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference needs a Paddle Inference model for prediction, and two ways to get such a model are described here. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle Inference model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_exists_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_person_exists_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_person_exists_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here comes from knowledge distillation training. If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/person_exists_infer.tar && tar -xf person_exists_infer.tar
+```
+
+After decompression, the `models` directory should look as shown below.
+
+```
+├── person_exists_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+### 6.2 Prediction with Python
+
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify whether there is a human in the image `./images/PULC/person_exists/objects365_02035329.jpg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/person_exists/inference_person_exists.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/person_exists/inference_person_exists.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone']
+```
+
+**Note**: The default threshold is `0.5`. If needed, you can specify the `Infer.PostProcess.threshold` argument, e.g. `-o Infer.PostProcess.threshold=0.9794`; the threshold should be chosen according to the specific case. `0.9794` is the best threshold when `Fpr` is less than `1/1000` on this validation dataset. Please refer to [section 3.3](#3.3) for details.
+
+
+#### 6.2.2 Images Prediction
+
+To predict the images in a directory, specify the directory path through the `Global.infer_imgs` argument with `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU; to predict with CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/person_exists/inference_person_exists.yaml -o Global.infer_imgs="./images/PULC/person_exists/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+objects365_01780782.jpg: class id(s): [0], score(s): [1.00], label_name(s): ['nobody']
+objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone']
+```
+
+In the prediction results above, `someone` means that there is a human in the image and `nobody` means that there is no human in the image.
+
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance serving framework for machine learning models. It supports different protocols such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting Paddle Inference models to ONNX models, which can then be deployed on different inference engines such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and run prediction with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_quickstart_en.md b/docs/en/PULC/PULC_quickstart_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..087c359283c0e288db91bc80774163eda336853b
--- /dev/null
+++ b/docs/en/PULC/PULC_quickstart_en.md
@@ -0,0 +1,123 @@
+# PULC Quick Start
+
+------
+
+This document introduces how to make predictions with PULC series models using the PaddleClas wheel.
+
+## Catalogue
+
+- [1. Installation](#1)
+  - [1.1 PaddlePaddle Installation](#11)
+  - [1.2 PaddleClas wheel Installation](#12)
+- [2. Quick Start](#2)
+  - [2.1 Prediction with Command Line](#2.1)
+  - [2.2 Prediction with Python](#2.2)
+  - [2.3 Supported Model List](#2.3)
+- [3. Summary](#3)
+
+
+## 1. Installation
+
+
+### 1.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more installation information, for example for other versions.
+
+
+### 1.2 PaddleClas wheel Installation
+
+```bash
+pip3 install paddleclas
+```
+
+
+## 2. Quick Start
+
+PaddleClas provides a series of test cases covering demos of different scenarios involving people, cars, OCR, etc. Click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download the data.
+
+
+### 2.1 Prediction with Command Line
+
+```
+cd /path/to/pulc_demo_imgs
+```
+
+The prediction command:
+
+```bash
+paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg
+```
+
+Result:
+
+```
+>>> result
+class_ids: [0], scores: [0.9955421453341842], label_names: ['nobody'], filename: pulc_demo_imgs/person_exists/objects365_01780782.jpg
+Predict complete!
+```
+
+`nobody` means there is no one in the image and `someone` means there is someone in the image; the prediction result above therefore indicates that there is no one in the image.
+
+**Note**: The `--infer_imgs` argument specifies the image(s) to predict; a directory containing images can also be given. To use another model, specify the `--model_name` argument; please refer to [2.3 Supported Model List](#2.3) for the supported models.
+
+
+### 2.2 Prediction with Python
+
+You can also use PaddleClas in Python:
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="person_exists")
+result = model.predict(input_data="pulc_demo_imgs/person_exists/objects365_01780782.jpg")
+print(next(result))
+```
+
+The printed result:
+
+```
+>>> result
+[{'class_ids': [0], 'scores': [0.9955421453341842], 'label_names': ['nobody'], 'filename': 'pulc_demo_imgs/person_exists/objects365_01780782.jpg'}]
+```
+
+**Note**: `model.predict()` returns a generator, so `next()` or a `for` loop is needed to consume it. It predicts in batches of length `batch_size` (1 by default). You can specify the `batch_size` and `model_name` arguments when instantiating the PaddleClas object, e.g. `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`; please refer to [2.3 Supported Model List](#2.3) for the supported models. A small directory-iteration sketch is shown below.
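+
+For illustration, the sketch below iterates over all prediction batches for a directory of images. It assumes, as with the CLI above, that a directory path is accepted as `input_data`; `batch_size=2` is an arbitrary choice for the example.
+
+```python
+import paddleclas
+
+model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)
+results = model.predict(input_data="pulc_demo_imgs/person_exists/")
+# each item yielded by the generator is a list with up to batch_size predictions
+for batch in results:
+    for prediction in batch:
+        print(prediction["filename"], prediction["label_names"])
+```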
+
+## 3. Summary
+
+The PULC series models have been verified to be effective in different scenarios about people, vehicles, OCR, etc. The ultra-lightweight models can achieve accuracy close to SwinTransformer while being 40+ times faster, and PULC also covers the whole pipeline of dataset acquisition, model training, model compression and deployment. Please refer to [Human Exists Classification](PULC_person_exists_en.md), [Pedestrian Attribute Classification](PULC_person_attribute_en.md), [Classification of Whether Wearing Safety Helmet](PULC_safety_helmet_en.md), [Traffic Sign Classification](PULC_traffic_sign_en.md), [Vehicle Attribute Classification](PULC_vehicle_attribute_en.md), [Car Exists Classification](PULC_car_exists_en.md), [Text Image Orientation Classification](PULC_text_image_orientation_en.md), [Text-line Orientation Classification](PULC_textline_orientation_en.md) and [Language Classification](PULC_language_classification_en.md) for more information about the different scenarios.
diff --git a/docs/en/PULC/PULC_safety_helmet_en.md b/docs/en/PULC/PULC_safety_helmet_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d2e5cb32931cdc98b0776f4692e6162e907aa6fa
--- /dev/null
+++ b/docs/en/PULC/PULC_safety_helmet_en.md
@@ -0,0 +1,432 @@
+# PULC Classification Model of Whether Wearing Safety Helmet or Not
+
+-----
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 UDML Knowledge Distillation](#4.1)
+    - [4.1.1 Knowledge Distillation Training](#4.1.1)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+## 1. Introduction
+
+This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of whether a safety helmet is being worn, using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in construction scenes, factory workshop scenes, traffic scenes and so on.
+
+The following table lists the relevant indicators of the model. The first three rows use SwinTransformer_tiny, Res2Net200_vd_26w_4s and MobileNetV3_small_x0_35 as the backbone. The fourth to seventh rows replace the backbone with PPLCNet and additionally apply the SSLD pretrained model, the EDA strategy and the UDML knowledge distillation strategy.
+
+| Backbone | Tpr(%) | Latency(ms) | Size(M) | Training Strategy |
+|-------|-----------|----------|---------------|---------------|
+| SwinTranformer_tiny | 93.57 | 91.32 | 111 | using ImageNet pretrained model |
+| Res2Net200_vd_26w_4s | 98.92 | 80.99 | 284 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 84.83 | 2.85 | 2.6 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 93.27 | 2.03 | 7.1 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 98.16 | 2.03 | 7.1 | using SSLD pretrained model |
+| PPLCNet_x1_0 | 99.30 | 2.03 | 7.1 | using SSLD pretrained model + EDA strategy |
+| PPLCNet_x1_0 | 99.38 | 2.03 | 7.1 | using SSLD pretrained model + EDA strategy + UDML knowledge distillation strategy |
+
+It can be seen that a high Tpr can be obtained when the backbone is Res2Net200_vd_26w_4s, but the speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the Tpr drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 brings a Tpr about 8.5 percentage points higher than MobileNetV3_small_x0_35, while the speed is more than 20% faster. On top of that, using the SSLD pretrained model improves the Tpr by about 4.9 percentage points without affecting the inference speed, adding the EDA strategy increases the Tpr by a further 1.1 percentage points, and adding UDML knowledge distillation improves the Tpr by another 0.08 percentage points. At this point, the Tpr is higher than that of Res2Net200_vd_26w_4s while the speed is about 40 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
+
+**Note**:
+
+* About the `Tpr` metric, please refer to [section 3.3](#3.3) for more information.
+* The latency is tested on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+## 2. Quick Start
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, such as for other versions.
+
+### 2.2 PaddleClas wheel Installation
+
+The command to install PaddleClas is as follows:
+
+```bash
+pip3 install paddleclas
+```
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=safety_helmet --infer_imgs=pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png
+```
+
+Results:
+
+```
+>>> result
+class_ids: [1], scores: [0.9986255], label_names: ['unwearing_helmet'], filename: pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png
+Predict complete!
+```
+
+**Note**: If you want to test other images, you only need to specify the `--infer_imgs` argument; a directory containing images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="safety_helmet")
+result = model.predict(input_data="pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to consume it with `next()` or a `for` loop. Each call predicts one batch of `batch_size` images and returns the results. The default `batch_size` is 1, and you can specify it when instantiating, such as `model = paddleclas.PaddleClas(model_name="safety_helmet", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [1], 'scores': [0.9986255], 'label_names': ['unwearing_helmet'], 'filename': 'pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png'}]
+```
+
+## 3. Training, Evaluation and Inference
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+### 3.2 Dataset
+
+#### 3.2.1 Dataset Introduction
+
+All datasets used in this case are open-source. The training data is a subset of [Safety-Helmet-Wearing-Dataset](https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset), [hard-hat-detection](https://www.kaggle.com/datasets/andrewmvd/hard-hat-detection) and the [Large-scale CelebFaces Attributes (CelebA) Dataset](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html).
+
+#### 3.2.2 Getting Dataset
+
+The data used in this case can be obtained by processing the open-source data. The detailed processes are as follows (the cropping rule is sketched at the end of this section):
+
+* `Safety-Helmet-Wearing-Dataset`: according to the bbox label data, each image is cropped by enlarging the width and height of the bbox by 3 times. The label is 0 if a safety helmet is worn in the image, and 1 if not;
+* `hard-hat-detection`: only images labeled "hat" are used, cropped with the bbox. The label is 0;
+* `CelebA`: only images labeled "wearing_hat" are used, cropped with the bbox. The label is 0;
+
+After processing, the dataset totals about 150000 images, of which about 28000 show a safety helmet being worn and about 121000 do not. Then 5600 images are randomly selected from each of the two labels as validation data (about 11200 images in total), and the remaining roughly 138000 images are used as training data.
+
+Some images from the processed dataset are shown below:
+
+![](../../images/PULC/docs/safety_helmet_data_demo.jpg)
+
+You can also download the processed data directly.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/safety_helmet.tar
+tar -xf safety_helmet.tar
+cd ../
+```
+
+The contents of the `safety_helmet` directory:
+
+```
+├── images
+│   ├── VOC2028_part2_001209_1.jpg
+│   ├── HHD_hard_hat_workers23_1.jpg
+│   ├── CelebA_077809.jpg
+│   ├── ...
+│   └── ...
+├── train_list.txt
+└── val_list.txt
+```
+
+The `train_list.txt` and `val_list.txt` are the label files of the training data and validation data respectively. All images are in the `images/` directory.
+
+**Note**:
+
+* About the format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
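+
+The 3x bbox enlargement described above can be illustrated with the following sketch (an illustration only: the file name and the assumption that boxes are `(x1, y1, x2, y2)` pixel coordinates are hypothetical):
+
+```python
+from PIL import Image
+
+def crop_enlarged(img_path, bbox, scale=3.0):
+    """Crop a region whose width/height are `scale` times the bbox,
+    centered on the bbox and clamped to the image borders."""
+    img = Image.open(img_path)
+    x1, y1, x2, y2 = bbox
+    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
+    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
+    left, upper = max(0, int(cx - half_w)), max(0, int(cy - half_h))
+    right = min(img.width, int(cx + half_w))
+    lower = min(img.height, int(cy + half_h))
+    return img.crop((left, upper, right, lower))
+
+# Label 0 means wearing a helmet, label 1 means not wearing one.
+crop_enlarged("VOC2028_sample.jpg", (120, 80, 220, 200)).save("crop_example.jpg")
+```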
+
+### 3.3 Training
+
+The details of the training config are in `ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml
+```
+
+The best metric on the validation data is between `0.985` and `0.993`. The result may fluctuate because the dataset is small.
+
+**Note**:
+
+* The metric Tpr describes the True Positive Rate when the False Positive Rate is less than a certain threshold (1/10000 in this case); it is one of the commonly used metrics for binary classification. For details about Fpr and Tpr, please refer to [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
+* During evaluation, the best TprAtFpr metric is printed, including `Fpr`, `Tpr` and the current `threshold`. `Tpr` is the recall rate under the current `Fpr`; the higher the `Tpr`, the better the model. The `threshold` is the classification threshold under the best `Fpr` and is later used in deployment, as sketched below.
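+
+To make the TprAtFpr metric concrete, here is a toy numpy sketch (illustrative data, not PaddleClas code): it sorts predictions by score and reports the highest Tpr whose Fpr stays under the budget, together with the corresponding threshold.
+
+```python
+import numpy as np
+
+def tpr_at_fpr(labels, scores, max_fpr=1e-4):
+    """Highest Tpr achievable while Fpr <= max_fpr, plus the threshold."""
+    labels = np.asarray(labels, dtype=float)
+    scores = np.asarray(scores, dtype=float)
+    order = np.argsort(-scores)            # sort by score, descending
+    labels, scores = labels[order], scores[order]
+    tp = np.cumsum(labels)                 # true positives above each cutoff
+    fp = np.cumsum(1.0 - labels)           # false positives above each cutoff
+    tpr = tp / labels.sum()
+    fpr = fp / (1.0 - labels).sum()
+    ok = np.where(fpr <= max_fpr)[0]       # cutoffs within the Fpr budget
+    if len(ok) == 0:
+        return 0.0, 1.0
+    best = ok[np.argmax(tpr[ok])]
+    return tpr[best], scores[best]
+
+# Toy data: 1 = positive (e.g. "unwearing_helmet"), 0 = negative.
+labels = [1, 1, 0, 1, 0, 0, 1, 0]
+scores = [0.99, 0.95, 0.92, 0.90, 0.40, 0.30, 0.25, 0.10]
+print(tpr_at_fpr(labels, scores, max_fpr=0.2))  # -> (0.5, 0.95)
+```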
+
+### 3.4 Evaluation
+
+After training, you can use the following command to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights. You can specify another path if needed.
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+The results:
+
+```
+[{'class_ids': [1], 'scores': [0.9524797], 'label_names': ['unwearing_helmet'], 'file_name': 'deploy/images/PULC/safety_helmet/safety_helmet_test_1.png'}]
+```
+
+**Note**:
+
+* In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights. You can specify another path if needed.
+* The default test image is `deploy/images/PULC/safety_helmet/safety_helmet_test_1.png`. You can test another image by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+* The default threshold is `0.5`. If needed, you can specify the argument `Infer.PostProcess.threshold`, such as `-o Infer.PostProcess.threshold=0.9167`. The threshold should be chosen according to the specific use case; `0.9167` is the best threshold when `Fpr` is less than `1/10000` on this validation dataset.
+
+## 4. Model Compression
+
+### 4.1 UDML Knowledge Distillation
+
+UDML is a simple but effective knowledge distillation algorithm proposed by PaddleClas. Please refer to [UDML Knowledge Distillation](../advanced_tutorials/knowledge_distillation_en.md#1.2.3) for more details.
+
+#### 4.1.1 Knowledge Distillation Training
+
+Training is done with the hyperparameters specified in `ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml
+```
+
+The best metric is between `0.990` and `0.993`. The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` strategy in PaddleClas. If you want to get better results on your own dataset, please refer to [Hyperparameters Searching](PULC_train_en.md#4).
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your needs. If you do not replace the dataset, you can skip this section.
+
+## 6. Inference Deployment
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with predicting directly from the trained model, Paddle Inference can use tools to accelerate prediction and achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference requires a Paddle inference model for prediction, and two ways of getting one are described below. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle inference model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_safety_helmet_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_safety_helmet_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_safety_helmet_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here comes from the knowledge distillation training. If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/safety_helmet_infer.tar && tar -xf safety_helmet_infer.tar
+```
+
+After decompression, the `models` directory should look like this:
+
+```
+├── safety_helmet_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+### 6.2 Prediction with Python
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify whether a safety helmet is worn in the image `./images/PULC/safety_helmet/safety_helmet_test_1.png`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/safety_helmet/inference_safety_helmet.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/safety_helmet/inference_safety_helmet.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+safety_helmet_test_1.png: class id(s): [1], score(s): [1.00], label_name(s): ['unwearing_helmet']
+```
+
+**Note**: The default threshold is `0.5`. If needed, you can specify the argument `Infer.PostProcess.threshold`, such as `-o Infer.PostProcess.threshold=0.9167`. The threshold should be chosen according to the specific use case; `0.9167` is the best threshold when `Fpr` is less than `1/10000` on this validation dataset. Please refer to [section 3.3](#3.3) for details.
+
+#### 6.2.2 Images Prediction
+
+If you want to predict the images in a directory, specify the directory path with the argument `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU; to use the CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/safety_helmet/inference_safety_helmet.yaml -o Global.infer_imgs="./images/PULC/safety_helmet/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+safety_helmet_test_1.png: class id(s): [1], score(s): [1.00], label_name(s): ['unwearing_helmet']
+safety_helmet_test_2.png: class id(s): [0], score(s): [1.00], label_name(s): ['wearing_helmet']
+```
+
+Among the prediction results above, `wearing_helmet` means a safety helmet is worn in the image and `unwearing_helmet` means it is not.
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance carrier for machine learning models. It supports multiple protocols, such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy as a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open-source deep learning framework designed to make it easy to perform inference on mobile, embedded and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite Deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting a Paddle inference model to the ONNX format, after which you can deploy the ONNX model with different inference engines, such as TensorRT, OpenVINO, MNN/TNN and NCNN. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle inference model to an ONNX model with the paddle2onnx toolkit and predict with it. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_text_image_orientation_en.md b/docs/en/PULC/PULC_text_image_orientation_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..1d3cc41f992adff90f396463205cd060147023c1
--- /dev/null
+++ b/docs/en/PULC/PULC_text_image_orientation_en.md
@@ -0,0 +1,466 @@
+# PULC Classification Model of Text Image Orientation
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+## 1. Introduction
+
+During document scanning, license shooting and similar processes, the camera is sometimes rotated to get a clearer shot, resulting in photos with different orientations. The standard OCR pipeline cannot cope with such images well. Using text image orientation classification, the orientation of a text image can be predicted and corrected, improving the accuracy of OCR processing. This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of text image orientation using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in OCR scenarios with rotated images in finance, government and other industries.
+
+The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to fifth rows replace the backbone with PPLCNet and additionally apply the SSLD pretrained model and the hyperparameters searching strategy.
+
+| Backbone | Top1-Acc(%) | Latency(ms) | Size(M) | Training Strategy |
+| ----------------------- | --------- | ---------- | --------- | ------------------------------------- |
+| SwinTranformer_tiny | 99.12 | 89.65 | 111 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 83.61 | 2.95 | 2.6 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 97.85 | 2.16 | 7.1 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 98.02 | 2.16 | 7.1 | using SSLD pretrained model |
+| **PPLCNet_x1_0** | **99.06** | **2.16** | **7.1** | using SSLD pretrained model + hyperparameters searching strategy |
+
+It can be seen that high accuracy can be obtained when the backbone is SwinTranformer_tiny, but the speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the accuracy drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 brings an accuracy about 14 percentage points higher than MobileNetV3_small_x0_35, while the speed is also faster. Using the SSLD pretrained model on top of that improves the accuracy by about 0.17 percentage points without affecting the inference speed, and the hyperparameters searching strategy raises the accuracy by a further 1.04 percentage points. At this point, the accuracy is close to that of SwinTranformer_tiny while the speed is much faster. The training method and deployment instructions of PULC will be introduced in detail below.
+
+**Note**:
+
+* The latency is tested on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+## 2. Quick Start
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, such as for other versions.
+
+### 2.2 PaddleClas wheel Installation
+
+The command to install PaddleClas is as follows:
+
+```bash
+pip3 install paddleclas
+```
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=text_image_orientation --infer_imgs=pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg
+```
+
+Results:
+
+```
+>>> result
+class_ids: [0, 2], scores: [0.85615, 0.05046], label_names: ['0', '180'], filename: pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg
+Predict complete!
+```
+
+**Note**: If you want to test other images, you only need to specify the `--infer_imgs` argument; a directory containing images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="text_image_orientation")
+result = model.predict(input_data="pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to consume it with `next()` or a `for` loop. Each call predicts one batch of `batch_size` images and returns the results. The default `batch_size` is 1, and you can specify it when instantiating, such as `model = paddleclas.PaddleClas(model_name="text_image_orientation", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [0, 2], 'scores': [0.85615, 0.05046], 'label_names': ['0', '180'], 'filename': 'pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg'}]
+```
+
+## 3. Training, Evaluation and Inference
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+### 3.2 Dataset
+
+#### 3.2.1 Dataset Introduction
+
+The model provided in [section 1](#1) is trained on internal data, which has not been open-sourced. Therefore, we provide a dataset built from [ICDAR2019-ArT](https://ai.baidu.com/broad/introduction?dataset=art), [XFUND](https://github.com/doc-analysis/XFUND) and [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=introduction) for you to experience the process.
+
+![](../../images/PULC/docs/text_image_orientation_original_data.png)
+
+#### 3.2.2 Getting Dataset
+
+The data used in this case can be obtained by processing the open-source data. The detailed processes are as follows (the scaling-plus-rotation step is sketched at the end of this section):
+
+Because the original images have a high resolution and would require a long training time, all data are scaled in advance: keeping the aspect ratio, the short edge is scaled to 384. Then the data are rotated clockwise to generate composite data at 90, 180 and 270 degrees. Among them, the 41460 images generated from ICDAR2019-ArT and XFUND are randomly divided into a training set and a validation set at a ratio of 9:1, and the 6000 images generated from ICDAR2015 are used as supplementary data in the `SKL-UGI knowledge distillation` experiment.
+
+Some images from the processed dataset are shown below:
+
+![](../../images/PULC/docs/text_image_orientation_data_demo.png)
+
+You can also download the processed data directly.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/text_image_orientation.tar
+tar -xf text_image_orientation.tar
+cd ../
+```
+
+The contents of the `text_image_orientation` directory:
+
+```
+├── img_0
+│   ├── img_rot0_0.jpg
+│   ├── img_rot0_1.png
+...
+├── img_90
+│   ├── img_rot90_0.jpg
+│   ├── img_rot90_1.png
+...
+├── img_180
+│   ├── img_rot180_0.jpg
+│   ├── img_rot180_1.png
+...
+├── img_270
+│   ├── img_rot270_0.jpg
+│   ├── img_rot270_1.png
+...
+├── distill_data
+│   ├── gt_7060_0.jpg
+│   ├── gt_7060_90.jpg
+...
+├── train_list.txt
+├── train_list.txt.debug
+├── train_list_for_distill.txt
+├── test_list.txt
+├── test_list.txt.debug
+└── label_list.txt
+```
+
+Here `img_0/`, `img_90/`, `img_180/` and `img_270/` hold the data of the 4 angles respectively. The `train_list.txt` and `test_list.txt` are the label files of the training data and validation data respectively, and `train_list.txt.debug` and `test_list.txt.debug` are subsets of them. `distill_data/` is the supplementary data used for SKL-UGI knowledge distillation, with the label file `train_list_for_distill.txt`.
+
+**Note**:
+
+* About the format of `train_list.txt` and `test_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+* About the `train_list_for_distill.txt`, please refer to [Knowledge Distillation Label](../advanced_tutorials/distillation/distillation_en.md).
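+
+The scaling and clockwise rotation described above can be sketched roughly as follows (an illustration only; the input file name is hypothetical, while the output names follow the `distill_data/` listing above):
+
+```python
+from PIL import Image
+
+def short_edge_resize(img, target=384):
+    """Scale so the short edge equals `target`, keeping the aspect ratio."""
+    w, h = img.size
+    scale = target / min(w, h)
+    return img.resize((round(w * scale), round(h * scale)))
+
+img = short_edge_resize(Image.open("gt_7060.jpg"))
+img.save("gt_7060_0.jpg")
+for angle in (90, 180, 270):
+    # PIL's rotate() is counterclockwise, so negate the angle for a
+    # clockwise rotation; expand=True keeps the whole rotated image.
+    img.rotate(-angle, expand=True).save(f"gt_7060_{angle}.jpg")
+```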
+
+### 3.3 Training
+
+The details of the training config are in `ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml
+```
+
+The best metric on the validation data is about `0.99`.
+
+**Note**:
+* The metrics mentioned in this document were obtained by training on a large-scale internal dataset. When training on the demo data, they cannot be reached because the dataset is small and its distribution differs from the internal data. You can further expand your own data and use the optimization method described in this case to achieve higher accuracy.
+
+### 3.4 Evaluation
+
+After training, you can use the following command to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights. You can specify another path if needed.
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+The results:
+
+```
+[{'class_ids': [0, 2], 'scores': [0.85615, 0.05046], 'file_name': 'deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg', 'label_names': ['0', '180']}]
+```
+
+**Note**:
+
+* In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights. You can specify another path if needed.
+* The default test image is `deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg`. You can test another image by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+* The top-2 results are printed: `0` means the text orientation of the image is 0 degrees, `90` means it is rotated 90 degrees clockwise, `180` means 180 degrees clockwise, and `270` means 270 degrees clockwise. The predicted angle can be used to restore the image, as sketched below.
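+
+Downstream, the predicted label is typically used to rotate the image back upright before running OCR. A minimal sketch (the file name is hypothetical, and it assumes the label gives the clockwise rotation of the image, as described above):
+
+```python
+from PIL import Image
+
+# Example prediction as returned by the model above.
+result = {'label_names': ['0', '180'], 'scores': [0.85615, 0.05046]}
+angle = int(result['label_names'][0])  # top-1 predicted clockwise angle
+
+img = Image.open("img_rot0_demo.jpg")
+# PIL's rotate() is counterclockwise, so rotating by the predicted
+# clockwise angle undoes the rotation.
+img.rotate(angle, expand=True).save("img_upright.jpg")
+```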
+
+## 4. Model Compression
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple but effective knowledge distillation algorithm proposed by PaddleClas.
+
+#### 4.1.1 Teacher Model Training
+
+The teacher model is trained with the hyperparameters specified in `ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+The best metric on the validation data is about `0.996`. The best teacher model weights are saved in `output/ResNet101_vd/best_model.pdparams`.
+
+**Note**: Training ResNet101_vd needs more GPU memory, so you can reduce the `batch_size` and the learning rate at the same time, such as `-o DataLoader.Train.sampler.batch_size=64` and `-o Optimizer.lr.learning_rate=0.1`.
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml`, where the teacher model is `ResNet101_vd` and the student model is `PPLCNet_x1_0`.
+
+The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric is about `0.99`. The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` strategy in PaddleClas. If you want to get better results on your own dataset, please refer to [Hyperparameters Searching](PULC_train_en.md#4).
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your needs. If you do not replace the dataset, you can skip this section.
+
+## 6. Inference Deployment
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with predicting directly from the trained model, Paddle Inference can use tools to accelerate prediction and achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference requires a Paddle inference model for prediction, and two ways of getting one are described below. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle inference model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_text_image_orientation_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_text_image_orientation_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_text_image_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here comes from the knowledge distillation training. If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/text_image_orientation_infer.tar && tar -xf text_image_orientation_infer.tar
+```
+
+After decompression, the `models` directory should look like this:
+
+```
+├── text_image_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+### 6.2 Prediction with Python
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify the text image orientation of the image `./images/PULC/text_image_orientation/img_rot0_demo.jpg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/text_image_orientation/inference_text_image_orientation.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/text_image_orientation/inference_text_image_orientation.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+img_rot0_demo.jpg: class id(s): [0, 2], score(s): [0.86, 0.05], label_name(s): ['0', '180']
+```
+
+Among the results, `0` means the text orientation of the image is 0 degrees, `90` means it is rotated 90 degrees clockwise, `180` means 180 degrees clockwise, and `270` means 270 degrees clockwise.
+
+#### 6.2.2 Images Prediction
+
+If you want to predict the images in a directory, specify the directory path with the argument `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU; to use the CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/text_image_orientation/inference_text_image_orientation.yaml -o Global.infer_imgs="./images/PULC/text_image_orientation/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+img_rot0_demo.jpg: class id(s): [0, 2], score(s): [0.86, 0.05], label_name(s): ['0', '180']
+img_rot180_demo.jpg: class id(s): [2, 1], score(s): [0.88, 0.04], label_name(s): ['180', '90']
+```
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance carrier for machine learning models. It supports multiple protocols, such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy as a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open-source deep learning framework designed to make it easy to perform inference on mobile, embedded and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite Deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting a Paddle inference model to the ONNX format, after which you can deploy the ONNX model with different inference engines, such as TensorRT, OpenVINO, MNN/TNN and NCNN. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle inference model to an ONNX model with the paddle2onnx toolkit and predict with it. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_textline_orientation_en.md b/docs/en/PULC/PULC_textline_orientation_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d11307d0b5aafe056c1f1e53a85882d2449ac277
--- /dev/null
+++ b/docs/en/PULC/PULC_textline_orientation_en.md
@@ -0,0 +1,450 @@
+# PULC Classification Model of Textline Orientation
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+## 1. Introduction
+
+This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of textline orientation using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in character correction, character recognition, etc.
+
+The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to seventh rows replace the backbone with PPLCNet and additionally apply the EDA strategy and the SKL-UGI knowledge distillation strategy.
+
+| Backbone | Top-1 Acc(%) | Latency(ms) | Size(M) | Training Strategy |
+|-------|-----------|----------|---------------|---------------|
+| SwinTranformer_tiny | 93.61 | 89.64 | 111 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 81.40 | 2.96 | 2.6 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 89.99 | 2.11 | 7.0 | using ImageNet pretrained model |
+| PPLCNet_x1_0* | 94.06 | 2.68 | 7.0 | using ImageNet pretrained model |
+| PPLCNet_x1_0* | 94.11 | 2.68 | 7.0 | using SSLD pretrained model |
+| PPLCNet_x1_0** | 96.01 | 2.72 | 7.0 | using SSLD pretrained model + EDA strategy |
+| PPLCNet_x1_0** | 95.86 | 2.72 | 7.0 | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy |
+
+It can be seen that high accuracy can be obtained when the backbone is SwinTranformer_tiny, but the speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the accuracy drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 brings an accuracy about 8.6 percentage points higher than MobileNetV3_small_x0_35, while the speed is more than 10% faster. On this basis, changing the resolution and stride (refer to [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)) makes the speed about 27% slower but improves the accuracy by about 4.1 percentage points. Using the SSLD pretrained model on top of that improves the accuracy by about 0.05 percentage points without affecting the inference speed, and the EDA strategy increases the accuracy by a further 1.9 percentage points. The training method and deployment instructions of PULC will be introduced in detail below.
+
+**Note**:
+* A backbone name without \* means the resolution is 224x224; with \* it is 48x192 (h\*w), and the stride of the network is changed to `[2, [2, 1], [2, 1], [2, 1]]`. Please refer to [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) for more details.
+* A backbone name with \*\* means the resolution is 80x160 (h\*w), and the stride of the network is changed to `[2, [2, 1], [2, 1], [2, 1]]`. This resolution was found by [Hyperparameters Searching](PULC_train_en.md#4).
+* The latency is tested on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+## 2. Quick Start
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, such as for other versions.
+
+### 2.2 PaddleClas wheel Installation
+
+The command to install PaddleClas is as follows:
+
+```bash
+pip3 install paddleclas
+```
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=textline_orientation --infer_imgs=pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png
+```
+
+Results:
+
+```
+>>> result
+class_ids: [0], scores: [1.0], label_names: ['0_degree'], filename: pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png
+Predict complete!
+```
+
+**Note**: If you want to test other images, you only need to specify the `--infer_imgs` argument; a directory containing images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="textline_orientation")
+result = model.predict(input_data="pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to consume it with `next()` or a `for` loop. Each call predicts one batch of `batch_size` images and returns the results. The default `batch_size` is 1, and you can specify it when instantiating, such as `model = paddleclas.PaddleClas(model_name="textline_orientation", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [0], 'scores': [1.0], 'label_names': ['0_degree'], 'filename': 'pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png'}]
+```
+
+## 3. Training, Evaluation and Inference
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+### 3.2 Dataset
+
+#### 3.2.1 Dataset Introduction
+
+The data used in this case come from internal data. If you want to experience the training process, you can use open-source data such as [ICDAR2019-LSVT](https://aistudio.baidu.com/aistudio/datasetdetail/8429).
+
+#### 3.2.2 Getting Dataset
+
+Taking ICDAR2019-LSVT as an example, the images with ID numbers from 0 to 1999 are processed and used. After rotation, each image is assigned to class 0 or class 1: class 0 means the textline rotation angle is 0 degrees, and class 1 means 180 degrees. The split is sketched after the list below.
+
+- Training data: the images with ID numbers from 0 to 1799 are used as the training set, 3600 images in total.
+- Evaluation data: the images with ID numbers from 1800 to 1999 are used as the evaluation set, 400 images in total.
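+
+A rough sketch of this split and rotation (an illustration only: the `gt_<id>.jpg` file-name pattern and the directory layout are hypothetical; the `path label` line format follows the PaddleClas classification dataset description):
+
+```python
+import os
+from PIL import Image
+
+src_dir, dst_dir = "icdar2019_lsvt", "textline_orientation"
+splits = {"train_list.txt": range(0, 1800), "val_list.txt": range(1800, 2000)}
+
+for label in (0, 1):
+    os.makedirs(os.path.join(dst_dir, str(label)), exist_ok=True)
+
+for list_name, id_range in splits.items():
+    with open(os.path.join(dst_dir, list_name), "w") as f:
+        for i in id_range:
+            img = Image.open(os.path.join(src_dir, f"gt_{i}.jpg"))
+            for label, angle in ((0, 0), (1, 180)):  # class 0: 0 deg, class 1: 180 deg
+                rel_path = f"{label}/img_{i}.jpg"
+                img.rotate(angle, expand=True).save(os.path.join(dst_dir, rel_path))
+                f.write(f"{rel_path} {label}\n")
+```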
+
+Some images from the processed dataset are shown below:
+
+![](../../images/PULC/docs/textline_orientation_data_demo.png)
+
+You can also download the processed data directly.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/textline_orientation.tar
+tar -xf textline_orientation.tar
+cd ../
+```
+
+The contents of the `textline_orientation` directory:
+
+```
+├── 0
+│   ├── img_0.jpg
+│   ├── img_1.jpg
+...
+├── 1
+│   ├── img_0.jpg
+│   ├── img_1.jpg
+...
+├── train_list.txt
+└── val_list.txt
+```
+
+Here `0/` and `1/` hold the class 0 and class 1 data respectively. The `train_list.txt` and `val_list.txt` are the label files of the training data and validation data respectively.
+
+**Note**:
+
+* About the format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+
+### 3.3 Training
+
+The details of the training config are in `ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml
+```
+
+**Note**:
+
+* The processed ICDAR2019-LSVT dataset differs from the internal dataset used to train the provided pretrained model, so the resulting metrics may differ as well. If you want higher accuracy, you can process more data from [ICDAR2019-LSVT](https://aistudio.baidu.com/aistudio/datasetdetail/8429).
+
+### 3.4 Evaluation
+
+After training, you can use the following command to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights. You can specify another path if needed.
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+The results:
+
+```
+[{'class_ids': [0], 'scores': [1.0], 'file_name': 'deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png', 'label_names': ['0_degree']}]
+```
+
+**Note**:
+
+* In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weights. You can specify another path if needed.
+* The default test image is `deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png`. You can test another image by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+
+## 4. Model Compression
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple but effective knowledge distillation algorithm proposed by PaddleClas.
+
+#### 4.1.1 Teacher Model Training
+
+The teacher model is trained with the hyperparameters specified in `ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+The best metric on the validation data is between `0.96` and `0.98`. The best teacher model weights are saved in `output/ResNet101_vd/best_model.pdparams`.
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml`, where the teacher model is `ResNet101_vd` and the student model is `PPLCNet_x1_0`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric is between `0.95` and `0.97`. The best student model weights are saved in `output/DistillationModel/best_model_student.pdparams`.
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` strategy in PaddleClas. If you want to get better results on your own dataset, please refer to [Hyperparameters Searching](PULC_train_en.md#4).
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your needs. If you do not replace the dataset, you can skip this section.
+
+## 6. Inference Deployment
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with predicting directly from the trained model, Paddle Inference can use tools to accelerate prediction and achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference requires a Paddle inference model for prediction, and two ways of getting one are described below. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle inference model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_textline_orientation_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_textline_orientation_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_textline_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The command above exports the weights from section 3.3. If knowledge distillation training is used, the best student model is saved in `output/DistillationModel/best_model_student.pdparams` and can be exported instead.
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/textline_orientation_infer.tar && tar -xf textline_orientation_infer.tar
+```
+
+After decompression, the `models` directory should look like this:
+
+```
+├── textline_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+### 6.2 Prediction with Python
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify the rotation of the image `./images/PULC/textline_orientation/textline_orientation_test_0_0.png`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/textline_orientation/inference_textline_orientation.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/textline_orientation/inference_textline_orientation.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+textline_orientation_test_0_0.png: class id(s): [0], score(s): [1.00], label_name(s): ['0_degree']
+```
+
+#### 6.2.2 Images Prediction
+
+If you want to predict the images in a directory, specify the directory path with the argument `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU; to use the CPU instead, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/textline_orientation/inference_textline_orientation.yaml -o Global.infer_imgs="./images/PULC/textline_orientation/"
+```
+
+All prediction results are printed, as shown below.
+
+```
+textline_orientation_test_0_0.png: class id(s): [0], score(s): [1.00], label_name(s): ['0_degree']
+textline_orientation_test_0_1.png: class id(s): [0], score(s): [1.00], label_name(s): ['0_degree']
+textline_orientation_test_1_0.png: class id(s): [1], score(s): [1.00], label_name(s): ['180_degree']
+textline_orientation_test_1_1.png: class id(s): [1], score(s): [1.00], label_name(s): ['180_degree']
+```
+
+Among the prediction results above, `0_degree` means the rotation angle of the textline image is 0 degrees and `180_degree` means it is 180 degrees.
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance carrier for machine learning models. It supports multiple protocols, such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy as a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open-source deep learning framework designed to make it easy to perform inference on mobile, embedded and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite Deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting a Paddle Inference model to an ONNX model, which you can then deploy on different inference engines such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and predict with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_traffic_sign_en.md b/docs/en/PULC/PULC_traffic_sign_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..baa0faf4828a6c7acc16f8c12587a2af58c04f99
--- /dev/null
+++ b/docs/en/PULC/PULC_traffic_sign_en.md
@@ -0,0 +1,475 @@
+# PULC Classification Model of Traffic Sign
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+
+
+## 1. Introduction
+
+This case provides a way for users to quickly build a lightweight, high-precision and practical traffic sign classification model using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in automatic driving, road monitoring, etc.
+
+The following table lists the relevant indicators of the model. The first two lines mean that SwinTransformer_tiny and MobileNetV3_small_x0_35 were used as the backbone for training. The third to sixth lines show the results of replacing the backbone with PPLCNet, then additionally using the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
+
+| Backbone | Top-1 Acc(%) | Latency(ms) | Size(M) | Training Strategy |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 98.11 | 89.45 | 111 | using ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 93.88 | 3.01 | 3.9 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 97.78 | 2.10 | 8.2 | using ImageNet pretrained model |
+| PPLCNet_x1_0 | 97.84 | 2.10 | 8.2 | using SSLD pretrained model |
+| PPLCNet_x1_0 | 98.14 | 2.10 | 8.2 | using SSLD pretrained model + EDA strategy |
+| PPLCNet_x1_0 | 98.35 | 2.10 | 8.2 | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy |
+
+It can be seen that high accuracy can be achieved when the backbone is SwinTransformer_tiny, but the speed is slow.
Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the accuracy drops significantly. Replacing the backbone with the faster PPLCNet_x1_0, the accuracy is 3.9 percentage points higher than that of MobileNetV3_small_x0_35, while the speed is more than 43% faster. After additionally using the SSLD pretrained model, the accuracy can be improved by about 0.06 percentage points without affecting the inference speed. Further, with the EDA strategy, the accuracy can be increased by 0.3 percentage points. Finally, with SKL-UGI knowledge distillation, the accuracy can be further improved by 0.21 percentage points. At this point, the accuracy exceeds that of SwinTransformer_tiny, while the speed is more than 41 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
+
+**Note**:
+
+* The latency is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+
+
+## 2. Quick Start
+
+
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, for example, for other versions.
+
+
+
+### 2.2 PaddleClas wheel Installation
+
+The command to install PaddleClas is as follows:
+
+```bash
+pip3 install paddleclas
+```
+
+
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=traffic_sign --infer_imgs=pulc_demo_imgs/traffic_sign/100999_83928.jpg
+```
+
+Results:
+
+```
+>>> result
+class_ids: [182, 179, 162, 128, 24], scores: [0.98623, 0.01255, 0.00022, 0.00021, 0.00012], label_names: ['pl110', 'pl100', 'pl120', 'p26', 'pm10'], filename: pulc_demo_imgs/traffic_sign/100999_83928.jpg
+Predict complete!
+```
+
+**Note**: If you want to test other images, you only need to specify the `--infer_imgs` argument; a directory containing images is also supported.
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="traffic_sign")
+result = model.predict(input_data="pulc_demo_imgs/traffic_sign/100999_83928.jpg")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to call it with `next()` or iterate over it with a `for` loop. It predicts a batch of `batch_size` images and returns the prediction results when called. The default `batch_size` is 1, and you can also specify `batch_size` when instantiating, such as `model = paddleclas.PaddleClas(model_name="traffic_sign", batch_size=2)`.
The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [182, 179, 162, 128, 24], 'scores': [0.98623, 0.01255, 0.00022, 0.00021, 0.00012], 'label_names': ['pl110', 'pl100', 'pl120', 'p26', 'pm10'], 'filename': 'pulc_demo_imgs/traffic_sign/100999_83928.jpg'}]
+```
+
+
+
+## 3. Training, Evaluation and Inference
+
+
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+
+
+### 3.2 Dataset
+
+
+
+#### 3.2.1 Dataset Introduction
+
+All datasets used in this case are open source. The dataset is based on the [Tsinghua-Tencent 100K dataset (TT100K, CC-BY-NC license)](https://cg.cs.tsinghua.edu.cn/traffic-sign/), randomly expanded and cropped according to the bounding boxes.
+
+
+
+#### 3.2.2 Getting Dataset
+
+The processing of `TT100K` includes random expansion and cropping; details are shown below.
+
+```python
+import random
+
+
+def get_random_crop_box(xmin, ymin, xmax, ymax, img_height, img_width, ratio=1.0):
+    # height and width of the annotated bounding box
+    h = ymax - ymin
+    w = xmax - xmin
+
+    # randomly expand each side by up to `ratio` * box size, clipped to the image border
+    xmin_diff = random.random() * ratio * min(w, xmin/ratio)
+    ymin_diff = random.random() * ratio * min(h, ymin/ratio)
+    xmax_diff = random.random() * ratio * min(w, (img_width-xmax-1)/ratio)
+    ymax_diff = random.random() * ratio * min(h, (img_height-ymax-1)/ratio)
+
+    new_xmin = round(xmin - xmin_diff)
+    new_ymin = round(ymin - ymin_diff)
+    new_xmax = round(xmax + xmax_diff)
+    new_ymax = round(ymax + ymax_diff)
+
+    return new_xmin, new_ymin, new_xmax, new_ymax
+```
+
+Some images of the processed dataset are shown below:
+
+<div align="center">
+ +
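+
+To make the expansion-and-crop step concrete, the following is a minimal sketch of how `get_random_crop_box` above might be applied with Pillow. It is not part of the released `deal.py`; the file name and box coordinates are hypothetical.
+
+```python
+from PIL import Image
+
+# hypothetical TT100K scene image and one sign's bounding box
+img = Image.open("10000.jpg")
+xmin, ymin, xmax, ymax = 782, 211, 815, 244
+
+box = get_random_crop_box(xmin, ymin, xmax, ymax,
+                          img_height=img.height, img_width=img.width, ratio=1.0)
+# PIL's crop() takes a (left, upper, right, lower) tuple, which matches the returned box
+img.crop(box).save("10000_crop.jpg")
+```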
+
+You can also download the processed data directly; the processing script `deal.py` is included in the package.
+
+Go to the PaddleClas directory.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, then download and unzip the dataset.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/traffic_sign.tar
+tar -xf traffic_sign.tar
+cd ../
+```
+
+The data under the `traffic_sign` directory is organized as follows:
+
+```
+traffic_sign
+├── train
+│   ├── 0_62627.jpg
+│   ├── 100000_89031.jpg
+│   ├── 100001_89031.jpg
+...
+├── test
+│   ├── 100423_2315.jpg
+│   ├── 100424_2315.jpg
+│   ├── 100425_2315.jpg
+...
+├── other
+│   ├── 100603_3422.jpg
+│   ├── 100604_3422.jpg
+...
+├── label_list_train.txt
+├── label_list_test.txt
+├── label_list_other.txt
+├── label_list_train_for_distillation.txt
+├── label_list_train.txt.debug
+├── label_list_test.txt.debug
+├── label_name_id.txt
+├── deal.py
+```
+
+Here, `train/` and `test/` are the training set and validation set, respectively. `label_list_train.txt` and `label_list_test.txt` are the label files of the training data and validation data, respectively. `label_list_train.txt.debug` and `label_list_test.txt.debug` are subsets of `label_list_train.txt` and `label_list_test.txt`, respectively. `other/` is used for SKL-UGI knowledge distillation, and its label file is `label_list_train_for_distillation.txt`.
+
+**Note**:
+
+* About the content format of `label_list_train.txt` and `label_list_test.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+* About `label_list_train_for_distillation.txt`, please refer to [Knowledge Distillation Label](../advanced_tutorials/distillation/distillation_en.md).
+
+
+
+### 3.3 Training
+
+The details of the training config are in `ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml
+```
+
+The best metric of the validation data is between `98.0` and `98.2`. There would be some fluctuation because the dataset is small.
+
+
+
+### 3.4 Evaluation
+
+After training, you can use the following commands to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file. You can specify another path if needed.
+
+
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference. The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model
+```
+
+The results:
+
+```
+99603_17806.jpg: class id(s): [216, 145, 49, 207, 169], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['pm20', 'pm30', 'pm40', 'pl25', 'pm15']
+```
+
+**Note**:
+
+* In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file. You can specify another path if needed.
+* The default test image is `deploy/images/PULC/traffic_sign/99603_17806.jpg`. You can test other images by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+
+
+
+
+## 4. Model Compression
+
+
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple yet effective knowledge distillation algorithm proposed by PaddleClas.
+
+
+
+
+
+
+#### 4.1.1 Teacher Model Training
+
+Train the teacher model with the hyperparameters specified in `ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \
+    -o Arch.name=ResNet101_vd
+```
+
+The best metric of the validation data is about `98.59%`. The best teacher model weight is saved in the file `output/ResNet101_vd/best_model.pdparams`.
+
+
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml`, where the teacher model is `ResNet101_vd`, the student model is `PPLCNet_x1_0`, and the additional unlabeled training data is the validation data of ImageNet1k. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml \
+    -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric is about `98.35%`. The best student model weight is saved in the file `output/DistillationModel/best_model_student.pdparams`.
+
+
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` procedure in PaddleClas. If you want to get better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to get better hyperparameters.
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your own scenario. If you do not replace the dataset, you can skip this section.
+
+
+
+## 6. Inference Deployment
+
+
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle, which provides high-performance inference for server deployment. Compared with predicting directly from the pretrained model, Paddle Inference can use tools to accelerate prediction and achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference needs a Paddle Inference Model for prediction, and two ways are provided to get one. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command to export the Paddle Inference Model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_traffic_sign_infer
+```
+
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_traffic_sign_infer`, as shown below:
+
+```
+├── PPLCNet_x1_0_traffic_sign_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**Note**: The best model here is from knowledge distillation training.
If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/traffic_sign_infer.tar && tar -xf traffic_sign_infer.tar
+```
+
+After decompression, the `models` directory should look as shown below.
+
+```
+├── traffic_sign_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 6.2 Prediction with Python
+
+
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify the traffic sign in the image `./images/PULC/traffic_sign/99603_17806.jpg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/traffic_sign/inference_traffic_sign.yaml
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/traffic_sign/inference_traffic_sign.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+99603_17806.jpg: class id(s): [216, 145, 49, 207, 169], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['pm20', 'pm30', 'pm40', 'pl25', 'pm15']
+```
+
+
+
+#### 6.2.2 Images Prediction
+
+If you want to predict the images in a directory, please specify the directory path with the argument `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU. If you want to predict with CPU, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/traffic_sign/inference_traffic_sign.yaml -o Global.infer_imgs="./images/PULC/traffic_sign/"
+```
+
+All prediction results will be printed, as shown below.
+
+```
+100999_83928.jpg: class id(s): [182, 179, 162, 128, 24], score(s): [0.99, 0.01, 0.00, 0.00, 0.00], label_name(s): ['pl110', 'pl100', 'pl120', 'p26', 'pm10']
+99603_17806.jpg: class id(s): [216, 145, 49, 24, 169], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['pm20', 'pm30', 'pm40', 'pm10', 'pm15']
+```
+
+For details about the `label_name`, please refer to `dataset/traffic_sign/report.pdf`.
+
+
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance carrier for machine learning models that supports different protocols, such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy as a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting a Paddle Inference model to an ONNX model, which you can then deploy on different inference engines such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and predict with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/PULC/PULC_train_en.md b/docs/en/PULC/PULC_train_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..9f94265e9ffb38f40633c671b0f6a60846f8cd08
--- /dev/null
+++ b/docs/en/PULC/PULC_train_en.md
@@ -0,0 +1,246 @@
+# Practical Ultra Lightweight Classification scheme PULC
+------
+
+
+## Catalogue
+
+- [1. Introduction of PULC solution](#1)
+- [2. Data preparation](#2)
+  - [2.1 Dataset format description](#2.1)
+  - [2.2 Annotation file generation method](#2.2)
+- [3. Training with standard classification configuration](#3)
+  - [3.1 PP-LCNet as backbone](#3.1)
+  - [3.2 SSLD pretrained model](#3.2)
+  - [3.3 EDA strategy](#3.3)
+  - [3.4 SKL-UGI knowledge distillation](#3.4)
+  - [3.5 Summary](#3.5)
+- [4. Hyperparameters Searching](#4)
+  - [4.1 Search based on default configuration](#4.1)
+  - [4.2 Custom search configuration](#4.2)
+
+
+
+### 1. Introduction of PULC solution
+
+Image classification is one of the basic algorithms of computer vision and the most common algorithm in enterprise applications; further, it is an important part of many CV applications. In recent years, backbone network models have developed rapidly, and the accuracy record on ImageNet has been refreshed continuously. However, the performance of these models in practical scenarios is sometimes unsatisfactory. On the one hand, models with high precision tend to be large in storage and slow in inference, which often makes it difficult to meet actual deployment requirements; on the other hand, after a suitable model is selected, experienced engineers are often required to tune its hyperparameters, which is time-consuming and labor-intensive. In order to solve these problems of enterprise applications and make the training and tuning of classification models easier, PaddleClas has summarized and launched the Practical Ultra Lightweight Classification (PULC) solution. PULC integrates various state-of-the-art algorithms such as backbone networks, data augmentation and distillation, and can automatically produce a lightweight and high-precision image classification model.
+
+
+The PULC solution has been verified to be effective in many scenarios, such as human-related scenarios, car-related scenarios, and OCR-related scenarios. With an ultra-lightweight model, accuracy close to SwinTransformer can be achieved while the inference speed is 40+ times faster.
+
+<div align="center">
+ +
+
+The solution mainly includes 4 parts: the PP-LCNet lightweight backbone network, the SSLD pretrained model, Ensemble Data Augmentation (EDA), and the SKL-UGI knowledge distillation algorithm. In addition, we also adopt hyperparameters searching to efficiently optimize the hyperparameters used in training. Below, we take the person exists or not scene as an example to illustrate the solution.
+
+**Note**: For some specific scenarios, we provide basic training documents for reference, such as the [person exists or not classification model](PULC_person_exists_en.md); you can find these documents [here](./PULC_model_list_en.md). If the methods in these documents do not meet your needs, or if you need a custom training task, you can refer to this document.
+
+
+
+### 2. Data preparation
+
+
+
+#### 2.1 Dataset format description
+
+PaddleClas uses `txt` files to specify the training set and validation set. Taking the person exists or not scene as an example, you need to specify `train_list.txt` and `val_list.txt` as the label files of the training set and validation set. The format is as follows:
+
+```
+# Each line uses "space" to separate the image path and label
+train/1.jpg 0
+train/10.jpg 1
+...
+```
+
+If you want to get more information about common classification datasets, you can refer to the document [PaddleClas Classification Dataset Format Description](../data_preparation/classification_dataset_en.md).
+
+
+
+
+#### 2.2 Annotation file generation method
+
+If you already have the data of the actual scene, you can label it according to the format in the previous section. Here, we also provide a script to quickly generate annotation files: you only need to put the data of different categories in separate folders and run the script.
+
+First, assume that the path where you store the data is `./train`, that `train/` contains the data of each category with category numbers starting from 0, and that the folder of each category contains the image data.
+
+```shell
+train
+├── 0
+│   ├── 0.jpg
+│   ├── 1.jpg
+│   └── ...
+├── 1
+│   ├── 0.jpg
+│   ├── 1.jpg
+│   └── ...
+└── ...
+```
+
+```shell
+tree -r -i -f train | grep -E "jpg|JPG|jpeg|JPEG|png|PNG" | awk -F "/" '{print $0" "$2}' > train_list.txt
+```
+
+If more image suffixes are involved, you can add them to the pattern after `grep -E`; the `2` in `$2` is the directory level of the category-number folder.
+
+**Note:** The above is an introduction to the method of dataset acquisition and annotation generation. Here you can directly download the person exists or not scene data to get started quickly.
+
+
+Go to the PaddleClas directory.
+
+```
+cd path_to_PaddleClas
+```
+
+Go to the `dataset/` directory, then download and unzip the data.
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/person_exists.tar
+tar -xf person_exists.tar
+cd ../
+```
+
+
+
+### 3. Training with standard classification configuration
+
+
+
+#### 3.1 PP-LCNet as backbone
+
+PULC adopts the lightweight PP-LCNet backbone network, which is 50% faster than other networks with the same accuracy. You can view the detailed introduction of the backbone network in [PP-LCNet Introduction](../models/PP-LCNet_en.md).
The command to train with PP-LCNet is:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml
+```
+
+For performance comparison, we also provide configuration files for the large model SwinTransformer_tiny and the lightweight model MobileNetV3_small_x0_35, which you can train with the following commands:
+
+SwinTransformer_tiny:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/person_exists/SwinTransformer_tiny_patch4_window7_224.yaml
+```
+
+MobileNetV3_small_x0_35:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/person_exists/MobileNetV3_small_x0_35.yaml
+```
+
+
+The accuracy of the trained models is compared in the following table.
+
+| Model | Tpr(%) | Latency(ms) | Storage Size(M) | Strategy |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 95.69 | 95.30 | 107 | Use ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | Use ImageNet pretrained model |
+| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | Use ImageNet pretrained model |
+
+It can be seen that PP-LCNet is much faster than SwinTransformer, but its accuracy is also lower. Below we improve the accuracy of the PP-LCNet model through a series of optimizations.
+
+
+
+#### 3.2 SSLD pretrained model
+
+SSLD is a semi-supervised distillation algorithm developed by Baidu. On the ImageNet dataset, it can improve model accuracy by 3-7 percentage points. You can find a detailed introduction in the [SSLD introduction](../advanced_tutorials/distillation/distillation_en.md). We found that using SSLD pretrained weights can effectively improve the accuracy of the applied classification model. In addition, using a smaller resolution in training can effectively improve model accuracy. At the same time, we also optimized the learning rate.
+Based on the above three improvements, the accuracy of our trained model reached 92.1%, an increase of 2.6 percentage points.
+
+
+
+#### 3.3 EDA strategy
+
+Data augmentation is a commonly used optimization strategy in vision algorithms and can significantly improve the accuracy of a model. In addition to traditional methods such as RandomCrop and RandomFlip, we also apply RandAugment and RandomErasing. You can find a detailed introduction in the [Data Augmentation Introduction](../advanced_tutorials/DataAugmentation_en.md).
+Since these two kinds of data augmentation modify the image heavily and make the classification task more difficult, they may lead to under-fitting on some datasets, so we set the probability of enabling them in advance.
+Based on the above improvements, we obtained a model accuracy of 93.43%, an increase of 1.3 percentage points.
+
+
+
+#### 3.4 SKL-UGI knowledge distillation
+
+Knowledge distillation is a method that can effectively improve the accuracy of small models. You can find a detailed introduction in the [Introduction to Knowledge Distillation](../advanced_tutorials/distillation/distillation_en.md). We choose ResNet101_vd as the teacher model for distillation. In order to adapt to the distillation process, we also adjust the learning rate of different stages of the network. Based on the above improvements, we trained the model to an accuracy of 95.6%, an increase of 1.4 percentage points.
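+
+For reference, a distillation run in this scene follows the same two-step pattern as the scene-specific tutorials: first train the `ResNet101_vd` teacher, then point the student's distillation config at the teacher weights. Below is a sketch of the commands, assuming the person exists scene ships a `PPLCNet_x1_0_distillation.yaml` config analogous to the other PULC scenes:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# step 1: train the ResNet101_vd teacher on the person_exists data
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml \
+    -o Arch.name=ResNet101_vd
+# step 2: distill into the PPLCNet_x1_0 student using the teacher weights from step 1
+# (the distillation config name is assumed by analogy with the other PULC scenes)
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml \
+    -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```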
+
+
+
+#### 3.5 Summary
+
+After the above optimizations, the final accuracy of PP-LCNet reaches 95.6%, on par with the large model. We summarize the experimental results in the following table:
+
+| Model | Tpr(%) | Latency(ms) | Storage Size(M) | Strategy |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 95.69 | 95.30 | 107 | Use ImageNet pretrained model |
+| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | Use ImageNet pretrained model |
+| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | Use ImageNet pretrained model |
+| PPLCNet_x1_0 | 92.10 | 2.12 | 6.5 | Use SSLD pretrained model |
+| PPLCNet_x1_0 | 93.43 | 2.12 | 6.5 | Use SSLD pretrained model + EDA strategy |
+| PPLCNet_x1_0 | 95.60 | 2.12 | 6.5 | Use SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation |
+
+We also used the same optimization strategy in the other 8 scenarios and got the following results:
+
+| scenarios | large model | large model metrics(%) | small model | small model metrics(%) |
+|----------|----------|----------|----------|----------|
+| Pedestrian Attribute Classification | Res2Net200_vd | 81.25 | PPLCNet_x1_0 | 78.59 |
+| Classification of Whether Wearing Safety Helmet | Res2Net200_vd | 98.92 | PPLCNet_x1_0 | 99.38 |
+| Traffic Sign Classification | SwinTransformer_tiny | 98.11 | PPLCNet_x1_0 | 98.35 |
+| Vehicle Attribute Classification | Res2Net200_vd_26w_4s | 91.36 | PPLCNet_x1_0 | 90.81 |
+| Car Exists Classification | SwinTransformer_tiny | 97.71 | PPLCNet_x1_0 | 95.92 |
+| Text Image Orientation Classification | SwinTransformer_tiny | 99.12 | PPLCNet_x1_0 | 99.06 |
+| Text-line Orientation Classification | SwinTransformer_tiny | 93.61 | PPLCNet_x1_0 | 96.01 |
+| Language Classification | SwinTransformer_tiny | 98.12 | PPLCNet_x1_0 | 99.26 |
+
+
+It can be seen from the results that the PULC scheme can improve model accuracy in multiple application scenarios. Using the PULC scheme can greatly reduce the workload of model optimization and quickly produce models with higher accuracy.
+
+
+
+
+### 4. Hyperparameters Searching
+
+In the above training process, we adjusted parameters such as the learning rate, the data augmentation probability, and the stage learning rate mult list. The optimal values of these parameters may differ across scenarios. We provide a quick hyperparameters searching script to automate the process of hyperparameter tuning. The script traverses the parameters in the search value lists to replace the parameters in the default configuration, trains each combination in sequence, and finally selects the parameters corresponding to the model with the highest accuracy as the search result.
+
+
+
+#### 4.1 Search based on default configuration
+
+The configuration file [search.yaml](../../../ppcls/configs/PULC/person_exists/search.yaml) defines the hyperparameters searching configuration for the person exists or not scenario. Use the following command to complete hyperparameters searching.
+
+```bash
+python3 tools/search_strategy.py -c ppcls/configs/PULC/person_exists/search.yaml
+```
+
+**Note**: Regarding the search part, we are also constantly improving it, so stay tuned.
+
+
+
+#### 4.2 Custom search configuration
+
+
+You can also modify the configuration of hyperparameters searching based on training results or your parameter tuning experience.
+
+- Modify the `search_values` field in `lrs` to modify the search value list of learning rates;
+
+- Modify the `search_values` field in `resolutions` to modify the search value list of resolutions;
+
+- Modify the `search_values` field in `ra_probs` to modify the search value list of the RandAugment activation probability;
+
+- Modify the `search_values` field in `re_probs` to modify the search value list of the RandomErasing activation probability;
+
+- Modify the `search_values` field in `lr_mult_list` to modify the search value list of `lr_mult`;
+
+- Modify the `search_values` field in `teacher` to modify the search list of the teacher model.
+
+After the search is completed, the final results are generated in `output/search_person_exists`. Except for `search_res`, the directories in `output/search_person_exists` contain the weights and training logs for the corresponding hyperparameters of each search training; `search_res` corresponds to the result of knowledge distillation, that is, the final model. The weights of the model are stored in `output/output_dir/search_person_exists/DistillationModel/best_model_student.pdparams`.
diff --git a/docs/en/PULC/PULC_vehicle_attribute_en.md b/docs/en/PULC/PULC_vehicle_attribute_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..47d7c963e9de6e4bde9fd3338830611e59b60695
--- /dev/null
+++ b/docs/en/PULC/PULC_vehicle_attribute_en.md
@@ -0,0 +1,481 @@
+# PULC Recognition Model of Vehicle Attribute
+
+------
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Quick Start](#2)
+  - [2.1 PaddlePaddle Installation](#2.1)
+  - [2.2 PaddleClas Installation](#2.2)
+  - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+  - [3.1 Installation](#3.1)
+  - [3.2 Dataset](#3.2)
+    - [3.2.1 Dataset Introduction](#3.2.1)
+    - [3.2.2 Getting Dataset](#3.2.2)
+  - [3.3 Training](#3.3)
+  - [3.4 Evaluation](#3.4)
+  - [3.5 Inference](#3.5)
+- [4. Model Compression](#4)
+  - [4.1 SKL-UGI Knowledge Distillation](#4.1)
+    - [4.1.1 Teacher Model Training](#4.1.1)
+    - [4.1.2 Knowledge Distillation Training](#4.1.2)
+- [5. Hyperparameters Searching](#5)
+- [6. Inference Deployment](#6)
+  - [6.1 Getting Paddle Inference Model](#6.1)
+    - [6.1.1 Exporting Paddle Inference Model](#6.1.1)
+    - [6.1.2 Downloading Inference Model](#6.1.2)
+  - [6.2 Prediction with Python](#6.2)
+    - [6.2.1 Image Prediction](#6.2.1)
+    - [6.2.2 Images Prediction](#6.2.2)
+  - [6.3 Deployment with C++](#6.3)
+  - [6.4 Deployment as Service](#6.4)
+  - [6.5 Deployment on Mobile](#6.5)
+  - [6.6 Converting To ONNX and Deployment](#6.6)
+
+
+
+## 1. Introduction
+
+This case provides a way for users to quickly build a lightweight, high-precision and practical vehicle attribute classification model using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in vehicle identification, road monitoring and other scenarios.
+
+The following table lists the relevant indicators of the model. The first three lines mean that Res2Net200_vd_26w_4s, ResNet50 and MobileNetV3_small_x0_35 were used as the backbone for training. The fourth to seventh lines show the results of replacing the backbone with PPLCNet, then additionally using the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
+
+
+| Backbone | mA(%) | Latency(ms) | Size(M) | Training Strategy |
+|-------|-----------|----------|---------------|---------------|
+| Res2Net200_vd_26w_4s | 91.36 | 79.46 | 293 | using ImageNet pretrained |
+| ResNet50 | 89.98 | 12.83 | 92 | using ImageNet pretrained |
+| MobileNetV3_small_x0_35 | 87.41 | 2.91 | 2.8 | using ImageNet pretrained |
+| PPLCNet_x1_0 | 89.57 | 2.36 | 7.2 | using ImageNet pretrained |
+| PPLCNet_x1_0 | 90.07 | 2.36 | 7.2 | using SSLD pretrained |
+| PPLCNet_x1_0 | 90.59 | 2.36 | 7.2 | using SSLD pretrained + EDA strategy |
+| PPLCNet_x1_0 | 90.81 | 2.36 | 7.2 | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy |
+
+
+It can be seen from the table that the mA metric is higher when the backbone is Res2Net200_vd_26w_4s, but the inference speed is slow. After replacing the backbone with the lightweight model MobileNetV3_small_x0_35, the speed is greatly improved, but the mA metric drops significantly. When the backbone is replaced by PPLCNet_x1_0, the mA metric increases by 2 percentage points while the speed also increases by about 23%. On this basis, after using the SSLD pretrained model, the mA metric can be improved by about 0.5 percentage points without changing the inference speed. Further, when the EDA strategy is integrated, the mA metric can be improved by another 0.52 percentage points. Finally, after additionally using SKL-UGI knowledge distillation, the mA metric can be improved by another 0.23 percentage points. At this point, the mA metric of PPLCNet_x1_0 is only 0.55 percentage points away from that of Res2Net200_vd_26w_4s, but it is 32 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
+
+
+**Note**:
+
+* The latency is tested on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, with MKLDNN enabled and 10 threads.
+* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and the [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
+
+
+
+## 2. Quick Start
+
+
+
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if no GPU device is available.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, for example, for other versions.
+
+
+
+### 2.2 PaddleClas wheel Installation
+
+The command to install PaddleClas is as follows:
+
+```bash
+pip3 install paddleclas
+```
+
+
+
+### 2.3 Prediction
+
+First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip the test demo images.
+
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=vehicle_attribute --infer_imgs=pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg
+```
+
+Results:
+```
+>>> result
+attributes: Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505), output: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], filename: pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg
+Predict complete!
+```
+
+**Note**: If you want to test other images, you only need to specify the `--infer_imgs` argument; a directory containing images is also supported.
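+
+The `output` field above is a 19-dimensional multi-label vector. Judging from the label-conversion script in section 3.2.2 below, the first 10 entries appear to be color bits and the last 9 entries type bits. The following decoding sketch is hypothetical (the attribute name lists are assumptions based on the VeRi attribute IDs, not an official API):
+
+```python
+# hypothetical decoding of the 19-dim 'output' vector shown above, assuming
+# indices 0-9 are VeRi color bits and indices 10-18 are VeRi type bits
+COLORS = ["yellow", "orange", "green", "gray", "red",
+          "blue", "white", "golden", "brown", "black"]
+TYPES = ["sedan", "suv", "van", "hatchback", "mpv",
+         "pickup", "bus", "truck", "estate"]
+
+output = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
+color = COLORS[output[:10].index(1)]   # -> "yellow"
+vtype = TYPES[output[10:].index(1)]    # -> "hatchback"
+print(color, vtype)
+```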
+
+* Prediction in Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="vehicle_attribute")
+result = model.predict(input_data="pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg")
+print(next(result))
+```
+
+**Note**: The `result` returned by `model.predict()` is a generator, so you need to call it with `next()` or iterate over it with a `for` loop. It predicts a batch of `batch_size` images and returns the prediction results when called. The default `batch_size` is 1, and you can also specify `batch_size` when instantiating, such as `model = paddleclas.PaddleClas(model_name="vehicle_attribute", batch_size=2)`. The result of the demo above:
+
+```
+>>> result
+[{'attributes': 'Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 'filename': 'pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg'}]
+```
+
+
+
+## 3. Training, Evaluation and Inference
+
+
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for a description of the installation process.
+
+
+
+### 3.2 Dataset
+
+
+
+#### 3.2.1 Dataset Introduction
+
+The data used in this case is based on the [VeRi dataset](https://www.v7labs.com/open-datasets/veri-dataset).
+
+
+
+#### 3.2.2 Getting Dataset
+
+
+Part of the data visualization is shown below.
+
+<div align="center">
+ +
+
+First, apply for and download the data from the [VeRi dataset official website](https://www.v7labs.com/open-datasets/veri-dataset), put it in the `dataset` directory of PaddleClas with the directory name `VeRi`, and use the following command to enter the folder.
+
+
+```shell
+cd PaddleClas/dataset/VeRi/
+```
+
+Then use the following code to convert the labels (you can execute it in a Python terminal, or write it to a file and run the file with `python3 convert.py`).
+
+```python
+import os
+from xml.dom.minidom import parse
+
+vehicleids = []
+
+def convert_annotation(input_fp, output_fp):
+    in_file = open(input_fp)
+    list_file = open(output_fp, 'w')
+    tree = parse(in_file)
+
+    root = tree.documentElement
+
+    for item in root.getElementsByTagName("Item"):
+        label = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
+        if item.hasAttribute("imageName"):
+            name = item.getAttribute("imageName")
+        if item.hasAttribute("vehicleID"):
+            vehicleid = item.getAttribute("vehicleID")
+            if vehicleid not in vehicleids:
+                vehicleids.append(vehicleid)
+            vid = vehicleids.index(vehicleid)
+        if item.hasAttribute("colorID"):
+            # colorID 1-10 is mapped to label indices 0-9
+            colorid = int(item.getAttribute("colorID"))
+            label[colorid-1] = '1'
+        if item.hasAttribute("typeID"):
+            # typeID 1-9 is mapped to label indices 10-18
+            typeid = int(item.getAttribute("typeID"))
+            label[typeid+9] = '1'
+        label = ','.join(label)
+        list_file.write(os.path.join('image_train', name) + "\t" + label + "\n")
+
+    list_file.close()
+
+convert_annotation('train_label.xml', 'train_list.txt')  # imagename vehiclenum colorid typeid
+convert_annotation('test_label.xml', 'test_list.txt')
+```
+
+
+After executing the above command, the `VeRi` directory contains the following data:
+
+```
+VeRi
+├── image_train
+│   ├── 0001_c001_00016450_0.jpg
+│   ├── 0001_c001_00016460_0.jpg
+│   ├── 0001_c001_00016470_0.jpg
+...
+├── image_test
+│   ├── 0002_c002_00030600_0.jpg
+│   ├── 0002_c002_00030605_1.jpg
+│   ├── 0002_c002_00030615_1.jpg
+...
+...
+├── train_list.txt
+├── test_list.txt
+├── train_label.xml
+├── test_label.xml
+```
+
+where `image_train/` and `image_test/` are the training set and validation set, respectively, and `train_list.txt` and `test_list.txt` are the converted label files of the training and validation sets, respectively.
+
+
+
+
+### 3.3 Training
+
+The details of the training config are in `./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`. The training command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
+```
+
+The best metric for the validation set is around `90.59%` (the dataset is small, so the metric generally fluctuates by around 0.3%).
+
+
+
+
+### 3.4 Evaluation
+
+After training, you can use the following commands to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file. You can specify another path if needed.
+
+
+
+### 3.5 Inference
+
+After training, you can use the trained model for inference.
The command is as follows:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model
+```
+
+The results:
+
+```
+[{'attr': 'Color: (yellow, prob: 0.9893478155136108), Type: (hatchback, prob: 0.9734100103378296)', 'pred': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 'file_name': './deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg'}]
+```
+
+**Note**:
+
+* In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file. You can specify another path if needed.
+* The default test image is `./deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg`. You can test other images by specifying the argument `-o Infer.infer_imgs=path_to_test_image`.
+
+
+
+## 4. Model Compression
+
+
+
+### 4.1 SKL-UGI Knowledge Distillation
+
+SKL-UGI is a simple yet effective knowledge distillation algorithm proposed by PaddleClas.
+
+
+
+
+
+
+#### 4.1.1 Teacher Model Training
+
+Train the teacher model with the hyperparameters specified in `ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+    -o Arch.name=ResNet101_vd
+```
+
+The best metric for the validation set is around `91.60%`. The best teacher model weight is saved in the file `output/ResNet101_vd/best_model.pdparams`.
+
+
+
+#### 4.1.2 Knowledge Distillation Training
+
+The training strategy is specified in the config file `ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml`, where the teacher model is `ResNet101_vd` and the student model is `PPLCNet_x1_0`. The command is as follows:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml \
+    -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+The best metric for the validation set is around `90.81%`. The best student model weight is saved in the file `output/DistillationModel/best_model_student.pdparams`.
+
+
+
+## 5. Hyperparameters Searching
+
+The hyperparameters used in [section 3.2](#3.2) and [section 4.1](#4.1) were obtained with the `Hyperparameters Searching` procedure in PaddleClas. If you want to get better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to get better hyperparameters.
+
+**Note**: This section is optional. Because the search process takes a long time, run it selectively according to your own scenario. If you do not replace the dataset, you can skip this section.
+
+
+
+## 6. Inference Deployment
+
+
+
+### 6.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle, which provides high-performance inference for server deployment. Compared with predicting directly from the pretrained model, Paddle Inference can use tools to accelerate prediction and achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference needs a Paddle Inference Model for prediction, and two ways are provided to get one.
If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#6.1.2).
+
+
+
+### 6.1.1 Exporting Paddle Inference Model
+
+The command to export the Paddle Inference Model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_infer
+```
+
+After running the above command, the inference model files are saved in `PPLCNet_x1_0_vehicle_attribute_infer`, as shown below:
+
+```
+└── PPLCNet_x1_0_vehicle_attribute_infer
+    ├── inference.pdiparams
+    ├── inference.pdiparams.info
+    └── inference.pdmodel
+```
+
+**Note**: The best model here is from knowledge distillation training. If knowledge distillation training is not used, the best model is saved in `output/PPLCNet_x1_0/best_model.pdparams`.
+
+
+
+### 6.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddleclas.bj.bcebos.com/models/PULC/vehicle_attribute_infer.tar && tar -xf vehicle_attribute_infer.tar
+```
+
+After decompression, the `models` directory should look as shown below.
+
+```
+├── vehicle_attribute_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 6.2 Prediction with Python
+
+
+
+#### 6.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to recognize the attributes of the vehicle in the image `./images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml -o Global.use_gpu=True
+# Use the following command to predict with CPU.
+python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893478155136108), Type: (hatchback, prob: 0.9734099507331848)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
+```
+
+
+
+
+#### 6.2.2 Images Prediction
+
+If you want to predict the images in a directory, please specify the directory path with the argument `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU. If you want to predict with CPU, add the argument -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml -o Global.infer_imgs="./images/PULC/vehicle_attribute/"
+```
+
+All prediction results will be printed, as shown below.
+
+```
+0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
+0014_c012_00040750_0.jpg: {'attributes': 'Color: (red, prob: 0.999872088432312), Type: (sedan, prob: 0.999976634979248)', 'output': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]}
+```
+
+In the prediction results above, the `attributes` field gives the predicted color and type of each vehicle together with their probabilities, and `output` is the corresponding multi-label vector.
+
+
+
+### 6.3 Deployment with C++
+
+PaddleClas provides an example of how to deploy with C++.
Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+
+
+### 6.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance carrier for machine learning models that supports different protocols, such as RESTful, gRPC and bRPC, and provides deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example of how to deploy as a service with Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+
+
+### 6.5 Deployment on Mobile
+
+Paddle-Lite is an open source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile with Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+
+
+### 6.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting a Paddle Inference model to an ONNX model, which you can then deploy on different inference engines such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert a Paddle Inference model to an ONNX model with the paddle2onnx toolkit and predict with the ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
diff --git a/docs/en/inference_deployment/whl_deploy_en.md b/docs/en/inference_deployment/whl_deploy_en.md
index 224d41a7c1f2de9886fd830a36b8910dae0f97b6..e2666458a27f55bdb44f5fcb2646ba9107e80163 100644
--- a/docs/en/inference_deployment/whl_deploy_en.md
+++ b/docs/en/inference_deployment/whl_deploy_en.md
@@ -1,6 +1,6 @@
 # PaddleClas wheel package
 
-Paddleclas supports Python WHL package for prediction. At present, WHL package only supports image classification, but does not support subject detection, feature extraction and vector retrieval.
+PaddleClas supports Python wheel package for prediction. At present, the PaddleClas wheel supports image classification including ImageNet1k models and PULC models, but does not support mainbody detection, feature extraction and vector retrieval.
 
 ---
 
@@ -8,8 +8,10 @@
 
 - [1. Installation](#1)
 - [2. Quick Start](#2)
+  - [2.1 ImageNet1k models](#2.1)
+  - [2.2 PULC models](#2.2)
 - [3. Definition of Parameters](#3)
-- [4. Usage](#4)
+- [4. More usage](#4)
   - [4.1 View help information](#4.1)
  - [4.2 Prediction using inference model provided by PaddleClas](#4.2)
  - [4.3 Prediction using local model files](#4.3)
@@ -20,6 +22,7 @@
  - [4.8 Specify the mapping between class id and label name](#4.8)
 
 
+
 ## 1. Installation
 
 * installing from pypi
@@ -36,8 +39,14 @@
 pip3 install dist/*
 ```
 
+
 ## 2. Quick Start
-* Using the `ResNet50` model provided by PaddleClas, the following image(`'docs/images/inference_deployment/whl_demo.jpg'`) as an example.
+
+
+
+### 2.1 ImageNet1k models
+
+Take the `ResNet50` model provided by PaddleClas and the following image (`'docs/images/inference_deployment/whl_demo.jpg'`) as an example.
![](../../images/inference_deployment/whl_demo.jpg)
 
@@ -68,25 +77,88 @@ filename: docs/images/inference_deployment/whl_demo.jpg, top-5, class_ids: [8, 7
 Predict complete!
 ```
 
+
+
+### 2.2 PULC models
+
+PULC integrates various state-of-the-art algorithms such as backbone networks, data augmentation and distillation, and can automatically produce a lightweight and high-precision image classification model.
+
+PaddleClas provides a series of test cases, which contain demos of different scenes about people, cars, OCR, etc. Click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download the data.
+
+Prediction using the PULC "Human Exists Classification" model provided by PaddleClas:
+
+* Python
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="person_exists")
+result = model.predict(input_data="pulc_demo_imgs/person_exists/objects365_01780782.jpg")
+print(next(result))
+```
+
+```
+>>> result
+[{'class_ids': [0], 'scores': [0.9955421453341842], 'label_names': ['nobody'], 'filename': 'pulc_demo_imgs/person_exists/objects365_01780782.jpg'}]
+```
+
+`nobody` means there is no one in the image, and `someone` means there is someone in the image. Therefore, the prediction result indicates that there is no one in the image.
+
+**Note**: `model.predict()` is a generator, so it needs to be called with `next()` or iterated with a `for` loop. It predicts a batch whose length is `batch_size`, which defaults to 1. You can specify the arguments `batch_size` and `model_name` when instantiating the PaddleClas object, for example: `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`. Please refer to [Supported Model List](#PULC_Models) for the supported model list.
+
+* CLI
+
+```bash
+paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg
+```
+
+```
+>>> result
+class_ids: [0], scores: [0.9955421453341842], label_names: ['nobody'], filename: pulc_demo_imgs/person_exists/objects365_01780782.jpg
+Predict complete!
+```
+
+**Note**: The `--infer_imgs` argument specifies the image(s) to be predicted, and you can also specify a directory containing images. If you use another model, you can specify it with the `--model_name` argument. Please refer to [Supported Model List](#PULC_Models) for the supported model list.
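+
+For example, a usage sketch combining the arguments described above: run the same PULC model over the whole demo directory in batches of 2.
+
+```bash
+# predict every image under the person_exists demo directory, two images per batch
+paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/ --batch_size 2
+```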
+
+
+
+**Supported Model List**
+
+The names of the PULC series models are as follows:
+
+| Name | Intro |
+| --- | --- |
+| person_exists | Human Exists Classification |
+| person_attribute | Pedestrian Attribute Classification |
+| safety_helmet | Classification of Whether Wearing Safety Helmet |
+| traffic_sign | Traffic Sign Classification |
+| vehicle_attribute | Vehicle Attribute Classification |
+| car_exists | Car Exists Classification |
+| text_image_orientation | Text Image Orientation Classification |
+| textline_orientation | Text-line Orientation Classification |
+| language_classification | Language Classification |
+
+Please refer to [Human Exists Classification](../PULC/PULC_person_exists_en.md), [Pedestrian Attribute Classification](../PULC/PULC_person_attribute_en.md), [Classification of Whether Wearing Safety Helmet](../PULC/PULC_safety_helmet_en.md), [Traffic Sign Classification](../PULC/PULC_traffic_sign_en.md), [Vehicle Attribute Classification](../PULC/PULC_vehicle_attribute_en.md), [Car Exists Classification](../PULC/PULC_car_exists_en.md), [Text Image Orientation Classification](../PULC/PULC_text_image_orientation_en.md), [Text-line Orientation Classification](../PULC/PULC_textline_orientation_en.md) and [Language Classification](../PULC/PULC_language_classification_en.md) for more information about the different scenarios.
+
+
 ## 3. Definition of Parameters
 
 The following parameters can be specified in Command Line or used as parameters of the constructor when instantiating the PaddleClas object in Python.
 
 * model_name(str): If using inference model based on ImageNet1k provided by Paddle, please specify the model's name by the parameter.
 * inference_model_dir(str): Local model files directory, which is valid when `model_name` is not specified. The directory should contain `inference.pdmodel` and `inference.pdiparams`.
 * infer_imgs(str): The path of image to be predicted, or the directory containing the image files, or the URL of the image from Internet.
-* use_gpu(bool): Whether to use GPU or not, default by `True`.
-* gpu_mem(int): GPU memory usages,default by `8000`。
-* use_tensorrt(bool): Whether to open TensorRT or not. Using it can greatly promote predict preformance, default by `False`.
-* enable_mkldnn(bool): Whether enable MKLDNN or not, default `False`.
-* cpu_num_threads(int): Assign number of cpu threads, valid when `--use_gpu` is `False` and `--enable_mkldnn` is `True`, default by `10`.
-* batch_size(int): Batch size, default by `1`.
-* resize_short(int): Resize the minima between height and width into `resize_short`, default by `256`.
-* crop_size(int): Center crop image to `crop_size`, default by `224`.
-* topk(int): Print (return) the `topk` prediction results, default by `5`.
-* class_id_map_file(str): The mapping file between class ID and label, default by `ImageNet1K` dataset's mapping.
-* pre_label_image(bool): whether prelabel or not, default=False.
-* save_dir(str): The directory to save the prediction results that can be used as pre-label, default by `None`, that is, not to save.
+* use_gpu(bool): Whether to use GPU or not.
+* gpu_mem(int): GPU memory usage.
+* use_tensorrt(bool): Whether to enable TensorRT or not. Using it can greatly improve prediction performance.
+* enable_mkldnn(bool): Whether to enable MKLDNN or not.
+* cpu_num_threads(int): The number of CPU threads, valid when `--use_gpu` is `False` and `--enable_mkldnn` is `True`.
+* batch_size(int): Batch size.
+* resize_short(int): Resize the shorter side of the image to `resize_short`.
+
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model, and you need to set `resize_short=384` and `crop_size=384`. The following is a demo.
@@ -103,6 +175,7 @@ clas = PaddleClas(model_name='ViT_base_patch16_384', resize_short=384, crop_size
```
+
## 4. Usage
PaddleClas provides two ways to use:
@@ -110,6 +183,7 @@ PaddleClas provides two ways to use:
2. Bash command line programming.
+
### 4.1 View help information
* CLI
@@ -118,6 +192,7 @@
paddleclas -h
```
+
### 4.2 Prediction using the inference model provided by PaddleClas
You can use the inference model provided by PaddleClas to predict, and only need to specify `model_name`. In this case, PaddleClas will automatically download the files of the specified model and save them in the directory `~/.paddleclas/`.
@@ -136,6 +211,7 @@ paddleclas --model_name='ResNet50' --infer_imgs='docs/images/inference_deploymen
```
+
### 4.3 Prediction using local model files
You can use local model files trained by yourself to predict, and only need to specify `inference_model_dir`. Note that the directory must contain `inference.pdmodel` and `inference.pdiparams`.
@@ -154,6 +230,7 @@ paddleclas --inference_model_dir='./inference/' --infer_imgs='docs/images/infere
```
+
### 4.4 Prediction by batch
You can predict by batch, and only need to specify `batch_size` when `infer_imgs` is a directory containing image files.
@@ -173,6 +250,7 @@ paddleclas --model_name='ResNet50' --infer_imgs='docs/images/' --batch_size 2
```
+
### 4.5 Prediction of Internet image
You can predict an image from the Internet, and only need to specify the URL of the image by `infer_imgs`. In this case, the image file will be downloaded and saved in the directory `~/.paddleclas/images/`.
@@ -191,6 +269,7 @@ paddleclas --model_name='ResNet50' --infer_imgs='https://raw.githubusercontent.c
```
+
### 4.6 Prediction of NumPy.array format image
In Python code, you can predict a `NumPy.array` format image, and only need to pass the variable holding the image data via `infer_imgs`. Note that the models in PaddleClas only support predicting 3-channel image data, and the channel order must be `RGB`.
@@ -205,6 +284,7 @@
print(next(result))
```
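+
+As a hedged sketch of this point (assuming OpenCV is installed via `pip install opencv-python`, and reusing the demo image path from earlier sections): `cv2.imread` returns an `H x W x 3` array in `BGR` order, so the channels must be reversed to `RGB` before prediction.
+
+```python
+import cv2
+from paddleclas import PaddleClas
+
+clas = PaddleClas(model_name='ResNet50')
+# cv2.imread loads BGR; [:, :, ::-1] reverses the channel axis to RGB
+img = cv2.imread('docs/images/inference_deployment/whl_demo.jpg')[:, :, ::-1]
+result = clas.predict(img)
+print(next(result))
+```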
+
### 4.7 Save the prediction result(s)
You can save the prediction result(s) as pre-label, and only need to use `save_dir` to specify the directory to save to.
@@ -212,17 +292,18 @@ You can save the prediction result(s) as pre-label, only need to use `pre_label_
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50', save_dir='./output_pre_label/')
-infer_imgs = 'docs/images/inference_deployment/whl_' # it can be infer_imgs folder path which contains all of images you want to predict.
+infer_imgs = 'docs/images/' # it can be a folder path containing all of the images to be predicted.
result=clas.predict(infer_imgs)
print(next(result))
```
* CLI
```bash
-paddleclas --model_name='ResNet50' --infer_imgs='docs/images/inference_deployment/whl_' --save_dir='./output_pre_label/'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/' --save_dir='./output_pre_label/'
```
+
### 4.8 Specify the mapping between class id and label name
You can specify the mapping between class id and label name, and only need to use `class_id_map_file` to specify the mapping file. PaddleClas uses ImageNet1K's mapping by default.
diff --git a/docs/images/PULC/docs/car_exists_data_demo.jpeg b/docs/images/PULC/docs/car_exists_data_demo.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..9959954b6b8bf27589e1d2081f86c6078d16e2c1
Binary files /dev/null and b/docs/images/PULC/docs/car_exists_data_demo.jpeg differ
diff --git a/docs/images/PULC/docs/language_classification_original_data.png b/docs/images/PULC/docs/language_classification_original_data.png
new file mode 100644
index 0000000000000000000000000000000000000000..42c4a03ebe3df6b4563e6f006d61faa0a4b1fdea
Binary files /dev/null and b/docs/images/PULC/docs/language_classification_original_data.png differ
diff --git a/docs/images/PULC/docs/person_attribute_data_demo.png b/docs/images/PULC/docs/person_attribute_data_demo.png
new file mode 100644
index 0000000000000000000000000000000000000000..c9b276af0a554bbe07d807224d56fbbe5e2b7400
Binary files /dev/null and b/docs/images/PULC/docs/person_attribute_data_demo.png differ
diff --git a/docs/images/PULC/docs/person_exists_data_demo.png b/docs/images/PULC/docs/person_exists_data_demo.png
new file mode 100644
index 0000000000000000000000000000000000000000..b74ab64b6f62b83880aa426c1d05cb1fc53840e4
Binary files /dev/null and b/docs/images/PULC/docs/person_exists_data_demo.png differ
diff --git a/docs/images/PULC/docs/safety_helmet_data_demo.jpg b/docs/images/PULC/docs/safety_helmet_data_demo.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..70bd2d952fd20e6f8fe39182914e400177d913c4
Binary files /dev/null and b/docs/images/PULC/docs/safety_helmet_data_demo.jpg differ
diff --git a/docs/images/PULC/docs/text_image_orientation_data_demo.png b/docs/images/PULC/docs/text_image_orientation_data_demo.png
new file mode 100644
index 0000000000000000000000000000000000000000..756b18e03077f7c631deb39390aa84ba0f4580ae
Binary files /dev/null and b/docs/images/PULC/docs/text_image_orientation_data_demo.png differ
diff --git a/docs/images/PULC/docs/text_image_orientation_original_data.png b/docs/images/PULC/docs/text_image_orientation_original_data.png
new file mode 100644
index 0000000000000000000000000000000000000000..9014179214224c21f50a595f414617ab12538b8e
Binary files /dev/null and b/docs/images/PULC/docs/text_image_orientation_original_data.png differ
diff --git a/docs/images/PULC/docs/textline_orientation_data_demo.png b/docs/images/PULC/docs/textline_orientation_data_demo.png
new file mode 100644
index 0000000000000000000000000000000000000000..fcb48732026e48e14a616967ee06904c2feb9449
Binary files /dev/null and b/docs/images/PULC/docs/textline_orientation_data_demo.png differ
diff --git a/docs/images/PULC/docs/traffic_sign_data_demo.png b/docs/images/PULC/docs/traffic_sign_data_demo.png
new file mode 100644
index 0000000000000000000000000000000000000000..6fac97a299b6fbf037a931f7ba56607f791271f3
Binary files /dev/null and b/docs/images/PULC/docs/traffic_sign_data_demo.png differ
diff --git a/docs/images/PULC/docs/vehicle_attribute_data_demo.png 
b/docs/images/PULC/docs/vehicle_attribute_data_demo.png new file mode 100644 index 0000000000000000000000000000000000000000..68c67acb331de19b688b9b9111fb8c20ff42fc2a Binary files /dev/null and b/docs/images/PULC/docs/vehicle_attribute_data_demo.png differ diff --git a/docs/images/algorithm_introduction/hnsw.png b/docs/images/algorithm_introduction/hnsw.png new file mode 100644 index 0000000000000000000000000000000000000000..eeacd32bd31e690bca2363932ca7ab9d78750313 Binary files /dev/null and b/docs/images/algorithm_introduction/hnsw.png differ diff --git a/docs/images/class_simple.gif b/docs/images/class_simple.gif new file mode 100644 index 0000000000000000000000000000000000000000..c30122dfa239e14901738f0c6583be6a259d339f Binary files /dev/null and b/docs/images/class_simple.gif differ diff --git a/docs/images/class_simple_en.gif b/docs/images/class_simple_en.gif new file mode 100644 index 0000000000000000000000000000000000000000..14c3a678f6b0ba81b7761c397ddc97826817409a Binary files /dev/null and b/docs/images/class_simple_en.gif differ diff --git a/docs/images/classification.gif b/docs/images/classification.gif new file mode 100644 index 0000000000000000000000000000000000000000..db2ff2a56be31793402a350f68e59eb924d7c1bf Binary files /dev/null and b/docs/images/classification.gif differ diff --git a/docs/images/classification_en.gif b/docs/images/classification_en.gif new file mode 100644 index 0000000000000000000000000000000000000000..884d5ba1453a3c717a9060e3a9831ea6e5160e7d Binary files /dev/null and b/docs/images/classification_en.gif differ diff --git a/docs/zh_CN/PULC/PULC_car_exists.md b/docs/zh_CN/PULC/PULC_car_exists.md new file mode 100644 index 0000000000000000000000000000000000000000..4107363534f9c76508d660ffb7d69dc705076a1a --- /dev/null +++ b/docs/zh_CN/PULC/PULC_car_exists.md @@ -0,0 +1,470 @@ +# PULC 有车/无车分类模型 + +------ + + +## 目录 + +- [1. 模型和应用场景介绍](#1) +- [2. 模型快速体验](#2) + - [2.1 安装 paddlepaddle](#2.1) + - [2.2 安装 paddleclas](#2.2) + - [2.3 预测](#2.3) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.2.1 数据集来源](#3.2.1) + - [3.2.2 数据集获取](#3.2.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型压缩](#4) + - [4.1 SKL-UGI 知识蒸馏](#4.1) + - [4.1.1 教师模型训练](#4.1.1) + - [4.1.2 蒸馏训练](#4.1.2) +- [5. 超参搜索](#5) +- [6. 模型推理部署](#6) + - [6.1 推理模型准备](#6.1) + - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1) + - [6.1.2 直接下载 inference 模型](#6.1.2) + - [6.2 基于 Python 预测引擎推理](#6.2) + - [6.2.1 预测单张图像](#6.2.1) + - [6.2.2 基于文件夹的批量预测](#6.2.2) + - [6.3 基于 C++ 预测引擎推理](#6.3) + - [6.4 服务化部署](#6.4) + - [6.5 端侧部署](#6.5) + - [6.6 Paddle2ONNX 模型转换与预测](#6.6) + + + + +## 1. 
模型和应用场景介绍 + +该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的有车/无车的分类模型。该模型可以广泛应用于如监控场景、海量数据过滤场景等。 + +下表列出了判断图片中是否有车的二分类模型的相关指标,前两行展现了使用 SwinTranformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第三行至第六行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。 + + +| 模型 | Tpr(%)@Fpr0.01 | 延时(ms) | 存储(M) | 策略 | +|-------|----------------|----------|---------------|---------------| +| SwinTranformer_tiny | 97.71 | 95.30 | 111 | 使用 ImageNet 预训练模型 | +| MobileNetV3_small_x0_35 | 81.23 | 2.85 | 2.7 | 使用 ImageNet 预训练模型 | +| PPLCNet_x1_0 | 94.72 | 2.12 | 7.1 | 使用 ImageNet 预训练模型 | +| PPLCNet_x1_0 | 95.48 | 2.12 | 7.1 | 使用 SSLD 预训练模型 | +| PPLCNet_x1_0 | 95.48 | 2.12 | 7.1 | 使用 SSLD 预训练模型+EDA 策略| +| PPLCNet_x1_0 | 95.92 | 2.12 | 7.1 | 使用 SSLD 预训练模型+EDA 策略+SKL-UGI 知识蒸馏策略| + +从表中可以看出,backbone 为 SwinTranformer_tiny 时精度较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,但是会导致精度大幅下降。将 backbone 替换为速度更快的 PPLCNet_x1_0 时,精度较 MobileNetV3_small_x0_35 高 13 个百分点,与此同时速度依旧可以快 20% 以上。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升约 0.7 个百分点,进一步地,在使用 SKL-UGI 知识蒸馏后,精度可以继续提升 0.44 个百分点。此时,PPLCNet_x1_0 达到了接近 SwinTranformer_tiny 模型的精度,但是速度快 40 多倍。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。 + +**备注:** + +* `Tpr`指标的介绍可以参考 [3.3节](#3.3)的备注部分,延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为10。 +* 关于PP-LCNet的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。 + + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddlepaddle + +- 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple +``` + +- 您的机器是CPU,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple +``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 2.2 安装 paddleclas + +使用如下命令快速安装 paddleclas + +``` +pip3 install paddleclas +``` + + + +### 2.3 预测 + +点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。 + +* 使用命令行快速预测 + +```bash +paddleclas --model_name=car_exists --infer_imgs=pulc_demo_imgs/car_exists/objects365_00001507.jpeg +``` + +结果如下: +``` +>>> result +class_ids: [1], scores: [0.9871138], label_names: ['contains_car'], filename: pulc_demo_imgs/car_exists/objects365_00001507.jpeg +Predict complete! +``` + +**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。 + + +* 在 Python 代码中预测 +```python +import paddleclas +model = paddleclas.PaddleClas(model_name="car_exists") +result = model.predict(input_data="pulc_demo_imgs/car_exists/objects365_00001507.jpeg") +print(next(result)) +``` + +**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="car_exists", batch_size=2)`, 使用默认的代码返回结果示例如下: + +``` +>>> result +[{'class_ids': [1], 'scores': [0.9871138], 'label_names': ['contains_car'], 'filename': 'pulc_demo_imgs/car_exists/objects365_00001507.jpeg'}] +``` + + + + +## 3. 
模型训练、评估和预测
+
+
+### 3.1 环境配置
+
+* 安装:请先参考文档[环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。
+
+
+### 3.2 数据准备
+
+
+#### 3.2.1 数据集来源
+
+本案例中所使用的所有数据集均为开源数据,`train` 和 `val` 集合均为 [Objects365 数据](https://www.objects365.org/overview.html)的子集,`ImageNet_val` 为 [ImageNet-1k 数据](https://www.image-net.org/)的验证集。
+
+
+#### 3.2.2 数据集获取
+
+在公开数据集的基础上经过后处理即可得到本案例需要的数据,具体处理方法如下:
+
+- 训练集合,本案例处理了 Objects365 数据训练集的标注文件,如果某张图含有“car”的标签,且这个框的面积在整张图中的比例大于 10%,即认为该张图中含有车;如果某张图中没有任何与交通工具(例如 car、bus 等)相关的标签,则认为该张图中不含有车。经过处理后,得到 108629 条可用数据,其中有车的数据有 27422 条,无车的数据有 81207 条。
+
+- 验证集合,处理方法与训练集相同,数据来源于 Objects365 数据集的验证集。为了保证测试结果准确,验证集经过人工校正,去除了一些可能存在标注错误的图像。
+
+* 注:由于 Objects365 的标签并不是完全互斥的,例如 F1 赛车可能被标注为 "F1 Formula",也可能被标注为 "car"。为了减轻干扰,我们仅保留 "car" 标签的图像作为有车,而将不含任何交通工具标签的图像作为无车。
+
+处理后的数据集部分数据可视化如下:
+
+![](../../images/PULC/docs/car_exists_data_demo.jpeg)
+
+此处提供了经过上述方法处理好的数据,可以直接下载得到。
+
+进入 PaddleClas 目录。
+
+```
+cd path_to_PaddleClas
+```
+
+进入 `dataset/` 目录,下载并解压有车/无车场景的数据。
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/car_exists.tar
+tar -xf car_exists.tar
+cd ../
+```
+
+执行上述命令后,`dataset/` 下存在 `car_exists` 目录,该目录中具有以下数据:
+
+```
+├── objects365_car
+│   ├── objects365_00000039.jpg
+│   ├── objects365_00000099.jpg
+├── ImageNet_val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+...
+├── train_list.txt
+├── train_list.txt.debug
+├── train_list_for_distill.txt
+├── val_list.txt
+└── val_list.txt.debug
+```
+
+其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件,`train_list.txt.debug` 和 `val_list.txt.debug` 分别为训练集和验证集的 `debug` 标签文件,其分别是 `train_list.txt` 和 `val_list.txt` 的子集,用该文件可以快速体验本案例的流程。`ImageNet_val/` 是 ImageNet-1k 的验证集,该集合和 `train` 集合的混合数据用于本案例的 `SKL-UGI知识蒸馏策略`,对应的训练标签文件为 `train_list_for_distill.txt`。
+
+**备注:**
+
+* 关于 `train_list.txt`、`val_list.txt` 的格式说明,可以参考 [PaddleClas 分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明)。
+
+* 关于如何得到蒸馏的标签文件可以参考[知识蒸馏标签获得方法](../advanced_tutorials/ssld.md#3.2)。
+
+
+### 3.3 模型训练
+
+在 `ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml
+```
+
+验证集的最佳指标在 `0.95-0.96` 之间(数据集较小,容易造成波动)。
+
+**备注:**
+
+* 此时使用的指标为 Tpr,该指标描述了在假正类率(Fpr)小于某一阈值时的真正类率(Tpr),是产业中二分类问题常用的指标之一。在本案例中,Fpr 为 1/100。关于 Fpr 和 Tpr 的更多介绍,可以参考[这里](https://baike.baidu.com/item/AUC/19282953)。
+
+* 在 eval 时,会打印出当前最佳的 TprAtFpr 指标,具体地,其会打印当前的 `Fpr`、`Tpr` 值,以及当前的 `threshold` 值。`Tpr` 值反映了在当前 `Fpr` 值下的召回率,该值越高,代表模型越好;`threshold` 表示当前最佳 `Fpr` 所对应的分类阈值,可用于后续模型部署落地等。
+
+
+### 3.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+
+### 3.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+输出结果如下:
+
+```
+[{'class_ids': [1], 'scores': [0.9871138], 'label_names': ['contains_car'], 'filename': 'deploy/images/PULC/car_exists/objects365_00001507.jpeg'}]
+```
+
+**备注:**
+
+* 这里 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `deploy/images/PULC/car_exists/objects365_00001507.jpeg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+* 二分类默认的阈值为 0.5,如果需要指定阈值,可以重写 `Infer.PostProcess.threshold`,如 `-o Infer.PostProcess.threshold=0.9794`,该值需要根据实际场景来确定,此处的 `0.9794` 是在该场景中的 `val` 数据集在百分之一 Fpr 下得到的最佳 Tpr 所得到的。
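+
+为便于理解 TprAtFpr 指标与上述阈值的关系,下面给出一个示意性脚本(基于 sklearn 实现,仅用于说明指标的计算方式,并非 PaddleClas 的内部实现,函数与变量名均为示意):在验证集的真实标签和预测得分上,找出满足 Fpr 不超过给定上限的工作点中 Tpr 最大的一个,并返回其对应的分类阈值。
+
+```python
+import numpy as np
+from sklearn.metrics import roc_curve
+
+def tpr_at_fpr(labels, scores, fpr_limit=0.01):
+    """labels 为 0/1 真实标签,scores 为正类(有车)的预测概率。"""
+    fpr, tpr, thresholds = roc_curve(labels, scores)
+    valid = fpr <= fpr_limit                       # 满足 Fpr 约束的所有工作点
+    best = np.argmax(np.where(valid, tpr, -1.0))   # 其中 Tpr 最大的点
+    return tpr[best], thresholds[best]
+
+# 例如:tpr, threshold = tpr_at_fpr(val_labels, val_scores, fpr_limit=0.01)
+```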
+
+
+## 4. 模型压缩
+
+
+### 4.1 SKL-UGI 知识蒸馏
+
+SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考 [SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。
+
+
+#### 4.1.1 教师模型训练
+
+复用 `ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+验证集的最佳指标在 `0.96-0.98` 之间,当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。
+
+
+#### 4.1.2 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml` 提供了 `SKL-UGI知识蒸馏策略` 的配置。该配置将 `ResNet101_vd` 当作教师模型,`PPLCNet_x1_0` 当作学生模型,使用 ImageNet 数据集的验证集作为新增的无标签数据。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+验证集的最佳指标在 `0.95-0.97` 之间,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。
+
+
+## 6. 模型推理部署
+
+
+### 6.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考 [Paddle Inference 官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。
+
+
+#### 6.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_car_exists_infer
+```
+
+执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_car_exists_infer` 文件夹,`models` 文件夹下应有如下文件结构:
+
+```
+├── PPLCNet_x1_0_car_exists_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在 `output/PPLCNet_x1_0/best_model.pdparams` 中。
+
+
+#### 6.1.2 直接下载 inference 模型
+
+[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddleclas.bj.bcebos.com/models/PULC/car_exists_infer.tar && tar -xf car_exists_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── car_exists_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+### 6.2 基于 Python 预测引擎推理
+
+
+#### 6.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/PULC/car_exists/objects365_00001507.jpeg` 进行有车/无车分类。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/car_exists/inference_car_exists.yaml
+# 使用下面的命令使用 
CPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/car_exists/inference_car_exists.yaml -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +objects365_00001507.jpeg: class id(s): [1], score(s): [0.99], label_name(s): ['contains_car'] +``` + + +**备注:** 二分类默认的阈值为0.5, 如果需要指定阈值,可以重写 `Infer.PostProcess.threshold` ,如`-o Infer.PostProcess.threshold=0.9794`,该值需要根据实际场景来确定,此处的 `0.9794` 是在该场景中的 `val` 数据集在百分之一 Fpr 下得到的最佳 Tpr 所得到的。该阈值的确定方法可以参考[3.3节](#3.3)备注部分。 + + + +#### 6.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_cls.py -c configs/PULC/car_exists/inference_car_exists.yaml -o Global.infer_imgs="./images/PULC/car_exists/" +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +objects365_00001507.jpeg: class id(s): [1], score(s): [0.99], label_name(s): ['contains_car'] +objects365_00001521.jpeg: class id(s): [0], score(s): [0.99], label_name(s): ['no_car'] +``` + +其中,`contains_car` 表示该图里存在车,`no_car` 表示该图里不存在车。 + + + +### 6.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_language_classification.md b/docs/zh_CN/PULC/PULC_language_classification.md new file mode 100644 index 0000000000000000000000000000000000000000..309f3e9cc8a0c3c519722baeb13e5b90a8312e51 --- /dev/null +++ b/docs/zh_CN/PULC/PULC_language_classification.md @@ -0,0 +1,453 @@ +# PULC 语种分类模型 + +## 目录 + +- [1. 模型和应用场景介绍](#1) +- [2. 模型快速体验](#2) + - [2.1 安装 paddlepaddle](#2.1) + - [2.2 安装 paddleclas](#2.2) + - [2.3 预测](#2.3) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.2.1 数据集来源](#3.2.1) + - [3.2.2 数据集获取](#3.2.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型压缩](#4) + - [4.1 SKL-UGI 知识蒸馏](#4.1) + - [4.1.1 教师模型训练](#4.1.1) + - [4.1.2 蒸馏训练](#4.1.2) +- [5. 超参搜索](#5) +- [6. 模型推理部署](#6) + - [6.1 推理模型准备](#6.1) + - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1) + - [6.1.2 直接下载 inference 模型](#6.1.2) + - [6.2 基于 Python 预测引擎推理](#6.2) + - [6.2.1 预测单张图片](#6.2.1) + - [6.2.2 基于文件夹的批量预测](#6.2.2) + - [6.3 基于 C++ 预测引擎推理](#6.3) + - [6.4 服务化部署](#6.4) + - [6.5 端侧部署](#6.5) + - [6.6 Paddle2ONNX 模型转换与预测](#6.6) + + + +## 1. 
模型和应用场景介绍
+
+该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的语种分类模型。使用该方法训练得到的模型可以快速判断图片中的文字语种,该模型可以广泛应用于金融、政务等各种涉及多语种 OCR 处理的场景中。
+
+下表列出了语种分类模型的相关指标,前两行展现了使用 SwinTranformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第三行至第六行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。其中替换 backbone 为 PPLCNet_x1_0 时,将数据预处理时的输入尺寸变为 [192, 48],且网络的下采样 stride 调整为 [2, [2, 1], [2, 1], [2, 1], [2, 1]]。
+
+| 模型 | 精度(%) | 延时(ms) | 存储(M) | 策略 |
+| ----------------------- | --------- | -------- | ------- | ---------------------------------------------- |
+| SwinTranformer_tiny | 98.12 | 89.09 | 111 | 使用ImageNet预训练模型 |
+| MobileNetV3_small_x0_35 | 95.92 | 2.98 | 3.7 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 98.35 | 2.58 | 7.1 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 98.7 | 2.58 | 7.1 | 使用SSLD预训练模型 |
+| PPLCNet_x1_0 | 99.12 | 2.58 | 7.1 | 使用SSLD预训练模型+EDA策略 |
+| **PPLCNet_x1_0** | **99.26** | **2.58** | **7.1** | 使用SSLD预训练模型+EDA策略+SKL-UGI知识蒸馏策略 |
+
+从表中可以看出,backbone 为 SwinTranformer_tiny 时精度比较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度提升明显,但精度有了大幅下降。将 backbone 替换为 PPLCNet_x1_0 且调整预处理输入尺寸和网络的下采样 stride 时,速度略有提升,同时精度较 MobileNetV3_small_x0_35 高 2.43 个百分点。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升 0.35 个百分点,进一步地,当融合 EDA 策略后,精度可以再提升 0.42 个百分点,最后,在使用 SKL-UGI 知识蒸馏后,精度可以继续提升 0.14 个百分点。此时,PPLCNet_x1_0 超过了 SwinTranformer_tiny 模型的精度,并且速度有了明显提升。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。
+
+**备注:**
+
+* 关于 PP-LCNet 的介绍可以参考 [PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅 [PP-LCNet paper](https://arxiv.org/abs/2109.15099)。
+
+
+## 2. 模型快速体验
+
+
+### 2.1 安装 paddlepaddle
+
+- 如果您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- 如果您的机器是 CPU,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+### 2.2 安装 paddleclas
+
+使用如下命令快速安装 paddleclas
+
+```
+pip3 install paddleclas
+```
+
+
+### 2.3 预测
+
+点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。
+
+* 使用命令行快速预测
+
+```bash
+paddleclas --model_name=language_classification --infer_imgs=pulc_demo_imgs/language_classification/word_35404.png
+```
+
+结果如下:
+```
+>>> result
+class_ids: [4, 6], scores: [0.88672, 0.01434], label_names: ['japan', 'korean'], filename: pulc_demo_imgs/language_classification/word_35404.png
+Predict complete!
+```
+
+**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。
+
+* 在 Python 代码中预测
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="language_classification")
+result = model.predict(input_data="pulc_demo_imgs/language_classification/word_35404.png")
+print(next(result))
+```
+
+**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果,默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="language_classification", batch_size=2)`,使用默认的代码返回结果示例如下:
+
+```
+>>> result
+[{'class_ids': [4, 6], 'scores': [0.88672, 0.01434], 'label_names': ['japan', 'korean'], 'filename': 'pulc_demo_imgs/language_classification/word_35404.png'}]
+```
+
+
+## 3. 
模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + + + +#### 3.2.1 数据集来源 + +[第1节](#1)中提供的模型使用内部数据训练得到,该数据集暂时不方便公开。这里基于 [Multi-lingual scene text detection and recognition](https://rrc.cvc.uab.es/?ch=15&com=downloads) 开源数据集构造了一个多语种demo数据集,用于体验本案例的预测过程。 + +![](../../images/PULC/docs/language_classification_original_data.png) + + + +#### 3.2.2 数据集获取 + +[第1节](#1)中提供的模型共支持10个类别,分别为: + +`0` 表示阿拉伯语(arabic);`1` 表示中文繁体(chinese_cht);`2` 表示斯拉夫语(cyrillic);`3` 表示梵文(devanagari);`4` 表示日语(japan);`5` 表示卡纳达文(ka);`6` 表示韩语(korean);`7` 表示泰米尔文(ta);`8` 表示泰卢固文(te);`9` 表示拉丁语(latin)。 + +在 Multi-lingual scene text detection and recognition 数据集中,仅包含了阿拉伯语、日语、韩语和拉丁语数据,这里分别将 4 个语种的数据各抽取 1600 张作为本案例的训练数据,300 张作为测试数据,以及 400 张作为补充数据和训练数据混合用于本案例的`SKL-UGI知识蒸馏策略`实验。 + +因此,对于本案例中的demo数据集,类别为: + +`0` 表示阿拉伯语(arabic);`1` 表示日语(japan);`2` 表示韩语(korean);`3` 表示拉丁语(latin)。 + +如果想要制作自己的多语种数据集,可以按照需求收集并整理自己任务中需要语种的数据,此处提供了经过上述方法处理好的demo数据,可以直接下载得到。 + +**备注:** 语种分类任务中的图片数据需要将整图中的文字区域抠取出来,仅仅使用文本行部分作为图片数据。 + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,下载并解压多语种场景的demo数据。 + +```shell +cd dataset +wget https://paddleclas.bj.bcebos.com/data/PULC/language_classification.tar +tar -xf language_classification.tar +cd ../ +``` + +执行上述命令后,`dataset/`下存在`language_classification`目录,该目录中具有以下数据: + +``` +├── img +│ ├── word_1.png +│ ├── word_2.png +... +├── train_list.txt +├── train_list_for_distill.txt +├── test_list.txt +└── label_list.txt +``` + +其中`img/`存放了 4 种语言总计 9200 张数据。`train_list.txt`和`test_list.txt`分别为训练集和验证集的标签文件,`label_list.txt`是 4 类语言分类模型对应的类别列表,`SKL-UGI 知识蒸馏策略`对应的训练标签文件为`train_list_for_distill.txt`。用这些图片可以快速体验本案例中模型的训练预测过程。 + +***备注:*** + +- 这里的`label_list.txt`是4类语种分类模型对应的类别列表,如果自己构造的数据集语种类别发生变化,需要自行调整。 +- 如果想要自己构造训练集和验证集,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 + + + +### 3.3 模型训练 + +在`ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml`中提供了基于该场景的训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \ + -o Arch.class_num=4 +``` + +- 由于本文档中的demo数据集的类别数量为 4,所以需要添加`-o Arch.class_num=4`来将模型的类别数量指定为4。 + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" \ + -o Arch.class_num=4 +``` + +其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```bash +python3 tools/infer.py \ + -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" \ + -o Arch.class_num=4 +``` + +输出结果如下: + +``` +[{'class_ids': [4, 9], 'scores': [0.96809, 0.01001], 'file_name': 'deploy/images/PULC/language_classification/word_35404.png', 'label_names': ['japan', 'latin']}] +``` + +***备注:*** + +- 其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 +- 默认是对 `deploy/images/PULC/language_classification/word_35404.png` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 +- 预测输出为top2的预测结果,`japan` 表示该图中文字语种识别为日语,`latin` 表示该图中文字语种识别为拉丁语。 + + + +## 4. 
模型压缩
+
+
+### 4.1 SKL-UGI 知识蒸馏
+
+SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考 [SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。
+
+
+#### 4.1.1 教师模型训练
+
+复用 `ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd \
+        -o Arch.class_num=4
+```
+
+当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。
+
+**备注:** 训练 ResNet101_vd 模型需要的显存较多,如果机器显存不够,可以将学习率和 batch size 同时缩小一定的倍数进行训练。
+
+
+#### 4.1.2 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml` 提供了 `SKL-UGI知识蒸馏策略` 的配置。该配置将 `ResNet101_vd` 当作教师模型,`PPLCNet_x1_0` 当作学生模型,使用 [3.2.2 节](#3.2.2)中介绍的蒸馏数据作为新增的无标签数据。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model \
+        -o Arch.class_num=4
+```
+
+当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。
+
+
+## 6. 模型推理部署
+
+
+### 6.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考 [Paddle Inference 官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。
+
+
+#### 6.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_language_classification_infer
+```
+
+执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_language_classification_infer` 文件夹,`models` 文件夹下应有如下文件结构:
+
+```
+├── PPLCNet_x1_0_language_classification_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在 `output/PPLCNet_x1_0/best_model.pdparams` 中。
+
+
+#### 6.1.2 直接下载 inference 模型
+
+[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddleclas.bj.bcebos.com/models/PULC/language_classification_infer.tar && tar -xf language_classification_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── language_classification_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+### 6.2 基于 Python 预测引擎推理
+
+
+#### 6.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/PULC/language_classification/word_35404.png` 进行语种分类。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml
+# 使用下面的命令使用 CPU 进行预测
+python3.7 python/predict_cls.py -c 
configs/PULC/language_classification/inference_language_classification.yaml -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +word_35404.png: class id(s): [4, 6], score(s): [0.89, 0.01], label_name(s): ['japan', 'korean'] +``` + +其中,输出为top2的预测结果,`japan` 表示该图中文字语种为日语,`korean` 表示该图中文字语种为韩语。 + + + +#### 6.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml -o Global.infer_imgs="./images/PULC/language_classification/" +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +word_17.png: class id(s): [9, 4], score(s): [0.80, 0.09], label_name(s): ['latin', 'japan'] +word_20.png: class id(s): [0, 4], score(s): [0.91, 0.02], label_name(s): ['arabic', 'japan'] +word_35404.png: class id(s): [4, 6], score(s): [0.89, 0.01], label_name(s): ['japan', 'korean'] +``` + +其中,输出为top2的预测结果,`japan` 表示该图中文字语种为日语,`latin` 表示该图中文字语种为拉丁语,`arabic` 表示该图中文字语种为阿拉伯语,`korean` 表示该图中文字语种为韩语。 + + + +### 6.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_model_list.md b/docs/zh_CN/PULC/PULC_model_list.md new file mode 100644 index 0000000000000000000000000000000000000000..4b2d7a8774d7d64a634a1bebc96481fc2ad076eb --- /dev/null +++ b/docs/zh_CN/PULC/PULC_model_list.md @@ -0,0 +1,25 @@ +# PULC 模型库 + +------ + +此处提供了 PULC 模型库的相关指标和模型的下载链接,其中预训练模型可以用来微调训练,推理模型可以直接用来预测和部署。 + + +|模型名称|模型简介|模型精度 |模型大小|推理耗时|下载地址| +| --- | --- | --- | --- | --- | --- | +| person_exists |[PULC有人/无人分类模型](PULC_person_exists.md)| 96.23 |7.0M|2.58ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_exists_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_exists_pretrained.pdparams)| +| person_attribute |[PULC人体属性识别模型](PULC_person_attribute.md)| 78.59 |7.2M|2.01ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_attribute_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_attribute_pretrained.pdparams)| +| safety_helmet |[PULC佩戴安全帽分类模型](PULC_safety_helmet.md)| 99.38 |7.1M|2.03ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/safety_helmet_infer.tar) / 
[预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/safety_helmet_pretrained.pdparams)|
+| traffic_sign |[PULC交通标志分类模型](PULC_traffic_sign.md)| 98.35 |8.2M|2.10ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/traffic_sign_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/traffic_sign_pretrained.pdparams)|
+| vehicle_attribute |[PULC车辆属性识别模型](PULC_vehicle_attribute.md)| 90.81 |7.2M|2.36ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/vehicle_attribute_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/vehicle_attribute_pretrained.pdparams)|
+| car_exists |[PULC有车/无车分类模型](PULC_car_exists.md) | 95.92 | 7.1M | 2.38ms |[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/car_exists_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/car_exists_pretrained.pdparams)|
+| text_image_orientation |[PULC含文字图像方向分类模型](PULC_text_image_orientation.md)| 99.06 | 7.1M | 2.16ms |[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/text_image_orientation_pretrained.pdparams)|
+| textline_orientation |[PULC文本行方向分类模型](PULC_textline_orientation.md)| 96.01 |7.0M|2.72ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/textline_orientation_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/textline_orientation_pretrained.pdparams)|
+| language_classification |[PULC语种分类模型](PULC_language_classification.md)| 99.26 |7.1M|2.58ms|[推理模型](https://paddleclas.bj.bcebos.com/models/PULC/inference/language_classification_infer.tar) / [预训练模型](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/language_classification_pretrained.pdparams)|
+
+
+**备注:**
+
+* 以上所有模型的 backbone 均为 PPLCNet_x1_0,部分模型大小不同是由于分类的输出大小不同导致的。推理耗时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,其中测试过程开启 MKLDNN 加速策略,线程数为 10。速度测试过程会有轻微波动。
+
+* person_exists、safety_helmet、car_exists 的评测指标为 TprAtFpr;person_attribute、vehicle_attribute 的评测指标为 ma;traffic_sign、text_image_orientation、textline_orientation、language_classification 的评测指标为 Top-1 Acc。
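+
+上表中的预训练模型可按如下方式用于微调训练。下面是一个示意性命令(以 person_exists 为例;权重下载链接来自上表,配置文件路径沿用各场景文档中的训练配置,请以实际仓库为准):
+
+```shell
+# 下载预训练权重(示意)
+wget https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_exists_pretrained.pdparams
+# 加载预训练权重,在自有数据上继续训练;-o Global.pretrained_model 不带 .pdparams 后缀
+python3 tools/train.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=./person_exists_pretrained
+```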
diff --git a/docs/zh_CN/PULC/PULC_person_attribute.md b/docs/zh_CN/PULC/PULC_person_attribute.md
new file mode 100644
index 0000000000000000000000000000000000000000..a144aed80b1e3b3ccca6a530c3f8392a057e3190
--- /dev/null
+++ b/docs/zh_CN/PULC/PULC_person_attribute.md
@@ -0,0 +1,453 @@
+# PULC 人体属性识别模型
+
+------
+
+
+## 目录
+
+- [1. 模型和应用场景介绍](#1)
+- [2. 模型快速体验](#2)
+    - [2.1 安装 paddlepaddle](#2.1)
+    - [2.2 安装 paddleclas](#2.2)
+    - [2.3 预测](#2.3)
+- [3. 模型训练、评估和预测](#3)
+    - [3.1 环境配置](#3.1)
+    - [3.2 数据准备](#3.2)
+        - [3.2.1 数据集来源](#3.2.1)
+        - [3.2.2 数据集获取](#3.2.2)
+    - [3.3 模型训练](#3.3)
+    - [3.4 模型评估](#3.4)
+    - [3.5 模型预测](#3.5)
+- [4. 模型压缩](#4)
+    - [4.1 SKL-UGI 知识蒸馏](#4.1)
+        - [4.1.1 教师模型训练](#4.1.1)
+        - [4.1.2 蒸馏训练](#4.1.2)
+- [5. 超参搜索](#5)
+- [6. 模型推理部署](#6)
+    - [6.1 推理模型准备](#6.1)
+        - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1)
+        - [6.1.2 直接下载 inference 模型](#6.1.2)
+    - [6.2 基于 Python 预测引擎推理](#6.2)
+        - [6.2.1 预测单张图像](#6.2.1)
+        - [6.2.2 基于文件夹的批量预测](#6.2.2)
+    - [6.3 基于 C++ 预测引擎推理](#6.3)
+    - [6.4 服务化部署](#6.4)
+    - [6.5 端侧部署](#6.5)
+    - [6.6 Paddle2ONNX 模型转换与预测](#6.6)
+
+
+## 1. 模型和应用场景介绍
+
+该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的人体属性识别模型。该模型可以广泛应用于行人分析、行人跟踪等场景。
+
+下表列出了不同人体属性识别模型的相关指标,前三行展现了使用 Res2Net200_vd_26w_4s、SwinTransformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第四行至第七行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。
+
+| 模型 | mA(%) | 延时(ms) | 存储(M) | 策略 |
+|-------|-----------|----------|---------------|---------------|
+| Res2Net200_vd_26w_4s | 81.25 | 77.51 | 293 | 使用ImageNet预训练模型 |
+| SwinTransformer_tiny | 80.17 | 89.51 | 111 | 使用ImageNet预训练模型 |
+| MobileNetV3_small_x0_35 | 70.79 | 2.90 | 1.7 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 76.31 | 2.01 | 7.1 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 77.31 | 2.01 | 7.1 | 使用SSLD预训练模型 |
+| PPLCNet_x1_0 | 77.71 | 2.01 | 7.1 | 使用SSLD预训练模型+EDA策略|
+| PPLCNet_x1_0 | 78.59 | 2.01 | 7.1 | 使用SSLD预训练模型+EDA策略+SKL-UGI知识蒸馏策略|
+
+从表中可以看出,backbone 为 Res2Net200_vd_26w_4s 和 SwinTransformer_tiny 时精度较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,但是精度也大幅下降。将 backbone 替换为 PPLCNet_x1_0 时,精度较 MobileNetV3_small_x0_35 高 5.5%,与此同时,速度更快。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升 1%,进一步地,当融合 EDA 策略后,精度可以再提升 0.4%,最后,在使用 SKL-UGI 知识蒸馏后,精度可以继续提升 0.88%。此时,PPLCNet_x1_0 的精度与 SwinTransformer_tiny 仅相差 1.58%,但是速度快 44 倍。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。
+
+**备注:**
+
+* 延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为 10。
+* 关于 PP-LCNet 的介绍可以参考 [PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅 [PP-LCNet paper](https://arxiv.org/abs/2109.15099)。
+
+
+## 2. 模型快速体验
+
+
+### 2.1 安装 paddlepaddle
+
+- 如果您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- 如果您的机器是 CPU,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+### 2.2 安装 paddleclas
+
+使用如下命令快速安装 paddleclas
+
+```
+pip3 install paddleclas
+```
+
+
+### 2.3 预测
+
+点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。
+
+* 使用命令行快速预测
+
+```bash
+paddleclas --model_name=person_attribute --infer_imgs=pulc_demo_imgs/person_attribute/090004.jpg
+```
+
+结果如下:
+```
+>>> result
+attributes: ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], output: [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], filename: pulc_demo_imgs/person_attribute/090004.jpg
+Predict complete!
+``` + +**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。 + + +* 在 Python 代码中预测 +```python +import paddleclas +model = paddleclas.PaddleClas(model_name="person_attribute") +result = model.predict(input_data="pulc_demo_imgs/person_attribute/090004.jpg") +print(next(result)) +``` + +**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="person_attribute", batch_size=2)`, 使用默认的代码返回结果示例如下: + +``` +>>> result +[{'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], 'filename': 'pulc_demo_imgs/person_attribute/090004.jpg'}] +``` + + + +## 3. 模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + + + +#### 3.2.1 数据集来源 + +本案例中所使用的数据为[pa100k 数据集](https://www.v7labs.com/open-datasets/pa-100k)。 + + + +#### 3.2.2 数据集获取 + +部分数据可视化如下所示。 + +
+![](../../images/PULC/docs/person_attribute_data_demo.png)
+
+我们将原始数据转换成了 PaddleClas 多标签可读的数据格式,可以直接下载。
+
+进入 PaddleClas 目录。
+
+```
+cd path_to_PaddleClas
+```
+
+进入 `dataset/` 目录,下载并解压人体属性识别场景的数据。
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/pa100k.tar
+tar -xf pa100k.tar
+cd ../
+```
+
+执行上述命令后,`dataset/` 下存在 `pa100k` 目录,该目录中具有以下数据:
+
+```
+pa100k
+├── train
+│   ├── 000001.jpg
+│   ├── 000002.jpg
+...
+├── val
+│   ├── 080001.jpg
+│   ├── 080002.jpg
+...
+├── test
+│   ├── 090001.jpg
+│   ├── 090002.jpg
+...
+...
+├── train_list.txt
+├── train_val_list.txt
+├── val_list.txt
+├── test_list.txt
+```
+
+其中 `train/`、`val/`、`test/` 分别为训练集、验证集和测试集。`train_list.txt`、`val_list.txt`、`test_list.txt` 分别为训练集、验证集、测试集的标签文件。在本例子中,`test_list.txt` 暂时没有使用。
+
+
+### 3.3 模型训练
+
+在 `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
+```
+
+验证集的最佳指标在 `77.71%` 左右(数据集较小,一般有 0.3% 左右的波动)。
+
+
+### 3.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+
+### 3.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+输出结果如下:
+
+```
+[{'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}]
+```
+
+**备注:**
+
+* 这里 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `deploy/images/PULC/person_attribute/090004.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+
+## 4. 模型压缩
+
+
+### 4.1 SKL-UGI 知识蒸馏
+
+SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考 [SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。
+
+
+#### 4.1.1 教师模型训练
+
+复用 `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+验证集的最佳指标在 `80.10%` 左右,当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。
+
+
+#### 4.1.2 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml` 提供了 `SKL-UGI知识蒸馏策略` 的配置。该配置将 `ResNet101_vd` 当作教师模型,`PPLCNet_x1_0` 当作学生模型。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+验证集的最佳指标在 `78.5%` 左右,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。
+
+
+## 6. 模型推理部署
+
+
+### 6.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考 [Paddle Inference 官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。
+
+
+#### 6.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
+```
+
+执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_person_attribute_infer` 文件夹,`models` 文件夹下应有如下文件结构:
+
+```
+├── PPLCNet_x1_0_person_attribute_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在 `output/PPLCNet_x1_0/best_model.pdparams` 中。
+
+
+#### 6.1.2 直接下载 inference 模型
+
+[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddleclas.bj.bcebos.com/models/PULC/person_attribute_infer.tar && tar -xf person_attribute_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── person_attribute_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+### 6.2 基于 Python 预测引擎推理
+
+
+#### 6.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/PULC/person_attribute/090004.jpg` 进行人体属性识别。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.use_gpu=True
+# 使用下面的命令使用 CPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.use_gpu=False
+```
+
+输出结果如下。
+
+```
+090004.jpg: {'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}
+```
+
+
+#### 6.2.2 基于文件夹的批量预测
+
+如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.infer_imgs="./images/PULC/person_attribute/"
+```
+
+终端中会输出该文件夹内所有图像的属性识别结果,如下所示。
+
+```
+090004.jpg: {'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}
+090007.jpg: {'attributes': ['Female', 'Age18-60', 'Side', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'No bag', 'Upper: ShortSleeve', 'Lower: Skirt&Dress', 'No boots'], 'output': [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]}
+```
+
+
+### 6.3 基于 C++ 
预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_person_cls.md b/docs/zh_CN/PULC/PULC_person_cls.md deleted file mode 100644 index ff3508c35c3ff9394da9f5c82e0b4001ee8394a3..0000000000000000000000000000000000000000 --- a/docs/zh_CN/PULC/PULC_person_cls.md +++ /dev/null @@ -1,332 +0,0 @@ -# PaddleClas构建有人/无人分类案例 - -此处提供了用户使用 PaddleClas 快速构建轻量级、高精度、可落地的有人/无人的分类模型教程,主要基于有人/无人场景的数据,融合了轻量级骨干网络PPLCNet、SSLD预训练权重、EDA数据增强策略、SKL-UGI知识蒸馏策略、SHAS超参数搜索策略,得到精度高、速度快、易于部署的二分类模型。 - ------- - - -## 目录 - -- [1. 环境配置](#1) -- [2. 有人/无人场景推理预测](#2) - - [2.1 下载模型](#2.1) - - [2.2 模型推理预测](#2.2) - - [2.2.1 预测单张图像](#2.2.1) - - [2.2.2 基于文件夹的批量预测](#2.2.2) -- [3.有人/无人场景训练](#3) - - [3.1 数据准备](#3.1) - - [3.2 模型训练](#3.2) - - [3.2.1 基于默认超参数训练](#3.2.1) - - [3.2.1.1 基于默认超参数训练轻量级模型](#3.2.1.1) - - [3.2.1.2 基于默认超参数训练教师模型](#3.2.1.2) - - [3.2.1.3 基于默认超参数进行蒸馏训练](#3.2.1.3) - - [3.2.2 超参数搜索训练](#3.2) -- [4. 模型评估与推理](#4) - - [4.1 模型评估](#3.1) - - [4.2 模型预测](#3.2) - - [4.3 使用 inference 模型进行推理](#4.3) - - [4.3.1 导出 inference 模型](#4.3.1) - - [4.3.2 模型推理预测](#4.3.2) - - - - -## 1. 环境配置 - -* 安装:请先参考 [Paddle 安装教程](../installation/install_paddle.md) 以及 [PaddleClas 安装教程](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 - - - -## 2. 
有人/无人场景推理预测 - - - -### 2.1 下载模型 - -* 进入 `deploy` 运行目录。 - -``` -cd deploy -``` - -下载有人/无人分类的模型。 - -``` -mkdir models -cd models -# 下载inference 模型并解压 -wget https://paddleclas.bj.bcebos.com/models/PULC/person_cls_infer.tar && tar -xf person_cls_infer.tar -``` - -解压完毕后,`models` 文件夹下应有如下文件结构: - -``` -├── person_cls_infer -│ ├── inference.pdiparams -│ ├── inference.pdiparams.info -│ └── inference.pdmodel -``` - - - -### 2.2 模型推理预测 - - - -#### 2.2.1 预测单张图像 - -返回 `deploy` 目录: - -``` -cd ../ -``` - -运行下面的命令,对图像 `./images/PULC/person/objects365_02035329.jpg` 进行有人/无人分类。 - -```shell -# 使用下面的命令使用 GPU 进行预测 -python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o PostProcess.ThreshOutput.threshold=0.9794 -# 使用下面的命令使用 CPU 进行预测 -python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o PostProcess.ThreshOutput.threshold=0.9794 -o Global.use_gpu=False -``` - -输出结果如下。 - -``` -objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone'] -``` - - -**备注:** 真实场景中往往需要在假正类率(Fpr)小于某一个指标下求真正类率(Tpr),该场景中的`val`数据集在千分之一Fpr下得到的最佳Tpr所得到的阈值为`0.9794`,故此处的`threshold`为`0.9794`。该阈值的确定方法可以参考[3.2节](#3.2) - - - -#### 2.2.2 基于文件夹的批量预测 - -如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 - -```shell -# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False -python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o Global.infer_imgs="./images/PULC/person/" -``` - -终端中会输出该文件夹内所有图像的分类结果,如下所示。 - -``` -objects365_01780782.jpg: class id(s): [0], score(s): [1.00], label_name(s): ['nobody'] -objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone'] -``` - -其中,`someone` 表示该图里存在人,`nobody` 表示该图里不存在人。 - - - -## 3.有人/无人场景训练 - - - -### 3.1 数据准备 - -进入 PaddleClas 目录。 - -``` -cd path_to_PaddleClas -``` - -进入 `dataset/` 目录,下载并解压有人/无人场景的数据。 - -```shell -cd dataset -wget https://paddleclas.bj.bcebos.com/data/cls_demo/person.tar -tar -xf person.tar -cd ../ -``` - -执行上述命令后,`dataset/`下存在`person`目录,该目录中具有以下数据: - -``` - -├── train -│   ├── 000000000009.jpg -│   ├── 000000000025.jpg -... -├── val -│   ├── objects365_01780637.jpg -│   ├── objects365_01780640.jpg -... -├── ImageNet_val -│   ├── ILSVRC2012_val_00000001.JPEG -│   ├── ILSVRC2012_val_00000002.JPEG -... 
-├── train_list.txt -├── train_list.txt.debug -├── train_list_for_distill.txt -├── val_list.txt -└── val_list.txt.debug -``` - -其中`train/`和`val/`分别为训练集和验证集。`train_list.txt`和`val_list.txt`分别为训练集和验证集的标签文件,`train_list.txt.debug`和`val_list.txt.debug`分别为训练集和验证集的`debug`标签文件,其分别是`train_list.txt`和`val_list.txt`的子集,用该文件可以快速体验本案例的流程。`ImageNet_val/`是ImageNet的验证集,该集合和`train`集合的混合数据用于本案例的`SKL-UGI知识蒸馏策略`,对应的训练标签文件为`train_list_for_distill.txt`。 - -* **注意**: - -* 本案例中所使用的所有数据集均为开源数据,`train`集合为[MS-COCO数据](https://cocodataset.org/#overview)的训练集的子集,`val`集合为[Object365数据](https://www.objects365.org/overview.html)的训练集的子集,`ImageNet_val`为[ImageNet数据](https://www.image-net.org/)的验证集。数据集的筛选流程可以参考[有人/无人场景数据集筛选方法]()。 - - - -### 3.2 模型训练 - - - -#### 3.2.1 基于默认超参数训练 - - - -##### 3.2.1.1 基于默认超参数训练轻量级模型 - -在`ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml`中提供了基于该场景的训练配置,可以通过如下脚本启动训练: - -```shell -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -m paddle.distributed.launch \ - --gpus="0,1,2,3" \ - tools/train.py \ - -c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml -``` - -验证集的最佳指标在0.94-0.95之间(数据集较小,容易造成波动)。 - -**备注:** - -* 此时使用的指标为Tpr,该指标描述了在假正类率(Fpr)小于某一个指标时的真正类率(Tpr),是产业中二分类问题常用的指标之一。在本案例中,Fpr为千分之一。关于Fpr和Tpr的更多介绍,可以参考[这里](https://baike.baidu.com/item/AUC/19282953)。 - -* 在eval时,会打印出来当前最佳的TprAtFpr指标,具体地,其会打印当前的`Fpr`、`Tpr`值,以及当前的`threshold`值,`Tpr`值反映了在当前`Fpr`值下的召回率,该值越高,代表模型越好。`threshold` 表示当前最佳`Fpr`所对应的分类阈值,可用于后续模型部署落地等。 - - - -##### 3.2.1.2 基于默认超参数训练教师模型 - -复用`ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml`中的超参数,训练教师模型,训练脚本如下: - -```shell -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -m paddle.distributed.launch \ - --gpus="0,1,2,3" \ - tools/train.py \ - -c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \ - -o Arch.name=ResNet101_vd -``` - -验证集的最佳指标为0.96-0.98之间,当前教师模型最好的权重保存在`output/ResNet101_vd/best_model.pdparams`。 - - - -##### 3.2.1.3 基于默认超参数进行蒸馏训练 - -配置文件`ppcls/configs/PULC/PULC/Distillation/PPLCNet_x1_0_distillation.yaml`提供了`SKL-UGI知识蒸馏策略`的配置。该配置将`ResNet101_vd`当作教师模型,`PPLCNet_x1_0`当作学生模型,使用ImageNet数据集的验证集作为新增的无标签数据。训练脚本如下: - -```shell -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -m paddle.distributed.launch \ - --gpus="0,1,2,3" \ - tools/train.py \ - -c ./ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml \ - -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model -``` - -验证集的最佳指标为0.95-0.97之间,当前模型最好的权重保存在`output/DistillationModel/best_model_student.pdparams`。 - - - -#### 3.2.2 超参数搜索训练 - -[3.2 小节](#3.2) 提供了在已经搜索并得到的超参数上进行了训练,此部分内容提供了搜索的过程,此过程是为了得到更好的训练超参数。 - -* 搜索运行脚本如下: - -```shell -python tools/search_strategy.py -c ppcls/configs/StrategySearch/person.yaml -``` - -在`ppcls/configs/StrategySearch/person.yaml`中指定了具体的 GPU id 号和搜索配置, 默认搜索的训练日志和模型存放于`output/search_person`中,最终的蒸馏模型存放于`output/search_person/search_res/DistillationModel/best_model_student.pdparams`。 - -* **注意**: - -* 3.1小节提供的默认配置已经经过了搜索,所以此过程不是必要的过程,如果自己的训练数据集有变化,可以尝试此过程。 - -* 此过程基于当前数据集在 V100 4 卡上大概需要耗时 10 小时,如果缺少机器资源,希望体验搜索过程,可以将`ppcls/configs/cls_demo/person/PPLCNet/PPLCNet_x1_0_search.yaml`中的`train_list.txt`和`val_list.txt`分别替换为`train_list.txt.debug`和`val_list.txt.debug`。替换list只是为了加速跑通整个搜索过程,由于数据量较小,其搜素的结果没有参考性。另外,搜索空间可以根据当前的机器资源来调整,如果机器资源有限,可以尝试缩小搜索空间,如果机器资源较充足,可以尝试扩大搜索空间。 - -* 如果此过程搜索的得到的超参数与[3.2.1小节](#3.2.1)提供的超参数不一致,主要是由于训练数据较小造成的波动导致,可以忽略。 - - - - -## 4. 
模型评估与推理 - - - - -### 4.1 模型评估 - -训练好模型之后,可以通过以下命令实现对模型指标的评估。 - -```bash -python3 tools/eval.py \ - -c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \ - -o Global.pretrained_model="output/DistillationModel/best_model_student" -``` - - - -### 4.2 模型预测 - -模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: - -```python -python3 tools/infer.py \ - -c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \ - -o Infer.infer_imgs=./dataset/person/val/objects365_01780637.jpg \ - -o Global.pretrained_model=output/DistillationModel/best_model_student \ - -o Global.pretrained_model=Infer.PostProcess.threshold=0.9794 -``` - -输出结果如下: - -``` -[{'class_ids': [0], 'scores': [0.9878496769815683], 'label_names': ['nobody'], 'file_name': './dataset/person/val/objects365_01780637.jpg'}] -``` - -**备注:** 这里的`Infer.PostProcess.threshold`的值需要根据实际场景来确定,此处的`0.9794`是在该场景中的`val`数据集在千分之一Fpr下得到的最佳Tpr所得到的。 - - - -### 4.3 使用 inference 模型进行推理 - - - -### 4.3.1 导出 inference 模型 - -通过导出 inference 模型,PaddlePaddle 支持使用预测引擎进行预测推理。接下来介绍如何用预测引擎进行推理: -首先,对训练好的模型进行转换: - -```bash -python3 tools/export_model.py \ - -c ./ppcls/configs/cls_demo/PULC/PPLCNet/PPLCNet_x1_0.yaml \ - -o Global.pretrained_model=output/DistillationModel/best_model_student \ - -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person -``` -执行完该脚本后会在`deploy/models/`下生成`PPLCNet_x1_0_person`文件夹,该文件夹中的模型与 2.2 节下载的推理预测模型格式一致。 - - - -### 4.3.2 基于 inference 模型推理预测 -推理预测的脚本为: - -``` -python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o Global.inference_model_dir="models/PPLCNet_x1_0_person" -o PostProcess.ThreshOutput.threshold=0.9794 -``` - -**备注:** - -- 此处的`PostProcess.ThreshOutput.threshold`由eval时的最佳`threshold`来确定。 -- 更多关于推理的细节,可以参考[2.2节](#2.2)。 - diff --git a/docs/zh_CN/PULC/PULC_person_exists.md b/docs/zh_CN/PULC/PULC_person_exists.md new file mode 100644 index 0000000000000000000000000000000000000000..b3b830a893a4648645beab3a447ec8d894a5da4c --- /dev/null +++ b/docs/zh_CN/PULC/PULC_person_exists.md @@ -0,0 +1,472 @@ +# PULC 有人/无人分类模型 + +------ + + +## 目录 + +- [1. 模型和应用场景介绍](#1) +- [2. 模型快速体验](#2) + - [2.1 安装 paddlepaddle](#2.1) + - [2.2 安装 paddleclas](#2.2) + - [2.3 预测](#2.3) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.2.1 数据集来源](#3.2.1) + - [3.2.2 数据集获取](#3.2.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型压缩](#4) + - [4.1 SKL-UGI 知识蒸馏](#4.1) + - [4.1.1 教师模型训练](#4.1.1) + - [4.1.2 蒸馏训练](#4.1.2) +- [5. 超参搜索](#5) +- [6. 模型推理部署](#6) + - [6.1 推理模型准备](#6.1) + - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1) + - [6.1.2 直接下载 inference 模型](#6.1.2) + - [6.2 基于 Python 预测引擎推理](#6.2) + - [6.2.1 预测单张图像](#6.2.1) + - [6.2.2 基于文件夹的批量预测](#6.2.2) + - [6.3 基于 C++ 预测引擎推理](#6.3) + - [6.4 服务化部署](#6.4) + - [6.5 端侧部署](#6.5) + - [6.6 Paddle2ONNX 模型转换与预测](#6.6) + + + + +## 1. 
模型和应用场景介绍
+
+该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的有人/无人分类模型。该模型可以广泛应用于监控场景、人员进出管控场景、海量数据过滤场景等。
+
+下表列出了判断图片中是否有人的二分类模型的相关指标,前两行展现了使用 SwinTranformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第三行至第六行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。
+
+
+| 模型 | Tpr(%) | 延时(ms) | 存储(M) | 策略 |
+|-------|-----------|----------|---------------|---------------|
+| SwinTranformer_tiny | 95.69 | 95.30 | 111 | 使用 ImageNet 预训练模型 |
+| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 2.6 | 使用 ImageNet 预训练模型 |
+| PPLCNet_x1_0 | 89.57 | 2.12 | 7.0 | 使用 ImageNet 预训练模型 |
+| PPLCNet_x1_0 | 92.10 | 2.12 | 7.0 | 使用 SSLD 预训练模型 |
+| PPLCNet_x1_0 | 93.43 | 2.12 | 7.0 | 使用 SSLD 预训练模型+EDA 策略|
+| PPLCNet_x1_0 | 96.23 | 2.12 | 7.0 | 使用 SSLD 预训练模型+EDA 策略+SKL-UGI 知识蒸馏策略|
+
+从表中可以看出,backbone 为 SwinTranformer_tiny 时精度较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,但是会导致精度大幅下降。将 backbone 替换为速度更快的 PPLCNet_x1_0 时,精度较 MobileNetV3_small_x0_35 高 20 多个百分点,与此同时速度依旧可以快 20% 以上。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升约 2.6 个百分点;进一步地,融合 EDA 策略后,精度可以再提升 1.3 个百分点;最后,使用 SKL-UGI 知识蒸馏后,精度可以继续提升 2.8 个百分点。此时,PPLCNet_x1_0 达到了 SwinTranformer_tiny 模型的精度,但是速度快 40 多倍。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。
+
+**备注:**
+
+* `Tpr` 指标的介绍可以参考 [3.3 小节](#3.3)的备注部分,延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为 10。
+* 关于 PP-LCNet 的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。
+
+
+
+## 2. 模型快速体验
+
+
+
+### 2.1 安装 paddlepaddle
+
+- 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- 您的机器是CPU,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+
+### 2.2 安装 paddleclas
+
+使用如下命令快速安装 paddleclas
+
+```
+pip3 install paddleclas
+```
+
+
+
+### 2.3 预测
+
+点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。
+
+* 使用命令行快速预测
+
+```bash
+paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg
+```
+
+结果如下:
+```
+>>> result
+class_ids: [0], scores: [0.9955421453341842], label_names: ['nobody'], filename: pulc_demo_imgs/person_exists/objects365_01780782.jpg
+Predict complete!
+```
+
+**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹(以文件夹作为输入的示意写法见本节末尾)。
+
+
+* 在 Python 代码中预测
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="person_exists")
+result = model.predict(input_data="pulc_demo_imgs/person_exists/objects365_01780782.jpg")
+print(next(result))
+```
+
+**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果,默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`,使用默认的代码返回结果示例如下:
+
+```
+>>> result
+[{'class_ids': [0], 'scores': [0.9955421453341842], 'label_names': ['nobody'], 'filename': 'pulc_demo_imgs/person_exists/objects365_01780782.jpg'}]
+```
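+
+以文件夹作为输入时,可以用 `for` 循环对 `predict()` 返回的 `generator` 进行迭代,得到文件夹内全部图像的预测结果。下面是一个示意写法(其中 `batch_size=2` 仅为示例取值,结果字段以实际输出为准):
+
+```python
+import paddleclas
+
+# 以文件夹为输入时,predict() 返回的 generator 会按 batch 逐次产出结果,
+# 每次产出一个由若干结果字典组成的列表
+model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)
+results = model.predict(input_data="pulc_demo_imgs/person_exists/")
+for batch_results in results:
+    for res in batch_results:
+        print(res["filename"], res["label_names"], res["scores"])
+```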
+
+
+
+## 3. 模型训练、评估和预测
+
+
+
+### 3.1 环境配置
+
+* 安装:请先参考文档[环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。
+
+
+
+### 3.2 数据准备
+
+
+
+#### 3.2.1 数据集来源
+
+本案例中所使用的所有数据集均为开源数据,`train` 集合为[MS-COCO 数据](https://cocodataset.org/#overview)的训练集的子集,`val` 集合为[Object365 数据](https://www.objects365.org/overview.html)的训练集的子集,`ImageNet_val` 为[ImageNet-1k 数据](https://www.image-net.org/)的验证集。
+
+
+
+#### 3.2.2 数据集获取
+
+在公开数据集的基础上经过后处理即可得到本案例需要的数据,具体处理方法如下:
+
+- 训练集合:本案例处理了 MS-COCO 数据训练集的标注文件,如果某张图含有“人”的标签,且这个框的面积在整张图中的比例大于 10%,即认为该张图中含有人;如果某张图中没有“人”的标签,则认为该张图中不含有人。经过处理后,得到 92964 条可用数据,其中有人的数据有 39813 条,无人的数据有 53151 条。
+
+- 验证集合:从 Object365 数据中随机抽取一小部分数据,使用在 MS-COCO 上训练得到的较好的模型预测这些数据,将预测结果和数据的标注文件取交集,再将交集的结果按照得到训练集的方法筛选出验证集合。经过处理后,得到 27820 条可用数据,其中有人的数据有 2255 条,无人的数据有 25565 条。
+
+处理后的数据集部分数据可视化如下:
+
+![](../../images/PULC/docs/person_exists_data_demo.png)
+
+此处提供了经过上述方法处理好的数据,可以直接下载得到。
+
+
+进入 PaddleClas 目录。
+
+```
+cd path_to_PaddleClas
+```
+
+进入 `dataset/` 目录,下载并解压有人/无人场景的数据。
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/person_exists.tar
+tar -xf person_exists.tar
+cd ../
+```
+
+执行上述命令后,`dataset/` 下存在 `person_exists` 目录,该目录中具有以下数据:
+
+```
+
+├── train
+│   ├── 000000000009.jpg
+│   ├── 000000000025.jpg
+...
+├── val
+│   ├── objects365_01780637.jpg
+│   ├── objects365_01780640.jpg
+...
+├── ImageNet_val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+...
+├── train_list.txt
+├── train_list.txt.debug
+├── train_list_for_distill.txt
+├── val_list.txt
+└── val_list.txt.debug
+```
+
+其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件,`train_list.txt.debug` 和 `val_list.txt.debug` 分别为训练集和验证集的 `debug` 标签文件,它们分别是 `train_list.txt` 和 `val_list.txt` 的子集,用该文件可以快速体验本案例的流程。`ImageNet_val/` 是 ImageNet-1k 的验证集,该集合和 `train` 集合的混合数据用于本案例的 `SKL-UGI知识蒸馏策略`,对应的训练标签文件为 `train_list_for_distill.txt`。
+
+**备注:**
+
+* 关于 `train_list.txt`、`val_list.txt` 的格式说明,可以参考 [PaddleClas 分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明)。
+
+* 关于如何得到蒸馏的标签文件可以参考[知识蒸馏标签获得方法](../advanced_tutorials/ssld.md#3.2)。
+
+
+
+### 3.3 模型训练
+
+
+在 `ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml
+```
+
+验证集的最佳指标在 `0.94-0.95` 之间(数据集较小,容易造成波动)。
+
+**备注:**
+
+* 此时使用的指标为 Tpr,该指标描述了在假正类率(Fpr)小于某一数值时的真正类率(Tpr),是产业中二分类问题常用的指标之一。在本案例中,Fpr 为千分之一。关于 Fpr 和 Tpr 的更多介绍,可以参考[这里](https://baike.baidu.com/item/AUC/19282953)。
+
+* 在 eval 时,会打印出当前最佳的 TprAtFpr 指标,具体地,其会打印当前的 `Fpr`、`Tpr` 值,以及当前的 `threshold` 值。`Tpr` 值反映了在当前 `Fpr` 值下的召回率,该值越高,代表模型越好;`threshold` 表示当前最佳 `Fpr` 所对应的分类阈值,可用于后续模型部署落地等(TprAtFpr 的计算逻辑可参考本节末尾的示意代码)。
+
+
+
+### 3.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+
+
+### 3.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+输出结果如下:
+
+```
+[{'class_ids': [1], 'scores': [0.9999976], 'label_names': ['someone'], 'file_name': 'deploy/images/PULC/person_exists/objects365_02035329.jpg'}]
+```
+
+**备注:**
+
+* 这里 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `deploy/images/PULC/person_exists/objects365_02035329.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+* 二分类默认的阈值为 0.5,如果需要指定阈值,可以重写 `Infer.PostProcess.threshold`,如 `-o Infer.PostProcess.threshold=0.9794`,该值需要根据实际场景来确定,此处的 `0.9794` 是该场景中的 `val` 数据集在千分之一 Fpr 下取得最佳 Tpr 时所对应的阈值。
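+
+下面给出 TprAtFpr 指标的一个简化示意实现,帮助理解 [3.3 小节](#3.3)备注中的指标含义(仅为示意代码,并非 PaddleClas 内部实现,函数名与参数均为示例):
+
+```python
+import numpy as np
+
+def tpr_at_fpr(scores, labels, max_fpr=0.001):
+    # 在 Fpr <= max_fpr 的约束下,返回最大的 Tpr 及对应的分类阈值
+    # scores: 正类(如 someone)的预测得分;labels: 1 为正类,0 为负类
+    scores, labels = np.asarray(scores), np.asarray(labels)
+    best_tpr, best_thresh = 0.0, 1.0
+    for thresh in np.unique(scores):
+        pred = scores >= thresh
+        fpr = np.sum(pred & (labels == 0)) / max(np.sum(labels == 0), 1)
+        tpr = np.sum(pred & (labels == 1)) / max(np.sum(labels == 1), 1)
+        if fpr <= max_fpr and tpr > best_tpr:
+            best_tpr, best_thresh = tpr, float(thresh)
+    return best_tpr, best_thresh
+```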
+
+
+
+## 4. 模型压缩
+
+
+
+### 4.1 SKL-UGI 知识蒸馏
+
+SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考[SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。
+
+
+
+#### 4.1.1 教师模型训练
+
+复用 `ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+验证集的最佳指标在 `0.96-0.98` 之间,当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。
+
+
+
+#### 4.1.2 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml` 提供了 `SKL-UGI知识蒸馏策略` 的配置。该配置将 `ResNet101_vd` 当作教师模型,`PPLCNet_x1_0` 当作学生模型,使用 ImageNet 数据集的验证集作为新增的无标签数据。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+验证集的最佳指标在 `0.95-0.97` 之间,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。
+
+
+
+## 6. 
模型推理部署 + + + +### 6.1 推理模型准备 + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考 [Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 + + + +### 6.1.1 基于训练得到的权重导出 inference 模型 + +此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型: + +```bash +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model_student \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_exists_infer +``` +执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_person_exists_infer` 文件夹,`models` 文件夹下应有如下文件结构: + +``` +├── PPLCNet_x1_0_person_exists_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + +**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在`output/PPLCNet_x1_0/best_model.pdparams`中。 + + + +### 6.1.2 直接下载 inference 模型 + +[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。 + +``` +cd deploy/models +# 下载 inference 模型并解压 +wget https://paddleclas.bj.bcebos.com/models/PULC/person_exists_infer.tar && tar -xf person_exists_infer.tar +``` + +解压完毕后,`models` 文件夹下应有如下文件结构: + +``` +├── person_exists_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + +### 6.2 基于 Python 预测引擎推理 + + + + +#### 6.2.1 预测单张图像 + +返回 `deploy` 目录: + +``` +cd ../ +``` + +运行下面的命令,对图像 `./images/PULC/person_exists/objects365_02035329.jpg` 进行有人/无人分类。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/person_exists/inference_person_exists.yaml +# 使用下面的命令使用 CPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/person_exists/inference_person_exists.yaml -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone'] +``` + + +**备注:** 二分类默认的阈值为0.5, 如果需要指定阈值,可以重写 `Infer.PostProcess.threshold` ,如`-o Infer.PostProcess.threshold=0.9794`,该值需要根据实际场景来确定,此处的 `0.9794` 是在该场景中的 `val` 数据集在千分之一 Fpr 下得到的最佳 Tpr 所得到的。该阈值的确定方法可以参考[3.3节](#3.3)备注部分。 + + + +#### 6.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_cls.py -c configs/PULC/person_exists/inference_person_exists.yaml -o Global.infer_imgs="./images/PULC/person_exists/" +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +objects365_01780782.jpg: class id(s): [0], score(s): [1.00], label_name(s): ['nobody'] +objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone'] +``` + +其中,`someone` 表示该图里存在人,`nobody` 表示该图里不存在人。 + + + +### 6.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 
端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_quickstart.md b/docs/zh_CN/PULC/PULC_quickstart.md new file mode 100644 index 0000000000000000000000000000000000000000..c7c6980625d6325bddbd5a6fed619147534c43b7 --- /dev/null +++ b/docs/zh_CN/PULC/PULC_quickstart.md @@ -0,0 +1,125 @@ +# PULC 快速体验 + +------ + +本文主要介绍通过 PaddleClas whl 包,使用 PULC 系列模型进行预测。 + +## 目录 + +- [1. 安装](#1) + - [1.1 安装PaddlePaddle](#11) + - [1.2 安装PaddleClas whl包](#12) +- [2. 快速体验](#2) + - [2.1 命令行使用](#2.1) + - [2.2 Python脚本使用](#2.2) + - [2.3 模型列表](#2.3) +- [3.小结](#3) + + + +## 1. 安装 + + + +### 1.1 安装 PaddlePaddle + +- 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple +``` + +- 您的机器是CPU,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple +``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 1.2 安装 PaddleClas whl 包 + +```bash +pip3 install paddleclas +``` + + + +## 2. 快速体验 + +PaddleClas 提供了一系列测试图片,里边包含人、车、OCR等方向的多个场景的demo数据。点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载并解压,然后在终端中切换到相应目录。 + + + +### 2.1 命令行使用 + +``` +cd /path/to/pulc_demo_imgs +``` + +使用命令行预测: + +```bash +paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg +``` + +结果如下: +``` +>>> result +class_ids: [0], scores: [0.9955421453341842], label_names: ['nobody'], filename: pulc_demo_imgs/person_exists/objects365_01780782.jpg +Predict complete! 
+```
+
+若预测结果为 `nobody`,表示该图中没有人,若预测结果为 `someone`,则表示该图中有人。此处预测结果为 `nobody`,表示该图中没有人。
+
+**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹;如需要替换模型,更改 `--model_name` 中的模型名字即可,模型名字可以参考[2.3 模型列表](#2.3)。
+
+
+
+### 2.2 Python 脚本使用
+
+此处提供了在 python 脚本中使用 PULC 有人/无人分类模型预测的例子。
+
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="person_exists")
+result = model.predict(input_data="pulc_demo_imgs/person_exists/objects365_01780782.jpg")
+print(next(result))
+```
+
+打印的结果如下:
+
+```
+>>> result
+[{'class_ids': [0], 'scores': [0.9955421453341842], 'label_names': ['nobody'], 'filename': 'pulc_demo_imgs/person_exists/objects365_01780782.jpg'}]
+```
+
+**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果,默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`。更换其他模型只需要替换 `model_name` 即可,`model_name` 可以参考[2.3 模型列表](#2.3)。
+
+
+
+### 2.3 模型列表
+
+PULC 系列模型的名称和简介如下:
+
+|模型名称|模型简介|
+| --- | --- |
+| person_exists | PULC有人/无人分类模型 |
+| person_attribute | PULC人体属性识别模型 |
+| safety_helmet | PULC佩戴安全帽分类模型 |
+| traffic_sign | PULC交通标志分类模型 |
+| vehicle_attribute | PULC车辆属性识别模型 |
+| car_exists | PULC有车/无车分类模型 |
+| text_image_orientation | PULC含文字图像方向分类模型 |
+| textline_orientation | PULC文本行方向分类模型 |
+| language_classification | PULC语种分类模型 |
+
+
+
+## 3. 小结
+
+通过本节内容,相信您已经熟练掌握 PaddleClas whl 包的 PULC 模型使用方法并获得了初步效果。
+
+PULC 方法产出的系列模型在人、车、OCR等方向的多个场景中均验证有效,用超轻量模型就可实现与 SwinTransformer 模型接近的精度,预测速度提高 40+ 倍。PULC 还打通了数据、模型训练、压缩和推理部署的全流程,具体地,您可以参考[PULC有人/无人分类模型](PULC_person_exists.md)、[PULC人体属性识别模型](PULC_person_attribute.md)、[PULC佩戴安全帽分类模型](PULC_safety_helmet.md)、[PULC交通标志分类模型](PULC_traffic_sign.md)、[PULC车辆属性识别模型](PULC_vehicle_attribute.md)、[PULC有车/无车分类模型](PULC_car_exists.md)、[PULC含文字图像方向分类模型](PULC_text_image_orientation.md)、[PULC文本行方向分类模型](PULC_textline_orientation.md)、[PULC语种分类模型](PULC_language_classification.md)。
diff --git a/docs/zh_CN/PULC/PULC_safety_helmet.md b/docs/zh_CN/PULC/PULC_safety_helmet.md
new file mode 100644
index 0000000000000000000000000000000000000000..0467b61b12c629ebc7a6e2a2268b4c82fe512abe
--- /dev/null
+++ b/docs/zh_CN/PULC/PULC_safety_helmet.md
@@ -0,0 +1,438 @@
+# PULC 佩戴安全帽分类模型
+
+------
+
+## 目录
+
+- [1. 模型和应用场景介绍](#1)
+- [2. 模型快速体验](#2)
+  - [2.1 安装 paddlepaddle](#2.1)
+  - [2.2 安装 paddleclas](#2.2)
+  - [2.3 预测](#2.3)
+- [3. 模型训练、评估和预测](#3)
+  - [3.1 环境配置](#3.1)
+  - [3.2 数据准备](#3.2)
+    - [3.2.1 数据集来源](#3.2.1)
+    - [3.2.2 数据集获取](#3.2.2)
+  - [3.3 模型训练](#3.3)
+  - [3.4 模型评估](#3.4)
+  - [3.5 模型预测](#3.5)
+- [4. 模型压缩](#4)
+  - [4.1 UDML 知识蒸馏](#4.1)
+    - [4.1.1 蒸馏训练](#4.1.1)
+- [5. 超参搜索](#5)
+- [6. 模型推理部署](#6)
+  - [6.1 推理模型准备](#6.1)
+    - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1)
+    - [6.1.2 直接下载 inference 模型](#6.1.2)
+  - [6.2 基于 Python 预测引擎推理](#6.2)
+    - [6.2.1 预测单张图像](#6.2.1)
+    - [6.2.2 基于文件夹的批量预测](#6.2.2)
+  - [6.3 基于 C++ 预测引擎推理](#6.3)
+  - [6.4 服务化部署](#6.4)
+  - [6.5 端侧部署](#6.5)
+  - [6.6 Paddle2ONNX 模型转换与预测](#6.6)
+
+
+
+
+## 1. 
模型和应用场景介绍 + +该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的“是否佩戴安全帽”的二分类模型。该模型可以广泛应用于如建筑施工场景、工厂车间场景、交通场景等。 + +下表列出了判断图片中是否佩戴安全帽的二分类模型的相关指标,前三行展现了使用 Res2Net200_vd_26w_4s,SwinTranformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第四行至第七行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + UDML 知识蒸馏策略训练得到的模型的相关指标。 + +| 模型 | Tpr(%) | 延时(ms) | 存储(M) | 策略 | +|-------|-----------|----------|---------------|---------------| +| SwinTranformer_tiny | 93.57 | 91.32 | 111 | 使用ImageNet预训练模型 | +| Res2Net200_vd_26w_4s | 98.92 | 80.99 | 284 | 使用ImageNet预训练模型 | +| MobileNetV3_small_x0_35 | 84.83 | 2.85 | 2.6 | 使用ImageNet预训练模型 | +| PPLCNet_x1_0 | 93.27 | 2.03 | 7.1 | 使用ImageNet预训练模型 | +| PPLCNet_x1_0 | 98.16 | 2.03 | 7.1 | 使用SSLD预训练模型 | +| PPLCNet_x1_0 | 99.30 | 2.03 | 7.1 | 使用SSLD预训练模型+EDA策略| +| PPLCNet_x1_0 | 99.38 | 2.03 | 7.1 | 使用SSLD预训练模型+EDA策略+UDML知识蒸馏策略| + +从表中可以看出,在使用服务器端大模型作为 backbone 时,SwinTranformer_tiny 精度较低,Res2Net200_vd_26w_4s 精度较高,但服务器端大模型推理速度普遍较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,但是精度显著降低。在将 backbone 替换为 PPLCNet_x1_0 后,精度较 MobileNetV3_small_x0_35 提高约 8.5 个百分点,与此同时速度快 20% 以上。在此基础上,将 PPLCNet_x1_0 的预训练模型替换为 SSLD 预训练模型后,在对推理速度无影响的前提下,精度提升约 4.9 个百分点,进一步地使用 EDA 策略后,精度可以再提升 1.1 个百分点。此时,PPLCNet_x1_0 已经超过 Res2Net200_vd_26w_4s 模型的精度,但是速度快 70+ 倍。最后,在使用 UDML 知识蒸馏后,精度可以再提升 0.08 个百分点。下面详细介绍关于 PULC 安全帽模型的训练方法和推理部署方法。 + +**备注:** + +* `Tpr`指标的介绍可以参考 [3.3小节](#3.3)的备注部分,延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启MKLDNN加速策略,线程数为10。 + +* 关于PP-LCNet的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。 + + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddlepaddle + +- 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple +``` + +- 您的机器是CPU,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple +``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 2.2 安装 paddleclas + +使用如下命令快速安装 paddleclas + +``` +pip3 install paddleclas +``` + + + +### 2.3 预测 + +点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。 + +* 使用命令行快速预测 + +```bash +paddleclas --model_name=safety_helmet --infer_imgs=pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png +``` + +结果如下: +``` +>>> result +class_ids: [1], scores: [0.9986255], label_names: ['unwearing_helmet'], filename: pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png +Predict complete! +``` + +**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。 + + +* 在 Python 代码中预测 +```python +import paddleclas +model = paddleclas.PaddleClas(model_name="safety_helmet") +result = model.predict(input_data="pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png") +print(next(result)) +``` + +**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="safety_helmet", batch_size=2)`, 使用默认的代码返回结果示例如下: + +``` +>>> result +[{'class_ids': [1], 'scores': [0.9986255], 'label_names': ['unwearing_helmet'], 'filename': 'pulc_demo_imgs/safety_helmet/safety_helmet_test_1.png'}] +``` + + + +## 3. 
模型训练、评估和预测
+
+
+
+### 3.1 环境配置
+
+* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。
+
+
+
+### 3.2 数据准备
+
+
+
+#### 3.2.1 数据集来源
+
+本案例中所使用的所有数据集均为开源数据,数据集基于[Safety-Helmet-Wearing-Dataset](https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset)、[hard-hat-detection](https://www.kaggle.com/datasets/andrewmvd/hard-hat-detection)与[Large-scale CelebFaces Attributes (CelebA) Dataset](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)处理整合而来。
+
+
+
+#### 3.2.2 数据集获取
+
+在公开数据集的基础上经过后处理即可得到本案例需要的数据,具体处理方法如下:
+
+* 对于 Safety-Helmet-Wearing-Dataset 数据集:根据 bbox 标签数据,将检测框的宽、高各放大 3 倍得到新的 bbox,再据此对图像进行裁剪(裁剪方式的一种示意实现见本节末尾),其中带有安全帽的图像类别为 0,不戴安全帽的图像类别为 1;
+* 对于 hard-hat-detection 数据集:仅使用其中类别标签为 “hat” 的图像,并使用 bbox 标签进行裁剪,图像类别为 0;
+* 对于 CelebA 数据集:仅使用其中类别标签为 “Wearing_Hat” 的图像,并使用 bbox 标签进行裁剪,图像类别为 0。
+
+在整合上述数据后,可得到共约 15 万数据,其中戴安全帽与不戴安全帽的图像数量分别约为 2.8 万与 12.1 万,然后在两个类别上分别随机选取 0.56 万张图像作为测试集,共约 1.12 万张图像,其他约 13.8 万张图像作为训练集。
+
+处理后的数据集部分数据可视化如下:
+
+![](../../images/PULC/docs/safety_helmet_data_demo.jpg)
+
+此处提供了经过上述方法处理好的数据,可以直接下载得到。
+
+进入 PaddleClas 目录。
+
+```
+cd path_to_PaddleClas
+```
+
+进入 `dataset/` 目录,下载并解压安全帽场景的数据。
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/safety_helmet.tar
+tar -xf safety_helmet.tar
+cd ../
+```
+
+执行上述命令后,`dataset/` 下存在 `safety_helmet` 目录,该目录中具有以下数据:
+
+```
+├── images
+│   ├── VOC2028_part2_001209_1.jpg
+│   ├── HHD_hard_hat_workers23_1.jpg
+│   ├── CelebA_077809.jpg
+│   ├── ...
+│   └── ...
+├── train_list.txt
+└── val_list.txt
+```
+
+其中,`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件,所有的图像数据在 `images/` 目录下。
+
+**备注:**
+
+* 关于 `train_list.txt`、`val_list.txt` 的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明)。
+
+
+
+### 3.3 模型训练
+
+在 `ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml
+```
+
+验证集的最佳指标在 `0.985-0.993` 之间(数据集较小,容易造成波动)。
+
+**备注:**
+
+* 此时使用的指标为 Tpr,该指标描述了在假正类率(Fpr)小于某一数值时的真正类率(Tpr),是产业中二分类问题常用的指标之一。在本案例中,Fpr 为万分之一。关于 Fpr 和 Tpr 的更多介绍,可以参考[这里](https://baike.baidu.com/item/AUC/19282953)。
+
+* 在 eval 时,会打印出当前最佳的 TprAtFpr 指标,具体地,其会打印当前的 `Fpr`、`Tpr` 值,以及当前的 `threshold` 值。`Tpr` 值反映了在当前 `Fpr` 值下的召回率,该值越高,代表模型越好;`threshold` 表示当前最佳 `Fpr` 所对应的分类阈值,可用于后续模型部署落地等。
+
+
+
+### 3.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了训练过程中的最佳参数权重文件所在的路径,如需指定其他权重文件,只需替换对应的路径即可。
+
+
+
+### 3.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+输出结果如下:
+
+```
+[{'class_ids': [1], 'scores': [0.9524797], 'label_names': ['unwearing_helmet'], 'file_name': 'deploy/images/PULC/safety_helmet/safety_helmet_test_1.png'}]
+```
+
+**备注:**
+
+* 这里 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `deploy/images/PULC/safety_helmet/safety_helmet_test_1.png` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+* 二分类默认的阈值为 0.5,如果需要指定阈值,可以重写 `Infer.PostProcess.threshold`,如 `-o Infer.PostProcess.threshold=0.9167`,该值需要根据实际应用场景来确定,在 safety_helmet 数据集的 val 验证集上,在万分之一 Fpr 下取得最佳 Tpr 时,该值为 0.9167。
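+
+下面给出 [3.2.1 小节](#3.2.1)中“将检测框宽、高放大 3 倍后裁剪”的一种可能实现(仅为示意代码,假设以检测框中心为基准放大,并对越界部分截断到图像边界;函数名与参数均为示例):
+
+```python
+from PIL import Image
+
+def crop_with_expanded_bbox(img, xmin, ymin, xmax, ymax, scale=3.0):
+    # 以检测框中心为基准,将宽、高各放大 scale 倍后裁剪图像,
+    # 超出图像边界的部分截断到边界
+    w, h = xmax - xmin, ymax - ymin
+    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
+    left = max(int(cx - w * scale / 2), 0)
+    top = max(int(cy - h * scale / 2), 0)
+    right = min(int(cx + w * scale / 2), img.width)
+    bottom = min(int(cy + h * scale / 2), img.height)
+    return img.crop((left, top, right, bottom))
+
+# 用法示例:img = Image.open("demo.jpg") 之后,
+# crop_with_expanded_bbox(img, 100, 120, 180, 200) 返回放大后的裁剪区域
+```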
+
+
+
+## 4. 模型压缩
+
+
+
+### 4.1 UDML 知识蒸馏
+
+UDML 知识蒸馏是一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考[UDML 知识蒸馏](../advanced_tutorials/knowledge_distillation.md#1.2.3)。
+
+
+
+#### 4.1.1 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml` 提供了 `UDML知识蒸馏策略` 的配置。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml
+```
+
+验证集的最佳指标在 `0.990-0.993` 之间,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注**:此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。
+
+
+
+## 6. 模型推理部署
+
+
+
+### 6.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。
+
+
+
+### 6.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_safety_helmet_infer
+```
+
+执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_safety_helmet_infer` 目录,该目录下有如下文件结构:
+
+```
+├── PPLCNet_x1_0_safety_helmet_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在 `output/PPLCNet_x1_0/best_model.pdparams` 中。
+
+
+
+### 6.1.2 直接下载 inference 模型
+
+[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddleclas.bj.bcebos.com/models/PULC/safety_helmet_infer.tar && tar -xf safety_helmet_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── safety_helmet_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 6.2 基于 Python 预测引擎推理
+
+
+
+#### 6.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/PULC/safety_helmet/safety_helmet_test_1.png` 进行是否佩戴安全帽分类。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/safety_helmet/inference_safety_helmet.yaml
+# 使用下面的命令使用 CPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/safety_helmet/inference_safety_helmet.yaml -o Global.use_gpu=False
+```
+
+输出结果如下。
+
+```
+safety_helmet_test_1.png: class id(s): [1], score(s): [1.00], label_name(s): ['unwearing_helmet']
+```
+
+**备注:** 二分类默认的阈值为 0.5,如果需要指定阈值,可以重写 `Infer.PostProcess.threshold`,如 `-o Infer.PostProcess.threshold=0.9167`,该值需要根据实际应用场景来确定,在 safety_helmet 数据集的 val 验证集上,在万分之一 Fpr 下取得最佳 Tpr 时,该值为 0.9167。该阈值的确定方法可以参考[3.3节](#3.3)备注部分。
+
+
+
+#### 6.2.2 基于文件夹的批量预测
+
+如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/safety_helmet/inference_safety_helmet.yaml -o Global.infer_imgs="./images/PULC/safety_helmet/"
+```
+
+终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +safety_helmet_test_1.png: class id(s): [1], score(s): [1.00], label_name(s): ['unwearing_helmet'] +safety_helmet_test_2.png: class id(s): [0], score(s): [1.00], label_name(s): ['wearing_helmet'] +``` + +其中,`wearing_helmet` 表示该图中的人佩戴了安全帽,`unwearing_helmet` 表示该图中的人未佩戴安全帽。 + + + +### 6.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_text_image_orientation.md b/docs/zh_CN/PULC/PULC_text_image_orientation.md new file mode 100644 index 0000000000000000000000000000000000000000..d89396f0a0c4a67dd0990bd4e19725684b894020 --- /dev/null +++ b/docs/zh_CN/PULC/PULC_text_image_orientation.md @@ -0,0 +1,460 @@ +# PULC 含文字图像方向分类模型 + +## 目录 + +- [1. 模型和应用场景介绍](#1) +- [2. 模型快速体验](#2) + - [2.1 安装 paddlepaddle](#2.1) + - [2.2 安装 paddleclas](#2.2) + - [2.3 预测](#2.3) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.2.1 数据集来源](#3.2.1) + - [3.2.2 数据集获取](#3.2.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型压缩](#4) + - [4.1 SKL-UGI 知识蒸馏](#4.1) + - [4.1.1 教师模型训练](#4.1.1) + - [4.1.2 蒸馏训练](#4.1.2) +- [5. 超参搜索](#5) +- [6. 模型推理部署](#6) + - [6.1 推理模型准备](#6.1) + - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1) + - [6.1.2 直接下载 inference 模型](#6.1.2) + - [6.2 基于 Python 预测引擎推理](#6.2) + - [6.2.1 预测单张图片](#6.2.1) + - [6.2.2 基于文件夹的批量预测](#6.2.2) + - [6.3 基于 C++ 预测引擎推理](#6.3) + - [6.4 服务化部署](#6.4) + - [6.5 端侧部署](#6.5) + - [6.6 Paddle2ONNX 模型转换与预测](#6.6) + + + +## 1. 
模型和应用场景介绍 + +在诸如文档扫描、证照拍摄等过程中,有时为了拍摄更清晰,会将拍摄设备进行旋转,导致得到的图片也是不同方向的。此时,标准的OCR流程无法很好地应对这些数据。利用图像分类技术,可以预先判断含文字图像的方向,并将其进行方向调整,从而提高OCR处理的准确性。该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的含文字图像方向的分类模型。该模型可以广泛应用于金融、政务等行业的旋转图片的OCR处理场景中。 + +下表列出了判断含文字图像方向分类模型的相关指标,前两行展现了使用 SwinTranformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第三行至第五行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用EDA策略训练得到的模型的相关指标。 + +| 模型 | 精度(%) | 延时(ms) | 存储(M) | 策略 | +| ----------------------- | --------- | ---------- | --------- | -------------------------- | +| SwinTranformer_tiny | 99.12 | 89.65 | 111 | 使用ImageNet预训练模型 | +| MobileNetV3_small_x0_35 | 83.61 | 2.95 | 2.6 | 使用ImageNet预训练模型 | +| PPLCNet_x1_0 | 97.85 | 2.16 | 7.1 | 使用ImageNet预训练模型 | +| PPLCNet_x1_0 | 99.02 | 2.16 | 7.1 | 使用SSLD预训练模型 | +| **PPLCNet_x1_0** | **99.06** | **2.16** | **7.1** | 使用SSLD预训练模型+EDA策略 | + +从表中可以看出,backbone 为 SwinTranformer_tiny 时精度比较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度提升明显,但精度有了大幅下降。将 backbone 替换为 PPLCNet_x1_0 时,速度略为提升,同时精度较 MobileNetV3_small_x0_35 高了 14.24 个百分点。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升 1.17 个百分点,进一步地使用 EDA 策略后,精度可以再提升 0.04 个百分点。此时,PPLCNet_x1_0 与 SwinTranformer_tiny 的精度差别不大,但是速度明显变快。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。 + +**备注:** + +* 关于PP-LCNet的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。 + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddlepaddle + +- 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple +``` + +- 您的机器是CPU,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple +``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 2.2 安装 paddleclas + +使用如下命令快速安装 paddleclas + +``` +pip3 install paddleclas +``` + + + +### 2.3 预测 + +点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。 + +* 使用命令行快速预测 + +```bash +paddleclas --model_name=text_image_orientation --infer_imgs=pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg +``` + +结果如下: +``` +>>> result +class_ids: [0, 2], scores: [0.85615, 0.05046], label_names: ['0', '180'], filename: pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg +Predict complete! +``` + +**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。 + + +* 在 Python 代码中预测 +```python +import paddleclas +model = paddleclas.PaddleClas(model_name="text_image_orientation") +result = model.predict(input_data="pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg") +print(next(result)) +``` + +**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="text_image_orientation", batch_size=2)`, 使用默认的代码返回结果示例如下: + +``` +>>> result +[{'class_ids': [0, 2], 'scores': [0.85615, 0.05046], 'label_names': ['0', '180'], 'filename': 'pulc_demo_imgs/text_image_orientation/img_rot0_demo.jpg'}] +``` + + + +## 3. 
模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + + + +#### 3.2.1 数据集来源 + +[第1节](#1)中提供的模型使用内部数据训练得到,该数据集暂时不方便公开。这里基于 [ICDAR2019-ArT](https://ai.baidu.com/broad/introduction?dataset=art)、 [XFUND](https://github.com/doc-analysis/XFUND) 和 [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=introduction) 三个公开数据集构造了一个小规模含文字图像方向分类数据集,用于体验本案例。 + +![](../../images/PULC/docs/text_image_orientation_original_data.png) + + + +#### 3.2.2 数据集获取 + +在公开数据集的基础上经过后处理即可得到本案例需要的数据,具体处理方法如下: + +考虑到原始图片的分辨率较高,模型训练时间较长,这里将所有数据预先进行了缩放处理,在保持长宽比不变的前提下,将短边缩放到384。然后将数据进行顺时针旋转处理,分别生成90度、180度和270度的合成数据。其中,将 ICDAR2019-ArT 和 XFUND 生成的41460张数据按照 9:1 的比例随机划分成了训练集和验证集, ICDAR2015 生成的6000张数据作为`SKL-UGI知识蒸馏策略`实验中的补充数据。 + +处理后的数据集部分数据可视化如下: + +![](../../images/PULC/docs/text_image_orientation_data_demo.png) + +此处提供了经过上述方法处理好的数据,可以直接下载得到。 + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,下载并解压含文字图像方向场景的数据。 + +```shell +cd dataset +wget https://paddleclas.bj.bcebos.com/data/PULC/text_image_orientation.tar +tar -xf text_image_orientation.tar +cd ../ +``` + +执行上述命令后,`dataset/`下存在`text_image_orientation`目录,该目录中具有以下数据: + +``` +├── img_0 +│ ├── img_rot0_0.jpg +│ ├── img_rot0_1.png +... +├── img_90 +│ ├── img_rot90_0.jpg +│ ├── img_rot90_1.png +... +├── img_180 +│ ├── img_rot180_0.jpg +│ ├── img_rot180_1.png +... +├── img_270 +│ ├── img_rot270_0.jpg +│ ├── img_rot270_1.png +... +├── distill_data +│ ├── gt_7060_0.jpg +│ ├── gt_7060_90.jpg +... +├── train_list.txt +├── train_list.txt.debug +├── train_list_for_distill.txt +├── test_list.txt +├── test_list.txt.debug +└── label_list.txt +``` + +其中`img_0/`、`img_90/`、`img_180/`和`img_270/`分别存放了4个角度的训练集和验证集数据。`train_list.txt`和`test_list.txt`分别为训练集和验证集的标签文件,`train_list.txt.debug`和`test_list.txt.debug`分别为训练集和验证集的`debug`标签文件,其分别是`train_list.txt`和`test_list.txt`的子集,用该文件可以快速体验本案例的流程。`distill_data/`是补充文字数据,该集合和`train`集合的混合数据用于本案例的`SKL-UGI知识蒸馏策略`,对应的训练标签文件为`train_list_for_distill.txt`。关于如何得到蒸馏的标签可以参考[知识蒸馏标签获得](../advanced_tutorials/ssld.md#3.2)。 + +**备注:** + +* 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 + +* 关于如何得到蒸馏的标签文件可以参考[知识蒸馏标签获得方法](../advanced_tutorials/ssld.md#3.2)。 + + + +### 3.3 模型训练 + +在`ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml`中提供了基于该场景的训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml +``` + +验证集的最佳指标在0.99左右。 + +**备注**:本文档中提到的训练指标均为在大规模内部数据上的训练指标,使用 demo 数据训练时,由于数据集规模较小且分布与大规模内部数据不同,无法达到该指标。可以进一步扩充自己的数据并且使用本案例中介绍的优化方法进行调优,从而达到更高的精度。 + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" +``` + +其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```bash +python3 tools/infer.py \ + -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" +``` + +输出结果如下: + +``` +[{'class_ids': [0, 2], 'scores': [0.85615, 0.05046], 'file_name': 'deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg', 'label_names': ['0', '180']}] +``` + +**备注:** + +- 其中 
`-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+- 默认是对 `deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+- 输出为 top2 的预测结果,`0` 表示该图文本方向为 0 度,`90` 表示该图文本方向为顺时针 90 度,`180` 表示该图文本方向为顺时针 180 度,`270` 表示该图文本方向为顺时针 270 度。
+
+
+
+## 4. 模型压缩
+
+
+
+### 4.1 SKL-UGI 知识蒸馏
+
+SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考[SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。
+
+
+
+#### 4.1.1 教师模型训练
+
+复用 `ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+验证集的最佳指标为 0.996 左右,当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。
+
+**备注:** 训练 ResNet101_vd 模型需要的显存较多,如果机器显存不够,可以将学习率和 batch size 同时缩小一定的倍数进行训练,如在命令后添加以下参数:`-o DataLoader.Train.sampler.batch_size=64`、`-o Optimizer.lr.learning_rate=0.1`。
+
+
+
+#### 4.1.2 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml` 提供了 `SKL-UGI 知识蒸馏策略` 的配置。该配置将 `ResNet101_vd` 当作教师模型,`PPLCNet_x1_0` 当作学生模型,使用[3.2.2节](#3.2.2)中介绍的蒸馏数据作为新增的无标签数据。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+验证集的最佳指标为 0.99 左右,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。
+
+
+
+## 6. 模型推理部署
+
+
+
+### 6.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。
+
+
+
+#### 6.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model_student \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_text_image_orientation_infer
+```
+
+执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_text_image_orientation_infer` 文件夹,`models` 文件夹下应有如下文件结构:
+
+```
+├── PPLCNet_x1_0_text_image_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在 `output/PPLCNet_x1_0/best_model.pdparams` 中。
+
+
+
+#### 6.1.2 直接下载 inference 模型
+
+[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddleclas.bj.bcebos.com/models/PULC/text_image_orientation_infer.tar && tar -xf text_image_orientation_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── text_image_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 6.2 基于 Python 预测引擎推理
+
+
+
+#### 6.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/PULC/text_image_orientation/img_rot0_demo.jpg` 进行含文字图像方向分类。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/text_image_orientation/inference_text_image_orientation.yaml
+# 使用下面的命令使用 CPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/text_image_orientation/inference_text_image_orientation.yaml -o Global.use_gpu=False
+```
+
+输出结果如下。
+
+```
+img_rot0_demo.jpg: class id(s): [0, 2], score(s): [0.86, 0.05], label_name(s): ['0', '180']
+```
+
+其中,输出为 top2 的预测结果,`0` 表示该图文本方向为 0 度,`90` 表示该图文本方向为顺时针 90 度,`180` 表示该图文本方向为顺时针 180 度,`270` 表示该图文本方向为顺时针 270 度。
+
+
+
+#### 6.2.2 基于文件夹的批量预测
+
+如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/text_image_orientation/inference_text_image_orientation.yaml -o Global.infer_imgs="./images/PULC/text_image_orientation/"
+```
+
+终端中会输出该文件夹内所有图像的分类结果,如下所示。
+
+```
+img_rot0_demo.jpg: class id(s): [0, 2], score(s): [0.86, 0.05], label_name(s): ['0', '180']
+img_rot180_demo.jpg: class id(s): [2, 1], score(s): [0.88, 0.04], label_name(s): ['180', '90']
+```
+
+
+
+### 6.3 基于 C++ 预测引擎推理
+
+PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。
+
+
+
+### 6.4 服务化部署
+
+Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下的推理解决方案。更多关于 Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。
+
+PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。
+
+
+
+### 6.5 
Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_textline_orientation.md b/docs/zh_CN/PULC/PULC_textline_orientation.md new file mode 100644 index 0000000000000000000000000000000000000000..eea10307532eb0a8a323a82108b0c5f9691a82f8 --- /dev/null +++ b/docs/zh_CN/PULC/PULC_textline_orientation.md @@ -0,0 +1,457 @@ +# PULC 文本行方向分类模型 + +------ + + +## 目录 + +- [1. 模型和应用场景介绍](#1) +- [2. 模型快速体验](#2) + - [2.1 安装 paddlepaddle](#2.1) + - [2.2 安装 paddleclas](#2.2) + - [2.3 预测](#2.3) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.2.1 数据集来源](#3.2.1) + - [3.2.2 数据集获取](#3.2.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型压缩](#4) + - [4.1 SKL-UGI 知识蒸馏](#4.1) + - [4.1.1 教师模型训练](#4.1.1) + - [4.1.2 蒸馏训练](#4.1.2) +- [5. 超参搜索](#5) +- [6. 模型推理部署](#6) + - [6.1 推理模型准备](#6.1) + - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1) + - [6.1.2 直接下载 inference 模型](#6.1.2) + - [6.2 基于 Python 预测引擎推理](#6.2) + - [6.2.1 预测单张图像](#6.2.1) + - [6.2.2 基于文件夹的批量预测](#6.2.2) + - [6.3 基于 C++ 预测引擎推理](#6.3) + - [6.4 服务化部署](#6.4) + - [6.5 端侧部署](#6.5) + - [6.6 Paddle2ONNX 模型转换与预测](#6.6) + + + + +## 1. 
模型和应用场景介绍 + +该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的文本行方向分类模型。该模型可以广泛应用于如文字矫正、文字识别等场景。 + +下表列出了文本行方向分类模型的相关指标,前两行展现了使用 Res2Net200_vd 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第三行至第七行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。 + + +| 模型 | Top-1 Acc(%) | 延时(ms) | 存储(M) | 策略 | +|-------|-----------|----------|---------------|---------------| +| SwinTranformer_tiny | 93.61 | 89.64 | 111 | 使用 ImageNet 预训练模型 | +| MobileNetV3_small_x0_35 | 81.40 | 2.96 | 2.6 | 使用 ImageNet 预训练模型 | +| PPLCNet_x1_0 | 89.99 | 2.11 | 7.0 | 使用 ImageNet 预训练模型 | +| PPLCNet_x1_0* | 94.06 | 2.68 | 7.0 | 使用 ImageNet 预训练模型 | +| PPLCNet_x1_0* | 94.11 | 2.68 | 7.0 | 使用 SSLD 预训练模型 | +| PPLCNet_x1_0** | 96.01 | 2.72 | 7.0 | 使用 SSLD 预训练模型+EDA 策略| +| PPLCNet_x1_0** | 95.86 | 2.72 | 7.0 | 使用 SSLD 预训练模型+EDA 策略+SKL-UGI 知识蒸馏策略| + +从表中可以看出,backbone 为 SwinTranformer_tiny 时精度较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,精度下降也比较明显。将 backbone 替换为 PPLCNet_x1_0 时,精度较 MobileNetV3_small_x0_35 高 8.6 个百分点,速度快10%左右。在此基础上,更改分辨率和stride, 速度变慢 27%,但是精度可以提升 4.5 个百分点(采用[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)的方案),使用 SSLD 预训练模型后,精度可以继续提升约 0.05 个百分点 ,进一步地,当融合EDA策略后,精度可以再提升 1.9 个百分点。最后,融合SKL-UGI 知识蒸馏策略后,在该场景无效。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。 + +**备注:** + +* 其中不带\*的模型表示分辨率为224x224,带\*的模型表示分辨率为48x192(h\*w),数据增强从网络中的 stride 改为 `[2, [2, 1], [2, 1], [2, 1], [2, 1]]`,其中,外层列表中的每一个元素代表网络结构下采样层的stride,该策略为 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) 提供的文本行方向分类器方案。带\*\*的模型表示分辨率为80x160(h\*w), 网络中的 stride 改为 `[2, [2, 1], [2, 1], [2, 1], [2, 1]]`,其中,外层列表中的每一个元素代表网络结构下采样层的stride,此分辨率是经过[超参数搜索策略](PULC_train.md#4-超参搜索)搜索得到的。 +* 延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为10。 +* 关于PP-LCNet的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。 + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddlepaddle + +- 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple +``` + +- 您的机器是CPU,请运行以下命令安装 + +```bash +python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple +``` + +更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 + + + +### 2.2 安装 paddleclas + +使用如下命令快速安装 paddleclas + +``` +pip3 install paddleclas +``` + + + +### 2.3 预测 + +点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。 + +* 使用命令行快速预测 + +```bash +paddleclas --model_name=textline_orientation --infer_imgs=pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png +``` + +结果如下: +``` +>>> result +class_ids: [0], scores: [1.0], label_names: ['0_degree'], filename: pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png +Predict complete! 
+``` + +**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。 + +* 在 Python 代码中预测 +```python +import paddleclas +model = paddleclas.PaddleClas(model_name="textline_orientation") +result = model.predict(input_data="pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png") +print(next(result)) +``` + +**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="textline_orientation", batch_size=2)`, 使用默认的代码返回结果示例如下: + +``` +>>> result +[{'class_ids': [0], 'scores': [1.0], 'label_names': ['0_degree'], 'filename': 'pulc_demo_imgs/textline_orientation/textline_orientation_test_0_0.png'}] +``` + + + +## 3. 模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + + + +#### 3.2.1 数据集来源 + +本案例中所使用的所有数据集来源于内部数据,如果您希望体验训练过程,可以使用开源数据如[ICDAR2019-LSVT 文本行识别数据](https://aistudio.baidu.com/aistudio/datasetdetail/8429)。 + + + +#### 3.2.2 数据集获取 + +在公开数据集的基础上经过后处理即可得到本案例需要的数据,具体处理方法如下: + +本案例处理了 ICDAR2019-LSVT 文本行识别数据,将其中的 id 号为 0-1999 作为本案例的数据集合,经过旋转处理成 0 类 和 1 类,其中 0 类代表文本行为正,即 0 度,1 类代表文本行为反,即 180 度。 + +- 训练集合,id号为 0-1799 作为训练集合,0 类和 1 类共 3600 张。 + +- 验证集合,id号为 1800-1999 作为验证集合,0 类和 1 类共 400 张。 + +处理后的数据集部分数据可视化如下: + +![](../../images/PULC/docs/textline_orientation_data_demo.png) + +此处提供了经过上述方法处理好的数据,可以直接下载得到。 + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,下载并解压文本行方向分类场景的数据。 + +```shell +cd dataset +wget https://paddleclas.bj.bcebos.com/data/PULC/textline_orientation.tar +tar -xf textline_orientation.tar +cd ../ +``` + +执行上述命令后,`dataset/` 下存在 `textline_orientation` 目录,该目录中具有以下数据: + +``` +├── 0 +│   ├── img_0.jpg +│   ├── img_1.jpg +... +├── 1 +│   ├── img_0.jpg +│   ├── img_1.jpg +... 
+├── train_list.txt
+└── val_list.txt
+```
+
+其中 `0/` 和 `1/` 分别存放 0 类和 1 类的数据。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。
+
+**备注:**
+
+* 关于 `train_list.txt`、`val_list.txt` 的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明)。
+
+
+
+
+### 3.3 模型训练
+
+
+在 `ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml
+```
+
+
+**备注:**
+
+* 由于此处使用的并非训练原模型所用的内部数据集,因此不能直接复现上文提供的模型指标,如果希望得到更高的精度,可以根据需要处理[ICDAR2019-LSVT 文本行识别数据](https://aistudio.baidu.com/aistudio/datasetdetail/8429)。
+
+
+
+### 3.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
+```
+
+其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+
+
+### 3.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+输出结果如下:
+
+```
+[{'class_ids': [0], 'scores': [1.0], 'file_name': 'deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png', 'label_names': ['0_degree']}]
+```
+
+**备注:**
+
+* 这里 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+
+
+
+## 4. 模型压缩
+
+
+
+### 4.1 SKL-UGI 知识蒸馏
+
+SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考[SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。
+
+
+
+#### 4.1.1 教师模型训练
+
+复用 `./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+        -o Arch.name=ResNet101_vd
+```
+
+当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。
+
+
+
+#### 4.1.2 蒸馏训练
+
+配置文件 `ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml` 提供了 `SKL-UGI知识蒸馏策略` 的配置。该配置将 `ResNet101_vd` 当作教师模型,`PPLCNet_x1_0` 当作学生模型。训练脚本如下:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml \
+        -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
+```
+
+当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。
+
+
+
+
+## 5. 超参搜索
+
+在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。
+
+**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。
+
+
+
+## 6. 模型推理部署
+
+
+
+### 6.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。
+
+
+
+### 6.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ./ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_textline_orientation_infer
+```
+
+执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_textline_orientation_infer` 文件夹,`models` 文件夹下应有如下文件结构:
+
+```
+├── PPLCNet_x1_0_textline_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+**备注:** 此处的最佳权重可以根据实际情况来选择,如果希望导出知识蒸馏后的权重,则最佳权重保存在 `output/DistillationModel/best_model_student.pdparams`,在导出命令中将 `-o Global.pretrained_model=xx` 中的字段更改为 `output/DistillationModel/best_model_student` 即可。
+
+
+
+### 6.1.2 直接下载 inference 模型
+
+[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddleclas.bj.bcebos.com/models/PULC/textline_orientation_infer.tar && tar -xf textline_orientation_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── textline_orientation_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 6.2 基于 Python 预测引擎推理
+
+
+
+
+#### 6.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/PULC/textline_orientation/textline_orientation_test_0_0.png` 进行文本行方向分类。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/textline_orientation/inference_textline_orientation.yaml
+# 使用下面的命令使用 CPU 进行预测
+python3.7 python/predict_cls.py -c configs/PULC/textline_orientation/inference_textline_orientation.yaml -o Global.use_gpu=False
+```
+
+输出结果如下。
+
+```
+textline_orientation_test_0_0.png: class id(s): [0], score(s): [1.00], label_name(s): ['0_degree']
+```
+
+
+
+#### 6.2.2 基于文件夹的批量预测
+
+如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
+python3.7 python/predict_cls.py -c configs/PULC/textline_orientation/inference_textline_orientation.yaml -o Global.infer_imgs="./images/PULC/textline_orientation/"
+```
+
+终端中会输出该文件夹内所有图像的分类结果,如下所示。
+
+```
+textline_orientation_test_0_0.png: class id(s): [0], score(s): [1.00], label_name(s): ['0_degree']
+textline_orientation_test_0_1.png: class id(s): [0], score(s): [1.00], label_name(s): ['0_degree']
+textline_orientation_test_1_0.png: class id(s): [1], score(s): [1.00], label_name(s): ['180_degree']
+textline_orientation_test_1_1.png: class id(s): [1], score(s): [1.00], label_name(s): ['180_degree']
+```
+
+其中,`0_degree` 表示该文本行为 0 度,`180_degree` 表示该文本行为 180 度。
+
+
+
+### 6.3 基于 C++ 预测引擎推理
+
+PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。
+
+
+
+### 6.4 服务化部署
+
+Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 
+
+PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。
+
+
+
+### 6.5 端侧部署
+
+Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。
+
+PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。
+
+
+
+### 6.6 Paddle2ONNX 模型转换与预测
+
+Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。
+
+PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。
diff --git a/docs/zh_CN/PULC/PULC_traffic_sign.md b/docs/zh_CN/PULC/PULC_traffic_sign.md
new file mode 100644
index 0000000000000000000000000000000000000000..700cbd58b89501ec8b7fe9add5bdceb373a36936
--- /dev/null
+++ b/docs/zh_CN/PULC/PULC_traffic_sign.md
@@ -0,0 +1,485 @@
+# PULC 交通标志分类模型
+
+------
+
+
+## 目录
+
+- [1. 模型和应用场景介绍](#1)
+- [2. 模型快速体验](#2)
+    - [2.1 安装 paddlepaddle](#2.1)
+    - [2.2 安装 paddleclas](#2.2)
+    - [2.3 预测](#2.3)
+- [3. 模型训练、评估和预测](#3)
+    - [3.1 环境配置](#3.1)
+    - [3.2 数据准备](#3.2)
+      - [3.2.1 数据集来源](#3.2.1)
+      - [3.2.2 数据集获取](#3.2.2)
+    - [3.3 模型训练](#3.3)
+    - [3.4 模型评估](#3.4)
+    - [3.5 模型预测](#3.5)
+- [4. 模型压缩](#4)
+  - [4.1 SKL-UGI 知识蒸馏](#4.1)
+    - [4.1.1 教师模型训练](#4.1.1)
+    - [4.1.2 蒸馏训练](#4.1.2)
+- [5. 超参搜索](#5)
+- [6. 模型推理部署](#6)
+  - [6.1 推理模型准备](#6.1)
+    - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1)
+    - [6.1.2 直接下载 inference 模型](#6.1.2)
+  - [6.2 基于 Python 预测引擎推理](#6.2)
+    - [6.2.1 预测单张图像](#6.2.1)
+    - [6.2.2 基于文件夹的批量预测](#6.2.2)
+  - [6.3 基于 C++ 预测引擎推理](#6.3)
+  - [6.4 服务化部署](#6.4)
+  - [6.5 端侧部署](#6.5)
+  - [6.6 Paddle2ONNX 模型转换与预测](#6.6)
+
+
+
+
+## 1. 模型和应用场景介绍
+
+该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的交通标志分类模型的方法。该模型可以广泛应用于自动驾驶、道路监控等场景。
+
+下表列出了不同交通标志分类模型的相关指标,前两行展现了使用 SwinTransformer_tiny 和 MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第三行至第六行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。
+
+
+| 模型 | Top-1 Acc(%) | 延时(ms) | 存储(M) | 策略 |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 98.11 | 89.45 | 111 | 使用ImageNet预训练模型 |
+| MobileNetV3_small_x0_35 | 93.88 | 3.01 | 3.9 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 97.78 | 2.10 | 8.2 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 97.84 | 2.10 | 8.2 | 使用SSLD预训练模型 |
+| PPLCNet_x1_0 | 98.14 | 2.10 | 8.2 | 使用SSLD预训练模型+EDA策略|
+| PPLCNet_x1_0 | 98.35 | 2.10 | 8.2 | 使用SSLD预训练模型+EDA策略+SKL-UGI知识蒸馏策略|
+
+从表中可以看出,backbone 为 SwinTransformer_tiny 时精度较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,但是精度下降明显。将 backbone 替换为 PPLCNet_x1_0 时,相比 MobileNetV3_small_x0_35 精度提升约 3.9 个百分点,同时速度提升 43% 左右。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升约 0.06%,进一步地,当融合EDA策略后,精度可以再提升 0.3%,最后,在使用 SKL-UGI 知识蒸馏后,精度可以继续提升 0.21%。此时,PPLCNet_x1_0 的精度超越了 SwinTransformer_tiny,速度快 41 倍。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。
+
+**备注:**
+
+* 关于PP-LCNet的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。
+
+
+
+
+## 2. 模型快速体验
+
+
+### 2.1 安装 paddlepaddle
+
+- 如果您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- 如果您的机器是 CPU,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+
+### 2.2 安装 paddleclas
+
+使用如下命令快速安装 paddleclas
+
+```
+pip3 install paddleclas
+```
+
+
+
+### 2.3 预测
+
+点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。
+
+* 使用命令行快速预测
+
+```bash
+paddleclas --model_name=traffic_sign --infer_imgs=pulc_demo_imgs/traffic_sign/100999_83928.jpg
+```
+
+结果如下:
+```
+>>> result
+class_ids: [182, 179, 162, 128, 24], scores: [0.98623, 0.01255, 0.00022, 0.00021, 0.00012], label_names: ['pl110', 'pl100', 'pl120', 'p26', 'pm10'], filename: pulc_demo_imgs/traffic_sign/100999_83928.jpg
+Predict complete!
+```
+
+**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。
+
+
+* 在 Python 代码中预测
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="traffic_sign")
+result = model.predict(input_data="pulc_demo_imgs/traffic_sign/100999_83928.jpg")
+print(next(result))
+```
+
+**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="traffic_sign", batch_size=2)`, 使用默认的代码返回结果示例如下:
+
+```
+>>> result
+[{'class_ids': [182, 179, 162, 128, 24], 'scores': [0.98623, 0.01255, 0.00022, 0.00021, 0.00012], 'label_names': ['pl110', 'pl100', 'pl120', 'p26', 'pm10'], 'filename': 'pulc_demo_imgs/traffic_sign/100999_83928.jpg'}]
+```
+
+
+
+## 3. 模型训练、评估和预测
+
+
+
+### 3.1 环境配置
+
+* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。
+
+
+
+### 3.2 数据准备
+
+
+
+#### 3.2.1 数据集来源
+
+本案例中所使用的数据为[Tsinghua-Tencent 100K dataset (CC-BY-NC license)](https://cg.cs.tsinghua.edu.cn/traffic-sign/),在使用的过程中,对交通标志检测框进行随机扩充与裁剪,从而得到用于训练与测试的图像,下面简称该数据集为`TT100K`数据集。
+
+
+
+#### 3.2.2 数据集获取
+
+在TT100K数据集上,对交通标志检测框进行随机扩充与裁剪,从而得到用于训练与测试的图像。随机扩充检测框的逻辑如下所示。
+
+```python
+import random
+
+
+def get_random_crop_box(xmin, ymin, xmax, ymax, img_height, img_width, ratio=1.0):
+    # 检测框的高与宽
+    h = ymax - ymin
+    w = xmax - xmin
+
+    # 分别计算四个方向的随机扩充量:扩充量不超过对应边长的 ratio 倍,
+    # 同时保证扩充后的框不越出图像边界
+    xmin_diff = random.random() * ratio * min(w, xmin/ratio)
+    ymin_diff = random.random() * ratio * min(h, ymin/ratio)
+    xmax_diff = random.random() * ratio * min(w, (img_width-xmax-1)/ratio)
+    ymax_diff = random.random() * ratio * min(h, (img_height-ymax-1)/ratio)
+
+    new_xmin = round(xmin - xmin_diff)
+    new_ymin = round(ymin - ymin_diff)
+    new_xmax = round(xmax + xmax_diff)
+    new_ymax = round(ymax + ymax_diff)
+
+    return new_xmin, new_ymin, new_xmax, new_ymax
+```
+
+完整的预处理逻辑,可以参考下载好的数据集文件夹中的`deal.py`文件。
+
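+为了便于理解上述扩充逻辑,下面给出一个最小的调用示意(其中图像路径与检测框坐标均为假设的示例值,实际的处理流程请以数据集中的 `deal.py` 为准):
+
+```python
+from PIL import Image
+
+img = Image.open("demo.jpg")                 # 假设的示例图像路径
+xmin, ymin, xmax, ymax = 100, 120, 180, 200  # 假设的示例检测框坐标
+# 随机扩充检测框,再裁剪出用于分类训练/测试的图像
+box = get_random_crop_box(xmin, ymin, xmax, ymax, img.height, img.width, ratio=1.0)
+img.crop(box).save("demo_crop.jpg")
+```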
+
+处理后的数据集部分数据可视化如下。
+
+ + +此处提供了经过上述方法处理好的数据,可以直接下载得到。 + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,下载并解压交通标志分类场景的数据。 + +```shell +cd dataset +wget https://paddleclas.bj.bcebos.com/data/PULC/traffic_sign.tar +tar -xf traffic_sign.tar +cd ../ +``` + +执行上述命令后,`dataset/`下存在`traffic_sign`目录,该目录中具有以下数据: + +``` +traffic_sign +├── train +│ ├── 0_62627.jpg +│ ├── 100000_89031.jpg +│ ├── 100001_89031.jpg +... +├── test +│ ├── 100423_2315.jpg +│ ├── 100424_2315.jpg +│ ├── 100425_2315.jpg +... +├── other +│ ├── 100603_3422.jpg +│ ├── 100604_3422.jpg +... +├── label_list_train.txt +├── label_list_test.txt +├── label_list_other.txt +├── label_list_train_for_distillation.txt +├── label_list_train.txt.debug +├── label_list_test.txt.debug +├── label_name_id.txt +├── deal.py +``` + +其中`train/`和`test/`分别为训练集和验证集。`label_list_train.txt`和`label_list_test.txt`分别为训练集和验证集的标签文件,`label_list_train.txt.debug`和`label_list_test.txt.debug`分别为训练集和验证集的`debug`标签文件,其分别是`label_list_train.txt`和`label_list_test.txt`的子集,用该文件可以快速体验本案例的流程。`train`与`other`的混合数据用于本案例的`SKL-UGI知识蒸馏策略`,对应的训练标签文件为`label_list_train_for_distillation.txt`。 + + +**备注:** + +* 关于 `label_list_train.txt`、`label_list_test.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 + +* 关于如何得到蒸馏的标签文件可以参考[知识蒸馏标签获得方法](../advanced_tutorials/ssld.md)。 + + + + +### 3.3 模型训练 + + +在 `ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml +``` + +验证集的最佳指标在 `98.14%` 左右(数据集较小,一般有0.1%左右的波动)。 + + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" +``` + +其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```bash +python3 tools/infer.py \ + -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model +``` + +输出结果如下: + +``` +99603_17806.jpg: class id(s): [216, 145, 49, 207, 169], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['pm20', 'pm30', 'pm40', 'pl25', 'pm15'] +``` + +**备注:** + +* 这里`-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + +* 默认是对 `deploy/images/PULC/traffic_sign/99603_17806.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 + + + +## 4. 
模型压缩 + + + +### 4.1 SKL-UGI 知识蒸馏 + +SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考[SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md#3.2)。 + + + +#### 4.1.1 教师模型训练 + +复用 `ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \ + -o Arch.name=ResNet101_vd +``` + +验证集的最佳指标为 `98.59%` 左右,当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。 + + + +#### 4.1.2 蒸馏训练 + +配置文件`ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml`提供了`SKL-UGI知识蒸馏策略`的配置。该配置将`ResNet101_vd`当作教师模型,`PPLCNet_x1_0`当作学生模型,使用ImageNet数据集的验证集作为新增的无标签数据。训练脚本如下: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml \ + -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model +``` + +验证集的最佳指标为 `98.35%` 左右,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。 + + + + +## 5. 超参搜索 + +在 [3.2 节](#3.2)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。 + +**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。 + + + +## 6. 模型推理部署 + + + +### 6.1 推理模型准备 + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 + + + +### 6.1.1 基于训练得到的权重导出 inference 模型 + +此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型: + +```bash +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model_student \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_traffic_sign_infer +``` +执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_traffic_sign_infer` 文件夹,`models` 文件夹下应有如下文件结构: + +``` +├── PPLCNet_x1_0_traffic_sign_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + +**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在`output/PPLCNet_x1_0/best_model.pdparams`中。 + + + +### 6.1.2 直接下载 inference 模型 + +[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。 + +``` +cd deploy/models +# 下载 inference 模型并解压 +wget https://paddleclas.bj.bcebos.com/models/PULC/traffic_sign_infer.tar && tar -xf traffic_sign_infer.tar +``` + +解压完毕后,`models` 文件夹下应有如下文件结构: + +``` +├── traffic_sign_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + +### 6.2 基于 Python 预测引擎推理 + + + + +#### 6.2.1 预测单张图像 + +返回 `deploy` 目录: + +``` +cd ../ +``` + +运行下面的命令,对图像 `./images/PULC/traffic_sign/99603_17806.jpg` 进行交通标志分类。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/traffic_sign/inference_traffic_sign.yaml +# 使用下面的命令使用 CPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/traffic_sign/inference_traffic_sign.yaml -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +99603_17806.jpg: class id(s): [216, 145, 49, 207, 169], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['pm20', 'pm30', 'pm40', 'pl25', 'pm15'] +``` + + + +#### 6.2.2 基于文件夹的批量预测 + 
+如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_cls.py -c configs/PULC/traffic_sign/inference_traffic_sign.yaml -o Global.infer_imgs="./images/PULC/traffic_sign/" +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +100999_83928.jpg: class id(s): [182, 179, 162, 128, 24], score(s): [0.99, 0.01, 0.00, 0.00, 0.00], label_name(s): ['pl110', 'pl100', 'pl120', 'p26', 'pm10'] +99603_17806.jpg: class id(s): [216, 145, 49, 24, 169], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['pm20', 'pm30', 'pm40', 'pm10', 'pm15'] +``` + +输出的 `label_name`可以从`dataset/traffic_sign/report.pdf`文件中查阅对应的图片。 + + + +### 6.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/PULC/PULC_train.md b/docs/zh_CN/PULC/PULC_train.md new file mode 100644 index 0000000000000000000000000000000000000000..035535c7f9eb04af952c628fca85cedaaffc97b8 --- /dev/null +++ b/docs/zh_CN/PULC/PULC_train.md @@ -0,0 +1,241 @@ +## 超轻量图像分类方案PULC +------ + + +## 目录 + +- [1. PULC方案简介](#1) +- [2. 数据准备](#2) + - [2.1 数据集格式说明](#2.1) + - [2.2 标注文件生成](#2.2) +- [3. 使用标准分类配置进行训练](#3) + - [3.1 骨干网络PP-LCNet](#3.1) + - [3.2 SSLD预训练权重](#3.2) + - [3.3 EDA数据增强策略](#3.3) + - [3.4 SKL-UGI模型蒸馏](#3.4) + - [3.5 总结](#3.5) +- [4. 超参搜索](#4) + - [4.1 基于默认配置搜索](#4.1) + - [4.2 自定义搜索配置](#4.2) + + + +### 1. PULC方案简介 + +图像分类是计算机视觉的基础算法之一,是企业应用中最常见的算法,也是许多 CV 应用的重要组成部分。近年来,骨干网络模型发展迅速,ImageNet 的精度纪录被不断刷新。然而,这些模型在实用场景的表现有时却不尽如人意。一方面,精度高的模型往往体积大,运算慢,常常难以满足实际部署需求;另一方面,选择了合适的模型之后,往往还需要经验丰富的工程师进行调参,费时费力。PaddleClas 为了解决企业应用难题,让分类模型的训练和调参更加容易,总结推出了实用轻量图像分类解决方案(PULC, Practical Ultra Lightweight Classification)。PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。 + +PULC 方案在人、车、OCR等方向的多个场景中均验证有效,用超轻量模型就可实现与 SwinTransformer 模型接近的精度,预测速度提高 40+ 倍。 + +
+ +
+
+方案主要包括 4 部分,分别是:PP-LCNet轻量级骨干网络、SSLD预训练权重、数据增强策略集成(EDA)和 SKL-UGI 知识蒸馏算法。此外,我们还采用了超参搜索的方法,高效优化训练中的超参数。下面,我们以有人/无人场景为例,对方案进行说明。
+
+**备注**:针对一些特定场景,我们提供了基础的训练文档供参考,例如[有人/无人分类模型](PULC_person_exists.md)等,您可以在[这里](./PULC_model_list.md)找到这些文档。如果这些文档中的方法不能满足您的需求,或者您需要自定义训练任务,您可以参考本文档。
+
+
+
+### 2. 数据准备
+
+
+
+#### 2.1 数据集格式说明
+
+PaddleClas 使用 `txt` 格式文件指定训练集和测试集,以有人/无人场景为例,其中需要指定 `train_list.txt` 和 `val_list.txt` 当作训练集和验证集的数据标签,格式形如:
+
+```
+# 每一行采用"空格"分隔图像路径与标注
+train/1.jpg 0
+train/10.jpg 1
+...
+```
+
+如果您想获取更多常用分类数据集的信息,可以参考文档 [PaddleClas 分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。
+
+
+
+#### 2.2 标注文件生成
+
+如果您已经有实际场景中的数据,那么按照上节的格式进行标注即可。这里,我们提供了一个快速生成数据标注的脚本,您只需要将不同类别的数据分别放在文件夹中,运行脚本即可生成标注文件。
+
+首先,假设您存放数据的路径为`./train`,`train/` 中包含了每个类别的数据,类别号从 0 开始,每个类别的文件夹中有具体的图像数据。
+
+```shell
+train
+├── 0
+│   ├── 0.jpg
+│   ├── 1.jpg
+│   └── ...
+└── 1
+    ├── 0.jpg
+    ├── 1.jpg
+    └── ...
+└── ...
+```
+
+```shell
+tree -r -i -f train | grep -E "jpg|JPG|jpeg|JPEG|png|PNG" | awk -F "/" '{print $0" "$2}' > train_list.txt
+```
+
+其中,如果涉及更多的图片名称后缀,可以增加 `grep -E`后的内容, `$2` 中的 `2` 为类别号文件夹的层级。
+
+**备注:** 以上为数据集获取和生成的方法介绍,这里您可以直接下载有人/无人场景数据快速开始体验。
+
+进入 PaddleClas 目录。
+
+```
+cd path_to_PaddleClas
+```
+
+进入 `dataset/` 目录,下载并解压有人/无人场景的数据。
+
+```shell
+cd dataset
+wget https://paddleclas.bj.bcebos.com/data/PULC/person_exists.tar
+tar -xf person_exists.tar
+cd ../
+```
+
+
+
+### 3. 使用标准分类配置进行训练
+
+
+
+#### 3.1 骨干网络PP-LCNet
+
+PULC 采用了轻量骨干网络 PP-LCNet,相比同精度竞品速度快 50%,您可以在[PP-LCNet介绍](../models/PP-LCNet.md)查阅该骨干网络的详细介绍。
+直接使用 PP-LCNet 训练的命令为:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml
+```
+
+为了方便性能对比,我们也提供了大模型 SwinTransformer_tiny 和轻量模型 MobileNetV3_small_x0_35 的配置文件,您可以使用命令训练:
+
+SwinTransformer_tiny:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/SwinTransformer_tiny_patch4_window7_224.yaml
+```
+
+MobileNetV3_small_x0_35:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/PULC/person_exists/MobileNetV3_small_x0_35.yaml
+```
+
+训练得到的模型精度对比如下表。
+
+| 模型 | Tpr(%) | 延时(ms) | 存储(M) | 策略 |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 95.69 | 95.30 | 107 | 使用 ImageNet 预训练模型 |
+| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | 使用 ImageNet 预训练模型 |
+| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | 使用 ImageNet 预训练模型 |
+
+从中可以看出,PP-LCNet 的速度比 SwinTransformer 快很多,但是精度也略低。下面我们通过一系列优化来提高 PP-LCNet 模型的精度。
+
+
+
+#### 3.2 SSLD预训练权重
+
+SSLD 是百度自研的半监督蒸馏算法,在 ImageNet 数据集上,模型精度可以提升 3-7 个点,您可以在 [SSLD 介绍](../advanced_tutorials/ssld.md)找到详细介绍。我们发现,使用SSLD预训练权重,可以有效提升应用分类模型的精度。此外,在训练中使用更小的分辨率,可以有效提升模型精度。同时,我们也对学习率进行了优化。
+基于以上三点改进,我们训练得到模型精度为 92.1%,提升约 2.5%。
+
+
+
+#### 3.3 EDA数据增强策略
+
+数据增强是视觉算法中常用的优化策略,可以对模型精度有明显提升。除了传统的 RandomCrop,RandomFlip 等方法之外,我们还应用了 RandAugment 和 RandomErasing。您可以在[数据增强介绍](../advanced_tutorials/DataAugmentation.md)找到详细介绍。
+由于这两种数据增强对图片的修改较大,使分类任务变难,在一些小数据集上可能会导致模型欠拟合,我们将提前设置好这两种方法启用的概率。
+基于以上改进,我们训练得到模型精度为 93.43%,提升 1.3%。
+
+
+
+#### 3.4 SKL-UGI模型蒸馏
+
+模型蒸馏是一种可以有效提升小模型精度的方法,您可以在[知识蒸馏介绍](../advanced_tutorials/ssld.md)找到详细介绍。我们选择 ResNet101_vd 作为教师模型进行蒸馏。为了适应蒸馏过程,我们在此也对网络不同 stage 的学习率进行了调整。基于以上改进,我们训练得到模型精度为 95.6%,提升约 2.2%。
+
+
+
+#### 3.5 总结
+
+经过以上方法优化,PP-LCNet最终精度达到 95.6%,达到了大模型的精度水平。我们将实验结果总结如下表:
+
+| 模型 | Tpr(%) | 延时(ms) | 存储(M) | 策略 |
+|-------|-----------|----------|---------------|---------------|
+| SwinTransformer_tiny | 95.69 | 95.30 | 107 | 使用 ImageNet 预训练模型 |
+| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | 使用 ImageNet 预训练模型 |
+| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | 使用 ImageNet 预训练模型 |
+| PPLCNet_x1_0 | 92.10 | 2.12 | 6.5 | 使用 SSLD 预训练模型 |
+| PPLCNet_x1_0 | 93.43 | 2.12 | 6.5 | 使用 SSLD 预训练模型+EDA 策略|
+| PPLCNet_x1_0 | 95.60 | 2.12 | 6.5 | 使用 SSLD 预训练模型+EDA 策略+SKL-UGI 知识蒸馏策略|
+
+我们在其他 8 个场景中也使用了同样的优化策略,得到如下结果:
+
+| 场景 | 大模型 | 大模型精度(%) | 小模型 | 小模型精度(%) |
+|----------|----------|----------|----------|----------|
+| 人体属性识别 | Res2Net200_vd | 81.25 | PPLCNet_x1_0 | 78.59 |
+| 佩戴安全帽分类 | Res2Net200_vd| 98.92 | PPLCNet_x1_0 |99.38 |
+| 交通标志分类 | SwinTransformer_tiny | 98.11 | PPLCNet_x1_0 | 98.35 |
+| 车辆属性识别 | Res2Net200_vd_26w_4s | 91.36 | PPLCNet_x1_0 | 90.81 |
+| 有车/无车分类 | SwinTransformer_tiny | 97.71 | PPLCNet_x1_0 | 95.92 |
+| 含文字图像方向分类 | SwinTransformer_tiny |99.12 | PPLCNet_x1_0 | 99.06 |
+| 文本行方向分类 | SwinTransformer_tiny | 93.61 | PPLCNet_x1_0 | 96.01 |
+| 语种分类 | SwinTransformer_tiny | 98.12 | PPLCNet_x1_0 | 99.26 |
+
+
+从结果可以看出,PULC 方案在多个应用场景中均可提升模型精度。使用 PULC 方案可以大大减少模型优化的工作量,快速得到精度较高的模型。
+
+
+
+### 4. 超参搜索
+
+在上述训练过程中,我们调节了学习率、数据增广方法开启概率、分阶段学习率倍数等参数。
+这些参数在不同场景中最优值可能并不相同。我们提供了一个快速超参搜索的脚本,将超参调优的过程自动化。
+这个脚本会遍历搜索值列表中的参数来替代默认配置中的参数,依次训练,最终选择精度最高的模型所对应的参数作为搜索结果。
+
+
+
+#### 4.1 基于默认配置搜索
+
+配置文件 [search.yaml](../../../ppcls/configs/PULC/person_exists/search.yaml) 定义了有人/无人场景超参搜索的配置,使用如下命令即可完成超参数的搜索。
+
+```bash
+python3 tools/search_strategy.py -c ppcls/configs/PULC/person_exists/search.yaml
+```
+
+**备注**:关于搜索部分,我们也在不断优化,敬请期待。
+
+
+
+#### 4.2 自定义搜索配置
+
+您也可以根据训练结果或调参经验,修改超参搜索的配置。
+
+修改 `lrs` 中的`search_values`字段,可以修改学习率搜索值列表;
+
+修改 `resolutions` 中的 `search_values` 字段,可以修改分辨率的搜索值列表;
+
+修改 `ra_probs` 中的 `search_values` 字段,可以修改 RandAugment 开启概率的搜索值列表;
+
+修改 `re_probs` 中的 `search_values` 字段,可以修改 RandomErasing 开启概率的搜索值列表;
+
+修改 `lr_mult_list` 中的 `search_values` 字段,可以修改 lr_mult 搜索值列表;
+
+修改 `teacher` 中的 `search_values` 字段,可以修改教师模型的搜索列表。
+
+搜索完成后,会在 `output/search_person_exists` 中生成最终的结果。其中,除 `search_res` 外的目录保存了每组搜索超参数对应训练结果的权重和训练日志文件;`search_res` 目录对应蒸馏后的结果,也就是最终的模型,该模型的权重保存在`output/output_dir/search_person_exists/DistillationModel/best_model_student.pdparams`。
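+
+作为参考,上述各字段在搜索配置文件中的组织形式大致如下(以下仅为一个示意片段,各字段的具体取值均为假设的示例,完整的字段结构请以 `ppcls/configs/PULC/person_exists/search.yaml` 为准):
+
+```yaml
+lrs:
+  search_values: [0.0075, 0.01, 0.0125]    # 学习率搜索值列表(示例值)
+resolutions:
+  search_values: [176, 192, 224]           # 分辨率搜索值列表(示例值)
+ra_probs:
+  search_values: [0.0, 0.1, 0.5]           # RandAugment 开启概率搜索值列表(示例值)
+re_probs:
+  search_values: [0.0, 0.1, 0.5]           # RandomErasing 开启概率搜索值列表(示例值)
+lr_mult_list:
+  search_values: [[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]]  # 分阶段学习率倍数搜索值列表(示例值)
+teacher:
+  search_values: ["ResNet101_vd", "ResNet50_vd"]   # 教师模型搜索列表(示例值)
+```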
diff --git a/docs/zh_CN/PULC/PULC_vehicle_attribute.md b/docs/zh_CN/PULC/PULC_vehicle_attribute.md
new file mode 100644
index 0000000000000000000000000000000000000000..35b731f324236f4b9bcade4074c4a7afd21b9e8e
--- /dev/null
+++ b/docs/zh_CN/PULC/PULC_vehicle_attribute.md
@@ -0,0 +1,477 @@
+# PULC 车辆属性识别模型
+
+------
+
+
+## 目录
+
+- [1. 模型和应用场景介绍](#1)
+- [2. 模型快速体验](#2)
+    - [2.1 安装 paddlepaddle](#2.1)
+    - [2.2 安装 paddleclas](#2.2)
+    - [2.3 预测](#2.3)
+- [3. 模型训练、评估和预测](#3)
+    - [3.1 环境配置](#3.1)
+    - [3.2 数据准备](#3.2)
+      - [3.2.1 数据集来源](#3.2.1)
+      - [3.2.2 数据集获取](#3.2.2)
+    - [3.3 模型训练](#3.3)
+    - [3.4 模型评估](#3.4)
+    - [3.5 模型预测](#3.5)
+- [4. 模型压缩](#4)
+  - [4.1 SKL-UGI 知识蒸馏](#4.1)
+    - [4.1.1 教师模型训练](#4.1.1)
+    - [4.1.2 蒸馏训练](#4.1.2)
+- [5. 超参搜索](#5)
+- [6. 模型推理部署](#6)
+  - [6.1 推理模型准备](#6.1)
+    - [6.1.1 基于训练得到的权重导出 inference 模型](#6.1.1)
+    - [6.1.2 直接下载 inference 模型](#6.1.2)
+  - [6.2 基于 Python 预测引擎推理](#6.2)
+    - [6.2.1 预测单张图像](#6.2.1)
+    - [6.2.2 基于文件夹的批量预测](#6.2.2)
+  - [6.3 基于 C++ 预测引擎推理](#6.3)
+  - [6.4 服务化部署](#6.4)
+  - [6.5 端侧部署](#6.5)
+  - [6.6 Paddle2ONNX 模型转换与预测](#6.6)
+
+
+
+
+## 1. 模型和应用场景介绍
+
+该案例提供了用户使用 PaddleClas 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)快速构建轻量级、高精度、可落地的车辆属性识别模型的方法。该模型可以广泛应用于车辆识别、道路监控等场景。
+
+下表列出了不同车辆属性识别模型的相关指标,前三行展现了使用 Res2Net200_vd_26w_4s、 ResNet50、MobileNetV3_small_x0_35 作为 backbone 训练得到的模型的相关指标,第四行至第七行依次展现了替换 backbone 为 PPLCNet_x1_0、使用 SSLD 预训练模型、使用 SSLD 预训练模型 + EDA 策略、使用 SSLD 预训练模型 + EDA 策略 + SKL-UGI 知识蒸馏策略训练得到的模型的相关指标。
+
+
+| 模型 | mA(%) | 延时(ms) | 存储(M) | 策略 |
+|-------|-----------|----------|---------------|---------------|
+| Res2Net200_vd_26w_4s | 91.36 | 79.46 | 293 | 使用ImageNet预训练模型 |
+| ResNet50 | 89.98 | 12.83 | 92 | 使用ImageNet预训练模型 |
+| MobileNetV3_small_x0_35 | 87.41 | 2.91 | 2.8 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 89.57 | 2.36 | 7.2 | 使用ImageNet预训练模型 |
+| PPLCNet_x1_0 | 90.07 | 2.36 | 7.2 | 使用SSLD预训练模型 |
+| PPLCNet_x1_0 | 90.59 | 2.36 | 7.2 | 使用SSLD预训练模型+EDA策略|
+| PPLCNet_x1_0 | 90.81 | 2.36 | 7.2 | 使用SSLD预训练模型+EDA策略+SKL-UGI知识蒸馏策略|
+
+从表中可以看出,backbone 为 Res2Net200_vd_26w_4s 时精度较高,但是推理速度较慢。将 backbone 替换为轻量级模型 MobileNetV3_small_x0_35 后,速度可以大幅提升,但是精度下降明显。将 backbone 替换为 PPLCNet_x1_0 时,精度提升 2 个百分点,同时速度也提升 23% 左右。在此基础上,使用 SSLD 预训练模型后,在不改变推理速度的前提下,精度可以提升约 0.5 个百分点,进一步地,当融合EDA策略后,精度可以再提升 0.52 个百分点,最后,在使用 SKL-UGI 知识蒸馏后,精度可以继续提升 0.23 个百分点。此时,PPLCNet_x1_0 的精度与 Res2Net200_vd_26w_4s 仅相差 0.55 个百分点,但是速度快 32 倍。关于 PULC 的训练方法和推理部署方法将在下面详细介绍。
+
+**备注:**
+
+* 延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为10。
+* 关于PP-LCNet的介绍可以参考[PP-LCNet介绍](../models/PP-LCNet.md),相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。
+
+
+
+
+## 2. 模型快速体验
+
+
+
+### 2.1 安装 paddlepaddle
+
+- 如果您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- 如果您的机器是 CPU,请运行以下命令安装
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多的版本需求,请参照[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
+
+
+
+### 2.2 安装 paddleclas
+
+使用如下命令快速安装 paddleclas
+
+```
+pip3 install paddleclas
+```
+
+
+
+### 2.3 预测
+
+点击[这里](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip)下载 demo 数据并解压,然后在终端中切换到相应目录。
+
+* 使用命令行快速预测
+
+```bash
+paddleclas --model_name=vehicle_attribute --infer_imgs=pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg
+```
+
+结果如下:
+```
+>>> result
+attributes: Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505), output: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], filename: pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg
+Predict complete!
+```
+
+**备注**: 更换其他预测的数据时,只需要改变 `--infer_imgs=xx` 中的字段即可,支持传入整个文件夹。
+
+
+* 在 Python 代码中预测
+```python
+import paddleclas
+model = paddleclas.PaddleClas(model_name="vehicle_attribute")
+result = model.predict(input_data="pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg")
+print(next(result))
+```
+
+**备注**:`model.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果, 默认 `batch_size` 为 1,如果需要更改 `batch_size`,实例化模型时,需要指定 `batch_size`,如 `model = paddleclas.PaddleClas(model_name="vehicle_attribute", batch_size=2)`, 使用默认的代码返回结果示例如下:
+
+```
+>>> result
+[{'attributes': 'Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 'filename': 'pulc_demo_imgs/vehicle_attribute/0002_c002_00030670_0.jpg'}]
+```
+
+
+
+
+## 3. 
模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + + + +#### 3.2.1 数据集来源 + +本案例中所使用的数据为[VeRi 数据集](https://www.v7labs.com/open-datasets/veri-dataset)。 + + + +#### 3.2.2 数据集获取 + +部分数据可视化如下所示。 + +
+ +
+ +首先从[VeRi数据集官网](https://www.v7labs.com/open-datasets/veri-dataset)中申请并下载数据,放在PaddleClas的`dataset`目录下,数据集目录名为`VeRi`,使用下面的命令进入该文件夹。 + +```shell +cd PaddleClas/dataset/VeRi/ +``` + +然后使用下面的代码转换label(可以在python终端中执行下面的命令,也可以将其写入一个文件,然后使用`python3 convert.py`的方式运行该文件)。 + + +```python +import os +from xml.dom.minidom import parse + +vehicleids = [] + +def convert_annotation(input_fp, output_fp): + in_file = open(input_fp) + list_file = open(output_fp, 'w') + tree = parse(in_file) + + root = tree.documentElement + + for item in root.getElementsByTagName("Item"): + label = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0'] + if item.hasAttribute("imageName"): + name = item.getAttribute("imageName") + if item.hasAttribute("vehicleID"): + vehicleid = item.getAttribute("vehicleID") + if vehicleid not in vehicleids : + vehicleids.append(vehicleid) + vid = vehicleids.index(vehicleid) + if item.hasAttribute("colorID"): + colorid = int (item.getAttribute("colorID")) + label[colorid-1] = '1' + if item.hasAttribute("typeID"): + typeid = int (item.getAttribute("typeID")) + label[typeid+9] = '1' + label = ','.join(label) + list_file.write(os.path.join('image_train', name) + "\t" + label + "\n") + + list_file.close() + +convert_annotation('train_label.xml', 'train_list.txt') #imagename vehiclenum colorid typeid +convert_annotation('test_label.xml', 'test_list.txt') +``` + +执行上述命令后,`VeRi`目录中具有以下数据: + +``` +VeRi +├── image_train +│ ├── 0001_c001_00016450_0.jpg +│ ├── 0001_c001_00016460_0.jpg +│ ├── 0001_c001_00016470_0.jpg +... +├── image_test +│ ├── 0002_c002_00030600_0.jpg +│ ├── 0002_c002_00030605_1.jpg +│ ├── 0002_c002_00030615_1.jpg +... +... +├── train_list.txt +├── test_list.txt +├── train_label.xml +├── test_label.xml +``` + +其中`train/`和`test/`分别为训练集和验证集。`train_list.txt`和`test_list.txt`分别为训练集和验证集的转换后用于训练的标签文件。 + + + + +### 3.3 模型训练 + + +在 `ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml` 中提供了基于该场景的训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml +``` + +验证集的最佳指标在 `90.59%` 左右(数据集较小,一般有0.3%左右的波动)。 + + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model="output/PPLCNet_x1_0/best_model" +``` + +其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```bash +python3 tools/infer.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model +``` + +输出结果如下: + +``` +[{'attr': 'Color: (yellow, prob: 0.9893478155136108), Type: (hatchback, prob: 0.9734100103378296)', 'pred': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 'file_name': './deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg'}] +``` + +**备注:** + +* 这里`-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + +* 默认是对 `./deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 + + + +## 4. 
模型压缩 + + + +### 4.1 SKL-UGI 知识蒸馏 + +SKL-UGI 知识蒸馏是 PaddleClas 提出的一种简单有效的知识蒸馏方法,关于该方法的介绍,可以参考[SKL-UGI 知识蒸馏](../advanced_tutorials/ssld.md)。 + + + +#### 4.1.1 教师模型训练 + +复用 `ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml` 中的超参数,训练教师模型,训练脚本如下: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Arch.name=ResNet101_vd +``` + +验证集的最佳指标为 `91.60%` 左右,当前教师模型最好的权重保存在 `output/ResNet101_vd/best_model.pdparams`。 + + + +#### 4.1.2 蒸馏训练 + +配置文件`ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml`提供了`SKL-UGI知识蒸馏策略`的配置。该配置将`ResNet101_vd`当作教师模型,`PPLCNet_x1_0`当作学生模型。训练脚本如下: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml \ + -o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model +``` + +验证集的最佳指标为 `90.81%` 左右,当前模型最好的权重保存在 `output/DistillationModel/best_model_student.pdparams`。 + + + + +## 5. 超参搜索 + +在 [3.3 节](#3.3)和 [4.1 节](#4.1)所使用的超参数是根据 PaddleClas 提供的 `超参数搜索策略` 搜索得到的,如果希望在自己的数据集上得到更好的结果,可以参考[超参数搜索策略](PULC_train.md#4-超参搜索)来获得更好的训练超参数。 + +**备注:** 此部分内容是可选内容,搜索过程需要较长的时间,您可以根据自己的硬件情况来选择执行。如果没有更换数据集,可以忽略此节内容。 + + + +## 6. 模型推理部署 + + + +### 6.1 推理模型准备 + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 + + + +### 6.1.1 基于训练得到的权重导出 inference 模型 + +此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型: + +```bash +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model_student \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_infer +``` +执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_vehicle_attributeibute_infer` 文件夹,`models` 文件夹下应有如下文件结构: + +``` +└── PPLCNet_x1_0_vehicle_attribute_infer + ├── inference.pdiparams + ├── inference.pdiparams.info + └── inference.pdmodel +``` + +**备注:** 此处的最佳权重是经过知识蒸馏后的权重路径,如果没有执行知识蒸馏的步骤,最佳模型保存在`output/PPLCNet_x1_0/best_model.pdparams`中。 + + + +### 6.1.2 直接下载 inference 模型 + +[6.1.1 小节](#6.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。 + +``` +cd deploy/models +# 下载 inference 模型并解压 +wget https://paddleclas.bj.bcebos.com/models/PULC/vehicle_attribute_infer.tar && tar -xf vehicle_attribute_infer.tar +``` + +解压完毕后,`models` 文件夹下应有如下文件结构: + +``` +├── vehicle_attribute_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + +### 6.2 基于 Python 预测引擎推理 + + + + +#### 6.2.1 预测单张图像 + +返回 `deploy` 目录: + +``` +cd ../ +``` + +运行下面的命令,对图像 `./images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg` 进行车辆属性识别。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml -o Global.use_gpu=True +# 使用下面的命令使用 CPU 进行预测 +python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893478155136108), Type: (hatchback, prob: 
0.9734099507331848)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]} +``` + + + +#### 6.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehicle_attribute.yaml -o Global.infer_imgs="./images/PULC/vehicle_attribute/" +``` + +终端中会输出该文件夹内所有图像的属性识别结果,如下所示。 + +``` +0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]} +0014_c012_00040750_0.jpg: {'attributes': 'Color: (red, prob: 0.999872088432312), Type: (sedan, prob: 0.999976634979248)', 'output': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]} +``` + + + +### 6.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 6.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 6.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 6.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/advanced_tutorials/DataAugmentation.md b/docs/zh_CN/advanced_tutorials/DataAugmentation.md index 9e5159a4a148b75fa28d1cd774a8e8498e6da460..7097ff637b9f204f19d596445b2d0376e7b52d3b 100644 --- a/docs/zh_CN/advanced_tutorials/DataAugmentation.md +++ b/docs/zh_CN/advanced_tutorials/DataAugmentation.md @@ -1,33 +1,149 @@ # 数据增强分类实战 --- -本节将基于 ImageNet-1K 的数据集详细介绍数据增强实验,如果想快速体验此方法,可以参考 [**30 分钟玩转 PaddleClas(进阶版)**](../quick_start/quick_start_classification_professional.md)中基于 CIFAR100 的数据增强实验。如果想了解相关算法的内容,请参考[数据增强算法介绍](../algorithm_introduction/DataAugmentation.md)。 - - ## 目录 -- [1. 参数配置](#1) - - [1.1 AutoAugment](#1.1) - - [1.2 RandAugment](#1.2) - - [1.3 TimmAutoAugment](#1.3) - - [1.4 Cutout](#1.4) - - [1.5 RandomErasing](#1.5) - - [1.6 HideAndSeek](#1.6) - - [1.7 GridMask](#1.7) - - [1.8 Mixup](#1.8) - - [1.9 Cutmix](#1.9) - - [1.10 Mixup 与 Cutmix 同时使用](#1.10) -- [2. 启动命令](#2) -- [3. 注意事项](#3) -- [4. 实验结果](#4) +- [1. 
算法介绍](#1) + - [1.1 数据增强简介](#1.1) + - [1.2 图像变换类数据增强](#1.2) + - [1.2.1 AutoAugment](#1.2.1) + - [1.2.1.1 AutoAugment 算法介绍](#1.2.1.1) + - [1.2.1.2 AutoAugment 配置](#1.2.1.2) + - [1.2.2 RandAugment](#1.2.2) + - [1.2.2.1 RandAugment 算法介绍](#1.2.2.1) + - [1.2.2.2 RandAugment 配置](#1.2.2.2) + - [1.2.3 TimmAutoAugment](#1.2.3) + - [1.2.3.1 TimmAutoAugment 算法介绍](#1.2.3.1) + - [1.2.3.2 TimmAutoAugment 配置](#1.2.3.2) + - [1.3 图像裁剪类数据增强](#1.3) + - [1.3.1 Cutout](#1.3.1) + - [1.3.1.1 Cutout 算法介绍](#1.3.1.1) + - [1.3.1.2 Cutout 配置](#1.3.1.2) + - [1.3.2 RandomErasing](#1.3.2) + - [1.3.2.1 RandomErasing 算法介绍](#1.3.2.1) + - [1.3.2.2 RandomErasing 配置](#1.3.2.2) + - [1.3.3 HideAndSeek](#1.3.3) + - [1.3.3.1 HideAndSeek 算法介绍](#1.3.3.1) + - [1.3.3.2 HideAndSeek 配置](#1.3.3.2) + - [1.3.4 GridMask](#1.3.4) + - [1.3.4.1 GridMask 算法介绍](#1.3.4.1) + - [1.3.4.2 GridMask 配置](#1.3.4.2) + - [1.4 图像混叠类数据增强](#1.4) + - [1.4.1 Mixup](#1.4.1) + - [1.4.1.1 Mixup 算法介绍](#1.4.1.1) + - [1.4.1.2 Mixup 配置](#1.4.1.2) + - [1.4.2 Cutmix](#1.4.2) + - [1.4.2.1 Cutmix 算法介绍](#1.4.2.1) + - [1.4.2.2 Cutmix 配置](#1.4.2.2) + - [1.4.2.3 Mixup 和 Cutmix 混合使用配置](#1.4.2.3) +- [2. 模型训练、评估和预测](#2) + - [2.1 环境配置](#2.1) + - [2.2 数据准备](#2.2) + - [2.3 模型训练](#2.3) + - [2.4 模型评估](#2.4) + - [2.5 模型预测](#2.5) +- [3. 参考文献](#4) + -## 1. 参数配置 -由于不同的数据增强方式含有不同的超参数,为了便于理解和使用,我们在 `configs/DataAugment` 里分别列举了 8 种训练 ResNet50 的数据增强方式的参数配置文件,用户可以在 `tools/run.sh` 里直接替换配置文件的路径即可使用。此处分别挑选了图像变换、图像裁剪、图像混叠中的一个示例展示,其他参数配置用户可以自查配置文件。 +## 1. 算法介绍 + +在图像分类任务中,图像数据的增广是一种常用的正则化方法,常用于数据量不足或者模型参数较多的场景。在本章节中,我们将对除 ImageNet 分类任务标准数据增强外的 8 种数据增强方式进行简单的介绍和对比,用户也可以将这些增广方法应用到自己的任务中,以获得模型精度的提升。这 8 种数据增强方式在 ImageNet 上的精度指标如下所示。 + +![](../../images/image_aug/main_image_aug.png) + +更具体的指标如下表所示: + + +| 模型 | 初始学习率策略 | l2 decay | batch size | epoch | 数据变化策略 | Top1 Acc | 论文中结论 | +|-------------|------------------|--------------|------------|-------|----------------|------------|----| +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | 标准变换 | 0.7731 | - | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | AutoAugment | 0.7795 | 0.7763 | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | mixup | 0.7828 | 0.7790 | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutmix | 0.7839 | 0.7860 | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutout | 0.7801 | - | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | gridmask | 0.7785 | 0.7790 | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random-augment | 0.7770 | 0.7760 | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random erasing | 0.7791 | - | +| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | hide and seek | 0.7743 | 0.7720 | -### 1.1 AutoAugment + +### 1.1. 数据增强简介 + +如果没有特殊说明,本章节中所有示例为 ImageNet 分类,并且假设最终输入网络的数据维度为:`[batch-size, 3, 224, 224]` + +其中 ImageNet 分类训练阶段的标准数据增强方式分为以下几个步骤: + +1. 图像解码:简写为 `ImageDecode` +2. 随机裁剪到长宽均为 224 的图像:简写为 `RandCrop` +3. 水平方向随机翻转:简写为 `RandFlip` +4. 图像数据的归一化:简写为 `Normalize` +5. 图像数据的重排,`[224, 224, 3]` 变为 `[3, 224, 224]`:简写为 `Transpose` +6. 多幅图像数据组成 batch 数据,如 `batch-size` 个 `[3, 224, 224]` 的图像数据拼组成 `[batch-size, 3, 224, 224]`:简写为 `Batch` + +相比于上述标准的图像增广方法,研究者也提出了很多改进的图像增广策略,这些策略均是在标准增广方法的不同阶段插入一定的操作,基于这些策略操作所处的不同阶段,我们将其分为了三类: + +1. 对 `RandCrop` 后的 224 的图像进行一些变换: AutoAugment,RandAugment +2. 对 `Transpose` 后的 224 的图像进行一些裁剪: CutOut,RandErasing,HideAndSeek,GridMask +3. 对 `Batch` 后的数据进行混合: Mixup,Cutmix + +增广后的可视化效果如下所示。 + +![](../../images/image_aug/image_aug_samples_s.jpg) + +具体如下表所示: + + +| 变换方法 | 输入 | 输出 | Auto-
Augment\[1\] | Rand-
Augment\[2\] | CutOut\[3\] | Rand
Erasing\[4\] | HideAnd-
Seek\[5\] | GridMask\[6\] | Mixup\[7\] | Cutmix\[8\] | +|-------------|---------------------------|---------------------------|------------------|------------------|-------------|------------------|------------------|---------------|------------|------------| +| Image
Decode | Binary | (224, 224, 3)
uint8 | Y | Y | Y | Y | Y | Y | Y | Y | +| RandCrop | (:, :, 3)
uint8 | (224, 224, 3)
uint8 | Y | Y | Y | Y | Y | Y | Y | Y | +| **Process** | (224, 224, 3)
uint8 | (224, 224, 3)
uint8 | Y | Y | \- | \- | \- | \- | \- | \- | +| RandFlip | (224, 224, 3)
uint8 | (224, 224, 3)
float32 | Y | Y | Y | Y | Y | Y | Y | Y | +| Normalize | (224, 224, 3)
uint8 | (3, 224, 224)
float32 | Y | Y | Y | Y | Y | Y | Y | Y | +| Transpose | (224, 224, 3)
float32 | (3, 224, 224)
float32 | Y | Y | Y | Y | Y | Y | Y | Y | +| **Process** | (3, 224, 224)
float32 | (3, 224, 224)
float32 | \- | \- | Y | Y | Y | Y | \- | \- | +| Batch | (3, 224, 224)
float32 | (N, 3, 224, 224)
float32 | Y | Y | Y | Y | Y | Y | Y | Y | +| **Process** | (N, 3, 224, 224)
float32 | (N, 3, 224, 224)
float32 | \- | \- | \- | \- | \- | \- | Y | Y | + + +PaddleClas 中集成了上述所有的数据增强策略,每种数据增强策略的参考论文与参考开源代码均在下面的介绍中列出。下文将介绍这些策略的原理与使用方法,并以下图为例,对变换后的效果进行可视化。为了说明问题,本章节中将 `RandCrop` 替换为 `Resize`。 + +![][test_baseline] + + + +### 1.2 图像变换类 + +图像变换类指的是对 `RandCrop` 后的 224 的图像进行一些变换,主要包括 + ++ AutoAugment ++ RandAugment ++ TimmAutoAugment + + + +#### 1.2.1 AutoAugment + + + +##### 1.2.1.1 AutoAugment 算法介绍 + +论文地址:[https://arxiv.org/abs/1805.09501v1](https://arxiv.org/abs/1805.09501v1) + +开源代码 github 地址:[https://github.com/DeepVoltaire/AutoAugment](https://github.com/DeepVoltaire/AutoAugment) + +不同于常规的人工设计图像增广方式,AutoAugment 是在一系列图像增广子策略的搜索空间中通过搜索算法找到的适合特定数据集的图像增广方案。针对 ImageNet 数据集,最终搜索出来的数据增强方案包含 25 个子策略组合,每个子策略中都包含两种变换,针对每幅图像都随机的挑选一个子策略组合,然后以一定的概率来决定是否执行子策略中的每种变换。 + +经过 AutoAugment 数据增强后结果如下图所示。 + +![][test_autoaugment] + + + +##### 1.2.1.2 AutoAugment 配置 `AotoAugment` 的图像增广方式的配置如下。`AutoAugment` 是在 uint8 的数据格式上转换的,所以其处理过程应该放在归一化操作(`NormalizeImage`)之前。 @@ -48,8 +164,31 @@ order: '' ``` - -### 1.2 RandAugment + + +#### 1.2.2 RandAugment + + + +##### 1.2.2.1 RandAugment 算法介绍 + +论文地址:[https://arxiv.org/pdf/1909.13719.pdf](https://arxiv.org/pdf/1909.13719.pdf) + +开源代码 github 地址:[https://github.com/heartInsert/randaugment](https://github.com/heartInsert/randaugment) + + +`AutoAugment` 的搜索方法比较暴力,直接在数据集上搜索针对该数据集的最优策略,其计算量很大。在 `RandAugment` 文章中作者发现,一方面,针对越大的模型,越大的数据集,使用 `AutoAugment` 方式搜索到的增广方式产生的收益也就越小;另一方面,这种搜索出的最优策略是针对该数据集的,其迁移能力较差,并不太适合迁移到其他数据集上。 + +在 `RandAugment` 中,作者提出了一种随机增广的方式,不再像 `AutoAugment` 中那样使用特定的概率确定是否使用某种子策略,而是所有的子策略都会以同样的概率被选择到,论文中的实验也表明这种数据增强方式即使在大模型的训练中也具有很好的效果。 + + +经过 RandAugment 数据增强后结果如下图所示。 + +![][test_randaugment] + + + +##### 1.2.2.2 RandAugment 配置 `RandAugment` 的图像增广方式的配置如下,其中用户需要指定其中的参数 `num_layers` 与 `magnitude`,默认的数值分别是 `2` 和 `5`。`RandAugment` 是在 uint8 的数据格式上转换的,所以其处理过程应该放在归一化操作(`NormalizeImage`)之前。 @@ -72,8 +211,21 @@ order: '' ``` - -### 1.3 TimmAutoAugment + + +#### 1.2.3 TimmAutoAugment + + + +##### 1.2.3.1 TimmAutoAugment 算法介绍 + +开源代码 github 地址:[https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/auto_augment.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/auto_augment.py) + +`TimmAutoAugment` 是开源作者对 AutoAugment 和 RandAugment 的改进,事实证明,其在很多视觉任务上有更好的表现,目前绝大多数 VisionTransformer 模型都是基于 TimmAutoAugment 去实现的。 + + + +##### 1.2.3.2 TimmAutoAugment 配置 `TimmAutoAugment` 的图像增广方式的配置如下,其中用户需要指定其中的参数 `config_str`、`interpolation`、`img_size`,默认的数值分别是 `rand-m9-mstd0.5-inc1`、`bicubic`、`224`。`TimmAutoAugment` 是在 uint8 的数据格式上转换的,所以其处理过程应该放在归一化操作(`NormalizeImage`)之前。 @@ -97,8 +249,43 @@ order: '' ``` - -### 1.4 Cutout + + +### 1.3 图像裁剪类 + +图像裁剪类主要是对 `Transpose` 后的 224 的图像进行一些裁剪,并将裁剪区域的像素值置为特定的常数(默认为 0),主要包括: + ++ CutOut ++ RandErasing ++ HideAndSeek ++ GridMask + +图像裁剪的这些增广并非一定要放在归一化之后,也有不少实现是放在归一化之前的,也就是直接对 uint8 的图像进行操作,两种方式的差别是:如果直接对 uint8 的图像进行操作,那么再经过归一化之后被裁剪的区域将不再是纯黑或纯白(减均值除方差之后像素值不为 0)。而对归一后之后的数据进行操作,裁剪的区域会是纯黑或纯白。 + +上述的裁剪变换思路是相同的,都是为了解决训练出的模型在有遮挡数据上泛化能力较差的问题,不同的是他们的裁剪方式、区域不太一样。 + + + +#### 1.3.1 Cutout + + + +##### 1.3.1.1 Cutout 算法介绍 + +论文地址:[https://arxiv.org/abs/1708.04552](https://arxiv.org/abs/1708.04552) + +开源代码 github 地址:[https://github.com/uoguelph-mlrg/Cutout](https://github.com/uoguelph-mlrg/Cutout) + +Cutout 可以理解为 Dropout 的一种扩展操作,不同的是 Dropout 是对图像经过网络后生成的特征进行遮挡,而 Cutout 是直接对输入的图像进行遮挡,相对于 Dropout 对噪声的鲁棒性更好。作者在论文中也进行了说明,这样做法有以下两点优势:(1)通过 Cutout 可以模拟真实场景中主体被部分遮挡时的分类场景;(2)可以促进模型充分利用图像中更多的内容来进行分类,防止网络只关注显著性的图像区域,从而发生过拟合。 + + +经过 RandAugment 数据增强后结果如下图所示。 + +![][test_cutout] + + + +##### 1.3.1.2 Cutout 配置 `Cutout` 
的图像增广方式的配置如下,其中用户需要指定其中的参数 `n_holes` 与 `length`,默认的数值分别是 `1` 和 `112`。类似其他图像裁剪类的数据增强方式,`Cutout` 既可以在 uint8 格式的数据上操作,也可以在归一化)(`NormalizeImage`)后的数据上操作,此处给出的是在归一化后的操作。 @@ -121,8 +308,31 @@ length: 112 ``` - -### 1.5 RandomErasing + + + +#### 1.3.2 RandomErasing + + + +##### 1.3.2.1 RandomErasing 算法介绍 + +论文地址:[https://arxiv.org/pdf/1708.04896.pdf](https://arxiv.org/pdf/1708.04896.pdf) + +开源代码 github 地址:[https://github.com/zhunzhong07/Random-Erasing](https://github.com/zhunzhong07/Random-Erasing) + +`RandomErasing` 与 `Cutout` 方法类似,同样是为了解决训练出的模型在有遮挡数据上泛化能力较差的问题,作者在论文中也指出,随机裁剪的方式与随机水平翻转具有一定的互补性。作者也在行人再识别(REID)上验证了该方法的有效性。与 `Cutout` 不同的是,在 `RandomErasing` 中,图片以一定的概率接受该种预处理方法,生成掩码的尺寸大小与长宽比也是根据预设的超参数随机生成。 + + +PaddleClas 中 `RandomErasing` 的使用方法如下所示。 + +经过 RandomErasing 数据增强后结果如下图所示。 + +![][test_randomerassing] + + + +##### 1.3.2.2 RandomErasing 配置 `RandomErasing` 的图像增广方式的配置如下,其中用户需要指定其中的参数 `EPSILON`、`sl`、`sh`、`r1`、`attempt`、`use_log_aspect`、`mode`,默认的数值分别是 `0.25`、`0.02`、`1.0/3.0`、`0.3`、`10`、`True`、`pixel`。类似其他图像裁剪类的数据增强方式,`RandomErasing` 既可以在 uint8 格式的数据上操作,也可以在归一化(`NormalizeImage`)后的数据上操作,此处给出的是在归一化后的操作。 @@ -150,8 +360,35 @@ mode: pixel ``` - -### 1.6 HideAndSeek + + +#### 1.3.3 HideAndSeek + + + +##### 1.3.3.1 HideAndSeek 算法介绍 + +论文地址:[https://arxiv.org/pdf/1811.02545.pdf](https://arxiv.org/pdf/1811.02545.pdf) + +开源代码 github 地址:[https://github.com/kkanshul/Hide-and-Seek](https://github.com/kkanshul/Hide-and-Seek) + + +`HideAndSeek` 论文将图像分为若干块区域(patch),对于每块区域,都以一定的概率生成掩码,不同区域的掩码含义如下图所示。 + + +![][hide_and_seek_mask_expanation] + + +PaddleClas 中 `HideAndSeek` 的使用方法如下所示。 + + +经过 HideAndSeek 数据增强后结果如下图所示。 + +![][test_hideandseek] + + + +##### 1.3.3.2 HideAndSeek 配置 `HideAndSeek` 的图像增广方式的配置如下。类似其他图像裁剪类的数据增强方式,`HideAndSeek` 既可以在 uint8 格式的数据上操作,也可以在归一化(`NormalizeImage`)后的数据上操作,此处给出的是在归一化后的操作。 @@ -172,9 +409,43 @@ - HideAndSeek: ``` - + + +#### 1.3.4 GridMask + + + +##### 1.3.4.1 GridMask 算法介绍 + +论文地址:[https://arxiv.org/abs/2001.04086](https://arxiv.org/abs/2001.04086) + +开源代码 github 地址:[https://github.com/akuxcw/GridMask](https://github.com/akuxcw/GridMask) -### 1.7 GridMask + +作者在论文中指出,此前存在的基于对图像 crop 的方法存在两个问题,如下图所示: + +1. 过度删除区域可能造成目标主体大部分甚至全部被删除,或者导致上下文信息的丢失,导致增广后的数据成为噪声数据; +2. 保留过多的区域,对目标主体及上下文基本产生不了什么影响,失去增广的意义。 + +![][gridmask-0] + +因此如果避免过度删除或过度保留成为需要解决的核心问题。 + + +`GridMask` 是通过生成一个与原图分辨率相同的掩码,并将掩码进行随机翻转,与原图相乘,从而得到增广后的图像,通过超参数控制生成的掩码网格的大小。 + + +在训练过程中,有两种以下使用方法: +1. 设置一个概率 p,从训练开始就对图片以概率 p 使用 `GridMask` 进行增广。 +2. 
一开始设置增广概率为 0,随着迭代轮数增加,对训练图片进行 `GridMask` 增广的概率逐渐增大,最后变为 p。 + +论文中验证上述第二种方法的训练效果更好一些。 + +经过 GridMask 数据增强后结果如下图所示。 + + + +##### 1.3.4.2 GridMask 配置 `GridMask` 的图像增广方式的配置如下,其中用户需要指定其中的参数 `d1`、`d2`、`rotate`、`ratio`、`mode`, 默认的数值分别是 `96`、`224`、`1`、`0.5`、`0`。类似其他图像裁剪类的数据增强方式,`GridMask` 既可以在 uint8 格式的数据上操作,也可以在归一化(`NormalizeImage`)后的数据上操作,此处给出的是在归一化后的操作。 @@ -200,8 +471,43 @@ mode: 0 ``` - -### 1.8 Mixup +![][test_gridmask] + + + +### 1.4 图像混叠类 + +图像混叠主要对 `Batch` 后的数据进行混合,包括: + ++ Mixup ++ Cutmix + +前文所述的图像变换与图像裁剪都是针对单幅图像进行的操作,而图像混叠是对两幅图像进行融合,生成一幅图像,两种方法的主要区别为混叠的方式不太一样。 + + + +#### 1.4.1 Mixup + + + +##### 1.4.1.1 Mixup 算法介绍 + +论文地址:[https://arxiv.org/pdf/1710.09412.pdf](https://arxiv.org/pdf/1710.09412.pdf) + +开源代码 github 地址:[https://github.com/facebookresearch/mixup-cifar10](https://github.com/facebookresearch/mixup-cifar10) + +Mixup 是最先提出的图像混叠增广方案,其原理简单、方便实现,不仅在图像分类上,在目标检测上也取得了不错的效果。为了便于实现,通常只对一个 batch 内的数据进行混叠,在 `Cutmix` 中也是如此。 + +如下是 `imaug` 中的实现,需要指出的是,下述实现会出现对同一幅进行相加的情况,也就是最终得到的图和原图一样,随着 `batch-size` 的增加这种情况出现的概率也会逐渐减小。 + + +经过 Mixup 数据增强结果如下图所示。 + +![][test_mixup] + + + +##### 1.4.1.2 Mixup 配置 `Mixup` 的图像增广方式的配置如下,其中用户需要指定其中的参数 `alpha`,默认的数值是 `0.2`。类似其他图像混合类的数据增强方式,`Mixup` 是在图像做完数据处理后将每个 batch 内的数据做图像混叠,将混叠后的图像和标签输入网络中训练,所以其是在图像数据处理(图像变换、图像裁剪)后操作。 @@ -224,8 +530,26 @@ alpha: 0.2 ``` - -### 1.9 Cutmix + +#### 1.4.2 Cutmix + + + +##### 1.4.2.1 Cutmix 算法介绍 + +论文地址:[https://arxiv.org/pdf/1905.04899v2.pdf](https://arxiv.org/pdf/1905.04899v2.pdf) + +开源代码 github 地址:[https://github.com/clovaai/CutMix-PyTorch](https://github.com/clovaai/CutMix-PyTorch) + +与 `Mixup` 直接对两幅图进行相加不一样,`Cutmix` 是从一幅图中随机裁剪出一个 `ROI`,然后覆盖当前图像中对应的区域,代码实现如下所示: + +经过 Cutmix 数据增强后结果如下图所示。 + +![][test_cutmix] + + + +##### 1.4.2.2 Cutmix 配置 `Cutmix` 的图像增广方式的配置如下,其中用户需要指定其中的参数 `alpha`,默认的数值是 `0.2`。类似其他图像混合类的数据增强方式,`Cutmix` 是在图像做完数据处理后将每个 batch 内的数据做图像混叠,将混叠后的图像和标签输入网络中训练,所以其是在图像数据处理(图像变换、图像裁剪)后操作。 @@ -248,8 +572,9 @@ alpha: 0.2 ``` - -### 1.10 Mixup 与 Cutmix 同时使用 + + +##### 1.4.2.3 Mixup 和 Cutmix 混合使用配置 `Mixup` 与 `Cutmix` 同时使用的配置如下,其中用户需要指定额外的参数 `prob`,该参数控制不同数据增强的概率,默认为 `0.5`。 @@ -277,55 +602,149 @@ ``` -## 2. 启动命令 -当用户配置完训练环境后,类似于训练其他分类任务,只需要将 `tools/train.sh` 中的配置文件替换成为相应的数据增强方式的配置文件即可。 +## 2. 模型训练、评估和预测 + + -其中 `train.sh` 中的内容如下: +### 2.1 环境配置 -```bash +* 安装:请先参考 [Paddle 安装教程](../installation/install_paddle.md) 以及 [PaddleClas 安装教程](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 2.2 数据准备 + +请在[ImageNet 官网](https://www.image-net.org/)准备 ImageNet-1k 相关的数据。 + + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,将下载好的数据命名为 `ILSVRC2012` ,存放于此。 `ILSVRC2012` 目录中具有以下数据: + +``` +├── train +│   ├── n01440764 +│   │   ├── n01440764_10026.JPEG +│   │   ├── n01440764_10027.JPEG +├── train_list.txt +... 
+
├── val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+├── val_list.txt
+```
+
+其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。
+
+**备注:**
+* 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。
+
+
+
+
+### 2.3 模型训练
+
+
+在 `ppcls/configs/ImageNet/DataAugment` 中提供了基于 ResNet50 的不同的数据增强的训练配置,这里以使用 `AutoAugment` 为例,介绍数据增强的使用方法。可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ppcls/configs/ImageNet/DataAugment/ResNet50_AutoAugment.yaml
+```
+
+
+**备注:**
+
+* 1.当前精度最佳的模型会保存在 `output/ResNet50/best_model.pdparams`。
+* 2.如需更改数据增强类型,只需要更换为 `ppcls/configs/ImageNet/DataAugment` 中的其他配置文件即可。
+* 3.如果希望多种数据增强混合使用,请参考[第 1 节](#1)中的相关配置更改配置文件中的数据增强即可。
+* 4.由于图像混叠时需对 label 进行混叠,无法计算训练数据的准确率,所以在训练过程中没有打印训练准确率。
+* 5.在使用数据增强后,由于训练数据更难,所以训练损失函数可能较大,训练集的准确率相对较低,但其拥有更好的泛化能力,所以验证集的准确率相对较高。
+* 6.在使用数据增强后,模型可能会趋于欠拟合状态,建议可以适当的调小 `l2_decay` 的值来获得更高的验证集准确率。
+* 7.几乎每一类图像增强均含有超参数,我们只提供了基于 ImageNet-1k 的超参数,其他数据集需要用户自己调试超参数,具体超参数的含义用户可以阅读相关的论文,调试方法也可以参考[训练技巧](../models_training/train_strategy.md)。
+
+
+
+### 2.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ppcls/configs/ImageNet/DataAugment/ResNet50_AutoAugment.yaml \
+    -o Global.pretrained_model=output/ResNet50/best_model
+```
+
+其中 `-o Global.pretrained_model="output/ResNet50/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+
+
+### 2.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```bash
+python3 tools/infer.py \
+    -c ppcls/configs/ImageNet/DataAugment/ResNet50_AutoAugment.yaml \
+    -o Global.pretrained_model=output/ResNet50/best_model
+```
+
+输出结果如下:
+
+```
+[{'class_ids': [8, 7, 86, 81, 85], 'scores': [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ptarmigan', 'quail']}]
+```
+
+**备注:**
+
+* 这里`-o Global.pretrained_model="output/ResNet50/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+* 默认输出的是 Top-5 的值,如果希望输出 Top-k 的值,可以指定`-o Infer.PostProcess.topk=k`,其中,`k` 为您指定的值。
+
+
+
+
+## 3. 参考文献
+
+[1] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
+
+[2] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: Practical automated data augmentation with a reduced search space[J]. arXiv preprint arXiv:1909.13719, 2019.
+
+[3] DeVries T, Taylor G W. Improved regularization of convolutional neural networks with cutout[J]. arXiv preprint arXiv:1708.04552, 2017.
+
+[4] Zhong Z, Zheng L, Kang G, et al. Random erasing data augmentation[J]. arXiv preprint arXiv:1708.04896, 2017.
+
+[5] Singh K K, Lee Y J. 
Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization[C]//2017 IEEE international conference on computer vision (ICCV). IEEE, 2017: 3544-3553. + +[6] Chen P. GridMask Data Augmentation[J]. arXiv preprint arXiv:2001.04086, 2020. + +[7] Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond empirical risk minimization[J]. arXiv preprint arXiv:1710.09412, 2017. + +[8] Yun S, Han D, Oh S J, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 6023-6032. -基于 PaddleClas,在 ImageNet1k 数据集上的分类精度如下。 -| 模型 | 初始学习率策略 | l2 decay | batch size | epoch | 数据变化策略 | Top1 Acc | 论文中结论 | -|-------------|------------------|--------------|------------|-------|----------------|------------|----| -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | 标准变换 | 0.7731 | - | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | AutoAugment | 0.7795 | 0.7763 | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | mixup | 0.7828 | 0.7790 | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutmix | 0.7839 | 0.7860 | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutout | 0.7801 | - | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | gridmask | 0.7785 | 0.7790 | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random-augment | 0.7770 | 0.7760 | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random erasing | 0.7791 | - | -| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | hide and seek | 0.7743 | 0.7720 | -**注意**: -* 在这里的实验中,为了便于对比,我们将 l2 decay 固定设置为 1e-4,在实际使用中,我们推荐尝试使用更小的 l2 decay。结合数据增强,我们发现将 l2 decay 由 1e-4 减小为 7e-5 均能带来至少 0.3~0.5% 的精度提升。 -* 我们目前尚未对不同策略进行组合并验证效果,这一块后续我们会开展更多的对比实验,敬请期待。 +[test_baseline]: ../../images/image_aug/test_baseline.jpeg +[test_autoaugment]: ../../images/image_aug/test_autoaugment.jpeg +[test_cutout]: ../../images/image_aug/test_cutout.jpeg +[test_gridmask]: ../../images/image_aug/test_gridmask.jpeg +[gridmask-0]: ../../images/image_aug/gridmask-0.png +[test_hideandseek]: ../../images/image_aug/test_hideandseek.jpeg +[test_randaugment]: ../../images/image_aug/test_randaugment.jpeg +[test_randomerassing]: ../../images/image_aug/test_randomerassing.jpeg +[hide_and_seek_mask_expanation]: ../../images/image_aug/hide-and-seek-visual.png +[test_mixup]: ../../images/image_aug/test_mixup.png +[test_cutmix]: ../../images/image_aug/test_cutmix.png diff --git a/docs/zh_CN/advanced_tutorials/knowledge_distillation.md b/docs/zh_CN/advanced_tutorials/knowledge_distillation.md index d3e6d77cf254a933fd6e6776e361f2c499b5c14d..6224e82a79eb39cc62641c66acd8bbc0133070ce 100644 --- a/docs/zh_CN/advanced_tutorials/knowledge_distillation.md +++ b/docs/zh_CN/advanced_tutorials/knowledge_distillation.md @@ -1,209 +1,411 @@ -# 知识蒸馏 +# 知识蒸馏实战 ## 目录 - - [1. 模型压缩与知识蒸馏方法简介](#1) - - [2. SSLD 蒸馏策略](#2) - - [2.1 简介](#2.1) - - [2.2 数据选择](#2.2) - - [3. 实验](#3) - - [3.1 教师模型的选择](#3.1) - - [3.2 大数据蒸馏](#3.2) - - [3.3 ImageNet1k 训练集 finetune](#3.3) - - [3.4 数据增广以及基于 Fix 策略的微调](#3.4) - - [3.5 实验过程中的一些问题](#3.5) - - [4. 蒸馏模型的应用](#4) - - [4.1 使用方法](#4.1) - - [4.2 迁移学习 finetune](#4.2) - - [4.3 目标检测](#4.3) - - [5. SSLD 实战](#5) - - [5.1 参数配置](#5.1) - - [5.2 启动命令](#5.2) - - [5.3 注意事项](#5.3) - - [6. 参考文献](#6) + +- [1. 
算法介绍](#1)
+  - [1.1 知识蒸馏简介](#1.1)
+    - [1.1.1 Response based distillation](#1.1.1)
+    - [1.1.2 Feature based distillation](#1.1.2)
+    - [1.1.3 Relation based distillation](#1.1.3)
+  - [1.2 PaddleClas支持的知识蒸馏算法](#1.2)
+    - [1.2.1 SSLD](#1.2.1)
+    - [1.2.2 DML](#1.2.2)
+    - [1.2.3 UDML](#1.2.3)
+    - [1.2.4 AFD](#1.2.4)
+    - [1.2.5 DKD](#1.2.5)
+- [2. 使用方法](#2)
+  - [2.1 环境配置](#2.1)
+  - [2.2 数据准备](#2.2)
+  - [2.3 模型训练](#2.3)
+  - [2.4 模型评估](#2.4)
+  - [2.5 模型预测](#2.5)
+  - [2.6 模型导出与推理](#2.6)
+- [3. 参考文献](#3)
+
+
+
+
+## 1. 算法介绍
+
+
+
+### 1.1 知识蒸馏简介
+
+近年来,深度神经网络在计算机视觉、自然语言处理等领域被验证是一种极其有效的解决问题的方法。通过构建合适的神经网络,加以训练,最终网络模型的性能指标基本上都会超过传统算法。
+
+在数据量足够大的情况下,通过合理构建网络模型的方式增加其参数量,可以显著改善模型性能,但是这又带来了模型复杂度急剧提升的问题。大模型在实际场景中使用的成本较高。
+
+深度神经网络一般有较多的参数冗余,目前有几种主要的方法对模型进行压缩,减小其参数量。如裁剪、量化、知识蒸馏等,其中知识蒸馏是指使用教师模型(teacher model)去指导学生模型(student model)学习特定任务,保证小模型在参数量不变的情况下,得到比较大的性能提升,甚至获得与大模型相似的精度指标 [1]。
+
+根据蒸馏方式的不同,可以将知识蒸馏方法分为3个不同的类别:Response based distillation、Feature based distillation、Relation based distillation。下面进行详细介绍。
+
+
+
+#### 1.1.1 Response based distillation
+
+
+最早的知识蒸馏算法 KD,由 Hinton 提出,训练的损失函数中除了 gt loss 之外,还引入了学生模型与教师模型输出的 KL 散度,最终精度超过单纯使用 gt loss 训练的精度。这里需要注意的是,在训练的时候,需要首先训练得到一个更大的教师模型,来指导学生模型的训练过程。
+
+PaddleClas 中提出了一种简单实用的 SSLD 知识蒸馏算法 [6],在训练的时候去除了对 gt label 的依赖,结合大量无标注数据,最终蒸馏训练得到的预训练模型在 15 个模型上的精度提升平均高达 3%。
+
+上述标准的蒸馏方法是通过一个大模型作为教师模型来指导学生模型提升效果,而后来又发展出 DML(Deep Mutual Learning)互学习蒸馏方法 [7],即通过两个结构相同的模型互相学习。相比于 KD 等依赖于大的教师模型的知识蒸馏算法,DML 脱离了对大的教师模型的依赖,蒸馏训练的流程更加简单,模型产出效率也要更高一些。
+
+
+
+#### 1.1.2 Feature based distillation
+
+Heo 等人提出了 OverHaul [8],计算学生模型与教师模型的 feature map distance,作为蒸馏的 loss,并在计算时对学生模型、教师模型的特征进行变换,来保证二者的 feature map 可以正常地进行 distance 的计算。
+
+基于 feature map distance 的知识蒸馏方法也能够和 `1.1.1 章节` 中的基于 response 的知识蒸馏算法融合在一起,同时对学生模型的输出结果和中间层 feature map 进行监督。而对于 DML 方法来说,这种融合过程更为简单,因为不需要对学生和教师模型的 feature map 进行转换,便可以完成对齐(alignment)过程。PP-OCRv2 系统中便使用了这种方法,最终大幅提升了 OCR 文字识别模型的精度。
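+
+为了便于理解上述两类蒸馏方式,下面给出一个极简的损失函数计算示意(基于 PaddlePaddle 编写;其中温度系数 `T`、损失权重 `alpha` 以及"学生与教师特征形状已对齐"均为假设的示例设置,并非 PaddleClas 中的实际实现):
+
+```python
+import paddle.nn.functional as F
+
+
+def distillation_loss(student_logits, teacher_logits,
+                      student_feat, teacher_feat, T=4.0, alpha=0.5):
+    # response-based:对学生、教师输出的软标签计算 KL 散度,乘以 T^2 以保持梯度量级
+    soft_teacher = F.softmax(teacher_logits / T, axis=-1)
+    log_soft_student = F.log_softmax(student_logits / T, axis=-1)
+    kd_loss = F.kl_div(log_soft_student, soft_teacher, reduction="mean") * T * T
+
+    # feature-based:约束学生与教师中间层特征的距离,此处以 MSE 为例,
+    # 并假设二者的 feature map 形状已经对齐(实际中可能需要额外的变换层)
+    feat_loss = F.mse_loss(student_feat, teacher_feat)
+
+    return alpha * kd_loss + (1 - alpha) * feat_loss
+```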
map 进行转换,便可以完成对齐(alignment)过程。PP-OCRv2 系统中便使用了这种方法,最终大幅提升了 OCR 文字识别模型的精度。 - -### 2.2 数据选择 + -* SSLD 蒸馏方案的一大特色就是无需使用图像的真值标签,因此可以任意扩展数据集的大小,考虑到计算资源的限制,我们在这里仅基于 ImageNet22k 数据集对蒸馏任务的训练集进行扩充。在 SSLD 蒸馏任务中,我们使用了 `Top-k per class` 的数据采样方案 [3] 。具体步骤如下。 - * 训练集去重。我们首先基于 SIFT 特征相似度匹配的方式对 ImageNet22k 数据集与 ImageNet1k 验证集进行去重,防止添加的 ImageNet22k 训练集中包含 ImageNet1k 验证集图像,最终去除了 4511 张相似图片。部分过滤的相似图片如下所示。 +#### 1.1.3 Relation based distillation - ![](../../images/distillation/22k_1k_val_compare_w_sift.png) +[1.1.1](#1.1.1) 和 [1.1.2](#1.1.2) 章节中的论文中主要是考虑到学生模型与教师模型的输出或者中间层 feature map,这些知识蒸馏算法只关注个体的输出结果,没有考虑到个体之间的输出关系。 - * 大数据集 soft label 获取,对于去重后的 ImageNet22k 数据集,我们使用 `ResNeXt101_32x16d_wsl` 模型进行预测,得到每张图片的 soft label 。 - * Top-k 数据选择,ImageNet1k 数据共有 1000 类,对于每一类,找出属于该类并且得分最高的 `k` 张图片,最终得到一个数据量不超过 `1000*k` 的数据集(某些类上得到的图片数量可能少于 `k` 张)。 - * 将该数据集与 ImageNet1k 的训练集融合组成最终蒸馏模型所使用的数据集,数据量为 500 万。 +Park 等人提出了 RKD [10],基于关系的知识蒸馏算法,RKD 中进一步考虑个体输出之间的关系,使用 2 种损失函数,二阶的距离损失(distance-wise)和三阶的角度损失(angle-wise) - -## 3. 实验 -* PaddleClas 的蒸馏策略为`大数据集训练 + ImageNet1k 蒸馏 finetune` 的策略。选择合适的教师模型,首先在挑选得到的 500 万数据集上进行训练,然后在 ImageNet1k 训练集上进行 finetune,最终得到蒸馏后的学生模型。 +本论文提出的算法关系知识蒸馏(RKD)迁移教师模型得到的输出结果间的结构化关系给学生模型,不同于之前的只关注个体输出结果,RKD 算法使用两种损失函数:二阶的距离损失(distance-wise)和三阶的角度损失(angle-wise)。在最终计算蒸馏损失函数的时候,同时考虑 KD loss 和 RKD loss。最终精度优于单独使用 KD loss 蒸馏得到的模型精度。 + + + +### 1.2 PaddleClas支持的知识蒸馏算法 + + + +#### 1.2.1 SSLD + +##### 1.2.1.1 SSLD 算法介绍 + +论文信息: + +> [Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones +](https://arxiv.org/abs/2103.05959) +> +> Cheng Cui, Ruoyu Guo, Yuning Du, Dongliang He, Fu Li, Zewu Wu, Qiwen Liu, Shilei Wen, Jizhou Huang, Xiaoguang Hu, Dianhai Yu, Errui Ding, Yanjun Ma +> +> arxiv, 2021 + +SSLD是百度于2021年提出的一种简单的半监督知识蒸馏方案,通过设计一种改进的JS散度作为损失函数,结合基于ImageNet22k数据集的数据挖掘策略,最终帮助15个骨干网络模型的精度平均提升超过3%。 - -### 3.1 教师模型的选择 +更多关于SSLD的原理、模型库与使用介绍,请参考:[SSLD知识蒸馏算法介绍](./ssld.md)。 -为了验证教师模型和学生模型的模型大小差异和教师模型的模型精度对蒸馏结果的影响,我们做了几组实验验证。训练策略统一为:`cosine_decay_warmup,lr=1.3, epoch=120, bs=2048`,学生模型均为从头训练。 -|Teacher Model | Teacher Top1 | Student Model | Student Top1| -|- |:-: |:-: | :-: | -| ResNeXt101_32x16d_wsl | 84.2% | MobileNetV3_large_x1_0 | 75.78% | -| ResNet50_vd | 79.12% | MobileNetV3_large_x1_0 | 75.60% | -| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% | +##### 1.2.1.2 SSLD 配置 +SSLD配置如下所示。在模型构建Arch字段中,需要同时定义学生模型与教师模型,教师模型固定梯度,并且加载预训练参数。在损失函数Loss字段中,需要定义`DistillationDMLLoss`,作为训练的损失函数。 -从表中可以看出 +```yaml +# model architecture +Arch: + name: "DistillationModel" # 模型名称,这里使用的是蒸馏模型, + class_num: &class_num 1000 # 类别数量,对于ImageNet1k数据集来说,类别数为1000 + pretrained_list: # 预训练模型列表,因为在下面的子网络中指定了预训练模型,这里无需指定 + freeze_params_list: # 固定网络参数列表,为True时,表示固定该index对应的网络 + - True + - False + infer_model_name: "Student" # 在模型导出的时候,会导出Student子网络 + models: # 子网络列表 + - Teacher: # 教师模型 + name: ResNet50_vd # 模型名称 + class_num: *class_num # 类别数 + pretrained: True # 预训练模型路径,如果为True,则会从官网下载默认的预训练模型 + use_ssld: True # 是否使用SSLD蒸馏得到的预训练模型,精度会更高一些 + - Student: # 学生模型 + name: PPLCNet_x2_5 # 模型名称 + class_num: *class_num # 类别数 + pretrained: False # 预训练模型路径,可以指定为bool值或者字符串,这里为False,表示学生模型默认不加载预训练模型 + +# loss function config for traing/eval process +Loss: # 定义损失函数 + Train: # 定义训练的损失函数,为列表形式 + - DistillationDMLLoss: # 蒸馏的DMLLoss,对DMLLoss进行封装,支持蒸馏结果(dict形式)的损失函数计算 + weight: 1.0 # loss权重 + model_name_pairs: # 用于计算的模型对,这里表示计算Student和Teacher输出的损失函数 + - ["Student", "Teacher"] + Eval: # 定义评估时的损失函数 + - CELoss: + weight: 1.0 +``` -> 教师模型结构相同时,其精度越高,最终的蒸馏效果也会更好一些。 + + +#### 1.2.2 DML + +##### 1.2.2.1 DML 算法介绍 + +论文信息: + +> [Deep 
Mutual Learning](https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Deep_Mutual_Learning_CVPR_2018_paper.html) +> +> Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu > -> 教师模型与学生模型的模型大小差异不宜过大,否则反而会影响蒸馏结果的精度。 +> CVPR, 2018 + +DML论文中,在蒸馏的过程中,不依赖于教师模型,两个结构相同的模型互相学习,计算彼此输出(logits)的KL散度,最终完成训练过程。 + + +在ImageNet1k公开数据集上,效果如下所示。 + +| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | +| --- | --- | --- | --- | --- | +| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - | +| DML | PPLCNet_x2_5 | [PPLCNet_x2_5_dml.yaml](../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml) | 76.68%(**+1.75%**) | - | + + +* 注:完整的PPLCNet_x2_5模型训练了360epoch,这里为了方便对比,baseline和DML均训练了100epoch,因此指标比官网最终开源出来的模型精度(76.60%)低一些。 + + +##### 1.2.2.2 DML 配置 + +DML配置如下所示。在模型构建Arch字段中,需要同时定义学生模型与教师模型,教师模型与学生模型均保持梯度更新状态。在损失函数Loss字段中,需要定义`DistillationDMLLoss`(学生与教师之间的JS-Div loss)以及`DistillationGTCELoss`(学生与教师关于真值标签的CE loss),作为训练的损失函数。 + +```yaml +Arch: + name: "DistillationModel" + class_num: &class_num 1000 + pretrained_list: + freeze_params_list: # 两个模型互相学习,因此这里两个子网络的参数均不能固定 + - False + - False + models: + - Teacher: + name: PPLCNet_x2_5 # 两个模型互学习,因此均没有加载预训练模型 + class_num: *class_num + pretrained: False + - Student: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + +Loss: + Train: + - DistillationGTCELoss: # 因为2个子网络均没有加载预训练模型,这里需要同时计算不同子网络的输出与真值标签之间的CE loss + weight: 1.0 + model_names: ["Student", "Teacher"] + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 +``` + + + +#### 1.2.3 UDML + +##### 1.2.3.1 UDML 算法介绍 + +论文信息: + +UDML 是百度飞桨视觉团队提出的无需依赖教师模型的知识蒸馏算法,它基于DML进行改进,在蒸馏的过程中,除了考虑两个模型的输出信息,也考虑两个模型的中间层特征信息,从而进一步提升知识蒸馏的精度。更多关于UDML的说明与应用,请参考[PP-ShiTu论文](https://arxiv.org/abs/2111.00775)以及[PP-OCRv3论文](https://arxiv.org/abs/2109.03144)。 -因此最终在蒸馏实验中,对于 ResNet 系列学生模型,我们使用 `ResNeXt101_32x16d_wsl` 作为教师模型;对于 MobileNet 系列学生模型,我们使用蒸馏得到的 `ResNet50_vd` 作为教师模型。 - -### 3.2 大数据蒸馏 +在ImageNet1k公开数据集上,效果如下所示。 -基于 PaddleClas 的蒸馏策略为`大数据集训练 + imagenet1k finetune` 的策略。 +| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | +| --- | --- | --- | --- | --- | +| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - | +| UDML | PPLCNet_x2_5 | [PPLCNet_x2_5_dml.yaml](../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml) | 76.74%(**+1.81%**) | - | + + +##### 1.2.3.2 UDML 配置 + + +```yaml +Arch: + name: "DistillationModel" + class_num: &class_num 1000 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - False + - False + models: + - Teacher: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + # return_patterns表示除了返回输出的logits,也会返回对应名称的中间层feature map + return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"] + - Student: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"] + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + key: logits + model_names: ["Student", "Teacher"] + - DistillationDMLLoss: + weight: 1.0 + key: logits + model_name_pairs: + - ["Student", "Teacher"] + - DistillationDistanceLoss: # 基于蒸馏结果的距离loss,这里默认使用l2 loss计算block5之间的损失函数 + weight: 1.0 + key: "blocks5" + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 +``` -针对从 ImageNet22k 挑选出的 400 
万数据,融合 imagenet1k 训练集,组成共 500 万的训练集进行训练,具体地,在不同模型上的训练超参及效果如下。 +**注意(:** 上述在网络中指定`return_patterns`,返回中间层特征的功能是基于TheseusLayer,更多关于TheseusLayer的使用说明,请参考:[TheseusLayer 使用说明](./theseus_layer.md)。 -|Student Model | num_epoch | l2_ecay | batch size/gpu cards | base lr | learning rate decay | top1 acc | -| - |:-: |:-: | :-: |:-: |:-: |:-: | -| MobileNetV1 | 360 | 3e-5 | 4096/8 | 1.6 | cosine_decay_warmup | 77.65% | -| MobileNetV2 | 360 | 1e-5 | 3072/8 | 0.54 | cosine_decay_warmup | 76.34% | -| MobileNetV3_large_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 78.54% | -| MobileNetV3_small_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 70.11% | -| ResNet50_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 82.07% | -| ResNet101_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 83.41% | -| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 84.82% | + - -### 3.3 ImageNet1k 训练集 finetune +#### 1.2.4 AFD -对于在大数据集上训练的模型,其学习到的特征可能与 ImageNet1k 数据特征有偏,因此在这里使用 ImageNet1k 数据集对模型进行 finetune。 finetune 的超参和 finetune 的精度收益如下。 +##### 1.2.4.1 AFD 算法介绍 +论文信息: -|Student Model | num_epoch | l2_ecay | batch size/gpu cards | base lr | learning rate decay | top1 acc | -| - |:-: |:-: | :-: |:-: |:-: |:-: | -| MobileNetV1 | 30 | 3e-5 | 4096/8 | 0.016 | cosine_decay_warmup | 77.89% | -| MobileNetV2 | 30 | 1e-5 | 3072/8 | 0.0054 | cosine_decay_warmup | 76.73% | -| MobileNetV3_large_x1_0 | 30 | 1e-5 | 2048/8 | 0.008 | cosine_decay_warmup | 78.96% | -| MobileNetV3_small_x1_0 | 30 | 1e-5 | 6400/32 | 0.025 | cosine_decay_warmup | 71.28% | -| ResNet50_vd | 60 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 82.39% | -| ResNet101_vd | 30 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 83.73% | -| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 85.13% | - -### 3.4 数据增广以及基于 Fix 策略的微调 +> [Show, attend and distill: Knowledge distillation via attention-based feature matching](https://arxiv.org/abs/2102.02973) +> +> Mingi Ji, Byeongho Heo, Sungrae Park +> +> AAAI, 2018 -* 基于前文所述的实验结论,我们在训练的过程中加入自动增广(AutoAugment)[4],同时进一步减小了 l2_decay(4e-5->2e-5),最终 ResNet50_vd 经过 SSLD 蒸馏策略,在 ImageNet1k 上的精度可以达到 82.99%,相比之前不加数据增广的蒸馏策略再次增加了 0.6% 。 +AFD提出在蒸馏的过程中,利用基于注意力的元网络学习特征之间的相对相似性,并应用识别的相似关系来控制所有可能的特征图pair的蒸馏强度。 +在ImageNet1k公开数据集上,效果如下所示。 -* 对于图像分类任务,在测试的时候,测试尺度为训练尺度的 1.15 倍左右时,往往在不需要重新训练模型的情况下,模型的精度指标就可以进一步提升 [5],对于 82.99% 的 ResNet50_vd 在 320x320 的尺度下测试,精度可达 83.7%,我们进一步使用 Fix 策略,即在 320x320 的尺度下进行训练,使用与预测时相同的数据预处理方法,同时固定除 FC 层以外的所有参数,最终在 320x320 的预测尺度下,精度可以达到 **84.0%**。 +| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | +| --- | --- | --- | --- | --- | +| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | +| AFD | ResNet18 | [resnet34_distill_resnet18_afd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_afd.yaml) | 71.68%(**+0.88%**) | - | - -### 3.5 实验过程中的一些问题 +注意:这里为了与论文的训练配置保持对齐,设置训练的迭代轮数为100epoch,因此baseline精度低于PaddleClas中开源出的模型精度(71.0%) -* 在预测过程中,batch norm 的平均值与方差是通过加载预训练模型得到(设其模式为 test mode)。在训练过程中,batch norm 是通过统计当前 batch 的信息(设其模式为 train mode),与历史保存信息进行滑动平均计算得到,在蒸馏任务中,我们发现通过 train mode,即教师模型的均值与方差实时变化的模式,去指导学生模型,比通过 test mode 蒸馏,得到的学生模型性能更好一些,下面是一组实验结果。因此我们在该蒸馏方案中,均使用 train mode 去得到教师模型的 soft label 。 +##### 1.2.4.2 AFD 配置 -|Teacher Model | Teacher Top1 | Student Model | Student Top1| -|- |:-: |:-: | :-: | -| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% | -| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 75.84% | 
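+在查看具体配置之前,先通过一段极简的示意代码理解上节中“基于注意力的特征匹配”这一思想。需要说明的是,下面只是在若干假设(各层特征先做全局池化得到向量、学生与教师特征通道数一致)下的草图,并非 PaddleClas 中 `AFDLoss` 的实际实现,实际计算方式请以配置文件所引用的源码为准。
+
+```python
+# AFD 思想示意:用注意力权重控制每对学生-教师特征图之间的蒸馏强度(假设性实现)
+import paddle
+import paddle.nn.functional as F
+
+def afd_style_loss(s_feats, t_feats):
+    # 全局平均池化得到每层特征的向量表示,用于计算注意力
+    s_vecs = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in s_feats]
+    t_vecs = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in t_feats]
+    loss = paddle.zeros([1])
+    for s_vec, s_f in zip(s_vecs, s_feats):
+        # 学生特征与所有教师特征的相似度,经 softmax 得到注意力权重
+        sims = paddle.stack([(s_vec * t_vec).sum(-1) for t_vec in t_vecs], axis=-1)
+        attn = F.softmax(sims, axis=-1)  # 形状为 [N, len(t_feats)]
+        for j, t_f in enumerate(t_feats):
+            # 将教师特征插值到学生特征的空间尺寸(这里假设二者通道数一致)
+            t_f = F.interpolate(t_f, size=s_f.shape[2:])
+            dist = ((s_f - t_f) ** 2).mean(axis=[1, 2, 3])  # 每个样本的特征距离
+            loss += (attn[:, j] * dist).mean()  # 注意力加权后对 batch 取平均
+    return loss
+```
+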
+AFD配置如下所示。在模型构建Arch字段中,需要同时定义学生模型与教师模型,固定教师模型的权重。这里需要对从教师模型获取的特征进行变换,进而与学生模型进行损失函数的计算。在损失函数Loss字段中,需要定义`DistillationKLDivLoss`(学生与教师之间的KL-Div loss)、`AFDLoss`(学生与教师之间的AFD loss)以及`DistillationGTCELoss`(学生与教师关于真值标签的CE loss),作为训练的损失函数。 - -## 4. 蒸馏模型的应用 +```yaml +Arch: + name: "DistillationModel" + pretrained_list: + freeze_params_list: + models: + - Teacher: + name: AttentionModel # 包含若干个串行的网络,后面的网络会将前面的网络输出作为输入并进行处理 + pretrained_list: + freeze_params_list: + - True + - False + models: + # AttentionModel 的基础网络 + - ResNet34: + name: ResNet34 + pretrained: True + # return_patterns表示除了返回输出的logits,也会返回对应名称的中间层feature map + return_patterns: &t_keys ["blocks[0]", "blocks[1]", "blocks[2]", "blocks[3]", + "blocks[4]", "blocks[5]", "blocks[6]", "blocks[7]", + "blocks[8]", "blocks[9]", "blocks[10]", "blocks[11]", + "blocks[12]", "blocks[13]", "blocks[14]", "blocks[15]"] + # AttentionModel的变换网络,会对基础子网络的特征进行变换 + - LinearTransformTeacher: + name: LinearTransformTeacher + qk_dim: 128 + keys: *t_keys + t_shapes: &t_shapes [[64, 56, 56], [64, 56, 56], [64, 56, 56], [128, 28, 28], + [128, 28, 28], [128, 28, 28], [128, 28, 28], [256, 14, 14], + [256, 14, 14], [256, 14, 14], [256, 14, 14], [256, 14, 14], + [256, 14, 14], [512, 7, 7], [512, 7, 7], [512, 7, 7]] - -### 4.1 使用方法 + - Student: + name: AttentionModel + pretrained_list: + freeze_params_list: + - False + - False + models: + - ResNet18: + name: ResNet18 + pretrained: False + return_patterns: &s_keys ["blocks[0]", "blocks[1]", "blocks[2]", "blocks[3]", + "blocks[4]", "blocks[5]", "blocks[6]", "blocks[7]"] + - LinearTransformStudent: + name: LinearTransformStudent + qk_dim: 128 + keys: *s_keys + s_shapes: &s_shapes [[64, 56, 56], [64, 56, 56], [128, 28, 28], [128, 28, 28], + [256, 14, 14], [256, 14, 14], [512, 7, 7], [512, 7, 7]] + t_shapes: *t_shapes -* 中间层学习率调整。蒸馏得到的模型的中间层特征图更加精细化,因此将蒸馏模型预训练应用到其他任务中时,如果采取和之前相同的学习率,容易破坏中间层特征。而如果降低整体模型训练的学习率,则会带来训练收敛速度慢的问题。因此我们使用了中间层学习率调整的策略。具体地: - * 针对 ResNet50_vd,我们设置一个学习率倍数列表,res block 之前的 3 个 conv2d 卷积参数具有统一的学习率倍数,4 个 res block 的 conv2d 分别有一个学习率参数,共需设置 5 个学习率倍数的超参。在实验中发现。用于迁移学习 finetune 分类模型时,`[0.1,0.1,0.2,0.2,0.3]` 的中间层学习率倍数设置在绝大多数的任务中都性能更好;而在目标检测任务中,`[0.05,0.05,0.05,0.1,0.15]` 的中间层学习率倍数设置能够带来更大的精度收益。 - * 对于 MoblileNetV3_large_x1_0,由于其包含 15 个 block,我们设置每 3 个 block 共享一个学习率倍数参数,因此需要共 5 个学习率倍数的参数,最终发现在分类和检测任务中,`[0.25,0.25,0.5,0.5,0.75]` 的中间层学习率倍数能够带来更大的精度收益。 + infer_model_name: "Student" -* 适当的 l2 decay 。不同分类模型在训练的时候一般都会根据模型设置不同的 l2 decay,大模型为了防止过拟合,往往会设置更大的 l2 decay,如 ResNet50 等模型,一般设置为 `1e-4` ;而如 MobileNet 系列模型,在训练时往往都会设置为 `1e-5~4e-5`,防止模型过度欠拟合,在蒸馏时亦是如此。在将蒸馏模型应用到目标检测任务中时,我们发现也需要调节 backbone 甚至特定任务模型模型的 l2 decay,和预训练蒸馏时的 l2 decay 尽可能保持一致。以 Faster RCNN MobiletNetV3 FPN 为例,我们发现仅修改该参数,在 COCO2017 数据集上就可以带来最多 0.5% 左右的精度(mAP)提升(默认 Faster RCNN l2 decay 为 1e-4,我们修改为 1e-5~4e-5 均有 0.3%~0.5% 的提升)。 +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + model_names: ["Student"] + key: logits + - DistillationKLDivLoss: # 蒸馏的KL-Div loss,会根据model_name_pairs中的模型名称去提取对应模型的输出特征,计算loss + weight: 0.9 # 该loss的权重 + model_name_pairs: [["Student", "Teacher"]] + temperature: 4 + key: logits + - AFDLoss: # AFD loss + weight: 50.0 + model_name_pair: ["Student", "Teacher"] + student_keys: ["bilinear_key", "value"] + teacher_keys: ["query", "value"] + s_shapes: *s_shapes + t_shapes: *t_shapes + Eval: + - CELoss: + weight: 1.0 +``` - -### 4.2 迁移学习 finetune -* 为验证迁移学习的效果,我们在 10 个小的数据集上验证其效果。在这里为了保证实验的可对比性,我们均使用 ImageNet1k 数据集训练的标准预处理过程,对于蒸馏模型我们也添加了蒸馏模型中间层学习率的搜索。 -* 对于 ResNet50_vd, baseline 为 Top1 Acc 
79.12% 的预训练模型基于 grid search 搜索得到的最佳精度,对比实验则为基于该精度对预训练和中间层学习率进一步搜索得到的最佳精度。下面给出 10 个数据集上所有 baseline 和蒸馏模型的精度对比。 +**注意(:** 上述在网络中指定`return_patterns`,返回中间层特征的功能是基于TheseusLayer,更多关于TheseusLayer的使用说明,请参考:[TheseusLayer 使用说明](./theseus_layer.md)。 + -| Dataset | Model | Baseline Top1 Acc | Distillation Model Finetune | -|- |:-: |:-: | :-: | -| Oxford102 flowers | ResNete50_vd | 97.18% | 97.41% | -| caltech-101 | ResNete50_vd | 92.57% | 93.21% | -| Oxford-IIIT-Pets | ResNete50_vd | 94.30% | 94.76% | -| DTD | ResNete50_vd | 76.48% | 77.71% | -| fgvc-aircraft-2013b | ResNete50_vd | 88.98% | 90.00% | -| Stanford-Cars | ResNete50_vd | 92.65% | 92.76% | -| SUN397 | ResNete50_vd | 64.02% | 68.36% | -| cifar100 | ResNete50_vd | 86.50% | 87.58% | -| cifar10 | ResNete50_vd | 97.72% | 97.94% | -| Food-101 | ResNete50_vd | 89.58% | 89.99% | +#### 1.2.5 DKD -* 可以看出在上面 10 个数据集上,结合适当的中间层学习率倍数设置,蒸馏模型平均能够带来 1% 以上的精度提升。 +##### 1.2.5.1 DKD 算法介绍 - -### 4.3 目标检测 +论文信息: -我们基于两阶段目标检测 Faster/Cascade RCNN 模型验证蒸馏得到的预训练模型的效果。 -* ResNet50_vd +> [Decoupled Knowledge Distillation](https://arxiv.org/abs/2203.08679) +> +> Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang +> +> CVPR, 2022 -设置训练与评测的尺度均为 640x640,最终 COCO 上检测指标如下。 +DKD将蒸馏中常用的 KD Loss 进行了解耦成为Target Class Knowledge Distillation(TCKD,目标类知识蒸馏)以及Non-target Class Knowledge Distillation(NCKD,非目标类知识蒸馏)两个部分,对两个部分的作用分别研究,并使它们各自的权重可以独立调节,提升了蒸馏的精度和灵活性。 -| Model | train/test scale | pretrain top1 acc | feature map lr | coco mAP | -|- |:-: |:-: | :-: | :-: | -| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [1.0,1.0,1.0,1.0,1.0] | 34.8% | -| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [0.05,0.05,0.1,0.1,0.15] | 34.3% | -| Faster RCNN R50_vd FPN | 640/640 | 82.18% | [0.05,0.05,0.1,0.1,0.15] | 36.3% | +在ImageNet1k公开数据集上,效果如下所示。 -在这里可以看出,对于未蒸馏模型,过度调整中间层学习率反而降低最终检测模型的性能指标。基于该蒸馏模型,我们也提供了领先的服务端实用目标检测方案,详细的配置与训练代码均已开源,可以参考 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_enhance)。 +| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | +| --- | --- | --- | --- | --- | +| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | +| AFD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - | - -## 5. 
SSLD 实战 -本节将基于 ImageNet-1K 的数据集详细介绍 SSLD 蒸馏实验,如果想快速体验此方法,可以参考 [**30 分钟玩转 PaddleClas(进阶版)**](../quick_start/quick_start_classification_professional.md)中基于 CIFAR100 的 SSLD 蒸馏实验。 +##### 1.2.5.2 DKD 配置 - -### 5.1 参数配置 +DKD 配置如下所示。在模型构建Arch字段中,需要同时定义学生模型与教师模型,教师模型固定参数,且需要加载预训练模型。在损失函数Loss字段中,需要定义`DistillationDKDLoss`(学生与教师之间的DKD loss)以及`DistillationGTCELoss`(学生与教师关于真值标签的CE loss),作为训练的损失函数。 -实战部分提供了 SSLD 蒸馏的示例,在 `ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml` 中提供了 `MobileNetV3_large_x1_0` 蒸馏 `MobileNetV3_small_x1_0` 的配置文件,用户可以在 `tools/train.sh` 里直接替换配置文件的路径即可使用。 ```yaml Arch: @@ -216,53 +418,165 @@ Arch: - False models: - Teacher: - name: MobileNetV3_large_x1_0 + name: ResNet34 pretrained: True - use_ssld: True + - Student: - name: MobileNetV3_small_x1_0 + name: ResNet18 pretrained: False infer_model_name: "Student" + + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + model_names: ["Student"] + - DistillationDKDLoss: + weight: 1.0 + model_name_pairs: [["Student", "Teacher"]] + temperature: 1 + alpha: 1.0 + beta: 1.0 + Eval: + - CELoss: + weight: 1.0 ``` + -在参数配置中,`freeze_params_list` 中需要指定模型是否需要冻结参数,`models` 中需要指定 Teacher 模型和 Student 模型,其中 Teacher 模型需要加载预训练模型。用户可以直接在此处更改模型。 +## 2. 模型训练、评估和预测 - -### 5.2 启动命令 + -当用户配置完训练环境后,类似于训练其他分类任务,只需要将 `tools/train.sh` 中的配置文件替换成为相应的蒸馏配置文件即可。 +### 2.1 环境配置 -其中 `train.sh` 中的内容如下: +* 安装:请先参考 [Paddle 安装教程](../installation/install_paddle.md) 以及 [PaddleClas 安装教程](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 -```bash + + +### 2.2 数据准备 + +请在[ImageNet 官网](https://www.image-net.org/)准备 ImageNet-1k 相关的数据。 + + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,将下载好的数据命名为 `ILSVRC2012` ,存放于此。 `ILSVRC2012` 目录中具有以下数据: + +``` +├── train +│ ├── n01440764 +│ │ ├── n01440764_10026.JPEG +│ │ ├── n01440764_10027.JPEG +├── train_list.txt +... 
+├── val +│ ├── ILSVRC2012_val_00000001.JPEG +│ ├── ILSVRC2012_val_00000002.JPEG +├── val_list.txt +``` + +其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。 + + +如果包含与训练集场景相似的无标注数据,则也可以按照与训练集标注完全相同的方式进行整理,将文件与当前有标注的数据集放在相同目录下,将其标签值记为0,假设整理的标签文件名为`train_list_unlabel.txt`,则可以通过下面的命令生成用于SSLD训练的标签文件。 + +```shell +cat train_list.txt train_list_unlabel.txt > train_list_all.txt +``` + + +**备注:** -python -m paddle.distributed.launch \ - --selected_gpus="0,1,2,3" \ - --log_dir=mv3_large_x1_0_distill_mv3_small_x1_0 \ +* 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 + + + + +### 2.3 模型训练 + + +以SSLD知识蒸馏算法为例,介绍知识蒸馏算法的模型训练、评估、预测等过程。配置文件为 [PPLCNet_x2_5_ssld.yaml](../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml) ,使用下面的命令可以完成模型训练。 + + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ tools/train.py \ - -c ./ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml + -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml ``` -运行 `train.sh` : + + +### 2.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 ```bash -sh tools/train.sh +python3 tools/eval.py \ + -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model +``` + +其中 `-o Global.pretrained_model="output/DistillationModel/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 2.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```python +python3 tools/infer.py \ + -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml \ + -o Global.pretrained_model=output/DistillationModel/best_model +``` + +输出结果如下: + +``` +[{'class_ids': [8, 7, 86, 82, 21], 'scores': [0.87908, 0.12091, 0.0, 0.0, 0.0], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'kite']}] ``` - -### 5.3 注意事项 -* 用户在使用 SSLD 蒸馏之前,首先需要在目标数据集上训练一个教师模型,该教师模型用于指导学生模型在该数据集上的训练。 +**备注:** -* 如果学生模型没有加载预训练模型,训练的其他超参数可以参考该学生模型在 ImageNet-1k 上训练的超参数,如果学生模型加载了预训练模型,学习率可以调整到原来的 1/10 或者 1/100 。 +* 这里`-o Global.pretrained_model="output/ResNet50/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 -* 在 SSLD 蒸馏的过程中,学生模型只学习 soft-label 导致训练目标变的更加复杂,建议可以适当的调小 `l2_decay` 的值来获得更高的验证集准确率。 +* 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 -* 若用户准备添加无标签的训练数据,只需要将新的训练数据放置在原本训练数据的路径下,生成新的数据 list 即可,另外,新生成的数据 list 需要将无标签的数据添加伪标签(只是为了统一读数据)。 - -## 6. 参考文献 + + +### 2.6 模型导出与推理 + + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +在模型推理之前需要先导出模型。对于知识蒸馏训练得到的模型,在导出时需要指定`-o Global.infer_model_name=Student`,来表示导出的模型为学生模型。具体命令如下所示。 + +```shell +python3 tools/export_model.py \ + -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml \ + -o Global.pretrained_model=./output/DistillationModel/best_model \ + -o Arch.infer_model_name=Student +``` + +最终在`inference`目录下会产生`inference.pdiparams`、`inference.pdiparams.info`、`inference.pdmodel` 3个文件。 + +关于更多模型推理相关的教程,请参考:[Python 预测推理](../inference_deployment/python_deploy.md)。 + + + + +## 3. 参考文献 [1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. 
arXiv preprint arXiv:1503.02531, 2015. @@ -273,3 +587,17 @@ sh tools/train.sh [4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123. [5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260. + +[6] Cui C, Guo R, Du Y, et al. Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones[J]. arXiv preprint arXiv:2103.05959, 2021. + +[7] Zhang Y, Xiang T, Hospedales T M, et al. Deep mutual learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4320-4328. + +[8] Heo B, Kim J, Yun S, et al. A comprehensive overhaul of feature distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 1921-1930. + +[9] Du Y, Li C, Guo R, et al. PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System[J]. arXiv preprint arXiv:2109.03144, 2021. + +[10] Park W, Kim D, Lu Y, et al. Relational knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3967-3976. + +[11] Zhao B, Cui Q, Song R, et al. Decoupled Knowledge Distillation[J]. arXiv preprint arXiv:2203.08679, 2022. + +[12] Ji M, Heo B, Park S. Show, attend and distill: Knowledge distillation via attention-based feature matching[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(9): 7945-7952. diff --git a/docs/zh_CN/advanced_tutorials/ssld.md b/docs/zh_CN/advanced_tutorials/ssld.md new file mode 100644 index 0000000000000000000000000000000000000000..e19a98cbc866bc02f0ca9df6d8e939b3342663f5 --- /dev/null +++ b/docs/zh_CN/advanced_tutorials/ssld.md @@ -0,0 +1,171 @@ + +# SSLD 知识蒸馏实战 + +## 目录 + +- [1. 算法介绍](#1) + - [1.1 知识蒸馏简介](#1.1) + - [1.2 SSLD蒸馏策略](#1.2) + - [1.3 SKL-UGI蒸馏策略](#1.3) +- [2. SSLD预训练模型库](#2) +- [3. SSLD使用](#3) + - [3.1 加载SSLD模型进行微调](#3.1) + - [3.2 使用SSLD方案进行知识蒸馏](#3.2) +- [4. 参考文献](#4) + + + + + +## 1. 算法介绍 + + + +### 1.1 简介 + +PaddleClas 融合已有的知识蒸馏方法 [2,3],提供了一种简单的半监督标签知识蒸馏方案(SSLD,Simple Semi-supervised Label Distillation),基于 ImageNet1k 分类数据集,在 ResNet_vd 以及 MobileNet 系列上的精度均有超过 3% 的绝对精度提升,具体指标如下图所示。 + +
+ +
+ + + +### 1.2 SSLD蒸馏策略 + +SSLD 的流程图如下图所示。 + +
+ +
+ +首先,我们从 ImageNet22k 中挖掘出了近 400 万张图片,同时与 ImageNet-1k 训练集整合在一起,得到了一个新的包含 500 万张图片的数据集。然后,我们将学生模型与教师模型组合成一个新的网络,该网络分别输出学生模型和教师模型的预测分布,与此同时,固定教师模型整个网络的梯度,而学生模型可以做正常的反向传播。最后,我们将两个模型的 logits 经过 softmax 激活函数转换为 soft label,并将二者的 soft label 做 JS 散度作为损失函数,用于蒸馏模型训练。 + +以 MobileNetV3(该模型直接训练,精度为 75.3%)的知识蒸馏为例,该方案的核心策略优化点如下所示。 + + +| 实验ID | 策略 | Top-1 acc | +|:------:|:---------:|:--------:| +| 1 | baseline | 75.60% | +| 2 | 更换教师模型精度为82.4%的权重 | 76.00% | +| 3 | 使用改进的JS散度损失函数 | 76.20% | +| 4 | 迭代轮数增加至360epoch | 77.10% | +| 5 | 添加400W挖掘得到的无标注数据 | 78.50% | +| 6 | 基于ImageNet1k数据微调 | 78.90% | + +* 注:其中baseline的训练条件为 + * 训练数据:ImageNet1k数据集 + * 损失函数:Cross Entropy Loss + * 迭代轮数:120epoch + + +SSLD 蒸馏方案的一大特色就是无需使用图像的真值标签,因此可以任意扩展数据集的大小,考虑到计算资源的限制,我们在这里仅基于 ImageNet22k 数据集对蒸馏任务的训练集进行扩充。在 SSLD 蒸馏任务中,我们使用了 `Top-k per class` 的数据采样方案 [3] 。具体步骤如下。 + +(1)训练集去重。我们首先基于 SIFT 特征相似度匹配的方式对 ImageNet22k 数据集与 ImageNet1k 验证集进行去重,防止添加的 ImageNet22k 训练集中包含 ImageNet1k 验证集图像,最终去除了 4511 张相似图片。部分过滤的相似图片如下所示。 + +
+ +
+ +(2)大数据集 soft label 获取,对于去重后的 ImageNet22k 数据集,我们使用 `ResNeXt101_32x16d_wsl` 模型进行预测,得到每张图片的 soft label 。 + +(3)Top-k 数据选择,ImageNet1k 数据共有 1000 类,对于每一类,找出属于该类并且得分最高的 `k` 张图片,最终得到一个数据量不超过 `1000*k` 的数据集(某些类上得到的图片数量可能少于 `k` 张)。 + +(4)将该数据集与 ImageNet1k 的训练集融合组成最终蒸馏模型所使用的数据集,数据量为 500 万。 + + + + +## 1.3 SKL-UGI蒸馏策略 + +此外,在无标注数据选择的过程中,我们发现使用更加通用的数据,即使不需要严格的数据筛选过程,也可以帮助知识蒸馏任务获得稳定的精度提升,因而提出了SKL-UGI (Symmetrical-KL Unlabeled General Images distillation)知识蒸馏方案。 + +通用数据可以使用ImageNet数据或者与场景相似的数据集。更多关于SKL-UGI的应用,请参考:[超轻量图像分类方案PULC使用教程](../PULC/PULC_train.md)。 + + + + +## 2. 预训练模型库 + + +移动端预训练模型库列表如下所示。 + +| 模型 | FLOPs(M) | Params(M) | top-1 acc | SSLD top-1 acc | 精度收益 | 下载链接 | +|-------------------|----------|-----------|----------|---------------|--------|------| +| PPLCNetV2_base | 604.16 | 6.54 | 77.04% | 80.10% | +3.06% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_ssld_pretrained.pdparams) | +| PPLCNet_x2_5 | 906.49 | 9.04 | 76.60% | 80.82% | +4.22% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) | +| PPLCNet_x1_0 | 160.81 | 2.96 | 71.32% | 74.39% | +3.07% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) | +| PPLCNet_x0_5 | 47.28 | 1.89 | 63.14% | 66.10% | +2.96% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) | +| PPLCNet_x0_25 | 18.43 | 1.52 | 51.86% | 53.43% | +1.57% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_ssld_pretrained.pdparams) | +| MobileNetV1 | 578.88 | 4.19 | 71.00% | 77.90% | +6.90% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) | +| MobileNetV2 | 327.84 | 3.44 | 72.20% | 76.74% | +4.54% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | +| MobileNetV3_large_x1_0 | 229.66 | 5.47 | 75.30% | 79.00% | +3.70% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | +| MobileNetV3_small_x1_0 | 63.67 | 2.94 | 68.20% | 71.30% | +3.10% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | +| MobileNetV3_small_x0_35 | 14.56 | 1.66 | 53.00% | 55.60% | +2.60% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | +| GhostNet_x1_3_ssld | 236.89 | 7.30 | 75.70% | 79.40% | +3.70% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | + +* 注:其中的`top-1 acc`表示使用普通训练方式得到的模型精度,`SSLD top-1 acc`表示使用SSLD知识蒸馏训练策略得到的模型精度。 + + +服务端预训练模型库列表如下所示。 + +| 模型 | FLOPs(G) | Params(M) | top-1 acc | SSLD top-1 acc | 精度收益 | 下载链接 | +|----------------------|----------|-----------|----------|---------------|--------|-------------------------------------------------------------------------------------------| +| PPHGNet_base | 25.14 | 71.62 | - | 85.00% | - | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_base_ssld_pretrained.pdparams) | +| PPHGNet_small | 8.53 | 24.38 | 81.50% | 83.80% | +2.30% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_ssld_pretrained.pdparams) | +| PPHGNet_tiny | 4.54 | 14.75 | 79.83% | 81.95% | 
+2.12% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_ssld_pretrained.pdparams) |
+| ResNet50_vd | 8.67 | 25.58 | 79.10% | 83.00% | +3.90% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
+| ResNet101_vd | 16.1 | 44.57 | 80.20% | 83.70% | +3.50% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
+| ResNet34_vd | 7.39 | 21.82 | 76.00% | 79.70% | +3.70% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
+| Res2Net50_vd_26w_4s | 8.37 | 25.06 | 79.80% | 83.10% | +3.30% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
+| Res2Net101_vd_26w_4s | 16.67 | 45.22 | 80.60% | 83.90% | +3.30% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
+| Res2Net200_vd_26w_4s | 31.49 | 76.21 | 81.20% | 85.10% | +3.90% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
+| HRNet_W18_C | 4.14 | 21.29 | 76.90% | 81.60% | +4.70% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
+| HRNet_W48_C | 34.58 | 77.47 | 79.00% | 83.60% | +4.60% | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
+| SE_HRNet_W64_C | 57.83 | 128.97 | - | 84.70% | - | [链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
+
+<a name="3"></a>
+
+## 3. SSLD使用方法
+
+<a name="3.1"></a>
+
+### 3.1 加载SSLD模型进行微调
+
+如果希望直接使用预训练模型,可以在训练的时候,加入参数`-o Arch.pretrained=True -o Arch.use_ssld=True`,表示使用基于SSLD的预训练模型,示例如下所示。
+
+```shell
+# 单机单卡训练
+python3 tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml -o Arch.pretrained=True -o Arch.use_ssld=True
+# 单机多卡训练
+python3 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml -o Arch.pretrained=True -o Arch.use_ssld=True
+```
+
+<a name="3.2"></a>
+
+### 3.2 使用SSLD方案进行知识蒸馏
+
+相比于其他大多数知识蒸馏算法,SSLD摆脱了对数据标注的依赖,通过引入无标注数据,可以进一步提升模型精度。
+
+对于无标注数据,需要按照与有标注数据完全相同的方式进行整理,将文件与当前有标注的数据集放在相同目录下,并将其标签值记为`0`。假设整理后的标签文件名为`train_list_unlabel.txt`,则可以通过下面的命令生成用于SSLD训练的标签文件。
+
+```shell
+cat train_list.txt train_list_unlabel.txt > train_list_all.txt
+```
+
+更多关于图像分类任务的数据标签说明,请参考:[PaddleClas图像分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明)。
+
+PaddleClas中集成了PULC超轻量图像分类实用方案,里面包含SSLD ImageNet预训练模型的使用以及更加通用的无标签数据的知识蒸馏方案,更多详细信息,请参考[PULC超轻量图像分类实用方案使用教程](../PULC/PULC_train.md)。
+
+<a name="4"></a>
+
+## 4. 参考文献
+
+[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
+
+[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
+
+[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
+
+[4] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.
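+
+补充说明:3.2 节中无标注数据的标签文件与合并操作,也可以用下面的 Python 示意脚本完成。其中无标注图片目录 `dataset/ILSVRC2012/unlabel/` 及各文件路径均为假设,请按实际情况修改。
+
+```python
+# 为无标注图片生成伪标签 0,并与有标注数据的标签文件合并(示意脚本,路径均为假设)
+import os
+
+unlabel_dir = "dataset/ILSVRC2012/unlabel"  # 假设的无标注图片目录
+
+with open("dataset/ILSVRC2012/train_list_unlabel.txt", "w") as f:
+    for name in sorted(os.listdir(unlabel_dir)):
+        if name.lower().endswith((".jpeg", ".jpg", ".png")):
+            # 标签统一记为 0,仅为统一读数据格式(见 3.2 节说明)
+            f.write("unlabel/{} 0\n".format(name))
+
+# 合并得到 SSLD 训练使用的标签文件,效果与上面的 cat 命令一致
+with open("dataset/ILSVRC2012/train_list_all.txt", "w") as out:
+    for part in ("dataset/ILSVRC2012/train_list.txt",
+                 "dataset/ILSVRC2012/train_list_unlabel.txt"):
+        with open(part) as fin:
+            out.write(fin.read())
+```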
diff --git a/docs/zh_CN/algorithm_introduction/ImageNet_models.md b/docs/zh_CN/algorithm_introduction/ImageNet_models.md index 8e847bb8c17db46e71e8542b954fdf49e8cd549d..ad32788a8579ccf22ddb72dd40f9f0a8daa019d9 100644 --- a/docs/zh_CN/algorithm_introduction/ImageNet_models.md +++ b/docs/zh_CN/algorithm_introduction/ImageNet_models.md @@ -133,6 +133,8 @@ PP-LCNet 系列模型的精度、速度指标如下表所示,更多关于该 **: 基于 Intel-Xeon-Gold-6271C 硬件平台与 OpenVINO 2021.4.2 推理平台。 + + ## PP-HGNet 系列 PP-HGNet 系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[PP-HGNet 系列模型文档](../models/PP-HGNet.md)。 @@ -140,7 +142,10 @@ PP-HGNet 系列模型的精度、速度指标如下表所示,更多关于该 | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | PPHGNet_tiny | 0.7983 | 0.9504 | 1.77 | - | - | 4.54 | 14.75 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar) | +| PPHGNet_tiny_ssld | 0.8195 | 0.9612 | 1.77 | - | - | 4.54 | 14.75 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_ssld_infer.tar) | | PPHGNet_small | 0.8151 | 0.9582 | 2.52 | - | - | 8.53 | 24.38 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar) | +| PPHGNet_small_ssld | 0.8382 | 0.9681 | 2.52 | - | - | 8.53 | 24.38 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_ssld_infer.tar) | +| PPHGNet_base_ssld | 0.8500 | 0.9735 | 5.97 | - | - | 25.14 | 71.62 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_base_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_base_ssld_infer.tar) | diff --git a/docs/zh_CN/algorithm_introduction/knowledge_distillation.md b/docs/zh_CN/algorithm_introduction/knowledge_distillation.md index 58092195956119416277e62ec225318373a2bfa3..afedce7f07117c857717575ca063bab8a5decc66 100644 --- a/docs/zh_CN/algorithm_introduction/knowledge_distillation.md +++ b/docs/zh_CN/algorithm_introduction/knowledge_distillation.md @@ -42,7 +42,7 @@ PaddleClas 中提出了一种简单使用的 SSLD 知识蒸馏算法 [6],在训练的时候去除了对 gt label 的依赖,结合大量无标注数据,最终蒸馏训练得到的预训练模型在 15 个模型上的精度提升平均高达 3%。 -上述标准的蒸馏方法是通过一个大模型作为教师模型来指导学生模型提升效果,而后来又发展出 DML(Deep Mutual Learning)互学习蒸馏方法 [7],即通过两个结构相同的模型互相学习。具体的。相比于 KD 等依赖于大的教师模型的知识蒸馏算法,DML 脱离了对大的教师模型的依赖,蒸馏训练的流程更加简单,模型产出效率也要更高一些。 +上述标准的蒸馏方法是通过一个大模型作为教师模型来指导学生模型提升效果,而后来又发展出 DML(Deep Mutual Learning)互学习蒸馏方法 [7],即通过两个结构相同的模型互相学习。具体的。相比于 KD 等依赖于大的教师模型的知识蒸馏算法,DML 脱离了对大的教师模型的依赖,蒸馏训练的流程更加简单,模型产出效率也要更高一些。 ### 3.2 Feature based distillation diff --git a/docs/zh_CN/image_recognition_pipeline/feature_extraction.md b/docs/zh_CN/image_recognition_pipeline/feature_extraction.md index 1438e9661200ede1adf67cf6813f763c3a13c095..368abc3da9856c8d9232819aef3b43f0ef66735d 100644 --- a/docs/zh_CN/image_recognition_pipeline/feature_extraction.md +++ b/docs/zh_CN/image_recognition_pipeline/feature_extraction.md @@ -1,182 +1,247 @@ +简体中文|[English](../../en/image_recognition_pipeline/feature_extraction_en.md) # 特征提取 ## 目录 -- [1. 简介](#1) -- [2. 网络结构](#2) -- [3. 通用识别模型](#3) -- [4. 自定义特征提取](#4) - - [4.1 数据准备](#4.1) - - [4.2 模型训练](#4.2) - - [4.3 模型评估](#4.3) - - [4.4 模型推理](#4.4) - - [4.4.1 导出推理模型](#4.4.1) - - [4.4.2 获取特征向量](#4.4.2) +- [1. 摘要](#1-摘要) +- [2. 介绍](#2-介绍) +- [3. 方法](#3-方法) + - [3.1 Backbone](#31-backbone) + - [3.2 Neck](#32-neck) + - [3.3 Head](#33-head) + - [3.4 Loss](#34-loss) +- [4. 实验部分](#4-实验部分) +- [5. 自定义特征提取](#5-自定义特征提取) + - [5.1 数据准备](#51-数据准备) + - [5.2 模型训练](#52-模型训练) + - [5.3 模型评估](#53-模型评估) + - [5.4 模型推理](#54-模型推理) + - [5.4.1 导出推理模型](#541-导出推理模型) + - [5.4.2 获取特征向量](#542-获取特征向量) +- [6. 总结](#6-总结) +- [7. 
参考文献](#7-参考文献) -## 1. 简介 +## 1. 摘要 -特征提取是图像识别中的关键一环,它的作用是将输入的图片转化为固定维度的特征向量,用于后续的[向量检索](./vector_search.md)。好的特征需要具备相似度保持性,即在特征空间中,相似度高的图片对其特征相似度要比较高(距离比较近),相似度低的图片对,其特征相似度要比较小(距离比较远)。[Deep Metric Learning](../algorithm_introduction/metric_learning.md)用以研究如何通过深度学习的方法获得具有强表征能力的特征。 +特征提取是图像识别中的关键一环,它的作用是将输入的图片转化为固定维度的特征向量,用于后续的[向量检索](./vector_search.md)。一个好的特征需要具备“相似度保持性”,即相似度高的图片对,其特征的相似度也比较高(特征空间中的距离比较近),相似度低的图片对,其特征相似度要比较低(特征空间中的距离比较远)。为此[Deep Metric Learning](../algorithm_introduction/metric_learning.md)领域内提出了不少方法用以研究如何通过深度学习来获得具有强表征能力的特征。 -## 2. 网络结构 +## 2. 介绍 + 为了图像识别任务的灵活定制,我们将整个网络分为 Backbone、 Neck、 Head 以及 Loss 部分,整体结构如下图所示: ![](../../images/feature_extraction_framework.png) 图中各个模块的功能为: -- **Backbone**: 指定所使用的骨干网络。 值得注意的是,PaddleClas 提供的基于 ImageNet 的预训练模型,最后一层的输出为 1000,我们需要依据所需的特征维度定制最后一层的输出。 -- **Neck**: 用以特征增强及特征维度变换。这儿的 Neck,可以是一个简单的 Linear Layer,用来做特征维度变换;也可以是较复杂的 FPN 结构,用以做特征增强。 -- **Head**: 用来将 feature 转化为 logits。除了常用的 Fc Layer 外,还可以替换为 cosmargin, arcmargin, circlemargin 等模块。 -- **Loss**: 指定所使用的 Loss 函数。我们将 Loss 设计为组合 loss 的形式,可以方便地将 Classification Loss 和 Pair_wise Loss 组合在一起。 +- **Backbone**: 用于提取输入图像初步特征的骨干网络,一般由配置文件中的 [`Backbone`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L26-L29) 以及 [`BackboneStopLayer`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L30-L31) 字段共同指定。 +- **Neck**: 用以特征增强及特征维度变换。可以是一个简单的 FC Layer,用来做特征维度变换;也可以是较复杂的 FPN 结构,用以做特征增强,一般由配置文件中的 [`Neck`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L32-L35)字段指定。 +- **Head**: 用来将 feature 转化为 logits,让模型在训练阶段能以分类任务的形式进行训练。除了常用的 FC Layer 外,还可以替换为 cosmargin, arcmargin, circlemargin 等模块,一般由配置文件中的 [`Head`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L36-L41)字段指定。 +- **Loss**: 指定所使用的 Loss 函数。我们将 Loss 设计为组合 loss 的形式,可以方便地将 Classification Loss 和 Metric learning Loss 组合在一起,一般由配置文件中的 [`Loss`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L44-L50)字段指定。 -## 3. 通用识别模型 +## 3. 
方法 + +### 3.1 Backbone + +Backbone 部分采用了 [PP_LCNet_x2_5](../models/PP-LCNet.md),其针对Intel CPU端的性能优化探索了多个有效的结构设计方案,最终实现了在不增加推理时间的情况下,进一步提升模型的性能,最终大幅度超越现有的 SOTA 模型。 + +### 3.2 Neck + +Neck 部分采用了 [FC Layer](../../../ppcls/arch/gears/fc.py),对 Backbone 抽取得到的特征进行降维,减少了特征存储的成本与计算量。 + +### 3.3 Head + +Head 部分选用 [ArcMargin](../../../ppcls/arch/gears/arcmargin.py),在训练时通过指定margin,增大同类特征之间的角度差异再进行分类,进一步提升抽取特征的表征能力。 -在 PP-Shitu 中, 我们采用 [PP_LCNet_x2_5](../models/PP-LCNet.md) 作为骨干网络 Neck 部分选用 Linear Layer, Head 部分选用 [ArcMargin](../../../ppcls/arch/gears/arcmargin.py),Loss 部分选用 CELoss,详细的配置文件见[通用识别配置文件](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml)。其中,训练数据为如下 7 个公开数据集的汇总: +### 3.4 Loss -| 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 | -| :------------: | :-------------: | :-------: | :-------: | :--------: | -| Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | -| GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) | -| VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild)| -| LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | -| iCartoonFace | 389678 | 5013 | 动漫人物 | [地址](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) | -| SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) | -| Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | -| **Total** | **5M** | **185K** | ---- | ---- | +Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训练时以分类任务的损失函数来指导网络进行优化。详细的配置文件见[通用识别配置文件](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml)。 + + + +## 4. 实验部分 + +训练数据为如下 7 个公开数据集的汇总: + +| 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 | +| :----------: | :-----: | :------: | :------: | :--------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| iCartoonFace | 389678 | 5013 | 动漫人物 | [地址](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) | +| SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| **Total** | **5M** | **185K** | ---- | ---- | 最终的模型效果如下表所示: -| 模型 | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) | -| :----------: | :---------: | :-------: | :-------: | :--------: | :--------: | :--------: | :--------: | -PP-LCNet-2.5x | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 +| 模型 | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) | +| :-----------------------------: | :--------: | :-------: | :--------: | :----------: | :---: | :----: | :---------: | +| GeneralRecognition_PPLCNet_x2_5 | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 | +* 预训练模型地址:[通用识别预训练模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) * 采用的评测指标为:`Recall@1` * 速度评测机器的 CPU 具体信息为:`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz` * 速度指标的评测条件为: 开启 MKLDNN, 线程数设置为 10 -* 
预训练模型地址:[通用识别预训练模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) - + -## 4. 自定义特征提取 +## 5. 自定义特征提取 -自定义特征提取,是指依据自己的任务,重新训练特征提取模型。主要包含四个步骤:1)数据准备;2)模型训练;3)模型评估;4)模型推理。 +自定义特征提取,是指依据自己的任务,重新训练特征提取模型。 - +下面基于`GeneralRecognition_PPLCNet_x2_5.yaml`配置文件,介绍主要的四个步骤:1)数据准备;2)模型训练;3)模型评估;4)模型推理 -### 4.1 数据准备 -首先,需要基于任务定制自己的数据集。数据集格式参见[格式说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/recognition_dataset.md#%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E)。在启动模型训练之前,需要在配置文件中修改数据配置相关的内容, 主要包括数据集的地址以及类别数量。对应到配置文件中的位置如下所示: -``` - Head: - name: ArcMargin - embedding_size: 512 - class_num: 185341 #此处表示类别数 -``` -``` - Train: - dataset: - name: ImageNetDataset - image_root: ./dataset/ #此处表示train数据所在的目录 - cls_label_path: ./dataset/train_reg_all_data.txt #此处表示train数据集label文件的地址 -``` -``` - Query: - dataset: - name: VeriWild - image_root: ./dataset/Aliproduct/. #此处表示query数据集所在的目录 - cls_label_path: ./dataset/Aliproduct/val_list.txt. #此处表示query数据集label文件的地址 -``` -``` - Gallery: - dataset: - name: VeriWild - image_root: ./dataset/Aliproduct/ #此处表示gallery数据集所在的目录 - cls_label_path: ./dataset/Aliproduct/val_list.txt. #此处表示gallery数据集label文件的地址 -``` - - + -### 4.2 模型训练 +### 5.1 数据准备 -- 单机单卡训练 -```shell -export CUDA_VISIBLE_DEVICES=0 -python tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -``` -- 单机多卡训练 -```shell -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -``` -**注意:** -配置文件中默认采用`在线评估`的方式,如果你想加快训练速度,去除`在线评估`,只需要在上述命令后面,增加 `-o eval_during_train=False`。训练完毕后,在 output 目录下会生成最终模型文件 `latest`,`best_model` 和训练日志文件 `train.log`。其中,`best_model` 用来存储当前评测指标下的最佳模型;`latest` 用来存储最新生成的模型, 方便在任务中断的情况下从断点位置启动训练。 +首先需要基于任务定制自己的数据集。数据集格式与文件结构详见[数据集格式说明](../data_preparation/recognition_dataset.md)。 -- 断点续训: -```shell -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ - -o Global.checkpoint="output/RecModel/latest" -``` +准备完毕之后还需要在配置文件中修改数据配置相关的内容, 主要包括数据集的地址以及类别数量。对应到配置文件中的位置如下所示: - +- 修改类别数: + ```yaml + Head: + name: ArcMargin + embedding_size: 512 + class_num: 185341 # 此处表示类别数 + ``` +- 修改训练数据集配置: + ```yaml + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ # 此处表示train数据所在的目录 + cls_label_path: ./dataset/train_reg_all_data.txt # 此处表示train数据集label文件的地址 + ``` +- 修改评估数据集中query数据配置: + ```yaml + Query: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ # 此处表示query数据集所在的目录 + cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示query数据集label文件的地址 + ``` +- 修改评估数据集中gallery数据配置: + ```yaml + Gallery: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ # 此处表示gallery数据集所在的目录 + cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示gallery数据集label文件的地址 + ``` + + + +### 5.2 模型训练 + +模型训练主要包括启动训练和断点恢复训练的功能 -### 4.3 模型评估 +- 单机单卡训练 + ```shell + export CUDA_VISIBLE_DEVICES=0 + python3.7 tools/train.py \ + -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml + ``` +- 单机多卡训练 + ```shell + export CUDA_VISIBLE_DEVICES=0,1,2,3 + python3.7 -m paddle.distributed.launch \ + --gpus="0,1,2,3" tools/train.py \ + -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml + ``` +**注意:** +配置文件中默认采用`在线评估`的方式,如果你想加快训练速度,可以关闭`在线评估`功能,只需要在上述命令的后面,增加 
`-o Global.eval_during_train=False`。 + +训练完毕后,在 output 目录下会生成最终模型文件 `latest.pdparams`,`best_model.pdarams` 和训练日志文件 `train.log`。其中,`best_model` 保存了当前评测指标下的最佳模型,`latest` 用来保存最新生成的模型, 方便在任务中断的情况下从断点位置恢复训练。通过在上述训练命令的末尾加上`-o Global.checkpoint="path_to_resume_checkpoint"`即可从断点恢复训练,示例如下。 + +- 单机单卡断点恢复训练 + ```shell + export CUDA_VISIBLE_DEVICES=0 + python3.7 tools/train.py \ + -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + -o Global.checkpoint="output/RecModel/latest" + ``` +- 单机多卡断点恢复训练 + ```shell + export CUDA_VISIBLE_DEVICES=0,1,2,3 + python3.7 -m paddle.distributed.launch \ + --gpus="0,1,2,3" tools/train.py \ + -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + -o Global.checkpoint="output/RecModel/latest" + ``` + + + +### 5.3 模型评估 + +除了训练过程中对模型进行的在线评估,也可以手动启动评估程序来获得指定的模型的精度指标。 - 单卡评估 -```shell -export CUDA_VISIBLE_DEVICES=0 -python tools/eval.py \ --c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ --o Global.pretrained_model="output/RecModel/best_model" -``` + ```shell + export CUDA_VISIBLE_DEVICES=0 + python3.7 tools/eval.py \ + -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + -o Global.pretrained_model="output/RecModel/best_model" + ``` - 多卡评估 -```shell -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/eval.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ - -o Global.pretrained_model="output/RecModel/best_model" -``` -**推荐:** 建议使用多卡评估。多卡评估方式可以利用多卡并行计算快速得到整体数据集的特征集合,能够加速评估的过程。 + ```shell + export CUDA_VISIBLE_DEVICES=0,1,2,3 + python3.7 -m paddle.distributed.launch \ + --gpus="0,1,2,3" tools/eval.py \ + -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + -o Global.pretrained_model="output/RecModel/best_model" + ``` +**注:** 建议使用多卡评估。该方式可以利用多卡并行计算快速得到全部数据的特征,能够加速评估的过程。 - + -### 4.4 模型推理 +### 5.4 模型推理 -推理过程包括两个步骤: 1)导出推理模型; 2)获取特征向量 +推理过程包括两个步骤: 1)导出推理模型;2)模型推理以获取特征向量 - +#### 5.4.1 导出推理模型 -#### 4.4.1 导出推理模型 - -``` -python tools/export_model.py \ +首先需要将 `*.pdparams` 模型文件转换成 inference 格式,转换命令如下。 +```shell +python3.7 tools/export_model.py \ -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ -o Global.pretrained_model="output/RecModel/best_model" ``` -生成的推理模型位于 `inference` 目录,里面包含三个文件,分别为 `inference.pdmodel`、`inference.pdiparams`、`inference.pdiparams.info`。 -其中: `inference.pdmodel` 用来存储推理模型的结构, `inference.pdiparams` 和 `inference.pdiparams.info` 用来存储推理模型相关的参数信息。 +生成的推理模型默认位于 `PaddleClas/inference` 目录,里面包含三个文件,分别为 `inference.pdmodel`、`inference.pdiparams`、`inference.pdiparams.info`。 +其中`inference.pdmodel` 用来存储推理模型的结构, `inference.pdiparams` 和 `inference.pdiparams.info` 用来存储推理模型相关的参数信息。 - +#### 5.4.2 获取特征向量 -#### 4.4.2 获取特征向量 +使用上一步转换得到的 inference 格式模型,将输入图片转换为对应的特征向量,推理命令如下。 -``` +```shell cd deploy -python python/predict_rec.py \ +python3.7 python/predict_rec.py \ -c configs/inference_rec.yaml \ -o Global.rec_inference_model_dir="../inference" ``` 得到的特征输出格式如下图所示: ![](../../images/feature_extraction_output.png) -在实际使用过程中,单纯得到特征往往并不能够满足业务的需求。如果想进一步通过特征检索来进行图像识别,可以参照文档[向量检索](./vector_search.md)。 +在实际使用过程中,仅仅得到特征可能并不能满足业务需求。如果想进一步通过特征检索来进行图像识别,可以参照文档[向量检索](./vector_search.md)。 + + + +## 6. 总结 + +特征提取模块作为图像识别中的关键一环,在网络结构的设计,损失函数的选取上有很大的改进空间。不同的数据集类型有各自不同的特点,如行人重识别、商品识别、人脸识别数据集的分布、图片内容都不尽相同。学术界根据这些特点提出了各种各样的方法,如PCB、MGN、ArcFace、CircleLoss、TripletLoss等,围绕的还是增大类间差异、减少类内差异的最终目标,从而有效地应对各种真实场景数据。 + + + +## 7. 参考文献 + +1. 
[PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf) +2. [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698) diff --git a/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md b/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md index f3d7989029ae763523cebf3d504920863b356adc..828fdf4f1f017d524aa9ebea1f1a409dee0eaf43 100644 --- a/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md +++ b/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md @@ -19,9 +19,13 @@ - [3.3 配置文件改动和说明](#3.3) - [3.4 启动训练](#3.4) - [3.5 模型预测与调试](#3.5) - - [3.6 模型导出与预测部署](#3.6) +- [4. 模型推理部署](#4) + - [4.1 推理模型准备](#4.1) + - [4.2 基于python预测引擎推理](#4.2) + - [4.3 其他推理方式](#4.3) - + + ## 1. 数据集 @@ -37,7 +41,7 @@ 在实际训练的过程中,将所有数据集混合在一起。由于是主体检测,这里将所有标注出的检测框对应的类别都修改为 `前景` 的类别,最终融合的数据集中只包含 1 个类别,即前景。 - + ## 2. 模型选择 @@ -55,7 +59,7 @@ * 速度评测机器的 CPU 具体信息为:`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`,速度指标为开启 mkldnn,线程数设置为 10 测试得到。 * 主体检测的预处理过程较为耗时,平均每张图在上述机器上的时间在 40~55 ms 左右,没有包含在上述的预测耗时统计中。 - + ### 2.1 轻量级主体检测模型 @@ -72,7 +76,7 @@ PicoDet 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640,其余配置与 [picodet_lcnet_1_5x_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/more_config/picodet_lcnet_1_5x_416_coco.yml) 完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。 - + ### 2.2 服务端主体检测模型 @@ -93,13 +97,13 @@ PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 在服务端主体检测任务中,为了保证检测效果,我们使用 ResNet50vd-DCN 作为检测模型的骨干网络,使用配置文件 [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml),更换为自定义的主体检测数据集,进行训练,最终得到检测模型。 - + ## 3. 模型训练 本节主要介绍怎样基于 PaddleDetection,基于自己的数据集,训练主体检测模型。 - + ### 3.1 环境准备 @@ -116,7 +120,7 @@ pip install -r requirements.txt 更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md) - + ### 3.2 数据准备 @@ -128,7 +132,7 @@ pip install -r requirements.txt [{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}] ``` - + ### 3.3 配置文件改动和说明 @@ -154,7 +158,7 @@ ppyolov2_reader.yml:主要说明数据读取器配置,如 batch size,并 此外,也可以根据实际情况,修改上述文件,比如,如果显存溢出,可以将 batch size 和学习率等比缩小等。 - + ### 3.4 启动训练 @@ -198,7 +202,7 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy 注意:如果遇到 "`Out of memory error`" 问题, 尝试在 `ppyolov2_reader.yml` 文件中调小 `batch_size`,同时等比例调小学习率。 - + ### 3.5 模型预测与调试 @@ -211,9 +215,11 @@ python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer `--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果 `keep_top_k` 表示设置输出目标的最大数量,默认值为 100,用户可以根据自己的实际情况进行设定。 - + +## 4. 
模型推理部署 -### 3.6 模型导出与预测部署。 + +### 4.1 推理模型准备 执行导出模型脚本: @@ -225,15 +231,21 @@ python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml 注意: `PaddleDetection` 导出的 inference 模型的文件格式为 `model.xxx`,这里如果希望与 PaddleClas 的 inference 模型文件格式保持一致,需要将其 `model.xxx` 文件修改为 `inference.xxx` 文件,用于后续主体检测的预测部署。 -更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md) +更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/EXPORT_MODEL.md) 最终,目录 `inference/ppyolov2_r50vd_dcn_365e_coco` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件,`inference.pdmodel` 为保存的 inference 模型结构文件。 + +### 4.2 基于python预测引擎推理 导出模型之后,在主体检测与识别任务中,就可以将检测模型的路径更改为该 inference 模型路径,完成预测。 以商品识别为例,其配置文件为 [inference_product.yaml](../../../deploy/configs/inference_product.yaml),修改其中的 `Global.det_inference_model_dir` 字段为导出的主体检测 inference 模型目录,参考[图像识别快速开始教程](../quick_start/quick_start_recognition.md),即可完成商品检测与识别过程。 + +### 4.3 其他推理方式 +其他推理方法,如C++推理部署、PaddleServing部署等请参考[检测模型推理部署](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/README.md)。 + ### FAQ diff --git a/docs/zh_CN/image_recognition_pipeline/vector_search.md b/docs/zh_CN/image_recognition_pipeline/vector_search.md index 6cf4d207ddfa5f3cade2ac727b12df2038f3943c..be0bf785c9b4844a9e6d2ae744ceb37c5ddbfed7 100644 --- a/docs/zh_CN/image_recognition_pipeline/vector_search.md +++ b/docs/zh_CN/image_recognition_pipeline/vector_search.md @@ -1,5 +1,21 @@ # 向量检索 +## 目录 + +- [1. 向量检索应用场景介绍](#1) +- [2. 向量检索算法介绍](#2) + - [2.1 HNSW](#2.1) + - [2.2 IVF](#2.2) + - [2.3 FLAT](#2.3) +- [3. 检索库安装](#3) +- [4. 使用及配置文档介绍](#4) + - [4.1 建库及配置文件参数](#4.1) + - [4.2 检索配置文件参数](#4.2) + + + +## 1. 向量检索应用场景介绍 + 向量检索技术在图像识别、图像检索中应用比较广泛。其主要目标是,对于给定的查询向量,在已经建立好的向量库中,与库中所有的待查询向量,进行特征向量的相似度或距离计算,得到相似度排序。在图像识别系统中,我们使用 [Faiss](https://github.com/facebookresearch/faiss) 对此部分进行支持,具体信息请详查 [Faiss 官网](https://github.com/facebookresearch/faiss)。`Faiss` 主要有以下优势 - 适配性好:支持 Windos、Linux、MacOS 系统 @@ -20,17 +36,33 @@ -------------------------- -## 目录 + +## 2. 使用的检索算法 + +目前 `PaddleClas` 中检索模块,支持三种检索算法**HNSW32**、**IVF**、**FLAT**。每种检索算法,满足不同场景。其中 `HNSW32` 为默认方法,此方法的检索精度、检索速度可以取得一个较好的平衡,具体算法介绍可以查看[官方文档](https://github.com/facebookresearch/faiss/wiki)。 + + +### 2.1 HNSW方法 + +此方法为图索引方法,如下图所示,在建立索引的时候,分为不同的层,所以检索精度较高,速度较快,但是特征库只支持添加图像功能,不支持删除图像特征功能。基于图的向量检索算法在向量检索的评测中性能都是比较优异的。如果比较在乎检索算法的效率,而且可以容忍一定的空间成本,多数场景下比较推荐基于图的检索算法。而HNSW是一种典型的,应用广泛的图算法,很多分布式检索引擎都对HNSW算法进行了分布式改造,以应用于高并发,大数据量的线上查询。此方法为默认方法。 +
+ +
+ + +### 2.2 IVF -- [1. 检索库安装](#1) -- [2. 使用的检索算法](#2) -- [3. 使用及配置文档介绍](#3) - - [3.1 建库及配置文件参数](#3.1) - - [3.2 检索配置文件参数](#3.2) +一种倒排索引检索方法。速度较快,但是精度略低。特征库支持增加、删除图像特征功能。IVF主要利用倒排的思想保存每个聚类中心下的向量,每次查询向量的时候找到最近的几个中心,分别搜索这几个中心下的向量。通过减小搜索范围,大大提升搜索效率。 - + +### 2.3 FLAT -## 1. 检索库安装 +暴力检索算法。精度最高,但是数据量大时,检索速度较慢。特征库支持增加、删除图像特征功能。 + + + + +## 3. 检索库安装 `Faiss` 具体安装方法如下: @@ -40,27 +72,16 @@ pip install faiss-cpu==1.7.1post2 若使用时,不能正常引用,则 `uninstall` 之后,重新 `install`,尤其是 `windows` 下。 - - -## 2. 使用的检索算法 - -目前 `PaddleClas` 中检索模块,支持如下三种检索算法 - -- **HNSW32**: 一种图索引方法。检索精度较高,速度较快。但是特征库只支持添加图像功能,不支持删除图像特征功能。(默认方法) -- **IVF**:倒排索引检索方法。速度较快,但是精度略低。特征库支持增加、删除图像特征功能。 -- **FLAT**: 暴力检索算法。精度最高,但是数据量大时,检索速度较慢。特征库支持增加、删除图像特征功能。 - -每种检索算法,满足不同场景。其中 `HNSW32` 为默认方法,此方法的检索精度、检索速度可以取得一个较好的平衡,具体算法介绍可以查看[官方文档](https://github.com/facebookresearch/faiss/wiki)。 - + -## 3. 使用及配置文档介绍 +## 4. 使用及配置文档介绍 -涉及检索模块配置文件位于:`deploy/configs/` 下,其中 `build_*.yaml` 是建立特征库的相关配置文件,`inference_*.yaml` 是检索或者分类的推理配置文件。 +涉及检索模块配置文件位于:`deploy/configs/` 下,其中 `inference_*.yaml` 是检索或者分类的推理配置文件,同时也是建立特征库的相关配置文件。 - + -### 3.1 建库及配置文件参数 +### 4.1 建库及配置文件参数 建库的具体操作如下: @@ -68,14 +89,14 @@ pip install faiss-cpu==1.7.1post2 # 进入 deploy 目录 cd deploy # yaml 文件根据需要改成自己所需的具体 yaml 文件 -python python/build_gallery.py -c configs/build_***.yaml +python python/build_gallery.py -c configs/inference_***.yaml ``` 其中 `yaml` 文件的建库的配置如下,在运行时,请根据实际情况进行修改。建库操作会将根据 `data_file` 的图像列表,将 `image_root` 下的图像进行特征提取,并在 `index_dir` 下进行存储,以待后续检索使用。 其中 `data_file` 文件存储的是图像文件的路径和标签,每一行的格式为:`image_path label`。中间间隔以 `yaml` 文件中 `delimiter` 参数作为间隔。 -关于特征提取的具体模型参数,可查看 `yaml` 文件。 +关于特征提取的具体模型参数,可查看 `yaml` 文件。注意下面的配置参数只列举了建立索引库相关部分。 ```yaml # indexing engine config @@ -88,6 +109,7 @@ IndexProcess: delimiter: "\t" dist_type: "IP" embedding_size: 512 + batch_size: 32 ``` - **index_method**:使用的检索算法。目前支持三种,HNSW32、IVF、Flat @@ -98,23 +120,29 @@ IndexProcess: - **delimiter**:**data_file** 中每一行的间隔符 - **dist_type**: 特征匹配过程中使用的相似度计算方式。例如 `IP` 内积相似度计算方式,`L2` 欧式距离计算方法 - **embedding_size**:特征维度 +- **batch_size**:建立特征库时,特征提取的`batch_size` - + + +### 4.2 检索配置文件参数 -### 3.2 检索配置文件参数 将检索的过程融合到 `PP-ShiTu` 的整体流程中,请参考 [README](../../../README_ch.md) 中 `PP-ShiTu 图像识别系统介绍` 部分。检索具体使用操作请参考[识别快速开始文档](../quick_start/quick_start_recognition.md)。 其中,检索部分配置如下,整体检索配置文件,请参考 `deploy/configs/inference_*.yaml` 文件。 +注意:此部分参数只是列举了离线检索相关部分参数。 + ```yaml IndexProcess: index_dir: "./recognition_demo_data_v1.1/gallery_logo/index/" return_k: 5 score_thres: 0.5 + hamming_radius: 100 ``` 与建库配置文件不同,新参数主要如下: - `return_k`: 检索结果返回 `k` 个结果 - `score_thres`: 检索匹配的阈值 +- `hamming_radius`: 汉明距离半径。此参数只有在使用二值特征模型,`dist_type`设置为`hamming`时才能生效。具体二值特征模型使用方法请参考[哈希编码](./deep_hashing.md) diff --git a/docs/zh_CN/inference_deployment/export_model.md b/docs/zh_CN/inference_deployment/export_model.md index 1d8decb2837c0f68f71a6b022b05e574ce3ef83b..5e7d204c5f3e9755d2c97428c040fe7c2aa328e2 100644 --- a/docs/zh_CN/inference_deployment/export_model.md +++ b/docs/zh_CN/inference_deployment/export_model.md @@ -17,7 +17,7 @@ PaddlePaddle 支持导出 inference 模型用于部署推理场景,相比于 ## 1. 环境准备 -首先请参考文档[安装 PaddlePaddle](../installation/install_paddle.md)和文档[安装 PaddleClas](../installation/install_paddleclas.md)配置运行环境。 +首先请参考文档文档[环境准备](../installation/install_paddleclas.md)配置运行环境。 ## 2. 
分类模型导出 diff --git a/docs/zh_CN/inference_deployment/python_deploy.md b/docs/zh_CN/inference_deployment/python_deploy.md index 39843df12d17265fc586b160003e3361edb8a14a..9d4f254fdde8400b369dc54a4437dcc5f6929126 100644 --- a/docs/zh_CN/inference_deployment/python_deploy.md +++ b/docs/zh_CN/inference_deployment/python_deploy.md @@ -2,14 +2,15 @@ --- -首先请参考文档[安装 PaddlePaddle](../installation/install_paddle.md)和文档[安装 PaddleClas](../installation/install_paddleclas.md)配置运行环境。 +首先请参考文档[环境准备](../installation/install_paddleclas.md)配置运行环境。 ## 目录 -- [1. 图像分类推理](#1) -- [2. 主体检测模型推理](#2) -- [3. 特征提取模型推理](#3) -- [4. 主体检测、特征提取和向量检索串联](#4) +- [1. 图像分类模型推理](#1) +- [2. PP-ShiTu模型推理](#2) + - [2.1 主体检测模型推理](#2.1) + - [2.2 特征提取模型推理](#2.2) + - [2.3 PP-ShiTu PipeLine推理](#2.3) ## 1. 图像分类推理 @@ -42,7 +43,12 @@ python python/predict_cls.py -c configs/inference_cls.yaml * 如果你希望提升评测模型速度,使用 GPU 评测时,建议开启 TensorRT 加速预测,使用 CPU 评测时,建议开启 MKL-DNN 加速预测。 -## 2. 主体检测模型推理 +## 2. PP-ShiTu模型推理 + +PP-ShiTu整个Pipeline包含三部分:主体检测、特提取模型、特征检索。其中主体检测、特征模型可以单独推理使用。单独主体检测详见[2.1](#2.1),特征提取模型单独推理详见[2.2](#2.2), PP-ShiTu整体推理详见[2.3](#2.3)。 + + +### 2.1 主体检测模型推理 进入 PaddleClas 的 `deploy` 目录下: @@ -70,8 +76,8 @@ python python/predict_det.py -c configs/inference_det.yaml * `Global.use_gpu`: 是否使用 GPU 预测,默认为 `True`。 - -## 3. 特征提取模型推理 + +### 2.2 特征提取模型推理 下面以商品特征提取为例,介绍特征提取模型推理。首先进入 PaddleClas 的 `deploy` 目录下: @@ -90,7 +96,7 @@ tar -xf ./models/product_ResNet50_vd_aliproduct_v1.0_infer.tar -C ./models/ 上述预测命令可以得到一个 512 维的特征向量,直接输出在在命令行中。 - -## 4. 主体检测、特征提取和向量检索串联 + +### 2.3. PP-ShiTu PipeLine推理 主体检测、特征提取和向量检索的串联预测,可以参考图像识别[快速体验](../quick_start/quick_start_recognition.md)。 diff --git a/docs/zh_CN/inference_deployment/whl_deploy.md b/docs/zh_CN/inference_deployment/whl_deploy.md index 14582ace5ce13636c7c14e7fdb9ba9ad2ebbfe90..e6ad70904853d17f89974ff62b812a3420d21a2b 100644 --- a/docs/zh_CN/inference_deployment/whl_deploy.md +++ b/docs/zh_CN/inference_deployment/whl_deploy.md @@ -18,7 +18,7 @@ PaddleClas 支持 Python Whl 包方式进行预测,目前 Whl 包方式仅支 - [4.6 对 `NumPy.ndarray` 格式数据进行预测](#4.6) - [4.7 保存预测结果](#4.7) - [4.8 指定 label name](#4.8) - + ## 1. 安装 paddleclas @@ -212,14 +212,14 @@ print(next(result)) ```python from paddleclas import PaddleClas clas = PaddleClas(model_name='ResNet50', save_dir='./output_pre_label/') -infer_imgs = 'docs/images/whl/' # it can be infer_imgs folder path which contains all of images you want to predict. +infer_imgs = 'docs/images/' # it can be infer_imgs folder path which contains all of images you want to predict. result=clas.predict(infer_imgs) print(next(result)) ``` * CLI ```bash -paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/' --save_dir='./output_pre_label/' +paddleclas --model_name='ResNet50' --infer_imgs='docs/images/' --save_dir='./output_pre_label/' ``` diff --git a/docs/zh_CN/installation/install_paddle.md b/docs/zh_CN/installation/install_paddle.md deleted file mode 100644 index 995d28797c3078956af5571ef11506c2028481e4..0000000000000000000000000000000000000000 --- a/docs/zh_CN/installation/install_paddle.md +++ /dev/null @@ -1,101 +0,0 @@ -# 安装 PaddlePaddle - ---- -## 目录 - -- [1. 环境要求](#1) -- [2.(建议)使用 Docker 环境](#2) -- [3. 通过 pip 安装 PaddlePaddle](#3) -- [4. 验证安装](#4) - -目前,**PaddleClas** 要求 **PaddlePaddle** 版本 `>=2.0`。建议使用我们提供的 Docker 运行 PaddleClas,有关 Docker、nvidia-docker 的相关使用教程可以参考[链接](https://www.runoob.com/Docker/Docker-tutorial.html)。如果不使用 Docker,可以直接跳过 [2.(建议)使用 Docker 环境](#2) 部分内容,从 [3. 通过 pip 安装 PaddlePaddle](#3) 部分开始。 - - - -## 1. 
环境要求 - -**版本要求**: -- python 3.x -- CUDA >= 10.1(如果使用 `paddlepaddle-gpu`) -- cuDNN >= 7.6.4(如果使用 `paddlepaddle-gpu`) -- nccl >= 2.1.2(如果使用分布式训练/评估) -- gcc >= 8.2 - -**建议**: -* 当 CUDA 版本为 10.1 时,显卡驱动版本 `>= 418.39`; -* 当 CUDA 版本为 10.2 时,显卡驱动版本 `>= 440.33`; -* 更多 CUDA 版本与要求的显卡驱动版本可以参考[链接](https://docs.nvidia.com/deploy/cuda-compatibility/index.html)。 - - - -## 2.(建议)使用 Docker 环境 - -* 切换到工作目录下 - -```shell -cd /home/Projects -``` - -* 创建 docker 容器 - -下述命令会创建一个名为 ppcls 的 Docker 容器,并将当前工作目录映射到容器内的 `/paddle` 目录。 - -```shell -# 对于 GPU 用户 -sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash - -# 对于 CPU 用户 -sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash -``` - -**注意**: -* 首次使用该镜像时,下述命令会自动下载该镜像文件,下载需要一定的时间,请耐心等待; -* 上述命令会创建一个名为 ppcls 的 Docker 容器,之后再次使用该容器时无需再次运行该命令; -* 参数 `--shm-size=8G` 将设置容器的共享内存为 8 G,如机器环境允许,建议将该参数设置较大,如 `64G`; -* 您也可以访问 [DockerHub](https://hub.Docker.com/r/paddlepaddle/paddle/tags/) 获取与您机器适配的镜像; -* 退出/进入 docker 容器: - * 在进入 Docker 容器后,可使用组合键 `Ctrl + P + Q` 退出当前容器,同时不关闭该容器; - * 如需再次进入容器,可使用下述命令: - - ```shell - sudo Docker exec -it ppcls /bin/bash - ``` - - - -## 3. 通过 pip 安装 PaddlePaddle - -可运行下面的命令,通过 pip 安装最新版本 PaddlePaddle: - -```bash -# 对于 CPU 用户 -pip install paddlepaddle --upgrade -i https://mirror.baidu.com/pypi/simple - -# 对于 GPU 用户 -pip install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple -``` - -**注意:** -* 如果先安装了 CPU 版本的 PaddlePaddle,之后想切换到 GPU 版本,那么需要使用 pip 先卸载 CPU 版本的 PaddlePaddle,再安装 GPU 版本的 PaddlePaddle,否则容易导致 PaddlePaddle 冲突。 -* 您也可以从源码编译安装 PaddlePaddle,请参照 [PaddlePaddle 安装文档](http://www.paddlepaddle.org.cn/install/quick) 中的说明进行操作。 - - -## 4. 验证安装 - -使用以下命令可以验证 PaddlePaddle 是否安装成功。 - -```python -import paddle -paddle.utils.run_check() -``` - -查看 PaddlePaddle 版本的命令如下: - -```bash -python -c "import paddle; print(paddle.__version__)" -``` - -**注意**: -- 从源码编译的 PaddlePaddle 版本号为 `0.0.0`,请确保使用 PaddlePaddle 2.0 及之后的源码进行编译; -- PaddleClas 基于 PaddlePaddle 高性能的分布式训练能力,若您从源码编译,请确保打开编译选项 `WITH_DISTRIBUTE=ON`。具体编译选项参考 [编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#bianyixuanxiangbiao); -- 在 Docker 中运行时,为保证 Docker 容器有足够的共享内存用于 Paddle 的数据读取加速,在创建 Docker 容器时,请设置参数 `--shm-size=8g`,条件允许的话可以设置为更大的值。 diff --git a/docs/zh_CN/installation/install_paddleclas.md b/docs/zh_CN/installation/install_paddleclas.md index 0f70bf2364589dbe85bf09128fc034d9d250d22b..e02acc6fdae4f211c07232489d07b31bd187da1d 100644 --- a/docs/zh_CN/installation/install_paddleclas.md +++ b/docs/zh_CN/installation/install_paddleclas.md @@ -1,29 +1,94 @@ -# 安装 PaddleClas +# 环境准备 --- ## 目录 -* [1. 克隆 PaddleClas](#1) -* [2. 安装 Python 依赖库](#2) +- [1. 安装 PaddlePaddle](#1) + - [1.1 使用Paddle官方镜像](#1.1) + - [1.2 在现有环境中安装paddle](#1.2) + - [1.3 安装验证](#1.3) +- [2. 克隆 PaddleClas](#2) +- [3. 安装 Python 依赖库](#3) +### 1.安装PaddlePaddle +目前,**PaddleClas** 要求 **PaddlePaddle** 版本 `>=2.3`。 +建议使用Paddle官方提供的 Docker 镜像运行 PaddleClas,有关 Docker、nvidia-docker 的相关使用教程可以参考[链接](https://www.runoob.com/Docker/Docker-tutorial.html)。 -## 1. 
克隆 PaddleClas + + +#### 1.1(建议)使用 Docker 环境 + +* 切换到工作目录下,例如工作目录为`/home/Projects`,则运行命令: + +```shell +cd /home/Projects +``` + +* 创建 docker 容器 + +下述命令会创建一个名为 ppcls 的 Docker 容器,并将当前工作目录映射到容器内的 `/paddle` 目录。 + +```shell +# 对于 GPU 用户 +sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it registry.baidubce.com/paddlepaddle/paddle:2.3.0-gpu-cuda10.2-cudnn7 /bin/bash + +# 对于 CPU 用户 +sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.3.0-gpu-cuda10.2-cudnn7 /bin/bash +``` + +**注意**: +* 首次使用该镜像时,下述命令会自动下载该镜像文件,下载需要一定的时间,请耐心等待; +* 上述命令会创建一个名为 ppcls 的 Docker 容器,之后再次使用该容器时无需再次运行该命令; +* 参数 `--shm-size=8G` 将设置容器的共享内存为 8 G,如机器环境允许,建议将该参数设置较大,如 `64G`; +* 您也可以访问 [DockerHub](https://hub.Docker.com/r/paddlepaddle/paddle/tags/) ,手动选择需要的镜像; +* 退出/进入 docker 容器: + * 在进入 Docker 容器后,可使用组合键 `Ctrl + P + Q` 退出当前容器,同时不关闭该容器; + * 如需再次进入容器,可使用下述命令: + + ```shell + sudo Docker exec -it ppcls /bin/bash + ``` + +#### 1.2 在现有环境中安装paddle +您也可以用pip或conda直接安装paddle,详情请参考官方文档中的[快速安装](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/docker/linux-docker.html)部分。 + +#### 1.3 安装验证 +使用以下命令可以验证 PaddlePaddle 是否安装成功。 +```python +import paddle +paddle.utils.run_check() +``` +查看 PaddlePaddle 版本的命令如下: + +```bash +python -c "import paddle; print(paddle.__version__)" +``` + +**注意**: +- 从源码编译的 PaddlePaddle 版本号为 `0.0.0`,请确保使用 PaddlePaddle 2.3 及之后的源码进行编译; +- PaddleClas 基于 PaddlePaddle 高性能的分布式训练能力,若您从源码编译,请确保打开编译选项 `WITH_DISTRIBUTE=ON`。具体编译选项参考 [编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#bianyixuanxiangbiao); +- 在 Docker 中运行时,为保证 Docker 容器有足够的共享内存用于 Paddle 的数据读取加速,在创建 Docker 容器时,请设置参数 `--shm-size=8g`,条件允许的话可以设置为更大的值。 + + + + +### 2. 克隆 PaddleClas 从 GitHub 下载: ```shell -git clone https://github.com/PaddlePaddle/PaddleClas.git -b release/2.3 +git clone https://github.com/PaddlePaddle/PaddleClas.git -b release/2.4 ``` 如果访问 GitHub 网速较慢,可以从 Gitee 下载,命令如下: ```shell -git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.3 +git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.4 ``` - + -## 2. 安装 Python 依赖库 +### 3. 安装 Python 依赖库 PaddleClas 的 Python 依赖库在 `requirements.txt` 中给出,可通过如下命令安装: diff --git a/docs/zh_CN/models/PP-HGNet.md b/docs/zh_CN/models/PP-HGNet.md index d4b4a975d105f632a46c75a78b89089bdb1590e0..1150c87584319024767af1d3564f135d5391d83d 100644 --- a/docs/zh_CN/models/PP-HGNet.md +++ b/docs/zh_CN/models/PP-HGNet.md @@ -1,20 +1,43 @@ # PP-HGNet 系列 --- -## 目录 - -* [1. 概述](#1) -* [2. 结构信息](#2) -* [3. 实验结果](#3) +- [1. 模型介绍](#1) + - [1.1 模型简介](#1.1) + - [1.2 模型细节](#1.2) + - [1.3 实验结果](#1.3) +- [2. 模型快速体验](#2) + - [2.1 安装 paddleclas](#2.1) + - [2.2 预测](#2.2) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型推理部署](#4) + - [4.1 推理模型准备](#4.1) + - [4.1.1 基于训练得到的权重导出 inference 模型](#4.1.1) + - [4.1.2 直接下载 inference 模型](#4.1.2) + - [4.2 基于 Python 预测引擎推理](#4.2) + - [4.2.1 预测单张图像](#4.2.1) + - [4.2.2 基于文件夹的批量预测](#4.2.2) + - [4.3 基于 C++ 预测引擎推理](#4.3) + - [4.4 服务化部署](#4.4) + - [4.5 端侧部署](#4.5) + - [4.6 Paddle2ONNX 模型转换与预测](#4.6) -## 1. 概述 +## 1. 
模型介绍 + + + +### 1.1 模型简介 -PP-HGNet(High Performance GPU Net) 是百度飞桨视觉团队自研的更适用于 GPU 平台的高性能骨干网络,该网络在 VOVNet 的基础上使用了可学习的下采样层(LDS Layer),融合了 ResNet_vd、PPLCNet 等模型的优点,该模型在 GPU 平台上与其他 SOTA 模型在相同的速度下有着更高的精度。在同等速度下,该模型高于 ResNet34-D 模型 3.8 个百分点,高于 ResNet50-D 模型 2.4 个百分点,在使用百度自研 SSLD 蒸馏策略后,超越 ResNet50-D 模型 4.7 个百分点。与此同时,在相同精度下,其推理速度也远超主流 VisionTransformer 的推理速度。 +PP-HGNet(High Performance GPU Net) 是百度飞桨视觉团队自研的更适用于 GPU 平台的高性能骨干网络,该网络在 VOVNet 的基础上使用了可学习的下采样层(LDS Layer),融合了 ResNet_vd、PPHGNet 等模型的优点,该模型在 GPU 平台上与其他 SOTA 模型在相同的速度下有着更高的精度。在同等速度下,该模型高于 ResNet34-D 模型 3.8 个百分点,高于 ResNet50-D 模型 2.4 个百分点,在使用百度自研 SSLD 蒸馏策略后,超越 ResNet50-D 模型 4.7 个百分点。与此同时,在相同精度下,其推理速度也远超主流 VisionTransformer 的推理速度。 - + -## 2. 结构信息 +### 1.2 模型细节 PP-HGNet 作者针对 GPU 设备,对目前 GPU 友好的网络做了分析和归纳,尽可能多的使用 3x3 标准卷积(计算密度最高)。在此将 VOVNet 作为基准模型,将主要的有利于 GPU 推理的改进点进行融合。从而得到一个有利于 GPU 推理的骨干网络,同样速度下,精度大幅超越其他 CNN 或者 VisionTransformer 模型。 @@ -26,14 +49,29 @@ PP-HGNet 骨干网络的整体结构如下: ![](../../images/PP-HGNet/PP-HGNet-block.png) - + + +### 1.3 实验结果 + +PP-HGNet 目前提供的模型的精度、速度指标及预训练权重链接如下: + +| Model | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | 预训练模型下载地址 | inference模型下载地址 | +|:--: |:--: |:--: |:--: | :--: |:--: | +| PPHGNet_tiny | 79.83 | 95.04 | 1.77 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar) | +| PPHGNet_tiny_ssld | 81.95 | 96.12 | 1.77 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_ssld_infer.tar) | +| PPHGNet_small | 81.51| 95.82 | 2.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar) | +| PPHGNet_small_ssld | 83.82| 96.81 | 2.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_ssld_infer.tar) | +| PPHGNet_base_ssld | 85.00| 97.35 | 5.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_base_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_base_ssld_infer.tar) | -## 3. 实验结果 +**备注:** + +* 1. `_ssld` 表示使用 `SSLD 蒸馏`后的模型。关于 `SSLD蒸馏` 的内容,详情 [SSLD 蒸馏](../advanced_tutorials/knowledge_distillation.md)。 +* 2. PP-HGNet 更多模型指标及权重,敬请期待。 PP-HGNet 与其他模型的比较如下,其中测试机器为 NVIDIA® Tesla® V100,开启 TensorRT 引擎,精度类型为 FP32。在相同速度下,PP-HGNet 精度均超越了其他 SOTA CNN 模型,在与 SwinTransformer 模型的比较中,在更高精度的同时,速度快 2 倍以上。 | Model | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | -|-------|---------------|---------------|-------------| +|:--: |:--: |:--: |:--: | | ResNet34 | 74.57 | 92.14 | 1.97 | | ResNet34_vd | 75.98 | 92.98 | 2.00 | | EfficientNetB0 | 77.38 | 93.31 | 1.96 | @@ -46,6 +84,304 @@ PP-HGNet 与其他模型的比较如下,其中测试机器为 NVIDIA® Tesla® | SwinTransformer_tiny | 81.2 | 95.5 | 6.59 | | PPHGNet_small | 81.51| 95.82 | 2.52 | | PPHGNet_small_ssld | 83.82| 96.81 | 2.52 | +| Res2Net200_vd_26w_4s_ssld| 85.13 | 97.42 | 11.45 | +| ResNeXt101_32x48d_wsl | 85.37 | 97.69 | 55.07 | +| SwinTransformer_base | 85.2 | 97.5 | 13.53 | +| PPHGNet_base_ssld | 85.00| 97.35 | 5.97 | + + + + +## 2. 
模型快速体验 + + + +### 2.1 安装 paddleclas + +使用如下命令快速安装 paddlepaddle, paddleclas + +``` +pip3 install paddlepaddle paddleclas +``` + + +### 2.2 预测 + +* 在命令行中使用 PPHGNet_small 的权重快速预测 + +```bash +paddleclas --model_name=PPHGNet_small --infer_imgs="docs/images/inference_deployment/whl_demo.jpg" +``` + +结果如下: +``` +>>> result +class_ids: [8, 7, 86, 82, 81], scores: [0.71479, 0.08682, 0.00806, 0.0023, 0.00121], label_names: ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'ptarmigan'], filename: docs/images/inference_deployment/whl_demo.jpg +Predict complete! +``` + +**备注**: 更换 PPHGNet 的其他 scale 的模型时,只需替换 `model_name`,如将此时的模型改为 `PPHGNet_tiny` 时,只需要将 `--model_name=PPHGNet_small` 改为 `--model_name=PPHGNet_tiny` 即可。 + + +* 在 Python 代码中预测 +```python +from paddleclas import PaddleClas +clas = PaddleClas(model_name='PPHGNet_small') +infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg' +result = clas.predict(infer_imgs) +print(next(result)) +``` + +**备注**:`PaddleClas.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭 +代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果。返回结果示例如下: + +``` +>>> result +[{'class_ids': [8, 7, 86, 82, 81], 'scores': [0.71479, 0.08682, 0.00806, 0.0023, 0.00121], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'ptarmigan'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}] +``` + + + + +## 3. 模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档[环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + +请在[ImageNet 官网](https://www.image-net.org/)准备 ImageNet-1k 相关的数据。 + + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,将下载好的数据命名为 `ILSVRC2012` ,存放于此。 `ILSVRC2012` 目录中具有以下数据: + +``` +├── train +│   ├── n01440764 +│   │   ├── n01440764_10026.JPEG +│   │   ├── n01440764_10027.JPEG +├── train_list.txt +... 
+├── val +│   ├── ILSVRC2012_val_00000001.JPEG +│   ├── ILSVRC2012_val_00000002.JPEG +├── val_list.txt +``` + +其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。 + +**备注:** + +* 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 + + + + +### 3.3 模型训练 + + +在 `ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml` 中提供了 PPHGNet_small 训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml +``` + + +**备注:** + +* 当前精度最佳的模型会保存在 `output/PPHGNet_small/best_model.pdparams` + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml \ + -o Global.pretrained_model=output/PPHGNet_small/best_model +``` + +其中 `-o Global.pretrained_model="output/PPHGNet_small/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```python +python3 tools/infer.py \ + -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml \ + -o Global.pretrained_model=output/PPHGNet_small/best_model +``` + +输出结果如下: + +``` +[{'class_ids': [8, 7, 86, 82, 81], 'scores': [0.71479, 0.08682, 0.00806, 0.0023, 0.00121], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'ptarmigan']}] +``` + +**备注:** + +* 这里`-o Global.pretrained_model="output/PPHGNet_small/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + +* 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 + +* 默认输出的是 Top-5 的值,如果希望输出 Top-k 的值,可以指定`-o Infer.PostProcess.topk=k`,其中,`k` 为您指定的值。 + + + + + +## 4. 
模型推理部署 + + + +### 4.1 推理模型准备 + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 + + + + +### 4.1.1 基于训练得到的权重导出 inference 模型 + +此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型: + +```bash +python3 tools/export_model.py \ + -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml \ + -o Global.pretrained_model=output/PPHGNet_small/best_model \ + -o Global.save_inference_dir=deploy/models/PPHGNet_small_infer +``` +执行完该脚本后会在 `deploy/models/` 下生成 `PPHGNet_small_infer` 文件夹,`models` 文件夹下应有如下文件结构: + +``` +├── PPHGNet_small_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + + +### 4.1.2 直接下载 inference 模型 + +[4.1.1 小节](#4.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。 + +``` +cd deploy/models +# 下载 inference 模型并解压 +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar && tar -xf PPHGNet_small_infer.tar +``` + +解压完毕后,`models` 文件夹下应有如下文件结构: + +``` +├── PPHGNet_small_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + +### 4.2 基于 Python 预测引擎推理 + + + + +#### 4.2.1 预测单张图像 + +返回 `deploy` 目录: + +``` +cd ../ +``` + +运行下面的命令,对图像 `./images/ImageNet/ILSVRC2012_val_00000010.jpeg` 进行分类。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPHGNet_small_infer +# 使用下面的命令使用 CPU 进行预测 +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPHGNet_small_infer -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +ILSVRC2012_val_00000010.jpeg: class id(s): [332, 153, 283, 338, 204], score(s): [0.50, 0.05, 0.02, 0.01, 0.01], label_name(s): ['Angora, Angora rabbit', 'Maltese dog, Maltese terrier, Maltese', 'Persian cat', 'guinea pig, Cavia cobaya', 'Lhasa, Lhasa apso'] +``` + + + +#### 4.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPHGNet_small_infer -o Global.infer_imgs=images/ImageNet/ +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +ILSVRC2012_val_00000010.jpeg: class id(s): [332, 153, 283, 338, 204], score(s): [0.50, 0.05, 0.02, 0.01, 0.01], label_name(s): ['Angora, Angora rabbit', 'Maltese dog, Maltese terrier, Maltese', 'Persian cat', 'guinea pig, Cavia cobaya', 'Lhasa, Lhasa apso'] +ILSVRC2012_val_00010010.jpeg: class id(s): [626, 622, 531, 487, 633], score(s): [0.68, 0.02, 0.02, 0.02, 0.02], label_name(s): ['lighter, light, igniter, ignitor', 'lens cap, lens cover', 'digital watch', 'cellular telephone, cellular phone, cellphone, cell, mobile phone', "loupe, jeweler's loupe"] +ILSVRC2012_val_00020010.jpeg: class id(s): [178, 211, 171, 246, 741], score(s): [0.82, 0.00, 0.00, 0.00, 0.00], label_name(s): ['Weimaraner', 'vizsla, Hungarian pointer', 'Italian greyhound', 'Great Dane', 'prayer rug, prayer mat'] +ILSVRC2012_val_00030010.jpeg: class id(s): [80, 83, 136, 23, 93], score(s): [0.84, 0.00, 0.00, 0.00, 0.00], label_name(s): ['black grouse', 'prairie chicken, prairie grouse, 
prairie fowl', 'European gallinule, Porphyrio porphyrio', 'vulture', 'hornbill'] +``` + + + + +### 4.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 4.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 4.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 4.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 -关于更多 PP-HGNet 的介绍以及下游任务的表现,敬请期待。 diff --git a/docs/zh_CN/models/PP-LCNet.md b/docs/zh_CN/models/PP-LCNet.md index 7fea973d4634228fefcccff0e7e856546b3b9652..2c9627cfb195a7b384c262c0b4076bdf15bd3de3 100644 --- a/docs/zh_CN/models/PP-LCNet.md +++ b/docs/zh_CN/models/PP-LCNet.md @@ -3,54 +3,76 @@ ## 目录 -- [1. 摘要](#1) -- [2. 介绍](#2) -- [3. 方法](#3) - - [3.1 更好的激活函数](#3.1) - - [3.2 合适的位置添加 SE 模块](#3.2) - - [3.3 合适的位置添加更大的卷积核](#3.3) - - [3.4 GAP 后使用更大的 1x1 卷积层](#3.4) -- [4. 实验部分](#4) - - [4.1 图像分类](#4.1) - - [4.2 目标检测](#4.2) - - [4.3 语义分割](#4.3) -- [5. 基于 V100 GPU 的预测速度](#5) -- [6. 基于 SD855 的预测速度](#6) -- [7. 总结](#7) -- [8. 引用](#8) +- [1. 模型介绍](#1) + - [1.1 模型简介](#1.1) + - [1.2 模型细节](#1.2) + - [1.2.1 更好的激活函数](#1.2.1) + - [1.2.2 合适的位置添加 SE 模块](#1.2.2) + - [1.2.3 合适的位置添加更大的卷积核](#1.2.3) + - [1.2.4 GAP 后使用更大的 1x1 卷积层](#1.2.4) + - [1.3 实验结果](#1.3) + - [1.4 Benchmark](#1.4) + - [1.4.1 基于 Intel Xeon Gold 6148 的预测速度](#1.4.1) + - [1.4.2 基于 V100 GPU 的预测速度](#1.4.2) + - [1.4.3 基于 SD855 的预测速度](#1.4.3) +- [2. 模型快速体验](#2) + - [2.1 安装 paddleclas](#2.1) + - [2.2 预测](#2.2) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型推理部署](#4) + - [4.1 推理模型准备](#4.1) + - [4.1.1 基于训练得到的权重导出 inference 模型](#4.1.1) + - [4.1.2 直接下载 inference 模型](#4.1.2) + - [4.2 基于 Python 预测引擎推理](#4.2) + - [4.2.1 预测单张图像](#4.2.1) + - [4.2.2 基于文件夹的批量预测](#4.2.2) + - [4.3 基于 C++ 预测引擎推理](#4.3) + - [4.4 服务化部署](#4.4) + - [4.5 端侧部署](#4.5) + - [4.6 Paddle2ONNX 模型转换与预测](#4.6) +- [5. 引用](#5) + + -## 1. 摘要 +## 1. 模型介绍 -在计算机视觉领域中,骨干网络的好坏直接影响到整个视觉任务的结果。在之前的一些工作中,相关的研究者普遍将 FLOPs 或者 Params 作为优化目的,但是在工业界真实落地的场景中,推理速度才是考量模型好坏的重要指标,然而,推理速度和准确性很难兼得。考虑到工业界有很多基于 Intel CPU 的应用,所以我们本次的工作旨在使骨干网络更好的适应 Intel CPU,从而得到一个速度更快、准确率更高的轻量级骨干网络,与此同时,目标检测、语义分割等下游视觉任务的性能也同样得到提升。 +### 1.1 模型简介 - -## 2. 
介绍 +在计算机视觉领域中,骨干网络的好坏直接影响到整个视觉任务的结果。在之前的一些工作中,相关的研究者普遍将 FLOPs 或者 Params 作为优化目的,但是在工业界真实落地的场景中,推理速度才是考量模型好坏的重要指标,然而,推理速度和准确性很难兼得。考虑到工业界有很多基于 Intel CPU 的应用,所以我们本次的工作旨在使骨干网络更好的适应 Intel CPU,从而得到一个速度更快、准确率更高的轻量级骨干网络,与此同时,目标检测、语义分割等下游视觉任务的性能也同样得到提升。 近年来,有很多轻量级的骨干网络问世,尤其最近两年,各种 NAS 搜索出的网络层出不穷,这些网络要么主打 FLOPs 或者 Params 上的优势,要么主打 ARM 设备上的推理速度的优势,很少有网络专门针对 Intel CPU 做特定的优化,导致这些网络在 Intel CPU 端的推理速度并不是很完美。基于此,我们针对 Intel CPU 设备以及其加速库 MKLDNN 设计了特定的骨干网络 PP-LCNet,比起其他的轻量级的 SOTA 模型,该骨干网络可以在不增加推理时间的情况下,进一步提升模型的性能,最终大幅度超越现有的 SOTA 模型。与其他模型的对比图如下。 ![](../../images/PP-LCNet/PP-LCNet-Acc.png) - -## 3. 方法 + + +### 1.2 模型细节 网络结构整体如下图所示。 ![](../../images/PP-LCNet/PP-LCNet.png) 我们经过大量的实验发现,在基于 Intel CPU 设备上,尤其当启用 MKLDNN 加速库后,很多看似不太耗时的操作反而会增加延时,比如 elementwise-add 操作、split-concat 结构等。所以最终我们选用了结构尽可能精简、速度尽可能快的 block 组成我们的 BaseNet(类似 MobileNetV1)。基于 BaseNet,我们通过实验,总结了四条几乎不增加延时但是可以提升模型精度的方法,融合这四条策略,我们组合成了 PP-LCNet。下面对这四条策略一一介绍: - -### 3.1 更好的激活函数 + + +#### 1.2.1 更好的激活函数 自从卷积神经网络使用了 ReLU 激活函数后,网络性能得到了大幅度提升,近些年 ReLU 激活函数的变体也相继出现,如 Leaky-ReLU、P-ReLU、ELU 等,2017 年,谷歌大脑团队通过搜索的方式得到了 swish 激活函数,该激活函数在轻量级网络上表现优异,在 2019 年,MobileNetV3 的作者将该激活函数进一步优化为 H-Swish,该激活函数去除了指数运算,速度更快,网络精度几乎不受影响。我们也经过很多实验发现该激活函数在轻量级网络上有优异的表现。所以在 PP-LCNet 中,我们选用了该激活函数。 - -### 3.2 合适的位置添加 SE 模块 + + +#### 1.2.2 合适的位置添加 SE 模块 SE 模块是 SENet 提出的一种通道注意力机制,可以有效提升模型的精度。但是在 Intel CPU 端,该模块同样会带来较大的延时,如何平衡精度和速度是我们要解决的一个问题。虽然在 MobileNetV3 等基于 NAS 搜索的网络中对 SE 模块的位置进行了搜索,但是并没有得出一般的结论,我们通过实验发现,SE 模块越靠近网络的尾部对模型精度的提升越大。下表也展示了我们的一些实验结果: | SE Location | Top-1 Acc(\%) | Latency(ms) | -|-------------------|---------------|-------------| +|:--:|:--:|:--:| | 1100000000000 | 61.73 | 2.06 | | 0000001100000 | 62.17 | 2.03 | | 0000000000011 | 63.14 | 2.05 | @@ -59,13 +81,14 @@ SE 模块是 SENet 提出的一种通道注意力机制,可以有效提升模 最终,PP-LCNet 中的 SE 模块的位置选用了表格中第三行的方案。 - -### 3.3 合适的位置添加更大的卷积核 + + +#### 1.2.3 合适的位置添加更大的卷积核 在 MixNet 的论文中,作者分析了卷积核大小对模型性能的影响,结论是在一定范围内大的卷积核可以提升模型的性能,但是超过这个范围会有损模型的性能,所以作者组合了一种 split-concat 范式的 MixConv,这种组合虽然可以提升模型的性能,但是不利于推理。我们通过实验总结了一些更大的卷积核在不同位置的作用,类似 SE 模块的位置,更大的卷积核在网络的中后部作用更明显,下表展示了 5x5 卷积核的位置对精度的影响: | large-kernel Location | Top-1 Acc(\%) | Latency(ms) | -|-------------------|---------------|-------------| +|:--:|:--:|:--:| | 1111111111111 | 63.22 | 2.08 | | 1111111000000 | 62.70 | 2.07 | | 0000001111111 | 63.14 | 2.05 | @@ -73,48 +96,51 @@ SE 模块是 SENet 提出的一种通道注意力机制,可以有效提升模 实验表明,更大的卷积核放在网络的中后部即可达到放在所有位置的精度,与此同时,获得更快的推理速度。PP-LCNet 最终选用了表格中第三行的方案。 - -### 3.4 GAP 后使用更大的 1x1 卷积层 + + +#### 1.2.4 GAP 后使用更大的 1x1 卷积层 在 GoogLeNet 之后,GAP(Global-Average-Pooling)后往往直接接分类层,但是在轻量级网络中,这样会导致 GAP 后提取的特征没有得到进一步的融合和加工。如果在此后使用一个更大的 1x1 卷积层(等同于 FC 层),GAP 后的特征便不会直接经过分类层,而是先进行了融合,并将融合的特征进行分类。这样可以在不影响模型推理速度的同时大大提升准确率。 BaseNet 经过以上四个方面的改进,得到了 PP-LCNet。下表进一步说明了每个方案对结果的影响: | Activation | SE-block | Large-kernel | last-1x1-conv | Top-1 Acc(\%) | Latency(ms) | -|------------|----------|--------------|---------------|---------------|-------------| +|:--:|:--:|:--:|:--:|:--:|:--:| | 0 | 1 | 1 | 1 | 61.93 | 1.94 | | 1 | 0 | 1 | 1 | 62.51 | 1.87 | | 1 | 1 | 0 | 1 | 62.44 | 2.01 | | 1 | 1 | 1 | 0 | 59.91 | 1.85 | | 1 | 1 | 1 | 1 | 63.14 | 2.05 | - -## 4. 
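+为直观说明“GAP 后使用更大的 1x1 卷积层(等同于 FC 层)”这一设计,下面给出一段可运行的示意代码(类名 LCNetHead 与各通道数均为假设值,并非 PaddleClas 中的实际实现):
+
+```python
+import paddle
+import paddle.nn as nn
+
+class LCNetHead(nn.Layer):
+    def __init__(self, in_ch=512, mid_ch=1280, class_num=1000):
+        super().__init__()
+        self.gap = nn.AdaptiveAvgPool2D(1)            # GAP:全局平均池化
+        self.last_conv = nn.Conv2D(in_ch, mid_ch, 1)  # GAP 后更大的 1x1 卷积,先融合特征
+        self.act = nn.Hardswish()
+        self.fc = nn.Linear(mid_ch, class_num)        # 融合后的特征再进入分类层
+
+    def forward(self, x):
+        x = self.act(self.last_conv(self.gap(x)))
+        return self.fc(paddle.flatten(x, start_axis=1))
+
+head = LCNetHead()
+print(head(paddle.randn([1, 512, 7, 7])).shape)       # [1, 1000]
+```
+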
实验部分 + + +### 1.3 实验结果 - -### 4.1 图像分类 + + +#### 1.3.1 图像分类 图像分类我们选用了 ImageNet 数据集,相比目前主流的轻量级网络,PP-LCNet 在相同精度下可以获得更快的推理速度。当使用百度自研的 SSLD 蒸馏策略后,精度进一步提升,在 Intel cpu 端约 5ms 的推理速度下 ImageNet 的 Top-1 Acc 超过了 80%。 -| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | -|-------|-----------|----------|---------------|---------------|-------------| -| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 | -| PPLCNet_x0_35 | 1.6 | 29 | 58.09 | 80.83 | 1.92 | -| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 | -| PPLCNet_x0_75 | 2.4 | 99 | 68.18 | 88.30 | 2.29 | -| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | -| PPLCNet_x1_5 | 4.5 | 342 | 73.71 | 91.53 | 3.19 | -| PPLCNet_x2_0 | 6.5 | 590 | 75.18 | 92.27 | 4.27 | -| PPLCNet_x2_5 | 9.0 | 906 | 76.60 | 93.00 | 5.39 | -| PPLCNet_x0_5_ssld | 1.9 | 47 | 66.10 | 86.46 | 2.05 | -| PPLCNet_x1_0_ssld | 3.0 | 161 | 74.39 | 92.09 | 2.46 | -| PPLCNet_x2_5_ssld | 9.0 | 906 | 80.82 | 95.33 | 5.39 | +| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | 预训练模型下载地址 | inference模型下载地址 | +|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:| +| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_25_infer.tar) | +| PPLCNet_x0_35 | 1.6 | 29 | 58.09 | 80.83 | 1.92 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_35_infer.tar) | +| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_infer.tar) | +| PPLCNet_x0_75 | 2.4 | 99 | 68.18 | 88.30 | 2.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_75_infer.tar) | +| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar) | +| PPLCNet_x1_5 | 4.5 | 342 | 73.71 | 91.53 | 3.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_5_infer.tar) | +| PPLCNet_x2_0 | 6.5 | 590 | 75.18 | 92.27 | 4.27 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_0_infer.tar) | +| PPLCNet_x2_5 | 9.0 | 906 | 76.60 | 93.00 | 5.39 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_infer.tar) | +| PPLCNet_x0_5_ssld | 1.9 | 47 | 66.10 | 86.46 | 2.05 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) | 
[下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_ssld_infer.tar) | +| PPLCNet_x1_0_ssld | 3.0 | 161 | 74.39 | 92.09 | 2.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_ssld_infer.tar) | +| PPLCNet_x2_5_ssld | 9.0 | 906 | 80.82 | 95.33 | 5.39 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_ssld_infer.tar) | 其中 `_ssld` 表示使用 `SSLD 蒸馏`后的模型。关于 `SSLD蒸馏` 的内容,详情 [SSLD 蒸馏](../advanced_tutorials/knowledge_distillation.md)。 与其他轻量级网络的性能对比: | Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | -|-------|-----------|----------|---------------|---------------|-------------| +|:--:|:--:|:--:|:--:|:--:|:--:| | MobileNetV2_x0_25 | 1.5 | 34 | 53.21 | 76.52 | 2.47 | | MobileNetV3_small_x0_35 | 1.7 | 15 | 53.03 | 76.37 | 3.02 | | ShuffleNetV2_x0_33 | 0.6 | 24 | 53.73 | 77.05 | 4.30 | @@ -128,50 +154,75 @@ BaseNet 经过以上四个方面的改进,得到了 PP-LCNet。下表进一步 | MobileNetV3_small_x1_25 | 3.6 | 100 | 70.67 | 89.51 | 3.95 | | PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | - -### 4.2 目标检测 + + +#### 1.3.2 目标检测 目标检测的方法我们选用了百度自研的 PicoDet,该方法主打轻量级目标检测场景,下表展示了在 COCO 数据集上、backbone 选用 PP-LCNet 与 MobileNetV3 的结果的比较,无论在精度还是速度上,PP-LCNet 的优势都非常明显。 | Backbone | mAP(%) | Latency(ms) | -|-------|-----------|----------| +|:--:|:--:|:--:| MobileNetV3_large_x0_35 | 19.2 | 8.1 | PPLCNet_x0_5 | 20.3 | 6.0 | MobileNetV3_large_x0_75 | 25.8 | 11.1 | PPLCNet_x1_0 | 26.9 | 7.9 | - -### 4.3 语义分割 + + +#### 1.3.3 语义分割 语义分割的方法我们选用了 DeeplabV3+,下表展示了在 Cityscapes 数据集上、backbone 选用 PP-LCNet 与 MobileNetV3 的比较,在精度和速度方面,PP-LCNet 的优势同样明显。 | Backbone | mIoU(%) | Latency(ms) | -|-------|-----------|----------| +|:--:|:--:|:--:| MobileNetV3_large_x0_5 | 55.42 | 135 | PPLCNet_x0_5 | 58.36 | 82 | MobileNetV3_large_x0_75 | 64.53 | 151 | PPLCNet_x1_0 | 66.03 | 96 | - + -## 5. 基于 V100 GPU 的预测速度 +## 1.4 Benchmark -| Models | Crop Size | Resize Short Size | FP32
Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>
(ms) | -| ------------- | --------- | ----------------- | ---------------------------- | -------------------------------- | ------------------------------ | -| PPLCNet_x0_25 | 224 | 256 | 0.72 | 1.17 | 1.71 | -| PPLCNet_x0_35 | 224 | 256 | 0.69 | 1.21 | 1.82 | -| PPLCNet_x0_5 | 224 | 256 | 0.70 | 1.32 | 1.94 | -| PPLCNet_x0_75 | 224 | 256 | 0.71 | 1.49 | 2.19 | -| PPLCNet_x1_0 | 224 | 256 | 0.73 | 1.64 | 2.53 | -| PPLCNet_x1_5 | 224 | 256 | 0.82 | 2.06 | 3.12 | -| PPLCNet_x2_0 | 224 | 256 | 0.94 | 2.58 | 4.08 | + + +#### 1.4.1 基于 Intel Xeon Gold 6148 的预测速度 + +| Model | Latency(ms)
bs=1, thread=10 | +|:--:|:--:| +| PPLCNet_x0_25 | 1.74 | +| PPLCNet_x0_35 | 1.92 | +| PPLCNet_x0_5 | 2.05 | +| PPLCNet_x0_75 | 2.29 | +| PPLCNet_x1_0 | 2.46 | +| PPLCNet_x1_5 | 3.19 | +| PPLCNet_x2_0 | 4.27 | +| PPLCNet_x2_5 | 5.39 | + +**备注:** 精度类型为 FP32,推理过程使用 MKLDNN。 + + + +#### 1.4.2 基于 V100 GPU 的预测速度 + +| Models | Latency(ms)
bs=1 | Latency(ms)<br>bs=4 | Latency(ms)<br>
bs=8 | +| :--: | :--:| :--: | :--: | +| PPLCNet_x0_25 | 0.72 | 1.17 | 1.71 | +| PPLCNet_x0_35 | 0.69 | 1.21 | 1.82 | +| PPLCNet_x0_5 | 0.70 | 1.32 | 1.94 | +| PPLCNet_x0_75 | 0.71 | 1.49 | 2.19 | +| PPLCNet_x1_0 | 0.73 | 1.64 | 2.53 | +| PPLCNet_x1_5 | 0.82 | 2.06 | 3.12 | +| PPLCNet_x2_0 | 0.94 | 2.58 | 4.08 | + +**备注:** 精度类型为 FP32,推理过程使用 TensorRT。 - + -## 6. 基于 SD855 的预测速度 +#### 1.4.3 基于 SD855 的预测速度 -| Models | SD855 time(ms)
bs=1, thread=1 | SD855 time(ms)<br>bs=1, thread=2 | SD855 time(ms)<br>
bs=1, thread=4 | -| ------------- | -------------------------------- | --------------------------------- | --------------------------------- | +| Models | Latency(ms)
bs=1, thread=1 | Latency(ms)<br>bs=1, thread=2 | Latency(ms)<br>
bs=1, thread=4 | +| :--: | :--: | :--: | :--: | | PPLCNet_x0_25 | 2.30 | 1.62 | 1.32 | | PPLCNet_x0_35 | 3.15 | 2.11 | 1.64 | | PPLCNet_x0_5 | 4.27 | 2.73 | 1.92 | @@ -180,16 +231,307 @@ MobileNetV3_large_x0_75 | 64.53 | 151 | | PPLCNet_x1_5 | 20.55 | 12.26 | 7.54 | | PPLCNet_x2_0 | 33.79 | 20.17 | 12.10 | | PPLCNet_x2_5 | 49.89 | 29.60 | 17.82 | + +**备注:** 精度类型为 FP32。 + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddleclas + +使用如下命令快速安装 paddlepaddle, paddleclas + +``` +pip3 install paddlepaddle paddleclas +``` + + +### 2.2 预测 + +* 在命令行中使用 PPLCNet_x1_0 的权重快速预测 + +```bash +paddleclas --model_name=PPLCNet_x1_0 --infer_imgs="docs/images/inference_deployment/whl_demo.jpg" +``` + +结果如下: +``` +>>> result +class_ids: [8, 7, 86, 81, 85], scores: [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], label_names: ['hen', 'cock', 'partridge', 'ptarmigan', 'quail'], filename: docs/images/inference_deployment/whl_demo.jpg +Predict complete! +``` + +**备注**: 更换 PPLCNet 的其他 scale 的模型时,只需替换 `model_name`,如将此时的模型改为 `PPLCNet_x2_0` 时,只需要将 `--model_name=PPLCNet_x1_0` 改为 `--model_name=PPLCNet_x2_0` 即可。 + + +* 在 Python 代码中预测 +```python +from paddleclas import PaddleClas +clas = PaddleClas(model_name='PPLCNet_x1_0') +infer_imgs='docs/images/inference_deployment/whl_demo.jpg' +result=clas.predict(infer_imgs) +print(next(result)) +``` + +**备注**:`PaddleClas.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭 +代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果。返回结果示例如下: + +``` +>>> result +[{'class_ids': [8, 7, 86, 81, 85], 'scores': [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], 'label_names': ['hen', 'cock', 'partridge', 'ptarmigan', 'quail'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}] +``` + + + +## 3. 模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考 [Paddle 安装教程](../installation/install_paddle.md) 以及 [PaddleClas 安装教程](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + +请在[ImageNet 官网](https://www.image-net.org/)准备 ImageNet-1k 相关的数据。 + + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,将下载好的数据命名为 `ILSVRC2012` ,存放于此。 `ILSVRC2012` 目录中具有以下数据: + +``` +├── train +│   ├── n01440764 +│   │   ├── n01440764_10026.JPEG +│   │   ├── n01440764_10027.JPEG +├── train_list.txt +... +├── val +│   ├── ILSVRC2012_val_00000001.JPEG +│   ├── ILSVRC2012_val_00000002.JPEG +├── val_list.txt +``` + +其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。 + +**备注:** - +* 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 -## 7. 
总结 -PP-LCNet 没有像学术界那样死扣极致的 FLOPs 与 Params,而是着眼于分析如何添加对 Intel CPU 友好的模块来提升模型的性能,这样可以更好的平衡准确率和推理时间,其中的实验结论也很适合其他网络结构设计的研究者,同时也为 NAS 搜索研究者提供了更小的搜索空间和一般结论。最终的 PP-LCNet 在产业界也可以更好的落地和应用。 + + +### 3.3 模型训练 + + +在 `ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml` 中提供了 PPLCNet_x1_0 训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml +``` + + +**备注:** + +* 当前精度最佳的模型会保存在 `output/PPLCNet_x1_0/best_model.pdparams` + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model +``` + +其中 `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```python +python3 tools/infer.py \ + -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model +``` - +输出结果如下: + +``` +[{'class_ids': [8, 7, 86, 81, 85], 'scores': [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ptarmigan', 'quail']}] +``` + +**备注:** + +* 这里`-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + +* 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 + +* 默认输出的是 Top-5 的值,如果希望输出 Top-k 的值,可以指定`-o Infer.PostProcess.topk=k`,其中,`k` 为您指定的值。 + + + + + +## 4. 模型推理部署 + + + +### 4.1 推理模型准备 + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 + + + + +### 4.1.1 基于训练得到的权重导出 inference 模型 + +此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型: + +```bash +python3 tools/export_model.py \ + -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_infer +``` +执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNet_x1_0_infer` 文件夹,`models` 文件夹下应有如下文件结构: + +``` +├── PPLCNet_x1_0_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + + +### 4.1.2 直接下载 inference 模型 + +[4.1.1 小节](#4.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。 + +``` +cd deploy/models +# 下载 inference 模型并解压 +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar && tar -xf PPLCNet_x1_0_infer.tar +``` + +解压完毕后,`models` 文件夹下应有如下文件结构: + +``` +├── PPLCNet_x1_0_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + +### 4.2 基于 Python 预测引擎推理 + + + + +#### 4.2.1 预测单张图像 + +返回 `deploy` 目录: + +``` +cd ../ +``` + +运行下面的命令,对图像 `./images/ImageNet/ILSVRC2012_val_00000010.jpeg` 进行分类。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNet_x1_0_infer +# 使用下面的命令使用 CPU 进行预测 +python3 python/predict_cls.py -c configs/inference_cls.yaml -o 
Global.inference_model_dir=models/PPLCNet_x1_0_infer -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +ILSVRC2012_val_00000010.jpeg: class id(s): [153, 265, 204, 283, 229], score(s): [0.61, 0.11, 0.05, 0.03, 0.02], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'toy poodle', 'Lhasa, Lhasa apso', 'Persian cat', 'Old English sheepdog, bobtail'] +``` + + + +#### 4.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNet_x1_0_infer -o Global.infer_imgs=images/ImageNet/ +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +ILSVRC2012_val_00000010.jpeg: class id(s): [153, 265, 204, 283, 229], score(s): [0.61, 0.11, 0.05, 0.03, 0.02], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'toy poodle', 'Lhasa, Lhasa apso', 'Persian cat', 'Old English sheepdog, bobtail'] +ILSVRC2012_val_00010010.jpeg: class id(s): [695, 551, 507, 531, 419], score(s): [0.11, 0.06, 0.03, 0.03, 0.03], label_name(s): ['padlock', 'face powder', 'combination lock', 'digital watch', 'Band Aid'] +ILSVRC2012_val_00020010.jpeg: class id(s): [178, 211, 209, 210, 236], score(s): [0.87, 0.03, 0.01, 0.00, 0.00], label_name(s): ['Weimaraner', 'vizsla, Hungarian pointer', 'Chesapeake Bay retriever', 'German short-haired pointer', 'Doberman, Doberman pinscher'] +ILSVRC2012_val_00030010.jpeg: class id(s): [80, 23, 93, 81, 99], score(s): [0.87, 0.01, 0.01, 0.01, 0.00], label_name(s): ['black grouse', 'vulture', 'hornbill', 'ptarmigan', 'goose'] +``` + + + + +### 4.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 4.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 4.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + + + +### 4.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](@shuilong)来完成相应的部署工作。 + + + -## 8. 引用 +## 5. 引用 如果你的论文用到了 PP-LCNet 的方法,请添加如下 cite: ``` diff --git a/docs/zh_CN/models/PP-LCNetV2.md b/docs/zh_CN/models/PP-LCNetV2.md index 362bac6f62957ae484a15a7f1b396e86d593214f..23c01df129c6a1e81f7aba30e5d5c3cbe841634b 100644 --- a/docs/zh_CN/models/PP-LCNetV2.md +++ b/docs/zh_CN/models/PP-LCNetV2.md @@ -2,52 +2,407 @@ --- -## 1. 概述 +## 目录 + +- [1. 模型介绍](#1) + - [1.1 模型简介](#1.1) + - [1.2 模型细节](#1.2) + - [1.2.1 Rep 策略](#1.2.1) + - [1.2.2 PW 卷积](#1.2.2) + - [1.2.3 Shortcut](#1.2.3) + - [1.2.4 激活函数](#1.2.4) + - [1.2.5 SE 模块](#1.2.5) + - [1.3 实验结果](#1.3) +- [2. 
模型快速体验](#2) + - [2.1 安装 paddleclas](#2.1) + - [2.2 预测](#2.2) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型推理部署](#4) + - [4.1 推理模型准备](#4.1) + - [4.1.1 基于训练得到的权重导出 inference 模型](#4.1.1) + - [4.1.2 直接下载 inference 模型](#4.1.2) + - [4.2 基于 Python 预测引擎推理](#4.2) + - [4.2.1 预测单张图像](#4.2.1) + - [4.2.2 基于文件夹的批量预测](#4.2.2) + - [4.3 基于 C++ 预测引擎推理](#4.3) + - [4.4 服务化部署](#4.4) + - [4.5 端侧部署](#4.5) + - [4.6 Paddle2ONNX 模型转换与预测](#4.6) + + + +## 1. 模型介绍 + + + +### 1.1 模型简介 骨干网络对计算机视觉下游任务的影响不言而喻,不仅对下游模型的性能影响很大,而且模型效率也极大地受此影响,但现有的大多骨干网络在真实应用中的效率并不理想,特别是缺乏针对 Intel CPU 平台所优化的骨干网络,我们测试了现有的主流轻量级模型,发现在 Intel CPU 平台上的效率并不理想,然而目前 Intel CPU 平台在工业界仍有大量使用场景,因此我们提出了 PP-LCNet 系列模型,PP-LCNetV2 是在 [PP-LCNetV1](./PP-LCNet.md) 基础上所改进的。 -## 2. 设计细节 + + +## 1.2 模型细节 ![](../../images/PP-LCNetV2/net.png) PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 PP-LCNetV1 的基础上优化而来,主要使用重参数化策略组合了不同大小卷积核的深度卷积,并优化了点卷积、Shortcut等。 -### 2.1 Rep 策略 + + +### 1.2.1 Rep 策略 卷积核的大小决定了卷积层感受野的大小,通过组合使用不同大小的卷积核,能够获取不同尺度的特征,因此 PPLCNetV2 在 Stage3、Stage4 中,在同一层组合使用 kernel size 分别为 5、3、1 的 DW 卷积,同时为了避免对模型效率的影响,使用重参数化(Re parameterization,Rep)策略对同层的 DW 卷积进行融合,如下图所示。 ![](../../images/PP-LCNetV2/rep.png) -### 2.2 PW 卷积 + + +### 1.2.2 PW 卷积 深度可分离卷积通常由一层 DW 卷积和一层 PW 卷积组成,用以替换标准卷积,为了使深度可分离卷积具有更强的拟合能力,我们尝试使用两层 PW 卷积,同时为了控制模型效率不受影响,两层 PW 卷积设置为:第一个在通道维度对特征图压缩,第二个再通过放大还原特征图通道,如下图所示。通过实验发现,该策略能够显著提高模型性能,同时为了平衡对模型效率带来的影响,PPLCNetV2 仅在 Stage4、Stage5 中使用了该策略。 ![](../../images/PP-LCNetV2/split_pw.png) -### 2.3 Shortcut + + +### 1.2.3 Shortcut 残差结构(residual)自提出以来,被诸多模型广泛使用,但在轻量级卷积神经网络中,由于残差结构所带来的元素级(element-wise)加法操作,会对模型的速度造成影响,我们在 PP-LCNetV2 中,以 Stage 为单位实验了 残差结构对模型的影响,发现残差结构的使用并非一定会带来性能的提高,因此 PPLCNetV2 仅在最后一个 Stage 中的使用了残差结构:在 Block 中增加 Shortcut,如下图所示。 ![](../../images/PP-LCNetV2/shortcut.png) -### 2.4 激活函数 + + +### 1.2.4 激活函数 在目前的轻量级卷积神经网络中,ReLU、Hard-Swish 激活函数最为常用,虽然在模型性能方面,Hard-Swish 通常更为优秀,然而我们发现部分推理平台对于 Hard-Swish 激活函数的效率优化并不理想,因此为了兼顾通用性,PP-LCNetV2 默认使用了 ReLU 激活函数,并且我们测试发现,ReLU 激活函数对于较大模型的性能影响较小。 -### 2.5 SE 模块 + + +### 1.2.5 SE 模块 虽然 SE 模块能够显著提高模型性能,但其对模型速度的影响同样不可忽视,在 PP-LCNetV1 中,我们发现在模型中后部使用 SE 模块能够获得最大化的收益。在 PP-LCNetV2 的优化过程中,我们以 Stage 为单位对 SE 模块的位置做了进一步实验,并发现在 Stage3 中使用能够取得更好的平衡。 -## 3. 实验结果 + + +## 1.3 实验结果 + +PPLCNetV2 目前提供的模型的精度、速度指标及预训练权重链接如下: + +| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | 预训练模型下载地址 | inference模型下载地址 | +|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:| +| PPLCNetV2_base | 6.6 | 604 | 77.04 | 93.27 | 4.32 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar) | +| PPLCNetV2_base_ssld | 6.6 | 604 | 80.07 | 94.87 | 4.32 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_ssld_infer.tar) | + +**备注:** + +* 1. `_ssld` 表示使用 `SSLD 蒸馏`后的模型。关于 `SSLD蒸馏` 的内容,详情 [SSLD 蒸馏](../advanced_tutorials/knowledge_distillation.md)。 +* 2. 
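+下面用一段可运行的示意代码说明上文 1.2.1 节 Rep 策略中“多分支卷积核融合为单一卷积核”的数值等价性(张量尺寸为假设值;为突出核融合本身,示例省略了实际重参数化时还需折叠的 BN):
+
+```python
+import paddle
+import paddle.nn.functional as F
+
+x  = paddle.randn([1, 1, 8, 8])
+k5 = paddle.randn([1, 1, 5, 5])
+k3 = paddle.randn([1, 1, 3, 3])
+k1 = paddle.randn([1, 1, 1, 1])
+
+# 训练时:5x5、3x3、1x1 三个并联分支(same padding)的输出之和
+y_branch = (F.conv2d(x, k5, padding=2)
+            + F.conv2d(x, k3, padding=1)
+            + F.conv2d(x, k1, padding=0))
+
+# 推理时:把 3x3、1x1 核零填充到 5x5 后与 5x5 核相加,融合为单个等效卷积核
+k_fused = k5 + F.pad(k3, [1, 1, 1, 1]) + F.pad(k1, [2, 2, 2, 2])
+y_fused = F.conv2d(x, k_fused, padding=2)
+
+print(float((y_branch - y_fused).abs().max()))  # 数值上应接近 0
+```
+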
PP-LCNetV2 更多模型指标及权重,敬请期待。 在不使用额外数据的前提下,PPLCNetV2_base 模型在图像分类 ImageNet 数据集上能够取得超过 77% 的 Top1 Acc,同时在 Intel CPU 平台的推理时间在 4.4 ms 以下,如下表所示,其中推理时间基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台。 | Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | -|-------|-----------|----------|---------------|---------------|-------------| +|:--:|:--:|:--:|:--:|:--:|:--:| | MobileNetV3_Large_x1_25 | 7.4 | 714 | 76.4 | 93.00 | 5.19 | | PPLCNetV2_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 | | PPLCNetV2_base | 6.6 | 604 | 77.04 | 93.27 | 4.32 | +| PPLCNetV2_base_ssld | 6.6 | 604 | 80.07 | 94.87 | 4.32 | + + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddleclas + +使用如下命令快速安装 paddlepaddle, paddleclas + +``` +pip3 install paddlepaddle paddleclas +``` + + +### 2.2 预测 + +* 在命令行中使用 PPLCNetV2_base 的权重快速预测 + +```bash +paddleclas --model_name=PPLCNetV2_base --infer_imgs="docs/images/inference_deployment/whl_demo.jpg" +``` + +结果如下: +``` +>>> result +class_ids: [8, 7, 86, 82, 83], scores: [0.8859, 0.07156, 0.00588, 0.00047, 0.00034], label_names: ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'prairie chicken, prairie grouse, prairie fowl'], filename: docs/images/inference_deployment/whl_demo.jpg +Predict complete +``` + + +* 在 Python 代码中预测 +```python +from paddleclas import PaddleClas +clas = PaddleClas(model_name='PPLCNetV2_base') +infer_imgs='docs/images/inference_deployment/whl_demo.jpg' +result=clas.predict(infer_imgs) +print(next(result)) +``` + +**备注**:`PaddleClas.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭 +代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果。返回结果示例如下: + +``` +>>> result +[{'class_ids': [8, 7, 86, 82, 83], 'scores': [0.8859, 0.07156, 0.00588, 0.00047, 0.00034], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'prairie chicken, prairie grouse, prairie fowl'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}] +``` + + + + +## 3. 模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考文档[环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + +请在[ImageNet 官网](https://www.image-net.org/)准备 ImageNet-1k 相关的数据。 + + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,将下载好的数据命名为 `ILSVRC2012` ,存放于此。 `ILSVRC2012` 目录中具有以下数据: + +``` +├── train +│   ├── n01440764 +│   │   ├── n01440764_10026.JPEG +│   │   ├── n01440764_10027.JPEG +├── train_list.txt +... 
+├── val +│   ├── ILSVRC2012_val_00000001.JPEG +│   ├── ILSVRC2012_val_00000002.JPEG +├── val_list.txt +``` + +其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。 + +**备注:** + +* 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 + + + + +### 3.3 模型训练 + + +在 `ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml` 中提供了 PPLCNetV2_base 训练配置,可以通过如下脚本启动训练: + +```shell +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml +``` + + +**备注:** + +* 当前精度最佳的模型会保存在 `output/PPLCNetV2_base/best_model.pdparams` + + + +### 3.4 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml \ + -o Global.pretrained_model=output/PPLCNetV2_base/best_model +``` + +其中 `-o Global.pretrained_model="output/PPLCNetV2_base/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + + + +### 3.5 模型预测 + +模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测: + +```python +python3 tools/infer.py \ + -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml \ + -o Global.pretrained_model=output/PPLCNetV2_base/best_model +``` + +输出结果如下: + +``` +[{'class_ids': [8, 7, 86, 82, 83], 'scores': [0.8859, 0.07156, 0.00588, 0.00047, 0.00034], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'prairie chicken, prairie grouse, prairie fowl']}] +``` + +**备注:** + +* 这里`-o Global.pretrained_model="output/PPLCNetV2_base/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + +* 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 + +* 默认输出的是 Top-5 的值,如果希望输出 Top-k 的值,可以指定`-o Infer.PostProcess.topk=k`,其中,`k` 为您指定的值。 + + + + + +## 4. 
模型推理部署 + + + +### 4.1 推理模型准备 + +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 + + + + +### 4.1.1 基于训练得到的权重导出 inference 模型 + +此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型: + +```bash +python3 tools/export_model.py \ + -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml \ + -o Global.pretrained_model=output/PPLCNetV2_base/best_model \ + -o Global.save_inference_dir=deploy/models/PPLCNetV2_base_infer +``` +执行完该脚本后会在 `deploy/models/` 下生成 `PPLCNetV2_base_infer` 文件夹,`models` 文件夹下应有如下文件结构: + +``` +├── PPLCNetV2_base_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + + +### 4.1.2 直接下载 inference 模型 + +[4.1.1 小节](#4.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。 + +``` +cd deploy/models +# 下载 inference 模型并解压 +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar && tar -xf PPLCNetV2_base_infer.tar +``` + +解压完毕后,`models` 文件夹下应有如下文件结构: + +``` +├── PPLCNetV2_base_infer +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +``` + + + +### 4.2 基于 Python 预测引擎推理 + + + + +#### 4.2.1 预测单张图像 + +返回 `deploy` 目录: + +``` +cd ../ +``` + +运行下面的命令,对图像 `./images/ImageNet/ILSVRC2012_val_00000010.jpeg` 进行分类。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNetV2_base_infer +# 使用下面的命令使用 CPU 进行预测 +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNetV2_base_infer -o Global.use_gpu=False +``` + +输出结果如下。 + +``` +ILSVRC2012_val_00000010.jpeg: class id(s): [332, 153, 229, 204, 265], score(s): [0.28, 0.25, 0.03, 0.02, 0.02], label_name(s): ['Angora, Angora rabbit', 'Maltese dog, Maltese terrier, Maltese', 'Old English sheepdog, bobtail', 'Lhasa, Lhasa apso', 'toy poodle'] +``` + + + +#### 4.2.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNetV2_base_infer -o Global.infer_imgs=images/ImageNet/ +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +``` +ILSVRC2012_val_00000010.jpeg: class id(s): [332, 153, 229, 204, 265], score(s): [0.28, 0.25, 0.03, 0.02, 0.02], label_name(s): ['Angora, Angora rabbit', 'Maltese dog, Maltese terrier, Maltese', 'Old English sheepdog, bobtail', 'Lhasa, Lhasa apso', 'toy poodle'] +ILSVRC2012_val_00010010.jpeg: class id(s): [626, 531, 761, 487, 673], score(s): [0.64, 0.06, 0.03, 0.02, 0.01], label_name(s): ['lighter, light, igniter, ignitor', 'digital watch', 'remote control, remote', 'cellular telephone, cellular phone, cellphone, cell, mobile phone', 'mouse, computer mouse'] +ILSVRC2012_val_00020010.jpeg: class id(s): [178, 209, 246, 181, 211], score(s): [0.97, 0.00, 0.00, 0.00, 0.00], label_name(s): ['Weimaraner', 'Chesapeake Bay retriever', 'Great Dane', 'Bedlington terrier', 'vizsla, Hungarian pointer'] +ILSVRC2012_val_00030010.jpeg: class id(s): [80, 143, 81, 137, 98], score(s): [0.91, 0.01, 0.00, 0.00, 0.00], label_name(s): ['black grouse', 
'oystercatcher, oyster catcher', 'ptarmigan', 'American coot, marsh hen, mud hen, water hen, Fulica americana', 'red-breasted merganser, Mergus serrator' +``` + + + + +### 4.3 基于 C++ 预测引擎推理 + +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 + + + +### 4.4 服务化部署 + +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 + + + +### 4.5 端侧部署 + +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 + +### 4.6 Paddle2ONNX 模型转换与预测 + +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 -关于 PP-LCNetV2 模型的更多信息,敬请关注。 +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/models/ResNet.md b/docs/zh_CN/models/ResNet.md new file mode 100644 index 0000000000000000000000000000000000000000..31bf4ac961849182bc9406a42aa9b8221ed866f4 --- /dev/null +++ b/docs/zh_CN/models/ResNet.md @@ -0,0 +1,421 @@ +# ResNet 系列 +----- +## 目录 + +- [1. 模型介绍](#1) + - [1.1 模型简介](#1.1) + - [1.2 模型指标](#1.2) + - [1.3 Benchmark](#1.3) + - [1.3.1 基于 V100 GPU 的预测速度](#1.3.1) + - [1.3.2 基于 T4 GPU 的预测速度](#1.3.2) +- [2. 模型快速体验](#2) + - [2.1 安装 paddleclas](#2.1) + - [2.2 预测](#2.2) +- [3. 模型训练、评估和预测](#3) + - [3.1 环境配置](#3.1) + - [3.2 数据准备](#3.2) + - [3.3 模型训练](#3.3) + - [3.4 模型评估](#3.4) + - [3.5 模型预测](#3.5) +- [4. 模型推理部署](#4) + - [4.1 推理模型准备](#4.1) + - [4.1.1 基于训练得到的权重导出 inference 模型](#4.1.1) + - [4.1.2 直接下载 inference 模型](#4.1.2) + - [4.2 基于 Python 预测引擎推理](#4.2) + - [4.2.1 预测单张图像](#4.2.1) + - [4.2.2 基于文件夹的批量预测](#4.2.2) + - [4.3 基于 C++ 预测引擎推理](#4.3) + - [4.4 服务化部署](#4.4) + - [4.5 端侧部署](#4.5) + - [4.6 Paddle2ONNX 模型转换与预测](#4.6) + + + +## 1. 模型介绍 + + + +### 1.1 模型简介 + +ResNet 系列模型是在 2015 年提出的,一举在 ILSVRC2015 比赛中取得冠军,top5 错误率为 3.57%。该网络创新性的提出了残差结构,通过堆叠多个残差结构从而构建了 ResNet 网络。实验表明使用残差块可以有效地提升收敛速度和精度。 + +斯坦福大学的 Joyce Xu 将 ResNet 称为「真正重新定义了我们看待神经网络的方式」的三大架构之一。由于 ResNet 卓越的性能,越来越多的来自学术界和工业界学者和工程师对其结构进行了改进,比较出名的有 Wide-ResNet, ResNet-vc, ResNet-vd, Res2Net 等,其中 ResNet-vc 与 ResNet-vd 的参数量和计算量与 ResNet 几乎一致,所以在此我们将其与 ResNet 统一归为 ResNet 系列。 + +PaddleClas 提供的 ResNet 系列的模型包括 ResNet50,ResNet50_vd,ResNet50_vd_ssld,ResNet200_vd 等 16 个预训练模型。在训练层面上,ResNet 的模型采用了训练 ImageNet 的标准训练流程,而其余改进版模型采用了更多的训练策略,如 learning rate 的下降方式采用了 cosine decay,引入了 label smoothing 的标签正则方式,在数据预处理加入了 mixup 的操作,迭代总轮数从 120 个 epoch 增加到 200 个 epoch。 + +其中,后缀使用`_ssld`的模型采用了 SSLD 知识蒸馏,保证模型结构不变的情况下,进一步提升了模型的精度。 + + + + +### 1.2 模型指标 + +| Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Params
(M) | +|:--:|:--:|:--:|:--:|:--:|:--:|:--:| +| ResNet18 | 0.710 | 0.899 | 0.696 | 0.891 | 3.660 | 11.690 | +| ResNet18_vd | 0.723 | 0.908 | | | 4.140 | 11.710 | +| ResNet34 | 0.746 | 0.921 | 0.732 | 0.913 | 7.360 | 21.800 | +| ResNet34_vd | 0.760 | 0.930 | | | 7.390 | 21.820 | +| ResNet34_vd_ssld | 0.797 | 0.949 | | | 7.390 | 21.820 | +| ResNet50 | 0.765 | 0.930 | 0.760 | 0.930 | 8.190 | 25.560 | +| ResNet50_vc | 0.784 | 0.940 | | | 8.670 | 25.580 | +| ResNet50_vd | 0.791 | 0.944 | 0.792 | 0.946 | 8.670 | 25.580 | +| ResNet101 | 0.776 | 0.936 | 0.776 | 0.938 | 15.520 | 44.550 | +| ResNet101_vd | 0.802 | 0.950 | | | 16.100 | 44.570 | +| ResNet152 | 0.783 | 0.940 | 0.778 | 0.938 | 23.050 | 60.190 | +| ResNet152_vd | 0.806 | 0.953 | | | 23.530 | 60.210 | +| ResNet200_vd | 0.809 | 0.953 | | | 30.530 | 74.740 | +| ResNet50_vd_ssld | 0.830 | 0.964 | | | 8.670 | 25.580 | +| Fix_ResNet50_vd_ssld | 0.840 | 0.970 | | | 17.696 | 25.580 | +| ResNet101_vd_ssld | 0.837 | 0.967 | | | 16.100 | 44.570 | + +**备注:** `Fix_ResNet50_vd_ssld` 是固定 `ResNet50_vd_ssld` 除 FC 层外所有的网络参数,在 320x320 的图像输入分辨率下,基于 ImageNet-1k 数据集微调得到。 + + + + +## 1.3 Benchmark + + + +### 1.3.1 基于 V100 GPU 的预测速度 + +| Models | Size | Latency(ms)
bs=1 | Latency(ms)
bs=4 | Latency(ms)
bs=8 | +|:--:|:--:|:--:|:--:|:--:| +| ResNet18 | 224 | 1.22 | 2.19 | 3.63 | +| ResNet18_vd | 224 | 1.26 | 2.28 | 3.89 | +| ResNet34 | 224 | 1.97 | 3.25 | 5.70 | +| ResNet34_vd | 224 | 2.00 | 3.28 | 5.84 | +| ResNet34_vd_ssld | 224 | 2.00 | 3.26 | 5.85 | +| ResNet50 | 224 | 2.54 | 4.79 | 7.40 | +| ResNet50_vc | 224 | 2.57 | 4.83 | 7.52 | +| ResNet50_vd | 224 | 2.60 | 4.86 | 7.63 | +| ResNet101 | 224 | 4.37 | 8.18 | 12.38 | +| ResNet101_vd | 224 | 4.43 | 8.25 | 12.60 | +| ResNet152 | 224 | 6.05 | 11.41 | 17.33 | +| ResNet152_vd | 224 | 6.11 | 11.51 | 17.59 | +| ResNet200_vd | 224 | 7.70 | 14.57 | 22.16 | +| ResNet50_vd_ssld | 224 | 2.59 | 4.87 | 7.62 | +| ResNet101_vd_ssld | 224 | 4.43 | 8.25 | 12.58 | + +**备注:** 精度类型为 FP32,推理过程使用 TensorRT。 + + + +### 1.3.2 基于 T4 GPU 的预测速度 + +| Models | Size | Latency(ms)
FP16
bs=1 | Latency(ms)
FP16
bs=4 | Latency(ms)
FP16
bs=8 | Latency(ms)
FP32
bs=1 | Latency(ms)
FP32
bs=4 | Latency(ms)
FP32
bs=8 | +|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:| +| ResNet18 | 224 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 | +| ResNet18_vd | 224 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 | +| ResNet34 | 224 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 | +| ResNet34_vd | 224 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 | +| ResNet34_vd_ssld | 224 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 | +| ResNet50 | 224 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 | +| ResNet50_vc | 224 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 | +| ResNet50_vd | 224 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 | +| ResNet101 | 224 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 | +| ResNet101_vd | 224 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 | +| ResNet152 | 224 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 | +| ResNet152_vd | 224 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 | +| ResNet200_vd | 224 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 | +| ResNet50_vd_ssld | 224 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 | +| Fix_ResNet50_vd_ssld | 320 | 3.42818 | 7.51534 | 13.19370 | 5.07696 | 14.64218 | 27.01453 | +| ResNet101_vd_ssld | 224 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 | + +**备注:** 推理过程使用 TensorRT。 + + + +## 2. 模型快速体验 + + + +### 2.1 安装 paddleclas + +使用如下命令快速安装 paddlepaddle, paddleclas + +``` +pip3 install paddlepaddle paddleclas +``` + + +### 2.2 预测 + +* 在命令行中使用 ResNet50 的权重快速预测 + +```bash +paddleclas --model_name=ResNet50 --infer_imgs="docs/images/inference_deployment/whl_demo.jpg" +``` + +结果如下: +``` +>>> result +class_ids: [8, 7, 86, 82, 80], scores: [0.97968, 0.02028, 3e-05, 1e-05, 0.0], label_names: ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], filename: docs/images/inference_deployment/whl_demo.jpg +Predict complete! +``` + +**备注**: 更换 ResNet 的其他 scale 的模型时,只需替换 `model_name`,如将此时的模型改为 `ResNet18` 时,只需要将 `--model_name=ResNet50` 改为 `--model_name=ResNet18` 即可。 + + +* 在 Python 代码中预测 +```python +from paddleclas import PaddleClas +clas = PaddleClas(model_name='ResNet50') +infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg' +result = clas.predict(infer_imgs) +print(next(result)) +``` + +**备注**:`PaddleClas.predict()` 为可迭代对象(`generator`),因此需要使用 `next()` 函数或 `for` 循环对其迭 +代调用。每次调用将以 `batch_size` 为单位进行一次预测,并返回预测结果。返回结果示例如下: + +``` +>>> result +[{'class_ids': [8, 7, 86, 82, 80], 'scores': [0.97968, 0.02028, 3e-05, 1e-05, 0.0], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}] +``` + + + + +## 3. 模型训练、评估和预测 + + + +### 3.1 环境配置 + +* 安装:请先参考 [Paddle 安装教程](../installation/install_paddle.md) 以及 [PaddleClas 安装教程](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 + + + +### 3.2 数据准备 + +请在[ImageNet 官网](https://www.image-net.org/)准备 ImageNet-1k 相关的数据。 + + +进入 PaddleClas 目录。 + +``` +cd path_to_PaddleClas +``` + +进入 `dataset/` 目录,将下载好的数据命名为 `ILSVRC2012` ,存放于此。 `ILSVRC2012` 目录中具有以下数据: + +``` +├── train +│   ├── n01440764 +│   │   ├── n01440764_10026.JPEG +│   │   ├── n01440764_10027.JPEG +├── train_list.txt +... 
+├── val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+├── val_list.txt
+```
+
+其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。
+
+**备注:**
+
+* 关于 `train_list.txt`、`val_list.txt` 的格式说明,可以参考 [PaddleClas 分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明)。
+
+
+
+### 3.3 模型训练
+
+在 `ppcls/configs/ImageNet/ResNet/ResNet50.yaml` 中提供了 ResNet50 训练配置,可以通过如下脚本启动训练:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml
+```
+
+**备注:**
+
+* 当前精度最佳的模型会保存在 `output/ResNet50/best_model.pdparams`
+
+
+
+### 3.4 模型评估
+
+训练好模型之后,可以通过以下命令实现对模型指标的评估。
+
+```bash
+python3 tools/eval.py \
+    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
+    -o Global.pretrained_model=output/ResNet50/best_model
+```
+
+其中 `-o Global.pretrained_model="output/ResNet50/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+
+
+### 3.5 模型预测
+
+模型训练完成之后,可以加载训练得到的预训练模型,进行模型预测。在模型库的 `tools/infer.py` 中提供了完整的示例,只需执行下述命令即可完成模型预测:
+
+```shell
+python3 tools/infer.py \
+    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
+    -o Global.pretrained_model=output/ResNet50/best_model
+```
+
+输出结果如下:
+
+```
+[{'class_ids': [8, 7, 86, 82, 80], 'scores': [0.97968, 0.02028, 3e-05, 1e-05, 0.0], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse']}]
+```
+
+**备注:**
+
+* 这里 `-o Global.pretrained_model="output/ResNet50/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
+
+* 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。
+
+* 默认输出的是 Top-5 的值,如果希望输出 Top-k 的值,可以指定 `-o Infer.PostProcess.topk=k`,其中,`k` 为您指定的值,组合用法可以参考下面的示例。
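+
+以下命令将上述备注中的字段组合使用(仅为示意,各字段均为上文已介绍的配置项),对指定图片输出 Top-3 预测结果:
+
+```shell
+# 示意命令:指定待预测图片并输出 Top-3 结果,各字段含义见上文备注
+python3 tools/infer.py \
+    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
+    -o Global.pretrained_model=output/ResNet50/best_model \
+    -o Infer.infer_imgs=docs/images/inference_deployment/whl_demo.jpg \
+    -o Infer.PostProcess.topk=3
+```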
+
+
+
+## 4. 模型推理部署
+
+
+
+### 4.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#4.1.2)的方式。
+
+
+
+### 4.1.1 基于训练得到的权重导出 inference 模型
+
+此处,我们提供了将权重和模型转换的脚本,执行该脚本可以得到对应的 inference 模型:
+
+```bash
+python3 tools/export_model.py \
+    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
+    -o Global.pretrained_model=output/ResNet50/best_model \
+    -o Global.save_inference_dir=deploy/models/ResNet50_infer
+```
+执行完该脚本后会在 `deploy/models/` 下生成 `ResNet50_infer` 文件夹,`models` 文件夹下应有如下文件结构:
+
+```
+├── ResNet50_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 4.1.2 直接下载 inference 模型
+
+[4.1.1 小节](#4.1.1)提供了导出 inference 模型的方法,此处也提供了该场景可以下载的 inference 模型,可以直接下载体验。
+
+```
+cd deploy/models
+# 下载 inference 模型并解压
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_infer.tar && tar -xf ResNet50_infer.tar
+```
+
+解压完毕后,`models` 文件夹下应有如下文件结构:
+
+```
+├── ResNet50_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+
+### 4.2 基于 Python 预测引擎推理
+
+
+
+#### 4.2.1 预测单张图像
+
+返回 `deploy` 目录:
+
+```
+cd ../
+```
+
+运行下面的命令,对图像 `./images/ImageNet/ILSVRC2012_val_00000010.jpeg` 进行分类。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/ResNet50_infer
+# 使用下面的命令使用 CPU 进行预测
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/ResNet50_infer -o Global.use_gpu=False
+```
+
+输出结果如下。
+
+```
+ILSVRC2012_val_00000010.jpeg: class id(s): [153, 332, 229, 204, 265], score(s): [0.41, 0.39, 0.05, 0.04, 0.04], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'Angora, Angora rabbit', 'Old English sheepdog, bobtail', 'Lhasa, Lhasa apso', 'toy poodle']
+```
+
+
+
+#### 4.2.2 基于文件夹的批量预测
+
+如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。
+
+```shell
+# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/ResNet50_infer -o Global.infer_imgs=images/ImageNet/
+```
+
+终端中会输出该文件夹内所有图像的分类结果,如下所示。
+
+```
+ILSVRC2012_val_00000010.jpeg: class id(s): [153, 332, 229, 204, 265], score(s): [0.41, 0.39, 0.05, 0.04, 0.04], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'Angora, Angora rabbit', 'Old English sheepdog, bobtail', 'Lhasa, Lhasa apso', 'toy poodle']
+ILSVRC2012_val_00010010.jpeg: class id(s): [902, 626, 531, 487, 761], score(s): [0.47, 0.10, 0.05, 0.04, 0.03], label_name(s): ['whistle', 'lighter, light, igniter, ignitor', 'digital watch', 'cellular telephone, cellular phone, cellphone, cell, mobile phone', 'remote control, remote']
+ILSVRC2012_val_00020010.jpeg: class id(s): [178, 211, 246, 236, 210], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['Weimaraner', 'vizsla, Hungarian pointer', 'Great Dane', 'Doberman, Doberman pinscher', 'German short-haired pointer']
+ILSVRC2012_val_00030010.jpeg: class id(s): [80, 23, 83, 93, 136], score(s): [1.00, 0.00, 0.00, 0.00, 0.00], label_name(s): ['black grouse', 'vulture', 'prairie chicken, prairie grouse, prairie fowl', 'hornbill', 'European gallinule, Porphyrio porphyrio']
+```
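+
+**备注:** 在 CPU 上预测时,还可以尝试开启 MKLDNN 加速并调整线程数(以下命令仅为示意,`Global.enable_mkldnn` 与 `Global.cpu_num_threads` 均为推理配置中已有的字段):
+
+```shell
+# 示意命令:在 CPU 上开启 MKLDNN 加速,使用 4 个线程进行预测
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/ResNet50_infer -o Global.use_gpu=False -o Global.enable_mkldnn=True -o Global.cpu_num_threads=4
+```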
+
+
+
+### 4.3 基于 C++ 预测引擎推理
+
+PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。
+
+
+
+### 4.4 服务化部署
+
+Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下的推理解决方案。更多关于 Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。
+
+PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。
+
+
+
+### 4.5 端侧部署
+
+Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。
+
+PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。
+
+
+
+### 4.6 Paddle2ONNX 模型转换与预测
+
+Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括 TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。
+
+PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。
diff --git a/docs/zh_CN/models/SwinTransformer.md b/docs/zh_CN/models/SwinTransformer.md
index 40a873274312f0fe3925bb57a8141183dc91562f..df29b0a0c99754196bd3871536013b4f67aa2447 100644
--- a/docs/zh_CN/models/SwinTransformer.md
+++ b/docs/zh_CN/models/SwinTransformer.md
@@ -1,21 +1,37 @@
 # SwinTransformer
----
+
+-----
 ## 目录
-* [1. 概述](#1)
-* [2. 精度、FLOPS 和参数量](#2)
-* [3. 基于V100 GPU 的预测速度](#3)
+- [1. 模型介绍](#1)
+  - [1.1 模型简介](#1.1)
+  - [1.2 模型指标](#1.2)
+  - [1.3 Benchmark](#1.3)
+    - [1.3.1 基于 V100 GPU 的预测速度](#1.3.1)
+- [2. 模型快速体验](#2)
+- [3. 模型训练、评估和预测](#3)
+- [4. 模型推理部署](#4)
+  - [4.1 推理模型准备](#4.1)
+  - [4.2 基于 Python 预测引擎推理](#4.2)
+  - [4.3 基于 C++ 预测引擎推理](#4.3)
+  - [4.4 服务化部署](#4.4)
+  - [4.5 端侧部署](#4.5)
+  - [4.6 Paddle2ONNX 模型转换与预测](#4.6)
+
-## 1. 概述
+## 1. 模型介绍
+
+### 1.1 模型简介
+
 Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算机视觉领域的通用骨干网路。SwinTransformer 由移动窗口(shifted windows)表示的层次 Transformer 结构组成。移动窗口将自注意计算限制在非重叠的局部窗口上,同时允许跨窗口连接,从而提高了网络性能。[论文地址](https://arxiv.org/abs/2103.14030)。
 
-## 2. 精度、FLOPS 和参数量
+### 1.2 模型指标
 
-| Models | Top1 | Top5 | Reference<br>
top1 | Reference
top5 | FLOPS
(G) | Params
(M) | +| Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| | SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 0.812 | 0.955 | 4.5 | 28 | | SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 0.832 | 0.962 | 8.7 | 50 | @@ -32,17 +48,87 @@ Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算 -## 3. 基于 V100 GPU 的预测速度 +### 1.3 Benchmark + +#### 1.3.1 基于 V100 GPU 的预测速度 -| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | -| ------------------------------------------------------- | --------- | ----------------- | ------------------------------ | ------------------------------ | ------------------------------ | -| SwinTransformer_tiny_patch4_window7_224 | 224 | 256 | 6.59 | 9.68 | 16.32 | -| SwinTransformer_small_patch4_window7_224 | 224 | 256 | 12.54 | 17.07 | 28.08 | -| SwinTransformer_base_patch4_window7_224 | 224 | 256 | 13.37 | 23.53 | 39.11 | -| SwinTransformer_base_patch4_window12_384 | 384 | 384 | 19.52 | 64.56 | 123.30 | -| SwinTransformer_base_patch4_window7_224[1] | 224 | 256 | 13.53 | 23.46 | 39.13 | -| SwinTransformer_base_patch4_window12_384[1] | 384 | 384 | 19.65 | 64.72 | 123.42 | -| SwinTransformer_large_patch4_window7_224[1] | 224 | 256 | 15.74 | 38.57 | 71.49 | -| SwinTransformer_large_patch4_window12_384[1] | 384 | 384 | 32.61 | 116.59 | 223.23 | +| Models | Size | Latency(ms)
bs=1 | Latency(ms)
bs=4 | Latency(ms)
bs=8 |
+|:--:|:--:|:--:|:--:|:--:|
+| SwinTransformer_tiny_patch4_window7_224 | 224 | 6.59 | 9.68 | 16.32 |
+| SwinTransformer_small_patch4_window7_224 | 224 | 12.54 | 17.07 | 28.08 |
+| SwinTransformer_base_patch4_window7_224 | 224 | 13.37 | 23.53 | 39.11 |
+| SwinTransformer_base_patch4_window12_384 | 384 | 19.52 | 64.56 | 123.30 |
+| SwinTransformer_base_patch4_window7_224[1] | 224 | 13.53 | 23.46 | 39.13 |
+| SwinTransformer_base_patch4_window12_384[1] | 384 | 19.65 | 64.72 | 123.42 |
+| SwinTransformer_large_patch4_window7_224[1] | 224 | 15.74 | 38.57 | 71.49 |
+| SwinTransformer_large_patch4_window12_384[1] | 384 | 32.61 | 116.59 | 223.23 |
 
 [1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。
+
+**备注:** 精度类型为 FP32,推理过程使用 TensorRT。
+
+
+
+## 2. 模型快速体验
+
+安装 paddlepaddle 和 paddleclas 即可快速对图片进行预测,体验方法可以参考[ResNet50 模型快速体验](./ResNet.md#2-模型快速体验)。
+
+
+
+## 3. 模型训练、评估和预测
+
+此部分内容包括训练环境配置、ImageNet 数据的准备、SwinTransformer 在 ImageNet 上的训练、评估、预测等内容,在 `ppcls/configs/ImageNet/SwinTransformer/` 中提供了 SwinTransformer 系列模型的训练配置。具体的训练、评估、预测流程可以参考[ResNet50 模型训练、评估和预测](./ResNet.md#3-模型训练评估和预测)。
+
+**备注:** 由于 SwinTransformer 系列模型默认使用的 GPU 数量为 8 个,所以在训练时,需要指定 8 个 GPU,如 `python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c xxx.yaml`,如果使用 4 个 GPU 训练,默认学习率需要减小一半,精度可能有损。
+
+
+
+## 4. 模型推理部署
+
+
+
+### 4.1 推理模型准备
+
+Paddle Inference 是飞桨的原生推理库,作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference 可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。
+
+inference 模型的获取可以参考 [ResNet50 推理模型准备](./ResNet.md#41-推理模型准备)。
+
+
+
+### 4.2 基于 Python 预测引擎推理
+
+PaddleClas 提供了基于 python 预测引擎推理的示例。您可以参考[ResNet50 基于 Python 预测引擎推理](./ResNet.md#42-基于-python-预测引擎推理) 对 SwinTransformer 完成推理预测。
+
+
+
+### 4.3 基于 C++ 预测引擎推理
+
+PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。
+
+
+
+### 4.4 服务化部署
+
+Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下的推理解决方案。更多关于 Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。
+
+PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。
+
+
+
+### 4.5 端侧部署
+
+Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。
+
+PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。
+
+
+
+### 4.6 Paddle2ONNX 模型转换与预测
+
+Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括 TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。
+
+PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。
+
diff --git a/docs/zh_CN/models_training/distributed_training.md b/docs/zh_CN/models_training/distributed_training.md
new file mode 100644
index 0000000000000000000000000000000000000000..59532a5ed3d0c9f676ecb733b42b02052eb79752
--- /dev/null
+++ b/docs/zh_CN/models_training/distributed_training.md
@@ -0,0 +1,62 @@
+
+# 分布式训练
+
+## 1. 
简介 + +* 分布式训练指的是将训练任务按照一定方法拆分到多个计算节点进行计算,再按照一定的方法对拆分后计算得到的梯度等信息进行聚合与更新。飞桨分布式训练技术源自百度的业务实践,在自然语言处理、计算机视觉、搜索和推荐等领域经过超大规模业务检验。分布式训练的高性能,是飞桨的核心优势技术之一,在图像分类等任务上,分布式训练可以达到几乎线性的加速比。图像分类训练任务中往往包含大量训练数据,以ImageNet为例,ImageNet22k数据集中包含1400W张图像,如果使用单卡训练,会非常耗时。因此PaddleClas中使用分布式训练接口完成训练任务,同时支持单机训练与多机训练。更多关于分布式训练的方法与文档可以参考:[分布式训练快速开始教程](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html)。 + +## 2. 使用方法 + +### 2.1 单机训练 + +* 以识别为例,本地准备好数据之后,使用`paddle.distributed.launch`的接口启动训练任务即可。下面为运行代码示例。 + +```shell +python3 -m paddle.distributed.launch \ + --log_dir=./log/ \ + --gpus "0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml +``` + +### 2.2 多机训练 + +* 相比单机训练,多机训练时,只需要添加`--ips`的参数,该参数表示需要参与分布式训练的机器的ip列表,不同机器的ip用逗号隔开。下面为运行代码示例。 + +```shell +ip_list="192.168.0.1,192.168.0.2" +python3 -m paddle.distributed.launch \ + --log_dir=./log/ \ + --ips="${ip_list}" \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml +``` + +**注:** +* 不同机器的ip信息需要用逗号隔开,可以通过`ifconfig`或者`ipconfig`查看。 +* 不同机器之间需要做免密设置,且可以直接ping通,否则无法完成通信。 +* 不同机器之间的代码、数据与运行命令或脚本需要保持一致,且所有的机器上都需要运行设置好的训练命令或者脚本。最终`ip_list`中的第一台机器的第一块设备是trainer0,以此类推。 +* 不同机器的起始端口可能不同,建议在启动多机任务前,在不同的机器中设置相同的多机运行起始端口,命令为`export FLAGS_START_PORT=17000`,端口值建议在`10000~20000`之间。 + + +## 3. 性能效果测试 + +* 在单机8卡V100的机器上,基于[SSLD知识蒸馏训练策略](../advanced_tutorials/ssld.md)(数据量500W)进行模型训练,不同模型的训练耗时以及单机8卡加速比情况如下所示。 + + +| 模型 | 精度 | 单机单卡耗时 | 单机8卡耗时 | 加速比 | +|:---------:|:--------:|:--------:|:--------:|:------:| +| PPHGNet-base_ssld | 85.00% | 133.2d | 18.96d | **7.04** | +| PPLCNetv2-base_ssld | 80.10% | 31.6d | 6.4d | **4.93** | +| PPLCNet_x0_25_ssld | 53.43% | 21.8d | 6.2d | **3.99** | + + +* 在4机8卡V100的机器上,基于[SSLD知识蒸馏训练策略](../advanced_tutorials/ssld.md)(数据量500W)进行模型训练,不同模型的训练耗时以及多机加速比情况如下所示。 + + +| 模型 | 精度 | 单机8卡耗时 | 4机8卡耗时 | 加速比 | +|:---------:|:--------:|:--------:|:--------:|:------:| +| PPHGNet-base_ssld | 85.00% | 18.96d | 4.86d | **3.90** | +| PPLCNetv2-base_ssld | 80.10% | 6.4d | 1.67d | **3.83** | +| PPLCNet_x0_25_ssld | 53.43% | 6.2d | 1.78d | **3.48** | diff --git a/docs/zh_CN/quick_start/quick_start_classification_new_user.md b/docs/zh_CN/quick_start/quick_start_classification_new_user.md index 905f62d4dfc68a2bea61c87e7ef3867051d891fc..fdc61193c88b4b8b522842c7685bcdcf315dc4b5 100644 --- a/docs/zh_CN/quick_start/quick_start_classification_new_user.md +++ b/docs/zh_CN/quick_start/quick_start_classification_new_user.md @@ -48,7 +48,7 @@ ## 2. 
环境安装与配置 -具体安装步骤可详看[Paddle 安装文档](../installation/install_paddle.md),[PaddleClas 安装文档](../installation/install_paddleclas.md)。 +具体安装步骤可详看[环境准备](../installation/install_paddleclas.md)。 diff --git a/docs/zh_CN/quick_start/quick_start_multilabel_classification.md b/docs/zh_CN/quick_start/quick_start_multilabel_classification.md index 888a61582078c009865317a4cb1b067264aa4082..ea6e691c1ef51fb1371a5ff747c4cfc4fe72a79d 100644 --- a/docs/zh_CN/quick_start/quick_start_multilabel_classification.md +++ b/docs/zh_CN/quick_start/quick_start_multilabel_classification.md @@ -1,6 +1,6 @@ # 多标签分类 quick start -基于 [NUS-WIDE-SCENE](https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html) 数据集,体验多标签分类的训练、评估、预测的过程,该数据集是 NUS-WIDE 数据集的一个子集。请首先安装 PaddlePaddle 和 PaddleClas,具体安装步骤可详看 [Paddle 安装文档](../installation/install_paddle.md),[PaddleClas 安装文档](../installation/install_paddleclas.md)。 +基于 [NUS-WIDE-SCENE](https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html) 数据集,体验多标签分类的训练、评估、预测的过程,该数据集是 NUS-WIDE 数据集的一个子集。请首先安装 PaddlePaddle 和 PaddleClas,具体安装步骤可详看 [环境准备](../installation/install_paddleclas.md)。 ## 目录 diff --git a/docs/zh_CN/quick_start/quick_start_recognition.md b/docs/zh_CN/quick_start/quick_start_recognition.md index e2e6b169ea0101239b33612a379fc17207e7ffd3..38803ec9be510d3a4a96117fce3a1ccf537d3af9 100644 --- a/docs/zh_CN/quick_start/quick_start_recognition.md +++ b/docs/zh_CN/quick_start/quick_start_recognition.md @@ -22,7 +22,7 @@ ## 1. 环境配置 -* 安装:请先参考 [Paddle 安装教程](../installation/install_paddle.md) 以及 [PaddleClas 安装教程](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 +* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 * 进入 `deploy` 运行目录。本部分所有内容与命令均需要在 `deploy` 目录下运行,可以通过下面的命令进入 `deploy` 目录。 diff --git a/paddleclas.py b/paddleclas.py index bfad1931bdec5c305000775a6af891f4d7295244..91e7fcb84e2aa7013a084ab957e049659e13fe5b 100644 --- a/paddleclas.py +++ b/paddleclas.py @@ -1,4 +1,4 @@ -# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -24,7 +24,6 @@ import shutil import textwrap import tarfile import requests -import warnings from functools import partial from difflib import SequenceMatcher @@ -32,24 +31,25 @@ import cv2 import numpy as np from tqdm import tqdm from prettytable import PrettyTable +import paddle from deploy.python.predict_cls import ClsPredictor from deploy.utils.get_image_list import get_image_list from deploy.utils import config -from ppcls.arch.backbone import * -from ppcls.utils.logger import init_logger +import ppcls.arch.backbone as backbone +from ppcls.utils import logger # for building model with loading pretrained weights from backbone -init_logger() +logger.init_logger() __all__ = ["PaddleClas"] BASE_DIR = os.path.expanduser("~/.paddleclas/") BASE_INFERENCE_MODEL_DIR = os.path.join(BASE_DIR, "inference_model") BASE_IMAGES_DIR = os.path.join(BASE_DIR, "images") -BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/{}_infer.tar" -MODEL_SERIES = { +IMN_MODEL_BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/{}_infer.tar" +IMN_MODEL_SERIES = { "AlexNet": ["AlexNet"], "DarkNet": ["DarkNet53"], "DeiT": [ @@ -100,10 +100,17 @@ MODEL_SERIES = { "MobileNetV3_large_x1_0", "MobileNetV3_large_x1_25", "MobileNetV3_small_x1_0_ssld", "MobileNetV3_large_x1_0_ssld" ], + "PPHGNet": [ + "PPHGNet_tiny", + "PPHGNet_small", + "PPHGNet_tiny_ssld", + "PPHGNet_small_ssld", + ], "PPLCNet": [ "PPLCNet_x0_25", "PPLCNet_x0_35", "PPLCNet_x0_5", "PPLCNet_x0_75", "PPLCNet_x1_0", "PPLCNet_x1_5", "PPLCNet_x2_0", "PPLCNet_x2_5" ], + "PPLCNetV2": ["PPLCNetV2_base"], "RedNet": ["RedNet26", "RedNet38", "RedNet50", "RedNet101", "RedNet152"], "RegNet": ["RegNetX_4GF"], "Res2Net": [ @@ -168,6 +175,13 @@ MODEL_SERIES = { ] } +PULC_MODEL_BASE_DOWNLOAD_URL = "https://paddleclas.bj.bcebos.com/models/PULC/inference/{}_infer.tar" +PULC_MODELS = [ + "car_exists", "language_classification", "person_attribute", + "person_exists", "safety_helmet", "text_image_orientation", + "textline_orientation", "traffic_sign", "vehicle_attribute" +] + class ImageTypeError(Exception): """ImageTypeError. 
@@ -185,76 +199,69 @@ class InputModelError(Exception): super().__init__(message) -def init_config(model_name, - inference_model_dir, - use_gpu=True, - batch_size=1, - topk=5, - **kwargs): - imagenet1k_map_path = os.path.join( - os.path.abspath(__dir__), "ppcls/utils/imagenet1k_label_list.txt") - cfg = { - "Global": { - "infer_imgs": kwargs["infer_imgs"] - if "infer_imgs" in kwargs else False, - "model_name": model_name, - "inference_model_dir": inference_model_dir, - "batch_size": batch_size, - "use_gpu": use_gpu, - "enable_mkldnn": kwargs["enable_mkldnn"] - if "enable_mkldnn" in kwargs else False, - "cpu_num_threads": kwargs["cpu_num_threads"] - if "cpu_num_threads" in kwargs else 1, - "enable_benchmark": False, - "use_fp16": kwargs["use_fp16"] if "use_fp16" in kwargs else False, - "ir_optim": True, - "use_tensorrt": kwargs["use_tensorrt"] - if "use_tensorrt" in kwargs else False, - "gpu_mem": kwargs["gpu_mem"] if "gpu_mem" in kwargs else 8000, - "enable_profile": False - }, - "PreProcess": { - "transform_ops": [{ - "ResizeImage": { - "resize_short": kwargs["resize_short"] - if "resize_short" in kwargs else 256 - } - }, { - "CropImage": { - "size": kwargs["crop_size"] - if "crop_size" in kwargs else 224 - } - }, { - "NormalizeImage": { - "scale": 0.00392157, - "mean": [0.485, 0.456, 0.406], - "std": [0.229, 0.224, 0.225], - "order": '' - } - }, { - "ToCHWImage": None - }] - }, - "PostProcess": { - "main_indicator": "Topk", - "Topk": { - "topk": topk, - "class_id_map_file": imagenet1k_map_path - } - } - } - if "save_dir" in kwargs: - if kwargs["save_dir"] is not None: - cfg["PostProcess"]["SavePreLabel"] = { - "save_dir": kwargs["save_dir"] - } - if "class_id_map_file" in kwargs: - if kwargs["class_id_map_file"] is not None: - cfg["PostProcess"]["Topk"]["class_id_map_file"] = kwargs[ +def init_config(model_type, model_name, inference_model_dir, **kwargs): + + cfg_path = f"deploy/configs/PULC/{model_name}/inference_{model_name}.yaml" if model_type == "pulc" else "deploy/configs/inference_cls.yaml" + cfg_path = os.path.join(__dir__, cfg_path) + cfg = config.get_config(cfg_path, show=False) + + cfg.Global.inference_model_dir = inference_model_dir + + if "batch_size" in kwargs and kwargs["batch_size"]: + cfg.Global.batch_size = kwargs["batch_size"] + + if "use_gpu" in kwargs and kwargs["use_gpu"]: + cfg.Global.use_gpu = kwargs["use_gpu"] + if cfg.Global.use_gpu and not paddle.device.is_compiled_with_cuda(): + msg = "The current running environment does not support the use of GPU. CPU has been used instead." 
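+            # the installed paddle has no CUDA support, so warn once and fall back to CPU prediction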
+ logger.warning(msg) + cfg.Global.use_gpu = False + + if "infer_imgs" in kwargs and kwargs["infer_imgs"]: + cfg.Global.infer_imgs = kwargs["infer_imgs"] + if "enable_mkldnn" in kwargs and kwargs["enable_mkldnn"]: + cfg.Global.enable_mkldnn = kwargs["enable_mkldnn"] + if "cpu_num_threads" in kwargs and kwargs["cpu_num_threads"]: + cfg.Global.cpu_num_threads = kwargs["cpu_num_threads"] + if "use_fp16" in kwargs and kwargs["use_fp16"]: + cfg.Global.use_fp16 = kwargs["use_fp16"] + if "use_tensorrt" in kwargs and kwargs["use_tensorrt"]: + cfg.Global.use_tensorrt = kwargs["use_tensorrt"] + if "gpu_mem" in kwargs and kwargs["gpu_mem"]: + cfg.Global.gpu_mem = kwargs["gpu_mem"] + if "resize_short" in kwargs and kwargs["resize_short"]: + cfg.PreProcess.transform_ops[0]["ResizeImage"][ + "resize_short"] = kwargs["resize_short"] + if "crop_size" in kwargs and kwargs["crop_size"]: + cfg.PreProcess.transform_ops[1]["CropImage"]["size"] = kwargs[ + "crop_size"] + + # TODO(gaotingquan): not robust + if "thresh" in kwargs and kwargs[ + "thresh"] and "ThreshOutput" in cfg.PostProcess: + cfg.PostProcess.ThreshOutput.thresh = kwargs["thresh"] + if "Topk" in cfg.PostProcess: + if "topk" in kwargs and kwargs["topk"]: + cfg.PostProcess.Topk.topk = kwargs["topk"] + if "class_id_map_file" in kwargs and kwargs["class_id_map_file"]: + cfg.PostProcess.Topk.class_id_map_file = kwargs[ "class_id_map_file"] + else: + class_id_map_file_path = os.path.relpath( + cfg.PostProcess.Topk.class_id_map_file, "../") + cfg.PostProcess.Topk.class_id_map_file = os.path.join( + __dir__, class_id_map_file_path) + if "VehicleAttribute" in cfg.PostProcess: + if "color_threshold" in kwargs and kwargs["color_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "color_threshold"] + if "type_threshold" in kwargs and kwargs["type_threshold"]: + cfg.PostProcess.VehicleAttribute.type_threshold = kwargs[ + "type_threshold"] + + if "save_dir" in kwargs and kwargs["save_dir"]: + cfg.PostProcess.SavePreLabel.save_dir = kwargs["save_dir"] - cfg = config.AttrDict(cfg) - config.create_attr_dict(cfg) return cfg @@ -275,40 +282,48 @@ def args_cfg(): type=str, help="The directory of model files. Valid when model_name not specifed." ) + parser.add_argument("--use_gpu", type=str2bool, help="Whether use GPU.") parser.add_argument( - "--use_gpu", type=str, default=True, help="Whether use GPU.") - parser.add_argument("--gpu_mem", type=int, default=8000, help="") + "--gpu_mem", + type=int, + help="The memory size of GPU allocated to predict.") parser.add_argument( "--enable_mkldnn", type=str2bool, - default=False, help="Whether use MKLDNN. Valid when use_gpu is False") - parser.add_argument("--cpu_num_threads", type=int, default=1, help="") parser.add_argument( - "--use_tensorrt", type=str2bool, default=False, help="") - parser.add_argument("--use_fp16", type=str2bool, default=False, help="") + "--cpu_num_threads", + type=int, + help="The threads number when predicting on CPU.") + parser.add_argument( + "--use_tensorrt", + type=str2bool, + help="Whether use TensorRT to accelerate.") parser.add_argument( - "--batch_size", type=int, default=1, help="Batch size. Default by 1.") + "--use_fp16", type=str2bool, help="Whether use FP16 to predict.") + parser.add_argument("--batch_size", type=int, help="Batch size.") parser.add_argument( "--topk", type=int, - default=5, - help="Return topk score(s) and corresponding results. Default by 5.") + help="Return topk score(s) and corresponding results when Topk postprocess is used." 
+ ) parser.add_argument( "--class_id_map_file", type=str, help="The path of file that map class_id and label.") + parser.add_argument( + "--threshold", + type=float, + help="The threshold of ThreshOutput when postprocess is used.") + parser.add_argument("--color_threshold", type=float, help="") + parser.add_argument("--type_threshold", type=float, help="") parser.add_argument( "--save_dir", type=str, help="The directory to save prediction results as pre-label.") parser.add_argument( - "--resize_short", - type=int, - default=256, - help="Resize according to short size.") - parser.add_argument( - "--crop_size", type=int, default=224, help="Centor crop size.") + "--resize_short", type=int, help="Resize according to short size.") + parser.add_argument("--crop_size", type=int, help="Centor crop size.") args = parser.parse_args() return vars(args) @@ -317,33 +332,44 @@ def args_cfg(): def print_info(): """Print list of supported models in formatted. """ - table = PrettyTable(["Series", "Name"]) + imn_table = PrettyTable(["IMN Model Series", "Model Name"]) + pulc_table = PrettyTable(["PULC Models"]) try: sz = os.get_terminal_size() - width = sz.columns - 30 if sz.columns > 50 else 10 + total_width = sz.columns + first_width = 30 + second_width = total_width - first_width if total_width > 50 else 10 except OSError: - width = 100 - for series in MODEL_SERIES: - names = textwrap.fill(" ".join(MODEL_SERIES[series]), width=width) - table.add_row([series, names]) - width = len(str(table).split("\n")[0]) - print("{}".format("-" * width)) - print("Models supported by PaddleClas".center(width)) - print(table) - print("Powered by PaddlePaddle!".rjust(width)) - print("{}".format("-" * width)) - - -def get_model_names(): + second_width = 100 + for series in IMN_MODEL_SERIES: + names = textwrap.fill( + " ".join(IMN_MODEL_SERIES[series]), width=second_width) + imn_table.add_row([series, names]) + + table_width = len(str(imn_table).split("\n")[0]) + pulc_table.add_row([ + textwrap.fill( + " ".join(PULC_MODELS), width=total_width).center(table_width - 4) + ]) + + print("{}".format("-" * table_width)) + print("Models supported by PaddleClas".center(table_width)) + print(imn_table) + print(pulc_table) + print("Powered by PaddlePaddle!".rjust(table_width)) + print("{}".format("-" * table_width)) + + +def get_imn_model_names(): """Get the model names list. """ model_names = [] - for series in MODEL_SERIES: - model_names += (MODEL_SERIES[series]) + for series in IMN_MODEL_SERIES: + model_names += (IMN_MODEL_SERIES[series]) return model_names -def similar_architectures(name="", names=[], thresh=0.1, topk=10): +def similar_model_names(name="", names=[], thresh=0.1, topk=5): """Find the most similar topk model names. """ scores = [] @@ -378,12 +404,17 @@ def download_with_progressbar(url, save_path): f"Something went wrong while downloading file from {url}") -def check_model_file(model_name): +def check_model_file(model_type, model_name): """Check the model files exist and download and untar when no exist. 
""" - storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, - model_name) - url = BASE_DOWNLOAD_URL.format(model_name) + if model_type == "pulc": + storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, + "PULC", model_name) + url = PULC_MODEL_BASE_DOWNLOAD_URL.format(model_name) + else: + storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, + "IMN", model_name) + url = IMN_MODEL_BASE_DOWNLOAD_URL.format(model_name) tar_file_name_list = [ "inference.pdiparams", "inference.pdiparams.info", "inference.pdmodel" @@ -393,7 +424,7 @@ def check_model_file(model_name): if not os.path.exists(model_file_path) or not os.path.exists( params_file_path): tmp_path = storage_directory(url.split("/")[-1]) - print(f"download {url} to {tmp_path}") + logger.info(f"download {url} to {tmp_path}") os.makedirs(storage_directory(), exist_ok=True) download_with_progressbar(url, tmp_path) with tarfile.open(tmp_path, "r") as tarObj: @@ -426,9 +457,6 @@ class PaddleClas(object): def __init__(self, model_name: str=None, inference_model_dir: str=None, - use_gpu: bool=True, - batch_size: int=1, - topk: int=5, **kwargs): """Init PaddleClas with config. @@ -440,9 +468,11 @@ class PaddleClas(object): topk (int, optional): Return the top k prediction results with the highest score. Defaults to 5. """ super().__init__() - self._config = init_config(model_name, inference_model_dir, use_gpu, - batch_size, topk, **kwargs) - self._check_input_model() + self.model_type, inference_model_dir = self._check_input_model( + model_name, inference_model_dir) + self._config = init_config(self.model_type, model_name, + inference_model_dir, **kwargs) + self.cls_predictor = ClsPredictor(self._config) def get_config(self): @@ -450,24 +480,29 @@ class PaddleClas(object): """ return self._config - def _check_input_model(self): + def _check_input_model(self, model_name, inference_model_dir): """Check input model name or model files. """ - candidate_model_names = get_model_names() - input_model_name = self._config.Global.get("model_name", None) - inference_model_dir = self._config.Global.get("inference_model_dir", - None) - if input_model_name is not None: - similar_names = similar_architectures(input_model_name, - candidate_model_names) - similar_names_str = ", ".join(similar_names) - if input_model_name not in candidate_model_names: - err = f"{input_model_name} is not provided by PaddleClas. \nMaybe you want: [{similar_names_str}]. \nIf you want to use your own model, please specify inference_model_dir!" + all_imn_model_names = get_imn_model_names() + all_pulc_model_names = PULC_MODELS + + if model_name: + if model_name in all_imn_model_names: + inference_model_dir = check_model_file("imn", model_name) + return "imn", inference_model_dir + elif model_name in all_pulc_model_names: + inference_model_dir = check_model_file("pulc", model_name) + return "pulc", inference_model_dir + else: + similar_imn_names = similar_model_names(model_name, + all_imn_model_names) + similar_pulc_names = similar_model_names(model_name, + all_pulc_model_names) + similar_names_str = ", ".join(similar_imn_names + + similar_pulc_names) + err = f"{model_name} is not provided by PaddleClas. \nMaybe you want the : [{similar_names_str}]. \nIf you want to use your own model, please specify inference_model_dir!" 
raise InputModelError(err) - self._config.Global.inference_model_dir = check_model_file( - input_model_name) - return - elif inference_model_dir is not None: + elif inference_model_dir: model_file_path = os.path.join(inference_model_dir, "inference.pdmodel") params_file_path = os.path.join(inference_model_dir, @@ -476,11 +511,11 @@ class PaddleClas(object): params_file_path): err = f"There is no model file or params file in this directory: {inference_model_dir}" raise InputModelError(err) - return + return "custom", inference_model_dir else: err = f"Please specify the model name supported by PaddleClas or directory contained model files(inference.pdmodel, inference.pdiparams)." raise InputModelError(err) - return + return None def predict(self, input_data: Union[str, np.array], print_pred: bool=False) -> Generator[list, None, None]: @@ -511,22 +546,21 @@ class PaddleClas(object): os.makedirs(image_storage_dir()) image_save_path = image_storage_dir("tmp.jpg") download_with_progressbar(input_data, image_save_path) - input_data = image_save_path - warnings.warn( + logger.info( f"Image to be predicted from Internet: {input_data}, has been saved to: {image_save_path}" ) + input_data = image_save_path image_list = get_image_list(input_data) batch_size = self._config.Global.get("batch_size", 1) - topk = self._config.PostProcess.Topk.get('topk', 1) img_list = [] img_path_list = [] cnt = 0 - for idx, img_path in enumerate(image_list): + for idx_img, img_path in enumerate(image_list): img = cv2.imread(img_path) if img is None: - warnings.warn( + logger.warning( f"Image file failed to read and has been skipped. The path: {img_path}" ) continue @@ -535,16 +569,15 @@ class PaddleClas(object): img_path_list.append(img_path) cnt += 1 - if cnt % batch_size == 0 or (idx + 1) == len(image_list): + if cnt % batch_size == 0 or (idx_img + 1) == len(image_list): preds = self.cls_predictor.predict(img_list) - if print_pred and preds: - for idx, pred in enumerate(preds): - pred_str = ", ".join( - [f"{k}: {pred[k]}" for k in pred]) - print( - f"filename: {img_path_list[idx]}, top-{topk}, {pred_str}" - ) + if preds: + for idx_pred, pred in enumerate(preds): + pred["filename"] = img_path_list[idx_pred] + if print_pred: + logger.info(", ".join( + [f"{k}: {pred[k]}" for k in pred])) img_list = [] img_path_list = [] @@ -564,7 +597,7 @@ def main(): res = clas_engine.predict(cfg["infer_imgs"], print_pred=True) for _ in res: pass - print("Predict complete!") + logger.info("Predict complete!") return diff --git a/ppcls/arch/backbone/__init__.py b/ppcls/arch/backbone/__init__.py index bc41454c7b54806270110987d59f1657ac95cafa..e957358479cb98d8bde3dac0d4b2785b8965c7bf 100644 --- a/ppcls/arch/backbone/__init__.py +++ b/ppcls/arch/backbone/__init__.py @@ -70,6 +70,7 @@ from ppcls.arch.backbone.model_zoo.van import VAN_tiny from ppcls.arch.backbone.variant_models.resnet_variant import ResNet50_last_stage_stride1 from ppcls.arch.backbone.variant_models.vgg_variant import VGG19Sigmoid from ppcls.arch.backbone.variant_models.pp_lcnet_variant import PPLCNet_x2_5_Tanh +from ppcls.arch.backbone.model_zoo.adaface_ir_net import AdaFace_IR_18, AdaFace_IR_34, AdaFace_IR_50, AdaFace_IR_101, AdaFace_IR_152, AdaFace_IR_SE_50, AdaFace_IR_SE_101, AdaFace_IR_SE_152, AdaFace_IR_SE_200 # help whl get all the models' api (class type) and components' api (func type) diff --git a/ppcls/arch/backbone/legendary_models/mobilenet_v3.py b/ppcls/arch/backbone/legendary_models/mobilenet_v3.py index 
b7fc7e9f75db79338af9211782ff7a3c1525b222..3fbf9776bc4f39a5667b01623b7950d362203e9c 100644 --- a/ppcls/arch/backbone/legendary_models/mobilenet_v3.py +++ b/ppcls/arch/backbone/legendary_models/mobilenet_v3.py @@ -154,7 +154,8 @@ class MobileNetV3(TheseusLayer): class_expand=LAST_CONV, dropout_prob=0.2, return_patterns=None, - return_stages=None): + return_stages=None, + **kwargs): super().__init__() self.cfg = config diff --git a/ppcls/arch/backbone/legendary_models/pp_hgnet.py b/ppcls/arch/backbone/legendary_models/pp_hgnet.py index 3e0412dfb210c7dc44bc98854dbb96fca526ab1f..a5add431b025d9b97f0564a671a531d5ab7cd72d 100644 --- a/ppcls/arch/backbone/legendary_models/pp_hgnet.py +++ b/ppcls/arch/backbone/legendary_models/pp_hgnet.py @@ -27,7 +27,8 @@ MODEL_URLS = { "PPHGNet_tiny": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_pretrained.pdparams", "PPHGNet_small": - "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_pretrained.pdparams" + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_pretrained.pdparams", + "PPHGNet_base": "" } __all__ = list(MODEL_URLS.keys()) @@ -344,7 +345,7 @@ def PPHGNet_small(pretrained=False, use_ssld=False, **kwargs): return model -def PPHGNet_base(pretrained=False, use_ssld=False, **kwargs): +def PPHGNet_base(pretrained=False, use_ssld=True, **kwargs): """ PPHGNet_base Args: diff --git a/ppcls/arch/backbone/legendary_models/pp_lcnet.py b/ppcls/arch/backbone/legendary_models/pp_lcnet.py index 64fa61e19c5a362b28eb4c1e9686eadadd02555b..a4fe6fadb53b19176d03e529a00200a1570c1eed 100644 --- a/ppcls/arch/backbone/legendary_models/pp_lcnet.py +++ b/ppcls/arch/backbone/legendary_models/pp_lcnet.py @@ -94,13 +94,16 @@ class ConvBNLayer(TheseusLayer): stride=stride, padding=(filter_size - 1) // 2, groups=num_groups, - weight_attr=ParamAttr(initializer=KaimingNormal(), learning_rate=lr_mult), + weight_attr=ParamAttr( + initializer=KaimingNormal(), learning_rate=lr_mult), bias_attr=False) self.bn = BatchNorm2D( num_filters, - weight_attr=ParamAttr(regularizer=L2Decay(0.0), learning_rate=lr_mult), - bias_attr=ParamAttr(regularizer=L2Decay(0.0), learning_rate=lr_mult)) + weight_attr=ParamAttr( + regularizer=L2Decay(0.0), learning_rate=lr_mult), + bias_attr=ParamAttr( + regularizer=L2Decay(0.0), learning_rate=lr_mult)) self.hardswish = nn.Hardswish() def forward(self, x): @@ -128,8 +131,7 @@ class DepthwiseSeparable(TheseusLayer): num_groups=num_channels, lr_mult=lr_mult) if use_se: - self.se = SEModule(num_channels, - lr_mult=lr_mult) + self.se = SEModule(num_channels, lr_mult=lr_mult) self.pw_conv = ConvBNLayer( num_channels=num_channels, filter_size=1, @@ -187,14 +189,18 @@ class PPLCNet(TheseusLayer): dropout_prob=0.2, class_expand=1280, lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + stride_list=[2, 2, 2, 2, 2], use_last_conv=True, return_patterns=None, - return_stages=None): + return_stages=None, + **kwargs): super().__init__() self.scale = scale self.class_expand = class_expand self.lr_mult_list = lr_mult_list self.use_last_conv = use_last_conv + self.stride_list = stride_list + self.net_config = NET_CONFIG if isinstance(self.lr_mult_list, str): self.lr_mult_list = eval(self.lr_mult_list) @@ -203,17 +209,27 @@ class PPLCNet(TheseusLayer): )), "lr_mult_list should be in (list, tuple) but got {}".format( type(self.lr_mult_list)) assert len(self.lr_mult_list - ) == 6, "lr_mult_list length should be 5 but got {}".format( + ) == 6, "lr_mult_list length should 
be 6 but got {}".format( len(self.lr_mult_list)) + assert isinstance(self.stride_list, ( + list, tuple + )), "stride_list should be in (list, tuple) but got {}".format( + type(self.stride_list)) + assert len(self.stride_list + ) == 5, "stride_list length should be 5 but got {}".format( + len(self.stride_list)) + + for i, stride in enumerate(stride_list[1:]): + self.net_config["blocks{}".format(i + 3)][0][3] = stride self.conv1 = ConvBNLayer( num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), - stride=2, + stride=stride_list[0], lr_mult=self.lr_mult_list[0]) - self.blocks2 = nn.Sequential(* [ + self.blocks2 = nn.Sequential(*[ DepthwiseSeparable( num_channels=make_divisible(in_c * scale), num_filters=make_divisible(out_c * scale), @@ -221,10 +237,11 @@ class PPLCNet(TheseusLayer): stride=s, use_se=se, lr_mult=self.lr_mult_list[1]) - for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + for i, (k, in_c, out_c, s, se + ) in enumerate(self.net_config["blocks2"]) ]) - self.blocks3 = nn.Sequential(* [ + self.blocks3 = nn.Sequential(*[ DepthwiseSeparable( num_channels=make_divisible(in_c * scale), num_filters=make_divisible(out_c * scale), @@ -232,10 +249,11 @@ class PPLCNet(TheseusLayer): stride=s, use_se=se, lr_mult=self.lr_mult_list[2]) - for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + for i, (k, in_c, out_c, s, se + ) in enumerate(self.net_config["blocks3"]) ]) - self.blocks4 = nn.Sequential(* [ + self.blocks4 = nn.Sequential(*[ DepthwiseSeparable( num_channels=make_divisible(in_c * scale), num_filters=make_divisible(out_c * scale), @@ -243,10 +261,11 @@ class PPLCNet(TheseusLayer): stride=s, use_se=se, lr_mult=self.lr_mult_list[3]) - for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + for i, (k, in_c, out_c, s, se + ) in enumerate(self.net_config["blocks4"]) ]) - self.blocks5 = nn.Sequential(* [ + self.blocks5 = nn.Sequential(*[ DepthwiseSeparable( num_channels=make_divisible(in_c * scale), num_filters=make_divisible(out_c * scale), @@ -254,10 +273,11 @@ class PPLCNet(TheseusLayer): stride=s, use_se=se, lr_mult=self.lr_mult_list[4]) - for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + for i, (k, in_c, out_c, s, se + ) in enumerate(self.net_config["blocks5"]) ]) - self.blocks6 = nn.Sequential(* [ + self.blocks6 = nn.Sequential(*[ DepthwiseSeparable( num_channels=make_divisible(in_c * scale), num_filters=make_divisible(out_c * scale), @@ -265,13 +285,15 @@ class PPLCNet(TheseusLayer): stride=s, use_se=se, lr_mult=self.lr_mult_list[5]) - for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + for i, (k, in_c, out_c, s, se + ) in enumerate(self.net_config["blocks6"]) ]) self.avg_pool = AdaptiveAvgPool2D(1) if self.use_last_conv: self.last_conv = Conv2D( - in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + in_channels=make_divisible(self.net_config["blocks6"][-1][2] * + scale), out_channels=self.class_expand, kernel_size=1, stride=1, @@ -282,7 +304,9 @@ class PPLCNet(TheseusLayer): else: self.last_conv = None self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) - self.fc = Linear(self.class_expand if self.use_last_conv else NET_CONFIG["blocks6"][-1][2], class_num) + self.fc = Linear( + self.class_expand if self.use_last_conv else + make_divisible(self.net_config["blocks6"][-1][2]), class_num) super().init_res( stages_pattern, diff --git a/ppcls/arch/backbone/legendary_models/resnet.py b/ppcls/arch/backbone/legendary_models/resnet.py index 
ca75c2eaa4f2d7f4a604a312ed591c10811105c4..705511f5b5a8ed5aac45636dddb3598aefd4276a 100644 --- a/ppcls/arch/backbone/legendary_models/resnet.py +++ b/ppcls/arch/backbone/legendary_models/resnet.py @@ -26,6 +26,7 @@ from paddle.nn.initializer import Uniform from paddle.regularizer import L2Decay import math +from ppcls.utils import logger from ppcls.arch.backbone.base.theseus_layer import TheseusLayer from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url @@ -122,23 +123,23 @@ class ConvBNLayer(TheseusLayer): self.is_vd_mode = is_vd_mode self.act = act self.avg_pool = AvgPool2D( - kernel_size=2, stride=2, padding=0, ceil_mode=True) + kernel_size=2, stride=stride, padding="SAME", ceil_mode=True) self.conv = Conv2D( in_channels=num_channels, out_channels=num_filters, kernel_size=filter_size, - stride=stride, + stride=1 if is_vd_mode else stride, padding=(filter_size - 1) // 2, groups=groups, weight_attr=ParamAttr(learning_rate=lr_mult), bias_attr=False, data_format=data_format) - weight_attr = ParamAttr(learning_rate=lr_mult, trainable=True) - bias_attr = ParamAttr(learning_rate=lr_mult, trainable=True) - - self.bn = BatchNorm2D( - num_filters, weight_attr=weight_attr, bias_attr=bias_attr) + self.bn = BatchNorm( + num_filters, + param_attr=ParamAttr(learning_rate=lr_mult), + bias_attr=ParamAttr(learning_rate=lr_mult), + data_layout=data_format) self.relu = nn.ReLU() def forward(self, x): @@ -161,7 +162,6 @@ class BottleneckBlock(TheseusLayer): lr_mult=1.0, data_format="NCHW"): super().__init__() - self.conv0 = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, @@ -190,7 +190,7 @@ class BottleneckBlock(TheseusLayer): num_channels=num_channels, num_filters=num_filters * 4, filter_size=1, - stride=stride if if_first else 1, + stride=stride, is_vd_mode=False if if_first else True, lr_mult=lr_mult, data_format=data_format) @@ -245,7 +245,7 @@ class BasicBlock(TheseusLayer): num_channels=num_channels, num_filters=num_filters, filter_size=1, - stride=stride if if_first else 1, + stride=stride, is_vd_mode=False if if_first else True, lr_mult=lr_mult, data_format=data_format) @@ -284,14 +284,17 @@ class ResNet(TheseusLayer): stem_act="relu", class_num=1000, lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0], + stride_list=[2, 2, 2, 2, 2], data_format="NCHW", input_image_channel=3, return_patterns=None, - return_stages=None): + return_stages=None, + **kargs): super().__init__() self.cfg = config self.lr_mult_list = lr_mult_list + self.stride_list = stride_list self.is_vd_mode = version == "vd" self.class_num = class_num self.num_filters = [64, 128, 256, 512] @@ -304,18 +307,28 @@ class ResNet(TheseusLayer): list, tuple )), "lr_mult_list should be in (list, tuple) but got {}".format( type(self.lr_mult_list)) - assert len(self.lr_mult_list - ) == 5, "lr_mult_list length should be 5 but got {}".format( - len(self.lr_mult_list)) + if len(self.lr_mult_list) != 5: + msg = "lr_mult_list length should be 5 but got {}, default lr_mult_list used".format( + len(self.lr_mult_list)) + logger.warning(msg) + self.lr_mult_list = [1.0, 1.0, 1.0, 1.0, 1.0] + + assert isinstance(self.stride_list, ( + list, tuple + )), "stride_list should be in (list, tuple) but got {}".format( + type(self.stride_list)) + assert len(self.stride_list + ) == 5, "stride_list length should be 5 but got {}".format( + len(self.stride_list)) self.stem_cfg = { #num_channels, num_filters, filter_size, stride - "vb": [[input_image_channel, 64, 7, 2]], - "vd": - [[input_image_channel, 32, 3, 2], [32, 32, 3, 1], [32, 
64, 3, 1]] + "vb": [[input_image_channel, 64, 7, self.stride_list[0]]], + "vd": [[input_image_channel, 32, 3, self.stride_list[0]], + [32, 32, 3, 1], [32, 64, 3, 1]] } - self.stem = nn.Sequential(* [ + self.stem = nn.Sequential(*[ ConvBNLayer( num_channels=in_c, num_filters=out_c, @@ -328,7 +341,10 @@ class ResNet(TheseusLayer): ]) self.max_pool = MaxPool2D( - kernel_size=3, stride=2, padding=1, data_format=data_format) + kernel_size=3, + stride=stride_list[1], + padding=1, + data_format=data_format) block_list = [] for block_idx in range(len(self.block_depth)): shortcut = False @@ -337,7 +353,8 @@ class ResNet(TheseusLayer): num_channels=self.num_channels[block_idx] if i == 0 else self.num_filters[block_idx] * self.channels_mult, num_filters=self.num_filters[block_idx], - stride=2 if i == 0 and block_idx != 0 else 1, + stride=self.stride_list[block_idx + 1] + if i == 0 and block_idx != 0 else 1, shortcut=shortcut, if_first=block_idx == i == 0 if version == "vd" else True, lr_mult=self.lr_mult_list[block_idx + 1], @@ -381,7 +398,10 @@ def _load_pretrained(pretrained, model, model_url, use_ssld): elif pretrained is True: load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) elif isinstance(pretrained, str): - load_dygraph_pretrain(model, pretrained) + if 'http' in pretrained: + load_dygraph_pretrain_from_url(model, pretrained, use_ssld=False) + else: + load_dygraph_pretrain(model, pretrained) else: raise RuntimeError( "pretrained type is not available. Please use `string` or `boolean` type." diff --git a/ppcls/arch/backbone/legendary_models/swin_transformer.py b/ppcls/arch/backbone/legendary_models/swin_transformer.py index 2a3401b2a3fae17e6ca5834cad1b362c5955400f..c951150151aeaa6d83a9d31d36cef3c6dae88455 100644 --- a/ppcls/arch/backbone/legendary_models/swin_transformer.py +++ b/ppcls/arch/backbone/legendary_models/swin_transformer.py @@ -157,6 +157,7 @@ class WindowAttention(nn.Layer): relative_coords[:, :, 1] += self.window_size[1] - 1 relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1 relative_position_index = relative_coords.sum(-1) # Wh*Ww, Wh*Ww + self.register_buffer("relative_position_index", relative_position_index) @@ -168,6 +169,23 @@ class WindowAttention(nn.Layer): trunc_normal_(self.relative_position_bias_table) self.softmax = nn.Softmax(axis=-1) + def eval(self, ): + # this is used to re-param swin for model export + relative_position_bias_table = self.relative_position_bias_table + window_size = self.window_size + index = self.relative_position_index.reshape([-1]) + + relative_position_bias = paddle.index_select( + relative_position_bias_table, index) + relative_position_bias = relative_position_bias.reshape([ + window_size[0] * window_size[1], window_size[0] * window_size[1], + -1 + ]) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.transpose( + [2, 0, 1]) # nH, Wh*Ww, Wh*Ww + relative_position_bias = relative_position_bias.unsqueeze(0) + self.register_buffer("relative_position_bias", relative_position_bias) + def forward(self, x, mask=None): """ Args: @@ -183,18 +201,21 @@ class WindowAttention(nn.Layer): q = q * self.scale attn = paddle.mm(q, k.transpose([0, 1, 3, 2])) - index = self.relative_position_index.reshape([-1]) + if self.training or not hasattr(self, "relative_position_bias"): + index = self.relative_position_index.reshape([-1]) - relative_position_bias = paddle.index_select( - self.relative_position_bias_table, index) - relative_position_bias = relative_position_bias.reshape([ - self.window_size[0] * self.window_size[1], - 
self.window_size[0] * self.window_size[1], -1 - ]) # Wh*Ww,Wh*Ww,nH + relative_position_bias = paddle.index_select( + self.relative_position_bias_table, index) + relative_position_bias = relative_position_bias.reshape([ + self.window_size[0] * self.window_size[1], + self.window_size[0] * self.window_size[1], -1 + ]) # Wh*Ww,Wh*Ww,nH - relative_position_bias = relative_position_bias.transpose( - [2, 0, 1]) # nH, Wh*Ww, Wh*Ww - attn = attn + relative_position_bias.unsqueeze(0) + relative_position_bias = relative_position_bias.transpose( + [2, 0, 1]) # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + else: + attn = attn + self.relative_position_bias if mask is not None: nW = mask.shape[0] diff --git a/ppcls/arch/backbone/model_zoo/adaface_ir_net.py b/ppcls/arch/backbone/model_zoo/adaface_ir_net.py new file mode 100644 index 0000000000000000000000000000000000000000..47de152b646e6f824e5a888692b770d9e146223b --- /dev/null +++ b/ppcls/arch/backbone/model_zoo/adaface_ir_net.py @@ -0,0 +1,529 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# this code is based on AdaFace(https://github.com/mk-minchul/AdaFace) +from collections import namedtuple +import paddle +import paddle.nn as nn +from paddle.nn import Dropout +from paddle.nn import MaxPool2D +from paddle.nn import Sequential +from paddle.nn import Conv2D, Linear +from paddle.nn import BatchNorm1D, BatchNorm2D +from paddle.nn import ReLU, Sigmoid +from paddle.nn import Layer +from paddle.nn import PReLU + +# from ppcls.arch.backbone.legendary_models.resnet import _load_pretrained + + +class Flatten(Layer): + """ Flat tensor + """ + + def forward(self, input): + return paddle.reshape(input, [input.shape[0], -1]) + + +class LinearBlock(Layer): + """ Convolution block without no-linear activation layer + """ + + def __init__(self, + in_c, + out_c, + kernel=(1, 1), + stride=(1, 1), + padding=(0, 0), + groups=1): + super(LinearBlock, self).__init__() + self.conv = Conv2D( + in_c, + out_c, + kernel, + stride, + padding, + groups=groups, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=None) + weight_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=1.0)) + bias_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=0.0)) + self.bn = BatchNorm2D( + out_c, weight_attr=weight_attr, bias_attr=bias_attr) + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + return x + + +class GNAP(Layer): + """ Global Norm-Aware Pooling block + """ + + def __init__(self, in_c): + super(GNAP, self).__init__() + self.bn1 = BatchNorm2D(in_c, weight_attr=False, bias_attr=False) + self.pool = nn.AdaptiveAvgPool2D((1, 1)) + self.bn2 = BatchNorm1D(in_c, weight_attr=False, bias_attr=False) + + def forward(self, x): + x = self.bn1(x) + x_norm = paddle.norm(x, 2, 1, True) + x_norm_mean = paddle.mean(x_norm) + weight = x_norm_mean / x_norm + x = x * weight + x = self.pool(x) + x = x.view(x.shape[0], -1) + 
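+        # the flattened, pooled global feature is normalized by bn2 to produce the final embedding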
feature = self.bn2(x) + return feature + + +class GDC(Layer): + """ Global Depthwise Convolution block + """ + + def __init__(self, in_c, embedding_size): + super(GDC, self).__init__() + self.conv_6_dw = LinearBlock( + in_c, + in_c, + groups=in_c, + kernel=(7, 7), + stride=(1, 1), + padding=(0, 0)) + self.conv_6_flatten = Flatten() + self.linear = Linear( + in_c, + embedding_size, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False) + self.bn = BatchNorm1D( + embedding_size, weight_attr=False, bias_attr=False) + + def forward(self, x): + x = self.conv_6_dw(x) + x = self.conv_6_flatten(x) + x = self.linear(x) + x = self.bn(x) + return x + + +class SELayer(Layer): + """ SE block + """ + + def __init__(self, channels, reduction): + super(SELayer, self).__init__() + self.avg_pool = nn.AdaptiveAvgPool2D(1) + weight_attr = paddle.ParamAttr( + initializer=paddle.nn.initializer.XavierUniform()) + self.fc1 = Conv2D( + channels, + channels // reduction, + kernel_size=1, + padding=0, + weight_attr=weight_attr, + bias_attr=False) + + self.relu = ReLU() + self.fc2 = Conv2D( + channels // reduction, + channels, + kernel_size=1, + padding=0, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False) + + self.sigmoid = Sigmoid() + + def forward(self, x): + module_input = x + x = self.avg_pool(x) + x = self.fc1(x) + x = self.relu(x) + x = self.fc2(x) + x = self.sigmoid(x) + + return module_input * x + + +class BasicBlockIR(Layer): + """ BasicBlock for IRNet + """ + + def __init__(self, in_channel, depth, stride): + super(BasicBlockIR, self).__init__() + + weight_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=1.0)) + bias_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=0.0)) + if in_channel == depth: + self.shortcut_layer = MaxPool2D(1, stride) + else: + self.shortcut_layer = Sequential( + Conv2D( + in_channel, + depth, (1, 1), + stride, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + depth, weight_attr=weight_attr, bias_attr=bias_attr)) + self.res_layer = Sequential( + BatchNorm2D( + in_channel, weight_attr=weight_attr, bias_attr=bias_attr), + Conv2D( + in_channel, + depth, (3, 3), (1, 1), + 1, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + depth, weight_attr=weight_attr, bias_attr=bias_attr), + PReLU(depth), + Conv2D( + depth, + depth, (3, 3), + stride, + 1, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + depth, weight_attr=weight_attr, bias_attr=bias_attr)) + + def forward(self, x): + shortcut = self.shortcut_layer(x) + res = self.res_layer(x) + + return res + shortcut + + +class BottleneckIR(Layer): + """ BasicBlock with bottleneck for IRNet + """ + + def __init__(self, in_channel, depth, stride): + super(BottleneckIR, self).__init__() + reduction_channel = depth // 4 + weight_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=1.0)) + bias_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=0.0)) + if in_channel == depth: + self.shortcut_layer = MaxPool2D(1, stride) + else: + self.shortcut_layer = Sequential( + Conv2D( + in_channel, + depth, (1, 1), + stride, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + depth, weight_attr=weight_attr, bias_attr=bias_attr)) + self.res_layer = Sequential( + BatchNorm2D( + in_channel, weight_attr=weight_attr, bias_attr=bias_attr), + Conv2D( + in_channel, + 
reduction_channel, (1, 1), (1, 1), + 0, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + reduction_channel, + weight_attr=weight_attr, + bias_attr=bias_attr), + PReLU(reduction_channel), + Conv2D( + reduction_channel, + reduction_channel, (3, 3), (1, 1), + 1, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + reduction_channel, + weight_attr=weight_attr, + bias_attr=bias_attr), + PReLU(reduction_channel), + Conv2D( + reduction_channel, + depth, (1, 1), + stride, + 0, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + depth, weight_attr=weight_attr, bias_attr=bias_attr)) + + def forward(self, x): + shortcut = self.shortcut_layer(x) + res = self.res_layer(x) + + return res + shortcut + + +class BasicBlockIRSE(BasicBlockIR): + def __init__(self, in_channel, depth, stride): + super(BasicBlockIRSE, self).__init__(in_channel, depth, stride) + self.res_layer.add_sublayer("se_block", SELayer(depth, 16)) + + +class BottleneckIRSE(BottleneckIR): + def __init__(self, in_channel, depth, stride): + super(BottleneckIRSE, self).__init__(in_channel, depth, stride) + self.res_layer.add_sublayer("se_block", SELayer(depth, 16)) + + +class Bottleneck(namedtuple('Block', ['in_channel', 'depth', 'stride'])): + '''A named tuple describing a ResNet block.''' + + +def get_block(in_channel, depth, num_units, stride=2): + + return [Bottleneck(in_channel, depth, stride)] +\ + [Bottleneck(depth, depth, 1) for i in range(num_units - 1)] + + +def get_blocks(num_layers): + if num_layers == 18: + blocks = [ + get_block( + in_channel=64, depth=64, num_units=2), get_block( + in_channel=64, depth=128, num_units=2), get_block( + in_channel=128, depth=256, num_units=2), get_block( + in_channel=256, depth=512, num_units=2) + ] + elif num_layers == 34: + blocks = [ + get_block( + in_channel=64, depth=64, num_units=3), get_block( + in_channel=64, depth=128, num_units=4), get_block( + in_channel=128, depth=256, num_units=6), get_block( + in_channel=256, depth=512, num_units=3) + ] + elif num_layers == 50: + blocks = [ + get_block( + in_channel=64, depth=64, num_units=3), get_block( + in_channel=64, depth=128, num_units=4), get_block( + in_channel=128, depth=256, num_units=14), get_block( + in_channel=256, depth=512, num_units=3) + ] + elif num_layers == 100: + blocks = [ + get_block( + in_channel=64, depth=64, num_units=3), get_block( + in_channel=64, depth=128, num_units=13), get_block( + in_channel=128, depth=256, num_units=30), get_block( + in_channel=256, depth=512, num_units=3) + ] + elif num_layers == 152: + blocks = [ + get_block( + in_channel=64, depth=256, num_units=3), get_block( + in_channel=256, depth=512, num_units=8), get_block( + in_channel=512, depth=1024, num_units=36), get_block( + in_channel=1024, depth=2048, num_units=3) + ] + elif num_layers == 200: + blocks = [ + get_block( + in_channel=64, depth=256, num_units=3), get_block( + in_channel=256, depth=512, num_units=24), get_block( + in_channel=512, depth=1024, num_units=36), get_block( + in_channel=1024, depth=2048, num_units=3) + ] + + return blocks + + +class Backbone(Layer): + def __init__(self, input_size, num_layers, mode='ir'): + """ Args: + input_size: input_size of backbone + num_layers: num_layers of backbone + mode: support ir or irse + """ + super(Backbone, self).__init__() + assert input_size[0] in [112, 224], \ + "input_size should be [112, 112] or [224, 224]" + assert num_layers in [18, 34, 50, 100, 152, 200], \ + "num_layers should be 18, 34, 50, 
100 or 152" + assert mode in ['ir', 'ir_se'], \ + "mode should be ir or ir_se" + weight_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=1.0)) + bias_attr = paddle.ParamAttr( + regularizer=None, initializer=nn.initializer.Constant(value=0.0)) + self.input_layer = Sequential( + Conv2D( + 3, + 64, (3, 3), + 1, + 1, + weight_attr=nn.initializer.KaimingNormal(), + bias_attr=False), + BatchNorm2D( + 64, weight_attr=weight_attr, bias_attr=bias_attr), + PReLU(64)) + blocks = get_blocks(num_layers) + if num_layers <= 100: + if mode == 'ir': + unit_module = BasicBlockIR + elif mode == 'ir_se': + unit_module = BasicBlockIRSE + output_channel = 512 + else: + if mode == 'ir': + unit_module = BottleneckIR + elif mode == 'ir_se': + unit_module = BottleneckIRSE + output_channel = 2048 + + if input_size[0] == 112: + self.output_layer = Sequential( + BatchNorm2D( + output_channel, + weight_attr=weight_attr, + bias_attr=bias_attr), + Dropout(0.4), + Flatten(), + Linear( + output_channel * 7 * 7, + 512, + weight_attr=nn.initializer.KaimingNormal()), + BatchNorm1D( + 512, weight_attr=False, bias_attr=False)) + else: + self.output_layer = Sequential( + BatchNorm2D( + output_channel, + weight_attr=weight_attr, + bias_attr=bias_attr), + Dropout(0.4), + Flatten(), + Linear( + output_channel * 14 * 14, + 512, + weight_attr=nn.initializer.KaimingNormal()), + BatchNorm1D( + 512, weight_attr=False, bias_attr=False)) + + modules = [] + for block in blocks: + for bottleneck in block: + modules.append( + unit_module(bottleneck.in_channel, bottleneck.depth, + bottleneck.stride)) + self.body = Sequential(*modules) + + # initialize_weights(self.modules()) + + def forward(self, x): + + # current code only supports one extra image + # it comes with a extra dimension for number of extra image. We will just squeeze it out for now + x = self.input_layer(x) + + for idx, module in enumerate(self.body): + x = module(x) + + x = self.output_layer(x) + # norm = paddle.norm(x, 2, 1, True) + # output = paddle.divide(x, norm) + # return output, norm + return x + + +def AdaFace_IR_18(input_size=(112, 112)): + """ Constructs a ir-18 model. + """ + model = Backbone(input_size, 18, 'ir') + return model + + +def AdaFace_IR_34(input_size=(112, 112)): + """ Constructs a ir-34 model. + """ + model = Backbone(input_size, 34, 'ir') + + return model + + +def AdaFace_IR_50(input_size=(112, 112)): + """ Constructs a ir-50 model. + """ + model = Backbone(input_size, 50, 'ir') + + return model + + +def AdaFace_IR_101(input_size=(112, 112)): + """ Constructs a ir-101 model. + """ + model = Backbone(input_size, 100, 'ir') + + return model + + +def AdaFace_IR_152(input_size=(112, 112)): + """ Constructs a ir-152 model. + """ + model = Backbone(input_size, 152, 'ir') + + return model + + +def AdaFace_IR_200(input_size=(112, 112)): + """ Constructs a ir-200 model. + """ + model = Backbone(input_size, 200, 'ir') + + return model + + +def AdaFace_IR_SE_50(input_size=(112, 112)): + """ Constructs a ir_se-50 model. + """ + model = Backbone(input_size, 50, 'ir_se') + + return model + + +def AdaFace_IR_SE_101(input_size=(112, 112)): + """ Constructs a ir_se-101 model. + """ + model = Backbone(input_size, 100, 'ir_se') + + return model + + +def AdaFace_IR_SE_152(input_size=(112, 112)): + """ Constructs a ir_se-152 model. + """ + model = Backbone(input_size, 152, 'ir_se') + + return model + + +def AdaFace_IR_SE_200(input_size=(112, 112)): + """ Constructs a ir_se-200 model. 
+ """ + model = Backbone(input_size, 200, 'ir_se') + + return model diff --git a/ppcls/arch/backbone/model_zoo/res2net_vd.py b/ppcls/arch/backbone/model_zoo/res2net_vd.py index 511fbaa59e6ff5b4e5419edc084631f6e43873fa..2139e198819c6768b975b339e9373fe7f6334f10 100644 --- a/ppcls/arch/backbone/model_zoo/res2net_vd.py +++ b/ppcls/arch/backbone/model_zoo/res2net_vd.py @@ -165,7 +165,8 @@ class BottleneckBlock(nn.Layer): class Res2Net_vd(nn.Layer): - def __init__(self, layers=50, scales=4, width=26, class_num=1000): + def __init__(self, layers=50, scales=4, width=26, class_num=1000, + **kwargs): super(Res2Net_vd, self).__init__() self.layers = layers diff --git a/ppcls/arch/backbone/model_zoo/shufflenet_v2.py b/ppcls/arch/backbone/model_zoo/shufflenet_v2.py index b10249b7e2ea59bfa846c4fa3e09c5fbfe77b9ef..c769afdd4b238fa0a7b92fdb72c3962645a2ac8f 100644 --- a/ppcls/arch/backbone/model_zoo/shufflenet_v2.py +++ b/ppcls/arch/backbone/model_zoo/shufflenet_v2.py @@ -233,7 +233,7 @@ class ShuffleNet(Layer): elif scale == 1.5: stage_out_channels = [-1, 24, 176, 352, 704, 1024] elif scale == 2.0: - stage_out_channels = [-1, 24, 224, 488, 976, 2048] + stage_out_channels = [-1, 24, 244, 488, 976, 2048] else: raise NotImplementedError("This scale size:[" + str(scale) + "] is not implemented!") diff --git a/ppcls/arch/gears/__init__.py b/ppcls/arch/gears/__init__.py index 8757aa4aeb4a510857ca4dc1c60696b1d6e86a0b..871967804e21c362935915942aa3f621207b934e 100644 --- a/ppcls/arch/gears/__init__.py +++ b/ppcls/arch/gears/__init__.py @@ -19,6 +19,7 @@ from .fc import FC from .vehicle_neck import VehicleNeck from paddle.nn import Tanh from .bnneck import BNNeck +from .adamargin import AdaMargin __all__ = ['build_gear'] @@ -26,7 +27,7 @@ __all__ = ['build_gear'] def build_gear(config): support_dict = [ 'ArcMargin', 'CosMargin', 'CircleMargin', 'FC', 'VehicleNeck', 'Tanh', - 'BNNeck' + 'BNNeck', 'AdaMargin' ] module_name = config.pop('name') assert module_name in support_dict, Exception( diff --git a/ppcls/arch/gears/adamargin.py b/ppcls/arch/gears/adamargin.py new file mode 100644 index 0000000000000000000000000000000000000000..1b0f5f245dbbe2c282f726b7d5be3634d6df912c --- /dev/null +++ b/ppcls/arch/gears/adamargin.py @@ -0,0 +1,111 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# This code is based on AdaFace(https://github.com/mk-minchul/AdaFace) +# Paper: AdaFace: Quality Adaptive Margin for Face Recognition +from paddle.nn import Layer +import math +import paddle + + +def l2_norm(input, axis=1): + norm = paddle.norm(input, 2, axis, True) + output = paddle.divide(input, norm) + return output + + +class AdaMargin(Layer): + def __init__( + self, + embedding_size=512, + class_num=70722, + m=0.4, + h=0.333, + s=64., + t_alpha=1.0, ): + super(AdaMargin, self).__init__() + self.classnum = class_num + kernel_weight = paddle.uniform( + [embedding_size, class_num], min=-1, max=1) + kernel_weight_norm = paddle.norm( + kernel_weight, p=2, axis=0, keepdim=True) + kernel_weight_norm = paddle.where(kernel_weight_norm > 1e-5, + kernel_weight_norm, + paddle.ones_like(kernel_weight_norm)) + kernel_weight = kernel_weight / kernel_weight_norm + self.kernel = self.create_parameter( + [embedding_size, class_num], + attr=paddle.nn.initializer.Assign(kernel_weight)) + + # initial kernel + # self.kernel.data.uniform_(-1, 1).renorm_(2,1,1e-5).mul_(1e5) + self.m = m + self.eps = 1e-3 + self.h = h + self.s = s + + # EMA statistics of the feature-norm distribution + self.t_alpha = t_alpha + self.register_buffer('t', paddle.zeros([1]), persistable=True) + self.register_buffer( + 'batch_mean', paddle.ones([1]) * 20, persistable=True) + self.register_buffer( + 'batch_std', paddle.ones([1]) * 100, persistable=True) + + def forward(self, embeddings, label): + + norms = paddle.norm(embeddings, 2, 1, True) + embeddings = paddle.divide(embeddings, norms) + kernel_norm = l2_norm(self.kernel, axis=0) + cosine = paddle.mm(embeddings, kernel_norm) + cosine = paddle.clip(cosine, -1 + self.eps, + 1 - self.eps) # for stability + + safe_norms = paddle.clip(norms, min=0.001, max=100) # for stability + safe_norms = safe_norms.clone().detach() + + # update batch mean / batch std with an EMA + with paddle.no_grad(): + mean = safe_norms.mean().detach() + std = safe_norms.std().detach() + self.batch_mean = mean * self.t_alpha + (1 - self.t_alpha + ) * self.batch_mean + self.batch_std = std * self.t_alpha + (1 - self.t_alpha + ) * self.batch_std + + margin_scaler = (safe_norms - self.batch_mean) / ( + self.batch_std + self.eps) # 66% between -1, 1 + margin_scaler = margin_scaler * self.h # 68% between -0.333, 0.333 when h: 0.333 + margin_scaler = paddle.clip(margin_scaler, -1, 1) + + # g_angular + m_arc = paddle.nn.functional.one_hot( + label.reshape([-1]), self.classnum) + g_angular = self.m * margin_scaler * -1 + m_arc = m_arc * g_angular + theta = paddle.acos(cosine) + theta_m = paddle.clip( + theta + m_arc, min=self.eps, max=math.pi - self.eps) + cosine = paddle.cos(theta_m) + + # g_additive + m_cos = paddle.nn.functional.one_hot( + label.reshape([-1]), self.classnum) + g_add = self.m + (self.m * margin_scaler) + m_cos = m_cos * g_add + cosine = cosine - m_cos + + # scale + scaled_cosine_m = cosine * self.s + return scaled_cosine_m diff --git a/ppcls/configs/Attr/StrongBaselineAttr.yaml b/ppcls/configs/Attr/StrongBaselineAttr.yaml index 7501669bc5707fa2577c7d0b573a3b23cd2a0213..2324015d667a09a56570677713792b16f1b2ed03 100644 --- a/ppcls/configs/Attr/StrongBaselineAttr.yaml +++ b/ppcls/configs/Attr/StrongBaselineAttr.yaml @@ -20,6 +20,7 @@ Arch: name: "ResNet50" pretrained: True class_num: 26 + infer_add_softmax: False # loss function config for traing/eval process Loss: @@ -110,5 +111,3 @@ DataLoader: Metric: Eval: - ATTRMetric: - - diff --git a/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml 
b/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d906c22de20914eeb72d2162f9bfa2142b357dcf --- /dev/null +++ b/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml @@ -0,0 +1,158 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output_lcnet_x2_5_dml + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 1000 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - False + - False + infer_model_name: "Student" + models: + - Teacher: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + - Student: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + model_names: ["Student", "Teacher"] + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.4 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + Eval: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml b/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml new file mode 100644 index 0000000000000000000000000000000000000000..656a0e907716e0d7d7df8ec6ab2923f584fc368c --- /dev/null +++ 
b/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml @@ -0,0 +1,157 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output_lcnet_x2_5_ssld + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + to_static: True + +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 1000 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + infer_model_name: "Student" + models: + - Teacher: + name: ResNet50_vd + class_num: *class_num + pretrained: True + use_ssld: True + - Student: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + +# loss function config for training/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.2 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + Eval: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml b/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b425b592fdabea61832a1ad3c2f50a20b62ecd6f --- /dev/null +++ b/ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml @@ -0,0 +1,168 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output_lcnet_x2_5_udml + device: gpu + save_interval: 1 + 
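# UDML: the Teacher and Student below are two identical PPLCNet_x2_5 models, + # neither frozen; they teach each other with DistillationGTCELoss and + # DistillationDMLLoss on the logits plus a DistillationDistanceLoss on the + # "blocks5" features (see Loss.Train below). + 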
eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 1000 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - False + - False + infer_model_name: "Student" + models: + - Teacher: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"] + - Student: + name: PPLCNet_x2_5 + class_num: *class_num + pretrained: False + return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"] + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + key: logits + model_names: ["Student", "Teacher"] + - DistillationDMLLoss: + weight: 1.0 + key: logits + model_name_pairs: + - ["Student", "Teacher"] + - DistillationDistanceLoss: + weight: 1.0 + key: "blocks5" + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.4 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + Eval: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + diff --git a/ppcls/configs/ImageNet/Distillation/res2net200_vd_distill_pphgnet_base.yaml b/ppcls/configs/ImageNet/Distillation/res2net200_vd_distill_pphgnet_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7af9680cb2ec90c4c70cdf66d1c7f5a6225a9456 --- /dev/null +++ b/ppcls/configs/ImageNet/Distillation/res2net200_vd_distill_pphgnet_base.yaml @@ -0,0 +1,169 @@ +# global configs 
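+# Distills a frozen, SSLD-pretrained Res2Net200_vd_26w_4s teacher into a +# PPHGNet_base student; the only training loss is DistillationCELoss, i.e. +# the student is supervised with the teacher's soft predictions (see +# Loss.Train below).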
+Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 360 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: "./inference" + use_dali: false + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 1000 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + models: + - Teacher: + name: Res2Net200_vd_26w_4s + class_num: *class_num + pretrained: True + use_ssld: True + - Student: + name: PPHGNet_base + class_num: *class_num + pretrained: False + + infer_model_name: "Student" + + +# loss function config for traing/eval process +Loss: + Train: + - DistillationCELoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.5 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: "./dataset/ILSVRC2012/" + cls_label_path: "./dataset/ILSVRC2012/train_list.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + config_str: rand-m7-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: "./dataset/ILSVRC2012/" + cls_label_path: "./dataset/ILSVRC2012/val_list.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 236 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: "docs/images/inference_deployment/whl_demo.jpg" + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 236 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: DistillationPostProcess + func: Topk + topk: 5 + class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + Eval: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5e07692b01715ffa8196e5ded4604f9294d1ed07 --- /dev/null +++ b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml @@ -0,0 +1,164 @@ +# global configs +Global: + 
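# "checkpoints" resumes a full training state (weights + optimizer), while + # "pretrained_model" only initializes weights before training starts. + 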
checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 600 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: PPHGNet_base + class_num: 1000 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.5 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + config_str: rand-m15-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.4 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.4 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 236 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 16 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 236 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/PULC/car_exists/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/car_exists/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..911b8edec269b593fc500416a61fe044fb56ab0d --- /dev/null +++ b/ppcls/configs/PULC/car_exists/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,139 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 10 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and 
model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: MobileNetV3_small_x0_35 + class_num: 2 + pretrained: True + use_sync_bn: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.05 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/car_exists/objects365_00001507.jpeg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: no_car + label_1: contains_car + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TprAtFpr: + max_fpr: 0.01 + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..247f655b56946a9e81bf0b25fd827bdbde059735 --- /dev/null +++ b/ppcls/configs/PULC/car_exists/PPLCNet_x1_0.yaml @@ -0,0 +1,152 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 10 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 2 + pretrained: True + use_ssld: True + use_sync_bn: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.0125 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: 
./dataset/car_exists/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 192 + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0.5 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 192 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists + cls_label_path: ./dataset/car_exists/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/car_exists/objects365_00001507.jpeg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.9 + label_0: no_car + label_1: contains_car + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TprAtFpr: + max_fpr: 0.01 + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4c11802d6f1a5c1a5d0395bd7d32ce4f08ab26bc --- /dev/null +++ b/ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,169 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 1 + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 2 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + use_sync_bn: True + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/train_list_for_distill.txt + transform_ops: + - 
DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 192 + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 192 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.1 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/car_exists/objects365_00001507.jpeg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: no_car + label_1: contains_car + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 2] + Eval: + - TprAtFpr: + max_fpr: 0.01 + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/car_exists/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/car_exists/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c263f2309feb333b339f5e288b9b2ba30ac30c44 --- /dev/null +++ b/ppcls/configs/PULC/car_exists/PPLCNet_x1_0_search.yaml @@ -0,0 +1,152 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 10 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 2 + pretrained: True + use_ssld: True + use_sync_bn: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: 
pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/car_exists/objects365_00001507.jpeg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: no_car + label_1: contains_car + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TprAtFpr: + max_fpr: 0.01 + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/person/OtherModels/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/car_exists/SwinTransformer_tiny_patch4_window7_224.yaml similarity index 90% rename from ppcls/configs/PULC/person/OtherModels/SwinTransformer_tiny_patch4_window7_224.yaml rename to ppcls/configs/PULC/car_exists/SwinTransformer_tiny_patch4_window7_224.yaml index 0e2248e98529b511c7821b49ced6cf0625016553..a75fda4b48ff613dc681d9dd20bcc5d753de6c74 100644 --- a/ppcls/configs/PULC/person/OtherModels/SwinTransformer_tiny_patch4_window7_224.yaml +++ b/ppcls/configs/PULC/car_exists/SwinTransformer_tiny_patch4_window7_224.yaml @@ -62,8 +62,8 @@ DataLoader: Train: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/train_list.txt + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/train_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -111,8 +111,8 @@ DataLoader: Eval: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/val_list.txt + image_root: ./dataset/car_exists/ + cls_label_path: ./dataset/car_exists/val_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -136,7 +136,7 @@ DataLoader: use_shared_memory: True Infer: - infer_imgs: docs/images/inference_deployment/whl_demo.jpg + infer_imgs: deploy/images/PULC/car_exists/objects365_00001507.jpeg batch_size: 10 transforms: - DecodeImage: @@ -154,9 +154,9 @@ Infer: - ToCHWImage: PostProcess: name: ThreshOutput - threshold: 0.9 - label_0: nobody - label_1: someone + threshold: 0.5 + label_0: no_car + label_1: contains_car Metric: Train: @@ -164,5 +164,6 @@ Metric: topk: [1, 2] Eval: - TprAtFpr: + max_fpr: 0.01 - TopkAcc: topk: [1, 2] diff --git a/ppcls/configs/PULC/car_exists/search.yaml b/ppcls/configs/PULC/car_exists/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..820337c027248501a564a74937934de9e602734c --- /dev/null +++ b/ppcls/configs/PULC/car_exists/search.yaml @@ -0,0 +1,40 @@ +base_config_file: ppcls/configs/PULC/car_exists/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/car_exists/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_car_exists +search_times: 1 +search_dict: + - search_key: lrs + 
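# each search_key rewrites the dotted config paths listed under its + # replace_config with every candidate in search_values, launching one + # training run per candidate + 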
replace_config: + - Optimizer.lr.learning_rate + search_values: [0.0075, 0.01, 0.0125] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.RandCropImage.size + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.img_size + search_values: [176, 192, 224] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.prob + search_values: [0.0, 0.1, 0.5] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.5.RandomErasing.EPSILON + search_values: [0.0, 0.1, 0.5] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/language_classification/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/language_classification/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c3973ff421325f5b2151dfe3349d062aa2ed90c0 --- /dev/null +++ b/ppcls/configs/PULC/language_classification/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + start_eval_epoch: 20 + +# model architecture +Arch: + name: MobileNetV3_small_x0_35 + class_num: 10 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 1.3 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 
0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/language_classification_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..081d8d23f2be9598adf450cd048a2f6094d4477c --- /dev/null +++ b/ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml @@ -0,0 +1,143 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 80, 160] + save_inference_dir: ./inference + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 10 + pretrained: True + use_ssld: True + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + lr_mult_list : [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 1.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 1.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/language_classification/word_35404.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/language_classification_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..d792c573df5454a753feab7fb4d6b214a894b10f --- /dev/null +++ b/ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,164 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 10 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + use_sync_bn: True + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + lr_mult_list : [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/train_list_for_distill.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 1.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 1.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/language_classification/word_35404.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/language_classification_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/language_classification/PPLCNet_x1_0_search.yaml 
b/ppcls/configs/PULC/language_classification/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..49a5f17026e99441db8949b52e3b8d1942bc3139 --- /dev/null +++ b/ppcls/configs/PULC/language_classification/PPLCNet_x1_0_search.yaml @@ -0,0 +1,142 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 48, 192] + save_inference_dir: ./inference + start_eval_epoch: 20 + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 10 + pretrained: True + use_ssld: True + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.4 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 48] + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [192, 48] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 48] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 32 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/language_classification/word_35404.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 48] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/language_classification_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/language_classification/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/language_classification/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4e1a45a9e6eb08e39afbd74583b560db801306e7 --- /dev/null +++ b/ppcls/configs/PULC/language_classification/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,160 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + 
save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + +# model architecture +Arch: + name: SwinTransformer_tiny_patch4_window7_224 + class_num: 10 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: AdamW + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm + one_dim_param_no_weight_decay: True + lr: + name: Cosine + learning_rate: 5e-4 + eta_min: 1e-5 + warmup_epoch: 5 + warmup_start_lr: 1e-6 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.25 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.8 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/language_classification/ + cls_label_path: ./dataset/language_classification/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/language_classification/word_35404.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/language_classification_label_list.txt + +Metric: + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/language_classification/search.yaml b/ppcls/configs/PULC/language_classification/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a4b3dde564b1711e1e2c1d8c22b69d8e264adacd --- /dev/null +++ b/ppcls/configs/PULC/language_classification/search.yaml @@ -0,0 +1,40 @@ +base_config_file: ppcls/configs/PULC/language_classification/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: 
output/search_language_classification +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.2, 0.4, 0.8] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.ResizeImage.size + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.img_size + - DataLoader.Eval.dataset.transform_ops.1.ResizeImage.size + search_values: [[192, 48], [180, 60], [160, 80]] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.prob + search_values: [0.0, 0.5, 1.0] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.5.RandomErasing.EPSILON + search_values: [0.0, 0.5, 1.0] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list \ No newline at end of file diff --git a/ppcls/configs/PULC/person_attribute/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/person_attribute/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..94b443832cd2244ac900a85f78f6ab2ac05cb848 --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,135 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 256, 192] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "MobileNetV3_small_x0_35" + pretrained: True + class_num: 26 + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + #clip_norm: 10 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - Padv2: + size: [212, 276] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [192, 256] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + 
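+# Inference settings: same resize/normalize pipeline as Eval, then each of the 26 attribute logits is thresholded independently (glasses and hold get dedicated thresholds below).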
+Infer: + infer_imgs: deploy/images/PULC/person_attribute/090004.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: PersonAttribute + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b042ad757421f99572f6e2df3a7fb3cec4a7a510 --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml @@ -0,0 +1,149 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 256, 192] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 26 + + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - TimmAutoAugment: + prob: 0.8 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [192, 256] + - Padv2: + size: [212, 276] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [192, 256] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.4 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/person_attribute/090004.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: PersonAttribute + threshold: 0.5 #default threshold + 
glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml b/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bd6503488f4730599c98d2f5889b7bf87aa0ba7a --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml @@ -0,0 +1,172 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 1 + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 256, 192] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + use_multilabel: True + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 26 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + use_sync_bn: True + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + - DistillationMultiLabelLoss: + weight: 1.0 + weight_ratio: True + model_names: ["Student"] + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - TimmAutoAugment: + prob: 0.8 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [192, 256] + - Padv2: + size: [212, 276] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [192, 256] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.4 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/person_attribute/090004.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 
256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: PersonAttribute + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + +Metric: + Eval: + - ATTRMetric: diff --git a/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8f6b0d7fede587c09bd0a01286ec62590854d12b --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_search.yaml @@ -0,0 +1,149 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 256, 192] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 26 + + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [192, 256] + - Padv2: + size: [212, 276] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [192, 256] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k" + cls_label_path: "dataset/pa100k/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/person_attribute/090004.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: PersonAttribute + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + +Metric: + Eval: + - ATTRMetric: + + diff --git 
a/ppcls/configs/PULC/person_attribute/Res2Net200_vd_26w_4s.yaml b/ppcls/configs/PULC/person_attribute/Res2Net200_vd_26w_4s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4f7dc273c3d057a4505fa01f198b75411838f3e8 --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/Res2Net200_vd_26w_4s.yaml @@ -0,0 +1,134 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 256, 192] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "Res2Net200_vd_26w_4s" + pretrained: True + class_num: 26 + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - Padv2: + size: [212, 276] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [192, 256] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/person_attribute/090004.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 256] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: PersonAttribute + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/person_attribute/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/person_attribute/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..36c3d6aae19b70a56bf1aebe3989fa83f0fcc715 --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,135 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False 
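+  # PA100K attribute recognition is cast as multi-label classification: 26 independent binary predictions (see use_multilabel below)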
+ # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "SwinTransformer_tiny_patch4_window7_224" + pretrained: True + class_num: 26 + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + #clip_norm: 10 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + - Padv2: + size: [244, 244] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [224, 224] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" + cls_label_path: "dataset/pa100k/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/person_attribute/090004.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: PersonAttribute + threshold: 0.5 #default threshold + glasses_threshold: 0.3 #threshold only for glasses + hold_threshold: 0.6 #threshold only for hold + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/person_attribute/search.yaml b/ppcls/configs/PULC/person_attribute/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..78192d1132fb4c2cdf1261c86df020f4389ac77e --- /dev/null +++ b/ppcls/configs/PULC/person_attribute/search.yaml @@ -0,0 +1,41 @@ +base_config_file: ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_attr +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.0001, 0.005, 0.01, 0.02, 0.05] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.ResizeImage.size + - DataLoader.Train.dataset.transform_ops.4.RandomCropImage.size + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.img_size + search_values: [[192, 256]] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.prob + search_values: [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - search_key: re_probs + replace_config: + - 
DataLoader.Train.dataset.transform_ops.7.RandomErasing.EPSILON + search_values: [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/person/OtherModels/MobileNetV3_large_x1_0.yaml b/ppcls/configs/PULC/person_exists/MobileNetV3_small_x0_35.yaml similarity index 85% rename from ppcls/configs/PULC/person/OtherModels/MobileNetV3_large_x1_0.yaml rename to ppcls/configs/PULC/person_exists/MobileNetV3_small_x0_35.yaml index d69bb933fdbf5592d497651cad79995a492cdf28..9510ec258a678e513960a02fb83139e9312fca91 100644 --- a/ppcls/configs/PULC/person/OtherModels/MobileNetV3_large_x1_0.yaml +++ b/ppcls/configs/PULC/person_exists/MobileNetV3_small_x0_35.yaml @@ -18,16 +18,9 @@ Global: to_static: False use_dali: False -# mixed precision training -AMP: - scale_loss: 128.0 - use_dynamic_loss_scaling: True - # O1: mixed fp16 - level: O1 - # model architecture Arch: - name: MobileNetV3_large_x1_0 + name: MobileNetV3_small_x0_35 class_num: 2 pretrained: True use_sync_bn: True @@ -48,11 +41,11 @@ Optimizer: momentum: 0.9 lr: name: Cosine - learning_rate: 0.13 + learning_rate: 0.05 warmup_epoch: 5 regularizer: name: 'L2' - coeff: 0.00002 + coeff: 0.00001 # data loader for train and eval @@ -60,8 +53,8 @@ DataLoader: Train: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/train_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/train_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -88,8 +81,8 @@ DataLoader: Eval: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/val_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/val_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -113,7 +106,7 @@ DataLoader: use_shared_memory: True Infer: - infer_imgs: docs/images/inference_deployment/whl_demo.jpg + infer_imgs: deploy/images/PULC/person_exists/objects365_02035329.jpg batch_size: 10 transforms: - DecodeImage: @@ -131,7 +124,7 @@ Infer: - ToCHWImage: PostProcess: name: ThreshOutput - threshold: 0.9 + threshold: 0.5 label_0: nobody label_1: someone diff --git a/ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml similarity index 91% rename from ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml rename to ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml index e196547923a345a9535f5b63a568817b2784c6d7..93e9841d97209350521d3882b3288add5f748ffe 100644 --- a/ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml +++ b/ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml @@ -54,8 +54,8 @@ DataLoader: Train: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/train_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/train_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -94,8 +94,8 @@ DataLoader: Eval: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/val_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/val_list.txt transform_ops: - DecodeImage: to_rgb: True @@ 
-119,7 +119,7 @@ DataLoader: use_shared_memory: True Infer: - infer_imgs: docs/images/inference_deployment/whl_demo.jpg + infer_imgs: deploy/images/PULC/person_exists/objects365_02035329.jpg batch_size: 10 transforms: - DecodeImage: diff --git a/ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml similarity index 91% rename from ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml rename to ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml index afb9b43a0dfad4153bdc761a13c61a4d0e5fd47d..3d3aa325870d645449d465662234e9a6551c01bf 100644 --- a/ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml +++ b/ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml @@ -70,8 +70,8 @@ DataLoader: Train: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/train_list_for_distill.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/train_list_for_distill.txt transform_ops: - DecodeImage: to_rgb: True @@ -110,8 +110,8 @@ DataLoader: Eval: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/val_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/val_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -135,7 +135,7 @@ DataLoader: use_shared_memory: True Infer: - infer_imgs: docs/images/inference_deployment/whl_demo.jpg + infer_imgs: deploy/images/PULC/person_exists/objects365_02035329.jpg batch_size: 10 transforms: - DecodeImage: @@ -153,7 +153,7 @@ Infer: - ToCHWImage: PostProcess: name: ThreshOutput - threshold: 0.9 + threshold: 0.5 label_0: nobody label_1: someone diff --git a/ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml similarity index 90% rename from ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0_search.yaml rename to ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml index b2126b69f9d773d918df6b1f03361cac06ee44f8..86c25a05b47399cfe044cab30cea06e94bcb90ec 100644 --- a/ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0_search.yaml +++ b/ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml @@ -54,8 +54,8 @@ DataLoader: Train: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/train_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/train_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -94,8 +94,8 @@ DataLoader: Eval: dataset: name: ImageNetDataset - image_root: ./dataset/person/ - cls_label_path: ./dataset/person/val_list.txt + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/val_list.txt transform_ops: - DecodeImage: to_rgb: True @@ -119,7 +119,7 @@ DataLoader: use_shared_memory: True Infer: - infer_imgs: docs/images/inference_deployment/whl_demo.jpg + infer_imgs: deploy/images/PULC/person_exists/objects365_02035329.jpg batch_size: 10 transforms: - DecodeImage: @@ -137,7 +137,7 @@ Infer: - ToCHWImage: PostProcess: name: ThreshOutput - threshold: 0.9 + threshold: 0.5 label_0: nobody label_1: someone diff --git a/ppcls/configs/PULC/person_exists/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/person_exists/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..be10d67b78a23948a6c62cf379be16d297647d38 --- /dev/null +++ 
b/ppcls/configs/PULC/person_exists/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,168 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 10 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: SwinTransformer_tiny_patch4_window7_224 + class_num: 2 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + +Optimizer: + name: AdamW + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm + one_dim_param_no_weight_decay: True + lr: + name: Cosine + learning_rate: 5e-5 + eta_min: 1e-6 + warmup_epoch: 5 + warmup_start_lr: 1e-7 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.25 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.8 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/person_exists/ + cls_label_path: ./dataset/person_exists/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/person_exists/objects365_02035329.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: nobody + label_1: someone + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TprAtFpr: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/person_exists/search.yaml b/ppcls/configs/PULC/person_exists/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..820337c027248501a564a74937934de9e602734c --- /dev/null +++ b/ppcls/configs/PULC/person_exists/search.yaml @@ -0,0 
+1,40 @@ +base_config_file: ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/person_exists/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_person_cls +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.0075, 0.01, 0.0125] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.RandCropImage.size + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.img_size + search_values: [176, 192, 224] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.prob + search_values: [0.0, 0.1, 0.5] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.5.RandomErasing.EPSILON + search_values: [0.0, 0.1, 0.5] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/safety_helmet/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/safety_helmet/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..9ef4beb79db475869414ec3c6e7b9ade3e24b50a --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,134 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: MobileNetV3_small_x0_35 + pretrained: True + class_num: 2 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.08 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/safety_helmet/safety_helmet_test_1.png + batch_size: 1 + transforms: + - DecodeImage: + 
to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet + +Metric: + Train: + - TopkAcc: + topk: [1] + Eval: + - TprAtFpr: + max_fpr: 0.0001 + - TopkAcc: + topk: [1]
diff --git a/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4c3c8642d9464025dda4d628d328e34d3b8a1613 --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0.yaml @@ -0,0 +1,148 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 40 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference +
+# model architecture +Arch: + name: PPLCNet_x1_0 + pretrained: True + use_ssld: True + class_num: 2 + use_sync_bn: True +
+# loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + +
+Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.025 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + +
+# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 176 + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0.5 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 176 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.1 + r1: 0.3 + sh: 1.0/3.0 + sl: 0.02 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True +
+ Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True +
+Infer: + infer_imgs: deploy/images/PULC/safety_helmet/safety_helmet_test_1.png + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet +
+Metric: + Train: + - TopkAcc: + topk: [1] + Eval: + - TprAtFpr: + max_fpr: 0.0001 + - TopkAcc: + topk: [1]
diff --git a/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml
b/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..254db5df466cfb928561d295e401086cd5731f1b --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,185 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 1 + eval_interval: 1 + epochs: 40 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 2 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - False + - False + use_sync_bn: True + models: + - Teacher: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + return_stages: True + return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"] + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + return_stages: True + return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"] + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + key: logits + model_names: ["Student", "Teacher"] + - DistillationDMLLoss: + weight: 1.0 + key: logits + model_name_pairs: + - ["Student", "Teacher"] + - DistillationDistanceLoss: + weight: 1.0 + key: "blocks4" + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.015 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 192 + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0.5 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 192 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/safety_helmet/safety_helmet_test_1.png + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 
1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1] + Eval: + - TprAtFpr: + max_fpr: 0.0001 + - TopkAcc: + topk: [1] diff --git a/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..98f63a613278cf7eed879d60a65e942bdfb4c687 --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_search.yaml @@ -0,0 +1,148 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 40 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference +# model architecture +Arch: + name: PPLCNet_x1_0 + pretrained: True + use_ssld: True + class_num: 2 + use_sync_bn: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.10 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 192 + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + prob: 0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 192 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/safety_helmet/safety_helmet_test_1.png + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet + +Metric: + Train: + - TopkAcc: + topk: [1] + Eval: + - TprAtFpr: + max_fpr: 0.0001 + - TopkAcc: + topk: [1] diff --git a/ppcls/configs/PULC/safety_helmet/Res2Net200_vd_26w_4s.yaml b/ppcls/configs/PULC/safety_helmet/Res2Net200_vd_26w_4s.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..5b987d510c17df28c100728690f8a5d62293d36c --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/Res2Net200_vd_26w_4s.yaml @@ -0,0 +1,137 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: Res2Net200_vd_26w_4s + class_num: 2 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.005 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 32 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/safety_helmet/safety_helmet_test_1.png + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet + +Metric: + Train: + - TopkAcc: + topk: [1] + Eval: + - TprAtFpr: + max_fpr: 0.0001 + - TopkAcc: + topk: [1] + diff --git a/ppcls/configs/PULC/safety_helmet/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/safety_helmet/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5863ee17e4627cff71444710988e94ae76cd8025 --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,159 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: SwinTransformer_tiny_patch4_window7_224 + pretrained: True + class_num: 2 + +# loss function config for traing/eval process +Loss: + Train: 
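+  # label-smoothed CE (epsilon 0.1) for training; plain CE for evaluation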
+ - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: AdamW + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm + one_dim_param_no_weight_decay: True + lr: + name: Cosine + learning_rate: 1e-5 + eta_min: 1e-7 + warmup_epoch: 5 + warmup_start_lr: 1e-6 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.25 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.8 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/safety_helmet/ + cls_label_path: ./dataset/safety_helmet/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/safety_helmet/safety_helmet_test_1.png + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: ThreshOutput + threshold: 0.5 + label_0: wearing_helmet + label_1: unwearing_helmet + +Metric: + Eval: + - TprAtFpr: + max_fpr: 0.0001 + - TopkAcc: + topk: [1] diff --git a/ppcls/configs/PULC/safety_helmet/search.yaml b/ppcls/configs/PULC/safety_helmet/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e8c1c933dfef09f1d3a6189f4701a1e7d0678ab9 --- /dev/null +++ b/ppcls/configs/PULC/safety_helmet/search.yaml @@ -0,0 +1,36 @@ +base_config_file: ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/safety_helmet/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_safety_helmet +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.11, 0.12] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.RandCropImage.size + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.img_size + search_values: [176, 192, 224] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.prob + search_values: [0.0, 0.1, 0.5] + - search_key: 
+ - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.5.RandomErasing.EPSILON + search_values: [0.0, 0.1, 0.5] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + algorithm: "udml" +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/text_image_orientation/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/text_image_orientation/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7eaff97684db9661c62d782e9feb65e7f7ba42f7 --- /dev/null +++ b/ppcls/configs/PULC/text_image_orientation/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + start_eval_epoch: 40 + +# model architecture +Arch: + name: MobileNetV3_small_x0_35 + class_num: 4 + pretrained: True + +# loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 1.3 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 16 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c8ded908ebcd389224721102984c7c63cd22293f --- /dev/null +++ 
b/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml @@ -0,0 +1,143 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 4 + pretrained: True + use_ssld: True + + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.4 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b8fd0b10843ebbda530c198f65060c728a19dba1 --- /dev/null +++ b/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,164 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture 
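+# DistillationModel below pairs a ResNet101_vd teacher with a PPLCNet_x1_0 student: +# freeze_params_list freezes the teacher (True) while the student (False) is trained, +# and only the Student branch is exported for inference (infer_model_name).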
+Arch: + name: "DistillationModel" + class_num: &class_num 4 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + use_sync_bn: True + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.4 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/train_list_for_distill.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0ba7881569ae568c96540073ed57e1e9c5f5d6ca --- /dev/null +++ b/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_search.yaml @@ -0,0 +1,146 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + start_eval_epoch: 40 + + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 4 + 
pretrained: True + use_ssld: True + + + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.04 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00003 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/text_image_orientation/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/text_image_orientation/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4d123cd4bcbf21c553a6f06ac6d767e9f38b1471 --- /dev/null +++ b/ppcls/configs/PULC/text_image_orientation/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,157 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 60 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + +# model architecture +Arch: + name: SwinTransformer_tiny_patch4_window7_224 + class_num: 4 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + +Optimizer: + name: AdamW + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm + one_dim_param_no_weight_decay: True + lr: + name: 
Cosine + learning_rate: 2.5e-4 + eta_min: 1e-5 + warmup_epoch: 20 + warmup_start_lr: 1e-6 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - TimmAutoAugment: + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.25 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.8 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/text_image_orientation/ + cls_label_path: ./dataset/text_image_orientation/test_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/text_image_orientation/img_rot0_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + interpolation: bicubic + backend: pil + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 + class_id_map_file: ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt + +Metric: + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/text_image_orientation/search.yaml b/ppcls/configs/PULC/text_image_orientation/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d8e65f5f07922985c2dc2a9756e28fcd7f7a0c16 --- /dev/null +++ b/ppcls/configs/PULC/text_image_orientation/search.yaml @@ -0,0 +1,41 @@ +base_config_file: ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_text_image_orientation +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.1, 0.2, 0.4, 0.8] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.RandCropImage.size + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.img_size + search_values: [176, 192, 224] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.prob + search_values: [0.0, 0.1, 0.5] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.4.RandomErasing.EPSILON + search_values: [0.0, 0.1, 0.5] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.0, 0.3, 0.5, 0.8, 1.0] + - [0.0, 0.2, 
0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/textline_orientation/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/textline_orientation/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..040868378eb2cb80e66b72e6a1903c69a0833d7b --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,134 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 18 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: MobileNetV3_small_x0_35 + class_num: 2 + pretrained: True + use_sync_bn: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.13 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: ppcls/utils/PULC_label_list/textline_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3ab3657d8a93f7825fa2c79fe341db5dbfdfa123 --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml @@ -0,0 +1,143 @@ +# global configs +Global: + checkpoints: null 
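+ # checkpoints resumes a full training state; pretrained_model below only loads weights.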
+ pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 18 + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 80, 160] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 2 + pretrained: True + use_ssld: True + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - TimmAutoAugment: + prob: 1.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: ppcls/utils/PULC_label_list/textline_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_224x224.yaml b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_224x224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..17b9cbb158285bf6e451625f088a8f9b69705e6a --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_224x224.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 18 + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: PPLCNet_x1_0 + 
class_num: 2 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.04 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: ppcls/utils/PULC_label_list/textline_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2cc57e637153d7420d2f99bdece5bb0c8e5b0079 --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,162 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 18 + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 80, 160] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 2 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + use_sync_bn: True + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + pretrained: True + use_ssld: True + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - 
CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - TimmAutoAugment: + prob: 1.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [160, 80] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: ppcls/utils/PULC_label_list/textline_orientation_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e9e1863776522ca412168b8f11cef47f41bd3e63 --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_search.yaml @@ -0,0 +1,144 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + start_eval_epoch: 18 + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 48, 192] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 2 + pretrained: True + use_ssld: True + stride_list: [2, [2, 1], [2, 1], [2, 1], [2, 1]] + + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.5 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/train_list.txt + transform_ops: + - DecodeImage: 
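+ # DecodeImage yields an HWC uint8 array; to_rgb converts OpenCV's BGR output to RGB.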
+ to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 48] + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [192, 48] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 16 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 48] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [192, 48] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: ppcls/utils/PULC_label_list/textline_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/textline_orientation/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/textline_orientation/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a466d5e08d78eb8859ae0fa3c46e61f5c94d9509 --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,164 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 10 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: SwinTransformer_tiny_patch4_window7_224 + class_num: 2 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + +Optimizer: + name: AdamW + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm + one_dim_param_no_weight_decay: True + lr: + name: Cosine + learning_rate: 1e-4 + eta_min: 2e-6 + warmup_epoch: 5 + warmup_start_lr: 2e-7 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - TimmAutoAugment: + 
config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.25 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.8 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/textline_orientation/ + cls_label_path: ./dataset/textline_orientation/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/textline_orientation/textline_orientation_test_0_0.png + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: ppcls/utils/PULC_label_list/textline_orientation_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 2] + Eval: + - TopkAcc: + topk: [1, 2] diff --git a/ppcls/configs/PULC/textline_orientation/search.yaml b/ppcls/configs/PULC/textline_orientation/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4419949bc6fc8e266d64ae21bb9f1ed7015e65b3 --- /dev/null +++ b/ppcls/configs/PULC/textline_orientation/search.yaml @@ -0,0 +1,41 @@ +base_config_file: ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0.yaml +distill_config_file: ppcls/configs/PULC/textline_orientation/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_text +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.ResizeImage.size + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.img_size + - DataLoader.Eval.dataset.transform_ops.1.ResizeImage.size + search_values: [[192, 48], [180, 60], [160, 80]] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.prob + search_values: [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.4.RandomErasing.EPSILON + search_values: [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/traffic_sign/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/traffic_sign/MobileNetV3_small_x0_35.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..5ebe7441ed307bc0dad25be396db6fa9d849a55b --- /dev/null +++ b/ppcls/configs/PULC/traffic_sign/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 10 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: MobileNetV3_small_x0_35 + class_num: 232 + pretrained: True + +# loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_train.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_test.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/PULC_label_list/traffic_sign_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] + diff --git a/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5362d07b7b821f13dad7c1520a978a952d4cbad4 --- /dev/null +++ b/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml @@ -0,0 +1,148 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 0 + epochs: 10 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 232 + pretrained: True + use_ssld: True + +# loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + 
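+# SGD with momentum and a cosine decay schedule; warmup_epoch ramps the learning rate +# up to its 0.02 peak over the first 5 epochs (this config is typically launched with +# tools/train.py -c ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0.yaml).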
+Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.02 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_train.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - TimmAutoAugment: + prob: 0.5 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_test.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/traffic_sign/99603_17806.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/PULC_label_list/traffic_sign_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b00c250e1ec01b13b469dfbf8ed472bd2270af23 --- /dev/null +++ b/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,172 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 10 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 232 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + pretrained: False + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + + infer_model_name: "Student" + +# loss function config for traing/eval process +Loss: + Train: + - DistillationDMLLoss: + weight: 1.0 + model_name_pairs: + - ["Student", "Teacher"] + Eval: + - 
CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_train_for_distillation.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_test.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/PULC_label_list/traffic_sign_label_list.txt + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..27fbc4b862073723004f6bb6ad679dff8d78214a --- /dev/null +++ b/ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_search.yaml @@ -0,0 +1,148 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 0 + epochs: 10 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + + +# model architecture +Arch: + name: PPLCNet_x1_0 + class_num: 232 + pretrained: True + # use_ssld: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.00004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_train.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - 
RandCropImage: + size: 224 + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_test.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/traffic_sign/99603_17806.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/PULC_label_list/traffic_sign_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/PULC/traffic_sign/SwinTransformer_tiny_patch4_window7_224.yaml b/ppcls/configs/PULC/traffic_sign/SwinTransformer_tiny_patch4_window7_224.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ae86ae6220b58b9535b60004ce3140ab29380621 --- /dev/null +++ b/ppcls/configs/PULC/traffic_sign/SwinTransformer_tiny_patch4_window7_224.yaml @@ -0,0 +1,170 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + start_eval_epoch: 0 + epochs: 10 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + # training model under @to_static + to_static: False + use_dali: False + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: SwinTransformer_tiny_patch4_window7_224 + class_num: 232 + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + +Optimizer: + name: AdamW + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm + one_dim_param_no_weight_decay: True + lr: + name: Cosine + learning_rate: 2e-4 + eta_min: 2e-6 + warmup_epoch: 5 + warmup_start_lr: 2e-7 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_train.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bicubic + backend: pil + - RandFlipImage: + flip_code: 1 + - TimmAutoAugment: + config_str: 
rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.25 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + batch_transform_ops: + - OpSampler: + MixupOperator: + alpha: 0.8 + prob: 0.5 + CutmixOperator: + alpha: 1.0 + prob: 0.5 + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/traffic_sign/label_list_test.txt + delimiter: "\t" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/PULC_label_list/traffic_sign_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] + + diff --git a/ppcls/configs/PULC/traffic_sign/search.yaml b/ppcls/configs/PULC/traffic_sign/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..029d042dff669be9a4c3751d65e9da70e8c47a73 --- /dev/null +++ b/ppcls/configs/PULC/traffic_sign/search.yaml @@ -0,0 +1,41 @@ +base_config_file: ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/traffic_sign/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_traffic_sign +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.0075, 0.01, 0.0125] + - search_key: resolutions + replace_config: + - DataLoader.Train.dataset.transform_ops.1.RandCropImage.size + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.img_size + search_values: [176, 192, 224] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.prob + search_values: [0.0, 0.1, 0.5] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.4.RandomErasing.EPSILON + search_values: [0.0, 0.1, 0.5] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + algorithm: "skl-ugi" + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list + diff --git a/ppcls/configs/PULC/vehicle_attribute/MobileNetV3_small_x0_35.yaml b/ppcls/configs/PULC/vehicle_attribute/MobileNetV3_small_x0_35.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a35bc61145a4ed97c1f02ba3b4c587b8686aaa14 --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/MobileNetV3_small_x0_35.yaml @@ -0,0 +1,115 @@ +# 
global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 5 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 20 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 192, 256] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "MobileNetV3_small_x0_35" + pretrained: True + class_num: 19 + infer_add_softmax: False + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - Padv2: + size: [276, 212] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [256, 192] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/test_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a3369a9eef9c8bc54ee5b26582f0a6c4ede789fd --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml @@ -0,0 +1,149 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 5 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 20 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 192, 256] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + class_num: 19 + use_ssld: True + lr_mult_list: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] + infer_add_softmax: False + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.0125 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/train_list.txt" + label_ratio: True + 
transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [256, 192] + - Padv2: + size: [276, 212] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [256, 192] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/test_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + +Infer: + infer_imgs: ./deploy/images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: VehicleAttribute + color_threshold: 0.5 + type_threshold: 0.5 + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml b/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c5144f3730700d46cb7e5597e76fc5c10ec38b63 --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_distillation.yaml @@ -0,0 +1,150 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 5 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 20 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 192, 256] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "DistillationModel" + class_num: &class_num 19 + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + use_ssld: True + models: + - Teacher: + name: ResNet101_vd + class_num: *class_num + - Student: + name: PPLCNet_x1_0 + class_num: *class_num + pretrained: True + use_ssld: True + +# loss function config for traing/eval process +Loss: + Train: + - DistillationMultiLabelLoss: + weight: 1.0 + model_names: ["Student"] + weight_ratio: True + size_sum: True + - DistillationDMLLoss: + weight: 1.0 + weight_ratio: True + sum_across_class_dim: False + model_name_pairs: + - ["Student", "Teacher"] + + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + 
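+      # same VeRi multi-label attribute data as PPLCNet_x1_0.yaml; the frozen
+      # ResNet101_vd teacher only supplies soft targets through
+      # DistillationDMLLoss, while DistillationMultiLabelLoss trains the Student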
cls_label_path: "dataset/VeRi/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [256, 192] + - Padv2: + size: [276, 212] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [256, 192] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/test_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_search.yaml b/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5f84c2a6552c2a688c2311a1af2e695f047d4402 --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0_search.yaml @@ -0,0 +1,129 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 5 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 20 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 192, 256] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 19 + infer_add_softmax: False + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - TimmAutoAugment: + prob: 0.0 + config_str: rand-m9-mstd0.5-inc1 + interpolation: bicubic + img_size: [256, 192] + - Padv2: + size: [276, 212] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [256, 192] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.0 + sl: 0.02 + sh: 1.0/3.0 + r1: 0.3 + attempt: 10 + use_log_aspect: True + mode: pixel + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + 
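+      # eval settings stay fixed across search trials; search.yaml only sweeps
+      # Optimizer, Train-pipeline and Arch keys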
image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/test_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/vehicle_attribute/Res2Net200_vd_26w_4s.yaml b/ppcls/configs/PULC/vehicle_attribute/Res2Net200_vd_26w_4s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c6618f960571a6161ca939210c9b21df6d1d847c --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/Res2Net200_vd_26w_4s.yaml @@ -0,0 +1,122 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/mo" + device: "gpu" + save_interval: 5 + eval_during_train: True + eval_interval: 1 + epochs: 30 + print_batch_step: 20 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 192, 256] + save_inference_dir: "./inference" + use_multilabel: True + +# mixed precision training +AMP: + scale_loss: 128.0 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: "Res2Net200_vd_26w_4s" + pretrained: True + class_num: 19 + infer_add_softmax: False + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - Padv2: + size: [276, 212] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [256, 192] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/test_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/vehicle_attribute/ResNet50.yaml b/ppcls/configs/PULC/vehicle_attribute/ResNet50.yaml new file mode 100644 index 0000000000000000000000000000000000000000..9218769c634949f9df44580aeb8e65df19805b9d --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/ResNet50.yaml @@ -0,0 +1,116 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 5 + eval_during_train: True 
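+  # with eval_during_train enabled, the ATTRMetric below is reported every
+  # eval_interval epochs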
+ eval_interval: 1 + epochs: 30 + print_batch_step: 20 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 192, 256] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "ResNet50" + pretrained: True + class_num: 19 + infer_add_softmax: False + +# loss function config for traing/eval process +Loss: + Train: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - Padv2: + size: [276, 212] + pad_mode: 1 + fill_value: 0 + - RandomCropImage: + size: [256, 192] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/VeRi/" + cls_label_path: "dataset/VeRi/test_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [256, 192] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 8 + use_shared_memory: True + + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/PULC/vehicle_attribute/search.yaml b/ppcls/configs/PULC/vehicle_attribute/search.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2a16266bf3a3592be8ccb169dee837024d0b1b06 --- /dev/null +++ b/ppcls/configs/PULC/vehicle_attribute/search.yaml @@ -0,0 +1,35 @@ +base_config_file: ppcls/configs/PULC/vehicle_attr/PPLCNet_x1_0_search.yaml +distill_config_file: ppcls/configs/PULC/vehicle_attr/PPLCNet_x1_0_distillation.yaml + +gpus: 0,1,2,3 +output_dir: output/search_vehicle_attr +search_times: 1 +search_dict: + - search_key: lrs + replace_config: + - Optimizer.lr.learning_rate + search_values: [0.0075, 0.01, 0.0125] + - search_key: ra_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.2.TimmAutoAugment.prob + search_values: [0.0, 0.1, 0.5] + - search_key: re_probs + replace_config: + - DataLoader.Train.dataset.transform_ops.7.RandomErasing.EPSILON + search_values: [0.0, 0.1, 0.5] + - search_key: lr_mult_list + replace_config: + - Arch.lr_mult_list + search_values: + - [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] + - [0.0, 0.4, 0.4, 0.8, 0.8, 1.0] + - [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] +teacher: + algorithm: "skl-ugi" + rm_keys: + - Arch.lr_mult_list + search_values: + - ResNet101_vd + - ResNet50_vd +final_replace: + Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list diff --git a/ppcls/configs/metric_learning/adaface_ir18.yaml b/ppcls/configs/metric_learning/adaface_ir18.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2cbfe5da43763701b244b2422bf9ad82b19ef4d6 --- /dev/null +++ b/ppcls/configs/metric_learning/adaface_ir18.yaml 
@@ -0,0 +1,105 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 26 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 112, 112] + save_inference_dir: "./inference" + eval_mode: "adaface" + +# model architecture +Arch: + name: "RecModel" + infer_output_key: "features" + infer_add_softmax: False + Backbone: + name: "AdaFace_IR_18" + input_size: [112, 112] + Head: + name: "AdaMargin" + embedding_size: 512 + class_num: 70722 + m: 0.4 + s: 64 + h: 0.333 + t_alpha: 0.01 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [12, 20, 24] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: "AdaFaceDataset" + root_dir: "dataset/face/" + label_path: "dataset/face/train_filter_label.txt" + transform: + - CropWithPadding: + prob: 0.2 + padding_num: 0 + size: [112, 112] + scale: [0.2, 1.0] + ratio: [0.75, 1.3333333333333333] + - RandomInterpolationAugment: + prob: 0.2 + - ColorJitter: + prob: 0.2 + brightness: 0.5 + contrast: 0.5 + saturation: 0.5 + hue: 0 + - RandomHorizontalFlip: + - ToTensor: + - Normalize: + mean: [0.5, 0.5, 0.5] + std: [0.5, 0.5, 0.5] + sampler: + name: DistributedBatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 6 + use_shared_memory: True + + Eval: + dataset: + name: FiveValidationDataset + val_data_path: dataset/face/faces_emore + concat_mem_file_name: dataset/face/faces_emore/concat_validation_memfile + sampler: + name: BatchSampler + batch_size: 256 + drop_last: False + shuffle: True + loader: + num_workers: 6 + use_shared_memory: True +Metric: + Train: + - TopkAcc: + topk: [1, 5] \ No newline at end of file diff --git a/ppcls/configs/practical_models/.gitkeep b/ppcls/configs/practical_models/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..8b137891791fe96927ad78e64b0aad7bded08bdc --- /dev/null +++ b/ppcls/configs/practical_models/.gitkeep @@ -0,0 +1 @@ + diff --git a/ppcls/configs/reid/strong_baseline/baseline.yaml b/ppcls/configs/reid/strong_baseline/baseline.yaml index 35980206b19bab76f46df54e143adaecc1f4b566..be9d9b5c8a04e4cb95e054ebccc3e029aa826cf1 100644 --- a/ppcls/configs/reid/strong_baseline/baseline.yaml +++ b/ppcls/configs/reid/strong_baseline/baseline.yaml @@ -24,7 +24,7 @@ Arch: infer_add_softmax: False Backbone: name: "ResNet50" - pretrained: True + pretrained: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/others/resnet50-19c8e357_torch2paddle.pdparams stem_act: null BackboneStopLayer: name: "flatten" diff --git a/ppcls/configs/reid/strong_baseline/softmax_triplet.yaml b/ppcls/configs/reid/strong_baseline/softmax_triplet.yaml index 6f9cd955626316fe5267e3f9289b93b4317f736f..9694373b045c04eadc0dda7a6b69726966102182 100644 --- a/ppcls/configs/reid/strong_baseline/softmax_triplet.yaml +++ b/ppcls/configs/reid/strong_baseline/softmax_triplet.yaml @@ -24,7 +24,7 @@ Arch: infer_add_softmax: False Backbone: name: "ResNet50_last_stage_stride1" - pretrained: True + pretrained: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/others/resnet50-19c8e357_torch2paddle.pdparams stem_act: null BackboneStopLayer: name: "flatten" diff --git 
a/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml b/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml index 22af5e516ca4b9945bc8413ed56c67c972b48609..b225ebd86ae28e6769f6ec631e527ee46e781f9e 100644 --- a/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml +++ b/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml @@ -24,7 +24,7 @@ Arch: infer_add_softmax: False Backbone: name: "ResNet50_last_stage_stride1" - pretrained: True + pretrained: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/others/resnet50-19c8e357_torch2paddle.pdparams stem_act: null BackboneStopLayer: name: "flatten" diff --git a/ppcls/data/__init__.py b/ppcls/data/__init__.py index 9fc4d760be545ffa93652c80d285e17ad0c8ae57..80cf3bc9af826e935fe0fe6ccf8cad8d6924d370 100644 --- a/ppcls/data/__init__.py +++ b/ppcls/data/__init__.py @@ -30,6 +30,7 @@ from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset from ppcls.data.dataloader.mix_dataset import MixDataset from ppcls.data.dataloader.multi_scale_dataset import MultiScaleDataset from ppcls.data.dataloader.person_dataset import Market1501, MSMT17 +from ppcls.data.dataloader.face_dataset import FiveValidationDataset, AdaFaceDataset # sampler @@ -88,7 +89,7 @@ def build_dataloader(config, mode, device, use_dali=False, seed=None): # build sampler config_sampler = config[mode]['sampler'] - if "name" not in config_sampler: + if config_sampler and "name" not in config_sampler: batch_sampler = None batch_size = config_sampler["batch_size"] drop_last = config_sampler["drop_last"] diff --git a/ppcls/data/dataloader/__init__.py b/ppcls/data/dataloader/__init__.py index 2b1d92b76bd202e36086f21a3a092c3673277690..796f4b458410e5b4b8540b72dd663711c4ad9f46 100644 --- a/ppcls/data/dataloader/__init__.py +++ b/ppcls/data/dataloader/__init__.py @@ -10,3 +10,4 @@ from ppcls.data.dataloader.mix_sampler import MixSampler from ppcls.data.dataloader.multi_scale_sampler import MultiScaleSampler from ppcls.data.dataloader.pk_sampler import PKSampler from ppcls.data.dataloader.person_dataset import Market1501, MSMT17 +from ppcls.data.dataloader.face_dataset import AdaFaceDataset, FiveValidationDataset diff --git a/ppcls/data/dataloader/dali.py b/ppcls/data/dataloader/dali.py index a340a946c921bedd475531eb3bd9172f49a99e1e..faef45e26b3dee2e17464a502f42f9886eac6518 100644 --- a/ppcls/data/dataloader/dali.py +++ b/ppcls/data/dataloader/dali.py @@ -23,7 +23,6 @@ import nvidia.dali.types as types import paddle from nvidia.dali import fn from nvidia.dali.pipeline import Pipeline -from nvidia.dali.plugin.base_iterator import LastBatchPolicy from nvidia.dali.plugin.paddle import DALIGenericIterator diff --git a/ppcls/data/dataloader/face_dataset.py b/ppcls/data/dataloader/face_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..a32cc2c5f89aa8c8e4904e7decc6ec5fb996aab3 --- /dev/null +++ b/ppcls/data/dataloader/face_dataset.py @@ -0,0 +1,163 @@ +import os +import json +import numpy as np +from PIL import Image +import cv2 +import paddle +import paddle.vision.datasets as datasets +from paddle.vision import transforms +from paddle.vision.transforms import functional as F +from paddle.io import Dataset +from .common_dataset import create_operators +from ppcls.data.preprocess import transform as transform_func + +# code is based on AdaFace: https://github.com/mk-minchul/AdaFace + + +class AdaFaceDataset(Dataset): + def __init__(self, root_dir, label_path, transform=None): + self.root_dir = root_dir + 
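+        # build the preprocessing pipeline from the config-driven operator list
+        # (e.g. CropWithPadding, RandomInterpolationAugment, ColorJitter, ToTensor)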
self.transform = create_operators(transform) + + with open(label_path) as fd: + lines = fd.readlines() + self.samples = [] + for l in lines: + l = l.strip().split() + self.samples.append([os.path.join(root_dir, l[0]), int(l[1])]) + + def __len__(self): + return len(self.samples) + + def __getitem__(self, index): + """ + Args: + index (int): Index + + Returns: + tuple: (sample, target) where target is class_index of the target class. + """ + [path, target] = self.samples[index] + with open(path, 'rb') as f: + img = Image.open(f) + sample = img.convert('RGB') + + # if 'WebFace' in self.root: + # # swap rgb to bgr since image is in rgb for webface + # sample = Image.fromarray(np.asarray(sample)[:, :, ::-1] + if self.transform is not None: + sample = transform_func(sample, self.transform) + return sample, target + + +class FiveValidationDataset(Dataset): + def __init__(self, val_data_path, concat_mem_file_name): + ''' + concatenates all validation datasets from emore + val_data_dict = { + 'agedb_30': (agedb_30, agedb_30_issame), + "cfp_fp": (cfp_fp, cfp_fp_issame), + "lfw": (lfw, lfw_issame), + "cplfw": (cplfw, cplfw_issame), + "calfw": (calfw, calfw_issame), + } + agedb_30: 0 + cfp_fp: 1 + lfw: 2 + cplfw: 3 + calfw: 4 + ''' + val_data = get_val_data(val_data_path) + age_30, cfp_fp, lfw, age_30_issame, cfp_fp_issame, lfw_issame, cplfw, cplfw_issame, calfw, calfw_issame = val_data + val_data_dict = { + 'agedb_30': (age_30, age_30_issame), + "cfp_fp": (cfp_fp, cfp_fp_issame), + "lfw": (lfw, lfw_issame), + "cplfw": (cplfw, cplfw_issame), + "calfw": (calfw, calfw_issame), + } + self.dataname_to_idx = { + "agedb_30": 0, + "cfp_fp": 1, + "lfw": 2, + "cplfw": 3, + "calfw": 4 + } + + self.val_data_dict = val_data_dict + # concat all dataset + all_imgs = [] + all_issame = [] + all_dataname = [] + key_orders = [] + for key, (imgs, issame) in val_data_dict.items(): + all_imgs.append(imgs) + dup_issame = [ + ] # hacky way to make the issame length same as imgs. [1, 1, 0, 0, ...] 
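+            # each verification pair contributes two consecutive images, so the
+            # per-pair issame flag is duplicated to stay aligned with imgs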
+            for same in issame:
+                dup_issame.append(same)
+                dup_issame.append(same)
+            all_issame.append(dup_issame)
+            all_dataname.append([self.dataname_to_idx[key]] * len(imgs))
+            key_orders.append(key)
+        assert key_orders == ['agedb_30', 'cfp_fp', 'lfw', 'cplfw', 'calfw']
+
+        if isinstance(all_imgs[0], np.memmap):
+            self.all_imgs = read_memmap(concat_mem_file_name)
+        else:
+            self.all_imgs = np.concatenate(all_imgs)
+
+        self.all_issame = np.concatenate(all_issame)
+        self.all_dataname = np.concatenate(all_dataname)
+
+    def __getitem__(self, index):
+        x_np = self.all_imgs[index].copy()
+        x = paddle.to_tensor(x_np)
+        y = self.all_issame[index]
+        dataname = self.all_dataname[index]
+        return x, y, dataname, index
+
+    def __len__(self):
+        return len(self.all_imgs)
+
+
+def read_memmap(mem_file_name):
+    # r+ mode: Open existing file for reading and writing
+    with open(mem_file_name + '.conf', 'r') as file:
+        memmap_configs = json.load(file)
+        return np.memmap(mem_file_name, mode='r+', \
+                         shape=tuple(memmap_configs['shape']), \
+                         dtype=memmap_configs['dtype'])
+
+
+def get_val_pair(path, name, use_memfile=True):
+    # installing bcolz may require a proxy to access the internet
+    import bcolz
+    if use_memfile:
+        mem_file_dir = os.path.join(path, name, 'memfile')
+        mem_file_name = os.path.join(mem_file_dir, 'mem_file.dat')
+        if os.path.isdir(mem_file_dir):
+            print('loading validation data memfile')
+            np_array = read_memmap(mem_file_name)
+        else:
+            os.makedirs(mem_file_dir)
+            carray = bcolz.carray(rootdir=os.path.join(path, name), mode='r')
+            np_array = np.array(carray)
+            mem_array = np.memmap(mem_file_name, dtype=np_array.dtype, mode='w+', shape=np_array.shape)
+            mem_array[:] = np_array  # persist the memfile; stands in for the missing make_memmap helper
+            with open(mem_file_name + '.conf', 'w') as f:  # sidecar that read_memmap expects
+                json.dump({'shape': list(np_array.shape), 'dtype': str(np_array.dtype)}, f)
+    else:
+        np_array = bcolz.carray(rootdir=os.path.join(path, name), mode='r')
+
+    issame = np.load(os.path.join(path, '{}_list.npy'.format(name)))
+    return np_array, issame
+
+
+def get_val_data(data_path):
+    agedb_30, agedb_30_issame = get_val_pair(data_path, 'agedb_30')
+    cfp_fp, cfp_fp_issame = get_val_pair(data_path, 'cfp_fp')
+    lfw, lfw_issame = get_val_pair(data_path, 'lfw')
+    cplfw, cplfw_issame = get_val_pair(data_path, 'cplfw')
+    calfw, calfw_issame = get_val_pair(data_path, 'calfw')
+    return agedb_30, cfp_fp, lfw, agedb_30_issame, cfp_fp_issame, lfw_issame, cplfw, cplfw_issame, calfw, calfw_issame
diff --git a/ppcls/data/postprocess/__init__.py b/ppcls/data/postprocess/__init__.py
index 54678dc443ebab5bf55d54d9284d328bbc4523b3..6b8b7730bf6ac224cffb9f91ff88f230a14b45bf 100644
--- a/ppcls/data/postprocess/__init__.py
+++ b/ppcls/data/postprocess/__init__.py
@@ -18,6 +18,7 @@ from . import topk, threshoutput
 
 from .topk import Topk, MultiLabelTopk
 from .threshoutput import ThreshOutput
+from .attr_rec import VehicleAttribute, PersonAttribute
 
 
 def build_postprocess(config):
diff --git a/ppcls/data/postprocess/attr_rec.py b/ppcls/data/postprocess/attr_rec.py
new file mode 100644
index 0000000000000000000000000000000000000000..a8d492501833ac4ccd83d3aea108e7e34c46cadf
--- /dev/null
+++ b/ppcls/data/postprocess/attr_rec.py
@@ -0,0 +1,173 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+
+
+class VehicleAttribute(object):
+    def __init__(self, color_threshold=0.5, type_threshold=0.5):
+        self.color_threshold = color_threshold
+        self.type_threshold = type_threshold
+        self.color_list = [
+            "yellow", "orange", "green", "gray", "red", "blue", "white",
+            "golden", "brown", "black"
+        ]
+        self.type_list = [
+            "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
+            "truck", "estate"
+        ]
+
+    def __call__(self, x, file_names=None):
+        if isinstance(x, dict):
+            x = x['logits']
+        assert isinstance(x, paddle.Tensor)
+        if file_names is not None:
+            assert x.shape[0] == len(file_names)
+        x = F.sigmoid(x).numpy()
+
+        # postprocess output of predictor
+        batch_res = []
+        for idx, res in enumerate(x):
+            res = res.tolist()
+            label_res = []
+            color_idx = np.argmax(res[:10])
+            type_idx = np.argmax(res[10:])
+            # res[:10] are color probabilities, res[10:] are type probabilities
+            if res[color_idx] >= self.color_threshold:
+                color_info = f"Color: ({self.color_list[color_idx]}, prob: {res[color_idx]})"
+            else:
+                color_info = "Color unknown"
+
+            if res[type_idx + 10] >= self.type_threshold:
+                type_info = f"Type: ({self.type_list[type_idx]}, prob: {res[type_idx + 10]})"
+            else:
+                type_info = "Type unknown"
+
+            label_res = f"{color_info}, {type_info}"
+
+            threshold_list = [self.color_threshold
+                              ] * 10 + [self.type_threshold] * 9
+            pred_res = (np.array(res) > np.array(threshold_list)
+                        ).astype(np.int8).tolist()
+            batch_res.append({
+                "attr": label_res,
+                "pred": pred_res,
+                "file_name": file_names[idx]
+            })
+        return batch_res
+
+
+class PersonAttribute(object):
+    def __init__(self,
+                 threshold=0.5,
+                 glasses_threshold=0.3,
+                 hold_threshold=0.6):
+        self.threshold = threshold
+        self.glasses_threshold = glasses_threshold
+        self.hold_threshold = hold_threshold
+
+    def __call__(self, x, file_names=None):
+        if isinstance(x, dict):
+            x = x['logits']
+        assert isinstance(x, paddle.Tensor)
+        if file_names is not None:
+            assert x.shape[0] == len(file_names)
+        x = F.sigmoid(x).numpy()
+
+        # postprocess output of predictor
+        age_list = ['AgeLess18', 'Age18-60', 'AgeOver60']
+        direct_list = ['Front', 'Side', 'Back']
+        bag_list = ['HandBag', 'ShoulderBag', 'Backpack']
+        upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice']
+        lower_list = [
+            'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts',
+            'Skirt&Dress'
+        ]
+        batch_res = []
+        for idx, res in enumerate(x):
+            res = res.tolist()
+            label_res = []
+            # gender
+            gender = 'Female' if res[22] > self.threshold else 'Male'
+            label_res.append(gender)
+            # age
+            age = age_list[np.argmax(res[19:22])]
+            label_res.append(age)
+            # direction
+            direction = direct_list[np.argmax(res[23:])]
+            label_res.append(direction)
+            # glasses
+            glasses = 'Glasses: '
+            if res[1] > self.glasses_threshold:
+                glasses += 'True'
+            else:
+                glasses += 'False'
+            label_res.append(glasses)
+            # hat
+            hat = 'Hat: '
+            if res[0] > self.threshold:
+                hat += 'True'
+            else:
+                hat += 'False'
+            label_res.append(hat)
+            # hold obj
+            hold_obj = 'HoldObjectsInFront: '
+            if res[18] > self.hold_threshold:
+                hold_obj += 'True'
+            else:
hold_obj += 'False' + label_res.append(hold_obj) + # bag + bag = bag_list[np.argmax(res[15:18])] + bag_score = res[15 + np.argmax(res[15:18])] + bag_label = bag if bag_score > self.threshold else 'No bag' + label_res.append(bag_label) + # upper + upper_res = res[4:8] + upper_label = 'Upper:' + sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve' + upper_label += ' {}'.format(sleeve) + for i, r in enumerate(upper_res): + if r > self.threshold: + upper_label += ' {}'.format(upper_list[i]) + label_res.append(upper_label) + # lower + lower_res = res[8:14] + lower_label = 'Lower: ' + has_lower = False + for i, l in enumerate(lower_res): + if l > self.threshold: + lower_label += ' {}'.format(lower_list[i]) + has_lower = True + if not has_lower: + lower_label += ' {}'.format(lower_list[np.argmax(lower_res)]) + + label_res.append(lower_label) + # shoe + shoe = 'Boots' if res[14] > self.threshold else 'No boots' + label_res.append(shoe) + + threshold_list = [0.5] * len(res) + threshold_list[1] = self.glasses_threshold + threshold_list[18] = self.hold_threshold + pred_res = (np.array(res) > np.array(threshold_list) + ).astype(np.int8).tolist() + + batch_res.append({"attributes": label_res, "output": pred_res}) + return batch_res + diff --git a/ppcls/data/postprocess/topk.py b/ppcls/data/postprocess/topk.py index df02719471300ea8e2b7c1db286d104adabe116f..76772f568eef157c4bb5e3485ea9ec5bc41f9d20 100644 --- a/ppcls/data/postprocess/topk.py +++ b/ppcls/data/postprocess/topk.py @@ -21,9 +21,9 @@ import paddle.nn.functional as F class Topk(object): def __init__(self, topk=1, class_id_map_file=None, delimiter=None): assert isinstance(topk, (int, )) - self.class_id_map = self.parse_class_id_map(class_id_map_file) self.topk = topk self.delimiter = delimiter if delimiter is not None else " " + self.class_id_map = self.parse_class_id_map(class_id_map_file) def parse_class_id_map(self, class_id_map_file): if class_id_map_file is None: diff --git a/ppcls/data/preprocess/__init__.py b/ppcls/data/preprocess/__init__.py index 6822e2081f26ff033239b31edf8d5bdeffe85ce0..d0cfcf2409d2d890adcf03ef0e03b2475625ead8 100644 --- a/ppcls/data/preprocess/__init__.py +++ b/ppcls/data/preprocess/__init__.py @@ -33,6 +33,10 @@ from ppcls.data.preprocess.ops.operators import AugMix from ppcls.data.preprocess.ops.operators import Pad from ppcls.data.preprocess.ops.operators import ToTensor from ppcls.data.preprocess.ops.operators import Normalize +from ppcls.data.preprocess.ops.operators import RandomHorizontalFlip +from ppcls.data.preprocess.ops.operators import CropWithPadding +from ppcls.data.preprocess.ops.operators import RandomInterpolationAugment +from ppcls.data.preprocess.ops.operators import ColorJitter from ppcls.data.preprocess.ops.operators import RandomCropImage from ppcls.data.preprocess.ops.operators import Padv2 diff --git a/ppcls/data/preprocess/ops/operators.py b/ppcls/data/preprocess/ops/operators.py index e5732d3925a6ea452c028c057b56bf9b335aee90..344675fdb85d6102bb99f03af4a17c8b9c00927e 100644 --- a/ppcls/data/preprocess/ops/operators.py +++ b/ppcls/data/preprocess/ops/operators.py @@ -25,8 +25,8 @@ import cv2 import numpy as np from PIL import Image, ImageOps, __version__ as PILLOW_VERSION from paddle.vision.transforms import ColorJitter as RawColorJitter -from paddle.vision.transforms import ToTensor, Normalize - +from paddle.vision.transforms import ToTensor, Normalize, RandomHorizontalFlip, RandomResizedCrop +from paddle.vision.transforms import functional as F from .autoaugment import ImageNetPolicy 
 from .functional import augmentations
 from ppcls.utils import logger
 
@@ -93,6 +93,42 @@ class UnifiedResize(object):
         return self.resize_func(src, size)
 
 
+class RandomInterpolationAugment(object):
+    def __init__(self, prob):
+        self.prob = prob
+
+    def _aug(self, img):
+        img_shape = img.shape
+        side_ratio = np.random.uniform(0.2, 1.0)
+        small_side = int(side_ratio * img_shape[0])
+        interpolation = np.random.choice([
+            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA,
+            cv2.INTER_CUBIC, cv2.INTER_LANCZOS4
+        ])
+        small_img = cv2.resize(
+            img, (small_side, small_side), interpolation=interpolation)
+        interpolation = np.random.choice([
+            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA,
+            cv2.INTER_CUBIC, cv2.INTER_LANCZOS4
+        ])
+        aug_img = cv2.resize(
+            small_img, (img_shape[1], img_shape[0]),
+            interpolation=interpolation)
+        return aug_img
+
+    def __call__(self, img):
+        if np.random.random() < self.prob:
+            if isinstance(img, np.ndarray):
+                return self._aug(img)
+            else:
+                pil_img = np.array(img)
+                aug_img = self._aug(pil_img)
+                img = Image.fromarray(aug_img.astype(np.uint8))
+            return img
+        else:
+            return img
+
+
 class OperatorParamError(ValueError):
     """ OperatorParamError
     """
@@ -170,6 +206,52 @@ class ResizeImage(object):
         return self._resize_func(img, (w, h))
 
 
+class CropWithPadding(RandomResizedCrop):
+    """
+    crop image and padding to original size
+    """
+
+    def __init__(self,
+                 prob=1,
+                 padding_num=0,
+                 size=224,
+                 scale=(0.08, 1.0),
+                 ratio=(3. / 4, 4. / 3),
+                 interpolation='bilinear',
+                 key=None):
+        super().__init__(size, scale, ratio, interpolation, key)
+        self.prob = prob
+        self.padding_num = padding_num
+
+    def __call__(self, img):
+        is_cv2_img = False
+        if isinstance(img, np.ndarray):
+            is_cv2_img = True
+        if np.random.random() < self.prob:
+            # RandomResizedCrop augmentation
+            new = np.zeros_like(np.array(img)) + self.padding_num
+            # orig_W, orig_H = F._get_image_size(sample)
+            orig_W, orig_H = self._get_image_size(img)
+            i, j, h, w = self._get_param(img)
+            cropped = F.crop(img, i, j, h, w)
+            new[i:i + h, j:j + w, :] = np.array(cropped)
+            if not is_cv2_img:  # convert back to PIL if the input was not an ndarray
+                new = Image.fromarray(new.astype(np.uint8))
+            return new
+        else:
+            return img
+
+    def _get_image_size(self, img):
+        if F._is_pil_image(img):
+            return img.size
+        elif F._is_numpy_image(img):
+            return img.shape[:2][::-1]
+        elif F._is_tensor_image(img):
+            return img.shape[1:][::-1]  # chw
+        else:
+            raise TypeError("Unexpected type {}".format(type(img)))
+
+
 class CropImage(object):
     """ crop image """
 
@@ -283,9 +365,6 @@ class RandomCropImage(object):
             j = random.randint(0, w - tw)
 
         img = img[i:i + th, j:j + tw, :]
-        if img.shape[0] != 256 or img.shape[1] != 192:
-            raise ValueError('sample: ', h, w, i, j, th, tw, img.shape)
-
         return img
 
 
@@ -533,16 +612,18 @@ class ColorJitter(RawColorJitter):
     """ColorJitter.
""" - def __init__(self, *args, **kwargs): + def __init__(self, prob=2, *args, **kwargs): super().__init__(*args, **kwargs) + self.prob = prob def __call__(self, img): - if not isinstance(img, Image.Image): - img = np.ascontiguousarray(img) - img = Image.fromarray(img) - img = super()._apply_image(img) - if isinstance(img, Image.Image): - img = np.asarray(img) + if np.random.random() < self.prob: + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) return img diff --git a/ppcls/engine/engine.py b/ppcls/engine/engine.py index 05b9f3913bf23c07292eec785874d14403dfda9d..094427bf275675699c785f6c0b9a345ecf101959 100644 --- a/ppcls/engine/engine.py +++ b/ppcls/engine/engine.py @@ -75,8 +75,9 @@ class Engine(object): print_config(config) # init train_func and eval_func - assert self.eval_mode in ["classification", "retrieval"], logger.error( - "Invalid eval mode: {}".format(self.eval_mode)) + assert self.eval_mode in [ + "classification", "retrieval", "adaface" + ], logger.error("Invalid eval mode: {}".format(self.eval_mode)) self.train_epoch_func = train_epoch self.eval_func = getattr(evaluation, self.eval_mode + "_eval") @@ -115,7 +116,7 @@ class Engine(object): self.config["DataLoader"], "Train", self.device, self.use_dali) if self.mode == "eval" or (self.mode == "train" and self.config["Global"]["eval_during_train"]): - if self.eval_mode == "classification": + if self.eval_mode in ["classification", "adaface"]: self.eval_dataloader = build_dataloader( self.config["DataLoader"], "Eval", self.device, self.use_dali) @@ -151,39 +152,33 @@ class Engine(object): self.eval_loss_func = None # build metric - if self.mode == 'train': - metric_config = self.config.get("Metric") - if metric_config is not None: - metric_config = metric_config.get("Train") - if metric_config is not None: - if hasattr( - self.train_dataloader, "collate_fn" - ) and self.train_dataloader.collate_fn is not None: - for m_idx, m in enumerate(metric_config): - if "TopkAcc" in m: - msg = f"'TopkAcc' metric can not be used when setting 'batch_transform_ops' in config. The 'TopkAcc' metric has been removed." - logger.warning(msg) - break + if self.mode == 'train' and "Metric" in self.config and "Train" in self.config[ + "Metric"] and self.config["Metric"]["Train"]: + metric_config = self.config["Metric"]["Train"] + if hasattr(self.train_dataloader, "collate_fn" + ) and self.train_dataloader.collate_fn is not None: + for m_idx, m in enumerate(metric_config): + if "TopkAcc" in m: + msg = f"Unable to calculate accuracy when using \"batch_transform_ops\". The metric \"{m}\" has been removed." 
+ logger.warning(msg) metric_config.pop(m_idx) - self.train_metric_func = build_metrics(metric_config) - else: - self.train_metric_func = None + self.train_metric_func = build_metrics(metric_config) else: self.train_metric_func = None if self.mode == "eval" or (self.mode == "train" and self.config["Global"]["eval_during_train"]): - metric_config = self.config.get("Metric") if self.eval_mode == "classification": - if metric_config is not None: - metric_config = metric_config.get("Eval") - if metric_config is not None: - self.eval_metric_func = build_metrics(metric_config) + if "Metric" in self.config and "Eval" in self.config["Metric"]: + self.eval_metric_func = build_metrics(self.config["Metric"] + ["Eval"]) + else: + self.eval_metric_func = None elif self.eval_mode == "retrieval": - if metric_config is None: - metric_config = [{"name": "Recallk", "topk": (1, 5)}] + if "Metric" in self.config and "Eval" in self.config["Metric"]: + metric_config = self.config["Metric"]["Eval"] else: - metric_config = metric_config["Eval"] + metric_config = [{"name": "Recallk", "topk": (1, 5)}] self.eval_metric_func = build_metrics(metric_config) else: self.eval_metric_func = None @@ -446,6 +441,8 @@ class Engine(object): if isinstance(out, list): out = out[0] + if isinstance(out, dict) and "Student" in out: + out = out["Student"] if isinstance(out, dict) and "logits" in out: out = out["logits"] if isinstance(out, dict) and "output" in out: @@ -457,7 +454,9 @@ class Engine(object): def export(self): assert self.mode == "export" - use_multilabel = self.config["Global"].get("use_multilabel", False) + use_multilabel = self.config["Global"].get( + "use_multilabel", + False) and "ATTRMetric" in self.config["Metric"]["Eval"][0] model = ExportModel(self.config["Arch"], self.model, use_multilabel) if self.config["Global"]["pretrained_model"] is not None: load_dygraph_pretrain(model.base_model, diff --git a/ppcls/engine/evaluation/__init__.py b/ppcls/engine/evaluation/__init__.py index e0cd778887bf6f0e7ce05c18b587e5b54bcf6b3f..a301ad7fda34b87a959b59251b6dd0fffe9eb3e9 100644 --- a/ppcls/engine/evaluation/__init__.py +++ b/ppcls/engine/evaluation/__init__.py @@ -14,3 +14,4 @@ from ppcls.engine.evaluation.classification import classification_eval from ppcls.engine.evaluation.retrieval import retrieval_eval +from ppcls.engine.evaluation.adaface import adaface_eval \ No newline at end of file diff --git a/ppcls/engine/evaluation/adaface.py b/ppcls/engine/evaluation/adaface.py new file mode 100644 index 0000000000000000000000000000000000000000..e62144b5cb374a14a93616c33e56ee74bef0eb01 --- /dev/null +++ b/ppcls/engine/evaluation/adaface.py @@ -0,0 +1,260 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
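+
+# AdaFace-style evaluation: each image and its horizontal flip are embedded,
+# the two features are fused with a norm-weighted sum (fuse_features_with_norm
+# below), and verification accuracy is reported per dataset (agedb_30, cfp_fp,
+# lfw, cplfw, calfw) using 10-fold cross-validated threshold selection,
+# following https://github.com/mk-minchul/AdaFace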
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import time
+import numpy as np
+import platform
+import paddle
+import sklearn
+from sklearn.model_selection import KFold
+from sklearn.decomposition import PCA
+
+from ppcls.utils.misc import AverageMeter
+from ppcls.utils import logger
+
+
+def fuse_features_with_norm(stacked_embeddings, stacked_norms):
+    assert stacked_embeddings.ndim == 3  # (n_features_to_fuse, batch_size, channel)
+    assert stacked_norms.ndim == 3  # (n_features_to_fuse, batch_size, 1)
+    pre_norm_embeddings = stacked_embeddings * stacked_norms
+    fused = pre_norm_embeddings.sum(axis=0)
+    norm = paddle.norm(fused, 2, 1, True)
+    fused = paddle.divide(fused, norm)
+    return fused, norm
+
+
+def adaface_eval(engine, epoch_id=0):
+    output_info = dict()
+    time_info = {
+        "batch_cost": AverageMeter(
+            "batch_cost", '.5f', postfix=" s,"),
+        "reader_cost": AverageMeter(
+            "reader_cost", ".5f", postfix=" s,"),
+    }
+    print_batch_step = engine.config["Global"]["print_batch_step"]
+
+    metric_key = None
+    tic = time.time()
+    unique_dict = {}
+    for iter_id, batch in enumerate(engine.eval_dataloader):
+        images, labels, dataname, image_index = batch
+        if iter_id == 5:
+            for key in time_info:
+                time_info[key].reset()
+        time_info["reader_cost"].update(time.time() - tic)
+        batch_size = images.shape[0]
+        batch[0] = paddle.to_tensor(images)
+        embeddings = engine.model(images, labels)['features']
+        # split features into a unit direction and its norm, as fuse_features_with_norm expects
+        norms = paddle.norm(embeddings, 2, 1, True)
+        embeddings = paddle.divide(embeddings, norms)
+        flipped_images = paddle.flip(images, axis=[3])
+        flipped_embeddings = engine.model(flipped_images, labels)['features']
+        flipped_norms = paddle.norm(flipped_embeddings, 2, 1, True)
+        flipped_embeddings = paddle.divide(flipped_embeddings, flipped_norms)
+        stacked_embeddings = paddle.stack(
+            [embeddings, flipped_embeddings], axis=0)
+        stacked_norms = paddle.stack([norms, flipped_norms], axis=0)
+        embeddings, norms = fuse_features_with_norm(stacked_embeddings,
+                                                    stacked_norms)
+
+        for out, nor, label, data, idx in zip(embeddings, norms, labels,
+                                              dataname, image_index):
+            unique_dict[int(idx.numpy())] = {
+                'output': out,
+                'norm': nor,
+                'target': label,
+                'dataname': data
+            }
+        # calc metric
+        time_info["batch_cost"].update(time.time() - tic)
+        if iter_id % print_batch_step == 0:
+            time_msg = "s, ".join([
+                "{}: {:.5f}".format(key, time_info[key].avg)
+                for key in time_info
+            ])
+
+            ips_msg = "ips: {:.5f} images/sec".format(
+                batch_size / time_info["batch_cost"].avg)
+
+            metric_msg = ", ".join([
+                "{}: {:.5f}".format(key, output_info[key].val)
+                for key in output_info
+            ])
+            logger.info("[Eval][Epoch {}][Iter: {}/{}]{}, {}, {}".format(
+                epoch_id, iter_id,
+                len(engine.eval_dataloader), metric_msg, time_msg, ips_msg))
+
+        tic = time.time()
+
+    unique_keys = sorted(unique_dict.keys())
+    all_output_tensor = paddle.stack(
+        [unique_dict[key]['output'] for key in unique_keys], axis=0)
+    all_norm_tensor = paddle.stack(
+        [unique_dict[key]['norm'] for key in unique_keys], axis=0)
+    all_target_tensor = paddle.stack(
+        [unique_dict[key]['target'] for key in unique_keys], axis=0)
+    all_dataname_tensor = paddle.stack(
+        [unique_dict[key]['dataname'] for key in unique_keys], axis=0)
+
+    eval_result = cal_metric(all_output_tensor, all_norm_tensor,
+                             all_target_tensor, all_dataname_tensor)
+
+    metric_msg = ", ".join([
+        "{}: {:.5f}".format(key, output_info[key].avg) for key in output_info
+    ])
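+    # log the per-dataset verification accuracies computed by cal_metric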
face_msg = ", ".join([ + "{}: {:.5f}".format(key, eval_result[key]) + for key in eval_result.keys() + ]) + logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg + ", " + + face_msg)) + + # return 1st metric in the dict + return eval_result['all_test_acc'] + + +def cal_metric(all_output_tensor, all_norm_tensor, all_target_tensor, + all_dataname_tensor): + all_target_tensor = all_target_tensor.reshape([-1]) + all_dataname_tensor = all_dataname_tensor.reshape([-1]) + dataname_to_idx = { + "agedb_30": 0, + "cfp_fp": 1, + "lfw": 2, + "cplfw": 3, + "calfw": 4 + } + idx_to_dataname = {val: key for key, val in dataname_to_idx.items()} + test_logs = {} + # _, indices = paddle.unique(all_dataname_tensor, return_index=True, return_inverse=False, return_counts=False) + for dataname_idx in all_dataname_tensor.unique(): + dataname = idx_to_dataname[dataname_idx.item()] + # per dataset evaluation + embeddings = all_output_tensor[all_dataname_tensor == + dataname_idx].numpy() + labels = all_target_tensor[all_dataname_tensor == dataname_idx].numpy() + issame = labels[0::2] + tpr, fpr, accuracy, best_thresholds = evaluate_face( + embeddings, issame, nrof_folds=10) + acc, best_threshold = accuracy.mean(), best_thresholds.mean() + + num_test_samples = len(embeddings) + test_logs[f'{dataname}_test_acc'] = acc + test_logs[f'{dataname}_test_best_threshold'] = best_threshold + test_logs[f'{dataname}_num_test_samples'] = num_test_samples + + test_acc = np.mean([ + test_logs[f'{dataname}_test_acc'] + for dataname in dataname_to_idx.keys() + if f'{dataname}_test_acc' in test_logs + ]) + + test_logs['all_test_acc'] = test_acc + return test_logs + + +def evaluate_face(embeddings, actual_issame, nrof_folds=10, pca=0): + # Calculate evaluation metrics + thresholds = np.arange(0, 4, 0.01) + embeddings1 = embeddings[0::2] + embeddings2 = embeddings[1::2] + tpr, fpr, accuracy, best_thresholds = calculate_roc( + thresholds, + embeddings1, + embeddings2, + np.asarray(actual_issame), + nrof_folds=nrof_folds, + pca=pca) + return tpr, fpr, accuracy, best_thresholds + + +def calculate_roc(thresholds, + embeddings1, + embeddings2, + actual_issame, + nrof_folds=10, + pca=0): + assert (embeddings1.shape[0] == embeddings2.shape[0]) + assert (embeddings1.shape[1] == embeddings2.shape[1]) + nrof_pairs = min(len(actual_issame), embeddings1.shape[0]) + nrof_thresholds = len(thresholds) + k_fold = KFold(n_splits=nrof_folds, shuffle=False) + + tprs = np.zeros((nrof_folds, nrof_thresholds)) + fprs = np.zeros((nrof_folds, nrof_thresholds)) + accuracy = np.zeros((nrof_folds)) + best_thresholds = np.zeros((nrof_folds)) + indices = np.arange(nrof_pairs) + # print('pca', pca) + dist = None + + if pca == 0: + diff = np.subtract(embeddings1, embeddings2) + dist = np.sum(np.square(diff), 1) + + for fold_idx, (train_set, test_set) in enumerate(k_fold.split(indices)): + # print('train_set', train_set) + # print('test_set', test_set) + if pca > 0: + print('doing pca on', fold_idx) + embed1_train = embeddings1[train_set] + embed2_train = embeddings2[train_set] + _embed_train = np.concatenate((embed1_train, embed2_train), axis=0) + # print(_embed_train.shape) + pca_model = PCA(n_components=pca) + pca_model.fit(_embed_train) + embed1 = pca_model.transform(embeddings1) + embed2 = pca_model.transform(embeddings2) + embed1 = sklearn.preprocessing.normalize(embed1) + embed2 = sklearn.preprocessing.normalize(embed2) + # print(embed1.shape, embed2.shape) + diff = np.subtract(embed1, embed2) + dist = np.sum(np.square(diff), 1) + + # Find the best 
threshold for the fold + acc_train = np.zeros((nrof_thresholds)) + for threshold_idx, threshold in enumerate(thresholds): + _, _, acc_train[threshold_idx] = calculate_accuracy( + threshold, dist[train_set], actual_issame[train_set]) + best_threshold_index = np.argmax(acc_train) + best_thresholds[fold_idx] = thresholds[best_threshold_index] + for threshold_idx, threshold in enumerate(thresholds): + tprs[fold_idx, threshold_idx], fprs[ + fold_idx, threshold_idx], _ = calculate_accuracy( + threshold, dist[test_set], actual_issame[test_set]) + _, _, accuracy[fold_idx] = calculate_accuracy( + thresholds[best_threshold_index], dist[test_set], + actual_issame[test_set]) + + tpr = np.mean(tprs, 0) + fpr = np.mean(fprs, 0) + return tpr, fpr, accuracy, best_thresholds + + +def calculate_accuracy(threshold, dist, actual_issame): + predict_issame = np.less(dist, threshold) + tp = np.sum(np.logical_and(predict_issame, actual_issame)) + fp = np.sum(np.logical_and(predict_issame, np.logical_not(actual_issame))) + tn = np.sum( + np.logical_and( + np.logical_not(predict_issame), np.logical_not(actual_issame))) + fn = np.sum(np.logical_and(np.logical_not(predict_issame), actual_issame)) + + tpr = 0 if (tp + fn == 0) else float(tp) / float(tp + fn) + fpr = 0 if (fp + tn == 0) else float(fp) / float(fp + tn) + acc = float(tp + tn) / dist.size + return tpr, fpr, acc diff --git a/ppcls/engine/evaluation/classification.py b/ppcls/engine/evaluation/classification.py index 1f9b55fc33ff6b49e9e7f7bd3e9bcebdbf3e0093..61aa92b4f8be528b2a26b03b8af167608457308c 100644 --- a/ppcls/engine/evaluation/classification.py +++ b/ppcls/engine/evaluation/classification.py @@ -34,7 +34,6 @@ def classification_eval(engine, epoch_id=0): } print_batch_step = engine.config["Global"]["print_batch_step"] - metric_key = None tic = time.time() accum_samples = 0 total_samples = len( diff --git a/ppcls/loss/__init__.py b/ppcls/loss/__init__.py index c1f2f95df7afd0a266304ea2ccdf5572d1de9625..741eb3b619d6ac070876a9acc87b7f1bdc75bf18 100644 --- a/ppcls/loss/__init__.py +++ b/ppcls/loss/__init__.py @@ -24,6 +24,7 @@ from .distillationloss import DistillationDistanceLoss from .distillationloss import DistillationRKDLoss from .distillationloss import DistillationKLDivLoss from .distillationloss import DistillationDKDLoss +from .distillationloss import DistillationMultiLabelLoss from .multilabelloss import MultiLabelLoss from .afdloss import AFDLoss diff --git a/ppcls/loss/afdloss.py b/ppcls/loss/afdloss.py index 3e67e30b98df61576e40449015cc67a13dd6da60..e2f457451292f8b29f614d176288620d5c73006b 100644 --- a/ppcls/loss/afdloss.py +++ b/ppcls/loss/afdloss.py @@ -97,8 +97,6 @@ class Attention(nn.Layer): super().__init__() self.qk_dim = qk_dim self.n_t = n_t - # self.linear_trans_s = LinearTransformStudent(qk_dim, t_shapes, s_shapes, unique_t_shapes) - # self.linear_trans_t = LinearTransformTeacher(qk_dim, t_shapes) self.p_t = self.create_parameter( shape=[len(t_shapes), qk_dim], diff --git a/ppcls/loss/distillationloss.py b/ppcls/loss/distillationloss.py index c60a540db84edae1374e5370309256f1c98cd40a..4ca58d26db89dc7cf579da8d68a3a26d5d559222 100644 --- a/ppcls/loss/distillationloss.py +++ b/ppcls/loss/distillationloss.py @@ -22,6 +22,7 @@ from .distanceloss import DistanceLoss from .rkdloss import RKdAngle, RkdDistance from .kldivloss import KLDivLoss from .dkdloss import DKDLoss +from .multilabelloss import MultiLabelLoss class DistillationCELoss(CELoss): @@ -89,13 +90,16 @@ class DistillationDMLLoss(DMLLoss): def __init__(self, model_name_pairs=[], 
act="softmax", + weight_ratio=False, + sum_across_class_dim=False, key=None, name="loss_dml"): - super().__init__(act=act) + super().__init__(act=act, sum_across_class_dim=sum_across_class_dim) assert isinstance(model_name_pairs, list) self.key = key self.model_name_pairs = model_name_pairs self.name = name + self.weight_ratio = weight_ratio def forward(self, predicts, batch): loss_dict = dict() @@ -105,7 +109,10 @@ class DistillationDMLLoss(DMLLoss): if self.key is not None: out1 = out1[self.key] out2 = out2[self.key] - loss = super().forward(out1, out2) + if self.weight_ratio is True: + loss = super().forward(out1, out2, batch) + else: + loss = super().forward(out1, out2) if isinstance(loss, dict): for key in loss: loss_dict["{}_{}_{}_{}".format(key, pair[0], pair[1], @@ -122,6 +129,7 @@ class DistillationDistanceLoss(DistanceLoss): def __init__(self, mode="l2", model_name_pairs=[], + act=None, key=None, name="loss_", **kargs): @@ -130,6 +138,13 @@ class DistillationDistanceLoss(DistanceLoss): self.key = key self.model_name_pairs = model_name_pairs self.name = name + mode + assert act in [None, "sigmoid", "softmax"] + if act == "sigmoid": + self.act = nn.Sigmoid() + elif act == "softmax": + self.act = nn.Softmax(axis=-1) + else: + self.act = None def forward(self, predicts, batch): loss_dict = dict() @@ -139,6 +154,9 @@ class DistillationDistanceLoss(DistanceLoss): if self.key is not None: out1 = out1[self.key] out2 = out2[self.key] + if self.act is not None: + out1 = self.act(out1) + out2 = self.act(out2) loss = super().forward(out1, out2) for key in loss: loss_dict["{}_{}_{}".format(self.name, key, idx)] = loss[key] @@ -235,3 +253,34 @@ class DistillationDKDLoss(DKDLoss): loss = super().forward(out1, out2, batch) loss_dict[f"{self.name}_{pair[0]}_{pair[1]}"] = loss return loss_dict + + +class DistillationMultiLabelLoss(MultiLabelLoss): + """ + DistillationMultiLabelLoss + """ + + def __init__(self, + model_names=[], + epsilon=None, + size_sum=False, + weight_ratio=False, + key=None, + name="loss_mll"): + super().__init__( + epsilon=epsilon, size_sum=size_sum, weight_ratio=weight_ratio) + assert isinstance(model_names, list) + self.key = key + self.model_names = model_names + self.name = name + + def forward(self, predicts, batch): + loss_dict = dict() + for name in self.model_names: + out = predicts[name] + if self.key is not None: + out = out[self.key] + loss = super().forward(out, batch) + for key in loss: + loss_dict["{}_{}".format(key, name)] = loss[key] + return loss_dict diff --git a/ppcls/loss/dmlloss.py b/ppcls/loss/dmlloss.py index 48bf6c02429084badb95cd9d5806a2ee4c20452e..e8983ed08a9e26da6b4df983becd8a9cbdbfab39 100644 --- a/ppcls/loss/dmlloss.py +++ b/ppcls/loss/dmlloss.py @@ -16,13 +16,15 @@ import paddle import paddle.nn as nn import paddle.nn.functional as F +from ppcls.loss.multilabelloss import ratio2weight + class DMLLoss(nn.Layer): """ DMLLoss """ - def __init__(self, act="softmax", eps=1e-12): + def __init__(self, act="softmax", sum_across_class_dim=False, eps=1e-12): super().__init__() if act is not None: assert act in ["softmax", "sigmoid"] @@ -33,6 +35,7 @@ class DMLLoss(nn.Layer): else: self.act = None self.eps = eps + self.sum_across_class_dim = sum_across_class_dim def _kldiv(self, x, target): class_num = x.shape[-1] @@ -40,11 +43,20 @@ class DMLLoss(nn.Layer): (target + self.eps) / (x + self.eps)) * class_num return cost - def forward(self, x, target): + def forward(self, x, target, gt_label=None): if self.act is not None: x = self.act(x) target = 
self.act(target) loss = self._kldiv(x, target) + self._kldiv(target, x) loss = loss / 2 - loss = paddle.mean(loss) + + # for multi-label dml loss + if gt_label is not None: + gt_label, label_ratio = gt_label[:, 0, :], gt_label[:, 1, :] + targets_mask = paddle.cast(gt_label > 0.5, 'float32') + weight = ratio2weight(targets_mask, paddle.to_tensor(label_ratio)) + weight = weight * (gt_label > -1) + loss = loss * weight + + loss = loss.sum(1).mean() if self.sum_across_class_dim else loss.mean() return {"DMLLoss": loss} diff --git a/ppcls/metric/metrics.py b/ppcls/metric/metrics.py index 4087cd4d4fd4eca0830d0ce253082dbbbbf16ec0..0c803ccfdbb29216381625ea3df4a4540c7b56c0 100644 --- a/ppcls/metric/metrics.py +++ b/ppcls/metric/metrics.py @@ -26,6 +26,7 @@ from easydict import EasyDict from ppcls.metric.avg_metrics import AvgMetrics from ppcls.utils.misc import AverageMeter, AttrMeter +from ppcls.utils import logger class TopkAcc(AvgMetrics): @@ -39,7 +40,7 @@ class TopkAcc(AvgMetrics): def reset(self): self.avg_meters = { - "top{}".format(k): AverageMeter("top{}".format(k)) + f"top{k}": AverageMeter(f"top{k}") for k in self.topk } @@ -47,11 +48,21 @@ class TopkAcc(AvgMetrics): if isinstance(x, dict): x = x["logits"] + output_dims = x.shape[-1] + metric_dict = dict() - for k in self.topk: - metric_dict["top{}".format(k)] = paddle.metric.accuracy( - x, label, k=k) - self.avg_meters["top{}".format(k)].update(metric_dict["top{}".format(k)], x.shape[0]) + for idx, k in enumerate(self.topk): + if output_dims < k: + msg = f"The output dims({output_dims}) is less than k({k}), and the argument {k} of Topk has been removed." + logger.warning(msg) + self.avg_meters.pop(f"top{k}") + continue + metric_dict[f"top{k}"] = paddle.metric.accuracy(x, label, k=k) + self.avg_meters[f"top{k}"].update(metric_dict[f"top{k}"], + x.shape[0]) + + self.topk = list(filter(lambda k: k <= output_dims, self.topk)) + return metric_dict @@ -390,6 +401,7 @@ class AccuracyScore(MultiLabelMetric): def get_attr_metrics(gt_label, preds_probs, threshold): """ index: evaluated label index + adapted from "https://github.com/valencebond/Rethinking_of_PAR/blob/master/metrics/pedestrian_metrics.py" """ pred_label = (preds_probs > threshold).astype(int) diff --git a/ppcls/static/program.py b/ppcls/static/program.py index 7f2313a58f45bcf05de3c8c92fd205eeabcb4c3e..a6a80f13e07d6b040af17a16e6c0324492cfe174 100644 --- a/ppcls/static/program.py +++ b/ppcls/static/program.py @@ -371,6 +371,11 @@ def run(dataloader, "Except RuntimeError when reading data from dataloader, try to read once again..." ) continue + except IndexError: + logger.warning( + "Except IndexError when reading data from dataloader, try to read once again..." 
+ ) + continue idx += 1 # ignore the warmup iters if idx == 5: diff --git a/ppcls/utils/PULC_label_list/language_classification_label_list.txt b/ppcls/utils/PULC_label_list/language_classification_label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..8d9ee9dd86adfa7bdcced51220d48dde5511abc1 --- /dev/null +++ b/ppcls/utils/PULC_label_list/language_classification_label_list.txt @@ -0,0 +1,10 @@ +0 arabic +1 chinese_cht +2 cyrillic +3 devanagari +4 japan +5 ka +6 korean +7 ta +8 te +9 latin \ No newline at end of file diff --git a/ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt b/ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..051944a929f323a3a25f1807ac0297170513484a --- /dev/null +++ b/ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt @@ -0,0 +1,4 @@ +0 0 +1 90 +2 180 +3 270 diff --git a/ppcls/utils/PULC_label_list/textline_orientation_label_list.txt b/ppcls/utils/PULC_label_list/textline_orientation_label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..207b70c6b188d05ecb2a04c8f4946c993616e544 --- /dev/null +++ b/ppcls/utils/PULC_label_list/textline_orientation_label_list.txt @@ -0,0 +1,2 @@ +0 0_degree +1 180_degree diff --git a/ppcls/utils/PULC_label_list/traffic_sign_label_list.txt b/ppcls/utils/PULC_label_list/traffic_sign_label_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..c1e41d539d1af5611b2b047d664000b8f41afb15 --- /dev/null +++ b/ppcls/utils/PULC_label_list/traffic_sign_label_list.txt @@ -0,0 +1,232 @@ +0 pl80 +1 w9 +2 p6 +3 ph4.2 +4 i8 +5 w14 +6 w33 +7 pa13 +8 im +9 w58 +10 pl90 +11 il70 +12 p5 +13 pm55 +14 pl60 +15 ip +16 p11 +17 pdd +18 wc +19 i2r +20 w30 +21 pmr +22 p23 +23 pl15 +24 pm10 +25 pss +26 w1 +27 p4 +28 w38 +29 w50 +30 w34 +31 pw3.5 +32 iz +33 w39 +34 w11 +35 p1n +36 pr70 +37 pd +38 pnl +39 pg +40 ph5.3 +41 w66 +42 il80 +43 pb +44 pbm +45 pm5 +46 w24 +47 w67 +48 w49 +49 pm40 +50 ph4 +51 w45 +52 i4 +53 w37 +54 ph2.6 +55 pl70 +56 ph5.5 +57 i14 +58 i11 +59 p7 +60 p29 +61 pne +62 pr60 +63 pm13 +64 ph4.5 +65 p12 +66 p3 +67 w40 +68 pl5 +69 w13 +70 pr10 +71 p14 +72 i4l +73 pr30 +74 pw4.2 +75 w16 +76 p17 +77 ph3 +78 i9 +79 w15 +80 w35 +81 pa8 +82 pt +83 pr45 +84 w17 +85 pl30 +86 pcs +87 pctl +88 pr50 +89 ph4.4 +90 pm46 +91 pm35 +92 i15 +93 pa12 +94 pclr +95 i1 +96 pcd +97 pbp +98 pcr +99 w28 +100 ps +101 pm8 +102 w18 +103 w2 +104 w52 +105 ph2.9 +106 ph1.8 +107 pe +108 p20 +109 w36 +110 p10 +111 pn +112 pa14 +113 w54 +114 ph3.2 +115 p2 +116 ph2.5 +117 w62 +118 w55 +119 pw3 +120 pw4.5 +121 i12 +122 ph4.3 +123 phclr +124 i10 +125 pr5 +126 i13 +127 w10 +128 p26 +129 w26 +130 p8 +131 w5 +132 w42 +133 il50 +134 p13 +135 pr40 +136 p25 +137 w41 +138 pl20 +139 ph4.8 +140 pnlc +141 ph3.3 +142 w29 +143 ph2.1 +144 w53 +145 pm30 +146 p24 +147 p21 +148 pl40 +149 w27 +150 pmb +151 pc +152 i6 +153 pr20 +154 p18 +155 ph3.8 +156 pm50 +157 pm25 +158 i2 +159 w22 +160 w47 +161 w56 +162 pl120 +163 ph2.8 +164 i7 +165 w12 +166 pm1.5 +167 pm2.5 +168 w32 +169 pm15 +170 ph5 +171 w19 +172 pw3.2 +173 pw2.5 +174 pl10 +175 il60 +176 w57 +177 w48 +178 w60 +179 pl100 +180 pr80 +181 p16 +182 pl110 +183 w59 +184 w64 +185 w20 +186 ph2 +187 p9 +188 il100 +189 w31 +190 w65 +191 ph2.4 +192 pr100 +193 p19 +194 ph3.5 +195 pa10 +196 pcl +197 pl35 +198 p15 +199 w7 +200 pa6 +201 phcs +202 w43 +203 p28 +204 w6 +205 w3 +206 w25 +207 pl25 +208 il110 +209 p1 +210 w46 +211 pn-2 +212 w51 +213 w44 +214 w63 +215 w23 
+216 pm20
+217 w8
+218 pmblr
+219 w4
+220 i5
+221 il90
+222 w21
+223 p27
+224 pl50
+225 pl65
+226 w61
+227 ph2.2
+228 pm2
+229 i3
+230 pa18
+231 pw4
diff --git a/ppcls/utils/cls_demo/person_label_list.txt b/ppcls/utils/cls_demo/person_label_list.txt
deleted file mode 100644
index 8eea2b6dc2433abf303a0ea508021698559b749b..0000000000000000000000000000000000000000
--- a/ppcls/utils/cls_demo/person_label_list.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-0 nobody
-1 someone
diff --git a/requirements.txt b/requirements.txt
index 5e927756a4a2341b91ca2e23065657bb09a4e514..4787aa84805e84c26a1030f773fbd89826e1aa56 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,9 +4,9 @@ opencv-python==4.4.0.46
 pillow
 tqdm
 PyYAML
-visualdl >= 2.2.0
+visualdl>=2.2.0
 scipy
-scikit-learn==0.23.2
+scikit-learn>=0.21.0
 gast==0.3.3
 faiss-cpu==1.7.1.post2
 easydict
diff --git a/setup.py b/setup.py
index 57045d31903917fdb8634887a1f6e7207871ead5..c935136f40b93ce32f1dce7f4be482e6dcb4bce9 100644
--- a/setup.py
+++ b/setup.py
@@ -38,13 +38,16 @@ setup(
     version='0.0.0',
     install_requires=requirements,
     license='Apache License 2.0',
-    description='Awesome Image Classification toolkits based on PaddlePaddle ',
+    description='A treasure chest for visual recognition powered by PaddlePaddle.',
     long_description=readme(),
     long_description_content_type='text/markdown',
     url='https://github.com/PaddlePaddle/PaddleClas',
     download_url='https://github.com/PaddlePaddle/PaddleClas.git',
     keywords=[
-        'A treasure chest for image classification powered by PaddlePaddle.'
+        'image-classification', 'image-recognition', 'pretrained-models',
+        'knowledge-distillation', 'product-recognition', 'autoaugment',
+        'cutmix', 'randaugment', 'gridmask', 'deit', 'repvgg',
+        'swin-transformer', 'image-retrieval-system'
     ],
     classifiers=[
         'Intended Audience :: Developers',
diff --git a/test_tipc/README.md b/test_tipc/README.md
index 4869f6e11ddc78b7c05c7805bfb25ba7e41b683d..a1178e5c6d5f78bc263a76fc0e6293255b40dcd8 100644
--- a/test_tipc/README.md
+++ b/test_tipc/README.md
@@ -35,18 +35,23 @@
 │   ├── MobileNetV3                # test config directory for the MobileNetV3 series
 │   │   ├── MobileNetV3_large_x1_0_train_infer_python.txt                                    # basic training & inference config
 │   │   ├── MobileNetV3_large_x1_0_train_linux_gpu_fleet_amp_infer_python_linux_gpu_cpu.txt  # multi-node multi-GPU training & inference config
-│   │   └── MobileNetV3_large_x1_0_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt # mixed-precision training & inference config
-│   └── ResNet                     # test config directory for the ResNet series
-│       ├── ResNet50_vd_train_infer_python.txt                                               # basic training & inference config
-│       ├── ResNet50_vd_train_linux_gpu_fleet_amp_infer_python_linux_gpu_cpu.txt             # multi-node multi-GPU training & inference config
-│       └── ResNet50_vd_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt            # mixed-precision training & inference config
-|   ......
+│   │   ├── MobileNetV3_large_x1_0_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt # mixed-precision training & inference config
+│   │   ├── MobileNetV3_large_x1_0_paddle2onnx_infer_python.txt                              # paddle2onnx inference test config
+│   │   └── ......
+│   ├── ResNet                     # test config directory for the ResNet series
+│   │   ├── ResNet50_vd_train_infer_python.txt                                               # basic training & inference config
+│   │   ├── ResNet50_vd_train_linux_gpu_fleet_amp_infer_python_linux_gpu_cpu.txt             # multi-node multi-GPU training & inference config
+│   │   ├── ResNet50_vd_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt            # mixed-precision training & inference config
+│   │   ├── ResNet50_vd_paddle2onnx_infer_python.txt                                         # paddle2onnx inference test config
+│   │   └── ......
+│   └── ......
 ├── docs
 │   ├── guide.png
 │   └── test.png
 ├── prepare.sh                     # downloads the data and models required to run test_*.sh
 ├── README.md                      # usage documentation
 ├── results                        # pre-saved prediction results, used for accuracy comparison against actual predictions
+├── test_paddle2onnx.sh            # main program for testing paddle2onnx inference
 └── test_train_inference_python.sh # main program for testing Python training and inference
 ```
@@ -99,10 +104,13 @@ bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/MobileNetV3/Mo
 ## 4 Start the tests
-The functional tests cover training-related settings such as mixed precision, pruning and quantization, as well as inference-related settings such as MKL-DNN and TensorRT; follow the links below for more details and tutorials:
+The functional tests cover training-related settings such as mixed precision, pruning and quantization, as well as inference-related settings such as MKL-DNN and TensorRT; follow the links below for more details and tutorials:
 - [test_train_inference_python usage](docs/test_train_inference_python.md): tests basic Python-based training, evaluation and inference, including pruning, quantization and distillation.
 - [test_inference_cpp usage](docs/test_inference_cpp.md): tests C++-based model inference.
 - [test_serving usage](docs/test_serving.md): tests service deployment based on Paddle Serving.
 - [test_lite_arm_cpu_cpp usage](docs/test_lite_arm_cpu_cpp.md): tests C++ inference deployment on ARM CPU based on Paddle-Lite.
 - [test_paddle2onnx usage](docs/test_paddle2onnx.md): tests Paddle2ONNX model conversion and verifies its correctness.
+- [test_serving_infer_python usage](docs/test_serving_infer_python.md): tests the Python serving pipeline.
+- [test_serving_infer_cpp usage](docs/test_serving_infer_cpp.md): tests the C++ serving pipeline.
+- [test_train_fleet_inference_python usage](./docs/test_train_fleet_inference_python.md): tests basic multi-node multi-GPU training and inference with Python.
diff --git a/test_tipc/benchmark_train.sh b/test_tipc/benchmark_train.sh
index 793b89476fb829034687b442c517546f5d8a4cfc..5c4d4112ad691569914ccf9b84480db9b76fa024 100644
--- a/test_tipc/benchmark_train.sh
+++ b/test_tipc/benchmark_train.sh
@@ -225,7 +225,7 @@ for batch_size in ${batch_size_list[*]}; do
             echo $cmd
             eval $cmd
             last_status=${PIPESTATUS[0]}
-            status_check $last_status "${cmd}" "${status_log}"
+            status_check $last_status "${cmd}" "${status_log}" "${model_name}"
         else
             IFS=";"
             unset_env=`unset CUDA_VISIBLE_DEVICES`
@@ -261,7 +261,7 @@ for batch_size in ${batch_size_list[*]}; do
             echo $cmd
             eval $cmd
             last_status=${PIPESTATUS[0]}
-            status_check $last_status "${cmd}" "${status_log}"
+            status_check $last_status "${cmd}" "${status_log}" "${model_name}"
         fi
     done
done
diff --git a/test_tipc/common_func.sh b/test_tipc/common_func.sh
index 63fa1014487ce43405896ddf97f5d2aae0344489..e0459366ed7d86d239624dc47937d91cc7704894 100644
--- a/test_tipc/common_func.sh
+++ b/test_tipc/common_func.sh
@@ -38,6 +38,7 @@ function func_set_params(){
 function func_parser_params(){
     strs=$1
+    MODE=$2
     IFS=":"
     array=(${strs})
     key=${array[0]}
@@ -64,10 +65,10 @@ function status_check(){
     last_status=$1   # the exit code
     run_command=$2
     run_log=$3
+    model_name=$4
     if [ $last_status -eq 0 ]; then
-        echo -e "\033[33m Run successfully with command - ${run_command}! \033[0m" | tee -a ${run_log}
+        echo -e "\033[33m Run successfully with command - ${model_name} - ${run_command}! \033[0m" | tee -a ${run_log}
     else
-        echo -e "\033[33m Run failed with command - ${run_command}! \033[0m" | tee -a ${run_log}
+        echo -e "\033[33m Run failed with command - ${model_name} - ${run_command}! \033[0m" | tee -a ${run_log}
     fi
 }
-
diff --git a/test_tipc/config/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/test_tipc/config/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4c4e1be252629d6f8723a6b9831fb0461da13acf
--- /dev/null
+++ b/test_tipc/config/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
@@ -0,0 +1,54 @@
+===========================train_params===========================
+model_name:GeneralRecognition_PPLCNet_x2_5
+python:python3.7
+gpu_list:192.168.0.1,192.168.0.2;0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+kl_quant:null
+export2:null
+pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams
+infer_model:../inference/
+infer_export:True
+infer_quant:False
+inference:python/predict_rec.py -c configs/inference_rec.yaml
+-o Global.use_gpu:False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1|16
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.rec_inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/Aliproduct/demo_test/
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
+null:null
+===========================infer_benchmark_params==========================
+random_infer_input:[{float32,[3,224,224]}]
\ No newline at end of file
diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6393d49a2f4b6d0e7228e88f046e738230d0c542
--- /dev/null
+++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,18 @@
+===========================cpp_infer_params===========================
+model_name:MobileNetV3_large_x1_0_KL
+cpp_infer_type:cls
+cls_inference_model_dir:./MobileNetV3_large_x1_0_kl_quant_infer/
+det_inference_model_dir:
+cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/slim_model/MobileNetV3_large_x1_0_kl_quant_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..05efbacc76cbfd69279d658350abca9cfc607bd5 --- /dev/null +++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:MobileNetV3_large_x1_0_KL +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/slim_model/MobileNetV3_large_x1_0_kl_quant_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/MobileNetV3_large_x1_0_kl_quant_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/MobileNetV3_large_x1_0_kl_quant_serving/ +--serving_client:./deploy/paddleserving/MobileNetV3_large_x1_0_kl_quant_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..32b506afd594b486d6fe759ee2d1556f9b049a8c --- /dev/null +++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0-KL_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:MobileNetV3_large_x1_0_KL +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/slim_model/MobileNetV3_large_x1_0_kl_quant_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/MobileNetV3_large_x1_0_kl_quant_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/MobileNetV3_large_x1_0_kl_quant_serving/ +--serving_client:./deploy/paddleserving/MobileNetV3_large_x1_0_kl_quant_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..cb292d2645735fab42f0f81d0a605f6d3b68ffed --- /dev/null +++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:MobileNetV3_large_x1_0 
+cpp_infer_type:cls +cls_inference_model_dir:./MobileNetV3_large_x1_0_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..0212d0c454032bdcf292ac7de78b02c3072f510f --- /dev/null +++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:MobileNetV3_large_x1_0 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/MobileNetV3_large_x1_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/MobileNetV3_large_x1_0_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/MobileNetV3_large_x1_0_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..0a3b777ecccc35bedd56ab7f19882ad8d1cd9b3d --- /dev/null +++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:MobileNetV3_large_x1_0 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/MobileNetV3_large_x1_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/MobileNetV3_large_x1_0_serving/ +--serving_client:./deploy/paddleserving/MobileNetV3_large_x1_0_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..344983ad22e524d59089f3e12c496ce4356c2997 --- /dev/null +++ b/test_tipc/config/MobileNetV3/MobileNetV3_large_x1_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:MobileNetV3_large_x1_0 
+python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/MobileNetV3_large_x1_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/MobileNetV3_large_x1_0_serving/ +--serving_client:./deploy/paddleserving/MobileNetV3_large_x1_0_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PP-ShiTu/PPShiTu_general_rec_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PP-ShiTu/PPShiTu_general_rec_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..56dcff845be57473ca04a4a53b6c84b686b3a1ef --- /dev/null +++ b/test_tipc/config/PP-ShiTu/PPShiTu_general_rec_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PP-ShiTu_general_rec +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/general_PPLCNet_x2_5_lite_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/general_PPLCNet_x2_5_lite_v1.0_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/general_PPLCNet_x2_5_lite_v1.0_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt index b86aa9b1cffb58e2579b46e726e70a3d6f3b1790..c8727278e5838cb6cf6273e796dec5071cac4a1d 100644 --- a/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt +++ b/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -1,19 +1,19 @@ ===========================cpp_infer_params=========================== model_name:PPShiTu cpp_infer_type:shitu -feature_inference_model_dir:./feature_inference/ -det_inference_model_dir:./det_inference +feature_inference_model_dir:./general_PPLCNet_x2_5_lite_v1.0_infer/ +det_inference_model_dir:./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar det_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar infer_quant:False inference_cmd:./deploy/cpp_shitu/build/pp_shitu -c inference_drink.yaml use_gpu:True|False -enable_mkldnn:True|False -cpu_threads:1|6 +enable_mkldnn:False +cpu_threads:1 batch_size:1 -use_tensorrt:False|True -precision:fp32|fp16 +use_tensorrt:False +precision:fp32 data_dir:./dataset/drink_dataset_v1.0 benchmark:True -generate_yaml_cmd:python3 test_tipc/generate_cpp_yaml.py -transform_index_cmd:python3 deploy/cpp_shitu/tools/transform_id_map.py -c inference_drink.yaml +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py +transform_index_cmd:python3.7 
deploy/cpp_shitu/tools/transform_id_map.py -c inference_drink.yaml diff --git a/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..5bf0e4c1da933b8b356e34356b10274245f42b9c --- /dev/null +++ b/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================serving_params=========================== +model_name:PPShiTu +python:python3.7 +cls_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar +det_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./models/general_PPLCNet_x2_5_lite_v1.0_infer/ +--dirname:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./models/general_PPLCNet_x2_5_lite_v1.0_serving/ +--serving_client:./models/general_PPLCNet_x2_5_lite_v1.0_client/ +--serving_server:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ +--serving_client:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ +serving_dir:./paddleserving/recognition +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..8a533e82856bfacb1f2db920f20f5f89383e0d27 --- /dev/null +++ b/test_tipc/config/PP-ShiTu/PPShiTu_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================serving_params=========================== +model_name:PPShiTu +python:python3.7 +cls_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar +det_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./models/general_PPLCNet_x2_5_lite_v1.0_infer/ +--dirname:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./models/general_PPLCNet_x2_5_lite_v1.0_serving/ +--serving_client:./models/general_PPLCNet_x2_5_lite_v1.0_client/ +--serving_server:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ +--serving_client:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ +serving_dir:./paddleserving/recognition +web_service:recognition_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PP-ShiTu/PPShiTu_mainbody_det_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PP-ShiTu/PPShiTu_mainbody_det_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..1b970569a745c6f115b581fd2d020258d3d5014d --- /dev/null +++ b/test_tipc/config/PP-ShiTu/PPShiTu_mainbody_det_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ 
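Each `paddle2onnx_params` block, like the PP-ShiTu mainbody-detector one this hunk opens, lists one paddle2onnx CLI flag per line. A sketch of the command those flags expand to, with values copied from that config; the actual driver is test_tipc/test_paddle2onnx.sh:

```python
# Expand a paddle2onnx_params config into the paddle2onnx CLI call.
import subprocess

model_dir = "./deploy/models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/"
subprocess.run([
    "paddle2onnx",
    "--model_dir", model_dir,
    "--model_filename", "inference.pdmodel",
    "--params_filename", "inference.pdiparams",
    "--save_file", model_dir + "inference.onnx",
    "--opset_version", "11",  # the classification configs in this patch use 10
    "--enable_onnx_checker", "True",
], check=True)
```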
+===========================paddle2onnx_params=========================== +model_name:PP-ShiTu_mainbody_det +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/inference.onnx +--opset_version:11 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..78e342fc9c465bd9e83f9a9352a2f1fa04fee509 --- /dev/null +++ b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPHGNet_small +cpp_infer_type:cls +cls_inference_model_dir:./PPHGNet_small_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..0b65824c391f796aeb7fb4acf12f38113303b6d7 --- /dev/null +++ b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPHGNet_small +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPHGNet_small_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPHGNet_small_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPHGNet_small_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b576cb90e3b3fea3fd83d28943e443b8c39a099c --- /dev/null +++ b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== 
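The `serving_params` block opening here feeds `paddle_serving_client.convert`, which repacks a Paddle inference model into the server and client bundles named by `--serving_server` and `--serving_client`. A sketch of the expanded command, assuming the PPHGNet_small values from this file; the serving test scripts assemble the same call in shell:

```python
# Convert an inference model into Paddle Serving server/client bundles.
import subprocess

subprocess.run([
    "python3.7", "-m", "paddle_serving_client.convert",
    "--dirname", "./deploy/paddleserving/PPHGNet_small_infer/",
    "--model_filename", "inference.pdmodel",
    "--params_filename", "inference.pdiparams",
    "--serving_server", "./deploy/paddleserving/PPHGNet_small_serving/",
    "--serving_client", "./deploy/paddleserving/PPHGNet_small_client/",
], check=True)
```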
+model_name:PPHGNet_small +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPHGNet_small_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPHGNet_small_serving/ +--serving_client:./deploy/paddleserving/PPHGNet_small_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b576cb90e3b3fea3fd83d28943e443b8c39a099c --- /dev/null +++ b/test_tipc/config/PPHGNet/PPHGNet_small_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPHGNet_small +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_small_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPHGNet_small_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPHGNet_small_serving/ +--serving_client:./deploy/paddleserving/PPHGNet_small_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPHGNet/PPHGNet_small_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_small_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ba53f530355d9b8d6f26c0560fd86aaab3905ca2 --- /dev/null +++ b/test_tipc/config/PPHGNet/PPHGNet_small_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:PPHGNet_small +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:norm_train +norm_train:tools/train.py -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False +pact_train:null +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:tools/export_model.py -c ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_pretrained.pdparams 
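The `-o` entries above and below are handed straight to tools/train.py, tools/eval.py and the predictors, which merge each dotted key into the loaded YAML config. A sketch of that merge on a plain nested dict; PaddleClas implements the equivalent in ppcls/utils/config.py:

```python
# Merge one "-o A.B.C=value" override into a nested config dict.
def override(config, key, value):
    node = config
    keys = key.split(".")
    for k in keys[:-1]:
        node = node.setdefault(k, {})  # create intermediate sections on demand
    node[keys[-1]] = value
    return config

cfg = {"Global": {"epochs": 120}}
override(cfg, "Global.epochs", 2)                        # -o Global.epochs=2
override(cfg, "DataLoader.Train.sampler.batch_size", 8)  # -o DataLoader.Train.sampler.batch_size=8
print(cfg["Global"]["epochs"])  # 2
```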
+infer_model:../inference/
+infer_export:True
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml -o PreProcess.transform_ops.0.ResizeImage.resize_short=236
+-o Global.use_gpu:False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1|16
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
+===========================infer_benchmark_params==========================
+random_infer_input:[{float32,[3,224,224]}]
diff --git a/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..37d48ac17ec320fe54ceb70c2ed119581db2f016
--- /dev/null
+++ b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,18 @@
+===========================cpp_infer_params===========================
+model_name:PPHGNet_tiny
+cpp_infer_type:cls
+cls_inference_model_dir:./PPHGNet_tiny_infer/
+det_inference_model_dir:
+cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar
+det_inference_url:
+infer_quant:False
+inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml
+use_gpu:True|False
+enable_mkldnn:False
+cpu_threads:1
+batch_size:1
+use_tensorrt:False
+precision:fp32
+image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG
+benchmark:False
+generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py
diff --git a/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b31ff8c60af0c55950f20b0d9ba68ee058be15af
--- /dev/null
+++ b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
@@ -0,0 +1,16 @@
+===========================paddle2onnx_params===========================
+model_name:PPHGNet_tiny
+python:python3.7
+2onnx: paddle2onnx
+--model_dir:./deploy/models/PPHGNet_tiny_infer/
+--model_filename:inference.pdmodel
+--params_filename:inference.pdiparams
+--save_file:./deploy/models/PPHGNet_tiny_infer/inference.onnx
+--opset_version:10
+--enable_onnx_checker:True
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar
+inference:./python/predict_cls.py
+Global.use_onnx:True
+Global.inference_model_dir:./models/PPHGNet_tiny_infer
+Global.use_gpu:False
+-c:configs/inference_cls.yaml
\ No newline at end of file
diff --git a/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6b4f5e41e110277938bd1d38543dcd5b8d5561d3
--- /dev/null
+++ b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt
@@ -0,0 +1,14 @@
+===========================serving_params===========================
+model_name:PPHGNet_tiny
+python:python3.7
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar
+trans_model:-m
paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPHGNet_tiny_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPHGNet_tiny_serving/ +--serving_client:./deploy/paddleserving/PPHGNet_tiny_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..6b4f5e41e110277938bd1d38543dcd5b8d5561d3 --- /dev/null +++ b/test_tipc/config/PPHGNet/PPHGNet_tiny_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPHGNet_tiny +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPHGNet_tiny_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPHGNet_tiny_serving/ +--serving_client:./deploy/paddleserving/PPHGNet_tiny_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/MobileNetV3_large_x1_0_lite_arm_cpu_cpp.txt b/test_tipc/config/PPLCNet/MobileNetV3_large_x1_0_lite_arm_cpu_cpp.txt deleted file mode 100644 index b45c2a01b2cb0d9491b516e2caf410ef04e7d35e..0000000000000000000000000000000000000000 --- a/test_tipc/config/PPLCNet/MobileNetV3_large_x1_0_lite_arm_cpu_cpp.txt +++ /dev/null @@ -1,8 +0,0 @@ -runtime_device:arm_cpu -lite_arm_work_path:/data/local/tmp/arm_cpu/ -lite_arm_so_path:inference_lite_lib.android.armv8/cxx/lib/libpaddle_light_api_shared.so -clas_model_file:MobileNetV3_large_x1_0 -inference_cmd:clas_system config.txt tabby_cat.jpg ---num_threads_list:1 ---batch_size_list:1 ---precision_list:FP32 diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..dbd610ae941a79d7cdf484d6ae9a666bd8799d24 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x0_25 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x0_25_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_25_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 
0000000000000000000000000000000000000000..62dcb630b61880531fc4f2888740fc0ca4251e16 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x0_25 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x0_25_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x0_25_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_25_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x0_25_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..a36cb9ac1888b0019bde4299f1f4732cedd6a59f --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_25 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_25_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_25_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_25_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_25_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b1ca66b137b3eecf5759ae7a4af1b44b9f1ac5cc --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_25_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_25 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_25_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_25_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_25_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_25_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..d20b2099cd13cd8919dffdbcbbfc87f1de6febfc --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ 
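The `pipline:pipeline_http_client.py` step in the serving configs above amounts to base64-encoding an image and POSTing it to the pipeline service started by the `web_service` script. A sketch of that request; the port, route and demo image are assumptions taken from the default pipeline setup, so check deploy/paddleserving/config.yml before relying on them:

```python
# Hypothetical pipeline client call; endpoint details are assumptions.
import base64
import requests

with open("./daisy.jpg", "rb") as f:  # demo image, an assumption
    image = base64.b64encode(f.read()).decode("utf8")

payload = {"key": ["image"], "value": [image]}
resp = requests.post("http://127.0.0.1:18080/imagenet/prediction", json=payload)
print(resp.json())
```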
+===========================cpp_infer_params=========================== +model_name:PPLCNet_x0_35 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x0_35_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_35_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..427d8cc725cc7fa3f72d4a28c712d1109bcdae88 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x0_35 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x0_35_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x0_35_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_35_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x0_35_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..f5673d9b583f95e32f2df0f8317a08e980a5bcd4 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_35 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_35_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_35_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_35_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_35_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..f80a40ae603a553b27a63e15661a62946854dff7 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_35_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_35 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_35_infer.tar 
+trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_35_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_35_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_35_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..9cb7b18735ccf1af81b58505695ec910aa87ebcc --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x0_5 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x0_5_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..c70ef6fc0d5a28d333cec103a533eee237bb8351 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x0_5 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x0_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x0_5_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x0_5_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..487d270e07bbfd7503cc4cbde094326c4430825b --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_5 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_5_serving/ 
+--serving_client:./deploy/paddleserving/PPLCNet_x0_5_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..3ac4971a5000b8b2d1b4c5c04a02e5e2d9aa2090 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_5 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_5_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_5_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..72d99a639d929975a21d34c1a07af9dc91e04f9d --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x0_75 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x0_75_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_75_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..124648502dbdfd7e4b10890977ab45d8e82e8d5d --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x0_75 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x0_75_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x0_75_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_75_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x0_75_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git 
a/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..51b2a8167a78f245c7c3b882e08897d338557881 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_75 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_75_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_75_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_75_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_75_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..771817b8f1fc28c71e18b3eda89156334192cfdb --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x0_75_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x0_75 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_75_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x0_75_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x0_75_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x0_75_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..872ba7acbf50e35b0e2db83582b6e70654dd7412 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x1_0 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x1_0_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 
0000000000000000000000000000000000000000..d994671b348e90af1ca55b7bd116ed4e53a364c9 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x1_0 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x1_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x1_0_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x1_0_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..39aac394d0e62568ce51547ad0b995e9bbad7851 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x1_0 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x1_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x1_0_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x1_0_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..e5d7859ee9dd035a5b7f1f0dd45a2c7325dba223 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x1_0 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x1_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x1_0_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x1_0_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_0_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_0_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..8ff97481cfcae47d8d062862078bfcd1ab90751e --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_0_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ 
+===========================train_params=========================== +model_name:PPLCNet_x1_0 +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:norm_train +norm_train:tools/train.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False +pact_train:null +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:tools/export_model.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:False +inference:python/predict_cls.py -c configs/inference_cls.yaml +-o Global.use_gpu:False +-o Global.enable_mkldnn:True|False +-o Global.cpu_num_threads:1|6 +-o Global.batch_size:1|16 +-o Global.use_tensorrt:True|False +-o Global.use_fp16:True|False +-o Global.inference_model_dir:../inference +-o Global.infer_imgs:../dataset/ILSVRC2012/val +-o Global.save_log_path:null +-o Global.benchmark:True +null:null +===========================infer_benchmark_params========================== +random_infer_input:[{float32,[3,224,224]}] \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..84ced77b48f9ec13a31f705eef71eced5f0f41d7 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x1_5 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x1_5_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_5_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..8f8646f871e247a0f30cefec7e75c6d4b2dc9c44 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ 
+===========================paddle2onnx_params=========================== +model_name:PPLCNet_x1_5 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x1_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x1_5_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_5_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x1_5_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..20cdbd0bbc34126f8d5f5592df911cbebd619387 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x1_5 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_5_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x1_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x1_5_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x1_5_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..d32527d825be1887d63cd774e88b1a4adecde3f7 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x1_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x1_5 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_5_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x1_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x1_5_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x1_5_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..d8353543acdfcb7962a9a1261dc883399fba94e8 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x2_0 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x2_0_infer/ +det_inference_model_dir: 
+cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_0_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..19336133ab4e70fde77b7c35d8bf03fd2d30258c --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x2_0 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x2_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x2_0_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_0_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x2_0_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..f2a25a3d05f2de962f8194d769f82aa00064aa29 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x2_0 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x2_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x2_0_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x2_0_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..e8309d5d173d9bd48be77b831c2e27650a612867 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_0_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x2_0 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x2_0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams 
+--serving_server:./deploy/paddleserving/PPLCNet_x2_0_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x2_0_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..0aca9e32ee3da4512060657c9c1b982efd18bb4d --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNet_x2_5 +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNet_x2_5_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..1e08ad41e51edd0b6b72bf4f4367b9ce6383f537 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNet_x2_5 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNet_x2_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNet_x2_5_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNet_x2_5_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..e8c340ea7aa9e643fb45dbb426ae6459e88dcd75 --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x2_5 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x2_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x2_5_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x2_5_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git 
a/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..269d33acf53dd923182bd1f68263d9efd8f20e9c --- /dev/null +++ b/test_tipc/config/PPLCNet/PPLCNet_x2_5_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNet_x2_5 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNet_x2_5_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNet_x2_5_serving/ +--serving_client:./deploy/paddleserving/PPLCNet_x2_5_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..d6e68233e78f2072e8540170abee2c5ceae6b1ef --- /dev/null +++ b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:PPLCNetV2_base +cpp_infer_type:cls +cls_inference_model_dir:./PPLCNetV2_base_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..b5047248e98989c3de64f473a2ab7b64084bfe37 --- /dev/null +++ b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PPLCNetV2_base +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/PPLCNetV2_base_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/PPLCNetV2_base_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/PPLCNetV2_base_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new 
file mode 100644 index 0000000000000000000000000000000000000000..2c355a9dbd734b8c5eab317007cb2ceb310a1a66 --- /dev/null +++ b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNetV2_base +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNetV2_base_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNetV2_base_serving/ +--serving_client:./deploy/paddleserving/PPLCNetV2_base_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..402679134a1b0f94fe1d8a67634efcff2772231a --- /dev/null +++ b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:PPLCNetV2_base +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/PPLCNetV2_base_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/PPLCNetV2_base_serving/ +--serving_client:./deploy/paddleserving/PPLCNetV2_base_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/PPLCNetV2/PPLCNetV2_base_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..ee251ca60c8e2ee798821b4c17729e05a906c4cb --- /dev/null +++ b/test_tipc/config/PPLCNetV2/PPLCNetV2_base_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,53 @@ +===========================train_params=========================== +model_name:PPLCNetV2_base +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.first_bs:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:norm_train +norm_train:tools/train.py -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False +pact_train:null +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: 
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_pretrained.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:False +inference:python/predict_cls.py -c configs/inference_cls.yaml +-o Global.use_gpu:False +-o Global.enable_mkldnn:True|False +-o Global.cpu_num_threads:1|6 +-o Global.batch_size:1|16 +-o Global.use_tensorrt:True|False +-o Global.use_fp16:True|False +-o Global.inference_model_dir:../inference +-o Global.infer_imgs:../dataset/ILSVRC2012/val +-o Global.save_log_path:null +-o Global.benchmark:True +null:null +===========================infer_benchmark_params========================== +random_infer_input:[{float32,[3,224,224]}] diff --git a/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..e9de97fdd1046be3eec2cb7805ce5ca6b6bbb5ff --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:ResNet50 +cpp_infer_type:cls +cls_inference_model_dir:./ResNet50_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..784f1eb2c3cdd3c4f8f10775fceb9c741c79aeaf --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:ResNet50 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/ResNet50_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/ResNet50_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/ResNet50_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..a4344caf1f0327d53d64c3175034d5b4f41dff97 --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== 
+model_name:ResNet50 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/ResNet50_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/ResNet50_serving/ +--serving_client:./deploy/paddleserving/ResNet50_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..0f43feceac3f6200901d20540ae32fcaad92bc46 --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:ResNet50 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/ResNet50_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/ResNet50_serving/ +--serving_client:./deploy/paddleserving/ResNet50_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/ResNet/ResNet50_train_amp_infer_python.txt b/test_tipc/config/ResNet/ResNet50_train_ampfp16_infer_python.txt similarity index 81% rename from test_tipc/config/ResNet/ResNet50_train_amp_infer_python.txt rename to test_tipc/config/ResNet/ResNet50_train_ampfp16_infer_python.txt index a398086aaf466d91e330d0e794943324e9913870..3c7592779c5f1a15279eaa1cb7bdd564a231904e 100644 --- a/test_tipc/config/ResNet/ResNet50_train_amp_infer_python.txt +++ b/test_tipc/config/ResNet/ResNet50_train_ampfp16_infer_python.txt @@ -13,7 +13,7 @@ train_infer_img_dir:./dataset/ILSVRC2012/val null:null ## trainer:amp_train -amp_train:tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o AMP.scale_loss=128 -o AMP.use_dynamic_loss_scaling=True -o AMP.level=O2 +amp_train:tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50_amp_O1.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o AMP.scale_loss=128 -o AMP.use_dynamic_loss_scaling=True -o AMP.level=O1 pact_train:null fpgm_train:null distill_train:null @@ -50,3 +50,7 @@ inference:python/predict_cls.py -c configs/inference_cls.yaml -o Global.benchmark:True null:null null:null +===========================train_benchmark_params========================== +batch_size:128|256 +fp_items:ampfp16 +epoch:1 diff --git a/test_tipc/config/ResNet/ResNet50_train_purefp16_infer_python.txt b/test_tipc/config/ResNet/ResNet50_train_purefp16_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..907d579ecc01740922e3c5e06b5087063685224e --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_train_purefp16_infer_python.txt @@ -0,0 +1,56 @@ 
+===========================train_params=========================== +model_name:ResNet50 +python:python3.7 +gpu_list:0|0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:amp_train +amp_train:tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50_amp_O1.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o AMP.scale_loss=128 -o AMP.use_dynamic_loss_scaling=True -o AMP.level=O2 +pact_train:null +fpgm_train:null +distill_train:null +to_static_train:-o Global.to_static=True +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:tools/export_model.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_pretrained.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:False +inference:python/predict_cls.py -c configs/inference_cls.yaml +-o Global.use_gpu:True|False +-o Global.enable_mkldnn:True|False +-o Global.cpu_num_threads:1|6 +-o Global.batch_size:1|16 +-o Global.use_tensorrt:True|False +-o Global.use_fp16:True|False +-o Global.inference_model_dir:../inference +-o Global.infer_imgs:../dataset/ILSVRC2012/val +-o Global.save_log_path:null +-o Global.benchmark:True +null:null +null:null +===========================train_benchmark_params========================== +batch_size:128|256 +fp_items:purefp16 +epoch:1 diff --git a/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..892b4859b1e3941c7b8ca235152f105d17d06729 --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:ResNet50_vd_KL +cpp_infer_type:cls +cls_inference_model_dir:./ResNet50_vd_kl_quant_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/slim_model/ResNet50_vd_kl_quant_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..3e4b19b752e3f0f15fdb552a29035e055633102d --- /dev/null +++ 
b/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:ResNet50_vd_KL +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/slim_model/ResNet50_vd_kl_quant_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/ResNet50_vd_kl_quant_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/ResNet50_vd_kl_quant_serving/ +--serving_client:./deploy/paddleserving/ResNet50_vd_kl_quant_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..236b57b03643d1e5cab743abb80dcf8cf03dc8c1 --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_vd-KL_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:ResNet50_vd_KL +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/slim_model/ResNet50_vd_kl_quant_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/ResNet50_vd_kl_quant_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/ResNet50_vd_kl_quant_serving/ +--serving_client:./deploy/paddleserving/ResNet50_vd_kl_quant_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt index 51c73f13d46c3a8793f9b5db92a74e0aa7b4e599..e4b8c1ddfa3502299ff35214f0a1d9c822f7a86b 100644 --- a/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt +++ b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -1,18 +1,18 @@ ===========================cpp_infer_params=========================== model_name:ResNet50_vd cpp_infer_type:cls -cls_inference_model_dir:./cls_inference/ +cls_inference_model_dir:./ResNet50_vd_infer/ det_inference_model_dir: -cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/ResNet50_vd_inference.tar +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar det_inference_url: infer_quant:False inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml use_gpu:True|False -enable_mkldnn:True|False -cpu_threads:1|6 +enable_mkldnn:False +cpu_threads:1 batch_size:1 -use_tensorrt:False|True -precision:fp32|fp16 -image_dir:./dataset/ILSVRC2012/val -benchmark:True -generate_yaml_cmd:python3 test_tipc/generate_cpp_yaml.py +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt 
b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt index 163bb48429b468433ee6bb539c029f51fe364190..82806311e7f2effec06a72dae8d6024e1d54f25e 100644 --- a/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt +++ b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -8,7 +8,9 @@ python:python3.7 --save_file:./deploy/models/ResNet50_vd_infer/inference.onnx --opset_version:10 --enable_onnx_checker:True -inference: python/predict_cls.py -c configs/inference_cls.yaml +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar +inference:./python/predict_cls.py Global.use_onnx:True -Global.inference_model_dir:models/ResNet50_vd_infer/ +Global.inference_model_dir:./models/ResNet50_vd_infer/ Global.use_gpu:False +-c:configs/inference_cls.yaml diff --git a/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..2de324433d0a51e994044fdb215ce8ac621aed45 --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:ResNet50_vd +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/ResNet50_vd_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/ResNet50_vd_serving/ +--serving_client:./deploy/paddleserving/ResNet50_vd_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..d51f5d6ccd38e5e39c4b12b4999b987bb25ce951 --- /dev/null +++ b/test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:ResNet50_vd +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/ResNet50_vd_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/ResNet50_vd_serving/ +--serving_client:./deploy/paddleserving/ResNet50_vd_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..9f1365975aaabef655b0677b482797b8ae1ea674 --- /dev/null +++ 
b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================cpp_infer_params=========================== +model_name:SwinTransformer_tiny_patch4_window7_224 +cpp_infer_type:cls +cls_inference_model_dir:./SwinTransformer_tiny_patch4_window7_224_infer/ +det_inference_model_dir: +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar +det_inference_url: +infer_quant:False +inference_cmd:./deploy/cpp/build/clas_system -c inference_cls.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +image_dir:./dataset/ILSVRC2012/val/ILSVRC2012_val_00000001.JPEG +benchmark:False +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py diff --git a/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..82ea03ce699bf8d89ef1a85c1ba2b05917daab8b --- /dev/null +++ b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:SwinTransformer_tiny_patch4_window7_224 +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/SwinTransformer_tiny_patch4_window7_224_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/SwinTransformer_tiny_patch4_window7_224_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar +inference:./python/predict_cls.py +Global.use_onnx:True +Global.inference_model_dir:./models/SwinTransformer_tiny_patch4_window7_224_infer +Global.use_gpu:False +-c:configs/inference_cls.yaml \ No newline at end of file diff --git a/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..5b5a2a1acdcf8c2c352b53f28dc325b933ef1053 --- /dev/null +++ b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:SwinTransformer_tiny_patch4_window7_224 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/SwinTransformer_tiny_patch4_window7_224_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/SwinTransformer_tiny_patch4_window7_224_serving/ +--serving_client:./deploy/paddleserving/SwinTransformer_tiny_patch4_window7_224_client/ +serving_dir:./deploy/paddleserving +web_service:null +--use_gpu:0|null 
+pipline:test_cpp_serving_client.py diff --git a/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..3d6389e02e41b64539ebd8283b3a1db9fbf0c767 --- /dev/null +++ b/test_tipc/config/SwinTransformer/SwinTransformer_tiny_patch4_window7_224_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,14 @@ +===========================serving_params=========================== +model_name:SwinTransformer_tiny_patch4_window7_224 +python:python3.7 +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./deploy/paddleserving/SwinTransformer_tiny_patch4_window7_224_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./deploy/paddleserving/SwinTransformer_tiny_patch4_window7_224_serving/ +--serving_client:./deploy/paddleserving/SwinTransformer_tiny_patch4_window7_224_client/ +serving_dir:./deploy/paddleserving +web_service:classification_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py \ No newline at end of file diff --git a/test_tipc/docs/benchmark_train.md b/test_tipc/docs/benchmark_train.md index 20cf9287423616103609187d2104d68c77f34650..a2b01ed9df000a9f281319da70512d8ad7f3c71d 100644 --- a/test_tipc/docs/benchmark_train.md +++ b/test_tipc/docs/benchmark_train.md @@ -9,7 +9,7 @@ ```shell # 运行格式:bash test_tipc/prepare.sh train_benchmark.txt mode -bash test_tipc/prepare.sh test_tipc/configs/MobileNetV2/MobileNetV2_train_infer_python.txt benchmark_train +bash test_tipc/prepare.sh test_tipc/config/MobileNetV2/MobileNetV2_train_infer_python.txt benchmark_train ``` ## 1.2 功能测试 @@ -24,7 +24,7 @@ bash test_tipc/benchmark_train.sh test_tipc/config/MobileNetV2/MobileNetV2_train `test_tipc/benchmark_train.sh`支持根据传入的第三个参数实现只运行某一个训练配置,如下: ```shell # 运行格式:bash test_tipc/benchmark_train.sh train_benchmark.txt mode params -bash test_tipc/benchmark_train.sh test_tipc/configs/MobileNetV2/MobileNetV2_train_infer_python.txt benchmark_train dynamic_bs8_fp32_DP_N1C1 +bash test_tipc/benchmark_train.sh test_tipc/config/MobileNetV2/MobileNetV2_train_infer_python.txt benchmark_train dynamic_bs8_fp32_DP_N1C1 ``` dynamic_bs8_fp32_DP_N1C1为test_tipc/benchmark_train.sh传入的参数,格式如下: `${modeltype}_${batch_size}_${fp_item}_${run_mode}_${device_num}` @@ -33,7 +33,7 @@ dynamic_bs8_fp32_DP_N1C1为test_tipc/benchmark_train.sh传入的参数,格式 ## 2. 
日志输出 -运行后将保存模型的训练日志和解析日志,使用 `test_tipc/configs/MobileNetV2/MobileNetV2_train_infer_python.txt` 参数文件的训练日志解析结果是: +运行后将保存模型的训练日志和解析日志,使用 `test_tipc/config/MobileNetV2/MobileNetV2_train_infer_python.txt` 参数文件的训练日志解析结果是: ``` {"model_branch": "dygaph", "model_commit": "7c39a1996b19087737c05d883fd346d2f39dbcc0", "model_name": "cls_MobileNetV2_bs8_fp32_SingleP_DP", "batch_size": 8, "fp_item": "fp32", "run_process_type": "SingleP", "run_mode": "DP", "convergence_value": "5.413110", "convergence_key": "loss:", "ips": 19.333, "speed_unit": "samples/s", "device_num": "N1C1", "model_run_time": "0", "frame_commit": "8cc09552473b842c651ead3b9848d41827a3dbab", "frame_version": "0.0.0"} diff --git a/test_tipc/docs/test_inference_cpp.md b/test_tipc/docs/test_inference_cpp.md index eabf8774d37454461fe28a5fc1d5ed0b3135fcad..e82b8ed800b4819e5f7599d565a0076137b8e531 100644 --- a/test_tipc/docs/test_inference_cpp.md +++ b/test_tipc/docs/test_inference_cpp.md @@ -1,86 +1,313 @@ -# C++预测功能测试 +# Linux GPU/CPU C++ 推理功能测试 -C++预测功能测试的主程序为`test_inference_cpp.sh`,可以测试基于C++预测库的模型推理功能。 +Linux GPU/CPU C++ 推理功能测试的主程序为`test_inference_cpp.sh`,可以测试基于C++预测引擎的推理功能。 ## 1. 测试结论汇总 -基于训练是否使用量化,进行本测试的模型可以分为`正常模型`和`量化模型`,这两类模型对应的C++预测功能汇总如下: +- 推理相关: -| 模型类型 |device | batchsize | tensorrt | mkldnn | cpu多线程 | -| ---- | ---- | ---- | :----: | :----: | :----: | -| 正常模型 | GPU | 1/6 | fp32/fp16 | - | - | -| 正常模型 | CPU | 1/6 | - | fp32 | 支持 | -| 量化模型 | GPU | 1/6 | int8 | - | - | -| 量化模型 | CPU | 1/6 | - | int8 | 支持 | +| 算法名称 | 模型名称 | device_CPU | device_GPU | +| :-------------: | :---------------------------------------: | :--------: | :--------: | +| MobileNetV3 | MobileNetV3_large_x1_0 | 支持 | 支持 | +| MobileNetV3 | MobileNetV3_large_x1_0_KL | 支持 | 支持 | +| PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | +| PP-ShiTu | PPShiTu_mainbody_det | 支持 | 支持 | +| PPHGNet | PPHGNet_small | 支持 | 支持 | +| PPHGNet | PPHGNet_tiny | 支持 | 支持 | +| PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | +| PPLCNet | PPLCNet_x0_35 | 支持 | 支持 | +| PPLCNet | PPLCNet_x0_5 | 支持 | 支持 | +| PPLCNet | PPLCNet_x0_75 | 支持 | 支持 | +| PPLCNet | PPLCNet_x1_0 | 支持 | 支持 | +| PPLCNet | PPLCNet_x1_5 | 支持 | 支持 | +| PPLCNet | PPLCNet_x2_0 | 支持 | 支持 | +| PPLCNet | PPLCNet_x2_5 | 支持 | 支持 | +| PPLCNetV2 | PPLCNetV2_base | 支持 | 支持 | +| ResNet | ResNet50 | 支持 | 支持 | +| ResNet | ResNet50_vd | 支持 | 支持 | +| ResNet | ResNet50_vd_KL | 支持 | 支持 | +| SwinTransformer | SwinTransformer_tiny_patch4_window7_224 | 支持 | 支持 | -## 2. 测试流程 -运行环境配置请参考[文档](./install.md)的内容配置TIPC的运行环境。 +## 2. 测试流程(以**ResNet50**为例) -### 2.1 功能测试 -先运行`prepare.sh`准备数据和模型,然后运行`test_inference_cpp.sh`进行测试,最终在```test_tipc/output```目录下生成`cpp_infer_*.log`后缀的日志文件。 + +
+准备数据、准备推理模型、编译opencv、编译(下载)Paddle Inference、编译C++预测Demo(已写入prepare.sh自动执行,点击以展开详细内容或者折叠) + + +### 2.1 准备数据和推理模型 + +#### 2.1.1 准备数据 + +默认使用`./deploy/images/ILSVRC2012_val_00000010.jpeg`作为测试输入图片。 + +#### 2.1.2 准备推理模型 + +* 如果已经训练好了模型,可以参考[模型导出](../../docs/zh_CN/inference_deployment/export_model.md),导出`inference model`,并将导出路径设置为`./deploy/models/ResNet50_infer`, +导出完毕后文件结构如下 ```shell -bash test_tipc/prepare.sh test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt cpp_infer +./deploy/models/ResNet50_infer/ +├── inference.pdmodel +├── inference.pdiparams +└── inference.pdiparams.info +``` + +### 2.2 准备环境 + +#### 2.2.1 运行准备 + +配置合适的编译和执行环境,其中包括编译器,cuda等一些基础库,建议安装docker环境,[参考链接](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/docker/linux-docker.html)。 -# 用法1: -bash test_tipc/test_inference_cpp.sh test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt -# 用法2: 指定GPU卡预测,第三个传入参数为GPU卡号 -bash test_tipc/test_inference_cpp.sh test_tipc/config/ResNet/ResNet50_vd_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt 1 +#### 2.2.2 编译opencv库 + +* 首先需要从opencv官网上下载Linux环境下的源码,以3.4.7版本为例,下载及解压缩命令如下: + +``` +cd deploy/cpp +wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz +tar -xvf 3.4.7.tar.gz ``` -运行预测指令后,在`test_tipc/output`文件夹下自动会保存运行日志,包括以下文件: +* 编译opencv,首先设置opencv源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的opencv源码路径,`install_path`为opencv的安装路径。在本例中,源码路径即为当前目录下的`opencv-3.4.7/`。 ```shell -test_tipc/output/ -|- results_cpp.log # 运行指令状态的日志 -|- cls_cpp_infer_cpu_usemkldnn_False_threads_1_precision_fp32_batchsize_1.log # CPU上不开启Mkldnn,线程数设置为1,测试batch_size=1条件下的预测运行日志 -|- cls_cpp_infer_cpu_usemkldnn_False_threads_6_precision_fp32_batchsize_1.log # CPU上不开启Mkldnn,线程数设置为6,测试batch_size=1条件下的预测运行日志 -|- cls_cpp_infer_gpu_usetrt_False_precision_fp32_batchsize_1.log # GPU上不开启TensorRT,测试batch_size=1的fp32精度预测日志 -|- cls_cpp_infer_gpu_usetrt_True_precision_fp16_batchsize_1.log # GPU上开启TensorRT,测试batch_size=1的fp16精度预测日志 -...... +cd ./opencv-3.4.7 +export root_path=$PWD +export install_path=${root_path}/opencv3 ``` -其中results_cpp.log中包含了每条指令的运行状态,如果运行成功会输出: +* 然后在opencv源码路径下,按照下面的命令进行编译。 + +```shell +rm -rf build +mkdir build +cd build + +cmake .. \ + -DCMAKE_INSTALL_PREFIX=${install_path} \ + -DCMAKE_BUILD_TYPE=Release \ + -DBUILD_SHARED_LIBS=OFF \ + -DWITH_IPP=OFF \ + -DBUILD_IPP_IW=OFF \ + -DWITH_LAPACK=OFF \ + -DWITH_EIGEN=OFF \ + -DCMAKE_INSTALL_LIBDIR=lib64 \ + -DWITH_ZLIB=ON \ + -DBUILD_ZLIB=ON \ + -DWITH_JPEG=ON \ + -DBUILD_JPEG=ON \ + -DWITH_PNG=ON \ + -DBUILD_PNG=ON \ + -DWITH_TIFF=ON \ + -DBUILD_TIFF=ON + +make -j +make install ``` -Run successfully with command - ./deploy/cpp/build/clas_system -c inference_cls.yaml 2>&1|tee test_tipc/output/cls_cpp_infer_gpu_usetrt_False_precision_fp32_batchsize_1.log -...... 
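+编译安装完成后,可以先确认安装目录已经生成(以下命令仅为示意,假设仍位于 opencv 源码目录下,且前面设置的 `install_path` 未被修改):
+
+```shell
+# 示意:查看 opencv 安装产物,正常情况下应包含 include、lib64 等子目录
+ls ${install_path}
+```
+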
+ +* `make install`完成之后,会在该文件夹下生成opencv头文件和库文件,用于后面的代码编译。 + +以opencv3.4.7版本为例,最终在安装路径下的文件结构如下所示。**注意**:不同的opencv版本,下述的文件结构可能不同。 + +```shell +opencv3/ +├── bin :可执行文件 +├── include :头文件 +├── lib64 :库文件 +└── share :部分第三方库 +``` + +#### 2.2.3 下载或者编译Paddle预测库 + +* 有2种方式获取Paddle预测库,下面进行详细介绍。 + +##### 预测库源码编译 +* 如果希望获取最新预测库特性,可以从Paddle github上克隆最新代码,源码编译预测库。 +* 可以参考[Paddle预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)的说明,从github上获取Paddle代码,然后进行编译,生成最新的预测库。使用git获取代码方法如下。 + +```shell +git clone https://github.com/PaddlePaddle/Paddle.git +``` + +* 进入Paddle目录后,使用如下命令编译。 + +```shell +rm -rf build +mkdir build +cd build + +cmake .. \ + -DWITH_CONTRIB=OFF \ + -DWITH_MKL=ON \ + -DWITH_MKLDNN=ON \ + -DWITH_TESTING=OFF \ + -DCMAKE_BUILD_TYPE=Release \ + -DWITH_INFERENCE_API_TEST=OFF \ + -DON_INFER=ON \ + -DWITH_PYTHON=ON +make -j +make inference_lib_dist ``` -如果运行失败,会输出: + +更多编译参数选项可以参考Paddle C++预测库官网:[https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)。 + + +* 编译完成之后,可以在`build/paddle_inference_install_dir/`文件下看到生成了以下文件及文件夹。 + ``` -Run failed with command - ./deploy/cpp/build/clas_system -c inference_cls.yaml 2>&1|tee test_tipc/output/cls_cpp_infer_gpu_usetrt_False_precision_fp32_batchsize_1.log -...... +build/paddle_inference_install_dir/ +├── CMakeCache.txt +├── paddle +├── third_party +└── version.txt ``` -可以很方便的根据results_cpp.log中的内容判定哪一个指令运行错误。 +其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。 + +##### 直接下载安装 -### 2.2 精度测试 +* [Paddle预测库官网](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html)上提供了不同cuda版本的Linux预测库,可以在官网查看并选择合适的预测库版本。 -使用compare_results.py脚本比较模型预测的结果是否符合预期,主要步骤包括: -- 提取日志中的预测坐标; -- 从本地文件中提取保存好的坐标结果; -- 比较上述两个结果是否符合精度预期,误差大于设置阈值时会报错。 + 以`manylinux_cuda10.1_cudnn7.6_avx_mkl_trt6_gcc8.2`版本为例,使用下述命令下载并解压: + + +```shell +wget https://paddle-inference-lib.bj.bcebos.com/2.2.2/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz + +tar -xvf paddle_inference.tgz +``` + +最终会在当前的文件夹中生成`paddle_inference/`的子文件夹,文件内容和上述的paddle_inference_install_dir一样。 + + +#### 2.2.4 编译C++预测Demo + +* 编译命令如下,其中Paddle C++预测库、opencv等其他依赖库的地址需要换成自己机器上的实际地址。 + + +```shell +# 在deploy/cpp下执行以下命令 +bash tools/build.sh +``` + +具体地,`tools/build.sh`中内容如下。 + +```shell +OPENCV_DIR=your_opencv_dir +LIB_DIR=your_paddle_inference_dir +CUDA_LIB_DIR=your_cuda_lib_dir +CUDNN_LIB_DIR=your_cudnn_lib_dir +TENSORRT_DIR=your_tensorrt_lib_dir + +BUILD_DIR=build +rm -rf ${BUILD_DIR} +mkdir ${BUILD_DIR} +cd ${BUILD_DIR} +cmake .. 
\ + -DPADDLE_LIB=${LIB_DIR} \ + -DWITH_MKL=ON \ + -DDEMO_NAME=clas_system \ + -DWITH_GPU=OFF \ + -DWITH_STATIC_LIB=OFF \ + -DWITH_TENSORRT=OFF \ + -DTENSORRT_DIR=${TENSORRT_DIR} \ + -DOPENCV_DIR=${OPENCV_DIR} \ + -DCUDNN_LIB=${CUDNN_LIB_DIR} \ + -DCUDA_LIB=${CUDA_LIB_DIR} \ + +make -j +``` + +上述命令中, + +* `OPENCV_DIR`为opencv编译安装的地址(本例中需修改为`opencv-3.4.7/opencv3`文件夹的路径); + +* `LIB_DIR`为下载的Paddle预测库(`paddle_inference`文件夹),或编译生成的Paddle预测库(`build/paddle_inference_install_dir`文件夹)的路径; + +* `CUDA_LIB_DIR`为cuda库文件地址,在docker中一般为`/usr/local/cuda/lib64`; + +* `CUDNN_LIB_DIR`为cudnn库文件地址,在docker中一般为`/usr/lib64`; + +* `TENSORRT_DIR`为tensorrt库文件地址,在docker中一般为`/usr/local/TensorRT-7.2.3.4/`,TensorRT需要结合GPU使用。 + +执行上述命令完成编译之后,会在当前路径下生成`build`文件夹,其中包含一个名为`clas_system`的可执行文件。 +
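+
+在运行 TIPC 测试脚本之前,也可以先手动执行一次编译出的可执行文件做冒烟测试(以下命令仅为示意,假设 `inference_cls.yaml` 中的模型路径与测试图片路径均已按前述步骤配置好):
+
+```shell
+# 示意:在仓库根目录下直接运行 C++ 预测程序,确认其能正常加载模型并输出 Top-K 结果
+./deploy/cpp/build/clas_system -c inference_cls.yaml
+```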
+ 
+* You can run the following command to complete everything required in the environment preparation above automatically:
+```shell
+bash test_tipc/prepare.sh test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt cpp_infer
+```
+### 2.3 Functional test
+
+
+The test is run as shown below; to test a different model, simply substitute your own parameter configuration file.
 
-#### Usage
-Run the command:
 ```shell
-python3.7 test_tipc/compare_results.py --gt_file=./test_tipc/results/cls_cpp_*.txt --log_file=./test_tipc/output/cls_cpp_*.log --atol=1e-3 --rtol=1e-3
+bash test_tipc/test_inference_cpp.sh ${your_params_file}
 ```
 
-Parameters:
-- gt_file: path to the previously saved ground-truth prediction results; *.txt endings are supported and *.txt files are indexed automatically; the files are stored under the test_tipc/result/ folder by default
-- log_file: path to the prediction log saved by the infer mode of the test_tipc/test_inference_cpp.sh script; the log contains prediction results such as text boxes, predicted text, categories and so on; cpp_infer_*.log patterns are also accepted
-- atol: the configured absolute tolerance
-- rtol: the configured relative tolerance
+Taking the `Linux GPU/CPU C++ inference test` of `ResNet50` as an example, the command is as follows.
 
-#### Results
+```shell
+bash test_tipc/test_inference_cpp.sh test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt
+```
 
-A normal run looks like the following figure:
- 
+Output like the following indicates the command ran successfully.
 
-Output when the results are inconsistent:
- 
+```shell
+Run successfully with command - ./deploy/cpp/build/clas_system -c inference_cls.yaml > ./test_tipc/output/ResNet50/cls_cpp_infer_gpu_usetrt_False_precision_fp32_batchsize_1.log 2>&1!
+Run successfully with command - ./deploy/cpp/build/clas_system -c inference_cls.yaml > ./test_tipc/output/ResNet50/cls_cpp_infer_cpu_usemkldnn_False_threads_1_precision_fp32_batchsize_1.log 2>&1!
+```
+The final log prints the results, as shown below.
+```log
+You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found. Ignore this if TensorRT is not needed.
+=======Paddle Class inference config======
+Global:
+  infer_imgs: ./deploy/images/ILSVRC2012_val_00000010.jpeg
+  inference_model_dir: ./deploy/models/ResNet50_infer
+  batch_size: 1
+  use_gpu: True
+  enable_mkldnn: True
+  cpu_num_threads: 10
+  enable_benchmark: True
+  use_fp16: False
+  ir_optim: True
+  use_tensorrt: False
+  gpu_mem: 8000
+  enable_profile: False
+PreProcess:
+  transform_ops:
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 0.00392157
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ""
+        channel_num: 3
+    - ToCHWImage: ~
+PostProcess:
+  main_indicator: Topk
+  Topk:
+    topk: 5
+    class_id_map_file: ./ppcls/utils/imagenet1k_label_list.txt
+  SavePreLabel:
+    save_dir: ./pre_label/
+=======End of Paddle Class inference config======
+img_file_list length: 1
+Current image path: ./deploy/images/ILSVRC2012_val_00000010.jpeg
+Current total inferen time cost: 5449.39 ms.
+	Top1: class_id: 153, score: 0.4144, label: Maltese dog, Maltese terrier, Maltese
+	Top2: class_id: 332, score: 0.3909, label: Angora, Angora rabbit
+	Top3: class_id: 229, score: 0.0514, label: Old English sheepdog, bobtail
+	Top4: class_id: 204, score: 0.0430, label: Lhasa, Lhasa apso
+	Top5: class_id: 265, score: 0.0420, label: toy poodle
+
-## 3. More tutorials
+```
+The detailed logs are located at `./test_tipc/output/ResNet50/cls_cpp_infer_gpu_usetrt_False_precision_fp32_batchsize_1.log` and `./test_tipc/output/ResNet50/cls_cpp_infer_cpu_usemkldnn_False_threads_1_precision_fp32_batchsize_1.log`.
 
-This document is for functional testing; for a more detailed C++ inference tutorial, see: [Server-side C++ inference](../../docs/zh_CN/inference_deployment/)
+If a run fails, the failure log and the corresponding command are also printed in the terminal; you can use that command to analyze the cause of the failure.
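+
+For quick triage over many runs, the per-command status lines are also collected in `results_cpp.log` under the same output directory (path assumed from the `LOG_PATH` setting in `test_tipc/test_inference_cpp.sh`):
+
+```shell
+# list which commands passed and which failed for this model
+grep -E "Run (successfully|failed)" ./test_tipc/output/ResNet50/results_cpp.log
+```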
diff --git a/test_tipc/docs/test_paddle2onnx.md b/test_tipc/docs/test_paddle2onnx.md
new file mode 100644
index 0000000000000000000000000000000000000000..f3c292cf95d39ec3a6b9842eaf22f1e706e29f78
--- /dev/null
+++ b/test_tipc/docs/test_paddle2onnx.md
@@ -0,0 +1,52 @@
+# Paddle2ONNX prediction functional test
+
+The main program for the Paddle2ONNX prediction functional test is `test_paddle2onnx.sh`, which tests the Paddle2ONNX model conversion feature and verifies its correctness.
+
+## 1. Summary of test conclusions
+
+Depending on whether quantization is used in training, the models covered by this test fall into `normal models` and `quantized models`; the Paddle2ONNX prediction support for these two categories is summarized below:
+
+| Model type | device |
+| ---- | ---- |
+| normal model | GPU |
+| normal model | CPU |
+
+
+## 2. Test procedure
+
+The following uses the paddle2onnx test of the `ResNet50` model as an example.
+
+### 2.1 Functional test
+First run `prepare.sh` to prepare the data and model, then run `test_paddle2onnx.sh` for the test; log files with the `paddle2onnx_infer_*.log` suffix are generated in the `test_tipc/output/ResNet50` directory.
+The test commands and results for `ResNet50` are shown below.
+
+```shell
+bash test_tipc/prepare.sh ./test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt paddle2onnx_infer
+
+# usage:
+bash test_tipc/test_paddle2onnx.sh ./test_tipc/config/ResNet/ResNet50_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt
+```
+
+#### Results
+
+The status of each test is recorded in `./test_tipc/output/ResNet50/results_paddle2onnx.log`:
+On success it prints:
+
+```
+Run successfully with command - paddle2onnx --model_dir=./deploy/models/ResNet50_infer/ --model_filename=inference.pdmodel --params_filename=inference.pdiparams --save_file=./deploy/models/ResNet50_infer/inference.onnx --opset_version=10 --enable_onnx_checker=True!
+Run successfully with command - cd deploy && python3.7 ./python/predict_cls.py -o Global.inference_model_dir=./models/ResNet50_infer -o Global.use_onnx=True -o Global.use_gpu=False -c=configs/inference_cls.yaml > ../test_tipc/output/ResNet50/paddle2onnx_infer_cpu.log 2>&1 && cd ../!
+
+```
+
+On failure it prints:
+
+```
+Run failed with command - paddle2onnx --model_dir=./deploy/models/ResNet50_infer/ --model_filename=inference.pdmodel --params_filename=inference.pdiparams --save_file=./deploy/models/ResNet50_infer/inference.onnx --opset_version=10 --enable_onnx_checker=True!
+Run failed with command - cd deploy && python3.7 ./python/predict_cls.py -o Global.inference_model_dir=./models/ResNet50_infer -o Global.use_onnx=True -o Global.use_gpu=False -c=configs/inference_cls.yaml > ../test_tipc/output/ResNet50/paddle2onnx_infer_cpu.log 2>&1 && cd ../!
+...
+```
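+
+When the conversion step fails, it can help to rerun it in isolation; the sketch below mirrors the logged command above (it assumes the model files were already downloaded by `prepare.sh`):
+
+```shell
+# convert the Paddle inference model to ONNX by hand, with the ONNX checker enabled
+paddle2onnx --model_dir=./deploy/models/ResNet50_infer/ \
+    --model_filename=inference.pdmodel \
+    --params_filename=inference.pdiparams \
+    --save_file=./deploy/models/ResNet50_infer/inference.onnx \
+    --opset_version=10 \
+    --enable_onnx_checker=True
+```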
+
+
+## 3. More tutorials
+
+This document is for functional testing; for a more detailed Paddle2ONNX prediction tutorial, see: [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)
diff --git a/test_tipc/docs/test_serving_infer_cpp.md b/test_tipc/docs/test_serving_infer_cpp.md
new file mode 100644
index 0000000000000000000000000000000000000000..2018f4c9215d91d2e4b006743e54293315fdf346
--- /dev/null
+++ b/test_tipc/docs/test_serving_infer_cpp.md
@@ -0,0 +1,113 @@
+# Linux GPU/CPU C++ serving deployment test
+
+The main program for the Linux GPU/CPU C++ serving deployment test is `test_serving_infer_cpp.sh`, which tests C++-based model serving deployment.
+
+
+## 1. Summary of test conclusions
+
+- Inference:
+
+| Algorithm | Model | device_CPU | device_GPU |
+| :-------------: | :---------------------------------------: | :--------: | :--------: |
+| MobileNetV3 | MobileNetV3_large_x1_0 | Supported | Supported |
+| MobileNetV3 | MobileNetV3_large_x1_0_KL | Supported | Supported |
+| PP-ShiTu | PPShiTu_general_rec, PPShiTu_mainbody_det | Supported | Supported |
+| PPHGNet | PPHGNet_small | Supported | Supported |
+| PPHGNet | PPHGNet_tiny | Supported | Supported |
+| PPLCNet | PPLCNet_x0_25 | Supported | Supported |
+| PPLCNet | PPLCNet_x0_35 | Supported | Supported |
+| PPLCNet | PPLCNet_x0_5 | Supported | Supported |
+| PPLCNet | PPLCNet_x0_75 | Supported | Supported |
+| PPLCNet | PPLCNet_x1_0 | Supported | Supported |
+| PPLCNet | PPLCNet_x1_5 | Supported | Supported |
+| PPLCNet | PPLCNet_x2_0 | Supported | Supported |
+| PPLCNet | PPLCNet_x2_5 | Supported | Supported |
+| PPLCNetV2 | PPLCNetV2_base | Supported | Supported |
+| ResNet | ResNet50 | Supported | Supported |
+| ResNet | ResNet50_vd | Supported | Supported |
+| ResNet | ResNet50_vd_KL | Supported | Supported |
+| SwinTransformer | SwinTransformer_tiny_patch4_window7_224 | Supported | Supported |
+
+
+## 2. Test procedure
+
+### 2.1 Prepare the data
+
+The classification models use `./deploy/paddleserving/daisy.jpg` as the default test input image; no download is needed.
+The recognition models use `drink_dataset_v1.0/test_images/001.jpeg` as the default test input image; it is downloaded in **2.2 Prepare the environment**.
+
+### 2.2 Prepare the environment
+
+
+- Install PaddlePaddle: if you already have paddlepaddle 2.2 or later installed, you do not need to run the installation commands below.
+    ```shell
+    # Paddle 2.2 or later is required
+    # install the GPU build of Paddle
+    python3.7 -m pip install paddlepaddle-gpu==2.2.0
+    # install the CPU build of Paddle
+    python3.7 -m pip install paddlepaddle==2.2.0
+    ```
+
+- Install dependencies
+    ```shell
+    python3.7 -m pip install -r requirements.txt
+    ```
+
+- Install TensorRT
+    The script that compiles serving-server sets the `TENSORRT_LIBRARY_PATH` environment variable, so TensorRT must be installed before compiling.
+
+    If you test inside the `registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82` image, TensorRT is already included and no installation is needed;
+    otherwise, install it following [3.2 Install TensorRT](install.md#32-安装tensorrt) and change `TENSORRT_LIBRARY_PATH` in [build_server.sh](../../deploy/paddleserving/build_server.sh#L62) to the installed path.
+
+- Install the PaddleServing components, including serving_client and serving-app, compile and install the serving_server package with custom OPs, and download and extract the inference models
+  ```bash
+  # install the required dependency packages
+  python3.7 -m pip install paddle_serving_client==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+  python3.7 -m pip install paddle-serving-app==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+  # compile and install the serving-server package with custom OPs
+  pushd ./deploy/paddleserving
+  source build_server.sh python3.7
+  popd
+
+  # the faiss package is required when testing the PP-ShiTu recognition model
+  python3.7 -m pip install faiss-cpu==1.7.1post2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+  # download models and data
+  bash test_tipc/prepare.sh test_tipc/configs/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt serving_infer
+  ```
+
+### 2.3 Functional test
+
+The test is run as shown below; to test a different model, simply substitute your own parameter configuration file.
+
+```bash
+bash test_tipc/test_serving_infer_cpp.sh ${your_params_file} ${mode}
+```
+
+Taking the `Linux GPU/CPU C++ serving deployment test` of `PPLCNet_x1_0` as an example, the command is as follows.
+
+
+```bash
+bash test_tipc/test_serving_infer_cpp.sh test_tipc/configs/PPLCNet/PPLCNet_x1_0_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt serving_infer
+```
+
+Output like the following indicates the command ran successfully.
+
+```
+Run successfully with command - PPLCNet_x1_0 - python3.7 test_cpp_serving_client.py > ../../test_tipc/output/PPLCNet_x1_0/server_infer_cpp_gpu_pipeline_batchsize_1.log 2>&1 !
+Run successfully with command - PPLCNet_x1_0 - python3.7 test_cpp_serving_client.py > ../../test_tipc/output/PPLCNet_x1_0/server_infer_cpp_cpu_pipeline_batchsize_1.log 2>&1 !
+```
+
+The prediction results are automatically saved in `./test_tipc/output/PPLCNet_x1_0/server_infer_cpp_gpu_pipeline_batchsize_1.log`, where you can see the PaddleServing output:
+
+```
+WARNING: Logging before InitGoogleLogging() is written to STDERR
+I0612 09:55:16.109890 38303 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9292"): added 1
+I0612 09:55:16.172924 38303 general_model.cpp:490] [client]logid=0,client_cost=60.772ms,server_cost=57.6ms.
+prediction: daisy, probability: 0.9099399447441101
+0.06275796890258789
+```
+
+
+If a run fails, the failure log and the corresponding command are also printed in the terminal; you can use that command to analyze the cause of the failure.
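+
+If a run is interrupted, a serving process may keep listening on the port and block the next test; the harness shuts servers down with the command below, which can also be run manually (assuming the same Python interpreter):
+
+```shell
+# stop any paddle_serving_server instance started by the test
+python3.7 -m paddle_serving_server.serve stop
+```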
diff --git a/test_tipc/docs/test_serving_infer_python.md b/test_tipc/docs/test_serving_infer_python.md
new file mode 100644
index 0000000000000000000000000000000000000000..1ba4c1516dc07f2a88aa406c9316e871fd7c8a8b
--- /dev/null
+++ b/test_tipc/docs/test_serving_infer_python.md
@@ -0,0 +1,99 @@
+# Linux GPU/CPU PYTHON serving deployment test
+
+The main program for the Linux GPU/CPU PYTHON serving deployment test is `test_serving_infer_python.sh`, which tests Python-based model serving deployment.
+
+
+## 1. Summary of test conclusions
+
+- Inference:
+
+| Algorithm | Model | device_CPU | device_GPU |
+| :-------------: | :---------------------------------------: | :--------: | :--------: |
+| MobileNetV3 | MobileNetV3_large_x1_0 | Supported | Supported |
+| MobileNetV3 | MobileNetV3_large_x1_0_KL | Supported | Supported |
+| PP-ShiTu | PPShiTu_general_rec, PPShiTu_mainbody_det | Supported | Supported |
+| PPHGNet | PPHGNet_small | Supported | Supported |
+| PPHGNet | PPHGNet_tiny | Supported | Supported |
+| PPLCNet | PPLCNet_x0_25 | Supported | Supported |
+| PPLCNet | PPLCNet_x0_35 | Supported | Supported |
+| PPLCNet | PPLCNet_x0_5 | Supported | Supported |
+| PPLCNet | PPLCNet_x0_75 | Supported | Supported |
+| PPLCNet | PPLCNet_x1_0 | Supported | Supported |
+| PPLCNet | PPLCNet_x1_5 | Supported | Supported |
+| PPLCNet | PPLCNet_x2_0 | Supported | Supported |
+| PPLCNet | PPLCNet_x2_5 | Supported | Supported |
+| PPLCNetV2 | PPLCNetV2_base | Supported | Supported |
+| ResNet | ResNet50 | Supported | Supported |
+| ResNet | ResNet50_vd | Supported | Supported |
+| ResNet | ResNet50_vd_KL | Supported | Supported |
+| SwinTransformer | SwinTransformer_tiny_patch4_window7_224 | Supported | Supported |
+
+
+## 2. Test procedure
+
+### 2.1 Prepare the data
+
+The classification models use `./deploy/paddleserving/daisy.jpg` as the default test input image; no download is needed.
+The recognition models use `drink_dataset_v1.0/test_images/001.jpeg` as the default test input image; it is downloaded in **2.2 Prepare the environment**.
+
+### 2.2 Prepare the environment
+
+
+- Install PaddlePaddle: if you already have paddlepaddle 2.2 or later installed, you do not need to run the installation commands below.
+    ```shell
+    # Paddle 2.2 or later is required
+    # install the GPU build of Paddle
+    python3.7 -m pip install paddlepaddle-gpu==2.2.0
+    # install the CPU build of Paddle
+    python3.7 -m pip install paddlepaddle==2.2.0
+    ```
+
+- Install dependencies
+    ```shell
+    python3.7 -m pip install -r requirements.txt
+    ```
+
+- Install the PaddleServing components, including serving-server, serving_client and serving-app, and download and extract the inference models
+  ```bash
+  # install the required dependency packages
+  python3.7 -m pip install paddle_serving_client==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+  python3.7 -m pip install paddle-serving-app==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+  python3.7 -m pip install paddle-serving-server-gpu==0.9.0.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+  # the faiss package is required when testing the PP-ShiTu recognition model
+  python3.7 -m pip install faiss-cpu==1.7.1post2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+  # download models and data
+  bash test_tipc/prepare.sh test_tipc/configs/ResNet50/ResNet50_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt serving_infer
+  ```
+
+### 2.3 Functional test
+
+The test is run as shown below; to test a different model, simply substitute your own parameter configuration file.
+
+```bash
+bash test_tipc/test_serving_infer_python.sh ${your_params_file} ${mode}
+```
+
+Taking the `Linux GPU/CPU PYTHON serving deployment test` of `ResNet50` as an example, the command is as follows.
+
+
+```bash
+bash test_tipc/test_serving_infer_python.sh test_tipc/configs/ResNet50/ResNet50_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt serving_infer
+```
+
+Output like the following indicates the command ran successfully.
+
+```
+Run successfully with command - python3.7 pipeline_http_client.py > ../../test_tipc/output/ResNet50/server_infer_gpu_pipeline_http_batchsize_1.log 2>&1!
+Run successfully with command - python3.7 pipeline_http_client.py > ../../test_tipc/output/ResNet50/server_infer_cpu_pipeline_http_batchsize_1.log 2>&1 !
+```
+
+The prediction results are automatically saved in `./test_tipc/output/ResNet50/server_infer_gpu_pipeline_http_batchsize_1.log`, where you can see the PaddleServing output:
+
+```
+{'err_no': 0, 'err_msg': '', 'key': ['label', 'prob'], 'value': ["['daisy']", '[0.998314619064331]']}
+```
+
+
+If a run fails, the failure log and the corresponding command are also printed in the terminal; you can use that command to analyze the cause of the failure.
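+
+Leftover server processes can also block reruns of the Python serving test; a cleanup sketch borrowed from the retired `test_serving.sh` script (shown being deleted later in this patch):
+
+```shell
+# kill any web_service / pipeline processes left over from a previous run
+ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9
+```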
diff --git a/test_tipc/docs/test_train_fleet_inference_python.md b/test_tipc/docs/test_train_fleet_inference_python.md
new file mode 100644
index 0000000000000000000000000000000000000000..256d722f2b34411f77f228a33ec63c3c64ee2a19
--- /dev/null
+++ b/test_tipc/docs/test_train_fleet_inference_python.md
@@ -0,0 +1,121 @@
+# Linux GPU/CPU multi-machine multi-GPU training and inference test
+
+The main program for the Linux GPU/CPU multi-machine multi-GPU training and inference test is `test_train_inference_python.sh`, which tests basic Python-based multi-machine multi-GPU model training, evaluation and inference.
+
+## 1. Summary of test conclusions
+
+- Training:
+
+  | Algorithm | Model | Multi-machine multi-GPU |
+  | :-------: | :-----------------: | :--------: |
+  | PPLCNet   | PPLCNet_x1_0        | distributed training |
+  | PPLCNetV2 | PPLCNetV2_base      | distributed training |
+  | PPHGNet   | PPHGNet_small       | distributed training |
+  | PP-ShiTu  | PPShiTu_general_rec | distributed training |
+
+
+- Inference:
+
+  | Algorithm | Model | device_CPU | device_GPU | batchsize |
+  | :-------: | :-----------------: | :--------: | :--------: | :-------: |
+  | PPLCNet   | PPLCNet_x1_0        | Supported | Supported | 1 |
+  | PPLCNetV2 | PPLCNetV2_base      | Supported | Supported | 1 |
+  | PPHGNet   | PPHGNet_small       | Supported | Supported | 1 |
+  | PP-ShiTu  | PPShiTu_general_rec | Supported | Supported | 1 |
+
+
+## 2. Test procedure
+
+For environment setup, configure the TIPC runtime environment following the [documentation](./install.md).
+
+**The walkthrough below uses the PPLCNet_x1_0 model as an example.**
+
+### 2.1 Functional test
+
+#### 2.1.1 Modify the configuration file
+
+First, modify the `gpu_list` setting in the configuration file `test_tipc/config/PPLCNet/PPLCNet_x1_0_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt`: assuming the `ip` addresses of the two machines are `192.168.0.1` and `192.168.0.2`, the `gpu_list` field must be changed to `gpu_list:192.168.0.1,192.168.0.2;0,1`.
+
+**The `ip` address can be checked with `ifconfig`; it is the value after the `inet addr:` field.**
+
+
+#### 2.1.2 Prepare the data
+
+Run `prepare.sh` to prepare the data and models; the data preparation command is shown below.
+
+```shell
+bash test_tipc/prepare.sh test_tipc/config/PPLCNet/PPLCNet_x1_0_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt lite_train_lite_infer
+```
+
+**Note:** since this is multi-machine training, the above command must be run once on every node to prepare the data.
+
+#### 2.1.3 Set the start port and run the test
+
+On the nodes, set the start port for distributed training with the command below (otherwise the later run may hang because no usable port can be found); a value between `10000` and `20000` is recommended.
+
+```shell
+export FLAGS_START_PORT=17000
+```
+**Note:** the start-port command above must likewise be executed once on every node.
+
+The test can then be started with the following command.
+```shell
+bash test_tipc/test_train_inference_python.sh test_tipc/config/PPLCNet/PPLCNet_x1_0_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt
+```
+
+**Note:** since this is multi-machine training, the above test command must be run on all nodes.
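+
+Before launching, it can save time to confirm that the chosen start port is actually free on every node (a sketch using standard tools; port 17000 follows the example above):
+
+```shell
+# no output means nothing is listening on port 17000 on this node
+lsof -i:17000
+```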
+
+
+#### 2.1.4 Output
+
+The output is saved in `test_tipc/output/PPLCNet_x1_0/results_python.log`; lines starting with `Run successfully` indicate the test command passed, otherwise the test failed. The content looks like the following.
+
+```bash
+Run successfully with command - python3.7 -m paddle.distributed.launch --ips=192.168.0.1,192.168.0.2 --gpus=0,1 tools/train.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Global.device=gpu -o Global.output_dir=./test_tipc/output/PPLCNet_x1_0/norm_train_gpus_0,1_autocast_null_nodes_2 -o Global.epochs=2 -o DataLoader.Train.sampler.batch_size=8 !
+...
+...
+Run successfully with command - python3.7 python/predict_cls.py -c configs/inference_cls.yaml -o Global.use_gpu=False -o Global.enable_mkldnn=True -o Global.cpu_num_threads=1 -o Global.inference_model_dir=.././test_tipc/output/PPLCNet_x1_0/norm_train_gpus_0,1_autocast_null_nodes_2 -o Global.batch_size=16 -o Global.infer_imgs=../dataset/ILSVRC2012/val -o Global.benchmark=True > .././test_tipc/output/PPLCNet_x1_0/infer_cpu_usemkldnn_True_threads_1_batchsize_16.log 2>&1 !
+```
+
+The configuration file sets `-o Global.benchmark=True` by default, which enables the benchmark option and yields detailed test data: environment information (OS version, CUDA version, CUDNN version, driver version), Paddle version information, parameter settings (device, thread count, whether memory optimization is enabled, etc.), model information (model name, precision), data information (batch size, whether shapes are dynamic, etc.) and performance information (CPU/GPU usage, total runtime, preprocessing, inference and postprocessing time), as shown below:
+
+```log
+[2022/06/07 17:01:41] root INFO: ---------------------- Env info ----------------------
+[2022/06/07 17:01:41] root INFO: OS_version: CentOS 6.10
+[2022/06/07 17:01:41] root INFO: CUDA_version: 10.1.243
+[2022/06/07 17:01:41] root INFO: CUDNN_version: None.None.None
+[2022/06/07 17:01:41] root INFO: drivier_version: 460.32.03
+[2022/06/07 17:01:41] root INFO: ---------------------- Paddle info ----------------------
+[2022/06/07 17:01:41] root INFO: paddle_version: 2.3.0-rc0
+[2022/06/07 17:01:41] root INFO: paddle_version: 2.3.0-rc0
+[2022/06/07 17:01:41] root INFO: paddle_commit: 5d4980c052583fec022812d9c29460aff7cdc18b
+[2022/06/07 17:01:41] root INFO: log_api_version: 1.0
+[2022/06/07 17:01:41] root INFO: ----------------------- Conf info -----------------------
+[2022/06/07 17:01:41] root INFO: runtime_device: cpu
+[2022/06/07 17:01:41] root INFO: ir_optim: True
+[2022/06/07 17:01:41] root INFO: enable_memory_optim: True
+[2022/06/07 17:01:41] root INFO: enable_tensorrt: False
+[2022/06/07 17:01:41] root INFO: enable_mkldnn: False
+[2022/06/07 17:01:41] root INFO: cpu_math_library_num_threads: 6
+[2022/06/07 17:01:41] root INFO: ----------------------- Model info ----------------------
+[2022/06/07 17:01:41] root INFO: model_name: cls
+[2022/06/07 17:01:41] root INFO: precision: fp32
+[2022/06/07 17:01:41] root INFO: ----------------------- Data info -----------------------
+[2022/06/07 17:01:41] root INFO: batch_size: 16
+[2022/06/07 17:01:41] root INFO: input_shape: [3, 224, 224]
+[2022/06/07 17:01:41] root INFO: data_num: 3
+[2022/06/07 17:01:41] root INFO: ----------------------- Perf info -----------------------
+[2022/06/07 17:01:41] root INFO: cpu_rss(MB): 726.5586, gpu_rss(MB): None, gpu_util: None%
+[2022/06/07 17:01:41] root INFO: total time spent(s): 0.3527
+[2022/06/07 17:01:41] root INFO: preprocess_time(ms): 33.2723, inference_time(ms): 317.9824, postprocess_time(ms): 1.4579
+```
+
+This information can also be found in the run log, located at `test_tipc/output/PPLCNet_x1_0/infer_gpu_usetrt_True_precision_True_batchsize_1.log`.
+
+If a run fails, the failure log and the corresponding command are also printed in the terminal; you can use that command to analyze the cause of the failure.
+
+**Note:** during distributed training the model is saved only on the node where `trainer_id=0`, so model export and inference on the other nodes will report errors because the saved model cannot be found; this is expected.
diff --git a/test_tipc/generate_cpp_yaml.py b/test_tipc/generate_cpp_yaml.py
index 2e541de33a47bb3a940a3d5fadc0ddf436bb50b9..fdd5ee2e2fa2c31085a88dca0479a01a944e10cb 100644
--- a/test_tipc/generate_cpp_yaml.py
+++ b/test_tipc/generate_cpp_yaml.py
@@ -66,6 +66,10 @@ def main():
                                                  "test_images")
         config["IndexProcess"]["index_dir"] = os.path.join(args.data_dir,
                                                            "index")
+        config["IndexProcess"]["image_root"] = os.path.join(args.data_dir,
+                                                            "gallery")
+        config["IndexProcess"]["data_file"] = os.path.join(args.data_dir,
+                                                           "drink_label.txt")
     assert args.cls_model_dir
     assert args.det_model_dir
     config["Global"]["det_inference_model_dir"] = 
args.det_model_dir diff --git a/test_tipc/prepare.sh b/test_tipc/prepare.sh index 70040dc8b28656f7fb3e1384f840f068437dcf7e..f2e48710d884a07d0282cfaac225b1b9f7a31dcd 100644 --- a/test_tipc/prepare.sh +++ b/test_tipc/prepare.sh @@ -12,7 +12,7 @@ dataline=$(cat ${FILENAME}) IFS=$'\n' lines=(${dataline}) -function func_parser_key(){ +function func_parser_key() { strs=$1 IFS=":" array=(${strs}) @@ -20,104 +20,148 @@ function func_parser_key(){ echo ${tmp} } -function func_parser_value(){ +function func_parser_value() { strs=$1 IFS=":" array=(${strs}) if [ ${#array[*]} = 2 ]; then echo ${array[1]} else - IFS="|" - tmp="${array[1]}:${array[2]}" + IFS="|" + tmp="${array[1]}:${array[2]}" echo ${tmp} fi } -function func_get_url_file_name(){ +function func_get_url_file_name() { strs=$1 IFS="/" array=(${strs}) - tmp=${array[${#array[@]}-1]} + tmp=${array[${#array[@]} - 1]} echo ${tmp} } model_name=$(func_parser_value "${lines[1]}") -if [ ${MODE} = "cpp_infer" ];then - if [[ $FILENAME == *infer_cpp_linux_gpu_cpu.txt ]];then - cpp_type=$(func_parser_value "${lines[2]}") - cls_inference_model_dir=$(func_parser_value "${lines[3]}") - det_inference_model_dir=$(func_parser_value "${lines[4]}") - cls_inference_url=$(func_parser_value "${lines[5]}") - det_inference_url=$(func_parser_value "${lines[6]}") - - if [[ $cpp_type == "cls" ]];then - eval "wget -nc $cls_inference_url" - tar xf "${model_name}_inference.tar" - eval "mv inference $cls_inference_model_dir" - cd dataset - rm -rf ILSVRC2012 - wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_infer.tar - tar xf whole_chain_infer.tar - ln -s whole_chain_infer ILSVRC2012 - cd .. - elif [[ $cpp_type == "shitu" ]];then - eval "wget -nc $cls_inference_url" - tar_name=$(func_get_url_file_name "$cls_inference_url") - model_dir=${tar_name%.*} - eval "tar xf ${tar_name}" - eval "mv ${model_dir} ${cls_inference_model_dir}" - - eval "wget -nc $det_inference_url" - tar_name=$(func_get_url_file_name "$det_inference_url") - model_dir=${tar_name%.*} - eval "tar xf ${tar_name}" - eval "mv ${model_dir} ${det_inference_model_dir}" - cd dataset - wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar - tar -xf drink_dataset_v1.0.tar - else - echo "Wrong cpp type in config file in line 3. only support cls, shitu" - fi - exit 0 - else - echo "use wrong config file" - exit 1 - fi +if [[ ${MODE} = "cpp_infer" ]]; then + if [ -d "./deploy/cpp/opencv-3.4.7/opencv3/" ] && [ $(md5sum ./deploy/cpp/opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ]; then + echo "################### build opencv skipped ###################" + else + echo "################### build opencv ###################" + rm -rf ./deploy/cpp/opencv-3.4.7.tar.gz ./deploy/cpp/opencv-3.4.7/ + pushd ./deploy/cpp/ + wget -nc https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/opencv-3.4.7.tar.gz + tar -xf opencv-3.4.7.tar.gz + + cd opencv-3.4.7/ + install_path=$(pwd)/opencv3 + rm -rf build + mkdir build + cd build + + cmake .. 
\ + -DCMAKE_INSTALL_PREFIX=${install_path} \ + -DCMAKE_BUILD_TYPE=Release \ + -DBUILD_SHARED_LIBS=OFF \ + -DWITH_IPP=OFF \ + -DBUILD_IPP_IW=OFF \ + -DWITH_LAPACK=OFF \ + -DWITH_EIGEN=OFF \ + -DCMAKE_INSTALL_LIBDIR=lib64 \ + -DWITH_ZLIB=ON \ + -DBUILD_ZLIB=ON \ + -DWITH_JPEG=ON \ + -DBUILD_JPEG=ON \ + -DWITH_PNG=ON \ + -DBUILD_PNG=ON \ + -DWITH_TIFF=ON \ + -DBUILD_TIFF=ON + + make -j + make install + cd ../../ + popd + echo "################### build opencv finished ###################" + fi + if [[ ! -d "./deploy/cpp/paddle_inference/" ]]; then + pushd ./deploy/cpp/ + wget -nc https://paddle-inference-lib.bj.bcebos.com/2.2.2/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddle_inference.tgz + tar xf paddle_inference.tgz + popd + fi + if [[ $FILENAME == *infer_cpp_linux_gpu_cpu.txt ]]; then + cpp_type=$(func_parser_value "${lines[2]}") + cls_inference_model_dir=$(func_parser_value "${lines[3]}") + det_inference_model_dir=$(func_parser_value "${lines[4]}") + cls_inference_url=$(func_parser_value "${lines[5]}") + det_inference_url=$(func_parser_value "${lines[6]}") + + if [[ $cpp_type == "cls" ]]; then + eval "wget -nc $cls_inference_url" + tar xf "${model_name}_infer.tar" + + cd dataset + rm -rf ILSVRC2012 + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_infer.tar + tar xf whole_chain_infer.tar + ln -s whole_chain_infer ILSVRC2012 + cd .. + elif [[ $cpp_type == "shitu" ]]; then + eval "wget -nc $cls_inference_url" + tar_name=$(func_get_url_file_name "$cls_inference_url") + model_dir=${tar_name%.*} + eval "tar xf ${tar_name}" + + eval "wget -nc $det_inference_url" + tar_name=$(func_get_url_file_name "$det_inference_url") + model_dir=${tar_name%.*} + eval "tar xf ${tar_name}" + + cd dataset + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar + tar -xf drink_dataset_v1.0.tar + else + echo "Wrong cpp type in config file in line 3. 
only support cls, shitu" + fi + exit 0 + else + echo "use wrong config file" + exit 1 + fi fi model_name=$(func_parser_value "${lines[1]}") model_url_value=$(func_parser_value "${lines[35]}") model_url_key=$(func_parser_key "${lines[35]}") -if [[ $FILENAME == *GeneralRecognition* ]];then - cd dataset - rm -rf Aliproduct - rm -rf train_reg_all_data.txt - rm -rf demo_train - wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/tipc_shitu_demo_data.tar - tar -xf tipc_shitu_demo_data.tar - ln -s tipc_shitu_demo_data Aliproduct - ln -s tipc_shitu_demo_data/demo_train.txt train_reg_all_data.txt - ln -s tipc_shitu_demo_data/demo_train demo_train - cd tipc_shitu_demo_data - ln -s demo_test.txt val_list.txt - cd ../../ - eval "wget -nc $model_url_value" - mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams - exit 0 +if [[ $FILENAME == *GeneralRecognition* ]]; then + cd dataset + rm -rf Aliproduct + rm -rf train_reg_all_data.txt + rm -rf demo_train + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/tipc_shitu_demo_data.tar --no-check-certificate + tar -xf tipc_shitu_demo_data.tar + ln -s tipc_shitu_demo_data Aliproduct + ln -s tipc_shitu_demo_data/demo_train.txt train_reg_all_data.txt + ln -s tipc_shitu_demo_data/demo_train demo_train + cd tipc_shitu_demo_data + ln -s demo_test.txt val_list.txt + cd ../../ + eval "wget -nc $model_url_value --no-check-certificate" + mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams + exit 0 fi -if [[ $FILENAME == *use_dali* ]];then +if [[ $FILENAME == *use_dali* ]]; then python_name=$(func_parser_value "${lines[2]}") ${python_name} -m pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly --upgrade nvidia-dali-nightly-cuda102 fi -if [ ${MODE} = "lite_train_lite_infer" ] || [ ${MODE} = "lite_train_whole_infer" ];then +if [[ ${MODE} = "lite_train_lite_infer" ]] || [[ ${MODE} = "lite_train_whole_infer" ]]; then # pretrain lite train data cd dataset rm -rf ILSVRC2012 - wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_little_train.tar + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_little_train.tar --no-check-certificate tar xf whole_chain_little_train.tar ln -s whole_chain_little_train ILSVRC2012 cd ILSVRC2012 @@ -125,7 +169,7 @@ if [ ${MODE} = "lite_train_lite_infer" ] || [ ${MODE} = "lite_train_whole_infer" mv val.txt val_list.txt cp -r train/* val/ cd ../../ -elif [ ${MODE} = "whole_infer" ] || [ ${MODE} = "klquant_whole_infer" ];then +elif [[ ${MODE} = "whole_infer" ]] || [[ ${MODE} = "klquant_whole_infer" ]]; then # download data cd dataset rm -rf ILSVRC2012 @@ -139,15 +183,15 @@ elif [ ${MODE} = "whole_infer" ] || [ ${MODE} = "klquant_whole_infer" ];then # download inference or pretrained model eval "wget -nc $model_url_value" if [[ $model_url_key == *inference* ]]; then - rm -rf inference - tar xf "${model_name}_inference.tar" + rm -rf inference + tar xf "${model_name}_infer.tar" fi - if [[ $model_name == "SwinTransformer_large_patch4_window7_224" || $model_name == "SwinTransformer_large_patch4_window12_384" ]];then - cmd="mv ${model_name}_22kto1k_pretrained.pdparams ${model_name}_pretrained.pdparams" - eval $cmd + if [[ $model_name == "SwinTransformer_large_patch4_window7_224" || $model_name == "SwinTransformer_large_patch4_window12_384" ]]; then + cmd="mv ${model_name}_22kto1k_pretrained.pdparams 
${model_name}_pretrained.pdparams" + eval $cmd fi -elif [ ${MODE} = "whole_train_whole_infer" ];then +elif [[ ${MODE} = "whole_train_whole_infer" ]]; then cd dataset rm -rf ILSVRC2012 wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_CIFAR100.tar @@ -159,31 +203,50 @@ elif [ ${MODE} = "whole_train_whole_infer" ];then cd ../../ fi -if [ ${MODE} = "serving_infer" ];then +if [[ ${MODE} = "serving_infer" ]]; then # prepare serving env python_name=$(func_parser_value "${lines[2]}") - ${python_name} -m pip install install paddle-serving-server-gpu==0.6.1.post101 - ${python_name} -m pip install paddle_serving_client==0.6.1 - ${python_name} -m pip install paddle-serving-app==0.6.1 + if [[ ${model_name} =~ "ShiTu" ]]; then + cls_inference_model_url=$(func_parser_value "${lines[3]}") + cls_tar_name=$(func_get_url_file_name "${cls_inference_model_url}") + det_inference_model_url=$(func_parser_value "${lines[4]}") + det_tar_name=$(func_get_url_file_name "${det_inference_model_url}") + cd ./deploy + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar --no-check-certificate + tar -xf drink_dataset_v1.0.tar + mkdir models + cd models + wget -nc ${cls_inference_model_url} && tar xf ${cls_tar_name} + wget -nc ${det_inference_model_url} && tar xf ${det_tar_name} + cd .. + else + cls_inference_model_url=$(func_parser_value "${lines[3]}") + cls_tar_name=$(func_get_url_file_name "${cls_inference_model_url}") + cd ./deploy/paddleserving + wget -nc ${cls_inference_model_url} && tar xf ${cls_tar_name} + cd ../../ + fi unset http_proxy unset https_proxy - cd ./deploy/paddleserving - wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar fi -if [ ${MODE} = "paddle2onnx_infer" ];then +if [[ ${MODE} = "paddle2onnx_infer" ]]; then # prepare paddle2onnx env python_name=$(func_parser_value "${lines[2]}") - ${python_name} -m pip install install paddle2onnx - ${python_name} -m pip install onnxruntime + inference_model_url=$(func_parser_value "${lines[10]}") + tar_name=${inference_model_url##*/} - # wget model - cd deploy && mkdir models && cd models - wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar + ${python_name} -m pip install paddle2onnx + ${python_name} -m pip install onnxruntime + cd deploy + mkdir models + cd models + wget -nc ${inference_model_url} + tar xf ${tar_name} cd ../../ fi -if [ ${MODE} = "benchmark_train" ];then +if [[ ${MODE} = "benchmark_train" ]]; then pip install -r requirements.txt cd dataset rm -rf ILSVRC2012 @@ -191,6 +254,6 @@ if [ ${MODE} = "benchmark_train" ];then tar xf ILSVRC2012_val.tar ln -s ILSVRC2012_val ILSVRC2012 cd ILSVRC2012 - ln -s val_list.txt train_list.txt + ln -s val_list.txt train_list.txt cd ../../ fi diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh index 8ec70d35c3aed8814e43062f70223d4a2c5fffe8..f9f2f76665cb030fbf11414d4f313345a539b12c 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh @@ -8,5 +8,11 @@ num_workers=8 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh + +cd ./dataset/ILSVRC2012 +cat train_list.txt >> tmp +for i in {1..10}; do cat tmp >> train_list.txt; done +cd ../../ + # run bash 
test_tipc/static/${model_item}/benchmark_common/run_benchmark.sh ${model_item} ${bs_item} ${fp_item} ${run_mode} ${device_num} ${max_epochs} ${num_workers} 2>&1; diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_fp32_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_fp32_DP.sh index 6ab1ec00cfc97b9b15e392bcc08ac5cde7a896e5..9f6ab183e28ab8d7243cbcedfa93215f590913f9 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_fp32_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_fp32_DP.sh @@ -8,5 +8,11 @@ num_workers=8 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh + +cd ./dataset/ILSVRC2012 +cat train_list.txt >> tmp +for i in {1..10}; do cat tmp >> train_list.txt; done +cd ../../ + # run bash test_tipc/static/${model_item}/benchmark_common/run_benchmark.sh ${model_item} ${bs_item} ${fp_item} ${run_mode} ${device_num} ${max_epochs} ${num_workers} 2>&1; diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh index 672fb24660ba10f181098dc2c8b2cc52463bfc40..bef8186ea5e10feda6d62e1df01a41303b5c3469 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh @@ -8,5 +8,11 @@ num_workers=8 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh + +cd ./dataset/ILSVRC2012 +cat train_list.txt >> tmp +for i in {1..10}; do cat tmp >> train_list.txt; done +cd ../../ + # run bash test_tipc/static/${model_item}/benchmark_common/run_benchmark.sh ${model_item} ${bs_item} ${fp_item} ${run_mode} ${device_num} ${max_epochs} ${num_workers} 2>&1; diff --git a/test_tipc/test_inference_cpp.sh b/test_tipc/test_inference_cpp.sh index 129e439562980a233924995141ea864d052f6dfb..abba66503f0ceede4cc17f0d3fb93811a4f50a11 100644 --- a/test_tipc/test_inference_cpp.sh +++ b/test_tipc/test_inference_cpp.sh @@ -6,19 +6,18 @@ GPUID=$2 if [[ ! 
$GPUID ]];then GPUID=0 fi -dataline=$(awk 'NR==1, NR==16{print}' $FILENAME) +dataline=$(awk 'NR==1, NR==19{print}' $FILENAME) # parser params IFS=$'\n' lines=(${dataline}) - -# parser cpp inference model +# parser cpp inference model model_name=$(func_parser_value "${lines[1]}") cpp_infer_type=$(func_parser_value "${lines[2]}") cpp_infer_model_dir=$(func_parser_value "${lines[3]}") cpp_det_infer_model_dir=$(func_parser_value "${lines[4]}") cpp_infer_is_quant=$(func_parser_value "${lines[7]}") -# parser cpp inference +# parser cpp inference inference_cmd=$(func_parser_value "${lines[8]}") cpp_use_gpu_list=$(func_parser_value "${lines[9]}") cpp_use_mkldnn_list=$(func_parser_value "${lines[10]}") @@ -31,7 +30,7 @@ cpp_benchmark_value=$(func_parser_value "${lines[16]}") generate_yaml_cmd=$(func_parser_value "${lines[17]}") transform_index_cmd=$(func_parser_value "${lines[18]}") -LOG_PATH="./test_tipc/output" +LOG_PATH="./test_tipc/output/${model_name}" mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results_cpp.log" # generate_yaml_cmd="python3 test_tipc/generate_cpp_yaml.py" @@ -43,7 +42,7 @@ function func_shitu_cpp_inference(){ _log_path=$3 _img_dir=$4 _flag_quant=$5 - # inference + # inference for use_gpu in ${cpp_use_gpu_list[*]}; do if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then @@ -58,14 +57,13 @@ function func_shitu_cpp_inference(){ precison="int8" fi _save_log_path="${_log_path}/shitu_cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log" - - command="${generate_yaml_cmd} --type shitu --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --det_model_dir ${cpp_det_infer_model_dir} --gpu_id ${GPUID}" - eval $command - eval $transform_index_cmd - command="${_script} 2>&1|tee ${_save_log_path}" - eval $command + eval $transform_index_cmd + command="${generate_yaml_cmd} --type shitu --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --det_model_dir ${cpp_det_infer_model_dir} --gpu_id ${GPUID}" + eval $command + command="${_script} > ${_save_log_path} 2>&1" + eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${command}" "${status_log}" + status_check $last_status "${command}" "${status_log}" "${model_name}" done done done @@ -74,7 +72,7 @@ function func_shitu_cpp_inference(){ for precision in ${cpp_precision_list[*]}; do if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then continue - fi + fi if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then continue fi @@ -83,13 +81,13 @@ function func_shitu_cpp_inference(){ fi for batch_size in ${cpp_batch_size_list[*]}; do _save_log_path="${_log_path}/shitu_cpp_infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log" - command="${generate_yaml_cmd} --type shitu --batch_size ${batch_size} --mkldnn False --gpu ${use_gpu} --cpu_thread 1 --tensorrt ${use_trt} --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --det_model_dir ${cpp_det_infer_model_dir} --gpu_id ${GPUID}" + eval $transform_index_cmd + command="${generate_yaml_cmd} --type shitu --batch_size ${batch_size} --mkldnn False --gpu ${use_gpu} --cpu_thread 1 --tensorrt 
${use_trt} --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --det_model_dir ${cpp_det_infer_model_dir} --gpu_id ${GPUID}" + eval $command + command="${_script} > ${_save_log_path} 2>&1" eval $command - eval $transform_index_cmd - command="${_script} 2>&1|tee ${_save_log_path}" - eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${_script}" "${status_log}" + status_check $last_status "${command}" "${status_log}" "${model_name}" done done done @@ -106,7 +104,7 @@ function func_cls_cpp_inference(){ _log_path=$3 _img_dir=$4 _flag_quant=$5 - # inference + # inference for use_gpu in ${cpp_use_gpu_list[*]}; do if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then @@ -122,12 +120,12 @@ function func_cls_cpp_inference(){ fi _save_log_path="${_log_path}/cls_cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log" - command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}" - eval $command - command1="${_script} 2>&1|tee ${_save_log_path}" - eval ${command1} + command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}" + eval $command + command1="${_script} > ${_save_log_path} 2>&1" + eval ${command1} last_status=${PIPESTATUS[0]} - status_check $last_status "${command1}" "${status_log}" + status_check $last_status "${command1}" "${status_log}" "${model_name}" done done done @@ -136,7 +134,7 @@ function func_cls_cpp_inference(){ for precision in ${cpp_precision_list[*]}; do if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then continue - fi + fi if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then continue fi @@ -145,12 +143,12 @@ function func_cls_cpp_inference(){ fi for batch_size in ${cpp_batch_size_list[*]}; do _save_log_path="${_log_path}/cls_cpp_infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log" - command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn False --gpu ${use_gpu} --cpu_thread 1 --tensorrt ${use_trt} --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}" + command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn False --gpu ${use_gpu} --cpu_thread 1 --tensorrt ${use_trt} --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}" + eval $command + command="${_script} > ${_save_log_path} 2>&1" eval $command - command="${_script} 2>&1|tee ${_save_log_path}" - eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${command}" "${status_log}" + status_check $last_status "${command}" "${status_log}" "${model_name}" done done done @@ -195,49 +193,11 @@ if [[ $cpp_infer_type == "shitu" ]]; then cd .. 
fi -if [ -d "opencv-3.4.7/opencv3/" ] && [ $(md5sum opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ];then - echo "################### build opencv skipped ###################" -else - echo "################### build opencv ###################" - rm -rf opencv-3.4.7.tar.gz opencv-3.4.7/ - wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/opencv-3.4.7.tar.gz - tar -xf opencv-3.4.7.tar.gz - - cd opencv-3.4.7/ - install_path=$(pwd)/opencv3 - - rm -rf build - mkdir build - cd build - - cmake .. \ - -DCMAKE_INSTALL_PREFIX=${install_path} \ - -DCMAKE_BUILD_TYPE=Release \ - -DBUILD_SHARED_LIBS=OFF \ - -DWITH_IPP=OFF \ - -DBUILD_IPP_IW=OFF \ - -DWITH_LAPACK=OFF \ - -DWITH_EIGEN=OFF \ - -DCMAKE_INSTALL_LIBDIR=lib64 \ - -DWITH_ZLIB=ON \ - -DBUILD_ZLIB=ON \ - -DWITH_JPEG=ON \ - -DBUILD_JPEG=ON \ - -DWITH_PNG=ON \ - -DBUILD_PNG=ON \ - -DWITH_TIFF=ON \ - -DBUILD_TIFF=ON - - make -j - make install - cd ../../ - echo "################### build opencv finished ###################" -fi - echo "################### build PaddleClas demo ####################" -OPENCV_DIR=$(pwd)/opencv-3.4.7/opencv3/ -# LIB_DIR=/work/project/project/test/paddle_inference/ -LIB_DIR=$(pwd)/Paddle/build/paddle_inference_install_dir/ +# pwd = /workspace/hesensen/PaddleClas/deploy/cpp_shitu +OPENCV_DIR=$(dirname $PWD)/cpp/opencv-3.4.7/opencv3/ +LIB_DIR=$(dirname $PWD)/cpp/paddle_inference/ + CUDA_LIB_DIR=$(dirname `find /usr -name libcudart.so`) CUDNN_LIB_DIR=$(dirname `find /usr -name libcudnn.so`) @@ -295,9 +255,9 @@ for infer_model in ${cpp_infer_model_dir[*]}; do #run inference is_quant=${infer_quant_flag[Count]} if [[ $cpp_infer_type == "cls" ]]; then - func_cls_cpp_inference "${inference_cmd}" "${infer_model}" "${LOG_PATH}" "${cpp_image_dir_value}" ${is_quant} + func_cls_cpp_inference "${inference_cmd}" "${infer_model}" "${LOG_PATH}" "${cpp_image_dir_value}" ${is_quant} else - func_shitu_cpp_inference "${inference_cmd}" "${infer_model}" "${LOG_PATH}" "${cpp_image_dir_value}" ${is_quant} + func_shitu_cpp_inference "${inference_cmd}" "${infer_model}" "${LOG_PATH}" "${cpp_image_dir_value}" ${is_quant} fi Count=$(($Count + 1)) done diff --git a/test_tipc/test_inference_jeston.sh b/test_tipc/test_inference_jeston.sh index 2fd76e1e9e7e8c7b52d0b6838cd15840a59fe5c4..56845003908c1a9cc8ac1b76e40ec108d33e8478 100644 --- a/test_tipc/test_inference_jeston.sh +++ b/test_tipc/test_inference_jeston.sh @@ -71,7 +71,7 @@ if [ ${MODE} = "whole_infer" ]; then echo $export_cmd eval $export_cmd status_export=$? - status_check $status_export "${export_cmd}" "${status_log}" + status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" else save_infer_dir=${infer_model} fi diff --git a/test_tipc/test_lite_arm_cpu_cpp.sh b/test_tipc/test_lite_arm_cpu_cpp.sh index 86c340060296019d0aef798aacd95580a438e0ff..919226eea5ce38b82fad6c2130a7c6467b6ee041 100644 --- a/test_tipc/test_lite_arm_cpu_cpp.sh +++ b/test_tipc/test_lite_arm_cpu_cpp.sh @@ -67,7 +67,7 @@ function func_test_tipc(){ eval ${command1} command2="adb shell 'export LD_LIBRARY_PATH=${lite_arm_work_path}; ${real_inference_cmd}' > ${_save_log_path} 2>&1" eval ${command2} - status_check $? "${command2}" "${status_log}" + status_check $? 
"${command2}" "${status_log}" "${model_name}" done done done diff --git a/test_tipc/test_paddle2onnx.sh b/test_tipc/test_paddle2onnx.sh index 850fc9049b95400ee6334ff9dfa677947294c2de..c869f8f3bf9df900d779dcf98355ca56eeece207 100644 --- a/test_tipc/test_paddle2onnx.sh +++ b/test_tipc/test_paddle2onnx.sh @@ -1,17 +1,10 @@ #!/bin/bash -source test_tipc/common_func.sh +source test_tipc/common_func.sh FILENAME=$1 -dataline=$(cat ${FILENAME}) -lines=(${dataline}) -# common params -model_name=$(func_parser_value "${lines[1]}") -python=$(func_parser_value "${lines[2]}") - - # parser params -dataline=$(awk 'NR==1, NR==14{print}' $FILENAME) +dataline=$(awk 'NR==1, NR==16{print}' $FILENAME) IFS=$'\n' lines=(${dataline}) @@ -31,17 +24,19 @@ opset_version_key=$(func_parser_key "${lines[8]}") opset_version_value=$(func_parser_value "${lines[8]}") enable_onnx_checker_key=$(func_parser_key "${lines[9]}") enable_onnx_checker_value=$(func_parser_value "${lines[9]}") -# parser onnx inference -inference_py=$(func_parser_value "${lines[10]}") -use_onnx_key=$(func_parser_key "${lines[11]}") -use_onnx_value=$(func_parser_value "${lines[11]}") -inference_model_dir_key=$(func_parser_key "${lines[12]}") -inference_model_dir_value=$(func_parser_value "${lines[12]}") -inference_hardware_key=$(func_parser_key "${lines[13]}") -inference_hardware_value=$(func_parser_value "${lines[13]}") +# parser onnx inference +inference_py=$(func_parser_value "${lines[11]}") +use_onnx_key=$(func_parser_key "${lines[12]}") +use_onnx_value=$(func_parser_value "${lines[12]}") +inference_model_dir_key=$(func_parser_key "${lines[13]}") +inference_model_dir_value=$(func_parser_value "${lines[13]}") +inference_hardware_key=$(func_parser_key "${lines[14]}") +inference_hardware_value=$(func_parser_value "${lines[14]}") +inference_config_key=$(func_parser_key "${lines[15]}") +inference_config_value=$(func_parser_value "${lines[15]}") -LOG_PATH="./test_tipc/output" -mkdir -p ./test_tipc/output +LOG_PATH="./test_tipc/output/${model_name}" +mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results_paddle2onnx.log" @@ -60,14 +55,16 @@ function func_paddle2onnx(){ trans_model_cmd="${padlle2onnx_cmd} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_save_model} ${set_opset_version} ${set_enable_onnx_checker}" eval $trans_model_cmd last_status=${PIPESTATUS[0]} - status_check $last_status "${trans_model_cmd}" "${status_log}" + status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" + # python inference set_model_dir=$(func_set_params "${inference_model_dir_key}" "${inference_model_dir_value}") set_use_onnx=$(func_set_params "${use_onnx_key}" "${use_onnx_value}") set_hardware=$(func_set_params "${inference_hardware_key}" "${inference_hardware_value}") - infer_model_cmd="cd deploy && ${python} ${inference_py} -o ${set_model_dir} -o ${set_use_onnx} -o ${set_hardware} >${_save_log_path} 2>&1 && cd ../" + set_inference_config=$(func_set_params "${inference_config_key}" "${inference_config_value}") + infer_model_cmd="cd deploy && ${python} ${inference_py} -o ${set_model_dir} -o ${set_use_onnx} -o ${set_hardware} ${set_inference_config} > ${_save_log_path} 2>&1 && cd ../" eval $infer_model_cmd - status_check $last_status "${infer_model_cmd}" "${status_log}" + status_check $last_status "${infer_model_cmd}" "${status_log}" "${model_name}" } @@ -75,4 +72,4 @@ echo "################### run test ###################" export Count=0 IFS="|" -func_paddle2onnx \ No newline at end of file +func_paddle2onnx \ No newline at end of file 
diff --git a/test_tipc/test_serving.sh b/test_tipc/test_serving.sh deleted file mode 100644 index c36935a60fecacea672fd932773a8fb0bdcd619b..0000000000000000000000000000000000000000 --- a/test_tipc/test_serving.sh +++ /dev/null @@ -1,168 +0,0 @@ -#!/bin/bash -source test_tipc/common_func.sh - -FILENAME=$1 -dataline=$(awk 'NR==1, NR==18{print}' $FILENAME) - -# parser params -IFS=$'\n' -lines=(${dataline}) - -# parser serving -model_name=$(func_parser_value "${lines[1]}") -python=$(func_parser_value "${lines[2]}") -trans_model_py=$(func_parser_value "${lines[3]}") -infer_model_dir_key=$(func_parser_key "${lines[4]}") -infer_model_dir_value=$(func_parser_value "${lines[4]}") -model_filename_key=$(func_parser_key "${lines[5]}") -model_filename_value=$(func_parser_value "${lines[5]}") -params_filename_key=$(func_parser_key "${lines[6]}") -params_filename_value=$(func_parser_value "${lines[6]}") -serving_server_key=$(func_parser_key "${lines[7]}") -serving_server_value=$(func_parser_value "${lines[7]}") -serving_client_key=$(func_parser_key "${lines[8]}") -serving_client_value=$(func_parser_value "${lines[8]}") -serving_dir_value=$(func_parser_value "${lines[9]}") -web_service_py=$(func_parser_value "${lines[10]}") -web_use_gpu_key=$(func_parser_key "${lines[11]}") -web_use_gpu_list=$(func_parser_value "${lines[11]}") -web_use_mkldnn_key=$(func_parser_key "${lines[12]}") -web_use_mkldnn_list=$(func_parser_value "${lines[12]}") -web_cpu_threads_key=$(func_parser_key "${lines[13]}") -web_cpu_threads_list=$(func_parser_value "${lines[13]}") -web_use_trt_key=$(func_parser_key "${lines[14]}") -web_use_trt_list=$(func_parser_value "${lines[14]}") -web_precision_key=$(func_parser_key "${lines[15]}") -web_precision_list=$(func_parser_value "${lines[15]}") -pipeline_py=$(func_parser_value "${lines[16]}") -image_dir_key=$(func_parser_key "${lines[17]}") -image_dir_value=$(func_parser_value "${lines[17]}") - -LOG_PATH="../../test_tipc/output" -mkdir -p ./test_tipc/output -status_log="${LOG_PATH}/results_serving.log" - -function func_serving(){ - IFS='|' - _python=$1 - _script=$2 - _model_dir=$3 - # pdserving - set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}") - set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") - set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") - set_serving_server=$(func_set_params "${serving_server_key}" "${serving_server_value}") - set_serving_client=$(func_set_params "${serving_client_key}" "${serving_client_value}") - set_image_dir=$(func_set_params "${image_dir_key}" "${image_dir_value}") - trans_model_cmd="${python} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" - eval $trans_model_cmd - cd ${serving_dir_value} - echo $PWD - unset https_proxy - unset http_proxy - for python in ${python[*]}; do - if [ ${python} = "cpp"]; then - for use_gpu in ${web_use_gpu_list[*]}; do - if [ ${use_gpu} = "null" ]; then - web_service_cpp_cmd="${python} -m paddle_serving_server.serve --model ppocr_det_mobile_2.0_serving/ ppocr_rec_mobile_2.0_serving/ --port 9293" - eval $web_service_cmd - sleep 2s - _save_log_path="${LOG_PATH}/server_infer_cpp_cpu_pipeline_usemkldnn_False_threads_4_batchsize_1.log" - pipeline_cmd="${python} ocr_cpp_client.py ppocr_det_mobile_2.0_client/ ppocr_rec_mobile_2.0_client/" - eval $pipeline_cmd - status_check $last_status "${pipeline_cmd}" "${status_log}" - sleep 2s - ps ux | grep -E 
'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9 - else - web_service_cpp_cmd="${python} -m paddle_serving_server.serve --model ppocr_det_mobile_2.0_serving/ ppocr_rec_mobile_2.0_serving/ --port 9293 --gpu_id=0" - eval $web_service_cmd - sleep 2s - _save_log_path="${LOG_PATH}/server_infer_cpp_cpu_pipeline_usemkldnn_False_threads_4_batchsize_1.log" - pipeline_cmd="${python} ocr_cpp_client.py ppocr_det_mobile_2.0_client/ ppocr_rec_mobile_2.0_client/" - eval $pipeline_cmd - status_check $last_status "${pipeline_cmd}" "${status_log}" - sleep 2s - ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9 - fi - done - else - # python serving - for use_gpu in ${web_use_gpu_list[*]}; do - echo ${ues_gpu} - if [ ${use_gpu} = "null" ]; then - for use_mkldnn in ${web_use_mkldnn_list[*]}; do - if [ ${use_mkldnn} = "False" ]; then - continue - fi - for threads in ${web_cpu_threads_list[*]}; do - set_cpu_threads=$(func_set_params "${web_cpu_threads_key}" "${threads}") - web_service_cmd="${python} ${web_service_py} ${web_use_gpu_key}=${use_gpu} ${web_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} &" - eval $web_service_cmd - sleep 2s - for pipeline in ${pipeline_py[*]}; do - _save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log" - pipeline_cmd="${python} ${pipeline} ${set_image_dir} > ${_save_log_path} 2>&1 " - eval $pipeline_cmd - last_status=${PIPESTATUS[0]} - eval "cat ${_save_log_path}" - status_check $last_status "${pipeline_cmd}" "${status_log}" - sleep 2s - done - ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9 - done - done - elif [ ${use_gpu} = "0" ]; then - for use_trt in ${web_use_trt_list[*]}; do - for precision in ${web_precision_list[*]}; do - if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then - continue - fi - if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then - continue - fi - if [[ ${use_trt} = "False" || ${precision} =~ "int8" ]] && [[ ${_flag_quant} = "True" ]]; then - continue - fi - set_tensorrt=$(func_set_params "${web_use_trt_key}" "${use_trt}") - set_precision=$(func_set_params "${web_precision_key}" "${precision}") - web_service_cmd="${python} ${web_service_py} ${web_use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} & " - eval $web_service_cmd - - sleep 2s - for pipeline in ${pipeline_py[*]}; do - _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_usetrt_${use_trt}_precision_${precision}_batchsize_1.log" - pipeline_cmd="${python} ${pipeline} ${set_image_dir}> ${_save_log_path} 2>&1" - eval $pipeline_cmd - last_status=${PIPESTATUS[0]} - eval "cat ${_save_log_path}" - status_check $last_status "${pipeline_cmd}" "${status_log}" - sleep 2s - done - ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9 - done - done - else - echo "Does not support hardware other than CPU and GPU Currently!" 
- fi - done - fi - done -} - - -# set cuda device -GPUID=$2 -if [ ${#GPUID} -le 0 ];then - env=" " -else - env="export CUDA_VISIBLE_DEVICES=${GPUID}" -fi -set CUDA_VISIBLE_DEVICES -eval $env - - -echo "################### run test ###################" - -export Count=0 -IFS="|" -func_serving "${web_service_cmd}" diff --git a/test_tipc/test_serving_infer_cpp.sh b/test_tipc/test_serving_infer_cpp.sh new file mode 100644 index 0000000000000000000000000000000000000000..fdb7ef186bafd9dcd879150188e1f0450ca87211 --- /dev/null +++ b/test_tipc/test_serving_infer_cpp.sh @@ -0,0 +1,270 @@ +#!/bin/bash +source test_tipc/common_func.sh + +FILENAME=$1 +dataline=$(awk 'NR==1, NR==19{print}' $FILENAME) + +# parser params +IFS=$'\n' +lines=(${dataline}) + +function func_get_url_file_name(){ + strs=$1 + IFS="/" + array=(${strs}) + tmp=${array[${#array[@]}-1]} + echo ${tmp} +} + +# parser serving +model_name=$(func_parser_value "${lines[1]}") +python=$(func_parser_value "${lines[2]}") +trans_model_py=$(func_parser_value "${lines[4]}") +infer_model_dir_key=$(func_parser_key "${lines[5]}") +infer_model_dir_value=$(func_parser_value "${lines[5]}") +model_filename_key=$(func_parser_key "${lines[6]}") +model_filename_value=$(func_parser_value "${lines[6]}") +params_filename_key=$(func_parser_key "${lines[7]}") +params_filename_value=$(func_parser_value "${lines[7]}") +serving_server_key=$(func_parser_key "${lines[8]}") +serving_server_value=$(func_parser_value "${lines[8]}") +serving_client_key=$(func_parser_key "${lines[9]}") +serving_client_value=$(func_parser_value "${lines[9]}") +serving_dir_value=$(func_parser_value "${lines[10]}") +web_service_py=$(func_parser_value "${lines[11]}") +web_use_gpu_key=$(func_parser_key "${lines[12]}") +web_use_gpu_list=$(func_parser_value "${lines[12]}") +pipeline_py=$(func_parser_value "${lines[13]}") + + +function func_serving_cls(){ + LOG_PATH="test_tipc/output/${model_name}" + mkdir -p ${LOG_PATH} + LOG_PATH="../../${LOG_PATH}" + status_log="${LOG_PATH}/results_serving.log" + IFS='|' + + # pdserving + set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}") + set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") + set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") + set_serving_server=$(func_set_params "${serving_server_key}" "${serving_server_value}") + set_serving_client=$(func_set_params "${serving_client_key}" "${serving_client_value}") + + for python_ in ${python[*]}; do + if [[ ${python_} =~ "python" ]]; then + trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + eval ${trans_model_cmd} + break + fi + done + + # modify the alias_name of fetch_var to "outputs" + server_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"prediction\"/' ${serving_server_value}/serving_server_conf.prototxt" + eval ${server_fetch_var_line_cmd} + + client_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"prediction\"/' ${serving_client_value}/serving_client_conf.prototxt" + eval ${client_fetch_var_line_cmd} + + prototxt_dataline=$(awk 'NR==1, NR==3{print}' ${serving_server_value}/serving_server_conf.prototxt) + IFS=$'\n' + prototxt_lines=(${prototxt_dataline}) + feed_var_name=$(func_parser_value "${prototxt_lines[2]}") + IFS='|' + + cd ${serving_dir_value} + unset https_proxy + unset http_proxy + + for item in ${python[*]}; 
+    serving_client_dir_name=$(func_get_url_file_name "$serving_client_value")
+    set_client_feed_type_cmd="sed -i '/feed_type/,/: .*/s/feed_type: .*/feed_type: 20/' ${serving_client_dir_name}/serving_client_conf.prototxt"
+    eval ${set_client_feed_type_cmd}
+    set_client_shape_cmd="sed -i '/shape: 3/,/shape: 3/s/shape: 3/shape: 1/' ${serving_client_dir_name}/serving_client_conf.prototxt"
+    eval ${set_client_shape_cmd}
+    set_client_shape224_cmd="sed -i '/shape: 224/,/shape: 224/s/shape: 224//' ${serving_client_dir_name}/serving_client_conf.prototxt"
+    eval ${set_client_shape224_cmd}
+
+    set_pipeline_load_config_cmd="sed -i '/load_client_config/,/.prototxt/s/.\/.*\/serving_client_conf.prototxt/.\/${serving_client_dir_name}\/serving_client_conf.prototxt/' ${pipeline_py}"
+    eval ${set_pipeline_load_config_cmd}
+
+    set_pipeline_feed_var_cmd="sed -i '/feed=/,/: image}/s/feed={.*: image}/feed={${feed_var_name}: image}/' ${pipeline_py}"
+    eval ${set_pipeline_feed_var_cmd}
+
+    serving_server_dir_name=$(func_get_url_file_name "$serving_server_value")
+
+    for use_gpu in ${web_use_gpu_list[*]}; do
+        if [[ ${use_gpu} = "null" ]]; then
+            web_service_cpp_cmd="${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 &"
+            eval ${web_service_cpp_cmd}
+            last_status=${PIPESTATUS[0]}
+            status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}"
+            sleep 5s
+            _save_log_path="${LOG_PATH}/server_infer_cpp_cpu_pipeline_batchsize_1.log"
+            pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 "
+            eval ${pipeline_cmd}
+            last_status=${PIPESTATUS[0]}
+            eval "cat ${_save_log_path}"
+            status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}"
+            eval "${python_} -m paddle_serving_server.serve stop"
+            sleep 5s
+        else
+            web_service_cpp_cmd="${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 --gpu_id=${use_gpu} &"
+            eval ${web_service_cpp_cmd}
+            last_status=${PIPESTATUS[0]}
+            status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}"
+            sleep 8s
+
+            _save_log_path="${LOG_PATH}/server_infer_cpp_gpu_pipeline_batchsize_1.log"
+            pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 "
+            eval ${pipeline_cmd}
+            last_status=${PIPESTATUS[0]}
+            eval "cat ${_save_log_path}"
+            status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}"
+            sleep 5s
+            eval "${python_} -m paddle_serving_server.serve stop"
+        fi
+    done
+}
+
+
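+# func_serving_rec drives the PP-ShiTu recognition test, which serves two
+# models at once: a mainbody detector and a feature-extraction model. Its
+# params file therefore uses a different layout than the classification one:
+# lines 5-13 hold the trans_model arguments for both models, and lines 14-17
+# hold the serving dir, web service script, gpu list and pipeline script (see
+# the func_parser_* calls below).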
+function func_serving_rec(){
+    LOG_PATH="test_tipc/output/${model_name}"
+    mkdir -p ${LOG_PATH}
+    LOG_PATH="../../../${LOG_PATH}"
+    status_log="${LOG_PATH}/results_serving.log"
+    trans_model_py=$(func_parser_value "${lines[5]}")
+    cls_infer_model_dir_key=$(func_parser_key "${lines[6]}")
+    cls_infer_model_dir_value=$(func_parser_value "${lines[6]}")
+    det_infer_model_dir_key=$(func_parser_key "${lines[7]}")
+    det_infer_model_dir_value=$(func_parser_value "${lines[7]}")
+    model_filename_key=$(func_parser_key "${lines[8]}")
+    model_filename_value=$(func_parser_value "${lines[8]}")
+    params_filename_key=$(func_parser_key "${lines[9]}")
+    params_filename_value=$(func_parser_value "${lines[9]}")
+
+    cls_serving_server_key=$(func_parser_key "${lines[10]}")
+    cls_serving_server_value=$(func_parser_value "${lines[10]}")
+    cls_serving_client_key=$(func_parser_key "${lines[11]}")
+    cls_serving_client_value=$(func_parser_value "${lines[11]}")
+
+    det_serving_server_key=$(func_parser_key "${lines[12]}")
+    det_serving_server_value=$(func_parser_value "${lines[12]}")
+    det_serving_client_key=$(func_parser_key "${lines[13]}")
+    det_serving_client_value=$(func_parser_value "${lines[13]}")
+
+    serving_dir_value=$(func_parser_value "${lines[14]}")
+    web_service_py=$(func_parser_value "${lines[15]}")
+    web_use_gpu_key=$(func_parser_key "${lines[16]}")
+    web_use_gpu_list=$(func_parser_value "${lines[16]}")
+    pipeline_py=$(func_parser_value "${lines[17]}")
+
+    IFS='|'
+    for python_ in ${python[*]}; do
+        if [[ ${python_} =~ "python" ]]; then
+            python_interp=${python_}
+            break
+        fi
+    done
+
+    # pdserving
+    cd ./deploy
+    set_dirname=$(func_set_params "${cls_infer_model_dir_key}" "${cls_infer_model_dir_value}")
+    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
+    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
+    set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}")
+    set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}")
+    cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+    eval ${cls_trans_model_cmd}
+
+    set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}")
+    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
+    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
+    set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}")
+    set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}")
+    det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+    eval ${det_trans_model_cmd}
+
+    cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ${cls_serving_server_value}"
+    eval ${cp_prototxt_cmd}
+    cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ${cls_serving_client_value}"
+    eval ${cp_prototxt_cmd}
+    cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ${det_serving_client_value}"
+    eval ${cp_prototxt_cmd}
+    cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ${det_serving_server_value}"
+    eval ${cp_prototxt_cmd}
+
+    prototxt_dataline=$(awk 'NR==1, NR==3{print}' ${cls_serving_server_value}/serving_server_conf.prototxt)
+    IFS=$'\n'
+    prototxt_lines=(${prototxt_dataline})
+    feed_var_name=$(func_parser_value "${prototxt_lines[2]}")
+    IFS='|'
+
+    cd ${serving_dir_value}
+    unset https_proxy
+    unset http_proxy
+
+    # export SERVING_BIN=${PWD}/../Serving/server-build-gpu-opencv/core/general-server/serving
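+    # The C++ server below is launched with both models at once. The op list
+    # pairs GeneralPicodetOp with the detection model and
+    # GeneralFeatureExtractOp with the feature-extraction model, so detection
+    # runs first and its crops are handed to the extractor (a sketch of the
+    # intended flow, inferred from the op names).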
+    for use_gpu in ${web_use_gpu_list[*]}; do
+        if [ ${use_gpu} = "null" ]; then
+            web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 &"
+            eval ${web_service_cpp_cmd}
+            last_status=${PIPESTATUS[0]}
+            status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}"
+            sleep 5s
+            _save_log_path="${LOG_PATH}/server_infer_cpp_cpu_batchsize_1.log"
+            pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 "
+            eval ${pipeline_cmd}
+            last_status=${PIPESTATUS[0]}
+            eval "cat ${_save_log_path}"
+            status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}"
+            eval "${python_interp} -m paddle_serving_server.serve stop"
+            sleep 5s
+        else
+            web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 --gpu_id=${use_gpu} &"
+            eval ${web_service_cpp_cmd}
+            last_status=${PIPESTATUS[0]}
+            status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}"
+            sleep 5s
+            _save_log_path="${LOG_PATH}/server_infer_cpp_gpu_batchsize_1.log"
+            pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 "
+            eval ${pipeline_cmd}
+            last_status=${PIPESTATUS[0]}
+            eval "cat ${_save_log_path}"
+            status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}"
+            eval "${python_interp} -m paddle_serving_server.serve stop"
+            sleep 5s
+        fi
+    done
+}
+
+
+# set cuda device
+GPUID=$3
+if [ ${#GPUID} -le 0 ];then
+    env="export CUDA_VISIBLE_DEVICES=0"
+else
+    env="export CUDA_VISIBLE_DEVICES=${GPUID}"
+fi
+set CUDA_VISIBLE_DEVICES
+eval ${env}
+
+
+echo "################### run test ###################"
+
+export Count=0
+IFS="|"
+if [[ ${model_name} =~ "ShiTu" ]]; then
+    func_serving_rec
+else
+    func_serving_cls
+fi
diff --git a/test_tipc/test_serving_infer_python.sh b/test_tipc/test_serving_infer_python.sh
new file mode 100644
index 0000000000000000000000000000000000000000..6141e11086008215903a2f87ba8ca1d69fd518b3
--- /dev/null
+++ b/test_tipc/test_serving_infer_python.sh
@@ -0,0 +1,317 @@
+#!/bin/bash
+source test_tipc/common_func.sh
+
+FILENAME=$1
+dataline=$(awk 'NR==1, NR==19{print}' $FILENAME)
+
+# parser params
+IFS=$'\n'
+lines=(${dataline})
+
+function func_get_url_file_name(){
+    strs=$1
+    IFS="/"
+    array=(${strs})
+    tmp=${array[${#array[@]}-1]}
+    echo ${tmp}
+}
+
+# parser serving
+model_name=$(func_parser_value "${lines[1]}")
+python=$(func_parser_value "${lines[2]}")
+trans_model_py=$(func_parser_value "${lines[4]}")
+infer_model_dir_key=$(func_parser_key "${lines[5]}")
+infer_model_dir_value=$(func_parser_value "${lines[5]}")
+model_filename_key=$(func_parser_key "${lines[6]}")
+model_filename_value=$(func_parser_value "${lines[6]}")
+params_filename_key=$(func_parser_key "${lines[7]}")
+params_filename_value=$(func_parser_value "${lines[7]}")
+serving_server_key=$(func_parser_key "${lines[8]}")
+serving_server_value=$(func_parser_value "${lines[8]}")
+serving_client_key=$(func_parser_key "${lines[9]}")
+serving_client_value=$(func_parser_value "${lines[9]}")
+serving_dir_value=$(func_parser_value "${lines[10]}")
+web_service_py=$(func_parser_value "${lines[11]}")
+web_use_gpu_key=$(func_parser_key "${lines[12]}")
+web_use_gpu_list=$(func_parser_value "${lines[12]}")
+pipeline_py=$(func_parser_value "${lines[13]}")
+
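+# The parser above assumes a TIPC serving params file with one "key:value"
+# pair per line; func_parser_key/func_parser_value (from common_func.sh) split
+# each line on ":". An illustrative sketch (model names and paths are made up):
+#
+#   ===========================serving_params===========================
+#   model_name:PPLCNet_x1_0
+#   python:python3.7
+#   ...
+#   --dirname:./deploy/paddleserving/PPLCNet_x1_0_infer/
+#   --model_filename:inference.pdmodel
+#   --params_filename:inference.pdiparams
+#   ...
+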
+function func_serving_cls(){
+    LOG_PATH="test_tipc/output/${model_name}"
+    mkdir -p ${LOG_PATH}
+    LOG_PATH="../../${LOG_PATH}"
+    status_log="${LOG_PATH}/results_serving.log"
+    IFS='|'
+
+    # pdserving
+    set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}")
+    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
+    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
+    set_serving_server=$(func_set_params "${serving_server_key}" "${serving_server_value}")
+    set_serving_client=$(func_set_params "${serving_client_key}" "${serving_client_value}")
+
+    for python_ in ${python[*]}; do
+        if [[ ${python_} =~ "python" ]]; then
+            trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+            eval ${trans_model_cmd}
+            break
+        fi
+    done
+
+    # modify the alias_name of fetch_var to "prediction"
+    server_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"prediction\"/' ${serving_server_value}/serving_server_conf.prototxt"
+    eval ${server_fetch_var_line_cmd}
+
+    client_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"prediction\"/' ${serving_client_value}/serving_client_conf.prototxt"
+    eval ${client_fetch_var_line_cmd}
+
+    prototxt_dataline=$(awk 'NR==1, NR==3{print}' ${serving_server_value}/serving_server_conf.prototxt)
+    IFS=$'\n'
+    prototxt_lines=(${prototxt_dataline})
+    feed_var_name=$(func_parser_value "${prototxt_lines[2]}")
+    IFS='|'
+
+    cd ${serving_dir_value}
+    unset https_proxy
+    unset http_proxy
+
+    # python serving
+    # modify the input_name in "classification_web_service.py" to be consistent with feed_var.name in prototxt
+    set_web_service_feed_var_cmd="sed -i '/preprocess/,/input_imgs}/s/{.*: input_imgs}/{${feed_var_name}: input_imgs}/' ${web_service_py}"
+    eval ${set_web_service_feed_var_cmd}
+
+    model_config=21
+    serving_server_dir_name=$(func_get_url_file_name "$serving_server_value")
+    set_model_config_cmd="sed -i '${model_config}s/model_config: .*/model_config: ${serving_server_dir_name}/' config.yml"
+    eval ${set_model_config_cmd}
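+
+    # The line-addressed sed edits above and below assume the pipeline-serving
+    # config.yml keeps its fields at fixed lines: line 21 "model_config: ...",
+    # line 24 "device_type: ...", line 27 "devices: ...". A sketch of the
+    # assumed fragment (line numbers and surrounding keys are assumptions):
+    #
+    #   21            model_config: PPLCNet_x1_0_serving
+    #   ...
+    #   24            device_type: 1
+    #   ...
+    #   27            devices: "0"
+    #
+    # device_type 0 selects CPU and 1 selects GPU; devices "" means run on CPU.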
+    for use_gpu in ${web_use_gpu_list[*]}; do
+        if [[ ${use_gpu} = "null" ]]; then
+            device_type_line=24
+            set_device_type_cmd="sed -i '${device_type_line}s/device_type: .*/device_type: 0/' config.yml"
+            eval ${set_device_type_cmd}
+
+            devices_line=27
+            set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml"
+            eval ${set_devices_cmd}
+
+            web_service_cmd="${python_} ${web_service_py} &"
+            eval ${web_service_cmd}
+            last_status=${PIPESTATUS[0]}
+            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}"
+            sleep 5s
+            for pipeline in ${pipeline_py[*]}; do
+                _save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_batchsize_1.log"
+                pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1 "
+                eval ${pipeline_cmd}
+                last_status=${PIPESTATUS[0]}
+                eval "cat ${_save_log_path}"
+                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}"
+                sleep 5s
+            done
+            eval "${python_} -m paddle_serving_server.serve stop"
+        elif [ ${use_gpu} -eq 0 ]; then
+            device_type_line=24
+            set_device_type_cmd="sed -i '${device_type_line}s/device_type: .*/device_type: 1/' config.yml"
+            eval ${set_device_type_cmd}
+
+            devices_line=27
+            set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"${use_gpu}\"/' config.yml"
+            eval ${set_devices_cmd}
+
+            web_service_cmd="${python_} ${web_service_py} & "
+            eval ${web_service_cmd}
+            last_status=${PIPESTATUS[0]}
+            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}"
+            sleep 5s
+            for pipeline in ${pipeline_py[*]}; do
+                _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_batchsize_1.log"
+                pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1"
+                eval ${pipeline_cmd}
+                last_status=${PIPESTATUS[0]}
+                eval "cat ${_save_log_path}"
+                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}"
+                sleep 5s
+            done
+            eval "${python_} -m paddle_serving_server.serve stop"
+        else
+            echo "Currently only CPU and GPU are supported, but got [${use_gpu}]!"
+        fi
+    done
+}
+
+
+function func_serving_rec(){
+    LOG_PATH="test_tipc/output/${model_name}"
+    mkdir -p ${LOG_PATH}
+    LOG_PATH="../../../${LOG_PATH}"
+    status_log="${LOG_PATH}/results_serving.log"
+    trans_model_py=$(func_parser_value "${lines[5]}")
+    cls_infer_model_dir_key=$(func_parser_key "${lines[6]}")
+    cls_infer_model_dir_value=$(func_parser_value "${lines[6]}")
+    det_infer_model_dir_key=$(func_parser_key "${lines[7]}")
+    det_infer_model_dir_value=$(func_parser_value "${lines[7]}")
+    model_filename_key=$(func_parser_key "${lines[8]}")
+    model_filename_value=$(func_parser_value "${lines[8]}")
+    params_filename_key=$(func_parser_key "${lines[9]}")
+    params_filename_value=$(func_parser_value "${lines[9]}")
+
+    cls_serving_server_key=$(func_parser_key "${lines[10]}")
+    cls_serving_server_value=$(func_parser_value "${lines[10]}")
+    cls_serving_client_key=$(func_parser_key "${lines[11]}")
+    cls_serving_client_value=$(func_parser_value "${lines[11]}")
+
+    det_serving_server_key=$(func_parser_key "${lines[12]}")
+    det_serving_server_value=$(func_parser_value "${lines[12]}")
+    det_serving_client_key=$(func_parser_key "${lines[13]}")
+    det_serving_client_value=$(func_parser_value "${lines[13]}")
+
+    serving_dir_value=$(func_parser_value "${lines[14]}")
+    web_service_py=$(func_parser_value "${lines[15]}")
+    web_use_gpu_key=$(func_parser_key "${lines[16]}")
+    web_use_gpu_list=$(func_parser_value "${lines[16]}")
+    pipeline_py=$(func_parser_value "${lines[17]}")
+
+    IFS='|'
+    for python_ in ${python[*]}; do
+        if [[ ${python_} =~ "python" ]]; then
+            python_interp=${python_}
+            break
+        fi
+    done
+
+    # pdserving
+    cd ./deploy
+    set_dirname=$(func_set_params "${cls_infer_model_dir_key}" "${cls_infer_model_dir_value}")
+    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
+    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
+    set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}")
+    set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}")
+    cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+    eval ${cls_trans_model_cmd}
+
+    set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}")
+    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
+    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
+    set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}")
+    set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}")
+    det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+    eval ${det_trans_model_cmd}
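+
+    # The sed commands below rename the alias of the model's fetch_var so that
+    # clients can always fetch the embedding under the key "features". A sketch
+    # of the assumed fetch_var block in serving_server_conf.prototxt (the
+    # original alias is illustrative):
+    #
+    #   fetch_var {
+    #     name: "save_infer_model/scale_0.tmp_1"
+    #     alias_name: "save_infer_model/scale_0.tmp_1"  ->  alias_name: "features"
+    #     is_lod_tensor: false
+    #     ...
+    #   }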
"${det_serving_server_value}") + set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}") + det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + eval ${det_trans_model_cmd} + + # modify the alias_name of fetch_var to "outputs" + server_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"features\"/' $cls_serving_server_value/serving_server_conf.prototxt" + eval ${server_fetch_var_line_cmd} + client_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"features\"/' $cls_serving_client_value/serving_client_conf.prototxt" + eval ${client_fetch_var_line_cmd} + + prototxt_dataline=$(awk 'NR==1, NR==3{print}' ${cls_serving_server_value}/serving_server_conf.prototxt) + IFS=$'\n' + prototxt_lines=(${prototxt_dataline}) + feed_var_name=$(func_parser_value "${prototxt_lines[2]}") + IFS='|' + + cd ${serving_dir_value} + unset https_proxy + unset http_proxy + + # modify the input_name in "recognition_web_service.py" to be consistent with feed_var.name in prototxt + set_web_service_feed_var_cmd="sed -i '/preprocess/,/input_imgs}/s/{.*: input_imgs}/{${feed_var_name}: input_imgs}/' ${web_service_py}" + eval ${set_web_service_feed_var_cmd} + # python serving + for use_gpu in ${web_use_gpu_list[*]}; do + if [[ ${use_gpu} = "null" ]]; then + device_type_line=24 + set_device_type_cmd="sed -i '${device_type_line}s/device_type: .*/device_type: 0/' config.yml" + eval ${set_device_type_cmd} + + devices_line=27 + set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml" + eval ${set_devices_cmd} + + web_service_cmd="${python} ${web_service_py} &" + eval ${web_service_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" + sleep 5s + for pipeline in ${pipeline_py[*]}; do + _save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_batchsize_1.log" + pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1 " + eval ${pipeline_cmd} + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" + sleep 5s + done + eval "${python_} -m paddle_serving_server.serve stop" + elif [ ${use_gpu} -eq 0 ]; then + if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then + continue + fi + if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then + continue + fi + if [[ ${use_trt} = "False" || ${precision} =~ "int8" ]] && [[ ${_flag_quant} = "True" ]]; then + continue + fi + + device_type_line=24 + set_device_type_cmd="sed -i '${device_type_line}s/device_type: .*/device_type: 1/' config.yml" + eval ${set_device_type_cmd} + + devices_line=27 + set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"${use_gpu}\"/' config.yml" + eval ${set_devices_cmd} + + web_service_cmd="${python} ${web_service_py} & " + eval ${web_service_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" + sleep 10s + for pipeline in ${pipeline_py[*]}; do + _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_batchsize_1.log" + pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1" + eval ${pipeline_cmd} + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${pipeline_cmd}" "${status_log}" 
"${model_name}" + sleep 10s + done + eval "${python_} -m paddle_serving_server.serve stop" + else + echo "Does not support hardware [${use_gpu}] other than CPU and GPU Currently!" + fi + done +} + + +# set cuda device +GPUID=$3 +if [ ${#GPUID} -le 0 ];then + env="export CUDA_VISIBLE_DEVICES=0" +else + env="export CUDA_VISIBLE_DEVICES=${GPUID}" +fi +set CUDA_VISIBLE_DEVICES +eval ${env} + + +echo "################### run test ###################" + +export Count=0 +IFS="|" +if [[ ${model_name} =~ "ShiTu" ]]; then + func_serving_rec +else + func_serving_cls +fi diff --git a/test_tipc/test_train_inference_python.sh b/test_tipc/test_train_inference_python.sh index a567ef3c6ae1e5a7429d4a5738cdb8ce5c6189fa..427005cf0601e192d01264e39ed82ed77ab57d0d 100644 --- a/test_tipc/test_train_inference_python.sh +++ b/test_tipc/test_train_inference_python.sh @@ -60,12 +60,12 @@ kl_quant_cmd_value=$(func_parser_value "${lines[33]}") export_key2=$(func_parser_key "${lines[34]}") export_value2=$(func_parser_value "${lines[34]}") -# parser inference model +# parser inference model infer_model_dir_list=$(func_parser_value "${lines[36]}") infer_export_flag=$(func_parser_value "${lines[37]}") infer_is_quant=$(func_parser_value "${lines[38]}") -# parser inference +# parser inference inference_py=$(func_parser_value "${lines[39]}") use_gpu_key=$(func_parser_key "${lines[40]}") use_gpu_list=$(func_parser_value "${lines[40]}") @@ -90,7 +90,7 @@ infer_value1=$(func_parser_value "${lines[50]}") if [ ! $epoch_num ]; then epoch_num=2 fi -if [ $MODE = 'benchmark_train' ]; then +if [[ $MODE = 'benchmark_train' ]]; then epoch_num=1 fi @@ -106,7 +106,7 @@ function func_inference(){ _log_path=$4 _img_dir=$5 _flag_quant=$6 - # inference + # inference for use_gpu in ${use_gpu_list[*]}; do if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then for use_mkldnn in ${use_mkldnn_list[*]}; do @@ -126,7 +126,7 @@ function func_inference(){ eval $command last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${command}" "../${status_log}" + status_check $last_status "${command}" "../${status_log}" "${model_name}" done done done @@ -151,7 +151,7 @@ function func_inference(){ eval $command last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${command}" "../${status_log}" + status_check $last_status "${command}" "../${status_log}" "${model_name}" done done done @@ -161,7 +161,7 @@ function func_inference(){ done } -if [ ${MODE} = "whole_infer" ] || [ ${MODE} = "klquant_whole_infer" ]; then +if [[ ${MODE} = "whole_infer" ]] || [[ ${MODE} = "klquant_whole_infer" ]]; then IFS="|" infer_export_flag=(${infer_export_flag}) if [ ${infer_export_flag} != "null" ] && [ ${infer_export_flag} != "False" ]; then @@ -171,7 +171,7 @@ if [ ${MODE} = "whole_infer" ] || [ ${MODE} = "klquant_whole_infer" ]; then fi fi -if [ ${MODE} = "whole_infer" ]; then +if [[ ${MODE} = "whole_infer" ]]; then GPUID=$3 if [ ${#GPUID} -le 0 ];then env=" " @@ -191,14 +191,14 @@ if [ ${MODE} = "whole_infer" ]; then done cd .. 
-elif [ ${MODE} = "klquant_whole_infer" ]; then
+elif [[ ${MODE} = "klquant_whole_infer" ]]; then
    # for kl_quant
    if [ ${kl_quant_cmd_value} != "null" ] && [ ${kl_quant_cmd_value} != "False" ]; then
        echo "kl_quant"
        command="${python} ${kl_quant_cmd_value}"
        eval $command
        last_status=${PIPESTATUS[0]}
-        status_check $last_status "${command}" "${status_log}"
+        status_check $last_status "${command}" "${status_log}" "${model_name}"
        cd inference/quant_post_static_model
        ln -s __model__ inference.pdmodel
        ln -s __params__ inference.pdiparams
@@ -234,7 +234,7 @@ else
        env=" "
    fi
    for autocast in ${autocast_list[*]}; do
-        for trainer in ${trainer_list[*]}; do 
+        for trainer in ${trainer_list[*]}; do
            flag_quant=False
            if [ ${trainer} = ${pact_key} ]; then
                run_train=${pact_trainer}
@@ -263,14 +263,16 @@ else
            if [ ${run_train} = "null" ]; then
                continue
            fi
-            
+
            set_autocast=$(func_set_params "${autocast_key}" "${autocast}")
            set_epoch=$(func_set_params "${epoch_key}" "${epoch_num}")
            set_pretrain=$(func_set_params "${pretrain_model_key}" "${pretrain_model_value}")
            set_batchsize=$(func_set_params "${train_batch_key}" "${train_batch_value}")
            set_train_params1=$(func_set_params "${train_param_key1}" "${train_param_value1}")
            set_use_gpu=$(func_set_params "${train_use_gpu_key}" "${train_use_gpu_value}")
-            if [ ${#ips} -le 26 ];then
+            if [ ${#ips} -le 15 ];then
+                # an ips string longer than 15 characters is treated as multi-machine;
+                # 15 is the length of the shortest multi-machine ips value: 0.0.0.0,0.0.0.0
                save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}"
                nodes=1
            else
@@ -280,7 +282,7 @@ else
                nodes=${#ips_array[@]}
                save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}"
            fi
-            
+
            # load pretrain from norm training if current trainer is pact or fpgm trainer
            # if [ ${trainer} = ${pact_key} ] || [ ${trainer} = ${fpgm_key} ]; then
            #     set_pretrain="${load_norm_train_model}"
@@ -289,7 +291,7 @@ else
            set_save_model=$(func_set_params "${save_model_key}" "${save_log}")
            if [ ${#gpu} -le 2 ];then  # train with cpu or single gpu
                cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} "
-            elif [ ${#ips} -le 26 ];then  # train with multi-gpu
+            elif [ ${#ips} -le 15 ];then  # train with multi-gpu
                cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1}"
            else     # train with multi-machine
                cmd="${python} -m paddle.distributed.launch --ips=${ips} --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_pretrain} ${set_epoch} ${set_autocast} ${set_batchsize} ${set_train_params1}"
@@ -299,28 +301,28 @@ else
            # export FLAGS_cudnn_deterministic=True
            sleep 5
            eval $cmd
-            status_check $? "${cmd}" "${status_log}"
+            status_check $? "${cmd}" "${status_log}" "${model_name}"
"${cmd}" "${status_log}" "${model_name}" sleep 5 - + if [[ $FILENAME == *GeneralRecognition* ]]; then set_eval_pretrain=$(func_set_params "${pretrain_model_key}" "${save_log}/RecModel/${train_model_name}") else set_eval_pretrain=$(func_set_params "${pretrain_model_key}" "${save_log}/${model_name}/${train_model_name}") fi - # save norm trained models to set pretrain for pact training and fpgm training + # save norm trained models to set pretrain for pact training and fpgm training if [ ${trainer} = ${trainer_norm} ]; then load_norm_train_model=${set_eval_pretrain} fi - # run eval + # run eval if [ ${eval_py} != "null" ]; then set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}") - eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}" + eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}" eval $eval_cmd - status_check $? "${eval_cmd}" "${status_log}" + status_check $? "${eval_cmd}" "${status_log}" "${model_name}" sleep 5 fi # run export model - if [ ${run_export} != "null" ]; then + if [ ${run_export} != "null" ]; then # run export model save_infer_path="${save_log}" if [[ $FILENAME == *GeneralRecognition* ]]; then @@ -331,7 +333,7 @@ else set_save_infer_key=$(func_set_params "${save_infer_key}" "${save_infer_path}") export_cmd="${python} ${run_export} ${set_export_weight} ${set_save_infer_key}" eval $export_cmd - status_check $? "${export_cmd}" "${status_log}" + status_check $? "${export_cmd}" "${status_log}" "${model_name}" #run inference eval $env @@ -341,7 +343,7 @@ else cd .. fi eval "unset CUDA_VISIBLE_DEVICES" - done # done with: for trainer in ${trainer_list[*]}; do - done # done with: for autocast in ${autocast_list[*]}; do + done # done with: for trainer in ${trainer_list[*]}; do + done # done with: for autocast in ${autocast_list[*]}; do done # done with: for gpu in ${gpu_list[*]}; do fi # end if [ ${MODE} = "infer" ]; then diff --git a/tools/export_model.py b/tools/export_model.py index 01aba06c1f715f764352c6fd38a23c470e66e289..35f432f50e9d1dd903be3d0d3e07a4e42f2a2b7f 100644 --- a/tools/export_model.py +++ b/tools/export_model.py @@ -30,5 +30,7 @@ if __name__ == "__main__": args = config.parse_args() config = config.get_config( args.config, overrides=args.override, show=False) + if config["Arch"].get("use_sync_bn", False): + config["Arch"]["use_sync_bn"] = False engine = Engine(config, mode="export") engine.export() diff --git a/tools/search_strategy.py b/tools/search_strategy.py index 15f4aa71be67bbd0f5ec92d240bbc53896684d91..abc406167946c82604f2e58f3835d4a37bbb694d 100644 --- a/tools/search_strategy.py +++ b/tools/search_strategy.py @@ -20,8 +20,13 @@ def get_result(log_dir): return res -def search_train(search_list, base_program, base_output_dir, search_key, - config_replace_value, model_name, search_times=1): +def search_train(search_list, + base_program, + base_output_dir, + search_key, + config_replace_value, + model_name, + search_times=1): best_res = 0. 
+    if config["Arch"].get("use_sync_bn", False):
+        config["Arch"]["use_sync_bn"] = False
     engine = Engine(config, mode="export")
     engine.export()
diff --git a/tools/search_strategy.py b/tools/search_strategy.py
index 15f4aa71be67bbd0f5ec92d240bbc53896684d91..abc406167946c82604f2e58f3835d4a37bbb694d 100644
--- a/tools/search_strategy.py
+++ b/tools/search_strategy.py
@@ -20,8 +20,13 @@ def get_result(log_dir):
     return res
 
 
-def search_train(search_list, base_program, base_output_dir, search_key,
-                 config_replace_value, model_name, search_times=1):
+def search_train(search_list,
+                 base_program,
+                 base_output_dir,
+                 search_key,
+                 config_replace_value,
+                 model_name,
+                 search_times=1):
     best_res = 0.
     best = search_list[0]
     all_result = {}
@@ -33,7 +38,8 @@ def search_train(search_list, base_program, base_output_dir, search_key,
         model_name = search_i
         res_list = []
         for j in range(search_times):
-            output_dir = "{}/{}_{}_{}".format(base_output_dir, search_key, search_i, j).replace(".", "_")
+            output_dir = "{}/{}_{}_{}".format(base_output_dir, search_key,
+                                              search_i, j).replace(".", "_")
             program += ["-o", "Global.output_dir={}".format(output_dir)]
             process = subprocess.Popen(program)
             process.communicate()
@@ -50,14 +56,17 @@ def search_train(search_list, base_program, base_output_dir, search_key,
 
 def search_strategy():
     args = config.parse_args()
-    configs = config.get_config(args.config, overrides=args.override, show=False)
+    configs = config.get_config(
+        args.config, overrides=args.override, show=False)
     base_config_file = configs["base_config_file"]
-    distill_config_file = configs["distill_config_file"]
+    distill_config_file = configs.get("distill_config_file", None)
     model_name = config.get_config(base_config_file)["Arch"]["name"]
     gpus = configs["gpus"]
     gpus = ",".join([str(i) for i in gpus])
-    base_program = ["python3.7", "-m", "paddle.distributed.launch", "--gpus={}".format(gpus),
-                    "tools/train.py", "-c", base_config_file]
+    base_program = [
+        "python3.7", "-m", "paddle.distributed.launch",
+        "--gpus={}".format(gpus), "tools/train.py", "-c", base_config_file
+    ]
     base_output_dir = configs["output_dir"]
     search_times = configs["search_times"]
     search_dict = configs.get("search_dict")
@@ -67,41 +76,61 @@ def search_strategy():
         search_values = search_i["search_values"]
         replace_config = search_i["replace_config"]
         res = search_train(search_values, base_program, base_output_dir,
-                           search_key, replace_config, model_name, search_times)
+                           search_key, replace_config, model_name,
+                           search_times)
         all_results[search_key] = res
         best = res.get("best")
         for v in replace_config:
             base_program += ["-o", "{}={}".format(v, best)]
 
     teacher_configs = configs.get("teacher", None)
-    if teacher_configs is not None:
+    if teacher_configs is None:
+        print(all_results, base_program)
+        return
+
+    algo = teacher_configs.get("algorithm", "skl-ugi")
+    supported_list = ["skl-ugi", "udml"]
+    assert algo in supported_list, f"algorithm must be in {supported_list} but got {algo}"
+    if algo == "skl-ugi":
         teacher_program = base_program.copy()
         # remove incompatible keys
         teacher_rm_keys = teacher_configs["rm_keys"]
         rm_indices = []
         for rm_k in teacher_rm_keys:
             for ind, ki in enumerate(base_program):
-               if rm_k in ki:
-                   rm_indices.append(ind)
+                if rm_k in ki:
+                    rm_indices.append(ind)
         for rm_index in rm_indices[::-1]:
             teacher_program.pop(rm_index)
-            teacher_program.pop(rm_index-1)
+            teacher_program.pop(rm_index - 1)
         replace_config = ["Arch.name"]
         teacher_list = teacher_configs["search_values"]
-        res = search_train(teacher_list, teacher_program, base_output_dir, "teacher", replace_config, model_name)
+        res = search_train(teacher_list, teacher_program, base_output_dir,
+                           "teacher", replace_config, model_name)
         all_results["teacher"] = res
         best = res.get("best")
-        t_pretrained = "{}/{}_{}_0/{}/best_model".format(base_output_dir, "teacher", best, best)
-        base_program += ["-o", "Arch.models.0.Teacher.name={}".format(best),
-                         "-o", "Arch.models.0.Teacher.pretrained={}".format(t_pretrained)]
+        t_pretrained = "{}/{}_{}_0/{}/best_model".format(base_output_dir,
+                                                         "teacher", best, best)
+        base_program += [
+            "-o", "Arch.models.0.Teacher.name={}".format(best), "-o",
+            "Arch.models.0.Teacher.pretrained={}".format(t_pretrained)
+        ]
"udml": + if "lr_mult_list" in all_results: + base_program += [ + "-o", "Arch.models.0.Teacher.lr_mult_list={}".format( + all_results["lr_mult_list"]["best"]) + ] + output_dir = "{}/search_res".format(base_output_dir) base_program += ["-o", "Global.output_dir={}".format(output_dir)] final_replace = configs.get('final_replace') for i in range(len(base_program)): - base_program[i] = base_program[i].replace(base_config_file, distill_config_file) - for k in final_replace: - v = final_replace[k] - base_program[i] = base_program[i].replace(k, v) + base_program[i] = base_program[i].replace(base_config_file, + distill_config_file) + for k in final_replace: + v = final_replace[k] + base_program[i] = base_program[i].replace(k, v) process = subprocess.Popen(base_program) process.communicate()