Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleClas into arcmargin

5aa57d2c · dongshuilong · f0bf51b3 · 7c6e76e5 · 5aa57d2c · 5aa57d2c
95 changed files
--- a/README_ch.md
+++ b/README_ch.md
@@ -7,7 +7,7 @@
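 PaddleClas, the PaddlePaddle image recognition toolkit, is a toolset for image recognition tasks that PaddlePaddle provides for industry and academia, helping users train better vision models and put them into production.

 **Recent updates**
-
+- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models are highly competitive on Intel CPUs. Metrics and pretrained weights can be downloaded [here](docs/zh_CN/ImageNet_models.md).
 - 2021.08.11 Updated 7 [FAQ](docs/zh_CN/faq_series/faq_2021_s2.md) entries.
 - 2021.06.29 Added the Swin-transformer series of models, with Top1 accuracy on the ImageNet1k dataset reaching up to 87.2%; training, inference, evaluation, and whl package deployment are supported. Pretrained models can be downloaded [here](docs/zh_CN/models/models_intro.md).
 - 2021.06.22,23,24 The official PaddleClas R&D team presents a three-day live course of in-depth technical walkthroughs. Course replay: [https://aistudio.baidu.com/aistudio/course/introduce/24519](https://aistudio.baidu.com/aistudio/course/introduce/24519)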

--- a/README_en.md
+++ b/README_en.md
@@ -8,6 +8,8 @@ PaddleClas is an image recognition toolset for industry and academia, helping us

 **Recent updates**

+- 2021.09.17 Add the PP-LCNet series of models developed by PaddleClas. These models are highly competitive on Intel CPUs. The metrics and pretrained models are available [here](docs/en/ImageNet_models_en.md).
+
 - 2021.06.29 Add Swin-transformer series model，Highest top1 acc on ImageNet1k dataset reaches 87.2%, training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
 - 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
 - [more](./docs/en/update_history_en.md)

--- a/deploy/configs/inference_multilabel_cls.yaml
+++ b/deploy/configs/inference_multilabel_cls.yaml
+Global:
+  infer_imgs: "./images/0517_2715693311.jpg"
+  inference_model_dir: "../inference/"
+  batch_size: 1
+  use_gpu: True
+  enable_mkldnn: False
+  cpu_num_threads: 10
+  enable_benchmark: True
+  use_fp16: False
+  ir_optim: True
+  use_tensorrt: False
+  gpu_mem: 8000
+  enable_profile: False
+PreProcess:
+  transform_ops:
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 0.00392157
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+        channel_num: 3
+    - ToCHWImage:
+PostProcess:
+  main_indicator: MultiLabelTopk
+  MultiLabelTopk:
+    topk: 5
+    class_id_map_file: None
+  SavePreLabel:
+    save_dir: ./pre_label/
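
The `NormalizeImage` entry in the config above implies a per-channel affine transform. A minimal sketch of that arithmetic follows, assuming the usual `(pixel * scale - mean) / std` form. The helper name `normalize_pixel` is hypothetical, and plain Python is used instead of NumPy only to keep the example self-contained.

```python
# Illustrative only: mirrors the arithmetic implied by the NormalizeImage
# settings in the config above, not the PaddleClas implementation itself.
SCALE = 0.00392157            # approximately 1 / 255
MEAN = (0.485, 0.456, 0.406)  # per-channel mean, taken from the config
STD = (0.229, 0.224, 0.225)   # per-channel std, taken from the config


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to normalized floats: (v * scale - mean) / std."""
    return tuple((v * SCALE - m) / s for v, m, s in zip(rgb, MEAN, STD))


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print(white, black)
```

With these settings a black pixel maps to `-mean / std` per channel, which is why normalized inputs are roughly zero-centered.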
--- a/deploy/images/0517_2715693311.jpg
+++ b/deploy/images/0517_2715693311.jpg
--- a/deploy/paddleserving/README.md
+++ b/deploy/paddleserving/README.md
@@ -4,9 +4,9 @@

 PaddleClas provides two service deployment methods:
 - Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../../deploy/hubserving/readme_en.md)
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". Please follow this tutorial.
+- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". If you prefer a retrieval-based image recognition service, please refer to this [tutorial](./recognition/README.md); if you want an image classification service, please follow this tutorial.

-# Service deployment based on PaddleServing  
+# Image Classification Service deployment based on PaddleServing  

 This document will introduce how to use the [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the ResNet50_vd model as a pipeline online service.

@@ -131,7 +131,7 @@ fetch_var {
    config.yml                # configuration file of starting the service
    pipeline_http_client.py   # script to send pipeline prediction request by http
    pipeline_rpc_client.py    # script to send pipeline prediction request by rpc
-    resnet50_web_service.py   # start the script of the pipeline server
+    classification_web_service.py   # script to start the pipeline server
    ```

 2. Run the following command to start the service.
@@ -147,7 +147,7 @@ fetch_var {
    python3 pipeline_http_client.py
    ```
    After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
-    ![](./imgs/results.png)  
+    ![](./imgs/results.png)

    Adjust the number of concurrency in config.yml to get the largest QPS. 


--- a/deploy/paddleserving/README_CN.md
+++ b/deploy/paddleserving/README_CN.md
@@ -4,9 +4,9 @@

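 PaddleClas provides two service deployment methods:
 - Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; for usage, refer to the [documentation](../../deploy/hubserving/readme.md);
- Deployment based on PaddleServing: the code path is "`./deploy/paddleserving`"; follow this tutorial.
+- Deployment based on PaddleServing: the code path is "`./deploy/paddleserving`"; for the retrieval-based image recognition service, refer to the [documentation](./recognition/README_CN.md); for the image classification service, follow this tutorial.

-# Service deployment based on PaddleServing
+# Image classification service deployment based on PaddleServing

 This document uses the classic ResNet50_vd model as an example to show how to use the [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) tool to deploy a PaddleClas
 dynamic graph model as a pipeline online service.
@@ -127,7 +127,7 @@ fetch_var {
    config.yml                 # configuration file for starting the service
    pipeline_http_client.py    # script to send pipeline prediction requests over HTTP
    pipeline_rpc_client.py     # script to send pipeline prediction requests over RPC
-    resnet50_web_service.py    # script to start the pipeline server
+    classification_web_service.py    # script to start the pipeline server
    ```

 2. Run the following command to start the service: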

--- a/deploy/paddleserving/imgs/results_recog.png
+++ b/deploy/paddleserving/imgs/results_recog.png
--- a/deploy/paddleserving/imgs/start_server_recog.png
+++ b/deploy/paddleserving/imgs/start_server_recog.png
--- a/deploy/paddleserving/recognition/README.md
+++ b/deploy/paddleserving/recognition/README.md
+# Product Recognition Service deployment based on PaddleServing  
+
+(English|[简体中文](./README_CN.md))
+
+This document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy a retrieval-based product recognition model as a pipeline online service.
+
+Some key features of Paddle Serving:
+- Integrates seamlessly with the Paddle training pipeline; most Paddle models can be deployed with a one-line command.
+- Supports industrial serving features such as model management, online loading, and online A/B testing.
+- Supports highly concurrent and efficient communication between clients and servers.
+
+For an introduction to the Paddle Serving deployment framework and a tutorial, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README.md).
+
+## Contents
+- [Environmental preparation](#environmental-preparation)
+- [Model conversion](#model-conversion)
+- [Paddle Serving pipeline deployment](#paddle-serving-pipeline-deployment)
+- [FAQ](#faq)
+
+<a name="environmental-preparation"></a>
+## Environmental preparation
+
+Both the PaddleClas operating environment and the PaddleServing operating environment are needed.
+
+1. Prepare the PaddleClas operating environment by following this [link](../../docs/zh_CN/tutorials/install.md).
+   Download the paddle whl package that matches your environment; version 2.1.0 is recommended.
+
+2. Prepare the PaddleServing operating environment as follows:
+
+    Install serving, which is used to start the service:
+    ```
+    pip3 install paddle-serving-server==0.6.1 # for CPU
+    pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
+    # For other GPU environments, confirm your environment first and then run the matching command below
+    pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
+    pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
+    ```
+
+3. Install the client, which is used to send requests to the service
+    Find the client installation package matching your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
+    Python 3.7 is recommended here:
+
+    ```
+    wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
+    pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
+    ```
+
+4. Install serving-app
+    ```
+    pip3 install paddle-serving-app==0.6.1
+    ```
+
+   **Note:** If you want to install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
+
+
+<a name="model-conversion"></a>
+## Model conversion
+When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy.
+The following assumes that the current working directory is the PaddleClas root directory.
+
+First, download the product recognition inference model (ResNet50_vd backbone):
+```
+cd deploy
+# Download and unzip the ResNet50_vd model
+wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
+cd models
+tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
+```
+
+Then, use the installed paddle_serving_client tool to convert the inference model into a serving model.
+```
+#  Product recognition model conversion
+python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
+                                         --model_filename inference.pdmodel  \
+                                         --params_filename inference.pdiparams \
+                                         --serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
+                                         --serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
+```
+
+After the inference model is converted, two additional folders, `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client`, appear in the current folder with the following layout:
+```
+|- product_ResNet50_vd_aliproduct_v1.0_serving/
+  |- __model__  
+  |- __params__
+  |- serving_server_conf.prototxt  
+  |- serving_server_conf.stream.prototxt
+
+|- product_ResNet50_vd_aliproduct_v1.0_client
+  |- serving_client_conf.prototxt  
+  |- serving_client_conf.stream.prototxt
+```
+
+Once you have the model files for deployment, change the alias name in `serving_server_conf.prototxt`: set `alias_name` in `fetch_var` to `features`.
+The modified serving_server_conf.prototxt file is as follows:
+```
+feed_var {
+  name: "x"
+  alias_name: "x"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 224
+  shape: 224
+}
+fetch_var {
+  name: "save_infer_model/scale_0.tmp_1"
+  alias_name: "features"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+```
+
+Next, download and unpack the prebuilt index of the product gallery:
+```
+cd ../
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
+```
+
+
+<a name="paddle-serving-pipeline-deployment"></a>
+## Paddle Serving pipeline deployment
+
+1. Download the PaddleClas code. If you have already downloaded it, skip this step.
+    ```
+    git clone https://github.com/PaddlePaddle/PaddleClas
+
+    # Enter the working directory  
+    cd PaddleClas/deploy/paddleserving/recognition
+    ```
+
+    This directory contains the code to start the pipeline service and send prediction requests, including:
+    ```
+    __init__.py
+    config.yml                # configuration file of starting the service
+    pipeline_http_client.py   # script to send pipeline prediction request by http
+    pipeline_rpc_client.py    # script to send pipeline prediction request by rpc
+    recognition_web_service.py   # script to start the pipeline server
+    ```
+
+2. Run the following command to start the service.
+    ```
+    # Start the service and save the running log in log.txt
+    python3 recognition_web_service.py &>log.txt &
+    ```
+    After the service starts successfully, a log similar to the following is printed in log.txt:
+    ![](../imgs/start_server_recog.png)
+
+3. Send a service request:
+    ```
+    python3 pipeline_http_client.py
+    ```
+    After a successful run, the model's prediction is printed in the terminal window. An example result:
+    ![](../imgs/results_recog.png)
+
+    Adjust the `concurrency` value in config.yml to maximize QPS.
+
+    ```
+    op:
+        concurrency: 8
+        ...
+    ```
+
+    Multiple service requests can be sent at the same time if necessary.
+
+    The predicted performance data will be automatically written into the `PipelineServingLogs/pipeline.tracer` file.
+
+<a name="faq"></a>
+## FAQ
+**Q1**: No result is returned after sending the request.
+
+**A1**: Do not set a proxy when starting the service or sending requests. Disable the proxy before starting the service and before sending requests with the following commands:
+```
+unset https_proxy
+unset http_proxy
+```  
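
The service described in this README answers a request by comparing an extracted feature vector against a prebuilt gallery index. The sketch below illustrates that retrieval step in plain Python under stated assumptions: similarity is taken to be the inner product of L2-normalized vectors, the gallery and labels are made-up toy data, and `l2_normalize` and `search_top1` are hypothetical helpers. The actual service delegates the search to a faiss index, and the 0.5 cut-off mirrors `rec_score_thres` in `recognition_web_service.py`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def search_top1(query, gallery):
    """Return (best_score, best_index) using inner-product similarity."""
    q = l2_normalize(query)
    scored = [(sum(a * b for a, b in zip(q, g)), idx)
              for idx, g in enumerate(gallery)]
    return max(scored)


# Hypothetical gallery of pre-normalized features and their labels.
gallery = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
labels = ["item_a", "item_b", "item_c"]

score, idx = search_top1([3.0, 2.9], gallery)
# Results below the score threshold are discarded instead of reported.
match = labels[idx] if score >= 0.5 else None
print(match, round(score, 4))
```

Normalizing both sides makes the inner product equal to cosine similarity, so a fixed score threshold has the same meaning for every query.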
--- a/deploy/paddleserving/recognition/README_CN.md
+++ b/deploy/paddleserving/recognition/README_CN.md
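+# Product recognition service deployment based on PaddleServing
+
+([English](./README.md)|简体中文)
+
+This document uses product recognition as an example to show how to use the [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) tool to deploy a PaddleClas dynamic graph model as a pipeline online service.
+
+Compared with hubserving deployment, PaddleServing has the following advantages:
+- Supports highly concurrent and efficient communication between clients and servers
+- Supports industrial-grade serving capabilities such as model management, online loading, and online A/B testing
+- Supports client development in multiple programming languages, such as C++, Python, and Java
+
+For more about the PaddleServing deployment framework and its tutorials, refer to the [documentation](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md).
+
+## Contents
+- [Environment preparation](#环境准备)
+- [Model conversion](#模型转换)
+- [Paddle Serving pipeline deployment](#部署)
+- [FAQ](#FAQ)
+
+<a name="环境准备"></a>
+## Environment preparation
+
+Both the PaddleClas operating environment and the PaddleServing operating environment are required.
+
+- Prepare the PaddleClas [operating environment](../../docs/zh_CN/tutorials/install.md). Download the paddle whl package that matches your environment; version 2.1.0 is recommended.
+
+- Prepare the PaddleServing operating environment as follows:
+
+1. Install serving, which is used to start the service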
+    ```
+    pip3 install paddle-serving-server==0.6.1 # for CPU
+    pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
+    # For other GPU environments, confirm your environment first and then run the matching command below
+    pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
+    pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
+    ```
+
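+2. Install the client, which is used to send requests to the service
+    Find the client installation package matching your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md); Python 3.7 is recommended here: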
+
+    ```
+    wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
+    pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
+    ```
+
+3. Install serving-app
+    ```
+    pip3 install paddle-serving-app==0.6.1
+    ```
+    **Note:** To install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
+
+<a name="模型转换"></a>
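+## Model conversion
+
+When using PaddleServing for service deployment, the saved inference model must be converted into a serving model that is easy to deploy.
+The following assumes that the current working directory is the PaddleClas root directory.
+
+First, download the product recognition inference model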
+```
+cd deploy
+
+# Download and unzip the product recognition model
+wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
+cd models
+tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
+```
+
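+Next, use the installed paddle_serving_client to convert the downloaded inference model into a model format that is easy to deploy on the server.
+
+```
+# Convert the product recognition model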
+python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
+                                         --model_filename inference.pdmodel  \
+                                         --params_filename inference.pdiparams \
+                                         --serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
+                                         --serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
+```
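+After the product recognition inference model is converted, two additional folders, `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client`, appear in the current folder with the following layout: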
+```
+|- product_ResNet50_vd_aliproduct_v1.0_serving/
+  |- __model__  
+  |- __params__
+  |- serving_server_conf.prototxt  
+  |- serving_server_conf.stream.prototxt
+
+|- product_ResNet50_vd_aliproduct_v1.0_client
+  |- serving_client_conf.prototxt  
+  |- serving_client_conf.stream.prototxt
+
+```
+Once you have the model files, change the alias name in serving_server_conf.prototxt: set `alias_name` in `fetch_var` to `features`.
+The modified serving_server_conf.prototxt is as follows:
+```
+feed_var {
+  name: "x"
+  alias_name: "x"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 224
+  shape: 224
+}
+fetch_var {
+  name: "save_infer_model/scale_0.tmp_1"
+  alias_name: "features"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+```
+
+Next, download and unpack the prebuilt product gallery index
+```
+cd ../
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
+```
+
+
+<a name="部署"></a>
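+## Paddle Serving pipeline deployment
+
+1. Download the PaddleClas code. If you have already downloaded it, skip this step.
+    ```
+    git clone https://github.com/PaddlePaddle/PaddleClas
+
+    # Enter the working directory
+    cd PaddleClas/deploy/paddleserving/recognition
+    ```
+    This directory contains the code to start the pipeline service and send prediction requests, including:
+    ```
+    __init__.py
+    config.yml                    # configuration file for starting the service
+    pipeline_http_client.py       # script to send pipeline prediction requests over HTTP
+    pipeline_rpc_client.py        # script to send pipeline prediction requests over RPC
+    recognition_web_service.py    # script to start the pipeline server
+    ```
+
+2. Run the following command to start the service:
+    ```
+    # Start the service and save the running log in log.txt
+    python3 recognition_web_service.py &>log.txt &
+    ```
+    After the service starts successfully, a log similar to the following is printed in log.txt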
+    ![](../imgs/start_server_recog.png)
+
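+3. Send a service request:
+    ```
+    python3 pipeline_http_client.py
+    ```
+    After a successful run, the model's prediction is printed in the terminal window. An example result:
+    ![](../imgs/results_recog.png)
+
+    Adjust the concurrency value in config.yml to maximize QPS
+    ```
+    op:
+        # Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
+        concurrency: 8
+        ...
+    ```
+    Multiple service requests can be sent at the same time if necessary.
+
+    The prediction performance data is automatically written to the `PipelineServingLogs/pipeline.tracer` file.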
+
+<a name="FAQ"></a>
+## FAQ
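+**Q1**: No result is returned after sending the request, or an output decoding error is reported
+
+**A1**: Do not set a proxy when starting the service or sending requests. Disable the proxy before starting the service and before sending requests with the following commands: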
+```
+unset https_proxy
+unset http_proxy
+```
--- a/deploy/paddleserving/recognition/__init__.py
+++ b/deploy/paddleserving/recognition/__init__.py
--- a/deploy/paddleserving/recognition/config.yml
+++ b/deploy/paddleserving/recognition/config.yml
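+#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own gRPC server and DAG
+##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the main thread's gRPC thread pool
+worker_num: 1
+
+#HTTP port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, no http_port is generated automatically
+http_port: 18081
+rpc_port: 9994
+
+dag:
+    #op resource type: True for the thread model, False for the process model
+    is_thread_op: False
+op:
+    rec:
+        #Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
+        concurrency: 1
+
+        #When the op config has no server_endpoints, the local service config is read from local_service_conf
+        local_service_conf:
+
+            #model path
+            model_config: ../../models/product_ResNet50_vd_aliproduct_v1.0_serving
+
+            #Compute hardware type: when omitted, determined by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
+            device_type: 1
+
+            #Compute hardware ID: CPU inference when devices is "" or omitted; GPU inference when devices is "0" or "0,1,2", indicating which GPU cards to use
+            devices: "0" # "0,1"
+
+            #Client type: brpc, grpc, or local_predictor. local_predictor does not start a Serving service and runs inference in-process
+            client_type: local_predictor
+
+            #Fetch result list, based on the alias_name of fetch_var in client_config
+            fetch_list: ["features"]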
+            
+    det:
+        concurrency: 1
+        local_service_conf:
+            client_type: local_predictor
+            device_type: 1
+            devices: '0'
+            fetch_list:
+            - save_infer_model/scale_0.tmp_1
+            model_config: ../../models/ppyolov2_r50vd_dcn_mainbody_v1.0_serving/
\ No newline at end of file
--- a/deploy/paddleserving/recognition/daoxiangcunjinzhubing_6.jpg
+++ b/deploy/paddleserving/recognition/daoxiangcunjinzhubing_6.jpg
--- a/deploy/paddleserving/recognition/label_list.txt
+++ b/deploy/paddleserving/recognition/label_list.txt
+foreground
+background
\ No newline at end of file
--- a/deploy/paddleserving/recognition/pipeline_http_client.py
+++ b/deploy/paddleserving/recognition/pipeline_http_client.py
+import requests
+import json
+import base64
+import os
+
+imgpath = "daoxiangcunjinzhubing_6.jpg"
+
+def cv2_to_base64(image):
+    return base64.b64encode(image).decode('utf8')
+
+if __name__ == "__main__":
+    url = "http://127.0.0.1:18081/recognition/prediction"
+
+    with open(os.path.join(".",  imgpath), 'rb') as file:
+        image_data1 = file.read()
+    image = cv2_to_base64(image_data1)
+    data = {"key": ["image"], "value": [image]}
+
+    for i in range(1):
+        r = requests.post(url=url, data=json.dumps(data))
+        print(r.json())
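
The client above sends the image as base64 text inside a JSON body with parallel `key` and `value` lists. The round trip can be sketched with the standard library alone. How the pipeline framework pairs `key` with `value` on the server side is an assumption here, `build_payload` and `decode_payload` are hypothetical helpers, and the byte string is a stand-in rather than a real JPEG.

```python
import base64
import json


def build_payload(raw_bytes):
    """Client side: wrap raw image bytes as base64 text in the key/value body."""
    image = base64.b64encode(raw_bytes).decode("utf8")
    return json.dumps({"key": ["image"], "value": [image]})


def decode_payload(body):
    """Server side (assumed): pair keys with values, then decode the image."""
    data = json.loads(body)
    fields = dict(zip(data["key"], data["value"]))
    return base64.b64decode(fields["image"].encode("utf8"))


# Stand-in bytes; a real request would read the JPEG file from disk.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
body = build_payload(fake_jpeg)
restored = decode_payload(body)
print(restored == fake_jpeg)
```

Base64 keeps the binary image safe inside a JSON string, at the cost of roughly a one-third size increase.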
--- a/deploy/paddleserving/recognition/pipeline_rpc_client.py
+++ b/deploy/paddleserving/recognition/pipeline_rpc_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+try:
+    from paddle_serving_server_gpu.pipeline import PipelineClient
+except ImportError:
+    from paddle_serving_server.pipeline import PipelineClient
+import base64
+
+client = PipelineClient()
+client.connect(['127.0.0.1:9994'])
+imgpath = "daoxiangcunjinzhubing_6.jpg"
+
+def cv2_to_base64(image):
+    return base64.b64encode(image).decode('utf8')
+
+if __name__ == "__main__":
+    with open(imgpath, 'rb') as file:
+        image_data = file.read()
+    image = cv2_to_base64(image_data)
+
+    for i in range(1):
+        ret = client.predict(feed_dict={"image": image}, fetch=["result"])
+        print(ret)
--- a/deploy/paddleserving/recognition/recognition_web_service.py
+++ b/deploy/paddleserving/recognition/recognition_web_service.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from paddle_serving_server.web_service import WebService, Op
+import logging
+import numpy as np
+import sys
+import cv2
+from paddle_serving_app.reader import *
+import base64
+import os
+import faiss
+import pickle
+import json
+
+class DetOp(Op):
+    def init_op(self):
+        self.img_preprocess = Sequential([
+            BGR2RGB(), Div(255.0),
+            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
+            Resize((640, 640)), Transpose((2, 0, 1))
+        ])
+
+        self.img_postprocess = RCNNPostprocess("label_list.txt", "output")
+        self.threshold = 0.2
+        self.max_det_results = 5
+
+    def generate_scale(self, im):
+        """
+        Args:
+            im (np.ndarray): decoded image of shape (H, W, C)
+        Returns:
+            im_scale_y: the resize ratio along Y (height)
+            im_scale_x: the resize ratio along X (width)
+        """
+        target_size = [640, 640]
+        origin_shape = im.shape[:2]
+        resize_h, resize_w = target_size
+        im_scale_y = resize_h / float(origin_shape[0])
+        im_scale_x = resize_w / float(origin_shape[1])
+        return im_scale_y, im_scale_x
+
+    def preprocess(self, input_dicts, data_id, log_id):
+        (_, input_dict), = input_dicts.items()
+        imgs = []
+        raw_imgs = []
+        for key in input_dict.keys():
+            data = base64.b64decode(input_dict[key].encode('utf8'))
+            raw_imgs.append(data)
+            data = np.frombuffer(data, np.uint8)
+            raw_im = cv2.imdecode(data, cv2.IMREAD_COLOR)
+
+            im_scale_y, im_scale_x = self.generate_scale(raw_im)
+            im = self.img_preprocess(raw_im)
+            
+            imgs.append({
+              "image": im[np.newaxis, :],
+              "im_shape": np.array(list(im.shape[1:])).reshape(-1)[np.newaxis,:],
+              "scale_factor": np.array([im_scale_y, im_scale_x]).astype('float32'),
+            })
+        self.raw_img = raw_imgs
+
+        feed_dict = {
+            "image":        np.concatenate([x["image"] for x in imgs], axis=0),
+            "im_shape":     np.concatenate([x["im_shape"] for x in imgs], axis=0),
+            "scale_factor": np.concatenate([x["scale_factor"] for x in imgs], axis=0)
+        }
+        return feed_dict, False,  None,  ""
+
+    def postprocess(self, input_dicts, fetch_dict, log_id):
+        boxes = self.img_postprocess(fetch_dict, visualize=False)
+        boxes.sort(key = lambda x: x["score"], reverse = True)
+        boxes = filter(lambda x: x["score"] >= self.threshold, boxes[:self.max_det_results])
+        boxes = list(boxes)
+        for i in range(len(boxes)):
+            boxes[i]["bbox"][2] += boxes[i]["bbox"][0] - 1
+            boxes[i]["bbox"][3] += boxes[i]["bbox"][1] - 1
+        result = json.dumps(boxes)
+        res_dict = {"bbox_result": result, "image": self.raw_img}
+        return res_dict,  None,  ""
+
+class RecOp(Op):
+    def init_op(self):
+        self.seq = Sequential([
+            BGR2RGB(), Resize((224, 224)), 
+            Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
+                                False), Transpose((2, 0, 1))
+        ])
+
+        index_dir = "../../recognition_demo_data_v1.1/gallery_product/index"
+        assert os.path.exists(os.path.join(
+            index_dir, "vector.index")), "vector.index not found ..."
+        assert os.path.exists(os.path.join(
+            index_dir, "id_map.pkl")), "id_map.pkl not found ... "
+        
+        self.searcher = faiss.read_index(
+            os.path.join(index_dir, "vector.index"))
+                
+        with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
+            self.id_map = pickle.load(fd)
+
+        self.rec_nms_thresold = 0.05
+        self.rec_score_thres = 0.5
+        self.feature_normalize = True
+        self.return_k = 1
+
+    def preprocess(self, input_dicts, data_id, log_id):
+        (_, input_dict), = input_dicts.items()
+        raw_img = input_dict["image"][0]
+        data = np.frombuffer(raw_img, np.uint8)
+        origin_img = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        dt_boxes = input_dict["bbox_result"]
+        boxes = json.loads(dt_boxes)
+        boxes.append({"category_id": 0,
+                      "score": 1.0,
+                      "bbox": [0, 0, origin_img.shape[1], origin_img.shape[0]]
+                     })
+        self.det_boxes = boxes
+
+        #construct batch images for rec
+        imgs = []
+        for box in boxes:
+            box = [int(x) for x in box["bbox"]]
+            im = origin_img[box[1]: box[3], box[0]: box[2]].copy()
+            img = self.seq(im)
+            imgs.append(img[np.newaxis, :].copy())
+
+        input_imgs = np.concatenate(imgs, axis=0)
+        return {"x": input_imgs},  False,  None,  ""
+
+    def nms_to_rec_results(self, results, thresh = 0.1):
+        filtered_results = []
+        x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
+        y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
+        x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
+        y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
+        scores = np.array([r["rec_scores"] for r in results])
+
+        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+        order = scores.argsort()[::-1]
+        while order.size > 0:
+            i = order[0]
+            xx1 = np.maximum(x1[i], x1[order[1:]])
+            yy1 = np.maximum(y1[i], y1[order[1:]])
+            xx2 = np.minimum(x2[i], x2[order[1:]])
+            yy2 = np.minimum(y2[i], y2[order[1:]])
+
+            w = np.maximum(0.0, xx2 - xx1 + 1)
+            h = np.maximum(0.0, yy2 - yy1 + 1)
+            inter = w * h
+            ovr = inter / (areas[i] + areas[order[1:]] - inter)
+            inds = np.where(ovr <= thresh)[0]
+            order = order[inds + 1]
+            filtered_results.append(results[i])
+        return filtered_results
+
+    def postprocess(self, input_dicts, fetch_dict, log_id):
+        batch_features = fetch_dict["features"]
+
+        if self.feature_normalize:
+            feas_norm = np.sqrt(
+                np.sum(np.square(batch_features), axis=1, keepdims=True))
+            batch_features = np.divide(batch_features, feas_norm)
+
+        scores, docs = self.searcher.search(batch_features,  self.return_k)
+
+        results = []
+        for i in range(scores.shape[0]):
+            pred = {}
+            if scores[i][0] >= self.rec_score_thres:
+                pred["bbox"] = [int(x) for x in self.det_boxes[i]["bbox"]]
+                pred["rec_docs"] = self.id_map[docs[i][0]].split()[1]
+                pred["rec_scores"] = scores[i][0]
+                results.append(pred)
+        
+        #do nms
+        results = self.nms_to_rec_results(results, self.rec_nms_thresold)
+        return {"result": str(results)}, None, ""
+
+class RecognitionService(WebService):
+    def get_pipeline_response(self, read_op):
+        det_op = DetOp(name="det", input_ops=[read_op])
+        rec_op = RecOp(name="rec", input_ops=[det_op])
+        return rec_op
+
+product_recog_service = RecognitionService(name="recognition")
+product_recog_service.prepare_pipeline_config("config.yml")
+product_recog_service.run_service()
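
`RecOp.nms_to_rec_results` above applies greedy non-maximum suppression to the recognized boxes. A minimal plain-Python sketch of the same idea follows, assuming `[x1, y1, x2, y2]` boxes with the inclusive `+ 1` pixel convention used in the file. The helper names and demo boxes are hypothetical, and the vectorized NumPy version in the file remains the actual implementation.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes with inclusive pixel coordinates."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def nms(results, thresh):
    """Keep the best-scoring box, drop boxes overlapping it, and repeat."""
    remaining = sorted(results, key=lambda r: r["rec_scores"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [r for r in remaining
                     if iou(best["bbox"], r["bbox"]) <= thresh]
    return kept


# Made-up detections: the second box heavily overlaps the first.
demo = [
    {"bbox": [0, 0, 99, 99], "rec_scores": 0.9},
    {"bbox": [5, 5, 104, 104], "rec_scores": 0.8},
    {"bbox": [200, 200, 299, 299], "rec_scores": 0.7},
]
kept = nms(demo, thresh=0.05)
print([r["rec_scores"] for r in kept])
```

The low threshold matters in this pipeline because the whole image is appended as an extra box before recognition, so near-duplicate matches are expected and only the highest-scoring one should survive.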
--- a/deploy/python/postprocess.py
+++ b/deploy/python/postprocess.py
@@ -81,12 +81,14 @@ class Topk(object):
            class_id_map = None
        return class_id_map

-    def __call__(self, x, file_names=None):
+    def __call__(self, x, file_names=None, multilabel=False):
        if file_names is not None:
            assert x.shape[0] == len(file_names)
        y = []
        for idx, probs in enumerate(x):
-            index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32")
+            index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
+                "int32") if not multilabel else np.where(
+                    probs >= 0.5)[0].astype("int32")
            clas_id_list = []
            score_list = []
            label_name_list = []
@@ -108,6 +110,14 @@ class Topk(object):
        return y


+class MultiLabelTopk(Topk):
+    def __init__(self, topk=1, class_id_map_file=None):
+        super().__init__(topk, class_id_map_file)
+
+    def __call__(self, x, file_names=None):
+        return super().__call__(x, file_names, multilabel=True)
+
+
 class SavePreLabel(object):
    def __init__(self, save_dir):
        if save_dir is None:
@@ -128,23 +138,24 @@ class SavePreLabel(object):
        os.makedirs(output_dir, exist_ok=True)
        shutil.copy(image_file, output_dir)

+
 class Binarize(object):
-    def __init__(self, method = "round"):
+    def __init__(self, method="round"):
        self.method = method
        self.unit = np.array([[128, 64, 32, 16, 8, 4, 2, 1]]).T

    def __call__(self, x, file_names=None):
        if self.method == "round":
            x = np.round(x + 1).astype("uint8") - 1
-        
+
        if self.method == "sign":
            x = ((np.sign(x) + 1) / 2).astype("uint8")

        embedding_size = x.shape[1]
        assert embedding_size % 8 == 0, "The Binary index only support vectors with sizes multiple of 8"
-        
+
        byte = np.zeros([x.shape[0], embedding_size // 8], dtype=np.uint8)
        for i in range(embedding_size // 8):
-            byte[:, i:i+1] = np.dot(x[:, i * 8: (i + 1)* 8], self.unit)
+            byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)

        return byte
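Note on `Binarize.__call__`: the loop above packs every group of 8 binary values into one byte by taking a dot product with the MSB-first weights `[128, 64, ..., 1]`. The following is a minimal sketch of that packing step, not the project's code; the helper name `pack_bits_dot` and the sample input are illustrative assumptions. It also checks the result against `np.packbits`, which uses the same big-endian bit order by default.

```python
import numpy as np


def pack_bits_dot(bits):
    """Pack each run of 8 bits (MSB first) into one uint8 via a dot product."""
    unit = np.array([[128, 64, 32, 16, 8, 4, 2, 1]]).T  # column of bit weights
    n, size = bits.shape
    assert size % 8 == 0, "vector size must be a multiple of 8"
    packed = np.zeros([n, size // 8], dtype=np.uint8)
    for i in range(size // 8):
        packed[:, i:i + 1] = np.dot(bits[:, i * 8:(i + 1) * 8], unit)
    return packed


# Hypothetical 16-bit binary embedding: 10000001 00001111
bits = np.array(
    [[1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)
packed = pack_bits_dot(bits)
same_as_packbits = np.array_equal(packed, np.packbits(bits, axis=1))
```

Here `packed.tolist()` is `[[129, 15]]`, and `same_as_packbits` is `True`, which suggests the loop could be replaced by a single `np.packbits(x, axis=1)` call if the input is already a 0/1 `uint8` array.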
--- a/deploy/python/predict_cls.py
+++ b/deploy/python/predict_cls.py
@@ -71,7 +71,6 @@ class ClsPredictor(Predictor):
        output_names = self.paddle_predictor.get_output_names()
        output_tensor = self.paddle_predictor.get_output_handle(output_names[
            0])
-
        if self.benchmark:
            self.auto_logger.times.start()
        if not isinstance(images, (list, )):
@@ -119,7 +118,6 @@ def main(config):
                                                         ) == len(image_list):
            if len(batch_imgs) == 0:
                continue
-
            batch_results = cls_predictor.predict(batch_imgs)
            for number, result_dict in enumerate(batch_results):
                filename = batch_names[number]

--- a/deploy/python/preprocess.py
+++ b/deploy/python/preprocess.py
@@ -19,12 +19,14 @@ from __future__ import division
 from __future__ import print_function
 from __future__ import unicode_literals

+from functools import partial
 import six
 import math
 import random
 import cv2
 import numpy as np
 import importlib
+from PIL import Image

 from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize

@@ -50,6 +52,50 @@ def create_operators(params):
    return ops


+class UnifiedResize(object):
+    def __init__(self, interpolation=None, backend="cv2"):
+        _cv2_interp_from_str = {
+            'nearest': cv2.INTER_NEAREST,
+            'bilinear': cv2.INTER_LINEAR,
+            'area': cv2.INTER_AREA,
+            'bicubic': cv2.INTER_CUBIC,
+            'lanczos': cv2.INTER_LANCZOS4
+        }
+        _pil_interp_from_str = {
+            'nearest': Image.NEAREST,
+            'bilinear': Image.BILINEAR,
+            'bicubic': Image.BICUBIC,
+            'box': Image.BOX,
+            'lanczos': Image.LANCZOS,
+            'hamming': Image.HAMMING
+        }
+
+        def _pil_resize(src, size, resample):
+            pil_img = Image.fromarray(src)
+            pil_img = pil_img.resize(size, resample)
+            return np.asarray(pil_img)
+
+        if backend.lower() == "cv2":
+            if isinstance(interpolation, str):
+                interpolation = _cv2_interp_from_str[interpolation.lower()]
+            # compatible with opencv < version 4.4.0
+            elif not interpolation:
+                interpolation = cv2.INTER_LINEAR
+            self.resize_func = partial(cv2.resize, interpolation=interpolation)
+        elif backend.lower() == "pil":
+            if isinstance(interpolation, str):
+                interpolation = _pil_interp_from_str[interpolation.lower()]
+            self.resize_func = partial(_pil_resize, resample=interpolation)
+        else:
+            logger.warning(
+                f"The backend of Resize only supports \"cv2\" or \"PIL\". \"{backend}\" is unavailable. Use \"cv2\" instead."
+            )
+            self.resize_func = cv2.resize
+
+    def __call__(self, src, size):
+        return self.resize_func(src, size)
+
+
 class OperatorParamError(ValueError):
    """ OperatorParamError
    """
@@ -87,8 +133,11 @@ class DecodeImage(object):
 class ResizeImage(object):
    """ resize image """

-    def __init__(self, size=None, resize_short=None, interpolation=-1):
-        self.interpolation = interpolation if interpolation >= 0 else None
+    def __init__(self,
+                 size=None,
+                 resize_short=None,
+                 interpolation=None,
+                 backend="cv2"):
        if resize_short is not None and resize_short > 0:
            self.resize_short = resize_short
            self.w = None
@@ -101,6 +150,9 @@ class ResizeImage(object):
            raise OperatorParamError("invalid params for ReisizeImage for '\
                'both 'size' and 'resize_short' are None")

+        self._resize_func = UnifiedResize(
+            interpolation=interpolation, backend=backend)
+
    def __call__(self, img):
        img_h, img_w = img.shape[:2]
        if self.resize_short is not None:
@@ -110,10 +162,7 @@ class ResizeImage(object):
        else:
            w = self.w
            h = self.h
-        if self.interpolation is None:
-            return cv2.resize(img, (w, h))
-        else:
-            return cv2.resize(img, (w, h), interpolation=self.interpolation)
+        return self._resize_func(img, (w, h))


 class CropImage(object):
@@ -145,9 +194,12 @@ class CropImage(object):
 class RandCropImage(object):
    """ random crop image """

-    def __init__(self, size, scale=None, ratio=None, interpolation=-1):
-
-        self.interpolation = interpolation if interpolation >= 0 else None
+    def __init__(self,
+                 size,
+                 scale=None,
+                 ratio=None,
+                 interpolation=None,
+                 backend="cv2"):
        if type(size) is int:
            self.size = (size, size)  # (h, w)
        else:
@@ -156,6 +208,9 @@ class RandCropImage(object):
        self.scale = [0.08, 1.0] if scale is None else scale
        self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio

+        self._resize_func = UnifiedResize(
+            interpolation=interpolation, backend=backend)
+
    def __call__(self, img):
        size = self.size
        scale = self.scale
@@ -181,10 +236,8 @@ class RandCropImage(object):
        j = random.randint(0, img_h - h)

        img = img[j:j + h, i:i + w, :]
-        if self.interpolation is None:
-            return cv2.resize(img, size)
-        else:
-            return cv2.resize(img, size, interpolation=self.interpolation)
+
+        return self._resize_func(img, size)


 class RandFlipImage(object):
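Note on `UnifiedResize`: the class maps an interpolation name to a backend constant and binds it once with `functools.partial`, so `ResizeImage` and `RandCropImage` only ever call `resize_func(src, size)`. The sketch below illustrates that dispatch pattern only. The functions `fake_cv2_resize` and `fake_pil_resize`, the string constants, and `build_resize` are hypothetical stand-ins (assumptions, not OpenCV or Pillow APIs) so that the example runs without either library.

```python
from functools import partial


# Hypothetical stand-ins for cv2.resize and a PIL-based helper. They only
# report how they were called instead of resizing anything.
def fake_cv2_resize(src, size, interpolation="INTER_LINEAR"):
    return ("cv2", size, interpolation)


def fake_pil_resize(src, size, resample=None):
    return ("pil", size, resample)


# Placeholder tables; the real code maps names to cv2 / PIL.Image constants.
CV2_INTERP = {"nearest": "INTER_NEAREST", "bilinear": "INTER_LINEAR",
              "bicubic": "INTER_CUBIC"}
PIL_INTERP = {"nearest": "NEAREST", "bilinear": "BILINEAR",
              "bicubic": "BICUBIC"}


def build_resize(interpolation=None, backend="cv2"):
    """Return a callable resize(src, size) with the interpolation pre-bound."""
    backend = backend.lower()
    if backend == "cv2":
        if isinstance(interpolation, str):
            interpolation = CV2_INTERP[interpolation.lower()]
        elif not interpolation:
            interpolation = "INTER_LINEAR"  # explicit default
        return partial(fake_cv2_resize, interpolation=interpolation)
    if backend == "pil":
        if isinstance(interpolation, str):
            interpolation = PIL_INTERP[interpolation.lower()]
        return partial(fake_pil_resize, resample=interpolation)
    # Unknown backend: fall back to the cv2-style function with its default.
    return fake_cv2_resize


pil_call = build_resize("bicubic", backend="PIL")(None, (224, 224))
cv2_call = build_resize(backend="cv2")(None, (256, 256))
fallback_call = build_resize("nearest", backend="unknown")(None, (32, 32))
```

Note that on the fallback path the requested interpolation is dropped, which mirrors the `else` branch in the diff where `self.resize_func = cv2.resize` is assigned without binding `interpolation`.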

--- a/deploy/shell/predict.sh
+++ b/deploy/shell/predict.sh
 # classification
 python3.7 python/predict_cls.py -c configs/inference_cls.yaml

+# multilabel_classification
+#python3.7 python/predict_cls.py -c configs/inference_multilabel_cls.yaml
+
 # feature extractor
 # python3.7 python/predict_rec.py -c configs/inference_rec.yaml


--- a/docs/en/ImageNet_models_en.md
+++ b/docs/en/ImageNet_models_en.md
@@ -24,13 +24,13 @@ Accuracy and inference time of the prtrained models based on SSLD distillation a
 * Server-side distillation pretrained models

 | Model                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address                                                                                         |
-|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
+|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
 | ResNet34_vd_ssld         | 0.797    | 0.760  | 0.037  | 2.434               | 6.222              | 7.39     | 21.82     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams)         |
-| ResNet50_vd_<br>ssld | 0.830    | 0.792    | 0.039 | 3.531               | 8.090              | 8.67     | 25.58     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
-| ResNet101_vd_<br>ssld   | 0.837    | 0.802    | 0.035 |  6.117               | 13.762             | 16.1     | 44.57     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams)   |
-| Res2Net50_vd_<br>26w_4s_ssld | 0.831    | 0.798    | 0.033 |  4.527              | 9.657             | 8.37     | 25.06     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
-| Res2Net101_vd_<br>26w_4s_ssld | 0.839    | 0.806    | 0.033 | 8.087              | 17.312             | 16.67    | 45.22     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
-| Res2Net200_vd_<br>26w_4s_ssld | 0.851    | 0.812    | 0.049 | 14.678              | 32.350             | 31.49    | 76.21     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
+| ResNet50_vd_ssld | 0.830    | 0.792    | 0.039 | 3.531               | 8.090              | 8.67     | 25.58     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
+| ResNet101_vd_ssld   | 0.837    | 0.802    | 0.035 |  6.117               | 13.762             | 16.1     | 44.57     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams)   |
+| Res2Net50_vd_26w_4s_ssld | 0.831    | 0.798    | 0.033 |  4.527              | 9.657             | 8.37     | 25.06     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
+| Res2Net101_vd_26w_4s_ssld | 0.839    | 0.806    | 0.033 | 8.087              | 17.312             | 16.67    | 45.22     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
+| Res2Net200_vd_26w_4s_ssld | 0.851    | 0.812    | 0.049 | 14.678              | 32.350             | 31.49    | 76.21     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
 | HRNet_W18_C_ssld | 0.812    | 0.769   | 0.043 | 7.406          | 13.297         | 4.14     | 21.29     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
 | HRNet_W48_C_ssld | 0.836    | 0.790   | 0.046  | 13.707         | 34.435         | 34.58    | 77.47     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
 | SE_HRNet_W64_C_ssld | 0.848    |  -    |  - |  31.697      |     94.995      | 57.83    | 128.97    | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
@@ -38,19 +38,44 @@ Accuracy and inference time of the prtrained models based on SSLD distillation a

 * Mobile-side distillation pretrained models

-| Model                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | Download Address  |
+| Model                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Storage Size(M) | Download Address  |
+|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
+| MobileNetV1_ssld   | 0.779    | 0.710    | 0.069 |  32.523              | 1.11     | 4.19      | 16      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams)                 |
+| MobileNetV2_ssld                 | 0.767    | 0.722  | 0.045  | 23.318              | 0.6      | 3.44      | 14      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams)                 |
+| MobileNetV3_small_x0_35_ssld          | 0.556    | 0.530 | 0.026   | 2.635                 | 0.026    | 1.66      | 6.9     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams)          |
+| MobileNetV3_large_x1_0_ssld      | 0.790    | 0.753  | 0.036  | 19.308           | 0.45     | 5.47      | 21      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams)      |
+| MobileNetV3_small_x1_0_ssld      | 0.713    | 0.682  |  0.031  | 6.546                 | 0.123    | 2.94      | 12      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams)      |
+| GhostNet_x1_3_ssld                    | 0.794    | 0.757   | 0.037 | 19.983                | 0.44     | 7.3       | 29      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)  
+
+* Intel-CPU-side distillation pretrained models
+
+| Model                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain |  Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M)  | Download Address   |
 |---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
-| MobileNetV1_<br>ssld   | 0.779    | 0.710    | 0.069 |  32.523              | 1.11     | 4.19      | 16      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams)                 |
-| MobileNetV2_<br>ssld                 | 0.767    | 0.722  | 0.045  | 23.318              | 0.6      | 3.44      | 14      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams)                 |
-| MobileNetV3_<br>small_x0_35_ssld          | 0.556    | 0.530 | 0.026   | 2.635                 | 0.026    | 1.66      | 6.9     | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams)          |
-| MobileNetV3_<br>large_x1_0_ssld      | 0.790    | 0.753  | 0.036  | 19.308           | 0.45     | 5.47      | 21      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams)      |
-| MobileNetV3_small_<br>x1_0_ssld      | 0.713    | 0.682  |  0.031  | 6.546                 | 0.123    | 2.94      | 12      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams)      |
-| GhostNet_<br>x1_3_ssld                    | 0.794    | 0.757   | 0.037 | 19.983                | 0.44     | 7.3       | 29      | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)  
+| PPLCNet_x0_5_ssld   | 0.661    | 0.631    | 0.030 | 2.05     | 47     |   1.9   | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams)                 |
+| PPLCNet_x1_0_ssld   | 0.744    | 0.713    | 0.031 | 2.46     | 161     |   3.0  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams)                 |
+| PPLCNet_x2_5_ssld   | 0.808    | 0.766    | 0.042 | 5.39     | 906     |   9.0  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams)                 |


 * Note: `Reference Top-1 Acc` means accuracy of pretrained models which are trained on ImageNet1k dataset.


+<a name="PPLCNet_series"></a>
+### PPLCNet_series
+
+The accuracy and inference time of the PPLCNet series models are shown below. For more details, refer to the [PPLCNet series tutorial](../en/models/PPLCNet_en.md).
+
+| Model           | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | Download Address |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| PPLCNet_x0_25        |0.5186           | 0.7565   |  1.74      | 18    | 1.5  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
+| PPLCNet_x0_35        |0.5809           | 0.8083   |  1.92      | 29    | 1.6  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
+| PPLCNet_x0_5         |0.6314           | 0.8466   |  2.05      | 47    | 1.9  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
+| PPLCNet_x0_75        |0.6818           | 0.8830   |  2.29      | 99    | 2.4  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
+| PPLCNet_x1_0         |0.7132           | 0.9003   |  2.46      | 161   | 3.0  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
+| PPLCNet_x1_5         |0.7371           | 0.9153   |  3.19      | 342   | 4.5  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
+| PPLCNet_x2_0         |0.7518           | 0.9227   |  4.27      | 590   | 6.5  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
+| PPLCNet_x2_5         |0.7660           | 0.9300   |  5.39      | 906   | 9.0  | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
+
+
 <a name="ResNet_and_Vd_series"></a>
 ### ResNet and Vd series


--- a/docs/en/advanced_tutorials/multilabel/multilabel_en.md
+++ b/docs/en/advanced_tutorials/multilabel/multilabel_en.md
@@ -25,58 +25,68 @@ tar -xf NUS-SCENE-dataset.tar
 cd ../../
 ```

-## Environment
+## Training

-### Download pretrained model
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
+```

-You can use the following commands to download the pretrained model of ResNet50_vd.
+After training for 10 epochs, the best accuracy over the validation set should be around 0.95.
+
+## Evaluation

 ```bash
-mkdir pretrained
-cd pretrained
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
-cd ../
+python tools/eval.py \
+    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
+    -o Arch.pretrained="./output/MobileNetV1/best_model"
 ```

-## Training
+## Prediction

-```shell
-export CUDA_VISIBLE_DEVICES=0
-python -m paddle.distributed.launch \
-    --gpus="0" \
-    tools/train.py \
-        -c ./configs/quick_start/ResNet50_vd_multilabel.yaml
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
+    -o Arch.pretrained="./output/MobileNetV1/best_model"
 ```

-After training for 10 epochs, the best accuracy over the validation set should be around 0.72.
+You will get output such as the following:
+```
+[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]  
+```

-## Evaluation
+## Prediction based on prediction engine
+
+### Export model

 ```bash
-python tools/eval.py \
-    -c ./configs/quick_start/ResNet50_vd_multilabel.yaml \
-    -o pretrained_model="./output/ResNet50_vd/best_model/ppcls" \
-    -o load_static_weights=False
+python3 tools/export_model.py \
+    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
+    -o Arch.pretrained="./output/MobileNetV1/best_model"
 ```

-The metric of evaluation is based on mAP, which is commonly used in multilabel task to show model perfermance. The mAP over validation set should be around 0.57.
+By default, the inference model is exported to `./inference` under the current directory.

-## Prediction
+### Prediction based on prediction engine
+
+Enter the deploy directory:

 ```bash
-python tools/infer/infer.py \
-    -i "./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/0199_434752251.jpg" \
-    --model ResNet50_vd \
-    --pretrained_model "./output/ResNet50_vd/best_model/ppcls" \
-    --use_gpu True \
-    --load_static_weights False \
-    --multilabel True \
-    --class_num 33
+cd ./deploy
+```
+
+Run prediction with the prediction engine:
+
+```
+python3 python/predict_cls.py \
+     -c configs/inference_multilabel_cls.yaml
 ```

 You will get multiple output such as the following:
-```    
-    class id: 3, probability: 0.6025
-    class id: 23, probability: 0.5491
-    class id: 32, probability: 0.7006
-```
\ No newline at end of file
+
+```
+0517_2715693311.jpg:    class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
+```
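Note on the multilabel output above: with `multilabel=True`, `Topk.__call__` in `deploy/python/postprocess.py` keeps every class whose probability is at least 0.5 instead of the `topk` highest ones. The following is a minimal sketch of that difference. The function names and the sample probabilities are illustrative assumptions, not part of PaddleClas.

```python
import numpy as np


def threshold_ids(probs, threshold=0.5):
    """Multilabel selection: every class with probability >= threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.where(probs >= threshold)[0].astype("int32").tolist()


def topk_ids(probs, k):
    """Single-label selection: the k most probable classes, best first."""
    probs = np.asarray(probs, dtype=float)
    return probs.argsort(axis=0)[-k:][::-1].astype("int32").tolist()


# Hypothetical per-class probabilities for one image.
probs = [0.1, 0.95683, 0.3, 0.5567, 0.49]

multilabel_ids = threshold_ids(probs)
multilabel_scores = [round(probs[i], 2) for i in multilabel_ids]
top3_ids = topk_ids(probs, k=3)
```

For this input, `multilabel_ids` is `[1, 3]` with scores `[0.96, 0.56]`, while `top3_ids` is `[1, 3, 4]`: top-k always returns exactly `k` classes, even when one of them (class 4 at 0.49) falls below the threshold. The number of ids in a multilabel result therefore varies per image.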
--- a/docs/en/models/PPLCNet_en.md
+++ b/docs/en/models/PPLCNet_en.md
+# PPLCNet series
+
+## Overview
+
+The PPLCNet series is a family of networks proposed by the Baidu PaddleCV team that performs well on Intel CPUs. The authors summarize several methods that improve model accuracy on Intel CPUs while hardly increasing inference time, and combine them into a new network, PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same inference time. PPLCNet has also shown strong competitiveness in image classification, object detection, and semantic segmentation.
+
+
+
+## Accuracy, FLOPs and Parameters
+
+| Models           | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|
+| PPLCNet_x0_25        |0.5186           | 0.7565           | 18    | 1.5  |
+| PPLCNet_x0_35        |0.5809           | 0.8083           | 29    | 1.6  |
+| PPLCNet_x0_5         |0.6314           | 0.8466           | 47    | 1.9  |
+| PPLCNet_x0_75        |0.6818           | 0.8830           | 99    | 2.4  |
+| PPLCNet_x1_0         |0.7132           | 0.9003           | 161   | 3.0  |
+| PPLCNet_x1_5         |0.7371           | 0.9153           | 342   | 4.5  |
+| PPLCNet_x2_0         |0.7518           | 0.9227           | 590   | 6.5  |
+| PPLCNet_x2_5         |0.7660           | 0.9300           | 906   | 9.0  |
+| PPLCNet_x0_5_ssld    |0.6610           | 0.8646           | 47    | 1.9  |
+| PPLCNet_x1_0_ssld    |0.7439           | 0.9209           | 161   | 3.0  |
+| PPLCNet_x2_5_ssld    |0.8082           | 0.9533           | 906   | 9.0  |
+
+
+
+## Inference speed based on Intel(R)-Xeon(R)-Gold-6148-CPU
+
+| Models                 | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
+|------------------|-----------|-------------------|--------------------------|
+| PPLCNet_x0_25        | 224       | 256               | 1.74                    |
+| PPLCNet_x0_35        | 224       | 256               | 1.92                    |
+| PPLCNet_x0_5         | 224       | 256               | 2.05                    |
+| PPLCNet_x0_75        | 224       | 256               | 2.29                    |
+| PPLCNet_x1_0         | 224       | 256               | 2.46                    |
+| PPLCNet_x1_5         | 224       | 256               | 3.19                    |
+| PPLCNet_x2_0         | 224       | 256               | 4.27                    |
+| PPLCNet_x2_5         | 224       | 256               | 5.39                    |
+| PPLCNet_x0_5_ssld    | 224       | 256               | 2.05                    |
+| PPLCNet_x1_0_ssld    | 224       | 256               | 2.46                    |
+| PPLCNet_x2_5_ssld    | 224       | 256               | 5.39                    |
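Note on the `_ssld` rows: the accuracy table in `PPLCNet_en.md` lists Top-1 values for both the baseline and the SSLD-distilled variants, so the gain from distillation can be recomputed directly. The sketch below does that arithmetic; the dictionary layout and the helper name `acc_gain` are assumptions for illustration, and the numbers are copied from the table.

```python
# Top-1 accuracy copied from the PPLCNet accuracy table.
base_top1 = {
    "PPLCNet_x0_5": 0.6314,
    "PPLCNet_x1_0": 0.7132,
    "PPLCNet_x2_5": 0.7660,
}
ssld_top1 = {
    "PPLCNet_x0_5": 0.6610,
    "PPLCNet_x1_0": 0.7439,
    "PPLCNet_x2_5": 0.8082,
}


def acc_gain(name):
    """Top-1 gain of the SSLD-distilled model over its baseline, 3 decimals."""
    return round(ssld_top1[name] - base_top1[name], 3)


gains = {name: acc_gain(name) for name in base_top1}
```

With these inputs, `gains` is `{"PPLCNet_x0_5": 0.03, "PPLCNet_x1_0": 0.031, "PPLCNet_x2_5": 0.042}`. Since the inference-time table reports identical latency for each baseline and its `_ssld` counterpart, the distillation gain comes at no inference cost.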
--- a/docs/en/tutorials/getting_started_en.md
+++ b/docs/en/tutorials/getting_started_en.md
@@ -14,13 +14,13 @@ After preparing the configuration file, The training process can be started in t

 ```
 python tools/train.py \
-    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="" \
-    -o use_gpu=False
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Arch.pretrained=False \
+    -o Global.device=gpu
 ```

-Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters needed to be modified or added, `-o pretrained_model=""` means to not using pre-trained models.
-`-o use_gpu=True` means to use GPU for training. If you want to use the CPU for training, you need to set `use_gpu` to `False`.
+Among them, `-c` specifies the path of the configuration file and `-o` specifies the parameters to be modified or added. `-o Arch.pretrained=False` means that no pretrained model is used.
+`-o Global.device=gpu` means that the GPU is used for training. To train on the CPU, set `Global.device` to `cpu`.


 Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md).
@@ -54,12 +54,12 @@ After configuring the configuration file, you can finetune it by loading the pre

 ```
 python tools/train.py \
-    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \
-    -o use_gpu=True
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Arch.pretrained=True \
+    -o Global.device=gpu
 ```

-Among them, `-o pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
+Among them, `-o Arch.pretrained` controls which pretrained weights are loaded. Set it to the path of your own pretrained weights, or modify the path directly in the configuration file. You can also set it to `True` to use the weights pretrained on ImageNet1k.

 We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../models/models_intro_en.md).

@@ -69,28 +69,26 @@ If the training process is terminated for some reasons, you can also load the ch

 ```
 python tools/train.py \
-    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-    -o last_epoch=5 \
-    -o use_gpu=True
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
+    -o Global.device=gpu
 ```

-The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
+The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.

 **Note**:
-* The parameter `-o last_epoch=5` means to record the number of the last training epoch as `5`, that is, the number of this training epoch starts from `6`, , and the parameter defaults to `-1`, which means the number of this training epoch starts from `0`.

-* The `-o checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints as shown below during the training process. If you want to continue training from the epoch `5`, Just set the `checkpoints` to `./output/MobileNetV3_large_x1_0_gpupaddle/5/ppcls`, PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
+* The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoint files. The above training command generates the checkpoints shown below during training. To continue training from epoch `5`, set `Global.checkpoints` to `./output/MobileNetV3_large_x1_0/epoch_5`; PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.

    ```shell
-    output/
-    └── MobileNetV3_large_x1_0
-        ├── 0
-        │   ├── ppcls.pdopt
-        │   └── ppcls.pdparams
-        ├── 1
-        │   ├── ppcls.pdopt
-        │   └── ppcls.pdparams
+    output
+    ├── MobileNetV3_large_x1_0
+    │   ├── best_model.pdopt
+    │   ├── best_model.pdparams
+    │   ├── best_model.pdstates
+    │   ├── epoch_1.pdopt
+    │   ├── epoch_1.pdparams
+    │   ├── epoch_1.pdstates
        .
        .
        .
@@ -103,18 +101,15 @@ The model evaluation process can be started as follows.

 ```bash
 python tools/eval.py \
-    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-    -o load_static_weights=False
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
 ```

-The above command will use `./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model/ppcls`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
+The above command uses `./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also configure the evaluation by changing the parameters in the configuration file, or update the configuration with the `-o` parameter, as shown above.

 Some of the configurable evaluation parameters are described as follows:
-* `ARCHITECTURE.name`: Model name
-* `pretrained_model`: The path of the model file to be evaluated
-* `load_static_weights`: Whether the model to be evaluated is a static graph model
-
+* `Arch.name`: Model name
+* `Global.pretrained_model`: The path of the model file to be evaluated

 **Note:** If the model is a dygraph type, you only need to specify the prefix of the model file when loading the model, instead of specifying the suffix, such as [1.3 Resume Training](#13-resume-training).

@@ -125,26 +120,15 @@ If you want to run PaddleClas on Linux with GPU, it is highly recommended to use

 ### 2.1 Model training

-After preparing the configuration file, The training process can be started in the following way. `paddle.distributed.launch` specifies the GPU running card number by setting `selected_gpus`:
+After preparing the configuration file, the training process can be started in the following way. `paddle.distributed.launch` selects the GPU cards to run on through the `gpus` argument:

 ```bash
 export CUDA_VISIBLE_DEVICES=0,1,2,3

-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
-```
-
-The configuration can be updated by adding the `-o` parameter.
-
-```bash
-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
-    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-        -o pretrained_model="" \
-        -o use_gpu=True
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
 ```

 The format of output log information is the same as above, see [1.1 Model training](#11-model-training) for details.
@@ -156,14 +140,14 @@ After configuring the configuration file, you can finetune it by loading the pre
 ```
 export CUDA_VISIBLE_DEVICES=0,1,2,3

-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-        -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained"
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Arch.pretrained=True
 ```

-Among them, `pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
+Here, `Arch.pretrained` can be set to `True` or `False`, or to the path of a pretrained weights file. To load your own pretrained weights, set it to their path, either with `-o` on the command line or directly in the configuration file.

 There contains a lot of examples of model finetuning in [Quick Start](./quick_start_en.md). You can refer to this tutorial to finetune the model on a specific dataset.

@@ -175,26 +159,26 @@ If the training process is terminated for some reasons, you can also load the ch
 ```
 export CUDA_VISIBLE_DEVICES=0,1,2,3

-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-        -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-        -o last_epoch=5 \
-        -o use_gpu=True
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
+        -o Global.device=gpu
 ```

-The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter. About `last_epoch` parameter, please refer [1.3 Resume training](#13-resume-training) for details.
+The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which gives the path of the checkpoint. The model weights, learning rate, optimizer state and other information are loaded from it, as described in [1.3 Resume training](#13-resume-training).

 ### 2.4 Model evaluation

 The model evaluation process can be started as follows.

 ```bash
-python tools/eval.py \
-    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-    -o load_static_weights=False
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    tools/eval.py \
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
 ```

 About parameter description, see [1.4 Model evaluation](#14-model-evaluation) for details.
@@ -204,30 +188,16 @@ About parameter description, see [1.4 Model evaluation](#14-model-evaluation) fo
 After the training is completed, you can predict by using the pre-trained model obtained by the training, as follows:

 ```python
-python tools/infer/infer.py \
-    -i image path \
-    --model MobileNetV3_large_x1_0 \
-    --pretrained_model "./output/MobileNetV3_large_x1_0/best_model/ppcls" \
-    --use_gpu True \
-    --load_static_weights False
+python3 tools/infer.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
+    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
 ```

 Among them:
-+ `image_file`(i): The path of the image file to be predicted, such as `./test.jpeg`;
-+ `model`: Model name, such as `MobileNetV3_large_x1_0`;
-+ `pretrained_model`: Weight file path, such as `./pretrained/MobileNetV3_large_x1_0_pretrained/`;
-+ `use_gpu`: Whether to use the GPU, default by `True`;
-+ `load_static_weights`: Whether to load the pre-trained model obtained from static image training, default by `False`;
-+ `resize_short`: The length of the shortest side of the image that be scaled proportionally, default by `256`;
-+ `resize`: The side length of the image that be center cropped from resize_shorted image, default by `224`;
-+ `pre_label_image`: Whether to pre-label the image data, default value: `False`;
-+ `pre_label_out_idr`: The output path of pre-labeled image data. When `pre_label_image=True`, a lot of subfolders will be generated under the path, each subfolder represent a category, which stores all the images predicted by the model to belong to the category.
-
-**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.
-
-About more detailed infomation, you can refer to [infer.py](../../../tools/infer/infer.py).
+ `Infer.infer_imgs`: The path of the image file or folder to be predicted;
+ `Global.pretrained_model`: Weight file path, such as `./output/MobileNetV3_large_x1_0/best_model`;

-<a name="model_inference"></a>
 ## 4. Use the inference model to predict

 PaddlePaddle supports inference using prediction engines, which will be introduced next.
@@ -235,41 +205,38 @@ PaddlePaddle supports inference using prediction engines, which will be introduc
 Firstly, you should export inference model using `tools/export_model.py`.

 ```bash
-python tools/export_model.py \
-    --model MobileNetV3_large_x1_0 \
-    --pretrained_model ./output/MobileNetV3_large_x1_0/best_model/ppcls \
-    --output_path ./inference \
-    --class_dim 1000
+python3 tools/export_model.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
 ```

-Among them, the `--model` parameter is used to specify the model name, `--pretrained_model` parameter is used to specify the model file path, the path does not need to include the model file suffix name, and `--output_path` is used to specify the storage path of the converted model, class_dim means number of class for the model, default as 1000.
-
-**Note**:
-1. If `--output_path=./inference`, then three files will be generated in the folder `inference`, they are `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info`.
-2. You can specify the `shape` of the model input image by setting the parameter `--img_size`, the default is `224`, which means the shape of input image is `224*224`. If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, you need to set `--img_size=384`.
+Here, the `Global.pretrained_model` parameter specifies the path of the model file, without the file suffix.

 The above command will generate the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), and then the inference engine can be used for inference:

+Go to the deploy directory:
+
+```
+cd deploy
+```
+
+Run prediction with the inference engine. Because the class-ID mapping file of the ImageNet1k dataset is used by default and this example predicts on a different dataset, set `PostProcess.Topk.class_id_map_file` to `None`.
+
 ```bash
-python tools/infer/predict.py \
-    --image_file image path \
-    --model_file "./inference/inference.pdmodel" \
-    --params_file "./inference/inference.pdiparams" \
-    --use_gpu=True \
-    --use_tensorrt=False
+python3 python/predict_cls.py \
+    -c configs/inference_cls.yaml \
+    -o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
+    -o Global.inference_model_dir=../inference/ \
+    -o PostProcess.Topk.class_id_map_file=None
 ```
 Among them:
-+ `image_file`: The path of the image file to be predicted, such as `./test.jpeg`;
-+ `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/inference.pdmodel`;
-+ `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/inference.pdiparams`;
-+ `use_tensorrt`: Whether to use the TesorRT, default by `True`;
-+ `use_gpu`: Whether to use the GPU, default by `True`
-+ `enable_mkldnn`: Wheter to use `MKL-DNN`, default by `False`. When both `use_gpu` and `enable_mkldnn` are set to `True`, GPU is used to run and `enable_mkldnn` will be ignored.
-+ `resize_short`: The length of the shortest side of the image that be scaled proportionally, default by `256`;
-+ `resize`: The side length of the image that be center cropped from resize_shorted image, default by `224`;
-+ `enable_calc_topk`: Whether to calculate top-k accuracy of the predction, default by `False`. Top-k accuracy will be printed out when set as `True`.
-+ `gt_label_path`: Image name and label file, used when `enable_calc_topk` is `True` to get image list and labels.
+ `Global.infer_imgs`: The path of the image file to be predicted;
+ `Global.inference_model_dir`: The directory containing the inference model files, such as `../inference/`;
+ `Global.use_tensorrt`: Whether to use TensorRT, default: `False`;
+ `Global.use_gpu`: Whether to use the GPU, default: `True`;
+ `Global.enable_mkldnn`: Whether to use `MKL-DNN`, default: `False`. It takes effect only when `Global.use_gpu` is `False`;
+ `Global.use_fp16`: Whether to enable FP16, default: `False`.

 **Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.

-If you want to evaluate the speed of the model, it is recommended to use [predict.py](../../../tools/infer/predict.py), and enable TensorRT to accelerate.
+To evaluate the speed of the model, it is recommended to enable TensorRT on GPU and MKL-DNN on CPU.
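
The `-o` overrides used throughout this tutorial address nested configuration fields with dotted keys such as `Global.pretrained_model`. The following is a minimal sketch of how such a dotted key could be applied to a nested dictionary. `apply_override` and `parse_value` are hypothetical helpers written for illustration, not PaddleClas's actual implementation, and every value other than `True`, `False` and `None` is kept as a plain string.

```python
def parse_value(text):
    """Map the literals True/False/None to Python objects; keep anything else as a string."""
    literals = {"True": True, "False": False, "None": None}
    return literals.get(text, text)


def apply_override(config, override):
    """Apply one 'A.B.C=value' override to a nested dict, creating missing levels."""
    dotted_key, _, raw_value = override.partition("=")
    keys = dotted_key.split(".")
    node = config
    # Walk down to the parent of the final key, creating intermediate dicts as needed.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = parse_value(raw_value)
    return config


# Hypothetical starting configuration, standing in for a loaded YAML file.
config = {"Global": {"device": "cpu", "pretrained_model": None}}
apply_override(config, "Global.device=gpu")
apply_override(config, "Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model")
apply_override(config, "PostProcess.Topk.class_id_map_file=None")
print(config)
```

Keeping overrides as dotted keys means one command line can adjust any level of the configuration without editing the YAML file, which is why the same `-o` form appears for training, evaluation, export and inference.
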
--- a/docs/en/tutorials/getting_started_retrieval_en.md
+++ b/docs/en/tutorials/getting_started_retrieval_en.md
@@ -120,7 +120,7 @@ python3 tools/train.py \

 `-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added, where `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model, in addition, `Arch.Backbone.pretrained` can also specify backbone.`pretrained` can also specify the address of a specific model weight file, which needs to be replaced with the path to your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use a CPU for training, you need to set `Global.device` to `cpu`.

-For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_en.md) for specific configuration parameters.
+For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_description_en.md) for specific configuration parameters.

 Run the above commands to check the output log, an example is as follows:


--- a/docs/images/wx_group.png
+++ b/docs/images/wx_group.png
--- a/docs/zh_CN/ImageNet_models_cn.md
+++ b/docs/zh_CN/ImageNet_models_cn.md
@@ -31,9 +31,9 @@
 | 模型                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址                                                                                         |
 |---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
 | ResNet34_vd_ssld         | 0.797    | 0.760  | 0.037  | 2.434               | 6.222              | 7.39     | 21.82     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams)         |
-| ResNet50_vd_<br>ssld | 0.830    | 0.792    | 0.039 | 3.531               | 8.090              | 8.67     | 25.58     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
-| ResNet101_vd_<br>ssld   | 0.837    | 0.802    | 0.035 |  6.117               | 13.762             | 16.1     | 44.57     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams)   |
-| Res2Net50_vd_<br>26w_4s_ssld | 0.831    | 0.798    | 0.033 |  4.527              | 9.657             | 8.37     | 25.06     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
+| ResNet50_vd_ssld | 0.830    | 0.792    | 0.039 | 3.531               | 8.090              | 8.67     | 25.58     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
+| ResNet101_vd_ssld   | 0.837    | 0.802    | 0.035 |  6.117               | 13.762             | 16.1     | 44.57     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams)   |
+| Res2Net50_vd_26w_4s_ssld | 0.831    | 0.798    | 0.033 |  4.527              | 9.657             | 8.37     | 25.06     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
 | Res2Net101_vd_<br>26w_4s_ssld | 0.839    | 0.806    | 0.033 | 8.087              | 17.312             | 16.67    | 45.22     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
 | Res2Net200_vd_<br>26w_4s_ssld | 0.851    | 0.812    | 0.049 | 14.678              | 32.350             | 31.49    | 76.21     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
 | HRNet_W18_C_ssld | 0.812    | 0.769   | 0.043 | 7.406          | 13.297         | 4.14     | 21.29     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
@@ -45,16 +45,44 @@

 | 模型                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址   |
 |---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
-| MobileNetV1_<br>ssld   | 0.779    | 0.710    | 0.069 |  32.523              | 1.11     | 4.19      | 16      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams)                 |
-| MobileNetV2_<br>ssld                 | 0.767    | 0.722  | 0.045  | 23.318              | 0.6      | 3.44      | 14      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams)                 |
-| MobileNetV3_<br>small_x0_35_ssld          | 0.556    | 0.530 | 0.026   | 2.635                 | 0.026    | 1.66      | 6.9     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams)          |
-| MobileNetV3_<br>large_x1_0_ssld      | 0.790    | 0.753  | 0.036  | 19.308           | 0.45     | 5.47      | 21      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams)      |
-| MobileNetV3_small_<br>x1_0_ssld      | 0.713    | 0.682  |  0.031  | 6.546                 | 0.123    | 2.94      | 12      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams)      |
-| GhostNet_<br>x1_3_ssld                    | 0.794    | 0.757   | 0.037 | 19.983                | 0.44     | 7.3       | 29      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)               |
+| MobileNetV1_ssld   | 0.779    | 0.710    | 0.069 |  32.523              | 1.11     | 4.19      | 16      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams)                 |
+| MobileNetV2_ssld                 | 0.767    | 0.722  | 0.045  | 23.318              | 0.6      | 3.44      | 14      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams)                 |
+| MobileNetV3_small_x0_35_ssld          | 0.556    | 0.530 | 0.026   | 2.635                 | 0.026    | 1.66      | 6.9     | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams)          |
+| MobileNetV3_large_x1_0_ssld      | 0.790    | 0.753  | 0.036  | 19.308           | 0.45     | 5.47      | 21      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams)      |
+| MobileNetV3_small_x1_0_ssld      | 0.713    | 0.682  |  0.031  | 6.546                 | 0.123    | 2.94      | 12      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams)      |
+| GhostNet_x1_3_ssld                    | 0.794    | 0.757   | 0.037 | 19.983                | 0.44     | 7.3       | 29      | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)               |
+
+
+* Intel CPU端知识蒸馏模型
+
+| 模型                  | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain |  Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M)  | 下载地址   |
+|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
+| PPLCNet_x0_5_ssld   | 0.661    | 0.631    | 0.030 | 2.05     | 47     |   1.9   | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams)                 |
+| PPLCNet_x1_0_ssld   | 0.744    | 0.713    | 0.031 | 2.46     | 161     |   3.0  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams)                 |
+| PPLCNet_x2_5_ssld   | 0.808    | 0.766    | 0.042 | 5.39     | 906     |   9.0  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams)                 |
+
+


 * 注: `Reference Top-1 Acc`表示PaddleClas基于ImageNet1k数据集训练得到的预训练模型精度。

+<a name="PPLCNet系列"></a>
+### PPLCNet系列
+
+PPLCNet系列模型的精度、速度指标如下表所示，更多关于该系列的模型介绍可以参考：[PPLCNet系列模型文档](./models/PPLCNet.md)。
+
+| 模型           | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | 下载地址 |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| PPLCNet_x0_25        |0.5186           | 0.7565   |  1.74      | 18    | 1.5  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
+| PPLCNet_x0_35        |0.5809           | 0.8083   |  1.92      | 29    | 1.6  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
+| PPLCNet_x0_5         |0.6314           | 0.8466   |  2.05      | 47    | 1.9  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
+| PPLCNet_x0_75        |0.6818           | 0.8830   |  2.29      | 99    | 2.4  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
+| PPLCNet_x1_0         |0.7132           | 0.9003   |  2.46      | 161   | 3.0  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
+| PPLCNet_x1_5         |0.7371           | 0.9153   |  3.19      | 342   | 4.5  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
+| PPLCNet_x2_0         |0.7518           | 0.9227   |  4.27      | 590   | 6.5  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
+| PPLCNet_x2_5         |0.7660           | 0.9300   |  5.39      | 906   | 9.0  | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
+
+
 <a name="ResNet及其Vd系列"></a>
 ### ResNet及其Vd系列

@@ -429,7 +457,7 @@ ViT（Vision Transformer）与DeiT（Data-efficient Image Transformers）系列

 | 模型       | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址                                                     |
 | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
-| TNT_small | 0.8121   |0.9563  |                  |                  | 5.2   |  23.8    | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) |               |   
+| TNT_small | 0.8121   |0.9563  |                  |                  | 5.2   |  23.8    | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) |               |  

 **注**：TNT模型的数据预处理部分`NormalizeImage`中的`mean`与`std`均为0.5。


--- a/docs/zh_CN/advanced_tutorials/multilabel/multilabel.md
+++ b/docs/zh_CN/advanced_tutorials/multilabel/multilabel.md
@@ -25,58 +25,66 @@ tar -xf NUS-SCENE-dataset.tar
 cd ../../
 ```

-## 二、环境准备
+## 二、模型训练

-### 2.1 下载预训练模型
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
+```
+
+训练10epoch之后，验证集最好的正确率应该在0.95左右。

-本例展示基于ResNet50_vd模型的多标签分类流程，因此首先下载ResNet50_vd的预训练模型
+## 三、模型评估

 ```bash
-mkdir pretrained
-cd pretrained
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
-cd ../
+python3 tools/eval.py \
+    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
+    -o Arch.pretrained="./output/MobileNetV1/best_model"
 ```

-## 三、模型训练
+## 四、模型预测

-```shell
-export CUDA_VISIBLE_DEVICES=0
-python -m paddle.distributed.launch \
-    --gpus="0" \
-    tools/train.py \
-        -c ./configs/quick_start/ResNet50_vd_multilabel.yaml
+```bash
+python3 tools/infer.py \
+    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
+    -o Arch.pretrained="./output/MobileNetV1/best_model"
+```
+
+得到类似下面的输出：
+```  
+[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
 ```

-训练10epoch之后，验证集最好的正确率应该在0.72左右。
+## 五、基于预测引擎预测

-## 四、模型评估
+### 5.1 导出inference model

 ```bash
-python tools/eval.py \
-    -c ./configs/quick_start/ResNet50_vd_multilabel.yaml \
-    -o pretrained_model="./output/ResNet50_vd/best_model/ppcls" \
-    -o load_static_weights=False
+python3 tools/export_model.py \
+    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
+    -o Arch.pretrained="./output/MobileNetV1/best_model"
 ```
+inference model的路径默认在当前路径下`./inference`

-评估指标采用mAP，验证集的mAP应该在0.57左右。
+### 5.2 基于预测引擎预测

-## 五、模型预测
+首先进入deploy目录下：

 ```bash
-python tools/infer/infer.py \
-    -i "./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/0199_434752251.jpg" \
-    --model ResNet50_vd \
-    --pretrained_model "./output/ResNet50_vd/best_model/ppcls" \
-    --use_gpu True \
-    --load_static_weights False \
-    --multilabel True \
-    --class_num 33
+cd ./deploy
+```
+
+通过预测引擎推理预测：
+
+```
+python3 python/predict_cls.py \
+     -c configs/inference_multilabel_cls.yaml
 ```

 得到类似下面的输出：
-```    
-    class id: 3, probability: 0.6025
-    class id: 23, probability: 0.5491
-    class id: 32, probability: 0.7006
-```
\ No newline at end of file
+```
+0517_2715693311.jpg:    class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
+```
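
The prediction output above lists only a few class ids, each with a score above 0.5, which is consistent with scoring every class independently and keeping those over a threshold. A minimal sketch of that idea follows. The function name `multilabel_postprocess`, the sigmoid activation and the default threshold of 0.5 are assumptions made for illustration, not taken from the PaddleClas source.

```python
import math


def multilabel_postprocess(logits, threshold=0.5):
    """Turn raw per-class logits into the ids and scores of classes above the threshold."""
    # Each class is scored independently with a sigmoid (no softmax across classes).
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    kept = [(i, round(s, 5)) for i, s in enumerate(scores) if s >= threshold]
    return {
        "class_ids": [i for i, _ in kept],
        "scores": [s for _, s in kept],
    }


# Hypothetical logits for a 4-class problem.
result = multilabel_postprocess([3.1, -2.0, 0.2, -0.4])
print(result)
```

Unlike single-label classification, where exactly one class is returned, the number of returned classes here varies per image and can be zero when no score reaches the threshold.
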
--- a/docs/zh_CN/faq_series/faq_2021_s2.md
+++ b/docs/zh_CN/faq_series/faq_2021_s2.md
@@ -7,7 +7,7 @@
 * 图像分类、识别、检索领域大佬众多，模型和论文更新速度也很快，本文档回答主要依赖有限的项目实践，难免挂一漏万，如有遗漏和不足，也希望有识之士帮忙补充和修正，万分感谢。

 ## 目录
-* [近期更新](#近期更新)(2021.08.11)
+* [近期更新](#近期更新)(2021.09.08)
 * [精选](#精选)
 * [1. 理论篇](#1.理论篇)
    * [1.1 PaddleClas基础知识](#1.1PaddleClas基础知识)
@@ -27,60 +27,69 @@
 <a name="近期更新"></a>
 ## 近期更新

-#### Q2.6.2: 导出inference模型进行预测部署，准确率异常，为什么呢？
-**A**: 该问题通常是由于在导出时未能正确加载模型参数导致的，首先检查模型导出时的日志，是否存在类似下述内容：
-```
-UserWarning: Skip loading for ***. *** is not found in the provided dict.
-```
-如果存在，则说明模型权重未能加载成功，请进一步检查配置文件中的 `Global.pretrained_model` 字段，是否正确配置了模型权重文件的路径。模型权重文件后缀名通常为 `pdparams`，注意在配置该路径时无需填写文件后缀名。
+#### Q2.1.7: 在训练时，出现如下报错信息：`ERROR: Unexpected segmentation fault encountered in DataLoader workers.`，如何排查解决问题呢？
+**A**：尝试将训练配置文件中的字段 `num_workers` 设置为 `0`；尝试将训练配置文件中的字段 `batch_size` 调小一些；检查数据集格式和配置文件中的数据集路径是否正确。

-#### Q2.1.4: 数据预处理中，不想对输入数据进行裁剪，该如何设置？或者如何设置剪裁的尺寸。
-**A**: PaddleClas 支持的数据预处理算子可在这里查看：`ppcls/data/preprocess/__init__.py`，所有支持的算子均可在配置文件中进行配置，配置的算子名称需要和算子类名一致，参数与对应算子类的构造函数参数一致。如不需要对图像裁剪，则可去掉 `CropImage`、`RandCropImage`，使用 `ResizeImage` 替换即可，可通过其参数设置不同的resize方式， 使用 `size` 参数则直接将图像缩放至固定大小，使用`resize_short` 参数则会维持图像宽高比进行缩放。设置裁剪尺寸时，可通过 `CropImage` 算子的 `size` 参数，或 `RandCropImage` 算子的 `size` 参数。
+#### Q2.1.8: 如何在训练时使用 `Mixup` 和 `Cutmix` ？
+**A**：
+* `Mixup` 的使用方法请参考 [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65)；`Cutmix` 请参考 [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65)。

-#### Q1.1.3: Momentum 优化器中的 momentum 参数是什么意思呢？
-**A**: Momentum 优化器是在 SGD 优化器的基础上引入了“动量”的概念。在 SGD 优化器中，在 `t+1` 时刻，参数 `w` 的更新可表示为：
-```latex
-w_t+1 = w_t - lr * grad
-```
-其中，`lr` 为学习率，`grad` 为此时参数 `w` 的梯度。在引入动量的概念后，参数 `w` 的更新可表示为：
-```latex
-v_t+1 = m * v_t + lr * grad
-w_t+1 = w_t - v_t+1
+* 在使用 `Mixup` 或 `Cutmix` 时，需要注意：
+    * 配置文件中的 `Loss.Train.CELoss` 需要修改为 `Loss.Train.MixCELoss`，可参考 [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26)；
+    * 使用 `Mixup` 或 `Cutmix` 做训练时无法计算训练的精度（Acc）指标，因此需要在配置文件中取消 `Metric.Train.TopkAcc` 字段，可参考 [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128)。
+
+#### Q2.1.9: 训练配置yaml文件中，字段 `Global.pretrained_model` 和 `Global.checkpoints` 分别用于配置什么呢？
+**A**：
+* 当需要 `fine-tune` 时，可以通过字段 `Global.pretrained_model` 配置预训练模型权重文件的路径，预训练模型权重文件后缀名通常为 `.pdparams`；
+* 在训练过程中，训练程序会自动保存每个epoch结束时的断点信息，包括优化器信息 `.pdopt` 和模型权重信息 `.pdparams`。在训练过程意外中断等情况下，需要恢复训练时，可以通过字段 `Global.checkpoints` 配置训练过程中保存的断点信息文件，例如通过配置 `checkpoints: ./output/ResNet18/epoch_18` 即可恢复18epoch训练结束时的断点信息，PaddleClas将自动加载 `epoch_18.pdopt` 和 `epoch_18.pdparams`，从19epoch继续训练。
+
+#### Q2.6.3: 如何将模型转为 `ONNX` 格式？
+**A**：Paddle支持两种转ONNX格式模型的方式，且依赖于 `paddle2onnx` 工具，首先需要安装 `paddle2onnx`：
+
+```shell
+pip install paddle2onnx
 ```
-其中，`m` 即为动量 `momentum`，表示累积动量的加权值，一般取 `0.9`，当取值小于 `1` 时，则越早期的梯度对当前的影响越小，例如，当动量参数 `m` 取 `0.9` 时，在 `t` 时刻，`t-5` 的梯度加权值为 `0.9 ^ 5 = 0.59049`，而 `t-2` 时刻的梯度加权值为 `0.9 ^ 2 = 0.81`。因此，太过“久远”的梯度信息对当前的参考意义很小，而“最近”的历史梯度信息对当前影响更大，这也是符合直觉的。

-<div align="center">
-    <img src="../../images/faq/momentum.jpeg" width="400">
-</div>
+* 从 inference model 转为 ONNX 格式模型：

-*该图来自 `https://blog.csdn.net/tsyccnh/article/details/76270707`*
+    以动态图导出的 `combined` 格式 inference model（包含 `.pdmodel` 和 `.pdiparams` 两个文件）为例，使用以下命令进行模型格式转换：
+    ```shell
+    paddle2onnx --model_dir ${model_path}  --model_filename  ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
+    ```
+    上述命令中：
+    * `model_dir`：该参数下需要包含 `.pdmodel` 和 `.pdiparams` 两个文件；
+    * `model_filename`：该参数用于指定参数 `model_dir` 下的 `.pdmodel` 文件路径；
+    * `params_filename`：该参数用于指定参数 `model_dir` 下的 `.pdiparams` 文件路径；
+    * `save_file`：该参数用于指定转换后的 ONNX 模型文件的保存路径。

-通过引入动量的概念，在参数更新时考虑了历史更新的影响，因此可以加快收敛速度，也改善了 `SGD` 优化器带来的损失（cost、loss）震荡问题。
+    关于静态图导出的非 `combined` 格式的 inference model（通常包含文件 `__model__` 和多个参数文件）转换模型格式，以及更多参数说明请参考 paddle2onnx 官方文档 [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9)。

-#### Q1.1.4: PaddleClas 是否有 `Fixing the train-test resolution discrepancy` 这篇论文的实现呢？
-**A**: 目前 PaddleClas 没有实现。如果需要，可以尝试自己修改代码。简单来说，该论文所提出的思想是使用较大分辨率作为输入，对已经训练好的模型最后的FC层进行fine-tune。具体操作上，首先在较低分辨率的数据集上对模型网络进行训练，完成训练后，对网络除最后的FC层外的其他层的权重设置参数 `stop_gradient=True`，然后使用较大分辨率的输入对网络进行fine-tune训练。
+* 直接从模型组网代码导出ONNX格式模型：

-#### Q1.6.2: PaddleClas 图像识别用于 Eval 的配置文件中，`Query` 和 `Gallery` 配置具体是用于做什么呢？
-**A**: `Query` 与 `Gallery` 均为数据集配置，其中 `Gallery` 用于配置底库数据，`Query` 用于配置验证集。在进行 Eval 时，首先使用模型对 `Gallery` 底库数据进行前向计算特征向量，特征向量用于构建底库，然后模型对 `Query` 验证集中的数据进行前向计算特征向量，再与底库计算召回率等指标。
+    以动态图模型组网代码为例，模型类为继承于 `paddle.nn.Layer` 的子类，代码如下所示：

-#### Q2.1.5: PaddlePaddle 安装后，使用报错，无法导入 paddle 下的任何模块（import paddle.xxx），是为什么呢？
-**A**: 首先可以使用以下代码测试 Paddle 是否安装正确：
-```python
-import paddle
-paddle.utils.install_check.run_check(）
-```
-正确安装时，通常会有如下提示：
-```
-PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
-```
-如未能安装成功，则会有相应问题的提示。
-另外，在同时安装CPU版本和GPU版本Paddle后，由于两个版本存在冲突，需要将两个版本全部卸载，然后重新安装所需要的版本。
+    ```python
+    import paddle
+    from paddle.static import InputSpec

-#### Q2.1.6: 使用PaddleClas训练时，如何设置仅保存最优模型？不想保存中间模型。
-**A**: PaddleClas在训练过程中，会保存/更新以下三类模型：
-1. 最新的模型（`latest.pdopt`， `latest.pdparams`，`latest.pdstates`），当训练意外中断时，可使用最新保存的模型恢复训练；
-2. 最优的模型（`best_model.pdopt`，`best_model.pdparams`，`best_model.pdstates`）；
-3. 训练过程中，一个epoch结束时的断点（`epoch_xxx.pdopt`，`epoch_xxx.pdparams`，`epoch_xxx.pdstates`）。训练配置文件中 `Global.save_interval` 字段表示该模型的保存间隔。将该字段设置大于总epochs数，则不再保存中间断点模型。
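+    class SimpleNet(paddle.nn.Layer):
+        def __init__(self):
+            super().__init__()  # 必须先调用父类构造函数
+            # 示例网络层，请替换为实际的组网代码
+            self.fc = paddle.nn.Linear(3 * 224 * 224, 10)
+        def forward(self, x):
+            return self.fc(paddle.flatten(x, start_axis=1))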
+
+    net = SimpleNet()
+    x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
+    paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
+    ```
+    其中：
+    * `InputSpec()` 函数用于描述模型输入的签名信息，包括输入数据的 `shape`、`dtype` 和 `name`（可省略）；
+    * `paddle.onnx.export()` 函数需要指定模型组网对象 `layer`（此处为 `net`），导出模型的保存路径 `path`，模型的输入数据描述 `input_spec`。
+
+    需要注意，`paddlepaddle` 版本需大于 `2.0.0`。关于 `paddle.onnx.export()` 函数的更多参数说明请参考[paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export)。
+
+#### Q2.5.4: 在 build 检索底库时，参数 `pq_size` 应该如何设置？
+**A**：`pq_size` 是PQ检索算法的参数。PQ检索算法可以简单理解为“分层”检索算法，`pq_size` 是每层的“容量”，因此该参数的设置会影响检索性能，不过，在底库总数据量不太大（小于10000张）的情况下，这个参数对性能的影响很小，因此对于大多数使用场景而言，在构建底库时无需修改该参数。关于PQ检索算法的更多内容，可以查看相关[论文](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf)。

 <a name="精选"></a>
 ## 精选
@@ -204,6 +213,22 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
 2. 最优的模型（`best_model.pdopt`，`best_model.pdparams`，`best_model.pdstates`）；
 3. 训练过程中，一个epoch结束时的断点（`epoch_xxx.pdopt`，`epoch_xxx.pdparams`，`epoch_xxx.pdstates`）。训练配置文件中 `Global.save_interval` 字段表示该模型的保存间隔。将该字段设置大于总epochs数，则不再保存中间断点模型。

+#### Q2.1.7: 在训练时，出现如下报错信息：`ERROR: Unexpected segmentation fault encountered in DataLoader workers.`，如何排查解决问题呢？
+**A**：尝试将训练配置文件中的字段 `num_workers` 设置为 `0`；尝试将训练配置文件中的字段 `batch_size` 调小一些；检查数据集格式和配置文件中的数据集路径是否正确。
+
+#### Q2.1.8: 如何在训练时使用 `Mixup` 和 `Cutmix` ？
+**A**：
+* `Mixup` 的使用方法请参考 [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65)；`Cutmix` 请参考 [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65)。
+
+* 在使用 `Mixup` 或 `Cutmix` 时，需要注意：
+    * 配置文件中的 `Loss.Tranin.CELoss` 需要修改为 `Loss.Tranin.MixCELoss`，可参考 [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26)；
+    * 使用 `Mixup` 或 `Cutmix` 做训练时无法计算训练的精度（Acc）指标，因此需要在配置文件中取消 `Metric.Train.TopkAcc` 字段，可参考 [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128)。
+
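+#### Q2.1.9: In the training YAML config file, what are the `Global.pretrained_model` and `Global.checkpoints` fields used for?
+**A**:
+* For `fine-tune`, set `Global.pretrained_model` to the path of the pretrained weights file, which usually has the suffix `.pdparams`;
+* During training, the program automatically saves a checkpoint at the end of each epoch, including the optimizer state `.pdopt` and the model weights `.pdparams`. To resume after training is interrupted unexpectedly, set `Global.checkpoints` to the saved checkpoint. For example, `checkpoints: ./output/ResNet18/epoch_18` restores the checkpoint saved at the end of epoch 18: PaddleClas automatically loads `epoch_18.pdopt` and `epoch_18.pdparams` and continues training from epoch 19.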
+
 <a name="2.2图像分类"></a>
 ### 2.2 Image classification

@@ -255,6 +280,9 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
 #### Q2.5.3: Recompiling index.so on Mac fails with: clang: error: unsupported option '-fopenmp'. How should this be handled?
 **A**: This issue has been resolved. Recompile index.so by following the [documentation](../../../develop/deploy/vector_search/README.md).

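+#### Q2.5.4: How should the `pq_size` parameter be set when building the retrieval gallery?
+**A**: `pq_size` is a parameter of the PQ retrieval algorithm. PQ can be loosely understood as a "layered" retrieval algorithm, and `pq_size` is the "capacity" of each layer, so its value affects retrieval performance. When the gallery is not very large (fewer than 10,000 images), however, the parameter has little effect on performance, so in most use cases it does not need to be changed when building the gallery. For more on the PQ retrieval algorithm, see the related [paper](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf).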
+
 <a name="2.6模型预测部署"></a>
 ### 2.6 Model inference and deployment

@@ -267,3 +295,48 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
 UserWarning: Skip loading for ***. *** is not found in the provided dict.
 ```
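 If this warning appears, the model weights were not loaded successfully. Check whether the `Global.pretrained_model` field in the config file is set to the correct path of the weights file. Weights files usually have the suffix `pdparams`; note that the suffix must be omitted when configuring this path.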
+
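+#### Q2.6.3: How can a model be converted to `ONNX` format?
+**A**: Paddle supports two ways of converting a model to ONNX format, both of which rely on the `paddle2onnx` tool. First install `paddle2onnx`: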
+
+```shell
+pip install paddle2onnx
+```
+
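+* Converting an inference model to ONNX format:
+
+    Taking a `combined`-format inference model exported from a dynamic graph (consisting of the two files `.pdmodel` and `.pdiparams`) as an example, convert the model format with the following command: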
+    ```shell
+    paddle2onnx --model_dir ${model_path}  --model_filename  ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
+    ```
+    In the command above:
+    * `model_dir`: the directory that contains the `.pdmodel` and `.pdiparams` files;
+    * `model_filename`: the path of the `.pdmodel` file under `model_dir`;
+    * `params_filename`: the path of the `.pdiparams` file under `model_dir`;
+    * `save_file`: the path where the converted model is saved.
+
+    For converting a non-`combined` inference model exported from a static graph (usually a `__model__` file plus several parameter files), and for more parameter details, see the official paddle2onnx documentation: [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9).
+
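+* Exporting an ONNX model directly from the model definition code:
+
+    Taking dynamic-graph model code as an example, where the model class is a subclass of `paddle.nn.Layer`, the code looks like this: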
+
+    ```python
+    import paddle
+    from paddle.static import InputSpec
+
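+    class SimpleNet(paddle.nn.Layer):
+        def __init__(self):
+            super().__init__()
+            # Placeholder layer for illustration; replace with the real network.
+            self.conv = paddle.nn.Conv2D(3, 8, kernel_size=3)
+        def forward(self, x):
+            return self.conv(x)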
+
+    net = SimpleNet()
+    x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
+    paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
+    ```
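+    Where:
+    * `InputSpec()` describes the signature of the model input, including its `shape`, `dtype`, and `name` (optional);
+    * `paddle.onnx.export()` takes the model object `net` (argument `layer`), the save path of the exported model (argument `path`), and the input description `input_spec`.
+
+    Note that the `paddlepaddle` version must be greater than `2.0.0`. For more on the parameters of `paddle.onnx.export()`, see [paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export).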
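
Q2.1.9 above says that `Global.checkpoints` takes a path prefix without a file suffix and that training then continues from the following epoch. A minimal Python sketch of that naming convention follows. `resolve_checkpoint` is a hypothetical helper written for illustration, not a PaddleClas function, and it assumes the prefix ends in `epoch_<N>` as in the FAQ example.

```python
import re


def resolve_checkpoint(prefix):
    """Hypothetical helper: expand a checkpoint prefix into the files to load.

    Assumes the prefix ends in ``epoch_<N>``, as in the FAQ example.
    """
    match = re.search(r"epoch_(\d+)$", prefix)
    if match is None:
        raise ValueError("expected a prefix ending in epoch_<N>: %r" % prefix)
    finished_epoch = int(match.group(1))
    return {
        "optimizer_state": prefix + ".pdopt",
        "model_weights": prefix + ".pdparams",
        # Training resumes with the epoch after the one that finished.
        "resume_epoch": finished_epoch + 1,
    }


info = resolve_checkpoint("./output/ResNet18/epoch_18")
print(info["optimizer_state"])
print(info["model_weights"])
print(info["resume_epoch"])
```

The same reasoning explains why the suffix is left out of the config value: one prefix stands for several files that are loaded together.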
--- a/docs/zh_CN/models/LCNet.md
+++ b/docs/zh_CN/models/LCNet.md
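+# PPLCNet series
+
+## Overview
+
+The PPLCNet series is a family of networks proposed by the Baidu PaddleCV team that performs well on Intel CPUs. The authors collected a set of methods that raise model accuracy on Intel CPUs with almost no increase in inference latency and combined them into a new network, PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same latency. PPLCNet has shown strong competitiveness in image classification, object detection, and semantic segmentation.
+
+
+
+## Accuracy, FLOPs, and parameters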
+
+| Models           | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|
+| PPLCNet_x0_25        |0.5186           | 0.7565           | 18    | 1.5  |
+| PPLCNet_x0_35        |0.5809           | 0.8083           | 29    | 1.6  |
+| PPLCNet_x0_5         |0.6314           | 0.8466           | 47    | 1.9  |
+| PPLCNet_x0_75        |0.6818           | 0.8830           | 99    | 2.4  |
+| PPLCNet_x1_0         |0.7132           | 0.9003           | 161   | 3.0  |
+| PPLCNet_x1_5         |0.7371           | 0.9153           | 342   | 4.5  |
+| PPLCNet_x2_0         |0.7518           | 0.9227           | 590   | 6.5  |
+| PPLCNet_x2_5         |0.7660           | 0.9300           | 906   | 9.0  |
+| PPLCNet_x0_5_ssld    |0.6610           | 0.8646           | 47    | 1.9  |
+| PPLCNet_x1_0_ssld    |0.7439           | 0.9209           | 161   | 3.0  |
+| PPLCNet_x2_5_ssld    |0.8082           | 0.9533           | 906   | 9.0  |
+
+
+
+## Inference speed on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
+
+| Models                 | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
+|------------------|-----------|-------------------|--------------------------|
+| PPLCNet_x0_25        | 224       | 256               | 1.74                    |
+| PPLCNet_x0_35        | 224       | 256               | 1.92                    |
+| PPLCNet_x0_5         | 224       | 256               | 2.05                    |
+| PPLCNet_x0_75        | 224       | 256               | 2.29                    |
+| PPLCNet_x1_0         | 224       | 256               | 2.46                    |
+| PPLCNet_x1_5         | 224       | 256               | 3.19                    |
+| PPLCNet_x2_0         | 224       | 256               | 4.27                    |
+| PPLCNet_x2_5         | 224       | 256               | 5.39                    |
+| PPLCNet_x0_5_ssld    | 224       | 256               | 2.05                    |
+| PPLCNet_x1_0_ssld    | 224       | 256               | 2.46                    |
+| PPLCNet_x2_5_ssld    | 224       | 256               | 5.39                    |
--- a/docs/zh_CN/models/PPLCNet.md
+++ b/docs/zh_CN/models/PPLCNet.md
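+# PPLCNet series
+
+## Overview
+
+The PPLCNet series is a family of networks proposed by the Baidu PaddleCV team that performs well on Intel CPUs. The authors collected a set of methods that raise model accuracy on Intel CPUs with almost no increase in inference latency and combined them into a new network, PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same latency. PPLCNet has shown strong competitiveness in image classification, object detection, and semantic segmentation.
+
+
+
+## Accuracy, FLOPs, and parameters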
+
+| Models           | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|
+| PPLCNet_x0_25        |0.5186           | 0.7565           | 18    | 1.5  |
+| PPLCNet_x0_35        |0.5809           | 0.8083           | 29    | 1.6  |
+| PPLCNet_x0_5         |0.6314           | 0.8466           | 47    | 1.9  |
+| PPLCNet_x0_75        |0.6818           | 0.8830           | 99    | 2.4  |
+| PPLCNet_x1_0         |0.7132           | 0.9003           | 161   | 3.0  |
+| PPLCNet_x1_5         |0.7371           | 0.9153           | 342   | 4.5  |
+| PPLCNet_x2_0         |0.7518           | 0.9227           | 590   | 6.5  |
+| PPLCNet_x2_5         |0.7660           | 0.9300           | 906   | 9.0  |
+| PPLCNet_x0_5_ssld    |0.6610           | 0.8646           | 47    | 1.9  |
+| PPLCNet_x1_0_ssld    |0.7439           | 0.9209           | 161   | 3.0  |
+| PPLCNet_x2_5_ssld    |0.8082           | 0.9533           | 906   | 9.0  |
+
+
+
+## Inference speed on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
+
+| Models                 | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
+|------------------|-----------|-------------------|--------------------------|
+| PPLCNet_x0_25        | 224       | 256               | 1.74                    |
+| PPLCNet_x0_35        | 224       | 256               | 1.92                    |
+| PPLCNet_x0_5         | 224       | 256               | 2.05                    |
+| PPLCNet_x0_75        | 224       | 256               | 2.29                    |
+| PPLCNet_x1_0         | 224       | 256               | 2.46                    |
+| PPLCNet_x1_5         | 224       | 256               | 3.19                    |
+| PPLCNet_x2_0         | 224       | 256               | 4.27                    |
+| PPLCNet_x2_5         | 224       | 256               | 5.39                    |
+| PPLCNet_x0_5_ssld    | 224       | 256               | 2.05                    |
+| PPLCNet_x1_0_ssld    | 224       | 256               | 2.46                    |
+| PPLCNet_x2_5_ssld    | 224       | 256               | 5.39                    |
--- a/docs/zh_CN/tutorials/getting_started_retrieval.md
+++ b/docs/zh_CN/tutorials/getting_started_retrieval.md
@@ -117,7 +117,7 @@ python3 tools/train.py \

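 Here, `-c` specifies the path of the config file and `-o` specifies parameters to modify or add. `-o Arch.Backbone.pretrained=True` makes the Backbone use a pretrained model; `Arch.Backbone.pretrained` can also be set to the path of a specific weights file, in which case replace it with the path of your own pretrained weights. `-o Global.device=gpu` trains on the GPU. To train on the CPU, set `Global.device` to `cpu`.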

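-For more detailed training settings, you can also edit the model's config file directly. See the [configuration documentation](config.md) for the individual parameters.
+For more detailed training settings, you can also edit the model's config file directly. See the [configuration documentation](config_description.md) for the individual parameters.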

 Running the command above prints a log like the following example:

@@ -245,4 +245,4 @@ python3 tools/export_model.py \
  - Mean average precision (mAP)
  
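    - AP: the average of the precision values at different recall levels
-    - mAP: the mean of the AP values over all images in the test set
\ No newline at end of file
+    - mAP: the mean of the AP values over all images in the test set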
--- a/ppcls/arch/backbone/__init__.py
+++ b/ppcls/arch/backbone/__init__.py
@@ -21,6 +21,7 @@ from ppcls.arch.backbone.legendary_models.resnet import ResNet18, ResNet18_vd, R
 from ppcls.arch.backbone.legendary_models.vgg import VGG11, VGG13, VGG16, VGG19
 from ppcls.arch.backbone.legendary_models.inception_v3 import InceptionV3
 from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W64_C
+from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x0_25, PPLCNet_x0_35, PPLCNet_x0_5, PPLCNet_x0_75, PPLCNet_x1_0, PPLCNet_x1_5, PPLCNet_x2_0, PPLCNet_x2_5

 from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet50_vc
 from ppcls.arch.backbone.model_zoo.resnext import ResNeXt50_32x4d, ResNeXt50_64x4d, ResNeXt101_32x4d, ResNeXt101_64x4d, ResNeXt152_32x4d, ResNeXt152_64x4d

--- a/ppcls/arch/backbone/legendary_models/pp_lcnet.py
+++ b/ppcls/arch/backbone/legendary_models/pp_lcnet.py
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import, division, print_function
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D, BatchNorm, Conv2D, Dropout, Linear
+from paddle.regularizer import L2Decay
+from paddle.nn.initializer import KaimingNormal
+from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "PPLCNet_x0_25":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams",
+    "PPLCNet_x0_35":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams",
+    "PPLCNet_x0_5":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams",
+    "PPLCNet_x0_75":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams",
+    "PPLCNet_x1_0":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams",
+    "PPLCNet_x1_5":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams",
+    "PPLCNet_x2_0":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams",
+    "PPLCNet_x2_5":
+    "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams"
+}
+
+__all__ = list(MODEL_URLS.keys())
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+    "blocks2":
+    #k, in_c, out_c, s, use_se
+    [[3, 16, 32, 1, False]],
+    "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+    "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+    "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False],
+                [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+                [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+    "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
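+def make_divisible(v, divisor=8, min_value=None):
+    # Round v to the nearest multiple of divisor, never going below min_value.
+    if min_value is None:
+        min_value = divisor
+    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+    # Make sure rounding down does not shrink v by more than 10%.
+    if new_v < 0.9 * v:
+        new_v += divisor
+    return new_v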
+
+
+class ConvBNLayer(TheseusLayer):
+    def __init__(self,
+                 num_channels,
+                 filter_size,
+                 num_filters,
+                 stride,
+                 num_groups=1):
+        super().__init__()
+
+        self.conv = Conv2D(
+            in_channels=num_channels,
+            out_channels=num_filters,
+            kernel_size=filter_size,
+            stride=stride,
+            padding=(filter_size - 1) // 2,
+            groups=num_groups,
+            weight_attr=ParamAttr(initializer=KaimingNormal()),
+            bias_attr=False)
+
+        self.bn = BatchNorm(
+            num_filters,
+            param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+            bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+        self.hardswish = nn.Hardswish()
+
+    def forward(self, x):
+        x = self.conv(x)
+        x = self.bn(x)
+        x = self.hardswish(x)
+        return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+    def __init__(self,
+                 num_channels,
+                 num_filters,
+                 stride,
+                 dw_size=3,
+                 use_se=False):
+        super().__init__()
+        self.use_se = use_se
+        self.dw_conv = ConvBNLayer(
+            num_channels=num_channels,
+            num_filters=num_channels,
+            filter_size=dw_size,
+            stride=stride,
+            num_groups=num_channels)
+        if use_se:
+            self.se = SEModule(num_channels)
+        self.pw_conv = ConvBNLayer(
+            num_channels=num_channels,
+            filter_size=1,
+            num_filters=num_filters,
+            stride=1)
+
+    def forward(self, x):
+        x = self.dw_conv(x)
+        if self.use_se:
+            x = self.se(x)
+        x = self.pw_conv(x)
+        return x
+
+
+class SEModule(TheseusLayer):
+    def __init__(self, channel, reduction=4):
+        super().__init__()
+        self.avg_pool = AdaptiveAvgPool2D(1)
+        self.conv1 = Conv2D(
+            in_channels=channel,
+            out_channels=channel // reduction,
+            kernel_size=1,
+            stride=1,
+            padding=0)
+        self.relu = nn.ReLU()
+        self.conv2 = Conv2D(
+            in_channels=channel // reduction,
+            out_channels=channel,
+            kernel_size=1,
+            stride=1,
+            padding=0)
+        self.hardsigmoid = nn.Hardsigmoid()
+
+    def forward(self, x):
+        identity = x
+        x = self.avg_pool(x)
+        x = self.conv1(x)
+        x = self.relu(x)
+        x = self.conv2(x)
+        x = self.hardsigmoid(x)
+        x = paddle.multiply(x=identity, y=x)
+        return x
+
+
+class PPLCNet(TheseusLayer):
+    def __init__(self,
+                 scale=1.0,
+                 class_num=1000,
+                 dropout_prob=0.2,
+                 class_expand=1280):
+        super().__init__()
+        self.scale = scale
+        self.class_expand = class_expand
+
+        self.conv1 = ConvBNLayer(
+            num_channels=3,
+            filter_size=3,
+            num_filters=make_divisible(16 * scale),
+            stride=2)
+
+        self.blocks2 = nn.Sequential(*[
+            DepthwiseSeparable(
+                num_channels=make_divisible(in_c * scale),
+                num_filters=make_divisible(out_c * scale),
+                dw_size=k,
+                stride=s,
+                use_se=se)
+            for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+        ])
+
+        self.blocks3 = nn.Sequential(*[
+            DepthwiseSeparable(
+                num_channels=make_divisible(in_c * scale),
+                num_filters=make_divisible(out_c * scale),
+                dw_size=k,
+                stride=s,
+                use_se=se)
+            for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+        ])
+
+        self.blocks4 = nn.Sequential(*[
+            DepthwiseSeparable(
+                num_channels=make_divisible(in_c * scale),
+                num_filters=make_divisible(out_c * scale),
+                dw_size=k,
+                stride=s,
+                use_se=se)
+            for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+        ])
+
+        self.blocks5 = nn.Sequential(*[
+            DepthwiseSeparable(
+                num_channels=make_divisible(in_c * scale),
+                num_filters=make_divisible(out_c * scale),
+                dw_size=k,
+                stride=s,
+                use_se=se)
+            for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+        ])
+
+        self.blocks6 = nn.Sequential(*[
+            DepthwiseSeparable(
+                num_channels=make_divisible(in_c * scale),
+                num_filters=make_divisible(out_c * scale),
+                dw_size=k,
+                stride=s,
+                use_se=se)
+            for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+        ])
+
+        self.avg_pool = AdaptiveAvgPool2D(1)
+
+        self.last_conv = Conv2D(
+            in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+            out_channels=self.class_expand,
+            kernel_size=1,
+            stride=1,
+            padding=0,
+            bias_attr=False)
+
+        self.hardswish = nn.Hardswish()
+        self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+        self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+        self.fc = Linear(self.class_expand, class_num)
+
+    def forward(self, x):
+        x = self.conv1(x)
+
+        x = self.blocks2(x)
+        x = self.blocks3(x)
+        x = self.blocks4(x)
+        x = self.blocks5(x)
+        x = self.blocks6(x)
+
+        x = self.avg_pool(x)
+        x = self.last_conv(x)
+        x = self.hardswish(x)
+        x = self.dropout(x)
+        x = self.flatten(x)
+        x = self.fc(x)
+        return x
+
+
+def _load_pretrained(pretrained, model, model_url, use_ssld):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def PPLCNet_x0_25(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x0_25
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x0_25` model depends on args.
+    """
+    model = PPLCNet(scale=0.25, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_25"], use_ssld)
+    return model
+
+
+def PPLCNet_x0_35(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x0_35
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x0_35` model depends on args.
+    """
+    model = PPLCNet(scale=0.35, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_35"], use_ssld)
+    return model
+
+
+def PPLCNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x0_5
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x0_5` model depends on args.
+    """
+    model = PPLCNet(scale=0.5, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_5"], use_ssld)
+    return model
+
+
+def PPLCNet_x0_75(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x0_75
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x0_75` model depends on args.
+    """
+    model = PPLCNet(scale=0.75, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_75"], use_ssld)
+    return model
+
+
+def PPLCNet_x1_0(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x1_0
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x1_0` model depends on args.
+    """
+    model = PPLCNet(scale=1.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_0"], use_ssld)
+    return model
+
+
+def PPLCNet_x1_5(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x1_5
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x1_5` model depends on args.
+    """
+    model = PPLCNet(scale=1.5, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_5"], use_ssld)
+    return model
+
+
+def PPLCNet_x2_0(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x2_0
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x2_0` model depends on args.
+    """
+    model = PPLCNet(scale=2.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_0"], use_ssld)
+    return model
+
+
+def PPLCNet_x2_5(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPLCNet_x2_5
+    Args:
+        pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+    Returns:
+        model: nn.Layer. Specific `PPLCNet_x2_5` model depends on args.
+    """
+    model = PPLCNet(scale=2.5, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_5"], use_ssld)
+    return model
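
Every `PPLCNet_x*` constructor above differs only in `scale`, which is applied to each channel count through `make_divisible`. The sketch below copies that rounding rule and prints the resulting stage widths for a few scales. `BASE_WIDTHS` and `stage_widths` are names introduced here for illustration; the base widths are the stem width (16) followed by the final `out_c` of `blocks2` to `blocks6` in `NET_CONFIG`.

```python
def make_divisible(v, divisor=8, min_value=None):
    # Same rounding rule as in pp_lcnet.py above.
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Stem conv width followed by the last out_c of blocks2..blocks6.
BASE_WIDTHS = [16, 32, 64, 128, 256, 512]


def stage_widths(scale):
    """Return the channel width of each stage after applying `scale`."""
    return [make_divisible(c * scale) for c in BASE_WIDTHS]


for scale in (0.25, 0.35, 1.0):
    print(scale, stage_widths(scale))
```

At `scale=0.35` the second entry shows the 10% guard at work: 32 * 0.35 = 11.2 first rounds down to 8, which is below 0.9 * 11.2, so the width is bumped to 16.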
--- a/ppcls/arch/backbone/model_zoo/googlenet.py
+++ b/ppcls/arch/backbone/model_zoo/googlenet.py
@@ -131,7 +131,7 @@ class GoogLeNetDY(nn.Layer):
        self._ince5b = Inception(
            832, 832, 384, 192, 384, 48, 128, 128, name="ince5b")

-        self._pool_5 = AvgPool2D(kernel_size=7, stride=7)
+        self._pool_5 = AdaptiveAvgPool2D(1)

        self._drop = Dropout(p=0.4, mode="downscale_in_infer")
        self._fc_out = Linear(

--- a/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
+++ b/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 100
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  eval_mode: retrieval
+  use_dali: False
+  to_static: False
+
+# model architecture
+Arch:
+  name: RecModel
+  infer_output_key: features
+  infer_add_softmax: False
+
+  Backbone: 
+    name: PPLCNet_x2_5
+    pretrained: True
+    use_ssld: True
+  BackboneStopLayer:
+    name: flatten_0
+  Neck:
+    name: FC
+    embedding_size: 1280
+    class_num: 512
+  Head:
+    name: ArcMargin 
+    embedding_size: 512
+    class_num: 185341
+    margin: 0.2
+    scale: 30
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.04
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/
+      cls_label_path: ./dataset/train_reg_all_data.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    Query:
+      dataset: 
+        name: VeriWild
+        image_root: ./dataset/Aliproduct/
+        cls_label_path: ./dataset/Aliproduct/val_list.txt
+        transform_ops:
+          - DecodeImage:
+              to_rgb: True
+              channel_first: False
+          - ResizeImage:
+              size: 224
+          - NormalizeImage:
+              scale: 0.00392157
+              mean: [0.485, 0.456, 0.406]
+              std: [0.229, 0.224, 0.225]
+              order: ''
+      sampler:
+        name: DistributedBatchSampler
+        batch_size: 64
+        drop_last: False
+        shuffle: False
+      loader:
+        num_workers: 4
+        use_shared_memory: True
+
+    Gallery:
+      dataset: 
+        name: VeriWild
+        image_root: ./dataset/Aliproduct/
+        cls_label_path: ./dataset/Aliproduct/val_list.txt
+        transform_ops:
+          - DecodeImage:
+              to_rgb: True
+              channel_first: False
+          - ResizeImage:
+              size: 224
+          - NormalizeImage:
+              scale: 0.00392157
+              mean: [0.485, 0.456, 0.406]
+              std: [0.229, 0.224, 0.225]
+              order: ''
+      sampler:
+        name: DistributedBatchSampler
+        batch_size: 64
+        drop_last: False
+        shuffle: False
+      loader:
+        num_workers: 4
+        use_shared_memory: True
+
+Metric:
+  Eval:
+    - Recallk:
+        topk: [1, 5]
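
The `Head` section of this config sets `ArcMargin` with `margin: 0.2` and `scale: 30`. As a rough illustration of what those two numbers do, the sketch below applies the textbook additive angular margin, s * cos(theta + m) for the target class and s * cos(theta) for the others, to a list of cosine similarities. It is a simplified pure-Python stand-in and not the PaddleClas implementation, which may handle edge cases (for example theta + m exceeding pi) differently; `arc_margin_logits` is a name chosen here.

```python
import math


def arc_margin_logits(cosines, target, margin=0.2, scale=30.0):
    """Apply an additive angular margin to the target-class cosine.

    cosines: cosine similarity between the embedding and each class center.
    target: index of the ground-truth class.
    """
    logits = []
    for idx, cos_theta in enumerate(cosines):
        # Clamp to the valid domain of acos to guard against rounding error.
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if idx == target:
            theta = math.acos(cos_theta)
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    return logits


cosines = [0.8, 0.3, -0.1]  # Made-up similarities for three classes.
plain = [30.0 * c for c in cosines]
with_margin = arc_margin_logits(cosines, target=0)
print(plain)
print(with_margin)
```

Only the target-class logit is lowered, so the embedding has to move closer to its class center to reach the same loss, which is the property a retrieval model relies on.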
--- a/ppcls/configs/ImageNet/AlexNet/AlexNet.yaml
+++ b/ppcls/configs/ImageNet/AlexNet/AlexNet.yaml
@@ -34,9 +34,8 @@ Optimizer:
  momentum: 0.9
  lr:
    name: Piecewise
-    learning_rate: 0.01
    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
+    values: [0.01, 0.001, 0.0001, 0.00001]
  regularizer:
    name: 'L2'
    coeff: 0.0001
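
The hunk above drops the separate `learning_rate` field and shifts `values` so that the schedule starts at 0.01. A small sketch of how a piecewise schedule maps an epoch to a value is given below. It works at epoch granularity and assumes 0-based epoch indices; PaddleClas may convert the boundaries to iteration steps internally, so treat `piecewise_lr` as an illustrative helper rather than the library's code.

```python
from bisect import bisect_right

DECAY_EPOCHS = [30, 60, 90]
VALUES = [0.01, 0.001, 0.0001, 0.00001]


def piecewise_lr(epoch, boundaries=DECAY_EPOCHS, values=VALUES):
    """Return the learning rate for a 0-based epoch index."""
    # A piecewise schedule needs one more value than it has boundaries.
    assert len(values) == len(boundaries) + 1
    # Count how many boundaries have been reached and pick that value.
    return values[bisect_right(boundaries, epoch)]


for epoch in (0, 29, 30, 60, 95):
    print(epoch, piecewise_lr(epoch))
```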

--- a/ppcls/configs/ImageNet/DeiT/DeiT_base_distilled_patch16_224.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_base_distilled_patch16_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -83,6 +106,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -108,6 +133,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_base_distilled_patch16_384.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_base_distilled_patch16_384.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -54,18 +56,39 @@ DataLoader:
            to_rgb: True
            channel_first: False
        - RandCropImage:
-            size: 384
+            size: 384 
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -82,7 +105,9 @@ DataLoader:
            to_rgb: True
            channel_first: False
        - ResizeImage:
-            resize_short: 426
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 384
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -107,7 +132,9 @@ Infer:
        to_rgb: True
        channel_first: False
    - ResizeImage:
-        resize_short: 426
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 384
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_base_patch16_224.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_base_patch16_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -83,6 +106,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -108,6 +133,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_base_patch16_384.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_base_patch16_384.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -54,18 +56,39 @@ DataLoader:
            to_rgb: True
            channel_first: False
        - RandCropImage:
-            size: 384
+            size: 384 
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -82,7 +105,9 @@ DataLoader:
            to_rgb: True
            channel_first: False
        - ResizeImage:
-            resize_short: 426
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 384
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -107,7 +132,9 @@ Infer:
        to_rgb: True
        channel_first: False
    - ResizeImage:
-        resize_short: 426
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 384
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_small_distilled_patch16_224.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_small_distilled_patch16_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -83,6 +106,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -108,6 +133,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_small_patch16_224.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_small_patch16_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -83,6 +106,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -108,6 +133,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_tiny_distilled_patch16_224.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_tiny_distilled_patch16_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -83,6 +106,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -108,6 +133,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/DeiT/DeiT_tiny_patch16_224.yaml
+++ b/ppcls/configs/ImageNet/DeiT/DeiT_tiny_patch16_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

-
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
-
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
-
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
@@ -83,6 +106,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
@@ -108,6 +133,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_25.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_25.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x0_25
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_35.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_35.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x0_35
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_5.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_5.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x0_5
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_75.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x0_75.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x0_75
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
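The `Optimizer` block in the config above pairs a `Cosine` schedule with `learning_rate: 0.8`, `warmup_epoch: 5` and `epochs: 360`. A minimal sketch of the shape such a schedule usually has, assuming linear warmup from 0 and a per-epoch cosine decay towards 0. The function name is hypothetical, and the real PaddleClas scheduler updates per iteration and may differ in detail.

```python
import math


def cosine_lr_with_warmup(epoch, base_lr=0.8, warmup_epoch=5, total_epochs=360):
    """Illustrative per-epoch schedule; not the PaddleClas implementation."""
    if epoch < warmup_epoch:
        # Assumption: linear warmup starting from 0.
        return base_lr * epoch / warmup_epoch
    # Cosine decay over the epochs that remain after warmup.
    progress = (epoch - warmup_epoch) / (total_epochs - warmup_epoch)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate peaks at the configured `learning_rate` once warmup ends and falls smoothly for the rest of training.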
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x1_0
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
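Every pipeline in these configs ends with `NormalizeImage` using `scale: 1.0/255.0` and the ImageNet `mean`/`std`. A small sketch of the per-pixel arithmetic this implies, assuming the op computes `(value * scale - mean) / std` channel by channel; `normalize_pixel` is a hypothetical helper, not a PaddleClas function.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]
SCALE = 1.0 / 255.0


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [(value * SCALE - m) / s for value, m, s in zip(rgb, MEAN, STD)]
```

For example, a pure white pixel `[255, 255, 255]` maps to roughly `(1 - mean) / std` in each channel.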
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_5.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_5.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x1_5
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
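The training loss in the config above is `CELoss` with `epsilon: 0.1`, while the eval loss omits `epsilon`. Assuming `epsilon` is the label-smoothing factor and the smoothing prior is uniform, the loss can be sketched in plain Python as follows; the function name is hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a label-smoothed one-hot target (illustrative)."""
    # Numerically stable log-softmax.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]
    num_classes = len(logits)
    loss = 0.0
    for index, log_prob in enumerate(log_probs):
        # Uniform mass epsilon / K everywhere, plus 1 - epsilon on the target.
        weight = epsilon / num_classes
        if index == target:
            weight += 1.0 - epsilon
        loss -= weight * log_prob
    return loss
```

With `epsilon=0.0` this reduces to ordinary cross-entropy, which matches the `Eval` loss in the config.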
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_0.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_0.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x2_0
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
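The `Infer.PostProcess` block above selects `Topk` with `topk: 5` and a `class_id_map_file`. A rough sketch of that kind of post-processing, assuming the map file has already been parsed into a dict from class id to label name; the function and the result keys are illustrative, not the PaddleClas `Topk` class.

```python
def topk(scores, k=5, id_to_label=None):
    """Return the k highest-scoring class ids, their scores, and optional labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    result = {"class_ids": order, "scores": [scores[i] for i in order]}
    if id_to_label is not None:
        # Unknown ids fall back to an empty label in this sketch.
        result["label_names"] = [id_to_label.get(i, "") for i in order]
    return result
```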
--- a/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml
+++ b/ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 360
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+# model architecture
+Arch:
+  name: PPLCNet_x2_5
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.8
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - AutoAugment:
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 512
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window12_384.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window12_384.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm 
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6


 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 384
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384 
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5

    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
      drop_last: False
      shuffle: True
    loader:
@@ -84,7 +110,11 @@ DataLoader:
            to_rgb: True
            channel_first: False
        - ResizeImage:
-            size: [384, 384]
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
+        - CropImage:
+            size: 384
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
@@ -92,7 +122,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
      drop_last: False
      shuffle: False
    loader:
@@ -107,7 +137,11 @@ Infer:
        to_rgb: True
        channel_first: False
    - ResizeImage:
-        size: [384, 384]
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
+    - CropImage:
+        size: 384
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
@@ -120,9 +154,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
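The eval and infer pipelines in the diff above replace the direct resize to `[384, 384]` with `resize_short: 438` followed by `CropImage` at `size: 384`. A sketch of the resulting geometry, assuming the short side is scaled to 438 with the aspect ratio preserved and the crop is centered; the helper name is hypothetical.

```python
def resize_short_then_center_crop(width, height, resize_short=438, crop=384):
    """Return the resized (w, h) and the centered crop box (left, top, right, bottom)."""
    ratio = resize_short / min(width, height)
    new_w, new_h = round(width * ratio), round(height * ratio)
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)
```

The crop keeps about 384/438 ≈ 0.877 of the short side, close to the 224/256 = 0.875 ratio used by the 224-pixel configs.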
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window7_224.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm 
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6


 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5

    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
      drop_last: False
      shuffle: True
    loader:
@@ -85,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -94,7 +122,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
      drop_last: False
      shuffle: False
    loader:
@@ -110,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:
@@ -124,9 +154,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
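The `batch_transform_ops` added above wrap `MixupOperator` and `CutmixOperator` in an `OpSampler`, each with `prob: 0.5`, and the training loss changes to `MixCELoss`. A plain-Python sketch of that idea, assuming one operator is drawn per batch and that Mixup blends each sample with a shuffled partner using a Beta-distributed weight. Both helpers are hypothetical and operate on lists of floats rather than tensors.

```python
import random


def sample_op(ops, rng):
    """Pick one op name from {name: prob}; leftover probability means no op."""
    draw = rng.random()
    cumulative = 0.0
    for name, prob in ops.items():
        cumulative += prob
        if draw < cumulative:
            return name
    return None


def mixup(batch, labels, alpha, rng):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(batch)))
    rng.shuffle(perm)
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(batch[i], batch[j])]
        for i, j in enumerate(perm)
    ]
    # A mix-aware loss would weight the two label sets by lam and 1 - lam.
    return mixed, labels, [labels[j] for j in perm], lam
```

Dropping the `Train` entry from `Metric` is consistent with this: once labels are mixed, top-k accuracy against a single hard label is no longer well defined for training batches.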
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window12_384.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window12_384.yaml
@@ -7,7 +7,7 @@ Global:
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
-  epochs: 120
+  epochs: 300
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
+        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm 
+  one_dim_param_no_weight_decay: True
  lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6


 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 384
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384 
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5

    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
      drop_last: False
      shuffle: True
    loader:
@@ -84,7 +110,11 @@ DataLoader:
            to_rgb: True
            channel_first: False
        - ResizeImage:
-            size: [384, 384]
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
+        - CropImage:
+            size: 384
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
@@ -92,7 +122,7 @@ DataLoader:
            order: ''
    sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
      drop_last: False
      shuffle: False
    loader:
@@ -107,7 +137,11 @@ Infer:
        to_rgb: True
        channel_first: False
    - ResizeImage:
-        size: [384, 384]
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
+    - CropImage:
+        size: 384
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
@@ -120,9 +154,6 @@ Infer:
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
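The new `AdamW` settings above list `no_weight_decay_name` as a space-separated string and set `one_dim_param_no_weight_decay: True`. A sketch of how such a filter could be applied, assuming substring matching on parameter names and that one-dimensional parameters (biases, norm scales) are exempt. The helper and the parameter names in the usage note are hypothetical.

```python
NO_DECAY = "absolute_pos_embed relative_position_bias_table .bias norm".split()


def uses_weight_decay(name, shape, no_decay_names=NO_DECAY, one_dim_no_decay=True):
    """Decide whether a parameter should receive weight decay (illustrative)."""
    if any(token in name for token in no_decay_names):
        return False
    if one_dim_no_decay and len(shape) == 1:
        return False
    return True
```

For instance, a parameter named `blocks.0.attn.qkv.weight` with shape `(384, 128)` would keep weight decay, while `blocks.0.norm1.weight` or any `.bias` would not.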
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window7_224.yaml
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_small_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_small_patch4_window7_224.yaml
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
--- a/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml
--- a/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml
--- a/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml
--- a/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml
--- a/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml
--- a/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml
--- a/ppcls/configs/Logo/ResNet50_ReID.yaml
+++ b/ppcls/configs/Logo/ResNet50_ReID.yaml
@@ -54,7 +54,7 @@ Optimizer:
  momentum: 0.9
  lr:
    name: Cosine
-    learning_rate: 0.01
+    learning_rate: 0.04
  regularizer:
    name: 'L2'
    coeff: 0.0001
@@ -84,10 +84,10 @@ DataLoader:
          - RandomErasing:
              EPSILON: 0.5
    sampler:
-        name: DistributedRandomIdentitySampler
+        name: PKSampler
        batch_size: 128
-        num_instances: 2
-        drop_last: False
+        sample_per_id: 2
+        drop_last: True

    loader:
        num_workers: 6
@@ -97,7 +97,7 @@ DataLoader:
      dataset:
        name: LogoDataset
        image_root: "dataset/LogoDet-3K-crop/val/"
-        cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+query.txt"
+        cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+val.txt"
        transform_ops:
          - DecodeImage:
              to_rgb: True
@@ -122,7 +122,7 @@ DataLoader:
      dataset:
          name: LogoDataset
          image_root: "dataset/LogoDet-3K-crop/train/"
-          cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+gallery.txt"
+          cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
          transform_ops:
            - DecodeImage:
                to_rgb: True

--- a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
+++ b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
@@ -54,7 +54,7 @@ Optimizer:
  momentum: 0.9
  lr:
    name: MultiStepDecay
-    learning_rate: 0.01
+    learning_rate: 0.04
    milestones: [30, 60, 70, 80, 90, 100]
    gamma: 0.5
    verbose: False
@@ -90,10 +90,10 @@ DataLoader:
            r1: 0.3
            mean: [0., 0., 0.]
    sampler:
-      name: DistributedRandomIdentitySampler
+      name: PKSampler
      batch_size: 64
-      num_instances: 2
-      drop_last: False
+      sample_per_id: 2
+      drop_last: True
      shuffle: True
    loader:
      num_workers: 4

--- a/ppcls/configs/Vehicle/ResNet50_ReID.yaml
+++ b/ppcls/configs/Vehicle/ResNet50_ReID.yaml
@@ -53,7 +53,7 @@ Optimizer:
  momentum: 0.9
  lr:
    name: Cosine
-    learning_rate: 0.01
+    learning_rate: 0.04
  regularizer:
    name: 'L2'
    coeff: 0.0005
@@ -88,10 +88,10 @@ DataLoader:
              mean: [0., 0., 0.]

    sampler:
-        name: DistributedRandomIdentitySampler
+        name: PKSampler
        batch_size: 128
-        num_instances: 2
-        drop_last: False
+        sample_per_id: 2
+        drop_last: True
        shuffle: True
    loader:
        num_workers: 6
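The three retrieval configs above switch from `DistributedRandomIdentitySampler` to `PKSampler`, renaming `num_instances` to `sample_per_id` and setting `drop_last: True`. A minimal sketch of P×K batch construction, assuming each batch draws `batch_size / sample_per_id` identities and `sample_per_id` images per identity (64 identities for `batch_size: 128`). This is an illustration of the general technique, not the code in `pk_sampler.py`.

```python
import random
from collections import defaultdict


def pk_batch(labels, batch_size, sample_per_id, rng):
    """Build one batch of dataset indices with sample_per_id items per identity."""
    if batch_size % sample_per_id != 0:
        raise ValueError("batch_size must be divisible by sample_per_id")
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    num_ids = batch_size // sample_per_id
    chosen = rng.sample(sorted(by_label), num_ids)
    batch = []
    for label in chosen:
        pool = by_label[label]
        if len(pool) >= sample_per_id:
            batch.extend(rng.sample(pool, sample_per_id))
        else:
            # Assumption: sample with replacement when an identity is too small.
            batch.extend(rng.choices(pool, k=sample_per_id))
    return batch
```

Guaranteeing several images per identity in every batch is what metric-learning losses such as triplet loss rely on to form positive pairs.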

--- a/ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
+++ b/ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
--- a/ppcls/data/__init__.py
+++ b/ppcls/data/__init__.py
@@ -26,9 +26,12 @@ from ppcls.data.dataloader.common_dataset import create_operators
 from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
 from ppcls.data.dataloader.logo_dataset import LogoDataset
 from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
+from ppcls.data.dataloader.mix_dataset import MixDataset

 # sampler
 from ppcls.data.dataloader.DistributedRandomIdentitySampler import DistributedRandomIdentitySampler
+from ppcls.data.dataloader.pk_sampler import PKSampler
+from ppcls.data.dataloader.mix_sampler import MixSampler
 from ppcls.data import preprocess
 from ppcls.data.preprocess import transform


--- a/ppcls/data/dataloader/__init__.py
+++ b/ppcls/data/dataloader/__init__.py
+from ppcls.data.dataloader.imagenet_dataset import ImageNetDataset
+from ppcls.data.dataloader.multilabel_dataset import MultiLabelDataset
+from ppcls.data.dataloader.common_dataset import create_operators
+from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
+from ppcls.data.dataloader.logo_dataset import LogoDataset
+from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
+from ppcls.data.dataloader.mix_dataset import MixDataset
+from ppcls.data.dataloader.mix_sampler import MixSampler
+from ppcls.data.dataloader.pk_sampler import PKSampler
--- a/ppcls/data/dataloader/mix_dataset.py
+++ b/ppcls/data/dataloader/mix_dataset.py
--- a/ppcls/data/dataloader/mix_sampler.py
+++ b/ppcls/data/dataloader/mix_sampler.py
--- a/ppcls/data/dataloader/multilabel_dataset.py
+++ b/ppcls/data/dataloader/multilabel_dataset.py
--- a/ppcls/data/dataloader/pk_sampler.py
+++ b/ppcls/data/dataloader/pk_sampler.py
--- a/ppcls/data/postprocess/__init__.py
+++ b/ppcls/data/postprocess/__init__.py
@@ -16,7 +16,7 @@ import importlib

 from . import topk

-from .topk import Topk
+from .topk import Topk, MultiLabelTopk


 def build_postprocess(config):

--- a/ppcls/data/postprocess/topk.py
+++ b/ppcls/data/postprocess/topk.py
--- a/ppcls/data/preprocess/__init__.py
+++ b/ppcls/data/preprocess/__init__.py
--- a/ppcls/data/preprocess/batch_ops/batch_operators.py
+++ b/ppcls/data/preprocess/batch_ops/batch_operators.py
--- a/ppcls/data/preprocess/ops/operators.py
+++ b/ppcls/data/preprocess/ops/operators.py
--- a/ppcls/data/preprocess/ops/random_erasing.py
+++ b/ppcls/data/preprocess/ops/random_erasing.py
--- a/ppcls/data/preprocess/ops/timm_autoaugment.py
+++ b/ppcls/data/preprocess/ops/timm_autoaugment.py
--- a/ppcls/engine/engine.py
+++ b/ppcls/engine/engine.py
--- a/ppcls/engine/evaluation/classification.py
+++ b/ppcls/engine/evaluation/classification.py
--- a/ppcls/engine/evaluation/retrieval.py
+++ b/ppcls/engine/evaluation/retrieval.py
--- a/ppcls/engine/train/train.py
+++ b/ppcls/engine/train/train.py
--- a/ppcls/loss/__init__.py
+++ b/ppcls/loss/__init__.py
--- a/ppcls/loss/multilabelloss.py
+++ b/ppcls/loss/multilabelloss.py
--- a/ppcls/metric/__init__.py
+++ b/ppcls/metric/__init__.py
--- a/ppcls/metric/metrics.py
+++ b/ppcls/metric/metrics.py
--- a/ppcls/optimizer/__init__.py
+++ b/ppcls/optimizer/__init__.py
--- a/ppcls/optimizer/learning_rate.py
+++ b/ppcls/optimizer/learning_rate.py
--- a/ppcls/optimizer/optimizer.py
+++ b/ppcls/optimizer/optimizer.py
--- a/tools/train.sh
+++ b/tools/train.sh