Commit 5aa57d2c authored by: D dongshuilong

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleClas into arcmargin

@@ -7,7 +7,7 @@
PaddleClas, the PaddlePaddle image recognition suite, is a toolset for image recognition tasks prepared by PaddlePaddle for industry and academia, helping users train better vision models and bring applications to production.
**Recent updates**
- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models show strong competitiveness on Intel CPUs. The metrics and pretrained weights can be downloaded [here](docs/zh_CN/ImageNet_models.md).
- 2021.08.11 Updated 7 [FAQs](docs/zh_CN/faq_series/faq_2021_s2.md).
- 2021.06.29 Added the Swin-Transformer series of models, with a highest Top-1 accuracy of 87.2% on the ImageNet1k dataset; training, inference, evaluation, and whl-package deployment are supported, and pretrained models can be downloaded [here](docs/zh_CN/models/models_intro.md).
- 2021.06.22,23,24 The PaddleClas R&D team gave a three-day live course with in-depth technical explanations. Course replay: [https://aistudio.baidu.com/aistudio/course/introduce/24519](https://aistudio.baidu.com/aistudio/course/introduce/24519)
......
@@ -8,6 +8,8 @@ PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
**Recent updates**
- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models show strong competitiveness on Intel CPUs. The metrics and pretrained models are available [here](docs/en/ImageNet_models_en.md).
- 2021.06.29 Added the Swin-Transformer series of models; the highest Top-1 accuracy on the ImageNet1k dataset reaches 87.2%. Training, evaluation, and inference are all supported, and pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
- 2021.06.16 PaddleClas release/2.2. Added metric learning and vector search modules, as well as product recognition, animation character recognition, vehicle recognition, and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, with accuracy roughly the same as reported in the papers.
- [more](./docs/en/update_history_en.md)
......
Global:
  infer_imgs: "./images/0517_2715693311.jpg"
  inference_model_dir: "../inference/"
  batch_size: 1
  use_gpu: True
  enable_mkldnn: False
  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True
  use_tensorrt: False
  gpu_mem: 8000
  enable_profile: False

PreProcess:
  transform_ops:
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 0.00392157
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
        channel_num: 3
    - ToCHWImage:

PostProcess:
  main_indicator: MultiLabelTopk
  MultiLabelTopk:
    topk: 5
    class_id_map_file: None
  SavePreLabel:
    save_dir: ./pre_label/
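For reference, here is a rough standalone sketch (plain OpenCV/numpy, not the PaddleClas operators) of the arithmetic the `PreProcess` chain above performs; it ignores the channel-order details controlled by `order` and `channel_num`:

```python
import cv2
import numpy as np

def preprocess(img_path):
    img = cv2.imread(img_path)                      # HWC, uint8
    # ResizeImage: scale the short side to 256, keeping the aspect ratio
    h, w = img.shape[:2]
    s = 256 / min(h, w)
    img = cv2.resize(img, (round(w * s), round(h * s)))
    # CropImage: 224 x 224 center crop
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2
    img = img[top:top + 224, left:left + 224]
    # NormalizeImage: x * scale (~1/255), then (x - mean) / std per channel
    img = img.astype("float32") * 0.00392157
    img = (img - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    # ToCHWImage: HWC -> CHW
    return img.transpose((2, 0, 1))
```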
@@ -4,9 +4,9 @@
PaddleClas provides two service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../../deploy/hubserving/readme_en.md)
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". If you prefer the retrieval-based image recognition service, please refer to the [tutorial](./recognition/README.md); for the image classification service, please follow this tutorial.
# Image Classification Service deployment based on PaddleServing
This document will introduce how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the ResNet50_vd model as a pipeline online service.
@@ -131,7 +131,7 @@ fetch_var {
config.yml # configuration file of starting the service
pipeline_http_client.py # script to send pipeline prediction request by http
pipeline_rpc_client.py # script to send pipeline prediction request by rpc
classification_web_service.py # start the script of the pipeline server
```
2. Run the following command to start the service.
@@ -147,7 +147,7 @@ fetch_var {
python3 pipeline_http_client.py
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](./imgs/results.png)
Adjust the number of concurrency in config.yml to get the largest QPS.
......
@@ -4,9 +4,9 @@
PaddleClas provides two service deployment methods:
- Based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; see the [document](../../deploy/hubserving/readme.md) for usage.
- Based on PaddleServing: the code path is "`./deploy/paddleserving`"; for the retrieval-based image recognition service see the [document](./recognition/README_CN.md), and follow this tutorial for the image classification service.
# Image classification service deployment based on PaddleServing
Taking the classic ResNet50_vd model as an example, this document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy the pipeline online service of a PaddleClas dynamic-graph model.
@@ -127,7 +127,7 @@ fetch_var {
config.yml # configuration file for starting the service
pipeline_http_client.py # script to send pipeline prediction requests over HTTP
pipeline_rpc_client.py # script to send pipeline prediction requests over RPC
classification_web_service.py # script to start the pipeline server
```
2. Run the following command to start the service:
......
# Product Recognition Service deployment based on PaddleServing
(English|[简体中文](./README_CN.md))
This document will introduce how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the retrieval-based product recognition model as a pipeline online service.
Some Key Features of Paddle Serving:
- Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed with a single command.
- Supports industrial serving features such as model management, online loading, and online A/B testing.
- Supports highly concurrent and efficient communication between clients and servers.
For an introduction to the Paddle Serving deployment framework and more tutorials, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README.md).
## Contents
- [Environmental preparation](#environmental-preparation)
- [Model conversion](#model-conversion)
- [Paddle Serving pipeline deployment](#paddle-serving-pipeline-deployment)
- [FAQ](#faq)
<a name="environmental-preparation"></a>
## Environmental preparation
Both the PaddleClas and PaddleServing operating environments are needed.
1. Prepare the PaddleClas operating environment by following this [link](../../docs/zh_CN/tutorials/install.md).
Download the paddle whl package corresponding to your environment; version 2.1.0 is recommended.
2. The steps to prepare the PaddleServing operating environment are as follows:
Install serving, which is used to start the service:
```
pip3 install paddle-serving-server==0.6.1 # for CPU
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
# For other GPU environments, confirm the environment and then choose one of the following commands
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
```
3. Install the client, which is used to send requests to the service.
Find the client installation package corresponding to your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
Python 3.7 is recommended here:
```
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
```
4. Install serving-app
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** If you want to install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
<a name="model-conversion"></a>
## Model conversion
When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy.
The following assumes that the current working directory is the PaddleClas root directory.
First, download the inference model of ResNet50_vd:
```
cd deploy
# Download and unzip the ResNet50_vd model
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd models
tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
```
Then, use the installed paddle_serving_client tool to convert the inference model into a format that is easy for the server to deploy:
```
# Product recognition model conversion
python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
--serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
```
After the ResNet50_vd inference model is converted, two additional folders, `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client`, will appear in the current directory, with the following layout:
```
|- product_ResNet50_vd_aliproduct_v1.0_serving/
   |- __model__
   |- __params__
   |- serving_server_conf.prototxt
   |- serving_server_conf.stream.prototxt
|- product_ResNet50_vd_aliproduct_v1.0_client
   |- serving_client_conf.prototxt
   |- serving_client_conf.stream.prototxt
```
Once you have the model files for deployment, you need to change the alias name in `serving_server_conf.prototxt`: change `alias_name` in `fetch_var` to `features`.
The modified serving_server_conf.prototxt is as follows:
```
feed_var {
  name: "x"
  alias_name: "x"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "features"
  is_lod_tensor: true
  fetch_type: 1
  shape: -1
}
```
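If you would rather not edit the file by hand, a minimal sketch like the following can patch the alias automatically. It assumes the converter wrote the variable name `save_infer_model/scale_0.tmp_1` as the default `alias_name`; verify this against your generated file first:

```python
# Hypothetical helper: rewrite fetch_var's alias_name to "features".
path = "product_ResNet50_vd_aliproduct_v1.0_serving/serving_server_conf.prototxt"
with open(path) as f:
    conf = f.read()
# Assumption: the default alias equals the fetch variable name.
conf = conf.replace('alias_name: "save_infer_model/scale_0.tmp_1"',
                    'alias_name: "features"', 1)
with open(path, "w") as f:
    f.write(conf)
```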
Next, download and unpack the prebuilt index of the product gallery:
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
```
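As an optional sanity check that the downloaded index loads, a minimal sketch (the paths assume the demo data unpacked above, and faiss must be installed):

```python
import os
import pickle

import faiss

index_dir = "recognition_demo_data_v1.1/gallery_product/index"
searcher = faiss.read_index(os.path.join(index_dir, "vector.index"))
with open(os.path.join(index_dir, "id_map.pkl"), "rb") as f:
    id_map = pickle.load(f)
print(searcher.ntotal, "gallery vectors,", len(id_map), "id-map entries")
```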
<a name="paddle-serving-pipeline-deployment"></a>
## Paddle Serving pipeline deployment
1. Download the PaddleClas code; if you have already downloaded it, you can skip this step.
```
git clone https://github.com/PaddlePaddle/PaddleClas
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
```
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # configuration file of starting the service
pipeline_http_client.py # script to send pipeline prediction request by http
pipeline_rpc_client.py # script to send pipeline prediction request by rpc
recognition_web_service.py # start the script of the pipeline server
```
2. Run the following command to start the service.
```
# Start the service and save the running log in log.txt
python3 recognition_web_service.py &>log.txt &
```
After the service is successfully started, a log similar to the following will be printed in log.txt
![](../imgs/start_server_recog.png)
3. Send service request
```
python3 pipeline_http_client.py
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](../imgs/results_recog.png)
Adjust the number of concurrency in config.yml to get the largest QPS.
```
op:
    concurrency: 8
    ...
```
Multiple service requests can be sent at the same time if necessary.
The predicted performance data will be automatically written into the `PipelineServingLogs/pipeline.tracer` file.
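For a quick load test, here is a minimal sketch that sends several requests in parallel; it assumes the service from step 2 is listening on port 18081 and reuses the demo image:

```python
import base64
import json
from concurrent.futures import ThreadPoolExecutor

import requests

url = "http://127.0.0.1:18081/recognition/prediction"
with open("daoxiangcunjinzhubing_6.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode("utf8")
payload = json.dumps({"key": ["image"], "value": [image]})

def send(_):
    # each worker posts the same image and returns the parsed response
    return requests.post(url=url, data=payload).json()

with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(send, range(16)):
        print(result)
```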
<a name="faq"></a>
## FAQ
**Q1**: No result is returned after sending the request.
**A1**: Do not set the proxy when starting the service and sending the request. You can close the proxy before starting the service and before sending the request. The command to close the proxy is:
```
unset https_proxy
unset http_proxy
```
# Product recognition service deployment based on PaddleServing
([English](./README.md) | Simplified Chinese)
Taking product recognition as an example, this document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy the pipeline online service of a PaddleClas dynamic-graph model.
Compared with hubserving deployment, PaddleServing has the following advantages:
- Supports highly concurrent and efficient communication between clients and servers
- Supports industrial-grade service capabilities, such as model management, online loading, and online A/B testing
- Supports client development in multiple programming languages, such as C++, Python, and Java
For more on the PaddleServing deployment framework and its usage, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md).
## Contents
- [Environment preparation](#环境准备)
- [Model conversion](#模型转换)
- [Paddle Serving pipeline deployment](#部署)
- [FAQ](#FAQ)
<a name="环境准备"></a>
## Environment preparation
Both the PaddleClas and PaddleServing runtime environments are needed.
- Prepare the PaddleClas [runtime environment](../../docs/zh_CN/tutorials/install.md) and download the paddle whl package matching your environment; version 2.1.0 is recommended
- Prepare the PaddleServing runtime environment as follows
1. Install serving, which is used to start the service
```
pip3 install paddle-serving-server==0.6.1 # for CPU
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
# For other GPU environments, confirm the environment and then choose one of the following commands
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
```
2. Install the client, which is used to send requests to the service
Find the client installation package matching your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md); Python 3.7 is recommended here:
```
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
```
3. Install serving-app
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** To install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
<a name="模型转换"></a>
## Model conversion
When deploying with PaddleServing, the saved inference model needs to be converted into a serving model that is easy to deploy.
The following assumes that the current working directory is the PaddleClas root directory.
First, download the product recognition inference model:
```
cd deploy
# Download and unzip the product recognition model
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd models
tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
```
Next, use the installed paddle_serving_client to convert the downloaded inference model into a format that is easy for the server to deploy:
```
# Convert the product recognition model
python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
--serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
```
After the product recognition inference model is converted, the folders `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client` will appear in the current directory, with the following layout:
```
|- product_ResNet50_vd_aliproduct_v1.0_serving/
   |- __model__
   |- __params__
   |- serving_server_conf.prototxt
   |- serving_server_conf.stream.prototxt
|- product_ResNet50_vd_aliproduct_v1.0_client
   |- serving_client_conf.prototxt
   |- serving_client_conf.stream.prototxt
```
After obtaining the model files, modify the alias name in serving_server_conf.prototxt: change `alias_name` under `fetch_var` to `features`.
The modified serving_server_conf.prototxt is as follows:
```
feed_var {
  name: "x"
  alias_name: "x"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "features"
  is_lod_tensor: true
  fetch_type: 1
  shape: -1
}
```
Next, download and unpack the prebuilt product gallery index:
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
```
<a name="部署"></a>
## Paddle Serving pipeline deployment
1. Download the PaddleClas code; skip this step if you have already downloaded it.
```
git clone https://github.com/PaddlePaddle/PaddleClas
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
```
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # configuration file for starting the service
pipeline_http_client.py # script to send pipeline prediction requests over HTTP
pipeline_rpc_client.py # script to send pipeline prediction requests over RPC
recognition_web_service.py # script to start the pipeline server
```
2. Run the following command to start the service:
```
# Start the service; the running log is saved in log.txt
python3 recognition_web_service.py &>log.txt &
```
After the service starts successfully, a log similar to the following will be printed in log.txt:
![](../imgs/start_server_recog.png)
3. Send a service request:
```
python3 pipeline_http_client.py
```
After it runs successfully, the model's prediction result is printed in the terminal window; an example result:
![](../imgs/results_recog.png)
Adjust the concurrency in config.yml to obtain the highest QPS:
```
op:
    # concurrency: thread-level when is_thread_op=True, otherwise process-level
    concurrency: 8
    ...
```
Multiple service requests can be sent at the same time if necessary.
The prediction performance data is automatically written to the `PipelineServingLogs/pipeline.tracer` file.
<a name="FAQ"></a>
## FAQ
**Q1**: No result is returned after sending a request, or an output decoding error is reported.
**A1**: Do not set a proxy when starting the service or sending requests. You can turn the proxy off beforehand; the commands are:
```
unset https_proxy
unset http_proxy
```
# worker_num: the maximum concurrency.
# When build_dag_each_worker=True, the framework creates worker_num processes, each building its own grpcServer and DAG.
# When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread.
worker_num: 1

# HTTP port; rpc_port and http_port must not both be empty. When rpc_port is valid and http_port is empty, no http_port is generated automatically.
http_port: 18081
rpc_port: 9994

dag:
    # Op resource type: True for the thread model, False for the process model
    is_thread_op: False
op:
    rec:
        # Concurrency: thread-level when is_thread_op=True, otherwise process-level
        concurrency: 1

        # When the op config has no server_endpoints, the local service configuration is read from local_service_conf
        local_service_conf:
            # Model path
            model_config: ../../models/product_ResNet50_vd_aliproduct_v1.0_serving

            # Compute hardware type: when empty, decided by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
            device_type: 1

            # Compute hardware IDs: CPU prediction when devices is "" or unset; GPU prediction when devices is "0" or "0,1,2", listing the GPU cards to use
            devices: "0" # "0,1"

            # Client type: brpc, grpc, or local_predictor. local_predictor does not start a Serving service and predicts in-process
            client_type: local_predictor

            # Fetch result list, following the alias_name of fetch_var in the client_config
            fetch_list: ["features"]
    det:
        concurrency: 1
        local_service_conf:
            client_type: local_predictor
            device_type: 1
            devices: '0'
            fetch_list:
            - save_infer_model/scale_0.tmp_1
            model_config: ../../models/ppyolov2_r50vd_dcn_mainbody_v1.0_serving/
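To double-check which parallelism knobs are in effect, a small sketch that loads the file above and prints them (it assumes PyYAML is installed and is run from the directory containing config.yml):

```python
import yaml

with open("config.yml") as f:
    conf = yaml.safe_load(f)
print(conf["worker_num"])               # gRPC worker count
print(conf["dag"]["is_thread_op"])      # thread vs process ops
print(conf["op"]["rec"]["concurrency"]) # parallel instances of the rec op
```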
foreground
background
import requests
import json
import base64
import os

imgpath = "daoxiangcunjinzhubing_6.jpg"


def cv2_to_base64(image):
    # encode raw image bytes as a base64 string for the JSON payload
    return base64.b64encode(image).decode('utf8')


if __name__ == "__main__":
    url = "http://127.0.0.1:18081/recognition/prediction"

    with open(os.path.join(".", imgpath), 'rb') as file:
        image_data1 = file.read()
    image = cv2_to_base64(image_data1)
    data = {"key": ["image"], "value": [image]}
    for i in range(1):
        r = requests.post(url=url, data=json.dumps(data))
        print(r.json())
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
try:
    from paddle_serving_server_gpu.pipeline import PipelineClient
except ImportError:
    from paddle_serving_server.pipeline import PipelineClient
import base64

client = PipelineClient()
client.connect(['127.0.0.1:9994'])

imgpath = "daoxiangcunjinzhubing_6.jpg"


def cv2_to_base64(image):
    # encode raw image bytes as a base64 string for the RPC payload
    return base64.b64encode(image).decode('utf8')


if __name__ == "__main__":
    with open(imgpath, 'rb') as file:
        image_data = file.read()
    image = cv2_to_base64(image_data)

    for i in range(1):
        ret = client.predict(feed_dict={"image": image}, fetch=["result"])
        print(ret)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService, Op
import logging
import numpy as np
import sys
import cv2
from paddle_serving_app.reader import *
import base64
import os
import faiss
import pickle
import json


class DetOp(Op):
    """Mainbody detection op: proposes candidate object boxes."""

    def init_op(self):
        self.img_preprocess = Sequential([
            BGR2RGB(), Div(255.0),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
            Resize((640, 640)), Transpose((2, 0, 1))
        ])
        self.img_postprocess = RCNNPostprocess("label_list.txt", "output")
        self.threshold = 0.2
        self.max_det_results = 5

    def generate_scale(self, im):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
        Returns:
            im_scale_x: the resize ratio of X
            im_scale_y: the resize ratio of Y
        """
        target_size = [640, 640]
        origin_shape = im.shape[:2]
        resize_h, resize_w = target_size
        im_scale_y = resize_h / float(origin_shape[0])
        im_scale_x = resize_w / float(origin_shape[1])
        return im_scale_y, im_scale_x

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        imgs = []
        raw_imgs = []
        for key in input_dict.keys():
            data = base64.b64decode(input_dict[key].encode('utf8'))
            raw_imgs.append(data)
            data = np.frombuffer(data, np.uint8)
            raw_im = cv2.imdecode(data, cv2.IMREAD_COLOR)

            im_scale_y, im_scale_x = self.generate_scale(raw_im)
            im = self.img_preprocess(raw_im)

            imgs.append({
                "image": im[np.newaxis, :],
                "im_shape":
                np.array(list(im.shape[1:])).reshape(-1)[np.newaxis, :],
                "scale_factor":
                np.array([im_scale_y, im_scale_x]).astype('float32'),
            })
        # keep the raw bytes so RecOp can crop from the original image
        self.raw_img = raw_imgs

        feed_dict = {
            "image": np.concatenate(
                [x["image"] for x in imgs], axis=0),
            "im_shape": np.concatenate(
                [x["im_shape"] for x in imgs], axis=0),
            "scale_factor": np.concatenate(
                [x["scale_factor"] for x in imgs], axis=0)
        }
        return feed_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, log_id):
        boxes = self.img_postprocess(fetch_dict, visualize=False)
        boxes.sort(key=lambda x: x["score"], reverse=True)
        boxes = filter(lambda x: x["score"] >= self.threshold,
                       boxes[:self.max_det_results])
        boxes = list(boxes)
        # convert [x, y, w, h] boxes to [x1, y1, x2, y2]
        for i in range(len(boxes)):
            boxes[i]["bbox"][2] += boxes[i]["bbox"][0] - 1
            boxes[i]["bbox"][3] += boxes[i]["bbox"][1] - 1
        result = json.dumps(boxes)
        res_dict = {"bbox_result": result, "image": self.raw_img}
        return res_dict, None, ""


class RecOp(Op):
    """Recognition op: embeds each box and queries the gallery index."""

    def init_op(self):
        self.seq = Sequential([
            BGR2RGB(), Resize((224, 224)), Div(255),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
            Transpose((2, 0, 1))
        ])

        index_dir = "../../recognition_demo_data_v1.1/gallery_product/index"
        assert os.path.exists(os.path.join(
            index_dir, "vector.index")), "vector.index not found ..."
        assert os.path.exists(os.path.join(
            index_dir, "id_map.pkl")), "id_map.pkl not found ... "

        self.searcher = faiss.read_index(
            os.path.join(index_dir, "vector.index"))

        with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
            self.id_map = pickle.load(fd)

        self.rec_nms_thresold = 0.05
        self.rec_score_thres = 0.5
        self.feature_normalize = True
        self.return_k = 1

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        raw_img = input_dict["image"][0]
        data = np.frombuffer(raw_img, np.uint8)
        origin_img = cv2.imdecode(data, cv2.IMREAD_COLOR)
        dt_boxes = input_dict["bbox_result"]
        boxes = json.loads(dt_boxes)
        # append the whole image as one extra candidate region
        boxes.append({
            "category_id": 0,
            "score": 1.0,
            "bbox": [0, 0, origin_img.shape[1], origin_img.shape[0]]
        })
        self.det_boxes = boxes

        # construct batch images for rec
        imgs = []
        for box in boxes:
            box = [int(x) for x in box["bbox"]]
            im = origin_img[box[1]:box[3], box[0]:box[2]].copy()
            img = self.seq(im)
            imgs.append(img[np.newaxis, :].copy())

        input_imgs = np.concatenate(imgs, axis=0)
        return {"x": input_imgs}, False, None, ""

    def nms_to_rec_results(self, results, thresh=0.1):
        """Greedy NMS over recognition results, keeping the highest scores."""
        filtered_results = []
        x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
        y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
        x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
        y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
        scores = np.array([r["rec_scores"] for r in results])

        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]
        while order.size > 0:
            i = order[0]
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]
            filtered_results.append(results[i])
        return filtered_results

    def postprocess(self, input_dicts, fetch_dict, log_id):
        batch_features = fetch_dict["features"]

        # L2-normalize so inner-product search equals cosine similarity
        if self.feature_normalize:
            feas_norm = np.sqrt(
                np.sum(np.square(batch_features), axis=1, keepdims=True))
            batch_features = np.divide(batch_features, feas_norm)

        scores, docs = self.searcher.search(batch_features, self.return_k)

        results = []
        for i in range(scores.shape[0]):
            pred = {}
            if scores[i][0] >= self.rec_score_thres:
                pred["bbox"] = [int(x) for x in self.det_boxes[i]["bbox"]]
                pred["rec_docs"] = self.id_map[docs[i][0]].split()[1]
                pred["rec_scores"] = scores[i][0]
                results.append(pred)

        # do nms
        results = self.nms_to_rec_results(results, self.rec_nms_thresold)
        return {"result": str(results)}, None, ""


class RecognitionService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        rec_op = RecOp(name="rec", input_ops=[det_op])
        return rec_op


product_recog_service = RecognitionService(name="recognition")
product_recog_service.prepare_pipeline_config("config.yml")
product_recog_service.run_service()
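To see the normalize-then-search step of `RecOp.postprocess` in isolation, here is a hedged standalone sketch that uses a random gallery instead of the demo index. It assumes the gallery index is an inner-product faiss index, which is why the features are L2-normalized first:

```python
import faiss
import numpy as np

d = 512                                   # feature dimension (assumed)
gallery = np.random.rand(1000, d).astype("float32")
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

index = faiss.IndexFlatIP(d)              # inner product == cosine after normalization
index.add(gallery)

feats = np.random.rand(3, d).astype("float32")
feas_norm = np.sqrt(np.sum(np.square(feats), axis=1, keepdims=True))
feats = np.divide(feats, feas_norm)       # same normalization as RecOp

scores, docs = index.search(feats, 1)     # top-1 gallery id per detected box
print(scores.shape, docs.shape)           # (3, 1) (3, 1)
```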
@@ -81,12 +81,14 @@ class Topk(object):
            class_id_map = None
        return class_id_map

    def __call__(self, x, file_names=None, multilabel=False):
        if file_names is not None:
            assert x.shape[0] == len(file_names)
        y = []
        for idx, probs in enumerate(x):
            index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
                "int32") if not multilabel else np.where(
                    probs >= 0.5)[0].astype("int32")
            clas_id_list = []
            score_list = []
            label_name_list = []
@@ -108,6 +110,14 @@ class Topk(object):
        return y


class MultiLabelTopk(Topk):
    def __init__(self, topk=1, class_id_map_file=None):
        super().__init__()

    def __call__(self, x, file_names=None):
        return super().__call__(x, file_names, multilabel=True)


class SavePreLabel(object):
    def __init__(self, save_dir):
        if save_dir is None:
@@ -128,23 +138,24 @@ class SavePreLabel(object):
        os.makedirs(output_dir, exist_ok=True)
        shutil.copy(image_file, output_dir)


class Binarize(object):
    def __init__(self, method="round"):
        self.method = method
        self.unit = np.array([[128, 64, 32, 16, 8, 4, 2, 1]]).T

    def __call__(self, x, file_names=None):
        if self.method == "round":
            x = np.round(x + 1).astype("uint8") - 1

        if self.method == "sign":
            x = ((np.sign(x) + 1) / 2).astype("uint8")

        embedding_size = x.shape[1]
        assert embedding_size % 8 == 0, "The Binary index only support vectors with sizes multiple of 8"

        byte = np.zeros([x.shape[0], embedding_size // 8], dtype=np.uint8)
        for i in range(embedding_size // 8):
            byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)

        return byte
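As a quick illustration of the two selection rules above, a plain-numpy sketch (not the PaddleClas API itself):

```python
import numpy as np

probs = np.array([0.05, 0.9, 0.4, 0.7, 0.1])
topk = 3

# Topk rule: indices of the top-k scores, highest first
topk_ids = probs.argsort(axis=0)[-topk:][::-1].astype("int32")
# MultiLabelTopk rule: every class whose probability clears 0.5
multilabel_ids = np.where(probs >= 0.5)[0].astype("int32")

print(topk_ids)        # [1 3 2]
print(multilabel_ids)  # [1 3]
```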
@@ -71,7 +71,6 @@ class ClsPredictor(Predictor):
        output_names = self.paddle_predictor.get_output_names()
        output_tensor = self.paddle_predictor.get_output_handle(output_names[
            0])
        if self.benchmark:
            self.auto_logger.times.start()
        if not isinstance(images, (list, )):
@@ -119,7 +118,6 @@ def main(config):
        ) == len(image_list):
            if len(batch_imgs) == 0:
                continue
            batch_results = cls_predictor.predict(batch_imgs)
            for number, result_dict in enumerate(batch_results):
                filename = batch_names[number]
......
@@ -19,12 +19,14 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

from functools import partial
import six
import math
import random
import cv2
import numpy as np
import importlib
from PIL import Image

from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
@@ -50,6 +52,50 @@ def create_operators(params):
    return ops
class UnifiedResize(object):
    def __init__(self, interpolation=None, backend="cv2"):
        _cv2_interp_from_str = {
            'nearest': cv2.INTER_NEAREST,
            'bilinear': cv2.INTER_LINEAR,
            'area': cv2.INTER_AREA,
            'bicubic': cv2.INTER_CUBIC,
            'lanczos': cv2.INTER_LANCZOS4
        }
        _pil_interp_from_str = {
            'nearest': Image.NEAREST,
            'bilinear': Image.BILINEAR,
            'bicubic': Image.BICUBIC,
            'box': Image.BOX,
            'lanczos': Image.LANCZOS,
            'hamming': Image.HAMMING
        }

        def _pil_resize(src, size, resample):
            pil_img = Image.fromarray(src)
            pil_img = pil_img.resize(size, resample)
            return np.asarray(pil_img)

        if backend.lower() == "cv2":
            if isinstance(interpolation, str):
                interpolation = _cv2_interp_from_str[interpolation.lower()]
            # compatible with opencv < version 4.4.0
            elif not interpolation:
                interpolation = cv2.INTER_LINEAR
            self.resize_func = partial(cv2.resize, interpolation=interpolation)
        elif backend.lower() == "pil":
            if isinstance(interpolation, str):
                interpolation = _pil_interp_from_str[interpolation.lower()]
            self.resize_func = partial(_pil_resize, resample=interpolation)
        else:
            logger.warning(
                f"The backend of Resize only support \"cv2\" or \"PIL\". \"{backend}\" is unavailable. Use \"cv2\" instead."
            )
            self.resize_func = cv2.resize

    def __call__(self, src, size):
        return self.resize_func(src, size)
class OperatorParamError(ValueError):
    """ OperatorParamError
    """
@@ -87,8 +133,11 @@ class DecodeImage(object):
class ResizeImage(object):
    """ resize image """

    def __init__(self,
                 size=None,
                 resize_short=None,
                 interpolation=None,
                 backend="cv2"):
        if resize_short is not None and resize_short > 0:
            self.resize_short = resize_short
            self.w = None
@@ -101,6 +150,9 @@ class ResizeImage(object):
            raise OperatorParamError("invalid params for ResizeImage for '\
                'both 'size' and 'resize_short' are None")

        self._resize_func = UnifiedResize(
            interpolation=interpolation, backend=backend)

    def __call__(self, img):
        img_h, img_w = img.shape[:2]
        if self.resize_short is not None:
@@ -110,10 +162,7 @@ class ResizeImage(object):
        else:
            w = self.w
            h = self.h

        return self._resize_func(img, (w, h))
class CropImage(object):
@@ -145,9 +194,12 @@ class CropImage(object):
class RandCropImage(object):
    """ random crop image """

    def __init__(self,
                 size,
                 scale=None,
                 ratio=None,
                 interpolation=None,
                 backend="cv2"):
        if type(size) is int:
            self.size = (size, size)  # (h, w)
        else:
@@ -156,6 +208,9 @@ class RandCropImage(object):
        self.scale = [0.08, 1.0] if scale is None else scale
        self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio

        self._resize_func = UnifiedResize(
            interpolation=interpolation, backend=backend)

    def __call__(self, img):
        size = self.size
        scale = self.scale
@@ -181,10 +236,8 @@ class RandCropImage(object):
        j = random.randint(0, img_h - h)

        img = img[j:j + h, i:i + w, :]

        return self._resize_func(img, size)

class RandFlipImage(object):
......
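A hedged usage sketch of the new `backend` switch introduced above. The import path is an assumption based on this repo's `deploy` layout (run from the `deploy` directory) and may differ in your checkout:

```python
import numpy as np
from python.preprocess import ResizeImage  # assumed module path

resize_cv2 = ResizeImage(resize_short=256, interpolation="bilinear", backend="cv2")
resize_pil = ResizeImage(resize_short=256, interpolation="bilinear", backend="pil")

img = (np.random.rand(480, 640, 3) * 255).astype("uint8")
print(resize_cv2(img).shape)  # (256, 341, 3) -- short side scaled to 256
print(resize_pil(img).shape)  # same shape; pixel values may differ slightly per backend
```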
# classification
python3.7 python/predict_cls.py -c configs/inference_cls.yaml

# multilabel_classification
#python3.7 python/predict_cls.py -c configs/inference_multilabel_cls.yaml

# feature extractor
# python3.7 python/predict_rec.py -c configs/inference_rec.yaml
......
@@ -24,13 +24,13 @@ Accuracy and inference time of the pretrained models based on SSLD distillation are as follows.
* Server-side distillation pretrained models

| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|----------------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
@@ -38,19 +38,44 @@ Accuracy and inference time of the pretrained models based on SSLD distillation are as follows.
* Mobile-side distillation pretrained models

| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Storage Size(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------|-----------------------------------|
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |

* Intel-CPU-side distillation pretrained models

| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 161 | 3.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906 | 9.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) |

* Note: `Reference Top-1 Acc` means the accuracy of pretrained models trained on the ImageNet1k dataset.
<a name="PPLCNet_series"></a>
### PPLCNet_series
Accuracy and inference time metrics of the PPLCNet series models are shown as follows. More detailed information can be found in the [PPLCNet series tutorial](../en/models/PPLCNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | Download Address |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.74 | 18 | 1.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 1.92 | 29 | 1.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.05 | 47 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 2.29 | 99 | 2.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 2.46 | 161 | 3.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 3.19 | 342 | 4.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 4.27 | 590 | 6.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 5.39 | 906 | 9.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
<a name="ResNet_and_Vd_series"></a>
### ResNet and Vd series
......
@@ -25,58 +25,68 @@ tar -xf NUS-SCENE-dataset.tar
cd ../../
```

## Training

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```

After training for 10 epochs, the best accuracy over the validation set should be around 0.95.

## Evaluation

```bash
python tools/eval.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

## Prediction

```bash
python3 tools/infer.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

You will get multiple outputs such as the following:

```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```

## Prediction based on prediction engine

### Export model

```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

By default, the inference model is exported under the current path `./inference`.

### Prediction based on prediction engine

Enter the deploy directory:

```bash
cd ./deploy
```

Run prediction with the prediction engine:

```
python3 python/predict_cls.py \
    -c configs/inference_multilabel_cls.yaml
```

You will get outputs such as the following:

```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```
# PPLCNet series
## Overview
PPLCNet is a series of networks with excellent performance on Intel CPUs, proposed by the Baidu PaddleCV team. The authors summarize several methods that improve model accuracy on Intel CPUs while barely increasing inference time, and combine them into a new network, namely PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same inference time. It has shown strong competitiveness in image classification, object detection, and semantic segmentation.
## Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## Inference speed based on Intel(R)-Xeon(R)-Gold-6148-CPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |
...@@ -14,13 +14,13 @@ After preparing the configuration file, The training process can be started in t ...@@ -14,13 +14,13 @@ After preparing the configuration file, The training process can be started in t
``` ```
python tools/train.py \ python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \ -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o pretrained_model="" \ -o Arch.pretrained=False \
-o use_gpu=False -o Global.device=gpu
``` ```
Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters needed to be modified or added, `-o pretrained_model=""` means to not using pre-trained models. Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters needed to be modified or added, `-o Arch.pretrained=False` means to not using pre-trained models.
`-o use_gpu=True` means to use GPU for training. If you want to use the CPU for training, you need to set `use_gpu` to `False`. `-o Global.device=gpu` means to use GPU for training. If you want to use the CPU for training, you need to set `Global.device` to `cpu`.
Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md). Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md).
...@@ -54,12 +54,12 @@ After configuring the configuration file, you can finetune it by loading the pre ...@@ -54,12 +54,12 @@ After configuring the configuration file, you can finetune it by loading the pre
``` ```
python tools/train.py \ python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \ -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \ -o Arch.pretrained=True \
-o use_gpu=True -o Global.device=gpu
``` ```
Among them, `-o pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file. Among them, `-o Arch.pretrained` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file. You can also set it into `True` to use pretrained weights that trained in ImageNet1k.
We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../models/models_intro_en.md). We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../models/models_intro_en.md).
...@@ -69,28 +69,26 @@ If the training process is terminated for some reasons, you can also load the ch ...@@ -69,28 +69,26 @@ If the training process is terminated for some reasons, you can also load the ch
``` ```
python tools/train.py \ python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \ -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \ -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o last_epoch=5 \ -o Global.device=gpu
-o use_gpu=True
``` ```
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.

**Note**:
* The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints shown below during the training process. If you want to continue training from epoch `5`, just set `Global.checkpoints` to `./output/MobileNetV3_large_x1_0/epoch_5`, and PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
```shell
output
├── MobileNetV3_large_x1_0
│   ├── best_model.pdopt
│   ├── best_model.pdparams
│   ├── best_model.pdstates
│   ├── epoch_1.pdopt
│   ├── epoch_1.pdparams
│   ├── epoch_1.pdstates
│   .
│   .
│   .
```
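Given this layout, the checkpoint prefix passed to `Global.checkpoints` is just the file path without its suffix. The small helper below is not part of PaddleClas; it is only a sketch of how the newest epoch prefix could be located automatically:

```python
import glob
import os
import re

def latest_checkpoint(output_dir="./output/MobileNetV3_large_x1_0"):
    # Collect epoch numbers from files named like epoch_5.pdparams.
    epochs = []
    for path in glob.glob(os.path.join(output_dir, "epoch_*.pdparams")):
        match = re.search(r"epoch_(\d+)\.pdparams$", path)
        if match:
            epochs.append(int(match.group(1)))
    if not epochs:
        return None
    # Return the prefix without suffix, as Global.checkpoints expects.
    return os.path.join(output_dir, "epoch_%d" % max(epochs))

print(latest_checkpoint())  # e.g. ./output/MobileNetV3_large_x1_0/epoch_5
```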
### 1.4 Model evaluation

The model evaluation process can be started as follows.
```bash
python tools/eval.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
The above command will use `./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.

Some of the configurable evaluation parameters are described as follows:
* `Arch.name`: Model name
* `Global.pretrained_model`: The path of the model file to be evaluated
**Note:** If the model is a dygraph type, you only need to specify the prefix of the model file when loading the model, instead of specifying the suffix, as in [1.3 Resume Training](#13-resume-training).
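In other words, a prefix such as `./output/MobileNetV3_large_x1_0/best_model` stands for `best_model.pdparams` plus `best_model.pdopt`. A minimal sketch of loading dygraph weights by prefix (the path is an example):

```python
import paddle

prefix = "./output/MobileNetV3_large_x1_0/best_model"
# Weights only; the ".pdopt" file holds the optimizer state used for resuming.
state_dict = paddle.load(prefix + ".pdparams")
# model.set_state_dict(state_dict)  # apply to an instantiated model of the same architecture
print(len(state_dict), "parameter tensors loaded")
```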
## 2. Training and evaluation on Linux+GPU

If you want to run PaddleClas on Linux with GPU, it is highly recommended to use `paddle.distributed.launch` to start the training with multiple GPUs, as described below.
### 2.1 Model training

After preparing the configuration file, the training process can be started in the following way. `paddle.distributed.launch` specifies the GPU cards to use through the `--gpus` parameter:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
```
The format of the output log information is the same as above; see [1.1 Model training](#11-model-training) for details.
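For reference, a minimal sketch of what each process started by `paddle.distributed.launch` does (this is illustrative, not the PaddleClas training code): the launcher spawns one Python process per card listed in `--gpus`, and each process initializes the parallel environment and wraps its model so gradients are synchronized:

```python
import paddle
import paddle.distributed as dist

def main():
    # Called once in every process spawned by `paddle.distributed.launch`.
    dist.init_parallel_env()
    model = paddle.vision.models.resnet18()  # stand-in for the configured model
    model = paddle.DataParallel(model)       # all-reduce gradients across cards
    # ...the usual forward/backward/optimizer loop follows, unchanged.

if __name__ == "__main__":
    main()
```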
### 2.2 Model finetuning

After configuring the configuration file, you can finetune it by loading the pretrained weights:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Arch.pretrained=True
```
Among them, `Arch.pretrained` can be set to `True` or `False`, and it can also be used to set the path from which to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.

There are many examples of model finetuning in [Quick Start](./quick_start_en.md). You can refer to this tutorial to finetune the model on a specific dataset.
### 2.3 Resume training

If the training process is terminated for some reason, you can also load the checkpoints to continue training:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
    -o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter, as described in [1.3 Resume training](#13-resume-training).
### 2.4 Model evaluation

The model evaluation process can be started as follows.
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    tools/eval.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
For parameter descriptions, see [1.4 Model evaluation](#14-model-evaluation) for details.

## 3. Use the pre-trained model to predict
After the training is completed, you can predict by using the pre-trained model obtained by the training, as follows:
```bash
python3 tools/infer.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
Among them:
+ `Infer.infer_imgs`: The path of the image file or folder to be predicted;
+ `Global.pretrained_model`: The weight file path, such as `./output/MobileNetV3_large_x1_0/best_model`.
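The prediction result is a list of Top-k class ids with their scores. A minimal sketch of how such an output can be derived from the model's logits (a plain softmax Top-k, shown for illustration rather than the exact `tools/infer.py` code):

```python
import numpy as np

def topk_result(logits, k=5):
    # Softmax over the class dimension, then keep the k highest-scoring classes.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    class_ids = probs.argsort()[::-1][:k]
    return class_ids.tolist(), probs[class_ids].round(5).tolist()

logits = np.random.randn(102)  # flowers102 has 102 classes
class_ids, scores = topk_result(logits)
print({"class_ids": class_ids, "scores": scores})
```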
<a name="model_inference"></a>
## 4. Use the inference model to predict

PaddlePaddle supports inference using prediction engines, which will be introduced next.

First, export the inference model using `tools/export_model.py`.
```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
```
Among them, the `Global.pretrained_model` parameter is used to specify the model file path; the path does not need to include the model file suffix.

The above command will generate the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), and then the inference engine can be used for inference:
Go to the deploy directory:
```
cd deploy
```
Use the inference engine to predict. Because the mapping file of the ImageNet1k dataset is used by default, we should set `PostProcess.Topk.class_id_map_file` to `None`.
```bash
python3 python/predict_cls.py \
    -c configs/inference_cls.yaml \
    -o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
    -o Global.inference_model_dir=../inference/ \
    -o PostProcess.Topk.class_id_map_file=None
```
Among them:
+ `Global.infer_imgs`: The path of the image file to be predicted;
+ `Global.inference_model_dir`: The directory of the inference model files, such as `../inference/`, which contains `inference.pdmodel` and `inference.pdiparams`;
+ `Global.use_tensorrt`: Whether to use TensorRT, `False` by default;
+ `Global.use_gpu`: Whether to use the GPU, `True` by default;
+ `Global.enable_mkldnn`: Whether to use MKL-DNN, `False` by default. It is valid only when `Global.use_gpu` is `False`;
+ `Global.use_fp16`: Whether to enable FP16, `False` by default.
**Note**: If you want to use Transformer series models, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model: you need to set `resize_short=384` and `resize=384`.

If you want to evaluate the speed of the model, it is recommended to enable TensorRT for acceleration on GPU, and MKL-DNN on CPU.
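Under the hood, `python/predict_cls.py` drives the Paddle Inference API. A condensed sketch of that flow (the paths and GPU memory size are illustrative; the real script also applies the preprocessing ops from the YAML file):

```python
import numpy as np
from paddle.inference import Config, create_predictor

# Build the predictor from the exported structure and weight files.
config = Config("../inference/inference.pdmodel", "../inference/inference.pdiparams")
config.enable_use_gpu(8000, 0)       # 8000 MB GPU memory on card 0; omit for CPU
predictor = create_predictor(config)

# Feed one preprocessed image (NCHW float32) and fetch the class scores.
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)  # (1, num_classes)
```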
`-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added. `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model; in addition, `Arch.Backbone.pretrained` can also specify the path of a specific model weight file, which needs to be replaced with the path to your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use a CPU for training, you need to set `Global.device` to `cpu`.

For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_description_en.md) for specific configuration parameters.

Run the above commands to check the output log, an example of which is as follows:
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download link |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model size (M) | Download link |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
* Knowledge distillation models for Intel CPU

| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M) | Download link |
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47 | 1.9 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 161 | 3.0 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906 | 9.0 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) |

* Note: `Reference Top-1 Acc` denotes the accuracy of the pretrained model trained by PaddleClas on the ImageNet1k dataset.
<a name="PPLCNet系列"></a>
### PPLCNet系列
PPLCNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[PPLCNet系列模型文档](./models/PPLCNet.md)
| 模型 | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | 下载地址 |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.74 | 18 | 1.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 1.92 | 29 | 1.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.05 | 47 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 2.29 | 99 | 2.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 2.46 | 161 | 3.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 3.19 | 342 | 4.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 4.27 | 590 | 6.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 5.39 | 906 | 9.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
<a name="ResNet及其Vd系列"></a> <a name="ResNet及其Vd系列"></a>
### ResNet及其Vd系列 ### ResNet及其Vd系列
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download link |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| TNT_small | 0.8121 | 0.9563 | | | 5.2 | 23.8 | [Download](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) |

**Note**: In the data preprocessing of the TNT model, both `mean` and `std` in `NormalizeImage` are 0.5.
```shell
tar -xf NUS-SCENE-dataset.tar
cd ../../
```
## 2. Model training

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```

After training for 10 epochs, the best accuracy on the validation set should be around 0.95.

## 3. Model evaluation

```bash
python3 tools/eval.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

## 4. Model prediction

```bash
python3 tools/infer.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

You will get output similar to the following:

```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```

## 5. Prediction with the inference engine

### 5.1 Export the inference model

```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

By default, the inference model is saved in `./inference` under the current path.

### 5.2 Prediction with the inference engine

First, enter the deploy directory:

```bash
cd ./deploy
```

Predict through the inference engine:

```
python3 python/predict_cls.py \
    -c configs/inference_multilabel_cls.yaml
```

You will get output similar to the following:

```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```
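The `class id(s)` and `score(s)` above come from a multi-label post-process: instead of a softmax Top-k, each class gets an independent sigmoid probability and every class above a threshold is kept. A minimal sketch of this logic (the 0.5 threshold and the logits shape are assumptions for illustration, not the exact PaddleClas implementation):

```python
import numpy as np

def multilabel_postprocess(logits, threshold=0.5):
    # Sigmoid turns each class logit into an independent probability.
    probs = 1.0 / (1.0 + np.exp(-logits))
    class_ids = np.where(probs > threshold)[0]
    return class_ids.tolist(), np.round(probs[class_ids], 5).tolist()

# 33 scene classes in the NUS-WIDE-SCENE subset used above.
logits = np.random.randn(33)
class_ids, scores = multilabel_postprocess(logits)
print({'class_ids': class_ids, 'scores': scores})
```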
* There are many experts in image classification, recognition and retrieval, and models and papers are updated very quickly. The answers in this document mainly rely on limited project experience, so omissions are hard to avoid. If anything is missing or incorrect, we sincerely hope that knowledgeable readers will help supplement and correct it. Many thanks.

## Contents

* [Recent updates](#近期更新) (2021.09.08)
* [Highlights](#精选)
* [1. Theory](#1.理论篇)
    * [1.1 PaddleClas basics](#1.1PaddleClas基础知识)
<a name="近期更新"></a> <a name="近期更新"></a>
## 近期更新 ## 近期更新
#### Q2.1.7: During training, the error `ERROR: Unexpected segmentation fault encountered in DataLoader workers.` is reported. How can this be diagnosed and resolved?
**A**: Try setting the `num_workers` field in the training configuration file to `0`; try reducing the `batch_size` field in the training configuration file; and check whether the dataset format and the dataset path in the configuration file are correct.

#### Q2.1.8: How to use `Mixup` and `Cutmix` during training?
**A**:
* For the usage of `Mixup`, please refer to [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65); for `Cutmix`, refer to [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65).
* When using `Mixup` or `Cutmix`, note that:
    * `Loss.Train.CELoss` in the configuration file needs to be changed to `Loss.Train.MixCELoss`; see [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26);
    * The training accuracy (Acc) metric cannot be computed when training with `Mixup` or `Cutmix`, so the `Metric.Train.TopkAcc` field needs to be removed from the configuration file; see [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128).

#### Q2.1.9: In the training configuration yaml file, what are the fields `Global.pretrained_model` and `Global.checkpoints` used for?
**A**:
* When fine-tuning is needed, the path of the pretrained weight file can be configured via the field `Global.pretrained_model`; the pretrained weight file suffix is usually `.pdparams`;
* During training, the training program automatically saves checkpoint information at the end of each epoch, including the optimizer information `.pdopt` and the model weights `.pdparams`. When training needs to be resumed after an unexpected interruption, the checkpoint saved during training can be configured via the field `Global.checkpoints`. For example, configuring `checkpoints: ./output/ResNet18/epoch_18` restores the checkpoint saved at the end of epoch 18: PaddleClas will automatically load `epoch_18.pdopt` and `epoch_18.pdparams` and continue training from epoch 19.

#### Q2.6.3: How to convert a model to `ONNX` format?
**A**: Paddle supports two ways of converting a model to ONNX format, both relying on the `paddle2onnx` tool, so first install `paddle2onnx`:

```shell
pip install paddle2onnx
```

* Converting an inference model to an ONNX model:

Taking a `combined`-format inference model exported from a dynamic graph (containing both `.pdmodel` and `.pdiparams` files) as an example, run the following command to convert the model format:

```shell
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```

In the above command:
* `model_dir`: the directory that needs to contain the `.pdmodel` and `.pdiparams` files;
* `model_filename`: the path of the `.pdmodel` file under `model_dir`;
* `params_filename`: the path of the `.pdiparams` file under `model_dir`;
* `save_file`: the directory path where the converted model is saved.

For converting a non-`combined` inference model exported from a static graph (usually a `__model__` file plus multiple parameter files), and for more parameter descriptions, please refer to the official paddle2onnx documentation [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9).

* Exporting an ONNX model directly from model definition code:

Taking dynamic-graph model code as an example, the model class is a subclass of `paddle.nn.Layer` and the code is as follows:

```python
import paddle
from paddle.static import InputSpec

class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super().__init__()  # initialize the base Layer so the class can be instantiated
    def forward(self, x):
        return x  # placeholder forward; a real model defines its computation here

net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```

Among them:
* The `InputSpec()` function describes the signature of the model input, including the `shape`, `type` and `name` (which can be omitted) of the input data;
* The `paddle.onnx.export()` function requires the model object `net`, the save path of the exported model `save_path`, and the description of the model input `input_spec`.

Note that the `paddlepaddle` version must be greater than `2.0.0`. For more parameter descriptions of `paddle.onnx.export()`, please refer to [paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export).

#### Q2.5.4: When building the retrieval gallery, how should the parameter `pq_size` be set?
**A**: `pq_size` is a parameter of the PQ retrieval algorithm. PQ retrieval can be roughly understood as a "hierarchical" retrieval algorithm in which `pq_size` is the "capacity" of each level, so this parameter affects retrieval performance. However, when the total amount of gallery data is not large (fewer than 10,000 images), it has little impact on performance, so for most use cases it does not need to be modified when building the gallery. For more details on the PQ retrieval algorithm, see the related [paper](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf).
<a name="精选"></a> <a name="精选"></a>
## 精选 ## 精选
2. The best model (`best_model.pdopt`, `best_model.pdparams`, `best_model.pdstates`);
3. The checkpoint at the end of each epoch during training (`epoch_xxx.pdopt`, `epoch_xxx.pdparams`, `epoch_xxx.pdstates`). The `Global.save_interval` field in the training configuration file specifies the save interval of these checkpoints. If it is set larger than the total number of epochs, intermediate checkpoints will no longer be saved.
#### Q2.1.7: During training, the error `ERROR: Unexpected segmentation fault encountered in DataLoader workers.` is reported. How can this be diagnosed and resolved?
**A**: Try setting the `num_workers` field in the training configuration file to `0`; try reducing the `batch_size` field in the training configuration file; and check whether the dataset format and the dataset path in the configuration file are correct.

#### Q2.1.8: How to use `Mixup` and `Cutmix` during training?
**A**:
* For the usage of `Mixup`, please refer to [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65); for `Cutmix`, refer to [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65).
* When using `Mixup` or `Cutmix`, note that:
    * `Loss.Train.CELoss` in the configuration file needs to be changed to `Loss.Train.MixCELoss`; see [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26);
    * The training accuracy (Acc) metric cannot be computed when training with `Mixup` or `Cutmix`, so the `Metric.Train.TopkAcc` field needs to be removed from the configuration file; see [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128).

#### Q2.1.9: In the training configuration yaml file, what are the fields `Global.pretrained_model` and `Global.checkpoints` used for?
**A**:
* When fine-tuning is needed, the path of the pretrained weight file can be configured via the field `Global.pretrained_model`; the pretrained weight file suffix is usually `.pdparams`;
* During training, the training program automatically saves checkpoint information at the end of each epoch, including the optimizer information `.pdopt` and the model weights `.pdparams`. When training needs to be resumed after an unexpected interruption, the checkpoint saved during training can be configured via the field `Global.checkpoints`. For example, configuring `checkpoints: ./output/ResNet18/epoch_18` restores the checkpoint saved at the end of epoch 18: PaddleClas will automatically load `epoch_18.pdopt` and `epoch_18.pdparams` and continue training from epoch 19.
<a name="2.2图像分类"></a> <a name="2.2图像分类"></a>
### 2.2 图像分类 ### 2.2 图像分类
#### Q2.5.3: When recompiling index.so on Mac, the following error is reported: clang: error: unsupported option '-fopenmp'. How to deal with it?
**A**: This problem has been solved. You can refer to the [documentation](../../../develop/deploy/vector_search/README.md) to recompile index.so.

#### Q2.5.4: When building the retrieval gallery, how should the parameter `pq_size` be set?
**A**: `pq_size` is a parameter of the PQ retrieval algorithm. PQ retrieval can be roughly understood as a "hierarchical" retrieval algorithm in which `pq_size` is the "capacity" of each level, so this parameter affects retrieval performance. However, when the total amount of gallery data is not large (fewer than 10,000 images), it has little impact on performance, so for most use cases it does not need to be modified when building the gallery. For more details on the PQ retrieval algorithm, see the related [paper](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf).
<a name="2.6模型预测部署"></a> <a name="2.6模型预测部署"></a>
### 2.6 模型预测部署 ### 2.6 模型预测部署
```
UserWarning: Skip loading for ***. *** is not found in the provided dict.
```
If this warning appears, the model weights were not loaded successfully. Please further check whether the `Global.pretrained_model` field in the configuration file is correctly set to the path of the model weight file. The model weight file suffix is usually `pdparams`; note that the file suffix does not need to be included when configuring this path.
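To locate the mismatch behind such warnings, one can diff the checkpoint keys against the model's parameter names. A hedged sketch (the model and the path are placeholders for your own):

```python
import paddle

model = paddle.vision.models.resnet18()              # placeholder for your model
ckpt = paddle.load("./output/ResNet18/best_model.pdparams")
missing = set(model.state_dict()) - set(ckpt)        # params the checkpoint lacks
unexpected = set(ckpt) - set(model.state_dict())     # checkpoint keys the model lacks
print("missing:", sorted(missing)[:5])
print("unexpected:", sorted(unexpected)[:5])
```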
#### Q2.6.3: How to convert a model to `ONNX` format?
**A**: Paddle supports two ways of converting a model to ONNX format, both relying on the `paddle2onnx` tool, so first install `paddle2onnx`:

```shell
pip install paddle2onnx
```

* Converting an inference model to an ONNX model:

Taking a `combined`-format inference model exported from a dynamic graph (containing both `.pdmodel` and `.pdiparams` files) as an example, run the following command to convert the model format:

```shell
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```

In the above command:
* `model_dir`: the directory that needs to contain the `.pdmodel` and `.pdiparams` files;
* `model_filename`: the path of the `.pdmodel` file under `model_dir`;
* `params_filename`: the path of the `.pdiparams` file under `model_dir`;
* `save_file`: the directory path where the converted model is saved.

For converting a non-`combined` inference model exported from a static graph (usually a `__model__` file plus multiple parameter files), and for more parameter descriptions, please refer to the official paddle2onnx documentation [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9).

* Exporting an ONNX model directly from model definition code:

Taking dynamic-graph model code as an example, the model class is a subclass of `paddle.nn.Layer` and the code is as follows:

```python
import paddle
from paddle.static import InputSpec

class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super().__init__()  # initialize the base Layer so the class can be instantiated
    def forward(self, x):
        return x  # placeholder forward; a real model defines its computation here

net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```

Among them:
* The `InputSpec()` function describes the signature of the model input, including the `shape`, `type` and `name` (which can be omitted) of the input data;
* The `paddle.onnx.export()` function requires the model object `net`, the save path of the exported model `save_path`, and the description of the model input `input_spec`.

Note that the `paddlepaddle` version must be greater than `2.0.0`. For more parameter descriptions of `paddle.onnx.export()`, please refer to [paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export).
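After conversion, it can help to sanity-check the exported file. The sketch below assumes the `onnxruntime` package is installed and that a `model.onnx` with a 1x3x224x224 float input named `x` was produced as above; it simply runs one forward pass:

```python
import numpy as np
import onnxruntime as ort

# Load the exported ONNX model and run a dummy forward pass to verify it.
sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name          # "x" for the export example above
dummy = np.random.rand(1, 3, 224, 224).astype("float32")
outputs = sess.run(None, {input_name: dummy})   # None -> fetch all outputs
print([o.shape for o in outputs])
```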
# PPLCNet series

## Overview

The PPLCNet series is a family of networks proposed by the Baidu PaddleCV team that perform particularly well on Intel CPUs. The authors summarized several methods that improve model accuracy on Intel CPUs while adding almost no inference latency, and combined them into a new network, PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same latency. PPLCNet has shown strong competitiveness in image classification, object detection and semantic segmentation.

## Accuracy, FLOPs and parameter counts

| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |

## Inference speed based on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |
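PaddleClas exposes the whole series as ready-to-use backbones (see the imports added to `ppcls/arch/backbone/__init__.py` later in this commit). A minimal usage sketch:

```python
import paddle
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x1_0

# pretrained=True downloads the ImageNet1k weights listed in the tables above;
# use_ssld=True selects the SSLD-distilled variant when available.
model = PPLCNet_x1_0(pretrained=False)
model.eval()
x = paddle.rand([1, 3, 224, 224])
print(model(x).shape)  # [1, 1000]
```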
Among them, `-c` is used to specify the path of the configuration file and `-o` is used to specify the parameters that need to be modified or added. `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model; in addition, `Arch.Backbone.pretrained` can also specify the path of a specific model weight file, which needs to be replaced with the path of your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use the CPU for training, set `Global.device` to `cpu`.

For more detailed training configuration, you can also directly modify the configuration file of the model. For specific configuration parameters, refer to the [configuration document](config_description.md).

Run the above command, and you can see the output log. An example is as follows:
- mean Average Precision (mAP)
    - AP: AP refers to the average precision over different recall levels
    - mAP: the mean of the APs of all images in the test set
from ppcls.arch.backbone.legendary_models.vgg import VGG11, VGG13, VGG16, VGG19
from ppcls.arch.backbone.legendary_models.inception_v3 import InceptionV3
from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W64_C
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x0_25, PPLCNet_x0_35, PPLCNet_x0_5, PPLCNet_x0_75, PPLCNet_x1_0, PPLCNet_x1_5, PPLCNet_x2_0, PPLCNet_x2_5
from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet50_vc
from ppcls.arch.backbone.model_zoo.resnext import ResNeXt50_32x4d, ResNeXt50_64x4d, ResNeXt101_32x4d, ResNeXt101_64x4d, ResNeXt152_32x4d, ResNeXt152_64x4d
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import, division, print_function
import paddle
import paddle.nn as nn
from paddle import ParamAttr
from paddle.nn import AdaptiveAvgPool2D, BatchNorm, Conv2D, Dropout, Linear
from paddle.regularizer import L2Decay
from paddle.nn.initializer import KaimingNormal
from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"PPLCNet_x0_25":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams",
"PPLCNet_x0_35":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams",
"PPLCNet_x0_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams",
"PPLCNet_x0_75":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams",
"PPLCNet_x1_0":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams",
"PPLCNet_x1_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams",
"PPLCNet_x2_0":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams",
"PPLCNet_x2_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams"
}
__all__ = list(MODEL_URLS.keys())
# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
# k: kernel_size
# in_c: input channel number in depthwise block
# out_c: output channel number in depthwise block
# s: stride in depthwise block
# use_se: whether to use SE block
NET_CONFIG = {
"blocks2":
#k, in_c, out_c, s, use_se
[[3, 16, 32, 1, False]],
"blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
"blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
"blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False]],
"blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
}
def make_divisible(v, divisor=8, min_value=None):
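    # Round the scaled channel count to the nearest multiple of `divisor`
    # (8 by default, a common choice for hardware-friendly channel widths),
    # while ensuring the rounding never removes more than 10% of the channels.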
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(TheseusLayer):
def __init__(self,
num_channels,
filter_size,
num_filters,
stride,
num_groups=1):
super().__init__()
self.conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(
num_filters,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.hardswish(x)
return x
class DepthwiseSeparable(TheseusLayer):
def __init__(self,
num_channels,
num_filters,
stride,
dw_size=3,
use_se=False):
super().__init__()
self.use_se = use_se
self.dw_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_channels,
filter_size=dw_size,
stride=stride,
num_groups=num_channels)
if use_se:
self.se = SEModule(num_channels)
self.pw_conv = ConvBNLayer(
num_channels=num_channels,
filter_size=1,
num_filters=num_filters,
stride=1)
def forward(self, x):
x = self.dw_conv(x)
if self.use_se:
x = self.se(x)
x = self.pw_conv(x)
return x
class SEModule(TheseusLayer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(
in_channels=channel,
out_channels=channel // reduction,
kernel_size=1,
stride=1,
padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(
in_channels=channel // reduction,
out_channels=channel,
kernel_size=1,
stride=1,
padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class PPLCNet(TheseusLayer):
def __init__(self,
scale=1.0,
class_num=1000,
dropout_prob=0.2,
class_expand=1280):
super().__init__()
self.scale = scale
self.class_expand = class_expand
self.conv1 = ConvBNLayer(
num_channels=3,
filter_size=3,
num_filters=make_divisible(16 * scale),
stride=2)
self.blocks2 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
])
self.blocks3 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
])
self.blocks4 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
])
self.blocks5 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
])
self.blocks6 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
])
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(
in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
self.fc = Linear(self.class_expand, class_num)
def forward(self, x):
x = self.conv1(x)
x = self.blocks2(x)
x = self.blocks3(x)
x = self.blocks4(x)
x = self.blocks5(x)
x = self.blocks6(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld):
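    # pretrained=False: random initialization; pretrained=True: download the
    # official weights from MODEL_URLS (the SSLD-distilled variant when
    # use_ssld=True); pretrained=<str>: load weights from a local path.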
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def PPLCNet_x0_25(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_25
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_25` model depends on args.
"""
model = PPLCNet(scale=0.25, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_25"], use_ssld)
    return model


def PPLCNet_x0_35(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x0_35
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x0_35` model built according to the args.
    """
model = PPLCNet(scale=0.35, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_35"], use_ssld)
    return model


def PPLCNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x0_5
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x0_5` model built according to the args.
    """
model = PPLCNet(scale=0.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_5"], use_ssld)
    return model


def PPLCNet_x0_75(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x0_75
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x0_75` model built according to the args.
    """
model = PPLCNet(scale=0.75, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_75"], use_ssld)
    return model


def PPLCNet_x1_0(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x1_0
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x1_0` model built according to the args.
    """
model = PPLCNet(scale=1.0, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_0"], use_ssld)
    return model


def PPLCNet_x1_5(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x1_5
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x1_5` model built according to the args.
    """
model = PPLCNet(scale=1.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_5"], use_ssld)
    return model


def PPLCNet_x2_0(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x2_0
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x2_0` model built according to the args.
    """
model = PPLCNet(scale=2.0, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_0"], use_ssld)
    return model


def PPLCNet_x2_5(pretrained=False, use_ssld=False, **kwargs):
    """
    PPLCNet_x2_5
    Args:
        pretrained: bool or str, default False. If True, download and load
            the pretrained weights; if a str, use it as the path of a local
            pretrained model; if False, no pretrained weights are loaded.
        use_ssld: bool, default False. Whether to use the SSLD
            distillation-pretrained weights when pretrained=True.
    Returns:
        model: nn.Layer. A `PPLCNet_x2_5` model built according to the args.
    """
model = PPLCNet(scale=2.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_5"], use_ssld)
return model
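Note: the factory functions above differ only in the width multiplier `scale` passed to `PPLCNet`. A minimal usage sketch follows (the import path is an assumption about the repo layout; `pretrained=False` avoids any download):

```python
import paddle

# Assumed import path for the PP-LCNet backbone module in this repo.
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x1_0

model = PPLCNet_x1_0(pretrained=False, class_num=1000)
model.eval()
x = paddle.rand([1, 3, 224, 224])   # NCHW input, ImageNet resolution
with paddle.no_grad():
    logits = model(x)
print(logits.shape)                  # [1, 1000]
```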
@@ -131,7 +131,7 @@ class GoogLeNetDY(nn.Layer):
         self._ince5b = Inception(
             832, 832, 384, 192, 384, 48, 128, 128, name="ince5b")
-        self._pool_5 = AvgPool2D(kernel_size=7, stride=7)
+        self._pool_5 = AdaptiveAvgPool2D(1)
         self._drop = Dropout(p=0.4, mode="downscale_in_infer")
         self._fc_out = Linear(
...
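The single functional change in this hunk replaces the fixed 7x7 average pool with an adaptive one, so the classifier head no longer assumes a 224x224 input. A small sketch of the difference (shapes illustrative; 1024 is GoogLeNet's final channel count):

```python
import paddle
from paddle.nn import AvgPool2D, AdaptiveAvgPool2D

# A 448x448 input leaves a 14x14 final feature map: the old fixed 7x7 window
# yields 2x2, while AdaptiveAvgPool2D(1) always reduces to 1x1.
feat = paddle.rand([1, 1024, 14, 14])
print(AvgPool2D(kernel_size=7, stride=7)(feat).shape)   # [1, 1024, 2, 2]
print(AdaptiveAvgPool2D(1)(feat).shape)                 # [1, 1024, 1, 1]
```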
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
eval_mode: retrieval
use_dali: False
to_static: False
# model architecture
Arch:
name: RecModel
infer_output_key: features
infer_add_softmax: False
Backbone:
name: PPLCNet_x2_5
pretrained: True
use_ssld: True
BackboneStopLayer:
name: flatten_0
Neck:
name: FC
embedding_size: 1280
class_num: 512
Head:
name: ArcMargin
embedding_size: 512
class_num: 185341
margin: 0.2
scale: 30
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.04
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00001
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/train_reg_all_data.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
Query:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Gallery:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]
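In the recognition config above, the `Head` applies an additive angular margin (ArcMargin, `margin: 0.2`, `scale: 30`) on top of the 512-d `Neck` embedding before the CELoss is computed. A minimal NumPy sketch of that logit computation, assuming L2-normalized embeddings and class weights (not the exact ppcls implementation):

```python
import numpy as np

def arc_margin_logits(emb, weight, label, margin=0.2, scale=30.0):
    """Sketch of ArcMargin: logits are scale * cos(theta + margin) for the
    ground-truth class and scale * cos(theta) for every other class."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)           # [N, 512]
    weight = weight / np.linalg.norm(weight, axis=0, keepdims=True)  # [512, C]
    cos = np.clip(emb @ weight, -1.0, 1.0)
    theta = np.arccos(cos)
    m = np.zeros_like(theta)
    m[np.arange(len(label)), label] = margin    # margin on the target class only
    return scale * np.cos(theta + m)
```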
@@ -34,9 +34,8 @@ Optimizer:
   momentum: 0.9
   lr:
     name: Piecewise
-    learning_rate: 0.01
     decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
+    values: [0.01, 0.001, 0.0001, 0.00001]
   regularizer:
     name: 'L2'
     coeff: 0.0001
...
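For reference, `Piecewise` holds each value until the corresponding `decay_epochs` boundary passes (so `len(values) == len(decay_epochs) + 1`); this hunk drops the now-unused `learning_rate` key and scales the whole schedule down tenfold. A one-function sketch of the assumed semantics:

```python
def piecewise_lr(epoch, decay_epochs=(30, 60, 90),
                 values=(0.01, 0.001, 0.0001, 0.00001)):
    """Sketch of Piecewise: values[i] is used until decay_epochs[i] is
    reached; the final value applies afterwards."""
    for boundary, value in zip(decay_epochs, values):
        if epoch < boundary:
            return value
    return values[-1]
```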
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -83,6 +106,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -108,6 +133,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
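The recurring recipe change in these config diffs pairs `MixCELoss` (label smoothing with `epsilon: 0.1`) with batch-level Mixup/Cutmix drawn by `OpSampler`. A NumPy sketch of the two ingredients under their standard definitions (not the exact ppcls code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y, alpha=0.8):
    """Sketch of MixupOperator: convexly blend image pairs; the loss is later
    computed against both label sets, weighted by lam."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[idx], (y, y[idx], lam)

def smoothed_ce(logits, target, epsilon=0.1):
    """Sketch of the label smoothing that MixCELoss's epsilon enables."""
    n_cls = logits.shape[-1]
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    soft = np.full_like(logp, epsilon / n_cls)
    soft[np.arange(len(target)), target] += 1.0 - epsilon
    return float(-(soft * logp).sum(-1).mean())

# MixCELoss on a mixed batch, conceptually:
#   loss = lam * smoothed_ce(logits, y_a) + (1 - lam) * smoothed_ce(logits, y_b)
```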
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -54,18 +56,39 @@ DataLoader:
             to_rgb: True
             channel_first: False
         - RandCropImage:
             size: 384
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -82,7 +105,9 @@ DataLoader:
             to_rgb: True
             channel_first: False
         - ResizeImage:
-            resize_short: 426
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 384
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -107,7 +132,9 @@ Infer:
         to_rgb: True
         channel_first: False
     - ResizeImage:
-        resize_short: 426
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 384
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
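A plausible reading of the 426 -> 438 change in the 384-input recipes (an assumption, not stated in the diff): it brings the eval resize back to roughly the 224/256 = 0.875 center-crop ratio:

```python
# Assumed rationale: keep the eval center-crop ratio near the 0.875 convention.
print(224 / 256)               # 0.875
print(int(384 / 0.875))        # 438 -> the new resize_short
print(384 / 426, 384 / 438)    # ~0.901 before vs ~0.877 after
```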
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -83,6 +106,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -108,6 +133,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -54,18 +56,39 @@ DataLoader:
             to_rgb: True
             channel_first: False
         - RandCropImage:
             size: 384
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -82,7 +105,9 @@ DataLoader:
             to_rgb: True
             channel_first: False
         - ResizeImage:
-            resize_short: 426
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 384
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -107,7 +132,9 @@ Infer:
         to_rgb: True
         channel_first: False
     - ResizeImage:
-        resize_short: 426
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 384
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -83,6 +106,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -108,6 +133,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -83,6 +106,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -108,6 +133,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -83,6 +106,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -108,6 +133,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -22,25 +22,27 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token pos_embed dist_token
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 1e-3
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6

 # data loader for train and eval
 DataLoader:
@@ -55,17 +57,38 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: True
     loader:
@@ -83,6 +106,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -92,7 +117,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 256
       drop_last: False
       shuffle: False
     loader:
@@ -108,6 +133,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -122,9 +149,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_25
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
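The PP-LCNet recipes above all use a cosine schedule with a 5-epoch linear warmup from a large base LR of 0.8 (matching the batch size of 512 set in the sampler). A sketch of the schedule at epoch granularity (the actual ppcls scheduler steps per iteration):

```python
import math

def lr_at(epoch, base_lr=0.8, total_epochs=360, warmup_epoch=5):
    """Sketch of the Cosine-with-warmup schedule configured above."""
    if epoch < warmup_epoch:                      # linear ramp from ~0
        return base_lr * (epoch + 1) / warmup_epoch
    t = (epoch - warmup_epoch) / (total_epochs - warmup_epoch)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```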
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_35
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_75
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x1_0
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x1_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x2_0
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x2_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- AutoAugment:
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
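Among the configs shown here, only this x2_5 recipe inserts `AutoAugment` into the training transforms, and the three widest variants (x1_5/x2_0/x2_5) raise the L2 coefficient from 3e-5 to 4e-5. A hypothetical launch of this recipe via the standard PaddleClas entry point (the config path is an assumption about the repo layout):

```python
import subprocess

# Hypothetical distributed launch; adjust --gpus and the config path to taste.
subprocess.run([
    "python3", "-m", "paddle.distributed.launch", "--gpus", "0,1,2,3",
    "tools/train.py",
    "-c", "ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml",
], check=True)
```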
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6

 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 384
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -84,7 +110,11 @@ DataLoader:
             to_rgb: True
             channel_first: False
         - ResizeImage:
-            size: [384, 384]
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
+        - CropImage:
+            size: 384
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
@@ -92,7 +122,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -107,7 +137,11 @@ Infer:
         to_rgb: True
         channel_first: False
     - ResizeImage:
-        size: [384, 384]
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
+    - CropImage:
+        size: 384
     - NormalizeImage:
         scale: 1.0/255.0
         mean: [0.485, 0.456, 0.406]
@@ -120,9 +154,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
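These Swin diffs use a different no-decay list than the DeiT-style ones above (`absolute_pos_embed relative_position_bias_table .bias norm`). A sketch of how such keyword lists, together with `one_dim_param_no_weight_decay`, would partition parameters for AdamW (assumed semantics, not the exact ppcls code):

```python
def split_decay_params(model,
                       skip=("absolute_pos_embed",
                             "relative_position_bias_table",
                             ".bias", "norm"),
                       skip_1d=True):
    """Parameters whose name matches a keyword, or that are 1-D tensors
    (biases, norm scales), go to the no-weight-decay group."""
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if (skip_1d and param.ndim == 1) or any(k in name for k in skip):
            no_decay.append(param)
        else:
            decay.append(param)
    return decay, no_decay
```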
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0

 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6

 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +111,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +122,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +138,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +154,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
-    - TopkAcc:
-      topk: [1, 5]
   Eval:
     - TopkAcc:
       topk: [1, 5]
...
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 384
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 384
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -84,7 +110,11 @@ DataLoader:
             to_rgb: True
             channel_first: False
         - ResizeImage:
-            size: [384, 384]
+            resize_short: 438
+            interpolation: bicubic
+            backend: pil
+        - CropImage:
+            size: 384
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
@@ -92,7 +122,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -107,7 +137,11 @@ Infer:
        to_rgb: True
        channel_first: False
     - ResizeImage:
-        size: [384, 384]
+        resize_short: 438
+        interpolation: bicubic
+        backend: pil
+    - CropImage:
+        size: 384
     - NormalizeImage:
         scale: 1.0/255.0
         mean: [0.485, 0.456, 0.406]
@@ -120,9 +154,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
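Note: the eval/infer pipeline in this 384-input config no longer resizes directly to 384x384; it resizes the short side to 438 and then center-crops to 384, which presumably preserves the familiar 256/224 resize-to-crop ratio at the larger resolution:

```python
# 438 is (approximately) 384 scaled by the standard 256/224 ratio
print(384 * 256 / 224)  # 438.857..., rounded to the configured resize_short: 438
```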
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +111,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +122,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +138,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +154,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
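Note: with `scale: 1.0/255.0` and the ImageNet statistics used throughout these configs, a raw red-channel value of 128 normalizes as follows (a worked example, not repo code):

```python
x = 128 * (1.0 / 255.0)   # scale to [0, 1] -> 0.50196
x = (x - 0.485) / 0.229   # subtract mean, divide by std -> 0.07406
```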
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +111,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +122,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +138,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +154,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -24,24 +24,28 @@ Arch:
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 20
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +61,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +111,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +122,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +138,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +154,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
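Note: in the `batch_transform_ops` entry these configs all add, `OpSampler` draws exactly one batch operator per batch; because the two probabilities sum to 1, every batch gets either Mixup or Cutmix, never a pass-through. A sketch of the sampling step, mirroring the `OpSampler` implementation that appears later in this commit:

```python
import random

ops = {"MixupOperator": 0.5, "CutmixOperator": 0.5}  # probs from the config above
chosen = random.choices(list(ops), weights=list(ops.values()), k=1)[0]
# the chosen operator then transforms the whole batch
```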
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -20,28 +20,34 @@ Global:
 Arch:
   name: alt_gvt_base
   class_num: 1000
+  drop_rate: 0.0
+  drop_path_rate: 0.3
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +63,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +113,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +124,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +140,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +156,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
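Note: the new `drop_rate`/`drop_path_rate` keys enable dropout and stochastic depth; the Twins configs scale `drop_path_rate` with model size (0.2 small, 0.3 base, 0.5 large). A minimal sketch of drop-path at training time, assuming per-sample Bernoulli gating of the residual branch:

```python
import numpy as np

def drop_path(x, drop_prob=0.3, training=True):
    """Zero a residual branch per sample with prob drop_prob; rescale survivors."""
    if not training or drop_prob == 0.:
        return x
    keep = 1.0 - drop_prob
    mask = np.random.binomial(1, keep, size=(x.shape[0],) + (1,) * (x.ndim - 1))
    return x / keep * mask
```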
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -20,28 +20,34 @@ Global:
 Arch:
   name: alt_gvt_large
   class_num: 1000
+  drop_rate: 0.0
+  drop_path_rate: 0.5
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +63,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +113,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +124,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +140,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +156,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -20,28 +20,34 @@ Global:
 Arch:
   name: alt_gvt_small
   class_num: 1000
+  drop_rate: 0.0
+  drop_path_rate: 0.2
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +63,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +113,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +124,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +140,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +156,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -20,28 +20,34 @@ Global:
 Arch:
   name: pcpvt_base
   class_num: 1000
+  drop_rate: 0.0
+  drop_path_rate: 0.3
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +63,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
      name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +113,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +124,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +140,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +156,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -20,28 +20,34 @@ Global:
 Arch:
   name: pcpvt_large
   class_num: 1000
+  drop_rate: 0.0
+  drop_path_rate: 0.5
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +63,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +113,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +124,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +140,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +156,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
@@ -7,7 +7,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 120
+  epochs: 300
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -20,28 +20,34 @@ Global:
 Arch:
   name: pcpvt_small
   class_num: 1000
+  drop_rate: 0.0
+  drop_path_rate: 0.2
 # loss function config for training/eval process
 Loss:
   Train:
-    - CELoss:
+    - MixCELoss:
         weight: 1.0
+        epsilon: 0.1
   Eval:
     - CELoss:
         weight: 1.0
 Optimizer:
-  name: Momentum
-  momentum: 0.9
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
+  one_dim_param_no_weight_decay: True
   lr:
-    name: Piecewise
-    learning_rate: 0.1
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
-  regularizer:
-    name: 'L2'
-    coeff: 0.0001
+    name: Cosine
+    learning_rate: 5e-4
+    eta_min: 1e-5
+    warmup_epoch: 5
+    warmup_start_lr: 1e-6
 # data loader for train and eval
@@ -57,17 +63,39 @@ DataLoader:
             channel_first: False
         - RandCropImage:
             size: 224
+            interpolation: bicubic
+            backend: pil
         - RandFlipImage:
             flip_code: 1
+        - TimmAutoAugment:
+            config_str: rand-m9-mstd0.5-inc1
+            interpolation: bicubic
+            img_size: 224
         - NormalizeImage:
             scale: 1.0/255.0
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
             order: ''
+        - RandomErasing:
+            EPSILON: 0.25
+            sl: 0.02
+            sh: 1.0/3.0
+            r1: 0.3
+            attempt: 10
+            use_log_aspect: True
+            mode: pixel
+      batch_transform_ops:
+        - OpSampler:
+            MixupOperator:
+              alpha: 0.8
+              prob: 0.5
+            CutmixOperator:
+              alpha: 1.0
+              prob: 0.5
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: True
     loader:
@@ -85,6 +113,8 @@ DataLoader:
             channel_first: False
         - ResizeImage:
             resize_short: 256
+            interpolation: bicubic
+            backend: pil
         - CropImage:
             size: 224
         - NormalizeImage:
@@ -94,7 +124,7 @@ DataLoader:
             order: ''
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 128
       drop_last: False
       shuffle: False
     loader:
@@ -110,6 +140,8 @@ Infer:
         channel_first: False
     - ResizeImage:
         resize_short: 256
+        interpolation: bicubic
+        backend: pil
     - CropImage:
         size: 224
     - NormalizeImage:
@@ -124,9 +156,6 @@ Infer:
   class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 Metric:
-  Train:
-    - TopkAcc:
-        topk: [1, 5]
   Eval:
     - TopkAcc:
         topk: [1, 5]
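Note: `no_weight_decay_name` plus `one_dim_param_no_weight_decay: True` exclude normalization/position parameters (and all 1-D tensors such as biases) from the 0.05 AdamW weight decay. A sketch of the intended filtering, assuming each parameter exposes `.name` and `.ndim` (illustrative, not the PaddleClas implementation):

```python
def split_decay_groups(params, skip_names=("norm", "cls_token", "pos_block")):
    decay, no_decay = [], []
    for p in params:
        if p.ndim == 1 or any(s in p.name for s in skip_names):
            no_decay.append(p)  # weight_decay = 0 for these
        else:
            decay.append(p)     # weight_decay = 0.05 from the config
    return decay, no_decay
```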
@@ -54,7 +54,7 @@ Optimizer:
   momentum: 0.9
   lr:
     name: Cosine
-    learning_rate: 0.01
+    learning_rate: 0.04
   regularizer:
     name: 'L2'
     coeff: 0.0001
@@ -84,10 +84,10 @@ DataLoader:
         - RandomErasing:
             EPSILON: 0.5
     sampler:
-      name: DistributedRandomIdentitySampler
+      name: PKSampler
       batch_size: 128
-      num_instances: 2
-      drop_last: False
+      sample_per_id: 2
+      drop_last: True
     loader:
       num_workers: 6
@@ -97,7 +97,7 @@ DataLoader:
     dataset:
       name: LogoDataset
      image_root: "dataset/LogoDet-3K-crop/val/"
-      cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+query.txt"
+      cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+val.txt"
      transform_ops:
        - DecodeImage:
            to_rgb: True
@@ -122,7 +122,7 @@ DataLoader:
     dataset:
       name: LogoDataset
      image_root: "dataset/LogoDet-3K-crop/train/"
-      cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+gallery.txt"
+      cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
      transform_ops:
        - DecodeImage:
            to_rgb: True
......
@@ -54,7 +54,7 @@ Optimizer:
   momentum: 0.9
   lr:
     name: MultiStepDecay
-    learning_rate: 0.01
+    learning_rate: 0.04
     milestones: [30, 60, 70, 80, 90, 100]
     gamma: 0.5
     verbose: False
@@ -90,10 +90,10 @@ DataLoader:
             r1: 0.3
             mean: [0., 0., 0.]
     sampler:
-      name: DistributedRandomIdentitySampler
+      name: PKSampler
       batch_size: 64
-      num_instances: 2
-      drop_last: False
+      sample_per_id: 2
+      drop_last: True
       shuffle: True
     loader:
       num_workers: 4
......
@@ -53,7 +53,7 @@ Optimizer:
   momentum: 0.9
   lr:
     name: Cosine
-    learning_rate: 0.01
+    learning_rate: 0.04
   regularizer:
     name: 'L2'
     coeff: 0.0005
@@ -88,10 +88,10 @@ DataLoader:
             mean: [0., 0., 0.]
     sampler:
-      name: DistributedRandomIdentitySampler
+      name: PKSampler
       batch_size: 128
-      num_instances: 2
-      drop_last: False
+      sample_per_id: 2
+      drop_last: True
       shuffle: True
     loader:
       num_workers: 6
......
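Note: the sampler swap in the three retrieval configs above (DistributedRandomIdentitySampler -> PKSampler) keeps identity-balanced batches and now drops ragged final batches. The batch composition is simple arithmetic:

```python
batch_size, sample_per_id = 128, 2                   # values from the logo config above
identities_per_batch = batch_size // sample_per_id   # P = 64 identities, K = 2 each
```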
# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 10
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference
  use_multilabel: True

# model architecture
Arch:
  name: MobileNetV1
  class_num: 33
  pretrained: True

# loss function config for training/eval process
Loss:
  Train:
    - MultiLabelLoss:
        weight: 1.0
  Eval:
    - MultiLabelLoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.1
  regularizer:
    name: 'L2'
    coeff: 0.00004

# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: MultiLabelDataset
      image_root: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/
      cls_label_path: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/multilabel_train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    dataset:
      name: MultiLabelDataset
      image_root: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/
      cls_label_path: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/multilabel_test_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 256
      drop_last: False
      shuffle: False
    loader:
      num_workers: 4
      use_shared_memory: True

Infer:
  infer_imgs: ./deploy/images/0517_2715693311.jpg
  batch_size: 10
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: MultiLabelTopk
    topk: 5
    class_id_map_file: None

Metric:
  Train:
    - HammingDistance:
    - AccuracyScore:
  Eval:
    - HammingDistance:
    - AccuracyScore:
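Note: in this multilabel config each of the 33 classes is an independent binary decision, and HammingDistance reports the fraction of label bits that disagree. A worked example, assuming sigmoid outputs thresholded at 0.5 as in the MultiLabelTopk postprocess added below:

```python
import numpy as np

probs = np.array([0.9, 0.2, 0.7, 0.4])  # sigmoid outputs for four classes
pred = (probs >= 0.5).astype(int)       # -> [1, 0, 1, 0]
target = np.array([1, 0, 0, 0])
hamming = (pred != target).mean()       # one wrong bit out of four = 0.25
```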
@@ -26,9 +26,12 @@ from ppcls.data.dataloader.common_dataset import create_operators
 from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
 from ppcls.data.dataloader.logo_dataset import LogoDataset
 from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
+from ppcls.data.dataloader.mix_dataset import MixDataset

 # sampler
 from ppcls.data.dataloader.DistributedRandomIdentitySampler import DistributedRandomIdentitySampler
+from ppcls.data.dataloader.pk_sampler import PKSampler
+from ppcls.data.dataloader.mix_sampler import MixSampler

 from ppcls.data import preprocess
 from ppcls.data.preprocess import transform
......
from ppcls.data.dataloader.imagenet_dataset import ImageNetDataset
from ppcls.data.dataloader.multilabel_dataset import MultiLabelDataset
from ppcls.data.dataloader.common_dataset import create_operators
from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
from ppcls.data.dataloader.logo_dataset import LogoDataset
from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data.dataloader.mix_sampler import MixSampler
from ppcls.data.dataloader.pk_sampler import PKSampler
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import os
from paddle.io import Dataset
from .. import dataloader
class MixDataset(Dataset):
    def __init__(self, datasets_config):
        super().__init__()
        self.dataset_list = []
        start_idx = 0
        end_idx = 0
        for config_i in datasets_config:
            dataset_name = config_i.pop('name')
            dataset = getattr(dataloader, dataset_name)(**config_i)
            end_idx += len(dataset)
            self.dataset_list.append([end_idx, start_idx, dataset])
            start_idx = end_idx

        self.length = end_idx

    def __getitem__(self, idx):
        for dataset_i in self.dataset_list:
            if dataset_i[0] > idx:
                dataset_i_idx = idx - dataset_i[1]
                return dataset_i[2][dataset_i_idx]

    def __len__(self):
        return self.length

    def get_dataset_list(self):
        return self.dataset_list
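A hypothetical use of MixDataset (the dataset names and paths below are placeholders; each config dict is consumed via `pop('name')` and dispatched to the matching dataloader class):

```python
datasets_config = [
    {"name": "ImageNetDataset",
     "image_root": "./dataset/a/", "cls_label_path": "./dataset/a/train.txt"},
    {"name": "LogoDataset",
     "image_root": "./dataset/b/", "cls_label_path": "./dataset/b/train.txt"},
]
mix = MixDataset(datasets_config)
# len(mix) is the summed length; mix[i] maps a global index to the
# owning sub-dataset through the [end_idx, start_idx, dataset] records
```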
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from paddle.io import DistributedBatchSampler, Sampler
from ppcls.utils import logger
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data import dataloader
class MixSampler(DistributedBatchSampler):
    def __init__(self, dataset, batch_size, sample_configs, iter_per_epoch):
        super().__init__(dataset, batch_size)
        assert isinstance(dataset,
                          MixDataset), "MixSampler only support MixDataset"
        self.sampler_list = []
        self.batch_size = batch_size
        self.start_list = []
        self.length = iter_per_epoch
        dataset_list = dataset.get_dataset_list()
        batch_size_left = self.batch_size
        self.iter_list = []
        for i, config_i in enumerate(sample_configs):
            self.start_list.append(dataset_list[i][1])
            sample_method = config_i.pop("name")
            ratio_i = config_i.pop("ratio")
            if i < len(sample_configs) - 1:
                batch_size_i = int(self.batch_size * ratio_i)
                batch_size_left -= batch_size_i
            else:
                batch_size_i = batch_size_left
            assert batch_size_i <= len(dataset_list[i][2])
            config_i["batch_size"] = batch_size_i
            if sample_method == "DistributedBatchSampler":
                sampler_i = DistributedBatchSampler(dataset_list[i][2],
                                                    **config_i)
            else:
                sampler_i = getattr(dataloader, sample_method)(
                    dataset_list[i][2], **config_i)
            self.sampler_list.append(sampler_i)
            self.iter_list.append(iter(sampler_i))
            self.length += len(dataset_list[i][2]) * ratio_i
        self.iter_counter = 0

    def __iter__(self):
        while self.iter_counter < self.length:
            batch = []
            for i, iter_i in enumerate(self.iter_list):
                batch_i = next(iter_i, None)
                if batch_i is None:
                    iter_i = iter(self.sampler_list[i])
                    self.iter_list[i] = iter_i
                    batch_i = next(iter_i, None)
                    assert batch_i is not None, "dataset {} return None".format(i)
                batch += [idx + self.start_list[i] for idx in batch_i]
            if len(batch) == self.batch_size:
                self.iter_counter += 1
                yield batch
            else:
                logger.info("Some dataset reaches end")
        self.iter_counter = 0

    def __len__(self):
        return self.length
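Note: MixSampler splits each global batch by the configured `ratio` and gives the remainder to the last sampler, so the per-dataset shares always sum to `batch_size`; every yielded index is shifted by the sub-dataset's `start_idx` so it addresses the concatenated MixDataset. For example:

```python
batch_size, ratios = 128, [0.6, 0.4]
first = int(batch_size * ratios[0])  # int(76.8) = 76 indices from dataset 0
last = batch_size - first            # 52 indices from dataset 1 (the remainder)
```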
@@ -33,7 +33,7 @@ class MultiLabelDataset(CommonDataset):
         with open(self._cls_path) as fd:
             lines = fd.readlines()
             for l in lines:
-                l = l.strip().split(" ")
+                l = l.strip().split("\t")
                 self.images.append(os.path.join(self._img_root, l[0]))
                 labels = l[1].split(',')
@@ -44,13 +44,14 @@ class MultiLabelDataset(CommonDataset):
     def __getitem__(self, idx):
         try:
-            img = cv2.imread(self.images[idx])
-            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+            with open(self.images[idx], 'rb') as f:
+                img = f.read()
             if self._transform_ops:
                 img = transform(img, self._transform_ops)
             img = img.transpose((2, 0, 1))
             label = np.array(self.labels[idx]).astype("float32")
             return (img, label)
         except Exception as ex:
             logger.error("Exception occurred when parsing line: {} with msg: {}".
                          format(self.images[idx], ex))
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from collections import defaultdict
import numpy as np
import random
from paddle.io import DistributedBatchSampler
from ppcls.utils import logger
class PKSampler(DistributedBatchSampler):
    """
    First, randomly sample P identities.
    Then for each identity randomly sample K instances.
    The batch size is therefore P * K, hence the name PKSampler.

    Args:
        dataset (paddle.io.Dataset): list of (img_path, pid, cam_id).
        sample_per_id (int): number of instances per identity in a batch.
        batch_size (int): number of examples in a batch.
        shuffle (bool): whether to shuffle indices order before generating
            batch indices. Default False.
    """

    def __init__(self,
                 dataset,
                 batch_size,
                 sample_per_id,
                 shuffle=True,
                 drop_last=True,
                 sample_method="sample_avg_prob"):
        super().__init__(
            dataset, batch_size, shuffle=shuffle, drop_last=drop_last)
        assert batch_size % sample_per_id == 0, \
            "PKSampler config error: sample_per_id must be a divisor of batch_size."
        assert hasattr(self.dataset,
                       "labels"), "Dataset must have labels attribute."
        self.sample_per_label = sample_per_id
        self.label_dict = defaultdict(list)
        self.sample_method = sample_method
        for idx, label in enumerate(self.dataset.labels):
            self.label_dict[label].append(idx)
        self.label_list = list(self.label_dict)
        assert len(self.label_list) * self.sample_per_label > self.batch_size, \
            "batch size should be smaller than the number of labels times sample_per_id"
        if self.sample_method == "id_avg_prob":
            self.prob_list = np.array([1 / len(self.label_list)] *
                                      len(self.label_list))
        elif self.sample_method == "sample_avg_prob":
            counter = []
            for label_i in self.label_list:
                counter.append(len(self.label_dict[label_i]))
            self.prob_list = np.array(counter) / sum(counter)
        else:
            logger.error(
                "PKSampler only supports id_avg_prob and sample_avg_prob sample methods, "
                "but received {}.".format(self.sample_method))
        if np.abs(self.prob_list.sum() - 1) > 0.00000001:
            self.prob_list[-1] = 1 - sum(self.prob_list[:-1])
            if self.prob_list[-1] > 1 or self.prob_list[-1] < 0:
                logger.error("PKSampler prob list error")
            else:
                logger.info(
                    "PKSampler: sum of prob list not equal to 1, change the last prob"
                )

    def __iter__(self):
        label_per_batch = self.batch_size // self.sample_per_label
        if self.shuffle:
            np.random.RandomState(self.epoch).shuffle(self.label_list)
        for i in range(len(self)):
            batch_index = []
            batch_label_list = np.random.choice(
                self.label_list,
                size=label_per_batch,
                replace=False,
                p=self.prob_list)
            for label_i in batch_label_list:
                label_i_indexes = self.label_dict[label_i]
                if self.sample_per_label <= len(label_i_indexes):
                    batch_index.extend(
                        np.random.choice(
                            label_i_indexes,
                            size=self.sample_per_label,
                            replace=False))
                else:
                    batch_index.extend(
                        np.random.choice(
                            label_i_indexes,
                            size=self.sample_per_label,
                            replace=True))
            if not self.drop_last or len(batch_index) == self.batch_size:
                yield batch_index
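A hypothetical construction of PKSampler (the toy dataset is a stand-in; the sampler only requires a `labels` attribute):

```python
class ToyDataset:
    def __init__(self):
        # four identities with 3, 2, 3 and 2 instances
        self.labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]

    def __len__(self):
        return len(self.labels)

sampler = PKSampler(ToyDataset(), batch_size=4, sample_per_id=2)
# each yielded batch holds P = 2 identities x K = 2 instances = 4 indices
```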
@@ -16,7 +16,7 @@ import importlib
 from . import topk
-from .topk import Topk
+from .topk import Topk, MultiLabelTopk


 def build_postprocess(config):
......
@@ -45,15 +45,17 @@ class Topk(object):
             class_id_map = None
         return class_id_map

-    def __call__(self, x, file_names=None):
+    def __call__(self, x, file_names=None, multilabel=False):
         assert isinstance(x, paddle.Tensor)
         if file_names is not None:
             assert x.shape[0] == len(file_names)
-        x = F.softmax(x, axis=-1)
+        x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
         x = x.numpy()
         y = []
         for idx, probs in enumerate(x):
-            index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32")
+            index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
+                "int32") if not multilabel else np.where(
+                    probs >= 0.5)[0].astype("int32")
             clas_id_list = []
             score_list = []
             label_name_list = []
@@ -73,3 +75,11 @@ class Topk(object):
             result["label_names"] = label_name_list
             y.append(result)
         return y
+
+
+class MultiLabelTopk(Topk):
+    def __init__(self, topk=1, class_id_map_file=None):
+        super().__init__(topk, class_id_map_file)
+
+    def __call__(self, x, file_names=None):
+        return super().__call__(x, file_names, multilabel=True)
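Note: the multilabel path above swaps softmax + top-k for sigmoid + a fixed 0.5 threshold. Independent of PaddleClas, the decode step reduces to:

```python
import numpy as np

logits = np.array([2.2, -1.0, 0.3])
probs = 1 / (1 + np.exp(-logits))     # sigmoid instead of softmax
pred_ids = np.where(probs >= 0.5)[0]  # -> array([0, 2])
```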
@@ -14,6 +14,7 @@
 from ppcls.data.preprocess.ops.autoaugment import ImageNetPolicy as RawImageNetPolicy
 from ppcls.data.preprocess.ops.randaugment import RandAugment as RawRandAugment
+from ppcls.data.preprocess.ops.timm_autoaugment import RawTimmAutoAugment
 from ppcls.data.preprocess.ops.cutout import Cutout
 from ppcls.data.preprocess.ops.hide_and_seek import HideAndSeek
@@ -29,9 +30,8 @@ from ppcls.data.preprocess.ops.operators import NormalizeImage
 from ppcls.data.preprocess.ops.operators import ToCHWImage
 from ppcls.data.preprocess.ops.operators import AugMix
-from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, FmixOperator
+from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, FmixOperator
-import six
 import numpy as np
 from PIL import Image
@@ -45,21 +45,16 @@ def transform(data, ops=[]):
 class AutoAugment(RawImageNetPolicy):
     """ ImageNetPolicy wrapper to auto fit different img types """

     def __init__(self, *args, **kwargs):
-        if six.PY2:
-            super(AutoAugment, self).__init__(*args, **kwargs)
-        else:
-            super().__init__(*args, **kwargs)
+        super().__init__(*args, **kwargs)

     def __call__(self, img):
         if not isinstance(img, Image.Image):
             img = np.ascontiguousarray(img)
             img = Image.fromarray(img)
-        if six.PY2:
-            img = super(AutoAugment, self).__call__(img)
-        else:
-            img = super().__call__(img)
+        img = super().__call__(img)
         if isinstance(img, Image.Image):
             img = np.asarray(img)
@@ -69,21 +64,35 @@ class AutoAugment(RawImageNetPolicy):
 class RandAugment(RawRandAugment):
     """ RandAugment wrapper to auto fit different img types """

     def __init__(self, *args, **kwargs):
-        if six.PY2:
-            super(RandAugment, self).__init__(*args, **kwargs)
-        else:
-            super().__init__(*args, **kwargs)
+        super().__init__(*args, **kwargs)

     def __call__(self, img):
         if not isinstance(img, Image.Image):
             img = np.ascontiguousarray(img)
             img = Image.fromarray(img)
-        if six.PY2:
-            img = super(RandAugment, self).__call__(img)
-        else:
-            img = super().__call__(img)
+        img = super().__call__(img)
         if isinstance(img, Image.Image):
             img = np.asarray(img)
         return img
+
+
+class TimmAutoAugment(RawTimmAutoAugment):
+    """ TimmAutoAugment wrapper to auto fit different img types. """
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    def __call__(self, img):
+        if not isinstance(img, Image.Image):
+            img = np.ascontiguousarray(img)
+            img = Image.fromarray(img)
+        img = super().__call__(img)
+        if isinstance(img, Image.Image):
+            img = np.asarray(img)
+        return img
......
...@@ -16,13 +16,17 @@ from __future__ import absolute_import ...@@ -16,13 +16,17 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from __future__ import unicode_literals from __future__ import unicode_literals
import random
import numpy as np import numpy as np
from ppcls.utils import logger
from ppcls.data.preprocess.ops.fmix import sample_mask from ppcls.data.preprocess.ops.fmix import sample_mask
class BatchOperator(object): class BatchOperator(object):
""" BatchOperator """ """ BatchOperator """
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
pass pass
...@@ -46,9 +50,20 @@ class BatchOperator(object): ...@@ -46,9 +50,20 @@ class BatchOperator(object):
class MixupOperator(BatchOperator): class MixupOperator(BatchOperator):
""" Mixup operator """ """ Mixup operator """
def __init__(self, alpha=0.2):
assert alpha > 0., \ def __init__(self, alpha: float=1.):
'parameter alpha[%f] should > 0.0' % (alpha) """Build Mixup operator
Args:
alpha (float, optional): The parameter alpha of mixup. Defaults to 1..
Raises:
Exception: The value of parameter is illegal.
"""
if alpha <= 0:
raise Exception(
f"Parameter \"alpha\" of Mixup should be greater than 0. \"alpha\": {alpha}."
)
self._alpha = alpha self._alpha = alpha
def __call__(self, batch): def __call__(self, batch):
...@@ -62,9 +77,20 @@ class MixupOperator(BatchOperator): ...@@ -62,9 +77,20 @@ class MixupOperator(BatchOperator):
class CutmixOperator(BatchOperator): class CutmixOperator(BatchOperator):
""" Cutmix operator """ """ Cutmix operator """
def __init__(self, alpha=0.2): def __init__(self, alpha=0.2):
assert alpha > 0., \ """Build Cutmix operator
'parameter alpha[%f] should > 0.0' % (alpha)
Args:
alpha (float, optional): The parameter alpha of cutmix. Defaults to 0.2.
Raises:
Exception: The value of parameter is illegal.
"""
if alpha <= 0:
raise Exception(
f"Parameter \"alpha\" of Cutmix should be greater than 0. \"alpha\": {alpha}."
)
self._alpha = alpha self._alpha = alpha
    def _rand_bbox(self, size, lam):
...@@ -72,8 +98,8 @@ class CutmixOperator(BatchOperator):
        w = size[2]
        h = size[3]
        cut_rat = np.sqrt(1. - lam)
        cut_w = int(w * cut_rat)
        cut_h = int(h * cut_rat)

        # uniform
        cx = np.random.randint(w)
...@@ -101,6 +127,7 @@ class CutmixOperator(BatchOperator):
class FmixOperator(BatchOperator):
    """ Fmix operator """

    def __init__(self, alpha=1, decay_power=3, max_soft=0., reformulate=False):
        self._alpha = alpha
        self._decay_power = decay_power
...@@ -115,3 +142,42 @@ class FmixOperator(BatchOperator):
            size, self._max_soft, self._reformulate)
        imgs = mask * imgs + (1 - mask) * imgs[idx]
        return list(zip(imgs, labels, labels[idx], [lam] * bs))
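Both Mixup and Cutmix reject alpha <= 0 above because the mixing coefficient is drawn from a Beta(alpha, alpha) distribution, which is undefined otherwise. A minimal sketch of the idea (hypothetical helper, not the PaddleClas implementation):

import numpy as np

def mixup_pair(x_a, x_b, alpha=0.2):
    # lam ~ Beta(alpha, alpha); small alpha favors lam near 0 or 1,
    # alpha = 1 makes lam uniform on [0, 1]
    lam = np.random.beta(alpha, alpha)
    return lam * x_a + (1 - lam) * x_b, lam  # the loss is blended with the same lam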
class OpSampler(object):
    """ Sample an operator from a list of candidate operators according to "prob". """

    def __init__(self, **op_dict):
        """Build OpSampler

        Raises:
            Exception: The parameter "prob" of the operator(s) is set incorrectly.
        """
        if len(op_dict) < 1:
            msg = "ConfigWarning: No operator in \"OpSampler\". \"OpSampler\" has been skipped."
            logger.warning(msg)

        self.ops = {}
        total_prob = 0
        for op_name in op_dict:
            param = op_dict[op_name]
            if "prob" not in param:
                msg = f"ConfigWarning: Parameter \"prob\" should be set when using an operator in \"OpSampler\". The operator \"{op_name}\"'s prob has been set \"0\"."
                logger.warning(msg)
            prob = param.pop("prob", 0)
            total_prob += prob
            op = eval(op_name)(**param)
            self.ops.update({op: prob})

        if total_prob > 1:
            msg = "ConfigError: The total prob of operators in \"OpSampler\" should not exceed 1."
            logger.error(msg)
            raise Exception(msg)

        # add "None Op" when total_prob < 1; the "None Op" does nothing
        self.ops[None] = 1 - total_prob

    def __call__(self, batch):
        op = random.choices(
            list(self.ops.keys()), weights=list(self.ops.values()), k=1)[0]
        # return the batch directly for the "None Op"
        return op(batch) if op else batch
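A usage sketch for OpSampler; the operator names match the classes above, but the alpha/prob values are illustrative:

# Apply Mixup to 50% of batches, Cutmix to 30%, and leave 20% untouched
# (the remaining probability mass goes to the built-in "None Op").
sampler = OpSampler(
    MixupOperator={"alpha": 0.8, "prob": 0.5},
    CutmixOperator={"alpha": 1.0, "prob": 0.3})
# mixed_batch = sampler(batch)  # samples exactly one operator per batch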
...@@ -19,15 +19,62 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

from functools import partial

import six
import math
import random
import cv2
import numpy as np
from PIL import Image
from paddle.vision.transforms import ColorJitter as RawColorJitter

from .autoaugment import ImageNetPolicy
from .functional import augmentations
from ppcls.utils import logger
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif not interpolation:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
            logger.warning(
                f"The backend of Resize only supports \"cv2\" or \"PIL\". \"{backend}\" is unavailable. Use \"cv2\" instead."
            )
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
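A quick sketch of how UnifiedResize is called; the input image below is a placeholder:

import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)  # HWC uint8 placeholder
cv2_resize = UnifiedResize(interpolation="bilinear", backend="cv2")
pil_resize = UnifiedResize(interpolation="bilinear", backend="pil")
# both backends take a (width, height) target and return a NumPy array
out_a = cv2_resize(img, (224, 224))
out_b = pil_resize(img, (224, 224))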
class OperatorParamError(ValueError):
...@@ -67,8 +114,11 @@ class DecodeImage(object):
class ResizeImage(object):
    """ resize image """

    def __init__(self,
                 size=None,
                 resize_short=None,
                 interpolation=None,
                 backend="cv2"):
        if resize_short is not None and resize_short > 0:
            self.resize_short = resize_short
            self.w = None
...@@ -81,6 +131,9 @@ class ResizeImage(object):
            raise OperatorParamError("invalid params for ResizeImage for '\
                'both 'size' and 'resize_short' are None")

        self._resize_func = UnifiedResize(
            interpolation=interpolation, backend=backend)

    def __call__(self, img):
        img_h, img_w = img.shape[:2]
        if self.resize_short is not None:
...@@ -90,10 +143,7 @@ class ResizeImage(object):
        else:
            w = self.w
            h = self.h
        return self._resize_func(img, (w, h))
class CropImage(object):
...@@ -119,9 +169,12 @@ class CropImage(object):
class RandCropImage(object):
    """ random crop image """

    def __init__(self,
                 size,
                 scale=None,
                 ratio=None,
                 interpolation=None,
                 backend="cv2"):
        if type(size) is int:
            self.size = (size, size)  # (h, w)
        else:
...@@ -130,6 +183,9 @@ class RandCropImage(object):
        self.scale = [0.08, 1.0] if scale is None else scale
        self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio

        self._resize_func = UnifiedResize(
            interpolation=interpolation, backend=backend)

    def __call__(self, img):
        size = self.size
        scale = self.scale
...@@ -155,10 +211,8 @@ class RandCropImage(object):
        j = random.randint(0, img_h - h)

        img = img[j:j + h, i:i + w, :]

        return self._resize_func(img, size)
class RandFlipImage(object):
...@@ -313,3 +367,20 @@ class AugMix(object):
        mixed = (1 - m) * image + m * mix
        return mixed.astype(np.uint8)
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
...@@ -12,7 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# This code is adapted from https://github.com/zhunzhong07/Random-Erasing, and refers to Timm.

from functools import partial

import math
import random

...@@ -20,36 +22,69 @@ import random
import numpy as np


class Pixels(object):
    def __init__(self, mode="const", mean=[0., 0., 0.]):
        self._mode = mode
        self._mean = mean

    def __call__(self, h=224, w=224, c=3):
        if self._mode == "rand":
            return np.random.normal(size=(1, 1, 3))
        elif self._mode == "pixel":
            return np.random.normal(size=(h, w, c))
        elif self._mode == "const":
            return self._mean
        else:
            raise Exception(
                "Invalid mode in RandomErasing, only support \"const\", \"rand\", \"pixel\""
            )


class RandomErasing(object):
    """RandomErasing.
    """

    def __init__(self,
                 EPSILON=0.5,
                 sl=0.02,
                 sh=0.4,
                 r1=0.3,
                 mean=[0., 0., 0.],
                 attempt=100,
                 use_log_aspect=False,
                 mode='const'):
        self.EPSILON = eval(EPSILON) if isinstance(EPSILON, str) else EPSILON
        self.sl = eval(sl) if isinstance(sl, str) else sl
        self.sh = eval(sh) if isinstance(sh, str) else sh
        r1 = eval(r1) if isinstance(r1, str) else r1
        self.r1 = (math.log(r1), math.log(1 / r1)) if use_log_aspect else (
            r1, 1 / r1)
        self.use_log_aspect = use_log_aspect
        self.attempt = attempt
        self.get_pixels = Pixels(mode, mean)

    def __call__(self, img):
        if random.random() > self.EPSILON:
            return img

        for _ in range(self.attempt):
            area = img.shape[0] * img.shape[1]

            target_area = random.uniform(self.sl, self.sh) * area
            aspect_ratio = random.uniform(*self.r1)
            if self.use_log_aspect:
                aspect_ratio = math.exp(aspect_ratio)

            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))

            if w < img.shape[1] and h < img.shape[0]:
                pixels = self.get_pixels(h, w, img.shape[2])
                x1 = random.randint(0, img.shape[0] - h)
                y1 = random.randint(0, img.shape[1] - w)
                if img.shape[2] == 3:
                    img[x1:x1 + h, y1:y1 + w, :] = pixels
                else:
                    img[x1:x1 + h, y1:y1 + w, 0] = pixels[0]
                return img

        return img
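A short usage sketch of the extended RandomErasing; the parameter values are illustrative:

import numpy as np

# "pixel" fills the erased box with per-pixel Gaussian noise, "rand" with a
# single random color, and "const" with the given mean.
eraser = RandomErasing(EPSILON=0.25, sl=0.02, sh=1 / 3, r1=0.3,
                       mode="pixel", use_log_aspect=True)
img = np.random.rand(224, 224, 3).astype(np.float32)
img = eraser(img)  # erases a random box with probability EPSILON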
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This code is borrowed from Timm: https://github.com/rwightman/pytorch-image-models.
hacked together by / Copyright 2020 Ross Wightman
"""
import random
import math
import re
from PIL import Image, ImageOps, ImageEnhance, ImageChops
import PIL
import numpy as np
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
_PIL_VER = tuple([int(x) for x in PIL.__version__.split('.')[:2]])
_FILL = (128, 128, 128)
# This signifies the max integer that the controller RNN could predict for the
# augmentation scheme.
_MAX_LEVEL = 10.
_HPARAMS_DEFAULT = dict(
translate_const=250,
img_mean=_FILL, )
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
def _pil_interp(method):
if method == 'bicubic':
return Image.BICUBIC
elif method == 'lanczos':
return Image.LANCZOS
elif method == 'hamming':
return Image.HAMMING
else:
# default bilinear, do we want to allow nearest?
return Image.BILINEAR
def _interpolation(kwargs):
interpolation = kwargs.pop('resample', Image.BILINEAR)
if isinstance(interpolation, (list, tuple)):
return random.choice(interpolation)
else:
return interpolation
def _check_args_tf(kwargs):
if 'fillcolor' in kwargs and _PIL_VER < (5, 0):
kwargs.pop('fillcolor')
kwargs['resample'] = _interpolation(kwargs)
def shear_x(img, factor, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, factor, 0, 0, 1, 0),
**kwargs)
def shear_y(img, factor, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, factor, 1, 0),
**kwargs)
def translate_x_rel(img, pct, **kwargs):
pixels = pct * img.size[0]
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
**kwargs)
def translate_y_rel(img, pct, **kwargs):
pixels = pct * img.size[1]
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
**kwargs)
def translate_x_abs(img, pixels, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
**kwargs)
def translate_y_abs(img, pixels, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
**kwargs)
def rotate(img, degrees, **kwargs):
_check_args_tf(kwargs)
if _PIL_VER >= (5, 2):
return img.rotate(degrees, **kwargs)
elif _PIL_VER >= (5, 0):
w, h = img.size
post_trans = (0, 0)
rotn_center = (w / 2.0, h / 2.0)
angle = -math.radians(degrees)
matrix = [
round(math.cos(angle), 15),
round(math.sin(angle), 15),
0.0,
round(-math.sin(angle), 15),
round(math.cos(angle), 15),
0.0,
]
def transform(x, y, matrix):
(a, b, c, d, e, f) = matrix
return a * x + b * y + c, d * x + e * y + f
matrix[2], matrix[5] = transform(-rotn_center[0] - post_trans[0],
-rotn_center[1] - post_trans[1],
matrix)
matrix[2] += rotn_center[0]
matrix[5] += rotn_center[1]
return img.transform(img.size, Image.AFFINE, matrix, **kwargs)
else:
return img.rotate(degrees, resample=kwargs['resample'])
def auto_contrast(img, **__):
return ImageOps.autocontrast(img)
def invert(img, **__):
return ImageOps.invert(img)
def equalize(img, **__):
return ImageOps.equalize(img)
def solarize(img, thresh, **__):
return ImageOps.solarize(img, thresh)
def solarize_add(img, add, thresh=128, **__):
lut = []
for i in range(256):
if i < thresh:
lut.append(min(255, i + add))
else:
lut.append(i)
if img.mode in ("L", "RGB"):
if img.mode == "RGB" and len(lut) == 256:
lut = lut + lut + lut
return img.point(lut)
else:
return img
def posterize(img, bits_to_keep, **__):
if bits_to_keep >= 8:
return img
return ImageOps.posterize(img, bits_to_keep)
def contrast(img, factor, **__):
return ImageEnhance.Contrast(img).enhance(factor)
def color(img, factor, **__):
return ImageEnhance.Color(img).enhance(factor)
def brightness(img, factor, **__):
return ImageEnhance.Brightness(img).enhance(factor)
def sharpness(img, factor, **__):
return ImageEnhance.Sharpness(img).enhance(factor)
def _randomly_negate(v):
"""With 50% prob, negate the value"""
return -v if random.random() > 0.5 else v
def _rotate_level_to_arg(level, _hparams):
# range [-30, 30]
level = (level / _MAX_LEVEL) * 30.
level = _randomly_negate(level)
return level,
def _enhance_level_to_arg(level, _hparams):
# range [0.1, 1.9]
return (level / _MAX_LEVEL) * 1.8 + 0.1,
def _enhance_increasing_level_to_arg(level, _hparams):
# the 'no change' level is 1.0, moving away from that towards 0. or 2.0 increases the enhancement blend
# range [0.1, 1.9]
level = (level / _MAX_LEVEL) * .9
level = 1.0 + _randomly_negate(level)
return level,
def _shear_level_to_arg(level, _hparams):
# range [-0.3, 0.3]
level = (level / _MAX_LEVEL) * 0.3
level = _randomly_negate(level)
return level,
def _translate_abs_level_to_arg(level, hparams):
translate_const = hparams['translate_const']
level = (level / _MAX_LEVEL) * float(translate_const)
level = _randomly_negate(level)
return level,
def _translate_rel_level_to_arg(level, hparams):
# default range [-0.45, 0.45]
translate_pct = hparams.get('translate_pct', 0.45)
level = (level / _MAX_LEVEL) * translate_pct
level = _randomly_negate(level)
return level,
def _posterize_level_to_arg(level, _hparams):
# As per Tensorflow TPU EfficientNet impl
# range [0, 4], 'keep 0 up to 4 MSB of original image'
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 4),
def _posterize_increasing_level_to_arg(level, hparams):
# As per Tensorflow models research and UDA impl
# range [4, 0], 'keep 4 down to 0 MSB of original image',
# intensity/severity of augmentation increases with level
return 4 - _posterize_level_to_arg(level, hparams)[0],
def _posterize_original_level_to_arg(level, _hparams):
# As per original AutoAugment paper description
# range [4, 8], 'keep 4 up to 8 MSB of image'
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 4) + 4,
def _solarize_level_to_arg(level, _hparams):
# range [0, 256]
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 256),
def _solarize_increasing_level_to_arg(level, _hparams):
# range [0, 256]
# intensity/severity of augmentation increases with level
return 256 - _solarize_level_to_arg(level, _hparams)[0],
def _solarize_add_level_to_arg(level, _hparams):
# range [0, 110]
return int((level / _MAX_LEVEL) * 110),
LEVEL_TO_ARG = {
'AutoContrast': None,
'Equalize': None,
'Invert': None,
'Rotate': _rotate_level_to_arg,
# There are several variations of the posterize level scaling in various Tensorflow/Google repositories/papers
'Posterize': _posterize_level_to_arg,
'PosterizeIncreasing': _posterize_increasing_level_to_arg,
'PosterizeOriginal': _posterize_original_level_to_arg,
'Solarize': _solarize_level_to_arg,
'SolarizeIncreasing': _solarize_increasing_level_to_arg,
'SolarizeAdd': _solarize_add_level_to_arg,
'Color': _enhance_level_to_arg,
'ColorIncreasing': _enhance_increasing_level_to_arg,
'Contrast': _enhance_level_to_arg,
'ContrastIncreasing': _enhance_increasing_level_to_arg,
'Brightness': _enhance_level_to_arg,
'BrightnessIncreasing': _enhance_increasing_level_to_arg,
'Sharpness': _enhance_level_to_arg,
'SharpnessIncreasing': _enhance_increasing_level_to_arg,
'ShearX': _shear_level_to_arg,
'ShearY': _shear_level_to_arg,
'TranslateX': _translate_abs_level_to_arg,
'TranslateY': _translate_abs_level_to_arg,
'TranslateXRel': _translate_rel_level_to_arg,
'TranslateYRel': _translate_rel_level_to_arg,
}
NAME_TO_OP = {
'AutoContrast': auto_contrast,
'Equalize': equalize,
'Invert': invert,
'Rotate': rotate,
'Posterize': posterize,
'PosterizeIncreasing': posterize,
'PosterizeOriginal': posterize,
'Solarize': solarize,
'SolarizeIncreasing': solarize,
'SolarizeAdd': solarize_add,
'Color': color,
'ColorIncreasing': color,
'Contrast': contrast,
'ContrastIncreasing': contrast,
'Brightness': brightness,
'BrightnessIncreasing': brightness,
'Sharpness': sharpness,
'SharpnessIncreasing': sharpness,
'ShearX': shear_x,
'ShearY': shear_y,
'TranslateX': translate_x_abs,
'TranslateY': translate_y_abs,
'TranslateXRel': translate_x_rel,
'TranslateYRel': translate_y_rel,
}
class AugmentOp(object):
def __init__(self, name, prob=0.5, magnitude=10, hparams=None):
hparams = hparams or _HPARAMS_DEFAULT
self.aug_fn = NAME_TO_OP[name]
self.level_fn = LEVEL_TO_ARG[name]
self.prob = prob
self.magnitude = magnitude
self.hparams = hparams.copy()
self.kwargs = dict(
fillcolor=hparams['img_mean'] if 'img_mean' in hparams else _FILL,
resample=hparams['interpolation']
if 'interpolation' in hparams else _RANDOM_INTERPOLATION, )
# If magnitude_std is > 0, we introduce some randomness
# in the usually fixed policy and sample magnitude from a normal distribution
# with mean `magnitude` and std-dev of `magnitude_std`.
# NOTE This is my own hack, being tested, not in papers or reference impls.
self.magnitude_std = self.hparams.get('magnitude_std', 0)
def __call__(self, img):
if self.prob < 1.0 and random.random() > self.prob:
return img
magnitude = self.magnitude
if self.magnitude_std and self.magnitude_std > 0:
magnitude = random.gauss(magnitude, self.magnitude_std)
magnitude = min(_MAX_LEVEL, max(0, magnitude)) # clip to valid range
level_args = self.level_fn(
magnitude, self.hparams) if self.level_fn is not None else tuple()
return self.aug_fn(img, *level_args, **self.kwargs)
def auto_augment_policy_v0(hparams):
# ImageNet v0 policy from TPU EfficientNet impl, cannot find a paper reference.
policy = [
[('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
[('Color', 0.4, 9), ('Equalize', 0.6, 3)],
[('Color', 0.4, 1), ('Rotate', 0.6, 8)],
[('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
[('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
[('Color', 0.2, 0), ('Equalize', 0.8, 8)],
[('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
[('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
[('Color', 0.6, 1), ('Equalize', 1.0, 2)],
[('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
[('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
[('Color', 0.4, 7), ('Equalize', 0.6, 0)],
[('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)],
[('Solarize', 0.6, 8), ('Color', 0.6, 9)],
[('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
[('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
[('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
[('ShearY', 0.8, 0), ('Color', 0.6, 4)],
[('Color', 1.0, 0), ('Rotate', 0.6, 2)],
[('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
[('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
[('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
[('Posterize', 0.8, 2), ('Solarize', 0.6, 10)
], # This results in black image with Tpu posterize
[('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
[('Color', 0.8, 6), ('Rotate', 0.4, 5)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_v0r(hparams):
# ImageNet v0 policy from TPU EfficientNet impl, with variation of Posterize used
# in Google research implementation (number of bits discarded increases with magnitude)
policy = [
[('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
[('Color', 0.4, 9), ('Equalize', 0.6, 3)],
[('Color', 0.4, 1), ('Rotate', 0.6, 8)],
[('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
[('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
[('Color', 0.2, 0), ('Equalize', 0.8, 8)],
[('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
[('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
[('Color', 0.6, 1), ('Equalize', 1.0, 2)],
[('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
[('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
[('Color', 0.4, 7), ('Equalize', 0.6, 0)],
[('PosterizeIncreasing', 0.4, 6), ('AutoContrast', 0.4, 7)],
[('Solarize', 0.6, 8), ('Color', 0.6, 9)],
[('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
[('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
[('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
[('ShearY', 0.8, 0), ('Color', 0.6, 4)],
[('Color', 1.0, 0), ('Rotate', 0.6, 2)],
[('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
[('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
[('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
[('PosterizeIncreasing', 0.8, 2), ('Solarize', 0.6, 10)],
[('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
[('Color', 0.8, 6), ('Rotate', 0.4, 5)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_original(hparams):
# ImageNet policy from https://arxiv.org/abs/1805.09501
policy = [
[('PosterizeOriginal', 0.4, 8), ('Rotate', 0.6, 9)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
[('PosterizeOriginal', 0.6, 7), ('PosterizeOriginal', 0.6, 6)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
[('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
[('PosterizeOriginal', 0.8, 5), ('Equalize', 1.0, 2)],
[('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
[('Equalize', 0.6, 8), ('PosterizeOriginal', 0.4, 6)],
[('Rotate', 0.8, 8), ('Color', 0.4, 0)],
[('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
[('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Rotate', 0.8, 8), ('Color', 1.0, 2)],
[('Color', 0.8, 8), ('Solarize', 0.8, 7)],
[('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
[('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
[('Color', 0.4, 0), ('Equalize', 0.6, 3)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_originalr(hparams):
# ImageNet policy from https://arxiv.org/abs/1805.09501 with research posterize variation
policy = [
[('PosterizeIncreasing', 0.4, 8), ('Rotate', 0.6, 9)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
[('PosterizeIncreasing', 0.6, 7), ('PosterizeIncreasing', 0.6, 6)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
[('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
[('PosterizeIncreasing', 0.8, 5), ('Equalize', 1.0, 2)],
[('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
[('Equalize', 0.6, 8), ('PosterizeIncreasing', 0.4, 6)],
[('Rotate', 0.8, 8), ('Color', 0.4, 0)],
[('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
[('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Rotate', 0.8, 8), ('Color', 1.0, 2)],
[('Color', 0.8, 8), ('Solarize', 0.8, 7)],
[('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
[('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
[('Color', 0.4, 0), ('Equalize', 0.6, 3)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy(name='v0', hparams=None):
hparams = hparams or _HPARAMS_DEFAULT
if name == 'original':
return auto_augment_policy_original(hparams)
elif name == 'originalr':
return auto_augment_policy_originalr(hparams)
elif name == 'v0':
return auto_augment_policy_v0(hparams)
elif name == 'v0r':
return auto_augment_policy_v0r(hparams)
else:
assert False, 'Unknown AA policy (%s)' % name
class AutoAugment(object):
def __init__(self, policy):
self.policy = policy
def __call__(self, img):
sub_policy = random.choice(self.policy)
for op in sub_policy:
img = op(img)
return img
def auto_augment_transform(config_str, hparams):
"""
Create an AutoAugment transform
:param config_str: String defining configuration of auto augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the AutoAugment policy (one of 'v0', 'v0r', 'original', 'originalr').
The remaining sections, not order specific, determine
'mstd' - float std deviation of magnitude noise applied
Ex 'original-mstd0.5' results in AutoAugment with original policy, magnitude_std 0.5
:param hparams: Other hparams (kwargs) for the AutoAugmentation scheme
:return: A callable Transform Op
"""
config = config_str.split('-')
policy_name = config[0]
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
else:
assert False, 'Unknown AutoAugment config section'
aa_policy = auto_augment_policy(policy_name, hparams=hparams)
return AutoAugment(aa_policy)
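For example, the call below (an illustrative sketch) builds the original-paper policy with magnitude noise of std 0.5; a copy of the default hparams is passed because the function mutates it via setdefault:

aa = auto_augment_transform('original-mstd0.5', dict(_HPARAMS_DEFAULT))
# aug_img = aa(pil_img)  # expects a PIL.Image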
_RAND_TRANSFORMS = [
'AutoContrast',
'Equalize',
'Invert',
'Rotate',
'Posterize',
'Solarize',
'SolarizeAdd',
'Color',
'Contrast',
'Brightness',
'Sharpness',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
#'Cutout' # NOTE I've implemented this as random erasing separately
]
_RAND_INCREASING_TRANSFORMS = [
'AutoContrast',
'Equalize',
'Invert',
'Rotate',
'PosterizeIncreasing',
'SolarizeIncreasing',
'SolarizeAdd',
'ColorIncreasing',
'ContrastIncreasing',
'BrightnessIncreasing',
'SharpnessIncreasing',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
#'Cutout' # NOTE I've implemented this as random erasing separately
]
# These experimental weights are based loosely on the relative improvements mentioned in the paper.
# They may not result in increased performance, but could likely be tuned to do so.
_RAND_CHOICE_WEIGHTS_0 = {
'Rotate': 0.3,
'ShearX': 0.2,
'ShearY': 0.2,
'TranslateXRel': 0.1,
'TranslateYRel': 0.1,
'Color': .025,
'Sharpness': 0.025,
'AutoContrast': 0.025,
'Solarize': .005,
'SolarizeAdd': .005,
'Contrast': .005,
'Brightness': .005,
'Equalize': .005,
'Posterize': 0,
'Invert': 0,
}
def _select_rand_weights(weight_idx=0, transforms=None):
transforms = transforms or _RAND_TRANSFORMS
assert weight_idx == 0 # only one set of weights currently
rand_weights = _RAND_CHOICE_WEIGHTS_0
probs = [rand_weights[k] for k in transforms]
probs /= np.sum(probs)
return probs
def rand_augment_ops(magnitude=10, hparams=None, transforms=None):
hparams = hparams or _HPARAMS_DEFAULT
transforms = transforms or _RAND_TRANSFORMS
return [
AugmentOp(
name, prob=0.5, magnitude=magnitude, hparams=hparams)
for name in transforms
]
class RandAugment(object):
def __init__(self, ops, num_layers=2, choice_weights=None):
self.ops = ops
self.num_layers = num_layers
self.choice_weights = choice_weights
def __call__(self, img):
# no replacement when using weighted choice
ops = np.random.choice(
self.ops,
self.num_layers,
replace=self.choice_weights is None,
p=self.choice_weights)
for op in ops:
img = op(img)
return img
def rand_augment_transform(config_str, hparams):
"""
Create a RandAugment transform
:param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the specific variant of rand augment (currently only 'rand'). The remaining
sections, not order specific, determine
'm' - integer magnitude of rand augment
'n' - integer num layers (number of transform ops selected per image)
'w' - integer probability weight index (index of a set of weights to influence choice of op)
'mstd' - float std deviation of magnitude noise applied
'inc' - integer (bool), use augmentations that increase in severity with magnitude (default: 0)
Ex 'rand-m9-n3-mstd0.5' results in RandAugment with magnitude 9, num_layers 3, magnitude_std 0.5
'rand-mstd1-w0' results in magnitude_std 1.0, weights 0, default magnitude of 10 and num_layers 2
:param hparams: Other hparams (kwargs) for the RandAugmentation scheme
:return: A callable Transform Op
"""
magnitude = _MAX_LEVEL # default to _MAX_LEVEL for magnitude (currently 10)
num_layers = 2 # default to 2 ops per image
weight_idx = None # default to no probability weights for op choice
transforms = _RAND_TRANSFORMS
config = config_str.split('-')
assert config[0] == 'rand'
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
elif key == 'inc':
if bool(val):
transforms = _RAND_INCREASING_TRANSFORMS
elif key == 'm':
magnitude = int(val)
elif key == 'n':
num_layers = int(val)
elif key == 'w':
weight_idx = int(val)
else:
assert False, 'Unknown RandAugment config section'
ra_ops = rand_augment_ops(
magnitude=magnitude, hparams=hparams, transforms=transforms)
choice_weights = None if weight_idx is None else _select_rand_weights(
weight_idx)
return RandAugment(ra_ops, num_layers, choice_weights=choice_weights)
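An illustrative sketch of the config string parsing above:

aa_params = dict(translate_const=100, img_mean=(128, 128, 128))
ra = rand_augment_transform('rand-m9-n2-mstd0.5-inc1', aa_params)
# selects 2 ops per image at magnitude ~9 (Gaussian std 0.5) from the
# "increasing" transform list; each selected op still fires with prob 0.5
# aug_img = ra(pil_img)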
_AUGMIX_TRANSFORMS = [
'AutoContrast',
'ColorIncreasing', # not in paper
'ContrastIncreasing', # not in paper
'BrightnessIncreasing', # not in paper
'SharpnessIncreasing', # not in paper
'Equalize',
'Rotate',
'PosterizeIncreasing',
'SolarizeIncreasing',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
]
def augmix_ops(magnitude=10, hparams=None, transforms=None):
hparams = hparams or _HPARAMS_DEFAULT
transforms = transforms or _AUGMIX_TRANSFORMS
return [
AugmentOp(
name, prob=1.0, magnitude=magnitude, hparams=hparams)
for name in transforms
]
class AugMixAugment(object):
""" AugMix Transform
Adapted and improved from impl here: https://github.com/google-research/augmix/blob/master/imagenet.py
From paper: 'AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty -
https://arxiv.org/abs/1912.02781
"""
def __init__(self, ops, alpha=1., width=3, depth=-1, blended=False):
self.ops = ops
self.alpha = alpha
self.width = width
self.depth = depth
self.blended = blended # blended mode is faster but not well tested
def _calc_blended_weights(self, ws, m):
ws = ws * m
cump = 1.
rws = []
for w in ws[::-1]:
alpha = w / cump
cump *= (1 - alpha)
rws.append(alpha)
return np.array(rws[::-1], dtype=np.float32)
def _apply_blended(self, img, mixing_weights, m):
# This is my first crack and implementing a slightly faster mixed augmentation. Instead
# of accumulating the mix for each chain in a Numpy array and then blending with original,
# it recomputes the blending coefficients and applies one PIL image blend per chain.
# TODO the results appear in the right ballpark but they differ by more than rounding.
img_orig = img.copy()
ws = self._calc_blended_weights(mixing_weights, m)
for w in ws:
depth = self.depth if self.depth > 0 else np.random.randint(1, 4)
ops = np.random.choice(self.ops, depth, replace=True)
img_aug = img_orig # no ops are in-place, deep copy not necessary
for op in ops:
img_aug = op(img_aug)
img = Image.blend(img, img_aug, w)
return img
def _apply_basic(self, img, mixing_weights, m):
# This is a literal adaptation of the paper/official implementation without normalizations and
# PIL <-> Numpy conversions between every op. It is still quite CPU compute heavy compared to the
# typical augmentation transforms, could use a GPU / Kornia implementation.
img_shape = img.size[0], img.size[1], len(img.getbands())
mixed = np.zeros(img_shape, dtype=np.float32)
for mw in mixing_weights:
depth = self.depth if self.depth > 0 else np.random.randint(1, 4)
ops = np.random.choice(self.ops, depth, replace=True)
img_aug = img # no ops are in-place, deep copy not necessary
for op in ops:
img_aug = op(img_aug)
mixed += mw * np.asarray(img_aug, dtype=np.float32)
np.clip(mixed, 0, 255., out=mixed)
mixed = Image.fromarray(mixed.astype(np.uint8))
return Image.blend(img, mixed, m)
def __call__(self, img):
mixing_weights = np.float32(
np.random.dirichlet([self.alpha] * self.width))
m = np.float32(np.random.beta(self.alpha, self.alpha))
if self.blended:
mixed = self._apply_blended(img, mixing_weights, m)
else:
mixed = self._apply_basic(img, mixing_weights, m)
return mixed
def augment_and_mix_transform(config_str, hparams):
""" Create AugMix transform
:param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the specific variant (currently only 'augmix'). The remaining
sections, not order specific, determine
'm' - integer magnitude (severity) of augmentation mix (default: 3)
'w' - integer width of augmentation chain (default: 3)
'd' - integer depth of augmentation chain (-1 is random [1, 3], default: -1)
'b' - integer (bool), blend each branch of chain into end result without a final blend, less CPU (default: 0)
'mstd' - float std deviation of magnitude noise applied (default: 0)
Ex 'augmix-m5-w4-d2' results in AugMix with severity 5, chain width 4, chain depth 2
:param hparams: Other hparams (kwargs) for the Augmentation transforms
:return: A callable Transform Op
"""
magnitude = 3
width = 3
depth = -1
alpha = 1.
blended = False
config = config_str.split('-')
assert config[0] == 'augmix'
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
elif key == 'm':
magnitude = int(val)
elif key == 'w':
width = int(val)
elif key == 'd':
depth = int(val)
elif key == 'a':
alpha = float(val)
elif key == 'b':
blended = bool(val)
else:
assert False, 'Unknown AugMix config section'
ops = augmix_ops(magnitude=magnitude, hparams=hparams)
return AugMixAugment(
ops, alpha=alpha, width=width, depth=depth, blended=blended)
class RawTimmAutoAugment(object):
"""TimmAutoAugment API for PaddleClas."""
def __init__(self,
config_str="rand-m9-mstd0.5-inc1",
interpolation="bicubic",
img_size=224,
mean=IMAGENET_DEFAULT_MEAN):
if isinstance(img_size, (tuple, list)):
img_size_min = min(img_size)
else:
img_size_min = img_size
aa_params = dict(
translate_const=int(img_size_min * 0.45),
img_mean=tuple([min(255, round(255 * x)) for x in mean]), )
if interpolation and interpolation != 'random':
aa_params['interpolation'] = _pil_interp(interpolation)
if config_str.startswith('rand'):
self.augment_func = rand_augment_transform(config_str, aa_params)
elif config_str.startswith('augmix'):
aa_params['translate_pct'] = 0.3
self.augment_func = augment_and_mix_transform(config_str,
aa_params)
elif config_str.startswith('auto'):
self.augment_func = auto_augment_transform(config_str, aa_params)
else:
raise Exception(
"ConfigError: The TimmAutoAugment Op only support RandAugment, AutoAugment, AugMix, and the config_str only starts with \"rand\", \"augmix\", \"auto\"."
)
def __call__(self, img):
return self.augment_func(img)
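A usage sketch of the wrapper; the config string shown is the module's default:

timm_aa = RawTimmAutoAugment(config_str="rand-m9-mstd0.5-inc1",
                             interpolation="bicubic", img_size=224)
# aug_img = timm_aa(pil_img)  # expects a PIL.Image, like the ops above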
...@@ -200,7 +200,7 @@ class Engine(object):
        if self.mode == 'train':
            self.optimizer, self.lr_sch = build_optimizer(
                self.config["Optimizer"], self.config["Global"]["epochs"],
                len(self.train_dataloader), [self.model])

        # for distributed
        self.config["Global"][
...@@ -355,7 +355,8 @@ class Engine(object):
    def export(self):
        assert self.mode == "export"
        use_multilabel = self.config["Global"].get("use_multilabel", False)
        model = ExportModel(self.config["Arch"], self.model, use_multilabel)
        if self.config["Global"]["pretrained_model"] is not None:
            load_dygraph_pretrain(model.base_model,
                                  self.config["Global"]["pretrained_model"])
...@@ -388,10 +389,9 @@ class ExportModel(nn.Layer):
    ExportModel: add softmax onto the model
    """

    def __init__(self, config, model, use_multilabel):
        super().__init__()
        self.base_model = model
        # we should choose a final model to export
        if isinstance(self.base_model, DistillationModel):
            self.infer_model_name = config["infer_model_name"]
...@@ -402,10 +402,13 @@ class ExportModel(nn.Layer):
        if self.infer_output_key == "features" and isinstance(self.base_model,
                                                              RecModel):
            self.base_model.head = IdentityHead()
        if use_multilabel:
            self.out_act = nn.Sigmoid()
        else:
            if config.get("infer_add_softmax", True):
                self.out_act = nn.Softmax(axis=-1)
            else:
                self.out_act = None

    def eval(self):
        self.training = False
...@@ -421,6 +424,6 @@ class ExportModel(nn.Layer):
        x = x[self.infer_model_name]
        if self.infer_output_key is not None:
            x = x[self.infer_output_key]
        if self.out_act is not None:
            x = self.out_act(x)
        return x
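The out_act switch matters for multilabel export: softmax forces class scores to compete and sum to 1, while sigmoid scores each class independently. A small illustration (values are arbitrary):

import paddle
import paddle.nn as nn

logits = paddle.to_tensor([[2.0, 1.5, -3.0]])
print(nn.Softmax(axis=-1)(logits))  # sums to 1: single-label confidences
print(nn.Sigmoid()(logits))         # independent per-class probabilities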
...@@ -22,7 +22,7 @@ from ppcls.utils.misc import AverageMeter
from ppcls.utils import logger


def classification_eval(engine, epoch_id=0):
    output_info = dict()
    time_info = {
        "batch_cost": AverageMeter(
...@@ -30,21 +30,19 @@ def classification_eval(evaler, epoch_id=0):
        "reader_cost": AverageMeter(
            "reader_cost", ".5f", postfix=" s,"),
    }
    print_batch_step = engine.config["Global"]["print_batch_step"]

    metric_key = None
    tic = time.time()
    max_iter = len(engine.eval_dataloader) - 1 if platform.system(
    ) == "Windows" else len(engine.eval_dataloader)
    for iter_id, batch in enumerate(engine.eval_dataloader):
        if iter_id >= max_iter:
            break
        if iter_id == 5:
            for key in time_info:
                time_info[key].reset()
        if engine.use_dali:
            batch = [
                paddle.to_tensor(batch[0]['data']),
                paddle.to_tensor(batch[0]['label'])
...@@ -52,19 +50,20 @@ def classification_eval(evaler, epoch_id=0):
        time_info["reader_cost"].update(time.time() - tic)
        batch_size = batch[0].shape[0]
        batch[0] = paddle.to_tensor(batch[0]).astype("float32")
        if not engine.config["Global"].get("use_multilabel", False):
            batch[1] = batch[1].reshape([-1, 1]).astype("int64")
        # image input
        out = engine.model(batch[0])
        # calc loss
        if engine.eval_loss_func is not None:
            loss_dict = engine.eval_loss_func(out, batch[1])
            for key in loss_dict:
                if key not in output_info:
                    output_info[key] = AverageMeter(key, '7.5f')
                output_info[key].update(loss_dict[key].numpy()[0], batch_size)
        # calc metric
        if engine.eval_metric_func is not None:
            metric_dict = engine.eval_metric_func(out, batch[1])
            if paddle.distributed.get_world_size() > 1:
                for key in metric_dict:
                    paddle.distributed.all_reduce(
...@@ -97,18 +96,18 @@ def classification_eval(evaler, epoch_id=0):
            ])
            logger.info("[Eval][Epoch {}][Iter: {}/{}]{}, {}, {}".format(
                epoch_id, iter_id,
                len(engine.eval_dataloader), metric_msg, time_msg, ips_msg))

            tic = time.time()
    if engine.use_dali:
        engine.eval_dataloader.reset()
    metric_msg = ", ".join([
        "{}: {:.5f}".format(key, output_info[key].avg) for key in output_info
    ])
    logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))

    # do not try to save best eval.model
    if engine.eval_metric_func is None:
        return -1
    # return 1st metric in the dict
    return output_info[metric_key].avg
...@@ -20,21 +20,21 @@ import paddle
from ppcls.utils import logger


def retrieval_eval(engine, epoch_id=0):
    engine.model.eval()
    # step1. build gallery
    if engine.gallery_query_dataloader is not None:
        gallery_feas, gallery_img_id, gallery_unique_id = cal_feature(
            engine, name='gallery_query')
        query_feas, query_img_id, query_query_id = gallery_feas, gallery_img_id, gallery_unique_id
    else:
        gallery_feas, gallery_img_id, gallery_unique_id = cal_feature(
            engine, name='gallery')
        query_feas, query_img_id, query_query_id = cal_feature(
            engine, name='query')

    # step2. do evaluation
    sim_block_size = engine.config["Global"].get("sim_block_size", 64)
    sections = [sim_block_size] * (len(query_feas) // sim_block_size)
    if len(query_feas) % sim_block_size:
        sections.append(len(query_feas) % sim_block_size)
...@@ -45,7 +45,7 @@ def retrieval_eval(evaler, epoch_id=0):
    image_id_blocks = paddle.split(query_img_id, num_or_sections=sections)

    metric_key = None
    if engine.eval_loss_func is None:
        metric_dict = {metric_key: 0.}
    else:
        metric_dict = dict()
...@@ -65,7 +65,7 @@ def retrieval_eval(evaler, epoch_id=0):
            else:
                keep_mask = None

            metric_tmp = engine.eval_metric_func(similarity_matrix,
                                                 image_id_blocks[block_idx],
                                                 gallery_img_id, keep_mask)
...@@ -88,32 +88,31 @@ def retrieval_eval(evaler, epoch_id=0):
    return metric_dict[metric_key]


def cal_feature(engine, name='gallery'):
    all_feas = None
    all_image_id = None
    all_unique_id = None
    has_unique_id = False

    if name == 'gallery':
        dataloader = engine.gallery_dataloader
    elif name == 'query':
        dataloader = engine.query_dataloader
    elif name == 'gallery_query':
        dataloader = engine.gallery_query_dataloader
    else:
        raise RuntimeError("Only support gallery or query dataset")

    max_iter = len(dataloader) - 1 if platform.system() == "Windows" else len(
        dataloader)
    for idx, batch in enumerate(dataloader):  # load is very time-consuming
        if idx >= max_iter:
            break
        if idx % engine.config["Global"]["print_batch_step"] == 0:
            logger.info(
                f"{name} feature calculation process: [{idx}/{len(dataloader)}]"
            )
        if engine.use_dali:
            batch = [
                paddle.to_tensor(batch[0]['data']),
                paddle.to_tensor(batch[0]['label'])
...@@ -123,20 +122,20 @@ def cal_feature(evaler, name='gallery'):
        if len(batch) == 3:
            has_unique_id = True
            batch[2] = batch[2].reshape([-1, 1]).astype("int64")
        out = engine.model(batch[0], batch[1])
        batch_feas = out["features"]

        # do norm
        if engine.config["Global"].get("feature_normalize", True):
            feas_norm = paddle.sqrt(
                paddle.sum(paddle.square(batch_feas), axis=1, keepdim=True))
            batch_feas = paddle.divide(batch_feas, feas_norm)

        # do binarize
        if engine.config["Global"].get("feature_binarize") == "round":
            batch_feas = paddle.round(batch_feas).astype("float32") * 2.0 - 1.0

        if engine.config["Global"].get("feature_binarize") == "sign":
            batch_feas = paddle.sign(batch_feas).astype("float32")

        if all_feas is None:
...@@ -150,8 +149,8 @@ def cal_feature(evaler, name='gallery'):
            if has_unique_id:
                all_unique_id = paddle.concat([all_unique_id, batch[2]])

    if engine.use_dali:
        dataloader.reset()

    if paddle.distributed.get_world_size() > 1:
        feat_list = []
...
...@@ -18,68 +18,66 @@ import paddle
from ppcls.engine.train.utils import update_loss, update_metric, log_info


def train_epoch(engine, epoch_id, print_batch_step):
    tic = time.time()
    for iter_id, batch in enumerate(engine.train_dataloader):
        if iter_id >= engine.max_iter:
            break
        if iter_id == 5:
            for key in engine.time_info:
                engine.time_info[key].reset()
        engine.time_info["reader_cost"].update(time.time() - tic)
        if engine.use_dali:
            batch = [
                paddle.to_tensor(batch[0]['data']),
                paddle.to_tensor(batch[0]['label'])
            ]
        batch_size = batch[0].shape[0]
        if not engine.config["Global"].get("use_multilabel", False):
            batch[1] = batch[1].reshape([-1, 1]).astype("int64")
        engine.global_step += 1

        # image input
        if engine.amp:
            with paddle.amp.auto_cast(custom_black_list={
                    "flatten_contiguous_range", "greater_than"
            }):
                out = forward(engine, batch)
                loss_dict = engine.train_loss_func(out, batch[1])
        else:
            out = forward(engine, batch)
            # calc loss
            if engine.config["DataLoader"]["Train"]["dataset"].get(
                    "batch_transform_ops", None):
                loss_dict = engine.train_loss_func(out, batch[1:])
            else:
                loss_dict = engine.train_loss_func(out, batch[1])

        # step opt and lr
        if engine.amp:
            scaled = engine.scaler.scale(loss_dict["loss"])
            scaled.backward()
            engine.scaler.minimize(engine.optimizer, scaled)
        else:
            loss_dict["loss"].backward()
            engine.optimizer.step()
        engine.optimizer.clear_grad()
        engine.lr_sch.step()

        # below code just for logging
        # update metric_for_logger
        update_metric(engine, out, batch, batch_size)
        # update_loss_for_logger
        update_loss(engine, loss_dict, batch_size)
        engine.time_info["batch_cost"].update(time.time() - tic)
        if iter_id % print_batch_step == 0:
            log_info(engine, batch_size, epoch_id, iter_id)
        tic = time.time()


def forward(engine, batch):
    if not engine.is_rec:
        return engine.model(batch[0])
    else:
        return engine.model(batch[0], batch[1])
...@@ -20,6 +20,7 @@ from .distanceloss import DistanceLoss
from .distillationloss import DistillationCELoss
from .distillationloss import DistillationGTCELoss
from .distillationloss import DistillationDMLLoss
from .multilabelloss import MultiLabelLoss


class CombinedLoss(nn.Layer):
...
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class MultiLabelLoss(nn.Layer):
"""
Multi-label loss
"""
def __init__(self, epsilon=None):
super().__init__()
if epsilon is not None and (epsilon <= 0 or epsilon >= 1):
epsilon = None
self.epsilon = epsilon
def _labelsmoothing(self, target, class_num):
if target.ndim == 1 or target.shape[-1] != class_num:
one_hot_target = F.one_hot(target, class_num)
else:
one_hot_target = target
soft_target = F.label_smooth(one_hot_target, epsilon=self.epsilon)
soft_target = paddle.reshape(soft_target, shape=[-1, class_num])
return soft_target
def _binary_crossentropy(self, input, target, class_num):
    # the two branches only differed in whether the target is smoothed first
    if self.epsilon is not None:
        target = self._labelsmoothing(target, class_num)
    cost = F.binary_cross_entropy_with_logits(logit=input, label=target)
    return cost
def forward(self, x, target):
if isinstance(x, dict):
x = x["logits"]
class_num = x.shape[-1]
loss = self._binary_crossentropy(x, target, class_num)
loss = loss.mean()
return {"MultiLabelLoss": loss}
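A minimal usage sketch of MultiLabelLoss with multi-hot targets (shapes are illustrative):

import paddle

loss_fn = MultiLabelLoss(epsilon=0.1)  # label smoothing enabled
logits = paddle.randn([4, 5])          # batch of 4, 5 classes
target = paddle.randint(0, 2, [4, 5]).astype("float32")  # multi-hot labels
print(loss_fn(logits, target)["MultiLabelLoss"])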
@@ -19,6 +19,8 @@ from collections import OrderedDict
 from .metrics import TopkAcc, mAP, mINP, Recallk, Precisionk
 from .metrics import DistillationTopkAcc
 from .metrics import GoogLeNetTopkAcc
+from .metrics import HammingDistance, AccuracyScore

 class CombinedMetrics(nn.Layer):
     def __init__(self, config_list):
@@ -32,7 +34,8 @@ class CombinedMetrics(nn.Layer):
             metric_name = list(config)[0]
             metric_params = config[metric_name]
             if metric_params is not None:
-                self.metric_func_list.append(eval(metric_name)(**metric_params))
+                self.metric_func_list.append(
+                    eval(metric_name)(**metric_params))
             else:
                 self.metric_func_list.append(eval(metric_name)())
@@ -42,6 +45,7 @@ class CombinedMetrics(nn.Layer):
         metric_dict.update(metric_func(*args, **kwargs))
         return metric_dict

 def build_metrics(config):
     metrics_list = CombinedMetrics(copy.deepcopy(config))
     return metrics_list
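As a sketch, CombinedMetrics can be driven by a config list of single-key dicts, mirroring the Metric section of a PaddleClas YAML; the entries below are illustrative:

# hypothetical metric config: each entry maps a metric class name to its kwargs
metric_config = [
    {"HammingDistance": None},
    {"AccuracyScore": {"base": "label"}},
]
metrics = build_metrics(metric_config)
# calling the combined metric merges each metric's result dict into one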
@@ -15,6 +15,12 @@
 import numpy as np
 import paddle
 import paddle.nn as nn
+import paddle.nn.functional as F
+from sklearn.metrics import hamming_loss
+from sklearn.metrics import accuracy_score as accuracy_metric
+from sklearn.metrics import multilabel_confusion_matrix
+from sklearn.preprocessing import binarize

 class TopkAcc(nn.Layer):
@@ -198,7 +204,7 @@ class Precisionk(nn.Layer):
         equal_flag = paddle.logical_and(equal_flag,
                                         keep_mask.astype('bool'))
         equal_flag = paddle.cast(equal_flag, 'float32')
         Ns = paddle.arange(gallery_img_id.shape[0]) + 1
         equal_flag_cumsum = paddle.cumsum(equal_flag, axis=1)
         Precision_at_k = (paddle.mean(equal_flag_cumsum, axis=0) / Ns).numpy()
@@ -232,3 +238,71 @@ class GoogLeNetTopkAcc(TopkAcc):
     def forward(self, x, label):
         return super().forward(x[0], label)
class MultiLabelMetric(object):
    def __init__(self):
        pass

    def _multi_hot_encode(self, logits, threshold=0.5):
        return binarize(logits, threshold=threshold)

    def __call__(self, output):
        # sigmoid + 0.5 threshold turns raw logits into multi-hot predictions
        output = F.sigmoid(output)
        preds = self._multi_hot_encode(logits=output.numpy(), threshold=0.5)
        return preds


class HammingDistance(MultiLabelMetric):
    """
    Soft, label-wise metric for multilabel classification.
    Returns:
        The smaller the return value is, the better the model is.
    """

    def __init__(self):
        super().__init__()

    def __call__(self, output, target):
        preds = super().__call__(output)
        metric_dict = dict()
        metric_dict["HammingDistance"] = paddle.to_tensor(
            hamming_loss(target, preds))
        return metric_dict
class AccuracyScore(MultiLabelMetric):
    """
    Hard metric for multilabel classification.
    Args:
        base: ["sample", "label"], default="label".
            If "sample", return the sample-based accuracy;
            if "label", return the label-based accuracy.
    Returns:
        accuracy
    """

    def __init__(self, base="label"):
        super().__init__()
        assert base in ["sample",
                        "label"], 'base must be one of ["sample", "label"]'
        self.base = base

    def __call__(self, output, target):
        preds = super().__call__(output)
        metric_dict = dict()
        if self.base == "sample":
            accuracy = accuracy_metric(target, preds)
        elif self.base == "label":
            mcm = multilabel_confusion_matrix(target, preds)
            tns = mcm[:, 0, 0]
            fns = mcm[:, 1, 0]
            tps = mcm[:, 1, 1]
            fps = mcm[:, 0, 1]
            accuracy = (sum(tps) + sum(tns)) / (
                sum(tps) + sum(tns) + sum(fns) + sum(fps))
            # precision/recall/F1 are computed for reference but not returned;
            # F1 is derived from precision and recall
            precision = sum(tps) / (sum(tps) + sum(fps))
            recall = sum(tps) / (sum(tps) + sum(fns))
            F1 = 2 * (precision * recall) / (precision + recall)
        metric_dict["AccuracyScore"] = paddle.to_tensor(accuracy)
        return metric_dict
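A small end-to-end sketch of both metrics, given the classes above; the logits and multi-hot targets are made up, and NumPy arrays are fine for the targets since the sklearn helpers accept them:

import numpy as np
import paddle

# hypothetical outputs: 2 samples, 3 labels
logits = paddle.to_tensor([[2.0, -1.0, 0.5],
                           [-0.5, 1.5, -2.0]])
target = np.array([[1, 0, 1],
                   [0, 1, 0]])

print(HammingDistance()(logits, target))            # fraction of wrong labels
print(AccuracyScore(base="label")(logits, target))  # label-based accuracy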
@@ -41,19 +41,22 @@ def build_lr_scheduler(lr_config, epochs, step_each_epoch):
     return lr

-def build_optimizer(config, epochs, step_each_epoch, parameters=None):
+def build_optimizer(config, epochs, step_each_epoch, model_list):
     config = copy.deepcopy(config)
     # step1 build lr
     lr = build_lr_scheduler(config.pop('lr'), epochs, step_each_epoch)
     logger.debug("build lr ({}) success..".format(lr))
     # step2 build regularization
     if 'regularizer' in config and config['regularizer'] is not None:
+        if 'weight_decay' in config:
+            logger.warning(
+                "ConfigError: Only one of regularizer and weight_decay can be set in Optimizer Config. \"weight_decay\" has been ignored."
+            )
         reg_config = config.pop('regularizer')
         reg_name = reg_config.pop('name') + 'Decay'
         reg = getattr(paddle.regularizer, reg_name)(**reg_config)
-    else:
-        reg = None
-    logger.debug("build regularizer ({}) success..".format(reg))
+        config["weight_decay"] = reg
+        logger.debug("build regularizer ({}) success..".format(reg))
     # step3 build optimizer
     optim_name = config.pop('name')
     if 'clip_norm' in config:
@@ -62,8 +65,7 @@ def build_optimizer(config, epochs, step_each_epoch, model_list):
     else:
         grad_clip = None
     optim = getattr(optimizer, optim_name)(learning_rate=lr,
-                                           weight_decay=reg,
                                            grad_clip=grad_clip,
-                                           **config)(parameters=parameters)
+                                           **config)(model_list=model_list)
     logger.debug("build optimizer ({}) success..".format(optim))
     return optim, lr
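For orientation, a hypothetical config dict that exercises the new path above might look like this; the exact keys of the lr sub-config depend on build_lr_scheduler, which is truncated here, and the model is a stand-in:

import paddle

model = paddle.nn.Linear(8, 2)  # hypothetical stand-in for a real network

# hypothetical config mirroring the Optimizer section of a PaddleClas YAML
optim_config = {
    "name": "Momentum",
    "momentum": 0.9,
    "lr": {"name": "Cosine", "learning_rate": 0.1, "warmup_epoch": 5},
    "regularizer": {"name": "L2", "coeff": 1e-4},
}
# model_list replaces the old `parameters` argument: parameters are collected
# from every model in the list, so backbones and heads can be trained jointly
optim, lr_sch = build_optimizer(
    optim_config, epochs=120, step_each_epoch=100, model_list=[model])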
@@ -11,12 +11,15 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 from __future__ import (absolute_import, division, print_function,
                         unicode_literals)

 from paddle.optimizer import lr
 from paddle.optimizer.lr import LRScheduler

+from ppcls.utils import logger

 class Linear(object):
     """
@@ -26,6 +29,8 @@ class Linear(object):
         epochs(int): The decay step size. It determines the decay cycle.
         end_lr(float, optional): The minimum final learning rate. Default: 0.0001.
         power(float, optional): Power of polynomial. Default: 1.0.
+        warmup_epoch(int): The number of warm-up epochs for LinearWarmup. Default: 0.
+        warmup_start_lr(float): Initial learning rate of the warm-up. Default: 0.0.
         last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
     """
@@ -36,28 +41,35 @@ class Linear(object):
                  end_lr=0.0,
                  power=1.0,
                  warmup_epoch=0,
+                 warmup_start_lr=0.0,
                  last_epoch=-1,
                  **kwargs):
-        super(Linear, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
         self.learning_rate = learning_rate
-        self.epochs = epochs * step_each_epoch
+        self.steps = (epochs - warmup_epoch) * step_each_epoch
         self.end_lr = end_lr
         self.power = power
         self.last_epoch = last_epoch
-        self.warmup_epoch = round(warmup_epoch * step_each_epoch)
+        self.warmup_steps = round(warmup_epoch * step_each_epoch)
+        self.warmup_start_lr = warmup_start_lr

     def __call__(self):
         learning_rate = lr.PolynomialDecay(
             learning_rate=self.learning_rate,
-            decay_steps=self.epochs,
+            decay_steps=self.steps,
             end_lr=self.end_lr,
             power=self.power,
-            last_epoch=self.last_epoch)
-        if self.warmup_epoch > 0:
+            last_epoch=self.last_epoch) if self.steps > 0 else self.learning_rate
+        if self.warmup_steps > 0:
             learning_rate = lr.LinearWarmup(
                 learning_rate=learning_rate,
-                warmup_steps=self.warmup_epoch,
-                start_lr=0.0,
+                warmup_steps=self.warmup_steps,
+                start_lr=self.warmup_start_lr,
                 end_lr=self.learning_rate,
                 last_epoch=self.last_epoch)
         return learning_rate
@@ -71,6 +83,9 @@ class Cosine(object):
         lr(float): initial learning rate
         step_each_epoch(int): steps each epoch
         epochs(int): total training epochs
+        eta_min(float): Minimum learning rate. Default: 0.0.
+        warmup_epoch(int): The number of warm-up epochs for LinearWarmup. Default: 0.
+        warmup_start_lr(float): Initial learning rate of the warm-up. Default: 0.0.
         last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
     """
@@ -78,25 +93,35 @@ class Cosine(object):
                  learning_rate,
                  step_each_epoch,
                  epochs,
+                 eta_min=0.0,
                  warmup_epoch=0,
+                 warmup_start_lr=0.0,
                  last_epoch=-1,
                  **kwargs):
-        super(Cosine, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
         self.learning_rate = learning_rate
-        self.T_max = step_each_epoch * epochs
+        self.T_max = (epochs - warmup_epoch) * step_each_epoch
+        self.eta_min = eta_min
         self.last_epoch = last_epoch
-        self.warmup_epoch = round(warmup_epoch * step_each_epoch)
+        self.warmup_steps = round(warmup_epoch * step_each_epoch)
+        self.warmup_start_lr = warmup_start_lr

     def __call__(self):
         learning_rate = lr.CosineAnnealingDecay(
             learning_rate=self.learning_rate,
             T_max=self.T_max,
-            last_epoch=self.last_epoch)
-        if self.warmup_epoch > 0:
+            eta_min=self.eta_min,
+            last_epoch=self.last_epoch) if self.T_max > 0 else self.learning_rate
+        if self.warmup_steps > 0:
             learning_rate = lr.LinearWarmup(
                 learning_rate=learning_rate,
-                warmup_steps=self.warmup_epoch,
-                start_lr=0.0,
+                warmup_steps=self.warmup_steps,
+                start_lr=self.warmup_start_lr,
                 end_lr=self.learning_rate,
                 last_epoch=self.last_epoch)
         return learning_rate
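A quick sketch of how the rewritten Cosine schedule behaves with warm-up; the values are arbitrary, and note that the cosine phase now spans epochs - warmup_epoch:

# hypothetical: 10 steps/epoch, 5 epochs total, 1 warm-up epoch
sched = Cosine(learning_rate=0.1, step_each_epoch=10, epochs=5,
               warmup_epoch=1, warmup_start_lr=0.001, eta_min=0.0)()
for step in range(3):
    print(step, sched.get_lr())  # ramps linearly from 0.001 toward 0.1
    sched.step()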
@@ -111,6 +136,8 @@ class Step(object):
         step_size (int): the interval to update.
         gamma (float, optional): The ratio by which the learning rate is reduced. ``new_lr = origin_lr * gamma`` .
             It should be less than 1.0. Default: 0.1.
+        warmup_epoch(int): The number of warm-up epochs for LinearWarmup. Default: 0.
+        warmup_start_lr(float): Initial learning rate of the warm-up. Default: 0.0.
         last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
     """
@@ -118,16 +145,23 @@ class Step(object):
                  learning_rate,
                  step_size,
                  step_each_epoch,
+                 epochs,
                  gamma,
                  warmup_epoch=0,
+                 warmup_start_lr=0.0,
                  last_epoch=-1,
                  **kwargs):
-        super(Step, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
         self.step_size = step_each_epoch * step_size
         self.learning_rate = learning_rate
         self.gamma = gamma
         self.last_epoch = last_epoch
-        self.warmup_epoch = round(warmup_epoch * step_each_epoch)
+        self.warmup_steps = round(warmup_epoch * step_each_epoch)
+        self.warmup_start_lr = warmup_start_lr

     def __call__(self):
         learning_rate = lr.StepDecay(
@@ -135,11 +169,11 @@ class Step(object):
             step_size=self.step_size,
             gamma=self.gamma,
             last_epoch=self.last_epoch)
-        if self.warmup_epoch > 0:
+        if self.warmup_steps > 0:
             learning_rate = lr.LinearWarmup(
                 learning_rate=learning_rate,
-                warmup_steps=self.warmup_epoch,
-                start_lr=0.0,
+                warmup_steps=self.warmup_steps,
+                start_lr=self.warmup_start_lr,
                 end_lr=self.learning_rate,
                 last_epoch=self.last_epoch)
         return learning_rate
@@ -152,6 +186,8 @@ class Piecewise(object):
         boundaries(list): A list of step numbers. The type of element in the list is python int.
         values(list): A list of learning rate values that will be picked during different epoch boundaries.
             The type of element in the list is python float.
+        warmup_epoch(int): The number of warm-up epochs for LinearWarmup. Default: 0.
+        warmup_start_lr(float): Initial learning rate of the warm-up. Default: 0.0.
         last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
     """
@@ -159,25 +195,32 @@ class Piecewise(object):
                  step_each_epoch,
                  decay_epochs,
                  values,
+                 epochs,
                  warmup_epoch=0,
+                 warmup_start_lr=0.0,
                  last_epoch=-1,
                  **kwargs):
-        super(Piecewise, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
         self.boundaries = [step_each_epoch * e for e in decay_epochs]
         self.values = values
         self.last_epoch = last_epoch
-        self.warmup_epoch = round(warmup_epoch * step_each_epoch)
+        self.warmup_steps = round(warmup_epoch * step_each_epoch)
+        self.warmup_start_lr = warmup_start_lr

     def __call__(self):
         learning_rate = lr.PiecewiseDecay(
             boundaries=self.boundaries,
             values=self.values,
             last_epoch=self.last_epoch)
-        if self.warmup_epoch > 0:
+        if self.warmup_steps > 0:
             learning_rate = lr.LinearWarmup(
                 learning_rate=learning_rate,
-                warmup_steps=self.warmup_epoch,
-                start_lr=0.0,
+                warmup_steps=self.warmup_steps,
+                start_lr=self.warmup_start_lr,
                 end_lr=self.values[0],
                 last_epoch=self.last_epoch)
         return learning_rate
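As a quick sketch, the Piecewise schedule above can be exercised directly; the epoch counts and learning-rate values here are illustrative only:

# hypothetical: decay at epochs 30 and 60, 100 steps/epoch, 5 warm-up epochs
sched = Piecewise(step_each_epoch=100, decay_epochs=[30, 60],
                  values=[0.1, 0.01, 0.001], epochs=90, warmup_epoch=5)()
print(sched.get_lr())  # starts from warmup_start_lr (0.0 by default)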
@@ -186,7 +229,7 @@ class Piecewise(object):

 class MultiStepDecay(LRScheduler):
     """
     Update the learning rate by ``gamma`` once ``epoch`` reaches one of the milestones.
     The algorithm can be described as the code below.
     .. code-block:: text
         learning_rate = 0.5
         milestones = [30, 50]
@@ -200,15 +243,15 @@ class MultiStepDecay(LRScheduler):
     Args:
         learning_rate (float): The initial learning rate. It is a python float number.
         milestones (tuple|list): List or tuple of each boundaries. Must be increasing.
         gamma (float, optional): The ratio by which the learning rate is reduced. ``new_lr = origin_lr * gamma`` .
             It should be less than 1.0. Default: 0.1.
         last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
         verbose (bool, optional): If ``True``, prints a message to stdout for each update. Default: ``False`` .
     Returns:
         ``MultiStepDecay`` instance to schedule learning rate.
     Examples:
         .. code-block:: python
             import paddle
             import numpy as np
@@ -274,8 +317,7 @@ class MultiStepDecay(LRScheduler):
             raise ValueError('gamma should be < 1.0.')
         self.milestones = [x * step_each_epoch for x in milestones]
         self.gamma = gamma
-        super(MultiStepDecay, self).__init__(learning_rate, last_epoch,
-                                             verbose)
+        super().__init__(learning_rate, last_epoch, verbose)

     def get_lr(self):
         for i in range(len(self.milestones)):
    ...
@@ -35,14 +35,15 @@ class Momentum(object):
                  weight_decay=None,
                  grad_clip=None,
                  multi_precision=False):
-        super(Momentum, self).__init__()
+        super().__init__()
         self.learning_rate = learning_rate
         self.momentum = momentum
         self.weight_decay = weight_decay
         self.grad_clip = grad_clip
         self.multi_precision = multi_precision

-    def __call__(self, parameters):
+    def __call__(self, model_list):
+        parameters = sum([m.parameters() for m in model_list], [])
         opt = optim.Momentum(
             learning_rate=self.learning_rate,
             momentum=self.momentum,
@@ -77,7 +78,8 @@ class Adam(object):
         self.lazy_mode = lazy_mode
         self.multi_precision = multi_precision

-    def __call__(self, parameters):
+    def __call__(self, model_list):
+        parameters = sum([m.parameters() for m in model_list], [])
         opt = optim.Adam(
             learning_rate=self.learning_rate,
             beta1=self.beta1,
@@ -112,7 +114,7 @@ class RMSProp(object):
                  weight_decay=None,
                  grad_clip=None,
                  multi_precision=False):
-        super(RMSProp, self).__init__()
+        super().__init__()
         self.learning_rate = learning_rate
         self.momentum = momentum
         self.rho = rho
@@ -120,7 +122,8 @@ class RMSProp(object):
         self.weight_decay = weight_decay
         self.grad_clip = grad_clip

-    def __call__(self, parameters):
+    def __call__(self, model_list):
+        parameters = sum([m.parameters() for m in model_list], [])
         opt = optim.RMSProp(
             learning_rate=self.learning_rate,
             momentum=self.momentum,
@@ -130,3 +133,57 @@ class RMSProp(object):
             grad_clip=self.grad_clip,
             parameters=parameters)
         return opt
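A hedged illustration of the new calling convention: each wrapper now receives a list of nn.Layer models and flattens their parameters itself. The Linear layers below are placeholders for real backbones and heads:

import paddle

backbone = paddle.nn.Linear(16, 8)  # hypothetical stand-ins
head = paddle.nn.Linear(8, 2)

# parameters from every model in the list are optimized jointly
opt = Momentum(learning_rate=0.01, momentum=0.9)(model_list=[backbone, head])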
class AdamW(object):
    def __init__(self,
                 learning_rate=0.001,
                 beta1=0.9,
                 beta2=0.999,
                 epsilon=1e-8,
                 weight_decay=None,
                 multi_precision=False,
                 grad_clip=None,
                 no_weight_decay_name=None,
                 one_dim_param_no_weight_decay=False,
                 **args):
        super().__init__()
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.grad_clip = grad_clip
        self.weight_decay = weight_decay
        self.multi_precision = multi_precision
        self.no_weight_decay_name_list = no_weight_decay_name.split(
        ) if no_weight_decay_name else []
        self.one_dim_param_no_weight_decay = one_dim_param_no_weight_decay

    def __call__(self, model_list):
        parameters = sum([m.parameters() for m in model_list], [])
        # collect parameters whose name path matches any excluded substring
        self.no_weight_decay_param_name_list = [
            p.name for model in model_list for n, p in model.named_parameters()
            if any(nd in n for nd in self.no_weight_decay_name_list)
        ]
        # optionally exclude all 1-D parameters (biases, norm scales)
        if self.one_dim_param_no_weight_decay:
            self.no_weight_decay_param_name_list += [
                p.name for model in model_list
                for n, p in model.named_parameters() if len(p.shape) == 1
            ]
        opt = optim.AdamW(
            learning_rate=self.learning_rate,
            beta1=self.beta1,
            beta2=self.beta2,
            epsilon=self.epsilon,
            parameters=parameters,
            weight_decay=self.weight_decay,
            multi_precision=self.multi_precision,
            grad_clip=self.grad_clip,
            apply_decay_param_fun=self._apply_decay_param_fun)
        return opt

    def _apply_decay_param_fun(self, name):
        return name not in self.no_weight_decay_param_name_list
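A sketch of how the selective weight-decay logic above might be exercised; the model and the name substring here are hypothetical and depend on the actual network's parameter name paths:

import paddle

model = paddle.nn.Sequential(paddle.nn.Linear(16, 8), paddle.nn.LayerNorm(8))

adamw = AdamW(
    learning_rate=1e-3,
    weight_decay=0.05,
    no_weight_decay_name="norm",          # hypothetical substring of name paths
    one_dim_param_no_weight_decay=True)   # biases and norm scales skip decay
opt = adamw(model_list=[model])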
@@ -4,4 +4,4 @@
 # python3.7 tools/train.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml
 # for multi-cards train
-python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml
+python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml
\ No newline at end of file