Merge branch 'PaddlePaddle:develop' into develop

fdb9e34c · Bin Lu · GitHub · 1af89b2c · 5f3d50f7 · fdb9e34c
116 changed files
--- a/.github/ISSUE_TEMPLATE/---clas-issue-.md
+++ b/.github/ISSUE_TEMPLATE/---clas-issue-.md
+---
+name: Issue report
+about: Report a PaddleClas issue
+title: ''
+labels: ''
+assignees: ''
+
+---
+
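+Thank you for using PaddleClas and for reporting issues; we greatly appreciate your contribution to PaddleClas!
+When opening an issue, please provide the following information so that we can locate and resolve your problem quickly:
+ 1. PaddleClas and PaddlePaddle versions: please provide the version numbers or branch names you are using, e.g. PaddleClas release/2.2 and PaddlePaddle 2.1.0
+ 2. Versions of other related products: if you are using other products together with PaddleClas, such as PaddleServing or PaddleInference, please provide their version numbers
+ 3. Training environment:
+  a. Operating system, e.g. Linux/Windows/MacOS
+  b. Python version, e.g. Python3.6/7/8
+  c. CUDA/cuDNN version, e.g. CUDA10.2/cuDNN 7.6.5
+ 4. The complete code (the parts changed relative to the code in the repo), detailed error messages, and relevant logs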
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -4,5 +4,4 @@ include docs/en/whl_en.md
 recursive-include deploy/python predict_cls.py preprocess.py postprocess.py det_preprocess.py
 recursive-include deploy/utils get_image_list.py config.py logger.py predictor.py

-include ppcls/arch/backbone/__init__.py
-recursive-include ppcls/arch/backbone/ *.py
\ No newline at end of file
+recursive-include ppcls/ *.py *.txt
\ No newline at end of file
--- a/README_ch.md
+++ b/README_ch.md
@@ -10,7 +10,7 @@

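 - 2021.06.29 Added the Swin-transformer series of models, with top-1 accuracy on the ImageNet1k dataset reaching up to 87.2%; training, prediction, evaluation, and whl package deployment are supported. Pretrained models can be downloaded [here](docs/zh_CN/models/models_intro.md).
 - 2021.06.22,23,24 The official PaddleClas R&D team presented a three-day live course of in-depth technical explanations. Course replay: [https://aistudio.baidu.com/aistudio/course/introduce/24519](https://aistudio.baidu.com/aistudio/course/introduce/24519)
- 2021.06.16 PaddleClas v2.2 released, integrating metric learning, vector search, and other components. Added 4 image recognition applications: product recognition, animation character recognition, vehicle recognition, and logo recognition. Added 24 pretrained models in the LeViT, TNT, DLA, HarDNet, and RedNet series.
+- 2021.06.16 PaddleClas v2.2 released, integrating metric learning, vector search, and other components. Added 4 image recognition applications: product recognition, animation character recognition, vehicle recognition, and logo recognition. Added 30 pretrained models in the LeViT, Twins, TNT, DLA, HarDNet, and RedNet series.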
 - [more](./docs/zh_CN/update_history.md)

 ## Features
@@ -18,7 +18,7 @@
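 - A practical image recognition system: integrates object detection, feature learning, image retrieval, and other modules, and is widely applicable to all kinds of image recognition tasks.
 Four scenario application examples are provided: product recognition, vehicle recognition, logo recognition, and animation character recognition.

- A rich library of pretrained models: provides 150 ImageNet pretrained models in 33 series, of which 6 selected series support fast structural modification.
+- A rich library of pretrained models: provides 164 ImageNet pretrained models in 35 series, of which 6 selected series support fast structural modification.

 - Comprehensive and easy-to-use feature learning components: integrates 12 metric learning methods such as arcmargin and triplet loss, which can be freely combined and switched through configuration files.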


--- a/README_en.md
+++ b/README_en.md
@@ -9,7 +9,7 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
 **Recent updates**

 - 2021.06.29 Add Swin-transformer series models. The highest top1 acc on the ImageNet1k dataset reaches 87.2%; training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
- 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 24 pretrained models of LeViT, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
+- 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
 - [more](./docs/en/update_history_en.md)

 ## Features
@@ -17,7 +17,7 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
 - A practical image recognition system consisting of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
 Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.

- Rich library of pre-trained models: Provide a total of 150 ImageNet pre-trained models in 33 series, among which 6 selected series of models support fast structural modification.
+- Rich library of pre-trained models: Provide a total of 164 ImageNet pre-trained models in 35 series, among which 6 selected series of models support fast structural modification.

 - Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.

@@ -51,7 +51,7 @@ Quick experience of image recognition：[Link](./docs/en/tutorials/quick_start_r
 - [Introduction to Image Recognition Systems](#Introduction_to_Image_Recognition_Systems)
 - [Demo images](#Demo_images)
 - Algorithms Introduction
-    - [Backbone Network and Pre-trained Model Library](./docs/en/ImageNet_models.md)
+    - [Backbone Network and Pre-trained Model Library](./docs/en/ImageNet_models_en.md)
    - [Mainbody Detection](./docs/en/application/mainbody_detection_en.md)
    - [Image Classification](./docs/en/tutorials/image_classification_en.md)
    - [Feature Learning](./docs/en/application/feature_learning_en.md)

--- a/aa.txt
+++ b/aa.txt
-aaa
--- a/deploy/configs/build_cartoon.yaml
+++ b/deploy/configs/build_cartoon.yaml
@@ -2,8 +2,8 @@ Global:
  rec_inference_model_dir: "./models/cartoon_rec_ResNet50_iCartoon_v1.0_infer/"
  batch_size: 32
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/build_logo.yaml
+++ b/deploy/configs/build_logo.yaml
@@ -2,8 +2,8 @@ Global:
  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/"
  batch_size: 32
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/build_product.yaml
+++ b/deploy/configs/build_product.yaml
@@ -2,8 +2,8 @@ Global:
  rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer"
  batch_size: 32
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/build_vehicle.yaml
+++ b/deploy/configs/build_vehicle.yaml
@@ -2,8 +2,8 @@ Global:
  rec_inference_model_dir: "./models/vehicle_cls_ResNet50_CompCars_v1.0_infer/"
  batch_size: 32
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/inference_cartoon.yaml
+++ b/deploy/configs/inference_cartoon.yaml
@@ -12,8 +12,8 @@ Global:
  - foreground

  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/inference_cls.yaml
+++ b/deploy/configs/inference_cls.yaml
@@ -3,8 +3,8 @@ Global:
  inference_model_dir: "./models"
  batch_size: 1
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True
@@ -22,8 +22,12 @@ PreProcess:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
+        channel_num: 3
    - ToCHWImage:
 PostProcess:
-  name: Topk
+  main_indicator: Topk
+  Topk:
    topk: 5
    class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
+  SavePreLabel:
+    save_dir: ./pre_label/
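
The reworked `PostProcess` section above names several post-processors (`Topk`, `SavePreLabel`) and uses `main_indicator` to pick the one whose output is returned. A minimal sketch of that chaining idea follows, assuming toy stand-in classes (`TopkSketch`, `SavePreLabelSketch`) and a hypothetical `build_chain` helper; none of these names come from PaddleClas, and the real classes do more work (label lookup, file copying).

```python
import copy


class TopkSketch:
    """Toy stand-in: returns the indices of the k largest scores per sample."""

    def __init__(self, topk=1, class_id_map_file=None):
        self.topk = topk

    def __call__(self, scores, file_names=None):
        return [
            sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:self.topk]
            for row in scores
        ]


class SavePreLabelSketch:
    """Toy stand-in: records (file, top-1 id) pairs instead of copying files."""

    def __init__(self, save_dir):
        self.save_dir = save_dir
        self.saved = []

    def __call__(self, scores, file_names=None):
        if file_names is None:
            return None
        for row, name in zip(scores, file_names):
            top1 = max(range(len(row)), key=lambda i: row[i])
            self.saved.append((name, top1))
        return None


# Hypothetical registry; the diff resolves names with getattr() on the module.
REGISTRY = {"Topk": TopkSketch, "SavePreLabel": SavePreLabelSketch}


def build_chain(config):
    """Build every configured post-processor; return a runner and the instances."""
    config = copy.deepcopy(config)
    main_indicator = config.pop("main_indicator", None) or ""
    funcs = [(name, REGISTRY[name](**kwargs)) for name, kwargs in config.items()]

    def run(scores, file_names=None):
        result = None
        for name, func in funcs:
            out = func(scores, file_names)
            if name == main_indicator:
                result = out  # only the main indicator's output is returned
        return result

    return run, dict(funcs)


config = {
    "main_indicator": "Topk",
    "Topk": {"topk": 2, "class_id_map_file": None},
    "SavePreLabel": {"save_dir": "./pre_label/"},
}
run, ops = build_chain(config)
top2 = run([[0.1, 0.7, 0.2]], ["a.jpg"])
```

Every configured post-processor runs on each call, so side effects such as saving pre-labels still happen even though only the `main_indicator` result is returned.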
--- a/deploy/configs/inference_cls_ch4.yaml
+++ b/deploy/configs/inference_cls_ch4.yaml
+Global:
+  infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg"
+  inference_model_dir: "./models"
+  batch_size: 1
+  use_gpu: True
+  enable_mkldnn: True
+  cpu_num_threads: 10
+  enable_benchmark: True
+  use_fp16: False
+  ir_optim: True
+  use_tensorrt: False
+  gpu_mem: 8000
+  enable_profile: False
+PreProcess:
+  transform_ops:
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 0.00392157
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+        channel_num: 4
+    - ToCHWImage:
+PostProcess:
+  main_indicator: Topk
+  Topk:
+    topk: 5
+    class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
+  SavePreLabel:
+    save_dir: ./pre_label/
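
The new `inference_cls_ch4.yaml` differs from `inference_cls.yaml` only in `channel_num: 4`. The diff does not show how `NormalizeImage` consumes this option. One plausible reading, which is an assumption here and not confirmed by the source, is that the three RGB channels are normalized as usual and a zero-filled fourth channel is appended. The sketch below uses a hypothetical `normalize_hwc` function and assumes HWC layout, since `ToCHWImage` runs afterwards in the config.

```python
import numpy as np


def normalize_hwc(img, scale=1.0 / 255.0, mean=(0.485, 0.456, 0.406),
                  std=(0.229, 0.224, 0.225), channel_num=3):
    """Normalize an HWC uint8 RGB image; optionally pad a zero 4th channel.

    The zero-padding for channel_num=4 is an assumed behavior, not taken
    from the diff.
    """
    if channel_num not in (3, 4):
        raise ValueError("channel_num must be 3 or 4")
    mean_arr = np.asarray(mean, dtype=np.float32).reshape(1, 1, 3)
    std_arr = np.asarray(std, dtype=np.float32).reshape(1, 1, 3)
    out = (img.astype(np.float32) * scale - mean_arr) / std_arr
    if channel_num == 4:
        pad = np.zeros(out.shape[:2] + (1,), dtype=np.float32)
        out = np.concatenate([out, pad], axis=2)
    return out


img = np.full((2, 2, 3), 255, dtype=np.uint8)
out3 = normalize_hwc(img)
out4 = normalize_hwc(img, channel_num=4)
```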
--- a/deploy/configs/inference_det.yaml
+++ b/deploy/configs/inference_det.yaml
@@ -10,8 +10,8 @@ Global:

  # inference engine config
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/inference_logo.yaml
+++ b/deploy/configs/inference_logo.yaml
@@ -13,8 +13,8 @@ Global:

  # inference engine config
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/inference_product.yaml
+++ b/deploy/configs/inference_product.yaml
@@ -13,8 +13,8 @@ Global:

  # inference engine config
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/inference_rec.yaml
+++ b/deploy/configs/inference_rec.yaml
@@ -10,8 +10,8 @@ Global:

  # inference engine config
  use_gpu: False
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/configs/inference_vehicle.yaml
+++ b/deploy/configs/inference_vehicle.yaml
@@ -13,8 +13,8 @@ Global:

  # inference engine config
  use_gpu: True
-  enable_mkldnn: False
-  cpu_num_threads: 100
+  enable_mkldnn: True
+  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True

--- a/deploy/hubserving/clas/params.py
+++ b/deploy/hubserving/clas/params.py
@@ -33,8 +33,10 @@ def get_default_confg():
            "enable_benchmark": False
        },
        'PostProcess': {
-            'name': 'Topk',
+            'main_indicator': 'Topk',
+            'Topk': {
                'topk': 5,
                'class_id_map_file': './utils/imagenet1k_label_list.txt'
            }
        }
+    }
--- a/deploy/hubserving/readme.md
+++ b/deploy/hubserving/readme.md
@@ -15,7 +15,7 @@ hubserving/clas/
 ### 1. Prepare the environment
 ```shell
 # Install paddlehub; please install version 2.0
-pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddlehub==2.1.0 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```

 ### 2. Download the inference model
@@ -128,8 +128,12 @@ python hubserving/test_hubserving.py server_url image_path
 `http://[ip_address]:[port]/predict/[module_name]`  
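 - **image_path**: Path of the test image(s); either a single image file or a directory of images.
 - **batch_size**: [**Optional**] Number of images per prediction batch, default `1`.
+- **resize_short**: [**Optional**] During preprocessing, resize so that the short side has this length, default `256`.
+- **crop_size**: [**Optional**] During preprocessing, the size of the center crop, default `224`.
+- **normalize**: [**Optional**] During preprocessing, whether to apply `normalize`, default `True`.
+- **to_chw**: [**Optional**] During preprocessing, whether to convert to `CHW` order, default `True`.

-**Note**: If you use `Transformer` series models such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size. You need to specify `--resize_short=384 --resize=384`.
+**Note**: If you use `Transformer` series models such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size; you need to specify `--resize_short=384 --crop_size=384`.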


 Example request:

--- a/deploy/hubserving/readme_en.md
+++ b/deploy/hubserving/readme_en.md
@@ -15,7 +15,7 @@ hubserving/clas/
 ### 1. Prepare the environment
 ```shell
 # Install version 2.0 of PaddleHub  
-pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddlehub==2.1.0 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```

 ### 2. Download inference model
@@ -126,9 +126,13 @@ Two required parameters need to be passed to the script:
 `http://[ip_address]:[port]/predict/[module_name]`  
 - **image_path**: Test image path, can be a single image path or an image directory path
 - **batch_size**: [**Optional**] batch_size. Default by `1`.
+- **resize_short**: [**Optional**] In preprocessing, resize the image so that its short side has this length. Default is `256`.
+- **crop_size**: [**Optional**] In preprocessing, the center crop size. Default is `224`.
+- **normalize**: [**Optional**] In preprocessing, whether to apply `normalize`. Default is `True`.
+- **to_chw**: [**Optional**] In preprocessing, whether to transpose to `CHW`. Default is `True`.

 **Notice**:
-If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `--resize_short=384`, `--resize=384`.
+If you use `Transformer` series models, such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size; you need to set `--resize_short=384` and `--crop_size=384`.
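
The optional flags above map directly onto the preprocessing operator list that this change builds in `test_hubserving.py`. The sketch below shows that mapping, plus the reason a custom boolean parser is needed: argparse's `type=bool` would treat any non-empty string, including `"False"`, as true. The helper names `parse_preprocess_args` and `build_ops` are illustrative only and do not appear in the repository.

```python
import argparse


def str2bool(value):
    # Same idea as the helper in the diff: only these spellings mean True.
    return value.lower() in ("true", "t", "1")


def parse_preprocess_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--resize_short", type=int, default=256)
    parser.add_argument("--crop_size", type=int, default=224)
    parser.add_argument("--normalize", type=str2bool, default=True)
    parser.add_argument("--to_chw", type=str2bool, default=True)
    return parser.parse_args(argv)


def build_ops(args):
    """Translate the parsed flags into an ordered list of operator configs."""
    ops = [
        {"ResizeImage": {"resize_short": args.resize_short}},
        {"CropImage": {"size": args.crop_size}},
    ]
    if args.normalize:
        ops.append({
            "NormalizeImage": {
                "scale": 0.00392157,
                "mean": [0.485, 0.456, 0.406],
                "std": [0.229, 0.224, 0.225],
                "order": "",
            }
        })
    if args.to_chw:
        ops.append({"ToCHWImage": None})
    return ops


# Settings for a 384-pixel Transformer model, with normalization switched off.
args = parse_preprocess_args(
    ["--resize_short=384", "--crop_size=384", "--normalize=False"])
ops = build_ops(args)
op_names = [next(iter(op)) for op in ops]
```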

 **Eg.**
 ```shell

--- a/deploy/hubserving/test_hubserving.py
+++ b/deploy/hubserving/test_hubserving.py
@@ -32,30 +32,59 @@ from utils import config
 from utils.encode_decode import np_to_b64
 from python.preprocess import create_operators

-preprocess_config = [{
+
+def get_args():
+    def str2bool(v):
+        return v.lower() in ("true", "t", "1")
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--server_url", type=str)
+    parser.add_argument("--image_file", type=str)
+    parser.add_argument("--batch_size", type=int, default=1)
+    parser.add_argument("--resize_short", type=int, default=256)
+    parser.add_argument("--crop_size", type=int, default=224)
+    parser.add_argument("--normalize", type=str2bool, default=True)
+    parser.add_argument("--to_chw", type=str2bool, default=True)
+    return parser.parse_args()
+
+
+class PreprocessConfig(object):
+    def __init__(self,
+                 resize_short=256,
+                 crop_size=224,
+                 normalize=True,
+                 to_chw=True):
+        self.config = [{
            'ResizeImage': {
-        'resize_short': 256
+                'resize_short': resize_short
            }
-}, {
+        }, {
            'CropImage': {
-        'size': 224
+                'size': crop_size
            }
-}, {
+        }]
+        if normalize:
+            self.config.append({
                'NormalizeImage': {
                    'scale': 0.00392157,
                    'mean': [0.485, 0.456, 0.406],
                    'std': [0.229, 0.224, 0.225],
                    'order': ''
                }
-}, {
-    'ToCHWImage': None
-}]
+            })
+        if to_chw:
+            self.config.append({'ToCHWImage': None})
+
+    def __call__(self):
+        return self.config


 def main(args):
    image_path_list = get_image_list(args.image_file)
    headers = {"Content-type": "application/json"}
-    preprocess_ops = create_operators(preprocess_config)
+    preprocess_ops = create_operators(
+        PreprocessConfig(args.resize_short, args.crop_size, args.normalize,
+                         args.to_chw)())

    cnt = 0
    predict_time = 0
@@ -113,14 +142,10 @@ def main(args):

                for number, result_list in enumerate(preds):
                    all_score += result_list["scores"][0]
-                    result_str = ""
-                    for i in range(len(result_list["class_ids"])):
-                        result_str += "{}: {:.2f}\t".format(
-                            result_list["class_ids"][i],
-                            result_list["scores"][i])
-
+                    pred_str = ", ".join(
+                        [f"{k}: {result_list[k]}" for k in result_list])
                    logger.info(
-                        f"File:{img_name_list[number]}, The result(s): {result_str}"
+                        f"File:{img_name_list[number]}, The result(s): {pred_str}"
                    )

            finally:
@@ -136,10 +161,5 @@ def main(args):


 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--server_url", type=str)
-    parser.add_argument("--image_file", type=str)
-    parser.add_argument("--batch_size", type=int, default=1)
-    args = parser.parse_args()
-
+    args = get_args()
    main(args)
--- a/deploy/paddleserving/image_http_client.py
+++ b/deploy/paddleserving/image_http_client.py
@@ -22,10 +22,9 @@ py_version = sys.version_info[0]


 def predict(image_path, server):
-    if py_version == 2:
-        image = base64.b64encode(open(image_path).read())
-    else:
-        image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
+
+    with open(image_path, "rb") as f:
+        image = base64.b64encode(f.read()).decode("utf-8")
    req = json.dumps({"feed": [{"image": image}], "fetch": ["prediction"]})
    r = requests.post(
        server, data=req, headers={"Content-Type": "application/json"})

--- a/deploy/python/build_gallery.py
+++ b/deploy/python/build_gallery.py
@@ -71,14 +71,26 @@ class GalleryBuilder(object):
        gallery_features = np.zeros(
            [len(gallery_images), config['embedding_size']], dtype=np.float32)

+        # construct batches of images and run inference
+        batch_size = config.get("batch_size", 32)
+        batch_img = []
        for i, image_file in enumerate(tqdm(gallery_images)):
            img = cv2.imread(image_file)
            if img is None:
                logger.error("img empty, please check {}".format(image_file))
                exit()
            img = img[:, :, ::-1]
-            rec_feat = self.rec_predictor.predict(img)
-            gallery_features[i, :] = rec_feat
+            batch_img.append(img)
+
+            if (i + 1) % batch_size == 0:
+                rec_feat = self.rec_predictor.predict(batch_img)
+                gallery_features[i - batch_size + 1:i + 1, :] = rec_feat
+                batch_img = []
+
+        if len(batch_img) > 0:
+            rec_feat = self.rec_predictor.predict(batch_img)
+            gallery_features[-len(batch_img):, :] = rec_feat
+            batch_img = []

        # train index 
        self.Searcher = Graph_Index(dist_type=config['dist_type'])
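
The batching logic added to `build_gallery.py` accumulates images, runs the predictor once per full batch, and flushes the remainder after the loop. An equivalent formulation with slicing is sketched below; `extract_in_batches` and `fake_predict` are hypothetical names, and the predictor is a stand-in that returns one feature row per input item.

```python
import numpy as np


def extract_in_batches(items, predict_fn, embedding_size, batch_size=32):
    """Fill a feature matrix by running predict_fn on consecutive slices."""
    features = np.zeros((len(items), embedding_size), dtype=np.float32)
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # The final slice may be shorter than batch_size; len(batch) handles it.
        features[start:start + len(batch), :] = predict_fn(batch)
    return features


def fake_predict(batch):
    # Hypothetical predictor: embeds each item x as [x, 2 * x].
    return np.array([[x, 2 * x] for x in batch], dtype=np.float32)


feats = extract_in_batches(list(range(5)), fake_predict,
                           embedding_size=2, batch_size=2)
```

Slicing by `start` removes the need for a separate remainder branch, at the cost of requiring all inputs to be available as a list up front.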

--- a/deploy/python/postprocess.py
+++ b/deploy/python/postprocess.py
@@ -14,6 +14,8 @@

 import os
 import copy
+import shutil
+from functools import partial
 import importlib
 import numpy as np
 import paddle
@@ -23,11 +25,32 @@ import paddle.nn.functional as F
 def build_postprocess(config):
    if config is None:
        return None
-    config = copy.deepcopy(config)
-    model_name = config.pop("name")
+
    mod = importlib.import_module(__name__)
-    postprocess_func = getattr(mod, model_name)(**config)
-    return postprocess_func
+    config = copy.deepcopy(config)
+
+    main_indicator = config.pop(
+        "main_indicator") if "main_indicator" in config else None
+    main_indicator = main_indicator if main_indicator else ""
+
+    func_list = []
+    for func in config:
+        func_list.append(getattr(mod, func)(**config[func]))
+    return PostProcesser(func_list, main_indicator)
+
+
+class PostProcesser(object):
+    def __init__(self, func_list, main_indicator="Topk"):
+        self.func_list = func_list
+        self.main_indicator = main_indicator
+
+    def __call__(self, x, image_file=None):
+        rtn = None
+        for func in self.func_list:
+            tmp = func(x, image_file)
+            if type(func).__name__ in self.main_indicator:
+                rtn = tmp
+        return rtn


 class Topk(object):
@@ -82,3 +105,24 @@ class Topk(object):
                result["label_names"] = label_name_list
            y.append(result)
        return y
+
+
+class SavePreLabel(object):
+    def __init__(self, save_dir):
+        if save_dir is None:
+            raise Exception(
+                "Please specify save_dir if SavePreLabel specified.")
+        self.save_dir = partial(os.path.join, save_dir)
+
+    def __call__(self, x, file_names=None):
+        if file_names is None:
+            return
+        assert x.shape[0] == len(file_names)
+        for idx, probs in enumerate(x):
+            index = probs.argsort(axis=0)[-1].astype("int32")
+            self.save(index, file_names[idx])
+
+    def save(self, id, image_file):
+        output_dir = self.save_dir(str(id))
+        os.makedirs(output_dir, exist_ok=True)
+        shutil.copy(image_file, output_dir)
--- a/deploy/python/predict_cls.py
+++ b/deploy/python/predict_cls.py
@@ -70,7 +70,7 @@ def main(config):
    for idx, image_file in enumerate(image_list):
        img = cv2.imread(image_file)[:, :, ::-1]
        output = cls_predictor.predict(img)
-        output = cls_predictor.postprocess(output)
+        output = cls_predictor.postprocess(output, [image_file])
        print(output)
    return


--- a/deploy/vector_search/README.md
+++ b/deploy/vector_search/README.md
@@ -47,7 +47,7 @@ Windows上首先需要安装gcc编译工具，推荐使用[TDM-GCC](https://jmeu

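 After installation, open a command-line terminal and run `gcc -v` to check the gcc version.

-In this folder, run `mingw32-make` to generate the `index.dll` library file. To regenerate `index.dll`, first run `mingw32-make clean` to clear the previously generated cache, then run `mingw32-make` to build the updated library file.
+In this folder (deploy/vector_search), run `mingw32-make` to generate the `index.dll` library file. To regenerate `index.dll`, first run `mingw32-make clean` to clear the previously generated cache, then run `mingw32-make` to build the updated library file.

 ### 2.4 Building the library file on MacOS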


--- a/docs/en/ImageNet_models.md
+++ b/docs/en/ImageNet_models.md
--- a/docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md
+++ b/docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md
@@ -154,10 +154,12 @@ cutout_op = Cutout(n_holes=1, length=112)

 ops = [decode_op, resize_op, cutout_op]

-imgs_dir = image_path
-fnames = os.listdir(imgs_dir)
-for f in fnames:
-    data = open(os.path.join(imgs_dir, f)).read()
+imgs_dir = "imgs_dir"
+file_names = os.listdir(imgs_dir)
+for file_name in file_names:
+    file_path = os.path.join(imgs_dir, file_name)
+    with open(file_path, "rb") as f:
+        data = f.read()
    img = transform(data, ops)
 ```


--- a/docs/en/models/LeViT_en.md
+++ b/docs/en/models/LeViT_en.md
+# LeViT series
+
+## Overview
+LeViT is a hybrid neural network for fast-inference image classification. Its design considers the performance of the network model on different hardware platforms, so it can better reflect the real scenarios of common applications. Through a large number of experiments, the authors found a better way to combine convolutional neural networks with the Transformer architecture, and proposed an attention-based method to integrate positional information encoding in the Transformer. [Paper](https://arxiv.org/abs/2104.01136).
+
+## Accuracy, FLOPS and Parameters
+
+| Models           | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(M) | Params<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305  | 7.8 |
+| LeViT-128  | 0.7810 | 0.9371 | 0.786 | 0.940 | 406  | 9.2 |
+| LeViT-192  | 0.7934 | 0.9446 | 0.800 | 0.947 | 658  | 11 |
+| LeViT-256  | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 |
+| LeViT-384  | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 |
+
+
+**Note**: The difference in accuracy from the reference is due to differences in data preprocessing and to not using the distillation head as output.
--- a/docs/en/models/Twins.md
+++ b/docs/en/models/Twins.md
+# Twins
+
+## Overview
+The Twins networks include Twins-PCPVT and Twins-SVT, which focus on the careful design of the spatial attention mechanism, resulting in a simple but more effective solution. Since the architecture only involves matrix multiplication, and current deep learning frameworks are highly optimized for matrix multiplication, the architecture is very efficient and easy to implement. Moreover, this architecture can achieve excellent performance in a variety of downstream vision tasks such as image classification, object detection, and semantic segmentation. [Paper](https://arxiv.org/abs/2104.13840).
+
+## Accuracy, FLOPS and Parameters
+
+| Models        | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| pcpvt_small   | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1   |
+| pcpvt_base    | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8   |
+| pcpvt_large   | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9   |
+| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8  | 24   |
+| alt_gvt_base  | 0.8294 | 0.9621 | 0.832 | - | 8.3  | 56   |
+| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2   |
+
+**Note**: The difference in accuracy from the reference is due to differences in data preprocessing.
--- a/docs/en/whl_en.md
+++ b/docs/en/whl_en.md
-# paddleclas package
+# PaddleClas wheel package

-## Get started quickly
+## 1. Installation

-### install package
+* installing from pypi

-install by pypi
 ```bash
-pip install paddleclas==2.0.3
+pip3 install paddleclas==2.2.1
 ```

-build own whl package and install
+* build own whl package and install
+
 ```bash
 python3 setup.py bdist_wheel
-pip3 install dist/paddleclas-x.x.x-py3-none-any.whl
+pip3 install dist/*
 ```

-### 1. Quick Start
-
-* Assign `image_file='docs/images/whl/demo.jpg'`, Use inference model that Paddle provides `model_name='ResNet50'`

-**Here is demo.jpg**
+## 2. Quick Start
+* The following uses the `ResNet50` model provided by PaddleClas, with the image below (`'docs/images/whl/demo.jpg'`) as an example.

 <div align="center">
 <img src="../images/whl/demo.jpg"  width = "400" />
 </div>

+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50', top_k=5)
-image_file='docs/images/whl/demo.jpg'
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50')
+infer_imgs='docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

+**Note**: `PaddleClas.predict()` is a `generator`, so you need to call it iteratively with `next()` or a `for` loop. Each call performs prediction on one batch of `batch_size` images and returns the prediction result(s). An example of the returned results:
+
 ```
-    >>> result
-    [{'class_ids': array([ 8,  7, 86, 82, 80]), 'scores': array([9.7967714e-01, 2.0280687e-02, 2.7053760e-05, 6.1860351e-06,
-       2.6378802e-06], dtype=float32), 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], 'filename': 'docs/images/whl/demo.jpg'}
+>>> result
+[{'class_ids': [8, 7, 136, 80, 84], 'scores': [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], 'label_names': ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']}]
 ```
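
Because `predict()` is described above as a generator, nothing runs until the caller asks for a batch. The sketch below illustrates that consumption pattern with a hypothetical stand-in, `predict_in_batches`, which only echoes file names; it is not the PaddleClas implementation.

```python
def predict_in_batches(image_paths, batch_size=1):
    """Hypothetical stand-in for a batched, generator-style predict()."""
    for start in range(0, len(image_paths), batch_size):
        batch = image_paths[start:start + batch_size]
        # A real predictor would run inference here; this sketch echoes names.
        yield [{"filename": path} for path in batch]


results = predict_in_batches(["a.jpg", "b.jpg", "c.jpg"], batch_size=2)

# next() runs only the first batch.
first_batch = next(results)

# Iterating drains the remaining batches.
remaining = [item for batch in results for item in batch]
```

Calling `next()` once suits single-image use, while a `for` loop is the natural choice when `infer_imgs` points at a directory.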

-* Using command line interactive programming
+* CLI
 ```bash
-paddleclas --model_name=ResNet50 --top_k=5 --image_file='docs/images/whl/demo.jpg'
-```
-
-```
-    >>> result
-    **********docs/images/whl/demo.jpg**********
-    filename: docs/images/whl/demo.jpg; class id: 8, 7, 86, 82, 80; scores: 0.9797, 0.0203, 0.0000, 0.0000, 0.0000; label: hen, cock, partridge, ruffed grouse, partridge, Bonasa umbellus, black grouse
-    Predict complete!
-```
-
-### 2. Definition of Parameters
-* model_name(str): model's name. If not assigning `model_file`and`params_file`, you can assign this param. If using inference model based on ImageNet1k provided by Paddle, set as default='ResNet50'.
-* image_file(str or numpy.ndarray): image's path. Support assigning single local image, internet image and folder containing series of images. Also Support numpy.ndarray, the channel order is [B, G, R].
-* use_gpu(bool): Whether to use GPU or not, defalut=False。
-* use_tensorrt(bool): whether to open tensorrt or not. Using it can greatly promote predict preformance, default=False.
-* is_preprocessed(bool): Assign the image data has been preprocessed or not when the image_file is numpy.ndarray.
-* resize_short(int): resize the minima between height and width into resize_short(int), default=256
-* resize(int): resize image into resize(int), default=224.
-* normalize(bool): whether normalize image or not, default=True.
-* batch_size(int): batch number, default=1.
-* model_file(str): path of inference.pdmodel. If not assign this param，you need assign `model_name` for downloading.
-* params_file(str): path of inference.pdiparams. If not assign this param，you need assign `model_name` for downloading.
-* ir_optim(bool): whether enable IR optimization or not, default=True.
-* gpu_mem(int): GPU memory usages，default=8000。
-* enable_profile(bool): whether enable profile or not,default=False.
-* top_k(int): Assign top_k, default=1.
-* enable_mkldnn(bool): whether enable MKLDNN or not, default=False.
-* cpu_num_threads(int): Assign number of cpu threads, default=10.
-* label_name_path(str): Assign path of label_name_dict you use. If using your own training model, you can assign this param. If using inference model based on ImageNet1k provided by Paddle, you may not assign this param.Defaults take ImageNet1k's label name.
+paddleclas --model_name=ResNet50  --infer_imgs="docs/images/whl/demo.jpg"
+```
+
+```
+>>> result
+filename: docs/images/whl/demo.jpg, top-5, class_ids: [8, 7, 136, 80, 84], scores: [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], label_names: ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']
+Predict complete!
+```
+
+
+
+
+## 3. Definition of Parameters
+
+The following parameters can be specified on the command line or passed to the constructor when instantiating the PaddleClas object in Python.
+* model_name(str): The model's name. Specify it when using an ImageNet1k-based inference model provided by Paddle.
+* inference_model_dir(str): Local model files directory, which is valid when `model_name` is not specified. The directory should contain `inference.pdmodel` and `inference.pdiparams`.
+* infer_imgs(str): The path of the image to be predicted, the directory containing the image files, or the URL of an image on the Internet.
+* use_gpu(bool): Whether to use GPU or not, default is `True`.
+* gpu_mem(int): GPU memory usage, default is `8000`.
+* use_tensorrt(bool): Whether to enable TensorRT or not. Enabling it can greatly improve prediction performance, default is `False`.
+* enable_mkldnn(bool): Whether to enable MKLDNN or not, default is `False`.
+* cpu_num_threads(int): The number of CPU threads, valid when `--use_gpu` is `False` and `--enable_mkldnn` is `True`, default is `10`.
+* batch_size(int): Batch size, default is `1`.
+* resize_short(int): Resize the shorter of height and width to `resize_short`, default is `256`.
+* crop_size(int): Center crop the image to `crop_size`, default is `224`.
+* topk(int): Print (return) the `topk` prediction results, default is `5`.
+* class_id_map_file(str): The mapping file between class ID and label, default is the `ImageNet1K` dataset's mapping.
 * pre_label_image(bool): whether prelabel or not, default=False.
-* pre_label_out_idr(str): If prelabeling, the path of output.
+* save_dir(str): The directory in which to save prediction results that can be used as pre-labels; default is `None`, i.e. nothing is saved.

-**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384` when building a `PaddleClas` object. The following is a demo.
+**Note**: If you want to use `Transformer` series models, such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size; you need to set `resize_short=384` and `crop_size=384`. The following is a demo.

-* Bash:
-    ```bash
-    paddleclas --model_name=ViT_base_patch16_384 --image_file='docs/images/whl/demo.jpg' --resize_short=384 --resize=384
-    ```
+* CLI:
+```bash
+paddleclas --model_name=ViT_base_patch16_384 --infer_imgs='docs/images/whl/demo.jpg' --resize_short=384 --crop_size=384
+```

 * Python:
-    ```python
-    clas = PaddleClas(model_name='ViT_base_patch16_384', top_k=5, resize_short=384, resize=384)
-    ```
+```python
+from paddleclas import PaddleClas
+clas = PaddleClas(model_name='ViT_base_patch16_384', resize_short=384, crop_size=384)
+```
+
+## 4. Usage

-### 3. Different Usages of Codes
+PaddleClas can be used in two ways:
+1. Python interactive programming;
+2. Bash command line programming.

-**We provide two ways to use: 1. Python interative programming 2. Bash command line programming**
+### 4.1 View help information

-* check `help` information
+* CLI
 ```bash
 paddleclas -h
 ```

-* Use user-specified model, you need to assign model's path `model_file` and parameters's path`params_file`
+### 4.2 Prediction using inference model provided by PaddleClas
+You can predict with an inference model provided by PaddleClas by specifying only `model_name`. In this case, PaddleClas automatically downloads the files of the specified model and saves them in the directory `~/.paddleclas/`.

-###### python
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_file='the path of model file',
-    params_file='the path of params file')
-image_file = 'docs/images/whl/demo.jpg'
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50')
+infer_imgs = 'docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_file='user-specified model path' --params_file='parmas path' --image_file='docs/images/whl/demo.jpg'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/demo.jpg'
 ```

-* Use inference model which PaddlePaddle provides to predict, you need to choose one of model proviede by PaddleClas to assign `model_name`. So there's no need to assign `model_file`. And the model you chosen will be download in `~/.paddleclas/`, which will be saved in folder named by `model_name`.

-###### python
+### 4.3 Prediction using local model files
+To predict with local model files that you trained yourself, specify only `inference_model_dir`. Note that the directory must contain `inference.pdmodel` and `inference.pdiparams`.
+
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50')
-image_file = 'docs/images/whl/demo.jpg'
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(inference_model_dir='./inference/')
+infer_imgs = 'docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/demo.jpg'
+paddleclas --inference_model_dir='./inference/' --infer_imgs='docs/images/whl/demo.jpg'
 ```

-* You can assign input as format `numpy.ndarray` which has been preprocessed `image_file=np.ndarray`. Note that the image data must be three channel. If need To preprocess the image, the image channels order must be [B, G, R].
+### 4.4 Prediction by batch
+When `infer_imgs` is a directory containing image files, you can predict in batches by specifying `batch_size`.

-###### python
+* Python
 ```python
-import cv2
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50')
-image_file = cv2.imread("docs/images/whl/demo.jpg")
-result=clas.predict(image_file)
+clas = PaddleClas(model_name='ResNet50', batch_size=2)
+infer_imgs = 'docs/images/'
+result=clas.predict(infer_imgs)
+for r in result:
+    print(r)
 ```

-* You can assign `image_file` as a folder path containing series of images.
+* CLI
+```bash
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/' --batch_size 2
+```
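The Python examples in this section consume the result of `predict()` with `next()` or a `for` loop, one batch at a time. The sketch below imitates that pattern with a stand-in generator, so it runs without PaddleClas installed. `iter_batches` and the file names are hypothetical and only illustrate how a list of images splits into batches of `batch_size`.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    image_paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]  # placeholder names
    batches = iter_batches(image_paths, batch_size=2)
    print(next(batches))   # first batch only
    for batch in batches:  # remaining batches; the last one may be smaller
        print(batch)
```

Calling `next()` once therefore retrieves only the first batch; a `for` loop is needed to see every result.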

-###### python
+
+### 4.5 Prediction of Internet image
+To predict an image from the Internet, pass its URL as `infer_imgs`. The image file is downloaded and saved in the directory `~/.paddleclas/images/`.
+
+* Python
 ```python
 from paddleclas import PaddleClas
 clas = PaddleClas(model_name='ResNet50')
-image_file = 'docs/images/whl/' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+infer_imgs = 'https://raw.githubusercontent.com/paddlepaddle/paddleclas/release/2.2/docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/'
+paddleclas --model_name='ResNet50' --infer_imgs='https://raw.githubusercontent.com/paddlepaddle/paddleclas/release/2.2/docs/images/whl/demo.jpg'
 ```

-* You can assign `--pre_label_image=True`, `--pre_label_out_idr= './output_pre_label/'`. Then images will be copied into folder named by top-1 class_id.

-###### python
+### 4.6 Prediction of `numpy.ndarray` format image
+In Python code, you can predict image data in `numpy.ndarray` format by passing the array as `infer_imgs`. Note that the image data must have 3 channels.
+
+* Python
 ```python
+import cv2
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50', pre_label_image=True, pre_label_out_idr='./output_pre_label/')
-image_file = 'docs/images/whl/' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50')
+infer_imgs = cv2.imread("docs/images/whl/demo.jpg")
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+### 4.7 Save the prediction result(s)
+You can save the prediction result(s) as pre-labels by specifying the output directory with `save_dir`.
+
+* Python
+```python
+from paddleclas import PaddleClas
+clas = PaddleClas(model_name='ResNet50', save_dir='./output_pre_label/')
+infer_imgs = 'docs/images/whl/' # a directory containing all the images you want to predict
+result=clas.predict(infer_imgs)
+print(next(result))
+```
+
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/' --pre_label_image=True --pre_label_out_idr='./output_pre_label/'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/' --save_dir='./output_pre_label/'
 ```

-* You can assign `--label_name_path` as your own label_dict_file, format should be as(class_id<space>class_name<\n>).
+
+### 4.8 Specify the mapping between class id and label name
+To use your own mapping between class IDs and label names, specify the mapping file with `class_id_map_file`. PaddleClas uses the ImageNet1K mapping by default.
+
+Each line of the mapping file must have the following format:

 ```
-0 tench, Tinca tinca
-1 goldfish, Carassius auratus
-2 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
-......
+class_id<space>class_name<\n>
 ```

-* If you use inference model that PaddleClas provides, you do not need assign `label_name_path`. Program will take `ppcls/utils/imagenet1k_label_list.txt` as defaults. If you hope using your own training model, you can provide `label_name_path` outputing 'label_name' and scores, otherwise no 'label_name' in output information.
+For example:

-###### python
-```python
-from paddleclas import PaddleClas
-clas = PaddleClas(model_file= 'the path of model file', params_file = 'the path of params file', label_name_path='./ppcls/utils/imagenet1k_label_list.txt')
-image_file = 'docs/images/whl/demo.jpg' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
 ```
-
-###### bash
-```bash
-paddleclas --model_file='the path of model file' --params_file='the path of params file' --image_file='docs/images/whl/demo.jpg' --label_name_path='./ppcls/utils/imagenet1k_label_list.txt'
+0 tench, Tinca tinca
+1 goldfish, Carassius auratus
+2 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
+......
 ```
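A file in this format can be read by splitting each line at the first space only, because label names may themselves contain spaces and commas. The function below is a minimal sketch under that assumption; `parse_class_id_map` is a hypothetical helper, not the loader used inside PaddleClas.

```python
def parse_class_id_map(lines):
    """Build {class_id: label_name} from 'class_id<space>class_name' lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split at the first space only; the rest of the line is the label name.
        class_id, _, label_name = line.partition(" ")
        mapping[int(class_id)] = label_name
    return mapping


if __name__ == "__main__":
    sample = ["0 tench, Tinca tinca\n", "1 goldfish, Carassius auratus\n"]
    print(parse_class_id_map(sample))
```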

-###### python
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50')
-image_file = 'docs/images/whl/' # it can be directory which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50', class_id_map_file='./ppcls/utils/imagenet1k_label_list.txt')
+infer_imgs = 'docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/demo.jpg' --class_id_map_file='./ppcls/utils/imagenet1k_label_list.txt'
 ```
--- a/docs/images/wx_group.jpeg
+++ b/docs/images/wx_group.jpeg
--- a/docs/images/wx_group.png
+++ b/docs/images/wx_group.png
--- a/docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md
+++ b/docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md
@@ -74,10 +74,12 @@ autoaugment_op = ImageNetPolicy()

 ops = [decode_op, resize_op, autoaugment_op]

-imgs_dir = 图像路径
-fnames = os.listdir(imgs_dir)
-for f in fnames:
-    data = open(os.path.join(imgs_dir, f)).read()
+imgs_dir = "imgs_dir"
+file_names = os.listdir(imgs_dir)
+for file_name in file_names:
+    file_path = os.path.join(imgs_dir, file_name)
+    # Read raw bytes; the decode op works on undecoded image data.
+    with open(file_path, "rb") as f:
+        data = f.read()
    img = transform(data, ops)
 ```


--- a/docs/zh_CN/faq_series/faq_2021_s2.md
+++ b/docs/zh_CN/faq_series/faq_2021_s2.md
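+# Image Recognition FAQ Collection - 2021 Season 2
+
+
+## Contents
+* [Issue 1](#第1期)(2021.07.08)
+
+
+<a name="第1期"></a>
+## Issue 1
+
+### Q1.1: The mainbody detection model currently in use produces false detections in some scenarios. Why?
+
+**A**: The current mainbody detection model was trained on public datasets such as COCO, Object365, RPC, and LogoDet. If the data to be detected differs substantially from these common categories, as in industrial quality inspection, the detection model needs to be fine-tuned on that data.
+
+### Q1.2: Building the index after adding images fails with `assert text_num >= 2`. Why?
+
+**A**: Make sure that in data_file.txt the image path and the image name are separated by a single tab character, not by spaces.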
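The failure mode in Q1.2 can be reproduced with a few lines of Python: splitting on a tab yields two fields only when the separator really is a tab. This is an illustrative sketch; `parse_index_line` is a hypothetical helper and not the code that raises the assertion in PaddleClas.

```python
def parse_index_line(line):
    """Split one index line into (image_path, image_name) on a tab character."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected '<image path><TAB><image name>', got: %r" % line)
    return fields[0], fields[1]


if __name__ == "__main__":
    print(parse_index_line("images/cola.jpg\tcola\n"))
    try:
        parse_index_line("images/cola.jpg cola\n")  # separated by a space
    except ValueError as err:
        print("rejected:", err)
```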
+
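+### Q1.3: The recognition module reports `Illegal instruction` during prediction. Why?
+
+**A**: The compiled library files may be incompatible with your environment. If this error occurs, we recommend recompiling the library files by following the [vector search tutorial](../../../deploy/vector_search/README.md).
+
+### Q1.4: Does mainbody detection output only one detection box each time?
+
+**A**: The number of boxes output by mainbody detection can be set in the configuration file. `Global.threshold` controls the detection threshold, and boxes scoring below it are discarded; `Global.max_det_results` controls the maximum number of results returned. Together, these two parameters determine the number of detection boxes output.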
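How the two settings in Q1.4 combine can be sketched as a filter followed by a cut-off. The function below is an assumption-laden illustration, not the PaddleClas implementation: `select_boxes` and the dictionary layout of a detection are hypothetical, and boxes are assumed to be ranked by score before the cut-off is applied.

```python
def select_boxes(detections, threshold, max_det_results):
    """Keep boxes with score >= threshold, best first, at most max_det_results."""
    kept = [d for d in detections if d["score"] >= threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:max_det_results]


if __name__ == "__main__":
    detections = [
        {"bbox": [10, 10, 50, 50], "score": 0.9},
        {"bbox": [20, 30, 80, 90], "score": 0.1},
        {"bbox": [0, 0, 40, 40], "score": 0.5},
        {"bbox": [5, 5, 25, 25], "score": 0.3},
    ]
    for box in select_boxes(detections, threshold=0.2, max_det_results=2):
        print(box)
```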
+
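+### Q1.5: How was the training data for the mainbody detection model selected? Will switching to a smaller model reduce accuracy?
+
+**A**: The training data is a randomly sampled subset of public datasets such as COCO, Object365, RPC, and LogoDet. A smaller model may lose some accuracy; we will also try smaller detection models in the future. For more information about the mainbody detection model, see [mainbody detection](../application/mainbody_detection.md).
+
+### Q1.6: How do I fine-tune a recognition model starting from a pretrained model?
+
+**A**: Fine-tuning a recognition model is similar to fine-tuning a classification model. The recognition model can load the pretrained product recognition model; for the training procedure, see [recognition model training](../tutorials/getting_started_retrieval.md). We will keep refining this part of the documentation.
+
+### Q1.7: What is the difference between PaddleClas and PaddleDetection?
+
+**A**: PaddleClas is an image recognition repo that combines mainbody detection, image classification, and image retrieval, and it addresses most image recognition problems; users can easily apply it to few-shot, many-class recognition tasks. PaddleDetection provides object detection, keypoint detection, multi-object tracking, and related capabilities for locating points and regions of interest in images, and it is widely used in projects such as industrial quality inspection, remote-sensing image detection, and unmanned inspection.
+
+### Q1.8: Is PaddleClas 2.2 fully compatible with PaddleClas 2.1?
+
+**A**: Compared with PaddleClas 2.1, PaddleClas 2.2 adds a metric learning module, a mainbody detection module, and a vector search module. It also provides four application examples: product recognition, vehicle recognition, logo recognition, and anime character recognition. Users can quickly build an image recognition system on PaddleClas 2.2. For image classification, the two versions are used in a similar way; see the [image classification example](../tutorials/getting_started.md) for fast iteration and evaluation. For the new metric learning module, see the [metric learning example](../tutorials/getting_started_retrieval.md). Note that the new version does not yet support fp16 or DALI training, nor multi-label training; support will be added soon.
+
+### Q1.9: When training metric learning, not all mini-batches are run in each epoch. Why?
+
+**A**: Metric learning training uses the `DistributedRandomIdentitySampler`, which does not sample every image, so the data sampled in one epoch is not the full dataset. It is therefore normal that the displayed number of mini-batches is not reached. We will improve the printed information to reduce confusion.
+
+### Q1.10: Some images produce no recognition result. Why?
+
+**A**: In the configuration file (for example inference_product.yaml), `IndexProcess.score_thres` sets the minimum cosine similarity between the image being recognized and the images in the gallery. When the cosine similarity is below this value, no result is printed. You can adjust the value to suit your own data.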
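The effect of a minimum cosine similarity can be shown with a small pure-Python sketch. It is illustrative only: `cosine_similarity`, `best_match`, and the toy two-dimensional features are hypothetical, and the real system searches an index rather than looping over a dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, gallery, score_thres):
    """Return (label, score) of the most similar gallery entry, or None if below score_thres."""
    best_label, best_score = None, -1.0
    for label, feature in gallery.items():
        score = cosine_similarity(query, feature)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < score_thres:
        return None  # nothing in the gallery is similar enough
    return best_label, best_score


if __name__ == "__main__":
    gallery = {"cola": [1.0, 0.0], "juice": [0.0, 1.0]}
    print(best_match([0.9, 0.1], gallery, score_thres=0.5))
    print(best_match([1.0, 1.0], gallery, score_thres=0.9))
```

Lowering the threshold returns more matches at the cost of more false positives, which is why the value should be tuned on your own data.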
+
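+### Q1.11: Why is the detection result for some images simply the original image?
+
+**A**: The mainbody detection model returns detection boxes, but to make the subsequent recognition model more accurate, the original image is returned together with the boxes. The original image and the detection boxes are then ranked by their similarity to the gallery images, and the label of the most similar gallery image becomes the label of the recognized image.
+
+### Q1.12: Is `triplet loss` still needed when using `circle loss`?
+
+**A**: `circle loss` unifies two forms of learning, sample-pair learning and classification learning. If you use the classification-learning form, you can add `triplet loss`.
+
+### Q1.13: When starting a module via hub serving, how do I add parameters for that module?
+
+**A**: See [hub serving parameters](../../../deploy/hubserving/clas/params.py).
+
+### Q1.14: Training produces nan. Why?
+
+**A**:
+
+1. Make sure the pretrained model is loaded correctly; the simplest way is to add the parameter `-o Arch.pretrained=True`.
+
+2. When fine-tuning, do not set the learning rate too high; a value such as 0.001 works well.
+
+
+### Q1.15: In SSLD, is the large model pretrained on 5M images and distilled into the small model, and is the small model then distilled and fine-tuned on 1M images?
+
+**A**: The steps are as follows:
+
+1. `ResNet50-vd` is obtained by distillation from Facebook's open-source `ResNeXt101-32x16d-wsl` model.
+
+2. This `ResNet50-vd` is used to distill `MobileNetV3` on the 5-million-image dataset.
+
+3. Because the distribution of the 5-million-image dataset is not exactly the same as that of the 1-million-image dataset, the model is fine-tuned again on the 1-million-image dataset, which gives a slight accuracy gain.
+
+
+### Q1.16: If the images to recognize do not belong to any of the four open-sourced scenarios, which recognition model should I use?
+
+**A**: We recommend the product recognition model. First, products cover a wide range, so the image is more likely to show a product. Second, the product recognition model was trained on data with 50,000 classes, so it generalizes better and its features are more robust.
+
+### Q1.17: Why is a 512-dimensional vector used at the end, rather than 1024 or another dimension?
+
+**A**: A lower-dimensional vector speeds up computation; in practice 128 or even fewer dimensions may be used. In general, 512 dimensions is already large enough to represent the features fully.
+
+### Q1.18: The loss becomes nan when training SwinTransformer
+
+**A**: Training SwinTransformer requires paddle-dev; for installation see [PaddlePaddle installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html). paddlepaddle-2.1 will also support it later.
+
+### Q1.19: Does adding data to the gallery require rebuilding the index?
+
+**A**: In this version the index must be rebuilt; future versions will support indexing only the newly added images.
+
+### Q1.20: Where is the PaddleClas `train_log` file?
+
+**A**: `train.log` is stored in the directory where the weights are saved.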
--- a/docs/zh_CN/tutorials/config.md
+++ b/docs/zh_CN/tutorials/config.md
-# 配置说明
-
---
-
-## 简介
-
-本文档介绍了PaddleClas配置文件(`configs/*.yaml`)中各参数的含义，以便您更快地自定义或修改超参数配置。
-
-* 注意：部分参数并未在配置文件中体现，在训练或者评估时，可以直接使用`-o`进行参数的扩充或者更新，比如说`-o checkpoints=./ckp_path/ppcls`，表示在配置文件中添加（如果之前不存在）或者更新（如果之前已经包含该字段）`checkpoints`字段，其值设为`./ckp_path/ppcls`。
-
-
-## 配置详解
-
-### 基础配置
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| mode | 运行模式 | "train" | ["train"," valid"] |
-| checkpoints | 断点模型路径，用于恢复训练 | "" | Str |
-| last_epoch | 上一次训练结束时已经训练的epoch数量，与checkpoints一起使用 | -1 | int |
-| pretrained_model | 预训练模型路径 | "" | Str |
-| load_static_weights | 加载的模型是否为静态图的预训练模型 | False | bool |
-| model_save_dir | 保存模型路径 | "" | Str |
-| classes_num | 分类数 | 1000 | int |
-| total_images | 总图片数 | 1281167 | int |
-| save_interval | 每隔多少个epoch保存模型 | 1 | int |
-| validate | 是否在训练时进行评估 | TRUE | bool |
-| valid_interval | 每隔多少个epoch进行模型评估 | 1 | int |
-| epochs | 训练总epoch数 |  | int |
-| topk | 评估指标K值大小 | 5 | int |
-| image_shape | 图片大小 | [3，224，224] | list, shape: (3,) |
-| use_mix | 是否启用mixup | False | ['True', 'False'] |
-| ls_epsilon | label_smoothing epsilon值| 0 | float |
-| use_distillation | 是否进行模型蒸馏 | False | bool |
-
-## 结构(ARCHITECTURE)
-
-### 分类模型结构配置
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| name | 模型结构名字 | "ResNet50_vd" | PaddleClas提供的模型结构 |
-| params | 模型传参 | {} | 模型结构所需的额外字典，如EfficientNet等配置文件中需要传入`padding_type`等参数，可以通过这种方式传入 |
-
-### 识别模型结构配置
-
-|     参数名字      |         具体含义          |   默认值   |                            可选值                            |
-| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
-|       name        |         模型结构          | "RecModel" |                         ["RecModel"]                         |
-| infer_output_key  |    inference时的输出值    | “feature”  |                    ["feature", "logits"]                     |
-| infer_add_softmax |  infercne是否添加softmax  |    True    |                        [True, False]                         |
-|     Backbone      |    使用Backbone的名字     |            | 需传入字典结构，包含`name`、`pretrained`等key值。其中`name`为分类模型名字， `pretrained`为布尔值 |
-| BackboneStopLayer | Backbone中的feature输出层 |            | 需传入字典结构，包含`name`key值，具体值为Backbone中的特征输出层的`full_name` |
-|       Neck        |    添加的网络Neck部分     |            |           需传入字典结构，Neck网络层的具体输入参数           |
-|       Head        |    添加的网络Head部分     |            |           需传入字典结构，Head网络层的具体输入参数           |
-
-### 学习率(LEARNING_RATE)
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| function | decay方法名 | "Linear" | ["Linear", "Cosine", <br> "Piecewise", "CosineWarmup"] |
-| params.lr | 初始学习率 | 0.1 | float |
-| params.decay_epochs | piecewisedecay中<br>衰减学习率的milestone |  | list |
-| params.gamma | piecewisedecay中gamma值 | 0.1 | float |
-| params.warmup_epoch | warmup轮数 | 5 | int |
-| parmas.steps | lineardecay衰减steps数 | 100 | int |
-| params.end_lr | lineardecayend_lr值 | 0 | float |
-
-### 优化器(OPTIMIZER)
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| function | 优化器方法名 | "Momentum" | ["Momentum", "RmsProp"] |
-| params.momentum | momentum值 | 0.9 | float |
-| regularizer.function | 正则化方法名 | "L2" | ["L1", "L2"] |
-| regularizer.factor | 正则化系数 | 0.0001 | float |
-
-### 数据读取器与数据处理
-
-| 参数名字 | 具体含义 |
-|:---:|:---:|
-| batch_size | 批大小 |
-| num_workers | 数据读取器worker数量 |
-| file_list | train文件列表 |
-| data_dir | train文件路径 |
-| shuffle_seed | 用来进行shuffle的seed值 |
-
-数据处理
-
-| 功能名字 | 参数名字 | 具体含义 |
-|:---:|:---:|:---:|
-| DecodeImage | to_rgb | 数据转RGB |
-|  | to_np | 数据转numpy |
-|  | channel_first | 按CHW排列的图片数据 |
-| RandCropImage | size | 随机裁剪 |
-| RandFlipImage | | 随机翻转 |
-| NormalizeImage | scale | 归一化scale值 |
-|  | mean | 归一化均值 |
-|  | std | 归一化方差 |
-|  | order | 归一化顺序 |
-| ToCHWImage |  | 调整为CHW |
-| CropImage | size | 裁剪大小 |
-| ResizeImage | resize_short | 按短边调整大小 |
-
-mix处理
-
-| 参数名字| 具体含义|
-|:---:|:---:|
-| MixupOperator.alpha | mixup处理中的alpha值|
--- a/docs/zh_CN/tutorials/config_description.md
+++ b/docs/zh_CN/tutorials/config_description.md
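+# Configuration Description
+
+---
+
+## Introduction
+
+This document describes the meaning of each parameter in the PaddleClas configuration files (`ppcls/configs/*.yaml`), so that you can customize or modify hyperparameter settings more quickly.
+
+
+## Configuration details
+
+### 1. Classification model
+
+The training configuration of `ResNet50_vd` on `ImageNet-1k` is used here as an example to explain each parameter. [Configuration path](../../../ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml).
+
+#### 1.1 Global configuration (Global)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| checkpoints | Path of the checkpoint used to resume training | null | str |
+| pretrained_model | Path of the pretrained model | null | str |
+| output_dir | Path where models are saved | "./output/" | str |
+| save_interval | Number of epochs between model saves | 1 | int |
+| eval_during_train | Whether to evaluate during training | True | bool |
+| eval_interval | Number of epochs between evaluations | 1 | int |
+| epochs | Total number of training epochs |  | int |
+| print_batch_step | Number of mini-batches between log outputs | 10 | int |
+| use_visualdl | Whether to visualize training with visualdl | False | bool |
+| image_shape | Image size | [3, 224, 224] | list, shape: (3,) |
+| save_inference_dir | Path where the inference model is saved | "./inference" | str |
+| eval_mode | Evaluation mode | "classification" | "retrieval" |
+
+**Note**: `pretrained_model` can also be the HTTP address of a pretrained model.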
+
+#### 1.2 Architecture (Arch)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| name | Name of the model architecture | ResNet50 | Architectures provided by PaddleClas |
+| class_num | Number of classes | 1000 | int |
+| pretrained | Pretrained model | False | bool, str |
+
+**Note**: `pretrained` can be set to `True` or `False`, or to the path of a weights file. When `Global.pretrained_model` is also set to a path, `pretrained` here has no effect.
+
+#### 1.3 Loss function (Loss)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| CELoss | Cross-entropy loss | —— | —— |
+| CELoss.weight | Weight of CELoss in the total loss | 1.0 | float |
+| CELoss.epsilon | Label-smoothing epsilon in CELoss | 0.1 | float, between 0 and 1 |
+
+
+#### 1.4 Optimizer (Optimizer)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| name | Name of the optimizer | "Momentum" | Other optimizers such as "RmsProp" |
+| momentum | Momentum value | 0.9 | float |
+| lr.name | Learning rate decay method | "Cosine" | Other decay methods such as "Linear" and "Piecewise" |
+| lr.learning_rate | Initial learning rate | 0.1 | float |
+| lr.warmup_epoch | Number of warmup epochs | 0 | int, e.g. 5 |
+| regularizer.name | Name of the regularization method | "L2" | ["L1", "L2"] |
+| regularizer.coeff | Regularization coefficient | 0.00007 | float |
+
+**Note**: Different values of `lr.name` may require different additional parameters. For example, when `lr.name=Piecewise`, the following parameters need to be added:
+
+```
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+```
+
+For how to add a method and its parameters, see [learning_rate.py](../../../ppcls/optimizer/learning_rate.py).
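The relationship between `decay_epochs` and `values` in a piecewise schedule can be sketched in a few lines: `values` has one more entry than `decay_epochs`, and each boundary switches to the next value. This is a simplified illustration under the assumption that the rate changes exactly at the listed epochs; `piecewise_lr` is a hypothetical helper, and the actual scheduler in `learning_rate.py` may apply the change per step and combine it with warmup.

```python
from bisect import bisect_right


def piecewise_lr(epoch, decay_epochs, values):
    """Return values[i], where i is the number of boundaries in decay_epochs that are <= epoch."""
    if len(values) != len(decay_epochs) + 1:
        raise ValueError("values must have exactly one more entry than decay_epochs")
    return values[bisect_right(decay_epochs, epoch)]


if __name__ == "__main__":
    decay_epochs = [30, 60, 90]
    values = [0.1, 0.01, 0.001, 0.0001]
    for epoch in (0, 29, 30, 60, 95):
        print(epoch, piecewise_lr(epoch, decay_epochs, values))
```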
+
+
+#### 1.5 Data loading module (DataLoader)
+
+##### 1.5.1 dataset
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| name | Name of the dataset-reading class | ImageNetDataset | Other dataset-reading classes such as VeriWild |
+| image_root | Path where the dataset is stored | ./dataset/ILSVRC2012/ | str |
+| cls_label_path | Label list of the dataset | ./dataset/ILSVRC2012/train_list.txt | str |
+| transform_ops | Preprocessing applied to a single image | —— | —— |
+| batch_transform_ops | Preprocessing applied to a batch of images | —— | —— |
+
+
+Meaning of the parameters in transform_ops:
+
+| Operator | Parameter | Meaning |
+|:---:|:---:|:---:|
+| DecodeImage | to_rgb | Convert the data to RGB |
+|  | channel_first | Image data arranged in CHW order |
+| RandCropImage | size | Random crop |
+| RandFlipImage | | Random flip |
+| NormalizeImage | scale | Scale value for normalization |
+|  | mean | Mean for normalization |
+|  | std | Standard deviation for normalization |
+|  | order | Order for normalization |
+| CropImage | size | Crop size |
+| ResizeImage | resize_short | Resize by the shorter side |
+
+Meaning of the parameters in batch_transform_ops:
+
+| Operator | Parameter | Meaning |
+|:---:|:---:|:---:|
+| MixupOperator | alpha | Mixup parameter; the larger the value, the stronger the augmentation |
+
+##### 1.5.2 sampler
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| name | Sampler type | DistributedBatchSampler | Other samplers such as DistributedRandomIdentitySampler |
+| batch_size | Batch size | 64 | int |
+| drop_last | Whether to drop the last incomplete batch | False | bool |
+| shuffle | Whether to shuffle the data | True | bool |
+
+##### 1.5.3 loader
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| num_workers | Number of data-reading workers | 4 | int |
+| use_shared_memory | Whether to use shared memory | True | bool |
+
+
+#### 1.6 Evaluation metric (Metric)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| TopkAcc | TopkAcc | [1, 5] | list, int |
+
+#### 1.7 Inference (Infer)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| infer_imgs | Path of the image(s) to infer | docs/images/whl/demo.jpg | str |
+| batch_size | Batch size | 10 | int |
+| PostProcess.name | Name of the post-processing method | Topk | str |
+| PostProcess.topk | Value of topk | 5 | int |
+| PostProcess.class_id_map_file | Mapping file between class id and name | ppcls/utils/imagenet1k_label_list.txt | str |
+
+**Note**: For the `transforms` of the Infer module, see the explanation of `transform_ops` under dataset in the data loading module.
+
+
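+### 2. Distillation model
+
+**Note**: The training configuration for distilling `MobileNetV3_small_x1_0` from `MobileNetV3_large_x1_0` on `ImageNet-1k` is used here as an example to explain each parameter. [Configuration path](../../../ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml). Only the parameters that differ from the classification model are described.
+
+#### 2.1 Architecture (Arch)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| name | Name of the model architecture | DistillationModel | —— |
+| class_num | Number of classes | 1000 | int |
+| freeze_params_list | List of flags for freezing parameters | [True, False] | list |
+| models | List of models | [Teacher, Student] | list |
+| Teacher.name | Name of the teacher model | MobileNetV3_large_x1_0 | Models in PaddleClas |
+| Teacher.pretrained | Pretrained weights of the teacher model | True | Boolean or path of the pretrained weights |
+| Teacher.use_ssld | Whether the teacher's pretrained weights are ssld weights | True | Boolean |
+| infer_model_name | Which model is used for inference | Student | Teacher |
+
+**Note**:
+
+1. A list is written in yaml as follows:
+```
+  freeze_params_list:
+  - True
+  - False
+```
+2. The parameters of Student are similar and are not repeated here.
+
+#### 2.2 Loss function (Loss)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| DistillationCELoss | Cross-entropy loss for distillation | —— | —— |
+| DistillationCELoss.weight | Loss weight | 1.0 | float |
+| DistillationCELoss.model_name_pairs | Pairs of models between which the loss is computed | ["Student", "Teacher"] | —— |
+| DistillationGTCELoss | Cross-entropy loss between the distilled model and the ground-truth label | —— | —— |
+| DistillationGTCELoss.weight | Loss weight | 1.0 | float |
+| DistillationGTCELoss.model_names | Names of the models whose cross-entropy with the ground-truth label is computed | ["Student"] | —— |
+
+
+#### 2.3 Evaluation metric (Metric)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| DistillationTopkAcc | DistillationTopkAcc | Contains the two parameters model_key and topk | —— |
+| DistillationTopkAcc.model_key | Model to be evaluated | "Student" | "Teacher" |
+| DistillationTopkAcc.topk | Value of topk | [1, 5] | list, int |
+
+**Note**: `DistillationTopkAcc` has the same meaning as the ordinary `TopkAcc`; it is simply used only in distillation tasks.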
+
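+### 3. Recognition model
+
+**Note**: The training configuration of `ResNet50` on `LogoDet-3k` is used here as an example to explain each parameter. [Configuration path](../../../ppcls/configs/Logo/ResNet50_ReID.yaml). Only the parameters that differ from the classification model are described.
+
+#### 3.1 Architecture (Arch)
+
+| Parameter | Meaning | Default | Options |
+| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
+| name | Model architecture | "RecModel" | ["RecModel"] |
+| infer_output_key | Output value at inference | "feature" | ["feature", "logits"] |
+| infer_add_softmax | Whether to add softmax at inference | False | [True, False] |
+| Backbone.name | Name of the backbone | ResNet50_last_stage_stride1 | Other backbones provided by PaddleClas |
+| Backbone.pretrained | Pretrained model of the backbone | True | Boolean or path of the pretrained model |
+| BackboneStopLayer.name | Name of the output layer in the backbone | True | `full_name` of the feature output layer in the backbone |
+| Neck.name | Name of the Neck part of the network | VehicleNeck | A dict must be passed in, giving the input parameters of the Neck layer |
+| Neck.in_channels | Input dimension of the Neck part | 2048 | Same as the size of the BackboneStopLayer.name layer |
+| Neck.out_channels | Output dimension of the Neck part, i.e. the feature dimension | 512 | int |
+| Head.name | Name of the Head part of the network | CircleMargin | Arcmargin, etc. |
+| Head.embedding_size | Feature dimension | 512 | Keep consistent with Neck.out_channels |
+| Head.class_num | Number of classes | 3000 | int |
+| Head.margin | Margin value in CircleMargin | 0.35 | float |
+| Head.scale | Scale value in CircleMargin | 64 | int |
+
+**Note**:
+
+1. In PaddleClas, the `Neck` part connects the Backbone to the embedding layer, and the `Head` part connects the embedding layer to the classification layer.
+
+2. `BackboneStopLayer.name` can be obtained by visualizing the model; for visualization, see [Netron](https://github.com/lutzroeder/netron) or [visualdl](https://github.com/PaddlePaddle/VisualDL).
+
+3. Calling `tools/export_model.py` converts the model weights into an inference model. The `infer_add_softmax` parameter controls whether a `Softmax` activation is appended. It defaults to `True` in the code (in classification tasks the final output layer is followed by `Softmax`), but in recognition tasks the feature layer needs no activation, so it must be set to `False` here.
+
+#### 3.2 Evaluation metric (Metric)
+
+| Parameter | Meaning | Default | Options |
+|:---:|:---:|:---:|:---:|
+| Recallk | Recall | [1, 5] | list, int |
+| mAP | Mean average precision of retrieval | None | None |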
--- a/docs/zh_CN/whl.md
+++ b/docs/zh_CN/whl.md
-# paddleclas package使用说明
+# PaddleClas wheel package Usage

-## 快速上手
+## 1. Installation

-### 安装whl包
+* Install with pip

-pip安装
 ```bash
-pip install paddleclas==2.0.3
+pip3 install paddleclas==2.2.1
 ```

-本地构建并安装
+* Build and install locally
+
 ```bash
 python3 setup.py bdist_wheel
-pip3 install dist/paddleclas-x.x.x-py3-none-any.whl # x.x.x是paddleclas的版本号，默认为0.0.0
+pip3 install dist/*
 ```
-### 1. 快速开始
-* 指定`image_file='docs/images/whl/demo.jpg'`,使用Paddle提供的inference model,`model_name='ResNet50'`, 使用图片`docs/images/whl/demo.jpg`。

-**下图是使用的demo图片**
+
+## 2. Quick start
+* The `ResNet50` model is used here, with the image below (`'docs/images/whl/demo.jpg'`) as the example.

 <div align="center">
 <img src="../images/whl/demo.jpg"  width = "400" />
 </div>

+
+* Use in Python code
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50', top_k=5)
-image_file='docs/images/whl/demo.jpg'
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50')
+infer_imgs='docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

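+**Note**: `PaddleClas.predict()` returns an iterable (a `generator`), so you need to iterate over it with `next()` or a `for` loop. Each iteration runs prediction on one batch of `batch_size` images and returns the results. An example of the returned result: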
+
 ```
-    >>> result
-    [{'class_ids': array([ 8,  7, 86, 82, 80]), 'scores': array([9.7967714e-01, 2.0280687e-02, 2.7053760e-05, 6.1860351e-06,
-       2.6378802e-06], dtype=float32), 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], 'filename': 'docs/images/whl/demo.jpg'}]
+>>> result
+[{'class_ids': [8, 7, 136, 80, 84], 'scores': [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], 'label_names': ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']}]
 ```

-* 使用命令行式交互方法。直接获得结果。
-```bash
-paddleclas --model_name=ResNet50 --top_k=5 --image_file='docs/images/whl/demo.jpg'
-```
-
-```
-    >>> result
-    **********docs/images/whl/demo.jpg**********
-    filename: docs/images/whl/demo.jpg; class id: 8, 7, 86, 82, 80; scores: 0.9797, 0.0203, 0.0000, 0.0000, 0.0000; label: hen, cock, partridge, ruffed grouse, partridge, Bonasa umbellus, black grouse
-    Predict complete!
-```
-
-### 2. 参数解释
-以下参数可在命令行交互使用时通过参数指定，或在Python代码中实例化PaddleClas对象时作为构造函数的参数使用。
-* model_name(str): 模型名称，没有指定自定义的model_file和params_file时，可以指定该参数，使用PaddleClas提供的基于ImageNet1k的inference model，默认值为ResNet50。
-* image_file(str or numpy.ndarray): 图像地址，支持指定单一图像的路径或图像的网址进行预测，支持指定包含图像的文件夹路径，支持numpy.ndarray格式的三通道图像数据，且通道顺序为[B, G, R]。
-* use_gpu(bool): 是否使用GPU，如果使用，指定为True。默认为False。
-* use_tensorrt(bool): 是否开启TensorRT预测，可提升GPU预测性能，需要使用带TensorRT的预测库。当使用TensorRT推理加速，指定为True。默认为False。
-* is_preprocessed(bool): 当image_file为numpy.ndarray格式的图像数据时，图像数据是否已经过预处理。如果该参数为True，则不再对image_file数据进行预处理，否则将转换通道顺序后，按照resize_short，resize，normalize参数对图像进行预处理。默认值为False。
-* resize_short(int): 将图像的高宽二者中小的值，调整到指定的resize_short值，大的值按比例放大。默认为256。
-* resize(int): 将图像裁剪到指定的resize值大小，默认224。
-* normalize(bool): 是否对图像数据归一化，默认True。
-* batch_size(int): 预测时每个batch的样本数量，默认为1。
-* model_file(str): 模型.pdmodel的路径，若不指定该参数，需要指定model_name，获得下载的模型。
-* params_file(str): 模型参数.pdiparams的路径，若不指定，则需要指定model_name,以获得下载的模型。
-* ir_optim(bool): 是否开启IR优化，默认为True。
-* gpu_mem(int): 使用的GPU显存大小，默认为8000。
-* enable_profile(bool): 是否开启profile功能，默认False。
-* top_k(int): 指定的topk，打印（返回）预测结果的前k个类别和对应的分类概率，默认为1。
-* enable_mkldnn(bool): 是否开启MKLDNN，默认False。
-* cpu_num_threads(int): 指定cpu线程数，默认设置为10。
-* label_name_path(str): 指定一个表示所有的label name的文件路径。当用户使用自己训练的模型，可指定这一参数，打印结果时可以显示图像对应的类名称。若用户使用Paddle提供的inference model，则可不指定该参数，使用imagenet1k的label_name，默认为空字符串。
-* pre_label_image(bool): 是否需要进行预标注。
-* pre_label_out_idr(str): 进行预标注后，输出结果的文件路径，默认为None。
-
-**注意**: 如果使用`Transformer`系列模型，如`DeiT_***_384`, `ViT_***_384`等，请注意模型的输入数据尺寸，需要设置参数`resize_short=384`, `resize=384`，如下所示：
-
-* 在命令行中使用：
-    ```bash
-    paddleclas --model_name=ViT_base_patch16_384 --image_file='docs/images/whl/demo.jpg' --resize_short=384 --resize=384
-    ```
-
-* 在python代码中：
-    ```python
-    clas = PaddleClas(model_name='ViT_base_patch16_384', top_k=5, resize_short=384, resize=384)
-    ```
-
-### 3. 代码使用方法
-
-**提供两种使用方式：1、python交互式编程。2、bash命令行式编程**
-
-* 查看帮助信息
-
-###### bash
+* Use from the command line
 ```bash
-paddleclas -h
+paddleclas --model_name=ResNet50  --infer_imgs="docs/images/whl/demo.jpg"
 ```

-* 用户使用自己指定的模型,需要指定模型路径参数`model_file`和参数`params_file`
+```
+>>> result
+filename: docs/images/whl/demo.jpg, top-5, class_ids: [8, 7, 136, 80, 84], scores: [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], label_names: ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']
+Predict complete!
+```
+
+
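+## 3. Parameter description
+The following parameters can be specified on the command line, or passed to the constructor when instantiating a PaddleClas object in Python code.
+* model_name(str): Model name; uses the ImageNet1k-based pretrained model provided by PaddleClas.
+* inference_model_dir(str): Directory of local model files, effective when `model_name` is not specified. The directory must contain the two model files `inference.pdmodel` and `inference.pdiparams`.
+* infer_imgs(str): Path of the image file to predict, a directory containing image files, or the URL of an Internet image.
+* use_gpu(bool): Whether to use the GPU. Defaults to `True`.
+* gpu_mem(int): Amount of GPU memory to use, effective when `use_gpu` is `True`. Defaults to 8000.
+* use_tensorrt(bool): Whether to enable TensorRT prediction, which can improve GPU prediction performance and requires an inference library built with TensorRT. Defaults to `False`.
+* enable_mkldnn(bool): Whether to enable MKLDNN, effective when `use_gpu` is `False`. Defaults to `False`.
+* cpu_num_threads(int): Number of threads for CPU prediction, effective when `use_gpu` is `False` and `enable_mkldnn` is `True`. Defaults to `10`.
+* batch_size(int): Number of samples per batch during prediction. Defaults to `1`.
+* resize_short(int): Resize the image proportionally by its shorter side. Defaults to `256`.
+* crop_size(int): Crop the image to the specified size. Defaults to `224`.
+* topk(int): Print (return) the top `topk` classes of the prediction results and their probabilities. Defaults to `5`.
+* class_id_map_file(str): Mapping file between `class id` and `label`. Defaults to the mapping of the `ImageNet1K` dataset.
+* save_dir(str): Path in which to save prediction results as pre-label data. Defaults to `None`, meaning results are not saved.
+
+**Note**: If you use a `Transformer`-series model such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size and set `resize_short=384` and `crop_size=384`, as shown below.
+
+* On the command line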
+```bash
+paddleclas --model_name=ViT_base_patch16_384 --infer_imgs='docs/images/whl/demo.jpg' --resize_short=384 --crop_size=384
+```

-###### python
+* In Python code
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_file='the path of model file',
-    params_file='the path of params file')
-image_file = 'docs/images/whl/demo.jpg' # image_file 可指定为前缀是https的网络图片，也可指定为本地图片
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ViT_base_patch16_384', resize_short=384, crop_size=384)
 ```

-###### bash
+
+## 4. Usage examples
+
+PaddleClas can be used in two ways:
+1. in Python code;
+2. from the command line.
+
+
+### 4.1 View help information
+
+* CLI
 ```bash
-paddleclas --model_file='user-specified model path' --params_file='parmas path' --image_file='docs/images/whl/demo.jpg'
+paddleclas -h
 ```

-* 用户使用PaddlePaddle训练好的inference model来预测，并通过参数`model_name`指定。
-此时无需指定`model_file`,模型会根据`model_name`自动下载指定模型到当前目录,并保存在目录`~/.paddleclas/`下以`model_name`命名的文件夹中。

-###### python
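+### 4.2 Prediction using a pretrained model provided by PaddleClas
+You can predict with a pretrained model provided by PaddleClas by specifying it with the `model_name` parameter. PaddleClas then automatically downloads the model named by `model_name` and saves it under the directory `~/.paddleclas/`.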
+
+* Python
 ```python
 from paddleclas import PaddleClas
 clas = PaddleClas(model_name='ResNet50')
-image_file = 'docs/images/whl/demo.jpg' # image_file 可指定为前缀是https的网络图片，也可指定为本地图片
-result=clas.predict(image_file)
-print(result)
+infer_imgs = 'docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/demo.jpg'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/demo.jpg'
 ```

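-* Users can pass image data in numpy.ndarray format via the `image_file` parameter. Note that the image data must have three channels. If the image needs to be preprocessed, its channel order must be [B, G, R].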

-###### python
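+### 4.3 Predict with local model files
+You can predict with local model files by specifying the model directory via the `inference_model_dir` parameter. Note that the directory must contain both `inference.pdmodel` and `inference.pdiparams`.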
+
+* Python
 ```python
-import cv2
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50')
-image_file = cv2.imread("docs/images/whl/demo.jpg")
-result=clas.predict(image_file)
+clas = PaddleClas(inference_model_dir='./inference/')
+infer_imgs = 'docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-* Users can set `image_file` to the path of a folder containing images.
+* CLI
+```bash
+paddleclas --inference_model_dir='./inference/' --infer_imgs='docs/images/whl/demo.jpg'
+```
+
+
+### 4.4 Batch prediction
+When `infer_imgs` is a directory containing image files, the images can be predicted in batches. Just specify the batch size via the `batch_size` parameter.

-###### python
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50')
-image_file = 'docs/images/whl/' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50', batch_size=2)
+infer_imgs = 'docs/images/'
+result=clas.predict(infer_imgs)
+for r in result:
+    print(r)
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/' --batch_size 2
 ```

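-* Users can specify `pre_label_image=True`, `pre_label_out_idr='./output_pre_label/'` to save each image, according to its top-1 prediction, into the folder of the corresponding class under the `pre_label_out_dir` directory.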

-###### python
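+### 4.5 Predict on network images
+You can predict on an image from the Internet by passing its `url` via the `infer_imgs` parameter. The image is downloaded and saved under `~/.paddleclas/images/`.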
+
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50', pre_label_image=True,pre_label_out_idr='./output_pre_label/')
-image_file = 'docs/images/whl/' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50')
+infer_imgs = 'https://raw.githubusercontent.com/paddlepaddle/paddleclas/release/2.2/docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/' --pre_label_image=True --pre_label_out_idr='./output_pre_label/'
+paddleclas --model_name='ResNet50' --infer_imgs='https://raw.githubusercontent.com/paddlepaddle/paddleclas/release/2.2/docs/images/whl/demo.jpg'
 ```

-* Users can specify the path of the model's `label_dict_file` via the `label_name_path` parameter. The file content should be formatted as (class_id<space>class_name<\n>), for example:

-```
-0 tench, Tinca tinca
-1 goldfish, Carassius auratus
-2 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
-......
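+### 4.6 Predict on data in `NumPy.ndarray` format
+In Python, you can predict on image data in `NumPy.ndarray` format by passing it via the `infer_imgs` parameter. Note that the image data must have three channels, in RGB order.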
+
+* Python
+```python
+import cv2
+from paddleclas import PaddleClas
+clas = PaddleClas(model_name='ResNet50')
+infer_imgs = cv2.imread("docs/images/whl/demo.jpg")[:, :, ::-1]  # cv2 reads BGR; convert to RGB
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

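-* When using an inference model provided by Paddle, `label_name_path` does not need to be provided; `ppcls/utils/imagenet1k_label_list.txt` is used by default.
-To use your own model, you can provide `label_name_path` so that label_name is output together with the results. If it is not provided, no label_name information is output.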

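+### 4.7 Save prediction results
+You can set the parameter `save_dir='./output_pre_label/'` to save each image, according to its top-1 prediction, into the folder of the corresponding class under the `save_dir` directory.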

-###### python
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_file='the path of model file', params_file ='the path of params file', label_name_path='./ppcls/utils/imagenet1k_label_list.txt')
-image_file = 'docs/images/whl/demo.jpg' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50', save_dir='./output_pre_label/')
+infer_imgs = 'docs/images/whl/'  # a directory containing all the images to predict
+result=clas.predict(infer_imgs)
+for r in result:  # consume the whole generator so that every batch is processed
+    print(r)
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_file='the path of model file' --params_file='the path of params file' --image_file='docs/images/whl/demo.jpg' --label_name_path='./ppcls/utils/imagenet1k_label_list.txt'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/' --save_dir='./output_pre_label/'
+```
+
+
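+### 4.8 Specify label names
+You can specify the mapping between `class id` and `label` via the `class_id_map_file` parameter. By default, PaddleClas uses the ImageNet1K label names (`ppcls/utils/imagenet1k_label_list.txt`).
+
+The content of the `class_id_map_file` file should be formatted as: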
+
+```
+class_id<space>class_name<\n>
+```
+
+For example:
+
+```
+0 tench, Tinca tinca
+1 goldfish, Carassius auratus
+2 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
+......
 ```

-###### python
+* Python
 ```python
 from paddleclas import PaddleClas
-clas = PaddleClas(model_name='ResNet50')
-image_file = 'docs/images/whl/' # it can be image_file folder path which contains all of images you want to predict.
-result=clas.predict(image_file)
-print(result)
+clas = PaddleClas(model_name='ResNet50', class_id_map_file='./ppcls/utils/imagenet1k_label_list.txt')
+infer_imgs = 'docs/images/whl/demo.jpg'
+result=clas.predict(infer_imgs)
+print(next(result))
 ```

-###### bash
+* CLI
 ```bash
-paddleclas --model_name='ResNet50' --image_file='docs/images/whl/'
+paddleclas --model_name='ResNet50' --infer_imgs='docs/images/whl/demo.jpg' --class_id_map_file='./ppcls/utils/imagenet1k_label_list.txt'
 ```
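
The `class_id<space>class_name` format described in section 4.8 can be illustrated with a small parser. This is a minimal sketch, not PaddleClas's own loader: `parse_class_id_map` is a hypothetical helper name, and the sketch assumes that the class id is everything before the first space and that the rest of the line is the label.

```python
def parse_class_id_map(lines):
    """Parse 'class_id<space>class_name' lines into a dict.

    Hypothetical helper for illustration only; not part of PaddleClas.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only, because labels may contain spaces.
        class_id, _, class_name = line.partition(" ")
        mapping[int(class_id)] = class_name
    return mapping


sample_lines = [
    "0 tench, Tinca tinca",
    "1 goldfish, Carassius auratus",
]
label_map = parse_class_id_map(sample_lines)
print(label_map[1])
```

Splitting only on the first space matters here, since label names such as `goldfish, Carassius auratus` contain spaces themselves.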
--- a/paddleclas.py
+++ b/paddleclas.py
@@ -18,6 +18,7 @@ __dir__ = os.path.dirname(__file__)
 sys.path.append(os.path.join(__dir__, ""))
 sys.path.append(os.path.join(__dir__, "deploy"))

+from typing import Union, Generator
 import argparse
 import shutil
 import textwrap
@@ -162,73 +163,133 @@ class InputModelError(Exception):
        super().__init__(message)


-def args_cfg():
-    parser = config.parser()
-    other_options = [
-        ("infer_imgs", str, None, "The image(s) to be predicted."),
-        ("model_name", str, None, "The model name to be used."),
-        ("inference_model_dir", str, None, "The directory of model files."),
-        ("use_gpu", bool, True, "Whether use GPU. Default by True."), (
-            "enable_mkldnn", bool, False,
-            "Whether use MKLDNN. Default by False."),
-        ("batch_size", int, 1, "Batch size. Default by 1.")
-    ]
-    for name, opt_type, default, description in other_options:
-        parser.add_argument(
-            "--" + name, type=opt_type, default=default, help=description)
-
-    args = parser.parse_args()
-
-    for name, opt_type, default, description in other_options:
-        val = eval("args." + name)
-        full_name = "Global." + name
-        args.override.append(
-            f"{full_name}={val}") if val is not default else None
-
-    cfg = config.get_config(
-        args.config, overrides=args.override, show=args.verbose)
-
-    return cfg
-
-
-def get_default_confg():
-    return {
+def init_config(model_name,
+                inference_model_dir,
+                use_gpu=True,
+                batch_size=1,
+                topk=5,
+                **kwargs):
+    imagenet1k_map_path = os.path.join(
+        os.path.abspath(__dir__), "ppcls/utils/imagenet1k_label_list.txt")
+    cfg = {
        "Global": {
-            "model_name": "MobileNetV3_small_x0_35",
-            "use_gpu": False,
-            "use_fp16": False,
-            "enable_mkldnn": False,
-            "cpu_num_threads": 1,
-            "use_tensorrt": False,
-            "ir_optim": False,
+            "infer_imgs": kwargs.get("infer_imgs", False),
+            "model_name": model_name,
+            "inference_model_dir": inference_model_dir,
+            "batch_size": batch_size,
+            "use_gpu": use_gpu,
+            "enable_mkldnn": kwargs.get("enable_mkldnn", False),
+            "cpu_num_threads": kwargs.get("cpu_num_threads", 1),
+            "enable_benchmark": False,
+            "use_fp16": kwargs.get("use_fp16", False),
+            "ir_optim": True,
+            "use_tensorrt": kwargs.get("use_tensorrt", False),
+            "gpu_mem": kwargs.get("gpu_mem", 8000),
            "enable_profile": False
        },
        "PreProcess": {
            "transform_ops": [{
                "ResizeImage": {
-                    "resize_short": 256
+                    "resize_short": kwargs.get("resize_short", 256)
                }
            }, {
                "CropImage": {
-                    "size": 224
+                    "size": kwargs.get("crop_size", 224)
                }
            }, {
                "NormalizeImage": {
                    "scale": 0.00392157,
                    "mean": [0.485, 0.456, 0.406],
                    "std": [0.229, 0.224, 0.225],
-                    "order": ""
+                    "order": ''
                }
            }, {
                "ToCHWImage": None
            }]
        },
        "PostProcess": {
-            "name": "Topk",
-            "topk": 5,
-            "class_id_map_file": "./ppcls/utils/imagenet1k_label_list.txt"
+            "main_indicator": "Topk",
+            "Topk": {
+                "topk": topk,
+                "class_id_map_file": imagenet1k_map_path
            }
        }
+    }
+    if kwargs.get("save_dir") is not None:
+        cfg["PostProcess"]["SavePreLabel"] = {
+            "save_dir": kwargs["save_dir"]
+        }
+    if kwargs.get("class_id_map_file") is not None:
+        cfg["PostProcess"]["Topk"]["class_id_map_file"] = kwargs[
+            "class_id_map_file"]
+
+    cfg = config.AttrDict(cfg)
+    config.create_attr_dict(cfg)
+    return cfg
+
+
+def args_cfg():
+    def str2bool(v):
+        return v.lower() in ("true", "t", "1")
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--infer_imgs",
+        type=str,
+        required=True,
+        help="The image(s) to be predicted.")
+    parser.add_argument(
+        "--model_name", type=str, help="The model name to be used.")
+    parser.add_argument(
+        "--inference_model_dir",
+        type=str,
+        help="The directory of model files. Valid when model_name is not specified."
+    )
+    parser.add_argument(
+        "--use_gpu", type=str2bool, default=True, help="Whether to use GPU.")
+    parser.add_argument("--gpu_mem", type=int, default=8000, help="")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=str2bool,
+        default=False,
+        help="Whether use MKLDNN. Valid when use_gpu is False")
+    parser.add_argument("--cpu_num_threads", type=int, default=1, help="")
+    parser.add_argument(
+        "--use_tensorrt", type=str2bool, default=False, help="")
+    parser.add_argument("--use_fp16", type=str2bool, default=False, help="")
+    parser.add_argument(
+        "--batch_size", type=int, default=1, help="Batch size. Default by 1.")
+    parser.add_argument(
+        "--topk",
+        type=int,
+        default=5,
+        help="Return topk score(s) and corresponding results. Default by 5.")
+    parser.add_argument(
+        "--class_id_map_file",
+        type=str,
+        help="The path of file that map class_id and label.")
+    parser.add_argument(
+        "--save_dir",
+        type=str,
+        help="The directory to save prediction results as pre-label.")
+    parser.add_argument(
+        "--resize_short",
+        type=int,
+        default=256,
+        help="Resize according to short size.")
+    parser.add_argument(
+        "--crop_size", type=int, default=224, help="Center crop size.")
+
+    args = parser.parse_args()
+    return vars(args)


 def print_info():
@@ -292,7 +353,7 @@ def download_with_progressbar(url, save_path):
    if total_size_in_bytes == 0 or progress_bar.n != total_size_in_bytes or not os.path.isfile(
            save_path):
        raise Exception(
-            f"Something went wrong while downloading model/image from {url}")
+            f"Something went wrong while downloading file from {url}")


 def check_model_file(model_name):
@@ -334,15 +395,6 @@ def check_model_file(model_name):
    return storage_directory()


-def save_prelabel_results(class_id, input_file_path, output_dir):
-    """Save the predicted image according to the prediction.
-    """
-    output_dir = os.path.join(output_dir, str(class_id))
-    if not os.path.isdir(output_dir):
-        os.makedirs(output_dir)
-    shutil.copy(input_file_path, output_dir)
-
-
 class PaddleClas(object):
    """PaddleClas.
    """
@@ -350,24 +402,24 @@ class PaddleClas(object):
    print_info()

    def __init__(self,
-                 config: dict=None,
                 model_name: str=None,
                 inference_model_dir: str=None,
-                 use_gpu: bool=None,
-                 batch_size: int=None):
+                 use_gpu: bool=True,
+                 batch_size: int=1,
+                 topk: int=5,
+                 **kwargs):
        """Init PaddleClas with config.

        Args:
-            config: The config of PaddleClas's predictor, default by None. If default, the default configuration is used. Please refer doc for more information.
-            model_name: The model name supported by PaddleClas, default by None. If specified, override config.
-            inference_model_dir: The directory that contained model file and params file to be used, default by None. If specified, override config.
-            use_gpu: Wheather use GPU, default by None. If specified, override config.
-            batch_size: The batch size to pridict, default by None. If specified, override config.
+            model_name (str, optional): The model name supported by PaddleClas. Defaults to None.
+            inference_model_dir (str, optional): The directory containing the model file and params file to be used. Defaults to None.
+            use_gpu (bool, optional): Whether to use GPU. Defaults to True.
+            batch_size (int, optional): The batch size used for prediction. Defaults to 1.
+            topk (int, optional): Return the top k prediction results with the highest scores. Defaults to 5.
        """
        super().__init__()
-        self._config = config
-        self._check_config(model_name, inference_model_dir, use_gpu,
-                           batch_size)
+        self._config = init_config(model_name, inference_model_dir, use_gpu,
+                                   batch_size, topk, **kwargs)
        self._check_input_model()
        self.cls_predictor = ClsPredictor(self._config)

@@ -376,26 +428,6 @@ class PaddleClas(object):
        """
        return self._config

-    def _check_config(self,
-                      model_name=None,
-                      inference_model_dir=None,
-                      use_gpu=None,
-                      batch_size=None):
-        if self._config is None:
-            self._config = get_default_confg()
-            warnings.warn("config is not provided, use default!")
-        self._config = config.AttrDict(self._config)
-        config.create_attr_dict(self._config)
-
-        if model_name is not None:
-            self._config.Global["model_name"] = model_name
-        if inference_model_dir is not None:
-            self._config.Global["inference_model_dir"] = inference_model_dir
-        if use_gpu is not None:
-            self._config.Global["use_gpu"] = use_gpu
-        if batch_size is not None:
-            self._config.Global["batch_size"] = batch_size
-
    def _check_input_model(self):
        """Check input model name or model files.
        """
@@ -407,11 +439,8 @@ class PaddleClas(object):
            similar_names = similar_architectures(input_model_name,
                                                  candidate_model_names)
            similar_names_str = ", ".join(similar_names)
-            if input_model_name not in similar_names_str:
-                err = f"{input_model_name} is not exist! Maybe you want: [{similar_names_str}]"
-                raise InputModelError(err)
            if input_model_name not in candidate_model_names:
-                err = f"{input_model_name} is not provided by PaddleClas. If you want to use your own model, please input model_file as model path!"
+                err = f"{input_model_name} is not provided by PaddleClas. \nMaybe you want: [{similar_names_str}]. \nIf you want to use your own model, please specify inference_model_dir!"
                raise InputModelError(err)
            self._config.Global.inference_model_dir = check_model_file(
                input_model_name)
@@ -427,22 +456,35 @@ class PaddleClas(object):
                raise InputModelError(err)
            return
        else:
-            err = f"Please specify the model name supported by PaddleClas or directory contained model file and params file."
+            err = "Please specify a model name supported by PaddleClas or a directory containing the model files (inference.pdmodel, inference.pdiparams)."
            raise InputModelError(err)
        return

-    def predict(self, input_data, print_pred=True):
-        """Predict label of img with paddleclas.
+    def predict(self, input_data: Union[str, np.ndarray],
+                print_pred: bool=False) -> Generator[list, None, None]:
+        """Predict input_data.
+
        Args:
-            input_data(str, NumPy.ndarray): 
-                image to be classified, support: str(local path of image file, internet URL, directory containing series of images) and NumPy.ndarray(preprocessed image data that has 3 channels and accords with [C, H, W], or raw image data that has 3 channels and accords with [H, W, C]).
-        Returns:
-            dict: {image_name: "", class_id: [], scores: [], label_names: []}，if label name path == None，label_names will be empty.
+            input_data (Union[str, np.ndarray]): 
+                When the type is str, it is the path of an image, a directory containing images, or the URL of an image on the Internet.
+                When the type is np.ndarray, it is the image data whose channel order is RGB.
+            print_pred (bool, optional): Whether to print the prediction result. Defaults to False.
+
+        Raises:
+            ImageTypeError: Illegal input_data.
+
+        Yields:
+            Generator[list, None, None]: 
+                The prediction results of input_data, yielded one batch (of batch_size images) at a time. For each image, 
+                the prediction result is a dict that includes the topk "class_ids", "scores" and "label_names". 
+                The format is as follows: [{"class_ids": [...], "scores": [...], "label_names": [...]}, ...]
        """
+
        if isinstance(input_data, np.ndarray):
-            return self.cls_predictor.predict(input_data)
+            outputs = self.cls_predictor.predict(input_data)
+            yield self.cls_predictor.postprocess(outputs)
        elif isinstance(input_data, str):
-            if input_data.startswith("http"):
+            if input_data.startswith(("http://", "https://")):
                image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR)
                if not os.path.exists(image_storage_dir()):
                    os.makedirs(image_storage_dir())
@@ -455,12 +497,10 @@ class PaddleClas(object):
            image_list = get_image_list(input_data)

            batch_size = self._config.Global.get("batch_size", 1)
-            pre_label_out_idr = self._config.Global.get("pre_label_out_idr",
-                                                        False)
+            topk = self._config.PostProcess.get("Topk", {}).get("topk", 1)

            img_list = []
            img_path_list = []
-            output_list = []
            cnt = 0
            for idx, img_path in enumerate(image_list):
                img = cv2.imread(img_path)
@@ -469,30 +509,26 @@ class PaddleClas(object):
                        f"Image file failed to read and has been skipped. The path: {img_path}"
                    )
                    continue
+                img = img[:, :, ::-1]
                img_list.append(img)
                img_path_list.append(img_path)
                cnt += 1

                if cnt % batch_size == 0 or (idx + 1) == len(image_list):
                    outputs = self.cls_predictor.predict(img_list)
-                    output_list.append(outputs[0])
-                    preds = self.cls_predictor.postprocess(outputs)
-                    for nu, pred in enumerate(preds):
-                        if pre_label_out_idr:
-                            save_prelabel_results(pred["class_ids"][0],
-                                                  img_path_list[nu],
-                                                  pre_label_out_idr)
-                        if print_pred:
-                            pred_str_list = [
-                                f"filename: {img_path_list[nu]}",
-                                f"top-{self._config.PostProcess.get('topk', 1)}"
-                            ]
-                            for k in pred:
-                                pred_str_list.append(f"{k}: {pred[k]}")
-                            print(", ".join(pred_str_list))
+                    preds = self.cls_predictor.postprocess(outputs,
+                                                           img_path_list)
+                    if print_pred and preds:
+                        for pred in preds:
+                            filename = pred.pop("file_name")
+                            pred_str = ", ".join(
+                                [f"{k}: {pred[k]}" for k in pred])
+                            print(
+                                f"filename: {filename}, top-{topk}, {pred_str}")
+
                    img_list = []
                    img_path_list = []
-            return output_list
+                    yield preds
        else:
            err = "Please input legal image! The type of image supported by PaddleClas are: NumPy.ndarray and string of local path or Ineternet URL"
            raise ImageTypeError(err)
@@ -504,8 +540,11 @@ def main():
    """Function API used for commad line.
    """
    cfg = args_cfg()
-    clas_engine = PaddleClas(cfg)
-    clas_engine.predict(cfg["Global"]["infer_imgs"], print_pred=True)
+    clas_engine = PaddleClas(**cfg)
+    res = clas_engine.predict(cfg["infer_imgs"], print_pred=True)
+    for _ in res:
+        pass
+    print("Predict complete!")
    return
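
The change above turns `predict` into a generator that yields one list of results per batch, which is why `main()` has to drain it with a loop. The following is a minimal sketch of that pattern under stated assumptions: `predict_in_batches` and the `pred(...)` strings are hypothetical stand-ins, no model is involved, and only the flush condition (batch full, or last item reached) mirrors the diff.

```python
from typing import Generator, Iterable, List


def predict_in_batches(items: Iterable[str],
                       batch_size: int = 1) -> Generator[List[str], None, None]:
    """Yield fake predictions one batch at a time (illustrative stand-in)."""
    items = list(items)
    batch = []
    for idx, item in enumerate(items):
        batch.append(item)
        # Flush when the batch is full or when the last item has been reached.
        if len(batch) == batch_size or idx + 1 == len(items):
            yield [f"pred({x})" for x in batch]
            batch = []


gen = predict_in_batches(["a.jpg", "b.jpg", "c.jpg"], batch_size=2)
first_batch = next(gen)       # only the first batch has been computed so far
remaining = list(gen)         # iterating computes the remaining batches
print(first_batch, remaining)
```

Because nothing runs until the generator is advanced, calling `next(...)` once processes only the first batch. Any side effect that happens inside the loop is likewise limited to the batches actually consumed.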



--- a/ppcls/arch/backbone/legendary_models/resnet.py
+++ b/ppcls/arch/backbone/legendary_models/resnet.py
@@ -104,7 +104,8 @@ class ConvBNLayer(TheseusLayer):
                 groups=1,
                 is_vd_mode=False,
                 act=None,
-                 lr_mult=1.0):
+                 lr_mult=1.0,
+                 data_format="NCHW"):
        super().__init__()
        self.is_vd_mode = is_vd_mode
        self.act = act
@@ -118,11 +119,13 @@ class ConvBNLayer(TheseusLayer):
            padding=(filter_size - 1) // 2,
            groups=groups,
            weight_attr=ParamAttr(learning_rate=lr_mult),
-            bias_attr=False)
+            bias_attr=False,
+            data_format=data_format)
        self.bn = BatchNorm(
            num_filters,
            param_attr=ParamAttr(learning_rate=lr_mult),
-            bias_attr=ParamAttr(learning_rate=lr_mult))
+            bias_attr=ParamAttr(learning_rate=lr_mult),
+            data_layout=data_format)
        self.relu = nn.ReLU()

    def forward(self, x):
@@ -136,14 +139,14 @@ class ConvBNLayer(TheseusLayer):


 class BottleneckBlock(TheseusLayer):
-    def __init__(
-            self,
+    def __init__(self,
                 num_channels,
                 num_filters,
                 stride,
                 shortcut=True,
                 if_first=False,
-            lr_mult=1.0, ):
+                 lr_mult=1.0,
+                 data_format="NCHW"):
        super().__init__()

        self.conv0 = ConvBNLayer(
@@ -151,20 +154,23 @@ class BottleneckBlock(TheseusLayer):
            num_filters=num_filters,
            filter_size=1,
            act="relu",
-            lr_mult=lr_mult)
+            lr_mult=lr_mult,
+            data_format=data_format)
        self.conv1 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act="relu",
-            lr_mult=lr_mult)
+            lr_mult=lr_mult,
+            data_format=data_format)
        self.conv2 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters * 4,
            filter_size=1,
            act=None,
-            lr_mult=lr_mult)
+            lr_mult=lr_mult,
+            data_format=data_format)

        if not shortcut:
            self.short = ConvBNLayer(
@@ -173,7 +179,8 @@ class BottleneckBlock(TheseusLayer):
                filter_size=1,
                stride=stride if if_first else 1,
                is_vd_mode=False if if_first else True,
-                lr_mult=lr_mult)
+                lr_mult=lr_mult,
+                data_format=data_format)
        self.relu = nn.ReLU()
        self.shortcut = shortcut

@@ -199,7 +206,8 @@ class BasicBlock(TheseusLayer):
                 stride,
                 shortcut=True,
                 if_first=False,
-                 lr_mult=1.0):
+                 lr_mult=1.0,
+                 data_format="NCHW"):
        super().__init__()

        self.stride = stride
@@ -209,13 +217,15 @@ class BasicBlock(TheseusLayer):
            filter_size=3,
            stride=stride,
            act="relu",
-            lr_mult=lr_mult)
+            lr_mult=lr_mult,
+            data_format=data_format)
        self.conv1 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            act=None,
-            lr_mult=lr_mult)
+            lr_mult=lr_mult,
+            data_format=data_format)
        if not shortcut:
            self.short = ConvBNLayer(
                num_channels=num_channels,
@@ -223,7 +233,8 @@ class BasicBlock(TheseusLayer):
                filter_size=1,
                stride=stride if if_first else 1,
                is_vd_mode=False if if_first else True,
-                lr_mult=lr_mult)
+                lr_mult=lr_mult,
+                data_format=data_format)
        self.shortcut = shortcut
        self.relu = nn.ReLU()

@@ -256,7 +267,9 @@ class ResNet(TheseusLayer):
                 config,
                 version="vb",
                 class_num=1000,
-                 lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0]):
+                 lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0],
+                 data_format="NCHW",
+                 input_image_channel=3):
        super().__init__()

        self.cfg = config
@@ -279,22 +292,25 @@ class ResNet(TheseusLayer):

        self.stem_cfg = {
            #num_channels, num_filters, filter_size, stride
-            "vb": [[3, 64, 7, 2]],
-            "vd": [[3, 32, 3, 2], [32, 32, 3, 1], [32, 64, 3, 1]]
+            "vb": [[input_image_channel, 64, 7, 2]],
+            "vd":
+            [[input_image_channel, 32, 3, 2], [32, 32, 3, 1], [32, 64, 3, 1]]
        }

-        self.stem = nn.Sequential(*[
+        self.stem = nn.Sequential(*[
            ConvBNLayer(
                num_channels=in_c,
                num_filters=out_c,
                filter_size=k,
                stride=s,
                act="relu",
-                lr_mult=self.lr_mult_list[0])
+                lr_mult=self.lr_mult_list[0],
+                data_format=data_format)
            for in_c, out_c, k, s in self.stem_cfg[version]
        ])

-        self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1)
+        self.max_pool = MaxPool2D(
+            kernel_size=3, stride=2, padding=1, data_format=data_format)
        block_list = []
        for block_idx in range(len(self.block_depth)):
            shortcut = False
@@ -306,11 +322,12 @@ class ResNet(TheseusLayer):
                    stride=2 if i == 0 and block_idx != 0 else 1,
                    shortcut=shortcut,
                    if_first=block_idx == i == 0 if version == "vd" else True,
-                    lr_mult=self.lr_mult_list[block_idx + 1]))
+                    lr_mult=self.lr_mult_list[block_idx + 1],
+                    data_format=data_format))
                shortcut = True
        self.blocks = nn.Sequential(*block_list)

-        self.avg_pool = AdaptiveAvgPool2D(1)
+        self.avg_pool = AdaptiveAvgPool2D(1, data_format=data_format)
        self.flatten = nn.Flatten()
        self.avg_pool_channels = self.num_channels[-1] * 2
        stdv = 1.0 / math.sqrt(self.avg_pool_channels * 1.0)
@@ -319,7 +336,13 @@ class ResNet(TheseusLayer):
            self.class_num,
            weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)))

+        self.data_format = data_format
+
    def forward(self, x):
+        with paddle.static.amp.fp16_guard():
+            if self.data_format == "NHWC":
+                x = paddle.transpose(x, [0, 2, 3, 1])
+                x.stop_gradient = True
            x = self.stem(x)
            x = self.max_pool(x)
            x = self.blocks(x)
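
The `forward` change above transposes the input with the permutation `[0, 2, 3, 1]` when `data_format == "NHWC"`. As a small sketch of what that permutation does to a tensor's shape (pure Python, no Paddle required; `permute_shape` is a hypothetical helper, and the batch shape is an example value):

```python
def permute_shape(shape, perm):
    """Return the shape after a transpose with the given axis permutation.

    Output axis i takes its size from input axis perm[i], which is the
    usual convention for transpose operations.
    """
    return [shape[axis] for axis in perm]


nchw_shape = [8, 3, 224, 224]                      # N, C, H, W (example values)
nhwc_shape = permute_shape(nchw_shape, [0, 2, 3, 1])
print(nhwc_shape)
```

The channel axis moves from position 1 to the last position, while the batch axis stays first. Input is therefore still supplied as NCHW and converted once at the start of the network.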

--- a/ppcls/arch/backbone/model_zoo/gvt.py
+++ b/ppcls/arch/backbone/model_zoo/gvt.py
@@ -56,10 +56,10 @@ class GroupAttention(nn.Layer):
                 ws=1):
        super().__init__()
        if ws == 1:
-            raise Exception(f"ws {ws} should not be 1")
+            raise Exception("ws {} should not be 1".format(ws))
        if dim % num_heads != 0:
            raise Exception(
-                f"dim {dim} should be divided by num_heads {num_heads}.")
+                "dim {} should be divisible by num_heads {}.".format(dim, num_heads))

        self.dim = dim
        self.num_heads = num_heads
@@ -78,15 +78,15 @@ class GroupAttention(nn.Layer):
        total_groups = h_group * w_group
        x = x.reshape([B, h_group, self.ws, w_group, self.ws, C]).transpose(
            [0, 1, 3, 2, 4, 5])
-        qkv = self.qkv(x).reshape(
-            [B, total_groups, -1, 3, self.num_heads,
-             C // self.num_heads]).transpose([3, 0, 1, 4, 2, 5])
+        qkv = self.qkv(x).reshape([
+            B, total_groups, self.ws**2, 3, self.num_heads, C // self.num_heads
+        ]).transpose([3, 0, 1, 4, 2, 5])
        q, k, v = qkv[0], qkv[1], qkv[2]
-        attn = (q @k.transpose([0, 1, 2, 4, 3])) * self.scale
+        attn = (q @ k.transpose([0, 1, 2, 4, 3])) * self.scale

        attn = nn.Softmax(axis=-1)(attn)
        attn = self.attn_drop(attn)
-        attn = (attn @v).transpose([0, 1, 3, 2, 4]).reshape(
+        attn = (attn @ v).transpose([0, 1, 3, 2, 4]).reshape(
            [B, h_group, w_group, self.ws, self.ws, C])

        x = attn.transpose([0, 1, 3, 2, 4, 5]).reshape([B, N, C])
@@ -135,22 +135,23 @@ class Attention(nn.Layer):

        if self.sr_ratio > 1:
            x_ = x.transpose([0, 2, 1]).reshape([B, C, H, W])
-            x_ = self.sr(x_).reshape([B, C, -1]).transpose([0, 2, 1])
+            tmp_n = H * W // self.sr_ratio**2
+            x_ = self.sr(x_).reshape([B, C, tmp_n]).transpose([0, 2, 1])
            x_ = self.norm(x_)
            kv = self.kv(x_).reshape(
-                [B, -1, 2, self.num_heads, C // self.num_heads]).transpose(
+                [B, tmp_n, 2, self.num_heads, C // self.num_heads]).transpose(
                    [2, 0, 3, 1, 4])
        else:
            kv = self.kv(x).reshape(
-                [B, -1, 2, self.num_heads, C // self.num_heads]).transpose(
+                [B, N, 2, self.num_heads, C // self.num_heads]).transpose(
                    [2, 0, 3, 1, 4])
        k, v = kv[0], kv[1]

-        attn = (q @k.transpose([0, 1, 3, 2])) * self.scale
+        attn = (q @ k.transpose([0, 1, 3, 2])) * self.scale
        attn = nn.Softmax(axis=-1)(attn)
        attn = self.attn_drop(attn)

-        x = (attn @v).transpose([0, 2, 1, 3]).reshape([B, N, C])
+        x = (attn @ v).transpose([0, 2, 1, 3]).reshape([B, N, C])
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
@@ -280,7 +281,7 @@ class PyramidVisionTransformer(nn.Layer):
                 img_size=224,
                 patch_size=16,
                 in_chans=3,
-                 num_classes=1000,
+                 class_num=1000,
                 embed_dims=[64, 128, 256, 512],
                 num_heads=[1, 2, 4, 8],
                 mlp_ratios=[4, 4, 4, 4],
@@ -294,7 +295,7 @@ class PyramidVisionTransformer(nn.Layer):
                 sr_ratios=[8, 4, 2, 1],
                 block_cls=Block):
        super().__init__()
-        self.num_classes = num_classes
+        self.class_num = class_num
        self.depths = depths

        # patch_embed
@@ -317,7 +318,6 @@ class PyramidVisionTransformer(nn.Layer):
                self.create_parameter(
                    shape=[1, patch_num, embed_dims[i]],
                    default_initializer=zeros_))
-            self.add_parameter(f"pos_embeds_{i}", self.pos_embeds[i])
            self.pos_drops.append(nn.Dropout(p=drop_rate))

        dpr = [
@@ -354,7 +354,7 @@ class PyramidVisionTransformer(nn.Layer):

        # classification head
        self.head = nn.Linear(embed_dims[-1],
-                              num_classes) if num_classes > 0 else Identity()
+                              class_num) if class_num > 0 else Identity()

        # init weights
        for pos_emb in self.pos_embeds:
@@ -433,7 +433,7 @@ class CPVTV2(PyramidVisionTransformer):
                 img_size=224,
                 patch_size=4,
                 in_chans=3,
-                 num_classes=1000,
+                 class_num=1000,
                 embed_dims=[64, 128, 256, 512],
                 num_heads=[1, 2, 4, 8],
                 mlp_ratios=[4, 4, 4, 4],
@@ -446,10 +446,10 @@ class CPVTV2(PyramidVisionTransformer):
                 depths=[3, 4, 6, 3],
                 sr_ratios=[8, 4, 2, 1],
                 block_cls=Block):
-        super().__init__(img_size, patch_size, in_chans, num_classes,
-                         embed_dims, num_heads, mlp_ratios, qkv_bias, qk_scale,
-                         drop_rate, attn_drop_rate, drop_path_rate, norm_layer,
-                         depths, sr_ratios, block_cls)
+        super().__init__(img_size, patch_size, in_chans, class_num, embed_dims,
+                         num_heads, mlp_ratios, qkv_bias, qk_scale, drop_rate,
+                         attn_drop_rate, drop_path_rate, norm_layer, depths,
+                         sr_ratios, block_cls)
        del self.pos_embeds
        del self.cls_token
        self.pos_block = nn.LayerList(
@@ -488,7 +488,7 @@ class CPVTV2(PyramidVisionTransformer):
                    x = self.pos_block[i](x, H, W)  # PEG here

            if i < len(self.depths) - 1:
-                x = x.reshape([B, H, W, -1]).transpose([0, 3, 1, 2])
+                x = x.reshape([B, H, W, x.shape[-1]]).transpose([0, 3, 1, 2])

        x = self.norm(x)
        return x.mean(axis=1)  # GAP here
@@ -499,7 +499,7 @@ class PCPVT(CPVTV2):
                 img_size=224,
                 patch_size=4,
                 in_chans=3,
-                 num_classes=1000,
+                 class_num=1000,
                 embed_dims=[64, 128, 256],
                 num_heads=[1, 2, 4],
                 mlp_ratios=[4, 4, 4],
@@ -512,10 +512,10 @@ class PCPVT(CPVTV2):
                 depths=[4, 4, 4],
                 sr_ratios=[4, 2, 1],
                 block_cls=SBlock):
-        super().__init__(img_size, patch_size, in_chans, num_classes,
-                         embed_dims, num_heads, mlp_ratios, qkv_bias, qk_scale,
-                         drop_rate, attn_drop_rate, drop_path_rate, norm_layer,
-                         depths, sr_ratios, block_cls)
+        super().__init__(img_size, patch_size, in_chans, class_num, embed_dims,
+                         num_heads, mlp_ratios, qkv_bias, qk_scale, drop_rate,
+                         attn_drop_rate, drop_path_rate, norm_layer, depths,
+                         sr_ratios, block_cls)


 class ALTGVT(PCPVT):

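The gvt.py hunks above replace `-1` placeholders in `reshape` calls with explicitly computed sizes (`self.ws**2`, `H * W // self.sr_ratio**2`, `N`). A minimal pure-Python sketch of that arithmetic follows. The helper names are hypothetical, and it assumes the spatial-reduction layer `self.sr` shrinks each side by exactly `sr_ratio`, which this diff does not show.

```python
def group_token_count(ws):
    """Tokens inside one ws x ws attention window (the `self.ws**2` dim)."""
    return ws ** 2


def reduced_token_count(h, w, sr_ratio):
    """Tokens left after spatial reduction (the `tmp_n` value in the diff).

    Assumes h and w are both divisible by sr_ratio, so the integer
    division is exact.
    """
    if h % sr_ratio or w % sr_ratio:
        raise ValueError("h and w must be divisible by sr_ratio")
    return h * w // sr_ratio ** 2


print(group_token_count(7))             # 49
print(reduced_token_count(56, 56, 8))   # 49
```

Spelling the sizes out this way means every dimension is known before the reshape runs, rather than being inferred from the tensor at run time.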
--- a/ppcls/arch/backbone/model_zoo/levit.py
+++ b/ppcls/arch/backbone/model_zoo/levit.py
@@ -45,12 +45,13 @@ __all__ = list(MODEL_URLS.keys())
 def cal_attention_biases(attention_biases, attention_bias_idxs):
    gather_list = []
    attention_bias_t = paddle.transpose(attention_biases, (1, 0))
-    for idx in attention_bias_idxs:
-        gather = paddle.gather(attention_bias_t, idx)
+    nums = attention_bias_idxs.shape[0]
+    for idx in range(nums):
+        gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
        gather_list.append(gather)
    shape0, shape1 = attention_bias_idxs.shape
-    return paddle.transpose(paddle.concat(gather_list), (1, 0)).reshape(
-        (0, shape0, shape1))
+    gather = paddle.concat(gather_list)
+    return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))


 class Conv2d_BN(nn.Sequential):
@@ -127,11 +128,12 @@ class Residual(nn.Layer):

    def forward(self, x):
        if self.training and self.drop > 0:
-            return x + self.m(x) * paddle.rand(
-                x.size(0), 1, 1,
-                device=x.device).ge_(self.drop).div(1 - self.drop).detach()
+            y = paddle.rand(
+                shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
+            y = y.divide(paddle.full_like(y, 1 - self.drop))
+            return paddle.add(x, self.m(x) * y)
        else:
-            return x + self.m(x)
+            return paddle.add(x, self.m(x))


 class Attention(nn.Layer):
@@ -203,9 +205,9 @@ class Attention(nn.Layer):
                                                    self.attention_bias_idxs)
        else:
            attention_biases = self.ab
-        attn = ((q @k_transpose) * self.scale + attention_biases)
+        attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
        attn = F.softmax(attn)
-        x = paddle.transpose(attn @v, perm=[0, 2, 1, 3])
+        x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
        x = paddle.reshape(x, [B, N, self.dh])
        x = self.proj(x)
        return x
@@ -219,8 +221,9 @@ class Subsample(nn.Layer):

    def forward(self, x):
        B, N, C = x.shape
-        x = paddle.reshape(x, [B, self.resolution, self.resolution,
-                               C])[:, ::self.stride, ::self.stride]
+        x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
+        end1, end2 = x.shape[1], x.shape[2]
+        x = x[:, 0:end1:self.stride, 0:end2:self.stride]
        x = paddle.reshape(x, [B, -1, C])
        return x

@@ -315,13 +318,14 @@ class AttentionSubsample(nn.Layer):
        else:
            attention_biases = self.ab

-        attn = (q @paddle.transpose(
-            k, perm=[0, 1, 3, 2])) * self.scale + attention_biases
+        attn = (paddle.matmul(
+            q, paddle.transpose(
+                k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
        attn = F.softmax(attn)

        x = paddle.reshape(
            paddle.transpose(
-                (attn @v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
+                paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
        x = self.proj(x)
        return x

@@ -422,6 +426,8 @@ class LeViT(nn.Layer):
        x = paddle.transpose(x, perm=[0, 2, 1])
        x = self.blocks(x)
        x = x.mean(1)
+
+        x = paddle.reshape(x, [-1, self.embed_dim[-1]])
        if self.distillation:
            x = self.head(x), self.head_dist(x)
            if not self.training:

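The lines removed from `Residual.forward` in levit.py scale the branch output by a random per-sample mask divided by `1 - self.drop`, a form of stochastic depth. A minimal pure-Python sketch of that computation follows, with scalars standing in for tensors. The function names and the `rng` parameter are hypothetical, introduced only for illustration.

```python
import random


def drop_path_scale(batch_size, drop, rng=random):
    """Per-sample scale: 0 with probability `drop`, else 1 / (1 - drop)."""
    keep = 1.0 - drop
    return [(1.0 / keep) if rng.random() >= drop else 0.0
            for _ in range(batch_size)]


def residual(x, branch, scales):
    """Compute x + branch(x) * scale for each sample (x is a list of floats)."""
    return [xi + branch(xi) * s for xi, s in zip(x, scales)]


scales = drop_path_scale(4, drop=0.5, rng=random.Random(0))
out = residual([1.0, 2.0, 3.0, 4.0], lambda v: 10.0 * v, scales)
print(scales)
print(out)
```

Dividing the kept samples by `1 - drop` keeps the expected value of the branch term unchanged, so no rescaling is needed at evaluation time.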
--- a/ppcls/arch/backbone/model_zoo/mixnet.py
+++ b/ppcls/arch/backbone/model_zoo/mixnet.py
@@ -780,13 +780,6 @@ def _load_pretrained(pretrained, model, model_url, use_ssld=False):


 def MixNet_S(pretrained=False, use_ssld=False, **kwargs):
-    model = InceptionV4DY(**kwargs)
-    _load_pretrained(
-        pretrained, model, MODEL_URLS["InceptionV4"], use_ssld=use_ssld)
-    return model
-
-
-def MixNet_S(**kwargs):
    """
    MixNet-S model from 'MixConv: Mixed Depthwise Convolutional Kernels,'
    https://arxiv.org/abs/1907.09595.
@@ -798,7 +791,7 @@ def MixNet_S(**kwargs):
    return model


-def MixNet_M(**kwargs):
+def MixNet_M(pretrained=False, use_ssld=False, **kwargs):
    """
    MixNet-M model from 'MixConv: Mixed Depthwise Convolutional Kernels,'
    https://arxiv.org/abs/1907.09595.
@@ -810,7 +803,7 @@ def MixNet_M(**kwargs):
    return model


-def MixNet_L(**kwargs):
+def MixNet_L(pretrained=False, use_ssld=False, **kwargs):
    """
    MixNet-L model from 'MixConv: Mixed Depthwise Convolutional Kernels,'
    https://arxiv.org/abs/1907.09595.

--- a/ppcls/arch/gears/cosmargin.py
+++ b/ppcls/arch/gears/cosmargin.py
@@ -38,7 +38,7 @@ class CosMargin(paddle.nn.Layer):

        input_norm = paddle.sqrt(
            paddle.sum(paddle.square(input), axis=1, keepdim=True))
-        input = paddle.divide(input, x_norm)
+        input = paddle.divide(input, input_norm)

        weight = self.fc.weight
        weight_norm = paddle.sqrt(

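The cosmargin.py fix above divides `input` by its own L2 norm (`input_norm`) instead of the undefined name `x_norm`. A minimal pure-Python sketch of that row-wise normalization follows. The function name is hypothetical, and a plain list stands in for one row of the tensor.

```python
import math


def l2_normalize(row):
    """Divide a vector by its Euclidean norm, mirroring input / input_norm."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```

An all-zero row would raise `ZeroDivisionError` in this sketch, so a real implementation may want a small epsilon in the denominator.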
--- a/ppcls/configs/ImageNet/DPN/DPN107.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN107.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/DPN/DPN131.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN131.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/DPN/DPN68.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN68.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/DPN/DPN92.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN92.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/DPN/DPN98.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN98.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
+++ b/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml
+++ b/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
  Eval:
    - CELoss:

--- a/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml
+++ b/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
  Eval:
    - CELoss:

--- a/ppcls/configs/ImageNet/Inception/InceptionV3.yaml
+++ b/ppcls/configs/ImageNet/Inception/InceptionV3.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/Inception/InceptionV4.yaml
+++ b/ppcls/configs/ImageNet/Inception/InceptionV4.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for training/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

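The configs above switch the training loss from `CELoss` to `MixCELoss`, some of them with `epsilon: 0.1`. The implementation of `MixCELoss` is not part of this diff. The sketch below assumes it follows the usual mixup formulation: a weighted sum of two cross-entropy terms, with `epsilon` read as uniform label smoothing. Function names and argument order are hypothetical.

```python
import math


def cross_entropy(logits, target, epsilon=0.0):
    """Softmax cross-entropy for one sample with optional label smoothing."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    k = len(logits)
    loss = 0.0
    for i, v in enumerate(logits):
        # Smoothed target: epsilon / k on every class, plus
        # (1 - epsilon) on the true class.
        q = epsilon / k + ((1.0 - epsilon) if i == target else 0.0)
        loss -= q * (v - log_z)
    return loss


def mix_ce_loss(logits, target_a, target_b, lam, epsilon=0.0):
    """Assumed mixup loss: lam * CE(target_a) + (1 - lam) * CE(target_b)."""
    return (lam * cross_entropy(logits, target_a, epsilon) +
            (1.0 - lam) * cross_entropy(logits, target_b, epsilon))


print(mix_ce_loss([2.0, 0.5, -1.0], target_a=0, target_b=2, lam=0.7,
                  epsilon=0.1))
```

With `lam=1.0` the sketch reduces to plain cross-entropy on `target_a`, which is the unmixed case.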
--- a/ppcls/configs/ImageNet/MixNet/MixNet_L.yaml
+++ b/ppcls/configs/ImageNet/MixNet/MixNet_L.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: MixNet_L
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
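The `Optimizer.lr` block in MixNet_L.yaml uses a `Piecewise` schedule with `decay_epochs: [30, 60, 90]` and `values: [0.1, 0.01, 0.001, 0.0001]`. A minimal epoch-level sketch of how such a schedule selects a value follows. The function name is hypothetical, and the real scheduler may operate on steps rather than epochs.

```python
import bisect


def piecewise_lr(epoch, decay_epochs, values):
    """Pick values[i], where i is the number of boundaries already reached."""
    if len(values) != len(decay_epochs) + 1:
        raise ValueError("values must have one more entry than decay_epochs")
    return values[bisect.bisect_right(decay_epochs, epoch)]


decay_epochs = [30, 60, 90]
values = [0.1, 0.01, 0.001, 0.0001]
for epoch in (0, 29, 30, 60, 119):
    print(epoch, piecewise_lr(epoch, decay_epochs, values))
```

In this sketch the learning rate drops at the boundary epoch itself, so epoch 30 already uses `0.01`.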
--- a/ppcls/configs/ImageNet/MixNet/MixNet_M.yaml
+++ b/ppcls/configs/ImageNet/MixNet/MixNet_M.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: MixNet_M
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
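Each `NormalizeImage` entry in the config above carries `scale: 1.0/255.0` together with per-channel `mean` and `std`. A minimal sketch of the per-pixel arithmetic those values imply follows, assuming the operator computes `(pixel * scale - mean) / std`. The function name is hypothetical.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, scale=1.0 / 255.0, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize per channel."""
    return [(c * scale - m) / s for c, m, s in zip(rgb, mean, std)]


print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

A black pixel maps to `-mean / std` per channel and a white pixel to `(1 - mean) / std`.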
--- a/ppcls/configs/ImageNet/MixNet/MixNet_S.yaml
+++ b/ppcls/configs/ImageNet/MixNet/MixNet_S.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: MixNet_S
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
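The `Metric` section above requests `TopkAcc` with `topk: [1, 5]`. A minimal pure-Python sketch of top-k accuracy follows. The function name and the toy scores are hypothetical and only illustrate the definition.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [2, 0]
print(topk_accuracy(scores, labels, k=1))  # 0.5
print(topk_accuracy(scores, labels, k=2))  # 1.0
```

Top-k accuracy can only rise or stay level as `k` grows, which is why top-5 is reported alongside top-1.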
--- a/ppcls/configs/ImageNet/ReXNet/ReXNet_1_0.yaml
+++ b/ppcls/configs/ImageNet/ReXNet/ReXNet_1_0.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: ReXNet_1_0
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/ReXNet/ReXNet_1_3.yaml
+++ b/ppcls/configs/ImageNet/ReXNet/ReXNet_1_3.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: ReXNet_1_3
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/ReXNet/ReXNet_1_5.yaml
+++ b/ppcls/configs/ImageNet/ReXNet/ReXNet_1_5.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: ReXNet_1_5
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
--- a/ppcls/configs/ImageNet/ReXNet/ReXNet_2_0.yaml
+++ b/ppcls/configs/ImageNet/ReXNet/ReXNet_2_0.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: ReXNet_2_0
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
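Note on the `Piecewise` schedule used in the config above: a minimal sketch of how `decay_epochs: [30, 60, 90]` and `values: [0.1, 0.01, 0.001, 0.0001]` could map an epoch to a learning rate. The function name `piecewise_lr` is hypothetical, and the sketch assumes 0-based epochs with the rate switching exactly at each boundary; the real trainer may apply the schedule per step rather than per epoch.

```python
from bisect import bisect_right


def piecewise_lr(epoch, decay_epochs=(30, 60, 90),
                 values=(0.1, 0.01, 0.001, 0.0001)):
    """Return the learning rate for a 0-based epoch under a step schedule.

    Assumption: the rate changes as soon as `epoch` reaches a boundary.
    """
    if len(values) != len(decay_epochs) + 1:
        raise ValueError("values must have exactly one more entry than decay_epochs")
    # bisect_right counts how many boundaries are <= epoch,
    # which is the index of the active value.
    return values[bisect_right(decay_epochs, epoch)]


if __name__ == "__main__":
    for e in (0, 29, 30, 60, 90, 119):
        print(e, piecewise_lr(e))
```

With `epochs: 120`, the last value (0.0001) would apply to epochs 90 through 119 under these assumptions.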
--- a/ppcls/configs/ImageNet/ReXNet/ReXNet_3_0.yaml
+++ b/ppcls/configs/ImageNet/ReXNet/ReXNet_3_0.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: ReXNet_3_0
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
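Note on the `Infer.PostProcess` block above (`name: Topk`, `topk: 5`, `class_id_map_file`): a rough sketch of what a top-k post-processing step typically does. The function `topk_postprocess` and the dictionary keys are hypothetical names for illustration, not the repository's API, and the id-to-label mapping is passed in directly instead of being parsed from the label file.

```python
def topk_postprocess(scores, k=5, id_to_label=None):
    """Pick the k highest-scoring class ids and optionally attach label names."""
    # Sort class indices by score, highest first, and keep the first k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    result = {"class_ids": order, "scores": [scores[i] for i in order]}
    if id_to_label is not None:
        # Unknown ids fall back to an empty label (an assumption of this sketch).
        result["label_names"] = [id_to_label.get(i, "") for i in order]
    return result


if __name__ == "__main__":
    print(topk_postprocess([0.1, 0.5, 0.05, 0.3, 0.05], k=2,
                           id_to_label={1: "cat", 3: "dog"}))
```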
--- a/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
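
Note on the `CELoss` to `MixCELoss` switch in this and the following hunks: the diff for `ppcls/loss/celoss.py` is not shown here, so the following is only a sketch of the usual mixup-style loss, assuming `MixCELoss` blends the cross entropy of two targets with a mixing weight `lam` and that `epsilon: 0.1` is uniform label smoothing. The names `smoothed_ce` and `mix_ce` are hypothetical.

```python
import math


def smoothed_ce(logits, target, epsilon=0.1):
    """Cross entropy of one sample against a label-smoothed one-hot target."""
    m = max(logits)
    # Numerically stable log-sum-exp for the softmax normaliser.
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    n = len(logits)
    loss = 0.0
    for i, x in enumerate(logits):
        # Smoothed target: (1 - eps) on the true class plus eps / n on every class.
        weight = (1.0 - epsilon) * (1.0 if i == target else 0.0) + epsilon / n
        loss -= weight * (x - log_z)
    return loss


def mix_ce(logits, target_a, target_b, lam, epsilon=0.1):
    """Mixup-style loss: lam * CE(target_a) + (1 - lam) * CE(target_b)."""
    return (lam * smoothed_ce(logits, target_a, epsilon)
            + (1.0 - lam) * smoothed_ce(logits, target_b, epsilon))


if __name__ == "__main__":
    print(mix_ce([2.0, 0.5, -1.0], target_a=0, target_b=1, lam=0.7))
```

If that reading is right, a plain `CELoss` would not accept the two-target batches that mixup-style transforms produce, which would explain why the training loss is renamed while `Eval` is left alone.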

--- a/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml
+++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml
+++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml
+++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_channel: &image_channel 4
+  image_shape: [*image_channel, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# mixed precision training
+AMP:
+  scale_loss: 128.0
+  use_dynamic_loss_scaling: True
+  use_pure_fp16: &use_pure_fp16 True
+
+# model architecture
+Arch:
+  name: ResNet50
+  class_num: 1000
+  input_image_channel: *image_channel
+  data_format: "NHWC"
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  multi_precision: *use_pure_fp16
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+            output_fp16: *use_pure_fp16
+            channel_num: *image_channel
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 32
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+            output_fp16: *use_pure_fp16
+            channel_num: *image_channel
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+        output_fp16: *use_pure_fp16
+        channel_num: *image_channel
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
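Note on `NormalizeImage` in the fp16 config above: `image_channel: &image_channel 4` defines a YAML anchor that `channel_num: *image_channel` and `input_image_channel: *image_channel` reuse, so the transform and the model agree on a 4-channel input. The sketch below shows the per-pixel arithmetic implied by `scale`, `mean`, and `std`. Padding the fourth channel with zeros is an assumption about how `channel_num: 4` is handled, and `normalize_pixel` is a hypothetical helper, not the operator in `operators.py`.

```python
def normalize_pixel(rgb, scale=1.0 / 255.0,
                    mean=(0.485, 0.456, 0.406),
                    std=(0.229, 0.224, 0.225),
                    channel_num=3):
    """Normalise one RGB pixel: (value * scale - mean) / std per channel."""
    out = [(v * scale - m) / s for v, m, s in zip(rgb, mean, std)]
    if channel_num == 4:
        # Assumption: the extra channel is zero padding so the tensor has 4 channels.
        out.append(0.0)
    return out


if __name__ == "__main__":
    print(normalize_pixel((255, 128, 0), channel_num=4))
```

The cast to half precision requested by `output_fp16` is left out of this sketch.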
--- a/ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  image_channel: &image_channel 4
+  # used for static mode and model export
+  image_shape: [*image_channel, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+  use_dali: True
+
+# mixed precision training
+AMP:
+  scale_loss: 128.0
+  use_dynamic_loss_scaling: True
+  use_pure_fp16: &use_pure_fp16 False
+
+# model architecture
+Arch:
+  name: ResNet50
+  class_num: 1000
+  input_image_channel: *image_channel
+  data_format: "NHWC"
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+            output_fp16: *use_pure_fp16
+            channel_num: *image_channel
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+            output_fp16: *use_pure_fp16
+            channel_num: *image_channel
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+        output_fp16: *use_pure_fp16
+        channel_num: *image_channel
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
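Note on the `AMP` block above (`scale_loss: 128.0`, `use_dynamic_loss_scaling: True`): a generic sketch of dynamic loss scaling, not the framework's implementation. The class `DynamicLossScaler` and the parameters `growth_interval`, `growth_factor`, and `backoff_factor` are hypothetical and do not appear in the config; only the initial scale of 128.0 is taken from it.

```python
class DynamicLossScaler:
    """Grow the loss scale after a run of clean steps, shrink it on overflow."""

    def __init__(self, init_scale=128.0, growth_interval=1000,
                 growth_factor=2.0, backoff_factor=0.5):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.good_steps = 0

    def update(self, found_inf):
        """Return True if the optimizer step should be applied."""
        if found_inf:
            # Overflow in the scaled gradients: skip this step and back off.
            self.scale *= self.backoff_factor
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # Enough clean steps in a row: try a larger scale.
            self.scale *= self.growth_factor
            self.good_steps = 0
        return True


if __name__ == "__main__":
    scaler = DynamicLossScaler(growth_interval=2)
    for overflow in (False, False, True):
        print(scaler.update(overflow), scaler.scale)
```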
--- a/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_fp16.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_fp16.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 200
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_channel: &image_channel 4
+  image_shape: [*image_channel, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: SE_ResNeXt101_32x4d
+  class_num: 1000
+  input_image_channel: *image_channel
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+# mixed precision training
+AMP:
+  scale_loss: 128.0
+  use_dynamic_loss_scaling: True
+  use_pure_fp16: &use_pure_fp16 True
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.1
+  regularizer:
+    name: 'L2'
+    coeff: 0.00007
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+            output_fp16: *use_pure_fp16
+            channel_num: *image_channel
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+            output_fp16: *use_pure_fp16
+            channel_num: *image_channel
+    sampler:
+      name: BatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+        output_fp16: *use_pure_fp16
+        channel_num: *image_channel
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
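Note on the `Cosine` schedule in the config above (`learning_rate: 0.1`, `epochs: 200`): a minimal epoch-level sketch of cosine annealing. The function name `cosine_lr` is hypothetical, and the sketch assumes no warmup and a decay to zero at the final epoch; the real scheduler may step per iteration.

```python
import math


def cosine_lr(epoch, total_epochs=200, base_lr=0.1):
    """Cosine annealing: base_lr at epoch 0, decaying smoothly to 0 at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


if __name__ == "__main__":
    for e in (0, 50, 100, 150, 200):
        print(e, round(cosine_lr(e), 6))
```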
--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: alt_gvt_base
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
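Note on the `Metric` block above (`TopkAcc` with `topk: [1, 5]`): a small sketch of how top-k accuracy is commonly computed over a batch. `topk_accuracy` is a hypothetical helper written with plain lists; it is not the metric class from the repository.

```python
def topk_accuracy(batch_scores, labels, topk=(1, 5)):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = {k: 0 for k in topk}
    for scores, label in zip(batch_scores, labels):
        # Class indices ordered by descending score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for k in topk:
            if label in ranked[:k]:
                hits[k] += 1
    return {k: hits[k] / len(labels) for k in topk}


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
    print(topk_accuracy(scores, labels=[1, 1], topk=(1, 2)))
```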
--- a/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  # training model under @to_static
+  to_static: False
+
+# model architecture
+Arch:
+  name: alt_gvt_large
+  class_num: 1000
+ 
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
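Note on the eval and infer transforms above (`ResizeImage` with `resize_short: 256` followed by `CropImage` with `size: 224`): a sketch of the geometry this pair implies, assuming the short side is scaled to 256 with the aspect ratio preserved and the crop is centred. The helper name and the rounding mode are assumptions of this sketch.

```python
def resize_short_then_center_crop(width, height, resize_short=256, crop=224):
    """Return the resized (w, h) and the centred crop box (left, top, right, bottom)."""
    # Scale so that the shorter side becomes `resize_short`.
    ratio = resize_short / min(width, height)
    new_w, new_h = round(width * ratio), round(height * ratio)
    # Centre the crop window inside the resized image.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)


if __name__ == "__main__":
    print(resize_short_then_center_crop(500, 375))
```

For a 500x375 input this gives a 341x256 resized image and a 224x224 window starting at (58, 16) under these assumptions.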
--- a/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml
--- a/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml
--- a/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml
--- a/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml
--- a/ppcls/configs/ImageNet/Xception/Xception65.yaml
+++ b/ppcls/configs/ImageNet/Xception/Xception65.yaml
@@ -16,13 +16,13 @@ Global:

 # model architecture
 Arch:
-  name: Xception41_deeplab
+  name: Xception65
  class_num: 1000
 
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/ImageNet/Xception/Xception71.yaml
+++ b/ppcls/configs/ImageNet/Xception/Xception71.yaml
@@ -22,7 +22,7 @@ Arch:
 # loss function config for traing/eval process
 Loss:
  Train:
-    - CELoss:
+    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:

--- a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
+++ b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
--- a/ppcls/configs/Products/ResNet50_vd_SOP.yaml
+++ b/ppcls/configs/Products/ResNet50_vd_SOP.yaml
--- a/ppcls/data/__init__.py
+++ b/ppcls/data/__init__.py
--- a/ppcls/data/dataloader/dali.py
+++ b/ppcls/data/dataloader/dali.py
--- a/ppcls/data/dataloader/vehicle_dataset.py
+++ b/ppcls/data/dataloader/vehicle_dataset.py
--- a/ppcls/data/preprocess/ops/operators.py
+++ b/ppcls/data/preprocess/ops/operators.py
--- a/ppcls/engine/trainer.py
+++ b/ppcls/engine/trainer.py
--- a/ppcls/loss/__init__.py
+++ b/ppcls/loss/__init__.py
--- a/ppcls/loss/celoss.py
+++ b/ppcls/loss/celoss.py
--- a/ppcls/optimizer/__init__.py
+++ b/ppcls/optimizer/__init__.py
--- a/ppcls/optimizer/optimizer.py
+++ b/ppcls/optimizer/optimizer.py
--- a/ppcls/static/program.py
+++ b/ppcls/static/program.py
--- a/ppcls/static/run_dali.sh
+++ b/ppcls/static/run_dali.sh
--- a/ppcls/static/save_load.py
+++ b/ppcls/static/save_load.py
--- a/ppcls/static/train.py
+++ b/ppcls/static/train.py
--- a/ppcls/utils/save_load.py
+++ b/ppcls/utils/save_load.py