diff --git a/README_ch.md b/README_ch.md
index 818572447c99bb98ab0b01a765855b5b658e23d7..cc5c214c8156f4b58417f04a5b59a29379cf090a 100644
--- a/README_ch.md
+++ b/README_ch.md
@@ -8,9 +8,10 @@
**Recent updates**
+- 2021.07.08, 07.27 Added 26 [FAQ](docs/zh_CN/faq_series/faq_2021_s2.md) entries
- 2021.06.29 Added the Swin-transformer series models; the highest top-1 acc on the ImageNet1k dataset reaches 87.2%. Training, prediction, evaluation and whl-package deployment are supported, and pretrained models can be downloaded [here](docs/zh_CN/models/models_intro.md).
- 2021.06.22, 23, 24 The official PaddleClas R&D team gave a three-day livestream course with in-depth technical explanations. Course replay: [https://aistudio.baidu.com/aistudio/course/introduce/24519](https://aistudio.baidu.com/aistudio/course/introduce/24519)
-- 2021.06.16 Upgraded PaddleClas to v2.2, integrating components such as metric learning and vector search. Added 4 image recognition applications: product recognition, cartoon character recognition, vehicle recognition and logo recognition. Added 24 pretrained models across the LeViT, TNT, DLA, HarDNet and RedNet series.
+- 2021.06.16 Upgraded PaddleClas to v2.2, integrating components such as metric learning and vector search. Added 4 image recognition applications: product recognition, cartoon character recognition, vehicle recognition and logo recognition. Added 30 pretrained models across the LeViT, Twins, TNT, DLA, HarDNet and RedNet series.
- [more](./docs/zh_CN/update_history.md)
## Features
@@ -18,7 +19,7 @@
- A practical image recognition system: integrates object detection, feature learning, image retrieval and other modules, widely applicable to all kinds of image recognition tasks.
Provides 4 scenario application examples: product recognition, vehicle recognition, logo recognition and cartoon character recognition.
-- A rich pretrained model zoo: provides 150 ImageNet pretrained models in 33 series, among which 6 selected series support fast structural modification.
+- A rich pretrained model zoo: provides 164 ImageNet pretrained models in 35 series, among which 6 selected series support fast structural modification.
- Comprehensive and easy-to-use feature learning components: integrates 12 metric learning methods such as arcmargin and triplet loss, which can be freely combined and switched through configuration files.
@@ -78,7 +79,8 @@ The Top-1 accuracy of the Res2Net200_vd pretrained model reaches 85.1%.
- [Knowledge Distillation](./docs/zh_CN/advanced_tutorials/distillation/distillation.md)
- [Model Quantization](./docs/zh_CN/extension/paddle_quantization.md)
- [Data Augmentation](./docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md)
-- FAQ (updates paused)
+- FAQ
+  - [Image Recognition Task FAQ](docs/zh_CN/faq_series/faq_2021_s2.md)
  - [Image Classification Task FAQ](docs/zh_CN/faq.md)
- [License](#许可证书)
- [Contribution](#贡献代码)
diff --git a/README_en.md b/README_en.md
index 374792192766b1595144eccc8bbf2208e5966565..bc4f59cf9cec7fc86f9182254e2c355dc1e76156 100644
--- a/README_en.md
+++ b/README_en.md
@@ -9,7 +9,7 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
**Recent updates**
- 2021.06.29 Added the Swin-transformer series models; the highest top-1 acc on the ImageNet1k dataset reaches 87.2%. Training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
-- 2021.06.16 PaddleClas release/2.2. Added metric learning and vector search modules, along with product recognition, animation character recognition, vehicle recognition and logo recognition. Added 24 pretrained models of LeViT, TNT, DLA, HarDNet, and RedNet, with accuracy roughly on par with the papers.
+- 2021.06.16 PaddleClas release/2.2. Added metric learning and vector search modules, along with product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, with accuracy roughly on par with the papers.
- [more](./docs/en/update_history_en.md)
## Features
@@ -17,7 +17,7 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
- A practical image recognition system consisting of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
-- Rich library of pre-trained models: Provides a total of 150 ImageNet pre-trained models in 33 series, among which 6 selected series of models support fast structural modification.
+- Rich library of pre-trained models: Provides a total of 164 ImageNet pre-trained models in 35 series, among which 6 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
@@ -51,7 +51,7 @@ Quick experience of image recognition:[Link](./docs/en/tutorials/quick_start_r
- [Introduction to Image Recognition Systems](#Introduction_to_Image_Recognition_Systems)
- [Demo images](#Demo_images)
- Algorithms Introduction
- - [Backbone Network and Pre-trained Model Library](./docs/en/ImageNet_models.md)
+ - [Backbone Network and Pre-trained Model Library](./docs/en/ImageNet_models_en.md)
- [Mainbody Detection](./docs/en/application/mainbody_detection_en.md)
- [Image Classification](./docs/en/tutorials/image_classification_en.md)
- [Feature Learning](./docs/en/application/feature_learning_en.md)
diff --git a/deploy/auto_log.log b/deploy/auto_log.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/deploy/configs/build_cartoon.yaml b/deploy/configs/build_cartoon.yaml
index f739cde33eb6efbede67baf58c4ca5d7ace394bf..c73279801dfedbf9576ef8013226bb1cc4ba02cd 100644
--- a/deploy/configs/build_cartoon.yaml
+++ b/deploy/configs/build_cartoon.yaml
@@ -1,9 +1,9 @@
Global:
rec_inference_model_dir: "./models/cartoon_rec_ResNet50_iCartoon_v1.0_infer/"
- batch_size: 1
+ batch_size: 32
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
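
These build_*.yaml configs now enable MKL-DNN, cap CPU threads at 10 (100 would oversubscribe most machines), and raise the feature-extraction batch size to 32. As a rough, non-authoritative sketch of what the `enable_mkldnn`/`cpu_num_threads`/`ir_optim` flags toggle on the CPU path (the real wiring lives in `deploy/utils/predictor.py`; the model paths here are placeholders):

```python
from paddle.inference import Config, create_predictor

# Minimal sketch, assuming placeholder model files; shows which Paddle
# Inference switches the Global flags above correspond to on CPU.
config = Config("./models/inference.pdmodel", "./models/inference.pdiparams")
config.disable_gpu()                         # CPU path, where MKL-DNN applies
config.enable_mkldnn()                       # enable_mkldnn: True
config.set_cpu_math_library_num_threads(10)  # cpu_num_threads: 10
config.switch_ir_optim(True)                 # ir_optim: True
predictor = create_predictor(config)
```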
diff --git a/deploy/configs/build_logo.yaml b/deploy/configs/build_logo.yaml
index a806ec00606706d2609f50bac5247cb08ecac393..5be17ed978d57ee5e084d14ee348b090d61fb5c7 100644
--- a/deploy/configs/build_logo.yaml
+++ b/deploy/configs/build_logo.yaml
@@ -1,9 +1,9 @@
Global:
rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/"
- batch_size: 1
+ batch_size: 32
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/build_product.yaml b/deploy/configs/build_product.yaml
index 679a1a6746612f5bc9c4734d9670a507e341a7fb..59e3b29baec066f6b7cf87c90fca69793d12482e 100644
--- a/deploy/configs/build_product.yaml
+++ b/deploy/configs/build_product.yaml
@@ -1,9 +1,9 @@
Global:
rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer"
- batch_size: 1
+ batch_size: 32
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/build_vehicle.yaml b/deploy/configs/build_vehicle.yaml
index e149d938b88c07bfafdd6729730cf2599f6c9e4d..be095f4e1eb78c81b1b0b083b10352e7f50ad25d 100644
--- a/deploy/configs/build_vehicle.yaml
+++ b/deploy/configs/build_vehicle.yaml
@@ -1,9 +1,9 @@
Global:
rec_inference_model_dir: "./models/vehicle_cls_ResNet50_CompCars_v1.0_infer/"
- batch_size: 1
+ batch_size: 32
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/inference_cartoon.yaml b/deploy/configs/inference_cartoon.yaml
index bb08935a168db312c954dab5c84ea4c2347847a4..fb345530232da2e377b96f8644ea4dee0f47421c 100644
--- a/deploy/configs/inference_cartoon.yaml
+++ b/deploy/configs/inference_cartoon.yaml
@@ -12,8 +12,8 @@ Global:
- foreground
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/inference_cls.yaml b/deploy/configs/inference_cls.yaml
index f58882b7efd1535ab422a0ec0d778754942f599f..7954880429cf52fcc905183ccf5976964d5996b5 100644
--- a/deploy/configs/inference_cls.yaml
+++ b/deploy/configs/inference_cls.yaml
@@ -3,8 +3,8 @@ Global:
inference_model_dir: "./models"
batch_size: 1
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
@@ -22,6 +22,7 @@ PreProcess:
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
+ channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Topk
@@ -29,4 +30,4 @@ PostProcess:
topk: 5
class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
SavePreLabel:
- save_dir: ./pre_label/
\ No newline at end of file
+ save_dir: ./pre_label/
diff --git a/deploy/configs/inference_cls_ch4.yaml b/deploy/configs/inference_cls_ch4.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e8916ec3142eda6ee885fcf9a618f24693a2f1ae
--- /dev/null
+++ b/deploy/configs/inference_cls_ch4.yaml
@@ -0,0 +1,33 @@
+Global:
+ infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg"
+ inference_model_dir: "./models"
+ batch_size: 1
+ use_gpu: True
+ enable_mkldnn: True
+ cpu_num_threads: 10
+ enable_benchmark: True
+ use_fp16: False
+ ir_optim: True
+ use_tensorrt: False
+ gpu_mem: 8000
+ enable_profile: False
+PreProcess:
+ transform_ops:
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 0.00392157
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ channel_num: 4
+ - ToCHWImage:
+PostProcess:
+ main_indicator: Topk
+ Topk:
+ topk: 5
+ class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
+ SavePreLabel:
+ save_dir: ./pre_label/
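
Compared with `inference_cls.yaml`, this new config sets `channel_num: 4`, so the predictor expects a fourth image channel. A hypothetical pre-step (not part of this PR) for feeding an ordinary 3-channel image would be to pad an alpha channel first:

```python
import cv2

# Illustrative only: add an alpha channel to a BGR image so its shape
# matches the channel_num: 4 preprocessing configured above.
img = cv2.imread("./images/ILSVRC2012_val_00000010.jpeg")
if img is not None and img.shape[-1] == 3:
    img = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)
    print(img.shape)  # (H, W, 4)
```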
diff --git a/deploy/configs/inference_det.yaml b/deploy/configs/inference_det.yaml
index 5236dd7767fbb4db37f7fd720faf5e7bd995dc85..7180c599c504aa46aa06f20c944e81101dc4b8f6 100644
--- a/deploy/configs/inference_det.yaml
+++ b/deploy/configs/inference_det.yaml
@@ -10,8 +10,8 @@ Global:
# inference engine config
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/inference_logo.yaml b/deploy/configs/inference_logo.yaml
index a98b6c3870083065c8a426a5477f55321d4aa2f0..8be5c33ad404e8d555d5208f0352dd873e78812a 100644
--- a/deploy/configs/inference_logo.yaml
+++ b/deploy/configs/inference_logo.yaml
@@ -13,8 +13,8 @@ Global:
# inference engine config
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/inference_product.yaml b/deploy/configs/inference_product.yaml
index f75fee3e151028a4345b6ce728121deffd0a6ab4..871d55d55778240a645a4a87fe96694e556b7684 100644
--- a/deploy/configs/inference_product.yaml
+++ b/deploy/configs/inference_product.yaml
@@ -13,8 +13,8 @@ Global:
# inference engine config
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/inference_rec.yaml b/deploy/configs/inference_rec.yaml
index dd906880cfb1c86800feeb042817ff0ae7ddf9a5..5346510ba8bf2120df3674552850a37967d714c5 100644
--- a/deploy/configs/inference_rec.yaml
+++ b/deploy/configs/inference_rec.yaml
@@ -10,8 +10,8 @@ Global:
# inference engine config
use_gpu: False
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/configs/inference_vehicle.yaml b/deploy/configs/inference_vehicle.yaml
index 17f70abccd725363da58f54494627b13f450cb6a..8edcb8d5dfef70081f5b6cd9185ee22133f90756 100644
--- a/deploy/configs/inference_vehicle.yaml
+++ b/deploy/configs/inference_vehicle.yaml
@@ -13,8 +13,8 @@ Global:
# inference engine config
use_gpu: True
- enable_mkldnn: False
- cpu_num_threads: 100
+ enable_mkldnn: True
+ cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
diff --git a/deploy/paddleserving/image_http_client.py b/deploy/paddleserving/image_http_client.py
index 3b92091c659613c83e4423a3f22b0d4d20321f43..4e33c4a7e4bb60b2937b9a0073cb16c74fe4d911 100644
--- a/deploy/paddleserving/image_http_client.py
+++ b/deploy/paddleserving/image_http_client.py
@@ -22,10 +22,9 @@ py_version = sys.version_info[0]
def predict(image_path, server):
- if py_version == 2:
- image = base64.b64encode(open(image_path).read())
- else:
- image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
+
+ with open(image_path, "rb") as f:
+ image = base64.b64encode(f.read()).decode("utf-8")
req = json.dumps({"feed": [{"image": image}], "fetch": ["prediction"]})
r = requests.post(
server, data=req, headers={"Content-Type": "application/json"})
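
For reference, a hypothetical invocation of the simplified client above; the endpoint and image path are placeholders for a locally deployed Paddle Serving service:

```python
# Placeholder endpoint and image path; predict() above posts the
# base64-encoded image to the serving endpoint as a JSON request.
predict("./images/demo.jpeg", "http://127.0.0.1:9292/image/prediction")
```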
diff --git a/deploy/python/build_gallery.py b/deploy/python/build_gallery.py
index 142e3cf23ca844f712ff996893cbac7a44589cc8..a7297366dc21816f190bcebbacc592182ad5af89 100644
--- a/deploy/python/build_gallery.py
+++ b/deploy/python/build_gallery.py
@@ -71,14 +71,26 @@ class GalleryBuilder(object):
gallery_features = np.zeros(
[len(gallery_images), config['embedding_size']], dtype=np.float32)
+        # construct batches of images and run inference per batch
+ batch_size = config.get("batch_size", 32)
+ batch_img = []
for i, image_file in enumerate(tqdm(gallery_images)):
img = cv2.imread(image_file)
if img is None:
logger.error("img empty, please check {}".format(image_file))
exit()
img = img[:, :, ::-1]
- rec_feat = self.rec_predictor.predict(img)
- gallery_features[i, :] = rec_feat
+ batch_img.append(img)
+
+ if (i + 1) % batch_size == 0:
+ rec_feat = self.rec_predictor.predict(batch_img)
+ gallery_features[i - batch_size + 1:i + 1, :] = rec_feat
+ batch_img = []
+
+ if len(batch_img) > 0:
+ rec_feat = self.rec_predictor.predict(batch_img)
+ gallery_features[-len(batch_img):, :] = rec_feat
+ batch_img = []
# train index
self.Searcher = Graph_Index(dist_type=config['dist_type'])
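
The slice arithmetic above is easy to get wrong, so here is a self-contained check of the same pattern with a stub extractor in place of `rec_predictor.predict` (all names are illustrative, not the real deploy classes):

```python
import numpy as np

# Stub extractor: one 8-d feature per image, standing in for the real model.
def fake_predict(batch):
    return np.ones((len(batch), 8), dtype=np.float32)

images = [object()] * 10                     # 10 "images", batch_size of 4
batch_size = 4
feats = np.zeros((10, 8), dtype=np.float32)
batch = []
for i, img in enumerate(images):
    batch.append(img)
    if (i + 1) % batch_size == 0:            # full batch: fill rows i-3..i
        feats[i - batch_size + 1:i + 1, :] = fake_predict(batch)
        batch = []
if batch:                                    # flush the final partial batch
    feats[-len(batch):, :] = fake_predict(batch)
assert feats.sum() == 10 * 8                 # every row filled exactly once
```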
diff --git a/deploy/python/predict_cls.py b/deploy/python/predict_cls.py
index a9165a92efa62a9252834a988050eebfa8d89f69..dc6865404ecfbc517c7b952c52035a27cbc0137f 100644
--- a/deploy/python/predict_cls.py
+++ b/deploy/python/predict_cls.py
@@ -41,6 +41,29 @@ class ClsPredictor(Predictor):
if "PostProcess" in config:
self.postprocess = build_postprocess(config["PostProcess"])
+        # used by the whole_chain project to benchmark each Paddle repo
+ self.benchmark = config["Global"].get("benchmark", False)
+ if self.benchmark:
+ import auto_log
+ import os
+ pid = os.getpid()
+ self.auto_logger = auto_log.AutoLogger(
+ model_name=config["Global"].get("model_name", "cls"),
+ model_precision='fp16'
+ if config["Global"]["use_fp16"] else 'fp32',
+ batch_size=config["Global"].get("batch_size", 1),
+ data_shape=[3, 224, 224],
+ save_path=config["Global"].get("save_log_path",
+ "./auto_log.log"),
+ inference_config=self.config,
+ pids=pid,
+ process_name=None,
+ gpu_ids=None,
+ time_keys=[
+ 'preprocess_time', 'inference_time', 'postprocess_time'
+ ],
+ warmup=2)
+
def predict(self, images):
input_names = self.paddle_predictor.get_input_names()
input_tensor = self.paddle_predictor.get_input_handle(input_names[0])
@@ -49,16 +72,26 @@ class ClsPredictor(Predictor):
output_tensor = self.paddle_predictor.get_output_handle(output_names[
0])
+ if self.benchmark:
+ self.auto_logger.times.start()
if not isinstance(images, (list, )):
images = [images]
for idx in range(len(images)):
for ops in self.preprocess_ops:
images[idx] = ops(images[idx])
image = np.array(images)
+ if self.benchmark:
+ self.auto_logger.times.stamp()
input_tensor.copy_from_cpu(image)
self.paddle_predictor.run()
batch_output = output_tensor.copy_to_cpu()
+ if self.benchmark:
+ self.auto_logger.times.stamp()
+ if self.postprocess is not None:
+ batch_output = self.postprocess(batch_output)
+ if self.benchmark:
+ self.auto_logger.times.end(stamp=True)
return batch_output
@@ -66,12 +99,40 @@ def main(config):
cls_predictor = ClsPredictor(config)
image_list = get_image_list(config["Global"]["infer_imgs"])
- assert config["Global"]["batch_size"] == 1
- for idx, image_file in enumerate(image_list):
- img = cv2.imread(image_file)[:, :, ::-1]
- output = cls_predictor.predict(img)
- output = cls_predictor.postprocess(output, [image_file])
- print(output)
+ batch_imgs = []
+ batch_names = []
+ cnt = 0
+ for idx, img_path in enumerate(image_list):
+ img = cv2.imread(img_path)
+ if img is None:
+            logger.warning(
+                "Image file could not be read and was skipped. Path: {}".
+                format(img_path))
+ else:
+ img = img[:, :, ::-1]
+ batch_imgs.append(img)
+ img_name = os.path.basename(img_path)
+ batch_names.append(img_name)
+ cnt += 1
+
+        if cnt % config["Global"]["batch_size"] == 0 or (idx + 1) == len(image_list):
+ if len(batch_imgs) == 0:
+ continue
+
+ batch_results = cls_predictor.predict(batch_imgs)
+ for number, result_dict in enumerate(batch_results):
+ filename = batch_names[number]
+ clas_ids = result_dict["class_ids"]
+ scores_str = "[{}]".format(", ".join("{:.2f}".format(
+ r) for r in result_dict["scores"]))
+ label_names = result_dict["label_names"]
+ print("{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".
+ format(filename, clas_ids, scores_str, label_names))
+ batch_imgs = []
+ batch_names = []
+ if cls_predictor.benchmark:
+ cls_predictor.auto_logger.report()
return
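
The printing loop assumes each element of `batch_results` is a Topk-style dict with `class_ids`, `scores` and `label_names`. A mocked result (values are illustrative) shows the resulting console line:

```python
# Mocked Topk-style output; field names match those used in the loop above.
result_dict = {"class_ids": [8, 7], "scores": [0.91, 0.05],
               "label_names": ["hen", "cock"]}
scores_str = "[{}]".format(", ".join(
    "{:.2f}".format(r) for r in result_dict["scores"]))
print("{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".format(
    "demo.jpeg", result_dict["class_ids"], scores_str,
    result_dict["label_names"]))
```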
diff --git a/deploy/python/predict_rec.py b/deploy/python/predict_rec.py
index de293bf0097f9ea48e1ecd296ec2695e15c54ba4..d41c513f89fd83972e86bc5941a8dba1fd488856 100644
--- a/deploy/python/predict_rec.py
+++ b/deploy/python/predict_rec.py
@@ -54,12 +54,14 @@ class RecPredictor(Predictor):
input_tensor.copy_from_cpu(image)
self.paddle_predictor.run()
batch_output = output_tensor.copy_to_cpu()
-
+
if feature_normalize:
feas_norm = np.sqrt(
np.sum(np.square(batch_output), axis=1, keepdims=True))
batch_output = np.divide(batch_output, feas_norm)
-
+
+ if self.postprocess is not None:
+ batch_output = self.postprocess(batch_output)
return batch_output
@@ -67,14 +69,33 @@ def main(config):
rec_predictor = RecPredictor(config)
image_list = get_image_list(config["Global"]["infer_imgs"])
- assert config["Global"]["batch_size"] == 1
- for idx, image_file in enumerate(image_list):
- batch_input = []
- img = cv2.imread(image_file)[:, :, ::-1]
- output = rec_predictor.predict(img)
- if rec_predictor.postprocess is not None:
- output = rec_predictor.postprocess(output)
- print(output)
+ batch_imgs = []
+ batch_names = []
+ cnt = 0
+ for idx, img_path in enumerate(image_list):
+ img = cv2.imread(img_path)
+ if img is None:
+            logger.warning(
+                "Image file could not be read and was skipped. Path: {}".
+                format(img_path))
+ else:
+ img = img[:, :, ::-1]
+ batch_imgs.append(img)
+ img_name = os.path.basename(img_path)
+ batch_names.append(img_name)
+ cnt += 1
+
+ if cnt % config["Global"]["batch_size"] == 0 or (idx + 1) == len(image_list):
+ if len(batch_imgs) == 0:
+ continue
+
+ batch_results = rec_predictor.predict(batch_imgs)
+ for number, result_dict in enumerate(batch_results):
+ filename = batch_names[number]
+ print("{}:\t{}".format(filename, result_dict))
+ batch_imgs = []
+ batch_names = []
+
return
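
The retained `feature_normalize` branch performs row-wise L2 normalization so retrieval can use inner products as cosine similarity; a minimal numpy sketch of that step:

```python
import numpy as np

# Row-wise L2 normalization, as in the feature_normalize branch above.
batch_output = np.random.rand(4, 512).astype(np.float32)
feas_norm = np.sqrt(np.sum(np.square(batch_output), axis=1, keepdims=True))
batch_output = np.divide(batch_output, feas_norm)
print(np.linalg.norm(batch_output, axis=1))  # ~[1. 1. 1. 1.]
```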
diff --git a/deploy/utils/predictor.py b/deploy/utils/predictor.py
index 7757aa1e12a79cbba99e6ab56bde286ab2d09369..11f153071a0da0f82f035fe7389ad8f9f3bd8e6b 100644
--- a/deploy/utils/predictor.py
+++ b/deploy/utils/predictor.py
@@ -28,7 +28,7 @@ class Predictor(object):
if args.use_fp16 is True:
assert args.use_tensorrt is True
self.args = args
- self.paddle_predictor = self.create_paddle_predictor(
+ self.paddle_predictor, self.config = self.create_paddle_predictor(
args, inference_model_dir)
def predict(self, image):
@@ -59,11 +59,12 @@ class Predictor(object):
config.enable_tensorrt_engine(
precision_mode=Config.Precision.Half
if args.use_fp16 else Config.Precision.Float32,
- max_batch_size=args.batch_size)
+ max_batch_size=args.batch_size,
+ min_subgraph_size=30)
config.enable_memory_optim()
# use zero copy
config.switch_use_feed_fetch_ops(False)
predictor = create_predictor(config)
- return predictor
+ return predictor, config
diff --git a/deploy/vector_search/README.md b/deploy/vector_search/README.md
index 151cc403f10701ed05d1b5720b71ba40efe04134..a921c23459eb4deade1a6cef2b68eefa24459b5b 100644
--- a/deploy/vector_search/README.md
+++ b/deploy/vector_search/README.md
@@ -35,7 +35,6 @@ sudo apt-get install build-essential gcc g++
Enter this folder and simply run `make`. To regenerate the `index.so` file, first run `make clean` to remove the generated cache, then run `make` to build the updated library.
-
### 2.3 Compiling the library on Windows
On Windows, first install the gcc toolchain; [TDM-GCC](https://jmeubank.github.io/tdm-gcc/articles/2020-03/9.2.0-release) is recommended. Go to the official site and choose a suitable version; downloading [tdm64-gcc-10.3.0-2.exe](https://github.com/jmeubank/tdm-gcc/releases/download/v10.3.0-tdm64-2/tdm64-gcc-10.3.0-2.exe) is recommended.
@@ -50,6 +49,25 @@ On Windows, first install the gcc toolchain; [TDM-GCC](https://jmeu
In this folder (deploy/vector_search), run `mingw32-make` to generate the `index.dll` library. To regenerate `index.dll`, first run `mingw32-make clean` to remove the generated cache, then run `mingw32-make` to build the updated library.
+### 2.4 Compiling the library on MacOS
+
+Run the following command to install gcc and g++:
+
+```shell
+brew install gcc
+```
+#### Note:
+1. If the message `Error: Running Homebrew as root is extremely dangerous and no longer supported...` appears, refer to this [link](https://jingyan.baidu.com/article/e52e3615057a2840c60c519c.html).
+2. If the message `Error: Failure while executing; tar --extract --no-same-owner --file...` appears, refer to this [link](https://blog.csdn.net/Dawn510/article/details/117787358).
+
+After installation, the compiled executables are copied to /usr/local/bin. Check the gcc versions in this folder:
+```
+ls /usr/local/bin/gcc*
+```
+Here the local gcc version is gcc-11, so the build command is as follows (if your local gcc version is gcc-9, change the command to `CXX=g++-9 make` accordingly):
+```
+CXX=g++-11 make
+```
## 3. Quick start
diff --git a/docs/en/ImageNet_models.md b/docs/en/ImageNet_models_en.md
similarity index 72%
rename from docs/en/ImageNet_models.md
rename to docs/en/ImageNet_models_en.md
index 99cbbd26616bbf6af31385073a1bae84b0047fc5..743337e46aaa2516c6418b4c3e819c4bcf873a97 100644
--- a/docs/en/ImageNet_models.md
+++ b/docs/en/ImageNet_models_en.md
@@ -1,6 +1,6 @@
### ImageNet Model zoo overview
-Based on the ImageNet-1k classification dataset, the 24 classification network structures supported by PaddleClas and the corresponding 122 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluation will be shown in the corresponding chapters. The evaluation environment is as follows.
+Based on the ImageNet-1k classification dataset, the 35 classification network structures supported by PaddleClas and the corresponding 164 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluation will be shown in the corresponding chapters. The evaluation environment is as follows.
* CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU evaluation speed is measured by running 500 times under the FP32+TensorRT configuration (excluding the warmup time of the first 10 times).
@@ -25,28 +25,27 @@ Accuracy and inference time of the pretrained models based on SSLD distillation a
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
-| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
-| ResNet50_vd_<br>ssld | 0.824 | 0.791 | 0.033 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
-| ResNet50_vd_<br>ssld_v2 | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
-| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
+| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
+| ResNet50_vd_<br>ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
+| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
-| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
-| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
-| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
+| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
+| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
+| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
* Mobile-side distillation pretrained models
-| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model size(M) | Download Address |
+| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model size(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
-| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
+| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
-| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
-| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
-| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
-| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
+| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
+| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
+| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
+| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
* Note: `Reference Top-1 Acc` means accuracy of pretrained models which are trained on ImageNet1k dataset.
@@ -57,25 +56,23 @@ Accuracy and inference time of the pretrained models based on SSLD distillation a
Accuracy and inference time metrics of ResNet and Vd series models are shown as follows. More detailed information can be found in the [ResNet and Vd series tutorial](../en/models/ResNet_and_vd_en.md).
-| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|
-| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams) |
-| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams) |
-| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams) |
-| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams) |
-| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
-| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) |
+| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet18_pretrained.pdparams) |
+| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet18_vd_pretrained.pdparams) |
+| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_pretrained.pdparams) |
+| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_pretrained.pdparams) |
+| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
+| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_pretrained.pdparams) |
| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) |
-| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams) |
-| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_v2_pretrained.pdparams) |
-| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams) |
-| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams) |
-| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams) |
-| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams) |
-| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams) |
-| ResNet50_vd_<br>ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
-| ResNet50_vd_<br>ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
-| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
+| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_pretrained.pdparams) |
+| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_pretrained.pdparams) |
+| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_pretrained.pdparams) |
+| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet152_pretrained.pdparams) |
+| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet152_vd_pretrained.pdparams) |
+| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet200_vd_pretrained.pdparams) |
+| ResNet50_vd_<br>ssld | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
+| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
@@ -85,11 +82,11 @@ Accuracy and inference time metrics of Mobile series models are shown as follows
| Model | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model storage size(M) | Download Address |
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|
-| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams) |
-| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams) |
-| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams) |
-| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams) |
-| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
+| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_x0_25_pretrained.pdparams) |
+| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_x0_5_pretrained.pdparams) |
+| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_x0_75_pretrained.pdparams) |
+| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_pretrained.pdparams) |
+| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) |
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) |
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) |
@@ -97,19 +94,19 @@ Accuracy and inference time metrics of Mobile series models are shown as follows
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) |
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
-| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams) |
-| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams) |
-| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams) |
-| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams) |
-| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams) |
-| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams) |
-| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams) |
-| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams) |
-| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams) |
-| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams) |
-| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
-| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
-| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
+| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_25_pretrained.pdparams) |
+| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_pretrained.pdparams) |
+| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x0_75_pretrained.pdparams) |
+| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x0_5_pretrained.pdparams) |
+| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x0_35_pretrained.pdparams) |
+| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_25_pretrained.pdparams) |
+| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_pretrained.pdparams) |
+| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_75_pretrained.pdparams) |
+| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_5_pretrained.pdparams) |
+| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_pretrained.pdparams) |
+| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
+| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
+| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) |
@@ -185,16 +182,16 @@ Accuracy and inference time metrics of HRNet series models are shown as follows.
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|
-| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams) |
-| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
-| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams) |
-| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams) |
-| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams) |
-| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams) |
-| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams) |
-| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
-| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams) |
-| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
+| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_pretrained.pdparams) |
+| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
+| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W30_C_pretrained.pdparams) |
+| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W32_C_pretrained.pdparams) |
+| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W40_C_pretrained.pdparams) |
+| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W44_C_pretrained.pdparams) |
+| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_pretrained.pdparams) |
+| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
+| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W64_C_pretrained.pdparams) |
+| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
@@ -211,7 +208,7 @@ Accuracy and inference time metrics of Inception series models are shown as foll
| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) |
| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) |
| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) |
-| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams) |
+| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/InceptionV3_pretrained.pdparams) |
| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) |
@@ -253,9 +250,9 @@ Accuracy and inference time metrics of ResNeSt and RegNet series models are show
-### Transformer series
+### ViT_DeiT series
-Accuracy and inference time metrics of ViT and DeiT series models are shown as follows. More detailed information can be found in the [Transformer series tutorial](../en/models/Transformer_en.md).
+Accuracy and inference time metrics of ViT and DeiT series models are shown as follows. More detailed information can be found in the [Transformer series tutorial](../en/models/ViT_and_DeiT_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
@@ -331,6 +328,109 @@ Accuracy and inference time metrics of ReXNet series models are shown as follows
+
+### SwinTransformer
+
+Accuracy and inference time metrics of SwinTransformer series models are shown as follows. More detailed information can be found in the [SwinTransformer series tutorial](../en/models/SwinTransformer_en.md).
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | | | 4.5 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) |
+| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | | | 8.7 | 50 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) |
+| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) |
+| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) |
+| SwinTransformer_base_patch4_window7_224[1] | 0.8487 | 0.9746 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) |
+| SwinTransformer_base_patch4_window12_384[1] | 0.8642 | 0.9807 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) |
+| SwinTransformer_large_patch4_window7_224[1] | 0.8596 | 0.9783 | | | 34.5 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) |
+| SwinTransformer_large_patch4_window12_384[1] | 0.8719 | 0.9823 | | | 103.9 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) |
+
+[1] Pretrained on the ImageNet22k dataset and then fine-tuned on the ImageNet1k dataset.
+
+
+### LeViT
+
+Accuracy and inference time metrics of LeViT series models are shown as follows. More detailed information can be found in the [LeViT series tutorial](../en/models/LeViT_en.md).
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(M) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| LeViT_128S | 0.7598 | 0.9269 | | | 305 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128S_pretrained.pdparams) |
+| LeViT_128 | 0.7810 | 0.9371 | | | 406 | 9.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams) |
+| LeViT_192 | 0.7934 | 0.9446 | | | 658 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_192_pretrained.pdparams) |
+| LeViT_256 | 0.8085 | 0.9497 | | | 1120 | 19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_256_pretrained.pdparams) |
+| LeViT_384 | 0.8191 | 0.9551 | | | 2353 | 39 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_384_pretrained.pdparams) |
+
+**Note**: The accuracy difference from the reference is caused by differences in data preprocessing and by not using the distilled head as output.
+
+
+### Twins
+
+Accuracy and inference time metrics of Twins series models are shown as follows. More detailed information can be found in the [Twins series tutorial](../en/models/Twins_en.md).
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| pcpvt_small | 0.8082 | 0.9552 | | | 3.7 | 24.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams) |
+| pcpvt_base | 0.8242 | 0.9619 | | | 6.4 | 43.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams) |
+| pcpvt_large | 0.8273 | 0.9650 | | | 9.5 | 60.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams) |
+| alt_gvt_small | 0.8140 | 0.9546 | | | 2.8 | 24 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams) |
+| alt_gvt_base | 0.8294 | 0.9621 | | | 8.3 | 56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams) |
+| alt_gvt_large | 0.8331 | 0.9642 | | | 14.8 | 99.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams) |
+
+**Note**: The accuracy difference from the reference is caused by differences in data preprocessing.
+
+
+### HarDNet
+
+Accuracy and inference time metrics of HarDNet series models are shown as follows. More detailed information can be found in the [HarDNet series tutorial](../en/models/HarDNet_en.md).
+
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| HarDNet39_ds | 0.7133 | 0.8998 | | | 0.4 | 3.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet39_ds_pretrained.pdparams) |
+| HarDNet68_ds | 0.7362 | 0.9152 | | | 0.8 | 4.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet68_ds_pretrained.pdparams) |
+| HarDNet68 | 0.7546 | 0.9265 | | | 4.3 | 17.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet68_pretrained.pdparams) |
+| HarDNet85 | 0.7744 | 0.9355 | | | 9.1 | 36.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet85_pretrained.pdparams) |
+
+
+### DLA
+
+Accuracy and inference time metrics of the DLA series models are shown as follows. More details can be found in the [DLA series tutorial](../en/models/DLA_en.md).
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| DLA102 | 0.7893 |0.9452 | | | 7.2 | 33.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102_pretrained.pdparams) |
+| DLA102x2 |0.7885 | 0.9445 | | | 9.3 | 41.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102x2_pretrained.pdparams) |
+| DLA102x| 0.781 | 0.9400 | | | 5.9 | 26.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102x_pretrained.pdparams) |
+| DLA169 | 0.7809 | 0.9409 | | | 11.6 | 53.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA169_pretrained.pdparams) |
+| DLA34 | 0.7603 | 0.9298 | | | 3.1 | 15.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA34_pretrained.pdparams) |
+| DLA46_c |0.6321 | 0.853 | | | 0.5 | 1.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA46_c_pretrained.pdparams) |
+| DLA60 | 0.7610 | 0.9292 | | | 4.2 | 22.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60_pretrained.pdparams) |
+| DLA60x_c | 0.6645 | 0.8754 | | | 0.6 | 1.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60x_c_pretrained.pdparams) |
+| DLA60x | 0.7753 | 0.9378 | | | 3.5 | 17.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60x_pretrained.pdparams) |
+
+
+### RedNet
+
+Accuracy and inference time metrics of the RedNet series models are shown as follows. More details can be found in the [RedNet series tutorial](../en/models/RedNet_en.md).
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| RedNet26 | 0.7595 |0.9319 | | | 1.7 | 9.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet26_pretrained.pdparams) |
+| RedNet38 |0.7747 | 0.9356 | | | 2.2 | 12.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet38_pretrained.pdparams) |
+| RedNet50| 0.7833 | 0.9417 | | | 2.7 | 15.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet50_pretrained.pdparams) |
+| RedNet101 | 0.7894 | 0.9436 | | | 4.7 | 25.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet101_pretrained.pdparams) |
+| RedNet152 | 0.7917 | 0.9440 | | | 6.8 | 34.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet152_pretrained.pdparams) |
+
+
+### TNT
+
+Accuracy and inference time metrics of the TNT series models are shown as follows. More details can be found in the [TNT series tutorial](../en/models/TNT_en.md).
+
+| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
+| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
+| TNT_small | 0.8121 | 0.9563 | | | 5.2 | 23.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) |
+
+**Note**: Both the `mean` and `std` used by `NormalizeImage` in the TNT data preprocessing are set to 0.5.
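+
+For reference, the corresponding `NormalizeImage` setting in a training configuration would look like the following sketch (illustrative only; just the normalization op is shown, following the `transform_ops` convention used in this repo's configs):
+
+```
+      - NormalizeImage:
+          scale: 1.0/255.0
+          mean: [0.5, 0.5, 0.5]   # TNT uses 0.5 for all channels
+          std: [0.5, 0.5, 0.5]    # instead of the ImageNet statistics
+          order: ''
+```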
+
### Others
Accuracy and inference time metrics of AlexNet, SqueezeNet series, VGG series and DarkNet53 models are shown as follows. More details can be found in [Others](../en/models/Others_en.md).
diff --git a/docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md b/docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md
index 8286130749081e5d50038b7c07b8a20d578c7d9f..52ebfc08955c9457e9ffe1fa1445b3bd35c74039 100644
--- a/docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md
+++ b/docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md
@@ -154,10 +154,12 @@ cutout_op = Cutout(n_holes=1, length=112)
ops = [decode_op, resize_op, cutout_op]
-imgs_dir = image_path
-fnames = os.listdir(imgs_dir)
-for f in fnames:
- data = open(os.path.join(imgs_dir, f)).read()
+imgs_dir = "imgs_dir"
+file_names = os.listdir(imgs_dir)
+for file_name in file_names:
+ file_path = os.path.join(imgs_dir, file_name)
+ with open(file_path, "rb") as f:
+ data = f.read()
img = transform(data, ops)
```
diff --git a/docs/en/models/LeViT_en.md b/docs/en/models/LeViT_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..7fd953aca91947cb3acd134c3119dcd0fbf5d2df
--- /dev/null
+++ b/docs/en/models/LeViT_en.md
@@ -0,0 +1,17 @@
+# LeViT series
+
+## Overview
+LeViT is a hybrid neural network for fast-inference image classification. Its design takes the model's performance on different hardware platforms into account, so it better reflects real-world application scenarios. Through extensive experiments, the authors found an effective way to combine convolutional networks with Transformers, and proposed an attention-based method to integrate positional information into the Transformer. [Paper](https://arxiv.org/abs/2104.01136).
+
+## Accuracy, FLOPS and Parameters
+
+| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(M) | Params<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 |
+| LeViT-128 | 0.7810 | 0.9371 | 0.786 | 0.940 | 406 | 9.2 |
+| LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 |
+| LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 |
+| LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 |
+
+
+**Note**: The accuracy gap relative to the Reference results is due to differences in data preprocessing and to the distillation head not being used as the output.
diff --git a/docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md b/docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md
index 6ee6455fa8d24b4610c8d364031ab5e7c075807c..6278096d8ff96193835d0970a75440331c760ecc 100644
--- a/docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md
+++ b/docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md
@@ -74,10 +74,12 @@ autoaugment_op = ImageNetPolicy()
ops = [decode_op, resize_op, autoaugment_op]
-imgs_dir = 图像路径
-fnames = os.listdir(imgs_dir)
-for f in fnames:
- data = open(os.path.join(imgs_dir, f)).read()
+imgs_dir = "imgs_dir"
+file_names = os.listdir(imgs_dir)
+for file_name in file_names:
+ file_path = os.path.join(imgs_dir, file_name)
+ with open(file_path, "rb") as f:
+ data = f.read()
img = transform(data, ops)
```
diff --git a/docs/zh_CN/faq_series/faq_2021_s2.md b/docs/zh_CN/faq_series/faq_2021_s2.md
new file mode 100644
index 0000000000000000000000000000000000000000..fbe2c3bdd237c80180b66cb8ea4a2a3a1665299a
--- /dev/null
+++ b/docs/zh_CN/faq_series/faq_2021_s2.md
@@ -0,0 +1,124 @@
+# 图像识别常见问题汇总 - 2021 第2季
+
+
+## 目录
+* [第1期](#第1期)(2021.07.08)
+* [第2期](#第2期)(2021.07.27)
+
+
+## 第1期
+
+### Q1.1: 目前使用的主体检测模型在某些场景中会有误检?
+
+**A**:目前的主体检测模型训练时使用了COCO、Object365、RPC、LogoDet等公开数据集,如果被检测数据是类似工业质检等、与常见类别差异较大的数据,需要基于目前的检测模型重新微调训练。
+
+### Q1.2: 添加图片后建索引报`assert text_num >= 2`错?
+
+**A**:请确保data_file.txt中图片路径和图片名称之间的分隔符为单个tab(制表符),而不是空格。
+
+### Q1.3: 识别模块预测时报`Illegal instruction`错?
+
+**A**:可能是编译生成的库文件与您的环境不兼容导致程序报错,推荐参考[向量检索教程](../../../deploy/vector_search/README.md)重新编译库文件。
+
+### Q1.4 主体检测是每次只输出一个主体检测框吗?
+
+**A**:主体检测的输出数量是可以通过配置文件配置的。在配置文件中,`Global.threshold`控制检测的阈值,小于该阈值的检测框会被舍弃;`Global.max_det_results`控制最大返回的结果数。这两个参数共同决定了输出检测框的数量。
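+
+下面给出一个示意性的配置片段(字段名取自上文,数值仅为示例,请根据实际场景调整):
+
+```
+Global:
+  threshold: 0.2          # 小于该阈值的检测框会被舍弃
+  max_det_results: 5      # 最多返回的检测框数量
+```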
+
+### Q1.5 训练主体检测模型的数据是如何选择的?换成更小的模型会有损精度吗?
+
+**A**:训练数据是在COCO、Object365、RPC、LogoDet等公开数据集中随机抽取的子集,小模型精度可能会有一些损失,后续我们也会尝试下更小的检测模型。关于主体检测模型的更多信息请参考[主体检测](../application/mainbody_detection.md)。
+
+### Q1.6 识别模型怎么在预训练模型的基础上进行微调训练?
+
+**A**:识别模型的微调训练和分类模型的微调训练类似,识别模型可以加载商品的预训练模型,训练过程可以参考[识别模型训练](../tutorials/getting_started_retrieval.md),后续我们也会持续细化这块的文档。
+
+### Q1.7 PaddleClas和PaddleDetection的区别是什么?
+
+**A**:PaddleClas是一个集主体检测、图像分类、图像检索于一体的图像识别repo,用于解决大部分图像识别问题,用户可以很方便地使用PaddleClas来解决小样本、多类别的图像识别问题。PaddleDetection提供了目标检测、关键点检测、多目标跟踪等能力,方便用户定位图像中感兴趣的点和区域,被广泛应用于工业质检、遥感图像检测、无人巡检等项目。
+
+### Q1.8 PaddleClas 2.2和PaddleClas 2.1完全兼容吗?
+
+**A**:PaddleClas 2.2相对PaddleClas 2.1新增了metric learning、主体检测、向量检索等模块,另外也提供了商品识别、车辆识别、logo识别和动漫人物识别等4个场景应用示例,用户可以基于PaddleClas 2.2快速构建图像识别系统。在图像分类模块,二者的使用方法类似,可以参考[图像分类示例](../tutorials/getting_started.md)快速迭代和评估。新增的metric learning模块,可以参考[metric learning示例](../tutorials/getting_started_retrieval.md)。另外,新版本暂时还不支持fp16、dali训练,也暂时不支持多标签训练,这些内容将在不久后支持。
+
+### Q1.9 训练metric learning时,每个epoch中,无法跑完所有mini-batch,为什么?
+
+**A**:在训练metric learning时,使用的Sampler是DistributedRandomIdentitySampler,该Sampler不会采样全部的图片,因此每个epoch采样到的数据并非全量数据,无法跑完显示的mini-batch数是正常现象。后续我们会优化打印的信息,尽可能减少给大家带来的困惑。
+
+### Q1.10 有些图片没有识别出结果,为什么?
+
+**A**:在配置文件(如inference_product.yaml)中,`IndexProcess.score_thres`控制被识别图片与库中图片的余弦相似度的最小值。当余弦相似度小于该值时,不会打印结果。您可以根据自己的实际数据调整该值。
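+
+示意性的配置片段如下(数值仅为示例):
+
+```
+IndexProcess:
+  score_thres: 0.5   # 余弦相似度小于该值时不打印结果
+```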
+
+### Q1.11 为什么有一些图片检测出的结果就是原图?
+
+**A**:主体检测模型会返回检测框,但事实上为了让后续的识别模型更加准确,在返回检测框的同时也返回了原图。后续会根据原图或者检测框与库中的图片的相似度排序,相似度最高的库中图片的标签即为被识别图片的标签。
+
+### Q1.12 使用`circle loss`还需加`triplet loss`吗?
+
+**A**:`circle loss`统一了样本对学习和分类学习两种形式。如果采用分类学习的形式,可以再增加`triplet loss`。
+
+### Q1.13 hub serving方式启动某个模块,怎么添加该模块的参数呢?
+
+**A**:具体可以参考[hub serving参数](../../../deploy/hubserving/clas/params.py)。
+
+### Q1.14 模型训练出nan,为什么?
+
+**A**:
+
+1.确保正确加载预训练模型,最简单的方式是添加参数`-o Arch.pretrained=True`;
+
+2.模型微调时,学习率不要设置得太大,如0.001即可(示意配置见下方片段)。
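+
+示意性的配置片段如下(字段名遵循本仓库配置文件的约定,数值仅为示例):
+
+```
+Arch:
+  pretrained: True         # 加载预训练模型
+Optimizer:
+  lr:
+    learning_rate: 0.001   # 微调时使用较小的学习率
+```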
+
+
+### Q1.15 SSLD中,大模型在500W数据上预训练后蒸馏小模型,然后在100W数据上对小模型进行蒸馏finetune,具体流程是怎样的?
+
+**A**:步骤如下:
+
+1.基于Facebook开源的`ResNeXt101-32x16d-wsl`模型,蒸馏得到`ResNet50-vd`模型;
+
+2.用该`ResNet50-vd`模型,在500W数据集上蒸馏`MobileNetV3`;
+
+3.考虑到500W数据集的分布和100W数据集的分布不完全一致,所以又在100W数据上finetune了一下,精度有微弱提升。
+
+
+### Q1.16 如果待识别的图片不属于开源的四个识别方向,该使用哪个识别模型?
+
+**A**:建议使用商品识别模型,一来是因为商品覆盖的范围比较广,被识别的图片是商品的概率更大,二来是因为商品识别模型的训练数据使用了5万类别的数据,泛化能力更好,特征会更鲁棒一些。
+
+### Q1.17 最后使用512维的向量,为什么不用1024或者其他维度的呢?
+
+**A**:使用较小维度的向量是为了加快计算,在实际使用过程中,可能使用128甚至更小的维度。一般来说,512维已经足够大,能够充分表示特征了。
+
+### Q1.18 训练SwinTransformer,loss出现nan
+
+**A**:训练SwinTransformer需要使用develop版本的PaddlePaddle(paddle-dev)进行训练,安装方式参考[paddlepaddle安装方式](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html),后续paddlepaddle-2.1也会同时支持。
+
+### Q1.19 新增底库数据需要重新构建索引吗?
+
+**A**:这一版需要重新构建索引,未来版本会支持只构建新增图片的索引。
+
+### Q1.20 PaddleClas 的`train_log`文件在哪里?
+
+**A**:在保存权重的路径中存放了`train.log`。
+
+
+
+## 第2期
+
+### Q2.1 PaddleClas目前使用的Möbius向量检索算法支持类似于faiss的那种index.add()的功能吗? 另外,每次构建新的图都要进行train吗?这里的train是为了检索加速还是为了构建相似的图?
+
+**A**:Möbius提供的检索算法是一种基于图的近似最近邻搜索算法,目前支持两种距离计算方式:inner product和L2 distance。暂时不支持faiss中的index.add功能,如果需要增加检索库的内容,需要从头重新构建新的index。在每次构建index时,检索算法内部执行的操作是一种类似于train的过程,但不同于faiss提供的train接口,我们将其命名为build,主要目的是为了加速检索速度。
+
+### Q2.2 可以对视频中每一帧画面进行逐帧预测吗?
+**A**:可以,但目前PaddleClas并不支持视频输入。可以尝试修改一下PaddleClas代码,或者预先将视频逐帧转为图像存储,再使用PaddleClas进行预测。
+
+### Q2.3:在直播场景中,需要提供一个直播即时识别画面,能够在延迟几秒内找到特征目标物并用框圈起,这个可以实现吗?
+**A**:要达到实时的检测效果,需要检测速度满足实时性要求。PP-YOLO是Paddle团队提供的轻量级目标检测模型,在检测速度和精度之间取得了很好的平衡,可以试试用PP-YOLO来做检测。关于PP-YOLO的使用,可以参照:https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/README_cn.md
+
+### Q2.4: 对于未知的标签,加入gallery dataset可以用于后续的分类识别(无需训练),但是如果前面的检测模型对于未知的标签无法定位检测出来,是否还是要训练前面的检测模型?
+**A**:如果检测模型在自己的数据集上表现不佳,需要在自己的检测数据集上再finetune一下。
+
+### Q2.5: Mac重新编译index.so时报错如下:clang: error: unsupported option '-fopenmp', 该如何处理?
+**A**:该问题已经解决。Mac下编译index.so,可以参照文档:https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/vector_search/README.md
+
+### Q2.6: PaddleClas有提供调整图片亮度,对比度,饱和度,色调等方面的数据增强吗?
+**A**:PaddleClas提供了多种数据增广方式,可分为3类:1. 图像变换类:AutoAugment、RandAugment;2. 图像裁剪类:CutOut、RandErasing、HideAndSeek、GridMask;3. 图像混叠类:Mixup、Cutmix。其中,RandAugment提供了多种数据增强方式的随机组合,可以满足亮度、对比度、饱和度、色调等多方面的数据增广需求。
diff --git a/docs/zh_CN/tutorials/config.md b/docs/zh_CN/tutorials/config.md
deleted file mode 100644
index 87ad4bc25a2187fb82ab463d53e3fd038379b901..0000000000000000000000000000000000000000
--- a/docs/zh_CN/tutorials/config.md
+++ /dev/null
@@ -1,109 +0,0 @@
-# 配置说明
-
----
-
-## 简介
-
-本文档介绍了PaddleClas配置文件(`configs/*.yaml`)中各参数的含义,以便您更快地自定义或修改超参数配置。
-
-* 注意:部分参数并未在配置文件中体现,在训练或者评估时,可以直接使用`-o`进行参数的扩充或者更新,比如说`-o checkpoints=./ckp_path/ppcls`,表示在配置文件中添加(如果之前不存在)或者更新(如果之前已经包含该字段)`checkpoints`字段,其值设为`./ckp_path/ppcls`。
-
-
-## 配置详解
-
-### 基础配置
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| mode | 运行模式 | "train" | ["train"," valid"] |
-| checkpoints | 断点模型路径,用于恢复训练 | "" | Str |
-| last_epoch | 上一次训练结束时已经训练的epoch数量,与checkpoints一起使用 | -1 | int |
-| pretrained_model | 预训练模型路径 | "" | Str |
-| load_static_weights | 加载的模型是否为静态图的预训练模型 | False | bool |
-| model_save_dir | 保存模型路径 | "" | Str |
-| classes_num | 分类数 | 1000 | int |
-| total_images | 总图片数 | 1281167 | int |
-| save_interval | 每隔多少个epoch保存模型 | 1 | int |
-| validate | 是否在训练时进行评估 | TRUE | bool |
-| valid_interval | 每隔多少个epoch进行模型评估 | 1 | int |
-| epochs | 训练总epoch数 | | int |
-| topk | 评估指标K值大小 | 5 | int |
-| image_shape | 图片大小 | [3,224,224] | list, shape: (3,) |
-| use_mix | 是否启用mixup | False | ['True', 'False'] |
-| ls_epsilon | label_smoothing epsilon值| 0 | float |
-| use_distillation | 是否进行模型蒸馏 | False | bool |
-
-## 结构(ARCHITECTURE)
-
-### 分类模型结构配置
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| name | 模型结构名字 | "ResNet50_vd" | PaddleClas提供的模型结构 |
-| params | 模型传参 | {} | 模型结构所需的额外字典,如EfficientNet等配置文件中需要传入`padding_type`等参数,可以通过这种方式传入 |
-
-### 识别模型结构配置
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
-| name | 模型结构 | "RecModel" | ["RecModel"] |
-| infer_output_key | inference时的输出值 | “feature” | ["feature", "logits"] |
-| infer_add_softmax | infercne是否添加softmax | True | [True, False] |
-| Backbone | 使用Backbone的名字 | | 需传入字典结构,包含`name`、`pretrained`等key值。其中`name`为分类模型名字, `pretrained`为布尔值 |
-| BackboneStopLayer | Backbone中的feature输出层 | | 需传入字典结构,包含`name`key值,具体值为Backbone中的特征输出层的`full_name` |
-| Neck | 添加的网络Neck部分 | | 需传入字典结构,Neck网络层的具体输入参数 |
-| Head | 添加的网络Head部分 | | 需传入字典结构,Head网络层的具体输入参数 |
-
-### 学习率(LEARNING_RATE)
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| function | decay方法名 | "Linear" | ["Linear", "Cosine",
"Piecewise", "CosineWarmup"] |
-| params.lr | 初始学习率 | 0.1 | float |
-| params.decay_epochs | piecewisedecay中
衰减学习率的milestone | | list |
-| params.gamma | piecewisedecay中gamma值 | 0.1 | float |
-| params.warmup_epoch | warmup轮数 | 5 | int |
-| parmas.steps | lineardecay衰减steps数 | 100 | int |
-| params.end_lr | lineardecayend_lr值 | 0 | float |
-
-### 优化器(OPTIMIZER)
-
-| 参数名字 | 具体含义 | 默认值 | 可选值 |
-|:---:|:---:|:---:|:---:|
-| function | 优化器方法名 | "Momentum" | ["Momentum", "RmsProp"] |
-| params.momentum | momentum值 | 0.9 | float |
-| regularizer.function | 正则化方法名 | "L2" | ["L1", "L2"] |
-| regularizer.factor | 正则化系数 | 0.0001 | float |
-
-### 数据读取器与数据处理
-
-| 参数名字 | 具体含义 |
-|:---:|:---:|
-| batch_size | 批大小 |
-| num_workers | 数据读取器worker数量 |
-| file_list | train文件列表 |
-| data_dir | train文件路径 |
-| shuffle_seed | 用来进行shuffle的seed值 |
-
-数据处理
-
-| 功能名字 | 参数名字 | 具体含义 |
-|:---:|:---:|:---:|
-| DecodeImage | to_rgb | 数据转RGB |
-| | to_np | 数据转numpy |
-| | channel_first | 按CHW排列的图片数据 |
-| RandCropImage | size | 随机裁剪 |
-| RandFlipImage | | 随机翻转 |
-| NormalizeImage | scale | 归一化scale值 |
-| | mean | 归一化均值 |
-| | std | 归一化方差 |
-| | order | 归一化顺序 |
-| ToCHWImage | | 调整为CHW |
-| CropImage | size | 裁剪大小 |
-| ResizeImage | resize_short | 按短边调整大小 |
-
-mix处理
-
-| 参数名字| 具体含义|
-|:---:|:---:|
-| MixupOperator.alpha | mixup处理中的alpha值|
diff --git a/docs/zh_CN/tutorials/config_description.md b/docs/zh_CN/tutorials/config_description.md
new file mode 100644
index 0000000000000000000000000000000000000000..6f2cb482226b6b6f803a59bf7590dbe34303dc07
--- /dev/null
+++ b/docs/zh_CN/tutorials/config_description.md
@@ -0,0 +1,234 @@
+# 配置说明
+
+---
+
+## 简介
+
+本文档介绍了PaddleClas配置文件(`ppcls/configs/*.yaml`)中各参数的含义,以便您更快地自定义或修改超参数配置。
+
+
+## 配置详解
+
+### 1.分类模型
+
+此处以`ResNet50_vd`在`ImageNet-1k`上的训练配置为例,详解各个参数的意义。[配置路径](../../../ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml)。
+
+#### 1.1 全局配置(Global)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| checkpoints | 断点模型路径,用于恢复训练 | null | str |
+| pretrained_model | 预训练模型路径 | null | str |
+| output_dir | 保存模型路径 | "./output/" | str |
+| save_interval | 每隔多少个epoch保存模型 | 1 | int |
+| eval_during_train| 是否在训练时进行评估 | True | bool |
+| eval_interval | 每隔多少个epoch进行模型评估 | 1 | int |
+| epochs | 训练总epoch数 | | int |
+| print_batch_step | 每隔多少个mini-batch打印输出 | 10 | int |
+| use_visualdl | 是否使用visualdl可视化训练过程 | False | bool |
+| image_shape | 图片大小 | [3,224,224] | list, shape: (3,) |
+| save_inference_dir | inference模型的保存路径 | "./inference" | str |
+| eval_mode | eval的模式 | "classification" | "retrieval" |
+
+**注**:`pretrained_model`也可以填写存放预训练模型的http地址。
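+
+一个典型的Global配置片段如下(示意,字段与取值参考上表及本仓库的ImageNet训练配置):
+
+```
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+```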
+
+#### 1.2 结构(Arch)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| name | 模型结构名字 | ResNet50 | PaddleClas提供的模型结构 |
+| class_num | 分类数 | 1000 | int |
+| pretrained | 预训练模型 | False | bool, str |
+
+**注**:此处的pretrained可以设置为`True`或者`False`,也可以设置权重的路径。另外当`Global.pretrained_model`也设置相应路径时,此处的`pretrained`失效。
+
+#### 1.3 损失函数(Loss)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| CELoss | 交叉熵损失函数 | —— | —— |
+| CELoss.weight | CELoss在整个Loss中的权重 | 1.0 | float |
+| CELoss.epsilon | CELoss中label_smooth的epsilon值 | 0.1 | float,0-1之间 |
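+
+对应的yaml写法示例如下(示意):
+
+```
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+```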
+
+
+#### 1.4 优化器(Optimizer)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| name | 优化器方法名 | "Momentum" | "RmsProp"等其他优化器 |
+| momentum | momentum值 | 0.9 | float |
+| lr.name | 学习率下降方式 | "Cosine" | "Linear"、"Piecewise"等其他下降方式 |
+| lr.learning_rate | 学习率初始值 | 0.1 | float |
+| lr.warmup_epoch | warmup轮数 | 0 | int,如5 |
+| regularizer.name | 正则化方法名 | "L2" | ["L1", "L2"] |
+| regularizer.coeff | 正则化系数 | 0.00007 | float |
+
+**注**:`lr.name`不同时,新增的参数可能也不同,如当`lr.name=Piecewise`时,需要添加如下参数:
+
+```
+ lr:
+ name: Piecewise
+ learning_rate: 0.1
+ decay_epochs: [30, 60, 90]
+ values: [0.1, 0.01, 0.001, 0.0001]
+```
+
+添加方法及参数请查看[learning_rate.py](../../../ppcls/optimizer/learning_rate.py)。
+
+
+#### 1.5 数据读取模块(DataLoader)
+
+##### 1.5.1 dataset
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| name | 读取数据的类的名字 | ImageNetDataset | VeriWild等其他读取数据类的名字 |
+| image_root | 数据集存放的路径 | ./dataset/ILSVRC2012/ | str |
+| cls_label_path | 数据集标签list | ./dataset/ILSVRC2012/train_list.txt | str |
+| transform_ops | 单张图片的数据预处理 | —— | —— |
+| batch_transform_ops | batch图片的数据预处理 | —— | —— |
+
+
+transform_ops中参数的意义:
+
+| 功能名字 | 参数名字 | 具体含义 |
+|:---:|:---:|:---:|
+| DecodeImage | to_rgb | 数据转RGB |
+| | channel_first | 按CHW排列的图片数据 |
+| RandCropImage | size | 随机裁剪 |
+| RandFlipImage | | 随机翻转 |
+| NormalizeImage | scale | 归一化scale值 |
+| | mean | 归一化均值 |
+| | std | 归一化方差 |
+| | order | 归一化顺序 |
+| CropImage | size | 裁剪大小 |
+| ResizeImage | resize_short | 按短边调整大小 |
+
+batch_transform_ops中参数的含义:
+
+| 功能名字 | 参数名字 | 具体含义 |
+|:---:|:---:|:---:|
+| MixupOperator | alpha | Mixup参数值,该值越大增强越强 |
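+
+dataset部分的yaml写法示例如下(示意,字段取自本仓库的ImageNet训练配置):
+
+```
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+```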
+
+##### 1.5.2 sampler
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| name | sampler类型 | DistributedBatchSampler | DistributedRandomIdentitySampler等其他Sampler |
+| batch_size | 批大小 | 64 | int |
+| drop_last | 是否丢掉最后不够batch-size的数据 | False | bool |
+| shuffle | 数据是否做shuffle | True | bool |
+
+##### 1.5.3 loader
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| num_workers | 数据读取线程数 | 4 | int |
+| use_shared_memory | 是否使用共享内存 | True | bool |
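+
+sampler与loader部分的yaml写法示例如下(示意):
+
+```
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+```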
+
+
+#### 1.6 评估指标(Metric)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| TopkAcc | TopkAcc | [1, 5] | list, int |
+
+#### 1.7 预测(Infer)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| infer_imgs | 被infer的图像的地址 | docs/images/whl/demo.jpg | str |
+| batch_size | 批大小 | 10 | int |
+| PostProcess.name | 后处理名字 | Topk | str |
+| PostProcess.topk | topk的值 | 5 | int |
+| PostProcess.class_id_map_file | class id和名字的映射文件 | ppcls/utils/imagenet1k_label_list.txt | str |
+
+**注**:Infer模块的`transforms`的解释参考数据读取模块中的dataset中`transform_ops`的解释。
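+
+Infer部分的yaml写法示例如下(示意):
+
+```
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+```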
+
+
+### 2.蒸馏模型
+
+**注**:此处以`MobileNetV3_large_x1_0`在`ImageNet-1k`上蒸馏`MobileNetV3_small_x1_0`的训练配置为例,详解各个参数的意义。[配置路径](../../../ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml)。这里只介绍与分类模型有区别的参数。
+
+#### 2.1 结构(Arch)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| name | 模型结构名字 | DistillationModel | —— |
+| class_num | 分类数 | 1000 | int |
+| freeze_params_list | 冻结参数列表 | [True, False] | list |
+| models | 模型列表 | [Teacher, Student] | list |
+| Teacher.name | 教师模型的名字 | MobileNetV3_large_x1_0 | PaddleClas中的模型 |
+| Teacher.pretrained | 教师模型预训练权重 | True | 布尔值或者预训练权重路径 |
+| Teacher.use_ssld | 教师模型预训练权重是否是ssld权重 | True | 布尔值 |
+| infer_model_name | 被infer模型的类型 | Student | Teacher |
+
+**注**:
+
+1.list在yaml中体现如下:
+```
+ freeze_params_list:
+ - True
+ - False
+```
+2.Student的参数情况类似,不再赘述。
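+
+Arch部分完整的yaml写法示例如下(示意,Student部分的字段与Teacher类似):
+
+```
+Arch:
+  name: DistillationModel
+  class_num: 1000
+  freeze_params_list:
+    - True
+    - False
+  models:
+    - Teacher:
+        name: MobileNetV3_large_x1_0
+        pretrained: True
+        use_ssld: True
+    - Student:
+        name: MobileNetV3_small_x1_0
+        pretrained: False
+```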
+
+#### 2.2 损失函数(Loss)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| DistillationCELoss | 蒸馏的交叉熵损失函数 | —— | —— |
+| DistillationCELoss.weight | Loss权重 | 1.0 | float |
+| DistillationCELoss.model_name_pairs | 计算蒸馏损失的模型名字对 | ["Student", "Teacher"] | —— |
+| DistillationGTCELoss | 蒸馏的模型与真实Label的交叉熵损失函数 | —— | —— |
+| DistillationGTCELoss.weight | Loss权重 | 1.0 | float |
+| DistillationGTCELoss.model_names | 与真实label作交叉熵的模型名字 | ["Student"] | —— |
+
+
+#### 2.3 评估指标(Metric)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| DistillationTopkAcc | DistillationTopkAcc | 包含model_key和topk两个参数 | —— |
+| DistillationTopkAcc.model_key | 被评估的模型 | "Student" | "Teacher" |
+| DistillationTopkAcc.topk | Topk的值 | [1, 5] | list, int |
+
+**注**:`DistillationTopkAcc`与普通`TopkAcc`含义相同,只是专用于蒸馏任务。
+
+### 3. 识别模型
+
+**注**:此处以`ResNet50`在`LogoDet-3k`上的训练配置为例,详解各个参数的意义。[配置路径](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)。这里只介绍与分类模型有区别的参数。
+
+#### 3.1 结构(Arch)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
+| name | 模型结构 | "RecModel" | ["RecModel"] |
+| infer_output_key | inference时的输出值 | "feature" | ["feature", "logits"] |
+| infer_add_softmax | inference时是否添加softmax | False | [True, False] |
+| Backbone.name | Backbone的名字 | ResNet50_last_stage_stride1 | PaddleClas提供的其他backbone |
+| Backbone.pretrained | Backbone预训练模型 | True | 布尔值或者预训练模型路径 |
+| BackboneStopLayer.name | Backbone中的输出层名字 | —— | Backbone中的特征输出层的`full_name` |
+| Neck.name | 网络Neck部分名字 | VehicleNeck | 需传入字典结构,Neck网络层的具体输入参数 |
+| Neck.in_channels | 输入Neck部分的维度大小 | 2048 | 与BackboneStopLayer.name层的大小相同 |
+| Neck.out_channels | 输出Neck部分的维度大小,即特征维度大小 | 512 | int |
+| Head.name | 网络Head部分名字 | CircleMargin | Arcmargin等 |
+| Head.embedding_size | 特征维度大小 | 512 | 与Neck.out_channels保持一致 |
+| Head.class_num | 类别数 | 3000 | int |
+| Head.margin | CircleMargin中的margin值 | 0.35 | float |
+| Head.scale | CircleMargin中的scale值 | 64 | int |
+
+**注**:
+
+1.在PaddleClas中,`Neck`部分是Backbone与embedding层的连接部分,`Head`部分是embedding层与分类层的连接部分。
+
+2.`BackboneStopLayer.name`的获取方式可以通过将模型可视化后获取,可视化方式可以参考[Netron](https://github.com/lutzroeder/netron)或者[visualdl](https://github.com/PaddlePaddle/VisualDL)。
+
+3.调用`tools/export_model.py`会将模型的权重转为inference model,其中`infer_add_softmax`参数会控制是否在其后增加`Softmax`激活函数,代码中默认为`True`(分类任务中最后的输出层会接`Softmax`激活函数),识别任务中特征层无须接激活函数,此处要设置为`False`。
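+
+Arch部分的yaml写法示例如下(示意,其中`BackboneStopLayer.name`的取值仅为示例,实际名字需按注2的方式可视化模型后确定):
+
+```
+Arch:
+  name: RecModel
+  infer_output_key: feature
+  infer_add_softmax: False
+  Backbone:
+    name: ResNet50_last_stage_stride1
+    pretrained: True
+  BackboneStopLayer:
+    name: "adaptive_avg_pool2d_0"   # 仅为示例,需根据实际Backbone确定
+  Neck:
+    name: VehicleNeck
+    in_channels: 2048
+    out_channels: 512
+  Head:
+    name: CircleMargin
+    embedding_size: 512
+    class_num: 3000
+    margin: 0.35
+    scale: 64
+```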
+
+#### 3.2 评估指标(Metric)
+
+| 参数名字 | 具体含义 | 默认值 | 可选值 |
+|:---:|:---:|:---:|:---:|
+| Recallk| 召回率 | [1, 5] | list, int |
+| mAP| 平均检索精度 | None | None |
diff --git a/ppcls/arch/backbone/legendary_models/resnet.py b/ppcls/arch/backbone/legendary_models/resnet.py
index e2453f8dd53287831f6af5e1d1dc3b3f685b1cb6..5417e2d970e74a690ada1c5a4f3d7fdedf238d11 100644
--- a/ppcls/arch/backbone/legendary_models/resnet.py
+++ b/ppcls/arch/backbone/legendary_models/resnet.py
@@ -104,7 +104,8 @@ class ConvBNLayer(TheseusLayer):
groups=1,
is_vd_mode=False,
act=None,
- lr_mult=1.0):
+ lr_mult=1.0,
+ data_format="NCHW"):
super().__init__()
self.is_vd_mode = is_vd_mode
self.act = act
@@ -118,11 +119,13 @@ class ConvBNLayer(TheseusLayer):
padding=(filter_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(learning_rate=lr_mult),
- bias_attr=False)
+ bias_attr=False,
+ data_format=data_format)
self.bn = BatchNorm(
num_filters,
param_attr=ParamAttr(learning_rate=lr_mult),
- bias_attr=ParamAttr(learning_rate=lr_mult))
+ bias_attr=ParamAttr(learning_rate=lr_mult),
+ data_layout=data_format)
self.relu = nn.ReLU()
def forward(self, x):
@@ -136,14 +139,14 @@ class ConvBNLayer(TheseusLayer):
class BottleneckBlock(TheseusLayer):
- def __init__(
- self,
- num_channels,
- num_filters,
- stride,
- shortcut=True,
- if_first=False,
- lr_mult=1.0, ):
+ def __init__(self,
+ num_channels,
+ num_filters,
+ stride,
+ shortcut=True,
+ if_first=False,
+ lr_mult=1.0,
+ data_format="NCHW"):
super().__init__()
self.conv0 = ConvBNLayer(
@@ -151,20 +154,23 @@ class BottleneckBlock(TheseusLayer):
num_filters=num_filters,
filter_size=1,
act="relu",
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
self.conv1 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters,
filter_size=3,
stride=stride,
act="relu",
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
self.conv2 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters * 4,
filter_size=1,
act=None,
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
@@ -173,7 +179,8 @@ class BottleneckBlock(TheseusLayer):
filter_size=1,
stride=stride if if_first else 1,
is_vd_mode=False if if_first else True,
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
self.relu = nn.ReLU()
self.shortcut = shortcut
@@ -199,7 +206,8 @@ class BasicBlock(TheseusLayer):
stride,
shortcut=True,
if_first=False,
- lr_mult=1.0):
+ lr_mult=1.0,
+ data_format="NCHW"):
super().__init__()
self.stride = stride
@@ -209,13 +217,15 @@ class BasicBlock(TheseusLayer):
filter_size=3,
stride=stride,
act="relu",
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
self.conv1 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters,
filter_size=3,
act=None,
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
num_channels=num_channels,
@@ -223,7 +233,8 @@ class BasicBlock(TheseusLayer):
filter_size=1,
stride=stride if if_first else 1,
is_vd_mode=False if if_first else True,
- lr_mult=lr_mult)
+ lr_mult=lr_mult,
+ data_format=data_format)
self.shortcut = shortcut
self.relu = nn.ReLU()
@@ -256,7 +267,9 @@ class ResNet(TheseusLayer):
config,
version="vb",
class_num=1000,
- lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0]):
+ lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0],
+ data_format="NCHW",
+ input_image_channel=3):
super().__init__()
self.cfg = config
@@ -279,22 +292,25 @@ class ResNet(TheseusLayer):
self.stem_cfg = {
#num_channels, num_filters, filter_size, stride
- "vb": [[3, 64, 7, 2]],
- "vd": [[3, 32, 3, 2], [32, 32, 3, 1], [32, 64, 3, 1]]
+ "vb": [[input_image_channel, 64, 7, 2]],
+ "vd":
+ [[input_image_channel, 32, 3, 2], [32, 32, 3, 1], [32, 64, 3, 1]]
}
- self.stem = nn.Sequential(*[
+ self.stem = nn.Sequential(* [
ConvBNLayer(
num_channels=in_c,
num_filters=out_c,
filter_size=k,
stride=s,
act="relu",
- lr_mult=self.lr_mult_list[0])
+ lr_mult=self.lr_mult_list[0],
+ data_format=data_format)
for in_c, out_c, k, s in self.stem_cfg[version]
])
- self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.max_pool = MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
block_list = []
for block_idx in range(len(self.block_depth)):
shortcut = False
@@ -306,11 +322,12 @@ class ResNet(TheseusLayer):
stride=2 if i == 0 and block_idx != 0 else 1,
shortcut=shortcut,
if_first=block_idx == i == 0 if version == "vd" else True,
- lr_mult=self.lr_mult_list[block_idx + 1]))
+ lr_mult=self.lr_mult_list[block_idx + 1],
+ data_format=data_format))
shortcut = True
self.blocks = nn.Sequential(*block_list)
- self.avg_pool = AdaptiveAvgPool2D(1)
+ self.avg_pool = AdaptiveAvgPool2D(1, data_format=data_format)
self.flatten = nn.Flatten()
self.avg_pool_channels = self.num_channels[-1] * 2
stdv = 1.0 / math.sqrt(self.avg_pool_channels * 1.0)
@@ -319,13 +336,19 @@ class ResNet(TheseusLayer):
self.class_num,
weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)))
+ self.data_format = data_format
+
def forward(self, x):
- x = self.stem(x)
- x = self.max_pool(x)
- x = self.blocks(x)
- x = self.avg_pool(x)
- x = self.flatten(x)
- x = self.fc(x)
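+ # run the forward pass under the fp16 guard so that, when pure-fp16
+ # training is enabled, these ops are eligible for half-precision execution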
+ with paddle.static.amp.fp16_guard():
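+ # for NHWC training the input arrives as NCHW, so transpose it once here;
+ # stop_gradient marks the transposed input so no gradient is computed for it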
+ if self.data_format == "NHWC":
+ x = paddle.transpose(x, [0, 2, 3, 1])
+ x.stop_gradient = True
+ x = self.stem(x)
+ x = self.max_pool(x)
+ x = self.blocks(x)
+ x = self.avg_pool(x)
+ x = self.flatten(x)
+ x = self.fc(x)
return x
diff --git a/ppcls/arch/gears/cosmargin.py b/ppcls/arch/gears/cosmargin.py
index 1d3ff83a78f83a846316bd2c953ea08b63179fb6..51db550868352e2ef20c3accd1fa4dc92d64321d 100644
--- a/ppcls/arch/gears/cosmargin.py
+++ b/ppcls/arch/gears/cosmargin.py
@@ -38,7 +38,7 @@ class CosMargin(paddle.nn.Layer):
input_norm = paddle.sqrt(
paddle.sum(paddle.square(input), axis=1, keepdim=True))
- input = paddle.divide(input, x_norm)
+ input = paddle.divide(input, input_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
diff --git a/ppcls/configs/ImageNet/DPN/DPN107.yaml b/ppcls/configs/ImageNet/DPN/DPN107.yaml
index 239da60ea75df3cb17ba57a526cf30d224cf9a79..92c1fb8144ec4904302f079ac0b310038bfda4b0 100644
--- a/ppcls/configs/ImageNet/DPN/DPN107.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN107.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/DPN/DPN131.yaml b/ppcls/configs/ImageNet/DPN/DPN131.yaml
index ff81e4fe68ce6de500e2d48965ef5b53517e5419..3cb22f60dfa82d9a4603450eb7cbe3fb47e3a735 100644
--- a/ppcls/configs/ImageNet/DPN/DPN131.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN131.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/DPN/DPN68.yaml b/ppcls/configs/ImageNet/DPN/DPN68.yaml
index fd7dc147ba916d1c2f680b113dc33e95f1c46014..ecd2d8540f02d2780f15898da33084f8293f8f36 100644
--- a/ppcls/configs/ImageNet/DPN/DPN68.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN68.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/DPN/DPN92.yaml b/ppcls/configs/ImageNet/DPN/DPN92.yaml
index 3559e8f1943320c89268f7922dd5c72ded105823..c431efcf4e5eab76efe4ec4b64886d540d495da7 100644
--- a/ppcls/configs/ImageNet/DPN/DPN92.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN92.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/DPN/DPN98.yaml b/ppcls/configs/ImageNet/DPN/DPN98.yaml
index 11af4926294efd1780ad882f58b7ac09984c14bd..9fb1ec9f6e9a35badf44a1e3cb46b2c58b122291 100644
--- a/ppcls/configs/ImageNet/DPN/DPN98.yaml
+++ b/ppcls/configs/ImageNet/DPN/DPN98.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml b/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
index 1a55e75d4661b1759dc46ac2203ad5f1e2ceb2fb..b69ccfcfdbb8045985977bc374b02b422e3e5c23 100644
--- a/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
+++ b/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml b/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml
index 6ab79d35c43e49e66d17372597875ed140ef4189..918a7629440f879f012d0fb7240fb3dd2a379b2f 100644
--- a/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml
+++ b/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
Eval:
- CELoss:
diff --git a/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml b/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml
index 448440ecfff0a5249bfe1a61ce8b5d06cd881e12..b12567150d8f788fd8abeaf4d6adb75e0b10d12f 100644
--- a/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml
+++ b/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
Eval:
- CELoss:
diff --git a/ppcls/configs/ImageNet/Inception/InceptionV3.yaml b/ppcls/configs/ImageNet/Inception/InceptionV3.yaml
index a8c30ea1a497c9897299ca9526e17c22fe555977..fa8b64a5aaaeb0914e8ecbed80176a4e40014883 100644
--- a/ppcls/configs/ImageNet/Inception/InceptionV3.yaml
+++ b/ppcls/configs/ImageNet/Inception/InceptionV3.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Inception/InceptionV4.yaml b/ppcls/configs/ImageNet/Inception/InceptionV4.yaml
index 17415b3cefb1a7b7e56db5545e754ffb7fd3d0ff..6a6dbb62d79a658bf564f7c55c86a2b9d963f645 100644
--- a/ppcls/configs/ImageNet/Inception/InceptionV4.yaml
+++ b/ppcls/configs/ImageNet/Inception/InceptionV4.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml
index bf27b303d72654e3ff835280d192370c6e9e6b8d..7e5cbfd3cae04673e28dfadef404b2c1650064b4 100644
--- a/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml
index 90b7b879c8f92eeb3c80537a5f583b8323fd5082..edceda10f7c69a4b74f7b2a5b5dd697e095d708f 100644
--- a/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml
index af1c4c73e6ff6449de94a61e88a2286a6f04db55..1f3ecde91ad0853c9a25a4c858dcf37203d29a15 100644
--- a/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml
index e792e9d03b8be13223ad684dac50f101ce99603a..31ad95e65443a824099ee1fcaff6ff0e2a44cc78 100644
--- a/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml
index 58d4968b43f7571fc633f6e29748136fec94c9ad..1157ac0c877fa070292ee95afcf066acb4839361 100644
--- a/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml
+++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml b/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml
index 7b3bc2bdb6fb4c6d32a1d95d972407f9cf58f4c7..9daaac25129256f3cb06d8b0ac275dcd4dc4e4d2 100644
--- a/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml
+++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml
index acf55ec789fb37c9e0d0e34bc295191959d704bd..24c82b5bf912df0b918c89a5c7e510baa49f94e2 100644
--- a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml
+++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml
index 9488195d833c7189b0551051190b05ed395f8241..e761cc2d89423a2e28affb4f45d5592c90c3a0c1 100644
--- a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml
+++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml
index c400b9e288af6bb5bf2c9d2eff23895b6efb62b6..4ac6ab70b34cae4368fca339f4f4f518e55007f7 100644
--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
index 4f5f3c79353aca99ac800e79f2112839cb92d11a..1754e63a43f380e0764fc64875043310630c66f5 100644
--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml
index d3054143d0ab858b83d377d8b30bd022e6848c0b..5cfb972f8314af192f4daf6c875818772881cd29 100644
--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml
index c8b76d0f6846e0d425cee2c92ed2aa529c87598d..a95907312c5fe89ab76661aad7882702b1622d38 100644
--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml
index 3a03646f564ea00b53ce5e21ad166ac37eb3f2f4..466dfb361a4e6010add90dcb9a8fa1f9a7b8e12f 100644
--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml
index c9b9a1015eb701d3846466cb7be1c7b134dfbad0..d2a2f86ee25d889f4def86927922c8855a71a8cb 100644
--- a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml
+++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml
index f30ca07774bba5e457f77f6213be990045ce6b86..83d1fc028fee679327f2b835e3401cb94371225c 100644
--- a/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml
index f3168c432a0154fbf25c7548499984433bc2abc5..e09bb60c940e732232c5cc6c048ff8bc8722fb22 100644
--- a/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml
index 2dc6bba0c772e0fe8d73d2aad25aa61cc6c3c0da..e0ba71a6e80de43926f8bf5f152af06332bb7395 100644
--- a/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml
index a52c83748cb4dddbb0c2e93899c067fe3a3e40b3..98de87e3274ae2bc1cfe419f6cf01ff232ac4936 100644
--- a/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml
index daae960b596a7f76b599f96c327dbd698180b766..9ff0717113a3274e110f3ce4d2a0f773eb8562a7 100644
--- a/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml b/ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e58539b466f3b59d637f4f6a8e7188f6a9b8b876
--- /dev/null
+++ b/ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml
@@ -0,0 +1,147 @@
+# global configs
+Global:
+ checkpoints: null
+ pretrained_model: null
+ output_dir: ./output/
+ device: gpu
+ save_interval: 1
+ eval_during_train: True
+ eval_interval: 1
+ epochs: 120
+ print_batch_step: 10
+ use_visualdl: False
+ # used for static mode and model export
+ image_channel: &image_channel 4
+ image_shape: [*image_channel, 224, 224]
+ save_inference_dir: ./inference
+ # training model under @to_static
+ to_static: False
+
+# mixed precision training
+AMP:
+ scale_loss: 128.0
+ use_dynamic_loss_scaling: True
+ use_pure_fp16: &use_pure_fp16 True
+
+# model architecture
+Arch:
+ name: ResNet50
+ class_num: 1000
+ input_image_channel: *image_channel
+ data_format: "NHWC"
+
+# loss function config for traing/eval process
+Loss:
+ Train:
+ - CELoss:
+ weight: 1.0
+ Eval:
+ - CELoss:
+ weight: 1.0
+
+Optimizer:
+ name: Momentum
+ momentum: 0.9
+ multi_precision: *use_pure_fp16
+ lr:
+ name: Piecewise
+ learning_rate: 0.1
+ decay_epochs: [30, 60, 90]
+ values: [0.1, 0.01, 0.001, 0.0001]
+ regularizer:
+ name: 'L2'
+ coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+ Train:
+ dataset:
+ name: ImageNetDataset
+ image_root: ./dataset/ILSVRC2012/
+ cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+ transform_ops:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - RandCropImage:
+ size: 224
+ - RandFlipImage:
+ flip_code: 1
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+
+ sampler:
+ name: DistributedBatchSampler
+ batch_size: 32
+ drop_last: False
+ shuffle: True
+ loader:
+ num_workers: 4
+ use_shared_memory: True
+
+ Eval:
+ dataset:
+ name: ImageNetDataset
+ image_root: ./dataset/ILSVRC2012/
+ cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+ transform_ops:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ sampler:
+ name: DistributedBatchSampler
+ batch_size: 64
+ drop_last: False
+ shuffle: False
+ loader:
+ num_workers: 4
+ use_shared_memory: True
+
+Infer:
+ infer_imgs: docs/images/whl/demo.jpg
+ batch_size: 10
+ transforms:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ - ToCHWImage:
+ PostProcess:
+ name: Topk
+ topk: 5
+ class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+ Train:
+ - TopkAcc:
+ topk: [1, 5]
+ Eval:
+ - TopkAcc:
+ topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml b/ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..59d4bcec28c19b97434a1a7b93bfdaf2234d482f
--- /dev/null
+++ b/ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
@@ -0,0 +1,148 @@
+# global configs
+Global:
+ checkpoints: null
+ pretrained_model: null
+ output_dir: ./output/
+ device: gpu
+ save_interval: 1
+ eval_during_train: True
+ eval_interval: 1
+ epochs: 120
+ print_batch_step: 10
+ use_visualdl: False
+ image_channel: &image_channel 4
+ # used for static mode and model export
+ image_shape: [*image_channel, 224, 224]
+ save_inference_dir: ./inference
+ # training model under @to_static
+ to_static: False
+ use_dali: True
+
+# mixed precision training
+AMP:
+ scale_loss: 128.0
+ use_dynamic_loss_scaling: True
+ use_pure_fp16: &use_pure_fp16 False
+
+# model architecture
+Arch:
+ name: ResNet50
+ class_num: 1000
+ input_image_channel: *image_channel
+ data_format: "NHWC"
+
+# loss function config for traing/eval process
+Loss:
+ Train:
+ - CELoss:
+ weight: 1.0
+ Eval:
+ - CELoss:
+ weight: 1.0
+
+
+Optimizer:
+ name: Momentum
+ momentum: 0.9
+ lr:
+ name: Piecewise
+ learning_rate: 0.1
+ decay_epochs: [30, 60, 90]
+ values: [0.1, 0.01, 0.001, 0.0001]
+ regularizer:
+ name: 'L2'
+ coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+ Train:
+ dataset:
+ name: ImageNetDataset
+ image_root: ./dataset/ILSVRC2012/
+ cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+ transform_ops:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - RandCropImage:
+ size: 224
+ - RandFlipImage:
+ flip_code: 1
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+
+ sampler:
+ name: DistributedBatchSampler
+ batch_size: 256
+ drop_last: False
+ shuffle: True
+ loader:
+ num_workers: 4
+ use_shared_memory: True
+
+ Eval:
+ dataset:
+ name: ImageNetDataset
+ image_root: ./dataset/ILSVRC2012/
+ cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+ transform_ops:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ sampler:
+ name: DistributedBatchSampler
+ batch_size: 64
+ drop_last: False
+ shuffle: False
+ loader:
+ num_workers: 4
+ use_shared_memory: True
+
+Infer:
+ infer_imgs: docs/images/whl/demo.jpg
+ batch_size: 10
+ transforms:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ - ToCHWImage:
+ PostProcess:
+ name: Topk
+ topk: 5
+ class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+ Train:
+ - TopkAcc:
+ topk: [1, 5]
+ Eval:
+ - TopkAcc:
+ topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
index 0a2c4aa4e740809c69a6af44b6b6656f66ebc01b..ba38350bfddc54f938515d402d7cd2ad94834e7a 100644
--- a/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
+++ b/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml b/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml
index f7f1ba0f99060c443379d386f395656009fb81a8..f8255a977d448d0644023beabca94f5d7e489e9a 100644
--- a/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml
index 3b09c3fd365537e78abe1e6a2e1db7c7d5bf3daa..bf27461845b37ef1c0a934f8e8da08a955bb6705 100644
--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_fp16.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_fp16.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..540858698a543f6185ab98590e227279ccd84845
--- /dev/null
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d_fp16.yaml
@@ -0,0 +1,140 @@
+# global configs
+Global:
+ checkpoints: null
+ pretrained_model: null
+ output_dir: ./output/
+ device: gpu
+ save_interval: 1
+ eval_during_train: True
+ eval_interval: 1
+ epochs: 200
+ print_batch_step: 10
+ use_visualdl: False
+ # used for static mode and model export
+ image_channel: &image_channel 4
+ image_shape: [*image_channel, 224, 224]
+ save_inference_dir: ./inference
+
+# model architecture
+Arch:
+ name: SE_ResNeXt101_32x4d
+ class_num: 1000
+ input_image_channel: *image_channel
+
+# loss function config for traing/eval process
+Loss:
+ Train:
+ - CELoss:
+ weight: 1.0
+ epsilon: 0.1
+ Eval:
+ - CELoss:
+ weight: 1.0
+
+# mixed precision training
+AMP:
+ scale_loss: 128.0
+ use_dynamic_loss_scaling: True
+ use_pure_fp16: &use_pure_fp16 True
+
+Optimizer:
+ name: Momentum
+ momentum: 0.9
+ lr:
+ name: Cosine
+ learning_rate: 0.1
+ regularizer:
+ name: 'L2'
+ coeff: 0.00007
+
+# data loader for train and eval
+DataLoader:
+ Train:
+ dataset:
+ name: ImageNetDataset
+ image_root: ./dataset/ILSVRC2012/
+ cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+ transform_ops:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - RandCropImage:
+ size: 224
+ - RandFlipImage:
+ flip_code: 1
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ sampler:
+ name: DistributedBatchSampler
+ batch_size: 64
+ drop_last: False
+ shuffle: True
+ loader:
+ num_workers: 4
+ use_shared_memory: True
+
+ Eval:
+ dataset:
+ name: ImageNetDataset
+ image_root: ./dataset/ILSVRC2012/
+ cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+ transform_ops:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ sampler:
+ name: BatchSampler
+ batch_size: 64
+ drop_last: False
+ shuffle: False
+ loader:
+ num_workers: 4
+ use_shared_memory: True
+
+Infer:
+ infer_imgs: docs/images/whl/demo.jpg
+ batch_size: 10
+ transforms:
+ - DecodeImage:
+ to_rgb: True
+ channel_first: False
+ - ResizeImage:
+ resize_short: 256
+ - CropImage:
+ size: 224
+ - NormalizeImage:
+ scale: 1.0/255.0
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ order: ''
+ output_fp16: *use_pure_fp16
+ channel_num: *image_channel
+ - ToCHWImage:
+ PostProcess:
+ name: Topk
+ topk: 5
+ class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+ Train:
+ - TopkAcc:
+ topk: [1, 5]
+ Eval:
+ - TopkAcc:
+ topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml
index d04f298a3aa42adbaeabde2ff0afde653fd9d9bb..2c1286927ab503aba6994794f5fb3e68f36f7a78 100644
--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml
index cabff29b423eed9a4861b41f35e0b75ed9c239b3..48e6e4206c2f626ba38ff0ff4bde5507453d9379 100644
--- a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml
index fcaada9342a364a7c35efdf915fdc18a799ddd95..20b3a0c40805e7706aa600f907f169e073b9496f 100644
--- a/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
index 69d15ccaf79d804f200241208aaf39d551f2c14a..7280e32441f781e8ed1d08cd6e136b97a3122dc1 100644
--- a/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
index f670c159e4f41ebbb8e2eb70486785662de06840..030dff93b963322aba5f47b1ce2bddd1cd35b8e2 100644
--- a/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for traing/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Xception/Xception65.yaml b/ppcls/configs/ImageNet/Xception/Xception65.yaml
index f9217cf77395f4033955dab09814489f55b3ecd8..c94b28506a3e4969dc037f70f9dc940172618b6e 100644
--- a/ppcls/configs/ImageNet/Xception/Xception65.yaml
+++ b/ppcls/configs/ImageNet/Xception/Xception65.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/ImageNet/Xception/Xception71.yaml b/ppcls/configs/ImageNet/Xception/Xception71.yaml
index 7475a5f9588c2b48b2949d12771f844fbdcb181e..bda7ecfe9a2b6f3bedc0e8296b9eca4819235b84 100644
--- a/ppcls/configs/ImageNet/Xception/Xception71.yaml
+++ b/ppcls/configs/ImageNet/Xception/Xception71.yaml
@@ -22,7 +22,7 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- - CELoss:
+ - MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
diff --git a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
index 0a0e272505f26ac0cda105d1e79e8e9989f1c301..b29a3a3f0e53099b06cd4aa2994d3bbd209a0467 100644
--- a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
+++ b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
@@ -1,9 +1,7 @@
# global configs
Global:
checkpoints: null
-# please download pretrained model via this link:
-# https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams
- pretrained_model: product_ResNet50_vd_Aliproduct_v1.0_pretrained
+ pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams"
output_dir: ./output/
device: gpu
save_interval: 10
diff --git a/ppcls/configs/Products/ResNet50_vd_SOP.yaml b/ppcls/configs/Products/ResNet50_vd_SOP.yaml
index 795fb0265073d4135ed6c479bd3dfdc6ec71d67b..484b6ff85dced690f37a3721567515c0da2370d7 100644
--- a/ppcls/configs/Products/ResNet50_vd_SOP.yaml
+++ b/ppcls/configs/Products/ResNet50_vd_SOP.yaml
@@ -1,9 +1,7 @@
# global configs
Global:
checkpoints: null
-# please download pretrained model via this link:
-# https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams
- pretrained_model: product_ResNet50_vd_Aliproduct_v1.0_pretrained
+ pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams"
output_dir: ./output/
device: gpu
save_interval: 10
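
Both Products configs now point `pretrained_model` at the hosted `.pdparams` URL instead of a local file name; the trainer change further down dispatches on the `http` prefix. A minimal sketch of what that amounts to (the real loaders are `load_dygraph_pretrain` / `load_dygraph_pretrain_from_url`; `load_pretrained` here is a hypothetical stand-in):

```python
import paddle
from paddle.utils.download import get_weights_path_from_url

def load_pretrained(model, pretrained):
    # URL -> download once (cached) then load; plain name -> local .pdparams
    if pretrained.startswith("http"):
        local_path = get_weights_path_from_url(pretrained)
    else:
        local_path = pretrained + ".pdparams"
    model.set_state_dict(paddle.load(local_path))
```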
diff --git a/ppcls/data/__init__.py b/ppcls/data/__init__.py
index 9e896e0e7fdfc7ab3c6498186fef86426af8b21a..8d507330f1accfd1ffbab21ad04b82f1995c5699 100644
--- a/ppcls/data/__init__.py
+++ b/ppcls/data/__init__.py
@@ -53,10 +53,14 @@ def create_operators(params):
return ops
-def build_dataloader(config, mode, device, seed=None):
+def build_dataloader(config, mode, device, use_dali=False, seed=None):
assert mode in ['Train', 'Eval', 'Test', 'Gallery', 'Query'
], "Mode should be Train, Eval, Test, Gallery, Query"
# build dataset
+ if use_dali:
+ from ppcls.data.dataloader.dali import dali_dataloader
+ return dali_dataloader(config, mode, paddle.device.get_device(), seed)
+
config_dataset = config[mode]['dataset']
config_dataset = copy.deepcopy(config_dataset)
dataset_name = config_dataset.pop('name')
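
Callers opt into DALI through the new `use_dali` argument; everything else in the config stays the same. A usage sketch, assuming the repository is on PYTHONPATH and a GPU is available:

```python
import paddle
from ppcls.data import build_dataloader
from ppcls.utils.config import get_config

config = get_config("ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml",
                    overrides=["Global.use_dali=True"], show=False)
device = paddle.set_device("gpu")
use_dali = config["Global"].get("use_dali", False)
train_loader = build_dataloader(config["DataLoader"], "Train", device, use_dali)
```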
diff --git a/ppcls/data/dataloader/dali.py b/ppcls/data/dataloader/dali.py
new file mode 100644
index 0000000000000000000000000000000000000000..a15c231568a97fd607f2ada4f5f6e81fa084cc62
--- /dev/null
+++ b/ppcls/data/dataloader/dali.py
@@ -0,0 +1,319 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import division
+
+import copy
+import os
+
+import numpy as np
+import nvidia.dali.ops as ops
+import nvidia.dali.types as types
+import paddle
+from nvidia.dali import fn
+from nvidia.dali.pipeline import Pipeline
+from nvidia.dali.plugin.base_iterator import LastBatchPolicy
+from nvidia.dali.plugin.paddle import DALIGenericIterator
+
+
+class HybridTrainPipe(Pipeline):
+ def __init__(self,
+ file_root,
+ file_list,
+ batch_size,
+ resize_shorter,
+ crop,
+ min_area,
+ lower,
+ upper,
+ interp,
+ mean,
+ std,
+ device_id,
+ shard_id=0,
+ num_shards=1,
+ random_shuffle=True,
+ num_threads=4,
+ seed=42,
+ pad_output=False,
+ output_dtype=types.FLOAT,
+ dataset='Train'):
+ super(HybridTrainPipe, self).__init__(
+ batch_size, num_threads, device_id, seed=seed)
+ self.input = ops.readers.File(
+ file_root=file_root,
+ file_list=file_list,
+ shard_id=shard_id,
+ num_shards=num_shards,
+ random_shuffle=random_shuffle)
+ # set internal nvJPEG buffers size to handle full-sized ImageNet images
+ # without additional reallocations
+ device_memory_padding = 211025920
+ host_memory_padding = 140544512
+ self.decode = ops.decoders.ImageRandomCrop(
+ device='mixed',
+ output_type=types.DALIImageType.RGB,
+ device_memory_padding=device_memory_padding,
+ host_memory_padding=host_memory_padding,
+ random_aspect_ratio=[lower, upper],
+ random_area=[min_area, 1.0],
+ num_attempts=100)
+ self.res = ops.Resize(
+ device='gpu', resize_x=crop, resize_y=crop, interp_type=interp)
+ self.cmnp = ops.CropMirrorNormalize(
+ device="gpu",
+ dtype=output_dtype,
+ output_layout='CHW',
+ crop=(crop, crop),
+ mean=mean,
+ std=std,
+ pad_output=pad_output)
+ self.coin = ops.random.CoinFlip(probability=0.5)
+ self.to_int64 = ops.Cast(dtype=types.DALIDataType.INT64, device="gpu")
+
+ def define_graph(self):
+ rng = self.coin()
+ jpegs, labels = self.input(name="Reader")
+ images = self.decode(jpegs)
+ images = self.res(images)
+ output = self.cmnp(images.gpu(), mirror=rng)
+ return [output, self.to_int64(labels.gpu())]
+
+ def __len__(self):
+ return self.epoch_size("Reader")
+
+
+class HybridValPipe(Pipeline):
+ def __init__(self,
+ file_root,
+ file_list,
+ batch_size,
+ resize_shorter,
+ crop,
+ interp,
+ mean,
+ std,
+ device_id,
+ shard_id=0,
+ num_shards=1,
+ random_shuffle=False,
+ num_threads=4,
+ seed=42,
+ pad_output=False,
+ output_dtype=types.FLOAT):
+ super(HybridValPipe, self).__init__(
+ batch_size, num_threads, device_id, seed=seed)
+ self.input = ops.readers.File(
+ file_root=file_root,
+ file_list=file_list,
+ shard_id=shard_id,
+ num_shards=num_shards,
+ random_shuffle=random_shuffle)
+ self.decode = ops.decoders.Image(device="mixed")
+ self.res = ops.Resize(
+ device="gpu", resize_shorter=resize_shorter, interp_type=interp)
+ self.cmnp = ops.CropMirrorNormalize(
+ device="gpu",
+ dtype=output_dtype,
+ output_layout='CHW',
+ crop=(crop, crop),
+ mean=mean,
+ std=std,
+ pad_output=pad_output)
+ self.to_int64 = ops.Cast(dtype=types.DALIDataType.INT64, device="gpu")
+
+ def define_graph(self):
+ jpegs, labels = self.input(name="Reader")
+ images = self.decode(jpegs)
+ images = self.res(images)
+ output = self.cmnp(images)
+ return [output, self.to_int64(labels.gpu())]
+
+ def __len__(self):
+ return self.epoch_size("Reader")
+
+
+def dali_dataloader(config, mode, device, seed=None):
+ assert "gpu" in device, "gpu training is required for DALI"
+ device_id = int(device.split(':')[1])
+ config_dataloader = config[mode]
+ seed = 42 if seed is None else seed
+ ops = [
+ list(x.keys())[0]
+ for x in config_dataloader["dataset"]["transform_ops"]
+ ]
+ support_ops_train = [
+ "DecodeImage", "NormalizeImage", "RandFlipImage", "RandCropImage"
+ ]
+ support_ops_eval = [
+ "DecodeImage", "ResizeImage", "CropImage", "NormalizeImage"
+ ]
+
+ if mode.lower() == 'train':
+ assert set(ops) == set(
+ support_ops_train
+ ), "The supported trasform_ops for train_dataset in dali is : {}".format(
+ ",".join(support_ops_train))
+ else:
+ assert set(ops) == set(
+ support_ops_eval
+ ), "The supported trasform_ops for eval_dataset in dali is : {}".format(
+ ",".join(support_ops_eval))
+
+ normalize_ops = [
+ op for op in config_dataloader["dataset"]["transform_ops"]
+ if "NormalizeImage" in op
+ ][0]["NormalizeImage"]
+ channel_num = normalize_ops.get("channel_num", 3)
+ output_dtype = types.FLOAT16 if normalize_ops.get("output_fp16",
+ False) else types.FLOAT
+
+ env = os.environ
+ # assert float(env.get('FLAGS_fraction_of_gpu_memory_to_use', 0.92)) < 0.9, \
+ # "Please leave enough GPU memory for DALI workspace, e.g., by setting" \
+ # " `export FLAGS_fraction_of_gpu_memory_to_use=0.8`"
+
+ gpu_num = paddle.distributed.get_world_size()
+
+ batch_size = config_dataloader["sampler"]["batch_size"]
+
+ file_root = config_dataloader["dataset"]["image_root"]
+ file_list = config_dataloader["dataset"]["cls_label_path"]
+
+ interp = 1 # default to linear interpolation (cv2.INTER_LINEAR)
+ interp_map = {
+ 0: types.DALIInterpType.INTERP_NN, # cv2.INTER_NEAREST
+ 1: types.DALIInterpType.INTERP_LINEAR, # cv2.INTER_LINEAR
+ 2: types.DALIInterpType.INTERP_CUBIC, # cv2.INTER_CUBIC
+ 4: types.DALIInterpType.
+ INTERP_LANCZOS3, # use LANCZOS3 as the closest DALI match for cv2.INTER_LANCZOS4
+ }
+
+ assert interp in interp_map, "interpolation method not supported by DALI"
+ interp = interp_map[interp]
+ pad_output = channel_num == 4
+
+ transforms = {
+ k: v
+ for d in config_dataloader["dataset"]["transform_ops"]
+ for k, v in d.items()
+ }
+
+ scale = transforms["NormalizeImage"].get("scale", 1.0 / 255)
+ scale = eval(scale) if isinstance(scale, str) else scale
+ mean = transforms["NormalizeImage"].get("mean", [0.485, 0.456, 0.406])
+ std = transforms["NormalizeImage"].get("std", [0.229, 0.224, 0.225])
+ mean = [v / scale for v in mean]
+ std = [v / scale for v in std]
+
+ sampler_name = config_dataloader["sampler"].get("name",
+ "DistributedBatchSampler")
+ assert sampler_name in ["DistributedBatchSampler", "BatchSampler"]
+
+ if mode.lower() == "train":
+ resize_shorter = 256
+ crop = transforms["RandCropImage"]["size"]
+ scale = transforms["RandCropImage"].get("scale", [0.08, 1.])
+ ratio = transforms["RandCropImage"].get("ratio", [3.0 / 4, 4.0 / 3])
+ min_area = scale[0]
+ lower = ratio[0]
+ upper = ratio[1]
+
+ if 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env:
+ shard_id = int(env['PADDLE_TRAINER_ID'])
+ num_shards = int(env['PADDLE_TRAINERS_NUM'])
+ device_id = int(env['FLAGS_selected_gpus'])
+ pipe = HybridTrainPipe(
+ file_root,
+ file_list,
+ batch_size,
+ resize_shorter,
+ crop,
+ min_area,
+ lower,
+ upper,
+ interp,
+ mean,
+ std,
+ device_id,
+ shard_id,
+ num_shards,
+ seed=seed + shard_id,
+ pad_output=pad_output,
+ output_dtype=output_dtype)
+ pipe.build()
+ pipelines = [pipe]
+ # sample_per_shard = len(pipe) // num_shards
+ else:
+ pipe = HybridTrainPipe(
+ file_root,
+ file_list,
+ batch_size,
+ resize_shorter,
+ crop,
+ min_area,
+ lower,
+ upper,
+ interp,
+ mean,
+ std,
+ device_id=device_id,
+ shard_id=0,
+ num_shards=1,
+ seed=seed,
+ pad_output=pad_output,
+ output_dtype=output_dtype)
+ pipe.build()
+ pipelines = [pipe]
+ # sample_per_shard = len(pipelines[0])
+ return DALIGenericIterator(
+ pipelines, ['data', 'label'], reader_name='Reader')
+ else:
+ resize_shorter = transforms["ResizeImage"].get("resize_short", 256)
+ crop = transforms["CropImage"]["size"]
+ if 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env and sampler_name == "DistributedBatchSampler":
+ shard_id = int(env['PADDLE_TRAINER_ID'])
+ num_shards = int(env['PADDLE_TRAINERS_NUM'])
+ device_id = int(env['FLAGS_selected_gpus'])
+
+ pipe = HybridValPipe(
+ file_root,
+ file_list,
+ batch_size,
+ resize_shorter,
+ crop,
+ interp,
+ mean,
+ std,
+ device_id=device_id,
+ shard_id=shard_id,
+ num_shards=num_shards,
+ pad_output=pad_output,
+ output_dtype=output_dtype)
+ else:
+ pipe = HybridValPipe(
+ file_root,
+ file_list,
+ batch_size,
+ resize_shorter,
+ crop,
+ interp,
+ mean,
+ std,
+ device_id=device_id,
+ pad_output=pad_output,
+ output_dtype=output_dtype)
+ pipe.build()
+ return DALIGenericIterator(
+ [pipe], ['data', 'label'], reader_name="Reader")
diff --git a/ppcls/data/dataloader/vehicle_dataset.py b/ppcls/data/dataloader/vehicle_dataset.py
index baae63c257cd51f91b0d9d82894e0027d3d0d2df..80fc6bb08d6e2458ca2604dc91933b4a28971d2f 100644
--- a/ppcls/data/dataloader/vehicle_dataset.py
+++ b/ppcls/data/dataloader/vehicle_dataset.py
@@ -61,7 +61,8 @@ class CompCars(Dataset):
label_path = os.path.join(self._label_root,
l[0].split('.')[0] + '.txt')
assert os.path.exists(label_path)
- bbox = open(label_path).readlines()[-1].strip().split()
+ with open(label_path) as f:
+ bbox = f.readlines()[-1].strip().split()
bbox = [int(x) for x in bbox]
self.images.append(os.path.join(self._img_root, l[0]))
self.labels.append(int(l[1]))
diff --git a/ppcls/data/preprocess/ops/operators.py b/ppcls/data/preprocess/ops/operators.py
index 7c8b27f1a9493596fa5e2341e4dd789c5838d21f..b00dd139a0e62ea44576331e183855bcbf60bcd1 100644
--- a/ppcls/data/preprocess/ops/operators.py
+++ b/ppcls/data/preprocess/ops/operators.py
@@ -197,14 +197,26 @@ class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""
- def __init__(self, scale=None, mean=None, std=None, order='chw'):
+ def __init__(self,
+ scale=None,
+ mean=None,
+ std=None,
+ order='chw',
+ output_fp16=False,
+ channel_num=3):
if isinstance(scale, str):
scale = eval(scale)
+ assert channel_num in [
+ 3, 4
+ ], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
mean = mean if mean is not None else [0.485, 0.456, 0.406]
std = std if std is not None else [0.229, 0.224, 0.225]
- shape = (3, 1, 1) if order == 'chw' else (1, 1, 3)
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
self.mean = np.array(mean).reshape(shape).astype('float32')
self.std = np.array(std).reshape(shape).astype('float32')
@@ -215,7 +227,20 @@ class NormalizeImage(object):
assert isinstance(img,
np.ndarray), "invalid input 'img' in NormalizeImage"
- return (img.astype('float32') * self.scale - self.mean) / self.std
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros(
+ (1, img_h, img_w)) if self.order == 'chw' else np.zeros(
+ (img_h, img_w, 1))
+ img = (np.concatenate(
+ (img, pad_zeros), axis=0)
+ if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
class ToCHWImage(object):
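
For reference, a standalone numeric sketch of the new 4-channel/fp16 path in `NormalizeImage` (a hedged re-implementation for HWC order, not the class itself): the extra channel is zero-padded so NCHW4 kernels can consume the tensor, and the final cast yields fp16 when `output_fp16` is set.

```python
import numpy as np

img = np.random.randint(0, 256, (224, 224, 3)).astype("uint8")  # HWC input
scale = 1.0 / 255.0
mean = np.array([0.485, 0.456, 0.406]).reshape((1, 1, 3))
std = np.array([0.229, 0.224, 0.225]).reshape((1, 1, 3))
out = (img.astype("float32") * scale - mean) / std
pad = np.zeros((224, 224, 1))                    # 4th channel is all zeros
out = np.concatenate((out, pad), axis=2).astype("float16")
print(out.shape, out.dtype)                      # (224, 224, 4) float16
```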
diff --git a/ppcls/engine/trainer.py b/ppcls/engine/trainer.py
index 9db7fbcfe8ae2ac039d2cd10ce66eca66fa5fbe9..451531c1d1e6ca59e2addc1add752649e05f1e67 100644
--- a/ppcls/engine/trainer.py
+++ b/ppcls/engine/trainer.py
@@ -17,10 +17,12 @@ from __future__ import print_function
import os
import sys
import numpy as np
+
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.abspath(os.path.join(__dir__, '../../')))
import time
+import platform
import datetime
import argparse
import paddle
@@ -39,7 +41,7 @@ from ppcls.arch import apply_to_static
from ppcls.loss import build_loss
from ppcls.metric import build_metrics
from ppcls.optimizer import build_optimizer
-from ppcls.utils.save_load import load_dygraph_pretrain
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
from ppcls.utils.save_load import init_model
from ppcls.utils import save_load
@@ -77,8 +79,12 @@ class Trainer(object):
apply_to_static(self.config, self.model)
if self.config["Global"]["pretrained_model"] is not None:
- load_dygraph_pretrain(self.model,
- self.config["Global"]["pretrained_model"])
+ if self.config["Global"]["pretrained_model"].startswith("http"):
+ load_dygraph_pretrain_from_url(
+ self.model, self.config["Global"]["pretrained_model"])
+ else:
+ load_dygraph_pretrain(
+ self.model, self.config["Global"]["pretrained_model"])
if self.config["Global"]["distributed"]:
self.model = paddle.DataParallel(self.model)
@@ -98,10 +104,25 @@ class Trainer(object):
self.query_dataloader = None
self.eval_mode = self.config["Global"].get("eval_mode",
"classification")
+ self.amp = True if "AMP" in self.config else False
+ if self.amp and self.config["AMP"] is not None:
+ self.scale_loss = self.config["AMP"].get("scale_loss", 1.0)
+ self.use_dynamic_loss_scaling = self.config["AMP"].get(
+ "use_dynamic_loss_scaling", False)
+ else:
+ self.scale_loss = 1.0
+ self.use_dynamic_loss_scaling = False
+ if self.amp:
+ AMP_RELATED_FLAGS_SETTING = {
+ 'FLAGS_cudnn_batchnorm_spatial_persistent': 1,
+ 'FLAGS_max_inplace_grad_add': 8,
+ }
+ paddle.fluid.set_flags(AMP_RELATED_FLAGS_SETTING)
self.train_loss_func = None
self.eval_loss_func = None
self.train_metric_func = None
self.eval_metric_func = None
+ self.use_dali = self.config['Global'].get("use_dali", False)
def train(self):
# build train loss and metric info
@@ -116,8 +137,8 @@ class Trainer(object):
self.train_metric_func = build_metrics(metric_config)
if self.train_dataloader is None:
- self.train_dataloader = build_dataloader(self.config["DataLoader"],
- "Train", self.device)
+ self.train_dataloader = build_dataloader(
+ self.config["DataLoader"], "Train", self.device, self.use_dali)
step_each_epoch = len(self.train_dataloader)
@@ -133,7 +154,7 @@ class Trainer(object):
"metric": 0.0,
"epoch": 0,
}
- # key:
+ # key:
# val: metrics list word
output_info = dict()
time_info = {
@@ -151,27 +172,52 @@ class Trainer(object):
if metric_info is not None:
best_metric.update(metric_info)
+ # for amp training
+ if self.amp:
+ scaler = paddle.amp.GradScaler(
+ init_loss_scaling=self.scale_loss,
+ use_dynamic_loss_scaling=self.use_dynamic_loss_scaling)
+
tic = time.time()
+ max_iter = len(self.train_dataloader) - 1 if platform.system(
+ ) == "Windows" else len(self.train_dataloader)
for epoch_id in range(best_metric["epoch"] + 1,
self.config["Global"]["epochs"] + 1):
acc = 0.0
- for iter_id, batch in enumerate(self.train_dataloader()):
+ train_dataloader = self.train_dataloader if self.use_dali else self.train_dataloader(
+ )
+ for iter_id, batch in enumerate(train_dataloader):
+ if iter_id >= max_iter:
+ break
if iter_id == 5:
for key in time_info:
time_info[key].reset()
time_info["reader_cost"].update(time.time() - tic)
+ if self.use_dali:
+ batch = [
+ paddle.to_tensor(batch[0]['data']),
+ paddle.to_tensor(batch[0]['label'])
+ ]
batch_size = batch[0].shape[0]
batch[1] = batch[1].reshape([-1, 1]).astype("int64")
global_step += 1
# image input
- if not self.is_rec:
- out = self.model(batch[0])
+ if self.amp:
+ with paddle.amp.auto_cast(custom_black_list={
+ "flatten_contiguous_range", "greater_than"
+ }):
+ out = self.forward(batch)
+ loss_dict = self.train_loss_func(out, batch[1])
else:
- out = self.model(batch[0], batch[1])
+ out = self.forward(batch)
# calc loss
- loss_dict = self.train_loss_func(out, batch[1])
+ if self.config["DataLoader"]["Train"]["dataset"].get(
+ "batch_transform_ops", None):
+ loss_dict = self.train_loss_func(out, batch[1:])
+ else:
+ loss_dict = self.train_loss_func(out, batch[1])
for key in loss_dict:
if not key in output_info:
@@ -188,8 +234,13 @@ class Trainer(object):
batch_size)
# step opt and lr
- loss_dict["loss"].backward()
- optimizer.step()
+ if self.amp:
+ scaled = scaler.scale(loss_dict["loss"])
+ scaled.backward()
+ scaler.minimize(optimizer, scaled)
+ else:
+ loss_dict["loss"].backward()
+ optimizer.step()
optimizer.clear_grad()
lr_sch.step()
@@ -232,7 +283,8 @@ class Trainer(object):
step=global_step,
writer=self.vdl_writer)
tic = time.time()
-
+ if self.use_dali:
+ self.train_dataloader.reset()
metric_msg = ", ".join([
"{}: {:.5f}".format(key, output_info[key].avg)
for key in output_info
@@ -302,7 +354,8 @@ class Trainer(object):
if self.eval_mode == "classification":
if self.eval_dataloader is None:
self.eval_dataloader = build_dataloader(
- self.config["DataLoader"], "Eval", self.device)
+ self.config["DataLoader"], "Eval", self.device,
+ self.use_dali)
if self.eval_metric_func is None:
metric_config = self.config.get("Metric")
@@ -316,11 +369,13 @@ class Trainer(object):
elif self.eval_mode == "retrieval":
if self.gallery_dataloader is None:
self.gallery_dataloader = build_dataloader(
- self.config["DataLoader"]["Eval"], "Gallery", self.device)
+ self.config["DataLoader"]["Eval"], "Gallery", self.device,
+ self.use_dali)
if self.query_dataloader is None:
self.query_dataloader = build_dataloader(
- self.config["DataLoader"]["Eval"], "Query", self.device)
+ self.config["DataLoader"]["Eval"], "Query", self.device,
+ self.use_dali)
# build metric info
if self.eval_metric_func is None:
metric_config = self.config.get("Metric", None)
@@ -336,6 +391,13 @@ class Trainer(object):
self.model.train()
return eval_result
+ def forward(self, batch):
+ if not self.is_rec:
+ out = self.model(batch[0])
+ else:
+ out = self.model(batch[0], batch[1])
+ return out
+
@paddle.no_grad()
def eval_cls(self, epoch_id=0):
output_info = dict()
@@ -349,20 +411,27 @@ class Trainer(object):
metric_key = None
tic = time.time()
- for iter_id, batch in enumerate(self.eval_dataloader()):
+ eval_dataloader = self.eval_dataloader if self.use_dali else self.eval_dataloader(
+ )
+ max_iter = len(self.eval_dataloader) - 1 if platform.system(
+ ) == "Windows" else len(self.eval_dataloader)
+ for iter_id, batch in enumerate(eval_dataloader):
+ if iter_id >= max_iter:
+ break
if iter_id == 5:
for key in time_info:
time_info[key].reset()
-
+ if self.use_dali:
+ batch = [
+ paddle.to_tensor(batch[0]['data']),
+ paddle.to_tensor(batch[0]['label'])
+ ]
time_info["reader_cost"].update(time.time() - tic)
batch_size = batch[0].shape[0]
batch[0] = paddle.to_tensor(batch[0]).astype("float32")
batch[1] = batch[1].reshape([-1, 1]).astype("int64")
# image input
- if self.is_rec:
- out = self.model(batch[0], batch[1])
- else:
- out = self.model(batch[0])
+ out = self.forward(batch)
# calc loss
if self.eval_loss_func is not None:
loss_dict = self.eval_loss_func(out, batch[-1])
@@ -410,7 +479,8 @@ class Trainer(object):
len(self.eval_dataloader), metric_msg, time_msg, ips_msg))
tic = time.time()
-
+ if self.use_dali:
+ self.eval_dataloader.reset()
metric_msg = ", ".join([
"{}: {:.5f}".format(key, output_info[key].avg)
for key in output_info
@@ -425,7 +495,6 @@ class Trainer(object):
def eval_retrieval(self, epoch_id=0):
self.model.eval()
- cum_similarity_matrix = None
# step1. build gallery
gallery_feas, gallery_img_id, gallery_unique_id = self._cal_feature(
name='gallery')
@@ -498,12 +567,22 @@ class Trainer(object):
raise RuntimeError("Only support gallery or query dataset")
has_unique_id = False
- for idx, batch in enumerate(dataloader(
- )): # load is very time-consuming
+ max_iter = len(dataloader) - 1 if platform.system(
+ ) == "Windows" else len(dataloader)
+ dataloader_tmp = dataloader if self.use_dali else dataloader()
+ for idx, batch in enumerate(
+ dataloader_tmp): # load is very time-consuming
+ if idx >= max_iter:
+ break
if idx % self.config["Global"]["print_batch_step"] == 0:
logger.info(
f"{name} feature calculation process: [{idx}/{len(dataloader)}]"
)
+ if self.use_dali:
+ batch = [
+ paddle.to_tensor(batch[0]['data']),
+ paddle.to_tensor(batch[0]['label'])
+ ]
batch = [paddle.to_tensor(x) for x in batch]
batch[1] = batch[1].reshape([-1, 1]).astype("int64")
if len(batch) == 3:
@@ -529,7 +608,8 @@ class Trainer(object):
all_image_id = paddle.concat([all_image_id, batch[1]])
if has_unique_id:
all_unique_id = paddle.concat([all_unique_id, batch[2]])
-
+ if self.use_dali:
+ dataloader_tmp.reset()
if paddle.distributed.get_world_size() > 1:
feat_list = []
img_id_list = []
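
The AMP path above follows paddle's dynamic-graph recipe: scale the loss, backprop the scaled value, then let the scaler unscale and step the optimizer. A minimal self-contained sketch (stand-in model and data, not ppcls objects; run on GPU for real mixed precision):

```python
import paddle

model = paddle.nn.Linear(8, 2)
optimizer = paddle.optimizer.Momentum(learning_rate=0.1, momentum=0.9,
                                      parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=128.0,
                               use_dynamic_loss_scaling=True)
x = paddle.randn([4, 8])
label = paddle.randint(0, 2, [4, 1])
with paddle.amp.auto_cast(custom_black_list={"flatten_contiguous_range",
                                             "greater_than"}):
    loss = paddle.nn.functional.cross_entropy(model(x), label)
scaled = scaler.scale(loss)
scaled.backward()
scaler.minimize(optimizer, scaled)  # unscales gradients, then steps
optimizer.clear_grad()
```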
diff --git a/ppcls/loss/__init__.py b/ppcls/loss/__init__.py
index cee4b05ab55c1fdd23fe57590d8d9eda75619750..5421f421242d72bd27edcef869b23844c51703c6 100644
--- a/ppcls/loss/__init__.py
+++ b/ppcls/loss/__init__.py
@@ -4,7 +4,7 @@ import paddle
import paddle.nn as nn
from ppcls.utils import logger
-from .celoss import CELoss
+from .celoss import CELoss, MixCELoss
from .googlenetloss import GoogLeNetLoss
from .centerloss import CenterLoss
from .emlloss import EmlLoss
@@ -30,7 +30,6 @@ class CombinedLoss(nn.Layer):
assert isinstance(config_list, list), (
'operator config should be a list')
for config in config_list:
- print(config)
assert isinstance(config,
dict) and len(config) == 1, "yaml format error"
name = list(config)[0]
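
With `MixCELoss` exported, the YAML hunks above become valid `Loss.Train` entries. A usage sketch, assuming `CombinedLoss` pops the `weight` key and forwards the remaining kwargs (here, `epsilon`) to the loss constructor:

```python
from ppcls.loss import build_loss

train_loss_func = build_loss([{"MixCELoss": {"weight": 1.0, "epsilon": 0.1}}])
```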
diff --git a/ppcls/loss/celoss.py b/ppcls/loss/celoss.py
index 54c3703009beef11b4a8686620003f6bb948cd58..7bc3c06cb417083dd058f7e3b545249854653557 100644
--- a/ppcls/loss/celoss.py
+++ b/ppcls/loss/celoss.py
@@ -18,6 +18,10 @@ import paddle.nn.functional as F
class CELoss(nn.Layer):
+ """
+ Cross entropy loss
+ """
+
def __init__(self, epsilon=None):
super().__init__()
if epsilon is not None and (epsilon <= 0 or epsilon >= 1):
@@ -50,3 +54,21 @@ class CELoss(nn.Layer):
loss = F.cross_entropy(x, label=label, soft_label=soft_label)
loss = loss.mean()
return {"CELoss": loss}
+
+
+class MixCELoss(CELoss):
+ """
+ Cross entropy loss with mix (mixup, cutmix, fmix)
+ """
+
+ def __init__(self, epsilon=None):
+ # reuse CELoss' epsilon validation (0 < epsilon < 1)
+ super().__init__(epsilon=epsilon)
+
+ def __call__(self, input, batch):
+ target0, target1, lam = batch
+ loss0 = super().forward(input, target0)["CELoss"]
+ loss1 = super().forward(input, target1)["CELoss"]
+ loss = lam * loss0 + (1.0 - lam) * loss1
+ loss = paddle.mean(loss)
+ return {"MixCELoss": loss}
diff --git a/ppcls/optimizer/__init__.py b/ppcls/optimizer/__init__.py
index 9b71bdddd5ec6cc6ed0abbb8e338691a5efc3c49..0ccb7d197faf496a7ef826d0d51838c566b5fdc3 100644
--- a/ppcls/optimizer/__init__.py
+++ b/ppcls/optimizer/__init__.py
@@ -41,7 +41,7 @@ def build_lr_scheduler(lr_config, epochs, step_each_epoch):
return lr
-def build_optimizer(config, epochs, step_each_epoch, parameters):
+def build_optimizer(config, epochs, step_each_epoch, parameters=None):
config = copy.deepcopy(config)
# step1 build lr
lr = build_lr_scheduler(config.pop('lr'), epochs, step_each_epoch)
diff --git a/ppcls/optimizer/optimizer.py b/ppcls/optimizer/optimizer.py
index a6ae21209be769a9c03e7f26fa2e8d96d1176201..534108ed7ff12da4b5f40e0e768b334c32462344 100644
--- a/ppcls/optimizer/optimizer.py
+++ b/ppcls/optimizer/optimizer.py
@@ -33,12 +33,14 @@ class Momentum(object):
learning_rate,
momentum,
weight_decay=None,
- grad_clip=None):
+ grad_clip=None,
+ multi_precision=False):
super(Momentum, self).__init__()
self.learning_rate = learning_rate
self.momentum = momentum
self.weight_decay = weight_decay
self.grad_clip = grad_clip
+ self.multi_precision = multi_precision
def __call__(self, parameters):
opt = optim.Momentum(
@@ -46,6 +48,7 @@ class Momentum(object):
momentum=self.momentum,
weight_decay=self.weight_decay,
grad_clip=self.grad_clip,
+ multi_precision=self.multi_precision,
parameters=parameters)
return opt
@@ -60,7 +63,8 @@ class Adam(object):
weight_decay=None,
grad_clip=None,
name=None,
- lazy_mode=False):
+ lazy_mode=False,
+ multi_precision=False):
self.learning_rate = learning_rate
self.beta1 = beta1
self.beta2 = beta2
@@ -71,6 +75,7 @@ class Adam(object):
self.grad_clip = grad_clip
self.name = name
self.lazy_mode = lazy_mode
+ self.multi_precision = multi_precision
def __call__(self, parameters):
opt = optim.Adam(
@@ -82,6 +87,7 @@ class Adam(object):
grad_clip=self.grad_clip,
name=self.name,
lazy_mode=self.lazy_mode,
+ multi_precision=self.multi_precision,
parameters=parameters)
return opt
@@ -104,7 +110,8 @@ class RMSProp(object):
rho=0.95,
epsilon=1e-6,
weight_decay=None,
- grad_clip=None):
+ grad_clip=None,
+ multi_precision=False):
super(RMSProp, self).__init__()
self.learning_rate = learning_rate
self.momentum = momentum
@@ -122,4 +129,4 @@ class RMSProp(object):
weight_decay=self.weight_decay,
grad_clip=self.grad_clip,
parameters=parameters)
- return opt
\ No newline at end of file
+ return opt
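
`multi_precision` tells the paddle optimizer to keep FP32 master weights while parameters and gradients may be FP16, which pure-FP16 training relies on. A minimal sketch with a stand-in layer (the flag is a no-op for FP32 parameters):

```python
import paddle

model = paddle.nn.Linear(8, 2)
opt = paddle.optimizer.Momentum(learning_rate=0.1, momentum=0.9,
                                multi_precision=True,
                                parameters=model.parameters())
```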
diff --git a/ppcls/static/program.py b/ppcls/static/program.py
new file mode 100644
index 0000000000000000000000000000000000000000..e6022bbde4529b353db6102e5ac93f798a1cd196
--- /dev/null
+++ b/ppcls/static/program.py
@@ -0,0 +1,454 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import time
+import numpy as np
+
+from collections import OrderedDict
+
+import paddle
+import paddle.nn.functional as F
+
+from paddle.distributed import fleet
+from paddle.distributed.fleet import DistributedStrategy
+
+# from ppcls.optimizer import OptimizerBuilder
+# from ppcls.optimizer.learning_rate import LearningRateBuilder
+
+from ppcls.arch import build_model
+from ppcls.loss import build_loss
+from ppcls.metric import build_metrics
+from ppcls.optimizer import build_optimizer
+from ppcls.optimizer import build_lr_scheduler
+
+from ppcls.utils.misc import AverageMeter
+from ppcls.utils import logger, profiler
+
+
+def create_feeds(image_shape, use_mix=None, dtype="float32"):
+ """
+ Create feeds as model input
+
+ Args:
+ image_shape(list[int]): model input shape, such as [3, 224, 224]
+ use_mix(bool): whether to use mix (including mixup, cutmix, fmix)
+
+ Returns:
+ feeds(dict): dict of model input variables
+ """
+ feeds = OrderedDict()
+ feeds['data'] = paddle.static.data(
+ name="data", shape=[None] + image_shape, dtype=dtype)
+ if use_mix:
+ feeds['y_a'] = paddle.static.data(
+ name="y_a", shape=[None, 1], dtype="int64")
+ feeds['y_b'] = paddle.static.data(
+ name="y_b", shape=[None, 1], dtype="int64")
+ feeds['lam'] = paddle.static.data(
+ name="lam", shape=[None, 1], dtype=dtype)
+ else:
+ feeds['label'] = paddle.static.data(
+ name="label", shape=[None, 1], dtype="int64")
+
+ return feeds
+
+
+def create_fetchs(out,
+ feeds,
+ architecture,
+ topk=5,
+ epsilon=None,
+ use_mix=False,
+ config=None,
+ mode="Train"):
+ """
+ Create fetchs as model outputs (including loss and metrics);
+ builds the loss and, when use_mix is disabled, the metrics.
+ Args:
+ out(variable): model output variable
+ feeds(dict): dict of model input variables.
+ If mixup is used, it will not include label.
+ architecture(dict): architecture information,
+ name(such as ResNet50) is needed
+ topk(int): usually top5
+ epsilon(float): parameter for label smoothing, 0.0 <= epsilon <= 1.0
+ use_mix(bool): whether to use mix (including mixup, cutmix, fmix)
+ config(dict): model config
+
+ Returns:
+ fetchs(dict): dict of model outputs (including loss and metrics)
+ """
+ fetchs = OrderedDict()
+ # build loss
+ # TODO(littletomatodonkey): support mix training
+ if use_mix:
+ y_a = paddle.reshape(feeds['y_a'], [-1, 1])
+ y_b = paddle.reshape(feeds['y_b'], [-1, 1])
+ lam = paddle.reshape(feeds['lam'], [-1, 1])
+ else:
+ target = paddle.reshape(feeds['label'], [-1, 1])
+
+ loss_func = build_loss(config["Loss"][mode])
+
+ # TODO: support mix training
+ loss_dict = loss_func(out, target)
+
+ loss_out = loss_dict["loss"]
+ # if "AMP" in config and config.AMP.get("use_pure_fp16", False):
+ # loss_out = loss_out.astype("float16")
+
+ # if use_mix:
+ # return loss_func(out, feed_y_a, feed_y_b, feed_lam)
+ # else:
+ # return loss_func(out, target)
+
+ fetchs['loss'] = (loss_out, AverageMeter('loss', '7.4f', need_avg=True))
+
+ assert use_mix is False
+
+ # build metric
+ if not use_mix:
+ metric_func = build_metrics(config["Metric"][mode])
+
+ metric_dict = metric_func(out, target)
+
+ for key in metric_dict:
+ if mode != "Train" and paddle.distributed.get_world_size() > 1:
+ paddle.distributed.all_reduce(
+ metric_dict[key], op=paddle.distributed.ReduceOp.SUM)
+ metric_dict[key] = metric_dict[
+ key] / paddle.distributed.get_world_size()
+
+ fetchs[key] = (metric_dict[key], AverageMeter(
+ key, '7.4f', need_avg=True))
+
+ return fetchs
+
+
+def create_optimizer(config, step_each_epoch):
+ # create learning_rate instance
+ optimizer, lr_sch = build_optimizer(
+ config["Optimizer"], config["Global"]["epochs"], step_each_epoch)
+ return optimizer, lr_sch
+
+
+def create_strategy(config):
+ """
+ Create build strategy and exec strategy.
+
+ Args:
+ config(dict): config
+
+ Returns:
+ build_strategy: build strategy
+ exec_strategy: exec strategy
+ """
+ build_strategy = paddle.static.BuildStrategy()
+ exec_strategy = paddle.static.ExecutionStrategy()
+
+ exec_strategy.num_threads = 1
+ exec_strategy.num_iteration_per_drop_scope = (
+ 10000
+ if 'AMP' in config and config.AMP.get("use_pure_fp16", False) else 10)
+
+ fuse_op = True if 'AMP' in config else False
+
+ fuse_bn_act_ops = config.get('fuse_bn_act_ops', fuse_op)
+ fuse_elewise_add_act_ops = config.get('fuse_elewise_add_act_ops', fuse_op)
+ fuse_bn_add_act_ops = config.get('fuse_bn_add_act_ops', fuse_op)
+ enable_addto = config.get('enable_addto', fuse_op)
+
+ build_strategy.fuse_bn_act_ops = fuse_bn_act_ops
+ build_strategy.fuse_elewise_add_act_ops = fuse_elewise_add_act_ops
+ build_strategy.fuse_bn_add_act_ops = fuse_bn_add_act_ops
+ build_strategy.enable_addto = enable_addto
+
+ return build_strategy, exec_strategy
+
+
+def dist_optimizer(config, optimizer):
+ """
+ Create a distributed optimizer based on a normal optimizer
+
+ Args:
+ config(dict):
+ optimizer(): a normal optimizer
+
+ Returns:
+ optimizer: a distributed optimizer
+ """
+ build_strategy, exec_strategy = create_strategy(config)
+
+ dist_strategy = DistributedStrategy()
+ dist_strategy.execution_strategy = exec_strategy
+ dist_strategy.build_strategy = build_strategy
+
+ dist_strategy.nccl_comm_num = 1
+ dist_strategy.fuse_all_reduce_ops = True
+ dist_strategy.fuse_grad_size_in_MB = 16
+ optimizer = fleet.distributed_optimizer(optimizer, strategy=dist_strategy)
+
+ return optimizer
+
+
+def mixed_precision_optimizer(config, optimizer):
+ if 'AMP' in config:
+ amp_cfg = config.AMP if config.AMP else dict()
+ scale_loss = amp_cfg.get('scale_loss', 1.0)
+ use_dynamic_loss_scaling = amp_cfg.get('use_dynamic_loss_scaling',
+ False)
+ use_pure_fp16 = amp_cfg.get('use_pure_fp16', False)
+ optimizer = paddle.static.amp.decorate(
+ optimizer,
+ init_loss_scaling=scale_loss,
+ use_dynamic_loss_scaling=use_dynamic_loss_scaling,
+ use_pure_fp16=use_pure_fp16,
+ use_fp16_guard=True)
+
+ return optimizer
+
+
+def build(config,
+ main_prog,
+ startup_prog,
+ step_each_epoch=100,
+ is_train=True,
+ is_distributed=True):
+ """
+ Build a program using a model and an optimizer
+ 1. create feeds
+ 2. create a dataloader
+ 3. create a model
+ 4. create fetchs
+ 5. create an optimizer
+
+ Args:
+ config(dict): config
+ main_prog(): main program
+ startup_prog(): startup program
+ is_train(bool): train or eval
+ is_distributed(bool): whether to use distributed training method
+
+ Returns:
+ dataloader(): a bridge between the model and the data
+ fetchs(dict): dict of model outputs (including loss and metrics)
+ """
+ with paddle.static.program_guard(main_prog, startup_prog):
+ with paddle.utils.unique_name.guard():
+ mode = "Train" if is_train else "Eval"
+ use_mix = "batch_transform_ops" in config["DataLoader"][mode][
+ "dataset"]
+ use_dali = config["Global"].get('use_dali', False)
+ feeds = create_feeds(
+ config["Global"]["image_shape"],
+ use_mix=use_mix,
+ dtype="float32")
+
+ # build model
+ # data_format should be assigned in arch-dict
+ input_image_channel = config["Global"]["image_shape"][
+ 0] # default as [3, 224, 224]
+ model = build_model(config["Arch"])
+ out = model(feeds["data"])
+ # end of build model
+
+ fetchs = create_fetchs(
+ out,
+ feeds,
+ config["Arch"],
+ epsilon=config.get('ls_epsilon'),
+ use_mix=use_mix,
+ config=config,
+ mode=mode)
+ lr_scheduler = None
+ optimizer = None
+ if is_train:
+ optimizer, lr_scheduler = build_optimizer(
+ config["Optimizer"], config["Global"]["epochs"],
+ step_each_epoch)
+ optimizer = mixed_precision_optimizer(config, optimizer)
+ if is_distributed:
+ optimizer = dist_optimizer(config, optimizer)
+ optimizer.minimize(fetchs['loss'][0])
+ return fetchs, lr_scheduler, feeds, optimizer
+
+
+def compile(config, program, loss_name=None, share_prog=None):
+ """
+ Compile the program
+
+ Args:
+ config(dict): config
+ program(): the program which is wrapped by
+ loss_name(str): loss name
+ share_prog(): the shared program, used for evaluation during training
+
+ Returns:
+ compiled_program(): a compiled program
+ """
+ build_strategy, exec_strategy = create_strategy(config)
+
+ compiled_program = paddle.static.CompiledProgram(
+ program).with_data_parallel(
+ share_vars_from=share_prog,
+ loss_name=loss_name,
+ build_strategy=build_strategy,
+ exec_strategy=exec_strategy)
+
+ return compiled_program
+
+
+total_step = 0
+
+
+def run(dataloader,
+ exe,
+ program,
+ feeds,
+ fetchs,
+ epoch=0,
+ mode='train',
+ config=None,
+ vdl_writer=None,
+ lr_scheduler=None,
+ profiler_options=None):
+ """
+ Feed data to the model and fetch the measures and loss
+
+ Args:
+ dataloader(paddle io dataloader):
+ exe():
+ program():
+ fetchs(dict): dict of measures and the loss
+ epoch(int): epoch of training or evaluation
+ mode(str): 'train' or 'eval', used for logging only
+
+ Returns:
+ """
+ fetch_list = [f[0] for f in fetchs.values()]
+ metric_dict = OrderedDict([("lr", AverageMeter(
+ 'lr', 'f', postfix=",", need_avg=False))])
+
+ for k in fetchs:
+ metric_dict[k] = fetchs[k][1]
+
+ metric_dict["batch_time"] = AverageMeter(
+ 'batch_cost', '.5f', postfix=" s,")
+ metric_dict["reader_time"] = AverageMeter(
+ 'reader_cost', '.5f', postfix=" s,")
+
+ for m in metric_dict.values():
+ m.reset()
+
+ use_dali = config["Global"].get('use_dali', False)
+ tic = time.time()
+
+ if not use_dali:
+ dataloader = dataloader()
+
+ idx = 0
+ batch_size = None
+ while True:
+ # DALI may raise a RuntimeError for some particular images, such as ImageNet1k/n04418357_26036.JPEG
+ try:
+ batch = next(dataloader)
+ except StopIteration:
+ break
+ except RuntimeError:
+ logger.warning(
+ "Except RuntimeError when reading data from dataloader, try to read once again..."
+ )
+ continue
+ idx += 1
+ # ignore the warmup iters
+ if idx == 5:
+ metric_dict["batch_time"].reset()
+ metric_dict["reader_time"].reset()
+
+ metric_dict['reader_time'].update(time.time() - tic)
+
+ profiler.add_profiler_step(profiler_options)
+
+ if use_dali:
+ batch_size = batch[0]["data"].shape()[0]
+ feed_dict = batch[0]
+ else:
+ batch_size = batch[0].shape()[0]
+ feed_dict = {
+ key.name: batch[idx]
+ for idx, key in enumerate(feeds.values())
+ }
+
+ metrics = exe.run(program=program,
+ feed=feed_dict,
+ fetch_list=fetch_list)
+
+ for name, m in zip(fetchs.keys(), metrics):
+ metric_dict[name].update(np.mean(m), batch_size)
+ metric_dict["batch_time"].update(time.time() - tic)
+ if mode == "train":
+ metric_dict['lr'].update(lr_scheduler.get_lr())
+
+ fetchs_str = ' '.join([
+ str(metric_dict[key].mean)
+ if "time" in key else str(metric_dict[key].value)
+ for key in metric_dict
+ ])
+ ips_info = " ips: {:.5f} images/sec.".format(
+ batch_size / metric_dict["batch_time"].avg)
+ fetchs_str += ips_info
+
+ if lr_scheduler is not None:
+ lr_scheduler.step()
+
+ if vdl_writer:
+ global total_step
+ logger.scaler('loss', metrics[0][0], total_step, vdl_writer)
+ total_step += 1
+ if mode == 'eval':
+ if idx % config.get('print_interval', 10) == 0:
+ logger.info("{:s} step:{:<4d} {:s}".format(mode, idx,
+ fetchs_str))
+ else:
+ epoch_str = "epoch:{:<3d}".format(epoch)
+ step_str = "{:s} step:{:<4d}".format(mode, idx)
+
+ if idx % config.get('print_interval', 10) == 0:
+ logger.info("{:s} {:s} {:s}".format(epoch_str, step_str,
+ fetchs_str))
+
+ tic = time.time()
+
+ end_str = ' '.join([str(m.mean) for m in metric_dict.values()] +
+ [metric_dict["batch_time"].total])
+ ips_info = "ips: {:.5f} images/sec.".format(
+ batch_size * metric_dict["batch_time"].count /
+ metric_dict["batch_time"].sum)
+ if mode == 'eval':
+ logger.info("END {:s} {:s} {:s}".format(mode, end_str, ips_info))
+ else:
+ end_epoch_str = "END epoch:{:<3d}".format(epoch)
+ logger.info("{:s} {:s} {:s} {:s}".format(end_epoch_str, mode, end_str,
+ ips_info))
+ if use_dali:
+ dataloader.reset()
+
+ # return top1_acc in order to save the best model
+ if mode == 'eval':
+ return fetchs["top1"][1].avg
diff --git a/ppcls/static/run_dali.sh b/ppcls/static/run_dali.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8b33b28d28d0b83a163244495a6076fb63fd4a02
--- /dev/null
+++ b/ppcls/static/run_dali.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+export FLAGS_fraction_of_gpu_memory_to_use=0.80
+
+python3.7 -m paddle.distributed.launch \
+ --gpus="0,1,2,3,4,5,6,7" \
+ ppcls/static/train.py \
+ -c ./ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml \
+ -o Global.use_dali=True
+
diff --git a/ppcls/utils/static/save_load.py b/ppcls/static/save_load.py
similarity index 89%
rename from ppcls/utils/static/save_load.py
rename to ppcls/static/save_load.py
index 7f20b29228a1fd56ef85f204bc511bb56389c527..13badfddc87b111e51f1c3d52ff0c53f11fdbc7e 100644
--- a/ppcls/utils/static/save_load.py
+++ b/ppcls/static/save_load.py
@@ -74,9 +74,7 @@ def load_params(exe, prog, path, ignore_params=None):
raise ValueError("Model pretrain path {} does not "
"exists.".format(path))
- logger.info(
- logger.coloring('Loading parameters from {}...'.format(path),
- 'HEADER'))
+ logger.info("Loading parameters from {}...".format(path))
ignore_set = set()
state = _load_state(path)
@@ -116,9 +114,7 @@ def init_model(config, program, exe):
checkpoints = config.get('checkpoints')
if checkpoints:
paddle.static.load(program, checkpoints, exe)
- logger.info(
- logger.coloring("Finish initing model from {}".format(checkpoints),
- "HEADER"))
+ logger.info("Finish initing model from {}".format(checkpoints))
return
pretrained_model = config.get('pretrained_model')
@@ -127,19 +123,17 @@ def init_model(config, program, exe):
pretrained_model = [pretrained_model]
for pretrain in pretrained_model:
load_params(exe, program, pretrain)
- logger.info(
- logger.coloring("Finish initing model from {}".format(
- pretrained_model), "HEADER"))
+ logger.info("Finish initing model from {}".format(pretrained_model))
def save_model(program, model_path, epoch_id, prefix='ppcls'):
"""
save model to the target path
"""
+ if paddle.distributed.get_rank() != 0:
+ return
model_path = os.path.join(model_path, str(epoch_id))
_mkdir_if_not_exist(model_path)
model_prefix = os.path.join(model_path, prefix)
paddle.static.save(program, model_prefix)
- logger.info(
- logger.coloring("Already save model in {}".format(model_path),
- "HEADER"))
+ logger.info("Already save model in {}".format(model_path))
diff --git a/ppcls/static/train.py b/ppcls/static/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..a3aa0b591ce2db7d1066f1fada521e3a91cfd239
--- /dev/null
+++ b/ppcls/static/train.py
@@ -0,0 +1,204 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import argparse
+import os
+import sys
+__dir__ = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(__dir__)
+sys.path.append(os.path.abspath(os.path.join(__dir__, '../../')))
+
+import paddle
+from paddle.distributed import fleet
+from visualdl import LogWriter
+
+from ppcls.data import build_dataloader
+from ppcls.utils.config import get_config, print_config
+from ppcls.utils import logger
+from ppcls.utils.logger import init_logger
+from ppcls.static.save_load import init_model, save_model
+from ppcls.static import program
+
+
+def parse_args():
+ parser = argparse.ArgumentParser("PaddleClas train script")
+ parser.add_argument(
+ '-c',
+ '--config',
+ type=str,
+ default='configs/ResNet/ResNet50.yaml',
+ help='config file path')
+ parser.add_argument(
+ '-p',
+ '--profiler_options',
+ type=str,
+ default=None,
+ help='The option of profiler, which should be in format \"key1=value1;key2=value2;key3=value3\".'
+ )
+ parser.add_argument(
+ '-o',
+ '--override',
+ action='append',
+ default=[],
+ help='config options to be overridden')
+ args = parser.parse_args()
+ return args
+
+
+def main(args):
+ """
+ all the config of training paradigm should be in config["Global"]
+ """
+ config = get_config(args.config, overrides=args.override, show=False)
+ global_config = config["Global"]
+
+ mode = "train"
+
+ log_file = os.path.join(global_config['output_dir'],
+ config["Arch"]["name"], f"{mode}.log")
+ init_logger(name='root', log_file=log_file)
+ print_config(config)
+
+ if global_config.get("is_distributed", True):
+ fleet.init(is_collective=True)
+ # assign the device
+ use_gpu = global_config.get("use_gpu", True)
+ # amp related config
+ if 'AMP' in config:
+ AMP_RELATED_FLAGS_SETTING = {
+ 'FLAGS_cudnn_exhaustive_search': "1",
+ 'FLAGS_conv_workspace_size_limit': "1500",
+ 'FLAGS_cudnn_batchnorm_spatial_persistent': "1",
+ 'FLAGS_max_indevice_grad_add': "8",
+ "FLAGS_cudnn_batchnorm_spatial_persistent": "1",
+ }
+ for k in AMP_RELATED_FLAGS_SETTING:
+ os.environ[k] = AMP_RELATED_FLAGS_SETTING[k]
+
+ use_xpu = global_config.get("use_xpu", False)
+ assert not (
+ use_gpu and use_xpu
+ ), "gpu and xpu cannot both be enabled at the same time in static mode!"
+
+ if use_gpu:
+ device = paddle.set_device('gpu')
+ elif use_xpu:
+ device = paddle.set_device('xpu')
+ else:
+ device = paddle.set_device('cpu')
+
+ # visualDL
+ vdl_writer = None
+ if global_config["use_visualdl"]:
+ vdl_dir = os.path.join(global_config["output_dir"], "vdl")
+ vdl_writer = LogWriter(vdl_dir)
+
+ # build dataloader
+ eval_dataloader = None
+ use_dali = global_config.get('use_dali', False)
+
+ train_dataloader = build_dataloader(
+ config["DataLoader"], "Train", device=device, use_dali=use_dali)
+ if global_config["eval_during_train"]:
+ eval_dataloader = build_dataloader(
+ config["DataLoader"], "Eval", device=device, use_dali=use_dali)
+
+ step_each_epoch = len(train_dataloader)
+
+ # startup_prog is used to do some parameter init work,
+ # and train prog is used to hold the network
+ startup_prog = paddle.static.Program()
+ train_prog = paddle.static.Program()
+
+ best_top1_acc = 0.0 # best top1 acc record
+
+ train_fetchs, lr_scheduler, train_feeds, optimizer = program.build(
+ config,
+ train_prog,
+ startup_prog,
+ step_each_epoch=step_each_epoch,
+ is_train=True,
+ is_distributed=global_config.get("is_distributed", True))
+
+ if global_config["eval_during_train"]:
+ eval_prog = paddle.static.Program()
+ eval_fetchs, _, eval_feeds, _ = program.build(
+ config,
+ eval_prog,
+ startup_prog,
+ is_train=False,
+ is_distributed=global_config.get("is_distributed", True))
+ # clone to prune content that is irrelevant to eval_prog
+ eval_prog = eval_prog.clone(for_test=True)
+
+ # create the "Executor" with the statement of which device
+ exe = paddle.static.Executor(device)
+ # Parameter initialization
+ exe.run(startup_prog)
+ # load pretrained models or checkpoints
+ init_model(global_config, train_prog, exe)
+
+ if 'AMP' in config and config.AMP.get("use_pure_fp16", False):
+ optimizer.amp_init(
+ device,
+ scope=paddle.static.global_scope(),
+ test_program=eval_prog
+ if global_config["eval_during_train"] else None)
+
+ if not global_config.get("is_distributed", True):
+ compiled_train_prog = program.compile(
+ config, train_prog, loss_name=train_fetchs["loss"][0].name)
+ else:
+ compiled_train_prog = train_prog
+
+ if eval_dataloader is not None:
+ compiled_eval_prog = program.compile(config, eval_prog)
+
+ for epoch_id in range(global_config["epochs"]):
+ # 1. train with train dataset
+ program.run(train_dataloader, exe, compiled_train_prog, train_feeds,
+ train_fetchs, epoch_id, 'train', config, vdl_writer,
+ lr_scheduler, args.profiler_options)
+ # 2. evaluate with eval dataset
+ if global_config["eval_during_train"] and epoch_id % global_config[
+ "eval_interval"] == 0:
+ top1_acc = program.run(eval_dataloader, exe, compiled_eval_prog,
+ eval_feeds, eval_fetchs, epoch_id, "eval",
+ config)
+ if top1_acc > best_top1_acc:
+ best_top1_acc = top1_acc
+ message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
+ best_top1_acc, epoch_id)
+ logger.info(message)
+ if epoch_id % global_config["save_interval"] == 0:
+
+ model_path = os.path.join(global_config["output_dir"],
+ config["Arch"]["name"])
+ save_model(train_prog, model_path, "best_model")
+
+ # 3. save the persistable model
+ if epoch_id % global_config["save_interval"] == 0:
+ model_path = os.path.join(global_config["output_dir"],
+ config["Arch"]["name"])
+ save_model(train_prog, model_path, epoch_id)
+
+
+if __name__ == '__main__':
+ paddle.enable_static()
+ args = parse_args()
+ main(args)
diff --git a/ppcls/utils/save_load.py b/ppcls/utils/save_load.py
index cca79f4cb24b69a8687b9b3ee92774285bd574be..625a2848339e47e688a263631cfe4a5b9e6c691b 100644
--- a/ppcls/utils/save_load.py
+++ b/ppcls/utils/save_load.py
@@ -54,7 +54,7 @@ def load_dygraph_pretrain(model, path=None):
return
-def load_dygraph_pretrain_from_url(model, pretrained_url, use_ssld):
+def load_dygraph_pretrain_from_url(model, pretrained_url, use_ssld=False):
if use_ssld:
pretrained_url = pretrained_url.replace("_pretrained",
"_ssld_pretrained")
diff --git a/ppcls/utils/static/dali.py b/ppcls/utils/static/dali.py
deleted file mode 100644
index eacb3fc9a895fad1d9dd010b05056646a884b48b..0000000000000000000000000000000000000000
--- a/ppcls/utils/static/dali.py
+++ /dev/null
@@ -1,361 +0,0 @@
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import division
-
-import os
-
-import numpy as np
-from nvidia.dali.pipeline import Pipeline
-import nvidia.dali.ops as ops
-import nvidia.dali.types as types
-from nvidia.dali.plugin.paddle import DALIGenericIterator
-
-import paddle
-from paddle import fluid
-
-
-class HybridTrainPipe(Pipeline):
- def __init__(self,
- file_root,
- file_list,
- batch_size,
- resize_shorter,
- crop,
- min_area,
- lower,
- upper,
- interp,
- mean,
- std,
- device_id,
- shard_id=0,
- num_shards=1,
- random_shuffle=True,
- num_threads=4,
- seed=42,
- pad_output=False,
- output_dtype=types.FLOAT):
- super(HybridTrainPipe, self).__init__(
- batch_size, num_threads, device_id, seed=seed)
- self.input = ops.FileReader(
- file_root=file_root,
- file_list=file_list,
- shard_id=shard_id,
- num_shards=num_shards,
- random_shuffle=random_shuffle)
- # set internal nvJPEG buffers size to handle full-sized ImageNet images
- # without additional reallocations
- device_memory_padding = 211025920
- host_memory_padding = 140544512
- self.decode = ops.ImageDecoderRandomCrop(
- device='mixed',
- output_type=types.RGB,
- device_memory_padding=device_memory_padding,
- host_memory_padding=host_memory_padding,
- random_aspect_ratio=[lower, upper],
- random_area=[min_area, 1.0],
- num_attempts=100)
- self.res = ops.Resize(
- device='gpu', resize_x=crop, resize_y=crop, interp_type=interp)
- self.cmnp = ops.CropMirrorNormalize(
- device="gpu",
- output_dtype=output_dtype,
- output_layout=types.NCHW,
- crop=(crop, crop),
- image_type=types.RGB,
- mean=mean,
- std=std,
- pad_output=pad_output)
- self.coin = ops.CoinFlip(probability=0.5)
- self.to_int64 = ops.Cast(dtype=types.INT64, device="gpu")
-
- def define_graph(self):
- rng = self.coin()
- jpegs, labels = self.input(name="Reader")
- images = self.decode(jpegs)
- images = self.res(images)
- output = self.cmnp(images.gpu(), mirror=rng)
- return [output, self.to_int64(labels.gpu())]
-
- def __len__(self):
- return self.epoch_size("Reader")
-
-
-class HybridValPipe(Pipeline):
- def __init__(self,
- file_root,
- file_list,
- batch_size,
- resize_shorter,
- crop,
- interp,
- mean,
- std,
- device_id,
- shard_id=0,
- num_shards=1,
- random_shuffle=False,
- num_threads=4,
- seed=42,
- pad_output=False,
- output_dtype=types.FLOAT):
- super(HybridValPipe, self).__init__(
- batch_size, num_threads, device_id, seed=seed)
- self.input = ops.FileReader(
- file_root=file_root,
- file_list=file_list,
- shard_id=shard_id,
- num_shards=num_shards,
- random_shuffle=random_shuffle)
- self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)
- self.res = ops.Resize(
- device="gpu", resize_shorter=resize_shorter, interp_type=interp)
- self.cmnp = ops.CropMirrorNormalize(
- device="gpu",
- output_dtype=output_dtype,
- output_layout=types.NCHW,
- crop=(crop, crop),
- image_type=types.RGB,
- mean=mean,
- std=std,
- pad_output=pad_output)
- self.to_int64 = ops.Cast(dtype=types.INT64, device="gpu")
-
- def define_graph(self):
- jpegs, labels = self.input(name="Reader")
- images = self.decode(jpegs)
- images = self.res(images)
- output = self.cmnp(images)
- return [output, self.to_int64(labels.gpu())]
-
- def __len__(self):
- return self.epoch_size("Reader")
-
-
-def build(config, mode='train'):
- env = os.environ
- assert config.get('use_gpu',
- True) == True, "gpu training is required for DALI"
- assert not config.get(
- 'use_aa'), "auto augment is not supported by DALI reader"
- assert float(env.get('FLAGS_fraction_of_gpu_memory_to_use', 0.92)) < 0.9, \
- "Please leave enough GPU memory for DALI workspace, e.g., by setting" \
- " `export FLAGS_fraction_of_gpu_memory_to_use=0.8`"
-
- dataset_config = config[mode.upper()]
-
- gpu_num = paddle.fluid.core.get_cuda_device_count() if (
- 'PADDLE_TRAINERS_NUM') and (
- 'PADDLE_TRAINER_ID'
- ) not in env else int(env.get('PADDLE_TRAINERS_NUM', 0))
-
- batch_size = dataset_config.batch_size
- assert batch_size % gpu_num == 0, \
- "batch size must be multiple of number of devices"
- batch_size = batch_size // gpu_num
-
- file_root = dataset_config.data_dir
- file_list = dataset_config.file_list
-
- interp = 1 # settings.interpolation or 1 # default to linear
- interp_map = {
- 0: types.INTERP_NN, # cv2.INTER_NEAREST
- 1: types.INTERP_LINEAR, # cv2.INTER_LINEAR
- 2: types.INTERP_CUBIC, # cv2.INTER_CUBIC
- 4: types.INTERP_LANCZOS3, # XXX use LANCZOS3 for cv2.INTER_LANCZOS4
- }
-
- output_dtype = (types.FLOAT16 if 'AMP' in config and
- config.AMP.get("use_pure_fp16", False)
- else types.FLOAT)
-
- assert interp in interp_map, "interpolation method not supported by DALI"
- interp = interp_map[interp]
- pad_output = False
- image_shape = config.get("image_shape", None)
- if image_shape and image_shape[0] == 4:
- pad_output = True
-
- transforms = {
- k: v
- for d in dataset_config["transforms"] for k, v in d.items()
- }
-
- scale = transforms["NormalizeImage"].get("scale", 1.0 / 255)
- if isinstance(scale, str):
- scale = eval(scale)
- mean = transforms["NormalizeImage"].get("mean", [0.485, 0.456, 0.406])
- std = transforms["NormalizeImage"].get("std", [0.229, 0.224, 0.225])
- mean = [v / scale for v in mean]
- std = [v / scale for v in std]
-
- if mode == "train":
- resize_shorter = 256
- crop = transforms["RandCropImage"]["size"]
- scale = transforms["RandCropImage"].get("scale", [0.08, 1.])
- ratio = transforms["RandCropImage"].get("ratio", [3.0 / 4, 4.0 / 3])
- min_area = scale[0]
- lower = ratio[0]
- upper = ratio[1]
-
- if 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env:
- shard_id = int(env['PADDLE_TRAINER_ID'])
- num_shards = int(env['PADDLE_TRAINERS_NUM'])
- device_id = int(env['FLAGS_selected_gpus'])
- pipe = HybridTrainPipe(
- file_root,
- file_list,
- batch_size,
- resize_shorter,
- crop,
- min_area,
- lower,
- upper,
- interp,
- mean,
- std,
- device_id,
- shard_id,
- num_shards,
- seed=42 + shard_id,
- pad_output=pad_output,
- output_dtype=output_dtype)
- pipe.build()
- pipelines = [pipe]
- sample_per_shard = len(pipe) // num_shards
- else:
- pipelines = []
- places = fluid.framework.cuda_places()
- num_shards = len(places)
- for idx, p in enumerate(places):
- place = fluid.core.Place()
- place.set_place(p)
- device_id = place.gpu_device_id()
- pipe = HybridTrainPipe(
- file_root,
- file_list,
- batch_size,
- resize_shorter,
- crop,
- min_area,
- lower,
- upper,
- interp,
- mean,
- std,
- device_id,
- idx,
- num_shards,
- seed=42 + idx,
- pad_output=pad_output,
- output_dtype=output_dtype)
- pipe.build()
- pipelines.append(pipe)
- sample_per_shard = len(pipelines[0])
- return DALIGenericIterator(
- pipelines, ['feed_image', 'feed_label'], size=sample_per_shard)
- else:
- resize_shorter = transforms["ResizeImage"].get("resize_short", 256)
- crop = transforms["CropImage"]["size"]
-
- p = fluid.framework.cuda_places()[0]
- place = fluid.core.Place()
- place.set_place(p)
- device_id = place.gpu_device_id()
- pipe = HybridValPipe(
- file_root,
- file_list,
- batch_size,
- resize_shorter,
- crop,
- interp,
- mean,
- std,
- device_id=device_id,
- pad_output=pad_output,
- output_dtype=output_dtype)
- pipe.build()
- return DALIGenericIterator(
- pipe, ['feed_image', 'feed_label'],
- size=len(pipe),
- dynamic_shape=True,
- fill_last_batch=True,
- last_batch_padded=True)
-
-
-def train(config):
- return build(config, 'train')
-
-
-def val(config):
- return build(config, 'valid')
-
-
-def _to_Tensor(lod_tensor, dtype):
- data_tensor = fluid.layers.create_tensor(dtype=dtype)
- data = np.array(lod_tensor).astype(dtype)
- fluid.layers.assign(data, data_tensor)
- return data_tensor
-
-
-def normalize(feeds, config):
- image, label = feeds['image'], feeds['label']
- img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
- img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
- image = fluid.layers.cast(image, 'float32')
-    constant = fluid.layers.fill_constant(
-        shape=[1], value=255.0, dtype='float32')
-    image = fluid.layers.elementwise_div(image, constant)
-
- mean = fluid.layers.create_tensor(dtype="float32")
- fluid.layers.assign(input=img_mean.astype("float32"), output=mean)
- std = fluid.layers.create_tensor(dtype="float32")
- fluid.layers.assign(input=img_std.astype("float32"), output=std)
-
- image = fluid.layers.elementwise_sub(image, mean)
- image = fluid.layers.elementwise_div(image, std)
-
- image.stop_gradient = True
- feeds['image'] = image
-
- return feeds
-
-
-def mix(feeds, config, is_train=True):
- env = os.environ
-    gpu_num = paddle.fluid.core.get_cuda_device_count() if (
-        'PADDLE_TRAINERS_NUM' not in env and
-        'PADDLE_TRAINER_ID' not in env
-    ) else int(env.get('PADDLE_TRAINERS_NUM', 0))
-
- batch_size = config.TRAIN.batch_size // gpu_num
-
- images = feeds['image']
- label = feeds['label']
- # TODO: hard code here, should be fixed!
- alpha = 0.2
- idx = _to_Tensor(np.random.permutation(batch_size), 'int32')
- lam = np.random.beta(alpha, alpha)
-
- images = lam * images + (1 - lam) * paddle.fluid.layers.gather(images, idx)
-
- feed = {
- 'image': images,
- 'feed_y_a': label,
- 'feed_y_b': paddle.fluid.layers.gather(label, idx),
- 'feed_lam': _to_Tensor([lam] * batch_size, 'float32')
- }
-
- return feed if is_train else feeds
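The `mix` helper above implements batch-level mixup: one coefficient `lam ~ Beta(alpha, alpha)` is drawn per batch, each sample is paired with a randomly permuted partner, and the images are blended, while both label sets and `lam` are fed onward so `MixCELoss` can interpolate the two cross-entropies. A minimal NumPy sketch of that blending, assuming only the hard-coded `alpha = 0.2` and the shapes from the code above; the helper name is illustrative:

```python
# Minimal NumPy sketch of the mixup blending performed by `mix` above.
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=np.random):
    batch_size = images.shape[0]
    lam = rng.beta(alpha, alpha)        # single mixing coefficient per batch
    idx = rng.permutation(batch_size)   # random pairing of samples
    mixed = lam * images + (1.0 - lam) * images[idx]
    # Downstream, the loss interpolates:
    #   lam * CE(out, y_a) + (1 - lam) * CE(out, y_b)
    return mixed, labels, labels[idx], lam

images = np.random.rand(8, 3, 224, 224).astype("float32")
labels = np.arange(8).reshape(-1, 1).astype("int64")
mixed, y_a, y_b, lam = mixup_batch(images, labels)
print(mixed.shape, lam)
```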
diff --git a/ppcls/utils/static/program.py b/ppcls/utils/static/program.py
deleted file mode 100644
index f50d7b5d00fab2081e06143939a624cea48b1f46..0000000000000000000000000000000000000000
--- a/ppcls/utils/static/program.py
+++ /dev/null
@@ -1,606 +0,0 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import time
-import numpy as np
-
-from collections import OrderedDict
-from ppcls.optimizer import OptimizerBuilder
-
-import paddle
-import paddle.nn.functional as F
-
-from ppcls.optimizer.learning_rate import LearningRateBuilder
-from ppcls.arch import backbone
-from ppcls.arch.loss import CELoss
-from ppcls.arch.loss import MixCELoss
-from ppcls.arch.loss import JSDivLoss
-from ppcls.arch.loss import GoogLeNetLoss
-from ppcls.utils.misc import AverageMeter
-from ppcls.utils import logger, profiler
-
-from paddle.distributed import fleet
-from paddle.distributed.fleet import DistributedStrategy
-
-
-def create_feeds(image_shape, use_mix=None, use_dali=None, dtype="float32"):
- """
- Create feeds as model input
-
- Args:
- image_shape(list[int]): model input shape, such as [3, 224, 224]
-        use_mix(bool): whether to use mix (including mixup, cutmix, fmix)
-        use_dali(bool): whether the DALI reader is used; the mix feeds are
-                        then built by dali.mix instead of here
-        dtype(str): data type of the image and lam feeds
-
- Returns:
- feeds(dict): dict of model input variables
- """
- feeds = OrderedDict()
- feeds['image'] = paddle.static.data(
- name="feed_image", shape=[None] + image_shape, dtype=dtype)
- if use_mix and not use_dali:
- feeds['feed_y_a'] = paddle.static.data(
- name="feed_y_a", shape=[None, 1], dtype="int64")
- feeds['feed_y_b'] = paddle.static.data(
- name="feed_y_b", shape=[None, 1], dtype="int64")
- feeds['feed_lam'] = paddle.static.data(
- name="feed_lam", shape=[None, 1], dtype=dtype)
- else:
- feeds['label'] = paddle.static.data(
- name="feed_label", shape=[None, 1], dtype="int64")
-
- return feeds
-
-
-def create_model(architecture, image, classes_num, config, is_train):
- """
- Create a model
-
- Args:
- architecture(dict): architecture information,
- name(such as ResNet50) is needed
- image(variable): model input variable
- classes_num(int): num of classes
- config(dict): model config
-
- Returns:
- out(variable): model output variable
- """
- name = architecture["name"]
- params = architecture.get("params", {})
-
- if "data_format" in config:
- params["data_format"] = config["data_format"]
- data_format = config["data_format"]
- input_image_channel = config.get('image_shape', [3, 224, 224])[0]
- if input_image_channel != 3:
- logger.warning(
- "Input image channel is changed to {}, maybe for better speed-up".
- format(input_image_channel))
- params["input_image_channel"] = input_image_channel
- if "is_test" in params:
- params['is_test'] = not is_train
- model = backbone.__dict__[name](class_dim=classes_num, **params)
-
- out = model(image)
- return out
-
-
-def create_loss(out,
- feeds,
- architecture,
- classes_num=1000,
- epsilon=None,
- use_mix=False,
- use_distillation=False):
- """
- Create a loss for optimization, such as:
-        1. CrossEntropy loss
-        2. CrossEntropy loss with label smoothing
-        3. CrossEntropy loss with mix (mixup, cutmix, fmix)
-        4. CrossEntropy loss with label smoothing and mix (mixup, cutmix, fmix)
- 5. GoogLeNet loss
-
- Args:
- out(variable): model output variable
- feeds(dict): dict of model input variables
- architecture(dict): architecture information,
- name(such as ResNet50) is needed
- classes_num(int): num of classes
- epsilon(float): parameter for label smoothing, 0.0 <= epsilon <= 1.0
-        use_mix(bool): whether to use mix (including mixup, cutmix, fmix)
-
- Returns:
- loss(variable): loss variable
- """
- if use_mix:
- feed_y_a = paddle.reshape(feeds['feed_y_a'], [-1, 1])
- feed_y_b = paddle.reshape(feeds['feed_y_b'], [-1, 1])
- feed_lam = paddle.reshape(feeds['feed_lam'], [-1, 1])
- else:
- target = paddle.reshape(feeds['label'], [-1, 1])
-
- if architecture["name"] == "GoogLeNet":
- assert len(out) == 3, "GoogLeNet should have 3 outputs"
- loss = GoogLeNetLoss(class_dim=classes_num, epsilon=epsilon)
- return loss(out[0], out[1], out[2], target)
-
- if use_distillation:
- assert len(out) == 2, ("distillation output length must be 2, "
- "but got {}".format(len(out)))
- loss = JSDivLoss(class_dim=classes_num, epsilon=epsilon)
- return loss(out[1], out[0])
-
- if use_mix:
- loss = MixCELoss(class_dim=classes_num, epsilon=epsilon)
- return loss(out, feed_y_a, feed_y_b, feed_lam)
- else:
- loss = CELoss(class_dim=classes_num, epsilon=epsilon)
- return loss(out, target)
-
-
-def create_metric(out,
- feeds,
- architecture,
- topk=5,
- classes_num=1000,
- config=None,
- use_distillation=False):
- """
- Create measures of model accuracy, such as top1 and top5
-
- Args:
- out(variable): model output variable
-        feeds(dict): dict of model input variables (including label)
- topk(int): usually top5
- classes_num(int): num of classes
- config(dict) : model config
-
- Returns:
- fetchs(dict): dict of measures
- """
- label = paddle.reshape(feeds['label'], [-1, 1])
- if architecture["name"] == "GoogLeNet":
- assert len(out) == 3, "GoogLeNet should have 3 outputs"
- out = out[0]
- else:
- # just need student label to get metrics
- if use_distillation:
- out = out[1]
- softmax_out = F.softmax(out)
-
- fetchs = OrderedDict()
- # set top1 to fetchs
- top1 = paddle.metric.accuracy(softmax_out, label=label, k=1)
- fetchs['top1'] = (top1, AverageMeter('top1', '.4f', need_avg=True))
- # set topk to fetchs
- k = min(topk, classes_num)
- topk = paddle.metric.accuracy(softmax_out, label=label, k=k)
- topk_name = 'top{}'.format(k)
- fetchs[topk_name] = (topk, AverageMeter(topk_name, '.4f', need_avg=True))
- return fetchs
-
-
-def create_fetchs(out,
- feeds,
- architecture,
- topk=5,
- classes_num=1000,
- epsilon=None,
- use_mix=False,
- config=None,
- use_distillation=False):
- """
-    Create fetchs as model outputs (including loss and measures);
-    calls create_loss and, unless use_mix is enabled, create_metric.
-
- Args:
- out(variable): model output variable
- feeds(dict): dict of model input variables.
-                    If mix is used, it will not include the label.
- architecture(dict): architecture information,
- name(such as ResNet50) is needed
- topk(int): usually top5
- classes_num(int): num of classes
- epsilon(float): parameter for label smoothing, 0.0 <= epsilon <= 1.0
-        use_mix(bool): whether to use mix (including mixup, cutmix, fmix)
- config(dict): model config
-
- Returns:
-        fetchs(dict): dict of model outputs (including loss and measures)
- """
- fetchs = OrderedDict()
- loss = create_loss(out, feeds, architecture, classes_num, epsilon, use_mix,
- use_distillation)
- fetchs['loss'] = (loss, AverageMeter('loss', '7.4f', need_avg=True))
- if not use_mix:
- metric = create_metric(out, feeds, architecture, topk, classes_num,
- config, use_distillation)
- fetchs.update(metric)
-
- return fetchs
-
-
-def create_optimizer(config):
- """
- Create an optimizer using config, usually including
- learning rate and regularization.
-
- Args:
- config(dict): such as
- {
- 'LEARNING_RATE':
- {'function': 'Cosine',
- 'params': {'lr': 0.1}
- },
- 'OPTIMIZER':
- {'function': 'Momentum',
- 'params':{'momentum': 0.9},
- 'regularizer':
- {'function': 'L2', 'factor': 0.0001}
- }
- }
-
- Returns:
- an optimizer instance
- """
- # create learning_rate instance
- lr_config = config['LEARNING_RATE']
- lr_config['params'].update({
- 'epochs': config['epochs'],
- 'step_each_epoch':
- config['total_images'] // config['TRAIN']['batch_size'],
- })
- lr = LearningRateBuilder(**lr_config)()
-
- # create optimizer instance
- opt_config = config['OPTIMIZER']
- opt = OptimizerBuilder(**opt_config)
- return opt(lr), lr
-
-
-def create_strategy(config):
- """
- Create build strategy and exec strategy.
-
- Args:
- config(dict): config
-
- Returns:
- build_strategy: build strategy
- exec_strategy: exec strategy
- """
- build_strategy = paddle.static.BuildStrategy()
- exec_strategy = paddle.static.ExecutionStrategy()
-
- exec_strategy.num_threads = 1
- exec_strategy.num_iteration_per_drop_scope = (
- 10000
- if 'AMP' in config and config.AMP.get("use_pure_fp16", False) else 10)
-
-    fuse_op = 'AMP' in config
-
- fuse_bn_act_ops = config.get('fuse_bn_act_ops', fuse_op)
- fuse_elewise_add_act_ops = config.get('fuse_elewise_add_act_ops', fuse_op)
- fuse_bn_add_act_ops = config.get('fuse_bn_add_act_ops', fuse_op)
- enable_addto = config.get('enable_addto', fuse_op)
-
- try:
- build_strategy.fuse_bn_act_ops = fuse_bn_act_ops
- except Exception as e:
- logger.info(
- "PaddlePaddle version 1.7.0 or higher is "
- "required when you want to fuse batch_norm and activation_op.")
-
- try:
- build_strategy.fuse_elewise_add_act_ops = fuse_elewise_add_act_ops
- except Exception as e:
- logger.info(
- "PaddlePaddle version 1.7.0 or higher is "
- "required when you want to fuse elewise_add_act and activation_op.")
-
- try:
- build_strategy.fuse_bn_add_act_ops = fuse_bn_add_act_ops
- except Exception as e:
- logger.info(
- "PaddlePaddle 2.0-rc or higher is "
- "required when you want to enable fuse_bn_add_act_ops strategy.")
-
- try:
- build_strategy.enable_addto = enable_addto
- except Exception as e:
- logger.info("PaddlePaddle 2.0-rc or higher is "
- "required when you want to enable addto strategy.")
- return build_strategy, exec_strategy
-
-
-def dist_optimizer(config, optimizer):
- """
- Create a distributed optimizer based on a normal optimizer
-
- Args:
- config(dict):
- optimizer(): a normal optimizer
-
- Returns:
- optimizer: a distributed optimizer
- """
- build_strategy, exec_strategy = create_strategy(config)
-
- dist_strategy = DistributedStrategy()
- dist_strategy.execution_strategy = exec_strategy
- dist_strategy.build_strategy = build_strategy
-
- dist_strategy.nccl_comm_num = 1
- dist_strategy.fuse_all_reduce_ops = True
- dist_strategy.fuse_grad_size_in_MB = 16
- optimizer = fleet.distributed_optimizer(optimizer, strategy=dist_strategy)
-
- return optimizer
-
-
-def mixed_precision_optimizer(config, optimizer):
- if 'AMP' in config:
- amp_cfg = config.AMP if config.AMP else dict()
- scale_loss = amp_cfg.get('scale_loss', 1.0)
- use_dynamic_loss_scaling = amp_cfg.get('use_dynamic_loss_scaling',
- False)
- use_pure_fp16 = amp_cfg.get('use_pure_fp16', False)
- optimizer = paddle.static.amp.decorate(
- optimizer,
- init_loss_scaling=scale_loss,
- use_dynamic_loss_scaling=use_dynamic_loss_scaling,
- use_pure_fp16=use_pure_fp16,
- use_fp16_guard=True)
-
- return optimizer
-
-
-def build(config, main_prog, startup_prog, is_train=True, is_distributed=True):
- """
- Build a program using a model and an optimizer
- 1. create feeds
- 2. create a dataloader
- 3. create a model
- 4. create fetchs
- 5. create an optimizer
-
- Args:
- config(dict): config
- main_prog(): main program
- startup_prog(): startup program
- is_train(bool): train or valid
- is_distributed(bool): whether to use distributed training method
-
- Returns:
- dataloader(): a bridge between the model and the data
-        fetchs(dict): dict of model outputs (including loss and measures)
- """
- with paddle.static.program_guard(main_prog, startup_prog):
- with paddle.utils.unique_name.guard():
- use_mix = config.get('use_mix') and is_train
- use_dali = config.get('use_dali', False)
- use_distillation = config.get('use_distillation')
-
- feeds = create_feeds(
- config.image_shape,
- use_mix=use_mix,
- use_dali=use_dali,
- dtype="float32")
- if use_dali and use_mix:
- import dali
- feeds = dali.mix(feeds, config, is_train)
- out = create_model(config.ARCHITECTURE, feeds['image'],
- config.classes_num, config, is_train)
- fetchs = create_fetchs(
- out,
- feeds,
- config.ARCHITECTURE,
- config.topk,
- config.classes_num,
- epsilon=config.get('ls_epsilon'),
- use_mix=use_mix,
- config=config,
- use_distillation=use_distillation)
- lr_scheduler = None
- optimizer = None
- if is_train:
- optimizer, lr_scheduler = create_optimizer(config)
- optimizer = mixed_precision_optimizer(config, optimizer)
- if is_distributed:
- optimizer = dist_optimizer(config, optimizer)
- optimizer.minimize(fetchs['loss'][0])
- return fetchs, lr_scheduler, feeds, optimizer
-
-
-def compile(config, program, loss_name=None, share_prog=None):
- """
- Compile the program
-
- Args:
- config(dict): config
-        program(): the program to be compiled
- loss_name(str): loss name
- share_prog(): the shared program, used for evaluation during training
-
- Returns:
- compiled_program(): a compiled program
- """
- build_strategy, exec_strategy = create_strategy(config)
-
- compiled_program = paddle.static.CompiledProgram(
- program).with_data_parallel(
- share_vars_from=share_prog,
- loss_name=loss_name,
- build_strategy=build_strategy,
- exec_strategy=exec_strategy)
-
- return compiled_program
-
-
-total_step = 0
-
-
-def run(dataloader,
- exe,
- program,
- feeds,
- fetchs,
- epoch=0,
- mode='train',
- config=None,
- vdl_writer=None,
- lr_scheduler=None,
- profiler_options=None):
- """
- Feed data to the model and fetch the measures and loss
-
- Args:
-        dataloader(paddle io dataloader): iterator over input batches
-        exe(paddle.static.Executor): executor that runs the program
-        program(paddle.static.Program): program (possibly compiled) to run
-        feeds(dict): dict of model input variables
- fetchs(dict): dict of measures and the loss
- epoch(int): epoch of training or validation
-        mode(str): 'train' or 'valid', used for logging only
-
-    Returns:
-        the average top1 accuracy (float) when mode is 'valid', otherwise None
-    """
- fetch_list = [f[0] for f in fetchs.values()]
- metric_list = [
- ("lr", AverageMeter(
- 'lr', 'f', postfix=",", need_avg=False)),
- ("batch_time", AverageMeter(
- 'batch_cost', '.5f', postfix=" s,")),
- ("reader_time", AverageMeter(
- 'reader_cost', '.5f', postfix=" s,")),
- ]
- topk_name = 'top{}'.format(config.topk)
- metric_list.insert(0, ("loss", fetchs["loss"][1]))
- use_mix = config.get("use_mix", False) and mode == "train"
- if not use_mix:
- metric_list.insert(0, (topk_name, fetchs[topk_name][1]))
- metric_list.insert(0, ("top1", fetchs["top1"][1]))
-
- metric_list = OrderedDict(metric_list)
-
- for m in metric_list.values():
- m.reset()
-
- use_dali = config.get('use_dali', False)
- dataloader = dataloader if use_dali else dataloader()
- tic = time.time()
-
- idx = 0
- batch_size = None
- while True:
-        # DALI may raise a RuntimeError for some particular images, such as ImageNet1k/n04418357_26036.JPEG
- try:
- batch = next(dataloader)
- except StopIteration:
- break
- except RuntimeError:
- logger.warning(
- "Except RuntimeError when reading data from dataloader, try to read once again..."
- )
- continue
- idx += 1
- # ignore the warmup iters
- if idx == 5:
- metric_list["batch_time"].reset()
- metric_list["reader_time"].reset()
-
- metric_list['reader_time'].update(time.time() - tic)
-
- profiler.add_profiler_step(profiler_options)
-
- if use_dali:
- batch_size = batch[0]["feed_image"].shape()[0]
- feed_dict = batch[0]
- else:
- batch_size = batch[0].shape()[0]
- feed_dict = {
- key.name: batch[idx]
- for idx, key in enumerate(feeds.values())
- }
- metrics = exe.run(program=program,
- feed=feed_dict,
- fetch_list=fetch_list)
-
- for name, m in zip(fetchs.keys(), metrics):
- metric_list[name].update(np.mean(m), batch_size)
- metric_list["batch_time"].update(time.time() - tic)
- if mode == "train":
- metric_list['lr'].update(lr_scheduler.get_lr())
-
- fetchs_str = ' '.join([
- str(metric_list[key].mean)
- if "time" in key else str(metric_list[key].value)
- for key in metric_list
- ])
- ips_info = " ips: {:.5f} images/sec.".format(
- batch_size / metric_list["batch_time"].avg)
- fetchs_str += ips_info
-
- if lr_scheduler is not None:
- if lr_scheduler.update_specified:
- curr_global_counter = lr_scheduler.step_each_epoch * epoch + idx
- update = max(
- 0, curr_global_counter - lr_scheduler.
- update_start_step) % lr_scheduler.update_step_interval == 0
- if update:
- lr_scheduler.step()
- else:
- lr_scheduler.step()
-
- if vdl_writer:
- global total_step
- logger.scaler('loss', metrics[0][0], total_step, vdl_writer)
- total_step += 1
- if mode == 'valid':
- if idx % config.get('print_interval', 10) == 0:
- logger.info("{:s} step:{:<4d} {:s}".format(mode, idx,
- fetchs_str))
- else:
- epoch_str = "epoch:{:<3d}".format(epoch)
- step_str = "{:s} step:{:<4d}".format(mode, idx)
-
- if idx % config.get('print_interval', 10) == 0:
- logger.info("{:s} {:s} {:s}".format(
- logger.coloring(epoch_str, "HEADER")
- if idx == 0 else epoch_str,
- logger.coloring(step_str, "PURPLE"),
- logger.coloring(fetchs_str, 'OKGREEN')))
-
- tic = time.time()
-
- end_str = ' '.join([str(m.mean) for m in metric_list.values()] +
- [metric_list["batch_time"].total])
- ips_info = "ips: {:.5f} images/sec.".format(
- batch_size * metric_list["batch_time"].count /
- metric_list["batch_time"].sum)
- if mode == 'valid':
- logger.info("END {:s} {:s} {:s}".format(mode, end_str, ips_info))
- else:
- end_epoch_str = "END epoch:{:<3d}".format(epoch)
- logger.info("{:s} {:s} {:s} {:s}".format(end_epoch_str, mode, end_str,
- ips_info))
- if use_dali:
- dataloader.reset()
-
- # return top1_acc in order to save the best model
- if mode == 'valid':
- return fetchs["top1"][1].avg
diff --git a/ppcls/utils/static/run_dali.sh b/ppcls/utils/static/run_dali.sh
deleted file mode 100644
index 1ac48b3caef459eaf86f8a528af732d1a14a3630..0000000000000000000000000000000000000000
--- a/ppcls/utils/static/run_dali.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/usr/bin/env bash
-
-export CUDA_VISIBLE_DEVICES="0,1,2,3"
-export FLAGS_fraction_of_gpu_memory_to_use=0.80
-
-python3.7 -m paddle.distributed.launch \
- --gpus="0,1,2,3" \
- tools/static/train.py \
- -c ./configs/ResNet/ResNet50.yaml \
- -o print_interval=10 \
- -o use_dali=True
diff --git a/ppcls/utils/static/train.py b/ppcls/utils/static/train.py
deleted file mode 100644
index 40c7856307bffdaa30ea253902b90abf3fa87bc6..0000000000000000000000000000000000000000
--- a/ppcls/utils/static/train.py
+++ /dev/null
@@ -1,195 +0,0 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-__dir__ = os.path.dirname(os.path.abspath(__file__))
-sys.path.append(__dir__)
-sys.path.append(os.path.abspath(os.path.join(__dir__, '../../')))
-
-from sys import version_info
-
-import paddle
-from paddle.distributed import fleet
-
-from ppcls.data import Reader
-from ppcls.utils.config import get_config
-from ppcls.utils import logger
-from tools.static import program
-from save_load import init_model, save_model
-
-
-def parse_args():
- parser = argparse.ArgumentParser("PaddleClas train script")
- parser.add_argument(
- '-c',
- '--config',
- type=str,
- default='configs/ResNet/ResNet50.yaml',
- help='config file path')
- parser.add_argument(
- '--vdl_dir',
- type=str,
- default=None,
        help='VisualDL logging directory.')
- parser.add_argument(
- '-p',
- '--profiler_options',
- type=str,
- default=None,
- help='The option of profiler, which should be in format \"key1=value1;key2=value2;key3=value3\".'
- )
- parser.add_argument(
- '-o',
- '--override',
- action='append',
- default=[],
- help='config options to be overridden')
- args = parser.parse_args()
- return args
-
-
-def main(args):
- config = get_config(args.config, overrides=args.override, show=True)
- if config.get("is_distributed", True):
- fleet.init(is_collective=True)
- # assign the place
- use_gpu = config.get("use_gpu", True)
- # amp related config
- if 'AMP' in config:
- AMP_RELATED_FLAGS_SETTING = {
- 'FLAGS_cudnn_exhaustive_search': 1,
- 'FLAGS_conv_workspace_size_limit': 1500,
- 'FLAGS_cudnn_batchnorm_spatial_persistent': 1,
- 'FLAGS_max_inplace_grad_add': 8,
- }
- os.environ['FLAGS_cudnn_batchnorm_spatial_persistent'] = '1'
- paddle.fluid.set_flags(AMP_RELATED_FLAGS_SETTING)
- use_xpu = config.get("use_xpu", False)
-    assert not (use_gpu and use_xpu), \
-        "gpu and xpu cannot both be enabled at the same time in static mode!"
-
- if use_gpu:
- place = paddle.set_device('gpu')
- elif use_xpu:
- place = paddle.set_device('xpu')
- else:
- place = paddle.set_device('cpu')
-
- # startup_prog is used to do some parameter init work,
- # and train prog is used to hold the network
- startup_prog = paddle.static.Program()
- train_prog = paddle.static.Program()
-
- best_top1_acc = 0.0 # best top1 acc record
-
- train_fetchs, lr_scheduler, train_feeds, optimizer = program.build(
- config,
- train_prog,
- startup_prog,
- is_train=True,
- is_distributed=config.get("is_distributed", True))
-
- if config.validate:
- valid_prog = paddle.static.Program()
- valid_fetchs, _, valid_feeds, _ = program.build(
- config,
- valid_prog,
- startup_prog,
- is_train=False,
- is_distributed=config.get("is_distributed", True))
- # clone to prune some content which is irrelevant in valid_prog
- valid_prog = valid_prog.clone(for_test=True)
-
- # create the "Executor" with the statement of which place
- exe = paddle.static.Executor(place)
- # Parameter initialization
- exe.run(startup_prog)
- # load pretrained models or checkpoints
- init_model(config, train_prog, exe)
-
- if 'AMP' in config and config.AMP.get("use_pure_fp16", False):
- optimizer.amp_init(
- place,
- scope=paddle.static.global_scope(),
- test_program=valid_prog if config.validate else None)
-
- if not config.get("is_distributed", True):
- compiled_train_prog = program.compile(
- config, train_prog, loss_name=train_fetchs["loss"][0].name)
- else:
- compiled_train_prog = train_prog
-
- if not config.get('use_dali', False):
- train_dataloader = Reader(config, 'train', places=place)()
- if config.validate and paddle.distributed.get_rank() == 0:
- valid_dataloader = Reader(config, 'valid', places=place)()
- compiled_valid_prog = program.compile(config, valid_prog)
- else:
- assert use_gpu is True, "DALI only support gpu, please set use_gpu to True!"
- import dali
- train_dataloader = dali.train(config)
- if config.validate and paddle.distributed.get_rank() == 0:
- valid_dataloader = dali.val(config)
- compiled_valid_prog = program.compile(config, valid_prog)
-
- vdl_writer = None
- if args.vdl_dir:
- if version_info.major == 2:
- logger.info(
- "visualdl is just supported for python3, so it is disabled in python2..."
- )
- else:
- from visualdl import LogWriter
- vdl_writer = LogWriter(args.vdl_dir)
-
- for epoch_id in range(config.epochs):
- # 1. train with train dataset
- program.run(train_dataloader, exe, compiled_train_prog, train_feeds,
- train_fetchs, epoch_id, 'train', config, vdl_writer,
- lr_scheduler, args.profiler_options)
- if paddle.distributed.get_rank() == 0:
- # 2. validate with validate dataset
- if config.validate and epoch_id % config.valid_interval == 0:
- top1_acc = program.run(valid_dataloader, exe,
- compiled_valid_prog, valid_feeds,
- valid_fetchs, epoch_id, 'valid', config)
- if top1_acc > best_top1_acc:
- best_top1_acc = top1_acc
- message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
- best_top1_acc, epoch_id)
- logger.info("{:s}".format(logger.coloring(message, "RED")))
- if epoch_id % config.save_interval == 0:
-
- model_path = os.path.join(config.model_save_dir,
- config.ARCHITECTURE["name"])
- save_model(train_prog, model_path, "best_model")
-
- # 3. save the persistable model
- if epoch_id % config.save_interval == 0:
- model_path = os.path.join(config.model_save_dir,
- config.ARCHITECTURE["name"])
- save_model(train_prog, model_path, epoch_id)
-
-
-if __name__ == '__main__':
- paddle.enable_static()
- args = parse_args()
- main(args)
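Both this script and the launcher rely on `-o key=value` overrides being merged into the YAML config by `ppcls.utils.config.get_config`. A hypothetical sketch of that merge for dotted keys; the real implementation also coerces value types, which this sketch skips:

```python
# Hypothetical sketch of how `-o key=value` overrides could be folded into a
# nested config dict; the real logic lives in ppcls.utils.config.get_config.
def apply_overrides(config, overrides):
    for item in overrides:                 # e.g. "TRAIN.batch_size=256"
        key, _, value = item.partition('=')
        parts = key.split('.')             # dotted keys address nesting
        cursor = config
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = value          # note: kept as a string here
    return config

cfg = {"TRAIN": {"batch_size": 128}}
print(apply_overrides(cfg, ["TRAIN.batch_size=256", "use_dali=True"]))
```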
diff --git a/tests/DarkNet53.txt b/tests/DarkNet53.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e5a9adb862eaae2b2abc3f9aaddee6b4c7a161ba
--- /dev/null
+++ b/tests/DarkNet53.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:DarkNet53
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/DarkNet53_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
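Each of these `tests/*.txt` files is read positionally by `tests/prepare.sh` and `tests/test.sh` further below: fields are addressed by line index (e.g., index 35 for the inference model URL), values follow the first `:`, and `|` separates alternatives. A Python sketch of that contract, with the indices taken from the scripts:

```python
# Python sketch of how prepare.sh/test.sh read these config files: fields are
# addressed by line index and split on the first ':' so URL values survive.
def parse_value(line):
    _, _, value = line.partition(':')
    return value

with open('tests/DarkNet53.txt') as f:
    lines = f.read().splitlines()

model_name = parse_value(lines[1])             # "DarkNet53"
gpu_list = parse_value(lines[3]).split('|')    # ['0', '0,1']
inference_model_url = parse_value(lines[35])   # download target for MODE=infer
print(model_name, gpu_list, inference_model_url)
```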
diff --git a/tests/HRNet_W18_C.txt b/tests/HRNet_W18_C.txt
new file mode 100644
index 0000000000000000000000000000000000000000..08c712accc70dc3ea70030b006c2f44c12488a3f
--- /dev/null
+++ b/tests/HRNet_W18_C.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:HRNet_W18_C
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/HRNet_W18_C_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/LeViT_128S.txt b/tests/LeViT_128S.txt
new file mode 100644
index 0000000000000000000000000000000000000000..337d8af7701eae6104a834dfe64995a97ac96439
--- /dev/null
+++ b/tests/LeViT_128S.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:LeViT_128S
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/LeViT/LeViT_128S.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/LeViT/LeViT_128S.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/LeViT/LeViT_128S.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/LeViT_128S_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/MobileNetV1.txt b/tests/MobileNetV1.txt
new file mode 100644
index 0000000000000000000000000000000000000000..784d6f30832f26831a1f0d60adf6c43d71f3f181
--- /dev/null
+++ b/tests/MobileNetV1.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:MobileNetV1
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/MobileNetV1_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/MobileNetV2.txt b/tests/MobileNetV2.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c622100fea92753ff4a7239e1b56a0ea4ee97352
--- /dev/null
+++ b/tests/MobileNetV2.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:MobileNetV2
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/MobileNetV2/MobileNetV2.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/MobileNetV2/MobileNetV2.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/MobileNetV2/MobileNetV2.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/MobileNetV2_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/MobileNetV3_large_x1_0.txt b/tests/MobileNetV3_large_x1_0.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2bc4ec43fce5706375648f13d6773d07b1524309
--- /dev/null
+++ b/tests/MobileNetV3_large_x1_0.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:MobileNetV3_large_x1_0
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml
+pact_train:deploy/slim/slim.py -c ppcls/configs/slim/MobileNetV3_large_x1_0_quantization.yaml
+fpgm_train:deploy/slim/slim.py -c ppcls/configs/slim/MobileNetV3_large_x1_0_prune.yaml
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml
+quant_export:deploy/slim/slim.py -m export -c ppcls/configs/slim/MobileNetV3_large_x1_0_quantization.yaml
+fpgm_export:deploy/slim/slim.py -m export -c ppcls/configs/slim/MobileNetV3_large_x1_0_prune.yaml
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/MobileNetV3_large_x1_0_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/ResNeXt101_vd_64x4d.txt b/tests/ResNeXt101_vd_64x4d.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5a4a088d7367f522a9837394922a49d918d07830
--- /dev/null
+++ b/tests/ResNeXt101_vd_64x4d.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:ResNeXt101_vd_64x4d
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/ResNeXt101_64x4d_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/ResNet50_vd.txt b/tests/ResNet50_vd.txt
new file mode 100644
index 0000000000000000000000000000000000000000..da02c8894b0c13981742bf698f95450a2b3e3082
--- /dev/null
+++ b/tests/ResNet50_vd.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:ResNet50_vd
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
+pact_train:deploy/slim/slim.py -c ppcls/configs/slim/ResNet50_vd_quantization.yaml
+fpgm_train:deploy/slim/slim.py -c ppcls/configs/slim/ResNet50_vd_prune.yaml
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
+quant_export:deploy/slim/slim.py -m export -c ppcls/configs/slim/ResNet50_vd_quantization.yaml
+fpgm_export:deploy/slim/slim.py -m export -c ppcls/configs/slim/ResNet50_vd_prune.yaml
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/ResNet50_vd_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/ShuffleNetV2_x1_0.txt b/tests/ShuffleNetV2_x1_0.txt
new file mode 100644
index 0000000000000000000000000000000000000000..08964a2f0fda5c2eb7c56690604980a1eac73d75
--- /dev/null
+++ b/tests/ShuffleNetV2_x1_0.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:ShuffleNetV2_x1_0
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/ShuffleNetV2_x1_0_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
diff --git a/tests/SwinTransformer_tiny_patch4_window7_224.txt b/tests/SwinTransformer_tiny_patch4_window7_224.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a358d191a04e3e471da5a7f127da37e1f7162c0a
--- /dev/null
+++ b/tests/SwinTransformer_tiny_patch4_window7_224.txt
@@ -0,0 +1,51 @@
+===========================train_params===========================
+model_name:SwinTransformer_tiny_patch4_window7_224
+python:python3.7
+gpu_list:0|0,1
+-o Global.device:gpu
+-o Global.auto_cast:null
+-o Global.epochs:lite_train_infer=2|whole_train_infer=120
+-o Global.output_dir:./output/
+-o DataLoader.Train.sampler.batch_size:8
+-o Global.pretrained_model:null
+train_model_name:latest
+train_infer_img_dir:./dataset/ILSVRC2012/val
+null:null
+##
+trainer:norm_train
+norm_train:tools/train.py -c ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
+pact_train:null
+fpgm_train:null
+distill_train:null
+null:null
+null:null
+##
+===========================eval_params===========================
+eval:tools/eval.py -c ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
+null:null
+##
+===========================infer_params==========================
+-o Global.save_inference_dir:./inference
+-o Global.pretrained_model:
+norm_export:tools/export_model.py -c ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
+quant_export:null
+fpgm_export:null
+distill_export:null
+export1:null
+export2:null
+inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/whole_chain/SwinTransformer_tiny_patch4_window7_224_inference.tar
+infer_model:../inference/
+infer_export:null
+infer_quant:False
+inference:python/predict_cls.py -c configs/inference_cls.yaml
+-o Global.use_gpu:True|False
+-o Global.enable_mkldnn:True|False
+-o Global.cpu_num_threads:1|6
+-o Global.batch_size:1
+-o Global.use_tensorrt:True|False
+-o Global.use_fp16:True|False
+-o Global.inference_model_dir:../inference
+-o Global.infer_imgs:../dataset/ILSVRC2012/val
+-o Global.save_log_path:null
+-o Global.benchmark:True
+null:null
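All ten config files above share one 51-line schema, and `test.sh` addresses fields strictly by line number, so drift in a single file silently breaks parsing. A hypothetical sanity check:

```python
# Hypothetical consistency check: every tests/*.txt must keep its 51-line
# layout with section separators at fixed indices, since test.sh parses by position.
import glob

for path in glob.glob('tests/*.txt'):
    lines = open(path).read().splitlines()
    assert len(lines) == 51, f'{path}: expected 51 lines, got {len(lines)}'
    assert 'train_params' in lines[0], path
    assert 'eval_params' in lines[22], path
    assert 'infer_params' in lines[26], path
print('all test configs follow the schema')
```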
diff --git a/tests/prepare.sh b/tests/prepare.sh
new file mode 100644
index 0000000000000000000000000000000000000000..55e1f2c7f0898779cfba6b91f2a0f0a789931170
--- /dev/null
+++ b/tests/prepare.sh
@@ -0,0 +1,60 @@
+#!/bin/bash
+FILENAME=$1
+# MODE must be one of ['lite_train_infer', 'whole_infer', 'whole_train_infer', 'infer']
+MODE=$2
+
+dataline=$(cat ${FILENAME})
+# parser params
+IFS=$'\n'
+lines=(${dataline})
+function func_parser_value(){
+ strs=$1
+ IFS=":"
+ array=(${strs})
+ if [ ${#array[*]} = 2 ]; then
+ echo ${array[1]}
+ else
+ IFS="|"
+ tmp="${array[1]}:${array[2]}"
+ echo ${tmp}
+ fi
+}
+model_name=$(func_parser_value "${lines[1]}")
+inference_model_url=$(func_parser_value "${lines[35]}")
+
+if [ ${MODE} = "lite_train_infer" ] || [ ${MODE} = "whole_infer" ];then
+ # pretrain lite train data
+ cd dataset
+ rm -rf ILSVRC2012
+ wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_little_train.tar
+ tar xf whole_chain_little_train.tar
+ ln -s whole_chain_little_train ILSVRC2012
+ cd ILSVRC2012
+ mv train.txt train_list.txt
+ mv val.txt val_list.txt
+ cd ../../
+elif [ ${MODE} = "infer" ];then
+ # download data
+ cd dataset
+ rm -rf ILSVRC2012
+ wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_infer.tar
+ tar xf whole_chain_infer.tar
+ ln -s whole_chain_infer ILSVRC2012
+ cd ILSVRC2012
+ mv val.txt val_list.txt
+ cd ../../
+ # download inference model
+ eval "wget -nc $inference_model_url"
+ tar xf "${model_name}_inference.tar"
+
+elif [ ${MODE} = "whole_train_infer" ];then
+ cd dataset
+ rm -rf ILSVRC2012
+ wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/whole_chain/whole_chain_CIFAR100.tar
+ tar xf whole_chain_CIFAR100.tar
+ ln -s whole_chain_CIFAR100 ILSVRC2012
+ cd ILSVRC2012
+ mv train.txt train_list.txt
+ mv val.txt val_list.txt
+ cd ../../
+fi
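After `prepare.sh` finishes, the data sits under `dataset/ILSVRC2012` (a symlink to the downloaded archive) with Paddle-style list files. A quick check of the layout it leaves behind; note that `MODE=infer` downloads only the validation split, so `train_list.txt` may legitimately be absent:

```python
# Quick check of the layout tests/prepare.sh produces.
import os

root = 'dataset/ILSVRC2012'
print('dataset dir present:', os.path.isdir(root))
for name in ('train_list.txt', 'val_list.txt'):
    print(name, 'present:', os.path.exists(os.path.join(root, name)))
```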
diff --git a/tests/test.sh b/tests/test.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1885b757e5c9be620bdb2873bfbca26f1ea694a5
--- /dev/null
+++ b/tests/test.sh
@@ -0,0 +1,363 @@
+#!/bin/bash
+FILENAME=$1
+# MODE must be one of ['lite_train_infer', 'whole_infer', 'whole_train_infer', 'infer']
+MODE=$2
+
+dataline=$(cat ${FILENAME})
+
+# parser params
+IFS=$'\n'
+lines=(${dataline})
+
+function func_parser_key(){
+ strs=$1
+ IFS=":"
+ array=(${strs})
+ tmp=${array[0]}
+ echo ${tmp}
+}
+function func_parser_value(){
+ strs=$1
+ IFS=":"
+ array=(${strs})
+ tmp=${array[1]}
+ echo ${tmp}
+}
+function func_set_params(){
+ key=$1
+ value=$2
+ if [ ${key} = "null" ];then
+ echo " "
+ elif [[ ${value} = "null" ]] || [[ ${value} = " " ]] || [ ${#value} -le 0 ];then
+ echo " "
+ else
+ echo "${key}=${value}"
+ fi
+}
+function func_parser_params(){
+ strs=$1
+ IFS=":"
+ array=(${strs})
+ key=${array[0]}
+ tmp=${array[1]}
+ IFS="|"
+ res=""
+ for _params in ${tmp[*]}; do
+ IFS="="
+ array=(${_params})
+ mode=${array[0]}
+ value=${array[1]}
+ if [[ ${mode} = ${MODE} ]]; then
+ IFS="|"
+ #echo $(func_set_params "${mode}" "${value}")
+ echo $value
+ break
+ fi
+ IFS="|"
+ done
+ echo ${res}
+}
+function status_check(){
+ last_status=$1 # the exit code
+ run_command=$2
+ run_log=$3
+ if [ $last_status -eq 0 ]; then
+ echo -e "\033[33m Run successfully with command - ${run_command}! \033[0m" | tee -a ${run_log}
+ else
+ echo -e "\033[33m Run failed with command - ${run_command}! \033[0m" | tee -a ${run_log}
+ fi
+}
+
+IFS=$'\n'
+# The training params
+model_name=$(func_parser_value "${lines[1]}")
+python=$(func_parser_value "${lines[2]}")
+gpu_list=$(func_parser_value "${lines[3]}")
+train_use_gpu_key=$(func_parser_key "${lines[4]}")
+train_use_gpu_value=$(func_parser_value "${lines[4]}")
+autocast_list=$(func_parser_value "${lines[5]}")
+autocast_key=$(func_parser_key "${lines[5]}")
+epoch_key=$(func_parser_key "${lines[6]}")
+epoch_num=$(func_parser_params "${lines[6]}")
+save_model_key=$(func_parser_key "${lines[7]}")
+train_batch_key=$(func_parser_key "${lines[8]}")
+train_batch_value=$(func_parser_params "${lines[8]}")
+pretrain_model_key=$(func_parser_key "${lines[9]}")
+pretrain_model_value=$(func_parser_value "${lines[9]}")
+train_model_name=$(func_parser_value "${lines[10]}")
+train_infer_img_dir=$(func_parser_value "${lines[11]}")
+train_param_key1=$(func_parser_key "${lines[12]}")
+train_param_value1=$(func_parser_value "${lines[12]}")
+
+trainer_list=$(func_parser_value "${lines[14]}")
+trainer_norm=$(func_parser_key "${lines[15]}")
+norm_trainer=$(func_parser_value "${lines[15]}")
+pact_key=$(func_parser_key "${lines[16]}")
+pact_trainer=$(func_parser_value "${lines[16]}")
+fpgm_key=$(func_parser_key "${lines[17]}")
+fpgm_trainer=$(func_parser_value "${lines[17]}")
+distill_key=$(func_parser_key "${lines[18]}")
+distill_trainer=$(func_parser_value "${lines[18]}")
+trainer_key1=$(func_parser_key "${lines[19]}")
+trainer_value1=$(func_parser_value "${lines[19]}")
+trainer_key2=$(func_parser_key "${lines[20]}")
+trainer_value2=$(func_parser_value "${lines[20]}")
+
+eval_py=$(func_parser_value "${lines[23]}")
+eval_key1=$(func_parser_key "${lines[24]}")
+eval_value1=$(func_parser_value "${lines[24]}")
+
+save_infer_key=$(func_parser_key "${lines[27]}")
+export_weight=$(func_parser_key "${lines[28]}")
+norm_export=$(func_parser_value "${lines[29]}")
+pact_export=$(func_parser_value "${lines[30]}")
+fpgm_export=$(func_parser_value "${lines[31]}")
+distill_export=$(func_parser_value "${lines[32]}")
+export_key1=$(func_parser_key "${lines[33]}")
+export_value1=$(func_parser_value "${lines[33]}")
+export_key2=$(func_parser_key "${lines[34]}")
+export_value2=$(func_parser_value "${lines[34]}")
+
+# parser inference model
+infer_model_dir_list=$(func_parser_value "${lines[36]}")
+infer_export_list=$(func_parser_value "${lines[37]}")
+infer_is_quant=$(func_parser_value "${lines[38]}")
+# parser inference
+inference_py=$(func_parser_value "${lines[39]}")
+use_gpu_key=$(func_parser_key "${lines[40]}")
+use_gpu_list=$(func_parser_value "${lines[40]}")
+use_mkldnn_key=$(func_parser_key "${lines[41]}")
+use_mkldnn_list=$(func_parser_value "${lines[41]}")
+cpu_threads_key=$(func_parser_key "${lines[42]}")
+cpu_threads_list=$(func_parser_value "${lines[42]}")
+batch_size_key=$(func_parser_key "${lines[43]}")
+batch_size_list=$(func_parser_value "${lines[43]}")
+use_trt_key=$(func_parser_key "${lines[44]}")
+use_trt_list=$(func_parser_value "${lines[44]}")
+precision_key=$(func_parser_key "${lines[45]}")
+precision_list=$(func_parser_value "${lines[45]}")
+infer_model_key=$(func_parser_key "${lines[46]}")
+image_dir_key=$(func_parser_key "${lines[47]}")
+infer_img_dir=$(func_parser_value "${lines[47]}")
+save_log_key=$(func_parser_key "${lines[48]}")
+benchmark_key=$(func_parser_key "${lines[49]}")
+benchmark_value=$(func_parser_value "${lines[49]}")
+infer_key1=$(func_parser_key "${lines[50]}")
+infer_value1=$(func_parser_value "${lines[50]}")
+
+LOG_PATH="./tests/output"
+mkdir -p ${LOG_PATH}
+status_log="${LOG_PATH}/results.log"
+
+
+function func_inference(){
+ IFS='|'
+ _python=$1
+ _script=$2
+ _model_dir=$3
+ _log_path=$4
+ _img_dir=$5
+ _flag_quant=$6
+ # inference
+ for use_gpu in ${use_gpu_list[*]}; do
+ if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then
+ for use_mkldnn in ${use_mkldnn_list[*]}; do
+ if [ ${use_mkldnn} = "False" ] && [ ${_flag_quant} = "True" ]; then
+ continue
+ fi
+ for threads in ${cpu_threads_list[*]}; do
+ for batch_size in ${batch_size_list[*]}; do
+ _save_log_path="${_log_path}/infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_${batch_size}.log"
+ set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
+ set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
+ set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
+ set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}")
+ set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}")
+ set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}")
+ command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 "
+ eval $command
+ last_status=${PIPESTATUS[0]}
+ eval "cat ${_save_log_path}"
+ status_check $last_status "${command}" "../${status_log}"
+ done
+ done
+ done
+ elif [ ${use_gpu} = "True" ] || [ ${use_gpu} = "gpu" ]; then
+ for use_trt in ${use_trt_list[*]}; do
+ for precision in ${precision_list[*]}; do
+ if [ ${precision} = "False" ] && [ ${use_trt} = "False" ]; then
+ continue
+ fi
+ if [[ ${use_trt} = "False" || ${precision} =~ "int8" ]] && [ ${_flag_quant} = "True" ]; then
+ continue
+ fi
+ for batch_size in ${batch_size_list[*]}; do
+ _save_log_path="${_log_path}/infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log"
+ set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
+ set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
+ set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
+ set_tensorrt=$(func_set_params "${use_trt_key}" "${use_trt}")
+ set_precision=$(func_set_params "${precision_key}" "${precision}")
+ set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}")
+ command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} > ${_save_log_path} 2>&1 "
+ eval $command
+ last_status=${PIPESTATUS[0]}
+ eval "cat ${_save_log_path}"
+ status_check $last_status "${command}" "../${status_log}"
+
+ done
+ done
+ done
+ else
+            echo "Currently, hardware other than CPU and GPU is not supported!"
+ fi
+ done
+}
+
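+# MODE "infer": export each inference model if an export step is configured, then run inference on it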
+if [ ${MODE} = "infer" ]; then
+ GPUID=$3
+ if [ ${#GPUID} -le 0 ];then
+ env=" "
+ else
+ env="export CUDA_VISIBLE_DEVICES=${GPUID}"
+ fi
+ # set CUDA_VISIBLE_DEVICES
+ eval $env
+ export Count=0
+ IFS="|"
+ infer_run_exports=(${infer_export_list})
+ infer_quant_flag=(${infer_is_quant})
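+    # the export command list and quant-flag list are aligned index-by-index with infer_model_dir_list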
+ cd deploy
+ for infer_model in ${infer_model_dir_list[*]}; do
+ # run export
+ if [ ${infer_run_exports[Count]} != "null" ];then
+ set_export_weight=$(func_set_params "${export_weight}" "${infer_model}")
+ set_save_infer_key=$(func_set_params "${save_infer_key}" "${infer_model}")
+ export_cmd="${python} ${norm_export} ${set_export_weight} ${set_save_infer_key}"
+ eval $export_cmd
+ status_export=$?
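+            # NOTE: only a successful export is recorded in the status log; a failed export is not logged here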
+ if [ ${status_export} = 0 ];then
+ status_check $status_export "${export_cmd}" "../${status_log}"
+ fi
+ fi
+        # run inference
+ is_quant=${infer_quant_flag[Count]}
+ echo "is_quant: ${is_quant}"
+ func_inference "${python}" "${inference_py}" "${infer_model}" "../${LOG_PATH}" "${infer_img_dir}" ${is_quant}
+ Count=$(($Count + 1))
+ done
+ cd ..
+
+else
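+    # training mode: iterate over gpu settings, autocast options and trainers;
+    # each trained model is then evaluated, exported and run through inference where configured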
+ IFS="|"
+ export Count=0
+    USE_GPU_VALUES=(${train_use_gpu_value})
+    for gpu in ${gpu_list[*]}; do
+        use_gpu=${USE_GPU_VALUES[Count]}
+ Count=$(($Count + 1))
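+        # gpu may be "-1" (CPU), a single device id, a comma-separated id list
+        # (single-machine multi-gpu), or "ips;gpu_ids" for multi-machine training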
+ if [ ${gpu} = "-1" ];then
+ env=""
+ elif [ ${#gpu} -le 1 ];then
+ env="export CUDA_VISIBLE_DEVICES=${gpu}"
+ eval ${env}
+ elif [ ${#gpu} -le 15 ];then
+ IFS=","
+ array=(${gpu})
+ env="export CUDA_VISIBLE_DEVICES=${array[0]}"
+ IFS="|"
+ else
+ IFS=";"
+ array=(${gpu})
+ ips=${array[0]}
+ gpu=${array[1]}
+ IFS="|"
+ env=" "
+ fi
+ for autocast in ${autocast_list[*]}; do
+ for trainer in ${trainer_list[*]}; do
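+            # select the train and export commands for this trainer (pact / fpgm / distill / config-defined / normal)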
+ flag_quant=False
+ if [ ${trainer} = ${pact_key} ]; then
+ run_train=${pact_trainer}
+ run_export=${pact_export}
+ flag_quant=True
+ elif [ ${trainer} = "${fpgm_key}" ]; then
+ run_train=${fpgm_trainer}
+ run_export=${fpgm_export}
+ elif [ ${trainer} = "${distill_key}" ]; then
+ run_train=${distill_trainer}
+ run_export=${distill_export}
+ elif [ ${trainer} = ${trainer_key1} ]; then
+ run_train=${trainer_value1}
+ run_export=${export_value1}
+ elif [[ ${trainer} = ${trainer_key2} ]]; then
+ run_train=${trainer_value2}
+ run_export=${export_value2}
+ else
+ run_train=${norm_trainer}
+ run_export=${norm_export}
+ fi
+
+ if [ ${run_train} = "null" ]; then
+ continue
+ fi
+
+ set_autocast=$(func_set_params "${autocast_key}" "${autocast}")
+ set_epoch=$(func_set_params "${epoch_key}" "${epoch_num}")
+ set_pretrain=$(func_set_params "${pretrain_model_key}" "${pretrain_model_value}")
+ set_batchsize=$(func_set_params "${train_batch_key}" "${train_batch_value}")
+ set_train_params1=$(func_set_params "${train_param_key1}" "${train_param_value1}")
+ set_use_gpu=$(func_set_params "${train_use_gpu_key}" "${use_gpu}")
+ save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}"
+
+ # load pretrain from norm training if current trainer is pact or fpgm trainer
+ if [ ${trainer} = ${pact_key} ] || [ ${trainer} = ${fpgm_key} ]; then
+ set_pretrain="${load_norm_train_model}"
+ fi
+
+ set_save_model=$(func_set_params "${save_model_key}" "${save_log}")
+ if [ ${#gpu} -le 2 ];then # train with cpu or single gpu
+ cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} "
+ elif [ ${#gpu} -le 15 ];then # train with multi-gpu
+ cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1}"
+ else # train with multi-machine
+ cmd="${python} -m paddle.distributed.launch --ips=${ips} --gpus=${gpu} ${run_train} ${set_save_model} ${set_pretrain} ${set_epoch} ${set_autocast} ${set_batchsize} ${set_train_params1}"
+ fi
+ # run train
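+            # any CUDA_VISIBLE_DEVICES exported above is cleared before launching the training command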
+ eval "unset CUDA_VISIBLE_DEVICES"
+ eval $cmd
+ status_check $? "${cmd}" "${status_log}"
+
+            set_eval_pretrain=$(func_set_params "${pretrain_model_key}" "${save_log}/${model_name}/${train_model_name}")
+ # save norm trained models to set pretrain for pact training and fpgm training
+ if [ ${trainer} = ${trainer_norm} ]; then
+ load_norm_train_model=${set_eval_pretrain}
+ fi
+ # run eval
+ if [ ${eval_py} != "null" ]; then
+ set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}")
+ eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}"
+ eval $eval_cmd
+ status_check $? "${eval_cmd}" "${status_log}"
+ fi
+ # run export model
+ if [ ${run_export} != "null" ]; then
+ save_infer_path="${save_log}"
+ set_export_weight=$(func_set_params "${export_weight}" "${save_log}/${model_name}/${train_model_name}")
+ set_save_infer_key=$(func_set_params "${save_infer_key}" "${save_infer_path}")
+ export_cmd="${python} ${run_export} ${set_export_weight} ${set_save_infer_key}"
+ eval $export_cmd
+ status_check $? "${export_cmd}" "${status_log}"
+
+                # run inference
+ eval $env
+ save_infer_path="${save_log}"
+ cd deploy
+ func_inference "${python}" "${inference_py}" "../${save_infer_path}" "../${LOG_PATH}" "${infer_img_dir}" "${flag_quant}"
+ cd ..
+ fi
+ eval "unset CUDA_VISIBLE_DEVICES"
+ done # done with: for trainer in ${trainer_list[*]}; do
+ done # done with: for autocast in ${autocast_list[*]}; do
+ done # done with: for gpu in ${gpu_list[*]}; do
+fi # end if [ ${MODE} = "infer" ]; then