Unverified commit bd1bca08, authored by Chang Xu, committed via GitHub

Update Some Demos & Add Infer for ImageNet Demo (#1202)

* Update Some Demos & Add Infer for ImageNet Demo

* Update Some Demos & Add Infer for ImageNet Demo

* Update Some Demos & Add Infer for ImageNet Demo

* Update Some Demos & Add Infer for ImageNet Demo
Parent e95a22ca
......@@ -40,13 +40,6 @@ def argsparser():
return parser
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------')
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
......@@ -153,8 +146,6 @@ if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
print_arguments(FLAGS)
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
......
......@@ -48,13 +48,6 @@ def argsparser():
return parser
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------')
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
......@@ -171,8 +164,6 @@ if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
print_arguments(FLAGS)
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
......
......@@ -9,6 +9,8 @@
- [3.3 Prepare the Inference Model](#33-prepare-the-inference-model)
- [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
- [4. Inference Deployment](#4-inference-deployment)
- [4.1 Python Inference](#41-python-inference)
- [4.2 PaddleLite On-Device Deployment](#42-paddlelite-on-device-deployment)
- [5. FAQ](#5-faq)
......@@ -19,7 +21,7 @@
### PaddleClas Models
| Model | Strategy | Top-1 Acc | GPU Latency (ms) | ARM CPU Latency (ms) |
|:------:|:------:|:------:|:------:|:------:|
| MobileNetV1 | Baseline | 70.90 | - | 33.15 |
| MobileNetV1 | Quantization + Distillation | 70.49 | - | 13.64 |
......@@ -105,21 +107,50 @@ tar -xf MobileNetV1_infer.tar
The distillation + quantization auto-compression example is launched through the run.py script, which calls the ```paddleslim.auto_compression.AutoCompression``` API to run quantization-aware training with distillation. Configure the model path, dataset path, and the distillation, quantization, and training parameters in the config file; once that is done, you can start the compression.
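In code, the flow inside run.py looks roughly like the sketch below. This is a minimal sketch, not the demo's exact code: the paths follow the demo's ```Global``` config keys, and the random-batch dataloader is a stand-in for the ImageNet reader that run.py actually wraps.
```python
# Sketch of the AutoCompression flow in run.py; paths and the stand-in
# dataloader are assumptions based on this demo's config files.
import numpy as np
import paddle
from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.config_helpers import load_config

paddle.enable_static()
all_config = load_config('./configs/MobileNetV1/qat_dis.yaml')
g = all_config['Global']

def train_dataloader():
    # run.py wraps an ImageNet reader; a random batch stands in here
    for _ in range(8):
        yield {g['input_name']: np.random.rand(32, 3, 224, 224).astype('float32')}

ac = AutoCompression(
    model_dir=g['model_dir'],
    model_filename=g['model_filename'],
    params_filename=g['params_filename'],
    save_dir='./save_quant_mobilev1/',
    config=all_config,
    train_dataloader=train_dataloader,
    eval_callback=None,  # run.py passes an eval_function computing top-1 acc
    eval_dataloader=train_dataloader)
ac.compress()
```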
**Single-GPU Launch**
```shell
# single GPU
export CUDA_VISIBLE_DEVICES=0
# or multiple GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3
python run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
```
**Distributed Training**
Image classification tasks usually involve large amounts of training data. Taking ImageNet as an example, the ImageNet22k dataset contains 14 million images; training on a single GPU is very time-consuming, while distributed training achieves a nearly linear speedup.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
```
Multi-GPU (distributed) training splits the training job across multiple trainer nodes; each trainer performs data loading, forward computation, and backward gradient computation, then uploads its gradients to a server node. After receiving the gradients from all trainers, the server aggregates them, updates the parameters, and sends the updated parameters back to the trainers to start the next round. One multi-GPU step trains ```batch size * num gpus``` samples: for example, with a single-GPU ```batch size``` of 32, one step trains 32 samples, while four GPUs each with a ```batch size``` of 32 train 128 samples per step.
Note that the ```learning rate``` scales linearly with the ```batch size```. Here the single-GPU ```batch size``` is 32 and the corresponding ```learning rate``` is 0.015; if the ```batch size``` is reduced 4x to 8, the ```learning rate``` must also be divided by 4, and when training on multiple GPUs the ```learning rate``` must be multiplied by the number of GPUs. So whenever you change the ```batch size``` or the number of training GPUs, adjust the ```learning rate``` accordingly.
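The rule above in a few lines of Python (a sketch; 0.015 and 32 are this demo's defaults):
```python
# Linear scaling rule: lr scales with the effective batch size (batch * GPUs).
base_lr, base_batch_size = 0.015, 32

def scaled_lr(batch_size_per_gpu, num_gpus=1):
    effective_batch = batch_size_per_gpu * num_gpus
    return base_lr * effective_batch / base_batch_size

print(scaled_lr(8))      # 0.00375 -> batch size cut 4x, lr divided by 4
print(scaled_lr(32, 4))  # 0.06    -> 4 GPUs, lr multiplied by 4
```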
## 4. Inference Deployment
#### 4.1 Python Inference
After the inference model is ready, run prediction with:
```shell
python infer.py -c configs/infer.yaml
```
More Paddle Inference deployment references:
- [Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/python_inference.md)
- [Paddle Inference C++ deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/cpp_inference.md)
- [Paddle Lite deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/lite/lite.md)
The following fields in the config file ```configs/infer.yaml``` control prediction:
- ```Global.infer_imgs```: path of the image file(s) to predict
- ```Global.inference_model_dir```: directory of the inference model; it must contain the .pdmodel and .pdiparams files
- ```Global.use_tensorrt```: whether to use the TensorRT inference engine
- ```Global.use_gpu```: whether to predict on GPU
- ```Global.enable_mkldnn```: whether to enable the ```MKL-DNN``` acceleration library; note that when ```enable_mkldnn``` and ```use_gpu``` are both ```True```, ```enable_mkldnn``` is ignored and the ```GPU``` is used
- ```Global.use_fp16```: whether to enable ```FP16```
- ```PreProcess```: data preprocessing configuration
- ```PostProcess```: post-processing configuration
- ```PostProcess.Topk.class_id_map_file```: label mapping file of the dataset, ```./images/imagenet1k_label_list.txt``` by default, which is the ImageNet label mapping file used by PaddleClas
Note:
- Mind the model's input size; some models require adjusting ```PreProcess.resize_short``` and ```PreProcess.resize```
- To speed up evaluation, enable ```TensorRT``` acceleration when evaluating on ```GPU``` and ```MKL-DNN``` acceleration when evaluating on ```CPU``` (see the sketch below).
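A sketch of how these switches map onto ```paddle.inference.Config```, following ```create_paddle_predictor()``` in this demo's infer.py; the model paths below are the demo's defaults and should be treated as assumptions:
```python
import paddle.inference as paddle_infer

config = paddle_infer.Config('MobileNetV1_infer/inference.pdmodel',
                             'MobileNetV1_infer/inference.pdiparams')
config.enable_use_gpu(8000, 0)  # Global.use_gpu, with Global.gpu_mem
config.enable_tensorrt_engine(  # Global.use_tensorrt
    precision_mode=paddle_infer.Config.Precision.Half,  # Global.use_fp16
    max_batch_size=1,
    workspace_size=1 << 30,
    min_subgraph_size=30,
    use_calib_mode=False)
# CPU alternative: config.disable_gpu(); config.enable_mkldnn()
predictor = paddle_infer.create_predictor(config)
```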
#### 4.2 PaddleLite On-Device Deployment
For on-device deployment with PaddleLite, see:
- [Paddle Lite deployment](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_lite_deploy.md)
## 5. FAQ
......@@ -85,7 +85,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 500
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -61,7 +61,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -115,7 +115,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -27,7 +27,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -30,7 +30,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 10000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -4,7 +4,7 @@ Global:
model_filename: inference.pdmodel
params_filename: inference.pdiparams
batch_size: 32
data_dir: /workspace/dataset/ILSVRC2012
data_dir: /ILSVRC2012
Distillation:
alpha: 1.0
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 10000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -30,7 +30,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.0001
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -30,7 +30,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -30,7 +30,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 10000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -76,7 +76,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 500
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -30,7 +30,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -30,7 +30,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
......@@ -31,9 +31,8 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
weight_decay: 0.00002
origin_metric: 0.596
\ No newline at end of file
origin_metric: 0.596
......@@ -29,7 +29,6 @@ TrainConfig:
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.015
T_max: 5000
optimizer_builder:
optimizer:
type: Momentum
......
Global:
infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg"
inference_model_dir: "./MobileNetV1_infer"
model_filename: "inference.pdmodel"
params_filename: "inference.pdiparams"
batch_size: 1
use_gpu: True
enable_mkldnn: True
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
use_int8: False
ir_optim: True
use_tensorrt: True
gpu_mem: 8000
enable_profile: False
benchmark: True
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Topk
Topk:
topk: 5
class_id_map_file: "./images/imagenet1k_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
import os
import sys
sys.path[0] = os.path.join(
os.path.dirname(__file__), os.path.pardir, os.path.pardir)
import argparse
import functools
from functools import partial
import numpy as np
import paddle
import paddle.nn as nn
from paddle.io import Dataset, BatchSampler, DataLoader
import imagenet_reader as reader
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
return parser
# yapf: enable
def reader_wrapper(reader, input_name):
def gen():
for i, data in enumerate(reader()):
imgs = np.float32([item[0] for item in data])
yield {input_name: imgs}
return gen
def eval_reader(data_dir, batch_size):
val_reader = paddle.batch(
reader.val(data_dir=data_dir), batch_size=batch_size)
return val_reader
def eval():
devices = paddle.device.get_device().split(':')[0]
places = paddle.device._convert_to_place(devices)
exe = paddle.static.Executor(places)
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
global_config["model_dir"],
exe,
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"])
print('Loaded model from: {}'.format(global_config["model_dir"]))
val_reader = eval_reader(data_dir, batch_size=global_config['batch_size'])
results = []
print('Evaluating... It will take a while. Please wait...')
for batch_id, data in enumerate(val_reader()):
# top1_acc, top5_acc
image = np.array([[d[0]] for d in data])
image = image.reshape((len(data), 3, 224, 224))
label = [[d[1]] for d in data]
pred = exe.run(val_program,
feed={feed_target_names[0]: image},
fetch_list=fetch_targets)
pred = np.array(pred[0])
label = np.array(label)
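# argsort is ascending, so the last columns hold the highest-scoring classes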
sort_array = pred.argsort(axis=1)
top_1_pred = sort_array[:, -1:][:, ::-1]
top_1 = np.mean(label == top_1_pred)
top_5_pred = sort_array[:, -5:][:, ::-1]
acc_num = 0
for i in range(len(label)):
if label[i][0] in top_5_pred[i]:
acc_num += 1
top_5 = float(acc_num) / len(label)
results.append([top_1, top_5])
result = np.mean(np.array(results), axis=0)
return result[0]
def main():
global global_config
all_config = load_slim_config(args.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
global data_dir
data_dir = global_config['data_dir']
result = eval()
print('Eval Top1:', result)
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
args = parser.parse_args()
main()
import os
import sys
import cv2
import numpy as np
import platform
import argparse
import base64
import shutil
import paddle
from postprocess import build_postprocess
from preprocess import create_operators
from paddleslim.auto_compression.config_helpers import load_config
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'-c',
'--config',
type=str,
default='configs/config.yaml',
help='config file path')
return parser
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in args.items():
print('%s: %s' % (arg, value))
print('------------------------------------------')
def get_image_list(img_file):
imgs_lists = []
if img_file is None or not os.path.exists(img_file):
raise Exception("not found any img file in {}".format(img_file))
img_end = ['jpg', 'png', 'jpeg', 'JPEG', 'JPG', 'bmp']
if os.path.isfile(img_file) and img_file.split('.')[-1] in img_end:
imgs_lists.append(img_file)
elif os.path.isdir(img_file):
for single_file in os.listdir(img_file):
if single_file.split('.')[-1] in img_end:
imgs_lists.append(os.path.join(img_file, single_file))
if len(imgs_lists) == 0:
raise Exception("not found any img file in {}".format(img_file))
imgs_lists = sorted(imgs_lists)
return imgs_lists
class Predictor(object):
def __init__(self, config):
predict_args = config['Global']
# FP16 (half precision) prediction only works with TensorRT
if predict_args['use_fp16'] is True:
assert predict_args['use_tensorrt'] is True
self.args = predict_args
if self.args.get("use_onnx", False):
self.predictor, self.config = self.create_onnx_predictor(
predict_args)
else:
self.predictor, self.config = self.create_paddle_predictor(
predict_args)
self.preprocess_ops = []
self.postprocess = None
if "PreProcess" in config:
if "transform_ops" in config["PreProcess"]:
self.preprocess_ops = create_operators(config["PreProcess"][
"transform_ops"])
if "PostProcess" in config:
self.postprocess = build_postprocess(config["PostProcess"])
# used by the whole_chain project to test each Paddle repo
self.benchmark = config["Global"].get("benchmark", False)
if self.benchmark:
import auto_log
import os
pid = os.getpid()
size = config["PreProcess"]["transform_ops"][1]["CropImage"]["size"]
if config["Global"].get("use_int8", False):
precision = "int8"
elif config["Global"].get("use_fp16", False):
precision = "fp16"
else:
precision = "fp32"
self.auto_logger = auto_log.AutoLogger(
model_name=config["Global"].get("model_name", "cls"),
model_precision=precision,
batch_size=config["Global"].get("batch_size", 1),
data_shape=[3, size, size],
save_path=config["Global"].get("save_log_path",
"./auto_log.log"),
inference_config=self.config,
pids=pid,
process_name=None,
gpu_ids=None,
time_keys=[
'preprocess_time', 'inference_time', 'postprocess_time'
],
warmup=2)
def create_paddle_predictor(self, args):
inference_model_dir = args['inference_model_dir']
params_file = os.path.join(inference_model_dir, args['params_filename'])
model_file = os.path.join(inference_model_dir, args['model_filename'])
config = paddle.inference.Config(model_file, params_file)
if args['use_gpu']:
config.enable_use_gpu(args['gpu_mem'], 0)
else:
config.disable_gpu()
if args['enable_mkldnn']:
# there is no set_mkldnn_cache_capatity() on macOS
if platform.system() != "Darwin":
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
config.set_cpu_math_library_num_threads(args['cpu_num_threads'])
if args['enable_profile']:
config.enable_profile()
config.disable_glog_info()
config.switch_ir_optim(args['ir_optim']) # default true
if args['use_tensorrt']:
precision = paddle.inference.Config.Precision.Float32
if args.get("use_int8", False):
precision = paddle.inference.Config.Precision.Int8
elif args.get("use_fp16", False):
precision = paddle.inference.Config.Precision.Half
config.enable_tensorrt_engine(
precision_mode=precision,
max_batch_size=args['batch_size'],
workspace_size=1 << 30,
min_subgraph_size=30,
use_calib_mode=False)
config.enable_memory_optim()
# use zero copy
config.switch_use_feed_fetch_ops(False)
predictor = paddle.inference.create_predictor(config)
return predictor, config
def create_onnx_predictor(self, args):
import onnxruntime as ort
inference_model_dir = args['inference_model_dir']
model_file = os.path.join(inference_model_dir, args['model_filename'])
config = ort.SessionOptions()
if args['use_gpu']:
raise ValueError(
"ONNX inference currently supports CPU only! Please set use_gpu to false."
)
else:
config.intra_op_num_threads = args['cpu_num_threads']
if args['ir_optim']:
config.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
predictor = ort.InferenceSession(model_file, sess_options=config)
return predictor, config
def predict(self, images):
use_onnx = self.args.get("use_onnx", False)
if not use_onnx:
input_names = self.predictor.get_input_names()
input_tensor = self.predictor.get_input_handle(input_names[0])
output_names = self.predictor.get_output_names()
output_tensor = self.predictor.get_output_handle(output_names[0])
else:
input_names = self.predictor.get_inputs()[0].name
output_names = self.predictor.get_outputs()[0].name
if self.benchmark:
self.auto_logger.times.start()
if not isinstance(images, (list, )):
images = [images]
for idx in range(len(images)):
for ops in self.preprocess_ops:
images[idx] = ops(images[idx])
image = np.array(images)
if self.benchmark:
self.auto_logger.times.stamp()
if not use_onnx:
input_tensor.copy_from_cpu(image)
self.predictor.run()
batch_output = output_tensor.copy_to_cpu()
else:
batch_output = self.predictor.run(
output_names=[output_names], input_feed={input_names: image})[0]
if self.benchmark:
self.auto_logger.times.stamp()
if self.postprocess is not None:
batch_output = self.postprocess(batch_output)
if self.benchmark:
self.auto_logger.times.end(stamp=True)
return batch_output
def main(config):
predictor = Predictor(config)
image_list = get_image_list(config["Global"]["infer_imgs"])
image_list = image_list * 1000
batch_imgs = []
batch_names = []
cnt = 0
for idx, img_path in enumerate(image_list):
img = cv2.imread(img_path)
if img is None:
print(
"WARNING: image file failed to read and has been skipped. The path: {}".
format(img_path))
else:
img = img[:, :, ::-1]
batch_imgs.append(img)
img_name = os.path.basename(img_path)
batch_names.append(img_name)
cnt += 1
if cnt % config["Global"]["batch_size"] == 0 or (idx + 1
) == len(image_list):
if len(batch_imgs) == 0:
continue
batch_results = predictor.predict(batch_imgs)
for number, result_dict in enumerate(batch_results):
filename = batch_names[number]
if "PersonAttribute" in config[
"PostProcess"] or "VehicleAttribute" in config[
"PostProcess"]:
print("{}:\t{}".format(filename, result_dict))
else:
clas_ids = result_dict["class_ids"]
scores_str = "[{}]".format(", ".join("{:.2f}".format(
r) for r in result_dict["scores"]))
label_names = result_dict["label_names"]
print("{}:\tclass id(s): {}, score(s): {}, label name(s): {}".
format(filename, clas_ids, scores_str, label_names))
batch_imgs = []
batch_names = []
if predictor.benchmark:
predictor.auto_logger.report()
return
if __name__ == "__main__":
parser = argsparser()
args = parser.parse_args()
config = load_config(args.config)
print_arguments(config['Global'])
main(config)
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import copy
import shutil
from functools import partial
import importlib
import numpy as np
import paddle
import paddle.nn.functional as F
def build_postprocess(config):
if config is None:
return None
mod = importlib.import_module(__name__)
config = copy.deepcopy(config)
main_indicator = config.pop(
"main_indicator") if "main_indicator" in config else None
main_indicator = main_indicator if main_indicator else ""
func_list = []
for func in config:
func_list.append(getattr(mod, func)(**config[func]))
return PostProcesser(func_list, main_indicator)
class PostProcesser(object):
def __init__(self, func_list, main_indicator="Topk"):
self.func_list = func_list
self.main_indicator = main_indicator
def __call__(self, x, image_file=None):
rtn = None
for func in self.func_list:
tmp = func(x, image_file)
if type(func).__name__ in self.main_indicator:
rtn = tmp
return rtn
class ThreshOutput(object):
def __init__(self, threshold, label_0="0", label_1="1"):
self.threshold = threshold
self.label_0 = label_0
self.label_1 = label_1
def __call__(self, x, file_names=None):
y = []
for idx, probs in enumerate(x):
score = probs[1]
if score < self.threshold:
result = {
"class_ids": [0],
"scores": [1 - score],
"label_names": [self.label_0]
}
else:
result = {
"class_ids": [1],
"scores": [score],
"label_names": [self.label_1]
}
if file_names is not None:
result["file_name"] = file_names[idx]
y.append(result)
return y
class Topk(object):
def __init__(self, topk=1, class_id_map_file=None):
assert isinstance(topk, (int, ))
self.class_id_map = self.parse_class_id_map(class_id_map_file)
self.topk = topk
def parse_class_id_map(self, class_id_map_file):
if class_id_map_file is None:
return None
if not os.path.exists(class_id_map_file):
print(
"Warning: if you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
)
return None
try:
class_id_map = {}
with open(class_id_map_file, "r") as fin:
lines = fin.readlines()
for line in lines:
partition = line.split("\n")[0].partition(" ")
class_id_map[int(partition[0])] = str(partition[-1])
except Exception as ex:
print(ex)
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None, multilabel=False):
if file_names is not None:
assert x.shape[0] == len(file_names)
y = []
for idx, probs in enumerate(x):
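# ascending argsort: take the last topk entries, reversed, for descending score order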
index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
"int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
for i in index:
clas_id_list.append(i.item())
score_list.append(probs[i].item())
if self.class_id_map is not None:
label_name_list.append(self.class_id_map[i.item()])
result = {
"class_ids": clas_id_list,
"scores": np.around(
score_list, decimals=5).tolist(),
}
if file_names is not None:
result["file_name"] = file_names[idx]
# may be empty when no class_id_map_file is given
result["label_names"] = label_name_list
y.append(result)
return y
class MultiLabelTopk(Topk):
def __init__(self, topk=1, class_id_map_file=None):
super().__init__(topk, class_id_map_file)
def __call__(self, x, file_names=None):
return super().__call__(x, file_names, multilabel=True)
class SavePreLabel(object):
def __init__(self, save_dir):
if save_dir is None:
raise Exception(
"Please specify save_dir if SavePreLabel specified.")
self.save_dir = partial(os.path.join, save_dir)
def __call__(self, x, file_names=None):
if file_names is None:
return
assert x.shape[0] == len(file_names)
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-1].astype("int32")
self.save(index, file_names[idx])
def save(self, id, image_file):
output_dir = self.save_dir(str(id))
os.makedirs(output_dir, exist_ok=True)
shutil.copy(image_file, output_dir)
class Binarize(object):
def __init__(self, method="round"):
self.method = method
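# bit weights for packing 8 binary dimensions into one byte (MSB first)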
self.unit = np.array([[128, 64, 32, 16, 8, 4, 2, 1]]).T
def __call__(self, x, file_names=None):
if self.method == "round":
x = np.round(x + 1).astype("uint8") - 1
if self.method == "sign":
x = ((np.sign(x) + 1) / 2).astype("uint8")
embedding_size = x.shape[1]
assert embedding_size % 8 == 0, "The binary index only supports vectors whose size is a multiple of 8"
byte = np.zeros([x.shape[0], embedding_size // 8], dtype=np.uint8)
for i in range(embedding_size // 8):
byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)
return byte
class PersonAttribute(object):
def __init__(self, threshold=0.5, glasses_threshold=0.3,
hold_threshold=0.6):
self.threshold = threshold
self.glasses_threshold = glasses_threshold
self.hold_threshold = hold_threshold
def __call__(self, batch_preds, file_names=None):
# postprocess output of predictor
age_list = ['AgeLess18', 'Age18-60', 'AgeOver60']
direct_list = ['Front', 'Side', 'Back']
bag_list = ['HandBag', 'ShoulderBag', 'Backpack']
upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice']
lower_list = [
'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts',
'Skirt&Dress'
]
batch_res = []
for res in batch_preds:
res = res.tolist()
label_res = []
# gender
gender = 'Female' if res[22] > self.threshold else 'Male'
label_res.append(gender)
# age
age = age_list[np.argmax(res[19:22])]
label_res.append(age)
# direction
direction = direct_list[np.argmax(res[23:])]
label_res.append(direction)
# glasses
glasses = 'Glasses: '
if res[1] > self.glasses_threshold:
glasses += 'True'
else:
glasses += 'False'
label_res.append(glasses)
# hat
hat = 'Hat: '
if res[0] > self.threshold:
hat += 'True'
else:
hat += 'False'
label_res.append(hat)
# hold obj
hold_obj = 'HoldObjectsInFront: '
if res[18] > self.hold_threshold:
hold_obj += 'True'
else:
hold_obj += 'False'
label_res.append(hold_obj)
# bag
bag = bag_list[np.argmax(res[15:18])]
bag_score = res[15 + np.argmax(res[15:18])]
bag_label = bag if bag_score > self.threshold else 'No bag'
label_res.append(bag_label)
# upper
upper_res = res[4:8]
upper_label = 'Upper:'
sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve'
upper_label += ' {}'.format(sleeve)
for i, r in enumerate(upper_res):
if r > self.threshold:
upper_label += ' {}'.format(upper_list[i])
label_res.append(upper_label)
# lower
lower_res = res[8:14]
lower_label = 'Lower: '
has_lower = False
for i, l in enumerate(lower_res):
if l > self.threshold:
lower_label += ' {}'.format(lower_list[i])
has_lower = True
if not has_lower:
lower_label += ' {}'.format(lower_list[np.argmax(lower_res)])
label_res.append(lower_label)
# shoe
shoe = 'Boots' if res[14] > self.threshold else 'No boots'
label_res.append(shoe)
threshold_list = [0.5] * len(res)
threshold_list[1] = self.glasses_threshold
threshold_list[18] = self.hold_threshold
pred_res = (np.array(res) > np.array(threshold_list)
).astype(np.int8).tolist()
batch_res.append({"attributes": label_res, "output": pred_res})
return batch_res
class VehicleAttribute(object):
def __init__(self, color_threshold=0.5, type_threshold=0.5):
self.color_threshold = color_threshold
self.type_threshold = type_threshold
self.color_list = [
"yellow", "orange", "green", "gray", "red", "blue", "white",
"golden", "brown", "black"
]
self.type_list = [
"sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck",
"estate"
]
def __call__(self, batch_preds, file_names=None):
# postprocess output of predictor
batch_res = []
for res in batch_preds:
res = res.tolist()
label_res = []
color_idx = np.argmax(res[:10])
type_idx = np.argmax(res[10:])
if res[color_idx] >= self.color_threshold:
color_info = f"Color: ({self.color_list[color_idx]}, prob: {res[color_idx]})"
else:
color_info = "Color unknown"
if res[type_idx + 10] >= self.type_threshold:
type_info = f"Type: ({self.type_list[type_idx]}, prob: {res[type_idx + 10]})"
else:
type_info = "Type unknown"
label_res = f"{color_info}, {type_info}"
threshold_list = [self.color_threshold
] * 10 + [self.type_threshold] * 9
pred_res = (np.array(res) > np.array(threshold_list)
).astype(np.int8).tolist()
batch_res.append({"attributes": label_res, "output": pred_res})
return batch_res
"""
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from functools import partial
import six
import math
import random
import cv2
import numpy as np
import importlib
from PIL import Image
#from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
def create_operators(params):
"""
create operators based on the config
Args:
params(list): a dict list, used to create some operators
"""
assert isinstance(params, list), ('operator config should be a list')
mod = importlib.import_module(__name__)
ops = []
for operator in params:
assert isinstance(operator,
dict) and len(operator) == 1, "yaml format error"
op_name = list(operator)[0]
param = {} if operator[op_name] is None else operator[op_name]
op = getattr(mod, op_name)(**param)
ops.append(op)
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
print(
f"The backend of Resize only supports \"cv2\" or \"PIL\". \"{backend}\" is unavailable. Use \"cv2\" instead."
)
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
pass
class DecodeImage(object):
""" decode image """
def __init__(self, to_rgb=True, to_np=False, channel_first=False):
self.to_rgb = to_rgb
self.to_np = to_np # to numpy
self.channel_first = channel_first # only enabled when to_np is True
def __call__(self, img):
if six.PY2:
assert type(img) is str and len(
img) > 0, "invalid input 'img' in DecodeImage"
else:
assert type(img) is bytes and len(
img) > 0, "invalid input 'img' in DecodeImage"
data = np.frombuffer(img, dtype='uint8')
img = cv2.imdecode(data, 1)
if self.to_rgb:
assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
img = img[:, :, ::-1]
if self.channel_first:
img = img.transpose((2, 0, 1))
return img
class ResizeImage(object):
""" resize image """
def __init__(self,
size=None,
resize_short=None,
interpolation=None,
backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
self.h = None
elif size is not None:
self.resize_short = None
self.w = size if type(size) is int else size[0]
self.h = size if type(size) is int else size[1]
else:
raise OperatorParamError("invalid params for ResizeImage: "
"both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
percent = float(self.resize_short) / min(img_w, img_h)
w = int(round(img_w * percent))
h = int(round(img_h * percent))
else:
w = self.w
h = self.h
return self._resize_func(img, (w, h))
class CropImage(object):
""" crop image """
def __init__(self, size):
if type(size) is int:
self.size = (size, size)
else:
self.size = size # (h, w)
def __call__(self, img):
w, h = self.size
img_h, img_w = img.shape[:2]
if img_h < h or img_w < w:
raise Exception(
f"The crop size ({h}, {w}) of CropImage must not be greater than the image size ({img_h}, {img_w}). Please check the original image size and the size of ResizeImage if used."
)
w_start = (img_w - w) // 2
h_start = (img_h - h) // 2
w_end = w_start + w
h_end = h_start + h
return img[h_start:h_end, w_start:w_end, :]
class RandCropImage(object):
""" random crop image """
def __init__(self,
size,
scale=None,
ratio=None,
interpolation=None,
backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
self.size = size
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
ratio = self.ratio
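# sample an aspect ratio and a target area; the bound keeps the crop inside the image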
aspect_ratio = math.sqrt(random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
img_h, img_w = img.shape[:2]
bound = min((float(img_w) / img_h) / (w**2),
(float(img_h) / img_w) / (h**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = img_w * img_h * random.uniform(scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = random.randint(0, img_w - w)
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
return self._resize_func(img, size)
class RandFlipImage(object):
""" random flip image
flip_code:
1: Flipped Horizontally
0: Flipped Vertically
-1: Flipped Horizontally & Vertically
"""
def __init__(self, flip_code=1):
assert flip_code in [-1, 0, 1
], "flip_code should be a value in [-1, 0, 1]"
self.flip_code = flip_code
def __call__(self, img):
if random.randint(0, 1) == 1:
return cv2.flip(img, self.flip_code)
else:
return img
class AutoAugment(object):
def __init__(self):
# NOTE: ImageNetPolicy must be provided by an auto-augment policy module;
# it is not defined in this file, and this op is unused by infer.yaml.
self.policy = ImageNetPolicy()
def __call__(self, img):
from PIL import Image
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = self.policy(img)
img = np.asarray(img)
return img
class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""
def __init__(self,
scale=None,
mean=None,
std=None,
order='chw',
output_fp16=False,
channel_num=3):
if isinstance(scale, str):
scale = eval(scale)
assert channel_num in [
3, 4
], "channel number of input image should be set to 3 or 4."
self.channel_num = channel_num
self.output_dtype = 'float16' if output_fp16 else 'float32'
self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
self.order = order
mean = mean if mean is not None else [0.485, 0.456, 0.406]
std = std if std is not None else [0.229, 0.224, 0.225]
shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
self.mean = np.array(mean).reshape(shape).astype('float32')
self.std = np.array(std).reshape(shape).astype('float32')
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
assert isinstance(img,
np.ndarray), "invalid input 'img' in NormalizeImage"
img = (img.astype('float32') * self.scale - self.mean) / self.std
if self.channel_num == 4:
img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
pad_zeros = np.zeros(
(1, img_h, img_w)) if self.order == 'chw' else np.zeros(
(img_h, img_w, 1))
img = (np.concatenate(
(img, pad_zeros), axis=0)
if self.order == 'chw' else np.concatenate(
(img, pad_zeros), axis=2))
return img.astype(self.output_dtype)
class ToCHWImage(object):
""" convert hwc image to chw image
"""
def __init__(self):
pass
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
return img.transpose((2, 0, 1))
......@@ -7,6 +7,7 @@ import functools
from functools import partial
import numpy as np
import math
import paddle
import paddle.nn as nn
from paddle.io import Dataset, BatchSampler, DataLoader
......@@ -15,6 +16,7 @@ from paddleslim.auto_compression.config_helpers import load_config as load_slim_
from paddleslim.auto_compression import AutoCompression
from utility import add_arguments, print_arguments
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
......@@ -28,15 +30,14 @@ def argsparser():
type=str,
default='output',
help="directory to save compressed model.")
parser.add_argument(
'--total_images',
type=int,
default=1281167,
help="the number of total training images.")
return parser
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------')
# yapf: enable
def reader_wrapper(reader, input_name):
def gen():
......@@ -56,10 +57,13 @@ def eval_reader(data_dir, batch_size):
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
val_reader = eval_reader(data_dir, batch_size=global_config['batch_size'])
image = paddle.static.data(
name=global_config['input_name'], shape=[None, 3, 224, 224], dtype='float32')
name=global_config['input_name'],
shape=[None, 3, 224, 224],
dtype='float32')
label = paddle.static.data(name='label', shape=[None, 1], dtype='int64')
results = []
print('Evaluating... It will take a while. Please wait...')
for batch_id, data in enumerate(val_reader()):
# top1_acc, top5_acc
if len(test_feed_names) == 1:
......@@ -93,8 +97,6 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
fetch_list=test_fetch_list)
result = [np.mean(r) for r in result]
results.append(result)
if batch_id % 50 == 0:
print('Eval iter: ', batch_id)
result = np.mean(np.array(results), axis=0)
return result[0]
......@@ -104,6 +106,15 @@ def main():
all_config = load_slim_config(args.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
gpu_num = paddle.distributed.get_world_size()
if all_config['TrainConfig']['learning_rate'][
'type'] == 'CosineAnnealingDecay':
step = int(
math.ceil(
float(args.total_images) / (global_config['batch_size'] *
gpu_num)))
all_config['TrainConfig']['learning_rate']['T_max'] = step
print('total training steps:', step)
global data_dir
data_dir = global_config['data_dir']
......@@ -119,13 +130,15 @@ def main():
config=all_config,
train_dataloader=train_dataloader,
eval_callback=eval_function,
eval_dataloader=reader_wrapper(eval_reader(data_dir, global_config['batch_size']), global_config['input_name']))
eval_dataloader=reader_wrapper(
eval_reader(data_dir, global_config['batch_size']),
global_config['input_name']))
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
args = parser.parse_args()
print_arguments(args)
main()
# Single-GPU launch
export CUDA_VISIBLE_DEVICES=0
python run.py \
--model_dir='MobileNetV1_infer' \
--model_filename='inference.pdmodel' \
--params_filename='inference.pdiparams' \
--save_dir='./save_quant_mobilev1/' \
--batch_size=128 \
--config_path='./configs/mobilenetv1_qat_dis.yaml' \
--input_shape 3 224 224 \
--image_reader_type='paddle' \
--data_dir='ILSVRC2012'
# Multi-GPU launch
# python -m paddle.distributed.launch run.py \
# --model_dir='MobileNetV1_infer' \
# --model_filename='inference.pdmodel' \
# --params_filename='inference.pdiparams' \
# --save_dir='./save_quant_mobilev1/' \
# --batch_size=128 \
# --config_path='./configs/mobilenetv1_qat_dis.yaml' \
# --data_dir='ILSVRC2012'
python3.7 eval.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
# Multi-GPU launch
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
......@@ -19,7 +19,7 @@ from paddlenlp.data.sampler import SamplerHelper
from paddlenlp.metrics import Mcc, PearsonAndSpearman
from paddleslim.auto_compression.config_helpers import load_config
from paddleslim.auto_compression.compressor import AutoCompression
from utility import add_arguments, print_arguments
from utility import add_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
......@@ -239,7 +239,6 @@ def apply_decay_param_fun(name):
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
paddle.enable_static()
all_config = load_config(args.config_path)
......
......@@ -15,6 +15,7 @@
The model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, PaddleSlim's auto compression can be applied conveniently to inference models from other frameworks.
This example takes an NLP model from the [PyTorch](https://github.com/pytorch/pytorch) framework and shows how to auto-compress NLP models from other frameworks. It uses the open-source [huggingface](https://github.com/huggingface/transformers) transformers library to convert a PyTorch model into a Paddle model, and then applies ACT's auto compression. The compression strategies used here are pruning with distillation and offline quantization (```Post-training quantization```).
......@@ -87,6 +88,7 @@ pip install paddlenlp
#### 3.3 X2Paddle Model Conversion Workflow
**Method 1: PyTorch2Paddle converts the PyTorch dynamic graph model directly into a Paddle static graph model**
```shell
......@@ -126,6 +128,7 @@ wget https://paddle-slim-models.bj.bcebos.com/act/x2paddle_cola.tar
tar xf x2paddle_cola.tar
```
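A minimal sketch of Method 1 in Python; the checkpoint name, sequence length, and output directory below are assumptions for illustration, not the demo's exact conversion script:
```python
# Hypothetical PyTorch2Paddle conversion of a huggingface model.
import torch
from transformers import AutoModelForSequenceClassification
from x2paddle.convert import pytorch2paddle

torch_model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-cased')  # assumption: any sequence-classification checkpoint
torch_model.eval()
dummy_input = torch.zeros([1, 128], dtype=torch.long)  # assumed seq length
pytorch2paddle(torch_model,
               save_dir='./x2paddle_cola',
               jit_type='trace',
               input_examples=[dummy_input])
```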
**Method 2: Onnx2Paddle saves the PyTorch dynamic graph model in ONNX format and then converts it into a Paddle static graph model**
......
......@@ -45,14 +45,6 @@ def argsparser():
help="directory to save compressed model.")
return parser
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------')
METRIC_CLASSES = {
"cola": Mcc,
"sst-2": Accuracy,
......@@ -320,5 +312,4 @@ if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
args = parser.parse_args()
print_arguments(args)
main()
......@@ -18,6 +18,19 @@ from paddleslim.auto_compression.strategy_config import *
__all__ = ['save_config', 'load_config']
def print_arguments(args, level=0):
if level == 0:
print('----------- Running Arguments -----------')
for arg, value in sorted(args.items()):
if isinstance(value, dict):
print('\t' * level, '%s:' % arg)
print_arguments(value, level + 1)
else:
print('\t' * level, '%s: %s' % (arg, value))
if level == 0:
print('------------------------------------------')
def load_config(config):
"""Load configurations from yaml file into dict.
Fields validation is skipped for loading some custom information.
......@@ -35,6 +48,7 @@ def load_config(config):
config), f"{config} not found or it is not a file."
with open(config) as f:
cfg = yaml.load(f, Loader=yaml.FullLoader)
print_arguments(cfg)
return cfg
......