optimize act readme (#1265)

* optimize readme * update * add eval.yaml

a256a9b6 · ceci3 · GitHub · a2e9809f
8 changed files
--- a/README.md
+++ b/README.md
@@ -280,7 +280,7 @@ python setup.py install
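 A: After quantization the saved parameters fall within the int8 range, but their type is still float. Paddle's default training forward kernels have no INT8 implementation; only Paddle Inference with TensorRT supports accelerated quantized inference. So that the quantized model can be loaded by Paddle's training forward pass to verify accuracy, the weights are saved as Float32 by default, and the model size does not change.
-#### 2. Installation fails on macOS + Python3.9 with "command 'swig' failed"
+#### 2. Installation fails on macOS + Python3.9 or on Windows with "command 'swig' failed"
 A: See https://github.com/PaddlePaddle/PaddleSlim/issues/1258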

--- a/example/auto_compression/README.md
+++ b/example/auto_compression/README.md
@@ -35,7 +35,7 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileN
 tar -xf MobileNetV1_infer.tar
 # Download the small ImageNet dataset
 wget https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz
-tar xf ILSVRC2012_data_demo.tar.gz
+tar -xf ILSVRC2012_data_demo.tar.gz
 ```
 - 2. Run
@@ -77,17 +77,66 @@ ac = AutoCompression(
    model_dir="./MobileNetV1_infer",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
-    save_dir="output",
+    save_dir="MobileNetV1_quant",
-    config={'Quantization': {}, "HyperParameterOptimization": {'max_quant_count': 5}},
+    config={'Quantization': {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
    train_dataloader=train_loader,
-    eval_dataloader=train_loader)  # eval_function to verify accuracy
+    eval_dataloader=train_loader)
 ac.compress()
 ```
+- 3. Test accuracy
+Test the accuracy of the model before compression:
+```shell
+CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py
+### Eval Top1: 0.7171724759615384
+```
+Test the accuracy of the quantized model:
+```shell
+CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py --model_dir='MobileNetV1_quant'
+### Eval Top1: 0.7166466346153846
+```
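+The quantized model shows almost no accuracy loss relative to the original model. Because the quantization parameters are chosen by hyperparameter search, the accuracy of the quantized model varies slightly from run to run.
+- 4. Inference speed test
+Measuring the speed of the quantized model depends on support from the inference library, so make sure the installed PaddlePaddle is built with TensorRT. The examples and results below were obtained on a Tesla V100 with CUDA 10.2 and python3.7.
+Use the following command to check the local CUDA version, then download the paddlepaddle package matching your CUDA and Python versions from the [download page](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).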
+```shell
+cat /usr/local/cuda/version.txt ### CUDA Version 10.2.89
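+### 10.2.89 is the CUDA version number; use it to choose the TensorRT-enabled PaddlePaddle package to install.
+```
+Install the downloaded whl package:
+```
+### The package downloaded here with wget is the PaddlePaddle build for python3.7 and cuda10.2. If your environment differs from the example environment, download the package that matches your own machine; otherwise the example code will fail.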
+wget https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
+pip install paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl --force-reinstall
+```
+Test the speed of the FP32 model
+```
+python ./image_classification/infer.py
+### using tensorrt FP32	batch size: 1 time(ms): 0.6140608787536621
+```
+Test the speed of the FP16 model
+```
+python ./image_classification/infer.py --use_fp16=True
+### using tensorrt FP16	batch size: 1 time(ms): 0.5795984268188477
+```
+Test the speed of the INT8 model
+```
+python ./image_classification/infer.py --model_dir=./MobileNetV1_quant/ --use_int8=True
+### using tensorrt INT8 batch size: 1 time(ms): 0.5213963985443115
+```
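 **Notes:**
- The dataset passed to the DataLoader is the one used by the model being compressed; the DataLoader inherits from `paddle.io.DataLoader`.
+- The dataset passed to the DataLoader is the one used by the model being compressed; the DataLoader inherits from `paddle.io.DataLoader`. You can use the DataLoader from the model suite directly, or define your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader).
- If you do not need to validate model accuracy during auto compression, `eval_callback` may be left without a function; the program then selects the best model automatically based on the loss.
+- Compression algorithms defined in the auto-compression Config (quantization, distillation, pruning, and so on) are merged and executed together; available strategies include quantization + distillation, pruning + distillation, and others. The configuration chosen in this example is post-training (offline) quantization with hyperparameter search.
- Compression algorithms defined in the auto-compression Config (quantization, distillation, pruning, and so on) are merged and executed together; available strategies include quantization + distillation, pruning + distillation, and others.
 - If the parameters of the model to be compressed are stored in separate files, first use the [convert.py](./convert.py) script to save them into a single binary file.
 ## Application examples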
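
Reviewer note: the accuracy and latency figures added to this README can be turned into a relative comparison. The sketch below only copies the numbers printed in the hunk above; the helper name `speedup` is illustrative and not part of the repository.

```python
# Figures copied from the README additions in this diff.
top1_fp32 = 0.7171724759615384  # Eval Top1 before compression
top1_int8 = 0.7166466346153846  # Eval Top1 after quantization

latency_ms = {
    "FP32": 0.6140608787536621,
    "FP16": 0.5795984268188477,
    "INT8": 0.5213963985443115,
}


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline latency to candidate latency (>1 means faster)."""
    return baseline_ms / candidate_ms


# Accuracy drop expressed in percentage points.
drop_points = (top1_fp32 - top1_int8) * 100
print(f"Top1 drop: {drop_points:.3f} percentage points")

for name in ("FP16", "INT8"):
    ratio = speedup(latency_ms["FP32"], latency_ms[name])
    print(f"{name} speedup vs FP32: {ratio:.2f}x")
```

On these single-run numbers the drop is about 0.05 percentage points and the speedups are roughly 1.06x (FP16) and 1.18x (INT8). The README itself says results fluctuate between runs, so treat the ratios as indicative only.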

--- a/example/auto_compression/image_classification/README.md
+++ b/example/auto_compression/image_classification/README.md
@@ -122,9 +122,9 @@ python infer.py --config_path="configs/infer.yaml"
 ```
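 The configuration file ```configs/infer.yaml``` has the following fields for configuring prediction parameters:
- ```inference_model_dir```: directory containing the inference model files; it must contain both a .pdmodel and a .pdiparams file
+- ```model_dir```: directory containing the inference model files; it must contain both a .pdmodel and a .pdiparams file
- ```model_filename```: name of the model file in the inference_model_dir folder
+- ```model_filename```: name of the model file in the model_dir folder
- ```params_filename```: name of the parameters file in the inference_model_dir folder
+- ```params_filename```: name of the parameters file in the model_dir folder
 - ```batch_size```: batch size used for prediction
 - ```image_size```: size of the input image
 - ```use_tensorrt```: whether to use the TensorRT prediction engine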

--- a/example/auto_compression/image_classification/configs/eval.yaml
+++ b/example/auto_compression/image_classification/configs/eval.yaml
+model_dir: './MobileNetV1_infer'
+model_filename: 'inference.pdmodel'
+params_filename: "inference.pdiparams"
+batch_size: 128
+data_dir: './ILSVRC2012_data_demo/ILSVRC2012/'
+img_size: 224
+resize_size: 256
--- a/example/auto_compression/image_classification/configs/infer.yaml
+++ b/example/auto_compression/image_classification/configs/infer.yaml
-inference_model_dir: "./MobileNetV1_infer"
+model_dir: "./MobileNetV1_infer"
 model_filename: "inference.pdmodel"
 params_filename: "inference.pdiparams"
 batch_size: 1
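
Reviewer note: renaming `inference_model_dir` to `model_dir` means older config files will no longer be read by `infer.py`. If backward compatibility matters, a small lookup like the one below could accept both keys. The helper `resolve_model_dir` is hypothetical and not part of this commit.

```python
def resolve_model_dir(config):
    """Return the model directory from a config dict.

    Prefers the new `model_dir` key and falls back to the legacy
    `inference_model_dir` key. Hypothetical helper, not in the commit.
    """
    for key in ("model_dir", "inference_model_dir"):
        if key in config:
            return config[key]
    raise KeyError("config needs 'model_dir' (or legacy 'inference_model_dir')")


# Both the old and the new key layout resolve to a directory.
old_style = {"inference_model_dir": "./MobileNetV1_infer"}
new_style = {"model_dir": "./MobileNetV1_quant"}
print(resolve_model_dir(old_style))
print(resolve_model_dir(new_style))
```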

--- a/example/auto_compression/image_classification/eval.py
+++ b/example/auto_compression/image_classification/eval.py
@@ -31,9 +31,13 @@ def argsparser():
    parser.add_argument(
        '--config_path',
        type=str,
-        default=None,
+        default='./image_classification/configs/eval.yaml',
-        help="path of compression strategy config.",
+        help="path of compression strategy config.")
-        required=True)
+    parser.add_argument(
+        '--model_dir',
+        type=str,
+        default='./MobileNetV1_infer',
+        help='model directory')
    return parser
@@ -92,19 +96,20 @@ def eval():
    return result[0]
-def main():
+def main(args):
    global global_config
-    all_config = load_slim_config(args.config_path)
+    global_config = load_slim_config(args.config_path)
-    assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
-    global_config = all_config["Global"]
    global data_dir
    data_dir = global_config['data_dir']
+    if args.model_dir != global_config['model_dir']:
+        global_config['model_dir'] = args.model_dir
    global img_size, resize_size
-    img_size = global_config['img_size'] if 'img_size' in global_config else 224
+    img_size = int(global_config[
-    resize_size = global_config[
+        'img_size']) if 'img_size' in global_config else 224
-        'resize_size'] if 'resize_size' in global_config else 256
+    resize_size = int(global_config[
+        'resize_size']) if 'resize_size' in global_config else 256
    result = eval()
    print('Eval Top1:', result)
@@ -114,4 +119,4 @@ if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    args = parser.parse_args()
-    main()
+    main(args)
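
Reviewer note: in this hunk `--model_dir` has a non-`None` default, and the `if args.model_dir != global_config['model_dir']` check followed by an assignment always leaves the command-line value in the config, whether or not the flag was given. A `model_dir` set in `eval.yaml` is therefore replaced by the flag's default. One possible alternative, sketched below under the assumption that the YAML value should win when the flag is absent, uses `default=None`. The function names `build_parser` and `merge_config` are illustrative only.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from "flag given".
    parser.add_argument("--model_dir", type=str, default=None)
    return parser


def merge_config(file_config, args):
    """Overlay explicit command-line values on top of the file config."""
    config = dict(file_config)
    if args.model_dir is not None:
        config["model_dir"] = args.model_dir
    # Cast sizes to int with fallbacks, mirroring the int(...) casts in the diff.
    config["img_size"] = int(config.get("img_size", 224))
    config["resize_size"] = int(config.get("resize_size", 256))
    return config


file_config = {"model_dir": "./MobileNetV1_infer", "img_size": "224"}
no_flag = merge_config(file_config, build_parser().parse_args([]))
with_flag = merge_config(
    file_config, build_parser().parse_args(["--model_dir", "MobileNetV1_quant"]))
print(no_flag["model_dir"], with_flag["model_dir"])
```

Whether the file or the flag default should take precedence is a design choice; the sketch only shows how to keep the two cases distinguishable.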
--- a/example/auto_compression/image_classification/infer.py
+++ b/example/auto_compression/image_classification/infer.py
@@ -30,8 +30,17 @@ def argsparser():
    parser.add_argument(
        '--config_path',
        type=str,
-        default='configs/infer.yaml',
+        default='./image_classification/configs/infer.yaml',
        help='config file path')
+    parser.add_argument(
+        '--model_dir',
+        type=str,
+        default='./MobileNetV1_infer',
+        help='model directory')
+    parser.add_argument(
+        '--use_fp16', type=bool, default=False, help='Whether to use fp16')
+    parser.add_argument(
+        '--use_int8', type=bool, default=False, help='Whether to use int8')
    return parser
@@ -53,7 +62,7 @@ class Predictor(object):
            output_names[0])
    def create_paddle_predictor(self):
-        inference_model_dir = self.config['inference_model_dir']
+        inference_model_dir = self.config['model_dir']
        model_file = os.path.join(inference_model_dir,
                                  self.config['model_filename'])
        params_file = os.path.join(inference_model_dir,
@@ -110,6 +119,7 @@ class Predictor(object):
            time.sleep(0.01)  # sleep for T4 GPU
        fp_message = "FP16" if config['use_fp16'] else "FP32"
+        fp_message = "INT8" if config['use_int8'] else fp_message
        trt_msg = "using tensorrt" if config[
            'use_tensorrt'] else "not using tensorrt"
        print("{0}\t{1}\tbatch size: {2}\ttime(ms): {3}".format(
@@ -121,5 +131,11 @@ if __name__ == "__main__":
    parser = argsparser()
    args = parser.parse_args()
    config = load_config(args.config_path)
+    if args.model_dir != config['model_dir']:
+        config['model_dir'] = args.model_dir
+    if args.use_fp16 != config['use_fp16']:
+        config['use_fp16'] = args.use_fp16
+    if args.use_int8 != config['use_int8']:
+        config['use_int8'] = args.use_int8
    predictor = Predictor(config)
    predictor.predict()
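
Reviewer note: `--use_fp16` and `--use_int8` are declared with `type=bool`. With `argparse`, that calls `bool()` on the raw string, so any non-empty value, including `False`, parses as `True`. The README commands only pass `=True`, so they are unaffected, but `--use_fp16=False` would enable FP16. A minimal sketch of one common workaround follows; `str2bool` is a hypothetical helper, not part of the commit.

```python
import argparse


def str2bool(value):
    """Parse common textual booleans; hypothetical helper, not in the commit."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes", "y"):
        return True
    if text in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Parser as declared in the diff: bool("False") is True.
naive = argparse.ArgumentParser()
naive.add_argument("--use_fp16", type=bool, default=False)

# Parser using the explicit converter instead.
strict = argparse.ArgumentParser()
strict.add_argument("--use_fp16", type=str2bool, default=False)

naive_args = naive.parse_args(["--use_fp16=False"])
strict_args = strict.parse_args(["--use_fp16=False"])
print(naive_args.use_fp16, strict_args.use_fp16)
```

Declaring the flags with `action="store_true"` would be another option, at the cost of changing the command-line syntax shown in the README.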
--- a/paddleslim/auto_compression/compressor.py
+++ b/paddleslim/auto_compression/compressor.py
@@ -151,8 +151,9 @@ class AutoCompression:
        self.train_dataloader = wrap_dataloader(train_dataloader,
                                                self.feed_vars)
        self.eval_dataloader = wrap_dataloader(eval_dataloader, self.feed_vars)
-        if eval_dataloader is None:
+        if self.eval_dataloader is None:
-            eval_dataloader = self._get_eval_dataloader(self.train_dataloader)
+            self.eval_dataloader = self._get_eval_dataloader(
+                self.train_dataloader)
        self.target_speedup = target_speedup
        self.eval_function = eval_callback
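
Reviewer note: the fix in this hunk matters because the old code rebound only the local name `eval_dataloader`, leaving `self.eval_dataloader` as `None`. The stand-in classes below are hypothetical and reduce the pattern to its core. For brevity the fallback reuses the train loader directly, whereas the real code calls `self._get_eval_dataloader`.

```python
class Before:
    """Mimics the old logic: the fallback is assigned to a local name."""

    def __init__(self, train_loader, eval_loader=None):
        self.train_loader = train_loader
        self.eval_loader = eval_loader
        if eval_loader is None:
            # Rebinds the local variable only; the attribute stays None.
            eval_loader = self.train_loader


class After:
    """Mimics the fixed logic: the fallback is stored on the instance."""

    def __init__(self, train_loader, eval_loader=None):
        self.train_loader = train_loader
        self.eval_loader = eval_loader
        if self.eval_loader is None:
            self.eval_loader = self.train_loader


print(Before("train").eval_loader)
print(After("train").eval_loader)
```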