Unverified  Commit 3b2ed2cf  authored by Chang Xu, committed by GitHub

Merge AnalysisPTQ & AnalysisQAT to Analysis (#1692)

Parent 2bb09da6
# Quantization Analysis Tool Tutorial

## 1. Features of the quantization analysis tool

1. Statistical analysis (statistical_analyse):
@@ -13,17 +13,18 @@
- Given an expected accuracy, directly produce a quantized model that meets it.

## 2. Parameters accepted by paddleslim.quant.Analysis

| **Parameter** | **Description** |
|-----------------------------|-----------------------------------------|
| float_model_dir | Required. Path of the model to be analyzed; may be a directory. For an ONNX model, pass the '.onnx' file name directly |
| quant_model_dir | Defaults to None. Path of the quantized model; may be a directory. For an ONNX model, pass the '.onnx' file name directly. If omitted, the tool quantizes the float model with PTQ and then analyzes it |
| model_filename | Defaults to None. If float_model_dir is a directory, the model file name ending in '.pdmodel' must be provided; not needed when float_model_dir is an '.onnx' file |
| params_filename | Defaults to None. If float_model_dir is a directory, the parameter file name ending in '.pdiparams' must be provided; not needed when float_model_dir is an '.onnx' file |
| eval_function | Custom evaluation function, required if you want accuracy verification (see the sketch below this table); if omitted, the accuracy-error analysis is computed from cosine similarity instead |
| data_loader | Data used for model calibration. The DataLoader inherits from `paddle.io.DataLoader`; you can reuse a DataLoader from a model suite directly, or build your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader) |
| save_dir | Directory where the analysis results (accuracy files, PDFs, etc.) are saved; defaults to `analysis_results` |
| resume | Whether to load intermediate analysis files; defaults to False |
| quant_config | Parameters forwarded to post-training quantization; see the [post-training quantization docs](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post) for details |
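For reference, here is a minimal sketch of what an `eval_function` can look like. It follows the signature used by the demo scripts in this commit (executor, compiled test program, feed names, fetch list); the top-1 computation and the `val_loader` it iterates over are illustrative assumptions, not part of the tutorial:

```python
import numpy as np

def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
    # Average top-1 accuracy over a prepared validation loader
    # (`val_loader` yielding (image, label) pairs is an assumption).
    correct, total = 0, 0
    for image, label in val_loader:
        outs = exe.run(compiled_test_program,
                       feed={test_feed_names[0]: np.array(image)},
                       fetch_list=test_fetch_list,
                       return_numpy=True)
        pred = np.argmax(outs[0], axis=-1)
        correct += int((pred == np.array(label).reshape(pred.shape)).sum())
        total += pred.size
    return correct / total
```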
@@ -45,7 +46,7 @@ import paddle
from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
from paddleslim.quant.analysis import Analysis

paddle.enable_static()

class ImageNetDataset(DatasetFolder):
@@ -72,12 +73,12 @@ image = paddle.static.data(
train_loader = paddle.io.DataLoader(
    train_dataset, feed_list=[image], batch_size=8, return_list=False)

analyzer = Analysis(
    float_model_dir="./MobileNetV1_infer",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    save_dir="MobileNetV1_analysis",
    quant_config={
        'quantizable_op_type': ["conv2d", "depthwise_conv2d"],
        'weight_quantize_type': 'abs_max',
        'activation_quantize_type': 'moving_average_abs_max',
@@ -124,22 +125,17 @@ analyzer.statistical_analyse()
```shell
analyzer.metric_error_analyse()
```

If quant_model_dir is not provided, calling this interface quantizes the model one layer at a time and measures the resulting loss of each single-layer-quantized model. An eval function must be supplied. It produces a ranking of the model accuracies with only one layer quantized, saved by default to `./analysis_results/analysis.txt`.

If quant_model_dir is provided, calling this interface instead iterates over every quantized layer, removes that layer's quantization nodes, and measures the model accuracy with the layer left unquantized. An eval function must be supplied. It produces a ranking of the model accuracies with one quantized layer removed, saved by default to `./analysis_results/analysis.txt`. See the [GPT quantization-aware training sensitivity analysis demo](../../../../example/quantization_analysis/GPT/README.md) for a complete example, and the sketch below.
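For instance, to run this second mode on an already-quantized model, a call might look like the following sketch; the model paths are placeholders, and `eval_function`/`train_loader` are assumed to be defined as in the examples in this document:

```python
# Placeholder paths; with quant_model_dir given, metric_error_analyse()
# removes quantization nodes layer by layer instead of quantizing layer by layer.
analyzer = Analysis(
    float_model_dir="./model_fp32",
    quant_model_dir="./model_quant",
    model_filename="model.pdmodel",
    params_filename="model.pdiparams",
    eval_function=eval_function,
    data_loader=train_loader,
    save_dir="analysis_results",
    quant_config={'quantizable_op_type': ["conv2d", "depthwise_conv2d"]})
analyzer.metric_error_analyse()
```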
**Directly produce a target quantized model that meets the expected accuracy**

```shell
analyzer.get_target_quant_model(target_metric=0.70)
```

## 4. Run post-training quantization based on the analysis results

After running the analysis tool, you can use the accuracy ranking in `analysis.txt` to drop the layers that quantize poorly: when calling `paddleslim.quant.quant_post_static`, pass the layers to skip through the `skip_tensor_list` argument, as sketched below.
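A minimal sketch of that step, assuming the `train_loader` from the example above and placeholder layer names taken from an `analysis.txt` ranking:

```python
import paddle
from paddleslim.quant import quant_post_static

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
quant_post_static(
    executor=exe,
    model_dir="./MobileNetV1_infer",
    quantize_model_path="./MobileNetV1_quant",
    data_loader=train_loader,
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    # Skip the most sensitive layers reported by the analysis (placeholder names).
    skip_tensor_list=["conv2d_1.w_0", "conv2d_5.w_0"])
```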
## FAQ:

- Difference from the QAT (Quantization-Aware Training) analysis tool: the PTQ analysis tool loads the original model to be quantized and quantizes its layers one at a time, evaluating after each step to obtain the accuracy-error analysis. The QAT analysis tool instead loads a model produced by quantization-aware training, iterates over all quantized layers, removes each quantization layer in turn, loads the float model's parameters, and evaluates to obtain the accuracy-error analysis.
- Why the PTQ analysis tool is designed this way: it quantizes one layer at a time, rather than removing quantization layer by layer, because PTQ itself is fast; quantizing a single layer and validating gives a very direct view of that layer's contribution to the accuracy loss.
- Why PTQ and QAT have separate analysis tools: experiments show that the sensitive layers of PTQ- and QAT-quantized models are not identical, so keeping the two algorithms separate makes the sensitivity analysis more accurate.
# QAT (Quantization-Aware Training) Quantization Analysis Tool Tutorial

## 1. Features of the quantization analysis tool

Accuracy-error analysis (metric_error_analyse):
- Iterate over every layer of the quantization-aware-trained model, remove its quantization nodes, and compute the model accuracy with that layer left unquantized. This pinpoints the quantization loss caused by a specific layer.

## 2. Parameters accepted by paddleslim.quant.AnalysisQAT

| **Parameter** | **Description** |
|-----------------------------|-----------------------------------------|
| quant_model_dir | Required. Path of the quantized model |
| float_model_dir | Required. Path of the model before quantization |
| model_filename | Defaults to None. If the model path is a directory, the model file name ending in '.pdmodel' must be provided |
| params_filename | Defaults to None. If the model path is a directory, the parameter file name ending in '.pdiparams' must be provided |
| quantizable_op_type | Quantized op types to analyze; defaults to `conv2d`, `depthwise_conv2d`, `mul` |
| qat_metric | Accuracy of the quantized model; optional, defaults to None, in which case it is computed automatically |
| eval_function | Custom evaluation function, required if you want accuracy verification; if omitted, the accuracy-error analysis is computed from cosine similarity instead |
| data_loader | Data used for model calibration. The DataLoader inherits from `paddle.io.DataLoader`; you can reuse a DataLoader from a model suite directly, or build your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader) |
| save_dir | Directory where the analysis results (accuracy files, PDFs, etc.) are saved; defaults to `analysis_results` |
| resume | Whether to load intermediate analysis files; defaults to False |
## 3. Using the quantization analysis tool

**Create the analysis tool**

```shell
# Download the inference model
wget -q https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar
tar -xf MobileNetV1_infer.tar
wget -q https://paddle-slim-models.bj.bcebos.com/act/MobileNetV1_QAT.tar
tar -xf MobileNetV1_QAT.tar

# Download the demo dataset
wget -q https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz
tar -xf ILSVRC2012_data_demo.tar.gz
```
```shell
import paddle
from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
from paddle.quantization import PostTrainingQuantization
from paddleslim.quant.analysis_qat import AnalysisQAT

paddle.enable_static()

class ImageNetDataset(DatasetFolder):
    def __init__(self, path, image_size=224):
        super(ImageNetDataset, self).__init__(path)
        normalize = transforms.Normalize(
            mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
        self.transform = transforms.Compose([
            transforms.Resize(256), transforms.CenterCrop(image_size),
            transforms.Transpose(), normalize
        ])

    def __getitem__(self, idx):
        img_path, _ = self.samples[idx]
        return self.transform(Image.open(img_path).convert('RGB'))

    def __len__(self):
        return len(self.samples)

train_dataset = ImageNetDataset("./ILSVRC2012_data_demo/ILSVRC2012/train/")
image = paddle.static.data(
    name='inputs', shape=[None] + [3, 224, 224], dtype='float32')
train_loader = paddle.io.DataLoader(
    train_dataset, feed_list=[image], batch_size=8, return_list=False)

analyzer = AnalysisQAT(
    float_model_dir="./MobileNetV1_infer",
    quant_model_dir="./MobileNetV1_QAT",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    save_dir="MobileNetV1_analysis",
    data_loader=train_loader)
```
**Accuracy-error analysis**

```shell
analyzer.metric_error_analyse()
```

Calling this interface iterates over every quantized layer in the model, removes its quantization nodes, and computes the model accuracy with that layer left unquantized. An eval function must be supplied. It produces a ranking of the model accuracies with one quantized layer removed, saved by default to `./analysis_results/analysis.txt`. See the [GPT quantization-aware training sensitivity analysis demo](../../../../example/quantization_analysis/GPT/README.md) for a complete example.

## FAQ:

- Difference from the PTQ (Post Training Quantization) analysis tool: the QAT analysis tool loads a model produced by quantization-aware training, iterates over all quantized layers, removes each quantization layer in turn, loads the float model's parameters, and evaluates to obtain the accuracy-error analysis. The PTQ analysis tool instead loads the original model to be quantized and quantizes its layers one at a time, evaluating after each step.
- Why the QAT analysis tool is designed this way: it removes quantization layers one at a time, rather than quantizing layer by layer, because QAT requires training; quantization-training each layer separately and then validating would be slow, while loading the trained quantized model and removing quantization layers one by one is far more efficient.
- Why PTQ and QAT have separate analysis tools: experiments show that the sensitive layers of PTQ- and QAT-quantized models are not identical, so keeping the two algorithms separate makes the sensitivity analysis more accurate.
@@ -23,7 +23,7 @@ from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from keypoint_utils import keypoint_post_process
from post_process import PPYOLOEPostProcess
from paddleslim.quant.analysis import Analysis

def argsparser():
@@ -87,7 +87,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
            elif isinstance(config['input_list'], dict):
                if k in config['input_list'].keys():
                    data_input[config['input_list'][k]] = np.array(v)
        outs = exe.run(
            compiled_test_program,
            feed=data_input,
            fetch_list=test_fetch_list,
            return_numpy=False)
@@ -115,8 +116,7 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
    metric.log()
    map_res = metric.get_results()
    metric.reset()
    map_key = 'keypoint' if 'arch' in config and config['arch'] == 'keypoint' else 'bbox'
    return map_res[map_key][0]
@@ -127,9 +127,8 @@ def main():
    ptq_config = config['PTQ']

    # val dataset is sufficient for PTQ
    data_loader = create('EvalReader')(
        config['EvalDataset'], config['worker_num'], return_list=True)
    ptq_data_loader = reader_wrapper(data_loader, config['input_list'])

    # fast_val_anno_path, e.g. an annotation file covering only a few images, can speed up the analysis
@@ -139,7 +138,8 @@ def main():
        global val_loader
        _eval_batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=config['EvalReader']['batch_size'])
        val_loader = create('EvalReader')(
            dataset,
            config['worker_num'],
            batch_sampler=_eval_batch_sampler,
            return_list=True)
@@ -161,14 +161,14 @@ def main():
    else:
        raise ValueError("metric currently only supports COCO and VOC.")

    analyzer = Analysis(
        float_model_dir=config["model_dir"],
        model_filename=config["model_filename"],
        params_filename=config["params_filename"],
        eval_function=eval_function,
        data_loader=ptq_data_loader,
        save_dir=config['save_dir'],
        quant_config=ptq_config,
        resume=True, )
    analyzer.statistical_analyse()
...
@@ -21,7 +21,7 @@ from tqdm import tqdm
from post_process import YOLOPostProcess, coco_metric
from dataset import COCOValDataset, COCOTrainDataset
from paddleslim.common import load_config, load_onnx_model
from paddleslim.quant.analysis import Analysis

def argsparser():
@@ -41,7 +41,8 @@ def argsparser():
        '--resume',
        type=bool,
        default=False,
        help=
        "If the analysis is interrupted, resume it and load the information that has already been analyzed."
    )
    return parser
@@ -54,7 +55,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
            ncols=80) as t:
        for data in val_loader:
            data_all = {k: np.array(v) for k, v in data.items()}
            outs = exe.run(
                compiled_test_program,
                feed={test_feed_names[0]: data_all['image']},
                fetch_list=test_fetch_list,
                return_numpy=False)
@@ -103,15 +105,15 @@ def main():
    load_onnx_model(config["model_dir"])
    inference_model_path = config["model_dir"].rstrip().rstrip(
        '.onnx') + '_infer'

    analyzer = Analysis(
        float_model_dir=inference_model_path,
        model_filename='model.pdmodel',
        params_filename='model.pdiparams',
        eval_function=eval_function,
        data_loader=data_loader,
        save_dir=config['save_dir'],
        resume=FLAGS.resume,
        quant_config=ptq_config)
    analyzer.statistical_analyse()
    analyzer.metric_error_analyse()
...
@@ -21,7 +21,7 @@ import time
import paddle
from paddleslim.common import load_config as load_slim_config
from paddleslim.quant.analysis import Analysis
from ppfleetx.data import build_dataloader
from ppfleetx.distributed.apis import env
from utils import parse_config
@@ -164,17 +164,15 @@ def main():
    global eval_loader
    eval_loader = eval_reader_wrapper(valid_data_loader)

    analyzer = Analysis(
        quant_model_dir=global_config["quant_model_dir"],
        float_model_dir=global_config["float_model_dir"],
        model_filename=global_config["model_filename"],
        params_filename=global_config["params_filename"],
        eval_function=eval_function,
        data_loader=eval_loader,
        save_dir=FLAGS.save_dir,
        quant_config=all_config['quant_config'],
        resume=global_config['resume'], )
    analyzer.metric_error_analyse()
...
@@ -5,11 +5,16 @@ Global:
  float_model_dir: ./GPT_345M_Baseline
  model_filename: model.pdmodel
  params_filename: model.pdiparams
  resume: False
  reader_config: ./configs/gpt_reader.yaml
  cloze_eval: True # True for LAMBADA Dataset; False for WikiText

quant_config:
  quantizable_op_type: ["mul", "matmul", "matmul_v2"]
  weight_quantize_type: 'abs_max'
  activation_quantize_type: 'moving_average_abs_max'
  is_full_quantize: False
  batch_size: 8
  batch_nums: 10
\ No newline at end of file
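For orientation, a script consuming this config might look like the following sketch; `load_config` comes from `paddleslim.common`, as in the GPT analysis script above, and the config file name here is hypothetical:

```python
# Hypothetical usage of the config above (file name assumed).
from paddleslim.common import load_config

all_config = load_config("./configs/gpt_analysis.yaml")
global_config = all_config["Global"]       # model paths, resume flag, etc.
quant_config = all_config["quant_config"]  # forwarded to Analysis(quant_config=...)
```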
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -17,132 +17,111 @@ import sys
import pickle
import copy
import logging
import csv
import numpy as np
import random
import tempfile
import paddle
from ..common import get_logger, load_inference_model
from paddle.fluid.framework import IrGraph
from paddle.framework import core
from paddle.static.quantization import PostTrainingQuantization
from .analysis_utils import *

_logger = get_logger(__name__, level=logging.INFO)
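# For each supported weight-bearing op type:
# [[activation input slot, weight input slot], [output slot]].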
SUPPORT_WEIGHT_OP_DICT = {
    "conv2d": [["Input", "Filter"], ["Output"]],
    "depthwise_conv2d": [["Input", "Filter"], ["Output"]],
    "conv2d_transpose": [["Input", "Filter"], ["Output"]],
    "mul": [["X", "Y"], ["Out"]],
    "matmul": [["X", "Y"], ["Out"]],
    "matmul_v2": [["X", "Y"], ["Out"]]
}
class Analysis(object):
    def __init__(self,
                 float_model_dir,
                 quant_model_dir=None,
                 model_filename=None,
                 params_filename=None,
                 data_loader=None,
                 eval_function=None,
                 resume=False,
                 save_dir='analysis_results',
                 quant_config=None):
        '''
        Analysis analyzes the sensitivity of each op in the model.
        Args:
            float_model_dir(str, required): the path of the fp32 model; it can also be an '.onnx' file
            quant_model_dir(str, optional): the path of the quantized model; if None, the float model will be quantized by PTQ
            model_filename(str, optional): the model file name of the fp32 and quantized models
            params_filename(str, optional): the parameter file name of the fp32 and quantized models
            eval_function(function): an eval function, defined by yourself, that returns the metric of the inference program; it can be used to judge the metric of the quantized model. (TODO: optional)
            data_loader(Python Generator, Paddle.io.DataLoader, optional): the Generator or DataLoader that provides calibration data; it should return a batch every time it is called
            save_dir(str, optional): the output dir that stores the analyzed information
            resume(bool, optional): if the analysis is interrupted, resume it and load the information that has already been analyzed
            quant_config(dict, optional): the args that initialize PostTrainingQuantization
        Examples:
            .. code-block:: python
                from paddleslim.quant.analysis import Analysis
                analyzer = Analysis(quant_model_dir=quant_model_dir)
                analyzer.metric_error_analyse()
        '''
        if model_filename is None:
            model_filename = 'model.pdmodel'
        if params_filename is None:
            params_filename = 'model.pdiparams'
        self.float_model_dir = float_model_dir
        self.quant_model_dir = quant_model_dir
        self.model_filename = model_filename
        self.params_filename = params_filename
        self.histogram_bins = 1000
        self.save_dir = save_dir
        self.checkpoint_name = os.path.join(save_dir, 'analysis_checkpoint.pkl')
        self.data_loader = data_loader
        self.eval_function = eval_function
        self.quant_config = quant_config
        self.batch_nums = quant_config.get("batch_nums", 10)
        self.is_full_quantize = quant_config.get("is_full_quantize", False)
        self.onnx_format = quant_config.get("onnx_format", False)

        self.quantizable_op_type = quant_config.get(
            "quantizable_op_type", list(SUPPORT_WEIGHT_OP_DICT.keys()))
        self.skip_tensor_list = quant_config.get("skip_tensor_list", [])
        if self.skip_tensor_list:
            del self.quant_config['skip_tensor_list']
        quant_config['onnx_format'] = self.onnx_format
        quant_config['algo'] = quant_config.get("algo", 'avg')

        if not os.path.exists(self.save_dir):
            os.mkdir(self.save_dir)
        if self.onnx_format:
            self.temp_root_path = tempfile.TemporaryDirectory(dir=self.save_dir)
            self.temp_save_path = os.path.join(self.temp_root_path.name, "ptq")
            if not os.path.exists(self.temp_save_path):
                os.makedirs(self.temp_save_path)

        devices = paddle.device.get_device().split(':')[0]
        self.places = paddle.device._convert_to_place(devices)

        self.layer_metrics = {}
        if resume:
            self.load_checkpoint()
    def save_checkpoint(self):
        if not os.path.exists(self.save_dir):
            os.makedirs(self.save_dir)
        with open(self.checkpoint_name, 'wb') as f:
            pickle.dump(self.layer_metrics, f)
        _logger.info('Save checkpoint to {}.'.format(self.checkpoint_name))
@@ -151,308 +130,117 @@ class AnalysisPTQ(object):
                self.checkpoint_name))
            return False
        with open(self.checkpoint_name, 'rb') as f:
            self.layer_metrics = pickle.load(f)
        _logger.info('Load checkpoint from {}.'.format(self.checkpoint_name))
        return True
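    # Map each weight var to the activation that feeds the same op (and the
    # reverse), so layers can be addressed by weight name during analysis.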
    def get_weight_act_info(self, program, persistable=True):
        self.persistable_var_names = []
        for var in program.list_vars():
            if var.persistable:
                self.persistable_var_names.append(var.name)
        graph = IrGraph(core.Graph(program.desc), for_test=True)

        weight_act_dict = {}
        act_weight_dict = {}
        ops = graph.all_op_nodes()
        for op_node in ops:
            if op_node.name() in self.quantizable_op_type:
                in_x, in_y = SUPPORT_WEIGHT_OP_DICT[op_node.name()][0]
                input_name_x = op_node.input(in_x)[0]
                input_name_y = op_node.input(in_y)[0]
                if not persistable:
                    weight_act_dict[input_name_y] = input_name_x
                    act_weight_dict[input_name_x] = input_name_y
                else:
                    if input_name_y in self.persistable_var_names and input_name_y not in self.skip_tensor_list:
                        weight_act_dict[input_name_y] = input_name_x
                        act_weight_dict[input_name_x] = input_name_y
        return weight_act_dict, act_weight_dict

    def create_ptq(self, executor, skip_tensor_list=None):
        # Avoid a mutable default argument; always merge with the configured skips.
        skip_tensor_list = (skip_tensor_list or []) + self.skip_tensor_list
        return PostTrainingQuantization(
            executor=executor,
            data_loader=self.data_loader,
            model_dir=self.float_model_dir,
            model_filename=self.model_filename,
            params_filename=self.params_filename,
            skip_tensor_list=skip_tensor_list,
            **self.quant_config)
    def sampling(self, executor, program, scope, fetch_list):
        batch_id = 0
        for data in self.data_loader():
            executor.run(
                program=program,
                feed=data,
                fetch_list=fetch_list,
                return_numpy=False,
                scope=scope)
            batch_id += 1
            if batch_id >= self.batch_nums:
                break
    def collect_base_stat(self):
        _logger.info('Collecting fp model statistic...')
        executor = paddle.static.Executor(self.places)
        [program, feed_list, fetch_list] = load_inference_model(
            self.float_model_dir,
            executor=executor,
            model_filename=self.model_filename,
            params_filename=self.params_filename)
        scope = paddle.static.global_scope()

        self.fp_weight_act_dict, self.fp_act_weight_dict = self.get_weight_act_info(
            program)
        self.fp_weight_names = list(self.fp_weight_act_dict.keys())
        self.fp_act_names = list(self.fp_weight_act_dict.values())

        for var in program.list_vars():
            if var.name in self.fp_act_names:
                var.persistable = True

        # sample
        self.sampling(executor, program, scope, fetch_list)
        fp_act = collect_vars(scope, self.fp_act_names)
        fp_weight = collect_vars(scope, self.fp_weight_names)
        executor.close()
        return fp_act, fp_weight
    def collect_quant_stat(self):
        _logger.info('Collecting quant model statistic...')
        if self.quant_model_dir is None:
            executor = paddle.static.Executor(self.places)
            scope = paddle.static.global_scope()
            ptq = self.create_ptq(executor)
            program = ptq.quantize()
            feed_list, fetch_list = ptq._feed_list, ptq._fetch_list
        else:
            executor = paddle.static.Executor(self.places)
            [program, feed_list, fetch_list] = load_inference_model(
                self.quant_model_dir,
                executor=executor,
                model_filename=self.model_filename,
                params_filename=self.params_filename)
            scope = paddle.static.global_scope()

        self.quant_weight_act_dict, self.quant_act_weight_dict = self.get_weight_act_info(
            program)
        self.quant_weight_names = list(self.quant_weight_act_dict.keys())
        self.quant_act_names = list(self.quant_weight_act_dict.values())
        for var in program.list_vars():
            if var.name in self.quant_act_names:
                var.persistable = True
        self.sampling(executor, program, scope, fetch_list)
        quant_act = collect_vars(scope, self.quant_act_names)
        quant_weight = collect_vars(scope, self.quant_weight_names)
        executor.close()
        return quant_act, quant_weight
    def collect_statistic(self,
                          fp_tensors,
@@ -461,7 +249,7 @@ class AnalysisPTQ(object):
                          is_weight,
                          axis=None):
        statistic = []
        box_fp_dist, box_q_dist = {}, {}
        hist_fp_dist, hist_q_dist = {}, {}
        fp_tensor_names = sorted(list(fp_tensors.keys()))
        for var_name in fp_tensor_names:
@@ -487,22 +275,36 @@ class AnalysisPTQ(object):
            diff_std = round(diff.std(), 4)
            stat = {
                'Var Name': var_name,
                'Var Type': 'Weight' if is_weight else 'Activation',
                'Corresponding Weight Name': self.fp_act_weight_dict[var_name]
                if not is_weight else None,
                'FP32 Min': fp_min,
                'FP32 Max': fp_max,
                'FP32 Mean': fp_mean,
                'FP32 Std': fp_std,
                'Quantized Min': q_min,
                'Quantized Max': q_max,
                'Quantized Mean': q_mean,
                'Quantized Std': q_std,
                'Diff Min': diff_min,
                'Diff Max': diff_max,
                'Diff Mean': diff_mean,
                'Diff Std': diff_std,
            }
            statistic.append(stat)

            # for boxplot
@@ -514,12 +316,12 @@ class AnalysisPTQ(object):
                    [-1, fp_tensor.shape[axis]]).abs().max(axis=-1)
                box_q_tensor = quant_tensor.reshape(
                    [-1, quant_tensor.shape[axis]]).abs().max(axis=-1)
                sample_num = len(
                    box_fp_tensor) if len(box_fp_tensor) < 1000 else 1000
                box_fp_tensor = random.sample(list(box_fp_tensor), sample_num)
                box_q_tensor = random.sample(list(box_q_tensor), sample_num)
                box_fp_dist[var_name] = box_fp_tensor
                box_q_dist[quant_name] = box_q_tensor

                # for histplot
                _, hist_edges = np.histogram(
@@ -531,50 +333,253 @@ class AnalysisPTQ(object):
        return statistic, box_fp_dist, box_q_dist, hist_fp_dist, hist_q_dist
    def statistical_analyse(self, analysis_axis=None):
        fp_act, fp_weight = self.collect_base_stat()
        quant_act, quant_weight = self.collect_quant_stat()
        fp_q_act_dict = {
            self.fp_weight_act_dict[n]: self.quant_weight_act_dict[n]
            for n in self.fp_weight_act_dict
        }
        act_statistic, box_fp_dist, box_q_dist, hist_fp_dist, hist_q_dist = self.collect_statistic(
            fp_act,
            quant_act,
            fp_q_act_dict,
            is_weight=False,
            axis=analysis_axis)

        plot_box_distribution(box_fp_dist, self.save_dir,
                              'fp_activation_boxplot.pdf')
        plot_box_distribution(box_q_dist, self.save_dir,
                              'quantized_activation_boxplot.pdf')
        plot_hist_distribution(hist_fp_dist, self.save_dir,
                               'fp_activation_histplot.pdf')
        plot_hist_distribution(hist_q_dist, self.save_dir,
                               'quantized_activation_histplot.pdf')

        weight_statistic, box_fp_dist, box_q_dist, hist_fp_dist, hist_q_dist = self.collect_statistic(
            fp_weight, quant_weight, None, is_weight=True, axis=analysis_axis)
        plot_box_distribution(box_fp_dist, self.save_dir,
                              'fp_weight_boxplot.pdf')
        plot_box_distribution(box_q_dist, self.save_dir,
                              'quantized_weight_boxplot.pdf')
        plot_hist_distribution(hist_fp_dist, self.save_dir,
                               'fp_weight_histplot.pdf')
        plot_hist_distribution(hist_q_dist, self.save_dir,
                               'quantized_weight_histplot.pdf')

        statistic = act_statistic + weight_statistic
        csv_columns = [
            'Var Name', 'Var Type', 'Corresponding Weight Name', 'FP32 Min',
            'FP32 Max', 'FP32 Mean', 'FP32 Std', 'Quantized Min',
            'Quantized Max', 'Quantized Mean', 'Quantized Std', 'Diff Min',
            'Diff Max', 'Diff Mean', 'Diff Std'
        ]
        save_csv(statistic, self.save_dir, 'statistic.csv', csv_columns)
    def get_quant_sensitive_metric(self, skip_list, layer_name):
        executor = paddle.static.Executor(self.places)
        if self.eval_function is not None:
            ptq = self.create_ptq(executor, skip_list)
            program = ptq.quantize()
            _logger.info('Evaluating...')
            if self.onnx_format:
                # `ptq` holds the quantizer here (the original referenced an
                # undefined `post_training_quantization`).
                ptq.save_quantized_model(
                    self.temp_save_path,
                    model_filename='model.pdmodel',
                    params_filename='model.pdiparams')
                program, feed_list, fetch_list = load_inference_model(
                    self.temp_save_path,
                    executor,
                    model_filename='model.pdmodel',
                    params_filename='model.pdiparams')
            metric = self.eval_function(executor, program, ptq._feed_list,
                                        ptq._fetch_list)
            sensitive_metric = self.fp_metric - metric
            _logger.info(
                "Quantized layer name: %s, the accuracy: %.4f, the sensitive metric: %.4f"
                % (layer_name, metric, sensitive_metric))
        else:
            float_scope = paddle.static.Scope()
            quant_scope = paddle.static.Scope()
            with paddle.static.scope_guard(float_scope):
                [float_program, float_feed_list,
                 float_fetch_list] = load_inference_model(
                     self.float_model_dir,
                     executor=executor,
                     model_filename=self.model_filename,
                     params_filename=self.params_filename)
            with paddle.static.scope_guard(quant_scope):
                ptq = self.create_ptq(executor, skip_list)
                quant_program = ptq.quantize()
            metric = fp_quant_cosine_similarity(
                executor, self.data_loader, float_program, quant_program,
                float_scope, quant_scope, float_fetch_list, ptq._fetch_list)
            sensitive_metric = 1.0 - metric
            _logger.info(
                "Quantized layer name: %s, the cosine similarity: %.4f, the sensitive metric: %.4f"
                % (layer_name, metric, sensitive_metric))
        executor.close()
        return sensitive_metric
    def get_dequant_sensitive_metric(self, executor, float_scope, quant_scope,
                                     layer_name):
        weight_name = layer_name.split('.quantized.dequantized')[0]
        with paddle.static.scope_guard(float_scope):
            [float_program, float_feed_list,
             float_fetch_list] = load_inference_model(
                 self.float_model_dir,
                 executor=executor,
                 model_filename=self.model_filename,
                 params_filename=self.params_filename)
        with paddle.static.scope_guard(quant_scope):
            [program, quant_feed_list, quant_fetch_list] = load_inference_model(
                self.quant_model_dir,
                executor=executor,
                model_filename=self.model_filename,
                params_filename=self.params_filename)
        program_copy = program.clone()
        graph = IrGraph(core.Graph(program_copy.desc), for_test=True)
        input_rename_map, output_rename_map, removed_ops = get_new_in_out_map(
            self.weight_act_dict[layer_name], graph, float_scope, quant_scope,
            self.places)
        saved_program = relink_graph(graph, input_rename_map, output_rename_map,
                                     removed_ops)
        if self.eval_function is not None:
            with paddle.static.scope_guard(quant_scope):
                _logger.info(
                    'Skip quant {}, evaluating....'.format(weight_name))
                metric = self.eval_function(executor, saved_program,
                                            quant_feed_list, quant_fetch_list)
                sensitive_metric = self.quant_metric - metric
                _logger.info(
                    'When skip quant %s, the eval metric is %.4f, the sensitive metric is %.4f'
                    % (weight_name, metric, self.quant_metric - metric))
        else:
            metric = fp_quant_cosine_similarity(
                executor, self.data_loader, float_program, saved_program,
                float_scope, quant_scope, float_fetch_list, quant_fetch_list)
            sensitive_metric = 1 - metric
            _logger.info(
                'When skip quant %s, the cosine similarity is %.4f, the sensitive metric is %.4f'
                % (weight_name, metric, 1 - metric))
        return sensitive_metric
    def prepare_error_analyse(self, dequant_layer_by_layer):
        if not dequant_layer_by_layer:
            executor = paddle.static.Executor(self.places)
            [program, feed_list, fetch_list] = load_inference_model(
                self.float_model_dir,
                executor=executor,
                model_filename=self.model_filename,
                params_filename=self.params_filename)

            self.weight_act_dict, _ = self.get_weight_act_info(program)
            self.support_quant_name_list = list(self.weight_act_dict.keys())
            self.tobe_analyized_layer = sorted(
                list(
                    set(self.support_quant_name_list) -
                    set(self.skip_tensor_list)))
            if self.eval_function is not None:
                _logger.info('Start to evaluate the FP model.')
                self.fp_metric = self.eval_function(executor, program,
                                                    feed_list, fetch_list)
                _logger.info(
                    'The accuracy of the FP model is: %.4f' % self.fp_metric)
                executor.close()

                _logger.info('Start to evaluate the quantized model.')
                executor = paddle.static.Executor(self.places)
                ptq = self.create_ptq(executor, self.skip_tensor_list)
                program = ptq.quantize()
                self.quant_metric = self.eval_function(executor, program,
                                                       feed_list, fetch_list)
                _logger.info('The accuracy of the quantized model is: %.4f' %
                             self.quant_metric)
        else:
            executor = paddle.static.Executor(self.places)
            [program, feed_list, fetch_list] = load_inference_model(
                self.quant_model_dir,
                executor=executor,
                model_filename=self.model_filename,
                params_filename=self.params_filename)
            graph = IrGraph(core.Graph(program.desc), for_test=True)
            self.weight_act_dict, _ = self.get_weight_act_info(
                program, persistable=False)
            if self.eval_function is not None:
                _logger.info('Start to evaluate the quantized model.')
                self.quant_metric = self.eval_function(executor, program,
                                                       feed_list, fetch_list)
                _logger.info('The accuracy of the quantized model is: %.4f' %
                             self.quant_metric)
            executor.close()
    def metric_error_analyse(self):
        assert self.data_loader is not None, \
            "When computing the sensitivity of quantized layers, the data loader is needed"
        dequant_layer_by_layer = False if self.quant_model_dir is None else True
        self.prepare_error_analyse(dequant_layer_by_layer)
        if not dequant_layer_by_layer:
            _logger.info(
                'For each layer, quantize the weight op and evaluate the quantized model.'
            )
            for i, layer_name in enumerate(self.tobe_analyized_layer):
                if layer_name in self.layer_metrics:
                    continue
                _logger.info(
                    'Checking {}/{} quant model: quant layer {}'.format(
                        i + 1, len(self.tobe_analyized_layer), layer_name))
                skip_list = copy.copy(list(self.support_quant_name_list))
                skip_list.remove(layer_name)
                sensitive_metric = self.get_quant_sensitive_metric(
                    skip_list, layer_name)
                self.layer_metrics[layer_name] = sensitive_metric
                self.save_checkpoint()
            if self.onnx_format:
                self.temp_root_path.cleanup()
        else:
            _logger.info(
                'For each layer, dequantize the weight op and evaluate the quantized model.'
            )
            executor = paddle.static.Executor(self.places)
            float_scope = paddle.static.Scope()
            quant_scope = paddle.static.Scope()
            for idx, name in enumerate(self.weight_act_dict):
                weight_name = name.split('.quantized.dequantized')[0]
                if weight_name in self.layer_metrics:
                    continue
                _logger.info(
                    'Checking {}/{} quant model: without quant layer {}'.format(
                        idx + 1, len(self.weight_act_dict), weight_name))
                sensitive_metric = self.get_dequant_sensitive_metric(
                    executor, float_scope, quant_scope, name)
                self.layer_metrics[weight_name] = sensitive_metric
                self.save_checkpoint()
            executor.close()

        self.sensitivity_ranklist = sorted(
            self.layer_metrics, key=self.layer_metrics.get, reverse=True)
        _logger.info('Finished computing the sensitivity of the model.')
        for name in self.sensitivity_ranklist:
            _logger.info("layer name: {}, sensitivity metric: {}".format(
                name, self.layer_metrics[name]))
        analysis_file = os.path.join(self.save_dir, "analysis.txt")
        with open(analysis_file, "w") as analysis_ret_f:
            for name in self.sensitivity_ranklist:
                analysis_ret_f.write("layer name: {}, sensitivity metric: {}\n".
                                     format(name, self.layer_metrics[name]))
        _logger.info('Analysis file is saved in {}'.format(analysis_file))
    def get_target_quant_model(self, target_metric):
        _logger.info(
@@ -583,11 +588,9 @@ class AnalysisPTQ(object):
            'Make sure that you are using full eval dataset to get target quantized model.'
        )
        skip_list = []
        if self.layer_metrics:
            rank_list = sorted(
                self.layer_metrics, key=self.layer_metrics.get, reverse=True)
        else:
            _logger.info(
                'Analyse metric error before get target quantized model.')
@@ -597,12 +600,12 @@ class AnalysisPTQ(object):
            skip_list.append(rank_list.pop(0))
            _logger.info('Skip Ops: {}'.format(skip_list))
            executor = paddle.static.Executor(self.places)
            ptq = self.create_ptq(executor, skip_list)
            program = ptq.quantize()
            _logger.info('Evaluating...')
            quant_metric = self.eval_function(executor, program, ptq._feed_list,
                                              ptq._fetch_list)
            _logger.info("Current eval metric: {}, the target metric: {}".
                         format(quant_metric, target_metric))
            if quant_metric >= target_metric:
@@ -611,7 +614,7 @@ class AnalysisPTQ(object):
                _logger.info(
                    'The quantized model satisfies the target metric and is saved to {}'.
                    format(quantize_model_path))
                ptq.save_quantized_model(
                    quantize_model_path,
                    model_filename='model.pdmodel',
                    params_filename='model.pdiparams')
...
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import pickle
import copy
import logging
import numpy as np
import paddle
import paddle.nn.functional as F
from paddle.framework import core
from paddle.fluid.framework import IrGraph
from ..common import get_logger, load_inference_model
_logger = get_logger(__name__, level=logging.INFO)
__all__ = ["AnalysisQAT"]
class AnalysisQAT(object):
    def __init__(self,
                 quant_model_dir,
                 float_model_dir,
                 model_filename=None,
                 params_filename=None,
                 quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
                 qat_metric=None,
                 eval_function=None,
                 data_loader=None,
                 save_dir='analysis_results',
                 resume=False):
        '''
        AnalysisQAT provides methods to analyze the sensitivity of each op in a QAT model.
        Args:
            quant_model_dir(str): the path of the INT8 model quantized through QAT
            float_model_dir(str): the path of the FP32 model that is the base model of quant_model
            model_filename(str, optional): the model file name of the model
            params_filename(str, optional): the parameter file name of the model
            quantizable_op_type(list of str, optional): the types of ops that will be analyzed
            qat_metric(float, optional): the metric of the quantized model, calculated automatically if None
            eval_function(function): user-defined eval function that returns the metric of the inference program; used to judge the metric of the quantized model
            data_loader(Python Generator, Paddle.io.DataLoader, optional): the
                Generator or DataLoader that provides calibration data and can
                return a batch every time
            save_dir(str, optional): the output dir that stores the analyzed information
            resume(bool, optional): if analysis was interrupted, set True to resume and load the already analyzed information
        '''
        if model_filename is None:
            model_filename = 'model.pdmodel'
        if params_filename is None:
            params_filename = 'model.pdiparams'
        self.quant_model_dir = quant_model_dir
        self.float_model_dir = float_model_dir
        self.model_filename = model_filename
        self.params_filename = params_filename
        self.quantizable_op_type = quantizable_op_type
        self.qat_metric = qat_metric
        self.eval_function = eval_function
        self.data_loader = data_loader
        self.save_dir = save_dir
        self.checkpoint_name = os.path.join(save_dir, 'analysis_checkpoint.pkl')
        self.nonquant_layer_metrics = {}
        if not os.path.exists(self.save_dir):
            os.mkdir(self.save_dir)

        devices = paddle.device.get_device().split(':')[0]
        self.places = paddle.device._convert_to_place(devices)
        executor = paddle.static.Executor(self.places)
        [program, self.feed_list, self.fetch_list] = load_inference_model(
            self.quant_model_dir,
            executor=executor,
            model_filename=self.model_filename,
            params_filename=self.params_filename)
        _logger.info('Loaded model from: {}'.format(quant_model_dir))

        graph = IrGraph(core.Graph(program.desc), for_test=True)
        # find all inputs for each quantizable op
        self.inputs_of_quantized_op = []
        sorted_ops = graph.topology_sort()
        for op_node in sorted_ops:
            op_name = op_node.name()
            if op_name in quantizable_op_type:
                input_names = op_node.op().input_arg_names()
                for input_name in input_names:
                    if 'quantized' in input_name:
                        self.inputs_of_quantized_op.append(input_names)
                        break

        if self.eval_function is None:
            assert self.data_loader is not None, "DataLoader cannot be None if Eval Function is None."
            _logger.info(
                'The sensitivity will be measured by the cosine similarity of the outputs from the float model and the quantized model.'
            )
        if self.qat_metric is None and self.eval_function is not None:
            _logger.info('Calculating the metric of the QAT model...')
            self.qat_metric = self.eval_function(
                executor, program, self.feed_list, self.fetch_list) * 100
            _logger.info('The metric of the QAT model is {}'.format(
                round(self.qat_metric, 4)))
        executor.close()
        if resume:
            self.load_checkpoint()
    def save_checkpoint(self):
        if not os.path.exists(self.save_dir):
            os.makedirs(self.save_dir)
        with open(self.checkpoint_name, 'wb') as f:
            pickle.dump(self.nonquant_layer_metrics, f)
        _logger.info('Save checkpoint to {}.'.format(self.checkpoint_name))

    def load_checkpoint(self):
        if not os.path.exists(self.checkpoint_name):
            _logger.info('Checkpoint path {} does not exist.'.format(
                self.checkpoint_name))
            return False
        with open(self.checkpoint_name, 'rb') as f:
            self.nonquant_layer_metrics = pickle.load(f)
        _logger.info('Load checkpoint from {}.'.format(self.checkpoint_name))
        return True
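    # A hedged sketch of the checkpoint/resume flow (the paths and `train_loader`
    # below are placeholders, not files shipped with this repo): if a long run is
    # interrupted, the per-layer metrics already written to
    # analysis_checkpoint.pkl can be reloaded with resume=True, and
    # metric_error_analyse() will then skip the layers it has already measured.
    #
    #     analyzer = AnalysisQAT(
    #         quant_model_dir='./MobileNetV1_QAT',
    #         float_model_dir='./MobileNetV1_infer',
    #         data_loader=train_loader,
    #         resume=True)
    #     analyzer.metric_error_analyse()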
    def get_weight_name(self, inputs_names):
        # TODO(xc)
        # Heuristic: of the inputs of a quantized op, the one whose name
        # contains 'w_0' is the weight; strip the '.quantized.dequantized'
        # suffix to recover the original parameter name.
        w_idx = 0 if 'w_0' in inputs_names[0] else 1
        weight_name = inputs_names[w_idx].split('.quantized.dequantized')[0]
        return weight_name
    def get_new_in_out_map(
            self,
            input_list,
            graph,
            float_scope,
            quant_scope, ):
        # Find each quant/dequant op whose single output feeds the target
        # quantized op, remove it from the graph, and record how its input
        # and output variables should be reconnected afterwards.
        input_rename_map = {}
        output_rename_map = {}
        removed_ops = []
        for op_node in graph.all_op_nodes():
            if op_node.id() in removed_ops:
                continue
            in_names = op_node.input_arg_names()
            out_names = op_node.output_arg_names()
            if len(out_names) == 1 and out_names[0] in input_list:
                in_var = graph._find_node_by_name(op_node.inputs,
                                                  op_node.input('X')[0])
                out_var = graph._find_node_by_name(op_node.outputs,
                                                   op_node.output('Y')[0])
                if not in_var.persistable():
                    # act: also remove the upstream op that produced this
                    # activation and link through to that op's own input
                    for op in graph.all_op_nodes():
                        o_ns = op.output_arg_names()
                        if len(o_ns) == 1 and o_ns[0] == in_var.name():
                            in_var_1 = graph._find_node_by_name(
                                op.inputs, op.input('X')[0])
                            graph.safe_remove_nodes(op)
                            removed_ops.append(op.id())
                            input_rename_map[out_var.node] = in_var_1
                else:
                    # weight: overwrite the quantized weight tensor with the
                    # FP32 weight from the float model's scope
                    with paddle.static.scope_guard(float_scope):
                        float_name = in_var.name().replace('.quantized', '')
                        float_weight = np.array(
                            float_scope.find_var(float_name).get_tensor())
                    with paddle.static.scope_guard(quant_scope):
                        quant_scope.find_var(in_var.name()).get_tensor().set(
                            float_weight, self.places)
                    input_rename_map[out_var.node] = in_var
                graph.safe_remove_nodes(op_node)
                removed_ops.append(op_node.id())
                output_rename_map[in_var.node] = out_var
        return input_rename_map, output_rename_map, removed_ops
    def relink_graph(self, graph, input_rename_map, output_rename_map,
                     removed_ops):
        for op_node in graph.all_op_nodes():
            if op_node.id() in removed_ops:
                continue
            for var in op_node.inputs:
                if var.node in input_rename_map:
                    old_in = var
                    new_in = input_rename_map[var.node]
                    graph.update_input_link(old_in, new_in, op_node)
                    _logger.info(
                        f'relink {op_node.name()} \'s input node from {old_in.name()} to {new_in.name()}.'
                    )
            for var in op_node.outputs:
                if var.node in output_rename_map:
                    old_out = var
                    new_out = output_rename_map[var.node]
                    graph.update_input_link(old_out, new_out, op_node)
                    _logger.info(
                        f'relink {op_node.name()} \'s output node from {old_out.name()} to {new_out.name()}.'
                    )
        return graph.to_program()
    def fp_int_cosine_similarity(self, executor, float_program, quant_program,
                                 float_scope, quant_scope):
        cosine_similarity = []
        for step, data in enumerate(self.data_loader()):
            with paddle.static.scope_guard(float_scope):
                float_preds = executor.run(program=float_program,
                                           feed=data,
                                           fetch_list=self.float_fetch_list,
                                           return_numpy=False)
                float_preds = float_preds[0]
            with paddle.static.scope_guard(quant_scope):
                quant_preds = executor.run(program=quant_program,
                                           feed=data,
                                           fetch_list=self.fetch_list,
                                           return_numpy=False)
                quant_preds = quant_preds[0]
            paddle.disable_static()
            float_preds = paddle.to_tensor(float_preds)
            quant_preds = paddle.to_tensor(quant_preds)
            cos_sim = F.cosine_similarity(float_preds, quant_preds).mean()
            cos_sim = cos_sim.numpy()
            cosine_similarity.append(cos_sim)
            if step != 0 and (step % 10 == 0):
                _logger.info("[step]: %d, cosine similarity: %.9f" %
                             (step, np.array(cosine_similarity).mean()))
            paddle.enable_static()
        return np.array(cosine_similarity).mean()
    def metric_error_analyse(self):
        executor = paddle.static.Executor(self.places)

        float_scope = paddle.static.Scope()
        quant_scope = paddle.static.Scope()

        for idx, input_list in enumerate(self.inputs_of_quantized_op):
            weight_name = self.get_weight_name(input_list)
            if weight_name in self.nonquant_layer_metrics:
                continue
            _logger.info(
                'Checking {}/{} quant model: without quant layer {}'.format(
                    idx + 1, len(self.inputs_of_quantized_op), weight_name))

            with paddle.static.scope_guard(float_scope):
                [float_program, self.float_feed_list,
                 self.float_fetch_list] = load_inference_model(
                     self.float_model_dir,
                     executor=executor,
                     model_filename=self.model_filename,
                     params_filename=self.params_filename)

            with paddle.static.scope_guard(quant_scope):
                [program, self.feed_list,
                 self.fetch_list] = load_inference_model(
                     self.quant_model_dir,
                     executor=executor,
                     model_filename=self.model_filename,
                     params_filename=self.params_filename)

            program_copy = program.clone()
            graph = IrGraph(core.Graph(program_copy.desc), for_test=True)
            input_rename_map, output_rename_map, removed_ops = self.get_new_in_out_map(
                input_list, graph, float_scope, quant_scope)
            saved_program = self.relink_graph(graph, input_rename_map,
                                              output_rename_map, removed_ops)

            if self.eval_function is not None:
                with paddle.static.scope_guard(quant_scope):
                    _logger.info('Skip quant {}, evaluating....'.format(
                        weight_name))
                    metric = self.eval_function(executor, saved_program,
                                                self.feed_list,
                                                self.fetch_list) * 100
                    self.nonquant_layer_metrics[
                        weight_name] = metric - self.qat_metric
                    _logger.info(
                        'When skip quant %s, the eval metric is %.4f, the sensitive metric is %.4f'
                        % (weight_name, metric, metric - self.qat_metric))
            else:
                metric = self.fp_int_cosine_similarity(executor, float_program,
                                                       saved_program,
                                                       float_scope, quant_scope)
                self.nonquant_layer_metrics[weight_name] = 1 - metric
                _logger.info(
                    'When skip quant %s, the cosine similarity is %.4f, the sensitive metric is %.4f'
                    % (weight_name, metric, 1 - metric))
            self.save_checkpoint()

        executor.close()

        self.sensitivity_ranklist = sorted(
            self.nonquant_layer_metrics,
            key=self.nonquant_layer_metrics.get,
            reverse=True)
        _logger.info('Finished computing the sensitivity of the model.')
        for name in self.sensitivity_ranklist:
            _logger.info("Without quant layer name: {}, sensitive metric: {}".
                         format(name, self.nonquant_layer_metrics[name]))

        analysis_file = os.path.join(self.save_dir, "analysis.txt")
        with open(analysis_file, "w") as analysis_ret_f:
            for name in self.sensitivity_ranklist:
                analysis_ret_f.write(
                    "Without quant layer name: {}, sensitive metric: {}\n".
                    format(name, self.nonquant_layer_metrics[name]))
        _logger.info('Analysis file is saved in {}'.format(analysis_file))
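
# How to read the result (a hedged note, not part of the original module): with an
# eval_function, each layer's sensitivity is `metric_without_quant - qat_metric`
# (both on a 0-100 scale), so larger values mean quantizing that layer costs more
# accuracy; without one, it is `1 - cosine_similarity`, e.g. a mean cosine
# similarity of 0.9987 against the float model yields a sensitivity of 0.0013.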
import os
import sys
import csv
import logging
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from ..common import get_logger
import paddle
import paddle.nn.functional as F
from paddle.static.quantization.utils import load_variable_data
_logger = get_logger(__name__, level=logging.INFO)
def collect_vars(scope, var_names):
    all_vars = {}
    for var_name in var_names:
        var_tensor = load_variable_data(scope, var_name)
        all_vars[var_name] = var_tensor
    return all_vars
def plot_box_distribution(box_data, save_dir, save_name):
    all_values = sum(list(box_data.values()), [])
    max_value = np.max(all_values)
    min_value = np.min(all_values)
    pdf_path = os.path.join(save_dir, save_name)
    labels = sorted(box_data.keys())
    with PdfPages(pdf_path) as pdf:
        # draw at most 20 boxes per PDF page
        for i in range(0, len(labels), 20):
            r = i + 20 if i + 20 < len(labels) else len(labels)
            dist = [box_data[n] for n in labels[i:r]]
            plt.boxplot(
                dist, labels=labels[i:r], showbox=True, patch_artist=True)
            plt.xticks(rotation=90)
            plt.tick_params(axis='x')
            plt.ylim([min_value, max_value])
            if 'act' in save_name:
                plt.xlabel('Activation Name')
            else:
                plt.xlabel('Weight Name')
            plt.ylabel("Box Distribution")
            plt.tight_layout()
            plt.show()
            pdf.savefig()
            plt.close()
    _logger.info('Box plots are saved in {}'.format(pdf_path))
def plot_hist_distribution(hist_data, save_dir, save_name):
    pdf_path = os.path.join(save_dir, save_name)
    with PdfPages(pdf_path) as pdf:
        for name in hist_data:
            plt.hist(hist_data[name][0], bins=hist_data[name][1])
            plt.xlabel(name)
            plt.ylabel("Probability")
            # rescale y-axis ticks from raw counts to fractions of samples
            locs, _ = plt.yticks()
            plt.yticks(locs, np.round(locs / len(hist_data[name][0]), 3))
            if 'act' in save_name:
                plt.title("Hist of Activation {}".format(name))
            else:
                plt.title("Hist of Weight {}".format(name))
            plt.show()
            pdf.savefig()
            plt.close()
    _logger.info('Histogram plot is saved in {}'.format(pdf_path))
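
# Assumed layout of hist_data, inferred from the indexing above: each entry maps
# a tensor name to (sampled values, bin specification for plt.hist). The name
# and numbers below are illustrative only:
#
#     hist_data = {'conv1_weights': (np.random.randn(10000), 32)}
#     plot_hist_distribution(hist_data, 'analysis_results', 'weight_hist_result.pdf')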
def save_csv(data, save_dir, save_name, csv_columns):
    save_path = os.path.join(save_dir, save_name)
    with open(save_path, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
        writer.writeheader()
        for d in data:
            writer.writerow(d)
    _logger.info('Activation Statistic is saved in {}'.format(save_path))
def fp_quant_cosine_similarity(executor, data_loader, float_program,
                               quant_program, float_scope, quant_scope,
                               float_fetch_list, quant_fetch_list):
    cosine_similarity = []
    for step, data in enumerate(data_loader()):
        with paddle.static.scope_guard(float_scope):
            float_preds = executor.run(
                program=float_program,
                feed=data,
                fetch_list=float_fetch_list,
                return_numpy=False)
            float_preds = float_preds[0]
        with paddle.static.scope_guard(quant_scope):
            quant_preds = executor.run(
                program=quant_program,
                feed=data,
                fetch_list=quant_fetch_list,
                return_numpy=False)
            quant_preds = quant_preds[0]
        paddle.disable_static()
        float_preds = paddle.to_tensor(float_preds)
        quant_preds = paddle.to_tensor(quant_preds)
        cos_sim = F.cosine_similarity(float_preds, quant_preds).mean()
        cos_sim = cos_sim.numpy()
        cosine_similarity.append(cos_sim)
        if step != 0 and (step % 10 == 0):
            _logger.info("[step]: %d, cosine similarity: %.9f" %
                         (step, np.array(cosine_similarity).mean()))
        paddle.enable_static()
    return np.array(cosine_similarity).mean()
def get_new_in_out_map(input_name, graph, float_scope, quant_scope, place):
    input_rename_map = {}
    output_rename_map = {}
    removed_ops = []
    for op_node in graph.all_op_nodes():
        if op_node.id() in removed_ops:
            continue
        in_names = op_node.input_arg_names()
        out_names = op_node.output_arg_names()
        if out_names[0] == input_name:
            in_var = graph._find_node_by_name(op_node.inputs,
                                              op_node.input('X')[0])
            out_var = graph._find_node_by_name(op_node.outputs,
                                               op_node.output('Y')[0])
            if not in_var.persistable():
                # act
                for op in graph.all_op_nodes():
                    o_ns = op.output_arg_names()
                    if len(o_ns) == 1 and o_ns[0] == in_var.name():
                        in_var_1 = graph._find_node_by_name(
                            op.inputs, op.input('X')[0])
                        graph.safe_remove_nodes(op)
                        removed_ops.append(op.id())
                        input_rename_map[out_var.node] = in_var_1
            else:
                # weight
                with paddle.static.scope_guard(float_scope):
                    float_name = in_var.name().replace('.quantized', '')
                    float_weight = np.array(
                        float_scope.find_var(float_name).get_tensor())
                with paddle.static.scope_guard(quant_scope):
                    quant_scope.find_var(in_var.name()).get_tensor().set(
                        float_weight, place)
                input_rename_map[out_var.node] = in_var
            graph.safe_remove_nodes(op_node)
            removed_ops.append(op_node.id())
            output_rename_map[in_var.node] = out_var
    return input_rename_map, output_rename_map, removed_ops
def relink_graph(graph, input_rename_map, output_rename_map, removed_ops):
    for op_node in graph.all_op_nodes():
        if op_node.id() in removed_ops:
            continue
        for var in op_node.inputs:
            if var.node in input_rename_map:
                old_in = var
                new_in = input_rename_map[var.node]
                graph.update_input_link(old_in, new_in, op_node)
                _logger.info(
                    f'relink {op_node.name()} \'s input node from {old_in.name()} to {new_in.name()}.'
                )
        for var in op_node.outputs:
            if var.node in output_rename_map:
                old_out = var
                new_out = output_rename_map[var.node]
                graph.update_input_link(old_out, new_out, op_node)
                _logger.info(
                    f'relink {op_node.name()} \'s output node from {old_out.name()} to {new_out.name()}.'
                )
    return graph.to_program()
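
# Hedged sketch of how the two helpers above are meant to compose; `program`,
# the two scopes, `place`, and `input_name` are assumed to exist as in
# metric_error_analyse, and IrGraph/core to be imported as in analysis_qat.py:
#
#     graph = IrGraph(core.Graph(program.clone().desc), for_test=True)
#     in_map, out_map, removed = get_new_in_out_map(
#         input_name, graph, float_scope, quant_scope, place)
#     dequant_program = relink_graph(graph, in_map, out_map, removed)
#     # dequant_program now runs the layer feeding `input_name` in FP32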
@@ -7,7 +7,7 @@ import paddle
from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
-from paddleslim.quant.analysis_ptq import AnalysisPTQ
+from paddleslim.quant.analysis import Analysis
paddle.enable_static()
@@ -17,7 +17,8 @@ class ImageNetDataset(DatasetFolder):
        normalize = transforms.Normalize(
            mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
        self.transform = transforms.Compose([
-           transforms.Resize(256), transforms.CenterCrop(image_size),
+           transforms.Resize(256),
+           transforms.CenterCrop(image_size),
            transforms.Transpose(), normalize
        ])
@@ -51,12 +52,12 @@ class AnalysisPTQDemo(unittest.TestCase):
        train_loader = paddle.io.DataLoader(
            train_dataset, feed_list=[image], batch_size=8, return_list=False)
-       analyzer = AnalysisPTQ(
-           model_dir="./MobileNetV1_infer",
+       analyzer = Analysis(
+           float_model_dir="./MobileNetV1_infer",
            model_filename="inference.pdmodel",
            params_filename="inference.pdiparams",
            save_dir="MobileNetV1_analysis",
-           ptq_config={
+           quant_config={
                'quantizable_op_type': ["conv2d", "depthwise_conv2d"],
...
@@ -8,7 +8,7 @@ import paddle
from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
-from paddleslim.quant.analysis_ptq import AnalysisPTQ
+from paddleslim.quant.analysis import Analysis
paddle.enable_static()
@@ -19,7 +19,8 @@ class ImageNetDataset(DatasetFolder):
        normalize = transforms.Normalize(
            mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
        self.transform = transforms.Compose([
-           transforms.Resize(256), transforms.CenterCrop(image_size),
+           transforms.Resize(256),
+           transforms.CenterCrop(image_size),
            transforms.Transpose(), normalize
        ])
        self.mode = mode
@@ -52,9 +53,9 @@ class ImageNetDataset(DatasetFolder):
        return len(self.samples)
-class AnalysisPTQEvalFunction(unittest.TestCase):
+class AnalysisEvalFunction(unittest.TestCase):
    def __init__(self, *args, **kwargs):
-       super(AnalysisPTQEvalFunction, self).__init__(*args, **kwargs)
+       super(AnalysisEvalFunction, self).__init__(*args, **kwargs)
        if not os.path.exists('MobileNetV1_infer'):
            os.system(
                'wget -q https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar'
@@ -116,7 +117,8 @@ class AnalysisPTQEvalFunction(unittest.TestCase):
            if len(test_feed_names) == 1:
                image = np.array(image)
                label = np.array(label).astype('int64')
-               pred = exe.run(compiled_test_program,
+               pred = exe.run(
+                   compiled_test_program,
                    feed={test_feed_names[0]: image},
                    fetch_list=test_fetch_list)
                pred = np.array(pred[0])
@@ -135,7 +137,8 @@ class AnalysisPTQEvalFunction(unittest.TestCase):
                # eval "eval model", which inputs are image and label, output is top1 and top5 accuracy
                image = np.array(image)
                label = np.array(label).astype('int64')
-               result = exe.run(compiled_test_program,
+               result = exe.run(
+                   compiled_test_program,
                    feed={
                        test_feed_names[0]: image,
                        test_feed_names[1]: label
@@ -148,12 +151,12 @@ class AnalysisPTQEvalFunction(unittest.TestCase):
            result = np.mean(np.array(results), axis=0)
            return result[0]
-       analyzer = AnalysisPTQ(
-           model_dir="./MobileNetV1_infer",
+       analyzer = Analysis(
+           float_model_dir="./MobileNetV1_infer",
            model_filename="inference.pdmodel",
            params_filename="inference.pdiparams",
            save_dir="MobileNetV1_analysis",
-           ptq_config={
+           quant_config={
                'quantizable_op_type': ["conv2d", "depthwise_conv2d"],
                'weight_quantize_type': 'abs_max',
                'activation_quantize_type': 'moving_average_abs_max',
@@ -164,7 +167,7 @@ class AnalysisPTQEvalFunction(unittest.TestCase):
            data_loader=train_loader,
            eval_function=eval_function)
        analyzer.metric_error_analyse()
-       analyzer.get_target_quant_model(69.5)
+       analyzer.get_target_quant_model(0.695)
        os.system('rm -rf MobileNetV1_analysis')
...
@@ -8,7 +8,7 @@ from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
from paddle.static.quantization import PostTrainingQuantization
-from paddleslim.quant.analysis_qat import AnalysisQAT
+from paddleslim.quant.analysis import Analysis
paddle.enable_static()
@@ -19,7 +19,8 @@ class ImageNetDataset(DatasetFolder):
        normalize = transforms.Normalize(
            mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
        self.transform = transforms.Compose([
-           transforms.Resize(256), transforms.CenterCrop(image_size),
+           transforms.Resize(256),
+           transforms.CenterCrop(image_size),
            transforms.Transpose(), normalize
        ])
@@ -55,8 +56,8 @@ class AnalysisQATDemo(unittest.TestCase):
        train_loader = paddle.io.DataLoader(
            train_dataset, feed_list=[image], batch_size=8, return_list=False)
-       place = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda(
-       ) else paddle.CPUPlace()
+       place = paddle.CUDAPlace(
+           0) if paddle.is_compiled_with_cuda() else paddle.CPUPlace()
        executor = paddle.static.Executor(place)
        ptq_config = {
@@ -83,12 +84,13 @@ class AnalysisQATDemo(unittest.TestCase):
            model_filename='inference.pdmodel',
            params_filename='inference.pdiparams')
-       analyzer = AnalysisQAT(
+       analyzer = Analysis(
            float_model_dir="./MobileNetV1_infer",
            quant_model_dir="./MobileNetV1_quant",
            model_filename="inference.pdmodel",
            params_filename="inference.pdiparams",
            save_dir="analysis_result",
+           quant_config=ptq_config,
            data_loader=train_loader)
        analyzer.metric_error_analyse()
        os.system('rm -rf analysis_result')
...
@@ -8,7 +8,7 @@ import paddle
from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
-from paddleslim.quant.analysis_qat import AnalysisQAT
+from paddleslim.quant.analysis import Analysis
from paddle.static.quantization import PostTrainingQuantization
paddle.enable_static()
@@ -21,7 +21,8 @@ class ImageNetDataset(DatasetFolder):
        normalize = transforms.Normalize(
            mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
        self.transform = transforms.Compose([
-           transforms.Resize(256), transforms.CenterCrop(image_size),
+           transforms.Resize(256),
+           transforms.CenterCrop(image_size),
            transforms.Transpose(), normalize
        ])
        self.mode = mode
@@ -118,7 +119,8 @@ class AnalysisQATEvalFunction(unittest.TestCase):
            if len(test_feed_names) == 1:
                image = np.array(image)
                label = np.array(label).astype('int64')
-               pred = exe.run(compiled_test_program,
+               pred = exe.run(
+                   compiled_test_program,
                    feed={test_feed_names[0]: image},
                    fetch_list=test_fetch_list)
                pred = np.array(pred[0])
@@ -137,7 +139,8 @@ class AnalysisQATEvalFunction(unittest.TestCase):
                # eval "eval model", which inputs are image and label, output is top1 and top5 accuracy
                image = np.array(image)
                label = np.array(label).astype('int64')
-               result = exe.run(compiled_test_program,
+               result = exe.run(
+                   compiled_test_program,
                    feed={
                        test_feed_names[0]: image,
                        test_feed_names[1]: label
@@ -150,8 +153,8 @@ class AnalysisQATEvalFunction(unittest.TestCase):
            result = np.mean(np.array(results), axis=0)
            return result[0]
-       place = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda(
-       ) else paddle.CPUPlace()
+       place = paddle.CUDAPlace(
+           0) if paddle.is_compiled_with_cuda() else paddle.CPUPlace()
        executor = paddle.static.Executor(place)
        ptq_config = {
@@ -178,12 +181,13 @@ class AnalysisQATEvalFunction(unittest.TestCase):
            model_filename='inference.pdmodel',
            params_filename='inference.pdiparams')
-       analyzer = AnalysisQAT(
+       analyzer = Analysis(
            float_model_dir="./MobileNetV1_infer",
            quant_model_dir="./MobileNetV1_QAT",
            model_filename="inference.pdmodel",
            params_filename="inference.pdiparams",
            save_dir="MobileNetV1_analysis",
+           quant_config=ptq_config,
            data_loader=train_loader,
            eval_function=eval_function)
        analyzer.metric_error_analyse()
...