diff --git a/docs/zh_cn/tutorials/quant/AnalysisPTQ.md b/docs/zh_cn/tutorials/quant/AnalysisPTQ.md new file mode 100644 index 0000000000000000000000000000000000000000..6ad49a98d4f764fe0c9429843d35b35cf78e265b --- /dev/null +++ b/docs/zh_cn/tutorials/quant/AnalysisPTQ.md @@ -0,0 +1,99 @@ +# PTQ(Post Training Quantization)量化分析工具详细教程 + +## 1. 量化分析工具功能 +1. 统计分析(statistical_analyse): + - 可视化激活和权重箱状图。箱状图可发现是否出现离群点。 + - 可视化权重和激活直方分布图。直方分布图可观察更具体的数值分布。 + - 提供量化前后权重和激活的具体数据信息,包括min,max,mean,std等。 + +2. 精度误差分析(metric_error_analyse): + - 遍历量化模型的每层,并计算量化后精度。该功能可以定位具体某层导致的量化损失。 + +3. 获取目标模型(get_target_quant_model): + - 输入预期精度,直接产出符合预期精度的量化模型。 + + +## 2. paddleslim.quant.AnalysisPTQ 可传入参数解析 +| **参数名** | **参数释义** | +|-----------------------------|-----------------------------------------| +| model_dir | 必须传入的模型文件路径,可为文件夹名;若模型为ONNX类型,直接输入'.onnx'模型文件名称即可 | +| model_filename | 默认为None,若model_dir为文件夹名,则必须传入以'.pdmodel'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入 | +| params_filename | 默认为None,若model_dir为文件夹名,则必须传入以'.pdiparams'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入 | +| eval_function | 若需要验证精度,需要传入自定义的验证函数 | +| data_loader | 模型校准时使用的数据,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader | +| save_dir | 分析后保存模型精度或pdf等文件的文件夹,默认为`analysis_results`| +| resume | 是否加载中间分析文件,默认为False| +| ptq_config | 可传入的离线量化中的参数,详细可参考[离线量化文档](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post) | + + + + + +## 3. 
量化分析工具的使用
+**创建量化分析工具** :
+```
+analyzer = AnalysisPTQ(
+    model_dir=config["model_dir"],
+    model_filename=config["model_filename"],
+    params_filename=config["params_filename"],
+    eval_function=eval_function,
+    data_loader=data_loader,
+    save_dir=config['save_dir'],
+    ptq_config=config['PTQ'])
+```
+
+**统计分析**
+```
+analyzer.statistical_analyse()
+```
+
+调用该接口,会统计量化前和量化后每一个可量化权重和其对应激活的数据。只使用该接口可以不输入Eval Function,但需要输入DataLoader,少量数据即可。会产出以下文件:
+- `fp_activation_boxplot.pdf`:量化前Float数据类型的模型激活箱状图
+- `fp_weight_boxplot.pdf`:量化前Float数据类型的模型权重箱状图
+- `quantized_activation_boxplot.pdf`:量化后INT数据类型的模型激活箱状图
+- `quantized_weight_boxplot.pdf`:量化后INT数据类型的模型权重箱状图
+- `fp_activation_histplot.pdf`:量化前Float数据类型的模型激活直方图
+- `fp_weight_histplot.pdf`:量化前Float数据类型的模型权重直方图
+- `quantized_activation_histplot.pdf`:量化后INT数据类型的模型激活直方图
+- `quantized_weight_histplot.pdf`:量化后INT数据类型的模型权重直方图
+- `statistic.csv`:量化前后权重和激活的具体数据信息,表格中会保存的信息有:
+  - Var Name:Variable的名称
+  - Var Type:Variable的类型,Weight或Activation
+  - Corresponding Weight Name:如果为Activation,其对应的Weight名称
+  - FP32 Min:量化前Float数据类型的最小值
+  - FP32 Max:量化前Float数据类型的最大值
+  - FP32 Mean:量化前Float数据类型的平均值
+  - FP32 Std:量化前Float数据类型的标准差
+  - Quantized Min:量化后INT数据类型的最小值
+  - Quantized Max:量化后INT数据类型的最大值
+  - Quantized Mean:量化后INT数据类型的平均值
+  - Quantized Std:量化后INT数据类型的标准差
+  - Diff Min:量化前后该Variable差值的最小值
+  - Diff Max:量化前后该Variable差值的最大值
+  - Diff Mean:量化前后该Variable差值的平均值
+  - Diff Std:量化前后该Variable差值的标准差
+
+
+**精度误差分析**
+```
+analyzer.metric_error_analyse()
+```
+调用该接口,会遍历量化模型中的每一层,并计算只量化该层后模型的精度损失。调用该接口时,需要输入Eval Function。会产出所有只量化一层的模型精度排序,将默认保存在 `./analysis_results/analysis.txt` 中。
+
+
+**直接产出符合预期精度的目标量化模型**
+```
+analyzer.get_target_quant_model(target_metric)
+```
+
+## 4. 
根据分析结果执行离线量化 +执行完量化分析工具后,可根据 `analysis.txt` 中的精度排序,在量化中去掉效果较差的层,具体操作为:在调用 `paddleslim.quant.quant_post_static` 时加入参数 `skip_tensor_list`,将需要去掉的层传入即可。 + + +## FAQ: +- 与QAT(Quantization-Aware Training)量化分析工具的区别:与QAT量化分析工具不同的是,PTQ量化分析工具则是加载待量化的原模型,对模型所有层依次进行量化,每次量化一层,进行验证获取精度误差分析。而QAT量化分析工具加载量化训练后的量化模型,遍历所有量化的层,依次去掉量化层,加载Float模型的参数,并进行验证获取精度误差分析。 + +- PTQ量化分析工具设计的原因:PTQ量化分析工具依次量化模型中的每一层,而不是依次去掉量化层是由于PTQ本身的高效性。依次量化一层进行验证,查看对模型精度的损失十分直观。 + +- 量化分析工具为什么要区分PTQ和QAT:实验证明PTQ和QAT后的量化模型的敏感层并不完全一致,将两种算法分开,敏感度分析结果更加准确。 diff --git a/docs/zh_cn/tutorials/quant/AnalysisQAT.md b/docs/zh_cn/tutorials/quant/AnalysisQAT.md new file mode 100644 index 0000000000000000000000000000000000000000..b6386c9c7fe302c1f66d4ac8ce7c1b08f8cafca1 --- /dev/null +++ b/docs/zh_cn/tutorials/quant/AnalysisQAT.md @@ -0,0 +1,56 @@ +# QAT(Quantization-Aware Training)量化分析工具详细教程 + +## 1. 量化分析工具功能 +精度误差分析(metric_error_analyse): + - 遍历量化训练后模型的每层,去掉量化节点并计算当前层不量化的模型精度。该功能可以定位具体某层导致的量化损失。 + + +## 2. paddleslim.quant.AnalysisQAT 可传入参数解析 +| **参数名** | **参数释义** | +|-----------------------------|-----------------------------------------| +| quant_model_dir | 必须传入的量化后的模型文件路径 | +| float_model_dir | 必须传入的量化前的模型文件路径 | +| model_filename | 默认为None,若model_dir为文件夹名,则必须传入以'.pdmodel'结尾的模型名称 | +| params_filename | 默认为None,若model_dir为文件夹名,则必须传入以'.pdiparams'结尾的模型名称 | +| quantizable_op_type | 需分析的量化的op类型,默认为`conv2d`, `depthwise_conv2d`, `mul` | +| qat_metric | 量化模型的精度,可不传入,默认为None,不传入时会自动计算 | +| eval_function | 需要传入自定义的验证函数 | +| data_loader | 模型校准时使用的数据,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader | +| save_dir | 分析后保存模型精度或pdf等文件的文件夹,默认为`analysis_results`| +| resume | 是否加载中间分析文件,默认为False| + + + + + +## 3. 
量化分析工具的使用 +**创建量化分析工具** : +``` +analyzer = AnalysisQAT( + quant_model_dir=config["quant_model_dir"], + float_model_dir=config["float_model_dir"], + model_filename=config["model_filename"], + params_filename=config["params_filename"], + quantizable_op_type=config['quantizable_op_type'], + qat_metric=config['qat_metric'], + eval_function=eval_function, + data_loader=eval_loader, + save_dir=config['save_dir'], + resume=config['resume'], +) +``` + + +**精度误差分析** +``` +analyzer.metric_error_analyse() +``` +调用该接口,会遍历量化模型中的每一层,去掉量化节点并计算当前层不量化的模型精度。调用该接口时,需要输入Eval Function。会产出所有去掉一层量化的模型精度排序,将默认保存在 `./analysis_results/analysis.txt` 中。具体使用可参考[GPT量化训练敏感度分析DEMO](../../../../example/quantization_analysis/GPT/README.md)。 + + +## FAQ: +- 与PTQ(Post Training Quantization)量化分析工具的区别:与PTQ量化分析工具不同的是,QAT量化分析工具加载量化训练后的量化模型,遍历所有量化的层,依次去掉量化层,加载Float模型的参数,并进行验证获取精度误差分析。而PTQ量化分析工具则是加载待量化的原模型,对模型所有层依次进行量化,每次量化一层,进行验证获取精度误差分析。 + +- QAT量化分析工具设计的原因:QAT量化分析工具依次去掉量化层,而不是依次量化一层是由于QAT需要训练的特性。遍历每层进行量化训练再验证精度比较耗时,直接加载量化训练后的量化模型,依次去掉量化层更高效。 + +- 量化分析工具为什么要区分PTQ和QAT:实验证明PTQ和QAT后的量化模型的敏感层并不完全一致,将两种算法分开,敏感度分析结果更加准确。 diff --git a/docs/zh_cn/tutorials/quant/AnalysisQuant.md b/docs/zh_cn/tutorials/quant/AnalysisQuant.md deleted file mode 100644 index 669a126b49bb23c230f55a60d18ba1c31900d5fa..0000000000000000000000000000000000000000 --- a/docs/zh_cn/tutorials/quant/AnalysisQuant.md +++ /dev/null @@ -1,98 +0,0 @@ -# 量化分析工具详细教程 - -## 1. 量化分析工具功能 -1. statistical_analyse: - - 可视化激活和权重箱状图。箱状图可发现是否出现离群点。 - - 可视化权重和激活直方分布图。直方分布图可观察更具体的数值分布。 - - 提供量化前后权重和激活的具体数据信息,包括min,max,mean,std等 - -2. metric_error_analyse: - - 遍历量化模型的每层,并计算量化后精度。该功能可以定位具体某层导致的量化损失。 - -3. get_target_quant_model: - - 输入预期精度,直接产出符合预期精度的量化模型。 - - -## 2. 
paddleslim.quant.AnalysisQuant 可传入参数解析 -```yaml -model_dir -model_filename: None -params_filename: None -eval_function: None -data_loader: None -save_dir: 'analysis_results' -resume: False -ptq_config -``` -- model_dir: 必须传入的模型文件路径,可为文件夹名;若模型为ONNX类型,直接输入'.onnx'模型文件名称即可。 -- model_filename: 默认为None,若model_dir为文件夹名,则必须传入以'.pdmodel'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入。 -- params_filename: 默认为None,若model_dir为文件夹名,则必须传入以'.pdiparams'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入。 -- eval_function:若需要验证精度,需要传入自定义的验证函数。 -- data_loader:模型校准时使用的数据,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader。 -- save_dir:分析后保存模型精度或pdf等文件的文件夹,默认为`analysis_results`。 -- resume:是否加载中间分析文件 -- ptq_config:可传入的离线量化中的参数,详细可参考[离线量化文档](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post)。 - - - - -## 3. 量化分析工具的使用 -**创建量化分析工具** : -``` -analyzer = AnalysisQuant( - model_dir=config["model_dir"], - model_filename=config["model_filename"], - params_filename=config["params_filename"], - eval_function=eval_function, - data_loader=data_loader, - save_dir=config['save_dir'], - ptq_config=config['PTQ']) -``` - -**统计分析** -``` -analyzer.statistical_analyse() -``` - -调用该接口,会统计量化前和量化后每一个可量化权重和其对应激活的数据。只使用该接口可以不输入Eval Function,但需要输入DataLoader,少量数据即可。会产出以下文件: -- `fp_activation_boxplot.pdf`:量化前Float数据类型的模型激活箱状图 -- `fp_weight_boxplot.pdf`:量化前Float数据类型的模型权重箱状图 -- `quantized_activation_boxplot.pdf`:量化后INT数据类型的模型激活箱状图 -- `quantized_weight_boxplot.pdf`:量化后INT数据类型的模型权重箱状图 -- `fp_activation_histplot.pdf`:量化前Float数据类型的模型激活直方图 -- `fp_weight_histplot.pdf`:量化前Float数据类型的模型权重直方图 -- `quantized_activation_histplot.pdf`:量化后INT数据类型的模型激活直方图 -- `quantized_weight_histplot.pdf`:量化后INT数据类型的模型权重直方图 -- `statistic.csv`:量化前后权重和激活的具体数据信息,表格中会保存的信息有: - - Var Name: Variable的名称 - - Var Type:Variable的类型,Weight或Activation - - Corresponding Weight 
Name:如果为Activation,其对应的Weight名称 - - FP32 Min:量化前Float数据类型的最小值 - - FP32 Max:量化前Float数据类型的最大值 - - FP32 Mean:量化前Float数据类型的平均值 - - FP32 Std:量化前Float数据类型的方差值 - - Quantized Min:量化后INT数据类型的最小值 - - Quantized Max:量化后INT数据类型的最大值 - - Quantized Mean:量化后INT数据类型的平均值 - - Quantized Std:量化后INT数据类型的方差值 - - Diff Min:量化前后该Variable的相差的最小值 - - Diff Max:量化前后该Variable的相差的最大值 - - Diff Mean:量化前后该Variable的相差的平均值 - - Diff Std:量化前后该Variable的相差的方差值 - - -**精度误差分析** -``` -analyzer.metric_error_analyse() -``` -调用该接口,会遍历量化模型中的一层,并计算量化该层后模型的损失。调用该接口时,需要输入Eval Function。会产出所有只量化一层的模型精度排序,将默认保存在 `./analysis_results/analysis.txt` 中。 - - - -**直接产出符合预期精度的量化模型** -``` -analyzer.get_target_quant_model(target_metric) -``` - -## 4. 根据分析结果执行离线量化 -执行完量化分析工具后,可根据 `analysis.txt` 中的精度排序,在量化中去掉效果较差的层,具体操作为:在调用 `paddleslim.quant.quant_post_static` 时加入参数 `skip_tensor_list`,将需要去掉的层传入即可。 diff --git a/example/post_training_quantization/detection/README.md b/example/post_training_quantization/detection/README.md index 80c8d7701ec619a3fc8964533a32406a8cd2e02d..b51f5d58102fb9a331e1e5023fe56962a879d1ca 100644 --- a/example/post_training_quantization/detection/README.md +++ b/example/post_training_quantization/detection/README.md @@ -130,7 +130,7 @@ python eval.py --config_path=./configs/ppyoloe_s_ptq.yaml - 要测试的模型路径可以在配置文件中`model_dir`字段下进行修改。 #### 3.6 提高离线量化精度 -本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisQuant```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisQuant```详解见[AnalysisQuant.md](../../../docs/zh_cn/tutorials/quant/AnalysisQuant.md)。 +本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisPTQ```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisPTQ```详解见[AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md)。 经过多个实验,包括尝试多种激活算法(avg,KL等)、weight的量化方式(abs_max,channel_wise_abs_max),对PicoDet-s进行离线量化后精度均为0,以PicoDet-s为例,量化分析工具具体使用方法如下: diff --git 
a/example/post_training_quantization/detection/analysis.py b/example/post_training_quantization/detection/analysis.py index 4acd54e43efc852f56ae5a27c1d3ef1ae4204fa1..7b854d265d55dcd5731679828523068875d7b3a0 100644 --- a/example/post_training_quantization/detection/analysis.py +++ b/example/post_training_quantization/detection/analysis.py @@ -23,7 +23,7 @@ from ppdet.core.workspace import create from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval from keypoint_utils import keypoint_post_process from post_process import PPYOLOEPostProcess -from paddleslim.quant.analysis import AnalysisQuant +from paddleslim.quant.analysis_ptq import AnalysisPTQ def argsparser(): @@ -161,7 +161,7 @@ def main(): else: raise ValueError("metric currently only supports COCO and VOC.") - analyzer = AnalysisQuant( + analyzer = AnalysisPTQ( model_dir=config["model_dir"], model_filename=config["model_filename"], params_filename=config["params_filename"], diff --git a/example/post_training_quantization/pytorch_yolo_series/README.md b/example/post_training_quantization/pytorch_yolo_series/README.md index e012cff3c7fe8f2a8bee2de61c09ba0ea8d3e9f1..dbf23ef26135a66df47590e86e1005499b0eca68 100644 --- a/example/post_training_quantization/pytorch_yolo_series/README.md +++ b/example/post_training_quantization/pytorch_yolo_series/README.md @@ -116,7 +116,9 @@ python eval.py --config_path=./configs/yolov5s_ptq.yaml #### 3.6 提高离线量化精度 -本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisQuant```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisQuant```详解见[AnalysisQuant.md](../../../docs/zh_cn/tutorials/quant/AnalysisQuant.md)。 + +###### 3.6.1 量化分析工具 
+本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisPTQ```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisPTQ```详解见[AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md)。 由于YOLOv6离线量化效果较差,以YOLOv6为例,量化分析工具具体使用方法如下: diff --git a/example/post_training_quantization/pytorch_yolo_series/analysis.py b/example/post_training_quantization/pytorch_yolo_series/analysis.py index 39d879f0ccd9d2a1acc79980c8c0aa3c3257066c..088c7aec684b5ebd3aacc83dac6ff02bb9886973 100644 --- a/example/post_training_quantization/pytorch_yolo_series/analysis.py +++ b/example/post_training_quantization/pytorch_yolo_series/analysis.py @@ -21,7 +21,7 @@ from tqdm import tqdm from post_process import YOLOPostProcess, coco_metric from dataset import COCOValDataset, COCOTrainDataset from paddleslim.common import load_config, load_onnx_model -from paddleslim.quant.analysis import AnalysisQuant +from paddleslim.quant.analysis_ptq import AnalysisPTQ def argsparser(): @@ -103,7 +103,7 @@ def main(): load_onnx_model(config["model_dir"]) inference_model_path = config["model_dir"].rstrip().rstrip( '.onnx') + '_infer' - analyzer = AnalysisQuant( + analyzer = AnalysisPTQ( model_dir=inference_model_path, model_filename='model.pdmodel', params_filename='model.pdiparams', diff --git a/example/quantization_analysis/GPT/README.md b/example/quantization_analysis/GPT/README.md new file mode 100644 index 0000000000000000000000000000000000000000..007c37ce180ed611c006812df00793f825957c10 --- /dev/null +++ b/example/quantization_analysis/GPT/README.md @@ -0,0 +1,46 @@ +# GPT量化训练敏感度分析示例 + + +## 1. 
简介
+本示例将以自然语言处理生成模型GPT-3为例,介绍如何使用量化训练敏感度分析工具分析量化模型,以及提升量化训练精度。
+
+## 2. Benchmark
+| 模型 | 策略 | ACC | Inference模型 |
+| :-------- |:-------- | :--------: | :--------: |
+| GPT-345M | Baseline | 44.17 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_Baseline.tar) |
+| GPT-345M | 量化训练(分析前) | 41.58 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345_QAT_wo_analysis.tar) |
+| GPT-345M | 量化训练(分析后) | 44.94 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_QAT_w_analysis_infer.tar) |
+
+- ACC指标均基于[LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl)数据集,采用 ACC(accuracy) 指标评测得到
+
+## 3. 量化分析流程
+#### 3.1 准备环境
+- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装)
+- PaddleSlim develop版本
+- PaddleFleetX >= 2.4
+
+#### 3.2 准备数据集
+
+量化敏感度分析基于验证集获得每层的敏感度,可下载和使用 [LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl) 或者 [WikiText](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip) 数据集。本示例使用LAMBADA数据集来进行敏感度分析。
+
+#### 3.3 准备预测模型
+- [GPT-345M](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_Baseline.tar) :Base模型
+- [GPT-345M](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345_QAT_wo_analysis.tar) :分析前量化训练后的模型
+
+#### 3.4 量化敏感度分析
+量化敏感度分析示例通过analysis.py脚本启动,会使用接口```paddleslim.quant.AnalysisQAT```对模型进行敏感度分析。在config文件中配置模型路径、数据路径和量化相关的参数,配置完成后便可对模型进行敏感度分析。具体运行命令为:
+
+```shell
+python analysis.py --config_path=./configs/gpt_345M_analysis.yaml
+```
+
+分析完成后,会产生排序好的层敏感度(敏感度由大到小排序,敏感度越大说明该层量化对模型精度的负向影响越大),并保存在```analysis_results/analysis.txt```中。
+敏感度排序前10层分别为:```linear_31```,```linear_27```,```linear_22```,```linear_43```,```linear_83```,```linear_15```,```linear_87```,```linear_3```,```linear_38```,```linear_39```。这十层中有八层属于```TransformerDecoder```中第二个FFN层,两层属于```TransformerDecoder```中第一个FFN层,而```MultiHeadAttention```中的Linear层都相对不敏感。
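+上述命令中 `--config_path` 所指的配置,可参考本示例自带的 `./configs/gpt_345M_analysis.yaml`(各字段与```paddleslim.quant.AnalysisQAT```的入参对应,模型与数据路径需按实际情况修改):
+
+```yaml
+Global:
+  device: gpu
+  seed: 1024
+  quant_model_dir: ./GPT_345_QAT_wo_analysis
+  float_model_dir: ./GPT_345M_Baseline
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+  quantizable_op_type: ["mul", "matmul", "matmul_v2"]
+  resume: False
+  reader_config: ./configs/gpt_reader.yaml
+  cloze_eval: True # LAMBADA数据集为True;WikiText数据集为False
+```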
+```paddleslim.quant.AnalysisQAT```详解见[AnalysisQAT.md](../../../docs/zh_cn/tutorials/quant/AnalysisQAT.md)。 + +#### 3.5 重新量化训练 + +根据分析结果,重新量化训练时,去掉了```linear_31```,```linear_27```,```linear_22```,```linear_43```,```linear_83```,```linear_15```,```linear_87```七层Linear的量化,最后量化模型精度达到44.94。 diff --git a/example/quantization_analysis/GPT/analysis.py b/example/quantization_analysis/GPT/analysis.py new file mode 100644 index 0000000000000000000000000000000000000000..d41818e6113172af57fe5926da96062f363fb0ee --- /dev/null +++ b/example/quantization_analysis/GPT/analysis.py @@ -0,0 +1,188 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import math
+ +import os +import sys +import random +import numpy as np +import argparse +import time + +import paddle +from paddleslim.common import load_config as load_slim_config +from paddleslim.quant.analysis_qat import AnalysisQAT +from ppfleetx.data import build_dataloader +from ppfleetx.distributed.apis import env +from utils import parse_config + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--save_dir', + type=str, + default='analysis_results', + help="directory to save compressed model.") + parser.add_argument( + '--devices', + type=str, + default='gpu', + help="which device used to compress.") + return parser + + +def eval_reader_wrapper(reader): + def gen(): + for data in reader: + tokens, loss_mask, attention_mask, position_ids, labels, info = data + in_dict = {} + in_dict['tokens'] = tokens + in_dict['ids'] = position_ids + yield in_dict, labels, loss_mask, info + + return gen + + +def eval_function(exe, program, feed_names, fetch_list): + tic_eval = time.time() + score_name = "loss" if not global_config['cloze_eval'] else "number correct" + first_step = True + eval_losses = [] + total_score = 0 + for eval_step, (data, labels, loss_mask, info) in enumerate(eval_loader()): + preds = exe.run(program=program, + feed=data, + fetch_list=fetch_list, + return_numpy=False) + + paddle.disable_static() + + labels = paddle.to_tensor(labels) + preds = paddle.to_tensor(preds[0]) + loss_mask = paddle.to_tensor(loss_mask) + info = paddle.to_tensor(info) + + if not global_config['cloze_eval']: + if first_step: + num_original_tokens = info.numpy()[0][0] + num_tokenized_tokens = info.numpy()[0][1] + first_step = False + + masked_lm_loss = paddle.nn.functional.cross_entropy( + preds, labels, reduction="none") + loss = paddle.sum(masked_lm_loss * loss_mask) + eval_losses.append(loss.numpy()[0]) 
+ total_score += loss.numpy() / (num_tokenized_tokens - 1) + + else: + if first_step: + num_examples = info.numpy()[0][0] + first_step = False + outputs = paddle.argmax(preds, -1) + acc = paddle.cast(outputs == labels, 'float32') + acc = paddle.where( + paddle.cast(loss_mask, 'bool'), acc, paddle.ones_like(acc)) + acc = paddle.sum(paddle.prod(acc, -1)) + eval_losses.append(acc.numpy()[0]) + total_score += acc.numpy()[0] + + if eval_step != 0 and (eval_step % 10 == 0): + print("[eval] step: %d, batch: %d, %s: %.9f, speed: %.2f step/s" % + (eval_step, eval_step, score_name, total_score, + 1. / (time.time() - tic_eval))) + tic_eval = time.time() + paddle.enable_static() + + metric = None + if not global_config['cloze_eval']: + total_loss = float(total_score) + ppl = math.exp(min(20, total_loss)) + token_ratio = (num_tokenized_tokens - 1) / (num_original_tokens - 1) + adjusted_ppl = math.exp(min(20, total_loss * token_ratio)) + string = ' validation results on {} | '.format(gpt_config['Data'][ + 'Eval']['dataset']['name']) + string += 'avg loss: {:.4E} | '.format(total_loss) + string += 'ppl: {:.4E} | '.format(ppl) + string += 'adjusted ppl: {:.4E} | '.format(adjusted_ppl) + string += 'token ratio: {} |'.format(token_ratio) + metric = ppl + else: + num_correct = float(total_score) + acc = float(num_correct / num_examples) + string = ' validation results on {} | '.format(gpt_config['Data'][ + 'Eval']['dataset']['name']) + string += 'number correct: {:.4E} | '.format(num_correct) + string += 'total examples: {:.4E} | '.format(num_examples) + string += 'avg accuracy: {:.4E}'.format(acc) + metric = acc + + print(string) + return metric + + +def main(): + global global_config, all_config + all_config = load_slim_config(FLAGS.config_path) + assert "Global" in all_config, "Key 'Global' not found in config file. 
\n{}".format( + all_config) + global_config = all_config["Global"] + + seed = all_config['Global']['seed'] + random.seed(seed) + np.random.seed(seed) + paddle.seed(seed) + env.set_seed(seed) + + global gpt_config + gpt_config = parse_config(global_config['reader_config']) + + if not global_config['cloze_eval']: + gpt_config['Data']['Eval']['dataset']['name'] = "LM_Eval_Dataset" + else: + gpt_config['Data']['Eval']['dataset']['name'] = "Lambada_Eval_Dataset" + + valid_data_loader = build_dataloader(gpt_config['Data'], "Eval") + + global eval_loader + eval_loader = eval_reader_wrapper(valid_data_loader) + + analyzer = AnalysisQAT( + quant_model_dir=global_config["quant_model_dir"], + float_model_dir=global_config["float_model_dir"], + model_filename=global_config["model_filename"], + params_filename=global_config["params_filename"], + quantizable_op_type=global_config['quantizable_op_type'], + qat_metric=global_config['qat_metric'] + if 'qat_metric' in global_config else None, + eval_function=eval_function, + data_loader=eval_loader, + save_dir=FLAGS.save_dir, + resume=global_config['resume'], ) + analyzer.metric_error_analyse() + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu'] + paddle.set_device(FLAGS.devices) + + main() diff --git a/example/quantization_analysis/GPT/configs/gpt_345M_analysis.yaml b/example/quantization_analysis/GPT/configs/gpt_345M_analysis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1be19fc556835aee30e067f2d0f1777644a97e8b --- /dev/null +++ b/example/quantization_analysis/GPT/configs/gpt_345M_analysis.yaml @@ -0,0 +1,15 @@ +Global: + device: gpu + seed: 1024 + quant_model_dir: ./GPT_345_QAT_wo_analysis + float_model_dir: ./GPT_345M_Baseline + model_filename: model.pdmodel + params_filename: model.pdiparams + quantizable_op_type: ["mul", "matmul", "matmul_v2"] + resume: False + reader_config: 
./configs/gpt_reader.yaml + cloze_eval: True # True for LAMBADA Dataset; False for WikiText + + + + \ No newline at end of file diff --git a/example/quantization_analysis/GPT/configs/gpt_reader.yaml b/example/quantization_analysis/GPT/configs/gpt_reader.yaml new file mode 100644 index 0000000000000000000000000000000000000000..55612323e897c39541bfcf5defca4f3f81cac939 --- /dev/null +++ b/example/quantization_analysis/GPT/configs/gpt_reader.yaml @@ -0,0 +1,13 @@ +Data: + Eval: + dataset: + name: GPTDataset + input_dir: ./lambada_test.jsonl + max_seq_len: 1024 + overlapping_eval: 32 + loader: + num_workers: 1 + return_list: True + collate_fn: gpt_collate_fn + batch_size: 1 + \ No newline at end of file diff --git a/example/quantization_analysis/GPT/utils.py b/example/quantization_analysis/GPT/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..42e62b5fa1502d24cb513bfb7e03a85a509c048e --- /dev/null +++ b/example/quantization_analysis/GPT/utils.py @@ -0,0 +1,110 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import os
+import argparse +import codecs +import yaml +import time +import copy + + +class AttrDict(dict): + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __copy__(self): + cls = self.__class__ + result = cls.__new__(cls) + result.__dict__.update(self.__dict__) + return result + + def __deepcopy__(self, memo): + cls = self.__class__ + result = cls.__new__(cls) + memo[id(self)] = result + for k, v in self.__dict__.items(): + setattr(result, k, copy.deepcopy(v, memo)) + for k, v in self.items(): + setattr(result, k, copy.deepcopy(v, memo)) + return result + + def setdefault(self, k, default=None): + if k not in self or self[k] is None: + self[k] = default + return default + else: + return self[k] + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + + def _update_dic(dic, base_dic): + '''Update config from dic based base_dic + ''' + base_dic = base_dic.copy() + dic = dic.copy() + + if dic.get('_inherited_', True) == False: + dic.pop('_inherited_') + return dic + + for key, val in dic.items(): + if isinstance(val, dict) and key in base_dic: + base_dic[key] = _update_dic(val, base_dic[key]) + else: + base_dic[key] = val + dic = base_dic + return dic + + def _parse_from_yaml(path): + '''Parse a yaml file and build config''' + + with codecs.open(path, 'r', 'utf-8') as file: + dic = yaml.load(file, Loader=yaml.FullLoader) + + if '_base_' in dic: + cfg_dir = os.path.dirname(path) + base_path = dic.pop('_base_') + base_path = os.path.join(cfg_dir, 
base_path)
+            base_dic = _parse_from_yaml(base_path)
+            dic = _update_dic(dic, base_dic)
+        return dic
+
+    yaml_dict = _parse_from_yaml(cfg_file)
+    yaml_config = AttrDict(yaml_dict)
+
+    create_attr_dict(yaml_config)
+    return yaml_config
diff --git a/paddleslim/quant/analysis.py b/paddleslim/quant/analysis_ptq.py
similarity index 98%
rename from paddleslim/quant/analysis.py
rename to paddleslim/quant/analysis_ptq.py
index d2e12c94d346a3369dcc5cedbb0eda18c9c689d5..c207eb56f77d93be3d273234c3ba635ca865d4bb 100644
--- a/paddleslim/quant/analysis.py
+++ b/paddleslim/quant/analysis_ptq.py
@@ -37,10 +37,10 @@ from ..common import get_feed_vars, wrap_dataloader, load_inference_model, get_m
 
 _logger = get_logger(__name__, level=logging.INFO)
 
-__all__ = ["AnalysisQuant"]
+__all__ = ["AnalysisPTQ"]
 
 
-class AnalysisQuant(object):
+class AnalysisPTQ(object):
     def __init__(self,
                  model_dir,
                  model_filename=None,
@@ -51,7 +51,7 @@ class AnalysisQuant(object):
                  resume=False,
                  ptq_config=None):
         """
-        AnalysisQuant provides to analysis the sensitivity of each op in the model.
+        AnalysisPTQ analyzes the sensitivity of each op in the model.
Args: model_dir(str): the path of fp32 model that will be quantized, it can also be '.onnx' @@ -403,7 +403,8 @@ class AnalysisQuant(object): statistic = [] box_fp_dist, box_q_dist = [], [] hist_fp_dist, hist_q_dist = {}, {} - for var_name in fp_tensors: + fp_tensor_names = sorted(list(fp_tensors.keys())) + for var_name in fp_tensor_names: fp_tensor = fp_tensors[var_name] quant_name = var_name_map[ var_name] if var_name_map is not None else var_name @@ -503,7 +504,9 @@ class AnalysisQuant(object): for name in hist_data: plt.hist(hist_data[name][0], bins=hist_data[name][1]) plt.xlabel(name) - plt.ylabel("Frequency") + plt.ylabel("Probability") + locs, _ = plt.yticks() + plt.yticks(locs, np.round(locs / len(hist_data[name][0]), 3)) if 'act' in save_name: plt.title("Hist of Activation {}".format(name)) else: diff --git a/paddleslim/quant/analysis_qat.py b/paddleslim/quant/analysis_qat.py new file mode 100644 index 0000000000000000000000000000000000000000..98a990333ca977e48c7d91967ee8767d36d38aa3 --- /dev/null +++ b/paddleslim/quant/analysis_qat.py @@ -0,0 +1,266 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import os
+import sys
+import pickle
+import copy
+import logging
+import numpy as np
+
+import paddle
+from paddle.fluid import core
+from paddle.fluid.framework import IrGraph
+from ..common import get_logger, load_inference_model
+
+_logger = get_logger(__name__, level=logging.INFO)
+
+__all__ = ["AnalysisQAT"]
+
+
+class AnalysisQAT(object):
+    def __init__(self,
+                 quant_model_dir,
+                 float_model_dir,
+                 model_filename=None,
+                 params_filename=None,
+                 quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
+                 qat_metric=None,
+                 eval_function=None,
+                 data_loader=None,
+                 save_dir='analysis_results',
+                 resume=False):
+        '''
+        AnalysisQAT analyzes the sensitivity of each op in the model.
+
+        Args:
+            quant_model_dir(str): the path of the INT8 model quantized through QAT
+            float_model_dir(str): the path of the FP32 model that the quant model is based on
+            model_filename(str, optional): the model file name of the model
+            params_filename(str, optional): the parameter file name of the model
+            quantizable_op_type(list of str, optional): the types of ops that will be analyzed
+            qat_metric(float, optional): the metric of the quantized model, which will be calculated automatically if None
+            eval_function(function): eval function, defined by yourself, that returns the metric of the inference program; used to judge the metric of the quantized model
+            data_loader(Python Generator, Paddle.io.DataLoader, optional): the
+                Generator or DataLoader that provides calibration data and returns
+                a batch every time
+            save_dir(str, optional): the output dir that stores the analyzed information
+            resume(bool, optional): whether to resume an interrupted analysis and load the already analyzed information.
+        '''
+        if model_filename is None:
+            model_filename = 'model.pdmodel'
+        if params_filename is None:
+            params_filename = 'model.pdiparams'
+        self.quant_model_dir = quant_model_dir
+        self.float_model_dir = float_model_dir
+        self.model_filename = model_filename
+        self.params_filename = params_filename
+        self.quantizable_op_type = quantizable_op_type
+        self.qat_metric = qat_metric
+        self.eval_function = eval_function
+        self.save_dir = save_dir
+        self.checkpoint_name = os.path.join(save_dir,
+                                            'analysis_checkpoint.pkl')
+        self.nonquant_layer_metrics = {}
+        if not os.path.exists(self.save_dir):
+            os.makedirs(self.save_dir)
+
+        devices = paddle.device.get_device().split(':')[0]
+        self.places = paddle.device._convert_to_place(devices)
+        executor = paddle.static.Executor(self.places)
+        [program, self.feed_list, self.fetch_list] = load_inference_model(
+            self.quant_model_dir,
+            executor=executor,
+            model_filename=self.model_filename,
+            params_filename=self.params_filename)
+        _logger.info('Loaded model from: {}'.format(quant_model_dir))
+
+        graph = IrGraph(core.Graph(program.desc), for_test=True)
+
+        # Find all inputs of each quantizable op; an op counts as quantized
+        # if any of its inputs carries the 'quantized' name suffix.
+        self.inputs_of_quantized_op = []
+        sorted_ops = graph.topology_sort()
+        for op_node in sorted_ops:
+            op_name = op_node.name()
+            if op_name in quantizable_op_type:
+                input_names = op_node.op().input_arg_names()
+                for input_name in input_names:
+                    if 'quantized' in input_name:
+                        self.inputs_of_quantized_op.append(input_names)
+                        break
+
+        if self.qat_metric is None:
+            _logger.info('Calculating the metric of QAT model...')
+            self.qat_metric = self.eval_function(
+                executor, program, self.feed_list, self.fetch_list) * 100
+            _logger.info('The metric of QAT model is {}'.format(
+                round(self.qat_metric, 4)))
+        executor.close()
+
+    def save_checkpoint(self):
+        if not os.path.exists(self.save_dir):
+            os.makedirs(self.save_dir)
+        with open(self.checkpoint_name, 'wb') as f:
+            pickle.dump(self.nonquant_layer_metrics, f)
+        _logger.info('Saved checkpoint to {}.'.format(self.checkpoint_name))
+
+    def load_checkpoint(self):
+        if not os.path.exists(self.checkpoint_name):
+            _logger.info('Checkpoint path {} does not exist.'.format(
+                self.checkpoint_name))
+            return False
+        with open(self.checkpoint_name, 'rb') as f:
+            self.nonquant_layer_metrics = pickle.load(f)
+        _logger.info('Loaded checkpoint from {}.'.format(
+            self.checkpoint_name))
+        return True
+
+    def get_weight_name(self, inputs_names):
+        # TODO(xc): replace this naming-convention heuristic ('w_0' marks
+        # the weight input) with a robust weight lookup.
+        w_idx = 0 if 'w_0' in inputs_names[0] else 1
+        weight_name = inputs_names[w_idx].split('.quantized.dequantized')[0]
+        return weight_name
+
+    def get_new_in_out_map(self, input_list, graph, float_scope,
+                           quant_scope):
+        input_rename_map = {}
+        output_rename_map = {}
+        removed_ops = []
+        for op_node in graph.all_op_nodes():
+            if op_node.id() in removed_ops:
+                continue
+            out_names = op_node.output_arg_names()
+            if len(out_names) == 1 and out_names[0] in input_list:
+                in_var = graph._find_node_by_name(op_node.inputs,
+                                                  op_node.input('X')[0])
+                out_var = graph._find_node_by_name(op_node.outputs,
+                                                   op_node.output('Y')[0])
+                if 'quantized' in in_var.name():
+                    # Activation: remove the quant/dequant op pair and link
+                    # the original float input to the downstream op.
+                    for op in graph.all_op_nodes():
+                        o_ns = op.output_arg_names()
+                        if len(o_ns) == 1 and o_ns[0] == in_var.name():
+                            in_var_1 = graph._find_node_by_name(
+                                op.inputs, op.input('X')[0])
+                            graph.safe_remove_nodes(op)
+                            removed_ops.append(op.id())
+                            input_rename_map[out_var.node] = in_var_1
+                else:
+                    # Weight: overwrite the quantized weight tensor with its
+                    # counterpart from the float model's scope.
+                    with paddle.static.scope_guard(float_scope):
+                        float_weight = np.array(
+                            float_scope.find_var(in_var.name()).get_tensor())
+                    with paddle.static.scope_guard(quant_scope):
+                        quant_scope.find_var(in_var.name()).get_tensor().set(
+                            float_weight, self.places)
+                    input_rename_map[out_var.node] = in_var
+                graph.safe_remove_nodes(op_node)
+                removed_ops.append(op_node.id())
+                output_rename_map[in_var.node] = out_var
+
+        return input_rename_map, output_rename_map, removed_ops
+
+    def relink_graph(self, graph, input_rename_map, output_rename_map,
+                     removed_ops):
+        for op_node in graph.all_op_nodes():
+            if op_node.id() in removed_ops:
+                continue
+            for var in op_node.inputs:
+                if var.node in input_rename_map:
+                    old_in = var
+                    new_in = input_rename_map[var.node]
+                    graph.update_input_link(old_in, new_in, op_node)
+                    _logger.info(
+                        f'relink {op_node.name()}\'s input node from {old_in.name()} to {new_in.name()}.'
+                    )
+            for var in op_node.outputs:
+                if var.node in output_rename_map:
+                    old_out = var
+                    new_out = output_rename_map[var.node]
+                    graph.update_output_link(old_out, new_out, op_node)
+                    _logger.info(
+                        f'relink {op_node.name()}\'s output node from {old_out.name()} to {new_out.name()}.'
+                    )
+
+        return graph.to_program()
+
+    def metric_error_analyse(self):
+        executor = paddle.static.Executor(self.places)
+
+        float_scope = paddle.static.Scope()
+        quant_scope = paddle.static.Scope()
+
+        for idx, input_list in enumerate(self.inputs_of_quantized_op):
+            weight_name = self.get_weight_name(input_list)
+            _logger.info(
+                'Checking {}/{} quant model: without quant layer {}'.format(
+                    idx + 1, len(self.inputs_of_quantized_op), weight_name))
+
+            with paddle.static.scope_guard(float_scope):
+                load_inference_model(
+                    self.float_model_dir,
+                    executor=executor,
+                    model_filename=self.model_filename,
+                    params_filename=self.params_filename)
+
+            with paddle.static.scope_guard(quant_scope):
+                [program, self.feed_list,
+                 self.fetch_list] = load_inference_model(
+                     self.quant_model_dir,
+                     executor=executor,
+                     model_filename=self.model_filename,
+                     params_filename=self.params_filename)
+
+            program_copy = program.clone()
+            graph = IrGraph(core.Graph(program_copy.desc), for_test=True)
+            input_rename_map, output_rename_map, removed_ops = self.get_new_in_out_map(
+                input_list, graph, float_scope, quant_scope)
+            saved_program = self.relink_graph(graph, input_rename_map,
+                                              output_rename_map, removed_ops)
+            with paddle.static.scope_guard(quant_scope):
+                _logger.info('Skip quant {}, evaluating...'.format(
+                    weight_name))
+                metric = self.eval_function(executor, saved_program,
+                                            self.feed_list,
+                                            self.fetch_list) * 100
+                self.nonquant_layer_metrics[weight_name] = metric
+                _logger.info(
+                    'When skipping quant of {}, the metric is {} and the diff is {}.'.
+                    format(weight_name,
+                           round(metric, 4),
+                           round(metric - self.qat_metric, 4)))
+                self.save_checkpoint()
+
+        executor.close()
+
+        self.sensitivity_ranklist = sorted(
+            self.nonquant_layer_metrics,
+            key=self.nonquant_layer_metrics.get,
+            reverse=True)
+        _logger.info('Finished computing the sensitivity of the model.')
+        for name in self.sensitivity_ranklist:
+            _logger.info("without quant layer name: {}, eval metric: {}".format(
+                name, self.nonquant_layer_metrics[name]))
+
+        analysis_file = os.path.join(self.save_dir, "analysis.txt")
+        with open(analysis_file, "w") as analysis_ret_f:
+            for name in self.sensitivity_ranklist:
+                analysis_ret_f.write(
+                    "without quant layer name: {}, eval metric: {}\n".format(
+                        name, self.nonquant_layer_metrics[name]))
+        _logger.info('Analysis file is saved in {}'.format(analysis_file))