Unverified · Commit 45c8f7ce · authored by Chang Xu · committed by GitHub

[Cherry-Pick] Analysis Quant (#1599)

Parent 9b5c5202
# Detailed Tutorial for the PTQ (Post Training Quantization) Analysis Tool
## 1. Features of the Quantization Analysis Tool
1. Statistical analysis (statistical_analyse)
   - Visualizes boxplots of activations and weights; boxplots reveal whether outliers are present.
   - Visualizes histograms of weights and activations; histograms show the value distribution in more detail.
   - Reports detailed statistics of weights and activations before and after quantization, including min, max, mean, std, etc.
2. Metric error analysis (metric_error_analyse)
   - Traverses every layer of the model, quantizes it, and computes the post-quantization metric. This locates the specific layer responsible for the quantization loss.
3. Target model generation (get_target_quant_model)
   - Given an expected metric, directly produces a quantized model that meets it.
## 2. Parameters of paddleslim.quant.AnalysisPTQ
| **Parameter** | **Description** |
|-----------------------------|-----------------------------------------|
| model_dir | Required path to the model; may be a directory. For an ONNX model, pass the '.onnx' model file name directly |
| model_filename | Defaults to None. If model_dir is a directory, the model file name ending in '.pdmodel' is required; not needed when model_dir is an '.onnx' file name |
| params_filename | Defaults to None. If model_dir is a directory, the parameter file name ending in '.pdiparams' is required; not needed when model_dir is an '.onnx' file name |
| eval_function | A custom evaluation function; required if you want to verify accuracy |
| data_loader | Data used to calibrate the model; the DataLoader inherits from `paddle.io.DataLoader`. You can reuse a DataLoader from a model suite, or build your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader) |
| save_dir | Directory where metrics, PDFs, and other analysis outputs are saved; defaults to `analysis_results` |
| resume | Whether to load intermediate analysis files; defaults to False |
| ptq_config | Parameters forwarded to post-training quantization; see the [PTQ documentation](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post) for details |
...@@ -39,14 +32,14 @@ ptq_config
## 3. Using the Quantization Analysis Tool
**Create the analysis tool**
```
analyzer = AnalysisPTQ(
    model_dir=config["model_dir"],
    model_filename=config["model_filename"],
    params_filename=config["params_filename"],
    eval_function=eval_function,
    data_loader=data_loader,
    save_dir=config['save_dir'],
    ptq_config=config['PTQ'])
```
**Statistical analysis**
...@@ -64,21 +57,21 @@ analyzer.statistical_analyse()
- `quantized_activation_histplot.pdf`: histogram of the quantized (INT) model's activations
- `quantized_weight_histplot.pdf`: histogram of the quantized (INT) model's weights
- `statistic.csv`: detailed statistics of weights and activations before and after quantization; the table stores:
  - Var Name: name of the Variable
  - Var Type: type of the Variable, Weight or Activation
  - Corresponding Weight Name: for an Activation, the name of its corresponding Weight
  - FP32 Min: minimum of the float data before quantization
  - FP32 Max: maximum of the float data before quantization
  - FP32 Mean: mean of the float data before quantization
  - FP32 Std: standard deviation of the float data before quantization
  - Quantized Min: minimum of the INT data after quantization
  - Quantized Max: maximum of the INT data after quantization
  - Quantized Mean: mean of the INT data after quantization
  - Quantized Std: standard deviation of the INT data after quantization
  - Diff Min: minimum of the difference in this Variable before vs. after quantization
  - Diff Max: maximum of the difference in this Variable before vs. after quantization
  - Diff Mean: mean of the difference in this Variable before vs. after quantization
  - Diff Std: standard deviation of the difference in this Variable before vs. after quantization
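These fields can be mined directly to spot risky layers. As a minimal sketch (assuming the CSV column names match the list above, which is worth verifying against your own `statistic.csv`), ranking variables by the spread of their quantization error:

```python
import csv

def rank_by_diff_std(csv_path, top_k=5):
    """Return (Var Name, Diff Std) pairs with the largest quantization-error
    spread, read from a statistic.csv produced by statistical_analyse()."""
    with open(csv_path, newline='') as f:
        rows = list(csv.DictReader(f))
    # sort by the spread of the before/after difference, largest first
    rows.sort(key=lambda r: float(r['Diff Std']), reverse=True)
    return [(r['Var Name'], float(r['Diff Std'])) for r in rows[:top_k]]
```

Variables at the top of this ranking are good first candidates to inspect in the boxplots and histograms.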
**Metric error analysis**
...@@ -89,10 +82,18 @@ analyzer.metric_error_analyse()
**Directly produce a target quantized model that meets the expected metric**
```
analyzer.get_target_quant_model(target_metric)
```
## 4. Running Post-Training Quantization Based on the Analysis Results
After running the analysis tool, you can use the metric ranking in `analysis.txt` to drop the worst-performing layers from quantization. Concretely, when calling `paddleslim.quant.quant_post_static`, pass the layers to be skipped via the `skip_tensor_list` argument.
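The exact line format of `analysis.txt` can differ between versions; assuming each line looks like `layer name: <name>, eval metric: <value>` and the file is sorted from most to least sensitive, a small hypothetical helper (not part of PaddleSlim) to build `skip_tensor_list` could look like:

```python
def build_skip_tensor_list(analysis_path, top_k=5):
    """Collect the top_k most sensitive layer names from analysis.txt,
    assuming lines like '... name: <layer>, eval metric: <value>'
    sorted from most to least sensitive."""
    names = []
    with open(analysis_path) as f:
        for line in f:
            if 'name:' not in line:
                continue
            # take the text between 'name:' and the following comma
            names.append(line.split('name:', 1)[1].split(',', 1)[0].strip())
    return names[:top_k]
```

The returned list can then be passed as `skip_tensor_list` to `paddleslim.quant.quant_post_static`.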
## FAQ:
- Difference from the QAT (Quantization-Aware Training) analysis tool: unlike the QAT tool, the PTQ analysis tool loads the original, to-be-quantized model and quantizes its layers one at a time, evaluating after each to obtain the metric error. The QAT tool instead loads a model produced by quantization-aware training, traverses all quantized layers, removes the quantization of one layer at a time while restoring the float parameters, and evaluates to obtain the metric error.
- Why the PTQ analysis tool is designed this way: it quantizes one layer at a time, rather than removing quantization layers one at a time, because PTQ itself is fast. Quantizing a single layer and evaluating gives a very direct view of that layer's impact on model accuracy.
- Why the analysis tools distinguish PTQ from QAT: experiments show that the sensitive layers of PTQ- and QAT-quantized models do not fully coincide, so separating the two algorithms makes the sensitivity analysis more accurate.
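The PTQ loop described in the first FAQ item can be sketched in a few lines (here `quantize_one` and `evaluate` are stand-in callables for illustration, not real PaddleSlim APIs):

```python
def ptq_sensitivity_scan(layers, baseline_metric, quantize_one, evaluate):
    """Quantize one layer at a time, evaluate the model, and rank layers
    by the metric drop they cause (most sensitive first)."""
    drop = {}
    for layer in layers:
        model = quantize_one(layer)  # PTQ applied to this single layer only
        drop[layer] = baseline_metric - evaluate(model)
    return sorted(drop, key=drop.get, reverse=True)
```

`metric_error_analyse` follows this shape internally: the most sensitive layers surface at the top of the ranking it writes to `analysis.txt`.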
# Detailed Tutorial for the QAT (Quantization-Aware Training) Analysis Tool
## 1. Features of the Quantization Analysis Tool
Metric error analysis (metric_error_analyse):
- Traverses every layer of the quantization-trained model, removes its quantization nodes, and computes the model metric with that layer left unquantized. This locates the specific layer responsible for the quantization loss.
## 2. Parameters of paddleslim.quant.AnalysisQAT
| **Parameter** | **Description** |
|-----------------------------|-----------------------------------------|
| quant_model_dir | Required path to the quantized model |
| float_model_dir | Required path to the pre-quantization (float) model |
| model_filename | Defaults to None. If the model path is a directory, the model file name ending in '.pdmodel' is required |
| params_filename | Defaults to None. If the model path is a directory, the parameter file name ending in '.pdiparams' is required |
| quantizable_op_type | Quantized op types to analyze; defaults to `conv2d`, `depthwise_conv2d`, `mul` |
| qat_metric | Metric of the quantized model; optional, defaults to None, in which case it is computed automatically |
| eval_function | A custom evaluation function, required |
| data_loader | Data used to calibrate the model; the DataLoader inherits from `paddle.io.DataLoader`. You can reuse a DataLoader from a model suite, or build your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader) |
| save_dir | Directory where metrics, PDFs, and other analysis outputs are saved; defaults to `analysis_results` |
| resume | Whether to load intermediate analysis files; defaults to False |
## 3. Using the Quantization Analysis Tool
**Create the analysis tool**
```
analyzer = AnalysisQAT(
    quant_model_dir=config["quant_model_dir"],
    float_model_dir=config["float_model_dir"],
    model_filename=config["model_filename"],
    params_filename=config["params_filename"],
    quantizable_op_type=config['quantizable_op_type'],
    qat_metric=config['qat_metric'],
    eval_function=eval_function,
    data_loader=eval_loader,
    save_dir=config['save_dir'],
    resume=config['resume'],
)
```
**Metric error analysis**
```
analyzer.metric_error_analyse()
```
Calling this interface traverses every layer of the quantized model, removes its quantization nodes, and computes the model metric with that layer left unquantized. An Eval Function must be supplied. The resulting ranking of metrics, one per model with a single quantization layer removed, is saved to `./analysis_results/analysis.txt` by default. For a concrete example, see the [GPT QAT sensitivity analysis demo](../../../../example/quantization_analysis/GPT/README.md)
## FAQ:
- Difference from the PTQ (Post Training Quantization) analysis tool: unlike the PTQ tool, the QAT analysis tool loads a model produced by quantization-aware training, traverses all quantized layers, removes the quantization of one layer at a time while restoring the float parameters, and evaluates to obtain the metric error. The PTQ tool instead loads the original, to-be-quantized model, quantizes its layers one at a time, and evaluates after each to obtain the metric error.
- Why the QAT analysis tool is designed this way: it removes quantization layers one at a time, rather than quantizing one layer at a time, because QAT requires training. Running quantization training for every layer and then evaluating would be slow; loading the already-trained quantized model and removing one quantization layer at a time is far more efficient.
- Why the analysis tools distinguish PTQ from QAT: experiments show that the sensitive layers of PTQ- and QAT-quantized models do not fully coincide, so separating the two algorithms makes the sensitivity analysis more accurate.
...@@ -130,7 +130,7 @@ python eval.py --config_path=./configs/ppyoloe_s_ptq.yaml
- The path of the model under test can be changed via the `model_dir` field of the config file.
#### 3.6 Improving Post-Training Quantization Accuracy
This section shows how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is easy to use, and produces a quantized model quickly, but it often causes a large accuracy drop. PaddleSlim provides an analysis tool, exposed through the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes the layers unsuitable for quantization; skipping those layers improves the accuracy of the quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md)
After many experiments, including trying several activation calibration algorithms (avg, KL, etc.) and weight quantization schemes (abs_max, channel_wise_abs_max), the post-training-quantized PicoDet-s still scored 0 accuracy. Taking PicoDet-s as the example, the analysis tool is used as follows:
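Those experiments amount to a small grid search over calibration settings. A sketch of enumerating the combinations (the algorithm names below follow common PaddleSlim PTQ options mentioned above, so check them against the version you use):

```python
import itertools

# Candidate calibration settings; names are assumptions based on common
# PaddleSlim PTQ options, verify against your installed version.
activation_algos = ['avg', 'KL', 'hist', 'mse']
weight_quantize_types = ['abs_max', 'channel_wise_abs_max']

# Build one trial configuration per combination.
ptq_grid = [
    {'algo': algo, 'weight_quantize_type': wtype}
    for algo, wtype in itertools.product(activation_algos, weight_quantize_types)
]
```

Each dict could then drive one `quant_post_static` trial; when every combination still fails (as with PicoDet-s here), per-layer analysis is the next step.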
......
...@@ -23,7 +23,7 @@ from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from keypoint_utils import keypoint_post_process
from post_process import PPYOLOEPostProcess
from paddleslim.quant.analysis_ptq import AnalysisPTQ
def argsparser():
...@@ -161,7 +161,7 @@ def main():
    else:
        raise ValueError("metric currently only supports COCO and VOC.")
    analyzer = AnalysisPTQ(
        model_dir=config["model_dir"],
        model_filename=config["model_filename"],
        params_filename=config["params_filename"],
......
...@@ -116,7 +116,9 @@ python eval.py --config_path=./configs/yolov5s_ptq.yaml
#### 3.6 Improving Post-Training Quantization Accuracy
###### 3.6.1 Quantization Analysis Tool
This section shows how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is easy to use, and produces a quantized model quickly, but it often causes a large accuracy drop. PaddleSlim provides an analysis tool, exposed through the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes the layers unsuitable for quantization; skipping those layers improves the accuracy of the quantized model. For details on ```paddleslim.quant.AnalysisPTQ```, see [AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md)
Since YOLOv6 quantizes poorly with post-training quantization, YOLOv6 is used as the example; the analysis tool is used as follows:
......
...@@ -21,7 +21,7 @@ from tqdm import tqdm
from post_process import YOLOPostProcess, coco_metric
from dataset import COCOValDataset, COCOTrainDataset
from paddleslim.common import load_config, load_onnx_model
from paddleslim.quant.analysis_ptq import AnalysisPTQ
def argsparser():
...@@ -103,7 +103,7 @@ def main():
    load_onnx_model(config["model_dir"])
    inference_model_path = config["model_dir"].rstrip().rstrip(
        '.onnx') + '_infer'
    analyzer = AnalysisPTQ(
        model_dir=inference_model_path,
        model_filename='model.pdmodel',
        params_filename='model.pdiparams',
......
# GPT QAT Sensitivity Analysis Example
## 1. Introduction
Using the NLP generation model GPT-3 as the example, this demo shows how to use the QAT sensitivity analysis tool to analyze a quantized model and improve quantization-aware training accuracy.
## 2. Benchmark
| Model | Strategy | ACC | Inference Model |
| :-------- |:-------- | :--------: | :--------: |
| GPT-345M | Baseline | 44.17 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_Baseline.tar) |
| GPT-345M | QAT (before analysis) | 41.58 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345_QAT_wo_analysis.tar) |
| GPT-345M | QAT (after analysis) | 44.94 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_QAT_w_analysis_infer.tar) |
- All ACC numbers are accuracy scores measured on the [LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl) dataset
## 3. Analysis Workflow
#### 3.1 Environment
- PaddlePaddle >= 2.3 (install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim develop branch
- PaddleFleetX >= 2.4
#### 3.2 Dataset
Sensitivity analysis derives per-layer sensitivities from a validation set; you can download and use [LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl) or [WikiText](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip). This demo uses LAMBADA.
#### 3.3 Inference Models
- [GPT-345M](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_Baseline.tar): base model
- [GPT-345M](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345_QAT_wo_analysis.tar): model after quantization-aware training, before analysis
#### 3.4 Sensitivity Analysis
Sensitivity analysis is launched through the analysis.py script, which uses the ```paddleslim.quant.AnalysisQAT``` interface. Set the model paths, data path, and quantization-related parameters in the config file, then run:
```shell
python analysis.py --config_path=./configs/gpt_345M_analysis.yaml
```
When the analysis finishes, it produces the per-layer sensitivities sorted from high to low (the higher the sensitivity, the more that layer hurts model accuracy), saved in ```analysis_results/analysis.txt```.
The ten most sensitive layers are ```linear_31```, ```linear_27```, ```linear_22```, ```linear_43```, ```linear_83```, ```linear_15```, ```linear_87```, ```linear_3```, ```linear_38```, and ```linear_39```. Of these ten, eight belong to the second FFN Linear in ```TransformerDecoder``` and two to the first FFN Linear, while the Linear layers inside ```MultiHeadAttention``` are comparatively insensitive.
For details on ```paddleslim.quant.AnalysisQAT```, see [AnalysisQAT.md](../../../docs/zh_cn/tutorials/quant/AnalysisQAT.md).
#### 3.5 Re-running Quantization-Aware Training
Guided by the analysis, quantization training was re-run with the seven Linear layers ```linear_31```, ```linear_27```, ```linear_22```, ```linear_43```, ```linear_83```, ```linear_15```, and ```linear_87``` excluded from quantization; the quantized model's accuracy reached 44.94.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import random
import numpy as np
import argparse
import time
import math
import paddle
from paddleslim.common import load_config as load_slim_config
from paddleslim.quant.analysis_qat import AnalysisQAT
from ppfleetx.data import build_dataloader
from ppfleetx.distributed.apis import env
from utils import parse_config
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='analysis_results',
help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def eval_reader_wrapper(reader):
def gen():
for data in reader:
tokens, loss_mask, attention_mask, position_ids, labels, info = data
in_dict = {}
in_dict['tokens'] = tokens
in_dict['ids'] = position_ids
yield in_dict, labels, loss_mask, info
return gen
def eval_function(exe, program, feed_names, fetch_list):
tic_eval = time.time()
score_name = "loss" if not global_config['cloze_eval'] else "number correct"
first_step = True
eval_losses = []
total_score = 0
for eval_step, (data, labels, loss_mask, info) in enumerate(eval_loader()):
preds = exe.run(program=program,
feed=data,
fetch_list=fetch_list,
return_numpy=False)
paddle.disable_static()
labels = paddle.to_tensor(labels)
preds = paddle.to_tensor(preds[0])
loss_mask = paddle.to_tensor(loss_mask)
info = paddle.to_tensor(info)
if not global_config['cloze_eval']:
if first_step:
num_original_tokens = info.numpy()[0][0]
num_tokenized_tokens = info.numpy()[0][1]
first_step = False
masked_lm_loss = paddle.nn.functional.cross_entropy(
preds, labels, reduction="none")
loss = paddle.sum(masked_lm_loss * loss_mask)
eval_losses.append(loss.numpy()[0])
total_score += loss.numpy() / (num_tokenized_tokens - 1)
else:
if first_step:
num_examples = info.numpy()[0][0]
first_step = False
outputs = paddle.argmax(preds, -1)
acc = paddle.cast(outputs == labels, 'float32')
acc = paddle.where(
paddle.cast(loss_mask, 'bool'), acc, paddle.ones_like(acc))
acc = paddle.sum(paddle.prod(acc, -1))
eval_losses.append(acc.numpy()[0])
total_score += acc.numpy()[0]
if eval_step != 0 and (eval_step % 10 == 0):
print("[eval] step: %d, batch: %d, %s: %.9f, speed: %.2f step/s" %
(eval_step, eval_step, score_name, total_score,
1. / (time.time() - tic_eval)))
tic_eval = time.time()
paddle.enable_static()
metric = None
if not global_config['cloze_eval']:
total_loss = float(total_score)
ppl = math.exp(min(20, total_loss))
token_ratio = (num_tokenized_tokens - 1) / (num_original_tokens - 1)
adjusted_ppl = math.exp(min(20, total_loss * token_ratio))
string = ' validation results on {} | '.format(gpt_config['Data'][
'Eval']['dataset']['name'])
string += 'avg loss: {:.4E} | '.format(total_loss)
string += 'ppl: {:.4E} | '.format(ppl)
string += 'adjusted ppl: {:.4E} | '.format(adjusted_ppl)
string += 'token ratio: {} |'.format(token_ratio)
metric = ppl
else:
num_correct = float(total_score)
acc = float(num_correct / num_examples)
string = ' validation results on {} | '.format(gpt_config['Data'][
'Eval']['dataset']['name'])
string += 'number correct: {:.4E} | '.format(num_correct)
string += 'total examples: {:.4E} | '.format(num_examples)
string += 'avg accuracy: {:.4E}'.format(acc)
metric = acc
print(string)
return metric
def main():
global global_config, all_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
all_config)
global_config = all_config["Global"]
seed = all_config['Global']['seed']
random.seed(seed)
np.random.seed(seed)
paddle.seed(seed)
env.set_seed(seed)
global gpt_config
gpt_config = parse_config(global_config['reader_config'])
if not global_config['cloze_eval']:
gpt_config['Data']['Eval']['dataset']['name'] = "LM_Eval_Dataset"
else:
gpt_config['Data']['Eval']['dataset']['name'] = "Lambada_Eval_Dataset"
valid_data_loader = build_dataloader(gpt_config['Data'], "Eval")
global eval_loader
eval_loader = eval_reader_wrapper(valid_data_loader)
analyzer = AnalysisQAT(
quant_model_dir=global_config["quant_model_dir"],
float_model_dir=global_config["float_model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
quantizable_op_type=global_config['quantizable_op_type'],
qat_metric=global_config['qat_metric']
if 'qat_metric' in global_config else None,
eval_function=eval_function,
data_loader=eval_loader,
save_dir=FLAGS.save_dir,
resume=global_config['resume'], )
analyzer.metric_error_analyse()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
Global:
device: gpu
seed: 1024
quant_model_dir: ./GPT_345_QAT_wo_analysis
float_model_dir: ./GPT_345M_Baseline
model_filename: model.pdmodel
params_filename: model.pdiparams
quantizable_op_type: ["mul", "matmul", "matmul_v2"]
resume: False
reader_config: ./configs/gpt_reader.yaml
cloze_eval: True # True for LAMBADA Dataset; False for WikiText
\ No newline at end of file
Data:
Eval:
dataset:
name: GPTDataset
input_dir: ./lambada_test.jsonl
max_seq_len: 1024
overlapping_eval: 32
loader:
num_workers: 1
return_list: True
collate_fn: gpt_collate_fn
batch_size: 1
\ No newline at end of file
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import codecs
import yaml
import time
import copy
class AttrDict(dict):
def __getattr__(self, key):
return self[key]
def __setattr__(self, key, value):
if key in self.__dict__:
self.__dict__[key] = value
else:
self[key] = value
def __copy__(self):
cls = self.__class__
result = cls.__new__(cls)
result.__dict__.update(self.__dict__)
return result
def __deepcopy__(self, memo):
cls = self.__class__
result = cls.__new__(cls)
memo[id(self)] = result
for k, v in self.__dict__.items():
setattr(result, k, copy.deepcopy(v, memo))
for k, v in self.items():
setattr(result, k, copy.deepcopy(v, memo))
return result
def setdefault(self, k, default=None):
if k not in self or self[k] is None:
self[k] = default
return default
else:
return self[k]
def create_attr_dict(yaml_config):
from ast import literal_eval
for key, value in yaml_config.items():
if type(value) is dict:
yaml_config[key] = value = AttrDict(value)
if isinstance(value, str):
try:
value = literal_eval(value)
except BaseException:
pass
if isinstance(value, AttrDict):
create_attr_dict(yaml_config[key])
else:
yaml_config[key] = value
def parse_config(cfg_file):
"""Load a config file into AttrDict"""
def _update_dic(dic, base_dic):
'''Update config from dic based base_dic
'''
base_dic = base_dic.copy()
dic = dic.copy()
if dic.get('_inherited_', True) == False:
dic.pop('_inherited_')
return dic
for key, val in dic.items():
if isinstance(val, dict) and key in base_dic:
base_dic[key] = _update_dic(val, base_dic[key])
else:
base_dic[key] = val
dic = base_dic
return dic
def _parse_from_yaml(path):
'''Parse a yaml file and build config'''
with codecs.open(path, 'r', 'utf-8') as file:
dic = yaml.load(file, Loader=yaml.FullLoader)
if '_base_' in dic:
cfg_dir = os.path.dirname(path)
base_path = dic.pop('_base_')
base_path = os.path.join(cfg_dir, base_path)
base_dic = _parse_from_yaml(base_path)
dic = _update_dic(dic, base_dic)
return dic
yaml_dict = _parse_from_yaml(cfg_file)
yaml_config = AttrDict(yaml_dict)
create_attr_dict(yaml_config)
return yaml_config
...@@ -37,10 +37,10 @@ from ..common import get_feed_vars, wrap_dataloader, load_inference_model, get_m
_logger = get_logger(__name__, level=logging.INFO)
__all__ = ["AnalysisPTQ"]
class AnalysisPTQ(object):
    def __init__(self,
                 model_dir,
                 model_filename=None,
...@@ -51,7 +51,7 @@ class AnalysisQuant(object):
                 resume=False,
                 ptq_config=None):
        """
        AnalysisPTQ provides methods to analyze the sensitivity of each op in the model.
        Args:
            model_dir(str): the path of fp32 model that will be quantized, it can also be '.onnx'
...@@ -403,7 +403,8 @@ class AnalysisQuant(object):
        statistic = []
        box_fp_dist, box_q_dist = [], []
        hist_fp_dist, hist_q_dist = {}, {}
        fp_tensor_names = sorted(list(fp_tensors.keys()))
        for var_name in fp_tensor_names:
            fp_tensor = fp_tensors[var_name]
            quant_name = var_name_map[
                var_name] if var_name_map is not None else var_name
...@@ -503,7 +504,9 @@ class AnalysisQuant(object):
        for name in hist_data:
            plt.hist(hist_data[name][0], bins=hist_data[name][1])
            plt.xlabel(name)
            plt.ylabel("Probability")
            locs, _ = plt.yticks()
            plt.yticks(locs, np.round(locs / len(hist_data[name][0]), 3))
            if 'act' in save_name:
                plt.title("Hist of Activation {}".format(name))
            else:
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import pickle
import copy
import logging
import numpy as np
import paddle
from paddle.fluid import core
from paddle.fluid.framework import IrGraph
from ..common import get_logger, load_inference_model
_logger = get_logger(__name__, level=logging.INFO)
__all__ = ["AnalysisQAT"]
class AnalysisQAT(object):
def __init__(self,
quant_model_dir,
float_model_dir,
model_filename=None,
params_filename=None,
quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
qat_metric=None,
eval_function=None,
data_loader=None,
save_dir='analysis_results',
resume=False):
        '''
        AnalysisQAT provides methods to analyze the sensitivity of each op in the model.
        Args:
            quant_model_dir(str): the path of the INT8 model quantized through QAT
            float_model_dir(str): the path of the FP32 model that the quant model is based on
            model_filename(str, optional): the model file name of the model
            params_filename(str, optional): the parameter file name of the model
            quantizable_op_type(list of str, optional): the types of op that will be analyzed
            qat_metric(float, optional): the metric of the quantized model; computed automatically when None
            eval_function(function): evaluation function, defined by the user, that returns the metric of the inference program; used to judge the metric of the quantized model
            data_loader(Python Generator, Paddle.io.DataLoader, optional): the
                Generator or DataLoader that provides calibration data; it
                should return a batch each time
            save_dir(str, optional): the output dir that stores the analyzed information
            resume(bool, optional): when analysis is interrupted, resume the run and load the information already analyzed
        '''
if model_filename is None:
model_filename = 'model.pdmodel'
if params_filename is None:
params_filename = 'model.pdiparams'
self.quant_model_dir = quant_model_dir
self.float_model_dir = float_model_dir
self.model_filename = model_filename
self.params_filename = params_filename
self.quantizable_op_type = quantizable_op_type
self.qat_metric = qat_metric
self.eval_function = eval_function
self.save_dir = save_dir
self.checkpoint_name = os.path.join(save_dir, 'analysis_checkpoint.pkl')
self.nonquant_layer_metrics = {}
if not os.path.exists(self.save_dir):
os.mkdir(self.save_dir)
devices = paddle.device.get_device().split(':')[0]
self.places = paddle.device._convert_to_place(devices)
executor = paddle.static.Executor(self.places)
[program, self.feed_list, self.fetch_list] = load_inference_model(
self.quant_model_dir,
executor=executor,
model_filename=self.model_filename,
params_filename=self.params_filename)
_logger.info('Loaded model from: {}'.format(quant_model_dir))
graph = IrGraph(core.Graph(program.desc), for_test=True)
# find all inputs for each quantizable op
self.inputs_of_quantized_op = []
sorted_ops = graph.topology_sort()
for op_node in sorted_ops:
op_name = op_node.name()
if op_name in quantizable_op_type:
input_names = op_node.op().input_arg_names()
for input_name in input_names:
if 'quantized' in input_name:
self.inputs_of_quantized_op.append(input_names)
break
if self.qat_metric is None:
_logger.info('Calculating the metric of QAT model...')
self.qat_metric = self.eval_function(
executor, program, self.feed_list, self.fetch_list) * 100
_logger.info('The metric of QAT model is {}'.format(
round(self.qat_metric, 4)))
executor.close()
def save_checkpoint(self):
if not os.path.exists(self.save_dir):
os.makedirs(self.save_dir)
with open(self.checkpoint_name, 'wb') as f:
pickle.dump(self.nonquant_layer_metrics, f)
_logger.info('Save checkpoint to {}.'.format(self.checkpoint_name))
def load_checkpoint(self):
if not os.path.exists(self.checkpoint_name):
_logger.info('Checkpoint path {} does not exist.'.format(
self.checkpoint_name))
return False
with open(self.checkpoint_name, 'rb') as f:
self.nonquant_layer_metrics = pickle.load(f)
_logger.info('Load checkpoint from {}.'.format(self.checkpoint_name))
return True
def get_weight_name(self, inputs_names):
# TODO(xc)
w_idx = 0 if 'w_0' in inputs_names[0] else 1
weight_name = inputs_names[w_idx].split('.quantized.dequantized')[0]
return weight_name
def get_new_in_out_map(
self,
input_list,
graph,
float_scope,
quant_scope, ):
input_rename_map = {}
output_rename_map = {}
removed_ops = []
for op_node in graph.all_op_nodes():
if op_node.id() in removed_ops:
continue
in_names = op_node.input_arg_names()
out_names = op_node.output_arg_names()
if len(out_names) == 1 and out_names[0] in input_list:
in_var = graph._find_node_by_name(op_node.inputs,
op_node.input('X')[0])
out_var = graph._find_node_by_name(op_node.outputs,
op_node.output('Y')[0])
if 'quantized' in in_var.name():
# act
for op in graph.all_op_nodes():
o_ns = op.output_arg_names()
if len(o_ns) == 1 and o_ns[0] == in_var.name():
in_var_1 = graph._find_node_by_name(
op.inputs, op.input('X')[0])
graph.safe_remove_nodes(op)
removed_ops.append(op.id())
input_rename_map[out_var.node] = in_var_1
else:
# weight
with paddle.static.scope_guard(float_scope):
float_weight = np.array(
float_scope.find_var(in_var.name()).get_tensor())
with paddle.static.scope_guard(quant_scope):
quant_scope.find_var(in_var.name()).get_tensor().set(
float_weight, self.places)
input_rename_map[out_var.node] = in_var
graph.safe_remove_nodes(op_node)
removed_ops.append(op_node.id())
output_rename_map[in_var.node] = out_var
return input_rename_map, output_rename_map, removed_ops
def relink_graph(self, graph, input_rename_map, output_rename_map,
removed_ops):
for op_node in graph.all_op_nodes():
if op_node.id() in removed_ops:
continue
for var in op_node.inputs:
if var.node in input_rename_map:
old_in = var
new_in = input_rename_map[var.node]
graph.update_input_link(old_in, new_in, op_node)
_logger.info(
f'relink {op_node.name()} \'s input node from {old_in.name()} to {new_in.name()}.'
)
for var in op_node.outputs:
if var.node in output_rename_map:
old_out = var
new_out = output_rename_map[var.node]
graph.update_input_link(old_out, new_out, op_node)
_logger.info(
f'relink {op_node.name()} \'s output node from {old_out.name()} to {new_out.name()}.'
)
return graph.to_program()
def metric_error_analyse(self):
executor = paddle.static.Executor(self.places)
float_scope = paddle.static.Scope()
quant_scope = paddle.static.Scope()
for idx, input_list in enumerate(self.inputs_of_quantized_op):
weight_name = self.get_weight_name(input_list)
_logger.info(
'Checking {}/{} quant model: without quant layer {}'.format(
idx + 1, len(self.inputs_of_quantized_op), weight_name))
with paddle.static.scope_guard(float_scope):
load_inference_model(
self.float_model_dir,
executor=executor,
model_filename=self.model_filename,
params_filename=self.params_filename)
with paddle.static.scope_guard(quant_scope):
[program, self.feed_list,
self.fetch_list] = load_inference_model(
self.quant_model_dir,
executor=executor,
model_filename=self.model_filename,
params_filename=self.params_filename)
program_copy = program.clone()
graph = IrGraph(core.Graph(program_copy.desc), for_test=True)
input_rename_map, output_rename_map, removed_ops = self.get_new_in_out_map(
input_list, graph, float_scope, quant_scope)
saved_program = self.relink_graph(graph, input_rename_map,
output_rename_map, removed_ops)
with paddle.static.scope_guard(quant_scope):
_logger.info('Skip quant {}, evaluating....'.format(
weight_name))
metric = self.eval_function(executor, saved_program,
self.feed_list,
self.fetch_list) * 100
self.nonquant_layer_metrics[weight_name] = metric
_logger.info(
'When skip quant {}, the metric is {}, the diff is {}'.
format(weight_name,
round(metric, 4), round(metric - self.qat_metric,
4)))
self.save_checkpoint()
executor.close()
self.sensitivity_ranklist = sorted(
self.nonquant_layer_metrics,
key=self.nonquant_layer_metrics.get,
reverse=True)
_logger.info('Finished computing the sensitivity of the model.')
for name in self.sensitivity_ranklist:
_logger.info("without quant layer name: {}, eval metric: {}".format(
name, self.nonquant_layer_metrics[name]))
analysis_file = os.path.join(self.save_dir, "analysis.txt")
with open(analysis_file, "w") as analysis_ret_f:
for name in self.sensitivity_ranklist:
analysis_ret_f.write(
"without layer name: {}, eval metric: {}\n".format(
name, self.nonquant_layer_metrics[name]))
_logger.info('Analysis file is saved in {}'.format(analysis_file))