Unverified commit 2350af8e, authored by Chang Xu, committed by GitHub

add_ppminilm_demo (#1082)

Parent 84eb98df
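# Automatic Compression Example for Natural Language Processing Models
This example shows how to automatically compress a PaddleNLP Inference (deployment) model.
## Benchmark
- PP-MiniLM model
PP-MiniLM is a 6-layer pretrained small Chinese model. After loading it with ``from_pretrained`` in PaddleNLP, you can fine-tune it on your own dataset. For details, see the [PP-MiniLM documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/model_compression/pp-minilm#PP-MiniLM%E4%B8%AD%E6%96%87%E5%B0%8F%E6%A8%A1%E5%9E%8B).
In this experiment, automatic compression first prunes 25% of the model's attention heads while running distillation training, and then applies post-training quantization (offline quantization).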
| Model | Strategy | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUEWSC2020 | CSL | AVG |
|:------:|:------:|:------:|:------:|:------:|:------:|:-----------:|:------:|:------:|:------:|
| PP-MiniLM | Base model | 74.03 | 56.66 | 60.21 | 80.98 | 76.20 | 84.21 | 77.36 | 72.81 |
| PP-MiniLM | Pruning + distillation + post-training quantization | 73.56 | 56.38 | 59.87 | 80.80 | 76.44 | 82.23 | 77.77 | 72.44 |
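
The AVG column is the unweighted mean of the seven task scores. The following quick check reproduces that arithmetic with values copied from the table; the variable and function names are illustrative only.

```python
# Scores copied from the benchmark table, in column order:
# AFQMC, TNEWS, IFLYTEK, CMNLI, OCNLI, CLUEWSC2020, CSL.
base = [74.03, 56.66, 60.21, 80.98, 76.20, 84.21, 77.36]
compressed = [73.56, 56.38, 59.87, 80.80, 76.44, 82.23, 77.77]


def mean(scores):
    """Unweighted mean of a list of accuracy scores."""
    return sum(scores) / len(scores)


base_avg = round(mean(base), 2)
compressed_avg = round(mean(compressed), 2)
# Average accuracy lost to compression, in percentage points.
drop = round(base_avg - compressed_avg, 2)

print(base_avg, compressed_avg, drop)
```

On these numbers the compressed model loses about 0.37 points on average, and two tasks (OCNLI and CSL) score slightly higher after compression.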
The performance test environment is:
- Hardware: a single NVIDIA Tesla T4 GPU
- Software: CUDA 11.0, cuDNN 8.0, TensorRT 8.0
- Test configuration: batch_size: 40, max_seq_len: 128
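## Environment Setup
### 1. Prepare the data
This example uses the CLUE datasets by default. If your dataset is not in CLUE format, change the dataset field in the launch script run.sh. PaddleNLP downloads the corresponding dataset automatically.
### 2. Prepare the compression environment
- python >= 3.6
- paddlepaddle >= 2.3
- PaddleNLP >= 2.3
Install paddlepaddle: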
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
Install paddlenlp:
```shell
pip install paddlenlp
```
Install paddleslim:
```shell
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Note: PaddleNLP is installed in order to download its datasets and tokenizers.
### 3. Prepare the deployment model to compress
If you already have the deployment model files model.pdmodel and model.pdiparams, skip this step.
Export an Inference model by following the [PaddleNLP documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples). For this example, you can follow [PaddleNLP PP-MiniLM Chinese small model](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/model_compression/pp-minilm) to fine-tune the model and save the checkpoint with the highest accuracy on each dataset. Alternatively, download one of the following fine-tuned Inference models directly: [afqmc](https://bj.bcebos.com/v1/paddle-slim-models/act/afqmc.tar), [tnews](https://bj.bcebos.com/v1/paddle-slim-models/act/tnews.tar), [iflytek](https://bj.bcebos.com/v1/paddle-slim-models/act/iflytek.tar), [ocnli](https://bj.bcebos.com/v1/paddle-slim-models/act/ocnli.tar), [cmnli](https://bj.bcebos.com/v1/paddle-slim-models/act/cmnli.tar), [cluewsc2020](https://bj.bcebos.com/v1/paddle-slim-models/act/cluewsc.tar), [csl](https://bj.bcebos.com/v1/paddle-slim-models/act/csl.tar)
```shell
wget https://bj.bcebos.com/v1/paddle-slim-models/act/afqmc.tar
tar -zxvf afqmc.tar
```
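
For scripting the download of several tasks, a small lookup like the one below may help. The base URL and archive names are copied from the links in this section; the names `BASE_URL`, `ARCHIVES`, and `model_url` are hypothetical helpers and not part of PaddleSlim.

```python
# Base URL and archive names are taken from the download links in this section.
BASE_URL = "https://bj.bcebos.com/v1/paddle-slim-models/act/"

# Task name -> archive file name.
# Note that the cluewsc2020 task is stored as cluewsc.tar.
ARCHIVES = {
    "afqmc": "afqmc.tar",
    "tnews": "tnews.tar",
    "iflytek": "iflytek.tar",
    "ocnli": "ocnli.tar",
    "cmnli": "cmnli.tar",
    "cluewsc2020": "cluewsc.tar",
    "csl": "csl.tar",
}


def model_url(task_name):
    """Return the download URL of the fine-tuned Inference model for a task."""
    if task_name not in ARCHIVES:
        raise ValueError("unknown task name: {}".format(task_name))
    return BASE_URL + ARCHIVES[task_name]


for task in ARCHIVES:
    print(task, model_url(task))
```

The only irregular entry is cluewsc2020, whose archive name drops the year suffix.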
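## Running Automatic Compression
### Compression configuration
Automatic compression requires a config file, passed through the ``config_path`` field. The configs folder contains the configuration files for the different tasks; the examples below use the afqmc dataset. Training parameters must be set by hand. The distillation, pruning, and post-training quantization settings can either be determined automatically by the compression strategy or be configured by hand. By default, the automatic compression experiments for PaddleNLP models use pruning, distillation, and post-training quantization.
- Training parameters
Training parameters mainly set the learning rate, the number of training epochs (epochs), the optimizer, and so on. ``origin_metric`` is the accuracy of the original model; if it is set, the model's accuracy is verified before compression starts.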
```yaml
TrainConfig:
  epochs: 6
  eval_iter: 1070
  learning_rate: 2.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.7403
```
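
As a rough illustration of what an ``origin_metric`` check amounts to, the sketch below compares a measured accuracy against the configured value. The function name `check_origin_metric` and the tolerance are assumptions made for this example; the check PaddleSlim actually performs may differ.

```python
# The afqmc training parameters from the YAML above, as a plain dict.
train_config = {
    "epochs": 6,
    "eval_iter": 1070,
    "learning_rate": 2.0e-5,
    "optim_args": {"weight_decay": 0.01},
    "optimizer": "AdamW",
    "origin_metric": 0.7403,
}


def check_origin_metric(measured, origin_metric, tolerance=0.005):
    """Return True when the measured accuracy is within `tolerance` of the expected one.

    `tolerance` is a hypothetical threshold chosen for illustration.
    """
    return abs(measured - origin_metric) <= tolerance


# A model that loads correctly should reproduce the recorded accuracy closely.
looks_fine = check_origin_metric(0.7401, train_config["origin_metric"])
# A much lower accuracy suggests a wrong model file, tokenizer, or dataset.
looks_broken = check_origin_metric(0.50, train_config["origin_metric"])

print(looks_fine, looks_broken)
```

Note that ``origin_metric`` is a fraction (0.7403), whereas the benchmark table reports the same accuracy as a percentage (74.03).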
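The default distillation, pruning, and post-training quantization configurations are as follows:
- Distillation parameters
Distillation parameters include the path to the teacher model (that is, the fine-tuned, unpruned model). The automatic compression strategy finds the teacher network nodes and the corresponding student network nodes for distillation on its own, so they do not need to be set by hand.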
```yaml
Distillation:
  teacher_model_dir: ./afqmc/
  teacher_model_filename: inference.pdmodel
  teacher_params_filename: inference.pdiparams
```
- Pruning parameters
Pruning parameters include the pruning algorithm and the pruning ratio.
```yaml
Prune:
  prune_algo: transformer_pruner
  pruned_ratio: 0.25
```
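
To make the effect of ``pruned_ratio: 0.25`` on attention heads concrete, the sketch below computes how many heads remain per layer. The head count (12) and hidden size (768) are assumptions about PP-MiniLM made for illustration, and `heads_after_pruning` is a hypothetical helper; how `transformer_pruner` distributes the ratio internally is not shown here.

```python
def heads_after_pruning(num_heads, pruned_ratio):
    """Number of attention heads kept when `pruned_ratio` of them are removed."""
    removed = int(num_heads * pruned_ratio)
    return num_heads - removed


# Assumed model shape (not stated in this document).
NUM_HEADS = 12
HIDDEN_SIZE = 768

head_dim = HIDDEN_SIZE // NUM_HEADS          # width of one head
kept_heads = heads_after_pruning(NUM_HEADS, 0.25)
kept_width = kept_heads * head_dim           # attention width after pruning

print(kept_heads, kept_width)
```

Under these assumptions, each layer keeps 9 of its 12 heads, and the attention projection width shrinks from 768 to 576.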
- Optimization parameters
```yaml
HyperParameterOptimization:
  batch_num:
  - 4
  - 16
  bias_correct:
  - true
  hist_percent:
  - 0.999
  - 0.99999
  max_quant_count: 20
  ptq_algo:
  - KL
  - hist
  weight_quantize_type:
  - channel_wise_abs_max
```
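
One way to read this block is as a search space for post-training quantization settings, explored for at most ``max_quant_count`` trials. The sketch below samples such trials at random. It assumes that the two-element numeric lists (``batch_num``, ``hist_percent``) are ranges and the other lists are sets of choices; plain random sampling stands in for whatever search strategy PaddleSlim actually uses.

```python
import random

# Search space mirroring the YAML above (interpretation is an assumption).
search_space = {
    "batch_num": (4, 16),                 # treated as an inclusive integer range
    "hist_percent": (0.999, 0.99999),     # treated as a continuous range
    "ptq_algo": ["KL", "hist"],           # treated as a set of choices
    "bias_correct": [True],
    "weight_quantize_type": ["channel_wise_abs_max"],
}
MAX_QUANT_COUNT = 20


def sample_trial(rng):
    """Draw one candidate post-training quantization configuration."""
    return {
        "batch_num": rng.randint(*search_space["batch_num"]),
        "hist_percent": rng.uniform(*search_space["hist_percent"]),
        "ptq_algo": rng.choice(search_space["ptq_algo"]),
        "bias_correct": rng.choice(search_space["bias_correct"]),
        "weight_quantize_type": rng.choice(search_space["weight_quantize_type"]),
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [sample_trial(rng) for _ in range(MAX_QUANT_COUNT)]

for trial in trials[:3]:
    print(trial)
```

In a real search, each trial would be quantized and scored on the evaluation set, and the best-scoring configuration kept.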
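- Quantization parameters
Quantization parameters mainly set the number of quantization bits and the op types to quantize. The quantized ops include convolution layers (conv2d, depthwise_conv2d) and fully connected layers (mul, matmul_v2).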
```yaml
Quantization:
  activation_bits: 8
  quantize_op_types:
  - conv2d
  - depthwise_conv2d
  - mul
  - matmul_v2
  weight_bits: 8
```
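
For intuition about what 8-bit weights with a ``channel_wise_abs_max`` scale mean, here is a minimal pure-Python sketch of symmetric abs-max quantization applied per row. It is a generic illustration of the technique, not PaddleSlim's implementation; the function names and the choice of rows as channels are assumptions.

```python
def quantize_abs_max(values, bits=8):
    """Symmetric abs-max quantization of a list of floats to signed integers.

    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        # All-zero input: any scale works, so use 1.0.
        return [0] * len(values), 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def quantize_channel_wise(weight_rows, bits=8):
    """Quantize each row (treated as one output channel) with its own scale."""
    return [quantize_abs_max(row, bits) for row in weight_rows]


# Two channels with very different magnitudes.
weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
per_channel = quantize_channel_wise(weights)

for row, (codes, scale) in zip(weights, per_channel):
    print(codes, dequantize(codes, scale))
```

Giving each channel its own scale keeps the small-magnitude second row from being crushed by the larger values in the first row, which is the usual motivation for channel-wise over tensor-wise scales.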
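### Running pruning, distillation, and post-training quantization
This example is launched through the run.py script, which calls the ``paddleslim.auto_compression.AutoCompression`` interface to compress the model. Pass in the task name, model type, dataset name, and compression parameters to prune the model, train it with distillation, and apply post-training quantization. The dataset is CLUE, and each task name refers to a different CLUE task. The available task names are: afqmc, tnews, iflytek, ocnli, cmnli, cluewsc2020, csl. The command is: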
```shell
python run.py \
--model_type='ppminilm' \
--model_dir='./afqmc/' \
--model_filename='inference.pdmodel' \
--params_filename='inference.pdiparams' \
--dataset='clue' \
--save_dir='./save_afqmc_pruned/' \
--batch_size=16 \
--max_seq_length=128 \
--task_name='afqmc' \
--config_path='./configs/afqmc.yaml'
```
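
To run the same command for every task, the arguments can be generated in a loop. The sketch below only builds and prints the command lines. It assumes each task follows the afqmc naming pattern (model in `./<task>/`, config in `./configs/<task>.yaml`), which this document confirms only for afqmc; `build_command` is a hypothetical helper.

```python
import shlex

# Task names listed in this section.
TASKS = ["afqmc", "tnews", "iflytek", "ocnli", "cmnli", "cluewsc2020", "csl"]


def build_command(task_name, batch_size=16, max_seq_length=128):
    """Build the run.py argument list for one CLUE task.

    Paths are extrapolated from the afqmc example and may need adjusting.
    """
    return [
        "python", "run.py",
        "--model_type=ppminilm",
        "--model_dir=./{}/".format(task_name),
        "--model_filename=inference.pdmodel",
        "--params_filename=inference.pdiparams",
        "--dataset=clue",
        "--save_dir=./save_{}_pruned/".format(task_name),
        "--batch_size={}".format(batch_size),
        "--max_seq_length={}".format(max_seq_length),
        "--task_name={}".format(task_name),
        "--config_path=./configs/{}.yaml".format(task_name),
    ]


for task in TASKS:
    # Print a shell-ready line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_command(task)))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute each job in turn and stop at the first failure.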
TrainConfig:
  epochs: 6
  eval_iter: 1070
  learning_rate: 2.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.7403
TrainConfig:
  epochs: 100
  eval_iter: 70
  learning_rate: 1.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.8421
TrainConfig:
  epochs: 6
  eval_iter: 2000
  learning_rate: 3.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.8098
\ No newline at end of file
TrainConfig:
  epochs: 16
  eval_iter: 1000
  learning_rate: 1.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.7736
TrainConfig:
  epochs: 12
  eval_iter: 750
  learning_rate: 2.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.6021
TrainConfig:
  epochs: 20
  eval_iter: 1050
  learning_rate: 3.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.7620
\ No newline at end of file
TrainConfig:
  epochs: 6
  eval_iter: 1110
  learning_rate: 2.0e-5
  optim_args:
    weight_decay: 0.01
  optimizer: AdamW
  origin_metric: 0.5666
\ No newline at end of file
import os
import sys
sys.path[0] = os.path.join(os.path.dirname("__file__"), os.path.pardir)
sys.path[0] = os.path.join(os.path.dirname("__file__"), os.path.pardir, os.path.pardir)
import argparse
import functools
from functools import partial
......@@ -10,6 +10,7 @@ import paddle
import paddle.nn as nn
from paddle.io import Dataset, BatchSampler, DataLoader
from paddle.metric import Metric, Accuracy, Precision, Recall
from paddlenlp.transformers import PPMiniLMForSequenceClassification, PPMiniLMTokenizer
from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer
from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad
......@@ -23,12 +24,15 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_type', str, None, "model type can be bert or ppminilm.")
add_arg('model_dir', str, None, "inference model directory.")
add_arg('model_filename', str, None, "inference model filename.")
add_arg('params_filename', str, None, "inference params filename.")
add_arg('dataset', str, None, "datset name.")
add_arg('save_dir', str, None, "directory to save compressed model.")
add_arg('max_seq_length', int, 128, "max sequence length after tokenization.")
add_arg('batch_size', int, 1, "train batch size.")
add_arg('task', str, 'sst-2', "task name in glue.")
add_arg('task_name', str, 'sst-2', "task name in glue.")
add_arg('config_path', str, None, "path of compression strategy config.")
# yapf: enable
......@@ -39,6 +43,13 @@ METRIC_CLASSES = {
    "mnli": Accuracy,
    "qnli": Accuracy,
    "rte": Accuracy,
    "afqmc": Accuracy,
    "tnews": Accuracy,
    "iflytek": Accuracy,
    "ocnli": Accuracy,
    "cmnli": Accuracy,
    "cluewsc2020": Accuracy,
    "csl": Accuracy,
}
......@@ -47,22 +58,74 @@ def convert_example(example,
                    label_list,
                    max_seq_length=512,
                    is_test=False):
    """
    Convert a glue example into necessary features.
    """
    if not is_test:
        # `label_list == None` is for regression task
        label_dtype = "int64" if label_list else "float32"
        # Get the label
        label = example['labels']
        label = np.array([label], dtype=label_dtype)
    # Convert raw text to feature
    example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
    if not is_test:
        return example['input_ids'], example['token_type_ids'], label
    else:
        return example['input_ids'], example['token_type_ids']
    assert args.dataset in ['glue', 'clue'], "This demo only supports for dataset glue or clue"
    """Convert a glue example into necessary features."""
    if args.dataset == 'glue':
        if not is_test:
            # `label_list == None` is for regression task
            label_dtype = "int64" if label_list else "float32"
            # Get the label
            label = example['labels']
            label = np.array([label], dtype=label_dtype)
        # Convert raw text to feature
        example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
        if not is_test:
            return example['input_ids'], example['token_type_ids'], label
        else:
            return example['input_ids'], example['token_type_ids']
    else: #if args.dataset == 'clue':
        if not is_test:
            # `label_list == None` is for regression task
            label_dtype = "int64" if label_list else "float32"
            # Get the label
            example['label'] = np.array(example["label"], dtype="int64").reshape((-1, 1))
            label = example['label']
        # Convert raw text to feature
        if 'keyword' in example: # CSL
            sentence1 = " ".join(example['keyword'])
            example = {
                'sentence1': sentence1,
                'sentence2': example['abst'],
                'label': example['label']
            }
        elif 'target' in example: # wsc
            text, query, pronoun, query_idx, pronoun_idx = example['text'], example[
                'target']['span1_text'], example['target']['span2_text'], example[
                    'target']['span1_index'], example['target']['span2_index']
            text_list = list(text)
            assert text[pronoun_idx:(pronoun_idx + len(pronoun)
                                     )] == pronoun, "pronoun: {}".format(pronoun)
            assert text[query_idx:(query_idx + len(query)
                                   )] == query, "query: {}".format(query)
            if pronoun_idx > query_idx:
                text_list.insert(query_idx, "_")
                text_list.insert(query_idx + len(query) + 1, "_")
                text_list.insert(pronoun_idx + 2, "[")
                text_list.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
            else:
                text_list.insert(pronoun_idx, "[")
                text_list.insert(pronoun_idx + len(pronoun) + 1, "]")
                text_list.insert(query_idx + 2, "_")
                text_list.insert(query_idx + len(query) + 2 + 1, "_")
            text = "".join(text_list)
            example['sentence'] = text
        if tokenizer is None:
            return example
        if 'sentence' in example:
            example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
        elif 'sentence1' in example:
            example = tokenizer(
                example['sentence1'],
                text_pair=example['sentence2'],
                max_seq_len=max_seq_length)
        if not is_test:
            return example['input_ids'], example['token_type_ids'], label
        else:
            return example['input_ids'], example['token_type_ids']
def create_data_holder(task_name):
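
The WSC branch of `convert_example` wraps the query span in `_` and the pronoun span in `[` `]`, shifting the later insert positions by the number of characters already inserted. The standalone sketch below isolates that index arithmetic so it can be checked by hand. The function name `mark_wsc` and the English sentence are hypothetical; CLUEWSC2020 examples are Chinese.

```python
def mark_wsc(text, query, query_idx, pronoun, pronoun_idx):
    """Wrap the query span in underscores and the pronoun span in brackets.

    Mirrors the insert offsets used in the WSC branch of convert_example.
    """
    chars = list(text)
    if pronoun_idx > query_idx:
        # Query comes first: its two markers shift the pronoun by 2.
        chars.insert(query_idx, "_")
        chars.insert(query_idx + len(query) + 1, "_")
        chars.insert(pronoun_idx + 2, "[")
        chars.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
    else:
        # Pronoun comes first: its two markers shift the query by 2.
        chars.insert(pronoun_idx, "[")
        chars.insert(pronoun_idx + len(pronoun) + 1, "]")
        chars.insert(query_idx + 2, "_")
        chars.insert(query_idx + len(query) + 2 + 1, "_")
    return "".join(chars)


# Query before pronoun.
print(mark_wsc("Tom said he is tired", "Tom", 0, "he", 9))
# Pronoun before query.
print(mark_wsc("He said Tom is tired", "Tom", 8, "He", 0))
```

The first call yields `_Tom_ said [he] is tired` and the second yields `[He] said _Tom_ is tired`, so the marked sentence carries both spans into the tokenizer as plain text.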
......@@ -83,14 +146,18 @@ def create_data_holder(task_name):
def reader():
    # Create the tokenizer and dataset
    tokenizer = BertTokenizer.from_pretrained(args.model_dir)
    train_ds = load_dataset('glue', args.task, splits="train")
    if args.model_type == 'bert':
        tokenizer = BertTokenizer.from_pretrained(args.model_dir)
    else: # ppminilm
        tokenizer = PPMiniLMTokenizer.from_pretrained(args.model_dir)
    train_ds, dev_ds = load_dataset(
        args.dataset, args.task_name, splits=('train', 'dev'))
    trans_func = partial(
        convert_example,
        tokenizer=tokenizer,
        label_list=train_ds.label_list,
        max_seq_length=128,
        max_seq_length=args.max_seq_length,
        is_test=True)
    train_ds = train_ds.map(trans_func, lazy=True)
......@@ -101,9 +168,9 @@ def reader():
    ): fn(samples)
    train_batch_sampler = paddle.io.BatchSampler(
        train_ds, batch_size=32, shuffle=True)
        train_ds, batch_size=args.batch_size, shuffle=True)
    [input_ids, token_type_ids, labels] = create_data_holder(args.task)
    [input_ids, token_type_ids, labels] = create_data_holder(args.task_name)
    feed_list_name = []
    train_data_loader = DataLoader(
        dataset=train_ds,
......@@ -117,16 +184,15 @@ def reader():
        convert_example,
        tokenizer=tokenizer,
        label_list=train_ds.label_list,
        max_seq_length=128)
        max_seq_length=args.max_seq_length)
    dev_batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id), # input
        Pad(axis=0, pad_val=tokenizer.pad_token_type_id), # token_type
        Stack(dtype="int64" if train_ds.label_list else "float32") # label
    ): fn(samples)
    dev_ds = load_dataset('glue', args.task, splits='dev')
    dev_ds = dev_ds.map(dev_trans_func, lazy=True)
    dev_batch_sampler = paddle.io.BatchSampler(
        dev_ds, batch_size=32, shuffle=False)
        dev_ds, batch_size=args.batch_size, shuffle=False)
    dev_data_loader = DataLoader(
        dataset=dev_ds,
        batch_sampler=dev_batch_sampler,
......@@ -148,7 +214,7 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
                         },
                         fetch_list=test_fetch_list)
        paddle.disable_static()
        labels_pd = paddle.to_tensor(np.array(data[0]['label']))
        labels_pd = paddle.to_tensor(np.array(data[0]['label']).flatten())
        logits_pd = paddle.to_tensor(logits[0])
        correct = metric.compute(logits_pd, labels_pd)
        metric.update(correct)
......@@ -179,7 +245,7 @@ if __name__ == '__main__':
                'apply_decay_param_fun'] = apply_decay_param_fun
    train_dataloader, eval_dataloader = reader()
    metric_class = METRIC_CLASSES[args.task]
    metric_class = METRIC_CLASSES[args.task_name]
    metric = metric_class()
    ac = AutoCompression(
......@@ -191,7 +257,7 @@ if __name__ == '__main__':
        train_config=train_config,
        train_dataloader=train_dataloader,
        eval_callback=eval_function
        if 'HyperParameterOptimization' not in compress_config else
        if compress_config is None or 'HyperParameterOptimization' not in compress_config else
        eval_dataloader,
        eval_dataloader=eval_dataloader)
......
export FLAGS_cudnn_deterministic=True
python run.py \
--model_type='ppminilm' \
--model_dir='./afqmc/' \
--model_filename='inference.pdmodel' \
--params_filename='inference.pdiparams' \
--dataset='clue' \
--save_dir='./save_afqmc_pruned/' \
--batch_size=16 \
--max_seq_length=128 \
--task_name='afqmc' \
--config_path='./configs/afqmc.yaml'