PaddlePaddle / PaddleSlim
Commit 61ad6665 (unverified): Update NLP/HF/Clas ACT Docs (#1278)
Authored by Chang Xu on July 12, 2022; committed by GitHub on July 12, 2022
Parent: ae917097
Showing 32 changed files with 148 additions and 136 deletions (+148 -136)
Changed files:
example/auto_compression/image_classification/README.md  +22 -12
example/auto_compression/image_classification/configs/EfficientNetB0/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/GhostNet_x1_0/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/InceptionV3/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/MobileNetV1/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/PPHGNet_tiny/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/PPLCNetV2_base/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/PPLCNet_x1_0/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/ResNet50_vd/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/SqueezeNet1_0/prune_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml  +1 -1
example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml  +1 -1
example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml  +5 -0
example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml  +5 -0
example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml  +5 -0
example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml  +5 -0
example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml  +5 -0
example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml  +5 -0
example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml  +5 -0
example/auto_compression/pytorch_huggingface/README.md  +28 -11
example/auto_compression/pytorch_huggingface/infer.py  +41 -91
example/auto_compression/image_classification/README.md @ 61ad6665
@@ -72,7 +72,7 @@ pip install paddleslim
 ```
 #### 3.2 Prepare the dataset
-This example runs the auto-compression experiments on ImageNet1k by default. If your dataset is not in ImageNet1k format, see the [PaddleClas data preparation guide](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/data_preparation/classification_dataset.md).
+This example runs the auto-compression experiments on ImageNet1k by default. If your dataset is not in ImageNet1k format, see the [PaddleClas data preparation guide](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/data_preparation/classification_dataset.md).
 Place the downloaded dataset in the current directory as `./ILSVRC2012`.
 #### 3.3 Prepare the inference model
...
...
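@@ -99,7 +99,7 @@ export CUDA_VISIBLE_DEVICES=0
 python run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
 ```
-**Distributed training**
+**Multi-GPU launch**
 Image classification training usually involves a large amount of data. The ImageNet22k dataset, for example, contains 14 million images, so single-GPU training is very slow, while distributed training can reach a nearly linear speedup.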
...
...
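@@ -107,24 +107,30 @@ python run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/Mobil
 export CUDA_VISIBLE_DEVICES=0,1,2,3
 python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
 ```
-Multi-GPU training (distributed training) splits the training job across several training nodes, each of which handles data loading, the forward pass, and the backward gradient computation, and then uploads its gradients to a server node. Once the server node has received the gradients from all training nodes, it aggregates them and updates the parameters, then sends the parameters back to the training nodes to start the next iteration. One multi-GPU iteration processes ```batch size * num gpus``` samples: with a single-GPU ```batch size``` of 32, one iteration covers 32 samples, while four GPUs with a ```batch size``` of 32 each cover 128.
+Multi-GPU training splits the training job across several training nodes, each of which handles data loading, the forward pass, and the backward gradient computation, and then uploads its gradients to a server node. Once the server node has received the gradients from all training nodes, it aggregates them and updates the parameters, then sends the parameters back to the training nodes to start the next iteration. One multi-GPU iteration processes ```batch size * num gpus``` samples: with a single-GPU ```batch size``` of 32, one iteration covers 32 samples, while four GPUs with a ```batch size``` of 32 each cover 128.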
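+Note that ```learning rate``` scales linearly with ```batch size```. Here the single-GPU ```batch size``` is 32 and the matching ```learning rate``` is 0.015, so if ```batch size``` is reduced by a factor of 4 to 8, ```learning rate``` must also be divided by 4. With multiple GPUs and a per-GPU ```batch size``` of 32, ```learning rate``` must be multiplied by the number of GPUs. Changing either ```batch size``` or the number of GPUs therefore requires a matching change to ```learning rate```.
+**Verify accuracy**
+The training log reports the validation accuracy of the model. To verify it again, edit ```./configs/MobileNetV1/qat_dis.yaml``` and set the folder path, model file name, and parameter file name of the model to validate (```model_dir, model_filename, params_filename```), then run:
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path='./configs/MobileNetV1/qat_dis.yaml'
+```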
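 ## 4. Inference deployment
 #### 4.1 Python inference
-Environment setup: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
-Once the inference model is ready, run inference with the following command:
-```shell
-python infer.py --config_path="configs/infer.yaml"
-```
-Config file: ```configs/infer.yaml``` contains the following fields for configuring inference:
-- ```inference_model_dir```: directory that holds the inference model files; it must contain both the .pdmodel and the .pdiparams file
-- ```model_filename```: name of the model file inside the inference_model_dir folder
-- ```params_filename```: name of the parameter file inside the inference_model_dir folder
+The config file ```configs/infer.yaml``` contains the following fields for configuring inference:
+- ```model_dir```: directory that holds the inference model files; it must contain both the .pdmodel and the .pdiparams file
+- ```model_filename```: name of the model file inside the model_dir folder
+- ```params_filename```: name of the parameter file inside the model_dir folder
 - ```batch_size```: batch size used for inference
 - ```image_size```: size of the input image
 - ```use_tensorrt```: whether to use the TensorRT inference engine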
...
...
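The linear relationship between learning rate and batch size described in the README hunk above can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the commit: the helper name `scaled_learning_rate` is hypothetical, and the base values (batch size 32, learning rate 0.015) are the single-GPU figures quoted in the README text.

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size_per_gpu, num_gpus=1):
    """Linear scaling rule: the learning rate follows the global batch size."""
    # One multi-GPU iteration processes batch_size_per_gpu * num_gpus samples.
    global_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * global_batch_size / base_batch_size


if __name__ == "__main__":
    # Single GPU, batch size reduced from 32 to 8: learning rate divided by 4.
    print(scaled_learning_rate(0.015, 32, batch_size_per_gpu=8))
    # Four GPUs at 32 samples each (128 per iteration): learning rate times 4.
    print(scaled_learning_rate(0.015, 32, batch_size_per_gpu=32, num_gpus=4))
```

The result is only a starting point for the value written into the YAML config; the README does not say whether further tuning is needed after scaling.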
@@ -136,9 +142,13 @@ python infer.py --config_path="configs/infer.yaml"
 Notes:
 - Check the input size of the model: InceptionV3, for example, takes an input size of 299, so some models require changing ```image_size```.
 - To speed up evaluation, enable ```TensorRT``` when evaluating on ```GPU``` and ```MKL-DNN``` when evaluating on ```CPU```.
+- To use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
+Once the inference model is ready, run inference with the following command:
+```shell
+python infer.py --config_path="configs/infer.yaml"
+```
 #### 4.2 Paddle Lite on-device deployment
 For Paddle Lite on-device deployment, see:
 - [Paddle Lite deployment](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_lite_deploy.md)
...
...
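The `configs/infer.yaml` fields listed in the README diff above (`model_dir`, `model_filename`, `params_filename`, `batch_size`, `image_size`, `use_tensorrt`) can be pictured as a small typed record. This is a sketch under stated assumptions, not the way `infer.py` actually reads its config: the class `InferConfig`, its default values, and the directory name `./InceptionV3_infer` are all hypothetical.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class InferConfig:
    # Field names come from the README; the defaults below are assumptions.
    model_dir: str
    model_filename: str
    params_filename: str
    batch_size: int = 1
    image_size: int = 224
    use_tensorrt: bool = False

    @classmethod
    def from_dict(cls, raw):
        """Build a config from a parsed YAML mapping, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise KeyError("unknown config fields: %s" % sorted(unknown))
        return cls(**raw)

    def model_path(self):
        return os.path.join(self.model_dir, self.model_filename)

    def params_path(self):
        return os.path.join(self.model_dir, self.params_filename)


if __name__ == "__main__":
    cfg = InferConfig.from_dict({
        "model_dir": "./InceptionV3_infer",  # hypothetical directory
        "model_filename": "inference.pdmodel",
        "params_filename": "inference.pdiparams",
        "image_size": 299,  # InceptionV3 input size mentioned in the README
    })
    print(cfg.model_path(), cfg.params_path(), cfg.image_size)
```

Rejecting unknown keys makes the rename in this commit visible: in this sketch a mapping that still uses the old key `inference_model_dir` raises `KeyError` instead of being silently ignored.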
example/auto_compression/image_classification/configs/EfficientNetB0/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/GhostNet_x1_0/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/InceptionV3/prune_dis.yaml @ 61ad6665
@@ -6,7 +6,7 @@ Global:
   batch_size: 32
   resize_size: 320
   crop_size: 299
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml @ 61ad6665
@@ -6,7 +6,7 @@ Global:
   batch_size: 32
   resize_size: 320
   img_size: 299
-  data_dir: /workspace/dataset/ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/MobileNetV1/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/PPHGNet_tiny/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/PPLCNetV2_base/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/PPLCNet_x1_0/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/ResNet50_vd/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/SqueezeNet1_0/prune_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml @ 61ad6665
@@ -4,7 +4,7 @@ Global:
   model_filename: inference.pdmodel
   params_filename: inference.pdiparams
   batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
 Distillation:
   alpha: 1.0
...
...
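Every image-classification config above changes `data_dir` from an absolute path to `./ILSVRC2012`, which matches the README instruction to place the dataset in the current directory. The sketch below illustrates the difference between the two spellings. It assumes the scripts resolve a relative `data_dir` against the working directory they are launched from, which the diff does not show; the helper `resolve_data_dir` and the directory `/work/image_classification` are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_data_dir(data_dir, working_dir):
    """Return the location a config data_dir refers to when run from working_dir."""
    path = PurePosixPath(data_dir)
    if path.is_absolute():
        # Old value: always points at the filesystem root, wherever you run from.
        return path
    # New value: follows the directory the command is launched in.
    return PurePosixPath(working_dir) / path


if __name__ == "__main__":
    cwd = "/work/image_classification"  # hypothetical launch directory
    print(resolve_data_dir("/ILSVRC2012", cwd))
    print(resolve_data_dir("./ILSVRC2012", cwd))
```

`PurePosixPath` is used so the example behaves the same on any operating system; it only manipulates path strings and never touches the filesystem.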
example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 6
   eval_iter: 1070
...
...
example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 100
   eval_iter: 70
...
...
example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 6
   eval_iter: 2000
...
...
example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 16
   eval_iter: 1000
...
...
example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 12
   eval_iter: 750
...
...
example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 20
   eval_iter: 1050
...
...
example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml @ 61ad6665
@@ -6,6 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
   epochs: 6
   eval_iter: 1110
...
...
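The five lines added to each pp-minilm config declare four strategy sections, three of which have no options under them. The sketch below shows what such a fragment looks like once parsed: in YAML, a key with no value loads as a null, which a Python YAML loader returns as `None`. The mapping is written out by hand to keep the example dependency-free, and the helper `describe` is hypothetical. How PaddleSlim treats an empty section is not stated in the diff; presumably it falls back to that strategy's defaults, but that is an assumption.

```python
# Hand-written equivalent of the parsed YAML lines added by this commit.
added_sections = {
    "TransformerPrune": {"pruned_ratio": 0.25},
    "HyperParameterOptimization": None,  # "Key:" with no value parses to None
    "Distillation": None,
    "Quantization": None,
}


def describe(sections):
    """Summarise each declared section and the options set explicitly in it."""
    lines = []
    for name, options in sections.items():
        if options:
            detail = ", ".join("%s=%s" % (key, value) for key, value in options.items())
        else:
            detail = "no explicit options"
        lines.append("%s: %s" % (name, detail))
    return lines


if __name__ == "__main__":
    for line in describe(added_sections):
        print(line)
```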
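example/auto_compression/pytorch_huggingface/README.md @ 61ad6665
@@ -15,7 +15,7 @@
 The PaddlePaddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, PaddleSlim's auto-compression can easily be applied to inference models from many frameworks.
-This example uses a natural language processing model from the [Pytorch](https://github.com/pytorch/pytorch) framework to show how to auto-compress NLP models from other frameworks. It uses the open-source [huggingface](https://github.com/huggingface/transformers) transformers library, converts the Pytorch model into a Paddle model, and then compresses it with ACT auto-compression. The compression strategies used in this example are pruning with distillation and offline quantization (```Post-training quantization```).
+This example uses a natural language processing model from the [Pytorch](https://github.com/pytorch/pytorch) framework to show how to auto-compress NLP models from other frameworks. It uses the open-source [huggingface](https://github.com/huggingface/transformers) transformers library, converts the Pytorch model into a Paddle model, and then compresses it with ACT auto-compression. The compression strategies used in this example are pruning with distillation and quantization-aware training.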
...
...
@@ -27,13 +27,13 @@
 | Model | Strategy | CoLA | MRPC | QNLI | QQP | RTE | SST2 | STSB | AVG |
 |:------:|:------:|:------:|:------:|:-----------:|:------:|:------:|:------:|:------:|:------:|
 | bert-base-cased | Base model | 60.06 | 84.31 | 90.68 | 90.84 | 63.53 | 91.63 | 88.46 | 81.35 |
-| bert-base-cased | Pruning with distillation + offline quantization | 60.52 | 84.80 | 90.59 | 90.42 | 64.26 | 91.63 | 88.51 | 81.53 |
+| bert-base-cased | Pruning with distillation + quantization-aware training | 58.69 | 85.05 | 90.74 | 90.42 | 65.34 | 92.08 | 88.22 | 81.51 |
 Average accuracy across tasks and speedup:
 | bert-base-cased | Accuracy (avg) | Latency (ms) | Speedup |
 |:-------:|:----------:|:------------:| :------:|
 | Before compression | 81.35 | 11.60 | - |
-| After compression | 81.53 | 4.83 | 2.40 |
+| After compression | 81.51 | 4.83 | 2.40 |
 - Nvidia GPU test environment:
   - Hardware: NVIDIA Tesla T4, single card
...
...
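The updated "after compression" figures in the tables above can be reproduced from the per-task scores and the two latencies. This is a small arithmetic check rather than anything from the commit; the variable names are hypothetical, and the numbers are copied from the added table rows.

```python
from statistics import mean

# Per-task scores of the "pruning with distillation + quantization-aware
# training" row: CoLA, MRPC, QNLI, QQP, RTE, SST2, STSB.
qat_scores = [58.69, 85.05, 90.74, 90.42, 65.34, 92.08, 88.22]

latency_before_ms = 11.60
latency_after_ms = 4.83

# AVG column: arithmetic mean over the seven tasks.
average_accuracy = round(mean(qat_scores), 2)
# Speedup column: latency before compression divided by latency after.
speedup = round(latency_before_ms / latency_after_ms, 2)

if __name__ == "__main__":
    print(average_accuracy, speedup)
```

Both values agree with the table when rounded to two decimals, which suggests the speedup column is the plain ratio of the two latencies.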
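@@ -192,7 +192,17 @@ python run.py --config_path=./configs/cola.yaml --eval True
 ## 4. Inference deployment
-Once the inference model is ready, you can run inference with ```infer.py```, for example:
+Environment setup: to use the Paddle TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
+Launch configuration:
+Besides the basic arguments ```task_name``` (task name), ```model_name_or_path``` (model name), and ```model_path``` (path where the inference model is saved), pass the inference options that match your environment:
+- ```device```: defaults to gpu; one of gpu, cpu, xpu
+- ```use_trt```: whether to use the TensorRT inference engine
+- ```int8```: whether to enable ```INT8```
+- ```fp16```: whether to enable ```FP16```
+Once the inference model is ready, you can run inference with ```infer.py```. For example, to test the FP32 model with the TensorRT inference engine:
 ```shell
 python -u ./infer.py \
     --task_name cola \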
...
...
@@ -201,16 +211,23 @@ python -u ./infer.py \
     --batch_size 1 \
     --max_seq_length 128 \
     --device gpu \
-    --use_trt \
+    --use_trt
 ```
+To test the INT8 model with the TensorRT inference engine:
+```shell
+python -u ./infer.py \
+    --task_name cola \
+    --model_name_or_path bert-base-cased \
+    --model_path ./output/cola/model \
+    --batch_size 1 \
+    --max_seq_length 128 \
+    --device gpu \
+    --use_trt \
+    --int8
+```
-Besides the basic arguments ```task_name``` (task name), ```model_name_or_path``` (model name), and ```model_path``` (path where the inference model is saved), pass the inference options that match your environment:
-- ```device```: defaults to gpu; one of gpu, cpu, xpu
-- ```use_trt```: whether to use the TensorRT inference engine
-- ```int8```: whether to enable ```INT8```
-- ```fp16```: whether to enable ```FP16```
-To use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
 ## 5. FAQ
example/auto_compression/pytorch_huggingface/infer.py @ 61ad6665
@@ -99,24 +99,14 @@ def parse_args():
         default='max_length',
         type=int,
         help="Padding type", )
-    parser.add_argument(
-        "--perf_warmup_steps",
-        default=20,
-        type=int,
-        help="Warmup steps for performance test.", )
     parser.add_argument(
         "--use_trt",
         action='store_true',
         help="Whether to use inference engin TensorRT.", )
-    parser.add_argument(
-        "--perf",
-        action='store_false',
-        help="Whether to test performance.", )
     parser.add_argument(
         "--int8",
         action='store_true',
         help="Whether to use int8 inference.", )
     parser.add_argument(
         "--fp16",
         action='store_true',
...
...
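The options touched in the `parse_args()` hunk above are plain `argparse` switches, and the removed `--perf` option used `store_false`, which is easy to misread. The sketch below rebuilds only those switches to show their default values; it is not the full parser from `infer.py`, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # store_true: False unless the flag is given on the command line.
    parser.add_argument("--use_trt", action="store_true")
    parser.add_argument("--int8", action="store_true")
    parser.add_argument("--fp16", action="store_true")
    # Removed by this commit; shown only to illustrate store_false, which is
    # True by default and becomes False when the flag is passed.
    parser.add_argument("--perf", action="store_false")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]))
    print(parser.parse_args(["--use_trt", "--int8", "--perf"]))
```

With `store_false`, running the old script without `--perf` selected the performance branch, and passing `--perf` switched it off.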
@@ -125,18 +115,6 @@ def parse_args():
     return args
-@paddle.no_grad()
-def evaluate(outputs, metric, data_loader):
-    metric.reset()
-    for i, batch in enumerate(data_loader):
-        input_ids, segment_ids, labels = batch
-        logits = paddle.to_tensor(outputs[i][0])
-        correct = metric.compute(logits, labels)
-        metric.update(correct)
-    res = metric.accumulate()
-    print("acc: %s, " % res, end='')
 def convert_example(example,
                     tokenizer,
                     label_list,
...
@@ -240,95 +218,67 @@ class Predictor(object):
         return cls(predictor, input_handles, output_handles)
-    def predict_batch(self, data):
-        for input_field, input_handle in zip(data, self.input_handles):
-            input_handle.copy_from_cpu(input_field)
-        self.predictor.run()
-        output = [
-            output_handle.copy_to_cpu() for output_handle in self.output_handles
-        ]
-        return output
-    def convert_predict_batch(self, args, data, tokenizer, batchify_fn,
-                              label_list):
-        examples = []
-        for example in data:
-            example = convert_example(
-                example,
-                tokenizer,
-                label_list,
-                task_name=args.task_name,
-                max_seq_length=args.max_seq_length,
-                padding='max_length',
-                return_attention_mask=True)
-            examples.append(example)
+    def predict(self, dataset, collate_fn, batch_size):
+        batch_sampler = paddle.io.BatchSampler(
+            dataset, batch_size=batch_size, shuffle=False)
+        data_loader = paddle.io.DataLoader(
+            dataset=dataset,
+            batch_sampler=batch_sampler,
+            collate_fn=collate_fn,
+            num_workers=0,
+            return_list=True)
+        outputs = []
+        end_time = 0
+        for data in data_loader:
+            for input_field, input_handle in zip(data, self.input_handles):
+                input_handle.copy_from_cpu(input_field.numpy() if isinstance(
+                    input_field, paddle.Tensor) else input_field)
+            for i in range(50):
+                self.predictor.run()
-        return examples
-    def predict(self, dataset, tokenizer, batchify_fn, args):
-        batches = [
-            dataset[idx:idx + args.batch_size]
-            for idx in range(0, len(dataset), args.batch_size)
-        ]
-        if args.perf:
-            for i, batch in enumerate(batches):
-                examples = self.convert_predict_batch(
-                    args, batch, tokenizer, batchify_fn, dataset.label_list)
-                input_ids, atten_mask, segment_ids, label = batchify_fn(
-                    examples)
-                output = self.predict_batch(
-                    [input_ids, atten_mask, segment_ids])
-                if i > args.perf_warmup_steps:
-                    break
             time1 = time.time()
-            for batch in batches:
-                examples = self.convert_predict_batch(
-                    args, batch, tokenizer, batchify_fn, dataset.label_list)
-                input_ids, atten_mask, segment_ids, _ = batchify_fn(examples)
-                output = self.predict_batch(
-                    [input_ids, atten_mask, segment_ids])
-            print("task name: %s, time: %s, " %
-                  (args.task_name, time.time() - time1))
-        else:
-            metric = METRIC_CLASSES[args.task_name]()
-            metric.reset()
-            for i, batch in enumerate(batches):
-                examples = self.convert_predict_batch(
-                    args, batch, tokenizer, batchify_fn, dataset.label_list)
-                input_ids, atten_mask, segment_ids, label = batchify_fn(
-                    examples)
-                output = self.predict_batch(
-                    [input_ids, atten_mask, segment_ids])
-                correct = metric.compute(
-                    paddle.to_tensor(output), paddle.to_tensor(label))
-                metric.update(correct)
-            res = metric.accumulate()
-            print("task name: %s, acc: %s, " % (args.task_name, res), end='')
+            repeats = 1000
+            for i in range(repeats):
+                self.predictor.run()
+                output = [
+                    output_handle.copy_to_cpu()
+                    for output_handle in self.output_handles
+                ]
+            time2 = time.time()
+            end_time = (time2 - time1) / repeats * 1000
+            break
+        print("task name: %s, inference time: %s ms." %
+              (args.task_name, end_time))
 def main():
     paddle.seed(42)
     args = parse_args()
-    predictor = Predictor.create_predictor(args)
     args.task_name = args.task_name.lower()
     args.model_type = args.model_type.lower()
+    predictor = Predictor.create_predictor(args)
     dev_ds = load_dataset('glue', args.task_name, splits='dev')
     tokenizer = BertTokenizer.from_pretrained(args.model_name_or_path)
+    trans_func = partial(
+        convert_example,
+        tokenizer=tokenizer,
+        label_list=dev_ds.label_list,
+        max_seq_length=args.max_seq_length,
+        padding=args.padding,
+        return_attention_mask=True)
+    dev_ds = dev_ds.map(trans_func)
     batchify_fn = lambda samples, fn=Tuple(
         Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input
         Pad(axis=0, pad_val=0),
         Pad(axis=0, pad_val=tokenizer.pad_token_id),  # segment
         Stack(dtype="int64" if dev_ds.label_list else "float32")  # label
     ): fn(samples)
-    outputs = predictor.predict(dev_ds, tokenizer, batchify_fn, args)
+    predictor.predict(dev_ds, batchify_fn, args.batch_size)
 if __name__ == "__main__":
...
...
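The rewritten `predict` method in the diff above measures latency with a fixed pattern: 50 untimed warm-up runs, then 1000 timed runs, reported as the average in milliseconds. The sketch below isolates that pattern so it can be run without Paddle. It is an illustration, not the code from the commit: the function `average_latency_ms` and the stand-in workload are hypothetical, and it uses `time.perf_counter()` where the commit uses `time.time()`.

```python
import time


def average_latency_ms(run_once, warmup=50, repeats=1000):
    """Average wall-clock time of run_once() in milliseconds, after a warm-up."""
    # Warm-up runs are not timed; they absorb one-off costs such as lazy
    # initialisation before the measurement starts.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    elapsed = time.perf_counter() - start
    # Same formula as the diff: total seconds / repeats * 1000.
    return elapsed / repeats * 1000


if __name__ == "__main__":
    def workload():
        # Stand-in for self.predictor.run() plus copying outputs to the CPU.
        return sum(range(1000))

    print("inference time: %.4f ms" % average_latency_ms(workload))
```

In the commit, the timed loop also copies each output back to the CPU, so the reported figure includes that transfer and is measured on the first batch only.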