Update NLP/HF/Clas ACT Docs (#1278)

61ad6665 · Chang Xu · GitHub · ae917097
32 changed files
--- a/example/auto_compression/image_classification/README.md
+++ b/example/auto_compression/image_classification/README.md
@@ -72,7 +72,7 @@ pip install paddleslim
 ```

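 #### 3.2 Prepare the dataset
-This example runs the auto-compression experiments on ImageNet1k by default. If your dataset is not in ImageNet1k format, see the [PaddleClas data preparation guide](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/data_preparation/classification_dataset.md).
+This example runs the auto-compression experiments on ImageNet1k by default. If your dataset is not in ImageNet1k format, see the [PaddleClas data preparation guide](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/data_preparation/classification_dataset.md). Place the downloaded dataset at `./ILSVRC2012` under the current directory.


 #### 3.3 Prepare the inference model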
@@ -99,7 +99,7 @@ export CUDA_VISIBLE_DEVICES=0
 python run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
 ```

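-**Distributed training**
+**Multi-GPU launch**

 Image classification training often involves large amounts of data. ImageNet22k, for example, contains about 14 million images, so single-GPU training is very slow; distributed training can achieve a nearly linear speedup.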

@@ -107,24 +107,30 @@ python run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/Mobil
 export CUDA_VISIBLE_DEVICES=0,1,2,3
 python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' --config_path='./configs/MobileNetV1/qat_dis.yaml'
 ```
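-Multi-GPU training (distributed training) splits a training job across multiple training nodes, each of which performs data loading, the forward pass, and the backward gradient computation, then uploads its gradients to a server node. After receiving gradients from all training nodes, the server node aggregates them and updates the parameters, then sends the parameters back to the training nodes to start the next round. One multi-GPU step processes ```batch size * num gpus``` samples: with a single-GPU ```batch size``` of 32, one step processes 32 samples, whereas four GPUs with a per-GPU ```batch size``` of 32 process 128 samples per step.
+Multi-GPU training splits a training job across multiple training nodes, each of which performs data loading, the forward pass, and the backward gradient computation, then uploads its gradients to a server node. After receiving gradients from all training nodes, the server node aggregates them and updates the parameters, then sends the parameters back to the training nodes to start the next round. One multi-GPU step processes ```batch size * num gpus``` samples: with a single-GPU ```batch size``` of 32, one step processes 32 samples, whereas four GPUs with a per-GPU ```batch size``` of 32 process 128 samples per step.

 Note that ```learning rate``` scales linearly with ```batch size```. Here the single-GPU ```batch size``` is 32 with a ```learning rate``` of 0.015, so if ```batch size``` is reduced by 4x to 8, the ```learning rate``` must also be divided by 4; with multiple GPUs and a per-GPU ```batch size``` of 32, the ```learning rate``` must be multiplied by the number of GPUs. Changing either ```batch size``` or the number of GPUs therefore requires a matching change to ```learning rate```.

+**Verify accuracy**
+
+The training log reports the model's validation accuracy. To evaluate again, set the folder path, model file name, and parameter file name of the model to evaluate (```model_dir, model_filename, params_filename```) in the config file ```./configs/MobileNetV1/qat_dis.yaml```, then run: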
+
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path='./configs/MobileNetV1/qat_dis.yaml'
+```
+

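 ## 4. Inference deployment
 #### 4.1 Python inference

+Environment setup: to use the TensorRT inference engine, install a Paddle build compiled with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)

-Once the inference model is ready, run prediction with the following command: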
-```shell
-python infer.py --config_path="configs/infer.yaml"
-```
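+Configuration file: ```configs/infer.yaml``` provides the following fields for configuring inference:
+- ```inference_model_dir```: directory containing the inference model; it must contain both a .pdmodel file and a .pdiparams file
+- ```model_filename```: name of the model file under inference_model_dir
+- ```params_filename```: name of the parameter file under inference_model_dir

-The config file ```configs/infer.yaml``` provides the following fields for configuring inference:
- ```model_dir```: directory containing the inference model; it must contain both a .pdmodel file and a .pdiparams file
- ```model_filename```: name of the model file under model_dir
- ```params_filename```: name of the parameter file under model_dir
 - ```batch_size```: batch size used for inference
 - ```image_size```: input image size
 - ```use_tensorrt```: whether to use the TensorRT inference engine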
@@ -136,9 +142,13 @@ python infer.py --config_path="configs/infer.yaml"
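 Notes:
 - Check the model's input size: InceptionV3, for example, expects 299, so some models require changing ```image_size```
 - To speed up evaluation, enable ```TensorRT``` when evaluating on ```GPU``` and ```MKL-DNN``` when evaluating on ```CPU```
- To use the TensorRT inference engine, install a Paddle build compiled with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)


+Once the inference model is ready, run prediction with the following command: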
+```shell
+python infer.py --config_path="configs/infer.yaml"
+```
+
 #### 4.2 PaddleLite on-device deployment
 For PaddleLite on-device deployment, see:
 - [Paddle Lite deployment](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_lite_deploy.md)
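
Review note on the learning-rate paragraph in the README hunk above: the linear scaling it describes can be sketched as follows. This is only an illustration of the stated rule, using the README's reference values (```batch size``` 32, ```learning rate``` 0.015); the function name and its parameters are hypothetical and are not part of `run.py` or the YAML configs.

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size, num_gpus=1):
    """Scale the learning rate linearly with the effective batch size.

    Hypothetical helper: the effective batch size is the per-GPU batch size
    multiplied by the number of GPUs, as described in the README.
    """
    effective_batch_size = batch_size * num_gpus
    return base_lr * effective_batch_size / base_batch_size


# Reference setting from the README: single GPU, batch size 32, lr 0.015.
BASE_LR, BASE_BATCH_SIZE = 0.015, 32

# Single GPU with the batch size reduced 4x to 8: the lr is divided by 4.
lr_small_batch = scaled_learning_rate(BASE_LR, BASE_BATCH_SIZE, batch_size=8)

# Four GPUs with a per-GPU batch size of 32: the lr is multiplied by 4.
lr_four_gpus = scaled_learning_rate(
    BASE_LR, BASE_BATCH_SIZE, batch_size=32, num_gpus=4)

print(lr_small_batch, lr_four_gpus)
```

The resulting value would then be written by hand into the learning-rate field of the chosen config before launching with `paddle.distributed.launch`.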

--- a/example/auto_compression/image_classification/configs/EfficientNetB0/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/EfficientNetB0/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/GhostNet_x1_0/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/GhostNet_x1_0/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/InceptionV3/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/InceptionV3/prune_dis.yaml
@@ -6,7 +6,7 @@ Global:
  batch_size: 32
  resize_size: 320
  crop_size: 299
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml
@@ -6,7 +6,7 @@ Global:
  batch_size: 32
  resize_size: 320
  img_size: 299
-  data_dir: /workspace/dataset/ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/MobileNetV1/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/MobileNetV1/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/PPHGNet_tiny/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/PPHGNet_tiny/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/PPLCNetV2_base/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/PPLCNetV2_base/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/PPLCNet_x1_0/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/PPLCNet_x1_0/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012
  
 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/ResNet50_vd/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/ResNet50_vd/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/SqueezeNet1_0/prune_dis.yaml
+++ b/example/auto_compression/image_classification/configs/SqueezeNet1_0/prune_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0

--- a/example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml
@@ -4,7 +4,7 @@ Global:
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  batch_size: 32
-  data_dir: /ILSVRC2012
+  data_dir: ./ILSVRC2012

 Distillation:
  alpha: 1.0
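
Review note on the `data_dir` change repeated across the config hunks above: `/ILSVRC2012` is an absolute path at the filesystem root, while `./ILSVRC2012` is resolved against the directory the process is started from, which matches the README instruction to place the dataset in the current directory. The sketch below only illustrates that difference on a POSIX system; the helper and the `/work/image_classification` directory are hypothetical, and how `run.py` itself consumes `data_dir` is not shown in this diff.

```python
import os


def resolve_data_dir(data_dir, base_dir=None):
    """Return the absolute path that data_dir refers to.

    Relative paths are resolved against base_dir (default: the current
    working directory); absolute paths are returned unchanged.
    """
    base_dir = base_dir or os.getcwd()
    return os.path.normpath(os.path.join(base_dir, data_dir))


# Hypothetical launch directory, used only for illustration.
launch_dir = "/work/image_classification"

old_path = resolve_data_dir("/ILSVRC2012", base_dir=launch_dir)
new_path = resolve_data_dir("./ILSVRC2012", base_dir=launch_dir)

print(old_path)
print(new_path)
```

With the new value, the commands have to be run from the directory that contains `ILSVRC2012`, otherwise the relative path points somewhere else.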

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 6
  eval_iter: 1070

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 100
  eval_iter: 70

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 6
  eval_iter: 2000

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 16
  eval_iter: 1000

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 12
  eval_iter: 750

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 20
  eval_iter: 1050

--- a/example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml
@@ -6,6 +6,11 @@ Global:
  dataset: clue
  batch_size: 16
  max_seq_length: 128
+TransformerPrune:
+  pruned_ratio: 0.25
+HyperParameterOptimization:
+Distillation:
+Quantization:
 TrainConfig:
  epochs: 6
  eval_iter: 1110

--- a/example/auto_compression/pytorch_huggingface/README.md
+++ b/example/auto_compression/pytorch_huggingface/README.md
@@ -15,7 +15,7 @@
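 [X2Paddle](https://github.com/PaddlePaddle/X2Paddle), PaddlePaddle's model conversion tool, converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, PaddleSlim's auto-compression can be applied conveniently to inference models from a variety of frameworks.


-This example uses a natural language processing model from the [PyTorch](https://github.com/pytorch/pytorch) framework to show how to auto-compress NLP models from other frameworks. It relies on the open-source transformers library from [huggingface](https://github.com/huggingface/transformers), converts the PyTorch model into a Paddle model, and then compresses it with ACT. The auto-compression strategy used here is pruning with distillation plus offline quantization (```Post-training quantization```).
+This example uses a natural language processing model from the [PyTorch](https://github.com/pytorch/pytorch) framework to show how to auto-compress NLP models from other frameworks. It relies on the open-source transformers library from [huggingface](https://github.com/huggingface/transformers), converts the PyTorch model into a Paddle model, and then compresses it with ACT. The auto-compression strategy used here is pruning with distillation plus quantization-aware training.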



@@ -27,13 +27,13 @@
 | Model | Strategy | CoLA | MRPC | QNLI | QQP | RTE | SST2  | STSB  | AVG |
 |:------:|:------:|:------:|:------:|:-----------:|:------:|:------:|:------:|:------:|:------:|
 | bert-base-cased | Base model | 60.06 | 84.31 | 90.68 | 90.84 | 63.53 | 91.63  | 88.46 |  81.35  |
-| bert-base-cased | Pruning + distillation + offline quantization | 60.52 | 84.80 | 90.59 | 90.42 | 64.26 | 91.63 | 88.51 |  81.53 |
+| bert-base-cased | Pruning + distillation + quantization-aware training | 58.69 | 85.05 | 90.74 | 90.42 | 65.34 | 92.08 | 88.22 |  81.51 |

 Average accuracy across tasks and the speedup are compared below:
 |  bert-base-cased | Accuracy (avg) | Latency (ms) | Speedup |
 |:-------:|:----------:|:------------:| :------:|
 | Before compression |  81.35 | 11.60 | - |
-| After compression |  81.53 | 4.83 | 2.40 |
+| After compression |  81.51 | 4.83 | 2.40 |

 - Nvidia GPU test environment:
   - Hardware: a single NVIDIA Tesla T4
@@ -192,7 +192,17 @@ python run.py --config_path=./configs/cola.yaml  --eval True

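 ## 4. Inference deployment

-Once the inference model is ready, run prediction with ```infer.py```, for example:
+Environment setup: to use the Paddle TensorRT inference engine, install a Paddle build compiled with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
+
+Launch configuration:
+
+Besides the basic arguments ```task_name``` (task name), ```model_name_or_path``` (model name), and ```model_path``` (path of the saved inference model), pass the following arguments to match your inference environment:
+- ```device```: defaults to gpu; one of gpu, cpu, xpu
+- ```use_trt```: whether to use the TensorRT inference engine
+- ```int8```: whether to enable ```INT8```
+- ```fp16```: whether to enable ```FP16```
+
+Once the inference model is ready, run prediction with ```infer.py```. For example, to test the FP32 model with the TensorRT inference engine: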
 ```shell
 python -u ./infer.py \
    --task_name cola \
@@ -201,16 +211,23 @@ python -u ./infer.py \
    --batch_size 1 \
    --max_seq_length 128 \
    --device gpu \
-    --use_trt \  
+    --use_trt
+```
+
+To test the INT8 model with the TensorRT inference engine:
+```shell
+python -u ./infer.py \
+    --task_name cola \
+    --model_name_or_path bert-base-cased \
+    --model_path ./output/cola/model \
+    --batch_size 1 \
+    --max_seq_length 128 \
+    --device gpu \
+    --use_trt \
+    --int8
 ```

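-Besides the basic arguments ```task_name``` (task name), ```model_name_or_path``` (model name), and ```model_path``` (path of the saved inference model), pass the following arguments to match your inference environment:
- ```device```: defaults to gpu; one of gpu, cpu, xpu
- ```use_trt```: whether to use the TensorRT inference engine
- ```int8```: whether to enable ```INT8```
- ```fp16```: whether to enable ```FP16```


-To use the TensorRT inference engine, install a Paddle build compiled with ```WITH_TRT=ON```. Download: [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)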

 ## 5. FAQ
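
Review note on the updated result tables in the README hunk above: the new AVG and the unchanged speedup can be cross-checked from the per-task scores and latencies listed there. The snippet is a small arithmetic check with hypothetical variable names; it assumes that AVG is the plain mean of the seven task scores and that the speedup is the ratio of the two latencies.

```python
# Per-task scores of the compressed bert-base-cased model (added table row).
compressed_scores = {
    "CoLA": 58.69, "MRPC": 85.05, "QNLI": 90.74, "QQP": 90.42,
    "RTE": 65.34, "SST2": 92.08, "STSB": 88.22,
}

# Latencies in milliseconds from the comparison table.
latency_before_ms = 11.60
latency_after_ms = 4.83

average = sum(compressed_scores.values()) / len(compressed_scores)
speedup = latency_before_ms / latency_after_ms

print(f"AVG: {average:.2f}, speedup: {speedup:.2f}")
```

Under those assumptions, both values round to the figures in the added rows (81.51 and 2.40).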
--- a/example/auto_compression/pytorch_huggingface/infer.py
+++ b/example/auto_compression/pytorch_huggingface/infer.py
@@ -99,24 +99,14 @@ def parse_args():
        default='max_length',
        type=int,
        help="Padding type", )
-    parser.add_argument(
-        "--perf_warmup_steps",
-        default=20,
-        type=int,
-        help="Warmup steps for performance test.", )
    parser.add_argument(
        "--use_trt",
        action='store_true',
        help="Whether to use inference engin TensorRT.", )
-    parser.add_argument(
-        "--perf",
-        action='store_false',
-        help="Whether to test performance.", )
    parser.add_argument(
        "--int8",
        action='store_true',
        help="Whether to use int8 inference.", )
-
    parser.add_argument(
        "--fp16",
        action='store_true',
@@ -125,18 +115,6 @@ def parse_args():
    return args


-@paddle.no_grad()
-def evaluate(outputs, metric, data_loader):
-    metric.reset()
-    for i, batch in enumerate(data_loader):
-        input_ids, segment_ids, labels = batch
-        logits = paddle.to_tensor(outputs[i][0])
-        correct = metric.compute(logits, labels)
-        metric.update(correct)
-    res = metric.accumulate()
-    print("acc: %s, " % res, end='')
-
-
 def convert_example(example,
                    tokenizer,
                    label_list,
@@ -240,95 +218,67 @@ class Predictor(object):

        return cls(predictor, input_handles, output_handles)

-    def predict_batch(self, data):
-        for input_field, input_handle in zip(data, self.input_handles):
-            input_handle.copy_from_cpu(input_field)
-        self.predictor.run()
-        output = [
-            output_handle.copy_to_cpu() for output_handle in self.output_handles
-        ]
-        return output
-
-    def convert_predict_batch(self, args, data, tokenizer, batchify_fn,
-                              label_list):
-        examples = []
-        for example in data:
-            example = convert_example(
-                example,
-                tokenizer,
-                label_list,
-                task_name=args.task_name,
-                max_seq_length=args.max_seq_length,
-                padding='max_length',
-                return_attention_mask=True)
-            examples.append(example)
+    def predict(self, dataset, collate_fn, batch_size):
+        batch_sampler = paddle.io.BatchSampler(
+            dataset, batch_size=batch_size, shuffle=False)
+        data_loader = paddle.io.DataLoader(
+            dataset=dataset,
+            batch_sampler=batch_sampler,
+            collate_fn=collate_fn,
+            num_workers=0,
+            return_list=True)
+        outputs = []
+        end_time = 0
+        for data in data_loader:
+            for input_field, input_handle in zip(data, self.input_handles):
+                input_handle.copy_from_cpu(input_field.numpy() if isinstance(
+                    input_field, paddle.Tensor) else input_field)
+            for i in range(50):
+                self.predictor.run()

-        return examples
-
-    def predict(self, dataset, tokenizer, batchify_fn, args):
-        batches = [
-            dataset[idx:idx + args.batch_size]
-            for idx in range(0, len(dataset), args.batch_size)
-        ]
-        if args.perf:
-            for i, batch in enumerate(batches):
-                examples = self.convert_predict_batch(
-                    args, batch, tokenizer, batchify_fn, dataset.label_list)
-                input_ids, atten_mask, segment_ids, label = batchify_fn(
-                    examples)
-                output = self.predict_batch(
-                    [input_ids, atten_mask, segment_ids])
-                if i > args.perf_warmup_steps:
-                    break
            time1 = time.time()
-            for batch in batches:
-                examples = self.convert_predict_batch(
-                    args, batch, tokenizer, batchify_fn, dataset.label_list)
-                input_ids, atten_mask, segment_ids, _ = batchify_fn(examples)
-                output = self.predict_batch(
-                    [input_ids, atten_mask, segment_ids])
-
-            print("task name: %s, time: %s, " %
-                  (args.task_name, time.time() - time1))
-
-        else:
-            metric = METRIC_CLASSES[args.task_name]()
-            metric.reset()
-            for i, batch in enumerate(batches):
-                examples = self.convert_predict_batch(
-                    args, batch, tokenizer, batchify_fn, dataset.label_list)
-                input_ids, atten_mask, segment_ids, label = batchify_fn(
-                    examples)
-                output = self.predict_batch(
-                    [input_ids, atten_mask, segment_ids])
-                correct = metric.compute(
-                    paddle.to_tensor(output), paddle.to_tensor(label))
-                metric.update(correct)
-
-            res = metric.accumulate()
-            print("task name: %s, acc: %s, " % (args.task_name, res), end='')
+            repeats = 1000
+            for i in range(repeats):
+                self.predictor.run()
+                output = [
+                    output_handle.copy_to_cpu()
+                    for output_handle in self.output_handles
+                ]
+            time2 = time.time()
+            end_time = (time2 - time1) / repeats * 1000
+            break
+        print("task name: %s, inference time: %s ms." %
+              (args.task_name, end_time))


 def main():
    paddle.seed(42)
    args = parse_args()

+    predictor = Predictor.create_predictor(args)
+
    args.task_name = args.task_name.lower()
    args.model_type = args.model_type.lower()

-    predictor = Predictor.create_predictor(args)
-
    dev_ds = load_dataset('glue', args.task_name, splits='dev')
-
    tokenizer = BertTokenizer.from_pretrained(args.model_name_or_path)

+    trans_func = partial(
+        convert_example,
+        tokenizer=tokenizer,
+        label_list=dev_ds.label_list,
+        max_seq_length=args.max_seq_length,
+        padding=args.padding,
+        return_attention_mask=True)
+
+    dev_ds = dev_ds.map(trans_func)
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input
        Pad(axis=0, pad_val=0),
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # segment
        Stack(dtype="int64" if dev_ds.label_list else "float32")  # label
    ): fn(samples)
-    outputs = predictor.predict(dev_ds, tokenizer, batchify_fn, args)
+    predictor.predict(dev_ds, batchify_fn, args.batch_size)


 if __name__ == "__main__":