Unverified commit fa7227e5 authored by: G Guanghua Yu, committed by: GitHub

add ptq docs and demo (#5451)

* add ptq docs and demo

* fix readme

* update readme
Parent 4f732271
# MobileNetV3
## Contents
- [1. Introduction](#1)
- [2. Post-Training Quantization](#2)
    - [2.1 Prepare the Inference Model and Environment](#2.1)
    - [2.2 Run Post-Training Quantization](#2.2)
    - [2.3 Verify the Inference Results](#2.3)
- [3. FAQ](#3)
<a name="1"></a>
## 1. Introduction
Static post-training quantization in Paddle uses a small amount of calibration data to compute quantization factors, so an FP32 model can be quickly quantized into a low-bit model (most commonly INT8). Running inference with the quantized model reduces computation, memory footprint, and model size.
This document walks through post-training quantization of the Paddle MobileNetV3 model.
For more background on post-training quantization of Paddle models, see the [official quant_post_static tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static).
<a name="2"></a>
## 2. Post-Training Quantization
<a name="2.1"></a>
### 2.1 Prepare the Inference Model and Environment
Post-training quantization quantizes an Inference model directly and does not depend on the model-building code, so the Inference model needs to be prepared in advance.
We provide a MobileNetV3-small Inference model exported via dynamic-to-static conversion; it can be downloaded directly from [mobilenet_v3_small_infer](https://paddle-model-ecology.bj.bcebos.com/model/mobilenetv3_reprod/mobilenet_v3_small_infer.tar).
```shell
wget https://paddle-model-ecology.bj.bcebos.com/model/mobilenetv3_reprod/mobilenet_v3_small_infer.tar
tar -xf mobilenet_v3_small_infer.tar
```
Alternatively, you can convert the MobileNetV3-small model into an Inference model by following the [MobileNetV3 dynamic-to-static workflow](xxx).
Environment setup:
- Install PaddleSlim:
```shell
pip install paddleslim==2.2.1
```
- Install PaddlePaddle:
```shell
pip install paddlepaddle-gpu==2.2.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
- Prepare the data:
Please refer to the [data preparation guide](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6#32-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE).
<a name="2.2"></a>
### 2.2 Run Post-Training Quantization
Launch post-training quantization:
```bash
python post_quant.py --model_path=mobilenet_v3_small_infer/ \
    --model_filename=inference.pdmodel \
    --params_filename=inference.pdiparams \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --use_gpu=True \
    --batch_size=32 \
    --batch_num=20
```
Part of the quantization log is shown below:
```
Thu Dec 30 12:36:17-INFO: Collect quantized variable names ...
Thu Dec 30 12:36:17-INFO: Preparation stage ...
Thu Dec 30 12:36:27-INFO: Run batch: 0
Thu Dec 30 12:37:10-INFO: Run batch: 5
Thu Dec 30 12:37:43-INFO: Finish preparation stage, all batch:10
Thu Dec 30 12:37:43-INFO: Sampling stage ...
Thu Dec 30 12:38:10-INFO: Run batch: 0
Thu Dec 30 12:39:03-INFO: Run batch: 5
Thu Dec 30 12:39:46-INFO: Finish sampling stage, all batch: 10
Thu Dec 30 12:39:46-INFO: Calculate hist threshold ...
Thu Dec 30 12:39:47-INFO: Update the program ...
Thu Dec 30 12:39:49-INFO: The quantized model is saved in output/mv3_int8_infer
```
After post-training quantization finishes, the quantized Inference model is generated under `output_dir`.
<a name="2.3"></a>
### 2.3 Verify the Inference Results
- Rename the quantized inference model files:
`__model__` needs to be renamed to `inference.pdmodel`, and `__params__` to `inference.pdiparams`.
The correct layout is as follows:
```shell
output/mv3_int8_infer/
|----inference.pdiparams : model parameters file (formerly __params__)
|----inference.pdmodel   : model structure file (formerly __model__)
```
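For example, the renaming can be done with two `mv` commands:
```shell
mv output/mv3_int8_infer/__model__ output/mv3_int8_infer/inference.pdmodel
mv output/mv3_int8_infer/__params__ output/mv3_int8_infer/inference.pdiparams
```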
- Use Paddle Inference to check that the quantized model's predictions are correct:
For the detailed testing procedure, see the [Paddle Inference guide](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/inference_python/README.md).
If you also want to verify the quantized model's accuracy on the full validation set, follow the steps below.
Use the following commands to evaluate the accuracy of the MobileNetV3-small model:
- FP32 model:
```bash
python eval.py --model_path=mobilenet_v3_small_infer/ \
    --model_filename=inference.pdmodel \
    --params_filename=inference.pdiparams \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --batch_size=128 \
    --use_gpu=True
```
The FP32 model's accuracy log is shown below:
```
batch_id 300, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 310, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 320, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 330, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 340, acc1 0.601, acc5 0.825, avg time 0.00005 sec/img
batch_id 350, acc1 0.601, acc5 0.825, avg time 0.00005 sec/img
batch_id 360, acc1 0.602, acc5 0.826, avg time 0.00005 sec/img
batch_id 370, acc1 0.602, acc5 0.826, avg time 0.00005 sec/img
batch_id 380, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 390, acc1 0.601, acc5 0.825, avg time 0.00005 sec/img
End test: test image 50000.0
test_acc1 0.6015, test_acc5 0.8253, avg time 0.00005 sec/img
```
- Quantized model:
```shell
python eval.py --model_path=output/mv3_int8_infer/ \
    --model_filename=__model__ \
    --params_filename=__params__ \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --batch_size=128 \
    --use_gpu=True
```
The quantized model's accuracy log is shown below:
```
batch_id 300, acc1 0.564, acc5 0.800, avg time 0.00006 sec/img
batch_id 310, acc1 0.562, acc5 0.798, avg time 0.00006 sec/img
batch_id 320, acc1 0.560, acc5 0.796, avg time 0.00006 sec/img
batch_id 330, acc1 0.556, acc5 0.792, avg time 0.00006 sec/img
batch_id 340, acc1 0.554, acc5 0.792, avg time 0.00006 sec/img
batch_id 350, acc1 0.552, acc5 0.790, avg time 0.00006 sec/img
batch_id 360, acc1 0.550, acc5 0.789, avg time 0.00006 sec/img
batch_id 370, acc1 0.551, acc5 0.789, avg time 0.00006 sec/img
batch_id 380, acc1 0.551, acc5 0.789, avg time 0.00006 sec/img
batch_id 390, acc1 0.553, acc5 0.790, avg time 0.00006 sec/img
End test: test image 50000.0
test_acc1 0.5530, test_acc5 0.7905, avg time 0.00006 sec/img
```
<a name="3"></a>
## 3. FAQ
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
import time
import sys
import argparse
import math
sys.path[0] = os.path.join(
    os.path.dirname(__file__), os.path.pardir, os.path.pardir)
import paddle
import paddle.inference as paddle_infer
from presets import ClassificationPresetEval
import paddlevision
def eval():
    # create predictor
    model_file = os.path.join(FLAGS.model_path, FLAGS.model_filename)
    params_file = os.path.join(FLAGS.model_path, FLAGS.params_filename)
    config = paddle_infer.Config(model_file, params_file)
    if FLAGS.use_gpu:
        config.enable_use_gpu(1000, 0)
    if not FLAGS.ir_optim:
        config.switch_ir_optim(False)
    predictor = paddle_infer.create_predictor(config)
    input_names = predictor.get_input_names()
    input_handle = predictor.get_input_handle(input_names[0])
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])

    # prepare data
    resize_size, crop_size = (256, 224)
    val_dataset = paddlevision.datasets.ImageFolder(
        os.path.join(FLAGS.data_dir, 'val'),
        ClassificationPresetEval(
            crop_size=crop_size, resize_size=resize_size))
    eval_loader = paddle.io.DataLoader(
        val_dataset, batch_size=FLAGS.batch_size, num_workers=5)

    cost_time = 0.
    total_num = 0.
    correct_1_num = 0
    correct_5_num = 0
    for batch_id, data in enumerate(eval_loader()):
        # set input
        img_np = np.array([tensor.numpy() for tensor in data[0]])
        label_np = np.array([tensor.numpy() for tensor in data[1]])
        input_handle.reshape(img_np.shape)
        input_handle.copy_from_cpu(img_np)

        # run
        t1 = time.time()
        predictor.run()
        t2 = time.time()
        cost_time += (t2 - t1)

        output_data = output_handle.copy_to_cpu()

        # calculate accuracy
        for i in range(len(label_np)):
            label = label_np[i][0]
            result = output_data[i, :]
            index = result.argsort()
            total_num += 1
            if index[-1] == label:
                correct_1_num += 1
            if label in index[-5:]:
                correct_5_num += 1

        if batch_id % 10 == 0:
            acc1 = correct_1_num / total_num
            acc5 = correct_5_num / total_num
            avg_time = cost_time / total_num
            print(
                "batch_id {}, acc1 {:.3f}, acc5 {:.3f}, avg time {:.5f} sec/img".
                format(batch_id, acc1, acc5, avg_time))

    acc1 = correct_1_num / total_num
    acc5 = correct_5_num / total_num
    avg_time = cost_time / total_num
    print("End test: test image {}".format(total_num))
    print("test_acc1 {:.4f}, test_acc5 {:.4f}, avg time {:.5f} sec/img".format(
        acc1, acc5, avg_time))
    print("\n")
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        '--model_path', type=str, default="", help="The inference model path.")
    parser.add_argument(
        '--model_filename',
        type=str,
        default="model.pdmodel",
        help="model filename")
    parser.add_argument(
        '--params_filename',
        type=str,
        default="model.pdiparams",
        help="params filename")
    parser.add_argument(
        '--data_dir',
        type=str,
        default="dataset/ILSVRC2012/",
        help="The ImageNet dataset root dir.")
    parser.add_argument(
        '--batch_size', type=int, default=10, help="Batch size.")
    # NOTE: argparse's type=bool treats any non-empty string (even "False") as True
    parser.add_argument(
        '--use_gpu', type=bool, default=False, help="Whether to use GPU or not.")
    parser.add_argument(
        '--ir_optim', type=bool, default=False, help="Enable ir optim.")

    FLAGS = parser.parse_args()
    eval()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import numpy as np
from PIL import Image
sys.path[0] = os.path.join(
    os.path.dirname(__file__), os.path.pardir, os.path.pardir)
import paddle
import paddlevision
from presets import ClassificationPresetEval
from paddleslim.quant import quant_post_static
def sample_generator(loader):
    # wrap a paddle.io.DataLoader into the generator interface
    # expected by quant_post_static
    def __reader__():
        for indx, data in enumerate(loader):
            images = np.array(data[0])
            yield images

    return __reader__
def main():
    paddle.enable_static()
    place = paddle.CUDAPlace(0) if FLAGS.use_gpu else paddle.CPUPlace()

    resize_size, crop_size = (256, 224)
    val_dataset = paddlevision.datasets.ImageFolder(
        os.path.join(FLAGS.data_dir, 'val'),
        ClassificationPresetEval(
            crop_size=crop_size, resize_size=resize_size))
    data_loader = paddle.io.DataLoader(
        val_dataset, places=place, batch_size=FLAGS.batch_size)

    quant_output_dir = os.path.join(FLAGS.output_dir, "mv3_int8_infer")
    exe = paddle.static.Executor(place)
    quant_post_static(
        executor=exe,
        model_dir=FLAGS.model_path,
        quantize_model_path=quant_output_dir,
        sample_generator=sample_generator(data_loader),
        model_filename=FLAGS.model_filename,
        params_filename=FLAGS.params_filename,
        batch_size=FLAGS.batch_size,
        batch_nums=FLAGS.batch_num,
        algo=FLAGS.algo,
        hist_percent=FLAGS.hist_percent)
if __name__ == '__main__':
    parser = argparse.ArgumentParser("Quantization on ImageNet")
    parser.add_argument(
        "--model_path", type=str, default=None, help="Inference model path")
    parser.add_argument(
        "--model_filename",
        type=str,
        default=None,
        help="Inference model model_filename")
    parser.add_argument(
        "--params_filename",
        type=str,
        default=None,
        help="Inference model params_filename")
    parser.add_argument(
        "--output_dir", type=str, default='output', help="save dir")
    parser.add_argument(
        '--data_dir',
        default="/dataset/ILSVRC2012",
        help='path to dataset (should have subdirectories named "train" and "val")'
    )
    parser.add_argument(
        '--use_gpu',
        default=True,
        type=bool,
        help='Whether to use GPU or not.')
    # quantization settings
    parser.add_argument(
        "--batch_num", default=10, type=int, help="batch num for quant")
    parser.add_argument(
        "--batch_size", default=10, type=int, help="batch size for quant")
    parser.add_argument(
        '--algo', default='hist', type=str, help="calibration algorithm")
    parser.add_argument(
        '--hist_percent',
        default=0.999,
        type=float,
        help="The percentile of algo:hist")

    FLAGS = parser.parse_args()
    assert FLAGS.data_dir, "error: must provide data path"
    main()
# Linux GPU/CPU Post-Training Quantization Development Guide
# Contents
- [1. Introduction](#1)
- [2. Developing Post-Training Quantization](#2)
    - [2.1 Prepare Calibration Data and Environment](#2.1)
    - [2.2 Prepare the Inference Model](#2.2)
    - [2.3 Prepare the Post-Training Quantization Code](#2.3)
    - [2.4 Run Post-Training Quantization](#2.4)
    - [2.5 Verify the Inference Results](#2.5)
- [3. FAQ](#3)
    - [3.1 General Questions](#3.1)
<a name="1"></a>
## 1. Introduction
Static post-training quantization in Paddle uses a small amount of calibration data to compute quantization factors, so an FP32 model can be quickly quantized into a low-bit model (most commonly INT8). Running inference with the quantized model reduces computation, memory footprint, and model size.
For more background on post-training quantization of Paddle models, see the [official quant_post_static tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static).
<a name="2"></a>
## 2. Developing Post-Training Quantization
Developing post-training quantization in Paddle can be divided into four steps, as shown in the figure below.
<div align="center">
<img src="../images/post_training_quant_guide.png" width="600">
</div>
Two checkpoints are set along the way:
* Prepare the inference model
* Verify that the quantized model's inference results are correct
<a name="2.1"></a>
### 2.1 Prepare Calibration Data and Environment
**[Prepare calibration data]**
Post-training quantization needs the per-layer scale values of the network's activations in order to map value ranges, so a moderate amount of data has to be run through the network's forward pass. A calibration dataset therefore needs to be prepared in advance.
Taking the ImageNet1k dataset as an example, see the [data preparation guide](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6#32-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE).
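As an illustration of what the calibration statistics are used for: in symmetric INT8 quantization, a threshold chosen from the collected activations yields a scale that maps FP32 values onto the int8 range. Below is a simplified sketch; the real calibration algorithms such as `hist` or `KL` pick the threshold more carefully than the plain max used here:
```python
import numpy as np

def fake_quant_int8(x, threshold):
    # scale maps [-threshold, threshold] onto the int8 range [-127, 127]
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# stand-in for activations collected while running calibration batches
activations = np.random.randn(1000).astype("float32")
q, scale = fake_quant_int8(activations, threshold=np.abs(activations).max())
dequantized = q.astype("float32") * scale  # approximate FP32 reconstruction
```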
**[Prepare the development environment]**
- Make sure PaddlePaddle is installed. The pip command for the Linux build is given below; for other installation options, see the [PaddlePaddle website](https://www.paddlepaddle.org.cn/).
- Make sure PaddleSlim is installed. The pip command is given below; for other installation options, see [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
```
pip install paddlepaddle-gpu==2.2.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
pip install paddleslim==2.2.1
```
<a name="2.2"></a>
### 2.2 Prepare the Inference Model
**[Basic workflow]**
Preparing the inference model takes three steps:
- Step 1: Define a network model that inherits from `paddle.nn.Layer`
- Step 2: Use the `paddle.jit.save` API to convert the model from dynamic to static graph and export it as an Inference model
- Step 3: Check that the `model.pdmodel` and `model.pdiparams` files were generated under the export path
**[Hands-on]**
For the network definition, refer to [mobilenet_v3](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/paddlevision/models/mobilenet_v3.py):
```python
fp32_model = mobilenet_v3_small()
fp32_model.eval()
```
Then convert the model from dynamic to static graph:
```python
# save inference model
input_spec = paddle.static.InputSpec(
    shape=[None, 3, 224, 224], dtype='float32')
fp32_output_model_path = os.path.join("mv3_fp32_infer", "model")
paddle.jit.save(fp32_model, fp32_output_model_path, [input_spec])
```
This generates the two files `model.pdmodel` and `model.pdiparams` under the `mv3_fp32_infer` folder.
<a name="2.3"></a>
### 2.3 Prepare the Post-Training Quantization Code
**[Basic workflow]**
Based on PaddleSlim, quantize the model offline with the `paddleslim.quant.quant_post_static` API:
- Step 1: Define a `sample_generator` that wraps a `paddle.io.DataLoader` instance and iterates over the calibration dataset
- Step 2: Define an Executor. The model being quantized is an Inference model and calibration also runs in static-graph mode, so a static-graph Executor is needed to execute the calibration
**[Hands-on]**
1) Define the dataset; the [dataset definitions](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/paddlevision/datasets/vision.py) can be used as a reference.
2) Define the `sample_generator`:
```python
def sample_generator(loader):
    def __reader__():
        for indx, data in enumerate(loader):
            images = np.array(data[0])
            yield images

    return __reader__
```
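The `loader` argument here is the `paddle.io.DataLoader` built from the calibration dataset defined in step 1, and is the `data_loader` reused by the `quant_post_static` call in section 2.4. A minimal sketch, modeled on this repo's `post_quant.py` (the dataset path and the `ClassificationPresetEval` preprocessing helper are assumptions taken from that script):
```python
import os

import paddle
import paddlevision
from presets import ClassificationPresetEval  # preprocessing helper from this repo

resize_size, crop_size = (256, 224)
val_dataset = paddlevision.datasets.ImageFolder(
    os.path.join("/path/dataset/ILSVRC2012", "val"),  # assumed dataset root
    ClassificationPresetEval(crop_size=crop_size, resize_size=resize_size))
data_loader = paddle.io.DataLoader(val_dataset, batch_size=32)
```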
3) Define the Executor:
```python
use_gpu = True
place = paddle.CUDAPlace(0) if use_gpu else paddle.CPUPlace()
exe = paddle.static.Executor(place)
```
<a name="2.4"></a>
### 2.4 Run Post-Training Quantization
**[Basic workflow]**
Start post-training quantization with the `quant_post_static` API from PaddleSlim:
- Step 1: Import the `quant_post_static` API
```python
from paddleslim.quant import quant_post_static
```
- Step 2: Configure the arguments to `quant_post_static` and start post-training quantization
```python
fp32_model_dir = 'mv3_fp32_infer'
quant_output_dir = 'quant_model'
quant_post_static(
    executor=exe,
    model_dir=fp32_model_dir,
    quantize_model_path=quant_output_dir,
    sample_generator=sample_generator(data_loader),
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    batch_size=32,
    batch_nums=10,
    algo='KL')
```
- Step 3: Check the output and make sure the quantized `__model__` and `__params__` files were generated; a quick check is sketched below.
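A minimal sketch of that check, assuming the `quant_output_dir` value (`quant_model`) from the snippet above:
```python
import os

for fname in ("__model__", "__params__"):
    path = os.path.join("quant_model", fname)
    assert os.path.exists(path), "missing quantized model file: {}".format(path)
```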
**[Hands-on]**
To run post-training quantization end to end, see the MobileNetV3 [post-training quantization code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/post_quant.py).
<a name="2.5"></a>
### 2.5 Verify the Inference Results
**[Basic workflow]**
Test the quantized model with the Paddle Inference library and make sure its accuracy meets expectations.
- Step 1: Initialize the `paddle.inference` predictor and set its configuration
```python
import paddle.inference as paddle_infer
model_file = os.path.join('quant_model', '__model__')
params_file = os.path.join('quant_model', '__params__')
config = paddle_infer.Config(model_file, params_file)
if FLAGS.use_gpu:
    config.enable_use_gpu(1000, 0)
if not FLAGS.ir_optim:
    config.switch_ir_optim(False)
predictor = paddle_infer.create_predictor(config)
```
- Step 2: Configure the predictor's inputs and outputs
```python
input_names = predictor.get_input_names()
input_handle = predictor.get_input_handle(input_names[0])
output_names = predictor.get_output_names()
output_handle = predictor.get_output_handle(output_names[0])
```
- Step 3: Run prediction and check that the results are correct
```python
input_handle.copy_from_cpu(img_np)
predictor.run()
output_data = output_handle.copy_to_cpu()
```
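For a quick smoke test before wiring in real data, `img_np` can be a random batch with the input shape used during export; a minimal sketch (the input shape is assumed from section 2.2, the class count from ImageNet1k):
```python
import numpy as np

img_np = np.random.rand(1, 3, 224, 224).astype("float32")  # placeholder input batch
input_handle.reshape(img_np.shape)
input_handle.copy_from_cpu(img_np)
predictor.run()
output_data = output_handle.copy_to_cpu()  # logits with shape [1, 1000]
print("predicted class id:", output_data.argmax(axis=1))
```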
**[Hands-on]**
1) Initialize the `paddle.inference` predictor and set its configuration:
See the MobileNetV3 [Inference model test code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py) for details.
2) Configure the predictor's inputs and outputs:
See the MobileNetV3 [Inference model test code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py) for details.
3) Run prediction:
See the MobileNetV3 [Inference model test code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py) for details.
4) Test whether single-image prediction results are correct; see the [Paddle Inference guide](https://github.com/PaddlePaddle/models/blob/release/2.2/docs/tipc/train_infer_python/infer_python.md).
5) You can also evaluate the accuracy of both the quantized model and the FP32 model to make sure the accuracy loss from quantization meets expectations; see the [MobileNet quantized model accuracy verification guide](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/README.md).
<a name="3"></a>
## 3. FAQ
If you run into problems while following this guide to do post-training quantization, you can open an issue [here](https://github.com/PaddlePaddle/PaddleSlim/issues) and we will follow up with high priority.
<a name="3.1"></a>
### 3.1 General Questions
- How do I choose a post-training quantization method?
Choose a suitable calibration method, such as `KL`, `hist`, or `mse`; for guidance on choosing, see the [quant_post_static API docs](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static).
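For example, the demo script in this repo exposes the calibration algorithm via its `--algo` flag, so switching from the default `hist` to `mse` only changes one argument:
```bash
python post_quant.py --model_path=mobilenet_v3_small_infer/ \
    --model_filename=inference.pdmodel \
    --params_filename=inference.pdiparams \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --algo=mse
```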