diff --git a/deploy/slim/quantization/README.md b/deploy/slim/quantization/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ccd4d06b4f16165f968402751b63a8fe58773e0b --- /dev/null +++ b/deploy/slim/quantization/README.md @@ -0,0 +1,61 @@ + +## 介绍 +复杂的模型有利于提高模型的性能,但也导致模型中存在一定冗余,模型量化将全精度缩减到定点数减少这种冗余,达到减少模型计算复杂度,提高模型推理性能的目的。 +模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。 + +本教程将介绍如何使用飞桨模型压缩库PaddleSlim做PaddleOCR模型的压缩。 +[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) 集成了模型剪枝、量化(包括量化训练和离线量化)、蒸馏和神经网络搜索等多种业界常用且领先的模型压缩功能,如果您感兴趣,可以关注并了解。 + +在开始本教程之前,建议先了解[PaddleOCR模型的训练方法](../../../doc/doc_ch/quickstart.md)以及[PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html) + + +## 快速开始 +量化多适用于轻量模型在移动端的部署,当训练出一个模型后,如果希望进一步的压缩模型大小并加速预测,可使用量化的方法压缩模型。 + +模型量化主要包括五个步骤: +1. 安装 PaddleSlim +2. 准备训练好的模型 +3. 量化训练 +4. 导出量化推理模型 +5. 量化模型预测部署 + +### 1. 安装PaddleSlim + +```bash +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd Paddleslim +python setup.py install +``` + +### 2. 准备训练好的模型 + +PaddleOCR提供了一系列训练好的[模型](../../../doc/doc_ch/models_list.md),如果待量化的模型不在列表中,需要按照[常规训练](../../../doc/doc_ch/quickstart.md)方法得到训练好的模型。 + +### 3. 量化训练 +量化训练包括离线量化训练和在线量化训练,在线量化训练效果更好,需加载预训练模型,在定义好量化策略后即可对模型进行量化。 + + +量化训练的代码位于slim/quantization/quant.py 中,比如训练检测模型,训练指令如下: +```bash +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model + +# 比如下载提供的训练模型 +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar +tar -xf ch_ppocr_mobile_v2.0_det_train.tar +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.save_model_dir=./output/quant_model + +``` +如果要训练识别模型的量化,修改配置文件和加载的模型参数即可。 + +### 4. 导出模型 + +在得到量化训练保存的模型后,我们可以将其导出为inference_model,用于预测部署: + +```bash +python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model +``` + +### 5. 量化模型部署 + +上述步骤导出的量化模型,参数精度仍然是FP32,但是参数的数值范围是int8,导出的模型可以通过PaddleLite的opt模型转换工具完成模型转换。 +量化模型部署的可参考 [移动端模型部署](../../lite/readme.md) diff --git a/deploy/slim/quantization/README_en.md b/deploy/slim/quantization/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..7da0b3e7e7d5f72e45dc17864630b9725f6fc8ba --- /dev/null +++ b/deploy/slim/quantization/README_en.md @@ -0,0 +1,68 @@ + +## Introduction + +Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. +Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number, +so as to reduce model calculation complexity and improve model inference performance. + +This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model. + +It is recommended that you could understand following pages before reading this example: +- [The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md) +- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) + +## Quick Start +Quantization is mostly suitable for the deployment of lightweight models on mobile terminals. +After training, if you want to further compress the model size and accelerate the prediction, you can use quantization methods to compress the model according to the following steps. + +1. Install PaddleSlim +2. Prepare trained model +3. Quantization-Aware Training +4. Export inference model +5. Deploy quantization inference model + + +### 1. Install PaddleSlim + +```bash +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd Paddleslim +python setup.py install +``` + + +### 2. Download Pretrain Model +PaddleOCR provides a series of trained [models](../../../doc/doc_en/models_list_en.md). +If the model to be quantified is not in the list, you need to follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to get the trained model. + + +### 3. Quant-Aware Training +Quantization training includes offline quantization training and online quantization training. +Online quantization training is more effective. It is necessary to load the pre-training model. +After the quantization strategy is defined, the model can be quantified. + +The code for quantization training is located in `slim/quantization/quant.py`. For example, to train a detection model, the training instructions are as follows: +```bash +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model + +# download provided model +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar +tar -xf ch_ppocr_mobile_v2.0_det_train.tar +python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.save_model_dir=./output/quant_model + +``` + + +### 4. Export inference model + +After getting the model after pruning and finetuning we, can export it as inference_model for predictive deployment: + +```bash +python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model +``` + +### 5. Deploy +The numerical range of the quantized model parameters derived from the above steps is still FP32, but the numerical range of the parameters is int8. +The derived model can be converted through the `opt tool` of PaddleLite. + +For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme_en.md) diff --git a/deploy/slim/quantization/export_model.py b/deploy/slim/quantization/export_model.py new file mode 100755 index 0000000000000000000000000000000000000000..100b107a1deb1ce9932c9cefa50659c060f5803e --- /dev/null +++ b/deploy/slim/quantization/export_model.py @@ -0,0 +1,118 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys + +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..'))) +sys.path.append( + os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools'))) + +import argparse + +import paddle +from paddle.jit import to_static + +from ppocr.modeling.architectures import build_model +from ppocr.postprocess import build_post_process +from ppocr.utils.save_load import init_model +from ppocr.utils.logging import get_logger +from tools.program import load_config, merge_config, ArgsParser +from ppocr.metrics import build_metric +import tools.program as program +from paddleslim.dygraph.quant import QAT +from ppocr.data import build_dataloader + + +def main(): + ############################################################################################################ + # 1. quantization configs + ############################################################################################################ + quant_config = { + # weight preprocess type, default is None and no preprocessing is performed. + 'weight_preprocess_type': None, + # activation preprocess type, default is None and no preprocessing is performed. + 'activation_preprocess_type': None, + # weight quantize type, default is 'channel_wise_abs_max' + 'weight_quantize_type': 'channel_wise_abs_max', + # activation quantize type, default is 'moving_average_abs_max' + 'activation_quantize_type': 'moving_average_abs_max', + # weight quantize bit num, default is 8 + 'weight_bits': 8, + # activation quantize bit num, default is 8 + 'activation_bits': 8, + # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8' + 'dtype': 'int8', + # window size for 'range_abs_max' quantization. default is 10000 + 'window_size': 10000, + # The decay coefficient of moving average, default is 0.9 + 'moving_rate': 0.9, + # for dygraph quantization, layers of type in quantizable_layer_type will be quantized + 'quantizable_layer_type': ['Conv2D', 'Linear'], + } + FLAGS = ArgsParser().parse_args() + config = load_config(FLAGS.config) + merge_config(FLAGS.opt) + logger = get_logger() + # build post process + + post_process_class = build_post_process(config['PostProcess'], + config['Global']) + + # build model + # for rec algorithm + if hasattr(post_process_class, 'character'): + char_num = len(getattr(post_process_class, 'character')) + config['Architecture']["Head"]['out_channels'] = char_num + model = build_model(config['Architecture']) + + # get QAT model + quanter = QAT(config=quant_config) + quanter.quantize(model) + + init_model(config, model, logger) + model.eval() + + # build metric + eval_class = build_metric(config['Metric']) + + # build dataloader + valid_dataloader = build_dataloader(config, 'Eval', device, logger) + + # start eval + metirc = program.eval(model, valid_dataloader, post_process_class, + eval_class) + logger.info('metric eval ***************') + for k, v in metirc.items(): + logger.info('{}:{}'.format(k, v)) + + save_path = '{}/inference'.format(config['Global']['save_inference_dir']) + infer_shape = [3, 32, 100] if config['Architecture'][ + 'model_type'] != "det" else [3, 640, 640] + + quanter.save_quantized_model( + model, + save_path, + input_spec=[ + paddle.static.InputSpec( + shape=[None] + infer_shape, dtype='float32') + ]) + logger.info('inference QAT model is saved to {}'.format(save_path)) + + +if __name__ == "__main__": + config, device, logger, vdl_writer = program.preprocess() + main() diff --git a/deploy/slim/quantization/quant.py b/deploy/slim/quantization/quant.py new file mode 100755 index 0000000000000000000000000000000000000000..7671e5f871ce6769fc51876d1fa2e5f0af63d904 --- /dev/null +++ b/deploy/slim/quantization/quant.py @@ -0,0 +1,166 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import sys + +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..'))) +sys.path.append( + os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools'))) + +import yaml +import paddle +import paddle.distributed as dist + +paddle.seed(2) + +from ppocr.data import build_dataloader +from ppocr.modeling.architectures import build_model +from ppocr.losses import build_loss +from ppocr.optimizer import build_optimizer +from ppocr.postprocess import build_post_process +from ppocr.metrics import build_metric +from ppocr.utils.save_load import init_model +import tools.program as program +from paddleslim.dygraph.quant import QAT + +dist.get_world_size() + + +class PACT(paddle.nn.Layer): + def __init__(self): + super(PACT, self).__init__() + alpha_attr = paddle.ParamAttr( + name=self.full_name() + ".pact", + initializer=paddle.nn.initializer.Constant(value=20), + learning_rate=1.0, + regularizer=paddle.regularizer.L2Decay(2e-5)) + + self.alpha = self.create_parameter( + shape=[1], attr=alpha_attr, dtype='float32') + + def forward(self, x): + out_left = paddle.nn.functional.relu(x - self.alpha) + out_right = paddle.nn.functional.relu(-self.alpha - x) + x = x - out_left + out_right + return x + + +quant_config = { + # weight preprocess type, default is None and no preprocessing is performed. + 'weight_preprocess_type': None, + # activation preprocess type, default is None and no preprocessing is performed. + 'activation_preprocess_type': None, + # weight quantize type, default is 'channel_wise_abs_max' + 'weight_quantize_type': 'channel_wise_abs_max', + # activation quantize type, default is 'moving_average_abs_max' + 'activation_quantize_type': 'moving_average_abs_max', + # weight quantize bit num, default is 8 + 'weight_bits': 8, + # activation quantize bit num, default is 8 + 'activation_bits': 8, + # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8' + 'dtype': 'int8', + # window size for 'range_abs_max' quantization. default is 10000 + 'window_size': 10000, + # The decay coefficient of moving average, default is 0.9 + 'moving_rate': 0.9, + # for dygraph quantization, layers of type in quantizable_layer_type will be quantized + 'quantizable_layer_type': ['Conv2D', 'Linear'], +} + + +def main(config, device, logger, vdl_writer): + # init dist environment + if config['Global']['distributed']: + dist.init_parallel_env() + + global_config = config['Global'] + + # build dataloader + train_dataloader = build_dataloader(config, 'Train', device, logger) + if config['Eval']: + valid_dataloader = build_dataloader(config, 'Eval', device, logger) + else: + valid_dataloader = None + + # build post process + post_process_class = build_post_process(config['PostProcess'], + global_config) + + # build model + # for rec algorithm + if hasattr(post_process_class, 'character'): + char_num = len(getattr(post_process_class, 'character')) + config['Architecture']["Head"]['out_channels'] = char_num + model = build_model(config['Architecture']) + + # prepare to quant + quanter = QAT(config=quant_config, act_preprocess=PACT) + quanter.quantize(model) + + if config['Global']['distributed']: + model = paddle.DataParallel(model) + + # build loss + loss_class = build_loss(config['Loss']) + + # build optim + optimizer, lr_scheduler = build_optimizer( + config['Optimizer'], + epochs=config['Global']['epoch_num'], + step_each_epoch=len(train_dataloader), + parameters=model.parameters()) + + # build metric + eval_class = build_metric(config['Metric']) + # load pretrain model + pre_best_model_dict = init_model(config, model, logger, optimizer) + + logger.info('train dataloader has {} iters, valid dataloader has {} iters'. + format(len(train_dataloader), len(valid_dataloader))) + # start train + program.train(config, train_dataloader, valid_dataloader, device, model, + loss_class, optimizer, lr_scheduler, post_process_class, + eval_class, pre_best_model_dict, logger, vdl_writer) + + +def test_reader(config, device, logger): + loader = build_dataloader(config, 'Train', device, logger) + import time + starttime = time.time() + count = 0 + try: + for data in loader(): + count += 1 + if count % 1 == 0: + batch_time = time.time() - starttime + starttime = time.time() + logger.info("reader: {}, {}, {}".format( + count, len(data[0]), batch_time)) + except Exception as e: + logger.info(e) + logger.info("finish reader: {}, Success!".format(count)) + + +if __name__ == '__main__': + config, device, logger, vdl_writer = program.preprocess(is_train=True) + main(config, device, logger, vdl_writer) + # test_reader(config, device, logger)