Online Quantization Example

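This example shows how to use the online quantization (quantization-aware training) interface to quantize a trained classification model, which reduces the model's storage size and GPU memory usage.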

API Overview

See the quantization API documentation.

Online Quantization Workflow for a Classification Model

1. Configure the quantization parameters

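quant_config = {
    # How scales are computed for weights and for activations.
    'weight_quantize_type': 'abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    # Bit widths of the quantized weights and activations.
    'weight_bits': 8,
    'activation_bits': 8,
    # Ops whose name_scope contains one of these strings are not quantized.
    'not_quant_pattern': ['skip_quant'],
    # Op types that will be quantized.
    'quantize_op_types': ['conv2d', 'depthwise_conv2d', 'mul'],
    'dtype': 'int8',
    # Window size used by the 'range_abs_max' activation method.
    'window_size': 10000,
    # Decay coefficient used by 'moving_average_abs_max'.
    'moving_rate': 0.9,
    # If True, only weights are quantized and activations are left untouched.
    'quant_weight_only': False
}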
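
To make the two scale rules named in the configuration more concrete, here is a minimal plain-Python sketch. It assumes the common formulation in which 'abs_max' takes the largest absolute value of a tensor and 'moving_average_abs_max' keeps a bias-corrected running average of per-batch abs-max values, weighted by moving_rate. The helper names are hypothetical and are not part of PaddleSlim; the actual ops may differ in detail.

```python
# Illustrative only: these helpers are hypothetical, not PaddleSlim APIs.

def abs_max_scale(values):
    """Scale for 'abs_max': the largest absolute value in the tensor."""
    return max(abs(v) for v in values)


def moving_average_abs_max_scale(batches, moving_rate=0.9):
    """Scale for 'moving_average_abs_max' (assumed formulation).

    accum = rate * accum + abs_max(batch)
    state = rate * state + 1
    scale = accum / state
    """
    accum, state, scale = 0.0, 0.0, 0.0
    for batch in batches:
        accum = moving_rate * accum + abs_max_scale(batch)
        state = moving_rate * state + 1.0
        scale = accum / state
    return scale


# Weights: one scale computed directly from the tensor.
weights = [0.5, -1.0, 0.25, 0.75]
print("weight scale:", abs_max_scale(weights))

# Activations: the scale is smoothed across successive batches.
activations = [[0.2, -0.4], [1.0, 0.6], [-0.8, 0.1]]
print("activation scale:", moving_average_abs_max_scale(activations, moving_rate=0.9))
```

A moving_rate closer to 1 makes the activation scale react more slowly to any single batch, which is presumably why a smoothed rule is used for activations while weights use a direct abs-max.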

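2. Insert trainable quantization ops into the training and test programs

import paddle.fluid as fluid
from paddleslim.quant import quant_aware, convert

# for_test=True: quantized program used for evaluation.
val_program = quant_aware(val_program, place, quant_config, scope=None, for_test=True)

# for_test=False: quantized program used for training.
compiled_train_prog = quant_aware(train_prog, place, quant_config, scope=None, for_test=False)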
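
As a rough illustration of what a trainable quantization op does to a tensor during training, the sketch below quantizes values to the 8-bit integer range and immediately maps them back to floats, so that later layers see the rounding error. It assumes the usual "fake quantization" formulation; fake_quant_dequant is a hypothetical helper written for this example and is not PaddleSlim code.

```python
# Illustrative only: a hypothetical helper, not a PaddleSlim API.

def fake_quant_dequant(values, scale, bits=8):
    """Quantize to signed integers, clip, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    out = []
    for v in values:
        q = round(v / scale * qmax)      # quantize
        q = max(-qmax, min(qmax, q))     # clip to the representable range
        out.append(q * scale / qmax)     # dequantize
    return out


x = [0.1234, -0.9, 0.5, 2.0]
y = fake_quant_dequant(x, scale=1.0)
for before, after in zip(x, y):
    print(f"{before:+.4f} -> {after:+.4f}")
```

Values inside the range move by at most half a quantization step, while values beyond the scale (2.0 here) are clipped to it.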

3. Disable specific build strategies

build_strategy = fluid.BuildStrategy()
# Turn off all-reduce op fusion and synchronized batch norm.
build_strategy.fuse_all_reduce_ops = False
build_strategy.sync_batch_norm = False
exec_strategy = fluid.ExecutionStrategy()
compiled_train_prog = compiled_train_prog.with_data_parallel(
    loss_name=avg_cost.name,
    build_strategy=build_strategy,
    exec_strategy=exec_strategy)

4. Freeze the program

float_program, int8_program = convert(val_program, 
                                      place,
                                      quant_config,
                                      scope=None,
                                      save_int8=True)
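
The storage saving from an int8 model can be sketched with the standard library alone. The example assumes that the int8 program stores each weight in one byte plus a shared scale, whereas a float32 weight takes four bytes; to_int8_storage is a hypothetical helper for this illustration and does not reproduce what convert does internally.

```python
from array import array

# Illustrative only: a hypothetical helper, not a PaddleSlim API.

def to_int8_storage(weights, bits=8):
    """Return (int8 array, scale) using abs-max scaling; assumes a non-zero tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights)
    quantized = array(
        'b',
        (max(-qmax, min(qmax, round(w / scale * qmax))) for w in weights),
    )
    return quantized, scale


float_weights = array('f', [0.5, -1.0, 0.25, 0.75] * 256)  # 1024 float32 values
int8_weights, scale = to_int8_storage(float_weights)

float_bytes = len(float_weights) * float_weights.itemsize
int8_bytes = len(int8_weights) * int8_weights.itemsize
print("float32 bytes:", float_bytes)
print("int8 bytes:", int8_bytes)
```

Under these assumptions the weight storage shrinks by roughly a factor of four, ignoring the small overhead of the stored scales.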

5. Save the inference models

fluid.io.save_inference_model(
    dirname=float_path,
    feeded_var_names=[image.name],
    target_vars=[out], executor=exe,
    main_program=float_program,
    model_filename=float_path + '/model',
    params_filename=float_path + '/params')

fluid.io.save_inference_model(
    dirname=int8_path,
    feeded_var_names=[image.name],
    target_vars=[out], executor=exe,
    main_program=int8_program,
    model_filename=int8_path + '/model',
    params_filename=int8_path + '/params')