paddleslim.quant package

Submodules

paddleslim.quant.quant_embedding module

paddleslim.quant.quant_embedding.quant_embedding(program, place, config, scope=None)

Quantize the parameters of lookup_table ops.

Parameters:
  • program (fluid.Program) – the inference program.
  • scope (fluid.Scope) – Scope records the mapping between variable names and variables, similar to the scope defined by brackets in programming languages. Usually fluid.global_scope() can be used. If None, fluid.global_scope() is used. Default: None.
  • place (fluid.CPUPlace or fluid.CUDAPlace) – The device on which the executor runs.
  • config (dict) – The quantization config. Supported keys:
      • 'params_name' (required) – the name of the parameter to quantize.
      • 'quantize_type' – the quantization type; supported types are ['abs_max']. Default: 'abs_max'.
      • 'quantize_bits' – supported bit widths are [8]. Default: 8.
      • 'dtype' – the quantized dtype; supported dtypes are ['int8']. Default: 'int8'.
      • 'threshold' – the threshold at which to clip the tensor before quantization. If not set, the tensor is not clipped.
Returns:

None
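A hedged usage sketch follows. The parameter name 'emb' and the variables infer_program/place are placeholders, not names defined by this package; the executable part only builds the config dict described above:

```python
# Config sketch for quant_embedding; 'emb' is a hypothetical parameter
# name -- use the actual lookup_table parameter name from your program.
config = {
    'params_name': 'emb',        # required: the parameter to quantize
    'quantize_type': 'abs_max',  # the only supported type (also the default)
    'quantize_bits': 8,          # the only supported bit width (also the default)
    'dtype': 'int8',             # the only supported dtype (also the default)
    # 'threshold': 1.0,          # optional: clip the tensor before quantization
}

# With paddle and paddleslim installed, the call would look like:
# import paddle.fluid as fluid
# import paddleslim.quant as quant
# place = fluid.CPUPlace()
# quant.quant_embedding(infer_program, place, config)
```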

paddleslim.quant.quanter module

paddleslim.quant.quanter.convert(program, place, config=None, scope=None, save_int8=False)

Convert a quantized and well-trained program into the final quantized program that can be used to save an inference model.

Parameters:
  • program (fluid.Program) – the quantized and well-trained test program.
  • place (fluid.CPUPlace or fluid.CUDAPlace) – The device on which the executor runs.
  • config (dict, optional) – Configs for conversion. If None, the default config is used. It must be the same config that was used in quant_aware. Default: None.
  • scope (fluid.Scope, optional) – Scope records the mapping between variable names and variables, similar to the scope defined by brackets in programming languages. Usually fluid.global_scope() can be used. If None, fluid.global_scope() is used. Default: None.
  • save_int8 (bool, optional) – Whether to also return a program whose parameters have dtype int8. That program can only be used to measure model size. Default: False.
Returns:

The freezed program, which can be used for inference. When save_int8 is False, returns freezed_program (fluid.Program). When save_int8 is True, returns a tuple of freezed_program (fluid.Program) and freezed_program_int8 (fluid.Program).

Return type:

fluid.Program | Tuple[fluid.Program, fluid.Program]
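A hedged sketch of the two calling patterns; quant_test_program and quant_config are placeholders for the program and config produced during quantization-aware training. The executable part only records the return-shape rule:

```python
# convert returns one program when save_int8 is False, two when it is True.
save_int8 = True
n_programs = 2 if save_int8 else 1

# With paddle and paddleslim installed:
# import paddle.fluid as fluid
# from paddleslim.quant import convert
# place = fluid.CPUPlace()
# freezed_program = convert(quant_test_program, place, config=quant_config)
# freezed_program, freezed_program_int8 = convert(
#     quant_test_program, place, config=quant_config, save_int8=True)
```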

paddleslim.quant.quanter.quant_aware(program, place, config=None, scope=None, for_test=False)

Add quantization and dequantization operators to program for quantization-aware training or testing.

Parameters:
  • program (fluid.Program) – the training or testing program.
  • place (fluid.CPUPlace or fluid.CUDAPlace) – The device on which the executor runs.
  • config (dict, optional) – Configs for quantization. If None, the default config is used. Default: None.
  • scope (fluid.Scope) – Scope records the mapping between variable names and variables, similar to the scope defined by brackets in programming languages. Usually fluid.global_scope() can be used. If None, fluid.global_scope() is used. Default: None.
  • for_test (bool) – If the program parameter is a test program, set this to True; otherwise set it to False. Default: False.
Returns:

Program with quantization and dequantization operators

Return type:

fluid.CompiledProgram (when for_test is False) | fluid.Program (when for_test is True)
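A hedged workflow sketch; train_program and test_program are placeholder graphs built by the user. The dict in the executable part simply records the for_test rule stated above:

```python
# for_test is False for the training graph and True for the test graph.
for_test_by_graph = {'train': False, 'test': True}

# With paddle and paddleslim installed:
# import paddle.fluid as fluid
# from paddleslim.quant import quant_aware
# place = fluid.CPUPlace()
# quant_train = quant_aware(train_program, place, config=None,
#                           for_test=for_test_by_graph['train'])
# quant_test = quant_aware(test_program, place, config=None,
#                          for_test=for_test_by_graph['test'])
# # quant_test is the fluid.Program later passed to convert().
```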

paddleslim.quant.quanter.quant_post(executor, model_dir, quantize_model_path, sample_generator, model_filename=None, params_filename=None, batch_size=16, batch_nums=None, scope=None, algo='KL', quantizable_op_type=['conv2d', 'depthwise_conv2d', 'mul'], is_full_quantize=False, weight_bits=8, activation_bits=8, is_use_cache_file=False, cache_dir='./temp_post_training')

The function uses the post-training quantization method to quantize the fp32 model. It uses calibration data to calculate the scale factors of the quantized variables, and inserts fake quantization and dequantization operators to obtain the quantized model.

Parameters:
  • executor (fluid.Executor) – The executor used to load, run and save the quantized model.
  • model_dir (str) – The path of the fp32 model to be quantized; the model and params saved by fluid.io.save_inference_model are under this path.
  • quantize_model_path (str) – The path where the quantized model is saved via fluid.io.save_inference_model.
  • sample_generator (Python Generator) – The sample generator that provides calibration data for the DataLoader; it yields only one sample at a time.
  • model_filename (str, optional) – The name of the model file. If the model is saved in separate files, set it to None. Default: None.
  • params_filename (str, optional) – The name of the params file. When all parameters are saved in a single file, set it to that filename. If parameters are saved in separate files, set it to None. Default: None.
  • batch_size (int, optional) – The batch size of the DataLoader. Default: 16.
  • batch_nums (int, optional) – If batch_nums is not None, the number of calibration samples is batch_size * batch_nums. If batch_nums is None, all data produced by sample_generator is used for calibration.
  • scope (fluid.Scope, optional) – The scope in which to run the program; it is used to load and save variables. If None, fluid.global_scope() is used.
  • algo (str, optional) – If algo='KL', use the KL-divergence method to obtain a more precise scale factor. If algo='direct', use the abs_max method to obtain the scale factor. Default: 'KL'.
  • quantizable_op_type (list[str], optional) – The list of op types to quantize. Default: ['conv2d', 'depthwise_conv2d', 'mul'].
  • weight_bits (int, optional) – The quantization bit number for weights. Default: 8.
  • activation_bits (int) – The quantization bit number for activations. Default: 8.
  • is_full_quantize (bool) – If True, apply quantization to all supported quantizable op types. If False, apply quantization only to the op types in quantizable_op_type. Default: False.
  • is_use_cache_file (bool) – If False, all temp data is kept in memory. If True, temp data is saved to disk. Default: False.
  • cache_dir (str) – When is_use_cache_file is True, temp data is saved in cache_dir. Default: './temp_post_training'.
Returns:

None
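A hedged end-to-end sketch. The generator below yields random placeholder data only to show the expected shape of a sample generator (one sample per yield, not a batch); the model paths and input shape are hypothetical:

```python
import numpy as np

# Minimal calibration sample generator: yields ONE sample at a time.
def sample_generator():
    for _ in range(32):
        # A single sample; shape and dtype must match the model's input.
        yield np.random.random((3, 224, 224)).astype('float32')

# With paddle and paddleslim installed (paths are placeholders):
# import paddle.fluid as fluid
# from paddleslim.quant import quant_post
# exe = fluid.Executor(fluid.CPUPlace())
# quant_post(executor=exe,
#            model_dir='./fp32_model',
#            quantize_model_path='./quant_model',
#            sample_generator=sample_generator,
#            batch_size=16,
#            batch_nums=2)  # 16 * 2 = 32 calibration samples
```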