paddleslim.quant package

Submodules

paddleslim.quant.quant_embedding module

paddleslim.quant.quant_embedding.quant_embedding(program, place, config, scope=None)

Quantize the parameters of lookup_table ops.

Parameters:
  • program (fluid.Program) – the inference program.
  • scope (fluid.Scope) – Scope records the mapping between variable names and variables, similar to the scope defined by brackets in programming languages. Usually fluid.global_scope() can be used. If None, fluid.global_scope() is used. Default: None.
  • place (fluid.CPUPlace or fluid.CUDAPlace) – The device on which the executor runs.
  • config (dict) – The quantization config. Supported keys:
      • 'params_name' (required) – the name of the parameter to quantize.
      • 'quantize_type' – the quantization type; supported types are ['abs_max']. Default: 'abs_max'.
      • 'quantize_bits' – supported bit widths are [8]. Default: 8.
      • 'dtype' – the quantized dtype; supported dtypes are ['int8']. Default: 'int8'.
      • 'threshold' – the threshold at which to clip the tensor before quantization. If not set, the tensor is not clipped.
Returns:

None
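A hedged usage sketch follows. The parameter name 'emb' and the variables infer_program/place are placeholders, not names defined by this package; the executable part only builds the config dict described above:

```python
# Config sketch for quant_embedding; 'emb' is a hypothetical parameter
# name -- use the actual lookup_table parameter name from your program.
config = {
    'params_name': 'emb',        # required: the parameter to quantize
    'quantize_type': 'abs_max',  # the only supported type (also the default)
    'quantize_bits': 8,          # the only supported bit width (also the default)
    'dtype': 'int8',             # the only supported dtype (also the default)
    # 'threshold': 1.0,          # optional: clip the tensor before quantization
}

# With paddle and paddleslim installed, the call would look like:
# import paddle.fluid as fluid
# import paddleslim.quant as quant
# place = fluid.CPUPlace()
# quant.quant_embedding(infer_program, place, config)
```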

paddleslim.quant.quanter module

paddleslim.quant.quanter.convert(program, place, config=None, scope=None, save_int8=False)

Convert a quantized and well-trained program into the final quantized program that can be used to save an inference model.

Parameters:
  • program (fluid.Program) – the quantized and well-trained test program.
  • place (fluid.CPUPlace or fluid.CUDAPlace) – The device on which the executor runs.
  • config (dict, optional) – Configs for conversion. If None, the default config is used. It must be the same config that was used in quant_aware. Default: None.
  • scope (fluid.Scope, optional) – Scope records the mapping between variable names and variables, similar to the scope defined by brackets in programming languages. Usually fluid.global_scope() can be used. If None, fluid.global_scope() is used. Default: None.
  • save_int8 (bool, optional) – Whether to also return a program whose parameters have dtype int8. That program can only be used to measure model size. Default: False.
Returns:

The freezed program, which can be used for inference. When save_int8 is False, returns freezed_program (fluid.Program). When save_int8 is True, returns a tuple of freezed_program (fluid.Program) and freezed_program_int8 (fluid.Program).

Return type:

fluid.Program | Tuple[fluid.Program, fluid.Program]
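A hedged sketch of the two calling patterns; quant_test_program and quant_config are placeholders for the program and config produced during quantization-aware training. The executable part only records the return-shape rule:

```python
# convert returns one program when save_int8 is False, two when it is True.
save_int8 = True
n_programs = 2 if save_int8 else 1

# With paddle and paddleslim installed:
# import paddle.fluid as fluid
# from paddleslim.quant import convert
# place = fluid.CPUPlace()
# freezed_program = convert(quant_test_program, place, config=quant_config)
# freezed_program, freezed_program_int8 = convert(
#     quant_test_program, place, config=quant_config, save_int8=True)
```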

paddleslim.quant.quanter.quant_aware(program, place, config=None, scope=None, for_test=False)

Add quantization and dequantization operators to program for quantization-aware training or testing.

Parameters:
  • program (fluid.Program) – the training or testing program.
  • place (fluid.CPUPlace or fluid.CUDAPlace) – The device on which the executor runs.
  • config (dict, optional) – Configs for quantization. If None, the default config is used. Default: None.
  • scope (fluid.Scope) – Scope records the mapping between variable names and variables, similar to the scope defined by brackets in programming languages. Usually fluid.global_scope() can be used. If None, fluid.global_scope() is used. Default: None.
  • for_test (bool) – If the program parameter is a test program, set this to True; otherwise set it to False. Default: False.
Returns:

Program with quantization and dequantization operators

Return type:

fluid.CompiledProgram (when for_test is False) | fluid.Program (when for_test is True)
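A hedged workflow sketch; train_program and test_program are placeholder graphs built by the user. The dict in the executable part simply records the for_test rule stated above:

```python
# for_test is False for the training graph and True for the test graph.
for_test_by_graph = {'train': False, 'test': True}

# With paddle and paddleslim installed:
# import paddle.fluid as fluid
# from paddleslim.quant import quant_aware
# place = fluid.CPUPlace()
# quant_train = quant_aware(train_program, place, config=None,
#                           for_test=for_test_by_graph['train'])
# quant_test = quant_aware(test_program, place, config=None,
#                          for_test=for_test_by_graph['test'])
# # quant_test is the fluid.Program later passed to convert().
```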

paddleslim.quant.quanter.quant_post(executor, model_dir, quantize_model_path, sample_generator, model_filename=None, params_filename=None, batch_size=16, batch_nums=None, scope=None, algo='KL', quantizable_op_type=['conv2d', 'depthwise_conv2d', 'mul'], is_full_quantize=False, weight_bits=8, activation_bits=8, is_use_cache_file=False, cache_dir='./temp_post_training')

The function uses the post-training quantization method to quantize the fp32 model. It uses calibration data to calculate the scale factors of the quantized variables, and inserts fake quantization and dequantization operators to obtain the quantized model.

Parameters:
  • executor (fluid.Executor) – The executor used to load, run and save the quantized model.
  • model_dir (str) – The path of the fp32 model to be quantized; the model and params saved by fluid.io.save_inference_model are under this path.
  • quantize_model_path (str) – The path where the quantized model is saved via fluid.io.save_inference_model.
  • sample_generator (Python Generator) – The sample generator that provides calibration data for the DataLoader; it yields only one sample at a time.
  • model_filename (str, optional) – The name of the model file. If the model is saved in separate files, set it to None. Default: None.
  • params_filename (str, optional) – The name of the params file. When all parameters are saved in a single file, set it to that filename. If parameters are saved in separate files, set it to None. Default: None.
  • batch_size (int, optional) – The batch size of the DataLoader. Default: 16.
  • batch_nums (int, optional) – If batch_nums is not None, the number of calibration samples is batch_size * batch_nums. If batch_nums is None, all data produced by sample_generator is used for calibration.
  • scope (fluid.Scope, optional) – The scope in which to run the program; it is used to load and save variables. If None, fluid.global_scope() is used.
  • algo (str, optional) – If algo='KL', use the KL-divergence method to obtain a more precise scale factor. If algo='direct', use the abs_max method to obtain the scale factor. Default: 'KL'.
  • quantizable_op_type (list[str], optional) – The list of op types to quantize. Default: ['conv2d', 'depthwise_conv2d', 'mul'].
  • weight_bits (int, optional) – The quantization bit number for weights. Default: 8.
  • activation_bits (int) – The quantization bit number for activations. Default: 8.
  • is_full_quantize (bool) – If True, apply quantization to all supported quantizable op types. If False, apply quantization only to the op types in quantizable_op_type. Default: False.
  • is_use_cache_file (bool) – If False, all temp data is kept in memory. If True, temp data is saved to disk. Default: False.
  • cache_dir (str) – When is_use_cache_file is True, temp data is saved in cache_dir. Default: './temp_post_training'.
Returns:

None
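A hedged end-to-end sketch. The generator below yields random placeholder data only to show the expected shape of a sample generator (one sample per yield, not a batch); the model paths and input shape are hypothetical:

```python
import numpy as np

# Minimal calibration sample generator: yields ONE sample at a time.
def sample_generator():
    for _ in range(32):
        # A single sample; shape and dtype must match the model's input.
        yield np.random.random((3, 224, 224)).astype('float32')

# With paddle and paddleslim installed (paths are placeholders):
# import paddle.fluid as fluid
# from paddleslim.quant import quant_post
# exe = fluid.Executor(fluid.CPUPlace())
# quant_post(executor=exe,
#            model_dir='./fp32_model',
#            quantize_model_path='./quant_model',
#            sample_generator=sample_generator,
#            batch_size=16,
#            batch_nums=2)  # 16 * 2 = 32 calibration samples
```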