paddleslim.quant package¶
Submodules¶
paddleslim.quant.quant_embedding module¶
- paddleslim.quant.quant_embedding.quant_embedding(program, place, config, scope=None)¶
  Quantize the parameters of lookup_table ops.
  Parameters: - program (fluid.Program) – the infer program to quantize.
  - place (fluid.CPUPlace or fluid.CUDAPlace) – the device on which the executor runs.
  - config (dict) – the config for quantization. The keys are 'params_name', 'quantize_type', 'quantize_bits', 'dtype' and 'threshold'. 'params_name' is the name of the parameter to quantize and must be set. 'quantize_type' is the quantization type; supported types are ['abs_max'] and the default is 'abs_max'. 'quantize_bits' is the number of quantization bits; supported values are [8] and the default is 8. 'dtype' is the quantized dtype; supported dtypes are ['int8'] and the default is 'int8'. 'threshold' is the threshold used to clip the tensor before quantization; when 'threshold' is not set, the tensor is not clipped.
  - scope (fluid.Scope, optional) – Scope records the mapping between variable names and variables, similar to brackets in programming languages. Usually users can use fluid.global_scope(). When None, fluid.global_scope() is used. Default: None.
  Returns: None
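As a minimal illustration of the 'abs_max' scheme described above, the following pure-Python sketch quantizes a small weight list to the int8 range, with optional clipping by 'threshold'. This is illustrative only, not the PaddleSlim implementation:

```python
# Sketch of 'abs_max' int8 quantization as quant_embedding applies it to the
# lookup_table parameter. Illustrative only, not PaddleSlim's actual code.

def abs_max_quantize(weights, quantize_bits=8, threshold=None):
    # Optionally clip the tensor before quantization (the 'threshold' key).
    if threshold is not None:
        weights = [max(-threshold, min(threshold, w)) for w in weights]
    # The scale is the maximum absolute value of the tensor.
    scale = max(abs(w) for w in weights)
    qmax = 2 ** (quantize_bits - 1) - 1  # 127 for int8
    quantized = [round(w / scale * qmax) for w in weights]
    # The scale must be kept alongside the integers for dequantization.
    return quantized, scale

q, scale = abs_max_quantize([0.5, -1.0, 0.25])
```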
paddleslim.quant.quanter module¶
- paddleslim.quant.quanter.convert(program, place, config=None, scope=None, save_int8=False)¶
  Convert a quantized and well-trained program to the final quantized program, which can be used to save an inference model.
  Parameters: - program (fluid.Program) – the quantized and well-trained test program.
  - place (fluid.CPUPlace or fluid.CUDAPlace) – the device on which the executor runs.
  - config (dict, optional) – configs for convert. If set to None, the default config is used. It must be the same config that was used in 'quant_aware'. Default: None.
  - scope (fluid.Scope, optional) – Scope records the mapping between variable names and variables, similar to brackets in programming languages. Usually users can use fluid.global_scope(). When None, fluid.global_scope() is used. Default: None.
  - save_int8 (bool, optional) – whether to additionally return a program whose model parameters are of dtype int8. That program can only be used to get the model size. Default: False.
  Returns: the freezed program, which can be used for inference. When save_int8 is False, returns freezed_program (fluid.Program). When save_int8 is True, returns freezed_program (fluid.Program) and freezed_program_int8 (fluid.Program).
  Return type: fluid.Program | Tuple
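Since convert must receive the same config that was passed to quant_aware, one way to keep them in sync is to define the config dict once and reuse it. The key names below (weight_quantize_type, activation_quantize_type, weight_bits, activation_bits) are assumptions based on common PaddleSlim defaults and may differ across versions; the commented lines sketch the intended call order, which requires a built Paddle program:

```python
# Hypothetical shared quantization config; the key names are assumptions
# based on common PaddleSlim defaults and may vary by version.
quant_config = {
    'weight_quantize_type': 'abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    'weight_bits': 8,
    'activation_bits': 8,
}

# Sketch of the intended call order (requires a built Paddle program):
# quant_program = quant_aware(train_program, place, quant_config, for_test=False)
# ...train quant_program...
# quant_test_program = quant_aware(test_program, place, quant_config, for_test=True)
# freezed_program = convert(quant_test_program, place, quant_config)
```

Passing a different config to convert than to quant_aware would make the inserted operators and the conversion disagree, which is why the docs require them to match.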
- paddleslim.quant.quanter.quant_aware(program, place, config=None, scope=None, for_test=False)¶
  Add quantization and dequantization operators to "program" for quantization training or testing.
  Parameters: - program (fluid.Program) – the training or testing program.
  - place (fluid.CPUPlace or fluid.CUDAPlace) – the device on which the executor runs.
  - config (dict, optional) – configs for quantization. If None, the default config is used. Default: None.
  - scope (fluid.Scope, optional) – Scope records the mapping between variable names and variables, similar to brackets in programming languages. Usually users can use fluid.global_scope(). When None, fluid.global_scope() is used. Default: None.
  - for_test (bool) – if the 'program' parameter is a test program, this parameter should be set to True; otherwise, set it to False. Default: False.
  Returns: the program with quantization and dequantization operators.
  Return type: fluid.CompiledProgram | fluid.Program
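Conceptually, each inserted operator pair quantizes a tensor and immediately dequantizes it, so training sees the rounding error that int8 inference will later introduce. A pure-Python sketch of one such fake quant-dequant step (illustrative only, not the actual operators):

```python
# Illustrative fake quantize-dequantize step, mimicking what the operator
# pairs inserted by quant_aware do during quantization-aware training.

def fake_quant_dequant(values, bits=8):
    scale = max(abs(v) for v in values)
    qmax = 2 ** (bits - 1) - 1
    # Round to the integer grid, then map straight back to float so the
    # rest of the network sees the quantization error.
    return [round(v / scale * qmax) * scale / qmax for v in values]

out = fake_quant_dequant([0.5, -1.0, 0.25])
```

The output stays in float, which is why the resulting program can still be trained with ordinary optimizers before being frozen by convert.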
- paddleslim.quant.quanter.quant_post(executor, model_dir, quantize_model_path, sample_generator, model_filename=None, params_filename=None, batch_size=16, batch_nums=None, scope=None, algo='KL', quantizable_op_type=['conv2d', 'depthwise_conv2d', 'mul'], is_full_quantize=False, weight_bits=8, activation_bits=8, is_use_cache_file=False, cache_dir='./temp_post_training')¶
  The function uses the post-training quantization method to quantize an fp32 model. It uses calibration data to calculate the scale factors of the quantized variables, and inserts fake quantization and dequantization operators to obtain the quantized model.
  Parameters: - executor (fluid.Executor) – the executor used to load, run and save the quantized model.
  - model_dir (str) – the path of the fp32 model to be quantized; the model and params saved by fluid.io.save_inference_model are under this path.
  - quantize_model_path (str) – the path where the quantized model is saved using the api fluid.io.save_inference_model.
  - sample_generator (Python Generator) – the sample generator provides calibration data for the DataLoader, and it returns only one sample at a time.
  - model_filename (str, optional) – the name of the model file. If parameters are saved in separate files, set it to None. Default: None.
  - params_filename (str, optional) – the name of the params file. When all parameters are saved in a single file, set it to that filename. If parameters are saved in separate files, set it to None. Default: None.
  - batch_size (int, optional) – the batch size of the DataLoader. Default: 16.
  - batch_nums (int, optional) – if batch_nums is not None, the number of calibration samples is 'batch_size*batch_nums'. If batch_nums is None, all data generated by sample_generator is used as calibration data.
  - scope (fluid.Scope, optional) – the scope in which to run the program; it is used to load and save variables. If scope is None, fluid.global_scope() is used.
  - algo (str, optional) – if algo='KL', the KL-divergence method is used to get a more precise scale factor. If algo='direct', the abs_max method is used to get the scale factor. Default: 'KL'.
  - quantizable_op_type (list[str], optional) – the list of op types that will be quantized. Default: ["conv2d", "depthwise_conv2d", "mul"].
  - is_full_quantize (bool) – if True, quantization is applied to all supported quantizable op types. If False, quantization is applied only to the op types in quantizable_op_type. Default: False.
  - weight_bits (int, optional) – the quantization bit number for weights. Default: 8.
  - activation_bits (int) – the quantization bit number for activations. Default: 8.
  - is_use_cache_file (bool) – if False, all temp data is kept in memory. If True, temp data is saved to disk. Default: False.
  - cache_dir (str) – when 'is_use_cache_file' is True, temp data is saved in 'cache_dir'. Default: './temp_post_training'.
  Returns: None
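For intuition, the 'direct' (abs_max) calibration simply tracks the maximum absolute value seen across the calibration batches and uses it as the scale factor, while the 'KL' method instead searches for a clipping threshold that minimizes the KL-divergence between the fp32 and quantized distributions. A sketch of the 'direct' case (illustrative only, not PaddleSlim's implementation):

```python
# Illustrative 'direct' calibration (algo='direct'): scan the calibration
# batches and keep the running max absolute value as the scale factor.

def calibrate_abs_max(batches):
    scale = 0.0
    for batch in batches:
        scale = max(scale, max(abs(v) for v in batch))
    return scale

scale = calibrate_abs_max([[0.1, -0.4], [2.0, -0.3], [0.9, 0.7]])
```

A single outlier activation can dominate the abs_max scale and waste quantization range on values that rarely occur, which is why 'KL' is the default.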