scope(fluid.Scope): Scope records the mapping between variable names and variables, similar to brackets in programming languages. Usually users can use `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_. When ``None``, `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_ is used. Default: ``None``.
place(fluid.CPUPlace or fluid.CUDAPlace): This parameter represents the device on which the executor runs.
config(dict): config to quantize. The keys are 'params_name', 'quantize_type', \
'quantize_bits', 'dtype', 'threshold'. \
``params_name`` is the name of the parameter to quantize; it must be set.
``quantize_type`` is the quantize type; supported types are ['abs_max']. Default: "abs_max".
``quantize_bits`` is the number of quantize bits; supported bits are [8]. Default: 8.
``dtype`` is the quantize dtype; supported dtypes are ['int8']. Default: 'int8'.
``threshold`` is the threshold used to clip the tensor before quantization. \
When threshold is not set, the tensor will not be clipped.
Returns:
None
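Examples:
    A minimal sketch of a ``config`` dict for this API. The parameter name
    'emb' is a placeholder; use the name of an embedding parameter from your
    own program:

    .. code-block:: python

        config = {
            'params_name': 'emb',        # required: name of the parameter to quantize (placeholder here)
            'quantize_type': 'abs_max',  # only 'abs_max' is supported
            'quantize_bits': 8,          # only 8 is supported
            'dtype': 'int8',             # only 'int8' is supported
            'threshold': 5.0             # optional: omit it to skip clipping before quantization
        }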
"""
"""
assert isinstance(config, dict), "config must be dict"
"""Add quantization and dequantization operators to "program"
add trainable quantization ops in program.
for quantization training or testing.
Args:
program(fluid.Program): training or testing ``program``.
place(fluid.CPUPlace or fluid.CUDAPlace): This parameter represents the device on which the executor runs.
config(dict, optional): configs for quantization. If None, the default config is used.
Default: None.
scope(fluid.Scope): Scope records the mapping between variable names and variables,
similar to brackets in programming languages. Usually users can use
`fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_. When ``None``, `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_ is used. Default: ``None``.
for_test(bool): If the 'program' parameter is a test program, this parameter should be set to ``True``.
Otherwise, set to ``False``. Default: False.
Returns:
fluid.CompiledProgram | fluid.Program: Program with quantization and dequantization ``operators``.
Users can finetune this quantized program to enhance accuracy.
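Examples:
    A minimal sketch of calling this API, assuming ``train_program`` is a
    network you have already built and the default quantization config is
    acceptable:

    .. code-block:: python

        import paddle.fluid as fluid
        from paddleslim.quant import quant_aware

        place = fluid.CPUPlace()
        # config=None falls back to the default quantization config
        quant_program = quant_aware(train_program, place, config=None, for_test=False)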
"""
"""
scope = fluid.global_scope() if not scope else scope
...
@@ -237,25 +243,25 @@ def quant_post(executor,
"""
"""
The function utilizes the post training quantization method to quantize the
fp32 model. It uses calibration data to calculate the scale factors of
quantized variables, and inserts fake quantization and dequantization
operators to obtain the quantized model.
Args:
executor(fluid.Executor): The executor to load, run and save the
quantized model.
model_dir(str): The path of the fp32 model that will be quantized; the
model and params saved by ``fluid.io.save_inference_model``
are under this path.
quantize_model_path(str): The path to save the quantized model, using the API
``fluid.io.save_inference_model``.
sample_generator(Python Generator): The sample generator provides
calibration data for the DataLoader, and it returns one sample at a time.
model_filename(str, optional): The name of the model file. If parameters
are saved in separate files, set it as 'None'. Default: 'None'.
params_filename(str, optional): The name of the params file.
When all parameters are saved in a single file, set it
as the filename. If parameters are saved in separate files,
set it as 'None'. Default: 'None'.
batch_size(int, optional): The batch size of the DataLoader. Default: 16.
batch_nums(int, optional): If batch_nums is not None, the number of calibration
data is 'batch_size*batch_nums'. If batch_nums is None, use all data
...
@@ -264,15 +270,16 @@ def quant_post(executor,
and save variables. If scope is None, will use fluid.global_scope().
algo(str, optional): If algo='KL', use the KL-divergence method to
get a more precise scale factor. If algo='direct', use the
abs_max method to get the scale factor. Default: 'KL'.
quantizable_op_type(list[str], optional): The list of op types
that will be quantized. Default: ["conv2d", "depthwise_conv2d",
"mul"].
is_full_quantize(bool): If True, apply quantization to all supported quantizable op types.
If False, only apply quantization to the input quantizable_op_type. Default: False.
is_use_cache_file(bool): If False, all temp data will be saved in memory. If True,
all temp data will be saved to disk. Default: False.
cache_dir(str): When 'is_use_cache_file' is True, temp data will be saved in 'cache_dir'. Default: './temp_post_training'.
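Examples:
    A minimal sketch of calling this API. The paths and the ``sample_reader``
    generator below are placeholders for your own model directory and
    calibration data source:

    .. code-block:: python

        import paddle.fluid as fluid
        from paddleslim.quant import quant_post

        exe = fluid.Executor(fluid.CPUPlace())
        quant_post(
            executor=exe,
            model_dir='./fp32_model',            # placeholder path
            quantize_model_path='./quant_model', # placeholder path
            sample_generator=sample_reader,      # placeholder calibration generator
            batch_size=16,
            batch_nums=10)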
Convert a quantized and well-trained ``program`` to the final quantized ``program``, which can be used to save the ``inference model``.
Args:
program(fluid.Program): quantized and well-trained ``test program``.
place(fluid.CPUPlace or fluid.CUDAPlace): This parameter represents the device on which the executor runs.
config(dict, optional): configs for convert. If set to None, the default config is used.
It must be the same config that was used in 'quant_aware'. Default: None.
scope(fluid.Scope, optional): Scope records the mapping between variable names and variables,
similar to brackets in programming languages. Usually users can use
`fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_. When ``None``, `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_ is used. Default: ``None``.
save_int8(bool): Whether to return a ``program`` whose model parameters' dtype is ``int8``.
Such a program can only be used to check the size of model weights;
it cannot be used in Fluid or Paddle-Lite. Default: ``False``.
Returns:
fluid.Program: freezed program which can be used for inference.
Parameters are of float32 type, but their values are in the int8 range.
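Examples:
    A minimal sketch of freezing a quantized test program, assuming
    ``quant_test_program`` was returned by ``quant_aware`` with
    ``for_test=True`` and ``quant_config`` is the same config used there:

    .. code-block:: python

        import paddle.fluid as fluid
        from paddleslim.quant import convert

        place = fluid.CPUPlace()
        # the config must match the one passed to quant_aware
        freezed_program = convert(quant_test_program, place, config=quant_config)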