Unverified commit 7c9fa0dd authored by Liufang Sang, committed by GitHub

update quantization en api format (#97)

Parent 2c73ed17
@@ -233,20 +233,23 @@ def _quant_embedding_abs_max(graph, scope, place, config):

def quant_embedding(program, place, config, scope=None):
    """quantize lookup_table op parameters

    Args:
        program(fluid.Program): infer program
        scope(fluid.Scope): Scope records the mapping between variable names and variables,
            similar to brackets in programming languages. Usually users can use
            `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_.
            When ``None``, ``fluid.global_scope()`` will be used. Default: ``None``.
        place(fluid.CPUPlace or fluid.CUDAPlace): The device on which the executor runs.
        config(dict): config for quantization. The keys are 'params_name', 'quantize_type',
            'quantize_bits', 'dtype' and 'threshold'. ``params_name`` is the name of the
            parameter to quantize and must be set. ``quantize_type`` is the quantization
            type; supported types are ['abs_max']. Default: "abs_max". ``quantize_bits``:
            supported bits are [8]. Default: 8. ``dtype`` is the quantized dtype; supported
            dtypes are ['int8']. Default: 'int8'. ``threshold`` is the threshold used to
            clip the tensor before quantization; when it is not set, the tensor will not
            be clipped.

    Returns:
        None
    """
    assert isinstance(config, dict), "config must be dict"
    config = _merge_config(copy.deepcopy(default_config), config)
...
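For reference, a minimal usage sketch of ``quant_embedding`` under the config documented above; the model path and the embedding parameter name 'emb' are hypothetical:

import paddle.fluid as fluid
from paddleslim.quant import quant_embedding

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Load a saved inference model that contains a lookup_table (embedding) op.
infer_program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname='./infer_model', executor=exe)  # hypothetical path

config = {
    'params_name': 'emb',        # hypothetical embedding parameter name; must be set
    'quantize_type': 'abs_max',  # the only supported type
    'quantize_bits': 8,          # the only supported bit width
    'dtype': 'int8',             # the only supported dtype
}
quant_embedding(infer_program, place, config)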
@@ -158,17 +158,23 @@ def _parse_configs(user_config):

def quant_aware(program, place, config=None, scope=None, for_test=False):
    """Add quantization and dequantization operators to ``program``
    for quantization training or testing.

    Args:
        program(fluid.Program): training or testing ``program``.
        place(fluid.CPUPlace or fluid.CUDAPlace): The device on which the executor runs.
        config(dict, optional): configs for quantization. If None, the default config
            will be used. Default: ``None``.
        scope(fluid.Scope): Scope records the mapping between variable names and variables,
            similar to brackets in programming languages. Usually users can use
            `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_.
            When ``None``, ``fluid.global_scope()`` will be used. Default: ``None``.
        for_test(bool): If the ``program`` parameter is a test program, set this
            parameter to ``True``; otherwise, set it to ``False``. Default: ``False``.

    Returns:
        fluid.CompiledProgram | fluid.Program: Program with quantization and
        dequantization ``operators``.
    """
    scope = fluid.global_scope() if not scope else scope
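A minimal quantization-aware training sketch under the default config; the toy network and variable names are illustrative, not part of this commit:

import paddle.fluid as fluid
from paddleslim.quant import quant_aware

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    image = fluid.data(name='image', shape=[None, 1, 28, 28], dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int64')
    out = fluid.layers.fc(input=image, size=10, act='softmax')
    loss = fluid.layers.mean(fluid.layers.cross_entropy(input=out, label=label))
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_program)

# Clone a test program first, then insert fake quant/dequant ops into both;
# for_test tells quant_aware which variant it is working on.
val_program = train_program.clone(for_test=True)
quant_train_program = quant_aware(train_program, place, config=None, for_test=False)
quant_val_program = quant_aware(val_program, place, config=None, for_test=True)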
@@ -237,25 +243,25 @@ def quant_post(executor,
    """
    The function utilizes post training quantization method to quantize the
    fp32 model. It uses calibrate data to calculate the scale factor of
    quantized variables, and inserts fake quantization and dequantization
    operators to obtain the quantized model.

    Args:
        executor(fluid.Executor): The executor to load, run and save the
            quantized model.
        model_dir(str): The path of the fp32 model that will be quantized;
            the model and params saved by ``fluid.io.save_inference_model``
            are under this path.
        quantize_model_path(str): The path where the quantized model is saved
            using the api ``fluid.io.save_inference_model``.
        sample_generator(Python Generator): The sample generator provides
            calibrate data for DataLoader, and it returns only one sample each time.
        model_filename(str, optional): The name of the model file. If parameters
            are saved in separate files, set it to 'None'. Default: 'None'.
        params_filename(str, optional): The name of the params file.
            When all parameters are saved in a single file, set it
            to the filename. If parameters are saved in separate files,
            set it to 'None'. Default: 'None'.
        batch_size(int, optional): The batch size of DataLoader. Default: 16.
        batch_nums(int, optional): If batch_nums is not None, the number of calibrate
            data is 'batch_size*batch_nums'. If batch_nums is None, use all data
@@ -264,15 +270,16 @@ def quant_post(executor,
            and save variables. If scope is None, fluid.global_scope() will be used.
        algo(str, optional): If algo='KL', use the KL-divergence method to
            get the more precise scale factor. If algo='direct', use the
            abs_max method to get the scale factor. Default: 'KL'.
        quantizable_op_type(list[str], optional): The list of op types
            that will be quantized. Default: ["conv2d", "depthwise_conv2d", "mul"].
        is_full_quantize(bool): If True, apply quantization to all supported
            quantizable op types. If False, only apply quantization to the input
            quantizable_op_type. Default: False.
        is_use_cache_file(bool): If False, all temp data will be saved in memory.
            If True, all temp data will be saved to disk. Default: False.
        cache_dir(str): When 'is_use_cache_file' is True, temp data will be saved
            in 'cache_dir'. Default: './temp_post_training'.

    Returns:
        None
    """
@@ -296,22 +303,23 @@ def quant_post(executor,

def convert(program, place, config=None, scope=None, save_int8=False):
    """
    convert quantized and well-trained ``program`` to final quantized ``program``
    that can be used to save ``inference model``.

    Args:
        program(fluid.Program): quantized and well-trained ``test program``.
        place(fluid.CPUPlace or fluid.CUDAPlace): The device on which the executor runs.
        config(dict, optional): configs for convert. If set to None, the default config
            will be used. It must be the same config that was used in 'quant_aware'.
            Default: ``None``.
        scope(fluid.Scope, optional): Scope records the mapping between variable names
            and variables, similar to brackets in programming languages. Usually users
            can use `fluid.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_.
            When ``None``, ``fluid.global_scope()`` will be used. Default: ``None``.
        save_int8(bool): Whether to also return a ``program`` whose model parameters
            are of dtype ``int8``. That program can only be used to check the size of
            model weights; it cannot be used in Fluid or Paddle-Lite.
            Default: ``False``.

    Returns:
        Tuple: the freezed program, which can be used for inference; its parameters
            are of float32 type, but their values are in the int8 range.
            When ``save_int8`` is False, return ``freezed_program(fluid.Program)``.
            When ``save_int8`` is True, return ``freezed_program(fluid.Program)``
            and ``freezed_program_int8(fluid.Program)``.
    """
    scope = fluid.global_scope() if not scope else scope
...
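Continuing the quant_aware sketch above, a hedged example of converting the quantized test program and saving it for inference; the feed/fetch names are assumptions carried over from that sketch:

import paddle.fluid as fluid
from paddleslim.quant import convert

# After quantization-aware training finishes, convert the quantized *test*
# program; config must match the one passed to quant_aware (None here).
float_program, int8_program = convert(
    quant_val_program, place, config=None, save_int8=True)

# float_program is what gets deployed; int8_program only reflects weight size.
fluid.io.save_inference_model(
    dirname='./quant_infer_model',
    feeded_var_names=['image'],   # assumed feed name from the sketch above
    target_vars=[out],            # assumed fetch target from the sketch above
    executor=exe,
    main_program=float_program)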