fsp_loss is from the paper `A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning <http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf>`_
**Parameters:**
...
...
@@ -70,7 +70,7 @@ fsp_loss出自论文 `A Gift from Knowledge Distillation: Fast Optimization, Net
2. Set ``Quantization`` and ``HyperParameterOptimization`` to get the post-training quantization (quant_post) and hyperparameter optimization compression config.
The Quantization config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24`_ .
The HyperParameterOptimization config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L73`_ .
3. Set ``ChannelPrune`` and ``Distillation`` to get the channel pruning and distillation compression config.
The ChannelPrune config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ .
The Distillation config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
4. Set ``ASPPrune`` and ``Distillation`` to get the ASP pruning and distillation compression config.
The ASPPrune config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ .
The Distillation config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
5. Set ``TransformerPrune`` and ``Distillation`` to get the transformer pruning and distillation compression config.
The TransformerPrune config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ .
The Distillation config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
6. Set ``UnstructurePrune`` and ``Distillation`` to get the unstructured pruning and distillation compression config.
The UnstructurePrune config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L91`_ .
The Distillation config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
7. Set ``Distillation`` to use a single teacher model to distill the student model.
The Distillation config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
8. Set ``MultiTeacherDistillation`` to use multiple teachers to distill the student model.
The MultiTeacherDistillation config can be found at `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L56`_ .
If set to None, a strategy will be chosen automatically. Default: None.
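For example (an illustrative sketch: ``ChannelPrune`` and ``Distillation`` are assumed to be importable from ``paddleslim.auto_compression``, and the parameter names and tensor names below are hypothetical), strategy 3 above pairs a channel pruning config with a distillation config:
.. code-block:: python
    from paddleslim.auto_compression import ChannelPrune, Distillation
    prune_config = ChannelPrune(
        pruned_ratio=[0.25, 0.25],       # assumed: one ratio per pruned parameter
        prune_params_name=['conv1_weights', 'conv2_weights'],
        criterion='l1_norm')
    distill_config = Distillation(loss='l2', node=['relu_30.tmp_0'], alpha=1.0)
..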
quantize_op_types(list(str)): Operators whose type is in quantize_op_types will be quantized. Default: ['conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2'].
weight_bits(int): Number of bits used to quantize weights. Default: 8.
activation_bits(int): Number of bits used to quantize activations. Default: 8.
not_quant_pattern(list(str)): Operators whose name_scope matches a pattern in not_quant_pattern will not be quantized. Default: 'skip_quant'.
use_pact(bool): Whether to use PACT in quantization-aware training. Default: False.
activation_quantize_type(str): Activation quantization type. Default: 'moving_average_abs_max'.
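Example (an illustrative sketch: ``Quantization`` is assumed to be importable from ``paddleslim.auto_compression`` and to accept the documented parameters as keyword arguments):
.. code-block:: python
    from paddleslim.auto_compression import Quantization
    quant_config = Quantization(
        quantize_op_types=['conv2d', 'depthwise_conv2d', 'mul'],
        weight_bits=8,
        activation_bits=8,
        not_quant_pattern=['skip_quant'],
        use_pact=True,
        activation_quantize_type='moving_average_abs_max')
..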
loss(str|list(str)): Distillation loss; the available loss types are listed at `<https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html>`_. If a list of losses is given, different nodes can use different distillation losses, and the length of loss must equal the length of node. Default: 'l2'.
node(list(str)|list(list(str))): Distillation nodes; users can pick node names from the model before compression. If a list of lists is given, every inner list uses the same distillation loss, and the length of the outer list must equal the length of loss. Default: [].
alpha(float|list(float)): The weight (lambda) of the distillation loss. If a list is given, its length must equal the length of loss. Default: 1.0.
teacher_model_dir(str, optional): The path of the teacher inference model; the model and params saved by ``paddle.static.io.save_inference_model`` are under this path. If set to None, the teacher model is the model before compression. Default: None.
teacher_model_filename(str, optional): The name of the teacher model file. If parameters are saved in separate files, set it to 'None'. Default: 'None'.
teacher_params_filename(str, optional): The name of the teacher params file. When all parameters are saved in a single file, set it to that filename; if parameters are saved in separate files, set it to 'None'. Default: 'None'.
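Example (an illustrative sketch: ``Distillation`` is assumed to be importable from ``paddleslim.auto_compression``; the node name, teacher model path and filenames below are placeholders):
.. code-block:: python
    from paddleslim.auto_compression import Distillation
    distill_config = Distillation(
        loss='l2',
        node=['relu_30.tmp_0'],
        alpha=1.0,
        teacher_model_dir='./teacher_model',
        teacher_model_filename='model.pdmodel',
        teacher_params_filename='model.pdiparams')
..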
loss(list(str)): The list of distillation losses; the available loss types are listed at `<https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html>`_. There is a one-to-one correspondence between loss and teacher model. Default: [].
node(list(list(str))): Distillation nodes; users can pick node names from the model before compression. If a list of lists is given, every inner list uses the same distillation loss, and the length of the outer list must equal the length of loss. Default: [].
alpha(list(float)): The list of weights (lambda) of the distillation losses. There is a one-to-one correspondence between alpha and loss. Default: [].
teacher_model_dir(list): The list of paths of the teacher inference models; the model and params saved by ``paddle.static.io.save_inference_model`` are under each path. If set to None, the teacher model is the model before compression. Default: None.
teacher_model_filename(list): The list of names of the teacher model files. If parameters are saved in separate files, set it to 'None'. Default: 'None'.
teacher_params_filename(list): The list of names of the teacher params files. When all parameters are saved in a single file, set it to that filename; if parameters are saved in separate files, set it to 'None'. Default: 'None'.
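Example (an illustrative sketch for two teachers: ``MultiTeacherDistillation`` is assumed to be importable from ``paddleslim.auto_compression``; the node names, paths and filenames below are placeholders):
.. code-block:: python
    from paddleslim.auto_compression import MultiTeacherDistillation
    multi_distill_config = MultiTeacherDistillation(
        loss=['l2', 'l2'],
        node=[['relu_30.tmp_0'], ['relu_22.tmp_0']],
        alpha=[1.0, 0.5],
        teacher_model_dir=['./teacher_1', './teacher_2'],
        teacher_model_filename=['model.pdmodel', 'model.pdmodel'],
        teacher_params_filename=['model.pdiparams', 'model.pdiparams'])
..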
ptq_algo(list(str)): Post-Training Quantization algorithms; the available algorithms are listed at `<https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-post-static>`_.
bias_correct(list(bool)): Whether to use bias_correct.
weight_quantize_type(list(str)): Quantization type for weights, can be set to 'channel_wise_abs_max' or 'abs_max'.
hist_percent(list(float)): The lower and upper bounds of the threshold of the 'hist' algorithm for activations; the actual percent is sampled uniformly within these bounds.
batch_num(list(int)): The lower and upper bounds of the batch number; the actual batch number is sampled uniformly within these bounds.
max_quant_count(int): Max number of model quantization. Default: 20.
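Example (an illustrative search space: ``HyperParameterOptimization`` is assumed to be importable from ``paddleslim.auto_compression``, and the candidate values below are made up for illustration):
.. code-block:: python
    from paddleslim.auto_compression import HyperParameterOptimization
    hpo_config = HyperParameterOptimization(
        ptq_algo=['KL', 'hist'],
        bias_correct=[True],
        weight_quantize_type=['abs_max'],
        hist_percent=[0.98, 0.999],
        batch_num=[10, 30],
        max_quant_count=20)
..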
prune_params_name(list(str)): A list of parameter names to be pruned.
criterion(str|function): The criterion used to sort channels for pruning, can be chosen from ['l1_norm', 'bn_scale', 'geometry_median']. Default: 'l1_norm'.
"""
        self.pruned_ratio = pruned_ratio
        self.prune_params_name = prune_params_name
        self.criterion = criterion
class ASPPrune:
    def __init__(self, prune_params_name):
        """
        ASPPrune Config.
        Args:
            prune_params_name(list(str)): A list of parameter names to be pruned.
        """
        self.prune_params_name = prune_params_name
class TransformerPrune:
    def __init__(self, pruned_ratio):
        """
        TransformerPrune Config.
        Args:
            pruned_ratio(float): The ratio to be pruned for each fully-connected layer.
        """
        self.pruned_ratio = pruned_ratio
class UnstructurePrune:
    def __init__(self,
                 prune_strategy=None,
                 prune_mode='ratio',
                 threshold=0.01,
                 ratio=0.55,
                 gmp_config=None,
                 prune_params_type=None,
                 local_sparsity=False):
        """
        UnstructurePrune Config.
        Args:
            prune_strategy(str, optional): The pruning strategy; currently 'base' and 'gmp' are supported, and ``None`` means the base pruning strategy. Default: ``None``.
            prune_mode(str): The pruning mode: whether to prune by ratio or by threshold. Default: 'ratio'.
            threshold(float): The threshold used to set zeros; weights whose absolute value is below it are set to zero. Default: 0.01.
            ratio(float): The ratio used to set zeros; the smallest portion of weights is set to zero. Default: 0.55.
            gmp_config(dict): The dictionary containing all the configs for the GMP pruner. Default: None. The detailed description is as below:
            .. code-block:: python
                {'stable_iterations': int} # the duration of the stable phase, in terms of global iterations
                {'pruning_iterations': int} # the duration of the pruning phase, in terms of global iterations
                {'tunning_iterations': int} # the duration of the tuning phase, in terms of global iterations
                {'resume_iteration': int} # the starting timestamp you want to train from, in terms of global iterations
                {'pruning_steps': int} # the total number of times you want to increase the ratio
                {'initial_ratio': float} # the initial ratio value
            ..
            prune_params_type(str): Which kind of params should be pruned; only None (all but norms) and 'conv1x1_only' are supported for now. Default: None.
            local_sparsity(bool): Whether to prune all the parameter matrices at the same ratio. Default: False.
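        Example (an illustrative GMP configuration; the iteration counts and ratios below are made-up values for illustration, not recommendations):
        .. code-block:: python
            UnstructurePrune(
                prune_strategy='gmp',
                prune_mode='ratio',
                ratio=0.75,
                gmp_config={
                    'stable_iterations': 0,
                    'pruning_iterations': 4500,
                    'tunning_iterations': 4500,
                    'resume_iteration': 0,
                    'pruning_steps': 100,
                    'initial_ratio': 0.15,
                },
                prune_params_type='conv1x1_only',
                local_sparsity=True)
        ..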
"""
self.prune_strategy=prune_strategy
self.prune_mode=prune_mode
self.threshold=threshold
self.ratio=ratio
self.gmp_config=gmp_config
self.prune_params_type=prune_params_type
self.local_sparsity=local_sparsity
class TrainConfig:
    def __init__(self,
                 epochs=None,
                 train_iter=None,
                 learning_rate=0.02,
                 optimizer_builder={'optimizer': 'SGD'},
                 eval_iter=1000,
                 logging_iter=10,
                 origin_metric=None,
                 target_metric=None,
                 use_fleet=False,
                 amp_config=None,
                 recompute_config=None,
                 sharding_config=None,
                 sparse_model=False):
        """
        Train Config.
        Args:
            epochs(int): The number of total epochs. Default: None.
            train_iter(int): The total number of training iterations; only one of `epochs` and `train_iter` needs to be set. Default: None.
            learning_rate(float|dict): The learning rate used in training. If a dict is given, the detailed description of learning_rate is as below:
            .. code-block:: python
                'type'(str) # the class name of the learning rate decay, which can be found in paddle.optimizer.lr.
            ..
            The other keys of learning_rate depend on the parameters of the chosen learning rate decay class.
            For example, if you want to use ``PiecewiseDecay``, learning_rate can be set as follows (the values are illustrative only):
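            .. code-block:: python
                # illustrative values; 'boundaries' and 'values' are parameters of paddle.optimizer.lr.PiecewiseDecay
                learning_rate = {
                    'type': 'PiecewiseDecay',
                    'boundaries': [4500],
                    'values': [0.005, 0.0005]
                }
            ..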
            optimizer_builder(str|dict): The optimizer used in training. If a dict is given, the detailed description of optimizer_builder is as below:
            .. code-block:: python
                'optimizer'(dict) # the 'type' in the optimizer needs to be a class name in paddle.optimizer;
                                  # the other keys of the optimizer depend on the parameters of the class.
                'weight_decay'(float, optional) # weight decay used in training.
                'regularizer'(dict) # the 'type' in the regularizer needs to be a class name in paddle.regularizer;
                                    # the other keys of the regularizer depend on the parameters of the class.
                'grad_clip'(dict) # the 'type' in the grad_clip needs to be a class name in paddle.nn, such as 'ClipGradByGlobalNorm';
                                  # the other keys of the grad_clip depend on the parameters of the class.
            ..
            eval_iter(int): Evaluation period, in batches. Default: 1000.
            logging_iter(int): Logging period, in batches. Default: 10.
            origin_metric(float, optional): The metric of the model before compression; if not None, it is used to check whether the dataloader is correct. Default: None.
            target_metric(float, optional): The target metric of the compressed model; if set, training stops once the metric of the compressed model satisfies the requirement. If not set, training runs for the epochs set by the user. Default: None.
            use_fleet(bool): Whether to use fleet. Default: False.
            amp_config(dict, optional): The dictionary containing all the configs for AMP. Default: None. The detailed description is as below if use_fleet=False:
                {'use_pure_fp16': bool} # Whether to use pure fp16 training.
                {'use_fp16_guard': bool} # Whether to use `fp16_guard` when constructing the program.
            ..
            If you want to use AMP-O2, you need to set use_pure_fp16 to True and use_fp16_guard to False.
            If use_fleet=True, the keys of amp_config can be found at `<https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/distributed/fleet/DistributedStrategy_cn.html#amp-configs>`_.
            recompute_config(dict, optional): The dictionary containing all the configs for recompute. Default: None. The recompute config can only be set when use_fleet=True; the keys of recompute_config can be found at `<https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/distributed/fleet/DistributedStrategy_cn.html#recompute-configs>`_.
            sharding_config(dict, optional): The dictionary containing all the configs for sharding. Default: None. The sharding config can only be set when use_fleet=True; the keys of sharding_config can be found at `<https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/distributed/fleet/DistributedStrategy_cn.html#sharding-configs>`_.
            sparse_model(bool, optional): Set sparse_model to ``True`` to remove the mask tensors when the compression strategy is unstructured pruning. Default: False.
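        Example (an illustrative configuration; the numbers are made-up values, and 'Momentum' / 'ClipGradByGlobalNorm' are class names from ``paddle.optimizer`` and ``paddle.nn``):
        .. code-block:: python
            train_config = TrainConfig(
                epochs=1,
                learning_rate=0.005,
                optimizer_builder={
                    'optimizer': {'type': 'Momentum', 'momentum': 0.9},
                    'weight_decay': 0.0001,
                    'grad_clip': {'type': 'ClipGradByGlobalNorm', 'clip_norm': 1.0}
                },
                eval_iter=1000,
                origin_metric=0.765,
                target_metric=0.760)
        ..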
"""
        self.epochs = epochs
        self.train_iter = train_iter
        self.learning_rate = learning_rate
        self.optimizer_builder = optimizer_builder
        self.eval_iter = eval_iter
        self.logging_iter = logging_iter
        self.origin_metric = origin_metric
        self.target_metric = target_metric
        self.use_fleet = use_fleet
        self.amp_config = amp_config
        self.recompute_config = recompute_config
        self.sharding_config = sharding_config
        self.sparse_model = sparse_model
class MergeConfig:
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)
def merge_config(*args):
    cfg = dict()
    for arg in args:
        cfg.update(arg.__dict__)
    return MergeConfig(**cfg)
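# Illustrative usage of merge_config: attributes from every config object passed in are
# collected into one flat MergeConfig object, e.g.
#     merged = merge_config(TransformerPrune(pruned_ratio=0.25),
#                           TrainConfig(epochs=1, learning_rate=0.01))
#     merged.pruned_ratio  # -> 0.25
#     merged.epochs        # -> 1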
...
...
@@ -143,6 +322,16 @@ class ProgramInfo:
fetch_targets,
optimizer=None,
learning_rate=None):
"""
ProgramInfo Config.
Args:
startup_program(paddle.static.Program): Startup program; the meaning of the startup program is described at `<https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/default_startup_program_cn.html#cn-api-fluid-default-startup-program>`_.
program(paddle.static.Program): Main program; the meaning of the main program is described at `<https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/default_main_program_cn.html#cn-api-fluid-default-main-program>`_.
feed_target_names(list(str)): The names of the feed tensors in the program.
fetch_targets(list(Variable)): The fetch variables in the program.
optimizer(Optimizer, optional): Optimizer in training. Default: None.
learning_rate(float|paddle.optimizer.lr, optional): learning_rate in training. Default: None.
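Example (an illustrative sketch; 'inference_model_dir' is a hypothetical path to a model saved with ``paddle.static.save_inference_model``):
.. code-block:: python
    import paddle
    paddle.enable_static()
    exe = paddle.static.Executor(paddle.CPUPlace())
    startup_program = paddle.static.default_startup_program()
    [program, feed_target_names, fetch_targets] = paddle.static.load_inference_model(
        'inference_model_dir', exe)
    program_info = ProgramInfo(startup_program, program, feed_target_names, fetch_targets)
..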