Unverified commit e8efaee9, authored by Zhou Wei, committed by GitHub

Update gradient clip English doc for the new gradient clipping strategy

The gradient clipping strategy has been upgraded; this PR updates the English API documentation of the corresponding clipping APIs, ``minimize``, and ``ParamAttr`` to match. A usage sketch of the new recommended style is given below.

Documentation for the corresponding API changes: #23224

Corresponding Chinese documentation PR: PaddlePaddle/FluidDoc#1942
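For reference, a minimal sketch of the newly recommended usage under the fluid static-graph API; the toy network and all variable names below are illustrative, not part of this PR:

```python
import paddle.fluid as fluid

# Toy network: one fully connected layer with a squared-error loss.
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

# New style: pass the clipping strategy to minimize() instead of
# setting gradient_clip on ParamAttr.
clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
optimizer.minimize(loss, grad_clip=clip)
```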
Parent 426912df
This diff is collapsed.
@@ -801,11 +801,12 @@ class Optimizer(object):
                 to minimize ``loss``. The default value is None, at this time all parameters
                 will be updated.
             no_grad_set (set, optional): Set of ``Variable`` or ``Variable.name`` that don't need
                 to be updated. The default value is None.
-            grad_clip (GradClipBase, optional) : Gradient clipping strategy, static
-                graph mode does not need to use this argument. Currently, this argument
-                only supports gradient clipping in dygraph mode. In the future, this
-                argument my be adjusted. The default value is None.
+            grad_clip (GradientClipBase, optional): Gradient clipping strategy, it's an instance of
+                some derived class of ``GradientClipBase`` . There are three clipping strategies
+                ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` ,
+                :ref:`api_fluid_clip_GradientClipByValue` ). Default value: None, and there is no
+                gradient clipping.
 
         Returns:
             tuple: tuple (optimize_ops, params_grads), A list of operators appended
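To illustrate the three strategies referenced in the new ``grad_clip`` description, a hedged sketch (the clip thresholds are arbitrary example values):

```python
import paddle.fluid as fluid

# Clip by the global L2 norm computed over all gradients together.
clip_by_global_norm = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

# Clip each gradient tensor by its own L2 norm.
clip_by_norm = fluid.clip.GradientClipByNorm(clip_norm=1.0)

# Clip every gradient element into the range [min, max].
clip_by_value = fluid.clip.GradientClipByValue(min=-1.0, max=1.0)

# Any one of them can then be passed as the grad_clip argument, e.g.:
# optimizer.minimize(loss, grad_clip=clip_by_global_norm)
```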
@@ -31,6 +31,12 @@ class ParamAttr(object):
     Create a object to represent the attribute of parameter. The attributes are:
     name, initializer, learning rate, regularizer, trainable, gradient clip,
     and model average.
+
+    Note:
+        ``gradient_clip`` of ``ParamAttr`` HAS BEEN DEPRECATED since 2.0.
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradient.
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` ,
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .
 
     Parameters:
         name (str, optional): The parameter's name. Default None, meaning that the name
@@ -44,8 +50,6 @@ class ParamAttr(object):
         regularizer (WeightDecayRegularizer, optional): Regularization factor. Default None, meaning
             there is no regularization.
         trainable (bool): Whether this parameter is trainable. Default True.
-        gradient_clip (BaseGradientClipAttr, optional): The method to clip this parameter's
-            gradient. Default None, meaning that there is no gradient clip.
         do_model_average (bool): Whether this parameter should do model average
             when model average is enabled. Default False.
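A hedged migration sketch for ``ParamAttr`` users (the layer and names are illustrative): the parameter attribute no longer carries a clipping setting, and clipping moves to ``minimize()``.

```python
import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 10], dtype='float32')

# Deprecated since 2.0: fluid.ParamAttr(..., gradient_clip=...).
# Define the parameter attribute without any clipping setting:
w_attr = fluid.ParamAttr(name='fc_w', learning_rate=1.0)
out = fluid.layers.fc(input=x, size=10, param_attr=w_attr)
loss = fluid.layers.mean(out)

# ...and attach the clipping strategy when calling minimize() instead.
clip = fluid.clip.GradientClipByNorm(clip_norm=2.0)
fluid.optimizer.SGD(learning_rate=0.1).minimize(loss, grad_clip=clip)
```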
@@ -190,6 +194,12 @@ class WeightNormParamAttr(ParamAttr):
     paper: `Weight Normalization: A Simple Reparameterization to Accelerate
     Training of Deep Neural Networks
     <https://arxiv.org/pdf/1602.07868.pdf>`_.
+
+    Note:
+        ``gradient_clip`` of ``WeightNormParamAttr`` HAS BEEN DEPRECATED since 2.0.
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradient.
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` ,
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .
 
     Args:
         dim(int): Dimension over which to compute the norm. Dim is a non-negative
@@ -209,9 +219,6 @@ class WeightNormParamAttr(ParamAttr):
             ``regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1)``.
             Default None, meaning that there is no regularization.
         trainable(bool, optional): Whether this parameter is trainable. Default True.
-        gradient_clip: The method to clip this parameter's gradient, such as
-            ``gradient_clip = fluid.clip.GradientClipByNorm(clip_norm=2.0))`` .
-            Default None, meaning that there is no gradient clip.
         do_model_average(bool, optional): Whether this parameter should do model average.
             Default False.
@@ -229,7 +236,6 @@ class WeightNormParamAttr(ParamAttr):
                     learning_rate=1.0,
                     regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
                     trainable=True,
-                    gradient_clip=fluid.clip.GradientClipByNorm(clip_norm=2.0),
                     do_model_average=False))
     """