Update gradient clip English doc for new gradient clipping strategy

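The gradient clipping strategy has been upgraded. This change updates the English API documentation for the affected clipping APIs, minimize, and ParamAttr to match. The corresponding API change is described in #23224. Corresponding Chinese documentation PR: PaddlePaddle/FluidDoc#1942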

e8efaee9 · Zhou Wei · GitHub · 426912df · e8efaee9 · e8efaee9
Showing 3 changed files with 270 additions and 146 deletions

python/paddle/fluid/clip.py +252 -135

python/paddle/fluid/optimizer.py +6 -5

python/paddle/fluid/param_attr.py +12 -6

--- a/python/paddle/fluid/clip.py
+++ b/python/paddle/fluid/clip.py
--- a/python/paddle/fluid/optimizer.py
+++ b/python/paddle/fluid/optimizer.py
@@ -801,11 +801,12 @@ class Optimizer(object):
                to minimize ``loss``. The default value is None, at this time all parameters
                will be updated.
            no_grad_set (set, optional): Set of ``Variable``  or ``Variable.name`` that don't need
-                to be updated. The default value is None.
-            grad_clip (GradClipBase, optional) : Gradient clipping strategy, static
-                graph mode does not need to use this argument. Currently, this argument
-                only supports gradient clipping in dygraph mode. In the future, this
-                argument my be adjusted. The default value is None.
+                to be updated. The default value is None.   
+            grad_clip (GradientClipBase, optional): Gradient clipping strategy, it's an instance of 
+                some derived class of ``GradientClipBase`` . There are three clipping strategies 
+                ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` , 
+                :ref:`api_fluid_clip_GradientClipByValue` ). Default value: None, and there is no 
+                gradient clipping.

        Returns:
            tuple: tuple (optimize_ops, params_grads), A list of operators appended

--- a/python/paddle/fluid/param_attr.py
+++ b/python/paddle/fluid/param_attr.py
@@ -31,6 +31,12 @@ class ParamAttr(object):
    Create a object to represent the attribute of parameter. The attributes are:
    name, initializer, learning rate, regularizer, trainable, gradient clip,
    and model average.
+    
+    Note:
+        ``gradient_clip`` of ``ParamAttr`` HAS BEEN DEPRECATED since 2.0. 
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradient. 
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` , 
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .

    Parameters:
        name (str, optional): The parameter's name. Default None, meaning that the name
@@ -44,8 +50,6 @@ class ParamAttr(object):
        regularizer (WeightDecayRegularizer, optional): Regularization factor. Default None, meaning
                there is no regularization.
        trainable (bool): Whether this parameter is trainable. Default True.
-        gradient_clip (BaseGradientClipAttr, optional): The method to clip this parameter's
-                gradient. Default None, meaning that there is no gradient clip.
        do_model_average (bool): Whether this parameter should do model average
                when model average is enabled. Default False.

@@ -190,6 +194,12 @@ class WeightNormParamAttr(ParamAttr):
    paper: `Weight Normalization: A Simple Reparameterization to Accelerate
    Training of Deep Neural Networks
    <https://arxiv.org/pdf/1602.07868.pdf>`_.
+      
+    Note:
+        ``gradient_clip`` of ``WeightNormParamAttr`` HAS BEEN DEPRECATED since 2.0. 
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradient. 
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` , 
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .

    Args:
        dim(int): Dimension over which to compute the norm. Dim is a non-negative
@@ -209,9 +219,6 @@ class WeightNormParamAttr(ParamAttr):
            ``regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1)``.
            Default None, meaning that there is no regularization.
        trainable(bool, optional): Whether this parameter is trainable. Default True.
-        gradient_clip: The method to clip this parameter's gradient, such as
-            ``gradient_clip = fluid.clip.GradientClipByNorm(clip_norm=2.0))`` .
-            Default None, meaning that there is no gradient clip.
        do_model_average(bool, optional): Whether this parameter should do model average.
            Default False.

@@ -229,7 +236,6 @@ class WeightNormParamAttr(ParamAttr):
                                          learning_rate=1.0,
                                          regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
                                          trainable=True,
-                                          gradient_clip=fluid.clip.GradientClipByNorm(clip_norm=2.0),
                                          do_model_average=False))

    """
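
The updated docstrings name three clipping strategies (GradientClipByValue, GradientClipByNorm, GradientClipByGlobalNorm) that are now passed through ``minimize(loss, grad_clip=clip)``. As a rough illustration of what each strategy computes, here is a minimal pure-Python sketch. It is not the Paddle implementation: the function names and the list-of-lists gradient layout are hypothetical, chosen only for this example, and the formulas reflect the commonly documented behavior of the three strategies.

```python
import math

# Illustrative only: each "gradient" is a flat list of floats, and `grads`
# is a list of such gradients (one per parameter). These helper names are
# hypothetical and are not part of paddle.fluid.


def clip_by_value(grads, min_value, max_value):
    """Clamp every gradient element into [min_value, max_value]."""
    return [[min(max(g, min_value), max_value) for g in grad] for grad in grads]


def clip_by_norm(grads, clip_norm):
    """Rescale each gradient independently so its L2 norm is at most clip_norm."""
    clipped = []
    for grad in grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Only shrink gradients whose norm exceeds the threshold.
        scale = clip_norm / norm if norm > clip_norm else 1.0
        clipped.append([g * scale for g in grad])
    return clipped


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients by one shared factor based on their joint L2 norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The factor is 1.0 when global_norm <= clip_norm, otherwise below 1.0.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads]


if __name__ == "__main__":
    grads = [[3.0], [4.0]]
    print(clip_by_value(grads, -1.0, 1.0))
    print(clip_by_norm(grads, 1.0))
    print(clip_by_global_norm(grads, 1.0))
```

The difference between the last two is the point of the sketch: clipping by norm treats each parameter's gradient separately, while clipping by global norm applies one scale factor to all gradients and so preserves their relative magnitudes.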