fix docstrings: paddle.nn.Conv3DTranspose, etc. (#49149)

* fix docstrings:
  1. Fixed the TeX syntax issues in python/paddle/nn/layer/conv.py;
  2. Fixed the TeX syntax issues in python/paddle/nn/functional/conv.py, revised the format of the notes and parameter descriptions, and tried adding paper links;
  3. Revised the format of the parameter descriptions in python/paddle/nn/layer/loss.py and python/paddle/nn/functional/loss.py;
  4. Revised the format of the parameter descriptions in python/paddle/nn/layer/common.py and python/paddle/nn/functional/common.py, and added "optional" annotations.
* fix docstring format, add paper's hyperlink.
* fix formula Tex.
* fix format error; test=docs_preview
* fix reference hyperlink; test=docs_preview
* fix docstring; test=docs_preview
* fix Tex format.
* fix api reference; test=docs_preview
* test=document_fix
* Update conv.py
* formula; test=document_fix
* test=document_fix
* formula; test=document_fix

Co-authored-by: N Ligoml <39876205+Ligoml@users.noreply.github.com>

baa98d1d · 学渣戊 · GitHub · 13c4fd59
6 changed files
--- a/python/paddle/nn/functional/common.py
+++ b/python/paddle/nn/functional/common.py
@@ -950,7 +950,7 @@ def bilinear(x1, x2, weight, bias=None, name=None):
 def dropout(
    x, p=0.5, axis=None, training=True, mode="upscale_in_train", name=None
 ):
-    """
+    r"""
    Dropout is a regularization technique for reducing overfitting by preventing
    neuron co-adaption during training. The dropout operator randomly sets the
    outputs of some units to zero, while upscale others according to the given
@@ -958,22 +958,22 @@ def dropout(
    Args:
        x (Tensor): The input tensor. The data type is float32 or float64.
-        p (float|int, optional): Probability of setting units to zero. Default 0.5.
+        p (float|int, optional): Probability of setting units to zero. Default: 0.5.
-        axis (int|list|tuple, optional): The axis along which the dropout is performed. Default None.
+        axis (int|list|tuple, optional): The axis along which the dropout is performed. Default: None.
-        training (bool, optional): A flag indicating whether it is in train phrase or not. Default True.
+        training (bool, optional): A flag indicating whether it is in train phrase or not. Default: True.
        mode(str, optional): ['upscale_in_train'(default) | 'downscale_in_infer'].
-            1. upscale_in_train(default), upscale the output at training time
+            1. upscale_in_train (default), upscale the output at training time
-                - train: out = input * mask / ( 1.0 - dropout_prob )
+                - train: :math:`out = input \times \frac{mask}{(1.0 - dropout\_prob)}`
-                - inference: out = input
+                - inference: :math:`out = input`
            2. downscale_in_infer, downscale the output at inference
-                - train: out = input * mask
+                - train: :math:`out = input \times mask`
-                - inference: out = input * (1.0 - dropout_prob)
+                - inference: :math:`out = input \times (1.0 - dropout\_prob)`
-        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
+        name (str, optional): Name for the operation, Default: None. For more information, please refer to :ref:`api_guide_Name`.
    Returns:
        A Tensor representing the dropout, has same shape and data type as `x` .
@@ -1057,8 +1057,8 @@ def dropout(
                 [0 0 0]]
                Actually this is not what we want because all elements may set to zero~
-        When x is a 4d tensor with shape `NCHW`, we can set ``axis=[0,1]`` and the dropout will be performed in channel `N` and `C`, `H` and `W` is tied, i.e. paddle.nn.dropout(x, p, axis=[0,1]) . Please refer to ``paddle.nn.functional.dropout2d`` for more details.
+        When x is a 4d tensor with shape `NCHW`, where `N` is batch size, `C` is the number of channels, H and W are the height and width of the feature, we can set ``axis=[0,1]`` and the dropout will be performed in channel `N` and `C`, `H` and `W` is tied, i.e. paddle.nn.dropout(x, p, axis=[0,1]) . Please refer to ``paddle.nn.functional.dropout2d`` for more details.
-        Similarly, when x is a 5d tensor with shape `NCDHW`, we can set ``axis=[0,1]`` to perform dropout3d. Please refer to ``paddle.nn.functional.dropout3d`` for more details.
+        Similarly, when x is a 5d tensor with shape `NCDHW`, where `D` is the depth of the feature, we can set ``axis=[0,1]`` to perform dropout3d. Please refer to ``paddle.nn.functional.dropout3d`` for more details.
        .. code-block:: python
@@ -1255,15 +1255,15 @@ def dropout2d(x, p=0.5, training=True, data_format='NCHW', name=None):
    a channel is a 2D feature map with the shape `HW` ). Each channel will be zeroed out independently
    on every forward call with probability `p` using samples from a Bernoulli distribution.
-    See ``paddle.nn.functional.dropout`` for more details.
+    See :ref:`api_paddle_nn_functional_dropout` for more details.
    Args:
        x (Tensor):  The input is 4-D Tensor with shape [N, C, H, W] or [N, H, W, C].
                     The data type is float32 or float64.
-        p (float): Probability of setting units to zero. Default 0.5.
+        p (float, optional): Probability of setting units to zero. Default: 0.5.
-        training (bool): A flag indicating whether it is in train phrase or not. Default True.
+        training (bool, optional): A flag indicating whether it is in train phrase or not. Default: True.
-        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from `NCHW` or `NHWC` . The default is `NCHW` . When it is `NCHW` , the data is stored in the order of: [batch_size, input_channels, input_height, input_width].
+        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from `NCHW` or `NHWC` . When it is `NCHW` , the data is stored in the order of: [batch_size, input_channels, input_height, input_width]. Default: `NCHW` .
-        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
+        name (str, optional): Name for the operation, Default: None. For more information, please refer to :ref:`api_guide_Name`.
    Returns:
        A Tensor representing the dropout2d, has same shape and data type as `x` .
@@ -1314,15 +1314,15 @@ def dropout3d(x, p=0.5, training=True, data_format='NCDHW', name=None):
    a channel is a 3D feature map with the shape `DHW` ). Each channel will be zeroed out independently
    on every forward call with probability `p` using samples from a Bernoulli distribution.
-    See ``paddle.nn.functional.dropout`` for more details.
+    See :ref:`api_paddle_nn_functional_dropout` for more details.
    Args:
        x (Tensor):  The input is 5-D Tensor with shape [N, C, D, H, W] or [N, D, H, W, C].
                     The data type is float32 or float64.
-        p (float): Probability of setting units to zero. Default 0.5.
+        p (float, optional): Probability of setting units to zero. Default: 0.5.
-        training (bool): A flag indicating whether it is in train phrase or not. Default True.
+        training (bool, optional): A flag indicating whether it is in train phrase or not. Default: True.
-        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from ``NCDHW`` or ``NDHWC``. The default is ``NCDHW`` . When it is ``NCDHW`` , the data is stored in the order of: [batch_size, input_channels, input_depth, input_height, input_width].
+        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from ``NCDHW`` or ``NDHWC``. When it is ``NCDHW`` , the data is stored in the order of: [batch_size, input_channels, input_depth, input_height, input_width]. Default: ``NCDHW`` .
-        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
+        name (str, optional): Name for the operation, Default: None. For more information, please refer to :ref:`api_guide_Name`.
    Returns:
        A Tensor representing the dropout3d, has same shape and data type with `x` .
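
The two `mode` formulas rewritten in the `dropout` docstring above can be checked with a small pure-Python sketch. This is not Paddle's implementation: the function name `dropout_sketch`, the flat-list input and the `seed` argument are assumptions made only for illustration.

```python
import random


def dropout_sketch(values, p=0.5, training=True, mode="upscale_in_train", seed=None):
    """Apply the documented dropout formulas to a flat list of floats.

    Hypothetical helper for illustration; it mirrors the docstring formulas,
    not the Paddle kernel.
    """
    if mode not in ("upscale_in_train", "downscale_in_infer"):
        raise ValueError(f"unknown mode: {mode}")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    if not training:
        if mode == "upscale_in_train":
            # inference: out = input
            return list(values)
        # inference: out = input * (1.0 - dropout_prob)
        return [v * (1.0 - p) for v in values]

    rng = random.Random(seed)
    # mask is 1.0 where a unit is kept and 0.0 where it is dropped
    mask = [0.0 if rng.random() < p else 1.0 for _ in values]

    if mode == "upscale_in_train":
        if p == 1.0:
            # every unit is dropped; avoid dividing by zero
            return [0.0 for _ in values]
        # train: out = input * mask / (1.0 - dropout_prob)
        return [v * m / (1.0 - p) for v, m in zip(values, mask)]
    # train: out = input * mask
    return [v * m for v, m in zip(values, mask)]
```

With `p=0.5`, each surviving unit in `upscale_in_train` mode is doubled during training, so the expected activation matches the unscaled inference output; `downscale_in_infer` instead leaves training values untouched and scales at inference.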

--- a/python/paddle/nn/functional/conv.py
+++ b/python/paddle/nn/functional/conv.py
@@ -852,7 +852,7 @@ def conv1d_transpose(
        .. math::
-           L^\prime_{out} &= (L_{in} - 1) * stride - pad_top - pad_bottom + dilation * (L_f - 1) + 1 + output_padding \\\\
+           L^\prime_{out} &= (L_{in} - 1) * stride - 2 * padding + dilation * (L_f - 1) + 1 \\
           L_{out} &\in [ L^\prime_{out}, L^\prime_{out} + stride ]
    Note:
@@ -1157,9 +1157,9 @@ def conv2d_transpose(
        ..  math::
-           H^\prime_{out} &= (H_{in} - 1) * strides[0] - pad_height_top - pad_height_bottom + dilations[0] * (H_f - 1) + 1 \\\\
+           H^\prime_{out} &= (H_{in} - 1) * strides[0] - 2 * paddings[0] + dilations[0] * (H_f - 1) + 1 \\
-           W^\prime_{out} &= (W_{in} - 1) * strides[1] - pad_width_left - pad_width_right + dilations[1] * (W_f - 1) + 1 \\\\
+           W^\prime_{out} &= (W_{in} - 1) * strides[1] - 2 * paddings[1] + dilations[1] * (W_f - 1) + 1 \\
-           H_{out} &\in [ H^\prime_{out}, H^\prime_{out} + strides[0] ] \\\\
+           H_{out} &\in [ H^\prime_{out}, H^\prime_{out} + strides[0] ] \\
           W_{out} &\in [ W^\prime_{out}, W^\prime_{out} + strides[1] ]
    Note:
@@ -1460,9 +1460,9 @@ def conv3d(
        ..  math::
-            D_{out}&= \\frac{(D_{in} + 2 * paddings[0] - (dilations[0] * (D_f - 1) + 1))}{strides[0]} + 1 \\\\
+            D_{out}&= \frac{(D_{in} + 2 * paddings[0] - (dilations[0] * (D_f - 1) + 1))}{strides[0]} + 1 \\
-            H_{out}&= \\frac{(H_{in} + 2 * paddings[1] - (dilations[1] * (H_f - 1) + 1))}{strides[1]} + 1 \\\\
+            H_{out}&= \frac{(H_{in} + 2 * paddings[1] - (dilations[1] * (H_f - 1) + 1))}{strides[1]} + 1 \\
-            W_{out}&= \\frac{(W_{in} + 2 * paddings[2] - (dilations[2] * (W_f - 1) + 1))}{strides[2]} + 1
+            W_{out}&= \frac{(W_{in} + 2 * paddings[2] - (dilations[2] * (W_f - 1) + 1))}{strides[2]} + 1
    Args:
        x (Tensor): The input is 5-D Tensor with shape [N, C, D, H, W], the data
@@ -1630,10 +1630,10 @@ def conv3d_transpose(
    In the above equation:
    * :math:`X`: Input value, a Tensor with NCDHW or NDHWC format.
-    * :math:`W`: Filter value, a Tensor with MCDHW format.
+    * :math:`W`: Filter value, a Tensor with NCDHW format.
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 2-D Tensor with shape [M, 1].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, the shape of :math:`Out` and :math:`X` may be different.
    Example:
@@ -1652,36 +1652,36 @@ def conv3d_transpose(
        ..  math::
-           D^\prime_{out} &= (D_{in} - 1) * strides[0] - 2 * paddings[0] + dilations[0] * (D_f - 1) + 1 \\\\
+           D^\prime_{out} &= (D_{in} - 1) * strides[0] - 2 * paddings[0] + dilations[0] * (D_f - 1) + 1 \\
-           H^\prime_{out} &= (H_{in} - 1) * strides[1] - 2 * paddings[1] + dilations[1] * (H_f - 1) + 1 \\\\
+           H^\prime_{out} &= (H_{in} - 1) * strides[1] - 2 * paddings[1] + dilations[1] * (H_f - 1) + 1 \\
-           W^\prime_{out} &= (W_{in} - 1) * strides[2] - 2 * paddings[2] + dilations[2] * (W_f - 1) + 1 \\\\
+           W^\prime_{out} &= (W_{in} - 1) * strides[2] - 2 * paddings[2] + dilations[2] * (W_f - 1) + 1 \\
-           D_{out} &\in [ D^\prime_{out}, D^\prime_{out} + strides[0] ] \\\\
+           D_{out} &\in [ D^\prime_{out}, D^\prime_{out} + strides[0] ] \\
-           H_{out} &\in [ H^\prime_{out}, H^\prime_{out} + strides[1] ] \\\\
+           H_{out} &\in [ H^\prime_{out}, H^\prime_{out} + strides[1] ] \\
           W_{out} &\in [ W^\prime_{out}, W^\prime_{out} + strides[2] ]
    Note:
-          The conv3d_transpose can be seen as the backward of the conv3d. For conv3d,
+        The conv3d_transpose can be seen as the backward of the conv3d. For conv3d,
-          when stride > 1, conv3d maps multiple input shape to the same output shape,
+        when stride > 1, conv3d maps multiple input shape to the same output shape,
-          so for conv3d_transpose, when stride > 1, input shape maps multiple output shape.
+        so for conv3d_transpose, when stride > 1, input shape maps multiple output shape.
-          If output_size is None, :math:`H_{out} = H^\prime_{out}, :math:`H_{out} = \
+        If output_size is None, :math:`H_{out} = H^\prime_{out}, W_{out} = W^\prime_{out}`;
-          H^\prime_{out}, W_{out} = W^\prime_{out}`; else, the :math:`D_{out}` of the output
+        else, the :math:`D_{out}` of the output size must between :math:`D^\prime_{out}` and
-          size must between :math:`D^\prime_{out}` and :math:`D^\prime_{out} + strides[0]`,
+        :math:`D^\prime_{out} + strides[0]`, the :math:`H_{out}` of the output size must
-          the :math:`H_{out}` of the output size must between :math:`H^\prime_{out}`
+        between :math:`H^\prime_{out}` and :math:`H^\prime_{out} + strides[1]`, and the
-          and :math:`H^\prime_{out} + strides[1]`, and the :math:`W_{out}` of the output size must
+        :math:`W_{out}` of the output size must between :math:`W^\prime_{out}` and
-          between :math:`W^\prime_{out}` and :math:`W^\prime_{out} + strides[2]`.
+        :math:`W^\prime_{out} + strides[2]`.
    Args:
-        x(Tensor): The input is 5-D Tensor with shape [N, C, D, H, W] or [N, D, H, W, C], the data type
+        x (Tensor): The input is 5-D Tensor with shape [N, C, D, H, W] or [N, D, H, W, C], the data type
            of input is float32 or float64.
        weight (Tensor): The convolution kernel, a Tensor with shape [C, M/g, kD, kH, kW],
-            where M is the number of filters(output channels), g is the number of groups,
+            where M is the number of filters (output channels), g is the number of groups,
            kD, kH, kW are the filter's depth, height and width respectively.
-        bias (Tensor, optional): The bias, a Tensor of shape [M, ].
+        bias (Tensor, optional): The bias, a Tensor of shape [M, ]. Default: None.
-        stride(int|list|tuple, optional): The stride size. It means the stride in transposed convolution.
+        stride (int|list|tuple, optional): The stride size. It means the stride in transposed convolution.
            If stride is a list/tuple, it must contain three integers, (stride_depth, stride_height,
            stride_width). Otherwise, stride_depth = stride_height = stride_width = stride.
-            Default: stride = 1.
+            Default: 1.
-        padding (string|int|list|tuple, optional): The padding size. It means the number of zero-paddings
+        padding (str|int|list|tuple, optional): The padding size. It means the number of zero-paddings
            on both sides for each dimension. If `padding` is a string, either 'VALID' or
            'SAME' which is the padding algorithm. If padding size is a tuple or list,
            it could be in three forms: `[pad_depth, pad_height, pad_width]` or
@@ -1690,29 +1690,29 @@ def conv3d_transpose(
            `[[0,0], [0,0], [pad_depth_front, pad_depth_back], [pad_height_top, pad_height_bottom], [pad_width_left, pad_width_right]]`.
            when `data_format` is `"NDHWC"`, `padding` can be in the form
            `[[0,0], [pad_depth_front, pad_depth_back], [pad_height_top, pad_height_bottom], [pad_width_left, pad_width_right], [0,0]]`.
-            Default: padding = 0.
+            Default: 0.
-        output_padding(int|list|tuple, optional): Additional size added to one side
+        output_padding (int|list|tuple, optional): Additional size added to one side
            of each dimension in the output shape. Default: 0.
-        groups(int, optional): The groups number of the Conv3D transpose layer. Inspired by
+        groups (int, optional): The groups number of the Conv3D transpose layer. Inspired by
-            grouped convolution in Alex Krizhevsky's Deep CNN paper, in which
+            grouped convolution in `Alex Krizhevsky's Deep CNN paper <https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf>`_, in which
-            when group=2, the first half of the filters is only connected to the
+            when groups = 2, the first half of the filters is only connected to the
            first half of the input channels, while the second half of the
            filters is only connected to the second half of the input channels.
-            Default: groups=1
+            Default: 1.
-        dilation(int|list|tuple, optional): The dilation size. It means the spacing between the kernel points.
+        dilation (int|list|tuple, optional): The dilation size. It means the spacing between the kernel points.
            If dilation is a list/tuple, it must contain three integers, (dilation_depth, dilation_height,
            dilation_width). Otherwise, dilation_depth = dilation_height = dilation_width = dilation.
-            Default: dilation = 1.
+            Default: 1.
-        output_size(int|list|tuple, optional): The output image size. If output size is a
+        output_size (int|list|tuple, optional): The output image size. If output size is a
            list/tuple, it must contain three integers, (image_depth, image_height, image_width).
            None if use filter_size(shape of weight), padding, and stride to calculate output_size.
        data_format (str, optional): Specify the data format of the input, and the data format of the output
            will be consistent with that of the input. An optional string from: `"NCHW"`, `"NHWC"`.
-            The default is `"NCHW"`. When it is `"NCHW"`, the data is stored in the order of:
+            When it is `"NCHW"`, the data is stored in the order of: `[batch_size, input_channels, input_height, input_width]`.
-            `[batch_size, input_channels, input_height, input_width]`.
+            Default: `"NCHW"`.
-        name(str, optional): For detailed information, please refer
+        name (str, optional): For detailed information, please refer
-           to :ref:`api_guide_Name`. Usually name is no need to set and
+           to :ref:`api_guide_Name`. Usually name is no need to set.
-           None by default.
+           Default: None.
    Returns:
        A Tensor representing the conv3d_transpose, whose data
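
The output-size formulas corrected in this file, and the Note's remark that one input shape maps to several output shapes when stride > 1, can be illustrated per spatial dimension. The helper names below are hypothetical, and the floor division in `conv_out_size` is an assumption about how the fraction in the docstring is rounded.

```python
def conv_out_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a regular convolution along one dimension.

    Assumes the fraction in the docstring formula is rounded down.
    """
    return (in_size + 2 * padding - (dilation * (kernel - 1) + 1)) // stride + 1


def conv_transpose_out_range(in_size, kernel, stride=1, padding=0, dilation=1):
    """Return (L'_out, L'_out + stride) for a transposed convolution,
    following the bounds written in the docstring."""
    base = (in_size - 1) * stride - 2 * padding + dilation * (kernel - 1) + 1
    return base, base + stride


# Two different input lengths collapse to the same conv output when stride=2 ...
same_a = conv_out_size(7, kernel=3, stride=2)
same_b = conv_out_size(8, kernel=3, stride=2)

# ... so the transposed conv of that output has a range of valid lengths,
# which is what output_size is used to disambiguate.
low, high = conv_transpose_out_range(same_a, kernel=3, stride=2)
```

Here both input lengths 7 and 8 give a convolution output of 3, and the transposed range computed from 3 contains both 7 and 8.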

--- a/python/paddle/nn/functional/loss.py
+++ b/python/paddle/nn/functional/loss.py
@@ -1755,9 +1755,9 @@ def ctc_loss(
        labels (Tensor): The ground truth sequence with padding, which must be a 3-D Tensor. The tensor shape is [batch_size, max_label_length], where max_label_length is the longest length of label sequence. The data type must be int32.
        input_lengths (Tensor): The length for each input sequence, it should have shape [batch_size] and dtype int64.
        label_lengths (Tensor): The length for each label sequence, it should have shape [batch_size] and dtype int64.
-        blank (int, optional): The blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-opened interval [0, num_classes + 1). The data type must be int32. Default is 0.
+        blank (int, optional): The blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-opened interval [0, num_classes + 1). The data type must be int32. Default: 0.
-        reduction (string, optional): Indicate how to average the loss, the candicates are ``'none'`` | ``'mean'`` | ``'sum'``. If :attr:`reduction` is ``'mean'``, the output loss will be divided by the label_lengths, and then return the mean of quotient; If :attr:`reduction` is ``'sum'``, return the sum of loss; If :attr:`reduction` is ``'none'``, no reduction will be applied. Default is ``'mean'``.
+        reduction (str, optional): Indicate how to average the loss, the candicates are ``'none'`` | ``'mean'`` | ``'sum'``. If :attr:`reduction` is ``'mean'``, the output loss will be divided by the label_lengths, and then return the mean of quotient; If :attr:`reduction` is ``'sum'``, return the sum of loss; If :attr:`reduction` is ``'none'``, no reduction will be applied. Default: ``'mean'``.
-        norm_by_times (bool, default False): Whether to normalize the gradients by the number of time-step, which is also the sequence’s length. There is no need to normalize the gradients if reduction mode is 'mean'.
+        norm_by_times (bool, optional): Whether to normalize the gradients by the number of time-step, which is also the sequence's length. There is no need to normalize the gradients if reduction mode is 'mean'. Default: False.
    Returns:
        Tensor, The Connectionist Temporal Classification (CTC) loss between ``log_probs`` and  ``labels``. If attr:`reduction` is ``'none'``, the shape of loss is [batch_size], otherwise, the shape of loss is [1]. Data type is the same as ``log_probs``.
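
The three `reduction` options described for `ctc_loss` above can be sketched on plain Python lists, assuming the per-sample losses have already been computed. The function name `reduce_ctc_loss` is hypothetical and this is not Paddle's code path.

```python
def reduce_ctc_loss(per_sample_loss, label_lengths, reduction="mean"):
    """Reduce per-sample CTC losses as the docstring describes.

    Hypothetical helper operating on plain lists for illustration.
    """
    if reduction == "none":
        # no reduction: one loss value per batch element
        return list(per_sample_loss)
    if reduction == "sum":
        return sum(per_sample_loss)
    if reduction == "mean":
        # divide each loss by its label length, then average the quotients
        quotients = [loss / n for loss, n in zip(per_sample_loss, label_lengths)]
        return sum(quotients) / len(quotients)
    raise ValueError(f"unknown reduction: {reduction}")
```

Note that `'mean'` is not a plain average of the losses: each loss is first normalized by its own label length.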

--- a/python/paddle/nn/layer/common.py
+++ b/python/paddle/nn/layer/common.py
@@ -698,32 +698,32 @@ class Bilinear(Layer):
 class Dropout(Layer):
-    """
+    r"""
    Dropout is a regularization technique for reducing overfitting by preventing
    neuron co-adaption during training as described in the paper:
    `Improving neural networks by preventing co-adaptation of feature detectors <https://arxiv.org/abs/1207.0580>`_
    The dropout operator randomly sets the outputs of some units to zero, while upscale others
    according to the given dropout probability.
-    See ``paddle.nn.functional.dropout`` for more details.
+    See :ref:`api_paddle_nn_functional_dropout` for more details.
    In dygraph mode, please use ``eval()`` to switch to evaluation mode, where dropout is disabled.
    Parameters:
-        p (float|int): Probability of setting units to zero. Default: 0.5
+        p (float|int, optional): Probability of setting units to zero. Default: 0.5
-        axis (int|list|tuple): The axis along which the dropout is performed. Default None.
+        axis (int|list|tuple, optional): The axis along which the dropout is performed. Default: None.
        mode(str, optional): ['upscale_in_train'(default) | 'downscale_in_infer']
-                               1. upscale_in_train(default), upscale the output at training time
+                               1. upscale_in_train (default), upscale the output at training time
-                                  - train: out = input * mask / ( 1.0 - p )
+                                  - train: :math:`out = input \times \frac{mask}{(1.0 - p)}`
-                                  - inference: out = input
+                                  - inference: :math:`out = input`
                               2. downscale_in_infer, downscale the output at inference
-                                  - train: out = input * mask
+                                  - train: :math:`out = input \times mask`
-                                  - inference: out = input * (1.0 - p)
+                                  - inference: :math:`out = input \times (1.0 - p)`
-        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
+        name (str, optional): Name for the operation, Default: None. For more information, please refer to :ref:`api_guide_Name`.
    Shape:
        - input: N-D tensor.
@@ -786,14 +786,14 @@ class Dropout2D(Layer):
    Dropout2D will help promote independence between feature maps as described in the paper:
    `Efficient Object Localization Using Convolutional Networks <https://arxiv.org/abs/1411.4280>`_
-    See ``paddle.nn.functional.dropout2d`` for more details.
+    See :ref:`api_paddle_nn_functional_dropout2d` for more details.
    In dygraph mode, please use ``eval()`` to switch to evaluation mode, where dropout is disabled.
    Parameters:
-        p (float, optional): Probability of setting units to zero. Default: 0.5
+        p (float, optional): Probability of setting units to zero. Default: 0.5.
-        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from `NCHW` or `NHWC`. The default is `NCHW`. When it is `NCHW`, the data is stored in the order of: [batch_size, input_channels, input_height, input_width].
+        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from `NCHW` or `NHWC`. When it is `NCHW`, the data is stored in the order of: [batch_size, input_channels, input_height, input_width]. Default: `NCHW`.
-        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
+        name (str, optional): Name for the operation, Default: None. For more information, please refer to :ref:`api_guide_Name`.
    Shape:
        - input: 4-D tensor.
@@ -867,14 +867,14 @@ class Dropout3D(Layer):
    Dropout3D will help promote independence between feature maps as described in the paper:
    `Efficient Object Localization Using Convolutional Networks <https://arxiv.org/abs/1411.4280>`_
-    See ``paddle.nn.functional.dropout3d`` for more details.
+    See :ref:`api_paddle_nn_functional_dropout3d` for more details.
    In dygraph mode, please use ``eval()`` to switch to evaluation mode, where dropout is disabled.
    Parameters:
-        p (float | int): Probability of setting units to zero. Default: 0.5
+        p (float | int, optional): Probability of setting units to zero. Default: 0.5.
-        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from `NCDHW` or `NDHWC`. The default is `NCDHW`. When it is `NCDHW`, the data is stored in the order of: [batch_size, input_channels, input_depth, input_height, input_width].
+        data_format (str, optional): Specify the data format of the input, and the data format of the output will be consistent with that of the input. An optional string from `NCDHW` or `NDHWC`. When it is `NCDHW`, the data is stored in the order of: [batch_size, input_channels, input_depth, input_height, input_width]. Default: `NCDHW`.
-        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
+        name (str, optional): Name for the operation, Default: None. For more information, please refer to :ref:`api_guide_Name`.
    Shape:
        - input: 5-D tensor.

--- a/python/paddle/nn/layer/conv.py
+++ b/python/paddle/nn/layer/conv.py
@@ -237,9 +237,9 @@ class Conv1D(_ConvNd):
    * :math:`X`: Input value, a ``Tensor`` with 'NCL' format or 'NLC' format.
    * :math:`W`: Filter value, a ``Tensor`` with shape [MCK] .
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 1-D ``Tensor`` with shape [M].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, the shape of :math:`Out` and :math:`X` may be different.
    Example:
@@ -258,7 +258,7 @@ class Conv1D(_ConvNd):
        .. math::
-            L_{out}&= \frac{(L_{in} + 2 * padding - (dilation * (L_f - 1) + 1))}{stride} + 1 \\
+            L_{out}&= \frac{(L_{in} + 2 * padding - (dilation * (L_f - 1) + 1))}{stride} + 1
    Parameters:
        in_channels(int): The number of channels in the input image.
@@ -413,9 +413,9 @@ class Conv1DTranspose(_ConvNd):
    * :math:`X`: Input value, a 3-D Tensor with 'NCL' format or 'NLC' format.
    * :math:`W`: Kernel value, a 3-D Tensor with 'MCK' format.
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 2-D Tensor with shape [M, 1].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, a 3-D Tensor with data format 'NCL' of 'NLC', the shape of :math:`Out` and :math:`X` may be different.
    Example:
@@ -434,7 +434,7 @@ class Conv1DTranspose(_ConvNd):
        .. math::
-           L^\prime_{out} &= (L_{in} - 1) * stride - pad_top - pad_bottom + dilation * (L_f - 1) + 1 \\\\
+           L^\prime_{out} &= (L_{in} - 1) * stride - 2 * padding + dilation * (L_f - 1) + 1 \\
           L_{out} &\in [ L^\prime_{out}, L^\prime_{out} + stride ]
    Note:
@@ -590,9 +590,9 @@ class Conv2D(_ConvNd):
    * :math:`X`: Input value, a ``Tensor`` with NCHW format.
    * :math:`W`: Filter value, a ``Tensor`` with shape [MCHW] .
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 1-D ``Tensor`` with shape [M].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, the shape of :math:`Out` and :math:`X` may be different.
    Parameters:
@@ -755,9 +755,9 @@ class Conv2DTranspose(_ConvNd):
    * :math:`X`: Input value, a ``Tensor`` with NCHW format.
    * :math:`W`: Filter value, a ``Tensor`` with shape [CMHW] .
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 1-D ``Tensor`` with shape [M].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, the shape of :math:`Out` and :math:`X` may be different.
    Parameters:
@@ -917,9 +917,9 @@ class Conv3D(_ConvNd):
    * :math:`X`: Input value, a tensor with NCDHW or NDHWC format.
    * :math:`W`: Filter value, a tensor with MCDHW format.
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 1-D tensor with shape [M].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, the shape of :math:`Out` and :math:`X` may be different.
    Parameters:
@@ -1081,23 +1081,21 @@ class Conv3DTranspose(_ConvNd):
    * :math:`X`: Input value, a tensor with NCDHW format.
    * :math:`W`: Filter value, a tensor with CMDHW format.
-    * :math:`\\ast`: Convolution operation.
+    * :math:`\ast`: Convolution operation.
    * :math:`b`: Bias value, a 1-D tensor with shape [M].
-    * :math:`\\sigma`: Activation function.
+    * :math:`\sigma`: Activation function.
    * :math:`Out`: Output value, the shape of :math:`Out` and :math:`X` may be different.
-    **Note**:
+    .. note::
+        The conv3d_transpose can be seen as the backward of the conv3d. For conv3d,
-          The conv3d_transpose can be seen as the backward of the conv3d. For conv3d,
+        when stride > 1, conv3d maps multiple input shape to the same output shape,
-          when stride > 1, conv3d maps multiple input shape to the same output shape,
+        so for conv3d_transpose, when stride > 1, input shape maps multiple output shape.
-          so for conv3d_transpose, when stride > 1, input shape maps multiple output shape.
+        If output_size is None, :math:`H_{out} = H^\prime_{out}, W_{out} = W^\prime_{out}`;
-          If output_size is None, :math:`H_{out} = H^\prime_{out}, :math:`H_{out} = \
+        else, the :math:`D_{out}` of the output size must be between :math:`D^\prime_{out}`
-          H^\prime_{out}, W_{out} = W^\prime_{out}`; else, the :math:`D_{out}` of the output
+        and :math:`D^\prime_{out} + strides[0]`, the :math:`H_{out}` of the output size must be
-          size must between :math:`D^\prime_{out}` and :math:`D^\prime_{out} + strides[0]`,
+        between :math:`H^\prime_{out}` and :math:`H^\prime_{out} + strides[1]`, and the
-          the :math:`H_{out}` of the output size must between :math:`H^\prime_{out}`
+        :math:`W_{out}` of the output size must be between :math:`W^\prime_{out}` and
-          and :math:`H^\prime_{out} + strides[1]`, and the :math:`W_{out}` of the output size must
+        :math:`W^\prime_{out} + strides[2]`, conv3d_transpose can compute the kernel size automatically.
-          between :math:`W^\prime_{out}` and :math:`W^\prime_{out} + strides[2]`,
-          conv3d_transpose can compute the kernel size automatically.
    Parameters:
        in_channels(int): The number of channels in the input image.
@@ -1108,34 +1106,34 @@ class Conv3DTranspose(_ConvNd):
        stride(int|list|tuple, optional): The stride size. It means the stride in transposed convolution.
            If stride is a list/tuple, it must contain three integers, (stride_depth, stride_height,
            stride_width). Otherwise, stride_depth = stride_height = stride_width = stride.
-            The default value is 1.
+            Default: 1.
        padding(int|str|tuple|list, optional): The padding size. Padding could be in one of the following forms.
            1. a string in ['valid', 'same'].
            2. an int, which means each spatial dimension (depth, height, width) is zero padded by size of `padding`.
            3. a list[int] or tuple[int] whose length is the number of spatial dimensions, which contains the amount of padding on each side for each spatial dimension. It has the form [pad_d1, pad_d2, ...].
            4. a list[int] or tuple[int] whose length is 2 * number of spatial dimensions. It has the form [pad_before, pad_after, pad_before, pad_after, ...] for all spatial dimensions.
            5. a list or tuple of pairs of ints. It has the form [[pad_before, pad_after], [pad_before, pad_after], ...]. Note that the batch dimension and channel dimension are also included. Each pair of integers corresponds to the amount of padding for a dimension of the input. Padding in batch dimension and channel dimension should be [0, 0] or (0, 0).
-            The default value is 0.
+            Default: 0.
        output_padding(int|list|tuple, optional): Additional size added to one side
            of each dimension in the output shape. Default: 0.
        dilation(int|list|tuple, optional): The dilation size. If dilation is a list/tuple, it must
            contain three integers, (dilation_D, dilation_H, dilation_W). Otherwise, the
-            dilation_D = dilation_H = dilation_W = dilation. The default value is 1.
+            dilation_D = dilation_H = dilation_W = dilation. Default: 1.
        groups(int, optional): The groups number of the Conv3D transpose layer. Inspired by
-            grouped convolution in Alex Krizhevsky's Deep CNN paper, in which
+            grouped convolution in `Alex Krizhevsky's Deep CNN paper <https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf>`_, in which
-            when group=2, the first half of the filters is only connected to the
+            when groups = 2, the first half of the filters is only connected to the
            first half of the input channels, while the second half of the
            filters is only connected to the second half of the input channels.
-            The default value is 1.
+            Default: 1.
        weight_attr(ParamAttr, optional): The parameter attribute for learnable parameters/weights
            of conv3d_transpose. If it is set to None or one attribute of ParamAttr, conv3d_transpose
            will create ParamAttr as param_attr. If the Initializer of the param_attr
-            is not set, the parameter is initialized with Xavier. The default value is None.
+            is not set, the parameter is initialized with Xavier. Default: None.
        bias_attr(ParamAttr|bool, optional): The parameter attribute for the bias of conv3d_transpose.
            If it is set to False, no bias will be added to the output units.
            If it is set to None or one attribute of ParamAttr, conv3d_transpose
            will create ParamAttr as bias_attr. If the Initializer of the bias_attr
-            is not set, the bias is initialized zero. The default value is None.
+            is not set, the bias is initialized zero. Default: None.
        data_format(str, optional): Data format that specifies the layout of input.
            It can be "NCDHW" or "NDHWC". Default: "NCDHW".
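
The note in the Conv3DTranspose hunk above says that, with stride > 1, several input sizes of a forward convolution map to one output size, so the transposed layer has a range of valid output sizes. A minimal plain-Python sketch of that relationship along one spatial axis follows. The function names are hypothetical, and the two size formulas are the standard convolution arithmetic, assumed here rather than taken from this diff.

```python
def conv_output_extent(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output extent of a forward convolution along one axis (assumed formula)."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def transposed_output_extent(in_size, kernel, stride=1, padding=0, dilation=1):
    """Smallest output extent of the transposed convolution along one axis.

    This plays the role of D'_out (or H'_out, W'_out) in the note.
    """
    return (in_size - 1) * stride - 2 * padding + dilation * (kernel - 1) + 1


# With stride 2, forward inputs of size 7 and 8 both produce size 4 ...
print(conv_output_extent(7, kernel=3, stride=2, padding=1))  # 4
print(conv_output_extent(8, kernel=3, stride=2, padding=1))  # 4

# ... so going backward from size 4, the minimal output size is 7, and
# output_size is what selects a larger value from the allowed range.
print(transposed_output_extent(4, kernel=3, stride=2, padding=1))  # 7
```

The same arithmetic applies independently to depth, height, and width, each with its own stride entry.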

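The recurring change from `"""` to `r"""` together with `\\ast` to `\ast` keeps the docstring text identical while making the TeX readable in source. A small sketch of why the raw prefix matters, using only standard Python string semantics (the variable names are illustrative):

```python
# Old style: normal string, backslash doubled by hand.
escaped = "* :math:`\\ast`: Convolution operation."

# New style: raw string, TeX written as-is.
raw = r"* :math:`\ast`: Convolution operation."

# Both spellings produce exactly the same docstring text.
print(escaped == raw)  # True

# Without the raw prefix, "\a" is interpreted as the BEL control
# character, so the TeX command is silently corrupted.
broken = "* :math:`\ast`: Convolution operation."
print("\x07" in broken)  # True
print(broken == raw)     # False
```
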
--- a/python/paddle/nn/layer/loss.py
+++ b/python/paddle/nn/layer/loss.py
@@ -1026,7 +1026,7 @@ class MarginRankingLoss(Layer):
 class CTCLoss(Layer):
-    """
+    r"""
    An operator integrating the open source Warp-CTC library (https://github.com/baidu-research/warp-ctc)
    to compute Connectionist Temporal Classification (CTC) loss.
@@ -1038,11 +1038,11 @@ class CTCLoss(Layer):
        reduction (string, optional): Indicate how to average the loss, the candidates are ``'none'`` | ``'mean'`` | ``'sum'``. If :attr:`reduction` is ``'mean'``, the output loss will be divided by the label_lengths, and then the mean of the quotients is returned; If :attr:`reduction` is ``'sum'``, return the sum of loss; If :attr:`reduction` is ``'none'``, no reduction will be applied. Default is ``'mean'``.
    Shape:
-        log_probs (Tensor): The unscaled probability sequence with padding, which is a 3-D Tensor. The tensor shape is [max_logit_length, batch_size, num_classes + 1], where max_logit_length is the longest length of input logit sequence. The data type should be float32 or float64.
+        - log_probs (Tensor): The unscaled probability sequence with padding, which is a 3-D Tensor. The tensor shape is [max_logit_length, batch_size, num_classes + 1], where max_logit_length is the longest length of input logit sequence. The data type should be float32 or float64.
-        labels (Tensor): The ground truth sequence with padding, which must be a 3-D Tensor. The tensor shape is [batch_size, max_label_length], where max_label_length is the longest length of label sequence. The data type must be int32.
+        - labels (Tensor): The ground truth sequence with padding, which must be a 2-D Tensor. The tensor shape is [batch_size, max_label_length], where max_label_length is the longest length of label sequence. The data type must be int32.
-        input_lengths (Tensor): The length for each input sequence, it should have shape [batch_size] and dtype int64.
+        - input_lengths (Tensor): The length for each input sequence, it should have shape [batch_size] and dtype int64.
-        label_lengths (Tensor): The length for each label sequence, it should have shape [batch_size] and dtype int64.
+        - label_lengths (Tensor): The length for each label sequence, it should have shape [batch_size] and dtype int64.
-        norm_by_times (bool, default false) – Whether to normalize the gradients by the number of time-step, which is also the sequence’s length. There is no need to normalize the gradients if reduction mode is 'mean'.
+        - norm_by_times (bool, optional): Whether to normalize the gradients by the number of time-step, which is also the sequence's length. There is no need to normalize the gradients if reduction mode is 'mean'. Default: False.
    Returns:
        Tensor, The Connectionist Temporal Classification (CTC) loss between ``log_probs`` and ``labels``. If :attr:`reduction` is ``'none'``, the shape of loss is [batch_size], otherwise, the shape of loss is [1]. Data type is the same as ``log_probs``.
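
The three `reduction` modes described in the CTCLoss docstring above can be illustrated with a plain-Python sketch. This is not Paddle's implementation: `reduce_ctc_loss` is a hypothetical helper that operates on already-computed per-sample losses and only mirrors the wording of the docstring.

```python
def reduce_ctc_loss(per_sample_loss, label_lengths, reduction="mean"):
    """Apply the reduction described in the docstring to per-sample losses."""
    if reduction == "none":
        # No reduction: one loss value per batch element.
        return list(per_sample_loss)
    if reduction == "sum":
        return sum(per_sample_loss)
    if reduction == "mean":
        # Divide each loss by its label length, then average the quotients.
        quotients = [
            loss / length
            for loss, length in zip(per_sample_loss, label_lengths)
        ]
        return sum(quotients) / len(quotients)
    raise ValueError(f"unsupported reduction: {reduction!r}")


losses = [6.0, 4.0]   # hypothetical per-sample CTC losses
lengths = [3, 2]      # matching label lengths

print(reduce_ctc_loss(losses, lengths, "none"))  # [6.0, 4.0]
print(reduce_ctc_loss(losses, lengths, "sum"))   # 10.0
print(reduce_ctc_loss(losses, lengths, "mean"))  # 2.0
```

This matches the shapes stated under Returns: `'none'` keeps one value per batch element, while `'sum'` and `'mean'` collapse to a single value.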