test=develop (#2127)

da19eb33 · swtkiwi · 1695d88b
572 changed files
--- a/doc/fluid/api_cn/backward_cn/append_backward_cn.rst
+++ b/doc/fluid/api_cn/backward_cn/append_backward_cn.rst
@@ -3,10 +3,13 @@
 append_backward
 -------------------------------
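-**Note: this API only supports static graph mode**
 .. py:function:: paddle.fluid.backward.append_backward(loss, parameter_list=None, no_grad_set=None, callbacks=None)
+:api_attr: Declarative programming mode (static graph)
 This API appends the backward part to the main program (``main_program``).
 Complete neural network training consists of forward and backward propagation. However, when configuring a network, we only need to specify its forward part.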

--- a/doc/fluid/api_cn/backward_cn/gradients_cn.rst
+++ b/doc/fluid/api_cn/backward_cn/gradients_cn.rst
@@ -3,10 +3,13 @@
 gradients
 -------------------------------
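-**Note: this API only supports static graph mode**
 .. py:function:: paddle.fluid.backward.gradients(targets, inputs, target_gradients=None, no_grad_set=None)
+:api_attr: Declarative programming mode (static graph)
 Backpropagates the gradients of the targets to the inputs.
 Parameters: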

--- a/doc/fluid/api_cn/clip_cn/ErrorClipByValue_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/ErrorClipByValue_cn.rst
 .. _cn_api_fluid_clip_ErrorClipByValue:
 ErrorClipByValue
 -------------------------------
 .. py:class:: paddle.fluid.clip.ErrorClipByValue(max, min=None)
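-Given a Tensor ``t`` (see the code example for how it is passed in), every element of the Tensor that falls outside the range [min, max] defined by the upper bound ``max`` and the lower bound ``min`` is reset to the bound it exceeds.
- Any value less than ``min`` is set to ``min``
+Given a Tensor ``t`` (see the code example for how it is passed in), every element of the Tensor that falls outside the range [min, max] defined by the upper bound ``max`` and the lower bound ``min`` is reset to the bound it exceeds.
- Any value greater than ``max`` is set to ``max``
+- Any value less than ``min`` is set to ``min``
-Parameters:
+- Any value greater than ``max`` is set to ``max``
- - **max** (float) - The maximum value to clip to.
- - **min** (float) - The minimum value to clip to. If not set by the user, the framework sets it to ``-max`` by default.
+Parameters:
+ - **max** (float) - The maximum value to clip to.
-**Code Example**
+ - **min** (float) - The minimum value to clip to. If not set by the user, the framework sets it to ``-max`` by default.
-.. code-block:: python
+**Code Example**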
-     import paddle.fluid as fluid
+.. code-block:: python
-     BATCH_SIZE = 128
-     CLIP_MAX = 2e-6
+     import paddle.fluid as fluid
-     CLIP_MIN = -1e-6
-     prog = fluid.framework.Program()
+     BATCH_SIZE = 128
+     CLIP_MAX = 2e-6
-     with fluid.program_guard(main_program=prog):
+     CLIP_MIN = -1e-6
-         image = fluid.layers.data(name='x', shape=[784], dtype='float32')
+     prog = fluid.framework.Program()
-         hidden1 = fluid.layers.fc(input=image, size=128, act='relu')
-         hidden2 = fluid.layers.fc(input=hidden1, size=64, act='relu')
+     with fluid.program_guard(main_program=prog):
-         predict = fluid.layers.fc(input=hidden2, size=10, act='softmax')
+         image = fluid.layers.data(name='x', shape=[784], dtype='float32')
-         label = fluid.layers.data(name='y', shape=[1], dtype='int64')
+         hidden1 = fluid.layers.fc(input=image, size=128, act='relu')
-         cost = fluid.layers.cross_entropy(input=predict, label=label)
+         hidden2 = fluid.layers.fc(input=hidden1, size=64, act='relu')
-         avg_cost = fluid.layers.mean(cost)
+         predict = fluid.layers.fc(input=hidden2, size=10, act='softmax')
-     prog_clip = prog.clone()
+         label = fluid.layers.data(name='y', shape=[1], dtype='int64')
-     prog_clip.block(0).var(hidden1.name)._set_error_clip(
+         cost = fluid.layers.cross_entropy(input=predict, label=label)
-         fluid.clip.ErrorClipByValue(max=CLIP_MAX, min=CLIP_MIN))
+         avg_cost = fluid.layers.mean(cost)
+     prog_clip = prog.clone()
+     prog_clip.block(0).var(hidden1.name)._set_error_clip(
+         fluid.clip.ErrorClipByValue(max=CLIP_MAX, min=CLIP_MIN))
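The clamping rule described for ``ErrorClipByValue`` can be illustrated outside Paddle with a minimal NumPy sketch. This is not Paddle's implementation. ``clip_by_value`` is a hypothetical helper, and NumPy's ``np.clip`` stands in for the framework's own clipping of the error (gradient) Tensor.

```python
import numpy as np


def clip_by_value(t, max_value, min_value=None):
    """Clamp every element of t into [min_value, max_value].

    Hypothetical helper: if min_value is omitted it defaults to -max_value,
    mirroring the documented default of ErrorClipByValue.
    """
    if min_value is None:
        min_value = -max_value
    # Elements below min_value become min_value, elements above max_value
    # become max_value, and everything in between is left unchanged.
    return np.clip(t, min_value, max_value)


t = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
default_min = clip_by_value(t, 2.0)          # bounds are [-2.0, 2.0]
explicit_min = clip_by_value(t, 2.0, -1.0)   # bounds are [-1.0, 2.0]
print(default_min, explicit_min)
```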
--- a/doc/fluid/api_cn/clip_cn/GradientClipByGlobalNorm_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/GradientClipByGlobalNorm_cn.rst
 .. _cn_api_fluid_clip_GradientClipByGlobalNorm:
 GradientClipByGlobalNorm
 -------------------------------
 .. py:class:: paddle.fluid.clip.GradientClipByGlobalNorm(clip_norm, group_name='default_group', need_clip=None)
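-Limits the global norm of all Tensors in the Tensor list :math:`t\_list` (the square root of the sum of their squared L2 norms) to at most ``clip_norm``.
+:alias_main: paddle.nn.GradientClipByGlobalNorm
+:alias: paddle.nn.GradientClipByGlobalNorm,paddle.nn.clip.GradientClipByGlobalNorm
- If the global norm is greater than ``clip_norm``, all Tensors are multiplied by the same factor to shrink them.
+:old_api: paddle.fluid.clip.GradientClipByGlobalNorm
- If the global norm is less than or equal to ``clip_norm``, no operation is performed.
-The Tensor list is not passed in through this class. By default, all gradients in the ``Program`` are selected; if ``need_clip`` is not None, gradient clipping can be applied to only some of the parameters.
+Limits the global norm of all Tensors in the Tensor list :math:`t\_list` (the square root of the sum of their squared L2 norms) to at most ``clip_norm``.
-This class takes effect only when it is set while initializing the ``optimizer``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
+- If the global norm is greater than ``clip_norm``, all Tensors are multiplied by the same factor to shrink them.
-The clipping formula is:
+- If the global norm is less than or equal to ``clip_norm``, no operation is performed.
-.. math::
+The Tensor list is not passed in through this class. By default, all gradients in the ``Program`` are selected; if ``need_clip`` is not None, gradient clipping can be applied to only some of the parameters.
-            t\_list[i] = t\_list[i] \cdot \frac{clip\_norm}{\max(global\_norm, clip\_norm)}
+This class takes effect only when it is set while initializing the ``optimizer``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
-where:
+The clipping formula is:
-.. math::
-            global\_norm = \sqrt{\sum_{i=0}^{n-1}(l2norm(t\_list[i]))^2}
+.. math::
+            t\_list[i] = t\_list[i] \cdot \frac{clip\_norm}{\max(global\_norm, clip\_norm)}
-Parameters:
+where:
- - **clip_norm** (float) - The maximum allowed norm.
- - **group_name** (str, optional) - The name of the clipping group.
+.. math::
- - **need_clip** (function, optional) - A function that specifies which parameters need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter is clipped, False means it is not). Default is None, in which case all parameters in the network are clipped.
+            global\_norm = \sqrt{\sum_{i=0}^{n-1}(l2norm(t\_list[i]))^2}
-**Code Example 1: Static Graph**
+Parameters:
-.. code-block:: python
+ - **clip_norm** (float) - The maximum allowed norm.
+ - **group_name** (str, optional) - The name of the clipping group.
-    import paddle
+ - **need_clip** (function, optional) - A function that specifies which parameters need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter is clipped, False means it is not). Default is None, in which case all parameters in the network are clipped.
-    import paddle.fluid as fluid
-    import numpy as np
+**Code Example 1: Static Graph**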
-    main_prog = fluid.Program()
+.. code-block:: python
-    startup_prog = fluid.Program()
-    with fluid.program_guard(
+    import paddle
-            main_program=main_prog, startup_program=startup_prog):
+    import paddle.fluid as fluid
-        image = fluid.data(
+    import numpy as np
-            name='x', shape=[-1, 2], dtype='float32')
-        predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w_0, fc_0.b_0
+    main_prog = fluid.Program()
-        loss = fluid.layers.mean(predict)
+    startup_prog = fluid.Program()
+    with fluid.program_guard(
-        # Clip all parameters in the network:
+            main_program=main_prog, startup_program=startup_prog):
-        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
+        image = fluid.data(
+            name='x', shape=[-1, 2], dtype='float32')
-        # To clip only the parameter fc_0.w_0:
+        predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w_0, fc_0.b_0
-        # pass a function filter_func to need_clip; filter_func receives a Parameter and returns a bool
+        loss = fluid.layers.mean(predict)
-        # def filter_func(Parameter):
-        # # Parameter.name makes the check convenient (the name can be set in fluid.ParamAttr; defaults are fc_0.w_0 and fc_0.b_0)
+        # Clip all parameters in the network:
-        #   return Parameter.name=="fc_0.w_0"
+        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
-        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)
+        # To clip only the parameter fc_0.w_0:
-        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
+        # pass a function filter_func to need_clip; filter_func receives a Parameter and returns a bool
-        sgd_optimizer.minimize(loss)
+        # def filter_func(Parameter):
+        # # Parameter.name makes the check convenient (the name can be set in fluid.ParamAttr; defaults are fc_0.w_0 and fc_0.b_0)
-    place = fluid.CPUPlace()
+        #   return Parameter.name=="fc_0.w_0"
-    exe = fluid.Executor(place)
+        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)
-    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
-    exe.run(startup_prog)
+        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
-    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
+        sgd_optimizer.minimize(loss)
+    place = fluid.CPUPlace()
-**代码示例2：动态图**
+    exe = fluid.Executor(place)
+    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
-.. code-block:: python
+    exe.run(startup_prog)
+    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
-    import paddle
-    import paddle.fluid as fluid
+**代码示例2：动态图**
-    with fluid.dygraph.guard():
-        linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w_0, linear_0.b_0
+.. code-block:: python
-        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
-        out = linear(fluid.dygraph.to_variable(inputs))
+    import paddle
-        loss = fluid.layers.reduce_mean(out)
+    import paddle.fluid as fluid
-        loss.backward()
+    with fluid.dygraph.guard():
-        # Clip all parameters in the network:
+        linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w_0, linear_0.b_0
-        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
+        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
+        out = linear(fluid.dygraph.to_variable(inputs))
-        # To clip only the parameter linear_0.w_0:
+        loss = fluid.layers.reduce_mean(out)
-        # pass a function filter_func to need_clip; filter_func receives a ParamBase and returns a bool
+        loss.backward()
-        # def filter_func(ParamBase):
-        # # ParamBase.name can be used for the check (the name can be set in fluid.ParamAttr; defaults are linear_0.w_0 and linear_0.b_0)
+        # Clip all parameters in the network:
-        #   return ParamBase.name == "linear_0.w_0"
+        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
-        # # Note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, and can also be used for the check
-        #   return ParamBase.name == linear.weight.name
+        # To clip only the parameter linear_0.w_0:
-        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)
+        # pass a function filter_func to need_clip; filter_func receives a ParamBase and returns a bool
+        # def filter_func(ParamBase):
-        sgd_optimizer = fluid.optimizer.SGD(
+        # # ParamBase.name can be used for the check (the name can be set in fluid.ParamAttr; defaults are linear_0.w_0 and linear_0.b_0)
-            learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
+        #   return ParamBase.name == "linear_0.w_0"
+        # # Note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, and can also be used for the check
+        #   return ParamBase.name == linear.weight.name
+        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)
+        sgd_optimizer = fluid.optimizer.SGD(
+            learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
         sgd_optimizer.minimize(loss)
\ No newline at end of file
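The two formulas in the ``GradientClipByGlobalNorm`` description can be checked with a short NumPy sketch. It computes `global_norm` and applies the shared scaling factor. Plain NumPy arrays stand in for gradient Tensors, and ``clip_by_global_norm`` is a hypothetical name, not a Paddle API. For gradients `[3, 4]` and `[12]`, the global norm is `sqrt(9 + 16 + 144) = 13`. With `clip_norm=1.0`, every gradient is therefore multiplied by `1/13`.

```python
import numpy as np


def clip_by_global_norm(t_list, clip_norm):
    """Rescale every array in t_list by clip_norm / max(global_norm, clip_norm).

    Hypothetical helper following the two documented formulas; it returns the
    clipped list together with the global norm measured before clipping.
    """
    # global_norm = sqrt(sum_i l2norm(t_list[i])^2)
    global_norm = float(np.sqrt(sum(np.sum(np.square(t)) for t in t_list)))
    # One shared factor; it equals 1.0 when global_norm <= clip_norm (no-op).
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in t_list], global_norm


grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, clip_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(np.square(c)) for c in clipped)))
untouched, _ = clip_by_global_norm(grads, clip_norm=20.0)
print(norm_before, norm_after)
```

All gradients share one factor, so their relative proportions (and hence the update direction) are preserved. This is the main difference from clipping each Tensor by its own norm.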
--- a/doc/fluid/api_cn/clip_cn/GradientClipByNorm_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/GradientClipByNorm_cn.rst
 .. _cn_api_fluid_clip_GradientClipByNorm:
 GradientClipByNorm
 -------------------------------
 .. py:class:: paddle.fluid.clip.GradientClipByNorm(clip_norm, need_clip=None)
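-Limits the L2 norm of the input multi-dimensional Tensor :math:`X` to at most ``clip_norm``.
+:alias_main: paddle.nn.GradientClipByNorm
+:alias: paddle.nn.GradientClipByNorm,paddle.nn.clip.GradientClipByNorm
- If the L2 norm is greater than ``clip_norm``, the Tensor is multiplied by a factor to shrink it.
+:old_api: paddle.fluid.clip.GradientClipByNorm
- If the L2 norm is less than or equal to ``clip_norm``, no operation is performed.
-The input Tensor is not passed in through this class. By default, all gradients in the ``Program`` are selected; if ``need_clip`` is not None, gradient clipping can be applied to only some of the parameters.
+Limits the L2 norm of the input multi-dimensional Tensor :math:`X` to at most ``clip_norm``.
-This class takes effect only when it is set while initializing the ``optimizer``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
+- If the L2 norm is greater than ``clip_norm``, the Tensor is multiplied by a factor to shrink it.
-The clipping formula is:
+- If the L2 norm is less than or equal to ``clip_norm``, no operation is performed.
-.. math::
+The input Tensor is not passed in through this class. By default, all gradients in the ``Program`` are selected; if ``need_clip`` is not None, gradient clipping can be applied to only some of the parameters.
-  Out=
+This class takes effect only when it is set while initializing the ``optimizer``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
-  \left\{
-  \begin{aligned}
+The clipping formula is:
-  &  X & & \text{if } norm(X) \leq clip\_norm \\
-  &  \frac{clip\_norm \cdot X}{norm(X)} & & \text{if } norm(X) > clip\_norm \\
+.. math::
-  \end{aligned}
-  \right.
+  Out=
+  \left\{
+  \begin{aligned}
-where :math:`norm(X)` denotes the L2 norm of :math:`X`:
+  &  X & & \text{if } norm(X) \leq clip\_norm \\
+  &  \frac{clip\_norm \cdot X}{norm(X)} & & \text{if } norm(X) > clip\_norm \\
-.. math::
+  \end{aligned}
-  norm(X) = (\sum_{i=1}^{n}|x_i|^2)^{\frac{1}{2}}
+  \right.
-Parameters:
- - **clip_norm** (float) - The maximum allowed L2 norm.
+where :math:`norm(X)` denotes the L2 norm of :math:`X`:
- - **need_clip** (function, optional) - A function that specifies which parameters need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter is clipped, False means it is not). Default is None, in which case all parameters in the network are clipped.
+.. math::
-**Code Example 1: Static Graph**
+  norm(X) = (\sum_{i=1}^{n}|x_i|^2)^{\frac{1}{2}}
-.. code-block:: python
+Parameters:
+ - **clip_norm** (float) - The maximum allowed L2 norm.
-    import paddle
+ - **need_clip** (function, optional) - A function that specifies which parameters need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter is clipped, False means it is not). Default is None, in which case all parameters in the network are clipped.
-    import paddle.fluid as fluid
-    import numpy as np
+**Code Example 1: Static Graph**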
-    main_prog = fluid.Program()
+.. code-block:: python
-    startup_prog = fluid.Program()
-    with fluid.program_guard(
+    import paddle
-            main_program=main_prog, startup_program=startup_prog):
+    import paddle.fluid as fluid
-        image = fluid.data(
+    import numpy as np
-            name='x', shape=[-1, 2], dtype='float32')
-        predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w_0, fc_0.b_0
+    main_prog = fluid.Program()
-        loss = fluid.layers.mean(predict)
+    startup_prog = fluid.Program()
+    with fluid.program_guard(
-        # Clip all parameters in the network:
+            main_program=main_prog, startup_program=startup_prog):
-        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)
+        image = fluid.data(
+            name='x', shape=[-1, 2], dtype='float32')
-        # To clip only the parameter fc_0.w_0:
+        predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w_0, fc_0.b_0
-        # pass a function filter_func to need_clip; filter_func receives a Parameter and returns a bool
+        loss = fluid.layers.mean(predict)
-        # def filter_func(Parameter):
-        # # Parameter.name makes the check convenient (the name can be set in fluid.ParamAttr; defaults are fc_0.w_0 and fc_0.b_0)
+        # Clip all parameters in the network:
-        #   return Parameter.name=="fc_0.w_0"
+        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)
-        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)
+        # To clip only the parameter fc_0.w_0:
-        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
+        # pass a function filter_func to need_clip; filter_func receives a Parameter and returns a bool
-        sgd_optimizer.minimize(loss)
+        # def filter_func(Parameter):
+        # # Parameter.name makes the check convenient (the name can be set in fluid.ParamAttr; defaults are fc_0.w_0 and fc_0.b_0)
-    place = fluid.CPUPlace()
+        #   return Parameter.name=="fc_0.w_0"
-    exe = fluid.Executor(place)
+        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)
-    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
-    exe.run(startup_prog)
+        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
-    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
+        sgd_optimizer.minimize(loss)
+    place = fluid.CPUPlace()
-**代码示例2：动态图**
+    exe = fluid.Executor(place)
+    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
-.. code-block:: python
+    exe.run(startup_prog)
+    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
-    import paddle
-    import paddle.fluid as fluid
+**代码示例2：动态图**
-    with fluid.dygraph.guard():
-        linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w_0, linear_0.b_0
+.. code-block:: python
-        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
-        out = linear(fluid.dygraph.to_variable(inputs))
+    import paddle
-        loss = fluid.layers.reduce_mean(out)
+    import paddle.fluid as fluid
-        loss.backward()
+    with fluid.dygraph.guard():
-        # Clip all parameters in the network:
+        linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w_0, linear_0.b_0
-        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)
+        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
+        out = linear(fluid.dygraph.to_variable(inputs))
-        # To clip only the parameter linear_0.w_0:
+        loss = fluid.layers.reduce_mean(out)
-        # pass a function filter_func to need_clip; filter_func receives a ParamBase and returns a bool
+        loss.backward()
-        # def filter_func(ParamBase):
-        # # ParamBase.name can be used for the check (the name can be set in fluid.ParamAttr; defaults are linear_0.w_0 and linear_0.b_0)
+        # Clip all parameters in the network:
-        #   return ParamBase.name == "linear_0.w_0"
+        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)
-        # # Note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, and can also be used for the check
-        #   return ParamBase.name == linear.weight.name
+        # To clip only the parameter linear_0.w_0:
-        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)
+        # pass a function filter_func to need_clip; filter_func receives a ParamBase and returns a bool
+        # def filter_func(ParamBase):
-        sgd_optimizer = fluid.optimizer.SGD(
+        # # ParamBase.name can be used for the check (the name can be set in fluid.ParamAttr; defaults are linear_0.w_0 and linear_0.b_0)
-          learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
+        #   return ParamBase.name == "linear_0.w_0"
+        # # Note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, and can also be used for the check
+        #   return ParamBase.name == linear.weight.name
+        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)
+        sgd_optimizer = fluid.optimizer.SGD(
+          learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
         sgd_optimizer.minimize(loss)
\ No newline at end of file
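The piecewise definition of ``Out`` in the ``GradientClipByNorm`` description translates directly into a short NumPy sketch. Here each Tensor is measured and rescaled on its own, using only its own L2 norm. ``clip_by_norm`` is a hypothetical helper operating on a NumPy array, not the Paddle class itself.

```python
import numpy as np


def clip_by_norm(x, clip_norm):
    """Return x if ||x||_2 <= clip_norm, otherwise clip_norm * x / ||x||_2.

    Hypothetical helper following the documented piecewise formula for Out.
    """
    # norm(X) = (sum_i |x_i|^2)^(1/2)
    norm = float(np.sqrt(np.sum(np.square(x))))
    if norm <= clip_norm:
        return x  # within the limit: no operation is performed
    return clip_norm * x / norm  # rescaled so that its L2 norm equals clip_norm


x = np.array([3.0, 4.0])  # L2 norm is 5.0
shrunk = clip_by_norm(x, clip_norm=1.0)
kept = clip_by_norm(x, clip_norm=10.0)
print(shrunk, kept)
```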
--- a/doc/fluid/api_cn/clip_cn/GradientClipByValue_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/GradientClipByValue_cn.rst
 .. _cn_api_fluid_clip_GradientClipByValue:
 GradientClipByValue
 -------------------------------
 .. py:class:: paddle.fluid.clip.GradientClipByValue(max, min=None, need_clip=None)
+:alias_main: paddle.nn.GradientClipByValue
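-Limits the values of the input multi-dimensional Tensor :math:`X` to the range [min, max].
+:alias: paddle.nn.GradientClipByValue,paddle.nn.clip.GradientClipByValue
+:old_api: paddle.fluid.clip.GradientClipByValue
-The input Tensor is not passed in through this class. By default, all gradients in the ``Program`` are selected; if ``need_clip`` is not None, gradient clipping can be applied to only some of the parameters.
-This class takes effect only when it is set while initializing the ``optimizer``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
-Given a Tensor ``t``, this operation clips its values to the range between ``min`` and ``max``:
+Limits the values of the input multi-dimensional Tensor :math:`X` to the range [min, max].
- Any value less than ``min`` is set to ``min``
+The input Tensor is not passed in through this class. By default, all gradients in the ``Program`` are selected; if ``need_clip`` is not None, gradient clipping can be applied to only some of the parameters.
- Any value greater than ``max`` is set to ``max``
+This class takes effect only when it is set while initializing the ``optimizer``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
-Parameters:
+Given a Tensor ``t``, this operation clips its values to the range between ``min`` and ``max``:
- - **max** (float) - The maximum value to clip to.
- - **min** (float, optional) - The minimum value to clip to. If not set by the user, it is automatically set to ``-max`` (in this case ``max`` must be greater than 0).
+- Any value less than ``min`` is set to ``min``
- - **need_clip** (function, optional) - A function that specifies which parameters need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter is clipped, False means it is not). Default is None, in which case all parameters in the network are clipped.
+- Any value greater than ``max`` is set to ``max``
-**Code Example 1: Static Graph**
+Parameters:
-.. code-block:: python
+ - **max** (float) - The maximum value to clip to.
+ - **min** (float, optional) - The minimum value to clip to. If not set by the user, it is automatically set to ``-max`` (in this case ``max`` must be greater than 0).
-    import paddle
+ - **need_clip** (function, optional) - A function that specifies which parameters need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter is clipped, False means it is not). Default is None, in which case all parameters in the network are clipped.
-    import paddle.fluid as fluid
-    import numpy as np
+**Code Example 1: Static Graph**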
-    main_prog = fluid.Program()
+.. code-block:: python
-    startup_prog = fluid.Program()
-    with fluid.program_guard(
+    import paddle
-            main_program=main_prog, startup_program=startup_prog):
+    import paddle.fluid as fluid
-        image = fluid.data(
+    import numpy as np
-            name='x', shape=[-1, 2], dtype='float32')
-        predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w_0, fc_0.b_0
+    main_prog = fluid.Program()
-        loss = fluid.layers.mean(predict)
+    startup_prog = fluid.Program()
+    with fluid.program_guard(
-        # Clip all parameters in the network:
+            main_program=main_prog, startup_program=startup_prog):
-        clip = fluid.clip.GradientClipByValue(min=-1, max=1)
+        image = fluid.data(
+            name='x', shape=[-1, 2], dtype='float32')
-        # To clip only the parameter fc_0.w_0:
+        predict = fluid.layers.fc(input=image, size=3, act='relu')  # Trainable parameters: fc_0.w_0, fc_0.b_0
-        # pass a function filter_func to need_clip; filter_func receives a Parameter and returns a bool
+        loss = fluid.layers.mean(predict)
-        # def filter_func(Parameter):
-        # # Parameter.name makes the check convenient (the name can be set in fluid.ParamAttr; defaults are fc_0.w_0 and fc_0.b_0)
+        # Clip all parameters in the network:
-        #   return Parameter.name=="fc_0.w_0"
+        clip = fluid.clip.GradientClipByValue(min=-1, max=1)
-        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)
+        # To clip only the parameter fc_0.w_0:
-        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
+        # pass a function filter_func to need_clip; filter_func receives a Parameter and returns a bool
-        sgd_optimizer.minimize(loss)
+        # def filter_func(Parameter):
+        # # Parameter.name makes the check convenient (the name can be set in fluid.ParamAttr; defaults are fc_0.w_0 and fc_0.b_0)
-    place = fluid.CPUPlace()
+        #   return Parameter.name=="fc_0.w_0"
-    exe = fluid.Executor(place)
+        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)
-    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
-    exe.run(startup_prog)
+        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1, grad_clip=clip)
-    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
+        sgd_optimizer.minimize(loss)
+    place = fluid.CPUPlace()
-**代码示例2：动态图**
+    exe = fluid.Executor(place)
+    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
-.. code-block:: python
+    exe.run(startup_prog)
+    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
-    import paddle
-    import paddle.fluid as fluid
+**代码示例2：动态图**
-    with fluid.dygraph.guard():
-        linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w_0, linear_0.b_0
+.. code-block:: python
-        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
-        out = linear(fluid.dygraph.to_variable(inputs))
+    import paddle
-        loss = fluid.layers.reduce_mean(out)
+    import paddle.fluid as fluid
-        loss.backward()
+    with fluid.dygraph.guard():
-        # Clip all parameters in the network:
+        linear = fluid.dygraph.Linear(10, 10)  # Trainable parameters: linear_0.w_0, linear_0.b_0
-        clip = fluid.clip.GradientClipByValue(min=-1, max=1)
+        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
+        out = linear(fluid.dygraph.to_variable(inputs))
-        # To clip only the parameter linear_0.w_0:
+        loss = fluid.layers.reduce_mean(out)
-        # pass a function filter_func to need_clip; filter_func receives a ParamBase and returns a bool
+        loss.backward()
-        # def filter_func(ParamBase):
-        # # ParamBase.name can be used for the check (the name can be set in fluid.ParamAttr; defaults are linear_0.w_0 and linear_0.b_0)
+        # Clip all parameters in the network:
-        #   return ParamBase.name == "linear_0.w_0"
+        clip = fluid.clip.GradientClipByValue(min=-1, max=1)
-        # # Note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, and can be used for the check
-        #   return ParamBase.name == linear.weight.name
+        # To clip only the parameter linear_0.w_0:
-        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)
+        # pass a function filter_func to need_clip; filter_func receives a ParamBase and returns a bool
+        # def filter_func(ParamBase):
-        sgd_optimizer = fluid.optimizer.SGD(
+        # # ParamBase.name can be used for the check (the name can be set in fluid.ParamAttr; defaults are linear_0.w_0 and linear_0.b_0)
-            learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
+        #   return ParamBase.name == "linear_0.w_0"
-        sgd_optimizer.minimize(loss)
+        # # Note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, and can be used for the check
+        #   return ParamBase.name == linear.weight.name
+        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)
+        sgd_optimizer = fluid.optimizer.SGD(
+            learning_rate=0.1, parameter_list=linear.parameters(), grad_clip=clip)
+        sgd_optimizer.minimize(loss)
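A minimal NumPy sketch can show how ``need_clip`` narrows value clipping to a subset of parameters. The sketch makes two simplifying assumptions. Gradients are held in a plain dict keyed by parameter name, and the filter receives that name rather than a ``Parameter`` object. ``clip_grads_by_value`` is hypothetical and only mimics the documented behavior: ``min`` defaults to ``-max``, and ``None`` means every parameter is clipped.

```python
import numpy as np


def clip_grads_by_value(named_grads, max_value, min_value=None, need_clip=None):
    """Clamp selected gradients element-wise into [min_value, max_value].

    named_grads: dict mapping a parameter name to its gradient array.
    need_clip: optional predicate on the parameter name; None clips everything.
    Both are simplifications for illustration, not Paddle's interfaces.
    """
    if min_value is None:
        min_value = -max_value  # documented default; requires max_value > 0
    clipped = {}
    for name, grad in named_grads.items():
        if need_clip is None or need_clip(name):
            clipped[name] = np.clip(grad, min_value, max_value)
        else:
            clipped[name] = grad  # parameter excluded by need_clip
    return clipped


grads = {
    "fc_0.w_0": np.array([-5.0, 0.5, 5.0]),
    "fc_0.b_0": np.array([-5.0, 5.0]),
}
only_weight = clip_grads_by_value(
    grads, max_value=1.0, need_clip=lambda name: name == "fc_0.w_0")
everything = clip_grads_by_value(grads, max_value=1.0)
print(only_weight, everything)
```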
--- a/doc/fluid/api_cn/clip_cn/set_gradient_clip_cn.rst
+++ b/doc/fluid/api_cn/clip_cn/set_gradient_clip_cn.rst
@@ -3,10 +3,13 @@
 set_gradient_clip
 -------------------------------
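-**Note: this API only supports static graph mode**
 .. py:function:: paddle.fluid.clip.set_gradient_clip(clip, param_list=None, program=None)
+:api_attr: Declarative programming mode (static graph)
 .. warning::
    This API is strict about where it is called: it must come after the network is built and before ``minimize``. For this reason it may be removed in a future version and is not recommended. Set gradient clipping when initializing the ``optimizer`` instead.
    There are three clipping strategies: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm`, and :ref:`cn_api_fluid_clip_GradientClipByValue`.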

--- a/doc/fluid/api_cn/dataset_cn/DatasetFactory_cn.rst
+++ b/doc/fluid/api_cn/dataset_cn/DatasetFactory_cn.rst
 .. _cn_api_fluid_dataset_DatasetFactory:
 DatasetFactory
 -------------------------------
 .. py:class:: paddle.fluid.dataset.DatasetFactory
-DatasetFactory is a "factory" that creates datasets by name. It can create "QueueDataset", "InMemoryDataset", or "FileInstantDataset"; the default is "QueueDataset".
-**Code Example**
+DatasetFactory is a "factory" that creates datasets by name. It can create "QueueDataset", "InMemoryDataset", or "FileInstantDataset"; the default is "QueueDataset".
-.. code-block:: python
+**Code Example**
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-.. py:method:: create_dataset(datafeed_class='QueueDataset')
+    import paddle.fluid as fluid
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-Creates a "QueueDataset", "InMemoryDataset", or "FileInstantDataset"; the default is "QueueDataset".
+.. py:method:: create_dataset(datafeed_class='QueueDataset')
-Parameters:
+Creates a "QueueDataset", "InMemoryDataset", or "FileInstantDataset"; the default is "QueueDataset".
-    - **datafeed_class** (str) – The datafeed class name: QueueDataset, InMemoryDataset, or FileInstantDataset. Default is QueueDataset.
-**Code Example**:
+Parameters:
+    - **datafeed_class** (str) – The datafeed class name: QueueDataset, InMemoryDataset, or FileInstantDataset. Default is QueueDataset.
-.. code-block:: python
+**Code Example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
+    import paddle.fluid as fluid
+    dataset = fluid.DatasetFactory().create_dataset()
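The "factory" idea in the DatasetFactory description selects a class by its name and falls back to ``QueueDataset``. It can be sketched in plain Python. Everything here is hypothetical: the three classes are empty stand-ins, and the dictionary registry is one common way to build such a factory, not necessarily how ``paddle.fluid.DatasetFactory`` works internally. Raising ``ValueError`` for an unknown name is likewise an assumption of this sketch.

```python
class QueueDataset(object):
    """Hypothetical stand-in for a streaming dataset."""


class InMemoryDataset(object):
    """Hypothetical stand-in for a dataset buffered in memory."""


class FileInstantDataset(object):
    """Hypothetical stand-in for a file-instant dataset."""


class DatasetFactory(object):
    """Illustrative name-based factory; not Paddle's implementation."""

    # Maps the datafeed_class string to the class that should be built.
    _registry = {
        "QueueDataset": QueueDataset,
        "InMemoryDataset": InMemoryDataset,
        "FileInstantDataset": FileInstantDataset,
    }

    def create_dataset(self, datafeed_class="QueueDataset"):
        try:
            dataset_cls = self._registry[datafeed_class]
        except KeyError:
            # Rejecting unknown names with ValueError is this sketch's choice.
            raise ValueError("datafeed class %s does not exist" % datafeed_class)
        return dataset_cls()


default_ds = DatasetFactory().create_dataset()
memory_ds = DatasetFactory().create_dataset("InMemoryDataset")
print(type(default_ds).__name__, type(memory_ds).__name__)
```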
--- a/doc/fluid/api_cn/dataset_cn/InMemoryDataset_cn.rst
+++ b/doc/fluid/api_cn/dataset_cn/InMemoryDataset_cn.rst
 .. _cn_api_fluid_dataset_InMemoryDataset:
 InMemoryDataset
 -------------------------------
 .. py:class:: paddle.fluid.dataset.InMemoryDataset
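-InMemoryDataset loads data into memory and buffers it before training. This class is created by DatasetFactory.
-**Code Example**:
+InMemoryDataset loads data into memory and buffers it before training. This class is created by DatasetFactory.
-.. code-block:: python
+**Code Example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-.. py:method:: set_queue_num(queue_num)
+    import paddle.fluid as fluid
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-Sets the number of ``Dataset`` output queues; training processes fetch data from these queues.
+.. py:method:: set_queue_num(queue_num)
-Parameters:
-    - **queue_num** (int) - The number of dataset output queues.
+Sets the number of ``Dataset`` output queues; training processes fetch data from these queues.
-**Code Example**:
+Parameters:
+    - **queue_num** (int) - The number of dataset output queues.
-.. code-block:: python
+**Code Example**: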
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_queue_num(12)
+    import paddle.fluid as fluid
-.. py:method:: set_fleet_send_batch_size(fleet_send_batch_size)
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_queue_num(12)
-Sets the size of the batches to send.
+.. py:method:: set_fleet_send_batch_size(fleet_send_batch_size)
-Parameters:
-    - **fleet_send_batch_size** (int) - The size of the batches to send.
+Sets the size of the batches to send.
-**Code Example**
+Parameters:
+    - **fleet_send_batch_size** (int) - The size of the batches to send.
-.. code-block:: python
+**Code Example**
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_fleet_send_batch_size(800)
+    import paddle.fluid as fluid
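-.. py:method:: set_merge_by_lineid(var_list, erase_duplicate_feas=True, min_merge_size=2, keep_unmerged_ins=True)
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_fleet_send_batch_size(800)
-Sets merging by sample (line) id: instances that share the same line id are merged after shuffling. You should parse the sample id in a data generator.
+.. py:method:: set_merge_by_lineid(var_list, erase_duplicate_feas=True, min_merge_size=2, keep_unmerged_ins=True)
-Parameters:
-    - **var_list** (list) - The list of features that can be merged; each element is a ``Variable``. Some kinds of features are usually not merged for the same sample id, so the user should specify which kinds of features can be merged.
+Sets merging by sample (line) id: instances that share the same line id are merged after shuffling. You should parse the sample id in a data generator.
-    - **erase_duplicate_feas** (bool) - Whether to remove duplicate feature values when merging. Default is True.
-    - **min_merge_size** (int) - The minimum number of instances to merge. Default is 2.
+Parameters:
-    - **keep_unmerged_ins** (bool) - Whether to keep instances that were not merged, such as instances with a unique id or instances whose id appears fewer than ``min_merge_size`` times.
+    - **var_list** (list) - The list of features that can be merged; each element is a ``Variable``. Some kinds of features are usually not merged for the same sample id, so the user should specify which kinds of features can be merged.
+    - **erase_duplicate_feas** (bool) - Whether to remove duplicate feature values when merging. Default is True.
-.. code-block:: python
+    - **min_merge_size** (int) - The minimum number of instances to merge. Default is 2.
+    - **keep_unmerged_ins** (bool) - Whether to keep instances that were not merged, such as instances with a unique id or instances whose id appears fewer than ``min_merge_size`` times.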
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_merge_by_lineid()
+    import paddle.fluid as fluid
-.. py:method:: load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_merge_by_lineid()
-Loads data into memory.
+.. py:method:: load_into_memory()
-**Code Example**:
+Loads data into memory.
-.. code-block:: python
+**Code Example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
+    import paddle.fluid as fluid
-    dataset.load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    filelist = ["a.txt", "b.txt"]
-.. py:method:: preload_into_memory()
+    dataset.set_filelist(filelist)
+    dataset.load_into_memory()
-Loads data into memory asynchronously.
+.. py:method:: preload_into_memory()
-**Code Example**:
+Loads data into memory asynchronously.
-.. code-block:: python
+**Code Example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
+    import paddle.fluid as fluid
-    dataset.preload_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.wait_preload_done()
+    filelist = ["a.txt", "b.txt"]
+    dataset.set_filelist(filelist)
-.. py:method:: wait_preload_done()
+    dataset.preload_into_memory()
+    dataset.wait_preload_done()
-Waits for ``preload_into_memory`` to finish.
+.. py:method:: wait_preload_done()
-**Code Example**:
+Waits for ``preload_into_memory`` to finish.
-.. code-block:: python
+**Code Example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
+    import paddle.fluid as fluid
-    dataset.preload_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.wait_preload_done()
+    filelist = ["a.txt", "b.txt"]
+    dataset.set_filelist(filelist)
-.. py:method:: local_shuffle()
+    dataset.preload_into_memory()
+    dataset.wait_preload_done()
-Performs a local shuffle.
+.. py:method:: local_shuffle()
-**Code Example**:
+Performs a local shuffle.
-.. code-block:: python
+**Code Example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
+    import paddle.fluid as fluid
-    dataset.load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.local_shuffle()
+    filelist = ["a.txt", "b.txt"]
+    dataset.set_filelist(filelist)
+    dataset.load_into_memory()
-.. py:method:: global_shuffle(fleet=None)
+    dataset.local_shuffle()
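-Performs a global shuffle of the in-memory data across workers.
+.. py:method:: global_shuffle(fleet=None)
-Can only be used in distributed mode (multiple processes on one machine or on multiple machines). When running in distributed mode, pass ``fleet`` rather than None.
+Performs a global shuffle of the in-memory data across workers.
-**Code example**:
+Can only be used in distributed mode (multiple processes on one machine or on multiple machines). When running in distributed mode, pass ``fleet`` rather than None.
-.. code-block:: python
+**Code example**: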
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
+.. code-block:: python
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
+    import paddle.fluid as fluid
-    dataset.set_filelist(filelist)
+    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset.load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.global_shuffle(fleet)
+    filelist = ["a.txt", "b.txt"]
+    dataset.set_filelist(filelist)
-Parameters:
+    dataset.load_into_memory()
-    - **fleet** (Fleet) - the fleet singleton. Defaults to None.
+    dataset.global_shuffle(fleet)
+Parameters:
-.. py:method:: release_memory()
+    - **fleet** (Fleet) - the fleet singleton. Defaults to None.
-Releases the in-memory data of InMemoryDataset once it is no longer needed.
+.. py:method:: release_memory()
-**Code example**:
+Releases the in-memory data of InMemoryDataset once it is no longer needed.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
+.. code-block:: python
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
+    import paddle.fluid as fluid
-    dataset.set_filelist(filelist)
+    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset.load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.global_shuffle(fleet)
+    filelist = ["a.txt", "b.txt"]
-    exe = fluid.Executor(fluid.CPUPlace())
+    dataset.set_filelist(filelist)
-    exe.run(fluid.default_startup_program())
+    dataset.load_into_memory()
-    exe.train_from_dataset(fluid.default_main_program(), dataset)
+    dataset.global_shuffle(fleet)
-    dataset.release_memory()
+    exe = fluid.Executor(fluid.CPUPlace())
+    exe.run(fluid.default_startup_program())
-.. py:method:: get_memory_data_size(fleet=None)
+    exe.train_from_dataset(fluid.default_main_program(), dataset)
+    dataset.release_memory()
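-Returns the total number of samples loaded into memory across all workers.
+.. py:method:: get_memory_data_size(fleet=None)
-.. note::
-    This function may hurt performance because it contains a barrier.
+Returns the total number of samples loaded into memory across all workers.
-Parameters:
+.. note::
-    - **fleet** (Fleet) - the fleet object.
+    This function may hurt performance because it contains a barrier.
-Returns: the size of the in-memory data.
+Parameters:
+    - **fleet** (Fleet) - the fleet object.
-**Code example**:
+Returns: the size of the in-memory data.
-.. code-block:: python
+**Code example**: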
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
+.. code-block:: python
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
+    import paddle.fluid as fluid
-    dataset.set_filelist(filelist)
+    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset.load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    print(dataset.get_memory_data_size(fleet))
+    filelist = ["a.txt", "b.txt"]
+    dataset.set_filelist(filelist)
+    dataset.load_into_memory()
-.. py:method:: get_shuffle_data_size(fleet=None)
+    print(dataset.get_memory_data_size(fleet))
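-Gets the shuffle data size: returns the total number of samples across all workers after a local or global shuffle.
+.. py:method:: get_shuffle_data_size(fleet=None)
-.. note::
-    This function may hurt performance under local shuffle because it contains a barrier. It does not affect global shuffle.
+Gets the shuffle data size: returns the total number of samples across all workers after a local or global shuffle.
-Parameters:
+.. note::
-    - **fleet** (Fleet) - the fleet object.
+    This function may hurt performance under local shuffle because it contains a barrier. It does not affect global shuffle.
-Returns: the size of the shuffled data.
+Parameters:
+    - **fleet** (Fleet) - the fleet object.
-**Code example**:
+Returns: the size of the shuffled data.
-.. code-block:: python
+**Code example**: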
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
+.. code-block:: python
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
+    import paddle.fluid as fluid
-    dataset.set_filelist(filelist)
+    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset.load_into_memory()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.global_shuffle(fleet)
+    filelist = ["a.txt", "b.txt"]
-    print(dataset.get_shuffle_data_size(fleet))
+    dataset.set_filelist(filelist)
+    dataset.load_into_memory()
+    dataset.global_shuffle(fleet)
-.. py:method:: set_batch_size(batch_size)
+    print(dataset.get_shuffle_data_size(fleet))
-Sets the batch size. It takes effect during training.
+.. py:method:: set_batch_size(batch_size)
-**Code example**:
+Sets the batch size. It takes effect during training.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_batch_size(128)
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **batch_size** (int) - the batch size
+    dataset.set_batch_size(128)
-.. py:method:: set_fea_eval(record_candidate_size, fea_eval=True)
+Parameters:
+    - **batch_size** (int) - the batch size
-Enables feature evaluation mode, which shuffles slots (features) to evaluate their importance. Slot shuffling requires ``fea_eval`` to be set to True.
+.. py:method:: set_fea_eval(record_candidate_size, fea_eval=True)
-Parameters:
-    - **record_candidate_size** (int) - the size of the instance candidate pool used when shuffling one slot
+Enables feature evaluation mode, which shuffles slots (features) to evaluate their importance. Slot shuffling requires ``fea_eval`` to be set to True.
-    - **fea_eval** (bool) - whether to enable feature evaluation mode for slot shuffling. Defaults to True.
+Parameters:
-**Code example**:
+    - **record_candidate_size** (int) - the size of the instance candidate pool used when shuffling one slot
+    - **fea_eval** (bool) - whether to enable feature evaluation mode for slot shuffling. Defaults to True.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_fea_eval(1000000, True)
+    import paddle.fluid as fluid
-.. py:method:: desc()
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_fea_eval(1000000, True)
-Returns the protobuf message of this ``DataFeedDesc`` as a string.
+.. py:method:: desc()
-**Code example**:
+Returns the protobuf message of this ``DataFeedDesc`` as a string.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    print(dataset.desc())
+    import paddle.fluid as fluid
-Returns: a string.
+    dataset = fluid.DatasetFactory().create_dataset()
+    print(dataset.desc())
-.. py:method:: set_filelist(filelist)
+Returns: a string.
-Sets the file list for the current worker.
+.. py:method:: set_filelist(filelist)
-**Code example**:
+Sets the file list for the current worker.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_filelist(["a.txt", "b.txt"])
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **filelist** (list) - the list of files
+    dataset.set_filelist(["a.txt", "b.txt"])
-.. py:method:: set_hdfs_config(fs_name, fs_ugi)
+Parameters:
+    - **filelist** (list) - the list of files
-Sets the HDFS configuration: the fs name and the ugi.
+.. py:method:: set_hdfs_config(fs_name, fs_ugi)
-**Code example**:
+Sets the HDFS configuration: the fs name and the ugi.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **fs_name** (str) - the fs name
+    dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
-    - **fs_ugi** (str) - the fs ugi
+Parameters:
-.. py:method:: set_pipe_command(pipe_command)
+    - **fs_name** (str) - the fs name
+    - **fs_ugi** (str) - the fs ugi
-Sets the pipe command of the current ``dataset``. Only UNIX pipe commands are supported.
+.. py:method:: set_pipe_command(pipe_command)
-**Code example**:
+Sets the pipe command of the current ``dataset``. Only UNIX pipe commands are supported.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_pipe_command("python my_script.py")
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **pipe_command** (str) - the pipe command
+    dataset.set_pipe_command("python my_script.py")
-.. py:method:: set_thread(thread_num)
+Parameters:
+    - **pipe_command** (str) - the pipe command
-Sets the number of threads, which equals the number of readers.
+.. py:method:: set_thread(thread_num)
-**Code example**:
+Sets the number of threads, which equals the number of readers.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_thread(12)
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **thread_num** (int) - the number of threads
+    dataset.set_thread(12)
-.. py:method:: set_use_var(var_list)
+Parameters:
+    - **thread_num** (int) - the number of threads
-Sets the ``Variable`` objects to be used.
+.. py:method:: set_use_var(var_list)
-**Code example**:
+Sets the ``Variable`` objects to be used.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
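-    # data and label are assumed to be Variables defined earlier in the network
-    dataset.set_use_var([data, label])
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **var_list** (list) - the list of variables
+    # data and label are assumed to be Variables defined earlier in the network
+    dataset.set_use_var([data, label])
-.. py:method:: slots_shuffle(slots)
+Parameters:
+    - **var_list** (list) - the list of variables
-Shuffles data at the slot (feature) level. It is typically used on sparse features with a large number of instances: shuffle one or more slots, then compare a metric such as AUC against the baseline to evaluate the importance of those slots (features).
+.. py:method:: slots_shuffle(slots)
-Parameters:
-    - **slots** (list[string]) - the set of slots to shuffle
+Shuffles data at the slot (feature) level. It is typically used on sparse features with a large number of instances: shuffle one or more slots, then compare a metric such as AUC against the baseline to evaluate the importance of those slots (features).
-**Code example**:
+Parameters:
+    - **slots** (list[string]) - the set of slots to shuffle
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_merge_by_lineid()
-    # suppose there is a slot named 0
+    import paddle.fluid as fluid
-    dataset.slots_shuffle(['0'])
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_merge_by_lineid()
+    # suppose there is a slot named 0
+    dataset.slots_shuffle(['0'])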
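The slot-level shuffle described above can be illustrated with a small plain-Python sketch. This is a conceptual model only, not Paddle's implementation. The instance layout (a dict mapping a slot name to a list of feature ids) and the helper `shuffle_slots` are hypothetical. Permuting one slot's values across instances preserves that slot's value distribution but breaks its link to each instance's label. A drop in a metric such as AUC, relative to the unshuffled baseline, therefore suggests the slot is important.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical instance layout: slot name -> list of feature ids.
Instance = Dict[str, List[int]]


def shuffle_slots(instances: List[Instance], slots: Sequence[str], seed: int = 0) -> List[Instance]:
    """Return new instances whose values for each slot in `slots` are permuted
    across instances; every other slot keeps its original value."""
    rng = random.Random(seed)
    result = [dict(inst) for inst in instances]  # copy so the input is not modified
    for slot in slots:
        values = [inst.get(slot, []) for inst in instances]
        rng.shuffle(values)  # permute this slot's values across instances
        for inst, value in zip(result, values):
            inst[slot] = value
    return result
```

To use it, evaluate the model once on the original instances and once on `shuffle_slots(instances, ["0"])`, then compare the two metrics.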
--- a/doc/fluid/api_cn/dataset_cn/QueueDataset_cn.rst
+++ b/doc/fluid/api_cn/dataset_cn/QueueDataset_cn.rst
 .. _cn_api_fluid_dataset_QueueDataset:
 QueueDataset
 -------------------------------
 .. py:class:: paddle.fluid.dataset.QueueDataset
-Processes data in a streaming fashion.
-**Code example**:
+Processes data in a streaming fashion.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
+.. code-block:: python
+    import paddle.fluid as fluid
+    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
-.. py:method:: local_shuffle()
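-Local shuffle of the data.
+.. py:method:: local_shuffle()
-QueueDataset does not support local shuffle; calling it raises NotImplementedError.
+Local shuffle of the data.
-**Code example**:
+QueueDataset does not support local shuffle; calling it raises NotImplementedError.
-.. code-block:: python
+**Code example**: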
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
+.. code-block:: python
-    dataset.local_shuffle()
+    import paddle.fluid as fluid
+    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
+    dataset.local_shuffle()
-.. py:method:: global_shuffle(fleet=None)
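-Global shuffle of the data.
+.. py:method:: global_shuffle(fleet=None)
-QueueDataset does not support global shuffle; calling it raises NotImplementedError.
+Global shuffle of the data.
-**Code example**:
+QueueDataset does not support global shuffle; calling it raises NotImplementedError.
-.. code-block:: python
+**Code example**: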
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
+.. code-block:: python
-    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
-    dataset.global_shuffle(fleet)
+    import paddle.fluid as fluid
+    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-.. py:method:: desc()
+    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
+    dataset.global_shuffle(fleet)
-Returns the protobuf message of this ``DataFeedDesc`` as a string.
+.. py:method:: desc()
-**Code example**:
+Returns the protobuf message of this ``DataFeedDesc`` as a string.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    print(dataset.desc())
+    import paddle.fluid as fluid
-Returns: a string.
+    dataset = fluid.DatasetFactory().create_dataset()
+    print(dataset.desc())
-.. py:method:: set_batch_size(batch_size)
+Returns: a string.
-Sets the batch size. It takes effect during training.
+.. py:method:: set_batch_size(batch_size)
-**Code example**:
+Sets the batch size. It takes effect during training.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_batch_size(128)
+    import paddle.fluid as fluid
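-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **batch_size** (int) - the batch size
+    dataset.set_batch_size(128)
-.. py:method:: set_fea_eval(record_candidate_size, fea_eval=True)
-Enables feature evaluation mode, which shuffles slots (features) to evaluate their importance. Slot shuffling requires ``fea_eval`` to be set to True.
+Parameters:
+    - **batch_size** (int) - the batch size
-Parameters:
-    - **record_candidate_size** (int) - the size of the instance candidate pool used when shuffling one slot
+.. py:method:: set_fea_eval(record_candidate_size, fea_eval=True)
+Enables feature evaluation mode, which shuffles slots (features) to evaluate their importance. Slot shuffling requires ``fea_eval`` to be set to True.
-    - **fea_eval** (bool) - whether to enable feature evaluation mode for slot shuffling. Defaults to True.
+Parameters:
-**Code example**:
+    - **record_candidate_size** (int) - the size of the instance candidate pool used when shuffling one slot
+    - **fea_eval** (bool) - whether to enable feature evaluation mode for slot shuffling. Defaults to True.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_fea_eval(1000000, True)
+    import paddle.fluid as fluid
-.. py:method:: set_filelist(filelist)
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_fea_eval(1000000, True)
-Sets the file list for the current worker.
+.. py:method:: set_filelist(filelist)
-**Code example**:
+Sets the file list for the current worker.
-.. code-block:: python
+**Code example**: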
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_filelist(["a.txt", "b.txt"])
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **filelist** (list) - the list of files
+    dataset.set_filelist(["a.txt", "b.txt"])
-.. py:method:: set_hdfs_config(fs_name, fs_ugi)
+Parameters:
+    - **filelist** (list) - the list of files
-Sets the HDFS configuration: the fs name and the ugi.
+.. py:method:: set_hdfs_config(fs_name, fs_ugi)
-**Code example**:
+Sets the HDFS configuration: the fs name and the ugi.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **fs_name** (str) - the fs name
+    dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
-    - **fs_ugi** (str) - the fs ugi
+Parameters:
-.. py:method:: set_pipe_command(pipe_command)
+    - **fs_name** (str) - the fs name
+    - **fs_ugi** (str) - the fs ugi
-Sets the pipe command of the current ``dataset``. Only UNIX pipe commands are supported.
+.. py:method:: set_pipe_command(pipe_command)
-**Code example**:
+Sets the pipe command of the current ``dataset``. Only UNIX pipe commands are supported.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_pipe_command("python my_script.py")
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **pipe_command** (str) - the pipe command
+    dataset.set_pipe_command("python my_script.py")
-.. py:method:: set_thread(thread_num)
+Parameters:
+    - **pipe_command** (str) - the pipe command
-Sets the number of threads, which equals the number of readers.
+.. py:method:: set_thread(thread_num)
-**Code example**:
+Sets the number of threads, which equals the number of readers.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
-    dataset.set_thread(12)
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **thread_num** (int) - the number of threads
+    dataset.set_thread(12)
-.. py:method:: set_use_var(var_list)
+Parameters:
+    - **thread_num** (int) - the number of threads
-Sets the ``Variable`` objects to be used.
+.. py:method:: set_use_var(var_list)
-**Code example**:
+Sets the ``Variable`` objects to be used.
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
+.. code-block:: python
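-    # data and label are assumed to be Variables defined earlier in the network
-    dataset.set_use_var([data, label])
+    import paddle.fluid as fluid
-Parameters:
+    dataset = fluid.DatasetFactory().create_dataset()
-    - **var_list** (list) - the list of variables
+    # data and label are assumed to be Variables defined earlier in the network
+    dataset.set_use_var([data, label])
-.. py:method:: slots_shuffle(slots)
+Parameters:
+    - **var_list** (list) - the list of variables
-Shuffles data at the slot (feature) level. It is typically used on sparse features with a large number of instances: shuffle one or more slots, then compare a metric such as AUC against the baseline to evaluate the importance of those slots (features).
+.. py:method:: slots_shuffle(slots)
-Parameters:
-    - **slots** (list[string]) - the set of slots to shuffle
+Shuffles data at the slot (feature) level. It is typically used on sparse features with a large number of instances: shuffle one or more slots, then compare a metric such as AUC against the baseline to evaluate the importance of those slots (features).
-**Code example**:
+Parameters:
+    - **slots** (list[string]) - the set of slots to shuffle
-.. code-block:: python
+**Code example**:
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+.. code-block:: python
-    dataset.set_merge_by_lineid()
-    # suppose there is a slot named 0
+    import paddle.fluid as fluid
-    dataset.slots_shuffle(['0'])
+    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
+    dataset.set_merge_by_lineid()
+    # suppose there is a slot named 0
+    dataset.slots_shuffle(['0'])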
--- a/doc/fluid/api_cn/dygraph_cn/BackwardStrategy_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/BackwardStrategy_cn.rst
@@ -3,10 +3,13 @@
 BackwardStrategy
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.BackwardStrategy
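+:api_attr: Imperative programming mode (dynamic graph)
 **Note: this API takes effect only in dynamic graph mode.**
 BackwardStrategy describes the strategy for executing the backward pass in dynamic graph mode. Its main purpose is to define the different strategies that can be used when running backward in dynamic graph mode.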

--- a/doc/fluid/api_cn/dygraph_cn/BatchNorm_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/BatchNorm_cn.rst
@@ -5,6 +5,12 @@ BatchNorm
 .. py:class:: paddle.fluid.dygraph.BatchNorm(num_channels, act=None, is_test=False, momentum=0.9, epsilon=1e-05, param_attr=None, bias_attr=None, dtype='float32', data_layout='NCHW', in_place=False, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False, use_global_stats=False, trainable_statistics=False)
+:alias_main: paddle.nn.BatchNorm
+:alias: paddle.nn.BatchNorm,paddle.nn.layer.BatchNorm,paddle.nn.layer.norm.BatchNorm
+:old_api: paddle.fluid.dygraph.BatchNorm
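 This interface builds a callable object of the ``BatchNorm`` class; see the ``code example`` for usage. It implements a Batch Normalization Layer, which can be used as the normalization function for convolution and fully connected operations. It normalizes using the per-channel mean and variance of the current batch. For details, see: `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/pdf/1502.03167.pdf>`_
 When use_global_stats = False, :math:`\mu_{\beta}` and :math:`\sigma_{\beta}^{2}` are the statistics of the minibatch. The computation is as follows: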
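The training-mode computation for a single channel can be sketched in plain Python: take the minibatch mean, take the biased variance (dividing by m), normalize, and then scale and shift. This sketch assumes the standard formulation from the Batch Normalization paper. The helper name and the single-channel list input are illustrative. The moving mean and variance updates that ``BatchNorm`` also maintains (controlled by ``momentum``) are omitted.

```python
import math
from typing import List


def batch_norm_channel(x: List[float], gamma: float = 1.0, beta: float = 0.0,
                       epsilon: float = 1e-5) -> List[float]:
    """Normalize the minibatch values of ONE channel, then scale and shift."""
    m = len(x)
    mean = sum(x) / m                             # mu_beta
    var = sum((v - mean) ** 2 for v in x) / m     # sigma_beta^2 (divides by m)
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [gamma * (v - mean) * inv_std + beta for v in x]
```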

--- a/doc/fluid/api_cn/dygraph_cn/BilinearTensorProduct_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/BilinearTensorProduct_cn.rst
@@ -5,6 +5,12 @@ BilinearTensorProduct
 .. py:class:: paddle.fluid.dygraph.BilinearTensorProduct(input1_dim, input2_dim, output_dim, name=None, act=None, param_attr=None, bias_attr=None, dtype="float32")
+:alias_main: paddle.nn.BilinearTensorProduct
+:alias: paddle.nn.BilinearTensorProduct,paddle.nn.layer.BilinearTensorProduct,paddle.nn.layer.common.BilinearTensorProduct
+:old_api: paddle.fluid.dygraph.BilinearTensorProduct
 This interface builds a callable object of the ``BilinearTensorProduct`` class; see the ``code example`` for usage. The bilinear product is computed as follows.
 .. math::

--- a/doc/fluid/api_cn/dygraph_cn/Conv2DTranspose_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Conv2DTranspose_cn.rst
@@ -5,6 +5,9 @@ Conv2DTranspose
 .. py:class:: paddle.fluid.dygraph.Conv2DTranspose(num_channels, num_filters, filter_size, output_size=None, padding=0, stride=1, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, dtype="float32")
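 This interface builds a callable object of the ``Conv2DTranspose`` class; see the ``code example`` for usage. It builds a two-dimensional transposed convolution layer (Convolution2D Transpose Layer) in the network. The layer computes the output feature map from the input, the filter parameters (num_filters, filter_size), stride, padding, dilation, and groups. Input and output use the ``NCHW`` format, where N is the batch size, C is the number of feature maps, H is the feature map height, and W is the feature map width. The filter has shape [M, C, H, W], where M is the number of input feature maps, C is the number of output feature maps, H is the filter height, and W is the filter width. If groups is greater than 1, C equals the number of input feature maps divided by groups. If a bias attribute and an activation type are provided, the bias is added to the convolution result and the activation is applied to the final result. Transposed convolution is computed as the backward pass of convolution. It is also called deconvolution, although it is not a true deconvolution. For details, see: `Conv2DTranspose <http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf>`_ .
 The relationship between input ``X`` and output ``Out`` is: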

--- a/doc/fluid/api_cn/dygraph_cn/Conv2D_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Conv2D_cn.rst
@@ -5,6 +5,9 @@ Conv2D
 .. py:class:: paddle.fluid.dygraph.Conv2D(num_channels, num_filters, filter_size, stride=1, padding=0, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, dtype='float32')
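 This interface builds a callable object of the ``Conv2D`` class; see the ``code example`` for usage. It builds a two-dimensional convolution layer (Convolution2D Layer) in the network. The layer computes the output feature map from the input, the filter parameters (num_filters, filter_size), stride, padding, dilation, and groups. Input and output use the ``NCHW`` format, where N is the batch size, C is the number of feature maps, H is the feature map height, and W is the feature map width. The filter has shape [M, C, H, W], where M is the number of output feature maps, C is the number of input feature maps, H is the filter height, and W is the filter width. If groups is greater than 1, C equals the number of input feature maps divided by groups. If a bias attribute and an activation type are provided, the bias is added to the convolution result and the activation is applied to the final result. For details, see: `Convolution <http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/>`_ .
 For each input ``X``, the equation is: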

--- a/doc/fluid/api_cn/dygraph_cn/Conv3DTranspose_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Conv3DTranspose_cn.rst
@@ -6,6 +6,9 @@ Conv3DTranspose
 .. py:class:: paddle.fluid.dygraph.Conv3DTranspose(num_channels, num_filters, filter_size, output_size=None, padding=0, stride=1, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, name=None, dtype="float32")
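 This interface builds a callable object of the ``Conv3DTranspose`` class; see the ``code example`` for usage. The 3D transposed convolution layer (Convolution3D transpose layer) computes the output feature size from the input, filter, dilations, stride, and padding, or uses the output feature size given by output_size. Input and output use the NCDHW format, where ``N`` is the batch size, ``C`` is the number of channels, ``D`` is the feature depth, ``H`` is the feature height, and ``W`` is the feature width. Transposed convolution is computed as the backward pass of convolution. It is also called deconvolution, although it is not a true deconvolution. For details on the transposed convolution layer, see the notes below and 参考文献_ . If bias_attr is not False, a bias term is added to the transposed convolution. If act is not None, the corresponding activation is applied after the transposed convolution.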

--- a/doc/fluid/api_cn/dygraph_cn/Conv3D_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Conv3D_cn.rst
@@ -6,6 +6,9 @@ Conv3D
 .. py:class:: paddle.fluid.dygraph.Conv3D(num_channels, num_filters, filter_size, stride=1, padding=0, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, dtype="float32")
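 This interface builds a callable object of the ``Conv3D`` class; see the ``code example`` for usage. The 3D convolution layer (convolution3D layer) computes its output from the input, filter, stride, padding, dilations, and groups. Input and output are multi-dimensional tensors of shape [N, C, D, H, W], where N is the batch size, C is the number of channels, D is the feature depth, H is the feature height, and W is the feature width. Convolution3D is similar to Convolution2D but has an extra depth dimension. If a bias attribute and an activation type are provided, the bias is added to the convolution result and the activation is applied to the final result.
 For each input X, the equation is: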

--- a/doc/fluid/api_cn/dygraph_cn/CosineDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/CosineDecay_cn.rst
@@ -3,10 +3,13 @@
 CosineDecay
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.CosineDecay(learning_rate, step_each_epoch, epochs, begin=0, step=1, dtype='float32')
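+:api_attr: Imperative programming mode (dynamic graph)
 This interface decays the learning rate following a cosine function.
 Cosine decay is computed as follows.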
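The cosine schedule can be sketched in plain Python. This sketch assumes the usual formulation, ``learning_rate * 0.5 * (cos(cur_epoch * pi / epochs) + 1)`` with ``cur_epoch = floor(global_step / step_each_epoch)``. It also assumes that ``global_step`` already folds in the ``begin`` and ``step`` arguments. It is an illustration, not Paddle's implementation.

```python
import math


def cosine_decay(learning_rate: float, step_each_epoch: int, epochs: int, global_step: int) -> float:
    """Cosine-decayed learning rate at `global_step` (constant within an epoch)."""
    cur_epoch = global_step // step_each_epoch  # floor(global_step / step_each_epoch)
    return learning_rate * 0.5 * (math.cos(cur_epoch * math.pi / epochs) + 1)
```

For example, with ``epochs=10`` the rate halves by the start of epoch 5.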

--- a/doc/fluid/api_cn/dygraph_cn/Embedding_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Embedding_cn.rst
@@ -5,6 +5,12 @@ Embedding
 .. py:class:: paddle.fluid.dygraph.Embedding(size, is_sparse=False, is_distributed=False, padding_idx=None, param_attr=None, dtype='float32')
+:alias_main: paddle.nn.Embedding
+:alias: paddle.nn.Embedding,paddle.nn.layer.Embedding,paddle.nn.layer.common.Embedding
+:old_api: paddle.fluid.dygraph.Embedding
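 Embedding Layer
 This interface builds a callable object of ``Embedding``; see the ``code example`` for usage. For each id in input, it looks up the corresponding embedding in the embedding matrix. The two-dimensional embedding matrix is created automatically from the given size (vocab_size, emb_size) and dtype.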

--- a/doc/fluid/api_cn/dygraph_cn/ExponentialDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/ExponentialDecay_cn.rst
@@ -3,10 +3,13 @@
 ExponentialDecay
 -------------------------------
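-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.ExponentialDecay(learning_rate, decay_steps, decay_rate, staircase=False, begin=0, step=1, dtype='float32')
+:api_attr: Imperative programming mode (dynamic graph)
 This interface decays the learning rate following an exponential function.
 Exponential decay is computed as follows.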
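A plain-Python sketch of exponential decay follows. It assumes the usual form, ``learning_rate * decay_rate ** (global_step / decay_steps)``, where the exponent is floored when ``staircase=True``. It also assumes ``global_step`` already accounts for ``begin`` and ``step``. It is illustrative, not the library code.

```python
import math


def exponential_decay(learning_rate: float, decay_steps: int, decay_rate: float,
                      global_step: int, staircase: bool = False) -> float:
    div_res = global_step / decay_steps
    if staircase:
        div_res = math.floor(div_res)  # decay in discrete intervals
    return learning_rate * decay_rate ** div_res
```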

--- a/doc/fluid/api_cn/dygraph_cn/GRUUnit_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/GRUUnit_cn.rst
@@ -5,6 +5,9 @@ GRUUnit
 .. py:class:: paddle.fluid.dygraph.GRUUnit(name_scope, size, param_attr=None, bias_attr=None, activation='tanh', gate_activation='sigmoid', origin_mode=False, dtype='float32')
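 This interface builds a callable object of the ``GRU(Gated Recurrent Unit)`` class; see the ``code example`` for usage. It performs the GRU computation for a single time step and supports the following two formulations:
 If origin_mode is True, the formulas used come from the paper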

--- a/doc/fluid/api_cn/dygraph_cn/GroupNorm_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/GroupNorm_cn.rst
@@ -5,6 +5,12 @@ GroupNorm
 .. py:class:: paddle.fluid.dygraph.GroupNorm(channels, groups, epsilon=1e-05, param_attr=None, bias_attr=None, act=None, data_layout='NCHW', dtype="float32")
+:alias_main: paddle.nn.GroupNorm
+:alias: paddle.nn.GroupNorm,paddle.nn.layer.GroupNorm,paddle.nn.layer.norm.GroupNorm
+:old_api: paddle.fluid.dygraph.GroupNorm
 **Group Normalization Layer**
 This interface builds a callable object of the ``GroupNorm`` class; see the ``code example`` for usage. It implements a group normalization layer. For details, see: `Group Normalization <https://arxiv.org/abs/1803.08494>`_ .

--- a/doc/fluid/api_cn/dygraph_cn/InverseTimeDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/InverseTimeDecay_cn.rst
@@ -3,10 +3,13 @@
 InverseTimeDecay
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.InverseTimeDecay(learning_rate, decay_steps, decay_rate, staircase=False, begin=0, step=1, dtype='float32')
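+:api_attr: Imperative programming mode (dynamic graph)
 This interface applies inverse time decay to the learning rate.
 Inverse time decay is computed as follows.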
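Inverse time decay can be sketched in plain Python. This sketch assumes the usual form, ``learning_rate / (1 + decay_rate * global_step / decay_steps)``, where the ratio is floored when ``staircase=True``. Here, ``global_step`` stands in for the step counter the class derives from ``begin`` and ``step``.

```python
import math


def inverse_time_decay(learning_rate: float, decay_steps: int, decay_rate: float,
                       global_step: int, staircase: bool = False) -> float:
    div_res = global_step / decay_steps
    if staircase:
        div_res = math.floor(div_res)  # only decay at multiples of decay_steps
    return learning_rate / (1 + decay_rate * div_res)
```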

--- a/doc/fluid/api_cn/dygraph_cn/LayerList_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/LayerList_cn.rst
@@ -5,6 +5,9 @@ LayerList
 .. py:class:: paddle.fluid.dygraph.LayerList(sublayers=None)
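 LayerList holds a list of sublayers, and the sublayers it contains are properly registered and added. Sublayers in the list can be indexed like a regular Python list.
 Parameters: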

--- a/doc/fluid/api_cn/dygraph_cn/LayerNorm_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/LayerNorm_cn.rst
@@ -5,6 +5,12 @@ LayerNorm
 .. py:class:: paddle.fluid.dygraph.LayerNorm(normalized_shape, scale=True, shift=True, begin_norm_axis=1, epsilon=1e-05, param_attr=None, bias_attr=None, act=None, dtype="float32")
+:alias_main: paddle.nn.LayerNorm
+:alias: paddle.nn.LayerNorm,paddle.nn.layer.LayerNorm,paddle.nn.layer.norm.LayerNorm
+:old_api: paddle.fluid.dygraph.LayerNorm
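 This interface builds a callable object of the ``LayerNorm`` class; see the ``code example`` for usage. It implements a Layer Normalization Layer, which can be applied to mini-batch input data. For details, see: `Layer Normalization <https://arxiv.org/pdf/1607.06450v1.pdf>`_
 The computation is as follows: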
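Unlike batch normalization, layer normalization computes its statistics per sample, over the normalized dimensions. The sketch below does this in plain Python for one sample whose dimensions from ``begin_norm_axis`` onward are assumed to be already flattened into a list. It assumes the standard formulation with ``epsilon`` added to the variance. ``scale`` and ``shift`` stand in for the learned gain and bias, and ``act`` is omitted.

```python
import math
from typing import List, Optional


def layer_norm(x: List[float], scale: Optional[List[float]] = None,
               shift: Optional[List[float]] = None, epsilon: float = 1e-5) -> List[float]:
    """Normalize one flattened sample with its own mean and variance."""
    h = len(x)
    mean = sum(x) / h
    var = sum((v - mean) ** 2 for v in x) / h
    sigma = math.sqrt(var + epsilon)
    out = [(v - mean) / sigma for v in x]
    if scale is not None:
        out = [g * v for g, v in zip(scale, out)]   # learned gain
    if shift is not None:
        out = [v + b for v, b in zip(out, shift)]   # learned bias
    return out
```

Because each sample uses its own statistics, two samples that differ only by a constant factor normalize to nearly the same output.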

--- a/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Layer_cn.rst
@@ -5,6 +5,9 @@ Layer
 .. py:class:: paddle.fluid.dygraph.Layer(name_scope=None, dtype=core.VarDesc.VarType.FP32)
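 Dynamic graph Layer built on object-oriented design (OOD). It holds the Layer's parameters, the structure of its forward run, and related information.
 Parameters: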

--- a/doc/fluid/api_cn/dygraph_cn/Linear_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Linear_cn.rst
@@ -5,6 +5,12 @@ Linear
 .. py:class:: paddle.fluid.dygraph.Linear(input_dim, output_dim, param_attr=None, bias_attr=None, act=None, dtype='float32')
+:alias_main: paddle.nn.Linear
+:alias: paddle.nn.Linear,paddle.nn.layer.Linear,paddle.nn.layer.common.Linear
+:old_api: paddle.fluid.dygraph.Linear
 **Linear transformation layer:**

--- a/doc/fluid/api_cn/dygraph_cn/NCE_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/NCE_cn.rst
@@ -5,6 +5,9 @@ NCE
 .. py:class:: paddle.fluid.dygraph.NCE(num_total_classes, dim, param_attr=None, bias_attr=None, num_neg_samples=None, sampler='uniform', custom_dist=None, seed=0, is_sparse=False, dtype="float32")
 This interface builds a callable object of the ``NCE`` class; see the ``code example`` for usage. It implements the ``NCE`` loss: by default it samples from a uniform distribution and computes and returns the noise-contrastive estimation training loss. For details, see: `Noise-contrastive estimation: A new estimation principle for unnormalized statistical models <http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf>`_
 Parameters:

--- a/doc/fluid/api_cn/dygraph_cn/NaturalExpDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/NaturalExpDecay_cn.rst
@@ -3,10 +3,13 @@
 NaturalExpDecay
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.NaturalExpDecay(learning_rate, decay_steps, decay_rate, staircase=False, begin=0, step=1, dtype='float32')
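+:api_attr: Imperative programming mode (dynamic graph)
 This interface decays the learning rate following a natural exponential function.
 Natural exponential decay is computed as follows.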
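A plain-Python sketch of natural exponential decay follows. It assumes the usual form, ``learning_rate * exp(-decay_rate * global_step / decay_steps)``, where the ratio is floored when ``staircase=True``. It also assumes ``global_step`` already accounts for ``begin`` and ``step``. It is an illustration only.

```python
import math


def natural_exp_decay(learning_rate: float, decay_steps: int, decay_rate: float,
                      global_step: int, staircase: bool = False) -> float:
    div_res = global_step / decay_steps
    if staircase:
        div_res = math.floor(div_res)  # decay in discrete intervals
    return learning_rate * math.exp(-1 * decay_rate * div_res)
```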

--- a/doc/fluid/api_cn/dygraph_cn/NoamDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/NoamDecay_cn.rst
@@ -3,10 +3,13 @@
 NoamDecay
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32', learning_rate=1.0)
+:api_attr: Imperative programming mode (dynamic graph)
 This interface applies Noam decay to the learning rate.
 Noam decay is computed as follows.
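Noam decay can be sketched in plain Python. This sketch assumes the usual form, ``learning_rate * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)``. Under that form, the rate rises linearly during warmup, peaks at ``warmup_steps``, and then decays. ``global_step`` must be at least 1, because step 0 would divide by zero; this is consistent with the class default ``begin=1``.

```python
def noam_decay(d_model: int, warmup_steps: int, global_step: int,
               learning_rate: float = 1.0) -> float:
    """Noam learning rate; global_step must be >= 1."""
    a = global_step ** -0.5                       # decay branch
    b = (warmup_steps ** -1.5) * global_step      # linear warmup branch
    return learning_rate * (d_model ** -0.5) * min(a, b)
```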

--- a/doc/fluid/api_cn/dygraph_cn/PRelu_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/PRelu_cn.rst
@@ -5,6 +5,9 @@ PRelu
 .. py:class:: paddle.fluid.dygraph.PRelu(mode, input_shape=None, param_attr=None, dtype="float32")
 This interface builds a callable object of the ``PRelu`` class; see the ``code example`` for usage. It implements the three modes of the ``PRelu`` activation.
 The formula is:

--- a/doc/fluid/api_cn/dygraph_cn/ParameterList_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/ParameterList_cn.rst
@@ -5,6 +5,9 @@ ParameterList
 .. py:class:: paddle.fluid.dygraph.ParameterList(parameters=None)
 A parameter list container. It behaves like a Python list, but the parameters it holds are properly registered and added.
 Parameters:

--- a/doc/fluid/api_cn/dygraph_cn/PiecewiseDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/PiecewiseDecay_cn.rst
@@ -3,10 +3,13 @@
 PiecewiseDecay
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.PiecewiseDecay(boundaries, values, begin, step=1, dtype='float32')
+:api_attr: Imperative programming mode (dynamic graph)
 This interface applies piecewise constant decay to the initial learning rate.
 An example of piecewise constant decay:
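The lookup behind piecewise constant decay can be sketched in plain Python, using illustrative boundaries ``[10000, 20000]`` and values ``[1.0, 0.5, 0.1]``. The sketch returns the value for the first boundary that the step has not yet reached, and the last value once every boundary has been passed. The length check (one more value than boundaries) is an assumption added for safety. It is not necessarily how Paddle validates its inputs.

```python
from typing import Sequence


def piecewise_decay(boundaries: Sequence[int], values: Sequence[float], global_step: int) -> float:
    """Return values[i] for the first boundary that global_step has not reached."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    for boundary, value in zip(boundaries, values):
        if global_step < boundary:
            return value
    return values[-1]
```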

--- a/doc/fluid/api_cn/dygraph_cn/PolynomialDecay_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/PolynomialDecay_cn.rst
@@ -3,10 +3,13 @@
 PolynomialDecay
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.PolynomialDecay(learning_rate, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False, begin=0, step=1, dtype='float32')
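+:api_attr: Imperative programming mode (dynamic graph)
 This interface decays the learning rate with a polynomial function, gradually lowering it from the initial ``learning_rate`` to ``end_learning_rate``.
 The computation is as follows.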
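A plain-Python sketch of polynomial decay follows, assuming the usual formulation. Without ``cycle``, the step is clamped at ``decay_steps``, so the rate stays at ``end_learning_rate`` afterwards. With ``cycle``, ``decay_steps`` is stretched to the smallest multiple that covers the current step (treated as 1 at step 0). As in the other sketches, ``global_step`` is assumed to already account for ``begin`` and ``step``.

```python
import math


def polynomial_decay(learning_rate: float, decay_steps: int, global_step: int,
                     end_learning_rate: float = 0.0001, power: float = 1.0,
                     cycle: bool = False) -> float:
    if cycle:
        # Stretch decay_steps to the next multiple that covers global_step.
        div_res = math.ceil(global_step / decay_steps)
        if global_step == 0:
            div_res = 1
        decay_steps = decay_steps * div_res
    else:
        # Without cycling, hold at end_learning_rate after decay_steps.
        global_step = min(global_step, decay_steps)
    return ((learning_rate - end_learning_rate)
            * (1 - global_step / decay_steps) ** power
            + end_learning_rate)
```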

--- a/doc/fluid/api_cn/dygraph_cn/Pool2D_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Pool2D_cn.rst
@@ -5,6 +5,12 @@ Pool2D
 .. py:class:: paddle.fluid.dygraph.Pool2D(pool_size=-1, pool_type='max', pool_stride=1, pool_padding=0, global_pooling=False, use_cudnn=True, ceil_mode=False, exclusive=True)
+:alias_main: paddle.nn.Pool2D
+:alias: paddle.nn.Pool2D,paddle.nn.layer.Pool2D,paddle.nn.layer.common.Pool2D
+:old_api: paddle.fluid.dygraph.Pool2D
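 This interface builds a callable object of the ``Pool2D`` class; see the ``code example`` for usage. It builds a two-dimensional pooling layer in the network using the pooling configuration given by the arguments above. The output is computed from ``input``, the pooling type ``pool_type``, the kernel size ``pool_size``, the stride ``pool_stride``, and the padding ``pool_padding``.
 Input X and output Out use the NCHW format, where N is the batch size, C is the number of channels, H is the feature height, and W is the feature width. The parameters ( ``ksize``, ``strides``, ``paddings`` ) each contain two integers, one for the height dimension and one for the width dimension. The sizes of input X and output Out may differ.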
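Why the input and output sizes can differ is easiest to see from the usual per-dimension output-size arithmetic, sketched below in plain Python. The sketch assumes non-global pooling, using floor division by default and rounding up when ``ceil_mode=True``. The helper name is hypothetical; apply it once for the height and once for the width.

```python
def pool_output_size(in_size: int, ksize: int, stride: int, padding: int,
                     ceil_mode: bool = False) -> int:
    """Output length along one spatial dimension (height or width)."""
    if ceil_mode:
        # Round up so a partial window at the edge still produces an output.
        return (in_size - ksize + 2 * padding + stride - 1) // stride + 1
    return (in_size - ksize + 2 * padding) // stride + 1
```

For example, a 5-wide input with a 2-wide kernel and stride 2 yields 2 outputs by default but 3 with ``ceil_mode=True``.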

--- a/doc/fluid/api_cn/dygraph_cn/Sequential_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Sequential_cn.rst
@@ -5,6 +5,9 @@ Sequential
 .. py:class:: paddle.fluid.dygraph.Sequential(*layers)
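 Sequential container. Sublayers are added to this container in the order in which they are passed to the constructor. The constructor accepts either Layers or an iterable of (name, Layer) tuples.
 Parameters: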

--- a/doc/fluid/api_cn/dygraph_cn/SpectralNorm_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/SpectralNorm_cn.rst
@@ -5,6 +5,12 @@ SpectralNorm
 .. py:class:: paddle.fluid.dygraph.SpectralNorm(weight_shape, dim=0, power_iters=1, eps=1e-12, name=None, dtype="float32")
+:alias_main: paddle.nn.SpectralNorm
+:alias: paddle.nn.SpectralNorm,paddle.nn.layer.SpectralNorm,paddle.nn.layer.norm.SpectralNorm
+:old_api: paddle.fluid.dygraph.SpectralNorm
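 This interface builds a callable object of the ``SpectralNorm`` class; see the ``code example`` for usage. It implements spectral normalization, computing the spectral norm of the weight parameters of fc, conv1d, conv2d, and conv3d layers. The input weight must be a 2-D, 3-D, 4-D, or 5-D tensor, respectively, and the output tensor has the same shape as the input. The spectral norm is computed as follows:
 Step 1: Generate a vector U of shape [H] and a vector V of shape [W], where H is the size of the ``dim``-th dimension of the input weight tensor and W is the product of the remaining dimensions.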

--- a/doc/fluid/api_cn/dygraph_cn/TracedLayer_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/TracedLayer_cn.rst
@@ -3,10 +3,13 @@
 TracedLayer
 -------------------------------
-**Note: this API supports dynamic graph mode only.**
 .. py:class:: paddle.fluid.dygraph.TracedLayer(program, parameters, feed_names, fetch_names)
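+:api_attr: Imperative programming mode (dynamic graph)
 TracedLayer converts a forward dynamic graph model into a static graph model. Its main use is saving a dynamic graph model for online C++ inference. Users can also run inference on the converted static graph model from Python, which usually performs better than the original dynamic graph model.
 TracedLayer runs the static graph model with ``Executor`` and ``CompiledProgram``. The converted static graph model shares parameters with the original dynamic graph model.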

--- a/doc/fluid/api_cn/dygraph_cn/TreeConv_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/TreeConv_cn.rst
@@ -5,6 +5,9 @@ TreeConv
 .. py:class:: paddle.fluid.dygraph.TreeConv(feature_size, output_size, num_filters=1, max_depth=2, act='tanh', param_attr=None, bias_attr=None, name=None, dtype="float32")
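 This interface builds a callable object of the ``TreeConv`` class; see the ``Code Example`` for usage. It adds a Tree-Based Convolution operation to the network. Tree-based convolution is part of the Tree-Based Convolutional Neural Network (TBCNN), which classifies tree structures such as abstract syntax trees. Tree-Based Convolution introduces a data structure called the continuous binary tree, which treats a multi-way tree as a binary tree. For details, see the `tree-based convolution paper <https://arxiv.org/abs/1409.5718v1>`_ .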

--- a/doc/fluid/api_cn/dygraph_cn/guard_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/guard_cn.rst
@@ -3,10 +3,13 @@
 guard
 -------------------------------
-**Note: This API supports only [dynamic graph] mode**
 .. py:function:: paddle.fluid.dygraph.guard(place=None)
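+:api_attr: Imperative programming mode (dynamic graph)
 Creates a context in which dygraph (dynamic graph) mode runs, using a ``with`` statement, and executes the code inside that context.
 Parameters: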

--- a/doc/fluid/api_cn/dygraph_cn/load_dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/load_dygraph_cn.rst
@@ -3,10 +3,13 @@
 load_dygraph
 -------------------------------
-**Note: This API supports only [dynamic graph] mode**
 .. py:function:: paddle.fluid.dygraph.load_dygraph(model_path)
+:api_attr: Imperative programming mode (dynamic graph)
 This interface attempts to load a parameter or optimizer ``dict`` from disk.
 It loads the contents of both ``model_path + ".pdparams"`` and ``model_path + ".pdopt"``.

--- a/doc/fluid/api_cn/dygraph_cn/no_grad_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/no_grad_cn.rst
@@ -3,10 +3,13 @@
 no_grad
 -------------------------------
-**Note: This API supports only [dynamic graph] mode**
 .. py:method:: paddle.fluid.dygraph.no_grad(func=None)
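+:api_attr: Imperative programming mode (dynamic graph)
 Creates a context that disables gradient computation in dynamic graph mode. In this mode, the result of every computation has stop_gradient=True.
 It can also be used as a decorator (make sure not to instantiate it with parentheses).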

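The dual use described above (a context manager, or a decorator applied without parentheses) is a common Python pattern. The sketch below illustrates only that calling convention with a hypothetical module-level flag `_grad_enabled`; it is not how Paddle tracks `stop_gradient`.

```python
import functools

_grad_enabled = True  # hypothetical global switch; True means "record gradients"


def is_grad_enabled():
    return _grad_enabled


class no_grad_sketch:
    """Turns the hypothetical flag off, as a context manager or a bare decorator."""

    def __init__(self, func=None):
        self._func = func
        if func is not None:
            # Used as `@no_grad_sketch` (no parentheses): wrap the function.
            functools.update_wrapper(self, func)

    def __enter__(self):
        global _grad_enabled
        self._prev = _grad_enabled
        _grad_enabled = False
        return self

    def __exit__(self, exc_type, exc, tb):
        global _grad_enabled
        _grad_enabled = self._prev  # restore the previous state, so nesting works
        return False  # do not swallow exceptions

    def __call__(self, *args, **kwargs):
        # Run the wrapped function inside a fresh no-grad context.
        with no_grad_sketch():
            return self._func(*args, **kwargs)


@no_grad_sketch
def evaluate():
    return is_grad_enabled()
```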
--- a/doc/fluid/api_cn/dygraph_cn/prepare_context_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/prepare_context_cn.rst
@@ -5,6 +5,10 @@ prepare_context
 .. py:class:: paddle.fluid.dygraph.prepare_context(strategy=None)
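+:api_attr: Imperative programming mode (dynamic graph)
 This API configures the environment for multi-process, multi-card training and takes a ParallelStrategy struct as input. If the nums_trainer attribute of strategy is less than 2, the API returns immediately. If nums_trainer is greater than 1 and the place is a CUDAPlace, the API configures the NCCL environment, because dynamic graph mode currently supports multi-card training only on GPUs and therefore only an NCCL multi-card environment can be set up. This configuration includes generating an NCCL ID and broadcasting it to every training process for processor synchronization, and creating and configuring the NCCL communicators.
 Parameters: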

--- a/doc/fluid/api_cn/dygraph_cn/save_dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/save_dygraph_cn.rst
@@ -3,10 +3,13 @@
 save_dygraph
 -------------------------------
-**Note: This API supports only [dynamic graph] mode**
 .. py:function:: paddle.fluid.dygraph.save_dygraph(state_dict, model_path)
+:api_attr: Imperative programming mode (dynamic graph)
 This interface saves the given parameter or optimizer ``dict`` to disk.
 ``state_dict`` is obtained from the ``state_dict()`` method of :ref:`cn_api_fluid_dygraph_Layer` .

--- a/doc/fluid/api_cn/dygraph_cn/to_variable_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/to_variable_cn.rst
@@ -3,10 +3,13 @@
 to_variable
 -------------------------------
-**Note: This API supports only [dynamic graph] mode**
 .. py:function:: paddle.fluid.dygraph.to_variable(value, name=None, zero_copy=None)
+:api_attr: Imperative programming mode (dynamic graph)
 This function creates a ``Variable`` object from a numpy\.ndarray or a Variable.
 Parameters:

--- a/doc/fluid/api_cn/executor_cn/Executor_cn.rst
+++ b/doc/fluid/api_cn/executor_cn/Executor_cn.rst
@@ -3,10 +3,13 @@
 Executor
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.executor.Executor (place=None)
+:api_attr: Declarative programming mode (static graph)
 Executor supports running on a single GPU, on multiple GPUs, and on CPUs.
 Parameters:

--- a/doc/fluid/api_cn/executor_cn/global_scope_cn.rst
+++ b/doc/fluid/api_cn/executor_cn/global_scope_cn.rst
@@ -3,10 +3,13 @@
 global_scope
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.global_scope()
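+:api_attr: Declarative programming mode (static graph)
 Gets the global/default scope instance. Many APIs, such as ``Executor.run``, use the default ``global_scope``.
 Returns: the global/default scope instance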

--- a/doc/fluid/api_cn/executor_cn/scope_guard_cn.rst
+++ b/doc/fluid/api_cn/executor_cn/scope_guard_cn.rst
@@ -3,10 +3,13 @@
 scope_guard
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.executor.scope_guard (scope)
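+:api_attr: Declarative programming mode (static graph)
 This interface switches the scope via Python's ``with`` statement.
 A scope records the mapping between variable names and variables ( :ref:`api_guide_Variable` ), similar to braces in programming languages.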

--- a/doc/fluid/api_cn/fluid_cn/BuildStrategy_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/BuildStrategy_cn.rst
@@ -3,10 +3,13 @@
 BuildStrategy
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.BuildStrategy
+:api_attr: Declarative programming mode (static graph)
 ``BuildStrategy`` lets users control more easily how the computation graph is built in :ref:`cn_api_fluid_ParallelExecutor`, by setting the ``BuildStrategy`` member of ``ParallelExecutor`` .
 **Code Example**

--- a/doc/fluid/api_cn/fluid_cn/CPUPlace_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/CPUPlace_cn.rst
@@ -5,6 +5,9 @@ CPUPlace
 .. py:class:: paddle.fluid.CPUPlace
 ``CPUPlace`` is a device descriptor representing a ``CPU`` device on which a ``Tensor`` or ``LoDTensor`` is or will be allocated.
 **Code Example**

--- a/doc/fluid/api_cn/fluid_cn/CUDAPinnedPlace_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/CUDAPinnedPlace_cn.rst
@@ -5,6 +5,9 @@ CUDAPinnedPlace
 .. py:class:: paddle.fluid.CUDAPinnedPlace
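 ``CUDAPinnedPlace`` is a device descriptor referring to page-locked memory allocated in host memory by the CUDA function ``cudaHostAlloc()``. The host operating system does not page or swap this memory, and it can be accessed through direct memory access (DMA), which speeds up data copies between the host and the GPU.
 For more on CUDA data transfer and ``pinned memory``, see the `official documentation <https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#pinned-memory>`_ .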

--- a/doc/fluid/api_cn/fluid_cn/CUDAPlace_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/CUDAPlace_cn.rst
@@ -5,6 +5,9 @@ CUDAPlace
 .. py:class:: paddle.fluid.CUDAPlace
 .. note::
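    For multi-card jobs, first set the visible GPU devices with the FLAGS_selected_gpus environment variable. The issue that the CUDA_VISIBLE_DEVICES environment variable has no effect will be fixed in the next release.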

--- a/doc/fluid/api_cn/fluid_cn/CompiledProgram_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/CompiledProgram_cn.rst
@@ -3,10 +3,13 @@
 CompiledProgram
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.CompiledProgram(program_or_graph, build_strategy=None)
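+:api_attr: Declarative programming mode (static graph)
 CompiledProgram transforms and optimizes the input Program or Graph according to the `build_strategy` configuration, for example by fusing operators in the computation graph or by enabling memory/GPU memory optimization during graph execution. For more information on build_strategy, see ``fluid.BuildStrategy`` .
 Parameters: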

--- a/doc/fluid/api_cn/fluid_cn/DataFeedDesc_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/DataFeedDesc_cn.rst
@@ -3,10 +3,13 @@
 DataFeedDesc
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.DataFeedDesc(proto_file)
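+:api_attr: Declarative programming mode (static graph)
 Describes the format of the training data. The input is a file path whose content is a protobuf message.
 See :code:`paddle/fluid/framework/data_feed.proto` for how the message is defined.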

--- a/doc/fluid/api_cn/fluid_cn/DataFeeder_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/DataFeeder_cn.rst
@@ -3,10 +3,13 @@
 DataFeeder
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.DataFeeder(feed_list, place, program=None)
+:api_attr: Declarative programming mode (static graph)
 ``DataFeeder`` converts the data returned by a reader into a special data structure so that it can be fed into ``Executor`` and ``ParallelExecutor`` .

--- a/doc/fluid/api_cn/fluid_cn/DistributeTranspilerConfig_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/DistributeTranspilerConfig_cn.rst
@@ -6,6 +6,9 @@ DistributeTranspilerConfig
 .. py:class:: paddle.fluid.DistributeTranspilerConfig
 .. py:attribute:: slice_var_up (bool)
 Whether to slice tensors across multiple Pservers (parameter servers). Defaults to True.

--- a/doc/fluid/api_cn/fluid_cn/DistributeTranspiler_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/DistributeTranspiler_cn.rst
@@ -6,6 +6,9 @@ DistributeTranspiler
 .. py:class:: paddle.fluid.DistributeTranspiler (config=None)
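 This class converts a fluid program into a program for distributed data-parallel computation. It has two modes: PServer and NCCL2.
 In Pserver (parameter server) mode, ``transpile`` converts a ``program`` written for single-machine training into a program that trains on the parameter server (PServer) distributed architecture.
 In NCCL2 mode, ``transpile`` converts a ``program`` written for single-machine training into a program that trains on the NCCL2 distributed architecture. In NCCL2 mode, the transpiler appends an ``NCCL_ID`` broadcast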

--- a/doc/fluid/api_cn/fluid_cn/ExecutionStrategy_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/ExecutionStrategy_cn.rst
@@ -3,10 +3,13 @@
 ExecutionStrategy
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.ExecutionStrategy
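+:api_attr: Declarative programming mode (static graph)
 By setting the options in ``ExecutionStrategy``, users can tune the executor's execution configuration, for example the size of the executor's thread pool.
 Returns: an initialized ExecutionStrategy instance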

--- a/doc/fluid/api_cn/fluid_cn/Executor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/Executor_cn.rst
@@ -4,10 +4,13 @@ Executor
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.Executor (place=None)
+:api_attr: Declarative programming mode (static graph)
 Executor supports running on a single GPU, on multiple GPUs, and on CPUs.
 Parameters:

--- a/doc/fluid/api_cn/fluid_cn/LoDTensorArray_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/LoDTensorArray_cn.rst
@@ -5,6 +5,9 @@ LoDTensorArray
 .. py:class:: paddle.fluid.LoDTensorArray
 LoDTensorArray is an array of LoDTensors that supports the "[]" operator, the len() function, for-loop iteration and so on.
 **Code Example**

--- a/doc/fluid/api_cn/fluid_cn/LoDTensor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/LoDTensor_cn.rst
@@ -6,6 +6,9 @@ LoDTensor
 .. py:class:: paddle.fluid.LoDTensor
 LoDTensor is a Tensor carrying LoD (Level of Details) information and can represent variable-length sequences; see :ref:`cn_user_guide_lod_tensor` for details.
 A LoDTensor can be converted to a numpy.ndarray with ``np.array(lod_tensor)`` .

--- a/doc/fluid/api_cn/fluid_cn/ParallelExecutor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/ParallelExecutor_cn.rst
@@ -3,10 +3,13 @@
 ParallelExecutor
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.ParallelExecutor(use_cuda, loss_name=None, main_program=None, share_vars_from=None, exec_strategy=None, build_strategy=None, num_trainers=1, trainer_id=0, scope=None)
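+:api_attr: Declarative programming mode (static graph)
 ``ParallelExecutor`` is an upgraded version of ``Executor`` that supports data-parallel multi-node model training and testing. In data-parallel mode, ``ParallelExecutor`` distributes the parameters to the different nodes at construction time and copies the input ``Program`` to each node. During execution, each node runs the model independently; the parameter gradients computed in the backward pass are aggregated across the nodes, after which each node updates its parameters independently. When the model runs on GPUs ( ``use_cuda=True`` ), a node is a GPU, and ``ParallelExecutor`` automatically uses the GPUs available on the current machine; users can also specify the available GPUs with an environment variable, for example export CUDA_VISIBLE_DEVICES=0,1 to compute on GPU0 and GPU1. When running on CPUs ( ``use_cuda=False`` ), a node is a CPU. **Note: in this case users must manually add CPU_NUM to the environment variables and set it to the number of CPU devices, e.g. export CPU_NUM=4. If this environment variable is not set, the executor adds it and sets its value to 1.**
 Parameters: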

--- a/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst
@@ -7,6 +7,9 @@ ParamAttr
 .. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False)
 .. note::
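    The ``gradient_clip`` attribute of this class will be deprecated in version 2.0; setting gradient clipping when initializing the ``optimizer`` is recommended instead. There are three clipping strategies: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` , 
    :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue` .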

--- a/doc/fluid/api_cn/fluid_cn/Program_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/Program_cn.rst
@@ -5,6 +5,9 @@ Program
 .. py:class::  paddle.fluid.Program
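 **Note: by default, Paddle Fluid contains** :ref:`cn_api_fluid_default_startup_program` **and** :ref:`cn_api_fluid_default_main_program` **, which share parameters.** :ref:`cn_api_fluid_default_startup_program` **runs only once to initialize the parameters, while** :ref:`cn_api_fluid_default_main_program` **runs in every mini-batch and updates the weights.**
 Program is Paddle Fluid's static description of a computation graph; a Program can be created with the Program constructor. A Program contains at least one :ref:`api_guide_Block` . When a :ref:`api_guide_Block` contains a control-flow OP with conditional selection (such as :ref:`cn_api_fluid_layers_While` ), the Program contains nested :ref:`api_guide_Block` objects: the :ref:`api_guide_Block` outside the control flow contains the :ref:`api_guide_Block` inside it, and access to the elements of a nested :ref:`api_guide_Block` is governed by the specific control-flow OP. For the detailed structure of a Program and the types it contains, see `framework.proto <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/framework.proto>`_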

--- a/doc/fluid/api_cn/fluid_cn/Tensor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/Tensor_cn.rst
@@ -5,6 +5,9 @@ Tensor
 .. py:function:: paddle.fluid.Tensor
 Tensor represents a multi-dimensional tensor and can be converted to a numpy.ndarray with ``np.array(tensor)`` .
 **Code Example**

--- a/doc/fluid/api_cn/fluid_cn/Variable_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/Variable_cn.rst
@@ -5,6 +5,9 @@ Variable
 .. py:class:: paddle.fluid.Variable
 **Note:**
  **1. Do not call the constructor of** `Variable` **directly, as this leads to serious errors!**

--- a/doc/fluid/api_cn/fluid_cn/WeightNormParamAttr_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/WeightNormParamAttr_cn.rst
@@ -3,10 +3,13 @@
 WeightNormParamAttr
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False)
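+:api_attr: Declarative programming mode (static graph)
 .. note::
    The ``gradient_clip`` attribute of this class will be deprecated in version 2.0; setting gradient clipping when initializing the ``optimizer`` is recommended instead. There are three clipping strategies: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` , 
    :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue` .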

--- a/doc/fluid/api_cn/fluid_cn/cpu_places_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/cpu_places_cn.rst
@@ -5,6 +5,9 @@ cpu_places
 .. py:function:: paddle.fluid.cpu_places(device_count=None)
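 This interface creates ``device_count`` ``fluid.CPUPlace`` objects and returns them as a list.
 If ``device_count`` is ``None``, the number of devices is determined by the ``CPU_NUM`` environment variable. If ``CPU_NUM`` is not set, the number of devices defaults to 1, i.e. ``CPU_NUM=1``.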

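The device-count rule above can be written down as a tiny function. `resolve_cpu_device_count` is a hypothetical helper that only mirrors the described precedence (explicit argument, then ``CPU_NUM``, then 1); it does not create any ``CPUPlace`` objects.

```python
import os


def resolve_cpu_device_count(device_count=None, environ=None):
    """Hypothetical helper mirroring the rule described for cpu_places."""
    if device_count is not None:
        # An explicit argument always wins.
        return device_count
    env = os.environ if environ is None else environ
    # Otherwise use CPU_NUM, falling back to 1 when it is unset.
    return int(env.get("CPU_NUM", "1"))


print(resolve_cpu_device_count(environ={"CPU_NUM": "4"}))  # 4
```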
--- a/doc/fluid/api_cn/fluid_cn/create_lod_tensor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/create_lod_tensor_cn.rst
@@ -6,6 +6,9 @@ create_lod_tensor
 .. py:function:: paddle.fluid.create_lod_tensor(data, recursive_seq_lens, place)
 Creates a new LoDTensor from a numpy array, a list, or a LoDTensor.
 It works as follows:

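``recursive_seq_lens`` describes LoD by sequence lengths, while LoDTensor also exposes an offset-based view (its ``lod()`` method). The pure-Python sketch below shows how lengths map to offsets and how the levels must nest (each level's lengths sum to the number of entries in the next level, and the last level sums to the number of data rows). `lengths_to_offsets` and `check_lod` are hypothetical helpers for illustration.

```python
def lengths_to_offsets(recursive_seq_lens):
    """Convert length-based LoD (recursive_seq_lens) to offset-based LoD."""
    lod = []
    for level in recursive_seq_lens:
        offsets = [0]
        for length in level:
            # Each offset is the running total of the lengths so far.
            offsets.append(offsets[-1] + length)
        lod.append(offsets)
    return lod


def check_lod(recursive_seq_lens, num_rows):
    """True if the levels nest correctly and cover exactly `num_rows` rows."""
    for upper, lower in zip(recursive_seq_lens, recursive_seq_lens[1:]):
        if sum(upper) != len(lower):
            return False
    return sum(recursive_seq_lens[-1]) == num_rows


# Two sequences of lengths 2 and 3 over 5 data rows.
print(lengths_to_offsets([[2, 3]]))  # [[0, 2, 5]]
```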
--- a/doc/fluid/api_cn/fluid_cn/create_random_int_lodtensor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/create_random_int_lodtensor_cn.rst
@@ -4,10 +4,13 @@
 create_random_int_lodtensor
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low, high)
+:api_attr: Declarative programming mode (static graph)
 Creates a LoDTensor containing random integers.
 It works as follows:

--- a/doc/fluid/api_cn/fluid_cn/cuda_pinned_places_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/cuda_pinned_places_cn.rst
@@ -8,6 +8,9 @@ cuda_pinned_places
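 This interface creates ``device_count`` ``fluid.CUDAPinnedPlace`` ( fluid. :ref:`cn_api_fluid_CUDAPinnedPlace` ) objects and returns them as a list.
 If ``device_count`` is ``None``, the actual number of devices is determined by the number of GPUs used by the current job. Users can set the GPUs available to the job in either of the following two ways: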

--- a/doc/fluid/api_cn/fluid_cn/cuda_places_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/cuda_places_cn.rst
@@ -5,6 +5,9 @@ cuda_places
 .. py:function:: paddle.fluid.cuda_places(device_ids=None)
 .. note::
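    For multi-card jobs, first set the visible GPU devices with the FLAGS_selected_gpus environment variable. The issue that the CUDA_VISIBLE_DEVICES environment variable has no effect will be fixed in the next release.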

--- a/doc/fluid/api_cn/fluid_cn/data_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/data_cn.rst
@@ -3,10 +3,16 @@
 data
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.data(name, shape, dtype='float32', lod_level=0)
+:api_attr: Declarative programming mode (static graph)
+:alias_main: paddle.nn.data
+:alias: paddle.nn.data,paddle.nn.input.data
+:old_api: paddle.fluid.data
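 This OP creates a Variable in the global block; the global variable can be accessed by the operators in the computation graph. The variable serves as a placeholder for data input, for example when an Executor feeds data into it.
 Note: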

--- a/doc/fluid/api_cn/fluid_cn/default_main_program_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/default_main_program_cn.rst
@@ -6,6 +6,9 @@ default_main_program
 .. py:function:: paddle.fluid.default_main_program()
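 This interface gets the ``default main program`` currently used to store the descriptions of ops and variables.
 Ops and variables added through the ``fluid.layers`` interfaces are stored in the ``default main program`` .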

--- a/doc/fluid/api_cn/fluid_cn/default_startup_program_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/default_startup_program_cn.rst
@@ -10,6 +10,9 @@ default_startup_program
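 This function gets the default/global startup :ref:`cn_api_fluid_Program` (the program that runs initialization).
 The functions in :ref:`_cn_api_fluid_layers` create parameters, :ref:`cn_api_paddle_data_reader_reader` (readers) or `NCCL <https://developer.nvidia.com/nccl>`_ handles as global variables.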

--- a/doc/fluid/api_cn/fluid_cn/embedding_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/embedding_cn.rst
@@ -3,10 +3,13 @@
 embedding
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.embedding(input, size, is_sparse=False, is_distributed=False, padding_idx=None, param_attr=None, dtype='float32')
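+:api_attr: Declarative programming mode (static graph)
 This OP looks up the embeddings for the ids in input from the embedding matrix. The function automatically builds a 2-D embedding matrix from the given size (vocab_size, emb_size) and dtype.
 The shape of the output Tensor is the input Tensor's shape with an emb_size dimension appended after its last dimension.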

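The lookup and the resulting shape rule can be sketched with NumPy fancy indexing: indexing a (vocab_size, emb_size) table with an id array of any shape appends emb_size to that shape. The ``padding_idx`` handling below assumes, as we understand that parameter, that rows for the padding id are output as zeros and that a negative value counts back from vocab_size; `embedding_lookup` is a hypothetical helper, not Paddle's implementation.

```python
import numpy as np


def embedding_lookup(ids, table, padding_idx=None):
    """Hypothetical NumPy sketch of an embedding lookup."""
    ids = np.asarray(ids)
    vocab_size = table.shape[0]
    out = table[ids]  # shape: ids.shape + (emb_size,)
    if padding_idx is not None:
        if padding_idx < 0:
            # Assumed convention: a negative index counts from the end of the vocab.
            padding_idx += vocab_size
        # Zero out the embeddings of every position holding the padding id.
        out = np.where((ids == padding_idx)[..., None], 0.0, out)
    return out


print(embedding_lookup([[1, 2]], np.eye(3)).shape)  # (1, 2, 3)
```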
--- a/doc/fluid/api_cn/fluid_cn/global_scope_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/global_scope_cn.rst
@@ -3,10 +3,13 @@
 global_scope
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.global_scope()
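+:api_attr: Declarative programming mode (static graph)
 Gets the global/default scope instance. Many APIs, such as ``Executor.run``, use the default ``global_scope``.
 Returns: the global/default scope instance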

--- a/doc/fluid/api_cn/fluid_cn/gradients_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/gradients_cn.rst
@@ -3,10 +3,13 @@
 gradients
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.gradients(targets, inputs, target_gradients=None, no_grad_set=None)
+:api_attr: Declarative programming mode (static graph)
 Backpropagates the gradients of the targets to the inputs.
 Parameters:

--- a/doc/fluid/api_cn/fluid_cn/in_dygraph_mode_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/in_dygraph_mode_cn.rst
@@ -5,6 +5,9 @@ in_dygraph_mode
 .. py:function:: paddle.fluid.in_dygraph_mode()
 This interface checks whether the program is running in dynamic graph mode.
 Dynamic graph mode can be enabled with the ``fluid.dygraph.guard`` interface.

--- a/doc/fluid/api_cn/fluid_cn/is_compiled_with_cuda_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/is_compiled_with_cuda_cn.rst
@@ -5,6 +5,9 @@ is_compiled_with_cuda
 .. py:function:: paddle.fluid.is_compiled_with_cuda()
 Checks whether the ``whl`` package can be used to run models on GPUs.
 Returns: True if GPUs are supported, otherwise False.

--- a/doc/fluid/api_cn/fluid_cn/load_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/load_cn.rst
@@ -5,6 +5,10 @@ load
 .. py:function:: paddle.fluid.load(program, model_path, executor=None, var_list=None)
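+:api_attr: Declarative programming mode (static graph)
 This interface filters the parameters and optimizer information out of the Program and then loads their values from file.
 An exception is raised if the shape or data type of a parameter differs between the Program and the loaded file.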

--- a/doc/fluid/api_cn/fluid_cn/load_op_library_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/load_op_library_cn.rst
@@ -5,6 +5,10 @@ load_op_library
 .. py:class:: paddle.fluid.load_op_library
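+:api_attr: Declarative programming mode (static graph)
 ``load_op_library`` is used with custom C++ operators to load the operators' shared library. Once the library is loaded, the registered operators and their kernel implementations can be called in the PaddlePaddle main process. Note that a custom operator's type must not be the same as the type of an existing operator in the framework.
 Parameters: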

--- a/doc/fluid/api_cn/fluid_cn/memory_optimize_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/memory_optimize_cn.rst
@@ -3,9 +3,12 @@
 memory_optimize
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=True)
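+:api_attr: Declarative programming mode (static graph)
 **This interface is no longer recommended as of version 1.6. Do not use it in new code; versions 1.6+ enable a better memory optimization strategy by default.**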
--- a/doc/fluid/api_cn/fluid_cn/name_scope_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/name_scope_cn.rst
@@ -3,10 +3,13 @@
 name_scope
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.name_scope(prefix=None)
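+:api_attr: Declarative programming mode (static graph)
 This function generates separate namespaces for operators. It is intended only for debugging and visualization and is not recommended for other uses.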

--- a/doc/fluid/api_cn/fluid_cn/one_hot_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/one_hot_cn.rst
@@ -5,6 +5,12 @@ one_hot
 .. py:function:: paddle.fluid.one_hot(input, depth, allow_out_of_range=False)
+:alias_main: paddle.nn.functional.one_hot
+:alias: paddle.nn.functional.one_hot,paddle.nn.functional.common.one_hot
+:old_api: paddle.fluid.one_hot
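 This OP converts each id in input into a one-hot vector of length ``depth``, in which the element at the id's position is 1 and all other elements are 0.
 The shape of the output Tensor (or LoDTensor) is the input shape with a depth dimension appended after its last dimension.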

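The conversion and the appended ``depth`` dimension can be sketched in NumPy by comparing each id with ``arange(depth)``. The out-of-range handling assumes, as we understand ``allow_out_of_range``, that such ids yield all-zero vectors when it is True and raise an error otherwise; treating negative ids as out of range is a further assumption. `one_hot` here is a hypothetical helper, not Paddle's OP.

```python
import numpy as np


def one_hot(ids, depth, allow_out_of_range=False):
    """Hypothetical NumPy sketch; output shape is ids.shape + (depth,)."""
    ids = np.asarray(ids)
    # Assumption of this sketch: negative ids also count as out of range.
    out_of_range = (ids < 0) | (ids >= depth)
    if out_of_range.any() and not allow_out_of_range:
        raise ValueError("id out of range [0, depth)")
    # Compare every id with 0..depth-1; an out-of-range id matches nothing,
    # so its vector is all zeros.
    return (ids[..., None] == np.arange(depth)).astype(np.float32)


print(one_hot([1, 3], 4))
```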
--- a/doc/fluid/api_cn/fluid_cn/program_guard_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/program_guard_cn.rst
@@ -3,10 +3,13 @@
 program_guard
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.program_guard(main_program, startup_program=None)
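+:api_attr: Declarative programming mode (static graph)
 This interface should be used with Python's ``with`` statement to add the operators and variables inside the ``with`` block to the specified global main program and startup program.
 The fluid.layers interfaces called inside the ``with`` block add operators and variables to the new main program.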

--- a/doc/fluid/api_cn/fluid_cn/release_memory_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/release_memory_cn.rst
@@ -3,8 +3,11 @@
 release_memory
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.release_memory(input_program, skip_opt_set=None)
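+:api_attr: Declarative programming mode (static graph)
 **This interface is no longer recommended as of version 1.6. Do not use it in new code; versions 1.6+ enable a better memory optimization strategy by default.**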
--- a/doc/fluid/api_cn/fluid_cn/require_version_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/require_version_cn.rst
@@ -4,6 +4,9 @@ require_version
 -------------------------------
 .. py:function:: paddle.fluid.require_version(min_version, max_version=None)
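 This interface checks whether the installed PaddlePaddle version lies in [``min_version``, ``max_version``] (both ``min_version`` and ``max_version`` inclusive). If the installed version is lower than ``min_version`` or higher than ``max_version``, an exception is raised. This interface returns nothing.
 Parameters: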

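The inclusive range check can be illustrated with a simplified comparator over numeric, dot-separated version strings. This is a hypothetical sketch: padding versions to four fields (so "1.8" equals "1.8.0") is an assumption, pre-release tags are not handled, and Paddle's own implementation may treat edge cases differently.

```python
def _parse_version(version):
    """'1.7.2' -> (1, 7, 2, 0); assumes purely numeric, dot-separated parts."""
    parts = [int(p) for p in version.split(".")]
    # Pad to four fields so that '1.8' and '1.8.0' compare as equal.
    return tuple(parts + [0] * (4 - len(parts)))


def check_version_range(installed, min_version, max_version=None):
    """Raise if `installed` lies outside [min_version, max_version] (inclusive)."""
    current = _parse_version(installed)
    if current < _parse_version(min_version):
        raise RuntimeError(
            "installed version %s is lower than %s" % (installed, min_version))
    if max_version is not None and current > _parse_version(max_version):
        raise RuntimeError(
            "installed version %s is higher than %s" % (installed, max_version))


check_version_range("1.8.0", "1.7")  # passes silently, returns None
```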
--- a/doc/fluid/api_cn/fluid_cn/save_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/save_cn.rst
@@ -3,10 +3,16 @@
 save
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.save(program, model_path)
+:api_attr: Declarative programming mode (static graph)
+:alias_main: paddle.save
+:alias: paddle.save,paddle.tensor.save,paddle.tensor.io.save
+:old_api: paddle.fluid.save
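 This interface saves the given parameters, optimizer information and network description to ``model_path``.
 The parameters, which include every trainable :ref:`cn_api_fluid_Variable` , are saved to a file with the ``.pdparams`` suffix.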

--- a/doc/fluid/api_cn/fluid_cn/scope_guard_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/scope_guard_cn.rst
@@ -3,10 +3,13 @@
 scope_guard
 -------------------------------
-**Note: This API supports only [static graph] mode**
 .. py:function:: paddle.fluid.scope_guard(scope)
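+:api_attr: Declarative programming mode (static graph)
 This interface switches the scope via Python's ``with`` statement.
 A scope records the mapping between variable names and variables ( :ref:`api_guide_Variable` ), similar to braces in programming languages.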

--- a/doc/fluid/api_cn/framework_cn/manual_seed_cn.rst
+++ b/doc/fluid/api_cn/framework_cn/manual_seed_cn.rst
@@ -5,6 +5,11 @@ manual_seed
 .. py:function:: paddle.framework.manual_seed(seed)
+:alias_main: paddle.manual_seed
+:alias: paddle.manual_seed,paddle.framework.random.manual_seed
 Sets and fixes the random seed. Once manual_seed is set, the random_seed parameter of user-defined Programs is set to the same seed.

--- a/doc/fluid/api_cn/initializer_cn/BilinearInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/BilinearInitializer_cn.rst
 .. _cn_api_fluid_initializer_BilinearInitializer:
 BilinearInitializer
 -------------------------------
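 .. py:class:: paddle.fluid.initializer.BilinearInitializer()
 This interface is a parameter initializer for transposed convolution that upsamples the input. Users can enlarge a feature map of shape (B, C, H, W) by any integer factor.
 Returns: an initializer object
 Usage:
 **Code Example**: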
 .. code-block:: python
    import paddle.fluid as fluid
    import math
    factor = 2
    C = 2
    H = W = 32
    w_attr = fluid.ParamAttr(
        learning_rate=0.,
        regularizer=fluid.regularizer.L2Decay(0.),
        initializer=fluid.initializer.BilinearInitializer())
    x = fluid.layers.data(name="data", shape=[4, H, W],
                          dtype="float32")
    conv_up = fluid.layers.conv2d_transpose(
        input=x,
        num_filters=C,
        output_size=None,
        filter_size=2 * factor - factor % 2,
        padding=int(math.ceil((factor - 1) / 2.)),
        stride=factor,
        groups=C,
        param_attr=w_attr,
        bias_attr=False)
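 The code above passes the input x (shape=[-1, 4, H, W]) through a transposed convolution to produce an output of shape [-1, C, H*factor, W*factor]. num_filters = C and groups = C mean this is a channel-wise transposed convolution with C output channels and C groups. The filter has shape (C,1,K,K), where K is filter_size. The initializer sets a (K,K) interpolation kernel for each channel of the filter. The final output feature map has shape (B,C,factor*H,factor*W). Note that the learning rate and weight decay are set to 0 so that the bilinear interpolation coefficients stay unchanged during training.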
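For reference, the (K,K) interpolation kernel can be sketched in NumPy with the standard FCN-style bilinear formula. This assumes Paddle's BilinearInitializer follows the same formula; `bilinear_kernel` and `bilinear_filter` are hypothetical helpers, and `size` plays the role of filter_size = 2 * factor - factor % 2 from the example (4 when factor is 2).

```python
import math

import numpy as np


def bilinear_kernel(size):
    """(size, size) bilinear interpolation kernel, FCN-style."""
    f = math.ceil(size / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    coords = np.arange(size)
    # 1-D weights fall off linearly with distance from the kernel centre.
    weights_1d = 1 - np.abs(coords / f - c)
    # The 2-D kernel is the outer product of the 1-D weights.
    return np.outer(weights_1d, weights_1d)


def bilinear_filter(channels, size):
    """Same kernel for every channel: filter shape (channels, 1, size, size)."""
    return np.tile(bilinear_kernel(size), (channels, 1, 1, 1))


factor = 2
print(bilinear_kernel(2 * factor - factor % 2)[0])  # [0.0625 0.1875 0.1875 0.0625]
```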
--- a/doc/fluid/api_cn/initializer_cn/Bilinear_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/Bilinear_cn.rst
 .. _cn_api_fluid_initializer_Bilinear:
 Bilinear
 -------------------------------
 .. py:attribute:: paddle.fluid.initializer.Bilinear
-Alias of ``BilinearInitializer``
+:alias_main: paddle.nn.initializer.Bilinear
+:alias: paddle.nn.initializer.Bilinear
+:old_api: paddle.fluid.initializer.Bilinear
+Alias of ``BilinearInitializer``
--- a/doc/fluid/api_cn/initializer_cn/ConstantInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/ConstantInitializer_cn.rst
@@ -5,6 +5,9 @@ ConstantInitializer
 .. py:class:: paddle.fluid.initializer.ConstantInitializer(value=0.0, force_cpu=False)
 This interface is a constant initializer for weight initialization; it initializes the input variable with the given value.
 Parameters:

--- a/doc/fluid/api_cn/initializer_cn/Constant_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/Constant_cn.rst
 .. _cn_api_fluid_initializer_Constant:
 Constant
 -------------------------------
 .. py:attribute:: paddle.fluid.initializer.Constant
-Alias of ``ConstantInitializer``
+:alias_main: paddle.nn.initializer.Constant
+:alias: paddle.nn.initializer.Constant
+:old_api: paddle.fluid.initializer.Constant
+Alias of ``ConstantInitializer``
--- a/doc/fluid/api_cn/initializer_cn/MSRAInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/MSRAInitializer_cn.rst
 .. _cn_api_fluid_initializer_MSRAInitializer:
 MSRAInitializer
 -------------------------------
 .. py:class:: paddle.fluid.initializer.MSRAInitializer(uniform=True, fan_in=None, seed=0)
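-This interface implements MSRA weight initialization (a.k.a. Kaiming initialization).
-This interface is a weight initializer based on the paper by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun: `Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification <https://arxiv.org/abs/1502.01852>`_ . It is a particularly robust initialization method that accounts for rectifier nonlinearities.
-Weights can be drawn from either a uniform or a normal distribution.
-For the uniform distribution, the range is [-x, x], where:
-.. math::
-    x = \sqrt{\frac{6.0}{fan\_in}}
-For the normal distribution, the mean is 0 and the standard deviation is:
-.. math::
-    \sqrt{\frac{2.0}{fan\_in}}
-Parameters:
-    - **uniform** (bool) - True to use a uniform distribution, False to use a normal distribution
-    - **fan_in** (float16|float32) - fan_in of MSRAInitializer. If None, fan_in is inferred from the variable; it is usually left as None
-    - **seed** (int32) - random seed
+This interface implements MSRA weight initialization (a.k.a. Kaiming initialization).
+This interface is a weight initializer based on the paper by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun: `Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification <https://arxiv.org/abs/1502.01852>`_ . It is a particularly robust initialization method that accounts for rectifier nonlinearities.
+Weights can be drawn from either a uniform or a normal distribution.
+For the uniform distribution, the range is [-x, x], where:
+.. math::
+    x = \sqrt{\frac{6.0}{fan\_in}}
+For the normal distribution, the mean is 0 and the standard deviation is:
+.. math::
+    \sqrt{\frac{2.0}{fan\_in}}
+Parameters:
+    - **uniform** (bool) - True to use a uniform distribution, False to use a normal distribution
+    - **fan_in** (float16|float32) - fan_in of MSRAInitializer. If None, fan_in is inferred from the variable; it is usually left as None
+    - **seed** (int32) - random seed
 Returns: an initializer object
 .. note::
    In most cases, it is recommended to leave fan_in as None.
 **Code Example**: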
 .. code-block:: python
    import paddle.fluid as fluid
    x = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
    fc = fluid.layers.fc(input=x, size=10, param_attr=fluid.initializer.MSRAInitializer(uniform=False))
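The two formulas above can be checked numerically. The NumPy sketch below draws weights from U(-x, x) with x = sqrt(6 / fan_in), or from a normal distribution with mean 0 and standard deviation sqrt(2 / fan_in). The helper `msra_init`, its fan-in convention for conv weights, and passing `seed` straight to NumPy are assumptions for illustration, not Paddle's implementation.

```python
import numpy as np


def msra_init(shape, uniform=True, fan_in=None, seed=0):
    """Hypothetical NumPy sketch of MSRA (Kaiming) initialization."""
    # Assumption of this sketch: `seed` seeds NumPy's generator directly.
    rng = np.random.default_rng(seed)
    if fan_in is None:
        # Assumed fan-in convention: shape[0] for fc weights [in, out],
        # in_channels * kernel area for conv weights [out, in, kh, kw].
        if len(shape) <= 2:
            fan_in = shape[0]
        else:
            fan_in = shape[1] * int(np.prod(shape[2:]))
    if uniform:
        limit = np.sqrt(6.0 / fan_in)  # x = sqrt(6 / fan_in)
        return rng.uniform(-limit, limit, size=shape)
    std = np.sqrt(2.0 / fan_in)        # std = sqrt(2 / fan_in)
    return rng.normal(0.0, std, size=shape)


w = msra_init((1024, 10), uniform=False)
print(w.shape)  # (1024, 10)
```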
--- a/doc/fluid/api_cn/initializer_cn/MSRA_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/MSRA_cn.rst
 .. _cn_api_fluid_initializer_MSRA:
 MSRA
 -------------------------------
 .. py:attribute:: paddle.fluid.initializer.MSRA
-Alias of ``MSRAInitializer``
+:alias_main: paddle.nn.initializer.MSRA
+:alias: paddle.nn.initializer.MSRA
+:old_api: paddle.fluid.initializer.MSRA
+Alias of ``MSRAInitializer``
--- a/doc/fluid/api_cn/initializer_cn/NormalInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/NormalInitializer_cn.rst
 .. _cn_api_fluid_initializer_NormalInitializer:
 NormalInitializer
 -------------------------------
 .. py:class:: paddle.fluid.initializer.NormalInitializer(loc=0.0, scale=1.0, seed=0)
-Random normal (Gaussian) distribution initializer.
-Parameters:
-    - **loc** (float16|float32) - mean of the normal distribution
-    - **scale** (float16|float32) - standard deviation of the normal distribution
-    - **seed** (int32) - random seed
-Returns: an initializer object
-**Code Example**
-.. code-block:: python
-    import paddle.fluid as fluid
-    x = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
-    fc = fluid.layers.fc(input=x, size=10,
-        param_attr=fluid.initializer.Normal(loc=0.0, scale=2.0))
+Random normal (Gaussian) distribution initializer.
+Parameters:
+    - **loc** (float16|float32) - mean of the normal distribution
+    - **scale** (float16|float32) - standard deviation of the normal distribution
+    - **seed** (int32) - random seed
+Returns: an initializer object
+**Code Example**
+.. code-block:: python
+    import paddle.fluid as fluid
+    x = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
+    fc = fluid.layers.fc(input=x, size=10,
+        param_attr=fluid.initializer.Normal(loc=0.0, scale=2.0))
--- a/doc/fluid/api_cn/initializer_cn/Normal_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/Normal_cn.rst
 .. _cn_api_fluid_initializer_Normal:
 Normal
 -------------------------------
 .. py:attribute:: paddle.fluid.initializer.Normal
-Alias of ``NormalInitializer``
+:alias_main: paddle.nn.initializer.Normal
+:alias: paddle.nn.initializer.Normal
+:old_api: paddle.fluid.initializer.Normal
+Alias of ``NormalInitializer``
--- a/doc/fluid/api_cn/initializer_cn/NumpyArrayInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/NumpyArrayInitializer_cn.rst
 .. _cn_api_fluid_initializer_NumpyArrayInitializer:
 NumpyArrayInitializer
 -------------------------------
 .. py:class:: paddle.fluid.initializer.NumpyArrayInitializer(value)
-This OP initializes a parameter variable with a NumPy array.
-Parameters:
-        - **value** (numpy) - a NumPy array used to initialize the variable.
-Returns: a Tensor
-Return type: Variable
-**Code Example**
-.. code-block:: python
-    import paddle.fluid as fluid
-    x = fluid.layers.data(name="x", shape=[5], dtype='float32')
-    fc = fluid.layers.fc(input=x, size=10,
-        param_attr=fluid.initializer.NumpyArrayInitializer(numpy.array([1,2])))
+This OP initializes a parameter variable with a NumPy array.
+Parameters:
+        - **value** (numpy) - a NumPy array used to initialize the variable.
+Returns: a Tensor
+Return type: Variable
+**Code Example**
+.. code-block:: python
+    import numpy
+    import paddle.fluid as fluid
+    x = fluid.layers.data(name="x", shape=[5], dtype='float32')
+    # The array must have the same shape as the fc weight, here [5, 10].
+    fc = fluid.layers.fc(input=x, size=10,
+        param_attr=fluid.initializer.NumpyArrayInitializer(numpy.random.rand(5, 10).astype('float32')))
--- a/doc/fluid/api_cn/initializer_cn/TruncatedNormalInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/TruncatedNormalInitializer_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/TruncatedNormal_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/TruncatedNormal_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/UniformInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/UniformInitializer_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/Uniform_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/Uniform_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/XavierInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/XavierInitializer_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/Xavier_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/Xavier_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/force_init_on_cpu_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/force_init_on_cpu_cn.rst
--- a/doc/fluid/api_cn/initializer_cn/init_on_cpu_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/init_on_cpu_cn.rst
--- a/doc/fluid/api_cn/io_cn/DataLoader_cn.rst
+++ b/doc/fluid/api_cn/io_cn/DataLoader_cn.rst
--- a/doc/fluid/api_cn/io_cn/PyReader_cn.rst
+++ b/doc/fluid/api_cn/io_cn/PyReader_cn.rst
--- a/doc/fluid/api_cn/io_cn/batch_cn.rst
+++ b/doc/fluid/api_cn/io_cn/batch_cn.rst
--- a/doc/fluid/api_cn/io_cn/buffered_cn.rst
+++ b/doc/fluid/api_cn/io_cn/buffered_cn.rst
--- a/doc/fluid/api_cn/io_cn/cache_cn.rst
+++ b/doc/fluid/api_cn/io_cn/cache_cn.rst
--- a/doc/fluid/api_cn/io_cn/chain_cn.rst
+++ b/doc/fluid/api_cn/io_cn/chain_cn.rst
--- a/doc/fluid/api_cn/io_cn/compose_cn.rst
+++ b/doc/fluid/api_cn/io_cn/compose_cn.rst
--- a/doc/fluid/api_cn/io_cn/firstn_cn.rst
+++ b/doc/fluid/api_cn/io_cn/firstn_cn.rst
--- a/doc/fluid/api_cn/io_cn/get_program_parameter_cn.rst
+++ b/doc/fluid/api_cn/io_cn/get_program_parameter_cn.rst
--- a/doc/fluid/api_cn/io_cn/get_program_persistable_vars_cn.rst
+++ b/doc/fluid/api_cn/io_cn/get_program_persistable_vars_cn.rst
--- a/doc/fluid/api_cn/io_cn/load_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_cn.rst
--- a/doc/fluid/api_cn/io_cn/load_inference_model_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_inference_model_cn.rst
--- a/doc/fluid/api_cn/io_cn/load_params_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_params_cn.rst
--- a/doc/fluid/api_cn/io_cn/load_persistables_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_persistables_cn.rst
--- a/doc/fluid/api_cn/io_cn/load_program_state_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_program_state_cn.rst
--- a/doc/fluid/api_cn/io_cn/load_vars_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_vars_cn.rst
--- a/doc/fluid/api_cn/io_cn/map_readers_cn.rst
+++ b/doc/fluid/api_cn/io_cn/map_readers_cn.rst
--- a/doc/fluid/api_cn/io_cn/multiprocess_reader_cn.rst
+++ b/doc/fluid/api_cn/io_cn/multiprocess_reader_cn.rst
--- a/doc/fluid/api_cn/io_cn/save_cn.rst
+++ b/doc/fluid/api_cn/io_cn/save_cn.rst
--- a/doc/fluid/api_cn/io_cn/save_inference_model_cn.rst
+++ b/doc/fluid/api_cn/io_cn/save_inference_model_cn.rst
--- a/doc/fluid/api_cn/io_cn/save_params_cn.rst
+++ b/doc/fluid/api_cn/io_cn/save_params_cn.rst
--- a/doc/fluid/api_cn/io_cn/save_persistables_cn.rst
+++ b/doc/fluid/api_cn/io_cn/save_persistables_cn.rst
--- a/doc/fluid/api_cn/io_cn/save_vars_cn.rst
+++ b/doc/fluid/api_cn/io_cn/save_vars_cn.rst
--- a/doc/fluid/api_cn/io_cn/set_program_state_cn.rst
+++ b/doc/fluid/api_cn/io_cn/set_program_state_cn.rst
--- a/doc/fluid/api_cn/io_cn/shuffle_cn.rst
+++ b/doc/fluid/api_cn/io_cn/shuffle_cn.rst
--- a/doc/fluid/api_cn/io_cn/xmap_readers_cn.rst
+++ b/doc/fluid/api_cn/io_cn/xmap_readers_cn.rst
--- a/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
--- a/doc/fluid/api_cn/layers_cn/Categorical_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Categorical_cn.rst
--- a/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
--- a/doc/fluid/api_cn/layers_cn/DynamicRNN_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/DynamicRNN_cn.rst
--- a/doc/fluid/api_cn/layers_cn/GRUCell_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/GRUCell_cn.rst
--- a/doc/fluid/api_cn/layers_cn/IfElse_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/IfElse_cn.rst
--- a/doc/fluid/api_cn/layers_cn/LSTMCell_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/LSTMCell_cn.rst
--- a/doc/fluid/api_cn/layers_cn/MultivariateNormalDiag_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/MultivariateNormalDiag_cn.rst
--- a/doc/fluid/api_cn/layers_cn/Normal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Normal_cn.rst
--- a/doc/fluid/api_cn/layers_cn/Print_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Print_cn.rst
--- a/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
--- a/doc/fluid/api_cn/layers_cn/StaticRNN_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/StaticRNN_cn.rst
--- a/doc/fluid/api_cn/layers_cn/Switch_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Switch_cn.rst
--- a/doc/fluid/api_cn/layers_cn/Uniform_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Uniform_cn.rst
--- a/doc/fluid/api_cn/layers_cn/While_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/While_cn.rst
--- a/doc/fluid/api_cn/layers_cn/abs_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/abs_cn.rst
--- a/doc/fluid/api_cn/layers_cn/accuracy_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/accuracy_cn.rst
--- a/doc/fluid/api_cn/layers_cn/acos_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/acos_cn.rst
--- a/doc/fluid/api_cn/layers_cn/adaptive_pool2d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/adaptive_pool2d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/adaptive_pool3d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/adaptive_pool3d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/add_position_encoding_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/add_position_encoding_cn.rst
--- a/doc/fluid/api_cn/layers_cn/affine_channel_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/affine_channel_cn.rst
--- a/doc/fluid/api_cn/layers_cn/affine_grid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/affine_grid_cn.rst
--- a/doc/fluid/api_cn/layers_cn/anchor_generator_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/anchor_generator_cn.rst
--- a/doc/fluid/api_cn/layers_cn/argmax_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/argmax_cn.rst
--- a/doc/fluid/api_cn/layers_cn/argmin_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/argmin_cn.rst
--- a/doc/fluid/api_cn/layers_cn/argsort_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/argsort_cn.rst
--- a/doc/fluid/api_cn/layers_cn/array_length_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/array_length_cn.rst
--- a/doc/fluid/api_cn/layers_cn/array_read_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/array_read_cn.rst
--- a/doc/fluid/api_cn/layers_cn/array_write_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/array_write_cn.rst
--- a/doc/fluid/api_cn/layers_cn/asin_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/asin_cn.rst
--- a/doc/fluid/api_cn/layers_cn/assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/assign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/atan_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/atan_cn.rst
--- a/doc/fluid/api_cn/layers_cn/auc_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/auc_cn.rst
--- a/doc/fluid/api_cn/layers_cn/autoincreased_step_counter_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/autoincreased_step_counter_cn.rst
--- a/doc/fluid/api_cn/layers_cn/batch_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/batch_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/beam_search_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/beam_search_cn.rst
--- a/doc/fluid/api_cn/layers_cn/beam_search_decode_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/beam_search_decode_cn.rst
--- a/doc/fluid/api_cn/layers_cn/bilinear_tensor_product_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/bilinear_tensor_product_cn.rst
--- a/doc/fluid/api_cn/layers_cn/bipartite_match_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/bipartite_match_cn.rst
--- a/doc/fluid/api_cn/layers_cn/box_clip_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/box_clip_cn.rst
--- a/doc/fluid/api_cn/layers_cn/box_coder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/box_coder_cn.rst
--- a/doc/fluid/api_cn/layers_cn/box_decoder_and_assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/box_decoder_and_assign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/bpr_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/bpr_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/brelu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/brelu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/case_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/case_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cast_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cast_cn.rst
--- a/doc/fluid/api_cn/layers_cn/ceil_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ceil_cn.rst
--- a/doc/fluid/api_cn/layers_cn/center_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/center_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/chunk_eval_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/chunk_eval_cn.rst
--- a/doc/fluid/api_cn/layers_cn/clip_by_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/clip_by_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/clip_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/clip_cn.rst
--- a/doc/fluid/api_cn/layers_cn/collect_fpn_proposals_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/collect_fpn_proposals_cn.rst
--- a/doc/fluid/api_cn/layers_cn/concat_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/concat_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cond_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cond_cn.rst
--- a/doc/fluid/api_cn/layers_cn/continuous_value_model_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/continuous_value_model_cn.rst
--- a/doc/fluid/api_cn/layers_cn/conv2d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/conv2d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/conv2d_transpose_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/conv2d_transpose_cn.rst
--- a/doc/fluid/api_cn/layers_cn/conv3d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/conv3d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/conv3d_transpose_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/conv3d_transpose_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cos_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cos_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cos_sim_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cos_sim_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cosine_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cosine_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/create_array_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/create_array_cn.rst
--- a/doc/fluid/api_cn/layers_cn/create_global_var_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/create_global_var_cn.rst
--- a/doc/fluid/api_cn/layers_cn/create_parameter_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/create_parameter_cn.rst
--- a/doc/fluid/api_cn/layers_cn/create_py_reader_by_data_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/create_py_reader_by_data_cn.rst
--- a/doc/fluid/api_cn/layers_cn/create_tensor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/create_tensor_cn.rst
--- a/doc/fluid/api_cn/layers_cn/crf_decoding_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/crf_decoding_cn.rst
--- a/doc/fluid/api_cn/layers_cn/crop_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/crop_cn.rst
--- a/doc/fluid/api_cn/layers_cn/crop_tensor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/crop_tensor_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cross_entropy_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cross_entropy_cn.rst
--- a/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst
--- a/doc/fluid/api_cn/layers_cn/cumsum_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cumsum_cn.rst
--- a/doc/fluid/api_cn/layers_cn/data_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/data_cn.rst
--- a/doc/fluid/api_cn/layers_cn/data_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/data_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/deformable_conv_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/deformable_conv_cn.rst
--- a/doc/fluid/api_cn/layers_cn/deformable_roi_pooling_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/deformable_roi_pooling_cn.rst
--- a/doc/fluid/api_cn/layers_cn/density_prior_box_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/density_prior_box_cn.rst
--- a/doc/fluid/api_cn/layers_cn/detection_output_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/detection_output_cn.rst
--- a/doc/fluid/api_cn/layers_cn/diag_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/diag_cn.rst
--- a/doc/fluid/api_cn/layers_cn/dice_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dice_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/distribute_fpn_proposals_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/distribute_fpn_proposals_cn.rst
--- a/doc/fluid/api_cn/layers_cn/double_buffer_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/double_buffer_cn.rst
--- a/doc/fluid/api_cn/layers_cn/dropout_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dropout_cn.rst
--- a/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
--- a/doc/fluid/api_cn/layers_cn/dynamic_gru_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dynamic_gru_cn.rst
--- a/doc/fluid/api_cn/layers_cn/dynamic_lstm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dynamic_lstm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/dynamic_lstmp_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dynamic_lstmp_cn.rst
--- a/doc/fluid/api_cn/layers_cn/edit_distance_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/edit_distance_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_add_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_add_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_div_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_div_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_floordiv_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_floordiv_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_max_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_max_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_min_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_min_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_mod_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_mod_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_mul_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_mul_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_pow_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_pow_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elementwise_sub_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elementwise_sub_cn.rst
--- a/doc/fluid/api_cn/layers_cn/elu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/elu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/embedding_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/embedding_cn.rst
--- a/doc/fluid/api_cn/layers_cn/equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/equal_cn.rst
--- a/doc/fluid/api_cn/layers_cn/erf_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/erf_cn.rst
--- a/doc/fluid/api_cn/layers_cn/exp_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/exp_cn.rst
--- a/doc/fluid/api_cn/layers_cn/expand_as_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/expand_as_cn.rst
--- a/doc/fluid/api_cn/layers_cn/expand_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/expand_cn.rst
--- a/doc/fluid/api_cn/layers_cn/exponential_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/exponential_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/eye_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/eye_cn.rst
--- a/doc/fluid/api_cn/layers_cn/fc_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/fc_cn.rst
--- a/doc/fluid/api_cn/layers_cn/fill_constant_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/fill_constant_cn.rst
--- a/doc/fluid/api_cn/layers_cn/filter_by_instag_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/filter_by_instag_cn.rst
--- a/doc/fluid/api_cn/layers_cn/flatten_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/flatten_cn.rst
--- a/doc/fluid/api_cn/layers_cn/floor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/floor_cn.rst
--- a/doc/fluid/api_cn/layers_cn/fsp_matrix_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/fsp_matrix_cn.rst
--- a/doc/fluid/api_cn/layers_cn/gather_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/gather_cn.rst
--- a/doc/fluid/api_cn/layers_cn/gather_nd_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/gather_nd_cn.rst
--- a/doc/fluid/api_cn/layers_cn/gather_tree_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/gather_tree_cn.rst
--- a/doc/fluid/api_cn/layers_cn/gaussian_random_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/gaussian_random_cn.rst
--- a/doc/fluid/api_cn/layers_cn/gelu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/gelu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/generate_mask_labels_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/generate_mask_labels_cn.rst
--- a/doc/fluid/api_cn/layers_cn/generate_proposal_labels_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/generate_proposal_labels_cn.rst
--- a/doc/fluid/api_cn/layers_cn/generate_proposals_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/generate_proposals_cn.rst
--- a/doc/fluid/api_cn/layers_cn/get_tensor_from_selected_rows_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/get_tensor_from_selected_rows_cn.rst
--- a/doc/fluid/api_cn/layers_cn/greater_equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/greater_equal_cn.rst
--- a/doc/fluid/api_cn/layers_cn/greater_than_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/greater_than_cn.rst
--- a/doc/fluid/api_cn/layers_cn/grid_sampler_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/grid_sampler_cn.rst
--- a/doc/fluid/api_cn/layers_cn/group_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/group_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/gru_unit_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/gru_unit_cn.rst
--- a/doc/fluid/api_cn/layers_cn/hard_shrink_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/hard_shrink_cn.rst
--- a/doc/fluid/api_cn/layers_cn/hard_sigmoid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/hard_sigmoid_cn.rst
--- a/doc/fluid/api_cn/layers_cn/hard_swish_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/hard_swish_cn.rst
--- a/doc/fluid/api_cn/layers_cn/has_inf_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/has_inf_cn.rst
--- a/doc/fluid/api_cn/layers_cn/has_nan_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/has_nan_cn.rst
--- a/doc/fluid/api_cn/layers_cn/hash_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/hash_cn.rst
--- a/doc/fluid/api_cn/layers_cn/hsigmoid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/hsigmoid_cn.rst
--- a/doc/fluid/api_cn/layers_cn/huber_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/huber_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/im2sequence_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/im2sequence_cn.rst
--- a/doc/fluid/api_cn/layers_cn/image_resize_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/image_resize_cn.rst
--- a/doc/fluid/api_cn/layers_cn/image_resize_short_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/image_resize_short_cn.rst
--- a/doc/fluid/api_cn/layers_cn/increment_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/increment_cn.rst
--- a/doc/fluid/api_cn/layers_cn/instance_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/instance_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/inverse_time_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/inverse_time_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/iou_similarity_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/iou_similarity_cn.rst
--- a/doc/fluid/api_cn/layers_cn/is_empty_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/is_empty_cn.rst
--- a/doc/fluid/api_cn/layers_cn/isfinite_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/isfinite_cn.rst
--- a/doc/fluid/api_cn/layers_cn/kldiv_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/kldiv_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/l2_normalize_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/l2_normalize_cn.rst
--- a/doc/fluid/api_cn/layers_cn/label_smooth_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/label_smooth_cn.rst
--- a/doc/fluid/api_cn/layers_cn/layer_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/layer_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/leaky_relu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/leaky_relu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/less_equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/less_equal_cn.rst
--- a/doc/fluid/api_cn/layers_cn/less_than_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/less_than_cn.rst
--- a/doc/fluid/api_cn/layers_cn/linear_chain_crf_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/linear_chain_crf_cn.rst
--- a/doc/fluid/api_cn/layers_cn/linear_lr_warmup_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/linear_lr_warmup_cn.rst
--- a/doc/fluid/api_cn/layers_cn/linspace_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/linspace_cn.rst
--- a/doc/fluid/api_cn/layers_cn/load_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/load_cn.rst
--- a/doc/fluid/api_cn/layers_cn/locality_aware_nms_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/locality_aware_nms_cn.rst
--- a/doc/fluid/api_cn/layers_cn/lod_append_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lod_append_cn.rst
--- a/doc/fluid/api_cn/layers_cn/lod_reset_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lod_reset_cn.rst
--- a/doc/fluid/api_cn/layers_cn/log_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/log_cn.rst
--- a/doc/fluid/api_cn/layers_cn/log_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/log_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/logical_and_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/logical_and_cn.rst
--- a/doc/fluid/api_cn/layers_cn/logical_not_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/logical_not_cn.rst
--- a/doc/fluid/api_cn/layers_cn/logical_or_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/logical_or_cn.rst
--- a/doc/fluid/api_cn/layers_cn/logical_xor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/logical_xor_cn.rst
--- a/doc/fluid/api_cn/layers_cn/logsigmoid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/logsigmoid_cn.rst
--- a/doc/fluid/api_cn/layers_cn/lrn_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lrn_cn.rst
--- a/doc/fluid/api_cn/layers_cn/lstm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lstm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/lstm_unit_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lstm_unit_cn.rst
--- a/doc/fluid/api_cn/layers_cn/margin_rank_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/margin_rank_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/matmul_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/matmul_cn.rst
--- a/doc/fluid/api_cn/layers_cn/maxout_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/maxout_cn.rst
--- a/doc/fluid/api_cn/layers_cn/mean_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/mean_cn.rst
--- a/doc/fluid/api_cn/layers_cn/mean_iou_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/mean_iou_cn.rst
--- a/doc/fluid/api_cn/layers_cn/merge_selected_rows_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/merge_selected_rows_cn.rst
--- a/doc/fluid/api_cn/layers_cn/mse_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/mse_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/mul_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/mul_cn.rst
--- a/doc/fluid/api_cn/layers_cn/multi_box_head_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/multi_box_head_cn.rst
--- a/doc/fluid/api_cn/layers_cn/multiclass_nms_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/multiclass_nms_cn.rst
--- a/doc/fluid/api_cn/layers_cn/multiplex_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/multiplex_cn.rst
--- a/doc/fluid/api_cn/layers_cn/natural_exp_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/natural_exp_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/nce_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/nce_cn.rst
--- a/doc/fluid/api_cn/layers_cn/noam_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/noam_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/not_equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/not_equal_cn.rst
--- a/doc/fluid/api_cn/layers_cn/npair_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/npair_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/one_hot_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/one_hot_cn.rst
--- a/doc/fluid/api_cn/layers_cn/ones_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ones_cn.rst
--- a/doc/fluid/api_cn/layers_cn/ones_like_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ones_like_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pad2d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pad2d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pad_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pad_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pad_constant_like_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pad_constant_like_cn.rst
--- a/doc/fluid/api_cn/layers_cn/piecewise_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/piecewise_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pixel_shuffle_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pixel_shuffle_cn.rst
--- a/doc/fluid/api_cn/layers_cn/polygon_box_transform_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/polygon_box_transform_cn.rst
--- a/doc/fluid/api_cn/layers_cn/polynomial_decay_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/polynomial_decay_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pool2d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pool2d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pool3d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pool3d_cn.rst
--- a/doc/fluid/api_cn/layers_cn/pow_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pow_cn.rst
--- a/doc/fluid/api_cn/layers_cn/prelu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/prelu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/prior_box_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/prior_box_cn.rst
--- a/doc/fluid/api_cn/layers_cn/prroi_pool_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/prroi_pool_cn.rst
--- a/doc/fluid/api_cn/layers_cn/psroi_pool_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/psroi_pool_cn.rst
--- a/doc/fluid/api_cn/layers_cn/py_func_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/py_func_cn.rst
--- a/doc/fluid/api_cn/layers_cn/py_reader_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/py_reader_cn.rst
--- a/doc/fluid/api_cn/layers_cn/random_crop_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/random_crop_cn.rst
--- a/doc/fluid/api_cn/layers_cn/range_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/range_cn.rst
--- a/doc/fluid/api_cn/layers_cn/rank_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rank_cn.rst
--- a/doc/fluid/api_cn/layers_cn/rank_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rank_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/read_file_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/read_file_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reciprocal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reciprocal_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_all_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_all_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_any_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_any_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_max_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_max_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_mean_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_mean_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_min_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_min_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_prod_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_prod_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reduce_sum_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_sum_cn.rst
--- a/doc/fluid/api_cn/layers_cn/relu6_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/relu6_cn.rst
--- a/doc/fluid/api_cn/layers_cn/relu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/relu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reorder_lod_tensor_by_rank_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reorder_lod_tensor_by_rank_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reshape_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reshape_cn.rst
--- a/doc/fluid/api_cn/layers_cn/resize_bilinear_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/resize_bilinear_cn.rst
--- a/doc/fluid/api_cn/layers_cn/resize_nearest_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/resize_nearest_cn.rst
--- a/doc/fluid/api_cn/layers_cn/resize_trilinear_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/resize_trilinear_cn.rst
--- a/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
--- a/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/reverse_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reverse_cn.rst
--- a/doc/fluid/api_cn/layers_cn/rnn_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rnn_cn.rst
--- a/doc/fluid/api_cn/layers_cn/roi_align_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/roi_align_cn.rst
--- a/doc/fluid/api_cn/layers_cn/roi_perspective_transform_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/roi_perspective_transform_cn.rst
--- a/doc/fluid/api_cn/layers_cn/roi_pool_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/roi_pool_cn.rst
--- a/doc/fluid/api_cn/layers_cn/round_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/round_cn.rst
--- a/doc/fluid/api_cn/layers_cn/row_conv_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/row_conv_cn.rst
--- a/doc/fluid/api_cn/layers_cn/rpn_target_assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rpn_target_assign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/rsqrt_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rsqrt_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sampled_softmax_with_cross_entropy_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sampled_softmax_with_cross_entropy_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sampling_id_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sampling_id_cn.rst
--- a/doc/fluid/api_cn/layers_cn/scale_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/scale_cn.rst
--- a/doc/fluid/api_cn/layers_cn/scatter_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/scatter_cn.rst
--- a/doc/fluid/api_cn/layers_cn/scatter_nd_add_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/scatter_nd_add_cn.rst
--- a/doc/fluid/api_cn/layers_cn/scatter_nd_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/scatter_nd_cn.rst
--- a/doc/fluid/api_cn/layers_cn/selu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/selu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_concat_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_concat_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_conv_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_conv_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_enumerate_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_enumerate_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_expand_as_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_expand_as_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_expand_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_expand_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_first_step_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_first_step_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_last_step_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_last_step_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_mask_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_mask_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_pad_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_pad_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_pool_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_pool_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_reshape_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_reshape_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_reverse_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_reverse_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_scatter_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_scatter_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_slice_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_slice_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_softmax_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_softmax_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sequence_unpad_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sequence_unpad_cn.rst
--- a/doc/fluid/api_cn/layers_cn/shape_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/shape_cn.rst
--- a/doc/fluid/api_cn/layers_cn/shard_index_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/shard_index_cn.rst
--- a/doc/fluid/api_cn/layers_cn/shuffle_channel_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/shuffle_channel_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sigmoid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sigmoid_cross_entropy_with_logits_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_cross_entropy_with_logits_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/similarity_focus_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/similarity_focus_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sin_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sin_cn.rst
--- a/doc/fluid/api_cn/layers_cn/size_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/size_cn.rst
--- a/doc/fluid/api_cn/layers_cn/slice_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/slice_cn.rst
--- a/doc/fluid/api_cn/layers_cn/smooth_l1_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/smooth_l1_cn.rst
--- a/doc/fluid/api_cn/layers_cn/soft_relu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/soft_relu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/softmax_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softmax_cn.rst
--- a/doc/fluid/api_cn/layers_cn/softmax_with_cross_entropy_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softmax_with_cross_entropy_cn.rst
--- a/doc/fluid/api_cn/layers_cn/softplus_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softplus_cn.rst
--- a/doc/fluid/api_cn/layers_cn/softshrink_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softshrink_cn.rst
--- a/doc/fluid/api_cn/layers_cn/softsign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softsign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/space_to_depth_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/space_to_depth_cn.rst
--- a/doc/fluid/api_cn/layers_cn/spectral_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/spectral_norm_cn.rst
--- a/doc/fluid/api_cn/layers_cn/split_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/split_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sqrt_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sqrt_cn.rst
--- a/doc/fluid/api_cn/layers_cn/square_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/square_cn.rst
--- a/doc/fluid/api_cn/layers_cn/square_error_cost_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/square_error_cost_cn.rst
--- a/doc/fluid/api_cn/layers_cn/squeeze_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/squeeze_cn.rst
--- a/doc/fluid/api_cn/layers_cn/ssd_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ssd_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/stack_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/stack_cn.rst
--- a/doc/fluid/api_cn/layers_cn/stanh_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/stanh_cn.rst
--- a/doc/fluid/api_cn/layers_cn/strided_slice_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/strided_slice_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sum_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sum_cn.rst
--- a/doc/fluid/api_cn/layers_cn/sums_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sums_cn.rst
--- a/doc/fluid/api_cn/layers_cn/swish_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/swish_cn.rst
--- a/doc/fluid/api_cn/layers_cn/switch_case_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/switch_case_cn.rst
--- a/doc/fluid/api_cn/layers_cn/tanh_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/tanh_cn.rst
--- a/doc/fluid/api_cn/layers_cn/tanh_shrink_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/tanh_shrink_cn.rst
--- a/doc/fluid/api_cn/layers_cn/target_assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/target_assign_cn.rst
--- a/doc/fluid/api_cn/layers_cn/teacher_student_sigmoid_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/teacher_student_sigmoid_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/temporal_shift_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/temporal_shift_cn.rst
--- a/doc/fluid/api_cn/layers_cn/tensor_array_to_tensor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/tensor_array_to_tensor_cn.rst
--- a/doc/fluid/api_cn/layers_cn/thresholded_relu_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/thresholded_relu_cn.rst
--- a/doc/fluid/api_cn/layers_cn/topk_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/topk_cn.rst
--- a/doc/fluid/api_cn/layers_cn/transpose_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/transpose_cn.rst
--- a/doc/fluid/api_cn/layers_cn/unfold_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/unfold_cn.rst
--- a/doc/fluid/api_cn/layers_cn/uniform_random_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/uniform_random_cn.rst
--- a/doc/fluid/api_cn/layers_cn/unique_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/unique_cn.rst
--- a/doc/fluid/api_cn/layers_cn/unique_with_counts_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/unique_with_counts_cn.rst
--- a/doc/fluid/api_cn/layers_cn/unsqueeze_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/unsqueeze_cn.rst
--- a/doc/fluid/api_cn/layers_cn/unstack_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/unstack_cn.rst
--- a/doc/fluid/api_cn/layers_cn/warpctc_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/warpctc_cn.rst
--- a/doc/fluid/api_cn/layers_cn/where_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/where_cn.rst
--- a/doc/fluid/api_cn/layers_cn/while_loop_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/while_loop_cn.rst
--- a/doc/fluid/api_cn/layers_cn/yolo_box_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/yolo_box_cn.rst
--- a/doc/fluid/api_cn/layers_cn/yolov3_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/yolov3_loss_cn.rst
--- a/doc/fluid/api_cn/layers_cn/zeros_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/zeros_cn.rst
--- a/doc/fluid/api_cn/layers_cn/zeros_like_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/zeros_like_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/Accuracy_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/Accuracy_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/Auc_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/Auc_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/ChunkEvaluator_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/ChunkEvaluator_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/CompositeMetric_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/CompositeMetric_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/DetectionMAP_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/DetectionMAP_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/EditDistance_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/EditDistance_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/MetricBase_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/MetricBase_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/Precision_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/Precision_cn.rst
--- a/doc/fluid/api_cn/metrics_cn/Recall_cn.rst
+++ b/doc/fluid/api_cn/metrics_cn/Recall_cn.rst
--- a/doc/fluid/api_cn/nets_cn/glu_cn.rst
+++ b/doc/fluid/api_cn/nets_cn/glu_cn.rst
--- a/doc/fluid/api_cn/nets_cn/img_conv_group_cn.rst
+++ b/doc/fluid/api_cn/nets_cn/img_conv_group_cn.rst
--- a/doc/fluid/api_cn/nets_cn/scaled_dot_product_attention_cn.rst
+++ b/doc/fluid/api_cn/nets_cn/scaled_dot_product_attention_cn.rst
--- a/doc/fluid/api_cn/nets_cn/sequence_conv_pool_cn.rst
+++ b/doc/fluid/api_cn/nets_cn/sequence_conv_pool_cn.rst
--- a/doc/fluid/api_cn/nets_cn/simple_img_conv_pool_cn.rst
+++ b/doc/fluid/api_cn/nets_cn/simple_img_conv_pool_cn.rst
--- a/doc/fluid/api_cn/nn_cn/Conv2D_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/Conv2D_cn.rst
--- a/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
--- a/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
--- a/doc/fluid/api_cn/nn_cn/diag_embed_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/diag_embed_cn.rst
--- a/doc/fluid/api_cn/nn_cn/interpolate_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/interpolate_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Adadelta_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Adadelta_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdagradOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Adagrad_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Adagrad_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Adam_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Adam_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdamaxOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Adamax_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Adamax_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/DGCMomentumOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/DGCMomentumOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/DecayedAdagradOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/DecayedAdagrad_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/DecayedAdagrad_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/DpsgdOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/DpsgdOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Dpsgd_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Dpsgd_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/ExponentialMovingAverage_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/ExponentialMovingAverage_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/FtrlOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/FtrlOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Ftrl_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Ftrl_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/LambOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/LambOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/LarsMomentumOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/LarsMomentumOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/LarsMomentum_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/LarsMomentum_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/LookaheadOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/LookaheadOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/ModelAverage_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/ModelAverage_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/MomentumOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/MomentumOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/Momentum_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/Momentum_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/PipelineOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/PipelineOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/RMSPropOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/RMSPropOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/RecomputeOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/RecomputeOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/SGDOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/SGDOptimizer_cn.rst
--- a/doc/fluid/api_cn/optimizer_cn/SGD_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/SGD_cn.rst
--- a/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
--- a/doc/fluid/api_cn/profiler_cn/profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/profiler_cn.rst
--- a/doc/fluid/api_cn/profiler_cn/reset_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/reset_profiler_cn.rst
--- a/doc/fluid/api_cn/profiler_cn/start_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/start_profiler_cn.rst
--- a/doc/fluid/api_cn/profiler_cn/stop_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/stop_profiler_cn.rst
--- a/doc/fluid/api_cn/regularizer_cn/L1DecayRegularizer_cn.rst
+++ b/doc/fluid/api_cn/regularizer_cn/L1DecayRegularizer_cn.rst
--- a/doc/fluid/api_cn/regularizer_cn/L1Decay_cn.rst
+++ b/doc/fluid/api_cn/regularizer_cn/L1Decay_cn.rst
--- a/doc/fluid/api_cn/regularizer_cn/L2DecayRegularizer_cn.rst
+++ b/doc/fluid/api_cn/regularizer_cn/L2DecayRegularizer_cn.rst
--- a/doc/fluid/api_cn/regularizer_cn/L2Decay_cn.rst
+++ b/doc/fluid/api_cn/regularizer_cn/L2Decay_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/add_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/add_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/addcmul_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/addcmul_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/addmm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/addmm_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/arange_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/arange_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/argmax_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/argmax_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/bmm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/bmm_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/clamp_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/clamp_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/cross_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/cross_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/dist_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/dist_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/div_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/div_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/dot_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/dot_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/elementwise_equal_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/elementwise_equal_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/elementwise_sum_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/elementwise_sum_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/equal_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/equal_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/flip_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/flip_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/full_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/full_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/full_like_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/full_like_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/gather_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/gather_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/index_sample_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/index_sample_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/inverse_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/inverse_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/kron_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/kron_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/linspace_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/linspace_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/log1p_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/log1p_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/logsumexp_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/logsumexp_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/matmul_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/matmul_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/max_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/max_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/meshgrid_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/meshgrid_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/min_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/min_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/mm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/mm_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/mul_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/mul_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/nonzero_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/nonzero_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/norm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/norm_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/ones_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/ones_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/ones_like_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/ones_like_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/pow_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/pow_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/randint_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/randint_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/randn_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/randn_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/randperm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/randperm_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/roll_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/roll_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/sin_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sin_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/sort_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sort_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/split_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/split_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/squeeze_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/squeeze_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/stack_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/stack_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/std_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/std_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/sum_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sum_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/t_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/t_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/tanh_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/tanh_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/trace_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/trace_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/tril_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/tril_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/triu_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/triu_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/unbind_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/unbind_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/unsqueeze_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/unsqueeze_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/var_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/var_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/where_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/where_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/zeros_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/zeros_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/zeros_like_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/zeros_like_cn.rst
--- a/doc/fluid/api_cn/transpiler_cn/DistributeTranspilerConfig_cn.rst
+++ b/doc/fluid/api_cn/transpiler_cn/DistributeTranspilerConfig_cn.rst
--- a/doc/fluid/api_cn/transpiler_cn/DistributeTranspiler_cn.rst
+++ b/doc/fluid/api_cn/transpiler_cn/DistributeTranspiler_cn.rst
--- a/doc/fluid/api_cn/transpiler_cn/HashName_cn.rst
+++ b/doc/fluid/api_cn/transpiler_cn/HashName_cn.rst
--- a/doc/fluid/api_cn/transpiler_cn/RoundRobin_cn.rst
+++ b/doc/fluid/api_cn/transpiler_cn/RoundRobin_cn.rst
--- a/doc/fluid/api_cn/transpiler_cn/memory_optimize_cn.rst
+++ b/doc/fluid/api_cn/transpiler_cn/memory_optimize_cn.rst
--- a/doc/fluid/api_cn/transpiler_cn/release_memory_cn.rst
+++ b/doc/fluid/api_cn/transpiler_cn/release_memory_cn.rst
--- a/doc/fluid/api_cn/unique_name_cn/generate_cn.rst
+++ b/doc/fluid/api_cn/unique_name_cn/generate_cn.rst
--- a/doc/fluid/api_cn/unique_name_cn/guard_cn.rst
+++ b/doc/fluid/api_cn/unique_name_cn/guard_cn.rst
--- a/doc/fluid/api_cn/unique_name_cn/switch_cn.rst
+++ b/doc/fluid/api_cn/unique_name_cn/switch_cn.rst