add_some_fluid_docs test=develop (#2576)

ec583540 · Chen Long · GitHub · 5e48b05e · ec583540 · ec583540
32 changed files
--- a/doc/paddle/api/paddle/fluid/dygraph/LinearLrWarmup_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/LinearLrWarmup_cn.rst
+.. _cn_api_paddle_optimizer_LinearLrWarmup:
+
+LinearLrWarmup
+-----------------------------------
+
+.. py:class:: paddle.optimizer.lr_scheduler.LinearLrWarmup(learning_rate, warmup_steps, start_lr, end_lr, last_epoch=-1, verbose=False)
+
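+This API provides a learning rate scheduling strategy, linear learning rate warm-up, which makes a preliminary adjustment to the learning rate: the learning rate is increased gradually before the regular learning rate adjustment takes over.
+
+When the number of training steps is less than the number of warm-up steps (warmup_steps), the learning rate lr is updated as follows: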
+
+.. code-block:: text
+
+    linear_step = end_lr - start_lr
+    lr = start_lr + linear_step * (epoch / warmup_steps)
+
+When the number of training steps is greater than or equal to the number of warm-up steps (warmup_steps), the learning rate lr is:
+
+.. code-block:: text
+
+    lr = learning_rate
+
+where learning_rate is the learning rate used after warm-up.
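
The two update rules above can be sketched in plain Python. This is a minimal illustration, not the Paddle implementation: the helper name `linear_warmup_lr` is hypothetical, and it assumes `learning_rate` is a plain float rather than a `_LRScheduler` instance.

```python
def linear_warmup_lr(epoch, learning_rate, warmup_steps, start_lr, end_lr):
    """Return the learning rate for `epoch` under linear warm-up (hypothetical helper)."""
    if epoch < warmup_steps:
        # Warm-up phase: interpolate linearly from start_lr towards end_lr.
        linear_step = end_lr - start_lr
        return start_lr + linear_step * (epoch / warmup_steps)
    # After warm-up: use the regular learning rate (assumed to be a float here).
    return learning_rate


# Same settings as the documented example: 20 warm-up steps from 0 to 0.5.
schedule = [linear_warmup_lr(e, 0.5, 20, 0.0, 0.5) for e in range(22)]
print(schedule[0], schedule[10], schedule[20])
```

With these settings the rate starts at `start_lr`, reaches the midpoint halfway through the warm-up, and equals `learning_rate` from step 20 onwards.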
+
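+Parameters
+::::::::::
+    - **learning_rate** (float|_LRScheduler): the learning rate after warm-up; either a Python float or a _LRScheduler subclass.
+    - **warmup_steps** (int): the number of steps in the warm-up phase.
+    - **start_lr** (float): the initial learning rate of the warm-up.
+    - **end_lr** (float): the final learning rate of the warm-up.
+    - **last_epoch** (int, optional): the index of the last epoch; set it to the last epoch number when resuming training. The default is -1, which means starting from the initial learning rate.
+    - **verbose** (bool, optional): if ``True``, prints a message to standard output ``stdout`` at each update. The default is ``False``.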
+
+
+Returns
+:::::::::
+A callable object that computes LinearLrWarmup.
+
+Code Example
+::::::::::::
+
+.. code-block:: python
+
+     import paddle
+     import numpy as np
+
+     # train on default dygraph mode
+     paddle.disable_static()
+     x = np.random.uniform(-1, 1, [10, 10]).astype("float32")
+     linear = paddle.nn.Linear(10, 10)
+     scheduler = paddle.optimizer.lr_scheduler.LinearLrWarmup(
+             learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
+     sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameter_list=linear.parameters())
+     for epoch in range(20):
+         for batch_id in range(2):
+             x = paddle.to_tensor(x)
+             out = linear(x)
+             loss = paddle.reduce_mean(out)
+             loss.backward()
+             sgd.minimize(loss)
+             linear.clear_gradients()
+         scheduler.step()
+
+     # train on static mode
+     paddle.enable_static()
+     main_prog = paddle.static.Program()
+     start_prog = paddle.static.Program()
+     with paddle.static.program_guard(main_prog, start_prog):
+         x = paddle.static.data(name='x', shape=[None, 4, 5])
+         y = paddle.static.data(name='y', shape=[None, 4, 5])
+         z = paddle.static.nn.fc(x, 100)
+         loss = paddle.mean(z)
+         scheduler = paddle.optimizer.lr_scheduler.LinearLrWarmup(
+             learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
+         sgd = paddle.optimizer.SGD(learning_rate=scheduler)
+         sgd.minimize(loss)
+
+     exe = paddle.static.Executor()
+     exe.run(start_prog)
+     for epoch in range(20):
+         for batch_id in range(2):
+             out = exe.run(
+                 main_prog,
+                 feed={
+                     'x': np.random.randn(3, 4, 5).astype('float32'),
+                     'y': np.random.randn(3, 4, 5).astype('float32')
+                 },
+                 fetch_list=loss.name)
+         scheduler.step()      
+
+.. py:method:: step(epoch=None)
+
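+The step function should be called after the optimizer's `step()` function. It updates the learning rate according to the epoch number, and the updated learning rate is used the next time the optimizer updates the parameters.
+
+Parameters:
+  - **epoch** (int, optional) - the epoch number to use. The default is None, in which case the ``epoch`` counter is incremented automatically, starting from -1.
+
+Returns:
+  None.
+
+**Code Example**:
+
+  See the example code above.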
+
+
--- a/doc/paddle/api/paddle/fluid/dygraph/base/grad_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/base/grad_cn.rst
+.. _cn_api_paddle_grad:
+
+grad
+-------------------------------
+
+**Note: this API only supports the dynamic graph mode.**
+
+.. py:method:: paddle.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, no_grad_vars=None)
+
+For each entry of `inputs`, computes the sum of the gradients of all `outputs` with respect to it.
+
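+Parameters:
+    - **outputs** (Tensor|list(Tensor)|tuple(Tensor)) - the output variable of the graph for which gradients are computed, or a list/tuple of several output variables.
+    - **inputs** (Tensor|list(Tensor)|tuple(Tensor)) - the input variable of the graph for which gradients are computed, or a list/tuple of several input variables. Each return value of this API is the gradient of the corresponding entry of `inputs`.
+    - **grad_outputs** (Tensor|list(Tensor|None)|tuple(Tensor|None), optional) - the initial gradient values of `outputs`. If `grad_outputs` is None, the initial gradient of every entry of `outputs` is a Tensor filled with 1. If `grad_outputs` is not None, it must have the same length as `outputs`; in that case, if the i-th element of `grad_outputs` is None, the initial gradient of the i-th entry of `outputs` is a Tensor filled with 1, and if the i-th element is a Tensor, the initial gradient of the i-th entry of `outputs` is that Tensor. The default is None.
+    - **retain_graph** (bool, optional) - whether to retain the forward graph used to compute the gradients. If True, the forward graph is retained and backward can be run twice on the same graph. If False, the forward graph is freed. The default is None, which means the value equals `create_graph`.
+    - **create_graph** (bool, optional) - whether to create the backward graph during the computation. If True, higher-order derivatives can be computed. If False, the backward graph is freed during the computation. The default is False.
+    - **only_inputs** (bool, optional) - whether to compute gradients for `inputs` only. If False, the gradients of all leaf variables in the graph are computed and accumulated. If True, only the gradients of `inputs` are computed. The default is True. only_inputs=False is under development and is not supported yet.
+    - **allow_unused** (bool, optional) - determines whether to raise an error or return None when some of the `inputs` are not in the computation graph. If some `inputs` are not in the graph (that is, their gradients are None), an error is raised when allow_unused=False, and None is returned as the gradient of those variables when allow_unused=True. The default is False.
+    - **no_grad_vars** (Tensor|list(Tensor)|tuple(Tensor)|set(Tensor), optional) - the variables for which gradients do not need to be computed. The default is None.
+
+Returns: tuple(Tensor), whose length equals the number of variables in `inputs`; the i-th returned variable is the sum of the gradients of all `outputs` with respect to the i-th entry of `inputs`.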
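
To make the "sum of gradients" and the role of `grad_outputs` concrete, here is a small framework-free sketch for scalar functions. It approximates derivatives with central differences instead of using Paddle's autograd; the helper name `weighted_grad_sum` is hypothetical, and plain floats stand in for Tensors.

```python
def weighted_grad_sum(fns, x, grad_outputs=None, eps=1e-6):
    """Approximate sum_i g_i * d(fns[i])/dx at a scalar x (hypothetical helper)."""
    if grad_outputs is None:
        grad_outputs = [None] * len(fns)
    total = 0.0
    for fn, g in zip(fns, grad_outputs):
        # A None entry plays the role of an all-ones initial gradient.
        weight = 1.0 if g is None else g
        # Central difference approximation of d(fn)/dx.
        derivative = (fn(x + eps) - fn(x - eps)) / (2 * eps)
        total += weight * derivative
    return total


y1 = lambda x: x * x  # dy1/dx = 2 * x
y2 = lambda x: x * 3  # dy2/dx = 3

default_sum = weighted_grad_sum([y1, y2], 2.0)               # 2*2*1 + 3*1
weighted_sum = weighted_grad_sum([y1, y2], 2.0, [None, 4.0])  # 2*2*1 + 3*4
print(round(default_sum, 4), round(weighted_sum, 4))
```

The same weighting rule, `dx = 2 * x * dy1 + 3 * dy2`, is what the second documented example evaluates with real Tensors.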
+
+**Example Code 1**
+  .. code-block:: python
+
+        import paddle
+        paddle.disable_static()
+
+        def test_dygraph_grad(create_graph):
+            x = paddle.ones(shape=[1], dtype='float32')
+            x.stop_gradient = False
+            y = x * x
+
+            # Since y = x * x, dx = 2 * x
+            dx = paddle.grad(
+                    outputs=[y],
+                    inputs=[x],
+                    create_graph=create_graph,
+                    retain_graph=True)[0]
+
+            z = y + dx
+
+            # If create_graph = False, the gradient of dx
+            # would not be backpropagated. Therefore,
+            # z = x * x + dx, and x.gradient() = 2 * x = 2.0
+
+            # If create_graph = True, the gradient of dx
+            # would be backpropagated. Therefore,
+            # z = x * x + dx = x * x + 2 * x, and
+            # x.gradient() = 2 * x + 2 = 4.0
+
+            z.backward()
+            return x.gradient()
+
+        print(test_dygraph_grad(create_graph=False)) # [2.]
+        print(test_dygraph_grad(create_graph=True)) # [4.]
+
+**Example Code 2**
+  .. code-block:: python
+
+        import paddle
+        paddle.disable_static()
+
+        def test_dygraph_grad(grad_outputs=None):
+            x = paddle.fill_constant(shape=[1], value=2.0, dtype='float32')
+            x.stop_gradient = False
+
+            y1 = x * x
+            y2 = x * 3 
+
+            # If grad_outputs=None, dy1 = [1], dy2 = [1].
+            # If grad_outputs=[g1, g2], then:
+            #    - dy1 = [1] if g1 is None else g1
+            #    - dy2 = [1] if g2 is None else g2
+
+            # Since y1 = x * x, dx = 2 * x * dy1.
+            # Since y2 = x * 3, dx = 3 * dy2.
+            # Therefore, the final result would be:
+            # dx = 2 * x * dy1 + 3 * dy2 = 4 * dy1 + 3 * dy2.
+
+            dx = paddle.grad(
+                outputs=[y1, y2],
+                inputs=[x],
+                grad_outputs=grad_outputs)[0]
+
+            return dx.numpy()
+
+        grad_value = paddle.fill_constant(shape=[1], value=4.0, dtype='float32')
+
+        # dy1 = [1], dy2 = [1]
+        print(test_dygraph_grad(None)) # [7.]
+
+        # dy1 = [1], dy2 = [4]
+        print(test_dygraph_grad([None, grad_value])) # [16.]
+
+        # dy1 = [4], dy2 = [1]
+        print(test_dygraph_grad([grad_value, None])) # [19.]
+
+        # dy1 = [3], dy2 = [4]
+        grad_y1 = paddle.fill_constant(shape=[1], value=3.0, dtype='float32')
+        print(test_dygraph_grad([grad_y1, grad_value])) # [24.]
\ No newline at end of file
--- a/doc/paddle/api/paddle/fluid/framework/Program_cn.rst
+++ b/doc/paddle/api/paddle/fluid/framework/Program_cn.rst
+.. _cn_api_fluid_Program:
+
+Program
+-------------------------------
+
+.. py:class::  paddle.fluid.Program
+
+
+
+
+**Note: by default, Paddle Fluid contains a** :ref:`cn_api_fluid_default_startup_program` **and a** :ref:`cn_api_fluid_default_main_program` **, which share parameters. The** :ref:`cn_api_fluid_default_startup_program` **runs only once to initialize the parameters, while the** :ref:`cn_api_fluid_default_main_program` **runs in every mini batch and updates the weights.**
+
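+A Program is Paddle Fluid's static description of a computation graph; a Program can be created with the Program constructor. A Program contains at least one :ref:`api_guide_Block` . When a :ref:`api_guide_Block` contains a conditional control-flow OP (for example :ref:`cn_api_fluid_layers_While` ), the Program contains nested blocks: the :ref:`api_guide_Block` outside the control flow contains the :ref:`api_guide_Block` inside it, and access to the elements of the nested :ref:`api_guide_Block` is determined by the specific control-flow OP. For the detailed structure of a Program and the types it contains, see `framework.proto <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/framework.proto>`_ .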
+
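+A set of Programs usually consists of a startup program (startup_program) and a main program (main_program). The startup program is a Program that performs initialization work, and the main program holds the network structure and variables used for training. When run with the same :ref:`api_guide_executor` , they share the results of the initialization, such as the initialized parameters. A set of Programs can be used for testing or for training. When used for training, ``Paddle Fluid`` builds a training network from all the OPs and variables the user has used; when used for testing, Program interfaces such as `clone` can prune OPs and variables that are irrelevant to testing, such as those for backpropagation.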
+
+
+Returns: the newly created empty Program
+
+Return type: Program
+
+**Code Example**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    main_program = fluid.Program()
+    startup_program = fluid.Program()
+    with fluid.program_guard(main_program=main_program, startup_program=startup_program):
+        x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
+        y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
+        z = fluid.layers.fc(name="fc", input=x, size=10, act="relu")
+
+    # start_up program here will share fc's weight with main program
+    print("main program is: {}".format(main_program))
+
+    print("start up program is: {}".format(startup_program))
+
+
+.. py:method:: to_string(throw_on_error, with_details=False)
+
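+Converts the Program to a string.
+
+Parameters:
+ - **throw_on_error** (bool) - whether to raise an exception when a required field is not set.
+ - **with_details** (bool) - when True, prints more information about variables and parameters, such as trainable and optimize_attr.
+
+Returns: the string representation of the Program
+
+Return type: str
+
+Raises: ``ValueError`` - raised when ``throw_on_error == True`` and a required field is not set.
+
+**Code Example**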
+
+.. code-block:: python
+
+        import paddle.fluid as fluid
+
+        prog = fluid.default_main_program()
+        x = fluid.layers.data(name="X", shape=[2,3], dtype="float32", append_batch_size=False)
+        pred = fluid.layers.fc(x, size=3)
+        prog_string = prog.to_string(throw_on_error=True, with_details=False)
+        prog_string_with_details = prog.to_string(throw_on_error=False, with_details=True)
+        print("program string without detail: {}".format(prog_string))
+        print("program string with detail: {}".format(prog_string_with_details))
+
+.. py:method:: clone(for_test=False)
+
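+**Note:**
+    **1.** ``Program.clone()`` **does not clone data-reading components such as** :ref:`cn_api_fluid_io_DataLoader` **, so the data-reading part may be lost after cloning.**
+
+    **2. With** ``for_test=True`` **, this API prunes some OPs and variables. To avoid incorrect pruning, it is recommended to call** ``clone(for_test=True)`` **before** :ref:`cn_api_fluid_backward_append_backward` **and before running the optimizer.**
+
+
+When ``for_test=True``, creates a new Program that contains only the forward part of the current Program. Otherwise, creates a new Program identical to the current one.
+
+Some OPs behave differently in training and testing, for example :ref:`cn_api_fluid_layers_batch_norm` . They have an ``is_test`` attribute that controls this behavior. When ``for_test=True``, this method sets their ``is_test`` attribute to True.
+
+- When cloning a Program for training, set ``for_test`` to False.
+- When cloning a Program for testing, set ``for_test`` to True. In this case, if ``clone`` is called after the optimizer has been applied, the backward and optimizer-related parts of the Program are still pruned automatically; nevertheless, we strongly recommend calling ``clone`` before applying the optimizer. For example, with :ref:`cn_api_fluid_optimizer_Momentum` it can be used as follows:
+
+**Code Example**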
+
+   ::
+
+        import paddle.fluid as fluid
+        img = fluid.layers.data(name='image', shape=[784])
+        pred = fluid.layers.fc(input=img, size=10, act='relu')
+        loss = fluid.layers.mean(pred)
+        ## We recommend calling clone() before using the Optimizer
+        test_program = fluid.default_main_program().clone(for_test=True)
+        optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
+        optimizer.minimize(loss)
+
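+Parameters:
+ - **for_test** (bool) - when True, clone sets the ``is_test`` attribute of the operators to True and prunes the backward OPs and the parameter-optimization OPs. The default is False.
+
+Returns: when ``for_test=True``, a new Program that contains only the forward part of the current Program; otherwise, a new Program identical to the current one.
+
+Return type: Program
+
+**Code Example**
+
+Note that the order of a Program may differ after cloning; this does not affect training or testing. The example below provides a simple helper, print_prog(Program), that prints the program description, so that the printed result can be checked to be the same after cloning: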
+
+.. code-block:: python
+
+        import paddle.fluid as fluid
+        import six
+
+
+        def print_prog(prog):
+            for name, value in sorted(six.iteritems(prog.block(0).vars)):
+                print(value)
+            for op in prog.block(0).ops:
+                print("op type is {}".format(op.type))
+                print("op inputs are {}".format(op.input_arg_names))
+                print("op outputs are {}".format(op.output_arg_names))
+                for key, value in sorted(six.iteritems(op.all_attrs())):
+                    if key not in ['op_callstack', 'op_role_var']:
+                        print(" [ attrs: {}:   {} ]".format(key, value))
+
+1. Clone a Program, as in the following example code.
+
+.. code-block:: python
+
+        import paddle.fluid as fluid
+        import six
+
+        def print_prog(prog):
+            for name, value in sorted(six.iteritems(prog.block(0).vars)):
+                print(value)
+            for op in prog.block(0).ops:
+                print("op type is {}".format(op.type))
+                print("op inputs are {}".format(op.input_arg_names))
+                print("op outputs are {}".format(op.output_arg_names))
+                for key, value in sorted(six.iteritems(op.all_attrs())):
+                    if key not in ['op_callstack', 'op_role_var']:
+                        print(" [ attrs: {}:   {} ]".format(key, value))
+
+        train_program = fluid.Program()
+        startup_program = fluid.Program()
+
+        # ``startup_program`` is used to run parameter initialization
+        # ``train_program`` is used to hold the network
+        with fluid.program_guard(train_program, startup_program):
+            with fluid.unique_name.guard():
+                img = fluid.layers.data(name='image', shape=[784])
+                hidden = fluid.layers.fc(input=img, size=200, act='relu')
+                hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
+                loss = fluid.layers.cross_entropy(
+                    input=fluid.layers.fc(hidden, size=10, act='softmax'),
+                    label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
+                avg_loss = fluid.layers.mean(loss)
+                test_program = train_program.clone(for_test=True)
+        print_prog(test_program)
+
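+        # To share parameters between training and testing, we use the training ``startup_program``
+        # instead of a test ``startup_program``, even though the test ``startup_program`` is empty.
+
+        # In Paddle Fluid, weights are shared through identical variable names.
+        # All parameters of the training and test programs have the same names, so the two programs share parameters.
+        # That is why we use the training ``startup_program``; since the test ``startup_program`` contains nothing,
+        # it is simply a new program.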
+        with fluid.program_guard(train_program, startup_program):
+            with fluid.unique_name.guard():
+                sgd = fluid.optimizer.SGD(learning_rate=1e-3)
+                sgd.minimize(avg_loss)
+
+2. If the train Program and the test Program are run separately, clone is not needed.
+
+.. code-block:: python
+
+        import paddle.fluid as fluid
+        import six
+
+        def print_prog(prog):
+            for name, value in sorted(six.iteritems(prog.block(0).vars)):
+                print(value)
+            for op in prog.block(0).ops:
+                print("op type is {}".format(op.type))
+                print("op inputs are {}".format(op.input_arg_names))
+                print("op outputs are {}".format(op.output_arg_names))
+                for key, value in sorted(six.iteritems(op.all_attrs())):
+                    if key not in ['op_callstack', 'op_role_var']:
+                        print(" [ attrs: {}:   {} ]".format(key, value))
+        
+        def network():
+            img = fluid.layers.data(name='image', shape=[784])
+            hidden = fluid.layers.fc(input=img, size=200, act='relu')
+            hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
+            loss = fluid.layers.cross_entropy(
+                input=fluid.layers.fc(hidden, size=10, act='softmax'),
+                label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
+            avg_loss = fluid.layers.mean(loss)
+            return avg_loss
+
+        train_program_2 = fluid.Program()
+        startup_program_2 = fluid.Program()
+        test_program_2 = fluid.Program()
+        with fluid.program_guard(train_program_2, startup_program_2):
+            with fluid.unique_name.guard():
+                avg_loss = network()
+                sgd = fluid.optimizer.SGD(learning_rate=1e-3)
+                sgd.minimize(avg_loss)
+        # Do not use a separate startup program for the test phase
+        with fluid.program_guard(test_program_2, startup_program_2):
+            with fluid.unique_name.guard():
+                avg_loss = network()
+        print_prog(test_program_2)
+
+The Programs generated and printed by the two code snippets above are the same.
+
+.. py:staticmethod:: parse_from_string(binary_str)
+
+Deserializes a `protobuf <https://en.wikipedia.org/wiki/Protocol_Buffers>`_ binary string into a Program.
+
+
+Parameters:
+ - **binary_str** (str) - the `protobuf <https://en.wikipedia.org/wiki/Protocol_Buffers>`_ binary string
+
+Returns: the deserialized Program
+
+Return type: Program
+
+**Code Example**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    startup_prog = fluid.Program()
+    main_prog = fluid.Program()
+    with fluid.program_guard(main_prog, startup_prog):
+        x = fluid.layers.data(
+            name='X', shape=[1000, 784], dtype='float32', append_batch_size=False)
+
+        y = fluid.layers.data(
+            name='Y', shape=[784, 100], dtype='float32', append_batch_size=False)
+
+        z = fluid.layers.mul(x=x, y=y)
+
+        binary_str = fluid.default_main_program().desc.serialize_to_string()
+        prog_restored = fluid.default_main_program().parse_from_string(binary_str)
+
+        print(fluid.default_main_program())
+        print(prog_restored)
+
+        # The two Programs printed here should be identical
+
+.. py:attribute:: num_blocks
+
+The number of :ref:`api_guide_Block` in this Program.
+
+Returns: the number of :ref:`api_guide_Block` in this Program
+
+Return type: int
+
+**Code Example**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            prog = fluid.default_main_program()
+            num_blocks = prog.num_blocks
+            print(num_blocks)
+
+            ## 1
+            ## The current Program contains only one Block, the global Block
+
+.. py:attribute:: random_seed
+
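+**Note: it must be set before the related OPs are added.**
+
+The default random seed for the random operators in the program. 0 means the random seed is generated randomly.
+
+Returns: the random seed currently used by this Program
+
+Return type: int64
+
+**Code Example**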
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            prog = fluid.default_main_program()
+            random_seed = prog.random_seed
+            x_var = fluid.layers.data(name="X", shape=[3,3], dtype="float32", append_batch_size=False)
+            print(random_seed)
+            ## 0
+            ## The default random seed is 0
+
+            # random_seed must be set before fluid.layers.dropout is called
+            prog.random_seed = 1
+            z_var = fluid.layers.dropout(x_var, 0.7)
+
+            print(prog.random_seed)
+            ## 1
+            ## After the change, the random seed is 1
+
+.. py:method:: global_block()
+
+Gets the first :ref:`api_guide_Block` of this Program.
+
+Returns: the first :ref:`api_guide_Block` of this Program
+
+Return type: :ref:`api_guide_Block`
+
+**Code Example**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            prog = fluid.default_main_program()
+            gb_block = prog.global_block()
+            print(gb_block)
+            ##
+            ## idx: 0
+            ## parent_idx: -1
+            ## Prints the description of the current global Block
+
+.. py:method:: block(index)
+
+Returns the :ref:`api_guide_Block` of this Program specified by ``index``, which is an int.
+
+Parameters:
+ - **index** (int) - the index of the :ref:`api_guide_Block` to get
+
+Returns: the :ref:`api_guide_Block` of this Program at the given index
+
+Return type: :ref:`api_guide_Block`
+
+**Code Example**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            prog = fluid.default_main_program()
+            block_0 = prog.block(0)
+            print(block_0)
+            ##
+            ## idx: 0
+            ## parent_idx: -1
+            ## Prints the description of Block 0
+
+.. py:method:: current_block()
+
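+Gets the current :ref:`api_guide_Block` . The current :ref:`api_guide_Block` is the one to which OPs are added.
+
+Returns: the :ref:`api_guide_Block` of this Program that the user is currently in
+
+Return type: :ref:`api_guide_Block`
+
+**Code Example**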
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            prog = fluid.default_main_program()
+            current_blk = prog.current_block()
+            print(current_blk)
+            ##
+            ## idx: 0
+            ## parent_idx: -1
+            ## Prints the description of the current Block
+
+.. py:method:: list_vars()
+
+Gets all variables in the current Program. The return value is an iterable object.
+
+Returns: a generator that yields every variable in the Program
+
+Return type: iterable of :ref:`api_guide_Variable`
+
+
+**Code Example**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            prog = fluid.default_main_program()
+            img = fluid.layers.data(name='img', shape=[1,28,28], dtype='float32')
+            label = fluid.layers.data(name='label', shape=[128,1], dtype='int64')
+            for var in prog.list_vars():
+                print(var)
+
+            # This prints all Variables in the current Program
+
+.. py:method:: all_parameters()
+
+Gets all :ref:`api_guide_parameter` in the current Program. The return value is a list.
+
+Returns: a list containing all parameters in the current Program.
+
+Return type: list[ :ref:`api_guide_parameter` ]
+
+
+**Code Example**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            program = fluid.default_main_program()
+            data = fluid.data(name='x', shape=[None, 13], dtype='float32')
+            hidden = fluid.layers.fc(input=data, size=10)
+            loss = fluid.layers.mean(hidden)
+            fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)
+
+            for param in program.all_parameters():
+                print(param)
+
+            # This prints all Parameters in the current Program. In this example, the output is:
+            #
+            # name: "fc_0.w_0"
+            # type {
+            # type: LOD_TENSOR
+            # lod_tensor {
+            #     tensor {
+            #       data_type: FP32
+            #       dims: 13
+            #       dims: 10
+            #     }
+            #   }
+            # }
+            #
+            # persistable: true
+            # name: "fc_0.b_0"
+            # type {
+            # type: LOD_TENSOR
+            # lod_tensor {
+            #     tensor {
+            #       data_type: FP32
+            #       dims: 10
+            #     }
+            #   }
+            # }
+            # persistable: true
+            #
+            # Here print(param) prints all attributes of a parameter, including name, type and persistable;
+            # a specific attribute can be accessed directly, for example param.name or param.type
\ No newline at end of file
--- a/doc/paddle/api/paddle/fluid/layers/dynamic_decode_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/dynamic_decode_cn.rst
+.. _cn_api_fluid_layers_dynamic_decode:
+
+dynamic_decode
+-------------------------------
+
+
+
+.. py:method:: dynamic_decode(decoder, inits=None, max_step_num=None, output_time_major=False, impute_finished=False, is_test=False, return_length=False, **kwargs)
+
+:api_attr: declarative programming mode (static graph)
+
+
+
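+This API repeatedly calls :code:`decoder.step()` until every value in the returned Tensor indicating the finished status is True, or until the number of decoding steps reaches :code:`max_step_num`.
+
+:code:`decoder.initialize()` is called once before the decoding loop. If :code:`decoder` implements a :code:`finalize` method, :code:`decoder.finalize()` is called once after the decoding loop.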
+
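+Parameters:
+  - **decoder** (Decoder) - an instance of a decoder.
+  - **inits** (object, optional) - the argument passed to :code:`decoder.initialize`. The default is None.
+  - **max_step_num** (int, optional) - the maximum number of steps. If not provided, decoding continues until the decoding process is finished (every value in the Tensor returned by :code:`decoder.step()` indicating the finished status is True). The default is None.
+  - **output_time_major** (bool, optional) - the data layout of the Tensors in the final outputs (the first return value of this method). If False, the batch-major layout is used and the shape is :math:`[batch\_size, seq\_len, ...]`. If True, the time-major layout is used and the shape is :math:`[seq\_len, batch\_size, ...]`. The default is False.
+  - **impute_finished** (bool, optional) - if True, for samples in the current batch that are already finished, the state of the previous step is copied instead of using the :code:`next_states` returned by :code:`decoder.step()` as the new state, as is done for unfinished samples. This guarantees that the returned final states :code:`final_states` are correct. Otherwise, finished and unfinished samples are not distinguished and no copy is made. Set this to True if :code:`final_states` will be used; it slows decoding down to some extent. The default is False.
+  - **is_test** (bool, optional) - whether to run in inference mode, which uses less memory. The default is False.
+  - **return_length** (bool, optional) - whether to additionally include in the returned tuple a Tensor holding the actual lengths of all decoded sequences. The default is False.
+  - **kwargs** - additional keyword arguments. These are passed to :code:`decoder.step`.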
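
The difference between the two layouts selected by `output_time_major` can be illustrated with nested Python lists standing in for Tensors. This is only a sketch of the axis swap; the helper name `to_time_major` is hypothetical and is not part of Paddle.

```python
def to_time_major(batch_major):
    """Swap the first two axes of a nested list: [batch, seq, ...] -> [seq, batch, ...]."""
    return [list(step) for step in zip(*batch_major)]


# batch_size = 2, seq_len = 3
batch_major = [[1, 2, 3],
               [4, 5, 6]]
time_major = to_time_major(batch_major)
print(time_major)
```

In the batch-major layout each row is one sequence; in the time-major layout each row holds the values of all sequences at one decoding step.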
+
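+Returns: if :code:`return_length` is True, a triple :code:`(final_outputs, final_states, sequence_lengths)`; otherwise a pair :code:`(final_outputs, final_states)`. :code:`final_outputs, final_states` hold the final outputs and states; both are Tensors or nested structures of Tensors. :code:`final_outputs` has the same structure and data types as the :code:`outputs` returned by :code:`decoder.step()`, and each tensor in it is the result of stacking the corresponding outputs of all decoding steps; if :code:`decoder` implements a :code:`finalize` method, these tensors may also be modified by :code:`decoder.finalize()`. :code:`final_states` is the state at the last time step and has the same structure, shapes and data types as the initial states returned by :code:`decoder.initialize()`. :code:`sequence_lengths` is an int64 tensor with the same shape as the :code:`finished` returned by :code:`decoder.initialize()`; it stores the actual lengths of all decoded sequences.
+
+Return type: tuple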
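
The control flow described above (initialize once, step until every sequence is finished or the step limit is hit, then an optional finalize) can be sketched in plain Python. Everything here is an assumption made for illustration: `CountdownDecoder` and `toy_dynamic_decode` are hypothetical, the `initialize`/`step` signatures are simplified compared with the real Decoder interface, lists stand in for Tensors, and the sketch stops after exactly `max_step_num` steps, which may differ from the boundary behavior of the real API.

```python
class CountdownDecoder:
    """Hypothetical toy decoder: sequence k finishes after lengths[k] steps."""

    def __init__(self, lengths):
        self.lengths = lengths

    def initialize(self, inits):
        # Initial states plus a per-sequence "finished" flag.
        return list(inits), [False] * len(self.lengths)

    def step(self, time, states):
        next_states = [s + 1 for s in states]
        outputs = list(next_states)
        finished = [time + 1 >= n for n in self.lengths]
        return outputs, next_states, finished


def toy_dynamic_decode(decoder, inits, max_step_num=None):
    states, finished = decoder.initialize(inits)  # called once before the loop
    all_outputs = []
    time = 0
    while not all(finished):
        if max_step_num is not None and time >= max_step_num:
            break  # step limit reached (assumed boundary behavior)
        outputs, states, finished = decoder.step(time, states)
        all_outputs.append(outputs)  # outputs are stacked step by step
        time += 1
    if hasattr(decoder, "finalize"):  # optional post-processing hook
        all_outputs = decoder.finalize(all_outputs, states)
    return all_outputs, states


final_outputs, final_states = toy_dynamic_decode(CountdownDecoder([2, 3]), [0, 0])
print(final_outputs, final_states)
```

Note that the sketch keeps updating the state of a sequence after it has finished, which is the behavior `impute_finished=False` describes; copying the previous state for finished sequences would correspond to `impute_finished=True`.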
+
+**Example Code**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import paddle.fluid.layers as layers
+    from paddle.fluid.layers import GRUCell, BeamSearchDecoder, dynamic_decode
+    encoder_output = fluid.data(name="encoder_output",
+                            shape=[-1, 32, 128],
+                            dtype="float32")
+    trg_embeder = lambda x: fluid.embedding(
+        x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+    output_layer = lambda x: layers.fc(x,
+                                    size=10000,
+                                    num_flatten_dims=len(x.shape) - 1,
+                                    param_attr=fluid.ParamAttr(name=
+                                                                "output_w"),
+                                    bias_attr=False)
+    decoder_cell = GRUCell(hidden_size=128)
+    decoder = BeamSearchDecoder(decoder_cell,
+                                start_token=0,
+                                end_token=1,
+                                beam_size=4,
+                                embedding_fn=trg_embeder,
+                                output_fn=output_layer)
+    outputs = dynamic_decode(
+        decoder=decoder, inits=decoder_cell.get_initial_states(encoder_output))
--- a/doc/paddle/api/paddle/fluid/layers/dynamic_gru_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/dynamic_gru_cn.rst
+.. _cn_api_fluid_layers_dynamic_gru:
+
+dynamic_gru
+-------------------------------
+
+
+.. py:function::  paddle.fluid.layers.dynamic_gru(input, size, param_attr=None, bias_attr=None, is_reverse=False, gate_activation='sigmoid', candidate_activation='tanh', h_0=None, origin_mode=False)
+
+:api_attr: declarative programming mode (static graph)
+
+
+
+
+**Note: the input of this OP must be a LoDTensor. If the input to be processed is a Tensor, use StaticRNN (fluid.layers.** :ref:`cn_api_fluid_layers_StaticRNN` **).**
+
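+This OP computes a single-layer Gated Recurrent Unit (GRU) over a complete sequence, one time step at a time. Within a single time step, the GRU supports the following two computation modes:
+
+If origin_mode is True, the formulas come from the paper
+`Learning Phrase Representations using RNN Encoder Decoder for Statistical Machine Translation <https://arxiv.org/pdf/1406.1078.pdf>`_ .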
+
+.. math::
+    u_t & = act_g(W_{ux}x_{t} + W_{uh}h_{t-1} + b_u)\\
+    r_t & = act_g(W_{rx}x_{t} + W_{rh}h_{t-1} + b_r)\\
+    \tilde{h_t} & = act_c(W_{cx}x_{t} + W_{ch}(r_t \odot h_{t-1}) + b_c)\\
+    h_t & = u_t \odot h_{t-1} + (1-u_t) \odot \tilde{h_t}
+
+
+If origin_mode is False, the formulas come from the paper
+`Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling  <https://arxiv.org/pdf/1412.3555.pdf>`_ .
+
+The formulas are as follows:
+
+.. math::
+    u_t & = act_g(W_{ux}x_{t} + W_{uh}h_{t-1} + b_u)\\
+    r_t & = act_g(W_{rx}x_{t} + W_{rh}h_{t-1} + b_r)\\
+    \tilde{h_t} & = act_c(W_{cx}x_{t} + W_{ch}(r_t \odot h_{t-1}) + b_c)\\
+    h_t & = (1-u_t) \odot h_{t-1} + u_t \odot \tilde{h_t}
+
+
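+Here :math:`x_t` is the input at the current time step. This input is not ``input``: the OP does not include the computation of :math:`W_{ux}x_{t}, W_{rx}x_{t}, W_{cx}x_{t}` . **Note** that a fully connected layer whose size is 3 times ``size`` must be applied before this OP, and its output used as ``input``.
+:math:`h_{t-1}` is the hidden state ``hidden`` of the previous time step. :math:`u_t` , :math:`r_t` , :math:`\tilde{h_t}` and :math:`h_t` are the update gate, reset gate, candidate hidden state and hidden state output of the GRU unit, respectively. :math:`\odot` denotes element-wise multiplication.
+:math:`W_{uh}, b_u` , :math:`W_{rh}, b_r` and :math:`W_{ch}, b_c` are the weight matrices and biases used to compute the update gate, the reset gate and the candidate hidden state, respectively. In the implementation, the three weight matrices are stored together as one Tensor of shape :math:`[D, D \times 3]` , and the three biases are concatenated into one Tensor of shape :math:`[1, D \times 3]` , where :math:`D` is the number of hidden units. The layout of the weight Tensor is: :math:`W_{uh}` and :math:`W_{rh}` are concatenated with shape :math:`[D, D \times 2]` in the first part, and :math:`W_{ch}` with shape :math:`[D, D]` is in the last part.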
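
The two formula sets differ only in how the update gate mixes the previous state with the candidate state. The following sketch evaluates one time step for a single hidden unit (D = 1) in plain Python, using the default sigmoid and tanh activations. The function name `gru_step` and its argument names are hypothetical; `xu`, `xr` and `xc` stand for the already-projected inputs, since the OP itself does not compute them.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(xu, xr, xc, h_prev, w_uh, w_rh, w_ch, b_u, b_r, b_c, origin_mode=False):
    """One GRU time step for a single hidden unit (hypothetical scalar sketch)."""
    u = sigmoid(xu + w_uh * h_prev + b_u)            # update gate
    r = sigmoid(xr + w_rh * h_prev + b_r)            # reset gate
    c = math.tanh(xc + w_ch * (r * h_prev) + b_c)    # candidate hidden state
    if origin_mode:
        return u * h_prev + (1.0 - u) * c            # origin_mode=True formula
    return (1.0 - u) * h_prev + u * c                # origin_mode=False formula


# A saturated update gate (u close to 1) makes the difference visible.
kept = gru_step(100.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, origin_mode=True)
replaced = gru_step(100.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, origin_mode=False)
print(kept, replaced)
```

With the update gate near 1, `origin_mode=True` keeps the previous hidden state, while `origin_mode=False` replaces it with the candidate state.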
+
+
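+Parameters:
+    - **input** (Variable) - a LoDTensor with LoD level 1 representing the linearly transformed input sequence, of shape :math:`[T, D \times 3]` , where :math:`T` is the sum of the lengths of all sequences in the mini-batch and :math:`D` is the size of the hidden-state feature dimension. The data type is float32 or float64.
+    - **size** (int) - the size of the hidden-state feature dimension.
+    - **param_attr** (ParamAttr, optional) - the object specifying the attributes of the weight parameter. The default is None, which means the default weight parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for details.
+    - **bias_attr** (ParamAttr, optional) - the object specifying the attributes of the bias parameter. The default is None, which means the default bias parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for details.
+    - **is_reverse** (bool, optional) - whether to compute in the reverse order of the input sequence. The default is False.
+    - **gate_activation** (str, optional) - the type of the activation function :math:`act_g` in the formulas. identity, sigmoid, tanh and relu are supported. The default is sigmoid.
+    - **candidate_activation** (str, optional) - the type of the activation function :math:`act_c` in the formulas. identity, sigmoid, tanh and relu are supported. The default is tanh.
+    - **h_0** (Variable, optional) - a Tensor representing the initial hidden state; if not provided, it defaults to 0. Its shape is :math:`[N, D]` , where :math:`N` is the number of sequences in the input mini-batch and :math:`D` is the size of the hidden-state feature dimension. The data type is the same as ``input``. The default is None.
+    - **origin_mode** (bool, optional) - which GRU computation mode to use; see the formulas for the difference between the two modes. The default is False.
+
+Returns: a LoDTensor of shape :math:`[T, D]` with LoD level 1, where :math:`T` is the sum of the lengths of all sequences in the mini-batch and :math:`D` is the size of the hidden-state feature dimension. It represents the output feature sequence after the GRU transformation and has the same LoD (sequence lengths) and data type as ``input``.
+
+Return type: Variable
+
+
+**Code Example**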
+
+..  code-block:: python
+
+    import paddle.fluid as fluid
+
+    dict_dim, emb_dim = 128, 64
+    data = fluid.data(name='sequence',
+                shape=[None],
+                dtype='int64',
+                lod_level=1)
+    emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])
+    hidden_dim = 512
+    x = fluid.layers.fc(input=emb, size=hidden_dim * 3)
+    hidden = fluid.layers.dynamic_gru(input=x, size=hidden_dim)
+
--- a/doc/paddle/api/paddle/fluid/layers/dynamic_lstm_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/dynamic_lstm_cn.rst
+.. _cn_api_fluid_layers_dynamic_lstm:
+
+dynamic_lstm
+-------------------------------
+
+
+.. py:function::  paddle.fluid.layers.dynamic_lstm(input, size, h_0=None, c_0=None, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', dtype='float32', name=None)
+
+:api_attr: declarative programming mode (static graph)
+
+
+
+This OP implements LSTM, i.e. Long-Short Term Memory, as described in `Hochreiter, S., & Schmidhuber, J. (1997) <http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf>`_ .
+
+.. note::
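+    - This OP only supports LoDTensor input. If the data to be processed is a Tensor, use :ref:`cn_api_fluid_layers_lstm` .
+    - For efficiency, the user must first apply a linear projection to the LSTM input, mapping an input of shape [T, hidden_size] to shape [T, 4 × hidden_size], and then pass the result to this OP.
+
+By default this OP uses diagonal/peephole connections; see `Gers, F. A., & Schmidhuber, J. (2000) <ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf>`_ .
+To disable the peephole connections, set use_peepholes to False.
+
+For each time step of the sequence, this OP computes: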
+
+.. math::
+      i_t=\sigma (W_{ix}x_{t}+W_{ih}h_{t-1}+W_{ic}c_{t-1}+b_i)
+.. math::
+      f_t=\sigma (W_{fx}x_{t}+W_{fh}h_{t-1}+W_{fc}c_{t-1}+b_f)
+.. math::
+      o_t=\sigma (W_{ox}x_{t}+W_{oh}h_{t-1}+W_{oc}c_{t-1}+b_o)
+.. math::
+      \widetilde{c_t}=act_g(W_{cx}x_{t}+W_{ch}h_{t-1}+b_{c})
+.. math::
+      c_t=f_t\odot c_{t-1}+i_t\odot \widetilde{c_t}
+.. math::
+      h_t=o_t\odot act_h(c_t)
+
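+The symbols in the formulas are:
+      - :math:`x_{t}` is the input at time step :math:`t`
+      - :math:`h_{t}` is the hidden state at time step :math:`t`
+      - :math:`h_{t-1}, c_{t-1}` are the hidden and cell states of the previous time step
+      - :math:`\widetilde{c_t}` is the candidate cell state
+      - :math:`i_t` , :math:`f_t` and :math:`o_t` are the input gate, forget gate and output gate
+      - :math:`W` denotes a weight (for example, :math:`W_{ix}` is the weight of the linear transformation applied to the input :math:`x_{t}` when computing the input gate :math:`i_t` )
+      - :math:`b` denotes a bias (for example, :math:`b_{i}` is the bias of the input gate)
+      - :math:`\sigma` is the non-linear activation function of the gates, sigmoid by default
+      - :math:`act_g, act_h` are the non-linear activation functions of the cell input and the cell output, tanh by default
+      - :math:`\odot` is the Hadamard product of matrices: for two matrices of the same shape, entries at the same position are multiplied, giving another matrix of the same shape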
+
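+Parameters:
+  - **input** ( :ref:`api_guide_Variable` ) – A multi-dimensional LoDTensor of shape :math:`[T, 4*hidden\_size]` (it must be obtained by linearly transforming an input of shape :math:`[T, hidden\_size]` before being passed to this OP), where T is the sum of the lengths of all samples in the batch and hidden_size is the hidden layer size. The data type is float32 or float64.
+  - **size** (int) – Must be 4*hidden_size.
+  - **h_0** ( :ref:`api_guide_Variable` , optional) – A multi-dimensional Tensor of shape :math:`[batch\_size, hidden\_size]`, where hidden_size is the hidden layer size. The data type is float32 or float64. If None, this OP initializes it to an all-zero vector. Defaults to None.
+  - **c_0** ( :ref:`api_guide_Variable` , optional) – A multi-dimensional Tensor of shape :math:`[batch\_size, hidden\_size]`, where hidden_size is the hidden layer size. The data type is float32 or float64. If None, this OP initializes it to an all-zero vector. :math:`h_0` and :math:`c_0` must either both be None or both be set. Defaults to None.
+  - **param_attr** (ParamAttr, optional) – Object specifying the attributes of the weight parameter. If None, the default weight parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage. If the user sets this attribute, the shape must be :math:`[hidden\_size, 4*hidden\_size]`. Defaults to None.
+  - **bias_attr** (ParamAttr, optional) – Object specifying the attributes of the bias parameter. If None, the default bias parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage. If the user sets this attribute, the shape must be :math:`[1, 4*hidden\_size]` when use_peepholes=False and :math:`[1, 7*hidden\_size]` when use_peepholes=True. Defaults to None.
+  - **use_peepholes** (bool, optional) – Whether to use peephole connections. Defaults to True.
+  - **is_reverse** (bool, optional) – Whether to reverse the input data according to each sample's length. The output is reversed back as well, so the user does not need to reverse the result again. Defaults to False.
+  - **gate_activation** (str, optional) – Activation function applied to the input gate, forget gate and output gate. Defaults to sigmoid.
+  - **cell_activation** (str, optional) – Activation function for the cell output ( :math:`act_h` in the formulas). Defaults to tanh.
+  - **candidate_activation** (str, optional) – Activation function for the candidate cell state, i.e. the cell input ( :math:`act_g` in the formulas). Defaults to tanh.
+  - **dtype** (str, optional) – Data type, float32 or float64. Defaults to float32.
+  - **name** (str, optional) – See :ref:`api_guide_Name` for usage. Defaults to None.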
+
+Returns: a tuple with the hidden and cell states produced by the LSTM operation:
+
+- hidden: the LSTM hidden output, a LoDTensor of shape :math:`[T, hidden\_size]` with the same LoD and data type as the input.
+- cell: the LSTM cell output, a LoDTensor of shape :math:`[T, hidden\_size]` with the same LoD and data type as the input.
+
+Return type: tuple ( :ref:`api_guide_Variable` , :ref:`api_guide_Variable` )
+
+
+**Code Example**
+
+..  code-block:: python
+
+      import paddle.fluid as fluid
+      emb_dim = 256
+      vocab_size = 10000
+      hidden_dim = 512
+
+      data = fluid.layers.data(name='x', shape=[1], dtype='int32', lod_level=1)
+      emb = fluid.layers.embedding(input=data, size=[vocab_size, emb_dim], is_sparse=True)
+      
+      forward_proj = fluid.layers.fc(input=emb, size=hidden_dim * 4, bias_attr=False)
+      forward, cell = fluid.layers.dynamic_lstm(input=forward_proj, size=hidden_dim * 4, use_peepholes=False)
+      forward.shape  # (-1, 512)
+      cell.shape  # (-1, 512)
+
+
+
+
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/fc_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/fc_cn.rst
+.. _cn_api_fluid_layers_fc:
+
+fc
+-------------------------------
+
+
+.. py:function::  paddle.fluid.layers.fc(input, size, num_flatten_dims=1, param_attr=None, bias_attr=None, act=None, name=None)
+
+:api_attr: declarative programming mode (static graph)
+
+
+
+
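+**Fully Connected Layer**
+
+This OP builds a fully connected layer in a neural network. Its input can be a single Tensor (or LoDTensor) or a list of Tensors (or LoDTensors); see the parameter descriptions. The OP creates one weight variable for each input Tensor, that is, a fully connected weight matrix from every input unit to every output unit. The FC layer multiplies each input Tensor by its weights to produce an output Tensor of shape :math:`[M, size]`, where ``M`` is the batch_size. With multiple input Tensors, the resulting Tensors of shape :math:`[M, size]` are summed to form the final output. If ``bias_attr`` is not None, a bias variable is created and added to the output. If ``act`` is not None, the corresponding activation function is applied to the output.
+
+When the input is a single Tensor (or LoDTensor):
+
+.. math::
+
+        Out = Act({XW + b})
+
+
+
+When the input is a list of Tensors (or LoDTensors):
+
+.. math::
+
+        Out=Act(\sum^{N-1}_{i=0}X_iW_i+b)
+
+
+In the equations above:
+  - :math:`N` : the number of inputs; if the input is a list of Tensors, N equals len(input)
+  - :math:`X_i` : the i-th input Tensor
+  - :math:`W_i` : the i-th weight matrix, corresponding to the i-th input Tensor
+  - :math:`b` : the bias parameter created by this layer
+  - :math:`Act` : the activation function
+  - :math:`Out` : the output Tensor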
+
+::
+            
+        Case 1:
+            Given a single input Tensor data_1 and num_flatten_dims = 2:
+                data_1.data = [[[0.1, 0.2],
+                               [0.3, 0.4]]]
+                data_1.shape = (1, 2, 2) # 1 is the batch_size
+
+                out = fluid.layers.fc(input=data_1, size=1, num_flatten_dims=2)
+
+            The output is:
+                out.data = [[[0.83234344], [0.34936576]]]
+                out.shape = (1, 2, 1)
+
+
+        Case 2:
+            Given a list of Tensors:
+                data_1.data = [[[0.1, 0.2],
+                               [0.3, 0.4]]]
+                data_1.shape = (1, 2, 2) # 1 is the batch_size
+
+                data_2.data = [[[0.1, 0.2, 0.3]]]
+                data_2.shape = (1, 1, 3)
+
+                out = fluid.layers.fc(input=[data_1, data_2], size=2)
+
+            The output is:
+                out.data = [[0.18669507, 0.1893476]]
+                out.shape = (1, 2)
+
+
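+Parameters:
+  - **input** (Variable|list of Variable) – A multi-dimensional Tensor (or LoDTensor) of shape :math:`[N_1, N_2, ..., N_k]`, or a list of such Tensors (or LoDTensors). Each input Tensor must have a rank of at least 2. The data type is float32 or float64.
+  - **size** (int) – Number of output units of the fully connected layer, i.e. the feature dimension of the output Tensor (or LoDTensor).
+  - **num_flatten_dims** (int) – The input may be a Tensor with more than 2 dimensions. For the computation, the input is first flattened into a 2-D matrix and then multiplied by the weights. ``num_flatten_dims`` determines how the input Tensor is flattened: the first ``num_flatten_dims`` dimensions (inclusive, counting from 1) are flattened into the first dimension of the matrix (its height), and the remaining :math:`rank(X) - num\_flatten\_dims` dimensions are flattened into the second dimension (its width). For example, if X is a 5-D Tensor of shape (2, 3, 4, 5, 6) and :math:`num\_flatten\_dims = 3` , the flattened matrix has shape :math:`(2 x 3 x 4, 5 x 6) = (24, 30)` and the final output Tensor has shape :math:`(2, 3, 4, size)` . Defaults to 1.
+  - **param_attr** (ParamAttr) – Object specifying the attributes of the weight parameter. Defaults to None, meaning the default weight parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **bias_attr** (ParamAttr) – Object specifying the attributes of the bias parameter. Defaults to None, meaning the default bias parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **act** (str) – Activation function applied to the output, such as tanh, softmax, sigmoid or relu. See :ref:`api_guide_activations` for the supported list. Defaults to None.
+  - **name** (str, optional) – See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.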
+
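+The flattening rule for ``num_flatten_dims`` can be checked with a few lines of plain Python. The sketch below is illustrative only and does not call Paddle; ``fc_shapes`` is a hypothetical helper name introduced here, and the range check on ``num_flatten_dims`` is an assumption made for the example.
+

```python
from functools import reduce
from operator import mul


def fc_shapes(input_shape, size, num_flatten_dims=1):
    """Return (flattened 2-D shape, output shape) for an fc-style layer.

    Hypothetical helper for illustration; it only does shape arithmetic.
    """
    rank = len(input_shape)
    # Assumption: at least one dimension must remain for the matrix width.
    if not 1 <= num_flatten_dims < rank:
        raise ValueError("num_flatten_dims must be in [1, rank(X))")
    lead = tuple(input_shape[:num_flatten_dims])   # becomes the matrix height
    tail = tuple(input_shape[num_flatten_dims:])   # becomes the matrix width
    height = reduce(mul, lead, 1)
    width = reduce(mul, tail, 1)
    return (height, width), lead + (size,)


flat, out = fc_shapes((2, 3, 4, 5, 6), size=7, num_flatten_dims=3)
print(flat, out)
```

+With the 5-D shape used in the ``num_flatten_dims`` description, the flattened matrix is (24, 30) and the output keeps the first three dimensions followed by ``size``.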
+
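+Returns: the Tensor or LoDTensor computed by the fully connected layer, with the same data type as the input.
+
+Return type: Variable
+
+Raises: ``ValueError`` - if the input Tensor (or LoDTensor) has fewer than 2 dimensions
+
+**Code Example**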
+
+..  code-block:: python
+
+        import paddle.fluid as fluid
+
+        # When the input is a single tensor
+        data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
+        fc = fluid.layers.fc(input=data, size=1000, act="tanh")
+
+        # When the input is a list of tensors
+        data_1 = fluid.layers.data(name="data_1", shape=[32, 32], dtype="float32")
+        data_2 = fluid.layers.data(name="data_2", shape=[24, 36], dtype="float32")
+        fc = fluid.layers.fc(input=[data_1, data_2], size=1000, act="tanh")
+
+
+
+
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/flatten_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/flatten_cn.rst
+.. _cn_api_fluid_layers_flatten:
+
+flatten
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.flatten(x, axis=1, name=None)
+
+:alias_main: paddle.flatten
+:alias: paddle.flatten,paddle.tensor.flatten,paddle.tensor.manipulation.flatten
+:old_api: paddle.fluid.layers.flatten
+
+
+
+The flatten op flattens a multi-dimensional input Tensor into a 2-D Tensor (matrix).
+
+For example:
+
+.. code-block:: text
+
+    Case 1:
+
+      Given
+        X.shape = (3, 100, 100, 4)
+      and
+        axis = 2
+      we get:
+        Out.shape = (3 * 100, 100 * 4)
+
+    Case 2:
+
+      Given
+        X.shape = (3, 100, 100, 4)
+      and
+        axis = 0
+      we get:
+        Out.shape = (1, 3 * 100 * 100 * 4)
+
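+Parameters:
+  - **x** (Variable) - A multi-dimensional Tensor whose number of dimensions is >= axis. The data type can be float32, float64, int8, int32 or int64.
+  - **axis** (int) - The axis at which the input is split for flattening. Dimensions [0, axis) are flattened into axis 0 of the output matrix and dimensions [axis, R) are flattened into axis 1, where R is the rank of the input tensor. axis must be in the range [0, R]. When axis=0 and the input Tensor has shape :math:`[d_0, d_1, ..., d_n]` , the output Tensor has shape :math:`[1, d_0 * d_1 * ... * d_n]` . Defaults to 1.
+  - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.
+
+Returns: a 2-D Tensor that holds the data of the input Tensor with a different shape. Input dimensions [0, axis) are flattened into dimension 0 of the output Tensor, and the remaining input dimensions are flattened into dimension 1. The data type is the same as that of x.
+
+Return type: Variable
+
+Raises:
+  - ValueError: if x is not a Variable
+  - ValueError: if axis is outside the range [0, rank(x)]
+
+**Code Example**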
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    x = fluid.layers.data(name="x", shape=[4, 4, 3], append_batch_size=False, dtype="float32")
+    # x shape is [4, 4, 3]
+    out = fluid.layers.flatten(x=x, axis=2)
+    # out shape is [16, 3]
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/get_tensor_from_selected_rows_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/get_tensor_from_selected_rows_cn.rst
+.. _cn_api_fluid_layers_get_tensor_from_selected_rows:
+
+get_tensor_from_selected_rows
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.get_tensor_from_selected_rows(x, name=None)
+
+
+
+
+This OP extracts the value data from an input of type SelectedRows and outputs it as a LoDTensor.
+
+
+::
+
+    For example:
+
+          The input is of type SelectedRows:
+               x.rows = [0, 5, 5, 4, 19]
+               x.height = 20
+               x.value = [[1, 1], [2, 2], [2, 2], [3, 3], [6, 6]]
+
+          The output is a LoDTensor:
+               out.shape = [5, 2]
+               out.data = [[1, 1],
+                           [2, 2],
+                           [2, 2],
+                           [3, 3],
+                           [6, 6]]
+
+
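+Parameters:
+  - **x** (SelectedRows) - Input of type SelectedRows. The data type is float32, float64, int32 or int64.
+  - **name** (str) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Defaults to None.
+
+Returns: the LoDTensor converted from the SelectedRows, with the same data type as the input.
+
+Return type: Variable
+
+**Code Example:**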
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    b = fluid.default_main_program().global_block()
+    input = b.create_var(name="X", dtype="float32", persistable=True, type=fluid.core.VarDesc.VarType.SELECTED_ROWS)
+    out = fluid.layers.get_tensor_from_selected_rows(input)
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/grid_sampler_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/grid_sampler_cn.rst
+.. _cn_api_fluid_layers_grid_sampler:
+
+grid_sampler
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.grid_sampler(x, grid, name=None)
+
+:alias_main: paddle.nn.functional.grid_sampler
+:alias: paddle.nn.functional.grid_sampler,paddle.nn.functional.vision.grid_sampler
+:old_api: paddle.fluid.layers.grid_sampler
+
+
+
+This OP samples the input X by bilinear interpolation based on a flow-field grid. The grid is usually generated by affine_grid and has shape [N, H, W, 2]; it holds the (x, y) coordinates of a tensor of sampling points of shape [N, H, W].
+The x coordinate indexes the fourth dimension (width) of the input X and the y coordinate indexes the third dimension (height). The final output is, for each sampling point, the bilinear interpolation of its 4 nearest corner points, and the output tensor has shape [N, C, H, W].
+
+Step 1:
+
+  Take the (x, y) grid coordinates and scale them to [0, W-1] and [0, H-1] respectively.
+
+.. code-block:: text
+
+  grid_x = 0.5 * (grid[:, :, :, 0] + 1) * (W - 1)
+  grid_y = 0.5 * (grid[:, :, :, 1] + 1) * (H - 1)
+
+Step 2:
+
+  In each [H, W] region, use the grid (x, y) as an index into the input X, and express the bilinearly interpolated value in terms of the 4 nearest points.
+
+.. code-block:: text
+
+      wn ------- y_n ------- en
+      |           |           |
+      |          d_n          |
+      |           |           |
+     x_w --d_w-- grid--d_e-- x_e
+      |           |           |
+      |          d_s          |
+      |           |           |
+      ws ------- y_s ------- es
+
+    x_w = floor(grid_x)         // west side x coord
+    x_e = x_w + 1               // east side x coord
+    y_n = floor(grid_y)         // north side y coord
+    y_s = y_n + 1               // south side y coord
+    d_w = grid_x - x_w          // distance to west side
+    d_e = x_e - grid_x          // distance to east side
+    d_n = grid_y - y_n          // distance to north side
+    d_s = y_s - grid_y          // distance to south side
+    wn = X[:, :, y_n, x_w]      // north-west point value
+    en = X[:, :, y_n, x_e]      // north-east point value
+    ws = X[:, :, y_s, x_w]      // south-west point value
+    es = X[:, :, y_s, x_e]      // south-east point value
+
+
+    output = wn * d_e * d_s + en * d_w * d_s
+           + ws * d_e * d_n + es * d_w * d_n
+
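+The two steps can be traced on a single-channel image with plain Python. This is a minimal sketch rather than the operator's implementation: ``bilinear_sample`` is a hypothetical helper, it handles one sampling point at a time, and treating out-of-range neighbours as zero is an assumption made here.
+

```python
import math


def bilinear_sample(image, gx, gy):
    """Sample a 2-D list `image` (H rows, W columns) at a normalized
    grid coordinate (gx, gy), each in [-1, 1]. Hypothetical helper."""
    h, w = len(image), len(image[0])
    # Step 1: scale normalized coordinates to [0, W-1] and [0, H-1].
    x = 0.5 * (gx + 1) * (w - 1)
    y = 0.5 * (gy + 1) * (h - 1)
    # Step 2: locate the four neighbours and the distances to them.
    x_w, y_n = math.floor(x), math.floor(y)
    x_e, y_s = x_w + 1, y_n + 1
    d_w, d_e = x - x_w, x_e - x
    d_n, d_s = y - y_n, y_s - y

    def pixel(row, col):
        # Assumption: neighbours outside the image contribute zero.
        if 0 <= row < h and 0 <= col < w:
            return image[row][col]
        return 0.0

    wn, en = pixel(y_n, x_w), pixel(y_n, x_e)
    ws, es = pixel(y_s, x_w), pixel(y_s, x_e)
    # Each corner is weighted by the distances to the opposite sides.
    return wn * d_e * d_s + en * d_w * d_s + ws * d_e * d_n + es * d_w * d_n


image = [[0.0, 1.0],
         [2.0, 3.0]]
print(bilinear_sample(image, 0.0, 0.0))    # centre of the 2x2 image
print(bilinear_sample(image, 1.0, 1.0))    # bottom-right corner
```

+At the centre all four distances are 0.5, so the result is the mean of the four pixels; at a corner the weights collapse onto that corner's pixel.
+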
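+Parameters:
+  - **x** (Variable): Input tensor, a 4-D Tensor of shape :math:`[N, C, H, W]` , where N is the batch size, C the number of channels, H the feature height and W the feature width. The data type is float32 or float64.
+  - **grid** (Variable): Input grid tensor, a 4-D Tensor of shape :math:`[N, H, W, 2]` , where N is the batch size, H the feature height and W the feature width. The data type is float32 or float64.
+  - **name** (str, optional) – See :ref:`api_guide_Name` for usage. Usually does not need to be set. Default: None.
+
+Returns: Variable(Tensor): the result of bilinearly interpolating the input X on the input grid, a 4-D Tensor of shape :math:`[N, C, H, W]`
+
+Return type: Variable, with the same data type as ``x``
+
+**Code Example:**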
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    # Usually used together with affine_grid
+    x = fluid.data(name='x', shape=[None, 10, 32, 32], dtype='float32')
+    theta = fluid.layers.data(name='theta', shape=[2, 3], dtype='float32')
+    grid = fluid.layers.affine_grid(theta=theta, out_shape=[3, 10, 32, 32])
+    out = fluid.layers.grid_sampler(x=x, grid=grid)
+
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/group_norm_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/group_norm_cn.rst
+.. _cn_api_fluid_layers_group_norm:
+
+group_norm
+-------------------------------
+
+
+.. py:function::  paddle.fluid.layers.group_norm(input, groups, epsilon=1e-05, param_attr=None, bias_attr=None, act=None, data_layout='NCHW', name=None)
+
+:api_attr: declarative programming mode (static graph)
+
+
+
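+Reference paper: `Group Normalization <https://arxiv.org/abs/1803.08494>`_
+
+Parameters:
+  - **input** (Variable): A 4-D Tensor with data type float32 or float64.
+  - **groups** (int): Number of groups the channels are divided into. The data type is int32.
+  - **epsilon** (float, optional): A small value added to the variance to avoid division by zero. The data type is float32. Default: 1e-05.
+  - **param_attr** (ParamAttr|bool, optional): Object specifying the attributes of the weight parameter. If ``param_attr`` is a bool, only False is supported, meaning there is no weight parameter. Defaults to None, meaning the default weight parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **bias_attr** (ParamAttr|bool, optional): Object specifying the attributes of the bias parameter. If ``bias_attr`` is a bool, only False is supported, meaning there is no bias parameter. Defaults to None, meaning the default bias parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **act** (str, optional): Activation applied to the output of group normalization.
+  - **data_layout** (str, optional): Data format of the input; the output uses the same format. It can be "NCHW" or "NHWC", where N is the batch size, C the number of channels, H the feature height and W the feature width. Default: "NCHW".
+  - **name** (str, optional): See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.
+
+Returns: a 4-D Tensor with the same data type and format as `input` .
+
+Return type: Variable
+
+Raises:
+    - ``ValueError`` - if ``data_layout`` is neither "NCHW" nor "NHWC".
+    - ``ValueError`` - if ``groups`` is less than 1, or greater than the number of input channels.
+    - ``ShapeError`` - if ``param_attr`` (Scale) or ``bias_attr`` (Bias) is not a 1-D Tensor.
+    - ``ShapeError`` - if the size of ``param_attr`` (Scale) or ``bias_attr`` (Bias) does not equal the number of input channels.
+
+**Code Example:**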
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    data = fluid.data(name='data', shape=[None, 8, 32, 32], dtype='float32')
+    x = fluid.layers.group_norm(input=data, groups=4)
+
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/hash_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/hash_cn.rst
+.. _cn_api_fluid_layers_hash:
+
+hash
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.hash(input, hash_size, num_hash=1, name=None)
+
+:alias_main: paddle.nn.functional.hash
+:alias: paddle.nn.functional.hash,paddle.nn.functional.lod.hash
+:old_api: paddle.fluid.layers.hash
+
+
+
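+This OP hashes the input into an integer whose value is less than the given ``hash_size`` . **Only LoDTensor input is supported.**
+
+The hash algorithm used by this OP is xxHash - `Extremely fast hash algorithm <https://github.com/Cyan4973/xxHash/tree/v0.6.5>`_
+
+
+Parameters:
+  - **input** (Variable) - A **2-D** ``LoDTensor`` . **The input must have exactly 2 dimensions.** The data type is int32 or int64. **Only LoDTensor is supported.**
+  - **hash_size** (int) - Size of the hash space. Output values stay within the range :math:`[0, hash\_size)` .
+  - **num_hash** (int) - Number of times to hash. Defaults to 1.
+  - **name** (str, optional) – See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.
+
+Returns: ``LoDTensor``
+
+Return type: Variable
+
+**Code Example:**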
+
+.. code-block:: python
+
+  import paddle.fluid as fluid
+  import numpy as np
+
+  place = fluid.core.CPUPlace()
+
+  # Build the network
+  x = fluid.data(name="x", shape=[2, 2], dtype="int32", lod_level=1)
+  res = fluid.layers.hash(name="res", input=x, hash_size=1000, num_hash=4)
+
+  # Create a CPU executor
+  exe = fluid.Executor(place)
+  exe.run(fluid.default_startup_program())
+
+  in1 = np.array([[1,2],[3,4]]).astype("int32")
+  print(in1)
+  x_i = fluid.create_lod_tensor(in1, [[0, 2]], place)
+  res = exe.run(fluid.default_main_program(), feed={'x':x_i}, fetch_list=[res], return_numpy=False)
+  print(np.array(res[0]))
+  # [[[722]
+  #   [407]
+  #   [337]
+  #   [395]]
+  #  [[603]
+  #   [590]
+  #   [386]
+  #   [901]]]
--- a/doc/paddle/api/paddle/fluid/layers/logical_and_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_and_cn.rst
+.. _cn_api_fluid_layers_logical_and:
+
+logical_and
+-------------------------------
+
+.. py:function:: paddle.logical_and(x, y, out=None, name=None)
+
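+This OP computes the element-wise logical AND of ``x`` and ``y`` .
+
+.. math::
+       Out = X \&\& Y
+
+.. note::
+    ``paddle.logical_and`` supports broadcasting. For details, see :ref:`cn_user_guide_broadcasting` .
+
+Parameters:
+        - **x** (Tensor) - Input `Tensor` with data type bool.
+        - **y** (Tensor) - Input `Tensor` with data type bool.
+        - **out** (Tensor, optional) - `Tensor` that receives the operator's output; it can be any Tensor already created in the program. Defaults to None, in which case a new Tensor is created to hold the result.
+        - **name** (str, optional) - Name of the operation. Defaults to None. For more information, see :ref:`api_guide_Name` .
+
+Returns: ``Tensor`` holding the result of the operation. Its shape is the broadcast shape of ``x`` and ``y`` (identical to that of ``x`` when both inputs have the same shape).
+
+**Code Example:**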
+
+.. code-block:: python
+
+     import paddle
+     import numpy as np
+
+     paddle.disable_static()
+     x_data = np.array([True], dtype=bool)
+     y_data = np.array([True, False, True, False], dtype=bool)
+     x = paddle.to_tensor(x_data)
+     y = paddle.to_tensor(y_data)
+     res = paddle.logical_and(x, y)
+     print(res.numpy()) # [True False True False]
--- a/doc/paddle/api/paddle/fluid/layers/logical_not_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_not_cn.rst
+.. _cn_api_fluid_layers_logical_not:
+
+logical_not
+-------------------------------
+
+.. py:function:: paddle.logical_not(x, out=None, name=None)
+
+:alias_main: paddle.logical_not
+:alias: paddle.logical_not, paddle.tensor.logical_not, paddle.tensor.logic.logical_not
+:old_api: paddle.fluid.layers.logical_not
+
+
+
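+This OP computes the element-wise logical NOT of the Variable ``x`` .
+
+.. math::
+        Out = !X
+
+Parameters:
+        - **x** (Variable) - Input of the logical NOT operation, a Variable whose data type must be bool.
+        - **out** (Variable, optional) - Variable that receives the operator's output; it can be any Variable already created in the program. Defaults to None, in which case a new Variable is created to hold the result.
+        - **name** (str, optional) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Defaults to None.
+
+Returns: a Variable with the same shape and data type as ``x`` .
+
+Return type: Variable
+
+**Code Example:**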
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.enable_imperative()
+    x_data = np.array([True, False, True, False], dtype=bool)
+    x = paddle.imperative.to_variable(x_data)
+    res = paddle.logical_not(x)
+    print(res.numpy()) # [False  True False  True]
--- a/doc/paddle/api/paddle/fluid/layers/logical_or_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_or_cn.rst
+.. _cn_api_fluid_layers_logical_or:
+
+logical_or
+-------------------------------
+
+.. py:function:: paddle.logical_or(x, y, out=None, name=None)
+
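+This OP computes the element-wise logical OR of ``x`` and ``y`` .
+
+.. math::
+        Out = X || Y
+
+.. note::
+    ``paddle.logical_or`` supports broadcasting. For details, see :ref:`cn_user_guide_broadcasting` .
+
+Parameters:
+        - **x** (Tensor) - Input `Tensor` with data type bool.
+        - **y** (Tensor) - Input `Tensor` with data type bool.
+        - **out** (Tensor, optional) - `Tensor` that receives the operator's output; it can be any Tensor already created in the program. Defaults to None, in which case a new Tensor is created to hold the result.
+        - **name** (str, optional) - Name of the operation. Defaults to None. For more information, see :ref:`api_guide_Name` .
+
+Returns: ``Tensor`` holding the result of the operation. Its shape is the broadcast shape of ``x`` and ``y`` (identical to that of ``x`` when both inputs have the same shape).
+
+**Code Example:**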
+
+.. code-block:: python
+
+     import paddle
+     import numpy as np
+
+     paddle.disable_static()
+     x_data = np.array([True, False], dtype=bool).reshape(2, 1)
+     y_data = np.array([True, False, True, False], dtype=bool).reshape(2, 2)
+     x = paddle.to_tensor(x_data)
+     y = paddle.to_tensor(y_data)
+     res = paddle.logical_or(x, y)
+     print(res.numpy()) # [[ True  True] [ True False]]
--- a/doc/paddle/api/paddle/fluid/layers/logical_xor_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_xor_cn.rst
+.. _cn_api_fluid_layers_logical_xor:
+
+logical_xor
+-------------------------------
+
+.. py:function:: paddle.logical_xor(x, y, out=None, name=None)
+
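+This OP computes the element-wise logical XOR of ``x`` and ``y`` .
+
+.. math::
+        Out = (X || Y) \&\& !(X \&\& Y)
+
+.. note::
+    ``paddle.logical_xor`` supports broadcasting. For details, see :ref:`cn_user_guide_broadcasting` .
+
+Parameters:
+        - **x** (Tensor) - Input `Tensor` with data type bool.
+        - **y** (Tensor) - Input `Tensor` with data type bool.
+        - **out** (Tensor, optional) - `Tensor` that receives the operator's output; it can be any Tensor already created in the program. Defaults to None, in which case a new Tensor is created to hold the result.
+        - **name** (str, optional) - Name of the operation. Defaults to None. For more information, see :ref:`api_guide_Name` .
+
+Returns: ``Tensor`` holding the result of the operation. Its shape is the broadcast shape of ``x`` and ``y`` (identical to that of ``x`` when both inputs have the same shape).
+
+**Code Example:**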
+
+.. code-block:: python
+
+      import paddle
+      import numpy as np
+
+      paddle.disable_static()
+      x_data = np.array([True, False], dtype=bool).reshape([2, 1])
+      y_data = np.array([True, False, True, False], dtype=bool).reshape([2, 2])
+      x = paddle.to_tensor(x_data)
+      y = paddle.to_tensor(y_data)
+      res = paddle.logical_xor(x, y)
+      print(res.numpy()) # [[False,  True], [ True, False]]
--- a/doc/paddle/api/paddle/fluid/layers/lstm_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/lstm_cn.rst
+.. _cn_api_fluid_layers_lstm:
+
+lstm
+-------------------------------
+
+
+.. py:function::  paddle.fluid.layers.lstm(input, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=0.0, is_bidirec=False, is_test=False, name=None, default_initializer=None, seed=-1)
+
+:api_attr: declarative programming mode (static graph)
+
+
+
+.. note::
+    This OP runs on GPU devices only.
+
+This OP implements the LSTM (Long Short-Term Memory) operation - `Hochreiter, S., & Schmidhuber, J. (1997) <https://www.bioinf.jku.at/publications/older/2604.pdf>`_.
+
+The implementation of this OP does not include diagonal/peephole connections; see `Gers, F. A., & Schmidhuber, J. (2000) <ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf>`_.
+To use peephole connections, use :ref:`cn_api_fluid_layers_dynamic_lstm` instead.
+
+For each time step in the sequence, this OP computes:
+
+.. math::
+  i_t = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + b_{x_i} + b_{h_i})
+.. math::
+  f_t = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + b_{x_f} + b_{h_f})
+.. math::
+  o_t = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + b_{x_o} + b_{h_o})
+.. math::
+  \widetilde{c_t} = tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_{x_c} + b_{h_c})
+.. math::
+  c_t = f_t \odot c_{t-1} + i_t \odot \widetilde{c_t}
+.. math::
+  h_t = o_t \odot tanh(c_t)
+
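+The symbols in the formulas are:
+      - :math:`x_{t}` is the input at time step :math:`t`
+      - :math:`h_{t}` is the hidden state at time step :math:`t`
+      - :math:`h_{t-1}, c_{t-1}` are the hidden and cell states of the previous time step
+      - :math:`\widetilde{c_t}` is the candidate cell state
+      - :math:`i_t` , :math:`f_t` and :math:`o_t` are the input gate, forget gate and output gate
+      - :math:`W` denotes a weight (for example, :math:`W_{ix}` is the weight that linearly transforms the input :math:`x_{t}` when computing the input gate :math:`i_t` )
+      - :math:`b` denotes a bias (for example, :math:`b_{x_i}` and :math:`b_{h_i}` are the biases of the input gate)
+      - :math:`\sigma` is the nonlinear activation function of the gates, sigmoid by default
+      - :math:`\odot` is the Hadamard product: for two matrices of the same shape, elements at the same position are multiplied, giving another matrix of the same shape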
+
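+A single time step of these formulas can be written out for scalar inputs in plain Python. This is a sketch for reading the equations, not the GPU kernel behind the OP: ``lstm_step`` and the dictionary layout of ``w`` and ``b`` are hypothetical, and each ``b[g]`` stands for the already summed bias :math:`b_{x_g} + b_{h_g}` .
+

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x_t, h_prev, c_prev, w, b):
    """One scalar LSTM step without peephole connections.

    Hypothetical layout: w[g] is the pair (W_gx, W_gh) and b[g] the
    summed bias for g in {"i", "f", "o", "c"}.
    """
    def pre(g):
        # Linear part shared by every gate: W_gx * x_t + W_gh * h_{t-1} + b_g
        return w[g][0] * x_t + w[g][1] * h_prev + b[g]

    i_t = sigmoid(pre("i"))            # input gate
    f_t = sigmoid(pre("f"))            # forget gate
    o_t = sigmoid(pre("o"))            # output gate
    c_tilde = math.tanh(pre("c"))      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# With all-zero weights every gate is 0.5 and the candidate is 0.
zero_w = {g: (0.0, 0.0) for g in "ifoc"}
zero_b = {g: 0.0 for g in "ifoc"}
h_t, c_t = lstm_step(1.0, 0.0, 2.0, zero_w, zero_b)
print(h_t, c_t)
```

+In the zero-weight case the cell state is simply halved by the forget gate, which makes the data flow through the last two formulas easy to follow.
+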
+Parameters:
+  - **input** ( :ref:`api_guide_Variable` ) - Input tensor of the LSTM, a 3-D Tensor of shape :math:`[batch\_size, seq\_len, input\_dim]` , where seq_len is the sequence length and input_dim is the word-embedding dimension of the sequence. The data type is float32 or float64.
+  - **init_h** ( :ref:`api_guide_Variable` ) – Initial hidden state of the LSTM, a 3-D Tensor of shape :math:`[num\_layers, batch\_size, hidden\_size]` , where num_layers is the total number of LSTM layers and hidden_size is the hidden dimension. If is_bidirec = True, the shape should be :math:`[num\_layers*2, batch\_size, hidden\_size]` . The data type is float32 or float64.
+  - **init_c** ( :ref:`api_guide_Variable` ) - Initial cell state of the LSTM, a 3-D Tensor of shape :math:`[num\_layers, batch\_size, hidden\_size]` , where num_layers is the total number of LSTM layers and hidden_size is the hidden dimension. If is_bidirec = True, the shape should be :math:`[num\_layers*2, batch\_size, hidden\_size]` . The data type is float32 or float64.
+  - **max_len** (int) – Maximum sequence length of the LSTM. The sequence length of the input tensor must not exceed max_len.
+  - **hidden_size** (int) - Dimension of the LSTM hidden state.
+  - **num_layers** (int) – Total number of LSTM layers. For example, if set to 2, two LSTMs are stacked and the output of the first LSTM is the input of the second.
+  - **dropout_prob** (float, optional) – Dropout ratio. Dropout is applied only between rnn layers, not between time steps, and is not applied to the output of the last rnn layer. Defaults to 0.0.
+  - **is_bidirec** (bool, optional) – Whether the LSTM is bidirectional. Defaults to False.
+  - **is_test** (bool, optional) – Whether the OP runs in the test phase. Defaults to False.
+  - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.
+  - **default_initializer** (Initializer, optional) – Initializer used for the weights. If None, default initialization is used. Defaults to None.
+  - **seed** (int, optional) – Seed for dropout in the LSTM. If -1, dropout uses a random seed. Defaults to -1.
+
+Returns: a tuple of three Tensors produced by the LSTM operation:
+
+- rnn_out: Tensor with the LSTM hidden output. It has the same data type as the input and shape :math:`[batch\_size, seq\_len, hidden\_size]` . If ``is_bidirec`` is True, the shape is :math:`[batch\_size, seq\_len, hidden\_size*2]`
+- last_h: Tensor with the hidden state of the last LSTM step. It has the same data type as the input and shape :math:`[num\_layers, batch\_size, hidden\_size]` . If ``is_bidirec`` is True, the shape is :math:`[num\_layers*2, batch\_size, hidden\_size]`
+- last_c: Tensor with the cell state of the last LSTM step. It has the same data type as the input and shape :math:`[num\_layers, batch\_size, hidden\_size]` . If ``is_bidirec`` is True, the shape is :math:`[num\_layers*2, batch\_size, hidden\_size]`
+
+Return type: tuple ( :ref:`api_guide_Variable` , :ref:`api_guide_Variable` , :ref:`api_guide_Variable` )
+
+**Code Example:**
+
+.. code-block:: python
+
+  import paddle.fluid as fluid
+  import paddle.fluid.layers as layers
+
+  emb_dim = 256
+  vocab_size = 10000
+  data = fluid.layers.data(name='x', shape=[-1, 100, 1],
+                 dtype='int64')
+  emb = fluid.layers.embedding(input=data, size=[vocab_size, emb_dim], is_sparse=True)
+  batch_size = 20
+  max_len = 100
+  dropout_prob = 0.2
+  hidden_size = 150
+  num_layers = 1
+  init_h = layers.fill_constant( [num_layers, batch_size, hidden_size], 'float32', 0.0 )
+  init_c = layers.fill_constant( [num_layers, batch_size, hidden_size], 'float32', 0.0 )
+
+  rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob)
+  rnn_out.shape  # (-1, 100, 150)
+  last_h.shape  # (1, 20, 150)
+  last_c.shape  # (1, 20, 150)
+
+
+
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/pad2d_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/pad2d_cn.rst
+.. _cn_api_fluid_layers_pad2d:
+
+pad2d
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.pad2d(input, paddings=[0, 0, 0, 0], mode='constant', pad_value=0.0, data_format='NCHW', name=None)
+
+:alias_main: paddle.nn.functional.pad2d
+:alias: paddle.nn.functional.pad2d,paddle.nn.functional.common.pad2d
+:old_api: paddle.fluid.layers.pad2d
+
+
+
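+This OP applies 2-D ``pad`` to input according to the paddings and mode attributes.
+
+Parameters:
+  - **input** (Variable) - A 4-D Tensor of type float32 in `[N, C, H, W]` or `[N, H, W, C]` format.
+  - **paddings** (Variable | List[int32]) - Padding sizes. If paddings is a List, it must contain four integers `[padding_top, padding_bottom, padding_left, padding_right]` .
+    If paddings is a Variable, it is a 1-D Tensor of type int32 with shape `[4]` . Defaults to `[0,0,0,0]` .
+  - **mode** (str) - One of three padding modes: `'constant'` (default), `'reflect'` and `'edge'` . `'constant'` pads with the constant `pad_value` , `'reflect'` pads with values of input mirrored about its border values, and `'edge'` pads with the border values of input. See the examples below for the exact results. Defaults to `'constant'` .
+  - **pad_value** (float32) - Value used to fill the padded region in `'constant'` mode. Defaults to 0.0.
+  - **data_format** (str)  - Format of input, either `'NCHW'` or `'NHWC'` . Defaults to `'NCHW'` .
+  - **name** (str, optional) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Defaults to None.
+
+Returns: the result of applying 2-D ``pad`` to input, a 4-D Tensor with the same data type as input.
+
+Return type: Variable
+
+**Example**: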
+
+.. code-block:: text
+
+      Input = [[[[1., 2., 3.],
+                 [4., 5., 6.]]]]
+
+      Case 0:
+          paddings = [0, 1, 2, 3],
+          mode = 'constant'
+          pad_value = 0
+          Out = [[[[0., 0., 1., 2., 3., 0., 0., 0.],
+                   [0., 0., 4., 5., 6., 0., 0., 0.],
+                   [0., 0., 0., 0., 0., 0., 0., 0.]]]]
+
+      Case 1:
+          paddings = [0, 1, 2, 1],
+          mode = 'reflect'
+          Out = [[[[3., 2., 1., 2., 3., 2.],
+                   [6., 5., 4., 5., 6., 5.],
+                   [3., 2., 1., 2., 3., 2.]]]]
+
+      Case 2:
+          paddings = [0, 1, 2, 1],
+          mode = 'edge'
+          Out = [[[[1., 1., 1., 2., 3., 3.],
+                   [4., 4., 4., 5., 6., 6.],
+                   [4., 4., 4., 5., 6., 6.]]]]
+
+
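+The three modes can be reproduced on a single 2-D plane with plain Python. The function below is a hypothetical re-implementation for illustration, not the Paddle operator; it works on nested lists and assumes each padding is smaller than the corresponding input size in ``'reflect'`` mode.
+

```python
def pad2d(mat, paddings, mode="constant", pad_value=0.0):
    """Pad a 2-D list `mat`; paddings = [top, bottom, left, right].

    Hypothetical pure-Python sketch of the three padding modes.
    """
    if mode not in ("constant", "reflect", "edge"):
        raise ValueError("unsupported mode: %r" % (mode,))
    top, bottom, left, right = paddings
    h, w = len(mat), len(mat[0])

    def source_index(idx, size):
        # Map an output coordinate to an input index (None means pad_value).
        if 0 <= idx < size:
            return idx
        if mode == "edge":
            return min(max(idx, 0), size - 1)
        if mode == "reflect":
            # Mirror about the border element; assumes padding < size.
            return -idx if idx < 0 else 2 * (size - 1) - idx
        return None

    out = []
    for r in range(-top, h + bottom):
        sr = source_index(r, h)
        row = []
        for c in range(-left, w + right):
            sc = source_index(c, w)
            row.append(pad_value if sr is None or sc is None else mat[sr][sc])
        out.append(row)
    return out


x = [[1., 2., 3.],
     [4., 5., 6.]]
for row in pad2d(x, [0, 1, 2, 1], mode="reflect"):
    print(row)
```

+Running the same call with ``mode="edge"`` or with ``paddings=[0, 1, 2, 3]`` and ``mode="constant"`` yields the planes listed in Case 2 and Case 0.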
+
+**Code Example:**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    data = fluid.data(name='data', shape=[None, 3, 32, 32], dtype='float32')
+    result = fluid.layers.pad2d(input=data, paddings=[0, 1, 2, 3], mode='reflect')
--- a/doc/paddle/api/paddle/fluid/layers/pow_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/pow_cn.rst
+.. _cn_api_fluid_layers_pow:
+
+pow
+-------------------------------
+
+.. py:function:: paddle.pow(x, exponent, name=None)
+
+
+
+
+This OP is the power activation operator:
+
+.. math::
+
+    out = x^{exponent}
+
+**Note: to apply an elementwise_pow operation to the input, use** :ref:`cn_api_fluid_layers_elementwise_pow` .
+
+Parameters:
+    - **x** (Variable) - A multi-dimensional ``Variable`` with data type ``float32`` or ``float64`` .
+    - **exponent** (float32|Variable) - A ``float32`` value or a ``Variable`` of shape [1] with data type ``float32`` .
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually does not need to be set. Default: ``None`` .
+
+Returns: a ``Variable`` with the same shape and data type as the input ``x`` .
+
+Return type: Variable.
+
+
+**Code Example:**
+
+.. code-block:: python
+
+            import paddle
+            import numpy as np
+            paddle.enable_imperative()
+            
+            # example 1: exponent is a float
+            x_data = np.array([1, 2, 3])
+            exponent = 2
+            x = paddle.imperative.to_variable(x_data)
+            res = paddle.pow(x, exponent)
+            print(res.numpy()) # [1 4 9]
+            
+            # example 2: exponent is a Variable
+            exponent = paddle.fill_constant(shape=[1], value=2, dtype='float32')
+            res = paddle.pow(x, exponent)
+            print(res.numpy()) # [1 4 9]
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/rank_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/rank_cn.rst
+.. _cn_api_fluid_layers_rank:
+
+rank
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.rank(input)
+
+:alias_main: paddle.rank
+:alias: paddle.rank,paddle.tensor.rank,paddle.tensor.attribute.rank
+:old_api: paddle.fluid.layers.rank
+
+
+
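+This OP computes the number of dimensions (rank) of the input Tensor.
+
+Parameters:
+    - **input** (Variable) — A multi-dimensional Tensor of shape :math:`[N_1, N_2, ..., N_k]` . Any data type is accepted.
+
+Returns: the rank of the input Tensor, as a 0-D Tensor.
+
+Return type: Variable, with data type int32.
+
+**Code Example**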
+
+.. code-block:: python
+
+       import paddle.fluid as fluid
+       input = fluid.data(
+            name="input", shape=[3, 100, 100], dtype="float32")
+       rank = fluid.layers.rank(input) # the input has 3 dimensions, so rank is 3
+
+
--- a/doc/paddle/api/paddle/fluid/layers/rank_loss_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/rank_loss_cn.rst
+.. _cn_api_fluid_layers_rank_loss:
+
+rank_loss
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.rank_loss(label, left, right, name=None)
+
+:alias_main: paddle.nn.functional.rank_loss
+:alias: paddle.nn.functional.rank_loss,paddle.nn.functional.loss.rank_loss
+:old_api: paddle.fluid.layers.rank_loss
+
+
+
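+This OP implements the ranking loss layer of the RankNet model. RankNet is a pairwise ranking model whose training samples consist of a pair of documents (call them A and B). The label (call it P) indicates whether A is ranked higher than B. For more details, see `RankNet <http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf>`_
+
+The ranking loss layer has three inputs: :math:`o_i` , :math:`o_j` and :math:`\tilde{P_{ij}}` , which are the scores output by RankNet for documents A and B and the value of label P. The inputs are batched (batch size of at least 1). The label P can take values in {0, 1} or {0, 0.5, 1}, where 0.5 means the two documents are ranked equally. The ranking loss :math:`C_{i,j}` of the input data is computed as follows: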
+
+.. math::
+
+    C_{i,j} &= -\tilde{P_{ij}} * o_{i,j} + \log(1 + e^{o_{i,j}})
+
+    o_{i,j} &=  o_i - o_j
+
+    \tilde{P_{ij}} &= \left \{0, 0.5, 1 \right \} \ or \ \left \{0, 1 \right \}
+
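+Parameters:
+    - **label** (Variable): A 2-D ``Tensor`` of shape :math:`[batch,1]` with data type float32, where batch is the batch size. It indicates whether A is ranked higher than B.
+    - **left** (Variable): A 2-D ``Tensor`` of shape :math:`[batch,1]` with data type float32, where batch is the batch size. It holds the scores output by RankNet for document A.
+    - **right** (Variable): A 2-D ``Tensor`` of shape :math:`[batch,1]` with data type float32, where batch is the batch size. It holds the scores output by RankNet for document B.
+    - **name** (str, optional): See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.
+
+Returns: a ``Tensor`` with the output of the ranking loss layer. The data type is float32 and the shape is :math:`[batch,1]` .
+
+Return type: Variable
+
+Raises:
+    - ``ValueError`` - if at least one of ``label`` , ``left`` and ``right`` is not of type ``Variable`` .
+
+**Code Example**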
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    label = fluid.layers.data(name="label", shape=[-1, 1], dtype="float32")
+    left = fluid.layers.data(name="left", shape=[-1, 1], dtype="float32")
+    right = fluid.layers.data(name="right", shape=[-1, 1], dtype="float32")
+    out = fluid.layers.rank_loss(label, left, right)
+
--- a/doc/paddle/api/paddle/fluid/layers/reshape_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/reshape_cn.rst
+.. _cn_api_fluid_layers_reshape:
+
+reshape
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.reshape(x, shape, actual_shape=None, act=None, inplace=False, name=None)
+
+
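+This OP changes the shape of the input ``x`` while keeping its data unchanged.
+
+The target shape can be given by ``shape`` or ``actual_shape`` . When both are specified, ``actual_shape`` takes precedence over ``shape`` , but in that case ``shape`` must be a list or tuple of integers, and it must still be set correctly at compile time so that shape inference works.
+
+Two special values can be used when specifying the target shape:
+
+.. code-block:: text
+
+  1. -1 means the size of this dimension is inferred from the total number of elements of x and the remaining dimensions. Therefore, at most one dimension can be set to -1.
+  2. 0 means the size of this dimension is copied from the corresponding dimension of x. Therefore, the index of a 0 in shape must be smaller than the number of dimensions of x.
+
+
+Some examples to illustrate them:
+
+.. code-block:: text
+
+  1. Given a 3-D tensor x of shape [2,4,6] and the target shape [6,8], x is reshaped into a 2-D tensor of shape [6,8], and the data of x is unchanged.
+  2. Given a 3-D tensor x of shape [2,4,6] and the target shape [2,3,-1,2], x is reshaped into a 4-D tensor of shape [2,3,4,2], and the data of x is unchanged. Here one dimension of the target shape is set to -1, and its size is inferred from the total number of elements of x and the remaining dimensions.
+  3. Given a 3-D tensor x of shape [2,4,6] and the target shape [-1,0,3,2], x is reshaped into a 4-D tensor of shape [2,4,3,2], and the data of x is unchanged. Here the dimension at the position of 0 is copied from the corresponding dimension of x, and the dimension at the position of -1 is inferred from the total number of elements of x and the remaining dimensions.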
+
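The rules for -1 and 0 can be sketched in plain Python as follows. The function `infer_reshape` is a hypothetical helper written for this illustration; it only mimics the shape arithmetic described above and is not how Paddle performs shape inference internally.

```python
def infer_reshape(x_shape, target):
    """Resolve the 0 and -1 entries of `target` against `x_shape`."""
    total = 1
    for d in x_shape:
        total *= d

    out = []
    unknown = None                       # position of the single -1, if any
    for i, d in enumerate(target):
        if d == 0:
            if i >= len(x_shape):
                raise ValueError("a 0 must index an existing dimension of x")
            d = x_shape[i]               # copy the dimension from x
        elif d == -1:
            if unknown is not None:
                raise ValueError("at most one dimension may be -1")
            unknown = i
        out.append(d)

    known = 1                            # product of all resolved dimensions
    for i, d in enumerate(out):
        if i != unknown:
            known *= d

    if unknown is None:
        if known != total:
            raise ValueError("target shape changes the number of elements")
    else:
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        out[unknown] = total // known    # infer the remaining dimension
    return out

print(infer_reshape([2, 4, 6], [6, 8]))
print(infer_reshape([2, 4, 6], [2, 3, -1, 2]))
print(infer_reshape([2, 4, 6], [-1, 0, 3, 2]))
```

The three calls correspond to the three examples listed above.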
+.. warning::
+    The ``actual_shape`` parameter will be removed in a future version; use only ``shape`` to specify the target shape.
+
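+Parameters:
+  - **x** (Tensor) - An N-D ``Tensor`` with data type ``float32``, ``float64``, ``int32`` or ``int64``.
+  - **shape** (list|tuple|Tensor) - The target shape, with data type ``int32``. At most one dimension of the target shape can be -1. If ``shape`` is a list or tuple, its elements can be integers or ``Tensor`` objects of shape [1]. If ``shape`` is a ``Tensor``, it must be a 1-D ``Tensor``.
+  - **actual_shape** (Tensor, optional) - A 1-D ``Tensor``. Default: ``None``. If ``actual_shape`` is provided, it takes precedence over ``shape``, and ``shape`` must then be a list or tuple of integers. Update note: ``actual_shape`` will be removed in a future version and replaced by ``shape``.
+  - **act** (str, optional) - A non-linear activation applied to the reshaped input. See :ref:`api_guide_activations` for the available activation types. Default: ``None``.
+  - **inplace** (bool, optional) - If ``inplace`` is ``True``, the input and output of ``layers.reshape`` are the same variable; otherwise they are different variables. Default: ``False``. Note that if ``x`` is the input of more than one OP, ``inplace`` must be False.
+  - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Default: ``None``.
+
+Returns: ``Tensor``. The reshaped ``Tensor``, with the same data type as ``x``. If ``inplace`` is ``False``, a new variable is returned; otherwise the input variable ``x`` itself is changed. If ``act`` is ``None``, the reshaped variable is returned directly; otherwise the variable after the activation function is returned.
+
+
+**Code example**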
+
+.. code-block:: python
+
+  import paddle.fluid as fluid
+
+  # example 1:
+  # attr shape is a list which doesn't contain Tensors.
+  data_1 = fluid.data(
+    name='data_1', shape=[2, 4, 6], dtype='float32')
+  reshaped_1 = fluid.layers.reshape(
+    x=data_1, shape=[-1, 0, 3, 2], inplace=True)
+  # the shape of reshaped_1 is [2,4,3,2].
+
+  # example 2:
+  # attr shape is a list which contains Tensors.
+  data_2 = fluid.layers.fill_constant([2,25], "int32", 3)
+  dim = fluid.layers.fill_constant([1], "int32", 5)
+  reshaped_2 = fluid.layers.reshape(data_2, shape=[dim, 10])
+  # the shape of reshaped_2 is [5,10].
+
+  # example 3:
+  data_3 = fluid.data(
+    name="data_3", shape=[2,4,6], dtype='float32')
+  reshaped_3 = fluid.layers.reshape(x=data_3, shape=[6,8])
+  # the shape of reshaped_3 is [6,8].
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/sequence_mask_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/sequence_mask_cn.rst
+.. _cn_api_fluid_layers_sequence_mask:
+
+sequence_mask
+-------------------------------
+
+.. py:function::  paddle.fluid.layers.sequence_mask(x, maxlen=None, dtype='int64', name=None)
+
+
+
+
+This layer outputs a mask computed from the input ``x`` and ``maxlen`` , with data type ``dtype`` .
+
+Suppose x is a tensor of shape ``[d_1, d_2, …, d_n]`` . The output y is then a mask of shape ``[d_1, d_2, …, d_n, maxlen]`` , where:
+
+.. math::
+
+  y(i_1, i_2,..., i_n, j) = (j < x(i_1, i_2,..., i_n))
+
+For example:
+
+::
+
+    Given the input:
+      x = [3, 1, 1, 0]  maxlen = 4
+
+    the output tensor is:
+      mask = [[1, 1, 1, 0],
+              [1, 0, 0, 0],
+              [1, 0, 0, 0],
+              [0, 0, 0, 0]]
+        
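The masking rule can be checked with a small pure-Python sketch for the 1-D case. The function below only illustrates the formula `y(i, j) = (j < x(i))`; it reuses the name `sequence_mask` for readability but is not the Paddle layer, and the fallback for `maxlen` follows the parameter description in this document.

```python
def sequence_mask(lengths, maxlen=None):
    """Build a 0/1 mask for a 1-D list of sequence lengths."""
    if maxlen is None:
        maxlen = max(lengths)            # default: largest element of x
    # y[i][j] is 1 when j < lengths[i], otherwise 0
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]

mask = sequence_mask([3, 1, 1, 0], maxlen=4)
for row in mask:
    print(row)
```

With the input from the example above, the printed rows match the listed mask.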
+
+
+
+
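+Parameters:
+  - **x** (Variable) - The input tensor, a Tensor or LoDTensor of shape ``[d_1, d_2, …, d_n]`` whose elements are integers less than or equal to ``maxlen`` .
+  - **maxlen** (int, optional) - The maximum sequence length. Default: None, in which case ``maxlen`` is the maximum of all elements of ``x`` .
+  - **dtype** (np.dtype|core.VarDesc.VarType|str, optional) - The data type of the output. Default: ``int64`` .
+  - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Default: None.
+
+Returns: The mask tensor, a Tensor or LoDTensor of shape ``[d_1, d_2, …, d_n, maxlen]`` , with the data type given by ``dtype`` . float32, float64, int32 and int64 are supported; the default is int64.
+
+Return type: Variable
+
+**Code example**: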
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import paddle.fluid.layers as layers
+    
+    x = fluid.data(name='x', shape=[10], dtype='float32', lod_level=1)
+    mask = layers.sequence_mask(x=x)
+
+
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/layers/unbind_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/unbind_cn.rst
+.. _cn_api_paddle_tensor_unbind:
+
+unbind
+-------------------------------
+
+.. py:function:: paddle.tensor.unbind(input, axis=0)
+
+:alias_main: paddle.unbind
+:alias: paddle.unbind,paddle.tensor.unbind,paddle.tensor.manipulation.unbind
+
+
+
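+This OP splits the input Tensor along the specified dimension into multiple sub-Tensors, each with that dimension removed.
+
+**Parameters**:
+       - **input** (Variable) - The input variable, a multi-dimensional Tensor with data type float32, float64, int32 or int64.
+       - **axis** (int32|int64, optional) - The dimension along which to split, with data type int32 or int64. If axis < 0, the dimension used is rank(input) + axis. Default: 0.
+
+**Returns**: A list of the Tensors obtained by splitting.
+
+**Return type**: list(Variable), with data type int32, int64, float32 or float64.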
+
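To make the shape behaviour concrete, here is a sketch of the same operation on nested Python lists. Both `shape_of` and this `unbind` are hypothetical helpers for illustration only; they assume a regular, non-empty nested list and do not use Paddle.

```python
def shape_of(x):
    """Shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return shape

def unbind(x, axis=0):
    """Split nested list `x` along `axis`, dropping that dimension."""
    if axis < 0:
        axis += len(shape_of(x))         # rank(input) + axis
    if axis == 0:
        return list(x)
    # Unbind every slice along the inner axis, then regroup by position.
    parts = [unbind(sub, axis - 1) for sub in x]
    return [[p[k] for p in parts] for k in range(len(parts[0]))]

# A [3, 4, 5] "tensor" filled with zeros.
x = [[[0.0] * 5 for _ in range(4)] for _ in range(3)]
print(len(unbind(x, axis=0)), shape_of(unbind(x, axis=0)[0]))  # 3 [4, 5]
print(len(unbind(x, axis=1)), shape_of(unbind(x, axis=1)[0]))  # 4 [3, 5]
```

Splitting along `axis=-1` is treated as `axis=2` here, giving 5 pieces of shape `[3, 4]`.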
+**Code example**:
+
+.. code-block:: python
+    
+    import paddle
+    # input is a variable whose shape is [3, 4, 5]
+    input = paddle.fluid.data(
+        name="input", shape=[3, 4, 5], dtype="float32")
+    [x0, x1, x2] = paddle.tensor.unbind(input, axis=0)
+    # x0.shape [4, 5]
+    # x1.shape [4, 5]
+    # x2.shape [4, 5]
+    [x0, x1, x2, x3] = paddle.tensor.unbind(input, axis=1)
+    # x0.shape [3, 5]
+    # x1.shape [3, 5]
+    # x2.shape [3, 5]
+    # x3.shape [3, 5]
--- a/doc/paddle/api/paddle/fluid/optimizer/Adam_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/Adam_cn.rst
+.. _cn_api_paddle_optimizer_Adam:
+
+Adam
+-------------------------------
+
+.. py:class:: paddle.optimizer.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, parameters=None, weight_decay=None, grad_clip=None, name=None, lazy_mode=False)
+
+
+
+
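+The Adam optimizer comes from Section 2 of the `Adam paper <https://arxiv.org/abs/1412.6980>`_ . It uses first-moment and second-moment estimates of the gradient to dynamically adjust the learning rate of each parameter.
+
+Its parameter update is computed as follows: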
+
+.. math::
+
+    t = t + 1
+
+    moment\_1\_out = \beta_1 * moment\_1 + (1 - \beta_1) * grad
+
+    moment\_2\_out = \beta_2 * moment\_2 + (1 - \beta_2) * grad * grad
+
+    learning\_rate = learning\_rate * \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t}
+
+    param\_out = param - learning\_rate * \frac{moment\_1\_out}{\sqrt{moment\_2\_out} + \epsilon}
+
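The update rule can be traced on a single scalar parameter with the sketch below. `adam_step` is a hypothetical helper, not a Paddle API, and it assumes that the freshly updated moments are the ones used in the final parameter update.

```python
import math

def adam_step(param, grad, m1, m2, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch)."""
    t += 1
    m1 = beta1 * m1 + (1 - beta1) * grad            # first-moment estimate
    m2 = beta2 * m2 + (1 - beta2) * grad * grad     # second-moment estimate
    # bias-corrected step size
    lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    param = param - lr_t * m1 / (math.sqrt(m2) + eps)
    return param, m1, m2, t

# First step from zero-initialised moments.
p, m1, m2, t = adam_step(1.0, 0.5, 0.0, 0.0, 0, lr=0.1)
print(p, m1, m2, t)
```

On the first step the bias correction makes the update close to `lr * sign(grad)`, so the parameter moves from 1.0 to roughly 0.9 here.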
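+Related paper: `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_
+
+Parameters:
+    - **learning_rate** (float|_LRScheduler) - The learning rate used to update parameters. It can be a float value or an _LRScheduler instance. Default: 0.001.
+    - **beta1** (float|Tensor, optional) - The exponential decay rate of the first-moment estimate. It can be a float or a Tensor of shape [1] with data type float32. Default: 0.9.
+    - **beta2** (float|Tensor, optional) - The exponential decay rate of the second-moment estimate. It can be a float or a Tensor of shape [1] with data type float32. Default: 0.999.
+    - **epsilon** (float, optional) - A small float value added for numerical stability. Default: 1e-08.
+    - **parameters** (list, optional) - The parameters to be optimized. This argument is required in dygraph mode; in static graph mode the default is None, in which case all parameters are optimized.
+    - **weight_decay** (float|WeightDecayRegularizer, optional) - The regularization method. It can be a float L2 regularization coefficient or a regularization strategy: :ref:`cn_api_fluid_regularizer_L1Decay` or
+      :ref:`cn_api_fluid_regularizer_L2Decay` . If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr` , the setting here is ignored for that parameter;
+      it takes effect only for parameters with no regularizer set in :ref:`cn_api_fluid_ParamAttr` . Default: None, meaning no regularization.
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` , :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue` .
+      Default: None, meaning no gradient clipping.
+    - **name** (str, optional) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.
+    - **lazy_mode** (bool, optional) - When set to True, only the elements that currently have gradients are updated. The official Adam algorithm has two moving-average accumulators, which are updated at every step. In both dense and sparse mode, every element of both moving averages is updated, which can be slow when the parameters are very large. Lazy mode updates only the elements that currently have gradients, so it is faster. However, its semantics differ from the original algorithm and may lead to different results. Default: False.
+
+
+**Code example**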
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.mean(out)
+    adam = paddle.optimizer.Adam(learning_rate=0.1,
+            parameters=linear.parameters())
+    loss.backward()
+    adam.step()
+    adam.clear_grad()
+
+.. code-block:: python
+
+    # Adam with beta1/beta2 as Tensor and weight_decay as float
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.mean(out)
+
+    beta1 = paddle.to_tensor([0.9], dtype="float32")
+    beta2 = paddle.to_tensor([0.99], dtype="float32")
+
+    adam = paddle.optimizer.Adam(learning_rate=0.1,
+            parameters=linear.parameters(),
+            beta1=beta1,
+            beta2=beta2,
+            weight_decay=0.01)
+    loss.backward()
+    adam.step()
+    adam.clear_grad()
+
+.. py:method:: step()
+
+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Runs the optimizer once and updates the parameters.
+
+Returns: None.
+
+
+**Code example**
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    value = np.arange(26).reshape(2, 13).astype("float32")
+    a = paddle.to_tensor(value)
+    linear = paddle.nn.Linear(13, 5)
+    adam = paddle.optimizer.Adam(learning_rate = 0.01,
+                                parameters = linear.parameters())
+    out = linear(a)
+    out.backward()
+    adam.step()
+    adam.clear_grad()
+
+.. py:method:: minimize(loss, startup_program=None, parameters=None, no_grad_set=None)
+
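+Adds the backward pass to the network and, using the gradients it computes, updates the Parameters in parameters to minimize the network loss.
+
+Parameters:
+    - **loss** (Tensor) - The loss variable to minimize.
+    - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameters. Default: None, in which case :ref:`cn_api_fluid_default_startup_program` is used.
+    - **parameters** (list, optional) - A list of the Parameters, or Parameter names, to update. Default: None, in which case all Parameters are updated.
+    - **no_grad_set** (set, optional) - A set of the Parameters, or Parameter names, that should not be updated. Default: None.
+
+Returns: tuple(optimize_ops, params_grads), where optimize_ops is the list of parameter-optimization OPs and params_grads is a list of (param, param_grad) pairs, with param and param_grad being a parameter and its gradient. In static graph mode, the returned value can be added to the ``fetch_list`` argument of ``Executor.run()`` ; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list`` . See the ``Executor`` documentation for details.
+
+
+**Code example**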
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.mean(out)
+
+    adam = paddle.optimizer.Adam(learning_rate=0.1,
+            parameters=linear.parameters(),
+            weight_decay=0.01)
+    loss.backward()
+    adam.minimize(loss)
+    adam.clear_grad()
+
+.. py:method:: clear_grad()
+
+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+
+Clears the gradients of the parameters being optimized.
+
+**Code example**
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    value = np.arange(26).reshape(2, 13).astype("float32")
+    a = paddle.to_tensor(value)
+    linear = paddle.nn.Linear(13, 5)
+    optimizer = paddle.optimizer.Adam(learning_rate=0.02,
+                                     parameters=linear.parameters())
+    out = linear(a)
+    out.backward()
+    optimizer.step()
+    optimizer.clear_grad()
+
+.. py:method:: set_lr(value)
+
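+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Manually sets the learning rate of the current ``optimizer`` . When an _LRScheduler is in use, this API cannot be used to set the learning rate manually, because the two would conflict.
+
+Parameters:
+    - **value** (float) - The learning rate value to set.
+
+Returns: None
+
+**Code example**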
+
+.. code-block:: python
+
+    import paddle
+    paddle.disable_static()
+    linear = paddle.nn.Linear(10, 10)
+
+    adam = paddle.optimizer.Adam(0.1, parameters=linear.parameters())
+
+    # set learning rate manually by python float value
+    lr_list = [0.2, 0.3, 0.4, 0.5, 0.6]
+    for i in range(5):
+        adam.set_lr(lr_list[i])
+        lr = adam.get_lr()
+        print("current lr is {}".format(lr))
+    # Print:
+    #    current lr is 0.2
+    #    current lr is 0.3
+    #    current lr is 0.4
+    #    current lr is 0.5
+    #    current lr is 0.6
+
+.. py:method:: get_lr()
+
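+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Gets the learning rate of the current step. When no _LRScheduler is used, every call returns the same value; otherwise the learning rate of the current step is returned.
+
+Returns: float, the learning rate of the current step.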
+
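The second case in the code sample for this method lists the learning rate expected at each step under a piecewise schedule. The sketch below reproduces that list with a hypothetical helper, `piecewise_lr`, which assumes the value changes exactly when the step count reaches a boundary. It is a stand-in for illustration and does not call the Paddle scheduler.

```python
from bisect import bisect_right

def piecewise_lr(epoch, boundaries, values):
    """Pick the value of the interval that contains `epoch`."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    return values[bisect_right(boundaries, epoch)]

bd = [2, 4, 6, 8]
value = [0.2, 0.4, 0.6, 0.8, 1.0]
schedule = [piecewise_lr(e, bd, value) for e in range(12)]
print(schedule)
```

The result has the same entries as the `ret` list used in the sample.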
+
+**Code example**
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    # example1: _LRScheduler is not used, return value is all the same
+    paddle.disable_static()
+    emb = paddle.nn.Embedding(10, 10, sparse=False)
+    adam = paddle.optimizer.Adam(0.001, parameters = emb.parameters())
+    lr = adam.get_lr()
+    print(lr) # 0.001
+
+    # example2: PiecewiseLR is used, return the step learning rate
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.reduce_mean(out)
+
+    bd = [2, 4, 6, 8]
+    value = [0.2, 0.4, 0.6, 0.8, 1.0]
+    scheduler = paddle.optimizer.PiecewiseLR(bd, value, 0)
+    adam = paddle.optimizer.Adam(scheduler,
+                           parameters=linear.parameters())
+
+    # first step: learning rate is 0.2
+    np.allclose(adam.get_lr(), 0.2, rtol=1e-06, atol=0.0) # True
+
+    # learning rate for different steps
+    ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
+    for i in range(12):
+        adam.step()
+        lr = adam.get_lr()
+        scheduler.step()
+        np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True
--- a/doc/paddle/api/paddle/fluid/optimizer/Adamax_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/Adamax_cn.rst
+.. _cn_api_paddle_optimizer_Adamax:
+
+Adamax
+-------------------------------
+
+.. py:class:: paddle.optimizer.Adamax(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, parameters=None, weight_decay=None, grad_clip=None, name=None)
+
+
+
+
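+The Adamax optimizer is implemented following Section 7 (Adamax) of the `Adam paper <https://arxiv.org/abs/1412.6980>`_ . Adamax is a variant of the `Adam <https://arxiv.org/abs/1412.6980>`_ algorithm based on the infinity norm, which makes the learning-rate update more stable and simpler.
+
+Its parameter update is computed as follows: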
+
+.. math::
+
+    t = t + 1
+
+    moment\_out = \beta_1 * moment + (1 - \beta_1) * grad
+
+    inf\_norm\_out = \max(\beta_2 * inf\_norm + \epsilon, \left|grad\right|)
+
+    learning\_rate = \frac{learning\_rate}{1 - \beta_1^t}
+
+    param\_out = param - learning\_rate * \frac{moment\_out}{inf\_norm\_out}
+
+Related paper: `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_
+
+The paper has no ``epsilon`` parameter. It is added here for numerical stability, to avoid division by zero.
+
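A scalar walk-through of the update, including the role of ``epsilon``, might look like the sketch below. `adamax_step` is a hypothetical helper written for this illustration and is not part of Paddle.

```python
def adamax_step(param, grad, moment, inf_norm, t,
                lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update for a scalar parameter (illustrative sketch)."""
    t += 1
    moment = beta1 * moment + (1 - beta1) * grad
    # eps keeps inf_norm strictly positive, so the division below is safe
    inf_norm = max(beta2 * inf_norm + eps, abs(grad))
    lr_t = lr / (1 - beta1 ** t)
    param = param - lr_t * moment / inf_norm
    return param, moment, inf_norm, t

p, m, u, t = adamax_step(1.0, 0.5, 0.0, 0.0, 0, lr=0.1)
print(p, m, u, t)

# A zero gradient on the first step would divide by zero without eps.
print(adamax_step(1.0, 0.0, 0.0, 0.0, 0, lr=0.1))
```

In the second call `inf_norm` becomes `eps` instead of 0, and the parameter is left unchanged.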
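+Parameters:
+  - **learning_rate** (float|_LRScheduler) - The learning rate used to update parameters. It can be a float value or an _LRScheduler instance. Default: 0.001.
+  - **beta1** (float, optional) - The exponential decay rate of the first-moment estimate. Default: 0.9.
+  - **beta2** (float, optional) - The exponential decay rate of the second-moment estimate. Default: 0.999.
+  - **epsilon** (float, optional) - A small float value added for numerical stability. Default: 1e-08.
+  - **parameters** (list, optional) - The parameters to be optimized. This argument is required in dygraph mode; in static graph mode the default is None, in which case all parameters are optimized.
+  - **weight_decay** (float|WeightDecayRegularizer, optional) - The regularization method. It can be a float L2 regularization coefficient or a regularization strategy: :ref:`cn_api_fluid_regularizer_L1Decay` or
+    :ref:`cn_api_fluid_regularizer_L2Decay` . If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr` , the setting here is ignored for that parameter;
+    it takes effect only for parameters with no regularizer set in :ref:`cn_api_fluid_ParamAttr` . Default: None, meaning no regularization.
+  - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` , :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue` .
+    Default: None, meaning no gradient clipping.
+  - **name** (str, optional) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.
+
+.. note::
+    ``Adamax`` does not currently support sparse parameter optimization.
+
+**Code example**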
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.mean(out)
+    adam = paddle.optimizer.Adamax(learning_rate=0.1,
+            parameters=linear.parameters())
+    loss.backward()
+    adam.step()
+    adam.clear_grad()
+     
+
+.. py:method:: step()
+
+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Runs the optimizer once and updates the parameters.
+
+Returns: None.
+
+
+**Code example**
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    value = np.arange(26).reshape(2, 13).astype("float32")
+    a = paddle.to_tensor(value)
+    linear = paddle.nn.Linear(13, 5)
+    adam = paddle.optimizer.Adamax(learning_rate = 0.01,
+                                  parameters = linear.parameters())
+    out = linear(a)
+    out.backward()
+    adam.step()
+    adam.clear_grad()
+
+.. py:method:: minimize(loss, startup_program=None, parameters=None, no_grad_set=None)
+
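+Adds the backward pass to the network and, using the gradients it computes, updates the Parameters in parameters to minimize the network loss.
+
+Parameters:
+    - **loss** (Tensor) - The loss variable to minimize.
+    - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameters. Default: None, in which case :ref:`cn_api_fluid_default_startup_program` is used.
+    - **parameters** (list, optional) - A list of the Parameters, or Parameter names, to update. Default: None, in which case all Parameters are updated.
+    - **no_grad_set** (set, optional) - A set of the Parameters, or Parameter names, that should not be updated. Default: None.
+
+Returns: tuple(optimize_ops, params_grads), where optimize_ops is the list of parameter-optimization OPs and params_grads is a list of (param, param_grad) pairs, with param and param_grad being a parameter and its gradient. In static graph mode, the returned value can be added to the ``fetch_list`` argument of ``Executor.run()`` ; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list`` . See the ``Executor`` documentation for details.
+
+**Code example**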
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.mean(out)
+
+    adam = paddle.optimizer.Adamax(learning_rate=0.1,
+            parameters=linear.parameters(),
+            weight_decay=0.01)
+    loss.backward()
+    adam.minimize(loss)
+    adam.clear_grad()
+
+
+.. py:method:: clear_grad()
+
+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+
+Clears the gradients of the parameters being optimized.
+
+**Code example**
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+    value = np.arange(26).reshape(2, 13).astype("float32")
+    a = paddle.to_tensor(value)
+    linear = paddle.nn.Linear(13, 5)
+    optimizer = paddle.optimizer.Adamax(learning_rate=0.02,
+                                     parameters=linear.parameters())
+    out = linear(a)
+    out.backward()
+    optimizer.step()
+    optimizer.clear_grad()
+
+.. py:method:: set_lr(value)
+
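+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Manually sets the learning rate of the current ``optimizer`` . When an _LRScheduler is in use, this API cannot be used to set the learning rate manually, because the two would conflict.
+
+Parameters:
+    - **value** (float) - The learning rate value to set.
+
+Returns: None
+
+**Code example**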
+
+.. code-block:: python
+
+    import paddle
+    paddle.disable_static()
+    linear = paddle.nn.Linear(10, 10)
+
+    adam = paddle.optimizer.Adamax(0.1, parameters=linear.parameters())
+
+    # set learning rate manually by python float value
+    lr_list = [0.2, 0.3, 0.4, 0.5, 0.6]
+    for i in range(5):
+        adam.set_lr(lr_list[i])
+        lr = adam.get_lr()
+        print("current lr is {}".format(lr))
+    # Print:
+    #    current lr is 0.2
+    #    current lr is 0.3
+    #    current lr is 0.4
+    #    current lr is 0.5
+    #    current lr is 0.6
+
+.. py:method:: get_lr()
+
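+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Gets the learning rate of the current step. When no _LRScheduler is used, every call returns the same value; otherwise the learning rate of the current step is returned.
+
+Returns: float, the learning rate of the current step.
+
+
+**Code example**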
+
+.. code-block:: python
+
+
+    import numpy as np
+    import paddle
+    # example1: _LRScheduler is not used, return value is all the same
+    paddle.disable_static()
+    emb = paddle.nn.Embedding(10, 10, sparse=False)
+    adam = paddle.optimizer.Adamax(0.001, parameters = emb.parameters())
+    lr = adam.get_lr()
+    print(lr) # 0.001
+
+    # example2: PiecewiseLR is used, return the step learning rate
+    paddle.disable_static()
+    inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+    linear = paddle.nn.Linear(10, 10)
+    inp = paddle.to_tensor(inp)
+    out = linear(inp)
+    loss = paddle.reduce_mean(out)
+
+    bd = [2, 4, 6, 8]
+    value = [0.2, 0.4, 0.6, 0.8, 1.0]
+    scheduler = paddle.optimizer.PiecewiseLR(bd, value, 0)
+    adam = paddle.optimizer.Adamax(scheduler,
+                           parameters=linear.parameters())
+
+    # first step: learning rate is 0.2
+    np.allclose(adam.get_lr(), 0.2, rtol=1e-06, atol=0.0) # True
+
+    # learning rate for different steps
+    ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
+    for i in range(12):
+        adam.step()
+        lr = adam.get_lr()
+        scheduler.step()
+        np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True
--- a/doc/paddle/api/paddle/fluid/optimizer/MomentumOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/MomentumOptimizer_cn.rst
+.. _cn_api_fluid_optimizer_MomentumOptimizer:
+
+MomentumOptimizer
+-------------------------------
+
+.. py:class::  paddle.fluid.optimizer.MomentumOptimizer(learning_rate, momentum, parameter_list=None, use_nesterov=False, regularization=None, grad_clip=None, name=None)
+
+
+
+
+This interface implements the simple Momentum optimizer with velocity state.
+
+The optimizer has a flag for Nesterov momentum. The update rule is:
+
+.. math::
+    & velocity = mu * velocity + gradient\\
+    & if (use\_nesterov):\\
+    &\quad   param = param - (gradient + mu * velocity) * learning\_rate\\
+    & else:\\
+    &\quad   param = param - learning\_rate * velocity
+
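The two branches of the update can be compared with a small scalar sketch. `momentum_step` is a hypothetical helper that transcribes the formula above; it is not the Paddle optimizer.

```python
def momentum_step(param, grad, velocity, lr, mu, use_nesterov=False):
    """One momentum update for a scalar parameter (illustrative sketch)."""
    velocity = mu * velocity + grad
    if use_nesterov:
        # Nesterov branch: look ahead along the updated velocity
        param = param - (grad + mu * velocity) * lr
    else:
        param = param - lr * velocity
    return param, velocity

print(momentum_step(1.0, 1.0, 0.0, lr=0.1, mu=0.9))
print(momentum_step(1.0, 1.0, 0.0, lr=0.1, mu=0.9, use_nesterov=True))
```

Starting from zero velocity with the same gradient, the Nesterov branch takes the larger step, because it adds `mu * velocity` on top of the raw gradient.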
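+Parameters:
+    - **learning_rate** (float|Variable) - The learning rate used to update parameters. It can be a float value or a Variable holding a single float value.
+    - **momentum** (float) - The momentum factor.
+    - **parameter_list** (list, optional) - The parameters to be optimized. This argument is required in dygraph mode; in static graph mode the default is None, in which case all parameters are optimized.
+    - **use_nesterov** (bool, optional) - Whether to enable Nesterov momentum. Default: False.
+    - **regularization** (WeightDecayRegularizer, optional) - The regularization method. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
+      :ref:`cn_api_fluid_regularizer_L2Decay` . If a parameter already has a regularizer set in :ref:`cn_api_fluid_ParamAttr` , the setting here is ignored for that parameter;
+      it takes effect only for parameters with no regularizer set in :ref:`cn_api_fluid_ParamAttr` . Default: None, meaning no regularization.
+    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` , :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue` .
+      Default: None, meaning no gradient clipping.
+    - **name** (str, optional) - An optional name prefix. Usually there is no need to set it. Default: None.
+
+**Code example**: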
+
+.. code-block:: python
+
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+
+    place = fluid.CPUPlace()
+    main = fluid.Program()
+    with fluid.program_guard(main):
+        x = fluid.layers.data(name='x', shape=[13], dtype='float32')
+        y = fluid.layers.data(name='y', shape=[1], dtype='float32')
+        y_predict = fluid.layers.fc(input=x, size=1, act=None)
+        cost = fluid.layers.square_error_cost(input=y_predict, label=y)
+        avg_cost = fluid.layers.mean(cost)
+
+        moment_optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.001, momentum=0.9)
+        moment_optimizer.minimize(avg_cost)
+
+        fetch_list = [avg_cost]
+        train_reader = paddle.batch(
+            paddle.dataset.uci_housing.train(), batch_size=1)
+        feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
+        exe = fluid.Executor(place)
+        exe.run(fluid.default_startup_program())
+        for data in train_reader():
+            exe.run(main, feed=feeder.feed(data), fetch_list=fetch_list)
+
+
+
+.. py:method:: minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None)
+
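+Adds the backward pass to the network and, using the gradients it computes, updates the Parameters in parameter_list to minimize the network loss.
+
+Parameters:
+    - **loss** (Variable) - The loss variable to minimize.
+    - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameter_list. Default: None, in which case :ref:`cn_api_fluid_default_startup_program` is used.
+    - **parameter_list** (list, optional) - A list of the Parameters, or Parameter names, to update. Default: None, in which case all Parameters are updated.
+    - **no_grad_set** (set, optional) - A set of the Parameters, or Parameter names, that should not be updated. Default: None.
+
+Returns: tuple(optimize_ops, params_grads), where optimize_ops is the list of parameter-optimization OPs and params_grads is a list of (param, param_grad) pairs, with param and param_grad being a parameter and its gradient. The returned value can be added to the ``fetch_list`` argument of ``Executor.run()`` ; if it is, ``use_prune`` is overridden to True and the program is pruned according to ``feed`` and ``fetch_list`` . See the ``Executor`` documentation for details.
+
+Return type: tuple
+
+**Code example**: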
+
+.. code-block:: python
+
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+     
+    place = fluid.CPUPlace()
+    main = fluid.Program()
+    with fluid.program_guard(main):
+        x = fluid.layers.data(name='x', shape=[13], dtype='float32')
+        y = fluid.layers.data(name='y', shape=[1], dtype='float32')
+        y_predict = fluid.layers.fc(input=x, size=1, act=None)
+        cost = fluid.layers.square_error_cost(input=y_predict, label=y)
+        avg_cost = fluid.layers.mean(cost)
+        
+        moment_optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.001, momentum=0.9)
+        moment_optimizer.minimize(avg_cost)
+        
+        fetch_list = [avg_cost]
+        train_reader = paddle.batch(
+            paddle.dataset.uci_housing.train(), batch_size=1)
+        feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
+        exe = fluid.Executor(place)
+        exe.run(fluid.default_startup_program())
+        for data in train_reader():
+            exe.run(main, feed=feeder.feed(data), fetch_list=fetch_list)
+
+
+
+.. py:method:: clear_gradients()
+
+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+
+Clears the gradients of the parameters being optimized.
+
+**Code example**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import numpy as np
+
+    with fluid.dygraph.guard():
+        value = np.arange(26).reshape(2, 13).astype("float32")
+        a = fluid.dygraph.to_variable(value)
+        linear = fluid.Linear(13, 5, dtype="float32")
+        optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.001, momentum=0.9,
+                                                      parameter_list=linear.parameters())
+        out = linear(a)
+        out.backward()
+        optimizer.minimize(out)
+        optimizer.clear_gradients()
+
+
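+.. py:method:: set_lr(value)
+
+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Manually sets the learning rate of the current ``optimizer`` . When LearningRateDecay is in use, this API cannot be used to set the learning rate manually, because the two would conflict.
+
+Parameters:
+    - **value** (float|Variable) - The learning rate value to set.
+
+Returns: None
+
+**Code example**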
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+            
+    with fluid.dygraph.guard():
+        linear = fluid.dygraph.nn.Linear(10, 10)
+        adam = fluid.optimizer.Adam(0.1, parameter_list=linear.parameters())
+        # set the learning rate manually with a Python float value
+        lr_list = [0.2, 0.3, 0.4, 0.5, 0.6]
+        for i in range(5):
+            adam.set_lr(lr_list[i])
+            print("current lr is {}".format(adam.current_step_lr()))
+        # Print:
+        #    current lr is 0.2
+        #    current lr is 0.3
+        #    current lr is 0.4
+        #    current lr is 0.5
+        #    current lr is 0.6
+
+
+        # set the learning rate with a framework Variable
+        lr_var = fluid.layers.create_global_var(shape=[1], value=0.7, dtype='float32')
+        adam.set_lr(lr_var)
+        print("current lr is {}".format(adam.current_step_lr()))
+        # Print:
+        #    current lr is 0.7
+
+
+
+.. py:method:: current_step_lr()
+
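+**Note:**
+
+  **1. This API only takes effect in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
+
+Gets the learning rate of the current step. When LearningRateDecay is not used, every call returns the same value; otherwise the learning rate of the current step is returned.
+
+Returns: The learning rate of the current step.
+
+Return type: float
+
+**Code example**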
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import numpy as np
+
+    # example1: LearningRateDecay is not used, return value is all the same
+    with fluid.dygraph.guard():
+        emb = fluid.dygraph.Embedding([10, 10])
+        adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters())
+        lr = adam.current_step_lr()
+        print(lr) # 0.001
+
+    # example2: PiecewiseDecay is used, return the step learning rate
+    with fluid.dygraph.guard():
+        inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
+        linear = fluid.dygraph.nn.Linear(10, 10)
+        inp = fluid.dygraph.to_variable(inp)
+        out = linear(inp)
+        loss = fluid.layers.reduce_mean(out)
+
+        bd = [2, 4, 6, 8]
+        value = [0.2, 0.4, 0.6, 0.8, 1.0]
+        adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0),
+                           parameter_list=linear.parameters())
+
+        # first step: learning rate is 0.2
+        np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True
+
+        # learning rate for different steps
+        ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
+        for i in range(12):
+            adam.minimize(loss)
+            lr = adam.current_step_lr()
+            np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True
+
--- a/doc/paddle/api/paddle/fluid/optimizer/Momentum_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/Momentum_cn.rst
+.. _cn_api_fluid_optimizer_Momentum:
+
+Momentum
+-------------------------------
+
+.. py:attribute::  paddle.fluid.optimizer.Momentum
+
+
+
+
+Alias of ``MomentumOptimizer`` .
+
+
+
--- a/doc/paddle/api/paddle/fluid/optimizer/SGD_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/SGD_cn.rst
+.. _cn_api_fluid_optimizer_SGD:
+
+SGD
+-------------------------------
+
+.. py:attribute::  paddle.fluid.optimizer.SGD
+
+
+
+
+Alias of ``SGDOptimizer`` .
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/fluid/regularizer/L1Decay_cn.rst
+++ b/doc/paddle/api/paddle/fluid/regularizer/L1Decay_cn.rst
+
+.. _cn_api_fluid_regularizer_L1Decay:
+
+L1Decay
+-------------------------------
+
+.. py:attribute::   paddle.fluid.regularizer.L1Decay(regularization_coeff=0.0)
+
+
+
+
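+L1Decay implements L1 weight decay regularization, used in model training to make the weight matrix sparse.
+
+An instance of this class must be set either in :ref:`cn_api_fluid_ParamAttr` or in an ``optimizer``
+(for example :ref:`cn_api_fluid_optimizer_SGDOptimizer` ). When set in ``ParamAttr`` ,
+it applies only to the parameters of that layer; when set in the ``optimizer`` , it applies to all parameters. If both are set,
+the setting in ``ParamAttr`` takes precedence over the one in the ``optimizer`` .
+
+In the implementation, L1 weight decay regularization is computed as: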
+
+.. math::
+
+    L1WeightDecay = reg\_coeff * sign(parameter)
+
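For a single scalar weight, the decay term can be computed as in the sketch below. `l1_decay_term` is a hypothetical helper for illustration only; it is not a Paddle API.

```python
def l1_decay_term(parameter, reg_coeff):
    """reg_coeff * sign(parameter) for a scalar weight."""
    sign = (parameter > 0) - (parameter < 0)   # -1, 0 or 1
    return reg_coeff * sign

for w in (2.5, -3.0, 0.0):
    print(w, l1_decay_term(w, 0.1))
```

The magnitude of the term does not depend on the size of the weight, only on its sign, which is what pushes small weights toward exactly zero.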
+Parameters:
+  - **regularization_coeff** (float) - The L1 regularization coefficient. Default: 0.0.
+
+**Code example 1**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    main_prog = fluid.Program()
+    startup_prog = fluid.Program()
+    with fluid.program_guard(main_prog, startup_prog):
+        data = fluid.layers.data(name='image', shape=[3, 28, 28], dtype='float32')
+        label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+        hidden = fluid.layers.fc(input=data, size=128, act='relu')
+        prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
+        loss = fluid.layers.cross_entropy(input=prediction, label=label)
+        avg_loss = fluid.layers.mean(loss)
+    optimizer = fluid.optimizer.Adagrad(
+        learning_rate=1e-4,
+        regularization=fluid.regularizer.L1Decay(
+            regularization_coeff=0.1))
+    optimizer.minimize(avg_loss)
+
+
+**Code Example 2**
+
+.. code-block:: python
+    
+    # Set regularization in both ParamAttr and the optimizer
+    import paddle.fluid as fluid
+    l1 = fluid.regularizer.L1Decay(regularization_coeff=0.1)
+    l2 = fluid.regularizer.L2Decay(regularization_coeff=0.1)
+    x = fluid.layers.uniform_random([3,4])
+    
+    # Set L1 regularization in ParamAttr
+    w_param = fluid.ParamAttr(regularizer=l1)
+    hidden1 = fluid.layers.fc(x, 8, param_attr=w_param)    # fc_0.w_0(L1), fc_0.b_0
+    hidden2 = fluid.layers.fc(hidden1, 16, param_attr=w_param)   # fc_1.w_0(L1), fc_1.b_0
+    predict = fluid.layers.fc(hidden2, 32)     # fc_2.w_0, fc_2.b_0
+    avg_loss = fluid.layers.mean(predict)
+    
+    # Set L2 regularization in the optimizer
+    optimizer = fluid.optimizer.SGD(learning_rate=1e-4, regularization=l2)
+    optimizer.minimize(avg_loss)
+    
+    # The following message will be printed:
+    # Regularization of [fc_0.w_0, fc_1.w_0] have been set by ParamAttr or WeightNormParamAttr already. 
+    # So, the Regularization of Optimizer will not take effect for these parameters!
+
+
--- a/doc/paddle/api/paddle/fluid/regularizer/L2Decay_cn.rst
+++ b/doc/paddle/api/paddle/fluid/regularizer/L2Decay_cn.rst
+.. _cn_api_fluid_regularizer_L2Decay:
+
+L2Decay
+-------------------------------
+
+.. py:attribute::   paddle.fluid.regularizer.L2Decay(regularization_coeff=0.0)
+
+
+
+
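+L2Decay implements L2 weight decay regularization, which is used during model training and helps prevent the model from overfitting the training data.
+
+An instance of this class must be set in :ref:`cn_api_fluid_ParamAttr` or in an ``optimizer``
+(for example :ref:`cn_api_fluid_optimizer_SGDOptimizer` ). When set in ``ParamAttr`` , it applies
+only to the parameters of that layer; when set in an ``optimizer`` , it applies to all parameters. If both are set,
+the setting in ``ParamAttr`` takes priority over the one in the ``optimizer`` .
+
+In the implementation, the L2 weight decay regularization term is computed as follows: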
+
+.. math::
+            L2WeightDecay = reg\_coeff * parameter
+
+Parameters:
+  - **regularization_coeff** (float) - Regularization coefficient. Default: 0.0.
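+
+As a rough illustration of the formula above, the following pure-Python sketch computes the decay term element by element. The helper name ``l2_weight_decay`` is hypothetical and not part of Paddle; the real regularizer operates on Tensors inside the optimizer.

```python
def l2_weight_decay(parameter, reg_coeff):
    """Element-wise L2 decay term: reg_coeff * parameter.

    Hypothetical helper for illustration only; not a Paddle API.
    """
    return [reg_coeff * p for p in parameter]


weights = [0.5, -2.0, 0.0]
decay = l2_weight_decay(weights, reg_coeff=0.1)
print(decay)  # [0.05, -0.2, 0.0]
```

+The term is proportional to each weight, so larger weights are penalized more strongly than smaller ones.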
+
+**Code Example 1**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    main_prog = fluid.Program()
+    startup_prog = fluid.Program()
+    with fluid.program_guard(main_prog, startup_prog):
+        data = fluid.layers.data(name='image', shape=[3, 28, 28], dtype='float32')
+        label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+        hidden = fluid.layers.fc(input=data, size=128, act='relu')
+        prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
+        loss = fluid.layers.cross_entropy(input=prediction, label=label)
+        avg_loss = fluid.layers.mean(loss)
+    optimizer = fluid.optimizer.Adagrad(
+        learning_rate=1e-4,
+        regularization=fluid.regularizer.L2Decay(
+            regularization_coeff=0.1))
+    optimizer.minimize(avg_loss)
+
+
+**Code Example 2**
+
+.. code-block:: python
+    
+    # Set regularization in both ParamAttr and the optimizer
+    import paddle.fluid as fluid
+    l1 = fluid.regularizer.L1Decay(regularization_coeff=0.1)
+    l2 = fluid.regularizer.L2Decay(regularization_coeff=0.1)
+    x = fluid.layers.uniform_random([3,4])
+    
+    # Set L1 regularization in ParamAttr
+    w_param = fluid.ParamAttr(regularizer=l1)
+    hidden1 = fluid.layers.fc(x, 8, param_attr=w_param)    # fc_0.w_0(L1), fc_0.b_0
+    hidden2 = fluid.layers.fc(hidden1, 16, param_attr=w_param)  # fc_1.w_0(L1), fc_1.b_0
+    predict = fluid.layers.fc(hidden2, 32)    # fc_2.w_0, fc_2.b_0
+    avg_loss = fluid.layers.mean(predict)
+    
+    # Set L2 regularization in the optimizer
+    optimizer = fluid.optimizer.SGD(learning_rate=1e-4, regularization=l2)
+    optimizer.minimize(avg_loss)
+    
+    # The following message will be printed:
+    # Regularization of [fc_0.w_0, fc_1.w_0] have been set by ParamAttr or WeightNormParamAttr already. 
+    # So, the Regularization of Optimizer will not take effect for these parameters!
+
--- a/doc/paddle/api/paddle/tensor/manipulation/chunk_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/chunk_cn.rst
+.. _cn_api_tensor_cn_chunk:
+
+chunk
+-------------------------------
+
+.. py:function:: paddle.chunk(x, chunks, axis=0, name=None)
+
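+This OP splits the input Tensor into multiple sub-Tensors.
+
+**Parameters**:
+       - **x** (Tensor) - The input, a multi-dimensional Tensor with data type bool, float16, float32, float64, int32 or int64.
+       - **chunks** (int) - An integer giving the number of equal-sized sub-Tensors to split the input Tensor into.
+       - **axis** (int|Tensor, optional) - An integer, or a Tensor of shape [1] with data type int32 or int64, giving the dimension to split along. If ``axis < 0`` , the dimension to split is ``rank(x) + axis`` . Default: 0.
+       - **name** (str, optional) - For details, see :ref:`api_guide_Name` . Usually this does not need to be set. Default: None.
+
+Returns: A list of the sub-Tensors produced by the split.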
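+
+To make the ``axis`` and ``chunks`` semantics concrete, the following pure-Python sketch computes only the shapes of the resulting sub-Tensors. The helper ``chunk_shapes`` is hypothetical and not part of Paddle, and it assumes the split dimension must divide evenly by ``chunks`` , as "equal-sized" suggests.

```python
def chunk_shapes(shape, chunks, axis=0):
    """Shapes of the pieces from an even split of `shape` along `axis`.

    Hypothetical helper for illustration only; not a Paddle API.
    """
    rank = len(shape)
    if axis < 0:
        axis += rank  # a negative axis maps to rank(x) + axis
    if not 0 <= axis < rank:
        raise ValueError("axis out of range")
    if shape[axis] % chunks != 0:
        # Assumption: equal-sized pieces require even divisibility.
        raise ValueError("dimension is not evenly divisible by chunks")
    piece = list(shape)
    piece[axis] = shape[axis] // chunks
    return [list(piece) for _ in range(chunks)]


shapes = chunk_shapes([3, 9, 5], chunks=3, axis=-2)
print(shapes)  # [[3, 3, 5], [3, 3, 5], [3, 3, 5]]
```

+With ``axis=-2`` and a rank-3 input, the effective axis is 1, so the dimension of size 9 is divided into three pieces of size 3.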
+
+
+**代码示例**
+
+..  code-block:: python
+
+      import numpy as np
+      import paddle
+      
+      paddle.disable_static()
+      # x is a Tensor whose shape is [3, 9, 5]
+      x_np = np.random.random([3, 9, 5]).astype("int32")
+      x = paddle.to_tensor(x_np)
+
+      out0, out1, out2 = paddle.chunk(x, chunks=3, axis=1)
+      # out0.shape [3, 3, 5]
+      # out1.shape [3, 3, 5]
+      # out2.shape [3, 3, 5]
+
+      
+      # axis is negative, so the effective axis is rank(x) + axis,
+      # which is 1 here.
+      out0, out1, out2 = paddle.chunk(x, chunks=3, axis=-2)
+      # out0.shape [3, 3, 5]
+      # out1.shape [3, 3, 5]
+      # out2.shape [3, 3, 5]