fix_paddle test=develop (#2549)

bb2241a5 · Chen Long · GitHub · 9283a400
323 changed files
--- a/doc/paddle/api/paddle/compat/round_cn.rst
+++ b/doc/paddle/api/paddle/compat/round_cn.rst
+.. _cn_api_tensor_cn_round:
+
+round
+-------------------------------
+
+.. py:function:: paddle.round(x, name=None)
+
+
+
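+This OP rounds each value in the input to the nearest integer.
+
+Parameters:
+    - **x** (Tensor) - The input `Tensor`. Supported data types: float16, float32, float64.
+    - **name** (str, optional) - The name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
+
+Returns:
+    - Tensor, the result of rounding the input x, with the same shape and data type as x.
+
+
+**Code example**: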
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.5, -0.2, 0.6, 1.5])
+    x = paddle.to_tensor(x_data)
+    out = paddle.round(x)
+    print(out.numpy())
+    # [-1. -0.  1.  2.]
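The expected output in the example above (-0.5 becomes -1, 1.5 becomes 2) is consistent with rounding exact halves away from zero. The NumPy-only sketch below reproduces those four values without Paddle. The helper name `round_half_away_from_zero` is hypothetical, and the sketch only illustrates that rounding rule; it is not Paddle's implementation.

```python
import numpy as np


def round_half_away_from_zero(values):
    """Round to the nearest integer, sending exact halves away from zero."""
    values = np.asarray(values, dtype=np.float64)
    return np.sign(values) * np.floor(np.abs(values) + 0.5)


x_data = np.array([-0.5, -0.2, 0.6, 1.5])
rounded = round_half_away_from_zero(x_data)
print(rounded)           # the same four values as the documented example
print(np.round(x_data))  # NumPy's default sends exact halves to the even integer
```

The second print is included because NumPy's own `np.round` treats -0.5 differently, which is a common source of confusion when comparing results.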
--- a/doc/paddle/api/paddle/dataset/Conll05_cn.rst
+++ b/doc/paddle/api/paddle/dataset/Conll05_cn.rst
+.. _cn_api_paddle_dataset_Conll05:
+
+Conll05
+-------------------------------
+
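+The Conll05 dataset. The semantic role labeling tutorial in Paddle's deep learning basics uses this dataset as its example. Because the Conll05 dataset is not freely available, the default download url points to the Conll05 test set, which is public. Users can change the url and md5 to those of their own Conll dataset. A pretrained word-embedding model based on the Wikipedia corpus is used to initialize the SRL model.
+
+
+.. py:function:: paddle.dataset.conll05.get_dict()
+
+Gets the word, verb, and label dictionaries of the Wikipedia corpus.
+
+
+.. py:function:: paddle.dataset.conll05.get_embedding()
+
+Gets the pretrained word embeddings based on the Wikipedia corpus.
+
+
+
+.. py:function:: paddle.dataset.conll05.test()
+
+Creator for the Conll05 test dataset.
+
+Because the training dataset is not freely available, the test dataset is used for training. It returns a reader creator. Each sample in the reader has nine features, including the sentence sequence, the predicate, the predicate context, the predicate context flags, and the label sequence.
+
+Returns: the reader creator of the training dataset
+
+Return type: callable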
+
+
+
--- a/doc/paddle/api/paddle/dataset/common/split_cn.rst
+++ b/doc/paddle/api/paddle/dataset/common/split_cn.rst
-.. _cn_api_fluid_layers_split:
-
+.. _cn_api_paddle_tensor_split:
+
 split
 -------------------------------

-.. py:function:: paddle.fluid.layers.split(input, num_or_sections, dim=-1, name=None)
-
+.. py:function:: paddle.tensor.split(x, num_or_sections, axis=0, name=None)



 This OP splits the input Tensor into multiple sub-Tensors.

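-Parameters:
-    - **input** (Tensor) - The input variable, a multi-dimensional Tensor of data type bool, float16, float32, float64, int32, or int64.
-    - **num_or_sections** (int|list|tuple) - If ``num_or_sections`` is an integer, it is the number of equal-sized sub-Tensors the Tensor is split into. If ``num_or_sections`` is a list or tuple, its length is the number of sub-Tensors, and its elements, which may be integers or Tensors of shape [1], give in order the size of each sub-Tensor along the split dimension. The length of the list or tuple cannot exceed the size of the input Tensor's split dimension. At most one element may be -1, meaning that its value is inferred from the split dimension of ``input`` and the remaining elements of ``num_or_sections``.
-    - **dim** (int|Tensor, optional) - An integer or a Tensor of shape [1] with data type int32 or int64. The dimension to split along. If ``dim < 0``, the split dimension is ``rank(input) + dim``. Default: -1.
-    - **name** (str, optional) - For usage, see :ref:`api_guide_Name`. Generally there is no need to set it. Default: None.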
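+**Parameters**:
+       - **x** (Tensor) - The input variable, a multi-dimensional Tensor of data type bool, float16, float32, float64, int32, or int64.
+       - **num_or_sections** (int|list|tuple) - If ``num_or_sections`` is an integer, it is the number of equal-sized sub-Tensors the Tensor is split into. If ``num_or_sections`` is a list or tuple, its length is the number of sub-Tensors, and its elements, which may be integers or Tensors of shape [1], give in order the size of each sub-Tensor along the split dimension. The length of the list or tuple cannot exceed the size of the input Tensor's split dimension. At most one element of the list or tuple may be -1, meaning that its value is inferred from the dimension of ``x`` and the other elements of ``num_or_sections``. For example, when a Tensor of shape [4,6,6] is split along its third dimension with ``num_or_sections=[2,-1,1]``, the three output Tensors have shapes [4,6,2], [4,6,3], and [4,6,1].
+       - **axis** (int|Tensor, optional) - An integer or a Tensor of shape [1] with data type int32 or int64. The dimension to split along. If ``axis < 0``, the split dimension is ``rank(x) + axis``. Default: 0.
+       - **name** (str, optional) - For usage, see :ref:`api_guide_Name`. Generally there is no need to set it. Default: None.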

 Returns: a list of the split Tensors.


-Raises:
-    - :code:`TypeError`: if the data type of ``input`` is not bool, float16, float32, float64, int32, or int64.
-    - :code:`TypeError`: if ``num_or_sections`` is not an int, list, or tuple.
-    - :code:`TypeError`: if ``dim`` is not an int or a Tensor, or if ``dim`` is a Tensor whose data type is not int32 or int64.
-
 **Code example**:

 .. code-block:: python

-    import paddle.fluid as fluid
-
-    # input is a Tensor which shape is [3, 9, 5]
-    input = fluid.data(
-         name="input", shape=[3, 9, 5], dtype="float32")
+    import numpy as np
+    import paddle
+    
+    paddle.disable_static()
+    # x is a Tensor whose shape is [3, 9, 5]
+    x_np = np.random.random([3, 9, 5]).astype("int32")
+    x = paddle.to_tensor(x_np)

-    out0, out1, out2 = fluid.layers.split(input, num_or_sections=3, dim=1)
+    out0, out1, out2 = paddle.split(x, num_or_sections=3, axis=1)
    # out0.shape [3, 3, 5]
    # out1.shape [3, 3, 5]
    # out2.shape [3, 3, 5]

-    out0, out1, out2 = fluid.layers.split(input, num_or_sections=[2, 3, 4], dim=1)
+    out0, out1, out2 = paddle.split(x, num_or_sections=[2, 3, 4], axis=1)
    # out0.shape [3, 2, 5]
    # out1.shape [3, 3, 5]
    # out2.shape [3, 4, 5]

-    out0, out1, out2 = fluid.layers.split(input, num_or_sections=[2, 3, -1], dim=1)
+    out0, out1, out2 = paddle.split(x, num_or_sections=[2, 3, -1], axis=1)
    # out0.shape [3, 2, 5]
    # out1.shape [3, 3, 5]
    # out2.shape [3, 4, 5]
    
-    # dim is negative, the real dim is (rank(input) + axis) which real
+    # axis is negative, so the actual axis is (rank(x) + axis), whose
    # value is 1.
-    out0, out1, out2 = fluid.layers.split(input, num_or_sections=3, dim=-2)
+    out0, out1, out2 = paddle.split(x, num_or_sections=3, axis=-2)
    # out0.shape [3, 3, 5]
    # out1.shape [3, 3, 5]
    # out2.shape [3, 3, 5]
-
-
-
-
-
-
-
-
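The rule for ``num_or_sections`` (an integer means equal parts; a list may contain one -1 that is inferred from the remaining sizes) can be sketched with NumPy alone. The helper name `split_like_doc` is hypothetical, and the error checks are assumptions added for the illustration; this is not Paddle's implementation.

```python
import numpy as np


def split_like_doc(x, num_or_sections, axis=0):
    """Hypothetical NumPy helper mirroring the documented split semantics."""
    axis = axis if axis >= 0 else x.ndim + axis  # negative axis counts from the end
    dim = x.shape[axis]
    if isinstance(num_or_sections, int):
        if dim % num_or_sections != 0:
            raise ValueError("dimension is not evenly divisible")
        sizes = [dim // num_or_sections] * num_or_sections
    else:
        sizes = list(num_or_sections)
        if sizes.count(-1) > 1:
            raise ValueError("at most one section may be -1")
        if -1 in sizes:
            known = sum(s for s in sizes if s != -1)
            sizes[sizes.index(-1)] = dim - known  # infer the missing size
        if sum(sizes) != dim:
            raise ValueError("section sizes do not add up to the dimension")
    boundaries = np.cumsum(sizes)[:-1]  # split points along the axis
    return np.split(x, boundaries, axis=axis)


x = np.zeros([4, 6, 6], dtype="float32")
parts = split_like_doc(x, [2, -1, 1], axis=2)
print([p.shape for p in parts])
```

This reproduces the [4,6,6] case from the parameter description: the -1 entry resolves to 6 - 2 - 1 = 3.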
--- a/doc/paddle/api/paddle/fluid/dygraph/parallel/ParallelEnv_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/parallel/ParallelEnv_cn.rst
--- a/doc/paddle/api/paddle/distributed/all_gather_cn.rst
+++ b/doc/paddle/api/paddle/distributed/all_gather_cn.rst
+.. _cn_api_distributed_all_gather:
+
+all_gather
+-------------------------------
+
+
+.. py:function:: paddle.distributed.all_gather(tensor_list, tensor, group=0)
+
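+Gathers the specified tensor from every process in the process group and returns the gathered result to all processes.
+
+Parameters
+::::::::::
+    - tensor_list (list) - The list of output Tensors of the operation. Each element of the list is a Tensor with data type float16, float32, float64, int32, or int64.
+    - tensor (Tensor) - The input Tensor of the operation, with data type float16, float32, float64, int32, or int64.
+    - group (int, optional) - The id of the process group to work on. Default: 0.
+
+Returns
+:::::::::
+None
+
+Code example
+::::::::::::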
+.. code-block:: python
+
+        import numpy as np
+        import paddle
+        from paddle.distributed import init_parallel_env
+
+        paddle.disable_static()
+        paddle.set_device('gpu:%d'%paddle.distributed.ParallelEnv().dev_id)
+        init_parallel_env()
+        tensor_list = []
+        if paddle.distributed.ParallelEnv().local_rank == 0:
+            np_data1 = np.array([[4, 5, 6], [4, 5, 6]])
+            np_data2 = np.array([[4, 5, 6], [4, 5, 6]])
+            data1 = paddle.to_tensor(np_data1)
+            data2 = paddle.to_tensor(np_data2)
+            paddle.distributed.all_gather(tensor_list, data1)
+        else:
+            np_data1 = np.array([[1, 2, 3], [1, 2, 3]])
+            np_data2 = np.array([[1, 2, 3], [1, 2, 3]])
+            data1 = paddle.to_tensor(np_data1)
+            data2 = paddle.to_tensor(np_data2)
+            paddle.distributed.all_gather(tensor_list, data2)
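The example above needs several GPU processes to run. As a rough single-process illustration of what each process ends up with, the sketch below simulates the gather with plain NumPy arrays. The function `simulate_all_gather` is hypothetical, and it assumes the gathered list is ordered by rank, which the text above does not state.

```python
import numpy as np


def simulate_all_gather(per_rank_tensors):
    """Return, for each simulated rank, the tensor_list it would hold afterwards."""
    # Assumption: every rank receives all tensors, ordered by rank.
    return [[t.copy() for t in per_rank_tensors] for _ in per_rank_tensors]


rank0 = np.array([[4, 5, 6], [4, 5, 6]])
rank1 = np.array([[1, 2, 3], [1, 2, 3]])
results = simulate_all_gather([rank0, rank1])
for rank, tensor_list in enumerate(results):
    print(rank, [t.tolist() for t in tensor_list])
```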
--- a/doc/paddle/api/paddle/distributed/all_reduce_cn.rst
+++ b/doc/paddle/api/paddle/distributed/all_reduce_cn.rst
+.. _cn_api_distributed_all_reduce:
+
+all_reduce
+-------------------------------
+
+
+.. py:function:: paddle.distributed.all_reduce(tensor, op=ReduceOp.SUM, group=0)
+
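+Reduces the specified tensor across every process in the process group and returns the reduced result to all processes.
+
+Parameters
+::::::::::
+    - tensor (Tensor) - The input Tensor of the operation; the reduced result is also written back into this Tensor. Its data type is float16, float32, float64, int32, or int64.
+    - op (ReduceOp.SUM|ReduceOp.MAX|ReduceOp.MIN|ReduceOp.PROD, optional) - The reduction to apply, such as sum, maximum, minimum, or product. Default: sum.
+    - group (int, optional) - The id of the process group to work on. Default: 0.
+
+Returns
+:::::::::
+None
+
+Code example
+::::::::::::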
+.. code-block:: python
+
+        import numpy as np
+        import paddle
+        from paddle.distributed import ReduceOp
+        from paddle.distributed import init_parallel_env
+
+        paddle.disable_static()
+        paddle.set_device('gpu:%d'%paddle.distributed.ParallelEnv().dev_id)
+        init_parallel_env()
+        if paddle.distributed.ParallelEnv().local_rank == 0:
+            np_data = np.array([[4, 5, 6], [4, 5, 6]])
+        else:
+            np_data = np.array([[1, 2, 3], [1, 2, 3]])
+        data = paddle.to_tensor(np_data)
+        paddle.distributed.all_reduce(data)
+        out = data.numpy()
+        # [[5, 7, 9], [5, 7, 9]]
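To see where the `[[5, 7, 9], [5, 7, 9]]` result comes from without launching multiple processes, the reduction can be simulated in a single process. The names `REDUCERS` and `simulate_all_reduce` are hypothetical, and the string keys merely stand in for the `ReduceOp` members listed above.

```python
from functools import reduce

import numpy as np

# Element-wise reducers standing in for SUM, MAX, MIN, and PROD.
REDUCERS = {
    "SUM": np.add,
    "MAX": np.maximum,
    "MIN": np.minimum,
    "PROD": np.multiply,
}


def simulate_all_reduce(per_rank_tensors, op="SUM"):
    """Combine all tensors element-wise and hand the result to every rank."""
    combined = reduce(REDUCERS[op], per_rank_tensors)
    return [combined.copy() for _ in per_rank_tensors]


rank0 = np.array([[4, 5, 6], [4, 5, 6]])
rank1 = np.array([[1, 2, 3], [1, 2, 3]])
summed = simulate_all_reduce([rank0, rank1], op="SUM")
print(summed[0].tolist())
```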
--- a/doc/paddle/api/paddle/distributed/barrier_cn.rst
+++ b/doc/paddle/api/paddle/distributed/barrier_cn.rst
+.. _cn_api_distributed_barrier:
+
+barrier
+-------------------------------
+
+
+.. py:function:: paddle.distributed.barrier(group=0)
+
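+Synchronizes all processes in the process group.
+
+Parameters
+::::::::::
+    - group (int, optional) - The id of the process group to work on. Default: 0.
+
+Returns
+:::::::::
+None
+
+Code example
+::::::::::::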
+.. code-block:: python
+
+        import paddle
+        from paddle.distributed import init_parallel_env
+
+        paddle.disable_static()
+        paddle.set_device('gpu:%d'%paddle.distributed.ParallelEnv().dev_id)
+        init_parallel_env()
+        paddle.distributed.barrier()
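The synchronization that a barrier provides can be illustrated with Python's standard `threading.Barrier`. This is an analogy using threads in one process, not the Paddle API; the function `run_workers` is hypothetical.

```python
import threading


def run_workers(num_workers=3):
    """Run workers that all wait at a barrier and record the order of events."""
    barrier = threading.Barrier(num_workers)
    events = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            events.append(("before", rank))
        barrier.wait()  # nobody continues until every worker has arrived
        with lock:
            events.append(("after", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_workers()
print(events)
```

Every "before" entry is recorded ahead of any "after" entry, because no worker can pass the barrier until all of them have reached it.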
--- a/doc/paddle/api/paddle/distributed/broadcast_cn.rst
+++ b/doc/paddle/api/paddle/distributed/broadcast_cn.rst
+.. _cn_api_distributed_broadcast:
+
+broadcast
+-------------------------------
+
+
+.. py:function:: paddle.distributed.broadcast(tensor, src, group=0)
+
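+Broadcasts a Tensor to all other processes.
+
+Parameters
+::::::::::
+    - tensor (Tensor) - If the current process is the source, this Tensor is sent to the other processes; otherwise this Tensor receives the data sent by the source. Its data type is float16, float32, float64, int32, or int64.
+    - src (int) - The rank of the source process.
+    - group (int, optional) - The id of the process group to work on. Default: 0.
+
+Returns
+:::::::::
+None
+
+Code example
+::::::::::::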
+.. code-block:: python
+
+        import numpy as np
+        import paddle
+        from paddle.distributed import init_parallel_env
+
+        paddle.disable_static()
+        paddle.set_device('gpu:%d'%paddle.distributed.ParallelEnv().dev_id)
+        init_parallel_env()
+        if paddle.distributed.ParallelEnv().local_rank == 0:
+            np_data = np.array([[4, 5, 6], [4, 5, 6]])
+        else:
+            np_data = np.array([[1, 2, 3], [1, 2, 3]])
+        data = paddle.to_tensor(np_data)
+        paddle.distributed.broadcast(data, 1)
+        out = data.numpy()
+        # [[1, 2, 3], [1, 2, 3]]
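In the example above the source is rank 1, so every process ends up holding rank 1's data. A single-process simulation makes this explicit; `simulate_broadcast` is a hypothetical helper, not part of Paddle.

```python
import numpy as np


def simulate_broadcast(per_rank_tensors, src):
    """Overwrite every rank's tensor with a copy of the source rank's tensor."""
    source = per_rank_tensors[src]
    return [source.copy() for _ in per_rank_tensors]


rank0 = np.array([[4, 5, 6], [4, 5, 6]])
rank1 = np.array([[1, 2, 3], [1, 2, 3]])
after = simulate_broadcast([rank0, rank1], src=1)
print([t.tolist() for t in after])
```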
--- a/doc/paddle/api/paddle/distributed/fleet/InMemoryDataset_cn.rst
+++ b/doc/paddle/api/paddle/distributed/fleet/InMemoryDataset_cn.rst
-.. _cn_api_fluid_dataset_InMemoryDataset:
-
-InMemoryDataset
-------------------------------
-
-.. py:class:: paddle.fluid.dataset.InMemoryDataset
-
-
-
-
-InMemoryDataset loads data into memory and buffers it before training. This class is created by DatasetFactory.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle
-    dataset = paddle.fluid.DatasetFactory().create_dataset("InMemoryDataset")
-
-.. py:method:: set_queue_num(queue_num)
-
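-Sets the number of output queues of the ``Dataset``. Training processes fetch data from these queues.
-
-Parameters:
-    - **queue_num** (int) - The number of dataset output queues.
-
-**Code example**: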
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_queue_num(12)
-
-.. py:method:: set_fleet_send_batch_size(fleet_send_batch_size)
-
-Sets the batch size used when sending data.
-
-Parameters:
-    - **fleet_send_batch_size** (int) - The batch size used when sending data.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_fleet_send_batch_size(800)
-
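-.. py:method:: set_merge_by_lineid(var_list, erase_duplicate_feas=True, min_merge_size=2, keep_unmerged_ins=True)
-
-Enables merging by sample id (line id): instances that share the same line id are merged after shuffle. You should parse the sample id in a data generator.
-
-Parameters:
-    - **var_list** (list) - The list of features that can be merged; each element is a ``Variable``. Some slot features are usually not merged under the same sample id, so users should specify which slot features can be merged.
-    - **erase_duplicate_feas** (bool) - Whether to remove duplicate feature values when merging. Default: True.
-    - **min_merge_size** (int) - The minimum number of instances to merge. Default: 2.
-    - **keep_unmerged_ins** (bool) - Whether to keep instances that were not merged, such as instances with a unique id or instances whose id occurs fewer than ``min_merge_size`` times.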
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_merge_by_lineid()
-
-.. py:method:: load_into_memory()
-
-Loads data into memory.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.load_into_memory()
-
-.. py:method:: preload_into_memory()
-
-Loads data into memory asynchronously.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.preload_into_memory()
-    dataset.wait_preload_done()
-
-.. py:method:: wait_preload_done()
-
-Waits for ``preload_into_memory`` to finish.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.preload_into_memory()
-    dataset.wait_preload_done()
-
-.. py:method:: local_shuffle()
-
-Local shuffle.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.load_into_memory()
-    dataset.local_shuffle()
-
-
-.. py:method:: global_shuffle(fleet=None)
-
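-Global shuffle.
-
-Can only be used in distributed mode (one machine with multiple processes, or multiple machines with multiple processes). When running in distributed mode, pass fleet instead of None.
-
-**Code example**: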
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.load_into_memory()
-    dataset.global_shuffle(fleet)
-
-Parameters:
-    - **fleet** (Fleet) - The fleet singleton. Default: None.
-
-
-.. py:method:: release_memory()
-
-Releases the in-memory data of InMemoryDataset when the data is no longer used.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.load_into_memory()
-    dataset.global_shuffle(fleet)
-    exe = fluid.Executor(fluid.CPUPlace())
-    exe.run(fluid.default_startup_program())
-    exe.train_from_dataset(fluid.default_main_program(), dataset)
-    dataset.release_memory()
-
-.. py:method:: get_memory_data_size(fleet=None)
-
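-Users can call this function to find out the number of instances across all workers after the data has been loaded into memory.
-
-.. note::
-    This function may degrade performance because it contains a barrier.
-
-Parameters:
-    - **fleet** (Fleet) - The fleet object.
-
-Returns: the size of the in-memory data.
-
-**Code example**: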
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.load_into_memory()
-    print(dataset.get_memory_data_size(fleet))
-
-
-.. py:method:: get_shuffle_data_size(fleet=None)
-
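-Gets the shuffle data size. Users can call this function to find out the number of instances across all workers after a local or global shuffle.
-
-.. note::
-    This function may degrade the performance of local shuffle because it contains a barrier. It does not affect global shuffle.
-
-Parameters:
-    - **fleet** (Fleet) - The fleet object.
-
-Returns: the size of the shuffled data.
-
-**Code example**: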
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    filelist = ["a.txt", "b.txt"]
-    dataset.set_filelist(filelist)
-    dataset.load_into_memory()
-    dataset.global_shuffle(fleet)
-    print(dataset.get_shuffle_data_size(fleet))
-
-
-.. py:method:: set_batch_size(batch_size)
-
-Sets the batch size. Takes effect during training.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_batch_size(128)
-
-Parameters:
-    - **batch_size** (int) - The batch size.
-
-.. py:method:: set_fea_eval(record_candidate_size, fea_eval=True)
-
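-Sets the feature-evaluation mode for feature shuffling, which is used to assess feature-level importance. Feature shuffling requires ``fea_eval`` to be set to True.
-
-Parameters:
-    - **record_candidate_size** (int) - The size of the candidate instance pool used to shuffle one feature.
-    - **fea_eval** (bool) - Whether to enable feature-evaluation mode for feature shuffling. Default: True.
-
-**Code example**: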
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_fea_eval(1000000, True)
-
-.. py:method:: desc()
-
-Returns the protobuf message of the ``DataFeedDesc``.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    print(dataset.desc())
-
-Returns: a string message.
-
-.. py:method:: set_filelist(filelist)
-
-Sets the file list in the current worker.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_filelist(["a.txt", "b.txt"])
-
-Parameters:
-    - **filelist** (list) - The file list.
-
-.. py:method:: set_hdfs_config(fs_name, fs_ugi)
-
-Sets the HDFS configuration: the fs name and the ugi.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
-
-Parameters:
-    - **fs_name** (str) - The fs name.
-    - **fs_ugi** (str) - The fs ugi.
-
-.. py:method:: set_pipe_command(pipe_command)
-
-Sets the pipe command in the current ``dataset``. The pipe command must be a UNIX pipe command.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_pipe_command("python my_script.py")
-
-Parameters:
-    - **pipe_command** (str) - The pipe command.
-
-.. py:method:: set_thread(thread_num)
-
-Sets the number of threads, which equals the number of readers.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_thread(12)
-
-Parameters:
-    - **thread_num** (int) - The number of threads.
-
-.. py:method:: set_use_var(var_list)
-
-Sets the ``Variable`` list to be used.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_use_var([data, label])
-
-Parameters:
-    - **var_list** (list) - The list of variables.
-
-.. py:method:: slots_shuffle(slots)
-
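-This is a feature-level (slot-level) shuffle method, often used on sparse features with a large number of instances. To compare a metric such as auc, it shuffles one or more features and compares the result against a baseline in order to verify feature-level importance.
-
-Parameters:
-    - **slots** (list[string]) - The set of features to shuffle.
-
-**Code example**: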
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_merge_by_lineid()
-    # supports slot 0
-    dataset.slots_shuffle(['0'])
-
-
-
--- a/doc/paddle/api/paddle/distributed/fleet/QueueDataset_cn.rst
+++ b/doc/paddle/api/paddle/distributed/fleet/QueueDataset_cn.rst
-.. _cn_api_fluid_dataset_QueueDataset:
-
-QueueDataset
-------------------------------
-
-.. py:class:: paddle.fluid.dataset.QueueDataset
-
-
-
-
-Processes data in a streaming fashion.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
-
-
-
-.. py:method:: local_shuffle()
-
-Local shuffle of the data.
-
-QueueDataset does not support local shuffle; calling it may raise NotImplementedError.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
-    dataset.local_shuffle()
-
-
-
-.. py:method:: global_shuffle(fleet=None)
-
-Global shuffle of the data.
-
-QueueDataset does not support global shuffle; calling it may raise NotImplementedError.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
-    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
-    dataset.global_shuffle(fleet)
-
-.. py:method:: desc()
-
-Returns the protobuf message of the ``DataFeedDesc``.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    print(dataset.desc())
-
-Returns: a string message.
-
-.. py:method:: set_batch_size(batch_size)
-
-Sets the batch size. Takes effect during training.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_batch_size(128)
-
-Parameters:
-    - **batch_size** (int) - The batch size.
-
-.. py:method:: set_fea_eval(record_candidate_size,fea_eval)
-
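-Parameters:
-    - **record_candidate_size** (int) - The size of the candidate instance pool used to shuffle one feature.
-    - **fea_eval** (bool) - Whether to enable feature-evaluation mode for feature shuffling. Default: True.
-
-**Code example**: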
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_fea_eval(1000000, True)
-
-.. py:method:: set_filelist(filelist)
-
-Sets the file list in the current worker.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_filelist(["a.txt", "b.txt"])
-
-Parameters:
-    - **filelist** (list) - The file list.
-
-.. py:method:: set_hdfs_config(fs_name, fs_ugi)
-
-Sets the HDFS configuration: the fs name and the ugi.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
-
-Parameters:
-    - **fs_name** (str) - The fs name.
-    - **fs_ugi** (str) - The fs ugi.
-
-.. py:method:: set_pipe_command(pipe_command)
-
-Sets the pipe command in the current ``dataset``. The pipe command must be a UNIX pipe command.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_pipe_command("python my_script.py")
-
-Parameters:
-    - **pipe_command** (str) - The pipe command.
-
-.. py:method:: set_thread(thread_num)
-
-Sets the number of threads, which equals the number of readers.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_thread(12)
-
-Parameters:
-    - **thread_num** (int) - The number of threads.
-
-.. py:method:: set_use_var(var_list)
-
-Sets the ``Variable`` list to be used.
-
-**Code example**:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset()
-    dataset.set_use_var([data, label])
-
-Parameters:
-    - **var_list** (list) - The list of variables.
-
-.. py:method:: slots_shuffle(slots)
-
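-This is a feature-level (slot-level) shuffle method, often used on sparse features with a large number of instances. To compare a metric such as auc, it shuffles one or more features and compares the result against a baseline in order to verify feature-level importance.
-
-Parameters:
-    - **slots** (list[string]) - The set of features to shuffle.
-
-**Code example**: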
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_merge_by_lineid()
-    # supports slot 0
-    dataset.slots_shuffle(['0'])
-
--- a/doc/paddle/api/paddle/distributed/fleet/meta_optimizers/LambOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/distributed/fleet/meta_optimizers/LambOptimizer_cn.rst
-.. _cn_api_fluid_optimizer_LambOptimizer:
-
-LambOptimizer
-------------------------------
-
-.. py:class:: paddle.fluid.optimizer.LambOptimizer(learning_rate=0.001, lamb_weight_decay=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06, parameter_list=None, regularization=None, grad_clip=None, exclude_from_weight_decay_fn=None, name=None)
-
-
-
-
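-LAMB (Layer-wise Adaptive Moments optimizer for Batching training) optimizer.
-The LAMB optimizer aims to increase the training batch size without losing accuracy. It supports adaptive element-wise updating and accurate layer-wise correction. For more information, see `Large Batch Optimization for
-Deep Learning: Training BERT in 76 minutes <https://arxiv.org/pdf/1904.00962.pdf>`_ .
-The parameters are updated as follows: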
-
-.. math::
-
-    \begin{align}
-    \begin{aligned}
-     m_t &= \beta_1 m_{t - 1}+ (1 - \beta_1)g_t \\
-     v_t &= \beta_2 v_{t - 1}  + (1 - \beta_2)g_t^2 \\
-     r_t &= \frac{m_t}{\sqrt{v_t}+\epsilon} \\
-     w_t &= w_{t-1} -\eta_t \frac{\left \| w_{t-1}\right \|}{\left \| r_t + \lambda w_{t-1}\right \|} (r_t + \lambda w_{t-1})
-    \end{aligned}
-    \end{align}
-
-where :math:`m` is the first moment, :math:`v` is the second moment, :math:`\eta` is the learning rate, and :math:`\lambda` is the LAMB weight decay rate.
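The four update equations above can be traced for a single parameter tensor with a few lines of NumPy. This is a minimal sketch, not the Paddle kernel: the function name `lamb_step` is hypothetical, and the fallback to a ratio of 1.0 when either norm is zero is an assumption that the formula itself does not cover.

```python
import numpy as np


def lamb_step(w, g, m, v, lr=0.001, beta1=0.9, beta2=0.999,
              epsilon=1e-6, lamb_weight_decay=0.01):
    """One LAMB update for one parameter tensor, following the equations above."""
    m = beta1 * m + (1.0 - beta1) * g        # first moment m_t
    v = beta2 * v + (1.0 - beta2) * g ** 2   # second moment v_t
    r = m / (np.sqrt(v) + epsilon)           # r_t
    update = r + lamb_weight_decay * w       # r_t + lambda * w_{t-1}
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    # Assumption: use a ratio of 1.0 when a norm is zero to avoid dividing by zero.
    ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = w - lr * ratio * update              # w_t
    return w, m, v


w = np.array([1.0, 2.0])
g = np.array([0.1, -0.2])
w_new, m_new, v_new = lamb_step(w, g, np.zeros(2), np.zeros(2))
print(w_new, m_new, v_new)
```

Because the update direction is rescaled by the ratio of norms, the length of the step equals the learning rate times the norm of the weights, whatever the gradient magnitude.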
-
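-Parameters:
-    - **learning_rate** (float|Variable) - The learning rate used to update the parameters. Can be a float or a Variable with a floating-point data type.
-    - **lamb_weight_decay** (float) - The LAMB weight decay rate.
-    - **beta1** (float) - The exponential decay rate of the first moment estimate.
-    - **beta2** (float) - The exponential decay rate of the second moment estimate.
-    - **epsilon** (float) - A small float value used for numerical stability.
-    - **parameter_list** (list, optional) - The parameters the optimizer should optimize. This argument is required in dygraph mode. In static graph mode the default is None, in which case all parameters are optimized.
-    - **regularization** (WeightDecayRegularizer, optional) - The regularization method. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
-      :ref:`cn_api_fluid_regularizer_L2Decay` . If a parameter already has regularization set in :ref:`cn_api_fluid_ParamAttr` , the setting here is ignored for it;
-      the setting here takes effect only when no regularization is set in :ref:`cn_api_fluid_ParamAttr` . Default: None, meaning no regularization.
-    - **grad_clip** (GradientClipBase, optional) - The gradient clipping strategy. Three strategies are supported: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm` , :ref:`cn_api_fluid_clip_GradientClipByNorm` , and :ref:`cn_api_fluid_clip_GradientClipByValue` .
-      Default: None, in which case no gradient clipping is performed.
-    - **exclude_from_weight_decay_fn** (function) - Weight decay is skipped for a parameter when this function returns ``True`` for it.
-    - **name** (str, optional) - For usage, see :ref:`cn_api_guide_Name` . Generally there is no need to set it. Default: None.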
-
-**Code example**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-     
-    data = fluid.layers.data(name='x', shape=[5], dtype='float32')
-    hidden = fluid.layers.fc(input=data, size=10)
-    cost = fluid.layers.mean(hidden)
-
-    def exclude_fn(param):
-        return param.name.endswith('.b_0')
-     
-    optimizer = fluid.optimizer.Lamb(learning_rate=0.002,
-                                     exclude_from_weight_decay_fn=exclude_fn)
-    optimizer.minimize(cost)
-
-
-.. py:method:: minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None)
-
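-Adds the backward computation to the network and updates the Parameters in parameter_list according to the gradients it produces, minimizing the network loss.
-
-Parameters:
-    - **loss** (Variable) - The loss variable to minimize.
-    - **startup_program** (Program, optional) - The :ref:`cn_api_fluid_Program` used to initialize the parameters in parameter_list. Default: None, in which case :ref:`cn_api_fluid_default_startup_program` is used.
-    - **parameter_list** (list, optional) - A list of the Parameters or Parameter.names to update. Default: None, in which case all Parameters are updated.
-    - **no_grad_set** (set, optional) - A set of the Parameters or Parameter.names that do not need to be updated. Default: None.
-         
-Returns: tuple(optimize_ops, params_grads), where optimize_ops is the list of parameter-optimization OPs and params_grads is a list of (param, param_grad) pairs, in which param and param_grad are a parameter and its gradient. This return value can be added to the ``fetch_list`` argument of ``Executor.run()``. If it is, the ``use_prune`` argument is overridden to True and the program is pruned according to ``feed`` and ``fetch_list``. See the ``Executor`` documentation for details.
-
-Return type: tuple
-
-**Code example**: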
-
-.. code-block:: python
-
-    import numpy
-    import paddle.fluid as fluid
-     
-    x = fluid.layers.data(name='X', shape=[13], dtype='float32')
-    y = fluid.layers.data(name='Y', shape=[1], dtype='float32')
-    y_predict = fluid.layers.fc(input=x, size=1, act=None)
-    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
-    loss = fluid.layers.mean(cost)
-    adam = fluid.optimizer.LambOptimizer(learning_rate=0.2)
-    adam.minimize(loss)
-
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-     
-    x = numpy.random.random(size=(10, 13)).astype('float32')
-    y = numpy.random.random(size=(10, 1)).astype('float32')
-    exe.run(fluid.default_startup_program())
-    outs = exe.run(program=fluid.default_main_program(),
-                   feed={'X': x, 'Y': y},
-                   fetch_list=[loss.name])
-
-
-
-.. py:method:: clear_gradients()
-
-**Note:**
-
-  **1. This API takes effect only in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
-
-
-Clears the gradients of the parameters to be optimized.
-
-**Code example**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    import numpy as np
-
-    def exclude_fn(param):
-        return param.name.endswith('.b_0')
-
-    with fluid.dygraph.guard():
-        value = np.arange(26).reshape(2, 13).astype("float32")
-        a = fluid.dygraph.to_variable(value)
-        linear = fluid.Linear(13, 5, dtype="float32")
-        optimizer = fluid.optimizer.LambOptimizer(learning_rate=0.02,
-                                      exclude_from_weight_decay_fn=exclude_fn,
-                                      parameter_list=linear.parameters())
-        out = linear(a)
-        out.backward()
-        optimizer.minimize(out)
-        optimizer.clear_gradients()
-
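-.. py:method:: set_lr(value)
-
-**Note:**
-
-  **1. This API takes effect only in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
-
-Manually sets the learning rate of the current ``optimizer``. When LearningRateDecay is in use, this API cannot be used to set the learning rate manually, because doing so would cause a conflict.
-
-Parameters:
-    - **value** (float|Variable) - The learning rate value to set.
-
-Returns: None
-
-**Code example**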
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-            
-    with fluid.dygraph.guard():
-        linear = fluid.dygraph.nn.Linear(10, 10)
-        adam = fluid.optimizer.Adam(0.1, parameter_list=linear.parameters())
-        # set the learning rate manually with a Python float
-        lr_list = [0.2, 0.3, 0.4, 0.5, 0.6]
-        for i in range(5):
-            adam.set_lr(lr_list[i])
-            print("current lr is {}".format(adam.current_step_lr()))
-        # printed result:
-        #    current lr is 0.2
-        #    current lr is 0.3
-        #    current lr is 0.4
-        #    current lr is 0.5
-        #    current lr is 0.6
-
-
-        # set the learning rate with a framework Variable
-        lr_var = fluid.layers.create_global_var(shape=[1], value=0.7, dtype='float32')
-        adam.set_lr(lr_var)
-        print("current lr is {}".format(adam.current_step_lr()))
-        # printed result:
-        #    current lr is 0.7
-
-
-
-.. py:method:: current_step_lr()
-
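-**Note:**
-
-  **1. This API takes effect only in** `Dygraph <../../user_guides/howto/dygraph/DyGraph.html>`_ **mode**
-
-Gets the learning rate of the current step. When LearningRateDecay is not used, every call returns the same value; otherwise it returns the learning rate of the current step.
-
-Returns: the learning rate of the current step.
-
-Return type: float
-
-**Code example**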
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    import numpy as np
-
-    # example1: LearningRateDecay is not used, return value is all the same
-    with fluid.dygraph.guard():
-        emb = fluid.dygraph.Embedding([10, 10])
-        adam = fluid.optimizer.Adam(0.001, parameter_list = emb.parameters())
-        lr = adam.current_step_lr()
-        print(lr) # 0.001
-
-    # example2: PiecewiseDecay is used, return the step learning rate
-    with fluid.dygraph.guard():
-        inp = np.random.uniform(-0.1, 0.1, [10, 10]).astype("float32")
-        linear = fluid.dygraph.nn.Linear(10, 10)
-        inp = fluid.dygraph.to_variable(inp)
-        out = linear(inp)
-        loss = fluid.layers.reduce_mean(out)
-
-        bd = [2, 4, 6, 8]
-        value = [0.2, 0.4, 0.6, 0.8, 1.0]
-        adam = fluid.optimizer.Adam(fluid.dygraph.PiecewiseDecay(bd, value, 0),
-                           parameter_list=linear.parameters())
-
-        # first step: learning rate is 0.2
-        np.allclose(adam.current_step_lr(), 0.2, rtol=1e-06, atol=0.0) # True
-
-        # learning rate for different steps
-        ret = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0]
-        for i in range(12):
-            adam.minimize(loss)
-            lr = adam.current_step_lr()
-            np.allclose(lr, ret[i], rtol=1e-06, atol=0.0) # True
-
--- a/doc/paddle/api/paddle/distributed/fleet/meta_optimizers/RecomputeOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/distributed/fleet/meta_optimizers/RecomputeOptimizer_cn.rst
-.. _cn_api_fluid_optimizer_RecomputeOptimizer:
-
-RecomputeOptimizer
-------------------------------
-
-
-.. py:class:: paddle.fluid.optimizer.RecomputeOptimizer(optimizer)
-
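-:api_attr: declarative programming mode (static graph)
-
-
-
-Generally, one deep learning training step consists of three sub-steps: first, run the forward operators to compute the values of the Variables and the loss; second, run the backward operators to compute the gradients of the parameters; finally, apply the optimization algorithm to update the parameter values.
-
-During the forward pass, every Variable that the backward pass will need is kept in memory. When the model is very deep, this takes a large amount of memory.
-
-Recomputation splits the network into k segments. In each segment, the backward pass first re-runs the forward computation. In recompute mode, apart from the checkpoints and some special Variables that must stay in memory, all other temporary Variables are released during the forward pass, which helps a great deal in saving memory.
-
-The Variables that split a network into k segments are called checkpoints. Users need to set the checkpoints before running RecomputeOptimizer.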
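The idea of keeping only checkpoints and re-running the forward pass per segment can be illustrated on a toy chain of scalar layers. The sketch below is an assumption-laden illustration in plain NumPy, not how RecomputeOptimizer is implemented: the layer `tanh(w * x)` and the names `grad_full` and `grad_recompute` are hypothetical. Both functions compute the derivative of the chain's output with respect to its input; the second stores only one value per segment.

```python
import numpy as np


def forward_layer(w, x):
    return np.tanh(w * x)


def backward_layer(w, x, grad_out):
    # d tanh(w * x) / dx = w * (1 - tanh(w * x) ** 2)
    y = np.tanh(w * x)
    return grad_out * w * (1.0 - y ** 2)


def grad_full(weights, x0):
    """Backward pass that keeps every layer input in memory."""
    inputs = [x0]
    for w in weights[:-1]:
        inputs.append(forward_layer(w, inputs[-1]))
    grad = 1.0
    for w, x in zip(reversed(weights), reversed(inputs)):
        grad = backward_layer(w, x, grad)
    return grad


def grad_recompute(weights, x0, segment_size):
    """Backward pass that keeps only the input of each segment (the checkpoints)."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment_size == 0:
            checkpoints[i] = x
        x = forward_layer(w, x)

    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment_size, len(weights))
        # Re-run the forward pass inside this segment to rebuild the layer inputs.
        inputs = [checkpoints[start]]
        for i in range(start, end - 1):
            inputs.append(forward_layer(weights[i], inputs[-1]))
        for i in reversed(range(start, end)):
            grad = backward_layer(weights[i], inputs[i - start], grad)
    return grad


weights = [0.5, -1.2, 0.8, 1.5, -0.7]
print(grad_full(weights, 0.3), grad_recompute(weights, 0.3, segment_size=2))
```

The two results agree; the recompute variant trades extra forward computation for holding fewer intermediate values at once.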
-
-Parameters:
-    - **optimizer** (Optimizer) - The inner optimizer.
-
-**Code example**:
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-            import numpy as np
-            def gen_data():
-                return {"x": np.random.random(size=(32, 32)).astype('float32'),
-                "y": np.random.randint(2, size=(32, 1)).astype('int64')}
-            def mlp(input_x, input_y, hid_dim=128, label_dim=2):
-                print(input_x)
-                fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
-                prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
-                cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
-                sum_cost = fluid.layers.reduce_mean(cost)
-                return sum_cost, fc_1, prediction
-            input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
-            input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
-            cost, fc_1, pred = mlp(input_x, input_y)
-
-            sgd = fluid.optimizer.Adam(learning_rate=0.01)
-            sgd = fluid.optimizer.RecomputeOptimizer(sgd)
-            sgd._set_checkpoints([fc_1, pred])
-            sgd.minimize(cost)
-
-            print("Finished optimize")
-            place = fluid.CPUPlace()
-            exe = fluid.Executor(place)
-            exe.run(fluid.default_startup_program())
-            step = 10
-
-            for i in range(step):
-                cost_val = exe.run(feed=gen_data(),
-                       program=fluid.default_main_program(),
-                       fetch_list=[cost.name])
-                print("step=%d cost=%f" % (i, cost_val[0]))
-
-
-.. py:method:: apply_gradients(params_grads)
-
-调用self.apply_gradients
-
-参数：
-    - **params_grads** (list)- 用于优化的(param, grad)对组成的列表
-
-返回：  附加在当前Program的优化算子组成的列表
-
-返回类型：  list
-
-**代码示例**
-
-.. code-block:: python
-
-                import paddle.fluid as fluid
-                import paddle.fluid.framework as framework
-
-                def mlp(input_x, input_y, hid_dim=128, label_dim=2):
-                    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
-                    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
-                    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
-                    sum_cost = fluid.layers.reduce_mean(cost)
-                    return sum_cost, fc_1, prediction
-
-
-                input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
-                input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
-                cost, fc_1, pred = mlp(input_x, input_y)
-                print("Finished FF")
-
-                sgd = fluid.optimizer.Adam(learning_rate=0.01)
-                sgd = fluid.optimizer.RecomputeOptimizer(sgd)
-                params_grads = sgd.backward(
-                    cost,
-                    startup_program=None,
-                    parameter_list=None,
-                    no_grad_set=None)
-
-                program = cost.block.program
-                with framework.program_guard(program, None):
-                    optimize_ops = sgd.apply_gradients(params_grads)
-
-                print("Finished apply gradients")
-
-.. py:method:: apply_optimize(loss, startup_program, params_grads)
-
-调用self._optimizer的apply_optimize函数
-
-参数：
-    - **loss** (Variable) – 用于优化过程的损失值变量
-    - **startup_program** (Program) – 用于初始化在parameter_list中参数的startup_program
-    - **params_grads** (list)- 用于优化的(param, grad)对组成的列表
-
-返回：  附加在当前Program的算子组成的列表
-
-返回类型：  list
-
-**代码示例**
-
-.. code-block:: python
-
-                import paddle.fluid as fluid
-
-                def mlp(input_x, input_y, hid_dim=128, label_dim=2):
-                    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
-                    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
-                    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
-                    sum_cost = fluid.layers.reduce_mean(cost)
-                    return sum_cost, fc_1, prediction
-
-                input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
-                input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
-                cost, fc_1, pred = mlp(input_x, input_y)
-                print("Finished FF")
-
-                sgd = fluid.optimizer.Adam(learning_rate=0.01)
-                sgd = fluid.optimizer.RecomputeOptimizer(sgd)
-                params_grads = sgd.backward(
-                    cost,
-                    startup_program=None,
-                    parameter_list=None,
-                    no_grad_set=None)
-
-                optimize_ops = sgd.apply_optimize(
-                    cost, startup_program=None, params_grads=params_grads)
-
-                print("Finished apply_optimize")
-
-.. py:method:: backward(loss, startup_program=None, parameter_list=None, no_grad_set=None, callbacks=None)
-
-带checkpoint的backward函数
-
-参数：
-    - **loss** (Variable) – 需要最小化的损失值变量
-    - **startup_program** (Program, 可选) – 用于初始化parameter_list中参数的 :ref:`cn_api_fluid_Program` , 默认值为None，此时将使用 :ref:`cn_api_fluid_default_startup_program`
-    - **parameter_list** (list, 可选) – 待更新的Parameter或者Parameter.name组成的列表， 默认值为None，此时将更新所有的Parameter
-    - **no_grad_set** (set, 可选) – 不需要更新的Parameter或者Parameter.name组成的的集合，默认值为None
-    - **callbacks** (list, 可选) – 当为某参数附加反向算子时所要运行的callables组成的列表
-    - **checkpoints** (list, 可选) – 一批作为checkpoints的Variables
-
-返回：  由(param, grad)对构成的列表，其中param是参数，grad是其对应的梯度
-
-返回类型：  list
-
-**代码示例**
-
-.. code-block:: python
-
-                import paddle.fluid as fluid
-
-                def mlp(input_x, input_y, hid_dim=128, label_dim=2):
-                    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
-                    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
-                    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
-                    sum_cost = fluid.layers.reduce_mean(cost)
-                    return sum_cost, fc_1, prediction
-
-
-                input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
-                input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
-                cost, fc_1, pred = mlp(input_x, input_y)
-                print("Finished FF")
-
-                sgd = fluid.optimizer.Adam(learning_rate=0.01)
-                sgd = fluid.optimizer.RecomputeOptimizer(sgd)
-                params_grads = sgd.backward(
-                    cost,
-                    startup_program=None,
-                    parameter_list=None,
-                    no_grad_set=None)
-                print("Finished backward")
-
-
-
--- a/doc/paddle/api/paddle/distributed/get_rank_cn.rst
+++ b/doc/paddle/api/paddle/distributed/get_rank_cn.rst
+.. _cn_api_distributed_get_rank:
+
+get_rank
+----------
+
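+.. py:function:: paddle.distributed.get_rank()
+
+Returns the rank of the current process.
+
+The rank equals the value of the environment variable ``PADDLE_TRAINER_ID`` and defaults to 0.
+
+Returns
+:::::::::
+(int) The rank of the current process.
+
+Examples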
+:::::::::
+.. code-block:: python
+
+    import paddle
+    import paddle.distributed as dist
+
+    # execute this command in terminal: export PADDLE_TRAINER_ID=0
+    print("The rank is %d" % dist.get_rank())
+    # The rank is 0
--- a/doc/paddle/api/paddle/distributed/get_world_size_cn.rst
+++ b/doc/paddle/api/paddle/distributed/get_world_size_cn.rst
+.. _cn_api_distributed_get_world_size:
+
+get_world_size
+----------------
+
+.. py:function:: paddle.distributed.get_world_size()
+
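+Returns the number of processes participating in the current job.
+
+The number of processes equals the value of the environment variable ``PADDLE_TRAINERS_NUM`` and defaults to 1.
+
+Returns
+:::::::::
+(int) The number of processes participating in the job.
+
+Examples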
+:::::::::
+.. code-block:: python
+
+    import paddle
+    import paddle.distributed as dist
+
+    # execute this command in terminal: export PADDLE_TRAINERS_NUM=4
+    print("The world_size is %d" % dist.get_world_size())
+    # The world_size is 4
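The two lookups described above (``PADDLE_TRAINER_ID`` for the rank, ``PADDLE_TRAINERS_NUM`` for the world size) can be illustrated without Paddle. This is only a minimal sketch of the documented defaults: ``rank_from_env`` and ``world_size_from_env`` are hypothetical helper names, not Paddle APIs, and the real functions may consult additional state.

```python
import os


def rank_from_env(environ=None):
    """Rank of the current process: PADDLE_TRAINER_ID, defaulting to 0."""
    environ = os.environ if environ is None else environ
    return int(environ.get("PADDLE_TRAINER_ID", "0"))


def world_size_from_env(environ=None):
    """Number of participating processes: PADDLE_TRAINERS_NUM, defaulting to 1."""
    environ = os.environ if environ is None else environ
    return int(environ.get("PADDLE_TRAINERS_NUM", "1"))


# A plain dict stands in for the process environment (assumption for the demo).
fake_env = {"PADDLE_TRAINER_ID": "2", "PADDLE_TRAINERS_NUM": "4"}
print("rank:", rank_from_env(fake_env), "world size:", world_size_from_env(fake_env))
```

Passing a dictionary instead of reading ``os.environ`` directly keeps the sketch easy to try without exporting variables in the shell.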
--- a/doc/paddle/api/paddle/distributed/init_parallel_env_cn.rst
+++ b/doc/paddle/api/paddle/distributed/init_parallel_env_cn.rst
+.. _cn_api_distributed_init_parallel_env:
+
+init_parallel_env
+-----------------
+
+.. py:function:: paddle.distributed.init_parallel_env()
+
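+Initializes the parallel training environment in dynamic graph mode.
+
+.. note::
+    Currently only GPU training environments can be initialized, and NCCL is used for communication.
+
+Returns
+:::::::::
+None
+
+Examples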
+:::::::::
+.. code-block:: python
+
+    import paddle
+    import paddle.nn as nn
+    import paddle.optimizer as opt
+    import paddle.distributed as dist
+
+    class LinearNet(nn.Layer):
+        def __init__(self):
+            super(LinearNet, self).__init__()
+            self._linear1 = nn.Linear(10, 10)
+            self._linear2 = nn.Linear(10, 1)
+            
+        def forward(self, x):
+            return self._linear2(self._linear1(x))
+
+    def train():
+        # 1. enable dynamic mode
+        paddle.disable_static()
+        
+        # 2. initialize parallel environment
+        dist.init_parallel_env()
+
+        # 3. create data parallel layer & optimizer
+        layer = LinearNet()
+        dp_layer = paddle.DataParallel(layer)
+
+        loss_fn = nn.MSELoss()
+        adam = opt.Adam(
+            learning_rate=0.001, parameters=dp_layer.parameters())
+
+        # 4. run layer
+        inputs = paddle.randn([10, 10], 'float32')
+        outputs = dp_layer(inputs)
+        labels = paddle.randn([10, 1], 'float32')
+        loss = loss_fn(outputs, labels)
+        
+        loss = dp_layer.scale_loss(loss)
+        loss.backward()
+        dp_layer.apply_collective_grads()
+
+        adam.step()
+        adam.clear_grad()
+
+    if __name__ == '__main__':
+        dist.spawn(train)
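Step 4 of the example scales the loss and then applies collective gradients. Assuming, as a simplification, that ``scale_loss`` divides the loss by the number of processes and that ``apply_collective_grads`` sums gradients across processes, the combined effect is that every process ends up with the average gradient. The Paddle-free sketch below uses hypothetical helper names and one scalar gradient per process.

```python
def scaled_local_grads(local_grads, world_size):
    """Gradient each process computes after its loss was divided by world_size."""
    return [g / world_size for g in local_grads]


def all_reduce_sum(values):
    """Every process receives the sum of the values held by all processes."""
    total = sum(values)
    return [total for _ in values]


# One scalar gradient per process (illustrative numbers, not real training data).
local_grads = [1.0, 3.0]
synced = all_reduce_sum(scaled_local_grads(local_grads, len(local_grads)))
print(synced)  # [2.0, 2.0], the mean of the local gradients on every process
```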
--- a/doc/paddle/api/paddle/fluid/dygraph/parallel/prepare_context_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/parallel/prepare_context_cn.rst
--- a/doc/paddle/api/paddle/distributed/reduce_cn.rst
+++ b/doc/paddle/api/paddle/distributed/reduce_cn.rst
+.. _cn_api_distributed_reduce:
+
+reduce
+-------------------------------
+
+
+.. py:function:: paddle.distributed.reduce(tensor, dst, op=ReduceOp.SUM, group=0)
+
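+Reduces the given tensor across all processes in the process group and delivers the result to the destination process ``dst``.
+
+Parameters
+::::::::::
+    - tensor (Tensor) - The input Tensor of the operation; the result is written into this Tensor on the destination process. Supported data types: float16, float32, float64, int32, int64.
+    - dst (int) - The rank of the destination process that receives the result.
+    - op (ReduceOp.SUM|ReduceOp.MAX|ReduceOp.MIN|ReduceOp.PROD, optional) - The reduction to apply, such as sum, maximum, minimum, or product. Defaults to sum.
+    - group (int, optional) - The id of the process group to work on. Defaults to 0.
+
+Returns
+:::::::::
+None
+
+Examples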
+:::::::::
+.. code-block:: python
+
+        import numpy as np
+        import paddle
+        from paddle.distributed import init_parallel_env
+
+        paddle.disable_static()
+        paddle.set_device('gpu:%d'%paddle.distributed.ParallelEnv().dev_id)
+        init_parallel_env()
+        if paddle.distributed.ParallelEnv().local_rank == 0:
+            np_data = np.array([[4, 5, 6], [4, 5, 6]])
+        else:
+            np_data = np.array([[1, 2, 3], [1, 2, 3]])
+        data = paddle.to_tensor(np_data)
+        paddle.distributed.reduce(data, 0)
+        out = data.numpy()
+        # with 2 processes, on rank 0 (the dst process): [[5, 7, 9], [5, 7, 9]]
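The arithmetic behind the comment in the example can be checked without GPUs. The sketch below simulates a sum reduction over two ranks in plain Python; ``simulate_reduce`` is a hypothetical helper, and leaving the buffers of non-destination ranks untouched is an assumption of this sketch rather than documented behavior.

```python
import operator


def simulate_reduce(per_rank, dst, op=operator.add):
    """Reduce the 2-D lists held by each rank; only rank `dst` receives the result."""
    rows, cols = len(per_rank[0]), len(per_rank[0][0])
    reduced = [[per_rank[0][r][c] for c in range(cols)] for r in range(rows)]
    for data in per_rank[1:]:
        for r in range(rows):
            for c in range(cols):
                reduced[r][c] = op(reduced[r][c], data[r][c])
    # Assumption: buffers on non-destination ranks are returned unchanged.
    out = [[row[:] for row in data] for data in per_rank]
    out[dst] = reduced
    return out


rank0 = [[4, 5, 6], [4, 5, 6]]
rank1 = [[1, 2, 3], [1, 2, 3]]
result = simulate_reduce([rank0, rank1], dst=0)
print(result[0])  # [[5, 7, 9], [5, 7, 9]]
```

Passing ``op=max`` or ``op=min`` to the same helper mimics the other reduction kinds listed for ``op``.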
--- a/doc/paddle/api/paddle/distributed/scatter_cn.rst
+++ b/doc/paddle/api/paddle/distributed/scatter_cn.rst
+.. _cn_api_paddle_cn_scatter:
+
+scatter
+-------------------------------
+.. py:function:: paddle.scatter(x, index, updates, overwrite=True, name=None)
+
+
+Produces the output by updating the input at the selected indices ``index`` with ``updates``. The behavior is as follows:
+
+    .. code-block:: python
+    
+        import numpy as np
+        #input:
+        x = np.array([[1, 1], [2, 2], [3, 3]])
+        index = np.array([2, 1, 0, 1])
+        # updates needs one row per entry of index;
+        # its remaining dims (after the first) must match those of x
+        updates = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
+        overwrite = False
+        # calculation:
+        if not overwrite:
+            for i in range(len(index)):
+                x[index[i]] = np.zeros((2))
+        for i in range(len(index)):
+            if (overwrite):
+                x[index[i]] = updates[i]
+            else:
+                x[index[i]] += updates[i]
+        # output:
+        out = np.array([[3, 3], [6, 6], [1, 1]])
+        out.shape # [3, 2]
+
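+**Note:**
+Because the order in which ``updates`` are applied is nondeterministic, the output is nondeterministic if ``index`` contains duplicates.
+
+
+Parameters:
+    - **x** (Tensor) - The input N-D Tensor with ndim >= 1. Data type: float32 or float64.
+    - **index** (Tensor) - A 1-D Tensor. Data type: int32 or int64. The length of ``index`` must not exceed the length of ``updates``, and the values in ``index`` must not exceed the length of the input.
+    - **updates** (Tensor) - The values used to update the input ``x`` according to ``index``. Its dimensions after the first should match those of the input ``x``.
+    - **overwrite** (bool, optional) - How the output is updated when ``index`` contains duplicates. If True, duplicated indices are updated in overwrite mode; if False, they are updated in accumulate mode. Defaults to True.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually does not need to be set. Defaults to None.
+
+Returns: Tensor with the same shape and data type as ``x``.
+
+
+**Examples:**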
+    .. code-block:: python
+        
+        import paddle
+        import numpy as np
+        paddle.disable_static()
+        x_data = np.array([[1, 1], [2, 2], [3, 3]]).astype(np.float32)
+        index_data = np.array([2, 1, 0, 1]).astype(np.int64)
+        updates_data = np.array([[1, 1], [2, 2], [3, 3], [4, 4]]).astype(np.float32)
+        
+        x = paddle.to_tensor(x_data)
+        index = paddle.to_tensor(index_data)
+        updates = paddle.to_tensor(updates_data)
+
+        output1 = paddle.scatter(x, index, updates, overwrite=False)
+        # [[3., 3.],
+        #  [6., 6.],
+        #  [1., 1.]]
+        output2 = paddle.scatter(x, index, updates, overwrite=True)
+        # CPU device:
+        # [[3., 3.],
+        #  [4., 4.],
+        #  [1., 1.]]
+        # On GPU there may be two possible results because index contains repeated values
+        # result 1:
+        # [[3., 3.],
+        #  [4., 4.],
+        #  [1., 1.]]
+        # result 2:
+        # [[3., 3.],
+        #  [2., 2.],
+        #  [1., 1.]]
--- a/doc/paddle/api/paddle/distributed/spawn_cn.rst
+++ b/doc/paddle/api/paddle/distributed/spawn_cn.rst
+.. _cn_api_distributed_spawn:
+
+spawn
+-----
+
+.. py:function:: paddle.distributed.spawn(func, args=(), nprocs=-1, join=True, daemon=False, **options)
+
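+Starts a multi-process job using the ``spawn`` method.
+
+Parameters
+::::::::::
+    - func (function) - The target function called by the processes started via ``spawn``. It must be picklable (serializable), so it has to be defined at the top level of a module; it cannot be a nested function or a class method.
+    - args (tuple, optional) - Arguments passed to the target function ``func``.
+    - nprocs (int, optional) - Number of processes to start. Defaults to -1. When ``nprocs`` is -1, all currently available devices are taken from the environment: for GPU jobs, the available device IDs are read from the environment variable ``CUDA_VISIBLE_DEVICES``; for CPU jobs, the number of available CPU devices is read from the environment variable ``CPU_NUM``. For example, ``export CPU_NUM=4`` sets the default number of available CPU devices; if this variable is not set, it defaults to 1.
+    - join (bool, optional) - Performs a blocking ``join`` on all started processes, waiting for them to finish. Defaults to True.
+    - daemon (bool, optional) - Sets the ``daemon`` attribute of the started processes. Defaults to False.
+    - ``**options`` (dict, optional) - Other configuration options for initializing the parallel execution environment. The following options are currently supported:
+
+      - start_method (string) - The method used to start child processes: ``spawn`` , ``fork`` , or ``forkserver`` . Because the CUDA runtime does not support ``fork`` , use ``spawn`` or ``forkserver`` when CUDA is used in child processes. Defaults to ``spawn`` .
+      - cluster_node_ips (string) - IPs of the nodes (machines) in the cluster, for example "192.168.0.16,192.168.0.17". Defaults to "127.0.0.1".
+      - node_ip (string) - IP of the current node (machine), for example "192.168.0.16". Defaults to "127.0.0.1".
+      - started_port (int) - Starting port for the training processes on one training node (machine), for example 6170. Defaults to None.
+      - selected_gpus (string) - IDs of the GPUs to use for training, for example "0,1,2,3". Defaults to None.
+      - print_config (bool) - Whether to print the current parallel training configuration. Defaults to False.
+      - use_paddlecloud (bool) - Whether to use PaddleCloud to start the multi-process job. Defaults to False.
+
+Returns
+:::::::::
+``MultiprocessContext`` object that holds the spawned processes.
+
+Examples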
+:::::::::
+.. code-block:: python
+
+    from __future__ import print_function
+
+    import paddle
+    import paddle.nn as nn
+    import paddle.optimizer as opt
+    import paddle.distributed as dist
+
+    class LinearNet(nn.Layer):
+        def __init__(self):
+            super(LinearNet, self).__init__()
+            self._linear1 = nn.Linear(10, 10)
+            self._linear2 = nn.Linear(10, 1)
+            
+        def forward(self, x):
+            return self._linear2(self._linear1(x))
+
+    def train(print_result=False):
+        # 1. enable dynamic mode
+        paddle.disable_static()
+        
+        # 2. initialize parallel environment
+        dist.init_parallel_env()
+
+        # 3. create data parallel layer & optimizer
+        layer = LinearNet()
+        dp_layer = paddle.DataParallel(layer)
+
+        loss_fn = nn.MSELoss()
+        adam = opt.Adam(
+            learning_rate=0.001, parameters=dp_layer.parameters())
+
+        # 4. run layer
+        inputs = paddle.randn([10, 10], 'float32')
+        outputs = dp_layer(inputs)
+        labels = paddle.randn([10, 1], 'float32')
+        loss = loss_fn(outputs, labels)
+        
+        if print_result is True:
+            print("loss:", loss.numpy())
+        
+        loss = dp_layer.scale_loss(loss)
+        loss.backward()
+        dp_layer.apply_collective_grads()
+
+        adam.step()
+        adam.clear_grad()
+
+    # Usage 1: only pass function.
+    # Use this if your training method needs no arguments and
+    # should use all visible devices for parallel training.
+    if __name__ == '__main__':
+        dist.spawn(train)
+
+    # Usage 2: pass function and arguments.
+    # Use this if your training method needs some arguments and
+    # should use all visible devices for parallel training.
+    if __name__ == '__main__':
+        dist.spawn(train, args=(True,))
+
+    # Usage 3: pass function, arguments and nprocs.
+    # Use this if your training method needs some arguments and
+    # should use only part of the visible devices for parallel training.
+    # If your machine holds 8 cards {0,1,2,3,4,5,6,7},
+    # this case will use cards {0,1}; if you set
+    # CUDA_VISIBLE_DEVICES=4,5,6,7, this case will use
+    # cards {4,5}.
+    if __name__ == '__main__':
+        dist.spawn(train, args=(True,), nprocs=2)
+
+    # Usage 4: pass function, arguments, nprocs and selected_gpus.
+    # Use this if your training method needs some arguments and
+    # should use only part of the visible devices for parallel training,
+    # but you can't set your machine's environment variable
+    # CUDA_VISIBLE_DEVICES (for example it is None, or it lists all
+    # cards {0,1,2,3,4,5,6,7}). You can pass `selected_gpus` to
+    # select the GPU cards you want to use. For example,
+    # this case will use cards {4,5} if your machine holds 8 cards.
+    if __name__ == '__main__':
+        dist.spawn(train, args=(True,), nprocs=2, selected_gpus='4,5')
\ No newline at end of file
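The device-selection rules stated for ``nprocs`` and in the usage comments can be sketched in plain Python. ``select_devices`` is a hypothetical helper written only for illustration; in particular, returning an empty list when ``CUDA_VISIBLE_DEVICES`` is unset is an assumption of this sketch, and the library may instead query the hardware.

```python
import os


def select_devices(nprocs=-1, use_gpu=True, environ=None):
    """Pick device ids the way the nprocs description above suggests."""
    environ = os.environ if environ is None else environ
    if use_gpu:
        visible = environ.get("CUDA_VISIBLE_DEVICES", "")
        devices = [d.strip() for d in visible.split(",") if d.strip()]
    else:
        # CPU_NUM defaults to 1 when it is not set.
        devices = [str(i) for i in range(int(environ.get("CPU_NUM", "1")))]
    if nprocs == -1:
        return devices  # use every visible device
    if nprocs > len(devices):
        raise ValueError("nprocs exceeds the number of visible devices")
    return devices[:nprocs]  # use only the first nprocs visible devices


env = {"CUDA_VISIBLE_DEVICES": "4,5,6,7"}
print(select_devices(2, environ=env))  # ['4', '5'], matching Usage 3
```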
--- a/doc/paddle/api/paddle/distribution/Distribution_cn.rst
+++ b/doc/paddle/api/paddle/distribution/Distribution_cn.rst
+.. _cn_api_distribution_Distribution:
+
+Distribution
+-------------------------------
+
+.. py:class:: paddle.distribution.Distribution()
+
+
+
+
+Abstract base class for probability distributions; the concrete functionality is implemented by the specific distributions.
+
+
+.. py:function:: sample()
+
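+Samples from the distribution.
+
+.. py:function:: entropy()
+
+The entropy of the distribution.
+
+.. py:function:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Tensor) - The input tensor.
+
+.. py:function:: probs(value)
+
+Probability density function.
+
+Parameters:
+    - **value** (Tensor) - The input tensor.
+
+.. py:function:: kl_divergence(other)
+
+The KL divergence between two distributions.
+
+Parameters:
+    - **other** (Distribution) - An instance of Distribution.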
+
+
+
+
+
+
+
+
--- a/doc/paddle/api/paddle/distribution/Normal_cn.rst
+++ b/doc/paddle/api/paddle/distribution/Normal_cn.rst
-.. _cn_api_fluid_initializer_Normal:
+.. _cn_api_distribution_Normal:

 Normal
 -------------------------------

-.. py:attribute:: paddle.fluid.initializer.Normal
+.. py:class:: paddle.distribution.Normal(loc, scale, name=None)
+
+
+
+
+Normal distribution
+
+Mathematical definition:
+
+.. math::
+
+    pdf(x; \mu, \sigma) = \frac{1}{Z}e^{\frac {-0.5 (x - \mu)^2}  {\sigma^2} }
+
+    Z = (2 \pi \sigma^2)^{0.5}
+
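+In the formulas above:
+
+- :math:`loc = \mu` : the mean.
+- :math:`scale = \sigma` : the standard deviation.
+- :math:`Z` : the normalization constant.
+
+Parameters:
+    - **loc** (int|float|list|numpy.ndarray|Tensor) - The mean of the normal distribution. Accepts an int, float, list, numpy.ndarray, or Tensor.
+    - **scale** (int|float|list|numpy.ndarray|Tensor) - The standard deviation of the normal distribution. Accepts an int, float, list, numpy.ndarray, or Tensor.
+    - **name** (str, optional) - The name of the operation (optional, defaults to None). See :ref:`api_guide_Name` for more information.
+
+**Examples**: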
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    from paddle.distribution import Normal
+
+    paddle.disable_static()
+    # Define a single scalar Normal distribution.
+    dist = Normal(loc=0., scale=3.)
+    # Define a batch of two scalar valued Normals.
+    # The first has mean 1 and standard deviation 11, the second 2 and 22.
+    dist = Normal(loc=[1., 2.], scale=[11., 22.])
+    # Get 3 samples, returning a 3 x 2 tensor.
+    dist.sample([3])
+
+    # Define a batch of two scalar valued Normals.
+    # Both have mean 1, but different standard deviations.
+    dist = Normal(loc=1., scale=[11., 22.])
+
+    # Complete example
+    value_npdata = np.array([0.8], dtype="float32")
+    value_tensor = paddle.to_tensor(value_npdata)
+
+    normal_a = Normal([0.], [1.])
+    normal_b = Normal([0.5], [2.])
+    sample = normal_a.sample([2])
+    # a random tensor created by normal distribution with shape: [2, 1]
+    entropy = normal_a.entropy()
+    # [1.4189385] with shape: [1]
+    lp = normal_a.log_prob(value_tensor)
+    # [-1.2389386] with shape: [1]
+    p = normal_a.probs(value_tensor)
+    # [0.28969154] with shape: [1]
+    kl = normal_a.kl_divergence(normal_b)
+    # [0.34939718] with shape: [1]
+
+
+.. py:function:: sample(shape, seed=0)
+
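+Generates samples with the specified dimensions.
+
+Parameters:
+    - **shape** (list) - A 1-D list specifying the dimensions of the generated samples. Data type: int32.
+    - **seed** (int) - A long integer.
+
+Returns: A Tensor with the requested dimensions, data type float32.
+
+Return type: Tensor
+
+.. py:function:: entropy()
+
+Entropy.
+
+Returns: The entropy of the normal distribution, data type float32.
+
+Return type: Tensor
+
+.. py:function:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Tensor) - The input tensor. Data type: float32 or float64.
+
+Returns: The log probability, with the same data type as ``value``.
+
+Return type: Tensor
+
+.. py:function:: probs(value)
+
+Probability density function.
+
+Parameters:
+    - **value** (Tensor) - The input tensor. Data type: float32 or float64.
+
+Returns: The probability, with the same data type as ``value``.
+
+Return type: Tensor
+
+.. py:function:: kl_divergence(other)
+
+The KL divergence between two normal distributions.
+
+Parameters:
+    - **other** (Normal) - An instance of Normal.
+
+Returns: The KL divergence between the two normal distributions, data type float32.
+
+Return type: Tensor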

-:alias_main: paddle.nn.initializer.Normal
-:alias: paddle.nn.initializer.Normal
-:old_api: paddle.fluid.initializer.Normal



-``NormalInitializer`` 的别名
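The numbers quoted in the comments of the Normal example can be reproduced from the closed-form expressions for a univariate normal distribution. The sketch below uses only the standard library; the function names are illustrative and are not part of ``paddle.distribution``.

```python
import math


def normal_entropy(scale):
    """Entropy of N(loc, scale^2): 0.5 + 0.5*log(2*pi) + log(scale)."""
    return 0.5 + 0.5 * math.log(2.0 * math.pi) + math.log(scale)


def normal_log_prob(value, loc, scale):
    """Log of the pdf given above: -(x - mu)^2 / (2 sigma^2) - log(Z)."""
    var = scale * scale
    return -((value - loc) ** 2) / (2.0 * var) - 0.5 * math.log(2.0 * math.pi * var)


def normal_kl(loc_a, scale_a, loc_b, scale_b):
    """KL(N_a || N_b) for two univariate normal distributions."""
    var_ratio = (scale_a / scale_b) ** 2
    t1 = ((loc_a - loc_b) / scale_b) ** 2
    return 0.5 * (var_ratio + t1 - 1.0 - math.log(var_ratio))


# Same parameters as normal_a = Normal([0.], [1.]) and normal_b = Normal([0.5], [2.]).
print(normal_entropy(1.0))                        # close to 1.4189385
print(normal_log_prob(0.8, 0.0, 1.0))             # close to -1.2389385
print(math.exp(normal_log_prob(0.8, 0.0, 1.0)))   # close to 0.2896915
print(normal_kl(0.0, 1.0, 0.5, 2.0))              # close to 0.3493972
```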


--- a/doc/paddle/api/paddle/distribution/Uniform_cn.rst
+++ b/doc/paddle/api/paddle/distribution/Uniform_cn.rst
-.. _cn_api_fluid_initializer_Uniform:
+.. _cn_api_distribution_Uniform:

 Uniform
 -------------------------------

-.. py:attribute:: paddle.fluid.initializer.Uniform
+.. py:class:: paddle.distribution.Uniform(low, high, name=None)

-:alias_main: paddle.nn.initializer.Uniform
-:alias: paddle.nn.initializer.Uniform
-:old_api: paddle.fluid.initializer.Uniform



-``UniformInitializer`` 的别名
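+Uniform distribution
+
+The probability density function (pdf) is:
+
+.. math::
+
+    pdf(x; a, b) = \frac{1}{Z},  a \le x < b
+
+    Z = b - a
+
+In the formulas above:
+
+- :math:`low = a` .
+- :math:`high = b` .
+- :math:`Z` : the normalization constant.
+
+The dimensions of ``low`` and ``high`` must be broadcastable.
+
+Parameters:
+    - **low** (int|float|list|numpy.ndarray|Tensor) - The lower bound of the uniform distribution. Accepts an int, float, list, numpy.ndarray, or Tensor.
+    - **high** (int|float|list|numpy.ndarray|Tensor) - The upper bound of the uniform distribution. Accepts an int, float, list, numpy.ndarray, or Tensor.
+    - **name** (str, optional) - The name of the operation (optional, defaults to None). See :ref:`api_guide_Name` for more information.
+
+**Examples**: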
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    from paddle.distribution import Uniform
+
+    paddle.disable_static()
+    # Without broadcasting, a single uniform distribution [3, 4]:
+    u1 = Uniform(low=3.0, high=4.0)
+    # 2 distributions [1, 3], [2, 4]
+    u2 = Uniform(low=[1.0, 2.0], high=[3.0, 4.0])
+    # 4 distributions
+    u3 = Uniform(low=[[1.0, 2.0], [3.0, 4.0]],
+            high=[[1.5, 2.5], [3.5, 4.5]])
+
+    # With broadcasting:
+    u4 = Uniform(low=3.0, high=[5.0, 6.0, 7.0])
+
+    # Complete example
+    value_npdata = np.array([0.8], dtype="float32")
+    value_tensor = paddle.to_tensor(value_npdata)
+
+    uniform = Uniform([0.], [2.])
+
+    sample = uniform.sample([2])
+    # a random tensor created by uniform distribution with shape: [2, 1]
+    entropy = uniform.entropy()
+    # [0.6931472] with shape: [1]
+    lp = uniform.log_prob(value_tensor)
+    # [-0.6931472] with shape: [1]
+    p = uniform.probs(value_tensor)
+    # [0.5] with shape: [1]
+
+
+.. py:function:: sample(shape, seed=0)
+
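+Generates samples with the specified dimensions.
+
+Parameters:
+    - **shape** (list) - A 1-D list specifying the dimensions of the generated samples. Data type: int32.
+    - **seed** (int) - A long integer.
+
+Returns: A Tensor with the requested dimensions, data type float32.
+
+Return type: Tensor
+
+.. py:function:: entropy()
+
+Entropy.
+
+Returns: The entropy of the uniform distribution, data type float32.
+
+Return type: Tensor
+
+.. py:function:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Tensor) - The input tensor. Data type: float32 or float64.
+
+Returns: The log probability, with the same data type as ``value``.
+
+Return type: Tensor
+
+.. py:function:: probs(value)
+
+Probability density function.
+
+Parameters:
+    - **value** (Tensor) - The input tensor. Data type: float32 or float64.
+
+Returns: The probability, with the same data type as ``value``.
+
+Return type: Tensor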
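The values in the comments of the Uniform example follow directly from the density :math:`1 / (b - a)`. The standard-library sketch below reproduces them; the function names are illustrative, and treating the support as the half-open interval from the formula is an assumption, since the library's handling of the exact boundaries may differ.

```python
import math


def uniform_entropy(low, high):
    """Entropy of U(low, high): log(high - low)."""
    return math.log(high - low)


def uniform_probs(value, low, high):
    """Density 1 / (high - low) inside [low, high), 0 outside."""
    return 1.0 / (high - low) if low <= value < high else 0.0


def uniform_log_prob(value, low, high):
    """Log density; -inf outside the support."""
    p = uniform_probs(value, low, high)
    return math.log(p) if p > 0.0 else float("-inf")


# Same parameters as uniform = Uniform([0.], [2.]) evaluated at 0.8.
print(uniform_entropy(0.0, 2.0))        # close to 0.6931472
print(uniform_log_prob(0.8, 0.0, 2.0))  # close to -0.6931472
print(uniform_probs(0.8, 0.0, 2.0))     # 0.5
```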
+
+



--- a/doc/paddle/api/paddle/fluid/DistributeTranspilerConfig_cn.rst
+++ b/doc/paddle/api/paddle/fluid/DistributeTranspilerConfig_cn.rst
-.. _cn_api_fluid_DistributeTranspilerConfig:
+.. _cn_api_fluid_transpiler_DistributeTranspilerConfig:

 DistributeTranspilerConfig
 -------------------------------

-.. py:class:: paddle.fluid.DistributeTranspilerConfig

+.. py:class:: paddle.fluid.transpiler.DistributeTranspilerConfig

+:api_attr: declarative programming mode (static graph)



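+Configuration class for switching a single-machine job to a distributed job. Users can configure it as needed, for example to choose synchronous or asynchronous training, or to specify the number of nodes and the model slicing logic.
+
+Returns: None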
+
 .. py:attribute:: slice_var_up (bool)

-为多个Pserver（parameter server）将tensor切片, 默认为True。
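+Whether to slice tensors for the Pservers. This is a bool attribute and defaults to True. It specifies whether parameters and gradients are sliced and distributed evenly across multiple PServers. When slice_var_up is True, parameters are sliced evenly and distributed across the PServers so that the load on each PServer is relatively balanced.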
+

 .. py:attribute:: split_method (PSDispatcher)

-可使用 RoundRobin 或者 HashName。
+The parameter dispatch method. The currently supported methods are :ref:`cn_api_fluid_transpiler_RoundRobin` and :ref:`cn_api_fluid_transpiler_HashName` ; the default is RoundRobin.

-注意: 尝试选择最佳方法来达到Pserver间负载均衡。
+Note: Try to choose the method that achieves the best load balance.

 .. py:attribute:: min_block_size (int)

-block中分割(split)出的元素个数的最小值。
+The minimum size of a data block when slicing parameters. Defaults to 8192.

-注意: 根据：`issuecomment-369912156 <https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156>`_ , 当数据块大小超过2MB时，我们可以有效地使用带宽。如果你想更改它，请详细查看 ``slice_variable`` 函数。
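+Note: According to https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156 , bandwidth can be used efficiently once the block size exceeds 2MB. If you want to change this value, please look at the ``slice_variable`` function in detail.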

 **代码示例**

 .. code-block:: python
-    
-    import paddle.fluid as fluid
-    config = fluid.DistributeTranspilerConfig()
-    config.slice_var_up = True

+        from paddle.fluid.transpiler.ps_dispatcher import RoundRobin
+        import paddle.fluid as fluid
+
+        config = fluid.DistributeTranspilerConfig()
+        config.slice_var_up = True
+        config.split_method = RoundRobin
+        config.min_block_size = 81920
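How ``slice_var_up``, ``min_block_size`` and ``split_method`` interact can be pictured with a deliberately simplified model. The sketch below is not the library's ``slice_variable`` or dispatcher code: ``split_count``, ``round_robin`` and ``hash_name`` are hypothetical helpers, the block-count rule is a simplification, and ``zlib.crc32`` merely stands in for whatever hash the real ``HashName`` dispatcher uses.

```python
import zlib


def split_count(numel, num_pservers, min_block_size=8192, slice_var_up=True):
    """Number of blocks a variable with `numel` elements is cut into (simplified)."""
    if not slice_var_up:
        return 1  # the variable is kept whole
    max_blocks = max(1, numel // min_block_size)  # never go below the minimum block size
    return min(num_pservers, max_blocks)


def round_robin(names, endpoints):
    """Assign variables to endpoints cyclically, in order of appearance."""
    return {name: endpoints[i % len(endpoints)] for i, name in enumerate(names)}


def hash_name(names, endpoints):
    """Assign variables by a stable hash of their name (crc32 is a stand-in)."""
    return {name: endpoints[zlib.crc32(name.encode()) % len(endpoints)] for name in names}


endpoints = ["192.168.0.1:6174", "192.168.0.2:6174"]
print(split_count(100000, num_pservers=4))   # 4: large enough to spread over every pserver
print(split_count(10000, num_pservers=4))    # 1: too small to slice at the default 8192
print(round_robin(["w0", "w1", "w2"], endpoints))
```

Round-robin spreads variables evenly regardless of their names, while a name hash keeps a given variable on the same endpoint as long as the endpoint list does not change.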



--- a/doc/paddle/api/paddle/fluid/DistributeTranspiler_cn.rst
+++ b/doc/paddle/api/paddle/fluid/DistributeTranspiler_cn.rst
@@ -3,16 +3,17 @@
 DistributeTranspiler
 -------------------------------

-.. py:class:: paddle.fluid.DistributeTranspiler (config=None)

+.. py:class:: paddle.fluid.transpiler.DistributeTranspiler (config=None)
+
+:api_attr: declarative programming mode (static graph)




 该类可以把fluid program转变为分布式数据并行计算的program, 有PServer和NCCL2两种模式。
 在Pserver（全称：parameter server）模式下， 通过 ``transpile`` 将用于单机训练的 ``program``  转译为可用于parameter server的分布式架构(即PServer,参数服务器)来进行训练的program。
-在NCCL2模式下, 通过 ``transpile`` 将用于单机训练的 ``program``  转译为可用于NCCL2的分布式架构来进行训练的program。在NCCL2模式下，transpiler会在 ``startup_program`` 中附加一个 ``NCCL_ID`` 广播
-算子（broadcasting operators）来实现在该集群中所有工作结点共享``NCCL_ID`` 。 调用 ``transpile_nccl2`` 后， 你 **必须** 将 ``trainer_id`` , ``num_trainers`` 参数提供给 ``Executor`` 来启动NCCL2分布式模式。
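+In NCCL2 mode, ``transpile`` converts a single-machine training ``program`` into a program that can be trained on the NCCL2 distributed architecture. In this mode the transpiler appends an ``NCCL_ID`` broadcasting operator to ``startup_program`` so that all worker nodes in the cluster share the ``NCCL_ID`` . After calling ``transpile_nccl2`` , you **must** pass the ``trainer_id`` and ``num_trainers`` arguments to ``Executor`` to start NCCL2 distributed mode.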


 参数：
@@ -75,15 +76,14 @@ DistributeTranspiler

 通过此方法，可根据用户配置将单机的program转换为当前节点可用的数据并行的分布式program。

-参数:
+参数:    
    - **trainer_id** (int) – 当前Trainer worker的id, 如果有n个Trainer worker, id 取值范围为0 ~ n-1
-    - **program** (Program|None) – 待transpile（转译）的program, 缺省为 ``fluid.default_main_program()``
-    - **startup_program** (Program|None) - 要转译的 ``startup_program`` ,默认为 ``fluid.default_startup_program()``
-    - **pservers** (str) – 内容为Pserver列表的字符串，格式为：按逗号区分不同的Pserver，每个Pserver的格式为 *ip地址:端口号*
+    - **program** (Program|None) – 待transpile（转译）的main program, 默认为 ``fluid.default_main_program()`` 
+    - **pservers** (str) – 内容为Pserver列表的字符串，格式为：按逗号区分不同的Pserver，每个Pserver的格式为 *ip地址:端口号* 
    - **trainers** (int|str) – 在Pserver模式下，该参数指Trainer机的个数；在nccl2模式下，它是一个内容为Trainer终端列表的字符串
    - **sync_mode** (bool) – 是否做同步训练(synchronous training), 默认为True
-    - **startup_program** (Program|None) – 待transpile（转译）的startup_program，默认为 ``fluid.default_main_program()``
-    - **current_endpoint** (str) – 当需要把program转译（transpile）至NCCL2模式下时，需要将当前endpoint（终端）传入该参数。PServer模型下，当用户需要使用增量训练时，必须要指定该参数。
+    - **startup_program** (Program|None) – 待transpile（转译）的startup program，默认为 ``fluid.default_startup_program()``
+    - **current_endpoint** (str) – 当需要把program转译（transpile）至NCCL2模式时，需要将当前endpoint（终端）传入该参数。PServer模型下，当用户需要使用增量训练时，必须要指定该参数。

 返回：None

@@ -104,7 +104,13 @@ DistributeTranspiler
 .. py:method:: get_trainer_program(wait_port=True)


-该方法可以得到Trainer侧的program。
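+This method returns the Trainer-side program. Compared with the original single-machine program, the Trainer-side program differs mainly as follows:
+
+     - The optimizer ops that update parameters are removed; parameter updates are performed by the Pserver (parameter server)
+     - After the backward gradient op of each parameter, ``Send_op`` and ``Recv_op`` are added to send the parameter's gradient and receive the updated parameter
+
+Parameters:
+     - **wait_port** (bool, default True) - Whether to wait until the parameter servers are ready before returning the program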

 返回:    Trainer侧的program

@@ -127,11 +133,15 @@ DistributeTranspiler
 .. py:method:: get_pserver_program(endpoint)


-该方法可以得到Pserver（参数服务器）侧的程序
-
-参数:
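+This method returns the Pserver (parameter server) side program. Compared with the original single-machine program, the Pserver-side program differs mainly as follows:
+
+     - It contains only the optimizer ops that update parameters and the ops related to distributed communication
+     - Block 0 contains only variable definitions and ``listen_and_serv_op``
+     - The Pserver creates a separate block for each parameter that needs to be updated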
+ 
+参数:    
    - **endpoint** (str) – 当前Pserver终端
-
+ 
 返回:    当前Pserver需要执行的program

 返回类型:    Program
@@ -155,16 +165,16 @@ DistributeTranspiler
 .. py:method:: get_pserver_programs(endpoint)


-该方法可以得到Pserver侧用于分布式训练的 ``main_program`` 和 ``startup_program`` 。
+This method returns the Pserver-side ``main_program`` and ``startup_program`` used for distributed training. The ``main_program`` returned here is identical to the return value of ``get_pserver_program`` .

-参数:
+参数:    
    - **endpoint** (str) – 当前Pserver终端

 返回:    (main_program, startup_program), “Program”类型的元组

-返回类型:    tuple
-
-
+返回类型:    tuple 
+ 
+ 
 **代码示例**

 .. code-block:: python
@@ -187,7 +197,7 @@ DistributeTranspiler
 **该函数已停止使用**
 获取当前Pserver的startup_program，如果有多个被分散到不同blocks的变量，则修改operator的输入变量。

-参数:
+参数:    
    - **endpoint** (str) – 当前Pserver终端
    - **pserver_program** (Program) – 已停止使用。 先调用get_pserver_program
    - **startup_program** (Program) – 已停止使用。应在初始化时传入startup_program
@@ -205,7 +215,7 @@ DistributeTranspiler
          current_endpoint = "192.168.0.1:6174"
          trainer_id = 0
          trainers = 4
-
+           
          t = fluid.DistributeTranspiler()
          t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers)
          pserver_program = t.get_pserver_program(current_endpoint)

--- a/doc/paddle/api/paddle/fluid/clip_cn.rst
+++ b/doc/paddle/api/paddle/fluid/clip_cn.rst
-.. _cn_api_fluid_layers_clip:
-
-clip
-------------------------------
-
-.. py:function:: paddle.fluid.layers.clip(x, min, max, name=None)
-
-:alias_main: paddle.nn.clip
-:alias: paddle.nn.clip,paddle.nn.clip.clip
-:old_api: paddle.fluid.layers.clip
-
-
-
-该OP对输入Tensor每个元素的数值进行裁剪，使得输出Tensor元素的数值被限制在区间[min, max]内。具体的计算公式为如下。
-
-.. math::
-
-  Out = MIN(MAX(x,min),max)
-
-
-
-参数：
-        - **x** (Variable)- 多维Tensor，数据类型为float32
-        - **min** (float)- 最小值，输入Tensor中小于该值的元素由min代替。
-        - **max** (float)- 最大值，输入Tensor中大于该值的元素由max替换。
-        - **name** (None|str) – 该参数供开发人员打印调试信息时使用，具体用法请参见 :ref:`api_guide_Name` ，默认值为None。
-
-返回：  对元素的数值进行裁剪之后的Tesnor，与输入x具有相同的shape和数据类型
-
-返回类型：Variable
-
-**代码示例：**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    input = fluid.layers.data(
-        name='data', shape=[1], dtype='float32')
-    reward = fluid.layers.clip(x=input, min=-1.0, max=1.0)
-
-
--- a/doc/paddle/api/paddle/fluid/contrib/BeamSearchDecoder_cn.rst
+++ b/doc/paddle/api/paddle/fluid/contrib/BeamSearchDecoder_cn.rst
@@ -20,7 +20,7 @@ BeamSearchDecoder
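  - **start_token** (int) - The id of the start token.
  - **end_token** (int) - The id of the end token.
  - **beam_size** (int) - The beam width used in beam search.
-  - **embedding_fn** (optional) - The interface used to process the selected candidate ids. Usually it is an embedding layer that converts word ids into word embeddings, and its return value is used as the :code:`input` argument of the :code:`cell.call` interface. If :code:`embedding_fn` is not provided, the word-embedding conversion must be implemented in :code:`cell.call`. Default: None.
+  - **embedding_fn** (optional) - The interface used to process the selected candidate ids. It is usually an embedding layer that converts word ids into word embeddings, and its return value is used as the :code:`input` argument of the :code:`cell.call` interface. **Note**: use :ref:`cn_api_fluid_embedding` here rather than :ref:`cn_api_fluid_layers_embedding`, because the selected ids have shape :math:`[batch\_size, beam\_size]`; with the latter, an unsqueeze must also be supplied here. If :code:`embedding_fn` is not provided, the word-embedding conversion must be implemented in :code:`cell.call`. Default: None.
  - **output_fn** (optional) - The interface used to process the cell output, applied before scores are computed and candidate token ids are selected. Default: None.

 **Example code**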
@@ -123,7 +123,7 @@ BeamSearchDecoder
 Parameters:
  - **initial_cell_states** (Variable) - A single tensor variable or a nested structure of tensor variables, provided by the caller.

-Returns: A tuple :code:`(initial_inputs, initial_states, finished)`. :code:`initial_inputs` is a tensor; when :code:`embedding_fn` is None it is filled with :code:`start_token` and has shape :math:`[batch\_size,beam\_size,1]`; otherwise it is the value returned by :code:`embedding_fn(t)`. :code:`initial_states` is a nested structure of tensor variables (a named tuple with the fields :code:`cell_states, log_probs, finished, lengths`), where :code:`log_probs, finished, lengths` each hold a tensor of shape :math:`[batch\_size, beam\_size]` with data types float32, bool and int64 respectively. :code:`cell_states` has the same structure as the input argument :code:`initial_cell_states`, but its shapes are expanded to :math:`[batch\_size,beam\_size,...]`. :code:`finished` is a bool tensor filled with False, with shape :math:`[batch\_size,beam\_size]`.
+Returns: A tuple :code:`(initial_inputs, initial_states, finished)`. :code:`initial_inputs` is a tensor: when :code:`embedding_fn` is None, it is the tensor t of shape :math:`[batch\_size,beam\_size]` filled with :code:`start_token`; otherwise it is the value returned by :code:`embedding_fn(t)`. :code:`initial_states` is a nested structure of tensor variables (a named tuple with the fields :code:`cell_states, log_probs, finished, lengths`), where :code:`log_probs, finished, lengths` each hold a tensor of shape :math:`[batch\_size, beam\_size]` with data types float32, bool and int64 respectively. :code:`cell_states` has the same structure as the input argument :code:`initial_cell_states`, but its shapes are expanded to :math:`[batch\_size,beam\_size,...]`. :code:`finished` is a bool tensor filled with False, with shape :math:`[batch\_size,beam\_size]`.

 Return type: tuple

@@ -135,7 +135,7 @@ BeamSearchDecoder
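  - **time** (Variable) - A tensor of shape [1] provided by the caller, representing the current decoding time step. Its data type is int64.
  - **logits** (Variable) - A tensor of shape :math:`[batch\_size,beam\_size,vocab\_size]` representing the logits at the current time step. Its data type is float32.
  - **next_cell_states** (Variable) - A single tensor variable or a nested structure of tensor variables. Its structure, shape and data type are the same as those of :code:`cell_states` in the :code:`initial_states` returned by :code:`initialize()`. It represents the next state of the cell.
-  - **beam_state** (Variable) - A structure of tensor variables. At the first decoding step it is the same as the :code:`initial_states` returned by :code:`initialize()`; at other steps it is the same as the :code:`beam_search_state` returned by :code:`initialize()`.
+  - **beam_state** (Variable) - A structure of tensor variables. At the first decoding step it is the same as the :code:`initial_states` returned by :code:`initialize()`; at other steps it is the same as the :code:`beam_search_state` returned by :code:`step()`.
  
 Returns: A tuple :code:`(beam_search_output, beam_search_state)`. :code:`beam_search_output` is a named tuple of tensor variables with the fields :code:`scores, predicted_ids, parent_ids`, each holding a tensor of shape :math:`[batch\_size,beam\_size]` with data types float32, int64 and int64 respectively. :code:`beam_search_state` has the same structure, shape and data type as the input argument :code:`beam_state`.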

@@ -146,9 +146,9 @@ BeamSearchDecoder
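 Performs one beam search decoding step: it uses :code:`cell` to compute probabilities, then runs a beam search step to compute scores and select candidate token ids.
  
 Parameters:
-  - **time** (Variable) - An int64 tensor of shape [1] provided by the caller, representing the current decoding time step.
+  - **time** (Variable) - A tensor of shape [1] provided by the caller, representing the current decoding time step. Its data type is int64.
  - **inputs** (Variable) - A tensor variable. At the first decoding time step it is the same as the :code:`initial_inputs` returned by :code:`initialize()`; at other time steps it is the same as the :code:`next_inputs` returned by :code:`step()`.
-  - **States** (Variable) - A structure of tensor variables. At the first decoding time step it is the same as the :code:`initial_states` returned by :code:`initialize()`; at other time steps it is the same as the :code:`beam_search_state` returned by :code:`step()`.
+  - **states** (Variable) - A structure of tensor variables. At the first decoding time step it is the same as the :code:`initial_states` returned by :code:`initialize()`; at other time steps it is the same as the :code:`beam_search_state` returned by :code:`step()`.
  - **kwargs** - Additional keyword arguments, provided by the caller.
  
 Returns: A tuple :code:`(beam_search_output, beam_search_state, next_inputs, finished)`. :code:`beam_search_state` has the same structure, shape and data type as the argument :code:`states`. :code:`next_inputs` has the same structure, shape and data type as the input argument :code:`inputs`. :code:`beam_search_output` is a named tuple of tensor variables (with the fields :code:`scores, predicted_ids, parent_ids`), each holding a tensor of shape :math:`[batch\_size,beam\_size]` with data types float32, int64 and int64 respectively. :code:`finished` is a bool tensor of shape :math:`[batch\_size,beam\_size]`.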
@@ -167,12 +167,3 @@ BeamSearchDecoder
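 Returns: A tuple :code:`(predicted_ids, final_states)`. :code:`predicted_ids` is a tensor of shape :math:`[time\_step,batch\_size,beam\_size]` with data type int64. :code:`final_states` is the same as the input argument :code:`final_states`.

 Return type: tuple
-
-.. py:method:: output_dtype()
-   
-The nested structure of data types used for the beam search output. It is a named tuple with the fields :code:`scores, predicted_ids, parent_ids`.
-
-Parameters: None.
-
-Returns: A named tuple of the data types used for the beam search output.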
-
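The note on ``embedding_fn`` above is about shapes: ids of shape ``[batch_size, beam_size]`` should map to embeddings of shape ``[batch_size, beam_size, emb_dim]``. The following pure-Python sketch only illustrates that shape relationship with nested lists; ``lookup`` and ``table`` are hypothetical names and do not reflect how Paddle implements its embedding layers.

```python
def lookup(ids, table):
    """Map ids of shape [batch_size, beam_size] to vectors of shape
    [batch_size, beam_size, emb_dim] by indexing the rows of `table`."""
    return [[table[token_id] for token_id in beam] for beam in ids]


# Toy embedding table: 4 tokens, emb_dim = 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
# Selected ids with batch_size = 2 and beam_size = 3.
ids = [[0, 2, 1], [3, 3, 0]]

embedded = lookup(ids, table)
shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
print(shape)  # (2, 3, 2)
```

No trailing dimension of size 1 is needed on the ids, which matches the shape change of ``initial_inputs`` from ``[batch_size, beam_size, 1]`` to ``[batch_size, beam_size]`` in this diff.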
--- a/doc/paddle/api/paddle/fluid/data_cn.rst
+++ b/doc/paddle/api/paddle/fluid/data_cn.rst
@@ -6,10 +6,6 @@ data

 .. py:function:: paddle.fluid.data(name, shape, dtype='float32', lod_level=0)

-:api_attr: 声明式编程模式（静态图)
-:alias_main: paddle.nn.data
-:alias: paddle.nn.data,paddle.nn.input.data
-:old_api: paddle.fluid.data




--- a/doc/paddle/api/paddle/fluid/dygraph/Conv2D_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/Conv2D_cn.rst
@@ -46,7 +46,7 @@ Conv2D

 参数：
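    - **num_channels** (int) - The number of channels of the input image.
-    - **num_fliters** (int) - The number of filters, equal to the number of output feature maps.
+    - **num_filters** (int) - The number of filters, equal to the number of output feature maps.
    - **filter_size** (int|tuple) - The filter size. If ``filter_size`` is a tuple, it must contain two integers giving the filter height and width. Otherwise, the filter height and width are both ``filter_size`` .
    - **stride** (int|tuple, optional) - The stride. If ``stride`` is a tuple, it must contain two integers giving the vertical and horizontal strides. Otherwise, the vertical and horizontal strides are both ``stride`` . Default: 1.
    - **padding** (int|tuple, optional) - The padding size. If ``padding`` is a tuple, it must contain two integers giving the vertical and horizontal border padding. Otherwise, the vertical and horizontal border padding are both ``padding`` . Default: 0.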

--- a/doc/paddle/api/paddle/fluid/dygraph/base/grad_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/base/grad_cn.rst
-.. _cn_api_fluid_dygraph_grad:
-
-grad
-------------------------------
-
-**Note: this API only supports dynamic graph mode**
-
-.. py:method:: paddle.fluid.dygraph.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, no_grad_vars=None, backward_strategy=None)
-
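-For each of the `inputs` , computes the sum of the gradients of all `outputs` with respect to it.
-
-Parameters:
-    - **outputs** (Variable|list(Variable)|tuple(Variable)) – The output variable of the graph used to compute gradients, or a list/tuple of several output variables.
-    - **inputs** (Variable|list(Variable)|tuple(Variable)) - The input variable of the graph used to compute gradients, or a list/tuple of several input variables. Each return value of this API is the gradient of the corresponding `inputs` .
-    - **grad_outputs** (Variable|list(Variable|None)|tuple(Variable|None), optional) - The initial values of the gradients of `outputs` . If `grad_outputs` is None, the initial gradient of every `outputs` is a Tensor filled with 1. If `grad_outputs` is not None, it must have the same length as `outputs` ; in that case, if the i-th element of `grad_outputs` is None, the initial gradient of the i-th `outputs` is a Tensor filled with 1, and if the i-th element is a Variable, the initial gradient of the i-th `outputs` is that element. Default: None.
-    - **retain_graph** (bool, optional) - Whether to retain the forward graph used to compute gradients. If True, the forward graph is retained and the user can run backward twice on the same graph. If False, the forward graph is released. Default: None, which means the value equals `create_graph` .
-    - **create_graph** (bool, optional) - Whether to create the backward graph during the computation. If True, higher-order derivatives can be computed. If False, the backward graph built during the computation is released. Default: False.
-    - **only_inputs** (bool, optional) - Whether to compute gradients only for `inputs` . If False, the gradients of all leaf variables in the graph are computed and accumulated. If True, only the gradients of `inputs` are computed. Default: True. only_inputs=False is under development and not yet supported.
-    - **allow_unused** (bool, optional) - Determines whether to raise an error or return None when some `inputs` are not in the computation graph. If some `inputs` are not in the computation graph (that is, their gradients are None), an error is raised when allow_unused=False, and None is returned as the gradient of these variables when allow_unused=True. Default: False.
-    - **no_grad_vars** (Variable|list(Variable)|tuple(Variable)|set(Variable), optional) - The variables for which no gradient needs to be computed. Default: None.
-    - **backward_strategy** (BackwardStrategy, optional) - The strategy used to compute gradients. See :ref:`cn_api_fluid_dygraph_BackwardStrategy` . Default: None.
-
-Returns: A tuple of variables whose length equals the number of variables in `inputs` ; the i-th returned variable is the sum of the gradients of all `outputs` with respect to the i-th `inputs` .
-
-Return type: tuple
-
-**Example code 1**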
-  .. code-block:: python
-
-        import paddle.fluid as fluid
-
-        def test_dygraph_grad(create_graph):
-            with fluid.dygraph.guard():
-                x = fluid.layers.ones(shape=[1], dtype='float32')
-                x.stop_gradient = False
-                y = x * x
-
-                # Since y = x * x, dx = 2 * x
-                dx = fluid.dygraph.grad(
-                        outputs=[y],
-                        inputs=[x],
-                        create_graph=create_graph,
-                        retain_graph=True)[0]
-
-                z = y + dx
-
-                # If create_graph = False, the gradient of dx
-                # would not be backpropagated. Therefore,
-                # z = x * x + dx, and x.gradient() = 2 * x = 2.0
-
-                # If create_graph = True, the gradient of dx
-                # would be backpropagated. Therefore,
-                # z = x * x + dx = x * x + 2 * x, and
-                # x.gradient() = 2 * x + 2 = 4.0
-
-                z.backward()
-                return x.gradient()
-
-        print(test_dygraph_grad(create_graph=False)) # [2.]
-        print(test_dygraph_grad(create_graph=True)) # [4.]
-
-**Example code 2**
-  .. code-block:: python
-
-        import paddle.fluid as fluid
-
-        fluid.enable_dygraph()
-
-        def test_dygraph_grad(grad_outputs=None):
-            x = fluid.layers.fill_constant(shape=[1], value=2.0, dtype='float32')
-            x.stop_gradient = False
-
-            y1 = x * x
-            y2 = x * 3
-
-            # If grad_outputs=None, dy1 = [1], dy2 = [1].
-            # If grad_outputs=[g1, g2], then:
-            #    - dy1 = [1] if g1 is None else g1
-            #    - dy2 = [1] if g2 is None else g2
-
-            # Since y1 = x * x, dx = 2 * x * dy1.
-            # Since y2 = x * 3, dx = 3 * dy2.
-            # Therefore, the final result would be:
-            # dx = 2 * x * dy1 + 3 * dy2 = 4 * dy1 + 3 * dy2.
-
-            dx = fluid.dygraph.grad(
-                outputs=[y1, y2],
-                inputs=[x],
-                grad_outputs=grad_outputs)[0]
-
-            return dx.numpy()
-
-        THREE = fluid.layers.fill_constant(shape=[1], value=3.0, dtype='float32')
-        FOUR = fluid.layers.fill_constant(shape=[1], value=4.0, dtype='float32')
-
-        # dy1 = [1], dy2 = [1]
-        print(test_dygraph_grad(None)) # [7.]
-
-        # dy1 = [1], dy2 = [4]
-        print(test_dygraph_grad([None, FOUR])) # [16.]
-
-        # dy1 = [4], dy2 = [1]
-        print(test_dygraph_grad([FOUR, None])) # [19.]
-
-        # dy1 = [3], dy2 = [4]
-        print(test_dygraph_grad([THREE, FOUR])) # [24.]
\ No newline at end of file
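The printed values in the second removed ``grad`` example follow from the formula stated in its comments, ``dx = 2 * x * dy1 + 3 * dy2`` with ``x = 2``. The arithmetic can be reproduced without Paddle; ``expected_dx`` is a hypothetical helper written only for this check and performs no automatic differentiation.

```python
def expected_dx(x, dy1=None, dy2=None):
    """dx for y1 = x * x and y2 = x * 3, given upstream gradients dy1 and dy2.

    A None gradient stands for the default initial value of 1.
    """
    dy1 = 1.0 if dy1 is None else dy1
    dy2 = 1.0 if dy2 is None else dy2
    return 2.0 * x * dy1 + 3.0 * dy2


results = [
    expected_dx(2.0),              # dy1 = 1, dy2 = 1
    expected_dx(2.0, None, 4.0),   # dy1 = 1, dy2 = 4
    expected_dx(2.0, 4.0, None),   # dy1 = 4, dy2 = 1
    expected_dx(2.0, 3.0, 4.0),    # dy1 = 3, dy2 = 4
]
print(results)  # [7.0, 16.0, 19.0, 24.0]
```

These match the ``[7.]``, ``[16.]``, ``[19.]`` and ``[24.]`` comments in the removed example.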
--- a/doc/paddle/api/paddle/fluid/dygraph/base/no_grad_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/base/no_grad_cn.rst
@@ -4,48 +4,49 @@ no_grad
 -------------------------------


-.. py:method:: paddle.fluid.dygraph.no_grad(func=None)
+.. py:class:: paddle.fluid.dygraph.no_grad

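 :api_attr: imperative programming mode (dynamic graph)
-
+:old_api: paddle.fluid.dygraph.no_grad


 Creates a context that disables gradient computation in dynamic graph mode. In this mode, the result of every computation has stop_gradient=True.

-It can also be used as a decorator (make sure not to initialize it with parentheses).
+It can also be used as a decorator (an instance must be created to serve as the decorator).

 **Code example**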

 ..  code-block:: python

-
    import numpy as np
    import paddle.fluid as fluid

+    # Only `fluid` is imported above, so dynamic graph mode is enabled through it.
+    fluid.enable_dygraph()
+
     # Use as a context manager
+
    data = np.array([[2, 3], [4, 5]]).astype('float32')
-    with fluid.dygraph.guard():
-        l0 = fluid.Linear(2, 2)  # l0.weight.gradient() is None
-        l1 = fluid.Linear(2, 2)
-        with fluid.dygraph.no_grad():
-            # l1.weight.stop_gradient is False
-            tmp = l1.weight * 2  # tmp.stop_gradient is True
-        x = fluid.dygraph.to_variable(data)
-        y = l0(x) + tmp
-        o = l1(y)
-        o.backward()
-        print(tmp.gradient() is None)  # True
-        print(l0.weight.gradient() is None)  # False
-    
+    l0 = fluid.Linear(2, 2)  # l0.weight.gradient() is None
+    l1 = fluid.Linear(2, 2)
+    with fluid.no_grad():
+        # l1.weight.stop_gradient is False
+        tmp = l1.weight * 2  # tmp.stop_gradient is True
+    x = fluid.dygraph.to_variable(data)
+    y = l0(x) + tmp
+    o = l1(y)
+    o.backward()
+    print(tmp.gradient() is None)  # True
+    print(l0.weight.gradient() is None)  # False
+
     # Use as a decorator
-    @fluid.dygraph.no_grad
+
+    @fluid.no_grad()
    def test_layer():
-        with fluid.dygraph.guard():
-            inp = np.ones([3, 1024], dtype='float32')
-            t = fluid.dygraph.base.to_variable(inp)
-            linear1 = fluid.Linear(1024, 4, bias_attr=False)
-            linear2 = fluid.Linear(4, 4)
-            ret = linear1(t)
-            dy_ret = linear2(ret)
+        inp = np.ones([3, 1024], dtype='float32')
+        t = fluid.dygraph.base.to_variable(inp)
+        linear1 = fluid.Linear(1024, 4, bias_attr=False)
+        linear2 = fluid.Linear(4, 4)
+        ret = linear1(t)
+        dy_ret = linear2(ret)

    test_layer()
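The change from ``@fluid.dygraph.no_grad`` to ``@fluid.no_grad()`` reflects a common Python pattern: a class that works both as a context manager and, once instantiated, as a decorator. The sketch below shows that general pattern with the standard library only. ``GradSwitch`` is a hypothetical stand-in, not Paddle's implementation, and the class-level flag merely imitates turning gradient tracking off.

```python
from contextlib import ContextDecorator


class GradSwitch(ContextDecorator):
    """Hypothetical stand-in for a no_grad-style class."""

    enabled = True  # class-level flag: is gradient tracking on?

    def __enter__(self):
        self._previous = GradSwitch.enabled
        GradSwitch.enabled = False
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        GradSwitch.enabled = self._previous
        return False  # do not swallow exceptions


# As a context manager.
with GradSwitch():
    inside_with = GradSwitch.enabled


# As a decorator: note the parentheses, the instance is the decorator.
@GradSwitch()
def tracked():
    return GradSwitch.enabled


inside_call = tracked()
print(inside_with, inside_call, GradSwitch.enabled)  # False False True
```

With a class like this, writing the decorator without parentheses would pass the function to the constructor instead of wrapping it, which is why the documentation now asks for an instance.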
--- a/doc/paddle/api/paddle/fluid/dygraph/checkpoint/load_dygraph_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/checkpoint/load_dygraph_cn.rst
 .. _cn_api_fluid_dygraph_load_dygraph:

-load
----
+load_dygraph
+-------------------------------


-.. py:function:: paddle.load(model_path, configs=None)
+.. py:function:: paddle.fluid.dygraph.load_dygraph(model_path)

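 :api_attr: imperative programming mode (dynamic graph)

-This interface loads the ``state_dict`` of a Layer and an Optimizer from disk. It loads the contents of both ``model_path + ".pdparams"`` and ``model_path + ".pdopt"`` .

-.. note::
-    For historical reasons, if the ``state_dict`` is loaded from a result saved by ``paddle.io.save_inference_model`` , the structured variable names of the parameters cannot be restored in dynamic graph mode. In addition, when the loaded ``state_dict`` is set on the current Layer, the argument ``use_structured_name=False`` of ``Layer.set_state_dict`` must be passed.
+
+This interface tries to load the ``dict`` of parameters or of an optimizer from disk.
+
+It loads the contents of both ``model_path + ".pdparams"`` and ``model_path + ".pdopt"`` .

 Parameters:
-    - **model_path** (str) – The file prefix under which the state_dict was saved. The path should not include the suffix ``.pdparams`` or ``.pdopt``.
-    - **configs** (SaveLoadConfig, optional) - A :ref:`cn_api_fluid_dygraph_jit_SaveLoadConfig` object that specifies additional configuration options, mainly used for compatibility with the model format saved by ``paddle.io.save_inference_model`` . Default: ``None``.
+    - **model_path**  (str) – The file prefix under which the state_dict was saved. The path should not include the suffix ``.pdparams`` or ``.pdopt``.


-Returns: Two ``dict`` objects, namely the model parameter ``dict`` and the optimizer parameter ``dict`` restored from file. If the saved file of only one of them is found, None is returned for the other.
+Returns: Two ``dict`` objects, namely the parameter ``dict`` and the optimizer ``dict`` restored from file.

- param_dict: the model parameter ``dict`` restored from file
- opt_dict: the optimizer parameter ``dict`` restored from file
+- para_dict: the parameter ``dict`` restored from file
+- opti_dict: the optimizer ``dict`` restored from file

 Return type: tuple(dict, dict)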
  
@@ -29,24 +29,18 @@ load

 .. code-block:: python

-    import paddle
-            
-    paddle.disable_static()
-
-    emb = paddle.nn.Embedding([10, 10])
-
-    state_dict = emb.state_dict()
-    paddle.save(state_dict, "paddle_dy")
+    import paddle.fluid as fluid

-    scheduler = paddle.optimizer.lr_scheduler.NoamLR(
-        d_model=0.01, warmup_steps=100, verbose=True)
-    adam = paddle.optimizer.Adam(
-        learning_rate=scheduler,
-        parameters=emb.parameters())
-    state_dict = adam.state_dict()
-    paddle.save(state_dict, "paddle_dy")
+    with fluid.dygraph.guard():
+        emb = fluid.dygraph.Embedding([10, 10])
+        state_dict = emb.state_dict()
+        fluid.save_dygraph(state_dict, "paddle_dy")
+        adam = fluid.optimizer.Adam(
+            learning_rate=fluid.layers.noam_decay(100, 10000),
+            parameter_list=emb.parameters())
+        state_dict = adam.state_dict()
+        fluid.save_dygraph(state_dict, "paddle_dy")

-    para_state_dict, opti_state_dict = paddle.load("paddle_dy")
+        para_state_dict, opti_state_dict = fluid.load_dygraph("paddle_dy")



--- a/doc/paddle/api/paddle/fluid/dygraph/enabled_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/enabled_cn.rst
+.. _cn_api_fluid_dygraph_enabled:
+
+enabled
+-------------------------------
+
+.. py:method:: paddle.fluid.dygraph.enabled()
+
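+This function checks whether the program is running in dynamic graph mode. You can enter dynamic graph mode with the :ref:`cn_api_fluid_dygraph_guard` API, or turn dynamic graph mode on and off with the :ref:`cn_api_fluid_enable_dygraph` and :ref:`cn_api_fluid_disable_dygraph` APIs.
+
+Note: `fluid.dygraph.enabled` actually calls the :ref:`cn_api_fluid_in_dygraph_mode` API, so using :ref:`cn_api_fluid_in_dygraph_mode` directly is recommended.
+
+Returns: Whether the program is running in dynamic graph mode.
+
+Return type: bool
+
+**Example code**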
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            fluid.enable_dygraph()  # Now we are in dygraph mode
+            print(fluid.dygraph.enabled())  # True
+            fluid.disable_dygraph()
+            print(fluid.dygraph.enabled())  # False
--- a/doc/paddle/api/paddle/fluid/dygraph/guard_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/guard_cn.rst
-.. _cn_api_fluid_dygraph_guard:
+.. _cn_api_fluid_unique_name_guard:

 guard
 -------------------------------

+.. py:function:: paddle.fluid.unique_name.guard(new_generator=None)

-.. py:function:: paddle.fluid.dygraph.guard(place=None)

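-:api_attr: imperative programming mode (dynamic graph)


+This interface changes the namespace and is used together with the with statement. Inside the with block a new namespace is used, and names with the same prefix produced by the generate interface are numbered again starting from 0.

-Creates a context for dygraph execution through the with statement and runs the code inside that context.
+Parameters:
+  - **new_generator** (str|bytes, optional) - The name of the new namespace. Note that str in Python 2 is split into str and bytes in Python 3, hence the two types. Default: None. If it is not None, new_generator is added as a prefix to the unique names produced by the generate interface.

-Parameters:
-    - **place** (fluid.CPUPlace|fluid.CUDAPlace, optional) – The device on which the dynamic graph runs, either CPU or GPU. If the user does not specify it, the device is chosen according to how Paddle was compiled: a CPU build runs on the CPU and a GPU build runs on the GPU. Default: None.
-
-Returns: None
+Returns: None.

 **Code example**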

 .. code-block:: python

-    import numpy as np
-    import paddle.fluid as fluid
-
-    with fluid.dygraph.guard():
-        inp = np.ones([3, 1024], dtype='float32')
-        t = fluid.dygraph.base.to_variable(inp)
-        linear1 = fluid.Linear(1024, 4, bias_attr=False)
-        linear2 = fluid.Linear(4, 4)
-        ret = linear1(t)
-        dy_ret = linear2(ret)
+        import paddle.fluid as fluid
+        with fluid.unique_name.guard():
+            name_1 = fluid.unique_name.generate('fc')
+        with fluid.unique_name.guard():
+            name_2 = fluid.unique_name.generate('fc')
+        print(name_1, name_2)  # fc_0, fc_0
+         
+        with fluid.unique_name.guard('A'):
+            name_1 = fluid.unique_name.generate('fc')
+        with fluid.unique_name.guard('B'):
+            name_2 = fluid.unique_name.generate('fc')
+        print(name_1, name_2)  # Afc_0, Bfc_0
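The behaviour described for ``unique_name.guard`` (a fresh namespace inside each with block, counters restarting at 0, and an optional prefix) can be imitated with a small standard-library sketch. Everything here, including ``NameGenerator``, ``generate`` and ``guard``, is a hypothetical re-creation for illustration and is not Paddle's code.

```python
from collections import defaultdict
from contextlib import contextmanager


class NameGenerator:
    """Hands out names such as 'fc_0', 'fc_1', one counter per key."""

    def __init__(self, prefix=""):
        self.prefix = prefix
        self.counters = defaultdict(int)

    def generate(self, key):
        index = self.counters[key]
        self.counters[key] += 1
        return "{}{}_{}".format(self.prefix, key, index)


# Holder for the generator that is currently active.
_state = {"current": NameGenerator()}


def generate(key):
    return _state["current"].generate(key)


@contextmanager
def guard(new_generator=None):
    """Swap in a fresh generator for the with block, then restore the old one."""
    previous = _state["current"]
    _state["current"] = NameGenerator(new_generator or "")
    try:
        yield
    finally:
        _state["current"] = previous


with guard():
    name_1 = generate("fc")
with guard():
    name_2 = generate("fc")
print(name_1, name_2)  # fc_0 fc_0

with guard("A"):
    name_a = generate("fc")
with guard("B"):
    name_b = generate("fc")
print(name_a, name_b)  # Afc_0 Bfc_0
```

Restoring the previous generator in ``finally`` keeps the outer counters intact even if the block raises.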


--- a/doc/paddle/api/paddle/fluid/dygraph/io/TranslatedLayer_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/io/TranslatedLayer_cn.rst
@@ -15,180 +15,70 @@ TranslatedLayer
    .. code-block:: python

        import numpy as np
-        import paddle
-        import paddle.nn as nn
-        import paddle.optimizer as opt
-
-        BATCH_SIZE = 16
-        BATCH_NUM = 4
-        EPOCH_NUM = 4
-
-        IMAGE_SIZE = 784
-        CLASS_NUM = 10
-
-        # define a random dataset
-        class RandomDataset(paddle.io.Dataset):
-            def __init__(self, num_samples):
-                self.num_samples = num_samples
-
-            def __getitem__(self, idx):
-                image = np.random.random([IMAGE_SIZE]).astype('float32')
-                label = np.random.randint(0, CLASS_NUM - 1, (1, )).astype('int64')
+        import paddle.fluid as fluid
+        from paddle.fluid.dygraph import Linear
+        from paddle.fluid.dygraph import declarative
+        BATCH_SIZE = 32
+        BATCH_NUM = 20
+        def random_batch_reader():
+            def _get_random_images_and_labels(image_shape, label_shape):
+                image = np.random.random(size=image_shape).astype('float32')
+                label = np.random.random(size=label_shape).astype('int64')
                return image, label
-
-            def __len__(self):
-                return self.num_samples
-
-        class LinearNet(nn.Layer):
-            def __init__(self):
+            def __reader__():
+                for _ in range(BATCH_NUM):
+                    batch_image, batch_label = _get_random_images_and_labels(
+                        [BATCH_SIZE, 784], [BATCH_SIZE, 1])
+                    yield batch_image, batch_label
+            return __reader__
+        class LinearNet(fluid.dygraph.Layer):
+            def __init__(self, in_size, out_size):
                super(LinearNet, self).__init__()
-                self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)
-
-            @paddle.jit.to_static
+                self._linear = Linear(in_size, out_size)
+            @declarative
            def forward(self, x):
                return self._linear(x)
-
-        def train(layer, loader, loss_fn, opt):
-            for epoch_id in range(EPOCH_NUM):
-                for batch_id, (image, label) in enumerate(loader()):
-                    out = layer(image)
-                    loss = loss_fn(out, label)
-                    loss.backward()
-                    opt.step()
-                    opt.clear_grad()
-                    print("Epoch {} batch {}: loss = {}".format(
-                        epoch_id, batch_id, np.mean(loss.numpy())))
-
-        # enable dygraph mode
-        place = paddle.CPUPlace()
-        paddle.disable_static(place) 
-
-        # 1. train & save model.
-
-        # create network
-        layer = LinearNet()
-        loss_fn = nn.CrossEntropyLoss()
-        adam = opt.Adam(learning_rate=0.001, parameters=layer.parameters())
-
-        # create data loader
-        dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
-        loader = paddle.io.DataLoader(dataset,
-            places=place,
-            batch_size=BATCH_SIZE,
-            shuffle=True,
-            drop_last=True,
-            num_workers=2)
-
-        # train
-        train(layer, loader, loss_fn, adam)
-
-        # save
+        # Enable imperative (dynamic graph) mode
+        fluid.enable_dygraph()
+        # 1. Train and save the model.
+        # Create the network
+        net = LinearNet(784, 1)
+        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.1, parameter_list=net.parameters())
+        # Create the DataLoader
+        train_loader = fluid.io.DataLoader.from_generator(capacity=5)
+        train_loader.set_batch_generator(random_batch_reader())
+        # Train
+        for data in train_loader():
+            img, label = data
+            label.stop_gradient = True
+            cost = net(img)
+            loss = fluid.layers.cross_entropy(cost, label)
+            avg_loss = fluid.layers.mean(loss)
+            avg_loss.backward()
+            adam.minimize(avg_loss)
+            net.clear_gradients()
        model_path = "linear.example.model"
-        paddle.jit.save(layer, model_path)
-
-        # 2. load model as TranslatedLayer
-
-        # load
-        translated_layer = paddle.jit.load(model_path)
-
-        # inference
+        fluid.dygraph.jit.save(
+            layer=net,
+            model_path=model_path,
+            input_spec=[img])
+        # 2. Load the model to build a TranslatedLayer
+        translated_layer = fluid.dygraph.jit.load(model_path)
+        # Inference
        translated_layer.eval()
-        x = paddle.randn([1, IMAGE_SIZE], 'float32')
+        x = fluid.dygraph.to_variable(np.random.random((1, 784)).astype('float32'))
        pred = translated_layer(x)
-
-        # fine-tune
+        # Fine-tune training
        translated_layer.train()
-        adam = opt.Adam(learning_rate=0.001, parameters=translated_layer.parameters())
-        train(translated_layer, loader, loss_fn, adam)
-
-
-.. py:method:: program(method_name='forward'):
-
-获取TranslatedLayer中指定方法对应的Program。
-
-参数：
-    - **method_name** (string) - 要获取的Porgram对应的方法名。默认值为"forward"。
-
-返回：Program
-
-返回类型：Program
-
-**示例代码：**
-    .. code-block:: python
-
-        import numpy as np
-        import paddle
-        import paddle.nn as nn
-        import paddle.optimizer as opt
-
-        BATCH_SIZE = 16
-        BATCH_NUM = 4
-        EPOCH_NUM = 4
-
-        IMAGE_SIZE = 784
-        CLASS_NUM = 10
-
-        # define a random dataset
-        class RandomDataset(paddle.io.Dataset):
-            def __init__(self, num_samples):
-                self.num_samples = num_samples
-
-            def __getitem__(self, idx):
-                image = np.random.random([IMAGE_SIZE]).astype('float32')
-                label = np.random.randint(0, CLASS_NUM - 1, (1, )).astype('int64')
-                return image, label
-
-            def __len__(self):
-                return self.num_samples
-
-        class LinearNet(nn.Layer):
-            def __init__(self):
-                super(LinearNet, self).__init__()
-                self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)
-
-            @paddle.jit.to_static
-            def forward(self, x):
-                return self._linear(x)
-
-        def train(layer, loader, loss_fn, opt):
-            for epoch_id in range(EPOCH_NUM):
-                for batch_id, (image, label) in enumerate(loader()):
-                    out = layer(image)
-                    loss = loss_fn(out, label)
-                    loss.backward()
-                    opt.step()
-                    opt.clear_grad()
-                    print("Epoch {} batch {}: loss = {}".format(
-                        epoch_id, batch_id, np.mean(loss.numpy())))
-
-        # enable dygraph mode
-        place = paddle.CPUPlace()
-        paddle.disable_static(place) 
-
-        # create network
-        layer = LinearNet()
-        loss_fn = nn.CrossEntropyLoss()
-        adam = opt.Adam(learning_rate=0.001, parameters=layer.parameters())
-
-        # create data loader
-        dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
-        loader = paddle.io.DataLoader(dataset,
-            places=place,
-            batch_size=BATCH_SIZE,
-            shuffle=True,
-            drop_last=True,
-            num_workers=2)
-
-        # train
-        train(layer, loader, loss_fn, adam)
-
-        # save
-        model_path = "linear.example.model"
-        paddle.jit.save(layer, model_path)
-
-        # load
-        translated_layer = paddle.jit.load(model_path)
-
-        # get program
-        program = translated_layer.program()
-        print(program)
+        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.1, parameter_list=translated_layer.parameters())
+        train_loader = fluid.io.DataLoader.from_generator(capacity=5)
+        train_loader.set_batch_generator(random_batch_reader())
+        for data in train_loader():
+            img, label = data
+            label.stop_gradient = True
+            cost = translated_layer(img)
+            loss = fluid.layers.cross_entropy(cost, label)
+            avg_loss = fluid.layers.mean(loss)
+            avg_loss.backward()
+            adam.minimize(avg_loss)
+            translated_layer.clear_gradients()
--- a/doc/paddle/api/paddle/fluid/dygraph/jit/declarative_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/jit/declarative_cn.rst
@@ -5,8 +5,7 @@ declarative

 .. py:decorator:: paddle.fluid.dygraph.jit.declarative

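-This decorator converts the dynamic graph APIs inside a function into static graph APIs. It automatically handles the
-Program and Executor of static graph mode and returns the result as a dynamic graph VarBase.
+This decorator converts the dynamic graph APIs inside a function into static graph APIs. It automatically handles the Program and Executor of static graph mode and returns the result as a dynamic graph Tensor. The returned dynamic graph Tensor can be used for further dynamic graph training, inference or other computation. If the decorated function calls other dynamic graph functions, the called functions are converted into static graph functions as well.

 **Example code**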

@@ -16,6 +15,8 @@ Program和Executor，并将结果作为动态图VarBase返回。
    import numpy as np
    from paddle.fluid.dygraph.jit import declarative

+    fluid.enable_dygraph()
+
    @declarative
    def func(x):
        x = fluid.dygraph.to_variable(x)

--- a/doc/paddle/api/paddle/fluid/dygraph/layers/Layer_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/layers/Layer_cn.rst
@@ -391,14 +391,13 @@ buffer是一个非参数类型的变量，不会被优化器更新，但在评
        state_dict = emb.state_dict()
        fluid.save_dygraph(state_dict, "paddle_dy")

-.. py:method:: set_state_dict(state_dict, include_sublayers=True, use_structured_name=True)
+.. py:method:: set_dict(stat_dict, include_sublayers=True)

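-Sets parameters and persistable buffers from the given ``state_dict`` . All parameters and buffers are set from the ``Tensor`` objects in ``state_dict`` .
+Sets parameters and persistable buffers from the given ``stat_dict`` . All parameters and buffers are set from the ``Tensor`` objects in ``stat_dict`` .

 Parameters:
    - **stat_dict** (dict) - A dict containing all parameters and persistable buffers.
    - **include_sublayers** (bool, optional) - If True, the parameters and buffers of sublayers are included as well. Default: True.
-    - **use_structured_name** (bool, optional) - If True, the structured variable names of the Layer are used as the dict keys; otherwise the variable names of the Parameters or Buffers are used as keys. Default: True.

 Returns: None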

@@ -406,16 +405,36 @@ buffer是一个非参数类型的变量，不会被优化器更新，但在评

 .. code-block:: python

-    import paddle
-                
-    paddle.disable_static()
-    
-    emb = paddle.nn.Embedding([10, 10])
+    import paddle.fluid as fluid
+    with fluid.dygraph.guard():
+        emb = fluid.dygraph.Embedding([10, 10])
+        state_dict = emb.state_dict()
+        fluid.save_dygraph(state_dict, "paddle_dy")
+        para_state_dict, _ = fluid.load_dygraph("paddle_dy")
+        emb.set_dict(para_state_dict)
+
+.. py:method:: load_dict(stat_dict, include_sublayers=True)
+
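+.. warning::
+    This function will be deprecated. Use set_dict instead.
+
+Sets parameters and persistable buffers from the given ``stat_dict`` . All parameters and buffers are set from the ``Tensor`` objects in ``stat_dict`` .
+
+Parameters:
+    - **stat_dict** (dict) - A dict containing all parameters and persistable buffers.
+    - **include_sublayers** (bool, optional) - If True, the parameters and buffers of sublayers are included as well. Default: True.
+
+Returns: None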

-    state_dict = emb.state_dict()
-    paddle.save(state_dict, "paddle_dy")
-    
-    para_state_dict, _ = paddle.load("paddle_dy")
+**Code example**
+
+.. code-block:: python

-    emb.set_state_dict(para_state_dict)
+    import paddle.fluid as fluid
+    with fluid.dygraph.guard():
+        emb = fluid.dygraph.Embedding([10, 10])
+        state_dict = emb.state_dict()
+        fluid.save_dygraph(state_dict, "paddle_dy")
+        para_state_dict, _ = fluid.load_dygraph("paddle_dy")
+        emb.load_dict(para_state_dict)

--- a/doc/paddle/api/paddle/fluid/dygraph/parallel/DataParallel_cn.rst
+++ b/doc/paddle/api/paddle/fluid/dygraph/parallel/DataParallel_cn.rst
+.. _cn_api_fluid_dygraph_DataParallel:
+
+DataParallel
+------------
+
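+.. py:class:: paddle.fluid.dygraph.DataParallel(layers, strategy=None)
+
+:api_attr: imperative programming mode (dynamic graph)
+
+Runs a dynamic graph model in data-parallel mode.
+
+Currently, ``DataParallel`` only supports running dynamic graph models with multiple processes.
+
+Two ways of launching are supported:
+
+1. Launch with ``paddle.distributed.spawn`` , for example:
+
+ ``python demo.py`` ( ``spawn`` needs to be called inside the ``__main__`` guard)
+
+2. Launch with ``paddle.distributed.launch`` , for example:
+
+``python -m paddle.distributed.launch --selected_gpus=0,1 demo.py``
+
+The code of the ``demo.py`` script can be the example code below.
+
+Parameters:
+    - **layers** (Layer) - The model to be run in data-parallel mode.
+    - **strategy** (ParallelStrategy, optional) - (deprecated) The data-parallel strategy, including the environment configuration for parallel execution. Default: None.
+
+Returns: A ``Layer`` that supports data parallelism
+
+Return type: Layer instance
+
+**Code example**: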
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn as nn
+    import paddle.optimizer as opt
+    import paddle.distributed as dist
+
+    class LinearNet(nn.Layer):
+        def __init__(self):
+            super(LinearNet, self).__init__()
+            self._linear1 = nn.Linear(10, 10)
+            self._linear2 = nn.Linear(10, 1)
+            
+        def forward(self, x):
+            return self._linear2(self._linear1(x))
+
+    def train():
+        # 1. enable dynamic mode
+        paddle.disable_static()
+        
+        # 2. initialize parallel environment
+        dist.init_parallel_env()
+
+        # 3. create data parallel layer & optimizer
+        layer = LinearNet()
+        dp_layer = paddle.DataParallel(layer)
+
+        loss_fn = nn.MSELoss()
+        adam = opt.Adam(
+            learning_rate=0.001, parameters=dp_layer.parameters())
+
+        # 4. run layer
+        inputs = paddle.randn([10, 10], 'float32')
+        outputs = dp_layer(inputs)
+        labels = paddle.randn([10, 1], 'float32')
+        loss = loss_fn(outputs, labels)
+        
+        loss = dp_layer.scale_loss(loss)
+        loss.backward()
+        dp_layer.apply_collective_grads()
+
+        adam.step()
+        adam.clear_grad()
+
+    if __name__ == '__main__':
+        # 1. start by ``paddle.distributed.spawn`` (default)
+        dist.spawn(train, nprocs=2)
+        # 2. start by ``paddle.distributed.launch``
+        # train()
+
+.. py:method:: scale_loss(loss)
+
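+Scales the model loss ``loss`` . In data-parallel mode, the loss ``loss`` needs to be scaled according to the number of parallel training processes.
+
+When not in data-parallel mode, the original ``loss`` is returned directly.
+
+Parameters:
+    - **loss** (Variable) - The loss of the current model.
+
+Returns: The scaled loss ``loss``
+
+Return type: Variable
+
+**Code example**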
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn as nn
+    import paddle.optimizer as opt
+    import paddle.distributed as dist
+
+    class LinearNet(nn.Layer):
+        def __init__(self):
+            super(LinearNet, self).__init__()
+            self._linear1 = nn.Linear(10, 10)
+            self._linear2 = nn.Linear(10, 1)
+            
+        def forward(self, x):
+            return self._linear2(self._linear1(x))
+
+    def train():
+        # 1. enable dynamic mode
+        paddle.disable_static()
+        
+        # 2. initialize parallel environment
+        dist.init_parallel_env()
+
+        # 3. create data parallel layer & optimizer
+        layer = LinearNet()
+        dp_layer = paddle.DataParallel(layer)
+
+        loss_fn = nn.MSELoss()
+        adam = opt.Adam(
+            learning_rate=0.001, parameters=dp_layer.parameters())
+
+        # 4. run layer
+        inputs = paddle.randn([10, 10], 'float32')
+        outputs = dp_layer(inputs)
+        labels = paddle.randn([10, 1], 'float32')
+        loss = loss_fn(outputs, labels)
+        
+        loss = dp_layer.scale_loss(loss)
+        loss.backward()
+        dp_layer.apply_collective_grads()
+
+        adam.step()
+        adam.clear_grad()
+
+    if __name__ == '__main__':
+        # 1. start by ``paddle.distributed.spawn`` (default)
+        dist.spawn(train, nprocs=2)
+        # 2. start by ``paddle.distributed.launch``
+        # train()
+
+
+.. py:method:: apply_collective_grads()
+
+Performs an AllReduce on the gradients of the parameters.
+
+Returns: None
+
+**Code example**
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn as nn
+    import paddle.optimizer as opt
+    import paddle.distributed as dist
+
+    class LinearNet(nn.Layer):
+        def __init__(self):
+            super(LinearNet, self).__init__()
+            self._linear1 = nn.Linear(10, 10)
+            self._linear2 = nn.Linear(10, 1)
+            
+        def forward(self, x):
+            return self._linear2(self._linear1(x))
+
+    def train():
+        # 1. enable dynamic mode
+        paddle.disable_static()
+        
+        # 2. initialize parallel environment
+        dist.init_parallel_env()
+
+        # 3. create data parallel layer & optimizer
+        layer = LinearNet()
+        dp_layer = paddle.DataParallel(layer)
+
+        loss_fn = nn.MSELoss()
+        adam = opt.Adam(
+            learning_rate=0.001, parameters=dp_layer.parameters())
+
+        # 4. run layer
+        inputs = paddle.randn([10, 10], 'float32')
+        outputs = dp_layer(inputs)
+        labels = paddle.randn([10, 1], 'float32')
+        loss = loss_fn(outputs, labels)
+        
+        loss = dp_layer.scale_loss(loss)
+        loss.backward()
+        dp_layer.apply_collective_grads()
+
+        adam.step()
+        adam.clear_grad()
+
+    if __name__ == '__main__':
+        # 1. start by ``paddle.distributed.spawn`` (default)
+        dist.spawn(train, nprocs=2)
+        # 2. start by ``paddle.distributed.launch``
+        # train()
--- a/doc/paddle/api/paddle/fluid/executor/global_scope_cn.rst
+++ b/doc/paddle/api/paddle/fluid/executor/global_scope_cn.rst
-.. _cn_api_fluid_global_scope:
+.. _cn_api_fluid_executor_global_scope:

 global_scope
 -------------------------------
@@ -25,4 +25,4 @@ global_scope

        fluid.global_scope().var("data").get_tensor().set(numpy.ones((1, 2)), fluid.CPUPlace())
        data = numpy.array(fluid.global_scope().find_var("data").get_tensor())
-        print(data)  # [[1. 1.]]
\ No newline at end of file
+        print(data)  # [[1. 1.]]
--- a/doc/paddle/api/paddle/fluid/executor/scope_guard_cn.rst
+++ b/doc/paddle/api/paddle/fluid/executor/scope_guard_cn.rst
-.. _cn_api_fluid_scope_guard:
+.. _cn_api_fluid_executor_scope_guard:

 scope_guard
 -------------------------------


-.. py:function:: paddle.fluid.scope_guard(scope)
+.. py:function:: paddle.fluid.executor.scope_guard(scope)

 :api_attr: 声明式编程模式（静态图)


--- a/doc/paddle/api/paddle/fluid/framework/Program_cn.rst
+++ b/doc/paddle/api/paddle/fluid/framework/Program_cn.rst
-.. _cn_api_fluid_Program:
-
-Program
-------------------------------
-
-.. py:class::  paddle.fluid.Program
-
-
-
-
-**注意：默认情况下，Paddle Fluid内部默认含有** :ref:`cn_api_fluid_default_startup_program` **和** :ref:`cn_api_fluid_default_main_program` **，它们共享参数。** :ref:`cn_api_fluid_default_startup_program` **只运行一次来初始化参数，** :ref:`cn_api_fluid_default_main_program` **在每个mini batch中运行并更新权重。**
-
-Program是Paddle Fluid对于计算图的一种静态描述，使用Program的构造函数可以创建一个Program。Program中包括至少一个 :ref:`api_guide_Block` ，当 :ref:`api_guide_Block` 中存在条件选择的控制流OP（例如 :ref:`cn_api_fluid_layers_While` 等）时，该Program将会含有嵌套着的 :ref:`api_guide_Block` 即控制流外部的 :ref:`api_guide_Block` 将包含着控制流内部的 :ref:`api_guide_Block` ，而嵌套的 :ref:`api_guide_Block` 的元素访问控制将由具体的控制流OP来决定。关于Program具体的结构和包含的类型请参阅 `framework.proto <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/framework.proto>`_
-。
-
-一个Program的集合通常包含初始化程序（startup_program）与主程序(main_program)，初始化程序是一个包含一些初始化工作的Program，主程序将会包含用来训练的网络结构和变量，在使用同一个 :ref:`api_guide_executor` 执行时他们会共享初始化工作的结果，例如初始化的参数。一个Program的集合可以被用来测试或者训练，被用来训练时， ``Paddle Fluid`` 将会利用所有用户使用的OP和变量来搭建一个训练网络，被用来测试时， 可以通过调用Program相关的接口例如：`clone` 剪去一些与测试无关的OP和变量，比如反向传播的OP和变量。
-
-
-返回：创建的空的Program
-
-返回值类型：Program
-
-**代码示例**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-
-    main_program = fluid.Program()
-    startup_program = fluid.Program()
-    with fluid.program_guard(main_program=main_program, startup_program=startup_program):
-        x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
-        y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
-        z = fluid.layers.fc(name="fc", input=x, size=10, act="relu")
-
-    # start_up program here will share fc's weight with main program
-    print("main program is: {}".format(main_program))
-
-    print("start up program is: {}".format(startup_program))
-
-
-.. py:method:: to_string(throw_on_error, with_details=False)
-
-将Program转换为字符串
-
-参数：
- - **throw_on_error** (bool) - 是否在没有设置必需字段时抛出异常。
- - **with_details** (bool) - 值为true时，打印更多关于变量和参数的信息，如trainable, optimize_attr等
-
-返回： 将Program转换为字符串
-
-返回类型： str
-
-抛出异常： ``ValueError`` - 当 ``throw_on_error == true`` ，当没有设置任何必需的字段时，抛出 ``ValueError`` 。
-
-**代码示例**
-
-.. code-block:: python
-
-        import paddle.fluid as fluid
-
-        prog = fluid.default_main_program()
-        x = fluid.layers.data(name="X", shape=[2,3], dtype="float32", append_batch_size=False)
-        pred = fluid.layers.fc(x, size=3)
-        prog_string = prog.to_string(throw_on_error=True, with_details=False)
-        prog_string_with_details = prog.to_string(throw_on_error=False, with_details=True)
-        print("program string without detail: {}".format(prog_string))
-        print("program string with detail: {}".format(prog_string_with_details))
-
-.. py:method:: clone(for_test=False)
-
-**注意:**
-    **1.** ``Program.clone()`` **方法不会克隆例如**  :ref:`cn_api_fluid_io_DataLoader` **这样的数据读取相关的部分，这可能会造成的数据读取部分在克隆后丢失**
-
-    **2. 此API当** ``for_test=True`` **时将会裁剪部分OP和变量。为防止错误的裁剪，推荐在** :ref:`cn_api_fluid_backward_append_backward` **和执行优化器之前使用** ``clone(for_test=True)`` 。
-
-
-当 ``for_test=True`` 时创建一个新的、仅包含当前Program前向内容的Program。否则创建一个新的，和当前Program完全相同的Program
-
-有些OP，在训练和测试之间的行为是不同的，比如  :ref:`cn_api_fluid_layers_batch_norm` 。它们有一个属性 ``is_test`` 来控制行为。当 ``for_test=True`` 时，此方法将把它们的 ``is_test`` 属性更改为True。
-
- 克隆Program用于训练时，将 ``for_test`` 设置为False。
- 克隆Program用于测试时，将 ``for_test`` 设置为True。虽然在这种情况下，如果在使用了优化器之后调用 ``clone`` 我们依旧会对Program当中反向执行以及优化器相关的内容进行自动裁剪，但是，我们强烈建议在使用优化器之前使用 ``clone`` 例如如果使用的是 :ref:`cn_api_fluid_optimizer_Momentum` 可以这样去使用:
-
-**代码示例**
-
-   ::
-
-        import paddle.fluid as fluid
-        img = fluid.layers.data(name='image', shape=[784])
-        pred = fluid.layers.fc(input=img, size=10, act='relu')
-        loss = fluid.layers.mean(pred)
-        ## 我们推荐在使用 Optimizer前使用clone()接口
-        test_program = fluid.default_main_program().clone(for_test=True)
-        optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
-        optimizer.minimize(loss)
-
-参数：
- - **for_test** (bool) – 取值为True时，clone方法内部会把operator的属性 ``is_test`` 设置为 True， 并裁剪反向OP和参数优化OP，默认值为False
-
-返回：当 ``for_test=True`` 时返回一个新的、仅包含当前Program前向内容的Program。否则返回一个新的，和当前Program完全相同的Program
-
-返回类型： Program
-
-**代码示例**
-
-注意，Program在clone后的顺序可能不同，这不会影响的训练或测试进程。在下面的示例中，我们提供了一个简单的方法print_prog（Program）来打印程序描述，以确保clone后仍能得到同样的打印结果：
-
-.. code-block:: python
-
-        import paddle.fluid as fluid
-        import six
-
-
-        def print_prog(prog):
-            for name, value in sorted(six.iteritems(prog.block(0).vars)):
-                print(value)
-            for op in prog.block(0).ops:
-                print("op type is {}".format(op.type))
-                print("op inputs are {}".format(op.input_arg_names))
-                print("op outputs are {}".format(op.output_arg_names))
-                for key, value in sorted(six.iteritems(op.all_attrs())):
-                    if key not in ['op_callstack', 'op_role_var']:
-                        print(" [ attrs: {}:   {} ]".format(key, value))
-
-1.克隆一个Program，示例代码如下。
-
-.. code-block:: python
-
-        import paddle.fluid as fluid
-        import six
-
-        def print_prog(prog):
-            for name, value in sorted(six.iteritems(prog.block(0).vars)):
-                print(value)
-            for op in prog.block(0).ops:
-                print("op type is {}".format(op.type))
-                print("op inputs are {}".format(op.input_arg_names))
-                print("op outputs are {}".format(op.output_arg_names))
-                for key, value in sorted(six.iteritems(op.all_attrs())):
-                    if key not in ['op_callstack', 'op_role_var']:
-                        print(" [ attrs: {}:   {} ]".format(key, value))
-
-        train_program = fluid.Program()
-        startup_program = fluid.Program()
-
-        # ``startup_program`` 被用来执行一些参数初始化工作
-        # ``main_program`` 被用来容纳网络
-        with fluid.program_guard(train_program, startup_program):
-            with fluid.unique_name.guard():
-                img = fluid.layers.data(name='image', shape=[784])
-                hidden = fluid.layers.fc(input=img, size=200, act='relu')
-                hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
-                loss = fluid.layers.cross_entropy(
-                                          input=fluid.layers.fc(hidden, size=10, act='softmax'),
-                            label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
-                avg_loss = fluid.layers.mean(loss)
-                test_program = train_program.clone(for_test=True)
-        print_prog(test_program)
-
-        # 由于需要使训练和测试参数共享，我们需要使用训练的 ``startup_program``
-        # 来代替测试用的 ``startup_program``, 尽管测试的 ``startup_program`` 里面什么也没有。
-
-        # 在Paddle Fluid中我们会通过同样的变量名来共享权重.
-        # 训练和测试程序的所有参数将会拥有同样的名字，这将会使训练和测试程序实现参数的共享，
-        # 所以我们使用训练程序的 ``startup_program`` .并且由于测试的 ``startup_program`` 什么也没有,
-        # 因此它是一个新的程序.
-        with fluid.program_guard(train_program, startup_program):
-            with fluid.unique_name.guard():
-                sgd = fluid.optimizer.SGD(learning_rate=1e-3)
-                sgd.minimize(avg_loss)
-
-2.如果分别运行 train Program 和 test Program，则可以不使用clone。
-
-.. code-block:: python
-
-        import paddle.fluid as fluid
-        import six
-
-        def print_prog(prog):
-            for name, value in sorted(six.iteritems(prog.block(0).vars)):
-                print(value)
-            for op in prog.block(0).ops:
-                print("op type is {}".format(op.type))
-                print("op inputs are {}".format(op.input_arg_names))
-                print("op outputs are {}".format(op.output_arg_names))
-                for key, value in sorted(six.iteritems(op.all_attrs())):
-                    if key not in ['op_callstack', 'op_role_var']:
-                        print(" [ attrs: {}:   {} ]".format(key, value))
-        
-        def network():
-            img = fluid.layers.data(name='image', shape=[784])
-            hidden = fluid.layers.fc(input=img, size=200, act='relu')
-            hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
-            loss = fluid.layers.cross_entropy(
-                input=fluid.layers.fc(hidden, size=10, act='softmax'),
-                label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
-            avg_loss = fluid.layers.mean(loss)
-            return avg_loss
-
-        train_program_2 = fluid.Program()
-        startup_program_2 = fluid.Program()
-        test_program_2 = fluid.Program()
-        with fluid.program_guard(train_program_2, startup_program_2):
-            with fluid.unique_name.guard():
-                avg_loss = network()
-                sgd = fluid.optimizer.SGD(learning_rate=1e-3)
-                sgd.minimize(avg_loss)
-        # 不使用测试阶段的启动程序
-        with fluid.program_guard(test_program_2, startup_program_2):
-            with fluid.unique_name.guard():
-                avg_loss = network()
-        print_prog(test_program_2)
-
-上边两个代码片段生成和打印的Program是一样的。
-
-.. py:staticmethod:: parse_from_string(binary_str)
-
-通过对 `protobuf <https://en.wikipedia.org/wiki/Protocol_Buffers>`_ 的反序列化，转换成Program
-
-
-参数：
- - **binary_str_type** (str) – `protobuf <https://en.wikipedia.org/wiki/Protocol_Buffers>`_ 二进制字符串
-
-返回：反序列化后的 Program
-
-返回类型：Program
-
-**代码示例**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-
-    startup_prog = fluid.Program()
-    main_prog = fluid.Program()
-    with fluid.program_guard(startup_prog, main_prog):
-        x = fluid.layers.data(
-            name='X', shape=[1000, 784], dtype='float32', append_batch_size=False)
-
-        y = fluid.layers.data(
-            name='Y', shape=[784, 100], dtype='float32', append_batch_size=False)
-
-        z = fluid.layers.mul(x=x, y=y)
-
-        binary_str = fluid.default_main_program().desc.serialize_to_string()
-        prog_restored = fluid.default_main_program().parse_from_string(binary_str)
-
-        print(fluid.default_main_program())
-        print(prog_restored)
-
-        # 这里打印出的两个Program应该是一模一样的
-
-.. py:attribute:: num_blocks
-
-该Program中的 :ref:`api_guide_Block` 的个数
-
-返回： 该Program中的 :ref:`api_guide_Block` 的个数
-
-返回类型：int
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            prog = fluid.default_main_program()
-            num_blocks = prog.num_blocks
-            print(num_blocks)
-
-            ## 1
-            ## 当前Program中只有一个Block，即全局的Block
-
-.. py:attribute:: random_seed
-
-**注意：必须在相关OP被添加之前设置。**
-
-程序中随机运算符的默认随机种子。0意味着随机生成随机种子。
-
-返回：该Program中当前正在使用的random seed
-
-返回类型：int64
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            prog = fluid.default_main_program()
-            random_seed = prog.random_seed
-            x_var = fluid.layers.data(name="X", shape=[3,3], dtype="float32", append_batch_size=False)
-            print(random_seed)
-            ## 0
-            ## 默认的random seed是 0
-
-            # 这里我们必须要在fluid.layers.dropout之前设置random_seed
-            prog.random_seed = 1
-            z_var = fluid.layers.dropout(x_var, 0.7)
-
-            print(prog.random_seed)
-            ## 1
-            ## 修改后random seed变成了 1
-
-.. py:method:: global_block()
-
-获取该Program的第一个 :ref:`api_guide_Block` 。
-
-返回：该Program的第一个 :ref:`api_guide_Block`
-
-返回类型：:ref:`api_guide_Block`
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            prog = fluid.default_main_program()
-            gb_block = prog.global_block()
-            print(gb_block)
-            ##
-            ## idx: 0
-            ## parent_idx: -1
-            ## 打印出了当前全局Block的描述
-
-.. py:method:: block(index)
-
-返回该Program中 ， ``index`` 指定的 :ref:`api_guide_Block` 。 ``index`` 类型为int
-
-参数:
- - **index** (int) - 需要获取的 :ref:`api_guide_Block`  的index
-
-返回: 该Program中index对应的那个 :ref:`api_guide_Block`
-
-返回类型: :ref:`api_guide_Block`
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            prog = fluid.default_main_program()
-            block_0 = prog.block(0)
-            print(block_0)
-            ##
-            ## idx: 0
-            ## parent_idx: -1
-            ## 打印出了0号Block的描述
-
-.. py:method:: current_block()
-
-获取当前 :ref:`api_guide_Block` 。当前 :ref:`api_guide_Block`  是用来添加OP的。
-
-返回: 该Program中用户当前所在的 :ref:`api_guide_Block`
-
-返回类型: :ref:`api_guide_Block`
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            prog = fluid.default_main_program()
-            current_blk = prog.current_block()
-            print(current_blk)
-            ##
-            ## idx: 0
-            ## parent_idx: -1
-            ## 打印出了当前Block的描述
-
-.. py:method:: list_vars()
-
-获取当前Program中所有变量。返回值是一个可迭代对象（iterable object)。
-
-返回: Generator 会yield每个Program中的变量
-
-返回类型: iterable 的 :ref:`api_guide_Variable`
-
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            prog = fluid.default_main_program()
-            img = fluid.layers.data(name='img', shape=[1,28,28], dtype='float32')
-            label = fluid.layers.data(name='label', shape=[128,1], dtype='int64')
-            for var in prog.list_vars():
-                print(var)
-
-            # 这里将会打印出当前Program中所有的Variable
-
-.. py:method:: all_parameters()
-
-获取当前Program中所有的 :ref:`api_guide_parameter` 。返回值是一个列表。
-
-返回: 一个包含当前Program中所有参数的列表。
-
-返回类型: list[ :ref:`api_guide_parameter` ]
-
-
-**代码示例**
-
-.. code-block:: python
-
-            import paddle.fluid as fluid
-
-            program = fluid.default_main_program()
-            data = fluid.data(name='x', shape=[None, 13], dtype='float32')
-            hidden = fluid.layers.fc(input=data, size=10)
-            loss = fluid.layers.mean(hidden)
-            fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)
-
-            for param in program.all_parameters():
-                print(param)
-
-            # 这里将会打印出当前Program中所有的Parameters，在本例中，输出结果是:
-            #
-            # name: "fc_0.w_0"
-            # type {
-            # type: LOD_TENSOR
-            # lod_tensor {
-            #     tensor {
-            #       data_type: FP32
-            #       dims: 13
-            #       dims: 10
-            #     }
-            #   }
-            # }
-            #
-            # persistable: true
-            # name: "fc_0.b_0"
-            # type {
-            # type: LOD_TENSOR
-            # lod_tensor {
-            #     tensor {
-            #       data_type: FP32
-            #       dims: 10
-            #     }
-            #   }
-            # }
-            # persistable: true
-            #
-            # 这里print(param)将会打印出一个参数所有的属性，包括name，type和persistable，
-            # 你可以访问一个参数的指定属性，例如param.name，param.type
\ No newline at end of file
--- a/doc/paddle/api/paddle/fluid/framework/Variable_cn.rst
+++ b/doc/paddle/api/paddle/fluid/framework/Variable_cn.rst
@@ -145,7 +145,7 @@ Variable

 **参数:**

-  - **backward_strategy**: ( :ref:`cn_api_fluid_dygraph_BackwardStrategy` ) 使用何种 :ref:`cn_api_fluid_dygraph_BackwardStrategy`  聚合反向的梯度
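+  - **retain_graph** (bool, optional) – Whether to keep the backward computation graph after the backward pass finishes. Set :code:`retain_graph` to True if you intend to add more Ops to the previously built graph after calling :code:`backward` , so that the previously computed gradients are kept. Setting :code:`retain_graph` to False releases the graph and reduces memory usage. Default: False.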

 返回：无

@@ -153,23 +153,20 @@ Variable
 **示例代码**
  .. code-block:: python

-        import paddle.fluid as fluid
        import numpy as np
-
+        import paddle
+        paddle.disable_static()
        x = np.ones([2, 2], np.float32)
-        with fluid.dygraph.guard():
-            inputs2 = []
-            for _ in range(10):
-                tmp = fluid.dygraph.base.to_variable(x)
-                # 如果这里我们不为输入tmp设置stop_gradient=False，那么后面loss2也将因为这个链路都不需要梯度
-                # 而不产生梯度
-                tmp.stop_gradient=False
-                inputs2.append(tmp)
-            ret2 = fluid.layers.sums(inputs2)
-            loss2 = fluid.layers.reduce_sum(ret2)
-            backward_strategy = fluid.dygraph.BackwardStrategy()
-            backward_strategy.sort_sum_gradient = True
-            loss2.backward(backward_strategy)
+        inputs = []
+        for _ in range(10):
+            tmp = paddle.to_tensor(x)
+            # If stop_gradient=False is not set on the input tmp, nothing on the path to loss
+            # requires a gradient, so loss produces no gradient either.
+            tmp.stop_gradient=False
+            inputs.append(tmp)
+        ret = paddle.sums(inputs)
+        loss = paddle.reduce_sum(ret)
+        loss.backward()

 .. py:method:: gradient()

@@ -202,9 +199,7 @@ Variable
                inputs2.append(tmp)
            ret2 = fluid.layers.sums(inputs2)
            loss2 = fluid.layers.reduce_sum(ret2)
-            backward_strategy = fluid.dygraph.BackwardStrategy()
-            backward_strategy.sort_sum_gradient = True
-            loss2.backward(backward_strategy)
+            loss2.backward()
            print(loss2.gradient())

        # example2: 返回tuple of ndarray
@@ -248,9 +243,7 @@ Variable
                inputs2.append(tmp)
            ret2 = fluid.layers.sums(inputs2)
            loss2 = fluid.layers.reduce_sum(ret2)
-            backward_strategy = fluid.dygraph.BackwardStrategy()
-            backward_strategy.sort_sum_gradient = True
-            loss2.backward(backward_strategy)
+            loss2.backward()
            print(loss2.gradient())
            loss2.clear_gradient()
            print("After clear {}".format(loss2.gradient()))
@@ -351,6 +344,7 @@ Variable
  .. code-block:: python

        import paddle.fluid as fluid
+        import numpy as np

        with fluid.dygraph.guard():
            value0 = np.arange(26).reshape(2, 13).astype("float32")
@@ -366,9 +360,9 @@ Variable
            out1.stop_gradient = True
            out = fluid.layers.concat(input=[out1, out2, c], axis=1)
            out.backward()
-            # 可以发现这里linear的参数变成了
-            assert (linear.weight.gradient() == 0).all()
-            assert (out1.gradient() == 0).all()
+            # The gradients of linear's parameters are now None
+            assert linear.weight.gradient() is None
+            assert out1.gradient() is None

 .. py:attribute:: persistable


--- a/doc/paddle/api/paddle/fluid/initializer/Normal_cn.rst
+++ b/doc/paddle/api/paddle/fluid/initializer/Normal_cn.rst
-.. _cn_api_fluid_initializer_Normal:
+.. _cn_api_fluid_layers_Normal:

 Normal
 -------------------------------

-.. py:attribute:: paddle.fluid.initializer.Normal
+.. py:class:: paddle.fluid.layers.Normal(loc, scale)
+
+
+
+
+Normal distribution
+
+The probability density function (pdf) is:
+
+.. math::
+
+    pdf(x; \mu, \sigma) = \frac{1}{Z}e^{\frac {-0.5 (x - \mu)^2}  {\sigma^2} }
+
+    Z = (2 \pi \sigma^2)^{0.5}
+
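+In the formula above:
+
+- :math:`loc = \mu` : the mean.
+- :math:`scale = \sigma` : the standard deviation.
+- :math:`Z` : the normalization constant.
+
+Parameters:
+    - **loc** (float|list|numpy.ndarray|Variable) - The mean of the normal distribution. The data type is float32.
+    - **scale** (float|list|numpy.ndarray|Variable) - The standard deviation of the normal distribution. The data type is float32.
+
+**Code example**: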
+
+.. code-block:: python
+
+    import numpy as np
+    from paddle.fluid import layers
+    from paddle.fluid.layers import Normal
+
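+    # Define a single normal distribution with float parameters.
+    dist = Normal(loc=0., scale=3.)
+    # Define a batch of two normal distributions.
+    # The first has mean 1 and standard deviation 11; the second has mean 2 and standard deviation 22.
+    dist = Normal(loc=[1., 2.], scale=[11., 22.])
+    # Draw 3 samples; returns a 3 x 2 tensor.
+    dist.sample([3])
+
+    # Define a batch of two normal distributions through broadcasting.
+    # Both have mean 1; the standard deviations differ.
+    dist = Normal(loc=1., scale=[11., 22.])
+
+    # A complete example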
+    value_npdata = np.array([0.8], dtype="float32")
+    value_tensor = layers.create_tensor(dtype="float32")
+    layers.assign(value_npdata, value_tensor)
+
+    normal_a = Normal([0.], [1.])
+    normal_b = Normal([0.5], [2.])
+
+    sample = normal_a.sample([2])
+    # A tensor randomly sampled from the normal distribution defined above, with shape [2, 1]
+    entropy = normal_a.entropy()
+    # [1.4189385] with shape: [1]
+    lp = normal_a.log_prob(value_tensor)
+    # [-1.2389386] with shape: [1]
+    kl = normal_a.kl_divergence(normal_b)
+    # [0.34939718] with shape: [1]
+
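The values quoted in the example above can be reproduced from the closed-form expressions for a normal distribution. The sketch below uses only the Python standard library. The function names are hypothetical and are not part of Paddle; they only mirror what ``entropy`` , ``log_prob`` and ``kl_divergence`` are documented to return.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def normal_entropy(scale):
    """Entropy of N(loc, scale^2): 0.5 + 0.5 * log(2 * pi) + log(scale)."""
    return 0.5 + LOG_SQRT_2PI + math.log(scale)


def normal_log_prob(value, loc, scale):
    """Log of the pdf given above, evaluated at value."""
    return -((value - loc) ** 2) / (2.0 * scale ** 2) - math.log(scale) - LOG_SQRT_2PI


def normal_kl(loc_a, scale_a, loc_b, scale_b):
    """KL(N_a || N_b) for two univariate normal distributions."""
    var_ratio = (scale_a / scale_b) ** 2
    mean_term = ((loc_a - loc_b) / scale_b) ** 2
    return 0.5 * (var_ratio + mean_term - 1.0 - math.log(var_ratio))


# Same parameters as normal_a = Normal([0.], [1.]) and normal_b = Normal([0.5], [2.])
print(normal_entropy(1.0))              # approximately 1.4189385
print(normal_log_prob(0.8, 0.0, 1.0))   # approximately -1.2389385
print(normal_kl(0.0, 1.0, 0.5, 2.0))    # approximately 0.3493972
```
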
+
+.. py:function:: sample(shape, seed=0)
+
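+Generates samples of the specified shape.
+
+Parameters:
+    - **shape** (list) - A 1-D list of int32 values specifying the shape of the generated samples.
+    - **seed** (int) - The random seed, a Python integer. Default: 0.
+
+Returns: A tensor with ``shape`` prepended to the dimensions of the distribution's parameters, with data type float32
+
+Return type: Variable
+
+.. py:function:: entropy()
+
+Shannon entropy of the distribution.
+
+Returns: The entropy of the normal distribution, with data type float32
+
+Return type: Variable
+
+.. py:function:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Variable) - The input tensor, with data type float32 or float64.
+
+Returns: The log probability, with the same data type as ``value``
+
+Return type: Variable
+
+.. py:function:: kl_divergence(other)
+
+KL divergence between two normal distributions.
+
+Parameters:
+    - **other** (Normal) - An instance of Normal.
+
+Returns: The KL divergence between the two normal distributions, with data type float32
+
+Return type: Variable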

-:alias_main: paddle.nn.initializer.Normal
-:alias: paddle.nn.initializer.Normal
-:old_api: paddle.fluid.initializer.Normal



-``NormalInitializer`` 的别名


--- a/doc/paddle/api/paddle/fluid/initializer/Uniform_cn.rst
+++ b/doc/paddle/api/paddle/fluid/initializer/Uniform_cn.rst
-.. _cn_api_fluid_initializer_Uniform:
+.. _cn_api_fluid_layers_Uniform:

 Uniform
 -------------------------------

-.. py:attribute:: paddle.fluid.initializer.Uniform
+.. py:class:: paddle.fluid.layers.Uniform(low, high)
+
+
+
+
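+Uniform distribution
+
+The probability density function (pdf) is:
+
+.. math::
+
+    pdf(x; a, b) = \frac{1}{Z}, \quad a \le x < b
+
+    Z = b - a
+
+In the formula above:
+
+- :math:`low = a` : the lower bound.
+- :math:`high = b` : the upper bound.
+- :math:`Z` : the normalization constant.
+
+The shapes of ``low`` and ``high`` must be broadcastable.
+
+Parameters:
+    - **low** (float|list|numpy.ndarray|Variable) - The lower bound of the uniform distribution. The data type is float32.
+    - **high** (float|list|numpy.ndarray|Variable) - The upper bound of the uniform distribution. The data type is float32.
+
+**Code example**: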
+
+.. code-block:: python
+
+    import numpy as np
+    from paddle.fluid import layers
+    from paddle.fluid.layers import Uniform
+
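+    # Define a single uniform distribution with float parameters
+    u1 = Uniform(low=3.0, high=4.0)
+    # Define two uniform distributions with list parameters
+    u2 = Uniform(low=[1.0, 2.0],
+                 high=[3.0, 4.0])
+    # Define four uniform distributions from 2 x 2 parameters (no broadcasting needed)
+    u3 = Uniform(low=[[1.0, 2.0],
+                      [3.0, 4.0]],
+                 high=[[1.5, 2.5],
+                       [3.5, 4.5]])
+
+    # Define three uniform distributions through broadcasting
+    u4 = Uniform(low=3.0, high=[5.0, 6.0, 7.0])
+
+    # A complete example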
+    value_npdata = np.array([0.8], dtype="float32")
+    value_tensor = layers.create_tensor(dtype="float32")
+    layers.assign(value_npdata, value_tensor)
+
+    uniform = Uniform([0.], [2.])
+
+    sample = uniform.sample([2])
+    # A tensor randomly sampled from the uniform distribution defined above, with shape [2, 1]
+    entropy = uniform.entropy()
+    # [0.6931472] with shape: [1]
+    lp = uniform.log_prob(value_tensor)
+    # [-0.6931472] with shape: [1]
+
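The entropy and log-probability values quoted in the example above follow directly from :math:`Z = b - a` . The sketch below uses only the Python standard library. The function names are hypothetical, and the half-open support ``low <= value < high`` is taken from the pdf definition above; how Paddle itself treats the boundary points is not asserted here.

```python
import math


def uniform_entropy(low, high):
    """Entropy of U(low, high): log(high - low)."""
    return math.log(high - low)


def uniform_log_prob(value, low, high):
    """log(1 / (high - low)) inside the support, -inf outside it."""
    if low <= value < high:
        return -math.log(high - low)
    return float("-inf")


# Same parameters as uniform = Uniform([0.], [2.]) with value 0.8
print(uniform_entropy(0.0, 2.0))        # approximately 0.6931472
print(uniform_log_prob(0.8, 0.0, 2.0))  # approximately -0.6931472
```
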
+
+.. py:function:: sample(shape, seed=0)
+
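+Generates samples of the specified shape.
+
+Parameters:
+    - **shape** (list) - A 1-D list of int32 values specifying the shape of the generated samples.
+    - **seed** (int) - The random seed, a Python integer. Default: 0.
+
+Returns: A tensor with ``shape`` prepended to the dimensions of the distribution's parameters, with data type float32
+
+Return type: Variable
+
+.. py:function:: entropy()
+
+Shannon entropy of the distribution.
+
+Returns: The entropy of the uniform distribution, with data type float32
+
+Return type: Variable
+
+.. py:function:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Variable) - The input tensor, with data type float32 or float64.
+
+Returns: The log probability, with the same data type as ``value``
+
+Return type: Variable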

-:alias_main: paddle.nn.initializer.Uniform
-:alias: paddle.nn.initializer.Uniform
-:old_api: paddle.fluid.initializer.Uniform



-``UniformInitializer`` 的别名



--- a/doc/paddle/api/paddle/fluid/input/embedding_cn.rst
+++ b/doc/paddle/api/paddle/fluid/input/embedding_cn.rst
-.. _cn_api_fluid_embedding:
+.. _cn_api_fluid_layers_embedding:

 embedding
 -------------------------------


-.. py:function:: paddle.fluid.embedding(input, size, is_sparse=False, is_distributed=False, padding_idx=None, param_attr=None, dtype='float32')
+.. py:function:: paddle.fluid.layers.embedding(input, size, is_sparse=False, is_distributed=False, padding_idx=None, param_attr=None, dtype='float32')

 :api_attr: 声明式编程模式（静态图)



-该OP根据input中的id信息从embedding矩阵中查询对应embedding信息，函数会根据输入的size (vocab_size, emb_size)和dtype自动构造一个二维embedding矩阵。
+Embedding layer

-输出的Tensor的shape是在输入Tensor shape的最后一维后面添加了emb_size的维度。
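+**Note: this OP will be removed in a future release! It requires the last dimension of the input Tensor's shape to be 1. The recommended replacement is fluid.** :ref:`cn_api_fluid_embedding` .
+
+This OP looks up embeddings in the embedding matrix by the ids in ``input`` , and automatically constructs a 2-D embedding matrix from the given size (vocab_size, emb_size) and dtype.
+
+The last dimension of ``input`` must equal 1. The shape of the output Tensor is the shape of the input Tensor with that trailing 1 replaced by emb_size.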

 注：input中的id必须满足 ``0 =< id < size[0]``，否则程序会抛异常退出。

@@ -22,8 +26,8 @@ embedding
    Case 1:

    input是Tensor, 且padding_idx = -1
-        input.data = [[1, 3], [2, 4], [4, 127]]
-        input.shape = [3, 2]
+        input.data = [[[1], [3]], [[2], [4]], [[4], [127]]]
+        input.shape = [3, 2, 1]
    若size = [128, 16]
    输出为Tensor:
        out.shape = [3, 2, 16]
@@ -32,7 +36,7 @@ embedding

                    [[0.345249859, 0.124939536, ..., 0.194353745],
                     [0.945345345, 0.435394634, ..., 0.435345365]],
-
+                     
                    [[0.945345345, 0.435394634, ..., 0.435345365],
                     [0.0,         0.0,         ..., 0.0        ]]]  # padding data
    输入的padding_idx小于0，则自动转换为padding_idx = -1 + 128 = 127, 对于输入id为127的词，进行padding处理。
@@ -46,25 +50,25 @@ embedding
    若size = [128, 16]
    输出为LoDTensor:
        out.lod = [[2, 3]]
-        out.shape = [5, 1, 16]
-        out.data = [[[0.129435295, 0.244512452, ..., 0.436322452]],
-                    [[0.345421456, 0.524563927, ..., 0.144534654]],
-                    [[0.345249859, 0.124939536, ..., 0.194353745]],
-                    [[0.945345345, 0.435394634, ..., 0.435345365]],
-                    [[0.0,         0.0,         ..., 0.0        ]]]  # padding data
+        out.shape = [5, 16]
+        out.data = [[0.129435295, 0.244512452, ..., 0.436322452],
+                    [0.345421456, 0.524563927, ..., 0.144534654],
+                    [0.345249859, 0.124939536, ..., 0.194353745],
+                    [0.945345345, 0.435394634, ..., 0.435345365],
+                    [0.0,         0.0,         ..., 0.0        ]]  # padding data
    输入的padding_idx = 0，则对于输入id为0的词，进行padding处理。


 参数：
-    - **input** (Variable) - 存储id信息的Tensor或LoDTensor，数据类型必须为：int64。input中的id必须满足 ``0 =< id < size[0]`` 。
+    - **input** (Variable) - A Tensor or LoDTensor holding the ids. The data type must be int64, and the last dimension of its shape must be 1. Every id in ``input`` must satisfy ``0 <= id < size[0]`` .
    - **size** (tuple|list) - embedding矩阵的维度。必须包含两个元素，第一个元素为vocab_size(词表大小), 第二个为emb_size（embedding层维度）。
    - **is_sparse** (bool) - 是否使用稀疏的更新方式，这个参数只会影响反向的梯度更新的性能，sparse更新速度更快，推荐使用稀疏更新的方式。但某些optimizer不支持sparse更新，比如 :ref:`cn_api_fluid_optimizer_AdadeltaOptimizer` 、 :ref:`cn_api_fluid_optimizer_AdamaxOptimizer` 、 :ref:`cn_api_fluid_optimizer_DecayedAdagradOptimizer` 、 :ref:`cn_api_fluid_optimizer_FtrlOptimizer` 、 :ref:`cn_api_fluid_optimizer_LambOptimizer` 、:ref:`cn_api_fluid_optimizer_LarsMomentumOptimizer` ，此时is_sparse必须为False。默认为False。
    - **is_distributed** (bool) - 是否使用分布式的方式存储embedding矩阵，仅在多机分布式cpu训练中使用。默认为False。
-    - **padding_idx** (int|long|None) - padding_idx需在区间[-vocab_size, vocab_size)，否则不生效，padding_idx<0时，padding_idx 会被改成 vocab_size + padding_idx，input中等于padding_index的id对应的embedding信息会被设置为0，且这部分填充数据在训练时将不会被更新。如果为none，不作处理，默认为None。
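+    - **padding_idx** (int|long|None) - Must lie in the interval [-vocab_size, vocab_size), otherwise it has no effect. If padding_idx < 0, it is converted to vocab_size + padding_idx. The embedding of every id in ``input`` equal to padding_idx is set to 0, and this padding data is not updated during training. If None, no padding is applied. Default: None.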
    - **param_attr** (ParamAttr) - 指定权重参数属性的对象。默认值为None，表示使用默认的权重参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。此外，可以通过 ``param_attr`` 参数加载用户自定义或预训练的词向量。只需将本地词向量转为numpy数据格式，且保证本地词向量的shape和embedding的 ``size`` 参数一致，然后使用 :ref:`cn_api_fluid_initializer_NumpyArrayInitializer` 进行初始化，即可实现加载自定义或预训练的词向量。详细使用方法见代码示例2。
-    - **dtype** (str|core.VarDesc.VarType) - 输出Tensor或LoDTensor的数据类型，数据类型必须为：float32，float64，默认为float32。
+    - **dtype** (str|core.VarDesc.VarType) - The data type of the output Tensor or LoDTensor. Must be float32 or float64. Default: float32.

-返回：input映射后embedding Tensor或LoDTensor，数据类型和dtype定义的类型一致。
+Returns: The embedding Tensor or LoDTensor that ``input`` maps to, with the data type specified by ``dtype`` .

 返回类型：Variable

@@ -73,10 +77,12 @@ embedding
 .. code-block:: python

    import paddle.fluid as fluid
+    import numpy as np
+
    data = fluid.layers.data(name='sequence', shape=[1], dtype='int64', lod_level=1)

    # 示例 1
-    emb_1 = fluid.embedding(input=data, size=[128, 64])
+    emb_1 = fluid.layers.embedding(input=data, size=[128, 64])

    # 示例 2: 加载用户自定义或预训练的词向量
    weight_data = np.random.random(size=(128, 100))  # numpy格式的词向量数据
@@ -85,7 +91,7 @@ embedding
        learning_rate=0.5,
        initializer=fluid.initializer.NumpyArrayInitializer(weight_data),
        trainable=True)
-    emb_2 = fluid.embedding(input=data, size=(128, 100), param_attr=w_param_attrs, dtype='float32')
+    emb_2 = fluid.layers.embedding(input=data, size=(128, 100), param_attr=w_param_attrs, dtype='float32')




--- a/doc/paddle/api/paddle/fluid/io/load_cn.rst
+++ b/doc/paddle/api/paddle/fluid/io/load_cn.rst
-.. _cn_api_fluid_load:
+.. _cn_api_fluid_dygraph_jit_load:

 load
-------------------------------
-
-.. py:function:: paddle.fluid.load(program, model_path, executor=None, var_list=None)
-
-:api_attr: 声明式编程模式（静态图)
-
-
-
-该接口从Program中过滤出参数和优化器信息，然后从文件中获取相应的值。
-
-如果Program和加载的文件之间参数的维度或数据类型不匹配，将引发异常。
-
-该函数还可以加载用[save_params，save_persistables，save_vars]接口保存的模型文件。
-当[save_params，save_persistables，save_vars]保存的模型格式为单个大文件时，var_list不能为None。
-
-参数:
- - **program**  ( :ref:`cn_api_fluid_Program` ) – 要加载的Program。
- - **model_path**  (str) – 保存Program的目录名称+文件前缀。格式为 ``目录名称/文件前缀`` 。
- - **executor** (Executor, 可选) - 当startup program没有运行时，用于初始化参数的Executor。默认值：None。
- - **var_list** (list, 可选) - 指定加载的变量列表，该参数只在加载旧接口[save_params，save_persistables，save_vars]保存的模型文件时使用。当加载的是多个小文件时，变量列表可以是所有加载文件中变量的子集；当加载的单个大文件时，变量列表必须和加载文件中的变量保持一致。
-
-返回: 无
-
-**代码示例**
-
-.. code-block:: python
-
-    # example1
-    import paddle.fluid as fluid
-
-    x = fluid.data( name="x", shape=[10, 10], dtype='float32')
-    y = fluid.layers.fc(x, 10)
-    z = fluid.layers.fc(y, 10)
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    fluid.save(fluid.default_main_program(), "./test_path")
-    fluid.load(fluid.default_main_program(), "./test_path")
-
-    # example2
-    # 注意example1和example2应该分开执行，避免干扰。
-    import paddle.fluid as fluid
-
-    x = fluid.data( name="x", shape=[10, 10], dtype='float32')
-    y = fluid.layers.fc(x, 10)
-    z = fluid.layers.fc(y, 10)
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    fluid.save(fluid.default_main_program(), "./test_path")
-    fluid.load(fluid.default_main_program(), "./test_path", exe)
-
+-----------------
+
+.. py:function:: paddle.fluid.dygraph.jit.load(model_path, configs=None)
+
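+:api_attr: imperative programming mode (dynamic graph)
+
+Loads a model saved by :ref:`cn_api_fluid_dygraph_jit_save` or :ref:`cn_api_fluid_io_save_inference_model` as a :ref:`cn_api_fluid_dygraph_TranslatedLayer` , which can then be used for inference or fine-tuning.
+
+.. note::
+    For historical reasons, if the loaded model was saved by :ref:`cn_api_fluid_io_save_inference_model` ,
+    fine-tuning it is subject to the following limitations:
+
+    1. Imperative programming mode does not support ``LoDTensor`` , so models whose original input variables or parameters depend on LoD information cannot be used for now;
+    2. Every feed variable of the saved model must be passed to the forward method of ``TranslatedLayer`` ;
+    3. The ``stop_gradient`` information of the original model's variables is lost and cannot be restored accurately;
+    4. The ``trainable`` information of the original model's parameters is lost and cannot be restored accurately.
+
+Parameters:
+    - **model_path** (str) - Directory in which the model is stored.
+    - **configs** (SaveLoadConfig, optional) - A :ref:`cn_api_fluid_dygraph_jit_SaveLoadConfig` object that specifies additional configuration options. Default: ``None``.
+
+Returns: TranslatedLayer - a ``Layer`` object that can run the stored model.
+
+**Example code**
+
+1. Load a model saved by :ref:`cn_api_fluid_dygraph_jit_save` for inference and fine-tuning.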
+
+    .. code-block:: python
+
+        import numpy as np
+        import paddle.fluid as fluid
+        from paddle.fluid.dygraph import Linear
+        from paddle.fluid.dygraph import declarative
+        BATCH_SIZE = 32
+        BATCH_NUM = 20
+        def random_batch_reader():
+            def _get_random_images_and_labels(image_shape, label_shape):
+                image = np.random.random(size=image_shape).astype('float32')
+                label = np.random.random(size=label_shape).astype('int64')
+                return image, label
+            def __reader__():
+                for _ in range(BATCH_NUM):
+                    batch_image, batch_label = _get_random_images_and_labels(
+                        [BATCH_SIZE, 784], [BATCH_SIZE, 1])
+                    yield batch_image, batch_label
+            return __reader__
+        class LinearNet(fluid.dygraph.Layer):
+            def __init__(self, in_size, out_size):
+                super(LinearNet, self).__init__()
+                self._linear = Linear(in_size, out_size)
+            @declarative
+            def forward(self, x):
+                return self._linear(x)
+        # Enable imperative programming mode
+        fluid.enable_dygraph()
+        # 1. Train and save the model.
+        # Create the network
+        net = LinearNet(784, 1)
+        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.1, parameter_list=net.parameters())
+        # Create the DataLoader
+        train_loader = fluid.io.DataLoader.from_generator(capacity=5)
+        train_loader.set_batch_generator(random_batch_reader())
+        # Train
+        for data in train_loader():
+            img, label = data
+            label.stop_gradient = True
+            cost = net(img)
+            loss = fluid.layers.cross_entropy(cost, label)
+            avg_loss = fluid.layers.mean(loss)
+            avg_loss.backward()
+            adam.minimize(avg_loss)
+            net.clear_gradients()
+        model_path = "linear.example.model"
+        fluid.dygraph.jit.save(
+            layer=net,
+            model_path=model_path,
+            input_spec=[img])
+        # 2. Load the model and run inference
+        # Load the model
+        infer_net = fluid.dygraph.jit.load(model_path)
+        # Inference
+        x = fluid.dygraph.to_variable(np.random.random((1, 784)).astype('float32'))
+        pred = infer_net(x)
+        # 3. Load the model and fine-tune
+        # Load the model
+        train_net = fluid.dygraph.jit.load(model_path)
+        train_net.train()
+        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.1, parameter_list=train_net.parameters())
+        # Create the DataLoader
+        train_loader = fluid.io.DataLoader.from_generator(capacity=5)
+        train_loader.set_batch_generator(random_batch_reader())
+        # Fine-tune
+        for data in train_loader():
+            img, label = data
+            label.stop_gradient = True
+            cost = train_net(img)
+            loss = fluid.layers.cross_entropy(cost, label)
+            avg_loss = fluid.layers.mean(loss)
+            avg_loss.backward()
+            adam.minimize(avg_loss)
+            train_net.clear_gradients()
+
+
+2. Load a model saved by :ref:`cn_api_fluid_io_save_inference_model` for inference and fine-tuning.
+
+    .. code-block:: python
+
+        import numpy as np
+        import paddle.fluid as fluid
+        BATCH_SIZE = 32
+        BATCH_NUM = 20
+        def random_batch_reader():
+            def _get_random_images_and_labels(image_shape, label_shape):
+                image = np.random.random(size=image_shape).astype('float32')
+                label = np.random.random(size=label_shape).astype('int64')
+                return image, label
+            def __reader__():
+                for _ in range(BATCH_NUM):
+                    batch_image, batch_label = _get_random_images_and_labels(
+                        [BATCH_SIZE, 784], [BATCH_SIZE, 1])
+                    yield batch_image, batch_label
+            return __reader__
+        img = fluid.data(name='img', shape=[None, 784], dtype='float32')
+        label = fluid.data(name='label', shape=[None, 1], dtype='int64')
+        pred = fluid.layers.fc(input=img, size=10, act='softmax')
+        loss = fluid.layers.cross_entropy(input=pred, label=label)
+        avg_loss = fluid.layers.mean(loss)
+        optimizer = fluid.optimizer.SGD(learning_rate=0.001)
+        optimizer.minimize(avg_loss)
+        place = fluid.CPUPlace()
+        exe = fluid.Executor(place)
+        exe.run(fluid.default_startup_program())
+        loader = fluid.io.DataLoader.from_generator(
+            feed_list=[img, label], capacity=5, iterable=True)
+        loader.set_batch_generator(random_batch_reader(), places=place)
+        # 1. Train and save the inference model
+        for data in loader():
+            exe.run(
+                fluid.default_main_program(),
+                feed=data, 
+                fetch_list=[avg_loss])
+        model_path = "fc.example.model"
+        fluid.io.save_inference_model(
+            model_path, ["img"], [pred], exe)
+        # Enable imperative programming mode
+        fluid.enable_dygraph()
+        # 2. Load the model and run inference
+        fc = fluid.dygraph.jit.load(model_path)
+        x = fluid.dygraph.to_variable(np.random.random((1, 784)).astype('float32'))
+        pred = fc(x)
+        # 3. Load the model and fine-tune
+        fc = fluid.dygraph.jit.load(model_path)
+        fc.train()
+        sgd = fluid.optimizer.SGD(learning_rate=0.001,
+                                    parameter_list=fc.parameters())
+        train_loader = fluid.io.DataLoader.from_generator(capacity=5)
+        train_loader.set_batch_generator(
+            random_batch_reader(), places=place)
+        for data in train_loader():
+            img, label = data
+            label.stop_gradient = True
+            cost = fc(img)
+            loss = fluid.layers.cross_entropy(cost, label)
+            avg_loss = fluid.layers.mean(loss)
+            avg_loss.backward()
+            sgd.minimize(avg_loss)
--- a/doc/paddle/api/paddle/fluid/layers/BasicDecoder_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/BasicDecoder_cn.rst
+.. _cn_api_fluid_layers_BasicDecoder:
+
+BasicDecoder
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.BasicDecoder(cell, helper, output_fn=None)
+
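+BasicDecoder is a subclass of :ref:`cn_api_fluid_layers_Decoder` . It holds an instance of :ref:`cn_api_fluid_layers_RNNCell` and an instance of :ref:`cn_api_fluid_layers_DecodeHelper` as members, where the DecodeHelper implements the decoding strategy. A single decoding step performs the following in order:
+
+1. Run :code:`cell_outputs, cell_states = cell.call(inputs, states)` to obtain the outputs and the new states.
+
+2. Run :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` to sample ids, which serve as the decoding result of the current step.
+
+3. Run :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` to produce the finished flags, inputs, and states for the next decoding step.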
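
The three steps can be sketched in plain Python. This is only an illustration under stated assumptions: ``ToyCell`` and ``ToyHelper`` are hypothetical stand-ins for an RNNCell and a DecodeHelper, tensors are replaced by Python lists, and the optional ``output_fn`` is left out.

```python
from collections import namedtuple

# Named tuple with the two fields the step output carries.
OutputWrapper = namedtuple("OutputWrapper", ["cell_outputs", "sample_ids"])


class ToyCell:
    """Hypothetical cell: one score row per batch entry; the state counts steps."""

    def call(self, inputs, states):
        outputs = [[0.1, 0.2 + x, 0.3] for x in inputs]
        return outputs, states + 1


class ToyHelper:
    """Hypothetical helper: argmax sampling, sampled id fed back as next input."""

    def __init__(self, end_token):
        self.end_token = end_token

    def sample(self, time, outputs, states):
        return [max(range(len(row)), key=row.__getitem__) for row in outputs]

    def next_inputs(self, time, outputs, states, sample_ids):
        finished = [i == self.end_token for i in sample_ids]
        return finished, [float(i) for i in sample_ids], states


def basic_decoder_step(cell, helper, time, inputs, states):
    # Step 1: run the cell.
    cell_outputs, cell_states = cell.call(inputs, states)
    # Step 2: sample ids from the cell outputs.
    sample_ids = helper.sample(time, cell_outputs, cell_states)
    # Step 3: produce finished flags, next inputs, and next states.
    finished, next_inputs, next_states = helper.next_inputs(
        time, cell_outputs, cell_states, sample_ids)
    return OutputWrapper(cell_outputs, sample_ids), next_states, next_inputs, finished


outputs, next_states, next_inputs, finished = basic_decoder_step(
    ToyCell(), ToyHelper(end_token=1), 0, [0.0, 1.0], 0)
print(outputs.sample_ids, finished)
# [2, 1] [False, True]
```

Swapping the helper changes the decoding strategy while the step logic stays the same, which is the point of keeping the helper as a separate member.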
+
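+Parameters:
+  - **cell** (RNNCell) - An RNNCell instance or an object with the same interface.
+  - **helper** (DecodeHelper) - A DecodeHelper instance.
+  - **output_fn** (optional) - A callable applied to the cell outputs before sampling. Default: None.
+
+**Example code**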
+
+.. code-block:: python
+        
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+
+.. py:method:: initialize(initial_cell_states)
+
+Initialization, covering both the helper and the cell. The cell is initialized by using :code:`initial_cell_states` directly as the result.
+
+Parameters:
+  - **initial_cell_states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is supplied by the caller :ref:`cn_api_fluid_layers_dynamic_decode` .
+
+Returns: A triple :code:`(initial_inputs, initial_states, finished)` . :code:`initial_inputs, initial_states` are each a single tensor variable or a nested structure of tensor variables, and :code:`finished` is a bool tensor. :code:`initial_inputs, finished` are the values returned by :code:`helper.initialize()` ; :code:`initial_states` is the same as the argument :code:`initial_cell_states` .
+
+Return type: tuple
+    
+.. py:class:: OutputWrapper(cell_outputs, sample_ids)
+
+ The data structure used for :code:`outputs` in the return value of :code:`step()` : a named tuple with the two fields :code:`cell_outputs` and :code:`sample_ids` .
+
+.. py:method:: step(time, inputs, states, **kwargs)
+
+Performs a single decoding step as follows:
+
+1. Run :code:`cell_outputs, cell_states = cell.call(inputs, states)` to obtain the outputs and the new states.
+
+2. Run :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` to sample ids, which serve as the decoding result of the current step.
+
+3. Run :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` to produce the finished flags, inputs, and states for the next decoding step.
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **inputs** (Variable) - A tensor variable. At the first decoding step it is the :code:`initial_inputs` returned by :code:`initialize()` ; at later steps it is the :code:`next_inputs` returned by :code:`step()` .
+  - **states** (Variable) - A structure of tensor variables. At the first decoding step it is the :code:`initial_states` returned by :code:`initialize()` ; at later steps it is the :code:`next_states` returned by :code:`step()` .
+  - **kwargs** - Additional keyword arguments supplied by the caller :ref:`cn_api_fluid_layers_dynamic_decode` .
+
+Returns: A 4-tuple :code:`(outputs, next_states, next_inputs, finished)` . :code:`outputs` is a named tuple with the two fields :code:`cell_outputs` and :code:`sample_ids` , where :code:`cell_outputs` is the result of :code:`cell.call()` and :code:`sample_ids` is the result of :code:`helper.sample()` ; :code:`next_states, next_inputs` have the same structure, shape, and data type as the arguments :code:`states, inputs` respectively; :code:`finished` is a bool tensor of shape :math:`[batch\_size]` .
+
+Return type: tuple
--- a/doc/paddle/api/paddle/fluid/layers/DecodeHelper_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/DecodeHelper_cn.rst
+.. _cn_api_fluid_layers_DecodeHelper:
+
+DecodeHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.DecodeHelper()
+
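+DecodeHelper is a base class whose subclass instances are used in :ref:`cn_api_fluid_layers_BasicDecoder` . It provides the interfaces for sampling and for producing the inputs of the next decoding step during dynamic decoding.
+
+.. py:method:: initialize()
+
+Initializes the helper to produce the inputs of the first decoding step and the initial flags indicating whether each sequence has finished. This is part of the initialization of :ref:`cn_api_fluid_layers_BasicDecoder` .
+
+Returns: A pair :code:`(initial_inputs, initial_finished)` . :code:`initial_inputs` is a single tensor variable or a nested structure of tensor variables, each tensor having shape :math:`[batch\_size, ...]` . :code:`initial_finished` is a bool tensor of shape :math:`[batch\_size]` .
+
+Return type: tuple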
+    
+.. py:method:: sample(time, outputs, states)
+
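+Samples from :code:`outputs` in a strategy-specific way. This method is part of :code:`BasicDecoder.step` .
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually of data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , holding the logits (unnormalized log-probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+
+Returns: A tensor of data type int64 and shape :math:`[batch\_size]` , holding the sampled ids.
+
+Return type: Variable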
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
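+Produces the inputs and states of the next decoding step, together with the flags indicating whether each sequence has finished. This method is part of :code:`BasicDecoder.step` .
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually of data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , holding the logits (unnormalized log-probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+  - **sample_ids** (Variable) - A tensor of data type int64 and shape :math:`[batch\_size]` . It is the same as the :code:`sample_ids` returned by :code:`sample()` .
+
+Returns: A triple :code:`(finished, next_inputs, next_states)` . :code:`next_inputs, next_states` are each a single tensor variable or a nested structure of tensor variables; :code:`next_states` has the same structure, shape, and data type as the argument :code:`states` ; :code:`finished` is a bool tensor of shape :math:`[batch\_size]` .
+
+Return type: tuple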
--- a/doc/paddle/api/paddle/fluid/layers/Decoder_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/Decoder_cn.rst
@@ -39,13 +39,28 @@ Decoder提供的主要抽象为：

 Return type: tuple

-.. py:method:: step(time, inputs, states)
+.. py:method:: step(time, inputs, states, **kwargs)

 The interface called at every decoding time step.

 Parameters:
-  - **outputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 结构和数据类型与 :code:`output_dtype` 相同。 tensor堆叠所有时间步长的输出从而具有shape :math:`[time\_step，batch\_size，...]` ，由调用者完成。 
-  - **final_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 它是 :code:`decoder.step` 在最后一个解码步返回的 :code:`next_states`， 因此具有与任何时间步长的状态相同的结构，形状和数据类型。
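+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **inputs** (Variable) - A single tensor variable or a nested structure of tensor variables. At the first decoding step it is the :code:`initial_inputs` returned by :code:`initialize()` ; at later steps it is the :code:`next_inputs` returned by :code:`step()` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. At the first decoding step it is the :code:`initial_states` returned by :code:`initialize()` ; at later steps it is the :code:`next_states` returned by :code:`step()` .
+  - **kwargs** - Additional keyword arguments supplied by the caller.
+
+Returns: A tuple :code:`(outputs, next_states, next_inputs, finished)` . :code:`next_states` and :code:`next_inputs` are each a single tensor variable or a nested structure of tensor variables, with the same structure, shape, and data type as the arguments :code:`states` and :code:`inputs` respectively. :code:`outputs` is a single tensor variable or a nested structure of tensor variables. :code:`finished` is a bool tensor variable.
+
+Return type: tuple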
+
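+.. py:method:: finalize(outputs, final_states, sequence_lengths)
+
+If implemented, it is executed once after the whole decoding iteration finishes.
+
+Parameters:
+  - **outputs** (Variable) - A single tensor variable or a nested structure of tensor variables. Each tensor has shape :math:`[time\_step, batch\_size, ...]` and is the result of stacking the corresponding outputs of all decoding steps; the stacking is done by the caller.
+  - **final_states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the :code:`next_states` returned by :code:`decoder.step` at the last decoding step, so it has the same structure, shape, and data type as the states at any time step.
+  - **kwargs** - Named keyword arguments supplied by the caller.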

 Returns: A tuple :code:`(final_outputs, final_states)` . :code:`final_outputs` and :code:`final_states` are each a single tensor variable or a nested structure of tensor variables.
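
How a caller might drive these interfaces can be sketched in plain Python. This is a hedged illustration, not the code of ``dynamic_decode``: ``run_decoder`` and ``CountdownDecoder`` are hypothetical names, tensors are replaced by lists, and the way ``sequence_lengths`` is counted here is an assumption.

```python
def run_decoder(decoder, inits, max_steps):
    """Initialize once, step until every sequence finishes, then finalize."""
    inputs, states, finished = decoder.initialize(inits)
    lengths = [0] * len(finished)
    step_outputs = []  # stacked by the caller: [time_step, batch_size, ...]
    time = 0
    while time < max_steps and not all(finished):
        outputs, states, inputs, step_finished = decoder.step(time, inputs, states)
        # A sequence that had already finished before this step does not grow.
        lengths = [n if done else n + 1 for n, done in zip(lengths, finished)]
        finished = [a or b for a, b in zip(finished, step_finished)]
        step_outputs.append(outputs)
        time += 1
    if hasattr(decoder, "finalize"):
        return decoder.finalize(step_outputs, states, lengths)
    return step_outputs, states


class CountdownDecoder:
    """Hypothetical decoder: each sequence emits its remaining count until zero."""

    def initialize(self, inits):
        return list(inits), list(inits), [n <= 0 for n in inits]

    def step(self, time, inputs, states, **kwargs):
        outputs = list(inputs)
        next_states = [max(n - 1, 0) for n in states]
        finished = [n == 0 for n in next_states]
        return outputs, next_states, list(next_states), finished

    def finalize(self, outputs, final_states, sequence_lengths):
        self.seen_lengths = sequence_lengths  # kept only for inspection
        return outputs, final_states


decoder = CountdownDecoder()
final_outputs, final_states = run_decoder(decoder, [2, 3], max_steps=10)
print(final_outputs)
# [[2, 3], [1, 2], [0, 1]]
```

The outer list index of ``final_outputs`` is the time step and the inner index is the batch entry, matching the stacked layout described for ``finalize``.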


--- a/doc/paddle/api/paddle/fluid/layers/GreedyEmbeddingHelper_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/GreedyEmbeddingHelper_cn.rst
+.. _cn_api_fluid_layers_GreedyEmbeddingHelper:
+
+GreedyEmbeddingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.GreedyEmbeddingHelper(embedding_fn, start_tokens, end_token)
+
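+GreedyEmbeddingHelper is a subclass of :ref:`cn_api_fluid_layers_DecodeHelper` . As a decoding helper, it samples with :code:`argmax` and feeds the sampled ids to the embedding layer to produce the inputs of the next decoding step.
+
+Parameters:
+  - **embedding_fn** (callable) - A function applied to the :code:`argmax` result, usually an embedding layer that converts word ids to word embeddings. **Note**: use :ref:`cn_api_fluid_embedding` rather than :ref:`cn_api_fluid_layers_embedding` here, because the selected ids have shape :math:`[batch\_size]` ; with the latter, an unsqueeze would also have to be applied here.
+  - **start_tokens** (Variable) - A tensor of shape :math:`[batch\_size]` and data type int64 whose values are the start token id.
+  - **end_token** (int) - The end token id.
+
+**Example code**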
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.GreedyEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+
+.. py:method:: initialize()
+
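+Initializes the GreedyEmbeddingHelper. It uses the :code:`start_tokens` given to the constructor as the inputs of the first decoding step and returns the initial flags indicating whether each sequence has finished. This is part of the initialization of :ref:`cn_api_fluid_layers_BasicDecoder` .
+
+Returns: A pair :code:`(initial_inputs, initial_finished)` . :code:`initial_inputs` is the :code:`start_tokens` given to the constructor; :code:`initial_finished` is a bool tensor filled with False, with the same shape as :code:`start_tokens` .
+
+Return type: tuple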
+    
+.. py:method:: sample(time, outputs, states)
+
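+Samples from :code:`outputs` with :code:`argmax` .
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually of data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , holding the logits (unnormalized log-probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+
+Returns: A tensor of data type int64 and shape :math:`[batch\_size]` , holding the sampled ids.
+
+Return type: Variable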
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
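+Applies :code:`embedding_fn` to :code:`sample_ids` to obtain the inputs of the next decoding step; uses the argument :code:`states` directly as the states of the next decoding step; and checks whether each entry of :code:`sample_ids` equals :code:`end_token` to produce the flags indicating whether each sequence has finished.
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually of data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , holding the logits (unnormalized log-probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+  - **sample_ids** (Variable) - A tensor of data type int64 and shape :math:`[batch\_size]` . It is the same as the :code:`sample_ids` returned by :code:`sample()` .
+
+Returns: A triple :code:`(finished, next_inputs, next_states)` . :code:`next_inputs, next_states` are each a single tensor variable or a nested structure of tensor variables, each tensor having shape :math:`[batch\_size, ...]` ; :code:`next_states` is the same as the argument :code:`states` ; :code:`finished` is a bool tensor of shape :math:`[batch\_size]` .
+
+Return type: tuple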
--- a/doc/paddle/api/paddle/fluid/layers/Normal_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/Normal_cn.rst
-.. _cn_api_fluid_initializer_Normal:
+.. _cn_api_fluid_layers_Normal:

 Normal
 -------------------------------

-.. py:attribute:: paddle.fluid.initializer.Normal
+.. py:class:: paddle.fluid.layers.Normal(loc, scale)
+
+
+
+
+Normal distribution
+
+Mathematical formula:
+
+.. math::
+
+    pdf(x; \mu, \sigma) = \frac{1}{Z}e^{\frac {-0.5 (x - \mu)^2}  {\sigma^2} }
+
+    Z = (2 \pi \sigma^2)^{0.5}
+
+In the formula above:
+
+- :math:`loc = \mu` : the mean.
+- :math:`scale = \sigma` : the standard deviation.
+- :math:`Z` : the normalization constant.
+
+Parameters:
+    - **loc** (float|list|numpy.ndarray|Variable) - Mean of the normal distribution. Data type: float32.
+    - **scale** (float|list|numpy.ndarray|Variable) - Standard deviation of the normal distribution. Data type: float32.
+
+**Code example**:
+
+.. code-block:: python
+
+    import numpy as np
+    from paddle.fluid import layers
+    from paddle.fluid.layers import Normal
+
+    # Define a normal distribution with float parameters.
+    dist = Normal(loc=0., scale=3.)
+    # Define a batch of two normal distributions.
+    # The first has mean 1 and standard deviation 11; the second has mean 2 and standard deviation 22.
+    dist = Normal(loc=[1., 2.], scale=[11., 22.])
+    # Draw 3 samples; returns a 3 x 2 tensor.
+    dist.sample([3])
+
+    # Define two normal distributions via broadcasting.
+    # Both have mean 1; the standard deviations differ.
+    dist = Normal(loc=1., scale=[11., 22.])
+
+    # A complete example
+    value_npdata = np.array([0.8], dtype="float32")
+    value_tensor = layers.create_tensor(dtype="float32")
+    layers.assign(value_npdata, value_tensor)
+
+    normal_a = Normal([0.], [1.])
+    normal_b = Normal([0.5], [2.])
+
+    sample = normal_a.sample([2])
+    # a tensor randomly drawn from the defined normal distribution, with shape [2, 1]
+    entropy = normal_a.entropy()
+    # [1.4189385] with shape: [1]
+    lp = normal_a.log_prob(value_tensor)
+    # [-1.2389386] with shape: [1]
+    kl = normal_a.kl_divergence(normal_b)
+    # [0.34939718] with shape: [1]
+
+
+.. py:function:: sample(shape, seed=0)
+
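+Generates samples of the specified shape.
+
+Parameters:
+    - **shape** (list) - A 1-D list specifying the shape of the generated samples. Data type: int32.
+    - **seed** (int) - Random seed, a long integer.
+
+Returns: A tensor with the requested dimensions; data type float32.
+
+Return type: Variable
+
+.. py:function:: entropy()
+
+Shannon entropy.
+
+Returns: The entropy of the normal distribution; data type float32.
+
+Return type: Variable
+
+.. py:function:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Variable) - Input tensor. Data type: float32 or float64.
+
+Returns: The log probability; same data type as value.
+
+Return type: Variable
+
+.. py:function:: kl_divergence(other)
+
+KL divergence between two normal distributions.
+
+Parameters:
+    - **other** (Normal) - An instance of Normal.
+
+Returns: The KL divergence between the two normal distributions; data type float32.
+
+Return type: Variable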
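
The numbers quoted in the code example for this class can be checked with the standard closed forms for a univariate normal distribution. The sketch below uses only the ``math`` module; the function names are hypothetical and the formulas are the textbook ones, not a transcription of Paddle's source.

```python
import math


def normal_log_prob(x, loc, scale):
    """log pdf(x) = -0.5 * (x - loc)^2 / scale^2 - log(Z), with Z = sqrt(2*pi*scale^2)."""
    z = math.sqrt(2.0 * math.pi * scale ** 2)
    return -0.5 * (x - loc) ** 2 / scale ** 2 - math.log(z)


def normal_entropy(scale):
    """Entropy of N(loc, scale^2): 0.5 + 0.5 * log(2*pi) + log(scale)."""
    return 0.5 + 0.5 * math.log(2.0 * math.pi) + math.log(scale)


def normal_kl(loc_a, scale_a, loc_b, scale_b):
    """KL(N_a || N_b) for two univariate normal distributions."""
    var_ratio = (scale_a / scale_b) ** 2
    shift = ((loc_a - loc_b) / scale_b) ** 2
    return 0.5 * (var_ratio + shift - 1.0 - math.log(var_ratio))


print(round(normal_entropy(1.0), 7))              # 1.4189385
print(round(normal_log_prob(0.8, 0.0, 1.0), 7))   # -1.2389385
print(round(normal_kl(0.0, 1.0, 0.5, 2.0), 7))    # 0.3493972
```

These agree with the float32 values shown in the example (1.4189385, -1.2389386, 0.34939718) up to rounding.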

-:alias_main: paddle.nn.initializer.Normal
-:alias: paddle.nn.initializer.Normal
-:old_api: paddle.fluid.initializer.Normal



-``NormalInitializer`` 的别名


--- a/doc/paddle/api/paddle/fluid/layers/RNNCell_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/RNNCell_cn.rst
@@ -21,11 +21,11 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
  - **states** - The states: a single tensor variable or a nested structure of tensor variables.
  - **kwargs** - Additional keyword arguments supplied by the caller.
         
-返回：输出和新状态。输出和新状态都可以是嵌套的tensor变量。新状态必须具有与状态相同的结构。
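+Returns: A pair :code:`(outputs, new_states)` holding the outputs and the new states. Both may be nested structures of tensor variables. The new states must have the same structure as the input states.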

 Return type: tuple

-.. py:method:: get_initial_states(batch_ref, shape=None, dtype=None, init_value=0)
+.. py:method:: get_initial_states(batch_ref, shape=None, dtype=None, init_value=0, batch_dim_idx=0)

 This interface initializes the states from the given shape, data type, and initial value.

@@ -34,6 +34,7 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
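  - **shape** - A single shape or a nested structure of shapes; a single shape is a list or tuple of integers. If the first dimension of a shape is not the batch size, -1 is inserted automatically as the batch size. If None, the attribute :code:`state_shape` is used. Default: None.
  - **dtype** - A single data type or a nested structure of data types. The structure must match that of shape, except that a single data type may be used when all tensors in the states share the same data type. If None and the attribute :code:`cell.state_shape` is unavailable, float32 is used. Default: None.
  - **init_value** - The floating-point value used to initialize the states.
+  - **batch_dim_idx** - An int indicating which dimension of :code:`batch_ref` is the batch dimension. Default: 0.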

 Returns: Tensor variables with the same structure as shape, representing the initial states.
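
The role of ``batch_dim_idx`` and of the inserted -1 can be sketched for a single (non-nested) shape. This is an illustration under stated assumptions: the helper names are hypothetical, nested lists stand in for tensors, and treating a leading -1 as the marker for "the first dimension is already the batch size" is an assumption of this sketch.

```python
def resolve_state_shape(batch_ref_shape, shape, batch_dim_idx=0):
    """Return the concrete state shape, with the batch size read from batch_ref."""
    dims = list(shape)
    if not dims or dims[0] != -1:
        dims = [-1] + dims  # insert -1 as the batch-size placeholder
    dims[0] = batch_ref_shape[batch_dim_idx]
    return dims


def full(dims, value):
    """Build a nested list of the given shape filled with value."""
    if not dims:
        return value
    return [full(dims[1:], value) for _ in range(dims[0])]


def get_initial_state(batch_ref_shape, shape, init_value=0.0, batch_dim_idx=0):
    return full(resolve_state_shape(batch_ref_shape, shape, batch_dim_idx), init_value)


# batch_ref laid out as [batch, time, feature] versus [time, batch, feature]
print(resolve_state_shape([4, 7, 16], [128]))                    # [4, 128]
print(resolve_state_shape([7, 4, 16], [128], batch_dim_idx=1))   # [4, 128]
```

With a time-major ``batch_ref`` the batch size sits in dimension 1, which is the case ``batch_dim_idx`` exists to cover.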

@@ -41,9 +42,9 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态

 .. py:method:: state_shape()

-该接口用于初始化cell的状态。 单个形状或由形状组成的嵌套结构，单个形状可以是整数的列表或元组(如果形状的第一维不是batch大小，则自动插入-1作为batch大小)。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`shape` 参数的时候，不用实现该方法。
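+An abstract method (property) used to initialize the cell states. It gives a single shape or a nested structure of shapes; a single shape may be a list or tuple of integers (if the first dimension of a shape is not the batch size, -1 is inserted automatically as the batch size). It need not be implemented when the states are not initialized with :code:`get_initial_states` , or when the :code:`shape` argument is passed to :code:`get_initial_states` .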


 .. py:method:: state_dtype()

-该接口用于初始化cell的状态。 单个数据类型或由数据类型组成的嵌套结构，该结构必须与 :code:`shape` 的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`dtype` 参数的时候，不用实现该方法。
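+An abstract method (property) used to initialize the cell states. It gives a single data type or a nested structure of data types. The structure must match that of :code:`shape` , except that a single data type may be used when all tensors in the states share the same data type. It need not be implemented when the states are not initialized with :code:`get_initial_states` , or when the :code:`dtype` argument is passed to :code:`get_initial_states` .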
--- a/doc/paddle/api/paddle/fluid/layers/SampleEmbeddingHelper_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/SampleEmbeddingHelper_cn.rst
+.. _cn_api_fluid_layers_SampleEmbeddingHelper:
+
+SampleEmbeddingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.SampleEmbeddingHelper(embedding_fn, start_tokens, end_token, softmax_temperature=None, seed=None)
+
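+SampleEmbeddingHelper is a subclass of :ref:`cn_api_fluid_layers_GreedyEmbeddingHelper` . As a decoding helper, it draws ids by random sampling instead of :code:`argmax` and feeds the sampled ids to the embedding layer to produce the inputs of the next decoding step.
+
+Parameters:
+  - **embedding_fn** (callable) - A function applied to the sampled ids, usually an embedding layer that converts word ids to word embeddings. **Note**: use :ref:`cn_api_fluid_embedding` rather than :ref:`cn_api_fluid_layers_embedding` here, because the selected ids have shape :math:`[batch\_size]` ; with the latter, an unsqueeze would also have to be applied here.
+  - **start_tokens** (Variable) - A tensor of shape :math:`[batch\_size]` and data type int64 whose values are the start token id.
+  - **end_token** (int) - The end token id.
+  - **softmax_temperature** (float, optional) - The logits are divided by this value before softmax is computed. A higher temperature (above 1.0) gives more randomness; a lower temperature approaches argmax. It must be greater than 0. Default: None, which is equivalent to 1.0.
+  - **seed** (int, optional) - Random seed used for sampling. Default: None, meaning no fixed random seed is used.
+
+**Example code**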
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+    
+.. py:method:: sample(time, outputs, states)
+
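+Samples from a multinomial distribution given by :code:`softmax(outputs/softmax_temperature)` .
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually of data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , holding the logits (unnormalized log-probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+
+Returns: A tensor of data type int64 and shape :math:`[batch\_size]` , holding the sampled ids.
+
+Return type: Variable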
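
Temperature-scaled sampling can be sketched with the standard library alone. This is a minimal illustration, not Paddle's kernel: the function names are hypothetical, and ``random.Random`` stands in for whatever random number generator the framework uses.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """softmax(logits / temperature), computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_ids(batch_logits, temperature=None, seed=None):
    """Draw one id per batch row from the temperature-scaled distribution."""
    rng = random.Random(seed)
    t = 1.0 if temperature is None else temperature  # None behaves like 1.0
    ids = []
    for logits in batch_logits:
        probs = softmax_with_temperature(logits, t)
        ids.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return ids


logits = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
sharp = softmax_with_temperature(logits[0], 0.5)
flat = softmax_with_temperature(logits[0], 2.0)
print(max(sharp) > max(flat))  # True: a lower temperature is closer to argmax
print(sample_ids(logits, seed=0) == sample_ids(logits, seed=0))  # True
```

Passing the same seed reproduces the same draws, which is what the optional ``seed`` argument is for.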
--- a/doc/paddle/api/paddle/fluid/layers/TrainingHelper_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/TrainingHelper_cn.rst
+.. _cn_api_fluid_layers_TrainingHelper:
+
+TrainingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.TrainingHelper(inputs, sequence_length, time_major=False)
+
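+TrainingHelper is a subclass of :ref:`cn_api_fluid_layers_DecodeHelper` . As a decoding helper, at each decoding time step it slices the full-sequence input :code:`inputs` at the corresponding position to obtain that step's input, and samples with :code:`argmax` from the output of :code:`cell.call()` .
+Because it requires the full-sequence input :code:`inputs` , TrainingHelper is mainly used for maximum-likelihood training with teacher forcing; the sampled ids are usually not used.
+
+Parameters:
+  - **inputs** (Variable) - A single tensor variable or a nested structure of tensor variables. When :code:`time_major == False` , each tensor should have shape :math:`[batch\_size, sequence\_length, ...]` ; when :code:`time_major == True` , each tensor should have shape :math:`[sequence\_length, batch\_size, ...]` . The data for each decoding step is sliced from it.
+  - **sequence_length** (Variable) - A tensor of shape :math:`[batch\_size]` . It stores the actual length of each sample in :code:`inputs` and is used to decide, at each decoding step, whether each sample has finished.
+  - **time_major** (bool, optional) - Indicates the data layout of the tensors in the input and output. If False, the layout is batch-major with shape :math:`[batch\_size, sequence\_length, ...]` . If True, the layout is time-major with shape :math:`[sequence\_length, batch\_size, ...]` . Default: False.
+
+**Example code**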
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            trg_emb = fluid.data(name="trg_emb",
+                                 shape=[None, None, 128],
+                                 dtype="float32")
+            trg_seq_length = fluid.data(name="trg_seq_length",
+                                        shape=[None],
+                                        dtype="int64")
+            helper = layers.TrainingHelper(trg_emb, trg_seq_length)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper)
+            outputs = layers.dynamic_decode(
+                decoder,
+                inits=decoder_cell.get_initial_states(trg_emb),
+                is_test=False)
+
+.. py:method:: initialize()
+
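+Initializes the TrainingHelper. It slices the full-sequence input :code:`inputs` at the first time step to obtain the inputs of the first decoding step and returns the initial flags indicating whether each sequence has finished. This is part of the initialization of :ref:`cn_api_fluid_layers_BasicDecoder` .
+
+Returns: A pair :code:`(initial_inputs, initial_finished)` . :code:`initial_inputs` is a single tensor variable or a nested structure of tensor variables, each tensor having shape :math:`[batch\_size, ...]` . :code:`initial_finished` is a bool tensor of shape :math:`[batch\_size]` .
+
+Return type: tuple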
+    
+.. py:method:: sample(time, outputs, states)
+
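+Samples from :code:`outputs` with :code:`argmax` . Because slices of the full sequence are used as the inputs of the next decoding step, the sampled ids are usually not used.
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] supplied by the caller, giving the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually of data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , holding the logits (unnormalized log-probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+
+Returns: A tensor of data type int64 and shape :math:`[batch\_size]` , holding the sampled ids.
+
+Return type: Variable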
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
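+Slices the full-sequence input at the current time step to produce the input of the next decoding step, uses the :code:`states` argument directly as the state of the next decoding step, and compares the current time with the length of each sequence to produce the flags indicating whether each sequence has finished.
+
+Parameters:
+  - **time** (Variable) - A tensor of shape [1] provided by the caller, representing the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - A tensor variable, usually with data type float32 or float64 and shape :math:`[batch\_size, vocabulary\_size]` , representing the logits (unnormalized probabilities) predicted at the current decoding step. It is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` .
+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. It is the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()` .
+  - **sample_ids** (Variable) - An int64 tensor with shape :math:`[batch\_size]` . It is the same as the :code:`sample_ids` returned by :code:`sample()` .
+
+Returns: A tuple :code:`(finished, next_inputs, next_states)` . :code:`next_inputs` and :code:`next_states` are each a single tensor variable or a nested structure of tensor variables with shape :math:`[batch\_size, ...]` ; :code:`next_states` is identical to the :code:`states` argument. :code:`finished` is a bool tensor with shape :math:`[batch\_size]` .
+
+Return type: tuple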
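
A minimal sketch of the teacher-forcing behaviour that ``initialize``, ``sample`` and ``next_inputs`` describe, written in plain Python. It is an illustration under assumptions, not Paddle's implementation: batches are nested lists instead of tensors, and the names ``th_initialize``, ``th_sample``, ``th_next_inputs``, the ``sequence_length`` argument, and the choice to slice at ``time + 1`` are all hypothetical.

```python
def th_initialize(inputs, sequence_length):
    """Slice the first time step of every sequence and build the finished flags."""
    initial_inputs = [seq[0] for seq in inputs]
    initial_finished = [length == 0 for length in sequence_length]
    return initial_inputs, initial_finished


def th_sample(outputs):
    """Argmax over the vocabulary axis of [batch_size, vocabulary_size] logits."""
    return [max(range(len(logits)), key=logits.__getitem__) for logits in outputs]


def th_next_inputs(time, states, inputs, sequence_length):
    """Return (finished, next_inputs, next_states) for the following decoding step."""
    next_time = time + 1  # assumption: the next step's slice is taken at time + 1
    finished = [next_time >= length for length in sequence_length]
    # Clamp the index so that shorter (already finished) sequences keep a valid input.
    next_inp = [seq[min(next_time, len(seq) - 1)] for seq in inputs]
    return finished, next_inp, states  # states are passed through unchanged


inputs = [[10, 11, 12], [20, 21, 22]]  # [batch_size=2, max_len=3]
sequence_length = [3, 2]               # the second sequence is padded
print(th_initialize(inputs, sequence_length))
print(th_sample([[0.1, 0.9], [2.0, -1.0]]))
print(th_next_inputs(1, "state", inputs, sequence_length))
```

The sampled ids never feed back into the next step here, which is why the documentation says they are usually unused.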
--- a/doc/paddle/api/paddle/fluid/layers/Uniform_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/Uniform_cn.rst
-.. _cn_api_fluid_initializer_Uniform:
+.. _cn_api_fluid_layers_Uniform:

 Uniform
 -------------------------------

-.. py:attribute:: paddle.fluid.initializer.Uniform
+.. py:class:: paddle.fluid.layers.Uniform(low, high)
+
+
+
+
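+Uniform distribution
+
+The probability density function (pdf) is:
+
+.. math::
+
+    pdf(x; a, b) = \frac{1}{Z}, \quad a \leq x < b
+
+    Z = b - a
+
+In the formulas above:
+
+:math:`low = a` .
+:math:`high = b` .
+:math:`Z` : the normalization constant.
+
+The shapes of the parameters low and high must be broadcastable.
+
+Parameters:
+    - **low** (float|list|numpy.ndarray|Variable) - The lower bound of the uniform distribution. The data type is float32.
+    - **high** (float|list|numpy.ndarray|Variable) - The upper bound of the uniform distribution. The data type is float32.
+
+**Code example**: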
+
+.. code-block:: python
+
+    import numpy as np
+    from paddle.fluid import layers
+    from paddle.fluid.layers import Uniform
+
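+    # Define a uniform distribution with float parameters
+    u1 = Uniform(low=3.0, high=4.0)
+    # Define uniform distributions with list parameters
+    u2 = Uniform(low=[1.0, 2.0],
+                 high=[3.0, 4.0])
+    # Define uniform distributions with nested-list (2-D) parameters
+    u3 = Uniform(low=[[1.0, 2.0],
+                      [3.0, 4.0]],
+                 high=[[1.5, 2.5],
+                       [3.5, 4.5]])
+
+    # Define uniform distributions via broadcasting
+    u4 = Uniform(low=3.0, high=[5.0, 6.0, 7.0])
+
+    # A complete example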
+    value_npdata = np.array([0.8], dtype="float32")
+    value_tensor = layers.create_tensor(dtype="float32")
+    layers.assign(value_npdata, value_tensor)
+
+    uniform = Uniform([0.], [2.])
+
+    sample = uniform.sample([2])
+    # a tensor randomly sampled from the uniform distribution defined above, with shape [2, 1]
+    entropy = uniform.entropy()
+    # [0.6931472] with shape: [1]
+    lp = uniform.log_prob(value_tensor)
+    # [-0.6931472] with shape: [1]
+
+
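+.. py:method:: sample(shape, seed=0)
+
+Generates samples with the specified shape.
+
+Parameters:
+    - **shape** (list) - A 1-D list specifying the shape of the generated samples. The data type is int32.
+    - **seed** (int) - A long integer used as the random seed.
+
+Returns: A tensor with the specified shape. The data type is float32.
+
+Return type: Variable
+
+.. py:method:: entropy()
+
+Information entropy.
+
+Returns: The information entropy of the uniform distribution. The data type is float32.
+
+Return type: Variable
+
+.. py:method:: log_prob(value)
+
+Log probability density function.
+
+Parameters:
+    - **value** (Variable) - The input tensor. The data type is float32 or float64.
+
+Returns: The log probability, with the same data type as value.
+
+Return type: Variable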

-:alias_main: paddle.nn.initializer.Uniform
-:alias: paddle.nn.initializer.Uniform
-:old_api: paddle.fluid.initializer.Uniform



-Alias of ``UniformInitializer``
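
The density, entropy and log-probability of the new ``Uniform`` class follow directly from the pdf formula, and a scalar sketch in plain Python reproduces the values quoted in the code example (entropy and log-probability for ``low=0``, ``high=2``). The function names are hypothetical, and returning ``-inf`` outside the support is an assumption based on the interval in the formula, not a statement about Paddle's behaviour.

```python
import math


def uniform_pdf(x, low, high):
    """Density 1 / (high - low) on [low, high), zero elsewhere."""
    return 1.0 / (high - low) if low <= x < high else 0.0


def uniform_entropy(low, high):
    """Differential entropy of U(low, high), which is log(high - low)."""
    return math.log(high - low)


def uniform_log_prob(x, low, high):
    """Log density; -inf outside the support (assumed boundary handling)."""
    p = uniform_pdf(x, low, high)
    return math.log(p) if p > 0.0 else float("-inf")


print(round(uniform_entropy(0.0, 2.0), 7))        # 0.6931472
print(round(uniform_log_prob(0.8, 0.0, 2.0), 7))  # -0.6931472
```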



--- a/doc/paddle/api/paddle/fluid/layers/abs_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/abs_cn.rst
@@ -11,23 +11,29 @@ abs



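-Absolute value activation function.
+Absolute value function.

 .. math::
    out = |x|

 Parameters:
-    - **x** (Variable) - A multi-dimensional Tensor with data type float32 or float64.
-    - **name** (str) - Used by developers when printing debugging information. For details, see :ref:`api_guide_Name` . Default: None.
+    - x (Tensor) - The input Tensor. Data type: float32, float64.
+    - name (str, optional) - The name of the operation (optional, default None). For more information, see :ref:`api_guide_Name` .

-Returns: A Tensor holding the absolute values, with the same data type as x.
+Returns: The output Tensor, with the same shape and data type as ``x`` .

-Return type: Variable
+Return type: Tensor

 **Code example**: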

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.abs(data)
+        import paddle
+        import numpy as np
+
+        paddle.disable_static()
+        x_data = np.array([-1, -2, -3, -4]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.abs(x)
+        print(res.numpy())
+        # [1. 2. 3. 4.]
--- a/doc/paddle/api/paddle/fluid/layers/acos_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/acos_cn.rst
@@ -11,29 +11,30 @@ acos



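-arccosine activation function.
+arccosine function.

 .. math::
    out = cos^{-1}(x)

 Parameters:
-    - **x(Variable)** - The input Tensor of acos, with data type float32 or float64.
-    - **name** (str|None) - For details, see :ref:`cn_api_guide_Name` . Usually there is no need to set it. Default: None.
-Returns: The output Tensor of `acos` , with the same data type as `x` .
+    - x (Tensor) - The input Tensor. Data type: float32, float64.
+    - name (str, optional) - The name of the operation (optional, default None). For more information, see :ref:`api_guide_Name` .

-Return type: Variable
+Returns: The output Tensor, with the same shape and data type as ``x`` .

+Return type: Tensor


 **Code example**: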

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[4])
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.acos(data)
-        # result is [2.5293, 1.0573, 2.2711, 1.5336]
-
-
+        import paddle
+        import numpy as np

+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.acos(x)
+        print(res.numpy())
+        # [2.5293, 1.0573, 2.2711, 1.5336]
--- a/doc/paddle/api/paddle/fluid/layers/asin_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/asin_cn.rst
@@ -11,29 +11,29 @@ asin



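-arcsine activation function.
+arcsine function.

 .. math::
    out = sin^{-1}(x)

-
 Parameters:
-    - **x(Variable)** - The input Tensor of asin, with data type float32 or float64.
-    - **name** (str|None) - For details, see :ref:`cn_api_guide_Name` . Usually there is no need to set it. Default: None.
+    - x (Tensor) - The input Tensor. Data type: float32, float64, float16.
+    - name (str, optional) - The name of the operation (optional, default None). For more information, see :ref:`api_guide_Name` .

-Returns: The output Tensor of `asin` , with the same data type as `x` .
+Returns: The output Tensor, with the same shape and data type as ``x`` .

-Return type: Variable
+Return type: Tensor

 **Code example**: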

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[4])
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.asin(data)
-        # result is [-0.9585,  0.5135, -0.7003,  0.0372]
-
-
+        import paddle
+        import numpy as np

+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.asin(x)
+        print(res.numpy())
+        # [-0.9585,  0.5135, -0.7003,  0.0372]
--- a/doc/paddle/api/paddle/fluid/layers/atan_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/atan_cn.rst
@@ -11,30 +11,29 @@ atan



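-arctanh activation function.
+arctangent function.

 .. math::
-    out = tanh^{-1}(x)
+    out = tan^{-1}(x)

 Parameters:
-    - **x(Variable)** - The input Tensor of atan, with data type float32 or float64.
-    - **name** (str|None) - For details, see :ref:`cn_api_guide_Name` . Usually there is no need to set it. Default: None.
+    - x (Tensor) - The input Tensor. Data type: float32, float64, float16.
+    - name (str, optional) - The name of the operation (optional, default None). For more information, see :ref:`api_guide_Name` .

-Returns: The output Tensor of `atan` , with the same data type as `x` .
+Returns: The output Tensor, with the same shape and data type as ``x`` .

-Return type: Variable
+Return type: Tensor

 **Code example**: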

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[4])
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.atan(data)
-        # result is [-0.6858,  0.4566, -0.5724,  0.0371]
-
-
-
-
+        import paddle
+        import numpy as np

+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.atan(x)
+        print(res.numpy())
+        # [-0.6858,  0.4566, -0.5724,  0.0371]
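
The acos, asin and atan examples above share the same four inputs, so the documented outputs can be cross-checked without Paddle. The sketch below uses only Python's ``math`` module; the tolerance of ``2e-4`` is an assumption chosen to absorb the four-digit rounding of the documented values, and ``max_abs_error`` is a hypothetical helper.

```python
import math

xs = [-0.8183, 0.4912, -0.6444, 0.0371]
documented = {
    "acos": [2.5293, 1.0573, 2.2711, 1.5336],
    "asin": [-0.9585, 0.5135, -0.7003, 0.0372],
    "atan": [-0.6858, 0.4566, -0.5724, 0.0371],
}
funcs = {"acos": math.acos, "asin": math.asin, "atan": math.atan}


def max_abs_error(name):
    """Largest gap between the math-module result and the documented value."""
    return max(abs(funcs[name](x) - y) for x, y in zip(xs, documented[name]))


for name in funcs:
    # Each documented vector should agree with math.* up to rounding.
    print(name, max_abs_error(name) < 2e-4)
```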
--- a/doc/paddle/api/paddle/fluid/layers/ceil_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/ceil_cn.rst
@@ -19,24 +19,24 @@ ceil


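 Parameters:
-    - **x** (Variable) - The input of this OP is a multi-dimensional Tensor with data type float32 or float64.
-    - **name** (str, optional) - For details, see :ref:`api_guide_Name` . Usually there is no need to set it. Default: None.
+    - x (Tensor) - The input Tensor. Data type: float32, float64, float16.
+    - name (str, optional) - The name of the operation (optional, default None). For more information, see :ref:`api_guide_Name` .

-Returns: The output is a Tensor with the same shape and data type as ``x`` .
+Returns: The output Tensor, with the same shape and data type as ``x`` .

-Return type: Variable
+Return type: Tensor

 **Code example**: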

 .. code-block:: python

-  import paddle.fluid as fluid
-  import numpy as np
+        import paddle
+        import numpy as np

-  input_ceil = np.array([[-1.5,6],[1,15.6]])
-  with fluid.dygraph.guard():
-      x = fluid.dygraph.to_variable(input_ceil)
-      y = fluid.layers.ceil(x)
-      print(y.numpy())
-      # [[-1.  6.]
-      # [ 1. 16.]]
+        paddle.disable_static()
+        x_data = np.array([[-1.5,6],[1,15.6]]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.ceil(x)
+        print(res.numpy())
+        # [[-1.  6.]
+        # [ 1. 16.]]
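
For comparison, the same element-wise rounding toward positive infinity can be reproduced with the standard library. This is a plain-Python sketch on nested lists (``ceil_2d`` is a hypothetical helper), and unlike the Paddle example it yields Python integers rather than a float32 tensor.

```python
import math


def ceil_2d(rows):
    """Apply math.ceil to every element of a list of lists."""
    return [[math.ceil(v) for v in row] for row in rows]


print(ceil_2d([[-1.5, 6], [1, 15.6]]))  # [[-1, 6], [1, 16]]
```

Note that ``-1.5`` maps to ``-1`` rather than ``-2``, because ceil rounds up, not away from zero.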
--- a/doc/paddle/api/paddle/fluid/layers/concat_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/concat_cn.rst
@@ -15,13 +15,6 @@ concat

 Returns: The concatenated ``Tensor`` , with the same data type as the Tensors in ``input`` .

-
-Raises:
-    - ``TypeError``: - if the type of ``input`` is not list, tuple or Tensor.
-    - ``TypeError``: - if the data type of ``input`` is not bool, float16, float32, float64, int32 or int64.
-    - ``TypeError``: - if the type of ``axis`` is not int or Tensor, or if ``axis`` is a Tensor whose data type is not int32 or int64.
-    - ``TypeError``: - if the Tensors in ``input`` have inconsistent data types.
-
 **Code example**:

 .. code-block:: python

--- a/doc/paddle/api/paddle/fluid/layers/cos_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/cos_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/crf_decoding_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/crf_decoding_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/cumsum_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/cumsum_cn.rst
@@ -5,11 +5,6 @@ cumsum

 .. py:function:: paddle.fluid.layers.cumsum(x,axis=None,exclusive=None,reverse=None)

-:alias_main: paddle.cumsum
-:alias: paddle.cumsum,paddle.tensor.cumsum,paddle.tensor.math.cumsum
-:old_api: paddle.fluid.layers.cumsum
-
-

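 The cumulative sum of the elements along the given axis. By default, the first element of the result is the same as the first element of the input. If exclusive is True, the first element of the result is 0.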
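
A one-dimensional sketch of the inclusive and exclusive variants in plain Python. ``cumsum_1d`` is a hypothetical helper, and the handling of ``reverse`` (accumulating from the last element backwards) is an assumption, since the text above only specifies ``exclusive``.

```python
from itertools import accumulate


def cumsum_1d(x, exclusive=False, reverse=False):
    """Cumulative sum of a flat list, optionally exclusive and/or reversed."""
    seq = list(reversed(x)) if reverse else list(x)
    if exclusive and seq:
        # Shift right by one so that each position sums only the elements before it.
        seq = [0] + seq[:-1]
    out = list(accumulate(seq))
    return list(reversed(out)) if reverse else out


print(cumsum_1d([1, 2, 3]))                  # [1, 3, 6]
print(cumsum_1d([1, 2, 3], exclusive=True))  # [0, 1, 3]
```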


--- a/doc/paddle/api/paddle/fluid/layers/data_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/data_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/dynamic_decode_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/dynamic_decode_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/dynamic_gru_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/dynamic_gru_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/dynamic_lstm_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/dynamic_lstm_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/embedding_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/embedding_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/eye_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/eye_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/fc_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/fc_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/fill_constant_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/fill_constant_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/flatten_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/flatten_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/gather_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/gather_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/gather_nd_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/gather_nd_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/get_tensor_from_selected_rows_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/get_tensor_from_selected_rows_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/grid_sampler_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/grid_sampler_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/group_norm_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/group_norm_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/hash_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/hash_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/linspace_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/linspace_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/load_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/load_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/logical_and_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_and_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/logical_not_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_not_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/logical_or_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_or_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/logical_xor_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/logical_xor_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/lstm_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/lstm_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/one_hot_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/one_hot_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/ones_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/ones_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/pad2d_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/pad2d_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/pow_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/pow_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/rank_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/rank_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/rank_loss_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/rank_loss_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/reduce_prod_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/reduce_prod_cn.rst
@@ -15,7 +15,7 @@ reduce_prod

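 Parameters:
          - **input** (Variable) - The input variable, a multi-dimensional Tensor or LoDTensor. Supported data types: float32, float64, int32, int64.
-          - **dim** (list | int, optional) - The dimensions along which the product is computed. If None, the product of all elements is computed and a Tensor variable with a single element is returned; otherwise each value must be in the range :math:`[-rank(input), rank(input))` . If :math:`dim[i] < 0` , the dimension to reduce is :math:`rank + dim[i]` . Default: None.
+          - **dim** (int|list|tuple, optional) - The dimensions along which the product is computed. If None, the product of all elements is computed and a Tensor variable with a single element is returned; otherwise each value must be in the range :math:`[-rank(input), rank(input))` . If :math:`dim[i] < 0` , the dimension to reduce is :math:`rank + dim[i]` . Default: None.
          - **keep_dim** (bool) - Whether to keep the reduced dimensions in the output Tensor. Unless keep_dim is True, the result tensor has fewer dimensions than the input. Default: False.
          - **name** (str, optional) - For details, see :ref:`api_guide_Name` . Usually there is no need to set it. Default: None.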
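
The rule for negative ``dim`` values can be made concrete with a small normalization routine. This is a plain-Python sketch under assumptions: ``normalize_dims`` is a hypothetical helper, not part of Paddle, and it treats the valid range as half-open.

```python
def normalize_dims(dim, rank):
    """Map int/list/tuple/None dims to a list of non-negative axis indices."""
    if dim is None:
        return list(range(rank))  # reduce over every axis
    dims = [dim] if isinstance(dim, int) else list(dim)
    normalized = []
    for d in dims:
        if not -rank <= d < rank:
            raise ValueError(f"dim {d} is out of range for rank {rank}")
        normalized.append(d + rank if d < 0 else d)  # rank + dim[i] for negatives
    return normalized


print(normalize_dims(-1, 2))       # [1]
print(normalize_dims((0, -1), 3))  # [0, 2]
```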


--- a/doc/paddle/api/paddle/fluid/layers/reshape_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/reshape_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/sequence_mask_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/sequence_mask_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/sigmoid_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/sigmoid_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/sign_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/sign_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/slice_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/slice_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/softmax_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/softmax_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/split_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/split_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/unbind_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/unbind_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/unique_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/unique_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/unstack_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/unstack_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/zeros_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/zeros_cn.rst
--- a/doc/paddle/api/paddle/fluid/memory_optimize_cn.rst
+++ b/doc/paddle/api/paddle/fluid/memory_optimize_cn.rst
--- a/doc/paddle/api/paddle/fluid/metrics/Accuracy_cn.rst
+++ b/doc/paddle/api/paddle/fluid/metrics/Accuracy_cn.rst
--- a/doc/paddle/api/paddle/fluid/metrics/Auc_cn.rst
+++ b/doc/paddle/api/paddle/fluid/metrics/Auc_cn.rst
--- a/doc/paddle/api/paddle/metric/ChunkEvaluator_cn.rst
+++ b/doc/paddle/api/paddle/metric/ChunkEvaluator_cn.rst
--- a/doc/paddle/api/paddle/metric/CompositeMetric_cn.rst
+++ b/doc/paddle/api/paddle/metric/CompositeMetric_cn.rst
--- a/doc/paddle/api/paddle/metric/DetectionMAP_cn.rst
+++ b/doc/paddle/api/paddle/metric/DetectionMAP_cn.rst
--- a/doc/paddle/api/paddle/metric/EditDistance_cn.rst
+++ b/doc/paddle/api/paddle/metric/EditDistance_cn.rst
--- a/doc/paddle/api/paddle/fluid/metrics/Precision_cn.rst
+++ b/doc/paddle/api/paddle/fluid/metrics/Precision_cn.rst
--- a/doc/paddle/api/paddle/fluid/metrics/Recall_cn.rst
+++ b/doc/paddle/api/paddle/fluid/metrics/Recall_cn.rst
--- a/doc/paddle/api/paddle/fluid/one_hot_cn.rst
+++ b/doc/paddle/api/paddle/fluid/one_hot_cn.rst
--- a/doc/paddle/api/paddle/fluid/optimizer/AdamOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/AdamOptimizer_cn.rst
--- a/doc/paddle/api/paddle/fluid/optimizer/Adam_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/Adam_cn.rst
--- a/doc/paddle/api/paddle/fluid/optimizer/AdamaxOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/AdamaxOptimizer_cn.rst
--- a/doc/paddle/api/paddle/fluid/optimizer/Adamax_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/Adamax_cn.rst
--- a/doc/paddle/api/paddle/optimizer/SGDOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/SGDOptimizer_cn.rst
--- a/doc/paddle/api/paddle/fluid/profiler_cn.rst
+++ b/doc/paddle/api/paddle/fluid/profiler_cn.rst
--- a/doc/paddle/api/paddle/fluid/regularizer/L1Decay_cn.rst
+++ b/doc/paddle/api/paddle/fluid/regularizer/L1Decay_cn.rst
--- a/doc/paddle/api/paddle/fluid/regularizer/L2Decay_cn.rst
+++ b/doc/paddle/api/paddle/fluid/regularizer/L2Decay_cn.rst
--- a/doc/paddle/api/paddle/fluid/release_memory_cn.rst
+++ b/doc/paddle/api/paddle/fluid/release_memory_cn.rst
--- a/doc/paddle/api/paddle/fluid/save_cn.rst
+++ b/doc/paddle/api/paddle/fluid/save_cn.rst
--- a/doc/paddle/api/paddle/fluid/unique_name/guard_cn.rst
+++ b/doc/paddle/api/paddle/fluid/unique_name/guard_cn.rst
--- a/doc/paddle/api/paddle/framework/BackwardStrategy_cn.rst
+++ b/doc/paddle/api/paddle/framework/BackwardStrategy_cn.rst
--- a/doc/paddle/api/paddle/jit/SaveLoadConfig_cn.rst
+++ b/doc/paddle/api/paddle/jit/SaveLoadConfig_cn.rst
--- a/doc/paddle/api/paddle/hapi/Model_cn.rst
+++ b/doc/paddle/api/paddle/hapi/Model_cn.rst
--- a/doc/paddle/api/paddle/io/BatchSampler_cn.rst
+++ b/doc/paddle/api/paddle/io/BatchSampler_cn.rst
--- a/doc/paddle/api/paddle/io/Dataset_cn.rst
+++ b/doc/paddle/api/paddle/io/Dataset_cn.rst
--- a/doc/paddle/api/paddle/metric/accuracy_cn.rst
+++ b/doc/paddle/api/paddle/metric/accuracy_cn.rst
--- a/doc/paddle/api/paddle/metric/auc_cn.rst
+++ b/doc/paddle/api/paddle/metric/auc_cn.rst
--- a/doc/paddle/api/paddle/nn/GRUCell_cn.rst
+++ b/doc/paddle/api/paddle/nn/GRUCell_cn.rst
--- a/doc/paddle/api/paddle/nn/LSTMCell_cn.rst
+++ b/doc/paddle/api/paddle/nn/LSTMCell_cn.rst
--- a/doc/paddle/api/paddle/nn/Linear_cn.rst
+++ b/doc/paddle/api/paddle/nn/Linear_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/elu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/elu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/gelu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/gelu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/hardshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/hardshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/hardtanh_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/hardtanh_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/leaky_relu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/leaky_relu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/log_softmax_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/log_softmax_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/logsigmoid_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/logsigmoid_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/prelu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/prelu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/relu6_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/relu6_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/relu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/relu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/selu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/selu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/softmax_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/softmax_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/softplus_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/softplus_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/softshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/softshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/softsign_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/softsign_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/activation/tanhshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/activation/tanhshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/alpha_dropout_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/alpha_dropout_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/bilinear_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/bilinear_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/cosine_similarity_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/cosine_similarity_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/dropout2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/dropout2d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/dropout3d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/dropout3d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/dropout_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/dropout_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/common/pad_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/common/pad_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/dropout_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/dropout_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/elu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/elu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/gelu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/gelu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/input/embedding_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/input/embedding_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/input/one_hot_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/input/one_hot_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/leaky_relu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/leaky_relu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/logsigmoid_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/logsigmoid_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/binary_cross_entropy_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/binary_cross_entropy_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/binary_cross_entropy_with_logits_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/binary_cross_entropy_with_logits_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/cross_entropy_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/cross_entropy_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/ctc_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/ctc_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/l1_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/l1_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/margin_ranking_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/margin_ranking_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/mse_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/mse_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/nll_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/nll_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/loss/smooth_l1_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/loss/smooth_l1_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/mse_loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/mse_loss_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/norm/batch_norm_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/norm/batch_norm_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/norm/instance_norm_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/norm/instance_norm_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/norm/layer_norm_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/norm/layer_norm_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/norm/normalize_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/norm/normalize_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/one_hot_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/one_hot_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pad_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pad_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pooling/adaptive_avg_pool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pooling/adaptive_avg_pool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pooling/adaptive_avg_pool2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pooling/adaptive_avg_pool2d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pooling/adaptive_avg_pool3d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pooling/adaptive_avg_pool3d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pooling/adaptive_max_pool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pooling/adaptive_max_pool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pooling/avg_pool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pooling/avg_pool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/pooling/max_pool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/pooling/max_pool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/prelu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/prelu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/relu6_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/relu6_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/selu_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/selu_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/softplus_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/softplus_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/softshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/softshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/softsign_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/softsign_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/tanh_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/tanh_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/vision/affine_grid_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/vision/affine_grid_cn.rst
--- a/doc/paddle/api/paddle/nn/functional/vision/pixel_shuffle_cn.rst
+++ b/doc/paddle/api/paddle/nn/functional/vision/pixel_shuffle_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/ELU_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/ELU_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/GELU_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/GELU_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Hardshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Hardshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Hardtanh_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Hardtanh_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/LeakyReLU_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/LeakyReLU_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/LogSigmoid_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/LogSigmoid_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/LogSoftmax_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/LogSoftmax_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/PReLU_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/PReLU_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/ReLU6_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/ReLU6_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/ReLU_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/ReLU_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/SELU_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/SELU_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Sigmoid_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Sigmoid_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Softmax_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Softmax_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Softplus_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Softplus_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Softshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Softshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Softsign_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Softsign_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Tanh_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Tanh_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/activation/Tanhshrink_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/activation/Tanhshrink_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/AlphaDropout_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/AlphaDropout_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/BilinearTensorProduct_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/BilinearTensorProduct_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/CosineSimilarity_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/CosineSimilarity_cn.rst
--- a/doc/paddle/api/paddle/nn/Dropout_cn.rst
+++ b/doc/paddle/api/paddle/nn/Dropout_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/Embedding_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/Embedding_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/ReflectionPad1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/ReflectionPad1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/ReflectionPad2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/ReflectionPad2d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/ReplicationPad1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/ReplicationPad1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/ReplicationPad2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/ReplicationPad2d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/ReplicationPad3d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/ReplicationPad3d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/common/ZeroPad2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/common/ZeroPad2d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/distance/PairwiseDistance_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/distance/PairwiseDistance_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/BCELoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/BCELoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/BCEWithLogitsLoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/BCEWithLogitsLoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/CTCLoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/CTCLoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/CrossEntropyLoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/CrossEntropyLoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/L1Loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/L1Loss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/MSELoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/MSELoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/MarginRankingLoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/MarginRankingLoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/NLLLoss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/NLLLoss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/loss/SmoothL1Loss_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/loss/SmoothL1Loss_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/BatchNorm1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/BatchNorm1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/BatchNorm2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/BatchNorm2d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/BatchNorm3d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/BatchNorm3d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/GroupNorm_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/GroupNorm_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/InstanceNorm1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/InstanceNorm1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/InstanceNorm2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/InstanceNorm2d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/InstanceNorm3d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/InstanceNorm3d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/LayerNorm_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/LayerNorm_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/norm/SyncBatchNorm_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/norm/SyncBatchNorm_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveAvgPool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveAvgPool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveAvgPool2d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveAvgPool2d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveAvgPool3d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveAvgPool3d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveMaxPool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/pooling/AdaptiveMaxPool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/pooling/AvgPool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/pooling/AvgPool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/layer/pooling/MaxPool1d_cn.rst
+++ b/doc/paddle/api/paddle/nn/layer/pooling/MaxPool1d_cn.rst
--- a/doc/paddle/api/paddle/nn/utils/weight_norm_hook/remove_weight_norm_cn.rst
+++ b/doc/paddle/api/paddle/nn/utils/weight_norm_hook/remove_weight_norm_cn.rst
--- a/doc/paddle/api/paddle/nn/utils/weight_norm_hook/weight_norm_cn.rst
+++ b/doc/paddle/api/paddle/nn/utils/weight_norm_hook/weight_norm_cn.rst
--- a/doc/paddle/api/paddle/optimizer/AdamW_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/AdamW_cn.rst
--- a/doc/paddle/api/paddle/optimizer/Adam_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/Adam_cn.rst
--- a/doc/paddle/api/paddle/optimizer/Adamax_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/Adamax_cn.rst
--- a/doc/paddle/api/paddle/optimizer/CosineAnnealingLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/CosineAnnealingLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/ExponentialLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/ExponentialLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/InverseTimeLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/InverseTimeLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/LambdaLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/LambdaLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/LinearLrWarmup_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/LinearLrWarmup_cn.rst
--- a/doc/paddle/api/paddle/optimizer/MomentumOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/MomentumOptimizer_cn.rst
--- a/doc/paddle/api/paddle/optimizer/MultiStepLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/MultiStepLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/NaturalExpLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/NaturalExpLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/NoamLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/NoamLR_cn.rst
--- a/doc/paddle/api/paddle/optimizer/Optimizer_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/Optimizer_cn.rst
--- a/doc/paddle/api/paddle/optimizer/PiecewiseLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/PiecewiseLR_cn.rst
--- a/doc/paddle/api/paddle/fluid/optimizer/RMSPropOptimizer_cn.rst
+++ b/doc/paddle/api/paddle/fluid/optimizer/RMSPropOptimizer_cn.rst
--- a/doc/paddle/api/paddle/optimizer/ReduceLROnPlateau_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/ReduceLROnPlateau_cn.rst
--- a/doc/paddle/api/paddle/optimizer/StepLR_cn.rst
+++ b/doc/paddle/api/paddle/optimizer/StepLR_cn.rst
--- a/doc/paddle/api/paddle/static/input/InputSpec_cn.rst
+++ b/doc/paddle/api/paddle/static/input/InputSpec_cn.rst
--- a/doc/paddle/api/paddle/static/data_cn.rst
+++ b/doc/paddle/api/paddle/static/data_cn.rst
--- a/doc/paddle/api/paddle/tensor/creation/diag_cn.rst
+++ b/doc/paddle/api/paddle/tensor/creation/diag_cn.rst
--- a/doc/paddle/api/paddle/tensor/creation/full_cn.rst
+++ b/doc/paddle/api/paddle/tensor/creation/full_cn.rst
--- a/doc/paddle/api/paddle/tensor/creation/full_like_cn.rst
+++ b/doc/paddle/api/paddle/tensor/creation/full_like_cn.rst
--- a/doc/paddle/api/paddle/tensor/linalg/dot_cn.rst
+++ b/doc/paddle/api/paddle/tensor/linalg/dot_cn.rst
--- a/doc/paddle/api/paddle/tensor/linalg/norm_cn.rst
+++ b/doc/paddle/api/paddle/tensor/linalg/norm_cn.rst
--- a/doc/paddle/api/paddle/tensor/logic/allclose_cn.rst
+++ b/doc/paddle/api/paddle/tensor/logic/allclose_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/concat_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/concat_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/expand_as_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/expand_as_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/expand_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/expand_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/gather_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/gather_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/reshape_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/reshape_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/scatter_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/scatter_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/split_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/split_cn.rst
--- a/doc/paddle/api/paddle/tensor/manipulation/tile_cn.rst
+++ b/doc/paddle/api/paddle/tensor/manipulation/tile_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/add_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/add_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/clip_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/clip_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/divide_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/divide_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/floor_divide_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/floor_divide_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/floor_mod_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/floor_mod_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/isfinite_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/isfinite_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/isinf_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/isinf_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/isnan_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/isnan_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/log1p_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/log1p_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/logsumexp_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/logsumexp_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/max_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/max_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/min_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/min_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/mod_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/mod_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/prod_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/prod_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/remainder_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/remainder_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/sign_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/sign_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/sqrt_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/sqrt_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/sum_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/sum_cn.rst
--- a/doc/paddle/api/paddle/tensor/math/tanh_cn.rst
+++ b/doc/paddle/api/paddle/tensor/math/tanh_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/bernoulli_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/bernoulli_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/normal_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/normal_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/rand_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/rand_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/randint_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/randint_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/randn_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/randn_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/randperm_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/randperm_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/standard_normal_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/standard_normal_cn.rst
--- a/doc/paddle/api/paddle/tensor/random/uniform_cn.rst
+++ b/doc/paddle/api/paddle/tensor/random/uniform_cn.rst
--- a/doc/paddle/api/paddle/tensor/search/argmax_cn.rst
+++ b/doc/paddle/api/paddle/tensor/search/argmax_cn.rst
--- a/doc/paddle/api/paddle/tensor/search/argmin_cn.rst
+++ b/doc/paddle/api/paddle/tensor/search/argmin_cn.rst
--- a/doc/paddle/api/paddle/tensor/search/index_select_cn.rst
+++ b/doc/paddle/api/paddle/tensor/search/index_select_cn.rst
--- a/doc/paddle/api/paddle/tensor/search/masked_select_cn.rst
+++ b/doc/paddle/api/paddle/tensor/search/masked_select_cn.rst
--- a/doc/paddle/api/paddle/tensor/search/sort_cn.rst
+++ b/doc/paddle/api/paddle/tensor/search/sort_cn.rst
--- a/doc/paddle/api/paddle/tensor/search/topk_cn.rst
+++ b/doc/paddle/api/paddle/tensor/search/topk_cn.rst
--- a/doc/paddle/api/paddle/tensor/stat/numel_cn.rst
+++ b/doc/paddle/api/paddle/tensor/stat/numel_cn.rst
--- a/doc/paddle/api/paddle/tensor/stat/std_cn.rst
+++ b/doc/paddle/api/paddle/tensor/stat/std_cn.rst
--- a/doc/paddle/api/paddle/tensor/stat/var_cn.rst
+++ b/doc/paddle/api/paddle/tensor/stat/var_cn.rst
--- a/doc/paddle/api/paddle/fluid/layers/BeamSearchDecoder_cn.rst
+++ b/doc/paddle/api/paddle/fluid/layers/BeamSearchDecoder_cn.rst
--- a/doc/paddle/api/paddle/text/RNNCell_cn.rst
+++ b/doc/paddle/api/paddle/text/RNNCell_cn.rst
--- a/doc/paddle/api/paddle/vision/flip_cn.rst
+++ b/doc/paddle/api/paddle/vision/flip_cn.rst