Fix layers docs (#2412)

* fix_docs test=release/1.8
* add_DataParallel docs test=release/1.8
* add DataParallel to white list test=release/1.8

8a4a5d0a · Chen Long · GitHub · e30af5b2
18 changed files
--- a/doc/fluid/api/data/data_reader.rst
+++ b/doc/fluid/api/data/data_reader.rst
-=====================
-Data Reader
-=====================
-
-..  toctree::
-    :maxdepth: 1
-
-    data_reader/Reader.rst
--- a/doc/fluid/api/index_en.rst
+++ b/doc/fluid/api/index_en.rst
@@ -9,7 +9,6 @@ API Reference
    fluid.rst
    backward.rst
    clip.rst
-    data/data_reader.rst
    data/dataset.rst
    dataset.rst
    dygraph.rst

--- a/doc/fluid/api_cn/data_cn/data_reader_cn.rst
+++ b/doc/fluid/api_cn/data_cn/data_reader_cn.rst
-=======================
-Data Reader
-=======================
-
-
-
-
-..  toctree::
-    :maxdepth: 1
-
-    data_reader_cn/Reader_cn.rst
-
-
-
-
-
-
--- a/doc/fluid/api_cn/data_cn/data_reader_cn/DataFeeder_cn.rst
+++ b/doc/fluid/api_cn/data_cn/data_reader_cn/DataFeeder_cn.rst
-.. _cn_api_paddle_data_reader_datafeeder:
-
-DataFeeder
-----------------------------------
-
-.. py:class:: paddle.fluid.data_feeder.DataFeeder(feed_list, place, program=None)
-
-
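-DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor and ParallelExecutor. A reader usually returns a list of mini-batch data entries. Each entry in the list is a sample, and each sample is a list or tuple with one or more features.
-
-Simple usage:
-
-Code example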
-::::::::::::
-
-..  code-block:: python
-
-    import paddle.fluid as fluid
-    place = fluid.CPUPlace()
-    img = fluid.layers.data(name='image', shape=[1, 28, 28])
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
-    result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])
-
-
-If you want to feed data to each GPU separately in advance when training a model on multiple GPUs, use the decorate_reader function.
-
-
-Code example
-::::::::::::
-
-..  code-block:: python
-
-    import paddle
-    import paddle.fluid as fluid
-    
-    place=fluid.CUDAPlace(0)
-    data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    
-    feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
-    reader = feeder.decorate_reader(
-        paddle.batch(paddle.dataset.flowers.train(), batch_size=16), multi_devices=False)
-
-
-Parameters
-::::::::::::
-
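-    - **feed_list**  (list) –  the variables, or the names of the variables, that will be fed into the model.
-    - **place**  (Place) – where the data is fed. To feed data to a GPU, use fluid.CUDAPlace(i) (i is the GPU ID); to feed data to the CPU, use fluid.CPUPlace().
-    - **program**  (Program) – the Program that the data is fed into. If program is None, default_main_program() is used. Default: None.
-
-Raises
-::::::::::::
-     ``ValueError`` – if some variables do not appear in the Program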
-
-
-Code example
-::::::::::::
-
-..  code-block:: python
-
-    import numpy as np
-    import paddle
-    import paddle.fluid as fluid
-
-    place = fluid.CPUPlace()
-
-    def reader():
-        yield [np.random.random([4]).astype('float32'), np.random.random([3]).astype('float32')],
-
-    main_program = fluid.Program()
-    startup_program = fluid.Program()
-
-    with fluid.program_guard(main_program, startup_program):
-        data_1 = fluid.layers.data(name='data_1', shape=[1, 2, 2])
-        data_2 = fluid.layers.data(name='data_2', shape=[1, 1, 3])
-        out = fluid.layers.fc(input=[data_1, data_2], size=2)
-        # ...
-
-    feeder = fluid.DataFeeder([data_1, data_2], place)
-
-    exe = fluid.Executor(place)
-    exe.run(startup_program)
-    for data in reader():
-        outs = exe.run(program=main_program,
-                       feed=feeder.feed(data),
-                       fetch_list=[out])
-
-
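-Methods
-::::::::::::
-feed(iterable)
-'''''''''
-
-Converts iterable into a data structure that can be fed into Executor and ParallelExecutor, using the variables in feed_list.
-
-**Parameters**
-
-    - **iterable** (list|tuple) – the input data
-
-**Returns**
- The conversion result
-
-**Return type**
- dict
-
-**Code example**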
-
-..  code-block:: python
-
-        import numpy.random as random
-        import paddle.fluid as fluid
-        
-        def reader(limit=5):
-            for i in range(limit):
-                    yield random.random([784]).astype('float32'), random.random([1]).astype('int64'), random.random([256]).astype('float32')
-        
-        data_1 = fluid.layers.data(name='data_1', shape=[1, 28, 28])
-        data_2 = fluid.layers.data(name='data_2', shape=[1], dtype='int64')
-        data_3 = fluid.layers.data(name='data_3', shape=[16, 16], dtype='float32')
-        feeder = fluid.DataFeeder(['data_1','data_2', 'data_3'], fluid.CPUPlace())
-        
-        result = feeder.feed(reader())
-
-
-
-feed_parallel(iterable, num_places=None)
-'''''''''
-
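-Takes multiple mini-batches and feeds each mini-batch to a separate device in advance.
-
-**Parameters**
-
-    - **iterable** (list|tuple) – the input data.
-    - **num_places**  (int) – the number of devices. Default: None.
-
-**Returns**
- The conversion result
-
-**Return type**
- dict
-
-
-
-.. note::
-
-    The number of devices must equal the number of mini-batches.
-
-**Code example**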
-
-..  code-block:: python
-
-        import numpy.random as random
-        import paddle.fluid as fluid
-        
-        def reader(limit=10):
-            for i in range(limit):
-                yield [random.random([784]).astype('float32'), random.randint(10)],
-        
-        x = fluid.layers.data(name='x', shape=[1, 28, 28])
-        y = fluid.layers.data(name='y', shape=[1], dtype='int64')
-        
-        feeder = fluid.DataFeeder(['x','y'], fluid.CPUPlace())
-        place_num = 2
-        places = [fluid.CPUPlace() for x in range(place_num)]
-        data = []
-        exe = fluid.Executor(fluid.CPUPlace())
-        exe.run(fluid.default_startup_program())
-        program = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(places=places)
-        for item in reader():
-            data.append(item)
-            if place_num == len(data):
-                exe.run(program=program, feed=list(feeder.feed_parallel(data, place_num)), fetch_list=[])
-                data = []
-
-
-decorate_reader(reader, multi_devices, num_places=None, drop_last=True)
-'''''''''
-
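-Converts the data returned by the reader into multiple mini-batches and feeds each mini-batch to a separate device.
-
-**Parameters**
-
-    - **reader** (function) – a function that generates data.
-    - **multi_devices** (bool) – whether to use multiple devices.
-    - **num_places** (int) – if multi_devices is True, the number of GPUs to use; if num_places is None, all GPUs on the current machine are used. Default: None.
-    - **drop_last** (bool) – whether to drop the last batch if its size is smaller than batch_size. Default: True.
-
-**Returns**
- The conversion result
-
-**Return type**
- dict
-
-**Raises**
-     ``ValueError`` – if drop_last is False and the data batches do not match the number of devices.
-
-**Code example**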
-
-..  code-block:: python
-
-        import numpy.random as random
-        import paddle
-        import paddle.fluid as fluid
-        
-        def reader(limit=5):
-            for i in range(limit):
-                yield (random.random([784]).astype('float32'), random.random([1]).astype('int64')),
-        
-        place=fluid.CUDAPlace(0)
-        data = fluid.layers.data(name='data', shape=[1, 28, 28], dtype='float32')
-        label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-        
-        feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
-        reader = feeder.decorate_reader(reader, multi_devices=False)
-        
-        exe = fluid.Executor(place)
-        exe.run(fluid.default_startup_program())
-        for data in reader():
-            exe.run(feed=data)
\ No newline at end of file
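The deleted DataFeeder page makes two claims. First, `feed` turns a reader's mini-batch (a list of samples, each a list or tuple of features) into a dict keyed by feed variable. Second, `feed_parallel` needs exactly one mini-batch per device. The following is a minimal stdlib sketch of that regrouping. `feed_minibatch` and `feed_parallel_sketch` are hypothetical helpers, not Paddle code, and plain lists stand in for the LoDTensors the real API produces.

```python
from typing import Dict, List, Sequence, Tuple


def feed_minibatch(feed_names: Sequence[str],
                   minibatch: Sequence[Tuple]) -> Dict[str, List]:
    """Regroup a row-oriented mini-batch into one column per feed name.

    Hypothetical helper: it only mirrors the shape of DataFeeder.feed's
    result (a dict keyed by feed variable name), using plain lists.
    """
    columns: Dict[str, List] = {name: [] for name in feed_names}
    for sample in minibatch:
        if len(sample) != len(feed_names):
            raise ValueError(
                f"sample has {len(sample)} fields, expected {len(feed_names)}")
        for name, field in zip(feed_names, sample):
            columns[name].append(field)
    return columns


def feed_parallel_sketch(feed_names: Sequence[str],
                         minibatches: Sequence[Sequence[Tuple]],
                         num_places: int) -> List[Dict[str, List]]:
    """One feed dict per device; the counts must match, as the note requires."""
    if len(minibatches) != num_places:
        raise ValueError("number of mini-batches must equal number of devices")
    return [feed_minibatch(feed_names, mb) for mb in minibatches]


# Two samples, each (image_features, label); 4 features instead of 784 for brevity.
batch = [([0] * 4, [9]), ([1] * 4, [1])]
result = feed_minibatch(["image", "label"], batch)
print(result["label"])  # [[9], [1]]
```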
--- a/doc/fluid/api_cn/dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn.rst
@@ -17,6 +17,7 @@ fluid.dygraph
    dygraph_cn/Conv3D_cn.rst
    dygraph_cn/Conv3DTranspose_cn.rst
    dygraph_cn/CosineDecay_cn.rst
+    dygraph_cn/DataParallel_cn.rst
    dygraph_cn/disable_dygraph_cn.rst
    dygraph_cn/disable_imperative_cn.rst
    dygraph_cn/Dropout_cn.rst

--- a/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst
+.. _cn_api_fluid_dygraph_DataParallel:
+
+DataParallel
+------------
+
+.. py:class:: paddle.fluid.dygraph.DataParallel(layers, strategy)
+
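+:api_attr: imperative programming mode (dynamic graph)
+
+Runs a dynamic graph model in data-parallel mode.
+
+Currently, ``DataParallel`` only supports running dynamic graph models with multiple processes, launched as follows:
+
+``python -m paddle.distributed.launch --selected_gpus=0,1 dynamic_graph_test.py``
+
+Here, ``dynamic_graph_test.py`` can contain the example code below.
+
+Parameters
+::::::::::
+    - **layers** (Layer) - the model to run in data-parallel mode.
+    - **strategy** (ParallelStrategy) - the data-parallel strategy, including the environment configuration for parallel execution.
+
+Returns
+:::::::::
+A ``Layer`` that supports data parallelism
+
+Return type
+:::::::::::
+Layer instance
+
+Code example
+::::::::::::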
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle.fluid as fluid
+
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+
+        avg_loss.backward()
+
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
+
+.. py:method:: scale_loss(loss)
+
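+Scales the model loss ``loss``. In data-parallel mode, ``loss`` needs to be scaled according to the number of parallel training processes.
+
+If not running in data-parallel mode, the original ``loss`` is returned unchanged.
+
+Parameters:
+    - **loss** (Variable) - the loss of the current model.
+
+Returns: the scaled ``loss``
+
+Return type: Variable
+
+**Code example**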
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle.fluid as fluid
+
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+
+        avg_loss.backward()
+
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
+
+
+.. py:method:: apply_collective_grads()
+
+Performs an AllReduce on the gradients of the parameters.
+
+Returns: None
+
+**Code example**
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle.fluid as fluid
+
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+
+        avg_loss.backward()
+
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
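The page says `scale_loss` scales the loss by the number of training processes and `apply_collective_grads` AllReduces the gradients. The following stdlib-only sketch shows why the two belong together. It assumes that scaling means dividing by `nranks` and that the AllReduce is a sum. The model, data, and helper names are hypothetical, and none of this is Paddle code. With equally sized shards, the summed scaled gradients equal the gradient one process would compute over the whole batch.

```python
from typing import List, Sequence, Tuple


def scaled_local_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                      nranks: int) -> float:
    """Gradient of mean((w*x - y)**2) over one shard, divided by nranks.

    Dividing the loss by nranks (the assumed effect of scale_loss)
    divides its gradient by nranks as well.
    """
    n = len(xs)
    grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return grad / nranks


def all_reduce_sum(values: List[float]) -> List[float]:
    """Simulated AllReduce: every rank ends up holding the sum over all ranks."""
    total = sum(values)
    return [total] * len(values)


w = 0.5
# Two equally sized shards, one per hypothetical trainer process.
shards: List[Tuple[List[float], List[float]]] = [
    ([1.0, 2.0], [2.0, 4.0]),
    ([3.0, 4.0], [6.0, 8.0]),
]
nranks = len(shards)
per_rank = [scaled_local_grad(w, xs, ys, nranks) for xs, ys in shards]
reduced = all_reduce_sum(per_rank)

# Reference: a single process computing the gradient on the whole batch.
all_x = [x for xs, _ in shards for x in xs]
all_y = [y for _, ys in shards for y in ys]
single = scaled_local_grad(w, all_x, all_y, 1)
print(reduced, single)  # [-22.5, -22.5] -22.5
```

If the shards had different sizes, the per-shard means would be weighted unequally and the two results would no longer match exactly.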
--- a/doc/fluid/api_cn/index_cn.rst
+++ b/doc/fluid/api_cn/index_cn.rst
@@ -9,7 +9,6 @@ API Reference
    fluid_cn.rst
    backward_cn.rst
    clip_cn.rst
-    data_cn/data_reader_cn.rst
    data_cn/dataset_cn.rst
    dataset_cn.rst
    dygraph_cn.rst

--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
@@ -35,6 +35,7 @@ fluid.layers
    layers_cn/auc_cn.rst
    layers_cn/autoincreased_step_counter_cn.rst
    layers_cn/batch_norm_cn.rst
+    layers_cn/BasicDecoder_cn.rst
    layers_cn/beam_search_cn.rst
    layers_cn/beam_search_decode_cn.rst
    layers_cn/bilinear_tensor_product_cn.rst
@@ -97,6 +98,7 @@ fluid.layers
    layers_cn/dynamic_lstmp_cn.rst
    layers_cn/dynamic_decode_cn.rst
    layers_cn/Decoder_cn.rst
+    layers_cn/DecodeHelper_cn.rst
    layers_cn/DynamicRNN_cn.rst
    layers_cn/edit_distance_cn.rst
    layers_cn/elementwise_add_cn.rst
@@ -138,6 +140,7 @@ fluid.layers
    layers_cn/get_tensor_from_selected_rows_cn.rst
    layers_cn/greater_equal_cn.rst
    layers_cn/greater_than_cn.rst
+    layers_cn/GreedyEmbeddingHelper_cn.rst
    layers_cn/grid_sampler_cn.rst
    layers_cn/group_norm_cn.rst
    layers_cn/gru_unit_cn.rst
@@ -270,6 +273,7 @@ fluid.layers
    layers_cn/rsqrt_cn.rst
    layers_cn/RNNCell_cn.rst
    layers_cn/sampled_softmax_with_cross_entropy_cn.rst
+    layers_cn/SampleEmbeddingHelper_cn.rst
    layers_cn/sampling_id_cn.rst
    layers_cn/scale_cn.rst
    layers_cn/scatter_cn.rst
@@ -338,6 +342,7 @@ fluid.layers
    layers_cn/topk_cn.rst
    layers_cn/trace_cn.rst
    layers_cn/transpose_cn.rst
+    layers_cn/TrainingHelper_cn.rst
    layers_cn/tril_cn.rst
    layers_cn/triu_cn.rst
    layers_cn/unfold_cn.rst

--- a/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst
+.. _cn_api_fluid_layers_BasicDecoder:
+
+BasicDecoder
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.BasicDecoder(cell, helper, output_fn=None)
+
+BasicDecoder is a subclass of :ref:`cn_api_fluid_layers_Decoder`. It holds an instance of :ref:`cn_api_fluid_layers_RNNCell` and an instance of :ref:`cn_api_fluid_layers_DecodeHelper` as members; the DecodeHelper implements the decoding strategy. A single decoding step consists of the following steps, in order:
+
+1. Run :code:`cell_outputs, cell_states = cell.call(inputs, states)` to get the outputs and the new states.
+
+2. Run :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` to sample ids, which are the decoding result of the current step.
+
+3. Run :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` to produce the finished flags, inputs, and states for the next decoding step.
+
+Parameters
+::::::::::
+  - **cell** (RNNCell) - an instance of RNNCell, or an object with the same interface.
+  - **helper** (DecodeHelper) - an instance of DecodeHelper.
+  - **output_fn** (optional) - a callable applied to the cell outputs before sampling. Default: None.
+
+Code example
+::::::::::::
+
+.. code-block:: python
+        
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+
+.. py:method:: initialize(initial_cell_states)
+
+Initializes the helper and the cell. Cell initialization simply uses :code:`initial_cell_states` as its result.
+
+Parameters:
+  - **initial_cell_states** (Variable) - a single tensor variable or a nested structure of tensor variables. This argument is provided by the caller :ref:`cn_api_fluid_layers_dynamic_decode`.
+
+Returns: a tuple :code:`(initial_inputs, initial_states, finished)`. :code:`initial_inputs` and :code:`initial_states` are each a single tensor variable or a nested structure of tensor variables, and :code:`finished` is a bool tensor. :code:`initial_inputs` and :code:`finished` are the same as the values returned by :code:`helper.initialize()`; :code:`initial_states` is the same as the input argument :code:`initial_cell_states`.
+
+Return type: tuple
+
+.. py:class:: OutputWrapper(cell_outputs, sample_ids)
+
+ The data structure used for :code:`outputs` in the return value of :code:`step()`: a named tuple with the two fields :code:`cell_outputs` and :code:`sample_ids`.
+
+.. py:method:: step(time, inputs, states, **kwargs)
+
+Performs a single decoding step as follows:
+
+1. Run :code:`cell_outputs, cell_states = cell.call(inputs, states)` to get the outputs and the new states.
+
+2. Run :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` to sample ids, which are the decoding result of the current step.
+
+3. Run :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` to produce the finished flags, inputs, and states for the next decoding step.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **inputs** (Variable) - a tensor variable. At the first decoding step it is the same as :code:`initial_inputs` returned by :code:`initialize()`; at later steps it is the same as :code:`next_inputs` returned by :code:`step()`.
+  - **states** (Variable) - a structure of tensor variables. At the first decoding step it is the same as :code:`initial_states` returned by :code:`initialize()`; at later steps it is the same as :code:`next_states` returned by :code:`step()`.
+  - **kwargs** - additional keyword arguments, provided by the caller :ref:`cn_api_fluid_layers_dynamic_decode`.
+
+Returns: a tuple :code:`(outputs, next_states, next_inputs, finished)`. :code:`outputs` is a named tuple with the fields :code:`cell_outputs` and :code:`sample_ids`, where :code:`cell_outputs` is the result of :code:`cell.call()` and :code:`sample_ids` is the result of :code:`helper.sample()`. :code:`next_states` and :code:`next_inputs` have the same structure, shape, and data type as the input arguments :code:`states` and :code:`inputs`, respectively. :code:`finished` is a bool tensor of shape :math:`[batch\_size]`.
+
+Return type: tuple
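The three steps that BasicDecoder's `step()` performs can be sketched in plain Python, driven by a loop in the spirit of `dynamic_decode`. Everything here is hypothetical: `ToyCell`, `ToyArgmaxHelper`, `ToyBasicDecoder`, and `toy_dynamic_decode` are illustrations, and lists replace tensors. Unlike the real API, the loop does not mask the outputs of sequences that have already finished.

```python
from typing import Callable, List, Optional, Tuple

Logits = List[List[float]]  # [batch_size, vocab_size]


class ToyCell:
    """Hypothetical stand-in for an RNNCell: its logits always favour
    token (input + 1) % vocab_size, and its state counts steps taken."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size

    def call(self, inputs: List[int], states: List[int]) -> Tuple[Logits, List[int]]:
        outputs = []
        for tok in inputs:
            row = [0.0] * self.vocab_size
            row[(tok + 1) % self.vocab_size] = 1.0
            outputs.append(row)
        return outputs, [s + 1 for s in states]


class ToyArgmaxHelper:
    """Hypothetical helper: argmax sampling, sampled ids fed back as inputs."""

    def __init__(self, start_tokens: List[int], end_token: int):
        self.start_tokens = start_tokens
        self.end_token = end_token

    def initialize(self):
        return list(self.start_tokens), [False] * len(self.start_tokens)

    def sample(self, time, outputs, states):
        return [max(range(len(r)), key=r.__getitem__) for r in outputs]

    def next_inputs(self, time, outputs, states, sample_ids):
        finished = [i == self.end_token for i in sample_ids]
        return finished, list(sample_ids), states


class ToyBasicDecoder:
    def __init__(self, cell, helper, output_fn: Optional[Callable] = None):
        self.cell, self.helper, self.output_fn = cell, helper, output_fn

    def initialize(self, initial_cell_states):
        initial_inputs, finished = self.helper.initialize()
        return initial_inputs, initial_cell_states, finished

    def step(self, time, inputs, states):
        cell_outputs, cell_states = self.cell.call(inputs, states)        # step 1
        if self.output_fn is not None:                                    # applied before sampling
            cell_outputs = self.output_fn(cell_outputs)
        sample_ids = self.helper.sample(time, cell_outputs, cell_states)  # step 2
        finished, next_inputs, next_states = self.helper.next_inputs(     # step 3
            time, cell_outputs, cell_states, sample_ids)
        return (cell_outputs, sample_ids), next_states, next_inputs, finished


def toy_dynamic_decode(decoder, inits, max_steps: int = 10) -> List[List[int]]:
    """Call step() until every sequence has finished; returns ids per step."""
    inputs, states, finished = decoder.initialize(inits)
    ids_per_step = []
    for time in range(max_steps):
        (_, ids), states, inputs, step_finished = decoder.step(time, inputs, states)
        ids_per_step.append(ids)
        # Once finished, a sequence stays finished.
        finished = [a or b for a, b in zip(finished, step_finished)]
        if all(finished):
            break
    return ids_per_step


decoder = ToyBasicDecoder(ToyCell(vocab_size=5), ToyArgmaxHelper([0, 2], end_token=4))
print(toy_dynamic_decode(decoder, inits=[0, 0]))
# [[1, 3], [2, 4], [3, 0], [4, 1]]
```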
--- a/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst
+.. _cn_api_fluid_layers_DecodeHelper:
+
+DecodeHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.DecodeHelper()
+
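+DecodeHelper is a base class; instances of its subclasses are used in :ref:`cn_api_fluid_layers_BasicDecoder`. It provides the interface for sampling and for producing the next decoding step's inputs during dynamic decoding.
+
+.. py:method:: initialize()
+
+Initialization that produces the inputs of the first decoding step and the initial flags indicating whether each sequence has finished. This is part of the initialization of :ref:`cn_api_fluid_layers_BasicDecoder`.
+
+Returns
+:::::::::
+ A tuple :code:`(initial_inputs, initial_finished)`. :code:`initial_inputs` is a single tensor variable or a nested structure of tensor variables whose tensors have shape :math:`[batch\_size, ...]`. :code:`initial_finished` is a bool tensor of shape :math:`[batch\_size]`.
+
+Return type
+:::::::::::
+tuple
+
+.. py:method:: sample(time, outputs, states)
+
+Samples from :code:`outputs` in a helper-specific way. This method is part of :code:`BasicDecoder.step`.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+
+Returns: an int64 tensor of shape :math:`[batch\_size]` holding the sampled ids.
+
+Return type: Variable
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
+Produces the inputs and states for the next decoding step, together with flags indicating whether each sequence has finished. This method is part of :code:`BasicDecoder.step`.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+  - **sample_ids** (Variable) - an int64 tensor of shape :math:`[batch\_size]`; the same as the :code:`sample_ids` returned by :code:`sample()`.
+
+Returns: a tuple :code:`(finished, next_inputs, next_states)`. :code:`next_inputs` and :code:`next_states` are each a single tensor variable or a nested structure of tensor variables, and :code:`next_states` has the same structure, shape, and data type as the input argument :code:`states`. :code:`finished` is a bool tensor of shape :math:`[batch\_size]`.
+
+Return type: tuple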
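DecodeHelper only defines an interface of three methods. The following is a minimal sketch of that contract using Python's `abc` module, with a toy subclass. The class names are hypothetical, lists stand in for tensors, and this is not how Paddle declares the base class.

```python
import abc
from typing import Any, List, Tuple


class DecodeHelperSketch(abc.ABC):
    """Hypothetical mirror of the three DecodeHelper methods (not Paddle code)."""

    @abc.abstractmethod
    def initialize(self) -> Tuple[Any, List[bool]]:
        """Return (initial_inputs, initial_finished)."""

    @abc.abstractmethod
    def sample(self, time: int, outputs: Any, states: Any) -> List[int]:
        """Return sample_ids with one id per batch entry."""

    @abc.abstractmethod
    def next_inputs(self, time: int, outputs: Any, states: Any,
                    sample_ids: List[int]) -> Tuple[List[bool], Any, Any]:
        """Return (finished, next_inputs, next_states)."""


class FeedBackIdsHelper(DecodeHelperSketch):
    """Toy subclass: argmax sampling, sampled ids become the next inputs,
    states pass through unchanged, and a sequence finishes on end_token."""

    def __init__(self, start_tokens: List[int], end_token: int):
        self.start_tokens = list(start_tokens)
        self.end_token = end_token

    def initialize(self):
        return list(self.start_tokens), [False] * len(self.start_tokens)

    def sample(self, time, outputs, states):
        return [max(range(len(row)), key=row.__getitem__) for row in outputs]

    def next_inputs(self, time, outputs, states, sample_ids):
        finished = [i == self.end_token for i in sample_ids]
        return finished, list(sample_ids), states


class MissingMethods(DecodeHelperSketch):
    """Implements only initialize(), so it cannot be instantiated."""

    def initialize(self):
        return [], []


helper = FeedBackIdsHelper(start_tokens=[0, 0], end_token=2)
ids = helper.sample(0, [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8]], None)
print(ids)                                    # [1, 2]
print(helper.next_inputs(0, None, "h", ids))  # ([False, True], [1, 2], 'h')
```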
--- a/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst
+.. _cn_api_fluid_layers_GreedyEmbeddingHelper:
+
+GreedyEmbeddingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.GreedyEmbeddingHelper(embedding_fn, start_tokens, end_token)
+
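+GreedyEmbeddingHelper is a subclass of :ref:`cn_api_fluid_layers_DecodeHelper`. As a decoding helper, it samples with :code:`argmax` and passes the sampled ids through an embedding layer to produce the input of the next decoding step.
+
+Parameters
+::::::::::
+  - **embedding_fn** (callable) - a function applied to the :code:`argmax` results, usually an embedding layer that converts word ids into word embeddings. **Note**: use :ref:`cn_api_fluid_embedding` here rather than :ref:`cn_api_fluid_layers_embedding`, because the selected ids have shape :math:`[batch\_size]`; the latter would additionally require an unsqueeze here.
+  - **start_tokens** (Variable) - an int64 tensor of shape :math:`[batch\_size]` holding the start token ids.
+  - **end_token** (int) - the end token id.
+
+Code example
+::::::::::::
+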
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.GreedyEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+
+.. py:method:: initialize()
+
+Initializes GreedyEmbeddingHelper: it uses :code:`start_tokens` from the constructor as the input of the first decoding step and gives the initial flags indicating whether each sequence has finished. This is part of the initialization of :ref:`cn_api_fluid_layers_BasicDecoder`.
+
+Returns: a tuple :code:`(initial_inputs, initial_finished)`. :code:`initial_inputs` is :code:`start_tokens` from the constructor; :code:`initial_finished` is a bool tensor filled with False, with the same shape as :code:`start_tokens`.
+
+Return type: tuple
+
+.. py:method:: sample(time, outputs, states)
+
+Samples from :code:`outputs` using :code:`argmax`.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+
+Returns: an int64 tensor of shape :math:`[batch\_size]` holding the sampled ids.
+
+Return type: Variable
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
+Applies :code:`embedding_fn` to :code:`sample_ids` to produce the input of the next decoding step, uses the input argument :code:`states` directly as the states of the next step, and marks a sequence as finished when its entry in :code:`sample_ids` equals :code:`end_token`.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+  - **sample_ids** (Variable) - an int64 tensor of shape :math:`[batch\_size]`; the same as the :code:`sample_ids` returned by :code:`sample()`.
+
+Returns: a tuple :code:`(finished, next_inputs, next_states)`. :code:`next_inputs` and :code:`next_states` are each a single tensor variable or a nested structure of tensor variables whose tensors have shape :math:`[batch\_size, ...]`; :code:`next_states` is the same as the input argument :code:`states`. :code:`finished` is a bool tensor of shape :math:`[batch\_size]`.
+
+Return type: tuple
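The following sketches the sampling and next-input logic described for GreedyEmbeddingHelper, with lists in place of tensors. `greedy_sample`, `embed`, and `greedy_next_inputs` are hypothetical helpers, and the embedding table holds made-up values. The point is that argmax drops the vocabulary axis, so the embedding lookup receives ids of shape [batch_size]. That is why the page recommends an embedding that accepts 1-D ids.

```python
from typing import List, Sequence


def greedy_sample(logits: Sequence[Sequence[float]]) -> List[int]:
    """argmax over the vocabulary axis: [batch_size, vocab_size] -> [batch_size]."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def embed(table: Sequence[Sequence[float]], ids: Sequence[int]) -> List[List[float]]:
    """Lookup for 1-D ids: [batch_size] -> [batch_size, emb_dim].

    The ids carry no trailing dimension of size 1.
    """
    return [list(table[i]) for i in ids]


def greedy_next_inputs(table, sample_ids, states, end_token):
    """Return (finished, next_inputs, next_states), in next_inputs() order."""
    finished = [i == end_token for i in sample_ids]
    return finished, embed(table, sample_ids), states  # states pass through


# Made-up 3-word vocabulary with 2-dimensional embeddings.
table = [[0.0, 0.0], [1.0, 0.5], [0.2, 0.9]]
logits = [[0.1, 2.0, 0.3], [0.0, 0.1, 5.0]]
ids = greedy_sample(logits)
finished, next_inputs, next_states = greedy_next_inputs(table, ids, "h", end_token=2)
print(ids, finished)  # [1, 2] [False, True]
print(next_inputs)    # [[1.0, 0.5], [0.2, 0.9]]
```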
--- a/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst
+.. _cn_api_fluid_layers_SampleEmbeddingHelper:
+
+SampleEmbeddingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.SampleEmbeddingHelper(embedding_fn, start_tokens, end_token, softmax_temperature=None, seed=None)
+
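+SampleEmbeddingHelper is a subclass of :ref:`cn_api_fluid_layers_GreedyEmbeddingHelper`. As a decoding helper, it draws a random sample (instead of taking the :code:`argmax`) and passes the sampled ids through an embedding layer to produce the input of the next decoding step.
+
+Parameters
+::::::::::
+  - **embedding_fn** (callable) - a function applied to the sampled ids, usually an embedding layer that converts word ids into word embeddings. **Note**: use :ref:`cn_api_fluid_embedding` here rather than :ref:`cn_api_fluid_layers_embedding`, because the selected ids have shape :math:`[batch\_size]`; the latter would additionally require an unsqueeze here.
+  - **start_tokens** (Variable) - an int64 tensor of shape :math:`[batch\_size]` holding the start token ids.
+  - **end_token** (int) - the end token id.
+  - **softmax_temperature** (float, optional) - the logits are divided by this value before the softmax. A higher temperature (greater than 1.0) makes sampling more random; a lower temperature makes it closer to argmax. The value must be greater than 0. The default None is equivalent to 1.0.
+  - **seed** (int, optional) - the random seed used for sampling. Default: None, meaning no fixed seed is used.
+
+Code example
+::::::::::::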
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+    
+.. py:method:: sample(time, outputs, states)
+
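+Samples from a categorical (multinomial) distribution computed as :code:`softmax(outputs/softmax_temperature)`.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+
+Returns: an int64 tensor of shape :math:`[batch\_size]` holding the sampled ids.
+
+Return type: Variable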
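The following sketches sampling from softmax(outputs / softmax_temperature). It treats None as 1.0 and rejects non-positive temperatures, as the parameter description requires. `softmax_with_temperature` and `sample_ids` are hypothetical stdlib helpers. Python's `random.Random` stands in for whatever generator Paddle uses, so seeded draws will not match Paddle's.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float],
                             temperature: Optional[float] = None) -> List[float]:
    """softmax(logits / temperature); None behaves like 1.0."""
    t = 1.0 if temperature is None else temperature
    if t <= 0:
        raise ValueError("softmax_temperature must be greater than 0")
    scaled = [x / t for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_ids(logits_batch: Sequence[Sequence[float]],
               temperature: Optional[float] = None,
               seed: Optional[int] = None) -> List[int]:
    """Draw one id per row from the categorical distribution above."""
    rng = random.Random(seed)  # seed=None gives a non-reproducible generator
    ids = []
    for row in logits_batch:
        probs = softmax_with_temperature(row, temperature)
        ids.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return ids


logits = [1.0, 2.0, 0.5]
print(softmax_with_temperature(logits, 0.01))   # nearly one-hot on index 1
print(softmax_with_temperature(logits, 100.0))  # nearly uniform
print(sample_ids([logits, logits], temperature=1.0, seed=7))
```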
--- a/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst
+.. _cn_api_fluid_layers_TrainingHelper:
+
+TrainingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.TrainingHelper(inputs, sequence_length, time_major=False)
+
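+TrainingHelper is a subclass of :ref:`cn_api_fluid_layers_DecodeHelper`. As a decoding helper, at each decoding time step it slices the full-sequence input :code:`inputs` at the corresponding position to obtain that step's input, and it samples from the output of :code:`cell.call()` using :code:`argmax`.
+Because it requires the full-sequence input :code:`inputs`, TrainingHelper is mainly used for maximum-likelihood training with teacher forcing, and the sampled ids are usually not used.
+
+Parameters
+::::::::::
+  - **inputs** (Variable) - a single tensor variable or a nested structure of tensor variables. When :code:`time_major == False`, the tensors should have shape :math:`[batch\_size, sequence\_length, ...]`; when :code:`time_major == True`, they should have shape :math:`[sequence\_length, batch\_size, ...]`. The input of each decoding step is sliced from it.
+  - **sequence_length** (Variable) - a tensor of shape :math:`[batch\_size]` storing the actual length of each example in :code:`inputs`; it is used to determine whether each example has finished at each decoding step.
+  - **time_major** (bool, optional) - the data layout of the tensors contained in the input and output tensors. If False, the layout is batch-major with shape :math:`[batch\_size, sequence\_length, ...]`. If True, the layout is time-major with shape :math:`[sequence\_length, batch\_size, ...]`. Default: False.
+
+Code example
+::::::::::::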
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            trg_emb = fluid.data(name="trg_emb",
+                                 shape=[None, None, 128],
+                                 dtype="float32")
+            trg_seq_length = fluid.data(name="trg_seq_length",
+                                        shape=[None],
+                                        dtype="int64")
+            helper = layers.TrainingHelper(trg_emb, trg_seq_length)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper)
+            outputs = layers.dynamic_decode(
+                decoder,
+                inits=decoder_cell.get_initial_states(trg_emb),
+                is_test=False)
+
+.. py:method:: initialize()
+
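+Initializes TrainingHelper: it slices the full-sequence input :code:`inputs` at the first time step to obtain the input of the first decoding step, and gives the initial flags indicating whether each sequence has finished. This is part of the initialization of :ref:`cn_api_fluid_layers_BasicDecoder`.
+
+Returns: a tuple :code:`(initial_inputs, initial_finished)`. :code:`initial_inputs` is a single tensor variable or a nested structure of tensor variables whose tensors have shape :math:`[batch\_size, ...]`. :code:`initial_finished` is a bool tensor of shape :math:`[batch\_size]`.
+
+Return type: tuple
+
+.. py:method:: sample(time, outputs, states)
+
+Samples from :code:`outputs` using :code:`argmax`. Because slices of the full sequence are used as the next step's inputs, the sampled ids are usually not used.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+
+Returns: an int64 tensor of shape :math:`[batch\_size]` holding the sampled ids.
+
+Return type: Variable
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
+Slices the full-sequence input at the position of the current time step to produce the input of the next decoding step, uses the input argument :code:`states` directly as the states of the next step, and produces the finished flag of each sequence by comparing the current time step with that sequence's length.
+
+Parameters:
+  - **time** (Variable) - a tensor of shape [1] provided by the caller, holding the current decoding time step. Its data type is int64.
+  - **outputs** (Variable) - a tensor variable, usually of data type float32 or float64, with shape :math:`[batch\_size, vocabulary\_size]`. It holds the logits (unnormalized log-probabilities) predicted at the current decoding step and is the same as the :code:`outputs` returned by :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())`.
+  - **states** (Variable) - a single tensor variable or a nested structure of tensor variables; the same as the :code:`new_states` returned by :code:`BasicDecoder.cell.call()`.
+  - **sample_ids** (Variable) - an int64 tensor of shape :math:`[batch\_size]`; the same as the :code:`sample_ids` returned by :code:`sample()`.
+
+Returns: a tuple :code:`(finished, next_inputs, next_states)`. :code:`next_inputs` and :code:`next_states` are each a single tensor variable or a nested structure of tensor variables whose tensors have shape :math:`[batch\_size, ...]`; :code:`next_states` is the same as the input argument :code:`states`. :code:`finished` is a bool tensor of shape :math:`[batch\_size]`.
+
+Return type: tuple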
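The following sketches how TrainingHelper's slicing and finished flags fit together, using nested lists. The helper names are hypothetical. Three rules are assumptions rather than documented behaviour:

* A sequence counts as finished once the next time index reaches its length. The page only says the time step is compared with each length.
* A zero-length sequence starts out finished.
* Slicing past the padded end clamps to the last position.

```python
from typing import List, Sequence

Example = List[List[float]]  # one sequence: [max_len, feature_dim]


def to_batch_major(inputs: List, time_major: bool) -> List[Example]:
    """[max_len, batch_size, ...] -> [batch_size, max_len, ...] when time_major."""
    if not time_major:
        return inputs
    return [list(per_example) for per_example in zip(*inputs)]


def slice_step(inputs: Sequence[Example], t: int) -> List[List[float]]:
    """Take time step t from every example (clamping past the end is assumed)."""
    return [seq[min(t, len(seq) - 1)] for seq in inputs]


def training_initialize(inputs, sequence_length):
    """(initial_inputs, initial_finished): step 0; finished only if length is 0."""
    return slice_step(inputs, 0), [n == 0 for n in sequence_length]


def training_next_inputs(inputs, sequence_length, time, states):
    """(finished, next_inputs, next_states) under teacher forcing."""
    next_time = time + 1
    finished = [next_time >= n for n in sequence_length]
    return finished, slice_step(inputs, next_time), states  # states pass through


# Two examples padded to length 3 with 1-dimensional features; real lengths 3 and 1.
trg = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]]
lengths = [3, 1]
print(training_initialize(trg, lengths))           # ([[1.0], [4.0]], [False, False])
print(training_next_inputs(trg, lengths, 0, "h"))  # ([False, True], [[2.0], [5.0]], 'h')
```

The next inputs come from `trg` rather than from sampled ids, so the decoder sees ground-truth inputs at every step. This is the teacher-forcing setup the page describes.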
--- a/doc/fluid/api_cn/data_cn/data_reader_cn/Reader_cn.rst
+++ b/doc/fluid/api_cn/data_cn/data_reader_cn/Reader_cn.rst
--- a/doc/fluid/api/data/data_reader/Reader.rst
+++ b/doc/fluid/api/data/data_reader/Reader.rst
--- a/doc/fluid/beginners_guide/coding_practice/index_cn.rst
+++ b/doc/fluid/beginners_guide/coding_practice/index_cn.rst
@@ -6,7 +6,8 @@

 .. toctree::
   :maxdepth: 1
-
+   
+   Reader.rst
   configure_simple_model/index_cn.rst
   single_node.rst
   save_load_variables.rst

--- a/doc/fluid/beginners_guide/coding_practice/index_en.rst
+++ b/doc/fluid/beginners_guide/coding_practice/index_en.rst
-############
+###############
 Coding Practice
-############
+###############

 If you have mastered the basic concepts and you expect to model and build your own network according to the actual problems, this module provides you with some details about the use of paddle for your reference:

 .. toctree::
   :maxdepth: 1

+   Reader_en.rst   
   configure_simple_model/index_en.rst
   single_node_en.rst
   test_while_training_en.rst

--- a/scripts/api_white_list.txt
+++ b/scripts/api_white_list.txt
@@ -7,3 +7,4 @@ transpiler_cn/release_memory_cn.rst
 transpiler_cn/RoundRobin_cn.rst
 optimizer_cn/Dpsgd_cn.rst
 io_cn/ComposeNotAligned_cn.rst
+dygraph_cn/DataParallel_cn.rst