Refine prepare steps doc (#1482)

* refine prepare steps doc, test=develop
* replace layers.data with fluid.data, test=develop

c3e4fc1b · Zeng Jinle · GitHub · 1d0b8615 · c3e4fc1b · c3e4fc1b
2 changed files
--- a/doc/fluid/user_guides/howto/prepare_data/prepare_steps.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/prepare_steps.rst
@@ -6,7 +6,7 @@
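 Data preparation in PaddlePaddle Fluid consists of three steps:
-Step1: Define a custom Reader to generate training/prediction data
+Step 1: Define a custom Reader to generate training/prediction data
 ###################################
 The generated data can be a Numpy Array or a LoDTensor. Depending on the form of the data a Reader returns, Readers are divided into Batch-level Readers and Sample-level Readers.
@@ -16,41 +16,28 @@ A Batch-level Reader returns one batch of data each time; a Sample-level Reader returns
 If your data is Sample-level, we provide a tool for data preprocessing and batching: :code:`Python Reader` .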
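-Step2: Define data layer variables in the network configuration
+Step 2: Define data layer variables in the network configuration
 ###################################
-Users need to use :code:`fluid.layers.data` to define data layer variables in the network. When defining a data layer variable, specify its name, data type (dtype) and shape. For example:
+Users need to use :code:`fluid.data` to define data layer variables in the network. When defining a data layer variable, specify its name, data type (dtype) and shape. For example:
 .. code-block:: python
    import paddle.fluid as fluid
-    image = fluid.layers.data(name='image', dtype='float32', shape=[28, 28])
+    image = fluid.data(name='image', dtype='float32', shape=[None, 28, 28])
-    label = fluid.layers.data(name='label', dtype='int64', shape=[1])
+    label = fluid.data(name='label', dtype='int64', shape=[None, 1])
+Here, None marks a dimension whose size is not fixed in advance. In this example, None stands for the batch size.
-Note that the shape here is the shape of a single sample. PaddlePaddle Fluid prepends -1 at dimension 0 of the shape to represent the batch_size dimension, so in this example image.shape is [-1, 28, 28],
+Step 3: Feed the data into the network for training/prediction
-label.shape is [-1, 1].
-If users do not want the framework to prepend -1 at dimension 0, this can be controlled with the append_batch_size=False argument:
-.. code-block:: python
-   import paddle.fluid as fluid
-   image = fluid.layers.data(name='image', dtype='float32', shape=[28, 28], append_batch_size=False)
-   label = fluid.layers.data(name='label', dtype='int64', shape=[1], append_batch_size=False)
-In this case, image.shape is [28, 28] and label.shape is [1].
-Step3: Feed the data into the network for training/prediction
 ###################################
-Fluid provides two methods, the asynchronous PyReader interface and the synchronous Feed method, described below:
+Fluid provides two methods, the asynchronous DataLoader interface and the synchronous Feed method, described below:
- Asynchronous PyReader interface
+- Asynchronous DataLoader interface
-Users first use :code:`fluid.io.PyReader` to define a PyReader object, and then set the data source through the decorate methods of the PyReader object.
+Users first use :code:`fluid.io.DataLoader` to define a DataLoader object, and then set the data source through the set methods of the DataLoader object.
-With the PyReader interface, data feeding runs asynchronously with model training/prediction, which is more efficient. This method is recommended.
+With the DataLoader interface, data feeding runs asynchronously with model training/prediction, which is more efficient. This method is recommended.
 - Synchronous Feed method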
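@@ -62,9 +49,9 @@ Fluid provides two methods, the asynchronous PyReader interface and the synchronous Feed method
 The two data preparation methods compare as follows:
 =================  ============================================   ============================================
-Item               Synchronous Feed method                        Asynchronous PyReader interface
+Item               Synchronous Feed method                        Asynchronous DataLoader interface
 =================  ============================================   ============================================
-API                :code:`executor.run(feed=...)`                 :code:`fluid.io.PyReader`
+API                :code:`executor.run(feed=...)`                 :code:`fluid.io.DataLoader`
 Data format        Numpy Array or LoDTensor                       Numpy Array or LoDTensor
 Data augmentation  done with other libraries on the Python side   done with other libraries on the Python side
 Speed              slow                                           fast
@@ -72,7 +59,7 @@ API     :code:`executor.run(feed=...)`          :code:`fluid.io.PyReader`
 =================  ============================================   ============================================
 How the Reader data type affects usage
-###############################
+###########################
 Depending on the Reader data type, the concrete operations in the steps above differ, as described below: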
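@@ -81,49 +68,32 @@ How the Reader data type affects usage
 If the custom Reader returns a single sample each time, users feed the data in through the following steps:
-Step1. Assemble the data
+Step 1. Assemble the data
-=============================
+================
-Call the Reader-related interfaces provided by Fluid to batch the data and do part of the data preprocessing. For details, see:
+Call the Reader-related interfaces provided by Fluid to batch the data and do part of the data preprocessing. For details, see: `Data preprocessing tools <./reader_cn.html>`_ .
-.. toctree::
+Step 2. Feed the data
-   :maxdepth: 1
+================
-   reader_cn.md
+To feed data through the asynchronous DataLoader interface, call :code:`set_sample_generator` or :code:`set_sample_list_generator` . For details, see: :ref:`user_guides_use_py_reader` .
-Step2. Feed the data
+To feed data through the synchronous Feed method, use the DataFeeder interface to convert the Reader data to LoDTensor format before feeding it into the network. For details, see :ref:`cn_api_fluid_DataFeeder` .
-=================================
-To feed data through the asynchronous PyReader interface, call :code:`decorate_sample_generator` or :code:`decorate_sample_list_generator` . For details, see:
- :ref:`user_guides_use_py_reader`
-To feed data through the synchronous Feed method, use the DataFeeder interface to convert the Reader data to LoDTensor format before feeding it into the network. For details, see :ref:`cn_api_fluid_DataFeeder`
 Reading data from a Batch-level Reader
-+++++++++++++++++++++++
++++++++++++++++++++
-Step1. Assemble the data
-=================
-Since the batches are already assembled, the requirement of Step1 is already met and you can go straight to Step2
-Step2. Feed the data
-=================================
-To feed data through the asynchronous PyReader interface, call the :code:`decorate_batch_generator` interface of PyReader. For details, see:
-.. toctree::
+Step 1. Assemble the data
-   :maxdepth: 1
+================
-   use_py_reader.rst
+Since the batches are already assembled, the requirement of Step 1 is already met and you can go straight to Step 2.
-To feed data through the synchronous Feed method, see:
+Step 2. Feed the data
+================
-.. toctree::
+To feed data through the asynchronous DataLoader interface, call the :code:`set_batch_generator` interface of DataLoader. For details, see: :ref:`user_guides_use_py_reader` .
-   :maxdepth: 1
-   feeding_data.rst
+To feed data through the synchronous Feed method, see: :ref:`user_guide_use_numpy_array_as_train_data` .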
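
The sample-level versus batch-level distinction that this file describes in Step 1 can be sketched in plain Python. This is only an illustration under stated assumptions: `sample_reader` and `batch` are hypothetical helpers written for this note, not Paddle APIs, and the data is synthetic.

```python
def sample_reader():
    """Hypothetical sample-level reader: yields one (image, label) pair at a time."""
    for i in range(5):
        image = [float(i)] * 4  # stand-in for a flattened image
        label = i % 2
        yield image, label


def batch(reader, batch_size, drop_last=False):
    """Hypothetical helper: wrap a sample-level reader into a batch-level reader."""
    def batch_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        # Emit the final, smaller batch unless the caller asked to drop it.
        if buf and not drop_last:
            yield buf
    return batch_reader


batches = list(batch(sample_reader, batch_size=2)())
batch_sizes = [len(b) for b in batches]
```

Five samples with a batch size of 2 give two full batches and one partial batch. The size of the leading dimension therefore varies between batches, which is the situation the `None` entry in the `fluid.data` shapes is meant to cover.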

--- a/doc/fluid/user_guides/howto/prepare_data/prepare_steps_en.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/prepare_steps_en.rst
@@ -4,52 +4,92 @@
 Prepare Steps
 #############
-PaddlePaddle Fluid supports two methods to feed data into networks:
+Data preparation in PaddlePaddle Fluid can be separated into 3 steps.
-1. Synchronous method - Python Reader：Firstly, use :code:`fluid.layers.data` to set up data input layer. Then, feed in the training data through :code:`executor.run(feed=...)` in :code:`fluid.Executor` or :code:`fluid.ParallelExecutor` .
+Step 1: Define a reader to generate training/testing data
+##########################################################
-2. Asynchronous method - py_reader：Firstly, use :code:`fluid.layers.py_reader` to set up data input layer. Then configure the data source with functions :code:`decorate_paddle_reader` or :code:`decorate_tensor_provider` of :code:`py_reader` . After that, call :code:`fluid.layers.read_file` to read data.
+The generated data can be a Numpy Array or a LoDTensor. Depending on the form of the data a reader returns, readers are divided into batch readers and sample readers.
+A batch reader yields one mini-batch of data at a time, while a sample reader yields one sample at a time.
+If your reader yields one sample at a time, we provide a data augmentation and batching tool for you: :code:`Python Reader` .
-Comparisons of the two methods:
+Step 2: Define data layer variables in network
+###############################################
-=========================  ====================================================   ===============================================
+Users should use :code:`fluid.data` to define data layer variables. The name, dtype and shape are required. For example:
-Aspects                                   Synchronous Python Reader                       Asynchronous py_reader
-=========================  ====================================================   ===============================================
-API interface                          :code:`executor.run(feed=...)`                 :code:`fluid.layers.py_reader`
-data type                                   Numpy Array                                Numpy Array or LoDTensor
-data augmentation          carried out by other libraries on Python end            carried out by other libraries on Python end 
-velocity                                        slow                                            rapid
-recommended applications                model debugging                                      industrial training
-=========================  ====================================================   ===============================================
-Synchronous Python Reader
+.. code-block:: python
-##########################
-Fluid provides Python Reader to feed in data.
+    import paddle.fluid as fluid
-Python Reader is a pure Python-side interface, and data feeding is synchronized with the model training/prediction process. Users can pass in data through Numpy Array. For specific operations, please refer to:
+    image = fluid.data(name='image', dtype='float32', shape=[None, 28, 28])
+    label = fluid.data(name='label', dtype='int64', shape=[None, 1])
+None marks a dimension whose size is not fixed in advance. In this example, None stands for the batch size.
-.. toctree::
+Step 3: Send the data to network for training/testing
-   :maxdepth: 1
+######################################################
-   feeding_data_en.rst
+PaddlePaddle Fluid provides 2 methods for sending data to the network: Asynchronous DataLoader API, and Synchronous Feed Method.
-Python Reader supports advanced functions like group batch, shuffle. For specific operations, please refer to：
+- Asynchronous DataLoader API
-.. toctree::
+Users should use :code:`fluid.io.DataLoader` to define a DataLoader object and then call one of its setter methods to set the data source.
-   :maxdepth: 1
+With the DataLoader API, data is sent asynchronously with network training/testing.
+This is the more efficient method and is recommended.
-   reader.md
+- Synchronous Feed Method
-Asynchronous py_reader
+Users should prepare the data to feed beforehand and send it to :code:`fluid.Executor` or :code:`fluid.ParallelExecutor` through :code:`executor.run(feed=...)` .
-########################
+Data preparation and network training/testing work synchronously, which is less efficient.
-Fluid provides asynchronous data feeding method PyReader. It is more efficient as data feeding is not synchronized with the model training/prediction process. For specific operations, please refer to：
+The 2 methods compare as follows:
-.. toctree::
+==========================  =================================   ======================================
-   :maxdepth: 1
+Comparison item                 Synchronous Feed Method              Asynchronous DataLoader API
+==========================  =================================   ======================================
+API                           :code:`executor.run(feed=...)`          :code:`fluid.io.DataLoader`
+Data type                       Numpy Array or LoDTensor                Numpy Array or LoDTensor
+Data augmentation            use Python for data augmentation       use Python for data augmentation
+Speed                                     slow                                    rapid
+Recommended applications            model debugging                        industrial training
+==========================  =================================   ======================================
-   use_py_reader_en.rst
+Choose different usages for different data formats
+###################################################
+The concrete operations in the steps above depend on the form of the data the reader returns.
+Read data from sample reader
+++++++++++++++++++++++++++++
+If the user-defined reader is a sample reader, users should follow these steps:
+Step 1. Batching
+=================
+Use the data reader interfaces in PaddlePaddle Fluid for data augmentation and batching. Please refer to `Python Reader <./reader.html>`_ for details.
+Step 2. Sending data
+=====================
+If using Asynchronous DataLoader API, please use :code:`set_sample_generator` or :code:`set_sample_list_generator` to set the data source for DataLoader. Please refer to :ref:`user_guide_use_py_reader_en` for details.
+If using Synchronous Feed Method, please use DataFeeder to convert the reader data to LoDTensor before sending to the network. Please refer to :ref:`api_fluid_DataFeeder` for details.
+Read data from batch reader
+++++++++++++++++++++++++++++
+Step 1. Batching
+=================
+Since the reader is already a batch reader, this step can be skipped.
+Step 2. Sending data
+=====================
+If using Asynchronous DataLoader API, please use :code:`set_batch_generator` to set the data source for DataLoader. Please refer to :ref:`user_guide_use_py_reader_en` for details.
+If using Synchronous Feed Method, please refer to :ref:`user_guide_use_numpy_array_as_train_data_en` for details.
\ No newline at end of file
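
The difference between the synchronous and asynchronous methods described in Step 3 can be illustrated without Paddle. The following is a conceptual sketch only: a background thread fills a bounded buffer with batches while the consumer (standing in for the training loop) reads from it. The names `prefetch`, `batch_generator` and `capacity` are hypothetical, and nothing here is meant to describe how `fluid.io.DataLoader` is actually implemented.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the data stream


def batch_generator():
    """Hypothetical batch-level reader yielding three small batches."""
    for i in range(3):
        yield [i, i + 1]


def prefetch(generator_fn, capacity=2):
    """Run generator_fn in a background thread, buffering up to `capacity` batches.

    Sketch only: errors raised inside the producer thread are not propagated.
    """
    buf = queue.Queue(maxsize=capacity)

    def producer():
        for item in generator_fn():
            buf.put(item)  # blocks while the buffer is full
        buf.put(_END)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()  # blocks until the next batch is ready
        if item is _END:
            return
        yield item


# Consumer side: a stand-in for the training loop.
consumed = [sum(b) for b in prefetch(batch_generator)]
```

In the synchronous case the consumer would call the generator directly and wait for each batch to be produced before computing on it. With the buffer, producing the next batch can overlap with consuming the current one, which is the efficiency argument the document makes for the asynchronous method.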