Complete Feeding Data

b71dafab · yuyang18 · b3617531 · b71dafab · b71dafab · b71dafab
9 changed files
--- a/source/api_guides/low_level/executor/executor.rst
+++ b/source/api_guides/low_level/executor/executor.rst
+..  _api_guide_executor:
+
 ########
 Executor
 ########
@@ -5,7 +7,7 @@ Executor
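 :code:`Executor` is the executor. PaddlePaddle Fluid offers two executors to choose from.
 :code:`Executor` implements a simple executor in which all Operators run sequentially. Users can
 drive :code:`Executor` from a Python script. By default, :code:`Executor` is single-threaded; to
-use data parallelism, see the other executor, :ref:`api_guide_low_level_parallel_executor`.
+use data parallelism, see the other executor, :ref:`api_guide_parallel_executor`.

 The code logic of :code:`Executor` is very simple. While debugging, we recommend first getting
 the model to run with :code:`Executor`, then switching to multi-device or even multi-machine computation.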
@@ -15,4 +17,4 @@ drive :code:`Executor` from a Python script. By default, :code:`Executor` 
 :ref:`api_guide_low_level_program`.

 For basic usage, see :ref:`quick_start_fit_a_line`; for the API reference, see
-:ref:`api_fluid_Executor`.
\ No newline at end of file
+:ref:`api_fluid_Executor`.
--- a/source/api_guides/low_level/executor/parallel_executor.rst
+++ b/source/api_guides/low_level/executor/parallel_executor.rst
-.. _api_guide_low_level_parallel_executor:
+.. _api_guide_parallel_executor:

 ################
 ParallelExecutor

--- a/source/api_guides/low_level/layers/io.rst
+++ b/source/api_guides/low_level/layers/io.rst
 ################
 Input and Output
-################
\ No newline at end of file
+################
+
+
+..  _api_guide_reader:
+
+Reader-related APIs
+###################
\ No newline at end of file
--- a/source/api_guides/low_level/lodtensor.rst
+++ b/source/api_guides/low_level/lodtensor.rst
+..  _api_guide_lod_tensor:
+
+#########
+LoDTensor
+#########
--- a/source/api_guides/low_level/recordio.rst
+++ b/source/api_guides/low_level/recordio.rst
+##############
+RecordIO Files
+##############
+
+
+RecordIO Conversion APIs
+########################
+
+
+
+.. _api_guide_recordio_file_format:
+
+RecordIO File Format
+####################
\ No newline at end of file
--- a/source/user_guides/howto/index.rst
+++ b/source/user_guides/howto/index.rst
@@ -3,27 +3,7 @@
 ####################


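-Overview
-########
-
-
-
-Data Preprocessing
-##################
-
-
-Configuring a Simple Network
-############################
-
-
-Training
-########
-
-
-
-Debugging
-#########
-
-Model Evaluation
-################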
+.. toctree::
+   :maxdepth: 2

+   prepare_data/index
\ No newline at end of file
--- a/source/user_guides/howto/prepare_data/feeding_data.rst
+++ b/source/user_guides/howto/prepare_data/feeding_data.rst
+
+.. _user_guide_use_numpy_array_as_train_data:
+
+###################################
+Using Numpy Arrays as Training Data
+###################################
+
+PaddlePaddle Fluid lets you configure data layers with :ref:`api_fluid_layers_data`.
+You then pass a Numpy Array, or a C++ :ref:`api_guide_lod_tensor` created directly
+from Python, through :code:`Executor.run(feed=...)` to
+:ref:`api_guide_executor` or :ref:`api_guide_parallel_executor`.
+
+Configuring Data Layers
+#######################
+
+Use :ref:`api_fluid_layers_data` to configure the data layers a neural network needs:
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+
+   image = fluid.layers.data(name="image", shape=[3, 224, 224])
+   label = fluid.layers.data(name="label", shape=[1], dtype="int64")
+
+   # use image/label as layer input
+   prediction = fluid.layers.fc(input=image, size=1000, act="softmax")
+   loss = fluid.layers.cross_entropy(input=prediction, label=label)
+   ...
+
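+In the code above, :code:`image` and :code:`label` are two input data layers created with
+:code:`fluid.layers.data`. :code:`image` holds floating-point data of shape :code:`[3, 224, 224]`;
+:code:`label` holds integer data of shape :code:`[1]`. Note the following:
+
+1. Fluid uses :code:`-1` to denote the batch size dimension and, by default, inserts :code:`-1`
+   as the first dimension of :code:`shape`. In the code above, a numpy array of shape
+   :code:`[32, 3, 224, 224]` can therefore be fed to :code:`image`. To control the position of
+   the batch size dimension yourself, set :code:`fluid.layers.data(append_batch_size=False)`.
+
+2. Fluid currently uses :code:`int64` to represent class labels.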
+
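The batch-dimension convention described above can be illustrated with plain NumPy. This is a minimal sketch, not Fluid's implementation: `full_shape` and `is_compatible` are hypothetical helpers that only mimic the documented behaviour of prepending `-1` and treating it as "any batch size".

```python
import numpy as np


def full_shape(declared_shape, append_batch_size=True):
    """Return the shape a data layer would expect (hypothetical helper).

    Assumption: mirrors the documented default of prepending -1 as the
    batch size dimension unless append_batch_size is False.
    """
    shape = list(declared_shape)
    return [-1] + shape if append_batch_size else shape


def is_compatible(array, expected_shape):
    """True if `array` matches `expected_shape`, treating -1 as a wildcard."""
    if array.ndim != len(expected_shape):
        return False
    return all(e == -1 or e == a for e, a in zip(expected_shape, array.shape))


image_shape = full_shape([3, 224, 224])
batch = np.zeros((32, 3, 224, 224), dtype="float32")
single = np.zeros((3, 224, 224), dtype="float32")

print(image_shape)
print(is_compatible(batch, image_shape))   # batch of 32 images
print(is_compatible(single, image_shape))  # missing the batch dimension
```

A check like this can be handy while debugging a feed dictionary, since a missing batch dimension is an easy mistake to make.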
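+Passing Training Data to the Executor
+#####################################
+
+Both :code:`Executor.run` and :code:`ParallelExecutor.run` accept a :code:`feed` argument.
+This argument is a Python dictionary. Its keys are the names of the data layers, such as
+:code:`image` in the code above. Its values are the corresponding numpy arrays.
+
+For example: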
+
+.. code-block:: python
+
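+   import numpy
+
+   exe = fluid.Executor(fluid.CPUPlace())
+   # initialize parameters before the first training run
+   exe.run(fluid.default_startup_program())
+   exe.run(feed={
+      "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
+      # random integer class ids in [0, 1000), matching the fc layer size above
+      "label": numpy.random.randint(0, 1000, size=(32, 1)).astype('int64')
+   })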
+
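In practice the feed dictionary is usually assembled from individual samples rather than random arrays. The sketch below uses only NumPy; `make_feed` is a hypothetical helper, and the keys `"image"` and `"label"` are assumed to match the data layer names used earlier.

```python
import numpy as np


def make_feed(samples):
    """Stack (image, label) pairs into a feed dict keyed by data layer name.

    Hypothetical helper: assumes every image has the same shape and every
    label is a single integer class id.
    """
    images = np.stack([img for img, _ in samples]).astype("float32")
    labels = np.array([[lbl] for _, lbl in samples], dtype="int64")
    return {"image": images, "label": labels}


rng = np.random.RandomState(0)
samples = [(rng.rand(3, 224, 224), int(rng.randint(0, 1000))) for _ in range(4)]
feed = make_feed(samples)

print(feed["image"].shape, feed["image"].dtype)
print(feed["label"].shape, feed["label"].dtype)
```

The resulting dictionary has the batch dimension first and the dtypes that the data layers declare, so it could be passed as the `feed` argument.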
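+Advanced Usage
+##############
+
+How to Feed Sequence Data
+-------------------------
+
+Sequence data is a special data type supported by PaddlePaddle Fluid; use :code:`LoDTensor`
+as the input data type. The user passes in all the data to be trained in one mini-batch together
+with the length of each sequence. Use :code:`fluid.create_lod_tensor` to create a :code:`LoDTensor`.
+
+When feeding sequence information, set the sequence nesting depth, :code:`lod_level`.
+For example, if the training data are sentences made up of words, :code:`lod_level=1`; if words
+form sentences and sentences in turn form paragraphs, :code:`lod_level=2`.
+
+For example: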
+
+.. code-block:: python
+
+   sentence = fluid.layers.data(name="sentence", dtype="int64", shape=[1], lod_level=1)
+
+   ...
+
+   exe.run(feed={
+     "sentence": fluid.create_lod_tensor(
+       data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
+       # length-based LoD: one inner list per nesting level
+       lod=[[4, 1, 2]],
+       place=fluid.CPUPlace()
+     )
+   })
+
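+The training data :code:`sentence` contains three samples of lengths :code:`4, 1, 2`.
+They are :code:`data[0:4]`, :code:`data[4:5]` and :code:`data[5:7]` respectively.
+
+How to Set Training Data for Each Device in ParallelExecutor
+------------------------------------------------------------
+
+When passing data to :code:`ParallelExecutor.run(feed=...)`, the user can
+explicitly specify the data for each training device (for example, each GPU).
+To do so, pass a list to the :code:`feed` argument, where every element is a dictionary.
+The keys of each dictionary are data layer names, and the values are the data for those layers.
+
+For example: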
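The relationship between the sequence lengths `4, 1, 2` and the slices `data[0:4]`, `data[4:5]`, `data[5:7]` can be reproduced with NumPy alone. This is an illustrative sketch; `lengths_to_offsets` and `split_sequences` are hypothetical helpers, not part of the Fluid API.

```python
import numpy as np


def lengths_to_offsets(lengths):
    """Convert per-sequence lengths into cumulative start/end offsets."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def split_sequences(data, lengths):
    """Slice the flattened mini-batch back into its individual sequences."""
    offsets = lengths_to_offsets(lengths)
    if offsets[-1] != len(data):
        raise ValueError("lengths sum to %d but data has %d rows"
                         % (offsets[-1], len(data)))
    return [data[b:e] for b, e in zip(offsets[:-1], offsets[1:])]


data = np.array([1, 3, 4, 5, 3, 6, 8], dtype="int64").reshape(-1, 1)
sequences = split_sequences(data, [4, 1, 2])

print(lengths_to_offsets([4, 1, 2]))
for seq in sequences:
    print(seq.ravel().tolist())
```

The sum of the lengths must equal the number of rows in the data, which is why the helper validates it before slicing.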
+
+.. code-block:: python
+
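+   # assumes a CUDA build with two visible GPUs; `loss` is the loss layer defined earlier
+   parallel_executor = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
+   parallel_executor.run(
+     fetch_list=[loss.name],
+     feed=[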
+        {
+          "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
+          "label": numpy.random.random(size=(32, 1)).astype('int64')
+        },
+        {
+          "image": numpy.random.random(size=(16, 3, 224, 224)).astype('float32'),
+          "label": numpy.random.random(size=(16, 1)).astype('int64')
+        },
+     ]
+   )
+
+In the code above, GPU0 trains on 32 samples, while GPU1 trains on 16 samples.
\ No newline at end of file
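When the data arrives as a single batch, the per-device list of dictionaries can be built by slicing. The following is a sketch under stated assumptions: `split_feed` is a hypothetical helper, and the small image shape is chosen only to keep the example light.

```python
import numpy as np


def split_feed(feed, sizes):
    """Split one feed dict into a list of per-device feed dicts.

    `sizes` gives the number of samples for each device, in device order.
    Hypothetical helper for illustration; not part of the Fluid API.
    """
    total = sum(sizes)
    for name, value in feed.items():
        if len(value) != total:
            raise ValueError("%s has %d samples, expected %d"
                             % (name, len(value), total))
    bounds = np.cumsum([0] + list(sizes))
    return [
        {name: value[begin:end] for name, value in feed.items()}
        for begin, end in zip(bounds[:-1], bounds[1:])
    ]


feed = {
    "image": np.zeros((48, 3, 8, 8), dtype="float32"),
    "label": np.zeros((48, 1), dtype="int64"),
}
per_device = split_feed(feed, sizes=[32, 16])

print([d["image"].shape[0] for d in per_device])
```

Each element of `per_device` has the same keys as the original dictionary, which is the structure the list form of `feed` expects.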
--- a/source/user_guides/howto/prepare_data/index.rst
+++ b/source/user_guides/howto/prepare_data/index.rst
+..  _user_guide_prepare_data:
+
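+##############
+Preparing Data
+##############
+
+PaddlePaddle Fluid supports two ways of feeding data. In the first, the user configures
+data input layers with :code:`fluid.layers.data` and passes the training data through
+:code:`executor.run(feed=...)` on :ref:`api_guide_executor` or :ref:`api_guide_parallel_executor`.
+In the second, the user first converts the training data to the
+:ref:`api_guide_recordio_file_format` recognized by Paddle, then configures data reading
+with :code:`fluid.layers.open_files` and :ref:`api_guide_reader`.
+
+The two approaches compare as follows: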
+
+.. _user_guide_prepare_data_comparision:
+
++-------------------+-------------------------------------+---------------------------------------+
+|                   | Feeding data                        | Using a Reader                        |
++===================+=====================================+=======================================+
+| API               | :code:`executor.run(feed=...)`      | :ref:`api_guide_reader`               |
++-------------------+-------------------------------------+---------------------------------------+
+| Data format       | Numpy Array                         | :ref:`api_guide_recordio_file_format` |
++-------------------+-------------------------------------+---------------------------------------+
+| Data augmentation | Done in Python with other libraries | Done with Fluid Operators             |
++-------------------+-------------------------------------+---------------------------------------+
+| Speed             | Slow                                | Fast                                  |
++-------------------+-------------------------------------+---------------------------------------+
+| Recommended use   | Debugging models                    | Industrial training                   |
++-------------------+-------------------------------------+---------------------------------------+
+
+For details on each approach, see:
+
+.. toctree::
+   :maxdepth: 2
+
+   feeding_data
+   use_recordio_reader
\ No newline at end of file
--- a/source/user_guides/howto/prepare_data/use_recordio_reader.rst
+++ b/source/user_guides/howto/prepare_data/use_recordio_reader.rst
+
+#####################################
+Using RecordIO Files as Training Data
+#####################################
\ No newline at end of file