Merge remote-tracking branch 'origin/develop' into doc/benchmark

6c6fac2b · dzhwinter · f295e064 · 7d5e0e20 · 6c6fac2b · f295e064
68 changed file
--- a/.gitmodules
+++ b/.gitmodules
@@ -4,3 +4,6 @@
 [submodule "book"]
 	path = book
 	url = https://github.com/PaddlePaddle/book.git
+[submodule "source/anakin"]
+	path = source/anakin
+	url = https://github.com/PaddlePaddle/Anakin
--- a/source/advanced_usage/development/index.rst
+++ b/source/advanced_usage/development/index.rst
-####################
-如何开发PaddlePaddle
-####################
-
-
-如何贡献代码
-############
-
-如何贡献文档
-############
-
-如何写新的operator
-##################
-
-CPU性能调优
-###########
-
-GPU性能调优
-###########
\ No newline at end of file
--- a/anakin @ 4e77324d
+++ b/anakin @ 4e77324d
+Subproject commit 4e77324d1e1a7c224fee320b6e8ca1cd33b434ba
--- a/source/api_guides/low_level/executor/executor.rst
+++ b/source/api_guides/low_level/executor/executor.rst
+..  _api_guide_executor:
+
 ########
 Executor
 ########
@@ -5,7 +7,7 @@ Executor
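 PaddlePaddle Fluid offers two executors to choose from. :code:`Executor` is the
 simple one: it runs all Operators sequentially, and users drive it from a
 Python script. By default :code:`Executor` is single-threaded; to use
-data parallelism, see the other executor, :ref:`api_guide_low_level_parallel_executor`.
+data parallelism, see the other executor, :ref:`api_guide_parallel_executor`.

 The logic of :code:`Executor` is very simple. When debugging, we recommend first getting
 the model to run with :code:`Executor`, then switching to multi-device or even multi-machine computation.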
@@ -15,4 +17,4 @@ Python脚本驱动 :code:`Executor` 执行。默认情况下 :code:`Executor` 
 :ref:`api_guide_low_level_program` 。

 For simple usage, see :ref:`quick_start_fit_a_line`; for the API Reference, see
-:ref:`api_fluid_Executor`.
\ No newline at end of file
+:ref:`api_fluid_Executor`.
--- a/source/api_guides/low_level/executor/parallel_executor.rst
+++ b/source/api_guides/low_level/executor/parallel_executor.rst
-.. _api_guide_low_level_parallel_executor:
+.. _api_guide_parallel_executor:

 ################
 ParallelExecutor

--- a/source/api_guides/low_level/layers/io.rst
+++ b/source/api_guides/low_level/layers/io.rst
 ################
 Input and Output
-################
\ No newline at end of file
+################
+
+
+..  _api_guide_reader:
+
+Reader-related APIs
+###################
\ No newline at end of file
--- a/source/api_guides/low_level/lodtensor.rst
+++ b/source/api_guides/low_level/lodtensor.rst
+..  _api_guide_lod_tensor:
+
+#########
+LoDTensor
+#########
--- a/source/api_guides/low_level/recordio.rst
+++ b/source/api_guides/low_level/recordio.rst
+##############
+RecordIO Files
+##############
+
+
+RecordIO Conversion API
+#######################
+
+
+
+.. _api_guide_recordio_file_format:
+
+RecordIO File Format
+####################
--- a/source/api_reference/gen_doc.sh
+++ b/source/api_reference/gen_doc.sh
 #!/bin/bash
-python gen_doc.py layers --submodules control_flow device io nn ops tensor learning_rate_scheduler detection metric tensor > layers.rst
+python gen_doc.py layers --submodules control_flow device io nn ops tensor learning_rate_scheduler detection metric_op tensor > layers.rst

 for module in data_feeder clip metrics executor initializer io nets optimizer param_attr profiler regularizer transpiler recordio_writer backward average profiler
 do

--- a/source/api_reference/layers.rst
+++ b/source/api_reference/layers.rst
@@ -1618,8 +1618,8 @@ box_coder
 ..  autofunction:: paddle.fluid.layers.box_coder
    :noindex:

-metric
-======
+metric_op
+=========

 .. _api_fluid_layers_accuracy:


--- a/source/faq.rst
+++ b/source/faq.rst
--- a/source/quick_start/theoretical_background.rst
+++ b/source/quick_start/theoretical_background.rst
--- a/source/beginners_guide/basics/tutorial/foo.rst
+++ b/source/beginners_guide/basics/tutorial/foo.rst
+###
+FAQ
+###
--- a/source/quick_start/index.rst
+++ b/source/quick_start/index.rst
+..   _quick_start:
+
 ################
 Beginner's Guide
 ################

-..  todo::
-
-    新手入门的导引文字，需要完善。
-
 ..  toctree::
    :maxdepth: 2

    install/index.rst
-    quick_start.rst
-    theoretical_background.rst
\ No newline at end of file
+    quick_start/quick_start.rst
+    basics/theoretical_background.rst
--- a/source/beginners_guide/install/build_from_source_cn.rst
+++ b/source/beginners_guide/install/build_from_source_cn.rst
+../../../paddle/doc/fluid/build_and_install/build_from_source_cn.rst
\ No newline at end of file
--- a/source/beginners_guide/install/details/foo.rst
+++ b/source/beginners_guide/install/details/foo.rst
+###
+FAQ
+###
--- a/source/beginners_guide/install/docker_install_cn.rst
+++ b/source/beginners_guide/install/docker_install_cn.rst
+../../../paddle/doc/fluid/build_and_install/docker_install_cn.rst
\ No newline at end of file
--- a/source/beginners_guide/install/index.rst
+++ b/source/beginners_guide/install/index.rst
+.. _quick_start_install:
+
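+Installation and Compilation
+============================
+
+.. _install_steps:
+
+PaddlePaddle provides several installation methods for different groups of users.
+
+Focusing on Deep Learning Model Development
+--------------------------------------------
+
+PaddlePaddle provides several Python wheel packages that can be installed with a single pip command: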
+
+.. toctree::
+	:maxdepth: 1
+
+	pip_install_cn.rst
+
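+This is the most convenient way to install. Choose the package that matches your machine configuration and operating system.
+
+Focusing on the Underlying Framework
+------------------------------------
+
+PaddlePaddle can also be installed with Docker; see the following tutorial: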
+
+.. toctree::
+	:maxdepth: 1
+
+	docker_install_cn.rst
+
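+We recommend running PaddlePaddle in Docker, which has the following advantages:
+
+- Third-party dependencies do not need to be installed separately
+- The runtime environment is easy to share, which makes problems easier to reproduce
+
+For users who need customized binaries, we also provide a way to build and install PaddlePaddle from source: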
+
+.. toctree::
+    :maxdepth: 1
+
+    build_from_source_cn.rst
+
+.. warning::
+
+	Note that this installation method downloads, compiles, and installs several third-party libraries, so the whole process takes a long time.
+
+
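+Frequently Asked Questions
+--------------------------
+
+If you run into problems during installation, first look for an answer on the following page:
+
+:ref:`Frequently asked questions <install_faq>`
+
+If the problem is still not solved, please report it to the PaddlePaddle community:
+
+`Create an issue <https://github.com/PaddlePaddle/Paddle/issues/new>`_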
--- a/source/beginners_guide/install/others/foo.rst
+++ b/source/beginners_guide/install/others/foo.rst
+###
+FAQ
+###
--- a/source/beginners_guide/install/paddleci.png
+++ b/source/beginners_guide/install/paddleci.png
+../../../paddle/doc/fluid/build_and_install/paddleci.png
\ No newline at end of file
--- a/source/beginners_guide/install/pip_install_cn.rst
+++ b/source/beginners_guide/install/pip_install_cn.rst
+../../../paddle/doc/fluid/build_and_install/pip_install_cn.rst
\ No newline at end of file
--- a/source/quick_start/fit_a_line/image/predictions.png
+++ b/source/quick_start/fit_a_line/image/predictions.png
--- a/source/quick_start/fit_a_line/image/predictions_en.png
+++ b/source/quick_start/fit_a_line/image/predictions_en.png
--- a/source/quick_start/fit_a_line/image/ranges.png
+++ b/source/quick_start/fit_a_line/image/ranges.png
--- a/source/quick_start/fit_a_line/image/ranges_en.png
+++ b/source/quick_start/fit_a_line/image/ranges_en.png
--- a/source/quick_start/fit_a_line/image/train_and_test.png
+++ b/source/quick_start/fit_a_line/image/train_and_test.png
--- a/source/quick_start/fit_a_line/index.md
+++ b/source/quick_start/fit_a_line/index.md
--- a/source/beginners_guide/quick_start/quick_start.rst
+++ b/source/beginners_guide/quick_start/quick_start.rst
+..  _quick_start_examples:
+
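+###########
+Quick Start
+###########
+
+PaddlePaddle Fluid is the new version of PaddlePaddle. It represents neural network
+computation with a syntax tree similar to that of a programming language, and it can train models across multiple GPUs, multi-core CPUs, and multi-machine clusters.
+
+Before reading this article, make sure PaddlePaddle is installed correctly; see :ref:`quick_start_install`.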
+
+
+..  toctree::
+    :maxdepth: 2
+
+    fit_a_line/index.md
+    recognize_digits/index.md
\ No newline at end of file
--- a/source/quick_start/recognize_digits/image/cnn.png
+++ b/source/quick_start/recognize_digits/image/cnn.png
--- a/source/quick_start/recognize_digits/image/cnn_en.png
+++ b/source/quick_start/recognize_digits/image/cnn_en.png
--- a/source/quick_start/recognize_digits/image/cnn_train_log.png
+++ b/source/quick_start/recognize_digits/image/cnn_train_log.png
--- a/source/quick_start/recognize_digits/image/cnn_train_log_en.png
+++ b/source/quick_start/recognize_digits/image/cnn_train_log_en.png
--- a/source/quick_start/recognize_digits/image/conv_layer.png
+++ b/source/quick_start/recognize_digits/image/conv_layer.png
--- a/source/quick_start/recognize_digits/image/infer_3.png
+++ b/source/quick_start/recognize_digits/image/infer_3.png
--- a/source/quick_start/recognize_digits/image/max_pooling.png
+++ b/source/quick_start/recognize_digits/image/max_pooling.png
--- a/source/quick_start/recognize_digits/image/max_pooling_en.png
+++ b/source/quick_start/recognize_digits/image/max_pooling_en.png
--- a/source/quick_start/recognize_digits/image/mlp.png
+++ b/source/quick_start/recognize_digits/image/mlp.png
--- a/source/quick_start/recognize_digits/image/mlp_en.png
+++ b/source/quick_start/recognize_digits/image/mlp_en.png
--- a/source/quick_start/recognize_digits/image/mlp_train_log.png
+++ b/source/quick_start/recognize_digits/image/mlp_train_log.png
--- a/source/quick_start/recognize_digits/image/mlp_train_log_en.png
+++ b/source/quick_start/recognize_digits/image/mlp_train_log_en.png
--- a/source/quick_start/recognize_digits/image/mnist_example_image.png
+++ b/source/quick_start/recognize_digits/image/mnist_example_image.png
--- a/source/quick_start/recognize_digits/image/softmax_regression.png
+++ b/source/quick_start/recognize_digits/image/softmax_regression.png
--- a/source/quick_start/recognize_digits/image/softmax_regression_en.png
+++ b/source/quick_start/recognize_digits/image/softmax_regression_en.png
--- a/source/quick_start/recognize_digits/image/softmax_train_log.png
+++ b/source/quick_start/recognize_digits/image/softmax_train_log.png
--- a/source/quick_start/recognize_digits/image/softmax_train_log_en.png
+++ b/source/quick_start/recognize_digits/image/softmax_train_log_en.png
--- a/source/quick_start/recognize_digits/image/train_and_test.png
+++ b/source/quick_start/recognize_digits/image/train_and_test.png
--- a/source/beginners_guide/quick_start/recognize_digits/index.md
+++ b/source/beginners_guide/quick_start/recognize_digits/index.md
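+# Recognize Digits
+
+The source code for this tutorial is in [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits). If you are new to PaddlePaddle, see the [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more, see the [video course](http://bit.baidu.com/course/detail/id/167.html) for this tutorial.
+
+## Background
+When we learn to program, the first program we write usually prints "Hello World". For machine learning (or deep learning), the introductory tutorial is usually handwritten digit recognition on the [MNIST](http://yann.lecun.com/exdb/mnist/) database, because handwriting recognition is a typical image classification problem that is relatively simple, and the MNIST dataset is well prepared. As a simple computer vision dataset, MNIST contains handwritten digit images like those in Figure 1 together with their labels. Each image is a 28x28 pixel matrix, and each label is one of the 10 digits 0~9. Every image has been size-normalized and centered.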
+
+<p align="center">
+<img src="image/mnist_example_image.png" width="400"><br/>
+Figure 1. Sample MNIST images
+</p>
+
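+The MNIST dataset was built from Special Database 3 (SD-3) and Special Database 1 (SD-1) of [NIST](https://www.nist.gov/srd/nist-special-database-19). SD-3 was collected among employees of the U.S. Census Bureau and SD-1 among American high school students, so SD-3 is cleaner and easier to recognize than SD-1. Yann LeCun et al. took half of SD-1 and half of SD-3 to form the MNIST training set (60000 samples) and test set (10000 samples). The training set comes from 250 different writers, and the writers of the training set and the test set are kept separate.
+
+Yann LeCun did much early research on handwritten character recognition and, in the course of it, proposed the convolutional neural network (CNN), which greatly improved the recognition of handwritten characters and made him one of the founders of the field of deep learning. Convolutional neural networks now hold a central place in deep learning: from the simple LeNet first proposed by Yann LeCun to the winning models of the ImageNet competition such as VGGNet, GoogLeNet, and ResNet (see the [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) tutorial), they have produced a series of remarkable results in image classification.
+
+Many algorithms have been tested on MNIST. In 1998, LeCun experimented with a single-layer linear classifier, a multilayer perceptron (MLP), and the multilayer convolutional neural network LeNet, steadily reducing the error on the test set (from 12% to 0.7%)\[[1](#参考文献)\]. Researchers have since run extensive experiments based on K-nearest neighbors\[[2](#参考文献)\], support vector machines (SVM)\[[3](#参考文献)\], neural networks\[[4-7](#参考文献)\], and boosting\[[8](#参考文献)\], and have used various preprocessing methods (such as deskewing, noise removal, and blurring) to improve recognition accuracy.
+
+In this tutorial we start with a simple model, softmax regression, to introduce handwritten character recognition, and then improve the model step by step.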
+
+
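+## Model Overview
+
+To train a classifier on the MNIST data, we first give some definitions before introducing the three basic image classification networks used in this tutorial:
+- $X$ is the input: an MNIST image is a $28\times28$ two-dimensional image; for computation we flatten it into a $784$-dimensional vector, $X=\left ( x_0, x_1, \dots, x_{783} \right )$.
+- $Y$ is the output: the classifier outputs 10 digit classes (0-9), $Y=\left ( y_0, y_1, \dots, y_9 \right )$, where each $y_i$ is the probability that the image belongs to digit class $i$.
+- $L$ is the true label of the image: $L=\left ( l_0, l_1, \dots, l_9 \right )$ is also 10-dimensional, with exactly one entry equal to 1 and all others 0.
+
+### Softmax Regression
+
+The simplest softmax regression model passes the input layer through one fully connected layer to obtain features, then applies the softmax function directly for multi-class classification\[[9](#参考文献)\].
+
+As the input data $X$ passes to the output layer, and before the activation is applied, it is multiplied by the weights $W$ and the bias $b$ is added:
+
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
+
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $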
+
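+For a multi-class problem with $N$ classes, we use $N$ output nodes. Softmax normalizes the $N$-dimensional result vector into $N$ real values in the range [0,1], which represent the probabilities that the sample belongs to each of the $N$ classes. Here $y_i$ is the predicted probability that the image is the digit $i$.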
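
The softmax normalization can be sketched in plain Python. This is a minimal illustration, not PaddlePaddle's implementation; the function name and the sample scores are assumptions made for the example.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp().
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-activation scores for a 3-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The largest score receives the largest probability, and the outputs always sum to 1, so they can be read as class probabilities.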
+
+For classification problems we generally use the cross-entropy loss function:
+
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
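
As a rough illustration of the cross-entropy formula, the sketch below evaluates it for a one-hot label. The function name and the probability vectors are hypothetical and chosen only for the example.

```python
import math

def cross_entropy(label, y):
    """Cross entropy between a one-hot label and predicted probabilities y."""
    # Entries with label_i == 0 contribute nothing, so they are skipped;
    # this also avoids evaluating log(0) for classes the label rules out.
    return -sum(l * math.log(p) for l, p in zip(label, y) if l > 0)

label = [0, 0, 1]
confident = cross_entropy(label, [0.05, 0.05, 0.90])
unsure = cross_entropy(label, [0.40, 0.40, 0.20])
print(confident, unsure)
```

The loss is small when the predicted probability of the true class is close to 1 and grows as that probability shrinks.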
+
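+Figure 2 shows the softmax regression network: weights are drawn as blue lines and biases as red lines, and +1 indicates that the coefficient of the bias parameter is 1.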
+
+<p align="center">
+<img src="image/softmax_regression.png" width=400><br/>
+Figure 2. Softmax regression network structure<br/>
+</p>
+
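+### Multilayer Perceptron (MLP)
+
+Softmax regression uses the simplest two-layer neural network, with only an input layer and an output layer, so its fitting capacity is limited. To achieve better recognition, we add several hidden layers between the input layer and the output layer\[[10](#参考文献)\].
+
+1.  The first hidden layer gives $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function; common choices are sigmoid, tanh, and ReLU.
+2.  The second hidden layer gives $ H_2 = \phi(W_2H_1 + b_2) $.
+3.  Finally, the output layer gives $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.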
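
The three steps above can be sketched as a forward pass in plain Python. This is an illustrative sketch under assumptions: the helper names, the tiny layer sizes, and the weight values are all hypothetical, and ReLU is used for $\phi$.

```python
import math

def affine(W, x, b):
    """Compute W x + b for a weight matrix W (list of rows), vector x, and bias b."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    peak = max(v)
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(x, params):
    """params holds three (W, b) pairs: two hidden layers and the output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(affine(W1, x, b1))         # H_1 = phi(W_1 X + b_1)
    h2 = relu(affine(W2, h1, b2))        # H_2 = phi(W_2 H_1 + b_2)
    return softmax(affine(W3, h2, b3))   # Y = softmax(W_3 H_2 + b_3)

# Tiny hypothetical network: 2 inputs -> 3 hidden -> 2 hidden -> 2 classes.
params = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, 0.7, -0.5], [-0.6, 0.3, 0.9]], [0.05, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
y = mlp_forward([1.0, 2.0], params)
print(y)
```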
+
+
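+Figure 3 shows the structure of the multilayer perceptron: weights are drawn as blue lines and biases as red lines, and +1 indicates that the coefficient of the bias parameter is 1.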
+
+<p align="center">
+<img src="image/mlp.png" width=500><br/>
+Figure 3. Multilayer perceptron network structure<br/>
+</p>
+
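+### Convolutional Neural Network (CNN)
+
+A multilayer perceptron flattens the image into a one-dimensional vector before feeding it to the network, which discards the positional and structural information in the image; a convolutional neural network makes better use of that structure. [LeNet-5](http://yann.lecun.com/exdb/lenet/) is a relatively simple convolutional neural network. Figure 4 shows its structure: the two-dimensional input image passes twice through a convolutional layer followed by a pooling layer, then through fully connected layers, and finally through a softmax classifier as the output layer. Below we mainly introduce the convolutional layer and the pooling layer.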
+
+<p align="center">
+<img src="image/cnn.png"><br/>
+Figure 4. LeNet-5 convolutional neural network structure<br/>
+</p>
+
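+#### Convolutional Layer
+
+The convolutional layer is the core building block of a convolutional neural network. In image recognition, convolution means two-dimensional convolution: a discrete two-dimensional filter (also called a convolution kernel) is convolved with a two-dimensional image. Put simply, the filter slides over every position of the image and, at each position, takes the inner product with that pixel and its neighboring pixels. Convolution is widely used in image processing, and different kernels extract different features, such as edges, lines, and corners. In a deep convolutional neural network, convolution can extract image features ranging from low-level to complex.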
+
+<p align="center">
+<img src="image/conv_layer.png" width='750'><br/>
+Figure 5. Convolutional layer<br/>
+</p>
+
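+Figure 5 gives an example of a convolution computation. The input image has size $H=5,W=5,D=3$, that is, a $5 \times 5$ color image with 3 channels (RGB, also called the depth). The example contains two groups of convolution kernels (the number of groups is denoted $K$), the filters $W_0$ and $W_1$ in the figure. In a convolution, different input channels usually use different kernels; in this example each group contains $D=3$ kernels of size $3 \times 3$ (denoted $F \times F$). The kernels slide with a stride of 2 (denoted $S$) in both the horizontal ($W$) and vertical ($H$) directions of the image, and the input is padded with 1 (denoted $P$) zero on every side: in the figure, the blue part of the input layer is the original data and the gray part is the zero padding of size 1. The convolution produces an output feature map of size $3 \times 3 \times 2$ (denoted $H_{o} \times W_{o} \times K$), that is, a 2-channel feature map of size $3 \times 3$, where $H_o$ is computed as $H_o = (H - F + 2 \times P)/S + 1$, and $W_o$ likewise. Each pixel of the output feature map is the sum of the inner products between each filter in a group and the corresponding input feature map, plus the bias $b_o$; the bias is usually shared across each output feature map. The computation of the last value, $-2$, in the output feature map $o[:,:,0]$ is shown by the formula in the lower-right corner of Figure 5.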
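
The output-size formula can be checked with a few lines of Python. The function name is an assumption for illustration; the values plugged in are the ones from the Figure 5 example.

```python
def conv_output_size(size, kernel, padding, stride):
    """Spatial output size of a convolution: (size - kernel + 2 * padding) / stride + 1."""
    span = size - kernel + 2 * padding
    if span % stride != 0:
        raise ValueError("kernel does not tile the padded input evenly")
    return span // stride + 1

# Values from the Figure 5 example: H = W = 5, F = 3, P = 1, S = 2.
h_out = conv_output_size(5, 3, 1, 2)
w_out = conv_output_size(5, 3, 1, 2)
print(h_out, w_out)
```

With these values the output is 3 in each spatial dimension, matching the $3 \times 3$ feature map described above.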
+
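+In a convolution the kernels are learnable parameters. As the example above shows, each convolutional layer has $D \times F \times F \times K$ parameters. In a multilayer perceptron, neurons are usually fully connected, so there are many parameters. A convolutional layer has far fewer, which follows from its two main properties: local connectivity and weight sharing.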
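
To make the comparison concrete, the sketch below counts weights for the Figure 5 example and for a fully connected layer with the same input and output sizes. The function names are hypothetical, and biases are left out of both counts.

```python
def conv_weight_count(depth, kernel, num_filters):
    """Number of kernel weights in one convolutional layer: D * F * F * K (biases excluded)."""
    return depth * kernel * kernel * num_filters

def fc_weight_count(num_inputs, num_outputs):
    """Number of weights in a fully connected layer (biases excluded)."""
    return num_inputs * num_outputs

# Figure 5 example: D = 3, F = 3, K = 2.
conv_weights = conv_weight_count(3, 3, 2)
# A fully connected layer mapping the same 5x5x3 input to the same 3x3x2 output.
fc_weights = fc_weight_count(5 * 5 * 3, 3 * 3 * 2)
print(conv_weights, fc_weights)
```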
+
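+- Local connectivity: each neuron is connected only to a small region of the input neurons, called the receptive field. In image convolution, neurons are locally connected in the spatial dimensions (the plane of H and W in the figure above) but fully connected along the depth. In a two-dimensional image, nearby pixels are also the most strongly correlated. Local connectivity ensures that the learned filters respond most strongly to local input features. The idea is also inspired by the visual system in biology, where neurons in the visual cortex receive information locally.
+
+- Weight sharing: the neurons in the same depth slice are computed with a shared filter. For example, in Figure 5 every neuron of $o[:,:,0]$ is computed with the same filter, $W_0$, which greatly reduces the number of parameters. Weight sharing makes sense to a degree; for example, low-level edge features of an image do not depend on where they appear in the image. In some scenarios, however, it does not: if the input images are faces, the eyes and hair appear at different positions, and we would want to learn different features at different positions (see the [Stanford course]( http://cs231n.github.io/convolutional-networks/)). Note that weights are shared only among neurons in the same depth slice. A convolutional layer usually uses several groups of kernels to extract different features, which correspond to different depth slices, and neurons in different depth slices do not share weights. In addition, the bias is shared by all neurons in the same depth slice.
+
+From the convolution computation and its properties, we can see that convolution is a linear operation and is shift-invariant, meaning the same operation is performed at every position of the image. Local connectivity and weight sharing greatly reduce the number of parameters to learn, which also makes it easier to train larger convolutional neural networks.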
+
+#### Pooling Layer
+
+<p align="center">
+<img src="image/max_pooling.png" width="400px"><br/>
+Figure 6. Pooling layer<br/>
+</p>
+
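+Pooling is a form of nonlinear downsampling. Its main purpose is to reduce computation by reducing the number of parameters in the network, and it also helps control overfitting to some extent. A pooling layer is usually added after a convolutional layer. Pooling includes max pooling, average pooling, and so on. Max pooling divides the input layer into regions with non-overlapping rectangles and outputs the maximum value within each rectangle, as shown in Figure 6.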
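
Max pooling over non-overlapping windows can be sketched as follows. The function name and the 4x4 feature map are hypothetical, and the sketch assumes a rectangular input given as a list of rows.

```python
def max_pool_2d(image, size):
    """Max pooling over non-overlapping size x size windows of a 2-D list."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values inside the current window and keep the largest.
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

# Hypothetical 4x4 feature map pooled with 2x2 windows.
feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool_2d(feature_map, 2)
print(pooled)
```

Each 2x2 window collapses to a single value, so the 4x4 input becomes a 2x2 output.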
+
+For more detail on convolutional neural networks, see the [Stanford course]( http://cs231n.github.io/convolutional-networks/ ) and the [Image Classification](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md) tutorial.
+
+### Common Activation Functions
+- Sigmoid activation function: $ f(x) = sigmoid(x) = \frac{1}{1+e^{-x}} $
+
+- Tanh activation function: $ f(x) = tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}} $
+
+In fact, tanh is just a rescaled sigmoid: the sigmoid value is doubled and then shifted down by 1, tanh(x) = 2sigmoid(2x) - 1.
+
+- ReLU activation function: $ f(x) = max(0, x) $
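
The relationship between tanh and sigmoid stated above can be checked numerically. This is a small sketch using only the standard library; the sample points are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Check tanh(x) = 2 * sigmoid(2x) - 1 at a few sample points.
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
max_gap = max(abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) for x in samples)
print(max_gap)
```

The gap stays at the level of floating-point rounding, which is consistent with the identity.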
+
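+For more detail, see the [Wikipedia article on activation functions](https://en.wikipedia.org/wiki/Activation_function).
+
+## Data
+
+PaddlePaddle provides the module `paddle.dataset.mnist` in its API to load the [MNIST](http://yann.lecun.com/exdb/mnist/) data automatically. The loaded data is stored under `/home/username/.cache/paddle/dataset/mnist`: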
+
+
+| File name               | Description                     |
+| ----------------------- | ------------------------------- |
+| train-images-idx3-ubyte | Training images, 60,000 samples |
+| train-labels-idx1-ubyte | Training labels, 60,000 samples |
+| t10k-images-idx3-ubyte  | Test images, 10,000 samples     |
+| t10k-labels-idx1-ubyte  | Test labels, 10,000 samples     |
+
+## Configuration
+
+First, import PaddlePaddle's V2 API package.
+
+```python
+import paddle.v2 as paddle
+```
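+Next, define three different classifiers:
+
+- Softmax regression: a single fully connected layer with softmax activation is enough to produce the classification result.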
+
+```python
+def softmax_regression(img):
+    predict = paddle.layer.fc(input=img,
+                              size=10,
+                              act=paddle.activation.Softmax())
+    return predict
+```
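+- Multilayer perceptron: the code below implements a multilayer perceptron with two hidden layers (fully connected layers). Both hidden layers use ReLU as the activation function, and the output layer uses Softmax.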
+
+```python
+def multilayer_perceptron(img):
+    # First fully connected layer, with ReLU activation
+    hidden1 = paddle.layer.fc(input=img, size=128, act=paddle.activation.Relu())
+    # Second fully connected layer, with ReLU activation
+    hidden2 = paddle.layer.fc(input=hidden1,
+                              size=64,
+                              act=paddle.activation.Relu())
+    # Fully connected output layer with softmax activation; its size must be 10, the number of digits
+    predict = paddle.layer.fc(input=hidden2,
+                              size=10,
+                              act=paddle.activation.Softmax())
+    return predict
+```
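+- Convolutional neural network LeNet-5: the two-dimensional input image first passes twice through a convolutional layer followed by a pooling layer, then through fully connected layers, and finally through a fully connected layer with softmax activation as the output layer.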
+
+```python
+def convolutional_neural_network(img):
+    # First convolution-pooling layer
+    conv_pool_1 = paddle.networks.simple_img_conv_pool(
+        input=img,
+        filter_size=5,
+        num_filters=20,
+        num_channel=1,
+        pool_size=2,
+        pool_stride=2,
+        act=paddle.activation.Relu())
+    # Second convolution-pooling layer
+    conv_pool_2 = paddle.networks.simple_img_conv_pool(
+        input=conv_pool_1,
+        filter_size=5,
+        num_filters=50,
+        num_channel=20,
+        pool_size=2,
+        pool_stride=2,
+        act=paddle.activation.Relu())
+    # Fully connected output layer with softmax activation; its size must be 10, the number of digits
+    predict = paddle.layer.fc(input=conv_pool_2,
+                              size=10,
+                              act=paddle.activation.Softmax())
+    return predict
+```
+
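+Next, obtain the data through `layer.data` calls, then call a classifier (three are provided here) to get the classification result. During training, a loss is computed on this result; cross entropy is a common choice for classification problems.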
+
+```python
+# This model runs on a single CPU
+paddle.init(use_gpu=False, trainer_count=1)
+
+images = paddle.layer.data(
+    name='pixel', type=paddle.data_type.dense_vector(784))
+label = paddle.layer.data(
+    name='label', type=paddle.data_type.integer_value(10))
+
+# predict = softmax_regression(images)  # softmax regression
+# predict = multilayer_perceptron(images)  # multilayer perceptron
+predict = convolutional_neural_network(images)  # LeNet-5 convolutional neural network
+
+cost = paddle.layer.classification_cost(input=predict, label=label)
+```
+
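+Then specify the training parameters.
+- Training method (optimizer): the weights are updated with the momentum optimizer `Momentum`; the parameter 0.9 means each update keeps 0.9 times the previous velocity.
+- Learning rate (learning_rate): the step size of each iteration, which affects how quickly training converges.
+- Regularization (regularization): a way to prevent the network from overfitting; L2 regularization is used here.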
+
+```python
+parameters = paddle.parameters.create(cost)
+
+optimizer = paddle.optimizer.Momentum(
+    learning_rate=0.1 / 128.0,
+    momentum=0.9,
+    regularization=paddle.optimizer.L2Regularization(rate=0.0005 * 128))
+
+trainer = paddle.trainer.SGD(cost=cost,
+                             parameters=parameters,
+                             update_equation=optimizer)
+```
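
To illustrate what momentum and L2 regularization do to a weight update, here is a sketch of one step in the common textbook form. It is an assumption-laden illustration and may differ from how PaddlePaddle computes the update internally; the function name and the sample values are hypothetical.

```python
def momentum_step(weights, grads, velocity, learning_rate, momentum, l2_rate):
    """One momentum update with L2 regularization, in the common textbook form."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        # L2 regularization adds l2_rate * w to the gradient.
        g_reg = g + l2_rate * w
        # The velocity keeps `momentum` times its previous value.
        v = momentum * v - learning_rate * g_reg
        new_velocity.append(v)
        new_weights.append(w + v)
    return new_weights, new_velocity

weights, velocity = [1.0, -2.0], [0.0, 0.0]
grads = [0.5, -0.5]  # hypothetical gradients
weights, velocity = momentum_step(weights, grads, velocity,
                                  learning_rate=0.1, momentum=0.9, l2_rate=0.0)
print(weights, velocity)
```

On later steps the retained velocity keeps pushing the weights in the direction of earlier gradients, which smooths the updates.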
+
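+Next, we start training. `paddle.dataset.mnist.train()` and `paddle.dataset.mnist.test()` provide the training and test datasets, respectively. Each of these functions returns a reader. In PaddlePaddle, a reader is a Python function that returns a Python yield generator each time it is called.
+
+The `shuffle` below is a reader decorator. It takes a reader A and returns another reader B; reader B reads `buf_size` training samples into a buffer at a time, shuffles them randomly, and then outputs them one by one.
+
+`batch` is a special decorator. Its input is a reader and its output is a batched reader: in PaddlePaddle, a reader yields one training sample at a time, while a batched reader yields one minibatch at a time.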
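
The reader and decorator pattern can be sketched in plain Python. This is not the PaddlePaddle implementation: `make_reader`, the `seed` parameter, and the toy data are assumptions for illustration, and the two decorators only mimic the behavior described above.

```python
import random

def make_reader(samples):
    """Return a reader: a function that yields one sample at a time when called."""
    def reader():
        for sample in samples:
            yield sample
    return reader

def shuffle(reader, buf_size, seed=None):
    """Reader decorator: fill a buffer of buf_size samples, shuffle it, yield one by one."""
    def shuffled_reader():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush whatever is left in the last, partially filled buffer.
        rng.shuffle(buf)
        for item in buf:
            yield item
    return shuffled_reader

def batch(reader, batch_size):
    """Reader decorator: group samples into lists of batch_size (the last may be shorter)."""
    def batched_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:
            yield current
    return batched_reader

train_reader = batch(shuffle(make_reader(range(10)), buf_size=4, seed=0), batch_size=3)
batches = list(train_reader())
print(batches)
```

Because every decorator takes a reader and returns a reader, they compose freely, which is the same shape as the `paddle.batch(paddle.reader.shuffle(...))` call used for training in this tutorial.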
+
+`event_handler_plot` can be used to plot the costs during training, as shown below:
+
+![png](./image/train_and_test.png)
+
+```python
+from paddle.v2.plot import Ploter
+
+train_title = "Train cost"
+test_title = "Test cost"
+cost_ploter = Ploter(train_title, test_title)
+
+step = 0
+
+# event_handler to plot a figure
+def event_handler_plot(event):
+    global step
+    if isinstance(event, paddle.event.EndIteration):
+        if step % 100 == 0:
+            cost_ploter.append(train_title, step, event.cost)
+            cost_ploter.plot()
+        step += 1
+    if isinstance(event, paddle.event.EndPass):
+        # save parameters
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+            trainer.save_parameter_to_tar(f)
+
+        result = trainer.test(reader=paddle.batch(
+            paddle.dataset.mnist.test(), batch_size=128))
+        cost_ploter.append(test_title, step, result.cost)
+```
+
+`event_handler` prints the training results during training:
+```python
+lists = []
+
+def event_handler(event):
+    if isinstance(event, paddle.event.EndIteration):
+        if event.batch_id % 100 == 0:
+            print("Pass %d, Batch %d, Cost %f, %s" % (
+                event.pass_id, event.batch_id, event.cost, event.metrics))
+    if isinstance(event, paddle.event.EndPass):
+        # save parameters
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+            trainer.save_parameter_to_tar(f)
+
+        result = trainer.test(reader=paddle.batch(
+            paddle.dataset.mnist.test(), batch_size=128))
+        print("Test with Pass %d, Cost %f, %s\n" % (
+            event.pass_id, result.cost, result.metrics))
+        lists.append((event.pass_id, result.cost,
+                      result.metrics['classification_error_evaluator']))
+```
+
+```python
+trainer.train(
+    reader=paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=8192),
+        batch_size=128),
+    event_handler=event_handler_plot,
+    num_passes=5)
+```
+
+Training is fully automatic. The log printed by event_handler looks like this:
+
+```
+# Pass 0, Batch 0, Cost 2.780790, {'classification_error_evaluator': 0.9453125}
+# Pass 0, Batch 100, Cost 0.635356, {'classification_error_evaluator': 0.2109375}
+# Pass 0, Batch 200, Cost 0.326094, {'classification_error_evaluator': 0.1328125}
+# Pass 0, Batch 300, Cost 0.361920, {'classification_error_evaluator': 0.1015625}
+# Pass 0, Batch 400, Cost 0.410101, {'classification_error_evaluator': 0.125}
+# Test with Pass 0, Cost 0.326659, {'classification_error_evaluator': 0.09470000118017197}
+```
+
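+After training, check the model's prediction accuracy. When trained on MNIST, softmax regression typically reaches a classification accuracy of about 92.34%, the multilayer perceptron 97.66%, and the convolutional neural network 99.20%.
+
+
+## Applying the Model
+
+The trained model can be used to classify images of handwritten digits. The following program shows how to use the paddle.infer interface for inference.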
+
+```python
+from PIL import Image
+import numpy as np
+import os
+
+def load_image(file):
+    # Convert to grayscale and resize to the 28x28 input the network expects
+    im = Image.open(file).convert('L')
+    im = im.resize((28, 28), Image.ANTIALIAS)
+    im = np.array(im).astype(np.float32).flatten()
+    # Scale pixel values from [0, 255] to [-1, 1]
+    im = im / 255.0 * 2.0 - 1.0
+    return im
+
+test_data = []
+cur_dir = os.getcwd()
+test_data.append((load_image(cur_dir + '/image/infer_3.png'),))
+
+probs = paddle.infer(
+    output_layer=predict, parameters=parameters, input=test_data)
+lab = np.argsort(-probs) # probs and lab are the results of one batch data
+print("Label of image/infer_3.png is: %d" % lab[0][0])
+```
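
The two numeric steps in the program above, scaling pixels to [-1, 1] and ranking classes by probability, can be tried without PaddlePaddle or an image file. The sketch below uses plain Python; the function names and the probability vector are hypothetical.

```python
def normalize_pixels(pixels):
    """Scale 8-bit grayscale values from [0, 255] to [-1.0, 1.0]."""
    return [p / 255.0 * 2.0 - 1.0 for p in pixels]

def rank_labels(probs):
    """Return class indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: -probs[i])

scaled = normalize_pixels([0, 255])
# Hypothetical probabilities for the ten digit classes.
ranking = rank_labels([0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.02, 0.02, 0.02, 0.01])
print(scaled, ranking[0])
```

The first entry of the ranking plays the role of `lab[0][0]` in the program above: it is the index of the most probable digit.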
+
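+## Summary
+
+The softmax regression, multilayer perceptron, and convolutional neural network in this tutorial are the most basic deep learning models. The more complex networks in later chapters are derived from them, so these models will be very helpful for later study. We also observed that moving from the simplest softmax regression to the slightly more complex convolutional neural network greatly improves recognition accuracy on MNIST, thanks to the local connectivity and weight sharing of convolutional layers. When studying new models later, we hope you will likewise look for the key ideas that make a new model better than the old one. This tutorial also introduced the basic workflow of building a model in PaddlePaddle, from writing the dataprovider and constructing the network layers to training and prediction. Once familiar with this workflow, you can use your own data, define your own network, and complete your own training and prediction tasks.
+
+<a name="参考文献"></a>
+## References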
+
+1. LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. ["Gradient-based learning applied to document recognition."](http://ieeexplore.ieee.org/abstract/document/726791/) Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
+2. Wejéus, Samuel. ["A Neural Network Approach to Arbitrary Symbol Recognition on Modern Smartphones."](http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A753279&dswid=-434) (2014).
+3. Decoste, Dennis, and Bernhard Schölkopf. ["Training invariant support vector machines."](http://link.springer.com/article/10.1023/A:1012454411458) Machine learning 46, no. 1-3 (2002): 161-190.
+4. Simard, Patrice Y., David Steinkraus, and John C. Platt. ["Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.8494&rep=rep1&type=pdf) In ICDAR, vol. 3, pp. 958-962. 2003.
+5. Salakhutdinov, Ruslan, and Geoffrey E. Hinton. ["Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure."](http://www.jmlr.org/proceedings/papers/v2/salakhutdinov07a/salakhutdinov07a.pdf) In AISTATS, vol. 11. 2007.
+6. Cireşan, Dan Claudiu, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. ["Deep, big, simple neural nets for handwritten digit recognition."](http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00052) Neural computation 22, no. 12 (2010): 3207-3220.
+7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
+8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
+9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
+10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.
+
+<br/>
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
--- a/source/faq/faq.rst
+++ b/source/faq/faq.rst
+###
+FAQ
+###
--- a/source/index.rst
+++ b/source/index.rst
@@ -14,7 +14,7 @@
 ..  toctree::
    :maxdepth: 1

-    quick_start/index.rst
+    beginners_guide/index.rst
    user_guides/index.rst
    advanced_usage/index.rst
    api_guides/index.rst

--- a/source/mobile/foo.rst
+++ b/source/mobile/foo.rst
+###
+FAQ
+###
--- a/source/quick_start/install/build_from_source_cn.rst
+++ b/source/quick_start/install/build_from_source_cn.rst
-../../../paddle/doc/fluid/build_and_install/build_from_source_cn.rst
\ No newline at end of file
--- a/source/quick_start/install/docker_install_cn.rst
+++ b/source/quick_start/install/docker_install_cn.rst
-../../../paddle/doc/fluid/build_and_install/docker_install_cn.rst
\ No newline at end of file
--- a/source/quick_start/install/index.rst
+++ b/source/quick_start/install/index.rst
-../../../paddle/doc/fluid/build_and_install/index_cn.rst
\ No newline at end of file
--- a/source/quick_start/install/paddleci.png
+++ b/source/quick_start/install/paddleci.png
-../../../paddle/doc/fluid/build_and_install/paddleci.png
\ No newline at end of file
--- a/source/quick_start/install/pip_install_cn.rst
+++ b/source/quick_start/install/pip_install_cn.rst
-../../../paddle/doc/fluid/build_and_install/pip_install_cn.rst
\ No newline at end of file
--- a/source/quick_start/quick_start.rst
+++ b/source/quick_start/quick_start.rst
-########
-快速入门
-########
-
-..  todo::
-
-    概述
-
-..  toctree::
-    :maxdepth: 2
-
-    fit_a_line/index.md
-    recognize_digits/index.md
\ No newline at end of file
--- a/source/quick_start/recognize_digits/index.md
+++ b/source/quick_start/recognize_digits/index.md
-# 识别数字
-
-本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/167.html)。
-
-## 背景介绍
-当我们学习编程的时候，编写的第一个程序一般是实现打印"Hello World"。而机器学习（或深度学习）的入门教程，一般都是 [MNIST](http://yann.lecun.com/exdb/mnist/) 数据库上的手写识别问题。原因是手写识别属于典型的图像分类问题，比较简单，同时MNIST数据集也很完备。MNIST数据集作为一个简单的计算机视觉数据集，包含一系列如图1所示的手写数字图片和对应的标签。图片是28x28的像素矩阵，标签则对应着0~9的10个数字。每张图片都经过了大小归一化和居中处理。
-
-<p align="center">
-<img src="image/mnist_example_image.png" width="400"><br/>
-图1. MNIST图片示例
-</p>
-
-MNIST数据集是从 [NIST](https://www.nist.gov/srd/nist-special-database-19) 的Special Database 3（SD-3）和Special Database 1（SD-1）构建而来。由于SD-3是由美国人口调查局的员工进行标注，SD-1是由美国高中生进行标注，因此SD-3比SD-1更干净也更容易识别。Yann LeCun等人从SD-1和SD-3中各取一半作为MNIST的训练集（60000条数据）和测试集（10000条数据），其中训练集来自250位不同的标注员，此外还保证了训练集和测试集的标注员是不完全相同的。
-
-Yann LeCun早先在手写字符识别上做了很多研究，并在研究过程中提出了卷积神经网络（Convolutional Neural Network），大幅度地提高了手写字符的识别能力，也因此成为了深度学习领域的奠基人之一。如今的深度学习领域，卷积神经网络占据了至关重要的地位，从最早Yann LeCun提出的简单LeNet，到如今ImageNet大赛上的优胜模型VGGNet、GoogLeNet、ResNet等（请参见[图像分类](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) 教程），人们在图像分类领域，利用卷积神经网络得到了一系列惊人的结果。
-
-有很多算法在MNIST上进行实验。1998年，LeCun分别用单层线性分类器、多层感知器（Multilayer Perceptron, MLP）和多层卷积神经网络LeNet进行实验，使得测试集上的误差不断下降（从12%下降到0.7%）\[[1](#参考文献)\]。此后，科学家们又基于K近邻（K-Nearest Neighbors）算法\[[2](#参考文献)\]、支持向量机（SVM）\[[3](#参考文献)\]、神经网络\[[4-7](#参考文献)\]和Boosting方法\[[8](#参考文献)\]等做了大量实验，并采用多种预处理方法（如去除歪曲、去噪、模糊等）来提高识别的准确率。
-
-本教程中，我们从简单的模型Softmax回归开始，带大家入门手写字符识别，并逐步进行模型优化。
-
-
-## 模型概览
-
-基于MNIST数据训练一个分类器，在介绍本教程使用的三个基本图像分类网络前，我们先给出一些定义：
- $X$是输入：MNIST图片是$28\times28$ 的二维图像，为了进行计算，我们将其转化为$784$维向量，即$X=\left ( x_0, x_1, \dots, x_{783} \right )$。
- $Y$是输出：分类器的输出是10类数字（0-9），即$Y=\left ( y_0, y_1, \dots, y_9 \right )$，每一维$y_i$代表图片分类为第$i$类数字的概率。
- $L$是图片的真实标签：$L=\left ( l_0, l_1, \dots, l_9 \right )$也是10维，但只有一维为1，其他都为0。
-
-### Softmax回归(Softmax Regression)
-
-最简单的Softmax回归模型是先将输入层经过一个全连接层得到的特征，然后直接通过softmax 函数进行多分类\[[9](#参考文献)\]。
-
-输入层的数据$X$传到输出层，在激活操作之前，会乘以相应的权重 $W$ ，并加上偏置变量 $b$ ，具体如下：
-
-$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
-
-其中 $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
-
-对于有 $N$ 个类别的多分类问题，指定 $N$ 个输出节点，$N$ 维结果向量经过softmax将归一化为 $N$ 个[0,1]范围内的实数值，分别表示该样本属于这 $N$ 个类别的概率。此处的 $y_i$ 即对应该图片为数字 $i$ 的预测概率。
-
-在分类问题中，我们一般采用交叉熵代价损失函数（cross entropy），公式如下：
-
-$$  \text{crossentropy}(label, y) = -\sum_i label_ilog(y_i) $$
-
-图2为softmax回归的网络图，图中权重用蓝线表示、偏置用红线表示、+1代表偏置参数的系数为1。
-
-<p align="center">
-<img src="image/softmax_regression.png" width=400><br/>
-图2. softmax回归网络结构图<br/>
-</p>
-
-### 多层感知器(Multilayer Perceptron, MLP)
-
-Softmax回归模型采用了最简单的两层神经网络，即只有输入层和输出层，因此其拟合能力有限。为了达到更好的识别效果，我们考虑在输入层和输出层中间加上若干个隐藏层\[[10](#参考文献)\]。
-
-1.  经过第一个隐藏层，可以得到 $ H_1 = \phi(W_1X + b_1) $，其中$\phi$代表激活函数，常见的有sigmoid、tanh或ReLU等函数。
-2.  经过第二个隐藏层，可以得到 $ H_2 = \phi(W_2H_1 + b_2) $。
-3.  最后，再经过输出层，得到的$Y=\text{softmax}(W_3H_2 + b_3)$，即为最后的分类结果向量。
-
-
-图3为多层感知器的网络结构图，图中权重用蓝线表示、偏置用红线表示、+1代表偏置参数的系数为1。
-
-<p align="center">
-<img src="image/mlp.png" width=500><br/>
-图3. 多层感知器网络结构图<br/>
-</p>
-
-### 卷积神经网络(Convolutional Neural Network, CNN)
-
-在多层感知器模型中，将图像展开成一维向量输入到网络中，忽略了图像的位置和结构信息，而卷积神经网络能够更好的利用图像的结构信息。[LeNet-5](http://yann.lecun.com/exdb/lenet/)是一个较简单的卷积神经网络。图4显示了其结构：输入的二维图像，先经过两次卷积层到池化层，再经过全连接层，最后使用softmax分类作为输出层。下面我们主要介绍卷积层和池化层。
-
-<p align="center">
-<img src="image/cnn.png"><br/>
-图4. LeNet-5卷积神经网络结构<br/>
-</p>
-
-#### 卷积层
-
-卷积层是卷积神经网络的核心基石。在图像识别里我们提到的卷积是二维卷积，即离散二维滤波器（也称作卷积核）与二维图像做卷积操作，简单的讲是二维滤波器滑动到二维图像上所有位置，并在每个位置上与该像素点及其领域像素点做内积。卷积操作被广泛应用与图像处理领域，不同卷积核可以提取不同的特征，例如边沿、线性、角等特征。在深层卷积神经网络中，通过卷积操作可以提取出图像低级到复杂的特征。
-
-<p align="center">
-<img src="image/conv_layer.png" width='750'><br/>
-图5. 卷积层图片<br/>
-</p>
-
-图5给出一个卷积计算过程的示例图，输入图像大小为$H=5,W=5,D=3$，即$5 \times 5$大小的3通道（RGB，也称作深度）彩色图像。这个示例图中包含两（用$K$表示）组卷积核，即图中滤波器$W_0$和$W_1$。在卷积计算中，通常对不同的输入通道采用不同的卷积核，如图示例中每组卷积核包含（$D=3）$个$3 \times 3$（用$F \times F$表示）大小的卷积核。另外，这个示例中卷积核在图像的水平方向（$W$方向）和垂直方向（$H$方向）的滑动步长为2（用$S$表示）；对输入图像周围各填充1（用$P$表示）个0，即图中输入层原始数据为蓝色部分，灰色部分是进行了大小为1的扩展，用0来进行扩展。经过卷积操作得到输出为$3 \times 3 \times 2$（用$H_{o} \times W_{o} \times K$表示）大小的特征图，即$3 \times 3$大小的2通道特征图，其中$H_o$计算公式为：$H_o = (H - F + 2 \times P)/S + 1$，$W_o$同理。 而输出特征图中的每个像素，是每组滤波器与输入图像每个特征图的内积再求和，再加上偏置$b_o$，偏置通常对于每个输出特征图是共享的。输出特征图$o[:,:,0]$中的最后一个$-2$计算如图5右下角公式所示。
-
-在卷积操作中卷积核是可学习的参数，经过上面示例介绍，每层卷积的参数大小为$D \times F \times F \times K$。在多层感知器模型中，神经元通常是全部连接，参数较多。而卷积层的参数较少，这也是由卷积层的主要特性即局部连接和共享权重所决定。
-
- 局部连接：每个神经元仅与输入神经元的一块区域连接，这块局部区域称作感受野（receptive field）。在图像卷积操作中，即神经元在空间维度（spatial dimension，即上图示例H和W所在的平面）是局部连接，但在深度上是全部连接。对于二维图像本身而言，也是局部像素关联较强。这种局部连接保证了学习后的过滤器能够对于局部的输入特征有最强的响应。局部连接的思想，也是受启发于生物学里面的视觉系统结构，视觉皮层的神经元就是局部接受信息的。
-
- 权重共享：计算同一个深度切片的神经元时采用的滤波器是共享的。例如图4中计算$o[:,:,0]$的每个每个神经元的滤波器均相同，都为$W_0$，这样可以很大程度上减少参数。共享权重在一定程度上讲是有意义的，例如图片的底层边缘特征与特征在图中的具体位置无关。但是在一些场景中是无意的，比如输入的图片是人脸，眼睛和头发位于不同的位置，希望在不同的位置学到不同的特征 (参考[斯坦福大学公开课]( http://cs231n.github.io/convolutional-networks/))。请注意权重只是对于同一深度切片的神经元是共享的，在卷积层，通常采用多组卷积核提取不同特征，即对应不同深度切片的特征，不同深度切片的神经元权重是不共享。另外，偏重对同一深度切片的所有神经元都是共享的。
-
-通过介绍卷积计算过程及其特性，可以看出卷积是线性操作，并具有平移不变性（shift-invariant），平移不变性即在图像每个位置执行相同的操作。卷积层的局部连接和权重共享使得需要学习的参数大大减小，这样也有利于训练较大卷积神经网络。
-
-#### 池化层
-
-<p align="center">
-<img src="image/max_pooling.png" width="400px"><br/>
-图6. 池化层图片<br/>
-</p>
-
-池化是非线性下采样的一种形式，主要作用是通过减少网络的参数来减小计算量，并且能够在一定程度上控制过拟合。通常在卷积层的后面会加上一个池化层。池化包括最大池化、平均池化等。其中最大池化是用不重叠的矩形框将输入层分成不同的区域，对于每个矩形框的数取最大值作为输出层，如图6所示。
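
The max-pooling step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PaddlePaddle's implementation: `max_pool_2d` is a hypothetical helper, the input is a list of rows, and rows or columns that do not fill a whole window are dropped.

```python
def max_pool_2d(image, pool_size):
    """Max pooling over non-overlapping pool_size x pool_size windows.

    `image` is a list of equal-length rows; trailing rows or columns that do
    not fill a whole window are dropped (an assumption of this sketch).
    """
    rows = len(image) // pool_size
    cols = len(image[0]) // pool_size
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect every value inside the current window.
            window = [
                image[r * pool_size + i][c * pool_size + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            out_row.append(max(window))
        output.append(out_row)
    return output


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool_2d(feature_map, 2)
print(pooled)  # [[6, 8], [3, 4]]
```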
-
-更详细的关于卷积神经网络的具体知识可以参考[斯坦福大学公开课]( http://cs231n.github.io/convolutional-networks/ )和[图像分类](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md)教程。
-
-### 常见激活函数介绍  
- sigmoid激活函数： $ f(x) = sigmoid(x) = \frac{1}{1+e^{-x}} $
-
- tanh激活函数： $ f(x) = tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}} $
-
-  实际上，tanh函数只是规模变化的sigmoid函数，将sigmoid函数值放大2倍之后再向下平移1个单位：tanh(x) = 2sigmoid(2x) - 1 。
-
- ReLU激活函数： $ f(x) = max(0, x) $
-
-更详细的介绍请参考[维基百科激活函数](https://en.wikipedia.org/wiki/Activation_function)。
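
The relation between tanh and sigmoid stated above, tanh(x) = 2sigmoid(2x) - 1, can be checked numerically. The snippet below is a small sketch using only the standard library; the helper names `sigmoid` and `relu` are defined here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    rescaled = 2.0 * sigmoid(2.0 * x) - 1.0
    # The two values agree up to floating-point rounding.
    assert abs(math.tanh(x) - rescaled) < 1e-12
    print(x, math.tanh(x), rescaled, relu(x))
```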
-
-## 数据介绍
-
-PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mnist/)数据的模块`paddle.dataset.mnist`。加载后的数据位于`/home/username/.cache/paddle/dataset/mnist`下：
-
-
-|         文件名称          |        说明             |
-| ----------------------- | ----------------------- |
-| train-images-idx3-ubyte | 训练数据图片，60,000条数据 |
-| train-labels-idx1-ubyte | 训练数据标签，60,000条数据 |
-| t10k-images-idx3-ubyte | 测试数据图片，10,000条数据 |
-| t10k-labels-idx1-ubyte | 测试数据标签，10,000条数据 |
-
-## 配置说明
-
-首先，加载PaddlePaddle的V2 api包。
-
-```python
-import paddle.v2 as paddle
-```
-其次，定义三个不同的分类器：
-
- Softmax回归：只通过一层简单的以softmax为激活函数的全连接层，就可以得到分类的结果。
-
-```python
-def softmax_regression(img):
-    predict = paddle.layer.fc(input=img,
-                              size=10,
-                              act=paddle.activation.Softmax())
-    return predict
-```
- 多层感知器：下面代码实现了一个含有两个隐藏层（即全连接层）的多层感知器。其中两个隐藏层的激活函数均采用ReLU，输出层的激活函数用Softmax。
-
-```python
-def multilayer_perceptron(img):
-    # 第一个全连接层，激活函数为ReLU
-    hidden1 = paddle.layer.fc(input=img, size=128, act=paddle.activation.Relu())
-    # 第二个全连接层，激活函数为ReLU
-    hidden2 = paddle.layer.fc(input=hidden1,
-                              size=64,
-                              act=paddle.activation.Relu())
-    # 以softmax为激活函数的全连接输出层，输出层的大小必须为数字的个数10
-    predict = paddle.layer.fc(input=hidden2,
-                              size=10,
-                              act=paddle.activation.Softmax())
-    return predict
-```
- 卷积神经网络LeNet-5: 输入的二维图像，首先经过两次卷积层到池化层，再经过全连接层，最后使用以softmax为激活函数的全连接层作为输出层。
-
-```python
-def convolutional_neural_network(img):
-    # 第一个卷积-池化层
-    conv_pool_1 = paddle.networks.simple_img_conv_pool(
-        input=img,
-        filter_size=5,
-        num_filters=20,
-        num_channel=1,
-        pool_size=2,
-        pool_stride=2,
-        act=paddle.activation.Relu())
-    # 第二个卷积-池化层
-    conv_pool_2 = paddle.networks.simple_img_conv_pool(
-        input=conv_pool_1,
-        filter_size=5,
-        num_filters=50,
-        num_channel=20,
-        pool_size=2,
-        pool_stride=2,
-        act=paddle.activation.Relu())
-    # 以softmax为激活函数的全连接输出层，输出层的大小必须为数字的个数10
-    predict = paddle.layer.fc(input=conv_pool_2,
-                              size=10,
-                              act=paddle.activation.Softmax())
-    return predict
-```
-
-接着，通过`layer.data`调用来获取数据，然后调用分类器（这里我们提供了三个不同的分类器）得到分类结果。训练时，对该结果计算其损失函数，分类问题常常选择交叉熵损失函数。
-
-```python
-# 该模型运行在单个CPU上
-paddle.init(use_gpu=False, trainer_count=1)
-
-images = paddle.layer.data(
-    name='pixel', type=paddle.data_type.dense_vector(784))
-label = paddle.layer.data(
-    name='label', type=paddle.data_type.integer_value(10))
-
-# predict = softmax_regression(images) # Softmax回归
-# predict = multilayer_perceptron(images) #多层感知器
-predict = convolutional_neural_network(images) #LeNet5卷积神经网络
-
-cost = paddle.layer.classification_cost(input=predict, label=label)
-```
-
-然后，指定训练相关的参数。
- 训练方法（optimizer)： 代表训练过程在更新权重时采用动量优化器 `Momentum` ，其中参数0.9代表动量优化每次保持前一次速度的0.9倍。
- 训练速度（learning_rate）： 迭代的速度，与网络的训练收敛速度有关系。
- 正则化（regularization）： 是防止网络过拟合的一种手段，此处采用L2正则化。
-
-```python
-parameters = paddle.parameters.create(cost)
-
-optimizer = paddle.optimizer.Momentum(
-    learning_rate=0.1 / 128.0,
-    momentum=0.9,
-    regularization=paddle.optimizer.L2Regularization(rate=0.0005 * 128))
-
-trainer = paddle.trainer.SGD(cost=cost,
-                             parameters=parameters,
-                             update_equation=optimizer)
-```
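
To make the role of the momentum and L2 regularization settings above more concrete, here is a sketch of one textbook update step on a single scalar weight. It is an assumption-laden illustration: `momentum_step` is a hypothetical helper, and the exact formulation inside `paddle.optimizer.Momentum` may differ.

```python
def momentum_step(weight, velocity, grad, learning_rate, momentum, l2_rate):
    """One scalar update of SGD with momentum and L2 regularization."""
    # L2 regularization adds l2_rate * weight to the gradient.
    regularized_grad = grad + l2_rate * weight
    # Keep `momentum` times the previous velocity, then step against the gradient.
    velocity = momentum * velocity - learning_rate * regularized_grad
    return weight + velocity, velocity


# Two steps with a constant gradient and no regularization.
w, v = 1.0, 0.0
w, v = momentum_step(w, v, grad=0.5, learning_rate=0.1, momentum=0.9, l2_rate=0.0)
w, v = momentum_step(w, v, grad=0.5, learning_rate=0.1, momentum=0.9, l2_rate=0.0)
print(w, v)
```

With `momentum=0.9`, the second step reuses 0.9 of the first step's velocity, so it moves further than the first.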
-
-Next, we start training. `paddle.dataset.mnist.train()` and `paddle.dataset.mnist.test()` provide the training and test datasets. Each returns a reader: in PaddlePaddle, a reader is a Python function that returns a Python generator each time it is called.
-
-下面`shuffle`是一个reader decorator，它接受一个reader A，返回另一个reader B —— reader B 每次读入`buffer_size`条训练数据到一个buffer里，然后随机打乱其顺序，并且逐条输出。
-
-`batch`是一个特殊的decorator，它的输入是一个reader，输出是一个batched reader —— 在PaddlePaddle里，一个reader每次yield一条训练数据，而一个batched reader每次yield一个minibatch。
-
-`event_handler_plot`可以用来在训练过程中画图如下：
-
-![png](./image/train_and_test.png)
-
-```python
-from paddle.v2.plot import Ploter
-
-train_title = "Train cost"
-test_title = "Test cost"
-cost_ploter = Ploter(train_title, test_title)
-
-step = 0
-
-# event_handler to plot a figure
-def event_handler_plot(event):
-    global step
-    if isinstance(event, paddle.event.EndIteration):
-        if step % 100 == 0:
-            cost_ploter.append(train_title, step, event.cost)
-            cost_ploter.plot()
-        step += 1
-    if isinstance(event, paddle.event.EndPass):
-        # save parameters
-        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
-            trainer.save_parameter_to_tar(f)
-
-        result = trainer.test(reader=paddle.batch(
-            paddle.dataset.mnist.test(), batch_size=128))
-        cost_ploter.append(test_title, step, result.cost)
-```
-
-`event_handler` 用来在训练过程中输出训练结果
-```python
-lists = []
-
-def event_handler(event):
-    if isinstance(event, paddle.event.EndIteration):
-        if event.batch_id % 100 == 0:
-            print "Pass %d, Batch %d, Cost %f, %s" % (
-                event.pass_id, event.batch_id, event.cost, event.metrics)
-    if isinstance(event, paddle.event.EndPass):
-        # save parameters
-        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
-            trainer.save_parameter_to_tar(f)
-
-        result = trainer.test(reader=paddle.batch(
-            paddle.dataset.mnist.test(), batch_size=128))
-        print "Test with Pass %d, Cost %f, %s\n" % (
-            event.pass_id, result.cost, result.metrics)
-        lists.append((event.pass_id, result.cost,
-                      result.metrics['classification_error_evaluator']))
-```
-
-```python
-trainer.train(
-    reader=paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.train(), buf_size=8192),
-        batch_size=128),
-    event_handler=event_handler_plot,
-    num_passes=5)
-```
-
-训练过程是完全自动的，event_handler里打印的日志类似如下所示：
-
-```
-# Pass 0, Batch 0, Cost 2.780790, {'classification_error_evaluator': 0.9453125}
-# Pass 0, Batch 100, Cost 0.635356, {'classification_error_evaluator': 0.2109375}
-# Pass 0, Batch 200, Cost 0.326094, {'classification_error_evaluator': 0.1328125}
-# Pass 0, Batch 300, Cost 0.361920, {'classification_error_evaluator': 0.1015625}
-# Pass 0, Batch 400, Cost 0.410101, {'classification_error_evaluator': 0.125}
-# Test with Pass 0, Cost 0.326659, {'classification_error_evaluator': 0.09470000118017197}
-```
-
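-After training, check the model's prediction accuracy. On MNIST, softmax regression typically reaches about 92.34% classification accuracy, the multilayer perceptron 97.66%, and the convolutional neural network 99.20%.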
-
-
-## 应用模型
-
-可以使用训练好的模型对手写体数字图片进行分类，下面程序展示了如何使用paddle.infer接口进行推断。
-
-```python
-from PIL import Image
-import numpy as np
-import os
-def load_image(file):
-    im = Image.open(file).convert('L')
-    im = im.resize((28, 28), Image.ANTIALIAS)
-    im = np.array(im).astype(np.float32).flatten()
-    im = im / 255.0 * 2.0 - 1.0
-    return im
-
-test_data = []
-cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/infer_3.png'),))
-
-probs = paddle.infer(
-    output_layer=predict, parameters=parameters, input=test_data)
-lab = np.argsort(-probs) # probs and lab are the results of one batch data
-print "Label of image/infer_3.png is: %d" % lab[0][0]
-```
-
-## 总结
-
-本教程的softmax回归、多层感知器和卷积神经网络是最基础的深度学习模型，后续章节中复杂的神经网络都是从它们衍生出来的，因此这几个模型对之后的学习大有裨益。同时，我们也观察到从最简单的softmax回归变换到稍复杂的卷积神经网络的时候，MNIST数据集上的识别准确率有了大幅度的提升，原因是卷积层具有局部连接和共享权重的特性。在之后学习新模型的时候，希望大家也要深入到新模型相比原模型带来效果提升的关键之处。此外，本教程还介绍了PaddlePaddle模型搭建的基本流程，从dataprovider的编写、网络层的构建，到最后的训练和预测。对这个流程熟悉以后，大家就可以用自己的数据，定义自己的网络模型，并完成自己的训练和预测任务了。
-
-## 参考文献
-
-1. LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. ["Gradient-based learning applied to document recognition."](http://ieeexplore.ieee.org/abstract/document/726791/) Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
-2. Wejéus, Samuel. ["A Neural Network Approach to Arbitrary SymbolRecognition on Modern Smartphones."](http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A753279&dswid=-434) (2014).
-3. Decoste, Dennis, and Bernhard Schölkopf. ["Training invariant support vector machines."](http://link.springer.com/article/10.1023/A:1012454411458) Machine learning 46, no. 1-3 (2002): 161-190.
-4. Simard, Patrice Y., David Steinkraus, and John C. Platt. ["Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.8494&rep=rep1&type=pdf) In ICDAR, vol. 3, pp. 958-962. 2003.
-5. Salakhutdinov, Ruslan, and Geoffrey E. Hinton. ["Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure."](http://www.jmlr.org/proceedings/papers/v2/salakhutdinov07a/salakhutdinov07a.pdf) In AISTATS, vol. 11. 2007.
-6. Cireşan, Dan Claudiu, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. ["Deep, big, simple neural nets for handwritten digit recognition."](http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00052) Neural computation 22, no. 12 (2010): 3207-3220.
-7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
-8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
-9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
-10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.
-
-<br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">本教程</span> 由 <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> 创作，采用 <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">知识共享 署名-相同方式共享 4.0 国际 许可协议</a>进行许可。
--- a/source/user_guides/howto/index.rst
+++ b/source/user_guides/howto/index.rst
@@ -3,27 +3,7 @@
 ####################


-概述
-####
-
-
-
-数据预处理
-##########
-
-
-配置简单的网络
-##############
-
-
-训练
-####
-
-
-
-调试
-####
-
-模型评估
-########
+.. toctree::
+   :maxdepth: 2

+   prepare_data/index
\ No newline at end of file
--- a/source/user_guides/howto/modification/foo.rst
+++ b/source/user_guides/howto/modification/foo.rst
+###
+FAQ
+###
--- a/source/user_guides/howto/prepare_data/feeding_data.rst
+++ b/source/user_guides/howto/prepare_data/feeding_data.rst
+.. _user_guide_use_numpy_array_as_train_data:
+
+###########################
+使用Numpy Array作为训练数据
+###########################
+
+PaddlePaddle Fluid支持使用 :ref:`api_fluid_layers_data` 配置数据层；
+再使用 Numpy Array 或者直接使用Python创建C++的
+:ref:`api_guide_lod_tensor` , 通过 :code:`Executor.run(feed=...)` 传给
+:ref:`api_guide_executor` 或 :ref:`api_guide_parallel_executor` 。
+
+数据层配置
+##########
+
+通过 :ref:`api_fluid_layers_data` 可以配置神经网络中需要的数据层。具体方法为:
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+
+   image = fluid.layers.data(name="image", shape=[3, 224, 224])
+   label = fluid.layers.data(name="label", shape=[1], dtype="int64")
+
+   # use image/label as layer input
+   prediction = fluid.layers.fc(input=image, size=1000, act="softmax")
+   loss = fluid.layers.cross_entropy(input=prediction, label=label)
+   ...
+
+上段代码中，:code:`image` 和 :code:`label` 是通过 :code:`fluid.layers.data`
+创建的两个输入数据层。其中 :code:`image` 是 :code:`[3, 224, 224]` 维度的浮点数据;
+:code:`label` 是 :code:`[1]` 维度的整数数据。这里需要注意的是:
+
+1. Fluid中默认使用 :code:`-1` 表示 batch size 维度，默认情况下会在 :code:`shape`
+   的第一个维度添加 :code:`-1` 。 所以 上段代码中， 我们可以接受将一个
+   :code:`[32, 3, 224, 224]` 的numpy array传给 :code:`image` 。 如果想自定义batch size
+   维度的位置的话，请设置 :code:`fluid.layers.data(append_batch_size=False)` 。
+   请参考进阶使用中的 :ref:`user_guide_customize_batch_size_rank` 。
+
+2. Fluid中用来做类别标签的数据类型是 :code:`int64`，并且标签从0开始。
+
+传递训练数据给执行器
+####################
+
+:code:`Executor.run` 和 :code:`ParallelExecutor.run` 都接受一个 :code:`feed` 参数。
+这个参数是一个Python的字典。它的键是数据层的名字，例如上文代码中的 :code:`image`。
+它的值是对应的numpy array。
+
+例如:
+
+.. code-block:: python
+
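+   import numpy
+
+   exe = fluid.Executor(fluid.CPUPlace())
+   exe.run(feed={
+      "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
+      # draw integer class ids; casting random() output to int64 would give all zeros
+      "label": numpy.random.randint(0, 1000, size=(32, 1)).astype('int64')
+   })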
+
+进阶使用
+########
+
+如何传入序列数据
+----------------
+
+序列数据是PaddlePaddle Fluid支持的特殊数据类型，可以使用 :code:`LoDTensor` 作为
+输入数据类型。它需要用户: 1. 传入一个mini-batch需要被训练的所有数据;
+2.每个序列的长度信息。
+用户可以使用 :code:`fluid.create_lod_tensor` 来创建 :code:`LoDTensor`。
+
+传入序列信息的时候，需要设置序列嵌套深度，:code:`lod_level`。
+例如训练数据是词汇组成的句子，:code:`lod_level=1`；训练数据是 词汇先组成了句子，
+句子再组成了段落，那么 :code:`lod_level=2`。
+
+例如:
+
+.. code-block:: python
+
+   sentence = fluid.layers.data(name="sentence", dtype="int64", shape=[1], lod_level=1)
+
+   ...
+
+   exe.run(feed={
+     "sentence": fluid.create_lod_tensor(
+       data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
+       lod=[[4, 1, 2]],  # one list of sequence lengths per LoD level
+       place=fluid.CPUPlace()
+     )
+   })
+
+训练数据 :code:`sentence` 包含三个样本，他们的长度分别是 :code:`4, 1, 2`。
+他们分别是 :code:`data[0:4]`， :code:`data[4:5]` 和 :code:`data[5:7]`。
+
+如何分别设置ParallelExecutor中每个设备的训练数据
+------------------------------------------------
+
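+When passing data through :code:`ParallelExecutor.run(feed=...)`, the user can
+explicitly specify the data for each training device (for example, each GPU).
+To do so, pass a list to the :code:`feed` argument; each element of the list is a dictionary
+whose keys are data layer names and whose values are the data for those layers.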
+
+例如:
+
+.. code-block:: python
+
+   parallel_executor = fluid.ParallelExecutor()
+   parallel_executor.run(
+     feed=[
+        {
+          "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
+          "label": numpy.random.random(size=(32, 1)).astype('int64')
+        },
+        {
+          "image": numpy.random.random(size=(16, 3, 224, 224)).astype('float32'),
+          "label": numpy.random.random(size=(16, 1)).astype('int64')
+        },
+     ]
+   )
+
+上述代码中，GPU0会训练 32 个样本，而 GPU1训练 16 个样本。
+
+
+.. _user_guide_customize_batch_size_rank:
+
+自定义BatchSize维度
+-------------------
+
+PaddlePaddle Fluid默认batch size是数据的第一维度，以 :code:`-1` 表示。但是在高级
+使用中，batch_size 可以固定，也可以是其他维度或者多个维度来表示。这都需要设置
+:code:`fluid.layers.data(append_batch_size=False)` 来完成。
+
+1. 固定batch size维度
+
+  .. code-block:: python
+
+     image = fluid.layers.data(name="image", shape=[32, 784], append_batch_size=False)
+
+  这里，:code:`image` 永远是一个 :code:`[32, 784]` 大小的矩阵。
+
+2. 使用其他维度表示batch size
+
+  .. code-block:: python
+
+     sentence = fluid.layers.data(name="sentence",
+                                  shape=[80, -1, 1],
+                                  append_batch_size=False,
+                                  dtype="int64")
+
+  这里 :code:`sentence` 的中间维度是batch size。这种数据排布会用在定长的循环神经
+  网络中。
\ No newline at end of file
--- a/source/user_guides/howto/prepare_data/index.rst
+++ b/source/user_guides/howto/prepare_data/index.rst
+..  _user_guide_prepare_data:
+
+########
+准备数据
+########
+
+PaddlePaddle Fluid支持两种传入数据的方式:
+
+1. Configure a data input layer with :code:`fluid.layers.data`, then pass the
+   training data through :code:`executor.run(feed=...)` on an
+   :ref:`api_guide_executor` or :ref:`api_guide_parallel_executor`.
+
+2. Convert the training data to the :ref:`api_guide_recordio_file_format`
+   that Paddle recognizes, then configure data reading with
+   :code:`fluid.layers.open_files` and :ref:`api_guide_reader`.
+
+这两种准备数据方法的比较如下:
+
+.. _user_guide_prepare_data_comparision:
+
++-------------------+--------------------------------+---------------------------------------+
+|                   | Feeding data                   | Using a Reader                        |
++===================+================================+=======================================+
+| API               | :code:`executor.run(feed=...)` | :ref:`api_guide_reader`               |
++-------------------+--------------------------------+---------------------------------------+
+| Data format       | Numpy Array                    | :ref:`api_guide_recordio_file_format` |
++-------------------+--------------------------------+---------------------------------------+
+| Data augmentation | Other Python-side libraries    | Operators in Fluid                    |
++-------------------+--------------------------------+---------------------------------------+
+| Speed             | Slow                           | Fast                                  |
++-------------------+--------------------------------+---------------------------------------+
+| Recommended use   | Debugging models               | Industrial training                   |
++-------------------+--------------------------------+---------------------------------------+
+
+这些准备数据的详细使用方法，请参考:
+
+.. toctree::
+   :maxdepth: 2
+
+   feeding_data
+   use_recordio_reader
+
+Python Reader
+#############
+
+为了方便用户在Python中定义数据处理流程，PaddlePaddle Fluid支持 Python Reader，
+具体请参考:
+
+.. toctree::
+   :maxdepth: 2
+
+   reader.md
--- a/source/user_guides/howto/prepare_data/reader.md
+++ b/source/user_guides/howto/prepare_data/reader.md
+```eval_rst
+.. _user_guide_reader:
+```
+
+# Python Reader
+
+During the training and testing phases, PaddlePaddle programs need to read data. To help users write code that reads input data, we define the following:
+
+- A *reader*: A function that reads data (from file, network, random number generator, etc) and yields the data items.
+- A *reader creator*: A function that returns a reader function.
+- A *reader decorator*: A function, which takes in one or more readers, and returns a reader.
+- A *batch reader*: A function that reads data (from *reader*, file, network, random number generator, etc) and yields a batch of data items.
+
+We also provide a function that converts a reader to a batch reader, along with frequently used reader creators and reader decorators.
+
+## Data Reader Interface
+
+A *data reader* doesn't have to be a function that reads and yields data items. It can be any function without parameters that creates an iterable (anything that can be used in `for x in iterable`), as follows:
+
+```
+iterable = data_reader()
+```
+
+The item produced from the iterable should be a **single** entry of data and **not** a mini batch. The entry of data could be a single item or a tuple of items. Item should be of one of the [supported types](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., numpy 1d array of float32, int, list of int etc.)
+
+An example implementation for single item data reader creator is as follows:
+
+```python
+def reader_creator_random_image(width, height):
+    def reader():
+        while True:
+            yield numpy.random.uniform(-1, 1, size=width*height)
+    return reader
+```
+
+An example implementation for multiple item data reader creator is as follows:
+```python
+def reader_creator_random_image_and_label(width, height, label):
+    def reader():
+        while True:
+            yield numpy.random.uniform(-1, 1, size=width*height), label
+    return reader
+```
+
+## Batch Reader Interface
+
+A *batch reader* can be any function without parameters that creates an iterable (anything that can be used in `for x in iterable`). The output of the iterable should be a batch (list) of data items. Each item inside the list should be a tuple.
+
+Here are some valid outputs:
+
+```python
+# a mini batch of three data items. Each data item consists of three columns of data.
+[(1, 1, 1),
+(2, 2, 2),
+(3, 3, 3)]
+
+# a mini batch of three data items, each data item is a list (single column).
+[([1,1,1],),
+([2,2,2],),
+([3,3,3],)]
+```
+
+Please note that each item inside the list must be a tuple. Below is an invalid output:
+```python
+ # wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
+ # Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
+ # or three columns of data, each of which is 1.
+[[1,1,1],
+[2,2,2],
+[3,3,3]]
+```
+
+It is easy to convert from a reader to a batch reader:
+
+```python
+mnist_train = paddle.dataset.mnist.train()
+mnist_train_batch_reader = paddle.batch(mnist_train, 128)
+```
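
To illustrate what such a conversion involves, here is a minimal pure-Python sketch of a batching decorator. It is not the implementation of `paddle.batch`; `simple_batch` is a hypothetical name, and yielding a final, smaller batch is an assumption of this sketch.

```python
def simple_batch(reader, batch_size):
    """Wrap a reader so that it yields lists of up to batch_size entries."""

    def batch_reader():
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:
            # Remaining entries form a final, smaller batch (sketch assumption).
            yield buf

    return batch_reader


def squares_reader():
    # A toy reader: each entry is a tuple (value, value squared).
    for i in range(5):
        yield i, i * i


batches = list(simple_batch(squares_reader, 2)())
print(batches)  # [[(0, 0), (1, 1)], [(2, 4), (3, 9)], [(4, 16)]]
```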
+
+It is also straightforward to create a custom batch reader:
+
+```python
+def custom_batch_reader():
+    while True:
+        batch = []
+        for i in xrange(128):
+            batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended.
+        yield batch
+
+mnist_random_image_batch_reader = custom_batch_reader
+```
+
+## Usage
+
+Following is how we can use the reader with PaddlePaddle:
+The batch reader, a mapping from item(s) to data layer, the batch size and the number of total passes will be passed into `paddle.train` as follows:
+
+```python
+# two data layer is created:
+image_layer = paddle.layer.data("image", ...)
+label_layer = paddle.layer.data("label", ...)
+
+# ...
+batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128)
+paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
+```
+
+## Data Reader Decorator
+
+The *Data reader decorator* takes in a single reader or multiple data readers and returns a new data reader. It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use `@` in the syntax.
+
+Since we have a strict interface for data readers (no parameters and return a single data item), a data reader can be used in a flexible way using data reader decorators. Following are a few examples:
+
+### Prefetch Data
+
+Since reading data may take some time and training can not proceed without data, it is generally a good idea to prefetch the data.
+
+Use `paddle.reader.buffered` to prefetch data:
+
+```python
+buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
+```
+
+`buffered_reader` will try to buffer (prefetch) `100` data entries.
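
For intuition, prefetching can be sketched with a background thread that fills a bounded queue while the consumer reads from it. This is only an illustration under stated assumptions, not the implementation of `paddle.reader.buffered`; `simple_buffered` is a hypothetical name.

```python
import queue
import threading


def simple_buffered(reader, size):
    """Wrap a reader so that up to `size` entries are prefetched in a thread."""
    end = object()  # sentinel marking the end of the stream

    def fill(q):
        for entry in reader():
            q.put(entry)  # blocks while the queue already holds `size` entries
        q.put(end)

    def buffered_reader():
        q = queue.Queue(maxsize=size)
        worker = threading.Thread(target=fill, args=(q,))
        worker.daemon = True
        worker.start()
        while True:
            entry = q.get()
            if entry is end:
                break
            yield entry

    return buffered_reader


def counting_reader():
    for i in range(10):
        yield i


print(list(simple_buffered(counting_reader, 3)()))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```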
+
+### Compose Multiple Data Readers
+
+For example, if we want to use a source of real images (say reusing mnist dataset), and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
+
+We can do the following :
+
+```python
+def reader_creator_random_image(width, height):
+    def reader():
+        while True:
+            yield numpy.random.uniform(-1, 1, size=width*height)
+    return reader
+
+def reader_creator_bool(t):
+    def reader():
+        while True:
+            yield t
+    return reader
+
+true_reader = reader_creator_bool(True)
+false_reader = reader_creator_bool(False)
+
+reader = paddle.reader.compose(paddle.dataset.mnist.train(), reader_creator_random_image(20, 20), true_reader, false_reader)
+# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
+# And we don't care about the second item at this time.
+paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
+```
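
To see why the mapping above skips index 1, it helps to look at how composing readers could flatten their outputs into one tuple. The sketch below is an illustration only; `simple_compose` is a hypothetical name, and `paddle.reader.compose` may differ in details such as length checking.

```python
def simple_compose(*readers):
    """Combine readers so each entry is one flat tuple of all their items."""

    def as_tuple(entry):
        # A reader may yield a single item or a tuple of items.
        return entry if isinstance(entry, tuple) else (entry,)

    def composed_reader():
        # zip stops at the shortest reader (an assumption of this sketch).
        for entries in zip(*[reader() for reader in readers]):
            flat = ()
            for entry in entries:
                flat += as_tuple(entry)
            yield flat

    return composed_reader


def image_and_label():
    # Stands in for a dataset reader that yields (image, label).
    for i in range(3):
        yield "image%d" % i, i


def flag():
    while True:
        yield True


first = next(simple_compose(image_and_label, flag)())
print(first)  # ('image0', 0, True)
```

The label produced by the first reader occupies index 1 of the flat tuple, so items from the later readers start at index 2.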
+
+### Shuffle
+
+Given the shuffle buffer size `n`, `paddle.reader.shuffle` returns a data reader that buffers `n` data entries and shuffles them before a data entry is read.
+
+Example:
+```python
+reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
+```
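
The buffering behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of `paddle.reader.shuffle`; `simple_shuffle` and its `seed` parameter are hypothetical and exist only to make the example reproducible.

```python
import random


def simple_shuffle(reader, buf_size, seed=None):
    """Wrap a reader so entries are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Shuffle and emit whatever is left in the last, partial buffer.
        rng.shuffle(buf)
        for item in buf:
            yield item

    return shuffled_reader


def counting_reader():
    for i in range(10):
        yield i


shuffled = list(simple_shuffle(counting_reader, 4, seed=0)())
print(sorted(shuffled) == list(range(10)))  # True
```

Entries are only shuffled within each buffer, so a larger buffer size gives a more thorough shuffle at the cost of more memory.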
+
+## Q & A
+
+### Why does a reader return only a single entry, and not a mini batch?
+
+Returning a single entry makes reusing existing data readers much easier (for example, if an existing reader returns 3 entries instead of a single entry, the training code becomes more complicated because it needs to handle cases such as a batch size of 2).
+
+We provide a function: `paddle.batch` to turn (a single entry) reader into a batch reader.
+
+### Why do we need a batch reader? Isn't it sufficient to give the reader and batch_size as arguments during training?
+
+In most of the cases, it would be sufficient to give the reader and batch_size as arguments to the train method. However sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. For these cases using a batch reader is very efficient and helpful.
+
+### Why use a dictionary instead of a list to provide mapping?
+
+Using a dictionary (`{"image":0, "label":1}`) instead of a list (`["image", "label"]`) gives the advantage that the user can easily reuse the items (e.g., using `{"image_a":0, "image_b":0, "label":1}`) or even skip an item (e.g., using `{"image_a":0, "label":2}`).
+
+### How to create a custom data reader creator ?
+
+```python
+def image_reader_creator(image_path, label_path, n):
+    def reader():
+        # open in binary mode: numpy.fromfile reads raw bytes
+        f = open(image_path, 'rb')
+        l = open(label_path, 'rb')
+        images = numpy.fromfile(
+            f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32')
+        images = images / 255.0 * 2.0 - 1.0
+        labels = numpy.fromfile(l, 'ubyte', count=n).astype("int")
+        for i in xrange(n):
+            yield images[i, :], labels[i] # a single entry of data is created each time
+        f.close()
+        l.close()
+    return reader
+
+# image_reader_creator creates a reader
+reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024)
+paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
+```
+
+### How is `paddle.train` implemented
+
+An example implementation of paddle.train is:
+
+```python
+def train(batch_reader, mapping, batch_size, total_pass):
+    for pass_idx in range(total_pass):
+        for mini_batch in batch_reader(): # this loop will never end in online learning.
+            do_forward_backward(mini_batch, mapping)
+```
--- a/source/user_guides/howto/prepare_data/use_recordio_reader.rst
+++ b/source/user_guides/howto/prepare_data/use_recordio_reader.rst
+.. _user_guide_use_recordio_as_train_data:
+
+############################
+使用RecordIO文件作为训练数据
+############################
+
+相比于 :ref:`user_guide_use_numpy_array_as_train_data`，
+:ref:`user_guide_use_recordio_as_train_data` 的性能更好；
+但是用户需要先将训练数据集转换成RecordIO文件格式，再使用
+:ref:`api_fluid_layers_open_files` 层在神经网络配置中导入 RecordIO 文件。
+用户还可以使用 :ref:`api_fluid_layers_double_buffer` 加速数据从内存到显存的拷贝，
+使用 :ref:`api_fluid_layers_Preprocessor` 工具进行数据增强。
+
+将训练数据转换成RecordIO文件格式
+################################
+
+:ref:`api_guide_recordio_file_format` 中，每个记录都是一个
+:code:`vector<LoDTensor>`, 即一个支持序列信息的Tensor数组。这个数组包括训练所需
+的所有特征。例如对于图像分类来说，这个数组可以包含图片和分类标签。
+
+Use :ref:`api_fluid_recordio_writer_convert_reader_to_recordio_file` to convert a
+:ref:`user_guide_reader` into a single RecordIO file, or use
+:ref:`api_fluid_recordio_writer_convert_reader_to_recordio_files` to convert a
+:ref:`user_guide_reader` into multiple RecordIO files.
+
+具体使用方法为:
+
+.. code-block:: python
+
+   import paddle
+   import paddle.fluid as fluid
+   import numpy
+
+   def reader_creator():
+       def __impl__():
+           for i in range(1000):
+               yield [
+                        # numpy.random.random has no dtype argument; cast with astype instead
+                        numpy.random.random(size=[3, 224, 224]).astype("float32"),
+                        numpy.random.random(size=[1]).astype("int64")
+                     ]
+       return __impl__
+
+   img = fluid.layers.data(name="image", shape=[3, 224, 224])
+   label = fluid.layers.data(name="label", shape=[1], dtype="int64")
+   feeder = fluid.DataFeeder(feed_list=[img, label], place=fluid.CPUPlace())
+
+   BATCH_SIZE = 32
+   reader = paddle.batch(reader_creator(), batch_size=BATCH_SIZE)
+   fluid.recordio_writer.convert_reader_to_recordio_file(
+      "train.recordio", feeder=feeder, reader_creator=reader)
+
+Here :code:`reader_creator` creates a :code:`Reader`, and
+:ref:`api_fluid_data_feeder_DataFeeder`
+is the tool that converts a :code:`Reader` into :code:`LoDTensor`. For details, see
+:ref:`user_guide_reader` .
+
+上述程序将 :code:`reader_creator` 的数据转换成了 :code:`train.recordio` 文件，
+其中每一个record 含有 32 条样本。如果batch size会在训练过程中调整，
+用户可以将每一个Record的样本数设置成1。并参考
+:ref:`user_guide_use_recordio_as_train_data_use_op_create_batch`。
+
+
+配置神经网络, 打开RecordIO文件
+##############################
+
+RecordIO文件转换好之后，用户可以使用 :ref:`api_fluid_layers_open_files`
+打开文件，并使用 :ref:`api_fluid_layers_read_file` 读取文件内容。
+简单使用方法如下:
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+
+   file_obj = fluid.layers.open_files(
+     filenames=["train.recordio"],
+     shape=[[3, 224, 224], [1]],
+     lod_levels=[0, 0],
+     dtypes=["float32", "int64"],
+     pass_num=100
+   )
+
+   image, label = fluid.layers.read_file(file_obj)
+
+其中如果设置了 :code:`pass_num` ，那么当所有数据读完后，会重新读取数据，
+直到读取了 :code:`pass_num` 遍。
+
+
+
+进阶使用
+########
+
+
+使用 :ref:`api_fluid_layers_double_buffer`
+------------------------------------------
+
+:code:`Double buffer` 使用双缓冲技术，将训练数据从内存中复制到显存中。配置双缓冲
+需要使用 :ref:`api_fluid_layers_double_buffer` 修饰文件对象。 例如:
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+   file_obj = fluid.layers.open_files(...)
+   file_obj = fluid.layers.double_buffer(file_obj)
+
+   image, label = fluid.layers.read_file(file_obj)
+
+双缓冲技术可以参考
+`Multiple buffering <https://en.wikipedia.org/wiki/Multiple_buffering>`_ 。
+
+配置数据增强
+------------
+
+使用 :ref:`api_fluid_layers_Preprocessor` 可以配置文件的数据增强方法。例如
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+   file_obj = fluid.layers.open_files(...)
+   preprocessor = fluid.layers.Preprocessor(reader=file_obj)
+   with preprocessor.block():
+       image, label = preprocessor.inputs()
+       image = image / 2
+       label = label + 1
+       preprocessor.outputs(image, label)
+
+如上代码所示，使用 :code:`Preprocessor` 定义了一个数据增强模块，并在
+:code:`with preprocessor.block()` 中定义了数据增强的具体操作。 用户通过配置
+:code:`preprocessor.inputs()` 获得数据文件中的各个字段。 并用
+:code:`preprocessor.outputs()` 标记预处理后的输出。
+
+.. _user_guide_use_recordio_as_train_data_use_op_create_batch:
+
+使用Op组batch
+-------------
+
+使用 :ref:`api_fluid_layers_batch` 可以在训练的过程中动态的组batch。例如
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+   file_obj = fluid.layers.open_files(...)
+   file_obj = fluid.layers.batch(file_obj, batch_size=32)
+
+   img, label = fluid.layers.read_file(file_obj)
+
+需要注意的是，如果数据集中的最后几个样本不能组成 :code:`batch_size` 大小的批量数据，
+那么这几个样本直接组成一个批量数据进行训练。
+
+读入数据的shuffle
+-----------------
+
+使用 :ref:`api_fluid_layers_shuffle` 可以在训练过程中动态重排训练数据。例如
+
+.. code-block:: python
+
+   import paddle.fluid as fluid
+   file_obj = fluid.layers.open_files(...)
+   file_obj = fluid.layers.shuffle(file_obj, buffer_size=8192)
+
+   img, label = fluid.layers.read_file(file_obj)
+
+需要注意的是:
+
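+1. :code:`shuffle` works by first reading :code:`buffer_size` samples, then
+   randomly drawing samples from that buffer for training.
+
+2. The :code:`buffer_size` samples held by :code:`shuffle` occupy training memory, so make sure
+   there is enough memory during training to cache :code:`buffer_size` samples.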
--- a/source/user_guides/howto/training/foo.rst
+++ b/source/user_guides/howto/training/foo.rst
+###
+FAQ
+###
--- a/source/user_guides/model_bank/images/foo.rst
+++ b/source/user_guides/model_bank/images/foo.rst
+###
+FAQ
+###
--- a/source/user_guides/model_bank/nlp/foo.rst
+++ b/source/user_guides/model_bank/nlp/foo.rst
+###
+FAQ
+###
--- a/source/user_guides/model_bank/others/foo.rst
+++ b/source/user_guides/model_bank/others/foo.rst
+###
+FAQ
+###
--- a/source/user_guides/model_bank/voice/foo.rst
+++ b/source/user_guides/model_bank/voice/foo.rst
+###
+FAQ
+###