Update doc

3d23804e · Yu Yang · 762916d9
16 changed files
--- a/doc/_sources/demo/image_classification/image_classification.txt
+++ b/doc/_sources/demo/image_classification/image_classification.txt
-#Image Classification Tutorial
+Image Classification Tutorial
+==============================

 This tutorial will guide you through training a convolutional neural network to classify objects using the CIFAR-10 image classification dataset.
 As shown in the following figure, the convolutional neural network can recognize the main object in images, and output the classification result.
@@ -172,7 +173,7 @@ python -m paddle.utils.plotcurve -i $log > plot.png
 - The script `plotcurve.py` requires the python module of `matplotlib`, so if it fails, maybe you need to install `matplotlib`.


-After training finishes, the training and testing error curve will be saved to `plot.png` using `plotcurve.py` script. An example of the plot is shown below:
+After training finishes, the training and testing error curves will be saved to `plot.png` using `plotcurve.py` script. An example of the plot is shown below:

 <center>![Training and testing curves.](./plot.png)</center>


--- a/doc/_sources/demo/imagenet_model/resnet_model.txt
+++ b/doc/_sources/demo/imagenet_model/resnet_model.txt
 # Model Zoo - ImageNet #

-[ImageNet](http://www.image-net.org/) is a popular dataset for generic object classification. This tutorial provided convolutional neural network(CNN) models for ImageNet.
+[ImageNet](http://www.image-net.org/) is a popular dataset for generic object classification. This tutorial provides convolutional neural network(CNN) models for ImageNet.

 ## ResNet Introduction

@@ -48,11 +48,11 @@ We present three ResNet models, which are converted from the models provided by

 ## ResNet Model

-See ```demo/model_zoo/resnet/resnet.py```. This confgiure contains network of 50, 101 and 152 layers. You can specify layer number by adding argument like this ```--config_args=layer_num=50``` in command line arguments.
+See ```demo/model_zoo/resnet/resnet.py```. This config contains network of 50, 101 and 152 layers. You can specify layer number by adding argument like ```--config_args=layer_num=50``` in command line arguments.

 ### Network Visualization

-You can get a diagram of ResNet network by running the following command. The script generates dot file and then converts dot file to PNG file, which uses installed draw_dot tool in our server. If you can not access the server, just install graphviz to convert dot file.
+You can get a diagram of ResNet network by running the following commands. The script generates dot file and then converts dot file to PNG file, which uses installed draw_dot tool in our server. If you can not access the server, just install graphviz to convert dot file.

 ```
 cd demo/model_zoo/resnet
@@ -190,8 +190,7 @@ Second, specify layers to extract features in `Outputs()` of `resnet.py`. For ex
 Outputs("res5_3_branch2c_conv", "res5_3_branch2c_bn")
 ```

-Third, specify model path and output directory in `extract_fea_c++.sh
-`, and then run following commands
+Third, specify model path and output directory in `extract_fea_c++.sh`, and then run the following commands.

 ```
 cd demo/model_zoo/resnet

--- a/doc/_sources/ui/data_provider/index.txt
+++ b/doc/_sources/ui/data_provider/index.txt
@@ -10,7 +10,7 @@ customized, with sacrificing the efficiency only a little. This is extremly
 useful when you have to dynamically generate certain kinds of data according to,
 for example, the training performance.

-Besides, users also can also customize a C++ :code:`DataProvider` for a more
+Besides, users can also customize a C++ :code:`DataProvider` for a more
 complex usage, or for a higher efficiency.

 The following parameters are required to define in the PaddlePaddle network

--- a/doc/_sources/ui/data_provider/pydataprovider2.txt
+++ b/doc/_sources/ui/data_provider/pydataprovider2.txt
@@ -17,10 +17,10 @@ how to write a simple PyDataProvider.

 MNIST is a handwriting classification data set. It contains 70,000 digital
 grayscale images. Labels of the training sample range from 0 to 9. All the
-images have been size-normalized and centered into images with a same size
+images have been size-normalized and centered into images with the same size
 of 28 x 28 pixels.

-A small part of the original data as an example can be found in the path below:
+A small part of the original data is shown below as an example:

 .. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_train.txt

@@ -31,10 +31,9 @@ Just write path of the above data into train.list. It looks like this:

 .. literalinclude:: ../../../doc_cn/ui/data_provider/train.list

-The corresponding dataprovider can be found in the path below:
+The corresponding dataprovider is shown below:

 .. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_provider.py
-   : linenos:

 The first line imports PyDataProvider2 package.
 The main function is the process function, that has two parameters.
@@ -45,8 +44,8 @@ This parameter is passed to the process function by PaddlePaddle.
 :code:`@provider` is a Python
 `Decorator <http://www.learnpython.org/en/Decorators>`_ .
 It sets some properties to DataProvider, and constructs a real PaddlePaddle
-DataProvider from a very sample user implemented python function. It does not
-matter if you are not familiar with `Decorator`_. You can keep it sample by
+DataProvider from a very simple user implemented python function. It does not
+matter if you are not familiar with `Decorator`_. You can keep it simple by
 just taking :code:`@provider` as a fixed mark above the provider function you
 implemented.

@@ -59,9 +58,9 @@ document of `input_types`_ for more details.

 The process method is the core part to construct a real DataProvider in
 PaddlePaddle. It implements how to open the text file, how to read one sample
-from the original text file, converted them into `input_types`_, and give them
+from the original text file, convert them into `input_types`_, and give them
 back to PaddlePaddle process at line 23.
-Note that data yields by the process function must follow a same order that
+Note that data yielded by the process function must follow the same order that
 `input_types`_ are defined.


@@ -111,7 +110,7 @@ The corresponding data provider can be found in the path below:

 .. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_provider.py

-This data provider for sequential model is a little bit complex than that
+This data provider for sequential model is a little more complex than that
 for MINST dataset.
 A new initialization method is introduced here.
 The method :code:`on_init` is configured to DataProvider by :code:`@provider`'s
@@ -243,7 +242,7 @@ parameters which your init_hook does not use.
 cache
 +++++
 DataProvider provides two simple cache strategy. They are
-* CacheType.NO_CACHE means do not cache any data, then data is read runtime by
+* CacheType.NO_CACHE means do not cache any data, then data is read at runtime by
  the user implemented python module every pass.
 * CacheType.CACHE_PASS_IN_MEM means the first pass reads data by the user
  implemented python module, and the rest passes will directly read data from
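
Note on the cache hunk above: the two strategies differ only in whether the user's reader is called again on later passes. The following is a minimal sketch of that behaviour in plain Python. It does not import PaddlePaddle and is not its implementation; the `CacheType` enum here is a local stand-in, and `cached_reader` and `read_fn` are hypothetical names chosen for illustration.

```python
import enum


class CacheType(enum.Enum):
    # Local stand-in for illustration only; not PaddlePaddle's class.
    NO_CACHE = 0
    CACHE_PASS_IN_MEM = 1


def cached_reader(read_fn, cache):
    """Wrap read_fn (hypothetical) so that each call yields one pass of samples."""
    memory = []                 # samples kept from the first complete pass
    state = {"filled": False}   # True once a full pass has been cached

    def one_pass():
        if cache is CacheType.CACHE_PASS_IN_MEM and state["filled"]:
            # Later passes: serve from memory, read_fn is not called again.
            yield from memory
            return
        memory.clear()          # discard leftovers from an interrupted pass
        for sample in read_fn():
            if cache is CacheType.CACHE_PASS_IN_MEM:
                memory.append(sample)
            yield sample
        if cache is CacheType.CACHE_PASS_IN_MEM:
            state["filled"] = True

    return one_pass
```

With `NO_CACHE` every pass calls `read_fn` again; with `CACHE_PASS_IN_MEM` only the first complete pass does, at the cost of holding all samples in memory.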

--- a/doc/demo/image_classification/image_classification.html
+++ b/doc/demo/image_classification/image_classification.html
@@ -6,7 +6,7 @@
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
-    <title>Data Preparation &mdash; PaddlePaddle  documentation</title>
+    <title>Image Classification Tutorial &mdash; PaddlePaddle  documentation</title>
    
    <link rel="stylesheet" href="../../_static/classic.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
@@ -56,12 +56,13 @@
        <div class="bodywrapper">
          <div class="body" role="main">
            
-  <p>#Image Classification Tutorial</p>
+  <div class="section" id="image-classification-tutorial">
+<span id="image-classification-tutorial"></span><h1>Image Classification Tutorial<a class="headerlink" href="#image-classification-tutorial" title="Permalink to this headline">¶</a></h1>
 <p>This tutorial will guide you through training a convolutional neural network to classify objects using the CIFAR-10 image classification dataset.
 As shown in the following figure, the convolutional neural network can recognize the main object in images, and output the classification result.</p>
 <p><center><img alt="Image Classification" src="../../_images/image_classification.png" /></center></p>
 <div class="section" id="data-preparation">
-<span id="data-preparation"></span><h1>Data Preparation<a class="headerlink" href="#data-preparation" title="Permalink to this headline">¶</a></h1>
+<span id="data-preparation"></span><h2>Data Preparation<a class="headerlink" href="#data-preparation" title="Permalink to this headline">¶</a></h2>
 <p>First, download CIFAR-10 dataset. CIFAR-10 dataset can be downloaded from its official website.</p>
 <p><a class="reference external" href="https://www.cs.toronto.edu/~kriz/cifar.html">https://www.cs.toronto.edu/~kriz/cifar.html</a></p>
 <p>We have prepared a script to download and process CIFAR-10 dataset. The script will download CIFAR-10 dataset from the official dataset.
@@ -112,7 +113,7 @@ sh download_cifar.sh
 <p>It has two directories:<code class="docutils literal"><span class="pre">train</span></code> and <code class="docutils literal"><span class="pre">test</span></code>. These two directories contain training data and testing data of CIFAR-10, respectively. Each of these two folders contains 10 sub-folders, ranging from <code class="docutils literal"><span class="pre">airplane</span></code> to <code class="docutils literal"><span class="pre">truck</span></code>. Each sub-folder contains images with the corresponding label. After the images are organized into this structure, we are ready to train an image classification model.</p>
 </div>
 <div class="section" id="preprocess">
-<span id="preprocess"></span><h1>Preprocess<a class="headerlink" href="#preprocess" title="Permalink to this headline">¶</a></h1>
+<span id="preprocess"></span><h2>Preprocess<a class="headerlink" href="#preprocess" title="Permalink to this headline">¶</a></h2>
 <p>After the data has been downloaded, it needs to be pre-processed into the Paddle format. We can run the following command for preprocessing.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span>cd demo/image_classification/
 sh preprocess.sh
@@ -132,7 +133,7 @@ python preprocess.py -i <span class="nv">$data_dir</span> -s <span class="m">32<
 </ul>
 </div>
 <div class="section" id="model-training">
-<span id="model-training"></span><h1>Model Training<a class="headerlink" href="#model-training" title="Permalink to this headline">¶</a></h1>
+<span id="model-training"></span><h2>Model Training<a class="headerlink" href="#model-training" title="Permalink to this headline">¶</a></h2>
 <p>We need to create a model config file before training the model. An example of the config file (vgg_16_cifar.py) is listed below. <strong>Note</strong>, it is slightly different from the <code class="docutils literal"><span class="pre">vgg_16_cifar.py</span></code> which also applies to the prediction.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">paddle.trainer_config_helpers</span> <span class="kn">import</span> <span class="o">*</span>
 <span class="n">data_dir</span><span class="o">=</span><span class="s1">&#39;data/cifar-out/batches/&#39;</span>
@@ -212,11 +213,11 @@ python -m paddle.utils.plotcurve -i <span class="nv">$log</span> &gt; plot.png
 <li><code class="docutils literal"><span class="pre">./demo/image_classification/vgg_16_cifar.py</span></code> is the network and data configuration file. The meaning of the other flags can be found in the documentation of the command line flags.</li>
 <li>The script <code class="docutils literal"><span class="pre">plotcurve.py</span></code> requires the python module of <code class="docutils literal"><span class="pre">matplotlib</span></code>, so if it fails, maybe you need to install <code class="docutils literal"><span class="pre">matplotlib</span></code>.</li>
 </ul>
-<p>After training finishes, the training and testing error curve will be saved to <code class="docutils literal"><span class="pre">plot.png</span></code> using <code class="docutils literal"><span class="pre">plotcurve.py</span></code> script. An example of the plot is shown below:</p>
+<p>After training finishes, the training and testing error curves will be saved to <code class="docutils literal"><span class="pre">plot.png</span></code> using <code class="docutils literal"><span class="pre">plotcurve.py</span></code> script. An example of the plot is shown below:</p>
 <p><center><img alt="Training and testing curves." src="../../_images/plot.png" /></center></p>
 </div>
 <div class="section" id="prediction">
-<span id="prediction"></span><h1>Prediction<a class="headerlink" href="#prediction" title="Permalink to this headline">¶</a></h1>
+<span id="prediction"></span><h2>Prediction<a class="headerlink" href="#prediction" title="Permalink to this headline">¶</a></h2>
 <p>After we train the model, the model file as well as the model parameters are stored in path <code class="docutils literal"><span class="pre">./cifar_vgg_model/pass-%05d</span></code>. For example, the model of the 300-th pass is stored at <code class="docutils literal"><span class="pre">./cifar_vgg_model/pass-00299</span></code>.</p>
 <p>To make a prediction for an image, one can run <code class="docutils literal"><span class="pre">predict.sh</span></code> as follows. The script will output the label of the classfiication.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span>sh predict.sh
@@ -231,14 +232,14 @@ python prediction.py $model $image $use_gpu
 </div>
 </div>
 <div class="section" id="exercise">
-<span id="exercise"></span><h1>Exercise<a class="headerlink" href="#exercise" title="Permalink to this headline">¶</a></h1>
+<span id="exercise"></span><h2>Exercise<a class="headerlink" href="#exercise" title="Permalink to this headline">¶</a></h2>
 <p>Train a image classification of birds using VGG model and CUB-200 dataset. The birds dataset can be downloaded here. It contains an image dataset with photos of 200 bird species (mostly North American).</p>
 <p><a class="reference external" href="http://www.vision.caltech.edu/visipedia/CUB-200.html">http://www.vision.caltech.edu/visipedia/CUB-200.html</a></p>
 </div>
 <div class="section" id="delve-into-details">
-<span id="delve-into-details"></span><h1>Delve into Details<a class="headerlink" href="#delve-into-details" title="Permalink to this headline">¶</a></h1>
+<span id="delve-into-details"></span><h2>Delve into Details<a class="headerlink" href="#delve-into-details" title="Permalink to this headline">¶</a></h2>
 <div class="section" id="convolutional-neural-network">
-<span id="convolutional-neural-network"></span><h2>Convolutional Neural Network<a class="headerlink" href="#convolutional-neural-network" title="Permalink to this headline">¶</a></h2>
+<span id="convolutional-neural-network"></span><h3>Convolutional Neural Network<a class="headerlink" href="#convolutional-neural-network" title="Permalink to this headline">¶</a></h3>
 <p>A Convolutional Neural Network is a feedforward neural network that uses convolution layers. It is very suitable for building neural networks that process and understand images. A standard convolutional neural network is shown below:</p>
 <p><img alt="Convolutional Neural Network" src="../../_images/lenet.png" /></p>
 <p>Convolutional Neural Network contains the following layers:</p>
@@ -250,6 +251,7 @@ python prediction.py $model $image $use_gpu
 <p>Convolutional Neural Network achieves amazing performance for image classification because it exploits two important characteristics of images: <em>local correlation</em> and <em>spatial invariance</em>. By iteratively applying convolution and max-pooing operations, convolutional neural network can well represent these two characteristics of images.</p>
 <p>For more details of how to define layers and their connections, please refer to the documentation of layers.</p>
 </div>
+</div>
 </div>


@@ -260,7 +262,8 @@ python prediction.py $model $image $use_gpu
        <div class="sphinxsidebarwrapper">
  <h3><a href="../../index.html">Table Of Contents</a></h3>
  <ul>
-<li><a class="reference internal" href="#">Data Preparation</a></li>
+<li><a class="reference internal" href="#">Image Classification Tutorial</a><ul>
+<li><a class="reference internal" href="#data-preparation">Data Preparation</a></li>
 <li><a class="reference internal" href="#preprocess">Preprocess</a></li>
 <li><a class="reference internal" href="#model-training">Model Training</a></li>
 <li><a class="reference internal" href="#prediction">Prediction</a></li>
@@ -269,6 +272,8 @@ python prediction.py $model $image $use_gpu
 <li><a class="reference internal" href="#convolutional-neural-network">Convolutional Neural Network</a></li>
 </ul>
 </li>
+</ul>
+</li>
 </ul>

  <h4>Previous topic</h4>
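
Note on the Prediction section in this file: the text says models are stored under `./cifar_vgg_model/pass-%05d` and that the 300-th pass lives in `pass-00299`, so pass ids are zero-based. A minimal sketch of that naming, assuming nothing beyond the format string quoted in the page (the helper name `pass_dir` is hypothetical):

```python
def pass_dir(nth_pass, model_root="./cifar_vgg_model"):
    """Return the directory of the n-th pass (1-based), per the pass-%05d pattern."""
    if nth_pass < 1:
        raise ValueError("nth_pass is 1-based and must be >= 1")
    # The stored id is zero-based, so the 300-th pass has id 299.
    return "%s/pass-%05d" % (model_root, nth_pass - 1)
```

For example, `pass_dir(300)` gives the `./cifar_vgg_model/pass-00299` path mentioned in the tutorial.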

--- a/doc/demo/image_classification/index.html
+++ b/doc/demo/image_classification/index.html
@@ -26,7 +26,7 @@
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <link rel="top" title="PaddlePaddle  documentation" href="../../index.html" />
    <link rel="up" title="Examples and demos" href="../index.html" />
-    <link rel="next" title="Data Preparation" href="image_classification.html" />
+    <link rel="next" title="Image Classification Tutorial" href="image_classification.html" />
    <link rel="prev" title="Examples and demos" href="../index.html" /> 
  </head>
  <body role="document">
@@ -40,7 +40,7 @@
          <a href="../../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
-          <a href="image_classification.html" title="Data Preparation"
+          <a href="image_classification.html" title="Image Classification Tutorial"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="../index.html" title="Examples and demos"
@@ -59,13 +59,16 @@
 <h1>Image Classification Tutorial<a class="headerlink" href="#image-classification-tutorial" title="Permalink to this headline">¶</a></h1>
 <div class="toctree-wrapper compound">
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="image_classification.html">Data Preparation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="image_classification.html#preprocess">Preprocess</a></li>
-<li class="toctree-l1"><a class="reference internal" href="image_classification.html#model-training">Model Training</a></li>
-<li class="toctree-l1"><a class="reference internal" href="image_classification.html#prediction">Prediction</a></li>
-<li class="toctree-l1"><a class="reference internal" href="image_classification.html#exercise">Exercise</a></li>
-<li class="toctree-l1"><a class="reference internal" href="image_classification.html#delve-into-details">Delve into Details</a><ul>
-<li class="toctree-l2"><a class="reference internal" href="image_classification.html#convolutional-neural-network">Convolutional Neural Network</a></li>
+<li class="toctree-l1"><a class="reference internal" href="image_classification.html">Training Locally</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="image_classification.html#data-preparation">Data Preparation</a></li>
+<li class="toctree-l2"><a class="reference internal" href="image_classification.html#preprocess">Preprocess</a></li>
+<li class="toctree-l2"><a class="reference internal" href="image_classification.html#model-training">Model Training</a></li>
+<li class="toctree-l2"><a class="reference internal" href="image_classification.html#prediction">Prediction</a></li>
+<li class="toctree-l2"><a class="reference internal" href="image_classification.html#exercise">Exercise</a></li>
+<li class="toctree-l2"><a class="reference internal" href="image_classification.html#delve-into-details">Delve into Details</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="image_classification.html#convolutional-neural-network">Convolutional Neural Network</a></li>
+</ul>
+</li>
 </ul>
 </li>
 </ul>
@@ -83,7 +86,7 @@
                        title="previous chapter">Examples and demos</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="image_classification.html"
-                        title="next chapter">Data Preparation</a></p>
+                        title="next chapter">Image Classification Tutorial</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
@@ -118,7 +121,7 @@
          <a href="../../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
-          <a href="image_classification.html" title="Data Preparation"
+          <a href="image_classification.html" title="Image Classification Tutorial"
             >next</a> |</li>
        <li class="right" >
          <a href="../index.html" title="Examples and demos"

--- a/doc/demo/imagenet_model/resnet_model.html
+++ b/doc/demo/imagenet_model/resnet_model.html
@@ -57,7 +57,7 @@
            
  <div class="section" id="model-zoo-imagenet">
 <span id="model-zoo-imagenet"></span><h1>Model Zoo - ImageNet<a class="headerlink" href="#model-zoo-imagenet" title="Permalink to this headline">¶</a></h1>
-<p><a class="reference external" href="http://www.image-net.org/">ImageNet</a> is a popular dataset for generic object classification. This tutorial provided convolutional neural network(CNN) models for ImageNet.</p>
+<p><a class="reference external" href="http://www.image-net.org/">ImageNet</a> is a popular dataset for generic object classification. This tutorial provides convolutional neural network(CNN) models for ImageNet.</p>
 <div class="section" id="resnet-introduction">
 <span id="resnet-introduction"></span><h2>ResNet Introduction<a class="headerlink" href="#resnet-introduction" title="Permalink to this headline">¶</a></h2>
 <p>ResNets from paper <a class="reference external" href="http://arxiv.org/abs/1512.03385">Deep Residual Learning for Image Recognition</a> won the 1st place on the ILSVRC 2015 classification task. They present residual learning framework to ease the training of networks that are substantially deeper than those used previously. The residual connections are shown in following figure. The left building block is used in network of 34 layers and the right bottleneck building block is used in network of 50, 101, 152 layers .</p>
@@ -97,10 +97,10 @@
 <br></div>
 <div class="section" id="resnet-model">
 <span id="resnet-model"></span><h2>ResNet Model<a class="headerlink" href="#resnet-model" title="Permalink to this headline">¶</a></h2>
-<p>See <code class="docutils literal"><span class="pre">demo/model_zoo/resnet/resnet.py</span></code>. This confgiure contains network of 50, 101 and 152 layers. You can specify layer number by adding argument like this <code class="docutils literal"><span class="pre">--config_args=layer_num=50</span></code> in command line arguments.</p>
+<p>See <code class="docutils literal"><span class="pre">demo/model_zoo/resnet/resnet.py</span></code>. This config contains network of 50, 101 and 152 layers. You can specify layer number by adding argument like <code class="docutils literal"><span class="pre">--config_args=layer_num=50</span></code> in command line arguments.</p>
 <div class="section" id="network-visualization">
 <span id="network-visualization"></span><h3>Network Visualization<a class="headerlink" href="#network-visualization" title="Permalink to this headline">¶</a></h3>
-<p>You can get a diagram of ResNet network by running the following command. The script generates dot file and then converts dot file to PNG file, which uses installed draw_dot tool in our server. If you can not access the server, just install graphviz to convert dot file.</p>
+<p>You can get a diagram of ResNet network by running the following commands. The script generates dot file and then converts dot file to PNG file, which uses installed draw_dot tool in our server. If you can not access the server, just install graphviz to convert dot file.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span>cd demo/model_zoo/resnet
 ./net_diagram.sh
 </pre></div>
@@ -227,7 +227,7 @@ shape: <code class="docutils literal"><span class="pre">(Co,</span> <span class=
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">Outputs</span><span class="p">(</span><span class="s2">&quot;res5_3_branch2c_conv&quot;</span><span class="p">,</span> <span class="s2">&quot;res5_3_branch2c_bn&quot;</span><span class="p">)</span>
 </pre></div>
 </div>
-<p>Third, specify model path and output directory in <code class="docutils literal"><span class="pre">extract_fea_c++.sh</span></code>, and then run following commands</p>
+<p>Third, specify model path and output directory in <code class="docutils literal"><span class="pre">extract_fea_c++.sh</span></code>, and then run the following commands.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span>cd demo/model_zoo/resnet
 ./extract_fea_c++.sh
 </pre></div>

--- a/doc/demo/sentiment_analysis/index.html
+++ b/doc/demo/sentiment_analysis/index.html
@@ -27,7 +27,7 @@
    <link rel="top" title="PaddlePaddle  documentation" href="../../index.html" />
    <link rel="up" title="Examples and demos" href="../index.html" />
    <link rel="next" title="Sentiment Analysis Tutorial" href="sentiment_analysis.html" />
-    <link rel="prev" title="Data Preparation" href="../image_classification/image_classification.html" /> 
+    <link rel="prev" title="Image Classification Tutorial" href="../image_classification/image_classification.html" /> 
  </head>
  <body role="document">
    <div class="related" role="navigation" aria-label="related navigation">
@@ -43,7 +43,7 @@
          <a href="sentiment_analysis.html" title="Sentiment Analysis Tutorial"
             accesskey="N">next</a> |</li>
        <li class="right" >
-          <a href="../image_classification/image_classification.html" title="Data Preparation"
+          <a href="../image_classification/image_classification.html" title="Image Classification Tutorial"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../../index.html">PaddlePaddle  documentation</a> &raquo;</li>
          <li class="nav-item nav-item-1"><a href="../index.html" accesskey="U">Examples and demos</a> &raquo;</li> 
@@ -88,7 +88,7 @@
        <div class="sphinxsidebarwrapper">
  <h4>Previous topic</h4>
  <p class="topless"><a href="../image_classification/image_classification.html"
-                        title="previous chapter">Data Preparation</a></p>
+                        title="previous chapter">Image Classification Tutorial</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="sentiment_analysis.html"
                        title="next chapter">Sentiment Analysis Tutorial</a></p>
@@ -129,7 +129,7 @@
          <a href="sentiment_analysis.html" title="Sentiment Analysis Tutorial"
             >next</a> |</li>
        <li class="right" >
-          <a href="../image_classification/image_classification.html" title="Data Preparation"
+          <a href="../image_classification/image_classification.html" title="Image Classification Tutorial"
             >previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../../index.html">PaddlePaddle  documentation</a> &raquo;</li>
          <li class="nav-item nav-item-1"><a href="../index.html" >Examples and demos</a> &raquo;</li> 

--- a/doc/searchindex.js
+++ b/doc/searchindex.js
--- a/doc/ui/api/trainer_config_helpers/layers.html
+++ b/doc/ui/api/trainer_config_helpers/layers.html
@@ -161,7 +161,7 @@ reasons.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer Output Object.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -210,7 +210,7 @@ default Bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer Name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -251,7 +251,7 @@ default Bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -288,9 +288,9 @@ support GPU mode.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>LayerOutput|list|tuple</em>) &#8211; Input layer.</li>
 <li><strong>filter_size</strong> (<em>int</em>) &#8211; The x dimension of a filter kernel.</li>
-<li><strong>filter_size_y</strong> (<em>int</em>) &#8211; The y dimension of a filter kernel. Since paddle now
-support rectangular filters, the filter&#8217;s shape
-will be (filter_size, filter_size_y).</li>
+<li><strong>filter_size_y</strong> (<em>int</em>) &#8211; The y dimension of a filter kernel. Since
+PaddlePaddle now supports rectangular filters,
+the filter&#8217;s shape can be (filter_size, filter_size_y).</li>
 <li><strong>num_filter</strong> (<em>int</em>) &#8211; channel of output data.</li>
 <li><strong>num_channel</strong> (<em>int</em>) &#8211; channel of input data.</li>
 <li><strong>stride</strong> (<em>int</em>) &#8211; The x dimension of the stride.</li>
@@ -349,7 +349,7 @@ will be (filter_size, filter_size_y).</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -371,12 +371,12 @@ thus input image&#8217;s width equals height.</p>
 <p>The num_channel means input image&#8217;s channel number. It may be 1 or 3 when
 input is raw pixels of image(mono or RGB), or it may be the previous layer&#8217;s
 num_filters * num_group.</p>
-<p>There are several group of filter in paddle
-implementation. Each group will process some channel of inputs. For example,
-if input num_channel = 256, group = 4, num_filter=32, the paddle will create
+<p>There are several group of filter in PaddlePaddle implementation.
+Each group will process some channel of the inputs. For example, if an input
+num_channel = 256, group = 4, num_filter=32, the PaddlePaddle will create
 32*4 = 128 filters to process inputs. The channels will be split into 4
-pieces. First 256/4 = 64 channels will process by first 32 filters. The rest
-channels will be processed by rest group of filters.</p>
+pieces. The first 256/4 = 64 channels will be processed by the first 32 filters.
+The remaining channels will be processed by the remaining groups of filters.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
@@ -385,9 +385,9 @@ channels will be processed by rest group of filters.</p>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; Layer Input.</li>
 <li><strong>filter_size</strong> (<em>int</em>) &#8211; The x dimension of a filter kernel.</li>
-<li><strong>filter_size_y</strong> (<em>int</em>) &#8211; The y dimension of a filter kernel. Since paddle now
-support rectangular filters, the filter&#8217;s shape
-will be (filter_size, filter_size_y).</li>
+<li><strong>filter_size_y</strong> (<em>int</em>) &#8211; The y dimension of a filter kernel. Since PaddlePaddle
+currently supports rectangular filters, the filter&#8217;s
+shape will be (filter_size, filter_size_y).</li>
 <li><strong>num_filters</strong> &#8211; Each filter group&#8217;s number of filter</li>
 <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; Activation type. Default is tanh</li>
 <li><strong>groups</strong> (<em>int</em>) &#8211; Group size of filters.</li>
@@ -405,7 +405,7 @@ automatically from previous output.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer output.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -484,7 +484,10 @@ MaxPooling.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">LayerOutput</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -500,26 +503,27 @@ MaxPooling.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">img_cmrnorm_layer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Convolution cross-map-response-normalize layer.</p>
-<p>TODO(yuyang18): Add reference and equations, to explain why cmr is work?</p>
+<dd><p>Convolution cross-map-response-normalize layer.
+For details, please refer to
+<a class="reference external" href="http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf">Alex&#8217;s paper</a>.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; cross map response size.</li>
-<li><strong>scale</strong> (<em>float</em>) &#8211; TODO(yuyang18)</li>
-<li><strong>power</strong> (<em>float</em>) &#8211; TODO(yuyang18)</li>
+<li><strong>scale</strong> (<em>float</em>) &#8211; The hyper-parameter.</li>
+<li><strong>power</strong> (<em>float</em>) &#8211; The hyper-parameter.</li>
 <li><strong>num_channels</strong> &#8211; input layer&#8217;s filers number or channels. If
 num_channels is None, it will be set automatically.</li>
-<li><strong>blocked</strong> &#8211; TODO(yuyang18)</li>
+<li><strong>blocked</strong> &#8211; namely normalize in number of blocked feature maps.</li>
 <li><strong>layer_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ExtraLayerAttribute" title="paddle.trainer_config_helpers.attrs.ExtraLayerAttribute"><em>ExtraLayerAttribute</em></a>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -535,15 +539,15 @@ num_channels is None, it will be set automatically.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">img_rnorm_layer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>TODO(yuyang18): add comments</p>
-<p>TODO(yuyang18): Why it is always not implemented whenever use_gpu or not?</p>
+<dd><p>Normalize the input in a local region, namely response normalization
+across feature maps.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> &#8211; </li>
-<li><strong>input</strong> &#8211; </li>
+<li><strong>name</strong> &#8211; The name of this layer.</li>
+<li><strong>input</strong> &#8211; The input of this layer.</li>
 <li><strong>size</strong> &#8211; </li>
 <li><strong>scale</strong> &#8211; </li>
 <li><strong>power</strong> &#8211; </li>
@@ -552,7 +556,13 @@ num_channels is None, it will be set automatically.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Rtype name:</th><td class="field-body"><p class="first">None|basestring</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -617,7 +627,7 @@ computation, referred to as facotr,
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -654,7 +664,7 @@ and <span class="math">\(out\)</span> is a (batchSize x dataDim) output vector.<
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -689,7 +699,7 @@ and <span class="math">\(out\)</span> is a (batchSize x dataDim) output vector.<
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">LayerOutput object.</p>
 </td>
 </tr>
 </tbody>
@@ -706,16 +716,17 @@ and <span class="math">\(out\)</span> is a (batchSize x dataDim) output vector.<
 <p>The memory cell was implemented as follow equations.</p>
 <div class="math">
 \[\begin{split}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\end{split}\]\[\begin{split}f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\end{split}\]\[\begin{split}c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\end{split}\]\[\begin{split}o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\end{split}\]\[\begin{split}h_t &amp; = o_t tanh(c_t)\end{split}\]</div>
-<p>NOTE: In paddle&#8217;s implementation, the multiply operation
+<p>NOTE: In PaddlePaddle&#8217;s implementation, the multiplications
 <span class="math">\(W_{xi}x_{t}\)</span> , <span class="math">\(W_{xf}x_{t}\)</span>,
-<span class="math">\(W_{xc}x_t\)</span>, <span class="math">\(W_{xo}x_{t}\)</span> is not done by
-lstmemory layer, so it must use a mixed_layer do this full_matrix_projection
-before lstm is used.</p>
-<p>NOTE: This is a low level user interface. You may use network.simple_lstm
+<span class="math">\(W_{xc}x_t\)</span>, <span class="math">\(W_{xo}x_{t}\)</span> are not done in the lstmemory layer,
+so an additional mixed_layer with full_matrix_projection or an fc_layer must
+be included in the configuration file to complete the input-to-hidden
+mappings before lstmemory is called.</p>
+<p>NOTE: This is a low-level user interface. You can use network.simple_lstm
 to config a simple plain lstm layer.</p>
-<p>Please refer <strong>Generating Sequences With Recurrent Neural Networks</strong> if you
-want to know what lstm is. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
-<p>TODO(yuyang18): Check lstm can input multiple values or not?</p>
+<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> for
+more details about LSTM.</p>
+<p>The paper is available at this <a class="reference external" href="http://arxiv.org/abs/1308.0850">link</a>.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
@@ -734,7 +745,7 @@ bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -784,7 +795,7 @@ be sigmoid only.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">lstm step&#8217;s layer output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -811,21 +822,22 @@ is computed by:</p>
 previous memory. The reset gate is computed similarly to the update gate:</p>
 <div class="math">
 \[r_t = \sigma(W_{r}x_{t} + U_{r}h_{t-1} + b_r)\]</div>
-<p>3. The candidate activation <span class="math">\(\tilde{h_t}\)</span> is computed similarly to that
-of the traditional recurrent unit:</p>
+<p>3. The candidate activation <span class="math">\(\tilde{h_t}\)</span> is computed similarly to
+that of the traditional recurrent unit:</p>
 <div class="math">
 \[{\tilde{h_t}} = tanh(W x_{t} + U (r_{t} \odot h_{t-1}) + b)\]</div>
-<p>4. The hidden activation <span class="math">\(h_t\)</span> of the GRU at time t is a linear interpolation
-between the previous activation <span class="math">\(h_{t-1}\)</span> and the candidate activation
-<span class="math">\(\tilde{h_t}\)</span>:</p>
+<p>4. The hidden activation <span class="math">\(h_t\)</span> of the GRU at time t is a linear
+interpolation between the previous activation <span class="math">\(h_{t-1}\)</span> and the
+candidate activation <span class="math">\(\tilde{h_t}\)</span>:</p>
 <div class="math">
 \[h_t = (1 - z_t) h_{t-1} + z_t {\tilde{h_t}}\]</div>
-<p>NOTE: In paddle&#8217;s implementation, the multiply operation
+<p>NOTE: In PaddlePaddle&#8217;s implementation, the multiplication operations
 <span class="math">\(W_{r}x_{t}\)</span>, <span class="math">\(W_{z}x_{t}\)</span> and <span class="math">\(W x_t\)</span> are not computed in
-gate_recurrent layer. So it must use a mixed_layer with full_matrix_projection
-or fc_layer to compute them before GRU.</p>
-<p>The details can refer to <a class="reference external" href="https://arxiv.org/abs/1412.3555">Empirical Evaluation of Gated Recurrent
-Neural Networks on Sequence Modeling.</a></p>
+the gate_recurrent layer. Consequently, an additional mixed_layer with
+full_matrix_projection or an fc_layer must be included before grumemory
+is called.</p>
+<p>More details can be found in <a class="reference external" href="https://arxiv.org/abs/1412.3555">Empirical Evaluation of Gated
+Recurrent Neural Networks on Sequence Modeling.</a></p>
 <p>The simple usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">grumemory</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span>
 </pre></div>
@@ -850,7 +862,7 @@ bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -882,7 +894,7 @@ bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -901,7 +913,11 @@ bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">recurrent_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Recurrent Group. It supports time steps and sequence steps mechanisms.</p>
+<dd><p>Recurrent layer group is an extremely flexible recurrent unit in
+PaddlePaddle. As long as the user defines the calculation done within a
+time step, PaddlePaddle will iterate that calculation over the
+input sequence. This is extremely useful for attention-based models and
+Neural Turing Machine-like models.</p>
 <p>The basic usage (time steps) is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">step</span><span class="p">(</span><span class="nb">input</span><span class="p">):</span>
    <span class="n">output</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">layer</span><span class="p">,</span>
@@ -944,7 +960,7 @@ input sequence in a reverse order.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer output object</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1043,9 +1059,10 @@ parameter.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">get_output_layer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Get layer&#8217;s output by name. In paddle, a layer might return multiple value,
-but return one layer output. If user want to reference another output beside
-default output, use get_output_layer first to get another output from input.</p>
+<dd><p>Get a layer&#8217;s output by name. In PaddlePaddle, a layer might produce multiple
+values but return only one of them as its default output. If the user wants to
+use an output other than the default one, use get_output_layer first to get
+that output from the input.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
@@ -1059,7 +1076,7 @@ multiple outputs.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1145,7 +1162,7 @@ for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Embedding Layer output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1405,7 +1422,8 @@ The simply usage is:</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>agg_level</strong> (<em>AggregateLevel</em>) &#8211; AggregateLevel.EACH_TIMESTEP or AggregateLevel.EACH_SEQUENCE</li>
+<li><strong>agg_level</strong> (<em>AggregateLevel</em>) &#8211; AggregateLevel.EACH_TIMESTEP or
+AggregateLevel.EACH_SEQUENCE</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
 <li><strong>pooling_type</strong> (<em>BasePoolingType|None</em>) &#8211; Type of pooling, MaxPooling(default), AvgPooling,
@@ -1415,7 +1433,7 @@ SumPooling, SquareRootNPooling.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerType</p>
@@ -1444,7 +1462,7 @@ SumPooling, SquareRootNPooling.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1473,7 +1491,7 @@ SumPooling, SquareRootNPooling.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1503,7 +1521,7 @@ Inputs can be list of LayerOutput or list of projection.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1563,7 +1581,7 @@ convolution neural network, and before recurrent neural network.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1602,7 +1620,7 @@ bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1635,10 +1653,12 @@ and <span class="math">\(f\)</span> is activation function.</p>
 <p>This layer just simply add all input layers together, then activate the sum
 inputs. Each input of this layer should be the same size, which is also the
 output size of this layer.</p>
-<p>There is no weight matrix for each input, because it just a simple add operation.
-If you want to a complicated operation before add, please use mixed_layer.</p>
+<p>There is no weight matrix for each input, because it is just a simple add
+operation. If you want a more complicated operation before the add, please use
+mixed_layer.</p>
 <p>It is a very good way to set dropout outside the layers. Since not all
-paddle layer support dropout, you can add an add_to layer, set dropout here.
+PaddlePaddle layers support dropout, you can add an add_to layer and set
+dropout there.
 Please refer to dropout_layer for details.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
@@ -1655,7 +1675,7 @@ bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1712,7 +1732,7 @@ bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1751,7 +1771,7 @@ which is used in NEURAL TURING MACHINE.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1789,7 +1809,7 @@ and <span class="math">\(y\)</span> is a output vector.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1826,7 +1846,7 @@ and <span class="math">\(y\)</span> is a output vector.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1862,7 +1882,7 @@ element-wise. There is no activation and weight.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1914,7 +1934,7 @@ default Bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1949,7 +1969,7 @@ default Bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -1985,7 +2005,7 @@ The result is stored in output.ids.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2017,7 +2037,7 @@ Sampling one id for one sample.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2053,7 +2073,7 @@ Sampling one id for one sample.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput.</p>
@@ -2087,7 +2107,7 @@ Sampling one id for one sample.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput.</p>
@@ -2120,7 +2140,7 @@ Sampling one id for one sample.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2153,7 +2173,7 @@ Sampling one id for one sample.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput.</p>
@@ -2193,15 +2213,15 @@ minimum size of lists.</li>
 If max_sort_size = -1, then for each list, the
 algorithm will sort the entire list to get gradient.
 In other cases, max_sort_size must be greater than or
-equal to NDCG_num. And if max_sort_size is greater than
-the size of a list, the algorithm will sort the entire
-list of get gradient.</li>
+equal to NDCG_num. And if max_sort_size is greater
+than the size of a list, the algorithm will sort the
+entire list to get gradient.</li>
 <li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2255,7 +2275,7 @@ It is an optional argument.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2273,8 +2293,8 @@ It is an optional argument.</li>
 <code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">cos_sim</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Cosine Similarity Layer. The cosine similarity equation is here.</p>
 <div class="math">
-\[similarity = cos(\theta) = {\mathbf{A} \cdot \mathbf{B}
-\over \|\mathbf{A}\| \|\mathbf{B}\|}\]</div>
+\[similarity = cos(\theta) = {\mathbf{a} \cdot \mathbf{b}
+\over \|\mathbf{a}\| \|\mathbf{b}\|}\]</div>
 <p>And the input dimension is <span class="math">\(a \in R^M\)</span>, <span class="math">\(b \in R^{MN}\)</span>. The
 similarity will be calculated N times by step M. The output dimension is
 <span class="math">\(R^N\)</span>. The scale will be multiplied to similarity.</p>
@@ -2292,7 +2312,7 @@ similarity will be calculated N times by step M. The output dimension is
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2331,7 +2351,7 @@ optional argument.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2365,7 +2385,7 @@ decoding or 0 for correct decoding.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2404,7 +2424,7 @@ alignment between the inputs and the target labels is unknown.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">a object of LayerOutput.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2446,7 +2466,7 @@ False means no bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -2485,7 +2505,7 @@ It is used by recurrent layer group.</p>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>

--- a/doc/ui/api/trainer_config_helpers/networks.html
+++ b/doc/ui/api/trainer_config_helpers/networks.html
@@ -319,7 +319,24 @@ False if no bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>TODO(yuyang18): complete docs</p>
+<dd><p>Defines the calculations that an LSTM unit performs in a single time step.
+This function is not itself a recurrent layer, so it cannot be
+directly applied to sequence input. This function is always used in
+recurrent_group (see layers.py for more details) to implement an attention
+mechanism.</p>
+<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong>
+for more details about LSTM. The paper is available at
+<a class="reference external" href="https://arxiv.org/abs/1308.0850">https://arxiv.org/abs/1308.0850</a>.</p>
+<div class="math">
+\[\begin{split}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\end{split}\]\[\begin{split}f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\end{split}\]\[\begin{split}c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\end{split}\]\[\begin{split}o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\end{split}\]\[\begin{split}h_t &amp; = o_t tanh(c_t)\end{split}\]</div>
+<p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">lstmemory_unit</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+                           <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
+                           <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
+                           <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">(),</span>
+                           <span class="n">state_act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">())</span>
+</pre></div>
+</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
@@ -329,9 +346,9 @@ False if no bias.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
 <li><strong>param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; Parameter config, None if use default.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activate type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activate type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activate type.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activation type</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activation type.</li>
 <li><strong>mixed_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of mixed layer.
 False means no bias, None means default bias.</li>
 <li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
@@ -358,7 +375,28 @@ False means no bias, None means default bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">lstmemory_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>TODO(yuyang18): complete docs</p>
+<dd><p>lstmemory_group is a recurrent layer group version of Long Short Term Memory. It
+does exactly the same calculation as the lstmemory layer (see lstmemory in
+layers.py for the maths). A key benefit is that the LSTM memory
+cell states and hidden states at every time step are accessible to the
+user. This is especially useful in attention models. If you do not need to
+access the internal states of the lstm, but merely use its outputs,
+it is recommended to use lstmemory, which is relatively faster than
+lstmemory_group.</p>
+<p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
+multiplications:
+<span class="math">\(W_{xi}x_{t}\)</span> , <span class="math">\(W_{xf}x_{t}\)</span>,
+<span class="math">\(W_{xc}x_t\)</span>, <span class="math">\(W_{xo}x_{t}\)</span> are not done in lstmemory_unit to
+speed up the calculations. Consequently, an additional mixed_layer with
+full_matrix_projection must be included before lstmemory_unit is called.</p>
+<p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">lstmemory_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+                            <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
+                            <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
+                            <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">(),</span>
+                            <span class="n">state_act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">())</span>
+</pre></div>
+</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
@@ -369,9 +407,9 @@ False means no bias, None means default bias.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
 <li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
 <li><strong>param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; Parameter config, None if use default.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activate type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activate type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activate type.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activation type</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activation type.</li>
 <li><strong>mixed_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of mixed layer.
 False means no bias, None means default bias.</li>
 <li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
@@ -382,7 +420,7 @@ False means no bias, None means default bias.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">lstmemory group name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">the lstmemory group.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -413,14 +451,14 @@ want to know what lstm is. <a class="reference external" href="http://arxiv.org/
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstm layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
 <li><strong>mat_param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; mixed layer&#8217;s matrix projection parameter attribute.</li>
 <li><strong>bias_param_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute. False means no bias, None
 means default bias.</li>
 <li><strong>inner_param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; lstm cell parameter attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activate type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activate type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activate type.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activation type</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activation type.</li>
 <li><strong>mixed_layer_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ExtraLayerAttribute" title="paddle.trainer_config_helpers.attrs.ExtraLayerAttribute"><em>ExtraLayerAttribute</em></a>) &#8211; mixed layer&#8217;s extra attribute.</li>
 <li><strong>lstm_cell_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ExtraLayerAttribute" title="paddle.trainer_config_helpers.attrs.ExtraLayerAttribute"><em>ExtraLayerAttribute</em></a>) &#8211; lstm layer&#8217;s extra attribute.</li>
 </ul>
@@ -442,7 +480,19 @@ means default bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">bidirectional_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>TODO(yuyang18): Complete docs</p>
+<dd><p>A bidirectional_lstm is a recurrent unit that iterates over the input
+sequence in both forward and backward orders, and then concatenates the two
+outputs to form the final output. However, concatenation of the two outputs
+is not the only way to form the final output; you can also, for example,
+just add them together.</p>
+<p>Please refer to <strong>Neural Machine Translation by Jointly Learning to Align
+and Translate</strong> for more details about the bidirectional lstm.
+The paper is available at
+<a class="reference external" href="https://arxiv.org/pdf/1409.0473v3.pdf">https://arxiv.org/pdf/1409.0473v3.pdf</a>.</p>
+<p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">bidirectional_lstm</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">input1</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="mi">512</span><span class="p">)</span>
+</pre></div>
+</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
@@ -451,8 +501,11 @@ means default bias.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional lstm layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If False, concat word in last time step and return.
-If True, concat sequnce in all time step and return.</li>
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
+concatenated and returned.
+If set to True, the entire output sequences that are
+processed in forward and backward directions are
+concatenated and returned.</li>
 </ul>
 </td>
 </tr>
@@ -475,22 +528,30 @@ If True, concat sequnce in all time step and return.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><table class="docutils field-list" frame="void" rules="none">
+<dd><p>Defines the calculations that a gated recurrent unit performs in a single time
+step. This function is not itself a recurrent layer, so it cannot be
+directly applied to sequence input. This function is almost always used in
+the recurrent_group (see layers.py for more details) to implement an attention
+mechanism.</p>
+<p>Please see grumemory in layers.py for the details about the maths.</p>
+<table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; </li>
-<li><strong>name</strong> &#8211; </li>
-<li><strong>size</strong> &#8211; </li>
-<li><strong>gru_bias_attr</strong> &#8211; </li>
-<li><strong>act</strong> &#8211; </li>
-<li><strong>gate_act</strong> &#8211; </li>
-<li><strong>gru_layer_attr</strong> &#8211; </li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
+<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
+<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">the gru output layer.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -500,9 +561,94 @@ If True, concat sequnce in all time step and return.</li>
 </div>
 <div class="section" id="gru-group">
 <h3>gru_group<a class="headerlink" href="#gru-group" title="Permalink to this headline">¶</a></h3>
+<dl class="function">
+<dt>
+<code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">gru_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
+<dd><p>gru_group is a recurrent layer group version of Gated Recurrent Unit. It
+does exactly the same calculation as the grumemory layer does. A key
+benefit is that the gru hidden states are accessible to the user. This is
+especially useful in attention models. If you do not need to access
+any internal state, but merely use the outputs of a GRU, it is recommended
+to use grumemory, which is relatively faster.</p>
+<p>Please see grumemory in layers.py for more detail about the maths.</p>
+<p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gru_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+                <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
+                <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
+                <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">())</span>
+</pre></div>
+</div>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
+<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+</ul>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">the gru group.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
+</td>
+</tr>
+</tbody>
+</table>
+</dd></dl>
+
 </div>
 <div class="section" id="simple-gru">
 <h3>simple_gru<a class="headerlink" href="#simple-gru" title="Permalink to this headline">¶</a></h3>
+<dl class="function">
+<dt>
+<code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">simple_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
+<dd><p>simple_gru is also a recurrent layer group version of Gated Recurrent Unit, like
+gru_group. The difference lies only in implementation details.
+In terms of computational speed, grumemory is relatively faster than
+gru_group, and gru_group is relatively faster than simple_gru.</p>
+<p>simple_gru does exactly the same calculation as the grumemory layer does.
+Please see grumemory in layers.py for more detail about the maths.</p>
+<p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">simple_gru</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+                 <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
+                 <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
+                 <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">())</span>
+</pre></div>
+</div>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
+<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+</ul>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">the gru group.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
+</td>
+</tr>
+</tbody>
+</table>
+</dd></dl>
+
 </div>
 </div>
 <div class="section" id="simple-attention">

--- a/doc/ui/api/trainer_config_helpers/optimizers.html
+++ b/doc/ui/api/trainer_config_helpers/optimizers.html
@@ -79,12 +79,6 @@ be learned. The i is the i-th observation in (trainning) data.</p>
 <div class="math">
 \[w = w - \eta \nabla Q(w) = w - \eta \sum_{i}^{n} \nabla Q_i(w)\]</div>
 <p>where <span class="math">\(\eta\)</span> is learning rate. And <span class="math">\(n\)</span> is batch size.</p>
-<p>The SGD method is implemented by paddle with multiple extensions. Such as
-momentum, adagrad, rmsprop, adam. Please use method &#8216;use_xxx&#8217;, such as
-use_adam, to enhance the SGD method.</p>
-<p>WARNING: IN PADDLE&#8217;S IMPLEMENTATION, BATCH_SIZE IS SET FOR ONE COMPUTE
-PROCESS(NODE). IF YOU USE MULTIPLE MACHINE TO TRAIN YOUR NETWORK, THE GLOBAL
-BATCH SIZE WILL BE (BATCH_SIZE * MACHINE_COUNT).</p>
 </dd></dl>

 </div>
@@ -239,25 +233,35 @@ w &amp; = w - \frac{\eta} {\sqrt{v(w,t) + \epsilon}} \nabla Q_{i}(w)\end{split}\
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">settings</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>TODO(yuyang18): Complete docs.</p>
+<dd><p>Set the optimization method, learning rate, batch size, and other training
+settings. The currently supported algorithms are SGD and Async-SGD.</p>
+<div class="admonition warning">
+<p class="first admonition-title">Warning</p>
+<p class="last">Note that the &#8216;batch_size&#8217; in PaddlePaddle is not the global
+training batch size. It is the batch size of a single training
+process. If you use N processes to train one model, for example three
+GPU machines, the global batch size is N * &#8216;batch_size&#8217;.</p>
+</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
-<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>batch_size</strong> &#8211; </li>
-<li><strong>learning_rate</strong> &#8211; </li>
-<li><strong>learning_method</strong> &#8211; </li>
-<li><strong>regularization</strong> &#8211; </li>
-<li><strong>is_async</strong> &#8211; </li>
-<li><strong>model_average</strong> &#8211; </li>
-<li><strong>gradient_clipping_threshold</strong> &#8211; </li>
+<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
+<li><strong>batch_size</strong> (<em>int</em>) &#8211; batch size for one training process.</li>
+<li><strong>learning_rate</strong> (<em>float</em>) &#8211; learning rate for SGD</li>
+<li><strong>learning_method</strong> (<em>BaseSGDOptimizer</em>) &#8211; An extension of the gradient descent
+algorithm, such as momentum, adagrad, rmsprop, etc.
+Note that it should be an instance of a class with base type
+BaseSGDOptimizer.</li>
+<li><strong>regularization</strong> (<em>BaseRegularization</em>) &#8211; The regularization method.</li>
+<li><strong>is_async</strong> (<em>bool</em>) &#8211; Whether to use Async-SGD. Default value is False.</li>
+<li><strong>model_average</strong> (<em>ModelAverage</em>) &#8211; Model Average Settings.</li>
+<li><strong>gradient_clipping_threshold</strong> (<em>float</em>) &#8211; gradient clipping threshold. Gradient
+values larger than this threshold will be
+clipped.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
-</td>
-</tr>
 </tbody>
 </table>
 </dd></dl>

--- a/doc/ui/data_provider/index.html
+++ b/doc/ui/data_provider/index.html
@@ -65,7 +65,7 @@ data format PaddlePaddle requires. The process is extremly flexible and highly
 customized, with sacrificing the efficiency only a little. This is extremly
 useful when you have to dynamically generate certain kinds of data according to,
 for example, the training performance.</p>
-<p>Besides, users also can also customize a C++ <code class="code docutils literal"><span class="pre">DataProvider</span></code> for a more
+<p>Besides, users can also customize a C++ <code class="code docutils literal"><span class="pre">DataProvider</span></code> for a more
 complex usage, or for a higher efficiency.</p>
 <p>The following parameters are required to define in the PaddlePaddle network
 configuration file (trainer_config.py): which DataProvider is chosen to used,

--- a/doc/ui/data_provider/pydataprovider2.html
+++ b/doc/ui/data_provider/pydataprovider2.html
@@ -71,9 +71,9 @@ providing process.</p>
 how to write a simple PyDataProvider.</p>
 <p>MNIST is a handwriting classification data set. It contains 70,000 digital
 grayscale images. Labels of the training sample range from 0 to 9. All the
-images have been size-normalized and centered into images with a same size
+images have been size-normalized and centered into images with the same size
 of 28 x 28 pixels.</p>
-<p>A small part of the original data as an example can be found in the path below:</p>
+<p>A small part of the original data is shown below as an example:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span>5;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.215686 0.533333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.67451 0.992157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.070588 0.886275 0.992157 0 0 0 0 0 0 0 0 0 0 0.192157 0.070588 0 0 0 0 0 0 0 0 0 0 0 0 0 0.670588 0.992157 0.992157 0 0 0 0 0 0 0 0 0 0.117647 0.933333 0.858824 0.313725 0 0 0 0 0 0 0 0 0 0 0 0.090196 0.858824 0.992157 0.831373 0 0 0 0 0 0 0 0 0 0.141176 0.992157 0.992157 0.611765 0.054902 0 0 0 0 0 0 0 0 0 0 0.258824 0.992157 0.992157 0.529412 0 0 0 0 0 0 0 0 0 0.368627 0.992157 0.992157 0.419608 0.003922 0 0 0 0 0 0 0 0 0 0.094118 0.835294 0.992157 0.992157 0.517647 0 0 0 0 0 0 0 0 0 0.603922 0.992157 0.992157 0.992157 0.603922 0.545098 0.043137 0 0 0 0 0 0 0 0.447059 0.992157 0.992157 0.956863 0.062745 0 0 0 0 0 0 0 0 0.011765 0.666667 0.992157 0.992157 0.992157 0.992157 0.992157 0.745098 0.137255 0 0 0 0 0 0.152941 0.866667 0.992157 0.992157 0.521569 0 0 0 0 0 0 0 0 0 0.070588 0.992157 0.992157 0.992157 0.803922 0.352941 0.745098 0.992157 0.945098 0.317647 0 0 0 0 0.580392 0.992157 0.992157 0.764706 0.043137 0 0 0 0 0 0 0 0 0 0.070588 0.992157 0.992157 0.776471 0.043137 0 0.007843 0.27451 0.882353 0.941176 0.176471 0 0 0.180392 0.898039 0.992157 0.992157 0.313725 0 0 0 0 0 0 0 0 0 0 0.070588 0.992157 0.992157 0.713725 0 0 0 0 0.627451 0.992157 0.729412 0.062745 0 0.509804 0.992157 0.992157 0.776471 0.035294 0 0 0 0 0 0 0 0 0 0 0.494118 0.992157 0.992157 0.968627 0.168627 0 0 0 0.423529 0.992157 0.992157 0.364706 0 0.717647 0.992157 0.992157 0.317647 0 0 0 0 0 0 0 0 0 0 0 0.533333 0.992157 0.984314 0.945098 0.603922 0 0 0 0.003922 0.466667 0.992157 0.988235 0.976471 0.992157 0.992157 
0.788235 0.007843 0 0 0 0 0 0 0 0 0 0 0 0.686275 0.882353 0.364706 0 0 0 0 0 0 0.098039 0.588235 0.992157 0.992157 0.992157 0.980392 0.305882 0 0 0 0 0 0 0 0 0 0 0 0 0.101961 0.67451 0.321569 0 0 0 0 0 0 0 0.105882 0.733333 0.976471 0.811765 0.713725 0 0 0 0 0 0 0 0 0 0 0 0 0 0.65098 0.992157 0.321569 0 0 0 0 0 0 0 0 0 0.25098 0.007843 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.94902 0.219608 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.968627 0.764706 0.152941 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.498039 0.25098 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
 0;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.298039 0.333333 0.333333 0.333333 0.337255 0.333333 0.333333 0.109804 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.027451 0.223529 0.776471 0.964706 0.988235 0.988235 0.988235 0.992157 0.988235 0.988235 0.780392 0.098039 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.14902 0.698039 0.988235 0.992157 0.988235 0.901961 0.87451 0.568627 0.882353 0.976471 0.988235 0.988235 0.501961 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.188235 0.647059 0.988235 0.988235 0.745098 0.439216 0.098039 0 0 0 0.572549 0.988235 0.988235 0.988235 0 0 0 0 0 0 0 0 0 0 0 0 0 0.2 0.933333 0.992157 0.941176 0.247059 0 0 0 0 0 0 0.188235 0.898039 0.992157 0.992157 0 0 0 0 0 0 0 0 0 0 0 0.039216 0.639216 0.933333 0.988235 0.913725 0.278431 0 0 0 0 0 0 0 0.113725 0.843137 0.988235 0.988235 0 0 0 0 0 0 0 0 0 0 0 0.235294 0.988235 0.992157 0.988235 0.815686 0.07451 0 0 0 0 0 0 0 0.333333 0.988235 0.988235 0.552941 0 0 0 0 0 0 0 0 0 0 0.211765 0.878431 0.988235 0.992157 0.701961 0.329412 0.109804 0 0 0 0 0 0 0 0.698039 0.988235 0.913725 0.145098 0 0 0 0 0 0 0 0 0 0.188235 0.890196 0.988235 0.988235 0.745098 0.047059 0 0 0 0 0 0 0 0 0 0.882353 0.988235 0.568627 0 0 0 0 0 0 0 0 0 0.2 0.933333 0.992157 0.992157 0.992157 0.447059 0.294118 0 0 0 0 0 0 0 0 0.447059 0.992157 0.768627 0 0 0 0 0 0 0 0 0 0 0.623529 0.988235 0.988235 0.988235 0.988235 0.992157 0.47451 0 0 0 0 0 0 0 0.188235 0.933333 0.87451 0.509804 0 0 0 0 0 0 0 0 0 0 0.992157 0.988235 0.937255 0.792157 0.988235 0.894118 0.082353 0 0 0 0 0 0 0.027451 0.647059 0.992157 0.654902 0 0 0 0 0 0 0 0 0 0 0 0.623529 0.988235 0.913725 0.329412 0.376471 0.184314 0 0 0 0 0 0 0.027451 0.513725 0.988235 0.635294 0.219608 0 0 0 0 0 
0 0 0 0 0 0 0.196078 0.929412 0.988235 0.988235 0.741176 0.309804 0 0 0 0 0 0 0.529412 0.988235 0.678431 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.223529 0.992157 0.992157 1 0.992157 0.992157 0.992157 0.992157 1 0.992157 0.992157 0.882353 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.023529 0.478431 0.654902 0.658824 0.952941 0.988235 0.988235 0.988235 0.992157 0.988235 0.729412 0.278431 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.196078 0.647059 0.764706 0.764706 0.768627 0.580392 0.047059 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
 4;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.180392 0.470588 0.623529 0.623529 0.623529 0.588235 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.243137 0.494118 0.862745 0.870588 0.960784 0.996078 0.996078 0.996078 0.996078 0.992157 0.466667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.317647 0.639216 0.639216 0.639216 0.639216 0.639216 0.470588 0.262745 0.333333 0.929412 0.694118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.811765 0.694118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.811765 0.694118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.811765 0.694118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.184314 0.992157 0.694118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.192157 0.996078 0.384314 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.454902 0.980392 0.219608 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.564706 0.941176 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.588235 0.776471 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.945098 0.560784 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.054902 0.952941 0.356863 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.337255 0.917647 0.109804 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.698039 0.701961 0.019608 0.4 0.662745 0.662745 0.662745 0.662745 0.662745 0.662745 0.662745 0.376471 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.090196 0.639216 0.972549 0.945098 0.913725 0.996078 0.996078 0.996078 0.996078 1 0.996078 0.996078 1 0.996078 0 0 0 0 0 0 0 0 0 0 0.007843 0.105882 0.717647 0.776471 0.905882 0.996078 0.996078 0.988235 0.980392 0.862745 0.537255 0.223529 0.223529 0.368627 0.376471 0.6 0.6 0.6 0 0 0 0 0 0 0 0 0.262745 0.470588 0.6 0.996078 0.996078 0.996078 0.996078 0.847059 0.356863 0.156863 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.909804 0.705882 0.823529 0.635294 0.490196 0.219608 0.113725 0.062745 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0.152941 0.152941 0.156863 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
@@ -85,7 +85,34 @@ label of an image. The second part contains 28x28 pixel float values.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">mnist_train</span><span class="o">.</span><span class="n">txt</span>
 </pre></div>
 </div>
-<p>The corresponding dataprovider can be found in the path below:</p>
+<p>The corresponding dataprovider is shown below:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">paddle.trainer.PyDataProvider2</span> <span class="kn">import</span> <span class="o">*</span>
+
+
+<span class="c1"># Define a py data provider</span>
+<span class="nd">@provider</span><span class="p">(</span><span class="n">input_types</span><span class="o">=</span><span class="p">[</span>
+    <span class="n">dense_vector</span><span class="p">(</span><span class="mi">28</span> <span class="o">*</span> <span class="mi">28</span><span class="p">),</span>
+    <span class="n">integer_value</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
+<span class="p">])</span>
+<span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="n">filename</span><span class="p">):</span>  <span class="c1"># settings is not used currently.</span>
+    <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s1">&#39;r&#39;</span><span class="p">)</span>  <span class="c1"># open one of the training files</span>
+
+    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>  <span class="c1"># read each line</span>
+        <span class="n">label</span><span class="p">,</span> <span class="n">pixel</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">)</span>
+
+        <span class="c1"># get features and label</span>
+        <span class="n">pixels_str</span> <span class="o">=</span> <span class="n">pixel</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)</span>
+
+        <span class="n">pixels_float</span> <span class="o">=</span> <span class="p">[]</span>
+        <span class="k">for</span> <span class="n">each_pixel_str</span> <span class="ow">in</span> <span class="n">pixels_str</span><span class="p">:</span>
+            <span class="n">pixels_float</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">each_pixel_str</span><span class="p">))</span>
+
+        <span class="c1"># give data to paddle.</span>
+        <span class="k">yield</span> <span class="n">pixels_float</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
+
+    <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>  <span class="c1"># close file</span>
+</pre></div>
+</div>
 <p>The first line imports PyDataProvider2 package.
 The main function is the process function, which has two parameters.
 The first parameter is the settings, which is not used in this example.
@@ -94,8 +121,8 @@ This parameter is passed to the process function by PaddlePaddle.</p>
 <p><code class="code docutils literal"><span class="pre">&#64;provider</span></code> is a Python
 <a class="reference external" href="http://www.learnpython.org/en/Decorators">Decorator</a> .
 It sets some properties to DataProvider, and constructs a real PaddlePaddle
-DataProvider from a very sample user implemented python function. It does not
-matter if you are not familiar with <a class="reference external" href="http://www.learnpython.org/en/Decorators">Decorator</a>. You can keep it sample by
+DataProvider from a very simple user-implemented Python function. It does not
+matter if you are not familiar with <a class="reference external" href="http://www.learnpython.org/en/Decorators">Decorator</a>. You can keep it simple by
 just taking <code class="code docutils literal"><span class="pre">&#64;provider</span></code> as a fixed mark above the provider function you
 implemented.</p>
 <p><a class="reference internal" href="#input-types">input_types</a> defines the data format that a DataProvider returns.
@@ -105,9 +132,9 @@ scalar, whose value ranges from 0 to 9.
 document of <a class="reference internal" href="#input-types">input_types</a> for more details.</p>
 <p>The process method is the core part to construct a real DataProvider in
 PaddlePaddle. It implements how to open the text file, how to read one sample
-from the original text file, converted them into <a class="reference internal" href="#input-types">input_types</a>, and give them
+from the original text file, convert them into <a class="reference internal" href="#input-types">input_types</a>, and give them
 back to PaddlePaddle process at line 23.
-Note that data yields by the process function must follow a same order that
+Note that data yielded by the process function must follow the same order that
 <a class="reference internal" href="#input-types">input_types</a> are defined.</p>
 <p>With the help of PyDataProvider2, users can focus on how to generate ONE training
 sample by using the keyword <code class="code docutils literal"><span class="pre">yield</span></code>.
@@ -202,7 +229,7 @@ negative sentiment (marked by 0 and 1 respectively).</p>
    <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
 </pre></div>
 </div>
-<p>This data provider for sequential model is a little bit complex than that
+<p>This data provider for the sequential model is a little more complex than that
 for the MNIST dataset.
 A new initialization method is introduced here.
 The method <code class="code docutils literal"><span class="pre">on_init</span></code> is configured to DataProvider by <code class="code docutils literal"><span class="pre">&#64;provider</span></code>&#8216;s
@@ -363,7 +390,7 @@ parameters which your init_hook does not use.</p>
 <div class="section" id="cache">
 <h3>cache<a class="headerlink" href="#cache" title="Permalink to this headline">¶</a></h3>
 <p>DataProvider provides two simple cache strategies. They are
-* CacheType.NO_CACHE means do not cache any data, then data is read runtime by</p>
+* CacheType.NO_CACHE means do not cache any data, then data is read at runtime by</p>
 <blockquote>
 <div>the user implemented python module every pass.</div></blockquote>
 <ul class="simple">

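A note on the data provider hunk above: the parsing step inside `process` can be tried without PaddlePaddle. The following is a minimal sketch, assuming the `label;pixel pixel ...` line layout shown in the sample data (including the trailing `;` that appears there). `parse_mnist_line` is a hypothetical helper name used only for illustration; it is not part of PyDataProvider2.

```python
def parse_mnist_line(line, num_pixels=28 * 28):
    """Split one 'label;p0 p1 ...' text line into (pixels, label).

    Hypothetical helper for illustration only; not a PaddlePaddle API.
    """
    # Drop surrounding whitespace and an optional trailing ';',
    # then split on the first ';' to separate the label from the pixels.
    label_str, pixel_str = line.strip().rstrip(';').split(';', 1)

    # str.split() with no argument tolerates repeated spaces.
    pixels = [float(p) for p in pixel_str.split()]
    if len(pixels) != num_pixels:
        raise ValueError(
            "expected %d pixel values, got %d" % (num_pixels, len(pixels)))

    # Same order as input_types in the diff: dense vector first, label second.
    return pixels, int(label_str)
```

Inside a provider, the loop body would then reduce to `yield parse_mnist_line(line)`, which keeps the yielded order aligned with the declared `input_types`.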
--- a/doc_cn/_sources/demo/quick_start/index.txt
+++ b/doc_cn/_sources/demo/quick_start/index.txt
@@ -4,7 +4,7 @@

 ## Install

-First, install PaddlePaddle by following the <a href = "../../build/index.html">installation tutorial</a>.
+First, install PaddlePaddle by following the <a href = "../../build_and_install/install/index.html">installation tutorial</a>.

 ## Overview

@@ -134,8 +134,8 @@ define_py_data_sources2(train_list='data/train.list',
 * obj="process": specifies the function that generates the data
 * args={"dictionary": word_dict}: extra arguments; here, the dictionary is specified

-For more detailed use cases, see <a href = "../../ui/data_provider/python_case.html">Python Use Case</a>;
-for the data format and detailed documentation, see <a href = "../../ui/py_data_provider_wrapper_api.html">
+For more detailed use cases, see <a href = "../../../doc/ui/data_provider/python_case.html">Python Use Case</a>;
+for the data format and detailed documentation, see <a href = "../../../doc/ui/data_provider/pydataprovider2.html">
 PyDataProviderWrapper</a>.

 ## Network Architecture
@@ -143,7 +143,7 @@ PyDataProviderWrapper</a>.
 <center> ![](./PipelineNetwork.jpg) </center>

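 We start from a basic logistic regression network and gradually show more advanced features. For more detailed network configuration
-links, see the <a href = "../../ui/trainer_config_helpers_api.html#module-paddle.trainer_config_helpers.layers">Layer documentation</a>.
+links, see the <a href = "../../../doc/layer.html">Layer documentation</a>.
 All configurations are in the `demo/quick_start` directory; the logistic regression network is listed first.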

 ### Logistic Regression
@@ -350,7 +350,7 @@ lstm = simple_lstm(input=emb, size=lstm_size)
 <br>

 ## Optimization Algorithm
-<a href = "../../ui/trainer_config_helpers_api.html#module-paddle.trainer_config_helpers.optimizers">Optimization algorithms</a> include
+<a href = "../../../doc/ui/trainer_config_helpers_api.html#module-paddle.trainer_config_helpers.optimizers">Optimization algorithms</a> include
 Momentum, RMSProp, AdaDelta, AdaGrad, ADAM, Adamax, and others. Here the Adam optimization method is used, with L2 regularization and gradient clipping added.

 ```python
@@ -375,7 +375,7 @@ paddle train \
 --num_passes=15 \
 --use_gpu=false
 ```
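-Multi-machine distributed training is not covered here; see the <a href = "../../platform/index.html">distributed training</a> demo to learn how to train on multiple machines.
+Multi-machine distributed training is not covered here; see the <a href = "../../cluster/index.html">distributed training</a> demo to learn how to train on multiple machines.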

 ## Prediction
 The trained model can be used to evaluate a labeled validation set, or to predict on an unlabeled test set.

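A note on the optimization hunk above: the text mentions Adam with L2 regularization and gradient clipping. The following is a minimal pure-Python sketch of one such update step, not PaddlePaddle's implementation. The function name `adam_step`, the element-wise form of clipping, and the order in which the L2 term and clipping are applied are all assumptions made for illustration; only `learning_rate=2e-3` comes from the configuration shown in the diff.

```python
import math


def adam_step(params, grads, m, v, t, lr=2e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, clip=None):
    """Return (new_params, new_m, new_v) after one Adam update.

    Hypothetical illustration; t is the 1-based step counter.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        g = g + l2 * p                       # L2 regularization term (assumed form)
        if clip is not None:
            g = max(-clip, min(clip, g))     # element-wise clipping (assumed form)
        m_i = beta1 * m_i + (1 - beta1) * g      # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)           # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

The moment buffers `m` and `v` start as lists of zeros with the same length as `params`, and the caller increments `t` on every step.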
--- a/doc_cn/demo/quick_start/index.html
+++ b/doc_cn/demo/quick_start/index.html
@@ -55,7 +55,7 @@
 <p>Using a text classification problem as background, we introduce the PaddlePaddle workflow and how to configure commonly used basic network units.</p>
 <div class="section" id="install">
 <span id="install"></span><h2>Install<a class="headerlink" href="#install" title="Permalink to this headline">¶</a></h2>
-<p>First, install PaddlePaddle by following the <a href = "../../build/index.html">installation tutorial</a>.</p>
+<p>First, install PaddlePaddle by following the <a href = "../../build_and_install/install/index.html">installation tutorial</a>.</p>
 </div>
 <div class="section" id="overview">
 <span id="overview"></span><h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
@@ -192,8 +192,8 @@
 <li>obj=&#8221;process&#8221;: specifies the function that generates the data</li>
 <li>args={&#8220;dictionary&#8221;: word_dict}: extra arguments; here, the dictionary is specified</li>
 </ul>
-<p>For more detailed use cases, see <a href = "../../ui/data_provider/python_case.html">Python Use Case</a>;
-for the data format and detailed documentation, see <a href = "../../ui/py_data_provider_wrapper_api.html">
+<p>For more detailed use cases, see <a href = "../../../doc/ui/data_provider/python_case.html">Python Use Case</a>;
+for the data format and detailed documentation, see <a href = "../../../doc/ui/data_provider/pydataprovider2.html">
 PyDataProviderWrapper</a>.</p>
 </div>
 </div>
@@ -202,7 +202,7 @@ PyDataProviderWrapper</a>.</p>
 <p>This section focuses on introducing the network architecture.
 <center> <img alt="" src="../../_images/PipelineNetwork.jpg" /> </center></p>
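 <p>We start from a basic logistic regression network and gradually show more advanced features. For more detailed network configuration
-links, see the <a href = "../../ui/trainer_config_helpers_api.html#module-paddle.trainer_config_helpers.layers">Layer documentation</a>.
+links, see the <a href = "../../../doc/layer.html">Layer documentation</a>.
 All configurations are in the <code class="docutils literal"><span class="pre">demo/quick_start</span></code> directory; the logistic regression network is listed first.</p>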
 <div class="section" id="logistic-regression">
 <span id="logistic-regression"></span><h3>Logistic Regression<a class="headerlink" href="#logistic-regression" title="Permalink to this headline">¶</a></h3>
@@ -374,7 +374,7 @@ PyDataProviderWrapper</a>.</p>
 </div>
 <div class="section" id="optimization-algorithm">
 <span id="optimization-algorithm"></span><h2>Optimization Algorithm<a class="headerlink" href="#optimization-algorithm" title="Permalink to this headline">¶</a></h2>
-<p><a href = "../../ui/trainer_config_helpers_api.html#module-paddle.trainer_config_helpers.optimizers">Optimization algorithms</a> include
+<p><a href = "../../../doc/ui/trainer_config_helpers_api.html#module-paddle.trainer_config_helpers.optimizers">Optimization algorithms</a> include
 Momentum, RMSProp, AdaDelta, AdaGrad, ADAM, Adamax, and others. Here the Adam optimization method is used, with L2 regularization and gradient clipping added.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">settings</span><span class="p">(</span><span class="n">batch_size</span><span class="o">=</span><span class="mi">128</span><span class="p">,</span>
         <span class="n">learning_rate</span><span class="o">=</span><span class="mf">2e-3</span><span class="p">,</span>
@@ -397,7 +397,7 @@ Momentum, RMSProp，AdaDelta，AdaGrad，ADAM，Adamax等，这里采用Adam优
 --use_gpu<span class="o">=</span><span class="nb">false</span>
 </pre></div>
 </div>
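-<p>Multi-machine distributed training is not covered here; see the <a href = "../../platform/index.html">distributed training</a> demo to learn how to train on multiple machines.</p>
+<p>Multi-machine distributed training is not covered here; see the <a href = "../../cluster/index.html">distributed training</a> demo to learn how to train on multiple machines.</p>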
 </div>
 <div class="section" id="prediction">
 <span id="prediction"></span><h2>Prediction<a class="headerlink" href="#prediction" title="Permalink to this headline">¶</a></h2>