merge from paddle/book/develop

8bdc3588 · gongweibao · 34ca1d3c · 4f1d0137
51 changed files
--- a/README.en.md
+++ b/README.en.md
+# Deep Learning with PaddlePaddle
+1. [Fit a Line](http://book.paddlepaddle.org/fit_a_line/index.en.html)
+1. [Recognize Digits](http://book.paddlepaddle.org/recognize_digits/index.en.html)
+1. [Image Classification](http://book.paddlepaddle.org/image_classification/index.en.html)
+1. [Word to Vector](http://book.paddlepaddle.org/word2vec/index.en.html)
+1. [Understand Sentiment](http://book.paddlepaddle.org/understand_sentiment/index.en.html)
+1. [Label Semantic Roles](http://book.paddlepaddle.org/label_semantic_roles/index.en.html)
+1. [Machine Translation](http://book.paddlepaddle.org/machine_translation/index.en.html)
+1. [Recommender System](http://book.paddlepaddle.org/recommender_system/index.en.html)
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
--- a/README.md
+++ b/README.md
 # 深度学习入门
-1. 新手入门 [[fit_a_line](fit_a_line/)] [[html](http://book.paddlepaddle.org/fit_a_line)]
+1. [新手入门](http://book.paddlepaddle.org/fit_a_line)
-1. 识别数字 [[recognize_digits](recognize_digits/)] [[html](http://book.paddlepaddle.org/recognize_digits)]
+1. [识别数字](http://book.paddlepaddle.org/recognize_digits)
-1. 图像分类 [[image_classification](image_classification/)] [[html](http://book.paddlepaddle.org/image_classification)]
+1. [图像分类](http://book.paddlepaddle.org/image_classification)
-1. 词向量 [[word2vec](word2vec/)] [[html](http://book.paddlepaddle.org/word2vec)]
+1. [词向量](http://book.paddlepaddle.org/word2vec)
-1. 情感分析 [[understand_sentiment](understand_sentiment/)] [[html](http://book.paddlepaddle.org/understand_sentiment)]
+1. [情感分析](http://book.paddlepaddle.org/understand_sentiment)
-1. 语义角色标注 [[label_semantic_roles](label_semantic_roles/)] [[html](http://book.paddlepaddle.org/label_semantic_roles)]
+1. [语义角色标注](http://book.paddlepaddle.org/label_semantic_roles)
-1. 机器翻译 [[machine_translation](machine_translation/)] [[html](http://book.paddlepaddle.org/machine_translation)]
+1. [机器翻译](http://book.paddlepaddle.org/machine_translation)
-1. 个性化推荐 [[recommender_system](recommender_system/)] [[html](http://book.paddlepaddle.org/recommender_system)]
+1. [个性化推荐](http://book.paddlepaddle.org/recommender_system)
-<br/>
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">本教程</span> 由 <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> 创作，采用 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">知识共享 署名-非商业性使用-相同方式共享 4.0 国际 许可协议</a>进行许可。
--- a/fit_a_line/README.en.md
+++ b/fit_a_line/README.en.md
 # Linear Regression
 Let us begin the tutorial with a classical problem called Linear Regression \[[1](#References)\]. In this chapter, we will train a model from a realistic dataset to predict home prices. Some important concepts in Machine Learning will be covered through this example.
-The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Problem Setup
 Suppose we have a dataset of $n$ real estate properties. These real estate properties will be referred to as *homes* in this chapter for clarity.
@@ -202,4 +202,4 @@ This chapter introduces *Linear Regression* and how to train and test this model
 4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Common Creative License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a> This tutorial was created and published with [Creative Common License 4.0](http://creativecommons.org/licenses/by-nc-sa/4.0/).
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
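The linear model behind this chapter, $y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b$, is usually fit by minimizing the mean squared error. The pure-Python sketch below illustrates this with full-batch gradient descent on a tiny, noise-free dataset. It is not the tutorial's PaddlePaddle code, which trains with `paddle.trainer.SGD`. The data, learning rate `lr`, and step count are arbitrary assumptions, chosen only so that the sketch converges.

```python
def predict(w, b, x):
    """Linear model: y = w_1*x_1 + ... + w_d*x_d + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(w, b, xs, ys):
    """Mean squared error of the model over the whole dataset."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, lr=0.05, steps=2000):
    """Full-batch gradient descent on the mean squared error."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # residual for one sample
            for j in range(d):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic data generated exactly from y = 2*x1 - 3*x2 + 1 (no noise).
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
ys = [2 * x1 - 3 * x2 + 1 for x1, x2 in xs]
w, b = fit(xs, ys)
print(w, b, mse(w, b, xs, ys))
```

The data come exactly from $y = 2x_1 - 3x_2 + 1$, so the learned weights approach `[2, -3]` and the bias approaches `1`.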
--- a/fit_a_line/README.md
+++ b/fit_a_line/README.md
 # 线性回归
 让我们从经典的线性回归（Linear Regression \[[1](#参考文献)\]）模型开始这份教程。在这一章里，你将使用真实的数据集建立起一个房价预测模型，并且了解到机器学习中的若干重要概念。
-本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。
 ## 背景介绍
 给定一个大小为$n$的数据集  ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$，其中$x_{i1}, \ldots, x_{id}$是第$i$个样本$d$个属性上的取值，$y_i$是该样本待预测的目标。线性回归模型假设目标$y_i$可以被属性间的线性组合描述，即
@@ -15,8 +15,8 @@ $$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldo
 ## 效果展示
 我们使用从[UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing)获得的波士顿房价数据集进行模型的训练和预测。下面的散点图展示了使用模型对部分房屋价格进行的预测。其中，每个点的横坐标表示同一类房屋真实价格的中位数，纵坐标表示线性回归模型根据特征预测的结果，当二者值完全相等的时候就会落在虚线上。所以模型预测得越准确，则点离虚线越近。
 <p align="center">
-	<img src = "image/predictions.png" width=400><br/>
+    <img src = "image/predictions.png" width=400><br/>
-	图1. 预测值 V.S. 真实值
+    图1. 预测值 V.S. 真实值
 </p>
 ## 模型概览
@@ -96,8 +96,8 @@ import paddle.v2.dataset.uci_housing as uci_housing
 - 很多的机器学习技巧/模型（例如L1，L2正则项，向量空间模型-Vector Space Model）都基于这样的假设：所有的属性取值都差不多是以0为均值且取值范围相近的。
 <p align="center">
-	<img src = "image/ranges.png" width=550><br/>
+    <img src = "image/ranges.png" width=550><br/>
-	图2. 各维属性的取值范围
+    图2. 各维属性的取值范围
 </p>
 #### 整理训练集与测试集

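The fit_a_line README states that many machine-learning techniques (for example L1/L2 regularization and vector space models) assume that every attribute is roughly zero-mean and has a similar range. One simple way to get there is to subtract each column's mean and then divide by its range (max minus min). The following is an illustrative pure-Python sketch under that assumption, not the code of `paddle.v2.dataset.uci_housing`. The sample values are made up.

```python
def normalize_columns(rows):
    """Center each column at 0 and scale it by its range (max - min)."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        span = max(col) - min(col)
        stats.append((mean, span if span != 0 else 1.0))  # avoid division by zero
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


# Two attributes on very different scales (illustrative values only).
rows = [[0.1, 200.0], [0.3, 400.0], [0.5, 600.0]]
norm = normalize_columns(rows)
print(norm)
```

After normalization, both columns are centered at 0 and span a range of 1, so neither attribute dominates because of its units.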
--- a/fit_a_line/index.en.html
+++ b/fit_a_line/index.en.html
@@ -43,7 +43,7 @@
 # Linear Regression
 Let us begin the tutorial with a classical problem called Linear Regression \[[1](#References)\]. In this chapter, we will train a model from a realistic dataset to predict home prices. Some important concepts in Machine Learning will be covered through this example.
-The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Problem Setup
 Suppose we have a dataset of $n$ real estate properties. These real estate properties will be referred to as *homes* in this chapter for clarity.
@@ -244,7 +244,7 @@ This chapter introduces *Linear Regression* and how to train and test this model
 4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Common Creative License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a> This tutorial was created and published with [Creative Common License 4.0](http://creativecommons.org/licenses/by-nc-sa/4.0/).
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
 </div>
 <!-- You can change the lines below now. -->

--- a/fit_a_line/index.html
+++ b/fit_a_line/index.html
@@ -43,7 +43,7 @@
 # 线性回归
 让我们从经典的线性回归（Linear Regression \[[1](#参考文献)\]）模型开始这份教程。在这一章里，你将使用真实的数据集建立起一个房价预测模型，并且了解到机器学习中的若干重要概念。
-本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。
 ## 背景介绍
 给定一个大小为$n$的数据集  ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$，其中$x_{i1}, \ldots, x_{id}$是第$i$个样本$d$个属性上的取值，$y_i$是该样本待预测的目标。线性回归模型假设目标$y_i$可以被属性间的线性组合描述，即
@@ -57,8 +57,8 @@ $$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldo
 ## 效果展示
 我们使用从[UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing)获得的波士顿房价数据集进行模型的训练和预测。下面的散点图展示了使用模型对部分房屋价格进行的预测。其中，每个点的横坐标表示同一类房屋真实价格的中位数，纵坐标表示线性回归模型根据特征预测的结果，当二者值完全相等的时候就会落在虚线上。所以模型预测得越准确，则点离虚线越近。
 <p align="center">
-	<img src = "image/predictions.png" width=400><br/>
+    <img src = "image/predictions.png" width=400><br/>
-	图1. 预测值 V.S. 真实值
+    图1. 预测值 V.S. 真实值
 </p>
 ## 模型概览
@@ -138,8 +138,8 @@ import paddle.v2.dataset.uci_housing as uci_housing
 - 很多的机器学习技巧/模型（例如L1，L2正则项，向量空间模型-Vector Space Model）都基于这样的假设：所有的属性取值都差不多是以0为均值且取值范围相近的。
 <p align="center">
-	<img src = "image/ranges.png" width=550><br/>
+    <img src = "image/ranges.png" width=550><br/>
-	图2. 各维属性的取值范围
+    图2. 各维属性的取值范围
 </p>
 #### 整理训练集与测试集

--- a/fit_a_line/train.py
+++ b/fit_a_line/train.py
@@ -18,9 +18,8 @@ def main():
    # create optimizer
    optimizer = paddle.optimizer.Momentum(momentum=0)
-    trainer = paddle.trainer.SGD(cost=cost,
+    trainer = paddle.trainer.SGD(
-                                 parameters=parameters,
+        cost=cost, parameters=parameters, update_equation=optimizer)
-                                 update_equation=optimizer)
    feeding = {'x': 0, 'y': 1}
@@ -33,16 +32,14 @@ def main():
        if isinstance(event, paddle.event.EndPass):
            result = trainer.test(
-                reader=paddle.batch(
+                reader=paddle.batch(uci_housing.test(), batch_size=2),
-                    uci_housing.test(), batch_size=2),
                feeding=feeding)
            print "Test %d, Cost %f" % (event.pass_id, result.cost)
    # training
    trainer.train(
        reader=paddle.batch(
-            paddle.reader.shuffle(
+            paddle.reader.shuffle(uci_housing.train(), buf_size=500),
-                uci_housing.train(), buf_size=500),
            batch_size=2),
        feeding=feeding,
        event_handler=event_handler,

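The updated `fit_a_line/train.py` composes readers: `paddle.reader.shuffle(uci_housing.train(), buf_size=500)` is wrapped by `paddle.batch(..., batch_size=2)`. In this style, a reader is a zero-argument function that returns an iterator over samples. The sketch below shows what such decorators do conceptually, in plain Python. `shuffle_reader`, `batch_reader`, and `toy_train` are hypothetical names, not PaddlePaddle's API, and the library's real implementations may differ in details such as how the last partial batch is handled.

```python
import random


def shuffle_reader(reader, buf_size, seed=None):
    """Wrap `reader`: fill a buffer of up to buf_size samples, shuffle it, yield it."""
    rng = random.Random(seed)

    def shuffled():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for s in buf:
                    yield s
                buf = []
        if buf:  # flush the samples left over at the end of the data
            rng.shuffle(buf)
            for s in buf:
                yield s

    return shuffled


def batch_reader(reader, batch_size):
    """Wrap `reader`: group consecutive samples into lists of batch_size."""

    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # keep the final, possibly smaller, batch
            yield batch

    return batched


def toy_train():
    """Hypothetical stand-in for uci_housing.train(): returns a reader of (x, y) pairs."""

    def reader():
        for i in range(5):
            yield ([float(i)], float(2 * i))

    return reader


reader = batch_reader(shuffle_reader(toy_train(), buf_size=500, seed=0), batch_size=2)
batches = list(reader())
print(batches)
```

The decorators only wrap functions, so no data is read until the outermost reader is called. A larger `buf_size` mixes samples more thoroughly, at the cost of more memory.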
--- a/image_classification/README.en.md
+++ b/image_classification/README.en.md
--- a/image_classification/README.md
+++ b/image_classification/README.md
 图像分类
 =======
-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。
 ## 背景介绍
@@ -173,24 +173,24 @@ paddle.init(use_gpu=False, trainer_count=1)
 1. 定义数据输入及其维度
-	网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
+    网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
-	```python
+    ```python
    datadim = 3 * 32 * 32
    classdim = 10
    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))
-	```
+    ```
 2. 定义VGG网络核心模块
-	```python
+    ```python
-	net = vgg_bn_drop(image)
+    net = vgg_bn_drop(image)
-	```
+    ```
-	VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
+    VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
-	```python
+    ```python
    def vgg_bn_drop(input):
        def conv_block(ipt, num_filter, groups, dropouts, num_channels=None):
            return paddle.networks.img_conv_group(
@@ -219,40 +219,40 @@ paddle.init(use_gpu=False, trainer_count=1)
            layer_attr=paddle.attr.Extra(drop_rate=0.5))
        fc2 = paddle.layer.fc(input=bn, size=512, act=paddle.activation.Linear())
        return fc2
-	```
+    ```
-	2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.networks`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。
+    2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.networks`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。
-	2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
+    2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
-	2.3. 最后接两层512维的全连接。
+    2.3. 最后接两层512维的全连接。
 3. 定义分类器
-	通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
+    通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
-	```python
+    ```python
    out = paddle.layer.fc(input=net,
                          size=classdim,
                          act=paddle.activation.Softmax())
-	```
+    ```
 4. 定义损失函数和网络输出
-	在有监督训练中需要输入图像对应的类别信息，同样通过`paddle.layer.data`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
+    在有监督训练中需要输入图像对应的类别信息，同样通过`paddle.layer.data`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
-	```python
+    ```python
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)
-	```
+    ```
 ### ResNet
 ResNet模型的第1、3、4步和VGG模型相同，这里不再介绍。主要介绍第2步即CIFAR10数据集上ResNet核心模块。
 ```python
-net = resnet_cifar10(data, depth=56)
+net = resnet_cifar10(image, depth=56)
 ```
 先介绍`resnet_cifar10`中的一些基本函数，再介绍网络连接过程。
@@ -375,7 +375,7 @@ $$  lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
 cifar.train10()每次产生一条样本，在完成shuffle和batch之后，作为训练的输入。
 ```python
-reader=paddle.reader.batch(
+reader=paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.cifar.train10(), buf_size=50000),
        batch_size=128)
@@ -402,10 +402,9 @@ def event_handler(event):
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        result = trainer.test(
-            reader=paddle.reader.batch(
+            reader=paddle.batch(
                paddle.dataset.cifar.test10(), batch_size=128),
-            reader_dict={'image': 0,
+            feeding=feeding)
-                         'label': 1})
        print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```

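The image classification README (see the hunk header for `image_classification/README.md`) uses the discrete exponential learning-rate schedule $lr = lr_{0} * a^{\lfloor n/b \rfloor}$. Under this schedule, the rate stays at $lr_0$ for $b$ units of training progress and is then multiplied by $a$. Here is a minimal sketch of that formula. The values of `lr0`, `a`, and `b` are illustrative assumptions, and `n` stands for whatever progress counter the tutorial defines.

```python
def discrete_exp_lr(lr0, a, b, n):
    """lr = lr0 * a ** floor(n / b): multiply the rate by a once every b units of n."""
    return lr0 * a ** (n // b)  # integer division is the floor for non-negative n


# Illustrative values: start at 0.1 and decay by 10x every 1000 units.
schedule = [discrete_exp_lr(0.1, 0.1, 1000, n) for n in (0, 999, 1000, 2500)]
print(schedule)
```

The floor makes the schedule piecewise constant: the rate drops in discrete steps rather than decaying smoothly.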
--- a/image_classification/deprecated/README.md
+++ b/image_classification/deprecated/README.md
 图像分类
 =======
-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。
 ## 背景介绍
@@ -244,77 +244,77 @@ $$  lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
 1. 定义数据输入及其维度
-	网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
+    网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
-	```python
+    ```python
-	datadim = 3 * 32 * 32
+    datadim = 3 * 32 * 32
-	classdim = 10
+    classdim = 10
-	data = data_layer(name='image', size=datadim)
+    data = data_layer(name='image', size=datadim)
-	```
+    ```
 2. 定义VGG网络核心模块
-	```python
+    ```python
-	net = vgg_bn_drop(data)
+    net = vgg_bn_drop(data)
-	```
+    ```
-	VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
+    VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
-	```python
+    ```python
-	def vgg_bn_drop(input, num_channels):
+    def vgg_bn_drop(input, num_channels):
-	    def conv_block(ipt, num_filter, groups, dropouts, num_channels_=None):
+        def conv_block(ipt, num_filter, groups, dropouts, num_channels_=None):
-	        return img_conv_group(
+            return img_conv_group(
-	            input=ipt,
+                input=ipt,
-	            num_channels=num_channels_,
+                num_channels=num_channels_,
-	            pool_size=2,
+                pool_size=2,
-	            pool_stride=2,
+                pool_stride=2,
-	            conv_num_filter=[num_filter] * groups,
+                conv_num_filter=[num_filter] * groups,
-	            conv_filter_size=3,
+                conv_filter_size=3,
-	            conv_act=ReluActivation(),
+                conv_act=ReluActivation(),
-	            conv_with_batchnorm=True,
+                conv_with_batchnorm=True,
-	            conv_batchnorm_drop_rate=dropouts,
+                conv_batchnorm_drop_rate=dropouts,
-	            pool_type=MaxPooling())
+                pool_type=MaxPooling())
-	    conv1 = conv_block(input, 64, 2, [0.3, 0], 3)
+        conv1 = conv_block(input, 64, 2, [0.3, 0], 3)
-	    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
+        conv2 = conv_block(conv1, 128, 2, [0.4, 0])
-	    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
+        conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
-	    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
+        conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
-	    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])
+        conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])
-	    drop = dropout_layer(input=conv5, dropout_rate=0.5)
+        drop = dropout_layer(input=conv5, dropout_rate=0.5)
-	    fc1 = fc_layer(input=drop, size=512, act=LinearActivation())
+        fc1 = fc_layer(input=drop, size=512, act=LinearActivation())
-	    bn = batch_norm_layer(
+        bn = batch_norm_layer(
-	        input=fc1, act=ReluActivation(), layer_attr=ExtraAttr(drop_rate=0.5))
+            input=fc1, act=ReluActivation(), layer_attr=ExtraAttr(drop_rate=0.5))
-	    fc2 = fc_layer(input=bn, size=512, act=LinearActivation())
+        fc2 = fc_layer(input=bn, size=512, act=LinearActivation())
-	    return fc2
+        return fc2
-	```
+    ```
-	2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.trainer_config_helpers`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。
+    2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.trainer_config_helpers`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。
-	2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
+    2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
-	2.3. 最后接两层512维的全连接。
+    2.3. 最后接两层512维的全连接。
 3. 定义分类器
-	通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
+    通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
-	```python
+    ```python
-	out = fc_layer(input=net, size=class_num, act=SoftmaxActivation())
+    out = fc_layer(input=net, size=class_num, act=SoftmaxActivation())
-	```
+    ```
 4. 定义损失函数和网络输出
-	在有监督训练中需要输入图像对应的类别信息，同样通过`data_layer`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
+    在有监督训练中需要输入图像对应的类别信息，同样通过`data_layer`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
-	```python
+    ```python
-	if not is_predict:
+    if not is_predict:
-	    lbl = data_layer(name="label", size=class_num)
+        lbl = data_layer(name="label", size=class_num)
-	    cost = classification_cost(input=out, label=lbl)
+        cost = classification_cost(input=out, label=lbl)
-	    outputs(cost)
+        outputs(cost)
-	else:
+    else:
-	    outputs(out)
+        outputs(out)
-	```
+    ```
 ### ResNet

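The VGG walkthrough uses a 3x32x32 input (3072 values) and five `conv_block`s with `pool_size=2` and `pool_stride=2`. These blocks have 64, 128, 256, 512, and 512 filters and 2, 2, 3, 3, and 3 convolutions respectively. They are followed by two 512-wide fully connected layers and a softmax classifier. The sketch below traces the feature-map shapes and counts the weight layers. It assumes that each 3x3 convolution preserves height and width (padding 1), which is an assumption of the sketch. Under that assumption, the count of 16 weight layers matches the "16层VGG" description.

```python
# (num_filter, groups) for conv1..conv5, as listed in vgg_bn_drop above.
CONV_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def vgg_shapes(channels=3, size=32, blocks=CONV_BLOCKS):
    """Trace (channels, height, width) after each conv_block.

    Assumes each 3x3 convolution keeps the spatial size (padding 1) and
    each 2x2 max-pool with stride 2 halves it.
    """
    shapes = [(channels, size, size)]
    for num_filter, _groups in blocks:
        size //= 2  # pool_size=2, pool_stride=2
        shapes.append((num_filter, size, size))
    return shapes


shapes = vgg_shapes()
conv_layers = sum(groups for _, groups in CONV_BLOCKS)
weight_layers = conv_layers + 2 + 1  # plus fc1, fc2 and the softmax classifier
print(shapes, 3 * 32 * 32, conv_layers, weight_layers)
```

Five halvings take the 32x32 input down to 1x1, so the last block produces 512 features per image before the fully connected layers.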
--- a/image_classification/deprecated/classify.py
+++ b/image_classification/deprecated/classify.py
@@ -44,8 +44,9 @@ def vis_square(data, fname):
         (0, 1))  # add some space between filters
        + ((0, 0), ) *
        (data.ndim - 3))  # don't pad the last dimension (if there is one)
-    data = np.pad(data, padding, mode='constant',
+    data = np.pad(
-                  constant_values=1)  # pad with ones (white)
+        data, padding, mode='constant',
+        constant_values=1)  # pad with ones (white)
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(
        range(4, data.ndim + 1)))

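`vis_square` in `classify.py` pads the filter stack with ones (white) so that it can be reshaped into a square grid with one pixel of spacing between filters. The grid size `n` is defined outside the visible hunk. This sketch assumes `n = ceil(sqrt(number_of_filters))` and shows the same tiling idea in pure Python on 2-D lists instead of NumPy arrays. `tile_filters` is a hypothetical helper, not part of the repository.

```python
import math


def tile_filters(filters, pad_value=1.0):
    """Tile equally sized 2-D filters into one square image.

    Each tile gets one padding row/column on its bottom/right edge, and
    blank tiles filled with pad_value top the grid up to n * n tiles.
    """
    k = len(filters)
    n = int(math.ceil(math.sqrt(k)))  # grid is n x n tiles
    h, w = len(filters[0]), len(filters[0][0])
    th, tw = h + 1, w + 1  # tile size including the padding
    blank = [[pad_value] * w for _ in range(h)]
    tiles = list(filters) + [blank] * (n * n - k)
    image = [[pad_value] * (n * tw) for _ in range(n * th)]
    for idx, tile in enumerate(tiles):
        r0, c0 = (idx // n) * th, (idx % n) * tw  # top-left corner of this tile
        for i in range(h):
            for j in range(w):
                image[r0 + i][c0 + j] = tile[i][j]
    return image


filters = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]  # three 2x2 filters
img = tile_filters(filters)
print(len(img), len(img[0]))
```

With three filters the grid is 2x2, so one tile is left blank and the result is a 6x6 image.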
--- a/image_classification/index.en.html
+++ b/image_classification/index.en.html
--- a/image_classification/index.html
+++ b/image_classification/index.html
@@ -43,7 +43,7 @@
 图像分类
 =======
-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。
 ## 背景介绍
@@ -215,24 +215,24 @@ paddle.init(use_gpu=False, trainer_count=1)
 1. 定义数据输入及其维度
-	网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
+    网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
-	```python
+    ```python
    datadim = 3 * 32 * 32
    classdim = 10
    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))
-	```
+    ```
 2. 定义VGG网络核心模块
-	```python
+    ```python
-	net = vgg_bn_drop(image)
+    net = vgg_bn_drop(image)
-	```
+    ```
-	VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
+    VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
-	```python
+    ```python
    def vgg_bn_drop(input):
        def conv_block(ipt, num_filter, groups, dropouts, num_channels=None):
            return paddle.networks.img_conv_group(
@@ -261,40 +261,40 @@ paddle.init(use_gpu=False, trainer_count=1)
            layer_attr=paddle.attr.Extra(drop_rate=0.5))
        fc2 = paddle.layer.fc(input=bn, size=512, act=paddle.activation.Linear())
        return fc2
-	```
+    ```
-	2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.networks`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。
+    2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.networks`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。
-	2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
+    2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
-	2.3. 最后接两层512维的全连接。
+    2.3. 最后接两层512维的全连接。
 3. 定义分类器
-	通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
+    通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
-	```python
+    ```python
    out = paddle.layer.fc(input=net,
                          size=classdim,
                          act=paddle.activation.Softmax())
-	```
+    ```
 4. 定义损失函数和网络输出
-	在有监督训练中需要输入图像对应的类别信息，同样通过`paddle.layer.data`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
+    在有监督训练中需要输入图像对应的类别信息，同样通过`paddle.layer.data`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
-	```python
+    ```python
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)
-	```
+    ```
 ### ResNet
 ResNet模型的第1、3、4步和VGG模型相同，这里不再介绍。主要介绍第2步即CIFAR10数据集上ResNet核心模块。
 ```python
-net = resnet_cifar10(data, depth=56)
+net = resnet_cifar10(image, depth=56)
 ```
 先介绍`resnet_cifar10`中的一些基本函数，再介绍网络连接过程。
@@ -417,7 +417,7 @@ $$  lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$
 cifar.train10()每次产生一条样本，在完成shuffle和batch之后，作为训练的输入。
 ```python
-reader=paddle.reader.batch(
+reader=paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.cifar.train10(), buf_size=50000),
        batch_size=128)
@@ -444,10 +444,9 @@ def event_handler(event):
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        result = trainer.test(
-            reader=paddle.reader.batch(
+            reader=paddle.batch(
                paddle.dataset.cifar.test10(), batch_size=128),
-            reader_dict={'image': 0,
+            feeding=feeding)
-                         'label': 1})
        print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```

--- a/image_classification/train.py
+++ b/image_classification/train.py
@@ -36,9 +36,8 @@ def main():
    # option 2. vgg
    net = vgg_bn_drop(image)
-    out = paddle.layer.fc(input=net,
+    out = paddle.layer.fc(
-                          size=classdim,
+        input=net, size=classdim, act=paddle.activation.Softmax())
-                          act=paddle.activation.Softmax())
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
@@ -75,9 +74,8 @@ def main():
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
    # Create trainer
-    trainer = paddle.trainer.SGD(cost=cost,
+    trainer = paddle.trainer.SGD(
-                                 parameters=parameters,
+        cost=cost, parameters=parameters, update_equation=momentum_optimizer)
-                                 update_equation=momentum_optimizer)
    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(

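`fit_a_line/train.py` builds its optimizer as `paddle.optimizer.Momentum(momentum=0)`, and `image_classification/train.py` passes `momentum_optimizer` to `paddle.trainer.SGD` as `update_equation`. A common textbook form of momentum is $v \leftarrow \mu v - \eta g$, $w \leftarrow w + v$. In that form, $\mu = 0$ reduces to plain gradient descent. The sketch below assumes this textbook update and applies it to the toy objective $f(w) = (w - 3)^2$. PaddlePaddle's actual update, for example how it applies regularization, may differ in detail.

```python
def momentum_step(w, v, grad, lr, mu):
    """One classic momentum update: v = mu*v - lr*grad; w = w + v."""
    v = mu * v - lr * grad
    return w + v, v


def minimize(mu, lr=0.1, steps=500):
    """Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3)."""
    w, v = 0.0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, v, 2.0 * (w - 3.0), lr, mu)
    return w


# With mu = 0 a single step is exactly plain gradient descent: w -= lr * grad.
w_gd, v_gd = momentum_step(1.0, 0.0, 4.0, lr=0.1, mu=0.0)
print(w_gd, minimize(mu=0.0), minimize(mu=0.9))
```

Both settings reach the minimum at `w = 3`. The velocity term only changes the path taken, which matters more on poorly conditioned problems than on this one-dimensional toy.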
--- a/index.en.html
+++ b/index.en.html
+<html>
+<head>
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
+    jax: ["input/TeX", "output/HTML-CSS"],
+    tex2jax: {
+      inlineMath: [ ['$','$'] ],
+      displayMath: [ ['$$','$$'] ],
+      processEscapes: true
+    },
+    "HTML-CSS": { availableFonts: ["TeX"] }
+  });
+  </script>
+  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
+  <script type="text/javascript" src="../.tmpl/marked.js">
+  </script>
+  <link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
+  <script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
+  <link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
+  <link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
+  <link href="../.tmpl/github-markdown.css" rel='stylesheet'>
+</head>
+<style type="text/css" >
+.markdown-body {
+    box-sizing: border-box;
+    min-width: 200px;
+    max-width: 980px;
+    margin: 0 auto;
+    padding: 45px;
+}
+</style>
+<body>
+<div id="context" class="container markdown-body">
+</div>
+<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
+<div id="markdown" style='display:none'>
+# Deep Learning with PaddlePaddle
+1. [Fit a Line](http://book.paddlepaddle.org/fit_a_line/index.en.html)
+1. [Recognize Digits](http://book.paddlepaddle.org/recognize_digits/index.en.html)
+1. [Image Classification](http://book.paddlepaddle.org/image_classification/index.en.html)
+1. [Word to Vector](http://book.paddlepaddle.org/word2vec/index.en.html)
+1. [Understand Sentiment](http://book.paddlepaddle.org/understand_sentiment/index.en.html)
+1. [Label Semantic Roles](http://book.paddlepaddle.org/label_semantic_roles/index.en.html)
+1. [Machine Translation](http://book.paddlepaddle.org/machine_translation/index.en.html)
+1. [Recommender System](http://book.paddlepaddle.org/recommender_system/index.en.html)
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
+</div>
+<!-- You can change the lines below now. -->
+<script type="text/javascript">
+marked.setOptions({
+  renderer: new marked.Renderer(),
+  gfm: true,
+  breaks: false,
+  smartypants: true,
+  highlight: function(code, lang) {
+    code = code.replace(/&gt;/g, ">")
+    code = code.replace(/&lt;/g, "<")
+    code = code.replace(/&nbsp;/g, " ")
+    code = code.replace(/&amp;/g, "&")
+    return hljs.highlightAuto(code, [lang]).value;
+  }
+});
+document.getElementById("context").innerHTML = marked(
+        document.getElementById("markdown").innerHTML)
+</script>
+</body>
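The `highlight` callback in `index.en.html` decodes HTML entities with chained `replace` calls before handing code to highlight.js, so the order of those calls matters. If `&amp;` is decoded before `&lt;`, an escaped literal such as `&amp;lt;` (source text `&lt;`) ends up as `<`. Decoding `&amp;` last avoids this. The Python sketch below compares the two orders against the standard library's single-pass `html.unescape`. The tables `AMP_FIRST` and `AMP_LAST` are illustrative names.

```python
import html

# The callback's replacements, with "&amp;" decoded first.
AMP_FIRST = [("&amp;", "&"), ("&gt;", ">"), ("&lt;", "<"), ("&nbsp;", " ")]
# The same replacements, with "&amp;" decoded last so it cannot create new entities.
AMP_LAST = [("&gt;", ">"), ("&lt;", "<"), ("&nbsp;", " "), ("&amp;", "&")]


def unescape(text, table):
    """Apply sequential string replacements, like chained .replace() calls."""
    for entity, char in table:
        text = text.replace(entity, char)
    return text


escaped = "a &amp;lt; b &amp;&amp; c &lt; d"  # source text: "a &lt; b && c < d"
print(unescape(escaped, AMP_FIRST))
print(unescape(escaped, AMP_LAST))
print(html.unescape(escaped))
```

`html.unescape` also maps `&nbsp;` to a non-breaking space (`\xa0`) rather than a plain space, so it is not a drop-in replacement for the callback.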
--- a/index.html
+++ b/index.html
@@ -42,16 +42,16 @@
 <div id="markdown" style='display:none'>
 # 深度学习入门
-1. 新手入门 [[fit_a_line](fit_a_line/)] [[html](http://book.paddlepaddle.org/fit_a_line)]
+1. [新手入门](http://book.paddlepaddle.org/fit_a_line)
-1. 识别数字 [[recognize_digits](recognize_digits/)] [[html](http://book.paddlepaddle.org/recognize_digits)]
+1. [识别数字](http://book.paddlepaddle.org/recognize_digits)
-1. 图像分类 [[image_classification](image_classification/)] [[html](http://book.paddlepaddle.org/image_classification)]
+1. [图像分类](http://book.paddlepaddle.org/image_classification)
-1. 词向量 [[word2vec](word2vec/)] [[html](http://book.paddlepaddle.org/word2vec)]
+1. [词向量](http://book.paddlepaddle.org/word2vec)
-1. 情感分析 [[understand_sentiment](understand_sentiment/)] [[html](http://book.paddlepaddle.org/understand_sentiment)]
+1. [情感分析](http://book.paddlepaddle.org/understand_sentiment)
-1. 语义角色标注 [[label_semantic_roles](label_semantic_roles/)] [[html](http://book.paddlepaddle.org/label_semantic_roles)]
+1. [语义角色标注](http://book.paddlepaddle.org/label_semantic_roles)
-1. 机器翻译 [[machine_translation](machine_translation/)] [[html](http://book.paddlepaddle.org/machine_translation)]
+1. [机器翻译](http://book.paddlepaddle.org/machine_translation)
-1. 个性化推荐 [[recommender_system](recommender_system/)] [[html](http://book.paddlepaddle.org/recommender_system)]
+1. [个性化推荐](http://book.paddlepaddle.org/recommender_system)
-<br/>
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">本教程</span> 由 <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> 创作，采用 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">知识共享 署名-非商业性使用-相同方式共享 4.0 国际 许可协议</a>进行许可。
 </div>

--- a/label_semantic_roles/README.en.md
+++ b/label_semantic_roles/README.en.md
@@ -2,6 +2,8 @@
 Source code of this chapter is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).
+For instructions on getting started with PaddlePaddle, see the [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Background
 Natural Language Analysis contains three components: Lexical Analysis, Syntactic Analysis, and Semantic Analysis. Semantic Role Labelling (SRL) is a form of Shallow Semantic Analysis. The predicate of a sentence is a property that the subject possesses or is characterized by, such as what it does, what it is, or how it is; it usually corresponds to the core of an event. A noun associated with a predicate is called an argument. Semantic roles express the abstract roles that the arguments of a predicate can take in the event, such as Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal, and Source.
@@ -134,7 +136,7 @@ After modification, the model is as follows:
 <div  align="center">  
-<img src="image/db_lstm_en.png" width = "60%"  align=center /><br>
+<img src="image/db_lstm_network_en.png" width = "60%"  align=center /><br>
 Fig 6. DB-LSTM for SRL tasks
 </div>
@@ -200,6 +202,8 @@ import numpy as np
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
+paddle.init(use_gpu=False, trainer_count=1)
 word_dict, verb_dict, label_dict = conll05.get_dict()
 word_dict_len = len(word_dict)
 label_dict_len = len(label_dict)
@@ -212,7 +216,7 @@ print pred_len
 ## Model configuration
- 1. Define input data dimensions and model hyperparameters.
+- Define input data dimensions and model hyperparameters.
 ```python
 mark_dict_len = 2    # Value range of region mark. Region mark is either 0 or 1, so range is 2
@@ -247,7 +251,7 @@ target = paddle.layer.data(name='target', type=d_type(label_dict_len))
 Special note: `hidden_dim = 512` means the LSTM hidden vector has 128 dimensions (512/4). Please refer to the PaddlePaddle official documentation of [lstmemory](http://www.paddlepaddle.org/doc/ui/api/trainer_config_helpers/layers.html#lstmemory) for details.
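The 512/4 arithmetic can be sketched in plain Python. This minimal sketch assumes the projection feeding `lstmemory` packs the pre-activations of the four LSTM blocks (input gate, forget gate, output gate, and cell candidate) side by side, which is why its width is divided by 4. The helper names are hypothetical and the slice order is arbitrary; it is not PaddlePaddle's internal layout.

```python
# Minimal sketch: why a 512-wide input projection gives a 128-wide LSTM state.
# Assumption: the projection holds the pre-activations of four LSTM blocks
# (input gate, forget gate, output gate, cell candidate) side by side.
NUM_LSTM_BLOCKS = 4


def lstm_state_size(projection_size):
    """Return the LSTM hidden/cell size implied by the input projection size."""
    if projection_size % NUM_LSTM_BLOCKS != 0:
        raise ValueError("projection size must be divisible by 4")
    return projection_size // NUM_LSTM_BLOCKS


def split_blocks(pre_activation):
    """Split one time step's pre-activation vector into four equal slices.

    The slice order here is arbitrary and only illustrates the sizes.
    """
    n = lstm_state_size(len(pre_activation))
    return [pre_activation[k * n:(k + 1) * n] for k in range(NUM_LSTM_BLOCKS)]


hidden_dim = 512
print(lstm_state_size(hidden_dim))  # 128
print([len(s) for s in split_blocks([0.0] * hidden_dim)])  # [128, 128, 128, 128]
```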
- 2. The word sequence, predicate, predicate context, and region mark sequence are transformed into embedding vector sequences.
+- The word sequence, predicate, predicate context, and region mark sequence are transformed into embedding vector sequences.
 ```python  
@@ -276,7 +280,7 @@ emb_layers.append(predicate_embedding)
 emb_layers.append(mark_embedding)
 ```
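To make the inputs of this step concrete, the sketch below builds the five predicate-context words (`ctx_n2` to `ctx_p2`) and the 0/1 region mark for one sentence, then performs an embedding lookup on the mark sequence. It is a minimal illustration under stated assumptions: the helper names, the `bos`/`eos` edge padding, and the toy 5-dimensional mark embedding table are hypothetical, not the conll05 reader's actual output.

```python
# Minimal sketch (hypothetical helpers): predicate-context features for SRL.
import random


def predicate_features(words, pred_idx, window=2):
    """Return (context_words, region_mark) for the predicate at pred_idx.

    context_words: the words at offsets -2..+2 around the predicate
    (ctx_n2, ctx_n1, ctx_0, ctx_p1, ctx_p2), padded with the assumed
    tokens "bos"/"eos" where the window runs past the sentence edges.
    region_mark: 1 for every token inside the context window, else 0.
    """
    context = []
    for offset in range(-window, window + 1):
        j = pred_idx + offset
        if j < 0:
            context.append("bos")
        elif j >= len(words):
            context.append("eos")
        else:
            context.append(words[j])
    mark = [1 if abs(i - pred_idx) <= window else 0 for i in range(len(words))]
    return context, mark


def embedding_lookup(ids, table):
    """An embedding layer maps each integer id to one row of its table."""
    return [table[i] for i in ids]


mark_dict_len = 2  # the region mark is 0 or 1
mark_dim = 5       # toy embedding width (an assumption for this sketch)
rng = random.Random(0)
mark_table = [[rng.gauss(0.0, 1.0) for _ in range(mark_dim)]
              for _ in range(mark_dict_len)]

words = ["the", "cat", "sat", "on", "the", "mat"]
context, mark = predicate_features(words, pred_idx=2)
print(context)  # ['the', 'cat', 'sat', 'on', 'the']
print(mark)     # [1, 1, 1, 1, 1, 0]
mark_vectors = embedding_lookup(mark, mark_table)
print(len(mark_vectors), len(mark_vectors[0]))  # 6 5
```

Word, context, and predicate ids go through their own embedding tables in the same way; only the table sizes differ.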
- 3. Eight LSTM units learn all input sequences in alternating "forward / backward" order.
+- Eight LSTM units learn all input sequences in alternating "forward / backward" order.
 ```python  
 hidden_0 = paddle.layer.mixed(
@@ -326,7 +330,7 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```
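The alternating directions can be illustrated without PaddlePaddle. In this minimal sketch a toy scalar recurrence stands in for `lstmemory`, and each layer's input and output are combined by plain addition instead of learned projections; both are simplifications. What it does show is the stacking pattern: layer 0 reads left to right, layer 1 right to left, and so on for 8 layers.

```python
import math


def toy_recurrent_layer(xs, reverse=False, decay=0.5):
    """Scalar stand-in for an LSTM: h_t = tanh(x_t + decay * h_prev).

    With reverse=True the sequence is processed from the last step to the
    first, so each output summarizes the current and all later inputs.
    """
    steps = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    outputs = [0.0] * len(xs)
    h = 0.0
    for t in steps:
        h = math.tanh(xs[t] + decay * h)
        outputs[t] = h
    return outputs


def stacked_alternating(xs, depth=8):
    """Stack `depth` layers whose direction alternates forward/backward."""
    directions = []
    layer_input = list(xs)
    for i in range(depth):
        reverse = (i % 2) == 1  # odd-numbered layers run right-to-left
        directions.append("backward" if reverse else "forward")
        out = toy_recurrent_layer(layer_input, reverse=reverse)
        # Simplification: combine the layer's input and output by addition.
        layer_input = [a + b for a, b in zip(layer_input, out)]
    return layer_input, directions


features, directions = stacked_alternating([0.2, -0.1, 0.4, 0.0])
print(directions)
```

Alternating directions lets the top layers see context from both sides of each word without a separate bidirectional layer.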
- 4. We concatenate the output of the top LSTM unit with its input and project the result into a hidden layer, then apply a fully connected layer on top of it to get the final vector representation.
+- We concatenate the output of the top LSTM unit with its input and project the result into a hidden layer, then apply a fully connected layer on top of it to get the final vector representation.
 ```python
 feature_out = paddle.layer.mixed(
@@ -340,7 +344,7 @@ for i in range(1, depth):
 ], )
 ```
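Feeding two `full_matrix_projection`s into one `mixed` layer sums their outputs, which equals projecting the concatenation of the two inputs with a single matrix; this is why the step can be described as "concatenate, then project". The pure-Python sketch below checks that equivalence on small hypothetical matrices and omits the layer's activation.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def mixed_of_projections(W_hidden, hidden, W_lstm, lstm_out, bias):
    """Sum of two full-matrix projections plus a bias, as a mixed layer does."""
    a = matvec(W_hidden, hidden)
    b = matvec(W_lstm, lstm_out)
    return [p + q + c for p, q, c in zip(a, b, bias)]


def projection_of_concat(W_hidden, hidden, W_lstm, lstm_out, bias):
    """Project [hidden; lstm_out] with the column-wise concatenation [W_hidden W_lstm]."""
    W_cat = [r1 + r2 for r1, r2 in zip(W_hidden, W_lstm)]
    return [p + c for p, c in zip(matvec(W_cat, hidden + lstm_out), bias)]


# Toy sizes: 3 labels, a 2-dim hidden input and a 3-dim LSTM output.
W_hidden = [[1, 2], [0, 1], [3, 0]]
W_lstm = [[1, 0, 1], [2, 1, 0], [0, 0, 1]]
hidden, lstm_out, bias = [1, 1], [1, 2, 3], [0, 1, 0]

print(mixed_of_projections(W_hidden, hidden, W_lstm, lstm_out, bias))  # [7, 6, 6]
print(projection_of_concat(W_hidden, hidden, W_lstm, lstm_out, bias))  # [7, 6, 6]
```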
- 5. We use a CRF as the cost function; the parameter of the CRF cost layer is named `crfw`.
+- We use a CRF as the cost function; the parameter of the CRF cost layer is named `crfw`.
 ```python
 crf_cost = paddle.layer.crf(
@@ -353,7 +357,7 @@ crf_cost = paddle.layer.crf(
        learning_rate=mix_hidden_lr))
 ```
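As a rough picture of what the CRF cost computes, the sketch below evaluates the negative log-likelihood of a gold tag sequence under a linear-chain CRF using the forward algorithm. It assumes the `crfw` parameter holds start scores, end scores, and a tag-to-tag transition matrix, and that the preceding layer supplies per-step emission scores; this layout is an assumption for illustration, not a description of PaddlePaddle's internal storage, and all names are hypothetical.

```python
import itertools
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, tags, start, end, trans):
    """Unnormalized score of one tag sequence."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += trans[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score + end[tags[-1]]


def log_partition(emissions, start, end, trans):
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    n = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(n)]
    for t in range(1, len(emissions)):
        alpha = [log_sum_exp([alpha[i] + trans[i][j] for i in range(n)])
                 + emissions[t][j] for j in range(n)]
    return log_sum_exp([alpha[j] + end[j] for j in range(n)])


def crf_nll(emissions, tags, start, end, trans):
    """Negative log-likelihood of the gold tags: log Z - score(gold)."""
    return (log_partition(emissions, start, end, trans)
            - path_score(emissions, tags, start, end, trans))


def brute_force_log_partition(emissions, start, end, trans):
    """Enumerate every tag sequence; only feasible for tiny examples."""
    n, length = len(start), len(emissions)
    return log_sum_exp([path_score(emissions, list(tags), start, end, trans)
                        for tags in itertools.product(range(n), repeat=length)])


emissions = [[1.0, 0.2, -0.5], [0.1, 0.8, 0.3], [-0.2, 0.4, 1.1]]
start, end = [0.1, 0.0, -0.1], [0.0, 0.2, 0.1]
trans = [[0.5, -0.3, 0.0], [0.1, 0.4, -0.2], [-0.1, 0.2, 0.3]]
gold = [0, 1, 2]
print(crf_nll(emissions, gold, start, end, trans))
```

The forward algorithm costs O(T·n²) instead of enumerating all nᵀ paths; the brute-force function is only there to check it.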
- 6. The CRF decoding layer is used for evaluation and inference. It shares parameters with the CRF layer; parameters are shared among multiple layers by giving them the same parameter name.
+- The CRF decoding layer is used for evaluation and inference. It shares parameters with the CRF layer; parameters are shared among multiple layers by giving them the same parameter name.
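Because `crf_decoding` reuses the parameter named `crfw`, decoding scores label paths with the same weights the CRF cost was trained with. The minimal sketch below shows that idea with Viterbi decoding in plain Python. It assumes the CRF parameters consist of start scores, end scores, and a tag-to-tag transition matrix, with per-step emission scores from the preceding layer; all names are hypothetical. `count_errors` mirrors the evaluation use, where decoded tags are compared with the gold labels.

```python
import itertools


def path_score(emissions, tags, start, end, trans):
    """Unnormalized CRF score of one tag sequence."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += trans[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score + end[tags[-1]]


def viterbi_decode(emissions, start, end, trans):
    """Return the highest-scoring tag sequence via dynamic programming."""
    n = len(start)
    score = [start[j] + emissions[0][j] for j in range(n)]
    backpointers = []
    for t in range(1, len(emissions)):
        best_prev, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + trans[i][j])
            best_prev.append(i_best)
            new_score.append(score[i_best] + trans[i_best][j] + emissions[t][j])
        backpointers.append(best_prev)
        score = new_score
    last = max(range(n), key=lambda j: score[j] + end[j])
    path = [last]
    for best_prev in reversed(backpointers):  # follow pointers back in time
        path.append(best_prev[path[-1]])
    path.reverse()
    return path


def brute_force_decode(emissions, start, end, trans):
    """Check by enumerating every tag sequence (tiny inputs only)."""
    n, length = len(start), len(emissions)
    best = max(itertools.product(range(n), repeat=length),
               key=lambda tags: path_score(emissions, list(tags), start, end, trans))
    return list(best)


def count_errors(predicted, gold):
    """Number of positions where the decoded tag differs from the gold tag."""
    return sum(1 for p, g in zip(predicted, gold) if p != g)


emissions = [[1.0, 0.2, -0.5], [0.1, 0.8, 0.3], [-0.2, 0.4, 1.1]]
start, end = [0.1, 0.0, -0.1], [0.0, 0.2, 0.1]
trans = [[0.5, -0.3, 0.0], [0.1, 0.4, -0.2], [-0.1, 0.2, 0.3]]
best = viterbi_decode(emissions, start, end, trans)
print(best, best == brute_force_decode(emissions, start, end, trans))
print(count_errors(best, [0, 1, 2]))
```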
 ```python
 crf_dec = paddle.layer.crf_decoding(
@@ -470,4 +474,4 @@ Semantic Role Labeling is an important intermediate step in a wide range of natu
 10. Zhou J, Xu W. [End-to-end learning of semantic role labeling using recurrent neural networks](http://www.aclweb.org/anthology/P/P15/P15-1109.pdf)[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
--- a/label_semantic_roles/README.md
+++ b/label_semantic_roles/README.md
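 # Semantic Role Labeling
-The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background
@@ -206,7 +206,7 @@ print pred_len
 ## Model Configuration
- 1. Define the input data dimensions and model hyperparameters.
+- Define the input data dimensions and model hyperparameters.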
 ```python
 mark_dict_len = 2    # dimension of the predicate-context region mark; it is a 0/1 binary feature, so the dimension is 2
@@ -240,7 +240,7 @@ target = paddle.layer.data(name='target', type=d_type(label_dict_len))
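 Note in particular that `hidden_dim = 512` specifies an LSTM hidden vector of 128 dimensions (512/4); for details, see the description of [lstmemory](http://www.paddlepaddle.org/doc/ui/api/trainer_config_helpers/layers.html#lstmemory) in the PaddlePaddle official documentation.
- 2. Using the vocabularies, convert the sentence sequence, the predicate, the predicate context, and the predicate-context region mark into sequences of real-valued word vectors.
+- Using the vocabularies, convert the sentence sequence, the predicate, the predicate context, and the predicate-context region mark into sequences of real-valued word vectors.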
 ```python  
@@ -269,7 +269,7 @@ emb_layers.append(predicate_embedding)
 emb_layers.append(mark_embedding)
 ```
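- 3. Eight LSTM units learn all input sequences in alternating "forward / backward" order.
+- Eight LSTM units learn all input sequences in alternating "forward / backward" order.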
 ```python  
 hidden_0 = paddle.layer.mixed(
@@ -319,7 +319,7 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```
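- 4. Project the output of the last stacked LSTM and the input of that LSTM unit into a hidden layer, then map it through a fully connected layer to the dimension of the label dictionary to obtain the final feature vector representation.
+- Project the output of the last stacked LSTM and the input of that LSTM unit into a hidden layer, then map it through a fully connected layer to the dimension of the label dictionary to obtain the final feature vector representation.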
 ```python
 feature_out = paddle.layer.mixed(
@@ -333,7 +333,7 @@ input=[
 ], )
 ```
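- 5. At the end of the network, define a CRF layer that computes the cost, with its parameter named `crfw`; this layer requires the correct data labels (target) as input.
+- At the end of the network, define a CRF layer that computes the cost, with its parameter named `crfw`; this layer requires the correct data labels (target) as input.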
 ```python
 crf_cost = paddle.layer.crf(
@@ -346,7 +346,7 @@ crf_cost = paddle.layer.crf(
        learning_rate=mix_hidden_lr))
 ```
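- 6. The CRF decoding layer uses the same parameter name as the CRF layer, so the two share weights. If the correct data labels (target) are given, it counts the number of wrong labels, which can be used to evaluate the model. If no correct labels are given, the layer infers the optimal label sequence, which can be used for prediction.
+- The CRF decoding layer uses the same parameter name as the CRF layer, so the two share weights. If the correct data labels (target) are given, it counts the number of wrong labels, which can be used to evaluate the model. If no correct labels are given, the layer infers the optimal label sequence, which can be used for prediction.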
 ```python
 crf_dec = paddle.layer.crf_decoding(

--- a/label_semantic_roles/db_lstm.py
+++ b/label_semantic_roles/db_lstm.py
@@ -75,8 +75,7 @@ settings(
    learning_method=MomentumOptimizer(momentum=0),
    learning_rate=2e-2,
    regularization=L2Regularization(8e-4),
-    model_average=ModelAverage(
+    model_average=ModelAverage(average_window=0.5, max_average_window=10000), )
-        average_window=0.5, max_average_window=10000), )
 ####################################### network ##############################
 #8 features and 1 target
@@ -102,13 +101,12 @@ std_default = ParameterAttribute(initial_std=default_std)
 predicate_embedding = embedding_layer(
    size=word_dim,
    input=predicate,
-    param_attr=ParameterAttribute(
+    param_attr=ParameterAttribute(name='vemb', initial_std=default_std))
-        name='vemb', initial_std=default_std))
 word_input = [word, ctx_n2, ctx_n1, ctx_0, ctx_p1, ctx_p2]
 emb_layers = [
-    embedding_layer(
+    embedding_layer(size=word_dim, input=x, param_attr=emb_para)
-        size=word_dim, input=x, param_attr=emb_para) for x in word_input
+    for x in word_input
 ]
 emb_layers.append(predicate_embedding)
 mark_embedding = embedding_layer(
@@ -120,8 +118,8 @@ hidden_0 = mixed_layer(
    size=hidden_dim,
    bias_attr=std_default,
    input=[
-        full_matrix_projection(
+        full_matrix_projection(input=emb, param_attr=std_default)
-            input=emb, param_attr=std_default) for emb in emb_layers
+        for emb in emb_layers
    ])
 mix_hidden_lr = 1e-3
@@ -171,10 +169,8 @@ feature_out = mixed_layer(
    size=label_dict_len,
    bias_attr=std_default,
    input=[
-        full_matrix_projection(
+        full_matrix_projection(input=input_tmp[0], param_attr=hidden_para_attr),
-            input=input_tmp[0], param_attr=hidden_para_attr),
+        full_matrix_projection(input=input_tmp[1], param_attr=lstm_para_attr)
-        full_matrix_projection(
-            input=input_tmp[1], param_attr=lstm_para_attr)
    ], )
 if not is_predict:

--- a/label_semantic_roles/image/bd_lstm_en.png
+++ b/label_semantic_roles/image/bd_lstm_en.png
--- a/label_semantic_roles/index.en.html
+++ b/label_semantic_roles/index.en.html
@@ -44,6 +44,8 @@
 Source code of this chapter is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).
+For instructions on getting started with PaddlePaddle, see the [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Background
 Natural Language Analysis contains three components: Lexical Analysis, Syntactic Analysis, and Semantic Analysis. Semantic Role Labelling (SRL) is a form of Shallow Semantic Analysis. The predicate of a sentence is a property that the subject possesses or is characterized by, such as what it does, what it is, or how it is; it usually corresponds to the core of an event. A noun associated with a predicate is called an argument. Semantic roles express the abstract roles that the arguments of a predicate can take in the event, such as Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal, and Source.
@@ -176,7 +178,7 @@ After modification, the model is as follows:
 <div  align="center">  
-<img src="image/db_lstm_en.png" width = "60%"  align=center /><br>
+<img src="image/db_lstm_network_en.png" width = "60%"  align=center /><br>
 Fig 6. DB-LSTM for SRL tasks
 </div>
@@ -242,6 +244,8 @@ import numpy as np
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
+paddle.init(use_gpu=False, trainer_count=1)
 word_dict, verb_dict, label_dict = conll05.get_dict()
 word_dict_len = len(word_dict)
 label_dict_len = len(label_dict)
@@ -254,7 +258,7 @@ print pred_len
 ## Model configuration
- 1. Define input data dimensions and model hyperparameters.
+- Define input data dimensions and model hyperparameters.
 ```python
 mark_dict_len = 2    # Value range of region mark. Region mark is either 0 or 1, so range is 2
@@ -289,7 +293,7 @@ target = paddle.layer.data(name='target', type=d_type(label_dict_len))
 Special note: `hidden_dim = 512` means the LSTM hidden vector has 128 dimensions (512/4). Please refer to the PaddlePaddle official documentation of [lstmemory](http://www.paddlepaddle.org/doc/ui/api/trainer_config_helpers/layers.html#lstmemory) for details.
- 2. The word sequence, predicate, predicate context, and region mark sequence are transformed into embedding vector sequences.
+- The word sequence, predicate, predicate context, and region mark sequence are transformed into embedding vector sequences.
 ```python  
@@ -318,7 +322,7 @@ emb_layers.append(predicate_embedding)
 emb_layers.append(mark_embedding)
 ```
- 3. Eight LSTM units learn all input sequences in alternating "forward / backward" order.
+- Eight LSTM units learn all input sequences in alternating "forward / backward" order.
 ```python  
 hidden_0 = paddle.layer.mixed(
@@ -368,7 +372,7 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```
- 4. We concatenate the output of the top LSTM unit with its input and project the result into a hidden layer, then apply a fully connected layer on top of it to get the final vector representation.
+- We concatenate the output of the top LSTM unit with its input and project the result into a hidden layer, then apply a fully connected layer on top of it to get the final vector representation.
 ```python
 feature_out = paddle.layer.mixed(
@@ -382,7 +386,7 @@ for i in range(1, depth):
 ], )
 ```
- 5. We use a CRF as the cost function; the parameter of the CRF cost layer is named `crfw`.
+- We use a CRF as the cost function; the parameter of the CRF cost layer is named `crfw`.
 ```python
 crf_cost = paddle.layer.crf(
@@ -395,7 +399,7 @@ crf_cost = paddle.layer.crf(
        learning_rate=mix_hidden_lr))
 ```
- 6. The CRF decoding layer is used for evaluation and inference. It shares parameters with the CRF layer; parameters are shared among multiple layers by giving them the same parameter name.
+- The CRF decoding layer is used for evaluation and inference. It shares parameters with the CRF layer; parameters are shared among multiple layers by giving them the same parameter name.
 ```python
 crf_dec = paddle.layer.crf_decoding(
@@ -512,7 +516,7 @@ Semantic Role Labeling is an important intermediate step in a wide range of natu
 10. Zhou J, Xu W. [End-to-end learning of semantic role labeling using recurrent neural networks](http://www.aclweb.org/anthology/P/P15/P15-1109.pdf)[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
 </div>
 <!-- You can change the lines below now. -->

--- a/label_semantic_roles/index.html
+++ b/label_semantic_roles/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
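 # Semantic Role Labeling
-The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background
@@ -248,7 +248,7 @@ print pred_len
 ## Model Configuration
- 1. Define the input data dimensions and model hyperparameters.
+- Define the input data dimensions and model hyperparameters.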
 ```python
 mark_dict_len = 2    # dimension of the predicate-context region mark; it is a 0/1 binary feature, so the dimension is 2
@@ -282,7 +282,7 @@ target = paddle.layer.data(name='target', type=d_type(label_dict_len))
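 Note in particular that `hidden_dim = 512` specifies an LSTM hidden vector of 128 dimensions (512/4); for details, see the description of [lstmemory](http://www.paddlepaddle.org/doc/ui/api/trainer_config_helpers/layers.html#lstmemory) in the PaddlePaddle official documentation.
- 2. Using the vocabularies, convert the sentence sequence, the predicate, the predicate context, and the predicate-context region mark into sequences of real-valued word vectors.
+- Using the vocabularies, convert the sentence sequence, the predicate, the predicate context, and the predicate-context region mark into sequences of real-valued word vectors.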
 ```python  
@@ -311,7 +311,7 @@ emb_layers.append(predicate_embedding)
 emb_layers.append(mark_embedding)
 ```
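- 3. Eight LSTM units learn all input sequences in alternating "forward / backward" order.
+- Eight LSTM units learn all input sequences in alternating "forward / backward" order.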
 ```python  
 hidden_0 = paddle.layer.mixed(
@@ -361,7 +361,7 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```
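- 4. Project the output of the last stacked LSTM and the input of that LSTM unit into a hidden layer, then map it through a fully connected layer to the dimension of the label dictionary to obtain the final feature vector representation.
+- Project the output of the last stacked LSTM and the input of that LSTM unit into a hidden layer, then map it through a fully connected layer to the dimension of the label dictionary to obtain the final feature vector representation.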
 ```python
 feature_out = paddle.layer.mixed(
@@ -375,7 +375,7 @@ input=[
 ], )
 ```
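- 5. At the end of the network, define a CRF layer that computes the cost, with its parameter named `crfw`; this layer requires the correct data labels (target) as input.
+- At the end of the network, define a CRF layer that computes the cost, with its parameter named `crfw`; this layer requires the correct data labels (target) as input.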
 ```python
 crf_cost = paddle.layer.crf(
@@ -388,7 +388,7 @@ crf_cost = paddle.layer.crf(
        learning_rate=mix_hidden_lr))
 ```
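- 6. The CRF decoding layer uses the same parameter name as the CRF layer, so the two share weights. If the correct data labels (target) are given, it counts the number of wrong labels, which can be used to evaluate the model. If no correct labels are given, the layer infers the optimal label sequence, which can be used for prediction.
+- The CRF decoding layer uses the same parameter name as the CRF layer, so the two share weights. If the correct data labels (target) are given, it counts the number of wrong labels, which can be used to evaluate the model. If no correct labels are given, the layer infers the optimal label sequence, which can be used for prediction.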
 ```python
 crf_dec = paddle.layer.crf_decoding(

--- a/label_semantic_roles/train.py
+++ b/label_semantic_roles/train.py
@@ -40,15 +40,14 @@ def db_lstm():
    predicate_embedding = paddle.layer.embedding(
        size=word_dim,
        input=predicate,
-        param_attr=paddle.attr.Param(
+        param_attr=paddle.attr.Param(name='vemb', initial_std=default_std))
-            name='vemb', initial_std=default_std))
    mark_embedding = paddle.layer.embedding(
        size=mark_dim, input=mark, param_attr=std_0)
    word_input = [word, ctx_n2, ctx_n1, ctx_0, ctx_p1, ctx_p2]
    emb_layers = [
-        paddle.layer.embedding(
+        paddle.layer.embedding(size=word_dim, input=x, param_attr=emb_para)
-            size=word_dim, input=x, param_attr=emb_para) for x in word_input
+        for x in word_input
    ]
    emb_layers.append(predicate_embedding)
    emb_layers.append(mark_embedding)
@@ -109,13 +108,12 @@ def db_lstm():
                input=input_tmp[1], param_attr=lstm_para_attr)
        ], )
-    crf_cost = paddle.layer.crf(size=label_dict_len,
+    crf_cost = paddle.layer.crf(
-                                input=feature_out,
+        size=label_dict_len,
-                                label=target,
+        input=feature_out,
-                                param_attr=paddle.attr.Param(
+        label=target,
-                                    name='crfw',
+        param_attr=paddle.attr.Param(
-                                    initial_std=default_std,
+            name='crfw', initial_std=default_std, learning_rate=mix_hidden_lr))
-                                    learning_rate=mix_hidden_lr))
    crf_dec = paddle.layer.crf_decoding(
        name='crf_dec_l',
@@ -151,13 +149,11 @@ def main():
        model_average=paddle.optimizer.ModelAverage(
            average_window=0.5, max_average_window=10000), )
-    trainer = paddle.trainer.SGD(cost=crf_cost,
+    trainer = paddle.trainer.SGD(
-                                 parameters=parameters,
+        cost=crf_cost, parameters=parameters, update_equation=optimizer)
-                                 update_equation=optimizer)
    reader = paddle.batch(
-        paddle.reader.shuffle(
+        paddle.reader.shuffle(conll05.test(), buf_size=8192), batch_size=10)
-            conll05.test(), buf_size=8192), batch_size=10)
    feeding = {
        'word_data': 0,

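`paddle.reader.shuffle` and `paddle.batch` in `train.py` are reader decorators: each wraps a reader creator (a function returning an iterator of samples) and returns a new one, while the `feeding` dict tells the trainer which field of each sample tuple goes to which data layer. The sketch below re-implements that behavior in plain Python to show what `buf_size=8192` and `batch_size=10` mean. It is an illustrative approximation under that understanding, not PaddlePaddle's source; `toy_reader`, `column`, and the two-field `feeding` are hypothetical.

```python
import random


def shuffle(reader, buf_size, seed=None):
    """Buffered shuffle: collect up to buf_size samples, shuffle them, yield them."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for s in buf:
                    yield s
                buf = []
        if buf:  # flush the remainder
            rng.shuffle(buf)
            for s in buf:
                yield s

    return shuffled_reader


def batch(reader, batch_size):
    """Group consecutive samples into lists of batch_size (the last may be shorter)."""
    def batched_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:
            yield current

    return batched_reader


def column(minibatch, feeding, name):
    """Select the field that `feeding` maps to the data layer `name`."""
    return [sample[feeding[name]] for sample in minibatch]


def toy_reader():
    """Hypothetical reader creator yielding (word_id, label) tuples."""
    for i in range(25):
        yield (i, i % 3)


feeding = {'word_data': 0, 'target': 1}
batches = list(batch(shuffle(toy_reader, buf_size=8, seed=1), batch_size=10)())
print([len(b) for b in batches])  # [10, 10, 5]
print(column(batches[0], feeding, 'target'))
```

A larger `buf_size` mixes samples from further apart in the dataset at the cost of holding more of them in memory.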
--- a/machine_translation/README.en.md
+++ b/machine_translation/README.en.md
--- a/machine_translation/README.md
+++ b/machine_translation/README.md
 # Machine Translation
-The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background
@@ -152,54 +152,8 @@ e_{ij}&=align(z_i,h_j)\\\\
 ## Data Overview
-### Download and Extract
 This tutorial uses [bitexts(after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz) from the [WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/) dataset as the training set, and [dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz) as the test and generation sets.
-On Linux, simply run the following commands:
-```bash
-cd data
-./wmt14_data.sh
-```
-The resulting dataset `data/wmt14` contains the following three folders:
-<p align = "center">
-<table>
-<tr>
-<td>Folder</td>
-<td>French-English parallel corpus files</td>
-<td>Number of files</td>
-<td>Size</td>
-</tr>
-<tr>
-<td>train</td>
-<td>ccb2_pc30.src, ccb2_pc30.trg, etc</td>
-<td>12</td>
-<td>3.55G</td>
-</tr>
-<tr>
-<td>test</td>
-<td>ntst1213.src, ntst1213.trg</td>
-<td>2</td>
-<td>1636k</td>
-</tr>
-</tr>
-<tr>
-<td>gen</td>
-<td>ntst14.src, ntst14.trg</td>
-<td>2</td>
-<td>864k</td>
-</tr>
-</table>
-</p>
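- `XXX.src` is the source French file and `XXX.trg` is the target English file; each line of a file holds one sentence.
- `XXX.src` and `XXX.trg` have the same number of lines, and the sentences on line $i$ of the two files correspond one to one.
 ### Data Preprocessing
 Our preprocessing pipeline consists of two steps: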
@@ -220,6 +174,7 @@ cd data
 ```python
 # Load the paddle Python package
+import sys
 import paddle.v2 as paddle
 # Configure training to run on the CPU only, with a single trainer
@@ -256,17 +211,16 @@ wmt14_reader = paddle.batch(
   decoder_size = 512 # size of the GRU hidden layer in the decoder
  ```
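-2. Next, implement the encoder. This takes three steps:
+1. Next, implement the encoder. This takes three steps:
-   2.1 Convert the source-language sequence generated by the dataset reader, in which each word is represented by its index in the dictionary,
+   1. The input is a sequence of words, represented as a sequence of integers in which each element is the index of a word in the dictionary. We therefore define the data type of the data layer as `integer_value_sequence` (integer sequence), with each element in the range `[0, source_dict_dim)`.
-   into the source-language sequence $\mathbf{w}$ in one-hot vector representation, whose type is integer_value_sequence.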
   ```python
    src_word_id = paddle.layer.data(
        name='source_language_word',
        type=paddle.data_type.integer_value_sequence(source_dict_dim))
   ```
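-   2.2 Map the above encoding to word vectors $\mathbf{s}$ in a low-dimensional language space.
+   1. Map the above encoding to word vectors $\mathbf{s}$ in a low-dimensional language space.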
   ```python
    src_embedding = paddle.layer.embedding(
@@ -274,7 +228,7 @@ wmt14_reader = paddle.batch(
        size=word_vector_dim,
        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
   ```
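-   2.3 Encode the source-language sequence with a bidirectional GRU, and concatenate the outputs of the two GRUs to obtain $\mathbf{h}$.
+   1. Encode the source-language sequence with a bidirectional GRU, and concatenate the outputs of the two GRUs to obtain $\mathbf{h}$.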
   ```python
    src_forward = paddle.networks.simple_gru(
@@ -284,16 +238,17 @@ wmt14_reader = paddle.batch(
    encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
   ```
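-3. Then, define the attention-based decoder. This takes three steps:
+1. Then, define the attention-based decoder. This takes three steps:
-   3.1 Pass the encoded source-language sequence (see 2.3) through a feed-forward neural network to obtain its projection.
+   1. Pass the encoded source-language sequence (see 2.3) through a feed-forward neural network to obtain its projection.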
   ```python
    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
        encoded_proj += paddle.layer.full_matrix_projection(
            input=encoded_vector)
   ```
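-   3.2 Construct the initial state of the decoder RNN. The decoder predicts the target sequence step by step but has no initial value at time 0, so it must be initialized. Here, the last state of the reverse-order encoding of the source sequence is passed through a nonlinear mapping and used as the initial value, i.e., $c_0=h_T$.
+   1. Construct the initial state of the decoder RNN. The decoder predicts the target sequence step by step but has no initial value at time 0, so it must be initialized. Here, the last state of the reverse-order encoding of the source sequence is passed through a nonlinear mapping and used as the initial value, i.e., $c_0=h_T$.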
   ```python
    backward_first = paddle.layer.first_seq(input=src_backward)
@@ -302,15 +257,14 @@ wmt14_reader = paddle.batch(
        decoder_boot += paddle.layer.full_matrix_projection(
            input=backward_first)
   ```
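-   3.3 Define the RNN behavior at each decoding time step: from the source-language context vector $c_i$ at the current step, the decoder hidden state $z_i$, and the $i$-th target word $u_i$, predict the probability $p_{i+1}$ of the $(i+1)$-th word.
+   1. Define the RNN behavior at each decoding time step: from the source-language context vector $c_i$ at the current step, the decoder hidden state $z_i$, and the $i$-th target word $u_i$, predict the probability $p_{i+1}$ of the $(i+1)$-th word.
      - decoder_mem records the hidden state $z_i$ of the previous time step; its initial state is decoder_boot.
      - context calls `simple_attention` to implement $c_i=\sum_{j=1}^{T}a_{ij}h_j$, where enc_vec is $h_j$ and enc_proj is the projection of $h_j$ (see 3.1); the computation of the weights $a_{ij}$ is encapsulated in `simple_attention`.
      - decoder_inputs fuses the representations of $c_i$ and the current target word current_word (i.e., $u_i$).
      - gru_step calls `gru_step_layer` to apply the activation to decoder_inputs and decoder_mem, implementing $z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$.
      - Finally, softmax normalization computes the word probabilities and out is returned, implementing $p\left ( u_i|u_{&lt;i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$.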
   ```python
    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
@@ -340,24 +294,24 @@ wmt14_reader = paddle.batch(
            out += paddle.layer.full_matrix_projection(input=gru_step)
        return out
    ```
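-4. Differences between calling the decoder in training mode and in generation mode.
-   4.1 Define the name of the decoder group and the first two inputs of the `gru_decoder_with_attention` function. Note: these two inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.
+1. Define the name of the decoder group and the first two inputs of the `gru_decoder_with_attention` function. Note: these two inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.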
-   ```python
+    ```python
    decoder_group_name = "decoder_group"
    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
    group_inputs = [group_input1, group_input2]
-   ```
+    ```
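-   4.2 Calling the decoder in training mode:
-      - First, pass the target-language word embeddings trg_embedding directly to `gru_decoder_with_attention` as current_word in training mode.
+1. Calling the decoder in training mode:
-      - Second, use `recurrent_group` to call `gru_decoder_with_attention` recurrently.
-      - Next, use the next-word sequence of the target language as the label layer lbl, i.e., the target words to predict.
-      - Finally, compute the cost with the multi-class cross-entropy cost function `classification_cost`.
-   ```python
+   - First, pass the target-language word embeddings trg_embedding directly to `gru_decoder_with_attention` as current_word in training mode.
+   - Second, use `recurrent_group` to call `gru_decoder_with_attention` recurrently.
+   - Next, use the next-word sequence of the target language as the label layer lbl, i.e., the target words to predict.
+   - Finally, compute the cost with the multi-class cross-entropy cost function `classification_cost`.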
+    ```python
    trg_embedding = paddle.layer.embedding(
        input=paddle.layer.data(
            name='target_language_word',
@@ -380,7 +334,8 @@ wmt14_reader = paddle.batch(
        name='target_language_next_word',
        type=paddle.data_type.integer_value_sequence(target_dict_dim))
    cost = paddle.layer.classification_cost(input=decoder, label=lbl)
-   ```
+    ```
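The encoder steps and the decoder's initial state can be traced on a toy example. This minimal sketch replaces the GRUs with a scalar recurrence and the learned projections with fixed numbers (all hypothetical), so only the data flow is meaningful: words become integer ids in `[0, source_dict_dim)`, forward and backward passes are concatenated position by position into $h_j$, and the first step of the backward pass (what `first_seq` selects on `src_backward`) summarizes the whole sentence and goes through a nonlinearity to boot the decoder. The `<unk>` id and the tanh activation are assumptions.

```python
import math


def to_ids(sentence, word_dict, unk_id):
    """Map words to dictionary indices; unknown words map to unk_id."""
    return [word_dict.get(w, unk_id) for w in sentence.split()]


def toy_rnn(xs, reverse=False, decay=0.5):
    """Scalar stand-in for a GRU: h_t = tanh(x_t + decay * h_prev)."""
    steps = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    hs = [0.0] * len(xs)
    h = 0.0
    for t in steps:
        h = math.tanh(xs[t] + decay * h)
        hs[t] = h
    return hs


def encode(ids, embedding):
    """Bidirectional encoding; returns per-position [forward, backward] pairs."""
    xs = [embedding[i] for i in ids]
    forward = toy_rnn(xs)
    backward = toy_rnn(xs, reverse=True)
    encoded = [[f, b] for f, b in zip(forward, backward)]  # h_j = concat
    backward_first = backward[0]  # the backward pass finishes at position 0
    return encoded, backward_first


def decoder_boot(backward_first, weight=0.8, bias=0.1):
    """Nonlinear projection of the backward summary (tanh is an assumption)."""
    return math.tanh(weight * backward_first + bias)


word_dict = {"<unk>": 0, "le": 1, "chat": 2, "dort": 3}
embedding = [0.0, 0.5, -0.3, 0.9]  # one scalar per dictionary entry
ids = to_ids("le chat dort vite", word_dict, unk_id=0)
encoded, summary = encode(ids, embedding)
print(ids)  # [1, 2, 3, 0]
print(len(encoded), decoder_boot(summary))
```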
 Note: our configuration simplifies the model in Bahdanau's paper \[[4](#参考文献)\]; see [issue #1133](https://github.com/PaddlePaddle/Paddle/issues/1133) for details.
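The context vector $c_i=\sum_{j=1}^{T}a_{ij}h_j$ computed through `simple_attention` can be sketched in plain Python. This minimal version assumes an additive (Bahdanau-style) score $e_{ij}=v^\top\tanh(Uh_j+Wz_i)$ followed by a softmax over source positions, with `enc_proj` holding the precomputed $Uh_j$; the weights `w_state` and `v` are hypothetical, and the exact form inside `simple_attention` may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(enc_vec, enc_proj, decoder_state, w_state, v):
    """Return (c_i, weights) for one decoder step.

    enc_vec:       encoder vectors h_j
    enc_proj:      precomputed projections U h_j (same length as enc_vec)
    decoder_state: previous decoder state z_i
    w_state:       matrix W mapping z_i into the projection space
    v:             scoring vector
    """
    state_proj = [sum(w * z for w, z in zip(row, decoder_state)) for row in w_state]
    scores = []
    for proj in enc_proj:
        hidden = [math.tanh(p + s) for p, s in zip(proj, state_proj)]
        scores.append(sum(vk * hk for vk, hk in zip(v, hidden)))
    weights = softmax(scores)  # a_ij, sums to 1 over source positions j
    dim = len(enc_vec[0])
    context = [sum(a * h[k] for a, h in zip(weights, enc_vec)) for k in range(dim)]
    return context, weights


enc_vec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # h_j for 3 source words
enc_proj = [[0.2, -0.1], [0.9, 0.3], [0.0, 0.0]]  # hypothetical U h_j
context, weights = attention_context(
    enc_vec, enc_proj, decoder_state=[0.1, -0.2],
    w_state=[[1.0, 0.0], [0.0, 1.0]], v=[1.0, 1.0])
print(weights, context)
```

Because the weights form a probability distribution, $c_i$ is always a weighted average of the encoder vectors, re-weighted at every decoding step.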
 ### Parameter Definition
@@ -388,7 +343,6 @@ wmt14_reader = paddle.batch(
 First, create the model parameters from the `cost` defined in the model configuration.
 ```python
-# create parameters
 parameters = paddle.parameters.create(cost)
 ```
@@ -400,28 +354,36 @@ for param in parameters.keys():
 ```
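 ### Training the Model
 1. Build the trainer
    Build the trainer from the optimization objective cost, the network topology, and the model parameters. The optimization method must also be specified at construction time: the trainer is `paddle.trainer.SGD`, and the update rule passed as `update_equation` is the `Adam` optimizer.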
    ```python
-    optimizer = paddle.optimizer.Adam(learning_rate=1e-4)
+    optimizer = paddle.optimizer.Adam(
+        learning_rate=5e-5,
+        regularization=paddle.optimizer.L2Regularization(rate=1e-3))
    trainer = paddle.trainer.SGD(cost=cost,
                                 parameters=parameters,
                                 update_equation=optimizer)
-```
+    ```
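-2. Build the event_handler
+1. Build the event_handler
    A custom callback function can monitor various states during training, such as the error rate. The code below uses `event.batch_id % 10 == 0` to print a log line with the cost and other information every 10 batches, and prints a dot for each of the other batches.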
    ```python
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 10 == 0:
-                print "Pass %d, Batch %d, Cost %f, %s" % (
+                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
+            else:
+                sys.stdout.write('.')
+                sys.stdout.flush()
    ```
-3. Start training:
+1. Start training:
    ```python
    trainer.train(
@@ -430,30 +392,29 @@ for param in parameters.keys():
        num_passes=10000,
        feeding=feeding)
    ```
-    After training starts, the event_handler prints logs like the following:
-    ```text
+After training starts, the event_handler prints logs like the following:
-    Pass 0, Batch 0, Cost 247.408008, {'classification_error_evaluator': 1.0}
-    Pass 0, Batch 10, Cost 212.058789, {'classification_error_evaluator': 0.8737863898277283}
+```text
-    ...
+Pass 0, Batch 0, Cost 148.444983, {'classification_error_evaluator': 1.0}
+.........
+Pass 0, Batch 10, Cost 335.896802, {'classification_error_evaluator': 0.9325153231620789}
+.........
 ```
+When the value of `classification_error_evaluator` falls below 0.35, training has succeeded.
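The logging behavior of the `event_handler` can be tried without a trainer. The sketch below uses a hypothetical `EndIteration` stand-in carrying only the fields the handler reads (`pass_id`, `batch_id`, `cost`, `metrics`), writes to a stream instead of calling `print`, and adds a hypothetical helper applying the 0.35 `classification_error_evaluator` threshold. It imitates the tutorial's handler and is not PaddlePaddle's event API.

```python
import io
import sys


class EndIteration(object):
    """Hypothetical stand-in for paddle.event.EndIteration."""

    def __init__(self, pass_id, batch_id, cost, metrics):
        self.pass_id = pass_id
        self.batch_id = batch_id
        self.cost = cost
        self.metrics = metrics


def event_handler(event, out=sys.stdout):
    """Log every 10th batch in full and write a dot for every other batch."""
    if isinstance(event, EndIteration):
        if event.batch_id % 10 == 0:
            out.write("\nPass %d, Batch %d, Cost %f, %s\n" % (
                event.pass_id, event.batch_id, event.cost, event.metrics))
        else:
            out.write('.')
        out.flush()


def training_succeeded(metrics, threshold=0.35):
    """Hypothetical check for the tutorial's 0.35 error-rate rule of thumb."""
    return metrics['classification_error_evaluator'] < threshold


log = io.StringIO()
for batch_id in range(21):
    event_handler(EndIteration(0, batch_id, 100.0 - batch_id,
                               {'classification_error_evaluator': 0.9}), out=log)
print(log.getvalue())
```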
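 ## Using the Model
 ### Download the Pretrained Model
 Because training an NMT model is very time-consuming, we trained 16 passes in 5 days on a cluster of 50 physical nodes (each with two 6-core CPUs), with each pass taking 7 hours. We therefore provide a pretrained model (pass-00012) for direct download. The model is 205MB and has the highest [BLEU score](#BLEU评估), 26.92, of all 16 models. Download and extract the model with the following commands: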
 ```bash
 cd pretrained
 ./wmt14_model.sh
 ```
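-### Commands and Results
-The new API does not yet support the translation process for machine translation; stay tuned.
-See [Results](#效果展示) for the translation results.
 ### BLEU Evaluation <a name="BLEU评估"></a>
 BLEU (Bilingual Evaluation Understudy) is a widely used automatic evaluation metric for machine translation, proposed by IBM's Watson Research Center in 2002 \[[5](#参考文献)\]. Its basic premise is that the closer a machine translation is to a professional human translation, the better the translation system performs. The closeness between the machine output and the human reference translation is measured by precision: the number of matching n-grams between the two is counted, and the more matches, the higher the BLEU score.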
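The n-gram matching described above can be made concrete with a small sentence-level BLEU sketch: clipped (modified) n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty, following the BLEU definition. It is a minimal illustration, not the script used to obtain the 26.92 score: it scores a single sentence without smoothing and returns a value in [0, 1], whereas reported scores such as 26.92 are usually corpus-level values scaled by 100. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Return (clipped n-gram matches, total candidate n-grams)."""
    cand = ngram_counts(candidate, n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped, sum(cand.values())


def sentence_bleu(candidate, references, max_n=4):
    """Geometric mean of modified precisions times the brevity penalty."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        clipped, total = modified_precision(candidate, references, n)
        if clipped == 0:  # no smoothing: any zero precision gives BLEU = 0
            return 0.0
        log_precision += math.log(float(clipped) / total) / max_n
    c = len(candidate)
    # Effective reference length: the one closest to c (shorter wins ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - float(r) / c)
    return brevity_penalty * math.exp(log_precision)


ref = "the cat is on the mat".split()
print(modified_precision("the the the the the the the".split(), [ref], 1))  # (2, 7)
print(sentence_bleu("the cat is on the mat".split(), [ref]))  # 1.0
print(sentence_bleu("the cat".split(), [ref], max_n=2))  # about 0.1353, i.e. exp(-2)
```

Clipping stops a candidate from earning credit by repeating a common word, and the brevity penalty stops it from scoring well by being very short.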

--- a/machine_translation/api_train.py
+++ b/machine_translation/api_train.py
@@ -105,9 +105,8 @@ def main():
    # define optimize method and trainer
    optimizer = paddle.optimizer.Adam(learning_rate=1e-4)
-    trainer = paddle.trainer.SGD(cost=cost,
+    trainer = paddle.trainer.SGD(
-                                 parameters=parameters,
+        cost=cost, parameters=parameters, update_equation=optimizer)
-                                 update_equation=optimizer)
    # define data reader
    feeding = {

--- a/machine_translation/index.en.html
+++ b/machine_translation/index.en.html
--- a/machine_translation/index.html
+++ b/machine_translation/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # Machine Translation
-The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background
@@ -194,54 +194,8 @@ e_{ij}&=align(z_i,h_j)\\\\
 ## Data Overview
-### Download and Extract
 This tutorial uses [bitexts(after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz) from the [WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/) dataset as the training set, and [dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz) as the test and generation sets.
-On Linux, simply run the following commands:
-```bash
-cd data
-./wmt14_data.sh
-```
-The resulting dataset `data/wmt14` contains the following three folders:
-<p align = "center">
-<table>
-<tr>
-<td>Folder</td>
-<td>French-English parallel corpus files</td>
-<td>Number of files</td>
-<td>Size</td>
-</tr>
-<tr>
-<td>train</td>
-<td>ccb2_pc30.src, ccb2_pc30.trg, etc</td>
-<td>12</td>
-<td>3.55G</td>
-</tr>
-<tr>
-<td>test</td>
-<td>ntst1213.src, ntst1213.trg</td>
-<td>2</td>
-<td>1636k</td>
-</tr>
-</tr>
-<tr>
-<td>gen</td>
-<td>ntst14.src, ntst14.trg</td>
-<td>2</td>
-<td>864k</td>
-</tr>
-</table>
-</p>
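- `XXX.src` is the source French file and `XXX.trg` is the target English file; each line of a file holds one sentence.
- `XXX.src` and `XXX.trg` have the same number of lines, and the sentences on line $i$ of the two files correspond one to one.
 ### Data Preprocessing
 Our preprocessing pipeline consists of two steps: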
@@ -262,6 +216,7 @@ cd data
 ```python
 # Load the paddle Python package
+import sys
 import paddle.v2 as paddle
 # Configure training to run on the CPU only, with a single trainer
@@ -298,17 +253,16 @@ wmt14_reader = paddle.batch(
   decoder_size = 512 # size of the GRU hidden layer in the decoder
  ```
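-2. Next, implement the encoder. This takes three steps:
+1. Next, implement the encoder. This takes three steps:
-   2.1 Convert the source-language sequence generated by the dataset reader, in which each word is represented by its index in the dictionary,
+   1. The input is a sequence of words, represented as a sequence of integers in which each element is the index of a word in the dictionary. We therefore define the data type of the data layer as `integer_value_sequence` (integer sequence), with each element in the range `[0, source_dict_dim)`.
-   into the source-language sequence $\mathbf{w}$ in one-hot vector representation, whose type is integer_value_sequence.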
   ```python
    src_word_id = paddle.layer.data(
        name='source_language_word',
        type=paddle.data_type.integer_value_sequence(source_dict_dim))
   ```
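-   2.2 Map the above encoding to word vectors $\mathbf{s}$ in a low-dimensional language space.
+   1. Map the above encoding to word vectors $\mathbf{s}$ in a low-dimensional language space.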
   ```python
    src_embedding = paddle.layer.embedding(
@@ -316,7 +270,7 @@ wmt14_reader = paddle.batch(
        size=word_vector_dim,
        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
   ```
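-   2.3 Encode the source-language sequence with a bidirectional GRU, and concatenate the outputs of the two GRUs to obtain $\mathbf{h}$.
+   1. Encode the source-language sequence with a bidirectional GRU, and concatenate the outputs of the two GRUs to obtain $\mathbf{h}$.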
   ```python
    src_forward = paddle.networks.simple_gru(
@@ -326,16 +280,17 @@ wmt14_reader = paddle.batch(
    encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
   ```
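-3. Then, define the attention-based decoder. This takes three steps:
+1. Then, define the attention-based decoder. This takes three steps:
-   3.1 Pass the encoded source-language sequence (see 2.3) through a feed-forward neural network to obtain its projection.
+   1. Pass the encoded source-language sequence (see 2.3) through a feed-forward neural network to obtain its projection.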
   ```python
    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
        encoded_proj += paddle.layer.full_matrix_projection(
            input=encoded_vector)
   ```
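-   3.2 Build the initial state of the decoder RNN. The decoder predicts the target sequence step by step but has no state at time 0, so it must be initialized. Here we apply a nonlinear mapping to the last state of the reverse-order (backward) encoding of the source sequence, which `first_seq` takes as the first element of `src_backward`, and use the result as the initial value, i.e. $c_0=h_T$.
+   1. Build the initial state of the decoder RNN. The decoder predicts the target sequence step by step but has no state at time 0, so it must be initialized. Here we apply a nonlinear mapping to the last state of the reverse-order (backward) encoding of the source sequence, which `first_seq` takes as the first element of `src_backward`, and use the result as the initial value, i.e. $c_0=h_T$.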
   ```python
    backward_first = paddle.layer.first_seq(input=src_backward)
@@ -344,15 +299,14 @@ wmt14_reader = paddle.batch(
        decoder_boot += paddle.layer.full_matrix_projection(
            input=backward_first)
   ```
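-   3.3 Define the RNN behavior at each decoding time step: given the current source context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th target word $u_i$, predict the probability $p_{i+1}$ of the $(i+1)$-th word.
+   1. Define the RNN behavior at each decoding time step: given the current source context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th target word $u_i$, predict the probability $p_{i+1}$ of the $(i+1)$-th word.
      - decoder_mem records the hidden state $z_i$ of the previous time step; its initial state is decoder_boot.
      - context calls `simple_attention` to implement $c_i=\sum_{j=1}^{T}a_{ij}h_j$, where enc_vec is $h_j$ and enc_proj is the projection of $h_j$ (from the feed-forward step above); the computation of the weights $a_{ij}$ is encapsulated in `simple_attention`.
      - decoder_inputs combines $c_i$ with the representation of the current target word current_word (i.e. $u_i$).
      - gru_step calls `gru_step_layer` to apply the GRU update to decoder_inputs and decoder_mem, implementing $z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$.
      - Finally, softmax normalization computes the word probabilities and out is returned, implementing $p\left ( u_i|u_{&lt;i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$.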
   ```python
    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
@@ -382,24 +336,24 @@ wmt14_reader = paddle.batch(
            out += paddle.layer.full_matrix_projection(input=gru_step)
        return out
    ```
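-4. Differences between invoking the decoder in training mode and in generation mode.
-   4.1 Define the name of the decoder group and the first two inputs of `gru_decoder_with_attention`. Note: these two inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.
+1. Define the name of the decoder group and the first two inputs of `gru_decoder_with_attention`. Note: these two inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.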
-   ```python
+    ```python
    decoder_group_name = "decoder_group"
    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
    group_inputs = [group_input1, group_input2]
-   ```
+    ```
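-   4.2 Invoking the decoder in training mode:
-      - First, pass the word embeddings of the target sequence, trg_embedding, directly to `gru_decoder_with_attention` as current_word in training mode.
+1. Invoking the decoder in training mode:
-      - Next, use `recurrent_group` to call `gru_decoder_with_attention` repeatedly.
-      - Then, use the sequence of next target words as the label layer lbl, i.e. the target words to predict.
-      - Finally, compute the loss with the multi-class cross-entropy cost `classification_cost`.
-   ```python
+   - First, pass the word embeddings of the target sequence, trg_embedding, directly to `gru_decoder_with_attention` as current_word in training mode.
+   - Next, use `recurrent_group` to call `gru_decoder_with_attention` repeatedly.
+   - Then, use the sequence of next target words as the label layer lbl, i.e. the target words to predict.
+   - Finally, compute the loss with the multi-class cross-entropy cost `classification_cost`.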
+    ```python
    trg_embedding = paddle.layer.embedding(
        input=paddle.layer.data(
            name='target_language_word',
@@ -422,7 +376,8 @@ wmt14_reader = paddle.batch(
        name='target_language_next_word',
        type=paddle.data_type.integer_value_sequence(target_dict_dim))
    cost = paddle.layer.classification_cost(input=decoder, label=lbl)
-   ```
+    ```
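The context vector $c_i=\sum_{j=1}^{T}a_{ij}h_j$ that `simple_attention` produces can be illustrated outside PaddlePaddle with a small NumPy sketch. The sketch assumes the additive scoring function of Bahdanau et al., $e_{ij}=v_a^\top\tanh(W_a z_i + U_a h_j)$, followed by a softmax over $j$. The function `additive_attention` and its parameters `Wa`, `Ua`, `va` are hypothetical, and the exact computation inside `simple_attention` may differ. The term $U_a h_j$ does not depend on the decoder state. That fits the configuration above, which computes `encoded_proj` once, outside the decoder group.

```python
import numpy as np

def softmax(e):
    """Softmax over a 1-D array of scores."""
    e = np.exp(e - e.max())  # shift by the max for numerical stability
    return e / e.sum()

def additive_attention(z_i, h, Wa, Ua, va):
    """Context vector c_i = sum_j a_ij * h_j with additive (Bahdanau-style) scores.

    z_i: decoder hidden state, shape (n,)
    h:   encoder states h_1..h_T, shape (T, m)
    Wa (k, n), Ua (k, m), va (k,): hypothetical alignment-model parameters
    """
    proj = h @ Ua.T                       # U_a h_j for every j; independent of z_i
    e = np.tanh(proj + Wa @ z_i) @ va     # scores e_ij, shape (T,)
    a = softmax(e)                        # attention weights a_ij, summing to 1
    c_i = a @ h                           # weighted sum of encoder states, shape (m,)
    return c_i, a

# Toy usage with random values: T=6 source positions, encoder size m=8.
rng = np.random.default_rng(0)
T, m, n, k = 6, 8, 4, 5
h = rng.normal(size=(T, m))
c, a = additive_attention(rng.normal(size=n), h,
                          rng.normal(size=(k, n)), rng.normal(size=(k, m)),
                          rng.normal(size=k))
```
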
 Note: our configuration simplifies the model in Bahdanau's paper \[[4](#参考文献)\] in a few places; see [issue #1133](https://github.com/PaddlePaddle/Paddle/issues/1133).
 ### Parameter Definition
@@ -430,7 +385,6 @@ wmt14_reader = paddle.batch(
 First, create the model parameters from the `cost` defined in the model configuration.
 ```python
-# create parameters
 parameters = paddle.parameters.create(cost)
 ```
@@ -442,28 +396,36 @@ for param in parameters.keys():
 ```
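 ### Model Training
 1. Build the trainer
    Build a trainer from the optimization target cost, the network topology, and the model parameters. An optimization method must also be specified at construction time; here we use the Adam optimizer, passed to `paddle.trainer.SGD` as `update_equation`.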
    ```python
-    optimizer = paddle.optimizer.Adam(learning_rate=1e-4)
+    optimizer = paddle.optimizer.Adam(
+        learning_rate=5e-5,
+        regularization=paddle.optimizer.L2Regularization(rate=1e-3))
    trainer = paddle.trainer.SGD(cost=cost,
                                 parameters=parameters,
                                 update_equation=optimizer)
-```
+    ```
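-2. Build the event_handler
+1. Build the event_handler
    A custom callback can monitor the state of training, such as the error rate. The code below uses `event.batch_id % 10 == 0` to print a log line with the cost and other metrics every 10 batches.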
    ```python
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 10 == 0:
-                print "Pass %d, Batch %d, Cost %f, %s" % (
+                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
+            else:
+                sys.stdout.write('.')
+                sys.stdout.flush()
    ```
-3. Start training:
+1. Start training:
    ```python
    trainer.train(
@@ -472,30 +434,29 @@ for param in parameters.keys():
        num_passes=10000,
        feeding=feeding)
    ```
-    Once training starts, event_handler prints logs like the following:
-    ```text
+Once training starts, event_handler prints logs like the following:
-    Pass 0, Batch 0, Cost 247.408008, {'classification_error_evaluator': 1.0}
-    Pass 0, Batch 10, Cost 212.058789, {'classification_error_evaluator': 0.8737863898277283}
+```text
-    ...
+Pass 0, Batch 0, Cost 148.444983, {'classification_error_evaluator': 1.0}
+.........
+Pass 0, Batch 10, Cost 335.896802, {'classification_error_evaluator': 0.9325153231620789}
+.........
 ```
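+    Training can be considered successful once `classification_error_evaluator` falls below 0.35.
 ## Using the Model
 ### Download the Pretrained Model
 Training an NMT model is very time-consuming: on a cluster of 50 physical nodes (each with two 6-core CPUs), training 16 passes took 5 days, about 7 hours per pass. We therefore provide a pretrained model (pass-00012) for direct download. The model is 205MB and has the highest [BLEU score](#bleu-evaluation) of all 16 models, 26.92. Download and extract it with:
 ```bash
 cd pretrained
 ./wmt14_model.sh
 ```
-### Commands and Results
-The new API does not yet support the translation (generation) process for machine translation; stay tuned.
-See [Results](#效果展示) for translation results.
 ### BLEU Evaluation
 BLEU (Bilingual Evaluation Understudy) is a widely used automatic evaluation metric for machine translation, proposed by IBM's Watson Research Center in 2002 \[[5](#参考文献)\]. Its premise is that the closer a machine translation is to a professional human translation, the better the translation system. The closeness between a machine translation and the human reference translations is measured with precision: count how many n-grams the two share; the more matches, the higher the BLEU score.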
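To make the n-gram matching concrete, here is a minimal sentence-level BLEU sketch in plain Python. It assumes one reference, uniform weights over 1- to 4-grams, clipped n-gram counts, and the brevity penalty from the BLEU paper. It is illustrative only: it applies no smoothing and scores a single sentence pair, whereas standard BLEU aggregates n-gram counts over the whole test corpus. The helper names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[g]) for g, count in cand.items())
    return float(matched) / total

def sentence_bleu(candidate, reference, max_n=4):
    """BLEU for one candidate and one reference, uniform weights, no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is 0 when any precision is 0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - float(r) / c)  # brevity penalty
    return bp * math.exp(log_mean)

reference = "the cat is on the mat".split()
print(modified_precision("the the the the".split(), reference, 1))  # 0.5
print(sentence_bleu(reference, reference))                          # 1.0
```

Clipping is what keeps a degenerate output such as "the the the the" from scoring a perfect unigram precision.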

--- a/machine_translation/seqToseq_net.py
+++ b/machine_translation/seqToseq_net.py
@@ -110,8 +110,7 @@ group_inputs = [group_input1, group_input2]
 if not is_generating:
    trg_embedding = embedding_layer(
-        input=data_layer(
+        input=data_layer(name='target_language_word', size=target_dict_dim),
-            name='target_language_word', size=target_dict_dim),
        size=word_vector_dim,
        param_attr=ParamAttr(name='_target_language_embedding'))
    group_inputs.append(trg_embedding)
@@ -156,8 +155,7 @@ else:
    seqtext_printer_evaluator(
        input=beam_gen,
-        id_input=data_layer(
+        id_input=data_layer(name="sent_id", size=1),
-            name="sent_id", size=1),
        dict_file=trg_lang_dict,
        result_file=gen_trans_file)
    outputs(beam_gen)
--- a/recognize_digits/README.en.md
+++ b/recognize_digits/README.en.md
 # Recognize Digits
-The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Introduction
 When we learn a new programming language, the first task is usually to write a program that prints "Hello World." In Machine Learning or Deep Learning, the equivalent task is to train a model to perform handwritten digit recognition with [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a typical image classification problem. The problem is relatively easy, and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). The input image is a 28x28 matrix, and the label is one of the digits from 0 to 9. Each image is normalized in size and centered.
@@ -32,15 +32,15 @@ In a simple softmax regression model, the input is fed to fully connected layers
 Input $X$ is multiplied with weights $W$, and bias $b$ is added to generate activations.
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
-where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 For an $N$-class classification problem with $N$ output nodes, softmax normalizes an $N$-dimensional vector into $N$ real values in the range [0, 1], each representing the probability that the sample belongs to the corresponding class. Here $y_i$ is the predicted probability that the image is digit $i$.
 In such a classification problem, we usually use the cross entropy loss function:
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
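The two formulas above can be checked numerically with a short NumPy sketch. It is an illustration only, not PaddlePaddle code. The input size 784 = 28x28 and the 10 classes follow the MNIST setup, and the random weights are placeholders. Subtracting the maximum before exponentiating leaves the softmax output unchanged but avoids overflow. The small `eps` is an assumption added to keep `log` finite.

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j), over the last axis."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))  # max shift avoids overflow
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(label, y, eps=1e-12):
    """-sum_i label_i * log(y_i); eps keeps log finite if some y_i is 0."""
    return -np.sum(label * np.log(y + eps), axis=-1)

rng = np.random.default_rng(0)
x = rng.random(784)                    # a flattened 28x28 image (random placeholder)
W = rng.normal(0.0, 0.01, (10, 784))   # W_{i,j}: 10 classes x 784 inputs
b = np.zeros(10)                       # b_i
y = softmax(W @ x + b)                 # y_i: predicted probability of digit i
label = np.eye(10)[3]                  # one-hot label for digit 3
loss = cross_entropy(label, y)         # equals -log(y[3]) for a one-hot label
```
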
 Fig. 2 shows a softmax regression network, with weights in blue and biases in red; +1 indicates that the coefficient of the bias is 1.
@@ -55,7 +55,7 @@ The Softmax regression model described above uses the simplest two-layer neural
 1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function. Some common ones are sigmoid, tanh and ReLU.
 2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
-3.  Finally, after output layer, we get $Y=softmax(W_3H_2 + b_3)$, the final classification result vector.
+3.  Finally, after output layer, we get $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.
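A minimal NumPy sketch of this forward pass, using ReLU as $\phi$. The hidden sizes 128 and 64 are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def mlp_forward(X, params, phi=relu):
    """Forward pass of a two-hidden-layer perceptron with a softmax output."""
    W1, b1, W2, b2, W3, b3 = params
    H1 = phi(W1 @ X + b1)          # H_1 = phi(W_1 X + b_1)
    H2 = phi(W2 @ H1 + b2)         # H_2 = phi(W_2 H_1 + b_2)
    return softmax(W3 @ H2 + b3)   # Y = softmax(W_3 H_2 + b_3)

# Hypothetical layer sizes: 784 inputs -> 128 -> 64 -> 10 classes.
rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
params = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    params += [rng.normal(0.0, 0.01, (n_out, n_in)), np.zeros(n_out)]

Y = mlp_forward(rng.random(784), params)  # probabilities for digits 0-9
```
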
 Fig. 3 shows a Multilayer Perceptron network, with weights in blue and biases in red; +1 indicates that the coefficient of the bias is 1.
@@ -70,7 +70,7 @@ Fig. 3. Multilayer Perceptron network architecture<br/>
 #### Convolutional Layer
 <p align="center">
-<img src="image/conv_layer_en.png" width=500><br/>
+<img src="image/conv_layer.png" width='750'><br/>
 Fig. 4. Convolutional layer<br/>
 </p>
@@ -240,7 +240,7 @@ def event_handler(event):
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(reader=paddle.reader.batched(
+        result = trainer.test(reader=paddle.batch(
            paddle.dataset.mnist.test(), batch_size=128))
        print "Test with Pass %d, Cost %f, %s\n" % (
            event.pass_id, result.cost, result.metrics)
@@ -248,7 +248,7 @@ def event_handler(event):
                      result.metrics['classification_error_evaluator']))
 trainer.train(
-    reader=paddle.reader.batched(
+    reader=paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.mnist.train(), buf_size=8192),
        batch_size=128),
@@ -293,7 +293,7 @@ This tutorial describes a few basic Deep Learning models viz. Softmax regression
 7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
 8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
 9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
-10. Bishop, Christopher M. ["Pattern recognition."](http://s3.amazonaws.com/academia.edu.documents/30428242/bg0137.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1484816640&Signature=85Ad6%2Fca8T82pmHzxaSXermovIA%3D&response-content-disposition=inline%3B%20filename%3DPattern_recognition_and_machine_learning.pdf) Machine Learning 128 (2006): 1-58.
+10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This book</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and uses <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Shared knowledge signature - non commercial use-Sharing 4.0 International Licensing Protocal</a>.
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
--- a/recognize_digits/README.md
+++ b/recognize_digits/README.md
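 # Recognize Digits
-The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time users, please refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time users, please refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Introduction
 When learning to program, the first program we write usually prints "Hello World". Likewise, the introductory tutorial for machine learning (or deep learning) is usually handwritten digit recognition on the [MNIST](http://yann.lecun.com/exdb/mnist/) database. Handwriting recognition is a typical image classification problem that is relatively simple, and the MNIST dataset is complete. As a simple computer vision dataset, MNIST contains handwritten digit images, as shown in Fig. 1, together with their labels. Each image is a 28x28 pixel matrix, and each label is one of the 10 digits 0 to 9. Every image has been size-normalized and centered.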
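@@ -32,15 +32,15 @@ Yann LeCun did a lot of early research on handwritten character recognition, and in the course of this research
 The input-layer data $X$ is passed to the output layer; before activation, it is multiplied by the corresponding weights $W$ and the bias $b$ is added, as follows: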
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
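-where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 For a multi-class classification problem with $N$ classes, we use $N$ output nodes: softmax normalizes the $N$-dimensional input features into $N$ real values in the range [0, 1], representing the probabilities that the sample belongs to each of the $N$ classes. Here $y_i$ is the predicted probability that the image is digit $i$.
 In classification problems, we usually use the cross-entropy loss function: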
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
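+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 Fig. 2 shows the softmax regression network: weights are drawn in blue, biases in red, and +1 indicates that the coefficient of the bias is 1.
@@ -55,7 +55,7 @@ The softmax regression model uses the simplest two-layer neural network, i.e. only an input layer
 1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function; common choices are sigmoid, tanh, and ReLU.
 2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
-3.  Finally, after the output layer, we get $Y=softmax(W_3H_2 + b_3)$, the final classification result vector.
+3.  Finally, after the output layer, we get $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.
 Fig. 3 shows the multilayer perceptron network: weights are drawn in blue, biases in red, and +1 indicates that the coefficient of the bias is 1.
@@ -67,11 +67,11 @@ The softmax regression model uses the simplest two-layer neural network, i.e. only an input layer
 ### Convolutional Neural Network (CNN)
-A multilayer perceptron flattens the image into a one-dimensional vector before feeding it to the network, which ignores the image's spatial and structural information; a convolutional neural network makes better use of that structure. [LeNet-5](http://yann.lecun.com/exdb/lenet/) is a relatively simple convolutional neural network. Fig. 6 shows its structure: the 2D input image passes twice through a convolutional layer followed by a pooling layer, then through a fully connected layer, and finally through a softmax output layer. Below we focus on the convolutional and pooling layers.
+A multilayer perceptron flattens the image into a one-dimensional vector before feeding it to the network, which ignores the image's spatial and structural information; a convolutional neural network makes better use of that structure. [LeNet-5](http://yann.lecun.com/exdb/lenet/) is a relatively simple convolutional neural network. Fig. 4 shows its structure: the 2D input image passes twice through a convolutional layer followed by a pooling layer, then through a fully connected layer, and finally through a softmax output layer. Below we focus on the convolutional and pooling layers.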
 <p align="center">
 <img src="image/cnn.png"><br/>
-Fig. 6. LeNet-5 convolutional neural network architecture<br/>
+Fig. 4. LeNet-5 convolutional neural network architecture<br/>
 </p>
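 #### Convolutional Layer
@@ -79,17 +79,11 @@ The softmax regression model uses the simplest two-layer neural network, i.e. only an input layer
 The convolutional layer is the core building block of a convolutional neural network. In image recognition, convolution means 2D convolution: a discrete 2D filter (also called a convolution kernel) is convolved with a 2D image. Put simply, the filter slides over every position of the image and, at each position, computes the inner product with the pixel and its neighborhood. Convolution is widely used in image processing, and different kernels extract different features, such as edges, lines, and corners. In deep convolutional neural networks, convolution extracts features ranging from low-level to complex ones.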
 <p align="center">
-<img src="image/conv_layer.png"><br/>
+<img src="image/conv_layer.png" width='750'><br/>
-Fig. 4. Convolutional layer<br/>
+Fig. 5. Convolutional layer<br/>
 </p>
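-Fig. 4 shows an example of the convolution computation. The input image has size $H=5,W=5,D=3$, i.e. a $5 \times 5$ color image with 3 channels (RGB, also called depth). The example contains two groups of kernels (the number of groups is denoted $K$), namely the filters $W_0$ and $W_1$ in the figure. Convolution usually applies a different kernel to each input channel, so in the example each group contains $D=3$ kernels of size $3 \times 3$ (denoted $F \times F$). The kernels slide with a stride of 2 (denoted $S$) in both the horizontal ($W$) and vertical ($H$) directions, and the input is padded with 1 zero (denoted $P$) on each side: the blue part of the input layer is the original data, and the gray part is the zero padding of size 1. The convolution produces an output of size $3 \times 3 \times 2$ (denoted $H_{o} \times W_{o} \times K$), i.e. a 2-channel feature map of size $3 \times 3$, where $H_o = (H - F + 2 \times P)/S + 1$, and likewise for $W_o$. Each pixel of the output feature map is the sum of the inner products between a group of filters and the corresponding input feature maps, plus a bias $b_o$; the bias is usually shared by all pixels of an output feature map. For example, the first value $2$ in the output feature map $o[:,:,0]$ is computed as follows:
+Fig. 5 shows an example of the convolution computation. The input image has size $H=5,W=5,D=3$, i.e. a $5 \times 5$ color image with 3 channels (RGB, also called depth). The example contains two groups of kernels (the number of groups is denoted $K$), namely the filters $W_0$ and $W_1$ in the figure. Convolution usually applies a different kernel to each input channel, so in the example each group contains $D=3$ kernels of size $3 \times 3$ (denoted $F \times F$). The kernels slide with a stride of 2 (denoted $S$) in both the horizontal ($W$) and vertical ($H$) directions, and the input is padded with 1 zero (denoted $P$) on each side: the blue part of the input layer is the original data, and the gray part is the zero padding of size 1. The convolution produces an output of size $3 \times 3 \times 2$ (denoted $H_{o} \times W_{o} \times K$), i.e. a 2-channel feature map of size $3 \times 3$, where $H_o = (H - F + 2 \times P)/S + 1$, and likewise for $W_o$. Each pixel of the output feature map is the sum of the inner products between a group of filters and the corresponding input feature maps, plus a bias $b_o$; the bias is usually shared by all pixels of an output feature map. The computation of the last value, $-2$, in the output feature map $o[:,:,0]$ is shown by the formula in the lower-right corner of Fig. 5.
-$$ o[0,0,0] = \sum x[0:3,0:3,0] * w_{0}[:,:,0]  + \sum x[0:3,0:3,1] * w_{0}[:,:,1]  +  \sum x[0:3,0:3,2] * w_{0}[:,:,2] + b_0 = 2 $$
-$$ \sum x[0:3,0:3,0] * w_{0}[:,:,0] = 0*1 + 0*1 + 0*1 + 0*1 + 1*1 + 2*(-1) + 0*(-1) + 0*1 + 0*(-1) = -1 $$
-$$ \sum x[0:3,0:3,1] * w_{0}[:,:,1] = 0*0 + 0*1 + 0*1 + 0*(-1) + 0*0 + 1*1 + 0*1 + 2*0 + 1*1 = 2 $$
-$$ \sum x[0:3,0:3,2] * w_{0}[:,:,2] = 0*(-1) + 0*1 + 0*(-1) + 0*0 + 1*1 + 1*0 + 0*(-1) + 1*0 + 1*(-1) = 0 $$
-$$ b_0 = 1 $$
 In convolution, the kernels are learnable parameters. As the example shows, each convolutional layer has $D \times F \times F \times K$ weights (plus $K$ biases). In a multilayer perceptron, neurons are usually fully connected, so there are many parameters; a convolutional layer has far fewer, which follows from its two main properties: local connectivity and weight sharing.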
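The following NumPy sketch implements the convolution described above with explicit loops. Like CNN libraries, it computes a cross-correlation and does not flip the kernel. It checks the example's numbers: with $H=W=5$, $D=3$, $K=2$, $F=3$, $S=2$, $P=1$, the output is $3 \times 3 \times 2$ and the layer has $D \times F \times F \times K = 54$ weights. The function name `conv2d` and the `(K, F, F, D)` kernel layout are choices made for this sketch. The random input and kernels are placeholders, not the values in the figure.

```python
import numpy as np

def conv2d(x, w, b, stride=1, pad=0):
    """Naive multi-channel 2-D convolution (cross-correlation, as used in CNNs).

    x: input of shape (H, W, D)
    w: kernels of shape (K, F, F, D): K groups, one F x F kernel per input channel
    b: biases of shape (K,): one bias shared by every pixel of an output map
    """
    H, W, D = x.shape
    K, F = w.shape[0], w.shape[1]
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="constant")  # zero padding
    Ho = (H - F + 2 * pad) // stride + 1   # H_o = (H - F + 2P) / S + 1
    Wo = (W - F + 2 * pad) // stride + 1
    out = np.zeros((Ho, Wo, K))
    for k in range(K):
        for i in range(Ho):
            for j in range(Wo):
                patch = xp[i * stride:i * stride + F, j * stride:j * stride + F, :]
                # inner products over all D channels, summed, plus the bias
                out[i, j, k] = np.sum(patch * w[k]) + b[k]
    return out

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(5, 5, 3)).astype(float)      # H=5, W=5, D=3
w = rng.integers(-1, 2, size=(2, 3, 3, 3)).astype(float)  # K=2, F=3
b = np.array([1.0, 0.0])
o = conv2d(x, w, b, stride=2, pad=1)
print(o.shape)  # (3, 3, 2)
print(w.size)   # 54 = D * F * F * K weights
```

Real frameworks vectorize this computation (for example with im2col plus a matrix multiply). The explicit loops here simply mirror the description.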
@@ -103,10 +97,10 @@ $$ b_0 = 1 $$
 <p align="center">
 <img src="image/max_pooling.png" width="400px"><br/>
-Fig. 5. Pooling layer<br/>
+Fig. 6. Pooling layer<br/>
 </p>
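-Pooling is a form of nonlinear downsampling. Its main purpose is to reduce computation by shrinking the feature maps, and with them the number of parameters in later layers; it also helps control overfitting to some extent. A pooling layer is usually added after a convolutional layer. Common types include max pooling and average pooling. Max pooling divides the input into non-overlapping rectangular regions and outputs the maximum value of each region, as shown in Fig. 5.
+Pooling is a form of nonlinear downsampling. Its main purpose is to reduce computation by shrinking the feature maps, and with them the number of parameters in later layers; it also helps control overfitting to some extent. A pooling layer is usually added after a convolutional layer. Common types include max pooling and average pooling. Max pooling divides the input into non-overlapping rectangular regions and outputs the maximum value of each region, as shown in Fig. 6.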
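A short NumPy sketch of max pooling with non-overlapping $2 \times 2$ windows (stride 2). It assumes a single feature map whose height and width are divisible by the window size. The input values are made up for illustration.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max pooling over non-overlapping size x size windows (stride = size).

    x: one feature map of shape (H, W); H and W must be divisible by size.
    """
    H, W = x.shape
    blocks = x.reshape(H // size, size, W // size, size)  # blocks[a, :, c, :] is one window
    return blocks.max(axis=(1, 3))                         # maximum of each window

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool2d(x))
# [[6 8]
#  [3 4]]
```
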
 For more details on convolutional neural networks, see [Stanford's CS231n course notes](http://cs231n.github.io/convolutional-networks/) and the [Image Classification](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md) tutorial.
@@ -251,7 +245,7 @@ def event_handler(event):
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(reader=paddle.reader.batched(
+        result = trainer.test(reader=paddle.batch(
            paddle.dataset.mnist.test(), batch_size=128))
        print "Test with Pass %d, Cost %f, %s\n" % (
            event.pass_id, result.cost, result.metrics)
@@ -259,7 +253,7 @@ def event_handler(event):
                      result.metrics['classification_error_evaluator']))
 trainer.train(
-    reader=paddle.reader.batched(
+    reader=paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.mnist.train(), buf_size=8192),
        batch_size=128),
@@ -295,7 +289,7 @@ trainer.train(
 7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
 8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
 9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
-10. Bishop, Christopher M. ["Pattern recognition."](http://s3.amazonaws.com/academia.edu.documents/30428242/bg0137.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1484816640&Signature=85Ad6%2Fca8T82pmHzxaSXermovIA%3D&response-content-disposition=inline%3B%20filename%3DPattern_recognition_and_machine_learning.pdf) Machine Learning 128 (2006): 1-58.
+10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.
 <br/>
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
--- a/recognize_digits/image/conv_layer.png
+++ b/recognize_digits/image/conv_layer.png
--- a/recognize_digits/image/conv_layer_en.png
+++ b/recognize_digits/image/conv_layer_en.png
--- a/recognize_digits/index.en.html
+++ b/recognize_digits/index.en.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # Recognize Digits
-The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Introduction
 When we learn a new programming language, the first task is usually to write a program that prints "Hello World." In Machine Learning or Deep Learning, the equivalent task is to train a model to perform handwritten digit recognition with [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a typical image classification problem. The problem is relatively easy, and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). The input image is a 28x28 matrix, and the label is one of the digits from 0 to 9. Each image is normalized in size and centered.
@@ -74,15 +74,15 @@ In a simple softmax regression model, the input is fed to fully connected layers
 Input $X$ is multiplied with weights $W$, and bias $b$ is added to generate activations.
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
-where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 For an $N$-class classification problem with $N$ output nodes, softmax normalizes an $N$-dimensional vector into $N$ real values in the range [0, 1], each representing the probability that the sample belongs to the corresponding class. Here $y_i$ is the predicted probability that the image is digit $i$.
 In such a classification problem, we usually use the cross entropy loss function:
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 Fig. 2 shows a softmax regression network, with weights in blue and biases in red; +1 indicates that the coefficient of the bias is 1.
@@ -97,7 +97,7 @@ The Softmax regression model described above uses the simplest two-layer neural
 1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function. Some common ones are sigmoid, tanh and ReLU.
 2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
-3.  Finally, after output layer, we get $Y=softmax(W_3H_2 + b_3)$, the final classification result vector.
+3.  Finally, after output layer, we get $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.
 Fig. 3 shows a Multilayer Perceptron network, with weights in blue and biases in red; +1 indicates that the coefficient of the bias is 1.
@@ -112,7 +112,7 @@ Fig. 3. Multilayer Perceptron network architecture<br/>
 #### Convolutional Layer
 <p align="center">
-<img src="image/conv_layer_en.png" width=500><br/>
+<img src="image/conv_layer.png" width='750'><br/>
 Fig. 4. Convolutional layer<br/>
 </p>
@@ -282,7 +282,7 @@ def event_handler(event):
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(reader=paddle.reader.batched(
+        result = trainer.test(reader=paddle.batch(
            paddle.dataset.mnist.test(), batch_size=128))
        print "Test with Pass %d, Cost %f, %s\n" % (
            event.pass_id, result.cost, result.metrics)
@@ -290,7 +290,7 @@ def event_handler(event):
                      result.metrics['classification_error_evaluator']))
 trainer.train(
-    reader=paddle.reader.batched(
+    reader=paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.mnist.train(), buf_size=8192),
        batch_size=128),
@@ -335,10 +335,10 @@ This tutorial describes a few basic Deep Learning models viz. Softmax regression
 7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
 8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
 9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
-10. Bishop, Christopher M. ["Pattern recognition."](http://s3.amazonaws.com/academia.edu.documents/30428242/bg0137.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1484816640&Signature=85Ad6%2Fca8T82pmHzxaSXermovIA%3D&response-content-disposition=inline%3B%20filename%3DPattern_recognition_and_machine_learning.pdf) Machine Learning 128 (2006): 1-58.
+10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This book</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and uses <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Shared knowledge signature - non commercial use-Sharing 4.0 International Licensing Protocal</a>.
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
 </div>
 <!-- You can change the lines below now. -->

--- a/recognize_digits/index.html
+++ b/recognize_digits/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
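 # Recognize Digits
-The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time users, please refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time users, please refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Introduction
 When learning to program, the first program we write usually prints "Hello World". Likewise, the introductory tutorial for machine learning (or deep learning) is usually handwritten digit recognition on the [MNIST](http://yann.lecun.com/exdb/mnist/) database. Handwriting recognition is a typical image classification problem that is relatively simple, and the MNIST dataset is complete. As a simple computer vision dataset, MNIST contains handwritten digit images, as shown in Fig. 1, together with their labels. Each image is a 28x28 pixel matrix, and each label is one of the 10 digits 0 to 9. Every image has been size-normalized and centered.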
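@@ -74,15 +74,15 @@ Yann LeCun did a lot of early research on handwritten character recognition, and in the course of this research
 The input-layer data $X$ is passed to the output layer; before activation, it is multiplied by the corresponding weights $W$ and the bias $b$ is added, as follows: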
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
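-where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 For a multi-class classification problem with $N$ classes, we use $N$ output nodes: softmax normalizes the $N$-dimensional input features into $N$ real values in the range [0, 1], representing the probabilities that the sample belongs to each of the $N$ classes. Here $y_i$ is the predicted probability that the image is digit $i$.
 In classification problems, we usually use the cross-entropy loss function: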
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
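+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 Fig. 2 shows the softmax regression network: weights are drawn in blue, biases in red, and +1 indicates that the coefficient of the bias is 1.
@@ -97,7 +97,7 @@ The softmax regression model uses the simplest two-layer neural network, i.e. only an input layer
 1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function; common choices are sigmoid, tanh, and ReLU.
 2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
-3.  Finally, after the output layer, we get $Y=softmax(W_3H_2 + b_3)$, the final classification result vector.
+3.  Finally, after the output layer, we get $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.
 Fig. 3 shows the multilayer perceptron network: weights are drawn in blue, biases in red, and +1 indicates that the coefficient of the bias is 1.
@@ -109,11 +109,11 @@ The softmax regression model uses the simplest two-layer neural network, i.e. only an input layer
 ### Convolutional Neural Network (CNN)
-A multilayer perceptron flattens the image into a one-dimensional vector before feeding it to the network, which ignores the image's spatial and structural information; a convolutional neural network makes better use of that structure. [LeNet-5](http://yann.lecun.com/exdb/lenet/) is a relatively simple convolutional neural network. Fig. 6 shows its structure: the 2D input image passes twice through a convolutional layer followed by a pooling layer, then through a fully connected layer, and finally through a softmax output layer. Below we focus on the convolutional and pooling layers.
+A multilayer perceptron flattens the image into a one-dimensional vector before feeding it to the network, which ignores the image's spatial and structural information; a convolutional neural network makes better use of that structure. [LeNet-5](http://yann.lecun.com/exdb/lenet/) is a relatively simple convolutional neural network. Fig. 4 shows its structure: the 2D input image passes twice through a convolutional layer followed by a pooling layer, then through a fully connected layer, and finally through a softmax output layer. Below we focus on the convolutional and pooling layers.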
 <p align="center">
 <img src="image/cnn.png"><br/>
-Fig. 6. LeNet-5 convolutional neural network architecture<br/>
+Fig. 4. LeNet-5 convolutional neural network architecture<br/>
 </p>
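 #### Convolutional Layer
@@ -121,17 +121,11 @@ The softmax regression model uses the simplest two-layer neural network, i.e. only an input layer
 The convolutional layer is the core building block of a convolutional neural network. In image recognition, convolution means 2D convolution: a discrete 2D filter (also called a convolution kernel) is convolved with a 2D image. Put simply, the filter slides over every position of the image and, at each position, computes the inner product with the pixel and its neighborhood. Convolution is widely used in image processing, and different kernels extract different features, such as edges, lines, and corners. In deep convolutional neural networks, convolution extracts features ranging from low-level to complex ones.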
 <p align="center">
-<img src="image/conv_layer.png"><br/>
+<img src="image/conv_layer.png" width='750'><br/>
-Fig. 4. Convolutional layer<br/>
+Fig. 5. Convolutional layer<br/>
 </p>
-图4给出一个卷积计算过程的示例图，输入图像大小为$H=5,W=5,D=3$，即$5 \times 5$大小的3通道（RGB，也称作深度）彩色图像。这个示例图中包含两（用$K$表示）组卷积核，即图中滤波器$W_0$和$W_1$。在卷积计算中，通常对不同的输入通道采用不同的卷积核，如图示例中每组卷积核包含（$D=3）$个$3 \times 3$（用$F \times F$表示）大小的卷积核。另外，这个示例中卷积核在图像的水平方向（$W$方向）和垂直方向（$H$方向）的滑动步长为2（用$S$表示）；对输入图像周围各填充1（用$P$表示）个0，即图中输入层原始数据为蓝色部分，灰色部分是进行了大小为1的扩展，用0来进行扩展。经过卷积操作得到输出为$3 \times 3 \times 2$（用$H_{o} \times W_{o} \times K$表示）大小的特征图，即$3 \times 3$大小的2通道特征图，其中$H_o$计算公式为：$H_o = (H - F + 2 \times P)/S + 1$，$W_o$同理。 而输出特征图中的每个像素，是每组滤波器与输入图像每个特征图的内积再求和，再加上偏置$b_o$，偏置通常对于每个输出特征图是共享的。例如图中输出特征图$o[:,:,0]$中的第一个$2$计算如下：
+图5给出一个卷积计算过程的示例图，输入图像大小为$H=5,W=5,D=3$，即$5 \times 5$大小的3通道（RGB，也称作深度）彩色图像。这个示例图中包含两（用$K$表示）组卷积核，即图中滤波器$W_0$和$W_1$。在卷积计算中，通常对不同的输入通道采用不同的卷积核，如图示例中每组卷积核包含（$D=3）$个$3 \times 3$（用$F \times F$表示）大小的卷积核。另外，这个示例中卷积核在图像的水平方向（$W$方向）和垂直方向（$H$方向）的滑动步长为2（用$S$表示）；对输入图像周围各填充1（用$P$表示）个0，即图中输入层原始数据为蓝色部分，灰色部分是进行了大小为1的扩展，用0来进行扩展。经过卷积操作得到输出为$3 \times 3 \times 2$（用$H_{o} \times W_{o} \times K$表示）大小的特征图，即$3 \times 3$大小的2通道特征图，其中$H_o$计算公式为：$H_o = (H - F + 2 \times P)/S + 1$，$W_o$同理。 而输出特征图中的每个像素，是每组滤波器与输入图像每个特征图的内积再求和，再加上偏置$b_o$，偏置通常对于每个输出特征图是共享的。输出特征图$o[:,:,0]$中的最后一个$-2$计算如图5右下角公式所示。
-$$ o[0,0,0] = \sum x[0:3,0:3,0] * w_{0}[:,:,0] + \sum x[0:3,0:3,1] * w_{0}[:,:,1] + \sum x[0:3,0:3,2] * w_{0}[:,:,2] + b_0 = 2 $$
-$$ \sum x[0:3,0:3,0] * w_{0}[:,:,0] = 0*1 + 0*1 + 0*1 + 0*1 + 1*1 + 2*(-1) + 0*(-1) + 0*1 + 0*(-1) = -1 $$
-$$ \sum x[0:3,0:3,1] * w_{0}[:,:,1] = 0*0 + 0*1 + 0*1 + 0*(-1) + 0*0 + 1*1 + 0*1 + 2*0 + 1*1 = 2 $$
-$$ \sum x[0:3,0:3,2] * w_{0}[:,:,2] = 0*(-1) + 0*1 + 0*(-1) + 0*0 + 1*1 + 1*0 + 0*(-1) + 1*0 + 1*(-1) = 0 $$
-$$ b_0 = 1 $$
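 In convolution, the kernels are learnable parameters. As the example above shows, a convolutional layer has $D \times F \times F \times K$ weight parameters. In a multilayer perceptron, neurons are usually fully connected, so there are many parameters. A convolutional layer has far fewer parameters, which follows from its two key properties: local connectivity and weight sharing.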
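The computation described above can be checked with a small NumPy sketch. It is a naive, loop-based illustration of a multi-channel convolution, not PaddlePaddle's implementation; the helpers `conv_output_size`, `conv2d_single_map` and `conv_param_count` are hypothetical names. The window and kernel values are taken from the $o[0,0,0]$ worked example (the padded top-left $3 \times 3 \times 3$ window and filter $W_0$ with $b_0 = 1$).

```python
import numpy as np


def conv_output_size(size, kernel, padding, stride):
    """Output height/width: H_o = (H - F + 2P) / S + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def conv2d_single_map(x, w, b, stride=1, padding=0):
    """Naive multi-channel 2-D convolution producing ONE output feature map.

    x: input of shape (H, W, D); w: one filter group of shape (F, F, D);
    b: scalar bias shared by every position of this output map.
    """
    f = w.shape[0]
    # Zero-pad height and width only, not the channel axis
    x_pad = np.pad(x, ((padding, padding), (padding, padding), (0, 0)),
                   mode="constant")
    h_o = conv_output_size(x.shape[0], f, padding, stride)
    w_o = conv_output_size(x.shape[1], f, padding, stride)
    out = np.zeros((h_o, w_o))
    for i in range(h_o):
        for j in range(w_o):
            window = x_pad[i * stride:i * stride + f, j * stride:j * stride + f, :]
            # Element-wise product summed over all D channels, then add the bias
            out[i, j] = np.sum(window * w) + b
    return out


def conv_param_count(d, f, k):
    """Number of weights in a conv layer: D * F * F * K (biases not included)."""
    return d * f * f * k


# Window x[0:3, 0:3, :] and filter W_0 from the o[0,0,0] example
x_window = np.stack([
    [[0, 0, 0], [0, 1, 2], [0, 0, 0]],     # channel 0
    [[0, 0, 0], [0, 0, 1], [0, 2, 1]],     # channel 1
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],     # channel 2
], axis=-1)
w0 = np.stack([
    [[1, 1, 1], [1, 1, -1], [-1, 1, -1]],
    [[0, 1, 1], [-1, 0, 1], [1, 0, 1]],
    [[-1, 1, -1], [0, 1, 0], [-1, 0, -1]],
], axis=-1)

o = conv2d_single_map(x_window, w0, b=1)
print(o[0, 0])                         # 2.0
print(conv_output_size(5, 3, 1, 2))    # 3, matching the 3 x 3 output map
print(conv_param_count(3, 3, 2))       # 54 weights for the two filter groups
```

Applied to the full padded $5 \times 5 \times 3$ input with `stride=2, padding=1`, the same function would produce one $3 \times 3$ output map per filter group.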
@@ -145,10 +139,10 @@ $$ b_0 = 1 $$
 <p align="center">
 <img src="image/max_pooling.png" width="400px"><br/>
-Figure 5. Pooling layer<br/>
+Figure 6. Pooling layer<br/>
 </p>
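-Pooling is a form of nonlinear downsampling. Its main purpose is to shrink the feature maps, which reduces the number of parameters and the amount of computation in the network, and it also helps control overfitting to some extent. A pooling layer is usually added after a convolutional layer. Common types include max pooling and average pooling. Max pooling partitions the input into non-overlapping rectangular regions and outputs the maximum value of each region, as shown in Figure 5.
+Pooling is a form of nonlinear downsampling. Its main purpose is to shrink the feature maps, which reduces the number of parameters and the amount of computation in the network, and it also helps control overfitting to some extent. A pooling layer is usually added after a convolutional layer. Common types include max pooling and average pooling. Max pooling partitions the input into non-overlapping rectangular regions and outputs the maximum value of each region, as shown in Figure 6.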
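Max pooling over non-overlapping windows can be sketched in a few lines of NumPy. This is an illustration only: `max_pool2d` is a hypothetical helper rather than a PaddlePaddle API, and the 4×4 input values are made up. With 2×2 windows and stride 2, each spatial dimension is halved.

```python
import numpy as np


def max_pool2d(x, size=2, stride=2):
    """Max pooling over non-overlapping size x size windows of a 2-D map."""
    h_o = (x.shape[0] - size) // stride + 1
    w_o = (x.shape[1] - size) // stride + 1
    out = np.empty((h_o, w_o), dtype=x.dtype)
    for i in range(h_o):
        for j in range(w_o):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()   # keep only the largest value in each window
    return out


# Illustrative 4 x 4 feature map (values chosen for this sketch)
fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]])
print(max_pool2d(fmap))
# [[6 5]
#  [7 9]]
```

Pooling itself has no learnable parameters; the savings come from the smaller feature maps passed to later layers.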
 For more details on convolutional neural networks, see [Stanford University's open course](http://cs231n.github.io/convolutional-networks/) and the [Image Classification](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md) tutorial.
@@ -293,7 +287,7 @@ def event_handler(event):
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(reader=paddle.reader.batched(
+        result = trainer.test(reader=paddle.batch(
            paddle.dataset.mnist.test(), batch_size=128))
        print "Test with Pass %d, Cost %f, %s\n" % (
            event.pass_id, result.cost, result.metrics)
@@ -301,7 +295,7 @@ def event_handler(event):
                      result.metrics['classification_error_evaluator']))
 trainer.train(
-    reader=paddle.reader.batched(
+    reader=paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.mnist.train(), buf_size=8192),
        batch_size=128),
@@ -337,7 +331,7 @@ trainer.train(
 7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
 8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
 9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
-10. Bishop, Christopher M. ["Pattern recognition."](http://s3.amazonaws.com/academia.edu.documents/30428242/bg0137.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1484816640&Signature=85Ad6%2Fca8T82pmHzxaSXermovIA%3D&response-content-disposition=inline%3B%20filename%3DPattern_recognition_and_machine_learning.pdf) Machine Learning 128 (2006): 1-58.
+10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.
 <br/>
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> was created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

--- a/recognize_digits/train.py
+++ b/recognize_digits/train.py
@@ -2,9 +2,8 @@ import paddle.v2 as paddle
 def softmax_regression(img):
-    predict = paddle.layer.fc(input=img,
+    predict = paddle.layer.fc(
-                              size=10,
+        input=img, size=10, act=paddle.activation.Softmax())
-                              act=paddle.activation.Softmax())
    return predict
@@ -12,14 +11,12 @@ def multilayer_perceptron(img):
    # The first fully-connected layer
    hidden1 = paddle.layer.fc(input=img, size=128, act=paddle.activation.Relu())
    # The second fully-connected layer and the corresponding activation function
-    hidden2 = paddle.layer.fc(input=hidden1,
+    hidden2 = paddle.layer.fc(
-                              size=64,
+        input=hidden1, size=64, act=paddle.activation.Relu())
-                              act=paddle.activation.Relu())
    # The third fully-connected layer, note that the hidden size should be 10,
    # which is the number of unique digits
-    predict = paddle.layer.fc(input=hidden2,
+    predict = paddle.layer.fc(
-                              size=10,
+        input=hidden2, size=10, act=paddle.activation.Softmax())
-                              act=paddle.activation.Softmax())
    return predict
@@ -43,14 +40,12 @@ def convolutional_neural_network(img):
        pool_stride=2,
        act=paddle.activation.Tanh())
    # The first fully-connected layer
-    fc1 = paddle.layer.fc(input=conv_pool_2,
+    fc1 = paddle.layer.fc(
-                          size=128,
+        input=conv_pool_2, size=128, act=paddle.activation.Tanh())
-                          act=paddle.activation.Tanh())
    # The softmax layer, note that the hidden size should be 10,
    # which is the number of unique digits
-    predict = paddle.layer.fc(input=fc1,
+    predict = paddle.layer.fc(
-                              size=10,
+        input=fc1, size=10, act=paddle.activation.Softmax())
-                              act=paddle.activation.Softmax())
    return predict
@@ -76,9 +71,8 @@ optimizer = paddle.optimizer.Momentum(
    momentum=0.9,
    regularization=paddle.optimizer.L2Regularization(rate=0.0005 * 128))
-trainer = paddle.trainer.SGD(cost=cost,
+trainer = paddle.trainer.SGD(
-                             parameters=parameters,
+    cost=cost, parameters=parameters, update_equation=optimizer)
-                             update_equation=optimizer)
 lists = []
@@ -89,7 +83,7 @@ def event_handler(event):
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(reader=paddle.reader.batched(
+        result = trainer.test(reader=paddle.batch(
            paddle.dataset.mnist.test(), batch_size=128))
        print "Test with Pass %d, Cost %f, %s\n" % (event.pass_id, result.cost,
                                                    result.metrics)
@@ -98,9 +92,8 @@ def event_handler(event):
 trainer.train(
-    reader=paddle.reader.batched(
+    reader=paddle.batch(
-        paddle.reader.shuffle(
+        paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=8192),
-            paddle.dataset.mnist.train(), buf_size=8192),
        batch_size=128),
    event_handler=event_handler,
    num_passes=100)

--- a/recommender_system/README.en.md
+++ b/recommender_system/README.en.md
@@ -2,6 +2,9 @@
 The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system).
+For instructions on getting started with PaddlePaddle, see the [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
 ## Background
 With the fast growth of e-commerce, online video, and online reading businesses, users have to rely on recommender systems to avoid manually browsing a tremendous volume of choices. Recommender systems understand users' interests by mining user behavior and other properties of users and products.
@@ -76,22 +79,287 @@ Figure 3. A hybrid recommendation model.
 ## Dataset
-We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) dataset to train our model. It contains 1,000,209 ratings of approximately 3,900 movies made by 6,040 users. Each rating is an integer from 1 to 5. Thanks to GroupLens Research for collecting, processing and publishing the dataset.  
+We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) dataset to train our model. It contains 1,000,209 ratings of approximately 3,900 movies made by 6,040 users. Each rating is an integer from 1 to 5. Thanks to GroupLens Research for collecting, processing and publishing the dataset.
-We don't have to download and preprocess the data.  Instead, we can use PaddlePaddle's dataset module `paddle.v2.dataset.movielens`.
+The `paddle.v2.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens`, and `wmt14`, so there is no need to manually download and preprocess the `MovieLens` dataset.
+```python
-## Model Specification
+# Run this block to show dataset's documentation
+help(paddle.v2.dataset.movielens)
+```
-## Training
-## Inference
+The raw `MovieLens` dataset contains movie ratings as well as features of both movies and users.
+For instance, the features of one movie look like this:
+```python
+movie_info = paddle.dataset.movielens.movie_info()
+print movie_info.values()[0]
+```
+```text
+<MovieInfo id(1), title(Toy Story), categories(['Animation', "Children's", 'Comedy'])>
+```
+The features of one user look like this:
+```python
+user_info = paddle.dataset.movielens.user_info()
+print user_info.values()[0]
+```
+```text
+<UserInfo id(1), gender(F), age(1), job(10)>
+```
+In this dataset, ages are grouped into the following brackets:
+```text
+1: "Under 18"
+18: "18-24"
+25: "25-34"
+35: "35-44"
+45: "45-49"
+50: "50-55"
+56: "56+"
+```
+A user's occupation is one of the following:
+```text
+0: "other" or not specified
+1: "academic/educator"
+2: "artist"
+3: "clerical/admin"
+4: "college/grad student"
+5: "customer service"
+6: "doctor/health care"
+7: "executive/managerial"
+8: "farmer"
+9: "homemaker"
+10: "K-12 student"
+11: "lawyer"
+12: "programmer"
+13: "retired"
+14: "sales/marketing"
+15: "scientist"
+16: "self-employed"
+17: "technician/engineer"
+18: "tradesman/craftsman"
+19: "unemployed"
+20: "writer"
+```
+Each record consists of three parts: user features, movie features, and the rating.
+As a simple example, consider the following:
+```python
+train_set_creator = paddle.dataset.movielens.train()
+train_sample = next(train_set_creator())
+uid = train_sample[0]
+mov_id = train_sample[len(user_info[uid].value())]
+print "User %s rates Movie %s with Score %s"%(user_info[uid], movie_info[mov_id], train_sample[-1])
+```
+```text
+User <UserInfo id(1), gender(F), age(1), job(10)> rates Movie <MovieInfo id(1193), title(One Flew Over the Cuckoo's Nest), categories(['Drama'])> with Score [5.0]
+```
+The output shows that user 1 gave movie `1193` a rating of 5.
+Running `python train.py` starts training immediately. The following sections explain step by step how it works.
+## Model Architecture
+### Initialize PaddlePaddle
+First, we must import and initialize PaddlePaddle (enable or disable the GPU, set the number of trainers, etc.).
+```python
+%matplotlib inline
+import matplotlib.pyplot as plt
+from IPython import display
+import cPickle
+import paddle.v2 as paddle
+paddle.init(use_gpu=False)
+```
+### Model Configuration
+```python
+uid = paddle.layer.data(
+        name='user_id',
+        type=paddle.data_type.integer_value(
+            paddle.dataset.movielens.max_user_id() + 1))
+usr_emb = paddle.layer.embedding(input=uid, size=32)
+usr_gender_id = paddle.layer.data(
+        name='gender_id', type=paddle.data_type.integer_value(2))
+usr_gender_emb = paddle.layer.embedding(input=usr_gender_id, size=16)
+usr_age_id = paddle.layer.data(
+        name='age_id',
+        type=paddle.data_type.integer_value(
+            len(paddle.dataset.movielens.age_table)))
+usr_age_emb = paddle.layer.embedding(input=usr_age_id, size=16)
+usr_job_id = paddle.layer.data(
+        name='job_id',
+        type=paddle.data_type.integer_value(paddle.dataset.movielens.max_job_id(
+        ) + 1))
+usr_job_emb = paddle.layer.embedding(input=usr_job_id, size=16)
+```
+As the code above shows, each user is described by four integer features: `user_id`, `gender_id`, `age_id`, and `job_id`. To handle these discrete values conveniently, we borrow the embedding technique from NLP language models and map them to the embedding vectors `usr_emb`, `usr_gender_emb`, `usr_age_emb`, and `usr_job_emb`.
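Conceptually, an embedding layer is a trainable lookup table: each integer ID selects one row of a matrix. The NumPy sketch below illustrates this for the four user features. It is not how `paddle.layer.embedding` is implemented internally; the embedding widths (32 and 16) follow the configuration above, the table heights are assumptions consistent with the age and occupation lists shown earlier, the user count `NUM_USERS` is an assumed value, and the tables are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes; embedding widths follow the configuration above
NUM_USERS = 6041  # assumed max_user_id() + 1
tables = {
    "user_id": rng.normal(size=(NUM_USERS, 32)),
    "gender_id": rng.normal(size=(2, 16)),
    "age_id": rng.normal(size=(7, 16)),    # 7 age brackets listed earlier
    "job_id": rng.normal(size=(21, 16)),   # occupations 0..20 listed earlier
}


def embed(features):
    """Look up one embedding row per feature and concatenate the results."""
    return np.concatenate([tables[name][idx] for name, idx in features.items()])


# Hypothetical encoded user: ID 1, gender index 1, age bracket index 0, job 10
user = {"user_id": 1, "gender_id": 1, "age_id": 0, "job_id": 10}
vec = embed(user)
print(vec.shape)  # (80,) = 32 + 16 + 16 + 16
```

In the actual configuration, the four embeddings are not concatenated by hand but passed together as the `input` list of the following fully-connected layer.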
+```python
+usr_combined_features = paddle.layer.fc(
+        input=[usr_emb, usr_gender_emb, usr_age_emb, usr_job_emb],
+        size=200,
+        act=paddle.activation.Tanh())
+```
+The user feature embeddings are then fed together into a fully-connected layer that reduces them to a 200-dimensional vector.
+Similarly, we apply a comparable transformation to the movie features. The model configuration is:
+```python
+mov_id = paddle.layer.data(
+    name='movie_id',
+    type=paddle.data_type.integer_value(
+        paddle.dataset.movielens.max_movie_id() + 1))
+mov_emb = paddle.layer.embedding(input=mov_id, size=32)
+mov_categories = paddle.layer.data(
+    name='category_id',
+    type=paddle.data_type.sparse_binary_vector(
+        len(paddle.dataset.movielens.movie_categories())))
+mov_categories_hidden = paddle.layer.fc(input=mov_categories, size=32)
+movie_title_dict = paddle.dataset.movielens.get_movie_title_dict()
+mov_title_id = paddle.layer.data(
+    name='movie_title',
+    type=paddle.data_type.integer_value_sequence(len(movie_title_dict)))
+mov_title_emb = paddle.layer.embedding(input=mov_title_id, size=32)
+mov_title_conv = paddle.networks.sequence_conv_pool(
+    input=mov_title_emb, hidden_size=32, context_len=3)
+mov_combined_features = paddle.layer.fc(
+    input=[mov_emb, mov_categories_hidden, mov_title_conv],
+    size=200,
+    act=paddle.activation.Tanh())
+```
+The movie title, a sequence of words represented as a sequence of integer word indices, is fed into a `sequence_conv_pool` layer, which applies convolution and pooling along the time dimension. Because pooling is done along the time dimension, the output is a fixed-length vector regardless of the length of the input sequence.
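Why the output length does not depend on the title length can be seen in a small NumPy sketch: a convolution over windows of `context_len` consecutive word embeddings produces one `hidden_size` vector per time step, and pooling over the time axis collapses them into a single vector. This is a simplified illustration, not Paddle's `sequence_conv_pool`; the centered, zero-padded window, the tanh activation, the use of max pooling, and the random weights are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HIDDEN_SIZE, CONTEXT_LEN = 32, 32, 3   # sizes from the configuration above
# Hypothetical random weights: one linear map from a window of embeddings to hidden units
weights = rng.normal(size=(CONTEXT_LEN * EMB_DIM, HIDDEN_SIZE))


def sequence_conv_pool_sketch(seq_emb):
    """seq_emb: (T, EMB_DIM) word embeddings of one title -> (HIDDEN_SIZE,) vector."""
    half = CONTEXT_LEN // 2
    # Zero-pad the time axis so every step has a full, centered context window
    padded = np.pad(seq_emb, ((half, half), (0, 0)), mode="constant")
    steps = []
    for t in range(seq_emb.shape[0]):
        window = padded[t:t + CONTEXT_LEN].reshape(-1)   # concatenate CONTEXT_LEN embeddings
        steps.append(np.tanh(window @ weights))           # convolution output at step t
    # Max pooling over the time axis removes the dependence on T
    return np.max(np.stack(steps), axis=0)


short_title = rng.normal(size=(2, EMB_DIM))   # e.g. a 2-word title
long_title = rng.normal(size=(9, EMB_DIM))    # e.g. a 9-word title
print(sequence_conv_pool_sketch(short_title).shape)  # (32,)
print(sequence_conv_pool_sketch(long_title).shape)   # (32,)
```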
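+Finally, we compute the cosine similarity between the combined user features and the combined movie features, scaled by 5 (`scale=5`), as the predicted rating, and apply a regression cost against the actual `score`.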
+```python
+inference = paddle.layer.cos_sim(a=usr_combined_features, b=mov_combined_features, size=1, scale=5)
+cost = paddle.layer.regression_cost(
+        input=inference,
+        label=paddle.layer.data(
+        name='score', type=paddle.data_type.dense_vector(1)))
+```
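The math behind this prediction can be sketched in NumPy: the cosine of the angle between the two 200-dimensional feature vectors lies in [-1, 1], so multiplying by `scale=5` puts the prediction in [-5, 5], a range that covers ratings from 1 to 5, and the regression cost penalizes the squared difference from the true score. This illustrates the formulas only; the helper names are hypothetical, the vectors are random stand-ins, and any constant factor or batch averaging that Paddle's `regression_cost` applies is not reproduced here.

```python
import numpy as np


def scaled_cos_sim(a, b, scale=5.0):
    """scale * cos(a, b) = scale * (a . b) / (|a| |b|)."""
    return scale * np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def squared_error(prediction, score):
    """Per-sample squared error, used here as the regression cost (sketch)."""
    return (prediction - score) ** 2


rng = np.random.default_rng(0)
user_vec = rng.normal(size=200)    # stand-in for usr_combined_features
movie_vec = rng.normal(size=200)   # stand-in for mov_combined_features

pred = scaled_cos_sim(user_vec, movie_vec)
print(-5.0 <= pred <= 5.0)                                # True
print(round(scaled_cos_sim(user_vec, 2 * user_vec), 6))   # 5.0: same direction gives the maximum
print(squared_error(3.0, 5.0))                            # 4.0
```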
+## Model Training
+### Define Parameters
+First, we create the model parameters from the network topology defined by `cost`.
+```python
+# Create parameters
+parameters = paddle.parameters.create(cost)
+```
+### Create Trainer
+Before creating the trainer, we also need to choose an optimization algorithm. Here we use the Adam algorithm from `paddle.optimizer`.
+```python
+trainer = paddle.trainer.SGD(cost=cost, parameters=parameters,
+                             update_equation=paddle.optimizer.Adam(learning_rate=1e-4))
+```
+```text
+[INFO 2017-03-06 17:12:13,378 networks.py:1472] The input order is [user_id, gender_id, age_id, job_id, movie_id, category_id, movie_title, score]
+[INFO 2017-03-06 17:12:13,379 networks.py:1478] The output order is [__regression_cost_0__]
+```
+### Training
+`paddle.dataset.movielens.train` yields records in each pass; after shuffling, the records are grouped into batches for training.
+```python
+reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.movielens.train(), buf_size=8192),
+    batch_size=256)
+```
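To see what the two decorators do, here is a pure-Python sketch of the reader pattern: a reader is a function that returns a generator of records, `shuffle_sketch` buffers up to `buf_size` records and shuffles each buffer, and `batch_sketch` groups records into lists of `batch_size`. These are simplified, hypothetical re-implementations for illustration, not the source of `paddle.reader.shuffle` or `paddle.batch`, and `toy_reader` stands in for `paddle.dataset.movielens.train()`.

```python
import random


def toy_reader():
    """Hypothetical stand-in for paddle.dataset.movielens.train()."""
    def reader():
        for i in range(10):
            yield (i, float(i % 5 + 1))   # (record id, fake score)
    return reader


def shuffle_sketch(reader, buf_size):
    """Shuffle records within consecutive buffers of up to buf_size records."""
    def shuffled():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for r in buf:
                    yield r
                buf = []
        random.shuffle(buf)   # flush the last, partially filled buffer
        for r in buf:
            yield r
    return shuffled


def batch_sketch(reader, batch_size):
    """Group records into lists of batch_size; the last batch may be smaller."""
    def batched():
        batch = []
        for record in reader():
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch
    return batched


reader = batch_sketch(shuffle_sketch(toy_reader(), buf_size=8), batch_size=4)
for b in reader():
    print("%d %s" % (len(b), b))
```

Because each decorator takes a reader and returns a new reader, they compose freely, which is why the training code above can nest `paddle.reader.shuffle` inside `paddle.batch`.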
+`feeding` specifies which column of each yielded record feeds which `paddle.layer.data` layer. For instance, the first column (index 0) of the data generated by `movielens.train` corresponds to the `user_id` feature.
+```python
+feeding = {
+    'user_id': 0,
+    'gender_id': 1,
+    'age_id': 2,
+    'job_id': 3,
+    'movie_id': 4,
+    'category_id': 5,
+    'movie_title': 6,
+    'score': 7
+}
+```
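The mapping works by position: for each record, the value at index `feeding[name]` is sent to the data layer called `name`. A minimal pure-Python sketch of that lookup follows; `split_record` is a hypothetical helper rather than a PaddlePaddle function, and the record contents are made-up placeholders laid out in the column order above.

```python
feeding = {
    'user_id': 0, 'gender_id': 1, 'age_id': 2, 'job_id': 3,
    'movie_id': 4, 'category_id': 5, 'movie_title': 6, 'score': 7,
}

# Made-up record in the column order listed above
record = (1, 1, 0, 10, 1193, [7], [12, 45, 3], [5.0])


def split_record(record, feeding):
    """Return {data layer name: value} using each layer's column index."""
    return {name: record[col] for name, col in feeding.items()}


inputs = split_record(record, feeding)
print("%s %s" % (inputs['movie_id'], inputs['score']))   # 1193 [5.0]
```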
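+The callback function `event_handler` is called during training whenever a predefined event occurs. The version below records the training cost every 10 batches, evaluates on the test set every 1000 batches, and redraws the cost curves every 100 batches.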
+```python
+step = 0
+train_costs = [], []  # (batch steps, train costs): a tuple of two lists
+test_costs = [], []   # (batch steps, test costs)
+def event_handler(event):
+    global step
+    if isinstance(event, paddle.event.EndIteration):
+        if step % 10 == 0:  # every 10 batches, record a train cost
+            train_costs[0].append(step)
+            train_costs[1].append(event.cost)
+        if step % 1000 == 0: # every 1000 batches, record a test cost
+            result = trainer.test(
+                reader=paddle.batch(
+                    paddle.dataset.movielens.test(), batch_size=256),
+                feeding=feeding)
+            test_costs[0].append(step)
+            test_costs[1].append(result.cost)
+        if step % 100 == 0: # every 100 batches, update cost plot
+            plt.plot(*train_costs)
+            plt.plot(*test_costs)
+            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')
+            display.clear_output(wait=True)
+            display.display(plt.gcf())
+            plt.gcf().clear()
+        step += 1
+```
+Finally, we can invoke `trainer.train` to start training:
+```python
+trainer.train(
+    reader=reader,
+    event_handler=event_handler,
+    feeding=feeding,
+    num_passes=200)
+```
 ## Conclusion
@@ -99,13 +367,13 @@ This tutorial goes over traditional approaches in recommender system and a deep
 ## Reference
-1. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.
+1. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.
-2. Robin Burke, [Hybrid Web Recommender Systems](http://www.dcs.warwick.ac.uk/~acristea/courses/CS411/2010/Book%20-%20The%20Adaptive%20Web/HybridWebRecommenderSystems.pdf), pp. 377-408, The Adaptive Web, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Eds.), Lecture Notes in Computer Science, Vol. 4321, Springer-Verlag, Berlin, Germany, May 2007, ISBN 978-3-540-72078-2.
+2. Robin Burke, [Hybrid Web Recommender Systems](http://www.dcs.warwick.ac.uk/~acristea/courses/CS411/2010/Book%20-%20The%20Adaptive%20Web/HybridWebRecommenderSystems.pdf), pp. 377-408, The Adaptive Web, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Eds.), Lecture Notes in Computer Science, Vol. 4321, Springer-Verlag, Berlin, Germany, May 2007, ISBN 978-3-540-72078-2.
 3. P. Resnick, N. Iacovou, etc. “[GroupLens: An Open Architecture for Collaborative Filtering of Netnews](http://ccs.mit.edu/papers/CCSWP165.html)”, Proceedings of ACM Conference on Computer Supported Cooperative Work, CSCW 1994. pp.175-186.
-4. Sarwar, Badrul, et al. "[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)" *Proceedings of the 10th International Conference on World Wide Web*. ACM, 2001.
+4. Sarwar, Badrul, et al. "[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)" *Proceedings of the 10th International Conference on World Wide Web*. ACM, 2001.
 5. Kautz, Henry, Bart Selman, and Mehul Shah. "[Referral Web: Combining Social networks and collaborative filtering.](http://www.cs.cornell.edu/selman/papers/pdf/97.cacm.refweb.pdf)" Communications of the ACM 40.3 (1997): 63-65.
-6. Yuan, Jianbo, et al. ["Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach."](https://arxiv.org/pdf/1611.05480v1.pdf) *arXiv preprint arXiv:1611.05480* (2016).
+6. Yuan, Jianbo, et al. ["Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach."](https://arxiv.org/pdf/1611.05480v1.pdf) *arXiv preprint arXiv:1611.05480* (2016).
 7. Covington, Paul, Jay Adams, and Emre Sargin. "[Deep neural networks for YouTube recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf)." *Proceedings of the 10th ACM Conference on Recommender Systems*. ACM, 2016: 191-198.
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> was created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">the PaddlePaddle community</a> and published under <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons 4.0 License</a>.
+This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
--- a/recommender_system/README.md
+++ b/recommender_system/README.md
 # Personalized Recommendation
-The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background

--- a/recommender_system/data/meta_generator.py
+++ b/recommender_system/data/meta_generator.py
@@ -208,8 +208,8 @@ class EmbeddingFieldParser(object):
        elif config['dict']['type'] == 'split':
            self.dict = SplitEmbeddingDict(config['dict'].get('delimiter', ','))
        elif config['dict']['type'] == 'whole_content':
-            self.dict = EmbeddingFieldParser.WholeContentDict(config['dict'][
+            self.dict = EmbeddingFieldParser.WholeContentDict(
-                'sort'])
+                config['dict']['sort'])
        else:
            print config
            assert False

--- a/recommender_system/index.en.html
+++ b/recommender_system/index.en.html
--- a/recommender_system/index.html
+++ b/recommender_system/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # Personalized Recommendation
-The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background

--- a/understand_sentiment/README.en.md
+++ b/understand_sentiment/README.en.md
--- a/understand_sentiment/README.md
+++ b/understand_sentiment/README.md
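 # Sentiment Analysis
-The source code of this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code of this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background
 In natural language processing, sentiment analysis usually means determining the emotional state expressed by a piece of text, where the text can be a sentence, a paragraph, or a document. The emotional states may form two classes, such as (positive, negative) or (happy, sad), or three classes, such as (positive, negative, neutral), and so on. Sentiment analysis has a wide range of applications: for example, classifying the reviews that users post on shopping websites (Amazon, Tmall, Taobao, etc.), travel websites, and movie review websites into positive and negative reviews, or collecting the user reviews of a product and analyzing their sentiment to understand users' overall experience with it. Table 1 shows examples of sentiment analysis on movie reviews: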
@@ -108,14 +108,14 @@ aclImdb
 ```
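 Paddle implements automatic downloading and reading of the IMDB dataset in `dataset/imdb.py`, and provides APIs for reading the dictionary, the training data, and the test data.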
-```
+```python
 import sys
 import paddle.v2 as paddle
 ```
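 ## Model Configuration
 In this example, we implement two text classification algorithms, based on the [text convolutional neural network](#文本卷积神经网络（CNN）) and the [stacked bidirectional LSTM](#栈式双向LSTM（Stacked Bidirectional LSTM）) described above.
 ### Text Convolutional Neural Network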
-```
+```python
 def convolution_net(input_dim,
                    class_dim=2,
                    emb_dim=128,
@@ -136,7 +136,7 @@ def convolution_net(input_dim,
 ```
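 The network input `input_dim` is the size of the dictionary, and `class_dim` is the number of classes. Here we use the [`sequence_conv_pool`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/trainer_config_helpers/networks.py) API to implement the convolution and pooling operations.
 ### Stacked Bidirectional LSTM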
-```
+```python
 def stacked_lstm_net(input_dim,
                     class_dim=2,
                     emb_dim=128,
@@ -205,7 +205,7 @@ def stacked_lstm_net(input_dim,
 ```
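 The input `stacked_num` is the number of LSTM layers. It must be odd so that the topmost LSTM layer runs in the forward direction. In Paddle, each LSTM-based recurrent layer is built from one `fc` layer followed by one `lstmemory` layer.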
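Why the layer count must be odd can be checked with a tiny sketch. Assuming, as the sentence above implies, that the first LSTM layer runs forward and each subsequent layer reverses direction, the direction of the top layer depends only on the parity of `stacked_num`. `lstm_directions` is a hypothetical helper for illustration, not part of the Paddle API.

```python
def lstm_directions(stacked_num):
    """Directions of stacked LSTM layers, bottom to top, assuming the first
    layer is forward and each later layer flips direction."""
    return ["forward" if i % 2 == 1 else "reverse"
            for i in range(1, stacked_num + 1)]


for n in (2, 3, 5):
    print("%d %s" % (n, lstm_directions(n)[-1]))
# 2 reverse
# 3 forward
# 5 forward
```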
 ## Model Training
-```
+```python
 if __name__ == '__main__':
    # init
    paddle.init(use_gpu=False)
@@ -213,14 +213,14 @@ if __name__ == '__main__':
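 Initialize Paddle. `use_gpu=False` trains on the CPU; if your system supports a GPU, you can set it to `True` to train on the GPU.
 ### Training Data
 Use the APIs in Paddle's `dataset.imdb` module to read the training data.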
-```
+```python
    print 'load dictionary...'
    word_dict = paddle.dataset.imdb.word_dict()
    dict_dim = len(word_dict)
    class_dim = 2
 ```
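 Load the word dictionary; the `word_dict()` API builds it directly. `class_dim` is the number of sample classes; in this example there are only two, positive and negative.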
-```
+```python
    train_reader = paddle.batch(
        paddle.reader.shuffle(
            lambda: paddle.dataset.imdb.train(word_dict), buf_size=1000),
@@ -230,12 +230,12 @@ if __name__ == '__main__':
        batch_size=100)
 ```
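 Here, `dataset.imdb.train()` and `dataset.imdb.test()` are the training-data and test-data APIs of `dataset.imdb`. `train_reader` is used during training: it shuffles the training samples and groups them into batches. Likewise, `test_reader` is used during testing and groups the test samples into batches.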
+```python
+    feeding={'word': 0, 'label': 1}
 ```
-    reader_dict={'word': 0, 'label': 1}
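+`feeding` specifies how the data returned by `train_reader` and `test_reader` maps to the data layers in the model configuration. Here, column 0 of each sample feeds the `word` layer and column 1 feeds the `label` layer.
-```
-`reader_dict` specifies how the data returned by `train_reader` and `test_reader` maps to the data layers in the model configuration. Here, column 0 of each sample feeds the `word` layer and column 1 feeds the `label` layer.
 ### Building the Model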
-```
+```python
    # Please choose the way to build the network
    # by uncommenting the corresponding line.
    cost = convolution_net(dict_dim, class_dim=class_dim)
@@ -243,13 +243,13 @@ if __name__ == '__main__':
 ```
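 By default this example uses the `convolution_net` network. To use `stacked_lstm_net` instead, comment out the `convolution_net` line and uncomment the `stacked_lstm_net` line. `cost` is the optimization objective of the network, and it also carries the topology of the whole network.
 ### Network Parameters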
-```
+```python
    # create parameters
    parameters = paddle.parameters.create(cost)
 ```
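 Create the network parameters from the network topology. Here, `parameters` is the full set of parameters of the network.
 ### Optimization Algorithm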
-```
+```python
    # create optimizer
    adam_optimizer = paddle.optimizer.Adam(
        learning_rate=2e-3,
@@ -259,7 +259,7 @@ if __name__ == '__main__':
 Paddle provides APIs for a range of optimization algorithms; here we use Adam.
 ### Training
 Construct an SGD trainer with `paddle.trainer.SGD`, and call `trainer.train` to train the model.
-```
+```python
    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
@@ -270,11 +270,11 @@ Paddle provides APIs for a range of optimization algorithms; here we use Adam.
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
-            result = trainer.test(reader=test_reader, reader_dict=reader_dict)
+            result = trainer.test(reader=test_reader, feeding=feeding)
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```
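 By passing an `event_handler` to `train`, you can observe the state at the end of each batch and each pass. For example, the `event_handler` defined above prints the cost and error every 100 batches, and at the end of each pass calls `trainer.test` to evaluate the current model on the test set and report its error.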
-```
+```python
    # create trainer
    trainer = paddle.trainer.SGD(cost=cost,
                                 parameters=parameters,
@@ -283,11 +283,11 @@ Paddle provides APIs for a range of optimization algorithms; here we use Adam.
    trainer.train(
        reader=train_reader,
        event_handler=event_handler,
-        reader_dict=reader_dict,
+        feeding=feeding,
        num_passes=2)
 ```
 The program output is as follows:
-```
+```text
 Pass 0, Batch 0, Cost 0.693721, {'classification_error_evaluator': 0.5546875}
 ...................................................................................................
 Pass 0, Batch 100, Cost 0.294321, {'classification_error_evaluator': 0.1015625}

--- a/understand_sentiment/index.en.html
+++ b/understand_sentiment/index.en.html
--- a/understand_sentiment/index.html
+++ b/understand_sentiment/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
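 # Sentiment Analysis
-The source code of this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code of this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).
 ## Background
 In natural language processing, sentiment analysis usually means determining the emotional state expressed by a piece of text, where the text can be a sentence, a paragraph, or a document. The emotional states may form two classes, such as (positive, negative) or (happy, sad), or three classes, such as (positive, negative, neutral), and so on. Sentiment analysis has a wide range of applications: for example, classifying the reviews that users post on shopping websites (Amazon, Tmall, Taobao, etc.), travel websites, and movie review websites into positive and negative reviews, or collecting the user reviews of a product and analyzing their sentiment to understand users' overall experience with it. Table 1 shows examples of sentiment analysis on movie reviews: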
@@ -150,14 +150,14 @@ aclImdb
 ```
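 Paddle implements automatic downloading and reading of the IMDB dataset in `dataset/imdb.py`, and provides APIs for reading the dictionary, the training data, and the test data.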
-```
+```python
 import sys
 import paddle.v2 as paddle
 ```
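 ## Model Configuration
 In this example, we implement two text classification algorithms, based on the [text convolutional neural network](#文本卷积神经网络（CNN）) and the [stacked bidirectional LSTM](#栈式双向LSTM（Stacked Bidirectional LSTM）) described above.
 ### Text Convolutional Neural Network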
-```
+```python
 def convolution_net(input_dim,
                    class_dim=2,
                    emb_dim=128,
@@ -178,7 +178,7 @@ def convolution_net(input_dim,
 ```
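 The network input `input_dim` is the size of the dictionary, and `class_dim` is the number of classes. Here we use the [`sequence_conv_pool`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/trainer_config_helpers/networks.py) API to implement the convolution and pooling operations.
 ### Stacked Bidirectional LSTM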
-```
+```python
 def stacked_lstm_net(input_dim,
                     class_dim=2,
                     emb_dim=128,
@@ -247,7 +247,7 @@ def stacked_lstm_net(input_dim,
 ```
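 The input `stacked_num` is the number of LSTM layers. It must be odd so that the topmost LSTM layer runs in the forward direction. In Paddle, each LSTM-based recurrent layer is built from one `fc` layer followed by one `lstmemory` layer.
 ## Model Training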
-```
+```python
 if __name__ == '__main__':
    # init
    paddle.init(use_gpu=False)
@@ -255,14 +255,14 @@ if __name__ == '__main__':
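 Initialize Paddle. `use_gpu=False` trains on the CPU; if your system supports a GPU, you can set it to `True` to train on the GPU.
 ### Training Data
 Use the APIs in Paddle's `dataset.imdb` module to read the training data.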
-```
+```python
    print 'load dictionary...'
    word_dict = paddle.dataset.imdb.word_dict()
    dict_dim = len(word_dict)
    class_dim = 2
 ```
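 Load the word dictionary; the `word_dict()` API builds it directly. `class_dim` is the number of sample classes; in this example there are only two, positive and negative.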
-```
+```python
    train_reader = paddle.batch(
        paddle.reader.shuffle(
            lambda: paddle.dataset.imdb.train(word_dict), buf_size=1000),
@@ -272,12 +272,12 @@ if __name__ == '__main__':
        batch_size=100)
 ```
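 Here, `dataset.imdb.train()` and `dataset.imdb.test()` are the training-data and test-data APIs of `dataset.imdb`. `train_reader` is used during training: it shuffles the training samples and groups them into batches. Likewise, `test_reader` is used during testing and groups the test samples into batches.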
+```python
+    feeding={'word': 0, 'label': 1}
 ```
-    reader_dict={'word': 0, 'label': 1}
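+`feeding` specifies how the data returned by `train_reader` and `test_reader` maps to the data layers in the model configuration. Here, column 0 of each sample feeds the `word` layer and column 1 feeds the `label` layer.
-```
-`reader_dict` specifies how the data returned by `train_reader` and `test_reader` maps to the data layers in the model configuration. Here, column 0 of each sample feeds the `word` layer and column 1 feeds the `label` layer.
 ### Building the Model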
-```
+```python
    # Please choose the way to build the network
    # by uncommenting the corresponding line.
    cost = convolution_net(dict_dim, class_dim=class_dim)
@@ -285,13 +285,13 @@ if __name__ == '__main__':
 ```
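 By default this example uses the `convolution_net` network. To use `stacked_lstm_net` instead, comment out the `convolution_net` line and uncomment the `stacked_lstm_net` line. `cost` is the optimization objective of the network, and it also carries the topology of the whole network.
 ### Network Parameters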
-```
+```python
    # create parameters
    parameters = paddle.parameters.create(cost)
 ```
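 This creates the network parameters from the network topology; `parameters` is the parameter set of the entire network.
 ### Optimization Algorithm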
-```
+```python
    # create optimizer
    adam_optimizer = paddle.optimizer.Adam(
        learning_rate=2e-3,
@@ -301,7 +301,7 @@ if __name__ == '__main__':
 Paddle provides APIs for a range of optimization algorithms; this example uses Adam.
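For reference, one Adam update for a single scalar parameter can be sketched as below. `learning_rate=2e-3` matches the configuration above; `beta1=0.9`, `beta2=0.999`, and `epsilon=1e-8` are the commonly used Adam defaults and are assumptions here, since the remaining arguments to `paddle.optimizer.Adam` are not shown. The function `adam_step` is hypothetical and ignores any other optimizer settings.

```python
import math
from typing import Tuple


def adam_step(param: float, grad: float, m: float, v: float, t: int,
              learning_rate: float = 2e-3, beta1: float = 0.9,
              beta2: float = 0.999, epsilon: float = 1e-8) -> Tuple[float, float, float]:
    """Hypothetical sketch of one Adam update at step t (1-based).
    Returns the new (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# Minimize f(x) = (x - 3)^2 starting from x = 0.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t)
print(x)
```

After bias correction the very first step moves the parameter by about `learning_rate`, regardless of the gradient's scale; here `x` should end close to the minimum at 3.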
 ### Training
 Construct an SGD trainer with `paddle.trainer.SGD`, then call `trainer.train` to train the model.
-```
+```python
    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
@@ -312,11 +312,11 @@ Paddle中提供了一系列优化算法的API，这里使用Adam优化算法。
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
-            result = trainer.test(reader=test_reader, reader_dict=reader_dict)
+            result = trainer.test(reader=test_reader, feeding=feeding)
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```
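 Passing an `event_handler` to the `train` function lets you observe the state at the end of every batch and every pass. For example, the `event_handler` defined above prints the cost and error after every 100 batches, and at the end of each pass calls `trainer.test` to run the test set and report the current model's error on it.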
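The trainer calls `event_handler` with event objects, and the handler dispatches on their type with `isinstance`. The self-contained sketch below mimics that pattern with hypothetical `EndIteration` and `EndPass` dataclasses and a toy `train` loop so it runs without Paddle; these are illustrative stand-ins, not the real `paddle.event` classes or `paddle.trainer.SGD`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EndIteration:  # hypothetical stand-in for paddle.event.EndIteration
    pass_id: int
    batch_id: int
    cost: float


@dataclass
class EndPass:  # hypothetical stand-in for paddle.event.EndPass
    pass_id: int


def train(batch_costs: List[float], num_passes: int,
          event_handler: Callable[[object], None]) -> None:
    """Toy training loop: emit an event after every batch and every pass."""
    for pass_id in range(num_passes):
        for batch_id, cost in enumerate(batch_costs):
            event_handler(EndIteration(pass_id, batch_id, cost))
        event_handler(EndPass(pass_id))


log: List[str] = []


def event_handler(event: object) -> None:
    if isinstance(event, EndIteration):
        if event.batch_id % 100 == 0:
            log.append("Pass %d, Batch %d, Cost %f"
                       % (event.pass_id, event.batch_id, event.cost))
    if isinstance(event, EndPass):
        # The real handler would call trainer.test(...) at this point.
        log.append("End of pass %d" % event.pass_id)


train([0.69] * 250, num_passes=2, event_handler=event_handler)
print("\n".join(log))
```

Keeping reporting in a callback means the training loop stays unchanged whether you log, evaluate, or save checkpoints.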
-```
+```python
    # create trainer
    trainer = paddle.trainer.SGD(cost=cost,
                                 parameters=parameters,
@@ -325,11 +325,11 @@ Paddle中提供了一系列优化算法的API，这里使用Adam优化算法。
    trainer.train(
        reader=train_reader,
        event_handler=event_handler,
-        reader_dict=reader_dict,
+        feeding=feeding,
        num_passes=2)
 ```
 After the program runs, it prints output like the following.
-```
+```text
 Pass 0, Batch 0, Cost 0.693721, {'classification_error_evaluator': 0.5546875}
 ...................................................................................................
 Pass 0, Batch 100, Cost 0.294321, {'classification_error_evaluator': 0.1015625}

--- a/understand_sentiment/train.py
+++ b/understand_sentiment/train.py
--- a/word2vec/README.en.md
+++ b/word2vec/README.en.md
--- a/word2vec/README.md
+++ b/word2vec/README.md
--- a/word2vec/index.en.html
+++ b/word2vec/index.en.html
--- a/word2vec/index.html
+++ b/word2vec/index.html
--- a/word2vec/train.py
+++ b/word2vec/train.py