fix conflict

b17e5cbc · chengduoZH · 44f1fff6 · 524b56c8
38 changed files
--- a/.tools/convert-markdown-into-ipynb-and-test.sh
+++ b/.tools/convert-markdown-into-ipynb-and-test.sh
@@ -11,7 +11,7 @@ cur_path="$(cd "$(dirname "$0")" && pwd -P)"
 cd $cur_path/../

 #convert md to ipynb
-for file in */{README,README\.en}.md ; do
+for file in */{README,README\.cn}.md ; do
    ~/go/bin/markdown-to-ipynb < $file > ${file%.*}".ipynb"
    if [ $? -ne 0 ]; then
        echo >&2 "markdown-to-ipynb $file error"
@@ -24,7 +24,7 @@ if [[ -z $TEST_EMBEDDED_PYTHON_SCRIPTS ]]; then
 fi

 #exec ipynb's py file
-for file in */{README,README\.en}.ipynb ; do
+for file in */{README,README\.cn}.ipynb ; do
    pushd $PWD > /dev/null
    cd $(dirname $file) > /dev/null


--- a/.tools/notedown.sh
+++ b/.tools/notedown.sh
@@ -4,6 +4,6 @@ set -xe
 cd /book

 #convert md to ipynb
-for file in */{README,README\.en}.md ; do
+for file in */{README,README\.cn}.md ; do
    notedown $file > ${file%.*}.ipynb
 done
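Both conversion scripts now loop over `*/{README,README\.cn}.md` instead of `README.en`. The Python sketch below is a rough, hypothetical equivalent of how that loop selects files: bash brace expansion, then `${file%.*}` to swap the extension. It is not part of the repository. One difference: bash without `nullglob` passes an unmatched pattern through literally, whereas this version simply returns no pair for it.

```python
import glob
import os


def readme_targets(root):
    """Hypothetical helper: mimic `for file in */{README,README\\.cn}.md`
    followed by `${file%.*}.ipynb`, returning (markdown, notebook) pairs."""
    pairs = []
    # Brace expansion yields two separate patterns, expanded in this order.
    for name in ("README.md", "README.cn.md"):
        for md in sorted(glob.glob(os.path.join(root, "*", name))):
            # `${file%.*}` strips only the shortest trailing ".suffix",
            # so README.cn.md becomes README.cn, not README.
            stem, _ = os.path.splitext(md)
            pairs.append((md, stem + ".ipynb"))
    return pairs
```

Note that a leftover `README.en.md` in a chapter directory is no longer picked up by either pattern.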
--- a/01.fit_a_line/README.cn.md
+++ b/01.fit_a_line/README.cn.md
 # 线性回归
 让我们从经典的线性回归（Linear Regression \[[1](#参考文献)\]）模型开始这份教程。在这一章里，你将使用真实的数据集建立起一个房价预测模型，并且了解到机器学习中的若干重要概念。

-本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/137.html)。

 ## 背景介绍
 给定一个大小为$n$的数据集  ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$，其中$x_{i1}, \ldots, x_{id}$是第$i$个样本$d$个属性上的取值，$y_i$是该样本待预测的目标。线性回归模型假设目标$y_i$可以被属性间的线性组合描述，即
@@ -200,6 +200,11 @@ def event_handler_plot(event):
            cost_ploter.plot()

        step += 1
+
+    if isinstance(event, paddle.event.EndPass):
+        if event.pass_id % 10 == 0:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+                parameters.to_tar(f)
 ```

 ### 开始训练
@@ -217,6 +222,37 @@ trainer.train(

 ![png](./image/train_and_test.png)

+### 应用模型
+
+#### 1. 生成测试数据
+
+```python
+test_data_creator = paddle.dataset.uci_housing.test()
+test_data = []
+test_label = []
+
+for item in test_data_creator():
+    test_data.append((item[0],))
+    test_label.append(item[1])
+    if len(test_data) == 5:
+        break
+```
+
+#### 2. 推测 inference
+
+```python
+# load parameters from tar file.
+# users can remove the comments and change the model name
+# with open('params_pass_20.tar', 'r') as f:
+#     parameters = paddle.parameters.Parameters.from_tar(f)
+
+probs = paddle.infer(
+    output_layer=y_predict, parameters=parameters, input=test_data)
+
+for i in xrange(len(probs)):
+    print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
+```
+
 ## 总结
 在这章里，我们借助波士顿房价这一数据集，介绍了线性回归模型的基本概念，以及如何使用PaddlePaddle实现训练和测试的过程。很多的模型和技巧都是从简单的线性回归模型演化而来，因此弄清楚线性模型的原理和局限非常重要。


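The new `EndPass` branch writes a checkpoint only when `event.pass_id % 10 == 0`. The sketch below computes which files that produces. It assumes `pass_id` counts from 0 and uses `num_passes=30` as in `01.fit_a_line/train.py`. Under those assumptions the last checkpoint written is `params_pass_20.tar`, which is the file named in the commented-out loading code.

```python
def checkpoint_passes(num_passes, every=10):
    """Pass ids for which `if event.pass_id % every == 0` fires,
    assuming pass_id takes the values 0, 1, ..., num_passes - 1."""
    return [p for p in range(num_passes) if p % every == 0]


def checkpoint_name(pass_id):
    # Same format string as the event handler in the diff.
    return 'params_pass_%d.tar' % pass_id
```

For example, `[checkpoint_name(p) for p in checkpoint_passes(30)]` lists the three archives a 30-pass run leaves on disk.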
--- a/01.fit_a_line/README.md
+++ b/01.fit_a_line/README.md
@@ -205,6 +205,11 @@ def event_handler_plot(event):
            plot_cost.plot()

        step += 1
+
+    if isinstance(event, paddle.event.EndPass):
+        if event.pass_id % 10 == 0:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+                parameters.to_tar(f)
 ```

 ### Start Training
@@ -222,6 +227,37 @@ trainer.train(

 ![png](./image/train_and_test.png)

+### Apply model
+
+#### 1. generate testing data
+
+```python
+test_data_creator = paddle.dataset.uci_housing.test()
+test_data = []
+test_label = []
+
+for item in test_data_creator():
+    test_data.append((item[0],))
+    test_label.append(item[1])
+    if len(test_data) == 5:
+        break
+```
+
+#### 2. inference
+
+```python
+# load parameters from tar file.
+# users can remove the comments and change the model name
+# with open('params_pass_20.tar', 'r') as f:
+#     parameters = paddle.parameters.Parameters.from_tar(f)
+
+probs = paddle.infer(
+    output_layer=y_predict, parameters=parameters, input=test_data)
+
+for i in xrange(len(probs)):
+    print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
+```
+
 ## Summary
 This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.


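The "generate testing data" step reads items from the test reader until five have been collected. The version below expresses the same loop with `itertools.islice`. `fake_uci_reader` is a hypothetical stand-in for `paddle.dataset.uci_housing.test()` that yields `(13 features, [price])` pairs. Each input is wrapped in a one-element tuple, mirroring `(item[0],)` in the tutorial.

```python
import itertools


def take_samples(reader_creator, n=5):
    """Split the first `n` (features, label) items of a reader into
    inference inputs and labels, like the loop with `break` above."""
    test_data, test_label = [], []
    for features, label in itertools.islice(reader_creator(), n):
        # One tuple per sample, matching `(item[0],)` in the tutorial.
        test_data.append((features,))
        test_label.append(label)
    return test_data, test_label


def fake_uci_reader():
    # Hypothetical reader: 8 samples of 13 features plus a 1-element label.
    for i in range(8):
        yield [float(i)] * 13, [float(i) * 10]
```

Like the original loop, `islice` stops after the fifth item, and it returns fewer items if the reader runs out first.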
--- a/01.fit_a_line/image/ranges.png
+++ b/01.fit_a_line/image/ranges.png
--- a/01.fit_a_line/index.cn.html
+++ b/01.fit_a_line/index.cn.html
@@ -43,7 +43,7 @@
 # 线性回归
 让我们从经典的线性回归（Linear Regression \[[1](#参考文献)\]）模型开始这份教程。在这一章里，你将使用真实的数据集建立起一个房价预测模型，并且了解到机器学习中的若干重要概念。

-本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/137.html)。

 ## 背景介绍
 给定一个大小为$n$的数据集  ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$，其中$x_{i1}, \ldots, x_{id}$是第$i$个样本$d$个属性上的取值，$y_i$是该样本待预测的目标。线性回归模型假设目标$y_i$可以被属性间的线性组合描述，即
@@ -242,6 +242,11 @@ def event_handler_plot(event):
            cost_ploter.plot()

        step += 1
+
+    if isinstance(event, paddle.event.EndPass):
+        if event.pass_id % 10 == 0:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+                parameters.to_tar(f)
 ```

 ### 开始训练
@@ -259,6 +264,37 @@ trainer.train(

 ![png](./image/train_and_test.png)

+### 应用模型
+
+#### 1. 生成测试数据
+
+```python
+test_data_creator = paddle.dataset.uci_housing.test()
+test_data = []
+test_label = []
+
+for item in test_data_creator():
+    test_data.append((item[0],))
+    test_label.append(item[1])
+    if len(test_data) == 5:
+        break
+```
+
+#### 2. 推测 inference
+
+```python
+# load parameters from tar file.
+# users can remove the comments and change the model name
+# with open('params_pass_20.tar', 'r') as f:
+#     parameters = paddle.parameters.Parameters.from_tar(f)
+
+probs = paddle.infer(
+    output_layer=y_predict, parameters=parameters, input=test_data)
+
+for i in xrange(len(probs)):
+    print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
+```
+
 ## 总结
 在这章里，我们借助波士顿房价这一数据集，介绍了线性回归模型的基本概念，以及如何使用PaddlePaddle实现训练和测试的过程。很多的模型和技巧都是从简单的线性回归模型演化而来，因此弄清楚线性模型的原理和局限非常重要。


--- a/01.fit_a_line/index.html
+++ b/01.fit_a_line/index.html
@@ -247,6 +247,11 @@ def event_handler_plot(event):
            plot_cost.plot()

        step += 1
+
+    if isinstance(event, paddle.event.EndPass):
+        if event.pass_id % 10 == 0:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+                parameters.to_tar(f)
 ```

 ### Start Training
@@ -264,6 +269,37 @@ trainer.train(

 ![png](./image/train_and_test.png)

+### Apply model
+
+#### 1. generate testing data
+
+```python
+test_data_creator = paddle.dataset.uci_housing.test()
+test_data = []
+test_label = []
+
+for item in test_data_creator():
+    test_data.append((item[0],))
+    test_label.append(item[1])
+    if len(test_data) == 5:
+        break
+```
+
+#### 2. inference
+
+```python
+# load parameters from tar file.
+# users can remove the comments and change the model name
+# with open('params_pass_20.tar', 'r') as f:
+#     parameters = paddle.parameters.Parameters.from_tar(f)
+
+probs = paddle.infer(
+    output_layer=y_predict, parameters=parameters, input=test_data)
+
+for i in xrange(len(probs)):
+    print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
+```
+
 ## Summary
 This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.


--- a/01.fit_a_line/train.py
+++ b/01.fit_a_line/train.py
@@ -31,6 +31,9 @@ def main():
                    event.pass_id, event.batch_id, event.cost)

        if isinstance(event, paddle.event.EndPass):
+            if event.pass_id % 10 == 0:
+                with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
+                    parameters.to_tar(f)
            result = trainer.test(
                reader=paddle.batch(uci_housing.test(), batch_size=2),
                feeding=feeding)
@@ -45,6 +48,28 @@ def main():
        event_handler=event_handler,
        num_passes=30)

+    # inference
+    test_data_creator = paddle.dataset.uci_housing.test()
+    test_data = []
+    test_label = []
+
+    for item in test_data_creator():
+        test_data.append((item[0], ))
+        test_label.append(item[1])
+        if len(test_data) == 5:
+            break
+
+    # load parameters from tar file.
+    # users can remove the comments and change the model name
+    # with open('params_pass_20.tar', 'r') as f:
+    #     parameters = paddle.parameters.Parameters.from_tar(f)
+
+    probs = paddle.infer(
+        output_layer=y_predict, parameters=parameters, input=test_data)
+
+    for i in xrange(len(probs)):
+        print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
+

 if __name__ == '__main__':
    main()
--- a/02.recognize_digits/README.cn.md
+++ b/02.recognize_digits/README.cn.md
 # 识别数字

-本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/167.html)。

 ## 背景介绍
 当我们学习编程的时候，编写的第一个程序一般是实现打印"Hello World"。而机器学习（或深度学习）的入门教程，一般都是 [MNIST](http://yann.lecun.com/exdb/mnist/) 数据库上的手写识别问题。原因是手写识别属于典型的图像分类问题，比较简单，同时MNIST数据集也很完备。MNIST数据集作为一个简单的计算机视觉数据集，包含一系列如图1所示的手写数字图片和对应的标签。图片是28x28的像素矩阵，标签则对应着0~9的10个数字。每张图片都经过了大小归一化和居中处理。
@@ -132,7 +132,6 @@ PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mni
 首先，加载PaddlePaddle的V2 api包。

 ```python
-import gzip
 import paddle.v2 as paddle
 ```
 其次，定义三个不同的分类器：
@@ -256,7 +255,7 @@ def event_handler_plot(event):
        step += 1
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(
@@ -275,7 +274,7 @@ def event_handler(event):
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(

--- a/02.recognize_digits/README.md
+++ b/02.recognize_digits/README.md
@@ -131,7 +131,6 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
 A PaddlePaddle program starts from importing the API package:

 ```python
-import gzip
 import paddle.v2 as paddle
 ```

@@ -251,7 +250,7 @@ def event_handler_plot(event):
        step += 1
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(
@@ -271,7 +270,7 @@ def event_handler(event):
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(

--- a/02.recognize_digits/index.cn.html
+++ b/02.recognize_digits/index.cn.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 识别数字

-本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/167.html)。

 ## 背景介绍
 当我们学习编程的时候，编写的第一个程序一般是实现打印"Hello World"。而机器学习（或深度学习）的入门教程，一般都是 [MNIST](http://yann.lecun.com/exdb/mnist/) 数据库上的手写识别问题。原因是手写识别属于典型的图像分类问题，比较简单，同时MNIST数据集也很完备。MNIST数据集作为一个简单的计算机视觉数据集，包含一系列如图1所示的手写数字图片和对应的标签。图片是28x28的像素矩阵，标签则对应着0~9的10个数字。每张图片都经过了大小归一化和居中处理。
@@ -174,7 +174,6 @@ PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mni
 首先，加载PaddlePaddle的V2 api包。

 ```python
-import gzip
 import paddle.v2 as paddle
 ```
 其次，定义三个不同的分类器：
@@ -298,7 +297,7 @@ def event_handler_plot(event):
        step += 1
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(
@@ -317,7 +316,7 @@ def event_handler(event):
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(

--- a/02.recognize_digits/index.html
+++ b/02.recognize_digits/index.html
@@ -173,7 +173,6 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
 A PaddlePaddle program starts from importing the API package:

 ```python
-import gzip
 import paddle.v2 as paddle
 ```

@@ -293,7 +292,7 @@ def event_handler_plot(event):
        step += 1
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(
@@ -313,7 +312,7 @@ def event_handler(event):
                event.pass_id, event.batch_id, event.cost, event.metrics)
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=paddle.batch(

--- a/02.recognize_digits/train.py
+++ b/02.recognize_digits/train.py
-import gzip
 import os
 from PIL import Image
 import numpy as np
@@ -85,7 +84,7 @@ def main():
                    event.pass_id, event.batch_id, event.cost, event.metrics)
        if isinstance(event, paddle.event.EndPass):
            # save parameters
-            with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(reader=paddle.batch(

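This commit stops wrapping checkpoints in `gzip.open`. As a result, files saved by earlier versions (`params_pass_%d.tar.gz`) will not load through the new plain `open('params_pass_%d.tar', 'r')` path. The helper below is a hypothetical migration sketch and is not part of the repository. It assumes each old file is a gzip-compressed tar stream, as the old `gzip.open(..., 'w')` plus `parameters.to_tar(f)` code implies, and decompresses it into a plain tar.

```python
import gzip
import shutil


def gunzip_checkpoint(src, dst):
    """Hypothetical helper: turn an old gzip-wrapped checkpoint into an
    uncompressed tar that the new loading code can open directly."""
    with gzip.open(src, 'rb') as fin:
        with open(dst, 'wb') as fout:
            # Stream the decompressed bytes without loading them all at once.
            shutil.copyfileobj(fin, fout)
```

For example, `gunzip_checkpoint('params_pass_50.tar.gz', 'params_pass_50.tar')` makes an old checkpoint usable with the updated snippets.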
--- a/03.image_classification/README.cn.md
+++ b/03.image_classification/README.cn.md
 # 图像分类

-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/168.html)。

 ## 背景介绍

@@ -156,7 +156,6 @@ Paddle API提供了自动加载cifar数据集模块 `paddle.dataset.cifar`。

 ```python
 import sys
-import gzip
 import paddle.v2 as paddle
 from vgg import vgg_bn_drop
 from resnet import resnet_cifar10
@@ -431,7 +430,7 @@ def event_handler(event):
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(
@@ -496,9 +495,9 @@ def load_image(file):

 test_data = []
 cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))

-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
 #    parameters = paddle.parameters.Parameters.from_tar(f)

 probs = paddle.infer(

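The `test_data.append(...)` fix above adds a missing closing parenthesis. The old line left the `append(` call unclosed, which is a `SyntaxError`. The trailing comma inside it is significant: `(x)` is just `x`, while `(x,)` is a one-element tuple, the per-sample shape these tutorials pass to `paddle.infer`. The sketch below illustrates the distinction. `as_infer_input` is a hypothetical helper, not a PaddlePaddle API.

```python
def as_infer_input(samples):
    """Hypothetical helper: wrap each sample in a one-element tuple,
    the shape built by `(load_image(...),)` and `(item[0],)`."""
    return [(s,) for s in samples]


not_a_tuple = ("dog.png")   # parentheses alone just group: this is a str
a_tuple = ("dog.png",)      # the trailing comma makes a 1-tuple
```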
--- a/03.image_classification/README.md
+++ b/03.image_classification/README.md
@@ -169,7 +169,6 @@ We must import and initialize PaddlePaddle (enable/disable GPU, set the number o

 ```python
 import sys
-import gzip
 import paddle.v2 as paddle
 from vgg import vgg_bn_drop
 from resnet import resnet_cifar10
@@ -438,7 +437,7 @@ def event_handler(event):
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(
@@ -505,10 +504,10 @@ def load_image(file):
    return im
 test_data = []
 cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))

 # users can remove the comments and change the model name
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
 #    parameters = paddle.parameters.Parameters.from_tar(f)

 probs = paddle.infer(

--- a/03.image_classification/index.cn.html
+++ b/03.image_classification/index.cn.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 图像分类

-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/168.html)。

 ## 背景介绍

@@ -198,7 +198,6 @@ Paddle API提供了自动加载cifar数据集模块 `paddle.dataset.cifar`。

 ```python
 import sys
-import gzip
 import paddle.v2 as paddle
 from vgg import vgg_bn_drop
 from resnet import resnet_cifar10
@@ -473,7 +472,7 @@ def event_handler(event):
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(
@@ -538,9 +537,9 @@ def load_image(file):

 test_data = []
 cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))

-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
 #    parameters = paddle.parameters.Parameters.from_tar(f)

 probs = paddle.infer(

--- a/03.image_classification/index.html
+++ b/03.image_classification/index.html
@@ -211,7 +211,6 @@ We must import and initialize PaddlePaddle (enable/disable GPU, set the number o

 ```python
 import sys
-import gzip
 import paddle.v2 as paddle
 from vgg import vgg_bn_drop
 from resnet import resnet_cifar10
@@ -480,7 +479,7 @@ def event_handler(event):
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(
@@ -547,10 +546,10 @@ def load_image(file):
    return im
 test_data = []
 cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))

 # users can remove the comments and change the model name
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
 #    parameters = paddle.parameters.Parameters.from_tar(f)

 probs = paddle.infer(

--- a/03.image_classification/train.py
+++ b/03.image_classification/train.py
@@ -13,7 +13,6 @@
 # limitations under the License

 import sys
-import gzip

 import paddle.v2 as paddle

@@ -67,7 +66,7 @@ def main():
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
-            with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(
@@ -116,7 +115,7 @@ def main():
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
-    # with gzip.open('params_pass_50.tar.gz', 'r') as f:
+    # with open('params_pass_50.tar', 'r') as f:
    #    parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(

--- a/04.word2vec/README.cn.md
+++ b/04.word2vec/README.cn.md

 # 词向量

-本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/175.html)。

 ## 背景介绍

@@ -302,8 +302,6 @@ trainer = paddle.trainer.SGD(cost, parameters, adagrad)
 `paddle.batch`的输入是一个reader，输出是一个batched reader —— 在PaddlePaddle里，一个reader每次yield一条训练数据，而一个batched reader每次yield一个minbatch。

 ```python
-import gzip
-
 def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
@@ -315,7 +313,7 @@ def event_handler(event):
                    paddle.batch(
                        paddle.dataset.imikolov.test(word_dict, N), 32))
        print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-        with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+        with open("model_%d.tar"%event.pass_id, 'w') as f:
            parameters.to_tar(f)

 trainer.train(

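The paragraph above distinguishes a reader, which yields one instance at a time, from the batched reader returned by `paddle.batch`, which yields minibatches. The following is a minimal illustrative re-implementation of that idea and is not PaddlePaddle's source. In particular, whether the real `paddle.batch` keeps a final partial batch is an assumption here: this sketch keeps it.

```python
def batch(reader, batch_size):
    """Illustrative sketch: turn a reader creator (yields single instances)
    into a batched reader creator (yields lists of up to batch_size)."""
    def batch_reader():
        buf = []
        for instance in reader():
            buf.append(instance)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:
            # Emit the final, possibly smaller, minibatch.
            yield buf
    return batch_reader


def toy_reader():
    # Hypothetical reader creator body: yields five single instances.
    for i in range(5):
        yield i
```

Usage: `list(batch(toy_reader, 2)())` produces `[[0, 1], [2, 3], [4]]`.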
--- a/04.word2vec/README.md
+++ b/04.word2vec/README.md
@@ -313,8 +313,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
 `paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.

 ```python
-import gzip
-
 def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
@@ -326,7 +324,7 @@ def event_handler(event):
                    paddle.batch(
                        paddle.dataset.imikolov.test(word_dict, N), 32))
        print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-        with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+        with open("model_%d.tar"%event.pass_id, 'w') as f:
            parameters.to_tar(f)

 trainer.train(

--- a/04.word2vec/index.cn.html
+++ b/04.word2vec/index.cn.html
@@ -43,7 +43,7 @@

 # 词向量

-本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/175.html)。

 ## 背景介绍

@@ -344,8 +344,6 @@ trainer = paddle.trainer.SGD(cost, parameters, adagrad)
 `paddle.batch`的输入是一个reader，输出是一个batched reader —— 在PaddlePaddle里，一个reader每次yield一条训练数据，而一个batched reader每次yield一个minbatch。

 ```python
-import gzip
-
 def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
@@ -357,7 +355,7 @@ def event_handler(event):
                    paddle.batch(
                        paddle.dataset.imikolov.test(word_dict, N), 32))
        print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-        with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+        with open("model_%d.tar"%event.pass_id, 'w') as f:
            parameters.to_tar(f)

 trainer.train(

--- a/04.word2vec/index.html
+++ b/04.word2vec/index.html
@@ -355,8 +355,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
 `paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.

 ```python
-import gzip
-
 def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
@@ -368,7 +366,7 @@ def event_handler(event):
                    paddle.batch(
                        paddle.dataset.imikolov.test(word_dict, N), 32))
        print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-        with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+        with open("model_%d.tar"%event.pass_id, 'w') as f:
            parameters.to_tar(f)

 trainer.train(

--- a/05.recommender_system/README.cn.md
+++ b/05.recommender_system/README.cn.md
 # 个性化推荐

-本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/176.html)。

 ## 背景介绍

@@ -45,7 +45,7 @@ YouTube是世界上最大的视频上传、分享和发现网站，YouTube推荐

 候选生成网络将推荐问题建模为一个类别数极大的多类分类问题：对于一个Youtube用户，使用其观看历史（视频ID）、搜索词记录（search tokens）、人口学信息（如地理位置、用户登录设备）、二值特征（如性别，是否登录）和连续特征（如用户年龄）等，对视频库中所有视频进行多分类，得到每一类别的分类结果（即每一个视频的推荐概率），最终输出概率较高的几百个视频。

-首先，将观看历史及搜索词记录这类历史信息，映射为向量后取平均值得到定长表示；同时，输入人口学特征以优化新用户的推荐效果，并将二值特征和连续特征归一化处理到[0, 1]范围。接下来，将所有特征表示拼接为一个向量，并输入给非线形多层感知器（MLP，详见[识别数字](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.md)教程）处理。最后，训练时将MLP的输出给softmax做分类，预测时计算用户的综合特征（MLP的输出）与所有视频的相似度，取得分最高的$k$个作为候选生成网络的筛选结果。图2显示了候选生成网络结构。
+首先，将观看历史及搜索词记录这类历史信息，映射为向量后取平均值得到定长表示；同时，输入人口学特征以优化新用户的推荐效果，并将二值特征和连续特征归一化处理到[0, 1]范围。接下来，将所有特征表示拼接为一个向量，并输入给非线性多层感知器（MLP，详见[识别数字](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md)教程）处理。最后，训练时将MLP的输出给softmax做分类，预测时计算用户的综合特征（MLP的输出）与所有视频的相似度，取得分最高的$k$个作为候选生成网络的筛选结果。图2显示了候选生成网络结构。

 <p align="center">
 <img src="image/Deep_candidate_generation_model_architecture.png" width="70%" ><br/>

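The candidate-generation paragraph (in Chinese) describes two steps. First, the vectors for a user's history (watched-video IDs, search tokens) are averaged into a fixed-length representation. Second, at prediction time, the user's combined feature (the MLP output) is compared against every video, and the top $k$ are kept. The sketch below covers only those two steps. It takes the user vector as given rather than computing it with an MLP, and it uses a dot product as the similarity score. Both are simplifying assumptions, and every name is hypothetical.

```python
import heapq


def mean_embedding(vectors):
    """Average a variable-length list of equal-sized vectors into one
    fixed-length vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / float(len(vectors)) for i in range(dim)]


def top_k_videos(user_vec, video_vecs, k):
    """Score every video by dot product with the user vector (an assumed
    similarity) and return the ids of the k highest-scoring videos."""
    scores = {vid: sum(u * x for u, x in zip(user_vec, vec))
              for vid, vec in video_vecs.items()}
    return heapq.nlargest(k, scores, key=scores.get)
```

Because `mean_embedding` returns a vector of size `dim` however long the history is, users with different amounts of history can share one downstream network.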
--- a/05.recommender_system/index.cn.html
+++ b/05.recommender_system/index.cn.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 个性化推荐

-本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/176.html)。

 ## 背景介绍

@@ -87,7 +87,7 @@ YouTube是世界上最大的视频上传、分享和发现网站，YouTube推荐

 候选生成网络将推荐问题建模为一个类别数极大的多类分类问题：对于一个Youtube用户，使用其观看历史（视频ID）、搜索词记录（search tokens）、人口学信息（如地理位置、用户登录设备）、二值特征（如性别，是否登录）和连续特征（如用户年龄）等，对视频库中所有视频进行多分类，得到每一类别的分类结果（即每一个视频的推荐概率），最终输出概率较高的几百个视频。

-首先，将观看历史及搜索词记录这类历史信息，映射为向量后取平均值得到定长表示；同时，输入人口学特征以优化新用户的推荐效果，并将二值特征和连续特征归一化处理到[0, 1]范围。接下来，将所有特征表示拼接为一个向量，并输入给非线形多层感知器（MLP，详见[识别数字](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.md)教程）处理。最后，训练时将MLP的输出给softmax做分类，预测时计算用户的综合特征（MLP的输出）与所有视频的相似度，取得分最高的$k$个作为候选生成网络的筛选结果。图2显示了候选生成网络结构。
+首先，将观看历史及搜索词记录这类历史信息，映射为向量后取平均值得到定长表示；同时，输入人口学特征以优化新用户的推荐效果，并将二值特征和连续特征归一化处理到[0, 1]范围。接下来，将所有特征表示拼接为一个向量，并输入给非线性多层感知器（MLP，详见[识别数字](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md)教程）处理。最后，训练时将MLP的输出给softmax做分类，预测时计算用户的综合特征（MLP的输出）与所有视频的相似度，取得分最高的$k$个作为候选生成网络的筛选结果。图2显示了候选生成网络结构。

 <p align="center">
 <img src="image/Deep_candidate_generation_model_architecture.png" width="70%" ><br/>

--- a/06.understand_sentiment/README.cn.md
+++ b/06.understand_sentiment/README.cn.md
 # 情感分析

-本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/177.html)。

 ## 背景介绍

@@ -26,7 +26,7 @@

 ### 文本卷积神经网络简介（CNN）

-我们在[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节介绍过应用于文本数据的卷及神经网络模型的计算过程，这里进行一个简单的回顾。
+我们在[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节介绍过应用于文本数据的卷积神经网络模型的计算过程，这里进行一个简单的回顾。

 对卷积神经网络来说，首先使用卷积处理输入的词向量序列，产生一个特征图（feature map），对特征图采用时间维度上的最大池化（max pooling over time）操作得到此卷积核对应的整句话的特征，最后，将所有卷积核得到的特征拼接起来即为文本的定长向量表示，对于文本分类问题，将其连接至softmax即构建出完整的模型。在实际应用中，我们会使用多个卷积核来处理句子，窗口大小相同的卷积核堆叠起来形成一个矩阵，这样可以更高效的完成运算。另外，我们也可使用窗口大小不同的卷积核来处理句子，[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核，不同颜色表示不同大小的卷积核操作。


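The CNN recap (in Chinese) describes the pipeline: convolve over the sequence of word vectors to get a feature map, max-pool over time to get one feature per kernel, then concatenate the features from all kernels into a fixed-length sentence vector. The pure-Python sketch below follows those steps. It omits bias and nonlinearity, and it assumes the sentence has at least `window` words. The function names are illustrative.

```python
def conv_max_pool(word_vecs, kernel, window):
    """Apply one kernel (a flat list of window * d weights) to every run of
    `window` consecutive word vectors, then max-pool over time."""
    feature_map = []
    for t in range(len(word_vecs) - window + 1):
        # Concatenate the window's word vectors, then dot with the kernel.
        patch = [x for vec in word_vecs[t:t + window] for x in vec]
        feature_map.append(sum(w * x for w, x in zip(kernel, patch)))
    # Max over time: one feature per kernel, whatever the sentence length.
    return max(feature_map)


def sentence_vector(word_vecs, kernels):
    """Concatenate the pooled features of several (kernel, window) pairs,
    e.g. kernels with different window sizes."""
    return [conv_max_pool(word_vecs, k, w) for k, w in kernels]
```

The output has one entry per kernel, so sentences of different lengths map to vectors of the same size, ready for a softmax classifier.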
--- a/06.understand_sentiment/index.cn.html
+++ b/06.understand_sentiment/index.cn.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 情感分析

-本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/177.html)。

 ## 背景介绍

@@ -68,7 +68,7 @@

 ### 文本卷积神经网络简介（CNN）

-我们在[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节介绍过应用于文本数据的卷及神经网络模型的计算过程，这里进行一个简单的回顾。
+我们在[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节介绍过应用于文本数据的卷积神经网络模型的计算过程，这里进行一个简单的回顾。

 对卷积神经网络来说，首先使用卷积处理输入的词向量序列，产生一个特征图（feature map），对特征图采用时间维度上的最大池化（max pooling over time）操作得到此卷积核对应的整句话的特征，最后，将所有卷积核得到的特征拼接起来即为文本的定长向量表示，对于文本分类问题，将其连接至softmax即构建出完整的模型。在实际应用中，我们会使用多个卷积核来处理句子，窗口大小相同的卷积核堆叠起来形成一个矩阵，这样可以更高效的完成运算。另外，我们也可使用窗口大小不同的卷积核来处理句子，[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核，不同颜色表示不同大小的卷积核操作。


--- a/07.label_semantic_roles/README.cn.md
+++ b/07.label_semantic_roles/README.cn.md
 # 语义角色标注

-本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/178.html)。

 ## 背景介绍

@@ -68,7 +68,7 @@ $$\mbox{[小明]}_{\mbox{Agent}}\mbox{[昨天]}_{\mbox{Time}}\mbox{[晚上]}_\mb
 图4. 基于LSTM的双向循环神经网络结构示意图
 </p>

-需要说明的是，这种双向RNN结构和Bengio等人在机器翻译任务中使用的双向RNN结构\[[3](#参考文献), [4](#参考文献)\] 并不相同，我们会在后续[机器翻译](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.md)任务中，介绍另一种双向循环神经网络。
+需要说明的是，这种双向RNN结构和Bengio等人在机器翻译任务中使用的双向RNN结构\[[3](#参考文献), [4](#参考文献)\] 并不相同，我们会在后续[机器翻译](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.cn.md)任务中，介绍另一种双向循环神经网络。

 ### 条件随机场 (Conditional Random Field)

@@ -182,14 +182,13 @@ conll05st-release/
 | predicate_dict | 谓词的词典，共计3162个词 |
 | emb | 一个训练好的词表，32维 |

-我们在英文维基百科上训练语言模型得到了一份词向量用来初始化SRL模型。在SRL模型训练过程中，词向量不再被更新。关于语言模型和词向量可以参考[词向量](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.md) 这篇教程。我们训练语言模型的语料共有995,000,000个token，词典大小控制为4900,000词。CoNLL 2005训练语料中有5%的词不在这4900,000个词中，我们将它们全部看作未登录词，用`<unk>`表示。
+我们在英文维基百科上训练语言模型得到了一份词向量用来初始化SRL模型。在SRL模型训练过程中，词向量不再被更新。关于语言模型和词向量可以参考[词向量](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.cn.md) 这篇教程。我们训练语言模型的语料共有995,000,000个token，词典大小控制为4,900,000词。CoNLL 2005训练语料中有5%的词不在这4,900,000个词中，我们将它们全部看作未登录词，用`<unk>`表示。

 获取词典，打印词典大小：

 ```python
 import math
 import numpy as np
-import gzip
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
 import paddle.v2.evaluator as evaluator
@@ -448,7 +447,7 @@ def event_handler(event):

    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=reader, feeding=feeding)

--- a/07.label_semantic_roles/README.md
+++ b/07.label_semantic_roles/README.md
@@ -87,7 +87,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
 Fig 4. Bidirectional LSTMs
 </p>

-Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md)
+Note that this bidirectional RNN differs from the one proposed by Bengio et al. for machine translation \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNN in the following [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) tutorial.

 ### Conditional Random Field (CRF)

@@ -118,7 +118,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
 $$\DeclareMathOperator*{\argmax}{arg\,max} L(\lambda, D) = - \text{log}\left(\prod_{m=1}^{N}p(Y_m|X_m, W)\right) + C \frac{1}{2}\lVert W\rVert^{2}$$


-This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
+This objective function can be optimized with back-propagation in an end-to-end manner. For decoding, given an input sequence $X$, we search for the sequence $\bar{Y}$ that maximizes the conditional probability $\bar{P}(Y|X)$, using a decoding method such as *Viterbi* or the [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm).

 ### Deep Bidirectional LSTM (DB-LSTM) SRL model

@@ -211,7 +211,6 @@ Here we fetch the dictionary, and print its size:
 ```python
 import math
 import numpy as np
-import gzip
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
 import paddle.v2.evaluator as evaluator
@@ -466,7 +465,7 @@ def event_handler(event):

    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=reader, feeding=feeding)
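The CRF section of this README mentions Viterbi decoding. Below is a minimal NumPy sketch of Viterbi for a linear-chain CRF. It assumes that per-position emission scores `emissions[t, k]` and tag-transition scores `transitions[i, j]` are already computed as log-space scores, with no separate start or end transitions. This is a textbook illustration, not the implementation behind PaddlePaddle's CRF decoding layer.

```python
import numpy as np


def viterbi(emissions, transitions):
    """Highest-scoring tag sequence of a linear-chain CRF.

    emissions:   (T, K) score of tag k at position t.
    transitions: (K, K) score of moving from tag i to tag j.
    Returns (best_path, best_score).
    """
    T = emissions.shape[0]
    score = emissions[0].copy()                 # best score of a path ending in each tag
    backptr = np.zeros(emissions.shape, dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then i -> j, plus emission of j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):               # follow back-pointers to position 0
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(score.max())


em = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
tr = np.array([[0.0, -1.0], [-1.0, 0.0]])      # switching tags costs 1
print(viterbi(em, tr))  # ([0, 1, 1], 5.0)
```

Dynamic programming reduces decoding from $K^T$ candidate sequences to $O(TK^2)$ work.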

--- a/07.label_semantic_roles/index.cn.html
+++ b/07.label_semantic_roles/index.cn.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 语义角色标注

-本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/178.html)。

 ## 背景介绍

@@ -110,7 +110,7 @@ $$\mbox{[小明]}_{\mbox{Agent}}\mbox{[昨天]}_{\mbox{Time}}\mbox{[晚上]}_\mb
 图4. 基于LSTM的双向循环神经网络结构示意图
 </p>

-需要说明的是，这种双向RNN结构和Bengio等人在机器翻译任务中使用的双向RNN结构\[[3](#参考文献), [4](#参考文献)\] 并不相同，我们会在后续[机器翻译](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.md)任务中，介绍另一种双向循环神经网络。
+需要说明的是，这种双向RNN结构和Bengio等人在机器翻译任务中使用的双向RNN结构\[[3](#参考文献), [4](#参考文献)\] 并不相同，我们会在后续[机器翻译](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.cn.md)任务中，介绍另一种双向循环神经网络。

 ### 条件随机场 (Conditional Random Field)

@@ -224,14 +224,13 @@ conll05st-release/
 | predicate_dict | 谓词的词典，共计3162个词 |
 | emb | 一个训练好的词表，32维 |

-我们在英文维基百科上训练语言模型得到了一份词向量用来初始化SRL模型。在SRL模型训练过程中，词向量不再被更新。关于语言模型和词向量可以参考[词向量](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.md) 这篇教程。我们训练语言模型的语料共有995,000,000个token，词典大小控制为4900,000词。CoNLL 2005训练语料中有5%的词不在这4900,000个词中，我们将它们全部看作未登录词，用`<unk>`表示。
+我们在英文维基百科上训练语言模型得到了一份词向量用来初始化SRL模型。在SRL模型训练过程中，词向量不再被更新。关于语言模型和词向量可以参考[词向量](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.cn.md) 这篇教程。我们训练语言模型的语料共有995,000,000个token，词典大小控制为4,900,000词。CoNLL 2005训练语料中有5%的词不在这4,900,000个词中，我们将它们全部看作未登录词，用`<unk>`表示。

 获取词典，打印词典大小：

 ```python
 import math
 import numpy as np
-import gzip
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
 import paddle.v2.evaluator as evaluator
@@ -490,7 +489,7 @@ def event_handler(event):

    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=reader, feeding=feeding)

--- a/07.label_semantic_roles/index.html
+++ b/07.label_semantic_roles/index.html
@@ -129,7 +129,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
 Fig 4. Bidirectional LSTMs
 </p>

-Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md)
+Note that this bidirectional RNN differs from the one proposed by Bengio et al. for machine translation \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNN in the following [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) tutorial.

 ### Conditional Random Field (CRF)

@@ -160,7 +160,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
 $$\DeclareMathOperator*{\argmax}{arg\,max} L(\lambda, D) = - \text{log}\left(\prod_{m=1}^{N}p(Y_m|X_m, W)\right) + C \frac{1}{2}\lVert W\rVert^{2}$$


-This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
+This objective function can be optimized with back-propagation in an end-to-end manner. For decoding, given an input sequence $X$, we search for the sequence $\bar{Y}$ that maximizes the conditional probability $\bar{P}(Y|X)$, using a decoding method such as *Viterbi* or the [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm).

 ### Deep Bidirectional LSTM (DB-LSTM) SRL model

@@ -253,7 +253,6 @@ Here we fetch the dictionary, and print its size:
 ```python
 import math
 import numpy as np
-import gzip
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
 import paddle.v2.evaluator as evaluator
@@ -508,7 +507,7 @@ def event_handler(event):

    if isinstance(event, paddle.event.EndPass):
        # save parameters
-        with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+        with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
            parameters.to_tar(f)

        result = trainer.test(reader=reader, feeding=feeding)
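The objective $L(\lambda, D)$ in this page needs $\log p(Y_m|X_m, W)$. For a linear-chain CRF, that is the score of the gold tag sequence minus the log partition function over all tag sequences. Below is a NumPy sketch that computes it with the forward algorithm. It assumes the usual `emissions` (T, K) and `transitions` (K, K) score layout. For brevity, only the transition matrix stands in for $W$ in the L2 term. That is a simplification, not the tutorial's exact parametrization, and all function names are hypothetical.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)


def path_score(emissions, transitions, tags):
    """Unnormalized log score of one tag sequence."""
    s = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        s += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return s


def log_partition(emissions, transitions):
    """log Z(X) via the forward algorithm (sums over all K**T tag sequences)."""
    alpha = emissions[0].copy()   # alpha[k]: log-sum over prefixes ending in tag k
    for t in range(1, emissions.shape[0]):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha, axis=0)


def crf_loss(batch, transitions, C):
    """-sum_m log p(Y_m | X_m) + C/2 * ||transitions||^2 over (emissions, tags) pairs."""
    nll = sum(log_partition(e, transitions) - path_score(e, transitions, y)
              for e, y in batch)
    return float(nll + 0.5 * C * np.sum(transitions ** 2))


rng = np.random.default_rng(0)
emissions = rng.normal(size=(3, 2))
transitions = rng.normal(size=(2, 2))
print(crf_loss([(emissions, [0, 1, 1])], transitions, C=0.1))
```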

--- a/07.label_semantic_roles/train.py
+++ b/07.label_semantic_roles/train.py
 import math
 import numpy as np
-import gzip
 import paddle.v2 as paddle
 import paddle.v2.dataset.conll05 as conll05
 import paddle.v2.evaluator as evaluator
@@ -183,7 +182,7 @@ def main():

        if isinstance(event, paddle.event.EndPass):
            # save parameters
-            with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

            result = trainer.test(reader=reader, feeding=feeding)

--- a/08.machine_translation/README.cn.md
+++ b/08.machine_translation/README.cn.md
 # 机器翻译

-本教程源代码目录在[book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)。
+本教程源代码目录在[book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书)，更多内容请参考本教程的[视频课堂](http://bit.baidu.com/course/detail/id/179.html)。

 ## 背景介绍

@@ -39,7 +39,7 @@

 ### GRU

-我们已经在[情感分析](https://github.com/PaddlePaddle/book/blob/develop/understand_sentiment/README.md)一章中介绍了循环神经网络（RNN）及长短时间记忆网络（LSTM）。相比于简单的RNN，LSTM增加了记忆单元（memory cell）、输入门（input gate）、遗忘门（forget gate）及输出门（output gate），这些门及记忆单元组合起来大大提升了RNN处理远距离依赖问题的能力。
+我们已经在[情感分析](https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/README.cn.md)一章中介绍了循环神经网络（RNN）及长短时间记忆网络（LSTM）。相比于简单的RNN，LSTM增加了记忆单元（memory cell）、输入门（input gate）、遗忘门（forget gate）及输出门（output gate），这些门及记忆单元组合起来大大提升了RNN处理远距离依赖问题的能力。

 GRU\[[2](#参考文献)\]是Cho等人在LSTM上提出的简化版本，也是RNN的一种扩展，如下图所示。GRU单元只有两个门：
 - 重置门（reset gate）：如果重置门关闭，会忽略掉历史信息，即历史不相干的信息不会影响未来的输出。
@@ -53,7 +53,7 @@ GRU\[[2](#参考文献)\]是Cho等人在LSTM上提出的简化版本，也是RNN

 ### 双向循环神经网络

-我们已经在[语义角色标注](https://github.com/PaddlePaddle/book/blob/develop/label_semantic_roles/README.md)一章中介绍了一种双向循环神经网络，这里介绍Bengio团队在论文\[[2](#参考文献),[4](#参考文献)\]中提出的另一种结构。该结构的目的是输入一个序列，得到其在每个时刻的特征表示，即输出的每个时刻都用定长向量表示到该时刻的上下文语义信息。
+我们已经在[语义角色标注](https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/README.cn.md)一章中介绍了一种双向循环神经网络，这里介绍Bengio团队在论文\[[2](#参考文献),[4](#参考文献)\]中提出的另一种结构。该结构的目的是输入一个序列，得到其在每个时刻的特征表示，即输出的每个时刻都用定长向量表示到该时刻的上下文语义信息。

 具体来说，该双向循环神经网络分别在时间维以顺序和逆序——即前向（forward）和后向（backward）——依次处理输入序列，并将每个时间步RNN的输出拼接成为最终的输出层。这样每个时间步的输出节点，都包含了输入序列中当前时刻完整的过去和未来的上下文信息。下图展示的是一个按时间步展开的双向循环神经网络。该网络包含一个前向和一个后向RNN，其中有六个权重矩阵：输入到前向隐层和后向隐层的权重矩阵（$W_1, W_3$），隐层到隐层自己的权重矩阵（$W_2,W_5$），前向隐层和后向隐层到输出层的权重矩阵（$W_4, W_6$）。注意，该网络的前向隐层和后向隐层之间没有连接。
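The paragraph above describes a bidirectional RNN whose forward and backward hidden layers are not connected, with outputs concatenated at each time step. The tutorial's encoder also boots the decoder from the backward RNN's state at the first position, using `first_seq(src_backward)` followed by a tanh `fc`. Below is a NumPy sketch of both ideas. To keep it short, it uses a plain tanh RNN cell in place of the tutorial's GRU. The names `rnn` and `encode` and all weights are hypothetical.

```python
import numpy as np


def rnn(xs, W, U, reverse=False):
    """Plain tanh RNN (stand-in for simple_gru). xs: (T, d).
    Returns (T, n) states; row t is the state right after reading position t."""
    T, n = xs.shape[0], U.shape[0]
    h = np.zeros(n)
    out = np.empty((T, n))
    for t in (range(T - 1, -1, -1) if reverse else range(T)):
        h = np.tanh(W @ xs[t] + U @ h)
        out[t] = h
    return out


def encode(xs, fwd, bwd, W_boot):
    """fwd/bwd: separate (W, U) pairs; the two directions share no connections."""
    h_fwd = rnn(xs, *fwd)
    h_bwd = rnn(xs, *bwd, reverse=True)
    encoded = np.concatenate([h_fwd, h_bwd], axis=1)   # per-step concatenation
    # Row 0 of the backward pass has read the whole sentence (like first_seq).
    boot = np.tanh(W_boot @ h_bwd[0])                   # decoder initial state
    return encoded, boot


rng = np.random.default_rng(0)
d, n, dec = 4, 3, 5
fwd = (rng.normal(size=(n, d)), rng.normal(size=(n, n)))
bwd = (rng.normal(size=(n, d)), rng.normal(size=(n, n)))
W_boot = rng.normal(size=(dec, n))
xs = rng.normal(size=(6, d))
encoded, boot = encode(xs, fwd, bwd, W_boot)
print(encoded.shape, boot.shape)  # (6, 6) (5,)
```

Each row of `encoded` combines a state that has seen the past (forward half) with one that has seen the future (backward half).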

@@ -185,16 +185,16 @@ is_generating = False
 ### 模型结构
 1. 首先，定义了一些全局变量。

-   ```python
-   dict_size = 30000 # 字典维度
-   source_dict_dim = dict_size # 源语言字典维度
-   target_dict_dim = dict_size # 目标语言字典维度
-   word_vector_dim = 512 # 词向量维度
-   encoder_size = 512 # 编码器中的GRU隐层大小
-   decoder_size = 512 # 解码器中的GRU隐层大小
-   beam_size = 3 # 柱宽度
-   max_length = 250 # 生成句子的最大长度
-  ```
+    ```python
+    dict_size = 30000 # 字典维度
+    source_dict_dim = dict_size # 源语言字典维度
+    target_dict_dim = dict_size # 目标语言字典维度
+    word_vector_dim = 512 # 词向量维度
+    encoder_size = 512 # 编码器中的GRU隐层大小
+    decoder_size = 512 # 解码器中的GRU隐层大小
+    beam_size = 3 # 柱宽度
+    max_length = 250 # 生成句子的最大长度
+    ```

 2. 其次，实现编码器框架。分为三步：

@@ -209,9 +209,7 @@ is_generating = False

   ```python
    src_embedding = paddle.layer.embedding(
-        input=src_word_id,
-        size=word_vector_dim,
-        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
+        input=src_word_id, size=word_vector_dim)
   ```
   - 用双向GRU编码源语言序列，拼接两个GRU的编码结果得到$\mathbf{h}$。

@@ -228,19 +226,22 @@ is_generating = False
   - 对源语言序列编码后的结果（见2的最后一步），过一个前馈神经网络（Feed Forward Neural Network），得到其映射。

   ```python
-    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
-        encoded_proj += paddle.layer.full_matrix_projection(
-            input=encoded_vector)
+   encoded_proj = paddle.layer.fc(
+       act=paddle.activation.Linear(),
+       size=decoder_size,
+       bias_attr=False,
+       input=encoded_vector)
   ```

   - 构造解码器RNN的初始状态。由于解码器需要预测时序目标序列，但在0时刻并没有初始值，所以我们希望对其进行初始化。这里采用的是将源语言序列逆序编码后的最后一个状态进行非线性映射，作为该初始值，即$c_0=h_T$。

   ```python
-    backward_first = paddle.layer.first_seq(input=src_backward)
-    with paddle.layer.mixed(
-            size=decoder_size, act=paddle.activation.Tanh()) as decoder_boot:
-        decoder_boot += paddle.layer.full_matrix_projection(
-            input=backward_first)
+   backward_first = paddle.layer.first_seq(input=src_backward)
+   decoder_boot = paddle.layer.fc(
+       size=decoder_size,
+       act=paddle.activation.Tanh(),
+       bias_attr=False,
+       input=backward_first)
   ```

   - 定义解码阶段每一个时间步的RNN行为，即根据当前时刻的源语言上下文向量$c_i$、解码器隐层状态$z_i$和目标语言中第$i$个词$u_i$，来预测第$i+1$个词的概率$p_{i+1}$。
@@ -251,8 +252,7 @@ is_generating = False
      - 最后，使用softmax归一化计算单词的概率，将out结果返回，即实现公式$p\left ( u_i|u_{&lt;i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$。

   ```python
-    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
-
+   def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        decoder_mem = paddle.layer.memory(
            name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -261,10 +261,13 @@ is_generating = False
            encoded_proj=enc_proj,
            decoder_state=decoder_mem)

-        with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
-            decoder_inputs += paddle.layer.full_matrix_projection(input=context)
-            decoder_inputs += paddle.layer.full_matrix_projection(
-                input=current_word)
+        decoder_inputs = paddle.layer.fc(
+            act=paddle.activation.Linear(),
+            size=decoder_size * 3,
+            bias_attr=False,
+            input=[context, current_word],
+            layer_attr=paddle.attr.ExtraLayerAttribute(
+                error_clipping_threshold=100.0))

        gru_step = paddle.layer.gru_step(
            name='gru_decoder',
@@ -272,20 +275,20 @@ is_generating = False
            output_mem=decoder_mem,
            size=decoder_size)

-        with paddle.layer.mixed(
-                size=target_dict_dim,
-                bias_attr=True,
-                act=paddle.activation.Softmax()) as out:
-            out += paddle.layer.full_matrix_projection(input=gru_step)
+        out = paddle.layer.mixed(
+            size=target_dict_dim,
+            bias_attr=True,
+            act=paddle.activation.Softmax(),
+            input=paddle.layer.full_matrix_projection(input=gru_step))
        return out
-    ```
+   ```

 4. 定义解码器框架名字，和`gru_decoder_with_attention`函数的前两个输入。注意：这两个输入使用`StaticInput`，具体说明可见[StaticInput文档](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入)。

    ```python
    decoder_group_name = "decoder_group"
-    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
-    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
+    group_input1 = paddle.layer.StaticInput(input=encoded_vector)
+    group_input2 = paddle.layer.StaticInput(input=encoded_proj)
    group_inputs = [group_input1, group_input2]
    ```
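The hunks above replace `paddle.layer.mixed` blocks that summed `full_matrix_projection`s with a single `paddle.layer.fc` over a list of inputs. Our understanding is that `fc` gives each input its own weight matrix and sums the products. This is an assumption, not verified against PaddlePaddle's source. If it holds, then with `act=paddle.activation.Linear()` and `bias_attr=False`, `fc` computes the same linear map as the old mixed layer. Below is a NumPy check of that algebra with hypothetical sizes and weights.

```python
import numpy as np

rng = np.random.default_rng(0)
out_size = 6                                   # hypothetical, e.g. decoder_size * 3
context = rng.normal(size=4)                   # hypothetical attention context
current_word = rng.normal(size=3)              # hypothetical word embedding
W_ctx = rng.normal(size=(out_size, 4))
W_word = rng.normal(size=(out_size, 3))


def mixed_of_projections(inputs, weights):
    """Old style: one full-matrix projection per input, summed (no bias, linear)."""
    out = np.zeros(weights[0].shape[0])
    for x, W in zip(inputs, weights):
        out += W @ x
    return out


def fc_over_list(inputs, weights):
    """New style as we understand it: one weight block per input, i.e. a single
    matrix [W_ctx | W_word] applied to the concatenated inputs, no bias, linear."""
    W = np.concatenate(weights, axis=1)        # (out_size, 4 + 3)
    return W @ np.concatenate(inputs)


old = mixed_of_projections([context, current_word], [W_ctx, W_word])
new = fc_over_list([context, current_word], [W_ctx, W_word])
print(np.allclose(old, new))  # True
```

The explicit `act=paddle.activation.Linear()` and `bias_attr=False` in the new code would matter if `fc` defaults to a nonlinear activation with a bias. Check the PaddlePaddle API reference before relying on that.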

@@ -300,7 +303,7 @@ is_generating = False
   if not is_generating:
       trg_embedding = paddle.layer.embedding(
           input=paddle.layer.data(
-               name='target_language_word',  
+               name='target_language_word',
               type=paddle.data_type.integer_value_sequence(target_dict_dim)),
           size=word_vector_dim,
           param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
@@ -329,16 +332,15 @@ is_generating = False

   ```python
   if is_generating:
-       # In generation, the decoder predicts a next target word based on
-       # the encoded source sequence and the last generated target word.
+      # In generation, the decoder predicts the next target word based on
+      # the encoded source sequence and the previously generated target word.

-       # The encoded source sequence (encoder's output) must be specified by
-       # StaticInput, which is a read-only memory.
-       # Embedding of the last generated word is automatically gotten by
-       # GeneratedInputs, which is initialized by a start mark, such as <s>,
-       # and must be included in generation.
+      # The encoded source sequence (encoder's output) must be specified by
+      # StaticInput, which is a read-only memory.
+      # The embedding of the previously generated word is automatically
+      # retrieved by GeneratedInput, which is initialized with the start mark <s>.

-       trg_embedding = paddle.layer.GeneratedInputV2(
+       trg_embedding = paddle.layer.GeneratedInput(
           size=target_dict_dim,
           embedding_name='_target_language_embedding',
           embedding_size=word_vector_dim)
@@ -467,36 +469,31 @@ is_generating = False

    ```python
    if is_generating:
-        # get the dictionary
+        # load the dictionary
        src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

-        # the delimited element of generated sequences is -1,
-        # the first element of each generated sequence is the sequence length
-        seq_list = []
-        seq = []
-        for w in beam_result[1]:
-            if w != -1:
-                seq.append(w)
-            else:
-                seq_list.append(' '.join([trg_dict.get(w) for w in seq[1:]]))
-                seq = []
-
-        prob = beam_result[0]
-        for i in xrange(gen_num):
-            print "\n*******************************************************\n"
-            print "src:", ' '.join(
-                [src_dict.get(w) for w in gen_data[i][0]]), "\n"
+        gen_sen_idx = np.where(beam_result[1] == -1)[0]
+        assert len(gen_sen_idx) == len(gen_data) * beam_size
+
+        # -1 is the delimiter of generated sequences.
+        # The first element of each generated sequence is its length.
+        start_pos, end_pos = 1, 0
+        for i, sample in enumerate(gen_data):
+            print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
            for j in xrange(beam_size):
-                print "prob = %f:" % (prob[i][j]), seq_list[i * beam_size + j]
+                end_pos = gen_sen_idx[i * beam_size + j]
+                print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
+                    trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
+                start_pos = end_pos + 2
+            print("\n")
    ```

  生成开始后，可以观察到输出的日志如下：
  ```text
-  src: <s> Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu <e>
-
-  prob = -19.019573: The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-  prob = -19.113066: The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-  prob = -19.512890: The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
+  Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
+  -19.0196        The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
+  -19.1131        The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
+  -19.5129        The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
  ```
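The new generation code walks the flat `beam_result[1]` array using the positions of the `-1` delimiters. Below is the same index arithmetic as a standalone function. It assumes the layout stated in the code comments: each generated sequence is stored as its length, then its word ids, then a `-1`, with `beam_size` sequences per source sample. The function `split_beam_output` and the sample ids are hypothetical.

```python
import numpy as np


def split_beam_output(flat_ids, num_samples, beam_size):
    """Split a flat generation result into per-sample, per-beam word-id lists.

    Assumed layout: [len, w1, ..., wn, -1, len, w1, ..., -1, ...], with
    beam_size sequences per source sample, stored sample by sample.
    """
    flat_ids = np.asarray(flat_ids)
    ends = np.where(flat_ids == -1)[0]          # positions of the -1 delimiters
    assert len(ends) == num_samples * beam_size
    results, start = [], 1                      # skip the first length prefix
    for i in range(num_samples):
        beams = []
        for j in range(beam_size):
            end = ends[i * beam_size + j]
            beams.append(flat_ids[start:end].tolist())
            start = end + 2                     # skip the -1 and the next length
        results.append(beams)
    return results


# Hypothetical ids: two source samples, beam_size = 2.
flat = [3, 5, 6, 1, -1, 2, 5, 1, -1, 1, 1, -1, 4, 7, 8, 9, 1, -1]
out = split_beam_output(flat, num_samples=2, beam_size=2)
print(out)  # [[[5, 6, 1], [5, 1]], [[1], [7, 8, 9, 1]]]
```

Like the tutorial code, this skips the length prefix rather than validating it.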

 ## 总结

--- a/08.machine_translation/README.md
+++ b/08.machine_translation/README.md
@@ -230,34 +230,32 @@ is_generating = False
   decoder_size = 512 # hidden layer size of GRU in decoder
   beam_size = 3 # beam width used in beam search
   max_length = 250 # a stop condition of sequence generation
-  ```
+   ```

 2. Implement Encoder as follows:
   - The input is a sequence of words, each represented by an integer word index, so we define a data layer of type `integer_value_sequence`. Each element of the sequence lies in the range `[0, source_dict_dim)`.

   ```python
-    src_word_id = paddle.layer.data(
-        name='source_language_word',
-        type=paddle.data_type.integer_value_sequence(source_dict_dim))
+   src_word_id = paddle.layer.data(
+       name='source_language_word',
+       type=paddle.data_type.integer_value_sequence(source_dict_dim))
   ```

   - Map the one-hot vector (represented by word index) into a word vector $\mathbf{s}$ in a low-dimensional semantic space

   ```python
-    src_embedding = paddle.layer.embedding(
-        input=src_word_id,
-        size=word_vector_dim,
-        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
+   src_embedding = paddle.layer.embedding(
+       input=src_word_id, size=word_vector_dim)
   ```

   - Use a bidirectional GRU to encode the source language sequence, and concatenate the encoding outputs of the two GRUs to get $\mathbf{h}$

   ```python
-    src_forward = paddle.networks.simple_gru(
-        input=src_embedding, size=encoder_size)
-    src_backward = paddle.networks.simple_gru(
-        input=src_embedding, size=encoder_size, reverse=True)
-    encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
+   src_forward = paddle.networks.simple_gru(
+       input=src_embedding, size=encoder_size)
+   src_backward = paddle.networks.simple_gru(
+       input=src_embedding, size=encoder_size, reverse=True)
+   encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
   ```

 3. Implement Attention-based Decoder as follows:
@@ -265,19 +263,22 @@ is_generating = False
   - Project the encoding of the source language sequence (cf. 2.3) by passing it through a feed-forward neural network

   ```python
-    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
-        encoded_proj += paddle.layer.full_matrix_projection(
-            input=encoded_vector)
+   encoded_proj = paddle.layer.fc(
+       act=paddle.activation.Linear(),
+       size=decoder_size,
+       bias_attr=False,
+       input=encoded_vector)
   ```

   - Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$

   ```python
-    backward_first = paddle.layer.first_seq(input=src_backward)
-    with paddle.layer.mixed(
-            size=decoder_size, act=paddle.activation.Tanh()) as decoder_boot:
-        decoder_boot += paddle.layer.full_matrix_projection(
-            input=backward_first)
+   backward_first = paddle.layer.first_seq(input=src_backward)
+   decoder_boot = paddle.layer.fc(
+       size=decoder_size,
+       act=paddle.activation.Tanh(),
+       bias_attr=False,
+       input=backward_first)
   ```

   - Define the computation of each time step of the decoder RNN: given the current context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th word $u_i$ of the target language, predict the probability $p_{i+1}$ of the $(i+1)$-th word.
@@ -289,8 +290,7 @@ is_generating = False
      - Finally, softmax normalization computes the probability of each word, i.e., $p\left ( u_i|u_{&lt;i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$, and the output is returned.

   ```python
-    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
-
+   def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        decoder_mem = paddle.layer.memory(
            name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -299,10 +299,13 @@ is_generating = False
            encoded_proj=enc_proj,
            decoder_state=decoder_mem)

-        with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
-            decoder_inputs += paddle.layer.full_matrix_projection(input=context)
-            decoder_inputs += paddle.layer.full_matrix_projection(
-                input=current_word)
+        decoder_inputs = paddle.layer.fc(
+            act=paddle.activation.Linear(),
+            size=decoder_size * 3,
+            bias_attr=False,
+            input=[context, current_word],
+            layer_attr=paddle.attr.ExtraLayerAttribute(
+                error_clipping_threshold=100.0))

        gru_step = paddle.layer.gru_step(
            name='gru_decoder',
@@ -310,20 +313,20 @@ is_generating = False
            output_mem=decoder_mem,
            size=decoder_size)

-        with paddle.layer.mixed(
-                size=target_dict_dim,
-                bias_attr=True,
-                act=paddle.activation.Softmax()) as out:
-            out += paddle.layer.full_matrix_projection(input=gru_step)
+        out = paddle.layer.fc(
+            size=target_dict_dim,
+            bias_attr=True,
+            act=paddle.activation.Softmax(),
+            input=gru_step)
        return out
-    ```
+   ```

 4. Define the name of the decoder group and the first two inputs of `gru_decoder_with_attention`. Note that `StaticInput` is used for both inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.

    ```python
    decoder_group_name = "decoder_group"
-    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
-    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
+    group_input1 = paddle.layer.StaticInput(input=encoded_vector)
+    group_input2 = paddle.layer.StaticInput(input=encoded_proj)
    group_inputs = [group_input1, group_input2]
    ```
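`encoded_proj` is computed once per source sentence and passed to the decoder as a `StaticInput`. Each decoder step therefore only has to project its own state before scoring every source position. Below is a NumPy sketch of additive (Bahdanau-style) attention under that split. It assumes scores of the form $v^\top \tanh(U h_j + W_s s)$. It illustrates the idea and is not the exact parametrization of the attention call inside `gru_decoder_with_attention`. The names `additive_attention`, `U`, `W_s`, and `v` are hypothetical.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def additive_attention(enc_vec, enc_proj, decoder_state, W_s, v):
    """enc_vec: (T, H) encoder outputs h_j; enc_proj: (T, A) precomputed U h_j;
    decoder_state: (D,); W_s: (A, D); v: (A,). Returns (context, weights)."""
    scores = np.tanh(enc_proj + W_s @ decoder_state) @ v   # (T,) one score per source word
    weights = softmax(scores)                              # attention weights, sum to 1
    context = weights @ enc_vec                            # (H,) weighted sum of h_j
    return context, weights


rng = np.random.default_rng(0)
T, H, A, D = 5, 6, 4, 3
enc_vec = rng.normal(size=(T, H))
U = rng.normal(size=(A, H))
enc_proj = enc_vec @ U.T            # done once, reused at every decoder step
W_s, v = rng.normal(size=(A, D)), rng.normal(size=A)
for step in range(2):               # two decoder steps with different states
    context, weights = additive_attention(enc_vec, enc_proj, rng.normal(size=D), W_s, v)
    print(step, context.shape, round(float(weights.sum()), 6))
```

Only `W_s @ decoder_state` changes between steps, which is why keeping `enc_proj` read-only and shared is cheap.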

@@ -368,15 +371,14 @@ is_generating = False
   ```python
   if is_generating:
       # In generation, the decoder predicts the next target word based on
-       # the encoded source sequence and the last generated target word.
+       # the encoded source sequence and the previously generated target word.

       # The encoded source sequence (encoder's output) must be specified by
       # StaticInput, which is a read-only memory.
-       # Embedding of the last generated word is automatically gotten by
-       # GeneratedInputs, which is initialized by a start mark, such as <s>,
-       # and must be included in generation.
+       # The embedding of the previously generated word is automatically
+       # retrieved by GeneratedInput, which is initialized with the start mark <s>.

-       trg_embedding = paddle.layer.GeneratedInputV2(
+       trg_embedding = paddle.layer.GeneratedInput(
           size=target_dict_dim,
           embedding_name='_target_language_embedding',
           embedding_size=word_vector_dim)
@@ -503,36 +505,31 @@ Note: Our configuration is based on Bahdanau et al. \[[4](#Reference)\] but with

   ```python
   if is_generating:
-        # get the dictionary
-        src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
-
-        # the delimited element of generated sequences is -1,
-        # the first element of each generated sequence is the sequence length
-        seq_list = []
-        seq = []
-        for w in beam_result[1]:
-            if w != -1:
-                seq.append(w)
-            else:
-                seq_list.append(' '.join([trg_dict.get(w) for w in seq[1:]]))
-                seq = []
-
-        prob = beam_result[0]
-        for i in xrange(gen_num):
-            print "\n*******************************************************\n"
-            print "src:", ' '.join(
-                [src_dict.get(w) for w in gen_data[i][0]]), "\n"
-            for j in xrange(beam_size):
-                print "prob = %f:" % (prob[i][j]), seq_list[i * beam_size + j]
+       # load the dictionary
+       src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
+
+       gen_sen_idx = np.where(beam_result[1] == -1)[0]
+       assert len(gen_sen_idx) == len(gen_data) * beam_size
+
+       # -1 is the delimiter of generated sequences.
+       # The first element of each generated sequence is its length.
+       start_pos, end_pos = 1, 0
+       for i, sample in enumerate(gen_data):
+           print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
+           for j in xrange(beam_size):
+               end_pos = gen_sen_idx[i * beam_size + j]
+               print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
+                     trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
+               start_pos = end_pos + 2
+           print("\n")
   ```

  The generation log looks like this:
  ```text
-  src: <s> Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu <e>
-
-  prob = -19.019573: The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-  prob = -19.113066: The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-  prob = -19.512890: The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
+  Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
+  -19.0196        The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
+  -19.1131        The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
+  -19.5129        The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
  ```

 ## Summary

--- a/08.machine_translation/index.cn.html
+++ b/08.machine_translation/index.cn.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 机器翻译

-The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书), and more material is available in this tutorial's [video course](http://bit.baidu.com/course/detail/id/179.html).

 ## Background

@@ -81,7 +81,7 @@

 ### GRU

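-We introduced recurrent neural networks (RNN) and long short-term memory (LSTM) networks in the [Sentiment Analysis](https://github.com/PaddlePaddle/book/blob/develop/understand_sentiment/README.md) chapter. Compared with a simple RNN, LSTM adds a memory cell, an input gate, a forget gate, and an output gate; together, these gates and the memory cell greatly improve an RNN's ability to handle long-range dependencies.
+We introduced recurrent neural networks (RNN) and long short-term memory (LSTM) networks in the [Sentiment Analysis](https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/README.cn.md) chapter. Compared with a simple RNN, LSTM adds a memory cell, an input gate, a forget gate, and an output gate; together, these gates and the memory cell greatly improve an RNN's ability to handle long-range dependencies.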

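 GRU\[[2](#参考文献)\] is a simplified variant of LSTM proposed by Cho et al. and is also an extension of the RNN, as shown in the figure below. A GRU unit has only two gates:
 - Reset gate: if the reset gate is closed, historical information is ignored, i.e., irrelevant history does not affect future outputs.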
@@ -95,7 +95,7 @@ GRU\[[2](#参考文献)\]是Cho等人在LSTM上提出的简化版本，也是RNN

 ### Bidirectional Recurrent Neural Network

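-We introduced one kind of bidirectional recurrent neural network in the [Semantic Role Labeling](https://github.com/PaddlePaddle/book/blob/develop/label_semantic_roles/README.md) chapter; here we introduce another structure, proposed by Bengio's team in \[[2](#参考文献),[4](#参考文献)\]. Its goal is to take an input sequence and produce a feature representation at every time step, i.e., each output time step is a fixed-length vector encoding the contextual semantic information at that step.
+We introduced one kind of bidirectional recurrent neural network in the [Semantic Role Labeling](https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/README.cn.md) chapter; here we introduce another structure, proposed by Bengio's team in \[[2](#参考文献),[4](#参考文献)\]. Its goal is to take an input sequence and produce a feature representation at every time step, i.e., each output time step is a fixed-length vector encoding the contextual semantic information at that step.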

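 Specifically, this bidirectional RNN processes the input sequence along the time dimension both in order and in reverse order, i.e., forward and backward, and concatenates the RNN outputs at each time step into the final output layer. As a result, the output node at each time step contains the complete past and future context of that position in the input sequence. The figure below shows a bidirectional RNN unrolled over time. The network contains one forward and one backward RNN with six weight matrices: input to forward hidden layer and input to backward hidden layer ($W_1, W_3$), hidden layer to itself ($W_2,W_5$), and forward and backward hidden layers to the output layer ($W_4, W_6$). Note that there are no connections between the forward and backward hidden layers.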

@@ -227,16 +227,16 @@ is_generating = False
 ### Model Structure
 1. First, define some global variables.

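-   ```python
-   dict_size = 30000 # dictionary size
-   source_dict_dim = dict_size # source language dictionary size
-   target_dict_dim = dict_size # target language dictionary size
-   word_vector_dim = 512 # word vector dimension
-   encoder_size = 512 # hidden size of the GRU in the encoder
-   decoder_size = 512 # hidden size of the GRU in the decoder
-   beam_size = 3 # beam width
-   max_length = 250 # maximum length of a generated sentence
-  ```
+    ```python
+    dict_size = 30000 # dictionary size
+    source_dict_dim = dict_size # source language dictionary size
+    target_dict_dim = dict_size # target language dictionary size
+    word_vector_dim = 512 # word vector dimension
+    encoder_size = 512 # hidden size of the GRU in the encoder
+    decoder_size = 512 # hidden size of the GRU in the decoder
+    beam_size = 3 # beam width
+    max_length = 250 # maximum length of a generated sentence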
+    ```
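For a rough sense of scale, the sketch below multiplies out the sizes of the three vocabulary-sized layers implied by these settings: the source embedding table, the target embedding table, and the softmax output layer (a `decoder_size × target_dict_dim` weight matrix plus one bias per target word). It assumes dense parameters and ignores the GRU, attention, and projection layers, so the total is a lower bound on the model size, not an exact count.

```python
dict_size = 30000
source_dict_dim = dict_size
target_dict_dim = dict_size
word_vector_dim = 512
decoder_size = 512

# Embedding tables: one word_vector_dim row per vocabulary entry.
src_embedding = source_dict_dim * word_vector_dim
trg_embedding = target_dict_dim * word_vector_dim
# Softmax output layer: weight matrix plus one bias per target word.
softmax_out = decoder_size * target_dict_dim + target_dict_dim

total = src_embedding + trg_embedding + softmax_out
print(total)  # 46110000
print("%.1f MB as float32" % (total * 4 / 1e6))
```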

 2. Next, implement the encoder framework in three steps:

@@ -251,9 +251,7 @@ is_generating = False

   ```python
    src_embedding = paddle.layer.embedding(
-        input=src_word_id,
-        size=word_vector_dim,
-        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
+        input=src_word_id, size=word_vector_dim)
   ```
   - Encode the source language sequence with a bidirectional GRU and concatenate the outputs of the two GRUs to get $\mathbf{h}$.

@@ -270,19 +268,22 @@ is_generating = False
   - Pass the encoded source language sequence (see the last step of 2) through a feed-forward neural network to obtain its projection.

   ```python
-    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
-        encoded_proj += paddle.layer.full_matrix_projection(
-            input=encoded_vector)
+   encoded_proj = paddle.layer.fc(
+         act=paddle.activation.Linear(),
+         size=decoder_size,
+         bias_attr=False,
+         input=encoded_vector)
   ```

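   - Construct the initial state of the decoder RNN. The decoder predicts the target sequence step by step but has no initial value at time 0, so we initialize it explicitly: a non-linear transformation of the last state of the reverse-order encoding of the source sequence serves as the initial value, i.e., $c_0=h_T$.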

   ```python
-    backward_first = paddle.layer.first_seq(input=src_backward)
-    with paddle.layer.mixed(
-            size=decoder_size, act=paddle.activation.Tanh()) as decoder_boot:
-        decoder_boot += paddle.layer.full_matrix_projection(
-            input=backward_first)
+   backward_first = paddle.layer.first_seq(input=src_backward)
+   decoder_boot = paddle.layer.fc(
+         size=decoder_size,
+         act=paddle.activation.Tanh(),
+         bias_attr=False,
+         input=backward_first)
   ```

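   - Define the RNN behavior at each decoding time step: from the current source context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th target word $u_i$, predict the probability $p_{i+1}$ of the $(i+1)$-th word.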
@@ -293,8 +294,7 @@ is_generating = False
      - Finally, use softmax normalization to compute the word probabilities and return `out`, i.e., implement $p\left ( u_i|u_{&lt;i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$.

   ```python
-    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
-
+   def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        decoder_mem = paddle.layer.memory(
            name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -303,10 +303,13 @@ is_generating = False
            encoded_proj=enc_proj,
            decoder_state=decoder_mem)

-        with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
-            decoder_inputs += paddle.layer.full_matrix_projection(input=context)
-            decoder_inputs += paddle.layer.full_matrix_projection(
-                input=current_word)
+        decoder_inputs = paddle.layer.fc(
+            act=paddle.activation.Linear(),
+            size=decoder_size * 3,
+            bias_attr=False,
+            input=[context, current_word],
+            layer_attr=paddle.attr.ExtraLayerAttribute(
+                error_clipping_threshold=100.0))

        gru_step = paddle.layer.gru_step(
            name='gru_decoder',
@@ -314,20 +317,20 @@ is_generating = False
            output_mem=decoder_mem,
            size=decoder_size)

-        with paddle.layer.mixed(
-                size=target_dict_dim,
-                bias_attr=True,
-                act=paddle.activation.Softmax()) as out:
-            out += paddle.layer.full_matrix_projection(input=gru_step)
+        out = paddle.layer.mixed(
+            size=target_dict_dim,
+            bias_attr=True,
+            act=paddle.activation.Softmax(),
+            input=paddle.layer.full_matrix_projection(input=gru_step))
        return out
-    ```
+   ```
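The updated code replaces a `paddle.layer.mixed` layer holding one `full_matrix_projection` per input with a single `paddle.layer.fc` that takes a list of inputs, a linear activation, and `bias_attr=False`. As we understand `fc`, each input gets its own weight matrix and the projections are summed, which equals one matrix applied to the concatenated inputs, so the computation should be unchanged. The sketch below checks that identity on small hand-picked matrices in plain Python; it illustrates the arithmetic only and does not call Paddle.

```python
def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


# Toy sizes: context has 2 dims, current_word has 3 dims, output has 2 dims.
context = [1.0, -2.0]
current_word = [0.5, 0.0, 3.0]
w_context = [[1.0, 2.0], [0.0, -1.0]]          # 2 x 2
w_word = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]    # 2 x 3

# mixed layer: one full_matrix_projection per input, summed, no bias
mixed_out = add(matvec(w_context, context), matvec(w_word, current_word))

# fc on the concatenated input, with the two weight blocks side by side
w_concat = [rc + rw for rc, rw in zip(w_context, w_word)]  # 2 x 5
fc_out = matvec(w_concat, context + current_word)

print(mixed_out, fc_out)  # [0.5, 3.0] [0.5, 3.0]
```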

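 4. Define the name of the decoder group and the first two inputs of the `gru_decoder_with_attention` function. Note that these two inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.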

    ```python
    decoder_group_name = "decoder_group"
-    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
-    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
+    group_input1 = paddle.layer.StaticInput(input=encoded_vector)
+    group_input2 = paddle.layer.StaticInput(input=encoded_proj)
    group_inputs = [group_input1, group_input2]
    ```

@@ -342,7 +345,7 @@ is_generating = False
   if not is_generating:
       trg_embedding = paddle.layer.embedding(
           input=paddle.layer.data(
-               name='target_language_word',  
+               name='target_language_word',
               type=paddle.data_type.integer_value_sequence(target_dict_dim)),
           size=word_vector_dim,
           param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
@@ -371,16 +374,15 @@ is_generating = False

   ```python
   if is_generating:
-       # In generation, the decoder predicts a next target word based on
-       # the encoded source sequence and the last generated target word.
+      # In generation, the decoder predicts the next target word based on
+      # the encoded source sequence and the previously generated target word.

-       # The encoded source sequence (encoder's output) must be specified by
-       # StaticInput, which is a read-only memory.
-       # Embedding of the last generated word is automatically gotten by
-       # GeneratedInputs, which is initialized by a start mark, such as <s>,
-       # and must be included in generation.
+      # The encoded source sequence (encoder's output) must be specified by
+      # StaticInput, which is a read-only memory.
+      # Embedding of the previously generated word is automatically retrieved
+      # by GeneratedInput, which is initialized with the start mark <s>.

-       trg_embedding = paddle.layer.GeneratedInputV2(
+       trg_embedding = paddle.layer.GeneratedInput(
           size=target_dict_dim,
           embedding_name='_target_language_embedding',
           embedding_size=word_vector_dim)
@@ -509,36 +511,31 @@ is_generating = False

    ```python
    if is_generating:
-        # get the dictionary
+        # load the dictionary
        src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

-        # the delimited element of generated sequences is -1,
-        # the first element of each generated sequence is the sequence length
-        seq_list = []
-        seq = []
-        for w in beam_result[1]:
-            if w != -1:
-                seq.append(w)
-            else:
-                seq_list.append(' '.join([trg_dict.get(w) for w in seq[1:]]))
-                seq = []
-
-        prob = beam_result[0]
-        for i in xrange(gen_num):
-            print "\n*******************************************************\n"
-            print "src:", ' '.join(
-                [src_dict.get(w) for w in gen_data[i][0]]), "\n"
+        gen_sen_idx = np.where(beam_result[1] == -1)[0]
+        assert len(gen_sen_idx) == len(gen_data) * beam_size
+
+        # -1 is the delimiter of generated sequences.
+        # The first element of each generated sequence is its length.
+        start_pos, end_pos = 1, 0
+        for i, sample in enumerate(gen_data):
+            print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
            for j in xrange(beam_size):
-                print "prob = %f:" % (prob[i][j]), seq_list[i * beam_size + j]
+                end_pos = gen_sen_idx[i * beam_size + j]
+                print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
+                    trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
+                start_pos = end_pos + 2
+            print("\n")
    ```

  Once generation starts, the output log looks like this:
  ```text
-  src: <s> Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu <e>
-
-  prob = -19.019573: The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-  prob = -19.113066: The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-  prob = -19.512890: The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
+  Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
+  -19.0196        The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
+  -19.1131        The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
+  -19.5129        The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
  ```
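The number at the start of each hypothesis line is its beam score. The scores are negative, so they are not plain probabilities; assuming they are sums of per-word log probabilities (the usual convention for beam search, though the tutorial does not state it), the difference between two scores is the log of their probability ratio. The sketch below converts the three scores from the log above into ratios relative to the best hypothesis.

```python
import math

# Beam scores copied from the generation log above.
scores = [-19.0196, -19.1131, -19.5129]

# Assuming each score is a sum of per-word log probabilities, the difference
# of two scores is the log of the ratio of their sequence probabilities.
best = max(scores)
ratios = [math.exp(s - best) for s in scores]
for s, r in zip(scores, ratios):
    print("%.4f  %.3f x as likely as the best hypothesis" % (s, r))
```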

 ## Summary

--- a/08.machine_translation/index.html
+++ b/08.machine_translation/index.html
@@ -272,34 +272,32 @@ is_generating = False
   decoder_size = 512 # hidden layer size of GRU in decoder
   beam_size = 3 # expand width in beam search
   max_length = 250 # a stop condition of sequence generation
-  ```
+   ```

 2. Implement Encoder as follows:
   - The input is a sequence of words, each represented by an integer word index, so we define a data layer of type `integer_value_sequence`. Each element of the sequence lies in the range `[0, source_dict_dim)`.

   ```python
-    src_word_id = paddle.layer.data(
-        name='source_language_word',
-        type=paddle.data_type.integer_value_sequence(source_dict_dim))
+   src_word_id = paddle.layer.data(
+       name='source_language_word',
+       type=paddle.data_type.integer_value_sequence(source_dict_dim))
   ```

   - Map the one-hot vector (represented by a word index) into a word vector $\mathbf{s}$ in a low-dimensional semantic space.

   ```python
-    src_embedding = paddle.layer.embedding(
-        input=src_word_id,
-        size=word_vector_dim,
-        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
+   src_embedding = paddle.layer.embedding(
+       input=src_word_id, size=word_vector_dim)
   ```

   - Use a bi-directional GRU to encode the source language sequence, and concatenate the outputs of the two GRUs to get $\mathbf{h}$.

   ```python
-    src_forward = paddle.networks.simple_gru(
-        input=src_embedding, size=encoder_size)
-    src_backward = paddle.networks.simple_gru(
-        input=src_embedding, size=encoder_size, reverse=True)
-    encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
+   src_forward = paddle.networks.simple_gru(
+       input=src_embedding, size=encoder_size)
+   src_backward = paddle.networks.simple_gru(
+       input=src_embedding, size=encoder_size, reverse=True)
+   encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
   ```
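The encoder runs one recurrent network left to right and another right to left, then concatenates their outputs position by position, so each $\mathbf{h}_j$ carries context from the whole sentence. The sketch below mimics that data flow with a toy scalar tanh RNN standing in for `paddle.networks.simple_gru` (a simplification for brevity; the weights are arbitrary). It also shows why the decoder can later be initialized from `first_seq(src_backward)`: the backward network's output at position 0 is computed last and has read every word.

```python
import math


def rnn(inputs, w_in=0.5, w_rec=0.8, reverse=False):
    """Toy scalar RNN standing in for a GRU: h_t = tanh(w_in * x_t + w_rec * h_prev).
    Returns one output per position, aligned with the original input order."""
    order = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
    outputs = [0.0] * len(inputs)
    h = 0.0
    for t in order:
        h = math.tanh(w_in * inputs[t] + w_rec * h)
        outputs[t] = h
    return outputs


embedded = [1.0, -0.5, 2.0, 0.3]  # hypothetical 1-dim word embeddings
forward = rnn(embedded)
backward = rnn(embedded, reverse=True)

# Analogue of concat(src_forward, src_backward): one [fwd, bwd] pair per position.
encoded = [[f, b] for f, b in zip(forward, backward)]

# backward[0] is the backward RNN's final state after reading the whole
# sentence, which is what first_seq(src_backward) passes to decoder_boot.
print(encoded)
print("backward summary:", backward[0])
```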

 3. Implement Attention-based Decoder as follows:
@@ -307,19 +305,22 @@ is_generating = False
   - Project the encoding of the source language sequence (cf. 2.3) by passing it through a feed-forward neural network.

   ```python
-    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
-        encoded_proj += paddle.layer.full_matrix_projection(
-            input=encoded_vector)
+   encoded_proj = paddle.layer.fc(
+         act=paddle.activation.Linear(),
+         size=decoder_size,
+         bias_attr=False,
+         input=encoded_vector)
   ```

   - Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$

   ```python
-    backward_first = paddle.layer.first_seq(input=src_backward)
-    with paddle.layer.mixed(
-            size=decoder_size, act=paddle.activation.Tanh()) as decoder_boot:
-        decoder_boot += paddle.layer.full_matrix_projection(
-            input=backward_first)
+   backward_first = paddle.layer.first_seq(input=src_backward)
+   decoder_boot = paddle.layer.fc(
+         size=decoder_size,
+         act=paddle.activation.Tanh(),
+         bias_attr=False,
+         input=backward_first)
   ```

   - Define the computation at each time step of the decoder RNN: use the current context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th word $u_i$ of the target language to predict the probability $p_{i+1}$ of the $(i+1)$-th word.
@@ -331,8 +332,7 @@ is_generating = False
      - Softmax normalization is used at the end to compute the probability of each word, i.e., $p\left ( u_i|u_{&lt;i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.

   ```python
-    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
-
+   def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        decoder_mem = paddle.layer.memory(
            name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -341,10 +341,13 @@ is_generating = False
            encoded_proj=enc_proj,
            decoder_state=decoder_mem)

-        with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
-            decoder_inputs += paddle.layer.full_matrix_projection(input=context)
-            decoder_inputs += paddle.layer.full_matrix_projection(
-                input=current_word)
+        decoder_inputs = paddle.layer.fc(
+            act=paddle.activation.Linear(),
+            size=decoder_size * 3,
+            bias_attr=False,
+            input=[context, current_word],
+            layer_attr=paddle.attr.ExtraLayerAttribute(
+                error_clipping_threshold=100.0))

        gru_step = paddle.layer.gru_step(
            name='gru_decoder',
@@ -352,20 +355,20 @@ is_generating = False
            output_mem=decoder_mem,
            size=decoder_size)

-        with paddle.layer.mixed(
-                size=target_dict_dim,
-                bias_attr=True,
-                act=paddle.activation.Softmax()) as out:
-            out += paddle.layer.full_matrix_projection(input=gru_step)
+        out = paddle.layer.fc(
+            size=target_dict_dim,
+            bias_attr=True,
+            act=paddle.activation.Softmax(),
+            input=gru_step)
        return out
-    ```
+   ```
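`gru_decoder_with_attention` gets `context` from `paddle.networks.simple_attention`. Assuming that helper implements the additive attention of Bahdanau et al. (score each source position from its row of `encoded_proj` plus a projection of the previous decoder state, normalize the scores with a softmax over positions, then take the weighted sum of `encoded_vector`), one attention step looks like the plain-Python sketch below. `w_state` and `v` stand in for learned parameters, and all values are made up.

```python
import math


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def additive_attention(enc_vec, enc_proj, decoder_state, w_state, v):
    """One step of additive attention.
    enc_vec:       encoder outputs h_j (the values that get averaged)
    enc_proj:      precomputed projections of h_j (like encoded_proj)
    decoder_state: previous decoder hidden state
    Returns (weights, context)."""
    state_proj = matvec(w_state, decoder_state)
    scores = []
    for proj in enc_proj:
        hidden = [math.tanh(p + s) for p, s in zip(proj, state_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over source positions (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(enc_vec[0])
    context = [sum(w * h[k] for w, h in zip(weights, enc_vec)) for k in range(dim)]
    return weights, context


enc_vec = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, -0.5, 0.2], [0.3, 0.3, 0.0, 1.0]]
enc_proj = [[0.2, -0.1], [0.5, 0.4], [-0.3, 0.1]]
decoder_state = [0.1, -0.2]
w_state = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = additive_attention(enc_vec, enc_proj, decoder_state, w_state, v)
print(weights, context)
```

Because `encoded_proj` does not depend on the decoder state, it can be computed once per sentence, which is presumably why the tutorial projects the encoder output outside the recurrent group.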

 4. Define the name of the decoder group and the first two inputs of `gru_decoder_with_attention`. Note that `StaticInput` is used for both inputs. Please refer to the [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.

    ```python
    decoder_group_name = "decoder_group"
-    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
-    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
+    group_input1 = paddle.layer.StaticInput(input=encoded_vector)
+    group_input2 = paddle.layer.StaticInput(input=encoded_proj)
    group_inputs = [group_input1, group_input2]
    ```
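Inside `paddle.layer.recurrent_group`, an ordinary sequence input is fed to the step function one element per time step, while a `StaticInput` is passed whole at every step. Attention needs exactly that: every decoder step must see all encoder positions. The loop below imitates the contract in plain Python; `toy_recurrent_group` and its arguments are hypothetical and are not Paddle's API.

```python
def toy_recurrent_group(step, static_inputs, seq_input):
    """Call step once per element of seq_input.

    static_inputs are passed unchanged (whole sequences) at every step,
    mimicking StaticInput; seq_input is consumed one element at a time.
    """
    outputs = []
    for current in seq_input:
        outputs.append(step(*(list(static_inputs) + [current])))
    return outputs


def step(enc_vec, enc_proj, current_word):
    # A real decoder step would attend over enc_vec and enc_proj; this one
    # only reports how many source positions it can see at this step.
    return (len(enc_vec), len(enc_proj), current_word)


encoded_vector = ["h0", "h1", "h2"]  # hypothetical encoder outputs
encoded_proj = ["p0", "p1", "p2"]    # hypothetical projections
target_words = ["<s>", "the", "cat"]

result = toy_recurrent_group(step, [encoded_vector, encoded_proj], target_words)
print(result)  # [(3, 3, '<s>'), (3, 3, 'the'), (3, 3, 'cat')]
```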

@@ -410,15 +413,14 @@ is_generating = False
   ```python
   if is_generating:
       # In generation, the decoder predicts the next target word based on
-       # the encoded source sequence and the last generated target word.
+       # the encoded source sequence and the previously generated target word.

       # The encoded source sequence (encoder's output) must be specified by
       # StaticInput, which is a read-only memory.
-       # Embedding of the last generated word is automatically gotten by
-       # GeneratedInputs, which is initialized by a start mark, such as <s>,
-       # and must be included in generation.
+       # Embedding of the previously generated word is automatically retrieved
+       # by GeneratedInput, which is initialized with the start mark <s>.

-       trg_embedding = paddle.layer.GeneratedInputV2(
+       trg_embedding = paddle.layer.GeneratedInput(
           size=target_dict_dim,
           embedding_name='_target_language_embedding',
           embedding_size=word_vector_dim)
@@ -545,36 +547,31 @@ Note: Our configuration is based on Bahdanau et al. \[[4](#Reference)\] but with

   ```python
   if is_generating:
-        # get the dictionary
-        src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
-
-        # the delimited element of generated sequences is -1,
-        # the first element of each generated sequence is the sequence length
-        seq_list = []
-        seq = []
-        for w in beam_result[1]:
-            if w != -1:
-                seq.append(w)
-            else:
-                seq_list.append(' '.join([trg_dict.get(w) for w in seq[1:]]))
-                seq = []
-
-        prob = beam_result[0]
-        for i in xrange(gen_num):
-            print "\n*******************************************************\n"
-            print "src:", ' '.join(
-                [src_dict.get(w) for w in gen_data[i][0]]), "\n"
-            for j in xrange(beam_size):
-                print "prob = %f:" % (prob[i][j]), seq_list[i * beam_size + j]
+       # load the dictionary
+       src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
+
+       gen_sen_idx = np.where(beam_result[1] == -1)[0]
+       assert len(gen_sen_idx) == len(gen_data) * beam_size
+
+       # -1 is the delimiter of generated sequences.
+       # The first element of each generated sequence is its length.
+       start_pos, end_pos = 1, 0
+       for i, sample in enumerate(gen_data):
+           print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
+           for j in xrange(beam_size):
+               end_pos = gen_sen_idx[i * beam_size + j]
+               print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
+                     trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
+               start_pos = end_pos + 2
+           print("\n")
   ```

  The generation log is as follows:
  ```text
-  src: <s> Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu <e>
-
-  prob = -19.019573: The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-  prob = -19.113066: The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-  prob = -19.512890: The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
+  Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
+  -19.0196        The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
+  -19.1131        The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
+  -19.5129        The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
  ```
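With `is_generating` set, the decoder runs beam search with `beam_size = 3`, which is why three candidate translations appear for the source sentence. The sketch below is a generic beam search over a hypothetical scoring function `next_log_probs`: it keeps the `beam_size` best partial sequences by summed log probability, sets aside hypotheses that emit the end mark, and stops after `max_length` steps. It illustrates the technique and is not Paddle's implementation, which may differ in details such as how finished hypotheses are ranked.

```python
import math


def beam_search(next_log_probs, start, end, beam_size, max_length):
    """Generic beam search.
    next_log_probs(prefix) -> dict {word: log_prob} for the next word.
    Returns up to beam_size (score, words) pairs, best first."""
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_length):
        candidates = []
        for score, words in beams:
            for word, lp in next_log_probs(words).items():
                candidates.append((score + lp, words + [word]))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:beam_size]:
            if words[-1] == end:
                finished.append((score, words))
            else:
                beams.append((score, words))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_length
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:beam_size]


def toy_model(prefix):
    # Hypothetical next-word distribution: prefer "a", then "b", then "<e>";
    # once the prefix has three tokens, strongly prefer ending.
    table = {"a": 0.5, "b": 0.3, "<e>": 0.2}
    if len(prefix) >= 3:
        table = {"<e>": 0.9, "a": 0.1}
    return {w: math.log(p) for w, p in table.items()}


results = beam_search(toy_model, "<s>", "<e>", beam_size=3, max_length=5)
for score, words in results:
    print("%.4f\t%s" % (score, " ".join(words[1:])))
```

In this toy run the empty translation (`<e>` alone) ranks second: raw sums of log probabilities favor short outputs, which is one reason some systems add length normalization.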

 ## Summary

--- a/08.machine_translation/train.py
+++ b/08.machine_translation/train.py
 import sys
+import numpy as np

 import paddle.v2 as paddle


-def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
+def save_model(parameters, save_path):
+    with open(save_path, 'w') as f:
+        parameters.to_tar(f)
+
+
+def seq_to_seq_net(source_dict_dim,
+                   target_dict_dim,
+                   is_generating,
+                   beam_size=3,
+                   max_length=250):
    ### Network Architecture
    word_vector_dim = 512  # dimension of word vector
-    decoder_size = 512  # dimension of hidden unit in GRU Decoder network
-    encoder_size = 512  # dimension of hidden unit in GRU Encoder network
-
-    beam_size = 3
-    max_length = 250
+    decoder_size = 512  # dimension of hidden unit of GRU decoder
+    encoder_size = 512  # dimension of hidden unit of GRU encoder

    #### Encoder
    src_word_id = paddle.layer.data(
        name='source_language_word',
        type=paddle.data_type.integer_value_sequence(source_dict_dim))
    src_embedding = paddle.layer.embedding(
-        input=src_word_id,
-        size=word_vector_dim,
-        param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
+        input=src_word_id, size=word_vector_dim)
    src_forward = paddle.networks.simple_gru(
        input=src_embedding, size=encoder_size)
    src_backward = paddle.networks.simple_gru(
@@ -27,16 +32,19 @@ def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
    encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])

    #### Decoder
-    with paddle.layer.mixed(size=decoder_size) as encoded_proj:
-        encoded_proj += paddle.layer.full_matrix_projection(
-            input=encoded_vector)
+    encoded_proj = paddle.layer.fc(
+        act=paddle.activation.Linear(),
+        size=decoder_size,
+        bias_attr=False,
+        input=encoded_vector)

    backward_first = paddle.layer.first_seq(input=src_backward)

-    with paddle.layer.mixed(
-            size=decoder_size, act=paddle.activation.Tanh()) as decoder_boot:
-        decoder_boot += paddle.layer.full_matrix_projection(
-            input=backward_first)
+    decoder_boot = paddle.layer.fc(
+        size=decoder_size,
+        act=paddle.activation.Tanh(),
+        bias_attr=False,
+        input=backward_first)

    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):

@@ -48,10 +56,13 @@ def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
            encoded_proj=enc_proj,
            decoder_state=decoder_mem)

-        with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
-            decoder_inputs += paddle.layer.full_matrix_projection(input=context)
-            decoder_inputs += paddle.layer.full_matrix_projection(
-                input=current_word)
+        decoder_inputs = paddle.layer.fc(
+            act=paddle.activation.Linear(),
+            size=decoder_size * 3,
+            bias_attr=False,
+            input=[context, current_word],
+            layer_attr=paddle.attr.ExtraLayerAttribute(
+                error_clipping_threshold=100.0))

        gru_step = paddle.layer.gru_step(
            name='gru_decoder',
@@ -59,16 +70,16 @@ def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
            output_mem=decoder_mem,
            size=decoder_size)

-        with paddle.layer.mixed(
-                size=target_dict_dim,
-                bias_attr=True,
-                act=paddle.activation.Softmax()) as out:
-            out += paddle.layer.full_matrix_projection(input=gru_step)
+        out = paddle.layer.fc(
+            size=target_dict_dim,
+            bias_attr=True,
+            act=paddle.activation.Softmax(),
+            input=gru_step)
        return out

-    decoder_group_name = "decoder_group"
-    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
-    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
+    decoder_group_name = 'decoder_group'
+    group_input1 = paddle.layer.StaticInput(input=encoded_vector)
+    group_input2 = paddle.layer.StaticInput(input=encoded_proj)
    group_inputs = [group_input1, group_input2]

    if not is_generating:
@@ -98,15 +109,14 @@ def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
        return cost
    else:
        # In generation, the decoder predicts the next target word based on
-        # the encoded source sequence and the last generated target word.
+        # the encoded source sequence and the previously generated target word.

        # The encoded source sequence (encoder's output) must be specified by
        # StaticInput, which is a read-only memory.
-        # Embedding of the last generated word is automatically gotten by
-        # GeneratedInputs, which is initialized by a start mark, such as <s>,
-        # and must be included in generation.
+        # Embedding of the previously generated word is automatically retrieved
+        # by GeneratedInput, which is initialized with the start mark <s>.

-        trg_embedding = paddle.layer.GeneratedInputV2(
+        trg_embedding = paddle.layer.GeneratedInput(
            size=target_dict_dim,
            embedding_name='_target_language_embedding',
            embedding_size=word_vector_dim)
@@ -134,32 +144,43 @@ def main():

    # train the network
    if not is_generating:
-        cost = seqToseq_net(source_dict_dim, target_dict_dim)
-        parameters = paddle.parameters.create(cost)
-
        # define optimize method and trainer
        optimizer = paddle.optimizer.Adam(
            learning_rate=5e-5,
            regularization=paddle.optimizer.L2Regularization(rate=8e-4))
+
+        cost = seq_to_seq_net(source_dict_dim, target_dict_dim, is_generating)
+        parameters = paddle.parameters.create(cost)
+
        trainer = paddle.trainer.SGD(
            cost=cost, parameters=parameters, update_equation=optimizer)
        # define data reader
        wmt14_reader = paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.wmt14.train(dict_size), buf_size=8192),
-            batch_size=5)
+            batch_size=4)

        # define event_handler callback
        def event_handler(event):
            if isinstance(event, paddle.event.EndIteration):
                if event.batch_id % 10 == 0:
-                    print "\nPass %d, Batch %d, Cost %f, %s" % (
-                        event.pass_id, event.batch_id, event.cost,
-                        event.metrics)
+                    print("\nPass %d, Batch %d, Cost %f, %s" %
+                          (event.pass_id, event.batch_id, event.cost,
+                           event.metrics))
                else:
                    sys.stdout.write('.')
                    sys.stdout.flush()

+                if not event.batch_id % 10:
+                    save_path = 'params_pass_%05d_batch_%05d.tar' % (
+                        event.pass_id, event.batch_id)
+                    save_model(parameters, save_path)
+
+            if isinstance(event, paddle.event.EndPass):
+                # save parameters
+                save_path = 'params_pass_%05d.tar' % (event.pass_id)
+                save_model(parameters, save_path)
+
        # start to train
        trainer.train(
            reader=wmt14_reader, event_handler=event_handler, num_passes=2)
@@ -167,46 +188,46 @@ def main():
    # translate French sequences into English
    else:
        # use the first 3 samples for generation
-        gen_creator = paddle.dataset.wmt14.gen(dict_size)
        gen_data = []
        gen_num = 3
-        for item in gen_creator():
-            gen_data.append((item[0], ))
+        for item in paddle.dataset.wmt14.gen(dict_size)():
+            gen_data.append([item[0]])
            if len(gen_data) == gen_num:
                break

-        beam_gen = seqToseq_net(source_dict_dim, target_dict_dim, is_generating)
-        # get the pretrained model, whose bleu = 26.92
+        beam_size = 3
+        beam_gen = seq_to_seq_net(source_dict_dim, target_dict_dim,
+                                  is_generating, beam_size)
+
+        # load the pre-trained model, whose BLEU score is 26.92
        parameters = paddle.dataset.wmt14.model()
-        # prob is the prediction probabilities, and id is the prediction word. 
+
+        # prob holds the score (log probability) of each generated sequence,
+        # and id holds the predicted word ids.
        beam_result = paddle.infer(
            output_layer=beam_gen,
            parameters=parameters,
            input=gen_data,
            field=['prob', 'id'])

-        # get the dictionary
+        # load the dictionary
        src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

-        # the delimited element of generated sequences is -1,
-        # the first element of each generated sequence is the sequence length
-        seq_list = []
-        seq = []
-        for w in beam_result[1]:
-            if w != -1:
-                seq.append(w)
-            else:
-                seq_list.append(' '.join([trg_dict.get(w) for w in seq[1:]]))
-                seq = []
-
-        prob = beam_result[0]
-        beam_size = 3
-        for i in xrange(gen_num):
-            print "\n*******************************************************\n"
-            print "src:", ' '.join(
-                [src_dict.get(w) for w in gen_data[i][0]]), "\n"
+        gen_sen_idx = np.where(beam_result[1] == -1)[0]
+        assert len(gen_sen_idx) == len(gen_data) * beam_size
+
+        # -1 is the delimiter of generated sequences.
+        # The first element of each generated sequence is its length.
+        start_pos, end_pos = 1, 0
+        for i, sample in enumerate(gen_data):
+            # skip the start and end marks when printing the source sentence
+            print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
            for j in xrange(beam_size):
-                print "prob = %f:" % (prob[i][j]), seq_list[i * beam_size + j]
+                end_pos = gen_sen_idx[i * beam_size + j]
+                print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
+                    trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
+                start_pos = end_pos + 2
+            print("\n")


 if __name__ == '__main__':

--- a/README.cn.md
+++ b/README.cn.md
@@ -13,6 +13,8 @@
 1. [Semantic Role Labeling](http://book.paddlepaddle.org/07.label_semantic_roles/index.cn.html)
 1. [Machine Translation](http://book.paddlepaddle.org/08.machine_translation/index.cn.html)

+For more learning material, visit the PaddlePaddle [video course](http://bit.baidu.com/Course/datalist/column/117.html).
+
 ## 运行这本书

 The book you are reading is an "interactive" ebook: every chapter can run in a Jupyter Notebook.
@@ -22,7 +24,7 @@
 Just run the following in a terminal:

 ```bash
-docker run -d -p 8888:8888 paddlepaddle/book:0.10.0
+docker run -d -p 8888:8888 paddlepaddle/book
 ```

 This downloads and runs the book's Docker image from DockerHub.com. To read and edit the book online, visit http://localhost:8888 in your browser.
@@ -30,7 +32,7 @@ docker run -d -p 8888:8888 paddlepaddle/book:0.10.0
 If DockerHub.com is slow for you, try our mirror docker.paddlepaddle.org:

 ```bash
-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:0.10.0
+docker run -d -p 8888:8888 docker.paddlepaddle.org/book
 ```

 ### Training with a GPU
@@ -38,13 +40,13 @@ docker run -d -p 8888:8888 docker.paddlepaddle.org/book:0.10.0
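 The book trains on the CPU by default. Training on a GPU takes slightly different steps: to make sure the GPU driver works inside the image, we recommend running the image with [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Install nvidia-docker first, then run: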

 ```bash
-nvidia-docker run -d -p 8888:8888 paddlepaddle/book:0.10.0-gpu
+nvidia-docker run -d -p 8888:8888 paddlepaddle/book:latest-gpu
 ```

 Or, to use the mirror in China, run:

 ```bash
-nvidia-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:0.10.0-gpu
+nvidia-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:latest-gpu
 ```

 You also need to change the following code
@@ -64,7 +66,7 @@ paddle.init(use_gpu=True, trainer_count=1)

 To write, run, and debug the book, you need Python 2.x and Go >1.5, and you can use [this script](https://github.com/PaddlePaddle/book/blob/develop/.tools/convert-markdown-into-ipynb-and-test.sh) to build a new Docker image.

-**Note:** We also provide [English Readme](https://github.com/PaddlePaddle/book/blob/develop/README.en.md) for PaddlePaddle book.
+**Note:** We also provide [English Readme](https://github.com/PaddlePaddle/book/blob/develop/README.md) for PaddlePaddle book.


 <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
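The READMEs list four launch commands that differ along two axes: the registry (DockerHub or the docker.paddlepaddle.org mirror) and CPU versus GPU (the `:latest-gpu` tag together with `nvidia-docker`). The sketch below builds the matching command line. `book_run_command` is a hypothetical helper, not part of the repository, and it assumes Jupyter listens on port 8888 inside the container, as in the commands shown.

```python
def book_run_command(use_gpu=False, use_mirror=False, host_port=8888):
    """Return the argv list for launching the PaddlePaddle book container.

    Hypothetical helper that only reproduces the commands listed in the README.
    """
    registry = "docker.paddlepaddle.org/" if use_mirror else "paddlepaddle/"
    image = registry + "book"
    if use_gpu:
        image += ":latest-gpu"  # GPU image tag used in the README
    runner = "nvidia-docker" if use_gpu else "docker"
    # -p maps host_port on the host to Jupyter's port 8888 in the container
    return [runner, "run", "-d", "-p", "%d:8888" % host_port, image]


print(" ".join(book_run_command(use_gpu=True, use_mirror=True)))
```

The returned list can be passed directly to `subprocess.call` to start the container.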
--- a/README.md
+++ b/README.md
@@ -22,7 +22,8 @@ We packed this book, Jupyter, PaddlePaddle, and all dependencies into a Docker i
 Just type

 ```bash
-docker run -d -p 8888:8888 paddlepaddle/book:0.10.0
+docker run -d -p 8888:8888 paddlepaddle/book
+
 ```

 This command will download the pre-built Docker image from DockerHub.com and run it in a container.  Please direct your Web browser to http://localhost:8888 to read the book.
@@ -30,7 +31,8 @@ This command will download the pre-built Docker image from DockerHub.com and run
 If DockerHub.com is slow to access from where you live, you might try our mirror server docker.paddlepaddle.org:

 ```bash
-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:0.10.0
+docker run -d -p 8888:8888 docker.paddlepaddle.org/book
+
 ```

 ### Training with GPU
@@ -40,13 +42,15 @@ By default we are using CPU for training, if you want to train with GPU, the ste
 To make sure the GPU can be used from inside the container, please install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Then run:

 ```bash
-nvidia-docker run -d -p 8888:8888 paddlepaddle/book:0.10.0-gpu
+nvidia-docker run -d -p 8888:8888 paddlepaddle/book:latest-gpu
+
 ```

 Or you can use the image registry mirror in China:

 ```bash
-nvidia-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:0.10.0-gpu
+nvidia-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:latest-gpu
+
 ```

 Change the code in the chapter that you are reading from