Commit b17e5cbc, authored by chengduoZH

fix conflict

@@ -11,7 +11,7 @@ cur_path="$(cd "$(dirname "$0")" && pwd -P)"
cd $cur_path/../
#convert md to ipynb
-for file in */{README,README\.en}.md ; do
+for file in */{README,README\.cn}.md ; do
~/go/bin/markdown-to-ipynb < $file > ${file%.*}".ipynb"
if [ $? -ne 0 ]; then
echo >&2 "markdown-to-ipynb $file error"
@@ -24,7 +24,7 @@ if [[ -z $TEST_EMBEDDED_PYTHON_SCRIPTS ]]; then
fi
#exec ipynb's py file
-for file in */{README,README\.en}.ipynb ; do
+for file in */{README,README\.cn}.ipynb ; do
pushd $PWD > /dev/null
cd $(dirname $file) > /dev/null
......
@@ -4,6 +4,6 @@ set -xe
cd /book
#convert md to ipynb
-for file in */{README,README\.en}.md ; do
+for file in */{README,README\.cn}.md ; do
notedown $file > ${file%.*}.ipynb
done
# Linear Regression
Let us begin this tutorial with the classic Linear Regression \[[1](#参考文献)\] model. In this chapter you will build a house-price prediction model from a real dataset and learn several important machine-learning concepts along the way.
-The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/137.html).
## Background
Given a dataset of size $n$, ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, that is
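Editor's note (not part of the commit): the linear combination referred to above is the standard linear-regression form, with learnable weights $\omega_1, \ldots, \omega_d$ and bias $b$:

$$y_i = \omega_1 x_{i1} + \omega_2 x_{i2} + \ldots + \omega_d x_{id} + b, \qquad i = 1, \ldots, n$$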
@@ -200,6 +200,11 @@ def event_handler_plot(event):
cost_ploter.plot()
step += 1
if isinstance(event, paddle.event.EndPass):
if event.pass_id % 10 == 0:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
```
### Start Training
@@ -217,6 +222,37 @@ trainer.train(
![png](./image/train_and_test.png)
### Apply the Model
#### 1. Generate test data
```python
test_data_creator = paddle.dataset.uci_housing.test()
test_data = []
test_label = []
for item in test_data_creator():
test_data.append((item[0],))
test_label.append(item[1])
if len(test_data) == 5:
break
```
#### 2. Inference
```python
# load parameters from tar file.
# users can remove the comments and change the model name
# with open('params_pass_20.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
output_layer=y_predict, parameters=parameters, input=test_data)
for i in xrange(len(probs)):
print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
```
## Summary
In this chapter, using the Boston house-price dataset, we introduced the basic concepts of the linear regression model and how to train and test it with PaddlePaddle. Many models and techniques evolve from simple linear regression, so it is important to understand the principles and limitations of linear models.
......
@@ -205,6 +205,11 @@ def event_handler_plot(event):
plot_cost.plot()
step += 1
if isinstance(event, paddle.event.EndPass):
if event.pass_id % 10 == 0:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
```
### Start Training
@@ -222,6 +227,37 @@ trainer.train(
![png](./image/train_and_test.png)
### Apply model
#### 1. generate testing data
```python
test_data_creator = paddle.dataset.uci_housing.test()
test_data = []
test_label = []
for item in test_data_creator():
test_data.append((item[0],))
test_label.append(item[1])
if len(test_data) == 5:
break
```
#### 2. inference
```python
# load parameters from tar file.
# users can remove the comments and change the model name
# with open('params_pass_20.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
output_layer=y_predict, parameters=parameters, input=test_data)
for i in xrange(len(probs)):
print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
```
## Summary
This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
......
01.fit_a_line/image/ranges.png (binary image changed: 27.2 KB → 6.5 KB)
@@ -43,7 +43,7 @@
# Linear Regression
Let us begin this tutorial with the classic Linear Regression \[[1](#参考文献)\] model. In this chapter you will build a house-price prediction model from a real dataset and learn several important machine-learning concepts along the way.
-The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/137.html).
## Background
Given a dataset of size $n$, ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, that is
@@ -242,6 +242,11 @@ def event_handler_plot(event):
cost_ploter.plot()
step += 1
if isinstance(event, paddle.event.EndPass):
if event.pass_id % 10 == 0:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
```
### Start Training
@@ -259,6 +264,37 @@ trainer.train(
![png](./image/train_and_test.png)
### Apply the Model
#### 1. Generate test data
```python
test_data_creator = paddle.dataset.uci_housing.test()
test_data = []
test_label = []
for item in test_data_creator():
test_data.append((item[0],))
test_label.append(item[1])
if len(test_data) == 5:
break
```
#### 2. Inference
```python
# load parameters from tar file.
# users can remove the comments and change the model name
# with open('params_pass_20.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
output_layer=y_predict, parameters=parameters, input=test_data)
for i in xrange(len(probs)):
print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
```
## Summary
In this chapter, using the Boston house-price dataset, we introduced the basic concepts of the linear regression model and how to train and test it with PaddlePaddle. Many models and techniques evolve from simple linear regression, so it is important to understand the principles and limitations of linear models.
......
@@ -247,6 +247,11 @@ def event_handler_plot(event):
plot_cost.plot()
step += 1
if isinstance(event, paddle.event.EndPass):
if event.pass_id % 10 == 0:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
```
### Start Training
@@ -264,6 +269,37 @@ trainer.train(
![png](./image/train_and_test.png)
### Apply model
#### 1. generate testing data
```python
test_data_creator = paddle.dataset.uci_housing.test()
test_data = []
test_label = []
for item in test_data_creator():
test_data.append((item[0],))
test_label.append(item[1])
if len(test_data) == 5:
break
```
#### 2. inference
```python
# load parameters from tar file.
# users can remove the comments and change the model name
# with open('params_pass_20.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
output_layer=y_predict, parameters=parameters, input=test_data)
for i in xrange(len(probs)):
print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
```
## Summary
This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
......
@@ -31,6 +31,9 @@ def main():
event.pass_id, event.batch_id, event.cost)
if isinstance(event, paddle.event.EndPass):
if event.pass_id % 10 == 0:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(
reader=paddle.batch(uci_housing.test(), batch_size=2),
feeding=feeding)
@@ -45,6 +48,28 @@ def main():
event_handler=event_handler,
num_passes=30)
# inference
test_data_creator = paddle.dataset.uci_housing.test()
test_data = []
test_label = []
for item in test_data_creator():
test_data.append((item[0], ))
test_label.append(item[1])
if len(test_data) == 5:
break
# load parameters from tar file.
# users can remove the comments and change the model name
# with open('params_pass_20.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
output_layer=y_predict, parameters=parameters, input=test_data)
for i in xrange(len(probs)):
print "label=" + str(test_label[i][0]) + ", predict=" + str(probs[i][0])
if __name__ == '__main__':
main()
# Recognize Digits
-The source code for this tutorial is in [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/167.html).
## Background
When we learn to program, the first program we write usually prints "Hello World". The introductory tutorial for machine learning (or deep learning) is, in the same spirit, usually handwriting recognition on the [MNIST](http://yann.lecun.com/exdb/mnist/) database: handwriting recognition is a typical and fairly simple image-classification problem, and the MNIST dataset is very complete. As a simple computer-vision dataset, MNIST contains a series of handwritten-digit images and corresponding labels, as shown in Figure 1. Each image is a 28x28 pixel matrix and each label is one of the ten digits 0-9; every image has been size-normalized and centered.
@@ -132,7 +132,6 @@ PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mni
First, load PaddlePaddle's V2 API package.
```python
-import gzip
import paddle.v2 as paddle
```
Next, define three different classifiers:
@@ -256,7 +255,7 @@ def event_handler_plot(event):
step += 1
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
@@ -275,7 +274,7 @@ def event_handler(event):
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
......
@@ -131,7 +131,6 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
A PaddlePaddle program starts from importing the API package:
```python
-import gzip
import paddle.v2 as paddle
```
@@ -251,7 +250,7 @@ def event_handler_plot(event):
step += 1
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
@@ -271,7 +270,7 @@ def event_handler(event):
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
......
@@ -42,7 +42,7 @@
<div id="markdown" style='display:none'>
# Recognize Digits
-The source code for this tutorial is in [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/167.html).
## Background
When we learn to program, the first program we write usually prints "Hello World". The introductory tutorial for machine learning (or deep learning) is, in the same spirit, usually handwriting recognition on the [MNIST](http://yann.lecun.com/exdb/mnist/) database: handwriting recognition is a typical and fairly simple image-classification problem, and the MNIST dataset is very complete. As a simple computer-vision dataset, MNIST contains a series of handwritten-digit images and corresponding labels, as shown in Figure 1. Each image is a 28x28 pixel matrix and each label is one of the ten digits 0-9; every image has been size-normalized and centered.
@@ -174,7 +174,6 @@ PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mni
First, load PaddlePaddle's V2 API package.
```python
-import gzip
import paddle.v2 as paddle
```
Next, define three different classifiers:
@@ -298,7 +297,7 @@ def event_handler_plot(event):
step += 1
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
@@ -317,7 +316,7 @@ def event_handler(event):
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
......
@@ -173,7 +173,6 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
A PaddlePaddle program starts from importing the API package:
```python
-import gzip
import paddle.v2 as paddle
```
@@ -293,7 +292,7 @@ def event_handler_plot(event):
step += 1
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
@@ -313,7 +312,7 @@ def event_handler(event):
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
......
-import gzip
import os
from PIL import Image
import numpy as np
@@ -85,7 +84,7 @@ def main():
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=paddle.batch(
......
# Image Classification
-The source code for this tutorial is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/168.html).
## Background
@@ -156,7 +156,6 @@ Paddle API提供了自动加载cifar数据集模块 `paddle.dataset.cifar`。
```python
import sys
-import gzip
import paddle.v2 as paddle
from vgg import vgg_bn_drop
from resnet import resnet_cifar10
@@ -431,7 +430,7 @@ def event_handler(event):
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(
@@ -496,9 +495,9 @@ def load_image(file):
test_data = []
cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
......
@@ -169,7 +169,6 @@ We must import and initialize PaddlePaddle (enable/disable GPU, set the number o
```python
import sys
-import gzip
import paddle.v2 as paddle
from vgg import vgg_bn_drop
from resnet import resnet_cifar10
@@ -438,7 +437,7 @@ def event_handler(event):
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(
@@ -505,10 +504,10 @@ def load_image(file):
return im
test_data = []
cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))
# users can remove the comments and change the model name
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
......
@@ -42,7 +42,7 @@
<div id="markdown" style='display:none'>
# Image Classification
-The source code for this tutorial is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/168.html).
## Background
@@ -198,7 +198,6 @@ Paddle API提供了自动加载cifar数据集模块 `paddle.dataset.cifar`。
```python
import sys
-import gzip
import paddle.v2 as paddle
from vgg import vgg_bn_drop
from resnet import resnet_cifar10
@@ -473,7 +472,7 @@ def event_handler(event):
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(
@@ -538,9 +537,9 @@ def load_image(file):
test_data = []
cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
......
@@ -211,7 +211,6 @@ We must import and initialize PaddlePaddle (enable/disable GPU, set the number o
```python
import sys
-import gzip
import paddle.v2 as paddle
from vgg import vgg_bn_drop
from resnet import resnet_cifar10
@@ -480,7 +479,7 @@ def event_handler(event):
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(
@@ -547,10 +546,10 @@ def load_image(file):
return im
test_data = []
cur_dir = os.getcwd()
-test_data.append((load_image(cur_dir + '/image/dog.png'),)
+test_data.append((load_image(cur_dir + '/image/dog.png'),))
# users can remove the comments and change the model name
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
......
@@ -13,7 +13,6 @@
# limitations under the License
import sys
-import gzip
import paddle.v2 as paddle
@@ -67,7 +66,7 @@ def main():
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(
@@ -116,7 +115,7 @@ def main():
test_data.append((load_image(cur_dir + '/image/dog.png'), ))
# users can remove the comments and change the model name
-# with gzip.open('params_pass_50.tar.gz', 'r') as f:
+# with open('params_pass_50.tar', 'r') as f:
# parameters = paddle.parameters.Parameters.from_tar(f)
probs = paddle.infer(
......
# Word Vectors
-The source code for this tutorial is in [book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/175.html).
## Background
@@ -302,8 +302,6 @@ trainer = paddle.trainer.SGD(cost, parameters, adagrad)
`paddle.batch` takes a reader as input and outputs a batched reader: in PaddlePaddle, a reader yields one training instance at a time, while a batched reader yields one minibatch at a time.
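Editor's note (not part of the commit): as a minimal sketch of this reader convention (the toy `reader` below is hypothetical, not taken from the tutorial), a reader and its batched wrapper look like this:

```python
import paddle.v2 as paddle

def reader():
    # a reader is a generator: it yields one training instance at a time
    for i in range(10):
        yield [float(i)], 2.0 * float(i)

# paddle.batch wraps the reader; the wrapped reader yields one minibatch
# (a list of up to 32 instances) per iteration, as the trainer expects
batched_reader = paddle.batch(reader, 32)
```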
```python
-import gzip
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
@@ -315,7 +313,7 @@ def event_handler(event):
paddle.batch(
paddle.dataset.imikolov.test(word_dict, N), 32))
print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+with open("model_%d.tar"%event.pass_id, 'w') as f:
parameters.to_tar(f)
trainer.train(
......
@@ -313,8 +313,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
```python
-import gzip
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
@@ -326,7 +324,7 @@ def event_handler(event):
paddle.batch(
paddle.dataset.imikolov.test(word_dict, N), 32))
print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+with open("model_%d.tar"%event.pass_id, 'w') as f:
parameters.to_tar(f)
trainer.train(
......
@@ -43,7 +43,7 @@
# Word Vectors
-The source code for this tutorial is in [book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/175.html).
## Background
@@ -344,8 +344,6 @@ trainer = paddle.trainer.SGD(cost, parameters, adagrad)
`paddle.batch` takes a reader as input and outputs a batched reader: in PaddlePaddle, a reader yields one training instance at a time, while a batched reader yields one minibatch at a time.
```python
-import gzip
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
@@ -357,7 +355,7 @@ def event_handler(event):
paddle.batch(
paddle.dataset.imikolov.test(word_dict, N), 32))
print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+with open("model_%d.tar"%event.pass_id, 'w') as f:
parameters.to_tar(f)
trainer.train(
......
@@ -355,8 +355,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
```python
-import gzip
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
@@ -368,7 +366,7 @@ def event_handler(event):
paddle.batch(
paddle.dataset.imikolov.test(word_dict, N), 32))
print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
-with gzip.open("model_%d.tar.gz"%event.pass_id, 'w') as f:
+with open("model_%d.tar"%event.pass_id, 'w') as f:
parameters.to_tar(f)
trainer.train(
......
# Personalized Recommendation
-The source code for this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/176.html).
## Background
@@ -45,7 +45,7 @@ YouTube是世界上最大的视频上传、分享和发现网站,YouTube推荐
The candidate-generation network models recommendation as multi-class classification with an extremely large number of classes: for a YouTube user, it takes the watch history (video IDs), search tokens, demographic information (such as geographic location and login device), binary features (such as gender and whether the user is logged in) and continuous features (such as age), classifies over every video in the library to obtain a score for each class (i.e. a recommendation probability for each video), and finally outputs the few hundred videos with the highest probabilities.
-First, historical information such as the watch history and search tokens is mapped to vectors and averaged to obtain a fixed-length representation; demographic features are fed in as well to improve recommendations for new users, and binary and continuous features are normalized to the range [0, 1]. Next, all feature representations are concatenated into one vector and fed into a non-linear multilayer perceptron (MLP; see the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.md) tutorial). Finally, during training the MLP output is fed to a softmax for classification, while at prediction time the similarity between the user's combined feature (the MLP output) and every video is computed and the $k$ highest-scoring videos are taken as the result of the candidate-generation network. Figure 2 shows the structure of the candidate-generation network.
+First, historical information such as the watch history and search tokens is mapped to vectors and averaged to obtain a fixed-length representation; demographic features are fed in as well to improve recommendations for new users, and binary and continuous features are normalized to the range [0, 1]. Next, all feature representations are concatenated into one vector and fed into a non-linear multilayer perceptron (MLP; see the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md) tutorial). Finally, during training the MLP output is fed to a softmax for classification, while at prediction time the similarity between the user's combined feature (the MLP output) and every video is computed and the $k$ highest-scoring videos are taken as the result of the candidate-generation network. Figure 2 shows the structure of the candidate-generation network.
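Editor's note (not part of the commit): the "map to vectors, then average" step described above is simply a mean over embedding rows; a minimal numpy sketch with made-up sizes (`embedding` and `watch_history` are hypothetical):

```python
import numpy as np

np.random.seed(0)
embedding = np.random.rand(10000, 32)   # hypothetical table: 10,000 video IDs, 32-dim vectors
watch_history = [42, 7, 314, 2718]      # hypothetical IDs of videos a user watched

# fixed-length user-history feature: the mean of the watched videos' embeddings
history_feature = embedding[watch_history].mean(axis=0)  # shape: (32,)
```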
<p align="center">
<img src="image/Deep_candidate_generation_model_architecture.png" width="70%" ><br/>
......
@@ -42,7 +42,7 @@
<div id="markdown" style='display:none'>
# Personalized Recommendation
-The source code for this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/176.html).
## Background
@@ -87,7 +87,7 @@ YouTube是世界上最大的视频上传、分享和发现网站,YouTube推荐
The candidate-generation network models recommendation as multi-class classification with an extremely large number of classes: for a YouTube user, it takes the watch history (video IDs), search tokens, demographic information (such as geographic location and login device), binary features (such as gender and whether the user is logged in) and continuous features (such as age), classifies over every video in the library to obtain a score for each class (i.e. a recommendation probability for each video), and finally outputs the few hundred videos with the highest probabilities.
-First, historical information such as the watch history and search tokens is mapped to vectors and averaged to obtain a fixed-length representation; demographic features are fed in as well to improve recommendations for new users, and binary and continuous features are normalized to the range [0, 1]. Next, all feature representations are concatenated into one vector and fed into a non-linear multilayer perceptron (MLP; see the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.md) tutorial). Finally, during training the MLP output is fed to a softmax for classification, while at prediction time the similarity between the user's combined feature (the MLP output) and every video is computed and the $k$ highest-scoring videos are taken as the result of the candidate-generation network. Figure 2 shows the structure of the candidate-generation network.
+First, historical information such as the watch history and search tokens is mapped to vectors and averaged to obtain a fixed-length representation; demographic features are fed in as well to improve recommendations for new users, and binary and continuous features are normalized to the range [0, 1]. Next, all feature representations are concatenated into one vector and fed into a non-linear multilayer perceptron (MLP; see the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md) tutorial). Finally, during training the MLP output is fed to a softmax for classification, while at prediction time the similarity between the user's combined feature (the MLP output) and every video is computed and the $k$ highest-scoring videos are taken as the result of the candidate-generation network. Figure 2 shows the structure of the candidate-generation network.
<p align="center">
<img src="image/Deep_candidate_generation_model_architecture.png" width="70%" ><br/>
......
# Sentiment Analysis
-The source code for this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/177.html).
## Background
@@ -26,7 +26,7 @@
### A Brief Introduction to Convolutional Neural Networks for Text (CNN)
We described the computation of convolutional neural networks applied to text data in the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) chapter; here we give a brief review.
For the convolutional network, the input sequence of word vectors is first processed by a convolution to produce a feature map; max pooling over time is then applied to the feature map to obtain a single whole-sentence feature for that convolution kernel; finally, the features from all kernels are concatenated into a fixed-length vector representation of the text, which for text classification is connected to a softmax to complete the model. In practice we use multiple convolution kernels to process a sentence, and kernels with the same window size are stacked into a matrix so the computation can be carried out more efficiently; kernels with different window sizes can also be used, and Figure 3 of the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) chapter illustrates this with four kernels, where different colors denote kernels of different sizes.
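Editor's note (not part of the commit): the max-pooling-over-time step described above is just a column-wise maximum over the convolution outputs; a minimal numpy sketch with hypothetical shapes:

```python
import numpy as np

np.random.seed(0)
# hypothetical convolution output: one row per window position, one column per kernel
conv_output = np.random.randn(20, 128)      # 20 positions, 128 kernels

# max pooling over time keeps each kernel's strongest response anywhere in the sentence
sentence_feature = conv_output.max(axis=0)  # shape: (128,)
```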
......
@@ -42,7 +42,7 @@
<div id="markdown" style='display:none'>
# Sentiment Analysis
-The source code for this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/177.html).
## Background
@@ -68,7 +68,7 @@
### A Brief Introduction to Convolutional Neural Networks for Text (CNN)
We described the computation of convolutional neural networks applied to text data in the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) chapter; here we give a brief review.
For the convolutional network, the input sequence of word vectors is first processed by a convolution to produce a feature map; max pooling over time is then applied to the feature map to obtain a single whole-sentence feature for that convolution kernel; finally, the features from all kernels are concatenated into a fixed-length vector representation of the text, which for text classification is connected to a softmax to complete the model. In practice we use multiple convolution kernels to process a sentence, and kernels with the same window size are stacked into a matrix so the computation can be carried out more efficiently; kernels with different window sizes can also be used, and Figure 3 of the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) chapter illustrates this with four kernels, where different colors denote kernels of different sizes.
......
# Semantic Role Labeling
-The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/178.html).
## Background
@@ -68,7 +68,7 @@ $$\mbox{[小明]}_{\mbox{Agent}}\mbox{[昨天]}_{\mbox{Time}}\mbox{[晚上]}_\mb
Figure 4. Bidirectional recurrent neural network based on LSTM
</p>
-Note that this bidirectional RNN structure differs from the bidirectional RNN that Bengio et al. used for machine translation \[[3](#参考文献), [4](#参考文献)\]; we will introduce another kind of bidirectional recurrent network in the later [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.md) chapter.
+Note that this bidirectional RNN structure differs from the bidirectional RNN that Bengio et al. used for machine translation \[[3](#参考文献), [4](#参考文献)\]; we will introduce another kind of bidirectional recurrent network in the later [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.cn.md) chapter.
### Conditional Random Field (CRF)
@@ -182,14 +182,13 @@ conll05st-release/
| predicate_dict | dictionary of predicates, 3,162 words in total |
| emb | a pre-trained embedding table, 32-dimensional |
-We trained a language model on English Wikipedia and use the resulting word vectors to initialize the SRL model; during SRL training the word vectors are no longer updated. For language models and word vectors, see the [Word Vectors](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.md) tutorial. The corpus used to train the language model has 995,000,000 tokens, and the dictionary size is limited to 4900,000 words. 5% of the words in the CoNLL 2005 training data are not among these 4900,000 words; we treat them all as out-of-vocabulary words, denoted `<unk>`.
+We trained a language model on English Wikipedia and use the resulting word vectors to initialize the SRL model; during SRL training the word vectors are no longer updated. For language models and word vectors, see the [Word Vectors](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.cn.md) tutorial. The corpus used to train the language model has 995,000,000 tokens, and the dictionary size is limited to 4900,000 words. 5% of the words in the CoNLL 2005 training data are not among these 4900,000 words; we treat them all as out-of-vocabulary words, denoted `<unk>`.
Fetch the dictionary and print its size:
```python
import math
import numpy as np
-import gzip
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
@@ -448,7 +447,7 @@ def event_handler(event):
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
......
@@ -87,7 +87,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
Fig 4. Bidirectional LSTMs
</p>
-Note that this bidirectional RNN is different from the one proposed by Bengio et al. for machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNN in the later [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md) chapter.
+Note that this bidirectional RNN is different from the one proposed by Bengio et al. for machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNN in the later [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) chapter.
### Conditional Random Field (CRF)
@@ -118,7 +118,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
$$\DeclareMathOperator*{\argmax}{arg\,max} L(\lambda, D) = - \text{log}\left(\prod_{m=1}^{N}p(Y_m|X_m, W)\right) + C \frac{1}{2}\lVert W\rVert^{2}$$
-This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
+This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm)).
### Deep Bidirectional LSTM (DB-LSTM) SRL model
@@ -211,7 +211,6 @@ Here we fetch the dictionary, and print its size:
```python
import math
import numpy as np
-import gzip
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
@@ -466,7 +465,7 @@ def event_handler(event):
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
......
@@ -42,7 +42,7 @@
<div id="markdown" style='display:none'>
# Semantic Role Labeling
-The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/07.label_semantic_roles); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/178.html).
## Background
@@ -110,7 +110,7 @@ $$\mbox{[小明]}_{\mbox{Agent}}\mbox{[昨天]}_{\mbox{Time}}\mbox{[晚上]}_\mb
Figure 4. Bidirectional recurrent neural network based on LSTM
</p>
-Note that this bidirectional RNN structure differs from the bidirectional RNN that Bengio et al. used for machine translation \[[3](#参考文献), [4](#参考文献)\]; we will introduce another kind of bidirectional recurrent network in the later [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.md) chapter.
+Note that this bidirectional RNN structure differs from the bidirectional RNN that Bengio et al. used for machine translation \[[3](#参考文献), [4](#参考文献)\]; we will introduce another kind of bidirectional recurrent network in the later [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.cn.md) chapter.
### Conditional Random Field (CRF)
@@ -224,14 +224,13 @@ conll05st-release/
| predicate_dict | dictionary of predicates, 3,162 words in total |
| emb | a pre-trained embedding table, 32-dimensional |
-We trained a language model on English Wikipedia and use the resulting word vectors to initialize the SRL model; during SRL training the word vectors are no longer updated. For language models and word vectors, see the [Word Vectors](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.md) tutorial. The corpus used to train the language model has 995,000,000 tokens, and the dictionary size is limited to 4900,000 words. 5% of the words in the CoNLL 2005 training data are not among these 4900,000 words; we treat them all as out-of-vocabulary words, denoted `<unk>`.
+We trained a language model on English Wikipedia and use the resulting word vectors to initialize the SRL model; during SRL training the word vectors are no longer updated. For language models and word vectors, see the [Word Vectors](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.cn.md) tutorial. The corpus used to train the language model has 995,000,000 tokens, and the dictionary size is limited to 4900,000 words. 5% of the words in the CoNLL 2005 training data are not among these 4900,000 words; we treat them all as out-of-vocabulary words, denoted `<unk>`.
Fetch the dictionary and print its size:
```python
import math
import numpy as np
-import gzip
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
@@ -490,7 +489,7 @@ def event_handler(event):
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
......
@@ -129,7 +129,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
Fig 4. Bidirectional LSTMs
</p>
-Note that this bidirectional RNN is different from the one proposed by Bengio et al. for machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNN in the later [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md) chapter.
+Note that this bidirectional RNN is different from the one proposed by Bengio et al. for machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNN in the later [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) chapter.
### Conditional Random Field (CRF)
@@ -160,7 +160,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
$$\DeclareMathOperator*{\argmax}{arg\,max} L(\lambda, D) = - \text{log}\left(\prod_{m=1}^{N}p(Y_m|X_m, W)\right) + C \frac{1}{2}\lVert W\rVert^{2}$$
-This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
+This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm)).
### Deep Bidirectional LSTM (DB-LSTM) SRL model
@@ -253,7 +253,6 @@ Here we fetch the dictionary, and print its size:
```python
import math
import numpy as np
-import gzip
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
@@ -508,7 +507,7 @@ def event_handler(event):
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
......
import math
import numpy as np
-import gzip
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
@@ -183,7 +182,7 @@ def main():
if isinstance(event, paddle.event.EndPass):
# save parameters
-with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
+with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
......
# Machine Translation
-The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
+The source code for this tutorial is in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation); first-time users should consult the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); for more material, see this tutorial's [video lessons](http://bit.baidu.com/course/detail/id/179.html).
## Background
@@ -39,7 +39,7 @@
### GRU
-We already introduced recurrent neural networks (RNN) and long short-term memory networks (LSTM) in the [Sentiment Analysis](https://github.com/PaddlePaddle/book/blob/develop/understand_sentiment/README.md) chapter. Compared with the simple RNN, the LSTM adds a memory cell, an input gate, a forget gate and an output gate; these gates together with the memory cell greatly improve the RNN's ability to handle long-range dependencies.
+We already introduced recurrent neural networks (RNN) and long short-term memory networks (LSTM) in the [Sentiment Analysis](https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/README.cn.md) chapter. Compared with the simple RNN, the LSTM adds a memory cell, an input gate, a forget gate and an output gate; these gates together with the memory cell greatly improve the RNN's ability to handle long-range dependencies.
GRU \[[2](#参考文献)\] is a simplified version of the LSTM proposed by Cho et al., and is likewise an extension of the RNN, as shown in the figure below. A GRU unit has only two gates:
- reset gate: if the reset gate is closed, historical information is ignored, i.e. irrelevant history will not affect future output (the standard equations are restated in the editor's note after this list).
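Editor's note (not part of the commit): for reference, the standard GRU equations of \[[2](#参考文献)\], with reset gate $r_t$, update gate $z_t$, input $x_t$ and hidden state $h_t$ (bias terms omitted), are

$$r_t = \sigma(W_r x_t + U_r h_{t-1}),\qquad z_t = \sigma(W_z x_t + U_z h_{t-1})$$

$$\tilde{h}_t = \tanh(W x_t + U(r_t \odot h_{t-1})),\qquad h_t = (1 - z_t)\odot h_{t-1} + z_t \odot \tilde{h}_t$$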
@@ -53,7 +53,7 @@ GRU\[[2](#参考文献)\]是Cho等人在LSTM上提出的简化版本,也是RNN
### Bidirectional Recurrent Neural Network
-We already introduced one kind of bidirectional recurrent neural network in the [Semantic Role Labeling](https://github.com/PaddlePaddle/book/blob/develop/label_semantic_roles/README.md) chapter; here we introduce another structure, proposed by Bengio's group in \[[2](#参考文献),[4](#参考文献)\]. Its purpose is, given an input sequence, to obtain a feature representation at every time step, i.e. each output step is a fixed-length vector that represents the contextual semantic information up to that moment.
+We already introduced one kind of bidirectional recurrent neural network in the [Semantic Role Labeling](https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/README.cn.md) chapter; here we introduce another structure, proposed by Bengio's group in \[[2](#参考文献),[4](#参考文献)\]. Its purpose is, given an input sequence, to obtain a feature representation at every time step, i.e. each output step is a fixed-length vector that represents the contextual semantic information up to that moment.
Concretely, this bidirectional network processes the input sequence along the time dimension in both orders, forward and backward, and concatenates the RNN output of each time step into the final output layer, so that the output node at every time step contains the complete past and future context of the input sequence at that moment. The figure below shows a bidirectional recurrent network unrolled over time. It contains one forward and one backward RNN with six weight matrices: from the input to the forward and backward hidden layers ($W_1, W_3$), from the hidden layers to themselves ($W_2, W_5$), and from the forward and backward hidden layers to the output layer ($W_4, W_6$). Note that there is no connection between the forward hidden layer and the backward hidden layer.
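Editor's note (not part of the commit): with the six weight matrices named above, one consistent way to write the recurrence described in this paragraph is (bias terms omitted; $f$ and $g$ denote the hidden-layer and output-layer activations)

$$\overrightarrow{h}_t = f(W_1 x_t + W_2 \overrightarrow{h}_{t-1}),\qquad \overleftarrow{h}_t = f(W_3 x_t + W_5 \overleftarrow{h}_{t+1})$$

$$y_t = g(W_4 \overrightarrow{h}_t + W_6 \overleftarrow{h}_t)$$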
@@ -209,9 +209,7 @@ is_generating = False

```python
src_embedding = paddle.layer.embedding(
    input=src_word_id, size=word_vector_dim)
```
- Encode the source-language sequence with a bi-directional GRU and concatenate the two GRU outputs to obtain $\mathbf{h}$ (see the encoder sketch right after this item).
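A minimal sketch of that bi-directional encoder, assembled from the layers this chapter's `train.py` uses; the `reverse=True` flag on the backward GRU and the `encoder_size` variable are assumptions based on the surrounding configuration and are not shown verbatim in this diff:

```python
src_forward = paddle.networks.simple_gru(
    input=src_embedding, size=encoder_size)
src_backward = paddle.networks.simple_gru(
    input=src_embedding, size=encoder_size, reverse=True)
# concatenate the forward and backward encodings to obtain h
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
```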
@@ -228,8 +226,10 @@ is_generating = False

- Pass the encoding of the source-language sequence (the last step above) through a feed-forward neural network to obtain its projection.

```python
encoded_proj = paddle.layer.fc(
    act=paddle.activation.Linear(),
    size=decoder_size,
    bias_attr=False,
    input=encoded_vector)
```
@@ -237,9 +237,10 @@ is_generating = False

```python
backward_first = paddle.layer.first_seq(input=src_backward)

decoder_boot = paddle.layer.fc(
    size=decoder_size,
    act=paddle.activation.Tanh(),
    bias_attr=False,
    input=backward_first)
```
@@ -252,7 +253,6 @@ is_generating = False

```python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):

    decoder_mem = paddle.layer.memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -261,10 +261,13 @@ is_generating = False
        encoded_proj=enc_proj,
        decoder_state=decoder_mem)

    decoder_inputs = paddle.layer.fc(
        act=paddle.activation.Linear(),
        size=decoder_size * 3,
        bias_attr=False,
        input=[context, current_word],
        layer_attr=paddle.attr.ExtraLayerAttribute(
            error_clipping_threshold=100.0))

    gru_step = paddle.layer.gru_step(
        name='gru_decoder',

@@ -272,11 +275,11 @@ is_generating = False
        output_mem=decoder_mem,
        size=decoder_size)

    out = paddle.layer.mixed(
        size=target_dict_dim,
        bias_attr=True,
        act=paddle.activation.Softmax(),
        input=paddle.layer.full_matrix_projection(input=gru_step))
    return out
```
@@ -284,8 +287,8 @@ is_generating = False

```python
decoder_group_name = "decoder_group"
group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2]
```
@@ -330,15 +333,14 @@ is_generating = False

```python
if is_generating:
    # In generation, the decoder predicts a next target word based on
    # the encoded source sequence and the previous generated target word.
    # The encoded source sequence (encoder's output) must be specified by
    # StaticInput, which is a read-only memory.
    # Embedding of the previous generated word is automatically retrieved
    # by GeneratedInputs initialized by a start mark <s>.

    trg_embedding = paddle.layer.GeneratedInput(
        size=target_dict_dim,
        embedding_name='_target_language_embedding',
        embedding_size=word_vector_dim)
```
@@ -467,36 +469,31 @@ is_generating = False

```python
if is_generating:
    # load the dictionary
    src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

    gen_sen_idx = np.where(beam_result[1] == -1)[0]
    assert len(gen_sen_idx) == len(gen_data) * beam_size

    # -1 is the delimiter of generated sequences.
    # the first element of each generated sequence is its length.
    start_pos, end_pos = 1, 0
    for i, sample in enumerate(gen_data):
        print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
        for j in xrange(beam_size):
            end_pos = gen_sen_idx[i * beam_size + j]
            print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
                trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
            start_pos = end_pos + 2
        print("\n")
```
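To make the indexing above concrete, here is a tiny made-up example of what `beam_result` could look like for one source sentence with `beam_size = 2`; the numbers are illustrative only, not real model output:

```python
import numpy as np

beam_size = 2
# beam_result[0]: per-sample scores, one row per source sentence
# beam_result[1]: flat id stream; each candidate starts with a length slot
#                 (per the comment above) and is terminated by -1
beam_result = (
    np.array([[-3.2, -4.7]]),
    np.array([3, 101, 205, 9, -1, 3, 101, 207, 9, -1]),
)
gen_sen_idx = np.where(beam_result[1] == -1)[0]  # -> [4, 9]
start_pos, end_pos = 1, 0
for j in xrange(beam_size):
    end_pos = gen_sen_idx[j]
    # slices out [101, 205, 9] and then [101, 207, 9]
    print("%.4f -> %s" % (beam_result[0][0][j],
                          list(beam_result[1][start_pos:end_pos])))
    start_pos = end_pos + 2
```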
After generation starts, the log output looks like this:

```text
Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
-19.0196	The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-19.1131	The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-19.5129	The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
```
## Summary
......
@@ -245,9 +245,7 @@ is_generating = False

```python
src_embedding = paddle.layer.embedding(
    input=src_word_id, size=word_vector_dim)
```

- Use a bi-directional GRU to encode the source language sequence, and concatenate the encoding outputs from the two GRUs to get $\mathbf{h}$

@@ -265,8 +263,10 @@ is_generating = False

- Get a projection of the encoding (c.f. 2.3) of the source language sequence by passing it into a feed forward neural network

```python
encoded_proj = paddle.layer.fc(
    act=paddle.activation.Linear(),
    size=decoder_size,
    bias_attr=False,
    input=encoded_vector)
```
@@ -274,9 +274,10 @@ is_generating = False

```python
backward_first = paddle.layer.first_seq(input=src_backward)

decoder_boot = paddle.layer.fc(
    size=decoder_size,
    act=paddle.activation.Tanh(),
    bias_attr=False,
    input=backward_first)
```
@@ -290,7 +291,6 @@ is_generating = False

```python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):

    decoder_mem = paddle.layer.memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -299,10 +299,13 @@ is_generating = False
        encoded_proj=enc_proj,
        decoder_state=decoder_mem)

    decoder_inputs = paddle.layer.fc(
        act=paddle.activation.Linear(),
        size=decoder_size * 3,
        bias_attr=False,
        input=[context, current_word],
        layer_attr=paddle.attr.ExtraLayerAttribute(
            error_clipping_threshold=100.0))

    gru_step = paddle.layer.gru_step(
        name='gru_decoder',

@@ -310,11 +313,11 @@ is_generating = False
        output_mem=decoder_mem,
        size=decoder_size)

    out = paddle.layer.fc(
        size=target_dict_dim,
        bias_attr=True,
        act=paddle.activation.Softmax(),
        input=gru_step)
    return out
```
@@ -322,8 +325,8 @@ is_generating = False

```python
decoder_group_name = "decoder_group"
group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2]
```
@@ -368,15 +371,14 @@ is_generating = False

```python
if is_generating:
    # In generation, the decoder predicts a next target word based on
    # the encoded source sequence and the previous generated target word.
    # The encoded source sequence (encoder's output) must be specified by
    # StaticInput, which is a read-only memory.
    # Embedding of the previous generated word is automatically retrieved
    # by GeneratedInputs initialized by a start mark <s>.

    trg_embedding = paddle.layer.GeneratedInput(
        size=target_dict_dim,
        embedding_name='_target_language_embedding',
        embedding_size=word_vector_dim)
```
@@ -503,36 +505,31 @@

```python
if is_generating:
    # load the dictionary
    src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

    gen_sen_idx = np.where(beam_result[1] == -1)[0]
    assert len(gen_sen_idx) == len(gen_data) * beam_size

    # -1 is the delimiter of generated sequences.
    # the first element of each generated sequence is its length.
    start_pos, end_pos = 1, 0
    for i, sample in enumerate(gen_data):
        print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
        for j in xrange(beam_size):
            end_pos = gen_sen_idx[i * beam_size + j]
            print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
                trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
            start_pos = end_pos + 2
        print("\n")
```
The generating log is as follows:

```text
Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
-19.0196	The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-19.1131	The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-19.5129	The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
```
## Summary
......
@@ -42,7 +42,7 @@

<div id="markdown" style='display:none'>

# Machine Translation

The source code for this tutorial lives in [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation). First-time users should start with the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书); more material is available in the tutorial's [video lessons](http://bit.baidu.com/course/detail/id/179.html).

## Background

@@ -81,7 +81,7 @@

### GRU

We already introduced recurrent neural networks (RNN) and long short-term memory networks (LSTM) in the [Sentiment Analysis](https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/README.cn.md) chapter. Compared with a plain RNN, an LSTM adds a memory cell, an input gate, a forget gate, and an output gate; together they greatly improve the RNN's ability to model long-range dependencies.

GRU \[[2](#参考文献)\] is a simplified variant proposed by Cho et al. on top of LSTM, and another extension of the RNN, as shown in the figure below. A GRU unit has only two gates:

- reset gate: when the reset gate is closed, the history is ignored, i.e. information that is irrelevant to the future does not affect the output.

@@ -95,7 +95,7 @@

### Bi-directional Recurrent Neural Network

We already introduced one bi-directional recurrent network in the [Semantic Role Labeling](https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/README.cn.md) chapter. Here we introduce another structure, proposed by Bengio's group in \[[2](#参考文献),[4](#参考文献)\]. Given an input sequence, its goal is to produce a feature representation for every time step, i.e. each output step is a fixed-length vector that summarizes the context up to that moment.

Concretely, the network processes the input sequence along the time dimension twice, in order and in reverse (forward and backward), and concatenates the two RNN outputs at each time step to form the final output layer, so every output node carries the complete past and future context of the current step. The figure below shows such a network unrolled over time. It contains one forward and one backward RNN and six weight matrices: from the input to the forward and backward hidden layers ($W_1, W_3$), from each hidden layer to itself ($W_2, W_5$), and from the forward and backward hidden layers to the output layer ($W_4, W_6$). Note that there is no connection between the forward and backward hidden layers.
@@ -251,9 +251,7 @@ is_generating = False

```python
src_embedding = paddle.layer.embedding(
    input=src_word_id, size=word_vector_dim)
```

- Encode the source-language sequence with a bi-directional GRU and concatenate the two GRU outputs to obtain $\mathbf{h}$.

@@ -270,8 +268,10 @@ is_generating = False

- Pass the encoding of the source-language sequence (the last step above) through a feed-forward neural network to obtain its projection.

```python
encoded_proj = paddle.layer.fc(
    act=paddle.activation.Linear(),
    size=decoder_size,
    bias_attr=False,
    input=encoded_vector)
```

@@ -279,9 +279,10 @@ is_generating = False

```python
backward_first = paddle.layer.first_seq(input=src_backward)

decoder_boot = paddle.layer.fc(
    size=decoder_size,
    act=paddle.activation.Tanh(),
    bias_attr=False,
    input=backward_first)
```
@@ -294,7 +295,6 @@ is_generating = False

```python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):

    decoder_mem = paddle.layer.memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

@@ -303,10 +303,13 @@ is_generating = False
        encoded_proj=enc_proj,
        decoder_state=decoder_mem)

    decoder_inputs = paddle.layer.fc(
        act=paddle.activation.Linear(),
        size=decoder_size * 3,
        bias_attr=False,
        input=[context, current_word],
        layer_attr=paddle.attr.ExtraLayerAttribute(
            error_clipping_threshold=100.0))

    gru_step = paddle.layer.gru_step(
        name='gru_decoder',

@@ -314,11 +317,11 @@ is_generating = False
        output_mem=decoder_mem,
        size=decoder_size)

    out = paddle.layer.mixed(
        size=target_dict_dim,
        bias_attr=True,
        act=paddle.activation.Softmax(),
        input=paddle.layer.full_matrix_projection(input=gru_step))
    return out
```

@@ -326,8 +329,8 @@ is_generating = False

```python
decoder_group_name = "decoder_group"
group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2]
```
@@ -372,15 +375,14 @@ is_generating = False

```python
if is_generating:
    # In generation, the decoder predicts a next target word based on
    # the encoded source sequence and the previous generated target word.
    # The encoded source sequence (encoder's output) must be specified by
    # StaticInput, which is a read-only memory.
    # Embedding of the previous generated word is automatically retrieved
    # by GeneratedInputs initialized by a start mark <s>.

    trg_embedding = paddle.layer.GeneratedInput(
        size=target_dict_dim,
        embedding_name='_target_language_embedding',
        embedding_size=word_vector_dim)
```

@@ -509,36 +511,31 @@ is_generating = False

```python
if is_generating:
    # load the dictionary
    src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

    gen_sen_idx = np.where(beam_result[1] == -1)[0]
    assert len(gen_sen_idx) == len(gen_data) * beam_size

    # -1 is the delimiter of generated sequences.
    # the first element of each generated sequence is its length.
    start_pos, end_pos = 1, 0
    for i, sample in enumerate(gen_data):
        print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
        for j in xrange(beam_size):
            end_pos = gen_sen_idx[i * beam_size + j]
            print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
                trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
            start_pos = end_pos + 2
        print("\n")
```
After generation starts, the log output looks like this:

```text
Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
-19.0196	The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-19.1131	The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-19.5129	The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
```

## Summary
......
@@ -287,9 +287,7 @@ is_generating = False

```python
src_embedding = paddle.layer.embedding(
    input=src_word_id, size=word_vector_dim)
```

- Use a bi-directional GRU to encode the source language sequence, and concatenate the encoding outputs from the two GRUs to get $\mathbf{h}$

@@ -307,8 +305,10 @@ is_generating = False

- Get a projection of the encoding (c.f. 2.3) of the source language sequence by passing it into a feed forward neural network

```python
encoded_proj = paddle.layer.fc(
    act=paddle.activation.Linear(),
    size=decoder_size,
    bias_attr=False,
    input=encoded_vector)
```

@@ -316,9 +316,10 @@ is_generating = False

```python
backward_first = paddle.layer.first_seq(input=src_backward)

decoder_boot = paddle.layer.fc(
    size=decoder_size,
    act=paddle.activation.Tanh(),
    bias_attr=False,
    input=backward_first)
```
``` ```
...@@ -332,7 +333,6 @@ is_generating = False ...@@ -332,7 +333,6 @@ is_generating = False
```python ```python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word): def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
decoder_mem = paddle.layer.memory( decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot) name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
...@@ -341,10 +341,13 @@ is_generating = False ...@@ -341,10 +341,13 @@ is_generating = False
encoded_proj=enc_proj, encoded_proj=enc_proj,
decoder_state=decoder_mem) decoder_state=decoder_mem)
with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs: decoder_inputs = paddle.layer.fc(
decoder_inputs += paddle.layer.full_matrix_projection(input=context) act=paddle.activation.Linear(),
decoder_inputs += paddle.layer.full_matrix_projection( size=decoder_size * 3,
input=current_word) bias_attr=False,
input=[context, current_word],
layer_attr=paddle.attr.ExtraLayerAttribute(
error_clipping_threshold=100.0))
gru_step = paddle.layer.gru_step( gru_step = paddle.layer.gru_step(
name='gru_decoder', name='gru_decoder',
...@@ -352,11 +355,11 @@ is_generating = False ...@@ -352,11 +355,11 @@ is_generating = False
output_mem=decoder_mem, output_mem=decoder_mem,
size=decoder_size) size=decoder_size)
with paddle.layer.mixed( out = paddle.layer.fc(
size=target_dict_dim, size=target_dict_dim,
bias_attr=True, bias_attr=True,
act=paddle.activation.Softmax()) as out: act=paddle.activation.Softmax(),
out += paddle.layer.full_matrix_projection(input=gru_step) input=gru_step)
return out return out
``` ```
...@@ -364,8 +367,8 @@ is_generating = False ...@@ -364,8 +367,8 @@ is_generating = False
```python ```python
decoder_group_name = "decoder_group" decoder_group_name = "decoder_group"
group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True) group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True) group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2] group_inputs = [group_input1, group_input2]
``` ```
@@ -410,15 +413,14 @@ is_generating = False

```python
if is_generating:
    # In generation, the decoder predicts a next target word based on
    # the encoded source sequence and the previous generated target word.
    # The encoded source sequence (encoder's output) must be specified by
    # StaticInput, which is a read-only memory.
    # Embedding of the previous generated word is automatically retrieved
    # by GeneratedInputs initialized by a start mark <s>.

    trg_embedding = paddle.layer.GeneratedInput(
        size=target_dict_dim,
        embedding_name='_target_language_embedding',
        embedding_size=word_vector_dim)
```
@@ -545,36 +547,31 @@

```python
if is_generating:
    # load the dictionary
    src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

    gen_sen_idx = np.where(beam_result[1] == -1)[0]
    assert len(gen_sen_idx) == len(gen_data) * beam_size

    # -1 is the delimiter of generated sequences.
    # the first element of each generated sequence is its length.
    start_pos, end_pos = 1, 0
    for i, sample in enumerate(gen_data):
        print(" ".join([src_dict[w] for w in sample[0][1:-1]]))
        for j in xrange(beam_size):
            end_pos = gen_sen_idx[i * beam_size + j]
            print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
                trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
            start_pos = end_pos + 2
        print("\n")
```
The generating log is as follows:

```text
Les <unk> se <unk> au sujet de la largeur des sièges alors que de grosses commandes sont en jeu
-19.0196	The <unk> will be rotated about the width of the seats , while large orders are at stake . <e>
-19.1131	The <unk> will be rotated about the width of the seats , while large commands are at stake . <e>
-19.5129	The <unk> will be rotated about the width of the seats , while large commands are at play . <e>
```

## Summary
......
import sys

import numpy as np
import paddle.v2 as paddle


def save_model(parameters, save_path):
    with open(save_path, 'w') as f:
        parameters.to_tar(f)


def seq_to_seq_net(source_dict_dim,
                   target_dict_dim,
                   is_generating,
                   beam_size=3,
                   max_length=250):
    ### Network Architecture
    word_vector_dim = 512  # dimension of word vector
    decoder_size = 512  # dimension of hidden unit of GRU decoder
    encoder_size = 512  # dimension of hidden unit of GRU encoder
    #### Encoder
    src_word_id = paddle.layer.data(
        name='source_language_word',
        type=paddle.data_type.integer_value_sequence(source_dict_dim))
    src_embedding = paddle.layer.embedding(
        input=src_word_id, size=word_vector_dim)
    src_forward = paddle.networks.simple_gru(
        input=src_embedding, size=encoder_size)
    src_backward = paddle.networks.simple_gru(

@@ -27,15 +32,18 @@
    encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])

    #### Decoder
    encoded_proj = paddle.layer.fc(
        act=paddle.activation.Linear(),
        size=decoder_size,
        bias_attr=False,
        input=encoded_vector)

    backward_first = paddle.layer.first_seq(input=src_backward)

    decoder_boot = paddle.layer.fc(
        size=decoder_size,
        act=paddle.activation.Tanh(),
        bias_attr=False,
        input=backward_first)

    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):

@@ -48,10 +56,13 @@
            encoded_proj=enc_proj,
            decoder_state=decoder_mem)

        decoder_inputs = paddle.layer.fc(
            act=paddle.activation.Linear(),
            size=decoder_size * 3,
            bias_attr=False,
            input=[context, current_word],
            layer_attr=paddle.attr.ExtraLayerAttribute(
                error_clipping_threshold=100.0))

        gru_step = paddle.layer.gru_step(
            name='gru_decoder',

@@ -59,16 +70,16 @@
            output_mem=decoder_mem,
            size=decoder_size)

        out = paddle.layer.fc(
            size=target_dict_dim,
            bias_attr=True,
            act=paddle.activation.Softmax(),
            input=gru_step)
        return out

    decoder_group_name = 'decoder_group'
    group_input1 = paddle.layer.StaticInput(input=encoded_vector)
    group_input2 = paddle.layer.StaticInput(input=encoded_proj)
    group_inputs = [group_input1, group_input2]
    if not is_generating:

@@ -98,15 +109,14 @@
        return cost
    else:
        # In generation, the decoder predicts a next target word based on
        # the encoded source sequence and the previous generated target word.
        # The encoded source sequence (encoder's output) must be specified by
        # StaticInput, which is a read-only memory.
        # Embedding of the previous generated word is automatically retrieved
        # by GeneratedInputs initialized by a start mark <s>.

        trg_embedding = paddle.layer.GeneratedInput(
            size=target_dict_dim,
            embedding_name='_target_language_embedding',
            embedding_size=word_vector_dim)
@@ -134,32 +144,43 @@ def main():
    # train the network
    if not is_generating:
        # define optimize method and trainer
        optimizer = paddle.optimizer.Adam(
            learning_rate=5e-5,
            regularization=paddle.optimizer.L2Regularization(rate=8e-4))

        cost = seq_to_seq_net(source_dict_dim, target_dict_dim, is_generating)
        parameters = paddle.parameters.create(cost)

        trainer = paddle.trainer.SGD(
            cost=cost, parameters=parameters, update_equation=optimizer)

        # define data reader
        wmt14_reader = paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.wmt14.train(dict_size), buf_size=8192),
            batch_size=4)

        # define event_handler callback
        def event_handler(event):
            if isinstance(event, paddle.event.EndIteration):
                if event.batch_id % 10 == 0:
                    print("\nPass %d, Batch %d, Cost %f, %s" %
                          (event.pass_id, event.batch_id, event.cost,
                           event.metrics))
                else:
                    sys.stdout.write('.')
                    sys.stdout.flush()

                if not event.batch_id % 10:
                    save_path = 'params_pass_%05d_batch_%05d.tar' % (
                        event.pass_id, event.batch_id)
                    save_model(parameters, save_path)

            if isinstance(event, paddle.event.EndPass):
                # save parameters
                save_path = 'params_pass_%05d.tar' % (event.pass_id)
                save_model(parameters, save_path)

        # start to train
        trainer.train(
            reader=wmt14_reader, event_handler=event_handler, num_passes=2)
@@ -167,17 +188,20 @@ def main():
    # generate English sequences for the French source sentences
    else:
        # use the first 3 samples for generation
        gen_data = []
        gen_num = 3
        for item in paddle.dataset.wmt14.gen(dict_size)():
            gen_data.append([item[0]])
            if len(gen_data) == gen_num:
                break

        beam_size = 3
        beam_gen = seq_to_seq_net(source_dict_dim, target_dict_dim,
                                  is_generating, beam_size)

        # get the trained model, whose bleu = 26.92
        parameters = paddle.dataset.wmt14.model()

        # prob is the prediction probabilities, and id is the prediction word.
        beam_result = paddle.infer(
            output_layer=beam_gen,

@@ -185,28 +209,25 @@ def main():
            input=gen_data,
            field=['prob', 'id'])
        # load the dictionary
        src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)

        gen_sen_idx = np.where(beam_result[1] == -1)[0]
        assert len(gen_sen_idx) == len(gen_data) * beam_size

        # -1 is the delimiter of generated sequences.
        # the first element of each generated sequence is its length.
        start_pos, end_pos = 1, 0
        for i, sample in enumerate(gen_data):
            print(
                " ".join([src_dict[w] for w in sample[0][1:-1]])
            )  # skip the start and ending mark when printing the source sentence
            for j in xrange(beam_size):
                end_pos = gen_sen_idx[i * beam_size + j]
                print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
                    trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
                start_pos = end_pos + 2
            print("\n")


if __name__ == '__main__':
......
@@ -13,6 +13,8 @@

1. [Semantic Role Labeling](http://book.paddlepaddle.org/07.label_semantic_roles/index.cn.html)
1. [Machine Translation](http://book.paddlepaddle.org/08.machine_translation/index.cn.html)

For more learning material, visit the PaddlePaddle [video classroom](http://bit.baidu.com/Course/datalist/column/117.html).

## Running the Book

The book you are reading is an "interactive" e-book: each chapter can be run in a Jupyter Notebook.

@@ -22,7 +24,7 @@

Just run the following in a terminal:

```bash
docker run -d -p 8888:8888 paddlepaddle/book
```

This downloads the book's Docker image from DockerHub.com and runs it. To read and edit the book online, open http://localhost:8888 in your browser.
@@ -30,7 +32,7 @@

If DockerHub.com is slow to reach, try our mirror registry docker.paddlepaddle.org instead:

```bash
docker run -d -p 8888:8888 docker.paddlepaddle.org/book
```

### Training with a GPU

@@ -38,13 +40,13 @@

The book trains on CPU by default; training on a GPU changes the steps slightly. To make sure the GPU driver works inside the image, we recommend running it with [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Install nvidia-docker first, then run:

```bash
nvidia-docker run -d -p 8888:8888 paddlepaddle/book:latest-gpu
```

Or, using the mirror inside China:

```bash
nvidia-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:latest-gpu
```

You also need to change the following code:

@@ -64,7 +66,7 @@ paddle.init(use_gpu=True, trainer_count=1)
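The change referred to above is the PaddlePaddle initialization call at the top of each chapter. A minimal sketch of the two variants (only this one line differs; the rest of the chapter script is assumed unchanged):

```python
import paddle.v2 as paddle

# CPU-only training (the book's default):
# paddle.init(use_gpu=False, trainer_count=1)

# GPU training, when running inside the *-gpu image via nvidia-docker:
paddle.init(use_gpu=True, trainer_count=1)
```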
To write, run, and debug the book you need Python 2.x and Go >1.5 installed, and you can use the [build script](https://github.com/PaddlePaddle/book/blob/develop/.tools/convert-markdown-into-ipynb-and-test.sh) to generate a new Docker image.

**Note:** We also provide an [English Readme](https://github.com/PaddlePaddle/book/blob/develop/README.md) for the PaddlePaddle book.

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> was created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
@@ -22,7 +22,8 @@

Just type

```bash
docker run -d -p 8888:8888 paddlepaddle/book
```

This command downloads the pre-built Docker image from DockerHub.com and runs it in a container. Point your web browser at http://localhost:8888 to read the book.

@@ -30,7 +31,8 @@

If DockerHub.com is slow to access from your location, you can try our mirror server docker.paddlepaddle.org:

```bash
docker run -d -p 8888:8888 docker.paddlepaddle.org/book
```

### Training with GPU

@@ -40,13 +42,15 @@

By default we train on CPU; to train with a GPU the steps change slightly. To make sure the GPU can be used from inside the container, please install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Then run:

```bash
nvidia-docker run -d -p 8888:8888 paddlepaddle/book:latest-gpu
```

Or you can use the image registry mirror in China:

```bash
nvidia-docker run -d -p 8888:8888 docker.paddlepaddle.org/book:latest-gpu
```

Change the code in the chapter that you are reading from
......