Merge branch 'develop' of https://github.com/PaddlePaddle/book into review_chp06

and update according to review comments

d539d46f · Xi Chen · a792db16 · 6ba54d78
12 changed files
--- a/03.image_classification/README.md
+++ b/03.image_classification/README.md
@@ -10,7 +10,7 @@ Compared to words, images provide much more vivid and easier to understand infor

 Image classification is the task of distinguishing images in different categories based on their semantic meaning. It is a core problem in computer vision and is also the foundation of other higher level computer vision tasks such as object detection, image segmentation, object tracking, action recognition, etc. Image classification has applications in many areas such as face recognition, intelligent video analysis in security systems, traffic scene recognition in transportation systems, content-based image retrieval and automatic photo indexing in web services, image classification in medicine, etc.

-To classify an image we first encode the entire image using handcrafted or learned features and then determine the category using a classifier. Thus, feature extraction plays an important role in image classification. Prior to deep learning the BoW(Bag of Words) model was the most widely used method for classifying an image as well as an object. The BoW technique was introduced in Natural Language Processing where a training sentence is represented as a bag of words. In the context of image classification, the BoW model requires constructing a dictionary. The simplest BoW framework can be designed with three steps: **feature extraction**, **feature encoding** and **classifier design**.
+To classify an image, we first encode the entire image using handcrafted or learned features and then determine the category using a classifier. Thus, feature extraction plays an important role in image classification. Prior to deep learning, the BoW (Bag of Words) model was the most widely used method for classifying images and objects. The BoW technique was introduced in Natural Language Processing, where a training sentence is represented as a bag of words. In the context of image classification, the BoW model requires constructing a dictionary. The simplest BoW framework can be designed with three steps: **feature extraction**, **feature encoding** and **classifier design**.

 Using deep learning, image classification can be framed as a supervised or unsupervised learning problem that learns hierarchical features automatically, without any need for manually crafted features. In recent years, Convolutional Neural Networks (CNNs) have made significant progress in image classification. CNNs use raw image pixels as input, extract low-level and high-level abstract features through convolution operations, and directly output the classification results from the model. This style of end-to-end learning has led to not only increased performance but also wider adoption in various applications.

@@ -47,7 +47,7 @@ Figure 3. Disturbed images [22]

 ## Model Overview

-A large amount of research in image classification is built upon public datasets such as [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/), [ImageNet](http://image-net.org/) etc. Many image classification algorithms are usually evaluated and compared on these datasets. PASCAL VOC is a computer vision competition started in 2005, and ImageNet is a dataset for Large Scale Visual Recognition Challenge (ILSVRC) started in 2010. In this chapter, we introduce some image classification models from the submissions to these competitions.
+A large amount of research in image classification is built upon public datasets such as [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/), [ImageNet](http://image-net.org/) etc. Many image classification algorithms are evaluated and compared on these datasets. PASCAL VOC is a computer vision competition started in 2005, and ImageNet is the dataset for the Large Scale Visual Recognition Challenge (ILSVRC), started in 2010. In this chapter, we introduce some image classification models from the submissions to these competitions.

 Before 2012, traditional image classification was accomplished with the three steps described in the background section. A complete model construction usually involves the following stages: low-level feature extraction, feature encoding, spatial constraint or feature clustering, classifier design, model ensemble.

@@ -87,7 +87,7 @@ Figure 5. A CNN example [20]

 - Dropout [10]: At each training stage, individual nodes are dropped out of the network with a certain probability. This improves the network's ability to generalize and avoids overfitting.

-Parameter updates at each layer during training causes input layer distributions to change and in turn requires hyper-parameters to be careful tuned. In 2015, Sergey Ioffe and Christian Szegedy proposed a Batch Normalization (BN) algorithm [14], which normalizes the features of each batch in a layer, and enables relatively stable distribution in each layer. Not only does BN algorithm act as a regularizer, but also reduces the need for careful hyper-parameter design. Experiments demonstrate that BN algorithm accelerates the training convergence and has been widely used in later deeper models.
+Parameter updates at each layer during training cause the distribution of each layer's inputs to change, which in turn requires hyper-parameters to be carefully tuned. In 2015, Sergey Ioffe and Christian Szegedy proposed the Batch Normalization (BN) algorithm [14], which normalizes the features of each batch in a layer and keeps the distribution of each layer's inputs relatively stable. Not only does BN act as a regularizer, but it also reduces the need for careful hyper-parameter design. Experiments demonstrate that BN accelerates training convergence, and it has been widely used in later, deeper models.

 In the following sections, we will introduce the following network architectures - VGG, GoogleNet and ResNets.

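The BN paragraph above can be illustrated with a minimal NumPy sketch of the training-time forward pass. It is a simplification, not PaddlePaddle's implementation: it normalizes a 2-D batch of shape `(N, features)` only and omits the running mean and variance used at inference time. The function name `batch_norm_forward` and the `eps` value are assumptions for illustration.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a 2-D batch of shape (N, features)."""
    # Per-feature statistics, computed over the batch dimension.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize, then apply the learned scale (gamma) and shift (beta).
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # a batch with shifted, scaled features
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # close to 0 for every feature
print(y.std(axis=0))   # close to 1 for every feature
```

Because `gamma` and `beta` are learned, the network can still rescale or shift the normalized features where that helps; with `gamma=1` and `beta=0`, each feature comes out with mean close to 0 and standard deviation close to 1.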
@@ -119,7 +119,7 @@ Figure 7. Inception block

 GoogleNet consists of multiple stacked Inception blocks followed by an avg-pooling layer, as in NIN, instead of traditional fully connected layers. The difference between GoogleNet and NIN is that GoogleNet adds a fully connected layer after the avg-pooling layer to output a vector of category size. Besides these two characteristics, the features from the middle layers of a GoogleNet are also very discriminative. Therefore, GoogleNet inserts two auxiliary classifiers into the model to strengthen the gradient signal and provide regularization during backpropagation. The loss function of the whole network is the weighted sum of the losses of these three classifiers.

-Figure 8 illustrates the neural architecture of a GoogleNet which consists of 22 layers: it starts with three regular convolutional layers followed by three groups of sub-networks -- the first group contains two Inception blocks, the second one five, and the third one two. It ends up with an average pooling and a fully-connected layer.
+Figure 8 illustrates the neural architecture of a GoogleNet, which consists of 22 layers: it starts with three regular convolutional layers, followed by three groups of sub-networks (the first group contains two Inception blocks, the second group five, and the third group two). It ends with an average pooling layer and a fully-connected layer.

 <p align="center">
 <img src="image/googlenet.jpeg" ><br/>
@@ -130,7 +130,7 @@ The above model is the first version of GoogleNet or GoogelNet-v1. GoogleNet-v2

 ### ResNet

-Residual Network(ResNet)[15] won the 2015 championship on three ImageNet competitions -- image classification, object localization, and object detection. The main challenge in training deeper networks is that accuracy degrades with network depth. The authors of ResNet proposed a residual learning approach to ease the difficulty of training deeper networks. Based on the design ideas of BN, small convolutional kernels, full convolutional network, ResNets reformulate the layers as residual blocks, with each block containing two branches, one directly connecting input to the output, the other performing two to three convolutions and calculating the residual function with reference to the layer inputs. The outputs of these two branches are then added up.
+Residual Network (ResNet) [15] won first place in three ImageNet 2015 competitions: image classification, object localization, and object detection. The main challenge in training deeper networks is that accuracy degrades as network depth increases. The authors of ResNet proposed a residual learning approach to ease the difficulty of training deeper networks. Building on the design ideas of BN, small convolutional kernels, and fully convolutional networks, ResNets reformulate the layers as residual blocks. Each block contains two branches: one directly connects the input to the output, while the other performs two to three convolutions and computes the residual function with reference to the layer's inputs. The outputs of these two branches are then added up.

 Figure 9 illustrates the ResNet architecture. On the left is the basic building block, which consists of two 3x3 convolutional layers with the same number of channels. On the right is a bottleneck block: its first 1x1 convolutional layer reduces the dimension from 256 to 64, and its last 1x1 convolutional layer increases the dimension from 64 back to 256. Thus, the middle 3x3 convolutional layer has only 64 input and output channels, which is relatively small.

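The claim that the bottleneck keeps the middle layer small can be checked with a bit of arithmetic. The sketch below counts convolution weights (biases and BN parameters ignored) for the Figure 9 bottleneck block and compares it with a hypothetical block of two plain 3x3 layers that stay at 256 channels; that comparison block is an assumption chosen for illustration, not a configuration from the chapter.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


# Bottleneck block from Figure 9: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256).
bottleneck = conv_params(1, 256, 64) + conv_params(3, 64, 64) + conv_params(1, 64, 256)

# Hypothetical comparison: two plain 3x3 layers operating directly on 256 channels.
plain = 2 * conv_params(3, 256, 256)

print(bottleneck, plain)  # 69632 1179648
```

Under these assumptions the bottleneck block uses roughly 17 times fewer weights, which is why it allows much deeper networks at a similar cost.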
@@ -160,7 +160,7 @@ Figure 11. CIFAR10 dataset[21]

 The `paddle.datasets` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`, etc. There's no need to manually download and preprocess CIFAR-10.

-After issuing a command `python train.py`, training will start immediately. The following sections describe the details:
+After running the command `python train.py`, training will start immediately. The following sections describe the details.

 ## Model Structure

@@ -177,12 +177,11 @@ from resnet import resnet_cifar10
 # PaddlePaddle init
 paddle.init(use_gpu=False, trainer_count=1)
 ```
-
-As mentioned in section [Model Overview](#model-overview), here we provide the implementations of the VGG and ResNet models.
+Now we walk through the implementations of the VGG and ResNet models.

 ### VGG

-First, we use a VGG network. Since the image size and amount of CIFAR10 are relatively small comparing to ImageNet, we use a small version of VGG network for CIFAR10. Convolution groups incorporate BN and dropout operations.
+Let's start with the VGG model. Since the images in CIFAR10 are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. Convolution groups incorporate BN and dropout operations.

 1. Define input data and its dimension

@@ -233,7 +232,7 @@ First, we use a VGG network. Since the image size and amount of CIFAR10 are rela
        return fc2
    ```

-    2.1. First, define a convolution block or conv_block. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability in dropout operation. Function `img_conv_group` is defined in `paddle.networks` consisting of a series of `Conv->BN->ReLu->Dropout` and a `Pooling`.
+    2.1. First, it defines a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. The dropout argument specifies the dropout probability. The function `img_conv_group`, defined in `paddle.networks`, consists of a series of `Conv->BN->ReLu->Dropout` operations followed by a `Pooling` layer.

    2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.

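As a quick check on steps 2.1 and 2.2, the sketch below tracks the spatial size of a 32x32 CIFAR10 image through the five convolution groups. It assumes each 3x3 convolution is padded so that it preserves height and width (an assumption, not stated in the text), so only the 2x2, stride-2 pooling at the end of each group changes the size. The helper name `vgg_group_sizes` is hypothetical.

```python
def vgg_group_sizes(image_size=32, convs_per_group=(2, 2, 3, 3, 3), pool_stride=2):
    """Spatial size after each convolution group, plus the total number of conv layers."""
    sizes = []
    size = image_size
    for _ in convs_per_group:
        # Padded 3x3 convolutions keep the size (assumption);
        # the 2x2 pooling with stride 2 at the end of the group halves it.
        size //= pool_stride
        sizes.append(size)
    return sizes, sum(convs_per_group)


sizes, num_convs = vgg_group_sizes()
print(sizes, num_convs)  # [16, 8, 4, 2, 1] 13
```

Under these assumptions the last group outputs a 1x1 feature map, and the five groups contain 13 convolution layers in total.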
@@ -261,7 +260,7 @@ First, we use a VGG network. Since the image size and amount of CIFAR10 are rela

 ### ResNet

-The first, third and fourth steps of a ResNet are the same as a VGG. The second one is the main module.
+The first, third and fourth steps of ResNet are the same as those of VGG. The second step defines the main module of ResNet.

 ```python
 net = resnet_cifar10(image, depth=56)
@@ -344,7 +343,7 @@ def resnet_cifar10(ipt, depth=32):

 ### Define Parameters

-First, we create the model parameters according to the previous model configuration `cost`.
+First, we create the model parameters according to the previous model configuration `cost`.

 ```python
 # Create parameters
@@ -482,7 +481,7 @@ Figure 12. The error rate of VGG model on CIFAR10

 ## Application

-After training is done, users can use the trained model to classify images. The following code shows how to infer through `paddle.infer` interface. You can remove the comments to change the model name.
+After training is completed, users can use the trained model to classify images. The following code shows how to run inference through the `paddle.infer` interface. You can uncomment the relevant lines below to change the model name.

 ```python
 from PIL import Image
@@ -520,7 +519,7 @@ print "Label of image/dog.png is: %d" % lab[0][0]

 ## Conclusion

-Traditional image classification methods have complicated frameworks that involve multiple stages of processing. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduced three models -- VGG, GoogleNet, ResNet and provided PaddlePaddle config files for training VGG and ResNet on CIFAR10. We also explained how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the procedure for config and training are the same and you are welcome to give it a try.
+Traditional image classification methods involve multiple stages of processing and therefore rely on complex frameworks. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduced three models (VGG, GoogleNet, and ResNet) and provided PaddlePaddle config files for training VGG and ResNet on CIFAR10. We also explained how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the configuration and training procedures are the same, and you are welcome to give them a try.


 ## Reference

--- a/03.image_classification/index.html
+++ b/03.image_classification/index.html
@@ -52,7 +52,7 @@ Compared to words, images provide much more vivid and easier to understand infor

 Image classification is the task of distinguishing images in different categories based on their semantic meaning. It is a core problem in computer vision and is also the foundation of other higher level computer vision tasks such as object detection, image segmentation, object tracking, action recognition, etc. Image classification has applications in many areas such as face recognition, intelligent video analysis in security systems, traffic scene recognition in transportation systems, content-based image retrieval and automatic photo indexing in web services, image classification in medicine, etc.

-To classify an image we first encode the entire image using handcrafted or learned features and then determine the category using a classifier. Thus, feature extraction plays an important role in image classification. Prior to deep learning the BoW(Bag of Words) model was the most widely used method for classifying an image as well as an object. The BoW technique was introduced in Natural Language Processing where a training sentence is represented as a bag of words. In the context of image classification, the BoW model requires constructing a dictionary. The simplest BoW framework can be designed with three steps: **feature extraction**, **feature encoding** and **classifier design**.
+To classify an image, we first encode the entire image using handcrafted or learned features and then determine the category using a classifier. Thus, feature extraction plays an important role in image classification. Prior to deep learning, the BoW (Bag of Words) model was the most widely used method for classifying images and objects. The BoW technique was introduced in Natural Language Processing, where a training sentence is represented as a bag of words. In the context of image classification, the BoW model requires constructing a dictionary. The simplest BoW framework can be designed with three steps: **feature extraction**, **feature encoding** and **classifier design**.

 Using deep learning, image classification can be framed as a supervised or unsupervised learning problem that learns hierarchical features automatically, without any need for manually crafted features. In recent years, Convolutional Neural Networks (CNNs) have made significant progress in image classification. CNNs use raw image pixels as input, extract low-level and high-level abstract features through convolution operations, and directly output the classification results from the model. This style of end-to-end learning has led to not only increased performance but also wider adoption in various applications.

@@ -89,7 +89,7 @@ Figure 3. Disturbed images [22]

 ## Model Overview

-A large amount of research in image classification is built upon public datasets such as [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/), [ImageNet](http://image-net.org/) etc. Many image classification algorithms are usually evaluated and compared on these datasets. PASCAL VOC is a computer vision competition started in 2005, and ImageNet is a dataset for Large Scale Visual Recognition Challenge (ILSVRC) started in 2010. In this chapter, we introduce some image classification models from the submissions to these competitions.
+A large amount of research in image classification is built upon public datasets such as [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/), [ImageNet](http://image-net.org/) etc. Many image classification algorithms are evaluated and compared on these datasets. PASCAL VOC is a computer vision competition started in 2005, and ImageNet is the dataset for the Large Scale Visual Recognition Challenge (ILSVRC), started in 2010. In this chapter, we introduce some image classification models from the submissions to these competitions.

 Before 2012, traditional image classification was accomplished with the three steps described in the background section. A complete model construction usually involves the following stages: low-level feature extraction, feature encoding, spatial constraint or feature clustering, classifier design, model ensemble.

@@ -129,7 +129,7 @@ Figure 5. A CNN example [20]

 - Dropout [10]: At each training stage, individual nodes are dropped out of the network with a certain probability. This improves the network's ability to generalize and avoids overfitting.

-Parameter updates at each layer during training causes input layer distributions to change and in turn requires hyper-parameters to be careful tuned. In 2015, Sergey Ioffe and Christian Szegedy proposed a Batch Normalization (BN) algorithm [14], which normalizes the features of each batch in a layer, and enables relatively stable distribution in each layer. Not only does BN algorithm act as a regularizer, but also reduces the need for careful hyper-parameter design. Experiments demonstrate that BN algorithm accelerates the training convergence and has been widely used in later deeper models.
+Parameter updates at each layer during training cause the distribution of each layer's inputs to change, which in turn requires hyper-parameters to be carefully tuned. In 2015, Sergey Ioffe and Christian Szegedy proposed the Batch Normalization (BN) algorithm [14], which normalizes the features of each batch in a layer and keeps the distribution of each layer's inputs relatively stable. Not only does BN act as a regularizer, but it also reduces the need for careful hyper-parameter design. Experiments demonstrate that BN accelerates training convergence, and it has been widely used in later, deeper models.

 In the following sections, we will introduce the following network architectures - VGG, GoogleNet and ResNets.

@@ -161,7 +161,7 @@ Figure 7. Inception block

 GoogleNet consists of multiple stacked Inception blocks followed by an avg-pooling layer, as in NIN, instead of traditional fully connected layers. The difference between GoogleNet and NIN is that GoogleNet adds a fully connected layer after the avg-pooling layer to output a vector of category size. Besides these two characteristics, the features from the middle layers of a GoogleNet are also very discriminative. Therefore, GoogleNet inserts two auxiliary classifiers into the model to strengthen the gradient signal and provide regularization during backpropagation. The loss function of the whole network is the weighted sum of the losses of these three classifiers.

-Figure 8 illustrates the neural architecture of a GoogleNet which consists of 22 layers: it starts with three regular convolutional layers followed by three groups of sub-networks -- the first group contains two Inception blocks, the second one five, and the third one two. It ends up with an average pooling and a fully-connected layer.
+Figure 8 illustrates the neural architecture of a GoogleNet, which consists of 22 layers: it starts with three regular convolutional layers, followed by three groups of sub-networks (the first group contains two Inception blocks, the second group five, and the third group two). It ends with an average pooling layer and a fully-connected layer.

 <p align="center">
 <img src="image/googlenet.jpeg" ><br/>
@@ -172,7 +172,7 @@ The above model is the first version of GoogleNet or GoogelNet-v1. GoogleNet-v2

 ### ResNet

-Residual Network(ResNet)[15] won the 2015 championship on three ImageNet competitions -- image classification, object localization, and object detection. The main challenge in training deeper networks is that accuracy degrades with network depth. The authors of ResNet proposed a residual learning approach to ease the difficulty of training deeper networks. Based on the design ideas of BN, small convolutional kernels, full convolutional network, ResNets reformulate the layers as residual blocks, with each block containing two branches, one directly connecting input to the output, the other performing two to three convolutions and calculating the residual function with reference to the layer inputs. The outputs of these two branches are then added up.
+Residual Network (ResNet) [15] won first place in three ImageNet 2015 competitions: image classification, object localization, and object detection. The main challenge in training deeper networks is that accuracy degrades as network depth increases. The authors of ResNet proposed a residual learning approach to ease the difficulty of training deeper networks. Building on the design ideas of BN, small convolutional kernels, and fully convolutional networks, ResNets reformulate the layers as residual blocks. Each block contains two branches: one directly connects the input to the output, while the other performs two to three convolutions and computes the residual function with reference to the layer's inputs. The outputs of these two branches are then added up.

 Figure 9 illustrates the ResNet architecture. On the left is the basic building block, which consists of two 3x3 convolutional layers with the same number of channels. On the right is a bottleneck block: its first 1x1 convolutional layer reduces the dimension from 256 to 64, and its last 1x1 convolutional layer increases the dimension from 64 back to 256. Thus, the middle 3x3 convolutional layer has only 64 input and output channels, which is relatively small.

@@ -202,7 +202,7 @@ Figure 11. CIFAR10 dataset[21]

 The `paddle.datasets` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`, etc. There's no need to manually download and preprocess CIFAR-10.

-After issuing a command `python train.py`, training will start immediately. The following sections describe the details:
+After running the command `python train.py`, training will start immediately. The following sections describe the details.

 ## Model Structure

@@ -219,12 +219,11 @@ from resnet import resnet_cifar10
 # PaddlePaddle init
 paddle.init(use_gpu=False, trainer_count=1)
 ```
-
-As mentioned in section [Model Overview](#model-overview), here we provide the implementations of the VGG and ResNet models.
+Now we walk through the implementations of the VGG and ResNet models.

 ### VGG

-First, we use a VGG network. Since the image size and amount of CIFAR10 are relatively small comparing to ImageNet, we use a small version of VGG network for CIFAR10. Convolution groups incorporate BN and dropout operations.
+Let's start with the VGG model. Since the images in CIFAR10 are smaller and fewer than those in ImageNet, we use a small version of the VGG network for CIFAR10. Convolution groups incorporate BN and dropout operations.

 1. Define input data and its dimension

@@ -275,7 +274,7 @@ First, we use a VGG network. Since the image size and amount of CIFAR10 are rela
        return fc2
    ```

-    2.1. First, define a convolution block or conv_block. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. Dropout specifies the probability in dropout operation. Function `img_conv_group` is defined in `paddle.networks` consisting of a series of `Conv->BN->ReLu->Dropout` and a `Pooling`.
+    2.1. First, it defines a convolution block, `conv_block`. The default convolution kernel is 3x3, and the default pooling size is 2x2 with stride 2. The dropout argument specifies the dropout probability. The function `img_conv_group`, defined in `paddle.networks`, consists of a series of `Conv->BN->ReLu->Dropout` operations followed by a `Pooling` layer.

    2.2. Five groups of convolutions. The first two groups perform two convolutions, while the last three groups perform three convolutions. The dropout rate of the last convolution in each group is set to 0, which means there is no dropout for this layer.

@@ -303,7 +302,7 @@ First, we use a VGG network. Since the image size and amount of CIFAR10 are rela

 ### ResNet

-The first, third and fourth steps of a ResNet are the same as a VGG. The second one is the main module.
+The first, third and fourth steps of ResNet are the same as those of VGG. The second step defines the main module of ResNet.

 ```python
 net = resnet_cifar10(image, depth=56)
@@ -386,7 +385,7 @@ def resnet_cifar10(ipt, depth=32):

 ### Define Parameters

-First, we create the model parameters according to the previous model configuration `cost`.
+First, we create the model parameters according to the previous model configuration `cost`.

 ```python
 # Create parameters
@@ -524,7 +523,7 @@ Figure 12. The error rate of VGG model on CIFAR10

 ## Application

-After training is done, users can use the trained model to classify images. The following code shows how to infer through `paddle.infer` interface. You can remove the comments to change the model name.
+After training is completed, users can use the trained model to classify images. The following code shows how to run inference through the `paddle.infer` interface. You can uncomment the relevant lines below to change the model name.

 ```python
 from PIL import Image
@@ -562,7 +561,7 @@ print "Label of image/dog.png is: %d" % lab[0][0]

 ## Conclusion

-Traditional image classification methods have complicated frameworks that involve multiple stages of processing. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduced three models -- VGG, GoogleNet, ResNet and provided PaddlePaddle config files for training VGG and ResNet on CIFAR10. We also explained how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the procedure for config and training are the same and you are welcome to give it a try.
+Traditional image classification methods involve multiple stages of processing and therefore rely on complex frameworks. In contrast, CNN models can be trained end-to-end with a significant increase in classification accuracy. In this chapter, we introduced three models (VGG, GoogleNet, and ResNet) and provided PaddlePaddle config files for training VGG and ResNet on CIFAR10. We also explained how to perform prediction and feature extraction using the PaddlePaddle API. For other datasets such as ImageNet, the configuration and training procedures are the same, and you are welcome to give them a try.


 ## Reference

--- a/04.word2vec/README.cn.md
+++ b/04.word2vec/README.cn.md
@@ -207,6 +207,28 @@ hiddensize = 256 # hidden layer dimension
 N = 5 # train 5-gram
 ```

+Functions used to save and load the word dict and embedding table:
+```python
+# save and load word dict and embedding table
+def save_dict_and_embedding(word_dict, embeddings):
+    with open("word_dict", "w") as f:
+        for key in word_dict:
+            f.write(key + " " + str(word_dict[key]) + "\n")
+    with open("embedding_table", "w") as f:
+        numpy.savetxt(f, embeddings, delimiter=',', newline='\n')
+
+
+def load_dict_and_embedding():
+    word_dict = dict()
+    with open("word_dict", "r") as f:
+        for line in f:
+            key, value = line.strip().split(" ")
+            word_dict[key] = int(value)
+
+    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
+    return word_dict, embeddings
+```
+
 Next, define the network structure:

 - Map the $n-1$ words $w_{t-n+1},...,w_{t-1}$ before $w_t$ to D-dimensional word vectors through a $|V|\times D$ matrix (D=32 in this example).
@@ -333,6 +355,16 @@ Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Te

 After 30 passes, we get an average error rate of classification_error_evaluator=0.735611.

+## Save the word dict and embedding table
+
+After training, we can save the word dict and the embedding table separately so that they can be used directly later.
+
+```python
+# save word dict and embedding table
+embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+save_dict_and_embedding(word_dict, embeddings)
+```
+

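 ## Model Application
 After training, we can load the model parameters and use the trained word vectors to initialize other models, or inspect the parameters for subsequent applications.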

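The save/load helpers added in this change can be exercised with a self-contained round trip. This is a minimal sketch that adds `import numpy`, which the snippet in the diff uses without importing, and a `directory` argument (a hypothetical addition) so the files go to a temporary folder. It also converts word ids back to `int` when loading and passes `ndmin=2` to `numpy.loadtxt` so a one-word table stays two-dimensional. The toy vocabulary is made-up data, not the imikolov dictionary.

```python
import os
import tempfile

import numpy


def save_dict_and_embedding(word_dict, embeddings, directory="."):
    # One "word id" pair per line.
    with open(os.path.join(directory, "word_dict"), "w") as f:
        for key in word_dict:
            f.write(key + " " + str(word_dict[key]) + "\n")
    # One comma-separated row of floats per word id.
    numpy.savetxt(os.path.join(directory, "embedding_table"), embeddings,
                  delimiter=",", newline="\n")


def load_dict_and_embedding(directory="."):
    word_dict = dict()
    with open(os.path.join(directory, "word_dict"), "r") as f:
        for line in f:
            key, value = line.strip().split(" ")
            # Store the id as an int so it can index rows of the embedding table.
            word_dict[key] = int(value)
    embeddings = numpy.loadtxt(os.path.join(directory, "embedding_table"),
                               delimiter=",", ndmin=2)
    return word_dict, embeddings


# Toy vocabulary and embedding table (hypothetical data).
word_dict = {"the": 0, "cat": 1, "<unk>": 2}
embeddings = numpy.arange(6, dtype=float).reshape(3, 2)

tmp_dir = tempfile.mkdtemp()
save_dict_and_embedding(word_dict, embeddings, tmp_dir)
loaded_dict, loaded_embeddings = load_dict_and_embedding(tmp_dir)
print(loaded_embeddings[loaded_dict["cat"]])  # the row saved for "cat": [2. 3.]
```

Because each line is split on a single space, this file format assumes that words never contain spaces, which holds for whitespace-tokenized corpora.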
--- a/04.word2vec/README.md
+++ b/04.word2vec/README.md
@@ -224,6 +224,29 @@ hiddensize = 256 # hidden layer dimension
 N = 5 # train 5-gram
 ```

+
+- Functions used to save and load the word dict and embedding table:
+```python
+# save and load word dict and embedding table
+def save_dict_and_embedding(word_dict, embeddings):
+    with open("word_dict", "w") as f:
+        for key in word_dict:
+            f.write(key + " " + str(word_dict[key]) + "\n")
+    with open("embedding_table", "w") as f:
+        numpy.savetxt(f, embeddings, delimiter=',', newline='\n')
+
+
+def load_dict_and_embedding():
+    word_dict = dict()
+    with open("word_dict", "r") as f:
+        for line in f:
+            key, value = line.strip().split(" ")
+            word_dict[key] = int(value)
+
+    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
+    return word_dict, embeddings
+```
+
 - Map the $n-1$ words $w_{t-n+1},...,w_{t-1}$ before $w_t$ to D-dimensional vectors through a matrix of dimension $|V|\times D$ (D=32 in this example).

 ```python
@@ -343,6 +366,16 @@ Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Te

 After 30 passes, we get an average error rate of around 0.735611.

+## Save word dict and embedding table
+
+After training, we can save the word dict and embedding table for future use.
+
+```python
+# save word dict and embedding table
+embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+save_dict_and_embedding(word_dict, embeddings)
+```
+

 ## Model Application


--- a/04.word2vec/index.cn.html
+++ b/04.word2vec/index.cn.html
@@ -249,6 +249,28 @@ hiddensize = 256 # hidden layer dimension
 N = 5 # train 5-gram
 ```

+Functions used to save and load word_dict and the embedding table:
+```python
+import numpy
+
+# save and load word dict and embedding table
+def save_dict_and_embedding(word_dict, embeddings):
+    with open("word_dict", "w") as f:
+        for key in word_dict:
+            f.write(key + " " + str(word_dict[key]) + "\n")
+    with open("embedding_table", "w") as f:
+        numpy.savetxt(f, embeddings, delimiter=',', newline='\n')
+
+
+def load_dict_and_embedding():
+    word_dict = dict()
+    with open("word_dict", "r") as f:
+        for line in f:
+            key, value = line.strip().split(" ")
+            word_dict[key] = int(value)
+
+    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
+    return word_dict, embeddings
+```
+
 Next, we define the network structure:

 - Map the $n-1$ words $w_{t-n+1},...w_{t-1}$ before $w_t$ to D-dimensional word vectors through a $|V|\times D$ matrix (D=32 in this example).
@@ -375,6 +397,16 @@ Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Te

 After 30 passes, we obtain an average error rate of classification_error_evaluator=0.735611.

+## Save the Word Dictionary and Embedding Table
+
+After training, we can save the word dictionary and the embedding table separately so that they can be reused directly later.
+
+```python
+# save word dict and embedding table
+embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+save_dict_and_embedding(word_dict, embeddings)
+```
+

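 ## Model Application
 After training, we can load the model parameters and use the trained word vectors to initialize other models, or inspect the parameters for downstream applications.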

--- a/04.word2vec/index.html
+++ b/04.word2vec/index.html
@@ -266,6 +266,29 @@ hiddensize = 256 # hidden layer dimension
 N = 5 # train 5-gram
 ```

+
+- Functions used to save and load the word dictionary and the embedding table:
+```python
+import numpy
+
+# save and load word dict and embedding table
+def save_dict_and_embedding(word_dict, embeddings):
+    with open("word_dict", "w") as f:
+        for key in word_dict:
+            f.write(key + " " + str(word_dict[key]) + "\n")
+    with open("embedding_table", "w") as f:
+        numpy.savetxt(f, embeddings, delimiter=',', newline='\n')
+
+
+def load_dict_and_embedding():
+    word_dict = dict()
+    with open("word_dict", "r") as f:
+        for line in f:
+            key, value = line.strip().split(" ")
+            word_dict[key] = int(value)
+
+    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
+    return word_dict, embeddings
+```
+
 - Map the $n-1$ words $w_{t-n+1},...w_{t-1}$ before $w_t$ to a D-dimensional vector through a matrix of dimension $|V|\times D$ (D=32 in this example).

 ```python
@@ -385,6 +408,16 @@ Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Te

 After 30 passes, we get an average error rate of around 0.735611.

+## Save the Word Dictionary and Embedding Table
+
+After training, we can save the word dictionary and the embedding table for future use.
+
+```python
+# save word dict and embedding table
+embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+save_dict_and_embedding(word_dict, embeddings)
+```
+

 ## Model Application


--- a/04.word2vec/train.py
+++ b/04.word2vec/train.py
-import math, os
+import math
+import os

+import numpy
 import paddle.v2 as paddle

 with_gpu = os.getenv('WITH_GPU', '0') != '0'
@@ -18,6 +20,26 @@ def wordemb(inlayer):
    return wordemb


+# save and load word dict and embedding table
+def save_dict_and_embedding(word_dict, embeddings):
+    with open("word_dict", "w") as f:
+        for key in word_dict:
+            f.write(key + " " + str(word_dict[key]) + "\n")
+    with open("embedding_table", "w") as f:
+        numpy.savetxt(f, embeddings, delimiter=',', newline='\n')
+
+
+def load_dict_and_embedding():
+    word_dict = dict()
+    with open("word_dict", "r") as f:
+        for line in f:
+            key, value = line.strip().split(" ")
+            word_dict[key] = int(value)
+
+    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
+    return word_dict, embeddings
+
+
 def main():
    paddle.init(use_gpu=with_gpu, trainer_count=3)
    word_dict = paddle.dataset.imikolov.build_dict()
@@ -79,6 +101,10 @@ def main():
        num_passes=100,
        event_handler=event_handler)

+    # save word dict and embedding table
+    embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+    save_dict_and_embedding(word_dict, embeddings)
+

 if __name__ == '__main__':
    main()
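For a rough idea of how the saved files could be consumed later, here is a minimal Python 3 sketch, assuming only NumPy. It writes a toy dictionary and embedding table in the same text format as `save_dict_and_embedding`, reads them back, and looks up nearest neighbours by cosine similarity. The toy vocabulary, the extra `dirname` argument, and the `nearest_words` helper are hypothetical illustrations, not part of the tutorial; in practice the table would come from `parameters.get("_proj")`. Ids are converted back to `int` on load so that they can index rows of the table.

```python
import os
import tempfile

import numpy


def save_dict_and_embedding(word_dict, embeddings, dirname):
    # Same text format as the tutorial: one "word id" pair per line,
    # plus one comma-separated row of floats per word id.
    with open(os.path.join(dirname, "word_dict"), "w") as f:
        for key in word_dict:
            f.write(key + " " + str(word_dict[key]) + "\n")
    numpy.savetxt(os.path.join(dirname, "embedding_table"), embeddings,
                  delimiter=",")


def load_dict_and_embedding(dirname):
    word_dict = {}
    with open(os.path.join(dirname, "word_dict")) as f:
        for line in f:
            key, value = line.strip().split(" ")
            # Ids index rows of the embedding table, so restore them as int.
            word_dict[key] = int(value)
    embeddings = numpy.loadtxt(os.path.join(dirname, "embedding_table"),
                               delimiter=",")
    return word_dict, embeddings


def nearest_words(word, word_dict, embeddings, k=2):
    # Hypothetical helper: rank the other words by cosine similarity.
    id_to_word = {i: w for w, i in word_dict.items()}
    query = embeddings[word_dict[word]]
    sims = embeddings.dot(query) / (
        numpy.linalg.norm(embeddings, axis=1) * numpy.linalg.norm(query))
    ranked = [int(i) for i in numpy.argsort(-sims) if i != word_dict[word]]
    return [id_to_word[i] for i in ranked[:k]]


if __name__ == "__main__":
    # Toy 4-word vocabulary with 3-dimensional vectors (made-up values).
    toy_dict = {"the": 0, "cat": 1, "dog": 2, "car": 3}
    toy_emb = numpy.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.2],
                           [0.0, 0.9, 0.3],
                           [0.5, 0.0, 1.0]])
    with tempfile.TemporaryDirectory() as tmp:
        save_dict_and_embedding(toy_dict, toy_emb, tmp)
        loaded_dict, loaded_emb = load_dict_and_embedding(tmp)
    print(nearest_words("cat", loaded_dict, loaded_emb, k=1))  # ['dog']
```

Keeping the plain-text format makes the files easy to inspect and to load from other tools, at the cost of larger files than a binary format such as `numpy.save` would produce.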
--- a/05.recommender_system/README.md
+++ b/05.recommender_system/README.md
 # Personalized Recommendation

-The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system).
-
-For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
+The source code for this tutorial is available [here](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system). For instructions on how to run it, please refer to [this guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).


 ## Background

-With the fast growth of e-commerce, online videos, and online reading business, users have to rely on recommender systems to avoid manually browsing tremendous volume of choices.  Recommender systems understand users' interest by mining user behavior and other properties of users and products.
-
-Some well know approaches include:
-
- User behavior-based approach.  A well-known method is collaborative filtering. The underlying assumption is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person.
+Recommender systems are a key component of e-commerce, online video, and online reading services. There are several approaches by which a recommender system can learn from user behavior and product properties to understand users' interests.

- Content-based recommendation[[1](#reference)]. This approach infers feature vectors that represent products from their descriptions.  It also infers feature vectors that represent users' interests.  Then it measures the relevance of users and products by some distances between these feature vectors.
+- User behavior-based approach. A well-known method of this approach is collaborative filtering, which assumes that if two users made similar purchases, they share common interests and are likely to make similar decisions in the future. Variants of collaborative filtering include user-based[[3](#reference)], item-based[[4](#reference)], social-network-based[[5](#reference)], and model-based methods.

- Hybrid approach[[2](#reference)]: This approach uses the content-based information to help address the cold start problem[[6](#reference)] in behavior-based approach.
+- Content-based approach[[1](#reference)]. This approach represents product properties and user interests as feature vectors in the same space, so that it can measure how interested a user is in a product by the distance between the two feature vectors.

-Among these options, collaborative filtering might be the most studied one.  Some of its variants include user-based[[3](#reference)], item-based [[4](#reference)], social network based[[5](#reference)], and model-based.
+- Hybrid approach[[2](#reference)]. This approach combines the two approaches above so that they complement each other, for example in alleviating the data sparsity problem[[6](#reference)].

-This tutorial explains a deep learning based approach and how to implement it using PaddlePaddle.  We will train a model using a dataset that includes user information, movie information, and ratings.  Once we train the model, we will be able to get a predicted rating given a pair of user and movie IDs.
+This tutorial explains a deep-learning-based hybrid approach and its implementation in PaddlePaddle. We will train a model using a dataset that includes user information, movie information, and ratings. Once the model is trained, we can predict the rating for a given pair of user and movie IDs.


 ## Model Overview

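The collaborative-filtering assumption from the background section can be made concrete with a toy user-based sketch: an unknown rating is predicted as the similarity-weighted average of other users' ratings for the same item, with cosine similarity computed only over items both users have rated. This is a textbook-style illustration with made-up ratings and a hypothetical `predict_rating` function, not the deep-learning hybrid model this tutorial trains.

```python
import numpy


def predict_rating(ratings, user, item):
    """User-based CF sketch. ratings: users x items array, 0 means 'not rated'."""
    target = ratings[user]
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the user themself and anyone who has not rated this item.
        if other == user or ratings[other, item] == 0:
            continue
        # Compare the two users only on items both of them have rated.
        mask = (target > 0) & (ratings[other] > 0)
        if not mask.any():
            continue
        a, b = target[mask], ratings[other][mask]
        sim = a.dot(b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b))
        num += sim * ratings[other, item]
        den += abs(sim)
    # Fall back to 0.0 when no similar user has rated the item.
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    ratings = numpy.array([[5, 4, 0],
                           [5, 4, 4],
                           [1, 2, 2]], dtype=float)
    print(predict_rating(ratings, user=0, item=2))
```

User 1 rates the first two items exactly like user 0, so their rating of 4 weighs more than user 2's rating of 2. Practical systems usually also mean-center ratings (e.g. Pearson correlation) to compensate for users who rate everything high or low.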
--- a/05.recommender_system/index.html
+++ b/05.recommender_system/index.html
@@ -42,26 +42,20 @@
 <div id="markdown" style='display:none'>
 # Personalized Recommendation

-The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system).
-
-For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
+The source code for this tutorial is available [here](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system). For instructions on how to run it, please refer to [this guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).


 ## Background

-With the fast growth of e-commerce, online videos, and online reading business, users have to rely on recommender systems to avoid manually browsing tremendous volume of choices.  Recommender systems understand users' interest by mining user behavior and other properties of users and products.
-
-Some well know approaches include:
-
- User behavior-based approach.  A well-known method is collaborative filtering. The underlying assumption is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person.
+Recommender systems are a key component of e-commerce, online video, and online reading services. There are several approaches by which a recommender system can learn from user behavior and product properties to understand users' interests.

- Content-based recommendation[[1](#reference)]. This approach infers feature vectors that represent products from their descriptions.  It also infers feature vectors that represent users' interests.  Then it measures the relevance of users and products by some distances between these feature vectors.
+- User behavior-based approach. A well-known method of this approach is collaborative filtering, which assumes that if two users made similar purchases, they share common interests and are likely to make similar decisions in the future. Variants of collaborative filtering include user-based[[3](#reference)], item-based[[4](#reference)], social-network-based[[5](#reference)], and model-based methods.

- Hybrid approach[[2](#reference)]: This approach uses the content-based information to help address the cold start problem[[6](#reference)] in behavior-based approach.
+- Content-based approach[[1](#reference)]. This approach represents product properties and user interests as feature vectors in the same space, so that it can measure how interested a user is in a product by the distance between the two feature vectors.

-Among these options, collaborative filtering might be the most studied one.  Some of its variants include user-based[[3](#reference)], item-based [[4](#reference)], social network based[[5](#reference)], and model-based.
+- Hybrid approach[[2](#reference)]. This approach combines the two approaches above so that they complement each other, for example in alleviating the data sparsity problem[[6](#reference)].

-This tutorial explains a deep learning based approach and how to implement it using PaddlePaddle.  We will train a model using a dataset that includes user information, movie information, and ratings.  Once we train the model, we will be able to get a predicted rating given a pair of user and movie IDs.
+This tutorial explains a deep-learning-based hybrid approach and its implementation in PaddlePaddle. We will train a model using a dataset that includes user information, movie information, and ratings. Once the model is trained, we can predict the rating for a given pair of user and movie IDs.


 ## Model Overview

--- a/06.understand_sentiment/README.md
+++ b/06.understand_sentiment/README.md
 # Sentiment Analysis

-The source codes of this section locates at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment). First-time users may refer to PaddlePaddle for [Installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
+The source code for this section is located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment). First-time users may refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).

 ## Background

@@ -28,7 +28,7 @@ The model we used in this chapter uses **Convolutional Neural Networks** (**CNNs

 ### Revisit to the Convolutional Neural Networks for Texts (CNN)

-The convolutional neural network for texts is introduced in chapter [recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system), let's have a brief overview.
+The convolutional neural network for texts was introduced in the [recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) chapter; here is a brief overview.

 A CNN mainly consists of convolution and pooling operations, which can be combined in various ways for different applications. First, we apply the convolution operation: the kernel is applied to each window of words to extract features, and convolving the kernel over every window produces a feature map. Next, we apply *max pooling* over time, taking the maximum element of the feature map as the representation of the whole sentence. In practice, we apply multiple CNN kernels to a sentence, which can be implemented efficiently by concatenating the kernels into a matrix. We can also use kernels of different sizes. Finally, concatenating the resulting features produces a fixed-length representation, which can be fed into a softmax layer to form the sentiment analysis model.

@@ -164,8 +164,8 @@ def stacked_lstm_net(input_dim,
    """
    A Wrapper for sentiment classification task.
    This network uses a bi-directional recurrent network,
-    consisting three LSTM layers. This configure refers to
-    the paper with following url, but use fewer layers.
+    consisting of three LSTM layers. This configuration is
+    motivated by the following paper, but uses fewer layers.
        http://www.aclweb.org/anthology/P15-1109
    input_dim: here is word dictionary dimension.
    class_dim: number of categories.

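The convolution and max-pooling-over-time steps described in the CNN overview above can be sketched in plain NumPy. This is an illustration under stated assumptions (random weights, a `tanh` activation, and the hypothetical names `text_cnn_features` and `softmax`), not how PaddlePaddle implements these layers; it only shows why the output length depends on the number of kernels rather than on the sentence length.

```python
import numpy


def text_cnn_features(sentence, kernels):
    """sentence: (T, D) word vectors; kernels: list of (h, D, F) arrays, one per window size h."""
    T, D = sentence.shape
    features = []
    for kernel in kernels:
        h, _, num_filters = kernel.shape
        # One flattened row per window of h consecutive words.
        windows = numpy.stack([sentence[t:t + h].reshape(-1)
                               for t in range(T - h + 1)])
        # Apply all F filters of this size at once: (T-h+1, h*D) x (h*D, F).
        feature_map = numpy.tanh(windows.dot(kernel.reshape(h * D, num_filters)))
        # Max pooling over time keeps the strongest response of each filter.
        features.append(feature_map.max(axis=0))
    # Concatenation yields a fixed-length vector whatever the sentence length.
    return numpy.concatenate(features)


def softmax(z):
    e = numpy.exp(z - z.max())
    return e / e.sum()


if __name__ == "__main__":
    rng = numpy.random.RandomState(0)
    D = 4  # toy word vector dimension
    kernels = [rng.randn(3, D, 5), rng.randn(4, D, 5)]  # window sizes 3 and 4, 5 filters each
    sentence = rng.randn(7, D)  # a toy 7-word sentence
    features = text_cnn_features(sentence, kernels)
    print(features.shape)  # (10,)
    W = rng.randn(features.size, 2)  # toy weights for 2 sentiment classes
    print(softmax(features.dot(W)))
```

Stacking the windows into one matrix is the "concatenate the kernels into a matrix" trick from the text: each kernel size becomes a single matrix product instead of a loop over filters.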
--- a/07.label_semantic_roles/train.py
+++ b/07.label_semantic_roles/train.py
@@ -160,6 +160,9 @@ def main():
    reader = paddle.batch(
        paddle.reader.shuffle(conll05.test(), buf_size=8192), batch_size=10)

+    test_reader = paddle.batch(
+        paddle.reader.shuffle(conll05.test(), buf_size=8192), batch_size=10)
+
    feeding = {
        'word_data': 0,
        'ctx_n2_data': 1,
@@ -178,7 +181,7 @@ def main():
                print "Pass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            if event.batch_id % 1000 == 0:
-                result = trainer.test(reader=reader, feeding=feeding)
+                result = trainer.test(reader=test_reader, feeding=feeding)
                print "\nTest with Pass %d, Batch %d, %s" % (
                    event.pass_id, event.batch_id, result.metrics)

@@ -187,7 +190,7 @@ def main():
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)

-            result = trainer.test(reader=reader, feeding=feeding)
+            result = trainer.test(reader=test_reader, feeding=feeding)
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    trainer.train(
@@ -211,6 +214,7 @@ def main():
        output_layer=predict,
        parameters=parameters,
        input=test_data,
+        feeding=feeding,
        field='id')
    assert len(probs) == len(test_data[0][0])
    labels_reverse = {}

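The `serve/main.py` change replaces the shared `recvQ` with a reply queue that is created per request and sent to the worker together with the request. With a single shared reply queue, two concurrent HTTP requests could each receive the other's result. The following Python 3 sketch reproduces the per-request pattern with the standard `queue` and `threading` modules; the doubling "inference" and the names `send_q`, `worker`, `infer`, and `client` are placeholders, not the real Paddle inferer.

```python
import queue
import threading

# Requests and their private reply queues travel together on one shared queue.
send_q = queue.Queue()


def worker():
    # A single worker thread, as in serve/main.py; the "model" just doubles numbers.
    while True:
        payload, reply_q = send_q.get()
        try:
            result = [2 * x for x in payload]
        except Exception as exc:
            reply_q.put((False, repr(exc)))
            continue
        reply_q.put((True, result))


def infer(payload):
    # Each caller owns its reply queue, so it can only receive its own answer.
    reply_q = queue.Queue()
    send_q.put((payload, reply_q))
    return reply_q.get()


threading.Thread(target=worker, daemon=True).start()

results = {}


def client(n):
    results[n] = infer([n, n + 1])


threads = [threading.Thread(target=client, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[3])  # (True, [6, 8])
```

Keeping one worker thread still serializes inference; the per-request queue only guarantees that each response is routed back to the request that produced it.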
--- a/serve/main.py
+++ b/serve/main.py
@@ -22,7 +22,7 @@ if topology_filepath is None:
    )

 with_gpu = os.getenv('WITH_GPU', '0') != '0'
-
+output_field = os.getenv('OUTPUT_FIELD', 'value')
 port = int(os.getenv('PORT', '80'))

 app = Flask(__name__)
@@ -38,13 +38,13 @@ def successResp(data):


 sendQ = Queue()
-recvQ = Queue()


 @app.route('/', methods=['POST'])
 def infer():
-    sendQ.put(request.json)
-    success, resp = recvQ.get()
+    recv_queue = Queue()
+    sendQ.put((request.json, recv_queue))
+    success, resp = recv_queue.get()
    if success:
        return successResp(resp)
    else:
@@ -55,24 +55,30 @@ def infer():
 # threads, so we create a single worker thread.
 def worker():
    paddle.init(use_gpu=with_gpu)
+
+    # Parse OUTPUT_FIELD (e.g. "value,id") into a list, dropping empty entries.
+    fields = [f for f in output_field.split(",") if f]
+
    with open(tarfn) as param_f, open(topology_filepath) as topo_f:
        params = paddle.parameters.Parameters.from_tar(param_f)
        inferer = paddle.inference.Inference(parameters=params, fileobj=topo_f)

    while True:
-        j = sendQ.get()
+        j, recv_queue = sendQ.get()
        try:
            feeding = {}
            d = []
            for i, key in enumerate(j):
                d.append(j[key])
                feeding[key] = i
-                r = inferer.infer([d], feeding=feeding)
+            r = inferer.infer([d], feeding=feeding, field=fields)
        except:
            trace = traceback.format_exc()
-            recvQ.put((False, trace))
+            recv_queue.put((False, trace))
            continue
-        recvQ.put((True, r.tolist()))
+        if isinstance(r, list):
+            recv_queue.put((True, [elem.tolist() for elem in r]))
+        else:
+            recv_queue.put((True, r.tolist()))


 if __name__ == '__main__':