diff --git a/word2vec/README.en.md b/word2vec/README.en.md
index e4017d896a3a61a457f8b02af91bdabb8f07c64b..9ed9f720809f3ad8fdd843bd49211deff2ec6b83 100644
--- a/word2vec/README.en.md
+++ b/word2vec/README.en.md
@@ -149,19 +149,257 @@ The advantages of CBOW is that it smooths over the word embeddings of the contex
 As illustrated in the figure above, skip-gram model maps the word embedding of the given word onto $2n$ word embeddings (including $n$ words before and $n$ words after the given word), and then combine the classification loss of all those $2n$ words by softmax.
 
-## Data Preparation
+## Dataset
+
+We will use the Penn Treebank (PTB) dataset (Tomas Mikolov's pre-processed version). PTB is a small dataset used in the Recurrent Neural Network Language Modeling Toolkit\[[2](#reference)\]. Its statistics are as follows:
+
+<p align="center">
+<table>
+    <tr>
+        <td>training set</td>
+        <td>validation set</td>
+        <td>test set</td>
+    </tr>
+    <tr>
+        <td>ptb.train.txt</td>
+        <td>ptb.valid.txt</td>
+        <td>ptb.test.txt</td>
+    </tr>
+    <tr>
+        <td>42068 lines</td>
+        <td>3370 lines</td>
+        <td>3761 lines</td>
+    </tr>
+</table>
+</p>
+
+### Python Dataset Module
+
+We have encapsulated the PTB dataset in our Python module `paddle.dataset.imikolov`. This module can
+
+1. download the dataset to `~/.cache/paddle/dataset/imikolov`, if it is not already there, and
+2. [preprocess](#preprocessing) the dataset.
+
+### Preprocessing
+
+We will train a 5-gram model: given the first four words in a five-word window, we predict the fifth.
+
+The beginning and end of a sentence carry special meaning, so we add a begin token `<s>` at the front of each sentence and an end token `<e>` at its end. Data instances are then generated by sliding the five-word window over the sentence.
+
+For example, the sentence "I have a dream that one day" generates five data instances:
+
+```text
+<s> I have a dream
+I have a dream that
+have a dream that one
+a dream that one day
+dream that one day <e>
+```
+
+Finally, each data instance is converted into a sequence of integers according to its words' indices in the dictionary (a short sketch that prints a few such instances follows Figure 5 below).
+
+## Training
+
+The neural network that we will be using is illustrated in the graph below:
 
-## Model Configuration
 
 <p align="center">
 Figure 5. N-gram neural network model in model configuration
 </p>
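+
+As a quick check of the dataset module and the preprocessing described above, the following sketch builds the dictionary and prints the first few 5-gram training instances as tuples of integer word indices (the snippet is illustrative and not part of `word2vec/train.py`):
+
+```python
+import paddle.v2 as paddle
+
+word_dict = paddle.dataset.imikolov.build_dict()
+# Each instance is a tuple of 5 integer word indices:
+# four context words followed by the word to predict.
+reader = paddle.dataset.imikolov.train(word_dict, 5)
+for i, instance in enumerate(reader()):
+    print instance
+    if i >= 2:
+        break
+```
+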
+`word2vec/train.py` demonstrates training word2vec using PaddlePaddle:
+
+- Import packages.
+
+```python
+import math
+import paddle.v2 as paddle
+```
+
+- Configure parameters.
+
+```python
+embsize = 32  # word vector dimension
+hiddensize = 256  # hidden layer dimension
+N = 5  # train 5-gram
+```
+
+- Map the $n-1$ words $w_{t-n+1},\ldots,w_{t-1}$ before $w_t$ to a D-dimensional vector through a matrix of dimension $|V|\times D$ (D=32 in this example).
+
+```python
+def wordemb(inlayer):
+    wordemb = paddle.layer.table_projection(
+        input=inlayer,
+        size=embsize,
+        param_attr=paddle.attr.Param(
+            name="_proj",
+            initial_std=0.001,
+            learning_rate=1,
+            l2_rate=0, ))
+    return wordemb
+```
+
+- Define the name and type of each input data layer.
+
+```python
+paddle.init(use_gpu=False, trainer_count=3)
+word_dict = paddle.dataset.imikolov.build_dict()
+dict_size = len(word_dict)
+# Every layer takes integer value of range [0, dict_size)
+firstword = paddle.layer.data(
+    name="firstw", type=paddle.data_type.integer_value(dict_size))
+secondword = paddle.layer.data(
+    name="secondw", type=paddle.data_type.integer_value(dict_size))
+thirdword = paddle.layer.data(
+    name="thirdw", type=paddle.data_type.integer_value(dict_size))
+fourthword = paddle.layer.data(
+    name="fourthw", type=paddle.data_type.integer_value(dict_size))
+nextword = paddle.layer.data(
+    name="fifthw", type=paddle.data_type.integer_value(dict_size))
+
+Efirst = wordemb(firstword)
+Esecond = wordemb(secondword)
+Ethird = wordemb(thirdword)
+Efourth = wordemb(fourthword)
+```
+
+- Concatenate the n-1 word embedding vectors into a single feature vector.
+
+```python
+contextemb = paddle.layer.concat(input=[Efirst, Esecond, Ethird, Efourth])
+```
+
+- The feature vector goes through a fully connected layer, which outputs a hidden feature vector.
+
+```python
+hidden1 = paddle.layer.fc(input=contextemb,
+                          size=hiddensize,
+                          act=paddle.activation.Sigmoid(),
+                          layer_attr=paddle.attr.Extra(drop_rate=0.5),
+                          bias_attr=paddle.attr.Param(learning_rate=2),
+                          param_attr=paddle.attr.Param(
+                              initial_std=1. / math.sqrt(embsize * 8),
+                              learning_rate=1))
+```
+
+- The hidden feature vector goes through another fully connected layer and turns into a $|V|$-dimensional vector. At the same time, softmax is applied to get the probability of each word being the next word.
+
+```python
+predictword = paddle.layer.fc(input=hidden1,
+                              size=dict_size,
+                              bias_attr=paddle.attr.Param(learning_rate=2),
+                              act=paddle.activation.Softmax())
+```
+
+- We use the cross-entropy cost function.
+
+```python
+cost = paddle.layer.classification_cost(input=predictword, label=nextword)
+```
+
+- Create the parameters, the optimizer, and the trainer.
+
+```python
+parameters = paddle.parameters.create(cost)
+adam_optimizer = paddle.optimizer.Adam(
+    learning_rate=3e-3,
+    regularization=paddle.optimizer.L2Regularization(8e-4))
+trainer = paddle.trainer.SGD(cost, parameters, adam_optimizer)
+```
+
+Next, we begin the training process. `paddle.dataset.imikolov.train()` and `paddle.dataset.imikolov.test()` provide our training set and test set. Both functions return a **reader**: in PaddlePaddle, a reader is a Python function that returns a Python iterator, which yields a single data instance at a time.
+
+`paddle.batch` takes a reader as input and outputs a **batched reader**: a reader yields one data instance at a time, while a batched reader yields a minibatch of data instances.
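+
+Before starting the actual training loop, the reader abstraction can be illustrated with a minimal, hypothetical sketch (the `toy_reader` name and its five-integer instances are made up for illustration and are not part of `train.py`):
+
+```python
+import paddle.v2 as paddle
+
+def toy_reader():
+    # A reader is a function that returns an iterator over single instances.
+    # Here each instance is a tuple of five integers, mimicking the 5-gram
+    # instances produced by paddle.dataset.imikolov.
+    def reader():
+        for i in range(8):
+            yield (i, i + 1, i + 2, i + 3, i + 4)
+    return reader
+
+# paddle.batch wraps a reader into a batched reader that yields minibatches.
+batched_reader = paddle.batch(toy_reader(), batch_size=4)
+for minibatch in batched_reader():
+    print minibatch  # each minibatch is a list of 4 instances
+```
+
+With the reader abstraction in mind, we define an event handler and start training: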
+
+```python
+import gzip
+
+def event_handler(event):
+    if isinstance(event, paddle.event.EndIteration):
+        if event.batch_id % 100 == 0:
+            print "Pass %d, Batch %d, Cost %f, %s" % (
+                event.pass_id, event.batch_id, event.cost, event.metrics)
+
+    if isinstance(event, paddle.event.EndPass):
+        result = trainer.test(
+            paddle.batch(
+                paddle.dataset.imikolov.test(word_dict, N), 32))
+        print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
+        with gzip.open("model_%d.tar.gz" % event.pass_id, 'w') as f:
+            parameters.to_tar(f)
+
+trainer.train(
+    paddle.batch(paddle.dataset.imikolov.train(word_dict, N), 32),
+    num_passes=100,
+    event_handler=event_handler)
+```
+
+`trainer.train` starts the training; the output of `event_handler` will be similar to the following:
+
+```text
+Pass 0, Batch 0, Cost 7.870579, {'classification_error_evaluator': 1.0}, Testing metrics {'classification_error_evaluator': 0.999591588973999}
+Pass 0, Batch 100, Cost 6.136420, {'classification_error_evaluator': 0.84375}, Testing metrics {'classification_error_evaluator': 0.8328699469566345}
+Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Testing metrics {'classification_error_evaluator': 0.8328542709350586}
+...
+```
+
+After 30 passes, we get an average error rate of around 0.735611.
 
-## Model Training
 
 ## Model Application
 
+After the model is trained, we can load the saved model parameters and use them in other models; we can also use the parameters directly in applications.
+
+### Viewing Word Vector
+
+Parameters trained by PaddlePaddle can be retrieved with `parameters.get()`. For example, we can check the word vector for the word `apple`.
+
+```python
+embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+
+print embeddings[word_dict['apple']]
+```
+
+```text
+[-0.38961065 -0.02392169 -0.00093231 0.36301503 0.13538605 0.16076435
+-0.0678709 0.1090285 0.42014077 -0.24119169 -0.31847557 0.20410083
+0.04910378 0.19021918 -0.0122014 -0.04099389 -0.16924137 0.1911236
+-0.10917275 0.13068172 -0.23079982 0.42699069 -0.27679482 -0.01472992
+0.2069038 0.09005053 -0.3282454 0.12717034 -0.24218646 0.25304323
+0.19072419 -0.24286366]
+```
+
+### Modifying Word Vector
+
+The word vectors (`embeddings`) we get are stored in a numpy array. We can modify this array and set it back into `parameters`.
+
+```python
+def modify_embedding(emb):
+    # Add your modification here.
+    pass
+
+modify_embedding(embeddings)
+parameters.set("_proj", embeddings)
+```
+
+### Calculating Cosine Similarity
+
+Cosine similarity is one way of quantifying the similarity between two vectors. The result lies in $[-1, 1]$; the larger the value, the more similar the two vectors are. Note that `scipy.spatial.distance.cosine` used below returns the cosine *distance*, which equals one minus the cosine similarity:
+
+```python
+from scipy import spatial
+
+emb_1 = embeddings[word_dict['world']]
+emb_2 = embeddings[word_dict['would']]
+
+print spatial.distance.cosine(emb_1, emb_2)
+```
+
+```text
+0.99375076448
+```
+
 ## Conclusion
 
 This chapter introduces word embedding, the relationship between language model and word embedding, and how to train neural networks to learn word embedding.
diff --git a/word2vec/README.md b/word2vec/README.md
index 414c7839d8b81834a75ae1d51db840804edaa77c..a6e5732ca5e7844cbb45c0a568a9bbac67ff05b6 100644
--- a/word2vec/README.md
+++ b/word2vec/README.md
@@ -144,7 +144,7 @@ CBOW的好处是对上下文词语的分布在词向量上进行了平滑,去
 ### 数据介绍
 
-本教程使用Penn Tree Bank (PTB)数据集。PTB数据集较小,训练速度快,应用于Mikolov的公开语言模型训练工具\[[2](#参考文献)\]中。其统计情况如下:
+本教程使用Penn Treebank (PTB)(经Tomas Mikolov预处理过的版本)数据集。PTB数据集较小,训练速度快,应用于Mikolov的公开语言模型训练工具\[[2](#参考文献)\]中。其统计情况如下:

@@ -183,6 +183,7 @@ a dream that one day
 dream that one day <e>
 ```
 
+最后,每个输入会按其单词在字典里的位置,转化成整数的索引序列,作为PaddlePaddle的输入。
 ## 编程实现
 
 本配置的模型结构如下图所示:
@@ -245,7 +246,6 @@ Efirst = wordemb(firstword)
 Esecond = wordemb(secondword)
 Ethird = wordemb(thirdword)
 Efourth = wordemb(fourthword)
-
 ```
 
 - 将这n-1个词向量经过concat_layer连接成一个大向量作为历史文本特征。
@@ -323,11 +323,12 @@ trainer.train(
     event_handler=event_handler)
 ```
 
-    ...
-    Pass 0, Batch 25000, Cost 4.251861, {'classification_error_evaluator': 0.84375}
-    Pass 0, Batch 25100, Cost 4.847692, {'classification_error_evaluator': 0.8125}
-    Pass 0, Testing metrics {'classification_error_evaluator': 0.7417652606964111}
-
+```text
+Pass 0, Batch 0, Cost 7.870579, {'classification_error_evaluator': 1.0}, Testing metrics {'classification_error_evaluator': 0.999591588973999}
+Pass 0, Batch 100, Cost 6.136420, {'classification_error_evaluator': 0.84375}, Testing metrics {'classification_error_evaluator': 0.8328699469566345}
+Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Testing metrics {'classification_error_evaluator': 0.8328542709350586}
+...
+```
 
 训练过程是完全自动的,event_handler里打印的日志类似如上所示:
@@ -340,22 +341,23 @@
 ### 查看词向量
 
-PaddlePaddle训练出来的参数可以直接使用`parameters.get()`获取出来。例如查看单词的word的词向量,即为
+PaddlePaddle训练出来的参数可以直接使用`parameters.get()`获取出来。例如查看单词`apple`的词向量,即为
 
 ```python
 embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
 
-print embeddings[word_dict['word']]
+print embeddings[word_dict['apple']]
 ```
 
-    [-0.38961065 -0.02392169 -0.00093231 0.36301503 0.13538605 0.16076435
-    -0.0678709 0.1090285 0.42014077 -0.24119169 -0.31847557 0.20410083
-    0.04910378 0.19021918 -0.0122014 -0.04099389 -0.16924137 0.1911236
-    -0.10917275 0.13068172 -0.23079982 0.42699069 -0.27679482 -0.01472992
-    0.2069038 0.09005053 -0.3282454 0.12717034 -0.24218646 0.25304323
-    0.19072419 -0.24286366]
-
+```text
+[-0.38961065 -0.02392169 -0.00093231 0.36301503 0.13538605 0.16076435
+-0.0678709 0.1090285 0.42014077 -0.24119169 -0.31847557 0.20410083
+0.04910378 0.19021918 -0.0122014 -0.04099389 -0.16924137 0.1911236
+-0.10917275 0.13068172 -0.23079982 0.42699069 -0.27679482 -0.01472992
+0.2069038 0.09005053 -0.3282454 0.12717034 -0.24218646 0.25304323
+0.19072419 -0.24286366]
+```
 
 ### 修改词向量
@@ -387,8 +389,9 @@ emb_2 = embeddings[word_dict['would']]
 
 print spatial.distance.cosine(emb_1, emb_2)
 ```
 
-    0.99375076448
-
+```text
+0.99375076448
+```
 
 ## 总结
 
 本章中,我们介绍了词向量、语言模型和词向量的关系、以及如何通过训练神经网络模型获得词向量。在信息检索中,我们可以根据向量间的余弦夹角,来判断query和文档关键词这二者间的相关性。在句法分析和语义分析中,训练好的词向量可以用来初始化模型,以得到更好的效果。在文档分类中,有了词向量之后,可以用聚类的方法将文档中同义词进行分组。希望大家在本章后能够自行运用词向量进行相关领域的研究。
diff --git a/word2vec/index.en.html b/word2vec/index.en.html
index 1738b1588ad5363e3f9a83168ab35948fe2664c3..66e4bcc9a4eb1047524729a454c57e5268ceeb86 100644
--- a/word2vec/index.en.html
+++ b/word2vec/index.en.html
@@ -191,19 +191,257 @@ The advantages of CBOW is that it smooths over the word embeddings of the contex
 As illustrated in the figure above, skip-gram model maps the word embedding of the given word onto $2n$ word embeddings (including $n$ words before and $n$ words after the given word), and then combine the classification loss of all those $2n$ words by softmax.
 
-## Data Preparation
+## Dataset
+
+We will use the Penn Treebank (PTB) dataset (Tomas Mikolov's pre-processed version). PTB is a small dataset used in the Recurrent Neural Network Language Modeling Toolkit\[[2](#reference)\]. Its statistics are as follows:
+
+<p align="center">
+<table>
+    <tr>
+        <td>training set</td>
+        <td>validation set</td>
+        <td>test set</td>
+    </tr>
+    <tr>
+        <td>ptb.train.txt</td>
+        <td>ptb.valid.txt</td>
+        <td>ptb.test.txt</td>
+    </tr>
+    <tr>
+        <td>42068 lines</td>
+        <td>3370 lines</td>
+        <td>3761 lines</td>
+    </tr>
+</table>
+</p>
+
+### Python Dataset Module
+
+We have encapsulated the PTB dataset in our Python module `paddle.dataset.imikolov`. This module can
+
+1. download the dataset to `~/.cache/paddle/dataset/imikolov`, if it is not already there, and
+2. [preprocess](#preprocessing) the dataset.
+
+### Preprocessing
+
+We will train a 5-gram model: given the first four words in a five-word window, we predict the fifth.
+
+The beginning and end of a sentence carry special meaning, so we add a begin token `<s>` at the front of each sentence and an end token `<e>` at its end. Data instances are then generated by sliding the five-word window over the sentence.
+
+For example, the sentence "I have a dream that one day" generates five data instances:
+
+```text
+<s> I have a dream
+I have a dream that
+have a dream that one
+a dream that one day
+dream that one day <e>
+```
+
+Finally, each data instance is converted into a sequence of integers according to its words' indices in the dictionary (a short sketch that prints a few such instances follows Figure 5 below).
+
+## Training
+
+The neural network that we will be using is illustrated in the graph below:
 
-## Model Configuration
 
 <p align="center">
 Figure 5. N-gram neural network model in model configuration
 </p>
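+
+As a quick check of the dataset module and the preprocessing described above, the following sketch builds the dictionary and prints the first few 5-gram training instances as tuples of integer word indices (the snippet is illustrative and not part of `word2vec/train.py`):
+
+```python
+import paddle.v2 as paddle
+
+word_dict = paddle.dataset.imikolov.build_dict()
+# Each instance is a tuple of 5 integer word indices:
+# four context words followed by the word to predict.
+reader = paddle.dataset.imikolov.train(word_dict, 5)
+for i, instance in enumerate(reader()):
+    print instance
+    if i >= 2:
+        break
+```
+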
+`word2vec/train.py` demonstrates training word2vec using PaddlePaddle:
+
+- Import packages.
+
+```python
+import math
+import paddle.v2 as paddle
+```
+
+- Configure parameters.
+
+```python
+embsize = 32  # word vector dimension
+hiddensize = 256  # hidden layer dimension
+N = 5  # train 5-gram
+```
+
+- Map the $n-1$ words $w_{t-n+1},\ldots,w_{t-1}$ before $w_t$ to a D-dimensional vector through a matrix of dimension $|V|\times D$ (D=32 in this example).
+
+```python
+def wordemb(inlayer):
+    wordemb = paddle.layer.table_projection(
+        input=inlayer,
+        size=embsize,
+        param_attr=paddle.attr.Param(
+            name="_proj",
+            initial_std=0.001,
+            learning_rate=1,
+            l2_rate=0, ))
+    return wordemb
+```
+
+- Define the name and type of each input data layer.
+
+```python
+paddle.init(use_gpu=False, trainer_count=3)
+word_dict = paddle.dataset.imikolov.build_dict()
+dict_size = len(word_dict)
+# Every layer takes integer value of range [0, dict_size)
+firstword = paddle.layer.data(
+    name="firstw", type=paddle.data_type.integer_value(dict_size))
+secondword = paddle.layer.data(
+    name="secondw", type=paddle.data_type.integer_value(dict_size))
+thirdword = paddle.layer.data(
+    name="thirdw", type=paddle.data_type.integer_value(dict_size))
+fourthword = paddle.layer.data(
+    name="fourthw", type=paddle.data_type.integer_value(dict_size))
+nextword = paddle.layer.data(
+    name="fifthw", type=paddle.data_type.integer_value(dict_size))
+
+Efirst = wordemb(firstword)
+Esecond = wordemb(secondword)
+Ethird = wordemb(thirdword)
+Efourth = wordemb(fourthword)
+```
+
+- Concatenate the n-1 word embedding vectors into a single feature vector.
+
+```python
+contextemb = paddle.layer.concat(input=[Efirst, Esecond, Ethird, Efourth])
+```
+
+- The feature vector goes through a fully connected layer, which outputs a hidden feature vector.
+
+```python
+hidden1 = paddle.layer.fc(input=contextemb,
+                          size=hiddensize,
+                          act=paddle.activation.Sigmoid(),
+                          layer_attr=paddle.attr.Extra(drop_rate=0.5),
+                          bias_attr=paddle.attr.Param(learning_rate=2),
+                          param_attr=paddle.attr.Param(
+                              initial_std=1. / math.sqrt(embsize * 8),
+                              learning_rate=1))
+```
+
+- The hidden feature vector goes through another fully connected layer and turns into a $|V|$-dimensional vector. At the same time, softmax is applied to get the probability of each word being the next word.
+
+```python
+predictword = paddle.layer.fc(input=hidden1,
+                              size=dict_size,
+                              bias_attr=paddle.attr.Param(learning_rate=2),
+                              act=paddle.activation.Softmax())
+```
+
+- We use the cross-entropy cost function.
+
+```python
+cost = paddle.layer.classification_cost(input=predictword, label=nextword)
+```
+
+- Create the parameters, the optimizer, and the trainer.
+
+```python
+parameters = paddle.parameters.create(cost)
+adam_optimizer = paddle.optimizer.Adam(
+    learning_rate=3e-3,
+    regularization=paddle.optimizer.L2Regularization(8e-4))
+trainer = paddle.trainer.SGD(cost, parameters, adam_optimizer)
+```
+
+Next, we begin the training process. `paddle.dataset.imikolov.train()` and `paddle.dataset.imikolov.test()` provide our training set and test set. Both functions return a **reader**: in PaddlePaddle, a reader is a Python function that returns a Python iterator, which yields a single data instance at a time.
+
+`paddle.batch` takes a reader as input and outputs a **batched reader**: a reader yields one data instance at a time, while a batched reader yields a minibatch of data instances.
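+
+Before starting the actual training loop, the reader abstraction can be illustrated with a minimal, hypothetical sketch (the `toy_reader` name and its five-integer instances are made up for illustration and are not part of `train.py`):
+
+```python
+import paddle.v2 as paddle
+
+def toy_reader():
+    # A reader is a function that returns an iterator over single instances.
+    # Here each instance is a tuple of five integers, mimicking the 5-gram
+    # instances produced by paddle.dataset.imikolov.
+    def reader():
+        for i in range(8):
+            yield (i, i + 1, i + 2, i + 3, i + 4)
+    return reader
+
+# paddle.batch wraps a reader into a batched reader that yields minibatches.
+batched_reader = paddle.batch(toy_reader(), batch_size=4)
+for minibatch in batched_reader():
+    print minibatch  # each minibatch is a list of 4 instances
+```
+
+With the reader abstraction in mind, we define an event handler and start training: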
+
+```python
+import gzip
+
+def event_handler(event):
+    if isinstance(event, paddle.event.EndIteration):
+        if event.batch_id % 100 == 0:
+            print "Pass %d, Batch %d, Cost %f, %s" % (
+                event.pass_id, event.batch_id, event.cost, event.metrics)
+
+    if isinstance(event, paddle.event.EndPass):
+        result = trainer.test(
+            paddle.batch(
+                paddle.dataset.imikolov.test(word_dict, N), 32))
+        print "Pass %d, Testing metrics %s" % (event.pass_id, result.metrics)
+        with gzip.open("model_%d.tar.gz" % event.pass_id, 'w') as f:
+            parameters.to_tar(f)
+
+trainer.train(
+    paddle.batch(paddle.dataset.imikolov.train(word_dict, N), 32),
+    num_passes=100,
+    event_handler=event_handler)
+```
+
+`trainer.train` starts the training; the output of `event_handler` will be similar to the following:
+
+```text
+Pass 0, Batch 0, Cost 7.870579, {'classification_error_evaluator': 1.0}, Testing metrics {'classification_error_evaluator': 0.999591588973999}
+Pass 0, Batch 100, Cost 6.136420, {'classification_error_evaluator': 0.84375}, Testing metrics {'classification_error_evaluator': 0.8328699469566345}
+Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Testing metrics {'classification_error_evaluator': 0.8328542709350586}
+...
+```
+
+After 30 passes, we get an average error rate of around 0.735611.
 
-## Model Training
 
 ## Model Application
 
+After the model is trained, we can load the saved model parameters and use them in other models; we can also use the parameters directly in applications.
+
+### Viewing Word Vector
+
+Parameters trained by PaddlePaddle can be retrieved with `parameters.get()`. For example, we can check the word vector for the word `apple`.
+
+```python
+embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
+
+print embeddings[word_dict['apple']]
+```
+
+```text
+[-0.38961065 -0.02392169 -0.00093231 0.36301503 0.13538605 0.16076435
+-0.0678709 0.1090285 0.42014077 -0.24119169 -0.31847557 0.20410083
+0.04910378 0.19021918 -0.0122014 -0.04099389 -0.16924137 0.1911236
+-0.10917275 0.13068172 -0.23079982 0.42699069 -0.27679482 -0.01472992
+0.2069038 0.09005053 -0.3282454 0.12717034 -0.24218646 0.25304323
+0.19072419 -0.24286366]
+```
+
+### Modifying Word Vector
+
+The word vectors (`embeddings`) we get are stored in a numpy array. We can modify this array and set it back into `parameters`.
+
+```python
+def modify_embedding(emb):
+    # Add your modification here.
+    pass
+
+modify_embedding(embeddings)
+parameters.set("_proj", embeddings)
+```
+
+### Calculating Cosine Similarity
+
+Cosine similarity is one way of quantifying the similarity between two vectors. The result lies in $[-1, 1]$; the larger the value, the more similar the two vectors are. Note that `scipy.spatial.distance.cosine` used below returns the cosine *distance*, which equals one minus the cosine similarity:
+
+```python
+from scipy import spatial
+
+emb_1 = embeddings[word_dict['world']]
+emb_2 = embeddings[word_dict['would']]
+
+print spatial.distance.cosine(emb_1, emb_2)
+```
+
+```text
+0.99375076448
+```
+
 ## Conclusion
 
 This chapter introduces word embedding, the relationship between language model and word embedding, and how to train neural networks to learn word embedding.
diff --git a/word2vec/index.html b/word2vec/index.html
index e7b232121b0b1d2c9f47720602c05e171597497a..ad5cd1942de85a946beca2bfd7213e8924316a39 100644
--- a/word2vec/index.html
+++ b/word2vec/index.html
@@ -186,7 +186,7 @@ CBOW的好处是对上下文词语的分布在词向量上进行了平滑,去
 ### 数据介绍
 
-本教程使用Penn Tree Bank (PTB)数据集。PTB数据集较小,训练速度快,应用于Mikolov的公开语言模型训练工具\[[2](#参考文献)\]中。其统计情况如下:
+本教程使用Penn Treebank (PTB)(经Tomas Mikolov预处理过的版本)数据集。PTB数据集较小,训练速度快,应用于Mikolov的公开语言模型训练工具\[[2](#参考文献)\]中。其统计情况如下:

@@ -225,6 +225,7 @@ a dream that one day
 dream that one day <e>
 ```
 
+最后,每个输入会按其单词在字典里的位置,转化成整数的索引序列,作为PaddlePaddle的输入。
 ## 编程实现
 
 本配置的模型结构如下图所示:
@@ -287,7 +288,6 @@ Efirst = wordemb(firstword)
 Esecond = wordemb(secondword)
 Ethird = wordemb(thirdword)
 Efourth = wordemb(fourthword)
-
 ```
 
 - 将这n-1个词向量经过concat_layer连接成一个大向量作为历史文本特征。
@@ -365,11 +365,12 @@ trainer.train(
     event_handler=event_handler)
 ```
 
-    ...
-    Pass 0, Batch 25000, Cost 4.251861, {'classification_error_evaluator': 0.84375}
-    Pass 0, Batch 25100, Cost 4.847692, {'classification_error_evaluator': 0.8125}
-    Pass 0, Testing metrics {'classification_error_evaluator': 0.7417652606964111}
-
+```text
+Pass 0, Batch 0, Cost 7.870579, {'classification_error_evaluator': 1.0}, Testing metrics {'classification_error_evaluator': 0.999591588973999}
+Pass 0, Batch 100, Cost 6.136420, {'classification_error_evaluator': 0.84375}, Testing metrics {'classification_error_evaluator': 0.8328699469566345}
+Pass 0, Batch 200, Cost 5.786797, {'classification_error_evaluator': 0.8125}, Testing metrics {'classification_error_evaluator': 0.8328542709350586}
+...
+```
 
 训练过程是完全自动的,event_handler里打印的日志类似如上所示:
@@ -382,22 +383,23 @@
 ### 查看词向量
 
-PaddlePaddle训练出来的参数可以直接使用`parameters.get()`获取出来。例如查看单词的word的词向量,即为
+PaddlePaddle训练出来的参数可以直接使用`parameters.get()`获取出来。例如查看单词`apple`的词向量,即为
 
 ```python
 embeddings = parameters.get("_proj").reshape(len(word_dict), embsize)
 
-print embeddings[word_dict['word']]
+print embeddings[word_dict['apple']]
 ```
 
-    [-0.38961065 -0.02392169 -0.00093231 0.36301503 0.13538605 0.16076435
-    -0.0678709 0.1090285 0.42014077 -0.24119169 -0.31847557 0.20410083
-    0.04910378 0.19021918 -0.0122014 -0.04099389 -0.16924137 0.1911236
-    -0.10917275 0.13068172 -0.23079982 0.42699069 -0.27679482 -0.01472992
-    0.2069038 0.09005053 -0.3282454 0.12717034 -0.24218646 0.25304323
-    0.19072419 -0.24286366]
-
+```text
+[-0.38961065 -0.02392169 -0.00093231 0.36301503 0.13538605 0.16076435
+-0.0678709 0.1090285 0.42014077 -0.24119169 -0.31847557 0.20410083
+0.04910378 0.19021918 -0.0122014 -0.04099389 -0.16924137 0.1911236
+-0.10917275 0.13068172 -0.23079982 0.42699069 -0.27679482 -0.01472992
+0.2069038 0.09005053 -0.3282454 0.12717034 -0.24218646 0.25304323
+0.19072419 -0.24286366]
+```
 
 ### 修改词向量
@@ -429,8 +431,9 @@ emb_2 = embeddings[word_dict['would']]
 
 print spatial.distance.cosine(emb_1, emb_2)
 ```
 
-    0.99375076448
-
+```text
+0.99375076448
+```
 
 ## 总结
 
 本章中,我们介绍了词向量、语言模型和词向量的关系、以及如何通过训练神经网络模型获得词向量。在信息检索中,我们可以根据向量间的余弦夹角,来判断query和文档关键词这二者间的相关性。在句法分析和语义分析中,训练好的词向量可以用来初始化模型,以得到更好的效果。在文档分类中,有了词向量之后,可以用聚类的方法将文档中同义词进行分组。希望大家在本章后能够自行运用词向量进行相关领域的研究。