The source code for this chapter is located at [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). Please refer to the PaddlePaddle [installation tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) if you are a first-time user.
## Background
...
...
<p align="center">
<img src="image/nmt.png" width=400><br/>
Figure 1. Neural Network based Machine Translation
Generally speaking, sequences with short-distance dependencies tend to have an active reset gate, while sequences with long-distance dependencies tend to have an active update gate.
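To make the roles of the two gates concrete, here is a minimal NumPy sketch of a single GRU step. It follows the common formulation in which the update gate decides how much of the previous state is carried over and the reset gate decides how much of it enters the candidate state; the weight names, dimensions, and random initialization are illustrative only and not part of the book's PaddlePaddle implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step: reset gate r, update gate z, candidate state h_tilde."""
    W_r, U_r, W_z, U_z, W_h, U_h = params
    r = sigmoid(x @ W_r + h_prev @ U_r)                  # reset gate
    z = sigmoid(x @ W_z + h_prev @ U_z)                  # update gate
    h_tilde = np.tanh(x @ W_h + (r * h_prev) @ U_h)      # candidate state (reset gate filters h_prev)
    return z * h_prev + (1.0 - z) * h_tilde              # update gate mixes old state and candidate

# toy dimensions and random weights, for illustration only
emb_dim, hid_dim = 8, 16
rng = np.random.default_rng(0)
params = [rng.standard_normal(s) * 0.1
          for s in [(emb_dim, hid_dim), (hid_dim, hid_dim)] * 3]
x_t = rng.standard_normal(emb_dim)
h = np.zeros(hid_dim)
h = gru_step(x_t, h, params)
```

An update gate close to 1 copies the previous state forward, which is what lets the unit preserve long-distance information, while a reset gate close to 0 discards the previous state when forming the candidate.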
...
...
Figure 4. Encoder-Decoder Framework (源语言词序列: Word Sequence for the Source Language; 源语编码状态: Word Embedding Sequence for the Source Language; 独热编码: One-hot Encoding; 词向量: Word Embedding; 隐层状态: Hidden State; 词概率: Word Probability; 词样本: Word Sample; 编码器: Encoder; 解码器: Decoder.)
**Note: there is an error in the original figure. The locations for 源语言词序列 and 源语编码状态 should be switched.**
</p>
#### Encoder
...
...
Figure 5. Encoder using bi-directional GRU (源语编码状态: Word Embedding Sequence for the Source Language; 词向量: Word Embedding; 独热编码: One-hot Encoding; 编码器: Encoder)
</p>
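A minimal sketch of the bi-directional encoding idea follows (framework-agnostic NumPy, with a simplified `tanh` recurrence standing in for the full GRU; all function names, weights, and dimensions are assumptions for illustration). Each source position is represented by the concatenation of a forward state, which summarizes the prefix, and a backward state, which summarizes the suffix.

```python
import numpy as np

def rnn_step(x, h_prev, W, U):
    """Simplified recurrent step used in place of a full GRU for brevity."""
    return np.tanh(x @ W + h_prev @ U)

def bidirectional_encode(embeddings, W_f, U_f, W_b, U_b):
    """Return one concatenated [forward; backward] state per source position."""
    hid_dim = U_f.shape[0]
    T = len(embeddings)
    fwd, bwd = [None] * T, [None] * T
    h = np.zeros(hid_dim)
    for t in range(T):                      # left-to-right pass
        h = rnn_step(embeddings[t], h, W_f, U_f)
        fwd[t] = h
    h = np.zeros(hid_dim)
    for t in reversed(range(T)):            # right-to-left pass
        h = rnn_step(embeddings[t], h, W_b, U_b)
        bwd[t] = h
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# toy example with random embeddings and weights
emb_dim, hid_dim, T = 8, 16, 5
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((T, emb_dim))
W_f, W_b = rng.standard_normal((emb_dim, hid_dim)), rng.standard_normal((emb_dim, hid_dim))
U_f, U_b = rng.standard_normal((hid_dim, hid_dim)), rng.standard_normal((hid_dim, hid_dim))
annotations = bidirectional_encode(embeddings, W_f, U_f, W_b, U_b)  # T vectors of size 2*hid_dim
```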
#### Decoder
The goal of the decoder is to maximize the probability of the next correct word in the target language. The main idea is as follows:
1. At each time step $i$, given the encoding vector (or context vector) $c$ of the source sentence, the $i$-th word $u_i$ of the ground-truth target sentence, and the RNN hidden state $z_i$, the next hidden state $z_{i+1}$ is computed as follows (see the sketch after this list):
...
...
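As a rough illustration of this recurrence (not the book's PaddlePaddle implementation; a plain NumPy sketch in which a single `tanh` layer stands in for the GRU update and all names and sizes are made up), the decoder combines the context vector $c$, the embedding of the current target word $u_i$, and the previous state $z_i$ to produce $z_{i+1}$, from which the next-word distribution $p_{i+1}$ is obtained with a softmax:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoder_step(c, u_i, z_i, W_c, W_u, W_z, W_out):
    """One decoder step: z_{i+1} = phi(c, u_i, z_i); p_{i+1} = softmax over the target vocabulary."""
    z_next = np.tanh(c @ W_c + u_i @ W_u + z_i @ W_z)  # simplified stand-in for the GRU update
    p_next = softmax(z_next @ W_out)                   # probability of each candidate target word
    return z_next, p_next

# toy sizes: context, embedding, hidden, and target-vocabulary dimensions
ctx_dim, emb_dim, hid_dim, vocab = 32, 8, 16, 100
rng = np.random.default_rng(0)
W_c = rng.standard_normal((ctx_dim, hid_dim)) * 0.1
W_u = rng.standard_normal((emb_dim, hid_dim)) * 0.1
W_z = rng.standard_normal((hid_dim, hid_dim)) * 0.1
W_out = rng.standard_normal((hid_dim, vocab)) * 0.1
z, p = decoder_step(rng.standard_normal(ctx_dim), rng.standard_normal(emb_dim),
                    np.zeros(hid_dim), W_c, W_u, W_z, W_out)
```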
Figure 6. Decoder with Attention Mechanism ( 源语编码状态: Word Embedding Sequence for the Source Language; 权重: Attention Weight; 隐层状态: Hidden State; 词概率: Word Probability; 词样本: Word Sample; 解码器: Decoder.)
</p>
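To make the attention computation concrete, here is a hedged NumPy sketch in which a simple bilinear score stands in for the learned alignment model; the variable names and dimensions are illustrative assumptions. The decoder state is scored against every encoder annotation, the scores are normalized with a softmax into attention weights, and the context vector for the current step is the weighted sum of the annotations.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(z_i, annotations, W_score):
    """Attention weights over encoder annotations and the resulting context vector c_i."""
    # score each source annotation h_j against the decoder state z_i
    scores = np.array([z_i @ W_score @ h_j for h_j in annotations])
    weights = softmax(scores)                                 # attention weights, sum to 1
    c_i = sum(w * h for w, h in zip(weights, annotations))    # weighted sum of annotations
    return weights, c_i

# toy example: 5 source positions with random states, for illustration only
hid_dim, ann_dim, T = 16, 32, 5
rng = np.random.default_rng(0)
annotations = [rng.standard_normal(ann_dim) for _ in range(T)]
z_i = rng.standard_normal(hid_dim)
W_score = rng.standard_normal((hid_dim, ann_dim))
weights, c_i = attention_context(z_i, annotations, W_score)
```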
### Beam Search Algorithm
...
...
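A compact, framework-agnostic sketch of the beam search idea described in this section is given below; the scoring callback, beam width, and token ids are placeholders rather than the book's actual generation code. At every step, each partial translation in the beam is extended by its most likely next words, and only the `beam_size` highest-scoring candidates are kept.

```python
import numpy as np

def beam_search(step_probs, start_id, end_id, beam_size=3, max_len=10):
    """Generic beam search.

    step_probs(prefix) must return a probability distribution over the target
    vocabulary for the next word, given the partial translation `prefix`.
    """
    beams = [([start_id], 0.0)]          # (partial sequence, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            probs = step_probs(seq)
            # extend each beam with its `beam_size` most probable next words
            for w in np.argsort(probs)[-beam_size:]:
                candidates.append((seq + [int(w)], score + float(np.log(probs[w]))))
        # keep only the best `beam_size` candidates overall
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == end_id else beams).append((seq, score))
        if not beams:                    # every surviving candidate has ended
            break
    return max(finished + beams, key=lambda c: c[1])[0]

# toy usage with a dummy model that returns a random distribution
rng = np.random.default_rng(0)
vocab = 20
dummy = lambda prefix: rng.dirichlet(np.ones(vocab))
print(beam_search(dummy, start_id=0, end_id=1))
```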
### Downloading and Uncompressing the Data
This tutorial uses a dataset from [WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/): [bitexts (after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz) is used as the training set, and [dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz) is used as the test and generation set.
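If you prefer to fetch the files programmatically rather than with the shell command below, the following Python sketch downloads and unpacks both archives; it assumes the two URLs above are directly reachable from your machine and saves everything in the current directory.

```python
import tarfile
import urllib.request

URLS = [
    "http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz",
    "http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz",
]

for url in URLS:
    filename = url.rsplit("/", 1)[-1]
    print("downloading", url)
    urllib.request.urlretrieve(url, filename)    # fetch the archive
    with tarfile.open(filename, "r:gz") as tar:  # unpack it in the current directory
        tar.extractall()
```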
Run the following command in Linux to obtain the data: