Commit ff9a664d authored by dangqingqing

also fix machine translation

Parent 06b45a76
...@@ -446,6 +446,7 @@ settings(
This tutorial will use the default SGD and Adam learning algorithm, with a learning rate of 5e-4. Note that `batch_size = 50` denotes generating 50 sequences each time.
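For reference, a minimal sketch of what these settings look like in the Paddle v1 trainer-config style used by this file; arguments beyond `batch_size` and `learning_rate` are assumptions, not the file's actual config:

```python
# Hedged sketch of the settings described above: batch_size and
# learning_rate follow the text; AdamOptimizer is assumed from the
# mention of Adam.
from paddle.trainer_config_helpers import *

settings(
    batch_size=50,
    learning_rate=5e-4,
    learning_method=AdamOptimizer())
```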
### Model Structure
1. Define some global variables
```python
...@@ -493,6 +494,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
with mixed_layer(size=decoder_size) as encoded_proj:
    encoded_proj += full_matrix_projection(input=encoded_vector)
```
3.2 Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN, i.e. $c_0=h_T$
```python
...@@ -502,6 +504,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
                 act=TanhActivation(), ) as decoder_boot:
    decoder_boot += full_matrix_projection(input=backward_first)
```
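The diff truncates the opening of this block; a completed sketch, assuming `backward_first` and `decoder_size` are defined earlier in the config:

```python
# Initialize the decoder with a tanh transformation of the backward
# encoder's first element (the last word in time).
with mixed_layer(size=decoder_size,
                 act=TanhActivation()) as decoder_boot:
    decoder_boot += full_matrix_projection(input=backward_first)
```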
3.3 Define the computation in each time step for the decoder RNN, i.e., predict the probability $p_{i+1}$ of the $(i+1)$-th word in the target language from the current context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th word $u_i$.
- decoder_mem records the hidden state $z_i$ from the previous time step, with its initial state set to decoder_boot.
...@@ -536,6 +539,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
        out += full_matrix_projection(input=gru_step)
    return out
```
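Only the tail of `gru_decoder_with_attention` survives the diff. A sketch of the whole step function, reconstructed from the description in 3.3; the helper names (`memory`, `simple_attention`, `gru_step_layer`) are Paddle v1 layer helpers, and the exact argument lists are assumptions:

```python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
    # decoder_mem holds z_i from the previous time step,
    # bootstrapped by decoder_boot.
    decoder_mem = memory(name='gru_decoder',
                         size=decoder_size,
                         boot_layer=decoder_boot)
    # Attention over the encoder states yields the context vector c_i.
    context = simple_attention(encoded_sequence=enc_vec,
                               encoded_proj=enc_proj,
                               decoder_state=decoder_mem)
    # Project the context and the current word into the GRU input.
    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)
    gru_step = gru_step_layer(name='gru_decoder',
                              input=decoder_inputs,
                              output_mem=decoder_mem,
                              size=decoder_size)
    # Softmax over the target vocabulary gives p_{i+1}.
    with mixed_layer(size=target_dict_dim,
                     bias_attr=True,
                     act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)
    return out
```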
4. Differences between the decoder in training and in generation
4.1 Define the name for the decoder and the first two inputs for `gru_decoder_with_attention`. Note that `StaticInput` is used for these two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
...@@ -546,6 +550,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
group_input2 = StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
```
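A completed sketch of 4.1, assuming `encoded_vector` and `encoded_proj` come from the encoder section above; the group name is illustrative:

```python
decoder_group_name = "decoder_group"
# Both encoder outputs are read-only inside the recurrent group,
# hence StaticInput.
group_input1 = StaticInput(input=encoded_vector, is_seq=True)
group_input2 = StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
```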
4.2 In training mode:
- the word embedding of the target language, trg_embedding, is passed to `gru_decoder_with_attention` as current_word.
...@@ -571,6 +576,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
cost = classification_cost(input=decoder, label=lbl)
outputs(cost)
```
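A sketch of the training-mode wiring described above; the data-layer names and the embedding parameter name mirror common Paddle demo configs but are assumptions here:

```python
# Target-language word embeddings serve as current_word at each step.
trg_embedding = embedding_layer(
    input=data_layer(name='target_language_word',
                     size=target_dict_dim),
    size=word_vector_dim,
    param_attr=ParamAttr(name='_target_language_embedding'))
group_inputs.append(trg_embedding)

# Unroll gru_decoder_with_attention over the target sequence.
decoder = recurrent_group(name=decoder_group_name,
                          step=gru_decoder_with_attention,
                          input=group_inputs)

# The label is the next word; training minimizes classification cost.
lbl = data_layer(name='target_language_next_word', size=target_dict_dim)
cost = classification_cost(input=decoder, label=lbl)
outputs(cost)
```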
4.3 In generation mode:
- during generation, as the decoder RNN will take the word vector generated from the previous time step as input, `GeneratedInput` is used to implement this automatically (a sketch follows below). Please refer to [GeneratedInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.
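A sketch of the generation-mode wiring, assuming the `GeneratedInput` and `beam_search` helpers from the Paddle v1 API; the `bos_id`, `eos_id`, `beam_size`, and `max_length` values are illustrative:

```python
# GeneratedInput feeds back the embedding of the previously generated
# word, sharing the embedding table learned during training.
trg_embedding = GeneratedInput(
    size=target_dict_dim,
    embedding_name='_target_language_embedding',
    embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)

# Beam search decodes until the end token (eos_id) or max_length.
beam_gen = beam_search(name=decoder_group_name,
                       step=gru_decoder_with_attention,
                       input=group_inputs,
                       bos_id=0,
                       eos_id=1,
                       beam_size=3,       # illustrative
                       max_length=250)    # illustrative
outputs(beam_gen)
```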
......
...@@ -340,6 +340,7 @@ wmt14_reader = paddle.batch(
        out += paddle.layer.full_matrix_projection(input=gru_step)
    return out
```
4. Differences in how the decoder is invoked in training mode versus generation mode.
4.1 Define the name of the decoder group and the first two inputs of the `gru_decoder_with_attention` function. Note that both inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.
...@@ -400,6 +401,7 @@ for param in parameters.keys():
```
### Training the Model
1. Construct the trainer
The trainer is constructed from the optimization target `cost`, the network topology, and the model parameters; an optimization method must also be specified at construction time, and here the most basic SGD method is used. A completed sketch follows the fragment below.
...@@ -409,7 +411,7 @@ for param in parameters.keys():
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=optimizer)
```
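A completed sketch of this step in the v2 API shown above; the choice of `paddle.optimizer.Adam` and its learning rate are assumptions, and any `paddle.optimizer` instance can serve as `update_equation`:

```python
# Illustrative optimizer: the text above names plain SGD, so the Adam
# choice and the 5e-4 rate are assumptions carried over from the
# English section of this tutorial.
optimizer = paddle.optimizer.Adam(learning_rate=5e-4)
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=optimizer)
```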
2. Construct the event_handler
...@@ -421,6 +423,7 @@ for param in parameters.keys():
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
```
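A completed sketch of the handler, assuming the every-10-batches reporting implied by the sample log in step 3:

```python
# Print cost and metrics every 10 batches (Python 2 print statement,
# matching the fragment above).
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 10 == 0:
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
```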
3. Start training:
```python
...@@ -435,7 +438,7 @@ for param in parameters.keys():
Pass 0, Batch 0, Cost 247.408008, {'classification_error_evaluator': 1.0}
Pass 0, Batch 10, Cost 212.058789, {'classification_error_evaluator': 0.8737863898277283}
...
```
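The call that produces a log like the one above would look roughly as follows; `num_passes` is illustrative:

```python
# Train with the wmt14 reader and the handler defined in step 2.
trainer.train(
    reader=wmt14_reader,
    event_handler=event_handler,
    num_passes=2)
```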
## Applying the Model
......
...@@ -488,6 +488,7 @@ settings(
This tutorial will use the default SGD and Adam learning algorithm, with a learning rate of 5e-4. Note that `batch_size = 50` denotes generating 50 sequences each time.
### Model Structure
1. Define some global variables
```python
...@@ -535,6 +536,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
with mixed_layer(size=decoder_size) as encoded_proj:
    encoded_proj += full_matrix_projection(input=encoded_vector)
```
3.2 Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN, i.e. $c_0=h_T$
```python
...@@ -544,6 +546,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
                 act=TanhActivation(), ) as decoder_boot:
    decoder_boot += full_matrix_projection(input=backward_first)
```
3.3 Define the computation in each time step for the decoder RNN, i.e., predict the probability $p_{i+1}$ of the $(i+1)$-th word in the target language from the current context vector $c_i$, the decoder hidden state $z_i$, and the $i$-th word $u_i$.
- decoder_mem records the hidden state $z_i$ from the previous time step, with its initial state set to decoder_boot.
...@@ -578,6 +581,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
        out += full_matrix_projection(input=gru_step)
    return out
```
4. Differences between the decoder in training and in generation
4.1 Define the name for the decoder and the first two inputs for `gru_decoder_with_attention`. Note that `StaticInput` is used for these two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
...@@ -588,6 +592,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
group_input2 = StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
```
4.2 In training mode:
- the word embedding of the target language, trg_embedding, is passed to `gru_decoder_with_attention` as current_word.
...@@ -613,6 +618,7 @@ This tutorial will use the default SGD and Adam learning algorithm, with a learn
cost = classification_cost(input=decoder, label=lbl)
outputs(cost)
```
4.3 In generation mode:
- during generation, as the decoder RNN will take the word vector generated from the previous time step as input, `GeneratedInput` is used to implement this automatically. Please refer to [GeneratedInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.
......
...@@ -382,6 +382,7 @@ wmt14_reader = paddle.batch(
        out += paddle.layer.full_matrix_projection(input=gru_step)
    return out
```
4. Differences in how the decoder is invoked in training mode versus generation mode.
4.1 Define the name of the decoder group and the first two inputs of the `gru_decoder_with_attention` function. Note that both inputs use `StaticInput`; see the [StaticInput documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for details.
...@@ -442,6 +443,7 @@ for param in parameters.keys():
```
### Training the Model
1. Construct the trainer
The trainer is constructed from the optimization target `cost`, the network topology, and the model parameters; an optimization method must also be specified at construction time, and here the most basic SGD method is used.
...@@ -451,7 +453,7 @@ for param in parameters.keys():
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=optimizer)
```
2. Construct the event_handler
...@@ -463,6 +465,7 @@ for param in parameters.keys():
            print "Pass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics)
```
3. Start training:
```python
...@@ -477,7 +480,7 @@ for param in parameters.keys():
Pass 0, Batch 0, Cost 247.408008, {'classification_error_evaluator': 1.0}
Pass 0, Batch 10, Cost 212.058789, {'classification_error_evaluator': 0.8737863898277283}
...
```
## Applying the Model
......