提交 b819b44a 编写于 作者: L Luo Tao

update rnn doc from v1-api to v2-api

上级 0e2acb8b
......@@ -4,6 +4,7 @@ RNN相关模型
.. toctree::
:maxdepth: 1
rnn_config_cn.rst
recurrent_group_cn.md
hierarchical_layer_cn.rst
hrnn_rnn_api_compare_cn.rst
RNN Models
==========
.. toctree::
:maxdepth: 1
rnn_config_en.rst
......@@ -5,36 +5,13 @@ RNN配置
中配置循环神经网络(RNN)。PaddlePaddle
高度支持灵活和高效的循环神经网络配置。 在本教程中,您将了解如何:
- 准备用来学习循环神经网络的序列数据。
- 配置循环神经网络架构。
- 使用学习完成的循环神经网络模型生成序列。
我们将使用 vanilla 循环神经网络和 sequence to sequence
模型来指导你完成这些步骤。sequence to sequence
模型的代码可以在\ ``demo / seqToseq``\ 找到。
准备序列数据
------------
PaddlePaddle
不需要对序列数据进行任何预处理,例如填充。唯一需要做的是将相应类型设置为输入。例如,以下代码段定义了三个输入。
它们都是序列,它们的大小是\ ``src_dict``\ ,\ ``trg_dict``\ 和\ ``trg_dict``\ :
.. code:: python
settings.input_types = [
integer_value_sequence(len(settings.src_dict)),
integer_value_sequence(len(settings.trg_dict)),
integer_value_sequence(len(settings.trg_dict))]
在\ ``process``\ 函数中,每个\ ``yield``\ 函数将返回三个整数列表。每个整数列表被视为一个整数序列:
.. code:: python
yield src_ids, trg_ids, trg_ids_next
有关如何编写数据提供程序的更多细节描述,请参考 :ref:`api_pydataprovider2` 。完整的数据提供文件在
``demo/seqToseq/dataprovider.py``\ 。
模型的代码可以在 `book/08.machine_translation <https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation>`_ 找到。
wmt14数据的提供文件在 `python/paddle/v2/dataset/wmt14.py <https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py>`_ 。
配置循环神经网络架构
--------------------
......@@ -85,19 +62,19 @@ vanilla
act=None,
rnn_layer_attr=None):
def __rnn_step__(ipt):
out_mem = memory(name=name, size=size)
rnn_out = mixed_layer(input = [full_matrix_projection(ipt),
full_matrix_projection(out_mem)],
name = name,
bias_attr = rnn_bias_attr,
act = act,
layer_attr = rnn_layer_attr,
size = size)
out_mem = paddle.layer.memory(name=name, size=size)
rnn_out = paddle.layer.mixed(input = [paddle.layer.full_matrix_projection(input=ipt),
paddle.layer.full_matrix_projection(input=out_mem)],
name = name,
bias_attr = rnn_bias_attr,
act = act,
layer_attr = rnn_layer_attr,
size = size)
return rnn_out
return recurrent_group(name='%s_recurrent_group' % name,
step=__rnn_step__,
reverse=reverse,
input=input)
return paddle.layer.recurrent_group(name='%s_recurrent_group' % name,
step=__rnn_step__,
reverse=reverse,
input=input)
PaddlePaddle
使用“Memory”(记忆模块)实现单步函数。\ **Memory**\ 是在PaddlePaddle中构造循环神经网络时最重要的概念。
......@@ -140,43 +117,52 @@ Sequence to Sequence Model with Attention
.. code:: python
# 定义源语句的数据层
src_word_id = data_layer(name='source_language_word', size=source_dict_dim)
src_word_id = paddle.layer.data(
name='source_language_word',
type=paddle.data_type.integer_value_sequence(source_dict_dim))
# 计算每个词的词向量
src_embedding = embedding_layer(
src_embedding = paddle.layer.embedding(
input=src_word_id,
size=word_vector_dim,
param_attr=ParamAttr(name='_source_language_embedding'))
param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
# 应用前向循环神经网络
src_forward = grumemory(input=src_embedding, size=encoder_size)
src_forward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size)
# 应用反向递归神经网络(reverse=True表示反向循环神经网络)
src_backward = grumemory(input=src_embedding,
size=encoder_size,
reverse=True)
src_backward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
# 将循环神经网络的前向和反向部分混合在一起
encoded_vector = concat_layer(input=[src_forward, src_backward])
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
# 投射编码向量到 decoder_size
encoder_proj = mixed_layer(input = [full_matrix_projection(encoded_vector)],
size = decoder_size)
encoded_proj = paddle.layer.mixed(
size=decoder_size,
input=paddle.layer.full_matrix_projection(encoded_vector))
# 计算反向RNN的第一个实例
backward_first = first_seq(input=src_backward)
backward_first = paddle.layer.first_seq(input=src_backward)
# 投射反向RNN的第一个实例到 decoder size
decoder_boot = mixed_layer(input=[full_matrix_projection(backward_first)], size=decoder_size, act=TanhActivation())
decoder_boot = paddle.layer.mixed(
size=decoder_size,
act=paddle.activation.Tanh(),
input=paddle.layer.full_matrix_projection(backward_first))
解码器使用 ``recurrent_group`` 来定义循环神经网络。单步函数和输出函数在
``gru_decoder_with_attention`` 中定义:
.. code:: python
group_inputs=[StaticInput(input=encoded_vector,is_seq=True),
StaticInput(input=encoded_proj,is_seq=True)]
trg_embedding = embedding_layer(
input=data_layer(name='target_language_word',
size=target_dict_dim),
size=word_vector_dim,
param_attr=ParamAttr(name='_target_language_embedding'))
group_input1 = paddle.layer.StaticInput(input=encoded_vector, is_seq=True)
group_input2 = paddle.layer.StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
trg_embedding = paddle.layer.embedding(
input=paddle.layer.data(
name='target_language_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim)),
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
group_inputs.append(trg_embedding)
group_inputs.append(trg_embedding)
# 对于配备有注意力机制的解码器,在训练中,
......@@ -185,9 +171,10 @@ Sequence to Sequence Model with Attention
# StaticInput 意味着不同时间步的输入都是相同的值,
# 否则它以一个序列输入,不同时间步的输入是不同的。
# 所有输入序列应该有相同的长度。
decoder = recurrent_group(name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs)
decoder = paddle.layer.recurrent_group(
name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs)
单步函数的实现如下所示。首先,它定义解码网络的\ **Memory**\ 。然后定义
attention,门控循环单元单步函数和输出函数:
......@@ -198,27 +185,32 @@ attention,门控循环单元单步函数和输出函数:
# 定义解码器的Memory
# Memory的输出定义在 gru_step 内
# 注意 gru_step 应该与它的Memory名字相同
decoder_mem = memory(name='gru_decoder',
size=decoder_size,
boot_layer=decoder_boot)
decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
# 计算 attention 加权编码向量
context = simple_attention(encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
# 混合当前词向量和attention加权编码向量
decoder_inputs = mixed_layer(inputs = [full_matrix_projection(context),
full_matrix_projection(current_word)],
size = decoder_size * 3)
decoder_inputs = paddle.layer.mixed(
size=decoder_size * 3,
input=[
paddle.layer.full_matrix_projection(input=context),
paddle.layer.full_matrix_projection(input=current_word)
])
# 定义门控循环单元循环神经网络单步函数
gru_step = gru_step_layer(name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
gru_step = paddle.layer.gru_step(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
# 定义输出函数
out = mixed_layer(input=[full_matrix_projection(input=gru_step)],
size=target_dict_dim,
bias_attr=True,
act=SoftmaxActivation())
out = paddle.layer.mixed(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=paddle.layer.full_matrix_projection(input=gru_step))
return out
生成序列
......@@ -238,41 +230,32 @@ attention,门控循环单元单步函数和输出函数:
- ``beam_size``: beam search 算法中的beam大小。
- ``max_length``: 生成序列的最大长度。
- 使用 ``seqtext_printer_evaluator``
根据索引矩阵和字典打印文本。这个函数需要设置:
- ``id_input``: 数据的整数ID,用于标识生成的文件中的相应输出。
- ``dict_file``: 用于将词ID转换为词的字典文件。
- ``result_file``: 生成结果文件的路径。
代码如下:
.. code:: python
group_inputs=[StaticInput(input=encoded_vector,is_seq=True),
StaticInput(input=encoded_proj,is_seq=True)]
group_input1 = paddle.layer.StaticInput(input=encoded_vector, is_seq=True)
group_input2 = paddle.layer.StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
# 在生成时,解码器基于编码源序列和最后生成的目标词预测下一目标词。
# 编码源序列(编码器输出)必须由只读Memory的 StaticInput 指定。
# 这里, GeneratedInputs 自动获取上一个生成的词,并在最开始初始化为起始词,如 <s>。
trg_embedding = GeneratedInput(
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
trg_embedding = paddle.layer.GeneratedInput(
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
beam_gen = beam_search(name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs,
bos_id=0, # Beginnning token.
eos_id=1, # End of sentence token.
beam_size=beam_size,
max_length=max_length)
seqtext_printer_evaluator(input=beam_gen,
id_input=data_layer(name="sent_id", size=1),
dict_file=trg_dict_path,
result_file=gen_trans_file)
outputs(beam_gen)
注意,这种生成技术只用于类似解码器的生成过程。如果你正在处理序列标记任务,请参阅 :ref:`semantic_role_labeling` 了解更多详细信息。
完整的配置文件在\ ``demo/seqToseq/seqToseq_net.py``\ 。
beam_gen = paddle.layer.beam_search(
name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs,
bos_id=0, # Beginnning token.
eos_id=1, # End of sentence token.
beam_size=beam_size,
max_length=max_length)
return beam_gen
注意,这种生成技术只用于类似解码器的生成过程。如果你正在处理序列标记任务,请参阅 `book/06.understand_sentiment <https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment>`_ 了解更多详细信息。
完整的配置文件在 `book/08.machine_translation/train.py <https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/train.py>`_ 。
......@@ -3,34 +3,11 @@ RNN Configuration
This tutorial will guide you how to configure recurrent neural network in PaddlePaddle. PaddlePaddle supports highly flexible and efficient recurrent neural network configuration. In this tutorial, you will learn how to:
- prepare sequence data for learning recurrent neural networks.
- configure recurrent neural network architecture.
- generate sequence with learned recurrent neural network models.
We will use vanilla recurrent neural network, and sequence to sequence model to guide you through these steps. The code of sequence to sequence model can be found at :code:`demo/seqToseq`.
=====================
Prepare Sequence Data
=====================
PaddlePaddle does not need any preprocessing to sequence data, such as padding. The only thing that needs to be done is to set the type of the corresponding type to input. For example, the following code snippets defines three input. All of them are sequences, and the size of them are :code:`src_dict`, :code:`trg_dict`, and :code:`trg_dict`:
.. code-block:: python
settings.input_types = [
integer_value_sequence(len(settings.src_dict)),
integer_value_sequence(len(settings.trg_dict)),
integer_value_sequence(len(settings.trg_dict))]
Then at the :code:`process` function, each :code:`yield` function will return three integer lists. Each integer list is treated as a sequence of integers:
.. code-block:: python
yield src_ids, trg_ids, trg_ids_next
For more details description of how to write a data provider, please refer to :ref:`api_pydataprovider2` . The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.
We will use vanilla recurrent neural network, and sequence to sequence model to guide you through these steps. The code of sequence to sequence model can be found at `book/08.machine_translation <https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation>`_ .
And the data preparation of this model can be found at `python/paddle/v2/dataset/wmt14.py <https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py>`_
===============================================
Configure Recurrent Neural Network Architecture
......@@ -75,19 +52,19 @@ Its **output function** simply takes :math:`x_t` as the output.
act=None,
rnn_layer_attr=None):
def __rnn_step__(ipt):
out_mem = memory(name=name, size=size)
rnn_out = mixed_layer(input = [full_matrix_projection(ipt),
full_matrix_projection(out_mem)],
name = name,
bias_attr = rnn_bias_attr,
act = act,
layer_attr = rnn_layer_attr,
size = size)
out_mem = paddle.layer.memory(name=name, size=size)
rnn_out = paddle.layer.mixed(input = [paddle.layer.full_matrix_projection(input=ipt),
paddle.layer.full_matrix_projection(input=out_mem)],
name = name,
bias_attr = rnn_bias_attr,
act = act,
layer_attr = rnn_layer_attr,
size = size)
return rnn_out
return recurrent_group(name='%s_recurrent_group' % name,
step=__rnn_step__,
reverse=reverse,
input=input)
return paddle.layer.recurrent_group(name='%s_recurrent_group' % name,
step=__rnn_step__,
reverse=reverse,
input=input)
PaddlePaddle uses memory to construct step function. **Memory** is the most important concept when constructing recurrent neural networks in PaddlePaddle. A memory is a state that is used recurrently in step functions, such as :math:`x_{t+1} = f_x(x_t)`. One memory contains an **output** and a **input**. The output of memory at the current time step is utilized as the input of the memory at the next time step. A memory can also has a **boot layer**, whose output is utilized as the initial value of the memory. In our case, the output of the gated recurrent unit is employed as the output memory. Notice that the name of the layer :code:`rnn_out` is the same as the name of :code:`out_mem`. This means the output of the layer :code:`rnn_out` (:math:`x_{t+1}`) is utilized as the **output** of :code:`out_mem` memory.
......@@ -113,43 +90,52 @@ We also project the encoder vector to :code:`decoder_size` dimensional space, ge
.. code-block:: python
# Define the data layer of the source sentence.
src_word_id = data_layer(name='source_language_word', size=source_dict_dim)
src_word_id = paddle.layer.data(
name='source_language_word',
type=paddle.data_type.integer_value_sequence(source_dict_dim))
# Calculate the word embedding of each word.
src_embedding = embedding_layer(
src_embedding = paddle.layer.embedding(
input=src_word_id,
size=word_vector_dim,
param_attr=ParamAttr(name='_source_language_embedding'))
param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
# Apply forward recurrent neural network.
src_forward = grumemory(input=src_embedding, size=encoder_size)
src_forward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size)
# Apply backward recurrent neural network. reverse=True means backward recurrent neural network.
src_backward = grumemory(input=src_embedding,
size=encoder_size,
reverse=True)
src_backward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
# Mix the forward and backward parts of the recurrent neural network together.
encoded_vector = concat_layer(input=[src_forward, src_backward])
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
# Project encoding vector to decoder_size.
encoder_proj = mixed_layer(input = [full_matrix_projection(encoded_vector)],
size = decoder_size)
encoded_proj = paddle.layer.mixed(
size=decoder_size,
input=paddle.layer.full_matrix_projection(encoded_vector))
# Compute the first instance of the backward RNN.
backward_first = first_seq(input=src_backward)
backward_first = paddle.layer.first_seq(input=src_backward)
# Project the first instance of backward RNN to decoder size.
decoder_boot = mixed_layer(input=[full_matrix_projection(backward_first)], size=decoder_size, act=TanhActivation())
decoder_boot = paddle.layer.mixed(
size=decoder_size,
act=paddle.activation.Tanh(),
input=paddle.layer.full_matrix_projection(backward_first))
The decoder uses :code:`recurrent_group` to define the recurrent neural network. The step and output functions are defined in :code:`gru_decoder_with_attention`:
.. code-block:: python
group_inputs=[StaticInput(input=encoded_vector,is_seq=True),
StaticInput(input=encoded_proj,is_seq=True)]
trg_embedding = embedding_layer(
input=data_layer(name='target_language_word',
size=target_dict_dim),
size=word_vector_dim,
param_attr=ParamAttr(name='_target_language_embedding'))
group_input1 = paddle.layer.StaticInput(input=encoded_vector, is_seq=True)
group_input2 = paddle.layer.StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
trg_embedding = paddle.layer.embedding(
input=paddle.layer.data(
name='target_language_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim)),
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
group_inputs.append(trg_embedding)
group_inputs.append(trg_embedding)
# For decoder equipped with attention mechanism, in training,
......@@ -158,9 +144,10 @@ The decoder uses :code:`recurrent_group` to define the recurrent neural network.
# StaticInput means the same value is utilized at different time steps.
# Otherwise, it is a sequence input. Inputs at different time steps are different.
# All sequence inputs should have the same length.
decoder = recurrent_group(name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs)
decoder = paddle.layer.recurrent_group(
name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs)
The implementation of the step function is listed as below. First, it defines the **memory** of the decoder network. Then it defines attention, gated recurrent unit step function, and the output function:
......@@ -171,27 +158,32 @@ The implementation of the step function is listed as below. First, it defines th
# Defines the memory of the decoder.
# The output of this memory is defined in gru_step.
# Notice that the name of gru_step should be the same as the name of this memory.
decoder_mem = memory(name='gru_decoder',
size=decoder_size,
boot_layer=decoder_boot)
decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
# Compute attention weighted encoder vector.
context = simple_attention(encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
# Mix the current word embedding and the attention weighted encoder vector.
decoder_inputs = mixed_layer(inputs = [full_matrix_projection(context),
full_matrix_projection(current_word)],
size = decoder_size * 3)
decoder_inputs = paddle.layer.mixed(
size=decoder_size * 3,
input=[
paddle.layer.full_matrix_projection(input=context),
paddle.layer.full_matrix_projection(input=current_word)
])
# Define Gated recurrent unit recurrent neural network step function.
gru_step = gru_step_layer(name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
gru_step = paddle.layer.gru_step(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
# Defines the output function.
out = mixed_layer(input=[full_matrix_projection(input=gru_step)],
size=target_dict_dim,
bias_attr=True,
act=SoftmaxActivation())
out = paddle.layer.mixed(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=paddle.layer.full_matrix_projection(input=gru_step))
return out
......@@ -207,45 +199,37 @@ After training the model, we can use it to generate sequences. A common practice
- :code:`eos_id`: the end token. Every sentence ends with the end token.
- :code:`beam_size`: the beam size used in beam search.
- :code:`max_length`: the maximum length of the generated sentences.
* use :code:`seqtext_printer_evaluator` to print text according to index matrix and dictionary. This function needs to set:
- :code:`id_input`: the integer ID of the data, used to identify the corresponding output in the generated files.
- :code:`dict_file`: the dictionary file for converting word id to word.
- :code:`result_file`: the path of the generation result file.
The code is listed below:
.. code-block:: python
group_inputs=[StaticInput(input=encoded_vector,is_seq=True),
StaticInput(input=encoded_proj,is_seq=True)]
group_input1 = paddle.layer.StaticInput(input=encoded_vector, is_seq=True)
group_input2 = paddle.layer.StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
# In generation, decoder predicts a next target word based on
# the encoded source sequence and the last generated target word.
# The encoded source sequence (encoder's output) must be specified by
# StaticInput which is a read-only memory.
# Here, GeneratedInputs automatically fetchs the last generated word,
# which is initialized by a start mark, such as <s>.
trg_embedding = GeneratedInput(
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
trg_embedding = paddle.layer.GeneratedInput(
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
beam_gen = beam_search(name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs,
bos_id=0, # Beginnning token.
eos_id=1, # End of sentence token.
beam_size=beam_size,
max_length=max_length)
beam_gen = paddle.layer.beam_search(
name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs,
bos_id=0, # Beginnning token.
eos_id=1, # End of sentence token.
beam_size=beam_size,
max_length=max_length)
seqtext_printer_evaluator(input=beam_gen,
id_input=data_layer(name="sent_id", size=1),
dict_file=trg_dict_path,
result_file=gen_trans_file)
outputs(beam_gen)
return beam_gen
Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to :ref:`semantic_role_labeling` for more details.
Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to `book/06.understand_sentiment <https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment>`_ for more details.
The full configuration file is located at :code:`demo/seqToseq/seqToseq_net.py`.
The full configuration file is located at `book/08.machine_translation/train.py <https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/train.py>`_ .
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册