diff --git a/mt_with_external_memory/README.md b/mt_with_external_memory/README.md
index 22c8effdcd13249c168555b0ac64762b8f5e4873..96dfff0d4f409aa0e90af79ca0faad6d13a95840 100644
--- a/mt_with_external_memory/README.md
+++ b/mt_with_external_memory/README.md
@@ -222,7 +222,7 @@ class ExternalMemory(object):
  - 输入参数 `read_key`:某层的输出,其包含的信息用于读头的寻址。
  - 返回:读出的信息(可直接作为其他层的输入)。

-部分重要的实现逻辑:
+部分关键实现逻辑:

- 神经图灵机的 “外部存储矩阵” 采用 `Paddle.layer.memory`实现,并采用序列形式(`is_seq=True`),该序列的长度表示记忆槽的数量,序列的 `size` 表示记忆槽(向量)的大小。该序列依赖一个外部层作为初始化, 其记忆槽的数量取决于该层输出序列的长度。因此,该类不仅可用来实现有界记忆(Bounded Memory),同时可用来实现无界记忆 (Unbounded Memory,即记忆槽数量可变)。

@@ -244,21 +244,130 @@ class ExternalMemory(object):

涉及三个主要函数:

```
-memory_enhanced_seq2seq(...)
-bidirectional_gru_encoder(...)
-memory_enhanced_decoder(...)
+def bidirectional_gru_encoder(input, size, word_vec_dim):
+    """Bidirectional GRU encoder.
+
+    :param size: Hidden cell number in encoder rnn.
+    :type size: int
+    :param word_vec_dim: Word embedding size.
+    :type word_vec_dim: int
+    :return: Tuple of 1. concatenated forward and backward hidden sequence.
+             2. last state of backward rnn.
+    :rtype: tuple of LayerOutput
+    """
+    pass
+
+
+def memory_enhanced_decoder(input, target, initial_state, source_context, size,
+                            word_vec_dim, dict_size, is_generating, beam_size):
+    """GRU sequence decoder enhanced with external memory.
+
+    The "external memory" refers to two types of memories.
+    - Unbounded memory: i.e. attention mechanism in Seq2Seq.
+    - Bounded memory: i.e. external memory in NTM.
+    Both types of external memories can be implemented with the
+    ExternalMemory class, and are both exploited in this enhanced RNN decoder.
+
+    The vanilla RNN/LSTM/GRU also has a narrow memory mechanism, namely the
+    hidden state vector (or cell state in LSTM) carrying information through
+    a span of sequence time, which is a successful design enriching the model
+    with the capability to "remember" things in the long run. However, such a
+    vector state is somewhat limited to a very narrow memory bandwidth. The
+    external memory introduced here can easily increase the memory capacity
+    with linear complexity cost (rather than quadratic for vector state).
+
+    This enhanced decoder expands its "memory passage" through two
+    ExternalMemory objects:
+    - Bounded memory for handling long-term information exchange within the
+      decoder itself. A direct expansion of the traditional "vector" state.
+    - Unbounded memory for handling the source language's token-wise
+      information. Exactly the attention mechanism over Seq2Seq.
+
+    Notice that we take the attention mechanism as a particular form of
+    external memory, with a read-only memory bank initialized with encoder
+    states, and a read head with content-based addressing (attention). From
+    this viewpoint, we arrive at a better understanding of the attention
+    mechanism itself and other external memories, and a concise and unified
+    implementation for them.
+
+    For more details about external memory, please refer to
+    `Neural Turing Machines <https://arxiv.org/abs/1410.5401>`_.
+
+    For more details about this memory-enhanced decoder, please refer to
+    `Memory-enhanced Decoder for Neural Machine Translation
+    <https://arxiv.org/abs/1606.02003>`_. This implementation is highly
+    correlated to this paper, but with minor differences (e.g. put "write"
+    before "read" to bypass a potential bug in V2 APIs, see
+    `issue <https://github.com/PaddlePaddle/Paddle/issues/2061>`_).
+    """
+    pass
+
+
+def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
+                            hidden_size, word_vec_dim, dict_size, is_generating,
+                            beam_size):
+    """Seq2Seq Model enhanced with external memory.
+
+    The "external memory" refers to two types of memories.
+    - Unbounded memory: i.e. attention mechanism in Seq2Seq.
+    - Bounded memory: i.e. external memory in NTM.
+    Both types of external memories can be implemented with the
+    ExternalMemory class, and are both exploited in this Seq2Seq model.
+
+    :param encoder_input: Encoder input.
+    :type encoder_input: LayerOutput
+    :param decoder_input: Decoder input.
+    :type decoder_input: LayerOutput
+    :param decoder_target: Decoder target.
+    :type decoder_target: LayerOutput
+    :param hidden_size: Hidden cell number, both in encoder and decoder rnn.
+    :type hidden_size: int
+    :param word_vec_dim: Word embedding size.
+    :type word_vec_dim: int
+    :param dict_size: Vocabulary size.
+    :type dict_size: int
+    :param is_generating: Whether for beam search inference (True) or
+                          for training (False).
+    :type is_generating: bool
+    :param beam_size: Beam search width.
+    :type beam_size: int
+    :return: Cost layer if is_generating=False; Beam search layer if
+             is_generating=True.
+    :rtype: LayerOutput
+    """
+    pass
```

-`memory_enhanced_seq2seq` 函数定义整个带外部记忆机制的序列到序列模型,是模型定义的主调函数。它首先调用`bidirectional_gru_encoder` 对源语言进行编码,然后通过 `memory_enhanced_decoder` 进行解码。
+- `bidirectional_gru_encoder` 函数实现双向单层 GRU(Gated Recurrent Unit)编码器。返回两组结果:一组为字符级编码向量序列(包含前后向),一组为整个源语句的句级编码向量(仅后向)。前者用于解码器的注意力机制中记忆矩阵的初始化,后者用于解码器的状态向量的初始化。
+
+- `memory_enhanced_decoder` 函数实现通过外部记忆增强的 GRU 解码器。它利用同一个`ExternalMemory` 类实现两种外部记忆模块:
+
+  - 无界外部记忆:即传统的注意力机制。利用`ExternalMemory`,打开只读开关,关闭插值寻址。并利用编码器的第一组输出(字符级编码向量序列)作为 `ExternalMemory` 中存储矩阵的初始化(`boot_layer`)。因此,该存储的记忆槽数目是动态可变的,取决于编码器的字符数。
+
+    ```
+    unbounded_memory = ExternalMemory(
+        name="unbounded_memory",
+        mem_slot_size=size * 2,
+        boot_layer=unbounded_memory_init,
+        readonly=True,
+        enable_interpolation=False)
+    ```
+  - 有界外部记忆:利用`ExternalMemory`,关闭只读开关,打开插值寻址。并利用编码器的第一组输出,取均值池化(pooling)并扩展为指定序列长度后,叠加随机噪声(训练和推断时保持一致),作为 `ExternalMemory` 中存储矩阵的初始化(`boot_layer`)。因此,该存储的记忆槽数目是固定的。即代码中的:

-`bidirectional_gru_encoder` 函数实现双向单层 GRU(Gated Recurrent Unit) 编码器。返回两组结果:一组为字符级编码向量序列(包含前后向),一组为整个源语句的句级编码向量(仅后向)。前者用于解码器的注意力机制中记忆矩阵的初始化,后者用于解码器的状态向量的初始化。
+
+    ```
+    bounded_memory = ExternalMemory(
+        name="bounded_memory",
+        mem_slot_size=size,
+        boot_layer=bounded_memory_init,
+        readonly=False,
+        enable_interpolation=True)
+    ```

-`memory_enhanced_decoder` 函数实现通过外部记忆增强的 GRU 解码器。它利用同一个`ExternalMemory` 类实现两种外部记忆模块:
+
+  注意到,在我们的实现中,注意力机制(或无界外部存储)和神经图灵机(或有界外部存储)被实现成相同的 `ExternalMemory` 类。前者是**只读**的,后者**可读可写**。这样处理仅仅是为了便于统一我们对 “注意力机制” 和 “记忆机制” 的理解和认识,同时也提供更简洁和统一的实现版本。注意力机制也可以通过 `paddle.networks.simple_attention` 实现。
+
+- `memory_enhanced_seq2seq` 函数定义整个带外部记忆机制的序列到序列模型,是模型定义的主调函数。它首先调用 `bidirectional_gru_encoder` 对源语言进行编码,然后通过 `memory_enhanced_decoder` 进行解码。

-无界外部记忆:即传统的注意力机制。利用`ExternalMemory`,打开只读开关,关闭插值寻址。并利用解码器的第一组输出作为 `ExternalMemory` 中存储矩阵的初始化(`boot_layer`)。因此,该存储的记忆槽数目是动态可变的,取决于编码器的字符数。
-有界外部记忆:利用`ExternalMemory`,关闭只读开关,打开插值寻址。并利用解码器的第一组输出,取均值池化(pooling)后并扩展为指定序列长度后,叠加随机噪声(训练和推断时保持一致),作为 `ExternalMemory` 中存储矩阵的初始化(`boot_layer`)。因此,该存储的记忆槽数目是固定的。
-注意到,在我们的实现中,注意力机制(或无界外部存储)和神经图灵机(或有界外部存储)被实现成相同的 `ExternalMemory` 类。前者是**只读**的, 后者**可读可写**。这样处理仅仅是为了便于统一我们对 “注意机制” 和 “记忆机制” 的理解和认识,同时也提供更简洁和统一的实现版本。注意力机制也可以通过 `paddle.networks.simple_attention` 实现。

此外,在该实现中,将 `ExternalMemory` 的 `write` 操作提前至 `read` 之前,以避开潜在的拓扑连接局限,详见 [Issue](https://github.com/PaddlePaddle/Paddle/issues/2061)。我们可以看到,本质上它们是等价的。
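+
+下面给出一个极简的解码器单步示意,帮助理解 “先写后读” 的用法。其中 `memory_decoder_step`、`cur_embedding`、`initial_state`、`size` 等命名仅为示意,并非 `model.py` 中的真实名称;GRU 单步的写法参考 PaddlePaddle v2 机器翻译示例:
+
+```
+def memory_decoder_step(cur_embedding):
+    # 上一时刻的解码器隐状态
+    state = paddle.layer.memory(
+        name="gru_decoder", size=size, boot_layer=initial_state)
+    # 先写:以该状态为键,更新有界外部记忆
+    bounded_memory.write(state)
+    # 后读:以同一状态为键,读出记忆内容
+    bounded_memory_read = bounded_memory.read(state)
+    # 读出内容与当前词向量一同构成 GRU 单步输入
+    decoder_inputs = paddle.layer.fc(
+        input=[cur_embedding, bounded_memory_read],
+        size=size * 3,
+        act=paddle.activation.Linear(),
+        bias_attr=False)
+    return paddle.layer.gru_step(
+        name="gru_decoder", input=decoder_inputs, output_mem=state, size=size)
+```
+
+由于读头与写头都以同一状态向量 `state` 作为寻址键,“先写后读” 与 “先读后写” 的区别仅在于读头读到的是写入前还是写入后的记忆内容,模型可在训练中自行适配,故两种顺序本质等价。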
@@ -278,24 +387,73 @@ def reader():

用户需自行完成字符的切分 (Tokenize) ,并构建字典完成 ID 化。
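+
+例如,一个自定义 reader creator 的极简示意如下(假设源语言与目标语言文件逐行对齐、且已按空格切分;`src_dict`/`trg_dict` 为词到 ID 的映射;文件格式与函数名均为示意,`<s>`、`<e>`、`<unk>` 沿用 wmt14 数据集的约定):
+
+```
+def custom_reader(src_path, trg_path, src_dict, trg_dict):
+    def reader():
+        with open(src_path) as fsrc, open(trg_path) as ftrg:
+            for src_line, trg_line in zip(fsrc, ftrg):
+                # 查词典完成 ID 化,未登录词替换为 <unk> 的 ID
+                src_ids = [src_dict.get(w, src_dict['<unk>'])
+                           for w in src_line.split()]
+                trg_words = ['<s>'] + trg_line.split() + ['<e>']
+                trg_full = [trg_dict.get(w, trg_dict['<unk>'])
+                            for w in trg_words]
+                # 解码器输入为去尾序列,预测目标为去头序列
+                yield src_ids, trg_full[:-1], trg_full[1:]
+    return reader
+```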
-PaddlePaddle 的接口 [paddle.paddle.wmt14](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py), 默认提供了一个经过预处理的、较小规模的 wmt14 英法翻译数据集的子集。并提供了两个reader creator函数如下:
+PaddlePaddle 的接口 [paddle.paddle.wmt14](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py), 默认提供了一个经过预处理的、较小规模的 [wmt14 英法翻译数据集的子集](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz)(该数据集有 193319 条训练数据,6003 条测试数据,词典长度为 30000)。并提供了两个 reader creator 函数如下:

```
paddle.dataset.wmt14.train(dict_size)
paddle.dataset.wmt14.test(dict_size)
```

-这两个函数被调用时即返回相应的`reader()`函数,供`paddle.traner.SGD.train`使用。
+这两个函数被调用时即返回相应的 `reader()` 函数,供 `paddle.trainer.SGD.train` 使用。当我们需要使用其他数据时,可参考 [paddle.paddle.wmt14](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py) 构造相应的 data creator,并将 `paddle.dataset.wmt14.train` 和 `paddle.dataset.wmt14.test` 替换成相应的函数名。
+
+### 训练
+
+命令行输入:

-当我们需要使用其他数据时,可参考 [paddle.paddle.wmt14](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py) 构造相应的 data creator,并替换 `paddle.dataset.wmt14.train` 和 `paddle.dataset.wmt14.train` 成相应函数名。
+
+```
+python train.py
+```
+
+或自定义部分参数, 例如:

-### 训练及预测
+
+```
+CUDA_VISIBLE_DEVICES=8,9,10,11 python train.py \
+    --dict_size 30000 \
+    --word_vec_dim 512 \
+    --hidden_size 1024 \
+    --memory_slot_num 8 \
+    --use_gpu True \
+    --trainer_count 4 \
+    --num_passes 100 \
+    --batch_size 128 \
+    --memory_perturb_stddev 0.1
+```
+
+即可运行训练脚本,训练模型将被定期保存于本地 `./checkpoints`。参数含义可运行:
+
+```
+python train.py --help
+```
+
+
+### 解码

命令行输入:

-```python mt_with_external_memory.py```
+```
+python infer.py
+```
+
+或自定义部分参数, 例如:
+
+```
+CUDA_VISIBLE_DEVICES=8,9,10,11 python infer.py \
+    --dict_size 30000 \
+    --word_vec_dim 512 \
+    --hidden_size 1024 \
+    --memory_slot_num 8 \
+    --use_gpu True \
+    --trainer_count 4 \
+    --memory_perturb_stddev 0.1 \
+    --infer_num_data 10 \
+    --model_filepath checkpoints/params.latest.tar.gz \
+    --beam_size 3
+```
+
+即可运行解码脚本,产生示例翻译结果。参数含义可运行:
+
+```
+python infer.py --help
+```

-即可运行训练脚本(默认训练一轮),训练模型将被定期保存于本地 `params.tar.gz`。训练完成后,将为少量样本生成翻译结果,详见 `infer` 函数。

## 其他讨论

diff --git a/mt_with_external_memory/external_memory.py b/mt_with_external_memory/external_memory.py
index d3dbf32ff3eb76b76326d8e64459d41d833b455a..f8597b7b2a164c2944b10652462743f620dc06bd 100755
--- a/mt_with_external_memory/external_memory.py
+++ b/mt_with_external_memory/external_memory.py
@@ -5,8 +5,7 @@ import paddle.v2 as paddle


 class ExternalMemory(object):
-    """
-    External neural memory class.
+    """External neural memory class.

     A simplified Neural Turing Machines (NTM) with only content-based
     addressing (including content addressing and interpolation, but excluding
@@ -76,8 +75,7 @@ class ExternalMemory(object):
             size=self.mem_slot_size)

     def _content_addressing(self, key_vector):
-        """
-        Get write/read head's addressing weights via content-based addressing.
+        """Get write/read head's addressing weights via content-based addressing.
         """
         # content-based addressing: a=tanh(W*M + U*key)
         key_projection = paddle.layer.fc(
@@ -104,8 +102,7 @@ class ExternalMemory(object):
         return addressing_weight

     def _interpolation(self, head_name, key_vector, addressing_weight):
-        """
-        Interpolate between previous and current addressing weights.
+        """Interpolate between previous and current addressing weights.
""" # prepare interpolation scalar gate: g=sigmoid(W*key) gate = paddle.layer.fc( @@ -126,8 +123,7 @@ class ExternalMemory(object): return interpolated_weight def _get_addressing_weight(self, head_name, key_vector): - """ - Get final addressing weights for read/write heads, including content + """Get final addressing weights for read/write heads, including content addressing and interpolation. """ # current content-based addressing @@ -139,8 +135,7 @@ class ExternalMemory(object): return addressing_weight def write(self, write_key): - """ - Write onto the external memory. + """Write onto the external memory. It cannot be called if "readonly" set True. :param write_key: Key vector for write heads to generate writing @@ -183,8 +178,7 @@ class ExternalMemory(object): name=self.name) def read(self, read_key): - """ - Read from the external memory. + """Read from the external memory. :param write_key: Key vector for read head to generate addressing signals. diff --git a/mt_with_external_memory/model.py b/mt_with_external_memory/model.py index 6ac00d1db8ce96cb91a41d6a9b2d22687cb968f0..af8414a3fa8faabbb75b2408261cc0b20edea7f3 100644 --- a/mt_with_external_memory/model.py +++ b/mt_with_external_memory/model.py @@ -20,8 +20,15 @@ from external_memory import ExternalMemory def bidirectional_gru_encoder(input, size, word_vec_dim): - """ - Bidirectional GRU encoder. + """Bidirectional GRU encoder. + + :params size: Hidden cell number in decoder rnn. + :type size: int + :params word_vec_dim: Word embedding size. + :type word_vec_dim: int + :return: Tuple of 1. concatenated forward and backward hidden sequence. + 2. last state of backward rnn. + :rtype: tuple of LayerOutput """ # token embedding embeddings = paddle.layer.embedding(input=input, size=word_vec_dim) @@ -38,8 +45,7 @@ def bidirectional_gru_encoder(input, size, word_vec_dim): def memory_enhanced_decoder(input, target, initial_state, source_context, size, word_vec_dim, dict_size, is_generating, beam_size): - """ - GRU sequence decoder enhanced with external memory. + """GRU sequence decoder enhanced with external memory. The "external memory" refers to two types of memories. - Unbounded memory: i.e. attention mechanism in Seq2Seq. @@ -77,6 +83,30 @@ def memory_enhanced_decoder(input, target, initial_state, source_context, size, correlated to this paper, but with minor differences (e.g. put "write" before "read" to bypass a potential bug in V2 APIs. See (`issue `_). + + :params input: Decoder input. + :type input: LayerOutput + :params target: Decoder target. + :type target: LayerOutput + :params initial_state: Initial hidden state. + :type initial_state: LayerOutput + :params source_context: Group of context hidden states for each token in the + source sentence, for attention mechanisim. + :type source_context: LayerOutput + :params size: Hidden cell number in decoder rnn. + :type size: int + :params word_vec_dim: Word embedding size. + :type word_vec_dim: int + :param dict_size: Vocabulary size. + :type dict_size: int + :params is_generating: Whether for beam search inferencing (True) or + for training (False). + :type is_generating: bool + :params beam_size: Beam search width. + :type beam_size: int + :return: Cost layer if is_generating=False; Beam search layer if + is_generating = True. 
+    :rtype: LayerOutput
     """
     # prepare initial bounded and unbounded memory
     bounded_memory_slot_init = paddle.layer.fc(
@@ -172,8 +202,7 @@ def memory_enhanced_decoder(input, target, initial_state, source_context, size,

 def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
                             hidden_size, word_vec_dim, dict_size, is_generating,
                             beam_size):
-    """
-    Seq2Seq Model enhanced with external memory.
+    """Seq2Seq Model enhanced with external memory.

     The "external memory" refers to two types of memories.
     - Unbounded memory: i.e. attention mechanism in Seq2Seq.
@@ -189,6 +218,27 @@ def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
     For more details about this memory-enhanced Seq2Seq, please
     refer to `Memory-enhanced Decoder for Neural Machine Translation
     <https://arxiv.org/abs/1606.02003>`_.
+
+    :param encoder_input: Encoder input.
+    :type encoder_input: LayerOutput
+    :param decoder_input: Decoder input.
+    :type decoder_input: LayerOutput
+    :param decoder_target: Decoder target.
+    :type decoder_target: LayerOutput
+    :param hidden_size: Hidden cell number, both in encoder and decoder rnn.
+    :type hidden_size: int
+    :param word_vec_dim: Word embedding size.
+    :type word_vec_dim: int
+    :param dict_size: Vocabulary size.
+    :type dict_size: int
+    :param is_generating: Whether for beam search inference (True) or
+        for training (False).
+    :type is_generating: bool
+    :param beam_size: Beam search width.
+    :type beam_size: int
+    :return: Cost layer if is_generating=False; Beam search layer if
+        is_generating=True.
+    :rtype: LayerOutput
     """
     # encoder
     context_encodings, sequence_encoding = bidirectional_gru_encoder(