"""
    Contains model configuration for external-memory-enhanced seq2seq.

    The "external memory" refers to two types of memories.
    - Unbounded memory: i.e. vanilla attention mechanism in Seq2Seq.
    - Bounded memory: i.e. external memory in NTM.
    Both types of external memories are exploited to enhance the vanilla
    Seq2Seq neural machine translation.

    The implementation primarily follows the paper
    `Memory-enhanced Decoder for Neural Machine Translation
    <https://arxiv.org/abs/1606.02003>`_,
    with some minor differences (will be listed in README.md).

    For details about "external memory", please also refer to
    `Neural Turing Machines <https://arxiv.org/abs/1410.5401>`_.
"""
import paddle.v2 as paddle
from external_memory import ExternalMemory


def bidirectional_gru_encoder(input, size, word_vec_dim):
    """Bidirectional GRU encoder.

    :param input: Token id sequence of the source sentence.
    :type input: LayerOutput
    :param size: Hidden cell number in encoder rnn.
    :type size: int
    :param word_vec_dim: Word embedding size.
    :type word_vec_dim: int
    :return: Tuple of 1. concatenated forward and backward hidden sequence.
             2. first state of the backward rnn output, i.e. its last
             computed state.
    :rtype: tuple of LayerOutput
    """
    # token embedding
    embeddings = paddle.layer.embedding(input=input, size=word_vec_dim)
    # token-level forward and backward encoding for attention
    forward = paddle.networks.simple_gru(
        input=embeddings, size=size, reverse=False)
    backward = paddle.networks.simple_gru(
        input=embeddings, size=size, reverse=True)
    forward_backward = paddle.layer.concat(input=[forward, backward])
    # sequence-level encoding
    backward_first = paddle.layer.first_seq(input=backward)
    return forward_backward, backward_first
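

# --- Illustrative sketch (not part of the original model) -------------------
# A minimal NumPy illustration of what bidirectional_gru_encoder returns,
# assuming a plain tanh recurrence in place of the GRU for brevity. The names
# `_sketch_rnn` and `_sketch_bidirectional_encode` are hypothetical helpers,
# not PaddlePaddle APIs. The sketch shows why the *first* step of the backward
# sequence can serve as a sentence-level summary: that state has already read
# the whole sentence from right to left.
import numpy as np


def _sketch_rnn(inputs, w_in, w_rec, reverse=False):
    """Runs a tanh RNN over `inputs` ([T, D]); outputs stay in input order."""
    steps = list(range(len(inputs)))
    if reverse:
        steps = steps[::-1]
    state = np.zeros(w_rec.shape[0])
    outputs = np.zeros((len(inputs), w_rec.shape[0]))
    for t in steps:
        state = np.tanh(inputs[t].dot(w_in) + state.dot(w_rec))
        outputs[t] = state
    return outputs


def _sketch_bidirectional_encode(inputs, fwd_weights, bwd_weights):
    """Returns ([T, 2H] concatenated states, [H] first backward state)."""
    fwd_in, fwd_rec = fwd_weights
    bwd_in, bwd_rec = bwd_weights
    forward = _sketch_rnn(inputs, fwd_in, fwd_rec, reverse=False)
    backward = _sketch_rnn(inputs, bwd_in, bwd_rec, reverse=True)
    return np.concatenate([forward, backward], axis=1), backward[0]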


def memory_enhanced_decoder(input, target, initial_state, source_context, size,
                            word_vec_dim, dict_size, is_generating, beam_size):
    """GRU sequence decoder enhanced with external memory.

    The "external memory" refers to two types of memories.
    - Unbounded memory: i.e. attention mechanism in Seq2Seq.
    - Bounded memory: i.e. external memory in NTM.
    Both types of external memories can be implemented with
    ExternalMemory class, and are both exploited in this enhanced RNN decoder.

    The vanilla RNN/LSTM/GRU also has a narrow memory mechanism, namely the
    hidden state vector (or cell state in LSTM) carrying information through
    a span of sequence time, which is a successful design enriching the model
    with the capability to "remember" things in the long run. However, such a
    vector state is somewhat limited to a very narrow memory bandwidth. External
    memory introduced here could easily increase the memory capacity with linear
    complexity cost (rather than quadratic for vector state).

    This enhanced decoder expands its "memory passage" through two
    ExternalMemory objects:
    - Bounded memory for handling long-term information exchange within decoder
      itself. A direct expansion of traditional "vector" state.
    - Unbounded memory for handling source language's token-wise information.
      Exactly the attention mechanism over Seq2Seq.
    
    Notice that we take the attention mechanism as a particular form of external
    memory, with read-only memory bank initialized with encoder states, and a
    read head with content-based addressing (attention). From this viewpoint,
    we arrive at a better understanding of attention mechanism itself and other
    external memory, and a concise and unified implementation for them.

    For more details about external memory, please refer to
    `Neural Turing Machines <https://arxiv.org/abs/1410.5401>`_.

    For more details about this memory-enhanced decoder, please
    refer to `Memory-enhanced Decoder for Neural Machine Translation
    <https://arxiv.org/abs/1606.02003>`_. This implementation closely
    follows that paper, with minor differences (e.g. "write" is put
    before "read" to bypass a potential bug in V2 APIs; see this
    `issue <https://github.com/PaddlePaddle/Paddle/issues/2061>`_).

    :param input: Decoder input.
    :type input: LayerOutput
    :param target: Decoder target.
    :type target: LayerOutput
    :param initial_state: Initial hidden state.
    :type initial_state: LayerOutput
    :param source_context: Group of context hidden states for each token in the
                           source sentence, for the attention mechanism.
    :type source_context: LayerOutput
    :param size: Hidden cell number in decoder rnn.
    :type size: int
    :param word_vec_dim: Word embedding size.
    :type word_vec_dim: int
    :param dict_size: Vocabulary size.
    :type dict_size: int
    :param is_generating: Whether for beam search inference (True) or
                          for training (False).
    :type is_generating: bool
    :param beam_size: Beam search width.
    :type beam_size: int
    :return: Cost layer if is_generating=False; beam search layer if
             is_generating=True.
    :rtype: LayerOutput
    """
    # prepare initial bounded and unbounded memory
    bounded_memory_slot_init = paddle.layer.fc(
        input=paddle.layer.pooling(
            input=source_context, pooling_type=paddle.pooling.Avg()),
        size=size,
        act=paddle.activation.Sigmoid())
    bounded_memory_perturbation = paddle.layer.data(
        name='bounded_memory_perturbation',
        type=paddle.data_type.dense_vector_sequence(size))
    bounded_memory_init = paddle.layer.addto(
        input=[
            paddle.layer.expand(
                input=bounded_memory_slot_init,
                expand_as=bounded_memory_perturbation),
            bounded_memory_perturbation
        ],
        act=paddle.activation.Linear())
    bounded_memory_weight_init = paddle.layer.slope_intercept(
        input=paddle.layer.fc(input=bounded_memory_init, size=1),
        slope=0.0,
        intercept=0.0)
    unbounded_memory_init = source_context
    unbounded_memory_weight_init = paddle.layer.slope_intercept(
        input=paddle.layer.fc(input=unbounded_memory_init, size=1),
        slope=0.0,
        intercept=0.0)

    # prepare step function for recurrent group
    def recurrent_decoder_step(cur_embedding):
        # create hidden state, bounded and unbounded memory.
        state = paddle.layer.memory(
            name="gru_decoder", size=size, boot_layer=initial_state)
        bounded_memory = ExternalMemory(
            name="bounded_memory",
            mem_slot_size=size,
            boot_layer=bounded_memory_init,
            initial_weight=bounded_memory_weight_init,
            readonly=False,
            enable_interpolation=True)
        unbounded_memory = ExternalMemory(
            name="unbounded_memory",
            mem_slot_size=size * 2,
            boot_layer=unbounded_memory_init,
            initial_weight=unbounded_memory_weight_init,
            readonly=True,
            enable_interpolation=False)
        # write bounded memory
        bounded_memory.write(state)
        # read bounded memory
        bounded_memory_read = bounded_memory.read(state)
        # prepare key for unbounded memory
        key_for_unbounded_memory = paddle.layer.fc(
            input=[bounded_memory_read, cur_embedding],
            size=size,
            act=paddle.activation.Tanh(),
            bias_attr=False)
        # read unbounded memory (i.e. attention mechanism)
        context = unbounded_memory.read(key_for_unbounded_memory)
        # gated recurrent unit
        gru_inputs = paddle.layer.fc(
            input=[context, cur_embedding, bounded_memory_read],
            size=size * 3,
            act=paddle.activation.Linear(),
            bias_attr=False)
        gru_output = paddle.layer.gru_step(
            name="gru_decoder", input=gru_inputs, output_mem=state, size=size)
        # step output
        return paddle.layer.fc(input=[gru_output, context, cur_embedding],
                               size=dict_size,
                               act=paddle.activation.Softmax(),
                               bias_attr=True)

    if not is_generating:
        target_embeddings = paddle.layer.embedding(
            input=input,
            size=word_vec_dim,
            param_attr=paddle.attr.ParamAttr(name="_decoder_word_embedding"))
        decoder_result = paddle.layer.recurrent_group(
            name="decoder_group",
            step=recurrent_decoder_step,
            input=[target_embeddings])
        cost = paddle.layer.classification_cost(
            input=decoder_result, label=target)
        return cost
    else:
        target_embeddings = paddle.layer.GeneratedInput(
            size=dict_size,
            embedding_name="_decoder_word_embedding",
            embedding_size=word_vec_dim)
        beam_gen = paddle.layer.beam_search(
            name="decoder_group",
            step=recurrent_decoder_step,
            input=[target_embeddings],
            bos_id=0,
            eos_id=1,
            beam_size=beam_size,
            max_length=100)
        return beam_gen
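

# --- Illustrative sketch (not part of the original model) -------------------
# A minimal NumPy sketch of the two memory operations the decoder step relies
# on, under stated assumptions: dot-product similarity for content-based
# addressing and an NTM-style erase/add write. The real ExternalMemory class
# (external_memory.py) may score and update differently; the `_sketch_*` names
# below are hypothetical and exist only for illustration.
import numpy as np


def _sketch_address(memory, key):
    """Content-based addressing: softmax over key/slot similarity.

    `memory` is [N, D] (N slots), `key` is [D]; returns [N] weights.
    """
    scores = memory.dot(key)
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


def _sketch_read(memory, key):
    """Reads a weighted sum of slots. With encoder states as `memory`,
    this is the usual attention context vector."""
    return _sketch_address(memory, key).dot(memory)


def _sketch_write(memory, key, erase, add):
    """NTM-style write: each slot is erased, then added to, in proportion
    to its addressing weight. Returns a new array; `memory` is unchanged."""
    weights = _sketch_address(memory, key)[:, np.newaxis]
    return memory * (1.0 - weights * erase) + weights * add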


def memory_enhanced_seq2seq(encoder_input, decoder_input, decoder_target,
                            hidden_size, word_vec_dim, dict_size, is_generating,
                            beam_size):
    """Seq2Seq Model enhanced with external memory.

    The "external memory" refers to two types of memories.
    - Unbounded memory: i.e. attention mechanism in Seq2Seq.
    - Bounded memory: i.e. external memory in NTM.
    Both types of external memories can be implemented with
    ExternalMemory class, and are both exploited in this Seq2Seq model.

    Please refer to the function comments of memory_enhanced_decoder(...).

    For more details about external memory, please refer to
    `Neural Turing Machines <https://arxiv.org/abs/1410.5401>`_.

    For more details about this memory-enhanced Seq2Seq, please
    refer to `Memory-enhanced Decoder for Neural Machine Translation 
    <https://arxiv.org/abs/1606.02003>`_.

    :param encoder_input: Encoder input.
    :type encoder_input: LayerOutput
    :param decoder_input: Decoder input.
    :type decoder_input: LayerOutput
    :param decoder_target: Decoder target.
    :type decoder_target: LayerOutput
    :param hidden_size: Hidden cell number, both in encoder and decoder rnn.
    :type hidden_size: int
    :param word_vec_dim: Word embedding size.
    :type word_vec_dim: int
    :param dict_size: Vocabulary size.
    :type dict_size: int
    :param is_generating: Whether for beam search inference (True) or
                          for training (False).
    :type is_generating: bool
    :param beam_size: Beam search width.
    :type beam_size: int
    :return: Cost layer if is_generating=False; beam search layer if
             is_generating=True.
    :rtype: LayerOutput
    """
    # encoder
    context_encodings, sequence_encoding = bidirectional_gru_encoder(
        input=encoder_input, size=hidden_size, word_vec_dim=word_vec_dim)
    # decoder
    return memory_enhanced_decoder(
        input=decoder_input,
        target=decoder_target,
        initial_state=sequence_encoding,
        source_context=context_encodings,
        size=hidden_size,
        word_vec_dim=word_vec_dim,
        dict_size=dict_size,
        is_generating=is_generating,
        beam_size=beam_size)