Commit aefd266f
Authored on May 15, 2017 by Xinghai Sun

Correct text typos and fix NTM's addressing bugs.

Parent: cdbf36da

2 changed files with 60 additions and 59 deletions (+60 -59):
- mt_with_external_memory/README.md (+2 -2)
- mt_with_external_memory/mt_with_external_memory.py (+58 -57)
mt_with_external_memory/README.md

@@ -99,5 +99,5 @@ TBD

 ## References

 1. Alex Graves, Greg Wayne, Ivo Danihelka, [Neural Turing Machines](https://arxiv.org/abs/1410.5401). arXiv preprint arXiv:1410.5401, 2014.
 2. Mingxuan Wang, Zhengdong Lu, Hang Li, Qun Liu, [Memory-enhanced Decoder for Neural Machine Translation](https://arxiv.org/abs/1606.02003). arXiv preprint arXiv:1606.02003, 2016.
 3. Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473). arXiv preprint arXiv:1409.0473, 2014.
\ No newline at end of file
mt_with_external_memory/mt_with_external_memory.py

 # Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
-This python script is a example model configuration for neural machine
+This python script is an example model configuration for neural machine
 translation with external memory, based on PaddlePaddle V2 APIs.

 The "external memory" refers to two types of memories.
@@ -21,7 +8,7 @@
 Both types of external memories are exploited to enhance the vanilla
 Seq2Seq neural machine translation.

-The implementation largely followers the paper
+The implementation primarily follows the paper
 `Memory-enhanced Decoder for Neural Machine Translation
 <https://arxiv.org/abs/1606.02003>`_,
 with some minor differences (will be listed in README.md).
@@ -39,7 +26,7 @@ word_vec_dim = 512
 hidden_size = 1024
 batch_size = 5
 memory_slot_num = 8
-beam_size = 40
+beam_size = 3
 infer_data_num = 3
@@ -67,24 +54,40 @@ class ExternalMemory(object):
     :type name: basestring
     :param mem_slot_size: Size of memory slot/vector.
     :type mem_slot_size: int
-    :param boot_layer: Boot layer for initializing memory. Sequence layer
-                       with sequence length indicating the number of memory
-                       slots, and size as mem_slot_size.
+    :param boot_layer: Boot layer for initializing the external memory. The
+                       sequence layer has sequence length indicating the number
+                       of memory slots, and size as memory slot size.
     :type boot_layer: LayerOutput
     :param readonly: If true, the memory is read-only, and write function cannot
                      be called. Default is false.
     :type readonly: bool
+    :param enable_interpolation: If set true, the read/write addressing weights
+                                 will be interpolated with the weights in the
+                                 last step, with the affine coefficients being
+                                 a learnable gate function.
+    :type enable_interpolation: bool
     """

-    def __init__(self, name, mem_slot_size, boot_layer, readonly=False):
+    def __init__(self,
+                 name,
+                 mem_slot_size,
+                 boot_layer,
+                 readonly=False,
+                 enable_interpolation=True):
         self.name = name
         self.mem_slot_size = mem_slot_size
         self.readonly = readonly
+        self.enable_interpolation = enable_interpolation
         self.external_memory = paddle.layer.memory(
             name=self.name,
             size=self.mem_slot_size,
             is_seq=True,
             boot_layer=boot_layer)
+        # prepare a constant (zero) initializer for addressing weights
+        self.zero_addressing_init = paddle.layer.slope_intercept(
+            input=paddle.layer.fc(input=boot_layer, size=1),
+            slope=0.0,
+            intercept=0.0)
         # set memory to constant when readonly=True
         if self.readonly:
             self.updated_external_memory = paddle.layer.mixed(
@@ -111,18 +114,18 @@ class ExternalMemory(object):
             size=self.mem_slot_size,
             act=paddle.activation.Linear(),
             bias_attr=False)
-        merged = paddle.layer.addto(
+        merged_projection = paddle.layer.addto(
             input=[key_proj_expanded, memory_projection],
             act=paddle.activation.Tanh())
         # softmax addressing weight: w=softmax(v^T a)
         addressing_weight = paddle.layer.fc(
-            input=merged,
+            input=merged_projection,
             size=1,
             act=paddle.activation.SequenceSoftmax(),
             bias_attr=False)
         return addressing_weight

-    def __interpolation__(self, key_vector, addressing_weight):
+    def __interpolation__(self, head_name, key_vector, addressing_weight):
         """
         Interpolate between previous and current addressing weights.
         """
@@ -134,34 +137,33 @@ class ExternalMemory(object):
             bias_attr=False)
         # interpolation: w_t = g*w_t+(1-g)*w_{t-1}
-        last_addressing_weight = paddle.layer.memory(
-            name=self.name + "_addressing_weight", size=1, is_seq=True)
-        gated_addressing_weight = paddle.layer.addto(
-            name=self.name + "_addressing_weight",
-            input=[
-                last_addressing_weight,
-                paddle.layer.scaling(weight=gate, input=addressing_weight),
-                paddle.layer.mixed(
-                    input=paddle.layer.dotmul_operator(
-                        a=gate, b=last_addressing_weight, scale=-1.0),
-                    size=1)
-            ],
-            act=paddle.activation.Tanh())
-        return gated_addressing_weight
+        last_addressing_weight = paddle.layer.memory(
+            name=self.name + "_addressing_weight_" + head_name,
+            size=1,
+            is_seq=True,
+            boot_layer=self.zero_addressing_init)
+        interpolated_weight = paddle.layer.interpolation(
+            name=self.name + "_addressing_weight_" + head_name,
+            input=[addressing_weight, last_addressing_weight],
+            weight=paddle.layer.expand(input=gate, expand_as=addressing_weight))
+        return interpolated_weight

-    def __get_addressing_weight__(self, key_vector):
+    def __get_addressing_weight__(self, head_name, key_vector):
         """
         Get final addressing weights for read/write heads, including content
         addressing and interpolation.
         """
         # current content-based addressing
         addressing_weight = self.__content_addressing__(key_vector)
-        return addressing_weight
         # interpolation with previous addressing weight
-        return self.__interpolation__(key_vector, addressing_weight)
+        if self.enable_interpolation:
+            return self.__interpolation__(head_name, key_vector,
+                                          addressing_weight)
+        else:
+            return addressing_weight
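The addressing fix above gives each head its own interpolation state (keyed by `head_name`, booted to zero) and computes w_t = g*w_t + (1-g)*w_{t-1} directly, where the gate g is a learnable sigmoid of the key vector. A minimal plain-Python sketch of the gating itself (function name and list-based shapes are illustrative):

```python
def interpolate_weights(current_w, last_w, gate):
    """Toy NTM gating: w_t = g * w_t + (1 - g) * w_{t-1}, per memory slot."""
    # gate is a scalar in [0, 1]; in the model it comes from a sigmoid fc layer
    return [gate * c + (1.0 - gate) * l for c, l in zip(current_w, last_w)]
```

With gate = 1 the head relies purely on the current content-based weights; with gate = 0 it reuses the previous step's weights unchanged.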
     def write(self, write_key):
         """
-        Write head for external memory.
+        Write onto the external memory.

         It cannot be called if "readonly" set True.

         :param write_key: Key vector for write heads to generate writing
@@ -172,7 +174,7 @@ class ExternalMemory(object):
         if self.readonly:
             raise ValueError("ExternalMemory with readonly=True cannot write.")
         # get addressing weight for write head
-        write_weight = self.__get_addressing_weight__(write_key)
+        write_weight = self.__get_addressing_weight__("write_head", write_key)
         # prepare add_vector and erase_vector
         erase_vector = paddle.layer.fc(
             input=write_key,
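As the comment in this hunk notes, the write head derives an erase vector and an add vector from the write key and then applies the usual NTM-style update m_i <- m_i * (1 - w_i*e) + w_i*a per slot. A toy sketch of that update under assumed list-of-lists shapes (the function name `ntm_write` is illustrative, not from the PaddlePaddle code):

```python
def ntm_write(memory, write_w, erase_v, add_v):
    """Toy NTM write: each slot is partially erased, then added to,
    in proportion to its addressing weight w_i."""
    new_memory = []
    for m_row, w in zip(memory, write_w):
        # m_i * (1 - w_i * e) erases; + w_i * a writes new content
        new_memory.append([m * (1.0 - w * e) + w * a
                           for m, e, a in zip(m_row, erase_v, add_v)])
    return new_memory
```

A slot with addressing weight 0 passes through unchanged, so the softmax weights decide where the write lands.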
@@ -205,7 +207,7 @@ class ExternalMemory(object):
     def read(self, read_key):
         """
-        Read head for external memory.
+        Read from the external memory.

         :param read_key: Key vector for read head to generate addressing
                          signals.
@@ -214,7 +216,7 @@ class ExternalMemory(object):
         :rtype: LayerOutput
         """
         # get addressing weight for read head
-        read_weight = self.__get_addressing_weight__(read_key)
+        read_weight = self.__get_addressing_weight__("read_head", read_key)
         # read content from external memory
         scaled = paddle.layer.scaling(
             weight=read_weight, input=self.updated_external_memory)
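Reading scales each memory slot by its addressing weight and sums the results, r = sum_i w_i * m_i; the `paddle.layer.scaling` call above performs the per-slot scaling (followed by a pooling step not shown in this hunk). A plain-Python sketch of the full weighted read (function name illustrative):

```python
def ntm_read(memory, read_w):
    """Toy NTM read: weighted sum of memory slots, r = sum_i w_i * m_i."""
    width = len(memory[0])
    result = [0.0] * width
    for w, row in zip(read_w, memory):
        for j in range(width):
            result[j] += w * row[j]
    return result
```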
@@ -227,19 +229,16 @@ def bidirectional_gru_encoder(input, size, word_vec_dim):
     Bidirectional GRU encoder.
     """
     # token embedding
-    embeddings = paddle.layer.embedding(input=input, size=word_vec_dim)
+    embeddings = paddle.layer.embedding(
+        input=input,
+        size=word_vec_dim,
+        param_attr=paddle.attr.ParamAttr(name='_encoder_word_embedding'))
     # token-level forward and backward encoding for attentions
     forward = paddle.networks.simple_gru(
         input=embeddings, size=size, reverse=False)
     backward = paddle.networks.simple_gru(
         input=embeddings, size=size, reverse=True)
-    merged = paddle.layer.concat(input=[forward, backward])
+    forward_backward = paddle.layer.concat(input=[forward, backward])
     # sequence-level encoding
     backward_first = paddle.layer.first_seq(input=backward)
-    return merged, backward_first
+    return forward_backward, backward_first
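The rename from `merged` to `forward_backward` reflects what the `concat` produces: for each token, the forward and backward GRU states are concatenated into one wider state, which later boots the unbounded (attention) memory. A trivial sketch of that per-token concatenation (function name illustrative):

```python
def concat_bidirectional(forward_states, backward_states):
    """Toy per-token concat of forward and backward encoder states."""
    return [f + b for f, b in zip(forward_states, backward_states)]
```

This is why the unbounded memory below is built with `mem_slot_size=size * 2`: each slot holds one such concatenated state.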
 def memory_enhanced_decoder(input, target, initial_state, source_context, size,
@@ -256,9 +255,9 @@ def memory_enhanced_decoder(input, target, initial_state, source_context, size,
     The vanilla RNN/LSTM/GRU also has a narrow memory mechanism, namely the
     hidden state vector (or cell state in LSTM) carrying information through
     a span of sequence time, which is a successful design enriching the model
-    with capability to "remember" things in the long run. However, such a vector
-    state is somewhat limited to a very narrow memory bandwidth. External memory
-    introduced here could easily increase the memory capacity with linear
+    with the capability to "remember" things in the long run. However, such a
+    vector state is somewhat limited to a very narrow memory bandwidth. External
+    memory introduced here could easily increase the memory capacity with linear
     complexity cost (rather than quadratic for vector state).

     This enhanced decoder expands its "memory passage" through two
@@ -268,7 +267,7 @@ def memory_enhanced_decoder(input, target, initial_state, source_context, size,
     - Unbounded memory for handling source language's token-wise information.
       Exactly the attention mechanism over Seq2Seq.

-    Notice that we take the attention mechanism as a special form of external
+    Notice that we take the attention mechanism as a particular form of external
     memory, with read-only memory bank initialized with encoder states, and a
     read head with content-based addressing (attention). From this view point,
     we arrive at a better understanding of attention mechanism itself and other
@@ -306,12 +305,14 @@ def memory_enhanced_decoder(input, target, initial_state, source_context, size,
         name="bounded_memory",
         mem_slot_size=size,
         boot_layer=bounded_memory_init,
-        readonly=False)
+        readonly=False,
+        enable_interpolation=True)
     unbounded_memory = ExternalMemory(
         name="unbounded_memory",
         mem_slot_size=size * 2,
         boot_layer=unbounded_memory_init,
-        readonly=True)
+        readonly=True,
+        enable_interpolation=False)
     # write bounded memory
     bounded_memory.write(state)
     # read bounded memory
@@ -566,7 +567,7 @@ def infer():
 def main():
-    paddle.init(use_gpu=False, trainer_count=8)
+    paddle.init(use_gpu=False, trainer_count=1)
     train(num_passes=1)
     infer()