Commit 4545a058
Authored Oct 10, 2017 by ranqiu

add dot-product attention

Parent: 8e2cc754
Showing 2 changed files with 88 additions and 2 deletions:

  doc/api/v2/config/networks.rst                     +5   -0
  python/paddle/trainer_config_helpers/networks.py   +83  -2
doc/api/v2/config/networks.rst

@@ -125,3 +125,8 @@ simple_attention
     :members: simple_attention
     :noindex:
 
+dot_product_attention
+---------------------
+.. automodule:: paddle.v2.networks
+    :members: dot_product_attention
+    :noindex:
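For orientation (this is not part of the commit): the directive above pulls the docstring of the new helper into the paddle.v2.networks API reference, so a model config would call it roughly as sketched below. The layer names, sizes, and the linear projection of the decoder state are illustrative assumptions; in a real decoder the previous state would usually come from a memory inside a recurrent group rather than a data layer.

# Illustrative sketch only: the names and sizes below are assumptions.
import paddle.v2 as paddle

enc_seq = paddle.layer.data(
    name='encoder_out', type=paddle.data_type.dense_vector_sequence(128))
att_seq = paddle.layer.data(
    name='attended_seq', type=paddle.data_type.dense_vector_sequence(256))
prev_state = paddle.layer.data(
    name='prev_decoder_state', type=paddle.data_type.dense_vector(512))

# The helper expects the decoder state already projected to the encoder
# size, i.e. the "transformation outside" mentioned in its docstring.
state = paddle.layer.fc(
    input=prev_state, size=128, act=paddle.activation.Linear(), bias_attr=False)

context = paddle.networks.dot_product_attention(
    encoded_sequence=enc_seq,    # h_j in the docstring's equations
    attending_sequence=att_seq,  # z_j, the sequence the weights are applied to
    transformed_state=state)     # s_{i-1}, same size as enc_seq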
python/paddle/trainer_config_helpers/networks.py

@@ -26,8 +26,9 @@ __all__ = [
     'sequence_conv_pool', 'simple_lstm', "simple_img_conv_pool",
     "img_conv_bn_pool", 'lstmemory_group', 'lstmemory_unit', 'small_vgg',
     'img_conv_group', 'vgg_16_network', 'gru_unit', 'gru_group', 'simple_gru',
-    'simple_attention', 'simple_gru2', 'bidirectional_gru', 'text_conv_pool',
-    'bidirectional_lstm', 'inputs', 'outputs'
+    'simple_attention', 'dot_product_attention', 'simple_gru2',
+    'bidirectional_gru', 'text_conv_pool', 'bidirectional_lstm', 'inputs',
+    'outputs'
 ]
 
 ######################################################
@@ -1361,6 +1362,7 @@ def simple_attention(encoded_sequence,
                                  compute attention weight.
     :type transform_param_attr: ParameterAttribute
     :return: a context vector
+    :rtype: LayerOutput
     """
     assert encoded_proj.size == decoder_state.size
     proj_size = encoded_proj.size
@@ -1396,6 +1398,85 @@ def simple_attention(encoded_sequence,
         input=scaled,
         pooling_type=SumPooling(),
         name="%s_pooling" % name)
+
+
+@wrap_name_default()
+def dot_product_attention(encoded_sequence,
+                          attending_sequence,
+                          transformed_state,
+                          softmax_param_attr=None,
+                          name=None):
+    """
+    Calculate and return a context vector with a dot-product attention mechanism.
+    The size of the context vector equals the size of the attending_sequence.
+
+    ..  math::
+
+        a(s_{i-1},h_{j}) & = s_{i-1}^\mathrm{T} h_{j}
+
+        e_{i,j} & = a(s_{i-1}, h_{j})
+
+        a_{i,j} & = \\frac{exp(e_{i,j})}{\\sum_{k=1}^{T_x}{exp(e_{i,k})}}
+
+        c_{i} & = \\sum_{j=1}^{T_{x}}a_{i,j}z_{j}
+
+    where :math:`h_{j}` is the j-th element of encoded_sequence,
+    :math:`z_{j}` is the j-th element of attending_sequence and
+    :math:`s_{i-1}` is transformed_state.
+
+    The example usage is:
+
+    ..  code-block:: python
+
+        context = dot_product_attention(encoded_sequence=enc_seq,
+                                        attending_sequence=att_seq,
+                                        transformed_state=state)
+
+    :param name: name of the dot-product attention model.
+    :type name: basestring
+    :param softmax_param_attr: parameter attribute of the sequence softmax
+                               that is used to produce the attention weight.
+    :type softmax_param_attr: ParameterAttribute
+    :param encoded_sequence: output of the encoder.
+    :type encoded_sequence: LayerOutput
+    :param attending_sequence: the sequence to be attended. The attention weight,
+                               which is computed from transformed_state and
+                               encoded_sequence, is applied to this sequence to
+                               produce the context vector.
+    :type attending_sequence: LayerOutput
+    :param transformed_state: transformed hidden state of the decoder from the
+                              previous time step. Its size must equal the size of
+                              encoded_sequence; the transformation is done outside
+                              dot_product_attention for flexibility.
+    :type transformed_state: LayerOutput
+    :return: a context vector
+    :rtype: LayerOutput
+    """
+    assert transformed_state.size == encoded_sequence.size
+
+    expanded = expand_layer(
+        input=transformed_state,
+        expanded_as=encoded_sequence,
+        name='%s_expand' % name)
+
+    m = linear_comb_layer(
+        weights=expanded,
+        vectors=encoded_sequence,
+        name='%s_dot-product' % name)
+
+    attention_weight = fc_layer(
+        input=m,
+        size=1,
+        act=SequenceSoftmaxActivation(),
+        param_attr=softmax_param_attr,
+        name="%s_softmax" % name,
+        bias_attr=False)
+
+    scaled = scaling_layer(
+        weight=attention_weight,
+        input=attending_sequence,
+        name='%s_scaling' % name)
+
+    return pooling_layer(
+        input=scaled, pooling_type=SumPooling(), name="%s_pooling" % name)
 
 
 def inputs(layers, *args):
     """
     Declare the inputs of network. The order of input should be as same as
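To make the added layer pipeline easier to follow, here is a minimal NumPy sketch (not part of the commit, names invented for illustration) of the same computation for one decoding step: the dot products correspond to expand_layer plus linear_comb_layer, the normalization to the fc_layer with SequenceSoftmaxActivation, and the weighted sum to scaling_layer followed by SumPooling.

import numpy as np

def dot_product_attention_np(encoded_sequence, attending_sequence, transformed_state):
    """encoded_sequence: (T, d), attending_sequence: (T, m), transformed_state: (d,)."""
    # e_j = s_{i-1}^T h_j: dot product of the (already transformed) decoder
    # state with every encoder output (expand_layer + linear_comb_layer).
    e = np.dot(encoded_sequence, transformed_state)          # shape (T,)
    # a_j = exp(e_j) / sum_k exp(e_k): softmax over the sequence
    # (fc_layer with SequenceSoftmaxActivation).
    a = np.exp(e - e.max())
    a /= a.sum()
    # c = sum_j a_j * z_j: weight the attended sequence and sum-pool it
    # (scaling_layer + pooling_layer with SumPooling).
    return np.sum(a[:, None] * attending_sequence, axis=0)   # shape (m,)

# Toy check: 4 encoder steps, encoder size 3, attended size 5.
rng = np.random.RandomState(0)
ctx = dot_product_attention_np(rng.randn(4, 3), rng.randn(4, 5), rng.randn(3))
print(ctx.shape)  # prints (5,)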