Crayon鑫 / Paddle (forked from PaddlePaddle / Paddle)
Commit 4545a058
Authored Oct 10, 2017 by ranqiu
Parent: 8e2cc754

add dot-product attention
Showing 2 changed files with 88 additions and 2 deletions (+88 −2)
doc/api/v2/config/networks.rst  +5 −0
python/paddle/trainer_config_helpers/networks.py  +83 −2
doc/api/v2/config/networks.rst

@@ -125,3 +125,8 @@ simple_attention
     :members: simple_attention
     :noindex:
 
+dot_product_attention
+---------------------
+.. automodule:: paddle.v2.networks
+    :members: dot_product_attention
+    :noindex:
python/paddle/trainer_config_helpers/networks.py

@@ -26,8 +26,9 @@ __all__ = [
     'sequence_conv_pool', 'simple_lstm', "simple_img_conv_pool",
     "img_conv_bn_pool", 'lstmemory_group', 'lstmemory_unit', 'small_vgg',
     'img_conv_group', 'vgg_16_network', 'gru_unit', 'gru_group', 'simple_gru',
-    'simple_attention', 'simple_gru2', 'bidirectional_gru', 'text_conv_pool',
-    'bidirectional_lstm', 'inputs', 'outputs'
+    'simple_attention', 'dot_product_attention', 'simple_gru2',
+    'bidirectional_gru', 'text_conv_pool', 'bidirectional_lstm', 'inputs',
+    'outputs'
 ]
 
 ######################################################
@@ -1361,6 +1362,7 @@ def simple_attention(encoded_sequence,
                                  compute attention weight.
     :type transform_param_attr: ParameterAttribute
     :return: a context vector
+    :rtype: LayerOutput
     """
     assert encoded_proj.size == decoder_state.size
     proj_size = encoded_proj.size
@@ -1396,6 +1398,85 @@ def simple_attention(encoded_sequence,
         input=scaled,
         pooling_type=SumPooling(),
         name="%s_pooling" % name)
+
+
+@wrap_name_default()
+def dot_product_attention(encoded_sequence,
+                          attending_sequence,
+                          transformed_state,
+                          softmax_param_attr=None,
+                          name=None):
+    """
+    Calculate and return a context vector with the dot-product attention mechanism.
+    The size of the context vector equals the size of the attending_sequence.
+
+    ..  math::
+
+        a(s_{i-1}, h_{j}) & = s_{i-1}^\mathrm{T} h_{j}
+
+        e_{i,j} & = a(s_{i-1}, h_{j})
+
+        a_{i,j} & = \\frac{exp(e_{i,j})}{\\sum_{k=1}^{T_x}{exp(e_{i,k})}}
+
+        c_{i} & = \\sum_{j=1}^{T_{x}}a_{i,j}z_{j}
+
+    where :math:`h_{j}` is the j-th element of encoded_sequence,
+    :math:`z_{j}` is the j-th element of attending_sequence,
+    and :math:`s_{i-1}` is transformed_state.
+
+    The example usage is:
+
+    ..  code-block:: python
+
+        context = dot_product_attention(encoded_sequence=enc_seq,
+                                        attending_sequence=att_seq,
+                                        transformed_state=state)
+
+    :param name: name of the dot-product attention model.
+    :type name: basestring
+    :param softmax_param_attr: parameter attribute of the sequence softmax
+                               that is used to produce the attention weights.
+    :type softmax_param_attr: ParameterAttribute
+    :param encoded_sequence: output of the encoder.
+    :type encoded_sequence: LayerOutput
+    :param attending_sequence: the sequence to be attended. Its attention weights
+                               are computed from the dot product between the
+                               decoder's transformed hidden state of the previous
+                               time step and each element of encoded_sequence.
+    :type attending_sequence: LayerOutput
+    :param transformed_state: transformed hidden state of the decoder in the
+                              previous time step. Its size should equal that of
+                              encoded_sequence; the transformation is done outside
+                              dot_product_attention for flexibility.
+    :type transformed_state: LayerOutput
+    :return: a context vector
+    :rtype: LayerOutput
+    """
+    assert transformed_state.size == encoded_sequence.size
+
+    expanded = expand_layer(
+        input=transformed_state,
+        expanded_as=encoded_sequence,
+        name='%s_expand' % name)
+
+    m = linear_comb_layer(
+        weights=expanded,
+        vectors=encoded_sequence,
+        name='%s_dot-product' % name)
+
+    attention_weight = fc_layer(
+        input=m,
+        size=1,
+        act=SequenceSoftmaxActivation(),
+        param_attr=softmax_param_attr,
+        name="%s_softmax" % name,
+        bias_attr=False)
+
+    scaled = scaling_layer(
+        weight=attention_weight,
+        input=attending_sequence,
+        name='%s_scaling' % name)
+
+    return pooling_layer(
+        input=scaled,
+        pooling_type=SumPooling(),
+        name="%s_pooling" % name)
 
 
 def inputs(layers, *args):
     """
     Declare the inputs of network. The order of input should be as same as
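For reference, the equations in the new docstring can be checked outside Paddle with a few lines of NumPy. This is a minimal sketch and not part of the commit; the helper name dot_product_attention_np and the shapes are illustrative only.

import numpy as np

def dot_product_attention_np(encoded_sequence, attending_sequence, transformed_state):
    """Plain-NumPy version of the docstring's equations for one decoder step.

    encoded_sequence:   (T_x, d)   -- h_j
    attending_sequence: (T_x, d_z) -- z_j
    transformed_state:  (d,)       -- s_{i-1}
    Returns the context vector c_i with shape (d_z,).
    """
    # e_{i,j} = s_{i-1}^T h_j : dot product of the state with every encoder step
    scores = encoded_sequence @ transformed_state        # (T_x,)
    # a_{i,j} = softmax of the scores over the sequence dimension
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # c_i = sum_j a_{i,j} z_j : weighted sum of the attending sequence
    return weights @ attending_sequence                  # (d_z,)

# Illustrative shapes: 5 encoder steps, state size 8, attended size 16.
enc_seq = np.random.randn(5, 8)
att_seq = np.random.randn(5, 16)
state = np.random.randn(8)
print(dot_product_attention_np(enc_seq, att_seq, state).shape)  # (16,)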