Commit f12f61d5 in PaddlePaddle/Paddle
Authored Oct 17, 2017 by Cao Ying
Committed by GitHub on Oct 17, 2017
Merge pull request #4674 from ranqiu92/attention
add config helper for dot-product attention.
Parents: 064c3695, 7ad15259
Showing 2 changed files with 91 additions and 2 deletions
doc/api/v2/config/networks.rst (+5 -0)
python/paddle/trainer_config_helpers/networks.py (+86 -2)
doc/api/v2/config/networks.rst
@@ -125,3 +125,8 @@ simple_attention
     :members: simple_attention
     :noindex:

+dot_product_attention
+---------------------
+.. automodule:: paddle.v2.networks
+    :members: dot_product_attention
+    :noindex:
python/paddle/trainer_config_helpers/networks.py
@@ -26,8 +26,9 @@ __all__ = [
     'sequence_conv_pool', 'simple_lstm', "simple_img_conv_pool",
     "img_conv_bn_pool", 'lstmemory_group', 'lstmemory_unit', 'small_vgg',
     'img_conv_group', 'vgg_16_network', 'gru_unit', 'gru_group', 'simple_gru',
-    'simple_attention', 'simple_gru2', 'bidirectional_gru', 'text_conv_pool',
-    'bidirectional_lstm', 'inputs', 'outputs'
+    'simple_attention', 'dot_product_attention', 'simple_gru2',
+    'bidirectional_gru', 'text_conv_pool', 'bidirectional_lstm', 'inputs',
+    'outputs'
 ]

 ######################################################
...
@@ -1361,6 +1362,7 @@ def simple_attention(encoded_sequence,
                                  compute attention weight.
     :type transform_param_attr: ParameterAttribute
     :return: a context vector
+    :rtype: LayerOutput
     """
     assert encoded_proj.size == decoder_state.size
     proj_size = encoded_proj.size
...
@@ -1396,6 +1398,88 @@ def simple_attention(encoded_sequence,
         input=scaled, pooling_type=SumPooling(), name="%s_pooling" % name)


+@wrap_name_default()
+def dot_product_attention(encoded_sequence,
+                          attended_sequence,
+                          transformed_state,
+                          softmax_param_attr=None,
+                          name=None):
+    """
+    Calculate and return a context vector with the dot-product attention
+    mechanism. The dimension of the context vector equals that of the
+    attended_sequence.
+
+    .. math::
+
+        a(s_{i-1},h_{j}) & = s_{i-1}^\mathrm{T} h_{j}
+
+        e_{i,j} & = a(s_{i-1}, h_{j})
+
+        a_{i,j} & = \\frac{exp(e_{i,j})}{\\sum_{k=1}^{T_x}{exp(e_{i,k})}}
+
+        c_{i} & = \\sum_{j=1}^{T_{x}}a_{i,j}z_{j}
+
+    where :math:`h_{j}` is the jth element of encoded_sequence,
+    :math:`z_{j}` is the jth element of attended_sequence, and
+    :math:`s_{i-1}` is transformed_state.
+
+    The example usage is:
+
+    .. code-block:: python
+
+        context = dot_product_attention(encoded_sequence=enc_seq,
+                                        attended_sequence=att_seq,
+                                        transformed_state=state)
+
+    :param name: A prefix attached to the name of each layer that is defined
+                 inside dot_product_attention.
+    :type name: basestring
+    :param softmax_param_attr: The parameter attribute of the sequence softmax
+                               that is used to produce the attention weights.
+    :type softmax_param_attr: ParameterAttribute
+    :param encoded_sequence: The output hidden vectors of the encoder.
+    :type encoded_sequence: LayerOutput
+    :param attended_sequence: The sequence to be attended. The attention
+                              weights, computed from transformed_state and
+                              encoded_sequence, are applied to its elements
+                              to form the context vector.
+    :type attended_sequence: LayerOutput
+    :param transformed_state: The transformed hidden state of the decoder at
+                              the previous time step. Because the dot product
+                              is taken between it and encoded_sequence, their
+                              dimensions must be equal. For flexibility, any
+                              transformation of the decoder's hidden state is
+                              assumed to be done outside dot_product_attention
+                              and none is performed inside, so users can pass
+                              either the original or a transformed state.
+    :type transformed_state: LayerOutput
+    :return: The context vector.
+    :rtype: LayerOutput
+    """
+    assert transformed_state.size == encoded_sequence.size
+
+    expanded = expand_layer(
+        input=transformed_state,
+        expanded_as=encoded_sequence,
+        name='%s_expand' % name)
+
+    m = linear_comb_layer(
+        weights=expanded,
+        vectors=encoded_sequence,
+        name='%s_dot-product' % name)
+
+    attention_weight = fc_layer(
+        input=m,
+        size=1,
+        act=SequenceSoftmaxActivation(),
+        param_attr=softmax_param_attr,
+        name="%s_softmax" % name,
+        bias_attr=False)
+
+    scaled = scaling_layer(
+        weight=attention_weight,
+        input=attended_sequence,
+        name='%s_scaling' % name)
+
+    return pooling_layer(
+        input=scaled, pooling_type=SumPooling(), name="%s_pooling" % name)
+
+
 def inputs(layers, *args):
     """
     Declare the inputs of network. The order of input should be as same as
...
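
The arithmetic in the `dot_product_attention` docstring can be illustrated for a single decoder step in plain Python. This is only a sketch under stated assumptions, not the PaddlePaddle implementation: the function name `dot_product_attention_py`, the list-of-lists inputs, and the sample values are hypothetical, and the learnable scalar weight that `fc_layer(size=1)` applies before the sequence softmax is left out.

```python
import math


def dot_product_attention_py(encoded, attended, state):
    """Hypothetical plain-Python analogue of the helper for one decoder step.

    encoded:  list of T vectors h_j, each of length d
    attended: list of T vectors z_j, each of length m
    state:    one vector s_{i-1} of length d
    """
    # Mirrors the assert in the helper: a dot product needs equal dimensions.
    assert all(len(h) == len(state) for h in encoded)
    assert len(encoded) == len(attended)

    # e_{i,j} = s_{i-1}^T h_j
    scores = [sum(s * x for s, x in zip(state, h)) for h in encoded]

    # a_{i,j} = softmax over j; subtracting the max keeps exp() from
    # overflowing and leaves the result unchanged.
    peak = max(scores)
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    weights = [v / total for v in exps]

    # c_i = sum_j a_{i,j} z_j, so the result has the dimension of attended.
    dim = len(attended[0])
    return [sum(w * z[k] for w, z in zip(weights, attended))
            for k in range(dim)]


# Illustrative data: three encoder steps, d = 2, m = 3.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
att = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# A zero state gives equal scores, hence uniform weights, so the context
# is close to the mean of the rows of att: [4.0, 5.0, 6.0].
context = dot_product_attention_py(enc, att, [0.0, 0.0])
```

The context vector has three components, matching `att` rather than `enc`, which is the property the docstring states about `attended_sequence`.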