Crayon鑫 / Paddle
Forked from PaddlePaddle / Paddle
Commit 78320194
Authored on Oct 17, 2017
Author: ranqiu
refine dot-product attention according to the comments
Parent: 4545a058
Showing 1 changed file with 22 additions and 19 deletions
python/paddle/trainer_config_helpers/networks.py
@@ -1400,13 +1400,13 @@ def simple_attention(encoded_sequence,
 
 @wrap_name_default()
 def dot_product_attention(encoded_sequence,
-                          attending_sequence,
+                          attended_sequence,
                           transformed_state,
                           softmax_param_attr=None,
                           name=None):
     """
     Calculate and return a context vector with dot-product attention mechanism.
-    Size of the context vector equals to size of the attending_sequence.
+    The dimension of the context vector equals to that of the attended_sequence.
 
     .. math::
@@ -1419,35 +1419,38 @@ def dot_product_attention(encoded_sequence,
         c_{i} & = \\sum_{j=1}^{T_{x}}a_{i,j}z_{j}
 
     where :math:`h_{j}` is the jth element of encoded_sequence,
-    :math:`z_{j}` is the jth element of attending_sequence,
-    :math:`s_{i-1}` is transformed_state
+    :math:`z_{j}` is the jth element of attended_sequence,
+    :math:`s_{i-1}` is transformed_state.
 
     The example usage is:
 
     .. code-block:: python
 
         context = dot_product_attention(encoded_sequence=enc_seq,
-                                        attending_sequence=att_seq,
+                                        attended_sequence=att_seq,
                                         transformed_state=state,)
 
-    :param name: name of the dot-product attention model.
+    :param name: A prefix attached to the name of each layer that defined inside
+                 the dot_product_attention.
     :type name: basestring
-    :param softmax_param_attr: parameter attribute of sequence softmax
+    :param softmax_param_attr: The parameter attribute of sequence softmax
                                that is used to produce attention weight.
     :type softmax_param_attr: ParameterAttribute
-    :param encoded_sequence: output of the encoder
+    :param encoded_sequence: The output hidden vectors of the encoder.
     :type encoded_sequence: LayerOutput
-    :param attending_sequence: attention weight is computed by a feed forward neural
-                               network which has two inputs : decoder's transformed
-                               hidden state of previous time step and encoder's output.
-                               attending_sequence is the sequence to be attended.
-    :type attending_sequence: LayerOutput
-    :param transformed_state: transformed hidden state of decoder in previous time step,
-                              its size should equal to encoded_sequence's. Here we do the
-                              transformation outside dot_product_attention for flexibility
-                              consideration.
+    :param attended_sequence: The attention weight is computed by a feed forward neural
+                              network which has two inputs : decoder's transformed hidden
+                              state of previous time step and encoder's output.
+                              attended_sequence is the sequence to be attended.
+    :type attended_sequence: LayerOutput
+    :param transformed_state: The transformed hidden state of decoder in previous time step.
+                              Since the dot-product operation will be performed on it and the
+                              encoded_sequence, their dimensions must be equal. For flexibility,
+                              we suppose transformations of the decoder's hidden state have been
+                              done outside dot_product_attention and no more will be performed
+                              inside. Then users can use either the original or transformed one.
     :type transformed_state: LayerOutput
-    :return: a context vector
+    :return: The context vector.
     :rtype: LayerOutput
     """
 
     assert transformed_state.size == encoded_sequence.size
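
Note on the docstring in this hunk: the computation it describes can be sketched in plain Python. This is a minimal illustration, not the Paddle implementation. It assumes the score for each position is the plain dot product of `transformed_state` with the corresponding element of `encoded_sequence`, followed by a softmax over the sequence; the elided parts of the docstring and whatever `softmax_param_attr` configures are not reproduced. The function below is a hypothetical stand-in that operates on Python lists rather than `LayerOutput` objects.

```python
import math


def dot_product_attention(encoded_sequence, attended_sequence, transformed_state):
    """Return (context, weights), where context = sum_j a_j * z_j.

    encoded_sequence:  list of T vectors h_j, each of dimension d
    attended_sequence: list of T vectors z_j, each of dimension d_z
    transformed_state: a single vector s of dimension d
    """
    # Mirrors the assert in the diff: a dot product needs equal dimensions.
    assert all(len(h) == len(transformed_state) for h in encoded_sequence)
    assert len(encoded_sequence) == len(attended_sequence)

    # Unnormalized scores: e_j = s . h_j (assumption: no learned scaling).
    scores = [sum(s_k * h_k for s_k, h_k in zip(transformed_state, h))
              for h in encoded_sequence]

    # Softmax over the sequence, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # Context vector: weighted sum of the attended elements.
    dim = len(attended_sequence[0])
    context = [sum(a * z[k] for a, z in zip(weights, attended_sequence))
               for k in range(dim)]
    return context, weights
```

The context vector takes its dimension from `attended_sequence`, which is what the revised docstring states, while `transformed_state` only has to match `encoded_sequence`.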
@@ -1470,7 +1473,7 @@ def dot_product_attention(encoded_sequence,
 
     scaled = scaling_layer(
         weight=attention_weight,
-        input=attending_sequence,
+        input=attended_sequence,
         name='%s_scaling' % name)
 
     return pooling_layer(
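
Note on the rename: because `attending_sequence` becomes `attended_sequence` in the signature, any caller that passes the argument by keyword under the old name would presumably fail after this commit, while positional callers would be unaffected. The sketch below illustrates this with a hypothetical stub that only copies the post-commit parameter list; it is not the Paddle function and does not model the `@wrap_name_default()` decorator.

```python
def dot_product_attention(encoded_sequence,
                          attended_sequence,
                          transformed_state,
                          softmax_param_attr=None,
                          name=None):
    # Hypothetical stub: same parameter names as the post-commit signature,
    # but the body just echoes its inputs instead of building layers.
    return (encoded_sequence, attended_sequence, transformed_state)


def call_with_old_keyword():
    """Call the stub using the pre-commit keyword name."""
    try:
        dot_product_attention(encoded_sequence="enc_seq",
                              attending_sequence="att_seq",  # old name
                              transformed_state="state")
    except TypeError:
        # Python rejects keyword arguments the function does not declare.
        return "TypeError"
    return "ok"


def call_positionally():
    """Positional calls do not depend on the parameter name."""
    return dot_product_attention("enc_seq", "att_seq", "state")
```

Updating such call sites to `attended_sequence=` is the only change the rename asks of callers.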