PaddlePaddle / Paddle
Commit 78320194
Authored on October 17, 2017 by ranqiu
refine dot-product attention according to the comments
Parent: 4545a058
Showing 1 changed file with 22 additions and 19 deletions
python/paddle/trainer_config_helpers/networks.py (+22 -19)
...
...
@@ -1400,13 +1400,13 @@ def simple_attention(encoded_sequence,
 @wrap_name_default()
 def dot_product_attention(encoded_sequence,
-                          attending_sequence,
+                          attended_sequence,
                           transformed_state,
                           softmax_param_attr=None,
                           name=None):
     """
     Calculate and return a context vector with dot-product attention mechanism.
-    Size of the context vector equals to size of the attending_sequence.
+    The dimension of the context vector equals to that of the attended_sequence.

     .. math::
...
...
@@ -1419,35 +1419,38 @@ def dot_product_attention(encoded_sequence,
         c_{i} & = \\sum_{j=1}^{T_{x}}a_{i,j}z_{j}

     where :math:`h_{j}` is the jth element of encoded_sequence,
-    :math:`z_{j}` is the jth element of attending_sequence,
-    :math:`s_{i-1}` is transformed_state
+    :math:`z_{j}` is the jth element of attended_sequence,
+    :math:`s_{i-1}` is transformed_state.

     The example usage is:

     .. code-block:: python

         context = dot_product_attention(encoded_sequence=enc_seq,
-                                        attending_sequence=att_seq,
+                                        attended_sequence=att_seq,
                                         transformed_state=state,)

-    :param name: name of the dot-product attention model.
+    :param name: A prefix attached to the name of each layer that defined inside
+                 the dot_product_attention.
     :type name: basestring
-    :param softmax_param_attr: parameter attribute of sequence softmax
+    :param softmax_param_attr: The parameter attribute of sequence softmax
                                that is used to produce attention weight.
     :type softmax_param_attr: ParameterAttribute
-    :param encoded_sequence: output of the encoder
+    :param encoded_sequence: The output hidden vectors of the encoder.
     :type encoded_sequence: LayerOutput
-    :param attending_sequence: attention weight is computed by a feed forward neural
-                               network which has two inputs : decoder's transformed
-                               hidden state of previous time step and encoder's output.
-                               attending_sequence is the sequence to be attended.
-    :type attending_sequence: LayerOutput
-    :param transformed_state: transformed hidden state of decoder in previous time step,
-                              its size should equal to encoded_sequence's. Here we do the
-                              transformation outside dot_product_attention for flexibility
-                              consideration.
+    :param attended_sequence: The attention weight is computed by a feed forward neural
+                              network which has two inputs : decoder's transformed hidden
+                              state of previous time step and encoder's output.
+                              attended_sequence is the sequence to be attended.
+    :type attended_sequence: LayerOutput
+    :param transformed_state: The transformed hidden state of decoder in previous time step.
+                              Since the dot-product operation will be performed on it and the
+                              encoded_sequence, their dimensions must be equal. For flexibility,
+                              we suppose transformations of the decoder's hidden state have been
+                              done outside dot_product_attention and no more will be performed
+                              inside. Then users can use either the original or transformed one.
     :type transformed_state: LayerOutput
-    :return: a context vector
+    :return: The context vector.
     :rtype: LayerOutput
     """
     assert transformed_state.size == encoded_sequence.size
...
...
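The computation described by the docstring in this hunk can be sketched in plain NumPy. This is only an illustration of the math, not Paddle's implementation: the helper name `dot_product_attention_np`, the dense array shapes, and the sample values are assumptions, and the sketch ignores `softmax_param_attr` and any layer naming.

```python
import numpy as np


def dot_product_attention_np(encoded_sequence, attended_sequence, transformed_state):
    """NumPy sketch of the docstring's math (hypothetical helper, not a Paddle API).

    encoded_sequence:  (T, d) array, the h_j
    attended_sequence: (T, m) array, the z_j
    transformed_state: (d,) array, s_{i-1}
    """
    encoded_sequence = np.asarray(encoded_sequence, dtype=float)
    attended_sequence = np.asarray(attended_sequence, dtype=float)
    transformed_state = np.asarray(transformed_state, dtype=float)

    # Mirrors the assert in the diff: the dot product needs equal dimensions.
    assert transformed_state.shape[-1] == encoded_sequence.shape[-1]

    # One score per sequence element: the dot product of s_{i-1} with each h_j.
    scores = encoded_sequence @ transformed_state
    # Softmax over the sequence, shifted by the max for numerical stability.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # c_i = sum_j a_{i,j} z_j, so the result has the dimension of z_j.
    return weights @ attended_sequence


# Assumed toy inputs: T = 3 elements, d = 2, m = 3.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
s = np.array([0.5, -0.5])
context = dot_product_attention_np(h, z, s)
print(context.shape)  # (3,)
```

The context vector takes its dimension from `attended_sequence`, while `encoded_sequence` only contributes to the weights. That is the distinction the renamed parameter and the reworded docstring line are trying to make explicit.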
@@ -1470,7 +1473,7 @@ def dot_product_attention(encoded_sequence,
     scaled = scaling_layer(
         weight=attention_weight,
-        input=attending_sequence,
+        input=attended_sequence,
         name='%s_scaling' % name)

     return pooling_layer(
...
...
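Because the commit renames a keyword parameter, callers that pass `attending_sequence=` by name would need updating. One possible way to ease such a rename is a small forwarding decorator. The sketch below is hypothetical and is not part of this commit or of Paddle: `accept_renamed_kwarg` and `dot_product_attention_stub` are made-up names, and the stub only echoes its arguments so the example runs without Paddle installed.

```python
import functools
import warnings


def accept_renamed_kwarg(old_name, new_name):
    """Decorator sketch: forward a deprecated keyword argument to its new name."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        "pass only one of %r and %r" % (old_name, new_name))
                warnings.warn(
                    "%r is deprecated, use %r" % (old_name, new_name),
                    DeprecationWarning,
                    stacklevel=2)
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Stand-in with the post-commit parameter names; it only echoes its arguments.
@accept_renamed_kwarg("attending_sequence", "attended_sequence")
def dot_product_attention_stub(encoded_sequence, attended_sequence,
                               transformed_state):
    return (encoded_sequence, attended_sequence, transformed_state)


# An old-style call still works and is routed to the new parameter.
result = dot_product_attention_stub(
    encoded_sequence="enc", attending_sequence="att", transformed_state="state")
```

Whether such a shim is worthwhile depends on how many existing configs call the function by keyword. The commit itself simply renames the parameter.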