PaddlePaddle / Paddle
Commit d00eb53a
Authored on Jan 24, 2018
Author: ying

add linear projection to q, k and v.

Parent: 0d96899f

Showing 4 changed files with 47 additions and 14 deletions
python/paddle/v2/fluid/layers/nn.py  +13 -11
python/paddle/v2/fluid/nets.py  +31 -3
python/paddle/v2/fluid/tests/test_iou_similarity_op.py  +0 -0
python/paddle/v2/fluid/tests/test_multihead_attention.py  +3 -0
python/paddle/v2/fluid/layers/nn.py
@@ -108,16 +108,17 @@ def fc(input,
     into a 2-dimensional matrix. The parameter
     `num_flatten_dims` determines how the input tensor
     is flattened: the first `num_flatten_dims`
-    dimensions will be flatten to form the first
-    dimension of the final matrix (height of the
-    matrix), and the rest `rank(X) - num_flatten_dims`
-    dimensions are flattened to form the second
-    dimension of the final matrix (width of the matrix).
-    For example, suppose `X` is a 6-dimensional tensor
-    with a shape [2, 3, 4, 5, 6], and
-    `num_flatten_dims` = 3. Then, the flattened matrix
-    will have a shape [2 x 3 x 4, 5 x 6] = [24, 30].
-    By default, `num_flatten_dims` is set to 1.
+    (inclusive, index starts from 1) dimensions will
+    be flatten to form the first dimension of the
+    final matrix (height of the matrix), and the rest
+    `rank(X) - num_flatten_dims` dimensions are
+    flattened to form the second dimension of the
+    final matrix (width of the matrix). For example,
+    suppose `X` is a 6-dimensional tensor with a shape
+    [2, 3, 4, 5, 6], and `num_flatten_dims` = 3. Then,
+    the flattened matrix will have a shape
+    [2 x 3 x 4, 5 x 6] = [24, 30]. By default,
+    `num_flatten_dims` is set to 1.
     param_attr(ParamAttr|list): The parameter attribute for learnable
     parameters/weights of the fully connected
     layer.
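The flattening rule in this docstring can be checked with a few lines of plain Python. This is a minimal sketch, not Paddle code: `flattened_shape` is a hypothetical helper that only computes shapes. Note that the example shape in the docstring, [2, 3, 4, 5, 6], has five entries even though the text calls `X` 6-dimensional.

```python
from functools import reduce
from operator import mul


def flattened_shape(input_shape, num_flatten_dims=1):
    """Return [height, width] of the 2-D matrix described in the docstring.

    The first `num_flatten_dims` dimensions are multiplied into the height,
    the remaining dimensions into the width. Hypothetical helper, shapes only.
    """
    height = reduce(mul, input_shape[:num_flatten_dims], 1)
    width = reduce(mul, input_shape[num_flatten_dims:], 1)
    return [height, width]


print(flattened_shape([2, 3, 4, 5, 6], num_flatten_dims=3))  # [24, 30]
print(flattened_shape([2, 3, 4, 5, 6]))                      # [2, 360]
```

With the default `num_flatten_dims` of 1, only the leading (typically batch) dimension forms the height and everything else is folded into the width.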
@@ -158,6 +159,7 @@ def fc(input,
         param_shape = [
             reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1)
         ] + [size]
         w = helper.create_parameter(
             attr=param_attr, shape=param_shape, dtype=dtype, is_bias=False)
         tmp = helper.create_tmp_variable(dtype)
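The `param_shape` expression in this hunk gives the weight matrix a shape of [flattened width, `size`]. The sketch below reuses that expression in a standalone function; `fc_param_shape` and the sample values are hypothetical. The hunk shows no import for `reduce`, which is a builtin on Python 2; on Python 3 it has to come from `functools`.

```python
from functools import reduce  # a builtin on Python 2; needs this import on Python 3


def fc_param_shape(input_shape, num_flatten_dims, size):
    """Weight shape [flattened_width, size], using the expression from the diff."""
    return [
        reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1)
    ] + [size]


# Hypothetical values chosen for illustration only.
input_shape = [2, 3, 4, 5, 6]
param_shape = fc_param_shape(input_shape, num_flatten_dims=3, size=128)
print(param_shape)  # [30, 128]

# The initial value 1 keeps the product well defined for an empty slice.
print(fc_param_shape(input_shape, num_flatten_dims=5, size=128))  # [1, 128]
```

A [24, 30] flattened input multiplied by a [30, 128] weight yields a [24, 128] output, which is why the first entry of `param_shape` must equal the flattened width.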
@@ -747,7 +749,7 @@ def square_error_cost(input, label, **kwargs):
     This layer accepts input predictions and target label and returns the
     squared error cost.
     For predictions, :math:`X`, and target labels, :math:`Y`, the equation is:
     .. math::
python/paddle/v2/fluid/nets.py
@@ -197,15 +197,27 @@ def scaled_dot_product_attention(queries,
         Variable: A 3-D Tensor computed by multi-head scaled dot product
                   attention.
+    Raises:
+        ValueError: If input queries, keys, values are not 3-D Tensors.
+    NOTE:
+        1. When num_heads > 1, three linear projections are learned respectively
+           to map input queries, keys and values into queries', keys' and values'.
+           queries', keys' and values' have the same shapes with queries, keys
+           and values.
+        1. When num_heads == 1, scaled_dot_product_attention has no learnable
+           parameters.
     Examples:
         .. code-block:: python
             # Suppose q, k, v are Tensors with the following shape:
             # q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]
-            contexts = fluid.nets.dot_product_attention(q, k, v)
-            out.shape  # [3, 5, 10]
-            attn_scores.shape  # [3, 5, 6]
+            contexts = fluid.nets.scaled_dot_product_attention(q, k, v)
+            contexts.shape  # [3, 5, 10]
     """
     if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
         raise ValueError(
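The shapes quoted in the docstring example can be reproduced with a small pure-Python reference. This is a sketch of single-head scaled dot-product attention under the usual 1/sqrt(d_k) scaling, which is an assumption here because the scaling step is not visible in the diff. The helpers `softmax`, `attention_one_batch` and `random_tensor` are hypothetical and operate on nested lists rather than Paddle Variables.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_one_batch(q, k, v):
    """Single-head scaled dot-product attention for one batch item.

    q: [len_q, d_k], k: [len_k, d_k], v: [len_k, d_v] as nested lists.
    Returns (context [len_q, d_v], weights [len_q, len_k]).
    """
    scale = 1.0 / math.sqrt(len(q[0]))  # assumed 1/sqrt(d_k) scaling
    weights = [
        softmax([scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k])
        for q_row in q
    ]
    context = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(len(v[0]))]
        for w_row in weights
    ]
    return context, weights


def random_tensor(batch, length, dim, rng):
    """Nested list of shape [batch, length, dim] with random entries."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(length)]
            for _ in range(batch)]


rng = random.Random(0)
q = random_tensor(3, 5, 9, rng)
k = random_tensor(3, 6, 9, rng)
v = random_tensor(3, 6, 10, rng)

results = [attention_one_batch(qb, kb, vb) for qb, kb, vb in zip(q, k, v)]
contexts = [c for c, _ in results]
weights = [w for _, w in results]

print(len(contexts), len(contexts[0]), len(contexts[0][0]))  # 3 5 10
print(len(weights), len(weights[0]), len(weights[0][0]))     # 3 5 6
```

The [3, 5, 6] attention weights correspond to the `attn_scores.shape` line that this commit drops from the docstring example; the example now shows only the [3, 5, 10] context tensor.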
@@ -228,6 +240,22 @@ def scaled_dot_product_attention(queries,
                          (values.shape[-1], num_heads))
     def __compute_qkv(queries, keys, values, num_heads):
+        """
+        Add linear projection to queries, keys, and values.
+        Args:
+            queries(Tensor): a 3-D input Tensor.
+            keys(Tensor): a 3-D input Tensor.
+            values(Tensor): a 3-D input Tensor.
+            num_heads(int): The number of heads. Linearly project the inputs
+                            ONLY when num_heads > 1.
+        Returns:
+            Tensor: linearly projected output Tensors: queries', keys' and
+                    values'. They have the same shapes with queries, keys and
+                    values.
+        """
         if num_heads == 1:
             return queries, keys, values
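The contract documented for `__compute_qkv` (pass the inputs through when `num_heads == 1`, otherwise apply shape-preserving linear projections) can be illustrated without Paddle. The sketch below is an assumption-laden stand-in: `compute_qkv`, `project`, `random_nested`, `shape`, the square weight matrices and the tensor sizes are all hypothetical, and the projection code of the real function is not shown in this diff.

```python
import random


def random_nested(rng, *dims):
    """Build a nested list of the given shape filled with random floats."""
    if len(dims) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(dims[0])]
    return [random_nested(rng, *dims[1:]) for _ in range(dims[0])]


def project(x, weight):
    """Multiply every vector in a [batch, length, d_in] nested list by weight."""
    d_out = len(weight[0])
    return [
        [[sum(vec[i] * weight[i][j] for i in range(len(vec))) for j in range(d_out)]
         for vec in seq]
        for seq in x
    ]


def compute_qkv(queries, keys, values, num_heads, weights=None):
    """Project the inputs only when num_heads > 1, as the docstring states.

    `weights` is a hypothetical (w_q, w_k, w_v) tuple of square matrices that
    stands in for the learned parameters of the real layer.
    """
    if num_heads == 1:
        return queries, keys, values
    w_q, w_k, w_v = weights
    return project(queries, w_q), project(keys, w_k), project(values, w_v)


def shape(x):
    """Shape of a 3-D nested list."""
    return [len(x), len(x[0]), len(x[0][0])]


rng = random.Random(0)
q = random_nested(rng, 2, 5, 8)
k = random_nested(rng, 2, 6, 8)
v = random_nested(rng, 2, 6, 12)
weights = (random_nested(rng, 8, 8), random_nested(rng, 8, 8),
           random_nested(rng, 12, 12))

same_q, same_k, same_v = compute_qkv(q, k, v, num_heads=1)
print(same_q is q)  # True

proj_q, proj_k, proj_v = compute_qkv(q, k, v, num_heads=4, weights=weights)
print(shape(proj_q), shape(proj_k), shape(proj_v))  # [2, 5, 8] [2, 6, 8] [2, 6, 12]
```

Square weight matrices are used here only because they are the simplest way to keep the output shapes equal to the input shapes, which is what the docstring promises.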
python/paddle/v2/fluid/tests/test_iou_similarity_op.py
File mode changed from 100755 to 100644
python/paddle/v2/fluid/tests/test_multihead_attention.py
@@ -65,6 +65,7 @@ class TestMultiheadAttention(unittest.TestCase):
         self.set_inputs(place)
         exe = fluid.Executor(place)
+        exe.run(fluid.default_startup_program())
         output = exe.run(fluid.default_main_program(),
                          feed=self.inputs,
                          fetch_list=self.fetch_list,
@@ -90,6 +91,8 @@ class TestMultiheadAttention(unittest.TestCase):
         self.set_program()
         self.run_program()
+        #fixme(caoying) add more meaningfull unittest.
 if __name__ == '__main__':
     unittest.main()