Unverified commit 65494051, authored by liu zhengxi, committed by GitHub

Transfer MultiHeadAttention's matmul to v2 op (#36222)

* promote to v2

* alter
Parent 37f43ebc
@@ -402,9 +402,8 @@ class MultiHeadAttention(Layer):
         q, k, v, cache = self._prepare_qkv(query, key, value, cache)
         # scale dot product attention
-        # TODO(guosheng): use tensor.matmul, however it doesn't support `alpha`
-        product = layers.matmul(
-            x=q, y=k, transpose_y=True, alpha=self.head_dim**-0.5)
+        product = paddle.matmul(
+            x=q * (self.head_dim**-0.5), y=k, transpose_y=True)
         if attn_mask is not None:
             # Support bool or int mask
             attn_mask = _convert_attention_mask(attn_mask, product.dtype)
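The two formulations are mathematically equivalent: because matrix multiplication is linear, scaling `q` by `head_dim**-0.5` before the product gives the same attention scores as multiplying the product by that factor afterwards (the role of the old `alpha` argument). A minimal sketch of the equivalence, using NumPy arrays as a stand-in for Paddle tensors (shapes and names here are illustrative, not taken from the patch):

```python
import numpy as np

rng = np.random.default_rng(0)
head_dim = 64
# (batch, num_heads, seq_len, head_dim), as in multi-head attention
q = rng.standard_normal((2, 4, 8, head_dim))
k = rng.standard_normal((2, 4, 8, head_dim))

k_t = np.swapaxes(k, -1, -2)  # transpose_y=True transposes the last two dims

# Old style: plain matmul, then scale the product by alpha = head_dim**-0.5
old_product = np.matmul(q, k_t) * head_dim**-0.5

# New style: scale q first, then a plain matmul (all paddle.matmul supports)
new_product = np.matmul(q * head_dim**-0.5, k_t)

assert np.allclose(old_product, new_product)
```

Folding the scale into `q` is also slightly cheaper: `q` has `seq_len * head_dim` elements per head, while the score matrix has `seq_len * seq_len`, so for long sequences fewer elementwise multiplies are needed.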