<li><strong>queries</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>keys</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>values</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) – Number of attention heads used to compute the scaled
dot-product attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) – The dropout rate applied to the attention weights.
Default value is 0.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> – If any of the inputs queries, keys, or values is not a 3-D Tensor.</p>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>1. When num_heads > 1, three linear projections are learned to map the input
queries, keys and values into queries’, keys’ and values’, respectively.
queries’, keys’ and values’ have the same shapes as queries, keys
and values.</p>
<p class="last">2. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.</p>
</div>
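<p>The single-head case described in the note can be illustrated with a minimal NumPy sketch. This is not the library implementation: the function name <code>scaled_dot_product_attention_sketch</code>, the <code>[batch, length, hidden]</code> layout, and the example shapes are assumptions made for illustration, and dropout is omitted.</p>

```python
import numpy as np


def scaled_dot_product_attention_sketch(queries, keys, values):
    """Illustrative NumPy version of the num_heads == 1 case (hypothetical helper)."""
    for name, tensor in (("queries", queries), ("keys", keys), ("values", values)):
        if tensor.ndim != 3:
            raise ValueError(f"{name} must be a 3-D tensor, got {tensor.ndim}-D")

    d_k = keys.shape[-1]
    # Similarity of every query with every key: [batch, len_q, len_k].
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values: [batch, len_q, d_v].
    return weights @ values, weights


rng = np.random.default_rng(0)
q = rng.standard_normal((3, 5, 9))    # assumed shape: [batch, len_q, d_k]
k = rng.standard_normal((3, 6, 9))    # assumed shape: [batch, len_k, d_k]
v = rng.standard_normal((3, 6, 10))   # assumed shape: [batch, len_k, d_v]
context, weights = scaled_dot_product_attention_sketch(q, k, v)
print(context.shape, weights.shape)
```

<p>No parameters are created anywhere in this sketch, which matches the statement that the single-head case has no learnable parameters.</p>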
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
</pre></div>
</div>
"comment":"\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns: [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor: [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set to -1, representing that its\nsize is unknown. In this case, the real dimension will be inferred from\nthe original shape of Input(X) and other dimensions in the target shape.\n",
<li><strong>queries</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>keys</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>values</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) – Number of attention heads used to compute the scaled
dot-product attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) – The dropout rate applied to the attention weights.
Default value is 0.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> – If any of the inputs queries, keys, or values is not a 3-D Tensor.</p>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>1. When num_heads > 1, three linear projections are learned to map the input
queries, keys and values into queries’, keys’ and values’, respectively.
queries’, keys’ and values’ have the same shapes as queries, keys
and values.</p>
<p class="last">2. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.</p>
</div>
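<p>For the num_heads > 1 case, the following NumPy sketch shows one way the three shape-preserving projections could feed a multi-head computation. It is an illustration under stated assumptions, not the library's code: the function name, the random weight matrices <code>w_q</code>, <code>w_k</code>, <code>w_v</code> (standing in for learned parameters), and the way the hidden dimension is split into heads and merged again follow the usual multi-head formulation and are not taken from this page.</p>

```python
import numpy as np


def multi_head_attention_sketch(queries, keys, values, w_q, w_k, w_v, num_heads):
    """Illustrative multi-head variant (hypothetical helper, dropout omitted)."""
    # Shape-preserving linear projections: queries', keys', values'.
    q = queries @ w_q
    k = keys @ w_k
    v = values @ w_v

    def split_heads(x):
        # [batch, length, hidden] -> [batch, heads, length, hidden // heads]
        batch, length, hidden = x.shape
        return x.reshape(batch, length, num_heads, hidden // num_heads).transpose(0, 2, 1, 3)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
    # Per-head scaled dot products: [batch, heads, len_q, len_k].
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(qh.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    context = weights @ vh  # [batch, heads, len_q, d_v // heads]
    # Merge the heads back: [batch, len_q, d_v].
    batch, heads, length, depth = context.shape
    return context.transpose(0, 2, 1, 3).reshape(batch, length, heads * depth)


rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 5, 8))
keys = rng.standard_normal((3, 6, 8))
values = rng.standard_normal((3, 6, 12))
# Square matrices keep the projected shapes equal to the input shapes.
w_q = rng.standard_normal((8, 8))
w_k = rng.standard_normal((8, 8))
w_v = rng.standard_normal((12, 12))
out = multi_head_attention_sketch(queries, keys, values, w_q, w_k, w_v, num_heads=4)
print(out.shape)
```

<p>The sketch assumes that the hidden sizes of the inputs are divisible by num_heads; otherwise the reshape in <code>split_heads</code> fails.</p>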
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
</pre></div>
</div>