<li><strong>input</strong> (<em>Variable|list</em>) – a 2-D tensor with shape [N x D], where N is the
batch size and D is the number of classes. This input is a probability
computed by the previous operator, which is almost always the result
of a softmax operator.</li>
<li><strong>label</strong> (<em>Variable|list</em>) – the ground truth which is a 2-D tensor. When
<cite>soft_label</cite> is set to <cite>False</cite>, <cite>label</cite> is a tensor<int64> with shape
[N x 1]. When <cite>soft_label</cite> is set to <cite>True</cite>, <cite>label</cite> is a
tensor<float/double> with shape [N x D].</li>
<li><strong>soft_label</strong> (bool, via <cite>**kwargs</cite>) – a flag indicating whether to interpret
the given labels as soft labels, default <cite>False</cite>.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">A 2-D tensor with shape [N x 1], the cross entropy loss.</p>
</td>
</tr>
<trclass="field-odd field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><cite>ValueError</cite>– 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal; 2) when <cite>soft_label == True</cite>, and the 2nd dimension of <cite>input</cite> and <cite>label</cite> are not equal; 3) when <cite>soft_label == False</cite>, and the 2nd dimension of <cite>label</cite> is not 1.</p>
<trclass="field-odd field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first"><cite>ValueError</cite>– 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal.
2) when <cite>soft_label == True</cite>, and the 2nd dimension of</p>
<blockquote>
<div><p><cite>input</cite> and <cite>label</cite> are not equal.</p>
</div></blockquote>
<olclass="last arabic simple"start="3">
<li>when <cite>soft_label == False</cite>, and the 2nd dimension of
<cite>label</cite> is not 1.</li>
</ol>
</td>
</tr>
</tbody>
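<p>A minimal usage sketch (the variable names, shapes and surrounding calls below are
illustrative assumptions, not part of this layer's specification):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

# Hypothetical setup: predictions over 10 classes and matching integer labels.
predict = fluid.layers.data(name='predict', shape=[10], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
prob = fluid.layers.softmax(predict)
# Hard labels (soft_label=False, the default); the result has shape [N x 1].
cost = fluid.layers.cross_entropy(input=prob, label=label)
</pre></div></div>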
...
...
@@ -1277,8 +1286,9 @@ the given labels as soft labels, default <cite>False</cite>.</li>
<p>This layer accepts input predictions and target label and returns the squared error cost.
For predictions, <span class="math">\(X\)</span>, and target labels, <span class="math">\(Y\)</span>, the equation is:</p>
<divclass="math">
\[Out = (X - Y)^2\]</div>
<p>In the above equation:</p>
...
...
@@ -1299,7 +1309,12 @@ For predictions, <span class="math">\(X\)</span>, and target labels, <span class
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">The tensor variable storing the element-wise squared error difference of input and label.</p>
<li><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be different.</li>
<li><dlclass="first docutils">
<dt><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be</dt>
<dd>different.</dd>
</dl>
</li>
</ul>
<pclass="rubric">Example</p>
<ul>
...
...
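<p>A minimal sketch of using the squared error cost described above (variable names
and shapes are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[1], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
# Element-wise (x - y)^2.
cost = fluid.layers.square_error_cost(input=x, label=y)
</pre></div></div>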
@@ -1407,20 +1427,28 @@ library is installed. Default: True</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">The tensor variable storing the convolution and non-linearity activation result.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and groups mismatch.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and
@@ -2158,7 +2186,8 @@ are in NCHW format. Where N is batch size, C is the number of channels,
H is the height of the feature, and W is the width of the feature.
Parameters (dilations, strides, paddings) each contain two elements, which
represent height and width, respectively. For details of the convolution transpose
layer, please refer to the following explanation and the references <a class="reference external" href="http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf">therein</a>.</p>
<li><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be different.</li>
<li><dlclass="first docutils">
<dt><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be</dt>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and groups mismatch.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– The ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be 2 or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same or the 2nd dimensions of <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– The ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong>
not be 2 or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong>
and <strong>cell_t_prev</strong> not be the same or the 2nd dimensions of
<strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same.</p>
</td>
</tr>
</tbody>
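<p>A minimal sketch of calling the LSTM unit constrained as above (the dimensions and
variable names are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

# x_t: input of the current step; hidden/cell carried over from the previous step.
x_t = fluid.layers.data(name='x_t', shape=[32], dtype='float32')
prev_hidden = fluid.layers.data(name='prev_hidden', shape=[64], dtype='float32')
prev_cell = fluid.layers.data(name='prev_cell', shape=[64], dtype='float32')
hidden_t, cell_t = fluid.layers.lstm_unit(x_t=x_t,
                                          hidden_t_prev=prev_hidden,
                                          cell_t_prev=prev_cell)
</pre></div></div>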
...
...
@@ -2706,9 +2748,9 @@ will be named automatically.</li>
<spanclass="n">fluid</span><spanclass="o">.</span><spanclass="n">layers</span><spanclass="o">.</span><spanclass="n">matmul</span><spanclass="p">(</span><spanclass="n">x</span><spanclass="p">,</span><spanclass="n">y</span><spanclass="p">)</span><spanclass="c1"># out: [B, ..., M, N]</span>
<spanclass="c1"># x: [B, M, K], y: [B, K, N]</span>
<spanclass="n">fluid</span><spanclass="o">.</span><spanclass="n">layers</span><spanclass="o">.</span><spanclass="n">matmul</span><spanclass="p">(</span><spanclass="n">x</span><spanclass="p">,</span><spanclass="n">y</span><spanclass="p">)</span><spanclass="c1"># out: [B, M, N]</span>
<spanclass="c1"># x: [B, M, K], y: [K, N]</span>
<spanclass="n">fluid</span><spanclass="o">.</span><spanclass="n">layers</span><spanclass="o">.</span><spanclass="n">matmul</span><spanclass="p">(</span><spanclass="n">x</span><spanclass="p">,</span><spanclass="n">y</span><spanclass="p">)</span><spanclass="c1"># out: [B, M, N]</span>
<li><strong>input</strong> (<em>Variable</em>) – (LoDTensor<float>), the probabilities of variable-length sequences, which is a 2-D Tensor with LoD information. Its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences’ length and num_classes is the true number of classes (not including the blank label).</li>
<li><strong>blank</strong> (<em>int</em>) – the blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-open interval [0, num_classes + 1).</li>
</ul>
</td>
</tr>
...
...
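<p>A minimal sketch of feeding the CTC inputs described above (the shapes, names and
data layout are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

# 8 real classes plus one blank label; both inputs carry LoD information.
probs = fluid.layers.data(name='probs', shape=[9], dtype='float32', lod_level=1)
label = fluid.layers.data(name='label', shape=[1], dtype='int32', lod_level=1)
cost = fluid.layers.warpctc(input=probs, label=label, blank=0)
</pre></div></div>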
@@ -3609,7 +3669,7 @@ will be named automatically.</li>
<li><strong>queries</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>keys</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>values</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) – Head number to compute the scaled dot product
attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) – The dropout rate to drop the attention weight.
Default value is 0.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">The Tensor variables representing the output and attention scores.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If input queries, keys, values are not 3-D Tensors.</p>
</td>
</tr>
</tbody>
</table>
<divclass="admonition note">
<pclass="first admonition-title">Note</p>
<p>1. When num_heads > 1, three linear projections are learned respectively
to map input queries, keys and values into queries’, keys’ and values’.
queries’, keys’ and values’ have the same shapes as queries, keys
and values.</p>
<pclass="last">1. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.</p>
</div>
<pclass="rubric">Examples</p>
<divclass="highlight-python"><divclass="highlight"><pre><span></span><spanclass="c1"># Suppose q, k, v are tensor variables with the following shape:</span>
<divclass="highlight-python"><divclass="highlight"><pre><span></span><spanclass="c1"># Suppose q, k, v are Tensors with the following shape:</span>
"comment":"\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns\n\n [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor:\n\n [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set -1, representing that its\nsize is unknown. In this case, the real dimension will be infered from \nthe original shape of Input(X) and other dimensions in the target shape.\n",
"comment":"\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns : [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor: [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set -1, representing that its\nsize is unknown. In this case, the real dimension will be infered from \nthe original shape of Input(X) and other dimensions in the target shape.\n",
<li><strong>input</strong> (<em>Variable|list</em>) – a 2-D tensor with shape [N x D], where N is the
batch size and D is the number of classes. This input is a probability
computed by the previous operator, which is almost always the result
of a softmax operator.</li>
<li><strong>label</strong> (<em>Variable|list</em>) – the ground truth which is a 2-D tensor. When
<cite>soft_label</cite> is set to <cite>False</cite>, <cite>label</cite> is a tensor<int64> with shape
[N x 1]. When <cite>soft_label</cite> is set to <cite>True</cite>, <cite>label</cite> is a
tensor<float/double> with shape [N x D].</li>
<li><strong>soft_label</strong> (bool, via <cite>**kwargs</cite>) – a flag indicating whether to interpret
the given labels as soft labels, default <cite>False</cite>.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">A 2-D tensor with shape [N x 1], the cross entropy loss.</p>
</td>
</tr>
<trclass="field-odd field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><cite>ValueError</cite>– 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal; 2) when <cite>soft_label == True</cite>, and the 2nd dimension of <cite>input</cite> and <cite>label</cite> are not equal; 3) when <cite>soft_label == False</cite>, and the 2nd dimension of <cite>label</cite> is not 1.</p>
<trclass="field-odd field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first"><cite>ValueError</cite>– 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal.
2) when <cite>soft_label == True</cite>, and the 2nd dimension of</p>
<blockquote>
<div><p><cite>input</cite> and <cite>label</cite> are not equal.</p>
</div></blockquote>
<olclass="last arabic simple"start="3">
<li>when <cite>soft_label == False</cite>, and the 2nd dimension of
<cite>label</cite> is not 1.</li>
</ol>
</td>
</tr>
</tbody>
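<p>A minimal usage sketch (the variable names, shapes and surrounding calls below are
illustrative assumptions, not part of this layer's specification):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

# Hypothetical setup: predictions over 10 classes and matching integer labels.
predict = fluid.layers.data(name='predict', shape=[10], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
prob = fluid.layers.softmax(predict)
# Hard labels (soft_label=False, the default); the result has shape [N x 1].
cost = fluid.layers.cross_entropy(input=prob, label=label)
</pre></div></div>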
...
...
@@ -1296,8 +1305,9 @@ the given labels as soft labels, default <cite>False</cite>.</li>
<p>This layer accepts input predictions and target label and returns the squared error cost.
For predictions, <span class="math">\(X\)</span>, and target labels, <span class="math">\(Y\)</span>, the equation is:</p>
<divclass="math">
\[Out = (X - Y)^2\]</div>
<p>In the above equation:</p>
...
...
@@ -1318,7 +1328,12 @@ For predictions, <span class="math">\(X\)</span>, and target labels, <span class
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">The tensor variable storing the element-wise squared error difference of input and label.</p>
<li><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be different.</li>
<li><dlclass="first docutils">
<dt><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be</dt>
<dd>different.</dd>
</dl>
</li>
</ul>
<pclass="rubric">Example</p>
<ul>
...
...
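<p>A minimal sketch of using the squared error cost described above (variable names
and shapes are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[1], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
# Element-wise (x - y)^2.
cost = fluid.layers.square_error_cost(input=x, label=y)
</pre></div></div>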
@@ -1426,20 +1446,28 @@ library is installed. Default: True</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">The tensor variable storing the convolution and non-linearity activation result.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and groups mismatch.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and
@@ -2177,7 +2205,8 @@ are in NCHW format. Where N is batch size, C is the number of channels,
H is the height of the feature, and W is the width of the feature.
Parameters (dilations, strides, paddings) each contain two elements, which
represent height and width, respectively. For details of the convolution transpose
layer, please refer to the following explanation and the references <a class="reference external" href="http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf">therein</a>.</p>
<li><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be different.</li>
<li><dlclass="first docutils">
<dt><spanclass="math">\(Out\)</span>: Output value, the shape of <spanclass="math">\(Out\)</span> and <spanclass="math">\(X\)</span> may be</dt>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and groups mismatch.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If the shapes of input, filter_size, stride, padding and
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– The ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be 2 or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same or the 2nd dimensions of <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– The ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong>
not be 2 or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong>
and <strong>cell_t_prev</strong> not be the same or the 2nd dimensions of
<strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same.</p>
</td>
</tr>
</tbody>
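<p>A minimal sketch of calling the LSTM unit constrained as above (the dimensions and
variable names are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

# x_t: input of the current step; hidden/cell carried over from the previous step.
x_t = fluid.layers.data(name='x_t', shape=[32], dtype='float32')
prev_hidden = fluid.layers.data(name='prev_hidden', shape=[64], dtype='float32')
prev_cell = fluid.layers.data(name='prev_cell', shape=[64], dtype='float32')
hidden_t, cell_t = fluid.layers.lstm_unit(x_t=x_t,
                                          hidden_t_prev=prev_hidden,
                                          cell_t_prev=prev_cell)
</pre></div></div>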
...
...
@@ -2725,9 +2767,9 @@ will be named automatically.</li>
<spanclass="n">fluid</span><spanclass="o">.</span><spanclass="n">layers</span><spanclass="o">.</span><spanclass="n">matmul</span><spanclass="p">(</span><spanclass="n">x</span><spanclass="p">,</span><spanclass="n">y</span><spanclass="p">)</span><spanclass="c1"># out: [B, ..., M, N]</span>
<spanclass="c1"># x: [B, M, K], y: [B, K, N]</span>
<spanclass="n">fluid</span><spanclass="o">.</span><spanclass="n">layers</span><spanclass="o">.</span><spanclass="n">matmul</span><spanclass="p">(</span><spanclass="n">x</span><spanclass="p">,</span><spanclass="n">y</span><spanclass="p">)</span><spanclass="c1"># out: [B, M, N]</span>
<spanclass="c1"># x: [B, M, K], y: [K, N]</span>
<spanclass="n">fluid</span><spanclass="o">.</span><spanclass="n">layers</span><spanclass="o">.</span><spanclass="n">matmul</span><spanclass="p">(</span><spanclass="n">x</span><spanclass="p">,</span><spanclass="n">y</span><spanclass="p">)</span><spanclass="c1"># out: [B, M, N]</span>
<li><strong>input</strong> (<em>Variable</em>) – (LoDTensor<float>), the probabilities of variable-length sequences, which is a 2-D Tensor with LoD information. Its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences’ length and num_classes is the true number of classes (not including the blank label).</li>
<li><strong>blank</strong> (<em>int</em>) – the blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-open interval [0, num_classes + 1).</li>
</ul>
</td>
</tr>
...
...
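<p>A minimal sketch of feeding the CTC inputs described above (the shapes, names and
data layout are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>import paddle.fluid as fluid

# 8 real classes plus one blank label; both inputs carry LoD information.
probs = fluid.layers.data(name='probs', shape=[9], dtype='float32', lod_level=1)
label = fluid.layers.data(name='label', shape=[1], dtype='int32', lod_level=1)
cost = fluid.layers.warpctc(input=probs, label=label, blank=0)
</pre></div></div>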
@@ -3628,7 +3688,7 @@ will be named automatically.</li>
<li><strong>queries</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>keys</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>values</strong> (<em>Variable</em>) – The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) – Head number to compute the scaled dot product
attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) – The dropout rate to drop the attention weight.
Default value is 0.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">The Tensor variables representing the output and attention scores.</p>
<trclass="field-even field"><thclass="field-name">Raises:</th><tdclass="field-body"><pclass="first last"><codeclass="xref py py-exc docutils literal"><spanclass="pre">ValueError</span></code>– If input queries, keys, values are not 3-D Tensors.</p>
</td>
</tr>
</tbody>
</table>
<divclass="admonition note">
<pclass="first admonition-title">注解</p>
<p>1. When num_heads > 1, three linear projections are learned respectively
to map input queries, keys and values into queries’, keys’ and values’.
queries’, keys’ and values’ have the same shapes as queries, keys
and values.</p>
<pclass="last">1. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.</p>
</div>
<pclass="rubric">Examples</p>
<divclass="highlight-python"><divclass="highlight"><pre><span></span><spanclass="c1"># Suppose q, k, v are tensor variables with the following shape:</span>
<divclass="highlight-python"><divclass="highlight"><pre><span></span><spanclass="c1"># Suppose q, k, v are Tensors with the following shape:</span>