Commit eace3e49, authored by Travis CI

Deploy to GitHub Pages: ef8cb8f6

Parent d690e184
...@@ -26,8 +26,8 @@ glu ...@@ -26,8 +26,8 @@ glu
:noindex: :noindex:
dot_product_attention scaled_dot_product_attention
--------------------- ----------------------------
.. autofunction:: paddle.v2.fluid.nets.dot_product_attention .. autofunction:: paddle.v2.fluid.nets.scaled_dot_product_attention
:noindex: :noindex:
...@@ -258,16 +258,17 @@ multidimensional tensor will first be flattened ...@@ -258,16 +258,17 @@ multidimensional tensor will first be flattened
into a 2-dimensional matrix. The parameter into a 2-dimensional matrix. The parameter
<cite>num_flatten_dims</cite> determines how the input tensor <cite>num_flatten_dims</cite> determines how the input tensor
is flattened: the first <cite>num_flatten_dims</cite> is flattened: the first <cite>num_flatten_dims</cite>
dimensions will be flattened to form the first (inclusive, index starts from 1) dimensions will
dimension of the final matrix (height of the be flattened to form the first dimension of the
matrix), and the rest <cite>rank(X) - num_flatten_dims</cite> final matrix (height of the matrix), and the rest
dimensions are flattened to form the second <cite>rank(X) - num_flatten_dims</cite> dimensions are
dimension of the final matrix (width of the matrix). flattened to form the second dimension of the
For example, suppose <cite>X</cite> is a 5-dimensional tensor final matrix (width of the matrix). For example,
with a shape [2, 3, 4, 5, 6], and suppose <cite>X</cite> is a 5-dimensional tensor with a shape
<cite>num_flatten_dims</cite> = 3. Then, the flattened matrix [2, 3, 4, 5, 6], and <cite>num_flatten_dims</cite> = 3. Then,
will have a shape [2 x 3 x 4, 5 x 6] = [24, 30]. the flattened matrix will have a shape
By default, <cite>num_flatten_dims</cite> is set to 1.</li> [2 x 3 x 4, 5 x 6] = [24, 30]. By default,
<cite>num_flatten_dims</cite> is set to 1.</li>
<li><strong>param_attr</strong> (<em>ParamAttr|list</em>) &#8211; The parameter attribute for learnable <li><strong>param_attr</strong> (<em>ParamAttr|list</em>) &#8211; The parameter attribute for learnable
parameters/weights of the fully connected parameters/weights of the fully connected
layer.</li> layer.</li>
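The flattening rule described for <cite>num_flatten_dims</cite> can be checked with a small shape calculation. This is only a sketch in plain Python, not Paddle code: the helper name `flattened_shape` and its argument validation are assumptions made for illustration.

```python
from functools import reduce
from operator import mul


def flattened_shape(shape, num_flatten_dims=1):
    """Return [height, width] of the 2-D matrix produced by flattening.

    Hypothetical helper: the first `num_flatten_dims` dimensions are
    multiplied into the height, the remaining ones into the width.
    """
    # Assumed validity range: at least one dimension on each side.
    if not 1 <= num_flatten_dims < len(shape):
        raise ValueError("num_flatten_dims must be in [1, rank - 1]")
    height = reduce(mul, shape[:num_flatten_dims], 1)
    width = reduce(mul, shape[num_flatten_dims:], 1)
    return [height, width]


# The example from the text: shape [2, 3, 4, 5, 6] with num_flatten_dims = 3.
print(flattened_shape([2, 3, 4, 5, 6], num_flatten_dims=3))  # [24, 30]
# With the default num_flatten_dims = 1 only the first dimension is kept.
print(flattened_shape([2, 3, 4, 5, 6]))  # [2, 360]
```

Note that the shape [2, 3, 4, 5, 6] has five dimensions, so `num_flatten_dims = 3` leaves two dimensions for the width.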
...@@ -858,13 +859,9 @@ Duplicable: False Optional: False</li> ...@@ -858,13 +859,9 @@ Duplicable: False Optional: False</li>
<dd><p>Reshape Operator.</p> <dd><p>Reshape Operator.</p>
<p>Reshape Input(X) into the shape specified by Attr(shape).</p> <p>Reshape Input(X) into the shape specified by Attr(shape).</p>
<p>An example: <p>An example:
Given a 2-D tensor X with 2 rows and 2 columns</p> Given a 2-D tensor X with 2 rows and 2 columns : [[1, 2], [3, 4]]</p>
<blockquote>
<div>[[1, 2], [3, 4]]</div></blockquote>
<p>and target shape = [1, 4], the reshape operator will transform <p>and target shape = [1, 4], the reshape operator will transform
the tensor X into a 2-D tensor:</p> the tensor X into a 2-D tensor: [[1, 2, 3, 4]]</p>
<blockquote>
<div>[[1, 2, 3, 4]]</div></blockquote>
<p>One dimension in the target shape can be set -1, representing that its <p>One dimension in the target shape can be set -1, representing that its
size is unknown. In this case, the real dimension will be inferred from size is unknown. In this case, the real dimension will be inferred from
the original shape of Input(X) and other dimensions in the target shape.</p> the original shape of Input(X) and other dimensions in the target shape.</p>
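The reshape example and the -1 rule above can be reproduced on nested Python lists. This is a sketch, not the operator's implementation: the helpers `infer_shape`, `flatten`, `build` and `reshape` are hypothetical names, and row-major element order is assumed.

```python
def infer_shape(num_elements, target_shape):
    """Resolve a single -1 entry in target_shape from the element count."""
    unknown = [i for i, d in enumerate(target_shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for d in target_shape:
        if d != -1:
            known *= d
    resolved = list(target_shape)
    if unknown:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        resolved[unknown[0]] = num_elements // known
    elif known != num_elements:
        raise ValueError("element count does not match target shape")
    return resolved


def flatten(nested):
    """Flatten a nested list into a flat list (row-major order)."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


def build(flat, shape):
    """Rebuild a nested list of the given (fully known) shape."""
    if len(shape) == 1:
        return flat[:shape[0]]
    step = len(flat) // shape[0]
    return [build(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def reshape(nested, target_shape):
    flat = flatten(nested)
    return build(flat, infer_shape(len(flat), target_shape))


x = [[1, 2], [3, 4]]
print(reshape(x, [1, 4]))   # [[1, 2, 3, 4]]
print(reshape(x, [-1, 1]))  # [[1], [2], [3], [4]]
```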
...@@ -1206,8 +1203,9 @@ X and Y and returns that as the output.</p> ...@@ -1206,8 +1203,9 @@ X and Y and returns that as the output.</p>
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">cross_entropy</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">cross_entropy</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Cross Entropy Layer</strong></p> <dd><p><strong>Cross Entropy Layer</strong></p>
<p>This layer computes the cross entropy between <cite>input</cite> and <cite>label</cite>. It supports <p>This layer computes the cross entropy between <cite>input</cite> and <cite>label</cite>. It
both standard cross-entropy and soft-label cross-entropy loss computation.</p> supports both standard cross-entropy and soft-label cross-entropy loss
computation.</p>
<ol class="arabic"> <ol class="arabic">
<li><dl class="first docutils"> <li><dl class="first docutils">
<dt>One-hot cross-entropy:</dt> <dt>One-hot cross-entropy:</dt>
...@@ -1243,22 +1241,33 @@ to a one-hot cross-entropy with one-hot label representation.</p> ...@@ -1243,22 +1241,33 @@ to a one-hot cross-entropy with one-hot label representation.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable|list</em>) &#8211; a 2-D tensor with shape [N x D], where N is the <li><strong>input</strong> (<em>Variable|list</em>) &#8211; a 2-D tensor with shape [N x D], where N is the
batch size and D is the number of classes. This input is a probability batch size and D is the number of classes. This
computed by the previous operator, which is almost always the result input is a probability computed by the previous
of a softmax operator.</li> operator, which is almost always the result of
a softmax operator.</li>
<li><strong>label</strong> (<em>Variable|list</em>) &#8211; the ground truth which is a 2-D tensor. When <li><strong>label</strong> (<em>Variable|list</em>) &#8211; the ground truth which is a 2-D tensor. When
<cite>soft_label</cite> is set to <cite>False</cite>, <cite>label</cite> is a tensor&lt;int64&gt; with shape <cite>soft_label</cite> is set to <cite>False</cite>, <cite>label</cite> is a
[N x 1]. When <cite>soft_label</cite> is set to <cite>True</cite>, <cite>label</cite> is a tensor&lt;int64&gt; with shape [N x 1]. When
<cite>soft_label</cite> is set to <cite>True</cite>, <cite>label</cite> is a
tensor&lt;float/double&gt; with shape [N x D].</li> tensor&lt;float/double&gt; with shape [N x D].</li>
<li><strong>soft_label</strong> (bool, via <cite>**kwargs</cite>) &#8211; a flag indicating whether to interpret <li><strong>soft_label</strong> (bool, via <cite>**kwargs</cite>) &#8211; a flag indicating whether to
the given labels as soft labels, default <cite>False</cite>.</li> interpret the given labels as soft
labels, default <cite>False</cite>.</li>
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">A 2-D tensor with shape [N x 1], the cross entropy loss.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">A 2-D tensor with shape [N x 1], the cross entropy loss.</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><cite>ValueError</cite> &#8211; 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal; 2) when <cite>soft_label == True</cite>, and the 2nd dimension of <cite>input</cite> and <cite>label</cite> are not equal; 3) when <cite>soft_label == False</cite>, and the 2nd dimension of <cite>label</cite> is not 1.</p> <tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><p class="first"><cite>ValueError</cite> &#8211; 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal.
2) when <cite>soft_label == True</cite>, and the 2nd dimension of</p>
<blockquote>
<div><p><cite>input</cite> and <cite>label</cite> are not equal.</p>
</div></blockquote>
<ol class="last arabic simple" start="3">
<li>when <cite>soft_label == False</cite>, and the 2nd dimension of
<cite>label</cite> is not 1.</li>
</ol>
</td> </td>
</tr> </tr>
</tbody> </tbody>
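The parameter and error descriptions for cross_entropy above can be illustrated with a plain-Python sketch. It is not the Paddle layer: the function below is a hypothetical stand-in that works on nested lists, and it assumes the standard definitions (negative log-probability of the labelled class for hard labels, label-weighted negative log-probabilities for soft labels).

```python
import math


def cross_entropy(inp, label, soft_label=False):
    """Return an [N x 1] nested list of per-sample cross-entropy losses.

    `inp` is [N x D] probabilities; `label` is [N x 1] class indices when
    soft_label is False, or [N x D] probabilities when soft_label is True.
    """
    if len(inp) != len(label):
        raise ValueError("1st dimension of input and label are not equal")
    losses = []
    for probs, lab in zip(inp, label):
        if soft_label:
            if len(lab) != len(probs):
                raise ValueError("2nd dimension of input and label are not equal")
            # Terms with zero label weight are skipped, so log(0) is never
            # evaluated for classes the label does not use.
            loss = -sum(l * math.log(p) for l, p in zip(lab, probs) if l > 0)
        else:
            if len(lab) != 1:
                raise ValueError("2nd dimension of label is not 1")
            loss = -math.log(probs[lab[0]])
        losses.append([loss])
    return losses


probs = [[0.25, 0.75], [0.5, 0.5]]
hard = cross_entropy(probs, [[1], [0]])
soft = cross_entropy(probs, [[0.0, 1.0], [1.0, 0.0]], soft_label=True)
print(hard)
print(soft)
```

With one-hot soft labels the two modes give the same losses, which matches the statement that soft-label cross-entropy degenerates to one-hot cross-entropy in that case.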
...@@ -1277,8 +1286,9 @@ the given labels as soft labels, default <cite>False</cite>.</li> ...@@ -1277,8 +1286,9 @@ the given labels as soft labels, default <cite>False</cite>.</li>
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">square_error_cost</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">square_error_cost</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Square error cost layer</strong></p> <dd><p><strong>Square error cost layer</strong></p>
<p>This layer accepts input predictions and target label and returns the squared error cost. <p>This layer accepts input predictions and target label and returns the
For predictions, <span class="math">\(X\)</span>, and target labels, <span class="math">\(Y\)</span>, the equation is:</p> squared error cost.</p>
<p>For predictions, <span class="math">\(X\)</span>, and target labels, <span class="math">\(Y\)</span>, the equation is:</p>
<div class="math"> <div class="math">
\[Out = (X - Y)^2\]</div> \[Out = (X - Y)^2\]</div>
<p>In the above equation:</p> <p>In the above equation:</p>
...@@ -1299,7 +1309,12 @@ For predictions, <span class="math">\(X\)</span>, and target labels, <span class ...@@ -1299,7 +1309,12 @@ For predictions, <span class="math">\(X\)</span>, and target labels, <span class
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The tensor variable storing the element-wise squared error difference of input and label.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"><dl class="docutils">
<dt>The tensor variable storing the element-wise squared error</dt>
<dd><p class="first last">difference of input and label.</p>
</dd>
</dl>
</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
...@@ -1344,12 +1359,13 @@ in the input parameters to the function.</p> ...@@ -1344,12 +1359,13 @@ in the input parameters to the function.</p>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">conv2d</code><span class="sig-paren">(</span><em>input</em>, <em>num_filters</em>, <em>filter_size</em>, <em>stride=None</em>, <em>padding=None</em>, <em>groups=None</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_cudnn=True</em>, <em>act=None</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">conv2d</code><span class="sig-paren">(</span><em>input</em>, <em>num_filters</em>, <em>filter_size</em>, <em>stride=None</em>, <em>padding=None</em>, <em>groups=None</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_cudnn=True</em>, <em>act=None</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Convolution2D Layer</strong></p> <dd><p><strong>Convolution2D Layer</strong></p>
<p>The convolution2D layer calculates the output based on the input, filter <p>The convolution2D layer calculates the output based on the input, filter
and strides, paddings, dilations, groups parameters. Input(Input) and Output(Output) and strides, paddings, dilations, groups parameters. Input(Input) and
are in NCHW format. Where N is batch size, C is the number of channels, H is the height Output(Output) are in NCHW format. Where N is batch size, C is the number of
of the feature, and W is the width of the feature. channels, H is the height of the feature, and W is the width of the feature.
For details of the convolution layer, please refer to UFLDL&#8217;s <a class="reference external" href="http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/">convolution</a> tutorial. For details of the convolution layer, please refer to UFLDL&#8217;s <a class="reference external" href="http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/">convolution</a> tutorial.
If bias attribution and activation type are provided, bias is added to the output of the convolution, If bias attribution and activation type are provided, bias is added to the
and the corresponding activation function is applied to the final result.</p> output of the convolution, and the corresponding activation function is
applied to the final result.</p>
<p>For each input <span class="math">\(X\)</span>, the equation is:</p> <p>For each input <span class="math">\(X\)</span>, the equation is:</p>
<div class="math"> <div class="math">
\[Out = \sigma (W \ast X + b)\]</div> \[Out = \sigma (W \ast X + b)\]</div>
...@@ -1360,7 +1376,11 @@ and the corresponding activation function is applied to the final result.</p> ...@@ -1360,7 +1376,11 @@ and the corresponding activation function is applied to the final result.</p>
<li><span class="math">\(\ast\)</span>: Convolution operation.</li> <li><span class="math">\(\ast\)</span>: Convolution operation.</li>
<li><span class="math">\(b\)</span>: Bias value, a 2-D tensor with shape [M, 1].</li> <li><span class="math">\(b\)</span>: Bias value, a 2-D tensor with shape [M, 1].</li>
<li><span class="math">\(\sigma\)</span>: Activation function.</li> <li><span class="math">\(\sigma\)</span>: Activation function.</li>
<li><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be different.</li> <li><dl class="first docutils">
<dt><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be</dt>
<dd>different.</dd>
</dl>
</li>
</ul> </ul>
<p class="rubric">Example</p> <p class="rubric">Example</p>
<ul> <ul>
...@@ -1407,20 +1427,28 @@ library is installed. Default: True</li> ...@@ -1407,20 +1427,28 @@ library is installed. Default: True</li>
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The tensor variable storing the convolution and non-linearity activation result.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"><dl class="docutils">
<dt>The tensor variable storing the convolution and</dt>
<dd><p class="first last">non-linearity activation result.</p>
</dd>
</dl>
</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and groups mismatch.</p> <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and
groups mismatch.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<p class="rubric">Examples</p> <p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
<span class="n">conv2d</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">)</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">conv2d</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">)</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
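The conv2d documentation above notes that the shapes of Out and X may differ. A rough way to see by how much is the usual convolution output-size formula, sketched here in plain Python. The helper name is hypothetical, and the defaults (stride 1, padding 0, dilation 1) are assumptions about what the `None` arguments resolve to, not something this diff confirms.

```python
def conv2d_output_size(in_size, filter_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for a standard convolution."""
    # A dilated filter covers dilation * (filter_size - 1) + 1 input positions.
    effective_filter = dilation * (filter_size - 1) + 1
    return (in_size + 2 * padding - effective_filter) // stride + 1


# The example above: a [3, 32, 32] input, num_filters=2, filter_size=3.
num_filters = 2
h_out = conv2d_output_size(32, 3)
w_out = conv2d_output_size(32, 3)
print([num_filters, h_out, w_out])  # [2, 30, 30]

# Padding of 1 keeps the spatial size unchanged for a 3x3 filter.
print(conv2d_output_size(32, 3, padding=1))  # 32
```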
...@@ -2158,7 +2186,8 @@ are in NCHW format. Where N is batch size, C is the number of channels, ...@@ -2158,7 +2186,8 @@ are in NCHW format. Where N is batch size, C is the number of channels,
H is the height of the feature, and W is the width of the feature. H is the height of the feature, and W is the width of the feature.
Parameters(dilations, strides, paddings) are two elements. These two elements Parameters(dilations, strides, paddings) are two elements. These two elements
represent height and width, respectively. The details of convolution transpose represent height and width, respectively. The details of convolution transpose
layer, please refer to the following explanation and references <a class="reference external" href="http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf">therein</a>.</p> layer, please refer to the following explanation and references
<a class="reference external" href="http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf">therein</a>.</p>
<p>For each input <span class="math">\(X\)</span>, the equation is:</p> <p>For each input <span class="math">\(X\)</span>, the equation is:</p>
<div class="math"> <div class="math">
\[Out = W \ast X\]</div> \[Out = W \ast X\]</div>
...@@ -2167,7 +2196,11 @@ layer, please refer to the following explanation and references <a class="refere ...@@ -2167,7 +2196,11 @@ layer, please refer to the following explanation and references <a class="refere
<li><span class="math">\(X\)</span>: Input value, a tensor with NCHW format.</li> <li><span class="math">\(X\)</span>: Input value, a tensor with NCHW format.</li>
<li><span class="math">\(W\)</span>: Filter value, a tensor with MCHW format.</li> <li><span class="math">\(W\)</span>: Filter value, a tensor with MCHW format.</li>
<li><span class="math">\(\ast\)</span> : Convolution transpose operation.</li> <li><span class="math">\(\ast\)</span> : Convolution transpose operation.</li>
<li><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be different.</li> <li><dl class="first docutils">
<dt><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be</dt>
<dd>different.</dd>
</dl>
</li>
</ul> </ul>
<p class="rubric">Example</p> <p class="rubric">Example</p>
<ul> <ul>
...@@ -2207,7 +2240,8 @@ stride_H = stride_W = stride. Default: stride = 1.</li> ...@@ -2207,7 +2240,8 @@ stride_H = stride_W = stride. Default: stride = 1.</li>
<li><strong>dilation</strong> (<em>int|tuple</em>) &#8211; The dilation size. If dilation is a tuple, it must <li><strong>dilation</strong> (<em>int|tuple</em>) &#8211; The dilation size. If dilation is a tuple, it must
contain two integers, (dilation_H, dilation_W). Otherwise, the contain two integers, (dilation_H, dilation_W). Otherwise, the
dilation_H = dilation_W = dilation. Default: dilation = 1.</li> dilation_H = dilation_W = dilation. Default: dilation = 1.</li>
<li><strong>param_attr</strong> (<em>ParamAttr</em>) &#8211; The parameters to the Conv2d_transpose Layer. Default: None</li> <li><strong>param_attr</strong> (<em>ParamAttr</em>) &#8211; The parameters to the Conv2d_transpose Layer.
Default: None</li>
<li><strong>use_cudnn</strong> (<em>bool</em>) &#8211; Use cudnn kernel or not, it is valid only when the cudnn <li><strong>use_cudnn</strong> (<em>bool</em>) &#8211; Use cudnn kernel or not, it is valid only when the cudnn
library is installed. Default: True</li> library is installed. Default: True</li>
<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer(optional). If set None, the layer <li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer(optional). If set None, the layer
...@@ -2221,14 +2255,17 @@ will be named automatically.</li> ...@@ -2221,14 +2255,17 @@ will be named automatically.</li>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and groups mismatch.</p> <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and
groups mismatch.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<p class="rubric">Examples</p> <p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
<span class="n">conv2d_transpose</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d_transpose</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">conv2d_transpose</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d_transpose</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
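For conv2d_transpose the spatial size grows rather than shrinks. The sketch below uses the common transposed-convolution size formula in plain Python. Both helper names are hypothetical, and stride 1, padding 0 and dilation 1 are assumed defaults.

```python
def conv2d_transpose_output_size(in_size, filter_size, stride=1, padding=0,
                                 dilation=1):
    """Output length along one spatial axis for a transposed convolution."""
    return (in_size - 1) * stride - 2 * padding + dilation * (filter_size - 1) + 1


def conv2d_output_size(in_size, filter_size, stride=1, padding=0, dilation=1):
    """Forward convolution size, used only to check the round trip."""
    effective_filter = dilation * (filter_size - 1) + 1
    return (in_size + 2 * padding - effective_filter) // stride + 1


# The example above: a [3, 32, 32] input, num_filters=2, filter_size=3.
out = conv2d_transpose_output_size(32, 3)
print([2, out, out])  # [2, 34, 34]

# A forward convolution with the same settings maps 34 back to 32.
print(conv2d_output_size(out, 3))  # 32
```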
...@@ -2337,8 +2374,10 @@ and concatenation of <span class="math">\(u_t\)</span>, <span class="math">\(r_t ...@@ -2337,8 +2374,10 @@ and concatenation of <span class="math">\(u_t\)</span>, <span class="math">\(r_t
<li><strong>size</strong> (<em>integer</em>) &#8211; The input dimension value.</li> <li><strong>size</strong> (<em>integer</em>) &#8211; The input dimension value.</li>
<li><strong>weight</strong> (<em>ParamAttr</em>) &#8211; The weight parameters for gru unit. Default: None</li> <li><strong>weight</strong> (<em>ParamAttr</em>) &#8211; The weight parameters for gru unit. Default: None</li>
<li><strong>bias</strong> (<em>ParamAttr</em>) &#8211; The bias parameters for gru unit. Default: None</li> <li><strong>bias</strong> (<em>ParamAttr</em>) &#8211; The bias parameters for gru unit. Default: None</li>
<li><strong>activation</strong> (<em>string</em>) &#8211; The activation type for cell (actNode). Default: &#8216;tanh&#8217;</li> <li><strong>activation</strong> (<em>string</em>) &#8211; The activation type for cell (actNode).
<li><strong>gate_activation</strong> (<em>string</em>) &#8211; The activation type for gates (actGate). Default: &#8216;sigmoid&#8217;</li> Default: &#8216;tanh&#8217;</li>
<li><strong>gate_activation</strong> (<em>string</em>) &#8211; The activation type for gates (actGate).
Default: &#8216;sigmoid&#8217;</li>
</ul> </ul>
</td> </td>
</tr> </tr>
...@@ -2414,7 +2453,10 @@ will be named automatically.</li> ...@@ -2414,7 +2453,10 @@ will be named automatically.</li>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">tuple</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">tuple</p>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; The ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be 2 or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same or the 2nd dimensions of <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same.</p> <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; The ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong>
not be 2 or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong>
and <strong>cell_t_prev</strong> not be the same or the 2nd dimensions of
<strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> not be the same.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
...@@ -2706,9 +2748,9 @@ will be named automatically.</li> ...@@ -2706,9 +2748,9 @@ will be named automatically.</li>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">matmul</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>transpose_x=False</em>, <em>transpose_y=False</em>, <em>name=None</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">matmul</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>transpose_x=False</em>, <em>transpose_y=False</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p>Applies matrix multiplication to two tensors. Currently, the input <dd><p>Applies matrix multiplication to two tensors.</p>
tensors&#8217; rank can be any, but when the rank of anyone inputs is <p>Currently, the input tensors&#8217; rank can be any, but when the rank of any
bigger than 3, this two inputs&#8217; rank should be equal.</p> inputs is bigger than 3, this two inputs&#8217; rank should be equal.</p>
<p>The actual behavior depends on the shapes of <span class="math">\(x\)</span>, <span class="math">\(y\)</span> and the <p>The actual behavior depends on the shapes of <span class="math">\(x\)</span>, <span class="math">\(y\)</span> and the
flag values of <code class="xref py py-attr docutils literal"><span class="pre">transpose_x</span></code>, <code class="xref py py-attr docutils literal"><span class="pre">transpose_y</span></code>. Specifically:</p> flag values of <code class="xref py py-attr docutils literal"><span class="pre">transpose_x</span></code>, <code class="xref py py-attr docutils literal"><span class="pre">transpose_y</span></code>. Specifically:</p>
<ul class="simple"> <ul class="simple">
...@@ -2756,18 +2798,23 @@ will be named automatically.</li> ...@@ -2756,18 +2798,23 @@ will be named automatically.</li>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Examples to clarify shapes of the inputs and output</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Examples to clarify shapes of the inputs and output</span>
<span class="c1"># x: [B, ..., M, K], y: [B, ..., K, N]</span> <span class="c1"># x: [B, ..., M, K], y: [B, ..., K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, ..., M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, ..., M, N]</span>
<span class="c1"># x: [B, M, K], y: [B, K, N]</span> <span class="c1"># x: [B, M, K], y: [B, K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span>
<span class="c1"># x: [B, M, K], y: [K, N]</span> <span class="c1"># x: [B, M, K], y: [K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span>
<span class="c1"># x: [B, M, K], y: [K]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M]</span>
<span class="c1"># x: [M, K], y: [K, N]</span> <span class="c1"># x: [M, K], y: [K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [M, N]</span>
<span class="c1"># x: [B, M, K], y: [K]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M]</span>
<span class="c1"># x: [K], y: [K]</span> <span class="c1"># x: [K], y: [K]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [1]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [1]</span>
<span class="c1"># x: [M], y: [N]</span>
<span class="c1"># x: [M], y: [N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span> <span class="c1"># out: [M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span> <span class="c1"># out: [M, N]</span>
</pre></div> </pre></div>
</div> </div>
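The shape rules listed in the examples above can be sketched as a small shape-inference helper. This is only an illustration under stated assumptions: `matmul_out_shape` is a hypothetical name, not part of the Paddle API, and the rank-1 handling is inferred from the listed examples rather than taken from the operator's implementation.

```python
def matmul_out_shape(x_shape, y_shape, transpose_x=False, transpose_y=False):
    """Hypothetical helper: infer the output shape for the examples above."""
    x, y = list(x_shape), list(y_shape)
    x_is_vec, y_is_vec = len(x) == 1, len(y) == 1

    # Assumption: a rank-1 x acts as [1, D] (or [D, 1] when transposed).
    if x_is_vec:
        x = [x[0], 1] if transpose_x else [1, x[0]]
    elif transpose_x:
        x[-2], x[-1] = x[-1], x[-2]

    # Assumption: a rank-1 y acts as [D, 1] (or [1, D] when transposed).
    if y_is_vec:
        y = [1, y[0]] if transpose_y else [y[0], 1]
    elif transpose_y:
        y[-2], y[-1] = y[-1], y[-2]

    if x[-1] != y[-2]:
        raise ValueError("inner dimensions do not match")

    x_batch, y_batch = x[:-2], y[:-2]
    if x_batch and y_batch and x_batch != y_batch:
        raise ValueError("batch dimensions do not match")

    out = x_batch or y_batch
    # Drop the dimension of size 1 that was only added to promote a vector.
    if not (x_is_vec and not transpose_x):
        out.append(x[-2])
    if not (y_is_vec and not transpose_y):
        out.append(y[-1])
    return out or [1]


print(matmul_out_shape([8, 3, 4], [4, 5]))      # x: [B, M, K], y: [K, N]
print(matmul_out_shape([8, 3, 4], [4]))         # x: [B, M, K], y: [K]
print(matmul_out_shape([3], [5], True, True))   # x: [M], y: [N], both transposed
```

The helper only checks shapes; it performs no multiplication.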
...@@ -3502,7 +3549,8 @@ output.lod = [[0, 4, 8]] ...@@ -3502,7 +3549,8 @@ output.lod = [[0, 4, 8]]
</pre></div> </pre></div>
</div> </div>
<p>The simple usage is:</p> <p>The simple usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">output</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">im2sequence</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">layer</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">filter_size</span><span class="o">=</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">output</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">im2sequence</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">layer</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">filter_size</span><span class="o">=</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span>
</pre></div> </pre></div>
</div> </div>
</div></blockquote> </div></blockquote>
...@@ -3518,8 +3566,13 @@ output.lod = [[0, 4, 8]] ...@@ -3518,8 +3566,13 @@ output.lod = [[0, 4, 8]]
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">ctc_greedy_decoder</code><span class="sig-paren">(</span><em>input</em>, <em>blank</em>, <em>name=None</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">ctc_greedy_decoder</code><span class="sig-paren">(</span><em>input</em>, <em>blank</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p>This op decodes sequences with a greedy policy in the following steps: <dd><p>This op decodes sequences with a greedy policy in the following steps:
1. Get the indexes of max value for each row in input. a.k.a. numpy.argmax(input, axis=0). 1. Get the indexes of max value for each row in input. a.k.a.</p>
2. For each sequence in result of step1, merge repeated tokens between two blanks and delete all blanks.</p> <blockquote>
<div>numpy.argmax(input, axis=0).</div></blockquote>
<ol class="arabic simple" start="2">
<li>For each sequence in result of step1, merge repeated tokens between two
blanks and delete all blanks.</li>
</ol>
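The two steps above can be sketched in plain Python. This is a minimal illustration, not the operator's implementation: `ctc_greedy_decode` is a hypothetical helper, the LoD is passed as a flat list of sequence offsets, and the probabilities are made up for the example. The argmax is taken per row (one row per time step), following the wording of step 1.

```python
def ctc_greedy_decode(probs, lod, blank):
    """Hypothetical sketch of greedy CTC decoding.

    probs: list of rows, each of length num_classes + 1
    lod:   sequence offsets, e.g. [0, 4, 8] for two sequences of length 4
    blank: index of the blank label
    """
    decoded = []
    for start, end in zip(lod[:-1], lod[1:]):
        # Step 1: index of the largest value in each row of this sequence.
        best = [max(range(len(row)), key=row.__getitem__)
                for row in probs[start:end]]
        # Step 2: merge repeated tokens, then drop blanks.
        tokens, prev = [], None
        for idx in best:
            if idx != prev and idx != blank:
                tokens.append(idx)
            prev = idx
        decoded.append(tokens)
    return decoded


# Illustrative data: two sequences of four time steps, three classes plus blank.
probs = [
    [0.6, 0.1, 0.3, 0.1], [0.3, 0.2, 0.4, 0.1],
    [0.1, 0.5, 0.1, 0.3], [0.5, 0.1, 0.3, 0.1],
    [0.5, 0.1, 0.3, 0.1], [0.2, 0.2, 0.2, 0.4],
    [0.2, 0.2, 0.1, 0.5], [0.5, 0.1, 0.3, 0.1],
]
print(ctc_greedy_decode(probs, [0, 4, 8], blank=0))  # [[2, 1], [3]]
```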
<p>A simple example is shown below:</p> <p>A simple example is shown below:</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Given: <div class="highlight-text"><div class="highlight"><pre><span></span>Given:
...@@ -3549,8 +3602,15 @@ output.lod = [[0, 2, 3]] ...@@ -3549,8 +3602,15 @@ output.lod = [[0, 2, 3]]
<col class="field-body" /> <col class="field-body" />
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable</em>) &#8211; (LoDTensor&lt;float&gt;), the probabilities of variable-length sequences, which is a 2-D Tensor with LoD information. Its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences&#8217; lengths and num_classes is the true number of classes (not including the blank label).</li> <li><strong>input</strong> (<em>Variable</em>) &#8211; (LoDTensor&lt;float&gt;), the probabilities of
<li><strong>blank</strong> (<em>int</em>) &#8211; the blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-open interval [0, num_classes + 1).</li> variable-length sequences, which is a 2-D Tensor with
LoD information. Its shape is [Lp, num_classes + 1],
where Lp is the sum of all input sequences&#8217; lengths and
num_classes is the true number of classes (not
including the blank label).</li>
<li><strong>blank</strong> (<em>int</em>) &#8211; the blank label index of Connectionist Temporal
Classification (CTC) loss, which is in the half-open
interval [0, num_classes + 1).</li>
</ul> </ul>
</td> </td>
</tr> </tr>
...@@ -3609,7 +3669,7 @@ will be named automatically.</li> ...@@ -3609,7 +3669,7 @@ will be named automatically.</li>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;data&quot;</span><span class="p">,</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;data&quot;</span><span class="p">,</span>
<span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">13</span><span class="p">),</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">13</span><span class="p">),</span>
<span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;float32&quot;</span><span class="p">)</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;float32&quot;</span><span class="p">)</span>
<span class="n">fc</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">l2_normalize</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="n">normed</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">l2_normalize</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
......
...@@ -284,11 +284,11 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li ...@@ -284,11 +284,11 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
</dd></dl> </dd></dl>
</div> </div>
<div class="section" id="dot-product-attention"> <div class="section" id="scaled-dot-product-attention">
<h2>dot_product_attention<a class="headerlink" href="#dot-product-attention" title="Permalink to this headline"></a></h2> <h2>scaled_dot_product_attention<a class="headerlink" href="#scaled-dot-product-attention" title="Permalink to this headline"></a></h2>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">dot_product_attention</code><span class="sig-paren">(</span><em>querys</em>, <em>keys</em>, <em>values</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">scaled_dot_product_attention</code><span class="sig-paren">(</span><em>queries</em>, <em>keys</em>, <em>values</em>, <em>num_heads=1</em>, <em>dropout_rate=0.0</em><span class="sig-paren">)</span></dt>
<dd><p>The dot-product attention.</p> <dd><p>The dot-product attention.</p>
<p>Attention mechanism can be seen as mapping a query and a set of key-value <p>Attention mechanism can be seen as mapping a query and a set of key-value
pairs to an output. The output is computed as a weighted sum of the values, pairs to an output. The output is computed as a weighted sum of the values,
...@@ -298,36 +298,55 @@ function (dot-product here) of the query with the corresponding key.</p> ...@@ -298,36 +298,55 @@ function (dot-product here) of the query with the corresponding key.</p>
multiplication as follows:</p> multiplication as follows:</p>
<blockquote> <blockquote>
<div><div class="math"> <div><div class="math">
\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div> \[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
</div></blockquote> </div></blockquote>
<p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p> <p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p>
<p>Note that batch data containing sequences with different lengths is not
supported by this function because of the (batch) matrix multiplication.</p>
<table class="docutils field-list" frame="void" rules="none"> <table class="docutils field-list" frame="void" rules="none">
<col class="field-name" /> <col class="field-name" />
<col class="field-body" /> <col class="field-body" />
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>query</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li> <li><strong>queries</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>key</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li> <li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>value</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li> <li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) &#8211; Head number to compute the scaled dot product
attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The dropout rate to drop the attention weight.
Default value is 0.</li>
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"><dl class="docutils">
<dt>A 3-D Tensor computed by multi-head scaled dot product</dt>
<dd><p class="first last">attention.</p>
</dd>
</dl>
</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If input queries, keys, values are not 3-D Tensors.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>1. When num_heads &gt; 1, three linear projections are learned respectively
to map input queries, keys and values into queries&#8217;, keys&#8217; and values&#8217;.
queries&#8217;, keys&#8217; and values&#8217; have the same shapes as queries, keys
and values.</p>
<p class="last">2. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.</p>
</div>
<p class="rubric">Examples</p> <p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are tensor variables with the following shape:</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
<span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span> <span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>
<span class="n">out</span><span class="p">,</span> <span class="n">attn_scores</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="n">out</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># [3, 5, 10]</span> <span class="n">contexts</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">scaled_dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="n">attn_scores</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># [3, 5, 6]</span> <span class="n">contexts</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># [3, 5, 10]</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
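The attention formula above can be sketched for the single-head case with no dropout. This is a plain-Python illustration on nested lists, not the Paddle implementation: `attention` and `softmax` are hypothetical helpers. The formula in the text omits a scaling term. The `scaled` flag applies the 1/sqrt(d_k) factor defined in the cited paper, and whether this function applies exactly that factor is an assumption.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, scaled=True):
    """Hypothetical single-head sketch.

    q: [B, Lq, D], k: [B, Lk, D], v: [B, Lk, Dv], all as nested lists.
    Returns a nested list of shape [B, Lq, Dv].
    """
    out = []
    for qb, kb, vb in zip(q, k, v):
        # Assumed scaling factor 1 / sqrt(d_k), as in the cited paper.
        scale = 1.0 / math.sqrt(len(kb[0])) if scaled else 1.0
        ctx = []
        for q_row in qb:
            scores = [scale * sum(a * b for a, b in zip(q_row, k_row))
                      for k_row in kb]
            weights = softmax(scores)
            # Weighted sum of the value rows.
            ctx.append([sum(w * v_row[j] for w, v_row in zip(weights, vb))
                        for j in range(len(vb[0]))])
        out.append(ctx)
    return out


# Shapes from the example above: q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]
q = [[[0.0] * 9 for _ in range(5)] for _ in range(3)]
k = [[[1.0] * 9 for _ in range(6)] for _ in range(3)]
v = [[[float(j) for j in range(10)] for _ in range(6)] for _ in range(3)]
contexts = attention(q, k, v)
print(len(contexts), len(contexts[0]), len(contexts[0][0]))  # 3 5 10
```

With all-zero queries the attention weights are uniform, so each output row is the mean of the value rows.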
......
...@@ -3192,7 +3192,7 @@ ...@@ -3192,7 +3192,7 @@
} ] } ]
},{ },{
"type" : "reshape", "type" : "reshape",
"comment" : "\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns\n\n [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor:\n\n [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set -1, representing that its\nsize is unknown. In this case, the real dimension will be infered from \nthe original shape of Input(X) and other dimensions in the target shape.\n", "comment" : "\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns : [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor: [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set -1, representing that its\nsize is unknown. In this case, the real dimension will be infered from \nthe original shape of Input(X) and other dimensions in the target shape.\n",
"inputs" : [ "inputs" : [
{ {
"name" : "X", "name" : "X",
......
The source diff could not be displayed because it is too large. You can view the blob instead.
...@@ -26,8 +26,8 @@ glu ...@@ -26,8 +26,8 @@ glu
:noindex: :noindex:
dot_product_attention scaled_dot_product_attention
--------------------- ----------------------------
.. autofunction:: paddle.v2.fluid.nets.dot_product_attention .. autofunction:: paddle.v2.fluid.nets.scaled_dot_product_attention
:noindex: :noindex:
...@@ -277,16 +277,17 @@ multidimensional tensor will first be flattened ...@@ -277,16 +277,17 @@ multidimensional tensor will first be flattened
into a 2-dimensional matrix. The parameter into a 2-dimensional matrix. The parameter
<cite>num_flatten_dims</cite> determines how the input tensor <cite>num_flatten_dims</cite> determines how the input tensor
is flattened: the first <cite>num_flatten_dims</cite> is flattened: the first <cite>num_flatten_dims</cite>
dimensions will be flattened to form the first (inclusive, index starts from 1) dimensions will
dimension of the final matrix (height of the be flattened to form the first dimension of the
matrix), and the rest <cite>rank(X) - num_flatten_dims</cite> final matrix (height of the matrix), and the rest
dimensions are flattened to form the second <cite>rank(X) - num_flatten_dims</cite> dimensions are
dimension of the final matrix (width of the matrix). flattened to form the second dimension of the
For example, suppose <cite>X</cite> is a 5-dimensional tensor final matrix (width of the matrix). For example,
with a shape [2, 3, 4, 5, 6], and suppose <cite>X</cite> is a 5-dimensional tensor with a shape
<cite>num_flatten_dims</cite> = 3. Then, the flattened matrix [2, 3, 4, 5, 6], and <cite>num_flatten_dims</cite> = 3. Then,
will have a shape [2 x 3 x 4, 5 x 6] = [24, 30]. the flattened matrix will have a shape
By default, <cite>num_flatten_dims</cite> is set to 1.</li> [2 x 3 x 4, 5 x 6] = [24, 30]. By default,
<cite>num_flatten_dims</cite> is set to 1.</li>
<li><strong>param_attr</strong> (<em>ParamAttr|list</em>) &#8211; The parameter attribute for learnable <li><strong>param_attr</strong> (<em>ParamAttr|list</em>) &#8211; The parameter attribute for learnable
parameters/weights of the fully connected parameters/weights of the fully connected
layer.</li> layer.</li>
...@@ -877,13 +878,9 @@ Duplicable: False Optional: False</li> ...@@ -877,13 +878,9 @@ Duplicable: False Optional: False</li>
<dd><p>Reshape Operator.</p> <dd><p>Reshape Operator.</p>
<p>Reshape Input(X) into the shape specified by Attr(shape).</p> <p>Reshape Input(X) into the shape specified by Attr(shape).</p>
<p>An example: <p>An example:
Given a 2-D tensor X with 2 rows and 2 columns</p> Given a 2-D tensor X with 2 rows and 2 columns : [[1, 2], [3, 4]]</p>
<blockquote>
<div>[[1, 2], [3, 4]]</div></blockquote>
<p>and target shape = [1, 4], the reshape operator will transform <p>and target shape = [1, 4], the reshape operator will transform
the tensor X into a 2-D tensor:</p> the tensor X into a 2-D tensor: [[1, 2, 3, 4]]</p>
<blockquote>
<div>[[1, 2, 3, 4]]</div></blockquote>
<p>One dimension in the target shape can be set to -1, representing that its <p>One dimension in the target shape can be set to -1, representing that its
size is unknown. In this case, the real dimension will be inferred from size is unknown. In this case, the real dimension will be inferred from
the original shape of Input(X) and other dimensions in the target shape.</p> the original shape of Input(X) and other dimensions in the target shape.</p>
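The inference of a -1 dimension described above can be sketched as follows. This is a minimal illustration in plain Python: `infer_reshape` is a hypothetical helper, not the operator's code, and it assumes at most one -1 entry in the target shape.

```python
def infer_reshape(in_shape, target_shape):
    """Hypothetical helper: resolve a single -1 in target_shape."""
    total = 1
    for d in in_shape:
        total *= d

    if list(target_shape).count(-1) > 1:
        raise ValueError("only one dimension may be -1")

    # Product of the dimensions that are given explicitly.
    known = 1
    for d in target_shape:
        if d != -1:
            known *= d

    if -1 in target_shape:
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the unknown dimension")
        return [total // known if d == -1 else d for d in target_shape]

    if known != total:
        raise ValueError("element counts differ")
    return list(target_shape)


# The 2 x 2 tensor from the example above, with the second dimension left unknown.
print(infer_reshape([2, 2], [1, -1]))  # [1, 4]
```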
...@@ -1225,8 +1222,9 @@ X and Y and returns that as the output.</p> ...@@ -1225,8 +1222,9 @@ X and Y and returns that as the output.</p>
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">cross_entropy</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">cross_entropy</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Cross Entropy Layer</strong></p> <dd><p><strong>Cross Entropy Layer</strong></p>
<p>This layer computes the cross entropy between <cite>input</cite> and <cite>label</cite>. It supports <p>This layer computes the cross entropy between <cite>input</cite> and <cite>label</cite>. It
both standard cross-entropy and soft-label cross-entropy loss computation.</p> supports both standard cross-entropy and soft-label cross-entropy loss
computation.</p>
<ol class="arabic"> <ol class="arabic">
<li><dl class="first docutils"> <li><dl class="first docutils">
<dt>One-hot cross-entropy:</dt> <dt>One-hot cross-entropy:</dt>
...@@ -1262,22 +1260,33 @@ to a one-hot cross-entropy with one-hot label representation.</p> ...@@ -1262,22 +1260,33 @@ to a one-hot cross-entropy with one-hot label representation.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable|list</em>) &#8211; a 2-D tensor with shape [N x D], where N is the <li><strong>input</strong> (<em>Variable|list</em>) &#8211; a 2-D tensor with shape [N x D], where N is the
batch size and D is the number of classes. This input is a probability batch size and D is the number of classes. This
computed by the previous operator, which is almost always the result input is a probability computed by the previous
of a softmax operator.</li> operator, which is almost always the result of
a softmax operator.</li>
<li><strong>label</strong> (<em>Variable|list</em>) &#8211; the ground truth which is a 2-D tensor. When <li><strong>label</strong> (<em>Variable|list</em>) &#8211; the ground truth which is a 2-D tensor. When
<cite>soft_label</cite> is set to <cite>False</cite>, <cite>label</cite> is a tensor&lt;int64&gt; with shape <cite>soft_label</cite> is set to <cite>False</cite>, <cite>label</cite> is a
[N x 1]. When <cite>soft_label</cite> is set to <cite>True</cite>, <cite>label</cite> is a tensor&lt;int64&gt; with shape [N x 1]. When
<cite>soft_label</cite> is set to <cite>True</cite>, <cite>label</cite> is a
tensor&lt;float/double&gt; with shape [N x D].</li> tensor&lt;float/double&gt; with shape [N x D].</li>
<li><strong>soft_label</strong> (bool, via <cite>**kwargs</cite>) &#8211; a flag indicating whether to interpret <li><strong>soft_label</strong> (bool, via <cite>**kwargs</cite>) &#8211; a flag indicating whether to
the given labels as soft labels, default <cite>False</cite>.</li> interpret the given labels as soft
labels, default <cite>False</cite>.</li>
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">A 2-D tensor with shape [N x 1], the cross entropy loss.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">A 2-D tensor with shape [N x 1], the cross entropy loss.</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><cite>ValueError</cite> &#8211; 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal; 2) when <cite>soft_label == True</cite>, and the 2nd dimension of <cite>input</cite> and <cite>label</cite> are not equal; 3) when <cite>soft_label == False</cite>, and the 2nd dimension of <cite>label</cite> is not 1.</p> <tr class="field-odd field"><th class="field-name">Raises:</th><td class="field-body"><p class="first"><cite>ValueError</cite> &#8211; 1) the 1st dimension of <cite>input</cite> and <cite>label</cite> are not equal.
2) when <cite>soft_label == True</cite>, and the 2nd dimension of</p>
<blockquote>
<div><p><cite>input</cite> and <cite>label</cite> are not equal.</p>
</div></blockquote>
<ol class="last arabic simple" start="3">
<li>when <cite>soft_label == False</cite>, and the 2nd dimension of
<cite>label</cite> is not 1.</li>
</ol>
</td> </td>
</tr> </tr>
</tbody> </tbody>
...@@ -1296,8 +1305,9 @@ the given labels as soft labels, default <cite>False</cite>.</li> ...@@ -1296,8 +1305,9 @@ the given labels as soft labels, default <cite>False</cite>.</li>
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">square_error_cost</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">square_error_cost</code><span class="sig-paren">(</span><em>input</em>, <em>label</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Square error cost layer</strong></p> <dd><p><strong>Square error cost layer</strong></p>
<p>This layer accepts input predictions and target label and returns the squared error cost. <p>This layer accepts input predictions and target label and returns the
For predictions, <span class="math">\(X\)</span>, and target labels, <span class="math">\(Y\)</span>, the equation is:</p> squared error cost.</p>
<p>For predictions, <span class="math">\(X\)</span>, and target labels, <span class="math">\(Y\)</span>, the equation is:</p>
<div class="math"> <div class="math">
\[Out = (X - Y)^2\]</div> \[Out = (X - Y)^2\]</div>
<p>In the above equation:</p> <p>In the above equation:</p>
...@@ -1318,7 +1328,12 @@ For predictions, <span class="math">\(X\)</span>, and target labels, <span class ...@@ -1318,7 +1328,12 @@ For predictions, <span class="math">\(X\)</span>, and target labels, <span class
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The tensor variable storing the element-wise squared error difference of input and label.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"><dl class="docutils">
<dt>The tensor variable storing the element-wise squared error</dt>
<dd><p class="first last">difference of input and label.</p>
</dd>
</dl>
</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
...@@ -1363,12 +1378,13 @@ in the input parameters to the function.</p> ...@@ -1363,12 +1378,13 @@ in the input parameters to the function.</p>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">conv2d</code><span class="sig-paren">(</span><em>input</em>, <em>num_filters</em>, <em>filter_size</em>, <em>stride=None</em>, <em>padding=None</em>, <em>groups=None</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_cudnn=True</em>, <em>act=None</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">conv2d</code><span class="sig-paren">(</span><em>input</em>, <em>num_filters</em>, <em>filter_size</em>, <em>stride=None</em>, <em>padding=None</em>, <em>groups=None</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_cudnn=True</em>, <em>act=None</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Convolution2D Layer</strong></p> <dd><p><strong>Convolution2D Layer</strong></p>
<p>The convolution2D layer calculates the output based on the input, filter <p>The convolution2D layer calculates the output based on the input, filter
and strides, paddings, dilations, groups parameters. Input(Input) and Output(Output) and strides, paddings, dilations, groups parameters. Input(Input) and
are in NCHW format. Where N is batch size, C is the number of channels, H is the height Output(Output) are in NCHW format. Where N is batch size, C is the number of
of the feature, and W is the width of the feature. channels, H is the height of the feature, and W is the width of the feature.
For details of the convolution layer, please refer to UFLDL&#8217;s <a class="reference external" href="http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/">convolution</a> tutorial. For details of the convolution layer, please refer to UFLDL&#8217;s <a class="reference external" href="http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/">convolution</a> tutorial.
If bias attribution and activation type are provided, bias is added to the output of the convolution, If bias attribution and activation type are provided, bias is added to the
and the corresponding activation function is applied to the final result.</p> output of the convolution, and the corresponding activation function is
applied to the final result.</p>
<p>For each input <span class="math">\(X\)</span>, the equation is:</p> <p>For each input <span class="math">\(X\)</span>, the equation is:</p>
<div class="math"> <div class="math">
\[Out = \sigma (W \ast X + b)\]</div> \[Out = \sigma (W \ast X + b)\]</div>
...@@ -1379,7 +1395,11 @@ and the corresponding activation function is applied to the final result.</p> ...@@ -1379,7 +1395,11 @@ and the corresponding activation function is applied to the final result.</p>
<li><span class="math">\(\ast\)</span>: Convolution operation.</li> <li><span class="math">\(\ast\)</span>: Convolution operation.</li>
<li><span class="math">\(b\)</span>: Bias value, a 2-D tensor with shape [M, 1].</li> <li><span class="math">\(b\)</span>: Bias value, a 2-D tensor with shape [M, 1].</li>
<li><span class="math">\(\sigma\)</span>: Activation function.</li> <li><span class="math">\(\sigma\)</span>: Activation function.</li>
<li><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be different.</li> <li><dl class="first docutils">
<dt><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be</dt>
<dd>different.</dd>
</dl>
</li>
</ul> </ul>
<p class="rubric">Example</p> <p class="rubric">Example</p>
<ul> <ul>
...@@ -1426,20 +1446,28 @@ library is installed. Default: True</li> ...@@ -1426,20 +1446,28 @@ library is installed. Default: True</li>
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The tensor variable storing the convolution and non-linearity activation result.</p> <tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first"><dl class="docutils">
<dt>The tensor variable storing the convolution and</dt>
<dd><p class="first last">non-linearity activation result.</p>
</dd>
</dl>
</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">Variable</p> <tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">Variable</p>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and groups mismatch.</p> <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and
groups mismatch.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<p class="rubric">Examples</p> <p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
<span class="n">conv2d</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">)</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">conv2d</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">)</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
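The equation Out = σ(W ∗ X + b) can be illustrated with a small NumPy sketch. This is not Paddle's implementation: `conv2d_reference` is a hypothetical helper, it ignores `groups` and dilation, it computes the operation as cross-correlation (an assumption that matches common deep-learning practice), and the output-size formula is the standard one rather than something taken from this page.

```python
import numpy as np

def conv2d_reference(x, w, b=None, stride=1, padding=0, act=None):
    """Naive NCHW convolution sketch: x is [N, C, H, W], w is [M, C, kH, kW]."""
    n, c, h, wd = x.shape
    m, c_w, kh, kw = w.shape
    assert c == c_w, "input channels must match filter channels"
    # Zero-pad only the spatial dimensions.
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)),
                mode="constant")
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (wd + 2 * padding - kw) // stride + 1
    out = np.zeros((n, m, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, :, i * stride:i * stride + kh,
                       j * stride:j * stride + kw]            # [N, C, kH, kW]
            # Contract over C, kH, kW to get one value per (sample, filter).
            out[:, :, i, j] = np.tensordot(patch, w,
                                           axes=([1, 2, 3], [1, 2, 3]))
    if b is not None:
        out += b.reshape(1, m, 1, 1)   # bias of shape [M, 1], one per filter
    if act == "relu":
        out = np.maximum(out, 0)
    return out

# Same shapes as the example above: 3x32x32 input, 2 filters of size 3.
x = np.ones((1, 3, 32, 32), dtype=np.float32)
w = np.ones((2, 3, 3, 3), dtype=np.float32)
y = conv2d_reference(x, w, act="relu")
print(y.shape)
```

With no padding and stride 1, a 3x3 filter shrinks each spatial dimension by 2, which is why the shapes of Out and X may differ.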
...@@ -2177,7 +2205,8 @@ are in NCHW format. Where N is batch size, C is the number of channels, ...@@ -2177,7 +2205,8 @@ are in NCHW format. Where N is batch size, C is the number of channels,
H is the height of the feature, and W is the width of the feature. H is the height of the feature, and W is the width of the feature.
Parameters(dilations, strides, paddings) are two elements. These two elements Parameters(dilations, strides, paddings) are two elements. These two elements
represent height and width, respectively. The details of convolution transpose represent height and width, respectively. The details of convolution transpose
layer, please refer to the following explanation and references <a class="reference external" href="http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf">therein</a>.</p> layer, please refer to the following explanation and references
<a class="reference external" href="http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf">therein</a>.</p>
<p>For each input <span class="math">\(X\)</span>, the equation is:</p> <p>For each input <span class="math">\(X\)</span>, the equation is:</p>
<div class="math"> <div class="math">
\[Out = W \ast X\]</div> \[Out = W \ast X\]</div>
...@@ -2186,7 +2215,11 @@ layer, please refer to the following explanation and references <a class="refere ...@@ -2186,7 +2215,11 @@ layer, please refer to the following explanation and references <a class="refere
<li><span class="math">\(X\)</span>: Input value, a tensor with NCHW format.</li> <li><span class="math">\(X\)</span>: Input value, a tensor with NCHW format.</li>
<li><span class="math">\(W\)</span>: Filter value, a tensor with MCHW format.</li> <li><span class="math">\(W\)</span>: Filter value, a tensor with MCHW format.</li>
<li><span class="math">\(\ast\)</span> : Convolution transpose operation.</li> <li><span class="math">\(\ast\)</span> : Convolution transpose operation.</li>
<li><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be different.</li> <li><dl class="first docutils">
<dt><span class="math">\(Out\)</span>: Output value, the shape of <span class="math">\(Out\)</span> and <span class="math">\(X\)</span> may be</dt>
<dd>different.</dd>
</dl>
</li>
</ul> </ul>
<p class="rubric">Example</p> <p class="rubric">Example</p>
<ul> <ul>
...@@ -2226,7 +2259,8 @@ stride_H = stride_W = stride. Default: stride = 1.</li> ...@@ -2226,7 +2259,8 @@ stride_H = stride_W = stride. Default: stride = 1.</li>
<li><strong>dilation</strong> (<em>int|tuple</em>) &#8211; The dilation size. If dilation is a tuple, it must <li><strong>dilation</strong> (<em>int|tuple</em>) &#8211; The dilation size. If dilation is a tuple, it must
contain two integers, (dilation_H, dilation_W). Otherwise, the contain two integers, (dilation_H, dilation_W). Otherwise, the
dilation_H = dilation_W = dilation. Default: dilation = 1.</li> dilation_H = dilation_W = dilation. Default: dilation = 1.</li>
<li><strong>param_attr</strong> (<em>ParamAttr</em>) &#8211; The parameters to the Conv2d_transpose Layer. Default: None</li> <li><strong>param_attr</strong> (<em>ParamAttr</em>) &#8211; The parameters to the Conv2d_transpose Layer.
Default: None</li>
<li><strong>use_cudnn</strong> (<em>bool</em>) &#8211; Use cudnn kernel or not, it is valid only when the cudnn <li><strong>use_cudnn</strong> (<em>bool</em>) &#8211; Use cudnn kernel or not, it is valid only when the cudnn
library is installed. Default: True</li> library is installed. Default: True</li>
<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer(optional). If set None, the layer <li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer(optional). If set None, the layer
...@@ -2240,14 +2274,17 @@ will be named automatically.</li> ...@@ -2240,14 +2274,17 @@ will be named automatically.</li>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">Variable</p> <tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">Variable</p>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and groups mismatch.</p> <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the shapes of input, filter_size, stride, padding and
groups mismatch.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<p class="rubric">Examples</p> <p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
<span class="n">conv2d_transpose</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d_transpose</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="n">conv2d_transpose</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">conv2d_transpose</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">num_filters</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
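Because the shapes of Out and X may differ, it can help to compute the expected spatial size ahead of time. The sketch below uses the commonly cited transposed-convolution size formula; the formula and the helper name `conv2d_transpose_out_size` are assumptions for illustration and are not taken from the portion of the documentation shown here.

```python
def conv2d_transpose_out_size(in_size, filter_size, stride=1, padding=0,
                              dilation=1):
    """Expected output size along one spatial axis (height or width)."""
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (filter_size - 1) + 1)

# Same settings as the example above: 32x32 input, filter_size=3.
h_out = conv2d_transpose_out_size(32, 3)
print(h_out)
```

Apply the function once per axis when stride, padding or dilation are given as (H, W) tuples.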
...@@ -2356,8 +2393,10 @@ and concatenation of <span class="math">\(u_t\)</span>, <span class="math">\(r_t ...@@ -2356,8 +2393,10 @@ and concatenation of <span class="math">\(u_t\)</span>, <span class="math">\(r_t
<li><strong>size</strong> (<em>integer</em>) &#8211; The input dimension value.</li> <li><strong>size</strong> (<em>integer</em>) &#8211; The input dimension value.</li>
<li><strong>weight</strong> (<em>ParamAttr</em>) &#8211; The weight parameters for gru unit. Default: None</li> <li><strong>weight</strong> (<em>ParamAttr</em>) &#8211; The weight parameters for gru unit. Default: None</li>
<li><strong>bias</strong> (<em>ParamAttr</em>) &#8211; The bias parameters for gru unit. Default: None</li> <li><strong>bias</strong> (<em>ParamAttr</em>) &#8211; The bias parameters for gru unit. Default: None</li>
<li><strong>activation</strong> (<em>string</em>) &#8211; The activation type for cell (actNode). Default: &#8216;tanh&#8217;</li> <li><strong>activation</strong> (<em>string</em>) &#8211; The activation type for cell (actNode).
<li><strong>gate_activation</strong> (<em>string</em>) &#8211; The activation type for gates (actGate). Default: &#8216;sigmoid&#8217;</li> Default: &#8216;tanh&#8217;</li>
<li><strong>gate_activation</strong> (<em>string</em>) &#8211; The activation type for gates (actGate).
Default: &#8216;sigmoid&#8217;</li>
</ul> </ul>
</td> </td>
</tr> </tr>
...@@ -2433,7 +2472,10 @@ will be named automatically.</li> ...@@ -2433,7 +2472,10 @@ will be named automatically.</li>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">tuple</p> <tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">tuple</p>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> are not 2, or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> are not the same, or the 2nd dimensions of <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> are not the same.</p> <tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If the ranks of <strong>x_t</strong>, <strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong>
are not 2, or the 1st dimensions of <strong>x_t</strong>, <strong>hidden_t_prev</strong>
and <strong>cell_t_prev</strong> are not the same, or the 2nd dimensions of
<strong>hidden_t_prev</strong> and <strong>cell_t_prev</strong> are not the same.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
...@@ -2725,9 +2767,9 @@ will be named automatically.</li> ...@@ -2725,9 +2767,9 @@ will be named automatically.</li>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">matmul</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>transpose_x=False</em>, <em>transpose_y=False</em>, <em>name=None</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">matmul</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>transpose_x=False</em>, <em>transpose_y=False</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p>Applies matrix multiplication to two tensors. Currently, the input <dd><p>Applies matrix multiplication to two tensors.</p>
tensors may have any rank, but when the rank of either input is <p>Currently, the input tensors may have any rank, but when the rank of either
greater than 3, the two inputs must have equal rank.</p> input is greater than 3, the two inputs must have equal rank.</p>
<p>The actual behavior depends on the shapes of <span class="math">\(x\)</span>, <span class="math">\(y\)</span> and the <p>The actual behavior depends on the shapes of <span class="math">\(x\)</span>, <span class="math">\(y\)</span> and the
flag values of <code class="xref py py-attr docutils literal"><span class="pre">transpose_x</span></code>, <code class="xref py py-attr docutils literal"><span class="pre">transpose_y</span></code>. Specifically:</p> flag values of <code class="xref py py-attr docutils literal"><span class="pre">transpose_x</span></code>, <code class="xref py py-attr docutils literal"><span class="pre">transpose_y</span></code>. Specifically:</p>
<ul class="simple"> <ul class="simple">
...@@ -2775,18 +2817,23 @@ will be named automatically.</li> ...@@ -2775,18 +2817,23 @@ will be named automatically.</li>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Examples to clarify shapes of the inputs and output</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Examples to clarify shapes of the inputs and output</span>
<span class="c1"># x: [B, ..., M, K], y: [B, ..., K, N]</span> <span class="c1"># x: [B, ..., M, K], y: [B, ..., K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, ..., M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, ..., M, N]</span>
<span class="c1"># x: [B, M, K], y: [B, K, N]</span> <span class="c1"># x: [B, M, K], y: [B, K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span>
<span class="c1"># x: [B, M, K], y: [K, N]</span> <span class="c1"># x: [B, M, K], y: [K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M, N]</span>
<span class="c1"># x: [B, M, K], y: [K]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M]</span>
<span class="c1"># x: [M, K], y: [K, N]</span> <span class="c1"># x: [M, K], y: [K, N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [M, N]</span>
<span class="c1"># x: [B, M, K], y: [K]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [B, M]</span>
<span class="c1"># x: [K], y: [K]</span> <span class="c1"># x: [K], y: [K]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [1]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># out: [1]</span>
<span class="c1"># x: [M], y: [N]</span>
<span class="c1"># x: [M], y: [N]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span> <span class="c1"># out: [M, N]</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span> <span class="c1"># out: [M, N]</span>
</pre></div> </pre></div>
</div> </div>
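The shape rules listed above can be checked with a NumPy analogue. This is only a sketch: it assumes `np.matmul` broadcasting behaves like the cases shown, it does not call Paddle, and the concrete sizes are arbitrary. Two differences are visible: NumPy returns a scalar (shape `()`) for two 1-D inputs where the listing above shows `[1]`, and the transposed 1-D case is expressed here with `np.outer`.

```python
import numpy as np

B, M, K, N = 2, 3, 4, 5

shapes = {
    # x: [B, M, K], y: [B, K, N] -> [B, M, N]
    "batched": np.matmul(np.ones((B, M, K)), np.ones((B, K, N))).shape,
    # x: [B, M, K], y: [K, N] -> [B, M, N] (y is broadcast over the batch)
    "broadcast_y": np.matmul(np.ones((B, M, K)), np.ones((K, N))).shape,
    # x: [M, K], y: [K, N] -> [M, N]
    "matrix": np.matmul(np.ones((M, K)), np.ones((K, N))).shape,
    # x: [B, M, K], y: [K] -> [B, M]
    "batched_vector": np.matmul(np.ones((B, M, K)), np.ones(K)).shape,
    # x: [K], y: [K] -> scalar in NumPy
    "dot": np.matmul(np.ones(K), np.ones(K)).shape,
    # x: [M], y: [N] with both transposed -> outer product [M, N]
    "outer": np.outer(np.ones(M), np.ones(N)).shape,
}

for name, shape in shapes.items():
    print(name, shape)
```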
...@@ -3521,7 +3568,8 @@ output.lod = [[0, 4, 8]] ...@@ -3521,7 +3568,8 @@ output.lod = [[0, 4, 8]]
</pre></div> </pre></div>
</div> </div>
<p>The simple usage is:</p> <p>The simple usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">output</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">im2sequence</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">layer</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">filter_size</span><span class="o">=</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">output</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">im2sequence</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">layer</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">filter_size</span><span class="o">=</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span>
</pre></div> </pre></div>
</div> </div>
</div></blockquote> </div></blockquote>
...@@ -3537,8 +3585,13 @@ output.lod = [[0, 4, 8]] ...@@ -3537,8 +3585,13 @@ output.lod = [[0, 4, 8]]
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">ctc_greedy_decoder</code><span class="sig-paren">(</span><em>input</em>, <em>blank</em>, <em>name=None</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">ctc_greedy_decoder</code><span class="sig-paren">(</span><em>input</em>, <em>blank</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p>This op decodes sequences with a greedy policy in the following steps: <dd><p>This op decodes sequences with a greedy policy in the following steps:
1. Get the indexes of max value for each row in input. a.k.a. numpy.argmax(input, axis=0). 1. Get the indexes of max value for each row in input. a.k.a.</p>
2. For each sequence in result of step1, merge repeated tokens between two blanks and delete all blanks.</p> <blockquote>
<div>numpy.argmax(input, axis=0).</div></blockquote>
<ol class="arabic simple" start="2">
<li>For each sequence in result of step1, merge repeated tokens between two
blanks and delete all blanks.</li>
</ol>
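The two steps can be sketched in plain NumPy for a single sequence. This is an illustration, not the operator's implementation: `ctc_greedy_decode` is a hypothetical helper, LoD handling for batches of sequences is omitted, and the per-row argmax over a [Lp, num_classes + 1] array is taken along `axis=1` in NumPy terms (an assumption about what "for each row" means; the text above writes `axis=0`).

```python
from itertools import groupby

import numpy as np

def ctc_greedy_decode(probs, blank):
    """Greedy CTC decoding for one sequence of shape [T, num_classes + 1]."""
    # Step 1: index of the most probable label at each time step (row).
    best_path = np.argmax(probs, axis=1).tolist()
    # Step 2a: merge consecutive repeated labels.
    merged = [label for label, _ in groupby(best_path)]
    # Step 2b: drop the blank label.
    return [label for label in merged if label != blank]

# One-hot "probabilities" that encode the path 0 1 1 0 2 2 2 0 2 (blank = 0).
path = [0, 1, 1, 0, 2, 2, 2, 0, 2]
probs = np.eye(3, dtype=np.float32)[path]
decoded = ctc_greedy_decode(probs, blank=0)
print(decoded)
```

Repeats are merged before blanks are removed, so the two 2-runs separated by a blank survive as two distinct tokens.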
<p>A simple example is shown below:</p> <p>A simple example is shown below:</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Given: <div class="highlight-text"><div class="highlight"><pre><span></span>Given:
...@@ -3568,8 +3621,15 @@ output.lod = [[0, 2, 3]] ...@@ -3568,8 +3621,15 @@ output.lod = [[0, 2, 3]]
<col class="field-body" /> <col class="field-body" />
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable</em>) &#8211; (LoDTensor&lt;float&gt;), the probabilities of variable-length sequences, which is a 2-D Tensor with LoD information. Its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences&#8217; lengths and num_classes is the true number of classes (not including the blank label).</li> <li><strong>input</strong> (<em>Variable</em>) &#8211; (LoDTensor&lt;float&gt;), the probabilities of
<li><strong>blank</strong> (<em>int</em>) &#8211; the blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-open interval [0, num_classes + 1).</li> variable-length sequences, which is a 2-D Tensor with
LoD information. Its shape is [Lp, num_classes + 1],
where Lp is the sum of all input sequences&#8217; lengths and
num_classes is the true number of classes (not
including the blank label).</li>
<li><strong>blank</strong> (<em>int</em>) &#8211; the blank label index of Connectionist Temporal
Classification (CTC) loss, which is in the half-open
interval [0, num_classes + 1).</li>
</ul> </ul>
</td> </td>
</tr> </tr>
...@@ -3628,7 +3688,7 @@ will be named automatically.</li> ...@@ -3628,7 +3688,7 @@ will be named automatically.</li>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;data&quot;</span><span class="p">,</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;data&quot;</span><span class="p">,</span>
<span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">13</span><span class="p">),</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">13</span><span class="p">),</span>
<span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;float32&quot;</span><span class="p">)</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;float32&quot;</span><span class="p">)</span>
<span class="n">fc</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">l2_normalize</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="n">normed</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">l2_normalize</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
......
...@@ -303,11 +303,11 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li ...@@ -303,11 +303,11 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
</dd></dl> </dd></dl>
</div> </div>
<div class="section" id="dot-product-attention"> <div class="section" id="scaled-dot-product-attention">
<h2>dot_product_attention<a class="headerlink" href="#dot-product-attention" title="永久链接至标题"></a></h2> <h2>scaled_dot_product_attention<a class="headerlink" href="#scaled-dot-product-attention" title="永久链接至标题"></a></h2>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">dot_product_attention</code><span class="sig-paren">(</span><em>querys</em>, <em>keys</em>, <em>values</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">scaled_dot_product_attention</code><span class="sig-paren">(</span><em>queries</em>, <em>keys</em>, <em>values</em>, <em>num_heads=1</em>, <em>dropout_rate=0.0</em><span class="sig-paren">)</span></dt>
<dd><p>The dot-product attention.</p> <dd><p>The dot-product attention.</p>
<p>An attention mechanism can be seen as mapping a query and a set of key-value <p>An attention mechanism can be seen as mapping a query and a set of key-value
pairs to an output. The output is computed as a weighted sum of the values, pairs to an output. The output is computed as a weighted sum of the values,
...@@ -317,36 +317,55 @@ function (dot-product here) of the query with the corresponding key.</p> ...@@ -317,36 +317,55 @@ function (dot-product here) of the query with the corresponding key.</p>
multiplication as follows:</p> multiplication as follows:</p>
<blockquote> <blockquote>
<div><div class="math"> <div><div class="math">
\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div> \[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
</div></blockquote> </div></blockquote>
<p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p> <p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p>
<p>Note that batch data containing sequences with different lengths is not
supported by this function because of the (batch) matrix multiplication.</p>
<table class="docutils field-list" frame="void" rules="none"> <table class="docutils field-list" frame="void" rules="none">
<col class="field-name" /> <col class="field-name" />
<col class="field-body" /> <col class="field-body" />
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>query</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li> <li><strong>queries</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>key</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li> <li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>value</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li> <li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) &#8211; Number of heads used to compute the scaled dot product
attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The dropout rate applied to the attention weights.
Default value is 0.</li>
</ul> </ul>
</td> </td>
</tr> </tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p> <tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"><dl class="docutils">
<dt>A 3-D Tensor computed by multi-head scaled dot product</dt>
<dd><p class="first last">attention.</p>
</dd>
</dl>
</p>
</td> </td>
</tr> </tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p> <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If input queries, keys, values are not 3-D Tensors.</p>
</td> </td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>1. When num_heads &gt; 1, three linear projections are learned to map the
input queries, keys and values into queries&#8217;, keys&#8217; and values&#8217;, respectively.
queries&#8217;, keys&#8217; and values&#8217; have the same shapes as queries, keys
and values.</p>
<p class="last">2. When num_heads == 1, scaled_dot_product_attention has no learnable
parameters.</p>
</div>
<p class="rubric">Examples</p> <p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are tensor variables with the following shape:</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
<span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span> <span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>
<span class="n">out</span><span class="p">,</span> <span class="n">attn_scores</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="n">out</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># [3, 5, 10]</span> <span class="n">contexts</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">scaled_dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="n">attn_scores</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># [3, 5, 6]</span> <span class="n">contexts</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># [3, 5, 10]</span>
</pre></div> </pre></div>
</div> </div>
</dd></dl> </dd></dl>
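The shape arithmetic in the example above (q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10] giving an output of [3, 5, 10]) can be checked with a small pure-Python sketch of the single-head case. This is not the Paddle implementation: the helper names `attention`, `softmax`, `filled` and `shape` are hypothetical, and the `scaled` flag assumes the 1/sqrt(d_k) factor from the cited paper, which the formula shown in this docstring omits.

```python
import math


def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, scaled=True):
    """Illustrative single-head attention over nested lists.

    queries: [batch, len_q, d_k], keys: [batch, len_k, d_k],
    values: [batch, len_k, d_v]. Returns [batch, len_q, d_v].
    The 1/sqrt(d_k) scaling is an assumption taken from the cited paper.
    """
    outputs = []
    for q_mat, k_mat, v_mat in zip(queries, keys, values):
        d_k = len(k_mat[0])
        d_v = len(v_mat[0])
        factor = 1.0 / math.sqrt(d_k) if scaled else 1.0
        out_mat = []
        for q in q_mat:
            # One score per key: the (optionally scaled) dot product q . k.
            scores = [factor * sum(a * b for a, b in zip(q, k)) for k in k_mat]
            weights = softmax(scores)
            # Weighted sum of the value rows, computed per output column.
            out_mat.append(
                [sum(w * v[j] for w, v in zip(weights, v_mat)) for j in range(d_v)]
            )
        outputs.append(out_mat)
    return outputs


def filled(batch, length, dim, value):
    # Hypothetical helper: a [batch, length, dim] nested list of one constant.
    return [[[value] * dim for _ in range(length)] for _ in range(batch)]


def shape(tensor):
    # Shape of a rectangular 3-level nested list.
    return [len(tensor), len(tensor[0]), len(tensor[0][0])]


q = filled(3, 5, 9, 0.1)
k = filled(3, 6, 9, 0.2)
v = filled(3, 6, 10, 1.0)
contexts = attention(q, k, v)
print(shape(contexts))  # [3, 5, 10]
```

The output keeps the batch size and query length of `q` and takes its last dimension from `v`; the key length (6 here) only appears in the intermediate attention weights, which is why `k` and `v` must agree on it.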
......