Deploy to GitHub Pages: ef8cb8f6

eace3e49 · Travis CI · d690e184
9 changed files
--- a/develop/doc/_sources/api/v2/fluid/nets.rst.txt
+++ b/develop/doc/_sources/api/v2/fluid/nets.rst.txt
@@ -26,8 +26,8 @@ glu
    :noindex:


-dot_product_attention
---------------------
-..  autofunction:: paddle.v2.fluid.nets.dot_product_attention
+scaled_dot_product_attention
+----------------------------
+..  autofunction:: paddle.v2.fluid.nets.scaled_dot_product_attention
    :noindex:

--- a/develop/doc/api/v2/fluid/layers.html
+++ b/develop/doc/api/v2/fluid/layers.html
--- a/develop/doc/api/v2/fluid/nets.html
+++ b/develop/doc/api/v2/fluid/nets.html
@@ -284,11 +284,11 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
 </dd></dl>

 </div>
-<div class="section" id="dot-product-attention">
-<h2>dot_product_attention<a class="headerlink" href="#dot-product-attention" title="Permalink to this headline">¶</a></h2>
+<div class="section" id="scaled-dot-product-attention">
+<h2>scaled_dot_product_attention<a class="headerlink" href="#scaled-dot-product-attention" title="Permalink to this headline">¶</a></h2>
 <dl class="function">
 <dt>
-<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">dot_product_attention</code><span class="sig-paren">(</span><em>querys</em>, <em>keys</em>, <em>values</em><span class="sig-paren">)</span></dt>
+<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">scaled_dot_product_attention</code><span class="sig-paren">(</span><em>queries</em>, <em>keys</em>, <em>values</em>, <em>num_heads=1</em>, <em>dropout_rate=0.0</em><span class="sig-paren">)</span></dt>
 <dd><p>The dot-product attention.</p>
 <p>Attention mechanism can be seen as mapping a query and a set of key-value
 pairs to an output. The output is computed as a weighted sum of the values,
@@ -298,36 +298,55 @@ function (dot-product here) of the query with the corresponding key.</p>
 multipication as follows:</p>
 <blockquote>
 <div><div class="math">
-\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
+\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
 </div></blockquote>
 <p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p>
-<p>Note that batch data containing sequences with different lengths is not
-supported by this because of the (batch) matrix multipication.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>query</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
-<li><strong>key</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
-<li><strong>value</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>queries</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
+<li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
+<li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
+<li><strong>num_heads</strong> (<em>int</em>) &#8211; Head number to compute the scaled dot product
+attention. Default value is 1.</li>
+<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The dropout rate to drop the attention weight.
+Default value is 0.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first"><dl class="docutils">
+<dt>A 3-D Tensor computed by multi-head scaled dot product</dt>
+<dd><p class="first last">attention.</p>
+</dd>
+</dl>
+</p>
 </td>
 </tr>
-<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If input queries, keys, values are not 3-D Tensors.</p>
 </td>
 </tr>
 </tbody>
 </table>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p>1. When num_heads &gt; 1, three linear projections are learned respectively
+to map input queries, keys and values into queries&#8217;, keys&#8217; and values&#8217;.
+queries&#8217;, keys&#8217; and values&#8217; have the same shapes with queries, keys
+and values.</p>
+<p class="last">1. When num_heads == 1, scaled_dot_product_attention has no learnable
+parameters.</p>
+</div>
 <p class="rubric">Examples</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are tensor variables with the following shape:</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
 <span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>
-<span class="n">out</span><span class="p">,</span> <span class="n">attn_scores</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
-<span class="n">out</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>
-<span class="n">attn_scores</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 6]</span>
+
+<span class="n">contexts</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">scaled_dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
+<span class="n">contexts</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>
 </pre></div>
 </div>
 </dd></dl>
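
Reviewer note: the shapes in the new docstring example can be checked with a small NumPy sketch of single-head attention. This is a reference illustration only, not the Paddle implementation. It assumes the scaling factor is 1/sqrt(d_k) as in "Attention Is All You Need" (the formula kept in the docstring above still shows the unscaled form), and it covers only the num_heads == 1, dropout_rate == 0 case, where the docstring says there are no learnable parameters. The function below is a hypothetical stand-in that only borrows the documented name.

```python
import numpy as np


def scaled_dot_product_attention(queries, keys, values):
    """Single-head reference sketch in NumPy (hypothetical, not Paddle code)."""
    # Mirror the documented contract: all three inputs must be 3-D.
    if queries.ndim != 3 or keys.ndim != 3 or values.ndim != 3:
        raise ValueError("queries, keys and values must be 3-D arrays")

    d_k = keys.shape[-1]
    # [batch, len_q, d_k] x [batch, d_k, len_k] -> [batch, len_q, len_k]
    scores = np.matmul(queries, keys.transpose(0, 2, 1)) / np.sqrt(d_k)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # [batch, len_q, len_k] x [batch, len_k, d_v] -> [batch, len_q, d_v]
    return np.matmul(weights, values)


# Same shapes as the docstring example: q [3, 5, 9], k [3, 6, 9], v [3, 6, 10].
rng = np.random.default_rng(0)
q = rng.standard_normal((3, 5, 9))
k = rng.standard_normal((3, 6, 9))
v = rng.standard_normal((3, 6, 10))

contexts = scaled_dot_product_attention(q, k, v)
print(contexts.shape)  # (3, 5, 10)
```

The output keeps the batch size and query length of q and takes its last dimension from v, which matches the [3, 5, 10] shape stated in the docstring.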

--- a/develop/doc/operators.json
+++ b/develop/doc/operators.json
@@ -3192,7 +3192,7 @@
 } ] 
 },{
 "type" : "reshape",
- "comment" : "\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns\n\n    [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor:\n\n    [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set -1, representing that its\nsize is unknown. In this case, the real dimension will be infered from \nthe original shape of Input(X) and other dimensions in the target shape.\n",
+ "comment" : "\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns : [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 2-D tensor: [[1, 2, 3, 4]]\n\nOne dimension in the target shape can be set -1, representing that its\nsize is unknown. In this case, the real dimension will be infered from \nthe original shape of Input(X) and other dimensions in the target shape.\n",
 "inputs" : [ 
 { 
   "name" : "X",
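
Reviewer note: the -1 rule described in the reshape comment can be sketched in plain Python. The helper name infer_reshape is hypothetical and is not part of the operator's API. It only illustrates how the unknown dimension follows from the element count of Input(X) and the other target dimensions.

```python
from math import prod


def infer_reshape(input_shape, target_shape):
    """Resolve a single -1 in target_shape (illustrative helper, not Paddle code)."""
    total = prod(input_shape)
    unknown = [i for i, dim in enumerate(target_shape) if dim == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension in the target shape can be -1")

    resolved = list(target_shape)
    if unknown:
        # Product of the dimensions that were given explicitly.
        known = prod(dim for dim in target_shape if dim != -1)
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension from the input shape")
        resolved[unknown[0]] = total // known
    elif prod(target_shape) != total:
        raise ValueError("target shape does not match the number of elements")
    return resolved


# The example from the comment: a 2 x 2 tensor reshaped to [1, 4].
print(infer_reshape([2, 2], [1, 4]))   # [1, 4]
# The same target written with an unknown dimension.
print(infer_reshape([2, 2], [1, -1]))  # [1, 4]
```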

--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_sources/api/v2/fluid/nets.rst.txt
+++ b/develop/doc_cn/_sources/api/v2/fluid/nets.rst.txt
@@ -26,8 +26,8 @@ glu
    :noindex:


-dot_product_attention
---------------------
-..  autofunction:: paddle.v2.fluid.nets.dot_product_attention
+scaled_dot_product_attention
+----------------------------
+..  autofunction:: paddle.v2.fluid.nets.scaled_dot_product_attention
    :noindex:

--- a/develop/doc_cn/api/v2/fluid/layers.html
+++ b/develop/doc_cn/api/v2/fluid/layers.html
--- a/develop/doc_cn/api/v2/fluid/nets.html
+++ b/develop/doc_cn/api/v2/fluid/nets.html
@@ -303,11 +303,11 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
 </dd></dl>

 </div>
-<div class="section" id="dot-product-attention">
-<h2>dot_product_attention<a class="headerlink" href="#dot-product-attention" title="永久链接至标题">¶</a></h2>
+<div class="section" id="scaled-dot-product-attention">
+<h2>scaled_dot_product_attention<a class="headerlink" href="#scaled-dot-product-attention" title="永久链接至标题">¶</a></h2>
 <dl class="function">
 <dt>
-<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">dot_product_attention</code><span class="sig-paren">(</span><em>querys</em>, <em>keys</em>, <em>values</em><span class="sig-paren">)</span></dt>
+<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">scaled_dot_product_attention</code><span class="sig-paren">(</span><em>queries</em>, <em>keys</em>, <em>values</em>, <em>num_heads=1</em>, <em>dropout_rate=0.0</em><span class="sig-paren">)</span></dt>
 <dd><p>The dot-product attention.</p>
 <p>Attention mechanism can be seen as mapping a query and a set of key-value
 pairs to an output. The output is computed as a weighted sum of the values,
@@ -317,36 +317,55 @@ function (dot-product here) of the query with the corresponding key.</p>
 multipication as follows:</p>
 <blockquote>
 <div><div class="math">
-\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
+\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
 </div></blockquote>
 <p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p>
-<p>Note that batch data containing sequences with different lengths is not
-supported by this because of the (batch) matrix multipication.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>query</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
-<li><strong>key</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
-<li><strong>value</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>queries</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
+<li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
+<li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
+<li><strong>num_heads</strong> (<em>int</em>) &#8211; Head number to compute the scaled dot product
+attention. Default value is 1.</li>
+<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The dropout rate to drop the attention weight.
+Default value is 0.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p>
+<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first"><dl class="docutils">
+<dt>A 3-D Tensor computed by multi-head scaled dot product</dt>
+<dd><p class="first last">attention.</p>
+</dd>
+</dl>
+</p>
 </td>
 </tr>
-<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">tuple</p>
+<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first">Variable</p>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If input queries, keys, values are not 3-D Tensors.</p>
 </td>
 </tr>
 </tbody>
 </table>
+<div class="admonition note">
+<p class="first admonition-title">注解</p>
+<p>1. When num_heads &gt; 1, three linear projections are learned respectively
+to map input queries, keys and values into queries&#8217;, keys&#8217; and values&#8217;.
+queries&#8217;, keys&#8217; and values&#8217; have the same shapes with queries, keys
+and values.</p>
+<p class="last">1. When num_heads == 1, scaled_dot_product_attention has no learnable
+parameters.</p>
+</div>
 <p class="rubric">Examples</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are tensor variables with the following shape:</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
 <span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>
-<span class="n">out</span><span class="p">,</span> <span class="n">attn_scores</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
-<span class="n">out</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>
-<span class="n">attn_scores</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 6]</span>
+
+<span class="n">contexts</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">scaled_dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
+<span class="n">contexts</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>
 </pre></div>
 </div>
 </dd></dl>

--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js