Deploy to GitHub Pages: ac739009

b5a7f4b4 · Travis CI · b82601df · b5a7f4b4 · b5a7f4b4 · b5a7f4b4
12 changed files
--- a/develop/doc/_sources/api/v2/fluid/layers.rst.txt
+++ b/develop/doc/_sources/api/v2/fluid/layers.rst.txt
@@ -364,6 +364,12 @@ split
 ..  autofunction:: paddle.v2.fluid.layers.split
    :noindex:

+
+matmul
+------
+..  autofunction:: paddle.v2.fluid.layers.matmul
+    :noindex:
+
 logsigmoid
 ----------
 ..  autofunction:: paddle.v2.fluid.layers.logsigmoid

--- a/develop/doc/_sources/api/v2/fluid/nets.rst.txt
+++ b/develop/doc/_sources/api/v2/fluid/nets.rst.txt
@@ -25,3 +25,9 @@ glu
 ..  autofunction:: paddle.v2.fluid.nets.glu
    :noindex:

+
+dot_product_attention
+---------------------
+..  autofunction:: paddle.v2.fluid.nets.dot_product_attention
+    :noindex:
+
--- a/develop/doc/api/v2/fluid/layers.html
+++ b/develop/doc/api/v2/fluid/layers.html
@@ -2480,6 +2480,75 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
 </div>
 </dd></dl>

+</div>
+<div class="section" id="matmul">
+<h2>matmul<a class="headerlink" href="#matmul" title="Permalink to this headline">¶</a></h2>
+<dl class="function">
+<dt>
+<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">matmul</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>transpose_x=False</em>, <em>transpose_y=False</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
+<dd><p>Applies matrix multiplication to two tensors. Currently, only rank-1 to rank-3
+input tensors are supported.</p>
+<p>The actual behavior depends on the shapes of <span class="math">\(x\)</span>, <span class="math">\(y\)</span> and the
+flag values of <code class="xref py py-attr docutils literal"><span class="pre">transpose_x</span></code>, <code class="xref py py-attr docutils literal"><span class="pre">transpose_y</span></code>. Specifically:</p>
+<ul class="simple">
+<li>If a transpose flag is specified, the last two dimensions of the tensor
+are transposed. If the tensor is rank-1 of shape <span class="math">\([D]\)</span>, then for
+<span class="math">\(x\)</span> it is treated as <span class="math">\([1, D]\)</span> in nontransposed form and as
+<span class="math">\([D, 1]\)</span> in transposed form, whereas for <span class="math">\(y\)</span> it is the
+opposite: It is treated as <span class="math">\([D, 1]\)</span> in nontransposed form and as
+<span class="math">\([1, D]\)</span> in transposed form.</li>
+<li>After transposition, the two tensors are 2-D or 3-D, and matrix multiplication
+is performed as follows.<ul>
+<li>If both are 2-D, they are multiplied like conventional matrices.</li>
+<li>If either is 3-D, it is treated as a stack of matrices residing in the
+last two dimensions, and a batched matrix multiply with broadcasting is
+applied to the two tensors.</li>
+</ul>
+</li>
+</ul>
+<p>Also note that if the raw tensor <span class="math">\(x\)</span> or <span class="math">\(y\)</span> is rank-1 and
+nontransposed, the prepended or appended dimension <span class="math">\(1\)</span> will be
+removed after matrix multiplication.</p>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
+<li><strong>x</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>y</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>transpose_x</strong> (<em>bool</em>) &#8211; Whether to transpose <span class="math">\(x\)</span> before multiplication.</li>
+<li><strong>transpose_y</strong> (<em>bool</em>) &#8211; Whether to transpose <span class="math">\(y\)</span> before multiplication.</li>
+<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer (optional). If set to None, the layer
+will be named automatically.</li>
+</ul>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The product Tensor variable.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
+</td>
+</tr>
+</tbody>
+</table>
+<p class="rubric">Examples</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Examples to clarify shapes of the inputs and output</span>
+<span class="c1"># x: [B, M, K], y: [B, K, N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [B, M, N]</span>
+<span class="c1"># x: [B, M, K], y: [K, N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [B, M, N]</span>
+<span class="c1"># x: [B, M, K], y: [K]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [B, M]</span>
+<span class="c1"># x: [M, K], y: [K, N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [M, N]</span>
+<span class="c1"># x: [K], y: [K]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [1]</span>
+<span class="c1"># x: [M], y: [N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span>  <span class="c1"># out: [M, N]</span>
+</pre></div>
+</div>
+</dd></dl>
+
 </div>
 <div class="section" id="logsigmoid">
 <h2>logsigmoid<a class="headerlink" href="#logsigmoid" title="Permalink to this headline">¶</a></h2>
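
The shape rules in the `matmul` docstring added above can be traced with a small, framework-free sketch. This is not the Paddle implementation: `matmul_out_shape` is a hypothetical helper written only to mirror the documented rules. It assumes that two 3-D inputs must share the same batch size, and that a result with no remaining dimensions is reported as `[1]`, as the `x: [K], y: [K]` example suggests.

```python
def matmul_out_shape(x_shape, y_shape, transpose_x=False, transpose_y=False):
    """Hypothetical helper: output shape implied by the documented matmul rules."""
    x, y = list(x_shape), list(y_shape)
    if not (1 <= len(x) <= 3 and 1 <= len(y) <= 3):
        raise ValueError("only rank-1 to rank-3 inputs are supported")
    x_is_vec, y_is_vec = len(x) == 1, len(y) == 1

    # Rank-1 x: [D] -> [1, D] (nontransposed) or [D, 1] (transposed).
    if x_is_vec:
        x = [x[0], 1] if transpose_x else [1, x[0]]
    elif transpose_x:
        x[-2], x[-1] = x[-1], x[-2]

    # Rank-1 y is the opposite: [D] -> [D, 1] (nontransposed) or [1, D] (transposed).
    if y_is_vec:
        y = [1, y[0]] if transpose_y else [y[0], 1]
    elif transpose_y:
        y[-2], y[-1] = y[-1], y[-2]

    if x[-1] != y[-2]:
        raise ValueError("inner dimensions do not match: %r vs %r" % (x, y))

    # A 2-D operand is broadcast across the batch of a 3-D operand.
    batch = []
    if len(x) == 3 and len(y) == 3:
        if x[0] != y[0]:
            raise ValueError("batch sizes differ (assumed unsupported in this sketch)")
        batch = [x[0]]
    elif len(x) == 3:
        batch = [x[0]]
    elif len(y) == 3:
        batch = [y[0]]

    out = batch + [x[-2], y[-1]]

    # Drop the dimension of size 1 that was added for a nontransposed rank-1 input.
    if y_is_vec and not transpose_y:
        out.pop()              # appended 1 (last position)
    if x_is_vec and not transpose_x:
        out.pop(len(batch))    # prepended 1 (first position after the batch)
    return out or [1]          # assumption: a scalar result is reported as [1]


# Shapes from the docstring examples, with B=2, M=3, K=4, N=5.
print(matmul_out_shape([2, 3, 4], [2, 4, 5]))
print(matmul_out_shape([2, 3, 4], [4]))
print(matmul_out_shape([3], [5], True, True))
```

Working on shapes alone keeps the sketch independent of any tensor library; it only checks how the transpose flags and rank-1 promotion interact.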

--- a/develop/doc/api/v2/fluid/nets.html
+++ b/develop/doc/api/v2/fluid/nets.html
@@ -283,6 +283,55 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
 </div>
 </dd></dl>

+</div>
+<div class="section" id="dot-product-attention">
+<h2>dot_product_attention<a class="headerlink" href="#dot-product-attention" title="Permalink to this headline">¶</a></h2>
+<dl class="function">
+<dt>
+<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">dot_product_attention</code><span class="sig-paren">(</span><em>querys</em>, <em>keys</em>, <em>values</em><span class="sig-paren">)</span></dt>
+<dd><p>The dot-product attention.</p>
+<p>An attention mechanism can be seen as mapping a query and a set of key-value
+pairs to an output. The output is computed as a weighted sum of the values,
+where the weight assigned to each value is computed by a compatibility
+function (dot-product here) of the query with the corresponding key.</p>
+<p>The dot-product attention can be implemented through (batch) matrix
+multiplication as follows:</p>
+<blockquote>
+<div><div class="math">
+\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
+</div></blockquote>
+<p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p>
+<p>Note that batch data containing sequences of different lengths is not
+supported, because the implementation relies on (batch) matrix multiplication.</p>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
+<li><strong>querys</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+</ul>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p>
+</td>
+</tr>
+</tbody>
+</table>
+<p class="rubric">Examples</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are tensor variables with the following shape:</span>
+<span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>
+<span class="n">out</span><span class="p">,</span> <span class="n">attn_scores</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
+<span class="n">out</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>
+<span class="n">attn_scores</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 6]</span>
+</pre></div>
+</div>
+</dd></dl>
+
 </div>
 </div>
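
The formula in the `dot_product_attention` docstring added above, softmax(QK^T)V, can be checked against the documented example shapes with a pure-Python reference. This is only a sketch and not Paddle code: `dot_product_attention_ref` and its helpers are hypothetical names, tensors are plain nested lists, the softmax is assumed to run over the key axis, and no scaling factor is applied because the documented formula has none.

```python
import math


def softmax(row):
    """Numerically stable softmax over a flat list of floats."""
    m = max(row)
    exps = [math.exp(value - m) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_single(q, k, v):
    """One batch element. q: [Lq][D], k: [Lk][D], v: [Lk][Dv] as nested lists."""
    # scores[i][j]: softmax over j of the dot product between q[i] and k[j].
    scores = [
        softmax([sum(a * b for a, b in zip(q_i, k_j)) for k_j in k])
        for q_i in q
    ]
    # out[i]: weighted sum of the value rows, using scores[i] as weights.
    out = [
        [sum(w * v_j[c] for w, v_j in zip(row, v)) for c in range(len(v[0]))]
        for row in scores
    ]
    return out, scores


def dot_product_attention_ref(q, k, v):
    """Hypothetical reference: applies attention_single to each batch element."""
    results = [attention_single(qb, kb, vb) for qb, kb, vb in zip(q, k, v)]
    return [r[0] for r in results], [r[1] for r in results]


# Shapes from the docstring example: q [3, 5, 9], k [3, 6, 9], v [3, 6, 10].
q = [[[0.01 * (b + i + d) for d in range(9)] for i in range(5)] for b in range(3)]
k = [[[0.02 * (b + j - d) for d in range(9)] for j in range(6)] for b in range(3)]
v = [[[float(j + c) for c in range(10)] for j in range(6)] for b in range(3)]
out, attn_scores = dot_product_attention_ref(q, k, v)
print(len(out), len(out[0]), len(out[0][0]))
print(len(attn_scores), len(attn_scores[0]), len(attn_scores[0][0]))
```

Because every batch element must supply the same number of query and key rows for the batched product to be well formed, the sketch also illustrates why batches of variable-length sequences are not supported.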


--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_sources/api/v2/fluid/layers.rst.txt
+++ b/develop/doc_cn/_sources/api/v2/fluid/layers.rst.txt
@@ -364,6 +364,12 @@ split
 ..  autofunction:: paddle.v2.fluid.layers.split
    :noindex:

+
+matmul
+------
+..  autofunction:: paddle.v2.fluid.layers.matmul
+    :noindex:
+
 logsigmoid
 ----------
 ..  autofunction:: paddle.v2.fluid.layers.logsigmoid

--- a/develop/doc_cn/_sources/api/v2/fluid/nets.rst.txt
+++ b/develop/doc_cn/_sources/api/v2/fluid/nets.rst.txt
@@ -25,3 +25,9 @@ glu
 ..  autofunction:: paddle.v2.fluid.nets.glu
    :noindex:

+
+dot_product_attention
+---------------------
+..  autofunction:: paddle.v2.fluid.nets.dot_product_attention
+    :noindex:
+
--- a/develop/doc_cn/_sources/howto/usage/capi/organization_of_the_inputs_cn.md.txt
+++ b/develop/doc_cn/_sources/howto/usage/capi/organization_of_the_inputs_cn.md.txt
@@ -19,7 +19,7 @@

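 ### Basic usage concepts

-- Inside PaddlePaddle, the input/output of a computation layer in a neural network is organized as an `Argument` struct. If the network has multiple inputs or multiple inputs, each input/input has its own `Argument`.
+- Inside PaddlePaddle, the input/output of a computation layer in a neural network is organized as an `Argument` struct. If the network has multiple inputs or multiple outputs, each input/output has its own `Argument`.
 - An `Argument` does not actually “store” data; rather, it organizes the input/output information together.
 - Inside an `Argument`, `IVector` (the one-dimensional integer array mentioned above) and `Matrix` (the two-dimensional floating-point matrix mentioned above) actually store the data, while `Sequence Start Positions` (explained in detail below) describes the sequence information of the input/output.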


--- a/develop/doc_cn/api/v2/fluid/layers.html
+++ b/develop/doc_cn/api/v2/fluid/layers.html
@@ -2499,6 +2499,75 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
 </div>
 </dd></dl>

+</div>
+<div class="section" id="matmul">
+<h2>matmul<a class="headerlink" href="#matmul" title="永久链接至标题">¶</a></h2>
+<dl class="function">
+<dt>
+<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">matmul</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>transpose_x=False</em>, <em>transpose_y=False</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
+<dd><p>Applies matrix multiplication to two tensors. Currently, only rank-1 to rank-3
+input tensors are supported.</p>
+<p>The actual behavior depends on the shapes of <span class="math">\(x\)</span>, <span class="math">\(y\)</span> and the
+flag values of <code class="xref py py-attr docutils literal"><span class="pre">transpose_x</span></code>, <code class="xref py py-attr docutils literal"><span class="pre">transpose_y</span></code>. Specifically:</p>
+<ul class="simple">
+<li>If a transpose flag is specified, the last two dimensions of the tensor
+are transposed. If the tensor is rank-1 of shape <span class="math">\([D]\)</span>, then for
+<span class="math">\(x\)</span> it is treated as <span class="math">\([1, D]\)</span> in nontransposed form and as
+<span class="math">\([D, 1]\)</span> in transposed form, whereas for <span class="math">\(y\)</span> it is the
+opposite: It is treated as <span class="math">\([D, 1]\)</span> in nontransposed form and as
+<span class="math">\([1, D]\)</span> in transposed form.</li>
+<li>After transposition, the two tensors are 2-D or 3-D, and matrix multiplication
+is performed as follows.<ul>
+<li>If both are 2-D, they are multiplied like conventional matrices.</li>
+<li>If either is 3-D, it is treated as a stack of matrices residing in the
+last two dimensions, and a batched matrix multiply with broadcasting is
+applied to the two tensors.</li>
+</ul>
+</li>
+</ul>
+<p>Also note that if the raw tensor <span class="math">\(x\)</span> or <span class="math">\(y\)</span> is rank-1 and
+nontransposed, the prepended or appended dimension <span class="math">\(1\)</span> will be
+removed after matrix multiplication.</p>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
+<li><strong>x</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>y</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>transpose_x</strong> (<em>bool</em>) &#8211; Whether to transpose <span class="math">\(x\)</span> before multiplication.</li>
+<li><strong>transpose_y</strong> (<em>bool</em>) &#8211; Whether to transpose <span class="math">\(y\)</span> before multiplication.</li>
+<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer (optional). If set to None, the layer
+will be named automatically.</li>
+</ul>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The product Tensor variable.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">Variable</p>
+</td>
+</tr>
+</tbody>
+</table>
+<p class="rubric">Examples</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Examples to clarify shapes of the inputs and output</span>
+<span class="c1"># x: [B, M, K], y: [B, K, N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [B, M, N]</span>
+<span class="c1"># x: [B, M, K], y: [K, N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [B, M, N]</span>
+<span class="c1"># x: [B, M, K], y: [K]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [B, M]</span>
+<span class="c1"># x: [M, K], y: [K, N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [M, N]</span>
+<span class="c1"># x: [K], y: [K]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>  <span class="c1"># out: [1]</span>
+<span class="c1"># x: [M], y: [N]</span>
+<span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span>  <span class="c1"># out: [M, N]</span>
+</pre></div>
+</div>
+</dd></dl>
+
 </div>
 <div class="section" id="logsigmoid">
 <h2>logsigmoid<a class="headerlink" href="#logsigmoid" title="永久链接至标题">¶</a></h2>

--- a/develop/doc_cn/api/v2/fluid/nets.html
+++ b/develop/doc_cn/api/v2/fluid/nets.html
@@ -302,6 +302,55 @@ dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li
 </div>
 </dd></dl>

+</div>
+<div class="section" id="dot-product-attention">
+<h2>dot_product_attention<a class="headerlink" href="#dot-product-attention" title="永久链接至标题">¶</a></h2>
+<dl class="function">
+<dt>
+<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">dot_product_attention</code><span class="sig-paren">(</span><em>querys</em>, <em>keys</em>, <em>values</em><span class="sig-paren">)</span></dt>
+<dd><p>The dot-product attention.</p>
+<p>An attention mechanism can be seen as mapping a query and a set of key-value
+pairs to an output. The output is computed as a weighted sum of the values,
+where the weight assigned to each value is computed by a compatibility
+function (dot-product here) of the query with the corresponding key.</p>
+<p>The dot-product attention can be implemented through (batch) matrix
+multiplication as follows:</p>
+<blockquote>
+<div><div class="math">
+\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
+</div></blockquote>
+<p>Refer to <a class="reference external" href="https://arxiv.org/pdf/1706.03762.pdf">Attention Is All You Need</a>.</p>
+<p>Note that batch data containing sequences of different lengths is not
+supported, because the implementation relies on (batch) matrix multiplication.</p>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
+<li><strong>querys</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+<li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
+</ul>
+</td>
+</tr>
+<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The Tensor variables representing the output and attention scores.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">tuple</p>
+</td>
+</tr>
+</tbody>
+</table>
+<p class="rubric">Examples</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are tensor variables with the following shape:</span>
+<span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>
+<span class="n">out</span><span class="p">,</span> <span class="n">attn_scores</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
+<span class="n">out</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>
+<span class="n">attn_scores</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 6]</span>
+</pre></div>
+</div>
+</dd></dl>
+
 </div>
 </div>


--- a/develop/doc_cn/howto/usage/capi/organization_of_the_inputs_cn.html
+++ b/develop/doc_cn/howto/usage/capi/organization_of_the_inputs_cn.html
@@ -264,7 +264,7 @@
 <div class="section" id="">
 <span id="id3"></span><h2>Basic usage concepts<a class="headerlink" href="#" title="永久链接至标题">¶</a></h2>
 <ul class="simple">
-<li>Inside PaddlePaddle, the input/output of a computation layer in a neural network is organized as an <code class="docutils literal"><span class="pre">Argument</span></code> struct. If the network has multiple inputs or multiple inputs, each input/input has its own <code class="docutils literal"><span class="pre">Argument</span></code>.</li>
+<li>Inside PaddlePaddle, the input/output of a computation layer in a neural network is organized as an <code class="docutils literal"><span class="pre">Argument</span></code> struct. If the network has multiple inputs or multiple outputs, each input/output has its own <code class="docutils literal"><span class="pre">Argument</span></code>.</li>
 <li>An <code class="docutils literal"><span class="pre">Argument</span></code> does not actually “store” data; rather, it organizes the input/output information together.</li>
 <li>Inside an <code class="docutils literal"><span class="pre">Argument</span></code>, <code class="docutils literal"><span class="pre">IVector</span></code> (the one-dimensional integer array mentioned above) and <code class="docutils literal"><span class="pre">Matrix</span></code> (the two-dimensional floating-point matrix mentioned above) actually store the data, while <code class="docutils literal"><span class="pre">Sequence</span> <span class="pre">Start</span> <span class="pre">Positions</span></code> (explained in detail below) describes the sequence information of the input/output.</li>
 <li><strong>Note</strong>: <ol>

--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js