  <div class="section" id="nets">
<h1>nets<a class="headerlink" href="#nets" title="Permalink to this headline"></a></h1>
<div class="section" id="simple-img-conv-pool">
<h2>simple_img_conv_pool<a class="headerlink" href="#simple-img-conv-pool" title="Permalink to this headline"></a></h2>
<dl class="function">
<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">simple_img_conv_pool</code><span class="sig-paren">(</span><em>input</em>, <em>num_filters</em>, <em>filter_size</em>, <em>pool_size</em>, <em>pool_stride</em>, <em>act</em>, <em>param_attr=None</em>, <em>pool_type='max'</em>, <em>use_cudnn=True</em><span class="sig-paren">)</span></dt>

<div class="section" id="sequence-conv-pool">
<h2>sequence_conv_pool<a class="headerlink" href="#sequence-conv-pool" title="Permalink to this headline"></a></h2>
<dl class="function">
<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">sequence_conv_pool</code><span class="sig-paren">(</span><em>input</em>, <em>num_filters</em>, <em>filter_size</em>, <em>param_attr=None</em>, <em>act='sigmoid'</em>, <em>pool_type='max'</em><span class="sig-paren">)</span></dt>

<div class="section" id="glu">
<h2>glu<a class="headerlink" href="#glu" title="Permalink to this headline"></a></h2>
<dl class="function">
<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">glu</code><span class="sig-paren">(</span><em>input</em>, <em>dim=-1</em><span class="sig-paren">)</span></dt>
<dd><p>The gated linear unit (GLU) is composed of a split, a sigmoid activation and an
elementwise multiplication. Specifically, the input is split into two equal-sized parts
<span class="math">\(a\)</span> and <span class="math">\(b\)</span> along the given dimension, and the output is computed as
<div><div class="math">
\[{GLU}(a, b)= a \otimes \sigma(b)\]</div>
<p>Refer to <a class="reference external" href="">Language Modeling with Gated Convolutional Networks</a>.</p>
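<p>The split, sigmoid and multiplication steps described above can be sketched in plain Python. This is only an illustration of the formula, not the library's implementation: the helper names <code>sigmoid</code> and <code>glu_last_dim</code> are hypothetical, the sketch handles only a 2-D nested list split along its last dimension, and raising <code>ValueError</code> for an odd size is an assumption made here.</p>

```python
import math


def sigmoid(x):
    """Logistic function used as the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def glu_last_dim(rows):
    """Apply GLU along the last dimension of a 2-D list of floats.

    Hypothetical helper for illustration only (not part of fluid.nets).
    Each row is split into halves a and b; the result is a * sigmoid(b).
    """
    out = []
    for row in rows:
        if len(row) % 2 != 0:
            # Assumption: an odd size cannot be split into two equal parts.
            raise ValueError("size along the split dimension must be even")
        half = len(row) // 2
        a, b = row[:half], row[half:]
        out.append([ai * sigmoid(bi) for ai, bi in zip(a, b)])
    return out


x = [[1.0, 2.0, 0.0, 0.0],
     [3.0, 4.0, 0.0, 0.0]]
y = glu_last_dim(x)
print(y)  # sigmoid(0) = 0.5, so the output is [[0.5, 1.0], [1.5, 2.0]]
```

<p>The output has half the size of the input along the split dimension, which matches the shape change shown in the example for <code>glu</code>.</p>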
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable</em>) &#8211; The input variable which is a Tensor or LoDTensor.</li>
<li><strong>dim</strong> (<em>int</em>) &#8211; The dimension along which to split. If <span class="math">\(dim &lt; 0\)</span>, the
dimension to split along is <span class="math">\(rank(input) + dim\)</span>.</li>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The Tensor variable with half the size of input.</p>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">Variable</p>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># x is a Tensor variable with shape [3, 6, 9]</span>
<span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">glu</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># shape of output: [3, 3, 9]</span>

<div class="section" id="scaled-dot-product-attention">
<h2>scaled_dot_product_attention<a class="headerlink" href="#scaled-dot-product-attention" title="Permalink to this headline"></a></h2>
<dl class="function">
<code class="descclassname">paddle.v2.fluid.nets.</code><code class="descname">scaled_dot_product_attention</code><span class="sig-paren">(</span><em>queries</em>, <em>keys</em>, <em>values</em>, <em>num_heads=1</em>, <em>dropout_rate=0.0</em><span class="sig-paren">)</span></dt>
<dd><p>The dot-product attention.</p>
<p>An attention mechanism can be seen as mapping a query and a set of key-value
pairs to an output. The output is computed as a weighted sum of the values,
where the weight assigned to each value is computed by a compatibility
function (dot-product here) of the query with the corresponding key.</p>
<p>The dot-product attention can be implemented through (batch) matrix
multiplication as follows:</p>
<div><div class="math">
\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]</div>
<p>Refer to <a class="reference external" href="">Attention Is All You Need</a>.</p>
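<p>As a rough illustration of the weighted-sum computation described above, the following plain-Python sketch computes single-head attention for one batch item. It is not the library's implementation: the names <code>softmax</code> and <code>dot_product_attention</code> are hypothetical, and the optional scaling of the scores by \(1/\sqrt{d_k}\) follows the referenced paper rather than the formula above, so treat it as an assumption about how the scaling would be applied.</p>

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(q, k, v, scaled=True):
    """Single-head attention for one batch item (hypothetical helper).

    q: [n_q][d_k], k: [n_k][d_k], v: [n_k][d_v]; returns [n_q][d_v].
    """
    d_k = len(k[0])
    # Assumption: scores are scaled by 1/sqrt(d_k), as in the referenced paper.
    factor = 1.0 / math.sqrt(d_k) if scaled else 1.0
    d_v = len(v[0])
    out = []
    for query in q:
        # Compatibility of this query with every key (dot product).
        scores = [factor * sum(qi * ki for qi, ki in zip(query, key)) for key in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * row[j] for w, row in zip(weights, v)) for j in range(d_v)])
    return out


# Per-item shapes mirror the documented example: q [5, 9], k [6, 9], v [6, 10].
q = [[0.0] * 9 for _ in range(5)]
k = [[1.0] * 9 for _ in range(6)]
v = [[float(i)] * 10 for i in range(6)]
contexts = dot_product_attention(q, k, v)
print(len(contexts), len(contexts[0]))  # 5 10
```

<p>Because every query here is all zeros, all scores are equal and each output row is the plain average of the value rows. The output keeps the number of queries and takes its last dimension from the values.</p>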
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>queries</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>keys</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>values</strong> (<em>Variable</em>) &#8211; The input variable which should be a 3-D Tensor.</li>
<li><strong>num_heads</strong> (<em>int</em>) &#8211; The number of heads used to compute the scaled dot-product
attention. Default value is 1.</li>
<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The dropout rate applied to the attention weights.
Default value is 0.0.</li>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">A 3-D Tensor computed by multi-head scaled dot-product attention.</p>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first">Variable</p>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code> &#8211; If input queries, keys, values are not 3-D Tensors.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>1. When num_heads &gt; 1, three separate linear projections are learned to map
the input queries, keys and values into queries&#8217;, keys&#8217; and values&#8217;.
queries&#8217;, keys&#8217; and values&#8217; have the same shapes as queries, keys
and values.</p>
<p class="last">2. When num_heads == 1, scaled_dot_product_attention has no learnable parameters.</p>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="c1"># Suppose q, k, v are Tensors with the following shape:</span>
<span class="c1"># q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]</span>

<span class="n">contexts</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">nets</span><span class="o">.</span><span class="n">scaled_dot_product_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="n">contexts</span><span class="o">.</span><span class="n">shape</span>  <span class="c1"># [3, 5, 10]</span>


