Commit b953e6a6 authored by Travis CI

Deploy to GitHub Pages: 8e957df4

Parent 31bae860
......@@ -142,12 +142,15 @@ We also project the encoder vector to :code:`decoder_size` dimensional space, ge
The decoder uses :code:`recurrent_group` to define the recurrent neural network. The step and output functions are defined in :code:`gru_decoder_with_attention`:
.. code-block:: python
group_inputs = [StaticInput(input=encoded_vector, is_seq=True),
                StaticInput(input=encoded_proj, is_seq=True)]
trg_embedding = embedding_layer(
input=data_layer(name='target_language_word',
size=target_dict_dim),
size=word_vector_dim,
param_attr=ParamAttr(name='_target_language_embedding'))
group_inputs.append(trg_embedding)
# For a decoder equipped with an attention mechanism, in training,
# the target embedding (the ground truth) is the data input,
# while the encoded source sequence is accessed as an unbounded memory.
......@@ -156,13 +159,7 @@ The decoder uses :code:`recurrent_group` to define the recurrent neural network.
# All sequence inputs should have the same length.
decoder = recurrent_group(name=decoder_group_name,
step=gru_decoder_with_attention,
input=[
StaticInput(input=encoded_vector,
is_seq=True),
StaticInput(input=encoded_proj,
is_seq=True),
trg_embedding
])
input=group_inputs)
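The comments above distinguish `StaticInput` (the same value reused at every time step) from sequence inputs (indexed per step, all of one length). As a rough illustration only — a minimal stdlib sketch with hypothetical names, not Paddle's `recurrent_group` implementation — the idea can be written as:

```python
def run_recurrent_group(step, static_inputs, seq_inputs):
    """Toy recurrent group: calls `step` once per time step.

    Static inputs are passed unchanged at every step (like StaticInput);
    sequence inputs must all have the same length and are indexed per step.
    """
    length = len(seq_inputs[0])
    assert all(len(s) == length for s in seq_inputs), \
        "All sequence inputs should have the same length."
    outputs = []
    for t in range(length):
        outputs.append(step(*static_inputs, *[s[t] for s in seq_inputs]))
    return outputs

# The encoded source stands in for the static input; the target
# embeddings stand in for the per-step sequence input.
encoded = [0.5, 0.25]                  # placeholder for encoded_vector
trg_embedding = [[1.0], [2.0], [3.0]]  # one embedding per target word
out = run_recurrent_group(
    step=lambda enc, word: word[0] + sum(enc),
    static_inputs=[encoded],
    seq_inputs=[trg_embedding])
# out == [1.75, 2.75, 3.75]
```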
The implementation of the step function is listed below. First, it defines the **memory** of the decoder network. Then it defines the attention mechanism, the gated recurrent unit step function, and the output function:
......@@ -217,10 +214,8 @@ The code is listed below:
.. code-block:: python
gen_inputs = [StaticInput(input=encoded_vector,
is_seq=True),
StaticInput(input=encoded_proj,
is_seq=True), ]
group_inputs = [StaticInput(input=encoded_vector, is_seq=True),
                StaticInput(input=encoded_proj, is_seq=True)]
# In generation, the decoder predicts the next target word based on
# the encoded source sequence and the last generated target word.
# The encoded source sequence (encoder's output) must be specified by
......@@ -231,10 +226,10 @@ The code is listed below:
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
gen_inputs.append(trg_embedding)
group_inputs.append(trg_embedding)
beam_gen = beam_search(name=decoder_group_name,
step=gru_decoder_with_attention,
input=gen_inputs,
input=group_inputs,
id_input=data_layer(name="sent_id",
size=1),
dict_file=trg_dict_path,
......
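The `beam_search` call above delegates decoding to Paddle. To illustrate the underlying idea only — this is a simplified stdlib sketch, not Paddle's implementation, and it scores each step independently rather than conditioning on the previously generated word as `gru_decoder_with_attention` does — beam search keeps the few highest-scoring partial sequences at every step:

```python
import math

def toy_beam_search(step_probs, beam_size):
    """Toy beam search over per-step word probabilities.

    `step_probs` is a list with one dict per time step, mapping
    word -> probability.  At each step, every surviving partial sequence
    is extended by every candidate word, and only the `beam_size` best
    by accumulated log-probability are kept.
    """
    beams = [((), 0.0)]  # (sequence, log-probability)
    for probs in step_probs:
        candidates = [(seq + (w,), score + math.log(p))
                      for seq, score in beams
                      for w, p in probs.items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0][0]

best = toy_beam_search(
    [{'a': 0.6, 'b': 0.4}, {'c': 0.3, 'd': 0.7}],
    beam_size=2)
# best == ('a', 'd')
```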
......@@ -169,6 +169,12 @@ dotmul_projection
:members: dotmul_projection
:noindex:
dotmul_operator
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: dotmul_operator
:noindex:
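The `dotmul_operator` documented here computes `out.row[i] += scale * (x.row[i] .* y.row[i])`. A plain-Python sketch of that element-wise multiply-and-accumulate (an illustration of the formula, not Paddle's implementation) is:

```python
def dotmul(out_row, x_row, y_row, scale=1.0):
    """Element-wise multiply two rows and accumulate into `out_row`:
    out[i] += scale * x[i] * y[i], matching the dotmul_operator formula."""
    assert len(x_row) == len(y_row) == len(out_row)
    for i, (x, y) in enumerate(zip(x_row, y_row)):
        out_row[i] += scale * x * y
    return out_row

row = dotmul([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], scale=0.5)
# row == [2.0, 5.0, 9.0]
```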
full_matrix_projection
----------------------
.. automodule:: paddle.trainer_config_helpers.layers
......
......@@ -151,28 +151,6 @@ Its <strong>output function</strong> simply takes <span class="math">\(x_t\)</sp
</pre></div>
</div>
<p>The decoder uses <code class="code docutils literal"><span class="pre">recurrent_group</span></code> to define the recurrent neural network. The step and output functions are defined in <code class="code docutils literal"><span class="pre">gru_decoder_with_attention</span></code>:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">trg_embedding</span> <span class="o">=</span> <span class="n">embedding_layer</span><span class="p">(</span>
<span class="nb">input</span><span class="o">=</span><span class="n">data_layer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;target_language_word&#39;</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="n">target_dict_dim</span><span class="p">),</span>
<span class="n">size</span><span class="o">=</span><span class="n">word_vector_dim</span><span class="p">,</span>
<span class="n">param_attr</span><span class="o">=</span><span class="n">ParamAttr</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;_target_language_embedding&#39;</span><span class="p">))</span>
<span class="c1"># For a decoder equipped with an attention mechanism, in training,</span>
<span class="c1"># the target embedding (the ground truth) is the data input,</span>
<span class="c1"># while the encoded source sequence is accessed as an unbounded memory.</span>
<span class="c1"># StaticInput means the same value is utilized at different time steps.</span>
<span class="c1"># Otherwise, it is a sequence input. Inputs at different time steps are different.</span>
<span class="c1"># All sequence inputs should have the same length.</span>
<span class="n">decoder</span> <span class="o">=</span> <span class="n">recurrent_group</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">decoder_group_name</span><span class="p">,</span>
<span class="n">step</span><span class="o">=</span><span class="n">gru_decoder_with_attention</span><span class="p">,</span>
<span class="nb">input</span><span class="o">=</span><span class="p">[</span>
<span class="n">StaticInput</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">encoded_vector</span><span class="p">,</span>
<span class="n">is_seq</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
<span class="n">StaticInput</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">encoded_proj</span><span class="p">,</span>
<span class="n">is_seq</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
<span class="n">trg_embedding</span>
<span class="p">])</span>
</pre></div>
</div>
<p>The implementation of the step function is listed below. First, it defines the <strong>memory</strong> of the decoder network. Then it defines the attention mechanism, the gated recurrent unit step function, and the output function:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">gru_decoder_with_attention</span><span class="p">(</span><span class="n">enc_vec</span><span class="p">,</span> <span class="n">enc_proj</span><span class="p">,</span> <span class="n">current_word</span><span class="p">):</span>
<span class="c1"># Defines the memory of the decoder.</span>
......@@ -221,10 +199,8 @@ Its <strong>output function</strong> simply takes <span class="math">\(x_t\)</sp
</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gen_inputs</span> <span class="o">=</span> <span class="p">[</span><span class="n">StaticInput</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">encoded_vector</span><span class="p">,</span>
<span class="n">is_seq</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
<span class="n">StaticInput</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">encoded_proj</span><span class="p">,</span>
<span class="n">is_seq</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span> <span class="p">]</span>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">group_inputs</span><span class="o">=</span><span class="p">[</span><span class="n">StaticInput</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">encoded_vector</span><span class="p">,</span><span class="n">is_seq</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
<span class="n">StaticInput</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">encoded_proj</span><span class="p">,</span><span class="n">is_seq</span><span class="o">=</span><span class="bp">True</span><span class="p">)]</span>
<span class="c1"># In generation, the decoder predicts the next target word based on</span>
<span class="c1"># the encoded source sequence and the last generated target word.</span>
<span class="c1"># The encoded source sequence (encoder&#39;s output) must be specified by</span>
......@@ -235,10 +211,10 @@ Its <strong>output function</strong> simply takes <span class="math">\(x_t\)</sp
<span class="n">size</span><span class="o">=</span><span class="n">target_dict_dim</span><span class="p">,</span>
<span class="n">embedding_name</span><span class="o">=</span><span class="s1">&#39;_target_language_embedding&#39;</span><span class="p">,</span>
<span class="n">embedding_size</span><span class="o">=</span><span class="n">word_vector_dim</span><span class="p">)</span>
<span class="n">gen_inputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">trg_embedding</span><span class="p">)</span>
<span class="n">group_inputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">trg_embedding</span><span class="p">)</span>
<span class="n">beam_gen</span> <span class="o">=</span> <span class="n">beam_search</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">decoder_group_name</span><span class="p">,</span>
<span class="n">step</span><span class="o">=</span><span class="n">gru_decoder_with_attention</span><span class="p">,</span>
<span class="nb">input</span><span class="o">=</span><span class="n">gen_inputs</span><span class="p">,</span>
<span class="nb">input</span><span class="o">=</span><span class="n">group_inputs</span><span class="p">,</span>
<span class="n">id_input</span><span class="o">=</span><span class="n">data_layer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;sent_id&quot;</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="mi">1</span><span class="p">),</span>
<span class="n">dict_file</span><span class="o">=</span><span class="n">trg_dict_path</span><span class="p">,</span>
......
This diff is collapsed.
......@@ -1148,7 +1148,6 @@ It performs element-wise multiplication with weight.</p>
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; Input layer.</li>
<li><strong>param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; Parameter config, None if use default.</li>
<li><strong>scale</strong> (<em>float</em>) &#8211; config scalar, default value is one.</li>
</ul>
</td>
</tr>
......@@ -1162,6 +1161,42 @@ It performs element-wise multiplication with weight.</p>
</table>
</dd></dl>
</div>
<div class="section" id="dotmul-operator">
<h2>dotmul_operator<a class="headerlink" href="#dotmul-operator" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">dotmul_operator</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>scale=1</em><span class="sig-paren">)</span></dt>
<dd><p>DotMulOperator takes two inputs and performs element-wise multiplication:</p>
<div class="math">
\[out.row[i] += scale * (x.row[i] .* y.row[i])\]</div>
<p>where <span class="math">\(.*\)</span> means element-wise multiplication, and
scale is a config scalar whose default value is one.</p>
<p>The example usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">op</span> <span class="o">=</span> <span class="n">dotmul_operator</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">layer1</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">layer2</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
</pre></div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>x</strong> (<em>LayerOutput</em>) &#8211; The first input layer.</li>
<li><strong>y</strong> (<em>LayerOutput</em>) &#8211; The second input layer.</li>
<li><strong>scale</strong> (<em>float</em>) &#8211; A config scalar whose default value is one.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">A DotMulOperator Object.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">DotMulOperator</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
</div>
<div class="section" id="full-matrix-projection">
<h2>full_matrix_projection<a class="headerlink" href="#full-matrix-projection" title="Permalink to this headline"></a></h2>
......@@ -2539,6 +2574,7 @@ It is used by recurrent layer group.</p>
<li><a class="reference internal" href="#id2">mixed_layer</a></li>
<li><a class="reference internal" href="#embedding-layer">embedding_layer</a></li>
<li><a class="reference internal" href="#dotmul-projection">dotmul_projection</a></li>
<li><a class="reference internal" href="#dotmul-operator">dotmul_operator</a></li>
<li><a class="reference internal" href="#full-matrix-projection">full_matrix_projection</a></li>
<li><a class="reference internal" href="#identity-projection">identity_projection</a></li>
<li><a class="reference internal" href="#table-projection">table_projection</a></li>
......
......@@ -109,6 +109,7 @@
<li class="toctree-l2"><a class="reference internal" href="layers.html#id2">mixed_layer</a></li>
<li class="toctree-l2"><a class="reference internal" href="layers.html#embedding-layer">embedding_layer</a></li>
<li class="toctree-l2"><a class="reference internal" href="layers.html#dotmul-projection">dotmul_projection</a></li>
<li class="toctree-l2"><a class="reference internal" href="layers.html#dotmul-operator">dotmul_operator</a></li>
<li class="toctree-l2"><a class="reference internal" href="layers.html#full-matrix-projection">full_matrix_projection</a></li>
<li class="toctree-l2"><a class="reference internal" href="layers.html#identity-projection">identity_projection</a></li>
<li class="toctree-l2"><a class="reference internal" href="layers.html#table-projection">table_projection</a></li>
......