Commit 40fd3f58 authored by Travis CI

Deploy to GitHub Pages: 1fc43527

Parent 4d6f4df7
......@@ -142,6 +142,7 @@ We also project the encoder vector to :code:`decoder_size` dimensional space, ge
The decoder uses :code:`recurrent_group` to define the recurrent neural network. The step and output functions are defined in :code:`gru_decoder_with_attention`:

.. code-block:: python

    group_inputs = [StaticInput(input=encoded_vector, is_seq=True),
                    StaticInput(input=encoded_proj, is_seq=True)]
    trg_embedding = embedding_layer(
......
......@@ -4,6 +4,12 @@ BaseSGDOptimizer
    :members: BaseSGDOptimizer
    :noindex:

MomentumOptimizer
=================

.. automodule:: paddle.trainer_config_helpers.optimizers
    :members: MomentumOptimizer
    :noindex:

AdamOptimizer
=============

.. automodule:: paddle.trainer_config_helpers.optimizers
......
......@@ -163,6 +163,26 @@ Its **output function** simply takes \(x_t\)

The decoder uses :code:`recurrent_group` to define the recurrent neural network. The step and output functions are defined in :code:`gru_decoder_with_attention`:

.. code-block:: python

    group_inputs = [StaticInput(input=encoded_vector, is_seq=True),
                    StaticInput(input=encoded_proj, is_seq=True)]
    trg_embedding = embedding_layer(
        input=data_layer(name='target_language_word',
                         size=target_dict_dim),
        size=word_vector_dim,
        param_attr=ParamAttr(name='_target_language_embedding'))
    group_inputs.append(trg_embedding)

    # For a decoder equipped with an attention mechanism, the target embedding
    # (the ground truth) is the data input during training, while the encoded
    # source sequence is accessed as an unbounded memory.
    # StaticInput means the same value is used at every time step.
    # Otherwise, it is a sequence input: inputs differ across time steps,
    # and all sequence inputs must have the same length.
    decoder = recurrent_group(name=decoder_group_name,
                              step=gru_decoder_with_attention,
                              input=group_inputs)

The implementation of the step function is listed below. First, it defines the **memory** of the decoder network. Then it defines the attention, the gated recurrent unit step function, and the output function:

.. code-block:: python

    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        # Defines the memory of the decoder.
......
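The remainder of the step function is folded in this diff. Purely for orientation, here is a minimal sketch of how such a step function is commonly written with the v1 ``paddle.trainer_config_helpers`` primitives (``memory``, ``simple_attention``, ``gru_step_layer``, ``mixed_layer``); the identifiers ``decoder_size``, ``decoder_boot``, and ``target_dict_dim`` are assumed from the surrounding tutorial, and the folded code in this commit remains authoritative.

.. code-block:: python

    # Sketch only -- assumed helper and variable names, not the folded code.
    decoder_mem = memory(name='gru_decoder',
                         size=decoder_size,
                         boot_layer=decoder_boot)   # assumed initial state

    # Attention over the static encoder outputs, conditioned on the memory.
    context = simple_attention(encoded_sequence=enc_vec,
                               encoded_proj=enc_proj,
                               decoder_state=decoder_mem)

    # Mix the attention context with the current target-word embedding.
    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)

    # One GRU step; output_mem ties this step to the decoder memory above.
    gru_step = gru_step_layer(name='gru_decoder',
                              input=decoder_inputs,
                              output_mem=decoder_mem,
                              size=decoder_size)

    # Output function: a softmax over the target dictionary.
    with mixed_layer(size=target_dict_dim,
                     bias_attr=True,
                     act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)
    return out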
This diff is collapsed.
......@@ -93,6 +93,32 @@ be learned. The i is the i-th observation in (training) data.

where \(\eta\) is the learning rate and \(n\) is the batch size.
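The update rule these symbols belong to sits above this hunk and is not shown in the diff. For context, the standard mini-batch SGD step they describe is usually written as below; the per-example cost \(Q_i\) and the placement of the \(1/n\) factor are assumptions of this note, not quotations from the source.

.. math::

    w \;\leftarrow\; w - \frac{\eta}{n} \sum_{i=1}^{n} \nabla Q_i(w)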
<div class="section" id="momentumoptimizer">
<h1>MomentumOptimizer<a class="headerlink" href="#momentumoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">MomentumOptimizer</code><span class="sig-paren">(</span><em>momentum=None</em>, <em>sparse=False</em><span class="sig-paren">)</span></dt>
<dd><p>MomentumOptimizer.</p>
<p>When sparse=True, the update scheme:</p>
<div class="math">
\[\begin{split}\alpha_t &amp;= \alpha_{t-1} / k \\
\beta_t &amp;= \beta_{t-1} / (1 + \lambda \gamma_t) \\
u_t &amp;= u_{t-1} - \alpha_t \gamma_t g_t \\
v_t &amp;= v_{t-1} + \tau_{t-1} \alpha_t \gamma_t g_t \\
\tau_t &amp;= \tau_{t-1} + \beta_t / \alpha_t\end{split}\]</div>
<p>where <span class="math">\(k\)</span> is momentum, <span class="math">\(\lambda\)</span> is decay rate,
<span class="math">\(\gamma_t\)</span> is learning rate at the t&#8217;th step.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>sparse</strong> (<em>bool</em>) &#8211; with sparse support or not.</td>
</tr>
</tbody>
</table>
</dd></dl>
</div>
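As a usage note that is not part of this diff, an optimizer like the one documented above is typically selected through ``settings`` in a v1 trainer configuration. The batch size, learning rate, and momentum below are illustrative values only, not recommendations from the source.

.. code-block:: python

    from paddle.trainer_config_helpers import *

    # Illustrative configuration; learning_method is what selects the optimizer.
    settings(batch_size=128,
             learning_rate=1e-3,
             learning_method=MomentumOptimizer(momentum=0.9, sparse=False))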
<div class="section" id="adamoptimizer">
<h1>AdamOptimizer<a class="headerlink" href="#adamoptimizer" title="Permalink to this headline"></a></h1>
......@@ -289,6 +315,7 @@ clipped.</li>
Table Of Contents

- BaseSGDOptimizer
- MomentumOptimizer
- AdamOptimizer
- AdamaxOptimizer
- AdaGradOptimizer
......
......@@ -73,6 +73,7 @@ var _hmt = _hmt || [];
- BaseSGDOptimizer (optimizers.html)
- MomentumOptimizer (optimizers.html#momentumoptimizer)
- AdamOptimizer (optimizers.html#adamoptimizer)
- AdamaxOptimizer (optimizers.html#adamaxoptimizer)
- AdaGradOptimizer (optimizers.html#adagradoptimizer)
......