Commit 96bbb20e authored by Travis CI

Deploy to GitHub Pages: d011514e

Parent b603c182
...@@ -1027,6 +1027,7 @@ more details about LSTM.</p> ...@@ -1027,6 +1027,7 @@ more details about LSTM.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>basestring</em>) &#8211; The lstmemory layer name.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; The lstmemory layer name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; DEPRECATED. size of the lstm cell</li>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer name.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; is sequence process reversed or not.</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; is sequence process reversed or not.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. <span class="math">\(h_t\)</span></li> <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. <span class="math">\(h_t\)</span></li>
...@@ -1093,6 +1094,7 @@ Recurrent Neural Networks on Sequence Modeling.</a></p> ...@@ -1093,6 +1094,7 @@ Recurrent Neural Networks on Sequence Modeling.</a></p>
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The gru layer name.</li> <li><strong>name</strong> (<em>None|basestring</em>) &#8211; The gru layer name.</li>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; input layer.</li> <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; input layer.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; DEPRECATED. size of the gru cell</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; Whether sequence process is reversed or not.</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; Whether sequence process is reversed or not.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. This activation <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. This activation
affects the <span class="math">\({\tilde{h_t}}\)</span>.</li> affects the <span class="math">\({\tilde{h_t}}\)</span>.</li>
...@@ -1103,8 +1105,6 @@ This activation affects the <span class="math">\(z_t\)</span> and <span class="m ...@@ -1103,8 +1105,6 @@ This activation affects the <span class="math">\(z_t\)</span> and <span class="m
bias.</li> bias.</li>
<li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li> <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li> <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li>
<li><strong>size</strong> (<em>None</em>) &#8211; Stub parameter of size, but actually not used. If set this size
will get a warning.</li>
</ul> </ul>
</td> </td>
</tr> </tr>
...@@ -1259,18 +1259,18 @@ be paddle.v2.config_base.Layer.</li> ...@@ -1259,18 +1259,18 @@ be paddle.v2.config_base.Layer.</li>
<dl class="class"> <dl class="class">
<dt> <dt>
<em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">lstm_step</code></dt> <em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">lstm_step</code></dt>
<dd><p>LSTM Step Layer. It used in recurrent_group. The lstm equations are shown <dd><p>LSTM Step Layer. This function is used only in recurrent_group.
as follow.</p> The lstm equations are shown as follows.</p>
<div class="math"> <div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div> \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{x_i}x_{t} + W_{h_i}h_{t-1} + W_{c_i}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{x_f}x_{t} + W_{h_f}h_{t-1} + W_{c_f}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{x_c}x_t+W_{h_c}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{x_o}x_{t} + W_{h_o}h_{t-1} + W_{c_o}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
<p>The input of lstm step is <span class="math">\(Wx_t + Wh_{t-1}\)</span>, and user should use <p>The input of lstm step is <span class="math">\(Wx_t + Wh_{t-1}\)</span>, and user should use
<code class="code docutils literal"><span class="pre">mixed</span></code> and <code class="code docutils literal"><span class="pre">full_matrix_projection</span></code> to calculate these <code class="code docutils literal"><span class="pre">mixed</span></code> and <code class="code docutils literal"><span class="pre">full_matrix_projection</span></code> to calculate these
input vector.</p> input vectors.</p>
<p>The state of lstm step is <span class="math">\(c_{t-1}\)</span>. And lstm step layer will do</p> <p>The state of lstm step is <span class="math">\(c_{t-1}\)</span>. And lstm step layer will do</p>
<div class="math"> <div class="math">
\[ \begin{align}\begin{aligned}i_t = \sigma(input + W_{ci}c_{t-1} + b_i)\\...\end{aligned}\end{align} \]</div> \[ \begin{align}\begin{aligned}i_t = \sigma(input + W_{ci}c_{t-1} + b_i)\\...\end{aligned}\end{align} \]</div>
<p>This layer contains two outputs. Default output is <span class="math">\(h_t\)</span>. The other <p>This layer has two outputs. Default output is <span class="math">\(h_t\)</span>. The other
output is <span class="math">\(o_t\)</span>, which name is &#8216;state&#8217; and can use output is <span class="math">\(o_t\)</span>, whose name is &#8216;state&#8217; and can use
<code class="code docutils literal"><span class="pre">get_output</span></code> to extract this output.</p> <code class="code docutils literal"><span class="pre">get_output</span></code> to extract this output.</p>
<table class="docutils field-list" frame="void" rules="none"> <table class="docutils field-list" frame="void" rules="none">
<col class="field-name" /> <col class="field-name" />
...@@ -1278,8 +1278,8 @@ output is <span class="math">\(o_t\)</span>, which name is &#8216;state&#8217; a ...@@ -1278,8 +1278,8 @@ output is <span class="math">\(o_t\)</span>, which name is &#8216;state&#8217; a
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; Layer&#8217;s size. NOTE: lstm layer&#8217;s size, should be equal as <li><strong>size</strong> (<em>int</em>) &#8211; Layer&#8217;s size. NOTE: lstm layer&#8217;s size, should be equal to
<code class="code docutils literal"><span class="pre">input.size/4</span></code>, and should be equal as <code class="code docutils literal"><span class="pre">input.size/4</span></code>, and should be equal to
<code class="code docutils literal"><span class="pre">state.size</span></code>.</li> <code class="code docutils literal"><span class="pre">state.size</span></code>.</li>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer. <span class="math">\(Wx_t + Wh_{t-1}\)</span></li> <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer. <span class="math">\(Wx_t + Wh_{t-1}\)</span></li>
<li><strong>state</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; State Layer. <span class="math">\(c_{t-1}\)</span></li> <li><strong>state</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; State Layer. <span class="math">\(c_{t-1}\)</span></li>
......
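As a companion to the lstm_step description above, here is a minimal sketch of the usual wiring inside recurrent_group: a mixed layer with full_matrix_projection builds the \(Wx_t + Wh_{t-1}\) input (of size 4 * hidden, since lstm_step requires size == input.size / 4 == state.size), the cell state \(c_{t-1}\) is carried by a memory layer, and the second output named 'state' is exposed with get_output. This sketch is not part of the generated docs; the name-based memory tying and the keyword arguments (state, arg_name, step) are assumptions based on the v1 layers that paddle.v2 wraps.

import paddle.v2 as paddle

HIDDEN = 128

def step(x_t):
    # h_{t-1} and c_{t-1}, read back by name from the layers defined below
    # (assumed memory()/name tying, mirroring the v1 recurrent_group idiom).
    prev_out = paddle.layer.memory(name='lstm_out', size=HIDDEN)
    prev_cell = paddle.layer.memory(name='lstm_cell', size=HIDDEN)
    # W x_t + W h_{t-1}: built by the caller with mixed + full_matrix_projection,
    # sized 4 * HIDDEN so that lstm_step's size equals input.size / 4.
    gates = paddle.layer.mixed(
        size=HIDDEN * 4,
        input=[paddle.layer.full_matrix_projection(input=x_t),
               paddle.layer.full_matrix_projection(input=prev_out)])
    h_t = paddle.layer.lstm_step(name='lstm_out', size=HIDDEN,
                                 input=gates, state=prev_cell)
    # Second output (named 'state'): the cell value that prev_cell reads back.
    paddle.layer.get_output(name='lstm_cell', input=h_t, arg_name='state')
    return h_t

seq = paddle.layer.data(name='seq',
                        type=paddle.data_type.dense_vector_sequence(64))
hidden_seq = paddle.layer.recurrent_group(step=step, input=seq)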
...@@ -452,16 +452,16 @@ False if no bias.</li> ...@@ -452,16 +452,16 @@ False if no bias.</li>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Define calculations that a LSTM unit performs in a single time step. <dd><p>Define calculations that a LSTM unit performs during a single time step.
This function itself is not a recurrent layer, so that it can not be This function itself is not a recurrent layer, so it can not be
directly applied to sequence input. This function is always used in directly used to process sequence inputs. This function is always used in
recurrent_group (see layers.py for more details) to implement attention recurrent_group (see layers.py for more details) to implement attention
mechanism.</p> mechanism.</p>
<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> <p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong>
for more details about LSTM. The link goes as follows: for more details about LSTM. The link goes as follows:
.. _Link: <a class="reference external" href="https://arxiv.org/abs/1308.0850">https://arxiv.org/abs/1308.0850</a></p> .. _Link: <a class="reference external" href="https://arxiv.org/abs/1308.0850">https://arxiv.org/abs/1308.0850</a></p>
<div class="math"> <div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div> \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{x_i}x_{t} + W_{h_i}h_{t-1} + W_{c_i}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{x_f}x_{t} + W_{h_f}h_{t-1} + W_{c_f}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{x_c}x_t+W_{h_c}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{x_o}x_{t} + W_{h_o}h_{t-1} + W_{c_o}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
<p>The example usage is:</p> <p>The example usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">lstmemory_unit</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">lstmemory_unit</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
<span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
...@@ -476,6 +476,7 @@ for more details about LSTM. The link goes as follows: ...@@ -476,6 +476,7 @@ for more details about LSTM. The link goes as follows:
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the LSTM cell.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li> <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li> <li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
...@@ -508,7 +509,7 @@ False means no bias, None means default bias.</li> ...@@ -508,7 +509,7 @@ False means no bias, None means default bias.</li>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>lstm_group is a recurrent layer group version of Long Short Term Memory. It <dd><p>lstm_group is a recurrent_group version of Long Short Term Memory. It
does exactly the same calculation as the lstmemory layer (see lstmemory in does exactly the same calculation as the lstmemory layer (see lstmemory in
layers.py for the maths) does. A promising benefit is that LSTM memory layers.py for the maths) does. A promising benefit is that LSTM memory
cell states, or hidden states in every time step are accessible to the cell states, or hidden states in every time step are accessible to the
...@@ -518,8 +519,8 @@ it is recommended to use the lstmemory, which is relatively faster than ...@@ -518,8 +519,8 @@ it is recommended to use the lstmemory, which is relatively faster than
lstmemory_group.</p> lstmemory_group.</p>
<p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
multiplications: multiplications:
<span class="math">\(W_{xi}x_{t}\)</span> , <span class="math">\(W_{xf}x_{t}\)</span>, <span class="math">\(W_{x_i}x_{t}\)</span> , <span class="math">\(W_{x_f}x_{t}\)</span>,
<span class="math">\(W_{xc}x_t\)</span>, <span class="math">\(W_{xo}x_{t}\)</span> are not done in lstmemory_unit to <span class="math">\(W_{x_c}x_t\)</span>, <span class="math">\(W_{x_o}x_{t}\)</span> are not done in lstmemory_unit to
speed up the calculations. Consequently, an additional mixed_layer with speed up the calculations. Consequently, an additional mixed_layer with
full_matrix_projection must be included before lstmemory_unit is called.</p> full_matrix_projection must be included before lstmemory_unit is called.</p>
<p>The example usage is:</p> <p>The example usage is:</p>
...@@ -536,8 +537,9 @@ full_matrix_projection must be included before lstmemory_unit is called.</p> ...@@ -536,8 +537,9 @@ full_matrix_projection must be included before lstmemory_unit is called.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory group name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li> <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the lstmemory group.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of LSTM cell.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li> <li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li> <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
...@@ -662,8 +664,8 @@ concatenated and returned.</li> ...@@ -662,8 +664,8 @@ concatenated and returned.</li>
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Define calculations that a gated recurrent unit performs in a single time <dd><p>Define calculations that a gated recurrent unit performs in a single time
step. This function itself is not a recurrent layer, so that it can not be step. This function itself is not a recurrent layer, so it can not be
directly applied to sequence input. This function is almost always used in directly used to process sequence inputs. This function is always used in
the recurrent_group (see layers.py for more details) to implement attention the recurrent_group (see layers.py for more details) to implement attention
mechanism.</p> mechanism.</p>
<p>Please see grumemory in layers.py for the details about the maths.</p> <p>Please see grumemory in layers.py for the details about the maths.</p>
...@@ -673,6 +675,7 @@ mechanism.</p> ...@@ -673,6 +675,7 @@ mechanism.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the LSTM cell.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li> <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li> <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
...@@ -697,7 +700,7 @@ mechanism.</p> ...@@ -697,7 +700,7 @@ mechanism.</p>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>gru_group is a recurrent layer group version of Gated Recurrent Unit. It <dd><p>gru_group is a recurrent_group version of Gated Recurrent Unit. It
does exactly the same calculation as the grumemory layer does. A promising does exactly the same calculation as the grumemory layer does. A promising
benefit is that gru hidden states are accessible to the user. This is benefit is that gru hidden states are accessible to the user. This is
especially useful in attention model. If you do not need to access especially useful in attention model. If you do not need to access
...@@ -717,6 +720,7 @@ to use the grumemory, which is relatively faster.</p> ...@@ -717,6 +720,7 @@ to use the grumemory, which is relatively faster.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the LSTM cell.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li> <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
......
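To make the note about the input-to-hidden multiplications concrete, the following sketch (not part of the generated docs; the sizes and keyword names are assumptions) shows the usual network-config pattern: an explicit mixed layer with full_matrix_projection computes the per-gate products of the input before lstmemory_group or gru_group is called.

import paddle.v2 as paddle

DICT_SIZE, EMB_DIM, HIDDEN = 10000, 256, 512

words = paddle.layer.data(
    name='words',
    type=paddle.data_type.integer_value_sequence(DICT_SIZE))
emb = paddle.layer.embedding(input=words, size=EMB_DIM)

# LSTM: the group expects the projections for all four gates as its input,
# so the mixed layer's size is 4 * HIDDEN.
lstm_in = paddle.layer.mixed(
    size=HIDDEN * 4,
    input=[paddle.layer.full_matrix_projection(input=emb)])
lstm_out = paddle.networks.lstmemory_group(input=lstm_in, size=HIDDEN)

# GRU: update/reset/candidate projections, hence 3 * HIDDEN (an assumption
# mirroring the LSTM case; see grumemory in layers.py for the maths).
gru_in = paddle.layer.mixed(
    size=HIDDEN * 3,
    input=[paddle.layer.full_matrix_projection(input=emb)])
gru_out = paddle.networks.gru_group(input=gru_in, size=HIDDEN)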
The source diff is too large to display. You can view the blob instead.
...@@ -1032,6 +1032,7 @@ more details about LSTM.</p> ...@@ -1032,6 +1032,7 @@ more details about LSTM.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>basestring</em>) &#8211; The lstmemory layer name.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; The lstmemory layer name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; DEPRECATED. size of the lstm cell</li>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer name.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; is sequence process reversed or not.</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; is sequence process reversed or not.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. <span class="math">\(h_t\)</span></li> <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. <span class="math">\(h_t\)</span></li>
...@@ -1098,6 +1099,7 @@ Recurrent Neural Networks on Sequence Modeling.</a></p> ...@@ -1098,6 +1099,7 @@ Recurrent Neural Networks on Sequence Modeling.</a></p>
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The gru layer name.</li> <li><strong>name</strong> (<em>None|basestring</em>) &#8211; The gru layer name.</li>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; input layer.</li> <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; input layer.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; DEPRECATED. size of the gru cell</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; Whether sequence process is reversed or not.</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; Whether sequence process is reversed or not.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. This activation <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. This activation
affects the <span class="math">\({\tilde{h_t}}\)</span>.</li> affects the <span class="math">\({\tilde{h_t}}\)</span>.</li>
...@@ -1108,8 +1110,6 @@ This activation affects the <span class="math">\(z_t\)</span> and <span class="m ...@@ -1108,8 +1110,6 @@ This activation affects the <span class="math">\(z_t\)</span> and <span class="m
bias.</li> bias.</li>
<li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li> <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li> <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li>
<li><strong>size</strong> (<em>None</em>) &#8211; Stub parameter of size, but actually not used. If set this size
will get a warning.</li>
</ul> </ul>
</td> </td>
</tr> </tr>
...@@ -1264,18 +1264,18 @@ be paddle.v2.config_base.Layer.</li> ...@@ -1264,18 +1264,18 @@ be paddle.v2.config_base.Layer.</li>
<dl class="class"> <dl class="class">
<dt> <dt>
<em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">lstm_step</code></dt> <em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">lstm_step</code></dt>
<dd><p>LSTM Step Layer. It used in recurrent_group. The lstm equations are shown <dd><p>LSTM Step Layer. This function is used only in recurrent_group.
as follow.</p> The lstm equations are shown as follows.</p>
<div class="math"> <div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div> \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{x_i}x_{t} + W_{h_i}h_{t-1} + W_{c_i}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{x_f}x_{t} + W_{h_f}h_{t-1} + W_{c_f}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{x_c}x_t+W_{h_c}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{x_o}x_{t} + W_{h_o}h_{t-1} + W_{c_o}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
<p>The input of lstm step is <span class="math">\(Wx_t + Wh_{t-1}\)</span>, and user should use <p>The input of lstm step is <span class="math">\(Wx_t + Wh_{t-1}\)</span>, and user should use
<code class="code docutils literal"><span class="pre">mixed</span></code> and <code class="code docutils literal"><span class="pre">full_matrix_projection</span></code> to calculate these <code class="code docutils literal"><span class="pre">mixed</span></code> and <code class="code docutils literal"><span class="pre">full_matrix_projection</span></code> to calculate these
input vector.</p> input vectors.</p>
<p>The state of lstm step is <span class="math">\(c_{t-1}\)</span>. And lstm step layer will do</p> <p>The state of lstm step is <span class="math">\(c_{t-1}\)</span>. And lstm step layer will do</p>
<div class="math"> <div class="math">
\[ \begin{align}\begin{aligned}i_t = \sigma(input + W_{ci}c_{t-1} + b_i)\\...\end{aligned}\end{align} \]</div> \[ \begin{align}\begin{aligned}i_t = \sigma(input + W_{ci}c_{t-1} + b_i)\\...\end{aligned}\end{align} \]</div>
<p>This layer contains two outputs. Default output is <span class="math">\(h_t\)</span>. The other <p>This layer has two outputs. Default output is <span class="math">\(h_t\)</span>. The other
output is <span class="math">\(o_t\)</span>, which name is &#8216;state&#8217; and can use output is <span class="math">\(o_t\)</span>, whose name is &#8216;state&#8217; and can use
<code class="code docutils literal"><span class="pre">get_output</span></code> to extract this output.</p> <code class="code docutils literal"><span class="pre">get_output</span></code> to extract this output.</p>
<table class="docutils field-list" frame="void" rules="none"> <table class="docutils field-list" frame="void" rules="none">
<col class="field-name" /> <col class="field-name" />
...@@ -1283,8 +1283,8 @@ output is <span class="math">\(o_t\)</span>, which name is &#8216;state&#8217; a ...@@ -1283,8 +1283,8 @@ output is <span class="math">\(o_t\)</span>, which name is &#8216;state&#8217; a
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; Layer&#8217;s size. NOTE: lstm layer&#8217;s size, should be equal as <li><strong>size</strong> (<em>int</em>) &#8211; Layer&#8217;s size. NOTE: lstm layer&#8217;s size, should be equal to
<code class="code docutils literal"><span class="pre">input.size/4</span></code>, and should be equal as <code class="code docutils literal"><span class="pre">input.size/4</span></code>, and should be equal to
<code class="code docutils literal"><span class="pre">state.size</span></code>.</li> <code class="code docutils literal"><span class="pre">state.size</span></code>.</li>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer. <span class="math">\(Wx_t + Wh_{t-1}\)</span></li> <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer. <span class="math">\(Wx_t + Wh_{t-1}\)</span></li>
<li><strong>state</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; State Layer. <span class="math">\(c_{t-1}\)</span></li> <li><strong>state</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; State Layer. <span class="math">\(c_{t-1}\)</span></li>
......
...@@ -457,16 +457,16 @@ False if no bias.</li> ...@@ -457,16 +457,16 @@ False if no bias.</li>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Define calculations that a LSTM unit performs in a single time step. <dd><p>Define calculations that a LSTM unit performs during a single time step.
This function itself is not a recurrent layer, so that it can not be This function itself is not a recurrent layer, so it can not be
directly applied to sequence input. This function is always used in directly used to process sequence inputs. This function is always used in
recurrent_group (see layers.py for more details) to implement attention recurrent_group (see layers.py for more details) to implement attention
mechanism.</p> mechanism.</p>
<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> <p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong>
for more details about LSTM. The link goes as follows: for more details about LSTM. The link goes as follows:
.. _Link: <a class="reference external" href="https://arxiv.org/abs/1308.0850">https://arxiv.org/abs/1308.0850</a></p> .. _Link: <a class="reference external" href="https://arxiv.org/abs/1308.0850">https://arxiv.org/abs/1308.0850</a></p>
<div class="math"> <div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div> \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{x_i}x_{t} + W_{h_i}h_{t-1} + W_{c_i}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{x_f}x_{t} + W_{h_f}h_{t-1} + W_{c_f}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{x_c}x_t+W_{h_c}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{x_o}x_{t} + W_{h_o}h_{t-1} + W_{c_o}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
<p>The example usage is:</p> <p>The example usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">lstmemory_unit</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">lstm_step</span> <span class="o">=</span> <span class="n">lstmemory_unit</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
<span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
...@@ -481,6 +481,7 @@ for more details about LSTM. The link goes as follows: ...@@ -481,6 +481,7 @@ for more details about LSTM. The link goes as follows:
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the LSTM cell.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li> <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li> <li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
...@@ -513,7 +514,7 @@ False means no bias, None means default bias.</li> ...@@ -513,7 +514,7 @@ False means no bias, None means default bias.</li>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>lstm_group is a recurrent layer group version of Long Short Term Memory. It <dd><p>lstm_group is a recurrent_group version of Long Short Term Memory. It
does exactly the same calculation as the lstmemory layer (see lstmemory in does exactly the same calculation as the lstmemory layer (see lstmemory in
layers.py for the maths) does. A promising benefit is that LSTM memory layers.py for the maths) does. A promising benefit is that LSTM memory
cell states, or hidden states in every time step are accessible to the cell states, or hidden states in every time step are accessible to the
...@@ -523,8 +524,8 @@ it is recommended to use the lstmemory, which is relatively faster than ...@@ -523,8 +524,8 @@ it is recommended to use the lstmemory, which is relatively faster than
lstmemory_group.</p> lstmemory_group.</p>
<p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
multiplications: multiplications:
<span class="math">\(W_{xi}x_{t}\)</span> , <span class="math">\(W_{xf}x_{t}\)</span>, <span class="math">\(W_{x_i}x_{t}\)</span> , <span class="math">\(W_{x_f}x_{t}\)</span>,
<span class="math">\(W_{xc}x_t\)</span>, <span class="math">\(W_{xo}x_{t}\)</span> are not done in lstmemory_unit to <span class="math">\(W_{x_c}x_t\)</span>, <span class="math">\(W_{x_o}x_{t}\)</span> are not done in lstmemory_unit to
speed up the calculations. Consequently, an additional mixed_layer with speed up the calculations. Consequently, an additional mixed_layer with
full_matrix_projection must be included before lstmemory_unit is called.</p> full_matrix_projection must be included before lstmemory_unit is called.</p>
<p>The example usage is:</p> <p>The example usage is:</p>
...@@ -541,8 +542,9 @@ full_matrix_projection must be included before lstmemory_unit is called.</p> ...@@ -541,8 +542,9 @@ full_matrix_projection must be included before lstmemory_unit is called.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory group name.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li> <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the lstmemory group.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of LSTM cell.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li> <li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li> <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
...@@ -667,8 +669,8 @@ concatenated and returned.</li> ...@@ -667,8 +669,8 @@ concatenated and returned.</li>
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Define calculations that a gated recurrent unit performs in a single time <dd><p>Define calculations that a gated recurrent unit performs in a single time
step. This function itself is not a recurrent layer, so that it can not be step. This function itself is not a recurrent layer, so it can not be
directly applied to sequence input. This function is almost always used in directly used to process sequence inputs. This function is always used in
the recurrent_group (see layers.py for more details) to implement attention the recurrent_group (see layers.py for more details) to implement attention
mechanism.</p> mechanism.</p>
<p>Please see grumemory in layers.py for the details about the maths.</p> <p>Please see grumemory in layers.py for the details about the maths.</p>
...@@ -678,6 +680,7 @@ mechanism.</p> ...@@ -678,6 +680,7 @@ mechanism.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the LSTM cell.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li> <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li> <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
...@@ -702,7 +705,7 @@ mechanism.</p> ...@@ -702,7 +705,7 @@ mechanism.</p>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_group</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>gru_group is a recurrent layer group version of Gated Recurrent Unit. It <dd><p>gru_group is a recurrent_group version of Gated Recurrent Unit. It
does exactly the same calculation as the grumemory layer does. A promising does exactly the same calculation as the grumemory layer does. A promising
benefit is that gru hidden states are accessible to the user. This is benefit is that gru hidden states are accessible to the user. This is
especially useful in attention model. If you do not need to access especially useful in attention model. If you do not need to access
...@@ -722,6 +725,7 @@ to use the grumemory, which is relatively faster.</p> ...@@ -722,6 +725,7 @@ to use the grumemory, which is relatively faster.</p>
<tbody valign="top"> <tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple"> <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li> <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
<li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the LSTM cell.</li>
<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li> <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li> <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li> <li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
......
This diff is collapsed.