<p>The default implementation is the diagonal/peephole connection
(<a class="reference external" href="https://arxiv.org/pdf/1402.1128.pdf">https://arxiv.org/pdf/1402.1128.pdf</a>); the formula is as follows:</p>
...
...
@@ -368,7 +368,7 @@ with zeros whenever lookup encounters it in <code class="xref py py-attr docutil
the matrix of weights from the input gate to the input), <span class="math">\(W_{ic}, W_{fc}, W_{oc}\)</span> are diagonal weight matrices for peephole connections. In
our implementation, we use vectors to represent these diagonal weight
matrices. The <span class="math">\(b\)</span> terms denote bias vectors (<span class="math">\(b_i\)</span> is the input
gate bias vector), <span class="math">\(\sigma\)</span> is a non-linear activation function, such as the
logistic sigmoid function, and <span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span> are the input
gate, forget gate, output gate, and cell activation vectors, respectively,
all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</p>
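<p>A minimal usage sketch of the peephole formulation above (assuming the PaddlePaddle <cite>fluid</cite> API with <cite>fluid.layers.dynamic_lstm</cite>; the layer sizes and argument names other than <cite>use_peepholes</cite> are illustrative assumptions):</p>
<div class="highlight-python"><pre>
import paddle.fluid as fluid

emb_dim, hidden_dim = 64, 512

# Variable-length input sequence (LoD level 1).
seq = fluid.layers.data(name='seq', shape=[emb_dim], dtype='float32', lod_level=1)

# The input projections W*x_t are not part of the LSTM operator, so a
# fully-connected layer produces the 4 * hidden_dim gate inputs first.
fc_out = fluid.layers.fc(input=seq, size=hidden_dim * 4, bias_attr=False)

# Diagonal/peephole connections are the default (use_peepholes=True).
forward, cell = fluid.layers.dynamic_lstm(input=fc_out,
                                          size=hidden_dim * 4,
                                          use_peepholes=True)
</pre></div>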
...
...
@@ -394,15 +394,15 @@ tensor in this Variable is a matrix with shape
<p>The LSTMP (LSTM with recurrent projection) layer has a separate projection
layer after the LSTM layer, projecting the original hidden state to a
lower-dimensional one. It is proposed to reduce the total number of
parameters and, in turn, the computational complexity of the LSTM,
especially when the number of output units is relatively
large (<a class="reference external" href="https://research.google.com/pubs/archive/43905.pdf">https://research.google.com/pubs/archive/43905.pdf</a>).</p>
<p>The formula is as follows:</p>
<divclass="math">
\[ \begin{align}\begin{aligned}i_t & = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i)\\f_t & = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f)\\\tilde{c_t} & = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c)\\o_t & = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o)\\c_t & = f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t & = o_t \odot act_h(c_t)\\r_t & = \overline{act_h}(W_{rh}h_t)\end{aligned}\end{align} \]</div>
<p>In the above formula:</p>
<ulclass="simple">
<li><spanclass="math">\(W\)</span>: Denotes weight matrices (e.g. <spanclass="math">\(W_{xi}\)</span> is the matrix of weights from the input gate to the input).</li>
<li><spanclass="math">\(W_{ic}\)</span>, <spanclass="math">\(W_{fc}\)</span>, <spanclass="math">\(W_{oc}\)</span>: Diagonal weight matrices for peephole connections. In our implementation, we use vectors to reprenset these diagonal weight matrices.</li>
<li><spanclass="math">\(b\)</span>: Denotes bias vectors (e.g. <spanclass="math">\(b_i\)</span> is the input gate bias vector).</li>
<li><spanclass="math">\(\sigma\)</span>: The activation, such as logistic sigmoid function.</li>
<li><spanclass="math">\(i, f, o\)</span> and <spanclass="math">\(c\)</span>: The input gate, forget gate, output gate, and cell activation vectors, respectively, all of which have the same size as the cell output activation vector <spanclass="math">\(h\)</span>.</li>
<li><spanclass="math">\(h\)</span>: The hidden state.</li>
<li><spanclass="math">\(r\)</span>: The recurrent projection of the hidden state.</li>
<li><spanclass="math">\(\tilde{c_t}\)</span>: The candidate hidden state, whose computation is based on the current input and previous hidden state.</li>
<li><spanclass="math">\(\odot\)</span>: The element-wise product of the vectors.</li>
<li><spanclass="math">\(act_g\)</span> and <spanclass="math">\(act_h\)</span>: The cell input and cell output activation functions and <cite>tanh</cite> is usually used for them.</li>
<li><spanclass="math">\(\overline{act_h}\)</span>: The activation function for the projection output, usually using <cite>identity</cite> or same as <spanclass="math">\(act_h\)</span>.</li>
</ul>
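<p>As a dimensional sketch derived from the formula above (using the hidden size <span class="math">\(D\)</span> and the projection size <span class="math">\(P\)</span> that also appear in the return shapes below):</p>
<div class="math">
\[h_t, c_t \in \mathbb{R}^{D}, \quad W_{rh} \in \mathbb{R}^{P \times D}, \quad r_t = \overline{act_h}(W_{rh}h_t) \in \mathbb{R}^{P}\]</div>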
<p>Set <cite>use_peepholes</cite> to <cite>False</cite> to disable the peephole connections. The corresponding formula
is omitted here; please refer to the paper
<a class="reference external" href="http://www.bioinf.jku.at/publications/older/2604.pdf">http://www.bioinf.jku.at/publications/older/2604.pdf</a> for details.</p>
<p>Note that the <span class="math">\(W_{ix}x_{t}, W_{fx}x_{t}, W_{cx}x_{t}, W_{ox}x_{t}\)</span>
operations on the input <span class="math">\(x_{t}\)</span> are NOT included in this operator.
Users can choose to use a fully-connected layer before the LSTMP layer.</p>
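<p>Following this note, a minimal sketch of placing a fully-connected layer before the LSTMP layer (assuming the PaddlePaddle <cite>fluid</cite> API with <cite>fluid.layers.dynamic_lstmp</cite>; the layer sizes and argument names other than <cite>use_peepholes</cite> and <cite>name</cite> are illustrative assumptions):</p>
<div class="highlight-python"><pre>
import paddle.fluid as fluid

emb_dim, hidden_dim, proj_dim = 64, 512, 256

# Variable-length input sequence (LoD level 1).
seq = fluid.layers.data(name='seq', shape=[emb_dim], dtype='float32', lod_level=1)

# Compute the W*x_t projections for all four gates outside of the operator.
fc_out = fluid.layers.fc(input=seq, size=hidden_dim * 4, bias_attr=False)

# Returns the projected hidden state (T x P) and the cell state (T x D).
proj_out, cell = fluid.layers.dynamic_lstmp(input=fc_out,
                                            size=hidden_dim * 4,
                                            proj_size=proj_dim,
                                            use_peepholes=True)
</pre></div>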
<li><strong>name</strong> (<em>str|None</em>) – A name for this layer (optional). If set to None, the layer
will be named automatically.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">The projection of hidden state, and cell state of LSTMP. The shape of projection is (T x P), for the cell state which is (T x D), and both LoD is the same with the <cite>input</cite>.</p>
<p>The default implementation is the diagonal/peephole connection
(<a class="reference external" href="https://arxiv.org/pdf/1402.1128.pdf">https://arxiv.org/pdf/1402.1128.pdf</a>); the formula is as follows:</p>
...
...
@@ -387,7 +387,7 @@ with zeros whenever lookup encounters it in <code class="xref py py-attr docutil
the matrix of weights from the input gate to the input), <span class="math">\(W_{ic}, W_{fc}, W_{oc}\)</span> are diagonal weight matrices for peephole connections. In
our implementation, we use vectors to represent these diagonal weight
matrices. The <span class="math">\(b\)</span> terms denote bias vectors (<span class="math">\(b_i\)</span> is the input
gate bias vector), <span class="math">\(\sigma\)</span> is a non-linear activation function, such as the
logistic sigmoid function, and <span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span> are the input
gate, forget gate, output gate, and cell activation vectors, respectively,
all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</p>
...
...
@@ -413,15 +413,15 @@ tensor in this Variable is a matrix with shape
<p>The LSTMP (LSTM with recurrent projection) layer has a separate projection
layer after the LSTM layer, projecting the original hidden state to a
lower-dimensional one. It is proposed to reduce the total number of
parameters and, in turn, the computational complexity of the LSTM,
especially when the number of output units is relatively
large (<a class="reference external" href="https://research.google.com/pubs/archive/43905.pdf">https://research.google.com/pubs/archive/43905.pdf</a>).</p>
<p>The formula is as follows:</p>
<divclass="math">
\[ \begin{align}\begin{aligned}i_t & = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i)\\f_t & = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f)\\\tilde{c_t} & = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c)\\o_t & = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o)\\c_t & = f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t & = o_t \odot act_h(c_t)\\r_t & = \overline{act_h}(W_{rh}h_t)\end{aligned}\end{align} \]</div>
<p>In the above formula:</p>
<ulclass="simple">
<li><spanclass="math">\(W\)</span>: Denotes weight matrices (e.g. <spanclass="math">\(W_{xi}\)</span> is the matrix of weights from the input gate to the input).</li>
<li><spanclass="math">\(W_{ic}\)</span>, <spanclass="math">\(W_{fc}\)</span>, <spanclass="math">\(W_{oc}\)</span>: Diagonal weight matrices for peephole connections. In our implementation, we use vectors to reprenset these diagonal weight matrices.</li>
<li><spanclass="math">\(b\)</span>: Denotes bias vectors (e.g. <spanclass="math">\(b_i\)</span> is the input gate bias vector).</li>
<li><spanclass="math">\(\sigma\)</span>: The activation, such as logistic sigmoid function.</li>
<li><spanclass="math">\(i, f, o\)</span> and <spanclass="math">\(c\)</span>: The input gate, forget gate, output gate, and cell activation vectors, respectively, all of which have the same size as the cell output activation vector <spanclass="math">\(h\)</span>.</li>
<li><spanclass="math">\(h\)</span>: The hidden state.</li>
<li><spanclass="math">\(r\)</span>: The recurrent projection of the hidden state.</li>
<li><spanclass="math">\(\tilde{c_t}\)</span>: The candidate hidden state, whose computation is based on the current input and previous hidden state.</li>
<li><spanclass="math">\(\odot\)</span>: The element-wise product of the vectors.</li>
<li><spanclass="math">\(act_g\)</span> and <spanclass="math">\(act_h\)</span>: The cell input and cell output activation functions and <cite>tanh</cite> is usually used for them.</li>
<li><spanclass="math">\(\overline{act_h}\)</span>: The activation function for the projection output, usually using <cite>identity</cite> or same as <spanclass="math">\(act_h\)</span>.</li>
</ul>
<p>Set <cite>use_peepholes</cite> to <cite>False</cite> to disable the peephole connections. The corresponding formula
is omitted here; please refer to the paper
<a class="reference external" href="http://www.bioinf.jku.at/publications/older/2604.pdf">http://www.bioinf.jku.at/publications/older/2604.pdf</a> for details.</p>
<p>Note that the <span class="math">\(W_{ix}x_{t}, W_{fx}x_{t}, W_{cx}x_{t}, W_{ox}x_{t}\)</span>
operations on the input <span class="math">\(x_{t}\)</span> are NOT included in this operator.
Users can choose to use a fully-connected layer before the LSTMP layer.</p>
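<p>As noted above, the peephole terms can also be disabled; a minimal sketch (assuming the PaddlePaddle <cite>fluid</cite> API with <cite>fluid.layers.dynamic_lstmp</cite>; the layer sizes and argument names other than <cite>use_peepholes</cite> are illustrative assumptions):</p>
<div class="highlight-python"><pre>
import paddle.fluid as fluid

emb_dim, hidden_dim, proj_dim = 64, 512, 256

# Variable-length input sequence (LoD level 1), projected to the gate inputs.
seq = fluid.layers.data(name='seq', shape=[emb_dim], dtype='float32', lod_level=1)
fc_out = fluid.layers.fc(input=seq, size=hidden_dim * 4, bias_attr=False)

# use_peepholes=False drops the diagonal W_ic, W_fc and W_oc terms.
proj_out, cell = fluid.layers.dynamic_lstmp(input=fc_out,
                                            size=hidden_dim * 4,
                                            proj_size=proj_dim,
                                            use_peepholes=False)
</pre></div>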
<li><strong>name</strong> (<em>str|None</em>) – A name for this layer (optional). If set to None, the layer
will be named automatically.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">The projection of hidden state, and cell state of LSTMP. The shape of projection is (T x P), for the cell state which is (T x D), and both LoD is the same with the <cite>input</cite>.</p>