<p>The default implementation uses diagonal/peephole connections
(<a class="reference external" href="https://arxiv.org/pdf/1402.1128.pdf">https://arxiv.org/pdf/1402.1128.pdf</a>). The formulas are as follows:</p>
<div class="math">
\[ \begin{align}\begin{aligned}i_t & = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i)\\f_t & = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f)\\\tilde{c_t} & = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c)\\o_t & = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o)\\c_t & = f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t & = o_t \odot act_h(c_t)\end{aligned}\end{align} \]</div>
<p>where the <span class="math">\(W\)</span> terms denote weight matrices (e.g. <span class="math">\(W_{ix}\)</span> is
the matrix of weights from the input to the input gate), and <span class="math">\(W_{ic}, W_{fc}, W_{oc}\)</span> are diagonal weight matrices for the peephole connections. In
our implementation, we use vectors to represent these diagonal weight
matrices. The <span class="math">\(b\)</span> terms denote bias vectors (<span class="math">\(b_i\)</span> is the input
gate bias vector), <span class="math">\(\sigma\)</span> is a nonlinear gate activation, such as the
logistic sigmoid function, and <span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span> are the input
gate, forget gate, output gate, and cell activation vectors, respectively,
all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</p>
<p><span class="math">\(\odot\)</span> denotes the element-wise product of vectors. <span class="math">\(act_g\)</span>
and <span class="math">\(act_h\)</span> are the cell input and cell output activation functions;
<cite>tanh</cite> is usually used for both. <span class="math">\(\tilde{c_t}\)</span> is also called the
candidate hidden state, which is computed from the current input and
the previous hidden state.</p>
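<p>The formulas above can be sketched as a single time step in NumPy. This is a minimal illustration, not the operator's actual kernel: the function name <cite>lstm_step</cite> and the parameter dictionary <cite>p</cite> with its keys are hypothetical. The peephole weights are stored as vectors, so the diagonal matrix product becomes an element-wise product.</p>

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, used here as the gate activation sigma."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p, act_g=np.tanh, act_h=np.tanh):
    """One peephole LSTM step following the formulas above.

    p is a hypothetical dict of parameters. W_*x has shape (D, M),
    W_*h has shape (D, D), and the peephole weights W_ic, W_fc, W_oc
    are vectors of length D standing in for diagonal matrices.
    """
    # Input and forget gates peek at the previous cell state c_{t-1}.
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["W_ic"] * c_prev + p["b_i"])
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["W_fc"] * c_prev + p["b_f"])
    # Candidate state has no peephole term.
    c_tilde = act_g(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])
    # The new cell state must be computed before the output gate,
    # because the output gate peeks at c_t rather than c_{t-1}.
    c_t = f_t * c_prev + i_t * c_tilde
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["W_oc"] * c_t + p["b_o"])
    h_t = o_t * act_h(c_t)
    return h_t, c_t
```

<p>Note the ordering: <span class="math">\(o_t\)</span> depends on <span class="math">\(c_t\)</span>, so the cell update is evaluated before the output gate even though the formulas list it afterwards.</p>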
<p>Set <cite>use_peepholes</cite> to <cite>False</cite> to disable the peephole connections. The formula
for that case is omitted here; please refer to the paper
<a class="reference external" href="http://www.bioinf.jku.at/publications/older/2604.pdf">http://www.bioinf.jku.at/publications/older/2604.pdf</a> for details.</p>
<p>Note that the <span class="math">\(W_{ix}x_{t}, W_{fx}x_{t}, W_{cx}x_{t}, W_{ox}x_{t}\)</span>
operations on the input <span class="math">\(x_{t}\)</span> are NOT included in this operator.
Users can apply a fully-connected layer before the LSTM layer to compute them.</p>
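<p>To illustrate this split, the sketch below applies the input projection as a separate matrix product and then runs only the recurrent part over one sequence. It is a NumPy illustration under stated assumptions, not the operator's code: the name <cite>lstm_recurrent</cite>, the gate order [i, f, c, o] along the 4D axis, the <cite>peep</cite> argument, and the zero initial states are all choices made for this example.</p>

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_recurrent(xw, W_h, b, peep=None):
    """Run the recurrent part of an LSTM over one sequence.

    xw   : (T, 4D) input already projected by a fully connected layer.
    W_h  : (D, 4D) hidden-to-hidden weights.
    b    : (4D,) gate biases.
    peep : optional (3, D) peephole vectors for the i, f, o gates;
           None disables the peephole connections.
    The gate order [i, f, c, o] is an assumption of this sketch.
    """
    T, four_d = xw.shape
    D = four_d // 4
    h = np.zeros(D)  # assumed zero initial hidden state
    c = np.zeros(D)  # assumed zero initial cell state
    hs, cs = [], []
    for t in range(T):
        # Only the recurrent term and the bias are added here;
        # the W_x x_t terms are already contained in xw[t].
        z = xw[t] + h @ W_h + b
        zi, zf, zc, zo = np.split(z, 4)
        if peep is not None:
            zi = zi + peep[0] * c
            zf = zf + peep[1] * c
        i, f = sigmoid(zi), sigmoid(zf)
        c = f * c + i * np.tanh(zc)
        if peep is not None:
            zo = zo + peep[2] * c  # output gate peeks at the new cell state
        h = sigmoid(zo) * np.tanh(c)
        hs.append(h)
        cs.append(c)
    return np.stack(hs), np.stack(cs)


rng = np.random.default_rng(0)
T, M, D = 5, 3, 4
x = rng.standard_normal((T, M))
W_x = rng.standard_normal((M, 4 * D))  # fully connected layer applied before the LSTM
xw = x @ W_x                           # shape (T, 4D)
hidden, cell = lstm_recurrent(xw, rng.standard_normal((D, 4 * D)), np.zeros(4 * D))
```

<p>Both returned arrays have shape (T x D), one row per time step. Computing the projection for all time steps in one matrix product is possible because it does not depend on the recurrent state.</p>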
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The hidden state and the cell state of the LSTM. Both have shape (T x D), and their LoD is the same as that of the <cite>input</cite>.</p>
"comment":"(Tensor, optional) the initial cell state is an optional input. This is a tensor with shape (N x D), where N is the batch size. `H0` and `C0` can be NULL but only at the same time",
"comment":"(Tensor, optional) the initial cell state is an optional input. This is a tensor with shape (N x D), where N is the batch size. `H0` and `C0` can be NULL but only at the same time.",
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The hidden state and the cell state of the LSTM. Both have shape (T x D), and their LoD is the same as that of the <cite>input</cite>.</p>