1. This OP only supports LoDTensor as inputs. If you need to deal with Tensor, please use :ref:`api_fluid_layers_lstm` .
2. In order to improve efficiency, users must first map the input of dimension [T, hidden_size] to input of [T, 4 * hidden_size], and then pass it to this OP.
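The mapping in note 2 is an ordinary linear projection applied before the LSTM OP. A minimal numpy sketch of the shape contract (all sizes and names below are hypothetical, not part of the API):

```python
import numpy as np

T, input_dim, hidden_size = 5, 16, 32  # hypothetical sizes

# The OP expects an input already mapped to 4 * hidden_size columns,
# one slice per gate: input, forget, candidate cell, and output.
x = np.random.rand(T, input_dim).astype("float32")
W_proj = np.random.rand(input_dim, 4 * hidden_size).astype("float32")

projected = x @ W_proj  # shape [T, 4 * hidden_size]
```

In a real program this mapping is typically done by a fully-connected layer placed in front of the LSTM OP.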
The implementation of this OP includes diagonal/peephole connections.
Please refer to `Gers, F. A., & Schmidhuber, J. (2000) <ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf>`_ .
If you do not need peephole connections, please set use_peepholes to False .
This OP computes each timestep as follows:

.. math::

    i_t & = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i)

    f_t & = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f)

    \widetilde{c_t} & = act_g(W_{cx}x_{t} + W_{ch}h_{t-1} + b_c)

    c_t & = f_t \odot c_{t-1} + i_t \odot \widetilde{c_t}

    o_t & = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_{t} + b_o)

    h_t & = o_t \odot act_h(c_t)

The symbolic meanings in the formula are as follows:
- :math:`W` represents weight (e.g., :math:`W_{ix}` is the weight of a linear transformation of input :math:`x_{t}` when calculating input gate :math:`i_t` )
- :math:`b` represents bias (e.g., :math:`b_{i}` is the bias of input gate)
- :math:`\sigma` represents nonlinear activation function for gate, default sigmoid
- :math:`\odot` represents the Hadamard product of a matrix, i.e. multiplying the elements of the same position for two matrices with the same dimension to get another matrix with the same dimension
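A single peephole-LSTM timestep using the gates and symbols listed above can be sketched in plain numpy. This is a minimal illustration under stated assumptions (gate order [i, f, c, o] in the fused weight matrix, sigmoid gate activation, tanh elsewhere), not the OP's actual kernel:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b, W_ic, W_fc, W_oc):
    # Fused projection for all four gates; gate order [i, f, c, o] is assumed.
    z = x_t @ W_x + h_prev @ W_h + b
    zi, zf, zc, zo = np.split(z, 4, axis=-1)
    i_t = sigmoid(zi + c_prev * W_ic)      # input gate (peephole on c_{t-1})
    f_t = sigmoid(zf + c_prev * W_fc)      # forget gate (peephole on c_{t-1})
    c_tilde = np.tanh(zc)                  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde     # new cell state
    o_t = sigmoid(zo + c_t * W_oc)         # output gate (peephole on c_t)
    h_t = o_t * np.tanh(c_t)               # new hidden state
    return h_t, c_t

D = 4  # hypothetical hidden size
rng = np.random.default_rng(0)
h, c = np.zeros((1, D)), np.zeros((1, D))
W_x, W_h = rng.standard_normal((D, 4 * D)), rng.standard_normal((D, 4 * D))
b = np.zeros((1, 4 * D))
W_ic, W_fc, W_oc = (rng.standard_normal((1, D)) for _ in range(3))
h, c = lstm_step(rng.standard_normal((1, D)), h, c, W_x, W_h, b, W_ic, W_fc, W_oc)
```

Note that the peephole terms use elementwise (diagonal) weights, which is why they are stored alongside the biases rather than as full matrices.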
Parameters:
input ( :ref:`api_guide_Variable_en` ): LSTM input tensor, multi-dimensional LODTensor of shape :math:`[T, 4*hidden\_size]` . Data type is float32 or float64.
size (int): must be 4 * hidden_size.
h_0( :ref:`api_guide_Variable_en` , optional): The initial hidden state of the LSTM, multi-dimensional Tensor of shape :math:`[batch\_size, hidden\_size]` .
Data type is float32 or float64. If set to None, it will be a vector of all 0. Default: None.
c_0( :ref:`api_guide_Variable_en` , optional): The initial cell state of the LSTM, multi-dimensional Tensor of shape :math:`[batch\_size, hidden\_size]` .
Data type is float32 or float64. If set to None, it will be a vector of all 0. `h_0` and `c_0` can be None but only at the same time. Default: None.
param_attr(ParamAttr, optional): Parameter attribute of weight. If it is None, the default weight parameter attribute is used. Please refer to :ref:`api_fluid_ParamAttr` .
If the user needs to set this parameter, the dimension must be :math:`[hidden\_size, 4*hidden\_size]` . Default: None.
- Weights = :math:`\{ W_{ch},W_{ih},W_{fh},W_{oh} \}` , the shape is [hidden_size, 4*hidden_size].
bias_attr (ParamAttr, optional): The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
setting `use_peepholes` to `True`.
Please refer to :ref:`api_fluid_ParamAttr` . Default: None.
1. `use_peepholes = False`
   - Biases = {:math:`b_c, b_i, b_f, b_o`}.
   - The shape is [1, 4*hidden_size].
2. `use_peepholes = True`
   - Biases = {:math:`b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc}`}.
   - The shape is [1, 7*hidden_size].
use_peepholes (bool, optional): Whether to use peephole connection or not. Default: True.
is_reverse (bool, optional): Whether to calculate reverse LSTM. Default: False.
gate_activation (str, optional): The activation for input gate, forget gate and output gate. Default: "sigmoid".
cell_activation (str, optional): The activation for cell output. Default: "tanh".
candidate_activation (str, optional): The activation for candidate hidden state. Default: "tanh".
dtype (str, optional): Data type, can be "float32" or "float64". Default: "float32".
name (str, optional): A name for this layer. Please refer to :ref:`api_guide_Name` . Default: None.
Parameters:
input ( :ref:`api_guide_Variable_en` ): LSTM input tensor, 3-D Tensor of shape :math:`[batch\_size, seq\_len, input\_dim]` . Data type is float32 or float64.
init_h ( :ref:`api_guide_Variable_en` ): The initial hidden state of the LSTM, 3-D Tensor of shape :math:`[num\_layers, batch\_size, hidden\_size]` .
If is_bidirec = True, shape should be :math:`[num\_layers*2, batch\_size, hidden\_size]` . Data type is float32 or float64.
init_c ( :ref:`api_guide_Variable_en` ): The initial cell state of the LSTM, 3-D Tensor of shape :math:`[num\_layers, batch\_size, hidden\_size]` .
If is_bidirec = True, shape should be :math:`[num\_layers*2, batch\_size, hidden\_size]` . Data type is float32 or float64.
max_len (int): The maximum length of the LSTM. The first dimension of the input tensor cannot be greater than max_len.
hidden_size (int): Hidden size of the LSTM.
num_layers (int): Total number of layers of the LSTM.
dropout_prob (float, optional): Dropout probability. Dropout ONLY works between rnn layers, NOT between time steps;
there is NO dropout on the output of the last RNN layer.
Default: 0.0.
is_bidirec (bool, optional): Whether it is bidirectional. Default: False.
is_test (bool, optional): Whether it is in the test phase. Default: False.
name (str, optional): A name for this layer. If set None, the layer
will be named automatically. Default: None.
default_initializer (Initializer, optional): The initializer used to initialize the weight.
If set None, the default initializer will be used. Default: None.
seed (int, optional): Seed for dropout in LSTM. If it's -1, dropout will use a random seed. Default: 1.
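The shape contracts between input, init_h and init_c can be sanity-checked with a short numpy sketch (all sizes here are hypothetical):

```python
import numpy as np

batch_size, seq_len, input_dim = 4, 10, 8
hidden_size, num_layers = 16, 2
is_bidirec = True

num_directions = 2 if is_bidirec else 1
x = np.zeros((batch_size, seq_len, input_dim), dtype="float32")
# init_h / init_c carry one state per layer (and per direction when bidirectional).
init_h = np.zeros((num_layers * num_directions, batch_size, hidden_size), dtype="float32")
init_c = np.zeros_like(init_h)
```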
1. In order to improve efficiency, users must first map the input of dimension [T, hidden_size] to input of [T, 4 * hidden_size], and then pass it to this OP.
This OP implements the LSTMP (LSTM Projected) layer.
The LSTMP layer has a separate linear mapping layer behind the LSTM layer. -- `Sak, H., Senior, A., & Beaufays, F. (2014) <https://ai.google/research/pubs/pub43905.pdf>`_ .
Compared with the standard LSTM layer, LSTMP has an additional linear mapping layer,
which is used to map from the original hidden state :math:`h_t` to the lower dimensional state :math:`r_t` .
This reduces the total number of parameters and computational complexity, especially when the output unit is relatively large.
The default implementation of the OP contains diagonal/peephole connections,
please refer to `Gers, F. A., & Schmidhuber, J. (2000) <ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf>`_ .
If you need to disable the peephole connections, set use_peepholes to False.
This OP computes each timestep as follows:

.. math::

    i_t & = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i)

    f_t & = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f)

    \widetilde{c_t} & = act_g(W_{cx}x_{t} + W_{cr}r_{t-1} + b_c)

    c_t & = f_t \odot c_{t-1} + i_t \odot \widetilde{c_t}

    o_t & = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_{t} + b_o)

    h_t & = o_t \odot act_h(c_t)

    r_t & = \overline{act_h}(W_{rh}h_t)

- :math:`W` represents weight (e.g., :math:`W_{ix}` is the weight of a linear transformation of input :math:`x_{t}` when calculating input gate :math:`i_t` )
- :math:`b` represents bias (e.g., :math:`b_{i}` is the bias of input gate)
- :math:`\sigma` represents nonlinear activation function for gate, default sigmoid
- :math:`\odot` represents the Hadamard product of a matrix, i.e. multiplying the elements of the same position for two matrices with the same dimension to get another matrix with the same dimension
Parameters:
input ( :ref:`api_guide_Variable_en` ): The input of dynamic_lstmp layer, which supports
variable-time length input sequence.
It is a multi-dimensional LODTensor of shape :math:`[T, 4*hidden\_size]` . Data type is float32 or float64.
size (int): must be 4 * hidden_size.
proj_size (int): The size of projection output.
param_attr (ParamAttr, optional): Parameter attribute of weight. If it is None, the default weight parameter attribute is used. Please refer to :ref:`api_fluid_ParamAttr` .
If the user needs to set this parameter, the dimension must be :math:`[hidden\_size, 4*hidden\_size]` . Default: None.