Networks

The v2.networks module contains reusable network building blocks, each of which combines multiple layers.

NLP

sequence_conv_pool

class paddle.v2.networks.sequence_conv_pool(*args, **kwargs)

Text convolution pooling layers helper.

Text input => Context Projection => FC Layer => Pooling => Output.

Parameters:
  • name (basestring) – name of the output layer (the pooling layer name).
  • input (paddle.v2.config_base.Layer) – name of the input layer.
  • context_len (int) – context projection length. See the context_projection documentation.
  • hidden_size (int) – FC layer size.
  • context_start (int or None) – context projection start position. See context_start in context_projection.
  • pool_type (BasePoolingType) – pooling layer type. See the pooling documentation.
  • context_proj_name (basestring) – context projection layer name. None if the user does not care.
  • context_proj_param_attr (paddle.v2.attr.ParameterAttribute or None) – context projection parameter attribute. None if the user does not care.
  • fc_name (basestring) – FC layer name. None if the user does not care.
  • fc_param_attr (paddle.v2.attr.ParameterAttribute or None) – FC layer parameter attribute. None if the user does not care.
  • fc_bias_attr (paddle.v2.attr.ParameterAttribute or None) – FC bias parameter attribute. False for no bias, None if the user does not care.
  • fc_act (paddle.v2.Activation.Base) – FC layer activation type. None means tanh.
  • pool_bias_attr (paddle.v2.attr.ParameterAttribute or None) – pooling layer bias attribute. False for no bias, None if the user does not care.
  • fc_attr (paddle.v2.attr.ExtraAttribute) – FC layer extra attribute.
  • context_attr (paddle.v2.attr.ExtraAttribute) – context projection layer extra attribute.
  • pool_attr (paddle.v2.attr.ExtraAttribute) – pooling layer extra attribute.
Returns:

output layer name.

Return type:

paddle.v2.config_base.Layer
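
The pipeline "Text input => Context Projection => FC Layer => Pooling => Output" can be illustrated without PaddlePaddle. The following is a minimal pure-Python sketch, not the library's implementation. It assumes that positions outside the sequence are zero-padded, that the FC activation is tanh, and that pooling is a maximum over time; the helper names (context_projection, fc_tanh, max_pool_over_time, conv_pool_sketch) and all weights are hypothetical.

```python
import math


def context_projection(seq, context_len, context_start):
    """Concatenate context_len neighbouring vectors around each position.

    Positions outside the sequence are filled with zeros (an assumption;
    trainable padding is not modelled in this sketch).
    """
    dim = len(seq[0])
    projected = []
    for t in range(len(seq)):
        window = []
        for offset in range(context_start, context_start + context_len):
            idx = t + offset
            window.extend(seq[idx] if 0 <= idx < len(seq) else [0.0] * dim)
        projected.append(window)
    return projected


def fc_tanh(vec, weights, bias):
    """Fully connected layer with tanh activation; weights is [out][in]."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def max_pool_over_time(rows):
    """Element-wise maximum over all time steps."""
    return [max(col) for col in zip(*rows)]


def conv_pool_sketch(seq, context_len, context_start, weights, bias):
    """Context projection, then FC at every step, then pooling over time."""
    projected = context_projection(seq, context_len, context_start)
    hidden = [fc_tanh(vec, weights, bias) for vec in projected]
    return max_pool_over_time(hidden)


# Three time steps, each a 2-dimensional vector.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
projected = context_projection(sequence, context_len=3, context_start=-1)

# hidden_size = 2; FC input size = context_len * 2 = 6.
weights = [[0.1] * 6, [-0.1] * 6]
bias = [0.0, 0.0]
output = conv_pool_sketch(sequence, 3, -1, weights, bias)
```

With context_len=3 and context_start=-1, each position sees its left neighbour, itself, and its right neighbour, which is why this combination behaves like a width-3 convolution over the text.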

text_conv_pool

class paddle.v2.networks.text_conv_pool(*args, **kwargs)

Text convolution pooling layers helper.

Text input => Context Projection => FC Layer => Pooling => Output.

Parameters:
  • name (basestring) – name of the output layer (the pooling layer name).
  • input (paddle.v2.config_base.Layer) – name of the input layer.
  • context_len (int) – context projection length. See the context_projection documentation.
  • hidden_size (int) – FC layer size.
  • context_start (int or None) – context projection start position. See context_start in context_projection.
  • pool_type (BasePoolingType) – pooling layer type. See the pooling documentation.
  • context_proj_name (basestring) – context projection layer name. None if the user does not care.
  • context_proj_param_attr (paddle.v2.attr.ParameterAttribute or None) – context projection parameter attribute. None if the user does not care.
  • fc_name (basestring) – FC layer name. None if the user does not care.
  • fc_param_attr (paddle.v2.attr.ParameterAttribute or None) – FC layer parameter attribute. None if the user does not care.
  • fc_bias_attr (paddle.v2.attr.ParameterAttribute or None) – FC bias parameter attribute. False for no bias, None if the user does not care.
  • fc_act (paddle.v2.Activation.Base) – FC layer activation type. None means tanh.
  • pool_bias_attr (paddle.v2.attr.ParameterAttribute or None) – pooling layer bias attribute. False for no bias, None if the user does not care.
  • fc_attr (paddle.v2.attr.ExtraAttribute) – FC layer extra attribute.
  • context_attr (paddle.v2.attr.ExtraAttribute) – context projection layer extra attribute.
  • pool_attr (paddle.v2.attr.ExtraAttribute) – pooling layer extra attribute.
Returns:

output layer name.

Return type:

paddle.v2.config_base.Layer

Images

img_conv_bn_pool

class paddle.v2.networks.img_conv_bn_pool(*args, **kwargs)

Convolution, batch normalization, pooling group.

Parameters:
  • name (basestring) – group name.
  • input (paddle.v2.config_base.Layer) – the layer’s input.
  • filter_size (int) – see the img_conv documentation.
  • num_filters (int) – see the img_conv documentation.
  • pool_size (int) – see the img_pool documentation.
  • pool_type (BasePoolingType) – see the img_pool documentation.
  • act (paddle.v2.Activation.Base) – see the batch_norm documentation.
  • groups (int) – see the img_conv documentation.
  • conv_stride (int) – see the img_conv documentation.
  • conv_padding (int) – see the img_conv documentation.
  • conv_bias_attr (paddle.v2.attr.ParameterAttribute) – see the img_conv documentation.
  • num_channel (int) – see the img_conv documentation.
  • conv_param_attr (paddle.v2.attr.ParameterAttribute) – see the img_conv documentation.
  • shared_bias (bool) – see the img_conv documentation.
  • conv_attr (paddle.v2.attr.ExtraAttribute) – see the img_conv documentation.
  • bn_param_attr (paddle.v2.attr.ParameterAttribute) – see the batch_norm documentation.
  • bn_bias_attr – see the batch_norm documentation.
  • bn_attr – paddle.v2.attr.ParameterAttribute.
  • pool_stride (int) – see the img_pool documentation.
  • pool_padding (int) – see the img_pool documentation.
  • pool_attr (paddle.v2.attr.ExtraAttribute) – see the img_pool documentation.
Returns:

the output of the layer group.

Return type:

paddle.v2.config_base.Layer

img_conv_group

class paddle.v2.networks.img_conv_group(**kwargs)

Image convolution group, used for VGG networks.

TODO(yuyang18): Complete docs

Parameters:
  • conv_batchnorm_drop_rate
  • input
  • conv_num_filter
  • pool_size
  • num_channels
  • conv_padding
  • conv_filter_size
  • conv_act
  • conv_with_batchnorm
  • pool_stride
  • pool_type
Returns:

simple_img_conv_pool

class paddle.v2.networks.simple_img_conv_pool(*args, **kwargs)

Simple image convolution and pooling group.

Input => conv => pooling

Parameters:
  • name (basestring) – group name
  • input (paddle.v2.config_base.Layer) – input layer name.
  • filter_size (int) – see img_conv for details
  • num_filters (int) – see img_conv for details
  • pool_size (int) – see img_pool for details
  • pool_type (BasePoolingType) – see img_pool for details
  • act (paddle.v2.Activation.Base) – see img_conv for details
  • groups (int) – see img_conv for details
  • conv_stride (int) – see img_conv for details
  • conv_padding (int) – see img_conv for details
  • bias_attr (paddle.v2.attr.ParameterAttribute) – see img_conv for details
  • num_channel (int) – see img_conv for details
  • param_attr (paddle.v2.attr.ParameterAttribute) – see img_conv for details
  • shared_bias (bool) – see img_conv for details
  • conv_attr (paddle.v2.attr.ExtraAttribute) – see img_conv for details
  • pool_stride (int) – see img_pool for details
  • pool_padding (int) – see img_pool for details
  • pool_attr (paddle.v2.attr.ExtraAttribute) – see img_pool for details
Returns:

Layer’s output

Return type:

paddle.v2.config_base.Layer
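
The "Input => conv => pooling" structure can be sketched in pure Python for a single-channel image. This is only an illustration under simplifying assumptions: one filter, no bias, no activation, stride 1 and no padding for the convolution, and max pooling. The function names conv2d and max_pool2d are hypothetical and are not part of paddle.v2.

```python
def conv2d(image, kernel):
    """Stride-1 convolution without padding (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(feature, size, stride):
    """Max pooling over size x size windows moved by stride."""
    out_h = (len(feature) - size) // stride + 1
    out_w = (len(feature[0]) - size) // stride + 1
    return [[max(feature[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


# A 5 x 5 image whose pixel at (r, c) has the value 5 * r + c.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]               # filter_size = 2
feature = conv2d(image, kernel)                 # 4 x 4 feature map
pooled = max_pool2d(feature, size=2, stride=2)  # 2 x 2 output
```

The convolution shrinks the 5 x 5 input to 4 x 4, and the pooling step halves each dimension again, which is the usual reason the two are grouped into one helper.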

small_vgg

vgg_16_network

class paddle.v2.networks.vgg_16_network(**kwargs)

The same model as https://gist.github.com/ksimonyan/211839e770f7b538e2d8

Parameters:
  • num_classes
  • input_image (paddle.v2.config_base.Layer) –
  • num_channels (int) –
Returns:

Recurrent

LSTM

lstmemory_unit

class paddle.v2.networks.lstmemory_unit(*args, **kwargs)

Defines the calculations that an LSTM unit performs in a single time step. This function is not itself a recurrent layer, so it cannot be applied directly to sequence input. It is typically used inside recurrent_group (see layers.py for more details), for example to implement an attention mechanism.

Please refer to Generating Sequences With Recurrent Neural Networks (https://arxiv.org/abs/1308.0850) for more details about LSTM.

\[
\begin{aligned}
i_t & = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\
f_t & = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\
c_t & = f_t c_{t-1} + i_t \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)\\
o_t & = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\
h_t & = o_t \tanh(c_t)
\end{aligned}
\]

The example usage is:

lstm_step = lstmemory_unit(input=[layer1],
                           size=256,
                           act=paddle.v2.Activation.Tanh(),
                           gate_act=paddle.v2.Activation.Sigmoid(),
                           state_act=paddle.v2.Activation.Tanh())
Parameters:
  • input (paddle.v2.config_base.Layer) – input layer name.
  • name (basestring) – lstmemory unit name.
  • size (int) – lstmemory unit size.
  • param_attr (paddle.v2.attr.ParameterAttribute) – parameter config. None to use the default.
  • act (paddle.v2.Activation.Base) – LSTM final activation type.
  • gate_act (paddle.v2.Activation.Base) – LSTM gate activation type.
  • state_act (paddle.v2.Activation.Base) – LSTM state activation type.
  • mixed_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias parameter attribute of the mixed layer. False means no bias, None means default bias.
  • lstm_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias parameter attribute of the lstm layer. False means no bias, None means default bias.
  • mixed_attr (paddle.v2.attr.ExtraAttribute) – mixed layer’s extra attribute.
  • lstm_attr (paddle.v2.attr.ExtraAttribute) – lstm layer’s extra attribute.
  • get_output_attr (paddle.v2.attr.ExtraAttribute) – get output layer’s extra attribute.
Returns:

lstmemory unit name.

Return type:

paddle.v2.config_base.Layer
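
To make the single-step LSTM equations given for lstmemory_unit concrete, here is a minimal pure-Python sketch of one time step. It is an illustration, not PaddlePaddle's implementation. It assumes that the peephole terms (the W_c* weights) act element-wise on the cell state, and the parameter dictionary layout, lstm_step, and zero_params are hypothetical names introduced only for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; p holds W_x*, W_h*, peephole vectors w_c*, biases b_*."""
    def gate(name, c_vec):
        wx = matvec(p["W_x" + name], x)
        wh = matvec(p["W_h" + name], h_prev)
        return [sigmoid(a + b + pc * c + bias)
                for a, b, pc, c, bias
                in zip(wx, wh, p["w_c" + name], c_vec, p["b_" + name])]

    i = gate("i", c_prev)            # input gate uses c_{t-1}
    f = gate("f", c_prev)            # forget gate uses c_{t-1}
    cand_x = matvec(p["W_xc"], x)
    cand_h = matvec(p["W_hc"], h_prev)
    c = [ft * cp + it * math.tanh(a + b + bias)
         for ft, cp, it, a, b, bias
         in zip(f, c_prev, i, cand_x, cand_h, p["b_c"])]
    o = gate("o", c)                 # output gate uses the new cell state c_t
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def zero_params(size, in_dim):
    """All-zero parameters, useful for checking the data flow by hand."""
    p = {}
    for g in "ifo":
        p["W_x" + g] = [[0.0] * in_dim for _ in range(size)]
        p["W_h" + g] = [[0.0] * size for _ in range(size)]
        p["w_c" + g] = [0.0] * size
        p["b_" + g] = [0.0] * size
    p["W_xc"] = [[0.0] * in_dim for _ in range(size)]
    p["W_hc"] = [[0.0] * size for _ in range(size)]
    p["b_c"] = [0.0] * size
    return p


params = zero_params(size=2, in_dim=3)
# With all-zero parameters every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```

Because the function takes h_prev and c_prev explicitly, it only describes one step; iterating over a sequence is the job of the surrounding recurrent group.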

lstmemory_group

class paddle.v2.networks.lstmemory_group(*args, **kwargs)

lstmemory_group is a recurrent layer group version of Long Short Term Memory. It performs exactly the same calculation as the lstmemory layer (see lstmemory in layers.py for the maths). Its benefit is that the LSTM memory cell states and hidden states at every time step are accessible to the user, which is especially useful in attention models. If you do not need to access the internal states of the LSTM and only use its outputs, lstmemory is recommended because it is faster than lstmemory_group.

NOTE: In PaddlePaddle’s implementation, the following input-to-hidden multiplications: \(W_{xi}x_{t}\), \(W_{xf}x_{t}\), \(W_{xc}x_t\), \(W_{xo}x_{t}\) are not done in lstmemory_unit, in order to speed up the calculations. Consequently, an additional mixed layer with full_matrix_projection must be included before lstmemory_unit is called.

The example usage is:

lstm_step = lstmemory_group(input=[layer1],
                            size=256,
                            act=paddle.v2.Activation.Tanh(),
                            gate_act=paddle.v2.Activation.Sigmoid(),
                            state_act=paddle.v2.Activation.Tanh())
Parameters:
  • input (paddle.v2.config_base.Layer) – input layer name.
  • name (basestring) – lstmemory group name.
  • size (int) – lstmemory group size.
  • reverse (bool) – whether the LSTM processes the input in reverse order.
  • param_attr (paddle.v2.attr.ParameterAttribute) – parameter config. None to use the default.
  • act (paddle.v2.Activation.Base) – LSTM final activation type.
  • gate_act (paddle.v2.Activation.Base) – LSTM gate activation type.
  • state_act (paddle.v2.Activation.Base) – LSTM state activation type.
  • mixed_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias parameter attribute of the mixed layer. False means no bias, None means default bias.
  • lstm_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias parameter attribute of the lstm layer. False means no bias, None means default bias.
  • mixed_attr (paddle.v2.attr.ExtraAttribute) – mixed layer’s extra attribute.
  • lstm_attr (paddle.v2.attr.ExtraAttribute) – lstm layer’s extra attribute.
  • get_output_attr (paddle.v2.attr.ExtraAttribute) – get output layer’s extra attribute.
Returns:

the lstmemory group.

Return type:

paddle.v2.config_base.Layer
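
The note above about moving the input-to-hidden multiplications out of the recurrent step can be illustrated with a deliberately simplified recurrence. The sketch below uses a plain tanh RNN, h_t = tanh(W_x x_t + W_h h_{t-1}), as a stand-in for the LSTM; the function names and weights are hypothetical. It shows that projecting every input up front gives the same result as projecting inside each step, because the projection does not depend on the hidden state.

```python
import math


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def run_fused(xs, w_x, w_h, size):
    """Recurrence that multiplies by w_x inside every step."""
    h = [0.0] * size
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
        h = [math.tanh(v) for v in pre]
        outputs.append(h)
    return outputs


def run_preprojected(projected, w_h, size):
    """Recurrence that receives W_x x_t already computed."""
    h = [0.0] * size
    outputs = []
    for px in projected:
        pre = [a + b for a, b in zip(px, matvec(w_h, h))]
        h = [math.tanh(v) for v in pre]
        outputs.append(h)
    return outputs


xs = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
w_x = [[0.2, -0.1], [0.4, 0.3]]
w_h = [[0.1, 0.0], [-0.2, 0.5]]

# The projection is independent of h, so it can be done for all steps at once.
projected = [matvec(w_x, x) for x in xs]

fused = run_fused(xs, w_x, w_h, size=2)
split = run_preprojected(projected, w_h, size=2)
```

In a real framework the up-front projection can be computed as one large matrix multiplication over the whole sequence, which is where the speed-up comes from.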

simple_lstm

class paddle.v2.networks.simple_lstm(*args, **kwargs)

Simple LSTM Cell.

It simply combines a mixed layer with full_matrix_projection and an lstmemory layer. The simple LSTM cell is implemented according to the following equations.

\[
\begin{aligned}
i_t & = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\
f_t & = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\
c_t & = f_t c_{t-1} + i_t \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)\\
o_t & = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\
h_t & = o_t \tanh(c_t)
\end{aligned}
\]

Please refer to Generating Sequences With Recurrent Neural Networks (https://arxiv.org/abs/1308.0850) if you want to know what an LSTM is.

Parameters:
  • name (basestring) – lstm layer name.
  • input (paddle.v2.config_base.Layer) – input layer name.
  • size (int) – lstm layer size.
  • reverse (bool) – whether to process the input data in reverse order.
  • mat_param_attr (paddle.v2.attr.ParameterAttribute) – mixed layer’s matrix projection parameter attribute.
  • bias_param_attr (paddle.v2.attr.ParameterAttribute|False) – bias parameter attribute. False means no bias, None means default bias.
  • inner_param_attr (paddle.v2.attr.ParameterAttribute) – lstm cell parameter attribute.
  • act (paddle.v2.Activation.Base) – LSTM final activation type.
  • gate_act (paddle.v2.Activation.Base) – LSTM gate activation type.
  • state_act (paddle.v2.Activation.Base) – LSTM state activation type.
  • mixed_attr (paddle.v2.attr.ExtraAttribute) – mixed layer’s extra attribute.
  • lstm_cell_attr (paddle.v2.attr.ExtraAttribute) – lstm layer’s extra attribute.
Returns:

lstm layer name.

Return type:

paddle.v2.config_base.Layer

bidirectional_lstm

class paddle.v2.networks.bidirectional_lstm(*args, **kwargs)

A bidirectional_lstm is a recurrent unit that iterates over the input sequence in both forward and backward order, and then concatenates the two outputs to form the final output. Concatenation is not the only way to form the final output; you can also, for example, add the two outputs together.

Please refer to Neural Machine Translation by Jointly Learning to Align and Translate (https://arxiv.org/pdf/1409.0473v3.pdf) for more details about the bidirectional LSTM.

The example usage is:

bi_lstm = bidirectional_lstm(input=[input1], size=512)
Parameters:
  • name (basestring) – bidirectional lstm layer name.
  • input (paddle.v2.config_base.Layer) – input layer.
  • size (int) – lstm layer size.
  • return_seq (bool) – If set to False, the outputs of the last time step are concatenated and returned. If set to True, the entire output sequences processed in the forward and backward directions are concatenated and returned.
Returns:

paddle.v2.config_base.Layer object according to return_seq.

Return type:

paddle.v2.config_base.Layer
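
The effect of return_seq can be sketched with a toy recurrent cell standing in for the LSTM. This is a pure-Python illustration under stated assumptions, not the library's code: run_cell is a hypothetical cell computing h_t = 0.5 * h_{t-1} + x_t, and "the last time step" of the backward pass is taken to be the state reached after it has consumed the first input position.

```python
def run_cell(seq, decay=0.5):
    """Toy recurrent cell: h_t = decay * h_{t-1} + x_t (stand-in for an LSTM)."""
    h = [0.0] * len(seq[0])
    outputs = []
    for x in seq:
        h = [decay * hv + xv for hv, xv in zip(h, x)]
        outputs.append(h)
    return outputs


def bidirectional(seq, return_seq=False):
    forward = run_cell(seq)
    # Process the reversed input, then flip back so index t aligns with seq[t].
    backward = run_cell(seq[::-1])[::-1]
    if return_seq:
        # Concatenate the two directions at every time step.
        return [f + b for f, b in zip(forward, backward)]
    # Final state of each direction: the forward pass ends at the last
    # position, the backward pass ends at the first position.
    return forward[-1] + backward[0]


seq = [[1.0], [2.0], [4.0]]
full = bidirectional(seq, return_seq=True)
last = bidirectional(seq, return_seq=False)
```

Replacing the list concatenation with an element-wise sum would give the "add them together" variant mentioned in the description.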

GRU

gru_unit

class paddle.v2.networks.gru_unit(*args, **kwargs)

Defines the calculations that a gated recurrent unit performs in a single time step. This function is not itself a recurrent layer, so it cannot be applied directly to sequence input. It is almost always used inside recurrent_group (see layers.py for more details), for example to implement an attention mechanism.

Please see grumemory in layers.py for the details about the maths.

Parameters:
  • input (paddle.v2.config_base.Layer) – input layer name.
  • name (basestring) – name of the gru group.
  • size (int) – hidden size of the gru.
  • act (paddle.v2.Activation.Base) – type of the activation
  • gate_act (paddle.v2.Activation.Base) – type of the gate activation
  • gru_attr (paddle.v2.attr.ParameterAttribute|False) – Extra parameter attribute of the gru layer.
Returns:

the gru output layer.

Return type:

paddle.v2.config_base.Layer
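
This page defers the GRU maths to grumemory in layers.py. As a rough orientation only, the sketch below implements one step of a commonly used GRU formulation in pure Python. The exact form used by PaddlePaddle is not stated here, so the equations, the parameter names (W_z, U_z, W_r, U_r, W, U and the biases), and the helpers gru_step and zero_params are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def gru_step(x, h_prev, p):
    """One GRU step in a common formulation (assumed, not taken from layers.py)."""
    def affine(w_name, u_name, b_name, h_vec):
        wx = matvec(p[w_name], x)
        uh = matvec(p[u_name], h_vec)
        return [a + b + c for a, b, c in zip(wx, uh, p[b_name])]

    z = [sigmoid(v) for v in affine("W_z", "U_z", "b_z", h_prev)]  # update gate
    r = [sigmoid(v) for v in affine("W_r", "U_r", "b_r", h_prev)]  # reset gate
    gated = [rv * hv for rv, hv in zip(r, h_prev)]
    cand = [math.tanh(v) for v in affine("W", "U", "b", gated)]    # candidate
    # Interpolate between the previous state and the candidate.
    return [(1.0 - zv) * hv + zv * cv for zv, hv, cv in zip(z, h_prev, cand)]


def zero_params(size, in_dim):
    """All-zero parameters, useful for checking the data flow by hand."""
    p = {}
    for suffix in ("_z", "_r", ""):
        p["W" + suffix] = [[0.0] * in_dim for _ in range(size)]
        p["U" + suffix] = [[0.0] * size for _ in range(size)]
        p["b" + suffix] = [0.0] * size
    return p


params = zero_params(size=2, in_dim=3)
# With all-zero parameters the update gate is 0.5 and the candidate is 0,
# so the hidden state is simply halved.
h = gru_step([1.0, 2.0, 3.0], [2.0, -4.0], params)
```

As with the LSTM unit, the previous hidden state is an explicit argument, which is why a unit like this needs a surrounding recurrent group to be applied to a sequence.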

gru_group

class paddle.v2.networks.gru_group(*args, **kwargs)

gru_group is a recurrent layer group version of the Gated Recurrent Unit. It performs exactly the same calculation as the grumemory layer. Its benefit is that the GRU hidden states are accessible to the user, which is especially useful in attention models. If you do not need to access any internal state and only use the outputs of a GRU, grumemory is recommended because it is faster.

Please see grumemory in layers.py for more detail about the maths.

The example usage is:

gru = gru_group(input=[layer1],
                size=256,
                act=paddle.v2.Activation.Tanh(),
                gate_act=paddle.v2.Activation.Sigmoid())
Parameters:
  • input (paddle.v2.config_base.Layer) – input layer name.
  • name (basestring) – name of the gru group.
  • size (int) – hidden size of the gru.
  • reverse (bool) – whether to process the input data in reverse order.
  • act (paddle.v2.Activation.Base) – type of the activation.
  • gate_act (paddle.v2.Activation.Base) – type of the gate activation.
  • gru_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias. False means no bias, None means default bias.
  • gru_attr (paddle.v2.attr.ParameterAttribute|False) – Extra parameter attribute of the gru layer.
Returns:

the gru group.

Return type:

paddle.v2.config_base.Layer

simple_gru

class paddle.v2.networks.simple_gru(*args, **kwargs)

You may have seen gru_step and grumemory in layers.py, and gru_unit, gru_group, and simple_gru in network.py. There are so many interfaces because there are two ways to implement a recurrent neural network. One way is to use a single complete layer that implements the RNN (simple RNN, GRU, or LSTM) over multiple time steps, such as recurrent, lstmemory, or grumemory. However, the multiplication \(W x_t\) is not computed inside these layers; see their interfaces in layers.py for details. The other way is to use a recurrent group, which assembles a series of layers to compute the RNN step by step. This approach is more flexible for attention mechanisms or other complex connections.

  • gru_step: computes only one step of the RNN. It needs a memory as input and can be used in a recurrent group.
  • gru_unit: a wrapper of gru_step with memory.
  • gru_group: a GRU cell implemented by a combination of multiple layers in a recurrent group. \(W x_t\) is not computed in the group.
  • grumemory: a GRU cell implemented by one layer, which does the same calculation as gru_group and is faster than gru_group.
  • simple_gru: a complete GRU implementation including \(W x_t\) and gru_group. \(W\) contains \(W_r\), \(W_z\) and \(W\); see the formula in grumemory.

In terms of computational speed, grumemory is faster than gru_group, and gru_group is faster than simple_gru.

The example usage is:

gru = simple_gru(input=[layer1], size=256)
Parameters:
  • input (paddle.v2.config_base.Layer) – input layer name.
  • name (basestring) – name of the gru group.
  • size (int) – hidden size of the gru.
  • reverse (bool) – whether to process the input data in reverse order.
  • act (paddle.v2.Activation.Base) – type of the activation.
  • gate_act (paddle.v2.Activation.Base) – type of the gate activation.
  • gru_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias. False means no bias, None means default bias.
  • gru_attr (paddle.v2.attr.ParameterAttribute|False) – Extra parameter attribute of the gru layer.
Returns:

the gru group.

Return type:

paddle.v2.config_base.Layer

simple_gru2

class paddle.v2.networks.simple_gru2(*args, **kwargs)

simple_gru2 is the same as simple_gru, but uses grumemory instead of gru_group. Please see grumemory in layers.py for more details about the maths. simple_gru2 is faster than simple_gru.

The example usage is:

gru = simple_gru2(input=[layer1], size=256)
Parameters:
  • input (paddle.v2.config_base.Layer) – input layer name.
  • name (basestring) – name of the gru group.
  • size (int) – hidden size of the gru.
  • reverse (bool) – whether to process the input data in reverse order.
  • act (paddle.v2.Activation.Base) – type of the activation.
  • gate_act (paddle.v2.Activation.Base) – type of the gate activation.
  • gru_bias_attr (paddle.v2.attr.ParameterAttribute|False) – bias. False means no bias, None means default bias.
  • gru_attr (paddle.v2.attr.ParameterAttribute|False) – Extra parameter attribute of the gru layer.
Returns:

the gru group.

Return type:

paddle.v2.config_base.Layer

bidirectional_gru

class paddle.v2.networks.bidirectional_gru(*args, **kwargs)

A bidirectional_gru is a recurrent unit that iterates over the input sequence in both forward and backward order, and then concatenates the two outputs to form the final output. Concatenation is not the only way to form the final output; you can also, for example, add the two outputs together.

The example usage is:

bi_gru = bidirectional_gru(input=[input1], size=512)
Parameters:
  • name (basestring) – bidirectional gru layer name.
  • input (paddle.v2.config_base.Layer) – input layer.
  • size (int) – gru layer size.
  • return_seq (bool) – If set False, outputs of the last time step are concatenated and returned. If set True, the entire output sequences that are processed in forward and backward directions are concatenated and returned.
Returns:

paddle.v2.config_base.Layer object.

Return type:

paddle.v2.config_base.Layer

simple_attention

class paddle.v2.networks.simple_attention(*args, **kwargs)

Calculates and returns a context vector using an attention mechanism. The size of the context vector equals the size of encoded_sequence.

\[
\begin{aligned}
a(s_{i-1}, h_{j}) & = v_{a} f(W_{a}s_{i-1} + U_{a}h_{j})\\
e_{i,j} & = a(s_{i-1}, h_{j})\\
a_{i,j} & = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T_x}\exp(e_{i,k})}\\
c_{i} & = \sum_{j=1}^{T_{x}} a_{i,j}h_{j}
\end{aligned}
\]

where \(h_{j}\) is the jth element of encoded_sequence, \(U_{a}h_{j}\) is the jth element of encoded_proj, \(s_{i-1}\) is decoder_state, and \(f\) is weight_act, which is set to tanh by default.

Please refer to Neural Machine Translation by Jointly Learning to Align and Translate for more details. The link is as follows: https://arxiv.org/abs/1409.0473.

The example usage is:

context = simple_attention(encoded_sequence=enc_seq,
                           encoded_proj=enc_proj,
                           decoder_state=decoder_prev)
Parameters:
  • name (basestring) – name of the attention model.
  • softmax_param_attr (paddle.v2.attr.ParameterAttribute) – parameter attribute of sequence softmax that is used to produce attention weight
  • weight_act (Activation) – activation of the attention model
  • encoded_sequence (paddle.v2.config_base.Layer) – output of the encoder
  • encoded_proj (paddle.v2.config_base.Layer) – attention weight is computed by a feed forward neural network which has two inputs : decoder’s hidden state of previous time step and encoder’s output. encoded_proj is output of the feed-forward network for encoder’s output. Here we pre-compute it outside simple_attention for speed consideration.
  • decoder_state (paddle.v2.config_base.Layer) – hidden state of decoder in previous time step
  • transform_param_attr (paddle.v2.attr.ParameterAttribute) – parameter attribute of the feed-forward network that takes decoder_state as inputs to compute attention weight.
Returns:

a context vector
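
The attention equations given for simple_attention can be traced with a small pure-Python sketch. It is an illustration under stated assumptions rather than the library's implementation: weight_act is taken to be tanh, encoded_proj is assumed to already hold \(U_{a}h_{j}\) for every position, and the function name simple_attention_sketch and the parameters w_a and v_a are hypothetical stand-ins for the learned weights.

```python
import math


def simple_attention_sketch(encoded_sequence, encoded_proj, decoder_state, w_a, v_a):
    """Return (context, weights) following the attention equations.

    encoded_proj[j] plays the role of U_a h_j (pre-computed outside);
    w_a and v_a are hypothetical parameters used only in this sketch.
    """
    # W_a s_{i-1}, shared by all source positions.
    ws = [sum(w * s for w, s in zip(row, decoder_state)) for row in w_a]
    # e_{i,j} = v_a . tanh(W_a s_{i-1} + U_a h_j)
    scores = [sum(v * math.tanh(a + b) for v, a, b in zip(v_a, ws, proj))
              for proj in encoded_proj]
    # Softmax over source positions (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # c_i = sum_j a_{i,j} h_j
    dim = len(encoded_sequence[0])
    context = [sum(a * h[d] for a, h in zip(weights, encoded_sequence))
               for d in range(dim)]
    return context, weights


enc_seq = [[1.0, 2.0], [3.0, 4.0]]                 # two encoder outputs h_j
enc_proj = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]     # assumed U_a h_j
decoder_prev = [0.5, -0.5]                         # s_{i-1}
w_a = [[0.3, 0.1], [-0.2, 0.4], [0.0, 0.5]]
v_a = [1.0, -1.0, 0.5]

context, weights = simple_attention_sketch(enc_seq, enc_proj, decoder_prev, w_a, v_a)
```

The context vector has the same dimension as each element of encoded_sequence, since it is a weighted average of those elements.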

Miscs

dropout_layer

class paddle.v2.networks.dropout_layer(*args, **kwargs)

@TODO(yuyang18): Add comments.

Parameters:
  • name
  • input
  • dropout_rate
Returns: