Some notes about PaddlePaddle. I hope someone can help me refine them and add them to the FAQ.
Created by: lcy-seso
1. About `paddle.layer.memory` in PaddlePaddle

   - Every layer in PaddlePaddle has a unique name; if the user does not name a layer explicitly, it will be named automatically.
   - Memory in PaddlePaddle is much like a reference parameter in C++. It is not a real layer itself; it points to a layer and retrieves that layer's output in the previous time step.
   - You have to explicitly give a name to the layer `paddle.layer.memory` points to, because `paddle.layer.memory` needs a layer's name to decide which layer's output to retrieve in the previous time step.
   - In `paddle.layer.memory`, the name specified by the `name` parameter is not the name of the defined memory layer, but the name of the real layer the memory points to.
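   A minimal sketch of this naming convention, assuming the v2 API (`seq_input` is a hypothetical sequence input layer; names and sizes are made up for illustration):

   ```python
   import paddle.v2 as paddle

   def step(current_input):
       # The memory's `name` must match the name of the layer whose
       # previous-time-step output it should read back ("rnn_state"),
       # not the name of the memory itself.
       prev_state = paddle.layer.memory(name="rnn_state", size=128)
       state = paddle.layer.fc(input=[current_input, prev_state],
                               size=128,
                               act=paddle.activation.Tanh(),
                               name="rnn_state")  # explicitly named
       return state

   rnn_out = paddle.layer.recurrent_group(step=step, input=seq_input)
   ```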
2. About `paddle.layer.dropout` and `paddle.attr.ExtraLayerAttribute(drop_rate=x)`
   - I think for most layers, a better way to use dropout is to set the drop rate in `layer_attr` (every layer in `paddle.layer` has this attribute) by using `paddle.attr.ExtraLayerAttribute(drop_rate=0.5)`, as below:

     ```python
     fc = paddle.layer.fc(
         input=input,
         size=512,  # size and act added here so the snippet is complete
         act=paddle.activation.Tanh(),
         bias_attr=paddle.attr.Param(initial_std=0.),
         param_attr=paddle.attr.Param(initial_std=5e-4),
         layer_attr=paddle.attr.ExtraLayerAttribute(drop_rate=0.5))
     ```
   - Dropout in PaddlePaddle is actually implemented in the activation function; it is not a layer.
   - But `paddle.layer.lstmemory`, `paddle.layer.grumemory`, and `paddle.layer.recurrent` are different: these layers do not activate their output by calling the general activation process, but implement the activation process themselves. As a result, the drop rate cannot be set directly in these layers.
   - `paddle.layer.dropout` actually defines a `paddle.layer.add_to` layer and sets the drop rate in that layer. This wastes a little memory, because the output values to be dropped are copied again and PaddlePaddle does not release the memory (a trade-off made for time efficiency). But if you want to drop a recurrent layer's output, you have to use `paddle.layer.dropout`, as in the sketch below.
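   A minimal sketch of dropping a recurrent layer's output, assuming the v2 API (`seq_input` is a hypothetical sequence input; the `dropout_rate` parameter name is my assumption about the v2 `paddle.layer.dropout` signature):

   ```python
   # lstmemory expects the input-to-hidden projection to be done
   # outside, so the projection width is 4x the LSTM size.
   proj = paddle.layer.fc(input=seq_input,
                          size=4 * 128,
                          act=paddle.activation.Linear(),
                          bias_attr=False)
   lstm = paddle.layer.lstmemory(input=proj)

   # lstmemory activates its output itself, so drop_rate cannot go into
   # its layer_attr; wrap the output with paddle.layer.dropout instead.
   dropped = paddle.layer.dropout(input=lstm, dropout_rate=0.5)
   ```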
3. About different recurrent layers in PaddlePaddle.
   - If you do not need explicit access to the intermediate values a recurrent unit computes during one time step (hidden states, input-to-hidden mapping, memory cells, and so on), I recommend using `paddle.networks.simple_lstm` or `paddle.layer.lstmemory`.
   - `recurrent_group` is useful in attention models or NTM.
   - In PaddlePaddle we have (here I take LSTM as an example; GRU is the same):
     1. `paddle.layer.lstmemory`
     2. `paddle.networks.simple_lstm`
     3. `paddle.networks.lstmemory_group`
     4. `paddle.networks.lstmemory_unit`
     5. `paddle.networks.bidirectional_lstm`
   - The above recurrent layers can be categorized into two types:
     - Recurrent layers implemented by `recurrent_group`:
       - You can access any intermediate value (hidden states, input-to-hidden mapping, memory cells, and so on) a recurrent unit computes during one time step.
       - The above 3.
     - Recurrent layers as a whole:
       - You can only access their outputs.
       - The above 1 ~ 2 and 5.
     - `paddle.networks.lstmemory_unit` is not a recurrent layer; it defines the computation an LSTM unit performs in one time step.
       - It can only be used as the step function in `recurrent_group` (see the sketch after this list).
       - The above 4.
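   A minimal sketch of the first type, assuming the v2 API: `paddle.networks.lstmemory_unit` is used as the step function of `paddle.layer.recurrent_group`, which is roughly what `paddle.networks.lstmemory_group` wraps up for you (`emb` is a hypothetical embedding sequence):

   ```python
   def lstm_step(current_input):
       # One LSTM time step; the gates and cell are built from basic
       # layers here, so intermediate values can be accessed or reused.
       return paddle.networks.lstmemory_unit(input=current_input, size=128)

   lstm_out = paddle.layer.recurrent_group(step=lstm_step, input=emb)

   # Roughly equivalent, with the step function already wrapped:
   lstm_out2 = paddle.networks.lstmemory_group(input=emb, size=128)
   ```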
   - The second type (recurrent layer as a whole) is more computationally efficient, because `recurrent_group` is made up of many basic layers (including addition, element-wise multiplication, matrix multiplication, and so on), while a recurrent layer as a whole is carefully optimized for both CPU and GPU.
   - But all recurrent layers (simple RNN, GRU, LSTM) in PaddlePaddle leave the input-to-hidden mapping outside the recurrent layer, so that the projections of all time steps can be batched into one larger matrix multiplication for LSTM and GRU, which accelerates computation.
     - This is the difference between `paddle.layer.lstmemory` and `paddle.networks.simple_lstm`. Specifically:
       - `paddle.layer.lstmemory` is not the LSTM in the textbook; it is an LSTM unit without the input-to-hidden projection.
       - `paddle.networks.simple_lstm` is a wrapper which just adds the input-to-hidden projection to `paddle.layer.lstmemory`. It is the LSTM in the textbook.
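   A minimal sketch of this difference, assuming the v2 API (`emb` is a hypothetical input sequence; sizes are made up):

   ```python
   # Textbook LSTM in one call: the input-to-hidden projection is included.
   lstm_a = paddle.networks.simple_lstm(input=emb, size=128)

   # The same thing spelled out: project to 4x the LSTM size outside (one
   # big matrix covering the three gates plus the cell input), then feed
   # the projection to lstmemory, which has no input projection itself.
   proj = paddle.layer.fc(input=emb,
                          size=4 * 128,
                          act=paddle.activation.Linear(),
                          bias_attr=False)
   lstm_b = paddle.layer.lstmemory(input=proj)
   ```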
   - `paddle.layer.lstmemory` and `paddle.networks.simple_lstm` in PaddlePaddle are LSTMs with peephole connections. Be careful about this: make sure you are aware that they have more parameters than an LSTM without peephole connections (the peepholes add three extra weight vectors, one each for the input, forget, and output gates).