How to stack GRU layers in the seqToseq_net.py demo example?
Created by: alvations
In the seqToseq demo, the main code that implements the decoder is:
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
    decoder_mem = memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

    context = simple_attention(
        encoded_sequence=enc_vec,
        encoded_proj=enc_proj,
        decoder_state=decoder_mem, )

    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)

    gru_step = gru_step_layer(
        name='gru_decoder',
        input=decoder_inputs,
        output_mem=decoder_mem,
        size=decoder_size)

    with mixed_layer(
            size=target_dict_dim, bias_attr=True,
            act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)

    return out
How do I add more layers to the decoder, like in the Google NMT paper?

From my understanding of the documentation, the recurrent_group at https://github.com/baidu/Paddle/blob/develop/demo/seqToseq/seqToseq_net.py#L156 is where Paddle steps through the sequence one timestep at a time, so the extra layers should be added inside gru_decoder_with_attention(). Is that right?
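For reference, the step function gets plugged into the recurrent group roughly like this (simplified from the training branch of seqToseq_net.py; variable names may not match the file exactly):

group_inputs = [StaticInput(input=encoded_vector, is_seq=True),
                StaticInput(input=encoded_proj, is_seq=True),
                trg_embedding]

# recurrent_group calls gru_decoder_with_attention once per target timestep
decoder = recurrent_group(name=decoder_group_name,
                          step=gru_decoder_with_attention,
                          input=group_inputs)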
But I'm not sure what gru_step_layer() is doing at https://github.com/baidu/Paddle/blob/develop/demo/seqToseq/seqToseq_net.py#L124 . There isn't much about it in the documentation; from the Python code it looks like a thin wrapper around the gru_step config, which is implemented in GruStepLayer.cpp:
/**
 * @brief GruStepLayer is like GatedRecurrentLayer, but used in a recurrent
 * layer group. GruStepLayer takes 2 input layers.
 * input[0] has size * 3 and is divided into 3 equal parts: (xz_t, xr_t, xi_t).
 * input[1] has size: {prev_out}.
 * parameter and biasParameter are also divided into 3 equal parts:
 * parameter consists of (U_z, U_r, U)
 * biasParameter consists of (bias_z, bias_r, bias_o)
 * \f[
 * update \ gate: z_t = actGate(xz_t + U_z * prev_out + bias_z) \\
 * reset \ gate: r_t = actGate(xr_t + U_r * prev_out + bias_r) \\
 * output \ candidate: {h}_t = actNode(xi_t + U * dot(r_t, prev_out) + bias_o) \\
 * output: h_t = dot((1-z_t), prev_out) + dot(z_t, {h}_t)
 * \f]
 * @note
 * dot denotes "element-wise multiplication".
 * actNode is defined by config active_type.
 * actGate is defined by config active_gate_type.
 * The config file api is gru_step_layer.
 */
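If I read those equations right, gru_step_layer computes a single GRU timestep from the pre-projected input (the decoder_size * 3 mixed_layer output) and the previous state held in output_mem. In plain numpy that update would be roughly the following (sigmoid/tanh here are just the usual gate/node activations, not read from any config):

import numpy as np

def gru_step(x3, prev_out, U_z, U_r, U, b_z, b_r, b_o):
    """One GRU step following the GruStepLayer comment above.

    x3       : concatenation (xz_t, xr_t, xi_t), i.e. the decoder_size * 3
               mixed_layer output that already holds the input projections.
    prev_out : h_{t-1}, the previous decoder state (output_mem).
    """
    size = prev_out.shape[-1]
    xz, xr, xi = x3[:size], x3[size:2 * size], x3[2 * size:]

    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    z = sigmoid(xz + U_z.dot(prev_out) + b_z)         # update gate
    r = sigmoid(xr + U_r.dot(prev_out) + b_r)         # reset gate
    h_cand = np.tanh(xi + U.dot(r * prev_out) + b_o)  # output candidate
    return (1.0 - z) * prev_out + z * h_cand          # new state h_t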
When I tried to stack the layers like this:
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
    decoder_mem = memory(name='gru_decoder',
                         size=decoder_size,
                         boot_layer=decoder_boot)

    context = simple_attention(encoded_sequence=enc_vec,
                               encoded_proj=enc_proj,
                               decoder_state=decoder_mem, )

    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)

    gru_step = gru_step_layer(name='gru_decoder',
                              input=decoder_inputs,
                              output_mem=decoder_mem,
                              size=decoder_size)

    with mixed_layer(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=gru_step)

    gru_step = gru_step_layer(name='gru_decoder',
                              input=decoder_inputs,
                              output_mem=decoder_mem,
                              size=decoder_size)

    with mixed_layer(size=decoder_size,
                     bias_attr=True,
                     act=SoftmaxActivation()) as out:
        out += full_matrix_projection(input=gru_step)

    return out
It throws an error complaining that there is no input sequence:
$ bash train.sh
I1117 14:37:45.858877 13026 Util.cpp:155] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=train.conf --save_dir=/home/ltan/Paddle/demo/ibot/model-sub --use_gpu=true --num_passes=100 --show_parameter_stats_period=1000 --trainer_count=4 --log_period=10 --dot_period=5
I1117 14:37:51.345690 13026 Util.cpp:130] Calling runInitFunctions
I1117 14:37:51.345918 13026 Util.cpp:143] Call runInitFunctions done.
[WARNING 2016-11-17 14:37:51,529 layers.py:1133] You are getting the first instance for a time series, and it is a normal recurrent layer output. There is no time series information at all. Maybe you want to use last_seq instead.
[WARNING 2016-11-17 14:37:51,533 default_decorators.py:40] please use keyword arguments in paddle config.
[WARNING 2016-11-17 14:37:51,539 default_decorators.py:40] please use keyword arguments in paddle config.
[WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
[WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
[WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
[WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
[INFO 2016-11-17 14:37:51,543 networks.py:1125] The input order is [source_language_word, target_language_word, target_language_next_word]
[INFO 2016-11-17 14:37:51,543 networks.py:1132] The output order is [__cost_0__]
I1117 14:37:51.551156 13026 Trainer.cpp:170] trainer mode: Normal
I1117 14:37:51.552254 13026 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4
I1117 14:37:51.656347 13026 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-17 14:37:51,656 dataprovider.py:27] src dict len : 10000
[INFO 2016-11-17 14:37:51,656 dataprovider.py:37] trg dict len : 7116
I1117 14:37:51.676383 13026 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-17 14:37:51,676 dataprovider.py:27] src dict len : 10000
[INFO 2016-11-17 14:37:51,676 dataprovider.py:37] trg dict len : 7116
I1117 14:37:51.676964 13026 GradientMachine.cpp:134] Initing parameters..
I1117 14:37:52.880277 13026 GradientMachine.cpp:141] Init parameters done.
F1117 14:37:53.244454 13049 GatedRecurrentLayer.cpp:79] Check failed: input.sequenceStartPositions
*** Check failure stack trace: ***
@ 0x7f3acc440daa (unknown)
@ 0x7f3acc440ce4 (unknown)
@ 0x7f3acc4406e6 (unknown)
@ 0x7f3acc443687 (unknown)
@ 0x5a9d7c paddle::GatedRecurrentLayer::forward()
@ 0x66c220 paddle::NeuralNetwork::forward()
@ 0x65cf6f paddle::RecurrentGradientMachine::forward()
@ 0x5ef64a paddle::RecurrentLayerGroup::forward()
@ 0x66c220 paddle::NeuralNetwork::forward()
@ 0x672617 paddle::TrainerThread::forward()
@ 0x674935 paddle::TrainerThread::computeThread()
@ 0x7f3acbfbda60 (unknown)
@ 0x7f3accff9184 start_thread
F1117 14:37:53.255451 13041 GatedRecurrentLayer.cpp:79] Check failed: input.sequenceStartPositions
*** Check failure stack trace: ***
@ 0x7f3acb72537d (unknown)
@ 0x7f3acc440daa (unknown)
@ (nil) (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81: 13026 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
I'm not sure what is wrong here. **Is it because my inputs to gru_step_layer are wrong?**

How should the GRU layer stacking be done in the seqToseq GRU decoder?
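In case it clarifies what I'm aiming for: my untested guess is that each stacked GRU needs its own memory and its own layer name, rather than reusing name='gru_decoder' and decoder_mem for both gru_step_layer calls. Something like the sketch below, where the 'gru_decoder_2' name and the reuse of decoder_boot as the second boot layer are just my guesses:

# (this would go inside gru_decoder_with_attention, after the first gru_step)
decoder_mem_2 = memory(name='gru_decoder_2',
                       size=decoder_size,
                       boot_layer=decoder_boot)

# feed the first GRU's output into the second GRU
with mixed_layer(size=decoder_size * 3) as decoder_inputs_2:
    decoder_inputs_2 += full_matrix_projection(input=gru_step)

gru_step_2 = gru_step_layer(name='gru_decoder_2',
                            input=decoder_inputs_2,
                            output_mem=decoder_mem_2,
                            size=decoder_size)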