RNN: High GPU memory usage for recurrent_group + simple_attention
Created by: byzhang
We observed significantly more GPU memory usage when using recurrent_group + simple_attention compared with the standard gru or lstm layers. @lyxm added some logging in https://github.com/lyxm/Paddle/commit/9ce03727c273542492d63db9bb16088153d1edc4, along with an analysis of the logging results. It appears that some of the GPU memory copies made in mixed_layer could be eliminated. @emailweixu commented: "The change probably belongs in RecurrentGradientMachine, around the `// connect in_links` part, so that each RNN frame is connected directly to the static input."

This is effectively a blocking issue for training a reasonably sized model on long sequences, so any help triaging it would be highly appreciated. cc @xliux
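For context, here is a minimal sketch of the shape of config we are running, assuming the trainer_config_helpers API; the dimensions and data layer names (`dict_dim`, `word_dim`, `hidden_dim`, `source_words`, `target_words`) are placeholders, not our actual model:

```python
# Minimal sketch of an attention decoder built with recurrent_group +
# simple_attention. All sizes and layer names below are hypothetical.
from paddle.trainer_config_helpers import *

dict_dim = 10000   # hypothetical vocabulary size
word_dim = 512     # hypothetical embedding size
hidden_dim = 512   # hypothetical hidden size

src = embedding_layer(input=data_layer(name='source_words', size=dict_dim),
                      size=word_dim)
trg = embedding_layer(input=data_layer(name='target_words', size=dict_dim),
                      size=word_dim)

# Encoder: a plain GRU over the source sequence.
encoded = simple_gru(input=src, size=hidden_dim)
encoded_proj = mixed_layer(input=full_matrix_projection(input=encoded),
                           size=hidden_dim)

def attention_step(enc_vec, enc_proj, current_word):
    # Decoder state carried across frames of the recurrent_group.
    decoder_mem = memory(name='decoder', size=hidden_dim)
    # simple_attention re-reads the whole encoder output (passed in as a
    # StaticInput) at every frame; this is where the extra GPU memory shows up.
    context = simple_attention(encoded_sequence=enc_vec,
                               encoded_proj=enc_proj,
                               decoder_state=decoder_mem)
    with mixed_layer(size=hidden_dim * 3) as step_input:
        step_input += full_matrix_projection(input=context)
        step_input += full_matrix_projection(input=current_word)
    return gru_step_layer(name='decoder',
                          input=step_input,
                          output_mem=decoder_mem,
                          size=hidden_dim)

decoder = recurrent_group(step=attention_step,
                          input=[StaticInput(input=encoded, is_seq=True),
                                 StaticInput(input=encoded_proj, is_seq=True),
                                 trg])

outputs(last_seq(input=decoder))
```

The baseline we compare against simply replaces the recurrent_group decoder above with a plain grumemory or lstmemory over the target sequence, with everything else unchanged; that version does not show the same memory growth.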