Speed up LSTM with DNI
Created by: zuowang
The method for speeding up LSTMs described in DeepMind's DNI paper looks promising. The paper shows that unrolling the LSTM and decoupling it with DNI makes it 2x faster. Can we implement it in Paddle?
The paper is "Decoupled Neural Interfaces using Synthetic Gradients".
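
To make the request more concrete, here is a rough sketch of the core idea as I understand it: a small model predicts the gradient with respect to a hidden state, so the layer or time step that produced that state can update without waiting for the full backward pass. This is plain NumPy, not Paddle code. The `SyntheticGradient` class, the `demo` function, and the toy target gradient `h @ A` are made up for illustration. In a real LSTM the regression target would come from backpropagating through the next unrolled segment.

```python
import numpy as np


class SyntheticGradient:
    """Hypothetical linear module that predicts dL/dh from a hidden state h."""

    def __init__(self, hidden_size, lr=0.1):
        self.W = np.zeros((hidden_size, hidden_size))
        self.b = np.zeros(hidden_size)
        self.lr = lr

    def predict(self, h):
        # Predicted gradient, available without running the real backward pass.
        return h @ self.W + self.b

    def update(self, h, target_grad):
        # Regress the prediction onto the target gradient with an L2 loss.
        err = self.predict(h) - target_grad  # shape: (batch, hidden)
        self.W -= self.lr * h.T @ err / len(h)
        self.b -= self.lr * err.mean(axis=0)
        return float((err ** 2).mean())


def demo(steps=500, hidden_size=4, seed=0):
    """Train the module on a toy problem and return (first, last) losses."""
    rng = np.random.default_rng(seed)
    # Assumption: the "true" gradient is a fixed linear function of h.
    A = rng.normal(size=(hidden_size, hidden_size))
    sg = SyntheticGradient(hidden_size)
    losses = []
    for _ in range(steps):
        h = rng.normal(size=(32, hidden_size))
        true_grad = h @ A
        losses.append(sg.update(h, true_grad))
    return losses[0], losses[-1]


if __name__ == "__main__":
    first, last = demo()
    print("first loss:", first, "last loss:", last)
```

Once the prediction error is small, the upstream segment could use `predict(h)` in place of the real gradient, which is what would let the unrolled LSTM segments update independently.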