Created by: 1024er
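Change the weight layout used by the cuDNN RNN implementation:
Before this change: same as TensorFlow's weight definition.
After this change: same as PyTorch's weight definition, which maps more directly onto the cuDNN API (separate input-to-hidden and hidden-to-hidden weight matrices and biases).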
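To show what the layout change means in practice, here is a minimal sketch for the LSTM case. It assumes TF's fused `LSTMCell` layout: one kernel of shape `[input_size + hidden][4 * hidden]` with gate column blocks ordered i, j, f, o, plus a single bias. It also assumes PyTorch's layout: `weight_ih` `[4 * hidden][input_size]` and `weight_hh` `[4 * hidden][hidden]` with row blocks ordered i, f, g, o (g is TF's j), plus two biases. The helper name `tf_to_torch_lstm` and the plain-list representation are hypothetical and are not the code in this PR.

```python
from typing import List, Tuple

Matrix = List[List[float]]

TF_GATES = ["i", "j", "f", "o"]     # TF column-block order (j = cell candidate)
TORCH_GATES = ["i", "f", "j", "o"]  # PyTorch/cuDNN row-block order (g == TF's j)


def tf_to_torch_lstm(kernel: Matrix, bias: List[float], input_size: int,
                     hidden: int) -> Tuple[Matrix, Matrix, List[float], List[float]]:
    """Hypothetical helper: remap a TF fused LSTM kernel to PyTorch-style weights."""
    if len(kernel) != input_size + hidden or any(len(r) != 4 * hidden for r in kernel):
        raise ValueError("kernel must be [input_size + hidden][4 * hidden]")
    if len(bias) != 4 * hidden:
        raise ValueError("bias must have 4 * hidden entries")
    # Column index in the TF kernel where each gate's block starts.
    start = {g: k * hidden for k, g in enumerate(TF_GATES)}
    # TF column indices listed in PyTorch gate order.
    cols = [start[g] + u for g in TORCH_GATES for u in range(hidden)]
    # Transpose while permuting: output row r holds TF column cols[r].
    # The first input_size kernel rows act on x, the remaining hidden rows on h.
    w_ih = [[kernel[x][c] for x in range(input_size)] for c in cols]
    w_hh = [[kernel[input_size + h][c] for h in range(hidden)] for c in cols]
    # TF keeps one bias; PyTorch/cuDNN keep two that are summed, so the whole
    # TF bias goes into bias_ih and bias_hh is zero. Any forget_bias that the
    # TF cell adds at runtime is NOT part of the bias variable and is not copied.
    b_ih = [bias[c] for c in cols]
    b_hh = [0.0] * (4 * hidden)
    return w_ih, w_hh, b_ih, b_hh
```

For example, with `input_size=1` and `hidden=1`, the call `tf_to_torch_lstm([[1, 2, 3, 4], [5, 6, 7, 8]], [10, 20, 30, 40], 1, 1)` returns `w_ih == [[1], [3], [2], [4]]`, which swaps the f and j blocks and splits the input and recurrent parts into separate matrices. cuDNN's per-gate parameter layout expects exactly this separation.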