Fork自 PaddlePaddle / Paddle
* Polish codes and memory usage for fused_gate_attention. * Fix wrong reduce_dims in fused_gate_attention when computing gradient of nonbatched_bias.