Fix sequence expand op (!11618) · Merge request · PaddlePaddle / Paddle

Fix sequence expand op !11618

Created by: ktlichkid

The GPU gradient kernel of the sequence expand op is not robust when the memory optimizer is on.

The GPU kernel accumulated the gradient sum directly into the d_x tensor without checking its initial value, so any non-zero data already in d_x ends up in the result.

In this PR, I moved the "set zero" call out of the functor so that d_x is guaranteed to be zeroed on both CPU and GPU.
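
To illustrate the idea, here is a minimal CPU-only sketch in plain C++, not the actual Paddle code. `AccumulateGrad`, `SequenceExpandGrad`, and the `repeats` vector are hypothetical names made up for this example; it assumes each row of x is a single float that was repeated `repeats[i]` times in the forward pass. The accumulation step only does `+=`, so the caller zeroes d_x first, independent of which device-specific functor runs afterwards.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the device-specific grad functor.
// It only accumulates (+=) into d_x and never initializes it,
// so the result is correct only if d_x starts at zero.
void AccumulateGrad(const std::vector<float>& d_out,
                    const std::vector<std::size_t>& repeats,
                    std::vector<float>* d_x) {
  std::size_t out_idx = 0;
  for (std::size_t i = 0; i < repeats.size(); ++i) {
    for (std::size_t r = 0; r < repeats[i]; ++r) {
      (*d_x)[i] += d_out[out_idx++];
    }
  }
}

// Hypothetical op-level entry point. d_x is passed by value to model a
// buffer that may still hold stale data from an earlier use.
std::vector<float> SequenceExpandGrad(const std::vector<float>& d_out,
                                      const std::vector<std::size_t>& repeats,
                                      std::vector<float> d_x) {
  // "Set zero" happens here, outside the functor, before any accumulation.
  std::fill(d_x.begin(), d_x.end(), 0.0f);
  AccumulateGrad(d_out, repeats, &d_x);
  return d_x;
}
```

Calling `AccumulateGrad` directly on a buffer holding leftover values adds those values to the gradient, while `SequenceExpandGrad` returns the same sums regardless of what the buffer held before.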