Improve GRU unit performance. (#16338)
Refine the code.
Fuse the cuBLAS calls and kernels into one CUDA kernel.
test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>