Optimize the CUDA implementation of sequence_expand op by reduce the times of...
Optimize the CUDA implementation of sequence_expand op by reduce the times of copying lod data from CPU to GPU. (#15493) * Optimize the CUDA implementation of sequence_expand op by reduce the times of copying lod data from CPU to GPU. test=develop * Refine the op benchmark to support setting lod in config. test=develop
Showing
想要评论请 注册 或 登录