Optimize the cuda implementation of sum_op (#17283)
* Optimize the cuda implementation of sum_op, which add two lod_tensors inplace. test=develop * Use eigen to add to tensors. test=develop
Showing
想要评论请 注册 或 登录
* Optimize the cuda implementation of sum_op, which add two lod_tensors inplace. test=develop * Use eigen to add to tensors. test=develop