Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979) test=develop
Forked from PaddlePaddle / Paddle