Allclose op (#27891)
* Still has bugs.
* Fixed an allclose_op bug that could not handle some cases of fp64 inputs.
* Improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in the CUDA kernel that could not handle large-dimension inputs, and added a unit test for it.
* Added a test case for float32 input.
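
For context, here is a minimal reference sketch of the element-wise check an allclose operator usually performs. It assumes the NumPy-style definition `|a - b| <= atol + rtol * |b|`. The function name `allclose_reference`, its default tolerances, and its handling of NaN and infinity are illustrative assumptions, not code taken from this commit or its CUDA kernel.

```python
import math


def allclose_reference(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Hypothetical pure-Python reference for an allclose check.

    Returns True when every pair (x, y) satisfies
    |x - y| <= atol + rtol * |y|  (NumPy-style, asymmetric in y).
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    for x, y in zip(a, b):
        x_nan, y_nan = math.isnan(x), math.isnan(y)
        if x_nan or y_nan:
            # NaNs only match each other, and only when equal_nan is set.
            if equal_nan and x_nan and y_nan:
                continue
            return False
        if x == y:
            # Covers exact matches, including same-signed infinities.
            continue
        if math.isinf(x) or math.isinf(y):
            # A non-equal pair involving an infinity is never close.
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True


# Python floats are double precision, so this mirrors the fp64 case.
print(allclose_reference([1.0, 2.0], [1.0, 2.0 + 1e-9]))
print(allclose_reference([1.0], [1.1]))
```

A reference like this can serve as the expected value in a unit test, for example when comparing a kernel's result on a large input against the straightforward loop.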