Allclose op (#27891) (#28069)
* Fixed a bug in allclose_op where some fp64 inputs were not handled correctly.
* Improved CUDA kernel performance.
* Fixed a bug in the CUDA kernel that prevented it from handling inputs with large dimensions, and added a unit test for it.
* Added a test case for float32 input.
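
For reference, the element-wise check that an allclose-style op typically performs can be sketched in plain Python. This is a minimal illustration, not the kernel changed in this commit: it assumes the NumPy-style tolerance formula `|x - y| <= atol + rtol * |y|`, and the function name, default tolerances, and `equal_nan` flag are illustrative assumptions. Python floats are 64-bit, so the comparison here runs in double precision throughout.

```python
import math


def allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Hypothetical reference check: True if every pair is within tolerance.

    Assumes the NumPy-style rule |x - y| <= atol + rtol * |y|.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            # NaN only matches NaN, and only when equal_nan is requested.
            if equal_nan and math.isnan(x) and math.isnan(y):
                continue
            return False
        if x == y:
            # Covers exact matches, including infinities of the same sign.
            continue
        if math.isinf(x) or math.isinf(y):
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True
```

A check like `allclose([1e-10], [1.0001e-10], atol=0.0)` depends on the difference being evaluated in double precision, which is the kind of fp64 case the first item refers to.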