Fix NCCL UB issue
This fixes an undefined behavior (UB) issue that occurs with newer versions of Clang (17+). The fix is also upstreamed through https://github.com/NVIDIA/nccl/pull/916.

In addition, I'm changing the handling of `enqueue.cc`, which needs to be compiled in CUDA mode under Clang. The previous solution of just passing the `-x cuda` option fails with CUDA 12+.

I'm also correcting the version number that we set in the patch. I'm not sure whether this version is reported in any logs, but if it is, it should be correct.

PiperOrigin-RevId: 564811002