Created by: gavin1332
CUDADeviceContext holds several CUDA resources, such as CublasHandleHolder, ncclComm_t, cudaStream_t, and cudnnHandle_t. All of them must be released eagerly, before the CUDA environment is destroyed. We therefore explicitly reset NCCLCommImpl, which contains a CUDADeviceContext, in a handler registered with std::atexit.
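
The pattern can be sketched roughly as follows. This is a minimal illustration, not the actual change: `FakeDeviceContext`, `FakeCommContext`, `Create`, and `ReleaseAll` are hypothetical stand-ins for CUDADeviceContext and the owner of the NCCLCommImpl objects, and no real CUDA or NCCL call is made. It only shows a registry that registers an `std::atexit` handler once and resets its owned contexts from that handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>

// Hypothetical stand-in for a context that owns device resources.
// A live-object counter replaces the real handles so the release is observable.
struct FakeDeviceContext {
  static int& live_count() {
    static int n = 0;
    return n;
  }
  FakeDeviceContext() { ++live_count(); }
  ~FakeDeviceContext() { --live_count(); }
};

// Hypothetical registry that owns one context per communication ring.
class FakeCommContext {
 public:
  static FakeCommContext& Instance() {
    // Intentionally leaked, so the registry itself is never destroyed during
    // static destruction; only the atexit handler releases what it owns.
    static FakeCommContext* inst = new FakeCommContext;
    return *inst;
  }

  FakeDeviceContext* Create(int ring_id) {
    assert(ring_id >= 0);
    if (!registered_) {
      // Register the cleanup once, the first time a context is created.
      std::atexit(&FakeCommContext::ReleaseAll);
      registered_ = true;
    }
    std::unique_ptr<FakeDeviceContext>& slot = comms_[ring_id];
    slot.reset(new FakeDeviceContext);
    return slot.get();
  }

  // Destroys every owned context. Calling it again on an empty map is a no-op.
  static void ReleaseAll() { Instance().comms_.clear(); }

  std::size_t size() const { return comms_.size(); }

 private:
  FakeCommContext() = default;

  bool registered_ = false;  // not thread-safe; kept simple for the sketch
  std::map<int, std::unique_ptr<FakeDeviceContext>> comms_;
};
```

Calling `FakeCommContext::Instance().Create(0)` registers the handler, and `ReleaseAll` then runs during normal program termination (return from `main` or `std::exit`). The leaked singleton is a deliberate choice in this sketch: it keeps the only release point in the `atexit` handler rather than in a static destructor.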