Unverified commit d2e6110e · Author: Michael Wyatt · Committer: GitHub

add skip for mismatched cuda versions (#3436)

Parent 2c63e349
@@ -147,6 +147,28 @@ This is also recommended to ensure your exact architecture is used. Due to a var
The full list of NVIDIA GPUs and their compute capabilities can be found [here](https://developer.nvidia.com/cuda-gpus).
## CUDA version mismatch
If you're getting the following error:
```
Exception: >- DeepSpeed Op Builder: Installed CUDA version {VERSION} does not match the version torch was compiled with {VERSION}, unable to compile cuda/cpp extensions without a matching cuda version.
```
You have a misaligned version of CUDA installed compared to the version of CUDA
used to compile torch. We only require that the major versions match (e.g., 11.1
and 11.8 are OK); a mismatch in the major version may result in unexpected
behavior and errors.
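The easiest fix for this error is to change the installed CUDA version (check
with `nvcc --version`) or to update the torch version to match the installed
CUDA version (check the torch version with
`python3 -c "import torch; print(torch.__version__)"` and the CUDA version it
was compiled with using `python3 -c "import torch; print(torch.version.cuda)"`).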
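The comparison behind this error can be reproduced by hand. Below is a minimal sketch, assuming `nvcc --version` prints a line containing `release X.Y,` (as CUDA toolkits do, e.g. `Cuda compilation tools, release 11.8, V11.8.89`). The helper names `parse_nvcc_release`, `parse_torch_cuda` and `major_versions_match` are hypothetical and not part of DeepSpeed. They work on plain strings, so the sketch runs without a GPU or torch; in practice you would feed it the output of `nvcc --version` and the value of `torch.version.cuda`.

```python
import re


def parse_nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Extract (major, minor) from `nvcc --version` output.

    Assumes the output contains a fragment such as "release 11.8,".
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string like torch.version.cuda, e.g. "11.7"."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def major_versions_match(nvcc_output: str, torch_cuda: str) -> bool:
    """True when the toolkit and torch use the same CUDA major version."""
    return parse_nvcc_release(nvcc_output)[0] == parse_torch_cuda(torch_cuda)[0]


if __name__ == "__main__":
    sample = (
        "nvcc: NVIDIA (R) Cuda compiler driver\n"
        "Cuda compilation tools, release 11.8, V11.8.89\n"
    )
    print(parse_nvcc_release(sample))            # (11, 8)
    print(major_versions_match(sample, "11.7"))  # True
    print(major_versions_match(sample, "12.1"))  # False
```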
If you want to skip this check and proceed with the mismatched CUDA versions, use the following environment variable:
```bash
export DS_SKIP_CUDA_CHECK=1
```
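The op builder reads this variable with `os.getenv("DS_SKIP_CUDA_CHECK", "0") == "1"`, so only the exact string `1` disables the check; values such as `true` or `yes` leave it active. The sketch below mirrors that test and shows one way to hand the variable to a child process (for example, a build step) from Python. The helper names `cuda_check_skipped` and `env_with_cuda_check_skipped` are hypothetical; the child here just prints the variable to show that it is inherited.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def cuda_check_skipped(env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the op builder's test: only the exact string "1" skips the check."""
    env = os.environ if env is None else env
    return env.get("DS_SKIP_CUDA_CHECK", "0") == "1"


def env_with_cuda_check_skipped(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: the current environment) with DS_SKIP_CUDA_CHECK=1."""
    env = dict(os.environ if base is None else base)
    env["DS_SKIP_CUDA_CHECK"] = "1"
    return env


if __name__ == "__main__":
    print(cuda_check_skipped({"DS_SKIP_CUDA_CHECK": "true"}))  # False
    child_env = env_with_cuda_check_skipped()
    print(cuda_check_skipped(child_env))  # True
    # A child process started with this environment sees the variable;
    # a build command could be launched the same way.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['DS_SKIP_CUDA_CHECK'])"],
        env=child_env,
        check=True,
    )
```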
## Feature-specific dependencies
Some DeepSpeed features require specific dependencies outside the general dependencies of DeepSpeed.
@@ -70,6 +70,7 @@ cuda_minor_mismatch_ok = {
        "10.2",
    ],
    11: ["11.0", "11.1", "11.2", "11.3", "11.4", "11.5", "11.6", "11.7", "11.8"],
    12: ["12.0", "12.1"],
}
@@ -85,6 +86,13 @@ def assert_no_cuda_mismatch(name=""):
                  f"version torch was compiled with {torch.version.cuda} "
                  "but since the APIs are compatible, accepting this combination")
            return True
        elif os.getenv("DS_SKIP_CUDA_CHECK", "0") == "1":
            print(
                f"{WARNING} DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
                f"version torch was compiled with {torch.version.cuda}. "
                "Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, but it may result in unexpected behavior."
            )
            return True
        raise Exception(f">- DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
                        f"version torch was compiled with {torch.version.cuda}, unable to compile "
                        "cuda/cpp extensions without a matching cuda version.")
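Read together, these hunks give a three-way decision when the installed and torch CUDA versions differ: accept if the pair is treated as API-compatible, accept with a warning if `DS_SKIP_CUDA_CHECK=1`, otherwise raise. The sketch below restates that order as a pure function so it can be tried without CUDA. It is not DeepSpeed's code: the name `check_cuda_versions` and the `ok_table`/`env` parameters are hypothetical, the table copies only the 11 and 12 entries shown above, the compatibility rule (both versions listed under the same major key) is an assumption because the `if` condition itself lies outside the diff, and it raises `RuntimeError` where the original raises a bare `Exception`.

```python
import os
from typing import Mapping, Optional, Sequence

# Subset of cuda_minor_mismatch_ok from the diff (the entry for 10 is truncated there).
CUDA_MINOR_MISMATCH_OK = {
    11: ["11.0", "11.1", "11.2", "11.3", "11.4", "11.5", "11.6", "11.7", "11.8"],
    12: ["12.0", "12.1"],
}


def check_cuda_versions(sys_version: str, torch_version: str,
                        ok_table: Mapping[int, Sequence[str]] = CUDA_MINOR_MISMATCH_OK,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Return "match", "compatible" or "skipped"; raise if the pair is rejected."""
    env = os.environ if env is None else env
    # Compare on "major.minor" only, e.g. "12.1.105" -> "12.1".
    sys_mm = ".".join(sys_version.split(".")[:2])
    torch_mm = ".".join(torch_version.split(".")[:2])
    if sys_mm == torch_mm:
        return "match"
    allowed = ok_table.get(int(sys_mm.split(".")[0]), [])
    # Assumed rule: both versions must be listed under the same major key.
    if sys_mm in allowed and torch_mm in allowed:
        return "compatible"
    # Same environment test as the diff: only the exact string "1" skips the check.
    if env.get("DS_SKIP_CUDA_CHECK", "0") == "1":
        return "skipped"
    raise RuntimeError(f"Installed CUDA version {sys_mm} does not match the "
                       f"version torch was compiled with {torch_version}")


if __name__ == "__main__":
    print(check_cuda_versions("11.8", "11.8", env={}))  # match
    print(check_cuda_versions("11.1", "11.8", env={}))  # compatible
    print(check_cuda_versions("11.8", "12.1", env={"DS_SKIP_CUDA_CHECK": "1"}))  # skipped
    try:
        check_cuda_versions("11.8", "12.1", env={})
    except RuntimeError as err:
        print("rejected:", err)
```

Passing `env` explicitly keeps the function independent of the real process environment, which makes each branch easy to exercise.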