Created by: zhouwei25
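Current problem
The current CUDA error message only reports an error code, which users have to look up on the NVIDIA documentation site themselves. It contains neither a brief description nor detailed information, so it gives users nothing to work with.
As a result, the error messages are unfriendly and users cannot diagnose CUDA problems on their own. This is a significant problem that has generated many issues (more than 10 in total).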
- User issue 1: https://github.com/PaddlePaddle/Paddle/issues/21913
- User issue 2: https://github.com/PaddlePaddle/Paddle/issues/22701
- User issue 3: https://github.com/PaddlePaddle/Paddle/issues/22749
Upgrade plan
Support both a brief message and a detailed message (Recommended Solution). The detailed message (Recommended Solution) is taken from the official NVIDIA website.
Before
--------------------------------------------
Error Message Summary:
--------------------------------------------
Error: cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl, error code : 35, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038 at (D:\1.6.2\paddle\paddle\fluid\platform\gpu_info.cc:67)
After
--------------------------------------------
Error Message Summary:
--------------------------------------------
ExternalError: CUDA runtime error(35): CUDA driver version is insufficient for CUDA runtime version.
Recommended Solution: This indicates that the installed NVIDIA CUDA driver is older than the CUDA runtime library. This is not a supported configuration.Users should install an updated NVIDIA display driver to allow the application to run. at (/Paddle/paddle/fluid/pybind/pybind.cc:1243)
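Usage examples:
- Everything is wrapped inside the macro; the caller only passes in the return value of the CUDA call, so usage is simple: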
PADDLE_ENFORCE_CUDA_SUCCESS(cudaGetDeviceCount(&count));
- Developers can also add a custom error message, which is placed in front of the generated one:
PADDLE_ENFORCE_CUDA_SUCCESS(cudaGetDeviceCount(&count), "cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl.");
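To make the two call shapes above more concrete, here is a minimal sketch of how a macro of this kind could turn an error code into a brief message plus a Recommended Solution. It is not Paddle's actual implementation. Every name in it (`SketchCudaError`, `SketchLookup`, `SketchEnforce`, `SKETCH_ENFORCE_CUDA_SUCCESS`, `FakeGetDeviceCount`) is hypothetical, and a plain `int` stands in for `cudaError_t` so the code builds without the CUDA toolkit. The position of the custom message within the output is also an assumption.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for cudaError_t; 0 plays the role of cudaSuccess.
using SketchCudaError = int;
constexpr SketchCudaError kSketchCudaSuccess = 0;

// Brief message plus detailed message (Recommended Solution) for one code.
struct SketchCudaErrorInfo {
  std::string summary;
  std::string solution;
};

// Hypothetical lookup table. Only the entry quoted in the "After" example
// is filled in; a real table would cover every runtime error code.
inline const SketchCudaErrorInfo* SketchLookup(SketchCudaError code) {
  static const std::unordered_map<SketchCudaError, SketchCudaErrorInfo> kTable = {
      {35, SketchCudaErrorInfo{
               "CUDA driver version is insufficient for CUDA runtime version.",
               "This indicates that the installed NVIDIA CUDA driver is older "
               "than the CUDA runtime library. This is not a supported "
               "configuration. Users should install an updated NVIDIA display "
               "driver to allow the application to run."}}};
  auto it = kTable.find(code);
  return it == kTable.end() ? nullptr : &it->second;
}

// Returns silently on success; otherwise throws with the assembled message.
// Putting the custom message before the CUDA part is an assumption.
inline void SketchEnforce(const char* file, int line, SketchCudaError status,
                          const std::string& custom_message = "") {
  if (status == kSketchCudaSuccess) return;
  std::ostringstream os;
  os << "ExternalError: ";
  if (!custom_message.empty()) os << custom_message << " ";
  os << "CUDA runtime error(" << status << ")";
  if (const SketchCudaErrorInfo* info = SketchLookup(status)) {
    os << ": " << info->summary
       << "\nRecommended Solution: " << info->solution;
  }
  os << " at (" << file << ":" << line << ")";
  throw std::runtime_error(os.str());
}

// The macro only exists to capture the call site; the custom message is optional.
#define SKETCH_ENFORCE_CUDA_SUCCESS(...) \
  SketchEnforce(__FILE__, __LINE__, __VA_ARGS__)

// Hypothetical failing call that mimics cudaGetDeviceCount returning code 35.
inline SketchCudaError FakeGetDeviceCount(int* count) {
  *count = 0;
  return 35;
}
```

With this sketch, `SKETCH_ENFORCE_CUDA_SUCCESS(FakeGetDeviceCount(&count));` and `SKETCH_ENFORCE_CUDA_SUCCESS(FakeGetDeviceCount(&count), "cudaGetDeviceCount failed.");` both throw a `std::runtime_error` whose text contains the brief message and the Recommended Solution. A code that is missing from the table falls back to reporting the number alone.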