PaddleCheckError: cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl, error code : 3
Created by: brbheart
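- Version and environment: 1) PaddleCloud, paddle-fluid-v1.6.1; 2) GPU: V100
- Training setup: 1) single machine, single GPU
- Reproduction: On PaddleCloud, the same code trains correctly on a small training set (2 images), but fails with the error below when training on the full dataset.
- Problem description: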
Traceback (most recent call last):
  File "train.py", line 777, in
    train(args)
  File "train.py", line 164, in train
    place = fluid.CUDAPlace(0)
paddle.fluid.core_avx.EnforceNotMet:
C++ Call Stacks (More useful to developers):
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::GetCUDADeviceCount()
Error Message Summary:
PaddleCheckError: cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl, error code : 3, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038: initialization error at [/paddle/paddle/fluid/platform/gpu_info.cc:67]
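To check whether the failure is specific to Paddle, the same CUDA runtime call can be made directly from Python. The sketch below is an illustration and not part of the original report: it assumes the CUDA runtime library (`libcudart`) is installed and discoverable via `ctypes.util.find_library`, and the function name `probe_cuda_device_count` is hypothetical. If it is run inside the same PaddleCloud job and returns code 3 ("initialization error"), then `cudaGetDeviceCount` itself is failing, independent of `fluid.CUDAPlace(0)`.

```python
import ctypes
import ctypes.util


def probe_cuda_device_count():
    """Hypothetical helper: call cudaGetDeviceCount directly via ctypes.

    Returns None if no CUDA runtime library can be located or loaded,
    otherwise a tuple (error_code, error_string, device_count).
    """
    # Assumption: the CUDA runtime is installed as libcudart and is
    # visible to ctypes.util.find_library (e.g. via the ldconfig cache).
    lib_name = ctypes.util.find_library("cudart")
    if lib_name is None:
        return None
    try:
        cudart = ctypes.CDLL(lib_name)
    except OSError:
        return None

    count = ctypes.c_int(0)
    # C signature: cudaError_t cudaGetDeviceCount(int *count)
    err = cudart.cudaGetDeviceCount(ctypes.byref(count))

    # C signature: const char *cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.restype = ctypes.c_char_p
    raw = cudart.cudaGetErrorString(err)
    msg = raw.decode() if raw else "unknown error"
    return err, msg, count.value


if __name__ == "__main__":
    result = probe_cuda_device_count()
    if result is None:
        print("libcudart not found")
    else:
        err, msg, n = result
        # A code of 3 matches the "initialization error" in the report above.
        print("cudaGetDeviceCount -> code {} ({}), devices: {}".format(err, msg, n))
```

Comparing the output of this probe between the small-data run and the full-data run can help narrow down whether something in the environment differs when the error occurs.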