unhandled cuda error during multi-GPU pretraining, even though cuDNN and NCCL are both installed
Created by: fyubang
W0429 09:42:47.633998 34675 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.0, Runtime API Version: 9.0
W0429 09:42:47.638882 34675 device_context.cc:269] device: 0, cuDNN Version: 7.0.
ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0429 09:42:51.247493 34675 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
Traceback (most recent call last):
  File "./train.py", line 357, in <module>
    train(args)
  File "./train.py", line 272, in train
    trainer_id=nccl2_trainer_id)
  File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/parallel_executor.py", line 134, in __init__
    self._compiled_program._compile(place=self._place, scope=self._scope)
  File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/compiler.py", line 307, in _compile
    scope=self._scope)
  File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/compiler.py", line 278, in _compile_data_parallel
    self._exec_strategy, self._build_strategy, self._graph)
paddle.fluid.core.EnforceNotMet: unhandled cuda error at [/paddle/paddle/fluid/platform/nccl_helper.h:113]
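Environment: Python 3.5, CUDA 9.0. The error above appears when running ERNIE multi-GPU pretraining. cuDNN and NCCL2 are both installed, but the error message is too vague to act on. Has anyone else run into this?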
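
Since the message comes from nccl_helper.h, one way to get more detail might be to turn on NCCL's own logging through the NCCL_DEBUG environment variable and run the same command again. Below is a minimal sketch of that idea, not a confirmed fix. The helper names with_nccl_debug and rerun_training are hypothetical, and it assumes training is started from the command line (for example `python ./train.py` plus the usual arguments).

```python
import os
import subprocess
import sys


def with_nccl_debug(base_env, level="INFO"):
    """Return a copy of base_env with NCCL's own logging turned on.

    NCCL_DEBUG is a standard NCCL environment variable; the helper itself
    is hypothetical and only for illustration.
    """
    env = dict(base_env)  # copy, so the caller's environment is not modified
    env["NCCL_DEBUG"] = level
    return env


def rerun_training(argv):
    """Run the given training command with NCCL logging enabled."""
    return subprocess.call(argv, env=with_nccl_debug(os.environ))


if __name__ == "__main__":
    # The original training command is passed after the script name.
    # Nothing is launched when no command is given.
    if len(sys.argv) > 1:
        sys.exit(rerun_training(sys.argv[1:]))
```

Usage, assuming the sketch is saved as nccl_debug.py (a hypothetical file name): `python nccl_debug.py python ./train.py` followed by the original arguments. The extra NCCL log lines printed before the EnforceNotMet error may show which CUDA call actually failed.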