unhandled cuda error during multi-GPU pretraining, with cuDNN and NCCL both installed (#113) · Issue · PaddlePaddle / ERNIE

Created by: fyubang

W0429 09:42:47.633998 34675 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.0, Runtime API Version: 9.0
W0429 09:42:47.638882 34675 device_context.cc:269] device: 0, cuDNN Version: 7.0.
ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0429 09:42:51.247493 34675 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
Traceback (most recent call last):
  File "./train.py", line 357, in <module>
    train(args)
  File "./train.py", line 272, in train
    trainer_id=nccl2_trainer_id)
  File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/parallel_executor.py", line 134, in __init__
    self._compiled_program._compile(place=self._place, scope=self._scope)
  File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/compiler.py", line 307, in _compile
    scope=self._scope)
  File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/compiler.py", line 278, in _compile_data_parallel
    self._exec_strategy, self._build_strategy, self._graph)
paddle.fluid.core.EnforceNotMet: unhandled cuda error at [/paddle/paddle/fluid/platform/nccl_helper.h:113]

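Environment: Python 3.5, CUDA 9.0. I hit this error when running ERNIE multi-GPU pretraining. cuDNN and NCCL2 are both installed, and the error message gives very little to go on. Has anyone else run into this?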
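Since the message from nccl_helper.h only says "unhandled cuda error", one generic way to get more detail is to turn on NCCL's own logging through its `NCCL_DEBUG` environment variable before launching training. The snippet below is a minimal sketch of that idea, not something taken from ERNIE or Paddle: the helper names `nccl_debug_env` and `run_with_nccl_debug` are made up for illustration, and the actual training command and its arguments are assumptions you would replace with your own.

```python
import os
import subprocess
import sys


def nccl_debug_env(base_env=None, level="INFO"):
    """Return a copy of the environment with NCCL's own logging enabled.

    `level` is passed through as NCCL_DEBUG (for example WARN or INFO).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = level  # read by NCCL when it initializes
    return env


def run_with_nccl_debug(cmd):
    """Run `cmd` (a list of arguments) with NCCL_DEBUG set; return its exit code."""
    return subprocess.call(cmd, env=nccl_debug_env())


# Hypothetical usage; substitute the real training script and arguments:
#   run_with_nccl_debug([sys.executable, "./train.py"])
```

With this in place NCCL prints its initialization steps to the console, which may show which call fails underneath the generic error.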
