"cudaSuccess == err (0 vs. 8)" error on v0.8.0b1
Created by: alvations
I installed PaddlePaddle from the .deb file at https://github.com/baidu/Paddle/releases/download/V0.8.0b1/paddle-gpu-0.8.0b1-Linux.deb
I have a GTX 1080 with CUDA 8.0 and cuDNN v5.1 installed, but without the NVIDIA Accelerated Graphics Driver.
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
I've set the shell variables:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
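As a sanity check on the variables above, a minimal Python sketch can confirm that both CUDA directories actually ended up in LD_LIBRARY_PATH. The helper name `missing_library_dirs` is made up for illustration and is not part of Paddle or CUDA; it only inspects the string and does not check that the libraries load.

```python
import os


def missing_library_dirs(ld_library_path, required_dirs):
    """Return the required directories absent from an LD_LIBRARY_PATH-style string."""
    # Skip empty entries (e.g. a leading ':' when the variable was unset before)
    # and normalise paths so a trailing slash does not cause a false miss.
    entries = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
    return [d for d in required_dirs if os.path.normpath(d) not in entries]


# The two directories exported above.
REQUIRED = ["/usr/local/cuda/lib64", "/usr/local/cuda/extras/CUPTI/lib64"]

if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("missing:", missing_library_dirs(current, REQUIRED))
```

An empty list means both directories are on the search path of the shell that runs the script.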
When I run the demo from the Paddle GitHub repo, I get a "[hl_gpu_apply_unary_op failed] CUDA error: invalid device function" error. Is there some way to resolve this?
The problem from #3 (closed), #18 (closed) and #95 (closed) still seems to occur, even though the CMake file in the latest release should have been fixed (#107). I'm getting this error:
~/Paddle/demo/image_classification$ bash train.sh
I1005 13:07:51.456790 894 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I1005 13:07:55.145606 894 Util.cpp:126] Calling runInitFunctions
I1005 13:07:55.145925 894 Util.cpp:139] Call runInitFunctions done.
[INFO 2016-10-05 13:07:55,313 layers.py:1620] channels=3 size=3072
[INFO 2016-10-05 13:07:55,313 layers.py:1620] output size for __conv_0__ is 32
[INFO 2016-10-05 13:07:55,315 layers.py:1620] channels=64 size=65536
[INFO 2016-10-05 13:07:55,315 layers.py:1620] output size for __conv_1__ is 32
[INFO 2016-10-05 13:07:55,316 layers.py:1681] output size for __pool_0__ is 16*16
[INFO 2016-10-05 13:07:55,317 layers.py:1620] channels=64 size=16384
[INFO 2016-10-05 13:07:55,317 layers.py:1620] output size for __conv_2__ is 16
[INFO 2016-10-05 13:07:55,319 layers.py:1620] channels=128 size=32768
[INFO 2016-10-05 13:07:55,319 layers.py:1620] output size for __conv_3__ is 16
[INFO 2016-10-05 13:07:55,320 layers.py:1681] output size for __pool_1__ is 8*8
[INFO 2016-10-05 13:07:55,321 layers.py:1620] channels=128 size=8192
[INFO 2016-10-05 13:07:55,321 layers.py:1620] output size for __conv_4__ is 8
[INFO 2016-10-05 13:07:55,323 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 13:07:55,323 layers.py:1620] output size for __conv_5__ is 8
[INFO 2016-10-05 13:07:55,324 layers.py:1620] channels=256 size=16384
[INFO 2016-10-05 13:07:55,325 layers.py:1620] output size for __conv_6__ is 8
[INFO 2016-10-05 13:07:55,326 layers.py:1681] output size for __pool_2__ is 4*4
[INFO 2016-10-05 13:07:55,327 layers.py:1620] channels=256 size=4096
[INFO 2016-10-05 13:07:55,327 layers.py:1620] output size for __conv_7__ is 4
[INFO 2016-10-05 13:07:55,328 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 13:07:55,329 layers.py:1620] output size for __conv_8__ is 4
[INFO 2016-10-05 13:07:55,330 layers.py:1620] channels=512 size=8192
[INFO 2016-10-05 13:07:55,330 layers.py:1620] output size for __conv_9__ is 4
[INFO 2016-10-05 13:07:55,332 layers.py:1681] output size for __pool_3__ is 2*2
[INFO 2016-10-05 13:07:55,332 layers.py:1681] output size for __pool_4__ is 1*1
[INFO 2016-10-05 13:07:55,335 networks.py:1125] The input order is [image, label]
[INFO 2016-10-05 13:07:55,335 networks.py:1132] The output order is [__cost_0__]
I1005 13:07:55.342417 894 Trainer.cpp:170] trainer mode: Normal
F1005 13:07:55.343267 894 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
@ 0x7f1c681cadaa (unknown)
@ 0x7f1c681cace4 (unknown)
@ 0x7f1c681ca6e6 (unknown)
@ 0x7f1c681cd687 (unknown)
@ 0x78a939 hl_gpu_apply_unary_op<>()
@ 0x7536bf paddle::BaseMatrixT<>::applyUnary<>()
@ 0x7532a9 paddle::BaseMatrixT<>::applyUnary<>()
@ 0x73d82f paddle::BaseMatrixT<>::zero()
@ 0x66d2ae paddle::Parameter::enableType()
@ 0x669acc paddle::parameterInitNN()
@ 0x66bd13 paddle::NeuralNetwork::init()
@ 0x679ed3 paddle::GradientMachine::create()
@ 0x6a6355 paddle::TrainerInternal::init()
@ 0x6a2697 paddle::Trainer::init()
@ 0x53a1f5 main
@ 0x7f1c673d6f45 (unknown)
@ 0x545ae5 (unknown)
@ (nil) (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81: 894 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!
I have also tried recompiling from source, and the same error occurs. BTW, the quick_start demo works.
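In case it helps narrow this down: "invalid device function" is commonly reported when a binary contains no kernel code that the GPU can run. Below is a minimal Python sketch of that compatibility rule, assuming the GTX 1080 reports compute capability 6.1. The architecture list is hypothetical, made up for illustration, and is not taken from Paddle's actual build settings; the function names are also mine. The rule it encodes: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any device of equal or higher capability.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_52' or 'compute_61' into (kind, major, minor)."""
    kind, _, digits = tag.partition("_")
    if kind not in ("sm", "compute") or len(digits) < 2 or not digits.isdigit():
        raise ValueError("unrecognised architecture tag: %r" % tag)
    # The last digit is the minor version, everything before it the major.
    return kind, int(digits[:-1]), int(digits[-1])


def device_is_covered(device_cc, build_archs):
    """Return True if any build architecture yields code usable on the device."""
    dev_major, dev_minor = device_cc
    for tag in build_archs:
        kind, major, minor = parse_arch(tag)
        if kind == "sm":
            # Binary code: same major version, minor not newer than the device.
            if major == dev_major and minor <= dev_minor:
                return True
        elif (major, minor) <= (dev_major, dev_minor):
            # PTX: can be JIT-compiled for any equal or newer device.
            return True
    return False


GTX_1080 = (6, 1)  # assumed compute capability
hypothetical_build = ["sm_30", "sm_35", "sm_50", "sm_52"]  # illustration only

if __name__ == "__main__":
    print(device_is_covered(GTX_1080, hypothetical_build))
    print(device_is_covered(GTX_1080, hypothetical_build + ["sm_61"]))
```

If the real list of architectures in the build resembles the hypothetical one (nothing for 6.x and no PTX), that would be consistent with the failure above, but I have not confirmed what the release binary was compiled for.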