The same program runs fine on a single GPU, but with multiple GPUs (e.g. export CUDA_VISIBLE_DEVICES=0,1,2) it crashes with a segmentation fault
Created by: 597477803
Device count: 3
Num train examples: 96220
Max train steps: 3006
Num warmup steps: 0
memory_optimize is deprecated. Use CompiledProgram and Executor
Theoretical memory usage in training: 5334.724 - 5588.759 MB
W1111 17:19:20.743114 159621 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.2, Runtime API Version: 8.0
W1111 17:19:20.746721 159621 device_context.cc:267] device: 0, cuDNN Version: 7.0.
W1111 17:19:20.746817 159621 device_context.cc:293] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.1, but CUDNN version in your machine is 7.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
Load pretraining parameters from ./lym/model//params.
I1111 17:19:22.966073 159621 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 3. And the Program will be copied 3 copies
script/run_lcqmc.sh: line 30: 159621 Segmentation fault python -u run_classifier.py --use_cuda true --verbose true --do_train true --do_val true --do_test true --batch_size 32 --init_pretraining_params ${MODEL_PATH}/params --train_set ${TASK_DATA_PATH}/lcqmc/train.tsv --dev_set ${TASK_DATA_PATH}/lcqmc/dev.tsv --test_set ${TASK_DATA_PATH}/lcqmc/test.tsv --vocab_path config/vocab.txt --checkpoints ./checkpoints --save_steps 1000 --weight_decay 0.0 --warmup_proportion 0.0 --validation_steps 100 --epoch 3 --max_seq_len 128 --ernie_config_path config/ernie_config.json --learning_rate 2e-5 --skip_steps 10 --num_iteration_per_drop_scope 1 --num_labels 2 --random_seed 1