Single machine with multiple GPUs: GPU 1 is specified (and idle), yet the error is reported on GPU 0.
Created by: lalala805
A question about single-machine multi-GPU training: I set export CUDA_VISIBLE_DEVICES=1 before launching, and GPU 1 is completely idle. Training still fails, and from the error message it looks like GPU 0 is running out of memory (GPU 0 is occupied by another model). Since I explicitly pointed the job at GPU 1, what am I getting wrong?
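My understanding (please correct me if this is wrong) is that with CUDA_VISIBLE_DEVICES=1 only physical card 1 should be visible to the process, and CUDA renumbers that card as device 0 inside the process, so "device: 0" in the log may not mean physical GPU 0. As a sanity check I can run something like the sketch below (just my own quick check, nothing framework-specific):

```bash
# Sanity-check sketch: confirm the mask is set in the environment the training
# script inherits, and watch which physical card actually allocates memory.
export CUDA_VISIBLE_DEVICES=1
python -c "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"
nvidia-smi   # run in another terminal while training to see per-card usage
```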
The error output is attached below.
- export FLAGS_sync_nccl_allreduce=1
- FLAGS_sync_nccl_allreduce=1
- export CUDA_VISIBLE_DEVICES=1
- CUDA_VISIBLE_DEVICES=1
- python -u run_classifier.py --use_cuda true --verbose true --do_train true --do_val false --do_test false --batch_size 16 --init_pretraining_params ../model/params --train_set ../task_data/dev/a.tsv --dev_set ../task_data/dev/b.tsv --test_set ../task_data/dev/c.tsv --vocab_path ../model/vocab.txt --checkpoints ./checkpoints --save_steps 10 --weight_decay 0.0 --warmup_proportion 0.0 --validation_steps 5 --epoch 10 --max_seq_len 128 --ernie_config_path ../model/ernie_config.json --learning_rate 2e-5 --skip_steps 10 --num_iteration_per_drop_scope 1 --num_labels 2 --random_seed 1

----------- Configuration Arguments -----------
batch_size: 16
checkpoints: ./checkpoints
dev_set: ../task_data/dev/b.tsv
do_lower_case: True
do_test: False
do_train: True
do_val: False
enable_ce: False
epoch: 10
ernie_config_path: ../model/ernie_config.json
in_tokens: False
init_checkpoint: None
init_pretraining_params: ../model/params
label_map_config: None
learning_rate: 2e-05
loss_scaling: 1.0
lr_scheduler: linear_warmup_decay
max_seq_len: 128
metrics: True
num_iteration_per_drop_scope: 1
num_labels: 2
random_seed: 1
save_steps: 10
shuffle: True
skip_steps: 10
test_set: ../task_data/dev/c.tsv
train_set: ../task_data/dev/a.tsv
use_cuda: True
use_fast_executor: False
use_fp16: False
validation_steps: 5
verbose: True
vocab_path: ../model/vocab.txt
warmup_proportion: 0.0
weight_decay: 0.0
attention_probs_dropout_prob: 0.1
hidden_act: relu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
max_position_embeddings: 513
num_attention_heads: 12
num_hidden_layers: 12
type_vocab_size: 2
vocab_size: 18000
Device count: 1
Num train examples: 7976
Max train steps: 4985
Num warmup steps: 0
memory_optimize is deprecated. Use CompiledProgram and Executor
Theoretical memory usage in training: 4144.448 - 4341.803 MB
W0812 11:06:08.425791 92102 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.2, Runtime API Version: 9.0
W0812 11:06:08.428957 92102 device_context.cc:269] device: 0, cuDNN Version: 7.0.
W0812 11:06:08.428998 92102 device_context.cc:293] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.3, but CUDNN version in your machine is 7.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
Load pretraining parameters from ../model/params.
ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0812 11:06:09.157330 92102 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the OriginProgram() method!
I0812 11:06:09.588352 92102 build_strategy.cc:285] SeqOnlyAllReduceOps:0, num_trainers:1
W0812 11:06:11.642266 92102 system_allocator.cc:125] Cannot malloc 6652.83 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use or FLAGS_initial_gpu_memory_in_mb or FLAGS_reallocate_gpu_memory_in_mb environment variable to a lower value. Current FLAGS_fraction_of_gpu_memory_to_use value is 0.92. Current FLAGS_initial_gpu_memory_in_mb value is 0. Current FLAGS_reallocate_gpu_memory_in_mb value is 0
F0812 11:06:11.642613 92102 legacy_allocator.cc:200] Cannot allocate 2.953125MB in GPU 0, available 436.937500MB, total 7981694976, GpuMinChunkSize 256.000000B, GpuMaxChunkSize 6.496908GB, GPU memory used: 6.495386GB
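For what it's worth, the system_allocator warning above suggests lowering the allocator flags it names; below is a minimal sketch of that workaround (the 0.5 and 2048 values are placeholders I picked, not values taken from the log), though I'm not sure it addresses the device-selection question itself:

```bash
# Sketch of the workaround hinted at by the warning above; values are placeholders.
export FLAGS_fraction_of_gpu_memory_to_use=0.5
# or pin the initial pool size in MB instead of using a fraction:
# export FLAGS_initial_gpu_memory_in_mb=2048
```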