GPU Memory error while running script/zh_task/ernie_base/run_drcd.sh
Created by: cibinjohn
I was trying to train a custom machine comprehension model on SQuAD v1.1 data by running script/zh_task/ernie_base/run_drcd.sh, and hit the GPU out-of-memory error below. Any help would be appreciated.
- export FLAGS_eager_delete_tensor_gb=0
- FLAGS_eager_delete_tensor_gb=0
- export FLAGS_sync_nccl_allreduce=1
- FLAGS_sync_nccl_allreduce=1
- export CUDA_VISIBLE_DEVICES=0,1,2,3
- CUDA_VISIBLE_DEVICES=0,1,2,3
- python -u run_mrc.py --use_cuda true --train_set /home/ubuntu/cibin/squad_v1_1__data/train.json --batch_size 16 --in_tokens false --use_fast_executor true --checkpoints ./checkpoints --vocab_path /home/ubuntu/cibin/ERNIE/pretrained_model/vocab.txt --ernie_config_path /home/ubuntu/cibin/ERNIE/pretrained_model/ernie_config.json --do_train true --do_val true --do_test true --verbose true --save_steps 1000 --validation_steps 100 --warmup_proportion 0.0 --weight_decay 0.01 --epoch 2 --max_seq_len 512 --do_lower_case true --doc_stride 128 --dev_set /home/ubuntu/cibin/squad_v1_1__data/dev.json --test_set /home/ubuntu/cibin/squad_v1_1__data/test.json --learning_rate 5e-5 --num_iteration_per_drop_scope 1 --init_pretraining_params /home/ubuntu/cibin/ERNIE/pretrained_model/params --skip_steps 10
attention_probs_dropout_prob: 0.1
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
sent_type_vocab_size: 4
task_type_vocab_size: 16
vocab_size: 30522
Device count: 4
Num train examples: 1483
Max train steps: 46
Num warmup steps: 0
memory_optimize is deprecated. Use CompiledProgram and Executor
Theoretical memory usage in training: 13971.085 - 14636.375 MB
W0819 05:09:46.604622 511 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0819 05:09:46.606772 511 device_context.cc:267] device: 0, cuDNN Version: 7.6.
Load pretraining parameters from /home/ubuntu/cibin/libor/github/ERNIE/pretrained_model/params.
I0819 05:09:49.962049 511 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I0819 05:09:51.959648 511 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
W0819 05:09:55.706979 569 system_allocator.cc:121] Cannot malloc 9770.01 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use or FLAGS_initial_gpu_memory_in_mb or FLAGS_reallocate_gpu_memory_in_mb environment variable to a lower value. Current FLAGS_fraction_of_gpu_memory_to_use value is 0.92. Current FLAGS_initial_gpu_memory_in_mb value is 0. Current FLAGS_reallocate_gpu_memory_in_mb value is 0
F0819 05:09:55.707295 569 legacy_allocator.cc:201] Cannot allocate 139.869873 MB in GPU 1, available 648.500000 MB, total 11721506816, GpuMinChunkSize 256.000000 B, GpuMaxChunkSize 9.541025 GB, GPU memory used: 9.507799 GB
*** Check failure stack trace: ***
@ 0x7f4d748e639d google::LogMessage::Fail()
@ 0x7f4d748e9e4c google::LogMessage::SendToLog()
@ 0x7f4d748e5ec3 google::LogMessage::Flush()
@ 0x7f4d748eb35e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4d768c77d4 paddle::memory::legacy::Alloc<>()
@ 0x7f4d768c7ab5 paddle::memory::allocation::LegacyAllocator::AllocateImpl()
@ 0x7f4d768bbbd5 paddle::memory::allocation::AllocatorFacade::Alloc()
@ 0x7f4d768bbd5a paddle::memory::allocation::AllocatorFacade::AllocShared()
@ 0x7f4d764b489c paddle::memory::AllocShared()
@ 0x7f4d7688d924 paddle::framework::Tensor::mutable_data()
@ 0x7f4d74b90ba5 paddle::operators::MatMulGradKernel<>::MatMul()
@ 0x7f4d74b90e1f paddle::operators::MatMulGradKernel<>::CalcInputGrad()
@ 0x7f4d74b912e7 paddle::operators::MatMulGradKernel<>::Compute()
@ 0x7f4d74b916f3 _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EINS0_9operators16MatMulGradKernelINS7_17CUDADeviceContextEfEENSA_ISB_dEENSA_ISB_NS7_7float16EEEEEclEPKcSI_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4
@ 0x7f4d7682f187 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f4d7682f561 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f4d7682cb5c paddle::framework::OperatorBase::Run()
@ 0x7f4d7662805a paddle::framework::details::ComputationOpHandle::RunImpl()
@ 0x7f4d7661aa00 paddle::framework::details::OpHandleBase::Run()
@ 0x7f4d765fbd76 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7f4d765fa9df paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7f4d765fad9f _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7f4d749d3b53 std::_Function_handler<>::_M_invoke()
@ 0x7f4d74869c47 std::__future_base::_State_base::_M_do_set()
@ 0x7f4dc650da99 __pthread_once_slow
@ 0x7f4d765f6422 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7f4d7486b1c4 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7f4da839cc80 (unknown)
@ 0x7f4dc65066ba start_thread
@ 0x7f4dc623c41d clone
@ (nil) (unknown)
script/zh_task/ernie_base/run_drcd.sh: line 50: 511 Aborted (core dumped) python -u run_mrc.py --use_cuda true --train_set ${TASK_DATA_PATH1}/train.json --batch_size 16 --in_tokens false --use_fast_executor true --checkpoints ./checkpoints --vocab_path ${MODEL_PATH}/vocab.txt --ernie_config_path ${MODEL_PATH}/ernie_config.json --do_train true --do_val true --do_test true --verbose true --save_steps 1000 --validation_steps 100 --warmup_proportion 0.0 --weight_decay 0.01 --epoch 2 --max_seq_len 512 --do_lower_case true --doc_stride 128 --dev_set ${TASK_DATA_PATH}/dev.json --test_set ${TASK_DATA_PATH}/test.json --learning_rate 5e-5 --num_iteration_per_drop_scope 1 --init_pretraining_params ${MODEL_PATH}/params --skip_steps 10
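For reference, the allocator warning above names the knobs to try first. A minimal sketch of mitigations, assuming the flag names from the error message and PaddlePaddle's standard GPU memory flags; the specific batch size and sequence length values below are illustrative guesses, not tested settings:

```shell
# Assumption: values are illustrative starting points, not verified settings.

# Let the allocator grow on demand instead of reserving 92% of each GPU
# up front (the log shows FLAGS_fraction_of_gpu_memory_to_use=0.92).
export FLAGS_fraction_of_gpu_memory_to_use=0.5

# Keep eager deletion of intermediate tensors on, as run_drcd.sh already does.
export FLAGS_eager_delete_tensor_gb=0

# Shrink the per-GPU working set: halve the batch size and/or trim the
# sequence length, then pass these to run_mrc.py in place of the
# original --batch_size 16 --max_seq_len 512.
BATCH_SIZE=8        # was 16
MAX_SEQ_LEN=384     # was 512; doc_stride 128 still covers long contexts

echo "retrying with batch_size=${BATCH_SIZE}, max_seq_len=${MAX_SEQ_LEN}"
```

With the theoretical usage estimated at roughly 14 GB per device and only about 11 GB per card available, some combination of a lower batch size and shorter sequences is likely needed regardless of the allocator flags.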