CUDA out of memory during training?
Created by: bolt163
About my setup: the machine has only 4 GPUs with 12 GB of memory each, so I modified run_train.sh accordingly. The script contents are as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -u train.py \
--batch_size=64 \
--trainer_count=4 \
--num_passes=50 \
--num_proc_data=16 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=1024 \
--num_iter_print=100 \
--learning_rate=5e-4 \
--max_duration=27.0 \
--min_duration=0.0 \
--test_off=False \
--use_sortagrad=True \
--use_gru=True \
--use_gpu=True \
--is_local=True \
--share_rnn_weights=False \
--train_manifest='data/aishell/manifest.train' \
--dev_manifest='data/aishell/manifest.dev' \
--mean_std_path='data/aishell/mean_std.npz' \
--vocab_path='data/aishell/vocab.txt' \
--output_model_dir='./checkpoints/aishell' \
--augment_conf_path='conf/augmentation.config' \
--specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped'
if [ $? -ne 0 ]; then
    echo "Failed in training!"
    exit 1
fi
exit 0
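In case it is relevant: the two flags most directly tied to peak GPU memory per trainer are batch_size and max_duration (a larger batch and longer utterances both enlarge the activations each GPU must hold). A reduced-memory variant of the same launch would look roughly like the sketch below; the specific values are only an illustration, not what I actually ran:

# Hypothetical lower-memory settings (illustrative values only).
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -u train.py \
--batch_size=16 \
--trainer_count=4 \
--max_duration=16.0 \
# ...all remaining flags unchanged from the script above...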
~/DeepSpeech/examples/aishell> sh run_train.sh
----------- Configuration Arguments -----------
augment_conf_path: conf/augmentation.config
batch_size: 64
dev_manifest: data/aishell/manifest.dev
init_model_path: None
is_local: 1
learning_rate: 0.0005
max_duration: 27.0
mean_std_path: data/aishell/mean_std.npz
min_duration: 0.0
num_conv_layers: 2
num_iter_print: 100
num_passes: 50
num_proc_data: 16
num_rnn_layers: 3
output_model_dir: ./checkpoints/aishell
rnn_layer_size: 1024
share_rnn_weights: 0
shuffle_method: batch_shuffle_clipped
specgram_type: linear
test_off: 0
train_manifest: data/aishell/manifest.train
trainer_count: 4
use_gpu: 1
use_gru: 1
use_sortagrad: 1
vocab_path: data/aishell/vocab.txt
I0315 16:47:44.366181 11850 Util.cpp:166] commandline: --use_gpu=1 --rnn_use_batch=True --log_clipping=True --trainer_count=4
[INFO 2018-03-15 16:47:46,743 layers.py:2714] output for conv_0: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-15 16:47:46,744 layers.py:3282] output for batch_norm_0: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-15 16:47:46,744 layers.py:7454] output for scale_sub_region_0: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-15 16:47:46,745 layers.py:2714] output for conv_1: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-15 16:47:46,746 layers.py:3282] output for batch_norm_1: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-15 16:47:46,746 layers.py:7454] output for scale_sub_region_1: c = 32, h = 41, w = 54, size = 70848
I0315 16:47:46.766090 11850 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=4 numDevices=4
I0315 16:47:46.877454 11850 GradientMachine.cpp:94] Initing parameters..
I0315 16:47:50.904826 11850 GradientMachine.cpp:101] Init parameters done.
................................................................................................... Pass: 0, Batch: 100, TrainCost: 63.639388
................................................................................................... Pass: 0, Batch: 200, TrainCost: 63.148882
................................................................................................... Pass: 0, Batch: 300, TrainCost: 64.747351
................................................................................................... Pass: 0, Batch: 400, TrainCost: 54.954166
................................................................................................... Pass: 0, Batch: 500, TrainCost: 38.613670
................................................................................................... Pass: 0, Batch: 600, TrainCost: 30.979173
................................................................................................... Pass: 0, Batch: 700, TrainCost: 26.576287
................................................................................................... Pass: 0, Batch: 800, TrainCost: 24.339529
................................................................................................... Pass: 0, Batch: 900, TrainCost: 22.288584
...............................
F0315 17:24:56.116675 11921 hl_cuda_device.cc:273] Check failed: cudaSuccess == cudaStat (0 vs. 2) Cuda Error: out of memory
*** Check failure stack trace: ***
    @     0x7fa000f5adad  google::LogMessage::Fail()
    @     0x7fa000f5ef6c  google::LogMessage::SendToLog()
    @     0x7fa000f5a8d3  google::LogMessage::Flush()
    @     0x7fa000f5f9be  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fa000f1bf84  hl_malloc_device()
    @     0x7fa000dd4a66  paddle::GpuAllocator::alloc()
    @     0x7fa000dc85ff  paddle::PoolAllocator::alloc()
    @     0x7fa000dc8004  paddle::GpuMemoryHandle::GpuMemoryHandle()
    @     0x7fa000da05c4  paddle::GpuMatrix::resize()
    @     0x7fa000db4839  paddle::Matrix::resizeOrCreate()
    @     0x7fa000c15cb2  paddle::Layer::resetSpecifyOutput()
    @     0x7fa000c15f64  paddle::Layer::resetOutput()
    @     0x7fa000c5e384  paddle::CudnnBatchNormLayer::forward()
    @     0x7fa000cc80fd  paddle::NeuralNetwork::forward()
    @     0x7fa000cd3334  paddle::TrainerThread::forward()
    @     0x7fa000cd4625  paddle::TrainerThread::computeThread()
    @     0x7fa04a58b870  (unknown)
    @     0x7fa055d02dc5  start_thread
    @     0x7fa05532729d  __clone
    @              (nil)  (unknown)
run_train.sh: line 35: 11850 Aborted  CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py --batch_size=64 --trainer_count=4 --num_passes=50 --num_proc_data=16 --num_conv_layers=2 --num_rnn_layers=3 --rnn_layer_size=1024 --num_iter_print=100 --learning_rate=5e-4 --max_duration=27.0 --min_duration=0.0 --test_off=False --use_sortagrad=True --use_gru=True --use_gpu=True --is_local=True --share_rnn_weights=False --train_manifest='data/aishell/manifest.train' --dev_manifest='data/aishell/manifest.dev' --mean_std_path='data/aishell/mean_std.npz' --vocab_path='data/aishell/vocab.txt' --output_model_dir='./checkpoints/aishell' --augment_conf_path='conf/augmentation.config' --specgram_type='linear' --shuffle_method='batch_shuffle_clipped'
Failed in training!

So training aborts partway through the first pass... it looks like GPU memory keeps growing (a memory leak?). Here is the nvidia-smi output shortly after the crash:
Thu Mar 15 17:26:56 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20 Driver Version: 375.20 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:0D:00.0 Off | 0 |
| N/A 37C P0 62W / 235W | 3022MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:0E:00.0 Off | 0 |
| N/A 36C P0 62W / 235W | 2967MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 0000:30:00.0 Off | 0 |
| N/A 39C P0 62W / 235W | 3032MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 0000:33:00.0 Off | 0 |
| N/A 35C P0 62W / 235W | 2966MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
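If it helps with diagnosis, one way to tell whether usage creeps up over batches (a leak) or spikes on one long utterance is to log GPU memory periodically while training runs. A minimal sketch using standard nvidia-smi query options (gpu_mem.log is just an arbitrary file name, and the 10-second interval is arbitrary too):

# Log per-GPU memory use every 10 seconds in the background while training runs.
nvidia-smi --query-gpu=timestamp,index,memory.used,memory.total \
           --format=csv -l 10 >> gpu_mem.log &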