CUDA out of memory during training?
Created by: bolt163
About my setup: the machine has only 4 GPUs with 12 GB of memory each, so I modified run_train.sh accordingly. The script contents are as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -u train.py \
--batch_size=64 \
--trainer_count=4 \
--num_passes=50 \
--num_proc_data=16 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=1024 \
--num_iter_print=100 \
--learning_rate=5e-4 \
--max_duration=27.0 \
--min_duration=0.0 \
--test_off=False \
--use_sortagrad=True \
--use_gru=True \
--use_gpu=True \
--is_local=True \
--share_rnn_weights=False \
--train_manifest='data/aishell/manifest.train' \
--dev_manifest='data/aishell/manifest.dev' \
--mean_std_path='data/aishell/mean_std.npz' \
--vocab_path='data/aishell/vocab.txt' \
--output_model_dir='./checkpoints/aishell' \
--augment_conf_path='conf/augmentation.config' \
--specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped'
if [ $? -ne 0 ]; then
    echo "Failed in training!"
    exit 1
fi
exit 0
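In case it is relevant: the two flags most directly tied to peak GPU memory per trainer are batch_size and max_duration (a larger batch and longer utterances both enlarge the activations each GPU must hold). A reduced-memory variant of the same launch would look roughly like the sketch below; the specific values are only an illustration, not what I actually ran:

# Hypothetical lower-memory settings (illustrative values only).
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -u train.py \
--batch_size=16 \
--trainer_count=4 \
--max_duration=16.0 \
# ...all remaining flags unchanged from the script above...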
~/DeepSpeech/examples/aishell> sh run_train.sh
----------- Configuration Arguments -----------
augment_conf_path: conf/augmentation.config
batch_size: 64
dev_manifest: data/aishell/manifest.dev
init_model_path: None
is_local: 1
learning_rate: 0.0005
max_duration: 27.0
mean_std_path: data/aishell/mean_std.npz
min_duration: 0.0
num_conv_layers: 2
num_iter_print: 100
num_passes: 50
num_proc_data: 16
num_rnn_layers: 3
output_model_dir: ./checkpoints/aishell
rnn_layer_size: 1024
share_rnn_weights: 0
shuffle_method: batch_shuffle_clipped
specgram_type: linear
test_off: 0
train_manifest: data/aishell/manifest.train
trainer_count: 4
use_gpu: 1
use_gru: 1
use_sortagrad: 1
vocab_path: data/aishell/vocab.txt
I0315 16:47:44.366181 11850 Util.cpp:166] commandline: --use_gpu=1 --rnn_use_batch=True --log_clipping=True --trainer_count=4
[INFO 2018-03-15 16:47:46,743 layers.py:2714] output for conv_0: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-15 16:47:46,744 layers.py:3282] output for batch_norm_0: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-15 16:47:46,744 layers.py:7454] output for scale_sub_region_0: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-15 16:47:46,745 layers.py:2714] output for conv_1: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-15 16:47:46,746 layers.py:3282] output for batch_norm_1: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-15 16:47:46,746 layers.py:7454] output for scale_sub_region_1: c = 32, h = 41, w = 54, size = 70848
I0315 16:47:46.766090 11850 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=4 numDevices=4
I0315 16:47:46.877454 11850 GradientMachine.cpp:94] Initing parameters..
I0315 16:47:50.904826 11850 GradientMachine.cpp:101] Init parameters done.
................................................................................................... Pass: 0, Batch: 100, TrainCost: 63.639388
................................................................................................... Pass: 0, Batch: 200, TrainCost: 63.148882
................................................................................................... Pass: 0, Batch: 300, TrainCost: 64.747351
................................................................................................... Pass: 0, Batch: 400, TrainCost: 54.954166
................................................................................................... Pass: 0, Batch: 500, TrainCost: 38.613670
................................................................................................... Pass: 0, Batch: 600, TrainCost: 30.979173
................................................................................................... Pass: 0, Batch: 700, TrainCost: 26.576287
................................................................................................... Pass: 0, Batch: 800, TrainCost: 24.339529
................................................................................................... Pass: 0, Batch: 900, TrainCost: 22.288584
...............................
F0315 17:24:56.116675 11921 hl_cuda_device.cc:273] Check failed: cudaSuccess == cudaStat (0 vs. 2) Cuda Error: out of memory
*** Check failure stack trace: ***
    @     0x7fa000f5adad  google::LogMessage::Fail()
    @     0x7fa000f5ef6c  google::LogMessage::SendToLog()
    @     0x7fa000f5a8d3  google::LogMessage::Flush()
    @     0x7fa000f5f9be  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fa000f1bf84  hl_malloc_device()
    @     0x7fa000dd4a66  paddle::GpuAllocator::alloc()
    @     0x7fa000dc85ff  paddle::PoolAllocator::alloc()
    @     0x7fa000dc8004  paddle::GpuMemoryHandle::GpuMemoryHandle()
    @     0x7fa000da05c4  paddle::GpuMatrix::resize()
    @     0x7fa000db4839  paddle::Matrix::resizeOrCreate()
    @     0x7fa000c15cb2  paddle::Layer::resetSpecifyOutput()
    @     0x7fa000c15f64  paddle::Layer::resetOutput()
    @     0x7fa000c5e384  paddle::CudnnBatchNormLayer::forward()
    @     0x7fa000cc80fd  paddle::NeuralNetwork::forward()
    @     0x7fa000cd3334  paddle::TrainerThread::forward()
    @     0x7fa000cd4625  paddle::TrainerThread::computeThread()
    @     0x7fa04a58b870  (unknown)
    @     0x7fa055d02dc5  start_thread
    @     0x7fa05532729d  __clone
    @              (nil)  (unknown)
run_train.sh: line 35: 11850 Aborted  CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py --batch_size=64 --trainer_count=4 --num_passes=50 --num_proc_data=16 --num_conv_layers=2 --num_rnn_layers=3 --rnn_layer_size=1024 --num_iter_print=100 --learning_rate=5e-4 --max_duration=27.0 --min_duration=0.0 --test_off=False --use_sortagrad=True --use_gru=True --use_gpu=True --is_local=True --share_rnn_weights=False --train_manifest='data/aishell/manifest.train' --dev_manifest='data/aishell/manifest.dev' --mean_std_path='data/aishell/mean_std.npz' --vocab_path='data/aishell/vocab.txt' --output_model_dir='./checkpoints/aishell' --augment_conf_path='conf/augmentation.config' --specgram_type='linear' --shuffle_method='batch_shuffle_clipped'
Failed in training!

So training aborts partway through the first pass... it looks like GPU memory keeps growing (a memory leak?). Here is the nvidia-smi output shortly after the crash:
Thu Mar 15 17:26:56 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20 Driver Version: 375.20 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:0D:00.0 Off | 0 |
| N/A 37C P0 62W / 235W | 3022MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:0E:00.0 Off | 0 |
| N/A 36C P0 62W / 235W | 2967MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 0000:30:00.0 Off | 0 |
| N/A 39C P0 62W / 235W | 3032MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 0000:33:00.0 Off | 0 |
| N/A 35C P0 62W / 235W | 2966MiB / 11471MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
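If it helps with diagnosis, one way to tell whether usage creeps up over batches (a leak) or spikes on one long utterance is to log GPU memory periodically while training runs. A minimal sketch using standard nvidia-smi query options (gpu_mem.log is just an arbitrary file name, and the 10-second interval is arbitrary too):

# Log per-GPU memory use every 10 seconds in the background while training runs.
nvidia-smi --query-gpu=timestamp,index,memory.used,memory.total \
           --format=csv -l 10 >> gpu_mem.log &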