Using PaddlePaddle to train DeepSpeech shows: Cannot allocate memory
Created by: songxia928
============================================================
*** Printed output:
============================================================
Pass: 25, Batch: 3100, TrainCost: 2.208693
------- Time: 7017 sec, Pass: 25, ValidationCost: 52.2345577601
Pass: 26, Batch: 100, TrainCost: 1.603295
Pass: 26, Batch: 200, TrainCost: 1.676432
Pass: 26, Batch: 300, TrainCost: 1.612143
Pass: 26, Batch: 400, TrainCost: 1.353869
Pass: 26, Batch: 500, TrainCost: 1.482893
Pass: 26, Batch: 600, TrainCost: 1.687072
Pass: 26, Batch: 700, TrainCost: 1.966297
Pass: 26, Batch: 800, TrainCost: 1.420656
Pass: 26, Batch: 900, TrainCost: 1.590458
Pass: 26, Batch: 1000, TrainCost: 2.041432
Pass: 26, Batch: 1100, TrainCost: 1.333077
Pass: 26, Batch: 1200, TrainCost: 1.839570
Pass: 26, Batch: 1300, TrainCost: 1.614656
Pass: 26, Batch: 1400, TrainCost: 1.606205
Pass: 26, Batch: 1500, TrainCost: 2.410910
Pass: 26, Batch: 1600, TrainCost: 1.942313
Pass: 26, Batch: 1700, TrainCost: 2.099164
Pass: 26, Batch: 1800, TrainCost: 1.974127
Pass: 26, Batch: 1900, TrainCost: 2.180515
Pass: 26, Batch: 2000, TrainCost: 2.120815
Pass: 26, Batch: 2100, TrainCost: 2.503351
Pass: 26, Batch: 2200, TrainCost: 2.302993
Pass: 26, Batch: 2300, TrainCost: 2.194658
Pass: 26, Batch: 2400, TrainCost: 2.364214
Pass: 26, Batch: 2500, TrainCost: 2.714264
Pass: 26, Batch: 2600, TrainCost: 2.205300
Pass: 26, Batch: 2700, TrainCost: 2.671229
Pass: 26, Batch: 2800, TrainCost: 2.073756
Pass: 26, Batch: 2900, TrainCost: 2.087103
Pass: 26, Batch: 3000, TrainCost: 2.158531
Pass: 26, Batch: 3100, TrainCost: 2.519216
------- Time: 6982 sec, Pass: 26, ValidationCost: 53.4873382121
Traceback (most recent call last):
  File "train.py", line 129, in <module>
    main()
  File "train.py", line 125, in main
    train()
  File "train.py", line 116, in train
    test_off=args.test_off)
  File "/data1/chensong/code/20180523_deepspeech/DeepSpeech/model_utils/model.py", line 155, in train
    feeding=adapted_feeding_dict)
  File "/usr/local/lib/python2.7/dist-packages/paddle/v2/trainer.py", line 162, in train
    for batch_id, data_batch in enumerate(reader()):
  File "/data1/chensong/code/20180523_deepspeech/DeepSpeech/model_utils/model.py", line 391, in adapted_reader
    for instance in data():
  File "/data1/chensong/code/20180523_deepspeech/DeepSpeech/data_utils/data.py", line 199, in batch_reader
    for instance in instance_reader():
  File "/data1/chensong/code/20180523_deepspeech/DeepSpeech/data_utils/utility.py", line 198, in xreader
    w.start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
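From the traceback, the failure is not inside the network itself: xreader in data_utils/utility.py is starting its data-reader worker processes, and the os.fork() call is what fails. My understanding is that forking an already-large parent process can itself raise [Errno 12] once free RAM and swap are nearly exhausted (how strictly the kernel checks depends on vm.overcommit_memory). A minimal Python sketch of that situation, where the allocation size and worker count are made up for illustration and not taken from the repo:

import multiprocessing

def worker():
    pass  # stand-in for a data-reading worker

if __name__ == '__main__':
    big = bytearray(2 * 1024 ** 3)  # hypothetical ~2 GiB parent footprint
    workers = []
    try:
        for _ in range(16):  # mirrors --num_proc_data=16
            w = multiprocessing.Process(target=worker)
            w.start()        # fork happens here; can raise OSError [Errno 12] on a memory-starved host
            workers.append(w)
    except OSError as e:
        print('fork failed: %s' % e)
    finally:
        for w in workers:
            w.join()

With --num_proc_data=16, xreader starts 16 such workers, so every one of those forks has to succeed while the trainer process is already large.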
============================================================
*** my machine state:
============================================================
root@2f9aebc64ff:/data1/code/20180523_deepspeech/DeepSpeech# nvidia-smi
Sun Jun 10 15:10:15 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:05:00.0 Off | 0 |
| N/A 23C P8 27W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 0000:06:00.0 Off | 0 |
| N/A 25C P8 27W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 0000:09:00.0 Off | 0 |
| N/A 26C P8 27W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 0000:0A:00.0 Off | 0 |
| N/A 27C P8 30W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 0000:84:00.0 Off | 0 |
| N/A 32C P0 54W / 149W | 6569MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 0000:85:00.0 Off | 0 |
| N/A 37C P0 70W / 149W | 6351MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 0000:88:00.0 Off | 0 |
| N/A 33C P0 55W / 149W | 6830MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 0000:89:00.0 Off | 0 |
| N/A 37C P0 69W / 149W | 6542MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@2f9aebc64ff:/data1/code/20180523_deepspeech/DeepSpeech# free -w
              total        used        free      shared     buffers       cache   available
Mem:      264057556   241080888     3589964    14154876     2435984    16950720     6184440
Swap:      16777212    16452236      324976
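Reading the free -w output (values are in kB): of the roughly 252 GiB of RAM only about 6 GiB is reported as available, and the 16 GiB of swap is almost completely used, which matches the fork failure above. To see how quickly memory disappears during a pass, a small logger could be run next to the training job (a sketch; the 60-second interval and MiB units are arbitrary choices):

import time

def meminfo():
    # /proc/meminfo reports values in kB
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, rest = line.split(':', 1)
            info[key] = int(rest.split()[0])
    return info

if __name__ == '__main__':
    while True:
        m = meminfo()
        print('MemAvailable: %d MiB, SwapFree: %d MiB'
              % (m['MemAvailable'] // 1024, m['SwapFree'] // 1024))
        time.sleep(60)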
============================================================
*** the training parameters I set:
============================================================
CUDA_VISIBLE_DEVICES=4,5,6,7 \
python -u train.py \
--batch_size=32 \
--trainer_count=4 \
--num_passes=50 \
--num_proc_data=16 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=1024 \
--num_iter_print=100 \
--learning_rate=5e-4 \
--max_duration=20.0 \
--min_duration=1.0 \
--test_off=False \
--use_sortagrad=True \
--use_gru=True \
--use_gpu=True \
--is_local=True \
--share_rnn_weights=False \
--train_manifest='data_mangguo_2/aishell/manifest.train' \
--dev_manifest='data_mangguo_2/aishell/manifest.dev' \
--mean_std_path='data_mangguo_2/aishell/mean_std.npz' \
--vocab_path='data_mangguo_2/aishell/vocab.txt' \
--output_model_dir='./checkpoints/mangguo_2' \
--augment_conf_path='conf/augmentation.config' \
--specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped'
Hi, I am using PaddlePaddle to train a Chinese model based on DeepSpeech. I can finish training a model with the aishell data alone, but when I add more audio data for training, the error above appears. From the machine state, I guess the host (CPU) memory is not enough, but I don't know how to solve it. Could anyone tell me how? Thanks!
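To check the guess that the trainer process itself is what fills the memory, one thing I can do is print its peak resident size from inside train.py shortly before the readers are started (a tiny sketch using the standard library; on Linux ru_maxrss is reported in kB):

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print('peak RSS so far: %d MiB' % (usage.ru_maxrss // 1024))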