[Deep Speech 2] Slow training with internal English dataset
Created by: xinghai-sun
Currently, training with our internal English dataset (.seqbin) is unexpectedly slow, with only about 15% GPU utilization; normal training should keep GPU utilization above 70%.
By profiling, we found the reason is:
Some audio data in this dataset requires resampling (from an 8000 Hz to a 16000 Hz sample rate) before spectrogram feature extraction, and this resampling is CPU intensive. However, `paddle.reader.xmap_readers` is multi-threaded, so due to Python's GIL it can effectively use only a single CPU core (refer to Link).
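For illustration, a minimal sketch of the kind of 8 kHz to 16 kHz upsampling involved, here done with simple linear interpolation via NumPy (the actual pipeline's resampler may differ, e.g. it may use a proper polyphase filter, but the per-sample CPU cost is the point):

```python
import numpy as np

def resample_8k_to_16k(samples):
    """Upsample a 1-D signal from 8 kHz to 16 kHz by linear interpolation.

    Illustrative only: real resamplers apply an anti-aliasing filter,
    but any such per-sample loop is CPU-bound work that the GIL
    serializes across threads.
    """
    n = len(samples)
    old_t = np.arange(n) / 8000.0       # timestamps of the 8 kHz samples
    new_t = np.arange(2 * n) / 16000.0  # twice as many 16 kHz timestamps
    return np.interp(new_t, old_t, samples)
```

Because this work is pure Python/NumPy computation on every audio clip, running it in threads buys almost nothing.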
For the LibriSpeech dataset, this problem did not show up, since LibriSpeech audio needs no CPU-intensive resampling.
In short, we need a multiprocessing version of `xmap_readers`.