[Deep Speech 2] Slow training with internal English dataset
Created by: xinghai-sun
Currently, training with our internal English dataset (.seqbin) is unexpectedly slow, with only about 15% GPU utilization; normal training should keep GPU utilization above 70%.
By profiling, we found the reason is:
Some audio data in this dataset requires resampling (from an 8000 Hz to a 16000 Hz sample rate) before spectrogram feature extraction, and this resampling is CPU intensive. However, `paddle.reader.xmap_readers` is multi-threaded, so due to Python's GIL it can effectively use only a single CPU core (refer to Link).
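For illustration, a minimal sketch of the kind of 8 kHz to 16 kHz upsampling involved, here done with simple linear interpolation via NumPy (the actual pipeline's resampler may differ, e.g. it may use a proper polyphase filter, but the per-sample CPU cost is the point):

```python
import numpy as np

def resample_8k_to_16k(samples):
    """Upsample a 1-D signal from 8 kHz to 16 kHz by linear interpolation.

    Illustrative only: real resamplers apply an anti-aliasing filter,
    but any such per-sample loop is CPU-bound work that the GIL
    serializes across threads.
    """
    n = len(samples)
    old_t = np.arange(n) / 8000.0       # timestamps of the 8 kHz samples
    new_t = np.arange(2 * n) / 16000.0  # twice as many 16 kHz timestamps
    return np.interp(new_t, old_t, samples)
```

Because this work is pure Python/NumPy computation on every audio clip, running it in threads buys almost nothing.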
For the LibriSpeech dataset, this problem did not show up, since LibriSpeech audio needs no CPU-intensive resampling.
In short, we need a multiprocessing version of `xmap_readers`.