Add audio data provider and preprocessor for speech recognition datasets. (#2226) · Issue · PaddlePaddle / Paddle

Created by: xinghai-sun

Prepare one or more public English speech recognition datasets (e.g. LibriSpeech), along with their respective baselines.
Convert all audio files to .wav format.
Add a file manifest generator for each dataset, plus a merger for cases where more than one dataset is used. Keep this interface uniform across datasets.
Add a spectrogram feature extractor, a power normalizer, etc.
Add a transcription text parser (tokenization, dictionary generation, etc.).
Add a batch data reader with SortaGrad.
Refer to the DS2 (DeepSpeech2) design doc and update it as necessary.
Please submit your code and docs as pull requests to PaddlePaddle/models.
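For the manifest task, one way to keep the interface uniform across datasets is to write one JSON object per line holding the audio path, duration, and transcript, so that merging datasets reduces to concatenating files. The sketch below assumes this JSON-lines layout and the field names `audio_filepath`, `duration`, and `text`; these are hypothetical choices for illustration, not a format fixed by this issue. Durations are read with Python's standard `wave` module, which only handles PCM .wav files (one practical reason for the .wav conversion step).

```python
import json
import wave
from typing import Iterable, List, Tuple


def wav_duration(path: str) -> float:
    """Return the duration in seconds of a PCM .wav file."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def write_manifest(samples: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write one JSON line per (wav_path, transcript) pair.

    The field names used here are hypothetical, chosen for illustration.
    Returns the number of entries written.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, text in samples:
            entry = {
                "audio_filepath": wav_path,
                "duration": wav_duration(wav_path),
                "text": text.strip(),
            }
            out.write(json.dumps(entry) + "\n")
            count += 1
    return count


def merge_manifests(manifest_paths: List[str], merged_path: str) -> int:
    """Concatenate several manifests into one (valid because all share one schema).

    Blank lines are skipped. Returns the number of entries in the merged file.
    """
    count = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in manifest_paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        out.write(line if line.endswith("\n") else line + "\n")
                        count += 1
    return count
```

Because every per-dataset generator emits the same schema, adding a new dataset only requires a new function that yields `(wav_path, transcript)` pairs.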
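For the feature extraction task, a linear spectrogram can be computed by slicing the signal into overlapping frames, applying a window, and taking the squared magnitude of a real FFT. The sketch below uses NumPy and interprets the "power normalizer" as per-frequency-bin mean/std standardization estimated on training features; that interpretation, the 20 ms window, 10 ms stride, Hann window, log compression, and the names `spectrogram` and `PerBinNormalizer` are all assumptions, not values taken from the DS2 design doc.

```python
from typing import List, Optional

import numpy as np


def spectrogram(samples: np.ndarray, sample_rate: int,
                window_ms: float = 20.0, stride_ms: float = 10.0,
                eps: float = 1e-14) -> np.ndarray:
    """Return a log power spectrogram of shape (num_freq_bins, num_frames)."""
    window_size = int(sample_rate * window_ms / 1000)
    stride = int(sample_rate * stride_ms / 1000)
    if len(samples) < window_size:
        raise ValueError("signal shorter than one analysis window")
    num_frames = 1 + (len(samples) - window_size) // stride
    # Frame i covers samples[i * stride : i * stride + window_size].
    idx = np.arange(window_size)[None, :] + stride * np.arange(num_frames)[:, None]
    frames = samples[idx] * np.hanning(window_size)[None, :]
    # rfft keeps the window_size // 2 + 1 non-negative frequency bins.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # eps avoids log(0) for silent frames.
    return np.log(power + eps).T


class PerBinNormalizer:
    """Hypothetical normalizer: standardizes each frequency bin with training stats."""

    def __init__(self, eps: float = 1e-8):
        self.eps = eps
        self.mean: Optional[np.ndarray] = None
        self.std: Optional[np.ndarray] = None

    def fit(self, features: List[np.ndarray]) -> "PerBinNormalizer":
        stacked = np.hstack(features)  # (num_bins, total_frames across utterances)
        self.mean = stacked.mean(axis=1, keepdims=True)
        self.std = stacked.std(axis=1, keepdims=True)
        return self

    def apply(self, feature: np.ndarray) -> np.ndarray:
        return (feature - self.mean) / (self.std + self.eps)
```

Fitting the statistics once on a sample of the training set and reusing them at test time keeps training and inference features on the same scale.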
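For the transcription parser, DS2-style English models are typically trained with CTC on character targets, so the "dictionary" can be a character vocabulary built from the training transcripts. The sketch below assumes character-level tokens, lowercasing, whitespace collapsing, and a reserved `<blank>` token at index 0 for CTC; the function names and these conventions are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List

BLANK = "<blank>"  # hypothetical reserved token for the CTC blank at index 0


def tokenize(text: str) -> List[str]:
    """Character-level tokenization after lowercasing and collapsing whitespace."""
    return list(" ".join(text.lower().split()))


def build_vocab(transcripts: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each character seen at least min_count times to an integer id."""
    counts: Counter = Counter()
    for text in transcripts:
        counts.update(tokenize(text))
    # Sort by descending frequency, then by character, so ids are deterministic.
    chars = sorted((c for c, n in counts.items() if n >= min_count),
                   key=lambda c: (-counts[c], c))
    vocab = {BLANK: 0}
    for c in chars:
        vocab[c] = len(vocab)
    return vocab


def text_to_ids(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a transcript to ids; raises KeyError on out-of-vocabulary characters."""
    return [vocab[c] for c in tokenize(text)]
```

Raising on unknown characters surfaces data-cleaning problems early; a dataset with noisier text might instead map them to an extra unknown token.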
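SortaGrad, as described in the Deep Speech 2 paper, presents minibatches in order of increasing utterance length during the first training epoch and in random order afterwards. A minimal sketch of such a batch reader follows, assuming manifest entries are dicts with a `duration` key and that later epochs shuffle whole length-grouped batches rather than individual utterances; the function name and both assumptions are illustrative.

```python
import random
from typing import Dict, Iterator, List


def sortagrad_batches(entries: List[Dict], batch_size: int, epoch: int,
                      seed: int = 0) -> Iterator[List[Dict]]:
    """Yield minibatches of manifest entries using SortaGrad ordering.

    Entries are always grouped into batches of similar duration to reduce padding.
    Epoch 0: batches are yielded shortest-first (a simple curriculum).
    Later epochs: the same length-grouped batches are yielded in shuffled order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(entries, key=lambda e: e["duration"])
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    if epoch > 0:
        # Seed per epoch so each epoch gets a different but reproducible order.
        random.Random(seed + epoch).shuffle(batches)
    yield from batches
```

Grouping by duration in every epoch keeps padding low, while shuffling at the batch level after the first epoch removes the length ordering that SortaGrad only intends for the start of training.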