Add audio data provider and preprocessor for speech recognition datasets.
Created by: xinghai-sun
- Prepare one or more public English speech recognition datasets (e.g. LibriSpeech) and their respective baselines.
- Convert all audio files to .wav format.
- Add a file manifest generator for each dataset, and add a merger for the case where there is more than one dataset. Keep this interface unified across datasets.
- Add a spectrogram feature extractor, a power normalizer, etc.
- Add a transcription text parser (tokenization, dictionary generation, etc.).
- Add a batch data reader with SortaGrad.
- Refer to the DS2 design doc and update it when necessary.
- Please submit your code and docs as pull requests to PaddlePaddle/models.
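
To make the manifest and batch reader items more concrete, here is a minimal sketch of what a unified interface might look like. It is an illustration only, not the agreed design: the JSON-lines manifest layout, the field names (`audio_filepath`, `duration`, `text`), and the function names are all hypothetical. SortaGrad is interpreted here as "first epoch in increasing order of utterance duration, later epochs with the batch order shuffled"; the DS2 design doc should take precedence if it specifies otherwise.

```python
import json
import random


def write_manifest(entries, path):
    """Write one JSON object per line for each (audio_path, duration, text) tuple.

    The field names below are assumptions made for this sketch.
    """
    with open(path, "w", encoding="utf-8") as f:
        for audio_path, duration, text in entries:
            record = {
                "audio_filepath": audio_path,
                "duration": duration,  # seconds
                "text": text,
            }
            f.write(json.dumps(record) + "\n")


def read_manifest(path):
    """Load a manifest file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def merge_manifests(paths):
    """Concatenate several per-dataset manifests into one list of records."""
    merged = []
    for path in paths:
        merged.extend(read_manifest(path))
    return merged


def batch_reader(manifest, batch_size, epoch, seed=0):
    """Yield batches of manifest records.

    Records are sorted by duration and cut into batches. In epoch 0 the
    batches are yielded shortest-first (the SortaGrad curriculum); in later
    epochs the order of the batches is shuffled.
    """
    by_duration = sorted(manifest, key=lambda r: r["duration"])
    batches = [
        by_duration[i:i + batch_size]
        for i in range(0, len(by_duration), batch_size)
    ]
    if epoch > 0:
        random.Random(seed + epoch).shuffle(batches)
    for batch in batches:
        yield batch
```

A training loop would call `batch_reader(merge_manifests(paths), batch_size, epoch)` once per epoch. Keeping utterances of similar length in the same batch also reduces padding, though whether to re-bucket after the first epoch is a design choice left open here.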