Prepare internal speech recognition dataset for Mandarin.
Created by: xinghai-sun
- Prepare one or more internal Mandarin speech recognition datasets for our internal benchmark.
- If Mandarin requires any specific data preprocessing, add it to the audio data provider.
- Prepare a reliable baseline with evaluation details. Ideally, we should run the baseline training environment ourselves.
- This requires cooperation with the Department of Speech at Baidu.
- Refer to the DS2 design doc and update it when necessary.
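
For the evaluation details, one possible starting point is a character error rate (CER) metric. This is a minimal sketch, not the agreed benchmark metric: it assumes that Mandarin transcripts are scored per character and that whitespace is ignored. The function names `edit_distance` and `char_error_rate` are hypothetical and do not come from the DS2 codebase.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """CER = character edit distance / number of reference characters.

    Assumption: whitespace is stripped before scoring, so word segmentation
    in the transcripts does not affect the result.
    """
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(char_error_rate("今天 天气 很好", "今天天汽很好"))
```

Scoring per character avoids depending on a Mandarin word segmenter, which would otherwise add its own errors to the benchmark. Whether the internal baseline uses this metric should be confirmed before it is fixed in the design doc.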