Build an n-gram language model for DeepSpeech2, and add inference interfaces that plug into the CTC decoder.
Created by: xinghai-sun
- Train an English language model (a Kneser-Ney smoothed 5-gram, with pruning) with the KenLM toolkit, on cleaned text from the Common Crawl repository. For detailed requirements, please refer to the DS2 paper.
- Integrate the LM training script into the DS2 trainer script.
- Add inference interfaces for this n-gram language model so it can be plugged into the CTC-LM beam-search decoder.
- Keep in mind that the interfaces should be compatible with both English (word-based LM) and Mandarin (character-based LM).
- Please work closely with the "Add CTC-LM-beam-search decoder" task.
- Refer to the DS2 design doc and update it when necessary.
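A minimal sketch of how the KenLM training step could be wired up. The `lmplz` and `build_binary` flags used here are real KenLM options (`-o` for order, `--prune` for count thresholds), but the helper name and file paths are hypothetical placeholders, not part of the DS2 codebase:

```python
def kenlm_train_cmds(corpus_path, arpa_path, binary_path, order=5, prune=(0, 0, 1)):
    """Assemble the KenLM command lines for a pruned n-gram model.

    Hypothetical helper: it only builds argv lists; run them with
    subprocess.run(...) once KenLM's lmplz/build_binary are on PATH.
    """
    # Estimate a Kneser-Ney smoothed ARPA model from the cleaned corpus.
    lmplz = ["lmplz", "-o", str(order),
             "--prune", *[str(p) for p in prune],
             "--text", corpus_path, "--arpa", arpa_path]
    # Convert the ARPA file to KenLM's binary format for fast loading.
    build_binary = ["build_binary", arpa_path, binary_path]
    return lmplz, build_binary
```

For the 5-gram described above, `kenlm_train_cmds("cc_clean.txt", "lm.arpa", "lm.binary")` would yield the two commands to run in sequence.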
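To make the word-based vs. character-based compatibility requirement concrete, here is a toy scorer sketch of the interface a CTC beam-search decoder could query. The class and method names are hypothetical; a real deployment would back `log_prob` with the KenLM 5-gram above rather than this add-one-smoothed bigram, which is used only to keep the sketch self-contained:

```python
import math
from collections import Counter


class NGramScorer:
    """Toy LM scorer sketch: same interface for English (word tokens)
    and Mandarin (character tokens)."""

    def __init__(self, char_based=False):
        self.char_based = char_based  # True => character-based LM
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()

    def _tokenize(self, text):
        # Word tokens for English, character tokens for Mandarin.
        return list(text) if self.char_based else text.split()

    def train(self, sentences):
        for s in sentences:
            tokens = ["<s>"] + self._tokenize(s) + ["</s>"]
            self.vocab.update(tokens)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))

    def log_prob(self, history, token):
        """Log P(token | last history token), add-one smoothed.
        This is the call a beam-search decoder would make per extension."""
        prev = history[-1] if history else "<s>"
        num = self.bigrams[(prev, token)] + 1
        den = self.unigrams[prev] + len(self.vocab)
        return math.log(num / den)

    def score_sentence(self, text):
        tokens = self._tokenize(text) + ["</s>"]
        history, total = ["<s>"], 0.0
        for tok in tokens:
            total += self.log_prob(history, tok)
            history.append(tok)
        return total
```

The decoder only needs `log_prob(history, token)`; whether `history` holds words or characters is decided once at construction, which is what keeps a single beam-search implementation usable for both languages.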