Created by: pkuyym
There is no need to convert the transcription text to an id sequence during evaluation and inference.
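A minimal sketch of the idea, not the code in this PR: the function name `process_transcript`, the `keep_text` flag, and the toy character vocabulary are all hypothetical. It assumes training needs id sequences, while evaluation and inference can work with the raw text, presumably because the decoded output is compared against the reference text directly.

```python
def process_transcript(text, vocab, keep_text=False):
    """Return the raw text if keep_text is True, else a list of token ids.

    Hypothetical helper: `vocab` maps each character to an integer id.
    """
    if keep_text:
        # Evaluation / inference: keep the transcription as plain text.
        return text
    # Training: convert each character to its id.
    return [vocab[ch] for ch in text]


# Toy vocabulary for illustration only: a -> 0, b -> 1, c -> 2, ' ' -> 3.
vocab = {ch: i for i, ch in enumerate("abc ")}

train_target = process_transcript("a cab", vocab)
eval_target = process_transcript("a cab", vocab, keep_text=True)
```

With a flag like this, the same data pipeline can serve training and evaluation without a separate id-to-text conversion step at evaluation time.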