Created by: zhengken
Top 400,000 most frequent words are selected to build the vocabulary and the rest are replaced with 'UNKNOWNWORD'.
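Hello, I see that the English corpus is processed by selecting the 400,000 most frequent words in the text. How should the pronunciations of these words be annotated? Is G2P used?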
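For reference, the vocabulary step quoted above can be sketched roughly as follows. This is a minimal sketch, assuming whitespace-tokenized text. The function names `build_vocab` and `replace_oov` are hypothetical and are not taken from the project. Only the 400,000 limit and the `UNKNOWNWORD` placeholder come from the quoted description.

```python
from collections import Counter

# Placeholder token named in the quoted description.
UNKNOWN = "UNKNOWNWORD"


def build_vocab(tokens, max_size):
    """Return the set of the max_size most frequent tokens (hypothetical helper)."""
    counts = Counter(tokens)
    # most_common() sorts by descending count. Ties keep first-seen order.
    return {word for word, _ in counts.most_common(max_size)}


def replace_oov(tokens, vocab):
    """Replace every token outside the vocabulary with the placeholder."""
    return [t if t in vocab else UNKNOWN for t in tokens]


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    vocab = build_vocab(corpus, max_size=2)  # the real setting would be 400_000
    print(sorted(vocab))                     # ['cat', 'the']
    print(replace_oov(corpus, vocab))
```

Words that share a count at the cut-off are broken by first occurrence here. The project may order ties differently. Pronunciations would then be needed only for the words kept in `vocab`.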