    Add TokenEmbedding (#4983) · e59f15a1
    Jack Zhou committed
    * Add TokenEmbedding
    
    * download corpus embedding data
    * load embedding data by specifying corpus name
    * extend the vocab of tokenizer from corpus embedding data
    
    * add unk token setting
    
    * modify tokenizer
    
    * add extend vocab
    
    * move jieba tokenizer and rename corpus_name->embedding_name
    
    * use BOS URL instead of localhost
    
    * add log when loading data
    
    * add token dot computation; add __repr__ of TokenEmbedding
    
    * add color logging
    
    * use paddlenlp.utils.log
    
    * adjust repr
    
    * update pretrained embedding table
    
    * fix padding idx
downloader.py 11.0 KB