1. 18 Dec, 2020 1 commit
    • Add en vector for the embedding · 047b8b69
      Committed by Jack Zhou
      * add more English embedding name
      
      * fix doc bug
      
      * delete useless description
      
      * add comments of TokenEmbedding
      
      * add embedding model info
  2. 17 Dec, 2020 1 commit
  3. 14 Dec, 2020 2 commits
  4. 12 Dec, 2020 1 commit
  5. 10 Dec, 2020 1 commit
    • Add TokenEmbedding (#4983) · e59f15a1
      Committed by Jack Zhou
      * Add TokenEmbedding
      
      * download corpus embedding data
      * load embedding data by specifying corpus name
      * extend the vocab of tokenizer from corpus embedding data
      
      * add unk token setting
      
      * modify tokenizer
      
      * add extend vocab
      
      * move jieba tokenizer and rename corpus_name->embedding_name
      
      * use bos url instead of localhost
      
      * add log when loading data
      
      * add token dot computation; add __repr__ of TokenEmbedding
      
      * add color logging
      
      * use paddlenlp.utils.log
      
      * adjust repr
      
      * update pretrained embedding table
      
      * fix padding idx