    Add FasterTokenizer Operator (#34491) · 3f2d6a3f
    Authored by Steffy-zxf
    Add tokenizer-related functionality for Transformer models so that text preprocessing is consistent between training and prediction.
    
    * Support a text string as an input Tensor.
    * Support the "VOCAB" `unordered_map<wstring, int>` as an input Tensor for token lookup.
    * Add a tokenizer for BERT that performs end-to-end tokenization, from a text string to wordpiece tokens.
    * The tokenizer first applies basic tokenization, followed by wordpiece tokenization.
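
    The two-stage pipeline described above can be illustrated with a minimal sketch. This is not the code in faster_tokenizer_op.cc: the names `BasicTokenize`, `WordPiece`, `Tokenize`, and the `unk_id` parameter are hypothetical, and the sketch assumes a simplified basic tokenizer (lowercasing, whitespace splitting, punctuation isolation) and the usual greedy longest-match-first wordpiece scheme with a `##` continuation prefix. Special tokens, CJK handling, and accent stripping are left out.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <unordered_map>
#include <vector>

// Same shape as the "VOCAB" input named in the commit message.
using Vocab = std::unordered_map<std::wstring, int>;

// Stage 1 (simplified): lowercase, split on whitespace, and emit each
// punctuation character as its own token.
std::vector<std::wstring> BasicTokenize(const std::wstring& text) {
  std::vector<std::wstring> tokens;
  std::wstring current;
  auto flush = [&]() {
    if (!current.empty()) {
      tokens.push_back(current);
      current.clear();
    }
  };
  for (wchar_t ch : text) {
    if (std::iswspace(ch)) {
      flush();
    } else if (std::iswpunct(ch)) {
      flush();
      tokens.push_back(std::wstring(1, ch));
    } else {
      current.push_back(static_cast<wchar_t>(std::towlower(ch)));
    }
  }
  flush();
  return tokens;
}

// Stage 2: greedy longest-match-first wordpiece lookup. Pieces after the
// first carry a "##" prefix. If any part of the word cannot be matched,
// the whole word maps to unk_id (hypothetical parameter).
std::vector<int> WordPiece(const std::wstring& word, const Vocab& vocab,
                           int unk_id) {
  std::vector<int> ids;
  std::size_t start = 0;
  while (start < word.size()) {
    std::size_t end = word.size();
    bool found = false;
    while (end > start) {
      std::wstring piece = word.substr(start, end - start);
      if (start > 0) piece = L"##" + piece;
      auto it = vocab.find(piece);
      if (it != vocab.end()) {
        ids.push_back(it->second);
        found = true;
        break;
      }
      --end;  // Try a shorter candidate piece.
    }
    if (!found) return {unk_id};
    start = end;
  }
  return ids;
}

// End to end: text string -> wordpiece token ids.
std::vector<int> Tokenize(const std::wstring& text, const Vocab& vocab,
                          int unk_id) {
  std::vector<int> ids;
  for (const std::wstring& word : BasicTokenize(text)) {
    std::vector<int> pieces = WordPiece(word, vocab, unk_id);
    ids.insert(ids.end(), pieces.begin(), pieces.end());
  }
  return ids;
}
```

    With a toy vocabulary `{[UNK]: 0, un: 1, ##aff: 2, ##able: 3, !: 4}`, calling `Tokenize(L"Unaffable!", vocab, 0)` yields the ids `1, 2, 3, 4`, while a word with no matching pieces collapses to the single id `0`.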
faster_tokenizer_op.cc 17.3 KB