- 15 Jul, 2022 1 commit
Committed by Ruibiao Chen
- 26 Jun, 2022 1 commit
Committed by Sing_chan
- 05 Jun, 2022 1 commit
Committed by Sing_chan
- 26 Oct, 2021 1 commit
Committed by Jack Zhou
* optimize fast tokenizer
- 20 Oct, 2021 1 commit
Committed by Steffy-zxf
Add Tokenizer-related functionality for the Transformer model so that text is processed consistently during training and prediction.
* Support a text string as an input Tensor
* Support the "VOCAB" unordered_map<wstring, int> as an input Tensor for looking up tokens
* Tokenizer used for BERT. This tokenizer applies end-to-end tokenization, from text string to wordpieces.
* It first applies basic tokenization, followed by wordpiece tokenization.
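The two-stage pipeline described in the commit message above (basic tokenization, then wordpiece tokenization with a vocabulary lookup) can be sketched roughly as follows. This is a simplified illustration in Python, not the code from the commit: the function names, the toy `VOCAB` dictionary, the `[UNK]` token, and the `##` continuation prefix are assumptions that follow common BERT conventions. The basic step here only lowercases, splits on whitespace, and isolates punctuation; a full BERT tokenizer does more (for example accent stripping and CJK handling).

```python
import unicodedata

# Hypothetical toy vocabulary; a real one maps tens of thousands of pieces to ids.
VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3,
         "token": 4, "##izer": 5, "!": 6}


def basic_tokenize(text, lower=True):
    """Lowercase, split on whitespace, and emit each punctuation char alone."""
    if lower:
        text = text.lower()
    tokens, current = [], []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        if ch.isspace() or is_punct:
            if current:  # close the word collected so far
                tokens.append("".join(current))
                current = []
            if is_punct:
                tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:  # non-initial pieces carry the continuation prefix
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:  # no piece fits: the whole word becomes unknown
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Text string -> list of token ids (basic step, then wordpiece step)."""
    return [vocab[piece]
            for word in basic_tokenize(text)
            for piece in wordpiece_tokenize(word, vocab)]
```

With the toy vocabulary, `tokenize("Unaffable tokenizer!", VOCAB)` splits `unaffable` into `un`, `##aff`, `##able` before mapping each piece to its id. Running the same function at training and prediction time is what keeps the two stages consistent, which is the stated goal of the commit.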