paddle/fluid/framework/CMakeLists.txt · 436808c6981be3fb808bb22794ee2885d7cd257e · PaddlePaddle / Paddle

[Cherry-pick] Add FasterTokenizer Operator (#36716) · edff5b79

Committed by Steffy-zxf on Oct 26, 2021

* Add FasterTokenizer Operator (#34491)

Add tokenizer-related functionality for Transformer models so that text preprocessing is consistent between training and prediction.

* support the text string as an input Tensor
* support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
* Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
* It first applies basic tokenization, followed by wordpiece tokenization.
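The two-stage flow described above (basic tokenization, then wordpiece tokenization against a vocabulary map) can be sketched roughly as follows. This is an illustrative Python sketch, not the operator's actual C++ implementation: the function names, the `[UNK]` token, the `##` continuation prefix, and the toy vocabulary are assumptions borrowed from the common BERT convention. A real BERT tokenizer typically also handles accent stripping, CJK characters, and a per-word length limit, which are omitted here.

```python
import unicodedata


def basic_tokenize(text, do_lower_case=True):
    """Split text on whitespace and emit each punctuation mark as its own token."""
    if do_lower_case:
        text = text.lower()
    tokens, current = [], []
    for ch in text:
        if ch.isspace():
            # Whitespace ends the current word.
            if current:
                tokens.append("".join(current))
                current = []
        elif unicodedata.category(ch).startswith("P"):
            # Punctuation ends the current word and becomes a token itself.
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the (assumed) "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix matched: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Map a raw text string to a list of token ids via the vocab dict."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab):
            ids.append(vocab[piece])
    return ids


# Hypothetical toy vocabulary, standing in for the "VOCAB" map.
vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, "!": 5}
print(tokenize("Hello unaffable!", vocab))  # [4, 1, 2, 3, 5]
```

Here the dict plays the role of the `unordered_map<wstring, int>` vocabulary input, and the returned id list corresponds to what a tokenizer operator would hand to the model.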

* optimize fast tokenizer

* remove const_cast

Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>


CMakeLists.txt 23.4 KB

PaddlePaddle / Paddle synced successfully more than 1 year ago
