paddle/fluid/inference/api/CMakeLists.txt · 7e6a2190ddff25362d395667e397f5174f2346a2 · Crayon鑫 / Paddle


Add FasterTokenizer Operator (#34491) · 3f2d6a3f

Committed by Steffy-zxf on Oct 20, 2021

Add tokenizer-related functionality for Transformer models so that text processing is consistent between training and prediction.

* Support a text string as an input Tensor.
* Support the "VOCAB" unordered_map<wstring, int> as an input Tensor for looking up tokens.
* Add the tokenizer used for BERT, which performs end-to-end tokenization from a text string to wordpieces.
* The tokenizer first applies basic tokenization, followed by wordpiece tokenization.
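
The two-stage flow described above (basic tokenization, then wordpiece tokenization with a vocabulary lookup) can be sketched roughly as follows. This is a minimal Python illustration, not the operator's actual C++ implementation: the function names, the `TOY_VOCAB` contents, the `##` continuation prefix, and the `[UNK]` fallback are assumptions borrowed from the common BERT convention, and the basic stage here only lowercases and splits on whitespace and Unicode punctuation.

```python
import unicodedata

# Hypothetical toy vocabulary standing in for the "VOCAB" token-to-id map.
TOY_VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, ",": 4, "hello": 5}


def basic_tokenize(text, do_lower_case=True):
    """Split on whitespace and emit each punctuation character as its own token."""
    if do_lower_case:
        text = text.lower()
    tokens, current = [], []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        if ch.isspace() or is_punct:
            # A separator ends the word collected so far.
            if current:
                tokens.append("".join(current))
                current = []
            if is_punct:
                tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix matched: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Run both stages and map each piece to its vocabulary id."""
    return [
        vocab[piece]
        for word in basic_tokenize(text)
        for piece in wordpiece_tokenize(word, vocab)
    ]


print(wordpiece_tokenize("unaffable", TOY_VOCAB))  # ['un', '##aff', '##able']
print(tokenize("Hello, unaffable xyz", TOY_VOCAB))  # [5, 4, 1, 2, 3, 0]
```

A production BERT tokenizer typically does more in the basic stage (for example accent stripping and special handling of CJK characters), so this sketch only shows the order of the two stages and the vocabulary lookup.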


CMakeLists.txt 3.7 KB

Crayon鑫 / Paddle is in sync with the fork's source project
