Commits · edff5b7975dcbf5ed6996f952de79be8ca15b49a · BaiXuePrincess / Paddle

26 Oct, 2021 · 1 commit

[Cherry-pick] Add FasterTokenizer Operator (#36716) · edff5b79

Committed by Steffy-zxf on Oct 26, 2021

* Add FasterTokenizer Operator (#34491)

Add tokenizer-related functionality for Transformer models so that text is processed the same way during training and prediction.

* Support a text string as an input Tensor.
* Support the "VOCAB" unordered_map<wstring, int> as an input Tensor for token lookup.
* The tokenizer is the one used for BERT. It performs end-to-end tokenization, from text string to wordpieces.
* It first applies basic tokenization, followed by wordpiece tokenization.
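
The two-stage flow described above can be sketched in Python. This is a minimal illustration, not the operator's C++ implementation: the function names, the `[UNK]` fallback, the `##` continuation prefix, and `TOY_VOCAB` are all assumptions chosen to mirror common BERT-style tokenizers, with a plain dict standing in for the "VOCAB" map.

```python
import unicodedata


def basic_tokenize(text):
    """Lowercase the text, split on whitespace, and isolate punctuation."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
        elif unicodedata.category(ch).startswith("P"):
            # Punctuation ends the current word and becomes its own token.
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix matched: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces


def tokenize_to_ids(text, vocab, unk_token="[UNK]"):
    """Basic tokenization first, then wordpiece, then vocabulary lookup."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab, unk_token):
            ids.append(vocab[piece])
    return ids


# Hypothetical toy vocabulary, standing in for the "VOCAB" input.
TOY_VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3,
             "hello": 4, ",": 5, "world": 6}

print(tokenize_to_ids("Hello, unaffable world", TOY_VOCAB))
```

Running both stages inside one function is what keeps training and prediction consistent: the same text always yields the same ids, regardless of which side calls it.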

* optimize fast tokenizer

* remove const_cast
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

20 Oct, 2021 · 1 commit

[cherry-pick] Inference add type check in copy_from_cpu (#36552) · b5404f09

Committed by Wilber on Oct 20, 2021

07 Sep, 2020 · 1 commit

Refine python inference api (#26958) · 63212541

Committed by Wilber on Sep 07, 2020

BaiXuePrincess / Paddle is in sync with the fork source project.