提交 · d2e59e155aa86b426b3cb0feb5990be77f74fc37 · 机器未来 / Paddle

15 7月, 2022 1 次提交
- R
  
  Remove boost library (#44092) · d2e59e15
  由 Ruibiao Chen 提交于 7月 15, 2022
  
  d2e59e15
26 6月, 2022 1 次提交
- S
  
  format all files in fluid using new config (#43776) · 576236a0
  由 Sing_chan 提交于 6月 26, 2022
  
  576236a0
05 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  由 Sing_chan 提交于 6月 05, 2022
  
  a3730dc8
26 10月, 2021 1 次提交
- J
  Optimize FasterTokenizer (#36701) · 290ded7a
  由 Jack Zhou 提交于 10月 26, 2021
```
* optimize fast tokenizer
```
  290ded7a
20 10月, 2021 1 次提交

Add FasterTokenizer Operator (#34491) · 3f2d6a3f

由 Steffy-zxf 提交于 10月 20, 2021

Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent.

* support the text string as an input Tensor
* support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens
* Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
* It first applies basic tokenization, followed by wordpiece tokenization.

3f2d6a3f

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致