PaddlePaddle / Paddle
大约 1 年前同步成功

20931

代码
- 文件
- 提交
- 分支
- Tags
- 贡献者
- 分支图
- Diff
Issue 1423
- 列表
- 看板
- 标记
- 里程碑
合并请求 543
Wiki 0
- Wiki
分析
- 仓库
- DevOps
项目成员
Pages

Add FasterTokenizer Operator (#34491) · 3f2d6a3f

由 Steffy-zxf 提交于 10月 20, 2021

Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent.

* support the text string as an input Tensor
* support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens
* Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
* It first applies basic tokenization, followed by wordpiece tokenization.

3f2d6a3f

faster_tokenizer_op.cc 17.3 KB

PaddlePaddle / Paddle 大约 1 年 前同步成功

Replace faster_tokenizer_op.cc

PaddlePaddle / Paddle
大约 1 年前同步成功