Commits · 436808c6981be3fb808bb22794ee2885d7cd257e · 机器未来 / Paddle

26 Oct, 2021 1 commit

[Cherry-pick] Add FasterTokenizer Operator (#36716) · edff5b79

Committed by Steffy-zxf on Oct 26, 2021

* Add FasterTokenizer Operator (#34491)

Add Tokenizer-related functionality for Transformer models so that text preprocessing is consistent between training and prediction.

* support the text string as an input Tensor
* support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
* Tokenizer used for BERT. This tokenizer performs end-to-end tokenization from a text string to wordpieces.
* It first applies basic tokenization, followed by wordpiece tokenization.

* optimize fast tokenizer

* remove const_cast
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
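
The two-stage flow described in this commit (basic tokenization, then wordpiece tokenization against a vocabulary map) can be sketched in plain Python. This is only an illustration of the general BERT-style technique, not the operator's C++ implementation: the function names, the toy vocabulary, and the simplified punctuation rule are all assumptions made for this example.

```python
import unicodedata

# Toy vocabulary (hypothetical); the real operator receives its vocab as an input Tensor.
TOY_VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, ",": 5}


def basic_tokenize(text):
    """Simplified basic tokenization: lowercase, split on whitespace,
    and emit each Unicode punctuation character as its own token."""
    tokens = []
    for word in text.lower().split():
        current = ""
        for ch in word:
            if unicodedata.category(ch).startswith("P"):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces.
    Pieces that do not start the word carry the '##' prefix."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix of the remainder is in the vocabulary.
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """End to end: text string -> wordpieces -> vocabulary ids."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab):
            ids.append(vocab[piece])
    return ids
```

With the toy vocabulary above, `tokenize("Hello, unaffable", TOY_VOCAB)` looks up the pieces `hello`, `,`, `un`, `##aff`, `##able` in turn.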

edff5b79

17 Sep, 2021 1 commit

GeneratePass for Python Pass (#35708) · f6db9806

Committed by wuhuanzhou on Sep 17, 2021

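#### Background

#35602 introduced a way to develop subgraph-replacement passes on the Python side:

- Use the Paddle Python API or helper types to define subgraph programs that serve as the match and replacement graphs;
- When a pass is registered on the Python side, the registered function is ultimately converted into PassDesc data defined in protobuf, which the C++ side parses to register the pass instance.

This PR parses the rules described by a PassDesc and generates the pass instance from them.

#### Design

##### Pass rule verification

In earlier pass development, operator iteration could cause a pattern to stop matching or to match incorrectly. This can be detected by scanning the parameters and parameter types an operator supports, in order to decide whether the pass should be applied or to warn that the pass code needs to be updated.

Pass development currently provides the operator-compatibility class OpCompatSensiblePass to address this. It has a shortcoming, though: because earlier passes could only obtain pattern information at run time, the check can only happen while the pass is executing.

A pass expressed as a PassDesc can be checked for these problems before the pass runs. This check is done in VerifyDesc.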

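##### Constructing the pattern from the match subgraph

GeneratePass uses GraphPatternDetector for graph matching and replacement. Constructing the match pattern amounts to adding PDNodes and edges to the detector's PDPattern member. This is done in the function `InitGeneratePattern`, which is deliberately not a member method of GeneratePass: a new Detector may be developed later, so GeneratePass is kept independent of the Detector's operations.

The pattern is initialized mainly by iterating over all operators of the match subgraph program:

1. Add the PDNode for the current operator together with its constraints (operator type, attribute constraints, and so on);
2. Iterate over the current operator's inputs and try to fetch each PDNode from the pattern:
   - If the PDNode is found in the pattern and is an output node, it is an intermediate node of the match subgraph, so mark it as an intermediate node;
   - If no PDNode is found in the pattern, add a PDNode for that input and mark it as an input node;
   - Add the edge from the input to the operator;
3. Iterate over the current operator's outputs:
   - If the PDNode is found in the pattern and is an input node, it is an intermediate node of the match subgraph, so mark it as an intermediate node;
   - If no PDNode is found in the pattern, add a PDNode for that output and mark it as an output node;
   - Add the edge from the operator to the output.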
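
The three steps above can be sketched in Python. This is a minimal model of the role-assignment logic only, not the C++ `InitGeneratePattern`: the `Op` and `Pattern` classes, the string role labels, and the node naming scheme are hypothetical stand-ins for PDNode and PDPattern, and attribute constraints are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for one operator of the match subgraph program."""
    type: str
    inputs: list
    outputs: list


@dataclass
class Pattern:
    """Hypothetical stand-in for PDPattern."""
    # variable name -> "input", "output", or "intermediate"
    var_roles: dict = field(default_factory=dict)
    op_nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (source, target) pairs


def init_pattern(ops):
    pattern = Pattern()
    for index, op in enumerate(ops):
        # Step 1: add a node for the operator (constrained here only by its type).
        op_node = f"{op.type}_{index}"
        pattern.op_nodes.append(op_node)

        # Step 2: walk the operator's inputs.
        for name in op.inputs:
            role = pattern.var_roles.get(name)
            if role == "output":
                # Produced earlier in the subgraph and consumed here.
                pattern.var_roles[name] = "intermediate"
            elif role is None:
                pattern.var_roles[name] = "input"
            pattern.edges.append((name, op_node))

        # Step 3: walk the operator's outputs.
        for name in op.outputs:
            role = pattern.var_roles.get(name)
            if role == "input":
                # Consumed earlier in the subgraph and produced here.
                pattern.var_roles[name] = "intermediate"
            elif role is None:
                pattern.var_roles[name] = "output"
            pattern.edges.append((op_node, name))
    return pattern


# Example match subgraph: matmul followed by elementwise_add.
example = init_pattern([
    Op("matmul", ["x", "w"], ["tmp"]),
    Op("elementwise_add", ["tmp", "b"], ["out"]),
])
```

In the example, `tmp` is first recorded as an output of `matmul` and is then re-marked as intermediate when `elementwise_add` consumes it, while `x`, `w`, and `b` stay inputs and `out` stays an output.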

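##### Modifying the graph according to the replacement subgraph

The replacement is carried out in the function `GetGenerateRewrite`, which, like `InitGeneratePattern`, is not a member method of GeneratePass.

The replacement proceeds as follows:

1. Check for a redundant replacement subgraph;
2. Iterate over all operators of the replacement subgraph program and add the replacement subgraph's Nodes:
   1. Add the Node for the current operator and set its attributes;
   2. Iterate over the current operator's inputs and add intermediate variable nodes;
   3. Iterate over the current operator's outputs and add intermediate variable nodes;
   4. Add the edges between the input/output nodes and the operator node;
3. Delete the Nodes in the matched graph that are intermediate nodes.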
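
Steps 2 and 3 can be illustrated on a toy graph made of a node set and an edge set. This is a sketch under several assumptions, not the C++ `GetGenerateRewrite`: the `rewrite` function and its arguments are hypothetical, the replacement operators are assumed to refer to boundary variables by the same names the matched graph uses, the matched operator nodes are assumed to be removed along with the intermediate variables, and step 1 (the redundancy check) is left out because the text gives no detail on it.

```python
def rewrite(graph_nodes, graph_edges, matched_ops, matched_intermediates, replace_ops):
    """Return new (nodes, edges) after substituting the replacement subgraph.

    replace_ops is a list of (op_type, input_names, output_names) tuples.
    """
    nodes = set(graph_nodes)
    edges = set(graph_edges)

    # Step 2: add each replacement operator, its variables, and its edges.
    for index, (op_type, inputs, outputs) in enumerate(replace_ops):
        op_node = f"{op_type}_fused_{index}"
        nodes.add(op_node)
        for name in inputs:
            nodes.add(name)  # no effect for boundary variables already in the graph
            edges.add((name, op_node))
        for name in outputs:
            nodes.add(name)
            edges.add((op_node, name))

    # Step 3: drop the matched operators and the matched intermediate variables,
    # together with every edge that touches them.
    removed = set(matched_ops) | set(matched_intermediates)
    nodes -= removed
    edges = {(src, dst) for (src, dst) in edges
             if src not in removed and dst not in removed}
    return nodes, edges


# Toy graph: out = elementwise_add(matmul(x, w), b), fused into a single fc op.
new_nodes, new_edges = rewrite(
    graph_nodes={"x", "w", "b", "tmp", "out", "matmul_0", "elementwise_add_1"},
    graph_edges={
        ("x", "matmul_0"), ("w", "matmul_0"), ("matmul_0", "tmp"),
        ("tmp", "elementwise_add_1"), ("b", "elementwise_add_1"),
        ("elementwise_add_1", "out"),
    },
    matched_ops={"matmul_0", "elementwise_add_1"},
    matched_intermediates={"tmp"},
    replace_ops=[("fc", ["x", "w", "b"], ["out"])],
)
```

The function returns new sets instead of mutating its arguments, which keeps the original graph available for comparison.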

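##### Verifying the optimized subgraph

Whether the replacement subgraph, or the computation graph after replacement, can run correctly can be verified while the pass executes, which prevents failures when the computation graph is run later.

Because pass execution currently modifies the computation graph in place, a failed verification cannot be cleanly rolled back. For now, subgraph verification succeeds by default; this is left for later improvement.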

f6db9806

16 Sep, 2021 1 commit

Python support register pass via PassDesc (#35602) · bab39eb2

Committed by wuhuanzhou on Sep 16, 2021

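Main purpose of this PR: support developing and registering passes on the Python side for subgraph-replacement scenarios such as fusion.

Background

A pass takes a deep learning computation graph (Graph) as input, modifies it according to certain conditions, and outputs the modified Graph.

Writing pass code in the PaddlePaddle framework currently has the following problems:

- Users must hand-write both the code that matches conditions on the Graph and the code that modifies the Graph;
- Operating on the Graph requires digging into low-level framework code, understanding the Graph structure, and knowing how passes are written.

We propose an optimized approach for subgraph-replacement passes such as fusion that lets users develop and register passes on the Python side, improving the secondary-development experience:

- Users only provide descriptions of the match and replacement subgraphs; code written by the framework generates the matching and replacement logic, so users do not perform the matching and replacement on the Graph themselves;
- Replacement at the API level: users can build subgraphs with Paddle's Python API, so they can write Paddle Graph passes without knowing the Graph structure.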
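
The idea of describing a pass as a pair of subgraph builders can be sketched as follows. This is not Paddle's API: `register_pass`, `PASS_REGISTRY`, and the nested-tuple subgraph encoding are hypothetical names invented for this illustration, and the real interface added by this PR converts the registered function into PassDesc protobuf data rather than keeping Python callables.

```python
# Hypothetical registry: pass name -> match and replacement subgraph builders.
PASS_REGISTRY = {}


def register_pass(func):
    """Hypothetical decorator: call func once and store the
    (pattern, replace) pair it returns under the function's name."""
    pattern, replace = func()
    PASS_REGISTRY[func.__name__] = {"pattern": pattern, "replace": replace}
    return func


@register_pass
def fuse_matmul_add():
    # Subgraphs are encoded as nested tuples: (op_type, *operands).
    def pattern(x, w, b):
        # Match: elementwise_add(matmul(x, w), b)
        return ("elementwise_add", ("matmul", x, w), b)

    def replace(x, w, b):
        # Replace with a single fused op: fc(x, w, b)
        return ("fc", x, w, b)

    return pattern, replace
```

The user writes only the two builders; everything that would walk or edit the graph lives behind the registry, which is the division of labor the PR description argues for.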

bab39eb2

07 Sep, 2021 1 commit
- support multi-node (#35396) · c6e0cedc
  Committed by yaoxuefeng on Sep 07, 2021

  c6e0cedc
24 Aug, 2021 1 commit
- add scope guard (#35103) · b0a1d122
  Committed by Zeng Jinle on Aug 24, 2021

  b0a1d122
18 Aug, 2021 2 commits

code refactoring for new executor (#34970) · 40d4d834

Committed by wanghuancoder on Aug 18, 2021

* code refactoring, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

40d4d834

[CustomOp] Fix ext_tensor.cast failed bug (#34884) · 4d88cdb8

Committed by Chen Weihang on Aug 18, 2021

* fix ext_tensor.cast failed bug

* remove useless deps

* fix windows cmake failed

* try to fix windows make failed

* fix make error on windows

4d88cdb8

11 Aug, 2021 1 commit
- add the basic apis for auto_parallel (#33804) · 3f962e77
  Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```
  3f962e77
10 Aug, 2021 1 commit

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include directory

* copy utils to right directories

12892929

05 Aug, 2021 1 commit

New executor dev (#34407) · 012d12b5

Committed by hong on Aug 05, 2021

* first test version

* add test exec;

* add data transfer; test=develop

* add new exec head;

* add memcpy; test=develop

* add python fetch

* add new test

* add graph node; test=develop

* remove useless new executor test; test=develop

* remove gperf dependency; test=develop

* fix compile bugs; test=develop

* remove useless code; test=develop

* remove useless code; test=develop

* add unit test; test=develop

* polish code; test=develop

* polish code; test=develop

* add interpreter cmakefile; test=develop

* remove useless code; test=develop

012d12b5

03 Aug, 2021 2 commits
- support Kunlun2 (#34459) · 2d0f3d9b
  Committed by QingshuChen on Aug 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
  2d0f3d9b
- polish sccahce (#34350) · 61e51c18
  Committed by zhouweiwei2014 on Aug 03, 2021
  
  61e51c18
20 Jul, 2021 1 commit

Add Dependency to Fix Random Compilation Failure (#34256) · c0133e01

Committed by Huihuang Zheng on Jul 20, 2021

Add boost as a dependency to fix a random compilation failure. The failure occurred because program_processing.cc uses boost, but boost was not listed in DEPS in CMakeLists.txt.

c0133e01

15 Jul, 2021 3 commits
- Class for processing program (#33439) · 85642a0d
  Committed by huangxu96 on Jul 15, 2021
```
This PR creates a class to process the program at the C++ level. Currently, this class has one class method:
GetInputsOutputsInBlock()
```
  85642a0d
- Add DCU backend support for custom ops (#34050) · 62840afa
  Committed by Zhanlue Yang on Jul 15, 2021
```
* Add DCU backend support for custom ops

* Added checks for DeviceCopy and renamed some macros
```
  62840afa
- Upgrade Executor into ParallelExcutor to apply Graph Optimization in @to_static (#32283) · 2850391d
  Committed by Aurelius84 on Jul 15, 2021
```
* Refine Constructor logic of ParallelExecutor

* Replace executor into ParallelExecutor in run_program_op
```
  2850391d
14 Jul, 2021 1 commit
- Support sccache to speed up compilation on Windows (#34019) · 4ce66826
  Committed by zhouweiwei2014 on Jul 14, 2021
```
* Support sccache to speed up compilation on Windows

* Support sccache to speed up compilation on Windows
```
  4ce66826
13 Jul, 2021 1 commit
- expose gc analysis interface (#34092) · 2b557da0
  Committed by Zeng Jinle on Jul 13, 2021
  
  2b557da0
02 Jul, 2021 2 commits
- Enhance npu/xpu log when kernel fallback to cpu, and fix cmake warnings. (#33927) · a74e01ab
  Committed by houj04 on Jul 02, 2021
  
  a74e01ab
- Polish Windows CI, fix CI random fail (#33863) · fcdbc8de
  Committed by zhouweiwei2014 on Jul 02, 2021
  
  fcdbc8de
29 Jun, 2021 1 commit
- Remove HeterBox (#33718) · 66c7a076
  Committed by Thunderbrook on Jun 29, 2021
```
* remove heterbox

* remove heterbox
```
  66c7a076
07 Jun, 2021 1 commit
- pack the @op_name@.pbtxt into library. test=develop (#33322) · d19bceb6
  Committed by 王明冬 on Jun 07, 2021
  
  d19bceb6
03 Jun, 2021 1 commit
- add the fc fuse example for pass enhance, test=develop (#33250) · fc5b3a99
  Committed by 王明冬 on Jun 03, 2021
  
  fc5b3a99
25 May, 2021 1 commit
- add the op def proto, test=develop (#33098) · 3a7b9ed7
  Committed by 石晓伟 on May 25, 2021
```
* add the op def proto, test=develop

* add while.pbtxt
```
  3a7b9ed7
18 May, 2021 1 commit
- unit double (#32902) · 29bbeb07
  Committed by Thunderbrook on May 18, 2021
```
* unit double

* unit double
```
  29bbeb07
10 May, 2021 1 commit
- [pslib] pslib with cmake (#32800) · fbbc3394
  Committed by Thunderbrook on May 10, 2021
```
* pslib with cmake

* heter util

* vlog

* heter server test

* add dtor

* cmake
```
  fbbc3394
07 May, 2021 1 commit
- Remove paddle_custom_op dynamic libraries, and link to FLUID_CORE on Windows (#32583) · 7610c2b4
  Committed by Zhou Wei on May 07, 2021
```
* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to

* fix CI
```
  7610c2b4
28 Apr, 2021 1 commit

[PsCore] solve Brpc dep (#32632) · 4ead9a5a

Committed by Thunderbrook on Apr 28, 2021

* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)"

This reverts commit 809ac036.

* brpc dep

4ead9a5a

15 Apr, 2021 1 commit
- tree-based-model (#31696) · a8c3a902
  Committed by 123malin on Apr 15, 2021
```
* add index_dataset and index_sampler for tree-based model
```
  a8c3a902
09 Apr, 2021 1 commit
- [CustomOp]Support MacOS platform and Remove libpaddle_custom_op.so dependency (#31976) · d815fbf9
  Committed by Aurelius84 on Apr 09, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume

* support macos
```
  d815fbf9
30 Mar, 2021 1 commit
- [Custom OP]Remove old custom OP and reduce whl package volume (#31813) · 04a49b09
  Committed by Zhou Wei on Mar 30, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume
```
  04a49b09
18 Mar, 2021 1 commit
- [CustomOp] Support complex dtype in custom op (#31657) · 87852616
  Committed by Chen Weihang on Mar 18, 2021
```
* support custom complex op

* fix detail error

* add inference support

* fix setup windows failed
```
  87852616
04 Mar, 2021 1 commit
- Windows system supports Ninja compilation (#31161) · 4d6d2db8
  Committed by wuhuanzhou on Mar 04, 2021
  
  4d6d2db8
27 Feb, 2021 1 commit
- [Custom OP] change the user header file format, test=develop (#31274) · 8c94d8cb
  Committed by 石晓伟 on Feb 27, 2021
  
  8c94d8cb
25 Feb, 2021 1 commit
- [ROCM] update fluid framework for rocm (part4), test=develop (#31013) · 580447d0
  Committed by Qi Li on Feb 25, 2021
  
  580447d0
24 Feb, 2021 2 commits
- modify custom op dependent from paddle_framework to paddle_custom_op (#31195) · ffbf7135
  Committed by Zhou Wei on Feb 24, 2021
  
  ffbf7135
- [CustomOp] Add new paddle custom op so (#31141) · 1ce96fa1
  Committed by Chen Weihang on Feb 23, 2021
```
* add new custom op so

* fix use new method error

* fix test failed
```
  1ce96fa1
22 Feb, 2021 1 commit

[2.0Custom OP]Support New Custom OP on Windows (#31063) · adaec007

Committed by Zhou Wei on Feb 22, 2021

* [2.0.1]Support New Custom OP on windows

* fix CI

* fix code style

* fix CI

* fix CI

* fix coverage

* fix CI

* fix CI

adaec007

10 Feb, 2021 1 commit

New custom operator extension mechanism (#30690) · f649442d

Committed by Chen Weihang on Feb 09, 2021

* initial commit: simple demo

* polish copyright format

* add grad op simple demo

* adapt uncertain number of argument

* change trait macro name

* add place & dtype support for add kernel

* add dispatch and infershape func

* polish code & add notes

* add dynamic_loader dep for paddle_framework

* add new custom op test dir

* polish impl details

* add unittest for new custom op

* fix failed unittest

* Costum op (#1)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* refactor register design & add test

* change op_funtion to op_meta_info

* split op meta info into .h and .cc

* move get methods into friend class

* move OpMetaInfoHelper into framework space

* move CustomTensorUtils into framework space

* change pybind api name

* move PD C API into op meta info

* add register custom op api

* remove inference cmake change

* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* support multi dtype

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* fix copy to error

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* polish detail & error message

* polish test details

* Add cast api && Change copy related api to copy_to && add more test (#4)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* support multi dtype

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* fix copy to error

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add type cast

* add cast and make copy to api

* add cast and make copy to api

* add cast and make copy to api

* add cast and make copy to api

* merge cwh code

* merge cwh code

* merge cwh code

* merge cwh code

* merge cwh code

* add more error log

* add more error log

* polish code

* used for test

* remove test comment

* remove test comment

* fix uint8 type error

* fix lost uint8 type error

* add test for coverage

* polish details by reviewer comments

* add prefix for DISABLE_COPY_AND_ASSIGN
Co-authored-by: Jiabin Yang <360788950@qq.com>

f649442d

18 Jan, 2021 1 commit
- [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) · 843dc3cd
  Committed by liuyuhui on Jan 18, 2021
  
  843dc3cd

机器未来 / Paddle is in sync with the fork's source project
