Commits · 2a3d9eca64b0312a6bf49ffe6f470a084886bbe4 · Crayon鑫 / Paddle

07 Mar, 2022 · 1 commit

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Supports fusing Act(X*Y + bias), where X has at least 2 dimensions and Y must have exactly 2.
3. Act currently supports only ReLU (GeLU will be added in the future).
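
As a rough illustration of the fused expression `Act(X*Y + bias)`, here is a minimal pure-Python reference. It is not the cuBlasLt kernel. The function name `fused_gemm_epilogue_reference`, the string activation names, and the erf-based GeLU formula are assumptions made for this sketch. It handles only a 2-D X. The GeLU and identity options correspond to items added later in this commit message.

```python
import math


def matmul(x, y):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def relu(v):
    return max(v, 0.0)


def gelu(v):
    # Exact erf-based GeLU (an assumption; the real kernel may use another variant).
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


# Hypothetical activation names, chosen for this sketch only.
ACTIVATIONS = {"none": lambda v: v, "relu": relu, "gelu": gelu}


def fused_gemm_epilogue_reference(x, y, bias, activation="none"):
    """Compute Act(X*Y + bias) for a 2-D X, a 2-D Y and a 1-D bias."""
    act = ACTIVATIONS[activation]
    return [[act(v + b) for v, b in zip(row, bias)] for row in matmul(x, y)]


x = [[1.0, -2.0], [3.0, 4.0]]
y = [[1.0, 0.0], [0.0, 1.0]]  # identity weight, so X*Y equals X
bias = [0.5, -10.0]
out = fused_gemm_epilogue_reference(x, y, bias, "relu")
```

The point of fusing is that the bias add and the activation run as part of the GEMM call instead of as separate ops; the arithmetic result should match this unfused reference.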

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently supports only ReLU (GeLU will be supported in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support for fwd graphs with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compile constraints for cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Renamed argument is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Used ReserveSpace in place of EpilogueSingleton to pass auxiliary
pointers between FWD and BWD.
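
The idea of handing a forward by-product to the backward pass can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `linear_relu_forward` and `linear_relu_backward`, and the choice of a boolean ReLU mask as the reserved data, are hypothetical and are not taken from the op's actual implementation.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def linear_relu_forward(x, y, bias):
    """Return (out, reserve_space) for ReLU(X*Y + bias).

    reserve_space records, per element, whether the pre-activation was positive.
    """
    pre = [[v + b for v, b in zip(row, bias)] for row in matmul(x, y)]
    reserve_space = [[v > 0.0 for v in row] for row in pre]
    out = [[v if keep else 0.0 for v, keep in zip(row, mask)]
           for row, mask in zip(pre, reserve_space)]
    return out, reserve_space


def linear_relu_backward(x, y, d_out, reserve_space):
    """Return (dX, dY, dBias), reusing the mask saved by the forward pass."""
    d_pre = [[g if keep else 0.0 for g, keep in zip(row, mask)]
             for row, mask in zip(d_out, reserve_space)]
    d_x = matmul(d_pre, transpose(y))
    d_y = matmul(transpose(x), d_pre)
    d_bias = [sum(col) for col in zip(*d_pre)]
    return d_x, d_y, d_bias


x = [[1.0, 2.0]]
y = [[1.0, -1.0], [1.0, -1.0]]
bias = [0.0, 0.0]
out, reserve = linear_relu_forward(x, y, bias)
d_x, d_y, d_bias = linear_relu_backward(x, y, [[1.0, 1.0]], reserve)
```

Returning the mask as an explicit output, rather than caching it in a process-wide singleton, keeps the forward and backward calls tied together through ordinary data flow.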

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue in fusing the backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_pattern_detector to fit both Linear with GeLU and Linear
with ReLU.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache to a local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache an argument to each pass
function.

2a3d9eca

01 Mar, 2022 · 1 commit

remove conv_affine_channel_fuse_pass (#39817) · fc06be9d

Committed by wenbin on Mar 01, 2022

* remove

* pass

* more pass

fc06be9d

22 Feb, 2022 · 1 commit

sync recent changes (#39763) · d945e24c

Committed by Allen Guo on Feb 22, 2022

d945e24c

15 Feb, 2022 · 1 commit

[Paddle-Inference] support preln_ernie: add... · 2bc91cc5

Committed by Wangzheee on Feb 15, 2022

[Paddle-Inference] support preln_ernie: add preln_embedding_eltwise_layernorm_fuse_pass, preln_skip_layernorm_fuse_pass (#39508)

* support preln_ernie

* support preln_ernie

2bc91cc5

09 Feb, 2022 · 1 commit

[Paddle-Inference] rebuild matmul pass: trt and gpu_cpu (#39369) · db7d129e

Committed by Wangzheee on Feb 09, 2022

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

db7d129e

26 Jan, 2022 · 1 commit

[IPU] sync misc changes 01 (#38876) · 4efbebea

Committed by Allen Guo on Jan 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* split PR
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

4efbebea

20 Dec, 2021 · 1 commit

add matmul_scale_fuse_pass (#37962) · ce335c23

Committed by heliqi on Dec 20, 2021

* add matmul_scale matmul_v2_scale fuse pass

* add scaletensor judge

* modify var name

* add timeout notest;test=coverag

* fix error commit

* fix use_mkldnn attr

* fix use_mkldnn attr

ce335c23

16 Dec, 2021 · 1 commit

Add tests for PaddleInference Pass (#37676) · 96597a85

Committed by yeliang2258 on Dec 16, 2021

* add test for conv_elementwise_add2_act_fuse_pass and conv_elementwise_add_act_fuse_pass

* Add conv_eltwiseadd_bn_fuse_pass test and fix test_conv_elementwise_addX_act_fuse_pass

* add tests for conv_act_mkldnn_fuse_pass

* add test for conv_bias_mkldnn_fuse_pass

* update code

* add conv_act_mkldnn_fuse_pass for relu, relu6, swish, leaky_relu

* update test

* update

* update bug

* update

* update pattern_detector

* fix test_conv_eltwiseadd_bn_fuse_pass

* add diff display notest;test=windows_ci_inference

* fix

* remove test_conv_act_mkldnn_fuse_pass.py

* ifix

96597a85

14 Dec, 2021 · 2 commits

add layer_norm_fuse_pass test case (#37830) · b95c9cf2

Committed by heliqi on Dec 14, 2021

* add layer_norm_fuse_pass test case

* restore cmakelist code

* Merge branch 'develop' into layer_norm_fuse_pass

* Merge branch 'develop' into layer_norm_fuse_pass

* add bad case test

b95c9cf2

add reshape+transpose+matmul_v2 only (#37847) · a922168a

Committed by Sylwester Fraczek on Dec 14, 2021

* reshape+transpose+matmul_v2

* in_name->input_name

* fix pr-ci-static-check

a922168a

10 Dec, 2021 · 1 commit

add fc_elementwise_layernorm_fuse_pass (#37771) · 0127e92d

Committed by heliqi on Dec 10, 2021

* add fc_elementwise_layernorm_fuse_pass

* fix name conflict

* rebuild CI

* fix Ran Programs=0 bug

0127e92d

06 Dec, 2021 · 1 commit

add test_unsqueeze2_eltwise_fuse_pass (#37647) · 22401426

Committed by heliqi on Dec 06, 2021

* add test_unsqueeze2_eltwise_fuse_pass

* fix name conflict

* rebuild CI

22401426

11 Nov, 2021 · 2 commits

Add test property RUN_TYPE=CINN (#37114) · 7a0cc0a9

Committed by Huihuang Zheng on Nov 11, 2021

Add test property RUN_TYPE=CINN to CINN unit tests. It will restrict Paddle-CINN CI to run these unit tests only.

7a0cc0a9

Added softplus + activation oneDNN fuse pass (#36657) · a346c4dc

Committed by jakpiase on Nov 11, 2021

* added softplus + activation fuse pass

* minor change

* implemented reviewer suggestion

* minor fix

* minor fix

* added scale_out parameter

* minor fix

* fix for iScan CI

* conditionally disabled logs

* refactored pass builder

a346c4dc

21 Oct, 2021 · 1 commit

Added matmul_v2+transpose+reshape fuse pass (#36481) · 856cb9c5

Committed by jakpiase on Oct 21, 2021

* added base changes for matmul_v2+trans+resh fuse pass

* added full matmul_v2+transpose+reshape pass

* removed a file added by mistake

* added reviewers suggestions

* Changed op type when checking compatibility version

* Deleted one statement

856cb9c5

20 Oct, 2021 · 1 commit

Add CINN Compile Option (#36292) · 6524fa8d

Committed by Huihuang Zheng on Oct 20, 2021

Add CINN compile option in CMake.

You can now enable CINN in Paddle by passing `-DWITH_CINN=ON` to `cmake`.

To test it, you can run `make cinn_lib_test -j` and `ctest -R cinn_lib_test`.

Note:
1. You should set
```
export runtime_include_dir=${CINN_SOURCE_DIR}/cinn/runtime/cuda
```
when running the tests, with `${CINN_SOURCE_DIR}` set to your CINN directory.

2. CINN is still under development, so you may have to change `CINN_GIT_TAG` to the git commit you need.

6524fa8d

15 Oct, 2021 · 1 commit

Add BuildCinnPass (#36345) · b3f02c57

Committed by jiangcheng on Oct 15, 2021

* Add CinnSubgraphSearchPass

* solve CI problem of subgraph order not same

* fix some bug by review advices

* ensure subgraph independence, meaning the subgraph should have no links to the outer graph

* rename cinn_subgraph_search_pass to build_cinn_pass and delete paddle_to_cinn_pass

* add flag to control whether to append build cinn pass

* remove AppendPass at ParallelExecutorPassBuilder

* rename paddle_to_cinn_pass to build_cinn_pass in build_strategy and close test_run_from_cinn

b3f02c57

13 Oct, 2021 · 1 commit

[PaddleInference] Pass: add int8 flag for op (#36042) · d7858c99

Committed by Wangzheee on Oct 13, 2021

* add_int_pass

* add_int8_flag_pass

* add_int8_flag_pass

* fix CMakeLists.txt

* fix test_trt_fc_fuse_quant_dequant_pass.py

* fix python/paddle/fluid/tests/unittests/ir/inference/test_trt_fc_fuse_quant_dequant_pass.py

* fix test_trt_fc_fuse_quant_dequant_pass.py

d7858c99

11 Oct, 2021 · 1 commit

Add use_cinn Flag and RunFromCinn in PE (#36107) · 5690666c

Committed by Huihuang Zheng on Oct 11, 2021

Add use_cinn flag and use it to control whether we run PaddlePaddle using CINN.

Also add:

* Replace PaddlePaddle graph with a CINN graph in a pass
* PE Method to feed data and run the graph by CINN

5690666c

18 Sep, 2021 · 1 commit

Basic PR on Cost Model (#35774) · 5ba9fe6e

Committed by Huihuang Zheng on Sep 18, 2021

Add a basic Cost Model. It uses the executor to run a program and profiles it to get op times.

This is an early, basic version; more functions will be added in the future.

5ba9fe6e

17 Sep, 2021 · 1 commit

GeneratePass for Python Pass (#35708) · f6db9806

Committed by wuhuanzhou on Sep 17, 2021

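#### Background

#35602 provides a way to develop subgraph-replacement passes on the Python side:

- Use the Paddle Python API or helper types to define the subgraph programs used to match and replace parts of the graph;
- When a pass is registered on the Python side, the registration function is ultimately converted into the protobuf-defined PassDesc form, which the C++ side parses to register the pass instance.

This PR generates pass instances by parsing the rules described in a PassDesc.

#### Design

##### Pass rule verification

In earlier pass development, operator iteration could cause a pattern to stop matching or to match incorrectly. This can be detected by scanning the parameter settings and parameter types that an operator supports, in order to decide whether the pass should be applied or to warn that the pass code needs to be modified.

Pass development currently provides the operator-compatibility class OpCompatSensiblePass to address this problem. It still has a shortcoming: because earlier passes could only obtain pattern information at run time, the check can only be made while the pass is executing.

A pass expressed as a PassDesc can be checked for these problems before it is executed. This check is done in VerifyDesc.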

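##### Building the pattern from the match subgraph

GeneratePass uses GraphPatternDetector for graph matching and replacement. Building the match pattern amounts to adding PDNodes and edge relations to the PDPattern member of the corresponding object. This is done in the function `InitGeneratePattern`, which is not a member method of GeneratePass, mainly because a new detector may be developed later; GeneratePass is not coupled to the detector's operations.

The pattern is initialized mainly by traversing all operators of the match-subgraph program:

1. Add the PDNode for the current operator together with its constraints (operator type, attribute constraints, and so on);
2. Traverse the inputs of the current operator and try to get each PDNode from the pattern:
   - The PDNode is found in the pattern and is an output node: it is an intermediate node of the match subgraph, so mark the PDNode as intermediate;
   - The PDNode is not found in the pattern: add the input PDNode and mark it as an input node;
   - Set the edge from the input to the operator;
3. Traverse the outputs of the current operator:
   - The PDNode is found in the pattern and is an input node: it is an intermediate node of the match subgraph, so mark the PDNode as intermediate;
   - The PDNode is not found in the pattern: add the output PDNode and mark it as an output node;
   - Set the edge from the operator to the output;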
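
The traversal above can be sketched in plain Python. This is a toy model, not Paddle's `InitGeneratePattern`: the dict-based operator description, the role strings, and the function name `init_pattern` are all assumptions made for illustration, and attribute constraints are reduced to the operator type.

```python
def init_pattern(ops):
    """Build a toy pattern: op nodes, variable nodes with a role, and edges."""
    var_roles = {}  # variable name -> "input" | "output" | "intermediate"
    op_nodes = []   # one entry per operator, holding its matching constraint
    edges = []      # (source, destination) pairs

    for index, op in enumerate(ops):
        # Step 1: add the operator node with its constraint (type only here).
        op_name = f"{op['type']}_{index}"
        op_nodes.append({"name": op_name, "type": op["type"]})

        # Step 2: inputs. An input already known as an output is intermediate.
        for var in op["inputs"]:
            if var_roles.get(var) == "output":
                var_roles[var] = "intermediate"
            elif var not in var_roles:
                var_roles[var] = "input"
            edges.append((var, op_name))

        # Step 3: outputs. An output already known as an input is intermediate.
        for var in op["outputs"]:
            if var_roles.get(var) == "input":
                var_roles[var] = "intermediate"
            elif var not in var_roles:
                var_roles[var] = "output"
            edges.append((op_name, var))

    return op_nodes, var_roles, edges


ops = [
    {"type": "matmul_v2", "inputs": ["x", "w"], "outputs": ["mm_out"]},
    {"type": "elementwise_add", "inputs": ["mm_out", "bias"], "outputs": ["add_out"]},
    {"type": "relu", "inputs": ["add_out"], "outputs": ["out"]},
]
op_nodes, var_roles, edges = init_pattern(ops)
```

In this example `mm_out` and `add_out` end up as intermediate nodes, while `x`, `w`, `bias` are inputs and `out` is the only output.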

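##### Rewriting the graph from the replacement subgraph

The replacement is carried out in the function `GetGenerateRewrite`, which, like `InitGeneratePattern`, is not a member method of GeneratePass.

The replacement operations are generated as follows:

1. Check for a redundant replacement subgraph;
2. Traverse all operators of the replacement-subgraph program and add the replacement Nodes:
   1. Add the Node for the current operator and set its attributes;
   2. Traverse the inputs of the current operator and add intermediate variable nodes;
   3. Traverse the outputs of the current operator and add intermediate variable nodes;
   4. Add the edges between the input/output nodes and the operator node;
3. Delete the Nodes of the matched graph that are intermediate nodes;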
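
A simplified sketch of the rewrite step follows. It is a toy model rather than Paddle's `GetGenerateRewrite`: the function name `rewrite`, the dict-based operators, and the role map are assumptions for illustration, the redundancy check of step 1 is omitted, and the operator list is treated as an unordered collection of graph nodes.

```python
def rewrite(graph_ops, matched_ops, var_roles, replacement_ops):
    """Swap the matched operators for the replacement operators.

    Returns (new_ops, removed_vars). The order of new_ops carries no meaning;
    the list stands in for an unordered set of graph nodes.
    """
    matched_ids = {id(op) for op in matched_ops}
    # Intermediate variables of the matched subgraph disappear with it.
    removed_vars = {name for name, role in var_roles.items() if role == "intermediate"}
    kept_ops = [op for op in graph_ops if id(op) not in matched_ids]
    return kept_ops + list(replacement_ops), removed_vars


matched = [
    {"type": "matmul_v2", "inputs": ["x", "w"], "outputs": ["mm_out"]},
    {"type": "elementwise_add", "inputs": ["mm_out", "bias"], "outputs": ["add_out"]},
    {"type": "relu", "inputs": ["add_out"], "outputs": ["out"]},
]
graph_ops = matched + [{"type": "softmax", "inputs": ["out"], "outputs": ["prob"]}]
var_roles = {
    "x": "input", "w": "input", "bias": "input",
    "mm_out": "intermediate", "add_out": "intermediate", "out": "output",
}
# Hypothetical replacement: one fused operator reading the same inputs
# and writing the same output as the matched subgraph.
replacement = [{"type": "fused_gemm_epilogue", "inputs": ["x", "w", "bias"], "outputs": ["out"]}]

new_ops, removed_vars = rewrite(graph_ops, matched, var_roles, replacement)
```

Because the replacement reads and writes the same boundary variables as the matched subgraph, operators outside the match (here `softmax`) need no changes.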

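##### Verifying the optimized subgraph

Whether the replacement subgraph, or the computation graph after replacement, can run correctly can be verified when the pass is executed, which prevents exceptions when the graph is run later.

Because pass execution currently modifies the computation graph directly, the graph cannot be restored cleanly when verification fails. For now, subgraph verification succeeds by default; improving this is left for later.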

f6db9806

17 Aug, 2021 · 1 commit

Add some passes which can be applied to Program (#34730) · 8046e33d

Committed by Zeng Jinle on Aug 17, 2021

* add inplace passes and tests

* update

* fix use_cuda undefined
fix compile error of op compat

* add more ut

* fix CPU CI error

* check adam unique

* fix mac/windows ci, improve coverage

* fix ci error

* follow weihang's comment

* fix BlockDesc::MoveFrom

* follow qiuliang's comment

* update

* follow huihuang's comments

8046e33d

06 Aug, 2021 · 1 commit

fix npu compile error, test=develop (#34656) · c16421c2

Committed by Qi Li on Aug 06, 2021

c16421c2

12 Jun, 2021 · 1 commit

Committed by joanna.wozna.intel on Jun 11, 2021

* Small changes related to BF16 fusion_gru and fusion_lstm

* Correct to pass arg by value

* Add conditions to rnn op

* Correct the spelling mistake

* Improving the test with checking activation

* Trigger CI

cd95ea82

03 Jun, 2021 · 1 commit

add the fc fuse example for pass enhance, test=develop (#33250) · fc5b3a99

Committed by 王明冬 on Jun 03, 2021

fc5b3a99

26 May, 2021 · 1 commit

optimize OP's compilation time (#32617) · 78ecb668

Committed by wuhuanzhou on May 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

78ecb668

25 May, 2021 · 1 commit

add the IsLeftDefault definition for pass enhance,test=develop (#33081) · dc72ffa5

Committed by 王明冬 on May 25, 2021

dc72ffa5

21 May, 2021 · 1 commit

add method for enhance pass,test=develop (#33004) · 79ed7177

Committed by 王明冬 on May 21, 2021

79ed7177

28 Apr, 2021 · 1 commit

Nne integration (#32604) · abcb3f54

Committed by denglin-github on Apr 28, 2021

* Add dlnne engine runtime

* Fix log

* Remove <const_cast> and remove unrelated modify with dlnne, +clang-format

* Fix CMakeList format error

* Add copyright message

* Fix dlnne CMakeList.txt

* Add some paddlepaddle_pass to support more networks

* Fix some format bug

* Add delete dropout_op pass

* Fix some format bug

* Fix format bug

abcb3f54

22 Feb, 2021 · 1 commit

[ROCM] update fluid framework for rocm (part1), test=develop (#31009) · 8fe09faf

Committed by Qi Li on Feb 22, 2021

8fe09faf

03 Feb, 2021 · 1 commit

Layer normalization fuse pass. (#30721) · 4f066e31

Committed by Adam Osewski on Feb 03, 2021

4f066e31

16 Jan, 2021 · 1 commit

[oneDNN] Refactor fuse pass helper functions to one place. (#30460) · c5ffad12

Committed by Adam Osewski on Jan 16, 2021

* Move pass tester helper functions to single common place.

* Use helper functions in two more fuse pass tests.

c5ffad12

13 Jan, 2021 · 1 commit

Added support for inference using quantization aware trained dygraph (#30288) · 7bbf3ac5

Committed by alncat on Jan 13, 2021

* added support for inference using quantization aware trained dygraph

* added support for inference using quantization aware trained dygraph
correct boost get usage

* Delete incorrect warning message (#30196)

* fix warning and no grad

* clean redundant API alias in 2.0 - part 2 (#30013)

* delete paddle.nn.functional.assign

* fix dynamic to static error

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* Add Static Variable Clone (#30208)

Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat

* use wget to replace curl to download the lcov file (#30229)

* use wget to replace curl to download the lcov file

* add cache for lcov

* fix test_pool3d_op timeout issue (#30248)

* Fix unittests bugs. (#30250)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* Fix bug for 'save mutiple method' (#30218)

* Fix bug for 'save mutiple method'

* To pass coverage.

* edit code to pass coverage.

* edit code to pass coverage.

* add unittest for coverage.

* change for coverage.

* edit for coverage.

* added support for inference using quantization aware trained dygraph

* Alias from  paddle.fluid.layers.auc to paddle.static.auc (#30206)

* add alias from  fluid.layers.auc to static.auc

* Update __init__.py

* added support for inference using quantization aware trained dygraph
correct boost get usage

* corrected boost get usage

* corrected naming issues and enforcing zero check

* correct paddle enforce message

* added more error checkings

* corrected error report message and optimized code

* corrected findvar usage

* corrected paddle_enforce in scope

* correct error messages

* correct error reporting format
Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: Bai Yifan <me@ethanbai.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>

7bbf3ac5

31 Dec, 2020 · 1 commit

Add mkldnn nearest_interp and bilinear_interp op (#30016) · c3c064a8

Committed by cc on Dec 31, 2020

* Add mkldnn nearest_interp and bilinear_interp op
* don't run mkldnn interpolate in default
* add interpolate_mkldnn_pass

c3c064a8

29 Dec, 2020 · 1 commit

map matmul/squeeze2+matmul/reshape2+matmul to mul (#29911) · 6a0102b0

Committed by cc on Dec 29, 2020

* map matmul/squeeze2+matmul/reshape2+matmul to mul

6a0102b0

28 Dec, 2020 · 1 commit

[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925) · 2b1d796c

Committed by Wilber on Dec 28, 2020

2b1d796c

24 Dec, 2020 · 1 commit

Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772) · edc06c6a

Committed by jakpiase on Dec 24, 2020

edc06c6a

23 Dec, 2020 · 1 commit

remove duplicate ut reload (#29810) · 24ce051a

Committed by YUNSHEN XIE on Dec 23, 2020

* remove duplicate ut reload

* remove duplicate ut define in cmakelist

24ce051a

25 Nov, 2020 · 1 commit

Add multi_gru_fuse_pass and tests (#28601) · 7b5a8e46

Committed by Wojciech Uss on Nov 25, 2020

* Add multi_gru_fuse_pass and tests

* fix date

* cleaned up headers

7b5a8e46

24 Nov, 2020 · 1 commit

Add multi_gru_seq_fuse_pass and tests (#28604) · 991345b3

Committed by Wojciech Uss on Nov 24, 2020

* Add multi_gru_seq_fuse_pass and tests

* fix date

* removed unused functions

991345b3
