Commits · 2a3d9eca64b0312a6bf49ffe6f470a084886bbe4 · Crayon鑫 / Paddle

07 Mar, 2022 10 commits

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Supports fusion of Act(X*Y + bias); X's dims must be >= 2 and Y's dims should be 2.
3. Act currently supports only ReLU (GeLU will be added in the future).
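
For readers unfamiliar with the fused expression, the following is a minimal plain-Python sketch of the reference semantics of Act(X*Y + bias) for a 2-D X. It is not the Paddle kernel or the cuBlasLt API; the function names are hypothetical, and the erf-based GeLU is an assumption, since the commit message does not say which GeLU variant is used.

```python
# Reference semantics only: a plain-Python sketch of Act(X*Y + bias).
# All names here are illustrative; none of them are Paddle or cuBlasLt APIs.
import math


def relu(v):
    return v if v > 0.0 else 0.0


def gelu(v):
    # Exact (erf-based) GeLU; assumed variant, not confirmed by the commit.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear_act_reference(x, y, bias, act=relu):
    """x: M x K rows, y: K x N rows, bias: length N. Returns M x N rows."""
    k, n = len(y), len(y[0])
    assert all(len(row) == k for row in x) and len(bias) == n
    out = []
    for row in x:
        # One output row: matmul, then bias add, then activation, in a single pass.
        out.append(
            [act(sum(row[i] * y[i][j] for i in range(k)) + bias[j]) for j in range(n)]
        )
    return out


x = [[1.0, -2.0], [0.5, 3.0]]
y = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.5, 0.5]
print(linear_act_reference(x, y, bias))  # [[1.5, 0.0], [1.0, 3.5]]
```

An unfused graph would materialize the matmul result and the biased result as separate tensors; the epilogue applies bias and activation as part of the GEMM output stage instead.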

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently supports only ReLU (GeLU will be supported in the future).
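
The shape of the detected pattern can be illustrated with a toy matcher over nested tuples. This is only a sketch under assumed conventions: the tuple encoding, the function name `match_linear_act`, and the use of the op name `elementwise_add` are illustrative and do not reflect the actual graph_pattern_detector code.

```python
# Toy expression tree: (op_name, *inputs); leaves are plain variable names.
# Hypothetical encoding for illustration, not Paddle's IR graph.


def match_linear_act(node, acts=("relu",)):
    """Return the matched parts of act(elementwise_add(matmul_v2(x, w), bias)), or None."""
    if not (isinstance(node, tuple) and len(node) == 2 and node[0] in acts):
        return None
    act, add = node
    if not (isinstance(add, tuple) and len(add) == 3 and add[0] == "elementwise_add"):
        return None
    _, mm, bias = add
    if not (isinstance(mm, tuple) and len(mm) == 3 and mm[0] == "matmul_v2"):
        return None
    _, x, w = mm
    return {"x": x, "w": w, "bias": bias, "activation": act}


graph = ("relu", ("elementwise_add", ("matmul_v2", "x", "w"), "bias"))
print(match_linear_act(graph))
```

Passing `acts=("relu", "gelu")` widens the match to GeLU, mirroring the support added later in this commit.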

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compile constraints for cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Renamed argument is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fixed an issue in fusing the backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_pattern_detector to fit both Linear with GeLU and
Linear with ReLU.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache an argument to each pass
function.

[phi] move is_empty to phi (#39919) · 72964335

Committed by WJJ1995 on Mar 07, 2022

* Add is_empty

* fixed for CI

* fixed code style

* resolve conflict

* deal with comments

* replace pt by pd

[Phi]Move elementwise_div grad/double grad Kernel to Phi (#40172) · c52a664e
Committed by YuanRisheng on Mar 07, 2022
```
* move elementwise_div grad

* change mutable_data to alloc

* fix compile bugs
```
fix infer shapes of pool_with_index (#40139) · 0fb6bca4
Committed by Wei Shengyu on Mar 07, 2022
```
* dbg pool infer shapes

* dbg

* fix format
```
[Phi] Fix macro name typo (#40204) · 55a3bfbd
Committed by Aurelius84 on Mar 07, 2022

[bf16] add bf16 kernel: gaussian_random fill_constant fill_any_like (#40027) · 6a0d60d2

Committed by zhangbo9674 on Mar 07, 2022

* add gaussian random

* add full

* refine reduce

* refine code

* refine gaussian_random unittest

* add unittest for fill_any_like fill_constant

[phi] move multi_dot OP (#40038) · fd36ede6

Committed by Liu-xiandong on Mar 07, 2022

* [phi] move multi_dot OP

* fix the segment bug

* fix bug

* delete useless comment

* fix CI bug

[bf16] add bf16 kernel: sigmoid & sqrt & softplus & square (#40004) · 98c427e2
Committed by zhangbo9674 on Mar 07, 2022
```
* add activ

* refine unittest

* refine unittest

* refine unittest

* refine unittest

* refine code
```
[MLU]support reduce tensors on mlu (#40000) · b4eb413e
Committed by zn on Mar 07, 2022
```
* [MLU]support reduce tensors on mlu

* [MLU]fix compiler options
```
[Phi]Migrate Adamax and Adadelta Optimizer Op into Phi (#40173) · f5ec0314
Committed by Aurelius84 on Mar 07, 2022
```
* [Phi]Migrate Adamax into phi

* Add adadelta kernel
```

06 Mar, 2022 3 commits

[Phi] Replace all prefix PT by PD and fix typo (#40046) · d30d85da

Committed by Chen Weihang on Mar 06, 2022

* replace prefix pt by pd

* replace added kernel

* revert util change

* pd kernel to phi

* resolve conflict

* resolve conflict

[PHI] Move dist op to phi (#40178) · 7e076e7b
Committed by Zhong Hui on Mar 06, 2022
```
* move dist op to phi

* fix

* fix

* fix as reviews
```

【Phi】Migrate triangular_solve op into phi (#40093) · a3f28a31

Committed by zhouweiwei2014 on Mar 06, 2022

* Migrate triangular_solve op into phi

* fix CI

* move MatrixReduceSum to phi funcs

* move MatrixReduceSum to phi funcs

* fix comment

* fix CI

05 Mar, 2022 3 commits
- [Phi] Remove eig op depend for svd_helper (#40174) · e7afa391
  Committed by Chen Weihang on Mar 05, 2022
```
* remove eig dep for svd helper

* fix win failed
```
- [Phi] move infershape for mv (#39954) · 4be5448b
  Committed by furnace on Mar 05, 2022
```
* [Phi] move infershape for mv

* [Phi] delete extra codes for mv
```
- support add infershape for no grad op (#40182) · 94f03dc2
  Committed by Chen Weihang on Mar 05, 2022

04 Mar, 2022 13 commits

Move yolo box to phi (#40112) · faece382
Committed by hong on Mar 04, 2022
```
* add yolo box kernel; test=develop

* fix compile error; test=develop
```
Move gather_nd/scatter/scatter_nd_add op to the phi library (#40090) · 1ca379bf
Committed by sneaxiy on Mar 04, 2022
```
* move gather_nd/scatter/scatter_nd_add

* fix npu/xpu ci

* follow comments

* small fix
```
[phi] move cpu_vec (#39714) · 70540b26
Committed by Feiyu Chan on Mar 04, 2022
```
move cpu_vec.h to phi/kernels/funcs.
```

[phi] move sigmoid_cross_entropy_with_logits log_loss cumsum auc kernel to phi (#39976) · b7bbe39c

Committed by Linjie Chen on Mar 04, 2022

* move sigmoid cross entropy with logits to phi

* fix ci

* move log_loss to phi

* move cumsum to phi

* revert infershape

* fix xpu ci

* move auc to phi

* remove comment

* update sigmoid_cross_entropy_with_logits_op.cu

* update sigmoid_cross_entropy_with_logits_op

* Update log_loss

Add digamma abs trunc yaml (#40024) · 0bfba16b

Committed by hong on Mar 04, 2022

* add digamma, abs, trunc; test=develop

* fix bug and add diagonal; test=develop

* add name converter; test=develop

* update tracer.py; test=develop

* add test case; test=develop

* fix bugs; test=develop

[PHI] Remove empty kernel and infershape in fluid (#40146) · f3161c50
Committed by zyfncg on Mar 04, 2022
```
* remove empty kernel and infershape in fluid

* fix bug of infershape_utils
```
Fix bug caused by split infershape (#40116) · 45385371
Committed by zyfncg on Mar 04, 2022
```
* fix bug caused by split infershape

* revert infer_shape of split

* revert split
```
[Phi] Remove cholesky solve deps with svd helper (#40119) · 28fd30cd
Committed by Chen Weihang on Mar 04, 2022
```
* remove cholesky solve deps with svd helper

* fix shape infer bug
```
【Phi】Migrate bitwise_and/bitwise_or/bitwise_xor/bitwise_not op into phi (#40031) · 03eb792d
Committed by zhouweiwei2014 on Mar 04, 2022
```
* Migrate bitwise_and/or/xor/not op into phi

* fix CI
```
clean distribution_helper, index_impl, aligned_vector code in fluid (#40071) · b9672a1e
Committed by Leo Chen on Mar 04, 2022
```
* clean distribution_helper, index_impl, aligned_vector code in fluid

* fix conflicts
```

[phi]move reduce gpu impl funcs into pten/kernels/funcs (#39990) · e2e2d531

Committed by chentianyu03 on Mar 04, 2022

* move reduce gpu impl funcs into pten/kernels/funcs

* change reduce header name and namespace

* fix spell word error

* change mutable_data to dev_ctx.Alloc

* modify place to devcontex

* format code style

* fix build error

* fix build error

* fix conflict

transfer selu infershape (#40137) · abacc4cb
Committed by xiongkun on Mar 04, 2022

Move conv to pten (#39354) · d50fb43e

Committed by hong on Mar 04, 2022

* move conv to pten

* move conv to pten; test=develop

* fix bug;

* add conv cudnn impl; test=develop

* update

* update operator; test=develop

* fix bug; test=develop

* move operator and prepared_operator to develop; test=develop

* resolve conflict; test=develop

* remove useless code;test=develop

* add dependency; test=develop

* fix bug;

* add sig.cc ; test=develop

* fix use_op error; test=develop

* fix bug; test=develop

* fix bug; test=develop

* add conv3d register; test=develop

* fix star gan and conv_nn_grad test failed; test=develop

* add header; test=develop

* manually recover to develop;

* resolve conflict; test=develop

* remove useless code

* fix bug;

* remove conv2d_cudnn; test=develop

* fix bugs; test=develop

* fix cpu rocm compile bugs; test=develop

* fix blas error; test=develop

* fix compile bug; test=develop

* fix windows compile error; test=develop

* fix windows error; test=develop

* resolve conflict; test=develop

03 Mar, 2022 11 commits
- fix save_vars bugs (#40062) · eaacf8bf
  Committed by YuanRisheng on Mar 03, 2022
  
- move eye, lerp infershape to phi (#40105) · 1c205883
  Committed by 0x45f on Mar 03, 2022
  
- cinn_launch_op: switch to execution by PE (#39911) · 167d511f
  Committed by TeFeng Chen on Mar 03, 2022
```
* switch to PE execution in cinn launch

* fix outer variables erased

* skip the map bug temporarily for test

* temporary solution for batch_norm bug

* update comment

* fix compile error

* cinn_instruction_run_op_test: update code to skip external alloc/free instructions generated
```
- Move compare OPs to phi (#39970) · 0969a4eb
  Committed by From00 on Mar 03, 2022
```
* Move compare OPs to phi

* Fix bug

* Use BroadcastKernel and ElementwiseKernel in phi
```
- modify infershape of multiclass nms (#40059) · 756af9ff
  Committed by wangxinxin08 on Mar 03, 2022
```
* modify infershape of multiclass nms
```
- [Phi]Delete kernel registry of elementwise_sub op in Fluid (#40039) · cac00e0b
  Committed by YuanRisheng on Mar 03, 2022
```
* delete elementwise_sub kernel registry

* fix compile bugs in xpu ci

* fix bugs when run inference ci
```
- EmbEltwiseLayernorm fix (#40015) · c3f3643b
  Committed by wenbin on Mar 03, 2022
```
* emb fix

* fix trt6 compile

* fix half

* absolute error fix
```
- Modified sigmoid by the elementwise interface. (#39898) · 5d9e11a4
  Committed by huangxu96 on Mar 03, 2022
```
* Modified sigmoid by elementwise interface.

* using TensorReduceImpl to replace Sum function

* using reduceimpl to calculate the norm variable

* Removed useless code
```
- Add support of int16 for gather op. (#40052) · 3e56e816
  Committed by Li Min on Mar 03, 2022
```
* add support of int16 for gather op.

* Recover formats.

* Recover formats.

* fix.

* Fix format.

* Fix format.
```
- [phi] transfer pad kernel into phi and pass the test_pad_op (#40012) · 9f74b84e
  Committed by xiongkun on Mar 03, 2022
```
* add pad forward

* fix error

* transfer pad and pass the test_pad_op
```
- move gather_tree infer shape (#40082) · 3779e807
  Committed by crystal on Mar 03, 2022
  