Commits · caa57498e1fc763ee1277ac2ae8ccaf0f386e45f · Crayon鑫 / Paddle

09 Jun, 2022 1 commit
- Implement dropout_nd operator to optimize dropout with axis not None. (#42463) · caa57498
  Committed by crystal on Jun 09, 2022
```
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
```
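
For readers unfamiliar with dropout along an axis, the following is a minimal NumPy sketch of the behaviour the dropout_nd commit above targets. It is not the CUDA operator from the PR. The function name `dropout_nd_reference`, the upscale-in-train scaling, and the reading of `axis` (the mask keeps the input's extent on the listed axes and is broadcast over all others) are assumptions made for illustration.

```python
import numpy as np


def dropout_nd_reference(x, p=0.5, axis=None, rng=None):
    """Hypothetical reference dropout; assumes 0 <= p < 1.

    axis=None : an independent keep/drop decision per element.
    axis given: the mask has x's extent on the listed axes and size 1
                elsewhere, then is broadcast to x's full shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    if axis is None:
        mask_shape = x.shape
    else:
        axes = [axis] if isinstance(axis, int) else list(axis)
        axes = [a % x.ndim for a in axes]  # normalise negative axes
        mask_shape = tuple(
            x.shape[d] if d in axes else 1 for d in range(x.ndim)
        )
    keep = rng.random(mask_shape) >= p
    mask = np.broadcast_to(keep, x.shape)
    # Kept values are scaled by 1 / (1 - p); dropped values become 0.
    return np.where(mask, x / (1.0 - p), 0.0), mask
```

With a 2 x 3 input and `axis=0`, the mask has shape 2 x 1, so each row is kept or dropped as a whole.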
08 Jun, 2022 1 commit

Fix wrong reduce_dims in fused_gate_attention and optimize the memory usage. (#43216) · 10f8637c

Committed by Yiqun Liu on Jun 08, 2022

* Polish codes and memory usage for fused_gate_attention.

* Fix wrong reduce_dims in fused_gate_attention when computing gradient of nonbatched_bias.

07 Jun, 2022 1 commit
- Support more dimensions in forward fast layer_norm kernel (#43226) · d9f8636c
  Committed by Zhang Zheng on Jun 07, 2022
05 Jun, 2022 2 commits
- revert modification for format PR passing CI; not sort headers for windows CI test failed (#43200) · 58d2949d
  Committed by Sing_chan on Jun 05, 2022
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Committed by Sing_chan on Jun 05, 2022
04 Jun, 2022 1 commit
- 【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  Committed by Sing_chan on Jun 04, 2022
02 Jun, 2022 2 commits
- Support head_dim = 96 in fused_multi_transformer for PLATO-XL (#43120) · 990c5e7f
  Committed by Zhang Zheng on Jun 02, 2022
```
* Support head_dim = 96 in fused_multi_transformer in PLATO-XL

* add notes
```
- Extend forward fast layer_norm kernel to support more dimensions. (#43118) · 85baa3c0
  Committed by Li Min on Jun 02, 2022
```
* extend forward fast_ln_kernel to support more column values.
```
01 Jun, 2022 1 commit

Make fuse_gemm_epilogue support transpose_x and transpose_y (#40558) · 048b0013

Committed by sneaxiy on Jun 01, 2022

* support weight transpose

* add ut

* add template

* fix transpose error

* fix transpose_comment

* add api tests

* add skipif

* add doc

31 May, 2022 1 commit
- Rename dropout is test (#43098) · 67497119
  Committed by Li Min on May 31, 2022
```
* replace dropout_is_test with is_test.
* improve atol on a100.
```
30 May, 2022 2 commits
- Add fused_bias_dropout_residual_ln op and layer. (#43062) · dceccd9d
  Committed by Li Min on May 30, 2022
```
* add fused_bias_dropout_residual_ln op and layer.
```
- Implement fused_gate_attention operator for AlphaFold. (#42018) · fdcdbec5
  Committed by crystal on May 30, 2022
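
As a rough guide to what the fused_bias_dropout_residual_ln op listed above combines, here is an unfused NumPy sketch. The order of operations, layer_norm(residual + dropout(x + bias)), is inferred from the operator's name and is an assumption. The function names, argument names, and the upscale-in-train dropout scaling are also illustrative rather than taken from the PR.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise over the last axis, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def bias_dropout_residual_ln(x, residual, bias, gamma, beta,
                             p=0.0, rng=None, eps=1e-5):
    """Hypothetical unfused reference; assumes 0 <= p < 1."""
    rng = np.random.default_rng() if rng is None else rng
    h = x + bias                                  # bias add
    if p > 0.0:
        keep = rng.random(h.shape) >= p           # dropout mask
        h = np.where(keep, h / (1.0 - p), 0.0)
    return layer_norm(residual + h, gamma, beta, eps)  # residual + LN
```

A fused kernel would perform these steps in one pass over memory; the sketch only pins down the arithmetic such a kernel is expected to reproduce under the stated assumptions.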
27 May, 2022 1 commit

[Phi] Change optional tensor from `optional<const Tensor&>` to `optional<Tensor>` (#42939) · 6d78524c

Committed by zyfncg on May 27, 2022

* refactor the optional tensor

* remove optional<MetaTensor> in InferMeta

* fix bug

* fix optional<vector<Tensor>>

* fix bug

* fix rmsprop

* fix amp of eager_gen

* polish code

* fix deleted code

* fix merge conflict

* polish code

* remove is_nullopt_

* fix merge conflict

* fix merge conflict

25 May, 2022 1 commit

fix maybe-uninitialized warning (#42902) · f1f79b0d

Committed by Leo Chen on May 25, 2022

* fix maybe-uninitialized warning

* fix compile

* fix xpu compile

* fix npu compile

* fix infer compile

* fix compile

* fix compile

24 May, 2022 1 commit
- [Phi] Move grad_add op kernel into phi and delete elementwise_add_op file (#42903) · 4d7a9eef
  Committed by YuanRisheng on May 24, 2022
```
* move grad_add

* fix unittest bugs

* fix compile bugs
```
20 May, 2022 1 commit
- fix fused_attention_op cacheKV InferShape (#42900) · 7306d1fb
  Committed by WangXi on May 20, 2022
17 May, 2022 1 commit
- add yolo_box_fuse_pass, yolo_box_head_op, yolo_box_post_op (#42641) · 6b58de95
  Committed by zhupengyang on May 17, 2022
16 May, 2022 2 commits
- delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
  Committed by niuliling123 on May 16, 2022
- fused_multi_transformer add fused softmax mask (#42636) · f9d5ae4e
  Committed by WangXi on May 16, 2022
12 May, 2022 1 commit
- Fix some typos in paddle/. (#42408) · 2012672c
  Committed by Shuangchi He on May 12, 2022
06 May, 2022 1 commit
- Fix the implementation of fused_fast_ln_fwd_kernel in test mode (#42527) · 5acd764d
  Committed by Zhang Zheng on May 06, 2022
02 May, 2022 1 commit
- Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2 (#42405) · fb3d5f07
  Committed by Zhang Zheng on May 02, 2022
```
* Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2

* no throw in V100 for some cases
```
28 Apr, 2022 3 commits
- Support more scenes for fused_fast_ln (#42282) · 7cb49539
  Committed by Zhang Zheng on Apr 28, 2022
```
* Support more scenes for fused_fast_ln

* fix
```
- fix FusedResidualDropoutBias nan in v100 (#42344) · 687219fe
  Committed by WangXi on Apr 28, 2022
- fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315) · f4507974
  Committed by WangXi on Apr 28, 2022
26 Apr, 2022 1 commit
- Add fused_multi_transformer op to optimize transformer generation performance (#41814) · 9dadf7df
  Committed by WangXi on Apr 26, 2022
22 Apr, 2022 1 commit

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

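The commit above mentions an exhaustive algorithm search with warmup runs. The generic pattern can be sketched as follows. This is not the cuBLASLt code path from the PR: `pick_fastest`, its parameters, and the injectable `clock` are hypothetical names used only to illustrate warming up each candidate and then keeping the one with the lowest average time.

```python
import time


def pick_fastest(candidates, warmup=1, repeats=3, clock=time.perf_counter):
    """Return the name of the candidate with the lowest mean run time.

    candidates: dict mapping a name to a zero-argument callable.
    warmup    : untimed runs per candidate (caches, lazy initialisation).
    repeats   : timed runs per candidate, averaged.
    """
    best_name, best_time = None, float("inf")
    for name, run in candidates.items():
        for _ in range(warmup):
            run()                      # untimed warmup
        start = clock()
        for _ in range(repeats):
            run()
        elapsed = (clock() - start) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

A real search would typically cache the winner per problem shape so the timing cost is paid once. The key used for such a cache in Paddle is not stated in this log.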

19 Apr, 2022 1 commit
- fix inf in fused_attention (#41933) · 6bd39b5e
  Committed by WangXi on Apr 19, 2022
16 Apr, 2022 1 commit
- move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
  Committed by 王明冬 on Apr 16, 2022
09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

28 Mar, 2022 1 commit
- add fused_seqpool_cvm op (#37928) · ea5b2f26
  Committed by danleifeng on Mar 28, 2022
```
* add fused_seqpool_cvm op;test=develop
```
19 Mar, 2022 1 commit

Add infer meta (#40544) · 8e4e19ab

Committed by hong on Mar 19, 2022

* add infer meta; test=develop

* add histogram infer meta; test=develop

* fix unittest bug; test=develop

* format; test=develop

* format; test=develop

* bn not use new infer meta; test=develop

* add infer meta; test=develop

* fixbug; test=develop

* fix bug;

* recover unittest; test=develop

17 Mar, 2022 1 commit

Move layer norm to phi (#40193) · 681a6865

Committed by hong on Mar 17, 2022

* update

* fix bugs; test=develop

* update; test=develop

* fix test compile error; test=develop

* fix cpu compile error; test=develop

* fix test error; test=develop

* fix layer_norm_op plugin error; test=develop

* fix error; test=develop

* fix test bug; test=develop

* update; test=develop

* polish code; test=develop

* fix bugs; test=develop

* remove unused dependency; test=develop

* polish code; test=develop

11 Mar, 2022 2 commits

[Phi] Remove needless deps in unittests (#40256) · 89ed57e2

Committed by Chen Weihang on Mar 11, 2022

* remove needless deps in unittests

* add gpu macro

* fix other unittests

* fix kernel name error

* fix test_prepare_op

* fix failed dygraph unittests

* fix gpu failed tests

* fix cinn test failed

* fix cinn test failed

* fix dropout tests

[hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496

Committed by Yuang Liu on Mar 11, 2022

10 Mar, 2022 1 commit

Move dropout to phi (#40148) · 99fc1b08

Committed by hong on Mar 10, 2022

* move dropout to phi; test=develop

* fix xpu, npu compile error; test=develop

09 Mar, 2022 1 commit
- [hybrid] fused_feedforward op support tensor model parallel (#40160) · e0866dc6
  Committed by WangXi on Mar 09, 2022
08 Mar, 2022 1 commit
- Add exception throw for norm_conv when platform is not supported (#40166) · 00566ead
  Committed by Zhang Zheng on Mar 08, 2022
```
* Add throw for norm_conv when platform is not supported

* fix format
```
07 Mar, 2022 1 commit

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusion Act(X*Y + bias), X's dims >= 2 and Y's dims should be 2.
3. Act currently supports only ReLU (GeLU will be added in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently supports only ReLU (GeLU will be supported in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compile constraints for cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Changed argument name is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_pattern_detector to fit both linear with gelu and
linear with relu.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

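The fusion described in the commit above, Act(X*Y + bias) with X of two or more dimensions and Y of exactly two, can be written as a plain NumPy reference. This is a sketch for checking arithmetic, not the cuBlasLt-backed operator. The function name, the `activation` strings, and the exact erf form of GeLU are assumptions (a fused kernel may use an approximation).

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without extra dependencies


def gemm_epilogue_reference(x, y, bias, activation="none"):
    """Hypothetical unfused reference for Act(x @ y + bias)."""
    if x.ndim < 2 or y.ndim != 2:
        raise ValueError("expected x.ndim >= 2 and y.ndim == 2")
    out = x @ y + bias                      # GEMM followed by bias add
    if activation == "none":
        return out
    if activation == "relu":
        return np.maximum(out, 0.0)
    if activation == "gelu":
        return 0.5 * out * (1.0 + _erf(out / math.sqrt(2.0)))
    raise ValueError(f"unsupported activation: {activation}")
```

Comparing a fused kernel against a reference of this kind on inputs that include negative values exercises the activation branch, which matches the note in the log about widening the unit-test data range.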

04 Mar, 2022 1 commit
- [phi] move cpu_vec (#39714) · 70540b26
  Committed by Feiyu Chan on Mar 04, 2022
```
move cpu_vec.h to phi/kernels/funcs.
```
