Commits · 576236a08a70e17890480f1248d01d3128024e28 · Crayon鑫 / Paddle

26 Jun, 2022 1 commit
- format all files in fluid using new config (#43776) · 576236a0
  Committed by Sing_chan on Jun 26, 2022
21 Jun, 2022 1 commit
- Fix code example of fused_attention and fused_feedforward. (#43635) · 223fb7b3
  Committed by Yiqun Liu on Jun 21, 2022
20 Jun, 2022 2 commits
- Add passes and plugins for distributed inference of NLU (#43049) · 007f3614
  Committed by whs on Jun 20, 2022
- Support more dimensions in MMHA (#43612) · 03f9e598
  Committed by Zhang Zheng on Jun 20, 2022
  ```
  * support more dimensions

  * fix
  ```
17 Jun, 2022 1 commit

Support optional residual add in fused_attention and fused_feedforward. (#43474) · 19e866f9

Committed by Yiqun Liu on Jun 17, 2022

* Support optional residual add in fused_attention and fused_feedforward.

* Add checkpoint and add the check of add_residual when pre_layer_norm is false.

* Add TODO and change the python api to add add_residual argument.


15 Jun, 2022 1 commit

Optimize prod's python implementation for dygraph. (#43309) · 9b7126d0

Committed by Yiqun Liu on Jun 15, 2022

* Optimize prod's python implementation for dygraph.

* Change key_dim to head_dim.

* Add comment in unittest.

* Disable TF32 in unittest.


14 Jun, 2022 1 commit
- 【code format check upgrade】 step3：enable clang-format sort these infrt files' headers (#43333) · 403b127b
  Committed by Sing_chan on Jun 14, 2022
10 Jun, 2022 1 commit
- optimize bwd layer_norm kernel with fast method (#42491) · b4a93884
  Committed by limingshu on Jun 10, 2022
09 Jun, 2022 1 commit
- Implement dropout_nd operator to optimize dropout with axis not None. (#42463) · caa57498
  Committed by crystal on Jun 09, 2022
  ```
  Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
  ```
08 Jun, 2022 1 commit

Fix wrong reduce_dims in fused_gate_attention and optimize the memory usage. (#43216) · 10f8637c

Committed by Yiqun Liu on Jun 08, 2022

* Polish codes and memory usage for fused_gate_attention.

* Fix wrong reduce_dims in fused_gate_attention when computing gradient of nonbatched_bias.


07 Jun, 2022 1 commit
- Support more dimensions in forward fast layer_norm kernel (#43226) · d9f8636c
  Committed by Zhang Zheng on Jun 07, 2022
05 Jun, 2022 2 commits
- revert modification for format PR passing CI; not sort headers for windows CI test failed (#43200) · 58d2949d
  Committed by Sing_chan on Jun 05, 2022
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Committed by Sing_chan on Jun 05, 2022
04 Jun, 2022 1 commit
- 【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  Committed by Sing_chan on Jun 04, 2022
02 Jun, 2022 2 commits
- Support head_dim = 96 in fused_multi_transformer for PLATO-XL (#43120) · 990c5e7f
  Committed by Zhang Zheng on Jun 02, 2022
  ```
  * Support head_dim = 96 in fused_multi_transformer in PLATO-XL

  * add notes
  ```
- Extend forward fast layer_norm kernel to support more dimensions. (#43118) · 85baa3c0
  Committed by Li Min on Jun 02, 2022
  ```
  * extend forward fast_ln_kernel to support more column values.
  ```
01 Jun, 2022 1 commit

Make fuse_gemm_epilogue support transpose_x and transpose_y (#40558) · 048b0013

Committed by sneaxiy on Jun 01, 2022

* support weight transpose

* add ut

* add template

* fix transpose error

* fix transpose_comment

* add api tests

* add skipif

* add doc


31 May, 2022 1 commit
- Rename dropout is test (#43098) · 67497119
  Committed by Li Min on May 31, 2022
  ```
  * replace dropout_is_test with is_test.
  * improve atol on a100.
  ```
30 May, 2022 2 commits
- Add fused_bias_dropout_residual_ln op and layer. (#43062) · dceccd9d
  Committed by Li Min on May 30, 2022
  ```
  * add fused_bias_dropout_residual_ln op and layer.
  ```
- Implement fused_gate_attention operator for AlphaFold. (#42018) · fdcdbec5
  Committed by crystal on May 30, 2022
27 May, 2022 1 commit

[Phi] Change optional tensor from `optional<const Tensor&>` to `optional<Tensor>` (#42939) · 6d78524c

Committed by zyfncg on May 27, 2022

* refactor the optional tensor

* remove optional<MetaTensor> in InferMeta

* fix bug

* fix optional<vector<Tensor>>

* fix bug

* fix rmsprop

* fix amp of eager_gen

* polish code

* fix deleted code

* fix merge conflict

* polish code

* remove is_nullopt_

* fix merge conflict

* fix merge conflict


25 May, 2022 1 commit

fix maybe-uninitialized warning (#42902) · f1f79b0d

Committed by Leo Chen on May 25, 2022

* fix maybe-uninitialized warning

* fix compile

* fix xpu compile

* fix npu compile

* fix infer compile

* fix compile

* fix compile


24 May, 2022 1 commit
- [Phi]Move grad_add op kernel into phi and delete elementwise_add_op file (#42903) · 4d7a9eef
  Committed by YuanRisheng on May 24, 2022
  ```
  * move grad_add

  * fix unittest bugs

  * fix compile bugs
  ```
20 May, 2022 1 commit
- fix fused_attention_op cacheKV InferShape (#42900) · 7306d1fb
  Committed by WangXi on May 20, 2022
17 May, 2022 1 commit
- add yolo_box_fuse_pass, yolo_box_head_op, yolo_box_post_op (#42641) · 6b58de95
  Committed by zhupengyang on May 17, 2022
16 May, 2022 2 commits
- delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
  Committed by niuliling123 on May 16, 2022
- fused_multi_transformer add fused softmax mask (#42636) · f9d5ae4e
  Committed by WangXi on May 16, 2022
12 May, 2022 1 commit
- Fix some typos in paddle/. (#42408) · 2012672c
  Committed by Shuangchi He on May 12, 2022
06 May, 2022 1 commit
- Fix the implementation of fused_fast_ln_fwd_kernel in test mode (#42527) · 5acd764d
  Committed by Zhang Zheng on May 06, 2022
02 May, 2022 1 commit
- Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2 (#42405) · fb3d5f07
  Committed by Zhang Zheng on May 02, 2022
  ```
  * Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2

  * no throw in V100 for some cases
  ```
28 Apr, 2022 3 commits
- Support more scenes for fused_fast_ln (#42282) · 7cb49539
  Committed by Zhang Zheng on Apr 28, 2022
  ```
  * Support more scenes for fused_fast_ln

  * fix
  ```
- fix FusedResidualDropoutBias nan in v100 (#42344) · 687219fe
  Committed by WangXi on Apr 28, 2022
- fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315) · f4507974
  Committed by WangXi on Apr 28, 2022
26 Apr, 2022 1 commit
- Add fused_multi_transformer op to optimize transformer generation performance (#41814) · 9dadf7df
  Committed by WangXi on Apr 26, 2022
22 Apr, 2022 1 commit

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.


19 Apr, 2022 1 commit
- fix inf in fused_attention (#41933) · 6bd39b5e
  Committed by WangXi on Apr 19, 2022
16 Apr, 2022 1 commit
- move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
  Committed by 王明冬 on Apr 16, 2022
09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kinds of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>


28 Mar, 2022 1 commit
- add fused_seqpool_cvm op (#37928) · ea5b2f26
  Committed by danleifeng on Mar 28, 2022
  ```
  * add fused_seqpool_cvm op;test=develop
  ```
19 Mar, 2022 1 commit

Add infer meta (#40544) · 8e4e19ab

Committed by hong on Mar 19, 2022

* add infer meta; test=develop

* add histogram infer meta; test=develop

* fix unittest bug; test=develop

* format; test=develop

* format; test=develop

* bn not use new infer meta; test=develop

* add infer meta; test=develop

* fixbug; test=develop

* fix bug;

* recover unittest; test=develop


Crayon鑫 / Paddle is in sync with the source project it was forked from
