- 25 May 2023, 5 commits
- Committed by zhangkaihuo
- Committed by thunder95
- Committed by zhouweiwei2014
- Committed by Leo Chen
  * add log for memory stats
  * fix string_split in einsum
- Committed by 张春乔
- 24 May 2023, 8 commits
- Committed by Yiqun Liu
  * Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas.
  * Change the repeat of cublaslt to 10.
  * Use FLAGS_cublaslt_exhaustive_search_times as repeats.
  * Fix compiling error on CI.
  * Polish the key and simplify codes.
- Committed by zhangyuqin1998
- Committed by zhangyuqin1998
  * move raw kernels to legacy
  * Update elementwise_add_kernel.cu
  * fix
- Committed by wz1qqx
- Committed by Leo Guo
  Fixed a bug in api.cc where, when a phi kernel's output is set to std::vector<DenseTensor*>, the type declared in the function-pointer kernel_signature (std::vector<DenseTensor*>&) was inconsistent with the type of the phi kernel parameter (std::vector<DenseTensor*>). test=kunlun (#54053)
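  For readers unfamiliar with this kind of mismatch, below is a minimal, hypothetical C++ sketch (not actual Paddle code; DenseTensor and the names used are stand-ins) showing why a function-pointer type whose parameter is std::vector<DenseTensor*>& cannot point at a kernel whose parameter is std::vector<DenseTensor*> by value:

  ```cpp
  #include <vector>

  struct DenseTensor {};  // stand-in for phi::DenseTensor

  // Signature assumed by the dispatcher: outputs passed by reference.
  using KernelFn = void (*)(std::vector<DenseTensor*>&);

  // Actual kernel: outputs passed by value.
  void ExampleKernel(std::vector<DenseTensor*> outs) { (void)outs; }

  int main() {
    // KernelFn fn = &ExampleKernel;  // would not compile: the parameter types
    //                                // (T& vs T) make the function types differ
    return 0;
  }
  ```

  Aligning the two declarations (both by reference or both by value) resolves the inconsistency; the commit above fixes this on the api.cc side.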
- Committed by xiaoguoguo626807
- Committed by Winters Montagne
  Removed unnecessarily introduced header files
- Committed by lijin23
  [XPU][PHI Kernels] bind bitwise_add kernel & add int32/int64 support to scatter_nd_add kernel for xpu (#54066)
  * bind new kernels to xpu
  * refine code
  * fix bugs in unittest
- 23 May 2023, 14 commits
- Committed by Zhang Zheng
  * [AMP OP&Test] Support float16 in selu
  * fix
- Committed by LiYuRio
- Committed by RuohengMa
- Committed by zhenhailiu
  * merge code from forsish * polish * paddle/fluid/pybind/auto_parallel_py.cc * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish
- Committed by gouzil
  * [phi] autogen code tril_triu
  * [phi][api] fix tril_triu_grad args
  * [fluid] clean cmake; [phi] fix infer_meta
- Committed by co63oc
- Committed by weishengying
- Committed by co63oc
  * Fix typos
  * Fix
- Committed by cyberslack_lee
- Committed by huangjiyi
  * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update HostAlloc * update param name * update cpu kernel * remove kernel header * update * update
- Committed by huangjiyi
  * update * update * update * set out dtype
- Committed by Wang Xin
  * static graph autogen code support for pad3d op
  * bug fixed
  * add ut for pad3d mkldnn op
  * fix coverage
  * fix bug
  * fix bug
  * Delete test_pad3d_mkldnn_op.py
- Committed by zhangyikun02
- Committed by LoneRanger
  * fix the static op generation for group_norm
  * fix bug of mismatch
  * fix bug of AssertionError
  * fix setting of composite
- 22 May 2023, 9 commits
- Committed by risemeup1
  * update_c++14_to_c++17_on_windows
  * disable test_audio_logmel_feature and test_audio_mel_feature
- Committed by risemeup1
- Committed by lijin23
  * fix empty bugs for xpu
  * fix empty bugs for xpu
- Committed by zhupengyang
- Committed by zhupengyang
- Committed by zhoutianzi666
  * fix transfer_layout when input size is too big
  * do not add TransferLayoutKernelGPU
  * add int64 and add check
- Committed by zhangyikun02
- Committed by Tian Zheng
  * Add GPU kernel for multiclass_nms3 op
  * Make multiclass_nms3 gpu kernel output consistent with cpu kernel
  * Fix API incompatibility
  * Fix unittests on builds without CUDA
  * Fix ROCM build
  * Remove fluid headers; Use default atol for unittest
  * Change function and variable naming
  * Add comments; Reduce redundant code
  * Use paddle test framework
- Committed by wangshengxiang
  * bind xpu op: 3D grid sample
  * fix edge cases in xpu op: reshape & slice
- 19 May 2023, 4 commits
- Committed by wz1qqx
- Committed by warrentdrew
  * add minimum grad composite rules
  * add public python api
  * fix format
  * fix format
  * update testcase
  * fix testcase
  * fix format
  * fix cmakelist.txt
  * fix format
  * fix param problem
  * fix op and composite rule
  * fix bf16 cpu support problem
  * fix bf16 cpu issue
  * fix axis error log
  * add axis for maximum
  * revert commit
  * remove .orig
  * fix generic problem
  * revert max op
  * fix axis error
  * fix maximum axis
  * fix test_check_output
  * fix cinn
  * fix minimum maximum axis check
- Committed by limingshu
  * Reorganize the forward codes of flash-attention.
  * Fix forward.
  * Remove some unused codes.
  * Simplify codes and fix backward.
  * Change all LOG(INFO) to VLOG and fix the backward.
  * add scale for AF2 flash_attn, much thanks to xreki and shaojie for debugging these codes
  * decrease the effect of debug print on performance
  * Unify the initialization of flashattn arguments.
  * Rewrite the reshape of temp_mask and temp_bias.
  * API support use_flash_attn.
  * Fix compiling error on CI.
  * Try to crop the flash-attention lib.
  * Correct the condition of whether flash-attn can be used.
  * Remove the softmax_out argument.
  * Remove is_causal.
  * Polish codes.
  * Fix qkv_transpose_out's shape and scaling of Q * K.
  * Update commit of flash-attention.
  Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
- Committed by limingshu