Commits · f71c805e7fe67283039bf7c15565ad3b9bd48b92 · PaddlePaddle / Paddle

22 May, 2023 · 2 commits

Add multiclass_nms3 GPU kernel (#52401) · f71c805e

Committed by Tian Zheng on May 22, 2023

* Add GPU kernel for multiclass_nms3 op

* Make multiclass_nms3 gpu kernel output consistent with cpu kernel

* Fix API incompatibility

* Fix unittests on builds without CUDA

* Fix ROCM build

* Remove fluid headers; Use default atol for unittest

* Change function and variable naming

* Add comments; Reduce redundant code

* Use paddle test framework

f71c805e

Print python trace back when debugmode = CHECK_NAN_INF_AND_ABORT and backward... · d2fa26f6
Committed by niuliling123 on May 22, 2023
```
Print python trace back when debugmode = CHECK_NAN_INF_AND_ABORT  and backward has nan/inf  (#52808)
```
d2fa26f6

19 May, 2023 · 5 commits

add minimum grad composite rules (#52561) · 97690816

Committed by warrentdrew on May 19, 2023

* add minimum grad composite rules

* add public python api

* fix format

* fix format

* update testcase

* fix testcase

* fix format

* fix cmakelist.txt

* fix format

* fix param problem

* fix op and composite rule

* fix bf16 cpu support problem

* fix bf16 cpu issue

* fix axis error log

* add axis for maximum

* revert commit

* remove .orig

* fix generic problem

* revert max op

* fix axis error

* fix maximum axis

* fix test_check_output

* fix cinn

* fix minimum maximum axis check

97690816

Add flash attention to speedup fused_gate_attention. (#52731) · d29c1f8e

Committed by limingshu on May 19, 2023

* Reorganize the forward codes of flash-attention.

* Fix forward.

* Remove some unused code.

* Simplify codes and fix backward.

* Change all LOG(INFO) to VLOG and fix the backward.

* add scale for AF2 flash_attn, many thanks to xreki and shaojie for debugging this code

* decrease the effect of debug print on performance

* Unify the initialize of flashattn arguments.

* Rewrite the reshape of temp_mask and temp_bias.

* API support use_flash_attn.

* Fix compiling error on CI.

* Try to crop the flash-attention lib.

* Correct the condition of whether can use flash-attn.

* Remove the softmax_out argument.

* Remove is_causal.

* Polish codes.

* Fix qkv_transpose_out's shape and scaling of Q * K.

* Update commit of flash-attention.

---------
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

d29c1f8e

Add large dim test of log_softmax (#53954) · 1b6972fd
Committed by Zhang Zheng on May 19, 2023
```
* Add large dim test of log_softmax

* fix
```
1b6972fd
fix meshgrid and expand_as test (#53951) · 14e0ce71
Committed by Charles-hit on May 19, 2023

14e0ce71
delete bf16 of cross entropy (#53922) · 69d3f4e3
Committed by Danyang Zhang on May 19, 2023
```
* delete bf16 of cross entropy

* delete bf16 of cross entropy
```
69d3f4e3

18 May, 2023 · 6 commits
- [AMP OP&Test] support prod, meshgrid, expand_as bf16 dtype (#53865) · 706503d0
  Committed by Charles-hit on May 18, 2023
```
* add meshgrid,expand_as, prod and grad bf16 kernel

* fix bf16 for optest

* modify code style

* fix amp test
```
  706503d0
- [CINN] Fix TestGelu unittest of CINN (#53859) · 2b7cbd1b
  Committed by HongyuJia on May 18, 2023
```
* [CINN] Fix TestGelu unittest of CINN

* pass if_enable_cinn
```
  2b7cbd1b
- Add einsum tests (#53722) · c3c85794
  Committed by co63oc on May 18, 2023
  
  c3c85794
- Add segment_pool tests (#53785) · 0bed2203
  Committed by co63oc on May 18, 2023
  
  0bed2203
- add fp16 and bf16 for trunc (#53876) · d8407c51
  Committed by LoneRanger on May 18, 2023
  
  d8407c51
- Fix typos, test=document_fix (#53916) · 92121d17
  Committed by co63oc on May 18, 2023
  
  92121d17
17 May, 2023 · 4 commits
- [CINN] extend cinn single test timeout from 150 to 200, test=document_fix (#53908) · 4f1bf199
  Committed by jiangcheng on May 17, 2023
  
  4f1bf199
- [CustomDevice] support device_guard for custom device (#53808) · 9e045eeb
  Committed by duanyanhui on May 17, 2023
```
* support device_guard for npu

* fix comment

* fix typo
```
  9e045eeb
- remove fluid memory_usage_calc&model_stat&op_frequence (#53838) · 2cb28014
  Committed by JYChen on May 17, 2023
  
  2cb28014
- 【Hackathon 4 No.21】Add i1 / i1e to paddle (#53210) · a63fb4c8
  Committed by LyndonKong on May 17, 2023
```
* Add i1 and i1e op

* resolve merge conflicts
```
  a63fb4c8
16 May, 2023 · 14 commits

Add huber_loss tests (#53535) · 74b91bce
Committed by co63oc on May 16, 2023

74b91bce
[Zero-Dim] update 0d tensor api en doc, test=document_fix (#53823) · 50f0acc0
Committed by zhouweiwei2014 on May 16, 2023

50f0acc0

【Hackathon No57】add bf16 for mode (#53195) · 640cff0a

Committed by Difer on May 16, 2023

* add bf16 for mode

* remove random seed 666

* try to fix op_type error

* test for me

* try to fix op_type

* fix redundancy code

* add fp,bf for lastdim

* fix some error

* simplify code

* fix shape error

* optype error

* fix skipif bf16

640cff0a

add timer to pp (#53831) · 0ab7f949
Committed by Yuang Liu on May 16, 2023

0ab7f949
Fix some tests for issue 52842 (#53795) · c33ba9d4
Committed by zhenhailiu on May 16, 2023
```
* polish

* polish
```
c33ba9d4

【static】modify backward prune logic for EmptygradOpMaker (#53746) · 69161a96

Committed by xiaoguoguo626807 on May 16, 2023

* add rules

* modify no kernel yaml parse

* success op generate

* success test_silu_double

* modify bug

* modify static error

* modify silu_grad input

* modify kernel signature

* modify kernel signature

* code style

* code style

* review

* delete opinfo modify

* modify gradOpMaker

* modify gradOpMaker

* modify genarated-j2

* add approve rules

* modify aytograd_functional_static_test

69161a96

fix _strip_grad_suffix_ bugs when input pattern is 'x@GRAD@RENAME' · 0689e2a5
Committed by cxxly on May 12, 2023

0689e2a5

Move fused batchnorm to Phi (#53476) · 5e5481d8

Committed by Sonder on May 16, 2023

* trans fused batch norm Compute function

* trans batch norm register info to phi

* trans fused batch norm grad Compute

* trans batch norm grad register info

* add sig file

* update sig file

* Update fused_bn_activation_kernel.cu

* Update fused_bn_activation_grad_kernel.cu

* fix

* Rename fused_bn_activation_kernel_grad.cu to fused_bn_activation_kernel.cu

* fix

* fix

* fix CudnnDataType error

* fix

* fix include

* update

* add #if

* add fused bn act to cmakelist.txt

* update  cmakelist

* fix #ifdef error

* add timeout set

* add env set

* fix

* fix

* Update fused_bn_activation_sig.cc

5e5481d8

【Hackathon4 No.61】Improve FP16/BF16 unit tests for the remainder operator (#52920) · 481511a6
Committed by cyberslack_lee on May 16, 2023

481511a6
[dygraph]remove legacy code : _in_eager_mode_ and _in_eager_without_dygraph_check() (#53761) · b1333175
Committed by meteor135 on May 16, 2023
```
* remove _in_eager_mode_

* remove _in_eager_mode_
```
b1333175
Add fill_constant_batch_size_like tests (#53736) · 98100fd2
Committed by co63oc on May 16, 2023

98100fd2

[phi] move stft to phi - Step 1 (#53517) · 00c21abc

Committed by gouzil on May 16, 2023

* [phi]mv StftKernel to phi

* [phi] fix KernelSignature

* [phi]fix arr error

* [phi] Disable check_dygraph

* [phi]fix include

* [phi] rewrite mutable_data, add output register

* [phi] fix  Alloc

* [phi] fix Alloc again

* [phi] fix mutable_data

* [phi] fix onesided_out Resize

00c21abc

fix simple typos (#53783) · 847c48a8
Committed by Mahmoud Ashraf on May 16, 2023
```
* correct 1th to 1st

* correct 1th to 1st

* fix typo

* fix typos
```
847c48a8
fix pinv api for divide zero (#53815) · 434343c6
Committed by andyj on May 16, 2023

434343c6

15 May, 2023 · 4 commits

fix bug of test_pad_op for cinn (#53772) · a9c3e32d
Committed by zyfncg on May 15, 2023

a9c3e32d

add check ops for prim (#52302) · 3d6bd6a4

Committed by Charles-hit on May 15, 2023

* add check ops for prim

* fix pow and concat composite registration

* modify log

* add note and remove useless code

* remove useless code

* modify program to check

* remove useless note

3d6bd6a4

relocate python/paddle/fluid/regularizer.py (#53106) · 00e415de

Committed by LoneRanger on May 15, 2023

* relocate regularizer.py

* fix bug

* fix bug

* fix bug

* relocate the import

* replace _regularization_coeff with coeff

* remove the L1DecayRegularizer and L2DecayRegularizer

00e415de

Transpose layout (#53351) · 3dce9f0a

Committed by niuliling123 on May 15, 2023

* update

* Update backward.h

* Update composite_backward_api.h

* Update tensor_utils.cc

* Update backward.cc

* update

* style

* update

* add ctest

* code style

3dce9f0a

12 May, 2023 · 2 commits

【Hackathon 4 No.20】Add i0 / i0e to paddle (#52058) · ce256f75

Committed by PommesPeter on May 12, 2023

* added base code for i0 and i0e

* added grad base code for i0 and i0e

* added i0 and i0e python code

* added ops and backward yaml config

* added i0 and i0e cpu kernel, but not tested.

* added i0 and i0e code and unittest files

* added test files

* added i0/i0e gpu implementation code

* updated code style

* updated code style

* fixed unittest code

* updated i0 with eigen3

* fixed bug and added more test cases

* refactor: fixed static graph bug

* refactor: removed i0 and i0e from op_compat

* refactor: updated code style

* refactor: updated op_compat.yaml

* refactor: updated op_compat.yaml

* refactor: fixed op name mapping and optimize unittest case

* refactor: manually implement i0 / i0e

* refactor: added grad kernel for i0 / i0e (not finished)

* Update math.py

* refactor: added equation to doc in English and added comments for computing i0 / i0e gradient

* refactor: removed eigen implementation

* refactor: finished i0 / i0e cpu and gpu op

* refactor: updated code style

* fix: found a bug but did not fix it

* fix: incorrect unittest cases

* update: updated code style and remove my file

* update: updated unittest case

* fix: fixed sign error

* fix: fixed mistakes when merging

* refactor: updated code style

* refactor: remove unused code

* refactor: updated code style

ce256f75
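
For readers unfamiliar with the functions named in the commit above: i0 is the modified Bessel function of the first kind of order 0, and i0e is its exponentially scaled form, exp(-|x|) * i0(x). The sketch below is a plain-Python illustration of those definitions using the standard power series. It is not the kernel added in the commit; the function names mirror the op names only for readability, and the `terms` parameter is an assumption made for this example.

```python
import math


def i0(x: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind, order 0.

    Uses the power series I0(x) = sum_{k>=0} (x^2 / 4)^k / (k!)^2.
    The fixed term count is an illustrative assumption; it is adequate
    for moderate |x| but is not a production-quality algorithm.
    """
    q = (x * x) / 4.0
    total = 1.0  # k = 0 term
    term = 1.0
    for k in range(1, terms):
        term *= q / (k * k)  # turns q^(k-1)/((k-1)!)^2 into q^k/(k!)^2
        total += term
    return total


def i0e(x: float) -> float:
    """Exponentially scaled variant: exp(-|x|) * I0(x)."""
    return math.exp(-abs(x)) * i0(x)


if __name__ == "__main__":
    for v in (0.0, 0.5, 1.0, 2.0):
        print(v, i0(v), i0e(v))
```

The scaled form is useful because I0(x) grows roughly like exp(|x|), so i0e stays in a numerically comfortable range for large inputs.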

【Prim】support higher order autodiff for dy2static+composite (#53171) · b73594b4

Committed by Xiaoxu Chen on May 12, 2023

* [Dy2St]Fix x grad names when high order gradient

* Polish error msg

* Add inputs var to backward in dy2st

* Fix error

* Get grad names for backward API

* Fix save load

* Polish code

* Add ut

* [prim] fix not support optional grad bugs in higher order autodiff

* [prim] remove duplicate fill_any_like caused by infershape_for_composite

* fix _strip_grad_suffix_ bugs in higher-order autodiff

* [prim] create output for test_static_prim.cc

---------
Co-authored-by: 0x45f <wangzhen45@baidu.com>

b73594b4

11 May, 2023 · 3 commits
- revise 'Examples' of LBFGS to create right docs(cn), test=docs_preview (#53375) · dc003fa3
  Committed by lijialin03 on May 11, 2023
  
  dc003fa3
- move DataLoader code to paddle.io (#48699) · 793f3b93
  Committed by Kaipeng Deng on May 11, 2023
```
* move DataLoader to paddle.io. test=develop
```
  793f3b93
- Retire Ascend- and Cambricon-related code; retire NPU-related code, part 2 (#53568) · 0d45ac73
  Committed by 张春乔 on May 11, 2023
  
  0d45ac73

PaddlePaddle / Paddle · synced successfully more than 1 year ago