提交 · ee0034575ca022d212acc9aa1d2b6f64153e8d39 · PaddlePaddle / Paddle

11 8月, 2023 2 次提交
- W
  
  [XPU]Add flip kernel (#55932) · ee003457
  由 wz1qqx 提交于 8月 10, 2023
  
  ee003457
- H
  
  [XPU] Add fast_gather_nd plugin (#56103) · 460e4fc6
  由 hong19860320 提交于 8月 11, 2023
  
  460e4fc6
10 8月, 2023 5 次提交

L

Implement reshard from s to r with same process_mesh (#56039) · 4569ae13
由 LiYuRio 提交于 8月 10, 2023

4569ae13
J

[XPU] Add gather_nd fp16 and add check_dtype_op_blacklist (#55860) · 307128d1
由 jiangfan06 提交于 8月 10, 2023

307128d1

Add variable_length_memory_efficient_attention (#55400) · 4036c937

由 lzy 提交于 8月 10, 2023

* add variable_length_memory_efficient_attention
* update variable_length_memory_efficient_attention unittest
* update variable_length_mem_eff_attn's docs and unittest
* update variable_length_mem_eff_attn's docs
* Update test_variable_length_memory_efficient_attention.py
* Update variable_length_memory_efficient_attention.cu
* fix codestyle
* fix variable_length_fmha's docs and unittest
* fix variable_length_fmha's docs

4036c937

add tanh_triple_grad composite logic (#56072) · 7c4a3556

由 lxd-cumt 提交于 8月 10, 2023

* decompose tanh_triple_grad and add it into prim_white_list test=develop

* fix TanhTripleGradKernel bugs test=develop

* decompose tanh_triple_grad test=develop

7c4a3556

[XPU kernel] fix warpctc issue (#55950) · 689bcad5

由 RuohengMa 提交于 8月 10, 2023

* [XPU kernel] fix warpctc issue

* fix issue

* temporal hack to circumvent depthwise_conv2d precision issue

* reset test case

689bcad5

09 8月, 2023 6 次提交
- X
  [oneDNN]rename macro to PADDLE_WITH_DNNL (#52208) · 6ff4c130
  由 Xinyu Chen 提交于 8月 09, 2023
```
* onednn: rename macro to PADDLE_WITH_DNNL

* onednn: rename macro to CINN_WITH_DNNL
```
  6ff4c130
- C
  
  Add FP16 & BF16 for nanmedian (#56056) · 4ae9945b
  由 cyberslack_lee 提交于 8月 09, 2023
  
  4ae9945b
- N
  
  change index's dtype for int to int64 (#55949) · 8d181e37
  由 niuliling123 提交于 8月 09, 2023
  
  8d181e37
- H
  
  [XPU] add fused_softmax_mask and fused_softmax_mask_grad. (#55914) · b982af4a
  由 houj04 提交于 8月 09, 2023
  
  b982af4a
- H
  [XPU] add pos_weight for sigmoid_cross_entropy_with_logits. (#55001) · 4315bc4c
  由 houj04 提交于 8月 09, 2023
```
* [XPU] add pos_weight for sigmoid_cross_entropy_with_logits.

* update xdnn version.
```
  4315bc4c
- R
  
  fix atan2 grad (#56067) · 1bf2ab48
  由 ronnywang 提交于 8月 09, 2023
  
  1bf2ab48
08 8月, 2023 6 次提交
- W
  move `decayed_adagrad_op` to phi (#55995) · 0d920178
  由 Wang Xin 提交于 8月 08, 2023
```
* move decayed_adagrad_op to phi

* fix bug
```
  0d920178
- H
  
  move dgc kernel to phi (#56003) · 3c03ade8
  由 huangjiyi 提交于 8月 08, 2023
  
  3c03ade8
- L
  
  [XPU] register multiclass_nms3 and norm xpu kernel to optimize model (#56064) · ba992136
  由 leolishaohao 提交于 8月 08, 2023
  
  ba992136
- F
  
  optimize op structure (#55988) · 6bd7f860
  由 freeliuzc 提交于 8月 08, 2023
  
  6bd7f860
- N
  
  Modefied reduce op for store temp_data with MpType (#55709) · 03ca04fe
  由 niuliling123 提交于 8月 08, 2023
  
  03ca04fe
- H
  
  add data op data type (#56033) · 7472057c
  由 hong 提交于 8月 08, 2023
  
  7472057c
07 8月, 2023 4 次提交

Add attn_mask supported for FlashAttnKernel. (#55969) · 42e0c6b8

由 yin wei 提交于 8月 07, 2023

* add mask

* add backword

* add enforce info

* update scale

* integrate code

* update enforce

* add enforce eq

* add error type

* update enforce

* add test_flash_attention

* Polish codes and fix compiling errors.

* Set num_splits to 0 for flash-attn with tensor mask.

* Fix the compiling error for non flash-attn case.

---------
Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>

42e0c6b8

G

[clang-tidy] NO.6 enable `modernize-avoid-c-arrays` step: 2 (#55954) · 5ada98b8
由 gouzil 提交于 8月 07, 2023

5ada98b8
R

[clang-tidy] enable modernize-use-equals-default (#55983) · 30a02d27
由 Ruibin Cheung 提交于 8月 07, 2023

30a02d27

[WIP] Integration flash attention 2 (#55758) · 0473369f

由 umiswing 提交于 8月 07, 2023

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

0473369f

04 8月, 2023 4 次提交
- K
  [NewIR] Rename feed with place to data (#55778) · 274e5e54
  由 kangguangli 提交于 8月 04, 2023
```
* fix bug: feed_with_place should consider variable existence

* fix

* fix build scope

* change method to set feed var name

* remove feed_with_place to placeholder

* fix

* rename to data

* fix

* fix
```
  274e5e54
- H
  [NewIR]New ir aot placement refactor (#55810) · dd1379e4
  由 hong 提交于 8月 04, 2023
```
* refacot aot

* update

* fix bugs

* remove some test

* fix bug

* fix bug

* fix bug

* fix bug

* update
```
  dd1379e4
- Z
  
  [clang-tidy] NO.12 enable modernize-use-nullptr check(#55800) · 1e4f627d
  由 Zhenghai Zhang 提交于 8月 04, 2023
  
  1e4f627d
- J
  
  [XPU] Add int support for elementwise_sub/elementwise_div (#55920) · 97ab6aa6
  由 jiangfan06 提交于 8月 04, 2023
  
  97ab6aa6
03 8月, 2023 7 次提交
- Y
  
  Optim fused linear grad add (#55927) · 91873469
  由 Yuang Liu 提交于 8月 03, 2023
  
  91873469
- Y
  
  FLUID: move limit_by_capacity to PHI (#55948) · 230c6ce1
  由 yangguohao 提交于 8月 03, 2023
  
  230c6ce1
- W
  
  [clang-tidy] [No.4] enable `modernize-loop-convert` (#55704) · 81ccd99e
  由 Wang Xin 提交于 8月 03, 2023
  
  81ccd99e
- W
  
  eliminate small pattern (#55843) · dc4b48f6
  由 wz1qqx 提交于 8月 03, 2023
  
  dc4b48f6
- H
  
  [XPU] Fix compilation errors of XPU plugin on multiple versions of GCC (#55924) · 613beeb6
  由 hong19860320 提交于 8月 03, 2023
  
  613beeb6
- W
  fix security bug (#55870) · 08f28b40
  由 wanghuancoder 提交于 8月 03, 2023
```
* fix security bug
```
  08f28b40
- W
  fix security bug (#55865) · dcf30692
  由 wanghuancoder 提交于 8月 03, 2023
```
* fix security bug
```
  dcf30692
02 8月, 2023 6 次提交

[clang-tidy] NO.6 enable `modernize-avoid-c-arrays` check (#55774) · c000091e

由 gouzil 提交于 8月 02, 2023

* [clang-tidy] modernize-avoid-c-arrays

* rollback

* [clang-tidy] fix

* close modernize-avoid-c-arrays

* fix PHI_DEFINE_string; add PHI_DEFINE_bool NOLINT

* fix PHI_DEFINE_string

* fix next_h_state and parity err

* fix win32

* fix cuda_graph

* fix accuracy_kernel

* fix math_function

* fix fused_softmax_mask_kernel.cu load_data and warp_reduce; rollback concat_and_split_functor ins_addr

* fix fused_dropout_add_grad_kernel

* fix

* rollback cu

* rollback concat_and_split_functor.cu

* rollback

c000091e

W

[XPU]Add conv1d fuse pass (#55719) · 22c7a6eb
由 wz1qqx 提交于 8月 02, 2023

22c7a6eb

[Inference] Replace groupNorm when data types are bf16 and fp16, and data... · e61d892a

由 yangjianfengo1 提交于 8月 02, 2023

[Inference] Replace groupNorm when data types are bf16 and fp16, and data format is NHWC implementation. (#55399)

* finish

* cpergroup odd

* fix bf16

* single channel

* code style

* jingdu duiqi

* add head_file

* add bf16 head file

* bf16 2

* bf16

* bf16 head

* bf16 compile

* py test

* bf16 compile

* bf16 compile

* unset py test

* nhwc

* test

* mean var

* bf16 success

* su

* ctest success

* use is_same_as

* is_same

* use is_same

* rtol

* gpu_stream

* del sigmod

* fix bfloat16 type

* use cuda_bf16_hpp

* use_cuda_arch

* bfloat162float2

* del inplace_tol

* del max_releative_tol

* temp store

* jingdu duiqi

* temp store

* plugin

* jingdu duiqi

* duiqi

* include cuda.h

* del half

* half single

* ci

* add const

* ci

* cudamemset

* del printf

* fp16 test

* add half compute

* del br16 ci

* del ci

* ci approve

* del fluid include

e61d892a

C

Add FP16 & BF16 for erfinv (#55287) · 6d7efd09
由 cyberslack_lee 提交于 8月 02, 2023

6d7efd09
W
fix security bug (#55782) · 19da5c0c
由 wanghuancoder 提交于 8月 02, 2023
```
* fix security bug
```
19da5c0c
J

[XPU] Add gather_squeeze_pass (#55605) · d13a49d6
由 jiangfan06 提交于 8月 02, 2023

d13a49d6

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功