Commits · 4036c9373c264332b7f23cdc72f6cfbb9492f8ee · PaddlePaddle / Paddle

10 Aug, 2023 1 commit

Add variable_length_memory_efficient_attention (#55400) · 4036c937

Committed by lzy on Aug 10, 2023

* add variable_length_memory_efficient_attention
* update variable_length_memory_efficient_attention unittest
* update variable_length_mem_eff_attn's docs and unittest
* update variable_length_mem_eff_attn's docs
* Update test_variable_length_memory_efficient_attention.py
* Update variable_length_memory_efficient_attention.cu
* fix codestyle
* fix variable_length_fmha's docs and unittest
* fix variable_length_fmha's docs

4036c937

31 Jul, 2023 1 commit
- Support stride2 (#55156) · 859fc01b
  Committed by wanghuancoder on Jul 31, 2023
```
support stride
```
  859fc01b
30 Jul, 2023 1 commit
- Add some ops to static build list (#55727) · 984a4cc1
  Committed by Sonder on Jul 30, 2023
```
* update

* update
```
  984a4cc1
13 Jul, 2023 2 commits
- Add matmul_int8 op (#55228) · 27cc0df5
  Committed by RichardWooSJTU on Jul 13, 2023
```
* add matmul int8
```
  27cc0df5
- [NewIR]fix new ir edit distance bug (#55294) · 2194e4c1
  Committed by hong on Jul 13, 2023
```
* fix edit distance bug

* add op define kernel data type

* fix bug

* update

* add header

* add op test to cmake
```
  2194e4c1
11 Jul, 2023 1 commit

Integrate rmsnorm kernel (#54998) · 97d3d6ee

Committed by MarDino on Jul 11, 2023

* add rmsnorm kernel
* add static graph test
* fix round type
* use alignas to avoid msvc compile error
* remove redundant headerfile to avoid rocm compile error
* fix rocm compile not found cub
* Add document

97d3d6ee

10 Jul, 2023 1 commit
- update white_list and remove warning (#55243) · a8cd12d2
  Committed by kangguangli on Jul 10, 2023
  
  a8cd12d2
06 Jul, 2023 1 commit
- [NewIR] add more unit tests in white_list (#55160) · db6f3ee6
  Committed by kangguangli on Jul 06, 2023
```
* add ir output check in OpTest

* add ir grad check in op test

* add more unittest

* fix
```
  db6f3ee6
04 Jul, 2023 2 commits

[NewIR] enable new ir for unittests in white list (#55117) · 89d3a46d

Committed by kangguangli on Jul 04, 2023

* add ir output check in OpTest

* add ir grad check in op test

* add white list for ir op test

* fix

* open only in py3 and mac

(cherry picked from commit 6daa44da495afb0287c6b69ecefbe35bbc47cb50)

89d3a46d

[NewIR]Fix null value and support some attribute (#55100) · a2903920

Committed by hong on Jul 04, 2023

* support optional input in new_ir

* polish code

* add coverage test

* update

* update

* add unittest

* remove duplicate code

* set test timeout

a2903920

03 Jul, 2023 1 commit
- add linear_compress API (#54140) · c4d5ec66
  Committed by FormlessUnit on Jul 03, 2023
```
* add linear_compress API
```
  c4d5ec66
29 Jun, 2023 1 commit
- test_sync_batch_norm_op fix (#54916) · 1dcc3551
  Committed by zqw_1997 on Jun 29, 2023
  
  1dcc3551
28 Jun, 2023 1 commit
- fix test_conv3d_transpose_op A100 test fail (#54913) · 86858a5a
  Committed by LokeZhou on Jun 28, 2023
  
  86858a5a
27 Jun, 2023 1 commit
- 【prim】modify eular_beam (#54736) · 7c2c965d
  Committed by xiaoguoguo626807 on Jun 27, 2023
```
* modify eular_beam

* modify matmul infermeta

* add test

* modify timeout
```
  7c2c965d
25 Jun, 2023 1 commit

[Prim] Fix batch_norm bias_grad loss in cinn (#54751) · 3e0f0a00

Committed by cyber-pioneer on Jun 25, 2023

* fix batch_norm grad kernel nhwc error

* fix batch_norm bias_grad loss in cinn

* disable cinn

* fix cinn_atol

3e0f0a00

20 Jun, 2023 1 commit
- [CINN] skip timeout has setted test to avoid reset timeout (#54750) · 24523c16
  Committed by jiangcheng on Jun 20, 2023
  
  24523c16
15 Jun, 2023 1 commit
- fix batch_norm optest code (#54661) · 3a8484c4
  Committed by cyber-pioneer on Jun 15, 2023
  
  3a8484c4
14 Jun, 2023 3 commits

Fix cuda12 timeout problems. (#54615) · a90d9088

Committed by Ghost Screaming on Jun 14, 2023

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Remove climits.

* Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in
cuda12.

* Fix problem of TimeOut of distributed testcases under cuda12.

* Remove useless modification.

* Remove useless modification.

a90d9088
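
The first bullet of the commit above says reduce_sum returned wrong results once `input.numel()` exceeded INT32_MAX. The commit message does not state the root cause. One common way such bugs arise is an element count or index held in a signed 32-bit integer, and the sketch below illustrates only that general failure mode in plain Python. The helper `wrap_int32` and the example size are hypothetical and not taken from Paddle.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(n: int) -> int:
    """Emulate storing n in a two's-complement signed 32-bit integer."""
    return (n + 2**31) % 2**32 - 2**31


# Hypothetical element count larger than INT32_MAX.
numel = 3_000_000_000

# A 32-bit counter cannot hold this value; it wraps to a negative number,
# so any loop bound or offset derived from it would be wrong.
wrapped = wrap_int32(numel)
print(numel > INT32_MAX, wrapped)
```

Holding the count in a 64-bit type avoids the wrap-around for any tensor size that fits in memory.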

[prim] move batch_norm prim test to op_test (#54458) · 58b4c60f

Committed by cyber-pioneer on Jun 14, 2023

* move batch_norm prim test to op_test

* fix optest bug

* add test to cmake

* add cinn test case

* fix batch_norm prim grad bf16

* fix code

* add cuda check

* fix batch_norm bfloat16

* fix cpu bfloat16 bug

* skip non-bfloat16-supported platform

* fix code

* fix cinn rtol and atol in bfloat16

* fix name

* fix config

58b4c60f

support group_norm and cumsum prim ops bf16 dtype (#54580) · f7eb03c6
Committed by Charles-hit on Jun 14, 2023

f7eb03c6

13 Jun, 2023 3 commits

[CINN] Enable CINN unittest on atan2, tile, top_k, where (#54280) · cf7cd247

Committed by Fisher on Jun 13, 2023

* Enable check_cinn on atan2, tile, top_k and where

* Update cmakelists in legacy_test

* Reformat code

* Enable check_cinn on op take_along_axis legacy test

* Enable check_cinn on pool2d

* Remove check_cinn=False

* Try fix tile test error

* Rename enable_cinn to test_cinn

* Refactor test_tile_op

* Replace all enable_cinn to check_cinn

* Revert pool2d test timeout

* Remove check_prim and use enable_cinn

cf7cd247

Fix cuda12 timeout (#54540) · 7309f8ab

Committed by TaoTao Li on Jun 13, 2023

* fix a100 cuda12 timeout

* fix cuda12 pickle loads problem

* fix ist_sharding_save ut

7309f8ab

move single card ut to legacy_test dir (#54560) · 53f24669
Committed by Yuang Liu on Jun 13, 2023

53f24669

12 Jun, 2023 1 commit
- Numpy version is too high, resulting in issues with single testing (#54450) · 334c86af
  Committed by tianshuo78520a on Jun 12, 2023
  
  334c86af
08 Jun, 2023 2 commits
- [AMP Prim OP]support some prim ops for bf16 dtype part3 (#54368) · e64a18db
  Committed by Charles-hit on Jun 08, 2023
```
* support some prim ops bf16 dtype

* fix cmake
```
  e64a18db
- [AMP Prim OP]support some prim ops for bf16 dtype part5 (#54422) · bb89c0c8
  Committed by Charles-hit on Jun 08, 2023
```
* support some prim ops for bf16 dtype

* remove useless code
```
  bb89c0c8
07 Jun, 2023 1 commit
- [CINN] reopen mean/cumsum/instance_norm op's prim+CINN test (#54406) · 099b3d25
  Committed by jiangcheng on Jun 07, 2023
```
* [CINN] reopen mean/cumsum/instance_norm op's prim+CINN test

* remove repeat test_mean_op in cmake
```
  099b3d25
05 Jun, 2023 1 commit
- [CINN] Enable check_cinn on some tests (#54261) · b8352611
  Committed by zzk0 on Jun 05, 2023
```
* [CINN] Enable check_cinn

* add CMakeLists.txt
```
  b8352611
02 Jun, 2023 1 commit
- 【Hackathon No.2】Add cdist API to Paddle (#53836) · 5b4d786b
  Committed by Wang Xin on Jun 02, 2023
  
  5b4d786b
01 Jun, 2023 2 commits

[AMP Prim OP]support bf16 dtype for layer_norm prim op (#54236) · e3fcbb8f
Committed by Charles-hit on Jun 01, 2023
```
* support layer_norm prim op bf16 dtype

* polish code

* resolve conflict
```
e3fcbb8f

mv all unittests test (#53235) · b0e86d55

Committed by tianshuo78520a on Jun 01, 2023

* mv all unittests test

* fix error

* fix error

* fix

* fix

* del unittests

* fix paddle_build.sh

* fix

* fix test

* fix add test

* fix

* fix

* fix

* merge develop

* fix

* fix

* fix

* fix

* fix

* merge develop

* fix test_async_read_write

* fix test_async_read_write

* merge develop

* fix

* fix import legacy_test

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix bug

* fix

* fix coverage test bug

* fix

* fix

* fix

* fix

* fix

* fix code style

* fix code

* fix code

* fix

* fix

* fix

* del test_sequence_enumerate_op.py

* fix

b0e86d55

22 May, 2023 1 commit

update_c++14_to_c++17_on_windows (#53958) · 6e043202

Committed by risemeup1 on May 22, 2023

* update_c++14_to_c++17_on_windows

* disable test_audio_logmel_feature and test_audio_mel_feature

6e043202

18 May, 2023 1 commit
- Del test_async_read_write in CPU (#53882) · acb5039a
  Committed by tianshuo78520a on May 18, 2023
```
* fix

* fix
```
  acb5039a
23 Mar, 2023 1 commit
- [Test Mv] reader to legacy_test (#51943) · 227245d3
  Committed by Zheng-Bicheng on Mar 23, 2023
  
  227245d3
22 Mar, 2023 1 commit
- [Test Mv] legacy_test (#51941) · 1617ba76
  Committed by Zheng-Bicheng on Mar 22, 2023
  
  1617ba76
20 Mar, 2023 1 commit
- Mv phi and fluid/test To test dir (#50640) · e808fa30
  Committed by tianshuo78520a on Mar 20, 2023
  
  e808fa30
23 Dec, 2022 1 commit
- Add configure of quantization for dynamic graph (#48000) · 941444aa
  Committed by whs on Dec 23, 2022
  
  941444aa
14 Jun, 2022 1 commit
- fix cmake-lint problems. (#43406) · 59f89236
  Committed by Wilber on Jun 14, 2022
```
* cmake-lint

* update
```
  59f89236
04 Jun, 2022 1 commit
- 【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  Committed by Sing_chan on Jun 04, 2022
  
  92568edb
18 Sep, 2021 1 commit

Committed by Feiyu Chan on Sep 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with compile and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, because the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernel

* fix bugs in python APIs

* add cuda fft c2r grad kernel functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simple parameterize test function by auto generate test case from param list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation if n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsqueeze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5.  fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dynload_cuda when cuda is available (#50)

* fix compile error: only link dynload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: jeff41404 <jeff41404@gmail.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: KP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43
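
One bullet in the commit above reads "fill fft r2c result with conjugate symmetry". For a real input of length n, a one-sided real-to-complex transform keeps only bins 0 through n // 2, and the remaining bins follow from X[k] = conj(X[n - k]). The sketch below shows that relation in plain Python with a naive DFT. It is not the Paddle kernel, and the names `dft`, `rfft_onesided` and `fill_conjugate_symmetry` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rfft_onesided(x):
    """One-sided spectrum of a real signal: bins 0 .. n // 2."""
    return dft(x)[: len(x) // 2 + 1]


def fill_conjugate_symmetry(half, n):
    """Rebuild all n bins from the one-sided half via X[k] = conj(X[n - k])."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full


signal = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -2.0, 4.0]
half = rfft_onesided(signal)
full = fill_conjugate_symmetry(half, len(signal))

# The rebuilt spectrum should match the full DFT up to rounding error.
max_err = max(abs(a - b) for a, b in zip(full, dft(signal)))
print(len(half), len(full), max_err < 1e-9)
```

The same index arithmetic covers odd lengths, where the one-sided half has (n + 1) // 2 bins and there is no Nyquist bin.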

PaddlePaddle / Paddle synced successfully about 1 year ago