- 10 Aug, 2023 1 commit
Committed by lzy
* add variable_length_memory_efficient_attention * update variable_length_memory_efficient_attention unittest * update variable_length_mem_eff_attn's docs and unittest * update variable_length_mem_eff_attn's docs * Update test_variable_length_memory_efficient_attention.py * Update variable_length_memory_efficient_attention.cu * fix codestyle * fix variable_length_fmha's docs and unittest * fix variable_length_fmha's docs
- 31 Jul, 2023 1 commit
Committed by wanghuancoder
support stride
- 30 Jul, 2023 1 commit
Committed by Sonder
* update * update
- 13 Jul, 2023 2 commits
Committed by RichardWooSJTU
* add matmul int8
Committed by hong
* fix edit distance bug * add op define kernel data type * fix bug * update * add header * add op test to cmake
- 11 Jul, 2023 1 commit
Committed by MarDino
* add rmsnorm kernel * add static graph test * fix round type * use alignas to avoid msvc compile error * remove redundant headerfile to avoid rocm compile error * fix rocm compile not found cub * Add document
- 10 Jul, 2023 1 commit
Committed by kangguangli
- 06 Jul, 2023 1 commit
Committed by kangguangli
* add ir output check in OpTest * add ir grad check in op test * add more unittest * fix
- 04 Jul, 2023 2 commits
Committed by kangguangli
* add ir output check in OpTest * add ir grad check in op test * add white list for ir op test * fix * open only in py3 and mac (cherry picked from commit 6daa44da495afb0287c6b69ecefbe35bbc47cb50)
Committed by hong
* support optional input in new_ir * polish code * add coverage test * update * update * add unittest * remove duplicate code * set test timeout
- 03 Jul, 2023 1 commit
Committed by FormlessUnit
* add linear_compress API
- 29 Jun, 2023 1 commit
Committed by zqw_1997
- 28 Jun, 2023 1 commit
Committed by LokeZhou
- 27 Jun, 2023 1 commit
Committed by xiaoguoguo626807
* modify eular_beam * modify matmul infermeta * add test * modify timeout
- 25 Jun, 2023 1 commit
Committed by cyber-pioneer
* fix batch_norm grad kernel nhwc error * fix batch_norm bias_grad loss in cinn * disable cinn * fix cinn_atol
- 20 Jun, 2023 1 commit
Committed by jiangcheng
- 15 Jun, 2023 1 commit
Committed by cyber-pioneer
- 14 Jun, 2023 3 commits
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * Remove climits. * Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in cuda12. * Fix problem of TimeOut of distributed testcases under cuda12. * Remove useless modification. * Remove useless modification.
Committed by cyber-pioneer
* move batch_norm prim test to op_test * fix optest bug * add test to cmake * add cinn test case * fix batch_norm prim grad bf16 * fix code * add cuda check * fix batch_norm bfloat16 * fix cpu bfloat16 bug * skip non-bfloat16-supported platform * fix code * fix cinn rtol and atol in bfloat16 * fix name * fix config
Committed by Charles-hit
- 13 Jun, 2023 3 commits
Committed by Fisher
* Enable check_cinn on atan2, tile, top_k and where * Update cmakelists in legacy_test * Reformat code * Enable check_cinn on op take_along_axis legacy test * Enable check_cinn on pool2d * Remove check_cinn=False * Try fix tile test error * Rename enable_cinn to test_cinn * Refactor test_tile_op * Replace all enable_cinn to check_cinn * Revert pool2d test timeout * Remove check_prim and use enable_cinn
Committed by TaoTao Li
* fix a100 cuda12 timeout * fix cuda12 pickle loads problem * fix ist_sharding_save ut
Committed by Yuang Liu
- 12 Jun, 2023 1 commit
Committed by tianshuo78520a
- 08 Jun, 2023 2 commits
Committed by Charles-hit
* support some prim ops bf16 dtype * fix cmake
Committed by Charles-hit
* support some prim ops for bf16 dtype * remove useless code
- 07 Jun, 2023 1 commit
Committed by jiangcheng
* [CINN] reopen mean/cumsum/instance_norm op's prim+CINN test * remove repeat test_mean_op in cmake
- 05 Jun, 2023 1 commit
Committed by zzk0
* [CINN] Enable check_cinn * add CMakeLists.txt
- 02 Jun, 2023 1 commit
Committed by Wang Xin
- 01 Jun, 2023 2 commits
Committed by Charles-hit
* support layer_norm prim op bf16 dtype * polish code * resolve conflict
Committed by tianshuo78520a
* mv all unittests test * fix error * fix error * fix * fix * del unittests * fix paddle_build.sh * fix * fix test * fix add test * fix * fix * fix * merge develop * fix * fix * fix * fix * fix * merge develop * fix test_async_read_write * fix test_async_read_write * merge develop * fix * fix import legacy_test * fix * fix * fix * fix * fix * fix * fix * fix * fix bug * fix * fix coverage test bug * fix * fix * fix * fix * fix * fix code sstyle * fix code * fix code * fix * fix * fix * del test_sequence_enumerate_op.py * fix
- 22 May, 2023 1 commit
Committed by risemeup1
* update_c++14_to_c++17_on_windows * disable test_audio_logmel_feature and test_audio_mel_feature
- 18 May, 2023 1 commit
Committed by tianshuo78520a
* fix * fix
- 23 Mar, 2023 1 commit
Committed by Zheng-Bicheng
- 22 Mar, 2023 1 commit
Committed by Zheng-Bicheng
- 20 Mar, 2023 1 commit
Committed by tianshuo78520a
- 23 Dec, 2022 1 commit
Committed by whs
- 14 Jun, 2022 1 commit
Committed by Wilber
* cmake-lint * update
- 04 Jun, 2022 1 commit
Committed by Sing_chan
- 18 Sep, 2021 1 commit
Committed by Feiyu Chan
* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with complie and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, becase the direction is not an option in these cases. 
* add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernal * fix bugs in python APIs * add cuda fft c2r grad kernal functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation is n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30) * Documentation of the common interfaces of c2r and c2c (#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (#32) * clean code * Add numpy-based implementation of spectral ops (#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsquuze op. * Add stft/istft api unittest. 
* Add unittest for fft helper functions (#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (#37) * Unittest of op with FFT C2C, C2R and r2c added (#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <lijiaqi0612@163.com> * add fft related options to CMakeLists.txt * fix typos and clean code (#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (#42) * Add api doc and update unittest. (#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (#45) * use dynload for cufft (#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (#48) 1. 
use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (#55) explicitly specify capture mode for lambdas * fix fft sample (#53) * fix fft sample * update scipy and numpy version for unittests of fft (#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (#57) * Remove cache of cuFFT & Disable ONEMKL (#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <jeff41404@gmail.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com> Co-authored-by: KP <109694228@qq.com> Co-authored-by: lijiaqi <lijiaqi0612@163.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>