- 19 Sep, 2021 2 commits
-
-
Committed by limingshu
* Optimization of pool2d grad, first commit. * remove useless print codes * refine codes * refine codes * seal more operation into template specialization * fix template struct error in MaxPool2dGrad. * Fix header including error * refine code with comment * Seal the param-preparation codes into function for common use. * Seal the param-preparation codes into function for common use. * Seal the param-preparation into function and make it common for other kernels * polish code and erase useless template specialization * Rerun trigger * rerun trigger
-
Committed by baoachun
-
- 18 Sep, 2021 18 commits
-
-
Committed by zhangbo9674
-
Committed by WangXi
-
Committed by Huihuang Zheng
Add a basic Cost Model. It uses the executor to run a program and profile it to get op time. This is an early, basic version; we will add more functions in the future.
-
Committed by Guoxia Wang
* fix bug
-
Committed by crystal
-
Committed by Sing_chan
-
Committed by Wilber
-
Committed by Yiqun Liu
-
Committed by Zeng Jinle
Change __init__.py to adapt the new FLAGS coding style and update CI to monitor FLAGS changing (#35849) * change __init__.py to adapt new FLAGS * test ci check, ready for revert * split __init__.py and FLAGS approval * Revert "test ci check, ready for revert" This reverts commit bbbd2442fe3e948fef790ec634085a2431474326.
-
Committed by huangjun12
-
Committed by huangjun12
-
Committed by Zeng Jinle
-
Committed by Feiyu Chan
* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with compile and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, because the direction is not an option in these cases. 
* add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernel * fix bugs in python APIs * add cuda fft c2r grad kernel functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from param list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation if n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30) * Documentation of the common interfaces of c2r and c2c (#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (#32) * clean code * Add numpy-based implementation of spectral ops (#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsqueeze op. * Add stft/istft api unittest. 
* Add unittest for fft helper functions (#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (#37) * Unittest of op with FFT C2C, C2R and r2c added (#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <lijiaqi0612@163.com> * add fft related options to CMakeLists.txt * fix typos and clean code (#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (#42) * Add api doc and update unittest. (#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (#45) * use dynload for cufft (#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (#48) 1. 
use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (#55) explicitly specify capture mode for lambdas * fix fft sample (#53) * fix fft sample * update scipy and numpy version for unittests of fft (#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (#57) * Remove cache of cuFFT & Disable ONEMKL (#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 does not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <jeff41404@gmail.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com> Co-authored-by: KP <109694228@qq.com> Co-authored-by: lijiaqi <lijiaqi0612@163.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
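The commit above mentions filling an r2c (real-to-complex) FFT result using conjugate symmetry. As a rough illustration of that idea only, and not the Paddle kernel itself, the following pure-Python sketch expands a one-sided spectrum to the full spectrum. The helper names `dft` and `fill_conjugate_symmetry` are hypothetical and do not come from the commit; a naive DFT stands in for the real FFT backends (pocketfft, MKL, cuFFT).

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fill_conjugate_symmetry(half, n):
    """Expand the first n // 2 + 1 bins of a real signal's spectrum to all
    n bins, using the identity X[k] = conj(X[n - k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0]  # real input, odd length
full = dft(signal)                               # reference: all n bins
half = full[: len(signal) // 2 + 1]              # what a one-sided r2c keeps
rebuilt = fill_conjugate_symmetry(half, len(signal))

# Largest deviation between the rebuilt and the reference spectrum.
max_err = max(abs(a - b) for a, b in zip(full, rebuilt))
```

Because the input is real, the rebuilt spectrum should match the reference up to floating-point rounding, for both odd and even lengths.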
-
Committed by Aurelius84
* split cuda_profiler into .h and .cc * fix cmake * remove inline
-
Committed by Wilber
-
Committed by Aurelius84
* Clean ParaseMemInfo and fix unittest with multi-thread * fix declare
-
Committed by Jacek Czaja
* - Reorder disabling caching * - compilation fix * - another compilation fix * - another compilation fix * - compilation fix * - Fix * - yet another compilation fix * - surprisingly another compilation fix * - lint * - fix after review * - fix
-
Committed by From00
* Add linalg.eigvals API * pre-commit check * Adjust code style * Fix conflict * Improve code style * Modify the test code to ignore testing CUDA kernel * Sort output data before checking in test code * Set timeout value for UT * Improve API example code to pass CI * Fix bug for None fetch_list in Windows * Delete grad Op
-
- 17 Sep, 2021 20 commits
-
-
Committed by zhangbo9674
* add pure fp16 major function in auto_cast & tracer * support master weight in dygraph for pure fp16 * check mix dtype of fp16&fp32 for check_finite_and_unscale op * change pure fp16 function name * refine some bug in auto_cast * refine auto_cast interface logic * add param _casted_by_pure_fp16 for class Layer * support state_dict hook for save model by user appointed dtype in pure_fp16_decorator * refine pure_fp16_decorator as decorator * add unittest * add comment * add comment * support recompute * add comment for auto_cast and decorator * support to_static_state_dict for paddle.jit.save * unlimit models num and optimizers num * add lookup_table in black_list * fix momentum and layer state_dict * fix bug in layer state_dict * fix bug in layer state_dict_helper * refine unittest * refine test_momentun_op * refine interface and some code * refine amp_decorator interface * refine pure fp16 interface * refine master weight interface
-
Committed by Leo Chen
* temporarily disable the warnings * disable ut
-
Committed by Zeng Jinle
-
Committed by Guoxia Wang
-
Committed by jakpiase
* disabled matmul_v2 grad * Revert "disabled matmul_v2 grad" This reverts commit b569bcef162116ca9f7963f3975b4a412f9e8555. * reverted disabling matmul_v2, disabled reshape and squeeze
-
Committed by Zeng Jinle
* make flag setter easier * update * rename macro name * fix bug of public/writable * update to pass CI * polish * fix CPU link error
-
Committed by andyjpaddle
* add pinv api, test=develop * add linalg pinv api, test=develop * update example code, test=develop
-
Committed by feng_shuai
* broadcast qkv_op * use PADDLE_ENFORCE_GT to replace assert
-
Committed by zhangkaihuo
Fused elementwise_add, dropout, elementwise_add and layer_norm into one operator; only the forward pass is supported. No Python API changed.
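For readers unfamiliar with this kind of fusion, here is a minimal unfused reference of the four steps named in the commit, written in pure Python for a single 1-D row. It is a sketch under assumptions, not the operator's actual code: the function name, the operand names `bias` and `residual`, the order in which operands are combined, the inverted-scaling dropout, and the absence of learnable scale and shift in the layer norm are all illustrative choices that the commit message does not specify.

```python
import math
import random


def add_dropout_add_layer_norm(x, bias, residual, dropout_prob=0.0,
                               eps=1e-5, rng=None):
    """Hypothetical unfused reference for one row: forward pass only."""
    rng = rng or random.Random(0)
    keep = 1.0 - dropout_prob

    # 1) elementwise_add: add the (assumed) bias to the input.
    h = [a + b for a, b in zip(x, bias)]

    # 2) dropout: zero each element with probability dropout_prob and
    #    rescale the survivors (inverted dropout, an assumption).
    if dropout_prob > 0.0:
        h = [v / keep if rng.random() < keep else 0.0 for v in h]

    # 3) elementwise_add: add the (assumed) residual branch.
    h = [v + r for v, r in zip(h, residual)]

    # 4) layer_norm: normalize to zero mean and unit variance.
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]


out = add_dropout_add_layer_norm(
    x=[1.0, 2.0, 3.0, 4.0],
    bias=[0.5, 0.5, 0.5, 0.5],
    residual=[0.0, 1.0, 0.0, 1.0],
)
```

A fused operator would compute the same result in one kernel launch, avoiding the intermediate tensors that the four separate steps create here.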
-
Committed by Haohongxiang
* Support EMA in Paddle2.x and Fleet * update * update * update * modify ut of ema * modify docs * modify bugs * update * update * update * modify ut
-
Committed by Guoxia Wang
-
Committed by Haipeng Wang
* adding scale_op in the model save step is not necessary; just fix the prune method to support static graph and inplace op * fix jit.save, no need to add scale_op to each outputvar anymore. fix prune_with_input, now it supports inplace op * temporarily disable test_trt_dynamic_shape.TRTDynamicShapeOutOfBound2Test
-
Committed by Aurelius84
* format code * format interface * polish interface * Remove std::memory_order * modify into SpinLock * remove fetch_context_pool_ * fix comment * modify into WorkQueueGroup * refine code * fix pointer * fix paddle_enforce * split into AsyncWorkQueue * polish code * specify std::memory_relax * fix atomic fetch_sub * fix num_thread
-
Committed by 津
-
Committed by Chen Weihang
-
Committed by Zhong Hui
-
Committed by xiaoxiaohehe001
* add_skip_layernorm * add_skip_layernorm * add_skip_layernorm * add_skip_layernorm * add_skip_layernorm * add_skip_layernorm * add_skiplayernorm_teller * add_skip_layernorm * add_skip_layernorm_teller * add_skip_layernorm_teller * add_skip_layernorm * add_skip_teller
-
Committed by Leo Chen
* expose cuda stream to users * add ut
-
Committed by 津
* add test * add test * add test
-
Committed by 津
* add test * add test * add test * add test * add test
-