提交 · 697c712f3df0d811eba27f8c49ed13899038ccac · PaddlePaddle / Paddle

04 8月, 2023 1 次提交

Support Combined indexing for __getitem__ and __setitem__ (#55211) · 697c712f

由 JYChen 提交于 8月 04, 2023

* WIP: start writing combined indexing get

* list/tuple/Variable

* getitem 80%

* add setitem

* add some unittest for setitem

* lazy import

* fix some setitem error

* fix advance indexing with decreasing axes; fix strided_slice input name

* combine int-tensor getitem is ok (without boolean support & broadcast); add getitem unittest for static

* add broadcast & parse bool tensor for __getitem

* [change getitem] _getitem_impl_ to _getitem_static, not deleting the former one

* refine new getitem; fix ut in variable/var_base

* add __getitem__ ut in dygraph

* re-dispatch getitem for Py/CPP; fix strided_slice decrease axes error in dygraph

* fix ut; support tensor in slice

* [change setitem] _setitem_impl_ to _setitem_static, not deleting the former one

* remove some UT (for some, temporarily)

* add IndexError to solve timeout problem in static-mode

* 1.temply forbideen all-False bool-indexput; 2.setitem_static will return new variable

* xpu uses old stratege

* rename dy2st setitem ut to avoid same-name problem

* dy2st for new combined index

* ut case for combine-index with dy2st

* open ut with all-false-bool setitem

* remove useless doc and _getitem_impl_

* change static res

* fix static xpu

697c712f

17 4月, 2023 1 次提交

mv ps distributed dir (#52885) · 1765d5d1

由 tianshuo78520a 提交于 4月 17, 2023

* mv ps distributed dir

* fix

* add del auto_parallel

* add auto_parallel

* fix ps

* fix bug

* fix test bug

* fix test bug

* merge develop fix error

* merge develop fix error

* merge develop fix error

1765d5d1

14 6月, 2022 1 次提交
- W
  fix cmake-lint problems. (#43406) · 59f89236
  由 Wilber 提交于 6月 14, 2022
```
* cmake-lint

* update
```
  59f89236
04 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  由 Sing_chan 提交于 6月 04, 2022
  
  92568edb
23 12月, 2021 1 次提交
- X
  move distribution.py into distribution package and split into different file... · a3e6f18c
  由 Xiaoxu Chen 提交于 12月 23, 2021
```
move distribution.py into distribution package and split into different file for better scalability (#38047)
```
  a3e6f18c
18 9月, 2021 1 次提交

由 Feiyu Chan 提交于 9月 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with complie and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, becase the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernal

* fix bugs in python APIs

* add cuda fft c2r grad kernal functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simple parameterize test function by auto generate test case from parm list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation is n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsquuze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5.  fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dyload_cuda when cuda is available (#50)

* fix compile error: only link dyload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: Njeff41404 <jeff41404@gmail.com>
Co-authored-by: Nroot <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: NKP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: NXiaoxu Chen <chenxx_id@163.com>
Co-authored-by: Nlijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43

16 11月, 2020 1 次提交
- Y
  
  modified timeout value for some ut (#28616) · c4d22c84
  由 YUNSHEN XIE 提交于 11月 16, 2020
  
  c4d22c84
08 11月, 2020 1 次提交

exec ut no more than 15s 1 (#28439) · ba075632

由 YUNSHEN XIE 提交于 11月 08, 2020

* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix

* test for limiting ut exec time as 15S

* fix an error caused by cannot find ut

* fix some error

* can not find test_transformer

* fix error caused by ut not run in windows

* fix error caused by Compiler Options

* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt

* setting timeout value to 120s for old ut

* add the timeout value setting

* fix error caused by ut only run in coverage_ci

* add analyzer_transformer_profile_tester

* fix some error

* fix some error

* fix error with inference option

* fix error with inference option setting as ON_INFER

* add some ut to set timeout

* modified some option

* fix error

* fix some timeout error

* fix error

* fix error

* fix timeout for test_analyzer_bfloat16_resnet50

* fix error

* setting timeout properity for some ut

* first pr for new ut timeout as 15S

ba075632

27 8月, 2020 1 次提交

Add unified RNN APIs (#26588) · f4083010

由 Feiyu Chan 提交于 8月 27, 2020

* Add RNN related apis in paddl.nn
test=develop

* new rnn api, cell almost done

* add new progresses in rnn APIs for 2.0

* refine rnn APIs and docstrings.

* add unittets

* disable gpu tests when paddle is not compiled with cuda support

* remove unnecessary imports

* fix docstring

* add to no_sample wlist

* backport to python2 to avoid yield from

* add **kwargs, fix typos

* update docstrings for birnn

* rename argument for SimpleRNN and SimpleRNNCell, fix sample code

* add default value for initial_states in fluid.layers.birnn
Co-authored-by: Nguosheng <guosheng@baidu.com>

f4083010

29 1月, 2019 1 次提交
- K
  Make separate folders for mkldnn codes · b1bdcd4d
  由 Krzysztof Binias 提交于 1月 28, 2019
```
test=develop
```
  b1bdcd4d
14 12月, 2018 1 次提交

Enable ngraph tests for a ngraph engine (#14800) · 67b555d3

由 mozga-intel 提交于 12月 14, 2018

* Enable ngraph tests for a ngraph engine
test=develop

* Move the test structure to other place
test=develop

* Add USE_NGRAPH flag, simple structure
test=develop

67b555d3

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功