Commits · 58c8f6b38ddd44834d822a8054858becc89cf550 · 机器未来 / Paddle

Sep 28, 2021 · 2 commits

[hybrid] seed and dropout op support force-cpu (#35820) · 58c8f6b3

Committed by xiayanming on Sep 28, 2021

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] fix seed ci failed issue

* add AsExtra for force_cpu of seed op

58c8f6b3

fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1 (#36123) · d5268a6e
Committed by Guoxia Wang on Sep 28, 2021

d5268a6e

Sep 27, 2021 · 3 commits

fix zero tensor for unique, unstack (#36021) · efd35384

Committed by Jiawei Wang on Sep 27, 2021

* fix extra op for expand, expand_as, tile, unstack

* fix unique unstack dim 0

* Update expand_v2_op.cc

* fix unique_op format

efd35384

Lars op optimization with cudaLaunchCooperativeKernel method (#35652) · a112ce42

Committed by limingshu on Sep 27, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lars cuda kernel

* Fix bugs

* fix code according to comments

* fix code according to review comments

* adding some function overload

* relocate the power operation.

a112ce42
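
The LARS commit above only names the CUDA launch strategy. As background, the sketch below shows one commonly used form of the LARS-momentum update in plain Python, so the per-layer math that the kernel parallelizes is visible. It assumes the formulation `local_lr = lr * lars_coeff * ||w|| / (||g|| + weight_decay * ||w|| + eps)`; the helper name `lars_momentum_step`, its defaults, and the zero-norm fallback are illustrative choices, not Paddle's kernel or API.

```python
import math


def lars_momentum_step(param, grad, velocity, lr, mu=0.9,
                       lars_coeff=0.001, weight_decay=0.0005, eps=0.0):
    """One LARS-momentum update on flat lists of floats (hypothetical helper).

    Returns (new_param, new_velocity).
    """
    # Global L2 norms over the whole parameter and its gradient.
    p_norm = math.sqrt(sum(p * p for p in param))
    g_norm = math.sqrt(sum(g * g for g in grad))
    # Layer-wise trust ratio. Falling back to the plain lr when a norm is zero
    # is an assumption of this sketch.
    if p_norm > 0 and g_norm > 0:
        local_lr = lr * lars_coeff * p_norm / (g_norm + weight_decay * p_norm + eps)
    else:
        local_lr = lr
    # Momentum buffer accumulates the scaled, weight-decayed gradient.
    new_velocity = [mu * v + local_lr * (g + weight_decay * p)
                    for p, g, v in zip(param, grad, velocity)]
    new_param = [p - v for p, v in zip(param, new_velocity)]
    return new_param, new_velocity
```

Both norms are reductions over the entire parameter, which is presumably why a grid-wide synchronization such as the one `cudaLaunchCooperativeKernel` allows is attractive: the norms and the element-wise update can then share one launch.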

Added flatten and flatten2 BF16/FP32 FWD/BWD kernels (#35892) · e427a0f1

Committed by jakpiase on Sep 27, 2021

* refactored reshape multiop kernel and added flatten1/2 kernels

* added formatting for flatten tests

* CI fix

* disabled reshape_kernel ops after successful CI run

* minor fix

e427a0f1
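
For readers unfamiliar with the fluid-era `flatten`/`flatten2` ops: they collapse an N-D input into a 2-D matrix around an `axis` attribute without touching the data. The hypothetical helper below computes only the output shape, assuming the rule `(d0 * ... * d(axis-1), d(axis) * ... * d(n-1))`, with an empty product counting as 1. Because only the shape changes, building these kernels on the refactored reshape kernel is a natural fit.

```python
from functools import reduce
from operator import mul


def flatten_output_shape(in_shape, axis=1):
    """Output shape of a fluid-style flatten/flatten2 op (hypothetical helper)."""
    if not 0 <= axis <= len(in_shape):
        raise ValueError(f"axis must be in [0, {len(in_shape)}], got {axis}")
    outer = reduce(mul, in_shape[:axis], 1)  # product of leading dims (1 if none)
    inner = reduce(mul, in_shape[axis:], 1)  # product of trailing dims (1 if none)
    return (outer, inner)


# Example: a (3, 100, 100, 4) tensor flattened at axis=2 becomes (300, 400).
print(flatten_output_shape((3, 100, 100, 4), axis=2))
```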

Sep 26, 2021 · 7 commits
- bugfix reshape -1 (#36087) · 2fe9ae71
  Committed by JZ-LIANG on Sep 26, 2021
  
  2fe9ae71
- [new api] add func/class API psroi_pool and UT (#35352) · e45d64ec
  Committed by JYChen on Sep 26, 2021
```
* add func/class API psroi_pool and UT

* add UT in static mode

* Remove redundant type checks in static mode

* More detailed description for test_psroi_pool_op

* fix code format of UT

* fix en-doc
```
  e45d64ec
- Add a check for multiplex op (#34972) · b430f6a3
  Committed by Yulong Ao on Sep 26, 2021
  
  b430f6a3
- Fix FPE of label smooth op (#35861) · 628ff34b
  Committed by whs on Sep 26, 2021
  
  628ff34b
- CPU forward calculation replaces Eigen with Lapack; Modify linalg exposure rules (#35916) · 7ff226f0
  Committed by crystal on Sep 26, 2021
  
  7ff226f0
- Support fixed seed in Python for test (#36065) · 1b90f968
  Committed by YuanRisheng on Sep 26, 2021
```
* Add New Op: gumbel_softmax

* Add New Op: gumbel_softmax

* Add New Op: gumbel_softmax (amend)

* add __main__ function in unit test

* fix bugs when test in windows ci

* update en docs

* delete relative error in unit test

* delete relative error in unit test

* set hard=True in unit test

* Support fix seed in Python for test
```
  1b90f968
- [icafe-31094] Add function comments and instructions to the Primitive API (#35743) · bc0df48b
  Committed by niuliling123 on Sep 26, 2021
  
  bc0df48b
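
The gumbel_softmax work bundled into #36065 is easier to follow with the sampling step written out. Below is a minimal plain-Python sketch of Gumbel-softmax sampling, assuming the standard formulation `softmax((logits + g) / temperature)` with Gumbel noise `g = -log(-log(u))`; it is not Paddle's kernel, and the function name is hypothetical. Passing a seeded `random.Random` shows why a fixed seed makes the unit test reproducible.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, hard=False, rng=None):
    """Draw one Gumbel-softmax sample from a list of logits (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Sample Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    noise = []
    for _ in logits:
        u = rng.random()
        while u == 0.0:  # random() can return 0.0; avoid log(0)
            u = rng.random()
        noise.append(-math.log(-math.log(u)))
    scores = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # Forward value of the "hard" variant: one-hot at the argmax.
    idx = soft.index(max(soft))
    return [1.0 if i == idx else 0.0 for i in range(len(soft))]
```

With `hard=True` the sketch returns only the one-hot forward value; a training implementation would typically also route gradients through the soft probabilities (straight-through), which is not modeled here.
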
Sep 25, 2021 · 1 commit
- temporarily fix the performance drop of recurrent op (#36052) · 372a1a75
  Committed by baoachun on Sep 25, 2021
  
  372a1a75
Sep 24, 2021 · 5 commits

add gradient kernel of det op and slogdet op (#36013) · b91e8eec
Committed by jiangcheng on Sep 24, 2021
```
* add gradient kernel of det op and slogdet op

* fix CI APPROVAL problem
```
b91e8eec

Added elementwise_sub_mkldnn operator (#35662) · 787273ed

Committed by piotrekobiIntel on Sep 24, 2021

* Add elementwise_sub_mkldnn_op without grad

* Add test to static_mode_white_list

* Refactor code, change license years

* Remove invalid grad implementation

* Fix element_wise_sub_op test

* Fix CI Approval error

* Remove unnecessary EltwiseSubMKLDNNGradKernel class

* Fix CI Approval 2

* Fix CI Approval 3

* Fix CI Approval Attempt #4

* Fix CI Approve Attempt #5

* Fix CI Approval Attempt #6

* Fix CI Approval Attempt #7

* Change test names containing add to sub

* Fix old tests testing add instead of sub

* Copy grad implementation from elementwise_add_mkldnn

* CI test fix attempt

* Revert "CI test fix attempt"

This reverts commit c647cacf41e6a87c715385a185de5cbf65fc8900.

* Fix CI attempt 2

* Fix elementwise_sub tests, temporary mkldnn broadcast test disable

* Add working implementation of elementwise_sub grad

* Fix build errors caused by pull

* Fix format error

* Fix format error 2

* Disable elementwise_sub_mkldnn test on GPU

* Apply fix for paddle.fluid import

* Revert changes of test_elementwise_sub and Fix mkldnn test

* Revert "Apply fix for paddle.fluid import"

This reverts commit fc3b122fec8e12f2bcb32928a2685ba4d20fd742.

* fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862)

* Add changes suggested by reviewers

* Change @unittest.skipIf... to @OpTestTool.skip_if_not_cpu_bf16() to satisfy Approval CI

* Remove check_dygraph=False to satisfy CI Approval
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>

787273ed

[oneDNN] candidate fix to #34554 (#35884) · 485b387d

Committed by Jacek Czaja on Sep 24, 2021

* - candidate fix

* - More fixes to #34554

* - another inconsistent fix to key

* - Removed unneeded line

* - matching the cache behaviour to other ops

485b387d

Add paddle.linalg.solve OP (#35715) · 8caf951c

Committed by Weilong Wu on Sep 24, 2021

* Add linalg.solve op, test=develop

* Fix a bug caused by accidental deletion

* updated description and fix a bug: missing a comma

* Add linalg.solve op, test=develop

* updated solve op backward logic

* updated solve op backward logic again

* Add linalg.solve Op, test=develop

* Updated and modified to fit CI requirements

* Fix a bug

* 1)Add more test cases; 2)Fix a wrong usage in reduces operation; 3)Remove redundant code

* Remove redundant comments

* 1)Removed redundant code; 2)Updated to enhance code robustness

* Removed redundant code

* Updated API documents

8caf951c
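
`paddle.linalg.solve` computes `X` such that `A @ X = B`. As a reference for what the op computes (not how Paddle implements it), here is a small Gaussian-elimination solver with partial pivoting for a square system with a single right-hand side; the function name `solve` and the singularity tolerance are assumptions for illustration.

```python
def solve(a, b, tol=1e-12):
    """Solve a @ x = b for a square matrix `a` (list of rows) and vector `b`."""
    n = len(a)
    # Augmented matrix so that row operations act on a and b together.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest-magnitude entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("matrix is singular (or nearly so)")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# 3*x0 + 1*x1 = 9 and 1*x0 + 2*x1 = 8 give x = [2, 3].
print(solve([[3, 1], [1, 2]], [9, 8]))
```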

add the shape check for the matmul (#35791) · 8e19d1ba
Committed by wawltor on Sep 24, 2021
```
* add the shape check for the matmul

* remove the test case for the linear
```
8e19d1ba

Sep 23, 2021 · 3 commits
- Replace Eigen with Lapack library for eigvals OP kernel (#35909) · 9b8aafe5
  Committed by From00 on Sep 23, 2021
  
  9b8aafe5
- Add fused_attention_op: add impl wrappers. (#35903) · 88ea8e6f
  Committed by Li Min on Sep 23, 2021
  
  88ea8e6f
- add argmax and iou_similarity for kunlun (#35836) · 7bf84e2d
  Committed by TTerror on Sep 23, 2021
```
* add argmax and iou_similarity for kunlun

* add argmax and iou_similarity for kunlun

* add argmax and iou_similarity for kunlun
```
  7bf84e2d
Sep 22, 2021 · 7 commits
- ResnetUnitOp implemented by cuDNN fused op(backend code) (#35557) · 736a7388
  Committed by Zhang Zheng on Sep 22, 2021
  
  736a7388
- [NPU] add randperm_op_npu (#35763) · 4f0c3278
  Committed by ronnywang on Sep 22, 2021
```
* add randperm_op_npu

* fix test_set_value_op_npu
```
  4f0c3278
- op:transpose_op supports bool type (#35886) · 0c6ee945
  Committed by TeslaZhao on Sep 22, 2021
```
* Pass compat of conv_transpose_bias_mkldnn_fuse_pass

* Fix a bug of strided_slice op, about the axes parameter access memory out of bounds

* Fix a bug of transpose op, about accessing memory out of bounds of the perm param

* op:transpose_op supports bool type
```
  0c6ee945
- Det & Slogdet (#34992) · 9ce45ddd
  Committed by huangxu96 on Sep 22, 2021
```
Add new API : paddle.linalg.det & paddle.linalg.slogdet

API alias: paddle.det & paddle.slogdet
```
  9ce45ddd
- [Inference] Support NNAdapter and ascend310 (#35226) · 10e53044
  Committed by JingZhuangzhuang on Sep 22, 2021
  
  10e53044
- [2.2] support extern third_party lapack API on Linux/Windows/Mac (#35690) · ae65257d
  Committed by zhouweiwei2014 on Sep 22, 2021
```
* support extern third_party lapack on Linux/Windows/Mac

* fix ci
```
  ae65257d
- add dilation check for conv (#35838) · 77134300
  Committed by wangguanzhong on Sep 22, 2021
  
  77134300
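
The Det & Slogdet commit (#34992) in the list above exposes `paddle.linalg.det` and `paddle.linalg.slogdet`. `slogdet` reports the sign and the log of the absolute determinant, so very large or very small determinants do not overflow or underflow. The hypothetical helper below sketches that computation with LU-style elimination and partial pivoting in plain Python; it is an illustration, not Paddle's kernel.

```python
import math


def slogdet(a):
    """Return (sign, log|det(a)|) for a square matrix given as a list of rows."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    sign = 1.0
    log_abs = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0, float("-inf")  # singular matrix: det == 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # each row swap flips the determinant's sign
        d = m[col][col]
        if d < 0:
            sign = -sign  # a negative pivot also flips the sign
        log_abs += math.log(abs(d))  # accumulate log|pivot| instead of the product
        for r in range(col + 1, n):
            factor = m[r][col] / d
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign, log_abs
```

The determinant itself is recovered as `sign * math.exp(log_abs)`; for `[[1, 2], [3, 4]]` that is `-2`.
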
Sep 21, 2021 · 2 commits

support fp16 (#35888) · 087c23a9
Committed by Guoxia Wang on Sep 21, 2021

087c23a9

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

Committed by Adam Osewski on Sep 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when use MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using PP cache mechanism to store mem and primitive obj.
* Handler obj store and reuse needed desc & prim

* Do not derive from MKLDNNHandlerT

799f3861
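
The oneDNN SGD commit above targets dense parameters with SelectedRows (sparse) gradients: only the rows present in the gradient are updated, and each row update is an AXPY, `y <- alpha * x + y` with `alpha = -lr`. The plain-Python sketch below illustrates that data flow; `AxpyHandler` and `sgd_sparse_update` are hypothetical names, and creating the handler once and reusing it for every row only loosely mirrors the commit's reuse of a stateful `OneDNNAXPYHandler`.

```python
class AxpyHandler:
    """Hypothetical reusable AXPY object: y <- alpha * x + y, in place.

    alpha and the row width are fixed at construction, so one object can be
    applied to many rows without being rebuilt.
    """

    def __init__(self, n, alpha):
        self.n = n
        self.alpha = alpha

    def __call__(self, x, y):
        if len(x) != self.n or len(y) != self.n:
            raise ValueError("row width mismatch")
        for i in range(self.n):
            y[i] += self.alpha * x[i]


def sgd_sparse_update(param, grad_rows, grad_values, lr):
    """In-place SGD for a dense param and a SelectedRows-style sparse grad.

    param: list of rows; grad_rows: row indices; grad_values: one row per index.
    """
    axpy = AxpyHandler(len(param[0]), -lr)  # built once, reused for every row
    for row, values in zip(grad_rows, grad_values):
        axpy(values, param[row])  # param[row] <- -lr * grad_row + param[row]
    return param
```

Repeated row indices are applied once per occurrence, which is equivalent to summing their gradient rows before the update.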

Sep 19, 2021 · 1 commit

Optimization of pool2d grad (#35389) · 86685190

Committed by limingshu on Sep 19, 2021

* Optimization of pool2d grad, first commit.

* remove useless print codes

* refine codes

* refine codes

* seal more operation into template specialization

* fix template struct error in MaxPool2dGrad.

* Fix header including error

* refine code with comment

* Seal the param-preparation codes into function for common use.

* Seal the param-preparation codes into function for common use.

* Seal the param-preparation into function and make it common for other kernels

* polish code and erase useless template specialization

* Rerun trigger

* rerun trigger

86685190

Sep 18, 2021 · 5 commits

FixEighOP; Unified MatrixEighFunctor function (#35812) · da441363
Committed by crystal on Sep 18, 2021

da441363

Correct the return type of elementwise kernel to avoid many compiling warnings. (#35839) · 71f051fb
Committed by Yiqun Liu on Sep 18, 2021

71f051fb

Committed by Feiyu Chan on Sep 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with compile and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, because the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernel

* fix bugs in python APIs

* add cuda fft c2r grad kernel functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simple parameterize test function by auto generate test case from parm list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation if n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsqueeze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5. fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dyload_cuda when cuda is available (#50)

* fix compile error: only link dyload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 does not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: jeff41404 <jeff41404@gmail.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: KP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43
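
Two ideas from the FFT/signal commit above can be checked with a few lines of plain Python. For real input the DFT is conjugate-symmetric, `X[N-k] = conj(X[k])`, so an r2c "onesided" result only needs `N // 2 + 1` bins and the remaining bins can be refilled by conjugation; `frame` splits a signal into overlapping windows and `overlap_add` sums them back. The sketch uses a naive O(N^2) DFT and hypothetical helper names; it is not the pocketfft, MKL, or cuFFT code path, and it ignores the ops' `axis` handling and padding.

```python
import cmath


def dft(x):
    """Naive O(N^2) forward DFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def rfft_onesided(x):
    """Keep only the non-redundant N//2 + 1 bins of a real-input DFT."""
    return dft(x)[: len(x) // 2 + 1]


def fill_full_spectrum(onesided, n):
    """Rebuild all N bins from a onesided spectrum using X[N-k] = conj(X[k])."""
    full = list(onesided) + [0j] * (n - len(onesided))
    for k in range(len(onesided), n):
        full[k] = full[n - k].conjugate()
    return full


def frame(x, frame_length, hop_length):
    """Split a 1-D signal into overlapping frames (no padding, len(x) >= frame_length)."""
    count = 1 + (len(x) - frame_length) // hop_length
    return [x[i * hop_length: i * hop_length + frame_length] for i in range(count)]


def overlap_add(frames, hop_length):
    """Inverse of framing by summation: overlapping samples are added together."""
    frame_length = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop_length + frame_length)
    for i, f in enumerate(frames):
        for j, v in enumerate(f):
            out[i * hop_length + j] += v
    return out
```

When `hop_length == frame_length` the frames do not overlap and `overlap_add` simply concatenates them; with overlap, shared samples are summed, which is why an inverse such as istft normally also divides by a window envelope.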

[oneDNN] Disable caching of Reorder operation (#35664) · e4c2a854

Committed by Jacek Czaja on Sep 18, 2021

* - Reorder: disabling caching

* - compilation fix

* - another compilation fix

* - another compilation fix

* - compilation fix

* - Fix

* - yet another compilation fix

* - surprisingly another compilation fix

* - lint

* - fix after review

* - fix

e4c2a854

Add new API "eigvals" in linalg (#35720) · d411a038

Committed by From00 on Sep 18, 2021

* Add linalg.eigvals API

* pre-commit check

* Adjust code style

* Fix conflict

* Improve code style

* Modify the test code to ignore testing CUDA kernel

* Sort output data before checking in test code

* Set timeout value for UT

* Improve API example code to pass CI

* Fix bug for None fetch_list in Windows

* Delete grad Op

d411a038
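
`eigvals` returns the (possibly complex) eigenvalues of a general square matrix in no guaranteed order, which is why the commit's test sorts outputs before comparing. A minimal sketch for the 2x2 case, using the characteristic polynomial `lambda^2 - trace * lambda + det = 0`, shows both points; the helper names are hypothetical, and this is not how a LAPACK-backed kernel handles general sizes.

```python
import cmath


def eigvals_2x2(m):
    """Eigenvalues of a real 2x2 matrix [[a, b], [c, d]] as complex numbers."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    # Roots of lambda^2 - trace * lambda + det = 0; cmath handles a negative discriminant.
    disc = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + disc) / 2, (trace - disc) / 2]


def sorted_eigs(values):
    """Sort by (real, imag) so results can be compared regardless of order."""
    return sorted(values, key=lambda z: (z.real, z.imag))


# A 90-degree rotation has no real eigenvalues: the result is [-1j, 1j].
print(sorted_eigs(eigvals_2x2([[0.0, -1.0], [1.0, 0.0]])))
```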

Sep 17, 2021 · 4 commits

Disabled oneDNN reshape1/2 and squeeze1/2 kernels (#35781) · 0eaab803

Committed by jakpiase on Sep 17, 2021

* disabled matmul_v2 grad

* Revert "disabled matmul_v2 grad"

This reverts commit b569bcef162116ca9f7963f3975b4a412f9e8555.

* reverted disabling matmul_v2, disabled reshape and squeeze

0eaab803

Make flag adding easier (#35823) · 2c781455

Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

2c781455

broadcast qkv_op (#35780) · cf9eae4c
Committed by feng_shuai on Sep 17, 2021
```
* broadcast qkv_op

* use PADDLE_ENFORCE_GT to replace assert
```
cf9eae4c

add a fusion op: fused_layernorm_residual_dropout_bias (#35151) · 7975dfcf

Committed by zhangkaihuo on Sep 17, 2021

Fused elementwise_add, dropout, elementwise_add and layer_norm into one operator; only the forward pass is supported.
No Python API changed.

7975dfcf
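
Written unfused, the computation this commit folds into one forward operator looks roughly like `layer_norm(residual + dropout(x + bias))`. The sketch below is a plain-Python reference for one row of features, assuming that order of operations (inferred from the op name and the description above), upscale-in-train dropout, and an explicit 0/1 keep mask for determinism; the function name and its parameters are hypothetical.

```python
import math


def layernorm_residual_dropout_bias(x, bias, residual, mask, p,
                                    gamma, beta, eps=1e-5):
    """Unfused reference for one row of features (hypothetical helper).

    x, bias, residual, mask, gamma and beta are equal-length lists; mask holds
    0/1 keep flags and p is the dropout probability.
    """
    keep_scale = 1.0 / (1.0 - p) if p < 1.0 else 0.0  # upscale-in-train dropout
    # Steps 1-3: bias add, dropout, residual add.
    h = [r + (xi + bi) * mi * keep_scale
         for xi, bi, r, mi in zip(x, bias, residual, mask)]
    # Step 4: layer norm over the feature dimension, then scale and shift.
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(h, gamma, beta)]
```

A fused kernel would compute the same values while keeping `h` in registers or shared memory instead of writing each intermediate back to global memory.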

机器未来 / Paddle is in sync with the upstream project it was forked from
