Commits · f67a50bd8c70afd93083458d42478aba100c23e5 · PaddlePaddle / Paddle

Sep 22, 2021 · 21 commits
- fix adamw DeprecationWarining (#35869) · f67a50bd
  Committed by zhaoyingli on Sep 22, 2021
  
  f67a50bd
- [AMP]split minimize and add unscale_ for GradScaler (#35825) · bf6f0e54
  Committed by zhangbo9674 on Sep 22, 2021
```
* split minimize() to step() + update()

* add unscale and step for grad_scaler

* add unittest

* refine code in minimize

* delete step in loss_scaler

* fix example bug

* refine comment

* refine unittest

* add unittest
```
  bf6f0e54
- [NPU] add randperm_op_npu (#35763) · 4f0c3278
  Committed by ronnywang on Sep 22, 2021
```
* add randperm_op_npu

* fix test_set_value_op_npu
```
  4f0c3278
- op:transpose_op supports bool type (#35886) · 0c6ee945
  Committed by TeslaZhao on Sep 22, 2021
```
* Pass compat of conv_transpose_bias_mkldnn_fuse_pass

* Fix a bug of strided_slice op, about the axes parameter access memory out of bounds

* Fix a bug of transpose op, about accessing memory out of bounds of the perm param

* op:transpose_op supports bool type
```
  0c6ee945
- Det &Slogdet (#34992) · 9ce45ddd
  Committed by huangxu96 on Sep 22, 2021
```
Add new API: paddle.linalg.det & paddle.linalg.slogdet

API Alias: paddle.det & paddle.slogdet
```
  9ce45ddd
- update paddle2onnx version to 0.8.2 in unittest_py/requirements.txt (#35837) · 00e0e358
  Committed by yeliang2258 on Sep 22, 2021
  
  00e0e358
- support ernie-int8 test and prune op attribute test (#35890) · e8789c11
  Committed by Peihan on Sep 22, 2021
```
* support ernie-int8 test and prune op attribute test

* remove using and use namespace

* remove macro and use shell instead

* Revert "remove macro and use shell instead"

This reverts commit 615964b149d7de7825b341936b42be22a4bc0091.

* fix grammar error

* fix shell error
```
  e8789c11
- add no need buffer check, test=develop (#35790) · 7ebbcbbc
  Committed by wanghuancoder on Sep 22, 2021
  
  7ebbcbbc
- refine FLAGS approval (#35904) · 7ba69249
  Committed by Zeng Jinle on Sep 22, 2021
  
  7ba69249
- [Inference] Support NNAdapter and ascend310 (#35226) · 10e53044
  Committed by JingZhuangzhuang on Sep 22, 2021
  
  10e53044
- fix: delete_quant_dequant_filter_op_pass, delete_quant_dequant_op_pass (#35879) · 5cda6b2b
  Committed by Wangzheee on Sep 22, 2021
  
  5cda6b2b
- fix conv2d convert test (#35627) · 1238115e
  Committed by JingZhuangzhuang on Sep 21, 2021
```
* support nnadapter and ascend310

* modify code

* add anchor_generator convert test

* add gelu convert test

* add conv2d convert test

* modify anchor_operator convert test

* modify conv2d test

* modify conv2d convert test

* modify conv2d convert test

* modify conv2d convert test

* modify conv2d test

* fix WITH_PYTHON compile error

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file
Co-authored-by: xiaoxiaohehe001 <hiteezsf@163.com>
Co-authored-by: jiweibo <jiweibo@baidu.com>
```
  1238115e
- Add quant2 int8 lstm model test (#35887) · be4d0026
  Committed by joanna.wozna.intel on Sep 22, 2021
  
  be4d0026
- fix feed for new executor (#35803) · 4c2a06df
  Committed by wanghuancoder on Sep 21, 2021
```
* fix feed, test=develop

* delete one test case, test=develop
```
  4c2a06df
- add timeline(recordevent) for new executor, test=develop (#35831) · 5574c8cf
  Committed by wanghuancoder on Sep 21, 2021
  
  5574c8cf
- refine gc for new_executor (#35764) · fab1a029
  Committed by wanghuancoder on Sep 21, 2021
```
* refine gc for new_executor, test=develop

* refine, test=develop

* refine, test=develop

* merge, test=develop
```
  fab1a029
- Modify H2D and D2H as kQueue::Sync and Polish Schedule logic (#35866) · fe35496b
  Committed by Aurelius84 on Sep 22, 2021
```
* Modify H2D and D2H as kQueue::Sync

* fix interface error
```
  fe35496b
- [2.2]support extern third_party lapack API on Linux/Windows/Mac (#35690) · ae65257d
  Committed by zhouweiwei2014 on Sep 22, 2021
```
* support extern third_party lapack on Linux/Windows/Mac

* fix ci
```
  ae65257d
- disable tests for fft on windows with gpu (#35872) · 5af6081a
  Committed by Feiyu Chan on Sep 22, 2021
  
  5af6081a
- fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862) · 12ab017e
  Committed by zhangbo9674 on Sep 22, 2021
  
  12ab017e
- add dilation check for conv (#35838) · 77134300
  Committed by wangguanzhong on Sep 22, 2021
  
  77134300
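
The GradScaler entry above (bf6f0e54) splits `minimize()` into `step()` + `update()` and adds `unscale_`. A rough illustration of why that split is useful: once gradients can be unscaled explicitly, they can be inspected or clipped before the optimizer step. The sketch below is a toy in pure Python, not Paddle's implementation. The class `ToyGradScaler`, the helper `clip_`, the plain SGD update, and the halve/double policy are all assumptions made for the example.

```python
import math


class ToyGradScaler:
    """Hypothetical dynamic loss scaler; an illustration, not Paddle's GradScaler."""

    def __init__(self, init_scale=1024.0, growth_interval=2):
        self.scale_value = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0
        self._found_inf = False
        self._unscaled = False

    def scale(self, loss):
        # Multiply the loss so that small fp16 gradients do not underflow.
        return loss * self.scale_value

    def unscale_(self, grads):
        # Divide gradients by the scale in place and record any inf/nan.
        if self._unscaled:
            return
        for i, g in enumerate(grads):
            grads[i] = g / self.scale_value
        self._found_inf = any(not math.isfinite(g) for g in grads)
        self._unscaled = True

    def step(self, params, grads, lr):
        # Plain SGD update (an assumption), skipped when an overflow was seen.
        self.unscale_(grads)
        if not self._found_inf:
            for i, g in enumerate(grads):
                params[i] -= lr * g

    def update(self):
        # Assumed policy: halve on overflow, double after enough good steps.
        if self._found_inf:
            self.scale_value /= 2.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale_value *= 2.0
                self._good_steps = 0
        self._found_inf = False
        self._unscaled = False


def clip_(grads, limit):
    # Clamp each gradient to [-limit, limit] in place.
    for i, g in enumerate(grads):
        grads[i] = max(-limit, min(limit, g))
```

With a single `minimize()` call there is no point at which `clip_` could run on the true gradients; with the split, the order is `unscale_`, clip, `step`, `update`.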
Sep 21, 2021 · 2 commits

support fp16 (#35888) · 087c23a9
Committed by Guoxia Wang on Sep 21, 2021

087c23a9

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

Committed by Adam Osewski on Sep 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when use MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using PP cache mechanism to store mem and primitive obj.
* Handler obj store and reuse needed desc & prim

* Do not derive from MKLDNNHandlerT

799f3861
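
The commit above describes a stateful AXPY handler that is built once and called many times, used for SGD with a dense parameter and a sparse gradient. The idea can be sketched in pure Python as below. This is only an illustration under assumptions: `AxpyHandler`, `sgd_sparse_update`, and the `primitives_built` counter are hypothetical names, and the counter merely stands in for the cost of creating oneDNN primitives.

```python
class AxpyHandler:
    """Hypothetical stateful AXPY object: y <- alpha * x + y for rows of length n."""

    primitives_built = 0  # counts how often the (pretend) costly setup ran

    def __init__(self, n, alpha):
        self.n = n
        self.alpha = alpha
        AxpyHandler.primitives_built += 1  # stands in for primitive creation

    def __call__(self, x, y):
        if len(x) != self.n or len(y) != self.n:
            raise ValueError("row length does not match handler")
        for i in range(self.n):
            y[i] += self.alpha * x[i]

    # Mirror the "non-copyable/movable" bullet: refuse to be copied.
    def __copy__(self):
        raise TypeError("AxpyHandler is not copyable")

    def __deepcopy__(self, memo):
        raise TypeError("AxpyHandler is not copyable")


def sgd_sparse_update(param, rows, grad_rows, lr):
    """Dense param, sparse grad: only the listed rows are updated in place."""
    width = len(param[0])
    axpy = AxpyHandler(width, -lr)  # built once, reused for every row
    for row_index, grad in zip(rows, grad_rows):
        axpy(grad, param[row_index])
    return param
```

SGD maps onto AXPY with `alpha = -lr`, so one handler serves every touched row and the setup cost is paid a single time per update.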

Sep 19, 2021 · 2 commits

Optimization of pool2d grad (#35389) · 86685190

Committed by limingshu on Sep 19, 2021

* Optimization of pool2d grad, first commit.

* remove useless print codes

* refine codes

* refine codes

* seal more operation into template specialization

* fix template struct error in MaxPool2dGrad.

* Fix header including error

* refine code with comment

* Seal the param-preparation codes into function for common use.

* Seal the param-preparation codes into function for common use.

* Seal the param-preparation into function and make it common for other kernels

* polish code and erase useless template specialization

* Rerun trigger

* rerun trigger

86685190

add hard_sigmoid trt converter test cases (#35876) · 9f88d327
Committed by baoachun on Sep 19, 2021

9f88d327

Sep 18, 2021 · 15 commits

increase test_imperative_auto_mixed_precision timePROPERTIES TIMEOUT (#35863) · e7617512
Committed by zhangbo9674 on Sep 18, 2021

e7617512

[hybird] fix pipeline section program Parameter (#35847) · 67c63639
Committed by WangXi on Sep 18, 2021

67c63639

Basic PR on Cost Model (#35774) · 5ba9fe6e

Committed by Huihuang Zheng on Sep 18, 2021

Add a basic Cost Model; it uses the executor to run a program and profiles it to get op time.

This is an early basic version; we will add more functions in the future.

5ba9fe6e
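
The Cost Model commit above measures op time by running a program and profiling it. A minimal sketch of that idea, assuming a "program" is just an ordered list of named Python callables: `profile_program` and its parameters are hypothetical and are not part of Paddle's Cost Model API.

```python
import time
from collections import defaultdict


def profile_program(ops, inputs, repeat=3):
    """Run (name, fn) ops in order, `repeat` times.

    Returns (costs, output) where costs maps each op name to its mean
    wall-clock seconds per run and output is the result of the last run.
    """
    totals = defaultdict(float)
    value = inputs
    for _ in range(repeat):
        value = inputs
        for name, fn in ops:
            start = time.perf_counter()
            value = fn(value)
            totals[name] += time.perf_counter() - start
    costs = {name: total / repeat for name, total in totals.items()}
    return costs, value
```

Averaging over several runs smooths out timer noise; a real cost model would also need warm-up runs and device synchronization, which this sketch leaves out.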

fix bug of module 'paddle' has no attribute 'distributed' for python3.6 (#35848) · d4cd2590
Committed by Guoxia Wang on Sep 18, 2021
```
* fix bug
```
d4cd2590

FixEighOP; Unified MatrixEighFunctor function (#35812) · da441363
Committed by crystal on Sep 18, 2021

da441363

make whl-build task only reuse local third_party cache (#35858) · a1b6ae26
Committed by Sing_chan on Sep 18, 2021

a1b6ae26

trt engine dtor when the last predictor dtor. (#35842) · 8a239ae5
Committed by Wilber on Sep 18, 2021

8a239ae5

Correct the return type of elementwise kernel to avoid many compiling warnings. (#35839) · 71f051fb
Committed by Yiqun Liu on Sep 18, 2021

71f051fb

Change __init__.py to adapt the new FLAGS coding style and update CI to... · 74f38d63

Committed by Zeng Jinle on Sep 18, 2021

Change __init__.py to adapt the new FLAGS coding style and update CI to monitor FLAGS changing (#35849)

* change __init__.py to adapt new FLAGS

* test ci check, ready for revert

* split __init__.py and FLAGS approval

* Revert "test ci check, ready for revert"

This reverts commit bbbd2442fe3e948fef790ec634085a2431474326.

74f38d63

fix import paddle · edeb0ade
Committed by huangjun12 on Sep 17, 2021

edeb0ade

replace matmul to matmul_v2, expand to expand_v2 · 1776293a
Committed by huangjun12 on Sep 16, 2021

1776293a

fix flags dep (#35859) · 6d45d8da
Committed by Zeng Jinle on Sep 18, 2021

6d45d8da

Committed by Feiyu Chan on Sep 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with compile and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, because the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernel

* fix bugs in python APIs

* add cuda fft c2r grad kernel functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simple parameterize test function by auto generate test case from param list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation if n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsqueeze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5.  fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dyload_cuda when cuda is available (#50)

* fix compile error: only link dyload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: jeff41404 <jeff41404@gmail.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: KP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43
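
Several bullets in the FFT commit above concern one-sided r2c output and the step "fill fft r2c result with conjugate symmetry". For a real input of length n, only bins 0 to n//2 need to be computed; the rest follow from X[n-k] = conj(X[k]). The pure-Python sketch below uses a naive O(n^2) DFT to show that fill step. The function names are hypothetical and the code is unrelated to Paddle's pocketfft, MKL, or cuFFT kernels.

```python
import cmath


def dft_bin(x, k):
    """Naive DFT bin k of sequence x (reference only, O(n) per bin)."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))


def rfft_onesided(x):
    """Bins 0 .. n//2 of the DFT of a real sequence."""
    return [dft_bin(x, k) for k in range(len(x) // 2 + 1)]


def fill_conjugate_symmetry(half, n):
    """Rebuild all n bins from the one-sided half using X[n-k] = conj(X[k])."""
    full = list(half) + [0j] * (n - len(half))
    for k in range(len(half), n):
        full[k] = half[n - k].conjugate()
    return full
```

Computing only the one-sided half roughly halves the work for real inputs; the fill step is needed only when a caller asks for the full spectrum.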

split cuda_profiler into .h and .cc (#35821) · 01063218
Committed by Aurelius84 on Sep 18, 2021
```
* split cuda_profiler into .h and .cc

* fix cmake

* remove inline
```
01063218

trt support serialize and deserialize (#35828) · ba71421c
Committed by Wilber on Sep 18, 2021

ba71421c

PaddlePaddle / Paddle · synced successfully about 1 year ago