Commits · 0f3448380caafec400f04f6aab3b5144f9a90e82 · PaddlePaddle / Paddle

Sep 22, 2021 · 8 commits
- [Cherry-pick 2.2] Correct the return type of elementwise kernel to avoid many... · 0f344838
  Committed by Yiqun Liu on Sep 22, 2021
```
 [Cherry-pick 2.2] Correct the return type of elementwise kernel to avoid many compiling warnings. (#35839) (#35868)

Cherry-pick #35839
```
- add hard_sigmoid trt converter test cases (#35908) · 6cc8b167
  Committed by baoachun on Sep 22, 2021
- fix bug of module 'paddle' has no attribute 'distributed' for python3.6 (#35848) (#35874) · bba41e45
  Committed by Guoxia Wang on Sep 22, 2021
```
* fix bug
```
- [cherry-pick] [Inference] Support NNAdapter and ascend310 (#35882) · 2aaa417e
  Committed by Wilber on Sep 22, 2021
- [cherry-pick] fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862) (#35900) · c0535200
  Committed by zhangbo9674 on Sep 22, 2021
```
fix bug of module paddle has no attribute fluid for python3.6.
```
- [cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES... · 17879369
  Committed by zhangbo9674 on Sep 22, 2021
```
 [cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES TIMEOUT (#35863) (#35898)

Increase test_imperative_auto_mixed_precision PROPERTIES TIMEOUT from 120s to 300s.
```
- [cherry-pick2.2]support extern third_party lapack API on Linux/Windows/Mac (#35897) · fb8be035
  Committed by zhouweiwei2014 on Sep 22, 2021
```
As titled; cherry-pick of #35690
```
- [cherry-pick] trt engine dtor when the last predictor dtor (#35881) · f72d52e7
  Committed by Wilber on Sep 22, 2021
```
* cherry-pick 32842
```
Sep 18, 2021 · 9 commits

fix import paddle · edeb0ade
Committed by huangjun12 on Sep 17, 2021

replace matmul to matmul_v2, expand to expand_v2 · 1776293a
Committed by huangjun12 on Sep 16, 2021

fix flags dep (#35859) · 6d45d8da
Committed by Zeng Jinle on Sep 18, 2021

Committed by Feiyu Chan on Sep 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with compile and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, because the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernel

* fix bugs in python APIs

* add cuda fft c2r grad kernel functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simply parameterize test functions by auto-generating test cases from the param list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation if n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsqueeze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoiding thrust when cuda is not available.
5. fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dyload_cuda when cuda is available (#50)

* fix compile error: only link dyload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 does not support the ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: jeff41404 <jeff41404@gmail.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: KP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43
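
The r2c bullets in this commit ("add fft r2c onesided", "fill fft r2c result with conjugate symmetry") depend on a standard DFT property. For a real input of length n, X[n - k] = conj(X[k]), so a onesided transform only needs the first n // 2 + 1 bins, and the remaining bins can be filled in by conjugation. The pure-Python sketch below demonstrates that property with a naive O(n²) DFT. The helper names are hypothetical, and this is not Paddle's pocketfft, MKL or cuFFT code path.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) forward DFT with no normalization."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rfft_onesided(x: Sequence[float]) -> List[complex]:
    """Hypothetical helper: keep only the n // 2 + 1 non-redundant bins."""
    return dft(x)[: len(x) // 2 + 1]


def fill_conjugate_symmetric(half: Sequence[complex], n: int) -> List[complex]:
    """Rebuild the full n-point spectrum of a real signal from its onesided half."""
    full = list(half) + [0j] * (n - len(half))
    for k in range(len(half), n):
        # For real input X[k] = conj(X[n - k]); n - k always falls in the kept half.
        full[k] = full[n - k].conjugate()
    return full


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, -0.5]
    half = rfft_onesided(signal)
    rebuilt = fill_conjugate_symmetric(half, len(signal))
    print(len(half))  # 5 bins for n = 8
    # Maximum deviation from the full DFT; expected to be at rounding-error level.
    print(max(abs(a - b) for a, b in zip(rebuilt, dft(signal))))
```

Both even and odd lengths work, because for k >= n // 2 + 1 the mirrored index n - k is at most n // 2.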

split cuda_profiler into .h and .cc (#35821) · 01063218
Committed by Aurelius84 on Sep 18, 2021
```
* split cuda_profiler into .h and .cc

* fix cmake

* remove inline
```

trt support serialize and deserialize (#35828) · ba71421c
Committed by Wilber on Sep 18, 2021

Clean ParseMemInfo and Fix unittest failed under multi-thread (#35840) · 2fff5a58
Committed by Aurelius84 on Sep 18, 2021
```
* Clean ParseMemInfo and fix unittest with multi-thread

* fix declare
```

[oneDNN] Disable caching of Reorder operation (#35664) · e4c2a854
Committed by Jacek Czaja on Sep 18, 2021

* - Reorder: disable caching

* - compilation fix

* - another compilation fix

* - another compilation fix

* - compilation fix

* - Fix

* - yet another compilation fix

* - surprisingly another compilation fix

* - lint

* - fix after review

* - fix

Add new API "eigvals" in linalg (#35720) · d411a038
Committed by From00 on Sep 18, 2021

* Add linalg.eigvals API

* pre-commit check

* Adjust code style

* Fix conflict

* Improve code style

* Modify the test code to ignore testing CUDA kernel

* Sort output data before checking in test code

* Set timeout value for UT

* Improve API example code to pass CI

* Fix bug for None fetch_list in Windows

* Delete grad Op

Sep 17, 2021 · 23 commits

[AMP] Support pure fp16 training mode for dygraph (#35521) · adaeee4d
Committed by zhangbo9674 on Sep 17, 2021

* add pure fp16 major function in auto_cast & tracer

* support master weight in dygraph for pure fp16

* check mix dtype of fp16&fp32 for check_finite_and_unscale op

* change pure fp16 function name

* refine some bug in auto_cast

* refine auto_cast interface logic

* add param _casted_by_pure_fp16 for class Layer

* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator

* refine pure_fp16_decorator as decorator

* add unittest

* add comment

* add comment

* support recompute

* add comment for auto_cast and decorator

* support to_static_state_dict for paddle.jit.save

* remove the limit on the number of models and optimizers

* add lookup_table in black_list

* fix momentum and layer state_dict

* fix bug in layer state_dict

* fix bug in layer state_dict_helper

* refine unittest

* refine test_momentun_op

* refine interface and some code

* refine amp_decorator interface

* refine pure fp16 interface

* refine master weight interface
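
The "master weight" bullets refer to keeping an fp32 copy of each parameter while the model runs in pure fp16. The sketch below assumes the usual motivation for this. A small update added directly to an fp16 weight can be rounded away, whereas an fp32 master copy accumulates it. The sketch emulates the rounding with Python's struct half-precision ("e") and single-precision ("f") formats. The function names are hypothetical, and this is not Paddle's AMP implementation.

```python
import struct
from typing import Tuple


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def train(steps: int, lr_update: float) -> Tuple[float, float, float]:
    """Apply the same small update `steps` times, with and without a master weight.

    Returns (fp16-only weight, fp32 master weight, fp16 copy cast from the master).
    """
    w16 = to_fp16(1.0)     # parameter stored and updated only in fp16
    master = to_fp32(1.0)  # hypothetical fp32 master copy of the same parameter
    for _ in range(steps):
        # Pure fp16 update: both the step and the result are rounded to fp16.
        w16 = to_fp16(w16 + to_fp16(lr_update))
        # Master-weight update: accumulate in fp32.
        master = to_fp32(master + lr_update)
    # The fp16 copy used by the model is re-derived from the master weight
    # (done once at the end here for brevity).
    return w16, master, to_fp16(master)


if __name__ == "__main__":
    fp16_only, fp32_master, fp16_from_master = train(steps=10, lr_update=1e-4)
    print(fp16_only)         # 1.0: each 1e-4 step is below half an fp16 ulp at 1.0
    print(fp32_master)       # close to 1.001
    print(fp16_from_master)  # 1.0009765625, the fp16 value nearest 1.001
```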

temporarily disable the warnings (#35560) · 68ae6345
Committed by Leo Chen on Sep 17, 2021
```
* temporarily disable the warnings

* disable ut
```

change to PADDLE_DEFINE_EXPORTED (#35841) · d22914fd
Committed by Zeng Jinle on Sep 17, 2021

fix unittest (#35808) · fcfb0afe
Committed by Guoxia Wang on Sep 17, 2021

Disabled oneDNN reshape1/2 and squeeze1/2 kernels (#35781) · 0eaab803
Committed by jakpiase on Sep 17, 2021

* disabled matmul_v2 grad

* Revert "disabled matmul_v2 grad"

This reverts commit b569bcef162116ca9f7963f3975b4a412f9e8555.

* reverted disabling matmul_v2, disabled reshape and squeeze

Make flag adding easier (#35823) · 2c781455
Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

Add linalg pinv api (#35804) · 71e01d3f
Committed by andyjpaddle on Sep 17, 2021

* add pinv api, test=develop
* add linalg pinv api, test=develop
* update example code, test=develop
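
This commit adds a pseudo-inverse API. The general Moore-Penrose pseudo-inverse is usually computed from an SVD. The pure-Python sketch below covers only the special case of a matrix with linearly independent columns, where pinv(A) = (AᵀA)⁻¹Aᵀ. All names are hypothetical. This is not Paddle's implementation, and it does not handle rank-deficient input; it raises an error instead.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (square, non-singular input)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pinv_full_column_rank(a: Matrix) -> Matrix:
    """Moore-Penrose pseudo-inverse for a matrix with linearly independent columns."""
    at = transpose(a)
    return matmul(inverse(matmul(at, a)), at)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    p = pinv_full_column_rank(a)
    print(matmul(p, a))  # approximately the 2x2 identity
```

The normal-equations form is compact, but it squares the condition number of A. That is one reason SVD-based routines are preferred in practice.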

broadcast qkv_op (#35780) · cf9eae4c
Committed by feng_shuai on Sep 17, 2021
```
* broadcast qkv_op

* use PADDLE_ENFORCE_GT to replace assert
```

add a fusion op: fused_layernorm_residual_dropout_bias (#35151) · 7975dfcf
Committed by zhangkaihuo on Sep 17, 2021

Fused elementwise_add, dropout, elementwise_add and layer_norm into one operator; only the forward pass is supported.
No Python API changed.

Support EMA in Paddle2.x and Fleet (#35673) · fb4d5689
Committed by Haohongxiang on Sep 17, 2021

* Support EMA in Paddle2.x and Fleet

* update

* update

* update

* modify ut of ema

* modify docs

* fix bugs

* update

* update

* update

* modify ut
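
The commit message does not spell out the EMA update rule. The sketch below assumes the standard exponential moving average of parameters, shadow ← decay · shadow + (1 − decay) · param, with the average starting at the first observed value. ParamEMA is a hypothetical class. Paddle's own EMA may add options, such as bias correction or step-dependent decay, that are not modeled here.

```python
from typing import Dict


class ParamEMA:
    """Hypothetical exponential moving average of named scalar parameters.

    shadow <- decay * shadow + (1 - decay) * value
    """

    def __init__(self, decay: float = 0.999) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.shadow: Dict[str, float] = {}

    def update(self, params: Dict[str, float]) -> None:
        for name, value in params.items():
            if name not in self.shadow:
                # Start the average at the first observed value.
                self.shadow[name] = value
            else:
                self.shadow[name] = (
                    self.decay * self.shadow[name] + (1.0 - self.decay) * value
                )

    def averaged(self) -> Dict[str, float]:
        """Return a copy of the averaged parameters, e.g. for evaluation."""
        return dict(self.shadow)


if __name__ == "__main__":
    ema = ParamEMA(decay=0.9)
    for w in [1.0, 2.0, 2.0]:
        ema.update({"w": w})
    # About 1.19 = 0.9 * (0.9 * 1.0 + 0.1 * 2.0) + 0.1 * 2.0
    print(ema.averaged()["w"])
```

The averaged weights are typically swapped in for evaluation only, while training keeps updating the raw parameters.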

test=document_fix (#35824) · 177bf52f
Committed by Guoxia Wang on Sep 17, 2021

add inplace op support to prune, scale_op is no longer need in jit.save (#35730) · 21921936
Committed by Haipeng Wang on Sep 17, 2021

* adding scale_op in the model save step is not necessary; just fix the prune method to support static graph and inplace ops

* fix jit.save, no need to add scale_op to each outputvar anymore.
fix prune_with_input, now it supports inplace op

* temporarily disable test_trt_dynamic_shape.TRTDynamicShapeOutOfBound2Test

Integrate MultiThreadedWorkQueue to execute program ops (#35356) · a0871194
Committed by Aurelius84 on Sep 17, 2021

* format code

* format interface

* polish interface

* Remove std::memory_order

* modify into SpinLock

* remove fetch_context_pool_

* fix comment

* modify into WorkQueueGroup

* refine code

* fix pointer

* fix paddle_enforce

* split into AsyncWorkQueue

* polish code

* specify std::memory_relax

* fix atomic fetch_sub

* fix num_thread

[inference]add hard_swish dynamic plugin (#35214) · c59c8e4f
Committed by 津 on Sep 17, 2021

remove cuda sync in ext_tensor copy_to (#35802) · d43f797a
Committed by Chen Weihang on Sep 17, 2021

Fix segment api document. (#35818) · 6d5fc220
Committed by Zhong Hui on Sep 17, 2021

Add skip teller (#35807) · 0f74e5e7
Committed by xiaoxiaohehe001 on Sep 17, 2021

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skiplayernorm_teller

* add_skip_layernorm

* add_skip_layernorm_teller

* add_skip_layernorm_teller

* add_skip_layernorm

* add_skip_teller

expose cuda stream to users (#35813) · 40cfa512
Committed by Leo Chen on Sep 17, 2021
```
* expose cuda stream to users

* add ut
```

[inference]add reduce converter test (#35145) · 05275010
Committed by 津 on Sep 17, 2021
```
* add test

* add test

* add test
```

leaky_relu test (#35318) · 867f4fa0
Committed by 津 on Sep 17, 2021
```
* add test

* add test

* add test

* add test

* add test
```

polish code. (#35783) · 61010bb8
Committed by WeiXin on Sep 17, 2021

fix unpool doc, test=document_fix (#35806) · 652e655f
Committed by xiaoting on Sep 17, 2021
```
* fix unpool doc, test=document_fix

* fix typo for python example, test=document_fix
```

Add New CI -- GPUBOX (#35755) · 00865930
Committed by Fan Zhang on Sep 17, 2021
```
* Add New CI - GPUBOX
```