Commits · b5404f0934120db9464a12974e1eb3889a3dc99e · BaiXuePrincess / Paddle

20 Oct, 2021 2 commits
- W
  
  [cherry-pick] Inference add type check in copy_from_cpu (#36552) · b5404f09
  Committed by Wilber on Oct 20, 2021
  
  b5404f09
- X
  catch the generatorfunction and intercept it. (#35369) (#36536) · 023eb3f9
  Committed by xiongkun on Oct 20, 2021
```
* catch the generatorfunction and intercept it.

* add test generator

* add test case

* refine the testcase
```
  023eb3f9
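
For commit 023eb3f9 above, a minimal sketch of what "catching" a generator function can look like in plain Python. The decorator name `to_static_sketch` and the `is_converted` flag are hypothetical and are not Paddle's dynamic-to-static implementation; only `inspect.isgeneratorfunction` is a real standard-library call.

```python
import inspect


def to_static_sketch(fn):
    """Hypothetical stand-in for a dynamic-to-static conversion decorator."""
    if inspect.isgeneratorfunction(fn):
        # Intercept generator functions: return them untouched instead of converting.
        return fn

    def converted(*args, **kwargs):
        # Placeholder for the converted callable; here it only forwards the call.
        return fn(*args, **kwargs)

    converted.is_converted = True
    return converted


def count_up(n):
    for i in range(n):
        yield i


def add(a, b):
    return a + b


print(to_static_sketch(count_up) is count_up)   # True: left as-is
print(to_static_sketch(add)(1, 2))              # 3: wrapped, same result
```

Detecting the generator up front avoids trying to transform a function whose body is suspended and resumed across `yield` points.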
19 Oct, 2021 2 commits

[cherry-pick]Add sparse attention cherrypick (#36447) · 36edb0e1

Committed by Liu-xiandong on Oct 19, 2021

The code in this PR supports only CUDA 11.2. CI currently has no GPU with CUDA 11.2, so all tests are skipped automatically.

The new OP is paddle._C_ops.sparse_attention. Regarding the work of the python API, it will be resolved in a follow-up PR.

This PR lacks tests for dynamic and static graphs; they will be added in subsequent PRs.

36edb0e1

quant support matmul_v2 (#36469) (#36499) · b8167ed2
Committed by ceci3 on Oct 19, 2021
```
* quant support matmul_v2

* fix format
```
b8167ed2

18 Oct, 2021 1 commit

[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using... · 2b9d1922

Committed by 0x45f on Oct 18, 2021

[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using save/load (#36434) (#36463)

Fixes GPU memory growing continuously in train mode inside a no_grad context after a model is loaded through the jit.save/load interface.

2b9d1922

15 Oct, 2021 2 commits

[cherry-pick]Verify the correctness of graph rewrited by GeneratePass (#36453) · cc449652

Committed by wuhuanzhou on Oct 15, 2021

* [WIP]Verify the correctness of graph rewrited by GeneratePass, test=develop

* add delete subgraph and unittest, test=develop

* check simple pass, test=develop

* fix coverage, test=develop

* limit with input_spec via Paddle API, test=develop

cc449652

[cherry-pick] add sparse_embedding doc (#36312) · fc429fea
Committed by Yanxing Shi on Oct 15, 2021
```
* add sparse_embedding doc

* modify sample code

* fix sample code error
```
fc429fea

13 Oct, 2021 2 commits
- 0
  delete remove_static_file() function in error.py (#36153) (#36375) · a5767bb6
  Committed by 0x45f on Oct 13, 2021
```
* change time to remove static tempfile

* delete remove_static_file() function
```
  a5767bb6
- J
  
  fix for matmul_v2 6D x 2D (#36379) · ce6a27d9
  Committed by jakpiase on Oct 13, 2021
  
  ce6a27d9
12 Oct, 2021 1 commit
- A
  Fix stop_gradient in RunProgramOp (#36339) (#36353) · a6868c91
  Committed by Aurelius84 on Oct 12, 2021
```
* Fix stop_gradient in RunProgramOp

* fix reference
```
  a6868c91
11 Oct, 2021 1 commit

[cherry-pick]fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (#36294) · 45de9312

Committed by wuhuanzhou on Oct 11, 2021

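For arguments that do not meet the conditions of the overloaded __getattr__, always raise AttributeError so that the behavior matches the non-overloaded version.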

(cherry picked from PR #36229)

45de9312
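
Commit 45de9312 above concerns `hasattr` on an object with an overloaded `__getattr__`. A minimal sketch of the underlying Python rule follows: `hasattr` only returns `False` when the lookup raises `AttributeError`. The class `OpNamespace` and its `_known_ops` set are hypothetical and are not Paddle's `PassDesc.OP`.

```python
class OpNamespace:
    """Hypothetical class with an overloaded __getattr__ (not Paddle's PassDesc.OP)."""

    _known_ops = {"matmul_v2", "elementwise_add"}

    def __getattr__(self, name):
        # __getattr__ is only called after normal attribute lookup has failed.
        if name in self._known_ops:
            return f"<op {name}>"
        # Raising AttributeError keeps hasattr() and getattr(obj, name, default) working.
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


ops = OpNamespace()
print(hasattr(ops, "matmul_v2"))   # True
print(hasattr(ops, "__name__"))    # False
```

If `__getattr__` raised a different exception type for unknown names, `hasattr` would propagate that exception instead of returning `False`.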

30 Sep, 2021 3 commits
- Z
  add optest for adamw (#36148) (#36239) · 70e67843
  Committed by zhaoyingli on Sep 30, 2021
```
* update func name

* skip cpu

* update unittest

* update unittest
```
  70e67843
- 李
  Fix raw optim (#36176) (#36231) · 28d12007
  Committed by 李季 on Sep 30, 2021
```
* fix raw optim

* pre-commit test file
Co-authored-by: sneaxiy <sneaxiy@126.com>
```
  28d12007
- 李
  
  fix the undefined variable bug in dist_transformer file (#36211) (#36233) · 789012c0
  Committed by 李季 on Sep 30, 2021
  
  789012c0
29 Sep, 2021 2 commits
- L
  add API paddle.linalg.eig (#35674) (#36188) · 4e2daa9a
  Committed by Lijunhui on Sep 29, 2021
```
Adds the eig operator to PaddlePaddle's linear algebra library; the operator computes the eigendecomposition of a general square matrix.
Cherry-picked from #35674.
```
  4e2daa9a
- H
  Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability. (#36172) · 96fd98bc
  Committed by hlygit66666 on Sep 29, 2021
```
* add get_device_name and get_device_capability

* fix docs

* fix docs

* fix decs
```
  96fd98bc
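
For the `paddle.linalg.eig` commit (4e2daa9a) in the list above: a general square matrix, unlike a symmetric one, can have complex eigenvalues even when all its entries are real. The sketch below shows this for the 2x2 case only, using the characteristic polynomial. The helper `eig_2x2` is hypothetical, computes eigenvalues but not eigenvectors, and is not how Paddle implements the operator.

```python
import cmath


def eig_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from lambda^2 - trace*lambda + det = 0."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)  # complex sqrt handles negative values
    return (trace + disc) / 2, (trace - disc) / 2


# A 90-degree rotation: real, non-symmetric, with purely imaginary eigenvalues.
lam1, lam2 = eig_2x2(0.0, -1.0, 1.0, 0.0)
print(lam1, lam2)  # 1j and -1j
```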
27 Sep, 2021 6 commits
- Y
  Add paddle.device.cuda.get_device_properties (#35875) · cea0bc26
  Committed by Yanxing Shi on Sep 27, 2021
```
* Initial Commit

* fix py2 error

* fix wrong words and doc

* test=document_fix

* fix _gpuDeviceProperties
```
  cea0bc26
- J
  cherry-pick #36021 fix unique/unstack zero tensor (#36163) · 749bc240
  Committed by Jiawei Wang on Sep 27, 2021
```
* fix unique unstack dim 0

* fix unique_op format
```
  749bc240
- Z
  remove linalg api in paddle.__init__ (#36112) · a57f0810
  Committed by zhiboniu on Sep 27, 2021
```
remove recent linalg api in paddle.init;
add args 'name' in some new linalg api interface
```
  a57f0810
- H
  
  allow user to export parameters defined in model (#36132) · 5f168af7
  Committed by Haipeng Wang on Sep 27, 2021
  
  5f168af7
- L
  
  Correct the misspelled part of the unit test (#36101) · 4bcff7b2
  Committed by LJQ❤️ on Sep 27, 2021
  
  4bcff7b2
- J
  [Cherry-pick] Add new func/class API psroi_pool and UT (#36111) · 81557da6
  Committed by JYChen on Sep 27, 2021
```
cherry-pick from #35352

Add new detection api paddle.vision.ops.psroi_pool and paddle.vision.ops.PSRoIPool
```
  81557da6
26 Sep, 2021 6 commits
- W
  
  concat api support empty tensor. (#35845) (#36096) · bc13ab9e
  Committed by wuhuachaocoding on Sep 26, 2021
  
  bc13ab9e
- W
  
  Fixed errors in the sample code (#36041) (#36089) · 14cdcde7
  Committed by wangzhuang01 on Sep 26, 2021
  
  14cdcde7
- H
  [cherry-pick] Add Det and Slogdet API to Release 2.2 (#36083) · ba2a1bb4
  Committed by Huihuang Zheng on Sep 26, 2021
```
This PR added det and slogdet API to release/2.2
It is cherry-pick from #34992 and #36013
```
  ba2a1bb4
- W
  [Cherry-Pick]Add paddle.linalg.solve OP (#35715) (#36056) · 6b4f2fbf
  Committed by Weilong Wu on Sep 26, 2021
```
This PR adds linalg.solve to Paddle's linear algebra module. Call paddle.linalg.solve to use it.
```
  6b4f2fbf
- R
  [NPU] add randperm_op_npu (#35763) (#36026) · df81915a
  Committed by ronnywang on Sep 26, 2021
```
* add randperm_op_npu

* fix test_set_value_op_npu
```
  df81915a
- Z
  [cherry pick]split minimize and add unscale_ for GradScaler (#35927) · e262125d
  Committed by zhangbo9674 on Sep 26, 2021
```
1. Split function GradScaler::minimize() into GradScaler::step() + GradScaler::update()
2. Add GradScaler::unscale_(optimizer)
```
  e262125d
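
Commit e262125d above splits `minimize()` into `step()` and `update()` and adds `unscale_()`. The toy class below sketches why that split is useful in dynamic loss scaling: gradients can be unscaled (for example, for clipping) before the parameter update, and the scale adjustment becomes a separate call. `ToyGradScaler`, its arguments, and its list-based "parameters" are all hypothetical; this is not Paddle's `GradScaler` and does not mirror its signatures.

```python
import math


class ToyGradScaler:
    """Toy dynamic loss scaler on plain Python lists (hypothetical, not Paddle's)."""

    def __init__(self, init_scale=1024.0, growth=2.0, backoff=0.5, growth_interval=2):
        self.scale_value = init_scale
        self.growth = growth
        self.backoff = backoff
        self.growth_interval = growth_interval
        self._good_steps = 0
        self._found_inf = False
        self._unscaled = False

    def scale(self, loss):
        # Multiply the loss so small gradients do not underflow in low precision.
        return loss * self.scale_value

    def unscale_(self, grads):
        # Divide gradients in place by the scale and record any inf/nan.
        for i, g in enumerate(grads):
            grads[i] = g / self.scale_value
        self._found_inf = any(not math.isfinite(g) for g in grads)
        self._unscaled = True

    def step(self, params, grads, lr):
        # Unscale if the caller has not done so, then skip the update on overflow.
        if not self._unscaled:
            self.unscale_(grads)
        if not self._found_inf:
            for i, g in enumerate(grads):
                params[i] -= lr * g

    def update(self):
        # Adjust the scale for the next iteration and reset per-step state.
        if self._found_inf:
            self.scale_value *= self.backoff
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale_value *= self.growth
                self._good_steps = 0
        self._found_inf = False
        self._unscaled = False


scaler = ToyGradScaler(init_scale=8.0)
params = [1.0]
grads = [scaler.scale(0.5)]          # pretend backward produced a scaled gradient
scaler.unscale_(grads)               # gradients are now true values, e.g. for clipping
scaler.step(params, grads, lr=0.1)   # applies the update unless inf/nan was found
scaler.update()                      # adjusts the scale for the next iteration
```

With a single combined `minimize()` there is no point at which the caller can see unscaled gradients before they are applied, which is the gap the separate `unscale_()` call fills in this sketch.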
24 Sep, 2021 2 commits
- H
  Basic PR on Cost Model (#35774) (#35915) · efcd108d
  Committed by Huihuang Zheng on Sep 24, 2021
```
Add a basic Cost Model. It uses the executor to run a program and profiles it to get op time.

This is an early basic version, we will add more functions in the future.
```
  efcd108d
- J
  
  add pool2d convert test (#35925) · 063fca8e
  Committed by JingZhuangzhuang on Sep 23, 2021
  
  063fca8e
23 Sep, 2021 4 commits
- C
  [cherry-pick] FixEighOP; Unified MatrixEighFunctor function (#35812) (#35919) · 4629401e
  Committed by crystal on Sep 23, 2021
```
cherry-pick #35812, fix Eigh OP
```
  4629401e
- W
  
  add dilation check for conv (#35894) · 91f25ee3
  Committed by wangguanzhong on Sep 23, 2021
  
  91f25ee3
- T
  op:transpose_op supports bool type (#35886) (#35926) · 95c100c1
  Committed by TeslaZhao on Sep 23, 2021
```
* Pass compat of conv_transpose_bias_mkldnn_fuse_pass

* Fix a bug of strided_slice op, about the axes parameter access memory out of bounds

* Fix a bug of transpose op, about accessing memory out of bounds of the perm param

* op:transpose_op supports bool type
```
  95c100c1
- L
  Add quant2 int8 lstm model test (#35887) (#35912) · e8e77ebe
  Committed by lidanqing on Sep 23, 2021
```
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>
```
  e8e77ebe
22 Sep, 2021 3 commits
- B
  
  add hard_sigmoid trt converter test cases (#35908) · 6cc8b167
  Committed by baoachun on Sep 22, 2021
  
  6cc8b167
- Z
  [cherry-pick] fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862) (#35900) · c0535200
  Committed by zhangbo9674 on Sep 22, 2021
```
fix bug of module paddle has no attribute fluid for python3.6.
```
  c0535200
- Z
  [cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES... · 17879369
  Committed by zhangbo9674 on Sep 22, 2021
```
 [cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES TIMEOUT (#35863) (#35898)

Increase test_imperative_auto_mixed_precision PROPERTIES TIMEOUT from 120s to 300s.
```
  17879369
18 Sep, 2021 3 commits

fix import paddle · edeb0ade
Committed by huangjun12 on Sep 17, 2021

edeb0ade

replace matmul to matmul_v2, expand to expand_v2 · 1776293a
Committed by huangjun12 on Sep 16, 2021

1776293a

Committed by Feiyu Chan on Sep 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with compile and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, because the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernel

* fix bugs in python APIs

* add cuda fft c2r grad kernel functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simplify parameterized test functions by auto-generating test cases from a param list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation if n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsqueeze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5.  fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dynload_cuda when cuda is available (#50)

* fix compile error: only link dynload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: jeff41404 <jeff41404@gmail.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: KP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43
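
Commit 11518a43 above adds `frame` and `overlap_add` ops as building blocks for `stft`/`istft`. A minimal sketch of what these two operations do on a 1-D sequence follows, assuming frames are stored as a list of equal-length lists. The function signatures and this layout are assumptions for illustration and do not reflect the Paddle ops' actual arguments or axis conventions.

```python
def frame(signal, frame_length, hop_length):
    """Slice a 1-D sequence into overlapping frames of equal length."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    return [
        signal[i * hop_length : i * hop_length + frame_length]
        for i in range(n_frames)
    ]


def overlap_add(frames, hop_length):
    """Sum frames back into a 1-D sequence, adding samples where frames overlap."""
    frame_length = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop_length + frame_length)
    for i, fr in enumerate(frames):
        for j, value in enumerate(fr):
            out[i * hop_length + j] += value
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
frames = frame(x, frame_length=4, hop_length=2)
print(frames)                              # [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(overlap_add(frames, hop_length=2))   # [1.0, 2.0, 6.0, 8.0, 5.0, 6.0]
```

Samples covered by two frames are counted twice after `overlap_add`, so the two functions are not exact inverses on their own; an inverse STFT typically compensates by normalizing with the summed window envelope.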

BaiXuePrincess / Paddle is in sync with the fork's source project
