Commits · fa7aa6b89b88a919f45f0e6022af1e74846ce0bf · 机器未来 / Paddle

29 Oct, 2021 1 commit
- 1. fix ifftshift (missing negative sign before shifts); (#36835) · fa7aa6b8
  Committed by Feiyu Chan on Oct 29, 2021
```
2. add complex data type support for paddle.shape at graph assembly.
```
  fa7aa6b8
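
The sign issue named in this commit title can be illustrated with a minimal NumPy sketch. This is not Paddle's implementation; the helper names `fftshift_1d` and `ifftshift_1d` are hypothetical and only show why the inverse shift needs a negative sign: for odd lengths, rolling by `n // 2` twice does not restore the input.

```python
import numpy as np

def fftshift_1d(x):
    # Move the zero-frequency bin to the centre: roll right by n // 2.
    return np.roll(x, x.shape[0] // 2)

def ifftshift_1d(x):
    # Undo fftshift: roll by the negative of the same shift.
    return np.roll(x, -(x.shape[0] // 2))

x = np.arange(5)                  # odd length exposes the sign error
shifted = fftshift_1d(x)
restored = ifftshift_1d(shifted)  # negative shift recovers x
wrong = fftshift_1d(shifted)      # positive shift again does not recover x
```

For even lengths the two shifts coincide, which is why a missing sign can go unnoticed in tests that only use even-sized inputs.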
27 Sep, 2021 1 commit
- cherry-pick #36021 fix unique/unstack zero tensor (#36163) · 749bc240
  Committed by Jiawei Wang on Sep 27, 2021
```
* fix unique unstack dim 0

* fix unique_op format
```
  749bc240
26 Sep, 2021 1 commit
- Fixed errors in the sample code (#36041) (#36089) · 14cdcde7
  Committed by wangzhuang01 on Sep 26, 2021
  
  14cdcde7
23 Sep, 2021 1 commit

op:transpose_op supports bool type (#35886) (#35926) · 95c100c1

Committed by TeslaZhao on Sep 23, 2021

* Pass compat of conv_transpose_bias_mkldnn_fuse_pass

* Fix a bug of strided_slice op, about the axes parameter access memory out of bounds

* Fix a bug of transpose op, about accessing memory out of bounds of the perm param

* op:transpose_op supports bool type

95c100c1

18 Sep, 2021 1 commit

Committed by Feiyu Chan on Sep 18, 2021

* 1. add interface for fft;
2. add data type predicate;
3. fix paddle.roll.

* add fft c2c cufft kernel

* implement argument checking & op calling parts for fft_c2c and fftn_c2c

* add operator and opmaker definitions

* only register float and double for cpu.

* add common code for implementing FFT, add pocketfft as a dependency

* add fft c2c cufft kernel function

* fix bugs in python interface

* add support for c2r, r2c operators, op makers, kernels and kernel functors.

* test and fix bugs

* 1. fft_c2c function: add support for onesided=False;
2. add complex<float>, complex<double> support for concat and flip.

* 1. fft: fix python api bugs;
2. shape_op: add support for complex data types.

* fft c2c cufft kernel done with compile and link

* fix shape_op, add mkl placeholder

* remove mkl

* complete fft c2c in gpu

* 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.

* complete fft c2c on gpu in ND

* complete fft c2c on gpu in ND

* complete fft c2c backward in ND

* fix MKL-based implementation

* Add frame op and CPU/GPU kernels.

* Add frame op forward unittest.

* Add frame op forward unittest.

* Remove axis parameter in FrameFunctor.

* Add frame op grad CPU/GPU kernels and unittest.

* Add frame op grad CPU/GPU kernels and unittest.

* Update doc string.

* Update after review and remove librosa requirement in unittest.

* Update grad kernel.

* add fft_c2r op

* Remove data allocation in TransCompute function.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* last fft c2r functor

* fix C2R and R2C for cufft, because the direction is not an option in these cases.

* add fft r2c onesided with cpu(pocketfft/mkl) and gpu

* fix bugs in python APIs

* fix fft_c2r grad kernel

* fix bugs in python APIs

* add cuda fft c2r grad kernel functor

* clean code

* fix fft_c2r python API

* fill fft r2c result with conjugate symmetry (#19)

fill fft r2c result with conjugate symmetry

* add placeholder for unittests (#24)

* simply parameterize test functions by auto-generating test cases from a param list (#25)

* miscellaneous fixes for python APIs (#26)

* add placeholder for unittests

* resize fft inputs before computation if n or s is provided.

* add complex kernels for pad and pad_grad

* simplify argument checking.

* add type promotion

* add int to float or complex promotion

* fix output data type for static mode

* fix fft's input dtype dispatch, import fft to paddle

* fix typos in axes checking (#27)

* fix typos in axes checking

* fix argument checking (#28)

* fix argument checking

* Add C2R Python layer normal and abnormal use cases (#29)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)

* Documentation of the common interfaces of c2r and c2c (#31)

* Documentation of the common interfaces of c2r and c2c

* clean c++ code  (#32)

* clean code

* Add numpy-based implementation of spectral ops (#33)

* add numpy reference implementation of spectral ops

* Add fft_c2r numpy based implementation for unittest. (#34)

* add fft_c2r numpy implementation

* Add deframe op and stft/istft api. (#23)

* Add frame api

* Add deframe op and kernels.

* Add stft and istft apis.

* Add deframe api. Update stft and istft apis.

* Fix bug in frame_from_librosa function when input dims >= 3

* Rename deframe to overlap_add.

* Update istft.

* Update after code review.

* Add overlap_add op and stft/istft api unittest (#35)

* Add overlap_add op unittest.

* Register complex kernels of squeeze/unsqueeze op.

* Add stft/istft api unittest.

* Add unittest for fft helper functions (#36)

* add unittests for fft helper functions. add complex kernel for roll op.

* complete static graph unittest for all public api (#37)

* Unittest of op with FFT C2C, C2R and r2c added (#38)

* documents and single case

* test c2r case

* New C2R Python layer normal and exception use cases

* Documentation of the common interfaces of c2r and c2c

* Unittest of op with FFT C2C, C2R and r2c added
Co-authored-by: lijiaqi <lijiaqi0612@163.com>

* add fft related options to CMakeLists.txt

* fix typos and clean code (#39)

* fix invisible character in mkl branch and fix error in error message

* clean code: remove docstring from unittest for signal.py.

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)

* always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.

* fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)

1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
3. fix unittest to catch UnImplementedError and RuntimeError;
4. fix compile error by avoid using thrust when cuda is not available.
5.  fix sample code, use paddle.fft instead of paddle.tensor.fft

* remove inclusion of thrust, add __all__ list for fft (#42)

* Add api doc and update unittest. (#43)

* Add doc strings.
* Update overlap_add op unittest

* fix MKL-based FFT implementation (#44)

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* remove code for debug (#45)

* use dynload for cufft (#46)

* use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.

* add complex support for fill_zeros_like

* use dynload for cufft

* Update doc and unittest. (#47)

* Add doc of frame op and overlap_add op.

* Update unittest.

* use dynload for cufft (#48)

1. use dynload for cufft
2. fix unittest;
3. temporarily disable Rocm.

* fix conflicts and merge upstream (#49)

fix conflicts and merge upstream

* fix compile error: only link dyload_cuda when cuda is available (#50)

* fix compile error: only link dyload_cuda when cuda is available

* fix dynload for cufft on windows (#51)

1. fix dynload for cufft on windows;
2. fix unittests.

* add NOMINMAX to compile on windows (#52)

 add NOMINMAX to compile on windows

* explicitly specify capture mode for lambdas (#55)

 explicitly specify capture mode for lambdas

* fix fft sample (#53)

* fix fft sample

* update scipy and numpy version for unittests of fft (#56)

update scipy and numpy version for unittests of fft

* Add static graph unittests of frame and overlap_add api. (#57)

* Remove cache of cuFFT & Disable ONEMKL (#59)

1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
2. remove cache of cufft plans;
3. enhance error checking.
4. default WITH_ONEMKL to OFF
Co-authored-by: jeff41404 <jeff41404@gmail.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
Co-authored-by: KP <109694228@qq.com>
Co-authored-by: lijiaqi <lijiaqi0612@163.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

11518a43
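
One item in the commit above, "fill fft r2c result with conjugate symmetry", refers to a standard property of real-input FFTs: the bins beyond `n // 2` are the complex conjugates of the mirrored lower bins. The NumPy sketch below shows that idea only; it is not the Paddle kernel, and `fill_conj_symmetric` is a hypothetical helper name.

```python
import numpy as np

def fill_conj_symmetric(half, n):
    # half: one-sided spectrum of length n // 2 + 1 from a real-to-complex FFT.
    full = np.empty(n, dtype=half.dtype)
    full[: half.shape[0]] = half
    # The remaining bins satisfy X[k] = conj(X[n - k]).
    for k in range(half.shape[0], n):
        full[k] = np.conj(half[n - k])
    return full

signal = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0])
full = fill_conj_symmetric(np.fft.rfft(signal), signal.shape[0])

even_signal = np.array([0.5, -1.5, 2.0, 4.0, -3.0, 1.0])
even_full = fill_conj_symmetric(np.fft.rfft(even_signal), even_signal.shape[0])
```

The reconstructed spectrum should match a full complex FFT of the same real signal, for both odd and even lengths.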

17 Sep, 2021 1 commit
- fix unittest (#35808) · fcfb0afe
  Committed by Guoxia Wang on Sep 17, 2021
  
  fcfb0afe
16 Sep, 2021 1 commit
- support l2_normalize float16 (#35776) · b666fd3c
  Committed by Guoxia Wang on Sep 16, 2021
```
* support fp16 dtype
```
  b666fd3c
15 Sep, 2021 2 commits
- upgrade dice_loss (#35734) · 46ec5b3e
  Committed by shangliang Xu on Sep 15, 2021
  
  46ec5b3e
- [NPU] fix depthwise_conv2d_grad, test=develop (#35626) · d3e06a51
  Committed by Qi Li on Sep 15, 2021
```
* [NPU] fix depthwise_conv2d_grad, test=develop

* remove debug files, test=develop
```
  d3e06a51
13 Sep, 2021 2 commits
- fix instance norm index error (#35341) · e641c638
  Committed by ceci3 on Sep 13, 2021
```
* fix instance norm index error

* add unittest

* update

* fix
```
  e641c638
- catch dimensions error when input is empty in static.nn.group_norm (#35613) · 7b743ba2
  Committed by JYChen on Sep 13, 2021
  
  7b743ba2
11 Sep, 2021 1 commit
- register the with_quant_attr attribute for all operators. test=develop (#35591) · 8412d6c0
  Committed by 王明冬 on Sep 11, 2021
  
  8412d6c0
10 Sep, 2021 2 commits

fix prelu float16 bug (#35584) · 246a9b6a
Committed by Guoxia Wang on Sep 10, 2021

246a9b6a

Fix warning (#34875) · 966f042d

Committed by sunzhongkai588 on Sep 10, 2021

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

966f042d

07 Sep, 2021 3 commits

Fix scatter_nd_add doc (#35542) · 1635c02b
Committed by Zeng Jinle on Sep 07, 2021
```
* fix scatter_nd_add doc, test=document_fix

* update
test=document_fix
```
1635c02b
add conv op check for illegal input or attributes (#35337) · 8307b0cb
Committed by wangxinxin08 on Sep 07, 2021
```
* add conv op check for illegal input or attributes
```
8307b0cb

add AsExtra in data_norm op (#35420) · 7907e241

Committed by XiangGao on Sep 07, 2021

* add AsExtra in data_norm op

* pass data_layout from python to data_norm op

* fix data_layout in data_norm op
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>

7907e241

01 Sep, 2021 1 commit
- fix bug: When axes in paddle.slice is a tuple, an error occurs. (#35267) · b53887fd
  Committed by WeiXin on Sep 01, 2021
```
* fix bug: When axes in paddle.slice is a tuple, an error occurs.

* polish code.
```
  b53887fd
27 Aug, 2021 1 commit
- Polish the error message of paddle.slice. (#35179) · 669853f5
  Committed by WeiXin on Aug 27, 2021
```
* polish the error message of paddle.slice.

* polish code.
```
  669853f5
20 Aug, 2021 1 commit
- [bug fix] fix spectral_norm bug (#35005) · 1aa2bde0
  Committed by shangliang Xu on Aug 20, 2021
  
  1aa2bde0
19 Aug, 2021 1 commit
- fix reshape when is a number (#35016) · 866c1ea6
  Committed by parap1uie-s on Aug 19, 2021
  
  866c1ea6
16 Aug, 2021 2 commits
- Fix typos in English docs for diag and diagflat. (#34869) · 35ef4180
  Committed by Li Min on Aug 16, 2021
```
* Fix typos in english docs for diag and diagflat.
```
  35ef4180
- [dev] fix dice_loss bug (#34757) · ad6c3b92
  Committed by shangliang Xu on Aug 16, 2021
```
* fix dice_loss bug
```
  ad6c3b92
09 Aug, 2021 1 commit
- limit chunk.axis (#34630) · 3380778f
  Committed by wangzhen38 on Aug 09, 2021
  
  3380778f
04 Aug, 2021 1 commit

paddle/nn/functional docs' bug fix (#34580) · 420570c9

Committed by sunzhongkai588 on Aug 04, 2021

* fix paddle.optimizer test=document_fix

* fix paddle.optimizer test=document_fix

* fix bugs in paddle.nn.functional document test=document_fix

* fix bugs in paddle.nn.functional document test=document_fix

* fix bugs in paddle.nn.functional document test=document_fix

* fix bugs in paddle.nn.functional document test=document_fix

420570c9

23 Jul, 2021 1 commit

Logical Ops support more data types (#34141) · 27417f1f

Committed by will-jl944 on Jul 23, 2021

* logical ops support int8, int16, int32, int64, float, double

* update docs of logical ops

* fix npu and xpu logical ops

* fix npu and xpu logical ops

* fix bug in xpu logical op code

* update test_logical_op_npu and test_logical_op_xpu

* correct error type

27417f1f

22 Jul, 2021 1 commit
- fix index error in conv2d_transpose (#34270) · 24c7087f
  Committed by wangguanzhong on Jul 22, 2021
  
  24c7087f
20 Jul, 2021 1 commit
- fix crop_tensor op doc (#34263) · c8fb6fc4
  Committed by Thomas Young on Jul 20, 2021
```
* fix crop_tensor op doc

* update code example test=document_fix
```
  c8fb6fc4
15 Jul, 2021 1 commit
- cache core.ops (#34058) · f05098b5
  Committed by wanghuancoder on Jul 15, 2021
```
* cache core.ops, test=develop

* refine, test=develop
```
  f05098b5
09 Jul, 2021 1 commit
- opt dygraph python code (#33997) · 0a9ad8d7
  Committed by wanghuancoder on Jul 09, 2021
```
* opt dygraph python code, test=develop

* refine, test=develop
```
  0a9ad8d7
15 Jun, 2021 1 commit

Support reduce_sum_op float16 (#32966) · 606939de

Committed by jiangcheng on Jun 15, 2021

* add reduce_sum_op by add self-kernel

* set all ReduceKernel MPType for accuracy

* add float16 test script which input is integer number

* solve reduce sum float16 check_grad problem

* solve conflict and change test script for CI

* change kernel register for CI

* remove all useless template

606939de

09 Jun, 2021 1 commit
- cache core.globals() to speed up dynamic graph (#32098) · b4954ce4
  Committed by wanghuancoder on Jun 09, 2021
```
* modify API nn.Bilinear's doc, test=develop
```
  b4954ce4
07 Jun, 2021 1 commit

OP:strided_slice_op supports bool type inputs (#33373) · 73f2ffa3

Committed by TeslaZhao on Jun 07, 2021

* Fix two english api documents, transpose and strided_slice

* OP:strided_slice_op supports bool type inputs

73f2ffa3

31 May, 2021 1 commit
- enhance error message for conv (#33119) · c4dbeca3
  Committed by wangguanzhong on May 31, 2021
```
* enhance error message for conv

* fix ci coverage
```
  c4dbeca3
13 May, 2021 1 commit
- solved some npu bugs (#32793) · c3ae0d40
  Committed by Baibaifan on May 13, 2021
  
  c3ae0d40
06 May, 2021 1 commit
- [Rocm] fix tests of inplace_abn_op & grid_sampler_op (#32703) · 7c27541e
  Committed by zhulei on May 06, 2021
```
* [Rocm] fix tests of inplace_abn_op & grid_sampler_op

* [Rocm] fix tests of inplace_abn_op & grid_sampler_op
```
  7c27541e
29 Apr, 2021 1 commit

Add BF16 uniform random initializer (#32468) · f46f15a0

Committed by joanna.wozna.intel on Apr 29, 2021

* Add bf16 uniform random initializer

* Remove duplicated section

* Change UT to CPU place only

* Put detail functions into anonymous namespace

f46f15a0

28 Apr, 2021 1 commit
- Added pure_bf16 mode (#32281) · bc379ca3
  Committed by arlesniak on Apr 28, 2021
  
  bc379ca3
19 Apr, 2021 1 commit
- Add BF16 Constant Initializer and support for other initializer (#31935) · 76cb83e8
  Committed by joanna.wozna.intel on Apr 19, 2021
  
  76cb83e8
15 Apr, 2021 1 commit

【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197) · e6bc358d

Committed by zhang wenhui on Apr 15, 2021

* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op  (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)
Co-authored-by: root <xiayanming@baidu.com>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* 【NPU】add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFromVector, TensorToVector of bool type (#31518)

* support TensorFromVector, TensorToVector of bool type

* add ut

* fix compile problem

* 【NPU】support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* 【NPU】Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* 【NPU】add relu op for  npu (#31515)

* add relu npu

* fixed

* fix

* 【NPU】Support npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshape2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* dele if def
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* 【NPU】Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assign cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly
Co-authored-by: oyjxer <1728722986@qq.com>

* 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unittest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for sgd (#31639)

* 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* 【NPU】Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation
Co-authored-by: oyjxer <1728722986@qq.com>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac699.

* 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] add NPU add topk  (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix test not go npu TopKD bug

* NPUPlace(4) to NPUPlace(0)

* update comment
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* 【NPU】Fix npu kernel elementwise_div_grad  (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* 【NPU】Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* 【NPU】Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* 【NPU】Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Reventon_L <luyuxiang1994@qq.com>
Co-authored-by: root <xiayanming@baidu.com>
Co-authored-by: oyjxer <1728722986@qq.com>
Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: OleNet <olenet@126.com>
Co-authored-by: Meiyim <chen_xuyi@outlook.com>
Co-authored-by: oyxuan-11 <963650125@qq.com>
Co-authored-by: pangyoki <pangyoki@126.com>

e6bc358d

机器未来 / Paddle: in sync with the fork's source project