1. 28 Sep, 2021 6 commits
    • L
      add API paddle.linalg.eig (#35674) · bc7e2b92
      Lijunhui committed
      * Add paddle.linalg.eig op
      
      * remove comments
      
      * remove comments
      
      * extend batch_size to the origin
      
      * add real times complex functor & destroy the backward complex output bug
      
      * terminate output diff when input real tensors
      
      * correct tiny doc errors
      
      * move functions from eig_helper to svd_helper and remove eig_helper
      
      * remove tensor.Resize
      
      * remove no longer used code
      
      * use existing lapack functions
      
      * reply review comments 21/27
      
      * remove .cu as this op is only executed on CPU
      
      * remove const_cast & add const in argument list for read-only references
      
      * fix sample code error in CI
      
      * remove template typename Tbase and more
      
      * remove eig exposure in paddle.*
      
      * add 'name=None' in eig python implementation
      
      * handle the unittest
      
      * try to solve the unittest
      
      * solve CI coverage
      
      * remove no longer used code
      
      * polish API doc and more
      
      * reply review comments
      
      * polish unittest, commit plan B
      
      * polish unittest
      bc7e2b92
    • X
      [hybrid] seed and dropout op support force-cpu (#35820) · 58c8f6b3
      xiayanming committed
      * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid
      
      * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid
      
      * [HIP] fix op not support AMD GPU bug
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] fix seed ci failed issue
      
      * add AsExtra for force_cpu of seed op
      58c8f6b3
    • Z
      remove new linalg api in paddle.__init__ (#36151) · 3bb4715e
      zhiboniu committed
      remove recent linalg api in paddle.init;
      add args 'name' in some new linalg api interface
      same change in develop branch to #36112
      3bb4715e
    • J
      【Bug fix】Fix dygraph double grad dtype error (#36125) · af4f018a
      Jiabin Yang committed
      * fix dygraph double grad dtype error when calling for high differential scenario
      
      * reinvoke ci
      
      * add test for partial_engine.cc
      af4f018a
    • W
      [hybrid] optimizer sharding support optimize cast (#35878) · eef0a943
      WangXi committed
      eef0a943
    • Y
      Add paddle.device.cuda.get_device_properties (#35661) · 4cbed9e5
      Yanxing Shi committed
      * Initial Commit
      
      * add unittest and add error information
      
      * modify doc
      
      * fix some error
      
      * fix some word
      
      * fix bug cudaDeviceProp* and modify error explanation
      
      * fix cudaDeviceProp* error and unittest samples
      
      * fix hip error and PADDLE_WITH_HIP
      
      * update style
      
      * fix error is_compiled_with_cuda
      
      * fix paddle.device.cuda.get_device_properties
      
      * fix error for multi thread safe
      
      * update style
      
      * merge conflict
      
      * modify after mentor review
      
      * update style
      
      * delete word
      
      * fix unittest error for windows
      
      * support string input and modify some code
      
      * modify doc to support string input
      
      * fix error for express information
      
      * fix error for express information
      
      * fix unittest for windows
      
      * fix device.startswith('gpu:')
      
      * format error and doc
      
      * fix after review
      
      * format code
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix py2 error
      
      * fix wrong words and doc
      
      * fix _gpuDeviceProperties
      4cbed9e5
  2. 27 Sep, 2021 2 commits
    • J
      Added flatten and flatten2 BF16/FP32 FWD/BWD kernels (#35892) · e427a0f1
      jakpiase committed
      * refactored reshape multiop kernel and added flatten1/2 kernels
      
      * added formatting for flatten tests
      
      * CI fix
      
      * disabled reshape_kernel ops after successful CI run
      
      * minor fix
      e427a0f1
    • L
      Add functional autograd API: jacobian (#35917) · ec2f68e8
      levi131 committed
      * init functional jacobian api
      
      * finish test with dtype float32
      
      * add float64 test case
      
      * polish code
      
      * use atol=1e-5 with dtype float64
      
      * fix for ci
      
      * set timeout for test_jacobian
      
      * polish API docstring
      
      * modify docstring
      ec2f68e8
  3. 26 Sep, 2021 3 commits
  4. 24 Sep, 2021 9 commits
    • J
      add gradient kernel of det op and slogdet op (#36013) · b91e8eec
      jiangcheng committed
      * add gradient kernel of det op and slogdet op
      
      * fix CI APPROVAL problem
      b91e8eec
    • P
      Added elementwise_sub_mkldnn operator (#35662) · 787273ed
      piotrekobiIntel committed
      * Add elementwise_sub_mkldnn_op without grad
      
      * Add test to static_mode_white_list
      
      * Refactor code, change license years
      
      * Remove invalid grad implementation
      
      * Fix element_wise_sub_op test
      
      * Fix CI Approval error
      
      * Remove unnecessary EltwiseSubMKLDNNGradKernel class
      
      * Fix CI Approval 2
      
      * Fix CI Approval 3
      
      * Fix CI Approval Attempt #4
      
      * Fix CI Approve Attempt #5
      
      * Fix CI Approval Attempt #6
      
      * Fix CI Approval Attempt #7
      
      * Change test names containing add to sub
      
      * Fix old tests testing add instead of sub
      
      * Copy grad implementation from elementwise_add_mkldnn
      
      * CI test fix attempt
      
      * Revert "CI test fix attempt"
      
      This reverts commit c647cacf41e6a87c715385a185de5cbf65fc8900.
      
      * Fix CI attempt 2
      
      * Fix elementwise_sub tests, temporary mkldnn broadcast test disable
      
      * Add working implementation of elementwise_sub grad
      
      * Fix build errors caused by pull
      
      * Fix format error
      
      * Fix format error 2
      
      * Disable elementwise_sub_mkldnn test on GPU
      
      * Apply fix for paddle.fluid import
      
      * Revert changes of test_elementwise_sub and Fix mkldnn test
      
      * Revert "Apply fix for paddle.fluid import"
      
      This reverts commit fc3b122fec8e12f2bcb32928a2685ba4d20fd742.
      
      * fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862)
      
      * Add changes suggested by reviewers
      
      * Change @unittest.skipIf... to @OpTestTool.skip_if_not_cpu_bf16() to satisfy Approval CI
      
      * Remove check_dygraph=False to satisfy CI Approval
      Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
      787273ed
    • S
      add update (#36017) · 1691dc7a
      ShenLiang committed
      1691dc7a
    • J
      add pool2d convert test (#35923) · 82f255d0
      JingZhuangzhuang committed
      * add pool2d convert test
      
      * modify error
      
      * modify error
      
      * modify error
      
      * modify error
      
      * modify error
      
      * modify error
      82f255d0
    • K
      4f42e5d7
    • W
      Add paddle.linalg.solve OP (#35715) · 8caf951c
      Weilong Wu committed
      * Add linalg.solve op, test=develop
      
      * Fix a bug caused by accidental deletion
      
      * updated description and fix a bug: missing a comma
      
      * Add linalg.solve op, test=develop
      
      * updated solve op backward logic
      
      * updated solve op backward logic again
      
      * Add linalg.solve Op, test=develop
      
      * Updated and modified to fit CI requirements
      
      * Fix a bug
      
      * 1)Add more test cases; 2)Fix a wrong usage in reduces operation; 3)Remove redundant code
      
      * Remove redundant comments
      
      * 1)Removed redundant code; 2)Updated to enhance code robustness
      
      * Removed redundant code
      
      * Updated API documents
      8caf951c
    • B
      0bbaf9bd
    • B
      add multihead_matmul trt converter test case (#36023) · fcaa64b3
      baoachun committed
      * add multihead_matmul trt converter test case
      
      * move attribute check to op_teller
      fcaa64b3
    • W
      add the shape check for the matmul (#35791) · 8e19d1ba
      wawltor committed
      * add the shape check for the matmul
      
      * remove the test case for the linear
      8e19d1ba
  5. 23 Sep, 2021 1 commit
  6. 22 Sep, 2021 8 commits
  7. 21 Sep, 2021 1 commit
    • A
      Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861
      Adam Osewski committed
      * Create stateful OneDNNAXPYHandler object.
      
      This makes it possible to call it multiple times without recreating the
      oneDNN primitives every time.
      
      * Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.
      
      * OneDNN SGD kernel.
      
      * Update call to use new OneDNNAXPYHandler object api.
      
      * Setup seed in proper place.
      
      * Enable OneDNN kernel only for single case.
      
      * For dense param and sparse grad.
      
      * Small refactor.
      
      * Enable oneDNN by op attr or by cmd line flag.
      
      * Use int64_t type for number of elements.
      
      * Support dense param and grad from OneDNN kernel.
      
      * Enable SGD OneDNN kernel when use MP BF16 optimizer.
      
      * Force non-copyable/movable OneDNNAXPYHandler.
      
      * Reuse OneDNNAXPYHandler for sparse tensors in SUM op.
      
      * Fix SFINAE rules.
      
      * Remove recording event inside AXPY.
      
      * Get rid of internal primitive caching.
      
      * Stop using the PP cache mechanism to store mem and primitive obj.
      * Handler obj store and reuse needed desc & prim
      
      * Do not derive from MKLDNNHandlerT
      799f3861
  8. 19 Sep, 2021 1 commit
  9. 18 Sep, 2021 8 commits
    • Z
    • W
      [hybird] fix pipeline section program Parameter (#35847) · 67c63639
      WangXi committed
      67c63639
    • H
      Basic PR on Cost Model (#35774) · 5ba9fe6e
      Huihuang Zheng committed
      Add basic Cost Model, it uses executor to run program and profile it to get op time.
      
      This is an early basic version, we will add more functions in the future.
      5ba9fe6e
    • C
      FixEighOP; Unified MatrixEighFunctor function (#35812) · da441363
      crystal committed
      da441363
    • F
      Add FFT related operators and APIs (#35665) · 11518a43
      Feiyu Chan committed
      * 1. add interface for fft;
      2. add data type predicate;
      3. fix paddle.roll.
      
      * add fft c2c cufft kernel
      
      * implement argument checking & op calling parts for fft_c2c and fftn_c2c
      
      * add operator and opmaker definitions
      
      * only register float and double for cpu.
      
      * add common code for implementing FFT, add pocketfft as a dependency
      
      * add fft c2c cufft kernel function
      
      * fix bugs in python interface
      
      * add support for c2r, r2c operators, op makers, kernels and kernel functors.
      
      * test and fix bugs
      
      * 1. fft_c2c function: add support for onesided=False;
      2. add complex<float>, complex<double> support for concat and flip.
      
      * 1. fft: fix python api bugs;
      2. shape_op: add support for complex data types.
      
      * fft c2c cufft kernel done with compile and link
      
      * fix shape_op, add mkl placeholder
      
      * remove mkl
      
      * complete fft c2c in gpu
      
      * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
      2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c backward in ND
      
      * fix MKL-based implementation
      
      * Add frame op and CPU/GPU kernels.
      
      * Add frame op forward unittest.
      
      * Add frame op forward unittest.
      
      * Remove axis parameter in FrameFunctor.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Update doc string.
      
      * Update after review and remove librosa requirement in unittest.
      
      * Update grad kernel.
      
      * add fft_c2r op
      
      * Remove data allocation in TransCompute function.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * last fft c2r functor
      
      * fix C2R and R2C for cufft, because the direction is not an option in these cases.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * fix bugs in python APIs
      
      * fix fft_c2r grad kernel
      
      * fix bugs in python APIs
      
      * add cuda fft c2r grad kernel functor
      
      * clean code
      
      * fix fft_c2r python API
      
      * fill fft r2c result with conjugate symmetry (#19)
      
      fill fft r2c result with conjugate symmetry
      
      * add placeholder for unittests (#24)
      
      * simple parameterize test function by auto generate test case from parm list (#25)
      
      * miscellaneous fixes for python APIs (#26)
      
      * add placeholder for unittests
      
      * resize fft inputs before computation if n or s is provided.
      
      * add complex kernels for pad and pad_grad
      
      * simplify argument checking.
      
      * add type promotion
      
      * add int to float or complex promotion
      
      * fix output data type for static mode
      
      * fix fft's input dtype dispatch, import fft to paddle
      
      * fix typos in axes checking (#27)
      
      * fix typos in axes checking
      
      * fix argument checking (#28)
      
      * fix argument checking
      
      * Add C2R Python layer normal and abnormal use cases (#29)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)
      
      * Documentation of the common interfaces of c2r and c2c (#31)
      
      * Documentation of the common interfaces of c2r and c2c
      
      * clean c++ code  (#32)
      
      * clean code
      
      * Add numpy-based implementation of spectral ops (#33)
      
      * add numpy reference implementation of spectral ops
      
      * Add fft_c2r numpy based implementation for unittest. (#34)
      
      * add fft_c2r numpy implementation
      
      * Add deframe op and stft/istft api. (#23)
      
      * Add frame api
      
      * Add deframe op and kernels.
      
      * Add stft and istft apis.
      
      * Add deframe api. Update stft and istft apis.
      
      * Fix bug in frame_from_librosa function when input dims >= 3
      
      * Rename deframe to overlap_add.
      
      * Update istft.
      
      * Update after code review.
      
      * Add overlap_add op and stft/istft api unittest (#35)
      
      * Add overlap_add op unittest.
      
      * Register complex kernels of squeeze/unsqueeze op.
      
      * Add stft/istft api unittest.
      
      * Add unittest for fft helper functions (#36)
      
      * add unittests for fft helper functions. add complex kernel for roll op.
      
      * complete static graph unittest for all public api (#37)
      
      * Unittest of op with FFT C2C, C2R and r2c added (#38)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * Documentation of the common interfaces of c2r and c2c
      
      * Unittest of op with FFT C2C, C2R and r2c added
      Co-authored-by: lijiaqi <lijiaqi0612@163.com>
      
      * add fft related options to CMakeLists.txt
      
      * fix typos and clean code (#39)
      
      * fix invisible character in mkl branch and fix error in error message
      
      * clean code: remove docstring from unittest for signal.py.
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      
      * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)
      
      1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
      3. fix unittest to catch UnImplementedError and RuntimeError;
      4. fix compile error by avoid using thrust when cuda is not available.
      5.  fix sample code, use paddle.fft instead of paddle.tensor.fft
      
      * remove inclusion of thrust, add __all__ list for fft (#42)
      
      * Add api doc and update unittest. (#43)
      
      * Add doc strings.
      * Update overlap_add op unittest
      
      * fix MKL-based FFT implementation (#44)
      
      * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R
      
      * remove code for debug (#45)
      
      * use dynload for cufft (#46)
      
      * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.
      
      * add complex support for fill_zeros_like
      
      * use dynload for cufft
      
      * Update doc and unittest. (#47)
      
      * Add doc of frame op and overlap_add op.
      
      * Update unittest.
      
      * use dynload for cufft (#48)
      
      1. use dynload for cufft
      2. fix unittest;
      3. temporarily disable Rocm.
      
      * fix conflicts and merge upstream (#49)
      
      fix conflicts and merge upstream
      
      * fix compile error: only link dyload_cuda when cuda is available (#50)
      
      * fix compile error: only link dyload_cuda when cuda is available
      
      * fix dynload for cufft on windows (#51)
      
      1. fix dynload for cufft on windows;
      2. fix unittests.
      
      * add NOMINMAX to compile on windows (#52)
      
       add NOMINMAX to compile on windows
      
      * explicitly specify capture mode for lambdas (#55)
      
       explicitly specify capture mode for lambdas
      
      * fix fft sample (#53)
      
      * fix fft sample
      
      * update scipy and numpy version for unittests of fft (#56)
      
      update scipy and numpy version for unittests of fft
      
      * Add static graph unittests of frame and overlap_add api. (#57)
      
      * Remove cache of cuFFT & Disable ONEMKL (#59)
      
      1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
      2. remove cache of cufft plans;
      3. enhance error checking.
      4. default WITH_ONEMKL to OFF
      Co-authored-by: jeff41404 <jeff41404@gmail.com>
      Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
      Co-authored-by: KP <109694228@qq.com>
      Co-authored-by: lijiaqi <lijiaqi0612@163.com>
      Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
      Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
      11518a43
    • W
      trt support serialize and deserialize (#35828) · ba71421c
      Wilber committed
      ba71421c
    • A
      Clean ParseMemInfo and Fix unittest failed under multi-thread (#35840) · 2fff5a58
      Aurelius84 committed
      * Clean ParseMemInfo and fix unittest with multi-thread
      
      * fix declare
      2fff5a58
    • F
      Add new API "eigvals" in linalg (#35720) · d411a038
      From00 committed
      * Add linalg.eigvals API
      
      * pre-commit check
      
      * Adjust code style
      
      * Fix conflict
      
      * Improve code style
      
      * Modify the test code to ignore testing CUDA kernel
      
      * Sort output data before checking in test code
      
      * Set timeout value for UT
      
      * Improve API example code to pass CI
      
      * Fix bug for None fetch_list in Windows
      
      * Delete grad Op
      d411a038
  10. 17 Sep, 2021 1 commit
    • Z
      [AMP] Support pure fp16 training mode for dygraph (#35521) · adaeee4d
      zhangbo9674 committed
      * add pure fp16 major function in auto_cast & tracer
      
      * support master weight in dygraph for pure fp16
      
      * check mix dtype of fp16&fp32 for check_finite_and_unscale op
      
      * change pure fp16 function name
      
      * refine some bug in auto_cast
      
      * refine auto_cast interface logic
      
      * add param _casted_by_pure_fp16 for class Layer
      
      * support state_dict hook for save model by user appointed dtype in pure_fp16_decorator
      
      * refine pure_fp16_decorator as decorator
      
      * add unittest
      
      * add comment
      
      * add comment
      
      * support recompute
      
      * add comment for auto_cast and decorator
      
      * support to_static_state_dict for paddle.jit.save
      
      * remove the limit on the number of models and optimizers
      
      * add lookup_table in black_list
      
      * fix momentum and layer state_dict
      
      * fix bug in layer state_dict
      
      * fix bug in layer state_dict_helper
      
      * refine unittest
      
      * refine test_momentun_op
      
      * refine interface and some code
      
      * refine amp_decorator interface
      
      * refine pure fp16 interface
      
      * refine master weight interface
      adaeee4d