1. 20 10月, 2021 3 次提交
    • S
      Add FasterTokenizer Operator (#34491) · 3f2d6a3f
      Steffy-zxf 提交于
      Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent.
      
      * support the text string as an input Tensor
      * support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens
      * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
      * It first applies basic tokenization, followed by wordpiece tokenization.
      3f2d6a3f
    • W
      adapt to cann5.0.3_alpha3. (#36106) · 873ee4e3
      wuhuachaocoding 提交于
      873ee4e3
    • H
      Add CINN Compile Option (#36292) · 6524fa8d
      Huihuang Zheng 提交于
      Add CINN compile option in CMake.
      
      Now you can use CINN in Paddle by `-DWITH_CINN=ON` when `cmake`
      
      To test it, you can run `make cinn_lib_test -j` and `ctest -R cinn_lib_test`. 
      
      Note:
      1. You should set
      ```
      export runtime_include_dir=${CINN_SOURCE_DIR}/cinn/runtime/cuda 
      ```
      When run test, the `${CINN_SOURCE_DIR}` should be set based on your CINN directory.
      
      2. CINN is under developing now, you may have to change `CINN_GIT_TAG` to the git commit you need.
      6524fa8d
  2. 19 10月, 2021 2 次提交
    • Q
      [NPU] update inference cmake, test=develop (#36505) · 49d7bd38
      Qi Li 提交于
      * [NPU] update inference cmake, test=develop
      
      * address review comments, test=develop
      
      * fix compile error when WITH_ASCEND_CXX11 ON, test=develop
      49d7bd38
    • Y
      [paddle.linalg.qr] Add the Qr Operator (#35742) · 34d785c2
      Yulong Ao 提交于
      * Add QR decomposition op
      
      * Change codes to adapt to new svd_helper
      
      * Update linalg.py
      
      Restore the deleted comma
      
      * Restore the deleted line
      
      * Update linalg.py
      
      * Update linalg.py
      
      * Improve the qr code by reviews
      
      * Update QR based on CI results
      
      * Update qr doc, test=document_fix
      
      * Change unsafe and ill-formed codes
      34d785c2
  3. 14 10月, 2021 2 次提交
  4. 11 10月, 2021 1 次提交
  5. 28 9月, 2021 2 次提交
  6. 27 9月, 2021 1 次提交
  7. 24 9月, 2021 1 次提交
  8. 23 9月, 2021 2 次提交
  9. 22 9月, 2021 2 次提交
  10. 18 9月, 2021 1 次提交
    • F
      Add FFT related operators and APIs (#35665) · 11518a43
      Feiyu Chan 提交于
      * 1. add interface for fft;
      2. add data type predicate;
      3. fix paddle.roll.
      
      * add fft c2c cufft kernel
      
      * implement argument checking & op calling parts for fft_c2c and fftn_c2c
      
      * add operator and opmaker definitions
      
      * only register float and double for cpu.
      
      * add common code for implementing FFT, add pocketfft as a dependency
      
      * add fft c2c cufft kernel function
      
      * fix bugs in python interface
      
      * add support for c2r, r2c operators, op makers, kernels and kernel functors.
      
      * test and fix bugs
      
      * 1. fft_c2c function: add support for onesided=False;
      2. add complex<float>, complex<double> support for concat and flip.
      
      * 1. fft: fix python api bugs;
      2. shape_op: add support for complex data types.
      
      * fft c2c cufft kernel done with complie and link
      
      * fix shape_op, add mkl placeholder
      
      * remove mkl
      
      * complete fft c2c in gpu
      
      * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
      2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c backward in ND
      
      * fix MKL-based implementation
      
      * Add frame op and CPU/GPU kernels.
      
      * Add frame op forward unittest.
      
      * Add frame op forward unittest.
      
      * Remove axis parameter in FrameFunctor.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Update doc string.
      
      * Update after review and remove librosa requirement in unittest.
      
      * Update grad kernel.
      
      * add fft_c2r op
      
      * Remove data allocation in TransCompute function.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * last fft c2r functor
      
      * fix C2R and R2C for cufft, becase the direction is not an option in these cases.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * fix bugs in python APIs
      
      * fix fft_c2r grad kernal
      
      * fix bugs in python APIs
      
      * add cuda fft c2r grad kernal functor
      
      * clean code
      
      * fix fft_c2r python API
      
      * fill fft r2c result with conjugate symmetry (#19)
      
      fill fft r2c result with conjugate symmetry
      
      * add placeholder for unittests (#24)
      
      * simple parameterize test function by auto generate test case from parm list (#25)
      
      * miscellaneous fixes for python APIs (#26)
      
      * add placeholder for unittests
      
      * resize fft inputs before computation is n or s is provided.
      
      * add complex kernels for pad and pad_grad
      
      * simplify argument checking.
      
      * add type promotion
      
      * add int to float or complex promotion
      
      * fix output data type for static mode
      
      * fix fft's input dtype dispatch, import fft to paddle
      
      * fix typos in axes checking (#27)
      
      * fix typos in axes checking
      
      * fix argument checking (#28)
      
      * fix argument checking
      
      * Add C2R Python layer normal and abnormal use cases (#29)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)
      
      * Documentation of the common interfaces of c2r and c2c (#31)
      
      * Documentation of the common interfaces of c2r and c2c
      
      * clean c++ code  (#32)
      
      * clean code
      
      * Add numpy-based implementation of spectral ops (#33)
      
      * add numpy reference implementation of spectral ops
      
      * Add fft_c2r numpy based implementation for unittest. (#34)
      
      * add fft_c2r numpy implementation
      
      * Add deframe op and stft/istft api. (#23)
      
      * Add frame api
      
      * Add deframe op and kernels.
      
      * Add stft and istft apis.
      
      * Add deframe api. Update stft and istft apis.
      
      * Fix bug in frame_from_librosa function when input dims >= 3
      
      * Rename deframe to overlap_add.
      
      * Update istft.
      
      * Update after code review.
      
      * Add overlap_add op and stft/istft api unittest (#35)
      
      * Add overlap_add op unittest.
      
      * Register complex kernels of squeeze/unsquuze op.
      
      * Add stft/istft api unittest.
      
      * Add unittest for fft helper functions (#36)
      
      * add unittests for fft helper functions. add complex kernel for roll op.
      
      * complete static graph unittest for all public api (#37)
      
      * Unittest of op with FFT C2C, C2R and r2c added (#38)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * Documentation of the common interfaces of c2r and c2c
      
      * Unittest of op with FFT C2C, C2R and r2c added
      Co-authored-by: lijiaqi0612's avatarlijiaqi <lijiaqi0612@163.com>
      
      * add fft related options to CMakeLists.txt
      
      * fix typos and clean code (#39)
      
      * fix invisible character in mkl branch and fix error in error message
      
      * clean code: remove docstring from unittest for signal.py.
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      
      * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)
      
      1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r;
      3. fix unittest to catch UnImplementedError and RuntimeError;
      4. fix compile error by avoid using thrust when cuda is not available.
      5.  fix sample code, use paddle.fft instead of paddle.tensor.fft
      
      * remove inclusion of thrust, add __all__ list for fft (#42)
      
      * Add api doc and update unittest. (#43)
      
      * Add doc strings.
      * Update overlap_add op unittest
      
      * fix MKL-based FFT implementation (#44)
      
      * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R
      
      * remove code for debug (#45)
      
      * use dynload for cufft (#46)
      
      * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.
      
      * add complex support for fill_zeros_like
      
      * use dynload for cufft
      
      * Update doc and unittest. (#47)
      
      * Add doc of frame op and overlap_add op.
      
      * Update unittest.
      
      * use dynload for cufft (#48)
      
      1. use dynload for cufft
      2. fix unittest;
      3. temporarily disable Rocm.
      
      * fix conflicts and merge upstream (#49)
      
      fix conflicts and merge upstream
      
      * fix compile error: only link dyload_cuda when cuda is available (#50)
      
      * fix compile error: only link dyload_cuda when cuda is available
      
      * fix dynload for cufft on windows (#51)
      
      1. fix dynload for cufft on windows;
      2. fix unittests.
      
      * add NOMINMAX to compile on windows (#52)
      
       add NOMINMAX to compile on windows
      
      * explicitly specify capture mode for lambdas (#55)
      
       explicitly specify capture mode for lambdas
      
      * fix fft sample (#53)
      
      * fix fft sample
      
      * update scipy and numpy version for unittests of fft (#56)
      
      update scipy and numpy version for unittests of fft
      
      * Add static graph unittests of frame and overlap_add api. (#57)
      
      * Remove cache of cuFFT & Disable ONEMKL (#59)
      
      1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm
      2. remove cache of cufft plans;
      3. enhance error checking.
      4. default WITH_ONEMKL to OFF
      Co-authored-by: Njeff41404 <jeff41404@gmail.com>
      Co-authored-by: Nroot <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
      Co-authored-by: NKP <109694228@qq.com>
      Co-authored-by: lijiaqi0612's avatarlijiaqi <lijiaqi0612@163.com>
      Co-authored-by: NXiaoxu Chen <chenxx_id@163.com>
      Co-authored-by: Nlijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
      11518a43
  11. 16 9月, 2021 1 次提交
  12. 14 9月, 2021 1 次提交
    • S
      windows third party cache optimization: share third party cache among servers (#35368) · e919620a
      Sing_chan 提交于
      * new function: share third party cache among servers to fasten build speed
      
      * modified code according to zhouwei25's comment
      
      * add wget install step, move cd build to the last of if condition
      
      * block note and error of third_party share; change bce upload method
      
      * change third_party sub_dir in bos, since third party in different cuda version cant share
      
      * set sub_dir by get nvcc version
      
      * change third_party local path to be same with bos path
      e919620a
  13. 13 9月, 2021 1 次提交
  14. 09 9月, 2021 1 次提交
    • 0
      Add matrix_rank Op and it's GPU and CPU kernel (#34823) · eb1fbf12
      0x45f 提交于
      * init matrix_rank op, add matrix_rank CPU code and test
      
      * add GPU kernel, remove svd_eigen.h
      
      * add CPU kernel when tol is tensor
      
      * add cpu and gpu code when tol is tensor
      
      * fix CI-ROCM error
      
      * add matrix_rank API describe, fix PR-CI-Py3 error
      
      * fix PR-CI-Windows error, add matrix_rank API test
      
      * delete useless comments
      
      * fix review
      
      * add my code in svd_helper.h
      
      * update doc commets
      
      * remove spaces
      eb1fbf12
  15. 03 9月, 2021 2 次提交
  16. 02 9月, 2021 1 次提交
    • X
      Add SVD Op and it's GPU and CPU kernel (#34953) · 7e5fb462
      xiongkun 提交于
      * Add SVD Op and it's GPU and CPU kernel
      
      * Remove CUDAPlace in test_svd_op, make the test available in CPU package
      
      * modfity the file
      
      * fix windows bug/ fix ROCM / fix test timeout
      
      * for pass the CIs
      
      * improve error report
      
      * for code review
      
      * some modification to test_svd_op
      
      * change python code style
      
      * expose the svd interface for document
      7e5fb462
  17. 01 9月, 2021 1 次提交
  18. 31 8月, 2021 4 次提交
    • S
      Revert "Revert "Add copy from tensor (#34406)" (#35173)" (#35256) · 6116f9af
      Shang Zhizhou 提交于
      * Revert "Revert "Add copy from tensor (#34406)" (#35173)"
      
      This reverts commit 32c1ec42.
      
      * add template instantiation
      6116f9af
    • zhouweiwei2014's avatar
      fix bug that cmake find python (#35304) · 00c9aeb0
      zhouweiwei2014 提交于
      00c9aeb0
    • Z
      New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d
      Zhanlue Yang 提交于
      [Background]
      Expansion in code size can be irreversible in the long run, leading to huge release packages which
      not only hampers user experience but also exceeds a hard limit of pypi.
      
      In such, NV_FATBIN section takes up 86% of the compiled dylib size, owing to the vast number of GPU
      arches supported.
      
      This PR aims to prune this NV_FATBIN.
      
      [Solution]
      In the new release strategy, two types of whl packages will be involved:
      
      Cubin PIP package:
      PIP package maintains a smaller window for GPU arches support, containing
      sm_60, sm_70, sm_75, sm_80 cubins, covering Pascal - Ampere arches
      
      JIT release package:
      This is a backup for Cubin PIP package, containing compute_35, compute_50, compute_60,
      compute_70, compute_75, compute_80, with best performance and GPU arches coverage.
      
      However, it takes around 10 min to install due to the JIT compilation.
      
      [How to use]
      The new release strategy is disabled by default.
      To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
      To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL
      2f3b393d
    • W
      fix CI skip cc test error (#35264) · 3d76d003
      wuhuanzhou 提交于
      * fix CI skip cc test error, test=develop
      
      * remove test code, test=develop
      3d76d003
  19. 27 8月, 2021 1 次提交
  20. 26 8月, 2021 1 次提交
    • S
      Add copy from tensor (#34406) · ac33c0ca
      Shang Zhizhou 提交于
      * add api
      
      * temp save
      
      * revert
      
      * copytocpu async ok
      
      * fix style
      
      * copy sync ok
      
      * fix compile error
      
      * fix compile error
      
      * api done
      
      * update python async api
      
      * fix compile
      
      * remove async python api; add c++ async unittest
      
      * remove python async api
      
      * update unittest
      
      * update unittest
      
      * add C++ unittest for copytensor
      
      * add unittest
      
      * update namespace utils to class TensorUtils
      
      * add unittest
      
      * update unittest
      
      * update unittest
      
      * update code style
      
      * update code style
      
      * update unittest
      ac33c0ca
  21. 25 8月, 2021 2 次提交
  22. 23 8月, 2021 1 次提交
  23. 16 8月, 2021 1 次提交
  24. 10 8月, 2021 1 次提交
    • C
      copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929
      chentianyu03 提交于
      * add any.hpp to utils and replace boost::any with self defined paddle::any
      
      * add copy any.hpp to custom op depends
      
      * modify any.hpp include path
      
      * remove boost from setup.py.in
      
      * add copy any.hpp to custom op depends
      
      * move any.hpp to paddle/utils/ dirs
      
      * move any.h to extension/include direction
      
      * copy utils to right directions
      12892929
  25. 09 8月, 2021 1 次提交
  26. 06 8月, 2021 1 次提交
  27. 03 8月, 2021 1 次提交
  28. 29 7月, 2021 1 次提交