1. 20 Oct 2021, 1 commit
    • Add FasterTokenizer Operator (#34491) · 3f2d6a3f
      Steffy-zxf committed
      Add tokenizer-related functionality for Transformer models so that the training and prediction pipelines stay consistent.
      
      * Support text strings as an input Tensor.
      * Support the vocab (an unordered_map<wstring, int>) as an input Tensor for token lookup.
      * Add the tokenizer used for BERT, which performs end-to-end tokenization from raw text strings to wordpieces.
      * It first applies basic tokenization, followed by wordpiece tokenization.
      3f2d6a3f
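      The basic + wordpiece pipeline described above is the standard BERT scheme: split on whitespace and punctuation first, then greedily match the longest vocab piece. A minimal pure-Python sketch of the wordpiece step with a toy vocab (illustrative only, not Paddle's FasterTokenizer implementation):
      
      ```python
      def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
          """Greedy longest-match-first wordpiece tokenization of a single word."""
          tokens, start = [], 0
          while start < len(word):
              end, piece = len(word), None
              while start < end:
                  candidate = word[start:end]
                  if start > 0:
                      candidate = "##" + candidate  # continuation pieces carry the '##' prefix
                  if candidate in vocab:
                      piece = candidate
                      break
                  end -= 1
              if piece is None:
                  return [unk_token]  # no sub-piece matched: the whole word is unknown
              tokens.append(piece)
              start = end
          return tokens
      
      vocab = {"un", "##aff", "##able"}
      print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
      ```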
  2. 19 Oct 2021, 1 commit
  3. 15 Oct 2021, 1 commit
  4. 11 Oct 2021, 1 commit
  5. 28 Sep 2021, 1 commit
  6. 24 Sep 2021, 1 commit
    • Add paddle.linalg.solve OP (#35715) · 8caf951c
      Weilong Wu committed
      * Add linalg.solve op, test=develop
      
      * Fix a bug caused by accidental deletion
      
      * updated description and fix a bug: missing a comma
      
      * Add linalg.solve op, test=develop
      
      * updated solve op backward logic
      
      * updated solve op backward logic again
      
      * Add linalg.solve Op, test=develop
      
      * Updated and modified to fit CI requirements
      
      * Fix a bug
      
      * 1) Add more test cases; 2) Fix a wrong usage in the reduce operation; 3) Remove redundant code
      
      * Remove redundant comments
      
      * 1) Removed redundant code; 2) Updated to enhance code robustness
      
      * Removed redundant code
      
      * Updated API documents
      8caf951c
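      A minimal usage sketch of the new API, assuming paddle.linalg.solve mirrors numpy.linalg.solve (it solves the linear system A x = b for x):
      
      ```python
      import paddle
      
      A = paddle.to_tensor([[3.0, 1.0],
                            [1.0, 2.0]])
      b = paddle.to_tensor([9.0, 8.0])
      
      x = paddle.linalg.solve(A, b)   # expected solution: [2., 3.]
      print(x.numpy())
      ```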
  7. 22 Sep 2021, 1 commit
  8. 18 Sep 2021, 1 commit
    • Add FFT related operators and APIs (#35665) · 11518a43
      Feiyu Chan committed
      * 1. add interface for fft;
      2. add data type predicate;
      3. fix paddle.roll.
      
      * add fft c2c cufft kernel
      
      * implement argument checking & op calling parts for fft_c2c and fftn_c2c
      
      * add operator and opmaker definitions
      
      * only register float and double for cpu.
      
      * add common code for implementing FFT, add pocketfft as a dependency
      
      * add fft c2c cufft kernel function
      
      * fix bugs in python interface
      
      * add support for c2r, r2c operators, op makers, kernels and kernel functors.
      
      * test and fix bugs
      
      * 1. fft_c2c function: add support for onesided=False;
      2. add complex<float>, complex<double> support for concat and flip.
      
      * 1. fft: fix python api bugs;
      2. shape_op: add support for complex data types.
      
      * fft c2c cufft kernel now compiles and links
      
      * fix shape_op, add mkl placeholder
      
      * remove mkl
      
      * complete fft c2c in gpu
      
      * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
      2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation.
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c backward in ND
      
      * fix MKL-based implementation
      
      * Add frame op and CPU/GPU kernels.
      
      * Add frame op forward unittest.
      
      * Add frame op forward unittest.
      
      * Remove axis parameter in FrameFunctor.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Update doc string.
      
      * Update after review and remove librosa requirement in unittest.
      
      * Update grad kernel.
      
      * add fft_c2r op
      
      * Remove data allocation in TransCompute function.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * last fft c2r functor
      
      * fix C2R and R2C for cufft, because the direction is not an option in these cases.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * fix bugs in python APIs
      
      * fix fft_c2r grad kernel
      
      * fix bugs in python APIs
      
      * add cuda fft c2r grad kernel functor
      
      * clean code
      
      * fix fft_c2r python API
      
      * fill fft r2c result with conjugate symmetry (#19)
      
      fill fft r2c result with conjugate symmetry
      
      * add placeholder for unittests (#24)
      
      * simplify parameterized test functions by auto-generating test cases from a parameter list (#25)
      
      * miscellaneous fixes for python APIs (#26)
      
      * add placeholder for unittests
      
      * resize fft inputs before computation if n or s is provided.
      
      * add complex kernels for pad and pad_grad
      
      * simplify argument checking.
      
      * add type promotion
      
      * add int to float or complex promotion
      
      * fix output data type for static mode
      
      * fix fft's input dtype dispatch, import fft to paddle
      
      * fix typos in axes checking (#27)
      
      * fix typos in axes checking
      
      * fix argument checking (#28)
      
      * fix argument checking
      
      * Add C2R Python layer normal and abnormal use cases (#29)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)
      
      * Documentation of the common interfaces of c2r and c2c (#31)
      
      * Documentation of the common interfaces of c2r and c2c
      
      * clean c++ code  (#32)
      
      * clean code
      
      * Add numpy-based implementation of spectral ops (#33)
      
      * add numpy reference implementation of spectral ops
      
      * Add fft_c2r numpy based implementation for unittest. (#34)
      
      * add fft_c2r numpy implementation
      
      * Add deframe op and stft/istft api. (#23)
      
      * Add frame api
      
      * Add deframe op and kernels.
      
      * Add stft and istft apis.
      
      * Add deframe api. Update stft and istft apis.
      
      * Fix bug in frame_from_librosa function when input dims >= 3
      
      * Rename deframe to overlap_add.
      
      * Update istft.
      
      * Update after code review.
      
      * Add overlap_add op and stft/istft api unittest (#35)
      
      * Add overlap_add op unittest.
      
      * Register complex kernels of squeeze/unsqueeze op.
      
      * Add stft/istft api unittest.
      
      * Add unittest for fft helper functions (#36)
      
      * add unittests for fft helper functions. add complex kernel for roll op.
      
      * complete static graph unittest for all public api (#37)
      
      * Unittest of op with FFT C2C, C2R and r2c added (#38)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * Documentation of the common interfaces of c2r and c2c
      
      * Unittest of op with FFT C2C, C2R and r2c added
      Co-authored-by: lijiaqi <lijiaqi0612@163.com>
      
      * add fft related options to CMakeLists.txt
      
      * fix typos and clean code (#39)
      
      * fix invisible character in mkl branch and fix error in error message
      
      * clean code: remove docstring from unittest for signal.py.
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      
      * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)
      
      1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      2. promote floating point tensors to complex tensors for fft_c2c and fft_c2r;
      3. fix unittest to catch UnImplementedError and RuntimeError;
      4. fix compile error by avoiding thrust when cuda is not available.
      5. fix sample code, use paddle.fft instead of paddle.tensor.fft
      
      * remove inclusion of thrust, add __all__ list for fft (#42)
      
      * Add api doc and update unittest. (#43)
      
      * Add doc strings.
      * Update overlap_add op unittest
      
      * fix MKL-based FFT implementation (#44)
      
      * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R
      
      * remove code for debug (#45)
      
      * use dynload for cufft (#46)
      
      * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.
      
      * add complex support for fill_zeros_like
      
      * use dynload for cufft
      
      * Update doc and unittest. (#47)
      
      * Add doc of frame op and overlap_add op.
      
      * Update unittest.
      
      * use dynload for cufft (#48)
      
      1. use dynload for cufft
      2. fix unittest;
      3. temporarily disable Rocm.
      
      * fix conflicts and merge upstream (#49)
      
      fix conflicts and merge upstream
      
      * fix compile error: only link dynload_cuda when cuda is available (#50)
      
      * fix compile error: only link dynload_cuda when cuda is available
      
      * fix dynload for cufft on windows (#51)
      
      1. fix dynload for cufft on windows;
      2. fix unittests.
      
      * add NOMINMAX to compile on windows (#52)
      
       add NOMINMAX to compile on windows
      
      * explicitly specify capture mode for lambdas (#55)
      
       explicitly specify capture mode for lambdas
      
      * fix fft sample (#53)
      
      * fix fft sample
      
      * update scipy and numpy version for unittests of fft (#56)
      
      update scipy and numpy version for unittests of fft
      
      * Add static graph unittests of frame and overlap_add api. (#57)
      
      * Remove cache of cuFFT & Disable ONEMKL (#59)
      
      1. replace numpy.fft with scipy.fft as numpy<1.20 does not support ortho norm
      2. remove cache of cufft plans;
      3. enhance error checking.
      4. default WITH_ONEMKL to OFF
      Co-authored-by: jeff41404 <jeff41404@gmail.com>
      Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
      Co-authored-by: KP <109694228@qq.com>
      Co-authored-by: lijiaqi <lijiaqi0612@163.com>
      Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
      Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
      11518a43
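      A minimal usage sketch of the resulting Python API, assuming paddle.fft.rfft/irfft follow the usual numpy.fft conventions (one-sided transform of a real signal and its inverse):
      
      ```python
      import paddle
      
      x = paddle.randn([8])                  # real 1-D signal
      X = paddle.fft.rfft(x)                 # one-sided complex spectrum, length n // 2 + 1
      x_rec = paddle.fft.irfft(X, n=8)       # inverse transform recovers the real signal
      print(paddle.allclose(x, x_rec, atol=1e-6).numpy())
      ```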
  9. 26 Aug 2021, 1 commit
    • Add feed_forward for fused attention op. (#34945) · d1a33bc7
      Li Min committed
      Description
      
      Add feed_forward for fused attention op.
      (1) Encapsulate matmul impl (forward and backward) used in attention op.
      (2) Implement bias_add (forward and backward) used in attention op.
      d1a33bc7
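      The encapsulated matmul and bias_add pieces amount to a standard linear (feed-forward) layer; a small numpy sketch of the forward and backward math being fused (illustrative only, not the CUDA implementation):
      
      ```python
      import numpy as np
      
      def linear_forward(x, W, b):
          return x @ W + b                   # matmul followed by bias_add
      
      def linear_backward(x, W, d_out):
          dx = d_out @ W.T                   # gradient w.r.t. the input
          dW = x.T @ d_out                   # gradient w.r.t. the weight
          db = d_out.sum(axis=0)             # bias gradient reduces over the batch dim
          return dx, dW, db
      
      x = np.random.randn(4, 8)
      W = np.random.randn(8, 16)
      b = np.zeros(16)
      out = linear_forward(x, W, b)
      dx, dW, db = linear_backward(x, W, np.ones_like(out))
      ```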
  10. 17 Aug 2021, 1 commit
    • Align CTC grad scale with ESPNet (#34729) · 10f9644c
      Hui Zhang committed
      * dygraph supports more ctc grad scale options
      
      * scale for 1.x
      
      * fix unittest
      
      * fix unittest
      
      * format code
      
      * fix unittest
      
      * fix log info
      
      * unittest cov
      
      * fix format;notest,test=cpu,coverage
      
      * skip ctc_loss egs;test=cpu
      
      * warpctc grad cov;test=coverage
      
      * add dygraph test;test=coverage
      
      * format;test=cpu,coverage
      
      * format;test=cpu
      
      * add api compat;test=cpu
      
      * add cpu test
      
      * rename
      
      * rename
      
      * fix
      
      * fix test
      
      * format
      
      * eigen cpu
      
      * eigen gpu grad pass
      
      * cuda gpu pass
      
      * format
      
      * fix ci
      10f9644c
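      The different grad scales correspond to different normalizations of the per-sample CTC loss before backprop; a small illustrative sketch of the choices (the exact flags added by this PR are not shown here):
      
      ```python
      import numpy as np
      
      # per-sample CTC losses and label lengths for a batch of 4
      loss = np.array([12.0, 8.0, 15.0, 9.0])
      label_len = np.array([6, 4, 5, 3])
      
      by_label_len = (loss / label_len).mean()   # normalize each sample by its label length
      by_batch = loss.mean()                     # average over the batch only
      unscaled = loss.sum()                      # no normalization at all
      # whichever scalar is backpropagated determines the scale of the CTC gradient
      ```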
  11. 21 Jun 2021, 1 commit
    • Add AXPY oneDNN handler (#33632) · 773aabc7
      lidanqing committed
      * Add oneDNN AXPY handler.
      
      * Add fallback for small tensors.
      
      * Fix ifdefs
      
      * Remove unnecessary namespace prefixes and add missing headers.
      
      * Guard handler_axpy with proper ifdefs.
      
      * Compilation of this function is possible only when Paddle is not built
      with CUDA or HIP.
      
      * Move AXPY handler code to separate files.
      
      * Use oneDNN AXPY handler in SGD op.
      
      * Use axpy handler only when Paddle is built with oneDNN.
      
      * Add test for SUM BF16 with big rows.
      
      * Fix SFINAE rules for elementwise_add_to.
      
      * Add test case for SGD with big rows.
      
      * update
      
      * update
      Co-authored-by: Adam Osewski <adam.osewski@intel.com>
      773aabc7
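      AXPY is the BLAS primitive y := alpha * x + y; a small sketch of the semantics and of how an SGD update maps onto it (illustrative only, not the oneDNN handler itself):
      
      ```python
      import numpy as np
      
      def axpy(alpha, x, y):
          # y := alpha * x + y, the operation the oneDNN AXPY handler accelerates
          return alpha * x + y
      
      # SGD update written as an AXPY call: param := param + (-lr) * grad
      param = np.array([1.0, 2.0, 3.0])
      grad = np.array([0.1, 0.2, 0.3])
      lr = 0.5
      param = axpy(-lr, grad, param)
      print(param)  # [0.95 1.9  2.85]
      ```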
  12. 16 Jun 2021, 1 commit
  13. 26 May 2021, 2 commits
    • [NPU] fix compile issue caused by dev changes (#33137) · 865f0c1f
      Leo Chen committed
      865f0c1f
    • optimize OP's compilation time (#32617) · 78ecb668
      wuhuanzhou committed
      * optimize OP's compilation time, test=develop
      
      * add more op and run ci test, test=develop
      
      * CUDA Kernel register in cc file, test=develop
      
      * fix macros, test=develop
      
      * fix undefined symbol error, test=develop
      
      * fix compilation error and undefined symbol, test=develop
      
      * fix compilation error on Windows, test=develop
      
      * fix compilation error on Windows, test=develop
      78ecb668
  14. 25 Apr 2021, 2 commits
    • add copy_cross_scope (#32432) · 5943ff7b
      Baibaifan committed
      5943ff7b
    • Nne integration (#32255) · feb2e476
      denglin-github committed
      * Add dlnne engine runtime
      
      * Fix log
      
      * Remove const_cast and remove modifications unrelated to dlnne, + clang-format
      
      * Fix CMakeList format error
      
      * Add copyright message
      
      * Fix dlnne CMakeList.txt
      
      * Add some paddlepaddle_pass to support more networks
      
      * Fix some format bug
      feb2e476
  15. 21 Apr 2021, 1 commit
  16. 19 Apr 2021, 1 commit
    • [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen committed
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * 【NPU】fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and  add wait before sync operation (#31956)
      
      * enable async copy and  add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: pangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and  add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
      cbe5c9f8
  17. 15 Apr 2021, 2 commits
    • Customizable Python Layer in Dygraph (#32130) · 29f65225
      WeiXin committed
      * custom python backward
      
      * polish up the code
      
      * polish up the code
      
      * polish up the code.
      
      * Fix code format and comments.
      
      * Delete redundant files.
      
      * add unittest.
      
      * edit unittest.
      
      * edit unittest.
      
      * Remove redundant header files.
      
      * Improve coverage and remove redundant code.
      
      * support saving for backward.
      
      * polish code according to comments.
      
      * Add support type for PyLayer.
      
      * Modify the DOC.
      
      * polish Doc.
      
      * polish Doc.
      
      * polish Doc.
      
      * polish Doc.
      
      * polish Doc.
      
      * polish Doc.
      
      * polish code and make the code robust.
      
      * Modify the code format.
      29f65225
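      A minimal sketch of a custom layer built on this feature, assuming the interface matches the public paddle.autograd.PyLayer API (static forward/backward methods plus a ctx object for saving tensors):
      
      ```python
      import paddle
      from paddle.autograd import PyLayer
      
      class Cube(PyLayer):
          @staticmethod
          def forward(ctx, x):
              ctx.save_for_backward(x)       # stash tensors needed by backward
              return x ** 3
      
          @staticmethod
          def backward(ctx, dy):
              x, = ctx.saved_tensor()
              return 3 * x ** 2 * dy         # d(x^3)/dx = 3x^2
      
      x = paddle.to_tensor([2.0], stop_gradient=False)
      y = Cube.apply(x)
      y.backward()
      print(x.grad)                          # expect 12.0
      ```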
    • 【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197) · e6bc358d
      zhang wenhui committed
      * merge 31065
      
      * Fix typo of selected_npus (#31230)
      
      * merge 31249
      
      * [NPU] Support npu op pow and pow grad (#31247)
      
      * [NPU] Support npu op: (1) pow (2) pow_grad
      
      * Support fp16
      
      * Fix pow npu fp16 test (#31256)
      
      * support list of list attribute for NPU (#31299)
      
      * support list of list attribute for NPU
      
      * fix compile problem
      
      * fix reference
      
      * [NPU] Support npu op: (1) slice (2) slice_grad (#31275)
      
      * fix reading flags from env (#31329)
      
      * merge 31347
      
      * [NPU] Support npu op layer_norm and layer_norm_grad (#31310)
      
      * init commit, add layer_norm npu kernel
      
      * fix typo
      
      * add unittest
      
      * add unittest
      
      * fix bug
      
      * fix bug
      
      * refine ut
      
      * [NPU] add npu kernel for equal op (#31393)
      
      * add npu kernel for equal op
      
      * refine code
      
      * add more ut
      
      * update year
      
      * [NPU] Support npu kernel for shape op  (#31427)
      
      * add shape npu
      
      * fix
      
      * fix
      
      * fix endif (#31431)
      
      * Fix pow, use fillD instead of broadcast (#31433)
      
      * Fix pow, refine code (#31440)
      
      * fix cmake of cryptopp to avoid downloading every time (#31451)
      
      * [NPU] squeeze and unsqueeze op for ascend (#31452)
      Co-authored-by: root <xiayanming@baidu.com>
      
      * Support npu kernel for gather op (#31458)
      
      * add gather npu op
      
      * code review done
      
      * update python new line
      
      * precommit
      
      * fix review
      
      * del commit
      
      * 【NPU】add scale op for npu (#31499)
      
      * add scale npu
      
      * fix
      
      * fix
      
      * Support TensorFromVector, TensorToVector of bool type (#31518)
      
      * support TensorFromVector, TensorToVector of bool type
      
      * add ut
      
      * fix compile problem
      
      * 【NPU】support npu kernel for fill_constant op (#31521)
      
      * add fill_constant npu
      
      * add fill_constant npu
      
      * fix
      
      * cherry-pick 31422, solve conflict
      
      * 【NPU】Support npu kernel for matmul op (#31544)
      
      * add matmulv2_npu
      
      * add matmul
      
      * add matmul
      
      * [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)
      
      * [NPU] Support npu op elementwise_max (#31574)
      
      * 【NPU】add relu op for  npu (#31515)
      
      * add relu npu
      
      * fixed
      
      * fix
      
      * 【NPU】Support npu kernel for reshape2 op (#31524)
      
      * add reshape2 npu
      
      * add reshape2
      
      * [NPU] Support npu kernel for gather op fix bug (#31541)
      
      * add gather npu op
      
      * code review done
      
      * update python new line
      
      * precommit
      
      * fix review
      
      * del commit
      
      * update gather_grad
      
      * fix bug
      
      * fix bug
      
      * [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)
      
      * Support npu kernel for amp_check_finite_and_unscale_npu op
      
      * support EnforceNotMet exception
      
      * fix exception bug
      
      * modify python unittest
      
      * precommit
      
      * update c++ unittest
      
      * fix review
      
      * fix review
      
      * [NPU] accuracy op (#31492)
      
      * accuracy op
      
      * fix license
      
      * fix
      
      * add test and fix bug
      
      * [NPU] add Assign OP (#31561)
      
      * add assign op
      
      * add test assign npu test
      
      * delete ifdef
      Co-authored-by: oyjxer <1728722986@qq.com>
      
      * [NPU] fix npu op elementwise_mul_grad (#31592)
      
      * 【NPU】Support npu op gelu and gelu_grad (#31530)
      
      * Support npu op gelu and gelu_grad
      
      * Support npu op gelu and gelu_grad
      
      * [NPU] fix assign cmake (#31595)
      
      * fix gather_grad bug (#31607)
      
      * [NPU] add range op (#31560)
      
      * add range op
      
      * fix codestyle; call GetSize directly
      Co-authored-by: oyjxer <1728722986@qq.com>
      
      * 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)
      
      * Support npu op elementwise_div and elementwise_div_grad
      
      * Support npu op elementwise_div and elementwise_div_grad
      
      * Support npu op elementwise_div and elementwise_div_grad
      
      * [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)
      
      * [NPU] Support npu op logicalnot_op (#31534)
      
      * [NPU] Support npu op elementwise_min (#31575)
      
      * [NPU] Support npu op elementwise_pow (#31576)
      
      * [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)
      
      * [npu] support npu kernel `table_lookup_v2`
      
      * clean up
      
      * +python test
      
      * +cmake
      
      * clean up
      
      * remove int8 kernel
      + python unittest for fp16
      
      * clean up
      
      * [NPU] support npu kernel for `less_than` (#31327)
      
      * [npu] support npu kernel for `less than`
      
      * remove int* kernel
      
      * cleanup
      
      * [NPU] Support npu kernel scatter op (#31624)
      
      * Support npu kernel scatter op
      
      * Add more test
      
      * [NPU] fix allocator min chunk size (#31632)
      
      * [NPU] Support NPU kernel cast op (#31635)
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * [NPU] add npu kernel for sgd (#31639)
      
      * 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)
      
      * add reduce_sum
      
      * fix broadcastd
      
      * fix test
      
      * fix
      
      * add unsqueeze in reduce_sum
      
      * add template
      
      * add unittest for keep_dim
      
      * test reduce_all
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * [NPU] add npu kernel for adam (#31644)
      
      * add npu kernel for adam
      
      * refine code
      
      * disable test
      
      * modify atol
      
      * 【NPU】Support npu kernel for mul op (#31584)
      
      * add mul
      
      * add test mul
      
      * [NPU] add npu kernel for softmax_with_cross_entropy (#31656)
      
      * init
      
      * fix bugs
      
      * [NPU] add npu kernel for mean Op (#31562)
      
      * update mean op
      
      * update mean op
      
      * give a better test activation
      Co-authored-by: oyjxer <1728722986@qq.com>
      
      * Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)
      
      This reverts commit 468ac699.
      
      * 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)
      
      * update unittest
      
      * add TensorCopy in npu grad kernel
      
      * [NPU] Support npu op `expand` (#31405)
      
      * [npu] support npu kernel  for `expand`
      
      * [NPU] fix shape of dx in mul_grad (#31675)
      
      * fix shape of dx
      
      * refine code
      
      * [NPU] add Increment op (#31563)
      
      * add increment
      
      * fix
      
      * update test increment op inplace
      
      * update increment op
      
      * increment b = 2
      Co-authored-by: oyjxer <1728722986@qq.com>
      
      * [NPU] add NPU add topk  (#31596)
      
      * add topk op
      
      * add cmake
      
      * update topk npu op
      
      * refactor func
      
      * fix bug where the test did not run the NPU TopKD kernel
      
      * NPUPlace(4) to NPUPlace(0)
      
      * update comment
      Co-authored-by: oyjxer <1728722986@qq.com>
      
      * [NPU] Support NPU kernel sum op (#31671)
      
      * [NPU] npu support `transpose` (#31486)
      
      * cherry-pick 31564, solve conflict
      
      * [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)
      
      * [NPU] Support testing grad of NPU ops in OpTest (#31697)
      
      * [NPU] Support NPU kernel of stack op (#31711)
      
      * [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)
      
      * [NPU] fix reshape npu op kernel (#31726)
      
      * rename npu op file
      
      * fix reshape
      
      * [NPU] change transpose to transpose2 (#31734)
      
      * change transpose to transpose2
      
      * fix bug
      
      * [NPU] Support  mean npu kernel (#31729)
      
      * [NPU] fix some bugs of npu op (#31739)
      
      * fix softmax
      
      * fix mean
      
      * fix lookup_table_v2
      
      * 【NPU】Fix npu kernel elementwise_div_grad  (#31753)
      
      * [NPU] fix the grad kernel diff bug of gather op (#31757)
      
      * fix gather grad kernel diff
      
      * fix gather grad kernel diff
      
      * fix gather review bug
      
      * 【NPU】Fix reshape test & add grad test (#31776)
      
      * fix
      
      * fix
      
      * [NPU] support fp16 for npu accuracy op (#31797)
      
      * [NPU] support list of tensor input (#31801)
      
      * support list of tensor as npu input
      
      * add comment
      
      * fix typo
      
      * fix typo
      
      * [NPU] add npu kernel for concat op (#31695)
      
      * add npu kernel for concat op
      
      * add npu kernel for concat op
      
      * refine code
      
      * update
      
      * refine concat_grad
      
      * [NPU] Support npu kernel for op elementwise_floordiv (#31822)
      
      * [NPU] fix bug of lookup_table_v2_grad (#31834)
      
      * [NPU] support default stream (#31510)
      
      * [NPU] support mixed precision input for npu layer norm (#31847)
      
      * support mixed precision input for npu layer norm
      
      * fix layer_norm npu kernel
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * 【NPU】Support npu kernel for update_loss_scaling op (#31830)
      
      * add update_loss_scaling_npu NPU kernel
      
      * change TensorFromVec to Memset
      
      * fix compile problem (#31850)
      
      * [NPU] support npu for conditional_block op (#31854)
      
      * 【NPU】Add int dtype kernel for reshape2 op (#31864)
      
      * fix
      
      * fix
      
      * [NPU] fix some op bugs (#31855)
      
      * fix some op bugs
      
      * fix some bugs
      
      * follow comments
      
      * fix log level
      
      * add ut
      
      * [NPU] support fp16 of input for api pow (#31871)
      
      * [NPU] add npu kernel for truncated_gaussian_random op (#31654)
      
      * init
      
      * add todo
      
      * add npu kernel for truncated_gaussian_random
      
      * add sync
      
      * fix concat_grad
      
      * fix typo
      
      * fix compile
      
      * fix compile
      
      * fix compile
      
      * fix compile
      
      * fix compile
      
      * fix compile
      
      * fix code style
      
      * fix code style
      
      * fix code
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix code style
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: Reventon_L <luyuxiang1994@qq.com>
      Co-authored-by: root <xiayanming@baidu.com>
      Co-authored-by: oyjxer <1728722986@qq.com>
      Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
      Co-authored-by: OleNet <olenet@126.com>
      Co-authored-by: Meiyim <chen_xuyi@outlook.com>
      Co-authored-by: oyxuan-11 <963650125@qq.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
      e6bc358d
  18. 09 Apr 2021, 1 commit
    • [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code stype
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
  19. 06 Apr 2021, 1 commit
  20. 03 Mar 2021, 1 commit
  21. 24 Feb 2021, 1 commit
  22. 18 Jan 2021, 1 commit
  23. 12 Jan 2021, 1 commit
  24. 24 Dec 2020, 1 commit
  25. 23 Dec 2020, 1 commit
  26. 07 Dec 2020, 1 commit
    • Compiling operator libraries with Unity build (#29130) · 671555ed
      LoveAn committed
      * Compiling operator libraries with Unity Build on Windows CPU.
      
      * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
      
      * Add option in windows ci script, no_test, test=windows_ci
      
      * Optimize parallel compiling, test=develop
      
      * remove limit of parallel compile and skip some ops in UB, test=develop
      
      * remove changes of header file, test=develop
      
      * remove changes of header file, test=develop
      
      * fix test_eye_op unittest failure, test=develop
      
      * Compiling operator libraries with Unity Build on Linux, test=develop
      
      * set default WITH_UNITY_BUILD=OFF, test=develop
      
      * Move unity build rules into a single file and add comment, test=develop
      
      * optimize parallel compilation, test=develop
      
      * fix undefined reference error on coverage ci, test=develop
      671555ed
  27. 03 Dec 2020, 1 commit
  28. 17 Nov 2020, 1 commit
  29. 04 Nov 2020, 1 commit
    • enhance the op_version_registry, test=develop (#28347) · 21a63f6f
      石晓伟 committed
      * enhance the op_version_registry, test=develop
      
      * add unittests, test=develop
      
      * enhance the op_version_registry, test=develop
      
      * fix bugs, test=develop
      
      * revert pybind_boost_headers.h, test=develop
      
      * fix a attribute bug, test=develop
      21a63f6f
  30. 01 Oct 2020, 1 commit
  31. 22 Sep 2020, 1 commit
  32. 09 Sep 2020, 1 commit
  33. 24 Aug 2020, 1 commit
  34. 18 Aug 2020, 1 commit
  35. 13 Aug 2020, 1 commit
    • [OpDevOptimize] Add common infershape functions (#26096) · ffe52b44
      Leo Chen committed
      * add unchanged infershape function
      
      * add broadcast infershape function
      
      * fix bug
      
      * rename infershape functions
      
      * add UnaryOpUnchangedInferShapeCheckAxis
      
      * add error message
      
      * add test for common infer shape functions
      
      * don't update existing ops
      
      * don't update op_desc.h
      
      * add more test
      
      * add error check, refine error message
      ffe52b44
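      A minimal sketch of the broadcast shape-inference rule such helpers implement, in plain Python (numpy-style trailing-dimension broadcasting; -1 marks a dimension unknown at compile time; not Paddle's exact implementation):
      
      ```python
      def broadcast_infer_shape(x_dims, y_dims):
          # Align both shapes from the trailing dimension and broadcast pairwise.
          out = []
          for i in range(1, max(len(x_dims), len(y_dims)) + 1):
              dx = x_dims[-i] if i <= len(x_dims) else 1
              dy = y_dims[-i] if i <= len(y_dims) else 1
              if dx == -1 or dy == -1:
                  out.append(-1)             # unknown at compile time stays unknown
              elif dx == dy or dx == 1 or dy == 1:
                  out.append(max(dx, dy))    # standard broadcasting rule
              else:
                  raise ValueError(f"incompatible dims {dx} vs {dy}")
          return out[::-1]
      
      print(broadcast_infer_shape([2, 3, 1], [3, 4]))  # -> [2, 3, 4]
      ```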
  36. 29 Jul 2020, 1 commit
  37. 13 Jul 2020, 1 commit
    • [Dy2stat] Fix Memory Optimization in run_program_op and Add SimNet as Unit Test (#25383) · f9ac5fb9
      Huihuang Zheng committed
      Add Similarity Net (SimNet) as a unit test. During the unit test, we found three problems:
      
      1. run_program_op has a memory optimization error when running a dy2stat net multiple times.
      2. The support for SelectedRows can cause problems in dy2stat.
      3. The return syntax has problems.
      
      This PR fixes problem 1 and only adjusts the code related to problems 2 and 3 to keep the PR small. I will fix those two problems in the next PR(s).
      f9ac5fb9