1. 28 2月, 2022 1 次提交
  2. 23 2月, 2022 1 次提交
  3. 14 2月, 2022 1 次提交
  4. 29 1月, 2022 1 次提交
    • L
      Add xpu2 compiler (#37254) · 92da5055
      Liu-xiandong 提交于
      * Add XPU compiler for paddle, test=develop
      
      * clean code
      
      * clean useless code
      
      * clean useless code
      
      * clean useless code
      
      * test
      
      * add include path
      
      * use clang compiler
      
      * xpu2.cmake
      
      * XPU2 compiler passed
      
      * update
      
      * update after pten
      
      * combination the WITH_XPU and WITH_XPU2
      
      * update the fuse operation in WITH_XPU and WITH_XPU2
      
      * update
      
      * update
      
      * update
      
      * fix the merge error
      
      * update
      
      * update the code
      
      * update the code
      
      * add run_kp_kernel flag
      
      * update
      
      * update
      
      * fix prepared type_ bug
      
      * clean and update the code
      
      * reset the kernel_primitives
      
      * update
      
      * clean the code
      
      * delete useless comment
      
      * fix the bug in WITH_XPU
      
      * update
      
      * update
      
      * modify the abi
      
      * delete some useless code
      
      * Parameter automation in xpu compilation
      
      * Parameter automation in xpu compilation
      
      * delete kps in cmake
      
      * delete useless comment
      
      * clean the code
      
      * clean the code
      92da5055
  5. 26 1月, 2022 1 次提交
    • L
      [pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1
      Leo Chen 提交于
      * update cmake file to remove fluid kernel
      
      * add pten declaration.h to where pybind.h used
      
      * fix sync_bn and tensorrt_engine
      
      * refine detection_library
      
      * fix interpreter_core
      
      * support eager legacy
      
      * fit eager legacy for pten
      
      * fall back to cpu if not found kernel
      
      * fix compile problem
      
      * fix compile problem
      
      * refine fallback logic
      
      * fit operator.run()
      
      * fix xpu compile
      
      * fit for new_exec
      
      * add REGISTER_OP_WITHOUT_GRADIENT
      
      * un-cache pt_kernel_context
      
      * fix compile
      
      * fix cudnn
      
      * fix compiling with on_infer
      
      * fix mkldnn
      
      * fix isfinite_v2
      
      * fix xpu problem
      
      * fix op_device
      
      * refine fallback for xpu
      
      * fix xpu compile
      
      * merge develop
      
      * refine code format
      
      * fix compile
      
      * fix compile
      
      * add data_transfer
      
      * fix PreparePtenData
      
      * fix cpu context
      
      * merge develop
      
      * fix compile
      
      * fix error device context
      
      * fix xpu
      
      * fix dev_ctx
      3ab9aef1
  6. 10 1月, 2022 1 次提交
    • H
      Add gpu kernel for new api : linalg.lstsq (#38621) · 405103d8
      Haohongxiang 提交于
      * add lstsq gpu kernel
      
      * update
      
      * add docs_en
      
      * modify ut
      
      * fix bugs
      
      * modify example in docs_en
      
      * remove lstsq_op.cu from ROCM cmake
      
      * modify docs_en
      
      * modify docs_en
      
      * modify docs_en
      
      * remove unneccessary TensorCopy
      405103d8
  7. 30 12月, 2021 1 次提交
  8. 24 12月, 2021 1 次提交
  9. 20 12月, 2021 1 次提交
  10. 27 10月, 2021 1 次提交
    • H
      add paddle.linalg.eigvalsh API (#35615) · 9f9ed3ae
      huangjun12 提交于
      * add eigvalsh with is_test
      
      * add eigvalsh op
      
      * fix backward bug
      
      * forward and backward, float and complex, unittest
      
      * remove eigvalsh_helper.h
      
      * remove changes of cusolver.h
      
      * fix unittest
      
      * fix unittest bug
      
      * update code following eigh
      
      * fix test
      
      * update lapack
      
      * pull develop
      
      * update funcor
      
      * fix unittest bug
      
      * fix details
      
      * add tensor_method_func
      
      * fix notes
      9f9ed3ae
  11. 25 10月, 2021 2 次提交
    • T
      add some ops to train ssd on kunlun (#36407) · 50778ad6
      TTerror 提交于
      * add some ops to train ssd on kunlun
      
      * add some ops to train ssd on kunlun
      
      * add some ops to train ssd on kunlun
      
      * update cast op unittest
      
      * update cast op unittest
      
      * update cast op unittest
      
      * update xpu cmake
      
      * update cast unittest
      50778ad6
    • Z
      add op: fused_feedforward(forward) (#35843) · b18cbfb2
      zhangkaihuo 提交于
      这个PR只包含fused_feedforward前向的代码。
      
      相关kernel实现:fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias
      
      fused_feedforward是一个融合算子,该算子对transformer模型的feed forward层的算子进行融合和封装,使得前端只呈现一个接口,通过融合减少部分访存和kernel launch的时间,以此提升性能。
      b18cbfb2
  12. 22 10月, 2021 1 次提交
    • L
      Fused attention op forward (#35905) · d4906214
      Li Min 提交于
      功能:本PR的目标是提高attention模块的计算性能。
      为了减少框架层对op的调度开销,本PR通过在C++层手动实现attention模块,对外提供attention 大op;
      为了减少防存开销,本PR采取了两种优化方法:
      (1)在q,k,v计算时通过共享输入X,将该处的gemm,transpose和bias add从三次调用减少为一次;
      (2)使用kernel融合优化技术,在不同cuda kernel之间通过寄存器传输数据;
      d4906214
  13. 19 10月, 2021 1 次提交
    • Y
      [paddle.linalg.qr] Add the Qr Operator (#35742) · 34d785c2
      Yulong Ao 提交于
      * Add QR decomposition op
      
      * Change codes to adapt to new svd_helper
      
      * Update linalg.py
      
      Restore the deleted comma
      
      * Restore the deleted line
      
      * Update linalg.py
      
      * Update linalg.py
      
      * Improve the qr code by reviews
      
      * Update QR based on CI results
      
      * Update qr doc, test=document_fix
      
      * Change unsafe and ill-formed codes
      34d785c2
  14. 14 10月, 2021 1 次提交
  15. 28 9月, 2021 1 次提交
  16. 16 9月, 2021 1 次提交
  17. 09 9月, 2021 1 次提交
    • 0
      Add matrix_rank Op and it's GPU and CPU kernel (#34823) · eb1fbf12
      0x45f 提交于
      * init matrix_rank op, add matrix_rank CPU code and test
      
      * add GPU kernel, remove svd_eigen.h
      
      * add CPU kernel when tol is tensor
      
      * add cpu and gpu code when tol is tensor
      
      * fix CI-ROCM error
      
      * add matrix_rank API describe, fix PR-CI-Py3 error
      
      * fix PR-CI-Windows error, add matrix_rank API test
      
      * delete useless comments
      
      * fix review
      
      * add my code in svd_helper.h
      
      * update doc commets
      
      * remove spaces
      eb1fbf12
  18. 02 9月, 2021 1 次提交
    • X
      Add SVD Op and it's GPU and CPU kernel (#34953) · 7e5fb462
      xiongkun 提交于
      * Add SVD Op and it's GPU and CPU kernel
      
      * Remove CUDAPlace in test_svd_op, make the test available in CPU package
      
      * modfity the file
      
      * fix windows bug/ fix ROCM / fix test timeout
      
      * for pass the CIs
      
      * improve error report
      
      * for code review
      
      * some modification to test_svd_op
      
      * change python code style
      
      * expose the svd interface for document
      7e5fb462
  19. 16 6月, 2021 1 次提交
  20. 07 5月, 2021 1 次提交
  21. 06 5月, 2021 1 次提交
    • R
      [ROCM] bugfix for unittest (#32392) · 31392627
      ronnywang 提交于
      * fix test_unpool_op
      
      * fix test_inplace_addto_strategy
      
      * fix test_conv2d_fusion_op
      
      * fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor
      
      * fix test_dot_op
      
      * fix test_correlation_op
      
      * fix tracer
      
      * fix test_memcpy_op
      31392627
  22. 29 4月, 2021 1 次提交
  23. 09 4月, 2021 1 次提交
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen 提交于
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, bug failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinilize
      
      * add system allocatot test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code stype
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: Nxiayanming <41795079@qq.com>
      Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
      Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
      Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
  24. 27 1月, 2021 1 次提交
    • J
      REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) · f8da5536
      jakpiase 提交于
      * added external reorder to profiler
      
      * resolved conflict
      
      * added enable_static
      
      * initial version of lstm, not working yet
      
      * added lstm to operators.cmake
      
      * added vanilla lstm mkldnn op
      
      * added peephole weights integration
      
      * minor changes
      
      * added formatting
      
      * added fusion_lstm_mkldnn to static_whitelist
      
      * added formatting
      
      * removed comment
      
      * moved use_peepholes attribute inside is_cached block
      
      * reverted wrong changes
      
      * minor formatting change
      
      * minor changes
      
      * changed stream handling
      
      * minor change
      
      * added datatype to GetExpectedKernelType()
      
      * added reading stream from TLS
      f8da5536
  25. 26 1月, 2021 2 次提交
  26. 21 1月, 2021 1 次提交
  27. 16 12月, 2020 1 次提交
    • Y
      添加rocm平台支持代码 (#29342) · 76738504
      Y_Xuan 提交于
      * 添加rocm平台支持代码
      
      * 修改一些问题
      
      * 修改一些歧义并添加备注
      
      * 修改代码格式
      
      * 解决冲突后的代码修改
      
      * 修改operators.cmake
      
      * 修改格式
      
      * 修正错误
      
      * 统一接口
      
      * 修改日期
      76738504
  28. 07 12月, 2020 1 次提交
    • L
      Compiling operator libraries with Unity build (#29130) · 671555ed
      LoveAn 提交于
      * Compiling operator libraries with Unity Build on Windows CPU.
      
      * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
      
      * Add option in windows ci script, no_test, test=windows_ci
      
      * Optimize parallel compiling, test=develop
      
      * remove limit of parallel compile and skip some ops in UB, test=develop
      
      * remove changes of header file, test=develop
      
      * remove changes of header file, test=develop
      
      * fix test_eye_op unittest failed, test=develop
      
      * Compiling operator libraries with Unity Build on Linux, test=develop
      
      * set default WITH_UNITY_BUILD=OFF, test=develop
      
      * Move unity build rules into a single file and add comment, test=develop
      
      * optimize parallel compilation, test=develop
      
      * fix undefined reference error on coverage ci, test=develop
      671555ed
  29. 12 11月, 2020 1 次提交
  30. 27 9月, 2020 1 次提交
  31. 23 9月, 2020 1 次提交
  32. 16 9月, 2020 1 次提交
  33. 21 8月, 2020 1 次提交
    • Q
      support Baidu Kunlun AI Accelerator (#25959) · 138ecf24
      QingshuChen 提交于
      * support Baidu AI Accelerator
        * test=kunlun
      
      * minor
       * test=kunlun
      
      * support xpu op in separate file
       * test=kunlun
      
      * update XPU error message and remove duplicated code
      
       * test=kunlun
      
      * minor
       * test=kunlun
      
      * minor
       * test=kunlun
      138ecf24
  34. 13 8月, 2020 1 次提交
    • L
      [OpDevOptimize] Add common infershape functions (#26096) · ffe52b44
      Leo Chen 提交于
      * add unchaged infershape function
      
      * add broadcast infershape function
      
      * fix bug
      
      * rename infershape functions
      
      * add UnaryOpUnchangedInferShapeCheckAxis
      
      * add error message
      
      * add test for common infer shape functions
      
      * dont update existed ops
      
      * dont update op_desc.h
      
      * add more test
      
      * add error check, refine error message
      ffe52b44
  35. 06 8月, 2020 1 次提交
    • A
      Add oneDNN fusion_gru kernel (#25594) · 68c6160e
      Adam 提交于
      * Add oneDNN fusion_gru kernel and fix fc+gru pass
      test=develop
      
      * Formatting changes
      test=develop
      
      * Lint fixes
      test=develop
      
      * Add memory::format_tag::any to GRU weights
      test=develop
      
      * Fix build with CUDA
      
      * Fix build with CUDA v2
      68c6160e
  36. 30 7月, 2020 1 次提交
  37. 03 4月, 2020 1 次提交
    • C
      update linspace, equal operators to API 2.0 (#23274) · a2e10930
      channings 提交于
      * update linspace, equal operators to API 2.0, test=develop
      
      * equal support higher performance CUDA kernel, test=develop
      
      * update comment of equal&linspace operator, test=develop
      
      * update comment of equal&linspace operator, test=develop
      a2e10930
  38. 11 3月, 2020 1 次提交