1. 01 Feb, 2023 1 commit
  2. 31 Jan, 2023 1 commit
  3. 19 Jan, 2023 1 commit
    • [KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9
      jameszhang committed
      * [KUNLUN] add op: maxpool_with_index
      
      * use DeviceContext::Alloc() instead of DenseTensor::mutable_data()
      
      * fix file format
      
      * solve clip unittest failure
      
      * minor fix
      
      * Revert "solve clip unittest failure" since the issue is fixed
      in #49535
      
      This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.
      
      * align with xdnn on the definition of mask in max_pool_with_index
      
      * minor
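The commit above adds a max-pooling op that also returns a mask of argmax positions. The commit does not say how xdnn defines that mask, so the sketch below assumes one common convention: each mask entry is the flat index (`row * width + col`) of the window maximum within the input plane. The function name `max_pool2d_with_index` and its parameters are hypothetical and are not Paddle's or xdnn's API.

```python
def max_pool2d_with_index(x, kernel, stride):
    """Max-pool a 2D list `x` (H x W) without padding.

    Returns (out, mask). ASSUMPTION: mask holds the flat input index
    (row * W + col) of each window's maximum; xdnn may define it differently.
    """
    h, w = len(x), len(x[0])
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out, mask = [], []
    for i in range(out_h):
        out_row, mask_row = [], []
        for j in range(out_w):
            best_val, best_idx = None, -1
            # Scan the kernel x kernel window anchored at (i*stride, j*stride).
            for di in range(kernel):
                for dj in range(kernel):
                    r, c = i * stride + di, j * stride + dj
                    if best_val is None or x[r][c] > best_val:
                        best_val, best_idx = x[r][c], r * w + c
            out_row.append(best_val)
            mask_row.append(best_idx)
        out.append(out_row)
        mask.append(mask_row)
    return out, mask


example = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled, indices = max_pool2d_with_index(example, kernel=2, stride=2)
```

The mask is what a backward pass would use to route each output gradient to the input element that produced the maximum.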
  4. 18 Jan, 2023 3 commits
    • [PHI] remove bitwise and, or, xor (#49916) · 9056cc8b
      RuohengMa committed
      * add reduce_sum_int64 and reduce_sum_int8 xpu kernels
      
      * [PHI] add clip grad kernel with support type float32 and int32
      
      * [PHI unittest] add clip_grad unit test
      
      * adapt code to clang-format
      
      * update xpu api output with clip_grad api
      
      * remove int8 support of reduce_sum xpu kernel since it cannot pass unit tests
      
      * adapt license date, add code for XPUDataType conversion
      
      * add int8 support of reduce_sum
      
      * add reduce_sum unit tests for dtype int64, int8, and add more test cases
      
      * update license date
      
      * remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel
      
      * change license date
    • [XPU] add logical_not op. (#49911) · 60d1199a
      houj04 committed
    • use default XPU stream for computing (#49806) · f6b23d6d
      jameszhang committed
      * revert to use default XPU stream for computing
      
      XPUContext now has a null stream by default. If you want to use a separate stream
       (e.g. in async collective communication), you should create a dedicated XPUContext
      and invoke its XPUContext::CreateStream()
      
      * minor
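The stream rule described in the commit above can be pictured with a toy model. This is only an illustrative sketch: `ToyDevice`, `ToyContext`, `create_stream` and `launch` are hypothetical stand-ins, not the real XPUContext interface. A context starts with a null stream, meaning the device default stream, and only gets its own stream when one is created explicitly.

```python
from collections import defaultdict

DEFAULT_STREAM = None  # a null stream stands for the device's default stream


class ToyDevice:
    """Hypothetical device that keeps one work queue per stream."""

    def __init__(self):
        self.queues = defaultdict(list)
        self._next_id = 1

    def new_stream(self):
        stream_id = self._next_id
        self._next_id += 1
        return stream_id


class ToyContext:
    """Hypothetical stand-in for a device context (not the real XPUContext)."""

    def __init__(self, device):
        self.device = device
        self.stream = DEFAULT_STREAM  # null stream by default

    def create_stream(self):
        # Give this context a dedicated stream; calling it twice is a no-op.
        if self.stream is DEFAULT_STREAM:
            self.stream = self.device.new_stream()
        return self.stream

    def launch(self, name):
        # Queue the work on whichever stream this context owns.
        self.device.queues[self.stream].append(name)


device = ToyDevice()
compute_ctx = ToyContext(device)   # stays on the default stream
comm_ctx = ToyContext(device)      # dedicated context for communication
comm_ctx.create_stream()

compute_ctx.launch("matmul")
comm_ctx.launch("all_reduce")
```

In this model, compute work lands on the default stream while communication work is isolated on the stream owned by the dedicated context.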
  5. 16 Jan, 2023 1 commit
  6. 13 Jan, 2023 4 commits
  7. 12 Jan, 2023 3 commits
  8. 09 Jan, 2023 2 commits
  9. 06 Jan, 2023 1 commit
    • Dev (#49591) · 07db4a9f
      RuohengMa committed
      * add bitwise and, bitwise not, bitwise or and bitwise xor
      
      * correct typo
  10. 27 Dec, 2022 1 commit
  11. 26 Dec, 2022 1 commit
    • fix dlrm qpsproblem (#49171) · c8f76337
      ykkk2333 committed
      * migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun
      
      * fix dlrm throughput problem, test=kunlun
  12. 23 Dec, 2022 1 commit
  13. 22 Dec, 2022 1 commit
  14. 19 Dec, 2022 1 commit
  15. 14 Dec, 2022 1 commit
  16. 09 Dec, 2022 3 commits
  17. 08 Dec, 2022 3 commits
  18. 07 Dec, 2022 2 commits
  19. 06 Dec, 2022 1 commit
  20. 30 Nov, 2022 1 commit
    • use correct xpu stream for synchronization (#48470) · 16562a9d
      james committed
      Some legacy code still uses xpu_wait() for stream sync, which only syncs
      the default stream. This PR replaces those calls with dev_ctx.Wait() to
      ensure that the correct stream is always used.
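The bug this commit addresses can be sketched with a toy model. All names here (`ToyDevice`, `ToyContext`, `wait_default`, `wait`) are hypothetical and only mimic the behavior the commit message describes: a global wait that covers just the default stream, versus a wait issued through the context that owns the work.

```python
class ToyDevice:
    """Hypothetical device tracking unfinished work per stream."""

    def __init__(self):
        self.pending = {None: []}  # None stands for the default stream

    def submit(self, stream, name):
        self.pending.setdefault(stream, []).append(name)

    def wait_default(self):
        # Models a wait that only drains the default stream.
        done, self.pending[None] = self.pending[None], []
        return done

    def wait_stream(self, stream):
        # Drains the work queued on one specific stream.
        done = self.pending.get(stream, [])
        self.pending[stream] = []
        return done


class ToyContext:
    """Hypothetical context bound to one stream."""

    def __init__(self, device, stream=None):
        self.device = device
        self.stream = stream

    def wait(self):
        # Always waits on the stream this context actually uses.
        return self.device.wait_stream(self.stream)


device = ToyDevice()
comm_ctx = ToyContext(device, stream="comm")
device.submit(comm_ctx.stream, "all_reduce")

finished_by_global_wait = device.wait_default()       # misses the comm work
still_pending = list(device.pending["comm"])
finished_by_ctx_wait = comm_ctx.wait()                 # covers the comm work
```

In this model the global wait returns while `all_reduce` is still outstanding, which is the kind of silent under-synchronization that waiting through the context avoids.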
  21. 28 Nov, 2022 1 commit
  22. 25 Nov, 2022 1 commit
  23. 24 Nov, 2022 1 commit
  24. 18 Nov, 2022 1 commit
    • correct sync behavior for XPU distributed training (#47882) · aafa9820
      james committed
      * correct sync behavior for XPU distributed training
      
      XPU supports an event mechanism similar to CUDA events, so it is advisable to
      use an event to sync compute/comm streams for performance. However, this
      mechanism has never been fully tested, and inconsistent loss/ending_epochs
      have been reported. Therefore, this PR replaces event sync with stream
      waiting as a temporary solution.
      
      * remove compile warning
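The performance trade-off mentioned in this commit can be illustrated with a small timeline calculation. This is a simplified model under stated assumptions, not the Paddle or BKCL implementation: kernels on the compute stream run back to back, an event fires as soon as the kernel it follows finishes, and "stream waiting" means waiting for everything queued on the compute stream. The function names are hypothetical.

```python
def finish_times(durations):
    """Completion time of each kernel when they run back to back."""
    elapsed, times = 0, []
    for duration in durations:
        elapsed += duration
        times.append(elapsed)
    return times


def comm_start_with_event(compute_durations, event_after):
    """Communication may start once the kernel at index `event_after` is done."""
    return finish_times(compute_durations)[event_after]


def comm_start_with_stream_wait(compute_durations):
    """Communication starts only after the whole compute stream has drained."""
    return finish_times(compute_durations)[-1]


# Three compute kernels; communication only depends on the first one.
kernel_durations = [2, 3, 5]
event_start = comm_start_with_event(kernel_durations, event_after=0)
stream_wait_start = comm_start_with_stream_wait(kernel_durations)
```

Under these assumptions, event sync lets communication overlap with the remaining kernels, while stream waiting is coarser and delays it, which matches the commit's framing of stream waiting as a safe but temporary choice.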
  25. 10 Nov, 2022 1 commit
    • XPU multi-card support eager mode (#47445) · 3b91f8f3
      james committed
      * XPU support eager mode
      
      * add unittest for XPU eager mode
      
      * minor bugfix
      
      * minor bugfix, test=kunlun
      
      * correct copyright info
      
      * 1. remove unused vars/funcs
      2. ProcessGroupBKCL inherits from ProcessGroupStream
      
      * bugfix for fp16 in eager mode multi-card, test=kunlun
      
      * rebase & fix a few issues
      
      * use new processgroup interface, test=kunlun
      
      * fix compile issue, test=kunlun
  26. 01 Nov, 2022 1 commit
    • Adapting device-specific Extra Attributes for the PHI kernel (#46342) · c923e6c9
      Chen Weihang committed
      * add extra attr property set
      
      * add type_info for all context
      
      * add onednn context to all context
      
      * fix context compile error
      
      * simplify conv kernel args
      
      * pass runtime attr into dev_ctx
      
      * fix macro error
      
      * clear conv_grad_kernel extra args
      
      * merge conv_grad_grad into conv_grad
      
      * clear conv2d_grad_grad extra attrs
      
      * clear yaml and eager extra attr
      
      * fix conv1d error
      
      * change to thread local
      
      * fix npu compile failed
      
      * try to fix windows compile failed
      
      * add conv2d onednn phi kernel
      
      * fix ci bugs (#36)
      
      * fix compile bugs (#38)
      
      * fix extra input transform bug (#39)
      
      * support dynamic created attr (#40)
      
      * reset extra info gen code
      
      * rm conv_grad_grad kernel
      
      * reimpl pass attr adapting
      
      * add int attr support
      
      * remove vector inputnames creating
      
      * fix map at error
      
      * Update paddle/phi/kernels/onednn/conv_grad_kernel.cc
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
      
      * remove useless extra attrs
      
      * replace mkldnn_engine by onednn_engine
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
  27. 18 Aug, 2022 1 commit