1. 08 Dec, 2022 1 commit
    • [PHI decoupling] move cuda_graph from fluid to phi (#48686) · a4d9851b
      huangjiyi committed
      * move cuda_graph from fluid to phi
      
      * move device_memory_aligment from fluid to phi
      
      * Revert "move device_memory_aligment from fluid to phi"
      
      This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.
      
      * update xpu cmake
  2. 07 Dec, 2022 2 commits
  3. 06 Dec, 2022 1 commit
  4. 30 Nov, 2022 1 commit
    • use correct xpu stream for synchronization (#48470) · 16562a9d
      james committed
      Some legacy code still uses xpu_wait() for stream sync, but it only
      syncs the default stream. This PR replaces those calls with
      dev_ctx.Wait() to ensure that the correct stream is always used.
  5. 28 Nov, 2022 1 commit
  6. 25 Nov, 2022 1 commit
  7. 24 Nov, 2022 1 commit
  8. 18 Nov, 2022 1 commit
    • correct sync behavior for XPU distributed training (#47882) · aafa9820
      james committed
      * correct sync behavior for XPU distributed training
      
      XPU supports an event mechanism similar to CUDA events, so it is
      advisable to use an event to sync compute/comm streams for performance.
      However, this mechanism has never been fully tested, and inconsistent
      loss/ending_epochs have been reported. Therefore, this PR replaces
      event sync with stream waiting as a temporary solution.
      
      * remove compile warning
  9. 10 Nov, 2022 1 commit
    • XPU multi-card support eager mode (#47445) · 3b91f8f3
      james committed
      * XPU support eager mode
      
      * add unittest for XPU eager mode
      
      * minor bugfix
      
      * minor bugfix, test=kunlun
      
      * correct copyright info
      
      * 1. remove unused vars/funcs
      2. ProcessGroupBKCL inherit from ProcessGroupStream
      
      * bugfix for fp16 in eager mode multi-card, test=kunlun
      
      * rebase & fix a few issues
      
      * use new processgroup interface, test=kunlun
      
      * fix compile issue, test=kunlun
  10. 01 Nov, 2022 1 commit
    • Adapting device-specific Extra Attributes for the PHI kernel (#46342) · c923e6c9
      Chen Weihang committed
      * add extra attr property set
      
      * add type_info for all context
      
      * add onednn context to all context
      
      * fix context compile error
      
      * simplify conv kernel args
      
      * pass runtime attr into dev_ctx
      
      * fix macro error
      
      * clear conv_grad_kernel extra args
      
      * merge conv_grad_grad into conv_grad
      
      * clear conv2d_grad_grad extra attrs
      
      * clear yaml and eager extra attr
      
      * fix conv1d error
      
      * change to thread local
      
      * fix npu compile failed
      
      * try to fix windows compile failed
      
      * add conv2d onednn phi kernel
      
      * fix ci bugs (#36)
      
      * fix compile bugs (#38)
      
      * fix extra input transform bug (#39)
      
      * support dynamic created attr (#40)
      
      * reset extra info gen code
      
      * rm conv_grad_grad kernel
      
      * reimpl pass attr adapting
      
      * add int attr support
      
      * remove vector inputnames creating
      
      * fix map at error
      
      * Update paddle/phi/kernels/onednn/conv_grad_kernel.cc
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
      
      * remove useless extra attrs
      
      * replace mkldnn_engine by onednn_engine
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
  11. 18 Aug, 2022 1 commit
  12. 10 Aug, 2022 1 commit
  13. 19 Jul, 2022 1 commit
  14. 15 Jul, 2022 1 commit
  15. 06 Jul, 2022 1 commit
  16. 05 Jun, 2022 1 commit
  17. 04 Jun, 2022 1 commit
  18. 05 May, 2022 1 commit
  19. 01 Mar, 2022 1 commit
  20. 28 Feb, 2022 1 commit
  21. 24 Feb, 2022 1 commit
  22. 22 Feb, 2022 1 commit
    • change Vector to std::vector and provide MixVector class as a helper … (#39559) · 728c0624
      xiongkun committed
      * change Vector to std::vector and provide MixVector class as a helper wrapper class
      
      * solve the multi-gpu hang problem
      
      * remove the duplicate template instantiation
      
      * Copy vector to cpu
      
      * add CopyToCPU
      
      * xxx
      
      * final version: fix the problem of all reduce
      
      * remove mixvector dependence
      
      * fix
      
      * merge
      
      * fix code
      
      * fix by CI
  23. 20 Feb, 2022 1 commit