1. 06 Mar, 2023 1 commit
    • [phi decoupling] decouple dependency to device_context in phi (Part 1) (#50865) · a1006b2b
      Committed by Huang Jiyi
      * move DeviceContextPool to phi
      
      * add EmplaceExternalContextFunc
      
      * update namespace
      
      * update cmake
      
      * fix bugs and create context_pool_impl.h
      
      * replace platform::is_xxx_place
      
      * fix bugs
      
      * update generator
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix enforce usage
      
      * Revert "fix enforce usage"
      
      This reverts commit 5f521f08a69713cee506e64a00ec6d9fba709e27.
      
      * fix bugs
      
      * rm XPUDeviceContext and CustomDeviceContext
      
      * fix bugs
      
      * fix context init bug
      
      * fix bugs after merge
      
      * fix bugs
      
      * fix name
      
      * fix mutable_data
      
      * update and fix bugs
      
      * fix bugs
      
      * update
      
      * fix bugs
      
      * fix name
      
      * fix bugs
      
      * merge
      
      * fix bugs
      
      * create context_pool in phi/backends
      
      * create context_pool in phi/backends
      
      * fix bugs
      
      * fix xpu bugs
      
      * fix rocm bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * fix xpu bugs
      
      * update
      
      * update
      
      * fix bugs
      
      * fix bugs
  2. 03 Mar, 2023 1 commit
  3. 21 Feb, 2023 1 commit
  4. 01 Feb, 2023 1 commit
    • H2D data transfer optimization for split kernel (#49086) · 057ba778
      Committed by limingshu
      * profile reduce kernel for fp16 and reduceHigherdim
      
      * use reinterpret_cast
      
      * fix for CI on ROCm
      
      * add Macro for ROCm
      
      * ROCm CI config
      
      * ROCm CI config
      
      * unit test repair
      
      * pull
      
      * add common_funcs.h
      
      * reduceType
      
      * Update reduce_function.h
      
      * not higher
      
      * rename
      
      * implementation of matmul using cublasLt instead of cublas
      
      * cublasLt bugfix
      
      * Update matmul_kernel_impl.h
      
      * Update matmul_kernel_impl_via_blasLt.h
      
      * for-loop-algo
      
      * PR comments changes
      
      * add macro
      
      * ci unused variable isCublasLt
      
      * ci unused variable isCublasLt macro
      
      * split matmul to autotune
      
      * rewrite the split kernel with segmented_array
      
      * rewrite the split kernel with segmented_array
      
      * rewrite the split kernel with segmented_array
      
      * add some methods for cuda_graph
      
      * fix bugs for rocm
      
      * change for ci-error
      
      * I don't know why ci-model-benchmark reports an error, so I restored the original code to see whether it works.
      
      * add some changes for passing mode_benchmark and coverage ci
      
      * fix ci error
      
      * fix ci-rocm error
      
      * add some changes for header
      
      ---------
      Co-authored-by: zhangbopd <1299246947@qq.com>
      Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
  5. 18 Jan, 2023 1 commit
  6. 09 Jan, 2023 1 commit
  7. 03 Jan, 2023 1 commit
  8. 20 Dec, 2022 1 commit
  9. 01 Sep, 2022 1 commit
  10. 18 Jul, 2022 1 commit
  11. 15 Jun, 2022 1 commit
  12. 10 Jun, 2022 1 commit
  13. 07 Jun, 2022 1 commit
  14. 05 Jun, 2022 1 commit
  15. 14 Mar, 2022 1 commit
    • fix gpu callback (#40445) · 2c21d240
      Committed by Leo Chen
      * fix gpu context callback
      
      * fix gpu callback
      
      * fix callback early destruct problem
  16. 25 Feb, 2022 1 commit
  17. 23 Feb, 2022 1 commit