1. 01 Feb, 2023 1 commit
    • H2D data transfer optimization for split kernel (#49086) · 057ba778
      limingshu committed
      * profile reduce kernel for fp16 and reduceHigherdim
      
      * use reinterpret_cast
      
      * fix for CI on ROCm
      
      * add Macro for ROCm
      
      * ROCm CI config
      
      * ROCm CI config
      
      * unit test repair
      
      * pull
      
      * add common_funcs.h
      
      * reduceType
      
      * Update reduce_function.h
      
      * not higher
      
      * rename
      
      * implement of matmul using cublasLt instead of cublas
      
      * cublasLt bugfix
      
      * Update matmul_kernel_impl.h
      
      * Update matmul_kernel_impl_via_blasLt.h
      
      * for-loop-algo
      
      * PR comments changes
      
      * add macro
      
      * ci unused variable isCublasLt
      
      * ci unused variable isCublasLt macro
      
      * split matmul to autotune
      
      * rewrite the split kernel with segmented_array
      
      * rewrite the split kernel with segmented_array
      
      * rewrite the split kernel with segmented_array
      
      * add some method for cuda_graph
      
      * fix bugs for rocm
      
      * change for ci-error
      
      * I don't know why ci-model-benchmark reports an error, so I restored the original code to check whether it passes.
      
      * add some changes for passing mode_benchmark and coverage ci
      
      * fix ci error
      
      * fix ci-rocm error
      
      * add some changes for header
      
      ---------
      Co-authored-by: zhangbopd <1299246947@qq.com>
      Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
  2. 31 Jan, 2023 5 commits
  3. 30 Jan, 2023 1 commit
  4. 18 Jan, 2023 1 commit
  5. 16 Jan, 2023 1 commit
    • CUDA12.0 integration (#49539) · 1885d55a
      zlsh80826 committed
      * Update warpctc for cuda-12
      
      * Deprecate cudaProfilerInitialize for CUDA > 11
      
      * Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040
      
      * Add the missing thrust header
  6. 13 Jan, 2023 3 commits
  7. 11 Jan, 2023 1 commit
    • Implement a common segmented array. (#49450) · b1faa562
      Yiqun Liu committed
      * Implement a common PointerArray.
      
      * Polish codes.
      
      * Add including of header file.
      
      * Add the branch of kFix8.
      
      * Fix compiling error.
      
      * Add alignas hint to fix the performance drop.
      
      * Optimize the H2D copy in stack_grad.
      
      * Rename the macro.
      
      * Fix align hint for different compilers.
      
      * Polish the define of PADDLE_ALIGN.
      
      * Fix compiling error.
      
      * Remove the align hint on windows.
  8. 10 Jan, 2023 2 commits
  9. 09 Jan, 2023 2 commits
  10. 04 Jan, 2023 1 commit
  11. 03 Jan, 2023 2 commits
  12. 26 Dec, 2022 1 commit
  13. 20 Dec, 2022 1 commit
  14. 19 Dec, 2022 2 commits
  15. 16 Dec, 2022 1 commit
  16. 15 Dec, 2022 1 commit
  17. 14 Dec, 2022 1 commit
  18. 12 Dec, 2022 2 commits
  19. 08 Dec, 2022 5 commits
  20. 07 Dec, 2022 1 commit
  21. 05 Dec, 2022 5 commits