1. 03 11月, 2022 1 次提交
  2. 19 10月, 2022 1 次提交
  3. 18 9月, 2022 1 次提交
  4. 14 9月, 2022 1 次提交
  5. 01 8月, 2022 1 次提交
  6. 22 7月, 2022 1 次提交
  7. 18 7月, 2022 1 次提交
  8. 28 6月, 2022 1 次提交
  9. 26 6月, 2022 1 次提交
  10. 24 6月, 2022 1 次提交
  11. 18 6月, 2022 1 次提交
  12. 15 6月, 2022 1 次提交
  13. 13 6月, 2022 1 次提交
  14. 09 6月, 2022 1 次提交
  15. 05 6月, 2022 1 次提交
  16. 04 6月, 2022 1 次提交
  17. 04 5月, 2022 1 次提交
  18. 22 4月, 2022 1 次提交
    • M
      [WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72
      Ming-Xu Huang 提交于
      * Fix leading dimension setting error in fused_gemm_epilogue_grad_op.
      
      * Add dyload to cuBlasLt functions.
      
      * Added cublasLtMatmulAlgoGetHeuristic to improve performance.
      
      * Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue
      
      * Added UTs to FLAGS_cublaslt_exhaustive_search_times
      
      * Added warmup runs in algo searching of Gemm epilogue.
      
      * Update copyright and documents.
      
      * Fixed error handling.
      19650d72
  19. 11 3月, 2022 1 次提交
  20. 28 2月, 2022 2 次提交
  21. 25 2月, 2022 1 次提交
  22. 24 2月, 2022 1 次提交
  23. 20 2月, 2022 1 次提交
  24. 24 1月, 2022 1 次提交
  25. 10 1月, 2022 1 次提交
    • H
      Add gpu kernel for new api : linalg.lstsq (#38621) · 405103d8
      Haohongxiang 提交于
      * add lstsq gpu kernel
      
      * update
      
      * add docs_en
      
      * modify ut
      
      * fix bugs
      
      * modify example in docs_en
      
      * remove lstsq_op.cu from ROCM cmake
      
      * modify docs_en
      
      * modify docs_en
      
      * modify docs_en
      
      * remove unneccessary TensorCopy
      405103d8
  26. 04 1月, 2022 1 次提交
  27. 30 12月, 2021 3 次提交
  28. 29 12月, 2021 1 次提交
  29. 24 12月, 2021 1 次提交
  30. 03 12月, 2021 1 次提交
  31. 27 11月, 2021 1 次提交
    • A
      [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Aganlengzi 提交于
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enfoce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
      72241a6a
  32. 24 11月, 2021 1 次提交
  33. 08 11月, 2021 1 次提交
    • W
      Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a
      wanghuancoder 提交于
      * Use cuda virtual memory management and merge blocks, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * window dll, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * use autogrowthv2 for system allocator, test=develop
      
      * remove ~CUDAVirtualMemAllocator(), test=develop
      
      * refine, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix bug, test=develop
      
      * revert system allocator, test =develop
      
      * revert multiprocessing, test=develop
      
      * fix AutoGrowthBestFitAllocatorV2 mutxt, test=develop
      
      * catch cudaErrorInitializationError when create allocator, test=develop
      
      * fix cuMemSetAccess use, test=develop
      
      * refine cuda api use, test=develop
      
      * refine, test=develop
      
      * for test, test=develop
      
      * for test, test=develop
      
      * switch to v2, test=develop
      
      * refine virtual allocator, test=develop
      
      * Record cuMemCreate and cuMemRelease, test=develop
      
      * refine, test=develop
      
      * avoid out of bounds, test=develop
      
      * rename allocator, test=develop
      
      * refine, test=develop
      
      * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
      
      * for test,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      a1ec1d5a
  34. 02 11月, 2021 1 次提交
  35. 29 10月, 2021 1 次提交
  36. 19 10月, 2021 2 次提交