    [WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72
    Committed by Ming-Xu Huang
    * Fix leading dimension setting error in fused_gemm_epilogue_grad_op.
    
    * Add dynamic loading (dynload) for cuBLASLt functions.
    
    * Add cublasLtMatmulAlgoGetHeuristic to improve performance.

    * Add FLAGS_cublaslt_exhaustive_search_times to the cuBLASLt epilogue.

    * Add unit tests for FLAGS_cublaslt_exhaustive_search_times.

    * Add warmup runs to the algorithm search of the GEMM epilogue.
    
    * Update copyright and documents.
    
    * Fix error handling.
test_fuse_gemm_epilogue_pass.py 13.8 KB