Matmul performance optimization with cuBLASLt (#46431)
* Implement matmul using cublasLt instead of cublas
* Update matmul_kernel_impl_via_blasLt.h

---------

Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>