MatMul operator (#4856)
* Initial matmul operator

Similar to np.matmul, but with additional transpose_X and transpose_Y flags, and it only supports tensors of rank 1 to 3 inclusive. On GPU it uses cublas?gemmStridedBatched. On CPU it uses cblas_?gemm_batch when available through MKL; otherwise it falls back, for now, to a simple serial implementation that loops over the batch dimension.
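To illustrate the serial fallback described in the commit message, here is a minimal pure-Python sketch of a batched product with transpose flags. It is not the code from this commit: the names `transpose`, `gemm`, and `batched_matmul` are hypothetical, the inputs are assumed to be rank-3 nested lists with equal batch sizes, and rank-1 and rank-2 handling is left out.

```python
def transpose(m):
    """Return the transpose of a 2-D matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def gemm(a, b, transpose_a=False, transpose_b=False):
    """Multiply two 2-D matrices, optionally transposing either operand first."""
    if transpose_a:
        a = transpose(a)
    if transpose_b:
        b = transpose(b)
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # Work on the columns of b so each output entry is a row-by-column dot product.
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def batched_matmul(x, y, transpose_x=False, transpose_y=False):
    """Serial fallback (assumed shape handling): one 2-D product per batch entry."""
    if len(x) != len(y):
        raise ValueError("batch sizes do not match")
    return [gemm(a, b, transpose_x, transpose_y) for a, b in zip(x, y)]


# Two batches of 2x2 matrices.
x = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
y = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]

result = batched_matmul(x, y)
result_t = batched_matmul(x, y, transpose_y=True)
```

The flags are applied per batch entry, so the batch dimension itself is never transposed. A batched BLAS routine would replace the Python loop in `batched_matmul` with a single library call.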
New files (mode 0 → 100644):
paddle/operators/math/matmul.h
paddle/operators/matmul_op.cc
paddle/operators/matmul_op.cu
paddle/operators/matmul_op.h