    MatMul operator (#4856) · 16489827
    Markus Kliegl committed
    * initial matmul operator
    
    Similar to np.matmul, but it also accepts transpose_X and transpose_Y flags,
    and it only supports tensors of rank 1 to 3, inclusive.
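Here is a minimal pure-Python sketch of these semantics, for illustration only. It is not the operator's implementation. Several details are assumptions rather than facts stated in this commit:

- the function name `matmul` and the nested-list tensors;
- 1-D operand handling: a 1-D X acts as a [1, K] row vector and a 1-D Y as a [K, 1] column vector, as in np.matmul, with the transpose flag ignored for 1-D inputs;
- batch reuse: a single matrix is reused across the other operand's batch.

```python
# Hypothetical reference sketch of MatMul semantics (not the operator's code).
# Tensors are non-empty nested Python lists of rank 1, 2, or 3.

def _rank(t):
    # Count nesting depth by following the first element.
    r = 0
    while isinstance(t, list):
        r += 1
        t = t[0]
    return r

def _transpose(m):
    # Swap the two axes of a 2-D nested list.
    return [list(row) for row in zip(*m)]

def _gemm(a, b):
    # Plain triple-loop product of an [M, K] and a [K, N] matrix.
    k = len(a[0])
    if k != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(len(b[0]))]
            for i in range(len(a))]

def matmul(x, y, transpose_x=False, transpose_y=False):
    rx, ry = _rank(x), _rank(y)
    if not (1 <= rx <= 3 and 1 <= ry <= 3):
        raise ValueError("only ranks 1 to 3 are supported")
    # Normalize both operands to a list of matrices (a batch).
    # Assumption: 1-D X -> [1, K] row vector, 1-D Y -> [K, 1] column vector.
    xs = [[x]] if rx == 1 else ([x] if rx == 2 else x)
    ys = [[[v] for v in y]] if ry == 1 else ([y] if ry == 2 else y)
    # Transpose flags act on the last two dimensions of rank >= 2 inputs.
    if transpose_x and rx >= 2:
        xs = [_transpose(m) for m in xs]
    if transpose_y and ry >= 2:
        ys = [_transpose(m) for m in ys]
    # Assumption: a single matrix is reused for every entry of the other
    # operand's batch; otherwise the batch sizes must match.
    if len(xs) == 1 and len(ys) > 1:
        xs = xs * len(ys)
    elif len(ys) == 1 and len(xs) > 1:
        ys = ys * len(xs)
    elif len(xs) != len(ys):
        raise ValueError("batch sizes do not match")
    out = [_gemm(a, b) for a, b in zip(xs, ys)]
    # Drop the dimensions that were added for 1-D inputs.
    if rx == 1 and ry == 1:
        return out[0][0][0]  # vector . vector -> scalar, as in np.matmul
    if ry == 1:
        out = [[row[0] for row in m] for m in out]  # drop trailing N = 1
    if rx == 1:
        out = [m[0] for m in out]  # drop leading M = 1
    # Without a rank-3 input there is no batch dimension in the result.
    return out[0] if rx < 3 and ry < 3 else out
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], transpose_x=True)` multiplies the transpose of the first matrix by the second and returns `[[26, 30], [38, 44]]`.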
    
    On GPU, it uses cublas?gemmStridedBatched. On CPU, it uses
    cblas_?gemm_batch when available through MKL; otherwise, for now, it falls
    back to a simple serial implementation that loops over the batch dimension.
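Here is a hypothetical Python sketch of the kind of serial CPU fallback described above. It is not the code in matmul_op.h. The following are assumptions, and the argument order does not match the cuBLAS or MKL signatures:

- the function names;
- a row-major layout;
- the convention that batch entry i of each operand starts at offset i * stride, in the spirit of the strided-batched GEMM interface, where a stride of 0 here reuses one matrix for the whole batch.

```python
# Hypothetical serial fallback: one GEMM per batch entry on flat buffers.
# Layout assumption: row-major matrices stored back to back, so entry i of
# an operand starts at i * stride; a stride of 0 reuses a single matrix.

def gemm(m, n, k, a, a_off, b, b_off, c, c_off, alpha=1.0, beta=0.0):
    # C[m x n] = alpha * A[m x k] @ B[k x n] + beta * C, all row-major.
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[a_off + i * k + p] * b[b_off + p * n + j]
            idx = c_off + i * n + j
            # As in BLAS, beta == 0 means the old contents of C are not read.
            c[idx] = alpha * acc if beta == 0 else alpha * acc + beta * c[idx]


def gemm_strided_batched_serial(m, n, k, a, stride_a, b, stride_b,
                                c, stride_c, batch, alpha=1.0, beta=0.0):
    # Loop over the batch dimension, calling the single-matrix kernel
    # once per entry with that entry's offsets into each buffer.
    for i in range(batch):
        gemm(m, n, k, a, i * stride_a, b, i * stride_b,
             c, i * stride_c, alpha, beta)
```

The batched library routines named above do this whole loop in one call. The serial version issues one GEMM per batch entry.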
matmul_op.h