1. 05 9月, 2019 1 次提交
    • Y
      Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6
      Yiqun Liu 提交于
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      42b5bec6
  2. 23 11月, 2018 1 次提交
  3. 22 11月, 2018 1 次提交
    • C
      Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1
      chengduo 提交于
      * refine cublase
      test=develop
      
      * code refine
      
      * refine cublas
      
      * add GEMME_EX
      
      * add enable_cublas_tensor_op_math doc and add cublasCall
      test=develop
      
      * fix CublasCall for cuda version
      test=develop
      
      * fix error
      test=develop
      
      * fix GEMM_EX to be compatible with gcc 4.8
      test=develop
      
      * add GEMM_EX
      test=develop
      
      * to compatiable with gcc4.8
      test=develop
      00b9e9a1
  4. 28 9月, 2018 1 次提交
  5. 15 9月, 2018 1 次提交
  6. 21 8月, 2018 1 次提交
  7. 17 8月, 2018 1 次提交
  8. 01 6月, 2018 1 次提交
  9. 25 4月, 2018 1 次提交
  10. 11 4月, 2018 1 次提交
  11. 08 4月, 2018 1 次提交
  12. 07 4月, 2018 1 次提交
  13. 09 3月, 2018 1 次提交
    • K
      Add float16 GEMM math function on GPU (#8695) · 90215b78
      kexinzhao 提交于
      * test cpu float16 data transform
      
      * add isnan etc
      
      * small fix
      
      * fix containsNAN test error
      
      * add data_type transform GPU test
      
      * add float16 GPU example
      
      * fix error
      
      * fix GPU test error
      
      * initial commit
      
      * fix error
      
      * small fix
      
      * add more gemm fp16 tests
      
      * fix error
      
      * add utility function
      90215b78
  14. 12 2月, 2018 1 次提交
  15. 10 2月, 2018 2 次提交
  16. 11 11月, 2017 1 次提交
  17. 18 10月, 2017 1 次提交
    • M
      MatMul operator (#4856) · 16489827
      Markus Kliegl 提交于
      * initial matmul operator
      
      Similar to np.matmul, but also has transpose_X and transpose_Y flags,
      and only supports tensors from rank 1 to 3 inclusive.
      
      For GPU, uses cublas?gemmStridedBatched. For CPU, uses
      cblas_?gemm_batch if available via MKL; otherwise a simple serial
      implementation that loops over the batch dimension is employed for now.
      16489827
  18. 10 8月, 2017 3 次提交
  19. 13 7月, 2017 1 次提交
  20. 11 7月, 2017 2 次提交
  21. 04 7月, 2017 3 次提交
  22. 03 7月, 2017 2 次提交