- 10 Apr, 2020 (1 commit)

Submitted by littletomatodonkey

add addmm op
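For context, addmm fuses a matrix multiply with a scaled add: out = beta * input + alpha * (x @ y). A minimal C++ reference sketch of those semantics (row-major layout; the function name and signature are illustrative, not the operator's actual kernel):

```cpp
#include <vector>

// Naive reference for addmm semantics: out = beta * input + alpha * (x @ y),
// with x of shape [M, K], y of shape [K, N], input/out of shape [M, N].
void addmm_ref(const std::vector<float>& input, const std::vector<float>& x,
               const std::vector<float>& y, std::vector<float>& out,
               int M, int K, int N, float alpha, float beta) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int kk = 0; kk < K; ++kk) acc += x[i * K + kk] * y[kk * N + j];
      out[i * N + j] = beta * input[i * N + j] + alpha * acc;
    }
  }
}
```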
- 05 Sep, 2019 (1 commit)

Submitted by Yiqun Liu

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernels using nvrtc. test=develop
* Call the CUDA driver API to launch the kernel compiled by nvrtc. test=develop
* Disable for Mac and Windows. test=develop
* Refine the code to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
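The workflow this commit describes, compiling CUDA source at runtime with nvrtc and launching the result through the CUDA driver API, follows a standard pattern. A minimal self-contained sketch (the kernel source and names are illustrative, and error checking is omitted; it is not Paddle's wrapper, which also loads nvrtc dynamically):

```cpp
#include <cuda.h>
#include <nvrtc.h>
#include <vector>

// Illustrative kernel source; the real use case compiles generated code.
const char* kSrc = R"(
extern "C" __global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
})";

// Assumes a current CUDA context already exists (cuInit/cuCtxCreate
// or the runtime API). Real code should check every return status.
CUfunction CompileScale() {
  // Runtime-compile the source string to PTX with nvrtc.
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, kSrc, "scale.cu", 0, nullptr, nullptr);
  nvrtcCompileProgram(prog, 0, nullptr);  // should inspect the build log
  size_t ptx_size = 0;
  nvrtcGetPTXSize(prog, &ptx_size);
  std::vector<char> ptx(ptx_size);
  nvrtcGetPTX(prog, ptx.data());
  nvrtcDestroyProgram(&prog);

  // Load the PTX and fetch the kernel via the CUDA driver API.
  CUmodule mod;
  CUfunction fn;
  cuModuleLoadData(&mod, ptx.data());
  cuModuleGetFunction(&fn, mod, "scale");
  return fn;  // launch later with cuLaunchKernel(fn, grid,1,1, block,1,1, ...)
}
```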
- 23 Nov, 2018 (1 commit)

Submitted by chengduozh

test=develop
- 22 Nov, 2018 (1 commit)

Submitted by chengduo

* refine cublas test=develop
* code refine
* refine cublas
* add GEMM_EX
* add enable_cublas_tensor_op_math doc and add cublasCall test=develop
* fix CublasCall for cuda version test=develop
* fix error test=develop
* fix GEMM_EX to be compatible with gcc 4.8 test=develop
* add GEMM_EX test=develop
* to be compatible with gcc 4.8 test=develop
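A sketch of the CUDA-side pattern an enable_cublas_tensor_op_math switch gates, using the CUDA 9/10-era API: opt in to tensor-op math on the handle, then issue a cublasGemmEx with fp16 storage and fp32 accumulation. This illustrates the underlying cuBLAS calls only; the function name is assumed, and it is not the CublasCall wrapper itself (which would also restore the previous math mode).

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// C = A * B with fp16 inputs/outputs and fp32 accumulation,
// opting in to Tensor Core ("tensor op") math first.
// Requires CUDA 9.0+ and a Volta-or-newer GPU to actually use Tensor Cores.
cublasStatus_t TensorOpGemm(cublasHandle_t handle, int m, int n, int k,
                            const __half* A, const __half* B, __half* C) {
  const float alpha = 1.0f, beta = 0.0f;
  cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);  // allow Tensor Cores
  return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                      A, CUDA_R_16F, m,   // A: m x k, column-major
                      B, CUDA_R_16F, k,   // B: k x n, column-major
                      &beta,
                      C, CUDA_R_16F, m,   // C: m x n, column-major
                      CUDA_R_32F,         // accumulate in fp32
                      CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```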
- 28 Sep, 2018 (1 commit)

Submitted by dzhwinter

* flags
* "follow comment"
- 15 Sep, 2018 (1 commit)

Submitted by dzhwinter
- 21 Aug, 2018 (1 commit)

Submitted by dzhwinter
- 17 Aug, 2018 (1 commit)

Submitted by dzhwinter
- 01 Jun, 2018 (1 commit)

Submitted by yuyang18
- 25 Apr, 2018 (1 commit)

Submitted by Yu Yang
- 11 Apr, 2018 (1 commit)

Submitted by Kexin Zhao

* fix gemm error for cuda 7.5
* fix version number
- 08 Apr, 2018 (1 commit)

Submitted by Yi Wang

* Update source files.
* Update headers
* Fix a CMake dependency
- 07 Apr, 2018 (1 commit)

Submitted by Kexin Zhao

* change from hgemm to gemmEx
* fix cpplint
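The likely motivation: cublasHgemm computes entirely in fp16 (and needs compute capability 5.3+), while the gemmEx-style entry points keep fp16 storage but accumulate in fp32, with cublasSgemmEx reaching back to CUDA 7.5. A hedged sketch of the replacement call; the function name is assumed and the commit's actual dispatch logic may differ (on CUDA 7.5 the type enum is CUBLAS_DATA_HALF rather than CUDA_R_16F):

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// fp16 data, fp32 compute: cublasSgemmEx takes float alpha/beta and
// half-precision matrices, unlike cublasHgemm which computes in fp16.
cublasStatus_t HalfGemmEx(cublasHandle_t handle, int m, int n, int k,
                          const __half* A, const __half* B, __half* C) {
  const float alpha = 1.0f, beta = 0.0f;
  return cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                       A, CUDA_R_16F, m,   // A: m x k, column-major
                       B, CUDA_R_16F, k,   // B: k x n, column-major
                       &beta,
                       C, CUDA_R_16F, m);  // C: m x n, column-major
}
```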
- 09 Mar, 2018 (1 commit)

Submitted by kexinzhao

* test cpu float16 data transform
* add isnan etc
* small fix
* fix containsNAN test error
* add data_type transform GPU test
* add float16 GPU example
* fix error
* fix GPU test error
* initial commit
* fix error
* small fix
* add more gemm fp16 tests
* fix error
* add utility function
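One of the utilities mentioned, an isnan check for float16, reduces to a bit test: a half is NaN exactly when all five exponent bits are set and the mantissa is non-zero. A standalone sketch of that check (illustrative, not Paddle's float16 class):

```cpp
#include <cstdint>

// IEEE 754 binary16 layout: sign(1) | exponent(5) | mantissa(10).
// NaN = exponent all ones (mask 0x7C00) with non-zero mantissa (mask 0x03FF).
bool float16_isnan(uint16_t bits) {
  return (bits & 0x7C00) == 0x7C00 && (bits & 0x03FF) != 0;
}
```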
- 12 Feb, 2018 (1 commit)

Submitted by qingqing01
- 10 Feb, 2018 (2 commits)
- 11 Nov, 2017 (1 commit)

Submitted by dangqingqing
- 18 Oct, 2017 (1 commit)

Submitted by Markus Kliegl

* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags, and only supports tensors of rank 1 to 3 inclusive. For GPU, uses cublas?gemmStridedBatched. For CPU, uses cblas_?gemm_batch if available via MKL; otherwise, a simple serial implementation that loops over the batch dimension is employed for now.
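On the GPU path, each batch element maps to one GEMM in a single strided-batched call; a sketch for the float case (CUDA 8.0+, column-major, contiguous batches; the function name is assumed and this is not the operator's code):

```cpp
#include <cublas_v2.h>

// Batched C[b] = A[b] * B[b] for b in [0, batch): one call covers all
// batches, with a fixed element stride between consecutive matrices.
cublasStatus_t BatchedMatMul(cublasHandle_t handle, int m, int n, int k,
                             const float* A, const float* B, float* C,
                             int batch) {
  const float alpha = 1.0f, beta = 0.0f;
  const long long strideA = (long long)m * k;
  const long long strideB = (long long)k * n;
  const long long strideC = (long long)m * n;
  return cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                   m, n, k, &alpha,
                                   A, m, strideA,
                                   B, k, strideB, &beta,
                                   C, m, strideC, batch);
}
```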
- 10 Aug, 2017 (3 commits)
- 13 Jul, 2017 (1 commit)

Submitted by qijun
- 11 Jul, 2017 (2 commits)
- 04 Jul, 2017 (3 commits)
- 03 Jul, 2017 (2 commits)