- 25 Dec, 2020 1 commit
-
-
Committed by Chen Weihang
* add support for complex grad accumulated * add unittest for coverage * update test dtype * remove useless blank line
-
- 01 Dec, 2020 1 commit
-
-
Committed by chentianyu03
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types * add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest
-
- 24 Sep, 2020 1 commit
-
-
Committed by wanghuancoder
* use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop
-
- 03 Sep, 2020 1 commit
-
-
Committed by wangchaochaohu
-
- 09 Jul, 2020 1 commit
-
-
Committed by Chen Weihang
-
- 27 Apr, 2020 1 commit
-
-
Committed by Yiqun Liu
-
- 24 Apr, 2020 1 commit
-
-
Committed by Guo Sheng
* Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop
-
- 10 Apr, 2020 1 commit
-
-
Committed by littletomatodonkey
add addmm op
-
- 05 Sep, 2019 1 commit
-
-
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop
-
- 23 Nov, 2018 1 commit
-
-
Committed by chengduozh
test=develop
-
- 22 Nov, 2018 1 commit
-
-
Committed by chengduo
* refine cublas test=develop * code refine * refine cublas * add GEMM_EX * add enable_cublas_tensor_op_math doc and add cublasCall test=develop * fix CublasCall for cuda version test=develop * fix error test=develop * fix GEMM_EX to be compatible with gcc 4.8 test=develop * add GEMM_EX test=develop * to be compatible with gcc4.8 test=develop
-
- 28 Sep, 2018 1 commit
-
-
Committed by dzhwinter
* flags * "follow comment"
-
- 15 Sep, 2018 1 commit
-
-
Committed by dzhwinter
-
- 21 Aug, 2018 1 commit
-
-
Committed by dzhwinter
-
- 17 Aug, 2018 1 commit
-
-
Committed by dzhwinter
-
- 01 Jun, 2018 1 commit
-
-
Committed by yuyang18
-
- 25 Apr, 2018 1 commit
-
-
Committed by Yu Yang
-
- 11 Apr, 2018 1 commit
-
-
Committed by Kexin Zhao
* fix gemm error for cuda 7.5 * fix version number
-
- 08 Apr, 2018 1 commit
-
-
Committed by Yi Wang
* Update source files. * Update headers * Update * Update * Update * Update * Fix a CMake dependency
-
- 07 Apr, 2018 1 commit
-
-
Committed by Kexin Zhao
* change from hgemm to gemmEx * fix cpplint
-
- 09 Mar, 2018 1 commit
-
-
Committed by kexinzhao
* test cpu float16 data transform * add isnan etc * small fix * fix containsNAN test error * add data_type transform GPU test * add float16 GPU example * fix error * fix GPU test error * initial commit * fix error * small fix * add more gemm fp16 tests * fix error * add utility function
-
- 12 Feb, 2018 1 commit
-
-
Committed by qingqing01
-
- 10 Feb, 2018 2 commits
- 11 Nov, 2017 1 commit
-
-
Committed by dangqingqing
-
- 18 Oct, 2017 1 commit
-
-
Committed by Markus Kliegl
* initial matmul operator: similar to np.matmul, but also has transpose_X and transpose_Y flags, and only supports tensors of rank 1 to 3 inclusive. On GPU it uses cublas?gemmStridedBatched. On CPU it uses cblas_?gemm_batch if available via MKL; otherwise, for now, it falls back to a simple serial implementation that loops over the batch dimension.
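
The serial CPU fallback described in this commit message can be sketched roughly as follows. This is an illustrative pure-Python reference, not the code from the commit: the names `matmul_ref`, `gemm`, `transpose`, and `is_batched` are hypothetical, tensors are modeled as nested lists, and only rank-2 and rank-3 inputs are handled (the rank-1 case the operator also supports is left out for brevity).

```python
def transpose(m):
    """Transpose a 2-D matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def gemm(a, b):
    """Plain 2-D matrix product of a [M, K] and b [K, N]."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_cols = transpose(b)  # each row of b_cols is a column of b
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def is_batched(t):
    """True when t is rank 3, i.e. a list of 2-D matrices."""
    return isinstance(t[0][0], list)


def matmul_ref(x, y, transpose_x=False, transpose_y=False):
    """Hypothetical serial reference: one 2-D GEMM per batch item."""
    xs = x if is_batched(x) else [x]
    ys = y if is_batched(y) else [y]
    batch = max(len(xs), len(ys))
    if len(xs) not in (1, batch) or len(ys) not in (1, batch):
        raise ValueError("batch sizes do not match")
    out = []
    for i in range(batch):
        # A rank-2 operand is reused for every batch item.
        a = xs[i] if len(xs) > 1 else xs[0]
        b = ys[i] if len(ys) > 1 else ys[0]
        if transpose_x:
            a = transpose(a)
        if transpose_y:
            b = transpose(b)
        out.append(gemm(a, b))
    # Return rank 3 only when at least one input was rank 3.
    return out if (is_batched(x) or is_batched(y)) else out[0]
```

For example, `matmul_ref(x, y, transpose_y=True)` multiplies `x` by the transpose of `y` without the caller materializing the transposed matrix first, which is the role the transpose flags play in a GEMM-based implementation.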
-
- 10 Aug, 2017 3 commits
- 13 Jul, 2017 1 commit
-
-
Committed by qijun
-
- 11 Jul, 2017 2 commits
- 04 Jul, 2017 3 commits
- 03 Jul, 2017 2 commits