- 22 Aug, 2020 1 commit
- 
- 
Committed by ShenLiang
* add matmul_v2
 
- 
- 27 Apr, 2020 1 commit
- 
- 
Committed by Yiqun Liu
 
- 
- 24 Apr, 2020 1 commit
- 
- 
Committed by Guo Sheng
* Add cholesky_op forward part. test=develop
* Complete cholesky_op forward part. test=develop
* Add cholesky_op backward part. test=develop
* Complete cholesky_op backward part. test=develop
* Refine cholesky_op error check and docs. test=develop
* Add grad_check unit test for cholesky_op. test=develop
* Fix sample code in cholesky doc. test=develop
* Refine some error messages of cholesky_op. test=develop
* Remove unused input in cholesky_grad. test=develop
* Fix stream for cusolverDnSetStream. test=develop
* Update PADDLE_ENFORCE_CUDA_SUCCESS in cholesky_op to adapt to the latest code. test=develop
* Add CUSOLVER ERROR in enforce.h. test=develop
* Fix the missing return value in cholesky. test=develop
 
- 
- 30 Sep, 2019 1 commit
- 
- 
Committed by danleifeng
Improve the performance of elementwise operators when both inputs have the same dimensions
 
- 
- 25 Sep, 2019 1 commit
- 
- 
Committed by Bob Zhu
* Add support for matmul with multiple heads when the width and height differ.
  The original matmul with multiple heads supports only mat_a.width == mat_b.height; in that case, mat_b is split horizontally. This patch extends the support to the case where mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height; in this case, mat_b is split vertically. For example, if A is [3, 8], B is [2, 16], and head_number is 4, A is split into matrices of [3, 2] and B is (vertically) split into matrices of [2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16]. test=develop
* Refactor the code of matmul with multiple heads when the width and height differ. test=develop
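The vertical-split case described in this commit message can be illustrated with a minimal pure-Python sketch. It is not the Paddle kernel: the function names `matmul` and `multihead_matmul_vsplit` are hypothetical, and concatenating the per-head results along the column axis is an assumption inferred from the stated [3, 16] output shape.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def multihead_matmul_vsplit(a, b, head_number):
    """Hypothetical helper: A is [M, K], B is [K / head_number, N].

    A is split along its columns into head_number blocks of width
    K / head_number, and B is split along its columns (the "vertical"
    split) into blocks of width N / head_number. The per-head products
    are concatenated along the column axis (an assumption).
    """
    k, n = len(a[0]), len(b[0])
    assert k % head_number == 0 and n % head_number == 0
    k_step, n_step = k // head_number, n // head_number
    assert len(b) == k_step, "expects mat_a.width / head_number == mat_b.height"
    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * k_step:(h + 1) * k_step] for row in a]  # [M, K/head]
        b_h = [row[h * n_step:(h + 1) * n_step] for row in b]  # [K/head, N/head]
        for out_row, prod_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(prod_row)
    return out


# Shapes from the commit message: A is [3, 8], B is [2, 16], head_number is 4.
a = [[1] * 8 for _ in range(3)]
b = [list(range(16)) for _ in range(2)]
result = multihead_matmul_vsplit(a, b, 4)
print(len(result), len(result[0]))
```

With these shapes, each head multiplies a [3, 2] block of A by a [2, 4] block of B, and the four [3, 4] products together form a [3, 16] result.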
 
- 
- 20 Aug, 2019 1 commit
- 
- 
Committed by Yihua Xu
* Implement the operator with sparse matrix multiplication
* Update the URL of the mklml library. test=develop
* Disable the MKLML implementation on non-Linux platforms. test=develop
* Ignore the deprecated status for Windows. test=develop
 
- 
- 24 Jul, 2019 1 commit
- 
- 
Committed by Bob Zhu
* Extend the matmul op to support multiple-head multiplication.
  With multiple-head support, the multiplication of two large matrices is split into the multiplication of several (head_number) smaller matrices. For example, if Mat A is [3, 24] and Mat B is [24, 4], multiplying A and B with head_number set to 4 splits Mat A into 4 matrices of [3, 6] and Mat B into 4 matrices of [6, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
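As a rough illustration of the splitting described in this commit message, here is a pure-Python sketch. It is not the Paddle implementation: the names `dense_matmul` and `multihead_matmul` are hypothetical, and joining the per-head [3, 4] products side by side into [3, 16] is an assumption based on the shapes quoted in the message.

```python
def dense_matmul(a, b):
    """Plain matrix product of two matrices stored as nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def multihead_matmul(a, b, head_number):
    """Hypothetical helper: A is [M, K], B is [K, N], with K shared.

    The shared dimension K is cut into head_number chunks. Head h
    multiplies the h-th column block of A by the h-th row block of B,
    and the products are concatenated along columns (an assumption),
    giving a result of shape [M, N * head_number].
    """
    k = len(a[0])
    assert k == len(b), "expects mat_a.width == mat_b.height"
    assert k % head_number == 0, "K must be divisible by head_number"
    step = k // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        lo, hi = h * step, (h + 1) * step
        a_h = [row[lo:hi] for row in a]  # [M, K/head]
        b_h = b[lo:hi]                   # [K/head, N]
        for out_row, prod_row in zip(out, dense_matmul(a_h, b_h)):
            out_row.extend(prod_row)
    return out


# Shapes from the commit message: A is [3, 24], B is [24, 4], head_number is 4.
mat_a = [[1] * 24 for _ in range(3)]
mat_b = [[1] * 4 for _ in range(24)]
heads_out = multihead_matmul(mat_a, mat_b, 4)
print(len(heads_out), len(heads_out[0]))
```

Note that the output is not the ordinary [3, 4] product of A and B: the per-head partial products are kept separate rather than summed, which is why the result is four times as wide.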
 
- 
- 04 Mar, 2019 1 commit
- 
- 
Committed by Yihua Xu
test=develop
 
- 
- 26 Feb, 2019 1 commit
- 
- 
Committed by Yihua Xu
test=develop
 
- 
- 22 Feb, 2019 2 commits
- 
- 
Committed by tensor-tang
* Revert "Optimze Gelu with MKL Erf function (#15770)"
  This reverts commit 676995c8.
* test=develop
- 
Committed by Yihua Xu
* Optimize the gelu operator
* Set up the low accuracy mode of the MKL ERF function. test=develop
* Only enable MKLML ERF when the OS is Linux
* Use the special mklml version that includes the vmsErf function to verify the gelu MKL kernel. test=develop
* Add the CUDA macro to avoid NVCC's compile issue. test=develop
* Add TODO comments for the mklml library modification. test=develop
* Clean code. test=develop
* Add a comment on the macro for the NVCC compiler. test=develop
 
- 
- 13 Dec, 2018 1 commit
- 
- 
Committed by Yu Yang
 
- 
- 27 Nov, 2018 1 commit
- 
- 
Committed by Jacek Czaja
 
- 
- 16 Nov, 2018 1 commit
- 
- 
Committed by tensor-tang
* rename and fix blas vsqr. test=develop
* update
 
- 
- 13 Nov, 2018 1 commit
- 
- 
Committed by tensor-tang
 
- 
- 22 Aug, 2018 5 commits
- 
- 
Committed by tensor-tang
- 
Committed by tensor-tang
- 
Committed by tensor-tang
- 
Committed by tensor-tang
- 
Committed by tensor-tang
 
- 
- 16 Aug, 2018 1 commit
- 
- 
Committed by tensor-tang
 
- 
- 06 Aug, 2018 1 commit
- 
- 
Committed by tensor-tang
 
- 
- 03 Aug, 2018 2 commits
- 
- 
Committed by tensor-tang
- 
Committed by tensor-tang
 
- 
- 05 Jul, 2018 2 commits
- 
- 
Committed by tensor-tang
- 
Committed by dzhwinter
 
- 
- 27 Jun, 2018 1 commit
- 
- 
Committed by tensor-tang
 
- 
- 20 Jun, 2018 1 commit
- 
- 
Committed by tensor-tang
 
- 
- 24 May, 2018 1 commit
- 
- 
Committed by Tomasz Patejko
 
- 
- 21 May, 2018 1 commit
- 
- 
Committed by Liu Yiqun
Add an interface to set the number of threads for math functions, and set the default value to 1 for inference.
 
- 
- 08 May, 2018 2 commits
- 07 May, 2018 1 commit
- 
- 
Committed by Yu Yang
 
- 
- 04 May, 2018 1 commit
- 
- 
Committed by Yu Yang
 
- 
