- 08 Feb, 2022 7 commits
-
-
Committed by From00
* Rough implementation for experiment * Support allocate cuda managed memory * Fix CI error * Modify UT * Check whether support memory oversubscription * Fix ROCM Compile error * Fix ROCM Compile error * Fix UT cuda_managed_memory_test * Set UT timeout to 40 * Add UT OOMExceptionTest * Set UT timeout to 50
-
Committed by Chen Weihang
* fix pten reduce dispatch bug * add cast before reduce * fix test failed
-
Committed by Leo Chen
-
Committed by Leo Chen
-
Committed by Yan Chunwei
-
Committed by Zhanlue Yang
-
Committed by sneaxiy
* hack custom op * add ut * skip windows ci
-
- 07 Feb, 2022 7 commits
-
-
Committed by tanzhipeng
-
Committed by sneaxiy
-
Committed by Yan Chunwei
-
Committed by arlesniak
* amp list updated * tests updated * gray list updated * amp list updated * test updated
-
Committed by jakpiase
* Added adam kernel * CI rerun
-
Committed by Zhanlue Yang
-
Committed by Chen Weihang
* refactor custom op kernel func and utils * add output sync * adapt tensor* in utils * fix windows symbol error
-
- 06 Feb, 2022 1 commit
-
-
Committed by Wilber
-
- 04 Feb, 2022 2 commits
-
-
Committed by zyfncg
* add data_transform in pten api * support GetKernelTypeForVar * fix compile problem of bfloat16 * change error namespace * add complex type transform unittest * fix merge conflict
-
Committed by Chen Weihang
-
- 02 Feb, 2022 3 commits
-
-
Committed by Zuza
-
Committed by Chen Weihang
* remove kernel alias name * fix deprecated error * fix deprecated failed * fix mean error * resolve conflict * fix windows failed
-
Committed by Jiabin Yang
-
- 30 Jan, 2022 10 commits
-
-
Committed by Xiaoxu Chen
* add multinomial probability distribution * fix categorical sample bug when logits less than zero * fix categorical sample can't pass hypothesis test and entropy shape error bug
-
Committed by zhaocaibei123
* geo depends * add memory geo table * fix
-
Committed by zhangkaihuo
* dense_to_sparse_coo * optimize unit testing; support rocm * 1. delete fluid related header file 2. update the copyright * fix hipMemcpy * update dense_to_sparsecoo * add namespace sparse * sparse_csr_to_dense * test to_sparse_coo: csr_to_coo * fix writing error
-
Committed by Chen Weihang
* change unary infermeta * change other infermeta * change all infermeta format * resolve conflict * fix test failed * resolve reshape conflict * fix compile failed * adapt auto api gen * fix reshape failed * fix concat failed * resolve conflict
-
Committed by Chen Weihang
-
Committed by zhangkaihuo
* dense_to_sparse_coo * optimize unit testing; support rocm * 1. delete fluid related header file 2. update the copyright * fix hipMemcpy * update dense_to_sparsecoo * add namespace sparse
-
Committed by Leo Chen
-
Committed by fwenguang
-
Committed by mhhhh1
-
Committed by Leo Chen
* upgrade _get_all_register_op_kernels * add ut * support xpu/npu * fix device id * enhance TransToFluidPlace * fix compile
-
- 29 Jan, 2022 10 commits
-
-
Committed by ronnywang
-
Committed by Li Min
* Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op. * Remove useless code. * Remove useless code. * Optimize layer_norm fwd when cols is 1024. * Remove useless code. * Minors. * Minors. * Modifications according to reviews. * Minors. * Optimize layer_norm bwd kernel when cols is 1024. * Polish layer_norm_bwd_1024 kernel. * Limit ln_bwd_1024_kernel to paddle_with_cuda. * Fix double type compile error. * Add optimization of ln bwd for fused_dropout_add_ln op. * Polish codes.
-
Committed by Liu-xiandong
* Add XPU compiler for paddle, test=develop * clean code * clean useless code * clean useless code * clean useless code * test * add include path * use clang compiler * xpu2.cmake * XPU2 compiler passed * update * update after pten * combination the WITH_XPU and WITH_XPU2 * update the fuse operation in WITH_XPU and WITH_XPU2 * update * update * update * fix the merge error * update * update the code * update the code * add run_kp_kernel flag * update * update * fix prepared type_ bug * clean and update the code * reset the kernel_primitives * update * clean the code * delete useless comment * fix the bug in WITH_XPU * update * update * modify the abi * delete some useless code * Parameter automation in xpu compilation * Parameter automation in xpu compilation * delete kps in cmake * delete useless comment * clean the code * clean the code
-
Committed by Chen Weihang
-
Committed by Chen Weihang
* open header for custom kernel * add core utils * tidy core code * tidy header * tidy include * tidy namespace * resolve conflict * fix unittest and coverage * remove platform using * resolve conflict * resolve conflict * fix digamma namespace error * fix xpu full kernel error * fix xpu full kernel error * polish details * add place for lib storage
-
Committed by Tongxin Bai
* [autograd] static Jacobian pass tests. * [autograd] apply CR suggested changes. * [autograd] more tests. * [autograd] add CPUPlace in tests. * [autograd] bug fixes. * [autograd] reformatted. * [autograd] adding Hessian, in progress. * [autograd] Hessian passes. A double grad bug fixed. * [autograd] fix renaming conflict in double backward pass. * [autograd] polish tests. * fix a bug when using brackets * debug for ci * [autograd] fixing Hessian test. * polish format. Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com> Co-authored-by: levi131 <limaolin01@baidu.com>
-
Committed by Zhanlue Yang
-
Committed by Guanghua Yu
-
Committed by JZ-LIANG
* support qkv fuse * support qkv fuse * update completion * update completion * update dist_split * rerun ci * is_auto_compatible added * is_auto_compatible added
-
Committed by QingshuChen
* fix kunlun2 softmax unittest bug * test=kunlun * minor
-