- 02 Feb 2023, 3 commits
-
-
Committed by Ccc
* paddle.nn.functional.softmax
* paddle.nn.functional.log_softmax
* paddle.nn.functional.gumbel_softmax
* paddle.nn.functional.prelu
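A minimal usage sketch of the four functional APIs named in the commit above; the input shapes and arguments are illustrative assumptions, not taken from the commit:

```python
# Illustrative only: one call to each API touched by this commit.
import paddle
import paddle.nn.functional as F

x = paddle.rand([3, 4])                     # assumed example input
probs = F.softmax(x, axis=-1)               # each row sums to 1
log_probs = F.log_softmax(x, axis=-1)       # numerically stable log(softmax(x))
samples = F.gumbel_softmax(x, temperature=1.0, hard=False)  # differentiable categorical samples
w = paddle.to_tensor([0.25])                # single shared negative-slope parameter
y = F.prelu(x, w)                           # max(0, x) + w * min(0, x)
```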
-
Committed by YuanRisheng
* fix bugs
* fix CI bugs
-
Committed by liuruyan
-
- 01 Feb 2023, 9 commits
-
-
Committed by RedContritio
* add range check for crop_kernel
* remove shape negative check
* add unittest
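For reference, a minimal call to the Python-level crop API whose kernel gains the range check; the shapes and offsets below are assumptions for illustration:

```python
# Illustrative: crop a 2x3 window out of a 3x5 tensor starting at offset (1, 1).
import paddle

x = paddle.rand([3, 5])
y = paddle.crop(x, shape=[2, 3], offsets=[1, 1])
print(y.shape)  # [2, 3]
```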
-
Committed by RedContritio
* add stride check for MaxPool
* add unittests
-
Committed by Zhong Hui
* fix 0-d tensor for arg_min_max op
* fix xpu
* fix zero dims
* fix
* Update arg_min_max_kernel.cc
* Update arg_min_max_kernel.cc
* Update arg_min_max_kernel.cc
* Update test_zero_dim_tensor.py
* Update test_zero_dim_tensor_xpu.py
* Update test_zero_dim_tensor.py
* Update arg_min_max_kernel.cc
* Update arg_min_max_kernel.cc
* Update arg_min_max_kernel.cc
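A sketch of what the 0-D support means for the user-facing argmin/argmax APIs (assumed post-fix behavior; the value is illustrative):

```python
# Illustrative: argmax on a 0-D (scalar) tensor.
import paddle

x = paddle.to_tensor(3.14)   # 0-D tensor, shape []
idx = paddle.argmax(x)       # a 0-D input has exactly one element, so the index is 0
print(x.shape, int(idx))     # [] 0
```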
-
Committed by zhangyikun02
-
Committed by gouzil
* [Divide by 0 Error] add lu check
* [Divide by 0 Error] migrate the lu check to C++
-
Committed by gouzil
* [Divide by 0 Error] add eig check
* [Divide by 0 Error] migrate the eig check to C++
* [Divide by 0 Error] fix class name error
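For context, a minimal call to paddle.linalg.eig, the operator guarded by the new check (the input matrix is an assumption):

```python
# Illustrative: eigen-decomposition of a random 3x3 matrix.
import paddle

a = paddle.rand([3, 3])
w, v = paddle.linalg.eig(a)   # eigenvalues and right eigenvectors (complex dtype)
print(w.shape, v.shape)       # [3], [3, 3]
```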
-
Committed by gouzil
* [Divide by 0 Error] add norm check
* [Divide by 0 Error] fix x AttributeError
* [Divide by 0 Error] migrate the norm check to C++
-
Committed by limingshu
* first attempt at cudaLaunchCooperativeKernel
* fix bugs
* completely replace the lar CUDA kernel
* fix bugs
* fix code according to comments
* fix code according to review comments
* add some function overloads
* relocate the power operation
* add bf16 support for index select related ops
* revert the bf16 type change
* add changes for more ops
* fix code-writing bugs
-
Committed by limingshu
* profile the reduce kernel for fp16 and reduceHigherdim
* use reinterpret_cast
* fix for CI on ROCm
* add macro for ROCm
* ROCm CI config
* ROCm CI config
* unit test repair
* pull
* add common_funcs.h
* reduceType
* Update reduce_function.h
* not higher
* rename
* implement matmul using cublasLt instead of cublas
* cublasLt bugfix
* Update matmul_kernel_impl.h
* Update matmul_kernel_impl_via_blasLt.h
* for-loop algo
* PR comment changes
* add macro
* ci unused variable isCublasLt
* ci unused variable isCublasLt macro
* split matmul to autotune
* rewrite the split kernel with segmented_array
* rewrite the split kernel with segmented_array
* rewrite the split kernel with segmented_array
* add some methods for cuda_graph
* fix bugs for ROCm
* change for CI error
* revert to the original code to check whether the ci-model-benchmark error reproduces there
* add some changes to pass the model_benchmark and coverage CI
* fix CI error
* fix CI-ROCm error
* add some changes for headers
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
-
- 31 Jan 2023, 11 commits
-
-
Committed by RedContritio
-
Committed by wangshengxiang
-
Committed by zhangkaihuo
-
Committed by MarDino
-
Committed by 张春乔
* fix mod 0 error
* fix div 0 error in floormod
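A small sketch of the divisor-is-zero case these checks target (assumed post-fix behavior: the op reports an error instead of silently dividing by zero):

```python
# Illustrative: floor_mod with a zero in the divisor.
import paddle

x = paddle.to_tensor([7, 8, 9], dtype='int64')
y = paddle.to_tensor([0, 2, 3], dtype='int64')
try:
    paddle.floor_mod(x, y)    # divisor contains 0
except Exception as err:      # the added check is expected to surface an error here
    print(type(err).__name__, err)
```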
-
Committed by 201716010711
-
Committed by xiaoting
* support 0-D tensor for interpolate
* support 0-D tensor for interpolate
* add xpu unittest for interp
* update unittest for interpolate
* fix coverage
* fix code style
* fix for coverage
* fix coverage
-
Committed by 张春乔
-
Committed by RedContritio
* add element count check in atan2
* add unittest and pre-check in InferMeta
* add dimension check
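For reference, a minimal atan2 call showing the element-wise contract the new count check enforces (the shapes are assumptions):

```python
# Illustrative: atan2 over two tensors with the same number of elements.
import paddle

x = paddle.to_tensor([1.0, -1.0])
y = paddle.to_tensor([1.0, 1.0])
print(paddle.atan2(x, y))   # element-wise, quadrant-aware arctangent of x / y
```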
-
Committed by RedContritio
-
Committed by Yiqun Liu
* Unify the GPU implementations of stack and unstack to reuse the optimization.
* Optimize the CUDA implementation of unstack.
* Use GpuMemcpyAsync instead of memory::Copy.
* Fix error in calculating the index.
* Use FastDivMod to further improve the performance of unstack.
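A quick round trip through the two ops touched by this change, shown via the Python API (the shapes are assumptions):

```python
# Illustrative: stack two tensors and unstack them back.
import paddle

a = paddle.rand([2, 3])
b = paddle.rand([2, 3])
s = paddle.stack([a, b], axis=0)     # shape [2, 2, 3]
a2, b2 = paddle.unstack(s, axis=0)   # recovers the original tensors
print(bool(paddle.allclose(a, a2)), bool(paddle.allclose(b, b2)))  # True True
```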
-
- 30 Jan 2023, 3 commits
-
-
Committed by RedContritio
* add pivots type check and fix batch-size error
* add unittest for batch size = 0
* fix nullptr in lu_unpack; fix batch-size error in LU_Unpack; add nullptr check in OneFunctor
* remove exception in device code
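For context, a minimal LU factorize/unpack round trip through the Python APIs that sit on top of the fixed kernel (the input is an assumption):

```python
# Illustrative: factorize a random 4x4 matrix and unpack the factors.
import paddle

x = paddle.rand([4, 4])
lu, pivots = paddle.linalg.lu(x)                # packed LU factors and pivot indices
p, l, u = paddle.linalg.lu_unpack(lu, pivots)   # permutation, lower and upper factors
print(bool(paddle.allclose(p @ l @ u, x, atol=1e-5)))  # True
```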
-
Committed by Ryan
* add pinv check
* add unittest
* update unittest
* roll back
* fix bug where the check was not called
* use context
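A minimal call to the checked API, paddle.linalg.pinv (the input shape is an assumption):

```python
# Illustrative: Moore-Penrose pseudo-inverse of a tall matrix.
import paddle

a = paddle.rand([5, 3])
a_pinv = paddle.linalg.pinv(a)   # shape [3, 5]
print(bool(paddle.allclose(a @ a_pinv @ a, a, atol=1e-4)))  # True, the defining property of pinv
```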
-
Committed by engineer1109
Replace all TensorFromVector & TensorToVector with AssignKernel async copy
-
- 25 Jan 2023, 1 commit
-
-
Committed by limingshu
Co-authored-by: zhangbopd <1299246947@qq.com>
-
- 20 Jan 2023, 1 commit
-
-
Committed by jakpiase
* fix for matmul_grad
* another fix for matmul_grad
* fix
-
- 19 Jan 2023, 2 commits
-
-
Committed by heliqi
* fix squeeze_ bug
* fix: solve it by using squeeze_kernel
* fix: solve it by using squeeze_kernel
* fix: solve it by using squeeze_kernel
* add test case
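For reference, the in-place op this fix routes through squeeze_kernel, next to its out-of-place counterpart (the shapes are assumptions):

```python
# Illustrative: squeeze vs. in-place squeeze_.
import paddle

x = paddle.rand([1, 3, 1, 4])
y = paddle.squeeze(x)   # out-of-place: y has shape [3, 4], x is unchanged
x.squeeze_()            # in-place: x itself now has shape [3, 4]
print(x.shape, y.shape)
```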
-
Committed by jameszhang
* [KUNLUN] add op: maxpool_with_index
* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()
* fix file format
* solve clip unittest failure
* minor fix
* Revert "solve clip unittest failure" since the issue is fixed in #49535; this reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b
* align with xdnn on the definition of mask in max_pool_with_index
* minor
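The functionality added here for KUNLUN, shown through the generic Python API (running it on XPU requires a KUNLUN build and device; the shapes are assumptions):

```python
# Illustrative: max pooling that also returns the per-window argmax mask.
import paddle
import paddle.nn.functional as F

x = paddle.rand([1, 3, 8, 8])
out, mask = F.max_pool2d(x, kernel_size=2, stride=2, return_mask=True)
print(out.shape, mask.shape)   # both [1, 3, 4, 4]; mask holds the index of each window's maximum
```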
-
- 18 Jan 2023, 6 commits
-
-
Committed by MarDino
* add align check
* refine
-
Committed by zhouweiwei2014
* [Zero-Dim] support 0-D input for paddle.moveaxis/quantile
* fix CI
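A sketch of the two user-facing APIs gaining 0-D input support (the 0-D behavior shown is an assumption about the post-fix semantics):

```python
# Illustrative: quantile of a 0-D tensor and a regular moveaxis call.
import paddle

s = paddle.to_tensor(2.5)                # 0-D tensor, shape []
print(paddle.quantile(s, q=0.5).shape)   # [] -- the quantile of a scalar is the scalar itself
x = paddle.rand([2, 3, 4])
print(paddle.moveaxis(x, source=0, destination=-1).shape)   # [3, 4, 2]
```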
-
Committed by RuohengMa
* add reduce_sum_int64 and reduce_sum_int8 xpu kernels
* [PHI] add clip grad kernel supporting float32 and int32
* [PHI unittest] add clip_grad unit test
* adapt code to clang-format
* update xpu api output with clip_grad api
* remove int8 support from the reduce_sum xpu kernel since it cannot pass unit tests
* adapt license date, add code for XPUDataType conversion
* add int8 support for reduce_sum
* add reduce_sum unit tests for dtypes int64 and int8, and add more test cases
* update license date
* remove buggy bitwise and, or and xor xpu kernels; refine the bitwise not xpu kernel
* change license date
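For context, a forward/backward pass through clip, whose gradient kernel this commit adds for XPU (shown with the generic Python API; selecting the XPU kernel needs an XPU device, and the values are assumptions):

```python
# Illustrative: the gradient of clip is 1 where the input stayed inside [min, max] and 0 where it was clipped.
import paddle

x = paddle.to_tensor([-2.0, 0.5, 3.0], stop_gradient=False)
y = paddle.clip(x, min=-1.0, max=1.0)
y.sum().backward()
print(x.grad)   # [0., 1., 0.]
```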
-
Committed by houj04
-
Committed by wawltor
* add 0-D tensor support for cumsum
* handle the 0-D tensor on XPU and CPU
* change 2022 to 2023 in the new commit
* fix the reverse logic
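A small sketch of cumsum on a 0-D tensor (assumed post-change behavior: the result is also 0-D):

```python
# Illustrative: the cumulative sum of a single scalar element is the element itself.
import paddle

x = paddle.to_tensor(5.0)   # 0-D tensor
y = paddle.cumsum(x)
print(y.shape, float(y))    # [] 5.0
```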
-
Committed by Zhang Zheng
-
- 16 Jan 2023, 1 commit
-
-
Committed by zlsh80826
* Update warpctc for CUDA 12
* Deprecate cudaProfilerInitialize for CUDA > 11
* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040
* Add the missing thrust header
-
- 13 Jan 2023, 3 commits