- 31 1月, 2023 2 次提交
-
-
由 RedContritio 提交于
-
由 Yiqun Liu 提交于
* Unify the gpu implementation of stack and unstack to reuse the optimization. * Optimize the cuda implementation of unstack. * Use GpuMemcpyAsync instead of memory::Copy. * Fix error of calculating the index. * Use FastDivMod to further imporve the performance of unstack.
-
- 30 1月, 2023 3 次提交
-
-
由 RedContritio 提交于
* add pivots type check and fix batchsize error * add unittest for batchsize = 0 * fix nullptr in lu_unpack fix batchsize error in LU_Unpack add nullptr check in OneFunctor * remove exception in device code
-
由 Ryan 提交于
* add pinv check * add unitest * update unitest * roll back * fix not call stupid bug * use context
-
由 engineer1109 提交于
replace all TensorFromVector & TensorToVector AssignKernel async copy
-
- 25 1月, 2023 1 次提交
-
-
由 limingshu 提交于
Co-authored-by: Nzhangbopd <1299246947@qq.com>
-
- 20 1月, 2023 1 次提交
-
-
由 jakpiase 提交于
* fix for matmul_grad * another fix for matmul_grad * fix
-
- 19 1月, 2023 2 次提交
-
-
由 heliqi 提交于
* fix queeze_ bug * fix slove use squeeze_kernel * fix slove use squeeze_kernel * fix slove use squeeze_kernel * add test case
-
由 jameszhang 提交于
* [KUNLUN] add op: maxpool_with_index * use DeviceContext::Alloc() instead of DenseTensor::mutable_data() * fix file format * solve clip unittest failure * minor fix * Revert "solve clip unittest failure" since the issue is fixed in #49535 This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b. * align with xdnn on the definition of mask in max_pool_with_index * minor
-
- 18 1月, 2023 6 次提交
-
-
由 MarDino 提交于
* add align check * refine
-
由 zhouweiwei2014 提交于
* [Zero-Dim] support input 0D for paddle.moveaxis/quantile * fix CI
-
由 RuohengMa 提交于
* add reduce_sum_int64 and reduce_sum_int8 xpu kernels * [PHI] add clip grad kernel with support type float32 and int32 * [PHI unittest] add clip_grad unit test * adapt code to clang-format * update xpu api output with clip_grad api * remove int8 support of reduce_sum xpu kernel since it can not pass unit tests * adapt license date, add code for XPUDataType convertion * add int8 support of reduce_sum * add reduce_sum unit tests for dtype int64, int8, and add more test cases * update license date * remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel * change license date
-
由 houj04 提交于
-
由 wawltor 提交于
* Add the cumsum 0d tensor * xpu and cpu judge the 0d tensor * change to 2022 to 2023 in new commit * fix the reverse logic
-
由 Zhang Zheng 提交于
-
- 16 1月, 2023 1 次提交
-
-
由 zlsh80826 提交于
* Update warpctc for cuda-12 * Deprecate cudaProfilerInitialize for CUDA > 11 * Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040 * Add the missing thrust header
-
- 13 1月, 2023 8 次提交
-
-
由 limingshu 提交于
* first commit * add some changes in stack kernel. * move the location of GeneralDivMod * fix code format error according to ci
-
由 ronnywang 提交于
* add where, atan2, median 0d ut * add where, atan2, median 0d ut * update * update * update
-
由 ykkk2333 提交于
-
由 Leo Guo 提交于
-
由 wangshengxiang 提交于
-
由 wangzhen38 提交于
* [cpplint fix] under ps
-
由 zhangkaihuo 提交于
-
由 Yuanle Liu 提交于
-
- 12 1月, 2023 4 次提交
-
-
由 sunli 提交于
* lerp support 0 Tensor * fix lerp grad * fix lerp zero test * fix 0D + ND/ND + 0D * fix check * update code * fix lerp infer shape * static backward test * updata static graph test
-
由 YuanRisheng 提交于
-
由 Leo Guo 提交于
xpu2_op_list.cc. test=kunlun
-
由 YuanRisheng 提交于
* rename kernel * delete sig * modify code according comment * fix ci bugs
-
- 11 1月, 2023 1 次提交
-
-
由 Yiqun Liu 提交于
* Implement a common PointerArray. * Polish codes. * Add including of header file. * Add the branch of kFix8. * Fix compiling error. * Add alignas hint to fix the performance drop. * Optimize the H2D copy in stack_grad. * Rename the macro. * Fix align hint for different compilers. * Polish the define of PADDLE_ALIGN. * Fix compiling error. * Remove the align hint on windows.
-
- 10 1月, 2023 3 次提交
-
-
由 limingshu 提交于
* add stack grad kernel optimization * add basic optimization kernel for stack_grad_kernel * optimization of stack_grad_kernel for last dim stack and change code format with pre-commit
-
由 Ryan 提交于
* try sequence_padding * fix cant use mutable_data * fix mistake fluid_sequence_scale.hh/CMakeLists.t include * fix namespace bug * fix framework::ToAbsOffset not found * fix codestyle
-
由 MarDino 提交于
-
- 09 1月, 2023 4 次提交
-
-
由 MarDino 提交于
* add concat optimization * refine * remove annotation * use alignas instead of aligned_storage
-
由 QingshuChen 提交于
-
由 ykkk2333 提交于
* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun * fix dlrm throughput problem, test=kunlun * add xpu einsum, fill_diagonal, and diagonal kernels, test=kunlun
-
由 wangzhen38 提交于
-
- 06 1月, 2023 3 次提交
-
-
由 RuohengMa 提交于
* add bitwise and, bitwise not, bitwise or and bitwise xor * correct typo
-
由 JYChen 提交于
* add 0-d support for paddle.kthvalue * add 0-d support for paddle.mode * fix coverage test for device * fix check-bug in windows * change axis check from LT to LE * add shape & value check for grad when input is 0d tensor
-
由 Thomas Young 提交于
-
- 05 1月, 2023 1 次提交
-
-
由 Siming Dai 提交于
* support 0D for paddle.sort/argsort * support 0D tensor for paddle.sort/argsort in xpu * fix bug * fix grad and add value assertion
-