- 08 Feb, 2023 1 commit
  - Committed by Huang Jiyi
- 07 Feb, 2023 1 commit
  - Committed by Yuang Liu
- 03 Feb, 2023 1 commit
  - Committed by RedContritio
- 02 Feb, 2023 2 commits
  - Committed by RedContritio
    - add stride check for PoolOutputSize
    - add unittest
  - Committed by YuanRisheng
    - fix bugs
    - fix ci bugs
- 01 Feb, 2023 3 commits
  - Committed by RedContritio
    - add stride check for MaxPool
    - add unittests
  - Committed by limingshu
    - A leap of try for cudaLaunchCooperativeKernel
    - fix bugs
    - Totally replace the lar cuda kernel
    - Fix bugs
    - fix code according to comments
    - fix codes according to review comments
    - adding some function overload
    - relocate the power operation.
    - add bf16 support for index select relevant ops
    - revert bf16 type change.
    - add changes for more op
    - fix code writing bugs
  - Committed by limingshu
    - profile reduce kernel for fp16 and reduceHigherdim
    - use reinterpret_cast
    - fix for CI on ROCm
    - add Macro for ROCm
    - ROCm CI config
    - ROCm CI config
    - unit test repair
    - pull
    - add common_funcs.h
    - reduceType
    - Update reduce_function.h
    - not higher
    - rename
    - implement of matmul using cublasLt instead of cublas
    - cublasLt bugfix
    - Update matmul_kernel_impl.h
    - Update matmul_kernel_impl_via_blasLt.h
    - for-loop-algo
    - PR comments changes
    - add macro
    - ci unused variable isCublasLt
    - ci unused variable isCublasLt macro
    - split matmul to autotune
    - rewrite the split kernel with segmented_array
    - rewrite the split kernel with segmented_array
    - rewrite the split kernel with segmented_array
    - add some method for cuda_graph
    - fix bugs for rocm
    - change for ci-error
    - i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work.
    - add some changes for passing mode_benchmark and coverage ci
    - fix ci error
    - fix ci-rocm error
    - add some changes for header
    - Co-authored-by: zhangbopd <1299246947@qq.com>
    - Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
- 31 Jan, 2023 5 commits
  - Committed by zhangkaihuo
  - Committed by 张春乔
    - fix mod 0 error
    - fix div 0 error in floormod
  - Committed by xiaoting
    - support 0d tensor for interpolate
    - support 0d tensor for interpolate
    - add xpu unittest for interp
    - update unittest for interpolate
    - fix coverage
    - fix code style
    - fix for coverage
    - fix coverage
  - Committed by 张春乔
  - Committed by Yiqun Liu
    - Unify the gpu implementation of stack and unstack to reuse the optimization.
    - Optimize the cuda implementation of unstack.
    - Use GpuMemcpyAsync instead of memory::Copy.
    - Fix error of calculating the index.
    - Use FastDivMod to further improve the performance of unstack.
- 30 Jan, 2023 1 commit
  - Committed by engineer1109
    - replace all TensorFromVector & TensorToVector AssignKernel async copy
- 18 Jan, 2023 1 commit
  - Committed by MarDino
    - add align check
    - refine
- 16 Jan, 2023 1 commit
  - Committed by zlsh80826
    - Update warpctc for cuda-12
    - Deprecate cudaProfilerInitialize for CUDA > 11
    - Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040
    - Add the missing thrust header
- 13 Jan, 2023 3 commits
  - Committed by limingshu
    - first commit
    - add some changes in stack kernel.
    - move the location of GeneralDivMod
    - fix code format error according to ci
  - Committed by zhangkaihuo
  - Committed by Yuanle Liu
- 11 Jan, 2023 1 commit
  - Committed by Yiqun Liu
    - Implement a common PointerArray.
    - Polish codes.
    - Add including of header file.
    - Add the branch of kFix8.
    - Fix compiling error.
    - Add alignas hint to fix the performance drop.
    - Optimize the H2D copy in stack_grad.
    - Rename the macro.
    - Fix align hint for different compilers.
    - Polish the define of PADDLE_ALIGN.
    - Fix compiling error.
    - Remove the align hint on windows.
- 10 Jan, 2023 2 commits
- 09 Jan, 2023 2 commits
  - Committed by MarDino
    - add concat optimization
    - refine
    - remove annotation
    - use alignas instead of aligned_storage
  - Committed by wangzhen38
- 04 Jan, 2023 1 commit
  - Committed by Yuanle Liu
- 03 Jan, 2023 2 commits
- 26 Dec, 2022 1 commit
  - Committed by Roc
    - revert concat and change concat to stack
    - let stack kernel support int8, uint8 and bool type
- 20 Dec, 2022 1 commit
  - Committed by huangjiyi
    - move dropout_impl from fluid to phi
    - move cuda_graph_with_memory_pool from fluid to phi
    - update namespace
    - remove cuda_graph in fluid
    - fix mac-build
    - fix bugs
    - correct CodeStyle
    - fix mac-build
    - fix mutable_data
    - fix stl include
    - fix copy param
- 19 Dec, 2022 2 commits
  - Committed by huangjiyi
    - move gather_scatter_kernel from fluid to phi
    - mv gather_scatter_kernel to gather_scatter_functor
  - Committed by huangjiyi
    - move maxouting from fluid to phi
    - move matrix_bit_code from fluid to phi
    - replace mutable_data and fix include
    - fix include
    - move gather_scatter_kernel from fluid to phi
    - Revert "move gather_scatter_kernel from fluid to phi" This reverts commit 3d0b1eaf179656072e8c483dfca688cccccdda01.
- 16 Dec, 2022 1 commit
  - Committed by MarDino
    - optimize bias_add reluv2 in half2
    - Add annotation
    - refine code format
- 15 Dec, 2022 1 commit
  - Committed by huangjiyi
- 14 Dec, 2022 1 commit
  - Committed by limingshu
    - First Commit.
    - add some codes
    - add elementwise loader
    - fix code styles
    - merge with develop
    - add some changes both in elementwise and transpose
    - add init operation in broadcast kernel.
    - change codes according to pr suggestions about transpose file
    - fix error for op-benchmark ci
    - fix according to ci
- 12 Dec, 2022 2 commits
  - Committed by 傅剑寒
    - fix codestyle
    - add double complex<float> complex<double> dtype support for syevj_batched
    - fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some case
    - optimize eigh in different case
    - fix missing ; bug
    - fix use_syevj bug
    - fix use_cusolver_syevj_batched flag
  - Committed by huangjiyi
    - move norm_utils.cu.h from fluid to phi
    - remove norm_utils.h in fluid
    - fix bugs and replace mutable_data with Alloc
    - replace mutable_data with Alloc
- 08 Dec, 2022 4 commits
  - Committed by limingshu
  - Committed by jakpiase
    - Reenabled ext_reorder recording for TransDataLayoutFromOneDNN
  - Committed by 201716010711
  - Committed by Netpunk
    - remove bbox_util.h from phi
    - add file bbox_util.h
    - reframe bbox_util.h
