- 01 Feb, 2023: 1 commit
-
-
Committed by limingshu
* profile reduce kernel for fp16 and reduceHigherdim * use reinterpret_cast * fix for CI on ROCm * add Macro for ROCm * ROCm CI config * ROCm CI config * unit test repair * pull * add common_funcs.h * reduceType * Update reduce_function.h * not higher * rename * implement of matmul using cublasLt instead of cublas * cublasLt bugfix * Update matmul_kernel_impl.h * Update matmul_kernel_impl_via_blasLt.h * for-loop-algo * PR comments changes * add macro * ci unused variable isCublasLt * ci unused variable isCublasLt macro * split matmul to autotune * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * add some method for cuda_graph * fix bugs for rocm * change for ci-error * i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work. * add some changes for passing mode_benchmark and coverage ci * fix ci error * fix ci-rocm error * add some changes for header --------- Co-authored-by: zhangbopd <1299246947@qq.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
-
- 31 Jan, 2023: 5 commits
-
-
Committed by zhangkaihuo
-
Committed by 张春乔
* fix mod 0 error * fix div 0 error in floormod
-
Committed by xiaoting
* support 0d tensor for interpolate * support 0d tensor for interpolate * add xpu unittest for interp * update unittest for interpolate * fix coverage * fix code style * fix for coverage * fix coverage
-
Committed by 张春乔
-
Committed by Yiqun Liu
* Unify the gpu implementation of stack and unstack to reuse the optimization. * Optimize the cuda implementation of unstack. * Use GpuMemcpyAsync instead of memory::Copy. * Fix error of calculating the index. * Use FastDivMod to further improve the performance of unstack.
-
- 30 Jan, 2023: 1 commit
-
-
Committed by engineer1109
replace all TensorFromVector & TensorToVector AssignKernel async copy
-
- 18 Jan, 2023: 1 commit
-
-
Committed by MarDino
* add align check * refine
-
- 16 Jan, 2023: 1 commit
-
-
Committed by zlsh80826
* Update warpctc for cuda-12 * Deprecate cudaProfilerInitialize for CUDA > 11 * Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040 * Add the missing thrust header
-
- 13 Jan, 2023: 3 commits
-
-
Committed by limingshu
* first commit * add some changes in stack kernel. * move the location of GeneralDivMod * fix code format error according to ci
-
Committed by zhangkaihuo
-
Committed by Yuanle Liu
-
- 11 Jan, 2023: 1 commit
-
-
Committed by Yiqun Liu
* Implement a common PointerArray. * Polish codes. * Add including of header file. * Add the branch of kFix8. * Fix compiling error. * Add alignas hint to fix the performance drop. * Optimize the H2D copy in stack_grad. * Rename the macro. * Fix align hint for different compilers. * Polish the define of PADDLE_ALIGN. * Fix compiling error. * Remove the align hint on windows.
-
- 10 Jan, 2023: 2 commits
- 09 Jan, 2023: 2 commits
-
-
Committed by MarDino
* add concat optimization * refine * remove annotation * use alignas instead of aligned_storage
-
Committed by wangzhen38
-
- 04 Jan, 2023: 1 commit
-
-
Committed by Yuanle Liu
-
- 03 Jan, 2023: 2 commits
- 26 Dec, 2022: 1 commit
-
-
Committed by Roc
* revert concat and change concat to stack * let stack kernel support int8, uint8 and bool type
-
- 20 Dec, 2022: 1 commit
-
-
Committed by huangjiyi
* move dropout_impl from fluid to phi * move cuda_graph_with_memory_pool from fluid to phi * update namespace * remove cuda_graph in fluid * fix mac-build * fix bugs * correct CodeStyle * fix mac-build * fix mutable_data * fix stl include * fix copy param
-
- 19 Dec, 2022: 2 commits
-
-
Committed by huangjiyi
* move gather_scatter_kernel from fluid to phi * mv gather_scatter_kernel to gather_scatter_functor
-
Committed by huangjiyi
* move maxouting from fluid to phi * move matrix_bit_code from fluid to phi * replace mutable_data and fix include * fix include * move gather_scatter_kernel from fluid to phi * Revert "move gather_scatter_kernel from fluid to phi" This reverts commit 3d0b1eaf179656072e8c483dfca688cccccdda01.
-
- 16 Dec, 2022: 1 commit
-
-
Committed by MarDino
* optimize bias_add reluv2 in half2 * Add annotation * refine code format
-
- 15 Dec, 2022: 1 commit
-
-
Committed by huangjiyi
-
- 14 Dec, 2022: 1 commit
-
-
Committed by limingshu
* First Commit. * add some codes * add elementwise loader * fix code styles * merge with develop * add some changes both in elementwise and transpose * add init operation in broadcast kernel. * change codes according to pr suggestions about transpose file * fix error for op-benchmark ci * fix according to ci
-
- 12 Dec, 2022: 2 commits
-
-
Committed by 傅剑寒
* fix codestyle * add double complex<float> complex<double> dtype support for syevj_batched * fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some case * optimize eigh in different case * fix missing ; bug * fix use_syevj bug * fix use_cusolver_syevj_batched flag
-
Committed by huangjiyi
* move norm_utils.cu.h from fluid to phi * remove norm_utils.h in fluid * fix bugs and replace mutable_data with Alloc * replace mutable_data with Alloc
-
- 08 Dec, 2022: 5 commits
-
-
Committed by limingshu
-
Committed by jakpiase
Reenabled ext_reorder recording for TransDataLayoutFromOneDNN
-
Committed by 201716010711
-
Committed by Netpunk
* remove bbox_util.h from phi * add file bbox_util.h * reframe bbox_util.h
-
Committed by Netpunk
-
- 07 Dec, 2022: 1 commit
-
-
Committed by zhoutianzi666
-
- 05 Dec, 2022: 5 commits
-
-
Committed by limingshu
* first commit * fix bugs according to ci * add some changes * change file name into function.cu.h * remove const_cast
-
Committed by Roc
-
Committed by Ruibiao Chen
* Replace mutable_data with DeviceContext.Alloc in phi kernels * Fix CI errors * Fix CI errors * Fix CI errors, test=kunlun * Fix CI errors, test=kunlun * Handle rnn_functor * Update approvals
-
Committed by heyanru
[Fluid Clean] remove nn.topk, nn.ctc_greedy_decoder, nn.im2sequence, nn.multiplex, nn.smooth_l1 (#48289)
-
Committed by Netpunk
* rm poly_util.h * format code * fix some problems * format code
-