- 21 Feb, 2023 2 commits
-
-
Committed by YuanRisheng
* decouple_memory * perfect memory utils * fix ci bugs * fix inference bugs * fix custom test bugs * fix coverage bugs * modify code according to comment * modify namespace * deal with compile bugs
-
Committed by Huang Jiyi
* move sequence_padding to phi * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * revert and update phi::XPUContext
-
- 20 Feb, 2023 1 commit
-
-
Committed by RedContritio
-
- 17 Feb, 2023 2 commits
-
-
Committed by Huang Jiyi
* move platform::transform to phi * fix bugs * move transform_test to phi * fix cmake * update namespace * fix cmake
-
Committed by Huang Jiyi
* rm framework::tensor_util in phi * clean TensorCopy * fix bugs * fix bugs * fix bugs * replace mutable_data * revert custom_device_test.cc
-
- 16 Feb, 2023 2 commits
-
-
Committed by Huang Jiyi
* move layer_norm_kernel.cu.h to phi * fix bugs * fix namespace * fix bugs * fix CI-Windows * replace mutable_data * fix bugs * fix bugs
-
Committed by Huang Jiyi
* move variable_utils from phi_api_utils to fluid * fix comment * update include * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * update * update * fix CI-Windows-OpenBLAS * fix bugs * fix bugs * fix bugs * update include * move variable_utils to phi_utils * fix namespace
-
- 15 Feb, 2023 1 commit
-
-
Committed by YuanRisheng
* move profiler * add file * fix mac compile bugs * fix ci bugs * fix mac bugs * fix ci bugs * fix compile bugs * perfect code according to comment
-
- 14 Feb, 2023 2 commits
-
-
Committed by engineer1109
fix X remove TensorCopy codestyle add fluid memory header fix symbol fix cmake fix cmake fix context fix header fix place fix context fix context fix context fix code fix custom context fix custom context fix copy fix data_transform fix style remove changes of custom fix scalar
-
Committed by limingshu
* first commit. * a little changes * add some changes for get vec_size efficiently * fix bugs --------- Co-authored-by: zhangbopd <1299246947@qq.com>
-
- 10 Feb, 2023 1 commit
-
-
Committed by RedContritio
* add dim check in scatter * add check in scatter.cu * add unittest * remove unnecessary log and comment --------- Co-authored-by: RedContritio <>
-
- 09 Feb, 2023 2 commits
-
-
Committed by Huang Jiyi
* decouple strided_memcpy * move strided_memcpy * move strided_memcpy to phi * fix namespace * update * fix gpu compile bugs
-
Committed by yuehuayingxueluo
* add multi_tensor_adam * update multi_tensor_base.py, test_multi_tensor_adam.py, adamw.py * fix adam.py optimizer.py * fix adamw.py * fix test_multi_tensor_adam.py * fix CI bug * fix CI coverage * fix ci bug * fix betapow * fix some bugs * fix test_adamw_op.py * fix CI coverage * fix multi_tensor_adam_kernel.cc * fix CI bug * fix multi_tensor_adam_op.cc and test_multi_tensor_adam.py * fix code style * update C++ parts * remove python parts modification temporarily * add C++ ut * update betapow copy code logic * fix ci ut * fix windows ci * fix coverage ci * improve coverage rate --------- Co-authored-by: sneaxiy <sneaxiy@126.com>
-
- 08 Feb, 2023 1 commit
-
-
Committed by Huang Jiyi
-
- 07 Feb, 2023 1 commit
-
-
Committed by Yuang Liu
-
- 03 Feb, 2023 1 commit
-
-
Committed by RedContritio
-
- 02 Feb, 2023 2 commits
-
-
Committed by RedContritio
* add stride check for PoolOutputSize * add unittest
-
Committed by YuanRisheng
* fix bugs * fix ci bugs
-
- 01 Feb, 2023 3 commits
-
-
Committed by RedContritio
* add stride check for MaxPool * add unittests
-
Committed by limingshu
* A leap of try for cudaLaunchCooperativeKernel * fix bugs * Totally replace the lar cuda kernel * Fix bugs * fix code according to comments * fix codes according to review comments * adding some function overload * relocate the power operation. * add bf16 support for index select relevant ops * revert bf16 type change. * add changes for more op * fix code writing bugs
-
Committed by limingshu
* profile reduce kernel for fp16 and reduceHigherdim * use reinterpret_cast * fix for CI on ROCm * add Macro for ROCm * ROCm CI config * ROCm CI config * unit test repair * pull * add common_funcs.h * reduceType * Update reduce_function.h * not higher * rename * implement of matmul using cublasLt instead of cublas * cublasLt bugfix * Update matmul_kernel_impl.h * Update matmul_kernel_impl_via_blasLt.h * for-loop-algo * PR comments changes * add macro * ci unused variable isCublasLt * ci unused variable isCublasLt macro * split matmul to autotune * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * add some method for cuda_graph * fix bugs for rocm * change for ci-error * i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work. * add some changes for passing mode_benchmark and coverage ci * fix ci error * fix ci-rocm error * add some changes for header --------- Co-authored-by: zhangbopd <1299246947@qq.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
-
- 31 Jan, 2023 5 commits
-
-
Committed by zhangkaihuo
-
Committed by 张春乔
* fix mod 0 error * fix div 0 error in floormod
-
Committed by xiaoting
* support 0d tensor for interpolate * support 0d tensor for interpolate * add xpu unittest for interp * update unittest for interpolate * fix coverage * fix code style * fix for coverage * fix coverage
-
Committed by 张春乔
-
Committed by Yiqun Liu
* Unify the gpu implementation of stack and unstack to reuse the optimization. * Optimize the cuda implementation of unstack. * Use GpuMemcpyAsync instead of memory::Copy. * Fix error of calculating the index. * Use FastDivMod to further improve the performance of unstack.
-
- 30 Jan, 2023 1 commit
-
-
Committed by engineer1109
replace all TensorFromVector & TensorToVector AssignKernel async copy
-
- 18 Jan, 2023 1 commit
-
-
Committed by MarDino
* add align check * refine
-
- 16 Jan, 2023 1 commit
-
-
Committed by zlsh80826
* Update warpctc for cuda-12 * Deprecate cudaProfilerInitialize for CUDA > 11 * Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040 * Add the missing thrust header
-
- 13 Jan, 2023 3 commits
-
-
Committed by limingshu
* first commit * add some changes in stack kernel. * move the location of GeneralDivMod * fix code format error according to ci
-
Committed by zhangkaihuo
-
Committed by Yuanle Liu
-
- 11 Jan, 2023 1 commit
-
-
Committed by Yiqun Liu
* Implement a common PointerArray. * Polish codes. * Add including of header file. * Add the branch of kFix8. * Fix compiling error. * Add alignas hint to fix the performance drop. * Optimize the H2D copy in stack_grad. * Rename the macro. * Fix align hint for different compilers. * Polish the define of PADDLE_ALIGN. * Fix compiling error. * Remove the align hint on windows.
-
- 10 Jan, 2023 2 commits
- 09 Jan, 2023 2 commits
-
-
Committed by MarDino
* add concat optimization * refine * remove annotation * use alignas instead of aligned_storage
-
Committed by wangzhen38
-
- 04 Jan, 2023 1 commit
-
-
Committed by Yuanle Liu
-
- 03 Jan, 2023 2 commits