- 07 Mar, 2023 1 commit
-
-
Committed by Charles-hit
* support elementwise_pow bfloat16 * add only_check_prim parameters in check_grad * modify unit test * fix floor test * fix sigmoid bfloat16 test
-
- 06 Mar, 2023 1 commit
-
-
Committed by Huang Jiyi
* move DeviceContextPool to phi * add EmplaceExternalContextFunc * update namespace * update cmake * fix bugs and create context_pool_impl.h * replace platform::is_xxx_place * fix bugs * update generator * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix enforce usage * Revert "fix enforce usage" This reverts commit 5f521f08a69713cee506e64a00ec6d9fba709e27. * fix bugs * rm XPUDeviceContext and CustomDeviceContext * fix bugs * fix fix context init bug * fix bugs after merge * fix bugs * fix name * fix mutable_data * update and fix bugs * fix bugs * update * fix bugs * fix name * fix bugs * merge * fix bugs * create context_pool in phi/backends * create context_pool in phi/backends * fix bugs * fix xpu bugs * fix rocm bugs * fix bugs * fix bugs * fix bugs * fix xpu bugs * update * update * fix bugs * fix bugs
-
- 03 Mar, 2023 2 commits
-
-
Committed by gouzil
* [phi] move jit kernels from fluid to phi * [phi] fix paddle::phi err * [phi] fix windows 'posix_memalign': identifier not found * [phi] fix windows 'posix_memalign_free': identifier not found * [phi] fix readme directory structure, fc_functor paddle::platform
-
Committed by YuanRisheng
* decouple memory copy * fix ci bugs * fix ci compile bugs * fix rocm compile * fix ci bugs
-
- 02 Mar, 2023 2 commits
-
-
Committed by limingshu
* first commit * finish base work * modification for good * fix for cache setting and gather the algo and desc as one data for cache storage * fix for cache setting and gather the algo and desc as one data for cache storage * install pre-commit check
-
Committed by Leo Chen
* register fp16 and bf16 kernel for uniform_random * fix compile * support selected_rows * add ut * revert cpu * fp16 test skip cpu
-
- 01 Mar, 2023 1 commit
-
-
Committed by duanyanhui
* add support of int64 add for xpu * add transpose support for int64 * add randperm kernel * fix randperm * add distribute_fpn_proposal kernel * fix comment * add reduce_sum_int32
-
- 28 Feb, 2023 1 commit
-
-
Committed by gouzil
* [phi] move device_wrapper from fluid to phi * [phi] fix ‘PADDLE_ENFORCE_XDNN_SUCCESS’ was not declared in this scope
-
- 27 Feb, 2023 2 commits
- 26 Feb, 2023 1 commit
-
-
Committed by limingshu
* implement of matmul using cublasLt instead of cublas * Update matmul_kernel_impl_via_blasLt.h --------- Co-authored-by: zhangbopd <1299246947@qq.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com> Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
- 22 Feb, 2023 1 commit
-
-
Committed by Shuangchi He
* Fix some typos. Signed-off-by: Yulv-git <yulvchi@qq.com> * pre-commit Signed-off-by: Yulv-git <yulvchi@qq.com> --------- Signed-off-by: Yulv-git <yulvchi@qq.com>
-
- 21 Feb, 2023 2 commits
-
-
Committed by YuanRisheng
* decouple_memory * perfect memory utils * fix ci bugs * fix inference bugs * fix custom test bugs * fix coverage bugs * modify code according to comment * modify namespace * deal with compile bugs
-
Committed by Huang Jiyi
* move sequence_padding to phi * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * revert and update phi::XPUContext
-
- 20 Feb, 2023 1 commit
-
-
Committed by RedContritio
-
- 17 Feb, 2023 2 commits
-
-
Committed by Huang Jiyi
* move platform::transform to phi * fix bugs * move transform_test to phi * fix cmake * update namespace * fix cmake
-
Committed by Huang Jiyi
* rm framework::tensor_util in phi * clean TensorCopy * fix bugs * fix bugs * fix bugs * replace mutable_data * revert custom_device_test.cc
-
- 16 Feb, 2023 2 commits
-
-
Committed by Huang Jiyi
* move layer_norm_kernel.cu.h to phi * fix bugs * fix namespace * fix bugs * fix CI-Windows * replace mutable_data * fix bugs * fix bugs
-
Committed by Huang Jiyi
* move variable_utils from phi_api_utils to fluid * fix comment * update include * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * update * update * fix CI-Windows-OpenBLAS * fix bugs * fix bugs * fix bugs * update include * move variable_utils to phi_utils * fix namespace
-
- 15 Feb, 2023 1 commit
-
-
Committed by YuanRisheng
* move profiler * add file * fix mac compile bugs * fix ci bugs * fix mac bugs * fix ci bugs * fix compile bugs * perfect code according to comment
-
- 14 Feb, 2023 2 commits
-
-
Committed by engineer1109
fix X remove TensorCopy codestyle add fluid memory header fix symbol fix cmake fix cmake fix context fix header fix place fix context fix context fix context fix code fix custom context fix custom context fix copy fix data_transform fix style remove changes of custom fix scalar
-
Committed by limingshu
* first commit. * a little changes * add some changes for get vec_size efficiently * fix bugs --------- Co-authored-by: zhangbopd <1299246947@qq.com>
-
- 10 Feb, 2023 1 commit
-
-
Committed by RedContritio
* add dim check in scatter * add check in scatter.cu * add unittest * remove unnecessary log and comment --------- Co-authored-by: RedContritio <>
-
- 09 Feb, 2023 2 commits
-
-
Committed by Huang Jiyi
* decouple strided_memcpy * move strided_memcpy * move strided_memcpy to phi * fix namespace * update * fix gpu compile bugs
-
Committed by yuehuayingxueluo
* add multi_tensor_adam * update multi_tensor_base.py, test_multi_tensor_adam.py, adamw.py * fix adam.py optimizer.py * fix adamw.py * fix test_multi_tensor_adam.py * fix CI bug * fix CI coverage * fix ci bug * fix betapow * fix some bugs * fix test_adamw_op.py * fix CI coverage * fix multi_tensor_adam_kernel.cc * fix CI bug * fix multi_tensor_adam_op.cc and test_multi_tensor_adam.py * fix code style * update C++ parts * remove python parts modification temporarily * add C++ ut * update betapow copy code logic * fix ci ut * fix windows ci * fix coverage ci * improve coverage rate --------- Co-authored-by: sneaxiy <sneaxiy@126.com>
-
- 08 Feb, 2023 1 commit
-
-
Committed by Huang Jiyi
-
- 07 Feb, 2023 1 commit
-
-
Committed by Yuang Liu
-
- 03 Feb, 2023 1 commit
-
-
Committed by RedContritio
-
- 02 Feb, 2023 2 commits
-
-
Committed by RedContritio
* add stride check for PoolOutputSize * add unittest
-
Committed by YuanRisheng
* fix bugs * fix ci bugs
-
- 01 Feb, 2023 3 commits
-
-
Committed by RedContritio
* add stride check for MaxPool * add unittests
-
Committed by limingshu
* A leap of try for cudaLaunchCooperativeKernel * fix bugs * Totally replace the lar cuda kernel * Fix bugs * fix code according to comments * fix codes according to review comments * adding some function overload * relocate the power operation. * add bf16 support for index select relevant ops * revert bf16 type change. * add changes for more op * fix code writing bugs
-
Committed by limingshu
* profile reduce kernel for fp16 and reduceHigherdim * use reinterpret_cast * fix for CI on ROCm * add Macro for ROCm * ROCm CI config * ROCm CI config * unit test repair * pull * add common_funcs.h * reduceType * Update reduce_function.h * not higher * rename * implement of matmul using cublasLt instead of cublas * cublasLt bugfix * Update matmul_kernel_impl.h * Update matmul_kernel_impl_via_blasLt.h * for-loop-algo * PR comments changes * add macro * ci unused variable isCublasLt * ci unused variable isCublasLt macro * split matmul to autotune * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * add some method for cuda_graph * fix bugs for rocm * change for ci-error * i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work. * add some changes for passing mode_benchmark and coverage ci * fix ci error * fix ci-rocm error * add some changes for header --------- Co-authored-by: zhangbopd <1299246947@qq.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
-
- 31 Jan, 2023 5 commits
-
-
Committed by zhangkaihuo
-
Committed by 张春乔
* fix mod 0 error * fix div 0 error in floormod
-
Committed by xiaoting
* support 0d tensor for interpolate * support 0d tensor for interpolate * add xpu unittest for interp * update unittest for interpolate * fix coverage * fix code style * fix for coverage * fix coverage
-
Committed by 张春乔
-
Committed by Yiqun Liu
* Unify the gpu implementation of stack and unstack to reuse the optimization. * Optimize the cuda implementation of unstack. * Use GpuMemcpyAsync instead of memory::Copy. * Fix error of calculating the index. * Use FastDivMod to further improve the performance of unstack.
-
- 30 Jan, 2023 1 commit
-
-
Committed by engineer1109
replace all TensorFromVector & TensorToVector AssignKernel async copy
-
- 18 Jan, 2023 1 commit
-
-
Committed by MarDino
* add align check * refine
-