- 01 Feb, 2023 1 commit
Committed by limingshu
* profile reduce kernel for fp16 and reduceHigherdim * use reinterpret_cast * fix for CI on ROCm * add Macro for ROCm * ROCm CI config * ROCm CI config * unit test repair * pull * add common_funcs.h * reduceType * Update reduce_function.h * not higher * rename * implement of matmul using cublasLt instead of cublas * cublasLt bugfix * Update matmul_kernel_impl.h * Update matmul_kernel_impl_via_blasLt.h * for-loop-algo * PR comments changes * add macro * ci unused variable isCublasLt * ci unused variable isCublasLt macro * split matmul to autotune * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * add some method for cuda_graph * fix bugs for rocm * change for ci-error * i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work. * add some changes for passing mode_benchmark and coverage ci * fix ci error * fix ci-rocm error * add some changes for header --------- Co-authored-by: zhangbopd <1299246947@qq.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
- 18 Jan, 2023 1 commit
Committed by MarDino
* add align check * refine
- 09 Jan, 2023 1 commit
Committed by MarDino
* add concat optimization * refine * remove annotation * use alignas instead of aligned_storage
- 03 Jan, 2023 1 commit
Committed by limingshu
- 20 Dec, 2022 1 commit
Committed by huangjiyi
* move dropout_impl from fluid to phi * move cuda_graph_with_memory_pool from fluid to phi * update namespace * remove cuda_graph in fluid * fix mac-build * fix bugs * correct CodeStyle * fix mac-build * fix mutable_data * fix stl include * fix copy param
- 01 Sep, 2022 1 commit
Committed by Leo Chen
* refine cmake of framework * add deps for dense tensor * fix deps * remove alloc(ctx) * add depends on mkldnn
- 18 Jul, 2022 1 commit
Committed by Wilber
* test * update
- 15 Jun, 2022 1 commit
Committed by Guoxia Wang
- 10 Jun, 2022 1 commit
Committed by Wilber
- 07 Jun, 2022 1 commit
Committed by Wilber
- 05 Jun, 2022 1 commit
Committed by Sing_chan
- 14 Mar, 2022 1 commit
Committed by Leo Chen
* fix gpu context callback * fix gpu callback * fix callback early destruct problem
- 25 Feb, 2022 1 commit
Committed by Leo Chen
* refine randint kernel * refine randperm kernel * refine unbind kernel * support op seed
- 23 Feb, 2022 1 commit
Committed by Leo Chen
* move unbind to phi * revert infer shape * add header file * move concat_and_split to phi