- 31 Jan, 2023 1 commit
-
-
Committed by Yiqun Liu
* Unify the gpu implementation of stack and unstack to reuse the optimization. * Optimize the cuda implementation of unstack. * Use GpuMemcpyAsync instead of memory::Copy. * Fix error of calculating the index. * Use FastDivMod to further improve the performance of unstack.
-
- 11 Jan, 2023 1 commit
-
-
Committed by Yiqun Liu
* Implement a common PointerArray. * Polish codes. * Add including of header file. * Add the branch of kFix8. * Fix compiling error. * Add alignas hint to fix the performance drop. * Optimize the H2D copy in stack_grad. * Rename the macro. * Fix align hint for different compilers. * Polish the define of PADDLE_ALIGN. * Fix compiling error. * Remove the align hint on windows.
-
- 10 Jan, 2023 1 commit
-
-
Committed by limingshu
* add stack grad kernel optimization * add basic optimization kernel for stack_grad_kernel * optimization of stack_grad_kernel for last dim stack and change code format with pre-commit
-
- 03 Jan, 2023 1 commit
-
-
Committed by limingshu
-
- 26 Dec, 2022 1 commit
-
-
Committed by Roc
* revert concat and change concat to stack * let stack kernel support int8, uint8 and bool type
-
- 11 Dec, 2022 1 commit
-
-
Committed by limingshu
* first commit. * refine performance with fast_divmod * refine performance with fast_divmod
-
- 05 Sep, 2022 1 commit
-
-
Committed by sneaxiy
-
- 01 Sep, 2022 1 commit
-
-
Committed by Leo Chen
* refine cmake of framework * add deps for dense tensor * fix deps * remove alloc(ctx) * add depends on mkldnn
-
- 21 Jun, 2022 1 commit
-
-
Committed by Sing_chan
resort .cu headers, set clang-format not sort include block and consider .cu as main source file (#43633)
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 31 Mar, 2022 1 commit
-
-
Committed by csy0225
-