Commits · 3586e856c581f8e1ee1d924152a037357e3ccfb8 · PaddlePaddle / Paddle

31 Jan, 2023 1 commit

Unify the gpu implementation of stack and unstack to reuse the optimization. (#49748) · 3586e856

Committed by Yiqun Liu on Jan 31, 2023

* Unify the gpu implementation of stack and unstack to reuse the optimization.

* Optimize the cuda implementation of unstack.

* Use GpuMemcpyAsync instead of memory::Copy.

* Fix error of calculating the index.

* Use FastDivMod to further improve the performance of unstack.

3586e856
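
The last bullet of the commit above mentions FastDivMod. As a rough illustration of the general idea behind such helpers (replacing division by a fixed divisor with a multiply and a shift, which is cheaper than an integer divide when a kernel splits a flat offset into a row and a column), here is a minimal host-side sketch. It is not Paddle's code: the name `FastDivModSketch`, its members, and the restriction to divisors of at most 2^31 are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side sketch of division by a fixed divisor using a
// precomputed multiplier and shift. Not Paddle's FastDivMod implementation.
struct FastDivModSketch {
  uint32_t divisor;
  uint32_t multiplier;
  uint32_t shift;

  explicit FastDivModSketch(uint32_t d) : divisor(d), multiplier(0), shift(0) {
    // Assumption for this sketch: 1 <= d <= 2^31.
    assert(d >= 1 && d <= (1u << 31));
    // shift = ceil(log2(d)): smallest s such that 2^s >= d.
    while ((uint64_t{1} << shift) < d) ++shift;
    const uint64_t one = 1;
    // multiplier = floor(2^32 * (2^shift - d) / d) + 1, which fits in 32 bits.
    multiplier = static_cast<uint32_t>(
        ((one << 32) * ((one << shift) - d)) / d + 1);
  }

  // Returns n / divisor without a divide instruction.
  uint32_t Div(uint32_t n) const {
    // High 32 bits of the 64-bit product n * multiplier.
    const uint64_t hi = (static_cast<uint64_t>(n) * multiplier) >> 32;
    // Add in 64 bits so the sum cannot overflow for large n.
    return static_cast<uint32_t>((hi + n) >> shift);
  }

  // Returns n % divisor, derived from the quotient.
  uint32_t Mod(uint32_t n) const { return n - Div(n) * divisor; }
};
```

With a divisor equal to the number of columns, `Div(offset)` and `Mod(offset)` give the row and column of a flat offset; the constructor cost is paid once per divisor rather than once per element.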

11 Jan, 2023 1 commit

Implement a common segmented array. (#49450) · b1faa562

Committed by Yiqun Liu on Jan 11, 2023

* Implement a common PointerArray.

* Polish codes.

* Add including of header file.

* Add the branch of kFix8.

* Fix compiling error.

* Add alignas hint to fix the performance drop.

* Optimize the H2D copy in stack_grad.

* Rename the macro.

* Fix align hint for different compilers.

* Polish the define of PADDLE_ALIGN.

* Fix compiling error.

* Remove the align hint on windows.

b1faa562

10 Jan, 2023 1 commit

Optimization for StackGradCUDAKernel for last dimension stack case. (#48992) · 0cae5c7f

Committed by limingshu on Jan 10, 2023

* add stack grad kernel optimization

* add basic optimization kernel for stack_grad_kernel

* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit

0cae5c7f

26 Dec, 2022 1 commit

[0d Tensor] update scatter for zero-dimension tensor (#49279) · 73aa98cf

Committed by Roc on Dec 26, 2022

* revert concat and change concat to stack

* let stack kernel support int8, uint8 and bool type

73aa98cf

01 Sep, 2022 1 commit

remove circular dependency of device_context and allocator (#45455) · 934171ae

Committed by Leo Chen on Sep 01, 2022

* refine cmake of framework

* add deps for dense tensor

* fix deps

* remove alloc(ctx)

* add depends on mkldnn

934171ae

21 Jun, 2022 1 commit

resort .cu headers, set clang-format not sort include block and consider .cu... · 829723f2

Committed by Sing_chan on Jun 21, 2022

resort .cu headers, set clang-format not sort include block and consider .cu as main source file (#43633)

829723f2

05 Jun, 2022 1 commit

【code format check upgrade】 step2：clang-format (#42840) · a3730dc8

Committed by Sing_chan on Jun 05, 2022

a3730dc8

31 Mar, 2022 1 commit

fix conflict (#40851) · 74894cd7

Committed by csy0225 on Mar 31, 2022

74894cd7

PaddlePaddle / Paddle synced successfully about 1 year ago