Commits · 057ba778fa19c1b9670150d5ea5e83d6c8d64d04 · BaiXuePrincess / Paddle

01 Feb, 2023 1 commit

H2D data transfer optimization for split kernel (#49086) · 057ba778

Committed by limingshu on Feb 01, 2023

* profile reduce kernel for fp16 and reduceHigherdim

* use reinterpret_cast

* fix for CI on ROCm

* add Macro for ROCm

* ROCm CI config

* ROCm CI config

* unit test repair

* pull

* add common_funcs.h

* reduceType

* Update reduce_function.h

* not higher

* rename

* implement of matmul using cublasLt instead of cublas

* cublasLt bugfix

* Update matmul_kernel_impl.h

* Update matmul_kernel_impl_via_blasLt.h

* for-loop-algo

* PR comments changes

* add macro

* ci unused variable isCublasLt

* ci unused variable isCublasLt macro

* split matmul to autotune

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* add some method for cuda_graph

* fix bugs for rocm

* change for ci-error

* I don't know why ci-model-benchmark reports an error, so I restored the original code to see whether it works.

* add some changes for passing mode_benchmark and coverage ci

* fix ci error

* fix ci-rocm error

* add some changes for header

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>

31 Jan, 2023 5 commits

optimize 2D sync_batch_norm (#49663) · 9a4acfee
Committed by zhangkaihuo on Jan 31, 2023

fix div 0 error in floormod (#49997) · 26bdea0f
Committed by 张春乔 on Jan 31, 2023
```
* fix mod 0 error

* fix div 0 error in floormod
```

support 0d tensor for interpolate (#49929) · 2e156ac8

Committed by xiaoting on Jan 31, 2023

* support 0d tensor for interpolate

* support 0d tensor for interpolate

* add xpu unittest for interp

* update unittest for interpolate

* fix coverage

* fix code style

* fix for coverage

* fix coverage


fix div 0 error in conv1_transpose (#50000) · 1755a154
Committed by 张春乔 on Jan 31, 2023

Unify the gpu implementation of stack and unstack to reuse the optimization. (#49748) · 3586e856

Committed by Yiqun Liu on Jan 31, 2023

* Unify the gpu implementation of stack and unstack to reuse the optimization.

* Optimize the cuda implementation of unstack.

* Use GpuMemcpyAsync instead of memory::Copy.

* Fix error of calculating the index.

* Use FastDivMod to further improve the performance of unstack.

30 Jan, 2023 1 commit
- add phi tensor vector array api from fluid (#49885) · 094e3b8c
  Committed by engineer1109 on Jan 30, 2023
```
replace all TensorFromVector & TensorToVector

AssignKernel async copy
```
18 Jan, 2023 1 commit
- Add align check for Concat Kernel (#49761) · 24379442
  Committed by MarDino on Jan 18, 2023
```
* add align check

* refine
```
16 Jan, 2023 1 commit

CUDA12.0 integration (#49539) · 1885d55a

Committed by zlsh80826 on Jan 16, 2023

* Update warpctc for cuda-12

* Deprecate cudaProfilerInitialize for CUDA > 11

* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040

* Add the missing thrust header

13 Jan, 2023 3 commits
- Move GeneralDivMod from segmented_array.h to fast_divmod.h (#48934) · ad4824e5
  Committed by limingshu on Jan 13, 2023
```
* first commit

* add some changes in stack kernel.

* move the location of GeneralDivMod

* fix code format error according to ci
```
- Update threshold of bn1d (#49734) · 0294ab41
  Committed by zhangkaihuo on Jan 13, 2023
- fix fc and fused_fc_elementwise_layernorm kernel diff (#49778) · 0b24d167
  Committed by Yuanle Liu on Jan 13, 2023
11 Jan, 2023 1 commit

Implement a common segmented array. (#49450) · b1faa562

Committed by Yiqun Liu on Jan 11, 2023

* Implement a common PointerArray.

* Polish codes.

* Add including of header file.

* Add the branch of kFix8.

* Fix compiling error.

* Add alignas hint to fix the performance drop.

* Optimize the H2D copy in stack_grad.

* Rename the macro.

* Fix align hint for different compilers.

* Polish the define of PADDLE_ALIGN.

* Fix compiling error.

* Remove the align hint on windows.

10 Jan, 2023 2 commits
- [PHI Decoupling] move sequence_scale from fluid to phi (#49668) · a36c5490
  Committed by Ryan on Jan 10, 2023
```
* try sequence_padding

* fix cant use mutable_data

* fix mistake fluid_sequence_scale.hh/CMakeLists.t include

* fix namespace bug

* fix framework::ToAbsOffset not found

* fix codestyle
```
- Refine name style and MoeKernel (#49432) · 39210ed0
  Committed by MarDino on Jan 10, 2023
09 Jan, 2023 2 commits
- Add concat optimization (#49540) · 1a0b3661
  Committed by MarDino on Jan 09, 2023
```
* add concat optimization

* refine

* remove annotation

* use alignas instead of aligned_storage
```
- [0 Tensor support] cumprod (#49550) · 50a8b655
  Committed by wangzhen38 on Jan 09, 2023
04 Jan, 2023 1 commit
- [Paddle Inference] fix mixed precision diff (#49475) · ac75a9a6
  Committed by Yuanle Liu on Jan 04, 2023
03 Jan, 2023 2 commits
- H2D data transfer optimization for concat kernel (#49040) · 0de94cd9
  Committed by limingshu on Jan 03, 2023
- Use BroadcastKernel and ReduceKernel to optimize expand and expand_grad. (#49419) · c4604025
  Committed by Yiqun Liu on Jan 03, 2023
```
* Use BroadcastKernel and ReduceKernel to optimize expand and expand_grad.

* Correct the axis when there is only 1 input in BroadcastKernel.

* Add the calculation of the output's shape.
```
26 Dec, 2022 1 commit
- [0d Tensor] update scatter for zero-dimension tensor (#49279) · 73aa98cf
  Committed by Roc on Dec 26, 2022
```
* revert concat and change concat to stack

* let stack kernel support int8, uint8 and bool type
```
20 Dec, 2022 1 commit

[PHI decouple] move dropout_impl and cuda_graph_with_memory_pool from fluid to phi (#49139) · 579784e2

Committed by huangjiyi on Dec 20, 2022

* move dropout_impl from fluid to phi

* move cuda_graph_with_memory_pool from fluid to phi

* update namespace

* remove cuda_graph in fluid

* fix mac-build

* fix bugs

* correct CodeStyle

* fix mac-build

* fix mutable_data

* fix stl include

* fix copy param

19 Dec, 2022 2 commits

[PHI decoupling] move gather_scatter_kernel from fluid to phi (#49132) · 0b79129d
Committed by huangjiyi on Dec 19, 2022
```
* move gather_scatter_kernel from fluid to phi

* mv gather_scatter_kernel to gather_scatter_functor
```

[PHI Decoupling] move maxouting and matrix_bit_code from fluid to phi (#49131) · 5e222dc2

Committed by huangjiyi on Dec 19, 2022

* move maxouting from fluid to phi

* move matrix_bit_code from fluid to phi

* replace mutable_data and fix include

* fix include

* move gather_scatter_kernel from fluid to phi

* Revert "move gather_scatter_kernel from fluid to phi"

This reverts commit 3d0b1eaf179656072e8c483dfca688cccccdda01.

16 Dec, 2022 1 commit
- Optimize bias_add reluv2 in half2 (#49048) · e77d1cac
  Committed by MarDino on Dec 16, 2022
```
* optimize bias_add reluv2 in half2

* Add annotation

* refine code format
```
15 Dec, 2022 1 commit
- [PHI decoupling] move softmax from fluid to phi and remove cpu_vec.h in fluid (#48970) · 344b99e1
  Committed by huangjiyi on Dec 15, 2022
14 Dec, 2022 1 commit

Divide elementwise case from BroadcastKernel and refine transpose autotune (#33051) · 6c9df13d

Committed by limingshu on Dec 14, 2022

* First Commit.

* add some codes

* add elementwise loader

* fix code styles

* merge with develop

* add some changes both in elementwise and transpose

* add init operation in broadcast kernel.

* change codes according to pr suggestions about transpose file

* fix error for op-benchmark ci

* fix according to ci

12 Dec, 2022 2 commits

Optimization of Eigh op with ssyevj_batched runtime api (#48560) · 16e364d3

Committed by 傅剑寒 on Dec 12, 2022

* fix codestyle

* add double complex<float> complex<double> dtype support for syevj_batched

* fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some case

* optimize eigh in different case

* fix missing ; bug

* fix use_syevj bug

* fix use_cusolver_syevj_batched flag

[PHI decoupling] move norm_utils.cu.h from fluid to phi and remove norm_utils.h in fluid (#48930) · 3cb8db8f

Committed by huangjiyi on Dec 12, 2022

* move norm_utils.cu.h from fluid to phi

* remove norm_utils.h in fluid

* fix bugs and replace mutable_data with Alloc

* replace mutable_data with Alloc

08 Dec, 2022 5 commits
- first commit (#38143) · 2e7c172c
  Committed by limingshu on Dec 08, 2022
- proper fix (#48360) · f95e9245
  Committed by jakpiase on Dec 08, 2022
```
Reenabled ext_reorder recording for TransDataLayoutFromOneDNN
```
- Optimize Paddle diagonal (#47904) · b91bbd32
  Committed by 201716010711 on Dec 08, 2022
- [PHI decoupling] remove bbox_util.h from phi dependencies (#48761) · de2c5fd6
  Committed by Netpunk on Dec 08, 2022
```
* remove bbox_util.h from phi

* add file bbox_util.h

* reframe bbox_util.h
```
- remove gpu_info.h from phi dependencies (#48811) · 73688894
  Committed by Netpunk on Dec 08, 2022
07 Dec, 2022 1 commit
- optimize nchw<->nhwc kernel in fp16 model (#48692) · 17879045
  Committed by zhoutianzi666 on Dec 07, 2022
05 Dec, 2022 5 commits
- Transpose optimization for AlphaFold2 (#45230) · a0f43889
  Committed by limingshu on Dec 05, 2022
```
* first commit

* fix bugs according to ci

* add some changes

* change file name into function.cu.h

* remove const_cast
```
- [0D Tensor]support 0d tensor for dist.scatter and dist.broadcast (#48638) · 22ec915c
  Committed by Roc on Dec 05, 2022
- Replace mutable_data with DeviceContext.Alloc in phi kernels (#48500) · 34a957e3
  Committed by Ruibiao Chen on Dec 05, 2022
```
* Replace mutable_data with DeviceContext.Alloc in phi kernels

* Fix CI errors

* Fix CI errors

* Fix CI errors, test=kunlun

* Fix CI errors, test=kunlun

* Handle rnn_functor

* Update approvals
```
- [Fluid Clean] remove nn.topk, nn.ctc_greedy_decoder, nn.im2sequence,... · 93027d9f
  Committed by heyanru on Dec 05, 2022
```
[Fluid Clean] remove nn.topk, nn.ctc_greedy_decoder, nn.im2sequence, nn.multiplex, nn.smooth_l1 (#48289)
```
- [PHI decoupling] migrate poly_util.h to phi (#48499) · d6aa0d43
  Committed by Netpunk on Dec 05, 2022
```
* rm poly_util.h

* format code

* fix some problems

* format code
```

BaiXuePrincess / Paddle: in sync with the fork source project