Commits · 32633c8e8568006cf51d83356a878d9f12d73e0a · PaddlePaddle / Paddle

14 Dec, 2022 4 commits

[CodeStyle] fix c++17-extensions warning on macos (#49017) · fd3169da

Committed by PuQing on Dec 14, 2022

* fix c++17-extensions warning on macos

* fix type

fix c++17-extensions warning on macos

fix c++17-extensions warning on macos

fd3169da

Divide elementwise case from BroadcastKernel and refine transpose autotune (#33051) · 6c9df13d

Committed by limingshu on Dec 14, 2022

* First Commit.

* add some codes

* add elementwise loader

* fix code styles

* merge with develop

* add some changes both in elementwise and transpose

* add init operation in broadcast kernel.

* change codes according to pr suggestions about transpose file

* fix error for op-benchmark ci

* fix according to ci

6c9df13d

nullptr bugfix for XPU pg mode (#49043) · f0dab193

Committed by james on Dec 14, 2022

* nullptr bugfix for XPU pg mode

Also, a few kernels are added to the xpu whitelist

* increase error msg length

f0dab193

[Sparse]Optimize performance of sparse conv on T4 (#49009) · 227a5112
Committed by zhangkaihuo on Dec 14, 2022

227a5112

12 Dec, 2022 4 commits

[PHI decoupling] replace dependency of inclusive_scan.h from phi (#48980) · c9f4cfad
Committed by Netpunk on Dec 12, 2022
```
* replace dependency of inclusive_scan.h from phi

* format code
```
c9f4cfad

Optimization of Eigh op with ssyevj_batched runtime api (#48560) · 16e364d3

Committed by 傅剑寒 on Dec 12, 2022

* fix codestyle

* add double, complex<float>, complex<double> dtype support for syevj_batched

* fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some cases

* optimize eigh in different cases

* fix missing ; bug

* fix use_syevj bug

* fix use_cusolver_syevj_batched flag

16e364d3

[PHI] OneDNN version of Copy (#48539) · d666c7df

Committed by Paulina Gacek on Dec 12, 2022

* OneDNN version of Copy, transpose kernels adjusted

* style fixes in transpose_grad

* redundant headers deleted

d666c7df

[PHI decoupling] move norm_utils.cu.h from fluid to phi and remove norm_utils.h in fluid (#48930) · 3cb8db8f

Committed by huangjiyi on Dec 12, 2022

* move norm_utils.cu.h from fluid to phi

* remove norm_utils.h in fluid

* fix bugs and replace mutable_data with Alloc

* replace mutable_data with Alloc

3cb8db8f

11 Dec, 2022 1 commit
- H2D data transfer optimization with usage of structure type for stack kernel (#48899) · a78f0a16
  Committed by limingshu on Dec 11, 2022
```
* first commit.

* refine performance with fast_divmod

* refine performance with fast_divmod
```
  a78f0a16
09 Dec, 2022 4 commits
- [PHI] Migrate reshape kernel (#48749) · 7b2b0c1b
  Committed by Sławomir Siwek on Dec 09, 2022
```
* reshape

* typo

* remove header
```
  7b2b0c1b
- Modified the Kernel policy. When the compute is NHWC (#48563) · 992250bf
  Committed by niuliling123 on Dec 09, 2022
  
  992250bf
- move share_buffer kernel to phi (#48858) · c2e77ba3
  Committed by Leo Chen on Dec 09, 2022
```
* move share_buffer kernel to phi

* fix ut

* add source file

* fix window links
```
  c2e77ba3
- [PHI decoupling] move "flags.h" from fluid to phi (#48696) · 39ffef0d
  Committed by PuQing on Dec 09, 2022
  
  39ffef0d
08 Dec, 2022 8 commits
- first commit (#38143) · 2e7c172c
  Committed by limingshu on Dec 08, 2022
  
  2e7c172c
- [XPU] add set_value and set_value_grad (#48845) · 94fe929a
  Committed by haosicheng on Dec 08, 2022
  
  94fe929a
- proper fix (#48360) · f95e9245
  Committed by jakpiase on Dec 08, 2022
```
Reenabled ext_reorder recording for TransDataLayoutFromOneDNN
```
  f95e9245
- [PHI decoupling] move cuda_graph from fluid to phi (#48686) · a4d9851b
  Committed by huangjiyi on Dec 08, 2022
```
* move cuda_graph from fluid to phi

* move device_memory_aligment from fluid to phi

* Revert "move device_memory_aligment from fluid to phi"

This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.

* update xpu cmake
```
  a4d9851b
- Optimize Paddle diagonal (#47904) · b91bbd32
  Committed by 201716010711 on Dec 08, 2022
  
  b91bbd32
- [PHI decoupling] remove bbox_util.h from phi dependencies (#48761) · de2c5fd6
  Committed by Netpunk on Dec 08, 2022
```
* remove bbox_util.h from phi

* add file bbox_util.h

* reframe bbox_util.h
```
  de2c5fd6
- remove gpu_info.h from phi dependencies (#48811) · 73688894
  Committed by Netpunk on Dec 08, 2022
  
  73688894
- Try add eval() to speedup the eigen performance. (#48855) · e89a50c1
  Committed by Yiqun Liu on Dec 08, 2022
  
  e89a50c1
07 Dec, 2022 5 commits
- [PHI] Migrate squeeze and squeeze_grad kernels (#48634) · ad41fce8
  Committed by Sławomir Siwek on Dec 07, 2022
```
* squeeze kernel

* squeeze fwd

* whitespace
```
  ad41fce8
- [Zero-Dim] Support 0D for paddle.diagflat (#48735) · 1a3d2592
  Committed by 傅剑寒 on Dec 07, 2022
```
* [Zero-Dim] Support 0D for paddle.diagflat
```
  1a3d2592
- Fix accuracy fp16 kernel return fp32 tensor error (#48803) · 693de9f0
  Committed by WangZhen on Dec 07, 2022
  
  693de9f0
- optimize nchw<->nhwc kernel in fp16 model (#48692) · 17879045
  Committed by zhoutianzi666 on Dec 07, 2022
  
  17879045
- modify d2d copy to xpu::copy in xpu kernel, test=kunlun (#48710) · 0d8ddf9f
  Committed by zhangyikun02 on Dec 07, 2022
  
  0d8ddf9f
06 Dec, 2022 6 commits

make bilinear interpolate stable. (#48644) · e1e8bf72
Committed by xiongkun on Dec 06, 2022
```
* make bilinear interpolate stable.

* fix code
```
e1e8bf72

Clear extra input (Bias, ResidualData) in OpMaker of conv2d (#47579) · 0a2dfa38

Committed by zyfncg on Dec 06, 2022

* delete Bias and ResidualData in OpMaker of conv2d

* delete extra input of conv3d

* refactor pass of conv_bias_fusion

* fix mkldnn dependency

* fix mkldnn compile

* fix test_conv_bias_mkldnn_fuse_pass

* polish some code

* remove useless log

* fix analyzer_vit_ocr_tester

* fix conv_activation_mkldnn_fuse_pass

* fix test_analyzer_ocr

* add fused_conv_sig

* fix performance regression

* fix performance regression

0a2dfa38

[PHI] Migrate elementwise_(add/mul) kernels (#48625) · 7575d37c
Committed by Sławomir Siwek on Dec 06, 2022
```
* remove fluid code

* init

* typo

* fix merge conflicts
```
7575d37c

[XPU] add tile_grad op (#48720) · 8de336f9
Committed by houj04 on Dec 06, 2022

8de336f9

Remove fluid matmul (#47988) · 8fb829ba

Committed by kangguangli on Dec 06, 2022

* remove layers.matmul in nets.py

* remove layers.matmul in rnn_impl/test_quantization_pass/auto_parallel_gpt_model/test_auto_parallel_completion_gpt

* remove layers.matmul in other files

* fix

* fix

* remove layers.matmul itself

* remove ref in CMakeLists.txt and tools directory

* remove matmul in fluid.layers.nn.py

* remove matmul in fluid.dygraph.rnn.py && restore test_matmul_op.py

* replace matmul in fluid.dygraph.rnn.py && clean api_test in test_matmul_op.py

* fix error && restore empty test_auto_search_dist_matmul_op.py

* fix check in test_auto_parallel_partitioner.py

* fix test_dist_matmul && test_flags_mkldnn_ops_on_off

* fix test_fused_attention_op_xpu.py && test_matmul_op_xpu.py

* remove test_auto_search_dist_matmul_op.py

* remove layers.matmul in auto_parallel_gpt_model.py && fix doc in fluid/io.py

* fix for matmul_grad

* fix codestyle

* fix codestyle

* resolve conflicts error

* restore unit test file but do not compile it, for later removal

* fix codestyle

* fix wrong unittest skip

* fix unittest delete

* fix scale cost

* fix scale cost

* resolve conflicts error

* resolve conflicts error

Co-authored-by: jakpiase <jakpia21@gmail.com>

8fb829ba

add xpu centered rmsprop (#48658) · 54b756e2

Committed by ykkk2333 on Dec 06, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add xpu rmsprop centered, test=kunlun

54b756e2

05 Dec, 2022 7 commits
- Transpose optimization for AlphaFold2 (#45230) · a0f43889
  Committed by limingshu on Dec 05, 2022
```
* first commit

* fix bugs according to ci

* add some changes

* change file name into function.cu.h

* remove const_cast
```
  a0f43889
- [0D Tensor]support 0d tensor for dist.scatter and dist.broadcast (#48638) · 22ec915c
  Committed by Roc on Dec 05, 2022
  
  22ec915c
- move device_memory_aligment from fluid to phi (#48694) · 796499fd
  Committed by huangjiyi on Dec 05, 2022
  
  796499fd
- Replace mutable_data with DeviceContext.Alloc in phi kernels (#48500) · 34a957e3
  Committed by Ruibiao Chen on Dec 05, 2022
```
* Replace mutable_data with DeviceContext.Alloc in phi kernels

* Fix CI errors

* Fix CI errors

* Fix CI errors, test=kunlun

* Fix CI errors, test=kunlun

* Handle rnn_functor

* Update approvals
```
  34a957e3
- Register exp/expm1/logit bf16 activation op kernels (#48702) · d1e2ba8a
  Committed by sneaxiy on Dec 05, 2022
```
* register more bf16 ops

* update to register corresponding backward ops
```
  d1e2ba8a
- [Fluid Clean] remove nn.topk, nn.ctc_greedy_decoder, nn.im2sequence,... · 93027d9f
  Committed by heyanru on Dec 05, 2022
```
[Fluid Clean] remove nn.topk, nn.ctc_greedy_decoder, nn.im2sequence, nn.multiplex, nn.smooth_l1 (#48289)
```
  93027d9f
- [PHI decoupling] migrate poly_util.h to phi (#48499) · d6aa0d43
  Committed by Netpunk on Dec 05, 2022
```
* rm poly_util.h

* format code

* fix some problems

* format code
```
  d6aa0d43
03 Dec, 2022 1 commit
- Scatter 0D index for gather, 0D index and 0D updates for scatter. (#48452) · f9815bfe
  Committed by Yuang Liu on Dec 03, 2022
  
  f9815bfe

PaddlePaddle / Paddle synced successfully about 1 year ago