Commits · 22bfa57982add30c43cbe32122b1301fffc0d5e8 · PaddlePaddle / Paddle

08 Dec, 2022 · 15 commits
- W
  [Paddle Inference] General optimization for no_varlen embedding layernorm (#48580) · 22bfa579
  Committed by Wangzheee on Dec 08, 2022
```
* general optimization no_varlen embedding layernorm
```
  22bfa579
- H
  [XPU] add load op into oplist. (#48860) · 2bba3e18
  Committed by houj04 on Dec 08, 2022
```
* [XPU] add load op into oplist.

* remove test_sampling_id_op_xpu.py
```
  2bba3e18
- H
  [PHI decoupling] move cuda_graph from fluid to phi (#48686) · a4d9851b
  Committed by huangjiyi on Dec 08, 2022
```
* move cuda_graph from fluid to phi

* move device_memory_aligment from fluid to phi

* Revert "move device_memory_aligment from fluid to phi"

This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.

* update xpu cmake
```
  a4d9851b
- T
  fix-gpups setup.py (#48888) · 91ff2071
  Committed by tianshuo78520a on Dec 08, 2022
```
* fix-gpups

* test=document_fix
```
  91ff2071
- W
  
  [Inference] Enable infer shape cache. (#48312) · f88713e1
  Committed by Wilber on Dec 08, 2022
  
  f88713e1
- R
  
  Set WaiterType of kGpuSync to kCPU (#48758) · a5999d83
  Committed by Ruibiao Chen on Dec 08, 2022
  
  a5999d83
- Q
  rm kunlun xpu2_op_list (#48826) · 83c41459
  Committed by QingshuChen on Dec 08, 2022
```
*test=kunlun
```
  83c41459
- 2
  
  Optimize Paddle diagonal (#47904) · b91bbd32
  Committed by 201716010711 on Dec 08, 2022
  
  b91bbd32
- N
  [PHI decoupling] remove bbox_util.h from phi dependencies (#48761) · de2c5fd6
  Committed by Netpunk on Dec 08, 2022
```
* remove bbox_util.h from phi

* add file bbox_util.h

* reframe bbox_util.h
```
  de2c5fd6
- 六
  [Paddle Inference] Add add onehot trt converter (#48655) · 1adf5430
  Committed by 六个骨头 on Dec 08, 2022
```
* add onehot trt converter

* add unitest

* fix bug

* opt code

* fix bug

* fix depth_tensor

* fix unitest

* fix bug

* fix unitest

* fix bug

* fix bug

* fix bug

* fix bug
```
  1adf5430
- N
  
  remove gpu_info.h from phi dependencies (#48811) · 73688894
  Committed by Netpunk on Dec 08, 2022
  
  73688894
- W
  
  [Inference] inference add cinn interface (#48741) · 3a387df6
  Committed by Wilber on Dec 08, 2022
  
  3a387df6
- W
  
  set free_when_no_cache_hit default value to true (#48815) · 592ed40b
  Committed by wanghuancoder on Dec 08, 2022
  
  592ed40b
- R
  Setuptools optimization (#48770) · da8e15e6
  Committed by risemeup1 on Dec 08, 2022
```
* optimize setup.py

* modify setup.py

* modify setup.py

* modify setup.py

* modify setup.py after zhangbo reviewed
```
  da8e15e6
- Y
  
  Try add eval() to speedup the eigen performance. (#48855) · e89a50c1
  Committed by Yiqun Liu on Dec 08, 2022
  
  e89a50c1
07 Dec, 2022 · 10 commits
- S
  [PHI] Migrate squeeze and squeeze_grad kernels (#48634) · ad41fce8
  Committed by Sławomir Siwek on Dec 07, 2022
```
* squeeze kernel

* squeze fwd

* whitespace
```
  ad41fce8
- fix ci (#48730) · 3a8aac35
  Committed by zhouweiwei2014 on Dec 07, 2022
  
  3a8aac35
- 傅
  [Zero-Dim] Support 0D for paddle.diagflat (#48735) · 1a3d2592
  Committed by 傅剑寒 on Dec 07, 2022
```
* [Zero-Dim] Support 0D for paddle.diagflat
```
  1a3d2592
- 张
  
  [phi::DenseTensor] Replace Tensor with phi::DenseTensor (#48682) · 65420271
  Committed by 张春乔 on Dec 07, 2022
  
  65420271
- W
  
  Fix accuracy fp16 kernel return fp32 tensor error (#48803) · 693de9f0
  Committed by WangZhen on Dec 07, 2022
  
  693de9f0
- Q
  update kl1 op list and optimize matmul unitest for kunlun (#48775) · 93b7ccf5
  Committed by QingshuChen on Dec 07, 2022
```
*test=kunlun
```
  93b7ccf5
- F
  
  fix: oss just support sm>=75 (#48731) · 87fbc5e4
  Committed by feng_shuai on Dec 07, 2022
  
  87fbc5e4
- Z
  
  optimize nchw<->nhwc kernel in fp16 model (#48692) · 17879045
  Committed by zhoutianzi666 on Dec 07, 2022
  
  17879045
- Q
  
  [NPU] add FLAGS_npu_storage_format env to enable npu storage format, test=develop (#48774) · e5bc2eec
  Committed by Qi Li on Dec 07, 2022
  
  e5bc2eec
- Z
  
  modify d2d copy to xpu::copy in xpu kernel, test=kunlun (#48710) · 0d8ddf9f
  Committed by zhangyikun02 on Dec 07, 2022
  
  0d8ddf9f
06 Dec, 2022 · 9 commits

- X
  make bilinear interpolate stable. (#48644) · e1e8bf72
  Committed by xiongkun on Dec 06, 2022
```
* make bilinear interpolate stable.

* fix code
```
  e1e8bf72
- Clear extra input (Bias, ResidualData) in OpMaker of conv2d (#47579) · 0a2dfa38
  Committed by zyfncg on Dec 06, 2022

  * delete Bias and ResidualData in OpMaker of conv2d

  * delete extra input of conv3d

  * refactor pass of conv_bias_fusion

  * fix mkldnn dependency

  * fix mkldnn compile

  * fix test_conv_bias_mkldnn_fuse_pass

  * police some code

  * remove useless log

  * fix analyzer_vit_ocr_tester

  * fix conv_activation_mkldnn_fuse_pass

  * fix test_analyzer_ocr

  * add fused_conv_sig

  * fix performence regression

  * fix performance regression

  0a2dfa38
- Q
  add xpu_support op function (#48606) · 06b32b38
  Committed by QingshuChen on Dec 06, 2022
```
*test=kunlun
```
  06b32b38
- S
  [PHI] Migrate elementwise_(add/mul) kernels (#48625) · 7575d37c
  Committed by Sławomir Siwek on Dec 06, 2022
```
* remove fluid code

* init

* typo

* fix merge conflicts
```
  7575d37c
- H
  [XPU] add tile_grad op (#48720) · 8de336f9
  Committed by houj04 on Dec 06, 2022
  8de336f9
- Remove fluid matmul (#47988) · 8fb829ba
  Committed by kangguangli on Dec 06, 2022

  * remove layers.matmul in nets.py

  * remove layers.matmul in rnn_impl/test_quantization_pass/auto_parallel_gpt_model/test_auto_parallel_completion_gpt

  * remove layers.matmul in other files

  * fix

  * fix

  * remove layers.matmul itself

  * remove ref in CMakeLists.txt and tools directory

  * remove matmul in fluid.layers.nn.py

  * remove matmul in fluid.dygraph.rnn.py && resotre test_matmul_op.py

  * replace matmul in fluid.dygraph.rnn.py && clean api_test in test_matmul_op.py

  * fix error && restore empty test_auto_search_dist_matmul_op.py

  * fix check in test_auto_parallel_partitioner.py

  * fix test_dist_matmul && test_flags_mkldnn_ops_on_off

  * fix test_fused_attention_op_xpu.py && test_matmul_op_xpu.py

  * remove test_auto_search_dist_matmul_op.py

  * remove layers.matmul in auto_parallel_gpt_model.py && fix doc in fluid/io.py

  * fix for matmul_grad

  * fix codestyle

  * fix codestyle

  * resolve conflicts error

  * restore unit test file but not compiled it for later remove

  * fix codestyle

  * fix wrong unittest skip

  * fix unittest delete

  * fix scale cost

  * fix scale cost

  * resolve conflicts error

  * resolve conflicts error

  Co-authored-by: Njakpiase <jakpia21@gmail.com>

  8fb829ba
- Z
  [inference][trt] add reduce max for trt (#48684) · dd304f31
  Committed by Zhang Jun on Dec 06, 2022
```
* add reduce max for trt
```
  dd304f31
- Y
  [Paddle Inference] Add float_to_half_pass to support inference with mixed precision (#47993) · c5a45cc6
  Committed by Yuanle Liu on Dec 06, 2022
  c5a45cc6
- add xpu centered rmsprop (#48658) · 54b756e2
  Committed by ykkk2333 on Dec 06, 2022

  * add stat tool

  * add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

  * add xpu rmsprop centered, test=kunlun

  54b756e2
05 Dec, 2022 · 6 commits
- L
  Transpose optimization for AlphaFold2 (#45230) · a0f43889
  Committed by limingshu on Dec 05, 2022
```
* first commit

* fix bugs according to ci

* add some changes

* change file name into function.cu.h

* remove const_cast
```
  a0f43889
- Z
  
  support nhwc in conv2d_fusion (#48642) · 30f4ef7f
  Committed by zhoutianzi666 on Dec 05, 2022
  
  30f4ef7f
- R
  
  [0D Tensor]support 0d tensor for dist.scatter and dist.broadcast (#48638) · 22ec915c
  Committed by Roc on Dec 05, 2022
  
  22ec915c
- Y
  
  fix onednn bugs (#48714) · 35ebf2b4
  Committed by YuanRisheng on Dec 05, 2022
  
  35ebf2b4
- W
  Reverse roll fuse (#46914) · feb68dd1
  Committed by Wang Bojun on Dec 05, 2022
```
* pass

* pass

* draft version

* share mem opt

* remove sharemem

* add pattern for the case with circle_shift=0

* add UT

* pass opt

* test_fix

* code-commit

* code-style

* code style

* code-style

* ut-fix

* op teller refine

* resolve conflict

* adjust position op_teller list and pass order for swin

* ut code style update

* adjust paddle pass order

* refine pass order

* refine pass order

* refine pass order
```
  feb68dd1
- W
  
  fix error when share buffer but modify the dtype (#48666) · 65ffc3f5
  Committed by Wilber on Dec 05, 2022
  
  65ffc3f5

PaddlePaddle / Paddle synced successfully about 1 year ago
