Commits · 6261076c4e3ba2e84a4c0d4f5123a5334d48b5ea · PaddlePaddle / Paddle

Mar 24, 2023 · 8 commits

[PHI Decoupling]Remove memory header (Part3) (#51288) · 3d78e759

Committed by YuanRisheng on Mar 24, 2023

* decouple memory copy

* fix ci bugs

* fix ci compile bugs

* fix rocm compile

* fix ci bugs

* decouple memory

* deal with conflict

* fix xpu compile bugs

* fix xpu bugs

* deal with xpu bugs

* fix cmake bugs

* fix windows bugs

* fix ci bugs

* fix ci bugs

* delete redundance code

* add code for pybind

* fix py3 bugs

* fix ci bugs

[CUSTOM DEVICE]analysis predictor custom device support (#52015) · 3ab19ab4
Committed by YuhangLi on Mar 24, 2023
```
* [CUSTOM DEVICE]analysis predictor custom device support

* del debug log
```
remove py::array::forcecast flag (#52039) · 5d503ec9
Committed by Yuanle Liu on Mar 24, 2023

[PHI]fix momentum dtype infer (#51353) · 648ec795
Committed by PuQing on Mar 24, 2023
```
* fix momentum dtype infer

* fix momentum datatype

* fix on cpu

* add momentum
```
【PaddlePaddle Hackathon 4 No.40】Optimize the GPU performance of the kthvalue op for Paddle (#51835) · e18f5339
Committed by thunder95 on Mar 24, 2023
```
* untracked files

* kthvalue perf

* remove unused files

* fix isnan

* fix isnan2

* fix bug

* try to fix rocm error
```

Memory Efficient Attention (#51867) · e5ad3859

Committed by ZhangDY-6483 on Mar 24, 2023

* first version, notest

* return final rst, notest

* use infinity() instead of max

* ut structure

* start up of ut

* generate lse

* update

* add depense

* reconstruct cmake

* move file

* add memory efficient attention and fix blasimpl

* update

* update cmake

* add namespace

* update cmake

* use .cu

* update for pad3d

* bug fix

* bug fix

* update

* bug fix

* update enforce

* add test case

* merge the lse pad

* fix kernel_fn of backward

* fix PADDLE_ENFORCE_EQ and phi_api

* fix PADDLE_ENFORCE

* fix PADDLE_ENFORCE

* rerun coverage

* fix memory efficient attention test

* rerun ci

* add cuda version condition

* add cuda version condition

* delete WIP test

* replace PADDLE_ENFORCE

* edit the namespace of datatype in multiple.cc

* rerun

* rerun

---------
Co-authored-by: Nliuyuang <liuyuang@baidu.com>

remove copy of index for gather_nd_grad and scatter_nd_add op in xpu (#51871) · b110085f
Committed by zhangyikun02 on Mar 24, 2023

Fix roll kernel gpu bug. (#52012) · b6d0dac9
Committed by Yuang Liu on Mar 24, 2023

Mar 23, 2023 · 25 commits
- [Fix Bug] Fix customOP + customDevice scenario selects wrong place (#51996) · 2bf0d1c8
  Committed by HongyuJia on Mar 23, 2023
- [CustomOP Optional] CustomOP supports optional vector<Tensor> input (#51973) · 6a10e604
  Committed by HongyuJia on Mar 23, 2023
- [Polish Log] Polish Tensor operants' log: 'OperantsManager reusing XXX mode... · 5754aae5
  Committed by HongyuJia on Mar 23, 2023
```
[Polish Log] Polish Tensor operants' log: 'OperantsManager reusing XXX mode API {func_name}' (#51991)

* [Polish Log] Polish Tensor operants' log: 'OperantsManager reusing XXX mode API {func_name}'

* Make API name more precise
```
- pool2d and pool2d_grad support case of kernel_size > kh/kw for xpu (#51870) · 5f388221
  Committed by zhangyikun02 on Mar 23, 2023
- add paddle-trt convert op: greater_equal (#52000) · 4dfbdb04
  Committed by Wangzheee on Mar 23, 2023
- 【prim】delete high order prim flag && add special prune rules for node.cc (#51676) · 978d544b
  Committed by xiaoguoguo626807 on Mar 23, 2023
```
* delete prim flag for matmul_2_grad

* delete prim flag for matmul_2_grad

* add new setgradoutmeta for matmul_double_grad_node

* modify test and delete log

* deal with review
```
- [Prim] add meshgrid composite rule (#51061) · 53bb883d
  Committed by chenjian on Mar 23, 2023
```
* add meshgrid composite rule

* add meshgrid composite rule

* update

* add into CMakeLists

* fix

* update

* update

* optimize code

* fix meshgrid op

* update test
```
- add output defs for clip_by_norm kernel (#51993) · 33897a95
  Committed by iSerendipity on Mar 23, 2023
- [XPU] support lod_reset (#51967) · c491b361
  Committed by ZhouMengLei1999 on Mar 23, 2023
- Remove fluid deps in fused_linear_param_grad_add_kernel.cu (#51975) · 5da1a27b
  Committed by sneaxiy on Mar 23, 2023
```
* remove fluid deps in fused_linear_param_grad_add_kernel

* fix compile error

* fix ut error

* follow comments
```
- Optimization for DropoutNd on Host side (#51934) · 101c9bb0
  Committed by limingshu on Mar 23, 2023
```
* first commit

* fix bugs

* remove_useless sync
```
- register fluid kerenls to phi (#51976) · cc9bbd5b
  Committed by Huang Jiyi on Mar 23, 2023
```
* unify add_position_encoding

* unify affine_channel

* unify alloc_float_status

* unify allreduce

* unify alltoall

* unify anchor_generator

* unify ascend_trigger

* fix bug

* fix test
```
- register fluid activation kernel to phi (#51927) · aaa14780
  Committed by Huang Jiyi on Mar 23, 2023
```
* update

* update

* update

* update

* update

* fix test
```
- [prim] add gelu vjp rule · 2add31f4
  Committed by cxxly on Mar 06, 2023
- To support py3.11, pybind need to upgrade to v2.10.0 (#51350) · 13b8b5e0
  Committed by zqw_1997 on Mar 23, 2023
```
* to support cuda12, pybind need to upgrade to v2.10.0

* add DEPS of pybind in test_custom_plugin_creater.cc

* only change the tag

* please let CI pass

* try pybind v2.10/3

* modify the include header in test

* code check
```
- [AMP] Add bfloat16 Support for `elementwise_pow` Op (#51888) · 288ad844
  Committed by Lin Manhui on Mar 23, 2023
```
* Add bf16 support for elementwise_pow

* Update ut
```
- support auto generate for nms (#51891) · 4bf1c163
  Committed by Infinity_lee on Mar 23, 2023
- gather and gather nd fp16, bf16 support and add ut (#51903) · 5bcdfbb0
  Committed by Yuang Liu on Mar 23, 2023
- [AMP] Add bfloat16 and float16 tests for compare ops (#51978) · a7397e0c
  Committed by yeliang2258 on Mar 23, 2023
```
* add bf16 and fp16 tests

* fix dtype check
```
- [Bug fixes] fix distributed graph engine (#51956) · 9c853d1d
  Committed by Huang Zhengjie on Mar 23, 2023
```
* fix distributed graph engine
```
- 【PaddlePaddle Hackathon 4】No.63 fix temporal_shift and conj (#51532) · 1550348e
  Committed by LoneRanger on Mar 23, 2023
```
* add fp16 and bfp16 for temporalshift

* add fp16 and bfp16 for complex

* fix bug

* fix bug

* add fp16 and bf16 for conj

* fix bug

* fix bug

* Update complex_kernel.h

fix bug

* Update temporal_shift_grad_kernel.h

fix bug

* Update temporal_shift_kernel.h

fix bug
```
- [PHI] Add nanmedian output defs (#51358) · a82911a5
  Committed by PuQing on Mar 23, 2023
```
* add nanmedian output defs

* remove the multiclass_nms3 momentum
```
- [CodeStyle][C408][C409][C410] Fix unnecessary <dict/list/tuple> call and... · cf391b81
  Committed by PuQing on Mar 23, 2023
```
[CodeStyle][C408][C409][C410] Fix unnecessary <dict/list/tuple> call and unnecessary <list/tuple> passed to <list/tupule>() (#51928)

* autofix

* add select config

* autofix C410

* add C410 select
```
- 【Hackathon No.45】Add float16 data type support for Paddle logical operators (#50926) · 0480ff5d
  Committed by denglianbin on Mar 23, 2023
```
* finish pr

* skip cpu test for logical

* change test style

* fix error.
```
- 【Eager】Fix error raise (#51963) · 3704471d
  Committed by Jiabin Yang on Mar 23, 2023
```
* allow return none when stop_gradient=True

* remove useless code

* refine code

* refine code

* fix test cast

* change more test

* add more tests

* fix error msg in pylayer
```
Mar 22, 2023 · 7 commits

Support optimizers operator to be generated (#51767) · 0b008e0c

Committed by HappyHeavyRain on Mar 22, 2023

* test_get_kernel

* add invoke signature

* change reduce_max

* change frobenius_norm

* reset reduce_max according to composite and change reduce_all

* fix the bug when Scalar(*)

* fix 'scalar when support_tensor'

* change code according to review

* change 'keep_signature' to 'manual_signature' and add some erro info

* support optimizers autogen

* change sgd yaml

* change generate signature

* fix test/cpp/new_executor/CM

* reset signature generated function

* change signature funciton

* change signature funciton

[Zero-Dim] Support 0-D tensor for some oneDNN unary kernels (#51687) · 2a3d75bc

Committed by YangQun on Mar 22, 2023

* support 0-d tensor for element wise unary ops

* fix python code style check

* fix approval check

* support 0-d tensor for onednn softmax and logsoftmax kernels

* fix commnets

* fix some unittests

Correct lstm qat test (#51499) · 31f81685
Committed by joanna.wozna.intel on Mar 22, 2023

add fused dropout add (#51752) · 6ba0507d
Committed by ShenLiang on Mar 22, 2023

[XPU] fix distribute_fpn_proposals (#51873) · a10718e8
Committed by duanyanhui on Mar 22, 2023
```
* fix distribute_fpn_proposals

* fix bug
```
Add fused_feed_forward pass (#50423) · 5dda0ef6

Committed by Ghost Screaming on Mar 22, 2023

* Add fused_feed_forward pass for semi-automatic static graph training.

* Add fused_feedforward property in parallel_executor.cc

* Polish code.

* Polish fused feed_forward pass code. Support use_dropout1 and
use_dropout2 option.

* Support model parallel in fused_feedforward pass.

Extract fused_transpose op dedicated for oneDNN fuse passes (#50021) · 02296977

Committed by Sławomir Siwek on Mar 22, 2023

* extract common methods to reuse

* add header for transpose ops

* fused_transpose

* Split big function

* transpose2 tests

* fused_transpose

* Apply extra attributes

* add pbtxt file

* update pbtxt

* Merge develop

* add more strict op compats

* code  style

* remove mkldnn_data_type

* unify SetOutMemDescWithReshape2FuseSupport

* adjust quantize-dequantize for transpose

* remove appendact

* transpose2 quantization

* fix int8 tests

* adjust transpose_op to current develop

* delete fusion code from transpose_kernel

* add fused transpose to NHWC unittest

* change order


PaddlePaddle / Paddle · synced successfully more than 1 year ago