Commits · 61469eec0bee98e1bd65ba54e99fe39998ded605 · PaddlePaddle / Paddle

17 Feb, 2023 1 commit
- Z
  [XPU] add multi_encoder_xpu_slice_fuse_pass, generate_sequence_xpu_fuse_pass,... · 61469eec
  Committed by zhupengyang on Feb 17, 2023
```
[XPU] add multi_encoder_xpu_slice_fuse_pass, generate_sequence_xpu_fuse_pass, generate_sequence_xpu kernel (#50570)
```
  61469eec
16 Feb, 2023 4 commits
- S
  [XPU][Fleet] Support multi-card infer for xpu (#50490) · 517d8074
  Committed by shentanyue on Feb 16, 2023
```
* support xpu multi-card infer

* add ut

* clean code

* clean code

* fix

* fix

* fix

* fix
```
  517d8074
- H
  [XPU] update xccl to 1.0.8 and xdnn to 20230215 (#50247) · b8008580
  Committed by houj04 on Feb 16, 2023
```
* [XPU] update xccl to 1.0.8

* update xdnn. add uint8 for concat and split.

* update xdnn to 20230215.
```
  b8008580
- R
  [XPU] add group_norm, sin, cos, linspace, randint kernels (#50465) · c86a5140
  Committed by ronnywang on Feb 16, 2023
```
* [XPU] add group_norm kernel

* update

* add xpu sin, cos, randint, linspace kernels

* update

* update
```
  c86a5140
- Z
  
  [XPU] fix dropout pass; add multi_encoder_xpu_fuse_pass & multi_encoder_xpu kernel (#50499) · c8aa6405
  Committed by zhupengyang on Feb 16, 2023
  
  c8aa6405
15 Feb, 2023 4 commits
- Y
  [PHI Decoupling]Remove Profiler header (Part2) (#50183) · 8fabca11
  Committed by YuanRisheng on Feb 15, 2023
```
* move profiler

* add file

* fix mac compile bugs

* fix ci bugs

* fix mac bugs

* fix ci bugs

* fix compile bugs

* perfect code according comment
```
  8fabca11
- Z
  
  add gather_nd_grad op and where_grad support zero_dim for xpu (#50454) · 055d0c2d
  Committed by zhangyikun02 on Feb 15, 2023
  
  055d0c2d
- Q
  
  remove duplicated op in xpu2_op_list (#50450) · 47c23ccb
  Committed by QingshuChen on Feb 15, 2023
  
  47c23ccb
- Y
  [CUSTOM]custom device add black_list (#50409) · 66d3c56e
  Committed by YuhangLi on Feb 15, 2023
```
* [CUSTOM]custom device add black_list

* change log level

* fix some issues
```
  66d3c56e
14 Feb, 2023 1 commit

decouple tensor_utils (#50264) · 057cdb95

Committed by engineer1109 on Feb 14, 2023

fix X

remove TensorCopy

codestyle

add fluid memory header

fix symbol

fix cmake

fix cmake

fix context

fix header

fix place

fix context

fix context

fix context

fix code

fix custom context

fix custom context

fix copy

fix data_transform

fix style

remove changes of custom

fix scalar

057cdb95

13 Feb, 2023 1 commit

add xpu pool3d kernels (#50233) · 1281b612

Committed by ykkk2333 on Feb 13, 2023

* add xpu adagrad and where_grad kernels, test=kunlun

* add xpu pool3d kernels, test=kunlun

1281b612

10 Feb, 2023 4 commits
- L
  Fix bugs and add unit tests in instance_norm_grad_kernel when d_scale and (#50394) · 4c373e6b
  Committed by Leo Guo on Feb 10, 2023
```
d_bias are nullptr. Modify the code style of full_kernel.cc. Add new data
type for concat, elementwise_add, gather, scale, scatter ops. test=kunlun
```
  4c373e6b
- Z
  
  [XPU] add fc_xpu op&pass to optimize ernie model (#50277) · 945f918c
  Committed by zhupengyang on Feb 10, 2023
  
  945f918c
- H
  [phi decoupling] rm gradient_accumulator in phi (#50385) · 13f57ec0
  Committed by Huang Jiyi on Feb 10, 2023
```
* rm gradient_accumulator in phi

* update
```
  13f57ec0
- W
  
  [XPU] bind op: atan & deformable_conv_v1 (#50373) · e15ef948
  Committed by wangshengxiang on Feb 10, 2023
  
  e15ef948
09 Feb, 2023 2 commits
- L
  
  Modify full kernel for xpu. test=kunlun (#50209) · 18e0e01d
  Committed by Leo Guo on Feb 09, 2023
  
  18e0e01d
- Z
  
  add logical_and, logical_or and logical_xor for xpu (#50228) · 0036316e
  Committed by zhangyikun02 on Feb 09, 2023
  
  0036316e
06 Feb, 2023 1 commit
- R
  
  fix gcc12 error: mismatched-new-delete error in custom_device.cc (#47466) · 6d70761e
  Committed by risemeup1 on Feb 06, 2023
  
  6d70761e
03 Feb, 2023 1 commit

Replace matmul(v2) with fused_matmul during oneDNN fuse passes (#49515) · 5cfe1645

Committed by Sławomir Siwek on Feb 03, 2023

* replace matmul with matmul_v2 in fuse passes

* Remove fusion logic from matmul

* removing fusion methods

* add proper name

* adjust namespaces

* clean attrs in python tests

* delete checkpoint and restore matmul version

* remove unused code

* matmul and reshape/transpose fuses migrated

* split MatmulOneDNN headers

* fuse activation and eltwise_add

* add fuse_activation

* matmul_transpose_reshape/reshape_transpose_matmul

* matmul + elementwise_add (fused)

* activation temporary modification

* merge newest develop

* remove dependency from other PR

* revert pbtxt

* remove placeholders from matmul_v2

* add description in OPMaker

* remove matmul_v2_op.h and all dependencies

* remove dims changing in base op

* add possibility to fuse already fused_matmul

* restart broken CI

* Empty-Commit

* revert matmul_utils.h

* codestyle

* adjust imports

* add pbtxt file

* 100% matmul unit tests coverage

* trigger CI with minimal changes to develop

* adjust changes to develop

* add fused_matmul op

* inherit base ops

* add "v2"

* move OPMaker

* Gradually add fused_matmul files

* second batch of fused_matmul changes

* split infershapes of matmul_v2 and fused_matmul

* inherit fused_matmul from matmul_v2

* Update paddle/phi/backends/onednn/onednn_reuse.h
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* Update paddle/phi/kernels/fusion/onednn/fused_matmul_kernel.cc
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

5cfe1645

02 Feb, 2023 1 commit
- R
  
  [CustomDevice] refine custom device api (#50152) · dd480273
  Committed by ronnywang on Feb 02, 2023
  
  dd480273
01 Feb, 2023 1 commit
- Z
  
  support grid_sampler_grad op for XPU (#49857) · 520f48d6
  Committed by zhangyikun02 on Feb 01, 2023
  
  520f48d6
31 Jan, 2023 2 commits
- W
  
  bind pixel_shuffle & pixel_shuffle_grad op for xpu (#50090) · a5f2e1f7
  Committed by wangshengxiang on Jan 31, 2023
  
  a5f2e1f7
- R
  Add unified device management api (#48651) · 7aaaa1c6
  Committed by ronnywang on Jan 31, 2023
```
* [CustomDevice] add custom device api

* update

* update

* test=document_fix

* update

* update

* add  examples
```
  7aaaa1c6
30 Jan, 2023 1 commit

Support stream priority for standalone executor (#49939) · 172d1de6

Committed by Ruibiao Chen on Jan 30, 2023

* Support stream priority for standalone executor

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

172d1de6

19 Jan, 2023 1 commit

[KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9

Committed by jameszhang on Jan 19, 2023

* [KUNLUN] add op: maxpool_with_index

* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()

* fix file format

* solve clip unittest failure

* minor fix

* Revert "solve clip unittest failure" since the issue is fixed
in #49535

This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.

* align with xdnn on the definition of mask in max_pool_with_index

* minor

f71f77e9

18 Jan, 2023 4 commits

Handle repetitive code in oneDNN activation fuse passes (#49824) · a1b2e1e2

Committed by Sławomir Siwek on Jan 18, 2023

* extract fuse pass logic to header file

* adjust namespaces

* Update paddle/fluid/framework/ir/mkldnn/activation_onednn_fuse_pass.h

update date
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* add inline remove static
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

a1b2e1e2

[PHI] remove bitwise and, or, xor (#49916) · 9056cc8b

Committed by RuohengMa on Jan 18, 2023

* add reduce_sum_int64 and reduce_sum_int8 xpu kernels

* [PHI] add clip grad kernel with support type float32 and int32

* [PHI unittest] add clip_grad unit test

* adapt code to clang-format

* update xpu api output with clip_grad api

* remove int8 support of reduce_sum xpu kernel since it can not pass unit tests

* adapt license date, add code for XPUDataType conversion

* add int8 support of reduce_sum

* add reduce_sum unit tests for dtype int64, int8, and add more test cases

* update license date

* remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel

* change license date

9056cc8b

[XPU] add logical_not op. (#49911) · 60d1199a
Committed by houj04 on Jan 18, 2023

60d1199a

use default XPU stream for computing (#49806) · f6b23d6d

Committed by jameszhang on Jan 18, 2023

* revert to use default XPU stream for computing

XPUContext now has a null stream by default. If you want to use a separate stream
 (e.g. in async collective communication), you should create a dedicated XPUContext
and invoke its XPUContext::CreateStream()

* minor

f6b23d6d

16 Jan, 2023 1 commit
- Q
  
  add prod for kunlun (#49816) · bd03652f
  Committed by QingshuChen on Jan 16, 2023
  
  bd03652f
13 Jan, 2023 5 commits
- D
  [Custom Device] Clear ProcessGroup Manually (#49182) · a923a757
  Committed by duanyanhui on Jan 13, 2023
```
* clear ProcessGroupCustom manually

* fix bug

* fix bug

* move destroy ProcessGroup to ProcessGroupIdMap

* enable destroy to all device

* remove unused comments

* change to internal api

* Update process_group.cc

* Update process_group.cc
```
  a923a757
- J
  kunlun add support for c_concat and c_split (#49757) · a09b9a3f
  Committed by jameszhang on Jan 13, 2023
```
* kunlun add support for c_concat and c_split

* replace mutable_data() and ShareDataWith()
```
  a09b9a3f
- Y
  
  add xpu adagrad and where_grad kernels (#49701) · a99c3cd4
  Committed by ykkk2333 on Jan 13, 2023
  
  a99c3cd4
- J
  fix xpu unittest issue (#49760) · ddc8a726
  Committed by jameszhang on Jan 13, 2023
```
* fix xpu unittest issue: zero_dim_tensor

* deal with leftout issue introduced by #49470
```
  ddc8a726
- W
  
  add prelu & prelu_grad op for xpu (#49672) · 8d512b8f
  Committed by wangshengxiang on Jan 13, 2023
  
  8d512b8f
12 Jan, 2023 3 commits
- Y
  
  deal with conflict (#49766) · 27aec62b
  Committed by YuanRisheng on Jan 12, 2023
  
  27aec62b
- L
  Fix the bugs of set_value and set_value_grad ops and add register in (#49750) · 438975fd
  Committed by Leo Guo on Jan 12, 2023
```
xpu2_op_list.cc. test=kunlun
```
  438975fd
- Y
  [PHI]Rename some PHI Kernel (#49470) · 30f5e39b
  Committed by YuanRisheng on Jan 12, 2023
```
* rename kernel

* delete sig

* modify code according comment

* fix ci bugs
```
  30f5e39b
10 Jan, 2023 2 commits
- L
  Optimization for StackGradCUDAKernel for last dimension stack case. (#48992) · 0cae5c7f
  Committed by limingshu on Jan 10, 2023
```
* add stack grad kernel optimization

* add basic optimization kernel for stack_grad_kernel

* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit
```
  0cae5c7f
- Add cuda compiled arch check (#49592) · c0d6ec63
  Committed by MarDino on Jan 10, 2023
  
  c0d6ec63

PaddlePaddle / Paddle: synced successfully about 1 year ago