Commits · dd4802735cf5634004187d3ff403eeb773e9eeb0 · BaiXuePrincess / Paddle

02 Feb, 2023 · 1 commit
- [CustomDevice] refine custom device api (#50152) · dd480273
  Committed by ronnywang on Feb 02, 2023

01 Feb, 2023 · 1 commit
- support grid_sampler_grad op for XPU (#49857) · 520f48d6
  Committed by zhangyikun02 on Feb 01, 2023

31 Jan, 2023 · 2 commits
- bind pixel_shuffle & pixel_shuffle_grad op for xpu (#50090) · a5f2e1f7
  Committed by wangshengxiang on Jan 31, 2023
- Add unified device management api (#48651) · 7aaaa1c6
  Committed by ronnywang on Jan 31, 2023
```
* [CustomDevice] add custom device api

* update

* update

* test=document_fix

* update

* update

* add examples
```
30 Jan, 2023 · 1 commit

Support stream priority for standalone executor (#49939) · 172d1de6

Committed by Ruibiao Chen on Jan 30, 2023

* Support stream priority for standalone executor

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

19 Jan, 2023 · 1 commit

[KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9

Committed by jameszhang on Jan 19, 2023

* [KUNLUN] add op: maxpool_with_index

* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()

* fix file format

* solve clip unittest failure

* minor fix

* Revert "solve clip unittest failure" since the issue is fixed
in #49535

This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.

* align with xdnn on the definition of mask in max_pool_with_index

* minor

18 Jan, 2023 · 4 commits

Handle repetitive code in oneDNN activation fuse passes (#49824) · a1b2e1e2

Committed by Sławomir Siwek on Jan 18, 2023

* extract fuse pass logic to header file

* adjust namespaces

* Update paddle/fluid/framework/ir/mkldnn/activation_onednn_fuse_pass.h

update date
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* add inline remove static
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

[PHI] remove bitwise and, or, xor (#49916) · 9056cc8b

Committed by RuohengMa on Jan 18, 2023

* add reduce_sum_int64 and reduce_sum_int8 xpu kernels

* [PHI] add clip grad kernel with support type float32 and int32

* [PHI unittest] add clip_grad unit test

* adapt code to clang-format

* update xpu api output with clip_grad api

* remove int8 support of reduce_sum xpu kernel since it can not pass unit tests

* adapt license date, add code for XPUDataType conversion

* add int8 support of reduce_sum

* add reduce_sum unit tests for dtype int64, int8, and add more test cases

* update license date

* remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel

* change license date

[XPU] add logical_not op. (#49911) · 60d1199a
Committed by houj04 on Jan 18, 2023

use default XPU stream for computing (#49806) · f6b23d6d

Committed by jameszhang on Jan 18, 2023

* revert to use default XPU stream for computing

XPUContext now has a null stream by default. If you want to use a separate stream (e.g. in async collective communication), you should create a dedicated XPUContext and invoke its XPUContext::CreateStream().

* minor

16 Jan, 2023 · 1 commit
- add prod for kunlun (#49816) · bd03652f
  Committed by QingshuChen on Jan 16, 2023

13 Jan, 2023 · 5 commits
- [Custom Device] Clear ProcessGroup Manually (#49182) · a923a757
  Committed by duanyanhui on Jan 13, 2023
```
* clear ProcessGroupCustom manually

* fix bug

* fix bug

* move destroy ProcessGroup to ProcessGroupIdMap

* enable destroy to all device

* remove unused comments

* change to internal api

* Update process_group.cc

* Update process_group.cc
```
- kunlun add support for c_concat and c_split (#49757) · a09b9a3f
  Committed by jameszhang on Jan 13, 2023
```
* kunlun add support for c_concat and c_split

* replace mutable_data() and ShareDataWith()
```
- add xpu adagrad and where_grad kernels (#49701) · a99c3cd4
  Committed by ykkk2333 on Jan 13, 2023
- fix xpu unittest issue (#49760) · ddc8a726
  Committed by jameszhang on Jan 13, 2023
```
* fix xpu unittest issue: zero_dim_tensor

* deal with leftout issue introduced by #49470
```
- add prelu & prelu_grad op for xpu (#49672) · 8d512b8f
  Committed by wangshengxiang on Jan 13, 2023

12 Jan, 2023 · 3 commits
- deal with conflict (#49766) · 27aec62b
  Committed by YuanRisheng on Jan 12, 2023
- Fix the bugs of set_value and set_value_grad ops and add register in (#49750) · 438975fd
  Committed by Leo Guo on Jan 12, 2023
```
xpu2_op_list.cc. test=kunlun
```
- [PHI]Rename some PHI Kernel (#49470) · 30f5e39b
  Committed by YuanRisheng on Jan 12, 2023
```
* rename kernel

* delete sig

* modify code according to comment

* fix ci bugs
```
10 Jan, 2023 · 2 commits
- Optimization for StackGradCUDAKernel for last dimension stack case. (#48992) · 0cae5c7f
  Committed by limingshu on Jan 10, 2023
```
* add stack grad kernel optimization

* add basic optimization kernel for stack_grad_kernel

* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit
```
- Add cuda compiled arch check (#49592) · c0d6ec63
  Committed by MarDino on Jan 10, 2023

09 Jan, 2023 · 2 commits
- add fill/fill_any for kunlun (#49645) · 31ea3231
  Committed by QingshuChen on Jan 09, 2023
- [XPU] add einsum fill diagonal and diagonal kernels (#49465) · a5bf156b
  Committed by ykkk2333 on Jan 09, 2023
```
* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun

* fix dlrm throughput problem, test=kunlun

* add xpu einsum, fill_diagonal, and diagonal kernels, test=kunlun
```
06 Jan, 2023 · 3 commits
- Dev (#49591) · 07db4a9f
  Committed by RuohengMa on Jan 06, 2023
```
* add bitwise and, bitwise not, bitwise or and bitwise xor

* correct typo
```
- fix typo, compatiable->compatible, test=document_fix (#49552) · 6ec8dfdd
  Committed by HongyuJia on Jan 06, 2023
- Expansions of some unmaintained pr (#49551) · 419c2d14
  Committed by 张春乔 on Jan 06, 2023

03 Jan, 2023 · 1 commit
- H2D data transfer optimization for concat kernel (#49040) · 0de94cd9
  Committed by limingshu on Jan 03, 2023

27 Dec, 2022 · 1 commit
- add unbind op for xpu (#49356) · 16931039
  Committed by zhangyikun02 on Dec 27, 2022

26 Dec, 2022 · 1 commit

fix dlrm qpsproblem (#49171) · c8f76337

Committed by ykkk2333 on Dec 26, 2022

* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun

* fix dlrm throughput problem, test=kunlun

23 Dec, 2022 · 2 commits
- square_grad support fp16 *test=kunlun (#48847) · ae544586
  Committed by haosicheng on Dec 23, 2022
- add rnn-t loss and api (#49199) · c088f9ec
  Committed by Hui Zhang on Dec 23, 2022
```
* add warp transducer code
```
22 Dec, 2022 · 1 commit
- fix softmax_with_cross_entropy bug for kunlun (#49207) · b421d7a5
  Committed by QingshuChen on Dec 22, 2022

20 Dec, 2022 · 1 commit

[PHI decouple] move dropout_impl and cuda_graph_with_memory_pool from fluid to phi (#49139) · 579784e2

Committed by huangjiyi on Dec 20, 2022

* move dropout_impl from fluid to phi

* move cuda_graph_with_memory_pool from fluid to phi

* update namespace

* remove cuda_graph in fluid

* fix mac-build

* fix bugs

* correct CodeStyle

* fix mac-build

* fix mutable_data

* fix stl include

* fix copy param

19 Dec, 2022 · 2 commits
- refactor: rename process group (#49137) · 22e416cf
  Committed by Wen Sun on Dec 19, 2022
- add diag_v2 op for xpu, test=kunlun (#49088) · 922f0868
  Committed by zhangyikun02 on Dec 19, 2022

17 Dec, 2022 · 1 commit
- refactor: rename xccl files (#49127) · d4f43ad4
  Committed by Wen Sun on Dec 17, 2022

16 Dec, 2022 · 1 commit
- refactor: rename files (#49117) · 40f3f4f0
  Committed by Wen Sun on Dec 16, 2022

15 Dec, 2022 · 1 commit
- [PHI decoupling] move softmax from fluid to phi and remove cpu_vec.h in fluid (#48970) · 344b99e1
  Committed by huangjiyi on Dec 15, 2022

14 Dec, 2022 · 1 commit

nullptr bugfix for XPU pg mode (#49043) · f0dab193

Committed by james on Dec 14, 2022

* nullptr bugfix for XPU pg mode

Also, a few kernels are added to the XPU whitelist.

* increase error msg length

12 Dec, 2022 · 1 commit

Optimization of Eigh op with ssyevj_batched runtime api (#48560) · 16e364d3

Committed by 傅剑寒 on Dec 12, 2022

* fix codestyle

* add double complex<float> complex<double> dtype support for syevj_batched

* fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some case

* optimize eigh in different case

* fix missing ; bug

* fix use_syevj bug

* fix use_cusolver_syevj_batched flag

BaiXuePrincess / Paddle is in sync with its upstream (forked-from) project.
