Commits · 032da73170080c56830dd4bba7a943b4bd45a05e · BaiXuePrincess / Paddle

05 Jan, 2023 1 commit

Support 0D for paddle.sort/argsort (#49501) · 032da731

Committed by Siming Dai on Jan 05, 2023

* support 0D for paddle.sort/argsort

* support 0D tensor for paddle.sort/argsort in xpu

* fix bug

* fix grad and add value assertion

032da731
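
As an illustration of what 0D support means for sort/argsort, here is a minimal pure-Python sketch. It is not Paddle's implementation. It assumes a 0D tensor holds exactly one element, so sorting returns the value unchanged and argsort returns index 0. The helper names `sort_0d_aware` and `argsort_0d_aware` are hypothetical, and a bare float stands in for a 0D tensor.

```python
from typing import List, Union

# Assumption for this sketch: a bare float models a 0D tensor,
# and a list of floats models a 1D tensor.
Tensor = Union[float, List[float]]


def sort_0d_aware(x: Tensor, descending: bool = False) -> Tensor:
    """Sort a 1D list; pass a 0D value through unchanged."""
    if not isinstance(x, list):
        # A 0D input has a single element, so it is already sorted.
        return x
    return sorted(x, reverse=descending)


def argsort_0d_aware(x: Tensor, descending: bool = False) -> Union[int, List[int]]:
    """Return the indices that would sort x; a 0D input yields index 0."""
    if not isinstance(x, list):
        # The only element of a 0D input sits at index 0.
        return 0
    return sorted(range(len(x)), key=lambda i: x[i], reverse=descending)


if __name__ == "__main__":
    print(sort_0d_aware(3.5))                # 0D case
    print(argsort_0d_aware(3.5))             # 0D case
    print(sort_0d_aware([3.0, 1.0, 2.0]))    # 1D case
    print(argsort_0d_aware([3.0, 1.0, 2.0]))  # 1D case
```

The 0D branch is handled before any axis logic, which is one simple way to avoid indexing into a shape that has no dimensions.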

29 Dec, 2022 1 commit

xpu kernels support api int64 vector inputs, test=kunlun (#49336) · 3c2420a3

Committed by ykkk2333 on Dec 29, 2022

3c2420a3

27 Dec, 2022 1 commit

add unbind op for xpu (#49356) · 16931039

Committed by zhangyikun02 on Dec 27, 2022

16931039

26 Dec, 2022 1 commit

fix dlrm qps problem (#49171) · c8f76337

Committed by ykkk2333 on Dec 26, 2022

* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun

* fix dlrm throughput problem, test=kunlun

c8f76337

23 Dec, 2022 2 commits

support recompute for kunlun (#49069) · 98c17a68

Committed by QingshuChen on Dec 23, 2022

98c17a68

square_grad support fp16 *test=kunlun (#48847) · ae544586

Committed by haosicheng on Dec 23, 2022

ae544586

22 Dec, 2022 1 commit

fix softmax_with_cross_entropy bug for kunlun (#49207) · b421d7a5

Committed by QingshuChen on Dec 22, 2022

b421d7a5

20 Dec, 2022 1 commit

disable set_value in kp *test=kunlun (#49153) · c830a28e

Committed by haosicheng on Dec 20, 2022

c830a28e

19 Dec, 2022 1 commit

add diag_v2 op for xpu, test=kunlun (#49088) · 922f0868

Committed by zhangyikun02 on Dec 19, 2022

922f0868

14 Dec, 2022 1 commit

nullptr bugfix for XPU pg mode (#49043) · f0dab193

Committed by james on Dec 14, 2022

* nullptr bugfix for XPU pg mode

Also, a few kernels are added to the xpu whitelist

* increase error msg length

f0dab193

08 Dec, 2022 1 commit

[XPU] add set_value and set_value_grad (#48845) · 94fe929a

Committed by haosicheng on Dec 08, 2022

94fe929a

07 Dec, 2022 1 commit

modify d2d copy to xpu::copy in xpu kernel, test=kunlun (#48710) · 0d8ddf9f

Committed by zhangyikun02 on Dec 07, 2022

0d8ddf9f

06 Dec, 2022 2 commits

[XPU] add tile_grad op (#48720) · 8de336f9

Committed by houj04 on Dec 06, 2022

8de336f9

add xpu centered rmsprop (#48658) · 54b756e2

Committed by ykkk2333 on Dec 06, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add xpu rmsprop centered, test=kunlun

54b756e2

05 Dec, 2022 1 commit

Replace mutable_data with DeviceContext.Alloc in phi kernels (#48500) · 34a957e3

Committed by Ruibiao Chen on Dec 05, 2022

* Replace mutable_data with DeviceContext.Alloc in phi kernels

* Fix CI errors

* Fix CI errors

* Fix CI errors, test=kunlun

* Fix CI errors, test=kunlun

* Handle rnn_functor

* Update approvals

34a957e3

03 Dec, 2022 1 commit

Scatter 0D index for gather, 0D index and 0D updates for scatter. (#48452) · f9815bfe

Committed by Yuang Liu on Dec 03, 2022

f9815bfe

02 Dec, 2022 3 commits

[XPU] Fix xpu compile error (#48621) · 2af82190

Committed by Jiabin Yang on Dec 02, 2022

* [Eager] Fix paddle.grad interface

* [Eager] Support minimum SubGraph for GeneralGrad

* Add needed_nodes to prune grad graph more thoroughly

* [Eager] Add grad_node_trans_mapping_ to record which grad_node has been transformed to AccumulationNode

* [Eager] Fix paddle.grad interface

* Polish code

* remove potential_stop_node

* Add endding_nodes to enhance genSugraph logic

* clear endding_nodes_

* polish code

* rename endding_nodes to endding_nades_

* Refactor grad interface

* Add register_hook case to fix coverage-ci

* Fix code format

* Refactor general_grad

* Add more code comments

* call clear directly to release GradSlotMeta

* fix a mistake

* fix matmul/ multiply kernel logic and optional input in yaml, fill zeros logic and so on.

* fix batch_norm_double_grad yaml optional config

* fix tanh_triple_grad yaml and kernels

* fix MultiplyTripleGradKernel optional logic

* fix merge mistake

* fix compile error

* remove legacy attr for bn

* polish code

* fix some kernel

* merge develop

* fix error

* remote log

* fix kernel with full like

* hide value log behind

* hide value log behind

* fix matmul_triple grad

* fix xpu compile error

* fix xpu compile error

* fix xpu ut

* fix xpu ut

* fix_xpu_compile_error

Co-authored-by: Weilong Wu <veyron_wu@163.com>

2af82190

[Eager] Optimize Grad by prune useless branch (#47827) · d1e93be1

Committed by Jiabin Yang on Dec 02, 2022

* [Eager] Fix paddle.grad interface

* [Eager] Support minimum SubGraph for GeneralGrad

* Add needed_nodes to prune grad graph more thoroughly

* [Eager] Add grad_node_trans_mapping_ to record which grad_node has been transformed to AccumulationNode

* [Eager] Fix paddle.grad interface

* Polish code

* remove potential_stop_node

* Add endding_nodes to enhance genSugraph logic

* clear endding_nodes_

* polish code

* rename endding_nodes to endding_nades_

* Refactor grad interface

* Add register_hook case to fix coverage-ci

* Fix code format

* Refactor general_grad

* Add more code comments

* call clear directly to release GradSlotMeta

* fix a mistake

* fix matmul/ multiply kernel logic and optional input in yaml, fill zeros logic and so on.

* fix batch_norm_double_grad yaml optional config

* fix tanh_triple_grad yaml and kernels

* fix MultiplyTripleGradKernel optional logic

* fix merge mistake

* fix compile error

* remove legacy attr for bn

* polish code

* fix some kernel

* merge develop

* fix error

* remote log

* fix kernel with full like

* hide value log behind

* hide value log behind

* fix matmul_triple grad

Co-authored-by: Weilong Wu <veyron_wu@163.com>

d1e93be1

add silu, silu_grad, unfold and unfold_grad xpu kernels (#48325) · f71de378

Committed by ykkk2333 on Dec 02, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add silu, unfold and their grads, test=kunlun

f71de378

01 Dec, 2022 2 commits

Rename kernel for top_k, slogdeterminant, generate_proposals_v2 (#48594) · 3d35aa80

Committed by zyfncg on Dec 01, 2022

* rename kernel for top_k, slogdeterminant, generate_proposals_v2

* fix bug

3d35aa80

change d2d copy to api copy in xpu kernel, test=kunlun (#48505) · 4f834cb2

Committed by zhangyikun02 on Dec 01, 2022

4f834cb2

30 Nov, 2022 1 commit

optimize for argsort with xpu, test=kunlun (#48440) · 7bf7e6e0

Committed by zhangyikun02 on Nov 30, 2022

7bf7e6e0

29 Nov, 2022 1 commit

add floor fp32 op *test=kunlun (#48458) · 9d4b4be3

Committed by haosicheng on Nov 29, 2022

9d4b4be3

28 Nov, 2022 3 commits

[Phi decouple] remove dependence on "paddle/fluid/platform/device/xpu/xxx.h" in phi (#48420) · 2bae75ed

Committed by huangjiyi on Nov 28, 2022

* rm fluid "xpu_header.h" deps in phi

* move part of xpu_op_list.h from fluid to phi

* add fluid xpu_op_list deps

* add glog deps for xpu_op_list in phi

* fix PR-CI-Kunlun

2bae75ed

Fix bug of TransToFluidOpName (#48355) · d3f52efd

Committed by zyfncg on Nov 28, 2022

* add fluid_op_name_map

* rename some kernel name

* add comments for op-kernel map

* refine map name of op to kernel

d3f52efd

add square fp16 *test=kunlun (#48095) · 81d0a3cc

Committed by haosicheng on Nov 28, 2022

81d0a3cc

24 Nov, 2022 2 commits

add exp_grad, hard_sigmoid and hard_sigmoid_grad for xpu, test=kunlun (#48307) · d2f87d96

Committed by zhangyikun02 on Nov 24, 2022

d2f87d96

add pad3d and pad3d_grad op for xpu, test=kunlun (#48306) · 22555e96

Committed by zhangyikun02 on Nov 24, 2022

22555e96

23 Nov, 2022 2 commits

add masked_select_grad kernel (#48137) · db0ea0ce

Committed by ykkk2333 on Nov 23, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add masked_selected_grad kernel, test=kunlun

db0ea0ce

add warpctc kernel and change cast_v2 to cast for xpu, test=kunlun (#48134) · 25ffe9c2

Committed by zhangyikun02 on Nov 23, 2022

25ffe9c2

21 Nov, 2022 2 commits

refine reduce_all (#48133) · 56f15c43

Committed by wanghuancoder on Nov 21, 2022

* refine reduce_all

56f15c43

add adamw support xpu, test=kunlun (#48114) · 27e252d9

Committed by taixiurong on Nov 21, 2022

27e252d9

18 Nov, 2022 2 commits

correct sync behavior for XPU distributed training (#47882) · aafa9820

Committed by james on Nov 18, 2022

* correct sync behavior for XPU distributed training

XPU supports an event mechanism similar to CUDA events, so it is advisable to
use an event to sync compute/comm streams for performance. However, this
mechanism has never been fully tested, and inconsistent loss/ending_epochs have
been reported. Therefore, this PR replaces event sync with stream waiting as
a temporary solution.

* remove compile warning

aafa9820

cast and gradient_accumulator support double for xpu, test=kunlun (#47800) · 982d5ff7

Committed by zhangyikun02 on Nov 18, 2022

982d5ff7

17 Nov, 2022 2 commits

[PHI]Standardise some C++ API (Part5) (#47860) · f3650201

Committed by YuanRisheng on Nov 17, 2022

* standard api

* fix xpu bugs

f3650201

xpu-paddlepaddle-41 [Task] ffn and attention test=kunlun (#46658) · 071708fa

Committed by taixiurong on Nov 17, 2022

071708fa

16 Nov, 2022 1 commit

Fix paddle rec, kim, dsin models' bugs (#47792) · e23dfed9

Committed by ykkk2333 on Nov 16, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* embedding and embedding_grad add int32 input, test=kunlun

e23dfed9

15 Nov, 2022 1 commit

[Zero-Dim] support input 0D Tensor for xpu kernel, test=kunlun (#47849) · d4d3d7ed

Committed by zhouweiwei2014 on Nov 15, 2022

d4d3d7ed

11 Nov, 2022 1 commit

[Zero-Dim] fix batch_norm op infermeta bug (#47858) · 18549417

Committed by zhouweiwei2014 on Nov 11, 2022

18549417

10 Nov, 2022 1 commit

conv2d_transpose and deformable_conv unrestricted some limit for xpu2, test=kunlun (#47837) · a38fc5e1

Committed by zhangyikun02 on Nov 10, 2022

a38fc5e1

BaiXuePrincess / Paddle: in sync with the fork source project