- 23 Dec 2022, 1 commit
  - Committed by haosicheng
- 22 Dec 2022, 1 commit
  - Committed by QingshuChen
- 20 Dec 2022, 1 commit
  - Committed by haosicheng
- 19 Dec 2022, 1 commit
  - Committed by zhangyikun02
- 14 Dec 2022, 1 commit
  - Committed by james
    * nullptr bugfix for XPU pg mode; also add a few kernels to the XPU whitelist
    * increase error msg length
- 08 Dec 2022, 1 commit
  - Committed by haosicheng
- 07 Dec 2022, 1 commit
  - Committed by zhangyikun02
- 06 Dec 2022, 2 commits
- 05 Dec 2022, 1 commit
  - Committed by Ruibiao Chen
    * Replace mutable_data with DeviceContext.Alloc in phi kernels
    * Fix CI errors (several rounds, test=kunlun)
    * Handle rnn_functor
    * Update approvals
- 03 Dec 2022, 1 commit
  - Committed by Yuang Liu
- 02 Dec 2022, 3 commits
  - Committed by Jiabin Yang
    * [Eager] Fix paddle.grad interface
    * [Eager] Support minimum SubGraph for GeneralGrad
    * Add needed_nodes to prune grad graph more thoroughly
    * [Eager] Add grad_node_trans_mapping_ to record which grad_node has been transformed to AccumulationNode
    * Polish code; remove potential_stop_node
    * Add endding_nodes to enhance genSugraph logic; clear endding_nodes_
    * Refactor grad interface and general_grad; add more code comments
    * Add register_hook case to fix coverage-ci
    * Call clear directly to release GradSlotMeta
    * Fix matmul/multiply kernel logic and optional input in yaml, fill-zeros logic, and so on
    * Fix batch_norm_double_grad yaml optional config
    * Fix tanh_triple_grad yaml and kernels
    * Fix MultiplyTripleGradKernel optional logic
    * Remove legacy attr for bn
    * Fix kernel with full_like; hide value log behind
    * Fix matmul_triple grad
    * Fix xpu compile errors and xpu unit tests
    Co-authored-by: Weilong Wu <veyron_wu@163.com>
  - Committed by Jiabin Yang
    * earlier revision of the change above, with an essentially identical message (minus the final XPU compile-error and unit-test fixes)
    Co-authored-by: Weilong Wu <veyron_wu@163.com>
  - Committed by ykkk2333
    * add stat tool
    * add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun
    * add silu, unfold and their grads, test=kunlun
- 01 Dec 2022, 2 commits
  - Committed by zyfncg
    * rename kernel for top_k, slogdeterminant, generate_proposals_v2
    * fix bug
  - Committed by zhangyikun02
- 30 Nov 2022, 1 commit
  - Committed by zhangyikun02
- 29 Nov 2022, 1 commit
  - Committed by haosicheng
- 28 Nov 2022, 3 commits
  - Committed by huangjiyi
    * rm fluid "xpu_header.h" deps in phi
    * move part of xpu_op_list.h from fluid to phi
    * add fluid xpu_op_list deps
    * add glog deps for xpu_op_list in phi
    * fix PR-CI-Kunlun
  - Committed by zyfncg
    * add fluid_op_name_map
    * rename some kernel names
    * add comments for op-kernel map
    * refine map name of op to kernel
  - Committed by haosicheng
- 24 Nov 2022, 2 commits
  - Committed by zhangyikun02
  - Committed by zhangyikun02
- 23 Nov 2022, 2 commits
  - Committed by ykkk2333
    * add stat tool
    * add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun
    * add masked_selected_grad kernel, test=kunlun
  - Committed by zhangyikun02
- 21 Nov 2022, 2 commits
  - Committed by wanghuancoder
    * refine reduce_all
  - Committed by taixiurong
- 18 Nov 2022, 2 commits
  - Committed by james
    * correct sync behavior for XPU distributed training: XPU supports an event mechanism similar to CUDA events, so it is advisable to use an event to sync the compute/comm streams for performance. However, this mechanism was never fully tested, and inconsistent losses/ending epochs were reported. Therefore, this PR replaces the event sync with stream waiting as a temporary solution.
    * remove compile warning
  - Committed by zhangyikun02
- 17 Nov 2022, 2 commits
  - Committed by YuanRisheng
    * standard api
    * fix xpu bugs
  - Committed by taixiurong
- 16 Nov 2022, 1 commit
  - Committed by ykkk2333
    * add stat tool
    * add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun
    * embedding and embedding_grad add int32 input, test=kunlun
- 15 Nov 2022, 1 commit
  - Committed by zhouweiwei2014
- 11 Nov 2022, 1 commit
  - Committed by zhouweiwei2014
- 10 Nov 2022, 6 commits
  - Committed by zhangyikun02
  - Committed by YuanRisheng
    * standard api
    * fix sparse bugs
    * fix xpu bugs, test=kunlun
    * remove hard code for custom unittest
    * open ci, test=kunlun
    * deal with conflict
  - Committed by Wang Xin
    * remove fluid/framework/generator.h from phi
    * fix PR-CI-Kunlun-KP-Build fail
  - Committed by huangjiyi
    [PHI Decoupling] remove "paddle/fluid/platform/float16.h" and "paddle/fluid/platform/for_range.h" in phi. (#47817)
    * rm "paddle/fluid/platform/float16.h" in phi
    * rm "paddle/fluid/platform/for_range.h" in phi
  - Committed by zhouweiwei2014
  - Committed by james
    * XPU support eager mode
    * add unittest for XPU eager mode
    * minor bugfix
    * minor bugfix, test=kunlun
    * correct copyright info
    * 1. remove unused vars/funcs; 2. ProcessGroupBKCL inherits from ProcessGroupStream
    * bugfix for fp16 in eager-mode multi-card, test=kunlun
    * rebase & fix a few issues
    * use new processgroup interface, test=kunlun
    * fix compile issue, test=kunlun