Commits · c923e6c9b1e69cd5cbd0aa76a3982358538c7c4d · PaddlePaddle / Paddle

01 Nov, 2022 2 commits

Adapting device-specific Extra Attributes for the PHI kernel (#46342) · c923e6c9

Committed by Chen Weihang on Oct 31, 2022

* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* fix map at error

* Update paddle/phi/kernels/onednn/conv_grad_kernel.cc
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* remove useless extra attrs

* replace mkldnn_engine by onednn_engine
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

c923e6c9

summer-ospp 2022: PaddlePaddle Sparse Conv development and optimization: gather-gemm-scatter fuse (#46679) · 5158fa4f
Committed by umiswing on Nov 01, 2022

5158fa4f

31 Oct, 2022 6 commits

[PHI]Standardise some C++ API (#47385) · 60e0c506
Committed by YuanRisheng on Oct 31, 2022
```
* standard api

* fix ci bugs

* fix ci bugs

* fix ce bugs
```
60e0c506

[Einsum] Einsum support repeated labels. (#47290) · 6e1c14e3

Committed by xiongkun on Oct 31, 2022

* add unittest for einsum-v2-trace and diagonal

* repeat labels.

* einsum support repeated labels.

* forward is ok for diagonal and undiagonalized.
TODO: check backward is ok by our theorem.

* backward is ok!

* fix by PR suggestions.

* fix ci error

* fix ci error

* fix ci warning

6e1c14e3

[CustomDevice] GetCCLComm add custom device support (#47168) · 34d13d6a
Committed by ronnywang on Oct 31, 2022
```
* [CustomDevice] GetCCLComm add custom device support

* update

* update

* update
```
34d13d6a

[ControlFlow] replace executor in run method of control flow ops with standalone_executor (#45696) · 3b219e5e

Committed by kangguangli on Oct 31, 2022

* replace executor in conditional_block_op.run with standalone_executor

* add block_id as the argument of standalone executor's method run; add print for program

* fix scope bug about conditional block op

* fix bug: unnecessary return of fetch value

* fix typo

* fix: quantization will set variable persistable, and these variables must exist in global scope

* add interpretercore cache for conditional block op but not activate in default

* fix bug: local scope reuse for conditional block op

* reset scope when conditional block op runs

* fix typo

* fix typo and code style

* add build scope for conditional block op

* add skip for transfer_layout kernel

* refine code

* fix reset_scope

* fix reset_scope

* refine code

* refine code

* refine code

1. remove flag use in conditional_block_op
2. pass execution_config to BuildOpFuncList instead of individual parameter

* refine code

* remove the use of FLAGS_control_flow_use_new_executor_cache

* change FLAGS_control_flow_use_new_executor to false

3b219e5e

[Zero-Dim] support input 0D Tensor for reduce_sum/reduce_mean (#47219) · c8fc3379
Committed by zhouweiwei2014 on Oct 31, 2022

c8fc3379

remove boost compiler flags in flags.cmake (#47468) · 91096ae2
Committed by Wang Xin on Oct 31, 2022

91096ae2

28 Oct, 2022 1 commit
- generate static graph code for some ops by yaml (#47416) · 17fb92b3
  Committed by zyfncg on Oct 28, 2022
  
  17fb92b3
27 Oct, 2022 2 commits

Update of PHI transpose_grad (#47311) · 493fbfd7

Committed by Jacek Czaja on Oct 27, 2022

* - halfway transforming transpose grad

- Fixes

- buildable

* - lint

* rerunning the process

493fbfd7

fix reduce_any kernel data race on sharedMem (#47233) · 77dbb318

Committed by Bo Zhang on Oct 27, 2022

* fix reduce_any kernel data race on sharedMem

* use bit operation instead of div & mod

* unbranch

* modified according to PR comments

77dbb318

26 Oct, 2022 3 commits
- [Fix] Fix paddle.pow() Gets Incorrect Result When Broadcasting Is Triggered (#47307) · d8314ff5
  Committed by Lin Manhui on Oct 26, 2022
```
* Fix paddle.pow() bugs

* Add unittest cases

* Fix ut cases

* Add ut cases on multiple devices
```
  d8314ff5
- test success on cuda11.7 (#47348) · 2534ca7e
  Committed by zhangkaihuo on Oct 26, 2022
  
  2534ca7e
- fix uninitialized, tautological-constant-out-of-range-compare and... · 076c41ef
  Committed by Wang Xin on Oct 26, 2022
```
fix uninitialized, tautological-constant-out-of-range-compare and literal-conversion warning on macos (#47341)
```
  076c41ef
25 Oct, 2022 2 commits
- minor split optimization (#47314) · d5e7d20d
  Committed by jakpiase on Oct 25, 2022
  
  d5e7d20d
- [Zero-Dim] support input 0D Tensor for softmax/log_softmax/gumbel_softmax (#47251) · ac3b882f
  Committed by zhouweiwei2014 on Oct 25, 2022
  
  ac3b882f
24 Oct, 2022 4 commits
- Polish slice code in fluid (#45746) · 3f64a2c3
  Committed by zyfncg on Oct 24, 2022
```
* support selected_rows kernel for multiply in dygraph

* delete useless code of slice in fluid

* fix compile bug

* move slice_array from fluid to phi

* fix strided_slice_op_npu
```
  3f64a2c3
- Enhance the implementation of some conv functions. (#47281) · bc47e7ac
  Committed by Yiqun Liu on Oct 24, 2022
  
  bc47e7ac
- fix cumsum compilation error for GPU architecture that does not support fast FP16 (#47277) · 84273aaa
  Committed by Zhang Ting on Oct 24, 2022
  
  84273aaa
- Move the header file of conv cudnn and miopen to phi directory. (#47248) · 31f57f29
  Committed by Yiqun Liu on Oct 24, 2022
  
  31f57f29
21 Oct, 2022 1 commit
- fix bug of abs_grad in eager mode for kunlun, test=kunlun (#47164) · a9ac608f
  Committed by zhangyikun02 on Oct 21, 2022
  
  a9ac608f
20 Oct, 2022 2 commits
- Add infer prune function (#47046) · af9486fc
  Committed by JingZhuangzhuang on Oct 20, 2022
```
* Add infer prune function

* Update phi.cmake

* Update operators.cmake

* add fusion op
```
  af9486fc
- [PaddlePaddle Hackathon 3 No.45 & 46]: Support float16 data type for Paddle cumsum and logcumsumexp (#45952) · c91b1b91
  Committed by thunder95 on Oct 20, 2022
  
  c91b1b91
19 Oct, 2022 4 commits
- remove fluid symbol depend in sync bn (#47122) · ab369976
  Committed by Chen Weihang on Oct 19, 2022
  
  ab369976
- Enable to record whether the conv algo is got by exhaustive search to fix... · 3bc4b850
  Committed by Yiqun Liu on Oct 19, 2022
```
Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)
```
  3bc4b850
- slice op supports uint8_t (#47067) · 1e1c7275
  Committed by will-jl944 on Oct 19, 2022
  
  1e1c7275
- [Dy2Static] Remove GradTransformer (#47063) · be3908a3
  Committed by xiongkun on Oct 19, 2022
```
* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename
```
  be3908a3
18 Oct, 2022 3 commits
- add embedding range check (#46991) · d68c38ef
  Committed by seemingwang on Oct 18, 2022
```
* add embedding range check

* change head file

* change head file

* fix
```
  d68c38ef
- Add value check & error message for gather_tree (#47051) · e5e3d5cf
  Committed by liu zhengxi on Oct 18, 2022
  
  e5e3d5cf
- [XPU] update xpu cmake to 1016. test=kunlun (#47041) · 55ac9c46
  Committed by houj04 on Oct 18, 2022
```
* [XPU] update xpu cmake to 1016. test=kunlun

* fix special case of transpose op. test=kunlun
```
  55ac9c46
17 Oct, 2022 4 commits

Support BF16 training for sharding (#46846) · 0b39b244

Committed by Ghost Screaming on Oct 17, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

0b39b244

[PHI]Modify DataLayout's namespace from paddle::experimental to phi (#46869) · ec749398
Committed by YuanRisheng on Oct 17, 2022
```
* namespace modify

* update by comment
```
ec749398

[Hackathon 3rd No.22 ] add paddle.incubate.sparse.reshape (#46694) · abb38136

Committed by OccupyMars2025 on Oct 17, 2022

* add sparse reshape

* change the dtype in all test cases to int64

* just one test case

* modify comments

* Update test_sparse_reshape_op.py

* change the type of "shape" from vector<int64_t> to IntArray

* check whether sp_out.to_dense() is the cause  of error

* print sp_out

* Update reshape_kernel.cc

* use numpy to generate the equal paddle tensor

* just check dense_tensor.numpy()

* check cpu and cuda versions

* Update test_sparse_reshape_op.py

* supply all test cases for cpu forward coo kernel

* test forward coo cuda kernel

* change configuration of cuda kernel

* keep only one test case

* test coo cpu kernel (forward and backward)

* row major or column major ???

* test cuda coo forward kernel

* complete declaration and registration

* Update __init__.py

* rebuild

* retrigger CI

* add cudaMalloc and cudaMemcpy  in  ReshapeCooKernel  and change back to row major order in a cuda dense tensor

* modify minor error

* test only cpu coo forward kernel

* add all test cases for coo forward kernel  (both cpu and gpu)

* test all forward kernels (coo, csr; cpu, gpu)

* add all test cases for all kinds of kernels

* just retrigger CI

* Update sparse_ops.yaml

* Update sparse_ops.yaml

* Update sparse_ops.yaml

* resolve conflicts

* Update sparse_ops.yaml

* don't specify tensor place

* new shape has -1 or 0 in it

* Update unary_grad_kernel.h

* correct lvalue error

* code style

* Update sparse_backward.yaml

* Update sparse_ops.yaml

* Update unary_kernel.h

* Update unary.py

* Update sparse_backward.yaml

* Update unary.py

* code style

* code style

* code style

* Update unary.py

* specify tensor place explicitly

* do not use numpy array

* use numpy array in unit test again

* modify example code in docstring

abb38136

Fix the bug of PHI kernel of reduce_sum in kunlun when using eager mode. (#47004) · f9c1cdc1
Committed by Leo Guo on Oct 17, 2022
```
test=kunlun
```
f9c1cdc1

14 Oct, 2022 2 commits
- speed_up for deformable conv (#46997) · eee6b3a7
  Committed by Rayman on Oct 14, 2022
  
  eee6b3a7
- TRT pool2d adaptive mode bugfix (#46802) · eb32746a
  Committed by Wang Bojun on Oct 14, 2022
```
* draft with debug print
```
  eb32746a
13 Oct, 2022 4 commits
- logsumexp support fp16 (#45817) · 910e1b6a
  Committed by xiaohemaikoo on Oct 13, 2022
  
  910e1b6a
- [Zero-Dim] support 0D for paddle.transpose/reshape/stack/tile/unsqueeze (#46555) · 78add057
  Committed by zhouweiwei2014 on Oct 13, 2022
  
  78add057
- fix softmax memory align (#46902) · 71748805
  Committed by carryyu on Oct 13, 2022
  
  71748805
- Revert #46111 (#46961) · cf9ca61d
  Committed by Zhang Ting on Oct 13, 2022
```
* Revert "[Hackathon No.56&38] Implement float16 data type support and forward-pass speedup for the deformable_conv_v1 operator (#46111)"
```
  cf9ca61d

PaddlePaddle / Paddle synced successfully about 1 year ago
