Commits · da7d2f297ce15af23307d233ef8cfc479677a2c6 · BaiXuePrincess / Paddle

20 Oct, 2022 1 commit
- [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
  da7d2f29
19 Oct, 2022 6 commits

[cherry-pick] strided_slice grad add fp16 support (#47159) · 23f2a4ea
Committed by Zhang Ting on Oct 19, 2022
```
* strided_slice grad add fp16 support
```
23f2a4ea

Add unsigned int8 scale propagation (#46378) (#47156) · 66dccd7d

Committed by yeliang2258 on Oct 19, 2022

* Add unsigned int8 propagation

* Add or modify unit tests

* Correct concat scale checking

* Apply review suggestions

* Corrections
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>

66dccd7d

[CherryPick] Support TypeHint for function decorated by @to_static (#47147) · 247ef477

Committed by xiongkun on Oct 19, 2022

* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)

* Add TypeHint Transformer

* add unittest for typehint transformer

* [Dy2Static] Remove GradTransformer (#47063)

* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename

247ef477

Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12

Committed by Ghost Screaming on Oct 19, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* Support allow_partial switch, which can be configured in
pipeline_configs. If the tensors sent from different hosts
are not the same, they shouldn't be sent partially and
then concatenated into a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

1d015f12
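
The reasoning behind the `enable_partial_send_recv` switch can be illustrated with a small simulation. This is only a sketch in plain Python, not Paddle's implementation: the function `recv_tensor`, its arguments, and the use of lists as stand-ins for tensors are all hypothetical. It assumes that, on the partial path, sender `i` contributes only the `i`-th slice and the receiver concatenates the slices.

```python
def recv_tensor(per_rank_tensors, receiver_rank, enable_partial_send_recv):
    """Simulate what one receiver ends up with (hypothetical helper).

    per_rank_tensors: one equal-length list per sending rank.
    receiver_rank: index of the sender paired with this receiver.
    """
    if not enable_partial_send_recv:
        # Whole-tensor path: the receiver gets its peer's tensor unchanged.
        return list(per_rank_tensors[receiver_rank])

    n = len(per_rank_tensors)
    length = len(per_rank_tensors[0])
    assert length % n == 0, "tensor must split evenly across ranks"
    chunk = length // n

    out = []
    for rank, tensor in enumerate(per_rank_tensors):
        # Partial path: rank i sends only its i-th slice; slices are concatenated.
        out.extend(tensor[rank * chunk:(rank + 1) * chunk])
    return out


same = [[1, 2, 3, 4], [1, 2, 3, 4]]   # every host holds the same tensor
diff = [[1, 2, 3, 4], [5, 6, 7, 8]]   # hosts hold different tensors

ok = recv_tensor(same, 0, True)       # slices recombine into the original
mixed = recv_tensor(diff, 0, True)    # slices from different tensors get mixed
whole = recv_tensor(diff, 0, False)   # switch off: peer's tensor arrives intact
```

When the hosts hold different tensors, the partial path produces a result that matches no sender, which is the case the switch is meant to rule out.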

[Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) (#47134) · 69515e90
Committed by WangZhen on Oct 19, 2022
```
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
```
69515e90
[ cherrypick] Construct exec and ctx only once in cond op to speed up (#47012) · fcb9c0b5
Committed by Hui Zhang on Oct 19, 2022
```
Construct exec and ctx only once in cond op to speed up
```
fcb9c0b5

18 Oct, 2022 6 commits
- reconstruct code for convert_fp16 (#46428) (#47087) · de6f15b6
  Committed by Wilber on Oct 18, 2022
  
  de6f15b6
- Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm,... · 2cc8797e
  Committed by weishengying on Oct 18, 2022
```
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291) (#47003)
```
  2cc8797e
- [cherry-pick 2.4] add sparse api transpose/reshape/is_same_shape (#47076) · 5fef043d
  Committed by zhouweiwei2014 on Oct 18, 2022
```
Add three APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
```
  5fef043d
- support shape tensor is the input of trt-subgraph (#47066) · 5a44c124
  Committed by zhoutianzi666 on Oct 18, 2022
  
  5a44c124
- [cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90
  Committed by Haohongxiang on Oct 18, 2022
```
* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)

* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)

* update
```
  b84edd90
- [Cherry pick] trt pool2d adaptive ifx (#47069) · 5f6b9f1b
  Committed by Wang Bojun on Oct 18, 2022
```
* draft with debug print
* remove debug print
* bug fix for ci
```
  5f6b9f1b
17 Oct, 2022 5 commits

[Cherry-pick] Collective communication APIs (#46922) · 5fba2a98

Committed by Wen Sun on Oct 17, 2022

* Support both use_calc_stream and sync_op in send recv APIs (#46023)

* Support both use_calc_stream and sync_op in allgather API (#46295)

* Support both use_calc_stream and sync_op in collective communication API (#46761)

* Move group and all reduce from collective to communication (#45848)

* Completes bfloat16 dtype for collective api in eager mode (#45844)

* Fix collective APIs cannot be recognized when building docs (#46962)
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>

5fba2a98

[cherry-pick]Sparse static graph (#46838) · 10225d22
Committed by zhangkaihuo on Oct 17, 2022
```
cherry-pick : #46322, #46245
Sparse API supports static graph
```
10225d22

Optimize performance of depthwise_conv (#46896) · 976af0da

Committed by Zhang Zheng on Oct 17, 2022

Optimize performance of depthwise_conv

Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1

976af0da
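
For orientation, the output shape implied by the benchmark configuration in this commit message can be worked out with the standard convolution size formula. The sketch below assumes NCHW layout and the usual output-size arithmetic; neither is stated in the commit message, and the helper name `conv_out_size` is made up for this example.

```python
def conv_out_size(in_size, kernel, stride=1, pad=0, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


# Values taken from the commit message (layout assumed to be NCHW).
n, c, h, w = 2048, 1024, 4, 4            # input[2048, 1024, 4, 4]
out_c, in_c_per_group, kh, kw = 1024, 1, 4, 4  # filter[1024, 1, 4, 4]

# Depthwise: one filter per input channel, so groups equals the channel count.
groups = c // in_c_per_group

out_shape = (n, out_c, conv_out_size(h, kh), conv_out_size(w, kw))
```

Under these assumptions each 4x4 feature map collapses to a single value per channel, so the benchmarked case is a depthwise convolution whose kernel covers the whole input.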

[IPU] paddle-inference support custom-ops (#45235) (#46868) · bd89be12

Committed by Allen Guo on Oct 17, 2022

* paddle-inference support custom-ops

* fix tolower
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>

bd89be12

[Cherry-Pick]Move valid check from python to kernel (#46980) · 8bfd45ad

Committed by Zhang Zheng on Oct 17, 2022

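To improve performance, move the label bounds check from the Python side into the kernel, reducing extra op calls such as min, max, and synchronous copies.
    The current template parameter IgnoreIndex only takes effect when ignore_index lies in [0, dim). However, when a label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. Although the current logic does not produce wrong results, it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.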

8bfd45ad
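
The intended ordering of the two checks can be sketched in plain Python. This is not the Paddle kernel: the function name, the list-based inputs, and the choice of `ValueError` are assumptions made for illustration. The point it shows is that a label equal to `ignore_index` is skipped before any bounds check, so an out-of-range `ignore_index` still works, while any other out-of-range label is rejected inside the loop without a separate min/max pass.

```python
def nll_with_in_kernel_check(log_probs, labels, ignore_index=-100):
    """Per-sample negative log-likelihood with the bounds check done inline.

    log_probs: list of rows, each a list of `dim` log-probabilities.
    labels: one integer class label per row.
    """
    dim = len(log_probs[0])
    losses = []
    for row, label in zip(log_probs, labels):
        if label == ignore_index:
            # Ignored samples contribute zero loss, even if the label is out of range.
            losses.append(0.0)
            continue
        if not 0 <= label < dim:
            # Bounds check happens here instead of in a separate pre-pass.
            raise ValueError(f"label {label} is outside [0, {dim})")
        losses.append(-row[label])
    return losses


log_probs = [[-0.5, -1.0], [-2.0, -0.25]]
losses = nll_with_in_kernel_check(log_probs, [1, 7], ignore_index=7)
```

With `ignore_index=7` the second sample is skipped; with the default `ignore_index` the same label would raise.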

14 Oct, 2022 5 commits
- cherry-pick 46942 (#47015) · 82db4993
  Committed by Wilber on Oct 14, 2022
  
  82db4993
- Add bmm convert (#47011) · 8f1ac7cf
  Committed by xiaoxiaohehe001 on Oct 14, 2022
  
  8f1ac7cf
- [BUG]Fix expand_as_v2 bug while X and Y with different dtype (#46950) (#46999) · 4b472656
  Committed by Aurelius84 on Oct 14, 2022
```
* [BUG]Fix expand_as_v2 bug while X and Y with different dtype

* fix commit
```
  4b472656
- [cherry-pick 2.4][inference] fix reshape2 opteller (#46871) · 535d7574
  Committed by Zhang Jun on Oct 14, 2022
```
* fix reshape2 opteller;
add elementwise min/max register for tensorrt
```
  535d7574
- [Paddle-TRT] support new quant format from slim (#46022) (#46979) · b8677c0d
  Committed by zhoutianzi666 on Oct 14, 2022
  
  b8677c0d
13 Oct, 2022 3 commits

interpretercore thread not always spin (#46687) (#46952) · d90aaa6e
Committed by zhangbo9674 on Oct 13, 2022

d90aaa6e
[Cherry-pick] Add fp16 dtype support for set_value op (#46906) · 100a0750
Committed by 傅剑寒 on Oct 13, 2022
```
Fix set_value failure when the source tensor has fp16 dtype and the destination value is a number
(dev PR link:#46801)
```
100a0750

[cherry-pick] [PHI] transpose2_grad op migration (#46139) (#46873) · 0280c0b9

Committed by Sławomir Siwek on Oct 13, 2022

* Revert pool+grad oneDNN kernel conversion (#45989)

* [PHI] transpose2_grad op migration (#46139)

* op migrated, Copy(OneDNNContext, ...) added

* mutable_data & op registration in fluid removed

* refactoring

* OneDNNGetDataType to uppercase

* missing cpu check added, handler moved to .h file

* name changed to transpose_grad

* Copy changed back to TensorCopy

* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>

0280c0b9

12 Oct, 2022 1 commit
- [Cherry-pick]Update layout autotune for module with no modified (#46541) (#46515) (#46880) · 61273c0e
  Committed by niuliling123 on Oct 12, 2022
```
Cherry-pick 46541
Ensure layout autotuning works for the ResNet50, TSM, and deeplabv3 models with zero model modifications
```
  61273c0e
11 Oct, 2022 8 commits
- set_value_op: add support for complex types (#46885) · b051455f
  Committed by Feiyu Chan on Oct 11, 2022
  
  b051455f
- add seed check (#46858) · 2190da20
  Committed by Sławomir Siwek on Oct 11, 2022
  
  2190da20
- hard_swish grad (#46857) · 2c6bd4ad
  Committed by Sławomir Siwek on Oct 11, 2022
  
  2c6bd4ad
- [cherry-pick] [PHI] relu6_grad kernel (#46501) (#46862) · 2bcbf8b0
  Committed by Sławomir Siwek on Oct 11, 2022
```
* [PHI] Migrate gelu kernels (#45596)

* gaussian random

* mkldnn to onednn renaming

* fix merge conflicts

* remove fluid code

* onednn renaming

* gelu fwd

* sort activations

* gelu gradient

* remove unused macros

* merge conflicts

* fix merge conflicts

* remove extra contraint from gelu op

* [PHI] relu6_grad kernel (#46501)

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style
```
  2bcbf8b0
- Revert pool+grad oneDNN kernel conversion (#45989) (#46860) · 7b3837e6
  Committed by Sławomir Siwek on Oct 11, 2022
```
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
```
  7b3837e6
- speedup ChannelClipAndQuantDequantKernelQuantAxis1 kernel (#46471) (#46551) · f5565494
  Committed by ceci3 on Oct 11, 2022
  
  f5565494
- [BugFix]Fix concat bugs when call onednn kernel (#46518) (#46845) · 6a6c7493
  Committed by YuanRisheng on Oct 11, 2022
```
* fix concat bug

* fix ci bugs

* fix ci bugs
```
  6a6c7493
- optimize Paddle-TRT performance (#46684) · d091d1b0
  Committed by Yuanle Liu on Oct 11, 2022
  
  d091d1b0
10 Oct, 2022 5 commits

Fix gather op convert for Paddle-TensorRT (#46779) (#46825) · a0e03418
Committed by feng_shuai on Oct 10, 2022
```
* fix gather op convert to only support int32 index as input.
* add ut
```
a0e03418

[cherry-pick] [PHI] Migrate concat+grad, expand+grad, fill_constant … oneDNN... · fdd0d6d0

Committed by Sławomir Siwek on Oct 10, 2022

[cherry-pick] [PHI] Migrate concat+grad, expand+grad, fill_constant … oneDNN kernels (#45863) (#46727)

* [PHI] Migrate concat+grad, expand+grad, fill_constant, nearest_interp and bilinear_interp oneDNN kernels (#45863)

* Migrate concat+grad, expand+grad, fill_constant, nearest_interp_v2 and bilinear_interp_v2 oneDNN kernels to PHI

* Remove old namespace variable

* Fix invalid out dims error

* Add mutable_data method to concat output

* Add check for -1 dim before computing out_dims

* Capitalize oneDNNGetDataType function name

* Change fill_constant kernel to correct PHI kernel

* Attempt to fix dims error

* Fix fill_constant (full) kernel

* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>

fdd0d6d0

[cherry-pick] [PHI] Migrate sgd and stack oneDNN kernels (#46374) (#46729) · 25d61cd1

Committed by Sławomir Siwek on Oct 10, 2022

* [PHI] Migrate sgd and stack oneDNN kernels (#46374)

* Convert slice+grad oneDNN fluid kernels to PHI

* Change mutable_data to Alloc

* Refactor licences

* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>

25d61cd1

[PHI] Migrate slice, slice_grad, split, pad and pad3d oneDNN kernels (#46101) (#46726) · 51a91fee

Committed by Sławomir Siwek on Oct 10, 2022

* Convert split, pad and pad3d kernels

* Convert slice+grad oneDNN fluid kernels to PHI

* change out->mutable_data to dev_ctx.Alloc
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>

51a91fee

[PHI] migrate softmax_grad kernel (#46257) (#46725) · 44ecae6c
Committed by Sławomir Siwek on Oct 10, 2022
```
* init

* remove softmaxop

* merge dev

* correct dir

* style
```
44ecae6c

BaiXuePrincess / Paddle (in sync with the fork source project)
