Commits · 95c3d6135841d74211b1343679ee568588051d7b · PaddlePaddle / Paddle

09 Apr, 2023 1 commit

Cherry pick the support of bfloat16 for several operators. (#52608) · 95c3d613

Committed by Yiqun Liu on Apr 09, 2023

* Register exp/expm1/logit bf16 activation op kernels (#48702)

* register more bf16 ops

* update to register corresponding backward ops

* Addition of bf16 type support for Compare OP  (#46413)

* first commit

* clarify the quotes

* change code style format

* support bfloat16

* add bfloat16 support for more ops (#48272)

* [Bfloat16]register bfloat16 datatype for squared l2 norm (#50908)

* Sync the pull request #51903.

* Add some header files back.

* modify cmake file for cuda11.8 compile (#49020)

* modify cmake file for cuda11.8 compile

* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)

* Fix compiling error.

* Cherry-pick pull request #51396.

---------
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>

95c3d613

20 Mar, 2023 1 commit
- Cherry-pick fleet executor and auto parallel (#50071) · 92c2dcbd
  Committed by LiYuRio on Mar 20, 2023
  
  92c2dcbd
13 Jan, 2023 1 commit
- fix fc kernel diff (#49781) · 01c26ab2
  Committed by Yuanle Liu on Jan 13, 2023
```
* fix fc kernel diff

* disable fc_elementwise_layernorm_fuse_pass
```
  01c26ab2
12 Jan, 2023 1 commit
- fix_split_infermeta (#49745) · 8a934047
  Committed by xiaoxiaohehe001 on Jan 12, 2023
  
  8a934047
09 Jan, 2023 1 commit
- fix bugs of paddle.multiplex API (#49368) (#49642) · 6d2d8e50
  Committed by Haohongxiang on Jan 09, 2023
  
  6d2d8e50
04 Jan, 2023 1 commit

[Cherry-pick][Paddle Inference] fix mixed precision diff (#49477) · 1d25c663

Committed by Yuanle Liu on Jan 04, 2023

* disable scale op in amp pass

* Do not insert redundant cast op

* fix fused_fc_elementwise_layernorm kernel diff

* fix fc kernel diff

1d25c663

03 Jan, 2023 1 commit
- [Cherry pick] fix fold for big bs (#49491) · 2a438b0a
  Committed by xiaoting on Jan 03, 2023
```
* fix fold for large bs

* fix fold for large bs

* fix pre-commit
```
  2a438b0a
29 Dec, 2022 1 commit

[Cherry-pick]Move sum op to PHI && Fix MetaTensor's bug when run infermeta (#49342) · 8015fbd6

Committed by YuanRisheng on Dec 29, 2022

* cherry-pick 45860

* [BUG FIX]Fix MetaTensor's bug when run infermeta (#46265)

* fix sum bug

* fix ci bugs

* fix ci bugs

* update code according to comment

8015fbd6

29 Nov, 2022 1 commit

[cherry-pick] updating mul and matmul with set_mem_desc and fix... · 9e2ba9b9

Committed by yeliang2258 on Nov 29, 2022

[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)

* Fix slice bugs in MKLDNN when input dims are zeros (#46671)

* fix slice bugs

* fix

* update code

* fix

* update code

* updating mul and matmul with set_mem_desc (#45624)

* - mul & matmul changes

- fix

- bs16 correction of strides

* - cosmetic fixes

* - lint

* - fix

* - fix

* - format -> mem_desc

* - fix

* - fix

* - fix

* - fix

* - fix

* fix squeeze_transpose (#47911)
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

9e2ba9b9

25 Nov, 2022 1 commit
- Fix wrong eigen header include in data_type.h (#48157) (#48260) · a2f61fef
  Committed by zyfncg on Nov 25, 2022
```
* Fix wrong eigen header include

* fix compile bug
```
  a2f61fef
10 Nov, 2022 1 commit

【Cherry-pick PR47743】change cudnn error to cuda error if compiled cuda version... · 76b883c2

Committed by pangyoki on Nov 10, 2022

【Cherry-pick PR47743】change cudnn error to cuda error if compiled cuda version is incompatible with installed cuda version (#47744)

* cherry-pick pr47743

* fix

* fix

* fix

76b883c2

07 Nov, 2022 2 commits

【Cherry-pick PR47666】add cudnn error if compiled cudnn version is incompatible... · 764cea0c

Committed by pangyoki on Nov 07, 2022

【Cherry-pick PR47666】add cudnn error if compiled cudnn version is incompatible with installed cudnn version (#47673)

* Cherry-pick PR47666, add cudnn error (#47666)

* [CherryPick] Cherry pick #45916 #46031 #47299  (#47610)

* [ Dy2Static ] Fix bugs when select inputs meeting different shape or undefined-var (#45916)

* fix select_input with different shape errors:
1. select_input_with_buildin_type directly return non-undefinedvar branch when meeting undefined var
2. the output shape of select_input is inferred from inputs.

* reverse the logic in select_input

* [warning] added warning message in cond block when one branch returns variable and another returns None (#46031)

* [cherry-pick] Allow manually set py_reader name in standalone executor (#45898) (#45931)

* Allow manually set py_reader name in standalone executor

* [BugFix] while cond receives dict as input (#47299)

* fix bugs while cond receives dict as input

* add unittest

* change flatten -> _is_sequence_except_dict

* code format
Co-authored-by: feifei-111 <wuzhanfei@baidu.com>
Co-authored-by: xiongkun <xiongkun03@baidu.com>

764cea0c

Revert "SparseConv support duplicate coordinates (#44976)" (#45202) (#47699) · 7145db6e
Committed by zhangkaihuo on Nov 07, 2022
```
Revert SparseConv support duplicate coordinates
```
7145db6e

03 Nov, 2022 1 commit
- [Sparse] Unified api args name (#47529) (#47627) · 75088bbf
  Committed by zhangkaihuo on Nov 03, 2022
```
Unified api args name
```
  75088bbf
02 Nov, 2022 1 commit
- [geometric] Optimize graph sample speed (#47531) (#47548) · 7a1cf277
  Committed by Siming Dai on Nov 02, 2022
  
  7a1cf277
01 Nov, 2022 1 commit

[cherry-pick][code-gen] Support code-gen for opmaker of sparse op (#46993) (#47417) · 601626ac

Committed by zyfncg on Nov 01, 2022

* support generating code of opmaker for backward op invoke forward op (#46912)

* [code-gen] Support code-gen for opmaker of sparse op (#46993)

* support generating code of opmaker for backward op invoke forward op

* support code-gen of opmaker for sparse op

* refine logic of choosing phi kernel

* fix compile bug

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix compile bug of VarType

* fix compile bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op

* [Phi] Refactor logic of judging whether having a phi kernel (#46920)

* refine logic of choosing phi kernel

* fix compile bug

* update cmake

601626ac

28 Oct, 2022 1 commit
- [cherry-pick]add sync_batch_norm_bn and deliver indices_dict (#47407) · 0fa8309a
  Committed by zhangkaihuo on Oct 28, 2022
```
add sync_batch_norm_bn and deliver indices_dict 
```
  0fa8309a
27 Oct, 2022 1 commit
- [cherry-pick] add batch_norm_kernel (#47394) · b143e008
  Committed by zhangkaihuo on Oct 27, 2022
```
* cherry-pick #46359 and resolve conflict
```
  b143e008
26 Oct, 2022 2 commits
- Fix inference performance problem caused by selecting cudnn kernel of softmax (#47338) (#47367) · 0369cd0f
  Committed by zyfncg on Oct 26, 2022
```
* fix inference performance problem caused by selecting cudnn kernel for softmax

* recover use_cudnn in opmaker of softmax
```
  0369cd0f
- Added workaround for elementwise oneDNN kernel (#47080) (#47342) · 7c6550a6
  Committed by yeliang2258 on Oct 26, 2022
```
* return proper state

* fix for dims

* fix
Co-authored-by: jakpiase <jakpia21@gmail.com>
```
  7c6550a6
25 Oct, 2022 1 commit

[Sparse] Fix indices (#47190) (#47226) · 942ab42f

Committed by zhangkaihuo on Oct 25, 2022

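The sparse_dim of a SparseTensor cannot currently be obtained from the Tensor, so the shape of indices cannot be inferred precisely. For now the focus is on 3D point-cloud models: the input SparseTensor is 5D and each non-zero element is a 1-D vector, so indices has shape [4, -1].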
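
As a rough illustration of the `[4, -1]` shape described above, the following pure-Python sketch builds COO-style indices for a small 5D tensor whose last dimension is dense. The helpers `zeros` and `coo_indices` and the `[N, D, H, W, C]` layout are assumptions made for illustration; this is not Paddle's implementation.

```python
from itertools import product


def zeros(shape):
    """Build a nested-list tensor of zeros (hypothetical helper)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def coo_indices(dense, sparse_dim=4):
    """Return COO indices as sparse_dim rows with one column per non-zero point.

    Hypothetical helper: a position counts as non-zero when any value in its
    trailing dense vector is non-zero.
    """
    # Collect the sizes of the leading sparse dimensions.
    shape, node = [], dense
    for _ in range(sparse_dim):
        shape.append(len(node))
        node = node[0]
    rows = [[] for _ in range(sparse_dim)]
    for pos in product(*(range(s) for s in shape)):
        vec = dense
        for i in pos:
            vec = vec[i]
        if any(v != 0 for v in vec):
            for axis, i in enumerate(pos):
                rows[axis].append(i)
    return rows


# Assumed layout [N, D, H, W, C]: four sparse dimensions plus one dense channel vector.
dense = zeros([1, 2, 2, 2, 3])
dense[0][0][1][0] = [1.0, 2.0, 3.0]
dense[0][1][1][1] = [0.0, 4.0, 0.0]
indices = coo_indices(dense)
print(len(indices), len(indices[0]))  # 4 rows, one column per non-zero point
```

The first dimension of the result is fixed at 4 because four dimensions are sparse, while the second depends on how many points are non-zero, which is why it can only be written as -1 ahead of time.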

942ab42f

24 Oct, 2022 1 commit

Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7

Committed by Ghost Screaming on Oct 24, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

5c85f1a7
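
The reduce_sum fix in this commit concerns tensors with more than INT32_MAX elements. The sketch below only illustrates the general failure mode, assuming an element index held in a signed 32-bit integer; `wrap_int32` and the tensor size are hypothetical and not taken from the Paddle kernels.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Simulate storing a Python int in a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


numel = 3 * 2**30  # hypothetical tensor with more than INT32_MAX elements
last_index = numel - 1
print(wrap_int32(last_index))  # negative: the 32-bit index has wrapped around
print(last_index)              # a 64-bit index keeps the true value
```

Once an index wraps like this, a kernel reads or accumulates the wrong elements, which is consistent with the wrong results the commit message describes.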

21 Oct, 2022 1 commit
- Add infer prune function (#47047) · 8739497c
  Committed by JingZhuangzhuang on Oct 21, 2022
```
* Add infer prune function

* add fusion op
```
  8739497c
20 Oct, 2022 4 commits
- [Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729
  Committed by Yiqun Liu on Oct 20, 2022
```
* Simplify the codes of conv. (#45966)

* Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)
```
  c0ed8729
- Add value check & error message for gather_tree (#47051) (#47221) · 6712e262
  Committed by liu zhengxi on Oct 20, 2022
```
Add value check & error message for gather_tree
cherry-pick #47051
```
  6712e262
- [Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
  Committed by sneaxiy on Oct 20, 2022
```
Fix some operators when the tensor.numel() > INT32_MAX
```
  c74bf018
- [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
  da7d2f29
19 Oct, 2022 2 commits

[cherry-pick] strided_slice grad add fp16 support (#47159) · 23f2a4ea
Committed by Zhang Ting on Oct 19, 2022
```
* strided_slice grad add fp16 support
```
23f2a4ea

[CherryPick] Support TypeHint for function decorated by @to_static (#47147) · 247ef477

Committed by xiongkun on Oct 19, 2022

* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)

* Add TypeHint Transformer

* add unittest for typehint transformer

* [Dy2Static] Remove GradTransformer (#47063)

* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename

247ef477

18 Oct, 2022 2 commits
- [cherry-pick 2.4] add sparse api transpose/reshape/is_same_shape (#47076) · 5fef043d
  Committed by zhouweiwei2014 on Oct 18, 2022
```
Add three new APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
```
  5fef043d
- [Cherry pick] trt pool2d adaptive fix (#47069) · 5f6b9f1b
  Committed by Wang Bojun on Oct 18, 2022
```
* draft with debug print
* remove debug print
* bug fix for ci
```
  5f6b9f1b
17 Oct, 2022 3 commits

[cherry-pick]Sparse static graph (#46838) · 10225d22
Committed by zhangkaihuo on Oct 17, 2022
```
cherry-pick : #46322, #46245
Sparse API supports static graph
```
10225d22

Optimize performance of depthwise_conv (#46896) · 976af0da

Committed by Zhang Zheng on Oct 17, 2022

Optimize performance of depthwise_conv

Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1
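
For reference, the output shape implied by this benchmark configuration can be worked out with the standard convolution output-size formula. The sketch below assumes an NCHW layout and a channel multiplier of 1, which matches the filter shape `[1024, 1, 4, 4]`; `conv_out_size` is a hypothetical helper, not a Paddle function.

```python
def conv_out_size(size, kernel, stride=1, pad=0, dilation=1):
    """Standard convolution output-size formula for one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * pad - effective_kernel) // stride + 1


# Values from the config line above; depthwise conv keeps the channel count.
n, c, h, w = 2048, 1024, 4, 4
out_h = conv_out_size(h, 4, stride=1, pad=0, dilation=1)
out_w = conv_out_size(w, 4, stride=1, pad=0, dilation=1)
print([n, c, out_h, out_w])  # [2048, 1024, 1, 1]
```

With a 4x4 filter over a 4x4 input and no padding, each channel reduces to a single output value.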

976af0da

[Cherry-Pick]Move valid check from python to kernel (#46980) · 8bfd45ad

Committed by Zhang Zheng on Oct 17, 2022

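To improve performance, move the label bounds check from the Python side into the kernel, which avoids extra op calls such as min, max, and synchronous copies.
The current template parameter IgnoreIndex only takes effect when ignore_index lies in [0, dim). However, when a label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. The current logic does not produce wrong results, but it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.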
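
The idea can be sketched in plain Python, under the assumption that the loss in question is a per-sample negative log-likelihood over class probabilities. The function name `cross_entropy_with_check` and the default `ignore_index=-100` are illustrative only; the actual change lives in the CUDA kernel.

```python
import math


def cross_entropy_with_check(probs, labels, ignore_index=-100):
    """Per-sample negative log-likelihood with the bounds check done inline.

    Sketch only: a label equal to ignore_index is skipped before the range
    check, so an out-of-range ignore_index needs no special code path.
    """
    losses = []
    for row, label in zip(probs, labels):
        if label == ignore_index:
            losses.append(0.0)  # ignored sample contributes no loss
            continue
        if not 0 <= label < len(row):
            raise ValueError(f"label {label} out of range [0, {len(row)})")
        losses.append(-math.log(row[label]))
    return losses


probs = [[0.5, 0.5], [0.25, 0.75]]
losses = cross_entropy_with_check(probs, [1, -100])
print(losses)  # second sample is ignored even though -100 is outside [0, 2)
```

Checking each label where it is consumed means no separate min/max pass over the labels is needed before the loss is computed.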

8bfd45ad

13 Oct, 2022 2 commits

[Cherry-pick] Add fp16 dtype support for set_value op (#46906) · 100a0750
Committed by 傅剑寒 on Oct 13, 2022
```
Fix set_value failure when source tensor is fp16 Dtype and destination value is a number
(dev PR link: #46801)
```
100a0750

[cherry-pick] [PHI] transpose2_grad op migration (#46139) (#46873) · 0280c0b9

Committed by Sławomir Siwek on Oct 13, 2022

* Revert pool+grad oneDNN kernel conversion (#45989)

* [PHI] transpose2_grad op migration (#46139)

* op migrated, Copy(OneDNNContext, ...) added

* mutable_data & op registration in fluid removed

* refactoring

* OneDNNGetDataType to uppercase

* missing cpu check added, handler moved to .h file

* name changed to transpose_grad

* Copy changed back to TensorCopy

* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>

0280c0b9

12 Oct, 2022 1 commit
- [Cherry-pick]Update layout autotune for module with no modified (#46541) (#46515) (#46880) · 61273c0e
  Committed by niuliling123 on Oct 12, 2022
```
Cherry-pick 46541
Ensure layout autotune works with zero modifications for the ResNet50, TSM, and deeplabv3 models
```
  61273c0e
11 Oct, 2022 3 commits
- set_value_op: add support for complex types (#46885) · b051455f
  Committed by Feiyu Chan on Oct 11, 2022
  
  b051455f
- add seed check (#46858) · 2190da20
  Committed by Sławomir Siwek on Oct 11, 2022
  
  2190da20
- hard_swish grad (#46857) · 2c6bd4ad
  Committed by Sławomir Siwek on Oct 11, 2022
  
  2c6bd4ad

PaddlePaddle / Paddle synced successfully more than 1 year ago
