Commits · b720873d37afd9bde17b5e26c2acd42948b02802 · PaddlePaddle / Paddle

12 May, 2023 1 commit
- fix add_n kernel of large shape (#53751) · b720873d
  Committed by Leo Chen on May 12, 2023
  
  b720873d
21 Apr, 2023 1 commit
- [frl_train_eval] add bfloat16 dtype support of to_tensor, since numpy does not support bfloat16 (#53153) · 94e8fc78
  Committed by zhouweiwei2014 on Apr 21, 2023
  
  94e8fc78
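
For context on the bfloat16 entry above: NumPy has no native bfloat16 dtype, so one common workaround is to carry bfloat16 values as 16-bit integer patterns. The sketch below is not taken from this commit. It only illustrates, in pure Python, that a bfloat16 value is the upper 16 bits of an IEEE 754 float32. The helper names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round to nearest, ties to even).

    Hypothetical helper for illustration; x must fit in float32,
    otherwise struct.pack raises OverflowError.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # NaN: keep the sign and exponent, and force a non-zero mantissa bit.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add a bias so that dropping the low 16 bits rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(pattern: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (pattern & 0xFFFF) << 16))
    return value


if __name__ == "__main__":
    for sample in (1.0, -2.0, 3.140625, 0.1):
        pattern = float_to_bf16_bits(sample)
        print(sample, hex(pattern), bf16_bits_to_float(pattern))
```

Values such as 0.1 lose precision in the round trip because bfloat16 keeps only 7 explicit mantissa bits.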
11 Apr, 2023 1 commit

Cherry pick for fix of operator precision. (#52705) · d1e8b1e2

Committed by Yiqun Liu on Apr 11, 2023

* Fix scale kernel for low precision, cherry pick #50998.

* Fix the FP16 precision problem of add_n. (#50129)

* Change squared_l2_norm to reuse ReduceKernel, and register fp16 and bf16 kernel, which is cherry pick #48315.

* Cherry-pick the fix of MPTypeTrait in KP, which is implemented in #50993.

* Cherry-pick the multi-precision support of AdamW for bf16, #48041.

* Fix compiling error.

* Cherry-pick the fix of CubTensorReduceImpl for bfloat16 in #50993.

* Fix unittest.

---------
Co-authored-by: liuruyan <44316842+liuruyan@users.noreply.github.com>

d1e8b1e2

09 Apr, 2023 2 commits

Add bfloat16 support for several operators and apis. (#52696) · ba9a22db

Committed by Yiqun Liu on Apr 09, 2023

* Cherry-pick the register of bfloat16 for amp_kernel, pull request #45541.

* Cherry-pick the master_grad support of adamw, pull request #51141.

* add bf16 for some ops in static mode (#51582)

* Add bfloat16 support for some api in static mode.

* Fix codestyle.

* Revert the change of layer_function_generator.py.

---------
Co-authored-by: Shaojie WANG <wsjmessi@163.com>

ba9a22db

Cherry pick the support of bfloat16 for several operators. (#52608) · 95c3d613

Committed by Yiqun Liu on Apr 09, 2023

* Register exp/expm1/logit bf16 activation op kernels (#48702)

* register more bf16 ops

* update to register corresponding backward ops

* Addition of bf16 type support for Compare OP  (#46413)

* first commit

* clarify the quotes

* change code style format

* support bfloat16

* add bfloat16 support for more ops (#48272)

* [Bfloat16]register bfloat16 datatype for squared l2 norm (#50908)

* Sync the pull request #51903.

* Add some header files back.

* modify cmake file for cuda11.8 compile (#49020)

* modify cmake file for cuda11.8 compile

* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)

* Fix compiling error.

* Cherry-pick pull request #51396.

---------
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>

95c3d613

20 Mar, 2023 1 commit
- Cherry-pick fleet executor and auto parallel (#50071) · 92c2dcbd
  Committed by LiYuRio on Mar 20, 2023
  
  92c2dcbd
13 Jan, 2023 1 commit
- fix fc kernel diff (#49781) · 01c26ab2
  Committed by Yuanle Liu on Jan 13, 2023
```
* fix fc kernel diff

* disable fc_elementwise_layernorm_fuse_pass
```
  01c26ab2
09 Jan, 2023 1 commit
- fix bugs of paddle.multiplex API (#49368) (#49642) · 6d2d8e50
  Committed by Haohongxiang on Jan 09, 2023
  
  6d2d8e50
04 Jan, 2023 1 commit

[Cherry-pick][Paddle Inference] fix mixed precision diff (#49477) · 1d25c663

Committed by Yuanle Liu on Jan 04, 2023

* disable scale op in amp pass

* Do not insert redundant cast op

* fix fused_fc_elementwise_layernorm kernel diff

* fix fc kernel diff

1d25c663

03 Jan, 2023 1 commit
- [Cherry pick] fix fold for big bs (#49491) · 2a438b0a
  Committed by xiaoting on Jan 03, 2023
```
* fix fold for large bs

* fix fold for large bs

* fix pre-commit
```
  2a438b0a
29 Dec, 2022 1 commit

[Cherry-pick]Move sum op to PHI && Fix MetaTensor's bug when run infermeta (#49342) · 8015fbd6

Committed by YuanRisheng on Dec 29, 2022

* cherry-pick 45860

* [BUG FIX]Fix MetaTensor's bug when run infermeta (#46265)

* fix sum bug

* fix ci bugs

* fix ci bugs

* update code according to comment

8015fbd6

29 Nov, 2022 1 commit

[cherry-pick] updating mul and matmul with set_mem_desc and fix... · 9e2ba9b9

Committed by yeliang2258 on Nov 29, 2022

[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)

* Fix slice bugs in MKLDNN when input dims are zeros (#46671)

* fix slice bugs

* fix

* update code

* fix

* update code

* updating mul and matmul with set_mem_desc (#45624)

* - mul & matmul changes

- fix

- bs16 correction of strides

* - cosmetic fixes

* - lint

* - fix

* - fix

* - format -> mem_desc

* - fix

* - fix

* - fix

* - fix

* - fix

* fix squeeze_transpose (#47911)
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

9e2ba9b9

25 Nov, 2022 1 commit
- Fix wrong eigen header include in data_type.h (#48157) (#48260) · a2f61fef
  Committed by zyfncg on Nov 25, 2022
```
* Fix wrong eigen header include

* fix compile bug
```
  a2f61fef
07 Nov, 2022 1 commit
- Revert "SparseConv support duplicate coordinates (#44976)" (#45202) (#47699) · 7145db6e
  Committed by zhangkaihuo on Nov 07, 2022
```
Revert SparseConv support duplicate coordinates
```
  7145db6e
03 Nov, 2022 1 commit
- [Sparse] Unified api args name (#47529) (#47627) · 75088bbf
  Committed by zhangkaihuo on Nov 03, 2022
```
Unified api args name
```
  75088bbf
02 Nov, 2022 1 commit
- [geometric] Optimize graph sample speed (#47531) (#47548) · 7a1cf277
  Committed by Siming Dai on Nov 02, 2022
  
  7a1cf277
28 Oct, 2022 1 commit
- [cherry-pick]add sync_batch_norm_bn and deliver indices_dict (#47407) · 0fa8309a
  Committed by zhangkaihuo on Oct 28, 2022
```
add sync_batch_norm_bn and deliver indices_dict 
```
  0fa8309a
27 Oct, 2022 1 commit
- [cherry-pick] add batch_norm_kernel (#47394) · b143e008
  Committed by zhangkaihuo on Oct 27, 2022
```
* cherry-pick #46359 and resolve conflict
```
  b143e008
24 Oct, 2022 1 commit

Support BF16 training for sharding (#46846) (#47246) · 5c85f1a7

Committed by Ghost Screaming on Oct 24, 2022

* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

5c85f1a7

21 Oct, 2022 1 commit
- Add infer prune function (#47047) · 8739497c
  Committed by JingZhuangzhuang on Oct 21, 2022
```
* Add infer prune function

* add fusion op
```
  8739497c
20 Oct, 2022 4 commits
- [Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729
  Committed by Yiqun Liu on Oct 20, 2022
```
* Simplify the codes of conv. (#45966)

* Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)
```
  c0ed8729
- Add value check & error message for gather_tree (#47051) (#47221) · 6712e262
  Committed by liu zhengxi on Oct 20, 2022
```
Add value check & error message for gather_tree
cherry-pick #47051
```
  6712e262
- [Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
  Committed by sneaxiy on Oct 20, 2022
```
Fix some operators when the tensor.numel() > INT32_MAX
```
  c74bf018
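
The numel() > INT32_MAX fixes in the entry above concern element counts that no longer fit in a signed 32-bit index. The sketch below is an illustration only, not code from the commit. It emulates in Python the wraparound typically observed when a flat offset is computed in 32-bit arithmetic (in C++, signed overflow is formally undefined behavior) and compares it with the 64-bit result. All function names are hypothetical.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def flat_offset_int32(row: int, num_cols: int, col: int) -> int:
    """Flat offset as a kernel using 32-bit indices would typically see it."""
    return wrap_int32(wrap_int32(row * num_cols) + col)


def flat_offset_int64(row: int, num_cols: int, col: int) -> int:
    """Flat offset with wide indices (Python integers never overflow)."""
    return row * num_cols + col


if __name__ == "__main__":
    # A hypothetical 80000 x 40000 tensor holds 3.2e9 elements (> INT32_MAX).
    row, num_cols, col = 70000, 40000, 5
    print("32-bit offset:", flat_offset_int32(row, num_cols, col))
    print("64-bit offset:", flat_offset_int64(row, num_cols, col))
```

The 32-bit variant yields a negative offset for this input, which is the kind of silent corruption that switching index arithmetic to 64-bit avoids.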
- [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
  da7d2f29
19 Oct, 2022 2 commits

[cherry-pick] strided_slice grad add fp16 support (#47159) · 23f2a4ea
Committed by Zhang Ting on Oct 19, 2022
```
* strided_slice grad add fp16 support
```
23f2a4ea

[CherryPick] Support TypeHint for function decorated by @to_static (#47147) · 247ef477

Committed by xiongkun on Oct 19, 2022

* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)

* Add TypeHint Transformer

* add unittest for typehint transformer

* [Dy2Static] Remove GradTransformer (#47063)

* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename

247ef477

18 Oct, 2022 2 commits
- [cherry-pick 2.4] add sparse api transpose/reshape/is_same_shape (#47076) · 5fef043d
  Committed by zhouweiwei2014 on Oct 18, 2022
```
Add three APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
```
  5fef043d
- [Cherry pick] trt pool2d adaptive fix (#47069) · 5f6b9f1b
  Committed by Wang Bojun on Oct 18, 2022
```
* draft with debug print
* remove debug print
* bug fix for ci
```
  5f6b9f1b
17 Oct, 2022 3 commits

[cherry-pick]Sparse static graph (#46838) · 10225d22
Committed by zhangkaihuo on Oct 17, 2022
```
cherry-pick: #46322, #46245
Sparse API supports static graph
```
10225d22

Optimize performance of depthwise_conv (#46896) · 976af0da

Committed by Zhang Zheng on Oct 17, 2022

Optimize performance of depthwise_conv

Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1

976af0da

[Cherry-Pick]Move valid check from python to kernel (#46980) · 8bfd45ad

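Committed by Zhang Zheng on Oct 17, 2022

To improve performance, the label bounds check is moved from the Python side into the kernel, which removes extra op calls such as min, max, and synchronous copies.
    The current template parameter IgnoreIndex only takes effect when ignore_index lies in [0, dim). However, when a label is out of bounds and ignore_index equals that label, the computation should still proceed normally. Although the current logic does not produce wrong results, it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.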

8bfd45ad
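
The check order described in the entry above can be sketched in plain Python. This is an assumption-laden illustration, not the kernel from the commit: the function name `cross_entropy_with_ignore` and the list-based inputs are hypothetical. The point it shows is that the label is compared with `ignore_index` first, so an out-of-range label that equals `ignore_index` is skipped normally, and the bounds check happens inside the per-sample loop instead of in separate min/max passes.

```python
import math


def cross_entropy_with_ignore(logits, labels, ignore_index=-100):
    """Per-sample softmax cross entropy with an in-loop label bounds check.

    Hypothetical sketch: `logits` is a list of rows, `labels` a list of ints.
    """
    losses = []
    for row, label in zip(logits, labels):
        # Ignored labels contribute zero loss, even if they are out of range.
        if label == ignore_index:
            losses.append(0.0)
            continue
        # Bounds check done per element, with no separate min/max pass.
        if not 0 <= label < len(row):
            raise ValueError(
                f"label {label} is outside [0, {len(row)}) and is not ignore_index"
            )
        # Numerically stable log-sum-exp.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        losses.append(log_sum_exp - row[label])
    return losses


if __name__ == "__main__":
    # Label 255 is out of range for two classes, but it equals ignore_index.
    print(cross_entropy_with_ignore([[0.0, 0.0], [1.0, 2.0]], [0, 255], ignore_index=255))
```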

13 Oct, 2022 1 commit

[cherry-pick] [PHI] transpose2_grad op migration (#46139) (#46873) · 0280c0b9

Committed by Sławomir Siwek on Oct 13, 2022

* Revert pool+grad oneDNN kernel conversion (#45989)

* [PHI] transpose2_grad op migration (#46139)

* op migrated, Copy(OneDNNContext, ...) added

* mutable_data & op registration in fluid removed

* refactoring

* OneDNNGetDataType to uppercase

* missing cpu check added, handler moved to .h file

* name changed to transpose_grad

* Copy changed back to TensorCopy

* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>

0280c0b9

11 Oct, 2022 6 commits
- set_value_op: add support for complex types (#46885) · b051455f
  Committed by Feiyu Chan on Oct 11, 2022
  
  b051455f
- add seed check (#46858) · 2190da20
  Committed by Sławomir Siwek on Oct 11, 2022
  
  2190da20
- hard_swish grad (#46857) · 2c6bd4ad
  Committed by Sławomir Siwek on Oct 11, 2022
  
  2c6bd4ad
- [cherry-pick] [PHI] relu6_grad kernel (#46501) (#46862) · 2bcbf8b0
  Committed by Sławomir Siwek on Oct 11, 2022
```
* [PHI] Migrate gelu kernels (#45596)

* gaussian random

* mkldnn to onednn renaming

* fix merge conflicts

* remove fluid code

* onednn renaming

* gelu fwd

* sort activations

* gelu gradient

* remove unused macros

* merge conflicts

* fix merge conflicts

* remove extra constraint from gelu op

* [PHI] relu6_grad kernel (#46501)

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style
```
  2bcbf8b0
- Revert pool+grad oneDNN kernel conversion (#45989) (#46860) · 7b3837e6
  Committed by Sławomir Siwek on Oct 11, 2022
```
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
```
  7b3837e6
- [BugFix]Fix concat bugs when call onednn kernel (#46518) (#46845) · 6a6c7493
  Committed by YuanRisheng on Oct 11, 2022
```
* fix concat bug

* fix ci bugs

* fix ci bugs
```
  6a6c7493
10 Oct, 2022 2 commits

[cherry-pick] [PHI] Migrate concat+grad, expand+grad, fill_constant … oneDNN... · fdd0d6d0

Committed by Sławomir Siwek on Oct 10, 2022

[cherry-pick] [PHI] Migrate concat+grad, expand+grad, fill_constant … oneDNN kernels (#45863) (#46727)

* [PHI] Migrate concat+grad, expand+grad, fill_constant, nearest_interp and bilinear_interp oneDNN kernels (#45863)

* Migrate concat+grad, expand+grad, fill_constant, nearest_interp_v2 and bilinear_interp_v2 oneDNN kernels to PHI

* Remove old namespace variable

* Fix invalid out dims error

* Add mutable_data method to concat output

* Add check for -1 dim before computing out_dims

* Capitalize oneDNNGetDataType function name

* Change fill_constant kernel to correct PHI kernel

* Attempt to fix dims error

* Fix fill_constant (full) kernel

* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>

fdd0d6d0

[cherry-pick] [PHI] Migrate sgd and stack oneDNN kernels (#46374) (#46729) · 25d61cd1

Committed by Sławomir Siwek on Oct 10, 2022

* [PHI] Migrate sgd and stack oneDNN kernels (#46374)

* Convert slice+grad oneDNN fluid kernels to PHI

* Change mutable_data to Alloc

* Refactor licences

* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>

25d61cd1

PaddlePaddle / Paddle · synced successfully more than 1 year ago