Commits · 73473ac26c03395cfaaa3087c4fa82ad318266e5 · PaddlePaddle / Paddle

09 Apr, 2023 1 commit
- register bf16 for c ops (#52641) · 73473ac2
  shaojie_wang authored Apr 08, 2023
07 Apr, 2023 1 commit
- modify cmake file for cuda11.8 compile (#49020) (#52481) · 9431bae1
  zhaoyingli authored Apr 07, 2023
30 Mar, 2023 1 commit
- use int64 for c split (#52279) · 964497b5
  Yuang Liu authored Mar 30, 2023
20 Mar, 2023 1 commit
- Cherry-pick fleet executor and auto parallel (#50071) · 92c2dcbd
  LiYuRio authored Mar 20, 2023
17 Feb, 2023 1 commit
- Add rpc ops to fetch data from remote service (#50220) · 9025fddd
  Wen Sun authored Feb 17, 2023
13 Jan, 2023 1 commit
- fix fc kernel diff (#49781) · 01c26ab2
  Yuanle Liu authored Jan 13, 2023
```
* fix fc kernel diff

* disable fc_elementwise_layernorm_fuse_pass
```
04 Jan, 2023 1 commit

[Cherry-pick][Paddle Inference] fix mixed precision diff (#49477) · 1d25c663

Yuanle Liu authored Jan 04, 2023

* disable scale op in amp pass

* Do not insert redundant cast op

* fix fused_fc_elementwise_layernorm kernel diff

* fix fc kerenl diff

30 Dec, 2022 1 commit

[MLU] cherry-pick from develop to release/2.4 (#48313) · 6e154fc6

Chenxiao Niu authored Dec 30, 2022

* [MLU] fix compute error of dropout op (#45923)

* [MLU] add mergedAdam kernel. (#45965)

* [MLU] add int64 support for mlu one_hot_v2 (#46313)

* [MLU] fix profiler compile failure (#46208)

* [MLU] add barrier_op kernel. (#46417)

* [MLU] fluid: add mluop (#46429)

* [MLU] add huber_loss kernel. (#46455)

* [MLU] add mlu kernel for add_reduce_max_grad (#45651)
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>

* [MLU] add_fluid_mluop_yolo_box (#46573)

* [MLU] fix phi::Tensor compile error of mlu. (#46649)

* [MLU] add fluid MLUOps prior_box (#46585)

* [MLU] fix cmake error (#46772)

* [MLU]fix unittest of sync_bn (#46797)

* [MLU] add masterparam support for mlu adamw. (#46804)

* [MLU] add int64 support for allgather. (#46830)

* [MLU] fix compile error & add mlu blacklist function. (#47439)

* [MLU] fix softmax_with_cross_entropy failed in 370-X8.

* [MLU] fix cncl stuck caused by multiple initializations.

* [MLU] fix code style check.
Co-authored-by: qipengh <huangqipeng@cambricon.com>
Co-authored-by: cifar10 <41565156+cifar10@users.noreply.github.com>
Co-authored-by: Lux et Veritas <1004239791@qq.com>
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
Co-authored-by: ronnywang <ronny1996@163.com>

29 Dec, 2022 1 commit

[Cherry-pick]Move sum op to PHI && Fix MetaTensor's bug when run infermeta (#49342) · 8015fbd6

YuanRisheng authored Dec 29, 2022

* cherry-pick 45860

* [BUG FIX]Fix MetaTensor's bug when run infermeta (#46265)

* fix sum bug

* fix ci bugs

* fix ci bugs

* update code according comment

20 Dec, 2022 1 commit
- Fix nullptr to TestFuseGemmEpilogueReluBWDFP* (#48997) (#49090) · cdab3a44
  ShenLiang authored Dec 20, 2022
```
Co-authored-by: Ming-Xu Huang <mingh@nvidia.com>
```
29 Nov, 2022 1 commit

[cherry-pick] updating mul and matmul with set_mem_desc and fix... · 9e2ba9b9

yeliang2258 authored Nov 29, 2022

[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)

* Fix slice bugs in MKLDNN when input dims are zeros (#46671)

* fix slice bugs

* fix

* update code

* fix

* update code

* updating mul and matmul with set_mem_desc (#45624)

* - mul & matmul changes

- fix

- bs16 correction of strides

* - cosmetic fixes

* - lint

* - fix

* - fix

* - format -> mem_desc

* - fix

* - fix

* - fix

* - fix

* - fix

* fix squueze_transpose (#47911)
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

28 Nov, 2022 1 commit

Cherrypick NV fixes to release/2.4 (#48263) · 7a0b8625

zlsh80826 authored Nov 28, 2022

* Reduce squeeze2_matmul_fuse_pass, flattent tests time (#47098)

* Add missing fp32 config and reduce the testing combination

* Reduce trt matmul pass test max examples

* Loose TRT fp16 tests tolerance (#47100)

* Loose TRT half test tolerance to 1e-3 (#47101)

* Loose TRT half test tolerance to 1e-3 (#47106)

* Update distributed_strategy.proto (#46531)

* Close popen pipe after used (#47053)

* Add launch_bounds (#47285)

* Fix TRT UT failures (#47488)

* Format cherry-picked commits

* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)

* Skip tests that use fused_ops on H100

* Add error message to FusedOps on H100
Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>

11 Nov, 2022 1 commit
- Fix slice bugs in MKLDNN when input dims are zeros (#46671) (#47887) · 5033b6c2
  yeliang2258 authored Nov 11, 2022
```
* fix slice bugs

* fix

* update code

* fix

* update code
```
09 Nov, 2022 1 commit
- [cherry-pick] Squeeze2 and transpose2 fuse using oneDNN(#47712) · ea5f44b8
  Hui Zhang authored Nov 09, 2022
```
* suqeeze2 + transpose2 fuse onednn cherrypick 2.4

* format

* fix merge
```
08 Nov, 2022 1 commit

[CHERRY-PICK] Added caching to oneDNN FC and op+unsqueeze2 and op+reshape2 fuse passes (#47690) · d0e19af3

jakpiase authored Nov 08, 2022

* fc cherrypick

* another files added

* added transpose cherrypick

* reverter somebodys fc changes

* minor fix

* minor fix

* cherry-pick of fc+act changes

* minor fix

* fix

07 Nov, 2022 1 commit

[cherry-pick2.4]docs fix (#47669) · cf668ab3

Ligoml authored Nov 07, 2022

* #46165

* #45752

* fix some doc bug test=document_fix (#45488)

* fix some doc bug test=document_fix

* fix some docs issues, test=document_fix

* beta -> \beta in softplus

* threshold -> \varepsilon in softplus

* parameter name

* delta -> \delta in smooth_l1_loss

* fix some docs test=document_fix

* fix docs test=document_fix

* fix docs && add blank lines test=document_fix

* Update python/paddle/nn/functional/activation.py, test=document_fix

* Update python/paddle/nn/layer/activation.py, test=document_fix
Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [docs] add ipustrategy Hyperlink (#46422)

* [docs] add ipustrategy Hyperlink

* fix ipu_shard_guard docs; test=document_fix

* [docs] add set_ipu_shard note

* [docs] fix hyperlink

* update framework.py

* fix mlu_places docs; test=document_fix

* fix put_along_axis docs; test=document_fix

* fix flake8 W293 error, test=document_fix

* fix typo in typing, test=document_fix
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>

* #46659

* Update README_cn.md (#46927)

Fixed typos

* #46738

* fix paddle.get_default_dtype (#47040)

Chinese and English return values are inconsistent

* fix bug
Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
Co-authored-by: Infinity_lee <luhputu0815@gmail.com>
Co-authored-by: mrcangye <chenloong@88.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>
Co-authored-by: Hamid Zare <12127420+hamidzr@users.noreply.github.com>
Co-authored-by: Sqhttwl <61459740+Sqhttwl@users.noreply.github.com>
Co-authored-by: OccupyMars2025 <31559413+OccupyMars2025@users.noreply.github.com>
Co-authored-by: 超级码牛 <54444805+SuperCodebull@users.noreply.github.com>
Co-authored-by: jzhang533 <jzhang533@gmail.com>

03 Nov, 2022 1 commit
- FC/matmul(v2) + scale fuse pass (#47420) · 99c872fa
  Sławomir Siwek authored Nov 03, 2022
01 Nov, 2022 1 commit

[cherry-pick][code-gen] Support code-gen for opmaker of sparse op (#46993) (#47417) · 601626ac

zyfncg authored Nov 01, 2022

* support generating code of opmaker for backward op invoke forward op (#46912)

* [code-gen] Support code-gen for opmaker of sparse op (#46993)

* support generating code of opmaker for backward op invoke forward op

* gsupport code-gen of opmaker for sparse op

* refind logic of choose phi kernrel

* fix complie budg

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix complie bug of VarType

* fix complie bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op

* [Phi] Refactor logic of judging whether having a phi kernrel (#46920)

* refind logic of choose phi kernrel

* fix complie budg

* update cmake

28 Oct, 2022 2 commits

[Cherry-pick][JIT] Add Predictor for JITLayer (#47379) (#47419) · c42929c5

Aurelius84 authored Oct 28, 2022

* [JIT] Add Predictor for JITLayer (#47379)

* add predictor_engine

* add predictor_engine

* fix zero shape

* fix lodTensor

* fix unittest

* fix code style

* update CmakeList

* fix new executor

[cherry-pick]add sync_batch_norm_bn and deliver indices_dict (#47407) · 0fa8309a
zhangkaihuo authored Oct 28, 2022
```
add sync_batch_norm_bn and deliver indices_dict
```

27 Oct, 2022 1 commit
- [cherry-pick] add batch_norm_kernel (#47394) · b143e008
  zhangkaihuo authored Oct 27, 2022
```
* cherry-pick #46359 and resolve conflict
```
26 Oct, 2022 3 commits

Fix inference performance problem caused by selecting cudnn kernel of softmax (#47338) (#47367) · 0369cd0f
zyfncg authored Oct 26, 2022
```
* fix inference perfermence problem caused by selecting cudnn kernel for softmax

* recover use_cudnn in opmaker of softmax
```

Added workaround for elementwise oneDNN kernel (#47080) (#47342) · 7c6550a6
yeliang2258 authored Oct 26, 2022
```
* return proper state

* fix for dims

* fix
Co-authored-by: jakpiase <jakpia21@gmail.com>
```

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and... · 9a6dd8f8

sneaxiy authored Oct 26, 2022

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and fused_feedforward ops (#47235)

* fix fused_attention fused_feedforward

* fix ci

* fix ci

* fix ci PADDLE_GET_CONST

* fix ci ut

21 Oct, 2022 1 commit
- Add infer prune function (#47047) · 8739497c
  JingZhuangzhuang authored Oct 21, 2022
```
* Add infer prune function

* add fusion op
```
20 Oct, 2022 4 commits
- [Cherry-pick] Simplify conv codes and fix cache and autotune bugs. (#47197) · c0ed8729
  Yiqun Liu authored Oct 20, 2022
```
* Simplify the codes of conv. (#45966)

* Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)
```
- [Paddle-TRT][Cherry-Pick]Rewrite strided_slice converter using shape tensor (#47153) · 68c4ac31
  zhoutianzi666 authored Oct 20, 2022
```
* stride_to_24

* fix CI failing
```
- [Cherry-pick][Release/2.4] Fix some operators when the tensor.numel() > INT32_MAX (#47191) · c74bf018
  sneaxiy authored Oct 20, 2022
```
Fix some operators when the tensor.numel() > INT32_MAX
```
- [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  sneaxiy authored Oct 20, 2022
```
support pure bfloat16 for more ops
```
19 Oct, 2022 2 commits
- [Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) (#47134) · 69515e90
  WangZhen authored Oct 19, 2022
```
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
```
- [ cherrypick] Construct exec and ctx only once in cond op to speed up (#47012) · fcb9c0b5
  Hui Zhang authored Oct 19, 2022
```
Construct exec and ctx only once in cond op to speed up
```
18 Oct, 2022 2 commits
- support shape tensor is the input of trt-subgraph (#47066) · 5a44c124
  zhoutianzi666 authored Oct 18, 2022
- [cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90
  Haohongxiang authored Oct 18, 2022
```
* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)

* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)

* update
```
17 Oct, 2022 1 commit
- [cherry-pick]Sparse static graph (#46838) · 10225d22
  zhangkaihuo authored Oct 17, 2022
```
cherry-pick : #46322, #46245
Sparse API supports static graph
```
14 Oct, 2022 1 commit
- [BUG]Fix expand_as_v2 bug while X and Y with different dtype (#46950) (#46999) · 4b472656
  Aurelius84 authored Oct 14, 2022
```
* [BUG]Fix expand_as_v2 bug while X and Y with different dtype

* fix commit
```
13 Oct, 2022 2 commits

[Cherry-pick] Add fp16 dtype support for set_value op (#46906) · 100a0750
傅剑寒 authored Oct 13, 2022
```
Fix set_value failure when source tensor is fp16 Dtype and destiny value is a number
(dev PR link:#46801)
```

[cherry-pick] [PHI] transpose2_grad op migration (#46139) (#46873) · 0280c0b9

Sławomir Siwek authored Oct 13, 2022

* Revert pool+grad oneDNN kernel conversion (#45989)

* [PHI] transpose2_grad op migration (#46139)

* op migrated, Copy(OneDNNContext, ...) added

* mutable_data & op registration in fluid removed

* refactoring

* OneDNNGetDataType to uppercase

* missing cpu check added, handler moved to .h file

* name changed to transpose_grad

* Copy changed back to TensorCopy

* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>

11 Oct, 2022 3 commits

[cherry-pick] [PHI] relu6_grad kernel (#46501) (#46862) · 2bcbf8b0

Sławomir Siwek authored Oct 11, 2022

* [PHI] Migrate gelu kernels (#45596)

* gaussian random

* mkldnn to onednn renaming

* fix merge conflicts

* remove fluid code

* onednn renaming

* gelu fwd

* sort activations

* gelu gradient

* remove unused macros

* merge conflicts

* fix merge conflicts

* remove extra contraint from gelu op

* [PHI] relu6_grad kernel (#46501)

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style

Revert pool+grad oneDNN kernel conversion (#45989) (#46860) · 7b3837e6
Sławomir Siwek authored Oct 11, 2022
```
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
```

speedup ChannelClipAndQuantDequantKernelQuantAxis1 kernel (#46471) (#46551) · f5565494
ceci3 authored Oct 11, 2022

PaddlePaddle / Paddle synced successfully more than 1 year ago
