Commits · f8681ffcf6eee4bb1d548507a1fda84ec7a75271 · PaddlePaddle / Paddle

21 Jun 2022 · 3 commits
- Correct elementwise quantization (#43693) · f8681ffc
  Committed by joanna.wozna.intel on Jun 21, 2022
- fix for quant model (#43567) · ae7192a8
  Committed by jakpiase on Jun 16, 2022
- fix compile fail in cuda11.6 (#43588) · e1604f9e
  Committed by zhoutianzi666 on Jun 21, 2022
20 Jun 2022 · 1 commit
- [Cherry pick] Einsum memory optimization PR #43397 (#43554) · 638b69dc
  Committed by xiongkun on Jun 20, 2022
```
* cherry pick from #43397

* fix code
```
17 Jun 2022 · 2 commits

- Export symbols of phi operator library (#43478) · 68ed3b86
  Committed by weishengying on Jun 17, 2022

- [cherry-pick 2.3] Cherry parallel fused transformer api (#43505) · 19b87aec
  Committed by WangXi on Jun 17, 2022

* Rename dropout is test (#43098)

* replace dropout_is_test with is_test.
* improve atol on a100.

* fused_attention fused_feedforward api support Model Tensor Parallel (#42985)

* fix is_test bug in fused_feedforward. (#43508)
Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>

15 Jun 2022 · 1 commit
- [cherry-pick] Fix bug of strided_slice and slice (#43388, #43443) (#43432) · 7e940b84
  Committed by zyfncg on Jun 15, 2022
```
* fix bug of strided_slice (#43388)

* fix stride_slice bug

* fix bug

* fix bug of infer shape for slice (#43443)
```
14 Jun 2022 · 1 commit

- [ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92
  Committed by xiongkun on Jun 14, 2022

* [EinsumOp] Polish forward logic and backward logic for optimize (#42603)

* change logic for optimize

* modifty

* merge

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)

* [EinsumOp] Make EinsumOp support bfloat16. (#43085)

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0

* make EInsumOP support bf16

* add unittest for BF16

* add condition for test_BF16

* fix bugs

* fix

* change the backward api to fit einsum op

09 Jun 2022 · 1 commit
- disable lite gpu (#43178) · 36980306
  Committed by zhupengyang on Jun 09, 2022
08 Jun 2022 · 3 commits

- Replace ReduceAmax/Amax.part.cu with KP (#43202) (#43263) · e161979e
  Committed by niuliling123 on Jun 08, 2022

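The reduce amax/amin and frobenius_norm kernels were originally implemented with Eigen, which made these files slow to compile, so this PR replaces them with KP implementations.
It also removes duplicated functionality from DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad op.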

- Del whl check for release/2.3 (#43288) · 8f127681
  Committed by tianshuo78520a on Jun 08, 2022
```
Remove the whl package size comparison for release 2.3.
```
- Resolve protobuf of ORT Backend conflict (#43275) · c2804390
  Committed by heliqi on Jun 07, 2022
```
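Resolve version conflicts between the protobuf that the ONNX Runtime backend depends on and the protobuf used by the framework or supplied externally.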
```

07 Jun 2022 · 1 commit
- [cherry-pick]Delete ElementwiseKernel in BroadcastKernel (#42779) (#43210) · 52ef8656
  Committed by niuliling123 on Jun 07, 2022
```
Delete ElementwiseKernel in BroadcastKernel
Remove duplicated function calls across all Broadcast paths, reducing both compile time and object file size.
```
06 Jun 2022 · 1 commit

- cherry-pick 42645 (#43205) · 835a1888
  Committed by niuliling123 on Jun 06, 2022

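Remove the per-rank template instantiation and the Elementwise call from the Broadcast function to reduce compile time.
Adapted from PR #42645 on the develop branch. The develop and release branches have diverged too far for a direct cherry-pick, so the PR was resubmitted against release/2.3.
Instantiating Broadcast for every rank causes heavy template expansion in the underlying code, which made reduce_sum_grad_kernel.cu.o excessively large. This change reduces both the .o size and the compile time.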

31 May 2022 · 1 commit

- Del check size (#43113) · 40a7e0ad
  Committed by tianshuo78520a on May 31, 2022

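Remove the checks on the build directory size and the inference library size. These checks compare against develop, which always differs from the release branch, so they are disabled in release jobs.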

30 May 2022 · 2 commits
- [Dy2St]Fix cond_block_grad error when handle no need grad vras (#43034) (#43084) · e6e85b35
  Committed by WangZhen on May 30, 2022
```
* Fix cond_block_grad error when handle no need grad vras

* Add comment and UT
```
- [Paddle-Inference] fix_multiheadpass_int8 (#43020) · 72880279
  Committed by Wangzheee on May 30, 2022
```
* fix_multi_int8 (#42977)

* cherry-pick fix_multihead_int8
```
27 May 2022 · 2 commits
- test=document_fix · aedd4592
  Committed by tianshuo78520a on May 27, 2022
- test=document_fix · 67da108a
  Committed by tianshuo78520a on May 27, 2022
26 May 2022 · 1 commit
- polish kernel type str (#42791) (#42931) · b5766fbf
  Committed by Chen Weihang on May 26, 2022
23 May 2022 · 1 commit
- 【CI】run all demo ci before exit in windows (#42700) (#42897) · 2300d45f
  Committed by Sing_chan on May 23, 2022
```
cherry-pick PR #42700
```
17 May 2022 · 1 commit
- fix trace op record event error (#42775) (#42789) · af79273d
  Committed by Chen Weihang on May 17, 2022
11 May 2022 · 1 commit
- [Eager]Fix EagerTensor _copy_to memory overlap problem (#42668) (#42686) · d0e733dd
  Committed by Aurelius84 on May 11, 2022
10 May 2022 · 4 commits
- pdnode_compare (#42597) (#42633) · 403b503f
  Committed by JingZhuangzhuang on May 10, 2022
```
* pdnode_compare

* panode compare

* pdnode_compare
```
- [cherry-pick][MLU] support add callback to stream and profiler (#42115) · 25124d7f
  Committed by fwenguang on May 10, 2022
```
* [MLU] add mlu new profiler (#41138)

* [MLU] add mlu new profiler

* fix format

* [MLU] support add callback to stream (#41831)

* [MLU] add gather mlu kernel (#41969)

* [MLU] add mlu activation kernels (#41751)
```
- set custom_nll_loss_op attr ignoreIndex to str (#42596) · 6c935e1d
  Committed by Allen Guo on May 10, 2022
```
set attr ignoreIndex type to string for custom_nllloss_op

Partial cherry-pick of #42534
```
- fix bug of optional_tensor in amp logic (#42561) (#42577) · 37715dab
  Committed by zhangbo9674 on May 10, 2022
09 May 2022 · 1 commit

- [Cherry-pick][IPU] merge recent changes (#42078) (#42582) · 1f9b60df
  Committed by Allen Guo on May 09, 2022

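* add class NameScopeHelper for adding namescope info
* Add mappings for more kinds of optimizer state
* Add a compilation_progress_logger option to IpuStrategy for printing compilation progress
* Some code cleanup and miscellaneous optimizations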

07 May 2022 · 2 commits
- Reduce the number of threads per block of deformable_psroi_pooling to solve... · 44271ece
  Committed by FlyingQianMM on May 07, 2022
```
Reduce the number of threads per block of deformable_psroi_pooling to solve the bug where too many resources requested for launch (PaddlePaddle#42531) (#42533)
```
- [cherry-pick] Fix UT timeout problem for cuda_managed_memory_test and test_tensordot (#42492) · c9d156b1
  Committed by Ruibiao Chen on May 07, 2022
```
* Reduce time variation for cuda_managed_memory_test (#42458)

* Disable standalone executor for test_tensordot (#42476)
```
06 May 2022 · 1 commit

- Fix the race condition in cumsum operator (#42205) (#42500) · 58f40144
  Committed by wawltor on May 06, 2022

* Fix the race condition in cumsum operator

* Optimize cumsum operator
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>

05 May 2022 · 2 commits
- fix bugs (#42495) · 590b4dbc
  Committed by xiongkun on May 05, 2022
- fix the v100 cuda11.2 matmul_v2 and elementwise_div bug (#42479) · e052fde7
  Committed by wawltor on May 05, 2022
04 May 2022 · 4 commits

- graph partition (#42472) · a3917625
  Committed by seemingwang on May 04, 2022

* enable graph-engine to return all id (#42319)

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* change sample result's structure to fit training (#42426)

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* add actual_val

* change vlog

* fix bug

* bug fix

* bug fix

* fix display test

* singleton of gpu_graph_wrapper

* change sample result's structure to fit training

* recover sample code

* fix

* secondary sample

* add graph partition

* fix pybind
Co-authored-by: DesmonDay <908660116@qq.com>

- [cherry-pick 2.3] fix bug of batch_norm_grad kernel with fp16 (#42461) · a5745864
  Committed by XiaoguangHu on May 04, 2022
```
* fix bug of batch_norm_grad kernel with fp16

* format code
```
- fix paddle-ort python bug (#42464) (#42470) · 87e6149c
  Committed by heliqi on May 04, 2022
```
* fix paddle-ort python bug

* fix paddle-ort python bug
```
- fix bug when compiling with cusparse in CUDA version >=11.4 (#42456) · b57c132a
  Committed by XiaoguangHu on May 04, 2022

02 May 2022 · 1 commit
- [Cherry-Pick]Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2 (#42406) · 655c4981
  Committed by Zhang Zheng on May 02, 2022
```
* Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2

* no throw in V100 for some cases
```
01 May 2022 · 1 commit
- remove useless lod copy (#42425) · 778ec77b
  Committed by Chen Weihang on May 01, 2022
30 Apr 2022 · 1 commit
- [Eager] Support test_diff_op switch to eager mode (#42360) (#42392) · 1e3d2e4a
  Committed by Weilong Wu on Apr 30, 2022
