Commits · ae1e71b3cb856e6dc7c76dd1ee38b038e09d137b · 机器未来 / Paddle

Jan 19, 2022 · 2 commits
- [hybrid] Fix out of memory bug (#39009) · 01222f52
  Committed by wuhuachaocoding on Jan 19, 2022
  
  01222f52
- Add conv2d_transpose and conv2d_transpose_grad for XPU,test=kunlun (#38956) · c7de7440
  Committed by zhangyikun02 on Jan 19, 2022
  
  c7de7440
Jan 18, 2022 · 7 commits

Mish FP32/BF16 kernel, conv and fc fuse passes (#38623) · 1d18bc2c

Committed by Sławomir Siwek on Jan 18, 2022

* Mish

* Change exp() library

* mish fuse pass

* mish attrs

* fixes

* mishop maker

* remove attrs

* mish kernel for bf16

* fc+mish fuse

* fix code format error

* Resolve merge conflicts

* Update mish operator version

* update mish variable to new naming convention

1d18bc2c

change CUDA implementation of uniform/gaussian OP (#38611) · bbbd75e4
Committed by zhouweiwei2014 on Jan 18, 2022
```
* change CUDA implementation of uniform/gaussian OP

* fix unittest
```
bbbd75e4

add the uva function for the Tensor (#38950) · bfacd706

Committed by wawltor on Jan 18, 2022

* add the uva api for the tensor

* fix the compiler problem for the uva

* fix the example for the _uva

* fix the compile problem in the pten library

* update the environment support for the uva

* use the make_shared replace the shared_ptr

bfacd706

fix trt convert conv2d skip (#38999) · dfa242e4
Committed by JingZhuangzhuang on Jan 18, 2022
```
* fix trt convert conv2d skip

* fix trt convert conv2d skip
```
dfa242e4
modify transpose params check (#39006) · 27f8460a
Committed by wenbin on Jan 18, 2022
```
* modify params check

* correct compile
```
27f8460a
[AutoParallel] Recompute Pass (#38920) · 30845734
Committed by zhaoyingli on Jan 18, 2022
```
* [AutoParallel] Recompute Pass

* update unittest

* reshard for amp

* add comment
```
30845734
Speedup FP16 Gelu op using fast math and vectorized 8 kernel (#38980) · 8c20d668
Committed by sneaxiy on Jan 18, 2022
```
* speedup gelu using fast math

* add bwd part
```
8c20d668

Jan 17, 2022 · 7 commits

fix for conv2D training error (#38938) · 944ea436
Committed by jakpiase on Jan 17, 2022

944ea436

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

Add NoReduce mode for ParallelExecutor (#38969) · e50d883e
Committed by sneaxiy on Jan 17, 2022
```
* add no reduce mode for pe

* add NoReduce ut
```
e50d883e
add squared_l2_norm (#38968) · 6eeb16b8
Committed by sneaxiy on Jan 17, 2022

6eeb16b8
fix paddle.where torch diff (#38870) · 096afbe1
Committed by ronnywang on Jan 17, 2022
```
* fix paddle.where torch diff

* update
```
096afbe1
[Dy2St]close enable_inplace PASS for PE and open test_mnist_pure_fp16.py for windows (#38752) · 724d49da
Committed by 0x45f on Jan 17, 2022
```
* close enable_inplace PASS for PE, and test dy2st pure fp16 training stability

* add some comment

* enlarge atol
```
724d49da
Support auto prune logic in eager mode (#38960) · f81569e3
Committed by Jiabin Yang on Jan 17, 2022
```
* support test_auto_prune_partial

* support rest of autoprune strategy in eager mode
```
f81569e3

Jan 14, 2022 · 4 commits

add flatten_contiguous_range OpConvert for Paddle-TRT (#38922) · 050aa6fe

Committed by heliqi on Jan 14, 2022

* add trt_convert_flatten_contiguous_rang op

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* test case: add trt version >=7 skip

050aa6fe

[XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f

Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun
Co-authored-by: NQingshuChen <chenqingshu@baidu.com>

87ee3e4f

Add dygraph sharding stage3 (#38052) · 4c77a908
Committed by Baibaifan on Jan 14, 2022

4c77a908

[MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8

Committed by qipengh on Jan 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license

7f8d5bc8

Jan 13, 2022 · 6 commits

[NPU] fix tril_triu (#38864) · eaccdc71
Committed by furnace on Jan 13, 2022
```
[NPU] fix tril_triu
```
eaccdc71
[NPU] fix expand op (#38526) · 7a5af630
Committed by furnace on Jan 13, 2022
```
* [NPU] fix expand op

* [NPU] optimize codes

* [NPU] optimize codes
```
7a5af630
roi_align aligned supported (#38905) · 08dcea18
Committed by wenbin on Jan 13, 2022
```
roi_align aligned supported
```
08dcea18

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

Fix mkldnn invalid infershape impl (#38837) · 281644cd
Committed by Chen Weihang on Jan 13, 2022
```
* fix mkldnn invalid infershape

* add unittest for mkldnn in new executor

* add import os
```
281644cd

Support test_imperative using_non_zero_gpu with _test_eager_guard() (#38881) · 5e515781

Committed by Weilong Wu on Jan 13, 2022

* Support test_imperative using_non_zero_gpu and Add a TODO comment

* Change GPU number to 0

* Modify the cuda device selection method

5e515781

Jan 12, 2022 · 4 commits

Fix conv act int8 scale (#38331) · 4825addd
Committed by Sylwester Fraczek on Jan 12, 2022
```
* fix conv act int8 scale

* add unit test for conv+hard_swish
```
4825addd

support 5d for nearest interp (#38868) · d296456c

Committed by xiaoting on Jan 12, 2022

* support 5d for nearest

* update nearest3d unittest, test=develop

* fix approve ci, test=develop

* fix approve ci, test=develop

d296456c

[Dist Pass] Amp Pass (#38764) · cc24427e

Committed by JZ-LIANG on Jan 12, 2022

* auto parallel sharding base

* chmod

* add unittest

* set unittest cmake dist label

* revise code according to review

* chmod

* bugfix for grad_clip and param broadcast

* chmod

* update unittest

* chmod

* add clip

* chmod

* add amp pass

* chmod

* add unittest

* remove grad update

* fixed bug

* fixed bug

* fixed typos

* fixed typos

cc24427e

support test_auto_prune_partial (#38871) · 4640955c
Committed by Jiabin Yang on Jan 12, 2022

4640955c

Jan 11, 2022 · 4 commits

Support test_numpy_bridge and thread_local_has_grad (#38835) · 29c211ee
Committed by Weilong Wu on Jan 11, 2022

29c211ee

【Auto Parallel】New local tensor (#38747) · d3ba1895

Committed by caozhou on Jan 11, 2022

* update dist tensor

* add unittest

* update unittest

* refactor dist tensor

* update dist tensor and unittest

d3ba1895

[AMP] Check call order of paddle.amp.decorate and paddle.DataParallel (#38785) · fbb40281
Committed by zhangbo9674 on Jan 11, 2022
```
* check amp.decorate and DataParallel

* refine coverage

* fix layer dtype

* refine code
```
fbb40281

Jit pre save hook (#38186) · e91f7c02

Committed by Ming-Xu Huang on Jan 11, 2022

* Pre-save hooks of jit.save

1. Added pre_save_hooks features to jit.save.
2. Added related unittests

* Added jit pre_save_hooks functions' alias to paddle.jit and copyright.

* Make jit.save_pre_hook style be consistent with Paddle's rule.

* Fixed arguments passing bug in run_save_pre_hooks

* Added API Documents

* Move clear and run_pre_save_hooks as internal methods only.

* Made register_save_pre_hook as an internal function.

e91f7c02

Jan 10, 2022 · 6 commits
- update mul_gru_fuse_pass ut timeout setting (#38763) · 1f8fe035
  Committed by baoachun on Jan 10, 2022
  
  1f8fe035
- Add gpu kernel for new api : linalg.lstsq (#38621) · 405103d8
  Committed by Haohongxiang on Jan 10, 2022
```
* add lstsq gpu kernel

* update

* add docs_en

* modify ut

* fix bugs

* modify example in docs_en

* remove lstsq_op.cu from ROCM cmake

* modify docs_en

* modify docs_en

* modify docs_en

* remove unnecessary TensorCopy
```
  405103d8
- Add the backward support for QR (#38824) · 657b6742
  Committed by Yulong Ao on Jan 10, 2022
```
* Add the backward support for QR

* Remove unnecessary comments
```
  657b6742
- add static label check · 09d4a3a4
  Committed by HydrogenSulfate on Dec 28, 2021
  
  09d4a3a4
- Update test_cross_entropy_loss.py · 9765be09
  Committed by HydrogenSulfate on Dec 28, 2021
  
  9765be09
- remove hard labels check · 51398ab9
  Committed by HydrogenSulfate on Dec 27, 2021
  
  51398ab9

机器未来 / Paddle · In sync with the fork's source project
