Commits · 0837a2ccad11cdc42b498ee02374ebceff0a177d · PaddlePaddle / Paddle

19 Jan, 2022 · 4 commits

ipu python interface p1 (#38096) · 0837a2cc

Committed by jianghaicheng on Jan 19, 2022

* ipu_commit_tests p1

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* update lint and ipustrategy introduction

* update ipu_config

* update __init__ of static

* update doc

* update doc 2

* update doc 3

* update doc 4

* update doc 5

* update doc 5

* update doc 6

* update lint

* update lint 2

* update ipustrategy

* add IpuStrategy to all

* update ipustrategy

* update ipu_shard_guard

* update ipu_shard_guard 2
Co-authored-by: yaozhixin <522190855@qq.com>

Fix paddle.flops AttributeError (#38850) · ae1e71b3

Committed by yingyibiao on Jan 19, 2022

* Fix AttributeError when output y is a tuple which has no attribute 'shape'

* Add unit test for dynamic_flops with multiple outputs

* Add unit test for dynamic_flops with multiple outputs

[hybrid] Fix out of memory bug (#39009) · 01222f52
Committed by wuhuachaocoding on Jan 19, 2022

Add conv2d_transpose and conv2d_transpose_grad for XPU,test=kunlun (#38956) · c7de7440
Committed by zhangyikun02 on Jan 19, 2022

18 Jan, 2022 · 10 commits
- Mish FP32/BF16 kernel, conv and fc fuse passes (#38623) · 1d18bc2c
  Committed by Sławomir Siwek on Jan 18, 2022
```
* Mish

* Change exp() library

* mish fuse pass

* mish attrs

* fixes

* mishop maker

* remove attrs

* mish kernel for bf16

* fc+mish fuse

* fix code format error

* Resolve merge conflicts

* Update mish operator version

* update mish variable to new naming convention
```
- change CUDA implementation of uniform/gaussian OP (#38611) · bbbd75e4
  Committed by zhouweiwei2014 on Jan 18, 2022
```
* change CUDA implementation of uniform/gaussian OP

* fix unittest
```
- fix http gloo bug (#39017) · a998c077
  Committed by kuizhiqing on Jan 18, 2022
- add the uva function for the Tensor (#38950) · bfacd706
  Committed by wawltor on Jan 18, 2022
```
* add the uva api for the tensor

* fix the compiler problem for the uva

* fix the example for the _uva

* fix the compile problem in the pten library

* update the environment support for the uva

* use the make_shared replace the shared_ptr
```
- fix trt convert conv2d skip (#38999) · dfa242e4
  Committed by JingZhuangzhuang on Jan 18, 2022
```
* fix trt convert conv2d skip

* fix trt convert conv2d skip
```
- modify transpose params check (#39006) · 27f8460a
  Committed by wenbin on Jan 18, 2022
```
* modify params check

* correct compile
```
- Fixed python-level LoDTensor patch (#38996) · a17e51dd
  Committed by Zhanlue Yang on Jan 18, 2022
- Fix pad api docs (#38988) · 5406e6f8
  Committed by duanboqiang on Jan 18, 2022
- [AutoParallel] Recompute Pass (#38920) · 30845734
  Committed by zhaoyingli on Jan 18, 2022
```
* [AutoParallel] Recompute Pass

* update unittest

* reshard for amp

* add comment
```
- Speedup FP16 Gelu op using fast math and vectorized 8 kernel (#38980) · 8c20d668
  Committed by sneaxiy on Jan 18, 2022
```
* speedup gelu using fast math

* add bwd part
```
17 Jan, 2022 · 8 commits
- fix for conv2D training error (#38938) · 944ea436
  Committed by jakpiase on Jan 17, 2022
- [Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5
  Committed by Wilber on Jan 17, 2022
```
* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile
```
- fix benchmark in paddlerec (#38278) · 1dbc8632
  Committed by wangguanqun on Jan 17, 2022
- Add NoReduce mode for ParallelExecutor (#38969) · e50d883e
  Committed by sneaxiy on Jan 17, 2022
```
* add no reduce mode for pe

* add NoReduce ut
```
- add squared_l2_norm (#38968) · 6eeb16b8
  Committed by sneaxiy on Jan 17, 2022
- fix paddle.where torch diff (#38870) · 096afbe1
  Committed by ronnywang on Jan 17, 2022
```
* fix paddle.where torch diff

* update
```
- [Dy2St]close enable_inplace PASS for PE and open test_mnist_pure_fp16.py for windows (#38752) · 724d49da
  Committed by 0x45f on Jan 17, 2022
```
* close enable_inplace PASS for PE, and test dy2st pure fp16 training stability

* add some comment

* enlarge atol
```
- Support auto prune logic in eager mode (#38960) · f81569e3
  Committed by Jiabin Yang on Jan 17, 2022
```
* support test_auto_prune_partial

* support rest of autoprune strategy in eager mode
```
15 Jan, 2022 · 2 commits

[PTen] Remove cached kernel context (#38953) · 35d2b71a
Committed by Chen Weihang on Jan 15, 2022
```
* remove cached kernel context

* revert dataloader format change
```

[Unify Tensors PR ] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28

Committed by Zhanlue Yang on Jan 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations

14 Jan, 2022 · 4 commits

add flatten_contiguous_range OpConvert for Paddle-TRT (#38922) · 050aa6fe

Committed by heliqi on Jan 14, 2022

* add trt_convert_flatten_contiguous_rang op

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* test case add trt version >=7 skip

[XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f

Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>

Add dygraph sharding stage3 (#38052) · 4c77a908
Committed by Baibaifan on Jan 14, 2022

[MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8

Committed by qipengh on Jan 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license

13 Jan, 2022 · 8 commits

[NPU] fix tril_triu (#38864) · eaccdc71
Committed by furnace on Jan 13, 2022
```
[NPU] fix tril_triu
```

[NPU] fix expand op (#38526) · 7a5af630
Committed by furnace on Jan 13, 2022
```
* [NPU] fix expand op

* [NPU] optimize codes

* [NPU] optimize codes
```

[pten]Remove pten/include dir files (#38878) · 7e0292ea

Committed by chentianyu03 on Jan 13, 2022

* move dot_dev api into dot_kernel.h

* add infermeta header

* modify to dot kernel in dot_op.h

* move conj dev api into complex_kernel.h

* move sign dev api into sign_kernel.h

* move scale dev api into kernel.h and remove infermeta.h

* rm paddle/pten/include/math.h

* rm paddle/pten/include/math.h

* rm include dir

* rm paddle/pten/include/math.h

* fix conflict with develop branch

* rm devContext in conj_op.h

* add the missing complex_kernel header

[Dist Pass] AMP pass add dist_update_loss_scaling op (#38902) · 53783e1e
Committed by JZ-LIANG on Jan 13, 2022

roi_align aligned supported (#38905) · 08dcea18
Committed by wenbin on Jan 13, 2022
```
roi_align aligned supported
```

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

Fix mkldnn invalid infershape impl (#38837) · 281644cd
Committed by Chen Weihang on Jan 13, 2022
```
* fix mkldnn invalid infershape

* add unittest for mkldnn in new executor

* add import os
```

Support test_imperative using_non_zero_gpu with _test_eager_guard() (#38881) · 5e515781

Committed by Weilong Wu on Jan 13, 2022

* Support test_imperative using_non_zero_gpu and Add a TODO comment

* Change GPU number to 0

* Modify the cuda device selection method

12 Jan, 2022 · 4 commits

the_one_ps dirs reconstruct (#38804) · 50609214

Committed by ziyoujiyi on Jan 12, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

Fix conv act int8 scale (#38331) · 4825addd
Committed by Sylwester Fraczek on Jan 12, 2022
```
* fix conv act int8 scale

* add unit test for conv+hard_swish
```

support 5d for nearest interp (#38868) · d296456c

Committed by xiaoting on Jan 12, 2022

* support 5d for nearest

* update nearest3d unittest, test=develop

* fix approve ci, test=develop

* fix approve ci, test=develop

[Dist Pass] Amp Pass (#38764) · cc24427e

Committed by JZ-LIANG on Jan 12, 2022

* auto parallel sharding base

* chmod

* add unittest

* set unittest cmake dist label

* revise code according to review

* chmod

* bugfix for grad_clip and param broadcast

* chmod

* update unittest

* chmod

* add clip

* chmod

* add amp pass

* chmod

* add unittest

* remove grad update

* fixed bug

* fixed bug

* fixed typos

* fixed typos

PaddlePaddle / Paddle · synced successfully about 1 year ago