Commits · 9acc26ca1a6361e87ec1dfb92d062de54c0b7dc7 · Crayon鑫 / Paddle

Jan 20, 2022 · 4 commits

[Auto Parallel] Improve the dist op interface and the compatible computation (#39014) · 9acc26ca

Committed by Yulong Ao on Jan 20, 2022

* Add the backward support for QR

* Remove unnecessary comments

* [Auto Parallel] Improve the dist op interface and compatible computation

* Remove unnecessary modification

* Recover some modifications

* Add lost files

* Fix a minor bug

* Fix the bug of the planner

* Fix the format problem

9acc26ca

Fix master weight bug for multi_tensor optimizer(momentum, adam) (#38991) · 6b0c57cf
Committed by zhangbo9674 on Jan 20, 2022
```
* fix mp

* support merged_momentum for mp
```
6b0c57cf
[Paddle-ASP]Make test_asp_sharding running on non-mac platform (#39034) · c0f27282
Committed by minghaoBD on Jan 20, 2022
```
* [Paddle-ASP]Make test_asp_sharding running on non-mac platform

* syntax check

* syntax check
```
c0f27282

[Eager] Support Eager mode for some testcase (#38783) · d21074cd

Committed by wanghuancoder on Jan 20, 2022

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* eager test case

* support inference test

* refine test and fix initializer failed

* modify eagertensor patch method

* add eagertensor.clear_grandint, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support create varbase and fix retain grad error

* call monkey_patch_varbase in _test_eager_guard, test=develop

* fix windows error

* split clear_gradient to clear_gradient and zero_grads, test=develop

* refine, test=develop

* refine, test=develop

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge

* eager

* fix some eager logic, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* patch_tensor_method_func, test=develop

* refine, test=develop

* eager test case, test=develop

* refine, test=develop

* eager, test=develop

* eager, test=develop

* eager optimizer, test=develop

* eager optimizer, test=develop

* eager test_imperative_optimizer_v2, test=develop

* eager, test=develop

* refine, test=develop

* refine, test=develop

* eager, test=develop

* add resize in share buffer to, test=develop

* eager, test=develop

* fix _share_buffer_to, test=develop

* refine, test=develop

* refine, test=develop

* support eager for dataloader,test=develop
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: JiabinYang <360788950@qq.com>

d21074cd

Jan 19, 2022 · 4 commits

ipu python interface p1 (#38096) · 0837a2cc

Committed by jianghaicheng on Jan 19, 2022

* ipu_commit_tests p1

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* update lint and ipustrategy introduction

* update ipu_config

* update __init__ of static

* update doc

* update doc 2

* update doc 3

* update doc 4

* update doc 5

* update doc 5

* update doc 6

* update lint

* update lint 2

* update ipustrategy

* add IpuStrategy to all

* update ipustrategy

* update ipu_shard_guard

* update ipu_shard_guard 2
Co-authored-by: yaozhixin <522190855@qq.com>

0837a2cc

Fix paddle.flops AttributeError (#38850) · ae1e71b3

Committed by yingyibiao on Jan 19, 2022

* Fix AttributeError when output y is a tuple which has no attribute 'shape'

* Add unit test for dynamic_flops with multiple outputs

* Add unit test for dynamic_flops with multiple outputs

ae1e71b3
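
The AttributeError addressed in #38850 is the kind that appears when a FLOPs-counting routine reads `y.shape` on a layer output that is actually a tuple. The following is a minimal sketch of one way to guard against that, not Paddle's actual implementation; `FakeTensor` and `output_numel` are hypothetical names introduced here for illustration only.

```python
from math import prod


class FakeTensor:
    """Hypothetical stand-in for a framework tensor; it only exposes `.shape`."""

    def __init__(self, shape):
        self.shape = tuple(shape)


def output_numel(y):
    """Count output elements whether `y` is one tensor or a tuple/list of tensors."""
    if isinstance(y, (tuple, list)):
        # A tuple has no `.shape`, so recurse into each element instead.
        return sum(output_numel(item) for item in y)
    return prod(y.shape)


single = FakeTensor([2, 3])
multi = (FakeTensor([2, 3]), FakeTensor([4]))
print(output_numel(single))
print(output_numel(multi))
```

Recursing on tuples and lists keeps the single-output path unchanged while letting layers with multiple outputs be counted without special cases at the call site.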

[hybrid] Fix out of memory bug (#39009) · 01222f52
Committed by wuhuachaocoding on Jan 19, 2022

01222f52

Add conv2d_transpose and conv2d_transpose_grad for XPU,test=kunlun (#38956) · c7de7440
Committed by zhangyikun02 on Jan 19, 2022

c7de7440

Jan 18, 2022 · 10 commits
- Mish FP32/BF16 kernel, conv and fc fuse passes (#38623) · 1d18bc2c
  Committed by Sławomir Siwek on Jan 18, 2022
```
* Mish

* Change exp() library

* mish fuse pass

* mish attrs

* fixes

* mishop maker

* remove attrs

* mish kernel for bf16

* fc+mish fuse

* fix code format error

* Resolve merge conflicts

* Update mish operator version

* update mish variable to new naming convention
```
  1d18bc2c
- change CUDA implementation of uniform/gaussian OP (#38611) · bbbd75e4
  Committed by zhouweiwei2014 on Jan 18, 2022
```
* change CUDA implementation of uniform/gaussian OP

* fix unittest
```
  bbbd75e4
- fix http gloo bug (#39017) · a998c077
  Committed by kuizhiqing on Jan 18, 2022
  
  a998c077
- add the uva function for the Tensor (#38950) · bfacd706
  Committed by wawltor on Jan 18, 2022
```
* add the uva api for the tensor

* fix the compiler problem for the uva

* fix the example for the _uva

* fix the compile problem in the pten library

* update the environment support for the uva

* use the make_shared replace the shared_ptr
```
  bfacd706
- fix trt convert conv2d skip (#38999) · dfa242e4
  Committed by JingZhuangzhuang on Jan 18, 2022
```
* fix trt convert conv2d skip

* fix trt convert conv2d skip
```
  dfa242e4
- modify transpose params check (#39006) · 27f8460a
  Committed by wenbin on Jan 18, 2022
```
* modify params check

* correct compile
```
  27f8460a
- Fixed python-level LoDTensor patch (#38996) · a17e51dd
  Committed by Zhanlue Yang on Jan 18, 2022
  
  a17e51dd
- Fix pad api docs (#38988) · 5406e6f8
  Committed by duanboqiang on Jan 18, 2022
  
  5406e6f8
- [AutoParallel] Recompute Pass (#38920) · 30845734
  Committed by zhaoyingli on Jan 18, 2022
```
* [AutoParallel] Recompute Pass

* update unittest

* reshard for amp

* add comment
```
  30845734
- Speedup FP16 Gelu op using fast math and vectorized 8 kernel (#38980) · 8c20d668
  Committed by sneaxiy on Jan 18, 2022
```
* speedup gelu using fast math

* add bwd part
```
  8c20d668
Jan 17, 2022 · 8 commits
- fix for conv2D training error (#38938) · 944ea436
  Committed by jakpiase on Jan 17, 2022
  
  944ea436
- [Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5
  Committed by Wilber on Jan 17, 2022
```
* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile
```
  c48a9ad5
- fix benchmark in paddlerec (#38278) · 1dbc8632
  Committed by wangguanqun on Jan 17, 2022
  
  1dbc8632
- Add NoReduce mode for ParallelExecutor (#38969) · e50d883e
  Committed by sneaxiy on Jan 17, 2022
```
* add no reduce mode for pe

* add NoReduce ut
```
  e50d883e
- add squared_l2_norm (#38968) · 6eeb16b8
  Committed by sneaxiy on Jan 17, 2022
  
  6eeb16b8
- fix paddle.where torch diff (#38870) · 096afbe1
  Committed by ronnywang on Jan 17, 2022
```
* fix paddle.where torch diff

* update
```
  096afbe1
- [Dy2St]close enable_inplace PASS for PE and open test_mnist_pure_fp16.py for windows (#38752) · 724d49da
  Committed by 0x45f on Jan 17, 2022
```
* close enable_inplace PASS for PE, and test dy2st pure fp16 training stability

* add some comment

* enlarge atol
```
  724d49da
- Support auto prune logic in eager mode (#38960) · f81569e3
  Committed by Jiabin Yang on Jan 17, 2022
```
* support test_auto_prune_partial

* support rest of autoprune strategy in eager mode
```
  f81569e3
Jan 15, 2022 · 2 commits

[PTen] Remove cached kernel context (#38953) · 35d2b71a
Committed by Chen Weihang on Jan 15, 2022
```
* remove cached kernel context

* revert dataloader format change
```
35d2b71a

[Unify Tensors PR #7] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28

Committed by Zhanlue Yang on Jan 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations

88966b28

Jan 14, 2022 · 4 commits

add flatten_contiguous_range OpConvert for Paddle-TRT (#38922) · 050aa6fe

Committed by heliqi on Jan 14, 2022

* add trt_convert_flatten_contiguous_rang op

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* test cast add trt version >=7 skip

050aa6fe

[XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f

Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>

87ee3e4f

Add dygraph sharding stage3 (#38052) · 4c77a908
Committed by Baibaifan on Jan 14, 2022

4c77a908

[MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8

Committed by qipengh on Jan 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license

7f8d5bc8

Jan 13, 2022 · 8 commits

[NPU] fix tril_triu (#38864) · eaccdc71
Committed by furnace on Jan 13, 2022
```
[NPU] fix tril_triu
```
eaccdc71
[NPU] fix expand op (#38526) · 7a5af630
Committed by furnace on Jan 13, 2022
```
* [NPU] fix expand op

* [NPU] optimize codes

* [NPU] optimize codes
```
7a5af630

[pten]Remove pten/include dir files (#38878) · 7e0292ea

Committed by chentianyu03 on Jan 13, 2022

* move dot_dev api into dot_kernel.h

* add infermate header

* modify to dotkerel in dot_op.h

* move conj dev api into complex_kernel.h

* move sign dev api into  sign_kernel.h

* move scale dev api into kernel.h and remove infermete.h

* rm paddle/pten/include/math.h

* rm paddle/pten/include/math.h

* rm include dir

* rm paddle/pten/include/math.h

* fix conflict with develop branch

* rm devContext in conj_op.h

* add the missing complex_kernel header

7e0292ea

[Dist Pass] AMP pass add dist_update_loss_scaling op (#38902) · 53783e1e
Committed by JZ-LIANG on Jan 13, 2022

53783e1e
roi_align aligned supported (#38905) · 08dcea18
Committed by wenbin on Jan 13, 2022
```
roi_align aligned supported
```
08dcea18

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

Fix mkldnn invalid infershape impl (#38837) · 281644cd
Committed by Chen Weihang on Jan 13, 2022
```
* fix mkldnn invalid infershape

* add unittest for mkldnn in new executor

* add import os
```
281644cd

Support test_imperative using_non_zero_gpu with _test_eager_guard() (#38881) · 5e515781

Committed by Weilong Wu on Jan 13, 2022

* Support test_imperative using_non_zero_gpu and Add a TODO comment

* Change GPU number to 0

* Modify the cuda device selection method

5e515781

Crayon鑫 / Paddle is in sync with its fork source project