提交 · eaeff90edec7fee8dc91ea3b39b53e0378291df7 · Crayon鑫 / Paddle

19 3月, 2022 2 次提交

C

fix python hook mem leak (#40716) · c46f2ddb
由 Chen Weihang 提交于 3月 19, 2022

c46f2ddb

support inplace in dygraph eager_fluid state (#40400) · 8e612903

由 pangyoki 提交于 3月 19, 2022

* [Eager] Support eager grad interface, draft version

* Support eager grad interface with allow_unused and multi startup_op

* Fix code format

* Fix allow_unused case, return PyNone if tensor not initialize

* Support output's stop_gradient related to create_graph

* Support grad exception case in eager mode, fix coverage CI

* Update ToPyObject, return PyNone if not initialize

* AccumulationNode add FLAGS_retain_grad_for_all_tensor

* Fix ci issue

* Fix CI issue

* fix, use core.eager.Tensor

* Add func SetBufferSlotRankZeros for GradTensorHolder

* Support retain_graph by using ClearTensorWrappers

* Support retain_graph by using ClearTensorWrappers

* Update retain_graph and no_grad_vars related test case

* Update code gen logic for ClearTensorWrappers

* Fix by override statement

* fix override func args

* Support retain_graph, update unit tests

* Updated ClearTensorWrappers logic

* fix grad python interface

* Use deep copy and update unit tests

* Polish code

* Polish code

* Fix CI issue, Deep copy only use when user set grad_tensors

* Fix CI, use Backward instead RunBackward

* Fix CI, Declare kernel explicitly in test file

* Polish, remove vector of TensorWrapper

* Refactor the logic of grad/backward, polish codes

* Update code after merge upstream develop

* Polish after merge upstream develop

* Update to adapt new GradNodeBase superclass

* Fix error introduced during conflict resolution

* support inplace strategy in eager_fluid state

* solve conflict

* nothing

* Update purify potential_startup_nodes logic

* Fix errors

* Polish code

* Remove useless args for ToPyObject

* Remove useless TensorWrappersSet

* fix record conflict

* Fix code-format, re-install pre-commit

* fix tensor_wrapper bug

* Fix pre-process logic for potential_startup_ops

* Update unit tests, use eager mode

* Fix conflicts

* fix unittest timeout

* little change
Co-authored-by: NWeilong Wu <veyron_wu@163.com>

8e612903

18 3月, 2022 4 次提交

Supported Complex2Real Conversion for Eager Dygraph (#39878) · e3b2a035

由 Zhanlue Yang 提交于 3月 18, 2022

* Supported Complex2Real Conversion for Eager Dygraph

* Supported Complex2Real Conversion for Eager Dygraph

* Enabled complex type promotion test for matmul_v2

* Fix CI issues

* Merged adj_edges_ with GradSlotMeta

* Fixed monir issue

* Adjusted num runs

* Recovered Eager performance tests configurations

* Recovered Eager performance tests configurations

* Adjusted performance tests configurations

* Fixed Minor Issues with performance tests

* Moved out Edge from GradSlotMeta

* Fixed issues from merge

* Fixed typo

* Addressed review comments

* Fixed minor issues

e3b2a035

S
[DataParallel]Support control flow in new DP (#40593) · 984eacb3
由 ShenLiang 提交于 3月 18, 2022
```
* fix bug

* fix bug
```
984eacb3
L

Use store for gloo process group (#40629) · bb2cb762
由 lilong12 提交于 3月 18, 2022

bb2cb762

王

[infrt] rename pd dialect from mlir to infrt. (#40651) · ef4ef154

由王明冬提交于 3月 18, 2022

* [infrt] rename pd dialect from mlir to infrt. test=develop

* [infrt] fix the kernel signature generator bug.

ef4ef154

17 3月, 2022 4 次提交

S
merge cpu and gpu graph engines (#40597) · 31776199
由 seemingwang 提交于 3月 17, 2022
```
* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config
```
31776199
B

support gpu mixed precision inference (#40531) · 06fee998
由 baoachun 提交于 3月 17, 2022

06fee998

[Eager Grad] Support eager grad interface (#40170) · 4db8cf24

由 Weilong Wu 提交于 3月 17, 2022

* [Eager] Support eager grad interface, draft version

* Support eager grad interface with allow_unused and multi startup_op

* Fix code format

* Fix allow_unused case, return PyNone if tensor not initialize

* Support output's stop_gradient related to create_graph

* Support grad exception case in eager mode, fix coverage CI

* Update ToPyObject, return PyNone if not initialize

* AccumulationNode add FLAGS_retain_grad_for_all_tensor

* Fix ci issue

* Fix CI issue

* fix, use core.eager.Tensor

* Add func SetBufferSlotRankZeros for GradTensorHolder

* Support retain_graph by using ClearTensorWrappers

* Support retain_graph by using ClearTensorWrappers

* Update retain_graph and no_grad_vars related test case

* Update code gen logic for ClearTensorWrappers

* Fix by override statement

* fix override func args

* Support retain_graph, update unit tests

* Updated ClearTensorWrappers logic

* fix grad python interface

* Use deep copy and update unit tests

* Polish code

* Polish code

* Fix CI issue, Deep copy only use when user set grad_tensors

* Fix CI, use Backward instead RunBackward

* Fix CI, Declare kernel explicitly in test file

* Polish, remove vector of TensorWrapper

* Refactor the logic of grad/backward, polish codes

* Update code after merge upstream develop

* Polish after merge upstream develop

* Update to adapt new GradNodeBase superclass

* Fix error introduced during conflict resolution

* Update purify potential_startup_nodes logic

* Fix errors

* Polish code

* Remove useless args for ToPyObject

* Remove useless TensorWrappersSet

* Fix code-format, re-install pre-commit

* Fix pre-process logic for potential_startup_ops

* Update unit tests, use eager mode

4db8cf24

J
fix copy_ problem by doing it with phi copy (#40521) · c1931beb
由 Jiabin Yang 提交于 3月 17, 2022
```
* fix copy_ problem by doing it with phi copy

* improve test coverage

* refactor copy with sr kernel
```
c1931beb

16 3月, 2022 3 次提交

P

update; test=develop · 733d3109
由 phlrain 提交于 3月 16, 2022

733d3109
R

clean up DeviceManager in advance manually (#40504) · 23c036d6
由 ronnywang 提交于 3月 16, 2022

23c036d6

[Auto Parallel] Add the support for the auto completion of while_op (#39939) · ec6b8fbd

由 Yulong Ao 提交于 3月 16, 2022

* [Auto Parallel] Support the auto completion of while_op

* [Auto Parallel] Improve the completion algorithms

* [Auto Parallel] Fix bugs for ernie inference

* [Auto Parallel] Remove attrs which cannot be pickled

* [Auto Parallel] make the dims_mappings of LodTensorArray vars empty

* [Auto Parallel] Fix bugs for the ernie inference in the pipeline parallel

* [Auto Parallel] Remove unncessary comments

* [Auto Parallel] Fix a bug of the CMakeLists

* [Auto Parallel] Use the newest APIs to write the unit test

* [Auto Parallel] Remove unnecessary statements

ec6b8fbd

15 3月, 2022 4 次提交

X
run python api in eager model and filter the out in argument list (#40523) · 4d886f75
由 xiongkun 提交于 3月 15, 2022
```
* run python api in eager model and filter the out in argument list

* fix code
```
4d886f75
F
[NPU] add AMP O1 support (#40362) · 69dd43d1
由 furnace 提交于 3月 15, 2022
```
* [NPU] add AMP O1 support

* [NPU] fix NOTE and warnings
```
69dd43d1

[Dygraph] Refactoring of reducer in DataParallel (#40389) · 1a32391c

由 Haohongxiang 提交于 3月 15, 2022

* refactor reducer

* modify cmakelists

* solve conflicts

* rename group and update process_group

* fix bugs of ProcessGroupNCCL

* modify for CIs

* refactoring reducer

1a32391c

Remove pybind index error (#40538) · 47d764a3

由 zyfncg 提交于 3月 15, 2022

* change the exception of getitem from pybind type to PADDLE_ENFORCE

* fix bug

* remove pybind::index_error exception

47d764a3

14 3月, 2022 4 次提交

Support custom op and paddle.autograd.bacward in eager (#40423) · 227fa408

由 Jiabin Yang 提交于 3月 14, 2022

* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* save load, eager, test=develop

* save load, eager, test=develop

* refine, test=develop

* remove useless _set_value method

* refine, test=develop

* refine, test=develop

* revert static_runner, test=develop

* EagerTensor to Tensor, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

* merge, test=develop

* merge, test=develop

* Support quant and part of slice

* support legacy static save

* extend slim tests time

* remove imperative on inference

* remove imperative on inference

* merge develop

* fix typo

* fix typo

* split slice related code into 2 part for imperative and eager

* split slice from inference

* split slice from inference

* fix test_tensor_register_hook

* support custom op in eager mode

* fix inference deps error

* split eager utils from custom operator

* fix type match

* fix typo
Co-authored-by: NWang Huan <wanghuan29@baidu.com>
Co-authored-by: NWeilong Wu <veyron_wu@163.com>
Co-authored-by: Nwanghuancoder <wanghuancoder@163.com>

227fa408

0

adjust params order for eager.Tensor._copy_to (#40449) · c6ec8b9f
由 0x45f 提交于 3月 14, 2022

c6ec8b9f

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors ... · e553f758

由 Zhong Hui 提交于 3月 14, 2022

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors  between python processes. (#37302)

* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

e553f758

Refine partial_program for new run_program OP (#40355) · afafb1c3

由 0x45f 提交于 3月 14, 2022

* refine partial_program

* fix code for test_mnist.py train

* support quantify UT

* make __fake_vars and _double_grads to lazy

* fix comments

afafb1c3

12 3月, 2022 1 次提交
- A
  [custom kernel] fix static object de-initialize bug (#40414) · 573ca984
  由 Aganlengzi 提交于 3月 12, 2022
```
* [custom kernel] fix static object de-initialize bug

* fix text

* fix text

* refine log info
```
  573ca984
11 3月, 2022 2 次提交
- L
  
  fix the bug for processgroup_hccl compiling (#40437) · f70f5e4f
  由 lilong12 提交于 3月 11, 2022
  
  f70f5e4f
- Y
  
  [hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496
  由 Yuang Liu 提交于 3月 11, 2022
  
  1882c496
10 3月, 2022 2 次提交

Inference add ONNXRuntime back-end (#39988) · 431afc39

由 heliqi 提交于 3月 10, 2022

* add onnxruntime predictor

* Add code comments

* support link paddle2onnx onnxruntime

* support onnxruntime with python

* support onnxruntime with python

* support onnxruntime with windows

* paddle2onnx compile with windows

* supoort windows compile

* supoort windows compile with onnxruntime

* supoort windows compile with paddle2onnx

* supoort mac compile

* compile with mac

* compile with mac

* add code comments

* fix remind word

* code optimization

* add test case

* add test case

* add inference demo_ci test case

* fix compile paddle2onnx with no python

* add inference demo_ci test case

* add inference demo_ci test case

* add inference infer_ut test case

* support c go api and test cases

* add converage test case

* add converage test case

* add capi test case

* add capi test case

431afc39

L

solve unexecuted UT (#40397) · bd4dc3be
由 Lijunhui 提交于 3月 10, 2022

bd4dc3be

09 3月, 2022 2 次提交
- 0
  adapt run_program OP for eager (#40198) · 3e9601ba
  由 0x45f 提交于 3月 09, 2022
```
* adapt run_program OP for eager

* fix program_id

* refine code

* fix test
```
  3e9601ba
- H
  
  [Infrt]Update kernel dialect (#40141) · 767647ce
  由 huzhiqiang 提交于 3月 09, 2022
  
  767647ce
08 3月, 2022 2 次提交

L
add the implementation of process group for hccl (#40228) · 73583f86
由 lilong12 提交于 3月 08, 2022
```
* add pg_hccl
```
73583f86

add python profiler package (#40065) · 10325a82

由 chenjian 提交于 3月 08, 2022

* add python profiler package

* update according to review

* fix bug

* fix bug

* fix bug

* add unit test

* Revert "add unit test"

This reverts commit 4e69ff71b0645e069afe5dd8fea0d07717852c48.

* reduce for pr

* add unit test

* modify for pr

* fix unittest

* update for ci coverage

* modify according to review

* fix bug

* improve coverage

10325a82

07 3月, 2022 3 次提交

X
[OpTest] Support to test paddle API end-to-end for check_eager (#40169) · 79a32715
由 xiongkun 提交于 3月 07, 2022
```
* add python api test in TestOp

* test_python_api if self.python_api is set

* fix code by CR
```
79a32715

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

由 Ming-Xu Huang 提交于 3月 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
2. Act currently only be supported ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_opto gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Reivew from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear wiht gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca

L

initialize processgroupnccl with store (#40181) · 0ad25fb9
由 lilong12 提交于 3月 07, 2022

0ad25fb9

04 3月, 2022 1 次提交

Generate forward-only operators (#39962) · a6947991

由 Zhanlue Yang 提交于 3月 04, 2022

* [Eager][Yaml]Supported Scalar and ScalarArray for AutoCodeGen

* Generate forward-only operators

* [Yaml]Support parsing fwd & bwd returns with name

* Fixed issues

* Fixed minor issues

a6947991

03 3月, 2022 5 次提交

Z

[Eager][Yaml]Supported Scalar and ScalarArray for AutoCodeGen (#40080) · 97ccaa79
由 Zhanlue Yang 提交于 3月 03, 2022

97ccaa79
R

[CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23
由 ronnywang 提交于 3月 03, 2022

b4665d23
L

add communication api for ProcessGroupNCCL (#40097) · b565b349
由 lilong12 提交于 3月 03, 2022

b565b349

Support slim eager (#39874) · da47544c

由 Jiabin Yang 提交于 3月 03, 2022

* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* save load, eager, test=develop

* save load, eager, test=develop

* refine, test=develop

* remove useless _set_value method

* refine, test=develop

* refine, test=develop

* revert static_runner, test=develop

* EagerTensor to Tensor, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

* merge, test=develop

* merge, test=develop

* Support quant and part of slice

* support legacy static save

* extend slim tests time

* remove imperative on inference

* remove imperative on inference

* merge develop

* fix typo

* fix typo

* split slice related code into 2 part for imperative and eager

* split slice from inference

* split slice from inference

* fix test_tensor_register_hook
Co-authored-by: NWang Huan <wanghuan29@baidu.com>
Co-authored-by: NWeilong Wu <veyron_wu@163.com>
Co-authored-by: Nwanghuancoder <wanghuancoder@163.com>

da47544c

L
Add the implementation of Gloo for ProcessGroup (#39892) · c16f85f9
由 lilong12 提交于 3月 03, 2022
```
* add pg_gloo
```
c16f85f9

02 3月, 2022 1 次提交
- H
  
  [Infrt]add phi kernel dialect (#39726) · 07dad6d6
  由 huzhiqiang 提交于 3月 02, 2022
  
  07dad6d6

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致