提交 · b3270adfe0c638ac582ef96565493c18e1b57989 · Crayon鑫 / Paddle

30 3月, 2022 2 次提交

Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d

由 From00 提交于 3月 30, 2022

Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)

* Add new API memory_reserved

* Add memory_allocated, max_memory_reserved and max_memory_allocater

* Fix CI error

* Fix CI error

* Enhance UT

* Add FLAGS_memory_stats_opt

* Add STATS macro functions

* Add StatAllocator

* Fix CI errors

* Add UT

* Fix CI errors

afe02e9d

suppor inplace in tensor_method_setitem (#40915) · 7170c687

由 pangyoki 提交于 3月 30, 2022

* suppor inplace in tensor_method_setitem

* delete bump_inplace_version

* optimize inplace unittest

* fix

* fix setitem bug

* update eager_generator

* optimize inplace unittest

* little change

7170c687

28 3月, 2022 1 次提交

[Dygraph] Add unittests for DataParallel in eager mode (#40709) · 62af5903

由 Haohongxiang 提交于 3月 28, 2022

* add uts for EagerReducer

* add more uts

* fix bugs

* fix bugs

* modify

* modify uts

* fix bugs

* update

* update

* update

* solve conflicts and merge

* add some other uts

* modify time of uts

* update

* update

* update

* remove uts of resnet

62af5903

25 3月, 2022 1 次提交

Refactor Dygraph Flags (#40786) · 3085d5e4

由 Jiabin Yang 提交于 3月 25, 2022

* refactor eager flags

* fix flags error when we switch from eager to dygraph

* fix ci problem

* fix ci

* fix ci

* merge develop and fix code style

* merge develop and fix code style

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* merge develop

3085d5e4

24 3月, 2022 1 次提交
- L
  
  Wrap dist api for dygraph mode (#40408) · 9d8cfc1b
  由 lilong12 提交于 3月 24, 2022
  
  9d8cfc1b
21 3月, 2022 1 次提交
- K
  
  fleetrun launch in legacy mode (#40568) · c54c60de
  由 kuizhiqing 提交于 3月 21, 2022
  
  c54c60de
19 3月, 2022 1 次提交

support inplace in dygraph eager_fluid state (#40400) · 8e612903

由 pangyoki 提交于 3月 19, 2022

* [Eager] Support eager grad interface, draft version

* Support eager grad interface with allow_unused and multi startup_op

* Fix code format

* Fix allow_unused case, return PyNone if tensor not initialize

* Support output's stop_gradient related to create_graph

* Support grad exception case in eager mode, fix coverage CI

* Update ToPyObject, return PyNone if not initialize

* AccumulationNode add FLAGS_retain_grad_for_all_tensor

* Fix ci issue

* Fix CI issue

* fix, use core.eager.Tensor

* Add func SetBufferSlotRankZeros for GradTensorHolder

* Support retain_graph by using ClearTensorWrappers

* Support retain_graph by using ClearTensorWrappers

* Update retain_graph and no_grad_vars related test case

* Update code gen logic for ClearTensorWrappers

* Fix by override statement

* fix override func args

* Support retain_graph, update unit tests

* Updated ClearTensorWrappers logic

* fix grad python interface

* Use deep copy and update unit tests

* Polish code

* Polish code

* Fix CI issue, Deep copy only use when user set grad_tensors

* Fix CI, use Backward instead RunBackward

* Fix CI, Declare kernel explicitly in test file

* Polish, remove vector of TensorWrapper

* Refactor the logic of grad/backward, polish codes

* Update code after merge upstream develop

* Polish after merge upstream develop

* Update to adapt new GradNodeBase superclass

* Fix error introduced during conflict resolution

* support inplace strategy in eager_fluid state

* solve conflict

* nothing

* Update purify potential_startup_nodes logic

* Fix errors

* Polish code

* Remove useless args for ToPyObject

* Remove useless TensorWrappersSet

* fix record conflict

* Fix code-format, re-install pre-commit

* fix tensor_wrapper bug

* Fix pre-process logic for potential_startup_ops

* Update unit tests, use eager mode

* Fix conflicts

* fix unittest timeout

* little change
Co-authored-by: NWeilong Wu <veyron_wu@163.com>

8e612903

17 3月, 2022 1 次提交
- H
  
  add time of unittests for dataparallel in dygraph mode (#40639) · e3a67782
  由 Haohongxiang 提交于 3月 17, 2022
  
  e3a67782
15 3月, 2022 1 次提交
- K
  
  New design for launch/run (#40086) · 67c6ddff
  由 kuizhiqing 提交于 3月 15, 2022
  
  67c6ddff
14 3月, 2022 1 次提交

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors ... · e553f758

由 Zhong Hui 提交于 3月 14, 2022

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors  between python processes. (#37302)

* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

e553f758

11 3月, 2022 1 次提交
- Y
  
  [hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496
  由 Yuang Liu 提交于 3月 11, 2022
  
  1882c496
09 3月, 2022 3 次提交
- B
  
  add_sharding_api (#40129) · f40ed5f4
  由 Baibaifan 提交于 3月 09, 2022
  
  f40ed5f4
- S
  Fix time of utest in distributed (#40163) · 7ea9235c
  由 ShenLiang 提交于 3月 09, 2022
```
* fix time of utest
```
  7ea9235c
- W
  
  [hybrid] fused_feedforward op support tensor model parallel (#40160) · e0866dc6
  由 WangXi 提交于 3月 09, 2022
  
  e0866dc6
07 3月, 2022 1 次提交

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

由 Ming-Xu Huang 提交于 3月 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
2. Act currently only be supported ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_opto gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Reivew from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear wiht gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca

02 3月, 2022 2 次提交

new fleet_desc builder (#39948) · 1c4e3e5d

由 ziyoujiyi 提交于 3月 02, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

* add heter 2stage unittest

* add heter 2stage unittest

* add heter 2stage unittest

* sync/geo test ok & fix heter_worker program ok

* .

* new fleet desc generator

* new fleet_desc builder

* new fleet_desc builder

* .

* .

* correct ps.proto compile

* .
Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>

1c4e3e5d

[Eager] open eager when WITH_PYTHON (#39979) · 9af72957

由 wanghuancoder 提交于 3月 02, 2022

* open eager when WITH_PYTHON, test=develop

* refine, test=develop

* refine, test=develop

* add DWITH_PYTHON for gen_fluid_lib, test=develop

9af72957

01 3月, 2022 1 次提交
- Z
  
  add test_warpctc_op in mac (#39983) · 25650774
  由 zhangchunle 提交于 3月 01, 2022
  
  25650774
28 2月, 2022 1 次提交
- Z
  PR-CI-Py3 change cpu test (#39659) · 3cb93edf
  由 zhangchunle 提交于 2月 28, 2022
```
* update;test=cpu-py3
```
  3cb93edf
23 2月, 2022 1 次提交
- S
  Add ProcessGroupNCCL for distributed training (#39737) · 0b205817
  由 ShenLiang 提交于 2月 23, 2022
```
* add processgroup_nccl
```
  0b205817
22 2月, 2022 1 次提交
- Y
  
  disable some distribute test case when in CPU test env (#39801) · ae8c811a
  由 YUNSHEN XIE 提交于 2月 22, 2022
  
  ae8c811a
21 2月, 2022 1 次提交

disable some distribute test case when in CPU test env (#39682) · 941bdb41

由 wanghuancoder 提交于 2月 21, 2022

* disable some distribute test case when in CPU test env, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

941bdb41

19 2月, 2022 1 次提交

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

由 sneaxiy 提交于 2月 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changement

* fix rocm compile error

* improve converage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

09 2月, 2022 1 次提交

Replace EagerTensor with Tensor (#39376) · 945a3ce9

由 Jiabin Yang 提交于 2月 09, 2022

* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

945a3ce9

28 1月, 2022 1 次提交

Resolve unit-test timeout issues (#39292) · 543f3dea

由 Weilong Wu 提交于 1月 28, 2022

* implement AllocateFrom

* fix PR-CI-Coverage timeout in 120s
Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>

543f3dea

25 1月, 2022 1 次提交
- Y
  
  [fleet_executor] Dist model run method Implementation (#39194) · 20e23e1b
  由 Yuang Liu 提交于 1月 25, 2022
  
  20e23e1b
24 1月, 2022 1 次提交

Refactored python-level trace_op to call through _C_ops instead of... · c3796061

由 Zhanlue Yang 提交于 1月 24, 2022

Refactored python-level trace_op to call through _C_ops instead of Tracer::TraceOp, under eager_mode (#38338)

* Replaced core.ops with _C_ops

* Refactored python-level trace_op to call through _C_ops instead of Tracer::TraceOp, under eager_mode

* Modified trace_op interface

* Refactored trace_op logic for eager mode

* Added Eager Dygraph support for OpTest

* Fixed ci issues

* Fixed CI failures

* Fixed Coverage CI Issues

* Fixed XPU CI Issues

c3796061

21 1月, 2022 1 次提交
- Y
  
  [fleet executor] add a tensor wrapper to support python numpy input (#39076) · 08793179
  由 Yuang Liu 提交于 1月 21, 2022
  
  08793179
19 1月, 2022 1 次提交

ipu python interface p1 (#38096) · 0837a2cc

由 jianghaicheng 提交于 1月 19, 2022

* ipu_commit_tests p1

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* resolve comments

* update lint and ipustrategy introduction

* update ipu_config

* update __init__ of static

* update doc

* update doc 2

* update doc 3

* update doc 4

* update doc 5

* update doc 5

* update doc 6

* update lint

* update lint 2

* update ipustrategy

* add IpuStrategy to all

* update ipustrategy

* update ipu_shard_guard

* update ipu_shard_guard 2
Co-authored-by: Nyaozhixin <522190855@qq.com>

0837a2cc

14 1月, 2022 2 次提交
- B
  
  Add dygraph sharding stage3 (#38052) · 4c77a908
  由 Baibaifan 提交于 1月 14, 2022
  
  4c77a908
- Q
  [MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8
  由 qipengh 提交于 1月 14, 2022
```
* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license
```
  7f8d5bc8
11 1月, 2022 1 次提交

【Auto Parallel】New local tensor (#38747) · d3ba1895

由 caozhou 提交于 1月 11, 2022

* update dist tensor

* add unitest

* update unitest

* refactor dist tensor

* update dist tensor and unitest

d3ba1895

10 1月, 2022 1 次提交
- Y
  Add the backward support for QR (#38824) · 657b6742
  由 Yulong Ao 提交于 1月 10, 2022
```
* Add the backward support for QR

* Remove unnecessary comments
```
  657b6742
31 12月, 2021 1 次提交
- D
  
  fix timeout (#38612) · 02c17c0b
  由 Double_V 提交于 12月 31, 2021
  
  02c17c0b
23 12月, 2021 1 次提交
- X
  move distribution.py into distribution package and split into different file... · a3e6f18c
  由 Xiaoxu Chen 提交于 12月 23, 2021
```
move distribution.py into distribution package and split into different file for better scalability (#38047)
```
  a3e6f18c
21 12月, 2021 1 次提交
- Y
  
  [fleet_executor] Python side fleet executor and task node (#38290) · a4afb97a
  由 Yuang Liu 提交于 12月 21, 2021
  
  a4afb97a
20 12月, 2021 1 次提交
- Y
  
  [fleet_executor] Remove runtime graph, all scheduler on python side (#38261) · 2f188341
  由 Yuang Liu 提交于 12月 20, 2021
  
  2f188341
08 12月, 2021 1 次提交
- C
  add update func of auto search (#37867) · 46212b80
  由 caozhou 提交于 12月 08, 2021
```
* add update func of auto search

* update unitest
```
  46212b80
07 12月, 2021 1 次提交

[Auto para] Relaunch with auto mapping function (#37326) · 506e79d1

由 Yulong Ao 提交于 12月 07, 2021

* [Auto Parallel]  Add the unified cluster representation

* [Auto Parallel] Add the graph class for physical mapping

* [Auto Parallel] Add the simple physical mapper

* Set the timeout of the mapper

* Merge the upstream develop unittests cmake files

* Fix a bug of the process group

* Remove mapper unittest from platforms which is not GPU

* Move the instantiation of process group after resharding

* Add the local id for devices

* Update the rank mapping format

* [Auto Parallel] Relaunch with the rank mapping file

* Remove the unnecessary json file

* Avoid entering get_device_proc_info for auto mapping

* Correct the mapper unit test

* Add some comments

* Remove the related files about mapping

* Update the unittest for auto mapping

* Remove unused rank_mapping unittest

* Improve the unittest coverage

* Improve the unittest coverage

* Improve the unittest of relaunch

* Fix the unittest problem in CI

* Improve the unittest of relaunch

* Remove unnecessary statements

* Update the unittest cmakefile

* Correct the cmakefile of auto parallel unittests

* Modify codes based on the new elastic change

* Use the GPUs exclusively in the unittest

* Correct the cmakefile

* Set the timeout of the unittest

506e79d1

02 12月, 2021 1 次提交
- B
  
  Add dygraph sharding stage2 (#37707) · 20e19776
  由 Baibaifan 提交于 12月 02, 2021
  
  20e19776

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致