提交 · c3158527717014f7875d61a4151ebdfaf3d30bd0 · BaiXuePrincess / Paddle

21 4月, 2021 1 次提交

【NPU】Merge NPU ccl code (#32381) · c3158527

由 zhang wenhui 提交于 4月 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>
Co-authored-by: Nxiayanming <41795079@qq.com>

c3158527

19 4月, 2021 1 次提交
- S
  [Hybrid Parallel] Support dp & mp in dygraph (#32323) · ffd40860
  由 ShenLiang 提交于 4月 19, 2021
```
* support dp & mp
```
  ffd40860
15 4月, 2021 1 次提交

heterps support pscore (#32093) · 9f8c8f96

由 Thunderbrook 提交于 4月 15, 2021

* pscore support heterps

* fleet cmake

* fleet wrapper

* macro

* solve conflict

* solve conflict

* add unitest

* paddle enforce

* unitest

* unitest

* unitest

9f8c8f96

09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

08 4月, 2021 1 次提交
- J
  
  4D Hybrid Parallelism (#32134) · 54344964
  由 JZ-LIANG 提交于 4月 08, 2021
  
  54344964
07 4月, 2021 2 次提交

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

由 zhang wenhui 提交于 4月 07, 2021

* Ascend rc (#30483)

* Fix compilcation on CANN20.1 and older (#30494)

Fix compilcation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build praser for Hcom* operators (#30627)

Build praser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Ndingsiyu <18369187719@163.com>
Co-authored-by: NOleNet <olenet@126.com>

8c7c53b3

J

[3D-parallelism] Hybrid Model Parallelism (#32074) · 1e60a0c4
由 JZ-LIANG 提交于 4月 07, 2021

1e60a0c4

02 4月, 2021 1 次提交
- J
  
  [3D-Parallel:Sharding] Optimizations for supporting ERNIE 3.0 training (#31884) · 69c874fd
  由 JZ-LIANG 提交于 4月 02, 2021
  
  69c874fd
31 3月, 2021 1 次提交
- L
  Adjust pipeline optimizer for 3d parallelism (#31939) · 695dd371
  由 lilong12 提交于 3月 31, 2021
```
* update, test=develop
```
  695dd371
26 3月, 2021 1 次提交
- L
  [3D-parallel] Reformat pipeline parallel (#31786) · c3974d0e
  由 lilong12 提交于 3月 26, 2021
```
* update, test=develop
```
  c3974d0e
22 3月, 2021 1 次提交
- L
  [3D-parallel] add 1f1b scheduler for pipeline (#31566) · a501a7b0
  由 lilong12 提交于 3月 22, 2021
```
* add 1f1b scheduler for pp, test=develop
```
  a501a7b0
10 3月, 2021 1 次提交

remove the send/recv of tensor size (#31460) · 0205e9f8

由 lilong12 提交于 3月 10, 2021

* remove the send/recv of tensor size, but users have to specify the shape of the received var explicitly.

0205e9f8

05 2月, 2021 1 次提交
- L
  
  [Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858) · 4a8b8b45
  由 liuyuhui 提交于 2月 05, 2021
  
  4a8b8b45
01 2月, 2021 1 次提交
- W
  
  Fleet distributed strategy support pure fp16 (#30754) · 31ed9c9e
  由 WangXi 提交于 2月 01, 2021
  
  31ed9c9e
20 1月, 2021 1 次提交
- L
  fix the bug of all_reduce pipeline gradient multiple times (#30437) · 8126a41d
  由 lilong12 提交于 1月 20, 2021
```
* update, test=develop
```
  8126a41d
18 1月, 2021 1 次提交
- H
  
  Ascend Framework Part3: Ascend Parser (#30391) · 9fec1618
  由 hutuxian 提交于 1月 18, 2021
  
  9fec1618
12 1月, 2021 2 次提交
- J
  
  Recompute Offload (#30233) · 75936d83
  由 JZ-LIANG 提交于 1月 12, 2021
  
  75936d83
- C
  【Paddle.Fleet】Support local save sparse param (#30175) · d479ae17
  由 Chengmo 提交于 1月 12, 2021
```
* add save tensor support
Co-authored-by: NseiriosPlus <tangwei12@baidu.com>
```
  d479ae17
08 1月, 2021 1 次提交
- C
  【Paddle.Fleet】Fix tensor table (#30075) · 528e03fc
  由 Chengmo 提交于 1月 08, 2021
```
* add tensor table
```
  528e03fc
05 1月, 2021 1 次提交
- W
  
  [fleet] combine amp and gradient merge, test=develop (#30086) · ab049978
  由 WangXi 提交于 1月 05, 2021
  
  ab049978
25 12月, 2020 1 次提交
- L
  fix the bug in pipeline data parallelism (#29731) · 01950ceb
  由 lilong12 提交于 12月 25, 2020
```
* update, test=develop
```
  01950ceb
24 12月, 2020 1 次提交

[Feature] one ps (3/4) (#29604) · 032414ca

由 tangwei12 提交于 12月 24, 2020

* oneps (3/4)
Co-authored-by: NMrChengmo <cmchengmo@163.com>
Co-authored-by: Nmalin10 <malin10@baidu.com>
Co-authored-by: Nchengmo <chengmo@baidu.com>

032414ca

17 12月, 2020 1 次提交
- W
  
  fleet sync build strategy, test=develop (#29732) · 9cbcc6ca
  由 WangXi 提交于 12月 17, 2020
  
  9cbcc6ca
11 12月, 2020 1 次提交

[Sharding] add hybrid-dp feature (#29518) · d33d468f

由 JZ-LIANG 提交于 12月 11, 2020

* Sharding add hybrid-dp feature

* update sharding in distributed_strategy

* update sharding unitest

* revise code format for sharding

d33d468f

30 11月, 2020 1 次提交
- W
  
  optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2
  由 WangXi 提交于 11月 30, 2020
  
  0c2a51d2
26 11月, 2020 2 次提交

[sharding] doc, api, bug fixed (#28983) · 0dadacc4

由 JZ-LIANG 提交于 11月 26, 2020

* add lars to fleet meta optimizer

* add lamb to proto

* add lamb to fleet meta optimizer

* fixed syntax bug

* fixed syntax bug

* fixed syntax error in lamb, add config setter of lamb in distributed_strategy

* trigger unitest to rerun

* add new unitest func for lamb

* revise unitest for lars and lamb

* revise dgc meta unitest

* revise lars document in distribute_strategy

* revise lars lamb document in distributed_strategy.py

* revise lars lamb document in distributed_strategy.py

* add weight decay exclude logic to lars

* restore optimzier.py

* restore optimizer.py as develop except lars

* add epsilon and exclude fn to distributed_sttrategy

* add lars epsilon

* revise unitest for fleet lars and lamb

* revise lars lamb unitest for CI coverage

* revise lars argument api

* revise lars argument api

* revise lars argument api

* revise api doc of lars

* fix op role

* add sharding save and add_sync_comm_for_test function

* add comm_analyse to utlis

* revise sharding_utils

* add sharding saving unittest

* revise sharding utils for unittest

* revise sharding en doc

* update sharding utils api

* add doc for sharding

* fixed bug in sharding var size count

* update varsize count in sharding

* fix sharding num_nccl_comm

* Revert "fix sharding num_nccl_comm"

This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.

0dadacc4

W

Fix multi nccl comm & wait server ready (#28663) · e931c7ba
由 WangXi 提交于 11月 26, 2020

e931c7ba

24 11月, 2020 1 次提交

Upgrade string literals to raw string (#28989) · 3815d7aa

由 Leo Chen 提交于 11月 24, 2020

* upgrade comment string to raw string

* fix string in

* fix string with ' '

* revert update on comments

* upgrade only necessary

* fix sample code checker

* fix comments with '''

3815d7aa

23 11月, 2020 1 次提交
- L
  enable pipeline to run with Executor.run() (#28373) · f77a78cd
  由 lilong12 提交于 11月 23, 2020
```
* update, test=develop
```
  f77a78cd
18 11月, 2020 1 次提交

[Sharding] add new features (#28568) · 5a9f6889

由 JZ-LIANG 提交于 11月 18, 2020

* add lars to fleet meta optimizer

* add lamb to proto

* add lamb to fleet meta optimizer

* fixed syntax bug

* fixed syntax bug

* fixed syntax error in lamb, add config setter of lamb in distributed_strategy

* trigger unitest to rerun

* add new unitest func for lamb

* revise unitest for lars and lamb

* revise dgc meta unitest

* revise lars document in distribute_strategy

* revise lars lamb document in distributed_strategy.py

* revise lars lamb document in distributed_strategy.py

* add weight decay exclude logic to lars

* restore optimzier.py

* restore optimizer.py as develop except lars

* add epsilon and exclude fn to distributed_sttrategy

* add lars epsilon

* revise unitest for fleet lars and lamb

* revise lars lamb unitest for CI coverage

* revise lars argument api

* revise lars argument api

* revise lars argument api

* revise api doc of lars

* fix op role

* add sharding save and add_sync_comm_for_test function

* add comm_analyse to utlis

* revise sharding_utils

* add sharding saving unittest

* revise sharding utils for unittest

5a9f6889

28 10月, 2020 1 次提交
- C
  【Paddle.Fleet】Fix fleetrun heter (#28252) · 4dc8c44b
  由 Chengmo 提交于 10月 28, 2020
```
* fix fleetrun heter ps on paddlecloud
```
  4dc8c44b
26 10月, 2020 1 次提交
- M
  add sharding strategy in fleet(#27900) · 81244fbf
  由 mapingshuo 提交于 10月 26, 2020
```
* add sharding
```
  81244fbf
16 10月, 2020 1 次提交
- W
  
  【paddle.fleet】fleet add _get_applied_meta_list and _get_applied_graph_list (#27952) · fb641c91
  由 WangXi 提交于 10月 16, 2020
  
  fb641c91
14 10月, 2020 1 次提交
- C
  【paddle.fleet】fix sparse load (#27680) · 328cb289
  由 Chengmo 提交于 10月 14, 2020
```
* add sparse tensor load method
```
  328cb289
13 10月, 2020 2 次提交

M
support gradient merge with recompute, test=develop (#27834) · 8d2cb14f
由 mapingshuo 提交于 10月 13, 2020
```
* support gradient merge with recompute, test=develop
test=develop
```
8d2cb14f

【paddle.fleet】Update fleetrun & ps-heter (#27472) · c5f2802d

由 Chengmo 提交于 10月 13, 2020

* refine fleetrun.ps_launch

* update fleet run for multi device support

* ps_graph support ps-gpu

* fix heter save

* add heter save unittest

* fix unittest & simple code

* update fleetrun

* fix fleetrun

* fix launch barrier

* fix role maker

* add paddlecloud rolemaker unittest

* rename heter_worker_device_guard

c5f2802d

12 10月, 2020 1 次提交
- W
  
  fleet combine amp dgc recompute meta optimizer (#27643) · 0a1862d1
  由 WangXi 提交于 10月 12, 2020
  
  0a1862d1
25 9月, 2020 1 次提交
- W
  
  fleet2.0 add fp16 grad compression (#27480) · e550fc02
  由 WangXi 提交于 9月 25, 2020
  
  e550fc02
20 9月, 2020 1 次提交

【paddle.fleet】Fix/role maker api fix (#27326) · d6b54de4

由 tangwei12 提交于 9月 20, 2020

* fix fleet util and gloo

* fix worker endpoints

* fix

* fix UT

* fix gloo

* fix gloo

* update gloo

* update gloo

* update gloo

* update gloo

* update gloo

* fix gloo wrapper for hdfs

* add file gloo and UT

* fix UT

* fix UT

* fix UT

* hide public method of RoleMaker

* fix UT

* GPU fleetrun support gloo

* parameterserver fleetrun support gloo

* add UT

* add UT

* fix UT

* fix get server endpoint

* fix get server endpoint

* fix UT

* hide public method of rolemaker

* hide public method of rolemaker

* hide public method of rolemaker

* Update test_fleet_rolemaker_new.py

* hide public method of rolemaker

* hide public method of rolemaker

d6b54de4

17 9月, 2020 1 次提交
- S
  
  fix comment of adaptive lsgd (#27362) · 746a8ded
  由 ShenLiang 提交于 9月 17, 2020
  
  746a8ded

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致