- 12 Jan 2023, 2 commits

Committed by Wen Sun
* refactor: migrate comm checks
* refactor: add check in comm context
* feat: add gloo static check
* refactor: add place param in static check

Committed by jameszhang
* Fix reduce func bug in process_group_bkcl. Also catch up with a recent process_group PR that failed to add an XPU branch. Note that reduce is still accomplished by allreduce on XPU; fix this once the xccl lib is updated.
* fix compile issue for non-XPU
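
The entry above notes that reduce is still emulated with allreduce on XPU. A minimal, self-contained sketch of that workaround follows; the "collective" is simulated in-process across mock ranks, and none of the names are Paddle's.

```cpp
// Sketch: emulate reduce-to-root with an allreduce and keep the result only
// on the root rank. Mock, in-process "ranks"; not ProcessGroupBKCL code.
#include <algorithm>
#include <cstdio>
#include <vector>

// Mock allreduce(sum): after the call, every rank holds the elementwise sum.
void AllReduceSum(std::vector<std::vector<float>>& rank_buffers) {
  std::vector<float> sum(rank_buffers[0].size(), 0.0f);
  for (const auto& buf : rank_buffers)
    for (size_t i = 0; i < buf.size(); ++i) sum[i] += buf[i];
  for (auto& buf : rank_buffers) buf = sum;
}

// Reduce emulated via allreduce: only `root` keeps the summed result; the
// other ranks clear their buffers (reduce leaves them unspecified anyway).
void ReduceViaAllReduce(std::vector<std::vector<float>>& rank_buffers, int root) {
  AllReduceSum(rank_buffers);
  for (int r = 0; r < static_cast<int>(rank_buffers.size()); ++r)
    if (r != root) std::fill(rank_buffers[r].begin(), rank_buffers[r].end(), 0.0f);
}

int main() {
  std::vector<std::vector<float>> buffers = {{1, 2}, {3, 4}, {5, 6}};  // 3 mock ranks
  ReduceViaAllReduce(buffers, /*root=*/0);
  std::printf("root buffer: %.0f %.0f\n", buffers[0][0], buffers[0][1]);  // prints 9 12
  return 0;
}
```

The cost is that every rank pays for a full allreduce, which is why the commit flags it as something to revisit once the xccl lib gains a native reduce.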

- 09 Jan 2023, 1 commit

Committed by LiYuRio
* comm_context and static init
* refactor: move to phi/core/distributed
* refactor: avoid mutable_data usage
* fix: windows sock
* fix: device without nccl
Co-authored-by: Wen Sun <syl1887415157@126.com>

- 06 Jan 2023, 1 commit

Committed by Wen Sun
* fix: fix hidden virtual funcs
* fix: add default impl
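
The "hidden virtual funcs" wording refers to C++ name hiding: overriding one overload in a derived class hides the base class's other overloads of the same name. A minimal illustration of the problem and the usual fix; the class and method names are illustrative only, not the actual ProcessGroup hierarchy.

```cpp
// Name hiding with virtual overloads: overriding one overload in a derived
// class hides every other Base overload of that name unless they are
// re-exposed (or given a reachable default implementation).
#include <iostream>

struct Base {
  virtual void Collective(int tensor) { std::cout << "Base(int)\n"; }
  virtual void Collective(int tensor, bool sync_op) { std::cout << "Base(int, bool)\n"; }
  virtual ~Base() = default;
};

struct Derived : Base {
  // Only this overload is overridden; without the using-declaration below,
  // d.Collective(42) would fail to compile through a Derived reference.
  void Collective(int tensor, bool sync_op) override { std::cout << "Derived(int, bool)\n"; }
  using Base::Collective;  // the fix: re-expose the hidden base overloads
};

int main() {
  Derived d;
  d.Collective(42);        // resolves to Base(int) thanks to the using-declaration
  d.Collective(42, true);  // Derived(int, bool)
  return 0;
}
```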

- 05 Jan 2023, 1 commit

Committed by Wen Sun
* refactor: use base class
* fix: incorrect deps
* fix: add missing header
* refactor: update class structures
* fix: bkcl typo
* fix: remove redundant def

- 19 Dec 2022, 1 commit

Committed by Wen Sun

- 17 Dec 2022, 1 commit

Committed by Wen Sun

- 16 Dec 2022, 1 commit

Committed by Wen Sun

- 15 Dec 2022, 1 commit

Committed by Wen Sun

- 14 Dec 2022, 1 commit

Committed by james
* nullptr bugfix for XPU pg mode. Also, a few kernels are added to the xpu whitelist.
* increase error msg length

- 12 Dec 2022, 1 commit

Committed by Wen Sun
* chore: unify `SingleTensor`
* feat: dynamic check

- 05 Dec 2022, 1 commit

Committed by ShenLiang

- 03 Dec 2022, 1 commit

Committed by Wen Sun
* refactor: classify static check
* refactor: rename to static_check & use forward decl
* refactor: switch to unary & binary funcs
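
For context, the unary/binary split groups the pre-launch validation by how many tensor arguments a collective takes. The following is a rough, self-contained sketch of what such static checks verify; the names and types are hypothetical and are not the actual static_check utilities.

```cpp
// Rough sketch of "static" (pre-launch) checks for collectives, split into a
// unary form (one tensor, e.g. allreduce/broadcast) and a binary form
// (output vs. input, e.g. allgather). Hypothetical names and types.
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

struct TensorMeta {
  std::vector<int64_t> dims;
  int dtype;   // dtype tag
  int device;  // device id the tensor lives on
};

int64_t Numel(const TensorMeta& t) {
  return std::accumulate(t.dims.begin(), t.dims.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Unary check: the single tensor must sit on this rank's device.
void CheckTensorUnary(const TensorMeta& t, int expected_device) {
  if (t.device != expected_device)
    throw std::invalid_argument("tensor is placed on the wrong device");
}

// Binary check: output and input must match in dtype/device, and the output
// numel must equal input numel * size_factor (e.g. world_size for allgather).
void CheckTensorBinary(const TensorMeta& out, const TensorMeta& in,
                       int expected_device, int64_t size_factor) {
  CheckTensorUnary(in, expected_device);
  CheckTensorUnary(out, expected_device);
  if (out.dtype != in.dtype)
    throw std::invalid_argument("input/output dtype mismatch");
  if (Numel(out) != Numel(in) * size_factor)
    throw std::invalid_argument("output numel must be input numel * world size");
}

int main() {
  TensorMeta in{{2, 3}, /*dtype=*/1, /*device=*/0};
  TensorMeta out{{4, 3}, /*dtype=*/1, /*device=*/0};
  CheckTensorBinary(out, in, /*expected_device=*/0, /*size_factor=*/2);  // passes
  return 0;
}
```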

- 24 Nov 2022, 1 commit

Committed by james
Note: this is a temporary solution; it should be replaced once the reduce kernel is natively supported on KL2.

- 23 Nov 2022, 1 commit

Committed by Wen Sun
* feat: static check

- 21 Nov 2022, 4 commits
- 19 Nov 2022, 1 commit

Committed by Wen Sun

- 18 Nov 2022, 3 commits

Committed by Wen Sun

Committed by james
* correct sync behavior for XPU distributed training. XPU supports an event mechanism similar to CUDA events, so it is advisable to use an event to sync the compute/comm streams for performance. However, this mechanism has never been fully tested, and inconsistent loss/ending_epochs have been reported. Therefore, this PR replaces event sync with stream waiting as a temporary solution.
* remove compile warning
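
For reference, the two synchronization strategies the message contrasts look roughly like this, sketched with the CUDA runtime API as a stand-in for the analogous XPU runtime calls; this is not the actual ProcessGroupBKCL code.

```cpp
// Two ways to order communication after compute work, sketched with the CUDA
// runtime as a stand-in for the analogous XPU event/stream API.
#include <cuda_runtime.h>

// (1) Event-based sync: record an event on the compute stream and make the
//     comm stream wait on it. Fully asynchronous on the host, but this is the
//     XPU path the commit backs out because it was never fully validated.
void SyncWithEvent(cudaStream_t compute, cudaStream_t comm) {
  cudaEvent_t ev;
  cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);
  cudaEventRecord(ev, compute);      // mark the point the comm stream must reach
  cudaStreamWaitEvent(comm, ev, 0);  // device-side wait; the host does not block
  cudaEventDestroy(ev);              // safe: the wait is already enqueued
}

// (2) Stream waiting: block the host until the compute stream drains, then
//     launch communication. Slower (host-side stall) but behaviorally safe;
//     this is the temporary replacement.
void SyncWithStreamWait(cudaStream_t compute) {
  cudaStreamSynchronize(compute);    // host blocks until all compute work finishes
}
```

cudaStreamWaitEvent only enqueues the dependency, so the host never stalls in (1); that asynchrony is where the performance difference comes from.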

Committed by james
* fix device id issue for xpu eager. The xpu device id is not correctly set in eager mode, so vars stay on dev 0 unless XPUDeviceGuard is called, leading to this error message on every node with rank != 0: "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
* fix typo
* fix pybind error
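
The fix hinges on the RAII device-guard pattern: set the active device for the current rank on entry and restore it on exit, so allocations no longer default to device 0. Below is a self-contained mock of the pattern; it is not Paddle's XPUDeviceGuard, and the runtime here is simulated.

```cpp
// Mock of the RAII device-guard pattern: make `target` the active device for
// the guard's scope, then restore the previous device. Illustrative only.
#include <cassert>

namespace mock_runtime {            // stand-in for the XPU/CUDA runtime
int g_current_device = 0;
void SetDevice(int id) { g_current_device = id; }
int GetDevice() { return g_current_device; }
}  // namespace mock_runtime

class DeviceGuard {
 public:
  explicit DeviceGuard(int target) : previous_(mock_runtime::GetDevice()) {
    mock_runtime::SetDevice(target);                     // allocations now target this device
  }
  ~DeviceGuard() { mock_runtime::SetDevice(previous_); } // restore on scope exit
 private:
  int previous_;
};

int main() {
  // rank 1 should allocate on device 1; the guard scopes the switch
  {
    DeviceGuard guard(/*target=*/1);
    assert(mock_runtime::GetDevice() == 1);  // allocations here would go to dev 1
  }
  assert(mock_runtime::GetDevice() == 0);    // restored after the guard
  return 0;
}
```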

- 17 Nov 2022, 1 commit

Committed by Wen Sun

- 16 Nov 2022, 1 commit

Committed by Wen Sun
* refactor: update pg custom
* fix: use new api in ut
* fix: typo
* revert: recover legacy apis
* fix: add GetDeviceContext

- 14 Nov 2022, 3 commits
- 10 Nov 2022, 2 commits

Committed by james
* XPU support eager mode
* add unittest for XPU eager mode
* minor bugfix
* minor bugfix, test=kunlun
* correct copyright info
* 1. remove unused vars/funcs 2. ProcessGroupBKCL inherits from ProcessGroupStream
* bugfix for fp16 in eager mode multi-card, test=kunlun
* rebase & fix a few issues
* use new processgroup interface, test=kunlun
* fix compile issue, test=kunlun

Committed by Wen Sun
* refactor: send, recv, send_partial, recv_partial
* refactor: rm useless const ref

- 09 Nov 2022, 1 commit

Committed by Wen Sun

- 08 Nov 2022, 1 commit

Committed by LiYuRio

- 07 Nov 2022, 1 commit

Committed by Wen Sun

- 04 Nov 2022, 2 commits
- 01 Nov 2022, 1 commit

Committed by Yuang Liu

- 31 Oct 2022, 1 commit

Committed by ronnywang
* [CustomDevice] GetCCLComm add custom device support
* update
* update
* update

- 28 Oct 2022, 2 commits

Committed by Haohongxiang

Committed by Haohongxiang
* fix no sync bugs
* update
* update task chain
* fix: update wait chain
* feat: add `GetDeviceContext` for gloo
* fix oom
* fix dev
* update
* update
Co-authored-by: LiYuRio <liyuruijx@163.com>
Co-authored-by: ForFishes <2282912238@qq.com>

- 17 Oct 2022, 1 commit

Committed by Ghost Screaming
* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.
* support pure bfloat16
* support bf16 linear
* update PR to pass CI
* tiny fix where_grad_kernel.cu
* Support bfloat16 type for reducer and sharding.
* Fix some bugs.
* Polish code.
* Polish code.
* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>
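
The reduce_sum failure mode described above is the classic one where per-element offsets are carried in 32-bit integers, so any tensor with more than INT32_MAX elements wraps the index. A tiny illustration of the wrap itself (not the actual kernel code) follows.

```cpp
// Why numel() > INT32_MAX breaks a kernel that indexes with 32-bit integers:
// the offset wraps (typically to a negative value), so elements are skipped
// or read out of bounds. Keeping offsets in int64_t avoids the wrap.
#include <cstdint>
#include <cstdio>
#include <limits>

int main() {
  const int64_t numel = static_cast<int64_t>(std::numeric_limits<int32_t>::max()) + 5;

  int32_t bad_offset = static_cast<int32_t>(numel);  // wraps on conversion
  int64_t good_offset = numel;                       // stays exact

  std::printf("numel      = %lld\n", static_cast<long long>(numel));
  std::printf("as int32_t = %d (wrapped)\n", bad_offset);
  std::printf("as int64_t = %lld\n", static_cast<long long>(good_offset));
  return 0;
}
```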