Commits · d4f43ad4ced51f9eb4979172d6a4be80090ac530 · PaddlePaddle / Paddle

17 Dec, 2022 · 1 commit
- refactor: rename xccl files (#49127) · d4f43ad4
  Committed by Wen Sun on Dec 17, 2022
16 Dec, 2022 · 1 commit
- refactor: rename files (#49117) · 40f3f4f0
  Committed by Wen Sun on Dec 16, 2022
15 Dec, 2022 · 1 commit
- fix: gloo compatible (#49084) · 3fec7a6e
  Committed by Wen Sun on Dec 15, 2022
14 Dec, 2022 · 1 commit

nullptr bugfix for XPU pg mode (#49043) · f0dab193

Committed by james on Dec 14, 2022

* nullptr bugfix for XPU pg mode

Also a few kernels is added to xpu whitelist

* increase error msg length

12 Dec, 2022 · 1 commit
- Add dynamic checks for collective communication on NCCL (#48915) · e7711592
  Committed by Wen Sun on Dec 12, 2022
```
* chore: unify `SingleTensor`

* feat: dynamic check
```
05 Dec, 2022 · 1 commit
- fix bug of reducer in best_fit (#48668) · cee7a3db
  Committed by ShenLiang on Dec 05, 2022
03 Dec, 2022 · 1 commit

Refactor collective communication static check (#48646) · 4552be48

Committed by Wen Sun on Dec 03, 2022

* refactor: classify static check

* refactor: rename to static_check & use forward decl

* refactor: switch to unary & binary funcs

24 Nov, 2022 · 1 commit

processgroup bkcl support reduce (#48232) · 5f995d3f

Committed by james on Nov 24, 2022

Note: this is a temporary solution, should be replaced once reduce kernel
is natively supported on KL2

23 Nov, 2022 · 1 commit
- Add static checks for collective communication on NCCL (#48256) · d828ca46
  Committed by Wen Sun on Nov 23, 2022
```
* feat: static check
```
21 Nov, 2022 · 4 commits
- Fix Ctx Dev pointer for KUNLUN (#48184) · 2d0fb059
  Committed by Roc on Nov 21, 2022
- Unify `ProcessGroupNCCL` APIs underlying implementation (#48163) · 88410225
  Committed by Wen Sun on Nov 21, 2022
```
* refactor: replace Collective & PointToPoint with NCCLEnv

* refactor: rename to RunFnInNCCLEnv

* refactor: pass std::function by value
```
- add new map instance (#48145) · 2a47416c
  Committed by LiYuRio on Nov 21, 2022
- return pointer rather than reference (#48152) · 403d58bb
  Committed by LiYuRio on Nov 21, 2022
19 Nov, 2022 · 1 commit
- refactor: rm redundant funcs (#48149) · f38e09f0
  Committed by Wen Sun on Nov 19, 2022
18 Nov, 2022 · 3 commits

Refactor collective communication reduce, scatter, reduce_scatter C++ API (#48115) · edda13cd

Committed by Wen Sun on Nov 18, 2022

correct sync behavior for XPU distributed training (#47882) · aafa9820

Committed by james on Nov 18, 2022

* correct sync behavior for XPU distributed training

XPU support event mechanism similar to cuda event, so it is advisable to
use an event to sync compute/comm streams for performance. However this
mechanism is never fully tested, and inconsistent loss/ending_epochs are
reported. Therefore, this PR replaces event sync with stream waiting as
a temporary solution.

* remove compile warning

fix device id issue for xpu eager mode (#48076) · 3b18d96b

Committed by james on Nov 18, 2022

* fix device id issue for xpu eager

xpu device id is not correctly set in eager mode, thus vars are on dev0 unless
XPUDeviceGurad is called, leading to this error message for all node rank != 0:
"NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."

* fix typo

* fix pybind error

17 Nov, 2022 · 1 commit
- Refactor collective communication all_to_all, all_to_all_single C++ API (#48059) · 3f480af2
  Committed by Wen Sun on Nov 17, 2022
16 Nov, 2022 · 1 commit

Update `ProcessGroupCustom` for `sync_op` compatibility (#47976) · e4ebf383

Committed by Wen Sun on Nov 16, 2022

* refactor: update pg custom

* fix: use new api in ut

* fix: typo

* revert: recover legacy apis

* fix: add GetDeviceContext

14 Nov, 2022 · 3 commits
- Refactor collective communication send_partial, recv_partial, all_gather_partial C++ API (#47863) · 25e63dca
  Committed by Wen Sun on Nov 14, 2022
```
* refactor: simplify send, recv interfaces

* refactor: rm send_partial, recv_partial, all_gather_partial
```
- Remove place for process group (#47857) · 2d383b81
  Committed by LiYuRio on Nov 14, 2022
- remove heter and hccl (#47918) · 9191e743
  Committed by LiYuRio on Nov 14, 2022
10 Nov, 2022 · 2 commits

XPU multi-card support eager mode (#47445) · 3b91f8f3

Committed by james on Nov 10, 2022

* XPU support eager mode

* add unittest for XPU eager mode

* minor bugfix

* minor bugfix, test=kunlun

* correct copyright info

* 1. remove unsed vars/funcs
2. ProcessGroupBKCL inherit from ProcessGroupStream

* bugfix for fp16 in eager mode multi-card, test=kunlun

* rebase & fix a few issues

* use new processgroup interface, test=kunlun

* fix compile issue, test=kunlun

Refactor collective communication P2P C++ API (#47801) · d926c270
Committed by Wen Sun on Nov 10, 2022
```
* refactor: send, recv, send_partial, recv_partial

* refactor: rm useless const ref
```
09 Nov, 2022 · 1 commit
- refactor: ProcessGroupNCCL (#47740) · ae14bad1
  Committed by Wen Sun on Nov 09, 2022
08 Nov, 2022 · 1 commit
- refine comm api implementation (#47713) · 84c9a0d6
  Committed by LiYuRio on Nov 08, 2022
07 Nov, 2022 · 1 commit
- Refactor collective communication all_gather, all_reduce, broadcast & barrier C++ API (#47481) · e1a1c354
  Committed by Wen Sun on Nov 07, 2022
04 Nov, 2022 · 2 commits
- move broadcast, reduce, send, recv, reduce_scatter, scatter, alltoall (#47255) · 99504cbb
  Committed by LiYuRio on Nov 04, 2022
- remove global var (#47659) · 7fe7eebc
  Committed by LiYuRio on Nov 04, 2022
01 Nov, 2022 · 1 commit
- fix p2p comm memory release logic (#47497) · f82d7e3c
  Committed by Yuang Liu on Nov 01, 2022
31 Oct, 2022 · 1 commit
- [CustomDevice] GetCCLComm add custom device support (#47168) · 34d13d6a
  Committed by ronnywang on Oct 31, 2022
```
* [CustomDevice] GetCCLComm add custom device support

* update

* update

* update
```
28 Oct, 2022 · 2 commits
- [Dygraph] Finish fixing mem bugs of no sync in DataParallel (#47444) · e77c062e
  Committed by Haohongxiang on Oct 28, 2022
- [Dygraph] Fix memory bugs of no sync and SplitTensors in DataParallel (#47369) · 57d5ffa5
  Committed by Haohongxiang on Oct 28, 2022
```
* fix no sync bugs

* update

* update task chain

fix: update wait chain

feat: add `GetDeviceContext` for gloo

* fix oom

* fix dev

* update

* update
Co-authored-by: LiYuRio <liyuruijx@163.com>
Co-authored-by: ForFishes <2282912238@qq.com>
```
17 Oct, 2022 · 1 commit

Support BF16 training for sharding (#46846) · 0b39b244

Committed by Ghost Screaming on Oct 17, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polise code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

11 Oct, 2022 · 2 commits
- Support both use_calc_stream and sync_op in collective communication API (#46761) · f94edc3b
  Committed by Wen Sun on Oct 11, 2022
- Completes bfloat16 dtype for collective api in eager mode (#45844) · e4eb8d36
  Committed by Wen Sun on Oct 11, 2022
10 Oct, 2022 · 1 commit
- Move group and all reduce from collective to communication (#45848) · a0dffd39
  Committed by LiYuRio on Oct 10, 2022
08 Oct, 2022 · 1 commit
- [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116) · 8c0529fd
  Committed by Haohongxiang on Oct 08, 2022
30 Sep, 2022 · 1 commit
- Support both use_calc_stream and sync_op in allgather API (#46295) · ecae7b31
  Committed by Wen Sun on Sep 30, 2022
29 Sep, 2022 · 1 commit
- fix mpi include bug (#46601) · 7057093e
  Committed by Xinger on Sep 29, 2022

PaddlePaddle / Paddle · synced successfully about 1 year ago
