- 29 Mar 2023 (1 commit)

  Committed by QingshuChen

- 16 Mar 2023 (1 commit)

  Committed by Huang Jiyi
  * remove contexts in tensor_utils
  * update from_blob
  * fix bug

- 27 Feb 2023 (1 commit)

  Committed by jameszhang
  * [kunlun] support reduce_scatter
  * uncomment unittest
  * update xccl to 1.0.10

- 20 Jan 2023 (1 commit)

  Committed by jameszhang
  * update xccl lib & use native Reduce in dygraph
  * minor

- 18 Jan 2023 (2 commits)

  Committed by jameszhang

  Committed by jameszhang
  * revert to using the default XPU stream for computing. XPUContext now has a null stream by default; to use a separate stream (e.g. in async collective communication), create a dedicated XPUContext and invoke its XPUContext::CreateStream().
  * minor
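
A minimal sketch of the usage this message describes, assuming a phi::XPUContext constructible from a place; only CreateStream() is confirmed by the commit message, the rest is illustrative:

```cpp
#include "paddle/phi/backends/xpu/xpu_context.h"  // assumed header location

// Sketch only: by default XPUContext computes on the null stream.
// For async collective communication, create a dedicated context and
// give it its own stream via CreateStream().
void MakeDedicatedCommContext() {
  phi::XPUContext comm_ctx(phi::XPUPlace(0));  // construction assumed
  comm_ctx.CreateStream();  // named in the commit message: replaces the
                            // default null stream with a dedicated one
  // ... issue collective communication on comm_ctx ...
}
```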

- 15 Jan 2023 (1 commit)

  Committed by Roc
  1. update xccl lib
  2. when using comm_ctx, the allocator should be set manually
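
A hedged sketch of what "set the allocator manually" could look like; SetAllocator and AllocatorFacade follow Paddle's general device-context interfaces, but this wiring is an assumption, not the commit's actual code:

```cpp
#include "paddle/fluid/memory/allocation/allocator_facade.h"  // assumed header
#include "paddle/phi/backends/xpu/xpu_context.h"

// Sketch only: a comm context that owns its own XPUContext is not wired
// to an allocator by the global DeviceContextPool, so kernels running on
// it cannot allocate memory unless the allocator is set by hand.
void WireCommContextAllocator(const phi::XPUPlace& place,
                              phi::XPUContext* comm_ctx) {
  comm_ctx->SetAllocator(  // hypothetical wiring
      paddle::memory::allocation::AllocatorFacade::Instance()
          .GetAllocator(place)
          .get());
}
```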

- 12 Jan 2023 (1 commit)

  Committed by jameszhang
  * Fix reduce-func bug in process_group_bkcl; also catch up with a recent process_group PR that failed to add an XPU branch. Note that reduce is still accomplished by allreduce on XPU; fix this once the xccl lib is updated.
  * fix compile issue for non-XPU
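
As background, a self-contained sketch of the reduce-by-allreduce workaround mentioned above; the callback stands in for the collective library, and nothing here is the actual ProcessGroupBKCL code:

```cpp
#include <functional>
#include <vector>

// Emulate Reduce with AllReduce when the collective library lacks a
// native reduce. Every rank pays for a full allreduce; only the root is
// supposed to observe the result.
void ReduceViaAllReduce(
    std::vector<float>& buf, int rank, int root,
    const std::function<void(std::vector<float>&)>& all_reduce_sum) {
  all_reduce_sum(buf);  // every rank now holds the summed values
  if (rank != root) {
    // A faithful emulation could discard or restore the buffer here,
    // since non-root ranks should not rely on the reduced result.
  }
}
```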

- 09 Jan 2023 (1 commit)

  Committed by LiYuRio
  * comm_context and static init
  * refactor: move to phi/core/distributed
  * refactor: avoid mutable_data usage
  * fix: windows sock
  * fix: device without nccl

  Co-authored-by: Wen Sun <syl1887415157@126.com>

- 05 Jan 2023 (1 commit)

  Committed by Wen Sun
  * refactor: use base class
  * fix: incorrect deps
  * fix: add missing header
  * refactor: update class structures
  * fix: bkcl typo
  * fix: remove redundant def

- 19 Dec 2022 (1 commit)

  Committed by Wen Sun

- 17 Dec 2022 (1 commit)

  Committed by Wen Sun

- 14 Dec 2022 (1 commit)

  Committed by james
  * nullptr bugfix for XPU pg mode; also, a few kernels are added to the XPU whitelist
  * increase error msg length

- 24 Nov 2022 (1 commit)

  Committed by james
  Note: this is a temporary solution and should be replaced once the reduce kernel is natively supported on KL2.

- 21 Nov 2022 (3 commits)

- 18 Nov 2022 (2 commits)

  Committed by james
  * correct sync behavior for XPU distributed training. XPU supports an event mechanism similar to CUDA events, so using an event to sync the compute/comm streams is advisable for performance. However, this mechanism was never fully tested, and inconsistent loss/ending epochs were reported, so this PR replaces event sync with stream waiting as a temporary solution.
  * remove compile warning
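
For context, a minimal sketch of the two synchronization styles this message contrasts; all types and functions are illustrative stand-ins, since the actual XPU runtime calls are not shown in the log:

```cpp
// Illustrative stand-ins for runtime primitives.
struct Stream {};
struct Event {};

void EventRecord(Event&, Stream&) { /* mark a point on the compute stream */ }
void StreamWaitEvent(Stream&, Event&) { /* comm stream waits; host does not block */ }
void StreamSynchronize(Stream&) { /* host blocks until the stream drains */ }

// Event-based sync (the approach being reverted): cheap and asynchronous,
// but reportedly produced inconsistent results on XPU.
void SyncWithEvent(Stream& compute, Stream& comm, Event& ev) {
  EventRecord(ev, compute);
  StreamWaitEvent(comm, ev);
}

// Stream waiting (the temporary replacement): simple and safe, at the
// cost of blocking the host until compute work finishes.
void SyncWithStreamWait(Stream& compute) { StreamSynchronize(compute); }
```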

  Committed by james
  * fix device id issue for XPU eager mode. The XPU device id is not correctly set in eager mode, so variables land on device 0 unless XPUDeviceGuard is called, leading to this error on every node with rank != 0: "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
  * fix typo
  * fix pybind error
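
A short sketch of the RAII device-guard pattern the fix relies on; XPUDeviceGuard is named in the commit message, while the header path and namespace are assumptions:

```cpp
#include "paddle/phi/backends/xpu/xpu_info.h"  // assumed header for XPUDeviceGuard

void RunOnRankDevice(int rank_device_id) {
  // Sets the current XPU device for this scope and restores the previous
  // one on destruction. Without such a guard, eager-mode allocations
  // defaulted to xpu:0 on every rank, producing the error quoted above.
  phi::backends::xpu::XPUDeviceGuard guard(rank_device_id);
  // ... create tensors / launch kernels: they now target rank_device_id ...
}
```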

- 14 Nov 2022 (2 commits)

- 10 Nov 2022 (1 commit)

  Committed by james
  * XPU support for eager mode
  * add unittest for XPU eager mode
  * minor bugfix
  * minor bugfix, test=kunlun
  * correct copyright info
  * remove unused vars/funcs; make ProcessGroupBKCL inherit from ProcessGroupStream
  * bugfix for fp16 in eager-mode multi-card, test=kunlun
  * rebase & fix a few issues
  * use new processgroup interface, test=kunlun
  * fix compile issue, test=kunlun