- 29 Mar 2023 (1 commit)

  Committed by QingshuChen

- 16 Mar 2023 (1 commit)

  Committed by Huang Jiyi
  * remove contexts in tensor_utils
  * update from_blob
  * fix bug

- 27 Feb 2023 (1 commit)

  Committed by jameszhang
  * [kunlun] support reduce_scatter
  * uncomment unittest
  * update xccl to 1.0.10

- 20 Jan 2023 (1 commit)

  Committed by jameszhang
  * update xccl lib & use native Reduce in dygraph
  * minor

- 18 Jan 2023 (2 commits)

  Committed by jameszhang

  Committed by jameszhang
  * revert to using the default XPU stream for computing. XPUContext now has a null stream by default; to use a separate stream (e.g. in async collective communication), create a dedicated XPUContext and invoke its XPUContext::CreateStream().
  * minor
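
A minimal sketch of the usage this message describes, assuming a phi::XPUContext constructible from a place; only CreateStream() is confirmed by the commit message, the rest is illustrative:

```cpp
#include "paddle/phi/backends/xpu/xpu_context.h"  // assumed header location

// Sketch only: by default XPUContext computes on the null stream.
// For async collective communication, create a dedicated context and
// give it its own stream via CreateStream().
void MakeDedicatedCommContext() {
  phi::XPUContext comm_ctx(phi::XPUPlace(0));  // construction assumed
  comm_ctx.CreateStream();  // named in the commit message: replaces the
                            // default null stream with a dedicated one
  // ... issue collective communication on comm_ctx ...
}
```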

- 15 Jan 2023 (1 commit)

  Committed by Roc
  1. update xccl lib
  2. when using comm_ctx, the allocator should be set manually
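
A hedged sketch of what "set the allocator manually" could look like; SetAllocator and AllocatorFacade follow Paddle's general device-context interfaces, but this wiring is an assumption, not the commit's actual code:

```cpp
#include "paddle/fluid/memory/allocation/allocator_facade.h"  // assumed header
#include "paddle/phi/backends/xpu/xpu_context.h"

// Sketch only: a comm context that owns its own XPUContext is not wired
// to an allocator by the global DeviceContextPool, so kernels running on
// it cannot allocate memory unless the allocator is set by hand.
void WireCommContextAllocator(const phi::XPUPlace& place,
                              phi::XPUContext* comm_ctx) {
  comm_ctx->SetAllocator(  // hypothetical wiring
      paddle::memory::allocation::AllocatorFacade::Instance()
          .GetAllocator(place)
          .get());
}
```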

- 12 Jan 2023 (1 commit)

  Committed by jameszhang
  * Fix reduce-func bug in process_group_bkcl; also catch up with a recent process_group PR that failed to add an XPU branch. Note that reduce is still accomplished by allreduce on XPU; fix this once the xccl lib is updated.
  * fix compile issue for non-XPU
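
As background, a self-contained sketch of the reduce-by-allreduce workaround mentioned above; the callback stands in for the collective library, and nothing here is the actual ProcessGroupBKCL code:

```cpp
#include <functional>
#include <vector>

// Emulate Reduce with AllReduce when the collective library lacks a
// native reduce. Every rank pays for a full allreduce; only the root is
// supposed to observe the result.
void ReduceViaAllReduce(
    std::vector<float>& buf, int rank, int root,
    const std::function<void(std::vector<float>&)>& all_reduce_sum) {
  all_reduce_sum(buf);  // every rank now holds the summed values
  if (rank != root) {
    // A faithful emulation could discard or restore the buffer here,
    // since non-root ranks should not rely on the reduced result.
  }
}
```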

- 09 Jan 2023 (1 commit)

  Committed by LiYuRio
  * comm_context and static init
  * refactor: move to phi/core/distributed
  * refactor: avoid mutable_data usage
  * fix: windows sock
  * fix: device without nccl

  Co-authored-by: Wen Sun <syl1887415157@126.com>

- 05 Jan 2023 (1 commit)

  Committed by Wen Sun
  * refactor: use base class
  * fix: incorrect deps
  * fix: add missing header
  * refactor: update class structures
  * fix: bkcl typo
  * fix: remove redundant def

- 19 Dec 2022 (1 commit)

  Committed by Wen Sun

- 17 Dec 2022 (1 commit)

  Committed by Wen Sun

- 14 Dec 2022 (1 commit)

  Committed by james
  * nullptr bugfix for XPU pg mode; also, a few kernels are added to the XPU whitelist
  * increase error msg length

- 24 Nov 2022 (1 commit)

  Committed by james
  Note: this is a temporary solution and should be replaced once the reduce kernel is natively supported on KL2.

- 21 Nov 2022 (3 commits)

- 18 Nov 2022 (2 commits)

  Committed by james
  * correct sync behavior for XPU distributed training. XPU supports an event mechanism similar to CUDA events, so using an event to sync the compute/comm streams is advisable for performance. However, this mechanism was never fully tested, and inconsistent loss/ending epochs were reported, so this PR replaces event sync with stream waiting as a temporary solution.
  * remove compile warning
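
For context, a minimal sketch of the two synchronization styles this message contrasts; all types and functions are illustrative stand-ins, since the actual XPU runtime calls are not shown in the log:

```cpp
// Illustrative stand-ins for runtime primitives.
struct Stream {};
struct Event {};

void EventRecord(Event&, Stream&) { /* mark a point on the compute stream */ }
void StreamWaitEvent(Stream&, Event&) { /* comm stream waits; host does not block */ }
void StreamSynchronize(Stream&) { /* host blocks until the stream drains */ }

// Event-based sync (the approach being reverted): cheap and asynchronous,
// but reportedly produced inconsistent results on XPU.
void SyncWithEvent(Stream& compute, Stream& comm, Event& ev) {
  EventRecord(ev, compute);
  StreamWaitEvent(comm, ev);
}

// Stream waiting (the temporary replacement): simple and safe, at the
// cost of blocking the host until compute work finishes.
void SyncWithStreamWait(Stream& compute) { StreamSynchronize(compute); }
```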

  Committed by james
  * fix device id issue for XPU eager mode. The XPU device id is not correctly set in eager mode, so variables land on device 0 unless XPUDeviceGuard is called, leading to this error on every node with rank != 0: "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
  * fix typo
  * fix pybind error
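
A short sketch of the RAII device-guard pattern the fix relies on; XPUDeviceGuard is named in the commit message, while the header path and namespace are assumptions:

```cpp
#include "paddle/phi/backends/xpu/xpu_info.h"  // assumed header for XPUDeviceGuard

void RunOnRankDevice(int rank_device_id) {
  // Sets the current XPU device for this scope and restores the previous
  // one on destruction. Without such a guard, eager-mode allocations
  // defaulted to xpu:0 on every rank, producing the error quoted above.
  phi::backends::xpu::XPUDeviceGuard guard(rank_device_id);
  // ... create tensors / launch kernels: they now target rank_device_id ...
}
```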

- 14 Nov 2022 (2 commits)

- 10 Nov 2022 (1 commit)

  Committed by james
  * XPU support for eager mode
  * add unittest for XPU eager mode
  * minor bugfix
  * minor bugfix, test=kunlun
  * correct copyright info
  * remove unused vars/funcs; make ProcessGroupBKCL inherit from ProcessGroupStream
  * bugfix for fp16 in eager-mode multi-card, test=kunlun
  * rebase & fix a few issues
  * use new processgroup interface, test=kunlun
  * fix compile issue, test=kunlun