Commits · edda13cd88b269c932e1d8fafa5a6fabbbda72a2 · PaddlePaddle / Paddle

18 Nov, 2022 3 commits

Refactor collective communication reduce, scatter, reduce_scatter C++ API (#48115) · edda13cd
Committed by Wen Sun on Nov 18, 2022

edda13cd

correct sync behavior for XPU distributed training (#47882) · aafa9820

Committed by james on Nov 18, 2022

* correct sync behavior for XPU distributed training

XPU supports an event mechanism similar to CUDA events, so it is advisable to
use an event to sync compute/comm streams for performance. However, this
mechanism has never been fully tested, and inconsistent loss/ending_epochs have
been reported. Therefore, this PR replaces event sync with stream waiting as
a temporary solution.

* remove compile warning

aafa9820

fix device id issue for xpu eager mode (#48076) · 3b18d96b

Committed by james on Nov 18, 2022

* fix device id issue for xpu eager

The XPU device id is not correctly set in eager mode, so vars are placed on dev0 unless
XPUDeviceGuard is called, leading to this error message on all nodes with rank != 0:
"NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."

* fix typo

* fix pybind error

3b18d96b

17 Nov, 2022 1 commit
- Refactor collective communication all_to_all, all_to_all_single C++ API (#48059) · 3f480af2
  Committed by Wen Sun on Nov 17, 2022
  
  3f480af2
16 Nov, 2022 1 commit

Update `ProcessGroupCustom` for `sync_op` compatibility (#47976) · e4ebf383

Committed by Wen Sun on Nov 16, 2022

* refactor: update pg custom

* fix: use new api in ut

* fix: typo

* revert: recover legacy apis

* fix: add GetDeviceContext

e4ebf383

14 Nov, 2022 3 commits
- Refactor collective communication send_partial, recv_partial, all_gather_partial C++ API (#47863) · 25e63dca
  Committed by Wen Sun on Nov 14, 2022
```
* refactor: simplify send, recv interfaces

* refactor: rm send_partial, recv_partial, all_gather_partial
```
  25e63dca
- Remove place for process group (#47857) · 2d383b81
  Committed by LiYuRio on Nov 14, 2022
  
  2d383b81
- remove heter and hccl (#47918) · 9191e743
  Committed by LiYuRio on Nov 14, 2022
  
  9191e743
10 Nov, 2022 2 commits

XPU multi-card support eager mode (#47445) · 3b91f8f3

Committed by james on Nov 10, 2022

* XPU support eager mode

* add unittest for XPU eager mode

* minor bugfix

* minor bugfix, test=kunlun

* correct copyright info

* 1. remove unused vars/funcs
2. ProcessGroupBKCL inherits from ProcessGroupStream

* bugfix for fp16 in eager mode multi-card, test=kunlun

* rebase & fix a few issues

* use new processgroup interface, test=kunlun

* fix compile issue, test=kunlun

3b91f8f3

Refactor collective communication P2P C++ API (#47801) · d926c270
Committed by Wen Sun on Nov 10, 2022
```
* refactor: send, recv, send_partial, recv_partial

* refactor: rm useless const ref
```
d926c270

09 Nov, 2022 1 commit
- refactor: ProcessGroupNCCL (#47740) · ae14bad1
  Committed by Wen Sun on Nov 09, 2022
  
  ae14bad1
08 Nov, 2022 1 commit
- refine comm api implementation (#47713) · 84c9a0d6
  Committed by LiYuRio on Nov 08, 2022
  
  84c9a0d6
07 Nov, 2022 1 commit
- Refactor collective communication all_gather, all_reduce, broadcast & barrier C++ API (#47481) · e1a1c354
  Committed by Wen Sun on Nov 07, 2022
  
  e1a1c354
04 Nov, 2022 2 commits
- move broadcast, reduce, send, recv, reduce_scatter, scatter, alltoall (#47255) · 99504cbb
  Committed by LiYuRio on Nov 04, 2022
  
  99504cbb
- remove global var (#47659) · 7fe7eebc
  Committed by LiYuRio on Nov 04, 2022
  
  7fe7eebc
01 Nov, 2022 1 commit
- fix p2p comm memory release logic (#47497) · f82d7e3c
  Committed by Yuang Liu on Nov 01, 2022
  
  f82d7e3c
31 Oct, 2022 1 commit
- [CustomDevice] GetCCLComm add custom device support (#47168) · 34d13d6a
  Committed by ronnywang on Oct 31, 2022
```
* [CustomDevice] GetCCLComm add custom device support

* update

* update

* update
```
  34d13d6a
28 Oct, 2022 2 commits
- [Dygraph] Finish fixing mem bugs of no sync in DataParallel (#47444) · e77c062e
  Committed by Haohongxiang on Oct 28, 2022
  
  e77c062e
- [Dygraph] Fix memory bugs of no sync and SplitTensors in DataParallel (#47369) · 57d5ffa5
  Committed by Haohongxiang on Oct 28, 2022
```
* fix no sync bugs

* update

* update task chain

fix: update wait chain

feat: add `GetDeviceContext` for gloo

* fix oom

* fix dev

* update

* update
Co-authored-by: LiYuRio <liyuruijx@163.com>
Co-authored-by: ForFishes <2282912238@qq.com>
```
  57d5ffa5
17 Oct, 2022 1 commit

Support BF16 training for sharding (#46846) · 0b39b244

Committed by Ghost Screaming on Oct 17, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bug.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

0b39b244

11 Oct, 2022 2 commits
- Support both use_calc_stream and sync_op in collective communication API (#46761) · f94edc3b
  Committed by Wen Sun on Oct 11, 2022
  
  f94edc3b
- Completes bfloat16 dtype for collective api in eager mode (#45844) · e4eb8d36
  Committed by Wen Sun on Oct 11, 2022
  
  e4eb8d36
10 Oct, 2022 1 commit
- Move group and all reduce from collective to communication (#45848) · a0dffd39
  Committed by LiYuRio on Oct 10, 2022
  
  a0dffd39
08 Oct, 2022 1 commit
- [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116) · 8c0529fd
  Committed by Haohongxiang on Oct 08, 2022
  
  8c0529fd
30 Sep, 2022 1 commit
- Support both use_calc_stream and sync_op in allgather API (#46295) · ecae7b31
  Committed by Wen Sun on Sep 30, 2022
  
  ecae7b31
29 Sep, 2022 1 commit
- fix mpi include bug (#46601) · 7057093e
  Committed by Xinger on Sep 29, 2022
  
  7057093e
21 Sep, 2022 1 commit
- Mpi final dev simple (#46247) · 9ce31e96
  Committed by wuhuachaocoding on Sep 21, 2022
  
  9ce31e96
16 Sep, 2022 1 commit
- Support both use_calc_stream and sync_op in send recv APIs (#46023) · ae00f428
  Committed by Wen Sun on Sep 16, 2022
  
  ae00f428
07 Sep, 2022 1 commit
- add device context getter (#45790) · b7d219be
  Committed by LiYuRio on Sep 07, 2022
  
  b7d219be
06 Sep, 2022 1 commit
- Completes basic dtypes for collective api in eager mode (#45574) · 7a92e74b
  Committed by Wen Sun on Sep 06, 2022
  
  7a92e74b
01 Sep, 2022 1 commit
- Lazy initialize dense_contents_ in reducer (#45631) · 196b0187
  Committed by sneaxiy on Sep 01, 2022
```
* make dense_contents_ lazy init

* update legacy dygraph

* fix legacy dygraph bug
```
  196b0187
31 Aug, 2022 1 commit
- add stream.all_reduce API and ProcessGroupStream (#45282) · ce4775cd
  Committed by LiYuRio on Aug 31, 2022
  
  ce4775cd
26 Aug, 2022 1 commit
- fix brpc update compile error; test=develop (#45438) · a5e9ccda
  Committed by danleifeng on Aug 26, 2022
  
  a5e9ccda
25 Aug, 2022 1 commit
- update brpc version to 1.2.0 (#45351) · 9b5b005e
  Committed by danleifeng on Aug 25, 2022
```
* update brpc version;test=develop
```
  9b5b005e
22 Aug, 2022 1 commit
- [CustomDevice] fix custom ccl (#45276) · 307ad60d
  Committed by ronnywang on Aug 22, 2022
  
  307ad60d
12 Aug, 2022 1 commit
- fix nccl comm in sync_bn (#45100) · 1e965756
  Committed by LiYuRio on Aug 12, 2022
  
  1e965756
08 Aug, 2022 1 commit
- fix memory leak (#44971) · 031debb7
  Committed by ShenLiang on Aug 08, 2022
  
  031debb7
03 Aug, 2022 1 commit
- [CustomDevice] add custom ccl 2/2 (#44650) · 80ca78a2
  Committed by ronnywang on Aug 03, 2022
```
* [CustomDevice] add custom ccl 2/2

* update

* update

* update launch
```
  80ca78a2
01 Aug, 2022 1 commit

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

29 Jul, 2022 1 commit

move CUDAStream to phi (#44529) · da3743fd

Committed by Leo Chen on Jul 29, 2022

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

da3743fd

PaddlePaddle / Paddle synced successfully about 1 year ago
