Commits · e7547ca7e1c37ddd319273bb2bd12f338178164e · PaddlePaddle / Paddle

02 Aug, 2022 1 commit
- Y
  Pass NVIDIA_TF32_OVERRIDE to internal (#43646) (#44796) · e7547ca7
  Committed by Yuang Liu on Aug 02, 2022
```
Co-authored-by: gongweibao <gongweibao@baidu.com>
```
  e7547ca7
24 Jun, 2022 1 commit

[cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa

Committed by Aganlengzi on Jun 24, 2022

* Use all sitepackages path as the library/include path (#42940)

* Fix several unit tests and increase the unit tests stability (#43670)

* Reduce gather op unit tests size and increase the timeout

* Add NVIDIA_TF32_OVERRIDE for multi-processes environment

* Remove record test for device event ut

* Fix 3 unittest errors (#43532)

* Fix test_fuse_resnet_unit failure

* Fix test_imperative_auto_mixed_precision failure

* Fix sparse_attention_op error

* Fix sparse_attention_op error

* Use fixed random seed (#43659)

* for CI test_collective_sendrecv_api
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Shijie <505749828@qq.com>

9edbe4aa

20 Jun, 2022 1 commit
- Z
  
  place all save/load path into temporary directory (#43652) · a5ccc713
  Committed by zhaoyingli on Jun 20, 2022
  
  a5ccc713
13 Sep, 2021 1 commit
- 李
  upload global scatter and global gather operators related files (#35546) · ecfe8375
  Committed by 李季 on Sep 13, 2021
```
* upload global scatter and global gather operators related files
```
  ecfe8375
21 Jun, 2021 1 commit
- T
  Del six.PY code2 (#33607) · 0f7187af
  Committed by tianshuo78520a on Jun 21, 2021
```
* del py2 code2

* fix test timeout
```
  0f7187af
09 Jun, 2021 1 commit
- W
  
  [HybridParallel] update collective split to use c_embedding and mp_allreduce (#33411) · 42c1297e
  Committed by WangXi on Jun 09, 2021
  
  42c1297e
26 May, 2021 1 commit
- J
  
  [Tensor Parallelism] split fix bug (#33015) · 20b9be65
  Committed by JZ-LIANG on May 26, 2021
  
  20b9be65
27 Apr, 2021 1 commit
- L
  add alltoall api (#32507) · db41b742
  Committed by lilong12 on Apr 27, 2021
```
* add alltoall api, test=develop
```
  db41b742
26 Apr, 2021 1 commit
- L
  add send/recv api (#32504) · c47bafc6
  Committed by lilong12 on Apr 26, 2021
```
* add sendrecv, test=develop
```
  c47bafc6
21 Apr, 2021 1 commit
- L
  
  [Kunlun]add collective ops for multi XPU cards training and add Kunlun multi XPU cards CI (#32302) · 2194ad15
  Committed by liuyuhui on Apr 21, 2021
  
  2194ad15
31 Dec, 2020 3 commits
- L
  
  update, test=develop (#30047) · 9e51e383
  Committed by lilong12 on Dec 31, 2020
  
  9e51e383
- L
  Disable gloo by default (#29805) · b0bd93de
  Committed by lilong12 on Dec 31, 2020
```
* update, test=develop
```
  b0bd93de
- L
  add the paddle.distributed.split api (#29970) · 2bc5121d
  Committed by lilong12 on Dec 31, 2020
```
* add distributed.split, test=develop
```
  2bc5121d
21 Oct, 2020 1 commit
- L
  modify ut cmakefile (#28140) · 4873c20d
  Committed by lilong12 on Oct 21, 2020
```
* modify ut cmakefile, test=develop
```
  4873c20d
29 Sep, 2020 1 commit
- L
  Initialize gloo for low level collective apis (#27672) · bbc2add7
  Committed by lilong12 on Sep 29, 2020
```
* add gloo initializer, test=develop
```
  bbc2add7
28 Sep, 2020 2 commits
- L
  
  Revert "Initialize gloo for low level collective apis (#27356)", test=document_fix (#27665) · 36c04102
  Committed by lilong12 on Sep 28, 2020
  
  36c04102
- L
  Initialize gloo for low level collective apis (#27356) · fa73e4a2
  Committed by lilong12 on Sep 28, 2020
```
* add gloo initializer, test=develop
```
  fa73e4a2
28 Aug, 2020 1 commit
- L
  
  update copyright year, test=document_fix (#26586) · f1ae017f
  Committed by lilong12 on Aug 28, 2020
  
  f1ae017f
27 Aug, 2020 1 commit
- L
  [api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552) · 1c681383
  Committed by lilong12 on Aug 27, 2020
```
add collective op for cpu using gloo and paddle.distributed.* apis
```
  1c681383
21 Aug, 2020 1 commit
- L
  
  Add collective ops (reduce) (#26340) · e92f770c
  Committed by lilong12 on Aug 21, 2020
  
  e92f770c
24 Nov, 2019 1 commit
- Y
  adapt test_collective_base.py for only two GPU cards available. (#21307) · f1b09ba3
  Committed by Yi Liu on Nov 24, 2019
```
* adapt test_collective_base.py for only two GPU cards available.
test=develop

* fix bug of issue #21259
test=develop
```
  f1b09ba3
27 Jun, 2019 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

PaddlePaddle / Paddle synced successfully more than 1 year ago