Commits · ce207c3aba1b1c3eebcc7fb7cb1ba3da2f0c460b · 月光在发光 / Paddle

Feb 24, 2022 · 1 commit
- [MLU]add mlu kernel for allreduce (#39788) · ce207c3a
  Committed by zn on Feb 24, 2022

Feb 18, 2022 · 1 commit
- [MLU]add sync stream ops and broadcast pytest (#39518) · d2bd05b9
  Committed by zn on Feb 18, 2022
  ```
  * [MLU]add sync stream ops and broadcast pytest

  * [MLU]fix broadcast pytest to add data type
  ```

Dec 17, 2021 · 1 commit
- fix bind failed with Address already in use (#38174) · 446a62e8
  Committed by WangXi on Dec 17, 2021

Jun 28, 2021 · 1 commit
- fix undef var (#33780) · 83284c8c
  Committed by Jiangxinz on Jun 28, 2021

Jun 21, 2021 · 1 commit
- Del six.PY code2 (#33607) · 0f7187af
  Committed by tianshuo78520a on Jun 21, 2021
  ```
  * del py2 code2

  * fix test timeout
  ```

May 8, 2021 · 1 commit
- add c_identity op npu (#32787) · c8affff0
  Committed by Baibaifan on May 8, 2021
  ```
  * add c_identity_op_npu
  ```

Apr 23, 2021 · 2 commits
- add the c_identity op (#32485) · 8fa8a37f
  Committed by lilong12 on Apr 23, 2021
  ```
  * add c_identity op, test=develop
  ```
- add c_concat and c_split ops (#32486) · 2b108a04
  Committed by lilong12 on Apr 23, 2021
  ```
  * add c_concat op
  ```

Nov 13, 2020 · 1 commit
- add send and recv ops (#28590) · ed9dd7c9
  Committed by lilong12 on Nov 13, 2020
  ```
  * update, test=develop
  ```

Aug 21, 2020 · 1 commit
- Add collective ops (reduce) (#26340) · e92f770c
  Committed by lilong12 on Aug 21, 2020

Nov 24, 2019 · 1 commit
- adapt test_collective_base.py for only two GPU cards available. (#21307) · f1b09ba3
  Committed by Yi Liu on Nov 24, 2019
  ```
  * adapt test_collective_base.py for only two GPU cards available.
  test=develop

  * fix bug of issue #21259
  test=develop
  ```

Jun 27, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update to support python3.5

* test=develop change gpu memory use to 0.1 when testing

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump for py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

月光在发光 / Paddle is in sync with its upstream (fork source) project