Commits · 0c5a4bac8da32fd7329cd5d0d2961f2ee0697c26 · PaddlePaddle / Paddle

24 Mar, 2023 · 1 commit

add phi operator allreduce/reduce (#51857) · 47f87ad3

Committed by TaoTao Li on Mar 24, 2023

* add all_reduce, reduce kernel and api

* fix all_reduce reduce ut

fix reduce op maker conflict

fix merge conflicts

* fix conflicts, rename ReduceOp->ReduceBaseOp in reduce_ops

rename allreduce op, to remove

* fix code format

fix comments

* modify test_collective_reduce_api ut timeout

* fix PR-CI-Build

fix comments: format phi operator

47f87ad3

13 Mar, 2023 · 1 commit
- Add phi operator all_gather (#51420) · afa26a59
  Committed by TaoTao Li on Mar 13, 2023
```
* add all_gather and fix conflicts

* fix code format

* fix ut

* fix broadcast ut
```
  afa26a59
09 Mar, 2023 · 1 commit

Add comm context manager, add phi broadcast op (#51072) · c191b707

Committed by TaoTao Li on Mar 09, 2023

* add comm context for device context

* add broadcast phi operator kernel and api

* add broadcast support dtype, update ut

* fix broadcast bfloat16 type

* fix ut

* update test_collective_broadcast_api timeout to 300

c191b707

20 Jan, 2023 · 1 commit

Fluid clean remove io data (#49301) · 5670644c

Committed by GGBond8488 on Jan 20, 2023

* replace paddle.fluid.layers.data and remove io.data

* partial commit

* partial commit

* partial commit

* partial commit

* partial commit

* partial commit

* remove data in fluid.layers.io.__all__

* fix errors

* fix unitests

* fix unitest

* fix unitests

* fix unitest

* fix unitest

* fix unitests

* fix unitest

* fix test_layers unitests

* fix typo

* fix unitest

* fix unitest

* fix unitest

* fix typo

* fix unitest test_model_cast_to_bf16

* fix test_reducescatter

* fix collective unitest

* fix collective unitests

* fix collective unitests

* add coverage

* fix add layers.data

* re run ci

* fix some typos

* fix samplecode error

* fix samplecode error

5670644c

25 Nov, 2022 · 1 commit
- [CodeStyle][isort] introducing `isort` (part1) (#46475) · cfd7ff8f
  Committed by Nyakku Shigure on Nov 25, 2022
```
* add isort config

* isort all files
```
  cfd7ff8f
23 Oct, 2022 · 1 commit
- [CodeStyle][black] use black instead of yapf (#46014) · 7097630f
  Committed by Nyakku Shigure on Oct 23, 2022
```
* update config

* re-blacken python code

* temporarily disable date and diff_py_file

* skip a format
```
  7097630f
29 Sep, 2022 · 1 commit
- [CodeStyle][F401] remove unused imports in unittests/collective (#46615) · 0ef7a02f
  Committed by Nyakku Shigure on Sep 29, 2022
```
* [CodeStyle][F401] remove unused import in unittests/collective

* empty commit, test=document_fix

* empty commit
```
  0ef7a02f
27 Sep, 2022 · 1 commit
- [CodeStyle] remove all future import (#46411) · 30387006
  Committed by Nyakku Shigure on Sep 27, 2022
```
* [CodeStyle] remove all future import

* revert test_error.py

* restore future import in example code
```
  30387006
26 Aug, 2022 · 1 commit

move collective tests into a collective directory (#45223) · 9eb4d89b

Committed by Roc on Aug 26, 2022

* add simple reformated ci files

* update

* add readme for new unittests

* add readme for new unittests

* add readme for new unittests

* reset mlu

* update for samples

* add base api

* reset some dist unit tests

* add warning in generated cmakelists file

* update readme for new dist unit tests

* add all collective tests

* remain base file and launcher file

* Update README.md

* Update README.md

* fix env PYTHONPATH

* Update gen_ut_cmakelists.py

* add all collective tests

* add docs for gen_ut_cmakelists.py

* prettify code

* comment name == "name"

* update for comments

* update function's help

* update for run type

* update readme

* add all collective tests

* add all collective tests

* mv  collective test files

* update for all collective tests

* update

* update

* update

* update for all tests

* update for checking name

* Update Cmakelists.txt

* update testlist.csv

* remain test_parallel_dygraph_dataparallel in unittests

* set broadcast op all platforms

* update

* remain test_broadcast_tensors_op

* fix

* rm some collective files

* update more collective tests

* update

* update

* update
gen_ut_supports recursion

* update

* update

* update

* update

* fix nccl version

* update

* update

* update

* update

* fix a bug and try to pass

* update

* add csv

* update for timeout

* remove tcp store

* fix

* fix

* update

* update

* update for more dist tests

* move multi node tests

* update

* update

* update

* fix for auto parallel

* update

* update path in python file

* update

* reset some test in unittests

* fix

* update readme

* fix

* update

* fix port

9eb4d89b

05 Jun, 2022 · 1 commit

【code format check upgrade】 step2：yapf (#42944) · a072fca8

Committed by Sing_chan on Jun 05, 2022

* use yapf to format all python file

* exclude two unittest files from yapf because they rely on writing and reading files, and formatting would break them

* disable diff_py_file because too many diff files cause the following command to fail

a072fca8

22 Sep, 2020 · 1 commit

Use dygraph mode by default (#27443) · 827ac36f

Committed by pangyoki on Sep 22, 2020

* default open dygraph mode

* fix CI-Mac

* fix Mac-CI other unittest file

* fix CI-Py3

* fix test_communicator_geo and test_buffer_shared_memory_reuse_pass

* add enable_static to fix CI-Py3

* add enable_static to fix CI-coverage

* delete try except

827ac36f

27 Aug, 2020 · 1 commit
- [api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552) · 1c681383
  Committed by lilong12 on Aug 27, 2020
```
add collective op for cpu using gloo and paddle.distributed.* apis
```
  1c681383
03 Dec, 2019 · 1 commit

set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) · 0bc8bdf7

Committed by lilong12 on Dec 03, 2019

* set dim[0] to -1 if dim[0] < 0 and move assertion to runtime, test=develop

* modify ENFORCE message, test=develop

* add validation for x.shape[0] > 0, test=develop

* add ut, test=develop

0bc8bdf7
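
As a rough illustration of the shape rule this commit describes, here is a minimal sketch in plain Python. It is not Paddle's actual shape-inference code. It assumes the all-gather output's leading dimension is the input's leading dimension multiplied by the number of ranks; the function name `infer_allgather_shape` and its parameters are hypothetical.

```python
def infer_allgather_shape(in_shape, nranks, is_runtime=False):
    """Hypothetical helper (not a Paddle API) sketching the dim[0] rule.

    Assumption: output dim[0] = input dim[0] * nranks.
    """
    if nranks < 1:
        raise ValueError("nranks must be at least 1")
    if len(in_shape) == 0:
        raise ValueError("input must have at least one dimension")

    dim0 = in_shape[0]
    if is_runtime:
        # At runtime the real size is known, so it can be validated.
        if dim0 <= 0:
            raise ValueError("x.shape[0] must be > 0 at runtime")
        out0 = dim0 * nranks
    else:
        # At compile time an unknown batch size shows up as a negative
        # value. Multiplying it by nranks would give a meaningless
        # number such as -4, so keep the "unknown" marker as -1.
        out0 = -1 if dim0 < 0 else dim0 * nranks

    return [out0] + list(in_shape[1:])


print(infer_allgather_shape([-1, 8], nranks=4))                  # unknown batch size
print(infer_allgather_shape([2, 8], nranks=4, is_runtime=True))  # known batch size
```

The point of the sketch is only that an unknown leading dimension stays unknown during compilation, while the positivity check is deferred until real shapes are available.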

27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

PaddlePaddle / Paddle · synced successfully more than 1 year ago
