- 24 Mar, 2023 1 commit
-
-
Committed by TaoTao Li
* add all_reduce, reduce kernel and api * fix all_reduce reduce ut fix reduce op maker conflict fix merge conflicts * fix conflicts, rename ReduceOp->ReduceBaseOp in reduce_ops rename allreduce op, to remove * fix code format fix comments * modify test_collective_reduce_api ut timeout * fix PR-CI-Build fix comments: format phi operator
-
- 13 Mar, 2023 1 commit
-
-
Committed by TaoTao Li
* add all_gather and fix conflicts * fix code format * fix ut * fix broadcast ut
-
- 09 Mar, 2023 1 commit
-
-
Committed by TaoTao Li
* add comm context for device context * add broadcast phi operator kernel and api * add broadcast support dtype, update ut * fix broadcast bfloat16 type * fix ut * update test_collective_broadcast_api timeout to 300
-
- 20 Jan, 2023 1 commit
-
-
Committed by GGBond8488
* replace paddle.fluid.layers.data and remove io.data * partial commit * partial commit * partial commit * partial commit * partial commit * partial commit * remove data in fluid.layers.io.__all__ * fix errors * fix unitests * fix unitest * fix unitests * fix unitest * fix unitest * fix unitests * fix unitest * fix test_layers unitests * fix typro * fix unitest * fix unitest * fix unitest * fix typro * fix unitest test_model_cast_to_bf16 * fix test_reducescatter * fix collective unitest * fix collective unitests * fix collective unitests * add coverage * fix add layers.data * re run ci * fix some typro * fix samplecode error * fix samplecode error
-
- 25 Nov, 2022 1 commit
-
-
Committed by Nyakku Shigure
* add isort config * isort all files
-
- 23 Oct, 2022 1 commit
-
-
Committed by Nyakku Shigure
* update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format
-
- 29 Sep, 2022 1 commit
-
-
Committed by Nyakku Shigure
* [CodeStyle][F401] remove unused import in unittests/collective * empty commit, test=document_fix * empty commit
-
- 27 Sep, 2022 1 commit
-
-
Committed by Nyakku Shigure
* [CodeStyle] remove all future import * revert test_error.py * restore future import in example code
-
- 26 Aug, 2022 1 commit
-
-
Committed by Roc
* add simple reformated ci files * update * add radme for new unitetsts * add radme for new unitetsts * add radme for new unitetsts * reset mlu * update for samples * add base api * reset some dist unit tests * add warning in grenerated cmakelists file * update readme for new dist unit tests * add all collective tests * remain base file and launcher file * Update README.md * Update README.md * fix env PYTHONPATH * Update gen_ut_cmakelists.py * add all collective tests * add docs for gen_ut_cmakelists.py * pretify codes * commont name == "name" * update for comments * update function's help * update for run type * update readme * add all collective tests * add all collective tests * mv collective test files * update for all collective tests * update * update * update * update for all tests * update for checking name * Update Cmakelists.txt * update testlist.csv * remain test_parallel_dygraph_dataparallel in unittests * set broadcast op all platforms * update * remain test_broadcast_tensors_op * fix * rm some collective files * update more colective tests * update * update * update gen_ut_supports recursion * update * update * update * update * fix nccl version * update * update * update * update * fix a bug and try to pass * update * add csv * update for timeout * remove tcp store * fix * fix * update * update * update for more dist tests * move multi node tests * update * update * update * fix for auto parallele * update * update path in python file * update * reset some test in unittests * fix * update readme * fix * update * fix port
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
* use yapf to format all python file * yapf exclude two unittests file for they rely on writing and reading file, and format will break them * disable diff_py_file because too many diff files cause command following failed
-
- 22 Sep, 2020 1 commit
-
-
Committed by pangyoki
* default open dygraph mode * fix CI-Mac * fix Mac-CI other unittest file * fix CI-Py3 * fix test_communicator_geo and test_buffer_shared_memory_reuse_pass * add enable_static to fix CI-Py3 * add enable_static to fix CI-coverage * delete try except
-
- 27 Aug, 2020 1 commit
-
-
Committed by lilong12
* add collective op for cpu using gloo and paddle.distributed.* apis
-
- 03 Dec, 2019 1 commit
-
-
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop * modify ENFORCE message, test=develop * add validation for x.shape[0] > 0, test=develop * add ut, test=develop
-
- 27 Jun, 2019 1 commit
-
-
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for 
reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O
-