- 12 Jan, 2022 2 commits
-
-
Committed by ziyoujiyi
* delete gloo connect retry * the_one_ps dirs reconstruct * . * . * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * the one ps dirs modify * the one ps dirs modify * the one ps dirs modify * the one ps dirs modify
-
Committed by JZ-LIANG
* auto parallel sharding base * chmod * add unittest * set unittest cmake dist label * revise code according to review * chmod * bugfix for grad_clip and param broadcast * chmod * update unittest * chmod * add clip * chmod * add amp pass * chmod * add unittest * remove grad update * fixed bug * fixed bug * fixed typos * fixed typos
-
- 11 Jan, 2022 1 commit
-
-
Committed by caozhou
* update dist tensor * add unittest * update unittest * refactor dist tensor * update dist tensor and unittest
-
- 06 Jan, 2022 3 commits
- 31 Dec, 2021 1 commit
-
-
Committed by xiayanming
* [Auto Parallel] add gradient merge pass * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix pr review * fix pr review * fix pr review * fix pr review * fix pr review * fix pr review
-
- 30 Dec, 2021 2 commits
- 29 Dec, 2021 1 commit
-
-
Committed by JZ-LIANG
* auto parallel sharding base * chmod * add unittest * set unittest cmake dist label * revise code according to review * chmod
-
- 24 Dec, 2021 1 commit
-
-
Committed by JZ-LIANG
-
- 22 Dec, 2021 1 commit
-
-
Committed by Zhanlue Yang
-
- 21 Dec, 2021 3 commits
-
-
Committed by Yuang Liu
-
Committed by Guoxia Wang
-
Committed by Haohongxiang
* update * fix bugs * modify code style * fix bugs of _get_global_group
-
- 20 Dec, 2021 2 commits
- 19 Dec, 2021 1 commit
-
-
Committed by Baibaifan
-
- 17 Dec, 2021 3 commits
- 14 Dec, 2021 2 commits
- 12 Dec, 2021 1 commit
-
-
Committed by 沉潜的鱼儿
* dist matmul op compatible * dist op unittest * modify dist matmul * modify dist reshape * modify dist reshape * add a space * add a space * delete dist matmul op * modify reshape * add dist op unittest * modify dist op unittest
-
- 10 Dec, 2021 1 commit
-
-
Committed by 沉潜的鱼儿
* dist matmul op compatible * modify common dist op * modify common * add a space
-
- 09 Dec, 2021 2 commits
-
-
Committed by Haohongxiang
* merge latest develop branch * fix bugs * update * fix bugs for unittest * modify for less use of gpu mem * fix bugs of using _reset_grad_inplace_version * update * update * modify for CI-Coverage * retrigger all CIs
-
Committed by wangguanqun
* default accessor and multi table config * add unittest * add unittest * delete print
-
- 08 Dec, 2021 1 commit
-
-
Committed by caozhou
* add update func of auto search * update unittest
-
- 07 Dec, 2021 2 commits
-
-
Committed by Zhanlue Yang
* Debug * Fixed issue with reset_grad_inplace_version when used with clear_gradient & cross-batch accumulation * Rearranged interfaces * Fixed ci issues
-
Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * [Auto Parallel] Add the graph class for physical mapping * [Auto Parallel] Add the simple physical mapper * Set the timeout of the mapper * Merge the upstream develop unittests cmake files * Fix a bug of the process group * Remove mapper unittest from platforms which is not GPU * Move the instantiation of process group after resharding * Add the local id for devices * Update the rank mapping format * [Auto Parallel] Relaunch with the rank mapping file * Remove the unnecessary json file * Avoid entering get_device_proc_info for auto mapping * Correct the mapper unit test * Add some comments * Remove the related files about mapping * Update the unittest for auto mapping * Remove unused rank_mapping unittest * Improve the unittest coverage * Improve the unittest coverage * Improve the unittest of relaunch * Fix the unittest problem in CI * Improve the unittest of relaunch * Remove unnecessary statements * Update the unittest cmakefile * Correct the cmakefile of auto parallel unittests * Modify codes based on the new elastic change * Use the GPUs exclusively in the unittest * Correct the cmakefile * Set the timeout of the unittest
-
- 06 Dec, 2021 2 commits
-
-
Committed by Baibaifan
-
Committed by kuizhiqing
-
- 02 Dec, 2021 2 commits
-
-
Committed by xiayanming
-
Committed by Baibaifan
-
- 01 Dec, 2021 1 commit
-
-
Committed by zmxdream
* fix launch_utils.py. test=develop * fix launch_utils.py. test=develop
-
- 30 Nov, 2021 3 commits
-
-
Committed by xiayanming
* [Auto Parallel] elastic support auto parallel re-launch * [Auto Parallel] elastic support auto parallel re-launch * fix ci issue * fix ci issue * fix rank mapping unittest * fix rank mapping unittest * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue
-
Committed by zhaocaibei123
-
Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * [Auto Parallel] Add the graph class for physical mapping * [Auto Parallel] Add the simple physical mapper * Set the timeout of the mapper * Merge the upstream develop unittests cmake files * Fix a bug of the process group * Remove mapper unittest from platforms which is not GPU * Move the instantiation of process group after resharding * Add the local id for devices * Update the rank mapping format * Add some comments * Remove the related files about mapping * Update the unittest for auto mapping * Remove unused rank_mapping unittest * Improve the unittest coverage * Improve the unittest coverage
-
- 29 Nov, 2021 2 commits
-
-
Committed by Baibaifan
-
Committed by 李季
Co-authored-by: Chen Long <1300851984@qq.com>
-