- 19 Dec, 2021: 1 commit

Committed by Baibaifan
- 17 Dec, 2021: 3 commits
- 14 Dec, 2021: 2 commits
- 12 Dec, 2021: 1 commit

Committed by 沉潜的鱼儿
* dist matmul op compatible * dist op unittest * modify dist matmul * modify dist reshape * modify dist reshape * add a space * add a space * delete dist matmul op * modify reshape * add dist op unittest * modify dist op unittest
- 10 Dec, 2021: 1 commit

Committed by 沉潜的鱼儿
* dist matmul op compatible * modify common dist op * modify common * add a space
- 09 Dec, 2021: 2 commits

Committed by Haohongxiang
* merge latest develop branch * fix bugs * update * fix bugs for unittest * modify for less use of gpu mem * fix bugs of using _reset_grad_inplace_version * update * update * modify for CI-Coverage * retrick all CIs
Committed by wangguanqun
* default accessor and multi table config * add unittest * add unittest * delete print
- 08 Dec, 2021: 1 commit

Committed by caozhou
* add update func of auto search * update unitest
- 07 Dec, 2021: 2 commits

Committed by Zhanlue Yang
* Debug * Fixed issue with reset_grad_inplace_version when used with clear_gradient & cross-batch accumulation * Rearranged interfaces * Fixed ci issues
Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * [Auto Parallel] Add the graph class for physical mapping * [Auto Parallel] Add the simple physical mapper * Set the timeout of the mapper * Merge the upstream develop unittests cmake files * Fix a bug of the process group * Remove mapper unittest from platforms which is not GPU * Move the instantiation of process group after resharding * Add the local id for devices * Update the rank mapping format * [Auto Parallel] Relaunch with the rank mapping file * Remove the unnecessary json file * Avoid entering get_device_proc_info for auto mapping * Correct the mapper unit test * Add some comments * Remove the related files about mapping * Update the unittest for auto mapping * Remove unused rank_mapping unittest * Improve the unittest coverage * Improve the unittest coverage * Improve the unittest of relaunch * Fix the unittest problem in CI * Improve the unittest of relaunch * Remove unnecessary statements * Update the unittest cmakefile * Correct the cmakefile of auto parallel unittests * Modify codes based on the new elastic change * Use the GPUs exclusively in the unittest * Correct the cmakefile * Set the timeout of the unittest
- 06 Dec, 2021: 2 commits

Committed by Baibaifan

Committed by kuizhiqing
- 02 Dec, 2021: 2 commits

Committed by xiayanming

Committed by Baibaifan
- 01 Dec, 2021: 1 commit

Committed by zmxdream
* fix launch_utils.py. test=develop * fix launch_utils.py. test=develop
- 30 Nov, 2021: 3 commits

Committed by xiayanming
* [Auto Parallel] elastic support auto parallel re-launch * [Auto Parallel] elastic support auto parallel re-launch * fix ci issue * fix ci issue * fix rank mapping unittest * fix rank mapping unittest * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue * fix ci issue
Committed by zhaocaibei123

Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * [Auto Parallel] Add the graph class for physical mapping * [Auto Parallel] Add the simple physical mapper * Set the timeout of the mapper * Merge the upstream develop unittests cmake files * Fix a bug of the process group * Remove mapper unittest from platforms which is not GPU * Move the instantiation of process group after resharding * Add the local id for devices * Update the rank mapping format * Add some comments * Remove the related files about mapping * Update the unittest for auto mapping * Remove unused rank_mapping unittest * Improve the unittest coverage * Improve the unittest coverage
- 29 Nov, 2021: 2 commits

Committed by Baibaifan
Committed by 李季
Co-authored-by: Chen Long <1300851984@qq.com>
- 27 Nov, 2021: 1 commit

Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * [Auto Parallel] Add the graph class for physical mapping * [Auto Parallel] Add the simple physical mapper * Set the timeout of the mapper * Merge the upstream develop unittests cmake files * Fix a bug of the process group * Remove mapper unittest from platforms which is not GPU * Move the instantiation of process group after resharding * Add the local id for devices * Update the rank mapping format * Add some comments * Remove the related files about mapping * Remove unused rank_mapping unittest * Improve the unittest coverage
- 26 Nov, 2021: 2 commits

Committed by zhaocaibei123
* test * test * rm test * update * update * update * add unittest * update * update save
Committed by wangzhen38
* add tdm sample * add tdm sample in c++ * update tdm sample * modify sample count * fix conflict * add set_date * fix cmake error * fix bug of proto * update index_dataset proto * update cmake * fix error cmake * fix cmake mkldnn * fix cmake proto * update cmake proto * update cmake * update rec * update dataset * update dataset * update dataset * updata dataset * updata dataset * updata coverage * updata ci * goback4 * fix npu ci * add xxhash dep
- 25 Nov, 2021: 2 commits
- 24 Nov, 2021: 2 commits

Committed by zhaoyingli
* adapt auto search * adapt auto search * fix matmulv2 compatible * del debug
Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * Add the local id for devices * Add some comments
- 23 Nov, 2021: 1 commit

Committed by ronnywang
* Added HCCL backend support in dynamic graph mode * fix segmentation fault * add ut
- 22 Nov, 2021: 3 commits

Committed by zhaoyingli
* fix autoconvert * fix merge parameter
Committed by Webbley

Committed by zmx
* fix api. test=develop * fix api. test=develop
- 19 Nov, 2021: 1 commit

Committed by wangguanqun
- 18 Nov, 2021: 3 commits

Committed by zmx
* fix pslib. test=develop * add device to train_from_dataset. test=develop * refine fleet.stop_worker. test=develop * fix ut. test=develop * fix ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop
Committed by xiayanming
* fleet support elastic train * fleet support elastic train * support elastic * add unittest * fix unitest bug * fix unittest bug * fix unittest bug * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix elastic bug * fix ci fail * fix ci fail * fix elastic bug * fix elastic bug * fix joint debugging bug * fix joint debugging bug * fix windows ci failed * fix windows ci failed * Optimize fleet elastic scale in/out * elastic support pre hook * add prehook unittest
Committed by zmx

- 17 Nov, 2021: 2 commits

Committed by zhaocaibei123

Committed by zmx
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * refactor heter trainer. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop