- 30 Nov, 2021 2 commits
-
-
Committed by xiayanming
* [Auto Parallel] elastic support auto parallel re-launch * fix ci issue * fix rank mapping unittest * fix ci issue
-
Committed by zhaocaibei123
-
- 29 Nov, 2021 2 commits
-
-
Committed by Baibaifan
-
Committed by 李季
Co-authored-by: NChen Long <1300851984@qq.com>
-
- 26 Nov, 2021 2 commits
-
-
Committed by zhaocaibei123
* test * rm test * update * add unittest * update * update save
-
Committed by wangzhen38
* add tdm sample * add tdm sample in c++ * update tdm sample * modify sample count * fix conflict * add set_date * fix cmake error * fix bug of proto * update index_dataset proto * update cmake * fix error cmake * fix cmake mkldnn * fix cmake proto * update cmake proto * update cmake * update rec * update dataset * update coverage * update ci * goback4 * fix npu ci * add xxhash dep
-
- 25 Nov, 2021 2 commits
- 24 Nov, 2021 1 commit
-
-
Committed by zhaoyingli
* adapt auto search * fix matmulv2 compatible * del debug
-
- 22 Nov, 2021 2 commits
- 19 Nov, 2021 1 commit
-
-
Committed by wangguanqun
-
- 18 Nov, 2021 3 commits
-
-
Committed by zmx
* fix pslib. test=develop * add device to train_from_dataset. test=develop * refine fleet.stop_worker. test=develop * fix ut. test=develop * fix executor & ut. test=develop
-
Committed by xiayanming
* fleet support elastic train * support elastic * add unittest * fix unittest bug * fix unittest coverage * fix elastic bug * fix ci fail * fix elastic bug * fix joint debugging bug * fix windows ci failed * Optimize fleet elastic scale in/out * elastic support pre hook * add prehook unittest
-
Committed by zmx
-
- 17 Nov, 2021 3 commits
-
-
Committed by zhaocaibei123
-
Committed by zmx
* fix. test=develop * fix ut. test=develop * refactor heter trainer. test=develop * fix. test=develop * fix ut. test=develop * fix. test=develop * fix ut. test=develop
-
Committed by WangXi
-
- 15 Nov, 2021 2 commits
-
-
Committed by Zeng Jinle
* add split_program * make ut faster * increase ut timeout * make result deterministic * add fuse_all_reduce pass * add ut framework, update * fix ut framework * remove useless code * add coverage support * update * fix CI * fix some bugs and fix ci coverage * fix conflict
-
Committed by zmx
* fix. test=develop * fix ut. test=develop
-
- 11 Nov, 2021 2 commits
-
-
Committed by xiayanming
* fleet support elastic train * support elastic * add unittest * fix unittest bug * fix unittest coverage * fix elastic bug * fix ci fail * fix elastic bug * fix joint debugging bug * fix windows ci failed
-
Committed by zmx
* change username * fix * update * update unittests * fix * update * fix * update * fix * update * update. test=develop * update send_and_recv op. test=develop * update. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop
-
- 08 Nov, 2021 1 commit
-
-
Committed by kuizhiqing
-
- 28 Oct, 2021 3 commits
-
-
Committed by wangguanqun
* add trainer desc config to distributed strategy * code style modified * data_feed set lod * fix bug * code style * fix bug * save load * save unittest * add unittest of the_one_ps * unittest * add todo in communicator sendsparse
-
Committed by seemingwang
-
Committed by Bo Liu
-
- 27 Oct, 2021 1 commit
-
-
Committed by xiongkun
* bugfix: only check backend when mode == Collective * fix bug
-
- 25 Oct, 2021 1 commit
-
-
Committed by Haohongxiang
* fix bug of check_inf * fix allreduce
-
- 21 Oct, 2021 2 commits
-
-
Committed by danleifeng
-
Committed by xiongkun
-
- 20 Oct, 2021 1 commit
-
-
Committed by Haohongxiang
* fix bugs of ClipGradByGlobalNorm * add unittests
-
- 19 Oct, 2021 2 commits
-
-
Committed by danleifeng
-
Committed by WangXi
-
- 18 Oct, 2021 1 commit
-
-
Committed by Haohongxiang
* [HybridParallel]Support fp16 in dygraph hybrid parallel * update * update for recompute * add unittest of pp+fp16 * add unittest of recompute+fp16 * update * modify ut
-
- 15 Oct, 2021 1 commit
-
-
Committed by duanboqiang
-
- 14 Oct, 2021 3 commits
-
-
Committed by duanboqiang
-
Committed by ShenLiang
* add no_sync for parameters sync * add pipeline for moe
-
Committed by Yuang Liu
-
- 13 Oct, 2021 2 commits
-
-
Committed by Guoxia Wang
-
Committed by Leo Chen
* refine amp level * fix typo * update tracer._amp_level
-