- 18 11月, 2021 3 次提交
-
-
由 zmx 提交于
* fix pslib. test=develop * add device to train_from_dataset. test=develop * refine fleet.stop_worker. test=develop * fix ut. test=develop * fix ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop
-
由 xiayanming 提交于
* fleet support elastic train * fleet support elastic train * support elastic * add unittest * fix unitest bug * fix unittest bug * fix unittest bug * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix elastic bug * fix ci fail * fix ci fail * fix elastic bug * fix elastic bug * fix joint debugging bug * fix joint debugging bug * fix windows ci failed * fix windows ci failed * Optimize fleet elastic scale in/out * elastic support pre hook * add prehook unittest
-
由 zmx 提交于
-
- 17 11月, 2021 3 次提交
-
-
由 zhaocaibei123 提交于
-
由 zmx 提交于
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * refactor heter trainer. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
-
由 WangXi 提交于
-
- 16 11月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 15 11月, 2021 2 次提交
-
-
由 Zeng Jinle 提交于
* add split_program * make ut faster * increase ut timeout * make result deterministic * add fuse_all_reduce pass * add ut framework, update * fix ut framework * remove useless code * add coverage support * update * fix CI * fix some bugs and fix ci coverage * fix conflict
-
由 zmx 提交于
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
-
- 12 11月, 2021 1 次提交
-
-
由 zhaoyingli 提交于
* add AutoConvert * add unitest * amend merge&slice * amend default dist_attr * update doc&improve coverage * add interface dist_context * tiny modify
-
- 11 11月, 2021 2 次提交
-
-
由 xiayanming 提交于
* fleet support elastic train * fleet support elastic train * support elastic * add unittest * fix unitest bug * fix unittest bug * fix unittest bug * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix elastic bug * fix ci fail * fix ci fail * fix elastic bug * fix elastic bug * fix joint debugging bug * fix joint debugging bug * fix windows ci failed * fix windows ci failed
-
由 zmx 提交于
* change username * fix * fix * fix * fix * fix * update * update * update unittests * fix * update * fix * update * fix * fix * fix * update * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update send_and_recv op. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop * fix. test=develop
-
- 08 11月, 2021 1 次提交
-
-
由 kuizhiqing 提交于
-
- 02 11月, 2021 1 次提交
-
-
由 zhaoyingli 提交于
* AutoParallel Save&Load * tiny modi * update func name * tiny fix * add NotImplementedError * fix doc * update func name * update func param * update interface * add unitest & modi make_data_unshard * update unittest * update unittest * fix unittest * fix cmakelist * update unittest
-
- 29 10月, 2021 1 次提交
-
-
由 Yulong Ao 提交于
* default dist op * add dist_attr for dist op * add unitest * update inputname * update function name * add unitest * update CMakeLists.txt for CI * fix dis_matmul * fix compile error * update matmul to matmul_v2 * unify api * unify api * todo * update distop forward func * update distop forward func * auto parallel backward * update dist op * autoparallel backward * add backward for embedding * temp1 * temp2 * temp3 * temp4 * backward done1 * backward done2 * backward done3 * dist embedding remove mp mode * dist matmul remove mp mode * update dist embedding 『 * dist op init1 * dist op init 2 * update unitest * context remove parallel mode * partitioner remove parallel mode * update unitest * a more general method to support varying mesh in pipeline parallel * support varying mesh in pipeline parallel * embedding support varying mesh in pipeline parallel * matmul support varying mesh in pipeline parallel * default dist op support varying mesh in pipeline parallel * dist attribute for startup program * default dist op support varying mesh in pipeline parallel 2 * partitoner support varying mesh in pipeline parallel * revise logic for auto compeletion * revise framework.py * revise reshard unitest * revise unitest for parallelize * chmod * fixed bug for dist embedding name mapping * Improve the interface and the underlying mechanisms of auto parallel * revise completion for backward * revise completion for update * revise completion for update * update unitest * chmod * bugfix for grad_op output var's mesh * Modify codes for pr 36744 * Remove unnecessary comments in framework.py * Remove unnecessary comments in completion.py Co-authored-by: NJZ-LIANG <jianzhongliang10@gmail.com> Co-authored-by: Nzhaoyingli <zhaoyingli@baidu.com> Co-authored-by: NJZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
-
- 28 10月, 2021 3 次提交
-
-
由 wangguanqun 提交于
* add trainer desc config to distributed strategy * code style modified * data_feed set lod * fix bug * code style * fix bug * save load * save load * save unittest * add unittest of the_one_ps * unittest * add todo in communicator sendsparse
-
由 seemingwang 提交于
-
由 Bo Liu 提交于
-
- 27 10月, 2021 2 次提交
- 25 10月, 2021 1 次提交
-
-
由 Haohongxiang 提交于
* fix bug of check_inf * fix allreduce
-
- 21 10月, 2021 2 次提交
-
-
由 danleifeng 提交于
-
由 xiongkun 提交于
-
- 20 10月, 2021 3 次提交
-
-
由 Haohongxiang 提交于
* fix bugs of ClipGradByGlobalNorm * add unittests * add unittests
-
由 李季 提交于
* fix global gather and global scatter operators
-
由 JZ-LIANG 提交于
* default dist op * add dist_attr for dist op * add unitest * update inputname * update function name * add unitest * update CMakeLists.txt for CI * fix dis_matmul * fix compile error * update matmul to matmul_v2 * unify api * unify api * todo * update distop forward func * update distop forward func * auto parallel backward * update dist op * autoparallel backward * add backward for embedding * temp1 * temp2 * temp3 * temp4 * backward done1 * backward done2 * backward done3 * dist embedding remove mp mode * dist matmul remove mp mode * update dist embedding 『 * dist op init1 * dist op init 2 * update unitest * context remove parallel mode * partitioner remove parallel mode * update unitest * a more general method to support varying mesh in pipeline parallel * support varying mesh in pipeline parallel * embedding support varying mesh in pipeline parallel * matmul support varying mesh in pipeline parallel * default dist op support varying mesh in pipeline parallel * dist attribute for startup program * default dist op support varying mesh in pipeline parallel 2 * partitoner support varying mesh in pipeline parallel * revise logic for auto compeletion * revise framework.py * revise reshard unitest * revise unitest for parallelize * chmod * fixed bug for dist embedding name mapping Co-authored-by: Nzhaoyingli <zhaoyingli@baidu.com>
-
- 19 10月, 2021 3 次提交
-
-
由 danleifeng 提交于
-
由 WangXi 提交于
-
由 YipZLF 提交于
* Add auto parallel cost model and unittests * Fixed code styles. * Fixed bugs and codes style * fixed typo * Improved code style: object encapsulation. * Fixed codes. * Refractored estimate_cost * Fixed typo
-
- 18 10月, 2021 1 次提交
-
-
由 Haohongxiang 提交于
* [HybridParallel]Support fp16 in dygraph hybrid parallel * update * update * update for recompute * add unittest of pp+fp16 * add unittest of recompute+fp16 * update * modify ut
-
- 15 10月, 2021 1 次提交
-
-
由 duanboqiang 提交于
-
- 14 10月, 2021 3 次提交
-
-
由 duanboqiang 提交于
-
由 ShenLiang 提交于
* add no_sync for parameters sync * add pipeline for moe
-
由 Yuang Liu 提交于
-
- 13 10月, 2021 3 次提交
-
-
由 Guoxia Wang 提交于
-
由 caozhou 提交于
-
由 Leo Chen 提交于
* refine amp level * fix typo * update tracer._amp_level
-
- 12 10月, 2021 1 次提交
-
-
由 Haohongxiang 提交于
* fix calling bug of HybridParallelClipGrad * fix bugs of HybridParallelClipGrad * add unittest of pp with HybridParallelClipGrad * fix bugs in mp_layers.py * update * fix bugs in pp_layers.py * update
-
- 11 10月, 2021 2 次提交
-
-
由 danleifeng 提交于
* heterps:add fuse_allreduce op; test=develop * add program_mode in minimize for pslib mode;test=develop
-
由 caozhou 提交于
* add reshard module * fix conflict * update reshard module * update and add unitest * update reshard module and unitest * add more unitests
-