1. 26 7月, 2022 1 次提交
    • Z
      add horizontal federation learning ps feature (#44327) · 4bc22b69
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      4bc22b69
  2. 20 7月, 2022 1 次提交
  3. 20 6月, 2022 1 次提交
    • W
      add dymf to gpups in python (#43497) · a4cfa5ae
      wangguanqun 提交于
      * gpups default config and dataset
      
      * codestyle
      
      * add unittest
      
      * code style
      
      * add dymf to gpups
      
      * codestyle
      
      * add static.nn.cvm import
      
      * PSERVER_DEBUG
      
      * add fs config to worker desc
      
      * update unittest
      
      * unittest
      
      * remove gpups unittest
      
      * remove gpups unittest
      
      * static check
      a4cfa5ae
  4. 17 6月, 2022 1 次提交
    • Z
      bug fix (#43526) · 4d649893
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      4d649893
  5. 13 6月, 2022 1 次提交
  6. 05 6月, 2022 1 次提交
    • S
      【code format check upgrade】 step2:yapf (#42944) · a072fca8
      Sing_chan 提交于
      * use yapf to format all python file
      
      * yapf exclude two unittests file for they rely on writing and reading file, and format will break them
      
      * disable diff_py_file because too many diff files cause command following failed
      a072fca8
  7. 02 6月, 2022 1 次提交
  8. 25 5月, 2022 1 次提交
  9. 12 5月, 2022 1 次提交
  10. 22 4月, 2022 1 次提交
    • Z
      Ssd sparse table (#41812) · cca57c4a
      zhaocaibei123 提交于
      * [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)
      
      cherry-pick
      
      fix compile bug of windows cuda11.5 #41433
      
      * fix bug of missing boost when compile cache.cc (#41449)
      
      【chery-pick #41430】fix bug of random compile failure, due to incorrect compile order of dependencies
      
      * Fix eager try catch (#41438) (#41477)
      
      [Cherry-Pick]Fix eager try catch (#41438)
      
      * Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)
      
      Cherry-pick PR #41407
      
      * [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)
      
      * add one_hot gpu hint
      
      * move allow_out_of_range judgement
      
      * delete useless unittest
      
      * fix bugs of reshape double grad infermeta (#41459) (#41493)
      
      * [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)
      Co-authored-by: NJingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
      
      * [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)
      
      Cherry-pick of #41521
      
      * [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)
      
      * Use `self`as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)
      
      * Add fill_constant_batch_size YAML and UT (#41474)
      
      * Switch some dy2st UT to eager mode (#41382)
      
      * Sitch some dy2st UT to eager mode
      
      * Fix test_lstm and remove test_transformer
      
      * Run test_resnet_v2 in old dy mode
      
      * Unittest recover (#41431)
      
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove code unused
      
      * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable
      
      * fix
      
      * fix
      
      * fix
      
      * recover
      
      * remove unused code
      
      * recover unittest
      
      * fix
      
      * remove
      
      * fix
      
      * remove code unuseful
      
      * remove
      
      * fix
      
      * recover
      
      * remove
      Co-authored-by: Nesythan <esythan@126.com>
      
      * add ssd sparse table
      
      * fix
      
      * add cache shuffle
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * add unit test
      
      * fix
      Co-authored-by: zhouweiwei2014's avatarZhou Wei <1183042833@qq.com>
      Co-authored-by: NSing_chan <51314274+betterpig@users.noreply.github.com>
      Co-authored-by: N0x45f <23097963+0x45f@users.noreply.github.com>
      Co-authored-by: Npangyoki <pangyoki@126.com>
      Co-authored-by: NSiming Dai <908660116@qq.com>
      Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: NZhang Jun <ewalker@live.cn>
      Co-authored-by: NJingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
      Co-authored-by: NQi Li <qili93@qq.com>
      Co-authored-by: Nesythan <esythan@126.com>
      cca57c4a
  11. 13 4月, 2022 1 次提交
    • W
      the one ps proto (#41659) · b12af9e1
      wangguanqun 提交于
      * the one ps proto
      
      * the one ps proto
      
      * fix
      
      * fix
      
      * fix
      
      * fix windows ci
      
      * fix windows ci
      
      * add dependency
      
      * add dependency
      b12af9e1
  12. 09 4月, 2022 1 次提交
    • Z
      Unittest recover (#41431) · 7a07c4a5
      zhaocaibei123 提交于
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove code unused
      
      * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable
      
      * fix
      
      * fix
      
      * fix
      
      * recover
      
      * remove unused code
      
      * recover unittest
      
      * fix
      
      * remove
      
      * fix
      
      * remove code unuseful
      
      * remove
      
      * fix
      
      * recover
      
      * remove
      Co-authored-by: Nesythan <esythan@126.com>
      7a07c4a5
  13. 05 4月, 2022 1 次提交
    • Z
      Table refine: remove table/accessor unuseful (#41400) · a288fcab
      zhaocaibei123 提交于
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove code unused
      
      * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable
      
      * fix
      
      * fix
      
      * fix
      
      * recover
      
      * remove unused code
      Co-authored-by: Nesythan <esythan@126.com>
      a288fcab
  14. 04 4月, 2022 1 次提交
    • Z
      Table refine: Pull/Push(TableContext) (#41320) · 19cb0d18
      zhaocaibei123 提交于
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove code unused
      
      * fix
      
      * recover
      
      * fix
      Co-authored-by: Nesythan <esythan@126.com>
      19cb0d18
  15. 01 4月, 2022 1 次提交
    • Z
      fix py36 import as error (#41236) · 6ed6f9fe
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * correct py36 import error
      
      * correct py36 import error
      
      * correct py36 import error
      
      * correct py36 import error
      6ed6f9fe
  16. 31 3月, 2022 1 次提交
  17. 30 3月, 2022 2 次提交
  18. 28 3月, 2022 1 次提交
  19. 23 3月, 2022 1 次提交
    • Z
      two-phase training for ps (#40762) · b1a4668c
      zhaocaibei123 提交于
      * fix benchmark and communicator config
      
      * fix bugs of the_one_ps
      
      * multi program and fix bug in optimizer
      
      * multi program in the_one_ps
      
      * public commcontext
      
      * ps optimizer multi programs
      
      * cvm & datanorm backend
      
      * fix dim
      
      * fix unittest
      
      * fix
      
      * the one ps merge
      
      * remove comm
      
      * add DownpourLiteWorker
      
      * all
      
      * fix
      
      * fix
      
      * device worker downpour lite
      
      * fix
      
      * fix bug in global shuffle
      
      * save inference model
      
      * fix & add log
      
      * fix
      
      * remove log
      
      * fix
      
      * fix save summary
      
      * fix
      
      * fix pscore
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * remove logs
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * add some comments
      
      * fix
      Co-authored-by: Nesythan <esythan@126.com>
      b1a4668c
  20. 05 3月, 2022 1 次提交
    • W
      Ps optimizer multi programs (#39883) · bcaf88d2
      wangguanqun 提交于
      * fix benchmark and communicator config
      
      * fix bugs of the_one_ps
      
      * multi program and fix bug in optimizer
      
      * multi program in the_one_ps
      
      * public commcontext
      
      * ps optimizer multi programs
      
      * the one ps merge
      
      * fix bug in test
      bcaf88d2
  21. 02 3月, 2022 1 次提交
    • Z
      new fleet_desc builder (#39948) · 1c4e3e5d
      ziyoujiyi 提交于
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unitest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      
      * cpu-async-ps minimize test ok & gpu minimize test ok
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * sync/geo test ok & fix heter_worker program ok
      
      * .
      
      * new fleet desc generator
      
      * new fleet_desc builder
      
      * new fleet_desc builder
      
      * .
      
      * .
      
      * correct ps.proto compile
      
      * .
      Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>
      1c4e3e5d
  22. 22 2月, 2022 1 次提交
    • W
      fix bug in new the_one_ps (#39505) · d56a0a1b
      wangguanqun 提交于
      * fix benchmark and communicator config
      
      * fix bugs of the_one_ps
      
      * multi program and fix bug in optimizer
      
      * multi program in the_one_ps
      
      * public commcontext
      d56a0a1b
  23. 16 2月, 2022 1 次提交
    • Z
      sync/geo test ok & fix heter_worker program ok (#39511) · b2986bab
      ziyoujiyi 提交于
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unitest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      
      * cpu-async-ps minimize test ok & gpu minimize test ok
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * sync/geo test ok & fix heter_worker program ok
      
      * .
      Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>
      b2986bab
  24. 14 2月, 2022 1 次提交
    • Z
      统一ps:heter ps 二阶段单测通过 (#39468) · 765a2ada
      ziyoujiyi 提交于
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unitest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      
      * cpu-async-ps minimize test ok & gpu minimize test ok
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>
      765a2ada
  25. 11 2月, 2022 1 次提交
    • Z
      统一 ps 开发 - python (#39431) · 22c67d14
      ziyoujiyi 提交于
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unitest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      
      * cpu-async-ps minimize test ok & gpu minimize test ok
      Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>
      22c67d14
  26. 08 2月, 2022 1 次提交
    • Z
      ps optimize refactor (#38982) · 196dbfc2
      ziyoujiyi 提交于
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unitest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>
      196dbfc2
  27. 12 1月, 2022 1 次提交
    • Z
      the_one_ps dirs reconstruct (#38804) · 50609214
      ziyoujiyi 提交于
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      50609214