1. 09 Apr, 2022 1 commit
    • Unittest recover (#41431) · 7a07c4a5
      zhaocaibei123 committed
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove code unused
      
      * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable
      
      * fix
      
      * fix
      
      * fix
      
      * recover
      
      * remove unused code
      
      * recover unittest
      
      * fix
      
      * remove
      
      * fix
      
      * remove code unuseful
      
      * remove
      
      * fix
      
      * recover
      
      * remove
      Co-authored-by: Nesythan <esythan@126.com>
  2. 23 Mar, 2022 1 commit
    • two-phase training for ps (#40762) · b1a4668c
      zhaocaibei123 committed
      * fix benchmark and communicator config
      
      * fix bugs of the_one_ps
      
      * multi program and fix bug in optimizer
      
      * multi program in the_one_ps
      
      * public commcontext
      
      * ps optimizer multi programs
      
      * cvm & datanorm backend
      
      * fix dim
      
      * fix unittest
      
      * fix
      
      * the one ps merge
      
      * remove comm
      
      * add DownpourLiteWorker
      
      * all
      
      * fix
      
      * fix
      
      * device worker downpour lite
      
      * fix
      
      * fix bug in global shuffle
      
      * save inference model
      
      * fix & add log
      
      * fix
      
      * remove log
      
      * fix
      
      * fix save summary
      
      * fix
      
      * fix pscore
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * remove logs
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * add some comments
      
      * fix
      Co-authored-by: Nesythan <esythan@126.com>
  3. 30 Nov, 2021 1 commit
  4. 26 May, 2021 1 commit
    • ut fix (#33102) · e05a7a49
      tangwei12 committed
      
      Change-Id: I2e82dfcee6a1d0512b94cebc32281123fa5bf597
      
      * pretty print for datafeed error
      
      Change-Id: I056a8b6f03608e96679a83846c97aed289cef7e6
      
      * fix fleet dist infer ut
  5. 30 Dec, 2020 1 commit
    • fix ut (#29989) · ed856d25
      tangwei12 committed
      * fix ut
      
      Change-Id: I151e152919a1863db07792bffb42d0ca68995756
  6. 24 Dec, 2020 1 commit
  7. 08 Sep, 2020 1 commit
  8. 02 Sep, 2020 1 commit
  9. 20 Aug, 2020 1 commit
  10. 19 Aug, 2020 1 commit
  11. 07 Aug, 2020 1 commit
  12. 30 Jul, 2020 1 commit
  13. 17 Feb, 2020 1 commit
  14. 12 Feb, 2020 1 commit
  15. 17 Jan, 2020 1 commit
  16. 06 Jan, 2020 1 commit
  17. 19 Dec, 2019 1 commit
  18. 15 Oct, 2019 1 commit
    • Fix communicator slow bug & fix communicator stop bug (#20366) · 940c6ff1
      Chengmo committed
      * test=develop,Fix communicator slow bug
      
      * test=develop, delete if() in stop_worker()
      
      * test=develop
      
      * fix UT, test=develop
      
      * fix bug in fetch handler, test=develop
      
      * fix bug in fetch handler, test=develop
      
      * test=develop, fix fetch barrier bug
      
      * test=develop, bug fix
      
      * test=develop, bug fix
      
      * test=develop, fix bug
  19. 07 Oct, 2019 1 commit
  20. 27 Sep, 2019 1 commit
  21. 28 Aug, 2019 1 commit
    • Fix the correctness of async mode at distributed training (#18863) · 65c73684
      tangwei12 committed
      * fix correctness of the communicator
      
      * fix a bug in send thread when sending var context is empty, test=develop
      
      * add lookup_table_prefetch_op and prefetch optimize, test=develop
      
      * remove remote prefetch GPU supported
      
      * word2vec force with CPU, test=develop
      
      * test dist remote lookup table force with CPU, test=develop
  22. 22 Jul, 2019 1 commit
  23. 12 Jun, 2019 1 commit
  24. 29 Oct, 2018 1 commit
    • [1.1] [project] train imagenet using large batch size (#13766) · 26200f2e
      Wu Yi committed
      * fix nccl2 lars dist support
      
      * put lars in momentum op
      
      * add tests lars
      
      * fix ci
      
      * fix cpu kernel
      
      * soft warning
      
      * remove lars in test_recognize_digits.py
      
      * move to another op
      
      * add file
      
      * update api.spec test=develop
      
      * update test=develop
      
      * fix api.spec test=develop
      
      * wip
      
      * wip, finish grad merge ops
      
      * wip, finish graph build
      
      * wip test running
      
      * work on 1 gpu
      
      * workable version
      
      * update
      
      * fix tests
      
      * fuse broadcast op
      
      * fix compile failed
      
      * refine
      
      * add batch merge test mnist
      
      * fix CI test=develop
      
      * fix build
      
      * use independent bn params for batch merge test=develop
      
      * update api.spec
      
      * follow comments and for test
      
      * wip
      
      * refine tests test=develop
      
      * follow comments test=develop
      
      * remove startup bn modify test=develop
      
      * follow comments test=develop
      
      * fix merge test=develop