  1. 06 Nov, 2018 1 commit
    • feature/DC asgd (#12722) · 306236c2
      Wu Yi committed
      * wip
      
      * add ref_by_trainer_id op
      
      * ready to test
      
      * fix ref inputs
      
      * refine rpc_op_handle
      
      * fix merge bug
  2. 02 Nov, 2018 1 commit
    • [1.1] Load vars on PSERVER (#14037) · d325e668
      tangwei12 committed
      * fix dim0 in _load_slice_up_vars
      
      * fix dim0 in _load_slice_up_vars, fix innershape in delete_var_op
      
      * Revert "fix lookuptable in reduce strategy"
      
      This reverts commit 0e722c5e
      
      * add unit test for dist
      
      * add unit test for dist, test=develop
      
      * cancel revert, test=develop
  3. 01 Nov, 2018 4 commits
  4. 31 Oct, 2018 2 commits
  5. 30 Oct, 2018 1 commit
  6. 29 Oct, 2018 1 commit
    • [1.1] [project] train imagenet using large batch size (#13766) · 26200f2e
      Wu Yi committed
      * fix nccl2 lars dist support
      
      * put lars in momentum op
      
      * add tests lars
      
      * fix ci
      
      * fix cpu kernel
      
      * soft warning
      
      * remove lars in test_recognize_digits.py
      
      * move to another op
      
      * add file
      
      * update api.spec test=develop
      
      * update test=develop
      
      * fix api.spec test=develop
      
      * wip
      
      * wip, finish grad merge ops
      
      * wip, finish graph build
      
      * wip test running
      
      * work on 1 gpu
      
      * workable version
      
      * update
      
      * fix tests
      
      * fuse broadcast op
      
      * fix compile failed
      
      * refine
      
      * add batch merge test mnist
      
      * fix CI test=develop
      
      * fix build
      
      * use independent bn params for batch merge test=develop
      
      * update api.spec
      
      * follow comments and for test
      
      * wip
      
      * refine tests test=develop
      
      * follow comments test=develop
      
      * remove startup bn modify test=develop
      
      * follow comments test=develop
      
      * fix merge test=develop
  7. 28 Oct, 2018 1 commit
  8. 27 Oct, 2018 1 commit
  9. 26 Oct, 2018 4 commits
  10. 25 Oct, 2018 1 commit
  11. 18 Oct, 2018 1 commit
  12. 16 Oct, 2018 1 commit
  13. 14 Oct, 2018 1 commit
  14. 27 Sep, 2018 3 commits
    • fix graph num · 5b152b1f
      typhoonzero committed
    • Add distributed unit tests about text_classification/simnet-bow/ctr (#12812) · 97cf1eb6
      tangwei12 committed
      * add dist ut for text_classification
      
      * add dist ut for text_classification
      
      * add simnet bow unittest
      
      * add dist ut for simnet bow
      
      * add trainning data url for simnet bow
      
      * add trainning data url for simnet bow
      
      * modify simnet test_reader to train reader
      
      * add test_dist_ctr
      
      * test_dist_ctr can run now
      
      * dense update is good
      
      * add unit test for selected rows
      
      * debug unit test
      
      * fix dist sparse update problem
      
      * Constant args at init
      
      * optimize code
      
      * simnet optimize
      
      * fix DebugStringEx
      
      * optimize sum_op.h
      
      * add ScaleOpVarTypeInference
      
      * clean code
      
      * fix test_dist_transpiler.py
      
      * code optimize
      
      * modify delta
      
      * fix sparse update bug
      
      * dist test use one cpu
      
      * update some data
      
      * remove unused code
      
      * add use cuda config
      
      * unit test fix
      
      * unit test fix
      
      * unit test fix
      
      * unit test fix
      
      * dist_word2vec use CPU
      
      * unit test fix
      
      * unit test fix
      
      * code clean
      
      * code clean
      
      * merge develop
      
      * api spec update
      
      * Revert: api spec update
      
      * replace simnet data with fake
      
      * replace simnet data with fake
      
      * update dim
      
      * add batch auc
      
      * code clean
      
      * code clean
      
      * modify print to stderr
      
      * update simnet delta -> 1e-5
      
      * update RUN_STEP
      
      * add use_reader_alloc
      
      * add use_reader_alloc
      
      * add use_reader_alloc
      
      * modify delta
      
      * add use_reader_alloc
      
      * fix stderr write
      
      * python3 compatibility
      
      test=develop
      
      * python3 compatibility, test=develop
      
      * Update dist_text_classification.py
      
      * test=develop
    • Batch AUC (#13567) · 85362e98
      tangwei12 committed
      * add distributed auc
      
      * add attr "is distributed" and config it
      
      * add distributed auc
      
      * add batch auc and code format
      
      * code format
      
      * auc optimize
      
      * metric_op optimize
      
      * code clean
      
      * bug fix and code clean
      
      * bug fix and code clean
      
      * code optimize
      
      * code optimize
      
      * api spec update
      
      * Comments optimized
      
      * add mutex
      
      * Revert: add mutex
      
      * remove distribute metric
      
      * remove distribute metric
      
      * spec modifyed
      
      * add annotation, test=develop
      
      * keep API compatibility
      test=develop
  15. 26 Sep, 2018 2 commits
  16. 25 Sep, 2018 1 commit
    • Nccl2 dist API (#13506) · aeb2dc2b
      Wu Yi committed
      * add nccl2 dist api
      
      * update apispec
      
      * update
      
      * update api spec
  17. 23 Sep, 2018 1 commit
  18. 21 Sep, 2018 1 commit
  19. 18 Sep, 2018 2 commits
  20. 13 Sep, 2018 1 commit
    • Trainer auto wait pserver ports (#13341) · 3ab3a7f3
      Wu Yi committed
      * trainer auto wait pserver port ready
      
      * add file
      
      * fix docstring
      
      * add option to not wait
      
      * update api spec
      
      * clean
      
      * fix test hang
  21. 04 Sep, 2018 1 commit
  22. 03 Sep, 2018 1 commit
  23. 31 Aug, 2018 2 commits
  24. 29 Aug, 2018 1 commit
  25. 28 Aug, 2018 1 commit
    • Refine dist rpc deps (#12899) · 0ee6fed0
      Wu Yi committed
      * refine dist train RPC deps
      
      * clean up
      
      * clean up
      
      * fix ut
      
      * remove input for fetch_barrier
      
      * follow comments
  26. 27 Aug, 2018 1 commit
  27. 23 Aug, 2018 1 commit
    • Resovle multi gpu async deps (#12828) · b8da70c3
      Wu Yi committed
      * dist transpiler add control dependency var between send and recv
      
      * fix async deps
      
      * follow comments and refine
      
      * fix deps connect for rpc ops
  28. 21 Aug, 2018 1 commit
    • Add async dist tests (#12798) · f63368db
      Wu Yi committed
      * add async dist tests
      
      * update delta
      
      * fix transformer test
      
      * refine rmsprop transpile
      
      * update
      
      * fix dist seresnet