1. 06 Jan, 2021 1 commit
  2. 05 Jan, 2021 2 commits
    • G
      fix test=release/2.0 (#30045) · 6e2066b0
      gongweibao committed
      6e2066b0
    • C
      [cherry pick]Set FLAGS_selected_gpus for spawn (#29962) (#30097) · cda7397f
      Chen Weihang committed
      Set FLAGS_selected_gpus for spawn.
      
      When the child process starts, it inherits the configuration of the main process and sets the FLAGS once. The environment variable has not been set at that point, so FLAGS_selected_gpus stays the same as in the main process (usually empty). The flags are therefore updated manually here.
      
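      Note: a unit test was added and then removed. Its output showed that nvidia-smi on the CI machine reports only two GPUs, and testing this issue requires more than two.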
      cda7397f
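
      The ordering problem described in this commit message can be sketched in plain Python. This is only an illustration under stated assumptions: `_FLAGS`, `init_flags_from_env`, `set_flags`, `get_flag`, and `child_entry` are hypothetical stand-ins, not Paddle's actual API, and no real process is spawned. It shows a flag snapshot taken before the per-rank environment variable exists, followed by the manual update.

      ```python
      import os

      # Hypothetical stand-in for a framework's global flag registry (not Paddle's API).
      _FLAGS = {}


      def init_flags_from_env():
          """Snapshot the environment once, as a framework might do at process start."""
          _FLAGS["FLAGS_selected_gpus"] = os.environ.get("FLAGS_selected_gpus", "")


      def set_flags(updates):
          """Overwrite flag values explicitly after the initial snapshot."""
          _FLAGS.update(updates)


      def get_flag(name):
          """Return the current value of a flag from the registry."""
          return _FLAGS[name]


      def child_entry(rank, gpu_ids):
          """Hypothetical per-rank entry point, as it would run inside a child process."""
          # 1. Flags are initialised first, before the per-rank environment exists.
          init_flags_from_env()
          stale = get_flag("FLAGS_selected_gpus")
          # 2. Only now is the per-rank environment variable set.
          os.environ["FLAGS_selected_gpus"] = str(gpu_ids[rank])
          # 3. The snapshot does not follow the environment, so update it by hand.
          set_flags({"FLAGS_selected_gpus": os.environ["FLAGS_selected_gpus"]})
          return stale, get_flag("FLAGS_selected_gpus")
      ```

      Calling `child_entry(1, [4, 5, 6])` with the variable unset beforehand returns the stale snapshot value first and the refreshed per-rank value second, which mirrors the "set once, then update manually" sequence in the message.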
  3. 31 Dec, 2020 3 commits
  4. 25 Dec, 2020 1 commit
    • T
      2 0 ps core 2 (#29894) · f781ab08
      tangwei12 committed
      * add ps table (#29463)
      
      * add ps table
      
      Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178
      
      * add service (#29560)
      
      * add service, remove ut on mac
      
      * fix heter_profiler & add heter stop method
      
      * fix code style
      
      * merge pscore
      
      Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57
      
      * fix cmake
      
      Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb
      
      * fix conflict
      
      Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba
      
      * fix conflict
      
      Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa
      f781ab08
  5. 22 Dec, 2020 2 commits
  6. 17 Dec, 2020 1 commit
    • S
      [cherry-pick]fix matmulv2 bug & add rebuild group & fix bug of download (#29726) · df0430dc
      ShenLiang committed
      * Fix the download bug in the case of multiple machines (#29551)
      
      * fix the download bug
      * add sort for ips
      
      * Fix bug of matmul_v2 for broadcast case (#29599)
      
      * fix bug of matmul_v2 for broadcast
      
      * Rebuild group automatically in dynamic graph distributed (#29255)
      
      * add tensor_indices in AssignGroupBySize
      
      * add rebuild group in reducer
      
      * fix error message of gather nd (#29521)
      df0430dc
  7. 16 Dec, 2020 1 commit
  8. 08 Dec, 2020 1 commit
  9. 04 Dec, 2020 1 commit
  10. 03 Dec, 2020 2 commits
  11. 01 Dec, 2020 1 commit
  12. 30 Nov, 2020 2 commits
  13. 27 Nov, 2020 4 commits
  14. 26 Nov, 2020 5 commits
    • S
      fix InMemoryDataset doc (#28688) · cddc7096
      ShenLiang committed
      * add InMemoryDataset
      cddc7096
    • J
      [sharding] doc, api, bug fixed (#28983) · 0dadacc4
      JZ-LIANG committed
      * add lars to fleet meta optimizer
      
      * add lamb to proto
      
      * add lamb to fleet meta optimizer
      
      * fixed syntax bug
      
      * fixed syntax bug
      
      * fixed syntax error in lamb, add config setter of lamb in distributed_strategy
      
      * trigger unit test to rerun
      
      * add new unit test func for lamb
      
      * revise unit test for lars and lamb
      
      * revise dgc meta unit test
      
      * revise lars document in distributed_strategy
      
      * revise lars lamb document in distributed_strategy.py
      
      * revise lars lamb document in distributed_strategy.py
      
      * add weight decay exclude logic to lars
      
      * restore optimizer.py
      
      * restore optimizer.py as develop except lars
      
      * add epsilon and exclude fn to distributed_strategy
      
      * add lars epsilon
      
      * revise unit test for fleet lars and lamb
      
      * revise lars lamb unit test for CI coverage
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise api doc of lars
      
      * fix op role
      
      * add sharding save and add_sync_comm_for_test function
      
      * add comm_analyse to utils
      
      * revise sharding_utils
      
      * add sharding saving unittest
      
      * revise sharding utils for unittest
      
      * revise sharding en doc
      
      * update sharding utils api
      
      * add doc for sharding
      
      * fixed bug in sharding var size count
      
      * update varsize count in sharding
      
      * fix sharding num_nccl_comm
      
      * Revert "fix sharding num_nccl_comm"
      
      This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.
      0dadacc4
    • L
      fix the bug in gloo (#29112) · 2a864c70
      lilong12 committed
      * update, test=develop
      2a864c70
    • W
      Fix multi nccl comm & wait server ready (#28663) · e931c7ba
      WangXi committed
      e931c7ba
    • G
      1358397e
  15. 24 Nov, 2020 3 commits
  16. 23 Nov, 2020 2 commits
  17. 18 Nov, 2020 1 commit
    • J
      [Sharding] add new features (#28568) · 5a9f6889
      JZ-LIANG committed
      * add lars to fleet meta optimizer
      
      * add lamb to proto
      
      * add lamb to fleet meta optimizer
      
      * fixed syntax bug
      
      * fixed syntax bug
      
      * fixed syntax error in lamb, add config setter of lamb in distributed_strategy
      
      * trigger unit test to rerun
      
      * add new unit test func for lamb
      
      * revise unit test for lars and lamb
      
      * revise dgc meta unit test
      
      * revise lars document in distributed_strategy
      
      * revise lars lamb document in distributed_strategy.py
      
      * revise lars lamb document in distributed_strategy.py
      
      * add weight decay exclude logic to lars
      
      * restore optimizer.py
      
      * restore optimizer.py as develop except lars
      
      * add epsilon and exclude fn to distributed_strategy
      
      * add lars epsilon
      
      * revise unit test for fleet lars and lamb
      
      * revise lars lamb unit test for CI coverage
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise api doc of lars
      
      * fix op role
      
      * add sharding save and add_sync_comm_for_test function
      
      * add comm_analyse to utils
      
      * revise sharding_utils
      
      * add sharding saving unittest
      
      * revise sharding utils for unittest
      5a9f6889
  18. 17 Nov, 2020 1 commit
  19. 16 Nov, 2020 1 commit
  20. 28 Oct, 2020 1 commit
  21. 26 Oct, 2020 1 commit
  22. 22 Oct, 2020 1 commit
  23. 19 Oct, 2020 2 commits