1. 17 Dec, 2020 1 commit
    • [cherry-pick]fix matmulv2 bug & add rebuild group & fix bug of download (#29726) · df0430dc
      ShenLiang committed
      * Fix the download bug in the case of multiple machines (#29551)
      
      * fix the download bug
      * add sort for ips
      
      * Fix bug of matmul_v2 for broadcast case (#29599)
      
      * fix bug of matmul_v2 for broadcast
      
      * Rebuild group automatically in dynamic graph distributed (#29255)
      
      * add tensor_indices in AssignGroupBySize
      
      * add rebuild group in reducer
      
      * fix error message of gather nd (#29521)
  2. 03 Dec, 2020 1 commit
  3. 01 Dec, 2020 1 commit
  4. 26 Nov, 2020 1 commit
    • [sharding] doc, api, bug fixed (#28983) · 0dadacc4
      JZ-LIANG committed
      * add lars to fleet meta optimizer
      
      * add lamb to proto
      
      * add lamb to fleet meta optimizer
      
      * fixed syntax bug
      
      * fixed syntax bug
      
      * fixed syntax error in lamb, add config setter of lamb in distributed_strategy
      
      * trigger unit test to rerun
      
      * add new unit test func for lamb
      
      * revise unit test for lars and lamb
      
      * revise dgc meta unit test
      
      * revise lars document in distribute_strategy
      
      * revise lars lamb document in distributed_strategy.py
      
      * revise lars lamb document in distributed_strategy.py
      
      * add weight decay exclude logic to lars
      
      * restore optimizer.py
      
      * restore optimizer.py as develop except lars
      
      * add epsilon and exclude fn to distributed_strategy
      
      * add lars epsilon
      
      * revise unit test for fleet lars and lamb
      
      * revise lars lamb unit test for CI coverage
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise lars argument api
      
      * revise api doc of lars
      
      * fix op role
      
      * add sharding save and add_sync_comm_for_test function
      
      * add comm_analyse to utils
      
      * revise sharding_utils
      
      * add sharding saving unittest
      
      * revise sharding utils for unittest
      
      * revise sharding en doc
      
      * update sharding utils api
      
      * add doc for sharding
      
      * fixed bug in sharding var size count
      
      * update varsize count in sharding
      
      * fix sharding num_nccl_comm
      
      * Revert "fix sharding num_nccl_comm"
      
      This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.
  5. 24 Nov, 2020 1 commit
    • Upgrade string literals to raw string (#28989) · 3815d7aa
      Leo Chen committed
      * upgrade comment string to raw string
      
      * fix string in
      
      * fix string with ' '
      
      * revert update on comments
      
      * upgrade only necessary
      
      * fix sample code checker
      
      * fix comments with '''
  6. 26 Oct, 2020 1 commit
  7. 22 Oct, 2020 1 commit
  8. 12 Oct, 2020 1 commit
  9. 28 Sep, 2020 1 commit
  10. 25 Sep, 2020 1 commit
  11. 16 Sep, 2020 1 commit
  12. 14 Sep, 2020 1 commit
  13. 09 Sep, 2020 1 commit
  14. 07 Sep, 2020 1 commit
  15. 04 Sep, 2020 1 commit
  16. 29 Aug, 2020 1 commit
  17. 27 Aug, 2020 1 commit
  18. 26 Aug, 2020 1 commit
  19. 25 Aug, 2020 1 commit
  20. 24 Aug, 2020 1 commit
  21. 21 Aug, 2020 1 commit
  22. 18 Aug, 2020 1 commit
  23. 13 Aug, 2020 1 commit
  24. 12 Aug, 2020 1 commit
  25. 10 Aug, 2020 1 commit
  26. 05 Aug, 2020 1 commit
  27. 03 Aug, 2020 1 commit
  28. 30 Jul, 2020 1 commit
  29. 29 Jul, 2020 1 commit
  30. 28 Jul, 2020 1 commit
    • add more settings for distributed strategy (#25685) · 920d998f
      Dong Daxiang committed
      * add more settings for distributed strategy
      DistributedStrategy groups its configuration into four parts:
      - BuildStrategy: the same as paddle.fluid.BuildStrategy, except that the distributed arguments are moved out of BuildStrategy
      - ExecutionStrategy: the same as paddle.fluid.ExecutionStrategy
      - collective communication configs: nccl_comm_num, hierarchical allreduce, and so on
      - distributed algorithms: async_update (mainly used in PS), lars, lamb, and so on
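
      The four-part layout described in this commit message can be pictured with a small, self-contained sketch. It does not use Paddle at all: the class names (StrategySketch, CollectiveConfig, AlgorithmConfig), the field name use_hierarchical_allreduce, and all default values are assumptions made for illustration. Only nccl_comm_num, async_update, lars, and lamb are names taken from the message itself.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CollectiveConfig:
    """Collective communication settings (hypothetical container)."""
    nccl_comm_num: int = 1                    # name from the commit message; default assumed
    use_hierarchical_allreduce: bool = False  # hypothetical field name


@dataclass
class AlgorithmConfig:
    """Switches for distributed algorithms (hypothetical container)."""
    async_update: bool = False  # mainly relevant to parameter-server (PS) training
    lars: bool = False
    lamb: bool = False


@dataclass
class StrategySketch:
    """Illustrative stand-in for the four configuration parts, not the real class."""
    build_strategy: dict = field(default_factory=dict)      # graph build options
    execution_strategy: dict = field(default_factory=dict)  # executor options
    collective: CollectiveConfig = field(default_factory=CollectiveConfig)
    algorithms: AlgorithmConfig = field(default_factory=AlgorithmConfig)


# Configure each part independently, then flatten to a plain dict.
strategy = StrategySketch()
strategy.collective.nccl_comm_num = 2
strategy.algorithms.lars = True
config = asdict(strategy)
```

      Keeping the parts as separate objects mirrors the point made in the message: the distributed settings live outside the build options, so each group can be changed without touching the others.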
  31. 20 Jul, 2020 1 commit
  32. 08 Jul, 2020 1 commit
  33. 06 Jul, 2020 1 commit