1. 11 Nov, 2022 1 commit
  2. 10 Nov, 2022 1 commit
  3. 07 Nov, 2022 3 commits
  4. 04 Nov, 2022 1 commit
  5. 03 Nov, 2022 1 commit
  6. 01 Nov, 2022 1 commit
  7. 29 Oct, 2022 1 commit
  8. 24 Oct, 2022 3 commits
  9. 21 Oct, 2022 1 commit
  10. 19 Oct, 2022 1 commit
      Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12
      Committed by Ghost Screaming
      * Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.
      
      * Support allow_partial switch, which can be configured in
      pipeline_configs. If the tensors sent from different hosts are not
      identical, they should not be sent partially and then concatenated
      into a whole tensor.
      
      * Rename allow_partial to enable_partial_send_recv.
      
      * Add global variable _enable_partial_send_recv
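The partial send/recv optimization this switch controls splits a tensor into per-rank slices and reassembles it on the receiver, which is only lossless when every host holds an identical tensor. A minimal pure-Python sketch of the idea (the helper names are hypothetical, not Paddle APIs):

```python
# Hypothetical sketch of partial send/recv: each of mp_degree peers sends
# one slice of the flattened tensor; the receiver concatenates the slices.
def partial_send(tensor, mp_degree):
    # Split a flattened tensor into equal per-peer slices.
    chunk = len(tensor) // mp_degree
    return [tensor[i * chunk:(i + 1) * chunk] for i in range(mp_degree)]

def partial_recv(slices):
    # Concatenate the received slices back into a whole tensor.
    out = []
    for s in slices:
        out += s
    return out

# The round-trip restores the original only when all peers hold the same
# tensor; with enable_partial_send_recv off, the whole tensor is sent.
data = [1.0, 2.0, 3.0, 4.0]
restored = partial_recv(partial_send(data, 2))
```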
  11. 18 Oct, 2022 2 commits
      Cherry pick for sharding (#47061) · 5b642140
      Committed by Yuang Liu
      * [dygraph sharding] Overlap the reduce and the calculation for sharding stage 2. (#46495)
      
      * [dygraph sharding stage 2] sharding broadcast overlap (#46656)
      
      * Multi groups for broadcast of sharding stage 2 (#46894)
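Using multiple groups for the stage-2 broadcast lets transfers on different groups overlap instead of serializing on one communicator. A toy sketch of round-robin group assignment (helper name and structure are illustrative, not Paddle's implementation):

```python
def assign_broadcast_groups(params, num_groups):
    # Round-robin parameters over num_groups communication groups so that
    # broadcasts can be issued on separate groups and overlap in time.
    groups = [[] for _ in range(num_groups)]
    for i, p in enumerate(params):
        groups[i % num_groups].append(p)
    return groups

groups = assign_broadcast_groups(["w1", "w2", "b1", "b2", "w3"], 2)
```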
      [cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90
      Committed by Haohongxiang
      * [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)
      
      * [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)
      
      * update
  12. 17 10月, 2022 1 次提交
      [Cherry-pick] Collective communication APIs (#46922) · 5fba2a98
      Committed by Wen Sun
      * Support both use_calc_stream and sync_op in send recv APIs (#46023)
      
      * Support both use_calc_stream and sync_op in allgather API (#46295)
      
      * Support both use_calc_stream and sync_op in collective communication API (#46761)
      
      * Move group and all reduce from collective to communication (#45848)
      
      * Completes bfloat16 dtype for collective api in eager mode (#45844)
      
      * Fix collective APIs cannot be recognized when building docs (#46962)
      Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
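The sync_op flag added across these APIs distinguishes blocking calls from ones that return a waitable task. A simplified single-process sketch of that contract (the task class and collective here are illustrative stand-ins, not Paddle's):

```python
class _Task:
    # Illustrative stand-in for the handle an async collective returns.
    def __init__(self, fn):
        self._fn = fn
        self._done = False

    def wait(self):
        # Complete the deferred work exactly once.
        if not self._done:
            self._fn()
            self._done = True

def all_gather(out, local, sync_op=True):
    # Toy "collective": gathers one rank's data into out.
    def run():
        out.extend(local)
    if sync_op:
        run()              # blocking: work is complete on return
        return None
    return _Task(run)      # non-blocking: caller must call task.wait()

buf = []
task = all_gather(buf, [1, 2], sync_op=False)
before = list(buf)         # nothing has happened yet
task.wait()                # now the gather completes
```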
  13. 11 10月, 2022 1 次提交
      Cherry pick for dygraph pp (#46876) · 9cc3f69f
      Committed by Yuang Liu
      * bug fix for virtual pipeline parallel (#45922)
      
      * dont wait for send op under dygraph pp (#46209)
      
      * [interleave pp] sync recv for 1f1b (#46399)
      
      * [dygraph pp] all sync for allgather partial (#46483)
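The 1F1B fixes above concern how each pipeline stage interleaves forward and backward micro-batches. The classic 1F1B schedule for one stage can be sketched as (hypothetical helper, not a Paddle API):

```python
def one_f_one_b(stage, num_stages, num_micro):
    # Warmup: earlier stages run extra standalone forwards so that later
    # stages have work; the last stage has no warmup at all.
    warmup = min(num_stages - stage - 1, num_micro)
    steps = ["F"] * warmup
    for _ in range(num_micro - warmup):   # steady phase: one F, one B
        steps += ["F", "B"]
    steps += ["B"] * warmup               # cooldown: drain the backwards
    return steps

sched = one_f_one_b(stage=0, num_stages=4, num_micro=4)
```

Every stage ends up running exactly num_micro forwards and num_micro backwards; only the interleaving differs by stage.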
  14. 27 Sep, 2022 1 commit
  15. 22 Sep, 2022 2 commits
  16. 20 Sep, 2022 2 commits
  17. 19 Sep, 2022 3 commits
  18. 09 Sep, 2022 1 commit
  19. 07 Sep, 2022 2 commits
  20. 06 Sep, 2022 1 commit
  21. 02 Sep, 2022 1 commit
  22. 01 Sep, 2022 1 commit
  23. 26 Aug, 2022 3 commits
  24. 23 Aug, 2022 2 commits
  25. 16 Aug, 2022 1 commit
  26. 15 Aug, 2022 1 commit
      refactor fleet. (#44833) · 8636d2a2
      Committed by wuhuachaocoding
      * refactor fleet.
      
      * refactor fleet.py.
      
      * update fleet/__init__.py.
      
      * update fleet.py
      
      * update code style.
      
      * update fleet
      
      * update fleet
      
      * update fleet
      
      * update fleet
      
      * update model.py
      
      * update fleet.
      
      * update __init__.py
      
      * update fleet.
      
      * update fleet.
      
      * update fleet
      
      * update fleet
      
      * update fleet
      
      * update fleet.
      
      * update optimizer.py
      
      * update optimizer
      
      * update fleet.py
      
      * update scaler.py
      
      * update setup.py.in
  27. 13 Aug, 2022 1 commit
      fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f
      Committed by ziyoujiyi
      * back fl
      
      * delete ssl cert
      
      * make warning
      
      * unittest parallel degree
      
      * solve unittest
      
      * heter & multi cloud comm ready
      
      * fl-ps v1.0
      
      * support N + N mode
      
      * delete print
      
      * fix bug
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      
      * fix logging risk
      
      * fix logging possible risk
      
      * write trainer_desc file
      
      * support split sparse params in local & remote
      
      * fix import paddle.fluid.core.PSGPU
      
      * fix import paddle.fluid.core.PSGPU
      
      * add remote_sparse & local_sparse config
      
      * fix unittest
      
      * fix test_dist_fleet_geo table error
      
      * fix PADDLE_ENFORCE error
      
      * fix other's pr conflict
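The headline change of this PR routes sparse embedding tables either to local storage or to a remote parameter server, per the remote_sparse and local_sparse config mentioned above. A toy sketch of such a split by configured table name (function and names are hypothetical, for illustration only):

```python
def split_sparse_tables(tables, remote_sparse):
    # Tables listed in remote_sparse go to the remote parameter server;
    # all other sparse tables stay on the local node.
    local, remote = [], []
    for t in tables:
        (remote if t in remote_sparse else local).append(t)
    return local, remote

local_tables, remote_tables = split_sparse_tables(
    ["embedding_0", "embedding_1", "embedding_2"], {"embedding_1"}
)
```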