1. 12 Apr, 2023 1 commit
  2. 10 Apr, 2023 1 commit
  3. 03 Apr, 2023 1 commit
  4. 20 Mar, 2023 1 commit
  5. 09 Mar, 2023 1 commit
  6. 24 Nov, 2022 1 commit
  7. 11 Nov, 2022 1 commit
  8. 10 Nov, 2022 1 commit
  9. 09 Nov, 2022 1 commit
  10. 07 Nov, 2022 3 commits
  11. 04 Nov, 2022 1 commit
  12. 03 Nov, 2022 1 commit
  13. 01 Nov, 2022 1 commit
  14. 31 Oct, 2022 1 commit
    • 2.4/fix engine build (#47462) · 4b3589fb
      zhaoyingli committed
      * update codestyle
      
      * [AutoParallel] fix fp16 for subblock (#47189)
      
      * [AutoParallel] fix fp16 for subblock
      
      * fix engine
      
      * fix comment
      
      * [AutoParallel] fix engine _build and cost method (#47263)
      
      * fix engine build method
      
      * fix import
      
      * update engine cost
      
      * update raise error
      
      * update cmakelist
      
      * revert optimizer
      
      * revert optimizer
      
      * fix unittest
      
      * fix unittest
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
  15. 29 Oct, 2022 1 commit
  16. 26 Oct, 2022 1 commit
  17. 24 Oct, 2022 3 commits
  18. 21 Oct, 2022 1 commit
  19. 20 Oct, 2022 1 commit
  20. 19 Oct, 2022 2 commits
    • [Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790
      zhaoyingli committed
      * [Auto Parallel] Make Engine class callable (#46416)
      
      * [Auto Parallel] Improve the user-defined fetches and logging
      
      * [Auto Parallel] Make Engine class callable
      
      * [Auto Parallel] Update the data loading of tuner
      
      * Print IPS in auto parallel Engine (#46554)
      
      * [AutoParallel] fix dist_split (#46505)
      
      * [AutoParallel] fix dist_split
      
      * add unittest
      
      * update cmakelist
      
      * [AutoParallel] fix sharding (#46572)
      
      * [AutoParallel] fix process_mesh (#46583)
      
      * [AutoParallel] fix reshard when train with eval (#46605)
      
      * [AutoParallel] fix reshard when train with eval
      
      * fix mppp
      
      * [AutoParallel] fix amp when predict (#46637)
      
      * [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)
      
      * update comp cost and completion for gpt auto search
      
      * add unittest
      
      * [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Improve the fine-grained APIs (#46552)
      
      * [Auto Parallel] Support different dataloaders
      
      * [Auto Parallel] Add num_shards config for dataset
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Add the prepare API and replace __call__ with run
      
      * [Auto Parallel] Improve the private implementations of Engine
      
      * [Auto Parallel] Set capacity of dataloader for opt tuning
      
      * [Auto Parallel] [WIP] Change the fine-grained API
      
      * [Auto Parallel] Improve APIs to support different user cases
      
      * [Auto Parallel] Add removed config
      
      * [Auto Parallel] Add imports
      
      * [Auto Parallel] Fix bugs for to_static
      
      * [Auto Parallel] Remove unnecessary imports
      
      * bugfix (#46921)
      
      * [Auto Parallel] Fix the bug for None labels (#46987)
      
      * [AutoParallel] adapt for gpt-gen (#46771)
      
      * for gpt-gen
      
      * fix reshard
      
      * adapt assign and shape op
      
      * add dist_assign & unittest
      
      * add conditional block unittest
      
      * rename unittest
      
      * [Auto Parallel] Fix the bug of completion (#47056)
      
      * [Auto Parallel] Fix the bug for None labels
      
      * [Auto Parallel] Fix the completion bug
      
      * [AutoParallel] add callbacks (#47014)
      
      * [AutoParallel] add callbacks
      
      * fix unittest
      
      * fix dist_context
      
      * fix engine
      
      * fix cmakelist
      
      * fix unittest's returns
      
      * fix cmakelist
      
      * [Auto Parallel] Add cost interface (#47043)
      
      * add cost interface
      
      * update interface and add unittest
      
      * update unittest
      
      * update interface
      
      * [Auto Parallel]Add parallel tuner (#46189)
      
      * add parallel tuner
      
      * add unittest
      
      * fix unittest
      
      * set timeout of unittest
      
      * set unittest timeout
      
      * fix auto_mode setting
      
      * update unittest
      
      * sync from develop and update unittest
      
      * remove unused import
      
      * update unittest
      
      * update cmakelist
      
      * add unittests
      Co-authored-by: Yulong Ao <aoyulong@baidu.com>
      Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
    • Add enable_partial_send_recv switch in pipeline_configs (#46992) (#47083) · 1d015f12
      Ghost Screaming committed
      * Fix a bug in the reduce_sum op: the result is wrong when
      input.numel() > INT32_MAX.

      * Support an allow_partial switch, which can be configured in
      pipeline_configs. If the tensors sent from different hosts are
      not the same, they shouldn't be sent partially and then
      concatenated into a whole tensor.

      * Rename allow_partial to enable_partial_send_recv.
      
      * Add global variable _enable_partial_send_recv
  21. 18 Oct, 2022 2 commits
    • Cherry pick for sharding (#47061) · 5b642140
      Yuang Liu committed
      * [dygraph sharding] Overlap the reduce and the calculation for sharding stage 2. (#46495)
      
      * [dygraph sharding stage 2] sharding broadcast overlap (#46656)
      
      * Multi groups for broadcast of sharding stage 2 (#46894)
    • [cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90
      Haohongxiang committed
      * [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)
      
      * [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)
      
      * update
  22. 17 Oct, 2022 1 commit
    • [Cherry-pick] Collective communication APIs (#46922) · 5fba2a98
      Wen Sun committed
      * Support both use_calc_stream and sync_op in send recv APIs (#46023)
      
      * Support both use_calc_stream and sync_op in allgather API (#46295)
      
      * Support both use_calc_stream and sync_op in collective communication API (#46761)
      
      * Move group and all reduce from collective to communication (#45848)
      
      * Complete bfloat16 dtype support for collective API in eager mode (#45844)

      * Fix collective APIs not being recognized when building docs (#46962)
      Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
  23. 11 Oct, 2022 1 commit
    • Cherry pick for dygraph pp (#46876) · 9cc3f69f
      Yuang Liu committed
      * bug fix for virtual pipeline parallel (#45922)
      
      * don't wait for send op under dygraph pp (#46209)
      
      * [interleave pp] sync recv for 1f1b (#46399)
      
      * [dygraph pp] all sync for allgather partial (#46483)
  24. 27 Sep, 2022 2 commits
  25. 26 Sep, 2022 1 commit
    • cherry-pick V2.4 (#46358) · 536d9d8c
      ziyoujiyi committed
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud comm ready
      
      * .
      
      * .
      
      * fix gloo compile warning
      
      * adapt for nn fl-ps
      
      * flps del fake-init op
      
      * add learning_rate_0 initializer op
      
      * bug fix
      
      * .
      
      * .
  26. 22 Sep, 2022 3 commits
  27. 20 Sep, 2022 3 commits
  28. 19 Sep, 2022 2 commits