1. 21 Sep 2022 (1 commit)
  2. 20 Sep 2022 (6 commits)
      logger manager (#45909) · 264ad205
      Roc committed
      Unify the logger manager in FleetAPI.
      Hide APIs under distributed/utils that users don't need.
      Fl ps (#46258) · 5bbfca15
      ziyoujiyi committed
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest parallel degree
      
      * solve unittest
      
      * heter & multi-cloud comm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      
      * fix logging risk
      
      * fix logging possible risk
      
      * write trainer_desc file
      
      * support split sparse params in local & remote
      
      * fix import paddle.fluid.core.PSGPU
      
      * fix import paddle.fluid.core.PSGPU
      
      * add remote_sparse & local_sparse config
      
      * fix unittest
      
      * fix test_dist_fleet_geo table error
      
      * fix PADDLE_ENFORCE error
      
      * fix conflict with another PR
      
      * forbid ssd table
      
      * .
      
      * recover ssd table code
      
      * recover file mode
      
      * debug auc 0.5
      
      * adapt for nn fl-ps
      
      * adapt for nn fl-ps
      
      * add learning_rate_0 initializer op
      
      * recover ssd table
      
      * modify file mode
      
      * fl-ps: delete fake-init op
      [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (#46180) · f769f850
      JZ-LIANG committed
      * remove unneeded grad allreduce communication in sharding-dp
      
      * remove unneeded grad allreduce communication in sharding-dp
      
      * bugfix
      
      * bugfix
      
      * bugfix
      fix strategy (#46256) · b1e82031
      zhaoyingli committed
      don't wait for send op under dygraph pp (#46209) · 8ff7df8f
      Yuang Liu committed
      [PolishComments] Polish some code comments (#46032) · 56f9452c
      HongyuJia committed
      * polish code comments
      
      * polish data_device_transform.cc
  3. 19 Sep 2022 (5 commits)
  4. 17 Sep 2022 (1 commit)
  5. 16 Sep 2022 (3 commits)
  6. 15 Sep 2022 (1 commit)
      [Auto Parallel] Improve the APIs (#45776) · b042a3b1
      Yulong Ao committed
      * [Auto Parallel] Use c++ dist attr in the completion process
      
      * [Auto Parallel] Add minor changes
      
      * [Auto Parallel] Use c++ dist attr in the completion process
      
      * [Auto Parallel] Add minor changes
      
      * [Auto Parallel] Add the serialization process for dist attrs
      
      * [Auto Parallel] Remove unnecessary comments
      
      * [Auto Parallel] Fix some bugs
      
      * [Auto Parallel] Fix the code style
      
      * [Auto Parallel] Remove unnecessary impls
      
      * [Auto Parallel] Fix the importing error
      
      * [Auto Parallel] Fix the copy from bugs of op dist attr
      
      * [Auto Parallel] Replace the use of constexpr if
      
      * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
      
      * [Auto Parallel] Change API of the completion unittest
      
      * [Auto Parallel] Fix the bug when calling set_attr with an int
      
      * [Auto Parallel] Add the unittest for the serialization
      
      * [Auto Parallel] Add some unit tests
      
      * [Auto Parallel] Unify the strategy
      
      * [Auto Parallel] Improve the engine api
      
      * [Auto Parallel] Reset the changes made to the framework
      
      * [Auto Parallel] Change the engine unittest
      
      * [Auto Parallel] Update API of the completion and partitioner
      
      * [Auto Parallel] Update unit tests using engine api
      
      * update shard annotation
      
      * [Auto Parallel] Remove the modifications of other modules
      
      * [Auto Parallel] Add docs for APIs
      
      * add new strategy
      
      * [Auto Parallel] Replace the logger
      
      * [Auto Parallel] Restore the test_program.py
      
      * [Auto Parallel] Change the import rules
      
      * [Auto Parallel] Add the examples for Engine
      
      * [Auto Parallel] Do some minor changes
      
      * [Auto Parallel] Remove yaml dependency
      
      * [Auto Parallel] Fix the unittests
      
      * add valid after train
      
      * bug fix
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
  7. 14 Sep 2022 (4 commits)
  8. 13 Sep 2022 (2 commits)
  9. 09 Sep 2022 (5 commits)
  10. 08 Sep 2022 (1 commit)
  11. 07 Sep 2022 (2 commits)
  12. 06 Sep 2022 (2 commits)
  13. 05 Sep 2022 (1 commit)
  14. 02 Sep 2022 (3 commits)
  15. 01 Sep 2022 (3 commits)