1. 14 Jul, 2023 1 commit
    • [AutoTuner] Distribute best cfg (#54834) · 7f6d222f
      Committed by caozhou
      * distribute best cfg
      
      * adapt to multi args transmission
      
      * update metric extracting
      
      * fix bugs of prune and reading log
      
      * fix time default value
      
      * remove time record
      
      * adjust the order of searching dim
      
      * fix prune bugs
      
      * fix adding cfg bug
      
      * fix multi nodes bug
      
      * reset status
      
      * remove alarm and set logdir
      
      * deepcopy ctx
      
      * change alarm
      
      * fix restart bug
      
      * add exit
      
      * best no need alarm
      
      * add warmup time
  2. 30 Jun, 2023 1 commit
  3. 25 Jun, 2023 1 commit
  4. 20 Jun, 2023 1 commit
    • [AutoTuner] Add compare and record (#54668) · 6fe7b5e2
      Committed by Azure
      * add auto tuner
      
      * compare and record module
      
      * revert launch main
      
      * add prune rule
      
      * add unit test
      
      * add auto tuner
      
      * revert launch main
      
      * add prune rule
      
      * modify unit test script
      
      * fix bug for dump nodes; fix bug for checking log file
      
      * fix bug
      
      ---------
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
  5. 19 Jun, 2023 1 commit
  6. 14 Jun, 2023 1 commit
  7. 12 Jun, 2023 1 commit
  8. 08 Jun, 2023 1 commit
  9. 07 Jun, 2023 1 commit
  10. 11 May, 2023 1 commit
  11. 10 May, 2023 1 commit
  12. 24 Apr, 2023 1 commit
  13. 23 Apr, 2023 1 commit
  14. 13 Apr, 2023 1 commit
  15. 06 Apr, 2023 1 commit
    • rem is_compiled_with_npu (#52385) · 7976e2a3
      Committed by Kim Yann
      * rem is_compiled_with_npu
      
      * rem npu related code
      
      * make lint happy
      
      * rem test
      
      * remove some tests
      
      * Update grad_scaler.py
      
      * fix an error
  16. 03 Apr, 2023 1 commit
  17. 31 Mar, 2023 1 commit
  18. 30 Mar, 2023 1 commit
  19. 25 Mar, 2023 1 commit
  20. 23 Mar, 2023 1 commit
  21. 20 Mar, 2023 1 commit
  22. 13 Dec, 2022 1 commit
  23. 08 Dec, 2022 1 commit
    • Clean fluid APIs in distributed and fleet files (#48851) · 911d6bb1
      Committed by Ghost Screaming
      * Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
      is wrong.
      
      * Remove climits.
      
      * Clean fluid API in paddle/distributed and paddle/fleetx folders.
      This includes the following files:
      python/paddle/distributed/__init__.py
      python/paddle/distributed/collective.py
      python/paddle/distributed/fleet/utils/fs.py
      python/paddle/distributed/fleet/utils/hybrid_parallel_inference.py
      python/paddle/distributed/fleet/utils/hybrid_parallel_util.py
      python/paddle/distributed/fleet/utils/internal_storage.py
      python/paddle/distributed/launch/context/device.py
      python/paddle/distributed/parallel.py
      python/paddle/distributed/parallel_with_gloo.py
      python/paddle/distributed/spawn.py
      python/paddle/framework/__init__.py
      Note that 'paddle.fluid.dygraph.parallel.ParallelEnv'
      and 'fluid.framework.core' remain unchanged in those files.
      ParallelEnv is used by paddle.fluid.dygraph.parallel.DataParallel.
      However, the APIs in paddle.fluid.dygraph.parallel can't be
      migrated to paddle.distributed, because there are cyclic import
      dependencies in modules like paddle.static and paddle.tensor.
      'fluid.framework.core' will be changed to import framework.core
      after fluid.core is migrated.
      
      * Change TODO authors.
  24. 29 Nov, 2022 1 commit
  25. 14 Nov, 2022 1 commit
  26. 09 Nov, 2022 1 commit
  27. 08 Nov, 2022 1 commit
  28. 03 Nov, 2022 1 commit
  29. 01 Nov, 2022 1 commit
    • [CodeStyle][py2] remove `six` package (part2) (#47334) · 3592ba8c
      Committed by Nyakku Shigure
      * [CodeStyle][py2] remove `six` package (part2)
      
      * six.ensure_str
      
      * remove unused `import six`
      
      * remove six from BUILTIN_LIKELY_MODULES
      
      * remove six in example code
      
      * remove some decode
      
      * try to fix example code
      
      * fix MockEtcdClient get/get_prefix returns data type
      
      * fix MockEtcdClient get_prefix returns data
      
      * fix MockEtcdClient get returns data
      
      * remove `six` in pypi and conda requirements
      
      * fix MockEtcdClient add_watch_callback/add_watch_prefix_callback returns data type
      
      * refine MockEtcdClient
  30. 23 Oct, 2022 1 commit
  31. 19 Oct, 2022 1 commit
  32. 13 Oct, 2022 1 commit
    • [WIP] PaddlePaddle distributed reinforcement learning feature development (#45998) · f0afcabc
      Committed by Xinger
      * add rpc module in cpp side
      
      * add rpc module in python side
      
      * support win32 and mac for rpc
      
      * code optimization
      
      * optimize code
      
      * update rpc
      
      * update rpc launch
      
      * rpc remove rank and world_size api
      
      * fix logger import bug
      
      * remove support for win and mac
      
      * remove support for xpu, npu, cinn and rocm
      
      * remove support for xpu, npu, cinn and rocm
      
      * fix shutdown barrier timeout bug
      
      * update:python_rpc_handler to shared ptr
      
      * fix master shutdown first bug
      
      * tests support for cpu
      
      * update log to vlog
      
      * update get service info api
      
      * add single process test case
      
      * remove process group
      
      * remove some useless dependencies
      
      * update rpc api comments
      
      * update rpc comments: Example to Examples
      
      * update rpc api comments
      
      * update rpc api comments
      
      * update launch api comments
      
      * update init_rpc comments
      
      * update rpc sync and async comments
      
      * fix bug: init_rpc can't be called repeatedly in a process
      
      * update rpc api comment: make master endpoint unique
      
      * update rpc api:service to worker, timeout_ms to timeout
      
      * rename ServiceInfo to WorkerInfo
      
      * refactor: rename server to worker, log to vlog
      
      * add launch test
      
      * remove unused codes
      
      * refine
  33. 12 Oct, 2022 1 commit
    • [CodeStyle][F401] remove unused imports in python/paddle/distributed (#46758) · fe716a0b
      Committed by Nyakku Shigure
      * [CodeStyle][F401] remove unused import in python/paddle/distributed
      
      * remove pass
      
      * empty commit
      
      * Fix ValueError: list.remove(x): x not in list for meta_optimizer_names.
      
      Fix ValueError: list.remove(x): x not in list for meta_optimizer_names.
      
      * Fix split import.
      
      Fix split import.
      
      * add noqa after meta_optimizers in factory
      
      * restore collective ops
      
      * expand `import *`
      
      * add noqa after required imports
      
      * try to fix APIs without core.ops
      
      * Revert "try to fix APIs without core.ops"
      
      This reverts commit 6172beaf601e84bf61f2490c12c4739f0edaa5eb.
      
      * fix an increment
      
      * empty commit
      
      * add noqa after required imports
      
      * expand `import *`, fix ci error
      Co-authored-by: Shuangchi He <34329208+Yulv-git@users.noreply.github.com>
  34. 14 Sep, 2022 1 commit
  35. 22 Aug, 2022 1 commit
  36. 19 Aug, 2022 1 commit
  37. 18 Aug, 2022 1 commit
  38. 17 Aug, 2022 1 commit
  39. 11 Aug, 2022 1 commit
  40. 08 Aug, 2022 1 commit