1. 26 Aug 2023 (1 commit)
  2. 25 Aug 2023 (1 commit)
  3. 24 Aug 2023 (1 commit)
  4. 27 Jul 2023 (1 commit)
  5. 26 Jul 2023 (1 commit)
  6. 20 Jul 2023 (1 commit)
  7. 15 Jul 2023 (1 commit)
  8. 09 May 2023 (1 commit)
  9. 03 May 2023 (1 commit)
  10. 02 May 2023 (1 commit)
  11. 30 Apr 2023 (1 commit)
  12. 21 Apr 2023 (1 commit)
  13. 31 Mar 2023 (1 commit)
  14. 27 Mar 2023 (1 commit)
  15. 24 Mar 2023 (1 commit)
  16. 26 Jan 2023 (1 commit)
    • Abstract accelerator (step 3) (#2677) · 98cc35b6
      Committed by Ma, Guokai
      * Integrate accelerator abstraction interface into deepspeed/
      
      * Fix error message in fp16/fused_optimizer
      
      * fix error message in fp16/unfused_optimizer.py
      
      * assign get_accelerator().pin_memory() result to input Tensor name
      
      * no need to check cuda or whether nvtx is supported
      
      * move try-except into inner most block
      
      * call Event() and Stream() in get_accelerator() for data type
      
      * Make Stream and Event properties of the abstract interface so they can be used as data types in deepspeed
      
      * Apply op_builder backend api change from #2705 from @jeffra
      
      * fix tests where Builder NAME is used
      
      * keep original ...Builder.NAME interface instead of ...Builder().NAME interface
      
      * fix builder closure for installation
      
      * fix randomltd builder
      
      * add comments to clarify create_op_builder and get_op_builder
      
      * fix compatibility with pip install -e
      Co-authored-by: Cheng Li <pistasable@gmail.com>
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
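      The pattern this commit describes (a single `get_accelerator()` entry point whose `Stream` and `Event` are properties, so call sites can both instantiate them and use them as data types) can be sketched roughly like this. This is a pure-Python stand-in: everything except the `get_accelerator()` and `pin_memory()` names is illustrative, not DeepSpeed's actual API.

      ```python
      from abc import ABC, abstractmethod

      class Accelerator(ABC):
          """Device-neutral interface; concrete backends (CUDA, CPU, ...) implement it."""

          @property
          @abstractmethod
          def Stream(self):
              """Backend stream class, exposed as a property so it works as a data type."""

          @property
          @abstractmethod
          def Event(self):
              """Backend event class, likewise usable as a data type."""

          @abstractmethod
          def pin_memory(self, tensor):
              """Return a pinned copy; callers rebind the result to the input name."""

      class _CpuStream:
          def synchronize(self):
              pass

      class _CpuEvent:
          def record(self):
              pass

      class CpuAccelerator(Accelerator):
          @property
          def Stream(self):
              return _CpuStream

          @property
          def Event(self):
              return _CpuEvent

          def pin_memory(self, tensor):
              return tensor  # no-op on a CPU backend

      _accelerator = CpuAccelerator()

      def get_accelerator():
          return _accelerator

      # Usage mirroring the commit: the backend, not the call site, decides the type,
      # and the pin_memory() result is assigned back to the input tensor's name.
      stream = get_accelerator().Stream()
      event = get_accelerator().Event()
      buf = [0.0] * 4
      buf = get_accelerator().pin_memory(buf)
      ```

      Exposing `Stream`/`Event` as properties rather than free functions means call sites never import a backend-specific module, which is what lets the same engine code run on CUDA and non-CUDA backends.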
  17. 17 Dec 2022 (1 commit)
  18. 13 Dec 2022 (1 commit)
  19. 22 Oct 2022 (1 commit)
  20. 30 Jul 2022 (1 commit)
  21. 28 Jul 2022 (1 commit)
    • Trajepl/nebula ckpt engine (#2085) · e669aaf5
      Committed by trajep
      * enable checkpoint engine
      
      * separated nebula config
      
      * add __init__.py for nebula importing
      
      * linter fix
      
      * fix: ds_config is None
      
      * fix: ds config
      
      * fix: get sd loader fix
      
      * align the API with torch raw code
      
      * linter fix
      
      * remove duplicate tag params
      
      * make checkpoint_engine a required arg
      
      * fix args
      
      * extract parameters out to config
      
      * fix: load state dict
      
      * separate load engine
      
      * linter fix
      
      * extract checkpoint engine into an abstract class
      
      * linter fix
      
      * construct function args fix
      
      * add docs for dev/customers
      
      * linter fix
      
      * remove load engine
      
      * print->log_dist
      
      * linter fix
      
      * add tag flag to distinguish the loading order
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
      Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
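      The abstract checkpoint engine this commit extracts can be sketched roughly as follows. This is an in-memory toy under assumptions: the `save`/`load`/`commit` method names and the tag-based commit step are an illustration of the pattern, not necessarily DeepSpeed's exact interface.

      ```python
      from abc import ABC, abstractmethod

      class CheckpointEngine(ABC):
          """Abstract checkpoint engine; concrete engines (torch-native, Nebula, ...) subclass it."""

          @abstractmethod
          def save(self, state_dict, path):
              """Persist one shard's state dict under `path`."""

          @abstractmethod
          def load(self, path, map_location=None):
              """Return the state dict previously saved under `path`."""

          @abstractmethod
          def commit(self, tag):
              """Mark every file saved under `tag` as one complete, loadable checkpoint."""

      class InMemoryEngine(CheckpointEngine):
          """Toy engine that keeps checkpoints in a dict instead of on disk."""

          def __init__(self):
              self._store = {}
              self.committed = []

          def save(self, state_dict, path):
              self._store[path] = dict(state_dict)

          def load(self, path, map_location=None):
              return self._store[path]

          def commit(self, tag):
              self.committed.append(tag)
              return True

      # Usage: the engine is a required constructor arg for the save/load code,
      # and the commit tag distinguishes complete checkpoints at load time.
      engine = InMemoryEngine()
      engine.save({"step": 10}, "ckpt/mp_rank_00.pt")
      engine.commit("global_step10")
      ```

      Keeping the engine abstract is what lets an asynchronous backend such as Nebula be swapped in without touching the engine's save/load call sites.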
  22. 26 Jul 2022 (2 commits)
  23. 21 Jun 2022 (1 commit)
  24. 16 Jun 2022 (1 commit)
  25. 11 Jun 2022 (1 commit)
  26. 12 May 2022 (1 commit)
  27. 10 May 2022 (1 commit)
  28. 04 May 2022 (1 commit)
  29. 27 Apr 2022 (1 commit)
  30. 20 Apr 2022 (1 commit)
    • bf16+pipeline parallelism (#1801) · 56c52238
      Committed by Olatunji Ruwase
      * bf16 updates
      
      * Got bf16 working
      
      * fp32 reduction; flattened tensors
      
      * bf16+zero_stage_1 first cut
      
      * finish zero_stage 1 sharding
      
      * Matching fp16 with debugging codes
      
      * Matching loss with fp16
      
      * Fix gradient clipping
      
      * bf16 gradient clipping fix
      bf16 checkpoint save/load
      
      * Unscale grad norm
      
      * Fix grad norm scaling
      
      * Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa
      
      * Fix clip_grad key error
      
      * Reduce tied weight gradients
      
      * Fix grad norm for moe
      
      * Reduce specified gradients
      
      * Use O(n) instead of O(n^2)
      
      * Remove optimizer restriction for bf16
      
      * Link bf16 & fp32 params
      
      * Clip gradients of last stage tied weights
      
      * Simplify tied weights reduction logic
      
      * Also clip all tp rank parameters
      
      * lp to hp mapping
      
      * Link lp/hp/optim state; Refresh links after checkpoint load
      
      * Remove debug print
      
      * Remove debug print
      
      * Simplify zero_grad logic
      
      * fp32 accessors
      
      * Fix update bug
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
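      The master-weight idea behind the "Link bf16 & fp32 params" and "lp to hp mapping" bullets can be sketched in plain Python. The rounding helper below is a crude stand-in for bf16 precision loss, and all names (`to_bf16`, `Bf16Optimizer`, `lp`, `hp`) are illustrative, not DeepSpeed's.

      ```python
      def to_bf16(x):
          # Crude stand-in for bf16 rounding: keep only 3 significant digits.
          return float(f"{x:.3g}")

      class Bf16Optimizer:
          """Keep a full-precision 'hp' master copy linked to each low-precision 'lp' param."""

          def __init__(self, lp_params, lr=1e-4):
              self.lp = lp_params
              # lp -> hp mapping: one full-precision master value per low-precision param.
              self.hp = [float(p) for p in lp_params]
              self.lr = lr

          def step(self, grads):
              for i, g in enumerate(grads):
                  self.hp[i] -= self.lr * g         # accumulate the update in full precision
                  self.lp[i] = to_bf16(self.hp[i])  # write a rounded copy back to the model

      # Without a master copy, a tiny update vanishes entirely in the rounding:
      assert to_bf16(1.0 - 1e-4) == 1.0

      # With one, updates accumulate in hp and eventually show up in lp:
      params = [1.0]
      opt = Bf16Optimizer(params)
      for _ in range(100):
          opt.step([1.0])
      ```

      This is why the commit links each bf16 parameter to an fp32 partner and refreshes those links after a checkpoint load: the optimizer state lives on the hp side, and the lp side is just a rounded view of it.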
  31. 11 Feb 2022 (1 commit)
  32. 23 Jan 2022 (1 commit)
  33. 22 Oct 2021 (2 commits)
  34. 10 Oct 2021 (1 commit)
  35. 09 Oct 2021 (1 commit)
  36. 08 Oct 2021 (1 commit)
  37. 02 Oct 2021 (1 commit)
  38. 30 Sep 2021 (1 commit)