1. 30 Jun, 2023 1 commit
  2. 15 Jun, 2023 1 commit
  3. 03 Jun, 2023 1 commit
  4. 12 May, 2023 1 commit
  5. 03 May, 2023 1 commit
  6. 26 Apr, 2023 2 commits
  7. 31 Mar, 2023 1 commit
  8. 30 Mar, 2023 1 commit
  9. 29 Mar, 2023 1 commit
  10. 27 Mar, 2023 1 commit
  11. 15 Mar, 2023 1 commit
  12. 07 Mar, 2023 1 commit
  13. 28 Feb, 2023 1 commit
  14. 24 Feb, 2023 1 commit
  15. 26 Jan, 2023 1 commit
    • Abstract accelerator (step 3) (#2677) · 98cc35b6
      Ma, Guokai authored
      * Integrate accelerator abstraction interface into deepspeed/
      
      * Fix error message in fp16/fused_optimizer
      
      * fix error message in fp16/unfused_optimizer.py
      
      * assign get_accelerator().pin_memory() result to input Tensor name
      
      * no need to check cuda and whether nvtx supported
      
      * move try-except into inner most block
      
      * call Event() and Stream() in get_accelerator() for data type
      
      * Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed
      
      * Apply op_builder backend api change from #2705 from @jeffra
      
      * fix tests where Builder NAME is used
      
      * keep original ...Builder.NAME interface instead of ...Builder().NAME interface
      
      * fix builder closure for installation
      
      * fix randomltd builder
      
      * add comments to clarify create_op_builder and get_op_builder
      
      * fix compatibility with pip install -e
      Co-authored-by: Cheng Li <pistasable@gmail.com>
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  16. 25 Jan, 2023 1 commit
  17. 14 Jan, 2023 1 commit
  18. 11 Jan, 2023 1 commit
  19. 09 Jan, 2023 2 commits
  20. 17 Dec, 2022 1 commit
  21. 10 Nov, 2022 1 commit
  22. 18 Oct, 2022 1 commit
  23. 04 Aug, 2022 1 commit
  24. 27 Jul, 2022 1 commit
    • Refactor ZeRO configs to use Pydantic (#2004) · 59975896
      Michael Wyatt authored
      * first pass at pydanticifying Zero Configs
      
      * added pydantic to reqs
      
      * fixed bug with deprecated values not being type-checked
      
      * fixing zero config bugs from unit tests
      
      * fixed access of Config values
      
      * removing zero constants
      
      * formatting/fix broken import
      
      * fixed bad merge
      
      * fixed issue with missing aliased field
      
      * fix for failing tests
      
      * fix how deprecated fields are processed
      
      * only process dep params when they are set
      
      * fix mistyped field name
      
      * fixes, docs, removed more constants
      
      * fix merge
      
      * more fixes after merge w master
      
      * added unit tests
      
      * formatting
      
      * added fix for transformers unit tests
      
      * separated offload config from zero config
      
      * fixed bad import
      
      * formatting and flake fixes
      
      * implement suggestion from review
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  25. 21 Jul, 2022 1 commit
    • Checkpoint reshaping (#1953) · 80d0a32f
      Olatunji Ruwase authored
      * unit test, remove exception, add notes
      
      * Move param_shapes to model files
      
      * Remove hard-coded constants
      
      * Conditioned to zero optimizer
      
      * Add zero checkpoint merging
      
      * Print checkpoint version
      
      * Reshape zero_* ckpt files
      
      * Merge zero* files contraction
      
      * Utils for 3D contraction reshaping
      
      * Remove bogus import
      
      * Support bf16_zero ckpts
      
      * Add param slice mappings
      
      * Load universal checkpoints
      
      * Per group mappings from Stas
      
      * Hack to load bf16 zero files
      
      * Param attributes
      
      * WIP
      
      * Fix api bug
      
      * Update lp with local/remote hp
      
      * Disable vocab padding handling
      
      * Update z2 checkpoint
      
      * Remove debug prints
      
      * Remove debug prints; Rebase unit test
      
      * Add reshape assert
      
      * Padding
      
      * Typo
      
      * Catch nonexistent checkpoint path
      
      * Cleanup
      
      * Restore checkpoint state comparisons
      
      * Add torch version guards
      
      * More precise avoidance of false positives.
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  26. 14 Jul, 2022 1 commit
  27. 08 Jul, 2022 1 commit
  28. 07 Jul, 2022 1 commit
  29. 28 Jun, 2022 1 commit
  30. 21 Jun, 2022 1 commit
  31. 14 Jun, 2022 1 commit
  32. 11 Jun, 2022 1 commit
  33. 16 May, 2022 1 commit
  34. 12 May, 2022 1 commit
  35. 20 Apr, 2022 1 commit
    • bf16+pipeline parallelism (#1801) · 56c52238
      Olatunji Ruwase authored
      * bf16 updates
      
      * Got bf16 working
      
      * fp32 reduction; flattened tensors
      
      * bf16+zero_stage_1 first cut
      
      * finish zero_stage 1 sharding
      
      * Matching fp16 with debugging codes
      
      * Matching loss with fp16
      
      * Fix gradient clipping
      
      * bf16 gradient clipping fix
      bf16 checkpoint save/load
      
      * Unscale grad norm
      
      * Fix grad norm scaling
      
      * Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa
      
      * Fix clip_grad key error
      
      * Reduce tied weight gradients
      
      * Fix grad norm for moe
      
      * Reduce specified gradients
      
      * Use O(n) instead of O(n^2)
      
      * Remove optimizer restriction for bf16
      
      * Link bf16 & fp32 params
      
      * Clip gradients of last stage tied weights
      
      * Simplify tied weights reduction logic
      
      * Also clip all tp rank parameters
      
      * lp to hp mapping
      
      * Link lp/hp/optim state; Refresh links after checkpoint load
      
      * Remove debug print
      
      * Remove debug print
      
      * Simplify zero_grad logic
      
      * fp32 accessors
      
      * Fix update bug
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  36. 23 Mar, 2022 1 commit
  37. 18 Mar, 2022 1 commit
  38. 17 Mar, 2022 1 commit