1. 02 Sep, 2020 1 commit
  2. 01 Sep, 2020 1 commit
    • Samyamr/grad acc stage2 (#338) · 7240abf3
      Samyam Rajbhandari committed
      * Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
      
      * Gradient Accumulation support for Stage 2. Model tests added to test the feature
      
      * formatting
      
      * Update deepspeed_light.py
      
      removing comment
      
      * Update ds_config_func_bs8_zero1.json
      
      reverting this file back. It's not needed for this PR
      
      * defining baseline prefix
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  3. 14 Jul, 2020 1 commit
  4. 07 Jul, 2020 1 commit
  5. 24 Jun, 2020 1 commit
  6. 20 Jun, 2020 1 commit
    • Update deepspeed_utils.py (#270) · 224494bd
      Samyam Rajbhandari committed
      * Removing handle_overflow debugging code in deepspeed_utils.py
      
      * Removing handle_overflow debugging code in deepspeed_zero_optimizer.py
      
      Removing unnecessary overflow handle code. Not sure why it was there in the first place.
  7. 05 Jun, 2020 1 commit
    • Add log util (#230) · e1ad8803
      Chunyang Wen committed
      * Add log util
      
      * replace all occurrences of print and logging
      
      * address format
      
      * disable propagate to avoid duplicate log
  8. 04 Jun, 2020 1 commit
  9. 28 May, 2020 2 commits
    • add support for predivide as a config option (#235) · bc36b91d
      Jeff Rasley committed
      * add support for predivide as a flag
      * add predivide json config, remove allgather_disable (as it's not currently used anymore)
    • Samyamr/cpu memory bloat fix zero (#233) · d24d3de9
      Samyam Rajbhandari committed
      * Fix for the CPU memory bloat issue caused by PyTorch backward graph creation in allgather. Fixed by calling detach on tensors before calling all_gather
  10. 19 May, 2020 1 commit
  11. 25 Apr, 2020 1 commit
  12. 21 Apr, 2020 1 commit
  13. 03 Apr, 2020 1 commit
  14. 26 Mar, 2020 1 commit
  15. 11 Mar, 2020 1 commit
  16. 04 Feb, 2020 1 commit