1. 01 Aug, 2022 1 commit
    • Release swap buffers for persisted params (#2089) · 2210ebe7
      Committed by Olatunji Ruwase
      * Split parameter offload from z3
      
      * Format fixes
      
      * Bug fixes
      
      * Cleanup
      
      * Remove dead code
      
      * Release swap buffers for persisted params
      
      * Format fixes
      
      * Format fixes
      
      * Pass args correctly
      
      * Use pinned memory for nvme offload
      
      * Merge with master
      
      * Fix missing import
      
      * model persistence params
      
      * Fix merge issues
      
      * Handle none device
      
      * Use log_dist
  2. 31 Jul, 2022 1 commit
  3. 30 Jul, 2022 3 commits
  4. 29 Jul, 2022 3 commits
  5. 28 Jul, 2022 5 commits
  6. 27 Jul, 2022 12 commits
  7. 26 Jul, 2022 5 commits
  8. 23 Jul, 2022 4 commits
  9. 22 Jul, 2022 3 commits
  10. 21 Jul, 2022 2 commits
    • Checkpoint reshaping (#1953) · 80d0a32f
      Committed by Olatunji Ruwase
      * unit test, remove exception, add notes
      
      * Move param_shapes to model files
      
      * Remove hard-coded constants
      
      * Conditioned to zero optimizer
      
      * Add zero checkpoint merging
      
      * Print checkpoint version
      
      * Reshape zero_* ckpt files
      
      * Merge zero* files contraction
      
      * Utils for 3D contraction reshaping
      
      * Remove bogus import
      
      * Support bf16_zero ckpts
      
      * Add param slice mappings
      
      * Load universal checkpoints
      
      * Per group mappings from Stas
      
      * Hack to load bf16 zero files
      
      * Param attributes
      
      * WIP
      
      * Fix api bug
      
      * Update lp with local/remote hp
      
      * Disable vocab padding handling
      
      * Update z2 checkpoint
      
      * Remove debug prints
      
      * Remove debug prints; Rebase unit test
      
      * Add reshape assert
      
      * Padding
      
      * Typo
      
      * Catch nonexistent checkpoint path
      
      * Cleanup
      
      * Restore checkpoint state comparisons
      
      * Add torch version guards
      
      * More precise avoidance of false positives.
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  11. 20 Jul, 2022 1 commit