1. 02 Feb, 2023 1 commit
  2. 01 Feb, 2023 4 commits
  3. 31 Jan, 2023 2 commits
  4. 29 Jan, 2023 1 commit
  5. 27 Jan, 2023 5 commits
  6. 26 Jan, 2023 2 commits
    • Abstract accelerator (step 3) (#2677) · 98cc35b6
      Ma, Guokai committed
      * Integrate accelerator abstraction interface into deepspeed/
      
      * Fix error message in fp16/fused_optimizer
      
      * fix error message in fp16/unfused_optimizer.py
      
      * assign get_accelerator().pin_memory() result to input Tensor name
      
      * no need to check cuda and whether nvtx supported
      
      * move try-except into inner most block
      
      * call Event() and Stream() in get_accelerator() for data type
      
      * Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed
      
      * Apply op_builder backend api change from #2705 from @jeffra
      
      * fix tests where Builder NAME is used
      
      * keep original ...Builder.NAME interface instead of ...Builder().NAME interface
      
      * fix builder closure for installation
      
      * fix randomltd builder
      
      * add comments to clarify create_op_builder and get_op_builder
      
      * fix compatibility with pip install -e
      Co-authored-by: Cheng Li <pistasable@gmail.com>
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
    • [GatheredParameters] fix memory leak (#2665) · ddd48b36
      Stas Bekman committed
      * [GatheredParameters] fix memory leak
      
      * simplify
      
      * cleanup and move
      
      * style
      
      * Formatting
      
      * fix test
      
      * fix test
      
      * fix test take 2
      
      * Trigger CI
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
      Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
  7. 25 Jan, 2023 3 commits
  8. 20 Jan, 2023 1 commit
  9. 19 Jan, 2023 4 commits
  10. 18 Jan, 2023 6 commits
  11. 14 Jan, 2023 4 commits
  12. 13 Jan, 2023 1 commit
  13. 12 Jan, 2023 2 commits
  14. 11 Jan, 2023 2 commits
    • Add mlflow logging for aml (#2495) · a3d7f106
      cassieesvelt committed
      * add logging changes
      
      * try w/out abspath
      
      * undo last change
      
      * start mlflow debug
      
      * remove mlflow from export_envs
      
      * add mlflow logging for reversed
      
      * remove mlflow.start_run
      
      * add back start run
      
      * don't clean cmd
      
      * print os environment variables
      
      * remove first start run
      
      * add run_id to mlflow star
      
      * remove context managers
      
      * move last end run
      
      * add extra parent start_runs
      
      * add run id logging
      
      * add logging to run_ds_config
      
      * change run_id to run_name
      
      * add back context managers and run_id logs
      
      * remove context mng
      
      * debug environment variable
      
      * reset environment variables
      
      * add env variable deletion
      
      * clean up
      
      * remove unused import
      
      * fix yapf/whitespace errors
      Co-authored-by: Cheng Li <pistasable@gmail.com>
    • remove duplicated code in ZeRO (#2655) · 89da037e
      JackieWu committed
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  15. 10 Jan, 2023 2 commits