- 19 Jul, 2023 4 commits
-
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
- 18 Jul, 2023 10 commits
-
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Signed-off-by: Javier Salmeron Garcia <jsalmeron@vmware.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Nripesh Niketan
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Logan Adams
-
- 15 Jul, 2023 5 commits
-
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by mzl
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by hipudding
Creating a tensor with device=<integer> always chooses CUDA as its device in the current PyTorch version (2.1); other devices should use device={device}:{index}. Change get_accelerator().current_device() to get_accelerator().current_device_name() to support other devices. Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
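The device-string format mentioned in this commit message can be illustrated with a minimal sketch. The helper `device_name` below is hypothetical and is not part of DeepSpeed or PyTorch; it only shows the explicit `{device}:{index}` form that the commit says should be passed instead of a bare integer. In DeepSpeed itself, per the message above, the string would come from `get_accelerator().current_device_name()`.

```python
def device_name(device_type: str, index: int) -> str:
    """Hypothetical helper: build an explicit '{device}:{index}' string.

    A bare integer index carries no device type, which is why the commit
    message recommends passing the full string instead.
    """
    return f"{device_type}:{index}"


# "npu" is only an example of a non-CUDA device type (an assumption here).
explicit = device_name("npu", 0)
print(explicit)
```

The resulting string is what would be given as the `device=` argument when creating a tensor, so the device type is stated explicitly rather than inferred.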
-
- 14 Jul, 2023 11 commits
-
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Conglong Li
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by Logan Adams
-
Committed by KaiChen1008
Update Nvidia docker version to fix "ERROR: failed to solve: nvidia/cuda:11.7.0-devel-ubuntu18.04: docker.io/nvidia/cuda:11.7.0-devel-ubuntu18.04: not found" (#3930). Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by Logan Adams
-
Committed by Logan Adams
-
- 13 Jul, 2023 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Eugene Cheah
Since the deepspeed 1 checkpoint structure is identical to deepspeed 2 (AFAIK), we should just change the version check and add support accordingly.
-
- 12 Jul, 2023 4 commits
-
-
Committed by Michael Wyatt
* add coverage report
* define env vars in shared action
* reduce time for longest running tests
* fix broken shared action
* reduce test time
* reducing Pipeline test times
* further reducing test times
* rework Z3 test
* testing new mp.pool and persistent dist envs
* fix import
* reuse distributed environment for tests with lots of param combos
* fix for dist teardown
* fix pickling issue with pool cache
* actually fix pickling problem
* avoid running pool cache stuff on non-distributed tests
* fix issues with nested mp.pool
* fix for nested pools in Pipeline Engine
* re-add params
* update workflows with pytest opts
* implement feedback
* resolve race condition with port selection
* Update tests/unit/common.py
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Heyang Qin
-
Committed by hipudding
The object returned by deepspeed.zero.Init() is not callable, so it can't be used as a decorator. Delete this code comment to avoid misunderstanding. Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
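The distinction this commit message draws can be sketched in plain Python. The class `InitLike` below is a hypothetical stand-in, not DeepSpeed's implementation: it assumes only what the message states, namely an object that works as a context manager but defines no `__call__`. Applying such an instance with `@` fails because Python calls the decorator expression with the function as its argument.

```python
class InitLike:
    """Hypothetical stand-in for a context-manager-only object (no __call__)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


def build_model():
    return "model"


# Works: the instance is used as a context manager.
with InitLike():
    model = build_model()

# Fails: decorating calls the instance, and it has no __call__ method.
try:
    @InitLike()
    def build_model_decorated():
        return "model"
    decorator_error = None
except TypeError as err:
    decorator_error = err

print(type(decorator_error).__name__)
```

The `with` form is the usage that remains valid for an object of this shape, which is consistent with removing the comment that suggested decorator usage.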
-
Committed by digger yu
-
- 11 Jul, 2023 1 commit
-
-
Committed by digger yu
-
- 10 Jul, 2023 1 commit
-
-
Committed by Jeff Rasley
* fix memory leak with z3
* __del__ is not good for zero-inference
-
- 08 Jul, 2023 2 commits
-
-
Committed by Jinzhen Lin
-
Committed by YiSheng5
* print_rank_0 is not defined in this python script
* Enable log_summary for mics feature based on Zero_Stage 3
-