- 30 Jun, 2023 1 commit
Committed by Michael Wyatt
* utilize shorter tests for MII
* use cached torch download
* rework zero++ unit tests
* formatting
---------
Co-authored-by: HeyangQin <heyangqin@microsoft.com>
- 24 Jun, 2023 1 commit
Committed by Jeff Rasley
- 12 May, 2023 1 commit
Committed by digger-yu
- 14 Apr, 2023 1 commit
Committed by Michael Wyatt
* update docs to reflect changes in deepspeed-chat training script
* add blogs to ignored changes in unit tests
- 05 Apr, 2023 1 commit
Committed by Michael Wyatt
- 22 Mar, 2023 1 commit
Committed by Logan Adams
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 18 Jan, 2023 1 commit
Committed by Michael Wyatt
- 21 Dec, 2022 1 commit
Committed by Michael Wyatt
Add reusable workflow that sets up a fresh venv for each test and prints relevant environment info
- 09 Nov, 2022 1 commit
Committed by Michael Wyatt
* remove any cupy install when setting up environments
* revert previous changes to run on cu111 runners
* fix for when no cupy is installed
* remove cupy uninstall for workflows not using latest torch version
* update to cu116 for inference tests
* fix pip uninstall line
* move python environment list to after DS install
* remove cupy uninstall
* re-add --forked
* fix how we get cupy version (should be based on nvcc version)
- 02 Nov, 2022 1 commit
Committed by Michael Wyatt
* check only major CUDA version in CI
* update expected torch latest version
* pin torch latest to 1.12 until issues with 1.13 are resolved
* wrong expected torch version
* Update nv-torch18-v100.yml
* remove forked from pytest option due to cuda re-initialization errors
* removed expected torch version from inference tests, causing errors currently
* fix various bugs that popped up
* move all tests over to cu111 runners, cu113 runners having problems
- 14 Oct, 2022 1 commit
Committed by Jeff Rasley
- 05 Oct, 2022 1 commit
Committed by Michael Wyatt
* batch of refactored tests
* more test refactoring
* fp16 test refactor
* more refactors
* added DistributedFixture class
* applied DistributedFixture to first batch of tests as a trial
* added DistributedFixture test and documentation
* last tests
* fixes for refactored tests
* remove subdirs in workflow files
* fix pytest syntax error
* fix another syntax error
* update imports
* use DistFixture with elastic checkpoint test
* missing import
* update to shared class tmpdir for elastic test
* moved test files
* avoid duplicate test file name
* last refactor and moving test files
* formatting
* fix broken import
* testing forked AMD tests
* update abstract method
* use blob storage for accelerate and transformers tests
* upgrade torch for accelerate CI
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 10 Sep, 2022 1 commit
Committed by Michael Wyatt
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
- 11 Aug, 2022 1 commit
Committed by Michael Wyatt
Refactor Distributed unit tests
- 02 Aug, 2022 1 commit
Committed by Michael Wyatt
Fix for distributed tests
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 16 Jun, 2022 1 commit
Committed by Jeff Rasley
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
- 15 Jun, 2022 1 commit
Committed by Jeff Rasley
- 07 Jun, 2022 1 commit
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 25 May, 2022 1 commit
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 19 Mar, 2022 1 commit
Committed by Michael Wyatt
* added concurrency to github actions
* fixed problem where one workflow can cancel another workflow
* added tmp file to help test concurrency policy on CI
* removed tmp file to finish testing concurrency policy
- 15 Mar, 2022 1 commit
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>