- 06 Apr, 2023 1 commit
  - Committed by Stas Bekman
    Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 05 Apr, 2023 1 commit
  - Committed by Michael Wyatt
- 24 Mar, 2023 1 commit
  - Committed by Michael Wyatt
    * bump torch18 -> torch19
    * fix gptj
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 22 Mar, 2023 2 commits
  - Committed by Molly Smith
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  - Committed by Logan Adams
    Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 18 Feb, 2023 3 commits
  - Committed by Michael Wyatt
  - Committed by Michael Wyatt
    * add auto-generated PR for private repo
    * change variable names
  - Committed by Michael Wyatt
- 15 Feb, 2023 2 commits
  - Committed by Michael Wyatt
    * fix overlapping checkpoint names in unit tests
    * remove running cpu-only on master merge
  - Committed by Michael Wyatt
    * fix permissions issue with pip upgrade
    * install to .local instead of use sudo
    * upgrade pip in venv
    * Update action.yml
    * fix typos
- 11 Feb, 2023 1 commit
  - Committed by Michael Wyatt
- 05 Feb, 2023 1 commit
  - Committed by Olatunji Ruwase
    * Common location to install libaio-dev
    * Update .github/workflows/setup-venv/action.yml
      Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
    Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
- 19 Jan, 2023 1 commit
  - Committed by Michael Wyatt
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 18 Jan, 2023 2 commits
  - Committed by Michael Wyatt
  - Committed by Olatunji Ruwase
    * CPU-Adam: add compile-flag to enable param-copy from CPU to GPU
    * guard the CUDA-related include files and variables
    * remove CUDA dependency from op_builder when building against CPU
    * fixing the builder issues
    * fix formatting
    * return true when there is no mismatch on the cuda version
    * guard for when cuda is not available & test with cpu-only environment
    * Update cpu_adam and cpu_adagrad
    * Format fixes
    * Add configurable half precision type; Build/run in CUDA environment
    * Run cpu_adam and cpu_adagrad in cpu only environment
    * Mark CUDA only unit tests
    * CPU environment CI
    * Format fixes
    * Remove --forked
    * Add --forked
    * CPU only CI should pass
    * Format fixes
    * Format fixes
    * Remove scattered pytest.skip
    * Fix cpu_adam unit test
    * Update .github/workflows/nv-torch-latest-cpu.yml
      Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
    * Update .github/workflows/nv-torch-latest-cpu.yml
      Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
    * Address PR feedback
    * OpenMP linking
    * Fix unit tests
    Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
    Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
    Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
- 14 Jan, 2023 1 commit
  - Committed by Jeff Rasley
- 10 Jan, 2023 1 commit
  - Committed by Jeff Rasley
- 21 Dec, 2022 1 commit
  - Committed by Michael Wyatt
    Add reusable workflow that sets up a fresh venv for each test and prints relevant environment info
- 18 Dec, 2022 1 commit
  - Committed by Michael Wyatt
    * added megatron unit test
    * Update nv-megatron.yml
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 14 Dec, 2022 1 commit
  - Committed by Connor Holmes
    * Migrate ops tests to new inference_ops marker
    * Disable by default
    * Add missing test cases
    * Reorder such that inference_ops will run[fail] first
- 10 Dec, 2022 1 commit
  - Committed by Jeff Rasley
- 23 Nov, 2022 1 commit
  - Committed by Michael Wyatt
    Adding MII tests to ensure changes to DS-Inference do not break MII
- 18 Nov, 2022 1 commit
  - Committed by Michael Wyatt
    * Make new InferenceConfig backwards compatible with previous init_inference API
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 11 Nov, 2022 1 commit
  - Committed by Michael Wyatt
    * fix for lm-eval nightly tests and add gpt-j to MPtest because of OOM on single GPU
    * add nv-nightly badge
- 09 Nov, 2022 1 commit
  - Committed by Michael Wyatt
    * remove any cupy install when setting up environments
    * revert previous changes to run on cu111 runners
    * fix for when no cupy is installed
    * remove cupy uninstall for workflows not using latest torch version
    * update to cu116 for inference tests
    * fix pip uninstall line
    * move python environment list to after DS install
    * remove cupy uninstall
    * re-add --forked
    * fix how we get cupy version (should be based on nvcc version)
- 02 Nov, 2022 1 commit
  - Committed by Michael Wyatt
    * check only major CUDA version in CI
    * update expected torch latest version
    * pin torch latest to 1.12 until issues with 1.13 are resolved
    * wrong expected torch version
    * Update nv-torch18-v100.yml
    * remove forked from pytest option due to cuda re-initialization errors
    * removed expected torch version from inference tests, causing errors currently
    * fix various bugs that popped up
    * move all tests over to cu111 runners, cu113 runners having problems
- 14 Oct, 2022 1 commit
  - Committed by Jeff Rasley
- 08 Oct, 2022 1 commit
  - Committed by Michael Wyatt
- 05 Oct, 2022 1 commit
  - Committed by Michael Wyatt
    * batch of refactored tests
    * more test refactoring
    * fp16 test refactor
    * more refactors
    * added DistributedFixture class
    * applied DistributedFixture to first batch of tests as a trial
    * added DistributedFixture test and documentation
    * last tests
    * fixes for refactored tests
    * remove subdirs in workflow files
    * fix pytest syntax error
    * fix another syntax error
    * update imports
    * use DistFixture with elastic checkpoint test
    * missing import
    * update to shared class tmpdir for elastic test
    * moved test files
    * avoid duplicate test file name
    * last refactor and moving test files
    * formatting
    * fix broken import
    * testing forked AMD tests
    * update abstract method
    * use blob storage for accelerate and transformers tests
    * upgrade torch for accelerate CI
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 15 Sep, 2022 1 commit
  - Committed by Jeff Rasley
- 13 Sep, 2022 1 commit
  - Committed by Michael Wyatt
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 10 Sep, 2022 1 commit
  - Committed by Michael Wyatt
    Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
    Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
- 01 Sep, 2022 1 commit
  - Committed by Ammar Ahmad Awan
    Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
- 27 Aug, 2022 1 commit
  - Committed by Michael Wyatt
    Add blob storage to CI runners and enable for transformers cache on inference tests
- 25 Aug, 2022 1 commit
  - Committed by Olatunji Ruwase
    * Correctly detect CPU optimizer usage
    * Update nv-transformers-v100.yml (#2259)
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 13 Aug, 2022 1 commit
  - Committed by Jeff Rasley
- 11 Aug, 2022 1 commit
  - Committed by Michael Wyatt
    Refactor Distributed unit tests
- 06 Aug, 2022 1 commit
  - Committed by Jeff Rasley
- 04 Aug, 2022 1 commit
  - Committed by Michael Wyatt
    Re-enable AMD CI with some modifications
- 03 Aug, 2022 1 commit
  - Committed by Jeff Rasley