- 12 Jul, 2023 1 commit
-
-
Committed by Michael Wyatt
* add coverage report
* define env vars in shared action
* reduce time for longest running tests
* fix broken shared action
* reduce test time
* reducing Pipeline test times
* further reducing test times
* rework Z3 test
* testing new mp.pool and persistent dist envs
* fix import
* reuse distributed environment for tests with lots of param combos
* fix for dist teardown
* fix pickling issue with pool cache
* actually fix pickling problem
* avoid running pool cache stuff on non-distributed tests
* fix issues with nested mp.pool
* fix for nested pools in Pipeline Engine
* re-add params
* update workflows with pytest opts
* implement feedback
* resolve race condition with port selection
* Update tests/unit/common.py
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 07 Jul, 2023 1 commit
-
-
Committed by Ramya Ramineni
* Workaround to pass unit/ops/accelerators/test_accelerator_backward.py unit tests on ROCm
* Rearranged is_rocm_pytorch()
* Introduced is_rocm_pytorch() for ROCm
* Fixed formatting errors
* Function call
* formatting fix
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
-
- 31 Mar, 2023 1 commit
-
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 27 Mar, 2023 1 commit
-
-
Committed by Jeff Rasley
-
- 08 Mar, 2023 1 commit
-
-
Committed by Ma, Guokai
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 28 Feb, 2023 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 18 Jan, 2023 1 commit
-
-
Committed by Olatunji Ruwase
* CPU-Adam: add compile-flag to enable param-copy from CPU to GPU
* guard the CUDA-related include files and variables
* remove CUDA dependency from op_builder when building against CPU
* fixing the builder issues
* fix formatting
* return true when there is no mismatch on the cuda version
* guard for when cuda is not available & test with cpu-only environment
* Update cpu_adam and cpu_adagrad
* Format fixes
* Add configurable half precision type; Build/run in CUDA environment
* Run cpu_adam and cpu_adagrad in cpu only environment
* Mark CUDA only unit tests
* CPU environment CI
* Format fixes
* Remove --forked
* Add --forked
* CPU only CI should pass
* Format fixes
* Format fixes
* Remove scattered pytest.skip
* Fix cpu_adam unit test
* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Address PR feedback
* OpenMP linking
* Fix unit tests
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
- 05 Oct, 2022 1 commit
-
-
Committed by Michael Wyatt
* batch of refactored tests
* more test refactoring
* fp16 test refactor
* more refactors
* added DistributedFixture class
* applied DistributedFixture to first batch of tests as a trial
* added DistributedFixture test and documentation
* last tests
* fixes for refactored tests
* remove subdirs in workflow files
* fix pytest syntax error
* fix another syntax error
* update imports
* use DistFixture with elastic checkpoint test
* missing import
* update to shared class tmpdir for elastic test
* moved test files
* avoid duplicate test file name
* last refactor and moving test files
* formatting
* fix broken import
* testing forked AMD tests
* update abstract method
* use blob storage for accelerate and transformers tests
* upgrade torch for accelerate CI
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 26 Jul, 2022 1 commit
-
-
Committed by Alex Hedges
-
- 21 Jan, 2022 1 commit
-
-
Committed by Justin Chiu
* Changes for bfloat16 Zero2
* ZeRO stage3 optimizations, with some bug fixes
  optimizations for stage3:
  - prefetching improvements
  - batching allgather calls to amortize fixed overhead and improve bandwidth utilization
  - batching reduce_scatter calls to amortize fixed overhead and improve bandwidth utilization
  - using *_base variants of allgather and reduce scatter to reduce memory allocations and data movement
  - more fine grained synchronization for communication that allows blocking on less work
  - precomputation of fetching code
  - using a fetch queue rather than deciding what to (pre)fetch at each iteration
  - limiting queued coalesced communication ops to reduce memory pressure on pytorch cuda caching allocator (not elegant solution)
  optimizations for stage3-offload:
  - made some host-device tensor copies async to improve performance
  bug fixes and qol improvements:
  - fix init context method when parent modules modify child weights
  - speed up model initialization by moving model to GPU before weight initialization
  - fixed unit test imports so that unit tests can be run from any directory
  - change performance logging to include memory consumption
  - add logging w/ model size when done partitioning model
  new features
  - bfloat16 support for ZeRO 3
* fix import in ut
* ran yapf
* improvements to cache flush warn log
* backwards compatibility with older versions of pytorch
* handle edge case where reduced tensor smaller than world size
* moved event synchronization to allgather handle wait() call
* removed unnecessary barrier call
* formatting fix after resolving merge conflict
* skip nvme prefetch when trace not complete
* opportunistically avoid memory allocation in allgather coalesced where possible
* fix indentation after merge
* fixes to account for parameter offload
* accounting for torch.cuda.memory_stats not being available
* moved partition_all_params to optimizer step
* allgathering on params before item gets called
* fix param status checks needed after moving partition_all_parameters call to optimizer step
* fix grad accumulation with optimizer offload
* grad norm computation fix for optimizer offload
* change post divide in reduce-scatter to pre divide
* fix gradient race condition w/ optimizer offload
* improve inf/nan gradient tracking
* don't prefetch when not in training mode
* format fix after merging
* fix prefetching issue when using NVME offload
* improved defragmentation for fp16 parameters
* relative imports for bf16 tests
* changes for bwd compatibility with pytorch 1.2
* remove buffered_reduce_fallback
* removed unused parameter offset bookkeeping
* fixed tracking for multiple param groups
* unbroke bfloat16 config after merge conflict
* using base allgather params when only 1 param
* cleanup/fixes for fp16 partition defragmentation
* switch to CRLF
* convert to same new-line style as master
* align new line with master
* Fix merge issues
* switch to CRLF
* fix to LF line endings
* minor merge fixes
* remove extra bfloat16_enabled definition
* asserting params inflight for AllGatherHandle
* remove get_cuda_mem_allocated_str
* Format fixes
* fix bfloat16 zero stage check (broken after merge commit)
* +self.communication_data_type, -self.allreduce_always_fp32; delete dead code
* Add self.reduce_scatter
* Format fix
* Fix merge issues
* iterate over params_to_fetch rather than make another iterator
* add some TODOs
* remove unnecessary division by micro_step_id
* rename config keys "bfloat16" -> "bf16"
* rename stage3_gather_fp16_weights_on_model_save -> stage3_gather_16bit_weights_on_model_save
* add unit test to check backwards compatibility for gather_16bit_weights
* added test to confirm bf16 key bwd compatibility
* Format fixes
Co-authored-by: Rana Ali Amjad <raamjad@amazon.com>
Co-authored-by: Justin Chiu <justchiu@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Dec, 2021 1 commit
-
-
Committed by Reza Yazdani
* fixing the softmax masking when using triangular masking
* fix a bug in the layernorm backward kernels
* revert back some changes & remove debug code
* change the constants to a macro
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 01 Nov, 2021 1 commit
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Oct, 2021 1 commit
-
-
Committed by Alex Hedges
* Fix typos in docs/
* Fix typos in code comments and output strings
* Fix typos in the code itself
* Fix typos in tests/
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 24 May, 2021 1 commit
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>
-
- 08 Apr, 2021 1 commit
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 09 Mar, 2021 1 commit
-
-
Committed by Samyam Rajbhandari
* Squash stage3 v1 (#146)
Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
* Fix correctness bug (#147)
* formatting fix (#150)
* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)
* fp16 Z3 API update and bugfix
* revert debug change
* ZeRO-3 detach and race condition bugfixes (#149)
* trying out ZeRO-3 race condition fix
* CUDA sync instead of stream
* reduction stream sync
* remove commented code
* Fix optimizer state_dict KeyError (#148)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)
* Simplifying the logic for getting averaged gradients (#153)
* skip for now
* Z3 Docs redux (#154)
* removing some TODOs and commented code (#155)
* New Z3 defaults (#156)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* formatting
* megatron external params
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
-
- 30 Jan, 2021 1 commit
-
-
Committed by Jeff Rasley
-
- 28 Jan, 2021 1 commit
-
-
Committed by Jeff Rasley
-
- 07 Jan, 2021 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 18 Dec, 2020 1 commit
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Dec, 2020 1 commit
-
-
Committed by Reza Yazdani
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 13 Nov, 2020 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
-
- 22 Sep, 2020 1 commit
-
-
Committed by RezaYazdaniAminabadi
Co-authored-by: Conglong Li <conglong.li@gmail.com>
-
- 21 Sep, 2020 1 commit
-
-
Committed by RezaYazdaniAminabadi
-
- 15 Sep, 2020 1 commit
-
-
Committed by Jeff Rasley
* add pytest skips around tests that require certain ops to be installed
-
- 12 Sep, 2020 1 commit
-
-
Committed by Jeff Rasley
This reverts commit e549be60.
-
- 11 Sep, 2020 1 commit
-
-
Committed by RezaYazdaniAminabadi
* supporting different intermediate sizes other than 4*hidden_dim
* run precommit
* uncomment the unit tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 30 May, 2020 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
* Transformer kernels release
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-