[Doc] round_robin_gradients (#1261)

* Fix docstring
* Make screenshots clickable for easier viewing
* Navigation menu in alphabetical order; More clickable screenshots
* Rename 1Cycle doc
* Tweak naming
* Remove no longer used flag
* ZeRO3 Offload release
* Single GPU results
* Rearrange figures
* Single GPU text
* tweak intro
* zero3-offload section
* Add asynchronous i/o docs
* Fix print_per_steps doc
* Document round_robin_gradients
* Tweak description
* Trigger CI

40c381df · Olatunji Ruwase · GitHub · e82060d0 · 40c381df

Showing 1 changed file with 7 additions and 0 deletions
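
For readers who want to try the option this change documents, the following is a minimal sketch of a config that turns on `round_robin_gradients` for ZeRO stage 2 with CPU offloading. The helper name `make_zero2_offload_config` is hypothetical, the accumulation value is illustrative, and the `offload_optimizer` and `gradient_accumulation_steps` keys are assumed from the wider DeepSpeed config documentation rather than taken from this diff.

```python
import json


def make_zero2_offload_config(round_robin: bool = True,
                              accumulation_steps: int = 4) -> dict:
    """Build an illustrative DeepSpeed-style config dict (hypothetical helper).

    Only "round_robin_gradients" comes from this change; the other keys
    are assumptions based on the broader DeepSpeed config documentation.
    """
    return {
        # More accumulation steps mean more gradient copies between
        # optimizer steps, which is where the option is said to help.
        "gradient_accumulation_steps": accumulation_steps,
        "zero_optimization": {
            "stage": 2,  # the option is described as a stage 2 optimization
            "offload_optimizer": {"device": "cpu"},  # assumed CPU offload setting
            "round_robin_gradients": round_robin,  # documented default is False
        },
    }


config = make_zero2_offload_config()
# Serialize to the JSON form that a config file would contain.
config_json = json.dumps(config, indent=2)
print(config_json)
```

The resulting JSON could be saved to a file and passed as the DeepSpeed config. Whether the option pays off in a given setup depends on the accumulation steps and GPU count, as the documented description notes.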

docs/_pages/config-json.md +7 -0

--- a/docs/_pages/config-json.md
+++ b/docs/_pages/config-json.md
@@ -301,6 +301,7 @@ Enabling and configuring ZeRO memory optimizations
    "elastic_checkpoint" : [true|false],
    "stage3_gather_fp16_weights_on_model_save": [true|false],
    "ignore_unused_parameters": [true|false]
+    "round_robin_gradients": [true|false]
    }
 ```

@@ -358,6 +359,12 @@ Enabling and configuring ZeRO memory optimizations
 | ------------------------------------------------------------------------------------------------------------------------------------------ | ------- |
 | For use with ZeRO stage 1, enable backward hooks to reduce gradients during the backward pass or wait until the end of the backward pass.  | `True`  |

+***round_robin_gradients***: [boolean]
+
+| Description                                                                                                                                | Default |
+| ------------------------------------------------------------------------------------------------------------------------------------------ | ------- |
+| Stage 2 optimization for CPU offloading that parallelizes gradient copying to CPU memory among ranks by fine-grained gradient partitioning. Performance benefit grows with gradient accumulation steps (more copying between optimizer steps) or GPU count (increased parallelism). | `False`  |
+
 ***offload_param***: [dictionary]

 | Description                                                                                                                                                                                   | Default |