Unverified commit 82991df0, authored by Gavin Goodship, committed by GitHub

Doc updates (#3520)

* Update pytorch-profiler.md

* Update one-cycle.md

---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Parent d73f99d8
@@ -42,33 +42,33 @@ of learning rate and momentum because they are correlated hyperparameters. We
have leveraged this recommendation to reduce configuration burden by organizing
the 1-cycle parameters into two groups:
-1. Global parameters for configuring the cycle and decay phase
-2. Local parameters for configuring learning rate and momentum
+1. Global parameters for configuring the cycle and decay phase.
+2. Local parameters for configuring learning rate and momentum.
The global parameters for configuring the 1-cycle phases are:
-1. `cycle_first_step_size`: The count of training steps to complete first step of cycle phase
-2. `cycle_first_stair_count`: The count of updates (or stairs) in first step of cycle phase
-3. `cycle_second_step_size`: The count of training steps to complete second step of cycle phase
-4. `cycle_second_stair_count`: The count of updates (or stairs) in the second step of cycle phase
-5. `post_cycle_decay_step_size`: The interval, in training steps, to decay hyperparameter in decay phase
+1. `cycle_first_step_size`: The count of training steps to complete first step of cycle phase.
+2. `cycle_first_stair_count`: The count of updates (or stairs) in first step of cycle phase.
+3. `cycle_second_step_size`: The count of training steps to complete second step of cycle phase.
+4. `cycle_second_stair_count`: The count of updates (or stairs) in the second step of cycle phase.
+5. `post_cycle_decay_step_size`: The interval, in training steps, to decay hyperparameter in decay phase.
The local parameters for the hyperparameters are:
**Learning rate**:
-1. `cycle_min_lr`: minimum learning rate in cycle phase
-2. `cycle_max_lr`: maximum learning rate in cycle phase
-3. `decay_lr_rate`: decay rate for learning rate in decay phase
+1. `cycle_min_lr`: Minimum learning rate in cycle phase.
+2. `cycle_max_lr`: Maximum learning rate in cycle phase.
+3. `decay_lr_rate`: Decay rate for learning rate in decay phase.
Although appropriate `cycle_min_lr` and `cycle_max_lr` values can be
selected based on experience or expertise, we recommend using the [learning rate
range test](/tutorials/lrrt/) feature of DeepSpeed to configure them.
**Momentum**
-1. `cycle_min_mom`: minimum momentum in cycle phase
-2. `cycle_max_mom`: maximum momentum in cycle phase
-3. `decay_mom_rate`: decay rate for momentum in decay phase
+1. `cycle_min_mom`: Minimum momentum in cycle phase.
+2. `cycle_max_mom`: Maximum momentum in cycle phase.
+3. `decay_mom_rate`: Decay rate for momentum in decay phase.
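
Taken together, these parameters are supplied through the scheduler section of the DeepSpeed configuration. The sketch below (shown as a Python dict for readability) is only illustrative: the numeric values are assumed placeholders, not recommendations, and the actual required changes are covered in the next section.

```python
# Illustrative sketch only: the parameter names follow the lists above, but
# every numeric value here is an assumed placeholder, not a tuned setting.
ds_config = {
    "scheduler": {
        "type": "OneCycle",
        "params": {
            # Global parameters: cycle and decay phases.
            "cycle_first_step_size": 1000,
            "cycle_first_stair_count": 500,
            "cycle_second_step_size": 1000,
            "cycle_second_stair_count": 500,
            "post_cycle_decay_step_size": 1000,
            # Local parameters: learning rate.
            "cycle_min_lr": 0.0001,
            "cycle_max_lr": 0.001,
            "decay_lr_rate": 0.001,
            # Local parameters: momentum.
            "cycle_min_mom": 0.85,
            "cycle_max_mom": 0.99,
            "decay_mom_rate": 0.0,
        },
    },
}
```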
## Required Model Configuration Changes
@@ -122,9 +122,9 @@ GPU, but was converging slowly to target performance (AUC) when training on 8
GPUs (8X batch size). The plot below shows model convergence with 8 GPUs for
these learning rate schedules:
-1. **Fixed**: using an optimal fixed learning rate for 1-GPU training.
-2. **LinearScale**: using a fixed learning rate that is 8X of **Fixed**.
-3. **1Cycle**: using 1-Cycle schedule.
+1. **Fixed**: Using an optimal fixed learning rate for 1-GPU training.
+2. **LinearScale**: Using a fixed learning rate that is 8X of **Fixed**.
+3. **1Cycle**: Using 1-Cycle schedule.
![model_convergence](/assets/images/model_convergence.png)
@@ -22,12 +22,12 @@ from torch.profiler import profile, record_function, ProfilerActivity
with torch.profiler.profile(
    schedule=torch.profiler.schedule(
-        wait=5, # during this phase profiler is not active
-        warmup=2, # during this phase profiler starts tracing, but the results are discarded
-        active=6, # during this phase profiler traces and records data
-        repeat=2), # specifies an upper bound on the number of cycles
+        wait=5, # During this phase profiler is not active.
+        warmup=2, # During this phase profiler starts tracing, but the results are discarded.
+        active=6, # During this phase profiler traces and records data.
+        repeat=2), # Specifies an upper bound on the number of cycles.
    on_trace_ready=tensorboard_trace_handler,
-    with_stack=True # enable stack tracing, adds extra profiling overhead
+    with_stack=True # Enable stack tracing, adds extra profiling overhead.
) as profiler:
    for step, batch in enumerate(data_loader):
        print("step:{}".format(step))
@@ -40,7 +40,7 @@ with torch.profiler.profile(
        #weight update
        model_engine.step()
-        profiler.step() # send the signal to the profiler that the next step has started
+        profiler.step() # Send the signal to the profiler that the next step has started.
```
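
As a side note on the schedule above: each profiling cycle spans `wait + warmup + active = 13` steps, and `repeat=2` caps collection at two cycles, so at most 12 steps are actually recorded. The handler passed to `on_trace_ready` is typically built with `torch.profiler.tensorboard_trace_handler`; a minimal sketch, assuming a `./logs` output directory (the directory name is an assumption, not part of this commit):

```python
from torch.profiler import tensorboard_trace_handler

# Write traces where TensorBoard's profiler plugin can pick them up;
# "./logs" is an assumed directory name, not one mandated by the tutorial.
trace_handler = tensorboard_trace_handler("./logs")
```

The resulting `trace_handler` would then be passed as `on_trace_ready=trace_handler` in the `torch.profiler.profile(...)` call shown above.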
## Label arbitrary code ranges
@@ -48,7 +48,7 @@ with torch.profiler.profile(
The `record_function` context manager can be used to label arbitrary code ranges with user provided names. For example, the following code marks `"model_forward"` as a label:
```python
-with profile(record_shapes=True) as prof: # record_shapes indicates whether to record shapes of the operator inputs
+with profile(record_shapes=True) as prof: # record_shapes indicates whether to record shapes of the operator inputs.
with record_function("""):"
model_engine(inputs)
```
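
Once the block above has run, the `"model_forward"` label appears as its own row in the profiler's aggregate statistics. A minimal sketch of printing them, assuming `prof` from the example above (the sort key and row limit are arbitrary choices):

```python
# Print an aggregated table of recorded events; the labeled range shows up
# alongside individual operator names.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```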