Unverified commit cd4e473e, authored by digger yu and committed by GitHub

fix typo with deepspeed/ (#3547)

* fix spelling error with deepspeed/runtime/

* fix typo docs/

* fix typo in comments with deepspeed/

* fix typo deepspeed/

* Update constants.py

Remove the space after nebula

---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Parent: da8f4e01
@@ -637,7 +637,7 @@ class Autotuner:
logger.info(f"End tuning for space: {tuning_space_name}")
return max_micro_batch_size, best_mbs, best_metric_val
-def get_plauteu_mbs(self, tuning_space_name):
+def get_plateau_mbs(self, tuning_space_name):
if tuning_space_name not in self.records:
return 0
space_records = self.records[tuning_space_name]
......
@@ -213,14 +213,14 @@ def student_initialization(student_model, teacher_model, deepspeed_config):
Example 1: bert.encoder.layer, for BERT_base model's prefix name
Example 2: transformer.h, for GPT-2 hugging face prefix name
teacher_layer (`list of integers`)
-The layer of teacher will be used for student's reinitializedion
+The layer of teacher will be used for student's reinitialization
Example 1: [1,3,5,7,9], means we want to matches the 2nd/4th/6th/8th/10th layer of teacher to the first 5 layers of student
student_layer (`list` or None)
The layer of student need to be re-initialized
Example 1: None, means we want to reinitialize all the layers
Example 1: [0,1,2,3,4], means we want to reinitialize the first 5 layers
other_module_name (`list of string`)
-The modules will be used for student's reinitializedion
+The modules will be used for student's reinitialization
Example 1: ['bert.pooler', 'bert.embeddings', 'classifier'], means we want to apply the weight in teacher's embedding/pooler/classier module to the student
Example 2: ['transformer.w', 'transformer.ln_f', 'lm_head'], means we want to apply the weight in teacher's embedding layers module to the student
Note that teacher_layer should matches student layer
......
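For context, the docstring touched by this hunk describes the layer-reduction knowledge-distillation options. A minimal sketch of how those fields typically appear in a DeepSpeed compression config, mirroring the docstring's BERT-base example, is shown below; the exact key names (e.g. `keep_number_layer`, `module_name_prefix`) are assumptions for illustration and are not part of this diff.

```python
# Hypothetical "layer_reduction" fragment mirroring the docstring's examples.
# Key names are assumptions, not shown in this commit.
layer_reduction_config = {
    "layer_reduction": {
        "enabled": True,
        "keep_number_layer": 5,                      # student keeps 5 layers
        "module_name_prefix": "bert.encoder.layer",  # BERT-base prefix name
        "teacher_layer": [1, 3, 5, 7, 9],            # 2nd/4th/6th/8th/10th teacher layers
        "other_module_name": ["bert.pooler", "bert.embeddings", "classifier"],
    }
}
```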
@@ -29,8 +29,8 @@ NEBULA_ENABLED_DEFAULT = False
# There is a case where customer want to load the checkpoint saved
# by raw torch. Because nebula cannot load torch checkpoint directly
# as they have different folder structures to bring the gap for
-# loading(the data are totally same in bytes for torch and nebula s
-# aving).
+# loading(the data are totally same in bytes for torch and nebula
+# saving).
# In this case, we must disable nebula load to use raw torch load.
# Customer can just set NEBULA_ENABLE_NEBULA_LOAD to False. Then use
# original way of deepspeed to load, i.e. set the value of "--load".
......
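The comment rewrapped in this hunk describes the workaround in prose: keep nebula for saving, but turn off nebula load so the raw torch checkpoint supplied via `--load` is read through DeepSpeed's original path. A hedged sketch of the corresponding config section follows; the key names are assumed from the constants this file defines, not shown in the diff.

```python
# Hypothetical nebula section of a DeepSpeed config: save through nebula,
# but load a checkpoint that was saved by raw torch. Key names are assumed.
ds_config = {
    "nebula": {
        "enabled": True,
        "enable_nebula_load": False,  # fall back to DeepSpeed's original torch load
    },
}
```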
@@ -31,7 +31,7 @@ class CheckpointEngine(object):
pass
def commit(self, tag):
-# to tell checkpoint services if all files are readys.
+# to tell checkpoint services if all files are ready.
pass
```
@@ -26,5 +26,5 @@ class CheckpointEngine(object):
pass
def commit(self, tag):
-# to tell checkpoint services if all files are readys.
+# to tell checkpoint services if all files are ready.
pass
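Both hunks above fix the comment on the `commit` hook of the `CheckpointEngine` interface, which signals that every file belonging to a checkpoint tag is ready. As a rough illustration only, a custom engine built on plain `torch.save`/`torch.load` might look like the sketch below; the import path and the `create`/`save`/`load` method names are assumptions based on the surrounding interface, not part of this diff.

```python
import torch

# Import path assumed; it is not shown in this diff.
from deepspeed.runtime.checkpoint_engine.checkpoint_engine import CheckpointEngine


class TorchFileCheckpointEngine(CheckpointEngine):
    """Hypothetical minimal engine that wraps torch.save/torch.load."""

    def create(self, tag):
        # Called once per checkpoint tag before any save() calls; nothing to do here.
        pass

    def save(self, state_dict, path):
        torch.save(state_dict, path)

    def load(self, path, map_location=None):
        return torch.load(path, map_location=map_location)

    def commit(self, tag):
        # Tell checkpoint services that all files for this tag are ready.
        return True
```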
@@ -1916,7 +1916,7 @@ class DeepSpeedEngine(Module):
"""
Manually overrides the DeepSpeed engine's gradient accumulation boundary state, this is an optional
feature and should be used with care. The state should be set before to the intended
-value before each forward/backward. The final fordward/backward should have the
+value before each forward/backward. The final forward/backward should have the
boundary state set to True. This style allows client code to only call engine.step() once after all
the gradient accumulation passes are complete. See example below:
.. code-block:: python
......
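The docstring in this hunk refers to a usage example that is elided from the diff. Below is a minimal sketch of the pattern it describes, assuming the method being documented is `set_gradient_accumulation_boundary` on the DeepSpeed engine and that `micro_batches` holds one accumulation window of inputs.

```python
# Hypothetical usage: mark only the final forward/backward of the
# accumulation window as a boundary, then call step() once.
def train_window(engine, micro_batches):
    for i, batch in enumerate(micro_batches):
        is_last = (i == len(micro_batches) - 1)
        engine.set_gradient_accumulation_boundary(is_last)
        loss = engine(batch)
        engine.backward(loss)
    engine.step()  # single optimizer step after all accumulation passes
```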