Unverified commit cd4e473e, authored by digger yu, committed by GitHub

fix typo with deepspeed/ (#3547)

* fix spelling error with deepspeed/runtime/

* fix typo docs/

* fix typo in comments with deepspeed/

* fix typo deepspeed/

* Update constants.py

Remove the space after nebula

---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Parent: da8f4e01
@@ -637,7 +637,7 @@ class Autotuner:
         logger.info(f"End tuning for space: {tuning_space_name}")
         return max_micro_batch_size, best_mbs, best_metric_val
 
-    def get_plauteu_mbs(self, tuning_space_name):
+    def get_plateau_mbs(self, tuning_space_name):
         if tuning_space_name not in self.records:
             return 0
         space_records = self.records[tuning_space_name]
...
@@ -213,14 +213,14 @@ def student_initialization(student_model, teacher_model, deepspeed_config):
             Example 1: bert.encoder.layer, for BERT_base model's prefix name
             Example 2: transformer.h, for GPT-2 hugging face prefix name
         teacher_layer (`list of integers`)
-            The layer of teacher will be used for student's reinitializedion
+            The layer of teacher will be used for student's reinitialization
             Example 1: [1,3,5,7,9], means we want to matches the 2nd/4th/6th/8th/10th layer of teacher to the first 5 layers of student
         student_layer (`list` or None)
             The layer of student need to be re-initialized
             Example 1: None, means we want to reinitialize all the layers
             Example 1: [0,1,2,3,4], means we want to reinitialize the first 5 layers
         other_module_name (`list of string`)
-            The modules will be used for student's reinitializedion
+            The modules will be used for student's reinitialization
             Example 1: ['bert.pooler', 'bert.embeddings', 'classifier'], means we want to apply the weight in teacher's embedding/pooler/classier module to the student
             Example 2: ['transformer.w', 'transformer.ln_f', 'lm_head'], means we want to apply the weight in teacher's embedding layers module to the student
         Note that teacher_layer should matches student layer
...
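To make the teacher_layer/student_layer mapping described in this docstring concrete, here is a small illustrative sketch; the zip-based pairing and attribute names below are assumptions for illustration, not DeepSpeed's actual compression code:

```python
# Illustration only: pair teacher layers with student layers the way the
# docstring above describes ([1,3,5,7,9] -> the first 5 student layers).
teacher_layer = [1, 3, 5, 7, 9]   # teacher layers to copy from
student_layer = None              # None means "reinitialize all student layers in order"

num_student_layers = 5            # assumed student depth for this sketch
targets = student_layer if student_layer is not None else list(range(num_student_layers))

for t, s in zip(teacher_layer, targets):
    # Real code would copy the weights of teacher block t into student block s,
    # e.g. student.bert.encoder.layer[s] <- teacher.bert.encoder.layer[t].
    print(f"teacher layer {t} -> student layer {s}")
```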
@@ -29,8 +29,8 @@ NEBULA_ENABLED_DEFAULT = False
 # There is a case where customer want to load the checkpoint saved
 # by raw torch. Because nebula cannot load torch checkpoint directly
 # as they have different folder structures to bring the gap for
-# loading(the data are totally same in bytes for torch and nebula s
-# aving).
+# loading(the data are totally same in bytes for torch and nebula
+# saving).
 # In this case, we must disable nebula load to use raw torch load.
 # Customer can just set NEBULA_ENABLE_NEBULA_LOAD to False. Then use
 # original way of deepspeed to load, i.e. set the value of "--load".
...
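For reference, the toggle this comment refers to lives under the `nebula` section of the DeepSpeed config; a minimal sketch of disabling Nebula load while keeping Nebula save is shown below. Key names follow the public Nebula checkpointing docs, but treat the exact layout and the placeholder values as assumptions to verify against your DeepSpeed version:

```python
# Sketch of a DeepSpeed config dict that keeps Nebula checkpoint saving on but
# loads checkpoints with the raw torch loader, as the comment above describes.
# Path and interval values are placeholder assumptions.
ds_config = {
    "nebula": {
        "enabled": True,
        "persistent_storage_path": "/data/nebula_checkpoints",
        "persistent_time_interval": 100,
        "enable_nebula_load": False,  # fall back to the original "--load" torch path
    },
}
```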
...@@ -31,7 +31,7 @@ class CheckpointEngine(object): ...@@ -31,7 +31,7 @@ class CheckpointEngine(object):
pass pass
def commit(self, tag): def commit(self, tag):
# to tell checkpoint services if all files are readys. # to tell checkpoint services if all files are ready.
pass pass
``` ```
@@ -26,5 +26,5 @@ class CheckpointEngine(object):
         pass
 
     def commit(self, tag):
-        # to tell checkpoint services if all files are readys.
+        # to tell checkpoint services if all files are ready.
         pass
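Both "readys" fixes above touch the `commit` hook of the checkpoint-engine interface. As orientation, here is a minimal sketch of an engine with the same create/save/load/commit shape; it is a hypothetical file-based engine for illustration, not DeepSpeed's shipped implementation:

```python
import torch


class SimpleFileCheckpointEngine:
    """Illustrative engine mirroring the CheckpointEngine interface shown above."""

    def __init__(self, config_params=None):
        self.config_params = config_params

    def create(self, tag):
        # Called once per checkpoint tag before the individual save() calls.
        pass

    def save(self, state_dict, path):
        torch.save(state_dict, path)

    def load(self, path, map_location=None):
        return torch.load(path, map_location=map_location)

    def commit(self, tag):
        # Tell the checkpoint service that all files for this tag are ready.
        return True
```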
@@ -1916,7 +1916,7 @@ class DeepSpeedEngine(Module):
         """
         Manually overrides the DeepSpeed engine's gradient accumulation boundary state, this is an optional
         feature and should be used with care. The state should be set before to the intended
-        value before each forward/backward. The final fordward/backward should have the
+        value before each forward/backward. The final forward/backward should have the
         boundary state set to True. This style allows client code to only call engine.step() once after all
         the gradient accumulation passes are complete. See example below:
 
         .. code-block:: python
...
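The corrected docstring describes a manual gradient-accumulation-boundary pattern. Below is a usage sketch of that call sequence, assuming a `set_gradient_accumulation_boundary(bool)` method on an engine returned by `deepspeed.initialize`; the loop is reconstructed from the docstring's description, not copied from the elided example:

```python
# engine comes from deepspeed.initialize(...); data_iterator yields micro-batches.
# Both are placeholders -- this is a usage sketch, not a runnable end-to-end script.
gradient_accumulation_steps = 4  # assumed value for illustration

engine.set_gradient_accumulation_boundary(False)
for _ in range(gradient_accumulation_steps - 1):
    inputs, labels = next(data_iterator)
    loss = engine(inputs, labels)
    engine.backward(loss)

# The final forward/backward of the window runs with the boundary set to True,
# then step() is called once for the whole accumulation window.
engine.set_gradient_accumulation_boundary(True)
inputs, labels = next(data_iterator)
loss = engine(inputs, labels)
engine.backward(loss)
engine.step()
```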