Unverified · Commit b8c06b6a, authored by chenxujun, committed by GitHub

Fix typos (#50894)

Parent commit: f8ec430e
@@ -126,7 +126,7 @@ def save(
filepath: saved path
src: the audio tensor
sample_rate: the number of samples of audio per second.
- channels_first: src channel infomation
+ channels_first: src channel information
if True, means input tensor is (channels, time)
if False, means input tensor is (time, channels)
encoding:encoding format, wave_backend only support PCM16 now.
......
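The docstring above belongs to `paddle.audio.save`. A minimal usage sketch, assuming the function is exposed as `paddle.audio.save` and that the wave backend with its default PCM16 encoding is in use:

```python
import paddle

# Hedged sketch of the save() arguments documented above; only the documented
# parameters are used, and the defaults (wave backend, PCM16) are assumed.
t = paddle.linspace(0, 1, 16000)
waveform = paddle.sin(2 * 3.14159 * 440 * t).unsqueeze(0)  # (channels, time)
paddle.audio.save(
    "tone.wav",            # filepath: saved path
    waveform,              # src: the audio tensor
    sample_rate=16000,     # number of samples of audio per second
    channels_first=True,   # src is laid out as (channels, time)
)
```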
@@ -37,7 +37,7 @@ class ESC50(AudioClassificationDataset):
Args:
mode (str, optional): It identifies the dataset mode (train or dev). Default:train.
split (int, optional): It specify the fold of dev dataset. Default:1.
- feat_type (str, optional): It identifies the feature type that user wants to extrace of an audio file. Default:raw.
+ feat_type (str, optional): It identifies the feature type that user wants to extract of an audio file. Default:raw.
archive(dict, optional): it tells where to download the audio archive. Default:None.
Returns:
......
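A hedged usage sketch for the `ESC50` arguments above, assuming the dataset class is importable from `paddle.audio.datasets` and downloads its archive on first use:

```python
from paddle.audio.datasets import ESC50  # assumed import path

# Construct the train split with the documented defaults made explicit.
train_ds = ESC50(mode='train', split=1, feat_type='raw')
waveform, label = train_ds[0]  # feat_type 'raw' yields the waveform and its label
```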
@@ -39,7 +39,7 @@ class TESS(AudioClassificationDataset):
mode (str, optional): It identifies the dataset mode (train or dev). Defaults to train.
n_folds (int, optional): Split the dataset into n folds. 1 fold for dev dataset and n-1 for train dataset. Defaults to 5.
split (int, optional): It specify the fold of dev dataset. Defaults to 1.
- feat_type (str, optional): It identifies the feature type that user wants to extrace of an audio file. Defaults to raw.
+ feat_type (str, optional): It identifies the feature type that user wants to extract of an audio file. Defaults to raw.
archive(dict): it tells where to download the audio archive. Defaults to None.
Returns:
......
@@ -34,7 +34,7 @@ def backward(tensors, grad_tensors=None, retain_graph=False):
retain_graph(bool, optional): If False, the graph used to compute grads will be freed. If you would
like to add more ops to the built graph after calling this method( :code:`backward` ), set the parameter
- :code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
+ :code:`retain_graph` to True, then the grads will be retained. Thus, setting it to False is much more memory-efficient.
Defaults to False.
Returns:
@@ -79,7 +79,7 @@ def backward(tensors, grad_tensors=None, retain_graph=False):
assert in_out_list is not None, "{} should not be None".format(name)
if isinstance(in_out_list, (list, tuple)):
- assert len(in_out_list) > 0, "{} connot be empyt".format(name)
+ assert len(in_out_list) > 0, "{} connot be empty".format(name)
for each_var in in_out_list:
assert isinstance(
each_var, (paddle.Tensor, core.eager.Tensor)
......
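A small sketch of the `backward(tensors, grad_tensors=None, retain_graph=False)` signature shown in the hunk header, assuming it is the function exposed as `paddle.autograd.backward`:

```python
import paddle

x = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
y = (x * x).sum()
# grad_tensors defaults to None; retain_graph=False frees the graph afterwards.
paddle.autograd.backward([y], grad_tensors=None, retain_graph=False)
print(x.grad)  # dy/dx = 2 * x -> [2., 4., 6.]
```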
@@ -29,7 +29,7 @@ class saved_tensors_hooks:
of the original tensor. `pack_hook` will also be called while any
tensor need be saved by `PyLayerContext.save_for_backward`. If a tensor
saved for backward is no need buffer, `pack_hook` will not be called.
- Only the thensor saved for backward is LoDTensor, `pack_hook` will be
+ Only the tensor saved for backward is LoDTensor, `pack_hook` will be
called.
unpack_hook (function): The unpack hook will be called every time the
backward need use the saved inputs/outputs tensors. Then you can reload
......
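A minimal sketch of the pack/unpack hook pair described above, assuming `saved_tensors_hooks` is used as the context manager `paddle.autograd.saved_tensors_hooks(pack_hook, unpack_hook)`:

```python
import paddle

def pack_hook(tensor):
    # Called when a tensor is saved for backward; store it however you like.
    return tensor.numpy()

def unpack_hook(packed):
    # Called when backward needs the saved tensor again; restore it.
    return paddle.to_tensor(packed)

x = paddle.ones([3, 3])
x.stop_gradient = False
with paddle.autograd.saved_tensors_hooks(pack_hook, unpack_hook):
    y = paddle.matmul(x, x)
y.sum().backward()
```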
@@ -95,8 +95,8 @@ def corpus_reader(data_path, words_name, props_name):
if len(label) == 0: # end of sentence
for i in range(len(one_seg[0])):
- a_kind_lable = [x[i] for x in one_seg]
- labels.append(a_kind_lable)
+ a_kind_label = [x[i] for x in one_seg]
+ labels.append(a_kind_label)
if len(labels) >= 1:
verb_list = []
......
@@ -14,7 +14,7 @@
"""
This module will download dataset from
http://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
- and parse train/test set intopaddle reader creators.
+ and parse train/test dataset into paddle reader creators.
This set contains images of flowers belonging to 102 different categories.
The images were acquired by searching the web and taking pictures. There are a
......
@@ -89,9 +89,9 @@ def batch_images_from_tar(
:type data_file: string
:param dataset_name: 'train','test' or 'valid'
:type dataset_name: string
- :param img2label: a dic with image file name as key
+ :param img2label: a dict with image file name as key
and image's label as value
- :type img2label: dic
+ :type img2label: dict
:param num_per_batch: image number per batch file
:type num_per_batch: int
:return: path of list file containing paths of batch file
......
@@ -108,7 +108,7 @@ def reader_creator(filename, word_idx, n, data_type):
continue
yield src_seq, trg_seq
else:
- assert False, 'Unknow data type'
+ assert False, 'Unknown data type'
return reader
......
@@ -48,7 +48,7 @@ class DeviceMesh(core.DeviceMesh):
The class `DeviceMesh` describes the topology of physical devices.
Args:
- mesh (list|numpy.array): an N-dimensional array describes the toplogy
+ mesh (list|numpy.array): an N-dimensional array describes the topology
of logical processes.
dim_names (list, optional): the i-th element of this list gives the name of the
i-th dimension.
......
@@ -257,7 +257,7 @@ class Completer:
tensor_desc.name(), compatible_dims_mapping
)
changed = True
- # Find the most compatible implemenetations from the distributed operator
+ # Find the most compatible implementations from the distributed operator
op_dist_impls = find_compatible_distributed_operator_impls(
dist_op, fwd=True
)
@@ -329,7 +329,7 @@ class Completer:
tensor_desc.name(), compatible_dims_mapping
)
changed = True
- # Find the most compatible implemenetations from the distributed operator
+ # Find the most compatible implementations from the distributed operator
op_dist_impls = find_compatible_distributed_operator_impls(
dist_op, fwd=False
)
@@ -685,7 +685,7 @@ class Completer:
cond_tensor_related_nodes.extend(
_find_nodes_related_to_cond(cond_tensor_node)
)
- # Step 2.3: Add the StepScops output of while_op
+ # Step 2.3: Add the StepScopes output of while_op
stepscopes_tensor_name = while_op_node.op().output("StepScopes")[0]
stepscopes_tensor_node = None
for output_node in while_op_node.outputs:
@@ -1397,7 +1397,7 @@ class Completer:
)
forward_var = vars[forward_var_name]
- # TODO complete other attribte for grad var
+ # TODO complete other attribute for grad var
tensor_dist_attr = TensorDistAttr()
process_mesh = (
self._dist_context.get_tensor_dist_attr_for_program(
......
@@ -1047,7 +1047,7 @@ class DistributedOperatorContext:
# NOTE Support correct parallelism for high-order differential model.
# by default exceed_backward_init_op is False and it means we are in Forward phase; After exceed_backward_init_op = True,
# it means we are in Backward phase.
- # And the final sulotion should be revise high-order differential logic for these two phases in future.
+ # And the final solution should be revise high-order differential logic for these two phases in future.
self._exceed_backward_init_op = False
def __deepcopy__(self, memo):
......
@@ -146,7 +146,7 @@ class DistributedDataLoaderFromGenerator(DistributedDataLoaderBase):
steps_per_epoch = len(self.dataset) // self.batch_size
except:
raise ValueError(
- "Pleace set `steps_per_epoch` or implement `__len__` methond in dataset class."
+ "Please set `steps_per_epoch` or implement `__len__` method in dataset class."
)
return steps_per_epoch
......
@@ -328,7 +328,7 @@ class DistributedOperatorHelper:
elif isinstance(output, Variable):
new_output = [output]
else:
- raise ValueError("Unrecognized outpout.")
+ raise ValueError("Unrecognized output.")
if self._out_dims_mappings:
assert len(new_output) == len(
......
@@ -247,7 +247,7 @@ class Engine:
labels = sample[split:]
else:
raise TypeError(
- "Data should be a Dataset or IterableDatset, but received {}.".format(
+ "Data should be a Dataset or IterableDataset, but received {}.".format(
type(data).__name__
)
)
@@ -699,7 +699,7 @@ class Engine:
def _parallel(self, mode, all_ranks=False):
# Parallelize program based on the planner's results
# For now, the completer has to be passed to the planner,
- # because we may use it to complete the annotation of the backwarkward and update.
+ # because we may use it to complete the annotation of the backward and update.
parallelizer = Parallelizer(
mode,
self._planners[mode].completer,
......
@@ -117,15 +117,15 @@ def shard_op(op, process_mesh=None, in_shard_specs=None, out_shard_specs=None):
will be used. And an error will be raised if the current process mesh cannot be found.
Default: None.
in_shard_specs (list of list, optional): a list of list to describe the sharding specifications
- for the inputs. Each item of `in_shard_specs` is a `shard_spec` between the correspoinding input
- and `process_mesh`. If one item is None, the cooresponding input is replicated across all processes
- If it is None, all inputs are replicated across all processes. Note that the lenght of the
+ for the inputs. Each item of `in_shard_specs` is a `shard_spec` between the corresponding input
+ and `process_mesh`. If one item is None, the corresponding input is replicated across all processes
+ If it is None, all inputs are replicated across all processes. Note that the length of the
`in_shard_specs` should be equal to the actual number of inputs when calling this operation.
Default: None.
out_shard_specs (list of list, optional): a list of list to describe the sharding specifications
- for the outputs. Each item of `out_shard_specs` is a `shard_spec` between the correspoinding output
- and `process_mesh`. If one item is None, the cooresponding output is replicated across all processes
- If it is None, all outputs are replicated across all processes. Note that the lenght of the
+ for the outputs. Each item of `out_shard_specs` is a `shard_spec` between the corresponding output
+ and `process_mesh`. If one item is None, the corresponding output is replicated across all processes
+ If it is None, all outputs are replicated across all processes. Note that the length of the
`in_shard_specs` should be equal to the actual number of inputs when calling this operation.
Default: None. Default: None.
......
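A hedged sketch of the `in_shard_specs` / `out_shard_specs` arguments, assuming `shard_op` and `ProcessMesh` are exported under `paddle.distributed.fleet.auto` and that a `shard_spec` names one mesh dimension (or `None`) per tensor axis; the exact export path and spec format may differ between versions:

```python
import paddle
from paddle.distributed.fleet import auto  # assumed export path

mesh = auto.ProcessMesh([[0, 1], [2, 3]], dim_names=["dp", "mp"])
x = paddle.ones([4, 8])
y = paddle.ones([8, 4])
# Shard x's first axis over the "dp" mesh dimension, replicate y and the output.
dist_matmul = auto.shard_op(
    paddle.matmul,
    process_mesh=mesh,
    in_shard_specs=[["dp", None], [None, None]],
    out_shard_specs=[[None, None]],
)
out = dist_matmul(x, y)
```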
@@ -191,7 +191,7 @@ def register_distributed_operator_impl(op_type, dist_impl):
def find_compatible_distributed_operator_impls(dist_op, fwd=True, partial=True):
"""
- Here just return the first compatible implemention.
+ Here just return the first compatible implementation.
This will be improved by cost model in the future.
"""
op_type = dist_op.serial_op.type
......
@@ -78,7 +78,7 @@ def adopt_lookup_table_v1(ctx, main_block, src_op, Ids_var):
)
if not Ids_var.stop_gradient:
raise NotImplementedError(
- 'Requiring the gradient of Ids of lookup_table(v1dist op is not currently supported. Please open an issue with details on your use case so that we can prioritize adding this (for instance, adversarial training for language model).'
+ 'Requiring the gradient of Ids of lookup_table(v1) dist op is not currently supported. Please open an issue with details on your use case so that we can prioritize adding this (for instance, adversarial training for language model).'
)
target_shape = list(Ids_var.shape[:-1])
@@ -405,7 +405,7 @@ class DistributedEmbeddingImpl(DistributedOperatorImpl):
ctx, op_dist_attr.process_mesh, rank_id
)
- # A generalized method to caculate embedding offset using cartisian product
+ # A generalized method to calculate embedding offset using cartisian product
relative_idx = _get_idx_in_axis(
process_mesh_group,
process_mesh_shape,
@@ -416,7 +416,7 @@ class DistributedEmbeddingImpl(DistributedOperatorImpl):
per_part_size = Weight_var.shape[0]
relative_idx = relative_idx * per_part_size
- # TODO caculate ring id
+ # TODO calculate ring id
parallel_axis = embedding_row_dim_mapping
group_ranks = _get_comm_group(
process_mesh_group, process_mesh_shape, parallel_axis, rank_id
@@ -544,7 +544,7 @@ class DistributedEmbeddingImpl(DistributedOperatorImpl):
process_mesh = param_dist_attr.process_mesh
dim_mapping = param_dist_attr.dims_mapping
- # NOTE all not splited axis should be presented in mesh
+ # NOTE all not splitted axis should be presented in mesh
for axis, size in enumerate(process_mesh.shape):
if size <= 1 or axis in dim_mapping:
pass
@@ -632,7 +632,7 @@ class DistributedEmbeddingImpl(DistributedOperatorImpl):
process_mesh_shape = dist_attr.process_mesh.shape
process_mesh_group = dist_attr.process_mesh.process_ids
- # A generalized method to caculate embedding offset using cartisian product
+ # A generalized method to calculate embedding offset using cartisian product
relative_idx = _get_idx_in_axis(
process_mesh_group,
process_mesh_shape,
......
@@ -114,7 +114,7 @@ class DistributedFillConstantBatchSizeLikeImpl0(DistributedOperatorImpl):
x_dims_mapping = op_dist_attr.get_input_dims_mapping(x_name)
out_dims_mapping = op_dist_attr.get_output_dims_mapping(out_name)
- # only the batch size dimemsion of input and output are relative.
+ # only the batch size dimension of input and output are relative.
dim_changed = compute_compatible_and_update_dim_mapping(
[x_dims_mapping, out_dims_mapping], [0, 0]
)
......
@@ -377,7 +377,7 @@ def _right_operand_parameter_matmul_backward(ctx, *args, **kwargs):
# assert len(
# Y_var_dim_mapping
- # ) == 2, "dist matmual only support Y operand with 2 dims now but Y({})'s dim is [{}]".format(
+ # ) == 2, "dist matmul only support Y operand with 2 dims now but Y({})'s dim is [{}]".format(
# Y_var.name, Y_var_dim_mapping)
Y_var_partitioned = False
for dim in Y_var_dim_mapping:
......
@@ -51,14 +51,14 @@ class DistributedPNormImpl0(DistributedOperatorImpl):
1. axis == None, isinstance(p, (int, float)), asvector = True
1.1 x_dims_mapping == [0, -1, -1]
- allgather input if it is splited by dp group
+ allgather input if it is splitted by dp group
1.2 x_dims_mapping == [-1, 0, -1]
- allgather, split and concat input if it is splited by mp group
+ allgather, split and concat input if it is splitted by mp group
2. isinstance(axis, int), asvector = False
1.1 axis == 0 and x_dims_mapping == [0, -1, -1]
allgather input if it's input[0] is splited by dp group.
1.2 axis == 1 and x_dims_mapping == [-1, 0, -1]
- allgather, split and concat input if it's input[1] is splited by mp group
+ allgather, split and concat input if it's input[1] is splitted by mp group
"""
def __init__(self, name):
......
@@ -67,7 +67,7 @@ class DistributedUpdateLossScalingImpl(DistributedOperatorImpl):
@staticmethod
def backward(ctx, *args, **kwargs):
- # the backward function only filte the gradient with current rank id
+ # the backward function only filter the gradient with current rank id
dist_op_context = ctx.dist_op_context
main_block = dist_op_context.main_block
backward_op = dist_op_context.cur_src_op
......
@@ -55,9 +55,9 @@ class AutoParallelizer:
AutoParallelizer is the main controller class to do the auto parallel process.
And the auto parallel process will be triggered in the wrapped parallelize function.
To facilitate the auto parallelization, it will contain information about program, cluster and the
- related context. In this basic version, the program information will be retrevied from
- Fleet object, and the cluster information can be retrevied in the new created Cluster object,
- and the context information can be retrevied in the new created DistributedContext.
+ related context. In this basic version, the program information will be retrieved from
+ Fleet object, and the cluster information can be retrieved in the new created Cluster object,
+ and the context information can be retrieved in the new created DistributedContext.
"""
def __init__(self, fleet):
......
@@ -251,7 +251,7 @@ class Partitioner:
serial_ops[idx].desc.original_id()
] = serial_ops[idx]
- # partiiton
+ # partition
appended_grad_times = 0
for idx, op in enumerate(serial_ops):
@@ -263,7 +263,7 @@ class Partitioner:
if not op_dist_attr.is_recompute:
appended_grad_times += 1
- # partititon input variables
+ # partition input variables
for serial_input_varname in op.desc.input_arg_names():
if (
serial_input_varname
......
@@ -29,10 +29,10 @@ class Planner:
self._dist_context._dist_op_context = default_ctx.dist_op_context
self._dist_context.data_parallel = default_ctx.data_parallel
if not is_naive_data_parallel(self._dist_context):
- # Use SSA graph for complex parallism
+ # Use SSA graph for complex parallelism
self._dist_context.initialize(with_graph=True)
else:
- # Use program for data parallel parallism
+ # Use program for data parallel parallelism
self._dist_context.initialize(with_graph=False)
self._completer = Completer(self._dist_context)
......
@@ -57,7 +57,7 @@ def new_process_group(ranks, group_id=None, force_new_group=False):
cur_key = ''.join(map(str, sorted(pg.ranks)))
if pg_id != 0 and new_key == cur_key:
return pg
- # If not matching the existing one, construt a new process group
+ # If not matching the existing one, construct a new process group
num_groups = len(_g_process_group_map)
# Note: our process group may interfere with the original implementation
# so the created group id should start from the original _new_ring_id()
......
@@ -22,7 +22,7 @@ class ProcessMesh(core.ProcessMesh):
The class `Processmesh` describes the topology of logical processes.
Args:
- mesh (list|numpy.array): an N-dimensional array describes the toplogy
+ mesh (list|numpy.array): an N-dimensional array describes the topology
of logical processes.
dim_names (list, optional): the i-th element of this list gives the name of the
i-th dimension.
......
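An illustrative sketch of the `mesh` / `dim_names` arguments above, assuming `ProcessMesh` is reachable through the auto-parallel exports (shown here via `paddle.distributed.fleet.auto`; the exact path may differ between versions):

```python
from paddle.distributed.fleet import auto  # assumed export path

# A 2 x 2 mesh of logical processes with named dimensions.
mesh = auto.ProcessMesh([[0, 1], [2, 3]], dim_names=["x", "y"])
# Along "x" the process groups are {0, 2} and {1, 3};
# along "y" they are {0, 1} and {2, 3}.
```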
@@ -35,7 +35,7 @@ class BaseConfig:
for field, default_value in config.items():
setattr(self, field, default_value)
- # Overide attributes by the config_dict
+ # Override attributes by the config_dict
if self._config_dict:
self.from_dict(self._config_dict)
@@ -128,7 +128,7 @@ class FusedPassesConfig(BaseConfig):
class Strategy(BaseConfig):
"""
- The `Strategy` object is used to configure the paralleization and optimization beheviors.
+ The `Strategy` object is used to configure the parallelization and optimization behaviors.
Args:
config (dict|string, optional): If this is None, the default configurations will used.
......
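A hedged sketch of configuring a `Strategy`, assuming the auto-parallel `Strategy` exposes per-feature config objects as attributes (the attribute names below are illustrative and may differ between versions):

```python
from paddle.distributed.fleet import auto  # assumed export path

strategy = auto.Strategy()        # start from the default configurations
strategy.amp.enable = True        # toggle one optimization behavior
strategy.recompute.enable = True  # toggle another
```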
@@ -23,7 +23,7 @@ from .trial import TrialStatus
class AlgorithmBase(ABC):
"""
- An Tuning alogrithm is a class to find out an optimal configuration
+ An Tuning algorithm is a class to find out an optimal configuration
given the selected tuning optimization pass(es) and the arguments to be tuned.
Different optimization pass(es) will correspond to a different algorithm,
where different search space **pruning rules** will applied.
@@ -71,7 +71,7 @@ class AlgorithmBase(ABC):
@abstractmethod
def update(self, results):
"""
- Update the algorthim with the results of last trial. Using this information is used to
+ Update the algorithm with the results of last trial. Using this information is used to
pruning the search space of the future trial.
"""
pass
@@ -227,7 +227,7 @@ class ReccomputeCheckpointAlgorithm(AlgorithmBase):
else:
self._trial_idx = self._total_num_trial
self._logger.info(
- "Recompute is unnecessary for this model size, which will reduce the Throughtput."
+ "Recompute is unnecessary for this model size, which will reduce the Throughput."
)
else:
if self._trail_left >= self._trail_right:
......
@@ -611,7 +611,7 @@ The best trial is: [{}], whose configuration is following:
def tune(self):
"""
- Performs the search for best hyperparameter configuations
+ Performs the search for best hyperparameter configurations
for the selected optimization pass(es).
"""
......
@@ -530,7 +530,7 @@ class ParallelTuner:
del self._concerned_dist_ops[op_id]
print(
- "Number of the concered dist ops",
+ "Number of the concerned dist ops",
len(self._concerned_dist_ops),
flush=True,
)
@@ -631,7 +631,7 @@ class ParallelTuner:
direction = directions[i].random(self._seed)
size = sizes[i].random(self._seed)
if direction:
- # Substract 1 from size to avoid the overlapping of new starts
+ # Subtract 1 from size to avoid the overlapping of new starts
new_start = start - (size - 1)
else:
new_start = start + size
@@ -788,7 +788,7 @@ class ParallelTuner:
dist_op.dist_attr.impl_idx = 0
def _check_fused_softmax_mask_upper_triangle(self, dist_op):
- """The last_but_one dim shoule be equal to last dim."""
+ """The last_but_one dim should be equal to last dim."""
input_name = dist_op.serial_op.input_arg_names[0]
input_dims_mapping = dist_op.dist_attr.get_input_dims_mapping(
input_name
@@ -996,7 +996,7 @@ class ParallelTuner:
self._dist_context.serial_main_program
)
- # Backup the intital parallel strategy
+ # Backup the initial parallel strategy
self._init_parallel_strategy[0] = copy.deepcopy(
self._dist_context._dist_tensors_for_program
)
......
@@ -73,7 +73,7 @@ def parse_args():
"--ctx_filename",
type=str,
required=True,
- help="the filename to the profile context file saved by optimizaiton tuner",
+ help="the filename to the profile context file saved by optimization tuner",
)
args = parser.parse_args()
......
@@ -81,7 +81,7 @@ class QKVPattern(BasePattern):
# Pattern
self.attrs["shard_spec"] = [
[(1, 2, 3), [[-1, 0], [-1, 1]]],
- ] # 2-tuple list such as [(tensor_id, shard_sepc)]
+ ] # 2-tuple list such as [(tensor_id, shard_spec)]
def convert_to_graph(ops, block):
@@ -535,7 +535,7 @@ class ClusterPartitionUtil:
],
) -> list:
"""
- Partiton cluster into possible device meshes.
+ Partition cluster into possible device meshes.
Args:
n (int): The number of nodes.
......
@@ -147,7 +147,7 @@ class OptimizationTunerTrial(Trial):
draws = border + "\n"
draws += h1_format.format("")
- draws += h1_format.format("Tuned Configuartions Overview")
+ draws += h1_format.format("Tuned Configurations Overview")
draws += h1_format.format("")
for name in self._changed_configs:
......
@@ -26,7 +26,7 @@ class TunableSpace:
def __init__(self):
# Tunable variables for this tunable variables
self._variables = {}
- # Specific values coresponding to each tunable variable
+ # Specific values corresponding to each tunable variable
self._values = {}
@property
......
@@ -273,7 +273,7 @@ def _get_comm_group(processes, shape, axis, rank):
Given a rank and the processes mesh the rank belongs to,
compute the communication peers of the rank based on the give axis in the mesh.
- Example: 16 processes managed in a 4-Dimensinal mesh with shape of [2, 2, 2, 2].
+ Example: 16 processes managed in a 4-Dimensional mesh with shape of [2, 2, 2, 2].
the rank communication peers of rank 0 (included) are following:
in axis 0: [0, 1]
in axis 1: [0, 2]
@@ -347,7 +347,7 @@ def _coordinate2linear_idx(mesh_shape, coordinate):
# that the processes in mesh are
# 1. starts from 0
# 2. continuous
- # it will be wrong if ths above condition doesnot meet,
+ # it will be wrong if ths above condition does not meet,
# e.g. process_mesh = { process_groups = [7, 8, 9,10, 12, 13, 14, 15], mesh = [2, 4]}
# if you want a more general mapping, you should use cartesian product
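The peer groups listed in the `_get_comm_group` docstring above can be reproduced with a small standalone sketch. This is not the library function; it only mirrors the docstring's convention, where axis 0 is the fastest-varying axis of the linearized mesh:

```python
import numpy as np

def comm_peers(processes, shape, axis, rank):
    mesh = np.array(processes).reshape(shape)
    coord = list(np.argwhere(mesh == rank)[0])
    # Axis 0 in the docstring counts from the fastest-varying (last) numpy dim.
    coord[len(shape) - 1 - axis] = slice(None)
    return mesh[tuple(coord)].tolist()

ranks = list(range(16))
for axis in range(4):
    print(axis, comm_peers(ranks, [2, 2, 2, 2], axis, 0))
# 0 [0, 1]
# 1 [0, 2]
# 2 [0, 4]
# 3 [0, 8]
```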
@@ -594,7 +594,7 @@ def save_distributed_checkpoint(
dist_context=None,
):
"""
- Save model parameter state, optimzer state, distributed attribute and
+ Save model parameter state, optimizer state, distributed attribute and
additional information of each rank.
Args:
......