- en: Generic Join Context Manager
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 通用加入上下文管理器
- en: 原文:[https://pytorch.org/docs/stable/distributed.algorithms.join.html](https://pytorch.org/docs/stable/distributed.algorithms.join.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/docs/stable/distributed.algorithms.join.html](https://pytorch.org/docs/stable/distributed.algorithms.join.html)
- en: 'The generic join context manager facilitates distributed training on uneven
inputs. This page outlines the API of the relevant classes: `Join`, `Joinable`,
and `JoinHook`. For a tutorial, see [Distributed Training with Uneven Inputs Using
the Join Context Manager](https://pytorch.org/tutorials/advanced/generic_join.html).'
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 通用加入上下文管理器促进了不均匀输入的分布式训练。本页概述了相关类的API:`Join`、`Joinable`和`JoinHook`。有关教程,请参阅[使用加入上下文管理器进行不均匀输入的分布式训练](https://pytorch.org/tutorials/advanced/generic_join.html)。
- en: '[PRE0]'
id: totrans-3
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: This class defines the generic join context manager, which allows custom hooks
to be called after a process joins.
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: 此类定义了通用加入上下文管理器,允许在进程加入后调用自定义钩子。
- en: These hooks should shadow the collective communications of non-joined processes
to prevent hanging and erroring and to ensure algorithmic correctness. Refer to
[`JoinHook`](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.JoinHook")
for details about the hook definition.
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 这些钩子应该遮蔽未加入进程的集体通信,以防止挂起和出错,并确保算法的正确性。有关钩子定义的详细信息,请参阅[`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook")。
- en: Warning
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: 警告
- en: The context manager requires each participating [`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable") to call the method [`notify_join_context()`](#torch.distributed.algorithms.Join.notify_join_context
"torch.distributed.algorithms.Join.notify_join_context") before its own per-iteration
collective communications to ensure correctness.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 上下文管理器要求每个参与的[`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")在自己的每次迭代集体通信之前调用方法[`notify_join_context()`](#torch.distributed.algorithms.Join.notify_join_context
"torch.distributed.algorithms.Join.notify_join_context")以确保正确性。
- en: Warning
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: 警告
- en: The context manager requires that all `process_group` attributes in the [`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook") objects are the same. If there are multiple
[`JoinHook`](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.JoinHook")
objects, then the device of the first is used. The process group and device
information is used for checking for non-joined processes and for notifying processes
to throw an exception if `throw_on_early_termination` is enabled, both of which
use an all-reduce.
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: 上下文管理器要求[`JoinHook`](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.JoinHook")对象中的所有`process_group`属性都相同。如果有多个[`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook")对象,则使用第一个的`device`。进程组和设备信息用于检查未加入的进程,并通知进程在启用`throw_on_early_termination`时抛出异常,两者都使用全局归约。
- en: Parameters
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 参数
- en: '**joinables** (*List**[*[*Joinable*](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable")*]*) a list of the participating [`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable") s; their hooks are iterated over in the
given order.'
id: totrans-11
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**joinables**(*List**[*[*Joinable*](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable")*]*) - 参与的[`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable")对象的列表;它们的钩子按给定顺序迭代。'
- en: '**enable** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")) a flag enabling uneven input detection; setting to `False`
disables the context manager’s functionality and should only be set when the user
knows the inputs will not be uneven (default: `True`).'
id: totrans-12
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**enable**([*bool*](https://docs.python.org/3/library/functions.html#bool "(in
Python v3.12)")) - 一个标志,用于启用不均匀输入检测;设置为`False`会禁用上下文管理器的功能,只有在用户知道输入不会不均匀时才应设置(默认值:`True`)。'
- en: '**throw_on_early_termination** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")) a flag controlling whether to throw an exception upon
detecting uneven inputs (default: `False`).'
id: totrans-13
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**throw_on_early_termination**([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")) - 一个控制是否在检测到不均匀输入时抛出异常的标志(默认值:`False`)。'
- en: 'Example:'
id: totrans-14
prefs: []
type: TYPE_NORMAL
zh: 示例:
- en: '[PRE1]'
id: totrans-15
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
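The elided block above ([PRE1]) is the upstream example; as a supplementary illustration, here is a minimal sketch of the same usage pattern, assuming two processes, the gloo backend, and a CPU `nn.Linear` wrapped in `DistributedDataParallel` (these setup details are assumptions, not taken from this page):

```python
# Minimal sketch: Join with DistributedDataParallel on uneven inputs.
# Assumes 2 processes and the gloo backend.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP

WORLD_SIZE = 2


def worker(rank):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=WORLD_SIZE)

    model = DDP(torch.nn.Linear(1, 1))
    # Rank 1 receives one more input than rank 0, so the inputs are uneven.
    inputs = [torch.tensor([1.0]) for _ in range(5 + rank)]

    # DDP is a Joinable, so it can be passed directly to the context manager.
    with Join([model]):
        for inp in inputs:
            loss = model(inp).sum()
            loss.backward()

    # All ranks reach this point without hanging on the missing iteration.
    print(f"rank {rank} done after {len(inputs)} inputs")


if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE, join=True)
```

Because rank 1 has one extra input, rank 0 joins first; the context manager shadows rank 1's final collectives so both ranks exit the loop cleanly.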
- en: '[PRE2]'
id: totrans-16
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: Notifies the join context manager that the calling process has not yet joined.
id: totrans-17
prefs: []
type: TYPE_NORMAL
zh: 通知加入上下文管理器,调用进程尚未加入。
- en: Then, if `throw_on_early_termination=True`, checks if uneven inputs have been
detected (i.e. if one process has already joined) and throws an exception if so.
id: totrans-18
prefs: []
type: TYPE_NORMAL
zh: 然后,如果`throw_on_early_termination=True`,则检查是否检测到不均匀的输入(即如果一个进程已经加入),如果是,则抛出异常。
- en: This method should be called from a [`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable") object before its per-iteration collective
communications. For example, this should be called at the beginning of the forward
pass in `DistributedDataParallel`.
id: totrans-19
prefs: []
type: TYPE_NORMAL
zh: 此方法应该在[`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")对象的每次迭代集体通信之前调用。例如,在`DistributedDataParallel`的前向传递开始时应调用此方法。
- en: Only the first [`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")
object passed into the context manager performs the collective communications
in this method, and for the others, this method is vacuous.
id: totrans-20
prefs: []
type: TYPE_NORMAL
zh: 只有第一个传递到上下文管理器的[`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")对象在此方法中执行集体通信,对于其他对象,此方法为空。
- en: Parameters
id: totrans-21
prefs: []
type: TYPE_NORMAL
zh: 参数
- en: '**joinable** ([*Joinable*](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable"))
the [`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")
object calling this method.'
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: '**joinable**([*Joinable*](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable"))
- 调用此方法的[`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")对象。'
- en: Returns
id: totrans-23
prefs: []
type: TYPE_NORMAL
zh: 返回
- en: An async work handle for the all-reduce meant to notify the context manager
that the process has not yet joined if `joinable` is the first one passed into
the context manager; `None` otherwise.
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: 一个用于全局归约的异步工作句柄,用于通知上下文管理器进程尚未加入,如果`joinable`是传递到上下文管理器的第一个;否则为`None`。
- en: '[PRE3]'
id: totrans-25
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: This defines an abstract base class for joinable classes.
id: totrans-26
prefs: []
type: TYPE_NORMAL
zh: 这为可加入类定义了一个抽象基类。
- en: A joinable class (inheriting from [`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable")) should implement [`join_hook()`](#torch.distributed.algorithms.Joinable.join_hook
"torch.distributed.algorithms.Joinable.join_hook"), which returns a [`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook") instance, in addition to [`join_device()`](#torch.distributed.algorithms.Joinable.join_device
"torch.distributed.algorithms.Joinable.join_device") and [`join_process_group()`](#torch.distributed.algorithms.Joinable.join_process_group
"torch.distributed.algorithms.Joinable.join_process_group") that return device
and process group information, respectively.
id: totrans-27
prefs: []
type: TYPE_NORMAL
zh: 一个可加入的类(从[`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")继承)应该实现[`join_hook()`](#torch.distributed.algorithms.Joinable.join_hook
"torch.distributed.algorithms.Joinable.join_hook"),它返回一个[`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook")实例,另外还应该实现[`join_device()`](#torch.distributed.algorithms.Joinable.join_device
"torch.distributed.algorithms.Joinable.join_device")和[`join_process_group()`](#torch.distributed.algorithms.Joinable.join_process_group
"torch.distributed.algorithms.Joinable.join_process_group")来分别返回设备和进程组信息。
- en: '[PRE4]'
id: totrans-28
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: Return the device from which to perform collective communications needed by
the join context manager.
id: totrans-29
prefs: []
type: TYPE_NORMAL
zh: 返回执行加入上下文管理器所需的集体通信的设备。
- en: '[PRE5]'
id: totrans-30
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
- en: Return a [`JoinHook`](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.JoinHook")
instance for the given [`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable").
id: totrans-31
prefs: []
type: TYPE_NORMAL
zh: 为给定的[`Joinable`](#torch.distributed.algorithms.Joinable "torch.distributed.algorithms.Joinable")返回一个[`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook")实例。
- en: Parameters
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: 参数
- en: '**kwargs** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in
Python v3.12)")) a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict
"(in Python v3.12)") containing any keyword arguments to modify the behavior of
the join hook at run time; all [`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable") instances sharing the same join context
manager are forwarded the same value for `kwargs`.'
id: totrans-33
prefs: []
type: TYPE_NORMAL
zh: '**kwargs**([*dict*](https://docs.python.org/3/library/stdtypes.html#dict "(in
Python v3.12)")) - 包含任何关键字参数以在运行时修改加入钩子行为的[`dict`](https://docs.python.org/3/library/stdtypes.html#dict
"(in Python v3.12)");所有共享相同加入上下文管理器的[`Joinable`](#torch.distributed.algorithms.Joinable
"torch.distributed.algorithms.Joinable")实例将被转发相同的`kwargs`值。'
- en: Return type
id: totrans-34
prefs: []
type: TYPE_NORMAL
zh: 返回类型
- en: '[*JoinHook*](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.join.JoinHook")'
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: '[*JoinHook*](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.join.JoinHook")'
- en: '[PRE6]'
id: totrans-36
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: Returns the process group for the collective communications needed by the join
context manager itself.
id: totrans-37
prefs: []
type: TYPE_NORMAL
zh: 返回加入上下文管理器本身所需的集体通信的进程组。
- en: '[PRE7]'
id: totrans-38
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
- en: This defines a join hook, which provides two entry points in the join context
manager.
id: totrans-39
prefs: []
type: TYPE_NORMAL
zh: 这定义了一个加入钩子,在加入上下文管理器中提供了两个入口点。
- en: 'Entry points: a main hook, which is called repeatedly while there exists a
non-joined process, and a post-hook, which is called once all processes have joined.'
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: 入口点:一个主要的钩子,当存在一个未加入的进程时会被重复调用,以及一个后置钩子,当所有进程都已加入时会被调用一次。
- en: To implement a join hook for the generic join context manager, define a class
that inherits from [`JoinHook`](#torch.distributed.algorithms.JoinHook "torch.distributed.algorithms.JoinHook")
and override `main_hook()` and `post_hook()` as appropriate.
id: totrans-41
prefs: []
type: TYPE_NORMAL
zh: 要为通用加入上下文管理器实现一个加入钩子,需要定义一个从[`JoinHook`](#torch.distributed.algorithms.JoinHook
"torch.distributed.algorithms.JoinHook")继承的类,并适当地重写`main_hook()`和`post_hook()`。
- en: '[PRE8]'
id: totrans-42
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
- en: Call this hook while there exists a non-joined process to shadow collective
communications in a training iteration.
id: totrans-43
prefs: []
type: TYPE_NORMAL
zh: 在训练迭代中,当存在一个未加入的进程时调用此钩子以隐藏集体通信。
- en: Here, a training iteration means one forward pass, backward pass, and optimizer
step.
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: 这里的训练迭代指一次前向传播、反向传播和优化器步骤。
- en: '[PRE9]'
id: totrans-45
prefs: []
type: TYPE_PRE
zh: '[PRE9]'
- en: Call hook after all processes have joined.
id: totrans-46
prefs: []
type: TYPE_NORMAL
zh: 在所有进程都已加入后调用钩子。
- en: It is passed an additional `bool` argument `is_last_joiner`, which indicates
if the rank is one of the last to join.
id: totrans-47
prefs: []
type: TYPE_NORMAL
zh: 它接受一个额外的`bool`参数`is_last_joiner`,指示该排名是否是最后加入的之一。
- en: Parameters
id: totrans-48
prefs: []
type: TYPE_NORMAL
zh: 参数
- en: '**is_last_joiner** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")) `True` if the rank is one of the last to join; `False`
otherwise.'
id: totrans-49
prefs: []
type: TYPE_NORMAL
zh: '**is_last_joiner**([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")) - 如果排名是最后加入的之一,则为`True`;否则为`False`。'
- en: Torch Distributed Elastic
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: Torch分布式弹性
- en: 原文:[https://pytorch.org/docs/stable/distributed.elastic.html](https://pytorch.org/docs/stable/distributed.elastic.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/docs/stable/distributed.elastic.html](https://pytorch.org/docs/stable/distributed.elastic.html)
- en: Makes distributed PyTorch fault-tolerant and elastic.
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 使分布式PyTorch具有容错性和弹性。
- en: Get Started
id: totrans-3
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 入门
- en: Usage
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: 用法
- en: '[Quickstart](elastic/quickstart.html)'
id: totrans-5
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[快速入门](elastic/quickstart.html)'
- en: '[Train script](elastic/train_script.html)'
id: totrans-6
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[训练脚本](elastic/train_script.html)'
- en: '[Examples](elastic/examples.html)'
id: totrans-7
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[示例](elastic/examples.html)'
- en: Documentation
id: totrans-8
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 文档
- en: API
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: API
- en: '[torchrun (Elastic Launch)](elastic/run.html)'
id: totrans-10
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[torchrun(弹性启动)](elastic/run.html)'
- en: '[Elastic Agent](elastic/agent.html)'
id: totrans-11
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[弹性代理](elastic/agent.html)'
- en: '[Multiprocessing](elastic/multiprocessing.html)'
id: totrans-12
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[多进程](elastic/multiprocessing.html)'
- en: '[Error Propagation](elastic/errors.html)'
id: totrans-13
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[错误传播](elastic/errors.html)'
- en: '[Rendezvous](elastic/rendezvous.html)'
id: totrans-14
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[会合](elastic/rendezvous.html)'
- en: '[Expiration Timers](elastic/timer.html)'
id: totrans-15
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[过期计时器](elastic/timer.html)'
- en: '[Metrics](elastic/metrics.html)'
id: totrans-16
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[指标](elastic/metrics.html)'
- en: '[Events](elastic/events.html)'
id: totrans-17
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[事件](elastic/events.html)'
- en: Advanced
id: totrans-18
prefs: []
type: TYPE_NORMAL
zh: 高级
- en: '[Customization](elastic/customization.html)'
id: totrans-19
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[定制](elastic/customization.html)'
- en: Plugins
id: totrans-20
prefs: []
type: TYPE_NORMAL
zh: 插件
- en: '[TorchElastic Kubernetes](elastic/kubernetes.html)'
id: totrans-21
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[TorchElastic Kubernetes](elastic/kubernetes.html)'
- en: Tensor Parallelism - torch.distributed.tensor.parallel
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
- en: 原文:[https://pytorch.org/docs/stable/distributed.tensor.parallel.html](https://pytorch.org/docs/stable/distributed.tensor.parallel.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
- en: 'Tensor Parallelism (TP) is built on top of the PyTorch DistributedTensor ([DTensor](https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md))
and provides different parallelism styles: Colwise and Rowwise Parallelism.'
id: totrans-2
prefs: []
type: TYPE_NORMAL
- en: Warning
id: totrans-3
prefs: []
type: TYPE_NORMAL
- en: Tensor Parallelism APIs are experimental and subject to change.
id: totrans-4
prefs: []
type: TYPE_NORMAL
- en: 'The entrypoint to parallelize your `nn.Module` using Tensor Parallelism is:'
id: totrans-5
prefs: []
type: TYPE_NORMAL
- en: '[PRE0]'
id: totrans-6
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: Apply Tensor Parallelism in PyTorch by parallelizing modules or sub-modules
based on a user-specified plan.
id: totrans-7
prefs: []
type: TYPE_NORMAL
- en: We parallelize the module or sub_modules based on a parallelize_plan. The parallelize_plan
contains `ParallelStyle`, which indicates how the user wants the module or sub_module
to be parallelized.
id: totrans-8
prefs: []
type: TYPE_NORMAL
- en: Users can also specify a different parallel style per module fully qualified
name (FQN).
id: totrans-9
prefs: []
type: TYPE_NORMAL
- en: Note that `parallelize_module` only accepts a 1-D `DeviceMesh`. If you have
a 2-D or N-D `DeviceMesh`, slice the DeviceMesh to a 1-D sub-DeviceMesh first and
then pass it to this API (i.e. `device_mesh["tp"]`).
id: totrans-10
prefs: []
type: TYPE_NORMAL
- en: Parameters
id: totrans-11
prefs: []
type: TYPE_NORMAL
- en: '**module** (`nn.Module`) Module to be parallelized.'
id: totrans-12
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: '**device_mesh** (`DeviceMesh`) Object which describes the mesh topology of
devices for the DTensor.'
id: totrans-13
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: '**parallelize_plan** (*Union**[**ParallelStyle**,* *Dict**[**str**,* *ParallelStyle**]**]*)
The plan used to parallelize the module. It can be either a `ParallelStyle` object
which contains how we prepare input/output for Tensor Parallelism or it can be
a dict of module FQN and its corresponding `ParallelStyle` object.'
id: totrans-14
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: '**tp_mesh_dim** ([*int*](https://docs.python.org/3/library/functions.html#int
"(in Python v3.12)")*,* *deprecated*) The dimension of `device_mesh` where we
perform Tensor Parallelism on, this field is deprecated and will be removed in
future. If you have a 2-D or N-D `DeviceMesh`, consider passing in device_mesh[“tp”]'
id: totrans-15
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: Returns
id: totrans-16
prefs: []
type: TYPE_NORMAL
- en: A `nn.Module` object parallelized.
id: totrans-17
prefs: []
type: TYPE_NORMAL
- en: Return type
id: totrans-18
prefs: []
type: TYPE_NORMAL
- en: '[*Module*](generated/torch.nn.Module.html#torch.nn.Module "torch.nn.modules.module.Module")'
id: totrans-19
prefs: []
type: TYPE_NORMAL
- en: 'Example::'
id: totrans-20
prefs: []
type: TYPE_NORMAL
- en: '[PRE1]'
id: totrans-21
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
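The official example is the elided block above ([PRE1]); as a supplementary, hedged sketch, the snippet below applies `parallelize_module` to a made-up two-layer MLP, keying the plan by submodule FQN (`net1` and `net2` are invented names) and assuming an 8-GPU job launched with `torchrun`:

```python
# Hedged sketch: parallelize a made-up MLP with a per-FQN plan.
# Assumes an 8-GPU job (e.g. torchrun --nproc_per_node=8).
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyMLP(nn.Module):  # invented module; use your own model's FQNs
    def __init__(self):
        super().__init__()
        self.net1 = nn.Linear(16, 32)
        self.net2 = nn.Linear(32, 16)

    def forward(self, x):
        return self.net2(self.net1(x).relu())


# 1-D mesh over 8 GPUs. For a 2-D/N-D mesh, slice out the TP dimension first
# (e.g. mesh_2d["tp"]) before passing it to parallelize_module.
tp_mesh = init_device_mesh("cuda", (8,))

model = parallelize_module(
    ToyMLP().cuda(),
    tp_mesh,
    # Plan keyed by submodule FQN: a column-wise up-projection paired with a
    # row-wise down-projection, as recommended for MLP/Attention blocks.
    {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
)
```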
- en: Note
id: totrans-22
prefs: []
type: TYPE_NORMAL
- en: For complex module architecture like Attention, MLP layers, we recommend composing
different ParallelStyles together (i.e. `ColwiseParallel` and `RowwiseParallel`)
and passing them as a parallelize_plan to achieve the desired sharding computation.
id: totrans-23
prefs: []
type: TYPE_NORMAL
- en: 'Tensor Parallelism supports the following parallel styles:'
id: totrans-24
prefs: []
type: TYPE_NORMAL
- en: '[PRE2]'
id: totrans-25
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: Partition a compatible nn.Module in a column-wise fashion. Currently supports nn.Linear
and nn.Embedding. Users can compose it together with RowwiseParallel to achieve
the sharding of more complicated modules. (i.e. MLP, Attention)
id: totrans-26
prefs: []
type: TYPE_NORMAL
- en: Keyword Arguments
id: totrans-27
prefs: []
type: TYPE_NORMAL
- en: '**input_layouts** (*Placement**,* *optional*) The DTensor layout of input
tensor for the nn.Module, this is used to annotate the input tensor to become
a DTensor. If not specified, we assume the input tensor to be replicated.'
id: totrans-28
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: '**output_layouts** (*Placement**,* *optional*) The DTensor layout of the
output for the nn.Module, this is used to ensure the output of the nn.Module with
the user desired layout. If not specified, the output tensor is sharded on the
last dimension.'
id: totrans-29
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) Whether to use local [`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor") instead of `DTensor` for the module output, default: True.'
id: totrans-30
prefs:
- PREF_UL
type: TYPE_NORMAL
- en: Returns
id: totrans-31
prefs: []
type: TYPE_NORMAL
- en: A `ParallelStyle` object that represents Colwise sharding of the nn.Module.
id: totrans-32
prefs: []
type: TYPE_NORMAL
- en: 'Example::'
id: totrans-33
prefs: []
type: TYPE_NORMAL
- en: '[PRE3]'
id: totrans-34
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: Note
id: totrans-35
prefs: []
type: TYPE_NORMAL
- en: By default `ColwiseParallel` output is sharded on the last dimension if the
`output_layouts` is not specified; if there are operators that require a specific
tensor shape (i.e. before the paired `RowwiseParallel`), keep in mind that if the
output is sharded the operator might need to be adjusted to the sharded size.
id: totrans-36
prefs: []
type: TYPE_NORMAL
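A short, hypothetical sketch of the adjustment this note suggests: if the operator following the column-wise linear needs the full tensor rather than a last-dimension shard, `output_layouts` can be overridden so the output comes back replicated. The module `Block` and its submodule name `w1` are invented, and the placement import path below is the one used around PyTorch 2.x:

```python
# Hedged sketch: make ColwiseParallel return a replicated output instead of the
# default last-dimension shard.
import torch.nn as nn
from torch.distributed._tensor import Replicate  # placement types, 2.x location
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import ColwiseParallel, parallelize_module


class Block(nn.Module):  # invented module with one Linear named "w1"
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(16, 32)

    def forward(self, x):
        return self.w1(x)


tp_mesh = init_device_mesh("cuda", (8,))
model = parallelize_module(
    Block().cuda(),
    tp_mesh,
    # Replicate() overrides the default last-dim sharded output, so the operator
    # that follows sees the full tensor rather than a shard.
    {"w1": ColwiseParallel(output_layouts=Replicate())},
)
```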
- en: '[PRE4]'
id: totrans-37
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: Partition a compatible nn.Module in a row-wise fashion. Currently supports nn.Linear
only. Users can compose it with ColwiseParallel to achieve the sharding of more
complicated modules. (i.e. MLP, Attention)
id: totrans-38
prefs: []
type: TYPE_NORMAL
zh: 将兼容的nn.Module按行划分。目前仅支持nn.Linear。用户可以将其与ColwiseParallel组合,以实现更复杂模块的分片(即MLP,Attention)
- en: Keyword Arguments
id: totrans-39
prefs: []
type: TYPE_NORMAL
zh: 关键字参数
- en: '**input_layouts** (*Placement**,* *optional*) The DTensor layout of input
tensor for the nn.Module, this is used to annotate the input tensor to become
a DTensor. If not specified, we assume the input tensor to be sharded on the last
dimension.'
id: totrans-40
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**input_layouts** (*Placement**,* *optional*) nn.Module的输入张量的DTensor布局,用于注释输入张量以成为DTensor。如果未指定,我们假定输入张量在最后一个维度上被分片。'
- en: '**output_layouts** (*Placement**,* *optional*) The DTensor layout of the
output for the nn.Module, this is used to ensure the output of the nn.Module with
the user desired layout. If not specified, the output tensor is replicated.'
id: totrans-41
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**output_layouts** (*Placement**,* *optional*) nn.Module输出的DTensor布局,用于确保nn.Module的输出具有用户期望的布局。如果未指定,则输出张量将被复制。'
- en: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) Whether to use local [`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor") instead of `DTensor` for the module output, default: True.'
id: totrans-42
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) 是否使用本地[`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor")而不是`DTensor`作为模块输出,默认值为True。'
- en: Returns
id: totrans-43
prefs: []
type: TYPE_NORMAL
zh: 返回
- en: A `ParallelStyle` object that represents Rowwise sharding of the nn.Module.
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: 代表nn.Module的Rowwise分片的`ParallelStyle`对象。
- en: 'Example::'
id: totrans-45
prefs: []
type: TYPE_NORMAL
zh: '示例::'
- en: '[PRE5]'
id: totrans-46
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
- en: 'To simply configure the nn.Module’s inputs and outputs with DTensor layouts
and perform necessary layout redistributions, without distributing the module parameters
to DTensors, the following classes can be used in the `parallelize_plan` of `parallelize_module`:'
id: totrans-47
prefs: []
type: TYPE_NORMAL
zh: 要简单配置nn.Module的输入和输出以及执行必要的布局重分配,而不将模块参数分发到DTensors,可以在`parallelize_module`的`parallelize_plan`中使用以下类:
- en: '[PRE6]'
id: totrans-48
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: Configure the nn.Module’s inputs to convert the input tensors of the nn.Module
to DTensors at runtime according to `input_layouts`, and perform layout redistribution
according to the `desired_input_layouts`.
id: totrans-49
prefs: []
type: TYPE_NORMAL
zh: 根据`input_layouts`配置nn.Module的输入,根据`desired_input_layouts`执行布局重分配,将nn.Module的输入张量转换为DTensors。
- en: Keyword Arguments
id: totrans-50
prefs: []
type: TYPE_NORMAL
zh: 关键字参数
- en: '**input_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
The DTensor layouts of input tensors for the nn.Module, this is used to convert
the input tensors to DTensors. If some inputs are not torch.Tensor or no need
to convert to DTensors, `None` need to be specified as a placeholder.'
id: totrans-51
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**input_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
nn.Module的输入张量的DTensor布局,用于将输入张量转换为DTensors。如果某些输入不是torch.Tensor或不需要转换为DTensors,则需要指定`None`作为占位符。'
- en: '**desired_input_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
The desired DTensor layout of input tensors for the nn.Module, this is used
to ensure the inputs of the nn.Module have the desired DTensor layouts. This argument
needs to have the same length with `input_layouts`.'
id: totrans-52
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**desired_input_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
nn.Module输入张量的期望DTensor布局,用于确保nn.Module的输入具有期望的DTensor布局。此参数需要与`input_layouts`具有相同的长度。'
- en: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) Whether to use local [`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor") instead of `DTensor` for the module inputs, default: False.'
id: totrans-53
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) 是否使用本地[`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor")而不是`DTensor`作为模块输入,默认值为False。'
- en: Returns
id: totrans-54
prefs: []
type: TYPE_NORMAL
zh: 返回
- en: A `ParallelStyle` object that prepares the sharding layouts of the nn.Module’s
inputs.
id: totrans-55
prefs: []
type: TYPE_NORMAL
zh: 准备nn.Module输入的分片布局的`ParallelStyle`对象。
- en: 'Example::'
id: totrans-56
prefs: []
type: TYPE_NORMAL
zh: '示例::'
- en: '[PRE7]'
id: totrans-57
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
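Beyond the elided example ([PRE7]), here is a small hedged sketch of the redistribution described above: the submodule's input arrives sharded on dim 0, is converted to a DTensor per `input_layouts`, and is redistributed to replicate before the module runs. The module `Block`, its submodule name `attn`, and the chosen layouts are all illustrative assumptions:

```python
# Hedged sketch: redistribute a submodule's input with PrepareModuleInput.
import torch.nn as nn
from torch.distributed._tensor import Replicate, Shard  # placement types, 2.x location
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import PrepareModuleInput, parallelize_module


class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(16, 16)  # stand-in for a real attention submodule

    def forward(self, x):
        return self.attn(x)


tp_mesh = init_device_mesh("cuda", (8,))
block = parallelize_module(
    Block().cuda(),
    tp_mesh,
    {
        # The input to "attn" arrives sharded on dim 0, is annotated as such,
        # and is redistributed to Replicate before the module runs. Parameter
        # sharding would still come from separate ColwiseParallel/RowwiseParallel
        # entries in the same plan.
        "attn": PrepareModuleInput(
            input_layouts=Shard(0),
            desired_input_layouts=Replicate(),
        ),
    },
)
```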
- en: '[PRE8]'
id: totrans-58
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
- en: Configure the nn.Module’s outputs to convert the output tensors of the nn.Module
to DTensors at runtime according to `output_layouts`, and perform layout redistribution
according to the `desired_output_layouts`.
id: totrans-59
prefs: []
type: TYPE_NORMAL
zh: 根据`output_layouts`配置nn.Module的输出,根据`desired_output_layouts`执行布局重分配,将nn.Module的输出张量转换为DTensors。
- en: Keyword Arguments
id: totrans-60
prefs: []
type: TYPE_NORMAL
zh: 关键字参数
- en: '**output_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
The DTensor layouts of output tensors for the nn.Module, this is used to convert
the output tensors to DTensors if they are [`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor"). If some outputs are not torch.Tensor or no need to convert to
DTensors, `None` need to be specified as a placeholder.'
id: totrans-61
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**output_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
nn.Module输出张量的DTensor布局,用于将输出张量转换为DTensors(如果它们是[`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor"))。如果某些输出不是torch.Tensor或不需要转换为DTensors,则需要指定`None`作为占位符。'
- en: '**desired_output_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
The desired DTensor layouts of output tensors for the nn.Module, this is used
to ensure the outputs of the nn.Module have the desired DTensor layouts.'
id: totrans-62
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**desired_output_layouts** (*Union**[**Placement**,* *Tuple**[**Placement**]**]*)
nn.Module输出张量的期望DTensor布局,用于确保nn.Module的输出具有期望的DTensor布局。'
- en: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) Whether to use local [`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor") instead of `DTensor` for the module outputs, default: False.'
id: totrans-63
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '**use_local_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool
"(in Python v3.12)")*,* *optional*) 是否使用本地[`torch.Tensor`](tensors.html#torch.Tensor
"torch.Tensor")而不是`DTensor`作为模块输出,默认值为False。'
- en: Returns
id: totrans-64
prefs: []
type: TYPE_NORMAL
zh: 返回
- en: A ParallelStyle object that prepares the sharding layouts of the nn.Module’s
outputs.
id: totrans-65
prefs: []
type: TYPE_NORMAL
zh: 准备nn.Module输出的分片布局的`ParallelStyle`对象。
- en: 'Example::'
id: totrans-66
prefs: []
type: TYPE_NORMAL
zh: '示例::'
- en: '[PRE9]'
id: totrans-67
prefs: []
type: TYPE_PRE
zh: '[PRE9]'
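Analogously to the input case, here is a small hedged sketch of `PrepareModuleOutput`: the submodule's output is annotated as replicated and then redistributed to a dim-0 shard before being returned. The module `Block`, the submodule name `norm`, and the layouts are illustrative assumptions:

```python
# Hedged sketch: redistribute a submodule's output with PrepareModuleOutput.
import torch.nn as nn
from torch.distributed._tensor import Replicate, Shard  # placement types, 2.x location
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import PrepareModuleOutput, parallelize_module


class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.norm = nn.LayerNorm(16)

    def forward(self, x):
        return self.norm(x)


tp_mesh = init_device_mesh("cuda", (8,))
block = parallelize_module(
    Block().cuda(),
    tp_mesh,
    {
        # The torch.Tensor that "norm" returns is annotated as a replicated
        # DTensor and then redistributed to a dim-0 shard before being handed
        # to whatever consumes it next.
        "norm": PrepareModuleOutput(
            output_layouts=Replicate(),
            desired_output_layouts=Shard(0),
        ),
    },
)
```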
- en: For models like Transformer, we recommend users to use `ColwiseParallel` and
`RowwiseParallel` together in the parallelize_plan to achieve the desired sharding
for the entire model (i.e. Attention and MLP).
id: totrans-68
prefs: []
type: TYPE_NORMAL
zh: 对于Transformer等模型,我们建议用户在`parallelize_plan`中同时使用`ColwiseParallel`和`RowwiseParallel`来实现整个模型的期望分片(即Attention和MLP)。