-en:When the `visualize` argument is set to `True`, the [get_comm_comp_overlap](https://hta.readthedocs.io/en/latest/source/api/trace_analysis_api.html#hta.trace_analysis.TraceAnalysis.get_comm_comp_overlap)
function also generates a bar graph representing the overlap by rank.
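-en:As a rough illustration (the trace directory below is a hypothetical placeholder that must contain collected PyTorch profiler traces), the analysis can be invoked like this:

```python
# A minimal sketch of invoking the HTA overlap analysis; "traces/" is a
# placeholder directory of collected PyTorch profiler traces.
from hta.trace_analysis import TraceAnalysis

analyzer = TraceAnalysis(trace_dir="traces/")
# Returns a DataFrame of overlap percentages; with visualize=True it also
# renders the per-rank bar graph described above.
overlap_df = analyzer.get_comm_comp_overlap(visualize=True)
print(overlap_df)
```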
-en:If TensorBoard is launched inside VS Code ([Launch Guide](https://devblogs.microsoft.com/python/python-in-visual-studio-code-february-2021-release/#tensorboard-integration)),
clicking a call stack frame will navigate to the specific code line.
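-en:For context, here is a minimal sketch of collecting such a profile with `torch.profiler` (the model, input, and log directory are placeholders); `with_stack=True` records the Python call stacks whose frames can be clicked in VS Code:

```python
# A minimal sketch of collecting a profile that the TensorBoard plugin can
# display; model, input, and log directory are placeholders.
import torch
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

model = torch.nn.Linear(8, 8)
inputs = torch.randn(4, 8)

with profile(
    activities=[ProfilerActivity.CPU],
    on_trace_ready=tensorboard_trace_handler("./log/example"),  # hypothetical log dir
    with_stack=True,  # record call stacks so frames can be traced back to source lines
) as prof:
    model(inputs).sum().backward()
```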
-en:Let’s use Intel® VTune Profiler ITT to annotate the [TorchServe inference scope](https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py#L188)
so that we can profile at inference-level granularity. As the [TorchServe Architecture](https://github.com/pytorch/serve/blob/master/docs/internals.md#torchserve-architecture)
consists of several sub-components, including the Java frontend for handling requests and responses,
to launch distributed training if errors (e.g., out-of-memory) are expected or
if resources can join and leave dynamically during training.
id:totrans-18
prefs:
-PREF_OL
type:TYPE_NORMAL
-en:Note
id:totrans-19
prefs:[]
type:TYPE_NORMAL
-en:Data-parallel training also works with [Automatic Mixed Precision (AMP)](https://pytorch.org/docs/stable/notes/amp_examples.html#working-with-multiple-gpus).
data-parallel workers. Support for FSDP was added in PyTorch v1.11. The tutorial
[Getting Started with FSDP](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html)
provides an in-depth explanation and an example of how FSDP works.
id:totrans-32
prefs:[]
type:TYPE_NORMAL
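-en:As a rough sketch of basic FSDP usage (assuming the script is launched with one process per GPU, e.g. via `torchrun`, and using a placeholder model):

```python
# A minimal FSDP sketch; assumes one process per GPU launched by torchrun,
# and uses a placeholder single-layer model.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()
fsdp_model = FSDP(model)  # parameters, gradients, and optimizer states are sharded across ranks

optimizer = torch.optim.Adam(fsdp_model.parameters(), lr=1e-3)
loss = fsdp_model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()

dist.destroy_process_group()
```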
-en:torch.distributed.elastic
id:totrans-33
prefs:
-PREF_H3
type:TYPE_NORMAL
...
...
@@ -208,9 +255,11 @@
(mismatched `AllReduce` operations), which would then cause a crash or hang. [torch.distributed.elastic](https://pytorch.org/docs/stable/distributed.elastic.html)
adds fault tolerance and the ability to make use of a dynamic pool of machines.
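-en:For reference, a minimal sketch of a worker script together with an illustrative elastic launch command (node counts, port, and script name are placeholders):

```python
# A minimal sketch of a training entry point intended for the elastic launcher;
# the launch flags below are illustrative placeholders, e.g.:
#   torchrun --nnodes=1:4 --nproc_per_node=8 --max_restarts=3 \
#            --rdzv_backend=c10d --rdzv_endpoint=localhost:29400 train.py
import os
import torch.distributed as dist

def main():
    dist.init_process_group("gloo")             # placeholder backend; use "nccl" on GPUs
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])  # set by the elastic agent for each worker
    print(f"global rank {rank}, local rank {local_rank}")
    # ... training loop; failed workers are restarted by the elastic agent ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```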
-en:The [Getting Started with Distributed RPC Framework](../intermediate/rpc_tutorial.html)
tutorial first uses a simple Reinforcement Learning (RL) example to demonstrate
RPC and RRef. It then applies basic distributed model parallelism to an RNN
example to show how to use distributed autograd and the distributed optimizer
(a minimal sketch of the shared RPC primitives appears after this list).
id:totrans-43
prefs:
-PREF_OL
type:TYPE_NORMAL
-en:The [Implementing a Parameter Server Using Distributed RPC Framework](../intermediate/rpc_param_server_tutorial.html)
tutorial borrows the spirit of [HogWild! training](https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf)
and applies it to an asynchronous parameter server (PS) training application.
id:totrans-44
prefs:
-PREF_OL
type:TYPE_NORMAL
...
...
@@ -268,6 +326,7 @@
tutorial extends the single-machine pipeline parallel example (presented in [Single-Machine
Model Parallel Best Practices](../intermediate/model_parallel_tutorial.html))
to a distributed environment and shows how to implement it using RPC.
id:totrans-45
prefs:
-PREF_OL
type:TYPE_NORMAL
...
...
@@ -275,20 +334,24 @@
tutorial demonstrates how to implement RPC batch processing using the [@rpc.functions.async_execution](https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution)
decorator, which can help speed up inference and training. It uses RL and PS examples
similar to those in tutorials 1 and 2 above.
id:totrans-46
prefs:
-PREF_OL
type:TYPE_NORMAL
-en:The [Combining Distributed DataParallel with Distributed RPC Framework](../advanced/rpc_ddp_tutorial.html)
tutorial demonstrates how to combine DDP with RPC to train a model using distributed
data parallelism combined with distributed model parallelism.
id:totrans-47
prefs:
-PREF_OL
type:TYPE_NORMAL
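-en:As a minimal illustration of the RPC primitives the tutorials above build on (worker names, port, and the remote function are placeholders rather than code from the tutorials):

```python
# A minimal two-worker RPC sketch; worker names, port, and the remote function
# are placeholders, not taken from the tutorials above.
import os
import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def add_tensors(a, b):
    return a + b

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        # Synchronous call executed on worker1; rpc.remote() would return an RRef instead.
        result = rpc.rpc_sync("worker1", add_tensors,
                              args=(torch.ones(2), torch.ones(2)))
        print(result)
    rpc.shutdown()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2, join=True)
```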
-en:PyTorch Distributed Developers
id:totrans-48
prefs:
-PREF_H2
type:TYPE_NORMAL
-en:If you’d like to contribute to PyTorch Distributed, please refer to our [Developer