- en: DataLoader2 Tutorial¶
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
- en: 'Original: [https://pytorch.org/data/beta/dlv2_tutorial.html](https://pytorch.org/data/beta/dlv2_tutorial.html)'
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
- en: This tutorial shows how to create a `DataPipe` graph and load data via `DataLoader2`
    with different backend systems (`ReadingService`). A usage example can be found
    in [this colab notebook](https://colab.research.google.com/drive/1eSvp-eUDYPj0Sd0X_Mv9s9VkE8RNDg1u).
  prefs: []
  type: TYPE_NORMAL
- en: DataPipe[¶](#datapipe "Permalink to this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: 'Please refer to the [DataPipe Tutorial](dp_tutorial.html) for more details. Here
    are the most important caveats to make sure the data pipeline has a different order
    per epoch and that data shards are mutually exclusive and collectively exhaustive:'
  prefs: []
  type: TYPE_NORMAL
- en: Place `sharding_filter` or `sharding_round_robin_dispatch` as early as possible
    in the pipeline to avoid repeating expensive operations in worker/distributed
    processes.
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
- en: Add a `shuffle` DataPipe before sharding to achieve inter-shard shuffling. `ReadingService`
    will synchronize those `shuffle` operations to ensure the order of data is the
    same before sharding, so that all shards are mutually exclusive and collectively
    exhaustive.
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
- en: 'Here is an example of a `DataPipe` graph:'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE0]'
  prefs: []
  type: TYPE_PRE
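- en: 'As a minimal sketch (assuming the `torchdata` package; the `IterableWrapper(range(10))`
    source and the map/batch steps are illustrative, not the original snippet), a graph
    that follows both caveats above could look like:'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    from torchdata.datapipes.iter import IterableWrapper

    # Illustrative source; shuffle is placed before sharding_filter (inter-shard
    # shuffling), and sharding_filter comes before the map so the per-item work
    # is not repeated in every worker.
    datapipe = IterableWrapper(range(10))
    datapipe = datapipe.shuffle().sharding_filter()
    datapipe = datapipe.map(lambda x: x + 1).batch(2)
    ```
  prefs: []
  type: TYPE_PRE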
- en: Multiprocessing[¶](#multiprocessing "Permalink to this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: '`MultiProcessingReadingService` handles multiprocessing sharding at the point
    of `sharding_filter` and synchronizes the seeds across worker processes.'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE1]'
  prefs: []
  type: TYPE_PRE
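- en: 'A hedged sketch of the multiprocessing setup (the `times_two` function and the
    `range(10)` source are illustrative, not the original snippet):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    from torchdata.datapipes.iter import IterableWrapper
    from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

    def times_two(x):
        # Module-level function so it can be pickled to worker processes.
        return x * 2

    def load_all():
        datapipe = IterableWrapper(range(10)).shuffle().sharding_filter().map(times_two)
        rs = MultiProcessingReadingService(num_workers=2)
        dl = DataLoader2(datapipe, reading_service=rs)
        results = list(dl)  # sharding_filter splits the 10 items across the 2 workers
        dl.shutdown()
        return results

    if __name__ == "__main__":
        load_all()
    ```
  prefs: []
  type: TYPE_PRE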
- en: Distributed[¶](#distributed "Permalink to this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: '`DistributedReadingService` handles distributed sharding at the point of `sharding_filter`
    and synchronizes the seeds across distributed processes. In order to balance the
    data shards across distributed nodes, a `fullsync` `DataPipe` will be attached
    to the `DataPipe` graph to align the number of batches across distributed ranks.
    This prevents the hang caused by uneven shards in distributed training.'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE2]'
  prefs: []
  type: TYPE_PRE
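- en: 'A hedged sketch of a distributed run (shown as a single-rank `gloo` process
    group purely for illustration; a real job would launch one process per rank):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    import os
    import torch.distributed as dist
    from torchdata.datapipes.iter import IterableWrapper
    from torchdata.dataloader2 import DataLoader2, DistributedReadingService

    def load_all():
        # Single-rank process group purely for illustration.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("gloo", rank=0, world_size=1)

        datapipe = IterableWrapper(range(10)).shuffle().sharding_filter()
        dl = DataLoader2(datapipe, reading_service=DistributedReadingService())
        results = list(dl)  # fullsync is attached automatically by the reading service
        dl.shutdown()
        dist.destroy_process_group()
        return results

    if __name__ == "__main__":
        load_all()
    ```
  prefs: []
  type: TYPE_PRE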
- en: Multiprocessing + Distributed[¶](#multiprocessing-distributed "Permalink to
    this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: '`SequentialReadingService` can be used to combine both `ReadingServices` to
    achieve multiprocessing and distributed training at the same time.'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE3]'
  prefs: []
  type: TYPE_PRE
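- en: 'A hedged sketch of combining the two services (using an illustrative single-rank
    `gloo` process group and a two-worker pool, not the original snippet; the distributed
    `ReadingService` is passed first so distributed sharding is applied before per-rank
    multiprocessing):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    import os
    import torch.distributed as dist
    from torchdata.datapipes.iter import IterableWrapper
    from torchdata.dataloader2 import (
        DataLoader2,
        DistributedReadingService,
        MultiProcessingReadingService,
        SequentialReadingService,
    )

    def load_all():
        # Single-rank process group purely for illustration.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29501")
        dist.init_process_group("gloo", rank=0, world_size=1)

        datapipe = IterableWrapper(range(10)).shuffle().sharding_filter()
        # Distributed sharding is applied first, then multiprocessing within the rank.
        rs = SequentialReadingService(
            DistributedReadingService(),
            MultiProcessingReadingService(num_workers=2),
        )
        dl = DataLoader2(datapipe, reading_service=rs)
        results = list(dl)
        dl.shutdown()
        dist.destroy_process_group()
        return results

    if __name__ == "__main__":
        load_all()
    ```
  prefs: []
  type: TYPE_PRE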