- en: DataLoader2 Tutorial¶
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
- en: 'Original: [https://pytorch.org/data/beta/dlv2_tutorial.html](https://pytorch.org/data/beta/dlv2_tutorial.html)'
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
- en: This tutorial shows how to create a `DataPipe` graph and load data via `DataLoader2`
    with different backend systems (`ReadingService`). A usage example can be found
    in [this colab notebook](https://colab.research.google.com/drive/1eSvp-eUDYPj0Sd0X_Mv9s9VkE8RNDg1u).
  prefs: []
  type: TYPE_NORMAL
- en: DataPipe[¶](#datapipe "Permalink to this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: 'Please refer to the [DataPipe Tutorial](dp_tutorial.html) for more details. Here
    are the most important caveats to make sure the data pipeline has a different order
    per epoch and that data shards are mutually exclusive and collectively exhaustive:'
  prefs: []
  type: TYPE_NORMAL
- en: Place `sharding_filter` or `sharding_round_robin_dispatch` as early as possible
    in the pipeline to avoid repeating expensive operations in worker/distributed
    processes.
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
- en: Add a `shuffle` DataPipe before sharding to achieve inter-shard shuffling. `ReadingService`
    will synchronize those `shuffle` operations to ensure the order of data is the
    same before sharding, so that all shards are mutually exclusive and collectively
    exhaustive.
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
- en: 'Here is an example of a `DataPipe` graph:'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE0]'
  prefs: []
  type: TYPE_PRE
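- en: 'As a minimal sketch (assuming the `torchdata` package; the `IterableWrapper(range(10))`
    source and the map/batch steps are illustrative, not the original snippet), a graph
    that follows both caveats above could look like:'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    from torchdata.datapipes.iter import IterableWrapper

    # Illustrative source; shuffle is placed before sharding_filter (inter-shard
    # shuffling), and sharding_filter comes before the map so the per-item work
    # is not repeated in every worker.
    datapipe = IterableWrapper(range(10))
    datapipe = datapipe.shuffle().sharding_filter()
    datapipe = datapipe.map(lambda x: x + 1).batch(2)
    ```
  prefs: []
  type: TYPE_PRE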
- en: Multiprocessing[¶](#multiprocessing "Permalink to this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: '`MultiProcessingReadingService` handles multiprocessing sharding at the point
    of `sharding_filter` and synchronizes the seeds across worker processes.'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE1]'
  prefs: []
  type: TYPE_PRE
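- en: 'A hedged sketch of the multiprocessing setup (the `times_two` function and the
    `range(10)` source are illustrative, not the original snippet):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    from torchdata.datapipes.iter import IterableWrapper
    from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

    def times_two(x):
        # Module-level function so it can be pickled to worker processes.
        return x * 2

    def load_all():
        datapipe = IterableWrapper(range(10)).shuffle().sharding_filter().map(times_two)
        rs = MultiProcessingReadingService(num_workers=2)
        dl = DataLoader2(datapipe, reading_service=rs)
        results = list(dl)  # sharding_filter splits the 10 items across the 2 workers
        dl.shutdown()
        return results

    if __name__ == "__main__":
        load_all()
    ```
  prefs: []
  type: TYPE_PRE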
- en: Distributed[¶](#distributed "Permalink to this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: '`DistributedReadingService` handles distributed sharding at the point of `sharding_filter`
    and synchronizes the seeds across distributed processes. In order to balance the
    data shards across distributed nodes, a `fullsync` `DataPipe` will be attached
    to the `DataPipe` graph to align the number of batches across distributed ranks.
    This prevents the hang caused by uneven shards in distributed training.'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE2]'
  prefs: []
  type: TYPE_PRE
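- en: 'A hedged sketch of a distributed run (shown as a single-rank `gloo` process
    group purely for illustration; a real job would launch one process per rank):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    import os
    import torch.distributed as dist
    from torchdata.datapipes.iter import IterableWrapper
    from torchdata.dataloader2 import DataLoader2, DistributedReadingService

    def load_all():
        # Single-rank process group purely for illustration.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("gloo", rank=0, world_size=1)

        datapipe = IterableWrapper(range(10)).shuffle().sharding_filter()
        dl = DataLoader2(datapipe, reading_service=DistributedReadingService())
        results = list(dl)  # fullsync is attached automatically by the reading service
        dl.shutdown()
        dist.destroy_process_group()
        return results

    if __name__ == "__main__":
        load_all()
    ```
  prefs: []
  type: TYPE_PRE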
- en: Multiprocessing + Distributed[¶](#multiprocessing-distributed "Permalink to
    this heading")
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: '`SequentialReadingService` can be used to combine both `ReadingServices` to
    achieve multiprocessing and distributed training at the same time.'
  prefs: []
  type: TYPE_NORMAL
- en: '[PRE3]'
  prefs: []
  type: TYPE_PRE
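- en: 'A hedged sketch of combining the two services (using an illustrative single-rank
    `gloo` process group and a two-worker pool, not the original snippet; the distributed
    `ReadingService` is passed first so distributed sharding is applied before per-rank
    multiprocessing):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    ```python
    import os
    import torch.distributed as dist
    from torchdata.datapipes.iter import IterableWrapper
    from torchdata.dataloader2 import (
        DataLoader2,
        DistributedReadingService,
        MultiProcessingReadingService,
        SequentialReadingService,
    )

    def load_all():
        # Single-rank process group purely for illustration.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29501")
        dist.init_process_group("gloo", rank=0, world_size=1)

        datapipe = IterableWrapper(range(10)).shuffle().sharding_filter()
        # Distributed sharding is applied first, then multiprocessing within the rank.
        rs = SequentialReadingService(
            DistributedReadingService(),
            MultiProcessingReadingService(num_workers=2),
        )
        dl = DataLoader2(datapipe, reading_service=rs)
        results = list(dl)
        dl.shutdown()
        dist.destroy_process_group()
        return results

    if __name__ == "__main__":
        load_all()
    ```
  prefs: []
  type: TYPE_PRE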