- en: Map-style DataPipes id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL zh: 映射样式DataPipes - en: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.map.html](https://pytorch.org/data/beta/torchdata.datapipes.map.html) id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL zh: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.map.html](https://pytorch.org/data/beta/torchdata.datapipes.map.html) - en: A Map-style DataPipe is one that implements the `__getitem__()` and `__len__()` protocols, and represents a map from (possibly non-integral) indices/keys to data samples. This is a close equivalent of `Dataset` from the PyTorch core library. id: totrans-2 prefs: [] type: TYPE_NORMAL zh: 映射样式DataPipe是实现`__getitem__()`和`__len__()`协议的DataPipe,表示从(可能是非整数)索引/键到数据样本的映射。这与PyTorch核心库中的`Dataset`是相似的。 - en: For example, when accessed with `mapdatapipe[idx]`, could read the `idx`-th image and its corresponding label from a folder on the disk. id: totrans-3 prefs: [] type: TYPE_NORMAL zh: 例如,当使用`mapdatapipe[idx]`访问时,可以从磁盘上的文件夹中读取第`idx`个图像及其对应的标签。 - en: '[PRE0]' id: totrans-4 prefs: [] type: TYPE_PRE zh: '[PRE0]' - en: Map-style DataPipe. id: totrans-5 prefs: [] type: TYPE_NORMAL zh: 映射样式DataPipe。 - en: All datasets that represent a map from keys to data samples should subclass this. Subclasses should overwrite `__getitem__()`, supporting fetching a data sample for a given, unique key. Subclasses can also optionally overwrite `__len__()`, which is expected to return the size of the dataset by many `Sampler` implementations and the default options of `DataLoader`. id: totrans-6 prefs: [] type: TYPE_NORMAL zh: 所有表示从键到数据样本的数据集都应该是这个类的子类。子类应该重写`__getitem__()`,支持为给定的唯一键获取数据样本。子类也可以选择性地重写`__len__()`,这个方法在许多`Sampler`实现和`DataLoader`的默认选项中被期望返回数据集的大小。 - en: These DataPipes can be invoked in two ways, using the class constructor or applying their functional form onto an existing MapDataPipe (recommend, available to most but not all DataPipes). id: totrans-7 prefs: [] type: TYPE_NORMAL zh: 这些DataPipes可以通过两种方式调用,一种是使用类构造函数,另一种是将它们的函数形式应用于现有的MapDataPipe(推荐,适用于大多数但不是所有DataPipes)。 - en: Note id: totrans-8 prefs: [] type: TYPE_NORMAL zh: 注意 - en: '`DataLoader` by default constructs an index sampler that yields integral indices. To make it work with a map-style DataPipe with non-integral indices/keys, a custom sampler must be provided.' id: totrans-9 prefs: [] type: TYPE_NORMAL zh: '`DataLoader`默认构建一个索引采样器,产生整数索引。要使其与具有非整数索引/键的映射样式DataPipe一起工作,必须提供自定义采样器。' - en: Example id: totrans-10 prefs: [] type: TYPE_NORMAL zh: 示例 - en: '[PRE1]' id: totrans-11 prefs: [] type: TYPE_PRE zh: '[PRE1]' - en: By design, there are fewer `MapDataPipe` than `IterDataPipe` to avoid duplicate implementations of the same functionalities as `MapDataPipe`. We encourage users to use the built-in `IterDataPipe` for various functionalities, and convert it to `MapDataPipe` as needed using [`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter "torchdata.datapipes.map.IterToMapConverter") or `.to_map_datapipe()`. If you have any question about usage or best practices while using `MapDataPipe`, feel free to ask on the PyTorch forum under the [‘data’ category](https://discuss.pytorch.org/c/data/37). id: totrans-12 prefs: [] type: TYPE_NORMAL zh: 按设计,`MapDataPipe`比`IterDataPipe`少,以避免重复实现相同的功能。我们鼓励用户使用内置的`IterDataPipe`进行各种功能,并根据需要使用[`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter "torchdata.datapipes.map.IterToMapConverter")或`.to_map_datapipe()`将其转换为`MapDataPipe`。如果您在使用`MapDataPipe`时有任何问题或最佳实践,请随时在PyTorch论坛的[‘data’类别](https://discuss.pytorch.org/c/data/37)下提问。 - en: We are open to add additional `MapDataPipe` where the operations can be lazily executed and `__len__` can be known in advance. Feel free to make suggestions with description of your use case in [this Github issue](https://github.com/pytorch/pytorch/issues/57031). Feedback about our design choice is also welcomed in that Github issue. id: totrans-13 prefs: [] type: TYPE_NORMAL zh: 我们愿意添加额外的`MapDataPipe`,其中操作可以延迟执行,并且`__len__`可以提前知道。请在[此Github问题](https://github.com/pytorch/pytorch/issues/57031)中提出您的用例描述的建议。关于我们的设计选择的反馈也欢迎在该Github问题中提出。 - en: 'Here is the list of available Map-style DataPipes:' id: totrans-14 prefs: [] type: TYPE_NORMAL zh: 以下是可用的映射样式DataPipes列表: - en: List of MapDataPipes[](#list-of-mapdatapipes "Permalink to this heading") id: totrans-15 prefs: - PREF_H2 type: TYPE_NORMAL zh: MapDataPipes列表[](#list-of-mapdatapipes "跳转到此标题") - en: '| [`Batcher`](generated/torchdata.datapipes.map.Batcher.html#torchdata.datapipes.map.Batcher "torchdata.datapipes.map.Batcher") | Create mini-batches of data (functional name: `batch`). |' id: totrans-16 prefs: [] type: TYPE_TB zh: '| [`Batcher`](generated/torchdata.datapipes.map.Batcher.html#torchdata.datapipes.map.Batcher "torchdata.datapipes.map.Batcher") | 创建数据的小批次(函数名称:`batch`)。|' - en: '| [`Concater`](generated/torchdata.datapipes.map.Concater.html#torchdata.datapipes.map.Concater "torchdata.datapipes.map.Concater") | Concatenate multiple Map DataPipes (functional name: `concat`). |' id: totrans-17 prefs: [] type: TYPE_TB zh: '| [`Concater`](generated/torchdata.datapipes.map.Concater.html#torchdata.datapipes.map.Concater "torchdata.datapipes.map.Concater") | 连接多个Map DataPipes(函数名称:`concat`)。|' - en: '| [`InMemoryCacheHolder`](generated/torchdata.datapipes.map.InMemoryCacheHolder.html#torchdata.datapipes.map.InMemoryCacheHolder "torchdata.datapipes.map.InMemoryCacheHolder") | Stores elements from the source DataPipe in memory (functional name: `in_memory_cache`). |' id: totrans-18 prefs: [] type: TYPE_TB zh: '| [`InMemoryCacheHolder`](generated/torchdata.datapipes.map.InMemoryCacheHolder.html#torchdata.datapipes.map.InMemoryCacheHolder "torchdata.datapipes.map.InMemoryCacheHolder") | 将源DataPipe中的元素存储在内存中(函数名称:`in_memory_cache`)。|' - en: '| [`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter "torchdata.datapipes.map.IterToMapConverter") | Lazily load data from `IterDataPipe` to construct a `MapDataPipe` with the key-value pair generated by `key_value_fn` (functional name: `to_map_datapipe`). |' id: totrans-19 prefs: [] type: TYPE_TB zh: '| [`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter "torchdata.datapipes.map.IterToMapConverter") | 从`IterDataPipe`中延迟加载数据,以生成由`key_value_fn`生成的键值对构建`MapDataPipe`(函数名称:`to_map_datapipe`)。|' - en: '| [`Mapper`](generated/torchdata.datapipes.map.Mapper.html#torchdata.datapipes.map.Mapper "torchdata.datapipes.map.Mapper") | Apply the input function over each item from the source DataPipe (functional name: `map`). |' id: totrans-20 prefs: [] type: TYPE_TB zh: '| [`Mapper`](generated/torchdata.datapipes.map.Mapper.html#torchdata.datapipes.map.Mapper "torchdata.datapipes.map.Mapper") | 对源DataPipe中的每个项目应用输入函数(函数名称:`map`)。|' - en: '| [`SequenceWrapper`](generated/torchdata.datapipes.map.SequenceWrapper.html#torchdata.datapipes.map.SequenceWrapper "torchdata.datapipes.map.SequenceWrapper") | Wraps a sequence object into a MapDataPipe. |' id: totrans-21 prefs: [] type: TYPE_TB zh: '| [`SequenceWrapper`](generated/torchdata.datapipes.map.SequenceWrapper.html#torchdata.datapipes.map.SequenceWrapper "torchdata.datapipes.map.SequenceWrapper") | 将序列对象包装成MapDataPipe。|' - en: '| [`Shuffler`](generated/torchdata.datapipes.map.Shuffler.html#torchdata.datapipes.map.Shuffler "torchdata.datapipes.map.Shuffler") | Shuffle the input MapDataPipe via its indices (functional name: `shuffle`). |' id: totrans-22 prefs: [] type: TYPE_TB zh: '| [`Shuffler`](generated/torchdata.datapipes.map.Shuffler.html#torchdata.datapipes.map.Shuffler "torchdata.datapipes.map.Shuffler") | 通过其索引对输入的 MapDataPipe 进行洗牌(函数名称:`shuffle`)。 |' - en: '| [`UnZipper`](generated/torchdata.datapipes.map.UnZipper.html#torchdata.datapipes.map.UnZipper "torchdata.datapipes.map.UnZipper") | Takes in a DataPipe of Sequences, unpacks each Sequence, and return the elements in separate DataPipes based on their position in the Sequence (functional name: `unzip`). |' id: totrans-23 prefs: [] type: TYPE_TB zh: '| [`UnZipper`](generated/torchdata.datapipes.map.UnZipper.html#torchdata.datapipes.map.UnZipper "torchdata.datapipes.map.UnZipper") | 接收一个序列的 DataPipe,解压每个序列,并根据它们在序列中的位置将元素分别返回到不同的 DataPipes 中(函数名称:`unzip`)。 |' - en: '| [`Zipper`](generated/torchdata.datapipes.map.Zipper.html#torchdata.datapipes.map.Zipper "torchdata.datapipes.map.Zipper") | Aggregates elements into a tuple from each of the input DataPipes (functional name: `zip`). |' id: totrans-24 prefs: [] type: TYPE_TB zh: '| [`Zipper`](generated/torchdata.datapipes.map.Zipper.html#torchdata.datapipes.map.Zipper "torchdata.datapipes.map.Zipper") | 从每个输入的 DataPipe 中聚合元素到一个元组中(函数名称:`zip`)。 |'