- en: torchaudio.datasets id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL zh: torchaudio.datasets - en: Source:[https://pytorch.org/audio/stable/datasets.html](https://pytorch.org/audio/stable/datasets.html) id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL zh: 原文:[https://pytorch.org/audio/stable/datasets.html](https://pytorch.org/audio/stable/datasets.html) - en: All datasets are subclasses of [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset "(in PyTorch v2.1)") and have `__getitem__` and `__len__` methods implemented. id: totrans-2 prefs: [] type: TYPE_NORMAL zh: 所有数据集都是 [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset "(在 PyTorch v2.1 中)") 的子类,并实现了 `__getitem__` 和 `__len__` 方法。 - en: 'Hence, they can all be passed to a [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader "(in PyTorch v2.1)"), which can load multiple samples in parallel using [`torch.multiprocessing`](https://pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "(in PyTorch v2.1)") workers. For example:' id: totrans-3 prefs: [] type: TYPE_NORMAL zh: 因此,它们都可以传递给 [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader "(在 PyTorch v2.1 中)"),该加载器可以使用 [`torch.multiprocessing`](https://pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "(在 PyTorch v2.1 中)") 工作器并行加载多个样本。例如: - en: '[PRE0]' id: totrans-4 prefs: [] type: TYPE_PRE zh: '[PRE0]' - en: '| [`CMUARCTIC`](generated/torchaudio.datasets.CMUARCTIC.html#torchaudio.datasets.CMUARCTIC "torchaudio.datasets.CMUARCTIC") | *CMU ARCTIC* [[Kominek *et al.*, 2003](references.html#id36 "John Kominek, Alan W Black, and Ver Ver. Cmu arctic databases for speech synthesis. Technical Report, 2003.")] dataset. 
|' id: totrans-5 prefs: [] type: TYPE_TB zh: '| [`CMUARCTIC`](generated/torchaudio.datasets.CMUARCTIC.html#torchaudio.datasets.CMUARCTIC "torchaudio.datasets.CMUARCTIC") | *CMU ARCTIC* [[Kominek *et al.*, 2003](references.html#id36 "John Kominek, Alan W Black, and Ver Ver. Cmu arctic databases for speech synthesis. Technical Report, 2003.")] 数据集。|' - en: '| [`CMUDict`](generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict "torchaudio.datasets.CMUDict") | *CMU Pronouncing Dictionary* [[Weide, 1998](references.html#id45 "R.L. Weide. The carnegie mellon pronouncing dictionary. 1998\. URL: http://www.speech.cs.cmu.edu/cgi-bin/cmudict.")] (CMUDict) dataset. |' id: totrans-6 prefs: [] type: TYPE_TB zh: '| [`CMUDict`](generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict "torchaudio.datasets.CMUDict") | *CMU Pronouncing Dictionary* [[Weide, 1998](references.html#id45 "R.L. Weide. The carnegie mellon pronouncing dictionary. 1998\. URL: http://www.speech.cs.cmu.edu/cgi-bin/cmudict.")] (CMUDict) 数据集。|' - en: '| [`COMMONVOICE`](generated/torchaudio.datasets.COMMONVOICE.html#torchaudio.datasets.COMMONVOICE "torchaudio.datasets.COMMONVOICE") | *CommonVoice* [[Ardila *et al.*, 2020](references.html#id10 "Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. Common voice: a massively-multilingual speech corpus. 2020\. arXiv:1912.06670.")] dataset. |' id: totrans-7 prefs: [] type: TYPE_TB zh: '| [`COMMONVOICE`](generated/torchaudio.datasets.COMMONVOICE.html#torchaudio.datasets.COMMONVOICE "torchaudio.datasets.COMMONVOICE") | *CommonVoice* [[Ardila *et al.*, 2020](references.html#id10 "Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. Common voice: a massively-multilingual speech corpus. 2020\. 
arXiv:1912.06670.")] 数据集。|' - en: '| [`DR_VCTK`](generated/torchaudio.datasets.DR_VCTK.html#torchaudio.datasets.DR_VCTK "torchaudio.datasets.DR_VCTK") | *Device Recorded VCTK (Small subset version)* [[Sarfjoo and Yamagishi, 2018](references.html#id42 "Seyyed Saeed Sarfjoo and Junichi Yamagishi. Device recorded vctk (small subset version). 2018.")] dataset. |' id: totrans-8 prefs: [] type: TYPE_TB zh: '| [`DR_VCTK`](generated/torchaudio.datasets.DR_VCTK.html#torchaudio.datasets.DR_VCTK "torchaudio.datasets.DR_VCTK") | *Device Recorded VCTK (Small subset version)* [[Sarfjoo and Yamagishi, 2018](references.html#id42 "Seyyed Saeed Sarfjoo and Junichi Yamagishi. Device recorded vctk (small subset version). 2018.")] 数据集。|' - en: '| [`FluentSpeechCommands`](generated/torchaudio.datasets.FluentSpeechCommands.html#torchaudio.datasets.FluentSpeechCommands "torchaudio.datasets.FluentSpeechCommands") | *Fluent Speech Commands* [[Lugosch *et al.*, 2019](references.html#id48 "Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, and Yoshua Bengio. Speech model pre-training for end-to-end spoken language understanding. In Gernot Kubin and Zdravko Kacic, editors, Proc. of Interspeech, 814–818\. 2019.")] dataset |' id: totrans-9 prefs: [] type: TYPE_TB zh: '| [`FluentSpeechCommands`](generated/torchaudio.datasets.FluentSpeechCommands.html#torchaudio.datasets.FluentSpeechCommands "torchaudio.datasets.FluentSpeechCommands") | *Fluent Speech Commands* [[Lugosch *et al.*, 2019](references.html#id48 "Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, and Yoshua Bengio. Speech model pre-training for end-to-end spoken language understanding. In Gernot Kubin and Zdravko Kacic, editors, Proc. of Interspeech, 814–818\. 2019.")] 数据集|' - en: '| [`GTZAN`](generated/torchaudio.datasets.GTZAN.html#torchaudio.datasets.GTZAN "torchaudio.datasets.GTZAN") | *GTZAN* [[Tzanetakis *et al.*, 2001](references.html#id43 "George Tzanetakis, Georg Essl, and Perry Cook. 
Automatic musical genre classification of audio signals. 2001\. URL: http://ismir2001.ismir.net/pdf/tzanetakis.pdf.")] dataset. |' id: totrans-10 prefs: [] type: TYPE_TB zh: '| [`GTZAN`](generated/torchaudio.datasets.GTZAN.html#torchaudio.datasets.GTZAN "torchaudio.datasets.GTZAN") | *GTZAN* [[Tzanetakis *et al.*, 2001](references.html#id43 "George Tzanetakis, Georg Essl, and Perry Cook. Automatic musical genre classification of audio signals. 2001\. URL: http://ismir2001.ismir.net/pdf/tzanetakis.pdf.")] 数据集。|' - en: '| [`IEMOCAP`](generated/torchaudio.datasets.IEMOCAP.html#torchaudio.datasets.IEMOCAP "torchaudio.datasets.IEMOCAP") | *IEMOCAP* [[Busso *et al.*, 2008](references.html#id52 "Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower Provost, Samuel Kim, Jeannette Chang, Sungbok Lee, and Shrikanth Narayanan. Iemocap: interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42:335-359, 12 2008\. doi:10.1007/s10579-008-9076-6.")] dataset. |' id: totrans-11 prefs: [] type: TYPE_TB zh: '| [`IEMOCAP`](generated/torchaudio.datasets.IEMOCAP.html#torchaudio.datasets.IEMOCAP "torchaudio.datasets.IEMOCAP") | *IEMOCAP* [[Busso *et al.*, 2008](references.html#id52 "Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower Provost, Samuel Kim, Jeannette Chang, Sungbok Lee, and Shrikanth Narayanan. Iemocap: interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42:335-359, 12 2008\. doi:10.1007/s10579-008-9076-6.")] 数据集。|' - en: '| [`LibriMix`](generated/torchaudio.datasets.LibriMix.html#torchaudio.datasets.LibriMix "torchaudio.datasets.LibriMix") | *LibriMix* [[Cosentino *et al.*, 2020](references.html#id37 "Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. Librimix: an open-source dataset for generalizable speech separation. 2020\. arXiv:2005.11262.")] dataset. 
|' id: totrans-12 prefs: [] type: TYPE_TB zh: '| [`LibriMix`](generated/torchaudio.datasets.LibriMix.html#torchaudio.datasets.LibriMix "torchaudio.datasets.LibriMix") | *LibriMix* [[Cosentino *et al.*, 2020](references.html#id37 "Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. Librimix: an open-source dataset for generalizable speech separation. 2020\. arXiv:2005.11262.")] 数据集。|' - en: '| [`LIBRISPEECH`](generated/torchaudio.datasets.LIBRISPEECH.html#torchaudio.datasets.LIBRISPEECH "torchaudio.datasets.LIBRISPEECH") | *LibriSpeech* [[Panayotov *et al.*, 2015](references.html#id13 "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] dataset. |' id: totrans-13 prefs: [] type: TYPE_TB zh: '| [`LIBRISPEECH`](generated/torchaudio.datasets.LIBRISPEECH.html#torchaudio.datasets.LIBRISPEECH "torchaudio.datasets.LIBRISPEECH") | *LibriSpeech* [[Panayotov *et al.*, 2015](references.html#id13 "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] 数据集。|' - en: '| [`LibriLightLimited`](generated/torchaudio.datasets.LibriLightLimited.html#torchaudio.datasets.LibriLightLimited "torchaudio.datasets.LibriLightLimited") | Subset of Libri-light [[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no supervision. 
In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")] dataset, which was used in HuBERT [[Hsu *et al.*, 2021](references.html#id16 "Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: self-supervised speech representation learning by masked prediction of hidden units. 2021\. arXiv:2106.07447.")] for supervised fine-tuning. |' id: totrans-14 prefs: [] type: TYPE_TB zh: '| [`LibriLightLimited`](generated/torchaudio.datasets.LibriLightLimited.html#torchaudio.datasets.LibriLightLimited "torchaudio.datasets.LibriLightLimited") | Libri-light的子集 [[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")] 数据集,被用于HuBERT [[Hsu *et al.*, 2021](references.html#id16 "Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: self-supervised speech representation learning by masked prediction of hidden units. 2021\. arXiv:2106.07447.")] 进行监督微调。|' - en: '| [`LIBRITTS`](generated/torchaudio.datasets.LIBRITTS.html#torchaudio.datasets.LIBRITTS "torchaudio.datasets.LIBRITTS") | *LibriTTS* [[Zen *et al.*, 2019](references.html#id38 "Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Z. Chen, and Yonghui Wu. Libritts: a corpus derived from librispeech for text-to-speech. ArXiv, 2019.")] dataset. 
|' id: totrans-15 prefs: [] type: TYPE_TB zh: '| [`LIBRITTS`](generated/torchaudio.datasets.LIBRITTS.html#torchaudio.datasets.LIBRITTS "torchaudio.datasets.LIBRITTS") | *LibriTTS* [[Zen *et al.*, 2019](references.html#id38 "Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Z. Chen, and Yonghui Wu. Libritts: a corpus derived from librispeech for text-to-speech. ArXiv, 2019.")] 数据集。|' - en: '| [`LJSPEECH`](generated/torchaudio.datasets.LJSPEECH.html#torchaudio.datasets.LJSPEECH "torchaudio.datasets.LJSPEECH") | *LJSpeech-1.1* [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")] dataset. |' id: totrans-16 prefs: [] type: TYPE_TB zh: '| [`LJSPEECH`](generated/torchaudio.datasets.LJSPEECH.html#torchaudio.datasets.LJSPEECH "torchaudio.datasets.LJSPEECH") | *LJSpeech-1.1* [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")] 数据集。|' - en: '| [`MUSDB_HQ`](generated/torchaudio.datasets.MUSDB_HQ.html#torchaudio.datasets.MUSDB_HQ "torchaudio.datasets.MUSDB_HQ") | *MUSDB_HQ* [[Rafii *et al.*, 2019](references.html#id47 "Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. MUSDB18-HQ - an uncompressed version of musdb18\. December 2019\. URL: https://doi.org/10.5281/zenodo.3338373, doi:10.5281/zenodo.3338373.")] dataset. |' id: totrans-17 prefs: [] type: TYPE_TB zh: '| [`MUSDB_HQ`](generated/torchaudio.datasets.MUSDB_HQ.html#torchaudio.datasets.MUSDB_HQ "torchaudio.datasets.MUSDB_HQ") | *MUSDB_HQ* [[Rafii *et al.*, 2019](references.html#id47 "Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. MUSDB18-HQ - an uncompressed version of musdb18\. December 2019\. 
URL: https://doi.org/10.5281/zenodo.3338373, doi:10.5281/zenodo.3338373.")] 数据集。|' - en: '| [`QUESST14`](generated/torchaudio.datasets.QUESST14.html#torchaudio.datasets.QUESST14 "torchaudio.datasets.QUESST14") | *QUESST14* [[Miro *et al.*, 2015](references.html#id44 "Xavier Anguera Miro, Luis Javier Rodriguez-Fuentes, Andi Buzo, Florian Metze, Igor Szoke, and Mikel Peñagarikano. Quesst2014: evaluating query-by-example speech search in a zero-resource setting with real-life queries. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5833-5837, 2015.")] dataset. |' id: totrans-18 prefs: [] type: TYPE_TB zh: '| [`QUESST14`](generated/torchaudio.datasets.QUESST14.html#torchaudio.datasets.QUESST14 "torchaudio.datasets.QUESST14") | *QUESST14* [[Miro *et al.*, 2015](references.html#id44 "Xavier Anguera Miro, Luis Javier Rodriguez-Fuentes, Andi Buzo, Florian Metze, Igor Szoke, and Mikel Peñagarikano. Quesst2014: evaluating query-by-example speech search in a zero-resource setting with real-life queries. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5833-5837, 2015.")] 数据集。|' - en: '| [`Snips`](generated/torchaudio.datasets.Snips.html#torchaudio.datasets.Snips "torchaudio.datasets.Snips") | *Snips* [[Coucke *et al.*, 2018](references.html#id53 "Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, and others. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190, 2018.")] dataset. 
|' id: totrans-19 prefs: [] type: TYPE_TB zh: '| [`Snips`](generated/torchaudio.datasets.Snips.html#torchaudio.datasets.Snips "torchaudio.datasets.Snips") | *Snips* [[Coucke *et al.*, 2018](references.html#id53 "Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, and others. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190, 2018.")] 数据集。|' - en: '| [`SPEECHCOMMANDS`](generated/torchaudio.datasets.SPEECHCOMMANDS.html#torchaudio.datasets.SPEECHCOMMANDS "torchaudio.datasets.SPEECHCOMMANDS") | *Speech Commands* [[Warden, 2018](references.html#id39 "P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. ArXiv e-prints, April 2018\. URL: https://arxiv.org/abs/1804.03209, arXiv:1804.03209.")] dataset. |' id: totrans-20 prefs: [] type: TYPE_TB zh: '| [`SPEECHCOMMANDS`](generated/torchaudio.datasets.SPEECHCOMMANDS.html#torchaudio.datasets.SPEECHCOMMANDS "torchaudio.datasets.SPEECHCOMMANDS") | *Speech Commands* [[Warden, 2018](references.html#id39 "P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. ArXiv e-prints, April 2018\. URL: https://arxiv.org/abs/1804.03209, arXiv:1804.03209.")] 数据集。 |' - en: '| [`TEDLIUM`](generated/torchaudio.datasets.TEDLIUM.html#torchaudio.datasets.TEDLIUM "torchaudio.datasets.TEDLIUM") | *Tedlium* [[Rousseau *et al.*, 2012](references.html#id40 "Anthony Rousseau, Paul Deléglise, and Yannick Estève. Ted-lium: an automatic speech recognition dedicated corpus. In Conference on Language Resources and Evaluation (LREC), 125–129\. 2012.")] dataset (releases 1,2 and 3). 
|' id: totrans-21 prefs: [] type: TYPE_TB zh: '| [`TEDLIUM`](generated/torchaudio.datasets.TEDLIUM.html#torchaudio.datasets.TEDLIUM "torchaudio.datasets.TEDLIUM") | *Tedlium* [[Rousseau *et al.*, 2012](references.html#id40 "Anthony Rousseau, Paul Deléglise, and Yannick Estève. Ted-lium: an automatic speech recognition dedicated corpus. In Conference on Language Resources and Evaluation (LREC), 125–129\. 2012.")] 数据集(版本1、2和3)。 |' - en: '| [`VCTK_092`](generated/torchaudio.datasets.VCTK_092.html#torchaudio.datasets.VCTK_092 "torchaudio.datasets.VCTK_092") | *VCTK 0.92* [[Yamagishi *et al.*, 2019](references.html#id41 "Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. CSTR VCTK Corpus: english multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019\. doi:10.7488/ds/2645.")] dataset |' id: totrans-22 prefs: [] type: TYPE_TB zh: '| [`VCTK_092`](generated/torchaudio.datasets.VCTK_092.html#torchaudio.datasets.VCTK_092 "torchaudio.datasets.VCTK_092") | *VCTK 0.92* [[Yamagishi *et al.*, 2019](references.html#id41 "Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. CSTR VCTK Corpus: english multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019\. doi:10.7488/ds/2645.")] 数据集 |' - en: '| [`VoxCeleb1Identification`](generated/torchaudio.datasets.VoxCeleb1Identification.html#torchaudio.datasets.VoxCeleb1Identification "torchaudio.datasets.VoxCeleb1Identification") | *VoxCeleb1* [[Nagrani *et al.*, 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.")] dataset for speaker identification task. 
|' id: totrans-23 prefs: [] type: TYPE_TB zh: '| [`VoxCeleb1Identification`](generated/torchaudio.datasets.VoxCeleb1Identification.html#torchaudio.datasets.VoxCeleb1Identification "torchaudio.datasets.VoxCeleb1Identification") | *VoxCeleb1* [[Nagrani *et al.*, 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.")] 用于说话人识别任务的数据集。 |' - en: '| [`VoxCeleb1Verification`](generated/torchaudio.datasets.VoxCeleb1Verification.html#torchaudio.datasets.VoxCeleb1Verification "torchaudio.datasets.VoxCeleb1Verification") | *VoxCeleb1* [[Nagrani *et al.*, 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.")] dataset for speaker verification task. |' id: totrans-24 prefs: [] type: TYPE_TB zh: '| [`VoxCeleb1Verification`](generated/torchaudio.datasets.VoxCeleb1Verification.html#torchaudio.datasets.VoxCeleb1Verification "torchaudio.datasets.VoxCeleb1Verification") | *VoxCeleb1* [[Nagrani *et al.*, 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.")] 用于说话人验证任务的数据集。 |' - en: '| [`YESNO`](generated/torchaudio.datasets.YESNO.html#torchaudio.datasets.YESNO "torchaudio.datasets.YESNO") | *YesNo* [[*YesNo*, n.d.](references.html#id46 "Yesno. URL: http://www.openslr.org/1/.")] dataset. |' id: totrans-25 prefs: [] type: TYPE_TB zh: '| [`YESNO`](generated/torchaudio.datasets.YESNO.html#torchaudio.datasets.YESNO "torchaudio.datasets.YESNO") | *YesNo* [[*YesNo*, n.d.](references.html#id46 "Yesno. URL: http://www.openslr.org/1/.")] 数据集。 |'
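
The `DataLoader` usage described at the top of this page can be sketched without downloading any corpus. The example below is a minimal illustration, not the page's own example: `ToyAudioDataset` and `pad_collate` are hypothetical names introduced here, and the random waveforms merely stand in for the `(waveform, sample_rate, label)` style of tuple that a real dataset from the table might return. Because recordings usually differ in length, the sketch assumes a padding collate function so that a batch can be stacked into one tensor.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToyAudioDataset(Dataset):
    """Hypothetical stand-in for a torchaudio dataset (not part of torchaudio)."""

    def __init__(self, num_items=8, sample_rate=8000):
        self.sample_rate = sample_rate
        generator = torch.Generator().manual_seed(0)
        # Variable-length mono waveforms shaped (channel, time).
        self.items = [
            (torch.randn(1, 4000 + 500 * i, generator=generator), i % 2)
            for i in range(num_items)
        ]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        waveform, label = self.items[index]
        return waveform, self.sample_rate, label


def pad_collate(batch):
    """Zero-pad every waveform in the batch to the length of the longest one."""
    waveforms = [waveform.squeeze(0) for waveform, _, _ in batch]  # each (time,)
    lengths = torch.tensor([w.shape[0] for w in waveforms])
    labels = torch.tensor([label for _, _, label in batch])
    padded = pad_sequence(waveforms, batch_first=True)  # (batch, max_time)
    return padded, lengths, labels


dataset = ToyAudioDataset()
# num_workers=0 loads in the main process; a positive value uses worker processes.
loader = DataLoader(
    dataset, batch_size=4, shuffle=False, num_workers=0, collate_fn=pad_collate
)

batches = list(loader)
for padded, lengths, labels in batches:
    print(padded.shape, lengths.tolist(), labels.tolist())
```

With a real dataset from the table, the tuple layout returned by `__getitem__` differs per class, so the collate function would need to be adapted to that dataset's documented fields. When `num_workers` is greater than zero on platforms that spawn worker processes, the loading code generally belongs under an `if __name__ == "__main__":` guard.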