diff --git a/totrans/aud22_34.yaml b/totrans/aud22_34.yaml index 81d65a37e33d639a0aff561b33adcc1ac9933464..2f7295fb4fbb2a5d08aa685138f1fae6e4c4af64 100644 --- a/totrans/aud22_34.yaml +++ b/totrans/aud22_34.yaml @@ -1,310 +1,480 @@ - en: Speech Recognition with Wav2Vec2
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 使用Wav2Vec2进行语音识别
- en: 原文:[https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html](https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html](https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html)
- en: Note
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-speech-recognition-pipeline-tutorial-py)
  to download the full example code
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 点击[这里](#sphx-glr-download-tutorials-speech-recognition-pipeline-tutorial-py)下载完整示例代码
- en: '**Author**: [Moto Hira](mailto:moto%40meta.com)'
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: '**作者**:[Moto Hira](mailto:moto%40meta.com)'
- en: This tutorial shows how to perform speech recognition using pre-trained models
  from wav2vec 2.0 [[paper](https://arxiv.org/abs/2006.11477)].
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程展示了如何使用来自wav2vec 2.0的预训练模型执行语音识别[[论文](https://arxiv.org/abs/2006.11477)]。
- en: Overview[](#overview "Permalink to this heading")
+ id: totrans-6
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 概述[](#overview "跳转到此标题")
- en: The process of speech recognition looks like the following.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 语音识别的过程如下所示。
- en: Extract the acoustic features from audio waveform
+ id: totrans-8
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 从音频波形中提取声学特征
- en: Estimate the class of the acoustic features frame-by-frame
+ id: totrans-9
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 逐帧估计声学特征的类别
- en: Generate hypotheses from the sequence of the class probabilities
+ id: totrans-10
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 从类概率序列生成假设
- en: Torchaudio provides easy access to the pre-trained weights and associated information,
  such as the expected sample rate and class labels. They are bundled together and
  available under the [`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines
  "torchaudio.pipelines") module.
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: Torchaudio提供了对预训练权重和相关信息的简单访问,例如预期的采样率和类标签。它们被捆绑在一起,并在[`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines
    "torchaudio.pipelines")模块下提供。
- en: Preparation[](#preparation "Permalink to this heading")
+ id: totrans-12
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 准备[](#preparation "跳转到此标题")
- en: '[PRE0]'
+ id: totrans-13
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[PRE1]'
+ id: totrans-14
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: '[PRE2]'
+ id: totrans-15
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: '[PRE3]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: Creating a pipeline[](#creating-a-pipeline "Permalink to this heading")
+ id: totrans-17
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 创建管道[](#creating-a-pipeline "跳转到此标题")
- en: First, we will create a Wav2Vec2 model that performs the feature extraction
  and the classification.
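Since the code cells of this tutorial are stripped to [PRE] placeholders in this file, a minimal sketch of the bundle-and-model setup could look like the following Python (an illustration of the step described above, not necessarily the tutorial's exact cells):

```python
import torch
import torchaudio

# The bundle object holds the pre-trained weights plus metadata.
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H

print("Sample rate:", bundle.sample_rate)  # sample rate the model expects
print("Labels:", bundle.get_labels())      # class labels, including the blank token

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = bundle.get_model().to(device)      # fetches the weights on first use
```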
+ id: totrans-18
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先,我们将创建一个执行特征提取和分类的Wav2Vec2模型。
- en: There are two types of Wav2Vec2 pre-trained weights available in torchaudio:
  the ones fine-tuned for the ASR task, and the ones not fine-tuned.
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: torchaudio中有两种类型的Wav2Vec2预训练权重。一种是为ASR任务微调的,另一种是未经微调的。
- en: Wav2Vec2 (and HuBERT) models are trained in a self-supervised manner. They are
  first trained with audio only for representation learning, then fine-tuned for
  a specific task with additional labels.
+ id: totrans-20
  prefs: []
  type: TYPE_NORMAL
+ zh: Wav2Vec2(和HuBERT)模型以自监督方式进行训练。它们首先仅使用音频进行表示学习的训练,然后再使用附加标签进行特定任务的微调。
- en: The pre-trained weights without fine-tuning can be fine-tuned for other downstream
  tasks as well, but this tutorial does not cover that.
+ id: totrans-21
  prefs: []
  type: TYPE_NORMAL
+ zh: 未经微调的预训练权重也可以用于其他下游任务的微调,但本教程不涵盖此内容。
- en: We will use [`torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
  "torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H") here.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将在这里使用[`torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
    "torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H")。
- en: There are multiple pre-trained models available in [`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines
  "torchaudio.pipelines"). Please check the documentation for the details of how
  they are trained.
+ id: totrans-23
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines "torchaudio.pipelines")中有多个预训练模型可用。请查看文档以了解它们的训练方式的详细信息。'
- en: The bundle object provides the interface to instantiate the model and other
  information. The sampling rate and the class labels can be found as follows.
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: bundle对象提供了实例化模型和其他信息的接口。采样率和类标签如下所示。
- en: '[PRE4]'
+ id: totrans-25
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: '[PRE5]'
+ id: totrans-26
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: The model can be constructed as follows. This process will automatically fetch
  the pre-trained weights and load them into the model.
+ id: totrans-27
  prefs: []
  type: TYPE_NORMAL
+ zh: 模型可以按以下方式构建。此过程将自动获取预训练权重并将其加载到模型中。
- en: '[PRE6]'
+ id: totrans-28
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: '[PRE7]'
+ id: totrans-29
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: Loading data[](#loading-data "Permalink to this heading")
+ id: totrans-30
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 加载数据[](#loading-data "跳转到此标题")
- en: We will use the speech data from the [VOiCES dataset](https://iqtlabs.github.io/voices/),
  which is licensed under Creative Commons BY 4.0.
+ id: totrans-31
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将使用[VOiCES数据集](https://iqtlabs.github.io/voices/)中的语音数据,该数据集在Creative Commons
    BY 4.0下许可。
- en: '[PRE8]'
+ id: totrans-32
  prefs: []
  type: TYPE_PRE
-- en:
+ zh: '[PRE8]'
+- en: null
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
- en: Your browser does not support the audio element.
+ id: totrans-34
  prefs: []
  type: TYPE_NORMAL
+ zh: 您的浏览器不支持音频元素。
- en: To load data, we use [`torchaudio.load()`](../generated/torchaudio.load.html#torchaudio.load
  "torchaudio.load").
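A minimal sketch of the loading-and-resampling step described here and in the next entry (the `SPEECH_FILE` path is hypothetical, and `bundle` is the pipeline object from the sketch above):

```python
import torchaudio
import torchaudio.functional as F

SPEECH_FILE = "_assets/speech.wav"  # hypothetical local path to the VOiCES sample

waveform, sample_rate = torchaudio.load(SPEECH_FILE)

# Resample only when the file's rate differs from what the pipeline expects.
if sample_rate != bundle.sample_rate:
    waveform = F.resample(waveform, sample_rate, bundle.sample_rate)
```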
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了加载数据,我们使用[`torchaudio.load()`](../generated/torchaudio.load.html#torchaudio.load
    "torchaudio.load")。
- en: If the sampling rate is different from what the pipeline expects, then we can
  use [`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
  "torchaudio.functional.resample") for resampling.
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果采样率与管道期望的不同,则可以使用[`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
    "torchaudio.functional.resample")进行重采样。
- en: Note
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: '[`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
  "torchaudio.functional.resample") works on CUDA tensors as well.'
+ id: totrans-38
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
    "torchaudio.functional.resample")也适用于CUDA张量。'
- en: When performing resampling multiple times on the same set of sample rates, using
  [`torchaudio.transforms.Resample`](../generated/torchaudio.transforms.Resample.html#torchaudio.transforms.Resample
  "torchaudio.transforms.Resample") might improve the performance.
+ id: totrans-39
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 在同一组采样率上多次执行重采样时,使用[`torchaudio.transforms.Resample`](../generated/torchaudio.transforms.Resample.html#torchaudio.transforms.Resample
    "torchaudio.transforms.Resample")可能会提高性能。
- en: '[PRE9]'
+ id: totrans-40
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: Extracting acoustic features[](#extracting-acoustic-features "Permalink to this
  heading")
+ id: totrans-41
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 提取声学特征[](#extracting-acoustic-features "跳转到此标题")
- en: The next step is to extract acoustic features from the audio.
+ id: totrans-42
  prefs: []
  type: TYPE_NORMAL
+ zh: 下一步是从音频中提取声学特征。
- en: Note
+ id: totrans-43
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Wav2Vec2 models fine-tuned for the ASR task can perform feature extraction and
  classification in one step, but for the sake of the tutorial, we also show how
  to perform feature extraction here.
+ id: totrans-44
  prefs: []
  type: TYPE_NORMAL
+ zh: 为ASR任务微调的Wav2Vec2模型可以一步完成特征提取和分类,但为了教程的目的,我们还展示了如何在此处执行特征提取。
- en: '[PRE10]'
+ id: totrans-45
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE10]'
- en: The returned features are a list of tensors. Each tensor is the output of a
  transformer layer.
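To make the shape of that output concrete, a sketch of the feature-extraction call (assuming `model`, `waveform`, and `device` from the earlier sketches):

```python
import torch

with torch.inference_mode():
    features, _ = model.extract_features(waveform.to(device))

# One tensor per transformer layer, each of shape (batch, num_frames, feature_dim).
for i, feats in enumerate(features):
    print(f"Transformer layer {i + 1}: {tuple(feats.shape)}")
```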
+ id: totrans-46
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回的特征是一个张量列表。每个张量是一个变换器层的输出。
- en: '[PRE11]'
+ id: totrans-47
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE11]'
- en: '![Feature from transformer layer 1, Feature from transformer layer 2, Feature
  from transformer layer 3, Feature from transformer layer 4, Feature from transformer
  layer 5, Feature from transformer layer 6, Feature from transformer layer 7, Feature
  from transformer layer 8, Feature from transformer layer 9, Feature from transformer
  layer 10, Feature from transformer layer 11, Feature from transformer layer 12](../Images/9f2d3410922166561ebdadfd4981e797.png)'
+ id: totrans-48
  prefs: []
  type: TYPE_IMG
+ zh: '![来自变换器层1的特征,来自变换器层2的特征,来自变换器层3的特征,来自变换器层4的特征,来自变换器层5的特征,来自变换器层6的特征,来自变换器层7的特征,来自变换器层8的特征,来自变换器层9的特征,来自变换器层10的特征,来自变换器层11的特征,来自变换器层12的特征](../Images/9f2d3410922166561ebdadfd4981e797.png)'
- en: Feature classification[](#feature-classification "Permalink to this heading")
+ id: totrans-49
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 特征分类
- en: Once the acoustic features are extracted, the next step is to classify them
  into a set of categories.
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 一旦提取了声学特征,下一步就是将它们分类到一组类别中。
- en: The Wav2Vec2 model provides a method to perform the feature extraction and classification
  in one step.
+ id: totrans-51
  prefs: []
  type: TYPE_NORMAL
+ zh: Wav2Vec2模型提供了一种在一步中执行特征提取和分类的方法。
- en: '[PRE12]'
+ id: totrans-52
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: The output is in the form of logits. It is not in the form of probabilities.
+ id: totrans-53
  prefs: []
  type: TYPE_NORMAL
+ zh: 输出以logits的形式呈现,而不是概率的形式。
- en: Let’s visualize this.
+ id: totrans-54
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们可视化这个过程。
- en: '[PRE13]'
+ id: totrans-55
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE13]'
- en: '![Classification result](../Images/ce8601d728900194dc8cb21fbd524cf7.png)'
+ id: totrans-56
  prefs: []
  type: TYPE_IMG
+ zh: '![分类结果](../Images/ce8601d728900194dc8cb21fbd524cf7.png)'
- en: '[PRE14]'
+ id: totrans-57
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE14]'
- en: We can see that there are strong indications for certain labels across the timeline.
+ id: totrans-58
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们可以看到在时间线上有对某些标签的强烈指示。
- en: Generating transcripts[](#generating-transcripts "Permalink to this heading")
+ id: totrans-59
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 生成转录
- en: From the sequence of label probabilities, now we want to generate transcripts.
  The process to generate hypotheses is often called “decoding”.
+ id: totrans-60
  prefs: []
  type: TYPE_NORMAL
+ zh: 从标签概率序列中,现在我们想生成转录。生成假设的过程通常称为“解码”。
- en: Decoding is more elaborate than simple classification, because decoding at a
  certain time step can be affected by surrounding observations.
+ id: totrans-61
  prefs: []
  type: TYPE_NORMAL
+ zh: 解码比简单分类更复杂,因为在某个时间步骤的解码可能会受到周围观察的影响。
- en: For example, take words like `night` and `knight`. Even if their prior probability
  distributions are different (in typical conversations, `night` would occur way
  more often than `knight`), to accurately generate transcripts with `knight`, such
  as `a knight with a sword`, the decoding process has to postpone the final decision
  until it sees enough context.
+ id: totrans-62
  prefs: []
  type: TYPE_NORMAL
+ zh: 例如,拿一个词像`night`和`knight`。即使它们的先验概率分布不同(在典型对话中,`night`会比`knight`发生得更频繁),为了准确生成带有`knight`的转录,比如`a
    knight with a sword`,解码过程必须推迟最终决定,直到看到足够的上下文。
- en: Many decoding techniques have been proposed, and they require external resources,
  such as a word dictionary and language models.
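The next section defines exactly such a decoder in its [PRE15] cell; as a hedged preview, a minimal greedy CTC decoder can be written along these lines (assuming `labels` comes from `bundle.get_labels()`, where index 0 is the blank token and `|` marks word boundaries):

```python
import torch

class GreedyCTCDecoder(torch.nn.Module):
    """Pick the most likely label per frame, collapse repeats, drop blanks."""

    def __init__(self, labels, blank: int = 0):
        super().__init__()
        self.labels = labels
        self.blank = blank

    def forward(self, emission: torch.Tensor) -> str:
        # emission: logits of shape (num_frames, num_labels) for one utterance
        indices = torch.argmax(emission, dim=-1)
        indices = torch.unique_consecutive(indices)              # collapse repeats
        indices = [int(i) for i in indices if i != self.blank]   # drop blank tokens
        joined = "".join(self.labels[i] for i in indices)
        return joined.replace("|", " ").strip()
```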
+ id: totrans-63
  prefs: []
  type: TYPE_NORMAL
+ zh: 有许多提出的解码技术,它们需要外部资源,如单词词典和语言模型。
- en: In this tutorial, for the sake of simplicity, we will perform greedy decoding,
  which does not depend on such external components, and simply picks the best hypothesis
  at each time step. Therefore, the context information is not used, and only one
  transcript can be generated.
+ id: totrans-64
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本教程中,为了简单起见,我们将执行贪婪解码,它不依赖于外部组件,并且只在每个时间步骤选择最佳假设。因此,上下文信息未被使用,只能生成一个转录。
- en: We start by defining the greedy decoding algorithm.
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们首先定义贪婪解码算法。
- en: '[PRE15]'
+ id: totrans-66
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE15]'
- en: Now create the decoder object and decode the transcript.
+ id: totrans-67
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在创建解码器对象并解码转录。
- en: '[PRE16]'
+ id: totrans-68
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE16]'
- en: Let’s check the result and listen again to the audio.
+ id: totrans-69
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们检查结果并再次听音频。
- en: '[PRE17]'
+ id: totrans-70
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE17]'
- en: '[PRE18]'
+ id: totrans-71
  prefs: []
  type: TYPE_PRE
-- en:
+ zh: '[PRE18]'
+- en: null
+ id: totrans-72
  prefs: []
  type: TYPE_NORMAL
- en: Your browser does not support the audio element.
+ id: totrans-73
  prefs: []
  type: TYPE_NORMAL
+ zh: 您的浏览器不支持音频元素。
- en: The ASR model is fine-tuned using a loss function called Connectionist Temporal
  Classification (CTC). The details of CTC loss are explained [here](https://distill.pub/2017/ctc/).
  In CTC, a blank token (ϵ) is a special token which represents a repetition of the
  previous symbol. In decoding, these are simply ignored.
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
+ zh: ASR模型使用一种称为连接主义时间分类(CTC)的损失函数进行微调。CTC损失的详细信息在[这里](https://distill.pub/2017/ctc/)有解释。在CTC中,空白标记(ϵ)是一个特殊标记,表示前一个符号的重复。在解码中,这些标记被简单地忽略。
- en: Conclusion[](#conclusion "Permalink to this heading")
+ id: totrans-75
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 结论
- en: In this tutorial, we looked at how to use [`Wav2Vec2ASRBundle`](../generated/torchaudio.pipelines.Wav2Vec2ASRBundle.html#torchaudio.pipelines.Wav2Vec2ASRBundle
  "torchaudio.pipelines.Wav2Vec2ASRBundle") to perform acoustic feature extraction
  and speech recognition. Constructing a model and getting the emission is as short
  as two lines.
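The stripped [PRE19] cell below presumably holds that two-line gist; as a sketch (assuming `waveform` is already loaded and resampled):

```python
# Build the model and get the frame-by-frame emission in two lines.
model = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.get_model()
emission, _ = model(waveform)  # logits over the label set, per frame
```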
+ id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: 在本教程中,我们看了如何使用[`Wav2Vec2ASRBundle`](../generated/torchaudio.pipelines.Wav2Vec2ASRBundle.html#torchaudio.pipelines.Wav2Vec2ASRBundle)执行声学特征提取和语音识别。构建模型并获取发射只需两行代码。 - en: '[PRE19]' + id: totrans-77 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '**Total running time of the script:** ( 0 minutes 6.833 seconds)' + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟6.833秒)' - en: '[`Download Python source code: speech_recognition_pipeline_tutorial.py`](../_downloads/a0b5016bbf34fce4ac5549f4075dd10f/speech_recognition_pipeline_tutorial.py)' + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:speech_recognition_pipeline_tutorial.py`](../_downloads/a0b5016bbf34fce4ac5549f4075dd10f/speech_recognition_pipeline_tutorial.py)' - en: '[`Download Jupyter notebook: speech_recognition_pipeline_tutorial.ipynb`](../_downloads/ca83af2ea8d7db05fb63211d515b7fde/speech_recognition_pipeline_tutorial.ipynb)' + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:speech_recognition_pipeline_tutorial.ipynb`](../_downloads/ca83af2ea8d7db05fb63211d515b7fde/speech_recognition_pipeline_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_35.yaml b/totrans/aud22_35.yaml index c44557bd3d62d2585914c4ac5d1b4e3a7e11faf4..911b005cbd0de61058699b0116a5b24c932f7250 100644 --- a/totrans/aud22_35.yaml +++ b/totrans/aud22_35.yaml @@ -1,509 +1,774 @@ - en: ASR Inference with CTC Decoder + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用CTC解码器进行ASR推断 - en: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-asr-inference-with-ctc-decoder-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-asr-inference-with-ctc-decoder-tutorial-py)下载完整示例代码 - en: '**Author**: [Caroline Chen](mailto:carolinechen%40meta.com)' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Caroline Chen](mailto:carolinechen%40meta.com)' - en: This tutorial shows how to perform speech recognition inference using a CTC beam search decoder with lexicon constraint and KenLM language model support. We demonstrate this on a pretrained wav2vec 2.0 model trained using CTC loss. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了如何使用带有词典约束和KenLM语言模型支持的CTC波束搜索解码器执行语音识别推断。我们在使用CTC损失训练的预训练wav2vec 2.0模型上演示了这一点。 - en: Overview[](#overview "Permalink to this heading") + id: totrans-6 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 概述[](#overview "跳转到此标题的永久链接") - en: Beam search decoding works by iteratively expanding text hypotheses (beams) with next possible characters, and maintaining only the hypotheses with the highest scores at each time step. 
A language model can be incorporated into the scoring computation, and adding
  a lexicon constraint restricts the next possible tokens for the hypotheses, so
  that only words from the lexicon can be generated.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 波束搜索解码通过迭代扩展文本假设(波束)并使用下一个可能的字符,每个时间步仅保留具有最高分数的假设来工作。语言模型可以并入到得分计算中,添加词典约束会限制假设的下一个可能令牌,以便只能生成词典中的单词。
- en: The underlying implementation is ported from [Flashlight](https://arxiv.org/pdf/2201.12465.pdf)’s
  beam search decoder. A mathematical formula for the decoder optimization can be
  found in the [Wav2Letter paper](https://arxiv.org/pdf/1609.03193.pdf), and a more
  detailed algorithm can be found in this [blog](https://towardsdatascience.com/boosting-your-sequence-generation-performance-with-beam-search-language-model-decoding-74ee64de435a).
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 底层实现是从[Flashlight](https://arxiv.org/pdf/2201.12465.pdf)的波束搜索解码器移植过来的。解码器优化的数学公式可以在[Wav2Letter论文](https://arxiv.org/pdf/1609.03193.pdf)中找到,更详细的算法可以在这篇[博客](https://towardsdatascience.com/boosting-your-sequence-generation-performance-with-beam-search-language-model-decoding-74ee64de435a)中找到。
- en: 'Running ASR inference using a CTC Beam Search decoder with a language model
  and lexicon constraint requires the following components:'
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用带有语言模型和词典约束的CTC波束搜索解码器进行ASR推断需要以下组件
- en: 'Acoustic Model: model predicting phonetics from audio waveforms'
+ id: totrans-10
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 声学模型:从音频波形预测语音学的模型
- en: 'Tokens: the possible predicted tokens from the acoustic model'
+ id: totrans-11
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 令牌:声学模型可能预测的令牌
- en: 'Lexicon: mapping between possible words and their corresponding token sequences'
+ id: totrans-12
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 词典:可能单词与其对应的令牌序列之间的映射
- en: 'Language Model (LM): n-gram language model trained with the [KenLM library](https://kheafield.com/code/kenlm/),
  or a custom language model that inherits from [`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM
  "torchaudio.models.decoder.CTCDecoderLM")'
+ id: totrans-13
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 语言模型(LM):使用[KenLM库](https://kheafield.com/code/kenlm/)训练的n-gram语言模型,或者继承了[`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM
    "torchaudio.models.decoder.CTCDecoderLM")的自定义语言模型
- en: Acoustic Model and Set Up[](#acoustic-model-and-set-up "Permalink to this heading")
+ id: totrans-14
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 声学模型和设置[](#acoustic-model-and-set-up "跳转到此标题的永久链接")
- en: First, we import the necessary utilities and fetch the data that we are working
  with.
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先,我们导入必要的工具并获取我们正在处理的数据
- en: '[PRE0]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[PRE1]'
+ id: totrans-17
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: '[PRE2]'
+ id: totrans-18
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: We use the pretrained [Wav2Vec 2.0](https://arxiv.org/abs/2006.11477) Base model
  that is finetuned on 10 min of the [LibriSpeech dataset](http://www.openslr.org/12),
  which can be loaded using [`torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M
  "torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M").
For more detail on running Wav2Vec 2.0 speech
  recognition pipelines in torchaudio, please refer to [this tutorial](./speech_recognition_pipeline_tutorial.html).
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用预训练的[Wav2Vec 2.0](https://arxiv.org/abs/2006.11477)基础模型,该模型在10分钟的[LibriSpeech数据集](http://www.openslr.org/12)上进行了微调,可以使用[`torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M
    "torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M")加载。有关在torchaudio中运行Wav2Vec 2.0语音识别流水线的更多详细信息,请参考[此教程](./speech_recognition_pipeline_tutorial.html)。
- en: '[PRE3]'
+ id: totrans-20
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: '[PRE4]'
+ id: totrans-21
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: We will load a sample from the LibriSpeech test-other dataset.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将从LibriSpeech test-other数据集中加载一个样本。
- en: '[PRE5]'
+ id: totrans-23
  prefs: []
  type: TYPE_PRE
-- en:
+ zh: '[PRE5]'
+- en: null
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
- en: Your browser does not support the audio element.
+ id: totrans-25
  prefs: []
  type: TYPE_NORMAL
+ zh: 您的浏览器不支持音频元素。
- en: The transcript corresponding to this audio file is
+ id: totrans-26
  prefs: []
  type: TYPE_NORMAL
+ zh: 与此音频文件对应的转录本是
- en: '[PRE6]'
+ id: totrans-27
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: '[PRE7]'
+ id: totrans-28
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: Files and Data for Decoder[](#files-and-data-for-decoder "Permalink to this
  heading")
+ id: totrans-29
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 解码器的文件和数据[](#files-and-data-for-decoder "跳转到此标题的永久链接")
- en: Next, we load in our tokens, lexicon, and language model data, which are used
  by the decoder to predict words from the acoustic model output. Pretrained files
  for the LibriSpeech dataset can be downloaded through torchaudio, or the user can
  provide their own files.
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们加载我们的令牌、词典和语言模型数据,这些数据由解码器用于从声学模型输出中预测单词。LibriSpeech数据集的预训练文件可以通过torchaudio下载,或者用户可以提供自己的文件。
- en: Tokens[](#tokens "Permalink to this heading")
+ id: totrans-31
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 令牌[](#tokens "跳转到此标题的永久链接")
- en: The tokens are the possible symbols that the acoustic model can predict, including
  the blank and silent symbols. They can either be passed in as a file, where each
  line consists of the tokens corresponding to the same index, or as a list of tokens,
  each mapping to a unique index.
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 令牌是声学模型可以预测的可能符号,包括空白和静音符号。它可以作为一个文件传递,其中每一行都包含与相同索引对应的令牌,或者作为令牌列表传递,每个令牌映射到一个唯一的索引。
- en: '[PRE8]'
+ id: totrans-33
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: '[PRE9]'
+ id: totrans-34
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: '[PRE10]'
+ id: totrans-35
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE10]'
- en: Lexicon[](#lexicon "Permalink to this heading")
+ id: totrans-36
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 词典[](#lexicon "跳转到此标题的永久链接")
- en: The lexicon is a mapping from words to their corresponding token sequences,
  and is used to restrict the search space of the decoder to only words from the
  lexicon. The expected format of the lexicon file is a line per word, with a word
  followed by its space-split tokens.
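For illustration only, the first few lines of such a lexicon might look like the following (hypothetical entries; the LibriSpeech lexicon pairs each word with its letter tokens plus the `|` word-boundary token):

```
a a |
able a b l e |
about a b o u t |
above a b o v e |
```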
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: 词典是从单词到其对应标记序列的映射,并用于将解码器的搜索空间限制为仅来自词典的单词。词典文件的预期格式是每行一个单词,后跟其空格分隔的标记。
- en: '[PRE11]'
+ id: totrans-38
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE11]'
- en: Language Model[](#language-model "Permalink to this heading")
+ id: totrans-39
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 语言模型
- en: A language model can be used in decoding to improve the results, by factoring
  a language model score that represents the likelihood of the sequence into the
  beam search computation. Below, we outline the different forms of language models
  that are supported for decoding.
+ id: totrans-40
  prefs: []
  type: TYPE_NORMAL
+ zh: 在解码中可以使用语言模型来改善结果,通过将代表序列可能性的语言模型分数纳入到波束搜索计算中。下面,我们概述了支持解码的不同形式的语言模型。
- en: No Language Model[](#no-language-model "Permalink to this heading")
+ id: totrans-41
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
+ zh: 无语言模型
- en: To create a decoder instance without a language model, set lm=None when initializing
  the decoder.
+ id: totrans-42
  prefs: []
  type: TYPE_NORMAL
+ zh: 要创建一个没有语言模型的解码器实例,请在初始化解码器时设置lm=None。
- en: KenLM[](#kenlm "Permalink to this heading")
+ id: totrans-43
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
+ zh: KenLM
- en: This is an n-gram language model trained with the [KenLM library](https://kheafield.com/code/kenlm/).
  Either the `.arpa` or the binarized `.bin` LM can be used, but the binary format
  is recommended for faster loading.
+ id: totrans-44
  prefs: []
  type: TYPE_NORMAL
+ zh: 这是一个使用KenLM库训练的n-gram语言模型。可以使用`.arpa`或二进制化的`.bin`语言模型,但建议使用二进制格式以加快加载速度。
- en: The language model used in this tutorial is a 4-gram KenLM trained using [LibriSpeech](http://www.openslr.org/11).
+ id: totrans-45
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程中使用的语言模型是使用[LibriSpeech](http://www.openslr.org/11)训练的4-gram KenLM。
- en: Custom Language Model[](#custom-language-model "Permalink to this heading")
+ id: totrans-46
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
+ zh: 自定义语言模型
- en: Users can define their own custom language model in Python, whether it be a
  statistical or neural network language model, using [`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM
  "torchaudio.models.decoder.CTCDecoderLM") and [`CTCDecoderLMState`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLMState
  "torchaudio.models.decoder.CTCDecoderLMState").
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 用户可以在Python中定义自己的自定义语言模型,无论是统计还是神经网络语言模型,使用[`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM)和[`CTCDecoderLMState`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLMState)。
- en: For instance, the following code creates a basic wrapper around a PyTorch `torch.nn.Module`
  language model.
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: 例如,以下代码创建了一个围绕PyTorch `torch.nn.Module`语言模型的基本包装器。
- en: '[PRE12]'
+ id: totrans-49
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: Downloading Pretrained Files[](#downloading-pretrained-files "Permalink to this
  heading")
+ id: totrans-50
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
+ zh: 下载预训练文件
- en: Pretrained files for the LibriSpeech dataset can be downloaded using [`download_pretrained_files()`](../generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files
  "torchaudio.models.decoder.download_pretrained_files").
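A sketch of that download step (the `"librispeech-4-gram"` identifier is the set of files this tutorial works with):

```python
from torchaudio.models.decoder import download_pretrained_files

# Downloads the tokens, lexicon, and KenLM files for LibriSpeech decoding.
files = download_pretrained_files("librispeech-4-gram")

print(files.tokens)   # path to the tokens file
print(files.lexicon)  # path to the lexicon file
print(files.lm)       # path to the KenLM binary
```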
+ id: totrans-51
  prefs: []
  type: TYPE_NORMAL
+ zh: 可以使用[`download_pretrained_files()`](../generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files)下载LibriSpeech数据集的预训练文件。
- en: 'Note: this cell may take a couple of minutes to run, as the language model
  can be large'
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意:此单元格可能需要几分钟才能运行,因为语言模型可能很大
- en: '[PRE13]'
+ id: totrans-53
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE13]'
- en: '[PRE14]'
+ id: totrans-54
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE14]'
- en: Construct Decoders[](#construct-decoders "Permalink to this heading")
+ id: totrans-55
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 构建解码器
- en: In this tutorial, we construct both a beam search decoder and a greedy decoder
  for comparison.
+ id: totrans-56
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本教程中,我们构建了波束搜索解码器和贪婪解码器进行比较。
- en: Beam Search Decoder[](#beam-search-decoder "Permalink to this heading")
+ id: totrans-57
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 波束搜索解码器
- en: The decoder can be constructed using the factory function [`ctc_decoder()`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder
  "torchaudio.models.decoder.ctc_decoder"). In addition to the previously mentioned
  components, it also takes in various beam search decoding parameters and token/word
  parameters.
+ id: totrans-58
  prefs: []
  type: TYPE_NORMAL
+ zh: 可以使用工厂函数[`ctc_decoder()`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder)构建解码器。除了先前提到的组件外,它还接受各种波束搜索解码参数和标记/单词参数。
- en: This decoder can also be run without a language model by passing None to the
  lm parameter.
+ id: totrans-59
  prefs: []
  type: TYPE_NORMAL
+ zh: 这个解码器也可以在没有语言模型的情况下运行,通过将None传递给lm参数。
- en: '[PRE15]'
+ id: totrans-60
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE15]'
- en: Greedy Decoder[](#greedy-decoder "Permalink to this heading")
+ id: totrans-61
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 贪婪解码器
- en: '[PRE16]'
+ id: totrans-62
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE16]'
- en: Run Inference[](#run-inference "Permalink to this heading")
+ id: totrans-63
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 运行推理
- en: Now that we have the data, acoustic model, and decoder, we can perform inference.
  The output of the beam search decoder is of type [`CTCHypothesis`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis
  "torchaudio.models.decoder.CTCHypothesis"), consisting of the predicted token IDs,
  corresponding words (if a lexicon is provided), hypothesis score, and timesteps
  corresponding to the token IDs. Recall the transcript corresponding to the waveform
  is
+ id: totrans-64
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在我们有了数据、声学模型和解码器,我们可以执行推理。波束搜索解码器的输出类型为[`CTCHypothesis`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis),包括预测的标记ID、对应的单词(如果提供了词典)、假设分数和与标记ID对应的时间步。回想一下与波形对应的转录是
- en: '[PRE17]'
+ id: totrans-65
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE17]'
- en: '[PRE18]'
+ id: totrans-66
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE18]'
- en: The greedy decoder gives the following result.
+ id: totrans-67
  prefs: []
  type: TYPE_NORMAL
+ zh: 贪婪解码器给出以下结果。
- en: '[PRE19]'
+ id: totrans-68
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE19]'
- en: '[PRE20]'
+ id: totrans-69
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE20]'
- en: 'Using the beam search decoder:'
+ id: totrans-70
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用波束搜索解码器:
- en: '[PRE21]'
+ id: totrans-71
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE21]'
- en: '[PRE22]'
+ id: totrans-72
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE22]'
- en: Note
+ id: totrans-73
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: The [`words`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis.words
  "torchaudio.models.decoder.CTCHypothesis.words") field of the output hypotheses
  will be empty if no lexicon is provided to the decoder. To retrieve a transcript
  with lexicon-free decoding, you can perform the following to retrieve the token
  indices, convert them to original tokens, then join them together.
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果解码器没有提供词典,输出假设的[`words`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis.words
    "torchaudio.models.decoder.CTCHypothesis.words")字段将为空。要获取无词典解码的转录,可以执行以下操作:检索标记索引,将其转换为原始标记,然后将它们连接在一起。
- en: '[PRE23]'
+ id: totrans-75
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE23]'
- en: We see that the transcript with the lexicon-constrained beam search decoder
  produces a more accurate result consisting of real words, while the greedy decoder
  can predict incorrectly spelled words like “affrayd” and “shoktd”.
+ id: totrans-76
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们看到,使用受词典约束的波束搜索解码器的转录产生了更准确的结果,包含真实单词,而贪婪解码器可能会预测拼写错误的单词,如“affrayd”和“shoktd”。
- en: Incremental decoding[](#incremental-decoding "Permalink to this heading")
+ id: totrans-77
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 增量解码
- en: If the input speech is long, one can decode the emission in an incremental manner.
+ id: totrans-78
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果输入语音很长,可以以增量方式解码排放。
- en: You need to first initialize the internal state of the decoder with [`decode_begin()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_begin
  "torchaudio.models.decoder.CTCDecoder.decode_begin").
+ id: totrans-79
  prefs: []
  type: TYPE_NORMAL
+ zh: 您需要首先使用[`decode_begin()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_begin
    "torchaudio.models.decoder.CTCDecoder.decode_begin")初始化解码器的内部状态。
- en: '[PRE24]'
+ id: totrans-80
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE24]'
- en: Then, you can pass emissions to [`decode_step()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_step
  "torchaudio.models.decoder.CTCDecoder.decode_step"). Here we use the same emission
  but pass it to the decoder one frame at a time.
+ id: totrans-81
  prefs: []
  type: TYPE_NORMAL
+ zh: 然后,您可以将排放传递给[`decode_step()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_step
    "torchaudio.models.decoder.CTCDecoder.decode_step")。在这里,我们使用相同的排放,但是一次将其传递给解码器一个帧。
- en: '[PRE25]'
+ id: totrans-82
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE25]'
- en: Finally, finalize the internal state of the decoder, and retrieve the result.
+ id: totrans-83
  prefs: []
  type: TYPE_NORMAL
+ zh: 最后,完成解码器的内部状态,并检索结果。
- en: '[PRE26]'
+ id: totrans-84
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE26]'
- en: The result of incremental decoding is identical to batch decoding.
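Putting the three calls together, the incremental loop sketched in the stripped cells above could look like this (assuming `beam_search_decoder` was built with `ctc_decoder()` and `emission` has shape `(1, num_frames, num_tokens)`; the exact per-frame slicing is an assumption):

```python
beam_search_decoder.decode_begin()                         # initialize internal state
for t in range(emission.size(1)):
    beam_search_decoder.decode_step(emission[0, t:t + 1])  # one frame at a time
beam_search_decoder.decode_end()                           # finalize internal state
hypotheses = beam_search_decoder.get_final_hypothesis()
print(" ".join(hypotheses[0].words))
```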
+ id: totrans-85 prefs: [] type: TYPE_NORMAL + zh: 增量解码的结果与批量解码相同。 - en: '[PRE27]' + id: totrans-86 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: '[PRE28]' + id: totrans-87 prefs: [] type: TYPE_PRE + zh: '[PRE28]' - en: Timestep Alignments[](#timestep-alignments "Permalink to this heading") + id: totrans-88 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 时间步对齐 - en: Recall that one of the components of the resulting Hypotheses is timesteps corresponding to the token IDs. + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: 回想一下,生成的假设中的一个组成部分是与标记ID对应的时间步。 - en: '[PRE29]' + id: totrans-90 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '[PRE30]' + id: totrans-91 prefs: [] type: TYPE_PRE + zh: '[PRE30]' - en: Below, we visualize the token timestep alignments relative to the original waveform. + id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: 下面,我们将标记时间步对齐可视化相对于原始波形。 - en: '[PRE31]' + id: totrans-93 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: '![asr inference with ctc decoder tutorial](../Images/e2abf68b7cace07964d5580316ac4575.png)' + id: totrans-94 prefs: [] type: TYPE_IMG + zh: '![带有ctc解码器教程的asr推理](../Images/e2abf68b7cace07964d5580316ac4575.png)' - en: Beam Search Decoder Parameters[](#beam-search-decoder-parameters "Permalink to this heading") + id: totrans-95 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 波束搜索解码器参数 - en: In this section, we go a little bit more in depth about some different parameters and tradeoffs. For the full list of customizable parameters, please refer to the [`documentation`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder "torchaudio.models.decoder.ctc_decoder"). + id: totrans-96 prefs: [] type: TYPE_NORMAL + zh: 在本节中,我们将更深入地讨论一些不同的参数和权衡。有关可定制参数的完整列表,请参考[`文档`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder + "torchaudio.models.decoder.ctc_decoder")。 - en: Helper Function[](#helper-function "Permalink to this heading") + id: totrans-97 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 辅助函数 - en: '[PRE32]' + id: totrans-98 prefs: [] type: TYPE_PRE + zh: '[PRE32]' - en: nbest[](#nbest "Permalink to this heading") + id: totrans-99 prefs: - PREF_H3 type: TYPE_NORMAL + zh: nbest - en: This parameter indicates the number of best hypotheses to return, which is a property that is not possible with the greedy decoder. For instance, by setting `nbest=3` when constructing the beam search decoder earlier, we can now access the hypotheses with the top 3 scores. + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 此参数指示要返回的最佳假设数,这是贪婪解码器无法实现的属性。例如,在构建波束搜索解码器时设置`nbest=3`,现在我们可以访问得分最高的三个假设。 - en: '[PRE33]' + id: totrans-101 prefs: [] type: TYPE_PRE + zh: '[PRE33]' - en: '[PRE34]' + id: totrans-102 prefs: [] type: TYPE_PRE + zh: '[PRE34]' - en: beam size[](#beam-size "Permalink to this heading") + id: totrans-103 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 波束大小 - en: The `beam_size` parameter determines the maximum number of best hypotheses to hold after each decoding step. Using larger beam sizes allows for exploring a larger range of possible hypotheses which can produce hypotheses with higher scores, but it is computationally more expensive and does not provide additional gains beyond a certain point. 
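A sketch of such a beam-size sweep (assuming `files` from `download_pretrained_files()` and `emission` from the acoustic model; the `lm_weight`/`word_score` values mirror the ones used earlier in this tutorial):

```python
import time

from torchaudio.models.decoder import ctc_decoder

for beam_size in (1, 5, 50, 500):
    decoder = ctc_decoder(
        lexicon=files.lexicon,
        tokens=files.tokens,
        lm=files.lm,
        beam_size=beam_size,
        lm_weight=3.23,
        word_score=-0.26,
    )
    start = time.perf_counter()
    result = decoder(emission)
    elapsed = time.perf_counter() - start
    print(f"beam_size={beam_size:>3} ({elapsed:.2f}s):", " ".join(result[0][0].words))
```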
+ id: totrans-104
  prefs: []
  type: TYPE_NORMAL
+ zh: '`beam_size`参数确定每个解码步骤后保留的最佳假设数的最大值。使用更大的波束大小可以探索更广泛的可能假设范围,这可能会产生得分更高的假设,但在计算上更昂贵,并且在某一点之后不提供额外的收益。'
- en: In the example below, we see improvement in decoding quality as we increase
  the beam size from 1 to 5 to 50, but notice how using a beam size of 500 provides
  the same output as beam size 50 while increasing the computation time.
+ id: totrans-105
  prefs: []
  type: TYPE_NORMAL
+ zh: 在下面的示例中,我们看到随着将波束大小从1增加到5再到50,解码质量有所提高,但请注意,使用波束大小为500时提供与波束大小为50相同的输出,同时增加了计算时间。
- en: '[PRE35]'
+ id: totrans-106
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE35]'
- en: '[PRE36]'
+ id: totrans-107
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE36]'
- en: beam size token[](#beam-size-token "Permalink to this heading")
+ id: totrans-108
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 波束大小标记
- en: The `beam_size_token` parameter corresponds to the number of tokens to consider
  for expanding each hypothesis at the decoding step. Exploring a larger number of
  next possible tokens increases the range of potential hypotheses at the cost of
  computation.
+ id: totrans-109
  prefs: []
  type: TYPE_NORMAL
+ zh: '`beam_size_token`参数对应于在解码步骤中考虑扩展每个假设的标记数。探索更多可能的下一个标记数量会增加潜在假设的范围,但会增加计算成本。'
- en: '[PRE37]'
+ id: totrans-110
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE37]'
- en: '[PRE38]'
+ id: totrans-111
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE38]'
- en: beam threshold[](#beam-threshold "Permalink to this heading")
+ id: totrans-112
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 波束阈值
- en: The `beam_threshold` parameter is used to prune the stored hypotheses set at
  each decoding step, removing hypotheses whose scores are more than `beam_threshold`
  away from the highest scoring hypothesis. There is a balance between choosing smaller
  thresholds to prune more hypotheses and reduce the search space, and choosing a
  large enough threshold such that plausible hypotheses are not pruned.
+ id: totrans-113
  prefs: []
  type: TYPE_NORMAL
+ zh: '`beam_threshold`参数用于在每个解码步骤中修剪存储的假设集,删除分数高于距离最高分假设`beam_threshold`的假设。在选择较小的阈值以修剪更多假设并减少搜索空间之间存在平衡,以及选择足够大的阈值以确保不会修剪合理的假设。'
- en: '[PRE39]'
+ id: totrans-114
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE39]'
- en: '[PRE40]'
+ id: totrans-115
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE40]'
- en: language model weight[](#language-model-weight "Permalink to this heading")
+ id: totrans-116
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 语言模型权重[](#language-model-weight "跳转到此标题的永久链接")
- en: The `lm_weight` parameter is the weight to assign to the language model score,
  which is accumulated with the acoustic model score to determine the overall score.
  Larger weights encourage the model to predict next words based on the language
  model, while smaller weights give more weight to the acoustic model score instead.
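A sketch of sweeping the LM weight in the same fashion (same assumptions as the beam-size sweep above; a weight of 0 effectively ignores the language model):

```python
from torchaudio.models.decoder import ctc_decoder

for lm_weight in (0, 3.23, 15):
    decoder = ctc_decoder(
        lexicon=files.lexicon,
        tokens=files.tokens,
        lm=files.lm,
        lm_weight=lm_weight,
        word_score=-0.26,
    )
    result = decoder(emission)
    print(f"lm_weight={lm_weight}:", " ".join(result[0][0].words))
```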
+ id: totrans-117
  prefs: []
  type: TYPE_NORMAL
+ zh: '`lm_weight`参数是要分配给语言模型分数的权重,该分数将与声学模型分数累积以确定总体分数。较大的权重鼓励模型基于语言模型预测下一个单词,而较小的权重则更多地将权重放在声学模型分数上。'
- en: '[PRE41]'
+ id: totrans-118
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE41]'
- en: '[PRE42]'
+ id: totrans-119
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE42]'
- en: additional parameters[](#additional-parameters "Permalink to this heading")
+ id: totrans-120
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 其他参数[](#additional-parameters "跳转到此标题的永久链接")
- en: 'Additional parameters that can be optimized include the following:'
+ id: totrans-121
  prefs: []
  type: TYPE_NORMAL
+ zh: 可以优化的其他参数包括以下内容
- en: '`word_score`: score to add when a word finishes'
+ id: totrans-122
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`word_score`: 单词结束时要添加的分数'
- en: '`unk_score`: unknown word appearance score to add'
+ id: totrans-123
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`unk_score`: 添加未知单词出现分数'
- en: '`sil_score`: silence appearance score to add'
+ id: totrans-124
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`sil_score`: 添加静音出现分数'
- en: '`log_add`: whether to use log add for lexicon Trie smearing'
+ id: totrans-125
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`log_add`: 是否对词典Trie扩散使用对数相加'
- en: '**Total running time of the script:** ( 1 minutes 55.312 seconds)'
+ id: totrans-126
  prefs: []
  type: TYPE_NORMAL
+ zh: '**脚本的总运行时间:**(1分钟55.312秒)'
- en: '[`Download Python source code: asr_inference_with_ctc_decoder_tutorial.py`](../_downloads/da151acc525ba1fb468e2a4904659af1/asr_inference_with_ctc_decoder_tutorial.py)'
+ id: totrans-127
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`下载Python源代码:asr_inference_with_ctc_decoder_tutorial.py`](../_downloads/da151acc525ba1fb468e2a4904659af1/asr_inference_with_ctc_decoder_tutorial.py)'
- en: '[`Download Jupyter notebook: asr_inference_with_ctc_decoder_tutorial.ipynb`](../_downloads/ade1a3c3b444796d2a34839c7ea75426/asr_inference_with_ctc_decoder_tutorial.ipynb)'
+ id: totrans-128
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`下载Jupyter笔记本:asr_inference_with_ctc_decoder_tutorial.ipynb`](../_downloads/ade1a3c3b444796d2a34839c7ea75426/asr_inference_with_ctc_decoder_tutorial.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
+ id: totrans-129
  prefs: []
  type: TYPE_NORMAL
+ zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
diff --git a/totrans/aud22_36.yaml b/totrans/aud22_36.yaml index 7c8faf9b10e086a27362f9e21652eb7bb927ea9c..cd987eb71c8d58d31fac66bed0ddd7ba181f5334 100644 --- a/totrans/aud22_36.yaml +++ b/totrans/aud22_36.yaml @@ -1,293 +1,449 @@ - en: ASR Inference with CUDA CTC Decoder
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 使用CUDA CTC解码器进行ASR推理
- en: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html)
- en: Note
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)
  to download the full example code
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 点击[这里](#sphx-glr-download-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)下载完整示例代码
- en: '**Author**: [Yuekai Zhang](mailto:yuekaiz%40nvidia.com)'
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: '**作者**:[Yuekai Zhang](mailto:yuekaiz%40nvidia.com)'
- en: This tutorial shows how to perform speech
recognition inference using a CUDA-based CTC beam search decoder. We demonstrate
  this on a pretrained [Zipformer](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc)
  model from the [Next-gen Kaldi](https://nadirapovey.com/next-gen-kaldi-what-is-it)
  project.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程展示了如何使用基于CUDA的CTC波束搜索解码器执行语音识别推理。我们在来自[Next-gen Kaldi](https://nadirapovey.com/next-gen-kaldi-what-is-it)项目的预训练[Zipformer](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc)模型上演示了这一点。
- en: Overview[](#overview "Permalink to this heading")
+ id: totrans-6
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 概述
- en: Beam search decoding works by iteratively expanding text hypotheses (beams)
  with next possible characters, and maintaining only the hypotheses with the highest
  scores at each time step.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 波束搜索解码通过迭代地扩展文本假设(波束)与下一个可能的字符,并在每个时间步仅保留得分最高的假设来工作。
- en: The underlying implementation uses CUDA to accelerate the whole decoding process.
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 底层实现使用cuda来加速整个解码过程
- en: A mathematical formula for the decoder can be
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 解码器的数学公式可以是
- en: found in the [paper](https://arxiv.org/pdf/1408.2873.pdf), and a more detailed
  algorithm can be found in this [blog](https://distill.pub/2017/ctc/).
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 在[论文](https://arxiv.org/pdf/1408.2873.pdf)中找到,并且更详细的算法可以在这个[博客](https://distill.pub/2017/ctc/)中找到。
- en: 'Running ASR inference using a CUDA CTC Beam Search decoder requires the following
  components:'
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用CUDA CTC波束搜索解码器运行ASR推理需要以下组件
- en: 'Acoustic Model: model predicting modeling units (BPE in this tutorial) from
  acoustic features'
+ id: totrans-12
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 声学模型:从声学特征预测建模单元(本教程中为BPE)的模型
- en: 'BPE Model: the byte-pair encoding (BPE) tokenizer file'
+ id: totrans-13
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: BPE模型:字节对编码(BPE)分词器文件
- en: Acoustic Model and Set Up[](#acoustic-model-and-set-up "Permalink to this heading")
+ id: totrans-14
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 声学模型和设置
- en: First, we import the necessary utilities and fetch the data that we are working
  with.
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先,我们导入必要的工具并获取我们要处理的数据
- en: '[PRE0]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[PRE1]'
+ id: totrans-17
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: '[PRE2]'
+ id: totrans-18
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: We use the pretrained [Zipformer](https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-ctc-2022-12-01)
  model that is trained on the [LibriSpeech dataset](http://www.openslr.org/12).
  The model is jointly trained with CTC and Transducer loss functions. In this tutorial,
  we only use the CTC head of the model.
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用预训练的[Zipformer](https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-ctc-2022-12-01)模型,该模型在[LibriSpeech数据集](http://www.openslr.org/12)上进行了训练。该模型同时使用CTC和Transducer损失函数进行训练。在本教程中,我们仅使用模型的CTC头部。
- en: '[PRE3]'
+ id: totrans-20
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: '[PRE4]'
+ id: totrans-21
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: We will load a sample from the LibriSpeech test-other dataset.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将从LibriSpeech测试其他数据集中加载一个样本。
- en: '[PRE5]'
+ id: totrans-23
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: '[PRE6]'
+ id: totrans-24
  prefs: []
  type: TYPE_PRE
-- en:
+ zh: '[PRE6]'
+- en: null
+ id: totrans-25
  prefs: []
  type: TYPE_NORMAL
- en: Your browser does not support the audio element.
+ id: totrans-26
  prefs: []
  type: TYPE_NORMAL
+ zh: 您的浏览器不支持音频元素。
- en: The transcript corresponding to this audio file is
+ id: totrans-27
  prefs: []
  type: TYPE_NORMAL
+ zh: 与此音频文件对应的抄本是
- en: '[PRE7]'
+ id: totrans-28
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: Files and Data for Decoder[](#files-and-data-for-decoder "Permalink to this
  heading")
+ id: totrans-29
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 解码器的文件和数据
- en: Next, we load in our tokens from the BPE model, which is the tokenizer for decoding.
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们从BPE模型中加载我们的标记,这是用于解码的分词器。
- en: Tokens[](#tokens "Permalink to this heading")
+ id: totrans-31
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 标记
- en: The tokens are the possible symbols that the acoustic model can predict, including
  the blank symbol in CTC. In this tutorial, it includes 500 BPE tokens. They can
  either be passed in as a file, where each line consists of the tokens corresponding
  to the same index, or as a list of tokens, each mapping to a unique index.
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 标记是声学模型可以预测的可能符号,包括CTC中的空白符号。在本教程中,它包括500个BPE标记。它可以作为文件传入,其中每行包含与相同索引对应的标记,或作为标记列表传入,每个标记映射到一个唯一的索引。
- en: '[PRE8]'
+ id: totrans-33
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: '[PRE9]'
+ id: totrans-34
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: '[PRE10]'
+ id: totrans-35
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE10]'
- en: Construct CUDA Decoder[](#construct-cuda-decoder "Permalink to this heading")
+ id: totrans-36
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 构建CUDA解码器
- en: In this tutorial, we will construct a CUDA beam search decoder. The decoder
  can be constructed using the factory function [`cuda_ctc_decoder()`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
  "torchaudio.models.decoder.cuda_ctc_decoder").
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本教程中,我们将构建一个CUDA波束搜索解码器。可以使用工厂函数[`cuda_ctc_decoder()`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
    "torchaudio.models.decoder.cuda_ctc_decoder")来构建解码器。
- en: '[PRE11]'
+ id: totrans-38
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE11]'
- en: Run Inference[](#run-inference "Permalink to this heading")
+ id: totrans-39
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 运行推理
- en: Now that we have the data, acoustic model, and decoder, we can perform inference.
  The output of the beam search decoder is of type [`CUCTCHypothesis`](../generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCHypothesis
  "torchaudio.models.decoder.CUCTCHypothesis"), consisting of the predicted token
  IDs, words (symbols corresponding to the token IDs), and hypothesis scores.
Recall the transcript corresponding to the waveform is
+ id: totrans-40
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在我们有了数据、声学模型和解码器,我们可以执行推理。波束搜索解码器的输出类型为[`CUCTCHypothesis`](../generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCHypothesis
    "torchaudio.models.decoder.CUCTCHypothesis"),包括预测的标记ID、单词(与标记ID对应的符号)和假设分数。回想一下与波形对应的抄本是
- en: '[PRE12]'
+ id: totrans-41
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: '[PRE13]'
+ id: totrans-42
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE13]'
- en: '[PRE14]'
+ id: totrans-43
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE14]'
- en: The CUDA CTC decoder gives the following result.
+ id: totrans-44
  prefs: []
  type: TYPE_NORMAL
+ zh: cuda ctc解码器给出以下结果。
- en: '[PRE15]'
+ id: totrans-45
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE15]'
- en: '[PRE16]'
+ id: totrans-46
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE16]'
- en: Beam Search Decoder Parameters[](#beam-search-decoder-parameters "Permalink
  to this heading")
+ id: totrans-47
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 波束搜索解码器参数
- en: In this section, we go a little bit more in depth about some different parameters
  and tradeoffs. For the full list of customizable parameters, please refer to the
  [`documentation`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
  "torchaudio.models.decoder.cuda_ctc_decoder").
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本节中,我们将更深入地讨论一些不同参数和权衡。有关可自定义参数的完整列表,请参考[`文档`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
    "torchaudio.models.decoder.cuda_ctc_decoder")。
- en: Helper Function[](#helper-function "Permalink to this heading")
+ id: totrans-49
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 辅助函数[](#helper-function "跳转到此标题")
- en: '[PRE17]'
+ id: totrans-50
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE17]'
- en: nbest[](#nbest "Permalink to this heading")
+ id: totrans-51
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: nbest[](#nbest "跳转到此标题")
- en: This parameter indicates the number of best hypotheses to return. For instance,
  by setting `nbest=10` when constructing the beam search decoder earlier, we can
  now access the hypotheses with the top 10 scores.
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: 此参数表示要返回的最佳假设数量。例如,在之前构建波束搜索解码器时设置 `nbest=10`,现在我们可以访问得分前10名的假设。
- en: '[PRE18]'
+ id: totrans-53
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE18]'
- en: '[PRE19]'
+ id: totrans-54
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE19]'
- en: beam size[](#beam-size "Permalink to this heading")
+ id: totrans-55
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 波束大小[](#beam-size "跳转到此标题")
- en: The `beam_size` parameter determines the maximum number of best hypotheses to
  hold after each decoding step. Using larger beam sizes allows for exploring a larger
  range of possible hypotheses which can produce hypotheses with higher scores, but
  it does not provide additional gains beyond a certain point. We recommend setting
  beam_size=10 for the CUDA beam search decoder.
+ id: totrans-56
  prefs: []
  type: TYPE_NORMAL
+ zh: '`beam_size`参数确定每个解码步骤后保留的最佳假设数量上限。使用更大的波束大小可以探索更广泛的可能假设范围,这可以产生得分更高的假设,但在一定程度上不会提供额外的收益。我们建议为cuda波束搜索解码器设置`beam_size=10`。'
- en: In the example below, we see improvement in decoding quality as we increase
  the beam size from 1 to 3, but notice how using a beam size of 3 provides the same
  output as beam size 10.
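A sketch of such a comparison (assuming `tokens`, `log_prob`, and `encoder_out_lens` from the earlier cells, with the emissions resident on a CUDA device; note the hypothesis `words` are BPE symbols rather than whitespace-separated words):

```python
import torch

from torchaudio.models.decoder import cuda_ctc_decoder

for beam_size in (1, 3, 10):
    decoder = cuda_ctc_decoder(
        tokens, nbest=1, beam_size=beam_size, blank_skip_threshold=0.95
    )
    results = decoder(log_prob, encoder_out_lens.to(torch.int32))
    best = results[0][0]  # top hypothesis for the first utterance
    print(f"beam_size={beam_size}: score={best.score:.2f}", best.words)
```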
+ id: totrans-57
  prefs: []
  type: TYPE_NORMAL
+ zh: 在下面的示例中,我们可以看到随着波束大小从1增加到3,解码质量有所提高,但请注意,使用波束大小为3时提供与波束大小为10相同的输出。
- en: '[PRE20]'
+ id: totrans-58
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE20]'
- en: '[PRE21]'
+ id: totrans-59
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE21]'
- en: blank skip threshold[](#blank-skip-threshold "Permalink to this heading")
+ id: totrans-60
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: blank skip threshold[](#blank-skip-threshold "跳转到此标题")
- en: The `blank_skip_threshold` parameter is used to prune frames which have a large
  blank probability. Pruning these frames with a good blank_skip_threshold can speed
  up the decoding process a lot with no drop in accuracy. Per the rules of CTC, we
  keep at least one blank frame between two non-blank frames to avoid mistakenly
  merging two consecutive identical symbols. We recommend setting blank_skip_threshold=0.95
  for the CUDA beam search decoder.
+ id: totrans-61
  prefs: []
  type: TYPE_NORMAL
+ zh: '`blank_skip_threshold`参数用于修剪具有较大空白概率的帧。使用良好的`blank_skip_threshold`修剪这些帧可以大大加快解码过程,而不会降低准确性。根据CTC规则,我们应至少在两个非空白帧之间保留一个空白帧,以避免错误地合并两个连续相同的符号。我们建议为cuda波束搜索解码器设置`blank_skip_threshold=0.95`。'
- en: '[PRE22]'
+ id: totrans-62
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE22]'
- en: '[PRE23]'
+ id: totrans-63
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE23]'
- en: Benchmark with flashlight CPU decoder[](#benchmark-with-flashlight-cpu-decoder
  "Permalink to this heading")
+ id: totrans-64
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 使用手电筒CPU解码器进行基准测试[](#benchmark-with-flashlight-cpu-decoder "跳转到此标题")
- en: We benchmark the throughput and accuracy of the CUDA decoder against the CPU
  decoder using the LibriSpeech test_other set. To reproduce the benchmark results
  below, you may refer [here](https://github.com/pytorch/audio/tree/main/examples/asr/librispeech_cuda_ctc_decoder).
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用librispeech test_other数据集对CUDA解码器和CPU解码器之间的吞吐量和准确性进行基准测试。要重现下面的基准测试结果,您可以参考[这里](https://github.com/pytorch/audio/tree/main/examples/asr/librispeech_cuda_ctc_decoder)。
- en: '| Decoder | Setting | WER (%) | N-Best Oracle WER (%) | Decoder Cost Time (seconds)
  |'
+ id: totrans-66
  prefs: []
  type: TYPE_TB
+ zh: '| 解码器 | 设置 | WER (%) | N-Best Oracle WER (%) | 解码器成本时间 (秒) |'
- en: '| --- | --- | --- | --- | --- |'
+ id: totrans-67
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- | --- | --- |'
- en: '| CUDA decoder | blank_skip_threshold 0.95 | 5.81 | 4.11 | 2.57 |'
+ id: totrans-68
  prefs: []
  type: TYPE_TB
+ zh: '| CUDA解码器 | blank_skip_threshold 0.95 | 5.81 | 4.11 | 2.57 |'
- en: '| CUDA decoder | blank_skip_threshold 1.0 (no frame-skip) | 5.81 | 4.09 | 6.24
  |'
+ id: totrans-69
  prefs: []
  type: TYPE_TB
+ zh: '| CUDA解码器 | blank_skip_threshold 1.0 (无帧跳过) | 5.81 | 4.09 | 6.24 |'
- en: '| CPU decoder | beam_size_token 10 | 5.86 | 4.30 | 28.61 |'
+ id: totrans-70
  prefs: []
  type: TYPE_TB
+ zh: '| CPU解码器 | beam_size_token 10 | 5.86 | 4.30 | 28.61 |'
- en: '| CPU decoder | beam_size_token 500 | 5.86 | 4.30 | 791.80 |'
+ id: totrans-71
  prefs: []
  type: TYPE_TB
+ zh: '| CPU解码器 | beam_size_token 500 | 5.86 | 4.30 | 791.80 |'
- en: From the table above, the CUDA decoder gives a slight improvement in WER and
  a significant increase in throughput.
- en: Benchmark with flashlight CPU decoder[](#benchmark-with-flashlight-cpu-decoder
    "Permalink to this heading")
+ id: totrans-64
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 使用flashlight CPU解码器进行基准测试[](#benchmark-with-flashlight-cpu-decoder "跳转到此标题")
- en: We benchmark the throughput and accuracy between the CUDA decoder and the CPU
    decoder using the librispeech test_other set. To reproduce the benchmark results
    below, you may refer to [this example](https://github.com/pytorch/audio/tree/main/examples/asr/librispeech_cuda_ctc_decoder).
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用librispeech test_other数据集对CUDA解码器和CPU解码器之间的吞吐量和准确性进行基准测试。要重现下面的基准测试结果,您可以参考[这个示例](https://github.com/pytorch/audio/tree/main/examples/asr/librispeech_cuda_ctc_decoder)。
- en: '| Decoder | Setting | WER (%) | N-Best Oracle WER (%) | Decoder Cost Time (seconds)
    |'
+ id: totrans-66
  prefs: []
  type: TYPE_TB
+ zh: '| 解码器 | 设置 | WER (%) | N-Best Oracle WER (%) | 解码器耗时 (秒) |'
- en: '| --- | --- | --- | --- | --- |'
+ id: totrans-67
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- | --- | --- |'
- en: '| CUDA decoder | blank_skip_threshold 0.95 | 5.81 | 4.11 | 2.57 |'
+ id: totrans-68
  prefs: []
  type: TYPE_TB
+ zh: '| CUDA解码器 | blank_skip_threshold 0.95 | 5.81 | 4.11 | 2.57 |'
- en: '| CUDA decoder | blank_skip_threshold 1.0 (no frame-skip) | 5.81 | 4.09 | 6.24
    |'
+ id: totrans-69
  prefs: []
  type: TYPE_TB
+ zh: '| CUDA解码器 | blank_skip_threshold 1.0 (无帧跳过) | 5.81 | 4.09 | 6.24 |'
- en: '| CPU decoder | beam_size_token 10 | 5.86 | 4.30 | 28.61 |'
+ id: totrans-70
  prefs: []
  type: TYPE_TB
+ zh: '| CPU解码器 | beam_size_token 10 | 5.86 | 4.30 | 28.61 |'
- en: '| CPU decoder | beam_size_token 500 | 5.86 | 4.30 | 791.80 |'
+ id: totrans-71
  prefs: []
  type: TYPE_TB
+ zh: '| CPU解码器 | beam_size_token 500 | 5.86 | 4.30 | 791.80 |'
- en: As the above table shows, the CUDA decoder gives a slight improvement in WER
    and a significant increase in throughput.
+ id: totrans-72
  prefs: []
  type: TYPE_NORMAL
+ zh: 从上表中可以看出,CUDA解码器在WER方面略有改善,并且吞吐量显著增加。
- en: '**Total running time of the script:** ( 0 minutes 8.752 seconds)'
+ id: totrans-73
  prefs: []
  type: TYPE_NORMAL
+ zh: '**脚本的总运行时间:** ( 0 分钟 8.752 秒)'
- en: '[`Download Python source code: asr_inference_with_cuda_ctc_decoder_tutorial.py`](../_downloads/3956cf493d21711e687e9610c91f9cd1/asr_inference_with_cuda_ctc_decoder_tutorial.py)'
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`下载Python源代码: asr_inference_with_cuda_ctc_decoder_tutorial.py`](../_downloads/3956cf493d21711e687e9610c91f9cd1/asr_inference_with_cuda_ctc_decoder_tutorial.py)'
- en: '[`Download Jupyter notebook: asr_inference_with_cuda_ctc_decoder_tutorial.ipynb`](../_downloads/96982138e59c541534342222a3f5c69e/asr_inference_with_cuda_ctc_decoder_tutorial.ipynb)'
+ id: totrans-75
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`下载Jupyter笔记本: asr_inference_with_cuda_ctc_decoder_tutorial.ipynb`](../_downloads/96982138e59c541534342222a3f5c69e/asr_inference_with_cuda_ctc_decoder_tutorial.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
+ id: totrans-76
  prefs: []
  type: TYPE_NORMAL
+ zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
diff --git a/totrans/aud22_37.yaml b/totrans/aud22_37.yaml
index 031be106904633eaed0cc09cc26908168c8187e1..b42d615a341119df88a1bbf3cd9c40d4596594ab 100644
--- a/totrans/aud22_37.yaml
+++ b/totrans/aud22_37.yaml
@@ -1,378 +1,598 @@
- en: Online ASR with Emformer RNN-T
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 使用Emformer RNN-T进行在线ASR
- en: 原文:[https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html](https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html](https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html)
- en: Note
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-online-asr-tutorial-py) to download
    the full example code
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 点击[这里](#sphx-glr-download-tutorials-online-asr-tutorial-py)下载完整示例代码
- en: '**Author**: [Jeff Hwang](mailto:jeffhwang%40meta.com), [Moto Hira](mailto:moto%40meta.com)'
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: '**作者**:[Jeff Hwang](mailto:jeffhwang%40meta.com), [Moto Hira](mailto:moto%40meta.com)'
- en: This tutorial shows how to use Emformer RNN-T and the streaming API to perform
    online speech recognition.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程展示了如何使用Emformer RNN-T和流式API执行在线语音识别。
- en: Note
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: This tutorial requires FFmpeg libraries and SentencePiece.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程需要使用FFmpeg库和SentencePiece。
- en: Please refer to [Optional Dependencies](../installation.html#optional-dependencies)
    for the details.
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 有关详细信息,请参阅[可选依赖项](../installation.html#optional-dependencies)。
- en: 1\. Overview[](#overview "Permalink to this heading")
+ id: totrans-9
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 1\. 概述[](#overview "跳转到此标题的永久链接")
- en: 'Performing online speech recognition consists of the following steps:'
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 在线语音识别由以下步骤组成:
- en: 'Build the inference pipeline. Emformer RNN-T is composed of three components:
    feature extractor, decoder and token processor (see the sketch after this list).'
+ id: totrans-11
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 构建推理管道。Emformer RNN-T由三个组件组成:特征提取器、解码器和标记处理器(参见此列表后的示例)。
- en: Format the waveform into chunks of expected sizes.
+ id: totrans-12
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 将波形格式化为预期大小的块。
- en: Pass data through the pipeline.
+ id: totrans-13
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 通过管道传递数据。
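As a sketch of step 1, the three components can be obtained from the bundle as follows; the method names follow the `torchaudio.pipelines.RNNTBundle` documentation, and the actual cells of this tutorial build the same objects later.

```python
# Hedged sketch of step 1: instantiate the three components bundled with
# the Emformer RNN-T pipeline used in this tutorial.
import torchaudio

bundle = torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
feature_extractor = bundle.get_streaming_feature_extractor()  # waveform chunk -> features
decoder = bundle.get_decoder()                                # RNN-T beam search decoder
token_processor = bundle.get_token_processor()                # token IDs -> text
```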
- en: 2\. Preparation[](#preparation "Permalink to this heading")
+ id: totrans-14
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 2\. 准备[](#preparation "跳转到此标题的永久链接")
- en: '[PRE0]'
+ id: totrans-15
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[PRE1]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: 3\. Construct the pipeline[](#construct-the-pipeline "Permalink to this heading")
+ id: totrans-17
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 3\. 构建管道[](#construct-the-pipeline "跳转到此标题的永久链接")
- en: Pre-trained model weights and related pipeline components are bundled as [`torchaudio.pipelines.RNNTBundle`](../generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle
    "torchaudio.pipelines.RNNTBundle").
+ id: totrans-18
  prefs: []
  type: TYPE_NORMAL
+ zh: 预训练模型权重和相关管道组件被捆绑为[`torchaudio.pipelines.RNNTBundle`](../generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle)。
- en: We use [`torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH`](../generated/torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH.html#torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
    "torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH"), which is an Emformer RNN-T
    model trained on the LibriSpeech dataset.
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用[`torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH`](../generated/torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH.html#torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH),这是在LibriSpeech数据集上训练的Emformer
    RNN-T模型。
- en: '[PRE2]'
+ id: totrans-20
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: '[PRE3]'
+ id: totrans-21
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: Streaming inference works on input data with overlap. The Emformer RNN-T model
    treats the newest portion of the input data as the “right context” — a preview
    of future context. In each inference call, the model expects the main segment
    to start from this right context from the previous inference call. The following
    figure illustrates this.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 流式推理适用于具有重叠的输入数据。Emformer RNN-T模型将输入数据的最新部分视为“右上下文” —— 未来上下文的预览。在每次推理调用中,模型期望主段从上一次推理调用的右上下文开始。以下图示说明了这一点。
- en: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_context.png](../Images/0e1c9a1ab0a1725ac44a8f5ae79784d9.png)'
+ id: totrans-23
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_context.png](../Images/0e1c9a1ab0a1725ac44a8f5ae79784d9.png)'
- en: The size of the main segment and the right context, along with the expected
    sample rate, can be retrieved from the bundle.
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: 主段和右上下文的大小,以及预期的采样率可以从bundle中检索。
- en: '[PRE4]'
+ id: totrans-25
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: '[PRE5]'
+ id: totrans-26
  prefs: []
  type: TYPE_PRE
-- en: 4\. Configure the audio stream[](#configure-the-audio-stream "Permalink to
-  this heading")
+ zh: '[PRE5]'
+- en: 4\. Configure the audio stream[](#configure-the-audio-stream "Permalink to this
+  heading")
+ id: totrans-27
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 4\. 配置音频流[](#configure-the-audio-stream "跳转到此标题的永久链接")
- en: Next, we configure the input audio stream using [`torchaudio.io.StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
    "torchaudio.io.StreamReader").
+ id: totrans-28
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们使用[`torchaudio.io.StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader)配置输入音频流。
- en: For the details of this API, please refer to the [StreamReader Basic Usage](./streamreader_basic_tutorial.html).
+ id: totrans-29
  prefs: []
  type: TYPE_NORMAL
+ zh: 有关此API的详细信息,请参阅[StreamReader基本用法](./streamreader_basic_tutorial.html)。
- en: The following audio file was originally published by the LibriVox project, and
    it is in the public domain.
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下音频文件最初由LibriVox项目发布,属于公共领域。
- en: '[https://librivox.org/great-pirate-stories-by-joseph-lewis-french/](https://librivox.org/great-pirate-stories-by-joseph-lewis-french/)'
+ id: totrans-31
  prefs: []
  type: TYPE_NORMAL
+ zh: '[https://librivox.org/great-pirate-stories-by-joseph-lewis-french/](https://librivox.org/great-pirate-stories-by-joseph-lewis-french/)'
- en: It was re-uploaded for the sake of the tutorial.
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 出于教程目的,它被重新上传。
- en: '[PRE6]'
+ id: totrans-33
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: '[PRE7]'
+ id: totrans-34
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: As previously explained, the Emformer RNN-T model expects input data with overlaps;
    however, the Streamer iterates the source media without overlap, so we make a
    helper structure that caches a part of the input data from the Streamer as the
    right context and then prepends it to the next input data from the Streamer.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 如前所述,Emformer RNN-T模型期望具有重叠的输入数据;然而,Streamer在没有重叠的情况下迭代源媒体,因此我们制作了一个辅助结构,从Streamer缓存一部分输入数据作为右上下文,然后将其附加到来自Streamer的下一个输入数据。
- en: The following figure illustrates this.
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下图示说明了这一点。
- en: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_streamer_context.png](../Images/a57362a983bfc8977c146b9cec1fbdc5.png)'
+ id: totrans-37
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_streamer_context.png](../Images/a57362a983bfc8977c146b9cec1fbdc5.png)'
- en: '[PRE8]'
+ id: totrans-38
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
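For reference, a minimal sketch of such a cacher, assuming 1-D waveform chunks and lengths given in samples (the tutorial's own implementation is the code cell directly above):

```python
# Hedged sketch of a context cacher; assumes 1-D waveform chunks and
# segment/context lengths expressed in samples.
import torch

class ContextCacher:
    def __init__(self, segment_length: int, context_length: int):
        self.segment_length = segment_length
        self.context_length = context_length
        self.context = torch.zeros([context_length])

    def __call__(self, chunk: torch.Tensor) -> torch.Tensor:
        # Pad the final, possibly shorter chunk to the full segment length.
        if chunk.size(0) < self.segment_length:
            chunk = torch.nn.functional.pad(chunk, (0, self.segment_length - chunk.size(0)))
        chunk_with_context = torch.cat((self.context, chunk))
        # The tail of the current chunk becomes the next call's right context.
        self.context = chunk[-self.context_length:]
        return chunk_with_context
```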
- en: 5\. Run stream inference[](#run-stream-inference "Permalink to this heading")
+ id: totrans-39
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 5\. 运行流推理[](#run-stream-inference "跳转到此标题的永久链接")
- en: Finally, we run the recognition.
+ id: totrans-40
  prefs: []
  type: TYPE_NORMAL
+ zh: 最后,我们运行识别。
- en: First, we initialize the stream iterator, the context cacher, and the state
    and hypothesis that are used by the decoder to carry over the decoding state between
    inference calls.
+ id: totrans-41
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先,我们初始化流迭代器、上下文缓存器以及解码器使用的状态和假设,用于在推理调用之间传递解码状态。
- en: '[PRE9]'
+ id: totrans-42
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: Next, we run the inference.
+ id: totrans-43
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们运行推理。
- en: For the sake of better display, we create a helper function that processes the
    source stream up to the given number of times, and we call it repeatedly.
+ id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: 为了更好地显示,我们创建了一个辅助函数,该函数处理源流直到给定次数,并重复调用它。 - en: '[PRE10]' + id: totrans-45 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: '[PRE11]' + id: totrans-46 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: '![MelSpectrogram Feature](../Images/6f88cad1fa15680732704d2ab1568895.png)' + id: totrans-47 prefs: [] type: TYPE_IMG + zh: '![MelSpectrogram特征](../Images/6f88cad1fa15680732704d2ab1568895.png)' - en: '[PRE12]' + id: totrans-48 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE12]' +- en: null + id: totrans-49 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE13]' + id: totrans-51 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: '![MelSpectrogram Feature](../Images/63ea9ff950b6828668774e9e16e2da72.png)' + id: totrans-52 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/63ea9ff950b6828668774e9e16e2da72.png)' - en: '[PRE14]' + id: totrans-53 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE14]' +- en: null + id: totrans-54 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE15]' + id: totrans-56 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: '![MelSpectrogram Feature](../Images/9fd0eaf340cc4769da822a728893c8d0.png)' + id: totrans-57 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/9fd0eaf340cc4769da822a728893c8d0.png)' - en: '[PRE16]' + id: totrans-58 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE16]' +- en: null + id: totrans-59 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE17]' + id: totrans-61 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: '![MelSpectrogram Feature](../Images/27361e962edf9ff4e1dc7a554b09d885.png)' + id: totrans-62 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/27361e962edf9ff4e1dc7a554b09d885.png)' - en: '[PRE18]' + id: totrans-63 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE18]' +- en: null + id: totrans-64 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE19]' + id: totrans-66 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '![MelSpectrogram Feature](../Images/78b4f08b9d73ca155002dca9b67d5139.png)' + id: totrans-67 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/78b4f08b9d73ca155002dca9b67d5139.png)' - en: '[PRE20]' + id: totrans-68 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE20]' +- en: null + id: totrans-69 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE21]' + id: totrans-71 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '![MelSpectrogram Feature](../Images/8e43113644bb019dfc4bb4603e5bc696.png)' + id: totrans-72 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/8e43113644bb019dfc4bb4603e5bc696.png)' - en: '[PRE22]' + id: totrans-73 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE22]' +- en: null + id: totrans-74 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
+ id: totrans-75 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE23]' + id: totrans-76 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '![MelSpectrogram Feature](../Images/74f496d6db06d496150b2e6b919a7fea.png)' + id: totrans-77 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/74f496d6db06d496150b2e6b919a7fea.png)' - en: '[PRE24]' + id: totrans-78 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE24]' +- en: null + id: totrans-79 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE25]' + id: totrans-81 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '![MelSpectrogram Feature](../Images/1d8004d0bd1aaa132e299f5e7b3f4d65.png)' + id: totrans-82 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/1d8004d0bd1aaa132e299f5e7b3f4d65.png)' - en: '[PRE26]' + id: totrans-83 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE26]' +- en: null + id: totrans-84 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-85 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE27]' + id: totrans-86 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: '![MelSpectrogram Feature](../Images/078602e6329acdc28d9f151361d84fa4.png)' + id: totrans-87 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/078602e6329acdc28d9f151361d84fa4.png)' - en: '[PRE28]' + id: totrans-88 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE28]' +- en: null + id: totrans-89 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE29]' + id: totrans-91 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '![MelSpectrogram Feature](../Images/09c62d29a7ebfdca810fb7715b4d6deb.png)' + id: totrans-92 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/09c62d29a7ebfdca810fb7715b4d6deb.png)' - en: '[PRE30]' + id: totrans-93 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE30]' +- en: null + id: totrans-94 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE31]' + id: totrans-96 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: '![MelSpectrogram Feature](../Images/bd6f77d39b92dab706c4579cee78d49b.png)' + id: totrans-97 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/bd6f77d39b92dab706c4579cee78d49b.png)' - en: '[PRE32]' + id: totrans-98 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE32]' +- en: null + id: totrans-99 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE33]' + id: totrans-101 prefs: [] type: TYPE_PRE + zh: '[PRE33]' - en: '![MelSpectrogram Feature](../Images/1d08a0f2dfb8662795d4a456d55369b9.png)' + id: totrans-102 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/1d08a0f2dfb8662795d4a456d55369b9.png)' - en: '[PRE34]' + id: totrans-103 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE34]' +- en: null + id: totrans-104 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
+ id: totrans-105 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE35]' + id: totrans-106 prefs: [] type: TYPE_PRE + zh: '[PRE35]' - en: '![MelSpectrogram Feature](../Images/b5ffe860eeae95b44bae565c68a36a14.png)' + id: totrans-107 prefs: [] type: TYPE_IMG + zh: '![Mel频谱特征](../Images/b5ffe860eeae95b44bae565c68a36a14.png)' - en: '[PRE36]' + id: totrans-108 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE36]' +- en: null + id: totrans-109 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 'Tag: [`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")' + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: 标签:[`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io") - en: '**Total running time of the script:** ( 1 minutes 34.955 seconds)' + id: totrans-112 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(1分钟34.955秒)' - en: '[`Download Python source code: online_asr_tutorial.py`](../_downloads/f9f593098569966df0b815e29c13dd20/online_asr_tutorial.py)' + id: totrans-113 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:online_asr_tutorial.py`](../_downloads/f9f593098569966df0b815e29c13dd20/online_asr_tutorial.py)' - en: '[`Download Jupyter notebook: online_asr_tutorial.ipynb`](../_downloads/bd34dff0656a1aa627d444a8d1a5957f/online_asr_tutorial.ipynb)' + id: totrans-114 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:online_asr_tutorial.ipynb`](../_downloads/bd34dff0656a1aa627d444a8d1a5957f/online_asr_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-115 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_38.yaml b/totrans/aud22_38.yaml index 65ecf1e57c394ccbce7b8eef87240dedec6e7e13..9d04b09e1c6af001fbb75f3c31f7457804be46b2 100644 --- a/totrans/aud22_38.yaml +++ b/totrans/aud22_38.yaml @@ -1,218 +1,338 @@ - en: Device ASR with Emformer RNN-T + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用Emformer RNN-T的设备ASR - en: 原文:[https://pytorch.org/audio/stable/tutorials/device_asr.html](https://pytorch.org/audio/stable/tutorials/device_asr.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/tutorials/device_asr.html](https://pytorch.org/audio/stable/tutorials/device_asr.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-device-asr-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-device-asr-py)下载完整示例代码 - en: '**Author**: [Moto Hira](mailto:moto%40meta.com), [Jeff Hwang](mailto:jeffhwang%40meta.com).' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Moto Hira](mailto:moto%40meta.com), [Jeff Hwang](mailto:jeffhwang%40meta.com)。' - en: This tutorial shows how to use Emformer RNN-T and streaming API to perform speech recognition on a streaming device input, i.e. microphone on laptop. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了如何使用Emformer RNN-T和流式API在流式设备输入上执行语音识别,即笔记本电脑上的麦克风。 - en: Note + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: This tutorial requires FFmpeg libraries. Please refer to [FFmpeg dependency](../installation.html#ffmpeg-dependency) for the detail. 
+ id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 本教程需要FFmpeg库。请参考[FFmpeg依赖](../installation.html#ffmpeg-dependency)获取详细信息。 - en: Note + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: This tutorial was tested on MacBook Pro and Dynabook with Windows 10. + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: 本教程在MacBook Pro和安装了Windows 10的Dynabook上进行了测试。 - en: This tutorial does NOT work on Google Colab because the server running this tutorial does not have a microphone that you can talk to. + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: 本教程在Google Colab上不起作用,因为运行本教程的服务器没有可以与之交谈的麦克风。 - en: 1\. Overview[](#overview "Permalink to this heading") + id: totrans-11 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 1\. 概述[](#overview "Permalink to this heading") - en: We use streaming API to fetch audio from audio device (microphone) chunk by chunk, then run inference using Emformer RNN-T. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 我们使用流式API逐块从音频设备(麦克风)获取音频,然后使用Emformer RNN-T进行推理。 - en: For the basic usage of the streaming API and Emformer RNN-T please refer to [StreamReader Basic Usage](./streamreader_basic_tutorial.html) and [Online ASR with Emformer RNN-T](./online_asr_tutorial.html). + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: 有关流式API和Emformer RNN-T的基本用法,请参考[StreamReader基本用法](./streamreader_basic_tutorial.html)和[使用Emformer + RNN-T进行在线ASR](./online_asr_tutorial.html)。 - en: 2\. Checking the supported devices[](#checking-the-supported-devices "Permalink to this heading") + id: totrans-14 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 2\. 检查支持的设备[](#checking-the-supported-devices "Permalink to this heading") - en: Firstly, we need to check the devices that Streaming API can access, and figure out the arguments (`src` and `format`) we need to pass to [`StreamReader()`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader "torchaudio.io.StreamReader") class. + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 首先,我们需要检查流式API可以访问的设备,并找出我们需要传递给[`StreamReader()`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader + "torchaudio.io.StreamReader")类的参数(`src`和`format`)。 - en: We use `ffmpeg` command for this. `ffmpeg` abstracts away the difference of underlying hardware implementations, but the expected value for `format` varies across OS and each `format` defines different syntax for `src`. + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 我们使用`ffmpeg`命令来实现。`ffmpeg`抽象了底层硬件实现的差异,但`format`的预期值在不同操作系统上有所不同,每个`format`定义了不同的`src`语法。 - en: The details of supported `format` values and `src` syntax can be found in [https://ffmpeg.org/ffmpeg-devices.html](https://ffmpeg.org/ffmpeg-devices.html). + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 有关支持的`format`值和`src`语法的详细信息,请参考[https://ffmpeg.org/ffmpeg-devices.html](https://ffmpeg.org/ffmpeg-devices.html)。 - en: For macOS, the following command will list the available devices. + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 对于macOS,以下命令将列出可用设备。 - en: '[PRE0]' + id: totrans-19 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: We will use the following values for Streaming API. + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 我们将使用以下值进行流式API。 - en: '[PRE1]' + id: totrans-21 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: For Windows, `dshow` device should work. + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 对于Windows,`dshow`设备应该可以工作。 - en: '[PRE2]' + id: totrans-23 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: In the above case, the following value can be used to stream from microphone. 
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: 在上述情况下,可以使用以下值从麦克风进行流式传输。
- en: '[PRE3]'
+ id: totrans-25
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: 3\. Data acquisition[](#data-acquisition "Permalink to this heading")
+ id: totrans-26
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 3\. 数据采集[](#data-acquisition "Permalink to this heading")
- en: Streaming audio from microphone input requires properly timing data acquisition.
    Failing to do so may introduce discontinuities in the data stream.
+ id: totrans-27
  prefs: []
  type: TYPE_NORMAL
+ zh: 从麦克风输入流式音频需要正确计时数据采集。如果未能这样做,可能会导致数据流中出现不连续性。
- en: For this reason, we will run the data acquisition in a subprocess.
+ id: totrans-28
  prefs: []
  type: TYPE_NORMAL
+ zh: 因此,我们将在子进程中运行数据采集。
- en: Firstly, we create a helper function that encapsulates the whole process executed
    in the subprocess.
+ id: totrans-29
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先,我们创建一个封装在子进程中执行的整个过程的辅助函数。
- en: This function initializes the streaming API, acquires data, and then puts it
    in a queue that the main process watches.
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: 此函数初始化流式API,获取数据然后将其放入队列,主进程正在监视该队列。
- en: '[PRE4]'
+ id: totrans-31
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: The notable difference from non-device streaming is that we provide the `timeout`
    and `backoff` parameters to the `stream` method.
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 与非设备流式的显着区别在于,我们为`stream`方法提供了`timeout`和`backoff`参数。
- en: When acquiring data, if the rate of acquisition requests is higher than that
    at which the hardware can prepare the data, then the underlying implementation
    reports a special error code and expects the client code to retry.
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
+ zh: 在获取数据时,如果获取请求的速率高于硬件准备数据的速率,则底层实现会报告特殊的错误代码,并期望客户端代码重试。
- en: Precise timing is the key to smooth streaming. Reporting this error from the
    low-level implementation all the way back to the Python layer before retrying
    adds undesired overhead. For this reason, the retry behavior is implemented in
    the C++ layer, and the `timeout` and `backoff` parameters allow client code to
    control the behavior.
+ id: totrans-34
  prefs: []
  type: TYPE_NORMAL
+ zh: 精确的时序是流畅流媒体的关键。从低级实现报告此错误一直返回到Python层,在重试之前会增加不必要的开销。因此,重试行为是在C++层实现的,`timeout`和`backoff`参数允许客户端代码控制行为。
- en: For the details of the `timeout` and `backoff` parameters, please refer to the
    documentation of the `stream()` method.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 有关`timeout`和`backoff`参数的详细信息,请参考`stream()`方法的文档。
- en: Note
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: The proper value of `backoff` depends on the system configuration. One way to
    see if the `backoff` value is appropriate is to save the series of acquired chunks
    as continuous audio and listen to it. If the `backoff` value is too large, then
    the data stream is discontinuous, and the resulting audio sounds sped up. If the
    `backoff` value is too small or zero, the audio stream is fine, but the data acquisition
    process enters a busy-waiting state, which increases CPU consumption.
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: '`backoff`的适当值取决于系统配置。检查`backoff`值是否合适的一种方法是将获取的一系列块保存为连续音频并进行听取。如果`backoff`值太大,则数据流是不连续的。生成的音频听起来加快了。如果`backoff`值太小或为零,则音频流正常,但数据采集过程进入忙等待状态,这会增加CPU消耗。'
- en: 4\. Building inference pipeline[](#building-inference-pipeline "Permalink to
    this heading")
+ id: totrans-38
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 4\. 构建推理流程[](#building-inference-pipeline "跳转到此标题")
- en: The next step is to create the components required for inference.
+ id: totrans-39 prefs: [] type: TYPE_NORMAL + zh: 接下来的步骤是创建推理所需的组件。 - en: This is the same process as [Online ASR with Emformer RNN-T](./online_asr_tutorial.html). + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: 这与[使用Emformer RNN-T进行在线ASR](./online_asr_tutorial.html)是相同的流程。 - en: '[PRE5]' + id: totrans-41 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: '[PRE6]' + id: totrans-42 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: 5\. The main process[](#the-main-process "Permalink to this heading") + id: totrans-43 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 5\. 主要流程[](#the-main-process "跳转到此标题") - en: 'The execution flow of the main process is as follows:' + id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: 主进程的执行流程如下: - en: Initialize the inference pipeline. + id: totrans-45 prefs: - PREF_OL type: TYPE_NORMAL + zh: 初始化推理流程。 - en: Launch data acquisition subprocess. + id: totrans-46 prefs: - PREF_OL type: TYPE_NORMAL + zh: 启动数据获取子进程。 - en: Run inference. + id: totrans-47 prefs: - PREF_OL type: TYPE_NORMAL + zh: 运行推理。 - en: Clean up + id: totrans-48 prefs: - PREF_OL type: TYPE_NORMAL + zh: 清理 - en: Note + id: totrans-49 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: As the data acquisition subprocess will be launched with “spawn” method, all the code on global scope are executed on the subprocess as well. + id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: 由于数据获取子进程将使用“spawn”方法启动,全局范围的所有代码也将在子进程中执行。 - en: We want to instantiate pipeline only in the main process, so we put them in a function and invoke it within __name__ == “__main__” guard. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 我们希望只在主进程中实例化流程,因此我们将它们放在一个函数中,并在`__name__ == "__main__"`保护内调用它。 - en: '[PRE7]' + id: totrans-52 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: '[PRE8]' + id: totrans-53 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: 'Tag: [`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")' + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 标签:[`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io") - en: '**Total running time of the script:** ( 0 minutes 0.000 seconds)' + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟0.000秒)' - en: '[`Download Python source code: device_asr.py`](../_downloads/8009eae2a3a1a322f175ecc138597775/device_asr.py)' + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:device_asr.py`](../_downloads/8009eae2a3a1a322f175ecc138597775/device_asr.py)' - en: '[`Download Jupyter notebook: device_asr.ipynb`](../_downloads/c8265c298ed19ff44b504d5c3aa72563/device_asr.ipynb)' + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:device_asr.ipynb`](../_downloads/c8265c298ed19ff44b504d5c3aa72563/device_asr.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的画廊](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_39.yaml b/totrans/aud22_39.yaml index 040864478c34f00c06737408b11157ce2b230287..b71321977ae9bb5e43af72e2074ab2008bacd7be 100644 --- a/totrans/aud22_39.yaml +++ b/totrans/aud22_39.yaml @@ -1,62 +1,94 @@ - en: Device AV-ASR with Emformer RNN-T + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用Emformer RNN-T的设备AV-ASR - en: 原文:[https://pytorch.org/audio/stable/tutorials/device_avsr.html](https://pytorch.org/audio/stable/tutorials/device_avsr.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 
原文:[https://pytorch.org/audio/stable/tutorials/device_avsr.html](https://pytorch.org/audio/stable/tutorials/device_avsr.html)
- en: Note
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-device-avsr-py) to download the full
    example code
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 点击[这里](#sphx-glr-download-tutorials-device-avsr-py)下载完整示例代码
- en: '**Author**: [Pingchuan Ma](mailto:pingchuanma%40meta.com), [Moto Hira](mailto:moto%40meta.com).'
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: '**作者**:[Pingchuan Ma](mailto:pingchuanma%40meta.com), [Moto Hira](mailto:moto%40meta.com)。'
- en: This tutorial shows how to run on-device audio-visual speech recognition (AV-ASR,
    or AVSR) with TorchAudio on a streaming device input, i.e. a microphone on a laptop.
    AV-ASR is the task of transcribing text from audio and visual streams, which has
    recently attracted a lot of research attention due to its robustness against noise.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程展示了如何在流设备输入上(即笔记本电脑上的麦克风)使用TorchAudio运行设备上的音频-视觉语音识别(AV-ASR或AVSR)。AV-ASR是从音频和视觉流中转录文本的任务,最近因其对噪声的稳健性而引起了许多研究的关注。
- en: Note
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: This tutorial requires ffmpeg, sentencepiece, mediapipe, opencv-python and scikit-image
    libraries.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 此教程需要ffmpeg、sentencepiece、mediapipe、opencv-python和scikit-image库。
- en: There are multiple ways to install the ffmpeg libraries. If you are using the
    Anaconda Python distribution, `conda install -c conda-forge 'ffmpeg<7'` will install
    compatible FFmpeg libraries.
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 有多种安装ffmpeg库的方法。如果您使用Anaconda Python发行版,`conda install -c conda-forge 'ffmpeg<7'`将安装兼容的FFmpeg库。
- en: You can run `pip install sentencepiece mediapipe opencv-python scikit-image`
    to install the other libraries mentioned.
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 您可以运行`pip install sentencepiece mediapipe opencv-python scikit-image`来安装其他提到的库。
- en: Note
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: To run this tutorial, please make sure you are in the tutorial folder.
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 要运行此教程,请确保您在教程文件夹中。
- en: Note
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: We tested the tutorial on torchaudio version 2.0.2 on a MacBook Pro (M1 Pro).
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们在MacBook Pro(M1 Pro)上测试了torchaudio版本2.0.2上的教程。
- en: '[PRE0]'
+ id: totrans-14
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: Overview[](#overview "Permalink to this heading")
+ id: totrans-15
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 概述[](#overview "Permalink to this heading")
- en: The real-time AV-ASR system is presented as follows, consisting of three components,
    a data collection module, a pre-processing module and an end-to-end model. The
    data collection module is hardware, such as a microphone and camera. Once the
    information is collected, the pre-processing module locates and crops out the
    face. Next, we feed the raw audio stream and the pre-processed video stream into
    our end-to-end model for inference.
+ id: totrans-16
  prefs: []
  type: TYPE_NORMAL
+ zh: 实时AV-ASR系统如下所示,由三个组件组成,即数据收集模块、预处理模块和端到端模型。数据收集模块是硬件,如麦克风和摄像头。它的作用是从现实世界收集信息。一旦信息被收集,预处理模块会定位和裁剪出脸部。接下来,我们将原始音频流和预处理的视频流馈送到我们的端到端模型进行推断。
- en: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/overview.png](../Images/757b2c4226d175a3a1b0d10e928d909c.png)'
+ id: totrans-17
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/overview.png](../Images/757b2c4226d175a3a1b0d10e928d909c.png)'
- en: 1\. Data acquisition[](#data-acquisition "Permalink to this heading")
+ id: totrans-18
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 1\. 数据采集[](#data-acquisition "Permalink to this heading")
- en: Firstly, we define the function to collect videos from the microphone and camera.
    To be specific, we use the [`StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
    "torchaudio.io.StreamReader") class for data collection, which supports capturing
    audio/video from the microphone and camera. For the detailed usage of this class,
    please refer to the [tutorial](./streamreader_basic_tutorial.html).
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先,我们定义了从麦克风和摄像头收集视频的函数。具体来说,我们使用[`StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
+   "torchaudio.io.StreamReader")类来进行数据收集,该类支持从麦克风和摄像头捕获音频/视频。有关此类的详细用法,请参考[教程](./streamreader_basic_tutorial.html)。
- en: '[PRE1]'
+ id: totrans-20
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
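For illustration, a hedged sketch of the capture setup; the `src` and `format` values here are hypothetical and must be replaced with the device names reported by `ffmpeg` on your system.

```python
# Hedged sketch of audio/video capture; "0:0" and "avfoundation" are a
# hypothetical macOS example, not values from this tutorial.
from torchaudio.io import StreamReader

streamer = StreamReader(src="0:0", format="avfoundation")
streamer.add_basic_audio_stream(frames_per_chunk=1600, sample_rate=16000)
streamer.add_basic_video_stream(frames_per_chunk=1, frame_rate=30, format="rgb24")

for (audio_chunk, video_chunk) in streamer.stream(timeout=-1, backoff=1.0):
    ...  # hand the chunks to the pre-processing / inference stages
```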
- en: 2\. Pre-processing[](#pre-processing "Permalink to this heading")
+ id: totrans-21
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 2\. 预处理[](#pre-processing "Permalink to this heading")
- en: Before feeding the raw stream into our model, each video sequence has to undergo
    a specific pre-processing procedure. This involves three critical steps. The first
    step is to perform face detection. Following that, each individual frame is aligned
    to a referenced frame, commonly known as the mean face, in order to normalize
    rotation and size differences across frames. The final step in the pre-processing
    module is to crop the face region from the aligned face image.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 在将原始流馈送到我们的模型之前,每个视频序列都必须经过特定的预处理过程。这涉及三个关键步骤。第一步是进行人脸检测。随后,将每个单独的帧对齐到一个参考帧,通常称为平均脸,以规范化帧之间的旋转和大小差异。预处理模块中的最后一步是从对齐的人脸图像中裁剪出脸部区域。
- en: '| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/original.gif](../Images/b9142268a9c0666c9697c22b10755a18.png)
    | ![https://download.pytorch.org/torchaudio/doc-assets/avsr/detected.gif](../Images/b44fd7d78a200f7ef203259295e21a8a.png)
    | ![https://download.pytorch.org/torchaudio/doc-assets/avsr/transformed.gif](../Images/7029d284337ec7c2222d6b4344ac49d0.png)
    | ![https://download.pytorch.org/torchaudio/doc-assets/avsr/cropped.gif](../Images/5aa4bb57e0b31b6d34ac3b4766e5503f.png)
    |'
+ id: totrans-23
  prefs: []
  type: TYPE_TB
+ zh: '| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/original.gif](../Images/b9142268a9c0666c9697c22b10755a18.png)
+   | ![https://download.pytorch.org/torchaudio/doc-assets/avsr/detected.gif](../Images/b44fd7d78a200f7ef203259295e21a8a.png)
+   | ![https://download.pytorch.org/torchaudio/doc-assets/avsr/transformed.gif](../Images/7029d284337ec7c2222d6b4344ac49d0.png)
+   | ![https://download.pytorch.org/torchaudio/doc-assets/avsr/cropped.gif](../Images/5aa4bb57e0b31b6d34ac3b4766e5503f.png)
+   |'
- en: '|'
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: '|'
- en: Original
+ id: totrans-25
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 原始
- en: '|'
+ id: totrans-26
  prefs: []
  type: TYPE_NORMAL
+ zh: '|'
- en: Detected
+ id: totrans-27
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 检测
- en: '|'
+ id: totrans-28
  prefs: []
  type: TYPE_NORMAL
+ zh: '|'
- en: Transformed
+ id: totrans-29
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 转换
- en: '|'
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: '|'
- en: Cropped
+ id: totrans-31
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 裁剪
- en: '|'
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: '|'
- en: '[PRE2]'
+ id: totrans-33
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: 3\. Building inference pipeline[](#building-inference-pipeline "Permalink to
    this heading")
+ id: totrans-34
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 3\. 构建推断管道[](#building-inference-pipeline "Permalink to this heading")
- en: The next step is to create the components required for the pipeline.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 下一步是创建管道所需的组件。
- en: We use convolutional-based front-ends to extract features from both the raw
    audio and video streams. These features are then passed through a two-layer MLP
    for fusion. For our transducer model, we leverage the TorchAudio library, which
    incorporates an encoder (Emformer), a predictor, and a joint network. The architecture
    of the proposed AV-ASR model is illustrated as follows.
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用基于卷积的前端从原始音频和视频流中提取特征。然后,这些特征通过两层MLP进行融合。对于我们的transducer模型,我们利用了TorchAudio库,该库包含一个编码器(Emformer)、一个预测器和一个联合网络。所提出的AV-ASR模型的架构如下所示。
- en: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/architecture.png](../Images/ed7f525d50ee520d70b7e9c6f6b7fd66.png)'
+ id: totrans-37
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/architecture.png](../Images/ed7f525d50ee520d70b7e9c6f6b7fd66.png)'
- en: '[PRE3]'
+ id: totrans-38
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: 4\. The main process[](#the-main-process "Permalink to this heading")
+ id: totrans-39
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 4\. 主进程[](#the-main-process "Permalink to this heading")
- en: 'The execution flow of the main process is as follows:'
+ id: totrans-40
  prefs: []
  type: TYPE_NORMAL
+ zh: 主进程的执行流程如下:
- en: Initialize the inference pipeline.
+ id: totrans-41
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 初始化推断流程。
- en: Launch the data acquisition subprocess.
+ id: totrans-42
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 启动数据采集子进程。
- en: Run inference.
+ id: totrans-43
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 运行推断。
- en: Clean up.
+ id: totrans-44
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 清理。
- en: '[PRE4]'
+ id: totrans-45
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: '[PRE5]'
+ id: totrans-46
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: 'Tag: [`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")'
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 标签:[`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")
- en: '**Total running time of the script:** ( 0 minutes 0.000 seconds)'
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: '**脚本的总运行时间:**(0分钟0.000秒)'
- en: '[`Download Python source code: device_avsr.py`](../_downloads/e10abb57121274b0bbaca74dbbd1fbc4/device_avsr.py)'
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`下载Python源代码:device_avsr.py`](../_downloads/e10abb57121274b0bbaca74dbbd1fbc4/device_avsr.py)'
- en: '[`Download Jupyter notebook: device_avsr.ipynb`](../_downloads/eb72a6f2273304a15352dfcf3b824b42/device_avsr.ipynb)'
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`下载Jupyter笔记本:device_avsr.ipynb`](../_downloads/eb72a6f2273304a15352dfcf3b824b42/device_avsr.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
+ id: totrans-51
  prefs: []
  type: TYPE_NORMAL
+ zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
diff --git a/totrans/aud22_40.yaml b/totrans/aud22_40.yaml
index 040864478c34f00c06737408b11157ce2b230287..b71321977ae9bb5e43af72e2074ab2008bacd7be 100644
--- a/totrans/aud22_40.yaml
+++ b/totrans/aud22_40.yaml
@@ -1,433 +1,674 @@
- en: Forced Alignment with Wav2Vec2
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 使用Wav2Vec2进行强制对齐
- en: 原文:[https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html](https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html](https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html)
- en: Note
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-forced-alignment-tutorial-py) to download
    the full example code
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 点击[这里](#sphx-glr-download-tutorials-forced-alignment-tutorial-py)下载完整示例代码
- en: '**Author**: [Moto Hira](mailto:moto%40meta.com)'
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: '**作者**:[Moto Hira](mailto:moto%40meta.com)'
- en: This tutorial shows how to align a transcript to speech with `torchaudio`, using
    the CTC segmentation algorithm described in [CTC-Segmentation of Large Corpora
    for German End-to-end Speech Recognition](https://arxiv.org/abs/2007.09127).
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程展示了如何使用`torchaudio`将转录对齐到语音,使用[CTC-Segmentation of Large Corpora for German
+   End-to-end Speech Recognition](https://arxiv.org/abs/2007.09127)中描述的CTC分割算法。
- en: Note
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: This tutorial was originally written to illustrate a use case for the Wav2Vec2
    pretrained model.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 本教程最初是为了说明Wav2Vec2预训练模型的用例而编写的。
- en: TorchAudio now has a set of APIs designed for forced alignment. The [CTC forced
    alignment API tutorial](./ctc_forced_alignment_api_tutorial.html) illustrates
    the usage of [`torchaudio.functional.forced_align()`](../generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align
    "torchaudio.functional.forced_align"), which is the core API.
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchAudio现在有一组专为强制对齐设计的API。[CTC强制对齐API教程](./ctc_forced_alignment_api_tutorial.html)说明了[`torchaudio.functional.forced_align()`](../generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align
+   "torchaudio.functional.forced_align")的用法,这是核心API。
- en: If you are looking to align your corpus, we recommend using [`torchaudio.pipelines.Wav2Vec2FABundle`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle
    "torchaudio.pipelines.Wav2Vec2FABundle"), which combines [`forced_align()`](../generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align
    "torchaudio.functional.forced_align") and other support functions with a pre-trained
    model specifically trained for forced alignment. Please refer to [Forced alignment
    for multilingual data](forced_alignment_for_multilingual_data_tutorial.html),
    which illustrates its usage.
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果您想要对齐您的语料库,我们建议使用[`torchaudio.pipelines.Wav2Vec2FABundle`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle
+   "torchaudio.pipelines.Wav2Vec2FABundle"),它结合了[`forced_align()`](../generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align
+   "torchaudio.functional.forced_align")和其他支持函数,专门针对强制对齐进行了训练的预训练模型。请参考[多语言数据的强制对齐](forced_alignment_for_multilingual_data_tutorial.html)以了解其用法。
- en: '[PRE0]'
+ id: totrans-10
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[PRE1]'
+ id: totrans-11
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: Overview[](#overview "Permalink to this heading")
+ id: totrans-12
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 概述[](#overview "跳转到此标题的永久链接")
- en: The process of alignment looks like the following.
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 对齐过程如下所示。
- en: Estimate the frame-wise label probability from the audio waveform.
+ id: totrans-14
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 从音频波形中估计逐帧标签概率
- en: Generate the trellis matrix which represents the probability of the labels aligned
    at each time step.
+ id: totrans-15
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 生成表示时间步对齐标签概率的状态图矩阵。
- en: Find the most likely path from the trellis matrix.
+ id: totrans-16
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 从状态图中找到最可能的路径。
- en: In this example, we use `torchaudio`’s `Wav2Vec2` model for acoustic feature
    extraction.
+ id: totrans-17
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本示例中,我们使用`torchaudio`的`Wav2Vec2`模型进行声学特征提取。
- en: Preparation[](#preparation "Permalink to this heading")
+ id: totrans-18
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 准备工作[](#preparation "跳转到此标题的永久链接")
- en: First, we import the necessary packages and fetch the data that we work on.
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先导入必要的包,并获取我们要处理的数据。
- en: '[PRE2]'
+ id: totrans-20
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: Generate frame-wise label probability[](#generate-frame-wise-label-probability
    "Permalink to this heading")
+ id: totrans-21
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 生成逐帧标签概率[](#generate-frame-wise-label-probability "跳转到此标题的永久链接")
- en: The first step is to generate the label class probability of each audio frame.
    We can use a Wav2Vec2 model that is trained for ASR. Here we use [`torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H()`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
    "torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H").
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 第一步是生成每个音频帧的标签类概率。我们可以使用为ASR训练的Wav2Vec2模型。这里我们使用[`torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H()`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
+   "torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H")。
- en: '`torchaudio` provides easy access to pretrained models with associated labels.'
+ id: totrans-23
  prefs: []
  type: TYPE_NORMAL
+ zh: '`torchaudio`提供了易于访问的预训练模型和相关标签。'
- en: Note
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: In the subsequent sections, we will compute the probability in the log domain
    to avoid numerical instability. For this purpose, we normalize the `emission`
    with `torch.log_softmax()`.
+ id: totrans-25
  prefs: []
  type: TYPE_NORMAL
+ zh: 在接下来的部分中,我们将在对数域中计算概率,以避免数值不稳定性。为此,我们使用`torch.log_softmax()`对`emission`进行归一化。
- en: '[PRE3]'
+ id: totrans-26
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: '[PRE4]'
+ id: totrans-27
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: Visualization[](#visualization "Permalink to this heading")
+ id: totrans-28
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 可视化[](#visualization "跳转到此标题的永久链接")
- en: '[PRE5]'
+ id: totrans-29
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: '![Frame-wise class probability](../Images/efb61f2d411fc5066755dd6b78a9a867.png)'
+ id: totrans-30
  prefs: []
  type: TYPE_IMG
+ zh: '![逐帧类概率](../Images/efb61f2d411fc5066755dd6b78a9a867.png)'
- en: Generate alignment probability (trellis)[](#generate-alignment-probability-trellis
    "Permalink to this heading")
+ id: totrans-31
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 生成对齐概率(状态图)[](#generate-alignment-probability-trellis "跳转到此标题的永久链接")
- en: From the emission matrix, next we generate the trellis which represents the
    probability of transcript labels occurring at each time frame.
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 从发射矩阵中,接下来我们生成表示每个时间帧发生转录标签概率的状态图。
- en: The trellis is a 2D matrix with a time axis and a label axis. The label axis
    represents the transcript that we are aligning. In the following, we use \(t\)
    to denote the index in the time axis and \(j\) to denote the index in the label
    axis. \(c_j\) represents the label at label index \(j\).
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
+ zh: 状态图是一个二维矩阵,具有时间轴和标签轴。标签轴表示我们正在对齐的转录。在下文中,我们使用\(t\)表示时间轴中的索引,使用\(j\)表示标签轴中的索引。\(c_j\)表示标签索引\(j\)处的标签。
- en: To generate the probability of time step \(t+1\), we look at the trellis from
    time step \(t\) and the emission at time step \(t+1\). There are two paths to
    reach time step \(t+1\) with label \(c_{j+1}\). The first one is the case where
    the label was \(c_{j+1}\) at \(t\) and there was no label change from \(t\) to
    \(t+1\). The other case is where the label was \(c_j\) at \(t\) and it transitioned
    to the next label \(c_{j+1}\) at \(t+1\).
+ id: totrans-34
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了生成时间步长\(t+1\)的概率,我们查看时间步长\(t\)的格子和时间步长\(t+1\)的发射。有两种路径可以到达时间步长\(t+1\),标签为\(c_{j+1}\)。第一种情况是标签在\(t\)时为\(c_{j+1}\),从\(t\)到\(t+1\)没有标签变化。另一种情况是标签在\(t\)时为\(c_j\),在\(t+1\)转换为下一个标签\(c_{j+1}\)。
- en: The following diagram illustrates this transition.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下图表说明了这种转变。
- en: '![https://download.pytorch.org/torchaudio/tutorial-assets/ctc-forward.png](../Images/cb0c89f6f8c29828d4d4d04ded7193b6.png)'
+ id: totrans-36
  prefs: []
  type: TYPE_IMG
+ zh: ![https://download.pytorch.org/torchaudio/tutorial-assets/ctc-forward.png](../Images/cb0c89f6f8c29828d4d4d04ded7193b6.png)
- en: Since we are looking for the most likely transitions, we take the more likely
    path for the value of \(k_{(t+1, j+1)}\), that is
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: 由于我们正在寻找最可能的转换,因此我们为\(k_{(t+1, j+1)}\)的值选择更可能的路径,即
- en: \(k_{(t+1, j+1)} = max( k_{(t, j)} p(t+1, c_{j+1}), k_{(t, j+1)} p(t+1, repeat)
    )\)
+ id: totrans-38
  prefs: []
  type: TYPE_NORMAL
+ zh: \(k_{(t+1, j+1)} = max( k_{(t, j)} p(t+1, c_{j+1}), k_{(t, j+1)} p(t+1, repeat)
+   )\)
- en: where \(k\) represents the trellis matrix, and \(p(t, c_j)\) represents the
    probability of label \(c_j\) at time step \(t\). \(repeat\) represents the blank
    token from the CTC formulation. (For the details of the CTC algorithm, please
    refer to *Sequence Modeling with CTC* [[distill.pub](https://distill.pub/2017/ctc/)])
+ id: totrans-39
  prefs: []
  type: TYPE_NORMAL
+ zh: 其中\(k\)代表格子矩阵,\(p(t, c_j)\)代表时间步长\(t\)处标签\(c_j\)的概率。\(repeat\)代表CTC公式中的空白标记。(有关CTC算法的详细信息,请参阅*使用CTC进行序列建模*[[distill.pub](https://distill.pub/2017/ctc/)])
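A minimal sketch of this recurrence in the log domain (where the products above become sums) may help; it assumes `emission` is a `(num_frames, num_labels)` tensor of log-probabilities and `tokens` is the list of label indices. The tutorial's actual implementation follows in the next code cell.

```python
# Hedged sketch of the trellis recurrence; the initialization shown here is one
# possible choice, not necessarily identical to the tutorial's own code.
import torch

def get_trellis_sketch(emission: torch.Tensor, tokens: list, blank_id: int = 0) -> torch.Tensor:
    num_frames = emission.size(0)
    token_ids = torch.tensor(tokens)
    trellis = torch.full((num_frames, len(tokens)), -float("inf"))
    trellis[0, 0] = emission[0, token_ids[0]]
    for t in range(num_frames - 1):
        # Staying on the same label consumes a blank frame: k(t, j) p(t+1, repeat)
        stay = trellis[t] + emission[t + 1, blank_id]
        # Moving to the next label consumes its emission: k(t, j) p(t+1, c_{j+1})
        move = torch.cat((torch.full((1,), -float("inf")), trellis[t, :-1])) \
            + emission[t + 1, token_ids]
        trellis[t + 1] = torch.maximum(stay, move)
    return trellis
```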
- en: '[PRE6]'
+ id: totrans-40
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: '[PRE7]'
+ id: totrans-41
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: Visualization[](#id1 "Permalink to this heading")
+ id: totrans-42
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 可视化[](#id1 "此标题的永久链接")
- en: '[PRE8]'
+ id: totrans-43
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: '![forced alignment tutorial](../Images/9cba4b626edb17a6e4b5838fd55a4e90.png)'
+ id: totrans-44
  prefs: []
  type: TYPE_IMG
+ zh: ![强制对齐教程](../Images/9cba4b626edb17a6e4b5838fd55a4e90.png)
- en: In the above visualization, we can see that there is a trace of high probability
    crossing the matrix diagonally.
+ id: totrans-45
  prefs: []
  type: TYPE_NORMAL
+ zh: 在上面的可视化中,我们可以看到有一个高概率的痕迹对角穿过矩阵。
- en: Find the most likely path (backtracking)[](#find-the-most-likely-path-backtracking
    "Permalink to this heading")
+ id: totrans-46
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 找到最可能的路径(回溯)[](#find-the-most-likely-path-backtracking "此标题的永久链接")
- en: Once the trellis is generated, we will traverse it following the elements with
    high probability.
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 生成了格子后,我们将沿着具有高概率元素的路径遍历它。
- en: We will start from the last label index at the time step of highest probability;
    then we traverse back in time, picking stay (\(c_j \rightarrow c_j\)) or transition
    (\(c_j \rightarrow c_{j+1}\)), based on the post-transition probability \(k_{t,
    j} p(t+1, c_{j+1})\) or \(k_{t, j+1} p(t+1, repeat)\).
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将从具有最高概率时间步长的最后标签索引开始,然后,我们向后遍历时间,根据过渡后概率\(k_{t, j} p(t+1, c_{j+1})\)或\(k_{t,
+   j+1} p(t+1, repeat)\)选择停留(\(c_j \rightarrow c_j\))或过渡(\(c_j \rightarrow c_{j+1}\))。
- en: The traversal is done once the label index reaches the beginning.
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: 一旦标签索引到达开头,遍历就完成了。
- en: The trellis matrix is used for path-finding, but for the final probability of
    each segment, we take the frame-wise probability from the emission matrix.
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 格子矩阵用于寻找路径,但对于每个段的最终概率,我们从发射矩阵中获取逐帧概率。
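A hedged sketch of this backtracking, under the same `trellis`, `emission` and `tokens` assumptions as above (the tutorial's actual implementation follows):

```python
# Hedged sketch of backtracking over the trellis; returns (label_index, time_index)
# pairs in chronological order.
import torch

def backtrack_sketch(trellis, emission, tokens, blank_id: int = 0):
    j = trellis.size(1) - 1                 # start at the last label index,
    t = torch.argmax(trellis[:, j]).item()  # at its most probable time step
    path = [(j, t)]
    while j > 0 and t > 0:
        stayed = trellis[t - 1, j] + emission[t, blank_id]
        changed = trellis[t - 1, j - 1] + emission[t, tokens[j]]
        t -= 1
        if changed > stayed:
            j -= 1                          # a label transition happened here
        path.append((j, t))
    return path[::-1]
```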
- en: '[PRE9]'
+ id: totrans-51
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: '[PRE10]'
+ id: totrans-52
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE10]'
- en: Visualization[](#id2 "Permalink to this heading")
+ id: totrans-53
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 可视化[](#id2 "此标题的永久链接")
- en: '[PRE11]'
+ id: totrans-54
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE11]'
- en: '![The path found by backtracking](../Images/bfe239f26439c642dad7b47fc213e358.png)'
+ id: totrans-55
  prefs: []
  type: TYPE_IMG
+ zh: ![通过回溯找到的路径](../Images/bfe239f26439c642dad7b47fc213e358.png)
- en: Looking good.
+ id: totrans-56
  prefs: []
  type: TYPE_NORMAL
+ zh: 看起来不错。
- en: Segment the path[](#segment-the-path "Permalink to this heading")
+ id: totrans-57
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 分割路径[](#segment-the-path "此标题的永久链接")
- en: Now this path contains repetitions of the same labels, so let’s merge them to
    make it close to the original transcript.
+ id: totrans-58
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在这条路径包含相同标签的重复,所以让我们合并它们使其接近原始文本。
- en: When merging the multiple path points, we simply take the average probability
    for the merged segments.
+ id: totrans-59
  prefs: []
  type: TYPE_NORMAL
+ zh: 在合并多个路径点时,我们简单地取合并段的平均概率。
- en: '[PRE12]'
+ id: totrans-60
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: '[PRE13]'
+ id: totrans-61
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE13]'
- en: Visualization[](#id3 "Permalink to this heading")
+ id: totrans-62
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 可视化[](#id3 "此标题的永久链接")
- en: '[PRE14]'
+ id: totrans-63
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE14]'
- en: '![Path, label and probability for each label, Label probability with and without
    repetition](../Images/f07166f8b26588977594bdaa39644315.png)'
+ id: totrans-64
  prefs: []
  type: TYPE_IMG
+ zh: ![路径,标签和每个标签的概率,带和不带重复的标签概率](../Images/f07166f8b26588977594bdaa39644315.png)
- en: Looks good.
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 看起来不错。
- en: Merge the segments into words[](#merge-the-segments-into-words "Permalink to
    this heading")
+ id: totrans-66
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 将段合并成单词[](#merge-the-segments-into-words "此标题的永久链接")
- en: Now let’s merge the words. The Wav2Vec2 model uses `'|'` as the word boundary,
    so we merge the segments before each occurrence of `'|'`.
+ id: totrans-67
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在让我们合并这些单词。Wav2Vec2模型使用`'|'`作为单词边界,因此我们在每次出现`'|'`之前合并段。
- en: Then, finally, we split the original audio into segmented audio and listen to
    the segments to see if the segmentation is correct.
+ id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 最后,我们将原始音频分割成分段音频,并听取它们以查看分割是否正确。 - en: '[PRE15]' + id: totrans-69 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: '[PRE16]' + id: totrans-70 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: Visualization[](#id4 "Permalink to this heading") + id: totrans-71 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 可视化[](#id4“此标题的永久链接”) - en: '[PRE17]' + id: totrans-72 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: '![forced alignment tutorial](../Images/5f304131dabeba702068f67a1e4db351.png)' + id: totrans-73 prefs: [] type: TYPE_IMG + zh: ![强制对齐教程](../Images/5f304131dabeba702068f67a1e4db351.png) - en: Audio Samples[](#audio-samples "Permalink to this heading") + id: totrans-74 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 音频样本[](#audio-samples“此标题的永久链接”) - en: '[PRE18]' + id: totrans-75 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: '[PRE19]' + id: totrans-76 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '[PRE20]' + id: totrans-77 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE20]' +- en: null + id: totrans-78 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE21]' + id: totrans-80 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '[PRE22]' + id: totrans-81 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE22]' +- en: null + id: totrans-82 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-83 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE23]' + id: totrans-84 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '[PRE24]' + id: totrans-85 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE24]' +- en: null + id: totrans-86 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE25]' + id: totrans-88 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '[PRE26]' + id: totrans-89 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE26]' +- en: null + id: totrans-90 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE27]' + id: totrans-92 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: '[PRE28]' + id: totrans-93 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE28]' +- en: null + id: totrans-94 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE29]' + id: totrans-96 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '[PRE30]' + id: totrans-97 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE30]' +- en: null + id: totrans-98 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-99 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE31]' + id: totrans-100 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: '[PRE32]' + id: totrans-101 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE32]' +- en: null + id: totrans-102 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-103 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE33]' + id: totrans-104 prefs: [] type: TYPE_PRE + zh: '[PRE33]' - en: '[PRE34]' + id: totrans-105 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE34]' +- en: null + id: totrans-106 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
+ id: totrans-107 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE35]' + id: totrans-108 prefs: [] type: TYPE_PRE + zh: '[PRE35]' - en: '[PRE36]' + id: totrans-109 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE36]' +- en: null + id: totrans-110 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE37]' + id: totrans-112 prefs: [] type: TYPE_PRE + zh: '[PRE37]' - en: '[PRE38]' + id: totrans-113 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE38]' +- en: null + id: totrans-114 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-115 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Conclusion[](#conclusion "Permalink to this heading") + id: totrans-116 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 结论[](#conclusion“此标题的永久链接”) - en: In this tutorial, we looked at how to use torchaudio’s Wav2Vec2 model to perform CTC segmentation for forced alignment. + id: totrans-117 prefs: [] type: TYPE_NORMAL + zh: 在本教程中,我们看了如何使用torchaudio的Wav2Vec2模型执行强制对齐的CTC分割。 - en: '**Total running time of the script:** ( 0 minutes 1.734 seconds)' + id: totrans-118 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟1.734秒)' - en: '[`Download Python source code: forced_alignment_tutorial.py`](../_downloads/fa57890a830bd47c0baa254781b3a8e1/forced_alignment_tutorial.py)' + id: totrans-119 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:forced_alignment_tutorial.py`](../_downloads/fa57890a830bd47c0baa254781b3a8e1/forced_alignment_tutorial.py)' - en: '[`Download Jupyter notebook: forced_alignment_tutorial.ipynb`](../_downloads/160356f33d521341c47ec6b1406a3c2e/forced_alignment_tutorial.ipynb)' + id: totrans-120 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:forced_alignment_tutorial.ipynb`](../_downloads/160356f33d521341c47ec6b1406a3c2e/forced_alignment_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-121 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_41.yaml b/totrans/aud22_41.yaml index d41aee05eebfba37d7cffd935c8fdffa5ec3ed76..19153e111eb67a3b9851c2b6c1018297cdc079e1 100644 --- a/totrans/aud22_41.yaml +++ b/totrans/aud22_41.yaml @@ -1,828 +1,1305 @@ - en: Forced alignment for multilingual data + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 多语言数据的强制对齐 - en: 原文:[https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html](https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html](https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-forced-alignment-for-multilingual-data-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-forced-alignment-for-multilingual-data-tutorial-py)下载完整示例代码 - en: '**Authors**: [Xiaohui Zhang](mailto:xiaohuizhang%40meta.com), [Moto Hira](mailto:moto%40meta.com).'
+ id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Xiaohui Zhang](mailto:xiaohuizhang%40meta.com), [Moto Hira](mailto:moto%40meta.com)。' - en: This tutorial shows how to align transcripts to speech for non-English languages. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了如何为非英语语言对齐转录和语音。 - en: The process of aligning non-English (normalized) transcripts is identical to aligning English (normalized) transcripts, and the process for English is covered in detail in the [CTC forced alignment tutorial](./ctc_forced_alignment_api_tutorial.html). In this tutorial, we use TorchAudio’s high-level API, [`torchaudio.pipelines.Wav2Vec2FABundle`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle "torchaudio.pipelines.Wav2Vec2FABundle"), which packages the pre-trained model, tokenizer and aligner, to perform the forced alignment with less code. + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 对齐非英语(标准化)转录的过程与对齐英语(标准化)转录的过程相同,对于英语的过程在[CTC强制对齐教程](./ctc_forced_alignment_api_tutorial.html)中有详细介绍。在本教程中,我们使用TorchAudio的高级API,[`torchaudio.pipelines.Wav2Vec2FABundle`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle "torchaudio.pipelines.Wav2Vec2FABundle"),它打包了预训练模型、分词器和对齐器,以更少的代码执行强制对齐。 - en: '[PRE0]' + id: totrans-7 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-8 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: '[PRE2]' + id: totrans-9 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: Creating the pipeline[](#creating-the-pipeline "Permalink to this heading") + id: totrans-10 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 创建流程[](#creating-the-pipeline "Permalink to this heading") - en: First, we instantiate the model and pre/post-processing pipelines. + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 首先,我们实例化模型和前/后处理流程。 - en: The following diagram illustrates the process of alignment. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 以下图示了对齐的过程。 - en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2fabundle.png](../Images/81159a1c90b6bf1cc96789ecb75c13f0.png)' + id: totrans-13 prefs: [] type: TYPE_IMG + zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2fabundle.png](../Images/81159a1c90b6bf1cc96789ecb75c13f0.png)' - en: The waveform is passed to an acoustic model, which produces a sequence of probability distributions over tokens. The transcript is passed to the tokenizer, which converts the transcript into a sequence of tokens. The aligner takes the results from the acoustic model and the tokenizer and generates timestamps for each token. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 波形被传递给声学模型,该模型生成标记的概率分布序列。转录被传递给分词器,将转录转换为标记序列。对齐器获取声学模型和分词器的结果,并为每个标记生成时间戳。 - en: Note + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: This process expects that the input transcript is already normalized. The process of normalization, which involves romanization of non-English languages, is language-dependent, so it is not covered in this tutorial, but we will briefly look into it. + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 该过程期望输入的转录已经被标准化。标准化的过程涉及非英语语言的罗马化,是与语言相关的,因此本教程不涵盖,但我们将简要介绍。 - en: The acoustic model and the tokenizer must use the same set of tokens. To facilitate the creation of matching processors, [`Wav2Vec2FABundle`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle "torchaudio.pipelines.Wav2Vec2FABundle") associates a pre-trained acoustic model and a tokenizer.
[`torchaudio.pipelines.MMS_FA`](../generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA "torchaudio.pipelines.MMS_FA") is one such instance. + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 声学模型和分词器必须使用相同的标记集。为了便于创建匹配的处理器,[`Wav2Vec2FABundle`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle "torchaudio.pipelines.Wav2Vec2FABundle")关联了一个预训练的声学模型和一个分词器。[`torchaudio.pipelines.MMS_FA`](../generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA "torchaudio.pipelines.MMS_FA")就是这样一个实例。 - en: The following code instantiates a pre-trained acoustic model, a tokenizer which uses the same set of tokens as the model, and an aligner. + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 以下代码实例化了一个预训练的声学模型,一个使用与模型相同的标记集的分词器,以及一个对齐器。 - en: '[PRE3]' + id: totrans-19 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: Note + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: The model instantiated by [`MMS_FA`](../generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA "torchaudio.pipelines.MMS_FA")’s [`get_model()`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle.get_model "torchaudio.pipelines.Wav2Vec2FABundle.get_model") method by default includes the feature dimension for the `<star>` token. You can disable this by passing `with_star=False`. + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 通过[`MMS_FA`](../generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA "torchaudio.pipelines.MMS_FA")的[`get_model()`](../generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle.get_model "torchaudio.pipelines.Wav2Vec2FABundle.get_model")方法实例化的模型默认包含`<star>`标记的特征维度。您可以通过传递`with_star=False`来禁用此功能。 - en: The acoustic model of [`MMS_FA`](../generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA "torchaudio.pipelines.MMS_FA") was created and open-sourced as part of the research project, [Scaling Speech Technology to 1,000+ Languages](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/). It was trained with 23,000 hours of audio from 1100+ languages. + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: '[`MMS_FA`](../generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA "torchaudio.pipelines.MMS_FA")的声学模型是作为研究项目[将语音技术扩展到1000多种语言](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/)的一部分创建并开源的。它使用来自1100多种语言的23000小时音频进行训练。' - en: The tokenizer simply maps the normalized characters to integers. You can check the mapping as follows: + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 分词器简单地将标准化字符映射为整数。您可以按以下方式检查映射: - en: '[PRE4]' + id: totrans-24 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: '[PRE5]' + id: totrans-25 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: The aligner internally uses [`torchaudio.functional.forced_align()`](../generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align "torchaudio.functional.forced_align") and [`torchaudio.functional.merge_tokens()`](../generated/torchaudio.functional.merge_tokens.html#torchaudio.functional.merge_tokens "torchaudio.functional.merge_tokens") to infer the time stamps of the input tokens.
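+- en: '(Illustrative aside, not part of the original tutorial) A minimal sketch of
+    how the bundle pieces fit together. The placeholder waveform and word list are
+    assumptions; real input must be resampled to `bundle.sample_rate`.'
+  prefs: []
+  type: TYPE_NORMAL
+  zh: (示例性补充,非原教程内容)以下简要示意各组件如何协同工作。占位波形和单词列表为假设;真实输入需重采样到`bundle.sample_rate`。
+- en: |-
+    import torch
+    import torchaudio
+
+    bundle = torchaudio.pipelines.MMS_FA
+    model = bundle.get_model()          # acoustic model: waveform -> frame-wise token probabilities
+    tokenizer = bundle.get_tokenizer()  # normalized words -> token IDs
+    aligner = bundle.get_aligner()      # emission + tokens -> per-token time spans
+
+    waveform = torch.zeros(1, bundle.sample_rate)  # 1 second of silence as placeholder audio
+    words = ["hello", "world"]                     # an already-normalized transcript
+
+    with torch.inference_mode():
+        emission, _ = model(waveform)
+        token_spans = aligner(emission[0], tokenizer(words))
+  prefs: []
+  type: TYPE_PRE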
+ id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 对齐器内部使用[`torchaudio.functional.forced_align()`](../generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align "torchaudio.functional.forced_align")和[`torchaudio.functional.merge_tokens()`](../generated/torchaudio.functional.merge_tokens.html#torchaudio.functional.merge_tokens "torchaudio.functional.merge_tokens")来推断输入标记的时间戳。 - en: The details of the underlying mechanism are covered in the [CTC forced alignment API tutorial](./ctc_forced_alignment_api_tutorial.html), so please refer to it. + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 底层机制的详细信息在[CTC强制对齐API教程](./ctc_forced_alignment_api_tutorial.html)中介绍,请参考。 - en: We define a utility function that performs the forced alignment with the above model, the tokenizer and the aligner. + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: 我们定义了一个实用函数,使用上述模型、分词器和对齐器执行强制对齐。 - en: '[PRE6]' + id: totrans-29 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: We also define utility functions for plotting the result and previewing the audio segments. + id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: 我们还定义了用于绘制结果和预览音频片段的实用函数。 - en: '[PRE7]' + id: totrans-31 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: '[PRE8]' + id: totrans-32 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: Normalizing the transcript[](#normalizing-the-transcript "Permalink to this heading") + id: totrans-33 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 将文本标准化[](#normalizing-the-transcript "跳转到此标题") - en: The transcripts passed to the pipeline must be normalized beforehand. The exact process of normalization depends on the language. + id: totrans-34 prefs: [] type: TYPE_NORMAL + zh: 传递到流水线的文本必须事先进行标准化。标准化的确切过程取决于语言。 - en: Languages that do not have explicit word boundaries (such as Chinese, Japanese and Korean) require segmentation first. There are dedicated tools for this, but let’s say we have a segmented transcript. + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 没有明确单词边界的语言(如中文、日文和韩文)需要首先进行分词。有专门的工具可以做到这一点,但假设我们已经对文本进行了分词。 - en: The first step of normalization is romanization. [uroman](https://github.com/isi-nlp/uroman) is a tool that supports many languages. + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: 标准化的第一步是罗马化。[uroman](https://github.com/isi-nlp/uroman)是一个支持多种语言的工具。 - en: Here is a BASH command to romanize the input text file and write the output to another text file using `uroman`. + id: totrans-37 prefs: [] type: TYPE_NORMAL + zh: 以下是一个BASH命令,用于罗马化输入文本文件并使用`uroman`将输出写入另一个文本文件。 - en: '[PRE9]' + id: totrans-38 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: '[PRE10]' + id: totrans-39 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: The next step is to remove non-alphabetic characters and punctuation. The following snippet normalizes the romanized transcript. + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: 下一步是删除非字母和标点符号。以下代码段标准化了罗马化的文本。 - en: '[PRE11]' + id: totrans-41 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: Running the script on the above example produces the following. + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 在上述示例上运行脚本会产生以下结果。 - en: '[PRE12]' + id: totrans-43 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: Note that, in this example, since “1882” was not romanized by `uroman`, it was removed in the normalization step. To avoid this, one needs to romanize numbers, but this is known to be a non-trivial task.
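+- en: '(Illustrative aside, not part of the original tutorial) Roughly what the
+    normalization snippet above ([PRE11]) does, sketched from the surrounding
+    description: lowercase, keep only ASCII letters, apostrophes and spaces, and
+    squeeze whitespace. Digits such as "1882" are dropped, as noted above.'
+  prefs: []
+  type: TYPE_NORMAL
+  zh: (示例性补充,非原教程内容)根据上文描述,标准化代码段([PRE11])大致做的事情:小写化,仅保留ASCII字母、撇号和空格,并压缩空白。如上所述,“1882”之类的数字会被丢弃。
+- en: |-
+    import re
+
+    def normalize_uroman(text: str) -> str:
+        text = text.lower().replace("’", "'")
+        text = re.sub(r"[^a-z' ]", " ", text)   # non-letters become spaces (digits included)
+        return re.sub(r" +", " ", text).strip()
+
+    print(normalize_uroman("Hello, World -- 1882"))  # -> "hello world"
+  prefs: []
+  type: TYPE_PRE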
+ id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: 请注意,在此示例中,“1882”未被`uroman`罗马化,因此在标准化步骤中被移除。为了避免这种情况,需要罗马化数字,但这被认为是一个非常困难的任务。 - en: Aligning transcripts to speech[](#aligning-transcripts-to-speech "Permalink to this heading") + id: totrans-45 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 将文本对齐到语音[](#aligning-transcripts-to-speech "跳转到此标题") - en: Now we perform the forced alignment for multiple languages. + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: 现在我们为多种语言执行强制对齐。 - en: German[](#german "Permalink to this heading") + id: totrans-47 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 德语[](#german "跳转到此标题") - en: '[PRE13]' + id: totrans-48 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: '[PRE14]' + id: totrans-49 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: '[PRE15]' + id: totrans-50 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: '![Emission](../Images/ae919088b5d459e4f9ccbdc7d3fc7a22.png)' + id: totrans-51 prefs: [] type: TYPE_IMG + zh: '![发射](../Images/ae919088b5d459e4f9ccbdc7d3fc7a22.png)' - en: '[PRE16]' + id: totrans-52 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE16]' +- en: null + id: totrans-53 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE17]' + id: totrans-55 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: '[PRE18]' + id: totrans-56 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE18]' +- en: null + id: totrans-57 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE19]' + id: totrans-59 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '[PRE20]' + id: totrans-60 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE20]' +- en: null + id: totrans-61 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-62 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE21]' + id: totrans-63 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '[PRE22]' + id: totrans-64 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE22]' +- en: null + id: totrans-65 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-66 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE23]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '[PRE24]' + id: totrans-68 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE24]' +- en: null + id: totrans-69 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE25]' + id: totrans-71 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '[PRE26]' + id: totrans-72 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE26]' +- en: null + id: totrans-73 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-74 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE27]' + id: totrans-75 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: '[PRE28]' + id: totrans-76 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE28]' +- en: null + id: totrans-77 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE29]' + id: totrans-79 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '[PRE30]' + id: totrans-80 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE30]' +- en: null + id: totrans-81 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
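+- en: '(Illustrative aside, not part of the original tutorial) The span-to-audio
+    conversion used when previewing segments like the ones above, sketched under the
+    assumption that `spans` is the list of `TokenSpan`s of one word and `num_frames`
+    is the length of the emission sequence:'
+  prefs: []
+  type: TYPE_NORMAL
+  zh: (示例性补充,非原教程内容)预览上述片段时所用的“帧跨度到音频”的换算,假设`spans`为某个单词的`TokenSpan`列表、`num_frames`为发射序列长度:
+- en: |-
+    def cut_word(waveform, spans, num_frames):
+        # Emission frames and audio samples are related by a constant ratio.
+        ratio = waveform.size(1) / num_frames   # samples per emission frame
+        x0 = int(ratio * spans[0].start)
+        x1 = int(ratio * spans[-1].end)
+        return waveform[:, x0:x1]
+  prefs: []
+  type: TYPE_PRE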
+ id: totrans-82 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE31]' + id: totrans-83 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: '[PRE32]' + id: totrans-84 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE32]' +- en: null + id: totrans-85 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-86 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Chinese[](#chinese "Permalink to this heading") + id: totrans-87 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 中文[](#chinese "跳转到此标题") - en: Chinese is a character-based language, and there is no explicit word-level tokenization (separated by spaces) in its raw written form. In order to obtain word-level alignments, you need to first tokenize the transcripts at the word level using a word tokenizer like [“Stanford Tokenizer”](https://michelleful.github.io/code-blog/2015/09/10/parsing-chinese-with-stanford/). However, this is not needed if you only want character-level alignments. + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: 中文是一种基于字符的语言,在其原始书面形式中没有明确的单词级标记化(由空格分隔)。为了获得单词级别的对齐,您需要首先使用像[“斯坦福分词器”](https://michelleful.github.io/code-blog/2015/09/10/parsing-chinese-with-stanford/)这样的单词分词器对文本进行单词级别的标记化。但是,如果您只需要字符级别的对齐,则不需要这样做。 - en: '[PRE33]' + id: totrans-89 prefs: [] type: TYPE_PRE + zh: '[PRE33]' - en: '[PRE34]' + id: totrans-90 prefs: [] type: TYPE_PRE + zh: '[PRE34]' - en: '[PRE35]' + id: totrans-91 prefs: [] type: TYPE_PRE + zh: '[PRE35]' - en: '[PRE36]' + id: totrans-92 prefs: [] type: TYPE_PRE + zh: '[PRE36]' - en: '![Emission](../Images/331a9b3d8f84020b493b73104b8177af.png)' + id: totrans-93 prefs: [] type: TYPE_IMG + zh: '![发射](../Images/331a9b3d8f84020b493b73104b8177af.png)' - en: '[PRE37]' + id: totrans-94 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE37]' +- en: null + id: totrans-95 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-96 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE38]' + id: totrans-97 prefs: [] type: TYPE_PRE + zh: '[PRE38]' - en: '[PRE39]' + id: totrans-98 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE39]' +- en: null + id: totrans-99 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE40]' + id: totrans-101 prefs: [] type: TYPE_PRE + zh: '[PRE40]' - en: '[PRE41]' + id: totrans-102 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE41]' +- en: null + id: totrans-103 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-104 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE42]' + id: totrans-105 prefs: [] type: TYPE_PRE + zh: '[PRE42]' - en: '[PRE43]' + id: totrans-106 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE43]' +- en: null + id: totrans-107 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE44]' + id: totrans-109 prefs: [] type: TYPE_PRE + zh: '[PRE44]' - en: '[PRE45]' + id: totrans-110 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE45]' +- en: null + id: totrans-111 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element.
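+- en: '(Illustrative aside, not part of the original tutorial) The two granularities
+    described in the Chinese section above, on a hypothetical transcript (not the
+    tutorial''s own example):'
+  prefs: []
+  type: TYPE_NORMAL
+  zh: (示例性补充,非原教程内容)以一个假设的文本(非教程示例)说明上文中文部分描述的两种粒度:
+- en: |-
+    text_raw = "语音识别很有趣"
+
+    # Character-level alignment: each character is its own alignment unit.
+    char_units = list(text_raw)  # ['语', '音', '识', '别', '很', '有', '趣']
+
+    # Word-level alignment needs a word tokenizer first, e.g. (hypothetical output):
+    # word_units = ["语音", "识别", "很", "有趣"]
+  prefs: []
+  type: TYPE_PRE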
+ id: totrans-112 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE46]' + id: totrans-113 prefs: [] type: TYPE_PRE + zh: '[PRE46]' - en: '[PRE47]' + id: totrans-114 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE47]' +- en: null + id: totrans-115 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-116 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE48]' + id: totrans-117 prefs: [] type: TYPE_PRE + zh: '[PRE48]' - en: '[PRE49]' + id: totrans-118 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE49]' +- en: null + id: totrans-119 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-120 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE50]' + id: totrans-121 prefs: [] type: TYPE_PRE + zh: '[PRE50]' - en: '[PRE51]' + id: totrans-122 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE51]' +- en: null + id: totrans-123 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-124 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE52]' + id: totrans-125 prefs: [] type: TYPE_PRE + zh: '[PRE52]' - en: '[PRE53]' + id: totrans-126 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE53]' +- en: null + id: totrans-127 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-128 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE54]' + id: totrans-129 prefs: [] type: TYPE_PRE + zh: '[PRE54]' - en: '[PRE55]' + id: totrans-130 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE55]' +- en: null + id: totrans-131 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-132 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Polish[](#polish "Permalink to this heading") + id: totrans-133 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 波兰语[](#polish "跳转到此标题") - en: '[PRE56]' + id: totrans-134 prefs: [] type: TYPE_PRE + zh: '[PRE56]' - en: '[PRE57]' + id: totrans-135 prefs: [] type: TYPE_PRE + zh: '[PRE57]' - en: '[PRE58]' + id: totrans-136 prefs: [] type: TYPE_PRE + zh: '[PRE58]' - en: '![Emission](../Images/0ec75a66e47434dd967d9a79a1ff6920.png)' + id: totrans-137 prefs: [] type: TYPE_IMG + zh: '![发射](../Images/0ec75a66e47434dd967d9a79a1ff6920.png)' - en: '[PRE59]' + id: totrans-138 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE59]' +- en: null + id: totrans-139 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-140 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE60]' + id: totrans-141 prefs: [] type: TYPE_PRE + zh: '[PRE60]' - en: '[PRE61]' + id: totrans-142 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE61]' +- en: null + id: totrans-143 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-144 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE62]' + id: totrans-145 prefs: [] type: TYPE_PRE + zh: '[PRE62]' - en: '[PRE63]' + id: totrans-146 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE63]' +- en: null + id: totrans-147 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-148 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE64]' + id: totrans-149 prefs: [] type: TYPE_PRE + zh: '[PRE64]' - en: '[PRE65]' + id: totrans-150 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE65]' +- en: null + id: totrans-151 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
+ id: totrans-152 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE66]' + id: totrans-153 prefs: [] type: TYPE_PRE + zh: '[PRE66]' - en: '[PRE67]' + id: totrans-154 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE67]' +- en: null + id: totrans-155 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-156 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE68]' + id: totrans-157 prefs: [] type: TYPE_PRE + zh: '[PRE68]' - en: '[PRE69]' + id: totrans-158 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE69]' +- en: null + id: totrans-159 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-160 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE70]' + id: totrans-161 prefs: [] type: TYPE_PRE + zh: '[PRE70]' - en: '[PRE71]' + id: totrans-162 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE71]' +- en: null + id: totrans-163 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-164 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE72]' + id: totrans-165 prefs: [] type: TYPE_PRE + zh: '[PRE72]' - en: '[PRE73]' + id: totrans-166 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE73]' +- en: null + id: totrans-167 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-168 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE74]' + id: totrans-169 prefs: [] type: TYPE_PRE + zh: '[PRE74]' - en: '[PRE75]' + id: totrans-170 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE75]' +- en: null + id: totrans-171 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-172 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Portuguese[](#portuguese "Permalink to this heading") + id: totrans-173 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 葡萄牙语[](#portuguese "跳转到此标题的永久链接") - en: '[PRE76]' + id: totrans-174 prefs: [] type: TYPE_PRE + zh: '[PRE76]' - en: '[PRE77]' + id: totrans-175 prefs: [] type: TYPE_PRE + zh: '[PRE77]' - en: '[PRE78]' + id: totrans-176 prefs: [] type: TYPE_PRE + zh: '[PRE78]' - en: '![Emission](../Images/83cad3306c52b6f77872c504153d9294.png)' + id: totrans-177 prefs: [] type: TYPE_IMG + zh: '![发射](../Images/83cad3306c52b6f77872c504153d9294.png)' - en: '[PRE79]' + id: totrans-178 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE79]' +- en: null + id: totrans-179 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-180 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE80]' + id: totrans-181 prefs: [] type: TYPE_PRE + zh: '[PRE80]' - en: '[PRE81]' + id: totrans-182 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE81]' +- en: null + id: totrans-183 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-184 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE82]' + id: totrans-185 prefs: [] type: TYPE_PRE + zh: '[PRE82]' - en: '[PRE83]' + id: totrans-186 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE83]' +- en: null + id: totrans-187 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-188 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE84]' + id: totrans-189 prefs: [] type: TYPE_PRE + zh: '[PRE84]' - en: '[PRE85]' + id: totrans-190 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE85]' +- en: null + id: totrans-191 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
+ id: totrans-192 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE86]' + id: totrans-193 prefs: [] type: TYPE_PRE + zh: '[PRE86]' - en: '[PRE87]' + id: totrans-194 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE87]' +- en: null + id: totrans-195 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-196 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE88]' + id: totrans-197 prefs: [] type: TYPE_PRE + zh: '[PRE88]' - en: '[PRE89]' + id: totrans-198 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE89]' +- en: null + id: totrans-199 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-200 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE90]' + id: totrans-201 prefs: [] type: TYPE_PRE + zh: '[PRE90]' - en: '[PRE91]' + id: totrans-202 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE91]' +- en: null + id: totrans-203 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-204 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE92]' + id: totrans-205 prefs: [] type: TYPE_PRE + zh: '[PRE92]' - en: '[PRE93]' + id: totrans-206 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE93]' +- en: null + id: totrans-207 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-208 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE94]' + id: totrans-209 prefs: [] type: TYPE_PRE + zh: '[PRE94]' - en: '[PRE95]' + id: totrans-210 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE95]' +- en: null + id: totrans-211 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-212 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE96]' + id: totrans-213 prefs: [] type: TYPE_PRE + zh: '[PRE96]' - en: '[PRE97]' + id: totrans-214 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE97]' +- en: null + id: totrans-215 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-216 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Italian[](#italian "Permalink to this heading") + id: totrans-217 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 意大利语[](#italian "跳转到此标题的永久链接") - en: '[PRE98]' + id: totrans-218 prefs: [] type: TYPE_PRE + zh: '[PRE98]' - en: '[PRE99]' + id: totrans-219 prefs: [] type: TYPE_PRE + zh: '[PRE99]' - en: '[PRE100]' + id: totrans-220 prefs: [] type: TYPE_PRE + zh: '[PRE100]' - en: '![Emission](../Images/731d611433f5cf2db8c00bfa366eced7.png)' + id: totrans-221 prefs: [] type: TYPE_IMG + zh: '![发射](../Images/731d611433f5cf2db8c00bfa366eced7.png)' - en: '[PRE101]' + id: totrans-222 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE101]' +- en: null + id: totrans-223 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-224 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE102]' + id: totrans-225 prefs: [] type: TYPE_PRE + zh: '[PRE102]' - en: '[PRE103]' + id: totrans-226 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE103]' +- en: null + id: totrans-227 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-228 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE104]' + id: totrans-229 prefs: [] type: TYPE_PRE + zh: '[PRE104]' - en: '[PRE105]' + id: totrans-230 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE105]' +- en: null + id: totrans-231 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
+ id: totrans-232 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE106]' + id: totrans-233 prefs: [] type: TYPE_PRE + zh: '[PRE106]' - en: '[PRE107]' + id: totrans-234 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE107]' +- en: null + id: totrans-235 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-236 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE108]' + id: totrans-237 prefs: [] type: TYPE_PRE + zh: '[PRE108]' - en: '[PRE109]' + id: totrans-238 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE109]' +- en: null + id: totrans-239 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-240 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE110]' + id: totrans-241 prefs: [] type: TYPE_PRE + zh: '[PRE110]' - en: '[PRE111]' + id: totrans-242 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE111]' +- en: null + id: totrans-243 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-244 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE112]' + id: totrans-245 prefs: [] type: TYPE_PRE + zh: '[PRE112]' - en: '[PRE113]' + id: totrans-246 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE113]' +- en: null + id: totrans-247 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-248 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Conclusion[](#conclusion "Permalink to this heading") + id: totrans-249 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 结论[](#conclusion "跳转到此标题的永久链接") - en: In this tutorial, we looked at how to use torchaudio’s forced alignment API and a Wav2Vec2 pre-trained multilingual acoustic model to align speech data to transcripts in five languages. + id: totrans-250 prefs: [] type: TYPE_NORMAL + zh: 在本教程中,我们看了如何使用torchaudio的强制对齐API和一个Wav2Vec2预训练的多语言声学模型来将五种语言的语音数据与文本对齐。 - en: Acknowledgement[](#acknowledgement "Permalink to this heading") + id: totrans-251 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 致谢[](#acknowledgement "跳转到此标题的永久链接") - en: Thanks to [Vineel Pratap](mailto:vineelkpratap%40meta.com) and [Zhaoheng Ni](mailto:zni%40meta.com) for developing and open-sourcing the forced aligner API.
+ id: totrans-252 prefs: [] type: TYPE_NORMAL + zh: 感谢[Vineel Pratap](mailto:vineelkpratap%40meta.com)和[Zhaoheng Ni](mailto:zni%40meta.com)开发并开源了强制对齐器API。 - en: '**Total running time of the script:** ( 0 minutes 4.115 seconds)' + id: totrans-253 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟4.115秒)' - en: '[`Download Python source code: forced_alignment_for_multilingual_data_tutorial.py`](../_downloads/a662d1f1f11633103b4b95ad4b68013c/forced_alignment_for_multilingual_data_tutorial.py)' + id: totrans-254 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:forced_alignment_for_multilingual_data_tutorial.py`](../_downloads/a662d1f1f11633103b4b95ad4b68013c/forced_alignment_for_multilingual_data_tutorial.py)' - en: '[`Download Jupyter notebook: forced_alignment_for_multilingual_data_tutorial.ipynb`](../_downloads/454ce4c8debdfeda1ab0ab945c52976d/forced_alignment_for_multilingual_data_tutorial.ipynb)' + id: totrans-255 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:forced_alignment_for_multilingual_data_tutorial.ipynb`](../_downloads/454ce4c8debdfeda1ab0ab945c52976d/forced_alignment_for_multilingual_data_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-256 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_42.yaml b/totrans/aud22_42.yaml index e142c0b5dea61fcbcd022288c9ab2dcb58a9a68a..53f65c32ef2c1bdc5bdaadd771848a643af05517 100644 --- a/totrans/aud22_42.yaml +++ b/totrans/aud22_42.yaml @@ -1,334 +1,521 @@ - en: Text-to-Speech with Tacotron2 + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用Tacotron2进行文本到语音转换 - en: 原文:[https://pytorch.org/audio/stable/tutorials/tacotron2_pipeline_tutorial.html](https://pytorch.org/audio/stable/tutorials/tacotron2_pipeline_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/tutorials/tacotron2_pipeline_tutorial.html](https://pytorch.org/audio/stable/tutorials/tacotron2_pipeline_tutorial.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-tacotron2-pipeline-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-tacotron2-pipeline-tutorial-py)下载完整示例代码 - en: '**Author**: [Yao-Yuan Yang](https://github.com/yangarbiter), [Moto Hira](mailto:moto%40meta.com)' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Yao-Yuan Yang](https://github.com/yangarbiter), [Moto Hira](mailto:moto%40meta.com)' - en: Overview[](#overview "Permalink to this heading") + id: totrans-5 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 概述[](#overview "跳转到此标题的永久链接") - en: This tutorial shows how to build a text-to-speech pipeline, using the pretrained Tacotron2 in torchaudio. + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了如何构建文本到语音流水线,使用torchaudio中的预训练Tacotron2。 - en: 'The text-to-speech pipeline goes as follows:' + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 文本到语音流水线的步骤如下: - en: Text preprocessing + id: totrans-8 prefs: - PREF_OL type: TYPE_NORMAL + zh: 文本预处理 - en: First, the input text is encoded into a list of symbols. In this tutorial, we will use English characters and phonemes as the symbols. + id: totrans-9 prefs: - PREF_IND type: TYPE_NORMAL + zh: 首先,将输入文本编码为符号列表。在本教程中,我们将使用英文字符和音素作为符号。 - en: Spectrogram generation + id: totrans-10 prefs: - PREF_OL type: TYPE_NORMAL + zh: 生成频谱图 - en: From the encoded text, a spectrogram is generated.
We use the `Tacotron2` model for this. + id: totrans-11 prefs: - PREF_IND type: TYPE_NORMAL + zh: 从编码文本生成频谱图。我们使用`Tacotron2`模型进行此操作。 - en: Time-domain conversion + id: totrans-12 prefs: - PREF_OL type: TYPE_NORMAL + zh: 时域转换 - en: The last step is converting the spectrogram into the waveform. The process of generating speech from a spectrogram is also called a vocoder. In this tutorial, three different vocoders are used: [`WaveRNN`](../generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN "torchaudio.models.WaveRNN"), [`GriffinLim`](../generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim "torchaudio.transforms.GriffinLim"), and [Nvidia’s WaveGlow](https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/). + id: totrans-13 prefs: - PREF_IND type: TYPE_NORMAL + zh: 最后一步是将频谱图转换为波形。从频谱图生成语音的过程也称为声码器。在本教程中,使用了三种不同的声码器,[`WaveRNN`](../generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN "torchaudio.models.WaveRNN")、[`GriffinLim`](../generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim "torchaudio.transforms.GriffinLim")和[Nvidia的WaveGlow](https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/)。 - en: The following figure illustrates the whole process. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 以下图示了整个过程。 - en: '![https://download.pytorch.org/torchaudio/tutorial-assets/tacotron2_tts_pipeline.png](../Images/209f5b44836c4b1fdfb15fbfce7fd7f0.png)' + id: totrans-15 prefs: [] type: TYPE_IMG + zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/tacotron2_tts_pipeline.png](../Images/209f5b44836c4b1fdfb15fbfce7fd7f0.png)' - en: All the related components are bundled in [`torchaudio.pipelines.Tacotron2TTSBundle`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle"), but this tutorial will also cover the process under the hood. + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 所有相关组件都打包在[`torchaudio.pipelines.Tacotron2TTSBundle`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle")中,但本教程还将介绍底层过程。 - en: Preparation[](#preparation "Permalink to this heading") + id: totrans-17 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 准备工作[](#preparation "跳转到此标题的永久链接") - en: First, we install the necessary dependencies. In addition to `torchaudio`, `DeepPhonemizer` is required to perform phoneme-based encoding. + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 首先,我们安装必要的依赖项。除了`torchaudio`之外,还需要`DeepPhonemizer`来执行基于音素的编码。 - en: '[PRE0]' + id: totrans-19 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-20 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: '[PRE2]' + id: totrans-21 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: '[PRE3]' + id: totrans-22 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: Text Processing[](#text-processing "Permalink to this heading") + id: totrans-23 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 文本处理[](#text-processing "跳转到此标题的永久链接") - en: Character-based encoding[](#character-based-encoding "Permalink to this heading") + id: totrans-24 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 基于字符的编码[](#character-based-encoding "跳转到此标题的永久链接") - en: In this section, we will go through how the character-based encoding works.
+ id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: 在本节中,我们将介绍基于字符的编码工作原理。 - en: Since the pre-trained Tacotron2 model expects a specific symbol table, the same functionality is available in `torchaudio`. This section is more for explaining the basics of the encoding. + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 由于预训练的Tacotron2模型期望特定的符号表集,因此`torchaudio`中提供了相同的功能。本节更多是为了解释编码的基础。 - en: Firstly, we define the set of symbols. For example, we can use `'_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'`. Then, we will map each character of the input text into the index of the corresponding symbol in the table. + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 首先,我们定义符号集。例如,我们可以使用`'_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'`。然后,我们将输入文本的每个字符映射到表中相应符号的索引。 - en: The following is an example of such processing. In the example, symbols that are not in the table are ignored. + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: 以下是这种处理的示例。在示例中,表中没有的符号将被忽略。 - en: '[PRE4]' + id: totrans-29 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: '[PRE5]' + id: totrans-30 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: As mentioned above, the symbol table and indices must match what the pretrained Tacotron2 model expects. `torchaudio` provides the transform along with the pretrained model. For example, you can instantiate and use such a transform as follows. + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: 如上所述,符号表和索引必须与预训练的Tacotron2模型期望的相匹配。`torchaudio`提供了该转换以及预训练模型。例如,您可以实例化并使用此类转换如下。 - en: '[PRE6]' + id: totrans-32 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: '[PRE7]' + id: totrans-33 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: The `processor` object takes either a text or a list of texts as inputs. When a list of texts is provided, the returned `lengths` variable represents the valid length of each processed token in the output batch. + id: totrans-34 prefs: [] type: TYPE_NORMAL + zh: '`processor`对象接受文本或文本列表作为输入。当提供文本列表时,返回的`lengths`变量表示输出批次中每个处理的标记的有效长度。' - en: The intermediate representation can be retrieved as follows. + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 中间表示可以按以下方式检索。 - en: '[PRE8]' + id: totrans-36 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: '[PRE9]' + id: totrans-37 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: Phoneme-based encoding[](#phoneme-based-encoding "Permalink to this heading") + id: totrans-38 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 基于音素的编码[](#phoneme-based-encoding "跳转到此标题的永久链接") - en: Phoneme-based encoding is similar to character-based encoding, but it uses a symbol table based on phonemes and a G2P (Grapheme-to-Phoneme) model. + id: totrans-39 prefs: [] type: TYPE_NORMAL + zh: 基于音素的编码类似于基于字符的编码,但它使用基于音素的符号表和G2P(字素到音素)模型。 - en: The details of the G2P model are out of the scope of this tutorial; we will just look at what the conversion looks like. + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: G2P模型的详细信息超出了本教程的范围,我们只会看一下转换的样子。 - en: Similar to the case of character-based encoding, the encoding process is expected to match what a pretrained Tacotron2 model is trained on. `torchaudio` has an interface to create the process. + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: 与基于字符的编码类似,编码过程应与预训练的Tacotron2模型训练的内容相匹配。`torchaudio`具有创建该过程的接口。 - en: The following code illustrates how to make and use the process. Behind the scenes, a G2P model is created using the `DeepPhonemizer` package, and the pretrained weights published by the author of `DeepPhonemizer` are fetched.
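+- en: '(Illustrative aside, not part of the original tutorial) Returning briefly
+    to the character-based lookup described above, it reduces to a table and a filter;
+    the real processor ships with the pretrained model:'
+  prefs: []
+  type: TYPE_NORMAL
+  zh: (示例性补充,非原教程内容)回到上文描述的基于字符的查表,它本质上就是一个表加一个过滤;真实的处理器随预训练模型一起提供:
+- en: |-
+    symbols = "_-!'(),.:;? abcdefghijklmnopqrstuvwxyz"
+    look_up = {s: i for i, s in enumerate(symbols)}
+
+    def text_to_sequence(text):
+        # Symbols that are not in the table are ignored, as noted above.
+        return [look_up[s] for s in text.lower() if s in look_up]
+
+    print(text_to_sequence("Hello, world!"))
+  prefs: []
+  type: TYPE_PRE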
+ id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 以下代码说明了如何进行该过程。在幕后,使用`DeepPhonemizer`包创建了一个G2P模型,并获取了`DeepPhonemizer`作者发布的预训练权重。 - en: '[PRE10]' + id: totrans-43 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: '[PRE11]' + id: totrans-44 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: Notice that the encoded values are different from the example of character-based encoding. + id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 请注意,编码值与基于字符的编码示例不同。 - en: The intermediate representation looks like the following. + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: 中间表示如下。 - en: '[PRE12]' + id: totrans-47 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: '[PRE13]' + id: totrans-48 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: Spectrogram Generation[](#spectrogram-generation "Permalink to this heading") + id: totrans-49 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 频谱图生成[](#spectrogram-generation "跳转到此标题") - en: '`Tacotron2` is the model we use to generate a spectrogram from the encoded text. For the details of the model, please refer to [the paper](https://arxiv.org/abs/1712.05884).' + id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: '`Tacotron2`是我们用来从编码文本生成频谱图的模型。有关模型的详细信息,请参阅[论文](https://arxiv.org/abs/1712.05884)。' - en: It is easy to instantiate a Tacotron2 model with pretrained weights; however, note that the input to Tacotron2 models needs to be processed by the matching text processor. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 实例化一个带有预训练权重的Tacotron2模型很容易,但请注意,Tacotron2模型的输入需要经过匹配的文本处理器处理。 - en: '[`torchaudio.pipelines.Tacotron2TTSBundle`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle") bundles the matching models and processors together so that it is easy to create the pipeline.' + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: '[`torchaudio.pipelines.Tacotron2TTSBundle`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle")将匹配的模型和处理器捆绑在一起,以便轻松创建流水线。' - en: For the available bundles and their usage, please refer to [`Tacotron2TTSBundle`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle"). + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: 有关可用捆绑包及其用法,请参阅[`Tacotron2TTSBundle`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle")。 - en: '[PRE14]' + id: totrans-54 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: '![tacotron2 pipeline tutorial](../Images/caf2a228b2d54421e1c3fc64dd482393.png)' + id: totrans-55 prefs: [] type: TYPE_IMG + zh: '![tacotron2流水线教程](../Images/caf2a228b2d54421e1c3fc64dd482393.png)' - en: '[PRE15]' + id: totrans-56 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: Note that the `Tacotron2.infer` method performs multinomial sampling; therefore, the process of generating the spectrogram incurs randomness.
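+- en: '(Illustrative aside, not part of the original tutorial) A minimal sketch of
+    driving the bundled text processor and Tacotron2 together; the bundle name below
+    is one of the available options:'
+  prefs: []
+  type: TYPE_NORMAL
+  zh: (示例性补充,非原教程内容)以下简要示意如何将捆绑的文本处理器与Tacotron2串联使用;下面的捆绑包名称是可选项之一:
+- en: |-
+    import torch
+    import torchaudio
+
+    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
+    processor = bundle.get_text_processor()
+    tacotron2 = bundle.get_tacotron2()
+
+    with torch.inference_mode():
+        processed, lengths = processor("Hello world!")
+        spec, spec_lengths, _ = tacotron2.infer(processed, lengths)
+    # infer() samples stochastically; call torch.manual_seed(...) first for repeatable spectrograms.
+  prefs: []
+  type: TYPE_PRE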
+ id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: 请注意,`Tacotron2.infer`方法执行多项抽样,因此生成频谱图的过程会产生随机性。 - en: '[PRE16]' + id: totrans-58 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: '![tacotron2 pipeline tutorial](../Images/b4898a614b73264775a6c5201f2e1bc3.png)' + id: totrans-59 prefs: [] type: TYPE_IMG + zh: '![tacotron2流水线教程](../Images/b4898a614b73264775a6c5201f2e1bc3.png)' - en: '[PRE17]' + id: totrans-60 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: Waveform Generation[](#waveform-generation "Permalink to this heading") + id: totrans-61 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 波形生成[](#waveform-generation "跳转到此标题") - en: Once the spectrogram is generated, the last process is to recover the waveform from the spectrogram. + id: totrans-62 prefs: [] type: TYPE_NORMAL + zh: 生成频谱图后,最后一个过程是从频谱图中恢复波形。 - en: '`torchaudio` provides vocoders based on `GriffinLim` and `WaveRNN`.' + id: totrans-63 prefs: [] type: TYPE_NORMAL + zh: '`torchaudio`提供基于`GriffinLim`和`WaveRNN`的声码器。' - en: WaveRNN[](#wavernn "Permalink to this heading") + id: totrans-64 prefs: - PREF_H3 type: TYPE_NORMAL + zh: WaveRNN[](#wavernn "跳转到此标题") - en: Continuing from the previous section, we can instantiate the matching WaveRNN model from the same bundle. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: 继续上一节,我们可以从相同的捆绑包中实例化匹配的WaveRNN模型。 - en: '[PRE18]' + id: totrans-66 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: '[PRE19]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '[PRE20]' + id: totrans-68 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: '![tacotron2 pipeline tutorial](../Images/d81f4591e9faa8f9a00d0a4eb78e505d.png)' + id: totrans-69 prefs: [] type: TYPE_IMG -- en: + zh: '![tacotron2流水线教程](../Images/d81f4591e9faa8f9a00d0a4eb78e505d.png)' +- en: null + id: totrans-70 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Griffin-Lim[](#griffin-lim "Permalink to this heading") + id: totrans-72 prefs: - PREF_H3 type: TYPE_NORMAL + zh: Griffin-Lim[](#griffin-lim "跳转到此标题") - en: Using the Griffin-Lim vocoder is the same as with WaveRNN. You can instantiate the vocoder object with the [`get_vocoder()`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle.get_vocoder "torchaudio.pipelines.Tacotron2TTSBundle.get_vocoder") method and pass the spectrogram. + id: totrans-73 prefs: [] type: TYPE_NORMAL + zh: 使用Griffin-Lim声码器与WaveRNN相同。您可以使用[`get_vocoder()`](../generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle.get_vocoder "torchaudio.pipelines.Tacotron2TTSBundle.get_vocoder")方法实例化声码器对象并传递频谱图。 - en: '[PRE21]' + id: totrans-74 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '[PRE22]' + id: totrans-75 prefs: [] type: TYPE_PRE + zh: '[PRE22]' - en: '[PRE23]' + id: totrans-76 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '![tacotron2 pipeline tutorial](../Images/3ce8674d89c25493f24e575fd2377a53.png)' + id: totrans-77 prefs: [] type: TYPE_IMG -- en: + zh: '![tacotron2流水线教程](../Images/3ce8674d89c25493f24e575fd2377a53.png)' +- en: null + id: totrans-78 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Waveglow[](#waveglow "Permalink to this heading") + id: totrans-80 prefs: - PREF_H3 type: TYPE_NORMAL + zh: Waveglow[](#waveglow "跳转到此标题") - en: Waveglow is a vocoder published by Nvidia. The pretrained weights are published on Torch Hub.
One can instantiate the model using the `torch.hub` module. + id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: Waveglow是Nvidia发布的声码器。预训练权重已发布在Torch Hub上。可以使用`torch.hub`模块实例化模型。 - en: '[PRE24]' + id: totrans-82 prefs: [] type: TYPE_PRE + zh: '[PRE24]' - en: '[PRE25]' + id: totrans-83 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '[PRE26]' + id: totrans-84 prefs: [] type: TYPE_PRE + zh: '[PRE26]' - en: '![tacotron2 pipeline tutorial](../Images/d981c34fe89af30a4e994e3d10e2dad4.png)' + id: totrans-85 prefs: [] type: TYPE_IMG -- en: + zh: '![tacotron2流水线教程](../Images/d981c34fe89af30a4e994e3d10e2dad4.png)' +- en: null + id: totrans-86 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '**Total running time of the script:** ( 1 minutes 41.941 seconds)' + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(1分钟41.941秒)' - en: '[`Download Python source code: tacotron2_pipeline_tutorial.py`](../_downloads/9772cbd0af96f57f17a2da758b365a43/tacotron2_pipeline_tutorial.py)' + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:tacotron2_pipeline_tutorial.py`](../_downloads/9772cbd0af96f57f17a2da758b365a43/tacotron2_pipeline_tutorial.py)' - en: '[`Download Jupyter notebook: tacotron2_pipeline_tutorial.ipynb`](../_downloads/63ad2005fc24f143f3f078cd2c6b0d60/tacotron2_pipeline_tutorial.ipynb)' + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:tacotron2_pipeline_tutorial.ipynb`](../_downloads/63ad2005fc24f143f3f078cd2c6b0d60/tacotron2_pipeline_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的画廊](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_43.yaml b/totrans/aud22_43.yaml index 727dd58aafdcf1e44067e3ca48c671c64973bdf8..6d2f8fe90a75ddbd6cc2866673b1f69300fda049 100644 --- a/totrans/aud22_43.yaml +++ b/totrans/aud22_43.yaml @@ -1,422 +1,650 @@ - en: Speech Enhancement with MVDR Beamforming + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用MVDR波束形成进行语音增强 - en: 原文:[https://pytorch.org/audio/stable/tutorials/mvdr_tutorial.html](https://pytorch.org/audio/stable/tutorials/mvdr_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: '[https://pytorch.org/audio/stable/tutorials/mvdr_tutorial.html](https://pytorch.org/audio/stable/tutorials/mvdr_tutorial.html)' - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-mvdr-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-mvdr-tutorial-py)下载完整示例代码 - en: '**Author**: [Zhaoheng Ni](mailto:zni%40meta.com)' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Zhaoheng Ni](mailto:zni%40meta.com)' - en: 1\. Overview[](#overview "Permalink to this heading") + id: totrans-5 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 1\. 概述[](#overview "跳转到此标题的永久链接") - en: This is a tutorial on applying Minimum Variance Distortionless Response (MVDR) beamforming to estimate enhanced speech with TorchAudio. + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 这是一个关于如何应用最小方差无失真响应(MVDR)波束形成来估计增强语音的TorchAudio教程。 - en: 'Steps:' + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 步骤: - en: Generate an ideal ratio mask (IRM) by dividing the clean/noise magnitude by the mixture magnitude.
+ id: totrans-8 prefs: - PREF_UL type: TYPE_NORMAL + zh: 通过将干净/噪声幅度除以混合幅度生成理想比掩模(IRM)。 - en: Estimate power spectral density (PSD) matrices using [`torchaudio.transforms.PSD()`](../generated/torchaudio.transforms.PSD.html#torchaudio.transforms.PSD "torchaudio.transforms.PSD"). + id: totrans-9 prefs: - PREF_UL type: TYPE_NORMAL + zh: 使用[`torchaudio.transforms.PSD()`](../generated/torchaudio.transforms.PSD.html#torchaudio.transforms.PSD "torchaudio.transforms.PSD")来估计功率谱密度(PSD)矩阵。 - en: Estimate enhanced speech using MVDR modules ([`torchaudio.transforms.SoudenMVDR()`](../generated/torchaudio.transforms.SoudenMVDR.html#torchaudio.transforms.SoudenMVDR "torchaudio.transforms.SoudenMVDR") and [`torchaudio.transforms.RTFMVDR()`](../generated/torchaudio.transforms.RTFMVDR.html#torchaudio.transforms.RTFMVDR "torchaudio.transforms.RTFMVDR")). + id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL + zh: 使用MVDR模块([`torchaudio.transforms.SoudenMVDR()`](../generated/torchaudio.transforms.SoudenMVDR.html#torchaudio.transforms.SoudenMVDR "torchaudio.transforms.SoudenMVDR")和[`torchaudio.transforms.RTFMVDR()`](../generated/torchaudio.transforms.RTFMVDR.html#torchaudio.transforms.RTFMVDR "torchaudio.transforms.RTFMVDR"))估计增强语音。 - en: Benchmark the two methods ([`torchaudio.functional.rtf_evd()`](../generated/torchaudio.functional.rtf_evd.html#torchaudio.functional.rtf_evd "torchaudio.functional.rtf_evd") and [`torchaudio.functional.rtf_power()`](../generated/torchaudio.functional.rtf_power.html#torchaudio.functional.rtf_power "torchaudio.functional.rtf_power")) for computing the relative transfer function (RTF) matrix of the reference microphone. + id: totrans-11 prefs: - PREF_UL type: TYPE_NORMAL + zh: 为计算参考麦克风的相对传递函数(RTF)矩阵,对两种方法([`torchaudio.functional.rtf_evd()`](../generated/torchaudio.functional.rtf_evd.html#torchaudio.functional.rtf_evd "torchaudio.functional.rtf_evd")和[`torchaudio.functional.rtf_power()`](../generated/torchaudio.functional.rtf_power.html#torchaudio.functional.rtf_power "torchaudio.functional.rtf_power"))进行基准测试。 - en: '[PRE0]' + id: totrans-12 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-13 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: 2\. Preparation[](#preparation "Permalink to this heading") + id: totrans-14 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 2\. 准备工作[](#preparation "跳转到此标题的永久链接") - en: 2.1\. Import the packages[](#import-the-packages "Permalink to this heading") + id: totrans-15 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 2.1\. 导入包[](#import-the-packages "跳转到此标题的永久链接") - en: First, we install and import the necessary packages. + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 首先,我们安装并导入必要的包。 - en: 'The `mir_eval`, `pesq`, and `pystoi` packages are required for evaluating the speech enhancement performance.' + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 评估语音增强性能需要`mir_eval`、`pesq`和`pystoi`包。 - en: '[PRE2]' + id: totrans-18 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: 2.2\. Download audio data[](#download-audio-data "Permalink to this heading") + id: totrans-19 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 2.2\. 下载音频数据[](#download-audio-data "跳转到此标题的永久链接") - en: The multi-channel audio example is selected from the [ConferencingSpeech](https://github.com/ConferencingSpeech/ConferencingSpeech2021) dataset.
+ id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 多通道音频示例选自[ConferencingSpeech](https://github.com/ConferencingSpeech/ConferencingSpeech2021)数据集。 - en: The original filename is + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 原始文件名为 - en: '`SSB07200001\#noise-sound-bible-0038\#7.86_6.16_3.00_3.14_4.84_134.5285_191.7899_0.4735\#15217\#25.16333303751458\#0.2101221178590021.wav`' + id: totrans-22 prefs: - PREF_BQ type: TYPE_NORMAL + zh: '`SSB07200001\#noise-sound-bible-0038\#7.86_6.16_3.00_3.14_4.84_134.5285_191.7899_0.4735\#15217\#25.16333303751458\#0.2101221178590021.wav`' - en: 'which was generated with:' + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 这是通过以下方式生成的: - en: '`SSB07200001.wav` from [AISHELL-3](https://www.openslr.org/93/) (Apache License v.2.0)' + id: totrans-24 prefs: - PREF_UL type: TYPE_NORMAL + zh: 从[AISHELL-3](https://www.openslr.org/93/)(Apache许可证v.2.0)中获取`SSB07200001.wav` - en: '`noise-sound-bible-0038.wav` from [MUSAN](http://www.openslr.org/17/) (Attribution 4.0 International — CC BY 4.0)' + id: totrans-25 prefs: - PREF_UL type: TYPE_NORMAL + zh: 从[MUSAN](http://www.openslr.org/17/)(署名4.0国际-CC BY 4.0)中获取`noise-sound-bible-0038.wav` - en: '[PRE3]' + id: totrans-26 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: '[PRE4]' + id: totrans-27 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: 2.3\. Helper functions[](#helper-functions "Permalink to this heading") + id: totrans-28 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 2.3\. 辅助函数[](#helper-functions "跳转到此标题的永久链接") - en: '[PRE5]' + id: totrans-29 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: 3\. Generate Ideal Ratio Masks (IRMs)[](#generate-ideal-ratio-masks-irms "Permalink to this heading") + id: totrans-30 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 3\. 生成理想比掩模(IRMs)[](#generate-ideal-ratio-masks-irms "跳转到此标题的永久链接") - en: 3.1\. Load audio data[](#load-audio-data "Permalink to this heading") + id: totrans-31 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 3.1\. 加载音频数据[](#load-audio-data "跳转到此标题的永久链接") - en: '[PRE6]' + id: totrans-32 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: 'Note: To improve computational robustness, it is recommended to represent the waveforms as double-precision floating point (`torch.float64` or `torch.double`) values.' + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 注意:为了提高计算的稳健性,建议将波形表示为双精度浮点数(`torch.float64`或`torch.double`)值。 - en: '[PRE7]' + id: totrans-34 prefs: [] type: TYPE_PRE -- en: 3.2\. Compute STFT coefficients[](#compute-stft-coefficients "Permalink to - this heading") + zh: '[PRE7]' +- en: 3.2\. Compute STFT coefficients[](#compute-stft-coefficients "Permalink to this + heading") + id: totrans-35 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 3.2\. 计算STFT系数[](#compute-stft-coefficients "跳转到此标题的永久链接") - en: '[PRE8]' + id: totrans-36 prefs: [] type: TYPE_PRE -- en: 3.2.1\. Visualize mixture speech[](#visualize-mixture-speech "Permalink to - this heading") + zh: '[PRE8]' +- en: 3.2.1\. Visualize mixture speech[](#visualize-mixture-speech "Permalink to this + heading") + id: totrans-37 prefs: - PREF_H4 type: TYPE_NORMAL + zh: 3.2.1\. 
可视化混合语音[](#visualize-mixture-speech "跳转到此标题的永久链接") - en: 'We evaluate the quality of the mixture speech or the enhanced speech using the following three metrics:' + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: 我们使用以下三个指标评估混合语音或增强语音的质量: - en: signal-to-distortion ratio (SDR) + id: totrans-39 prefs: - PREF_UL type: TYPE_NORMAL + zh: 信号与失真比(SDR) - en: scale-invariant signal-to-noise ratio (Si-SNR, or Si-SDR in some papers) + id: totrans-40 prefs: - PREF_UL type: TYPE_NORMAL + zh: 尺度不变信噪比(Si-SNR,在一些论文中为Si-SDR) - en: Perceptual Evaluation of Speech Quality (PESQ) + id: totrans-41 prefs: - PREF_UL type: TYPE_NORMAL + zh: 语音质量的感知评估(PESQ) - en: We also evaluate the intelligibility of the speech with the Short-Time Objective Intelligibility (STOI) metric. + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 我们还使用短时客观可懂性(STOI)指标评估语音的可懂性。 - en: '[PRE9]' + id: totrans-43 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: '![Spectrogram of Mixture Speech (dB)](../Images/649354a23fa7ffa055a7ebbc4cc794ee.png)' + id: totrans-44 prefs: [] type: TYPE_IMG + zh: '![混合语音的频谱图(dB)](../Images/649354a23fa7ffa055a7ebbc4cc794ee.png)' - en: '[PRE10]' + id: totrans-45 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE10]' +- en: null + id: totrans-46 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 3.2.2\. Visualize clean speech[](#visualize-clean-speech "Permalink to this heading") + id: totrans-48 prefs: - PREF_H4 type: TYPE_NORMAL + zh: 3.2.2\. 可视化干净语音[](#visualize-clean-speech "跳转到此标题的永久链接") - en: '[PRE11]' + id: totrans-49 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: '![Spectrogram of Clean Speech (dB)](../Images/8c0ec6c7b70ba3381f01f4ca1aa64cc3.png)' + id: totrans-50 prefs: [] type: TYPE_IMG -- en: + zh: '![干净语音的频谱图(dB)](../Images/8c0ec6c7b70ba3381f01f4ca1aa64cc3.png)' +- en: null + id: totrans-51 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 3.2.3\. Visualize noise[](#visualize-noise "Permalink to this heading") + id: totrans-53 prefs: - PREF_H4 type: TYPE_NORMAL + zh: 3.2.3\. 可视化噪声[](#visualize-noise "跳转到此标题的永久链接") - en: '[PRE12]' + id: totrans-54 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: '![Spectrogram of Noise (dB)](../Images/6fcdfd90b4d1cf9de948387124b33fbc.png)' + id: totrans-55 prefs: [] type: TYPE_IMG -- en: + zh: '![噪声的频谱图(dB)](../Images/6fcdfd90b4d1cf9de948387124b33fbc.png)' +- en: null + id: totrans-56 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 3.3\. Define the reference microphone[](#define-the-reference-microphone "Permalink to this heading") + id: totrans-58 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 3.3\. 定义参考麦克风 - en: We choose the first microphone in the array as the reference channel for demonstration. The selection of the reference channel may depend on the design of the microphone array. + id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: 我们选择阵列中的第一个麦克风作为演示的参考通道。参考通道的选择可能取决于麦克风阵列的设计。 - en: You can also apply an end-to-end neural network which estimates both the reference channel and the PSD matrices, then obtains the enhanced STFT coefficients by the MVDR module. + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 您还可以应用一个端到端的神经网络,该网络估计参考通道和PSD矩阵,然后通过MVDR模块获得增强的STFT系数。 - en: '[PRE13]' + id: totrans-61 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: 3.4\. 
Compute IRMs[](#compute-irms "Permalink to this heading") + id: totrans-62 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 3.4\. 计算IRM - en: '[PRE14]' + id: totrans-63 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: 3.4.1\. Visualize IRM of target speech[](#visualize-irm-of-target-speech "Permalink to this heading") + id: totrans-64 prefs: - PREF_H4 type: TYPE_NORMAL + zh: 3.4.1\. 可视化目标语音的IRM - en: '[PRE15]' + id: totrans-65 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: '![IRM of the Target Speech](../Images/554c74b7aceb3610533b6c17013955ed.png)' + id: totrans-66 prefs: [] type: TYPE_IMG + zh: '![目标语音的IRM](../Images/554c74b7aceb3610533b6c17013955ed.png)' - en: 3.4.2\. Visualize IRM of noise[](#visualize-irm-of-noise "Permalink to this heading") + id: totrans-67 prefs: - PREF_H4 type: TYPE_NORMAL + zh: 3.4.2\. 可视化噪声的IRM - en: '[PRE16]' + id: totrans-68 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: '![IRM of the Noise](../Images/f8e3d909efad92e7bbf8a4a89c77afe9.png)' + id: totrans-69 prefs: [] type: TYPE_IMG + zh: '![噪声的IRM](../Images/f8e3d909efad92e7bbf8a4a89c77afe9.png)' - en: 4\. Compute PSD matrices[](#compute-psd-matrices "Permalink to this heading") + id: totrans-70 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 4\. 计算PSD矩阵 - en: '[`torchaudio.transforms.PSD()`](../generated/torchaudio.transforms.PSD.html#torchaudio.transforms.PSD "torchaudio.transforms.PSD") computes the time-invariant PSD matrix given the multi-channel complex-valued STFT coefficients of the mixture speech and the time-frequency mask.' + id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: '[`torchaudio.transforms.PSD()`](../generated/torchaudio.transforms.PSD.html#torchaudio.transforms.PSD + "torchaudio.transforms.PSD") 计算给定混合语音的多通道复值STFT系数和时间频率掩模的时不变PSD矩阵。' - en: The shape of the PSD matrix is (…, freq, channel, channel). + id: totrans-72 prefs: [] type: TYPE_NORMAL + zh: PSD矩阵的形状为(…,频率,通道,通道)。 - en: '[PRE17]' + id: totrans-73 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: 5\. Beamforming using SoudenMVDR[](#beamforming-using-soudenmvdr "Permalink to this heading") + id: totrans-74 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 5\. 使用SoudenMVDR进行波束成形 - en: 5.1\. Apply beamforming[](#apply-beamforming "Permalink to this heading") + id: totrans-75 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 5.1\. 应用波束成形 - en: '[`torchaudio.transforms.SoudenMVDR()`](../generated/torchaudio.transforms.SoudenMVDR.html#torchaudio.transforms.SoudenMVDR "torchaudio.transforms.SoudenMVDR") takes the multi-channel complex-valued STFT coefficients of the mixture speech, PSD matrices of target speech and noise, and the reference channel inputs.' + id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: '[`torchaudio.transforms.SoudenMVDR()`](../generated/torchaudio.transforms.SoudenMVDR.html#torchaudio.transforms.SoudenMVDR + "torchaudio.transforms.SoudenMVDR") 接受混合语音的多通道复值STFT系数,目标语音和噪声的PSD矩阵,以及参考通道输入。' - en: The output is the single-channel complex-valued STFT coefficients of the enhanced speech. We can then obtain the enhanced waveform by passing this output to the [`torchaudio.transforms.InverseSpectrogram()`](../generated/torchaudio.transforms.InverseSpectrogram.html#torchaudio.transforms.InverseSpectrogram "torchaudio.transforms.InverseSpectrogram") module.
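To make the flow of sections 4 and 5.1 concrete, here is a minimal sketch (assuming `stft_mix`, `irm_speech`, and `irm_noise` from the earlier steps; the `n_fft`/`hop_length` values are illustrative and must match the analysis STFT):

```python
import torchaudio.transforms as T

# PSD matrices from the multi-channel STFT and the time-frequency masks.
psd = T.PSD()
psd_speech = psd(stft_mix, irm_speech)  # (..., freq, channel, channel)
psd_noise = psd(stft_mix, irm_noise)

# Souden MVDR beamforming toward the chosen reference channel.
mvdr = T.SoudenMVDR()
stft_enhanced = mvdr(stft_mix, psd_speech, psd_noise, reference_channel=0)

# Back to a single-channel waveform.
istft = T.InverseSpectrogram(n_fft=1024, hop_length=256)
waveform_enhanced = istft(stft_enhanced)
```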
+ id: totrans-77 prefs: [] type: TYPE_NORMAL + zh: 输出是增强语音的单通道复值STFT系数。然后,我们可以通过将此输出传递给[`torchaudio.transforms.InverseSpectrogram()`](../generated/torchaudio.transforms.InverseSpectrogram.html#torchaudio.transforms.InverseSpectrogram "torchaudio.transforms.InverseSpectrogram")模块来获得增强的波形。 - en: '[PRE18]' + id: totrans-78 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: 5.2\. Result for SoudenMVDR[](#result-for-soudenmvdr "Permalink to this heading") + id: totrans-79 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 5.2\. SoudenMVDR的结果 - en: '[PRE19]' + id: totrans-80 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: '![Enhanced Spectrogram by SoudenMVDR (dB)](../Images/538460f3f3101c43956f055d758c19d8.png)' + id: totrans-81 prefs: [] type: TYPE_IMG + zh: '![SoudenMVDR增强的频谱图(dB)](../Images/538460f3f3101c43956f055d758c19d8.png)' - en: '[PRE20]' + id: totrans-82 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: null + id: totrans-83 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-84 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 6\. Beamforming using RTFMVDR[](#beamforming-using-rtfmvdr "Permalink to this heading") + id: totrans-85 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 6\. 使用RTFMVDR进行波束成形 - en: 6.1\. Compute RTF[](#compute-rtf "Permalink to this heading") + id: totrans-86 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 6.1\. 计算RTF - en: 'TorchAudio offers two methods for computing the RTF matrix of a target speech:' + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: TorchAudio提供了两种计算目标语音RTF矩阵的方法: - en: '[`torchaudio.functional.rtf_evd()`](../generated/torchaudio.functional.rtf_evd.html#torchaudio.functional.rtf_evd "torchaudio.functional.rtf_evd"), which applies eigenvalue decomposition to the PSD matrix of target speech to get the RTF matrix.' + id: totrans-88 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[`torchaudio.functional.rtf_evd()`](../generated/torchaudio.functional.rtf_evd.html#torchaudio.functional.rtf_evd + "torchaudio.functional.rtf_evd"),它对目标语音的PSD矩阵应用特征值分解以获得RTF矩阵。' - en: '[`torchaudio.functional.rtf_power()`](../generated/torchaudio.functional.rtf_power.html#torchaudio.functional.rtf_power "torchaudio.functional.rtf_power"), which applies the power iteration method. You can specify the number of iterations with argument `n_iter`.' + id: totrans-89 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[`torchaudio.functional.rtf_power()`](../generated/torchaudio.functional.rtf_power.html#torchaudio.functional.rtf_power + "torchaudio.functional.rtf_power"),它应用幂迭代方法。您可以使用参数`n_iter`指定迭代次数。' - en: '[PRE21]' + id: totrans-90 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: 6.2\. Apply beamforming[](#id1 "Permalink to this heading") + id: totrans-91 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 6.2\. 应用波束成形 - en: '[`torchaudio.transforms.RTFMVDR()`](../generated/torchaudio.transforms.RTFMVDR.html#torchaudio.transforms.RTFMVDR "torchaudio.transforms.RTFMVDR") takes the multi-channel complex-valued STFT coefficients of the mixture speech, RTF matrix of target speech, PSD matrix of noise, and the reference channel inputs.' + id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: '[`torchaudio.transforms.RTFMVDR()`](../generated/torchaudio.transforms.RTFMVDR.html#torchaudio.transforms.RTFMVDR + "torchaudio.transforms.RTFMVDR") 接受混合语音的多通道复值STFT系数,目标语音的RTF矩阵,噪声的PSD矩阵,以及参考通道输入。' - en: The output is the single-channel complex-valued STFT coefficients of the enhanced speech.
We can then obtain the enhanced waveform by passing this output to the [`torchaudio.transforms.InverseSpectrogram()`](../generated/torchaudio.transforms.InverseSpectrogram.html#torchaudio.transforms.InverseSpectrogram "torchaudio.transforms.InverseSpectrogram") module. + id: totrans-93 prefs: [] type: TYPE_NORMAL + zh: 输出是增强语音的单通道复值STFT系数。然后,我们可以通过将此输出传递给[`torchaudio.transforms.InverseSpectrogram()`](../generated/torchaudio.transforms.InverseSpectrogram.html#torchaudio.transforms.InverseSpectrogram + "torchaudio.transforms.InverseSpectrogram")模块来获得增强的波形。 - en: '[PRE22]' + id: totrans-94 prefs: [] type: TYPE_PRE + zh: '[PRE22]' - en: 6.3\. Result for RTFMVDR with rtf_evd[](#result-for-rtfmvdr-with-rtf-evd "Permalink to this heading") + id: totrans-95 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 6.3\. 使用rtf_evd的RTFMVDR的结果 - en: '[PRE23]' + id: totrans-96 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '![Enhanced Spectrogram by RTFMVDR and F.rtf_evd (dB)](../Images/370db6cebd6277dbdb20615483979e75.png)' + id: totrans-97 prefs: [] type: TYPE_IMG + zh: '![RTFMVDR和F.rtf_evd(dB)增强的频谱图](../Images/370db6cebd6277dbdb20615483979e75.png)' - en: '[PRE24]' + id: totrans-98 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE24]' +- en: null + id: totrans-99 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 6.4\. Result for RTFMVDR with rtf_power[](#result-for-rtfmvdr-with-rtf-power "Permalink to this heading") + id: totrans-101 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 6.4\. 使用rtf_power的RTFMVDR结果[](#result-for-rtfmvdr-with-rtf-power "跳转到此标题") - en: '[PRE25]' + id: totrans-102 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '![Enhanced Spectrogram by RTFMVDR and F.rtf_power (dB)](../Images/1c0783f4375432452c0ab411fa8bb3a5.png)' + id: totrans-103 prefs: [] type: TYPE_IMG + zh: '![RTFMVDR和F.rtf_power(dB)增强的频谱图](../Images/1c0783f4375432452c0ab411fa8bb3a5.png)' - en: '[PRE26]' + id: totrans-104 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE26]' +- en: null + id: totrans-105 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. 
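Summarizing sections 6.1 and 6.2, a minimal sketch of the RTF-based path (reusing the `stft_mix`, `psd_speech`, and `psd_noise` tensors assumed in the earlier sketches):

```python
import torchaudio.functional as F
import torchaudio.transforms as T

reference_channel = 0

# Either RTF estimator from section 6.1 works here.
rtf = F.rtf_power(psd_speech, psd_noise, reference_channel, n_iter=3)
# rtf = F.rtf_evd(psd_speech)  # eigenvalue-decomposition alternative

mvdr = T.RTFMVDR()
stft_enhanced_rtf = mvdr(stft_mix, rtf, psd_noise, reference_channel)
```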
+ id: totrans-106 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '**Total running time of the script:** ( 0 minutes 1.792 seconds)' + id: totrans-107 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟1.792秒)' - en: '[`Download Python source code: mvdr_tutorial.py`](../_downloads/50de4231f2cfe5d85bac91915f27f92c/mvdr_tutorial.py)' + id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:mvdr_tutorial.py`](../_downloads/50de4231f2cfe5d85bac91915f27f92c/mvdr_tutorial.py)' - en: '[`Download Jupyter notebook: mvdr_tutorial.ipynb`](../_downloads/ad8cfe3c85e0370f75a48f091e5a301d/mvdr_tutorial.ipynb)' + id: totrans-109 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:mvdr_tutorial.ipynb`](../_downloads/ad8cfe3c85e0370f75a48f091e5a301d/mvdr_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_44.yaml b/totrans/aud22_44.yaml index 85f1e5fab44235202573427e8677d5b560f92961..9d3d78e1277bef940d6085ca20bea70dd3d75119 100644 --- a/totrans/aud22_44.yaml +++ b/totrans/aud22_44.yaml @@ -1,282 +1,432 @@ - en: Music Source Separation with Hybrid Demucs + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用混合Demucs进行音乐源分离 - en: 原文:[https://pytorch.org/audio/stable/tutorials/hybrid_demucs_tutorial.html](https://pytorch.org/audio/stable/tutorials/hybrid_demucs_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/tutorials/hybrid_demucs_tutorial.html](https://pytorch.org/audio/stable/tutorials/hybrid_demucs_tutorial.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-hybrid-demucs-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-hybrid-demucs-tutorial-py)下载完整示例代码 - en: '**Author**: [Sean Kim](https://github.com/skim0514)' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '**作者**:[Sean Kim](https://github.com/skim0514)' - en: This tutorial shows how to use the Hybrid Demucs model in order to perform music separation + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了如何使用混合Demucs模型进行音乐分离 - en: 1\. Overview[](#overview "Permalink to this heading") + id: totrans-6 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 1\. 概述[](#overview "跳转到此标题的永久链接") - en: Performing music separation consists of the following steps + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 进行音乐分离包括以下步骤 - en: Build the Hybrid Demucs pipeline. + id: totrans-8 prefs: - PREF_OL type: TYPE_NORMAL + zh: 构建混合Demucs管道。 - en: Format the waveform into chunks of expected sizes, then loop through the chunks (with overlap) and feed them into the pipeline. + id: totrans-9 prefs: - PREF_OL type: TYPE_NORMAL + zh: 将波形格式化为预期大小的块,并循环遍历块(带有重叠),并将其馈送到管道中。 - en: Collect output chunks and combine according to the way they have been overlapped. + id: totrans-10 prefs: - PREF_OL type: TYPE_NORMAL + zh: 收集输出块并根据它们的重叠方式进行组合。 - en: The Hybrid Demucs [[Défossez, 2021](https://arxiv.org/abs/2111.03600)] model is a further developed version of the [Demucs](https://github.com/facebookresearch/demucs) model, a waveform-based model which separates music into its respective sources, such as vocals, bass, and drums. Hybrid Demucs effectively uses spectrograms to learn in the frequency domain while also using time-domain convolutions.
+ id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 混合Demucs[[Défossez, 2021](https://arxiv.org/abs/2111.03600)]模型是[Demucs](https://github.com/facebookresearch/demucs)模型的进化版本,这是一个基于波形的模型,将音乐分离为其各自的源,如人声、低音和鼓。混合Demucs有效地使用频谱图来学习频域,并且还移动到时间卷积。 - en: 2\. Preparation[](#preparation "Permalink to this heading") + id: totrans-12 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 2\. 准备工作[](#preparation "跳转到此标题的永久链接") - en: First, we install the necessary dependencies. The first requirement is `torchaudio` and `torch` + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: 首先,我们安装必要的依赖项。第一个要求是`torchaudio`和`torch` - en: '[PRE0]' + id: totrans-14 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-15 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: In addition to `torchaudio`, `mir_eval` is required to perform signal-to-distortion ratio (SDR) calculations. To install `mir_eval` please use `pip3 install mir_eval`. + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 除了`torchaudio`,还需要`mir_eval`来执行信号失真比(SDR)计算。要安装`mir_eval`,请使用`pip3 install mir_eval`。 - en: '[PRE2]' + id: totrans-17 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: 3\. Construct the pipeline[](#construct-the-pipeline "Permalink to this heading") + id: totrans-18 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 3\. 构建管道[](#construct-the-pipeline "跳转到此标题的永久链接") - en: Pre-trained model weights and related pipeline components are bundled as [`torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS()`](../generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS"). This is a [`torchaudio.models.HDemucs`](../generated/torchaudio.models.HDemucs.html#torchaudio.models.HDemucs "torchaudio.models.HDemucs") model trained on [MUSDB18-HQ](https://zenodo.org/record/3338373) and additional internal extra training data. This specific model is suited for higher sample rates, around 44.1 kHz, and has an nfft value of 4096 with a depth of 6 in the model implementation. + id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: 预训练模型权重和相关管道组件被捆绑为[`torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS()`](../generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS)。这是一个在[MUSDB18-HQ](https://zenodo.org/record/3338373)和额外的内部额外训练数据上训练的[`torchaudio.models.HDemucs`](../generated/torchaudio.models.HDemucs.html#torchaudio.models.HDemucs)模型。这个特定的模型适用于更高的采样率,大约为44.1 + kHz,并且在模型实现中具有4096的nfft值和6的深度。 - en: '[PRE3]' + id: totrans-20 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: '[PRE4]' + id: totrans-21 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: 4\. Configure the application function[](#configure-the-application-function "Permalink to this heading") + id: totrans-22 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 4\. 配置应用程序功能[](#configure-the-application-function "跳转到此标题的永久链接") - en: Because `HDemucs` is a large and memory-consuming model, it is very difficult to have sufficient memory to apply the model to an entire song at once. To work around this limitation, obtain the separated sources of a full song by chunking the song into smaller segments, running them through the model piece by piece, and then rearranging them back together. + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 由于`HDemucs`是一个占用大量内存的模型,很难有足够的内存一次性将模型应用于整首歌曲。为了解决这个限制,通过将歌曲分成较小的片段并逐段通过模型运行,然后重新排列在一起,获得完整歌曲的分离源。 - en: When doing this, it is important to ensure some overlap between each of the chunks, to accommodate artifacts at the edges.
Due to the nature of the model, sometimes the edges have inaccurate or undesired sounds included. + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: 在进行此操作时,重要的是确保每个块之间有一定的重叠,以适应边缘处的伪影。由于模型的性质,有时边缘会包含不准确或不希望的声音。 - en: We provide a sample implementation of chunking and arrangement below. This implementation takes an overlap of 1 second on each side, and then does a linear fade in and fade out on each side. Using the faded overlaps, we add these segments together to ensure a constant volume throughout. This accommodates the artifacts by using less of the edges of the model outputs. + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: 我们提供了一个分块和排列的示例实现。该实现在每一侧都有1秒的重叠,并在每一侧进行线性淡入和淡出。使用淡化的重叠,我们将这些段添加在一起,以确保整个过程中的音量恒定。通过使用模型输出的边缘较少的部分,可以适应伪影。 - en: '![https://download.pytorch.org/torchaudio/tutorial-assets/HDemucs_Drawing.jpg](../Images/6e9f5cb6983d007601e3ca05feb269f2.png)' + id: totrans-26 prefs: [] type: TYPE_IMG + zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/HDemucs_Drawing.jpg](../Images/6e9f5cb6983d007601e3ca05feb269f2.png)' - en: '[PRE5]' + id: totrans-27 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: 5\. Run Model[](#run-model "Permalink to this heading") + id: totrans-28 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 5\. 运行模型[](#run-model "跳转到此标题的永久链接") - en: Finally, we run the model and store the separate source files in a directory + id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: 最后,我们运行模型并将单独的源文件存储在一个目录中 - en: As a test song, we will be using A Classic Education by NightOwl from MedleyDB (Creative Commons BY-NC-SA 4.0). This is also located in the [MUSDB18-HQ](https://zenodo.org/record/3338373) dataset within the `train` sources. + id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: 作为测试歌曲,我们将使用MedleyDB中NightOwl演唱的A Classic Education(知识共享署名-非商业-相同方式共享4.0)。这也位于[MUSDB18-HQ](https://zenodo.org/record/3338373)数据集中的`train`来源中。 - en: In order to test with a different song, the variable names and URLs below can be changed alongside the parameters to test the song separator in different ways. + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: 为了测试不同歌曲,下面的变量名称和网址可以随着参数的改变而改变,以不同方式测试歌曲分离器。 - en: '[PRE6]' + id: totrans-32 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: '[PRE7]' + id: totrans-33 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: 5.1 Separate Track[](#separate-track "Permalink to this heading") + id: totrans-34 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 5.1 分离轨道[](#separate-track "跳转到此标题的永久链接") - en: 'The default set of pretrained weights that has been loaded separates the track into 4 sources: drums, bass, other, and vocals, in that order. They have been stored into the dict “audios” and can therefore be accessed there. For the four sources, there is a separate cell for each that will create the audio, the spectrogram graph, and also calculate the SDR score. SDR is the signal-to-distortion ratio, essentially a representation of the “quality” of an audio track.'
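For reference, a minimal sketch of the chunked, overlap-faded application described in section 4 (names are illustrative; the tutorial's own implementation is in `[PRE5]`):

```python
import torch
from torchaudio.transforms import Fade

def separate_sources(model, mix, sample_rate, segment=10.0, overlap=0.1, device="cpu"):
    """Apply `model` to a long track by chunking with linearly faded overlaps."""
    batch, channels, length = mix.shape
    chunk_len = int(sample_rate * segment * (1 + overlap))
    overlap_frames = int(overlap * sample_rate)
    fade = Fade(fade_in_len=0, fade_out_len=overlap_frames, fade_shape="linear")

    final = torch.zeros(batch, len(model.sources), channels, length, device=device)
    start, end = 0, chunk_len
    while start < length - overlap_frames:
        chunk = mix[:, :, start:end]
        with torch.no_grad():
            out = model(chunk.to(device))
        out = fade(out)
        final[:, :, :, start:end] += out
        if start == 0:
            fade.fade_in_len = overlap_frames  # fade in from the second chunk on
            start += chunk_len - overlap_frames
        else:
            start += chunk_len
        end += chunk_len
        if end >= length:
            fade.fade_out_len = 0  # no fade out on the final chunk
    return final
```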
+ id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 已加载的默认预训练权重集将其分为4个来源:鼓、低音、其他和人声,按顺序存储在字典“audios”中,因此可以在那里访问。对于这四个来源,每个都有一个单独的单元格,将创建音频、频谱图并计算SDR分数。SDR是信号失真比,本质上是音频轨道“质量”的表示。 - en: '[PRE8]' + id: totrans-36 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: 5.2 Audio Segmenting and Processing[](#audio-segmenting-and-processing "Permalink to this heading") + id: totrans-37 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 5.2 音频分段和处理[](#audio-segmenting-and-processing "跳转到此标题的永久链接") - en: Below are the processing steps for segmenting 5 seconds of the tracks in order to feed into the spectrogram and to calculate the respective SDR scores. + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: 以下是处理步骤和将曲目分段为5秒以供输入频谱图和计算相应SDR分数。 - en: '[PRE9]' + id: totrans-39 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: '[PRE10]' + id: totrans-40 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: 5.3 Spectrograms and Audio[](#spectrograms-and-audio "Permalink to this heading") + id: totrans-41 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 5.3 频谱图和音频[](#spectrograms-and-audio "跳转到此标题的永久链接") - en: In the next 5 cells, you can see the spectrograms with the respective audios. The audios can be clearly visualized using the spectrogram. + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 在接下来的5个单元格中,您可以看到具有相应音频的频谱图。可以使用频谱图清晰地可视化音频。 - en: The mixture clip comes from the original track, and the remaining tracks are the model output + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 混音片段来自原始曲目,其余曲目是模型输出 - en: '[PRE11]' + id: totrans-44 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: '![Spectrogram - Mixture](../Images/903b3c24f56fbd7403f05bea8407e0e1.png)' + id: totrans-45 prefs: [] type: TYPE_IMG + zh: '![频谱图 - 混音](../Images/903b3c24f56fbd7403f05bea8407e0e1.png)' - en: null + id: totrans-46 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Drums SDR, Spectrogram, and Audio + id: totrans-48 prefs: [] type: TYPE_NORMAL + zh: 鼓SDR、频谱图和音频 - en: '[PRE12]' + id: totrans-49 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: '![Spectrogram - drums](../Images/7450ec09b4ac750ff99261d0424d93ad.png)' + id: totrans-50 prefs: [] type: TYPE_IMG + zh: '![频谱图 - 鼓](../Images/7450ec09b4ac750ff99261d0424d93ad.png)' - en: '[PRE13]' + id: totrans-51 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: null + id: totrans-52 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Bass SDR, Spectrogram, and Audio + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 低音SDR、频谱图和音频 - en: '[PRE14]' + id: totrans-55 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: '![Spectrogram - bass](../Images/285241ea561b69a0ed0bde347e065a4b.png)' + id: totrans-56 prefs: [] type: TYPE_IMG + zh: '![频谱图 - 低音](../Images/285241ea561b69a0ed0bde347e065a4b.png)' - en: '[PRE15]' + id: totrans-57 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: null + id: totrans-58 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element.
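Since SDR is central to the comparison here, a minimal sketch of computing it with `mir_eval` (hypothetical toy arrays; the tutorial's own scoring lives in its `[PRE*]` cells):

```python
import numpy as np
import mir_eval

# Hypothetical 1-D arrays: a ground-truth stem and a model estimate.
rng = np.random.default_rng(0)
reference = rng.standard_normal(44100 * 5)
estimate = reference + 0.05 * rng.standard_normal(44100 * 5)

sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(
    reference[np.newaxis, :], estimate[np.newaxis, :]
)
print(f"SDR: {sdr[0]:.2f} dB")
```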
+ id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Vocals SDR, Spectrogram, and Audio + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 人声SDR、频谱图和音频 - en: '[PRE16]' + id: totrans-61 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: '![Spectrogram - vocals](../Images/74dae62c693575aeeec23910e8c6a8d2.png)' + id: totrans-62 prefs: [] type: TYPE_IMG + zh: '![频谱图 - 人声](../Images/74dae62c693575aeeec23910e8c6a8d2.png)' - en: '[PRE17]' + id: totrans-63 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE17]' +- en: null + id: totrans-64 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Other SDR, Spectrogram, and Audio + id: totrans-66 prefs: [] type: TYPE_NORMAL + zh: 其他SDR、频谱图和音频 - en: '[PRE18]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: '![Spectrogram - other](../Images/560e3f4afb279402f99bd339b3fdc12d.png)' + id: totrans-68 prefs: [] type: TYPE_IMG + zh: '![频谱图 - 其他](../Images/560e3f4afb279402f99bd339b3fdc12d.png)' - en: '[PRE19]' + id: totrans-69 prefs: [] type: TYPE_PRE -- en: + zh: '[PRE19]' +- en: null + id: totrans-70 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: '[PRE20]' + id: totrans-72 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: '**Total running time of the script:** ( 0 minutes 22.977 seconds)' + id: totrans-73 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:**(0分钟22.977秒)' - en: '[`Download Python source code: hybrid_demucs_tutorial.py`](../_downloads/d7783185e54fb77cb13eb7133fa130a3/hybrid_demucs_tutorial.py)' + id: totrans-74 prefs: [] type: TYPE_NORMAL + zh: '[`下载Python源代码:hybrid_demucs_tutorial.py`](../_downloads/d7783185e54fb77cb13eb7133fa130a3/hybrid_demucs_tutorial.py)' - en: '[`Download Jupyter notebook: hybrid_demucs_tutorial.ipynb`](../_downloads/c9521dfc1feb227de7d892f0131bbc95/hybrid_demucs_tutorial.ipynb)' + id: totrans-75 prefs: [] type: TYPE_NORMAL + zh: '[`下载Jupyter笔记本:hybrid_demucs_tutorial.ipynb`](../_downloads/c9521dfc1feb227de7d892f0131bbc95/hybrid_demucs_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery生成的画廊](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_45.yaml b/totrans/aud22_45.yaml index b59e889724c8fac54421476d87e60d36c7072556..6a57b471c7470be2bca699848607835f0be3aa16 100644 --- a/totrans/aud22_45.yaml +++ b/totrans/aud22_45.yaml @@ -1,340 +1,536 @@ - en: 'Torchaudio-Squim: Non-intrusive Speech Assessment in TorchAudio' + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: Torchaudio-Squim:TorchAudio中的非侵入式语音评估 - en: 原文:[https://pytorch.org/audio/stable/tutorials/squim_tutorial.html](https://pytorch.org/audio/stable/tutorials/squim_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 链接:[https://pytorch.org/audio/stable/tutorials/squim_tutorial.html](https://pytorch.org/audio/stable/tutorials/squim_tutorial.html) - en: Note + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Click [here](#sphx-glr-download-tutorials-squim-tutorial-py) to download the full example code + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 点击[这里](#sphx-glr-download-tutorials-squim-tutorial-py)下载完整示例代码 - en: 'Author: [Anurag Kumar](mailto:anuragkr90%40meta.com), [Zhaoheng Ni](mailto:zni%40meta.com)' + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 作者:[Anurag 
Kumar](mailto:anuragkr90%40meta.com),[Zhaoheng Ni](mailto:zni%40meta.com) - en: 1\. Overview[](#overview "Permalink to this heading") + id: totrans-5 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 1\. 概述[](#overview "跳转到此标题的永久链接") - en: This tutorial shows how to use Torchaudio-Squim to estimate objective and subjective metrics for assessment of speech quality and intelligibility. + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 本教程展示了使用Torchaudio-Squim来估计语音质量和可懂度的客观和主观度量的用法。 - en: 'TorchAudio-Squim enables speech assessment in Torchaudio. It provides an interface and pre-trained models to estimate various speech quality and intelligibility metrics. Currently, Torchaudio-Squim [1] supports reference-free estimation of 3 widely used objective metrics:' + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: TorchAudio-Squim使得在TorchAudio中进行语音评估成为可能。它提供接口和预训练模型来估计各种语音质量和可懂度度量。目前,Torchaudio-Squim + [1]支持无参考估计3种广泛使用的客观度量: - en: Wideband Perceptual Estimation of Speech Quality (PESQ) [2] + id: totrans-8 prefs: - PREF_UL type: TYPE_NORMAL + zh: 宽带感知语音质量估计(PESQ)[2] - en: Short-Time Objective Intelligibility (STOI) [3] + id: totrans-9 prefs: - PREF_UL type: TYPE_NORMAL + zh: 短时客观可懂度(STOI)[3] - en: Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) [4] + id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL + zh: 标度不变信号失真比(SI-SDR)[4] - en: It also supports estimation of subjective Mean Opinion Score (MOS) for a given audio waveform using Non-Matching References [1, 5]. + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 它还支持使用非匹配参考[1, 5]对给定音频波形进行主观平均意见分数(MOS)的估计。 - en: '**References**' + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: '**参考文献**' - en: '[1] Kumar, Anurag, et al. “TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio.” ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023.' + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: '[1] Kumar, Anurag, et al.“TorchAudio-Squim:TorchAudio中的无参考语音质量和可懂度度量。”ICASSP + 2023-2023 IEEE国际声学、语音和信号处理会议(ICASSP)。IEEE,2023年。' - en: '[2] I. Rec, “P.862.2: Wideband extension to recommendation P.862 for the assessment of wideband telephone networks and speech codecs,” International Telecommunication Union, CH–Geneva, 2005.' + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: '[2] I. Rec,“P.862.2:推荐P.862的宽带扩展,用于评估宽带电话网络和语音编解码器”,国际电信联盟,瑞士日内瓦,2005年。' - en: '[3] Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2010, March). A short-time objective intelligibility measure for time-frequency weighted noisy speech. In 2010 IEEE international conference on acoustics, speech and signal processing (pp. 4214-4217). IEEE.' + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: '[3] Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J.(2010年3月)。一种用于时频加权嘈杂语音的短时客观可懂度测量。在2010年IEEE国际声学、语音和信号处理会议上(第4214-4217页)。IEEE。' - en: '[4] Le Roux, Jonathan, et al. “SDR–half-baked or well done?.” ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '[4] Le Roux, Jonathan, et al.“SDR–半成品还是成品?。”ICASSP 2019-2019 IEEE国际声学、语音和信号处理会议(ICASSP)。IEEE,2019年。' - en: '[5] Manocha, Pranay, and Anurag Kumar. “Speech quality assessment through MOS using non-matching references.” Interspeech, 2022.' + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: '[5] Manocha, Pranay, and Anurag Kumar.
“使用非匹配参考进行MOS的语音质量评估。” Interspeech,2022年。' - en: '[PRE0]' + id: totrans-18 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-19 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: 2\. Preparation[](#preparation "Permalink to this heading") + id: totrans-20 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 2\. 准备工作[](#preparation "跳转到此标题的永久链接") - en: First import the modules and define the helper functions. + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 首先导入模块并定义辅助函数。 - en: We will need torch and torchaudio to use Torchaudio-Squim, Matplotlib to plot data, and pystoi and pesq for computing reference metrics. + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 我们需要使用torch、torchaudio来使用Torchaudio-squim,使用Matplotlib来绘制数据,使用pystoi、pesq来计算参考度量。 - en: '[PRE2]' + id: totrans-23 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: '[PRE3]' + id: totrans-24 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: 3\. Load Speech and Noise Sample[](#load-speech-and-noise-sample "Permalink to this heading") + id: totrans-25 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 3\. 加载语音和噪声样本[](#load-speech-and-noise-sample "跳转到此标题的永久链接") - en: '[PRE4]' + id: totrans-26 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: '[PRE5]' + id: totrans-27 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: '[PRE6]' + id: totrans-28 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: Currently, the Torchaudio-Squim model only supports a 16000 Hz sampling rate. Resample the waveforms if necessary. + id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: 目前,Torchaudio-Squim模型仅支持16000 Hz的采样率。如有必要,请重新采样波形。 - en: '[PRE7]' + id: totrans-30 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: Trim waveforms so that they have the same number of frames. + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: 修剪波形,使其具有相同数量的帧。 - en: '[PRE8]' + id: totrans-32 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: Play speech sample + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 播放语音样本 - en: '[PRE9]' + id: totrans-34 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: null + id: totrans-35 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Play noise sample + id: totrans-37 prefs: [] type: TYPE_NORMAL + zh: 播放噪声样本 - en: '[PRE10]' + id: totrans-38 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: null + id: totrans-39 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 4\. Create distorted (noisy) speech samples[](#create-distorted-noisy-speech-samples "Permalink to this heading") + id: totrans-41 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 4\. 创建失真(嘈杂)语音样本[](#create-distorted-noisy-speech-samples "跳转到此标题的永久链接") - en: '[PRE11]' + id: totrans-42 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: Play distorted speech with 20dB SNR + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 播放信噪比为20dB的失真语音 - en: '[PRE12]' + id: totrans-44 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: null + id: totrans-45 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element. + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: Play distorted speech with -5dB SNR + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 播放信噪比为-5dB的失真语音 - en: '[PRE13]' + id: totrans-48 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: null + id: totrans-49 prefs: [] type: TYPE_NORMAL - en: Your browser does not support the audio element.
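One way to create such fixed-SNR mixtures is `torchaudio.functional.add_noise`, sketched below with hypothetical stand-in tensors (the tutorial's own mixing code is in `[PRE11]`):

```python
import torch
import torchaudio.functional as F

# Stand-ins for the resampled, trimmed waveforms, shape (1, num_frames).
waveform_speech = torch.randn(1, 16000)
waveform_noise = torch.randn(1, 16000)

snr_dbs = torch.tensor([20.0, -5.0])
# Broadcasting yields one mixture per requested SNR: shape (2, num_frames).
waveform_distorted = F.add_noise(waveform_speech, waveform_noise, snr_dbs)
```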
+ id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: 您的浏览器不支持音频元素。 - en: 5\. Visualize the waveforms[](#visualize-the-waveforms "Permalink to this heading") + id: totrans-51 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 5\. 可视化波形[](#visualize-the-waveforms "跳转到此标题的永久链接") - en: Visualize speech sample + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: 可视化语音样本 - en: '[PRE14]' + id: totrans-53 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: '![Clean Speech](../Images/1a1bff08b20cbd26590cca35888077e0.png)' + id: totrans-54 prefs: [] type: TYPE_IMG + zh: '![干净语音](../Images/1a1bff08b20cbd26590cca35888077e0.png)' - en: Visualize noise sample + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: 可视化噪声样本 - en: '[PRE15]' + id: totrans-56 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: '![Noise](../Images/cf255965754a39d367fd8c4d7cc5021b.png)' + id: totrans-57 prefs: [] type: TYPE_IMG + zh: '![噪声](../Images/cf255965754a39d367fd8c4d7cc5021b.png)' - en: Visualize distorted speech with 20dB SNR + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 可视化信噪比为20dB的失真语音 - en: '[PRE16]' + id: totrans-59 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: '![Distorted Speech with 20dB SNR](../Images/c0d401a9d6195aa0fd526c9507f14b87.png)' + id: totrans-60 prefs: [] type: TYPE_IMG + zh: '![信噪比为20dB的失真语音](../Images/c0d401a9d6195aa0fd526c9507f14b87.png)' - en: Visualize distorted speech with -5dB SNR + id: totrans-61 prefs: [] type: TYPE_NORMAL + zh: 可视化信噪比为-5dB的失真语音 - en: '[PRE17]' + id: totrans-62 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: '![Distorted Speech with -5dB SNR](../Images/f71c3c56743ba5d56f22d8064d2e12ef.png)' + id: totrans-63 prefs: [] type: TYPE_IMG + zh: '![信噪比为-5dB的失真语音](../Images/f71c3c56743ba5d56f22d8064d2e12ef.png)' - en: 6\. Predict Objective Metrics[](#predict-objective-metrics "Permalink to this heading") + id: totrans-64 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 6\. 预测客观度量[](#predict-objective-metrics "跳转到此标题的永久链接") - en: Get the pre-trained `SquimObjective` model. + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: 获取预训练的`SquimObjective`模型。 - en: '[PRE18]' + id: totrans-66 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: '[PRE19]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: Compare model outputs with ground truths for distorted speech with 20dB SNR + id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 比较模型输出和信噪比为20dB的失真语音的真实值 - en: '[PRE20]' + id: totrans-69 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: '[PRE21]' + id: totrans-70 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: Compare model outputs with ground truths for distorted speech with -5dB SNR + id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: 比较模型输出和信噪比为-5dB的失真语音的真实值 - en: '[PRE22]' + id: totrans-72 prefs: [] type: TYPE_PRE + zh: '[PRE22]' - en: '[PRE23]' + id: totrans-73 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: 7\. Predict Mean Opinion Scores (Subjective) Metric[](#predict-mean-opinion-scores-subjective-metric "Permalink to this heading") + id: totrans-74 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 7\. 预测主观平均意见分数(MOS)度量[](#predict-mean-opinion-scores-subjective-metric "跳转到此标题的永久链接") - en: Get the pre-trained `SquimSubjective` model.
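A minimal sketch of the `SquimObjective` usage described in section 6 (the input tensor is a hypothetical stand-in for the 16 kHz distorted clip):

```python
import torch
from torchaudio.pipelines import SQUIM_OBJECTIVE

objective_model = SQUIM_OBJECTIVE.get_model()

waveform_distorted = torch.randn(1, 16000)  # stand-in, (1, num_frames) at 16 kHz
with torch.no_grad():
    stoi_hyp, pesq_hyp, si_sdr_hyp = objective_model(waveform_distorted)
print(stoi_hyp[0], pesq_hyp[0], si_sdr_hyp[0])
```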
+ id: totrans-75 prefs: [] type: TYPE_NORMAL + zh: 获取预训练的`SquimSubjective`模型。 - en: '[PRE24]' + id: totrans-76 prefs: [] type: TYPE_PRE + zh: '[PRE24]' - en: '[PRE25]' + id: totrans-77 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: Load a non-matching reference (NMR) + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: 加载一个不匹配的参考(NMR) - en: '[PRE26]' + id: totrans-79 prefs: [] type: TYPE_PRE + zh: '[PRE26]' - en: Compute MOS metric for distorted speech with 20dB SNR + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: 计算信噪比为20dB的失真语音的MOS指标 - en: '[PRE27]' + id: totrans-81 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: '[PRE28]' + id: totrans-82 prefs: [] type: TYPE_PRE + zh: '[PRE28]' - en: Compute MOS metric for distorted speech with -5dB SNR + id: totrans-83 prefs: [] type: TYPE_NORMAL + zh: 计算信噪比为-5dB的失真语音的MOS指标 - en: '[PRE29]' + id: totrans-84 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '[PRE30]' + id: totrans-85 prefs: [] type: TYPE_PRE + zh: '[PRE30]' - en: 8\. Comparison with ground truths and baselines[](#comparison-with-ground-truths-and-baselines "Permalink to this heading") + id: totrans-86 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 8. 与基准和基线的比较[](#comparison-with-ground-truths-and-baselines "跳转到此标题的永久链接") - en: 'Visualizing the estimated metrics by the `SquimObjective` and `SquimSubjective` models can help users better understand how the models can be applied in real scenarios. The graph below shows scatter plots of three different systems: MOSA-Net [1], AMSA [2], and the `SquimObjective` model, where the y axis represents the estimated STOI, PESQ, and Si-SDR scores, and the x axis represents the corresponding ground truth.' + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: 通过可视化`SquimObjective`和`SquimSubjective`模型估计的指标,可以帮助用户更好地理解这些模型在实际场景中的应用。下面的图表显示了三种不同系统的散点图:MOSA-Net + [1]、AMSA [2] 和`SquimObjective`模型,其中y轴表示估计的STOI、PESQ和Si-SDR分数,x轴表示相应的基准。 - en: '[![https://download.pytorch.org/torchaudio/tutorial-assets/objective_plot.png](../Images/beab25bd56b59ea05c29a2fee467b3a7.png)](https://download.pytorch.org/torchaudio/tutorial-assets/objective_plot.png)' + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/tutorial-assets/objective_plot.png](../Images/beab25bd56b59ea05c29a2fee467b3a7.png)](https://download.pytorch.org/torchaudio/tutorial-assets/objective_plot.png)' - en: '[1] Zezario, Ryandhimas E., Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, and Yu Tsao. “Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2022): 54-70.' + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: '[1] Zezario, Ryandhimas E., Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min + Wang, and Yu Tsao. “基于深度学习的非侵入式多目标语音评估模型与跨领域特征。”IEEE/ACM Transactions on Audio, + Speech, and Language Processing 31 (2022): 54-70.' - en: '[2] Dong, Xuan, and Donald S. Williamson. “An attention enhanced multi-task model for objective speech assessment in real-world environments.” In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 911-915\. IEEE, 2020.' + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: '[2] Dong, Xuan, and Donald S. Williamson. “一种增强注意力的多任务模型,用于实际环境中的客观语音评估。”在ICASSP + 2020-2020 IEEE国际声学、语音和信号处理会议(ICASSP)中,第 911-915 页。IEEE, 2020.'
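Correspondingly, a minimal sketch of the `SquimSubjective` usage from section 7 (both tensors are hypothetical stand-ins; the NMR clip need not match the test utterance):

```python
import torch
from torchaudio.pipelines import SQUIM_SUBJECTIVE

subjective_model = SQUIM_SUBJECTIVE.get_model()

waveform_distorted = torch.randn(1, 16000)  # stand-in test clip at 16 kHz
waveform_nmr = torch.randn(1, 16000)        # stand-in non-matching reference
with torch.no_grad():
    mos = subjective_model(waveform_distorted, waveform_nmr)
print(f"Estimated MOS: {float(mos[0]):.3f}")
```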
- en: The graph below shows a scatter plot of the `SquimSubjective` model, where the y axis represents the estimated MOS metric score, and the x axis represents the corresponding ground truth. + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: 下面的图表显示了`SquimSubjective`模型的散点图,其中y轴表示估计的MOS指标分数,x轴表示相应的基准。 - en: '[![https://download.pytorch.org/torchaudio/tutorial-assets/subjective_plot.png](../Images/6f51f5b6f641de3a35830b1d0e9a0d57.png)](https://download.pytorch.org/torchaudio/tutorial-assets/subjective_plot.png)' + id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: '[![https://download.pytorch.org/torchaudio/tutorial-assets/subjective_plot.png](../Images/6f51f5b6f641de3a35830b1d0e9a0d57.png)](https://download.pytorch.org/torchaudio/tutorial-assets/subjective_plot.png)' - en: '**Total running time of the script:** ( 0 minutes 6.527 seconds)' + id: totrans-93 prefs: [] type: TYPE_NORMAL + zh: '**脚本的总运行时间:** (0 分钟 6.527 秒)' - en: '[`Download Python source code: squim_tutorial.py`](../_downloads/c943e35bc7cad6e8d9b1df2a7034a8fc/squim_tutorial.py)' + id: totrans-94 prefs: [] type: TYPE_NORMAL + zh: '[`下载 Python 源代码:squim_tutorial.py`](../_downloads/c943e35bc7cad6e8d9b1df2a7034a8fc/squim_tutorial.py)' - en: '[`Download Jupyter notebook: squim_tutorial.ipynb`](../_downloads/242b4f86f5d51a9a90d3080d8ce32681/squim_tutorial.ipynb)' + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: '[`下载 Jupyter 笔记本:squim_tutorial.ipynb`](../_downloads/242b4f86f5d51a9a90d3080d8ce32681/squim_tutorial.ipynb)' - en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)' + id: totrans-96 prefs: [] type: TYPE_NORMAL + zh: '[Sphinx-Gallery 生成的图库](https://sphinx-gallery.github.io)' diff --git a/totrans/aud22_46.yaml b/totrans/aud22_46.yaml index 3ec29165f803e88e12618de37eea87fa1b61b77e..18127ee67d810773ce2fe822f51b39f1a7a50308 100644 --- a/totrans/aud22_46.yaml +++ b/totrans/aud22_46.yaml @@ -1,4 +1,6 @@ - en: Training Recipes + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 训练食谱 diff --git a/totrans/aud22_47.yaml b/totrans/aud22_47.yaml index 29cbb3f66b1041af269a143841857f00cc6cd7e2..214fcc30408c6fa67dbbab48810d77a28b85f707 100644 --- a/totrans/aud22_47.yaml +++ b/totrans/aud22_47.yaml @@ -1,4 +1,6 @@ - en: Python API Reference + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: Python API 参考文档 diff --git a/totrans/aud22_48.yaml b/totrans/aud22_48.yaml index 6376633b7e4eb6c865e79f92ad28ef96dd70db56..3d5a0074c539849b6af75a244d5e8a82a944c5e5 100644 --- a/totrans/aud22_48.yaml +++ b/totrans/aud22_48.yaml @@ -1,120 +1,186 @@ - en: torchaudio + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio - en: 原文:[https://pytorch.org/audio/stable/torchaudio.html](https://pytorch.org/audio/stable/torchaudio.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/torchaudio.html](https://pytorch.org/audio/stable/torchaudio.html) - en: I/O[](#i-o "Permalink to this heading") + id: totrans-2 prefs: - PREF_H2 type: TYPE_NORMAL + zh: I/O[](#i-o "Permalink to this heading") - en: 'The `torchaudio` top-level module provides the following functions that make it easy to handle audio data.' + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: '`torchaudio`顶级模块提供了以下函数,使处理音频数据变得容易。' - en: '| [`info`](generated/torchaudio.info.html#torchaudio.info "torchaudio.info") | Get signal information of an audio file.
|' + id: totrans-4 prefs: [] type: TYPE_TB + zh: '| [`info`](generated/torchaudio.info.html#torchaudio.info "torchaudio.info") + | 获取音频文件的信号信息。 |' - en: '| [`load`](generated/torchaudio.load.html#torchaudio.load "torchaudio.load") | Load audio data from source. |' + id: totrans-5 prefs: [] type: TYPE_TB + zh: '| [`load`](generated/torchaudio.load.html#torchaudio.load "torchaudio.load") + | 从源加载音频数据。 |' - en: '| [`save`](generated/torchaudio.save.html#torchaudio.save "torchaudio.save") | Save audio data to file. |' + id: totrans-6 prefs: [] type: TYPE_TB + zh: '| [`save`](generated/torchaudio.save.html#torchaudio.save "torchaudio.save") + | 将音频数据保存到文件。 |' - en: '| [`list_audio_backends`](generated/torchaudio.list_audio_backends.html#torchaudio.list_audio_backends "torchaudio.list_audio_backends") | List available backends |' + id: totrans-7 prefs: [] type: TYPE_TB + zh: '| [`list_audio_backends`](generated/torchaudio.list_audio_backends.html#torchaudio.list_audio_backends "torchaudio.list_audio_backends") | 列出可用的后端 |' - en: '## Backend and Dispatcher[](#backend-and-dispatcher "Permalink to this heading")' + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: '## 后端和调度程序[](#backend-and-dispatcher "Permalink to this heading")' - en: Decoding and encoding media is a highly elaborate process. Therefore, TorchAudio relies on third party libraries to perform these operations. These third party libraries are called `backend`, and currently TorchAudio integrates the following libraries. + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: 解码和编码媒体是一个非常复杂的过程。因此,TorchAudio依赖于第三方库来执行这些操作。这些第三方库称为`backend`,目前TorchAudio集成了以下库。 - en: Please refer to [Installation](./installation.html) for how to enable backends. + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: 有关如何启用后端,请参阅[安装](./installation.html)。 - en: Conventionally, TorchAudio has had its I/O backend set globally at runtime based on availability. However, this approach does not allow applications to use different backends, and it is not well-suited for large codebases. + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 传统上,TorchAudio在运行时全局设置其I/O后端,基于可用性。然而,这种方法不允许应用程序使用不同的后端,并且不适用于大型代码库。 - en: For these reasons, in v2.0, we introduced a dispatcher, a new mechanism to allow users to choose a backend for each function call. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 出于这些原因,在v2.0中,我们引入了调度程序,一种允许用户为每个函数调用选择后端的新机制。 - en: When dispatcher mode is enabled, all the I/O functions accept an extra keyword argument `backend`, which specifies the desired backend. If the specified backend is not available, the function call will fail. + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: 当启用调度程序模式时,所有I/O函数都接受额外的关键字参数`backend`,指定所需的后端。如果指定的后端不可用,函数调用将失败。 - en: If a backend is not explicitly chosen, the functions will select a backend to use given the order of precedence and library availability. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 如果没有明确选择后端,函数将根据优先顺序和库可用性选择要使用的后端。 - en: The following table summarizes the backends.
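Before the summary table below, a minimal sketch of per-call backend selection with the dispatcher (the file name is hypothetical, and the `backend` argument requires that backend to be installed):

```python
import torchaudio

print(torchaudio.list_audio_backends())  # e.g. ['ffmpeg', 'soundfile']

# Explicitly request FFmpeg for this one call; omitting `backend`
# falls back to the precedence order summarized below.
waveform, sample_rate = torchaudio.load("speech.wav", backend="ffmpeg")
```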
+ id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 下表总结了后端。 - en: '| Priority | Backend | Supported OS | Note |' + id: totrans-16 prefs: [] type: TYPE_TB + zh: '| 优先级 | 后端 | 支持的操作系统 | 备注 |' - en: '| --- | --- | --- | --- |' + id: totrans-17 prefs: [] type: TYPE_TB + zh: '| --- | --- | --- | --- |' - en: '| 1 | FFmpeg | Linux, macOS, Windows | Use [`get_audio_decoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_audio_decoders "torchaudio.utils.ffmpeg_utils.get_audio_decoders") and [`get_audio_encoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_audio_encoders "torchaudio.utils.ffmpeg_utils.get_audio_encoders") to retrieve the supported codecs. This backend supports various protocols, such as HTTPS and MP4, and file-like objects. |' + id: totrans-18 prefs: [] type: TYPE_TB + zh: '| 1 | FFmpeg | Linux, macOS, Windows | 使用[`get_audio_decoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_audio_decoders + "torchaudio.utils.ffmpeg_utils.get_audio_decoders")和[`get_audio_encoders()`](generated/torchaudio.utils.ffmpeg_utils.html#torchaudio.utils.ffmpeg_utils.get_audio_encoders + "torchaudio.utils.ffmpeg_utils.get_audio_encoders")来检索支持的编解码器。此后端支持各种协议,如HTTPS和MP4,以及类似文件的对象。 + |' - en: '| 2 | SoX | Linux, macOS | Use [`list_read_formats()`](generated/torchaudio.utils.sox_utils.html#torchaudio.utils.sox_utils.list_read_formats "torchaudio.utils.sox_utils.list_read_formats") and [`list_write_formats()`](generated/torchaudio.utils.sox_utils.html#torchaudio.utils.sox_utils.list_write_formats "torchaudio.utils.sox_utils.list_write_formats") to retrieve the supported codecs. This backend does *not* support file-like objects. |' + id: totrans-19 prefs: [] type: TYPE_TB + zh: '| 2 | SoX | Linux, macOS | 使用[`list_read_formats()`](generated/torchaudio.utils.sox_utils.html#torchaudio.utils.sox_utils.list_read_formats + "torchaudio.utils.sox_utils.list_read_formats")和[`list_write_formats()`](generated/torchaudio.utils.sox_utils.html#torchaudio.utils.sox_utils.list_write_formats + "torchaudio.utils.sox_utils.list_write_formats")来检索支持的编解码器。此后端*不*支持类似文件的对象。 |' - en: '| 3 | SoundFile | Linux, macOS, Windows | Please refer to [the official document](https://pysoundfile.readthedocs.io/) for the supported codecs. This backend supports file-like objects. |' + id: totrans-20 prefs: [] type: TYPE_TB + zh: '| 3 | SoundFile | Linux, macOS, Windows | 请参阅[官方文档](https://pysoundfile.readthedocs.io/)以获取支持的编解码器。此后端支持类似文件的对象。 + |' - en: '### Dispatcher Migration[](#dispatcher-migration "Permalink to this heading")' + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: '### 调度程序迁移[](#dispatcher-migration "Permalink to this heading")' - en: We are migrating the I/O functions to use the dispatcher mechanism, and this incurs multiple changes, some of which involve backward-compatibility-breaking changes, and require users to change their function calls. + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 我们正在将I/O函数迁移到使用调度程序机制,这会导致多个更改,其中一些涉及向后不兼容的更改,并要求用户更改其函数调用。 - en: The (planned) changes are as follows. For up-to-date information, please refer to [https://github.com/pytorch/audio/issues/2950](https://github.com/pytorch/audio/issues/2950) + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: (计划中的)更改如下。有关最新信息,请参阅[https://github.com/pytorch/audio/issues/2950](https://github.com/pytorch/audio/issues/2950) - en: In 2.0, audio I/O backend dispatcher was introduced.
Users can opt in to using the dispatcher by setting the environment variable `TORCHAUDIO_USE_BACKEND_DISPATCHER=1`. + id: totrans-24 prefs: - PREF_UL type: TYPE_NORMAL + zh: 在2.0中,引入了音频I/O后端调度程序。用户可以通过设置环境变量`TORCHAUDIO_USE_BACKEND_DISPATCHER=1`选择使用调度程序。 - en: In 2.1, the dispatcher became the default mechanism for I/O. + id: totrans-25 prefs: - PREF_UL type: TYPE_NORMAL + zh: 在2.1中,调度程序成为I/O的默认机制。 - en: In 2.2, the legacy global backend mechanism is removed. Utility functions `get_audio_backend()` and `set_audio_backend()` became no-op. + id: totrans-26 prefs: - PREF_UL type: TYPE_NORMAL + zh: 在2.2中,传统的全局后端机制被移除。实用函数`get_audio_backend()`和`set_audio_backend()`变为无操作。 - en: Furthermore, we removed file-like object support from the libsox backend, as this is better supported by the FFmpeg backend and makes the build process simpler. Therefore, beginning with 2.1, FFmpeg and Soundfile are the sole backends that support file-like objects. + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 此外,我们从libsox后端中移除了对类似文件对象的支持,因为这在FFmpeg后端中得到了更好的支持,并且使构建过程更简单。因此,从2.1版本开始,FFmpeg和Soundfile是唯一支持类似文件对象的后端。 diff --git a/totrans/aud22_49.yaml b/totrans/aud22_49.yaml index 76306f48f58c8f6689eeaff6f14d30d9693ea5a2..9e90e394e91b14027fbbe86849f9c29170fd2248 100644 --- a/totrans/aud22_49.yaml +++ b/totrans/aud22_49.yaml @@ -1,94 +1,156 @@ - en: torchaudio.io + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.io - en: 原文:[https://pytorch.org/audio/stable/io.html](https://pytorch.org/audio/stable/io.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: '[https://pytorch.org/audio/stable/io.html](https://pytorch.org/audio/stable/io.html)' - en: '| [`StreamReader`](generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader "torchaudio.io.StreamReader") | alias of [`StreamingMediaDecoder`](generated/torio.io.StreamingMediaDecoder.html#torio.io.StreamingMediaDecoder "torio.io._streaming_media_decoder.StreamingMediaDecoder") |' + id: totrans-2 prefs: [] type: TYPE_TB + zh: '| [`StreamReader`](generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader + "torchaudio.io.StreamReader") | 别名为[`StreamingMediaDecoder`](generated/torio.io.StreamingMediaDecoder.html#torio.io.StreamingMediaDecoder + "torio.io._streaming_media_decoder.StreamingMediaDecoder") |' - en: '| [`StreamWriter`](generated/torchaudio.io.StreamWriter.html#torchaudio.io.StreamWriter "torchaudio.io.StreamWriter") | alias of [`StreamingMediaEncoder`](generated/torio.io.StreamingMediaEncoder.html#torio.io.StreamingMediaEncoder "torio.io._streaming_media_encoder.StreamingMediaEncoder") |' + id: totrans-3 prefs: [] type: TYPE_TB + zh: '| [`StreamWriter`](generated/torchaudio.io.StreamWriter.html#torchaudio.io.StreamWriter + "torchaudio.io.StreamWriter") | 别名为[`StreamingMediaEncoder`](generated/torio.io.StreamingMediaEncoder.html#torio.io.StreamingMediaEncoder + "torio.io._streaming_media_encoder.StreamingMediaEncoder") |' - en: '| [`AudioEffector`](generated/torchaudio.io.AudioEffector.html#torchaudio.io.AudioEffector "torchaudio.io.AudioEffector") | Apply various filters and/or codecs to waveforms. |' + id: totrans-4 prefs: [] type: TYPE_TB + zh: '| [`AudioEffector`](generated/torchaudio.io.AudioEffector.html#torchaudio.io.AudioEffector + "torchaudio.io.AudioEffector") | 对波形应用各种滤波器和/或编解码器。 |' - en: '| [`play_audio`](generated/torchaudio.io.play_audio.html#torchaudio.io.play_audio "torchaudio.io.play_audio") | Plays audio through specified or available output device.
|' + id: totrans-5 prefs: [] type: TYPE_TB + zh: '| [`play_audio`](generated/torchaudio.io.play_audio.html#torchaudio.io.play_audio + "torchaudio.io.play_audio") | 通过指定或可用的输出设备播放音频。 |' - en: Tutorials using `torchaudio.io` + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 使用`torchaudio.io`的教程 - en: '![StreamWriter Advanced Usage](../Images/6220c14661a5916b79dc7176329a2f31.png)' + id: totrans-7 prefs: [] type: TYPE_IMG + zh: '![StreamWriter高级用法](../Images/6220c14661a5916b79dc7176329a2f31.png)' - en: '[StreamWriter Advanced Usage](tutorials/streamwriter_advanced.html#sphx-glr-tutorials-streamwriter-advanced-py)' + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: '[StreamWriter高级用法](tutorials/streamwriter_advanced.html#sphx-glr-tutorials-streamwriter-advanced-py)' - en: StreamWriter Advanced Usage![StreamReader Advanced Usages](../Images/0bfcb9f0a40e70876201bb889c96b850.png) + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: StreamWriter高级用法![StreamReader高级用法](../Images/0bfcb9f0a40e70876201bb889c96b850.png) - en: '[StreamReader Advanced Usages](tutorials/streamreader_advanced_tutorial.html#sphx-glr-tutorials-streamreader-advanced-tutorial-py)' + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: '[StreamReader高级用法](tutorials/streamreader_advanced_tutorial.html#sphx-glr-tutorials-streamreader-advanced-tutorial-py)' - en: StreamReader Advanced Usages![StreamReader Basic Usages](../Images/2e9d3658df8a114b4fbaf83899e67e81.png) + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: StreamReader高级用法![StreamReader基本用法](../Images/2e9d3658df8a114b4fbaf83899e67e81.png) - en: '[StreamReader Basic Usages](tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py)' + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: '[StreamReader基本用法](tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py)' - en: StreamReader Basic Usages![AudioEffector Usages](../Images/1a4ea86e92f465a76624e1054dea18f7.png) + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: StreamReader基本用法![AudioEffector用法](../Images/1a4ea86e92f465a76624e1054dea18f7.png) - en: '[AudioEffector Usages](tutorials/effector_tutorial.html#sphx-glr-tutorials-effector-tutorial-py)' + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: '[AudioEffector用法](tutorials/effector_tutorial.html#sphx-glr-tutorials-effector-tutorial-py)' - en: AudioEffector Usages![Online ASR with Emformer RNN-T](../Images/200081d049505bef5c1ce8e3c321134d.png) + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 使用AudioEffector![使用Emformer RNN-T的在线ASR](../Images/200081d049505bef5c1ce8e3c321134d.png) - en: '[Online ASR with Emformer RNN-T](tutorials/online_asr_tutorial.html#sphx-glr-tutorials-online-asr-tutorial-py)' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '[使用Emformer RNN-T的在线ASR](tutorials/online_asr_tutorial.html#sphx-glr-tutorials-online-asr-tutorial-py)' - en: Online ASR with Emformer RNN-T![Device ASR with Emformer RNN-T](../Images/62ca7f96e6d3a3011aa85c2a9228f03f.png) + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 使用Emformer RNN-T的在线ASR![使用Emformer RNN-T的设备ASR](../Images/62ca7f96e6d3a3011aa85c2a9228f03f.png) - en: '[Device ASR with Emformer RNN-T](tutorials/device_asr.html#sphx-glr-tutorials-device-asr-py)' + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: '[使用Emformer RNN-T的设备ASR](tutorials/device_asr.html#sphx-glr-tutorials-device-asr-py)' - en: Device ASR with Emformer RNN-T![Accelerated video encoding with NVENC](../Images/31ca70defe6a312ea9543e4c326ada9d.png) + id: totrans-19 prefs: [] type: TYPE_NORMAL 
+ zh: 使用Emformer RNN-T的设备ASR![使用NVENC加速视频编码](../Images/31ca70defe6a312ea9543e4c326ada9d.png) - en: '[Accelerated video encoding with NVENC](tutorials/nvenc_tutorial.html#sphx-glr-tutorials-nvenc-tutorial-py)' + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: '[使用NVENC加速视频编码](tutorials/nvenc_tutorial.html#sphx-glr-tutorials-nvenc-tutorial-py)' - en: Accelerated video encoding with NVENC![StreamWriter Basic Usage](../Images/9f6289e977fd79f4e28b4217ecde6c14.png) + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 使用NVENC加速视频编码![StreamWriter基本用法](../Images/9f6289e977fd79f4e28b4217ecde6c14.png) - en: '[StreamWriter Basic Usage](tutorials/streamwriter_basic_tutorial.html#sphx-glr-tutorials-streamwriter-basic-tutorial-py)' + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: '[StreamWriter基本用法](tutorials/streamwriter_basic_tutorial.html#sphx-glr-tutorials-streamwriter-basic-tutorial-py)' - en: StreamWriter Basic Usage![Device AV-ASR with Emformer RNN-T](../Images/cfabfa62624e7ca52c6aa860b13fed89.png) + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: StreamWriter基本用法![使用Emformer RNN-T的设备AV-ASR](../Images/cfabfa62624e7ca52c6aa860b13fed89.png) - en: '[Device AV-ASR with Emformer RNN-T](tutorials/device_avsr.html#sphx-glr-tutorials-device-avsr-py)' + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: '[使用Emformer RNN-T的设备AV-ASR](tutorials/device_avsr.html#sphx-glr-tutorials-device-avsr-py)' - en: Device AV-ASR with Emformer RNN-T![Accelerated video decoding with NVDEC](../Images/4fbb2b4bcf6bdf294aad9b160cfaa3cf.png) + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: 使用Emformer RNN-T的设备AV-ASR![使用NVDEC加速视频解码](../Images/4fbb2b4bcf6bdf294aad9b160cfaa3cf.png) - en: '[Accelerated video decoding with NVDEC](tutorials/nvdec_tutorial.html#sphx-glr-tutorials-nvdec-tutorial-py)' + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: '[使用NVDEC加速视频解码](tutorials/nvdec_tutorial.html#sphx-glr-tutorials-nvdec-tutorial-py)' - en: Accelerated video decoding with NVDEC + id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 使用NVDEC加速视频解码 diff --git a/totrans/aud22_50.yaml b/totrans/aud22_50.yaml index 7a35e2821a36688bc5d292115c1ad9ac33e46694..b85969ff5c85a2298694fa39a25dc28214d8472b 100644 --- a/totrans/aud22_50.yaml +++ b/totrans/aud22_50.yaml @@ -1,290 +1,473 @@ - en: torchaudio.functional + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.functional - en: 原文:[https://pytorch.org/audio/stable/functional.html](https://pytorch.org/audio/stable/functional.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/functional.html](https://pytorch.org/audio/stable/functional.html) - en: Functions to perform common audio operations. + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 执行常见音频操作的函数。 - en: Utility[](#utility "Permalink to this heading") + id: totrans-3 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 实用工具[](#utility "跳转到此标题") - en: '| [`amplitude_to_DB`](generated/torchaudio.functional.amplitude_to_DB.html#torchaudio.functional.amplitude_to_DB "torchaudio.functional.amplitude_to_DB") | Turn a spectrogram from the power/amplitude scale to the decibel scale. 
|' + id: totrans-4 prefs: [] type: TYPE_TB + zh: '| [`amplitude_to_DB`](generated/torchaudio.functional.amplitude_to_DB.html#torchaudio.functional.amplitude_to_DB + "torchaudio.functional.amplitude_to_DB") | 将频谱图从功率/幅度标度转换为分贝标度。 |' - en: '| [`DB_to_amplitude`](generated/torchaudio.functional.DB_to_amplitude.html#torchaudio.functional.DB_to_amplitude "torchaudio.functional.DB_to_amplitude") | Turn a tensor from the decibel scale to the power/amplitude scale. |' + id: totrans-5 prefs: [] type: TYPE_TB + zh: '| [`DB_to_amplitude`](generated/torchaudio.functional.DB_to_amplitude.html#torchaudio.functional.DB_to_amplitude + "torchaudio.functional.DB_to_amplitude") | 将张量从分贝标度转换为功率/幅度标度。 |' - en: '| [`melscale_fbanks`](generated/torchaudio.functional.melscale_fbanks.html#torchaudio.functional.melscale_fbanks "torchaudio.functional.melscale_fbanks") | Create a frequency bin conversion matrix. |' + id: totrans-6 prefs: [] type: TYPE_TB + zh: '| [`melscale_fbanks`](generated/torchaudio.functional.melscale_fbanks.html#torchaudio.functional.melscale_fbanks + "torchaudio.functional.melscale_fbanks") | 创建一个频率箱转换矩阵。 |' - en: '| [`linear_fbanks`](generated/torchaudio.functional.linear_fbanks.html#torchaudio.functional.linear_fbanks "torchaudio.functional.linear_fbanks") | Creates a linear triangular filterbank. |' + id: totrans-7 prefs: [] type: TYPE_TB + zh: '| [`linear_fbanks`](generated/torchaudio.functional.linear_fbanks.html#torchaudio.functional.linear_fbanks + "torchaudio.functional.linear_fbanks") | 创建一个线性三角滤波器组。 |' - en: '| [`create_dct`](generated/torchaudio.functional.create_dct.html#torchaudio.functional.create_dct "torchaudio.functional.create_dct") | Create a DCT transformation matrix with shape (`n_mels`, `n_mfcc`), normalized depending on norm. |' + id: totrans-8 prefs: [] type: TYPE_TB + zh: '| [`create_dct`](generated/torchaudio.functional.create_dct.html#torchaudio.functional.create_dct + "torchaudio.functional.create_dct") | 创建一个形状为(`n_mels`, `n_mfcc`)的DCT变换矩阵,根据norm进行归一化。 + |' - en: '| [`mask_along_axis`](generated/torchaudio.functional.mask_along_axis.html#torchaudio.functional.mask_along_axis "torchaudio.functional.mask_along_axis") | Apply a mask along `axis`. |' + id: totrans-9 prefs: [] type: TYPE_TB + zh: '| [`mask_along_axis`](generated/torchaudio.functional.mask_along_axis.html#torchaudio.functional.mask_along_axis + "torchaudio.functional.mask_along_axis") | 沿着`axis`应用掩码。 |' - en: '| [`mask_along_axis_iid`](generated/torchaudio.functional.mask_along_axis_iid.html#torchaudio.functional.mask_along_axis_iid "torchaudio.functional.mask_along_axis_iid") | Apply a mask along `axis`. |' + id: totrans-10 prefs: [] type: TYPE_TB + zh: '| [`mask_along_axis_iid`](generated/torchaudio.functional.mask_along_axis_iid.html#torchaudio.functional.mask_along_axis_iid + "torchaudio.functional.mask_along_axis_iid") | 沿着`axis`应用掩码。 |' - en: '| [`mu_law_encoding`](generated/torchaudio.functional.mu_law_encoding.html#torchaudio.functional.mu_law_encoding "torchaudio.functional.mu_law_encoding") | Encode signal based on mu-law companding. |' + id: totrans-11 prefs: [] type: TYPE_TB + zh: '| [`mu_law_encoding`](generated/torchaudio.functional.mu_law_encoding.html#torchaudio.functional.mu_law_encoding + "torchaudio.functional.mu_law_encoding") | 基于mu-law压缩编码信号。 |' - en: '| [`mu_law_decoding`](generated/torchaudio.functional.mu_law_decoding.html#torchaudio.functional.mu_law_decoding "torchaudio.functional.mu_law_decoding") | Decode mu-law encoded signal. 
|' + id: totrans-12 prefs: [] type: TYPE_TB + zh: '| [`mu_law_decoding`](generated/torchaudio.functional.mu_law_decoding.html#torchaudio.functional.mu_law_decoding "torchaudio.functional.mu_law_decoding") | 解码mu-law编码信号。 |' - en: '| [`apply_codec`](generated/torchaudio.functional.apply_codec.html#torchaudio.functional.apply_codec "torchaudio.functional.apply_codec") | DEPRECATED: Apply codecs as a form of augmentation. |' + id: totrans-13 prefs: [] type: TYPE_TB + zh: '| [`apply_codec`](generated/torchaudio.functional.apply_codec.html#torchaudio.functional.apply_codec "torchaudio.functional.apply_codec") | 已弃用:将编解码器应用为一种增强形式。 |' - en: '| [`resample`](generated/torchaudio.functional.resample.html#torchaudio.functional.resample "torchaudio.functional.resample") | Resamples the waveform at the new frequency using bandlimited interpolation. |' + id: totrans-14 prefs: [] type: TYPE_TB + zh: '| [`resample`](generated/torchaudio.functional.resample.html#torchaudio.functional.resample "torchaudio.functional.resample") | 使用带限插值将波形重新采样到新的频率。 |' - en: '| [`loudness`](generated/torchaudio.functional.loudness.html#torchaudio.functional.loudness "torchaudio.functional.loudness") | Measure audio loudness according to the ITU-R BS.1770-4 recommendation. |' + id: totrans-15 prefs: [] type: TYPE_TB + zh: '| [`loudness`](generated/torchaudio.functional.loudness.html#torchaudio.functional.loudness "torchaudio.functional.loudness") | 根据ITU-R BS.1770-4推荐测量音频响度。 |' - en: '| [`convolve`](generated/torchaudio.functional.convolve.html#torchaudio.functional.convolve "torchaudio.functional.convolve") | Convolves inputs along their last dimension using the direct method. |' + id: totrans-16 prefs: [] type: TYPE_TB + zh: '| [`convolve`](generated/torchaudio.functional.convolve.html#torchaudio.functional.convolve "torchaudio.functional.convolve") | 使用直接方法沿着它们的最后一个维度对输入进行卷积。 |' - en: '| [`fftconvolve`](generated/torchaudio.functional.fftconvolve.html#torchaudio.functional.fftconvolve "torchaudio.functional.fftconvolve") | Convolves inputs along their last dimension using FFT. |' + id: totrans-17 prefs: [] type: TYPE_TB + zh: '| [`fftconvolve`](generated/torchaudio.functional.fftconvolve.html#torchaudio.functional.fftconvolve "torchaudio.functional.fftconvolve") | 使用FFT沿着它们的最后一个维度对输入进行卷积。 |' - en: '| [`add_noise`](generated/torchaudio.functional.add_noise.html#torchaudio.functional.add_noise "torchaudio.functional.add_noise") | Scales and adds noise to waveform per signal-to-noise ratio. |' + id: totrans-18 prefs: [] type: TYPE_TB + zh: '| [`add_noise`](generated/torchaudio.functional.add_noise.html#torchaudio.functional.add_noise "torchaudio.functional.add_noise") | 根据信噪比对波形进行缩放并添加噪声。 |' - en: '| [`preemphasis`](generated/torchaudio.functional.preemphasis.html#torchaudio.functional.preemphasis "torchaudio.functional.preemphasis") | Pre-emphasizes a waveform along its last dimension, i.e. for each signal \(x\) in `waveform`, computes output \(y\) as. |' + id: totrans-19 prefs: [] type: TYPE_TB + zh: '| [`preemphasis`](generated/torchaudio.functional.preemphasis.html#torchaudio.functional.preemphasis "torchaudio.functional.preemphasis") | 沿最后一个维度对波形进行预加重,即对于`waveform`中的每个信号\(x\),按如下方式计算输出\(y\)。 |' - en: '| [`deemphasis`](generated/torchaudio.functional.deemphasis.html#torchaudio.functional.deemphasis "torchaudio.functional.deemphasis") | De-emphasizes a waveform along its last dimension. 
|' + id: totrans-20 prefs: [] type: TYPE_TB + zh: '| [`deemphasis`](generated/torchaudio.functional.deemphasis.html#torchaudio.functional.deemphasis "torchaudio.functional.deemphasis") | 沿最后一个维度对波形进行去加重。 |' - en: '| [`speed`](generated/torchaudio.functional.speed.html#torchaudio.functional.speed "torchaudio.functional.speed") | Adjusts waveform speed. |' + id: totrans-21 prefs: [] type: TYPE_TB + zh: '| [`speed`](generated/torchaudio.functional.speed.html#torchaudio.functional.speed "torchaudio.functional.speed") | 调整波形速度。 |' - en: '| [`frechet_distance`](generated/torchaudio.functional.frechet_distance.html#torchaudio.functional.frechet_distance "torchaudio.functional.frechet_distance") | Computes the Fréchet distance between two multivariate normal distributions [[Dowson and Landau, 1982](references.html#id72 "DC Dowson and BV Landau. The fréchet distance between multivariate normal distributions. Journal of multivariate analysis, 12(3):450–455, 1982.")]. |' + id: totrans-22 prefs: [] type: TYPE_TB + zh: '| [`frechet_distance`](generated/torchaudio.functional.frechet_distance.html#torchaudio.functional.frechet_distance "torchaudio.functional.frechet_distance") | 计算两个多元正态分布之间的Fréchet距离[[Dowson and Landau, 1982](references.html#id72 "DC Dowson and BV Landau. The fréchet distance between multivariate normal distributions. Journal of multivariate analysis, 12(3):450–455, 1982.")]。 |' - en: Forced Alignment[](#forced-alignment "Permalink to this heading") + id: totrans-23 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 强制对齐[](#forced-alignment "跳转到此标题") - en: '| [`forced_align`](generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align "torchaudio.functional.forced_align") | Align a CTC label sequence to an emission. |' + id: totrans-24 prefs: [] type: TYPE_TB + zh: '| [`forced_align`](generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align "torchaudio.functional.forced_align") | 将CTC标签序列与发射(emission)对齐。 |' - en: '| [`merge_tokens`](generated/torchaudio.functional.merge_tokens.html#torchaudio.functional.merge_tokens "torchaudio.functional.merge_tokens") | Removes repeated tokens and blank tokens from the given CTC token sequence. |' + id: totrans-25 prefs: [] type: TYPE_TB + zh: '| [`merge_tokens`](generated/torchaudio.functional.merge_tokens.html#torchaudio.functional.merge_tokens "torchaudio.functional.merge_tokens") | 从给定的CTC标记序列中删除重复标记和空白标记。 |' - en: '| [`TokenSpan`](generated/torchaudio.functional.TokenSpan.html#torchaudio.functional.TokenSpan "torchaudio.functional.TokenSpan") | Token with time stamps and score. |' + id: totrans-26 prefs: [] type: TYPE_TB + zh: '| [`TokenSpan`](generated/torchaudio.functional.TokenSpan.html#torchaudio.functional.TokenSpan "torchaudio.functional.TokenSpan") | 具有时间戳和分数的标记。 |' - en: Filtering[](#filtering "Permalink to this heading") + id: totrans-27 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 滤波[](#filtering "跳转到此标题") - en: '| [`allpass_biquad`](generated/torchaudio.functional.allpass_biquad.html#torchaudio.functional.allpass_biquad "torchaudio.functional.allpass_biquad") | Design two-pole all-pass filter. 
|' + id: totrans-28 prefs: [] type: TYPE_TB + zh: '| [`allpass_biquad`](generated/torchaudio.functional.allpass_biquad.html#torchaudio.functional.allpass_biquad "torchaudio.functional.allpass_biquad") | 设计双极全通滤波器。 |' - en: '| [`band_biquad`](generated/torchaudio.functional.band_biquad.html#torchaudio.functional.band_biquad "torchaudio.functional.band_biquad") | Design two-pole band filter. |' + id: totrans-29 prefs: [] type: TYPE_TB + zh: '| [`band_biquad`](generated/torchaudio.functional.band_biquad.html#torchaudio.functional.band_biquad "torchaudio.functional.band_biquad") | 设计双极带式(band)滤波器。 |' - en: '| [`bandpass_biquad`](generated/torchaudio.functional.bandpass_biquad.html#torchaudio.functional.bandpass_biquad "torchaudio.functional.bandpass_biquad") | Design two-pole band-pass filter. |' + id: totrans-30 prefs: [] type: TYPE_TB + zh: '| [`bandpass_biquad`](generated/torchaudio.functional.bandpass_biquad.html#torchaudio.functional.bandpass_biquad "torchaudio.functional.bandpass_biquad") | 设计双极带通滤波器。 |' - en: '| [`bandreject_biquad`](generated/torchaudio.functional.bandreject_biquad.html#torchaudio.functional.bandreject_biquad "torchaudio.functional.bandreject_biquad") | Design two-pole band-reject filter. |' + id: totrans-31 prefs: [] type: TYPE_TB + zh: '| [`bandreject_biquad`](generated/torchaudio.functional.bandreject_biquad.html#torchaudio.functional.bandreject_biquad "torchaudio.functional.bandreject_biquad") | 设计双极带阻滤波器。 |' - en: '| [`bass_biquad`](generated/torchaudio.functional.bass_biquad.html#torchaudio.functional.bass_biquad "torchaudio.functional.bass_biquad") | Design a bass tone-control effect. |' + id: totrans-32 prefs: [] type: TYPE_TB + zh: '| [`bass_biquad`](generated/torchaudio.functional.bass_biquad.html#torchaudio.functional.bass_biquad "torchaudio.functional.bass_biquad") | 设计低音音控效果。 |' - en: '| [`biquad`](generated/torchaudio.functional.biquad.html#torchaudio.functional.biquad "torchaudio.functional.biquad") | Perform a biquad filter of input tensor. |' + id: totrans-33 prefs: [] type: TYPE_TB + zh: '| [`biquad`](generated/torchaudio.functional.biquad.html#torchaudio.functional.biquad "torchaudio.functional.biquad") | 对输入张量执行双二阶滤波。 |' - en: '| [`contrast`](generated/torchaudio.functional.contrast.html#torchaudio.functional.contrast "torchaudio.functional.contrast") | Apply contrast effect. |' + id: totrans-34 prefs: [] type: TYPE_TB + zh: '| [`contrast`](generated/torchaudio.functional.contrast.html#torchaudio.functional.contrast "torchaudio.functional.contrast") | 应用对比度效果。 |' - en: '| [`dcshift`](generated/torchaudio.functional.dcshift.html#torchaudio.functional.dcshift "torchaudio.functional.dcshift") | Apply a DC shift to the audio. |' + id: totrans-35 prefs: [] type: TYPE_TB + zh: '| [`dcshift`](generated/torchaudio.functional.dcshift.html#torchaudio.functional.dcshift "torchaudio.functional.dcshift") | 对音频应用DC偏移。 |' - en: '| [`deemph_biquad`](generated/torchaudio.functional.deemph_biquad.html#torchaudio.functional.deemph_biquad "torchaudio.functional.deemph_biquad") | Apply ISO 908 CD de-emphasis (shelving) IIR filter. 
|' + id: totrans-36 prefs: [] type: TYPE_TB + zh: '| [`deemph_biquad`](generated/torchaudio.functional.deemph_biquad.html#torchaudio.functional.deemph_biquad "torchaudio.functional.deemph_biquad") | 应用ISO 908 CD去加重(搁架式)IIR滤波器。 |' - en: '| [`dither`](generated/torchaudio.functional.dither.html#torchaudio.functional.dither "torchaudio.functional.dither") | Apply dither |' + id: totrans-37 prefs: [] type: TYPE_TB + zh: '| [`dither`](generated/torchaudio.functional.dither.html#torchaudio.functional.dither "torchaudio.functional.dither") | 应用抖动。 |' - en: '| [`equalizer_biquad`](generated/torchaudio.functional.equalizer_biquad.html#torchaudio.functional.equalizer_biquad "torchaudio.functional.equalizer_biquad") | Design biquad peaking equalizer filter and perform filtering. |' + id: totrans-38 prefs: [] type: TYPE_TB + zh: '| [`equalizer_biquad`](generated/torchaudio.functional.equalizer_biquad.html#torchaudio.functional.equalizer_biquad "torchaudio.functional.equalizer_biquad") | 设计双二阶峰值均衡器滤波器并执行滤波。 |' - en: '| [`filtfilt`](generated/torchaudio.functional.filtfilt.html#torchaudio.functional.filtfilt "torchaudio.functional.filtfilt") | Apply an IIR filter forward and backward to a waveform. |' + id: totrans-39 prefs: [] type: TYPE_TB + zh: '| [`filtfilt`](generated/torchaudio.functional.filtfilt.html#torchaudio.functional.filtfilt "torchaudio.functional.filtfilt") | 对波形前向和后向应用IIR滤波器。 |' - en: '| [`flanger`](generated/torchaudio.functional.flanger.html#torchaudio.functional.flanger "torchaudio.functional.flanger") | Apply a flanger effect to the audio. |' + id: totrans-40 prefs: [] type: TYPE_TB + zh: '| [`flanger`](generated/torchaudio.functional.flanger.html#torchaudio.functional.flanger "torchaudio.functional.flanger") | 对音频应用镶边(flanger)效果。 |' - en: '| [`gain`](generated/torchaudio.functional.gain.html#torchaudio.functional.gain "torchaudio.functional.gain") | Apply amplification or attenuation to the whole waveform. |' + id: totrans-41 prefs: [] type: TYPE_TB + zh: '| [`gain`](generated/torchaudio.functional.gain.html#torchaudio.functional.gain "torchaudio.functional.gain") | 对整个波形应用放大或衰减。 |' - en: '| [`highpass_biquad`](generated/torchaudio.functional.highpass_biquad.html#torchaudio.functional.highpass_biquad "torchaudio.functional.highpass_biquad") | Design biquad highpass filter and perform filtering. |' + id: totrans-42 prefs: [] type: TYPE_TB + zh: '| [`highpass_biquad`](generated/torchaudio.functional.highpass_biquad.html#torchaudio.functional.highpass_biquad "torchaudio.functional.highpass_biquad") | 设计双二阶高通滤波器并执行滤波。 |' - en: '| [`lfilter`](generated/torchaudio.functional.lfilter.html#torchaudio.functional.lfilter "torchaudio.functional.lfilter") | Perform an IIR filter by evaluating difference equation. |' + id: totrans-43 prefs: [] type: TYPE_TB + zh: '| [`lfilter`](generated/torchaudio.functional.lfilter.html#torchaudio.functional.lfilter "torchaudio.functional.lfilter") | 通过求解差分方程执行IIR滤波。 |' - en: '| [`lowpass_biquad`](generated/torchaudio.functional.lowpass_biquad.html#torchaudio.functional.lowpass_biquad "torchaudio.functional.lowpass_biquad") | Design biquad lowpass filter and perform filtering. 
|' + id: totrans-44 prefs: [] type: TYPE_TB + zh: '| [`lowpass_biquad`](generated/torchaudio.functional.lowpass_biquad.html#torchaudio.functional.lowpass_biquad + "torchaudio.functional.lowpass_biquad") | 设计双二阶低通滤波器并执行滤波。 |' - en: '| [`overdrive`](generated/torchaudio.functional.overdrive.html#torchaudio.functional.overdrive "torchaudio.functional.overdrive") | Apply a overdrive effect to the audio. |' + id: totrans-45 prefs: [] type: TYPE_TB + zh: '| [`overdrive`](generated/torchaudio.functional.overdrive.html#torchaudio.functional.overdrive + "torchaudio.functional.overdrive") | 对音频应用过载效果。 |' - en: '| [`phaser`](generated/torchaudio.functional.phaser.html#torchaudio.functional.phaser "torchaudio.functional.phaser") | Apply a phasing effect to the audio. |' + id: totrans-46 prefs: [] type: TYPE_TB + zh: '| [`phaser`](generated/torchaudio.functional.phaser.html#torchaudio.functional.phaser + "torchaudio.functional.phaser") | 对音频应用相位效果。 |' - en: '| [`riaa_biquad`](generated/torchaudio.functional.riaa_biquad.html#torchaudio.functional.riaa_biquad "torchaudio.functional.riaa_biquad") | Apply RIAA vinyl playback equalization. |' + id: totrans-47 prefs: [] type: TYPE_TB + zh: '| [`riaa_biquad`](generated/torchaudio.functional.riaa_biquad.html#torchaudio.functional.riaa_biquad + "torchaudio.functional.riaa_biquad") | 应用RIAA黑胶播放均衡。 |' - en: '| [`treble_biquad`](generated/torchaudio.functional.treble_biquad.html#torchaudio.functional.treble_biquad "torchaudio.functional.treble_biquad") | Design a treble tone-control effect. |' + id: totrans-48 prefs: [] type: TYPE_TB + zh: '| [`treble_biquad`](generated/torchaudio.functional.treble_biquad.html#torchaudio.functional.treble_biquad + "torchaudio.functional.treble_biquad") | 设计高音音控效果。 |' - en: Feature Extractions[](#feature-extractions "Permalink to this heading") + id: totrans-49 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 特征提取[](#feature-extractions "跳转到此标题的永久链接") - en: '| [`vad`](generated/torchaudio.functional.vad.html#torchaudio.functional.vad "torchaudio.functional.vad") | Voice Activity Detector. |' + id: totrans-50 prefs: [] type: TYPE_TB + zh: '| [`vad`](generated/torchaudio.functional.vad.html#torchaudio.functional.vad + "torchaudio.functional.vad") | 语音活动检测器。 |' - en: '| [`spectrogram`](generated/torchaudio.functional.spectrogram.html#torchaudio.functional.spectrogram "torchaudio.functional.spectrogram") | Create a spectrogram or a batch of spectrograms from a raw audio signal. |' + id: totrans-51 prefs: [] type: TYPE_TB + zh: '| [`spectrogram`](generated/torchaudio.functional.spectrogram.html#torchaudio.functional.spectrogram + "torchaudio.functional.spectrogram") | 从原始音频信号创建频谱图或一批频谱图。 |' - en: '| [`inverse_spectrogram`](generated/torchaudio.functional.inverse_spectrogram.html#torchaudio.functional.inverse_spectrogram "torchaudio.functional.inverse_spectrogram") | Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram. |' + id: totrans-52 prefs: [] type: TYPE_TB + zh: '| [`inverse_spectrogram`](generated/torchaudio.functional.inverse_spectrogram.html#torchaudio.functional.inverse_spectrogram + "torchaudio.functional.inverse_spectrogram") | 从提供的复值频谱图创建逆频谱图或一批逆频谱图。 |' - en: '| [`griffinlim`](generated/torchaudio.functional.griffinlim.html#torchaudio.functional.griffinlim "torchaudio.functional.griffinlim") | Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. 
|' + id: totrans-53 prefs: [] type: TYPE_TB + zh: '| [`griffinlim`](generated/torchaudio.functional.griffinlim.html#torchaudio.functional.griffinlim "torchaudio.functional.griffinlim") | 使用Griffin-Lim变换从线性刻度幅度频谱图计算波形。 |' - en: '| [`phase_vocoder`](generated/torchaudio.functional.phase_vocoder.html#torchaudio.functional.phase_vocoder "torchaudio.functional.phase_vocoder") | Given a STFT tensor, speed up in time without modifying pitch by a factor of `rate`. |' + id: totrans-54 prefs: [] type: TYPE_TB + zh: '| [`phase_vocoder`](generated/torchaudio.functional.phase_vocoder.html#torchaudio.functional.phase_vocoder "torchaudio.functional.phase_vocoder") | 给定STFT张量,通过因子`rate`在时间上加速而不改变音高。 |' - en: '| [`pitch_shift`](generated/torchaudio.functional.pitch_shift.html#torchaudio.functional.pitch_shift "torchaudio.functional.pitch_shift") | Shift the pitch of a waveform by `n_steps` steps. |' + id: totrans-55 prefs: [] type: TYPE_TB + zh: '| [`pitch_shift`](generated/torchaudio.functional.pitch_shift.html#torchaudio.functional.pitch_shift "torchaudio.functional.pitch_shift") | 将波形的音高移动`n_steps`步。 |' - en: '| [`compute_deltas`](generated/torchaudio.functional.compute_deltas.html#torchaudio.functional.compute_deltas "torchaudio.functional.compute_deltas") | Compute delta coefficients of a tensor, usually a spectrogram: |' + id: totrans-56 prefs: [] type: TYPE_TB + zh: '| [`compute_deltas`](generated/torchaudio.functional.compute_deltas.html#torchaudio.functional.compute_deltas "torchaudio.functional.compute_deltas") | 计算张量(通常是频谱图)的差分系数: |' - en: '| [`detect_pitch_frequency`](generated/torchaudio.functional.detect_pitch_frequency.html#torchaudio.functional.detect_pitch_frequency "torchaudio.functional.detect_pitch_frequency") | Detect pitch frequency. |' + id: totrans-57 prefs: [] type: TYPE_TB + zh: '| [`detect_pitch_frequency`](generated/torchaudio.functional.detect_pitch_frequency.html#torchaudio.functional.detect_pitch_frequency "torchaudio.functional.detect_pitch_frequency") | 检测音高频率。 |' - en: '| [`sliding_window_cmn`](generated/torchaudio.functional.sliding_window_cmn.html#torchaudio.functional.sliding_window_cmn "torchaudio.functional.sliding_window_cmn") | Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. |' + id: totrans-58 prefs: [] type: TYPE_TB + zh: '| [`sliding_window_cmn`](generated/torchaudio.functional.sliding_window_cmn.html#torchaudio.functional.sliding_window_cmn "torchaudio.functional.sliding_window_cmn") | 对每个话语应用滑动窗口倒谱均值(和可选的方差)归一化。 |' - en: '| [`spectral_centroid`](generated/torchaudio.functional.spectral_centroid.html#torchaudio.functional.spectral_centroid "torchaudio.functional.spectral_centroid") | Compute the spectral centroid for each channel along the time axis. |' + id: totrans-59 prefs: [] type: TYPE_TB + zh: '| [`spectral_centroid`](generated/torchaudio.functional.spectral_centroid.html#torchaudio.functional.spectral_centroid "torchaudio.functional.spectral_centroid") | 计算每个通道沿时间轴的频谱质心。 |' - en: Multi-channel[](#multi-channel "Permalink to this heading") + id: totrans-60 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 多通道[](#multi-channel "跳转到此标题的永久链接") - en: '| [`psd`](generated/torchaudio.functional.psd.html#torchaudio.functional.psd "torchaudio.functional.psd") | Compute cross-channel power spectral density (PSD) matrix. 
|' + id: totrans-61 prefs: [] type: TYPE_TB + zh: '| [`psd`](generated/torchaudio.functional.psd.html#torchaudio.functional.psd "torchaudio.functional.psd") | 计算跨通道功率谱密度(PSD)矩阵。 |' - en: '| [`mvdr_weights_souden`](generated/torchaudio.functional.mvdr_weights_souden.html#torchaudio.functional.mvdr_weights_souden "torchaudio.functional.mvdr_weights_souden") | Compute the Minimum Variance Distortionless Response (*MVDR* [[Capon, 1969](references.html#id34 "Jack Capon. High-resolution @@ -294,48 +477,77 @@ Affes. On optimal frequency-domain multichannel linear filtering for noise reduction. In IEEE Transactions on audio, speech, and language processing, volume 18, 260–276\. IEEE, 2009.")]. |' + id: totrans-62 prefs: [] type: TYPE_TB + zh: '| [`mvdr_weights_souden`](generated/torchaudio.functional.mvdr_weights_souden.html#torchaudio.functional.mvdr_weights_souden "torchaudio.functional.mvdr_weights_souden") | 通过*Souden等人*提出的方法计算最小方差无失真响应(*MVDR*)波束形成权重[[Capon, 1969](references.html#id34 "Jack Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 57(8):1408–1418, 1969.")]。 |' - en: '| [`mvdr_weights_rtf`](generated/torchaudio.functional.mvdr_weights_rtf.html#torchaudio.functional.mvdr_weights_rtf "torchaudio.functional.mvdr_weights_rtf") | Compute the Minimum Variance Distortionless Response (*MVDR* [[Capon, 1969](references.html#id34 "Jack Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 57(8):1408–1418, 1969.")]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise. |' + id: totrans-63 prefs: [] type: TYPE_TB + zh: '| [`mvdr_weights_rtf`](generated/torchaudio.functional.mvdr_weights_rtf.html#torchaudio.functional.mvdr_weights_rtf "torchaudio.functional.mvdr_weights_rtf") | 基于相对传递函数(RTF)和噪声的功率谱密度(PSD)矩阵计算最小方差无失真响应(*MVDR*)波束形成权重。 |' - en: '| [`rtf_evd`](generated/torchaudio.functional.rtf_evd.html#torchaudio.functional.rtf_evd "torchaudio.functional.rtf_evd") | Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. |' + id: totrans-64 prefs: [] type: TYPE_TB + zh: '| [`rtf_evd`](generated/torchaudio.functional.rtf_evd.html#torchaudio.functional.rtf_evd "torchaudio.functional.rtf_evd") | 通过特征值分解估计相对传递函数(RTF)或导向向量。 |' - en: '| [`rtf_power`](generated/torchaudio.functional.rtf_power.html#torchaudio.functional.rtf_power "torchaudio.functional.rtf_power") | Estimate the relative transfer function (RTF) or the steering vector by the power method. |' + id: totrans-65 prefs: [] type: TYPE_TB + zh: '| [`rtf_power`](generated/torchaudio.functional.rtf_power.html#torchaudio.functional.rtf_power "torchaudio.functional.rtf_power") | 通过功率方法估计相对传递函数(RTF)或导向向量。 |' - en: '| [`apply_beamforming`](generated/torchaudio.functional.apply_beamforming.html#torchaudio.functional.apply_beamforming "torchaudio.functional.apply_beamforming") | Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. 
|' + id: totrans-66 prefs: [] type: TYPE_TB + zh: '| [`apply_beamforming`](generated/torchaudio.functional.apply_beamforming.html#torchaudio.functional.apply_beamforming + "torchaudio.functional.apply_beamforming") | 将波束形成权重应用于多通道嘈杂频谱,以获得单通道增强频谱。 |' - en: Loss[](#loss "Permalink to this heading") + id: totrans-67 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 损失[](#loss "此标题的永久链接") - en: '| [`rnnt_loss`](generated/torchaudio.functional.rnnt_loss.html#torchaudio.functional.rnnt_loss "torchaudio.functional.rnnt_loss") | Compute the RNN Transducer loss from *Sequence Transduction with Recurrent Neural Networks* [[Graves, 2012](references.html#id18 "Alex Graves. Sequence transduction with recurrent neural networks. 2012\. arXiv:1211.3711.")]. |' + id: totrans-68 prefs: [] type: TYPE_TB + zh: '| [`rnnt_loss`](generated/torchaudio.functional.rnnt_loss.html#torchaudio.functional.rnnt_loss + "torchaudio.functional.rnnt_loss") | 从*使用循环神经网络进行序列转导*[[Graves, 2012](references.html#id18 + "Alex Graves. 使用循环神经网络进行序列转导. 2012. arXiv:1211.3711.")]计算RNN Transducer损失。 |' - en: Metric[](#metric "Permalink to this heading") + id: totrans-69 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 度量[](#metric "此标题的永久链接") - en: '| [`edit_distance`](generated/torchaudio.functional.edit_distance.html#torchaudio.functional.edit_distance "torchaudio.functional.edit_distance") | Calculate the word level edit (Levenshtein) distance between two sequences. |' + id: totrans-70 prefs: [] type: TYPE_TB + zh: '| [`edit_distance`](generated/torchaudio.functional.edit_distance.html#torchaudio.functional.edit_distance + "torchaudio.functional.edit_distance") | 计算两个序列之间的单词级编辑(Levenshtein)距离。 |' diff --git a/totrans/aud22_51.yaml b/totrans/aud22_51.yaml index c738edecaec1581caef66ac208d5313e994336c4..07262b267efd1b185c5981899991a4f65408b6b6 100644 --- a/totrans/aud22_51.yaml +++ b/totrans/aud22_51.yaml @@ -1,245 +1,393 @@ - en: torchaudio.transforms + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.transforms - en: 原文:[https://pytorch.org/audio/stable/transforms.html](https://pytorch.org/audio/stable/transforms.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/transforms.html](https://pytorch.org/audio/stable/transforms.html) - en: '`torchaudio.transforms` module contains common audio processings and feature extractions. The following diagram shows the relationship between some of the available transforms.' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '`torchaudio.transforms`模块包含常见的音频处理和特征提取。以下图表显示了一些可用变换之间的关系。' - en: '![https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png](../Images/82ba49f78e3cd14b6e337acaf57b11e2.png)' + id: totrans-3 prefs: [] type: TYPE_IMG + zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png](../Images/82ba49f78e3cd14b6e337acaf57b11e2.png)' - en: Transforms are implemented using [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module "(in PyTorch v2.1)"). Common ways to build a processing pipeline are to define custom Module class or chain Modules together using [`torch.nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential "(in PyTorch v2.1)"), then move it to a target device and data type. 
+ id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 变换是使用[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module "(在PyTorch v2.1中)")实现的。构建处理流程的常见方法是定义自定义Module类或使用[`torch.nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential "(在PyTorch v2.1中)")链接模块,然后将其移动到目标设备和数据类型。 - en: '[PRE0]' + id: totrans-5 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[PRE1]' + id: totrans-6 prefs: [] type: TYPE_PRE + zh: '[PRE1]'
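For illustration, here is a minimal sketch of the pipeline pattern described above. The transform choices and parameter values are assumptions for demonstration, not the elided example code.

```python
import torch
import torchaudio.transforms as T

# Chain transforms with torch.nn.Sequential; module choices are illustrative.
pipeline = torch.nn.Sequential(
    T.MelSpectrogram(sample_rate=16000, n_fft=400, n_mels=80),
    T.AmplitudeToDB(),
)

# Like any nn.Module, the whole pipeline can be moved to a device and dtype.
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = pipeline.to(device)

waveform = torch.randn(1, 16000, device=device)  # one second of dummy audio
features = pipeline(waveform)  # shape: (channel, n_mels, time_frames)
```

Defining a custom `Module` subclass that composes the same transforms works equally well; `Sequential` is simply the most compact option.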
- en: Please check out tutorials that cover in-depth usage of transforms. + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 请查看涵盖变换深入使用的教程。 - en: '![Audio Feature Extractions](../Images/fc6b9ddc12696e086aaac0cd46a41785.png)' + id: totrans-8 prefs: [] type: TYPE_IMG + zh: '![音频特征提取](../Images/fc6b9ddc12696e086aaac0cd46a41785.png)' - en: '[Audio Feature Extractions](tutorials/audio_feature_extractions_tutorial.html#sphx-glr-tutorials-audio-feature-extractions-tutorial-py)' + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: '[音频特征提取](tutorials/audio_feature_extractions_tutorial.html#sphx-glr-tutorials-audio-feature-extractions-tutorial-py)' - en: Audio Feature Extractions + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: 音频特征提取 - en: Utility[](#utility "Permalink to this heading") + id: totrans-11 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 实用工具[](#utility "跳转到此标题") - en: '| [`AmplitudeToDB`](generated/torchaudio.transforms.AmplitudeToDB.html#torchaudio.transforms.AmplitudeToDB "torchaudio.transforms.AmplitudeToDB") | Turn a tensor from the power/amplitude scale to the decibel scale. |' + id: totrans-12 prefs: [] type: TYPE_TB + zh: '| [`AmplitudeToDB`](generated/torchaudio.transforms.AmplitudeToDB.html#torchaudio.transforms.AmplitudeToDB "torchaudio.transforms.AmplitudeToDB") | 将张量从功率/幅度比例转换为分贝比例。 |' - en: '| [`MuLawEncoding`](generated/torchaudio.transforms.MuLawEncoding.html#torchaudio.transforms.MuLawEncoding "torchaudio.transforms.MuLawEncoding") | Encode signal based on mu-law companding. |' + id: totrans-13 prefs: [] type: TYPE_TB + zh: '| [`MuLawEncoding`](generated/torchaudio.transforms.MuLawEncoding.html#torchaudio.transforms.MuLawEncoding "torchaudio.transforms.MuLawEncoding") | 基于mu-law压缩对信号进行编码。 |' - en: '| [`MuLawDecoding`](generated/torchaudio.transforms.MuLawDecoding.html#torchaudio.transforms.MuLawDecoding "torchaudio.transforms.MuLawDecoding") | Decode mu-law encoded signal. |' + id: totrans-14 prefs: [] type: TYPE_TB + zh: '| [`MuLawDecoding`](generated/torchaudio.transforms.MuLawDecoding.html#torchaudio.transforms.MuLawDecoding "torchaudio.transforms.MuLawDecoding") | 解码mu-law编码的信号。 |' - en: '| [`Resample`](generated/torchaudio.transforms.Resample.html#torchaudio.transforms.Resample "torchaudio.transforms.Resample") | Resample a signal from one frequency to another. |' + id: totrans-15 prefs: [] type: TYPE_TB + zh: '| [`Resample`](generated/torchaudio.transforms.Resample.html#torchaudio.transforms.Resample "torchaudio.transforms.Resample") | 将信号从一个频率重新采样到另一个频率。 |' - en: '| [`Fade`](generated/torchaudio.transforms.Fade.html#torchaudio.transforms.Fade "torchaudio.transforms.Fade") | Add a fade in and/or fade out to an waveform. |' + id: totrans-16 prefs: [] type: TYPE_TB + zh: '| [`Fade`](generated/torchaudio.transforms.Fade.html#torchaudio.transforms.Fade "torchaudio.transforms.Fade") | 为波形添加淡入和/或淡出。 |' - en: '| [`Vol`](generated/torchaudio.transforms.Vol.html#torchaudio.transforms.Vol "torchaudio.transforms.Vol") | Adjust volume of waveform. |' + id: totrans-17 prefs: [] type: TYPE_TB + zh: '| [`Vol`](generated/torchaudio.transforms.Vol.html#torchaudio.transforms.Vol "torchaudio.transforms.Vol") | 调整波形的音量。 |' - en: '| [`Loudness`](generated/torchaudio.transforms.Loudness.html#torchaudio.transforms.Loudness "torchaudio.transforms.Loudness") | Measure audio loudness according to the ITU-R BS.1770-4 recommendation. |' + id: totrans-18 prefs: [] type: TYPE_TB + zh: '| [`Loudness`](generated/torchaudio.transforms.Loudness.html#torchaudio.transforms.Loudness "torchaudio.transforms.Loudness") | 根据ITU-R BS.1770-4建议测量音频响度。 |' - en: '| [`AddNoise`](generated/torchaudio.transforms.AddNoise.html#torchaudio.transforms.AddNoise "torchaudio.transforms.AddNoise") | Scales and adds noise to waveform per signal-to-noise ratio. |' + id: totrans-19 prefs: [] type: TYPE_TB + zh: '| [`AddNoise`](generated/torchaudio.transforms.AddNoise.html#torchaudio.transforms.AddNoise "torchaudio.transforms.AddNoise") | 根据信噪比对波形进行缩放并添加噪声。 |' - en: '| [`Convolve`](generated/torchaudio.transforms.Convolve.html#torchaudio.transforms.Convolve "torchaudio.transforms.Convolve") | Convolves inputs along their last dimension using the direct method. |' + id: totrans-20 prefs: [] type: TYPE_TB + zh: '| [`Convolve`](generated/torchaudio.transforms.Convolve.html#torchaudio.transforms.Convolve "torchaudio.transforms.Convolve") | 使用直接方法沿着它们的最后一个维度对输入进行卷积。 |' - en: '| [`FFTConvolve`](generated/torchaudio.transforms.FFTConvolve.html#torchaudio.transforms.FFTConvolve "torchaudio.transforms.FFTConvolve") | Convolves inputs along their last dimension using FFT. |' + id: totrans-21 prefs: [] type: TYPE_TB + zh: '| [`FFTConvolve`](generated/torchaudio.transforms.FFTConvolve.html#torchaudio.transforms.FFTConvolve "torchaudio.transforms.FFTConvolve") | 使用FFT沿着它们的最后一个维度对输入进行卷积。 |' - en: '| [`Speed`](generated/torchaudio.transforms.Speed.html#torchaudio.transforms.Speed "torchaudio.transforms.Speed") | Adjusts waveform speed. |' + id: totrans-22 prefs: [] type: TYPE_TB + zh: '| [`Speed`](generated/torchaudio.transforms.Speed.html#torchaudio.transforms.Speed "torchaudio.transforms.Speed") | 调整波形速度。 |' - en: '| [`SpeedPerturbation`](generated/torchaudio.transforms.SpeedPerturbation.html#torchaudio.transforms.SpeedPerturbation "torchaudio.transforms.SpeedPerturbation") | Applies the speed perturbation augmentation introduced in *Audio augmentation for speech recognition* [[Ko *et al.*, 2015](references.html#id58 "Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. Audio augmentation for speech recognition. In Proc. Interspeech 2015, 3586–3589\. 2015\. doi:10.21437/Interspeech.2015-711.")]. |' + id: totrans-23 prefs: [] type: TYPE_TB + zh: '| [`SpeedPerturbation`](generated/torchaudio.transforms.SpeedPerturbation.html#torchaudio.transforms.SpeedPerturbation "torchaudio.transforms.SpeedPerturbation") | 应用*语音识别的音频增强*[[Ko等,2015](references.html#id58 "Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. Audio augmentation for speech recognition. In Proc. Interspeech 2015, 3586–3589\. 2015\. doi:10.21437/Interspeech.2015-711.")]中引入的速度扰动增强。 |' - en: '| [`Deemphasis`](generated/torchaudio.transforms.Deemphasis.html#torchaudio.transforms.Deemphasis "torchaudio.transforms.Deemphasis") | De-emphasizes a waveform along its last dimension. 
|' + id: totrans-24 prefs: [] type: TYPE_TB + zh: '| [`Deemphasis`](generated/torchaudio.transforms.Deemphasis.html#torchaudio.transforms.Deemphasis "torchaudio.transforms.Deemphasis") | 对波形沿其最后一个维度进行去加重。 |' - en: '| [`Preemphasis`](generated/torchaudio.transforms.Preemphasis.html#torchaudio.transforms.Preemphasis "torchaudio.transforms.Preemphasis") | Pre-emphasizes a waveform along its last dimension. |' + id: totrans-25 prefs: [] type: TYPE_TB + zh: '| [`Preemphasis`](generated/torchaudio.transforms.Preemphasis.html#torchaudio.transforms.Preemphasis "torchaudio.transforms.Preemphasis") | 对波形沿其最后一个维度进行预加重。 |' - en: Feature Extractions[](#feature-extractions "Permalink to this heading") + id: totrans-26 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 特征提取[](#feature-extractions "跳转到此标题") - en: '| [`Spectrogram`](generated/torchaudio.transforms.Spectrogram.html#torchaudio.transforms.Spectrogram "torchaudio.transforms.Spectrogram") | Create a spectrogram from an audio signal. |' + id: totrans-27 prefs: [] type: TYPE_TB + zh: '| [`Spectrogram`](generated/torchaudio.transforms.Spectrogram.html#torchaudio.transforms.Spectrogram "torchaudio.transforms.Spectrogram") | 从音频信号创建频谱图。 |' - en: '| [`InverseSpectrogram`](generated/torchaudio.transforms.InverseSpectrogram.html#torchaudio.transforms.InverseSpectrogram "torchaudio.transforms.InverseSpectrogram") | Create an inverse spectrogram to recover an audio signal from a spectrogram. |' + id: totrans-28 prefs: [] type: TYPE_TB + zh: '| [`InverseSpectrogram`](generated/torchaudio.transforms.InverseSpectrogram.html#torchaudio.transforms.InverseSpectrogram "torchaudio.transforms.InverseSpectrogram") | 创建一个逆频谱图,从频谱图中恢复音频信号。 |' - en: '| [`MelScale`](generated/torchaudio.transforms.MelScale.html#torchaudio.transforms.MelScale "torchaudio.transforms.MelScale") | Turn a normal STFT into a mel frequency STFT with triangular filter banks. |' + id: totrans-29 prefs: [] type: TYPE_TB + zh: '| [`MelScale`](generated/torchaudio.transforms.MelScale.html#torchaudio.transforms.MelScale "torchaudio.transforms.MelScale") | 将普通STFT转换为带有三角滤波器组的梅尔频率STFT。 |' - en: '| [`InverseMelScale`](generated/torchaudio.transforms.InverseMelScale.html#torchaudio.transforms.InverseMelScale "torchaudio.transforms.InverseMelScale") | Estimate a STFT in normal frequency domain from mel frequency domain. |' + id: totrans-30 prefs: [] type: TYPE_TB + zh: '| [`InverseMelScale`](generated/torchaudio.transforms.InverseMelScale.html#torchaudio.transforms.InverseMelScale "torchaudio.transforms.InverseMelScale") | 从梅尔频率域估计正常频率域中的STFT。 |' - en: '| [`MelSpectrogram`](generated/torchaudio.transforms.MelSpectrogram.html#torchaudio.transforms.MelSpectrogram "torchaudio.transforms.MelSpectrogram") | Create MelSpectrogram for a raw audio signal. |' + id: totrans-31 prefs: [] type: TYPE_TB + zh: '| [`MelSpectrogram`](generated/torchaudio.transforms.MelSpectrogram.html#torchaudio.transforms.MelSpectrogram "torchaudio.transforms.MelSpectrogram") | 为原始音频信号创建MelSpectrogram。 |' - en: '| [`GriffinLim`](generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim "torchaudio.transforms.GriffinLim") | Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. 
|' + id: totrans-32 prefs: [] type: TYPE_TB + zh: '| [`GriffinLim`](generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim "torchaudio.transforms.GriffinLim") | 使用Griffin-Lim变换从线性幅度频谱图计算波形。 |' - en: '| [`MFCC`](generated/torchaudio.transforms.MFCC.html#torchaudio.transforms.MFCC "torchaudio.transforms.MFCC") | Create the Mel-frequency cepstrum coefficients from an audio signal. |' + id: totrans-33 prefs: [] type: TYPE_TB + zh: '| [`MFCC`](generated/torchaudio.transforms.MFCC.html#torchaudio.transforms.MFCC "torchaudio.transforms.MFCC") | 从音频信号创建梅尔频率倒谱系数。 |' - en: '| [`LFCC`](generated/torchaudio.transforms.LFCC.html#torchaudio.transforms.LFCC "torchaudio.transforms.LFCC") | Create the linear-frequency cepstrum coefficients from an audio signal. |' + id: totrans-34 prefs: [] type: TYPE_TB + zh: '| [`LFCC`](generated/torchaudio.transforms.LFCC.html#torchaudio.transforms.LFCC "torchaudio.transforms.LFCC") | 从音频信号创建线性频率倒谱系数。 |' - en: '| [`ComputeDeltas`](generated/torchaudio.transforms.ComputeDeltas.html#torchaudio.transforms.ComputeDeltas "torchaudio.transforms.ComputeDeltas") | Compute delta coefficients of a tensor, usually a spectrogram. |' + id: totrans-35 prefs: [] type: TYPE_TB + zh: '| [`ComputeDeltas`](generated/torchaudio.transforms.ComputeDeltas.html#torchaudio.transforms.ComputeDeltas "torchaudio.transforms.ComputeDeltas") | 计算张量(通常是频谱图)的差分系数。 |' - en: '| [`PitchShift`](generated/torchaudio.transforms.PitchShift.html#torchaudio.transforms.PitchShift "torchaudio.transforms.PitchShift") | Shift the pitch of a waveform by `n_steps` steps. |' + id: totrans-36 prefs: [] type: TYPE_TB + zh: '| [`PitchShift`](generated/torchaudio.transforms.PitchShift.html#torchaudio.transforms.PitchShift "torchaudio.transforms.PitchShift") | 将波形的音高移动`n_steps`步。 |' - en: '| [`SlidingWindowCmn`](generated/torchaudio.transforms.SlidingWindowCmn.html#torchaudio.transforms.SlidingWindowCmn "torchaudio.transforms.SlidingWindowCmn") | Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. |' + id: totrans-37 prefs: [] type: TYPE_TB + zh: '| [`SlidingWindowCmn`](generated/torchaudio.transforms.SlidingWindowCmn.html#torchaudio.transforms.SlidingWindowCmn "torchaudio.transforms.SlidingWindowCmn") | 对每个话语应用滑动窗口倒谱均值(和可选的方差)归一化。 |' - en: '| [`SpectralCentroid`](generated/torchaudio.transforms.SpectralCentroid.html#torchaudio.transforms.SpectralCentroid "torchaudio.transforms.SpectralCentroid") | Compute the spectral centroid for each channel along the time axis. |' + id: totrans-38 prefs: [] type: TYPE_TB + zh: '| [`SpectralCentroid`](generated/torchaudio.transforms.SpectralCentroid.html#torchaudio.transforms.SpectralCentroid "torchaudio.transforms.SpectralCentroid") | 计算每个通道沿时间轴的频谱质心。 |' - en: '| [`Vad`](generated/torchaudio.transforms.Vad.html#torchaudio.transforms.Vad "torchaudio.transforms.Vad") | Voice Activity Detector. |' + id: totrans-39 prefs: [] type: TYPE_TB + zh: '| [`Vad`](generated/torchaudio.transforms.Vad.html#torchaudio.transforms.Vad "torchaudio.transforms.Vad") | 语音活动检测器。 |' - en: Augmentations[](#augmentations "Permalink to this heading") + id: totrans-40 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 增强[](#augmentations "跳转到此标题") - en: 'The following transforms implement popular augmentation techniques known as *SpecAugment* [[Park *et al.*, 2019](references.html#id6 "Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, and Quoc V. Le. 
Specaugment: a simple data augmentation method for automatic speech recognition. Interspeech 2019, Sep 2019\. URL: http://dx.doi.org/10.21437/Interspeech.2019-2680, doi:10.21437/interspeech.2019-2680.")]. A short usage sketch follows the table below.' + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: '以下变换实现了流行的增强技术,称为*SpecAugment* [[Park *et al.*, 2019](references.html#id6 "Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, and Quoc V. Le. Specaugment: a simple data augmentation method for automatic speech recognition. Interspeech 2019, Sep 2019\. URL: http://dx.doi.org/10.21437/Interspeech.2019-2680, doi:10.21437/interspeech.2019-2680.")]。表格后附有一个简短的用法示例。' - en: '| [`FrequencyMasking`](generated/torchaudio.transforms.FrequencyMasking.html#torchaudio.transforms.FrequencyMasking "torchaudio.transforms.FrequencyMasking") | Apply masking to a spectrogram in the frequency domain. |' + id: totrans-42 prefs: [] type: TYPE_TB + zh: '| [`FrequencyMasking`](generated/torchaudio.transforms.FrequencyMasking.html#torchaudio.transforms.FrequencyMasking "torchaudio.transforms.FrequencyMasking") | 在频率域中对频谱图应用掩蔽。 |' - en: '| [`TimeMasking`](generated/torchaudio.transforms.TimeMasking.html#torchaudio.transforms.TimeMasking "torchaudio.transforms.TimeMasking") | Apply masking to a spectrogram in the time domain. |' + id: totrans-43 prefs: [] type: TYPE_TB + zh: '| [`TimeMasking`](generated/torchaudio.transforms.TimeMasking.html#torchaudio.transforms.TimeMasking "torchaudio.transforms.TimeMasking") | 在时间域中对频谱图应用掩蔽。 |' - en: '| [`TimeStretch`](generated/torchaudio.transforms.TimeStretch.html#torchaudio.transforms.TimeStretch "torchaudio.transforms.TimeStretch") | Stretch stft in time without modifying pitch for a given rate. |' + id: totrans-44 prefs: [] type: TYPE_TB + zh: '| [`TimeStretch`](generated/torchaudio.transforms.TimeStretch.html#torchaudio.transforms.TimeStretch "torchaudio.transforms.TimeStretch") | 在不改变音高的情况下按给定速率在时间上拉伸STFT。 |'
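As a rough usage sketch of the masking transforms listed above (the mask widths are illustrative assumptions, not values prescribed by the paper):

```python
import torch
import torchaudio.transforms as T

# SpecAugment-style masking on a spectrogram of shape (..., freq, time).
spec = torch.randn(1, 201, 400)  # e.g. a power spectrogram

augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=15),  # zero out up to 15 consecutive frequency bins
    T.TimeMasking(time_mask_param=35),       # zero out up to 35 consecutive time frames
)
augmented = augment(spec)  # same shape as `spec`, with random bands masked
```

`TimeStretch` is left out of this sketch because it operates on a complex-valued STFT rather than a magnitude spectrogram.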
- en: Loss[](#loss "Permalink to this heading") + id: totrans-45 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 损失[](#loss "到这个标题的永久链接") - en: '| [`RNNTLoss`](generated/torchaudio.transforms.RNNTLoss.html#torchaudio.transforms.RNNTLoss "torchaudio.transforms.RNNTLoss") | Compute the RNN Transducer loss from *Sequence Transduction with Recurrent Neural Networks* [[Graves, 2012](references.html#id18 "Alex Graves. Sequence transduction with recurrent neural networks. 2012\. arXiv:1211.3711.")]. |' + id: totrans-46 prefs: [] type: TYPE_TB + zh: '| [`RNNTLoss`](generated/torchaudio.transforms.RNNTLoss.html#torchaudio.transforms.RNNTLoss "torchaudio.transforms.RNNTLoss") | 计算来自*使用循环神经网络进行序列转导*[[Graves, 2012](references.html#id18 "Alex Graves. Sequence transduction with recurrent neural networks. 2012\. arXiv:1211.3711.")]的RNN Transducer损失。 |' - en: Multi-channel[](#multi-channel "Permalink to this heading") + id: totrans-47 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 多通道[](#multi-channel "到这个标题的永久链接") - en: '| [`PSD`](generated/torchaudio.transforms.PSD.html#torchaudio.transforms.PSD "torchaudio.transforms.PSD") | Compute cross-channel power spectral density (PSD) matrix. |' + id: totrans-48 prefs: [] type: TYPE_TB + zh: '| [`PSD`](generated/torchaudio.transforms.PSD.html#torchaudio.transforms.PSD "torchaudio.transforms.PSD") | 计算跨通道功率谱密度(PSD)矩阵。 |' - en: '| [`MVDR`](generated/torchaudio.transforms.MVDR.html#torchaudio.transforms.MVDR "torchaudio.transforms.MVDR") | Minimum Variance Distortionless Response (MVDR) module that performs MVDR beamforming with Time-Frequency masks. |' + id: totrans-49 prefs: [] type: TYPE_TB + zh: '| [`MVDR`](generated/torchaudio.transforms.MVDR.html#torchaudio.transforms.MVDR "torchaudio.transforms.MVDR") | 执行带有时频掩模的MVDR波束形成的最小方差无失真响应(MVDR)模块。 |' - en: '| [`RTFMVDR`](generated/torchaudio.transforms.RTFMVDR.html#torchaudio.transforms.RTFMVDR "torchaudio.transforms.RTFMVDR") | Minimum Variance Distortionless Response (*MVDR* [[Capon, 1969](references.html#id34 "Jack Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 57(8):1408–1418, 1969.")]) module based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise. |' + id: totrans-50 prefs: [] type: TYPE_TB + zh: '| [`RTFMVDR`](generated/torchaudio.transforms.RTFMVDR.html#torchaudio.transforms.RTFMVDR "torchaudio.transforms.RTFMVDR") | 基于相对传递函数(RTF)和噪声的功率谱密度(PSD)矩阵的最小方差无失真响应(*MVDR*[[Capon, 1969](references.html#id34 "Jack Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 57(8):1408–1418, 1969.")])模块。 |' - en: '| [`SoudenMVDR`](generated/torchaudio.transforms.SoudenMVDR.html#torchaudio.transforms.SoudenMVDR "torchaudio.transforms.SoudenMVDR") | Minimum Variance Distortionless Response (*MVDR* [[Capon, 1969](references.html#id34 "Jack Capon. High-resolution frequency-wavenumber @@ -248,5 +396,13 @@ "Mehrez Souden, Jacob Benesty, and Sofiene Affes. On optimal frequency-domain multichannel linear filtering for noise reduction. In IEEE Transactions on audio, speech, and language processing, volume 18, 260–276\. IEEE, 2009.")]. |' + id: totrans-51 prefs: [] type: TYPE_TB + zh: '| [`SoudenMVDR`](generated/torchaudio.transforms.SoudenMVDR.html#torchaudio.transforms.SoudenMVDR "torchaudio.transforms.SoudenMVDR") | 基于*Souden等人*[[Souden *et al.*, 2009](references.html#id28 "Mehrez Souden, Jacob Benesty, and Sofiene Affes. On optimal frequency-domain multichannel linear filtering for noise reduction. In IEEE Transactions on audio, speech, and language processing, volume 18, 260–276\. IEEE, 2009.")]提出的方法的最小方差无失真响应(*MVDR*[[Capon, 1969](references.html#id34 "Jack Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 57(8):1408–1418, 1969.")])模块。 |' diff --git a/totrans/aud22_52.yaml b/totrans/aud22_52.yaml index 9901ab8b778a1de722ec6fb371fc90ffd66c85c1..e7819e51d5cd8e9ab9024398fa59ee488de8680f 100644 --- a/totrans/aud22_52.yaml +++ b/totrans/aud22_52.yaml @@ -1,88 +1,156 @@ - en: torchaudio.datasets + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.datasets - en: 原文:[https://pytorch.org/audio/stable/datasets.html](https://pytorch.org/audio/stable/datasets.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/datasets.html](https://pytorch.org/audio/stable/datasets.html) - en: All datasets are subclasses of [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset "(in PyTorch v2.1)") and have `__getitem__` and `__len__` methods implemented. 
+ id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 所有数据集都是 [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset "(在 PyTorch v2.1 中)") 的子类,并实现了 `__getitem__` 和 `__len__` 方法。 - en: 'Hence, they can all be passed to a [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader "(in PyTorch v2.1)") which can load multiple samples in parallel using [`torch.multiprocessing`](https://pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "(in PyTorch v2.1)") workers. For example:' + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 因此,它们都可以传递给 [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader "(在 PyTorch v2.1 中)"),该加载器可以使用 [`torch.multiprocessing`](https://pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "(在 PyTorch v2.1 中)") 工作器并行加载多个样本。例如: - en: '[PRE0]' + id: totrans-4 prefs: [] type: TYPE_PRE + zh: '[PRE0]'
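A minimal sketch of this pattern follows; the dataset choice, root directory, and collate function are assumptions for illustration, not the elided example code.

```python
import os
import torch
import torchaudio
from torch.utils.data import DataLoader

# Any torchaudio dataset works here; YESNO is used only because it is small.
os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.YESNO(root="./data", download=True)

def collate_fn(batch):
    # Each YESNO item is (waveform, sample_rate, labels); pad waveforms to the
    # longest one in the batch so they can be stacked into a single tensor.
    waveforms = [waveform.t() for waveform, _, _ in batch]  # (time, channel)
    padded = torch.nn.utils.rnn.pad_sequence(waveforms, batch_first=True)
    labels = [label for _, _, label in batch]
    return padded.transpose(1, 2), labels  # (batch, channel, time), labels

loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    num_workers=2, collate_fn=collate_fn)

for waveforms, labels in loader:
    print(waveforms.shape)  # e.g. torch.Size([4, 1, <padded length>])
    break
```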
arXiv:1912.06670.")] + 数据集。|' - en: '| [`DR_VCTK`](generated/torchaudio.datasets.DR_VCTK.html#torchaudio.datasets.DR_VCTK "torchaudio.datasets.DR_VCTK") | *Device Recorded VCTK (Small subset version)* [[Sarfjoo and Yamagishi, 2018](references.html#id42 "Seyyed Saeed Sarfjoo and Junichi Yamagishi. Device recorded vctk (small subset version). 2018.")] dataset. |' + id: totrans-8 prefs: [] type: TYPE_TB + zh: '| [`DR_VCTK`](generated/torchaudio.datasets.DR_VCTK.html#torchaudio.datasets.DR_VCTK + "torchaudio.datasets.DR_VCTK") | *Device Recorded VCTK (Small subset version)* + [[Sarfjoo and Yamagishi, 2018](references.html#id42 "Seyyed Saeed Sarfjoo and + Junichi Yamagishi. Device recorded vctk (small subset version). 2018.")] 数据集。|' - en: '| [`FluentSpeechCommands`](generated/torchaudio.datasets.FluentSpeechCommands.html#torchaudio.datasets.FluentSpeechCommands "torchaudio.datasets.FluentSpeechCommands") | *Fluent Speech Commands* [[Lugosch *et al.*, 2019](references.html#id48 "Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, and Yoshua Bengio. Speech model pre-training for end-to-end spoken language understanding. In Gernot Kubin and Zdravko Kacic, editors, Proc. of Interspeech, 814–818\. 2019.")] dataset |' + id: totrans-9 prefs: [] type: TYPE_TB + zh: '| [`FluentSpeechCommands`](generated/torchaudio.datasets.FluentSpeechCommands.html#torchaudio.datasets.FluentSpeechCommands + "torchaudio.datasets.FluentSpeechCommands") | *Fluent Speech Commands* [[Lugosch + *et al.*, 2019](references.html#id48 "Loren Lugosch, Mirco Ravanelli, Patrick + Ignoto, Vikrant Singh Tomar, and Yoshua Bengio. Speech model pre-training for + end-to-end spoken language understanding. In Gernot Kubin and Zdravko Kacic, editors, + Proc. of Interspeech, 814–818\. 2019.")] 数据集|' - en: '| [`GTZAN`](generated/torchaudio.datasets.GTZAN.html#torchaudio.datasets.GTZAN "torchaudio.datasets.GTZAN") | *GTZAN* [[Tzanetakis *et al.*, 2001](references.html#id43 "George Tzanetakis, Georg Essl, and Perry Cook. Automatic musical genre classification of audio signals. 2001\. URL: http://ismir2001.ismir.net/pdf/tzanetakis.pdf.")] dataset. |' + id: totrans-10 prefs: [] type: TYPE_TB + zh: '| [`GTZAN`](generated/torchaudio.datasets.GTZAN.html#torchaudio.datasets.GTZAN + "torchaudio.datasets.GTZAN") | *GTZAN* [[Tzanetakis *et al.*, 2001](references.html#id43 + "George Tzanetakis, Georg Essl, and Perry Cook. Automatic musical genre classification + of audio signals. 2001\. URL: http://ismir2001.ismir.net/pdf/tzanetakis.pdf.")] + 数据集。|' - en: '| [`IEMOCAP`](generated/torchaudio.datasets.IEMOCAP.html#torchaudio.datasets.IEMOCAP "torchaudio.datasets.IEMOCAP") | *IEMOCAP* [[Busso *et al.*, 2008](references.html#id52 "Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower Provost, Samuel Kim, Jeannette Chang, Sungbok Lee, and Shrikanth Narayanan. Iemocap: interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42:335-359, 12 2008\. doi:10.1007/s10579-008-9076-6.")] dataset. |' + id: totrans-11 prefs: [] type: TYPE_TB + zh: '| [`IEMOCAP`](generated/torchaudio.datasets.IEMOCAP.html#torchaudio.datasets.IEMOCAP + "torchaudio.datasets.IEMOCAP") | *IEMOCAP* [[Busso *et al.*, 2008](references.html#id52 + "Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower Provost, + Samuel Kim, Jeannette Chang, Sungbok Lee, and Shrikanth Narayanan. Iemocap: interactive + emotional dyadic motion capture database. Language Resources and Evaluation, 42:335-359, + 12 2008\. 
doi:10.1007/s10579-008-9076-6.")] 数据集。|' - en: '| [`LibriMix`](generated/torchaudio.datasets.LibriMix.html#torchaudio.datasets.LibriMix "torchaudio.datasets.LibriMix") | *LibriMix* [[Cosentino *et al.*, 2020](references.html#id37 "Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. Librimix: an open-source dataset for generalizable speech separation. 2020\. arXiv:2005.11262.")] dataset. |' + id: totrans-12 prefs: [] type: TYPE_TB + zh: '| [`LibriMix`](generated/torchaudio.datasets.LibriMix.html#torchaudio.datasets.LibriMix + "torchaudio.datasets.LibriMix") | *LibriMix* [[Cosentino *et al.*, 2020](references.html#id37 + "Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel + Vincent. Librimix: an open-source dataset for generalizable speech separation. + 2020\. arXiv:2005.11262.")] 数据集。|' - en: '| [`LIBRISPEECH`](generated/torchaudio.datasets.LIBRISPEECH.html#torchaudio.datasets.LIBRISPEECH "torchaudio.datasets.LIBRISPEECH") | *LibriSpeech* [[Panayotov *et al.*, 2015](references.html#id13 "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] dataset. |' + id: totrans-13 prefs: [] type: TYPE_TB + zh: '| [`LIBRISPEECH`](generated/torchaudio.datasets.LIBRISPEECH.html#torchaudio.datasets.LIBRISPEECH + "torchaudio.datasets.LIBRISPEECH") | *LibriSpeech* [[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")] 数据集。|' - en: '| [`LibriLightLimited`](generated/torchaudio.datasets.LibriLightLimited.html#torchaudio.datasets.LibriLightLimited "torchaudio.datasets.LibriLightLimited") | Subset of Libri-light [[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, @@ -95,29 +163,59 @@ and Abdelrahman Mohamed. Hubert: self-supervised speech representation learning by masked prediction of hidden units. 2021\. arXiv:2106.07447.")] for supervised fine-tuning. |' + id: totrans-14 prefs: [] type: TYPE_TB + zh: '| [`LibriLightLimited`](generated/torchaudio.datasets.LibriLightLimited.html#torchaudio.datasets.LibriLightLimited + "torchaudio.datasets.LibriLightLimited") | Libri-light的子集 [[Kahn *et al.*, 2020](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")] + 数据集,被用于HuBERT [[Hsu *et al.*, 2021](references.html#id16 "Wei-Ning Hsu, Benjamin + Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman + Mohamed. Hubert: self-supervised speech representation learning by masked prediction + of hidden units. 2021\. 
arXiv:2106.07447.")] 进行监督微调。|' - en: '| [`LIBRITTS`](generated/torchaudio.datasets.LIBRITTS.html#torchaudio.datasets.LIBRITTS "torchaudio.datasets.LIBRITTS") | *LibriTTS* [[Zen *et al.*, 2019](references.html#id38 "Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Z. Chen, and Yonghui Wu. Libritts: a corpus derived from librispeech for text-to-speech. ArXiv, 2019.")] dataset. |' + id: totrans-15 prefs: [] type: TYPE_TB + zh: '| [`LIBRITTS`](generated/torchaudio.datasets.LIBRITTS.html#torchaudio.datasets.LIBRITTS + "torchaudio.datasets.LIBRITTS") | *LibriTTS* [[Zen *et al.*, 2019](references.html#id38 + "Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, + Z. Chen, and Yonghui Wu. Libritts: a corpus derived from librispeech for text-to-speech. + ArXiv, 2019.")] 数据集。|' - en: '| [`LJSPEECH`](generated/torchaudio.datasets.LJSPEECH.html#torchaudio.datasets.LJSPEECH "torchaudio.datasets.LJSPEECH") | *LJSpeech-1.1* [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")] dataset. |' + id: totrans-16 prefs: [] type: TYPE_TB + zh: '| [`LJSPEECH`](generated/torchaudio.datasets.LJSPEECH.html#torchaudio.datasets.LJSPEECH + "torchaudio.datasets.LJSPEECH") | *LJSpeech-1.1* [[Ito and Johnson, 2017](references.html#id7 + "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, + 2017.")] 数据集。|' - en: '| [`MUSDB_HQ`](generated/torchaudio.datasets.MUSDB_HQ.html#torchaudio.datasets.MUSDB_HQ "torchaudio.datasets.MUSDB_HQ") | *MUSDB_HQ* [[Rafii *et al.*, 2019](references.html#id47 "Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. MUSDB18-HQ - an uncompressed version of musdb18\. December 2019\. URL: https://doi.org/10.5281/zenodo.3338373, doi:10.5281/zenodo.3338373.")] dataset. |' + id: totrans-17 prefs: [] type: TYPE_TB + zh: '| [`MUSDB_HQ`](generated/torchaudio.datasets.MUSDB_HQ.html#torchaudio.datasets.MUSDB_HQ + "torchaudio.datasets.MUSDB_HQ") | *MUSDB_HQ* [[Rafii *et al.*, 2019](references.html#id47 + "Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, + and Rachel Bittner. MUSDB18-HQ - an uncompressed version of musdb18\. December + 2019\. URL: https://doi.org/10.5281/zenodo.3338373, doi:10.5281/zenodo.3338373.")] + 数据集。|' - en: '| [`QUESST14`](generated/torchaudio.datasets.QUESST14.html#torchaudio.datasets.QUESST14 "torchaudio.datasets.QUESST14") | *QUESST14* [[Miro *et al.*, 2015](references.html#id44 "Xavier Anguera Miro, Luis Javier Rodriguez-Fuentes, Andi Buzo, Florian Metze, @@ -125,8 +223,16 @@ search in a zero-resource setting with real-life queries. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5833-5837, 2015.")] dataset. |' + id: totrans-18 prefs: [] type: TYPE_TB + zh: '| [`QUESST14`](generated/torchaudio.datasets.QUESST14.html#torchaudio.datasets.QUESST14 + "torchaudio.datasets.QUESST14") | *QUESST14* [[Miro *et al.*, 2015](references.html#id44 + "Xavier Anguera Miro, Luis Javier Rodriguez-Fuentes, Andi Buzo, Florian Metze, + Igor Szoke, and Mikel Peñagarikano. Quesst2014: evaluating query-by-example speech + search in a zero-resource setting with real-life queries. 
2015 IEEE International + Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5833-5837, + 2015.")] 数据集。|' - en: '| [`Snips`](generated/torchaudio.datasets.Snips.html#torchaudio.datasets.Snips "torchaudio.datasets.Snips") | *Snips* [[Coucke *et al.*, 2018](references.html#id53 "Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David @@ -134,45 +240,87 @@ Lavril, and others. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190, 2018.")] dataset. |' + id: totrans-19 prefs: [] type: TYPE_TB + zh: '| [`Snips`](generated/torchaudio.datasets.Snips.html#torchaudio.datasets.Snips + "torchaudio.datasets.Snips") | *Snips* [[Coucke *et al.*, 2018](references.html#id53 + "Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David + Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut + Lavril, and others. Snips voice platform: an embedded spoken language understanding + system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190, + 2018.")] 数据集。|' - en: '| [`SPEECHCOMMANDS`](generated/torchaudio.datasets.SPEECHCOMMANDS.html#torchaudio.datasets.SPEECHCOMMANDS "torchaudio.datasets.SPEECHCOMMANDS") | *Speech Commands* [[Warden, 2018](references.html#id39 "P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. ArXiv e-prints, April 2018\. URL: https://arxiv.org/abs/1804.03209, arXiv:1804.03209.")] dataset. |' + id: totrans-20 prefs: [] type: TYPE_TB + zh: '| [`SPEECHCOMMANDS`](generated/torchaudio.datasets.SPEECHCOMMANDS.html#torchaudio.datasets.SPEECHCOMMANDS + "torchaudio.datasets.SPEECHCOMMANDS") | *Speech Commands* [[Warden, 2018](references.html#id39 + "P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. + ArXiv e-prints, April 2018\. URL: https://arxiv.org/abs/1804.03209, arXiv:1804.03209.")] + 数据集。 |' - en: '| [`TEDLIUM`](generated/torchaudio.datasets.TEDLIUM.html#torchaudio.datasets.TEDLIUM "torchaudio.datasets.TEDLIUM") | *Tedlium* [[Rousseau *et al.*, 2012](references.html#id40 "Anthony Rousseau, Paul Deléglise, and Yannick Estève. Ted-lium: an automatic speech recognition dedicated corpus. In Conference on Language Resources and Evaluation (LREC), 125–129\. 2012.")] dataset (releases 1,2 and 3). |' + id: totrans-21 prefs: [] type: TYPE_TB + zh: '| [`TEDLIUM`](generated/torchaudio.datasets.TEDLIUM.html#torchaudio.datasets.TEDLIUM + "torchaudio.datasets.TEDLIUM") | *Tedlium* [[Rousseau *et al.*, 2012](references.html#id40 + "Anthony Rousseau, Paul Deléglise, and Yannick Estève. Ted-lium: an automatic + speech recognition dedicated corpus. In Conference on Language Resources and Evaluation + (LREC), 125–129\. 2012.")] 数据集(版本1、2和3)。 |' - en: '| [`VCTK_092`](generated/torchaudio.datasets.VCTK_092.html#torchaudio.datasets.VCTK_092 "torchaudio.datasets.VCTK_092") | *VCTK 0.92* [[Yamagishi *et al.*, 2019](references.html#id41 "Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. CSTR VCTK Corpus: english multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019\. doi:10.7488/ds/2645.")] dataset |' + id: totrans-22 prefs: [] type: TYPE_TB + zh: '| [`VCTK_092`](generated/torchaudio.datasets.VCTK_092.html#torchaudio.datasets.VCTK_092 + "torchaudio.datasets.VCTK_092") | *VCTK 0.92* [[Yamagishi *et al.*, 2019](references.html#id41 + "Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. 
CSTR VCTK Corpus: + english multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019\. + doi:10.7488/ds/2645.")] 数据集 |' - en: '| [`VoxCeleb1Identification`](generated/torchaudio.datasets.VoxCeleb1Identification.html#torchaudio.datasets.VoxCeleb1Identification "torchaudio.datasets.VoxCeleb1Identification") | *VoxCeleb1* [[Nagrani *et al.*, 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.")] dataset for speaker identification task. |' + id: totrans-23 prefs: [] type: TYPE_TB + zh: '| [`VoxCeleb1Identification`](generated/torchaudio.datasets.VoxCeleb1Identification.html#torchaudio.datasets.VoxCeleb1Identification + "torchaudio.datasets.VoxCeleb1Identification") | *VoxCeleb1* [[Nagrani *et al.*, + 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. + Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, + 2017.")] 用于说话人识别任务的数据集。 |' - en: '| [`VoxCeleb1Verification`](generated/torchaudio.datasets.VoxCeleb1Verification.html#torchaudio.datasets.VoxCeleb1Verification "torchaudio.datasets.VoxCeleb1Verification") | *VoxCeleb1* [[Nagrani *et al.*, 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.")] dataset for speaker verification task. |' + id: totrans-24 prefs: [] type: TYPE_TB + zh: '| [`VoxCeleb1Verification`](generated/torchaudio.datasets.VoxCeleb1Verification.html#torchaudio.datasets.VoxCeleb1Verification + "torchaudio.datasets.VoxCeleb1Verification") | *VoxCeleb1* [[Nagrani *et al.*, + 2017](references.html#id49 "Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. + Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, + 2017.")] 用于说话人验证任务的数据集。 |' - en: '| [`YESNO`](generated/torchaudio.datasets.YESNO.html#torchaudio.datasets.YESNO "torchaudio.datasets.YESNO") | *YesNo* [[*YesNo*, n.d.](references.html#id46 "Yesno. URL: http://www.openslr.org/1/.")] dataset. |' + id: totrans-25 prefs: [] type: TYPE_TB + zh: '| [`YESNO`](generated/torchaudio.datasets.YESNO.html#torchaudio.datasets.YESNO + "torchaudio.datasets.YESNO") | *YesNo* [[*YesNo*, n.d.](references.html#id46 "Yesno. + URL: http://www.openslr.org/1/.")] 数据集。 |' diff --git a/totrans/aud22_53.yaml b/totrans/aud22_53.yaml index 62b27a5f048d607951ff1910ddb149b6cd17c8a9..c7abdc47c9be716d629f2754653243fdc6af89f1 100644 --- a/totrans/aud22_53.yaml +++ b/totrans/aud22_53.yaml @@ -1,30 +1,45 @@ - en: torchaudio.models + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.models - en: 原文:[https://pytorch.org/audio/stable/models.html](https://pytorch.org/audio/stable/models.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/models.html](https://pytorch.org/audio/stable/models.html) - en: The `torchaudio.models` subpackage contains definitions of models for addressing common audio tasks. + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '`torchaudio.models`子包含有用于处理常见音频任务的模型定义。' - en: Note + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: For models with pre-trained parameters, please refer to [`torchaudio.pipelines`](pipelines.html#module-torchaudio.pipelines "torchaudio.pipelines") module. 
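For instance, a model definition can be constructed directly and run on dummy input. The following is a minimal sketch (not part of the translated source; it uses only the public `torchaudio.models` API) with randomly initialized weights:

```python
import torch
import torchaudio

# Minimal sketch: a model definition is just an nn.Module whose weights are
# randomly initialized here. Pre-trained weights come from
# torchaudio.pipelines, as the note above says.
model = torchaudio.models.Wav2Letter(num_classes=40)

waveform = torch.randn(1, 1, 16000)  # dummy (batch, channel, time) input
emission = model(waveform)           # (batch, num_classes, frames)
print(emission.shape)
```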
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 对于具有预训练参数的模型,请参考[`torchaudio.pipelines`](pipelines.html#module-torchaudio.pipelines
  "torchaudio.pipelines")模块。
- en: Model definitions are responsible for constructing computation graphs and executing
    them.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 模型定义负责构建计算图并执行它们。
- en: Some models have complex structure and variations. For such models, factory
    functions are provided.
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 一些模型具有复杂的结构和变体。对于这样的模型,提供了工厂函数。
- en: '| [`Conformer`](generated/torchaudio.models.Conformer.html#torchaudio.models.Conformer
    "torchaudio.models.Conformer") | Conformer architecture introduced in *Conformer:
    Convolution-augmented Transformer for Speech Recognition* [[Gulati *et al.*, 2020](references.html#id21
@@ -32,8 +47,16 @@
    Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang. Conformer:
    convolution-augmented transformer for speech recognition. 2020\. arXiv:2005.08100.")].
    |'
+ id: totrans-7
  prefs: []
  type: TYPE_TB
+ zh: '| [`Conformer`](generated/torchaudio.models.Conformer.html#torchaudio.models.Conformer
  "torchaudio.models.Conformer") | Conformer架构介绍在*Conformer: Convolution-augmented
  Transformer for Speech Recognition* [[Gulati *et al.*, 2020](references.html#id21
  "Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu,
  Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang. Conformer:
  convolution-augmented transformer for speech recognition. 2020\. arXiv:2005.08100.")].
  |'
- en: '| [`ConvTasNet`](generated/torchaudio.models.ConvTasNet.html#torchaudio.models.ConvTasNet
    "torchaudio.models.ConvTasNet") | Conv-TasNet architecture introduced in *Conv-TasNet:
    Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation* [[Luo
@@ -42,8 +65,16 @@
    Transactions on Audio, Speech, and Language Processing, 27(8):1256–1266, Aug
    2019\. URL: http://dx.doi.org/10.1109/TASLP.2019.2915167, doi:10.1109/taslp.2019.2915167.")].
    |'
+ id: totrans-8
  prefs: []
  type: TYPE_TB
+ zh: '| [`ConvTasNet`](generated/torchaudio.models.ConvTasNet.html#torchaudio.models.ConvTasNet
  "torchaudio.models.ConvTasNet") | Conv-TasNet架构介绍在*Conv-TasNet: Surpassing Ideal
  Time–Frequency Magnitude Masking for Speech Separation* [[Luo and Mesgarani, 2019](references.html#id22
  "Yi Luo and Nima Mesgarani. Conv-tasnet: surpassing ideal time–frequency magnitude
  masking for speech separation. IEEE/ACM Transactions on Audio, Speech, and Language
  Processing, 27(8):1256–1266, Aug 2019\. URL: http://dx.doi.org/10.1109/TASLP.2019.2915167,
  doi:10.1109/taslp.2019.2915167.")]. |'
- en: '| [`DeepSpeech`](generated/torchaudio.models.DeepSpeech.html#torchaudio.models.DeepSpeech
    "torchaudio.models.DeepSpeech") | DeepSpeech architecture introduced in *Deep
    Speech: Scaling up end-to-end speech recognition* [[Hannun *et al.*, 2014](references.html#id17
@@ -51,8 +82,15 @@
    Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng.
    Deep speech: scaling up end-to-end speech recognition. 2014\. arXiv:1412.5567.")].
    |'
+ id: totrans-9
  prefs: []
  type: TYPE_TB
+ zh: '| [`DeepSpeech`](generated/torchaudio.models.DeepSpeech.html#torchaudio.models.DeepSpeech
  "torchaudio.models.DeepSpeech") | DeepSpeech架构介绍在*Deep Speech: Scaling up end-to-end
  speech recognition* [[Hannun *et al.*, 2014](references.html#id17 "Awni Hannun,
  Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger,
  Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng.
Deep speech: + scaling up end-to-end speech recognition. 2014\. arXiv:1412.5567.")]. |' - en: '| [`Emformer`](generated/torchaudio.models.Emformer.html#torchaudio.models.Emformer "torchaudio.models.Emformer") | Emformer architecture introduced in *Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech @@ -61,43 +99,79 @@ Seltzer. Emformer: efficient memory transformer based acoustic model for low latency streaming speech recognition. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6783-6787\. 2021.")]. |' + id: totrans-10 prefs: [] type: TYPE_TB + zh: '| [`Emformer`](generated/torchaudio.models.Emformer.html#torchaudio.models.Emformer + "torchaudio.models.Emformer") | Emformer架构介绍在*Emformer: Efficient Memory Transformer + Based Acoustic Model for Low Latency Streaming Speech Recognition* [[Shi *et al.*, + 2021](references.html#id30 "Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng + Yeh, Julian Chan, Frank Zhang, Duc Le, and Mike Seltzer. Emformer: efficient memory + transformer based acoustic model for low latency streaming speech recognition. + In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal + Processing (ICASSP), 6783-6787\. 2021.")]. |' - en: '| [`HDemucs`](generated/torchaudio.models.HDemucs.html#torchaudio.models.HDemucs "torchaudio.models.HDemucs") | Hybrid Demucs model from *Hybrid Spectrogram and Waveform Source Separation* [[Défossez, 2021](references.html#id50 "Alexandre Défossez. Hybrid spectrogram and waveform source separation. In Proceedings of the ISMIR 2021 Workshop on Music Source Separation. 2021.")]. |' + id: totrans-11 prefs: [] type: TYPE_TB + zh: '| [`HDemucs`](generated/torchaudio.models.HDemucs.html#torchaudio.models.HDemucs + "torchaudio.models.HDemucs") | 来自*Hybrid Spectrogram and Waveform Source Separation*的混合Demucs模型[[Défossez, + 2021](references.html#id50 "Alexandre Défossez. Hybrid spectrogram and waveform + source separation. In Proceedings of the ISMIR 2021 Workshop on Music Source Separation. + 2021.")]. |' - en: '| [`HuBERTPretrainModel`](generated/torchaudio.models.HuBERTPretrainModel.html#torchaudio.models.HuBERTPretrainModel "torchaudio.models.HuBERTPretrainModel") | HuBERT model used for pretraining in *HuBERT* [[Hsu *et al.*, 2021](references.html#id16 "Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: self-supervised speech representation learning by masked prediction of hidden units. 2021\. arXiv:2106.07447.")]. |' + id: totrans-12 prefs: [] type: TYPE_TB + zh: '| [`HuBERTPretrainModel`](generated/torchaudio.models.HuBERTPretrainModel.html#torchaudio.models.HuBERTPretrainModel + "torchaudio.models.HuBERTPretrainModel") | HuBERT模型用于*HuBERT*中的预训练 [[Hsu *et al.*, + 2021](references.html#id16 "Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, + Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: self-supervised + speech representation learning by masked prediction of hidden units. 2021\. arXiv:2106.07447.")]. + |' - en: '| [`RNNT`](generated/torchaudio.models.RNNT.html#torchaudio.models.RNNT "torchaudio.models.RNNT") | Recurrent neural network transducer (RNN-T) model. 
|' + id: totrans-13 prefs: [] type: TYPE_TB + zh: '| [`RNNT`](generated/torchaudio.models.RNNT.html#torchaudio.models.RNNT "torchaudio.models.RNNT") + | 循环神经网络转录器(RNN-T)模型。 |' - en: '| [`RNNTBeamSearch`](generated/torchaudio.models.RNNTBeamSearch.html#torchaudio.models.RNNTBeamSearch "torchaudio.models.RNNTBeamSearch") | Beam search decoder for RNN-T model. |' + id: totrans-14 prefs: [] type: TYPE_TB + zh: '| [`RNNTBeamSearch`](generated/torchaudio.models.RNNTBeamSearch.html#torchaudio.models.RNNTBeamSearch + "torchaudio.models.RNNTBeamSearch") | RNN-T模型的束搜索解码器。 |' - en: '| [`SquimObjective`](generated/torchaudio.models.SquimObjective.html#torchaudio.models.SquimObjective "torchaudio.models.SquimObjective") | Speech Quality and Intelligibility Measures (SQUIM) model that predicts **objective** metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR). |' + id: totrans-15 prefs: [] type: TYPE_TB + zh: '| [`SquimObjective`](generated/torchaudio.models.SquimObjective.html#torchaudio.models.SquimObjective + "torchaudio.models.SquimObjective") | 预测语音增强的**客观**度量分数(例如,STOI、PESQ和SI-SDR)的语音质量和可懂度测量(SQUIM)模型。 + |' - en: '| [`SquimSubjective`](generated/torchaudio.models.SquimSubjective.html#torchaudio.models.SquimSubjective "torchaudio.models.SquimSubjective") | Speech Quality and Intelligibility Measures (SQUIM) model that predicts **subjective** metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)). |' + id: totrans-16 prefs: [] type: TYPE_TB + zh: '| [`SquimSubjective`](generated/torchaudio.models.SquimSubjective.html#torchaudio.models.SquimSubjective + "torchaudio.models.SquimSubjective") | 预测语音增强的**主观**度量分数(例如,平均意见分数(MOS))的语音质量和可懂度测量(SQUIM)模型。 + |' - en: '| [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2 "torchaudio.models.Tacotron2") | Tacotron2 model from *Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions* [[Shen *et al.*, 2018](references.html#id27 @@ -107,22 +181,39 @@ IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4779–4783\. IEEE, 2018.")] based on the implementation from [Nvidia Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples/). |' + id: totrans-17 prefs: [] type: TYPE_TB + zh: '| [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2 + "torchaudio.models.Tacotron2") | 基于《自然TTS合成:通过在Mel频谱图预测上对WaveNet进行条件化》[[Shen等,2018](references.html#id27 + "Jonathan Shen, Ruoming Pang, Ron J Weiss, Mike Schuster, Navdeep Jaitly, Zongheng + Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Rj Skerrv-Ryan等。通过在mel频谱图预测上对WaveNet进行条件化的自然TTS合成。在2018年IEEE国际声学、语音和信号处理会议(ICASSP)上,4779-4783。IEEE,2018。")] + 的Tacotron2模型,基于[Nvidia深度学习示例](https://github.com/NVIDIA/DeepLearningExamples/)的实现。 + |' - en: '| [`Wav2Letter`](generated/torchaudio.models.Wav2Letter.html#torchaudio.models.Wav2Letter "torchaudio.models.Wav2Letter") | Wav2Letter model architecture from *Wav2Letter: an End-to-End ConvNet-based Speech Recognition System* [[Collobert *et al.*, 2016](references.html#id19 "Ronan Collobert, Christian Puhrsch, and Gabriel Synnaeve. Wav2letter: an end-to-end convnet-based speech recognition system. 2016\. arXiv:1609.03193.")]. 
|' + id: totrans-18 prefs: [] type: TYPE_TB + zh: '| [`Wav2Letter`](generated/torchaudio.models.Wav2Letter.html#torchaudio.models.Wav2Letter + "torchaudio.models.Wav2Letter") | 来自《Wav2Letter:基于端到端ConvNet的语音识别系统》[[Collobert等,2016](references.html#id19 + "Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve。Wav2letter:基于端到端ConvNet的语音识别系统。2016。arXiv:1609.03193。")] + 的Wav2Letter模型架构。 |' - en: '| [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model "torchaudio.models.Wav2Vec2Model") | Acoustic model used in *wav2vec 2.0* [[Baevski *et al.*, 2020](references.html#id15 "Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. Wav2vec 2.0: a framework for self-supervised learning of speech representations. 2020\. arXiv:2006.11477.")]. |' + id: totrans-19 prefs: [] type: TYPE_TB + zh: '| [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model + "torchaudio.models.Wav2Vec2Model") | *wav2vec 2.0*中使用的声学模型[[Baevski等,2020](references.html#id15 + "Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli。Wav2vec 2.0:自监督学习语音表示的框架。2020。arXiv:2006.11477。")]。 + |' - en: '| [`WaveRNN`](generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN "torchaudio.models.WaveRNN") | WaveRNN model from *Efficient Neural Audio Synthesis* [[Kalchbrenner *et al.*, 2018](references.html#id3 "Nal Kalchbrenner, Erich Elsen, @@ -131,5 +222,12 @@ synthesis. CoRR, 2018\. URL: http://arxiv.org/abs/1802.08435, arXiv:1802.08435.")] based on the implementation from [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN). |' + id: totrans-20 prefs: [] type: TYPE_TB + zh: '| [`WaveRNN`](generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN + "torchaudio.models.WaveRNN") | 基于《高效神经音频合成》[[Kalchbrenner等,2018](references.html#id3 + "Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, + Edward Lockhart, Florian Stimberg, Aäron van den Oord, Sander Dieleman, Koray + Kavukcuoglu等。高效神经音频合成。CoRR,2018。URL:http://arxiv.org/abs/1802.08435,arXiv:1802.08435。")] + 的WaveRNN模型,基于[fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)的实现。 |' diff --git a/totrans/aud22_54.yaml b/totrans/aud22_54.yaml index 7b40baf7ceb78c73bf18be6d097a45409bfeef4e..099829f73bed6a2752300ffbfeae95f16bf20a6e 100644 --- a/totrans/aud22_54.yaml +++ b/totrans/aud22_54.yaml @@ -1,67 +1,113 @@ - en: torchaudio.models.decoder + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.models.decoder - en: 原文:[https://pytorch.org/audio/stable/models.decoder.html](https://pytorch.org/audio/stable/models.decoder.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/models.decoder.html](https://pytorch.org/audio/stable/models.decoder.html) - en: '## CTC Decoder[](#ctc-decoder "Permalink to this heading")' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '## CTC解码器[](#ctc-decoder "跳转到此标题")' - en: '| [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder "torchaudio.models.decoder.CTCDecoder") | CTC beam search decoder from *Flashlight* [[Kahn *et al.*, 2022](references.html#id35 "Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, and others. Flashlight: enabling innovation in tools for machine learning. arXiv preprint arXiv:2201.12465, 2022.")]. 
|' + id: totrans-3 prefs: [] type: TYPE_TB + zh: '| [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder + "torchaudio.models.decoder.CTCDecoder") | 来自 *Flashlight* 的CTC波束搜索解码器 [[Kahn *et + al.*, 2022](references.html#id35 "Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, + Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad + Avidov, and others. Flashlight: enabling innovation in tools for machine learning. + arXiv preprint arXiv:2201.12465, 2022.")]。 |' - en: '| [`ctc_decoder`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder "torchaudio.models.decoder.ctc_decoder") | Builds an instance of [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder "torchaudio.models.decoder.CTCDecoder"). |' + id: totrans-4 prefs: [] type: TYPE_TB + zh: '| [`ctc_decoder`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder + "torchaudio.models.decoder.ctc_decoder") | 构建 [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder + "torchaudio.models.decoder.CTCDecoder") 的实例。 |' - en: '| [`download_pretrained_files`](generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files "torchaudio.models.decoder.download_pretrained_files") | Retrieves pretrained data files used for [`ctc_decoder()`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder "torchaudio.models.decoder.ctc_decoder"). |' + id: totrans-5 prefs: [] type: TYPE_TB + zh: '| [`download_pretrained_files`](generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files + "torchaudio.models.decoder.download_pretrained_files") | 获取用于 [`ctc_decoder()`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder + "torchaudio.models.decoder.ctc_decoder") 的预训练数据文件。 |' - en: Tutorials using CTC Decoder + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 使用CTC解码器的教程 - en: '![ASR Inference with CTC Decoder](../Images/260e63239576cae8ee00cfcba8e4889e.png)' + id: totrans-7 prefs: [] type: TYPE_IMG + zh: '![使用CTC解码器的ASR推理](../Images/260e63239576cae8ee00cfcba8e4889e.png)' - en: '[ASR Inference with CTC Decoder](tutorials/asr_inference_with_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-ctc-decoder-tutorial-py)' + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: '[使用CTC解码器的ASR推理](tutorials/asr_inference_with_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-ctc-decoder-tutorial-py)' - en: ASR Inference with CTC Decoder + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: 使用CTC解码器的ASR推理 - en: CUDA CTC Decoder[](#cuda-ctc-decoder "Permalink to this heading") + id: totrans-10 prefs: - PREF_H2 type: TYPE_NORMAL + zh: CUDA CTC解码器[](#cuda-ctc-decoder "跳转到此标题") - en: '| [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder "torchaudio.models.decoder.CUCTCDecoder") | CUDA CTC beam search decoder. 
|'
+ id: totrans-11
  prefs: []
  type: TYPE_TB
+ zh: '| [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
  "torchaudio.models.decoder.CUCTCDecoder") | CUDA CTC波束搜索解码器。 |'
- en: '| [`cuda_ctc_decoder`](generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
    "torchaudio.models.decoder.cuda_ctc_decoder") | Builds an instance of [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
    "torchaudio.models.decoder.CUCTCDecoder"). |'
+ id: totrans-12
  prefs: []
  type: TYPE_TB
+ zh: '| [`cuda_ctc_decoder`](generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
  "torchaudio.models.decoder.cuda_ctc_decoder") | 构建 [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
  "torchaudio.models.decoder.CUCTCDecoder") 的实例。 |'
- en: Tutorials using CUDA CTC Decoder
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用CUDA CTC解码器的教程
- en: '![ASR Inference with CUDA CTC Decoder](../Images/9d0a043104707d980656cfaf03fdd1a1.png)'
+ id: totrans-14
  prefs: []
  type: TYPE_IMG
+ zh: '![使用CUDA CTC解码器的ASR推理](../Images/9d0a043104707d980656cfaf03fdd1a1.png)'
- en: '[ASR Inference with CUDA CTC Decoder](tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)'
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: '[使用CUDA CTC解码器的ASR推理](tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)'
- en: ASR Inference with CUDA CTC Decoder
+ id: totrans-16
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用CUDA CTC解码器的ASR推理
diff --git a/totrans/aud22_55.yaml b/totrans/aud22_55.yaml
index e190acf470efa5f410dcb919b16bfe7d5b320a4b..78d218eac5a89ca9de961afb8d00826c4b6a0d21 100644
--- a/totrans/aud22_55.yaml
+++ b/totrans/aud22_55.yaml
@@ -1,34 +1,48 @@
- en: torchaudio.pipelines
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.pipelines
- en: 原文:[https://pytorch.org/audio/stable/pipelines.html](https://pytorch.org/audio/stable/pipelines.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/pipelines.html](https://pytorch.org/audio/stable/pipelines.html)
- en: The `torchaudio.pipelines` module packages pre-trained models with support
    functions and meta-data into simple APIs tailored to perform specific tasks.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: '`torchaudio.pipelines`模块将预训练模型与支持函数和元数据打包成简单的API,以执行特定任务。'
- en: When using pre-trained models to perform a task, in addition to instantiating
    the model with pre-trained weights, the client code also needs to build pipelines
    for feature extraction and post-processing in the same way they were done during
    the training. This requires carrying over information used during the training,
    such as the type of transforms and their parameters (for example, sampling rate
    and the number of FFT bins).
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 当使用预训练模型执行任务时,除了使用预训练权重实例化模型外,客户端代码还需要以与训练期间相同的方式构建特征提取和后处理流水线。这需要将训练期间使用的信息传递过去,比如变换的类型和参数(例如,采样率和FFT频率数量)。
- en: To make this information tied to a pre-trained model and easily accessible,
    the `torchaudio.pipelines` module uses the concept of a Bundle class, which defines
    a set of APIs to instantiate pipelines, and the interface of the pipelines.
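As a concrete illustration of the Bundle idea, the following minimal sketch (not part of the translated source; it assumes network access to download the weights) drives the two source-separation bundles discussed in the figure and example below through the exact same interface:

```python
import torchaudio

# Minimal sketch of the Bundle idea: two source-separation bundles wrap
# different model types, yet the calling code is identical because they
# share one interface.
for bundle in (
    torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX,  # wraps ConvTasNet
    torchaudio.pipelines.HDEMUCS_HIGH_MUSDB,         # wraps HDemucs
):
    model = bundle.get_model()  # fetches and loads pre-trained weights
    print(bundle.sample_rate, type(model).__name__)
```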
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了将这些信息与预训练模型绑定并轻松访问,`torchaudio.pipelines`模块使用Bundle类的概念,该类定义了一组API来实例化流水线和流水线的接口。
- en: The following figure illustrates this.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下图示说明了这一点。
- en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-intro.png](../Images/7dc27a33a67f5b02c554368a2500bcb8.png)'
+ id: totrans-6
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-intro.png](../Images/7dc27a33a67f5b02c554368a2500bcb8.png)'
- en: A pre-trained model and associated pipelines are expressed as an instance of
    `Bundle`. Different instances of the same `Bundle` share the interface, but their
    implementations are not constrained to be of the same type. For example, [`SourceSeparationBundle`](generated/torchaudio.pipelines.SourceSeparationBundle.html#torchaudio.pipelines.SourceSeparationBundle
@@ -39,72 +53,117 @@
    "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB") instantiates a model of [`HDemucs`](generated/torchaudio.models.HDemucs.html#torchaudio.models.HDemucs
    "torchaudio.models.HDemucs"). Still, because they share the same interface, the
    usage is the same.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 预训练模型和相关流水线被表示为`Bundle`的实例。相同`Bundle`的不同实例共享接口,但它们的实现不受限于相同类型。例如,[`SourceSeparationBundle`](generated/torchaudio.pipelines.SourceSeparationBundle.html#torchaudio.pipelines.SourceSeparationBundle
  "torchaudio.pipelines.SourceSeparationBundle")定义了执行源分离的接口,但其实例[`CONVTASNET_BASE_LIBRI2MIX`](generated/torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX.html#torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX
  "torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX")实例化了一个[`ConvTasNet`](generated/torchaudio.models.ConvTasNet.html#torchaudio.models.ConvTasNet
  "torchaudio.models.ConvTasNet")模型,而[`HDEMUCS_HIGH_MUSDB`](generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB
  "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB")实例化了一个[`HDemucs`](generated/torchaudio.models.HDemucs.html#torchaudio.models.HDemucs
  "torchaudio.models.HDemucs")模型。尽管如此,因为它们共享相同的接口,使用方式是相同的。
- en: Note
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Under the hood, the implementations of `Bundle` use components from other `torchaudio`
    modules, such as [`torchaudio.models`](models.html#module-torchaudio.models "torchaudio.models")
    and [`torchaudio.transforms`](transforms.html#module-torchaudio.transforms "torchaudio.transforms"),
    or even third party libraries like [SentencePiece](https://github.com/google/sentencepiece)
    and [DeepPhonemizer](https://github.com/as-ideas/DeepPhonemizer). But this implementation
    detail is abstracted away from library users.
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 在底层,`Bundle`的实现使用了来自其他`torchaudio`模块的组件,比如[`torchaudio.models`](models.html#module-torchaudio.models
  "torchaudio.models")和[`torchaudio.transforms`](transforms.html#module-torchaudio.transforms
  "torchaudio.transforms"),甚至第三方库如[SentencePiece](https://github.com/google/sentencepiece)和[DeepPhonemizer](https://github.com/as-ideas/DeepPhonemizer)。但这些实现细节对库用户是抽象的。
- en: '## RNN-T Streaming/Non-Streaming ASR[](#rnn-t-streaming-non-streaming-asr "Permalink
  to this heading")'
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: '## RNN-T流式/非流式ASR[](#rnn-t-streaming-non-streaming-asr "跳转到此标题")'
- en: Interface[](#interface "Permalink to this heading")
+ id: totrans-11
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 接口[](#interface "跳转到此标题")
- en: '`RNNTBundle` defines ASR pipelines and consists of three steps: feature extraction,
    inference, and de-tokenization.'
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: '`RNNTBundle`定义了ASR流水线,包括三个步骤:特征提取、推理和去标记化。'
- en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-rnntbundle.png](../Images/d53f88ebd8f526f56982a4de4848dcaf.png)'
+ id: totrans-13
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-rnntbundle.png](../Images/d53f88ebd8f526f56982a4de4848dcaf.png)'
- en: '| [`RNNTBundle`](generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle
    "torchaudio.pipelines.RNNTBundle") | Dataclass that bundles components for performing
    automatic speech recognition (ASR, speech-to-text) inference with an RNN-T model.
    |'
+ id: totrans-14
  prefs: []
  type: TYPE_TB
+ zh: '| [`RNNTBundle`](generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle
  "torchaudio.pipelines.RNNTBundle") | 用于执行自动语音识别(ASR,语音转文本)推理的RNN-T模型的组件捆绑数据类。
  |'
- en: '| [`RNNTBundle.FeatureExtractor`](generated/torchaudio.pipelines.RNNTBundle.FeatureExtractor.html#torchaudio.pipelines.RNNTBundle.FeatureExtractor
    "torchaudio.pipelines.RNNTBundle.FeatureExtractor") | Interface of the feature
    extraction part of RNN-T pipeline |'
+ id: totrans-15
  prefs: []
  type: TYPE_TB
+ zh: '| [`RNNTBundle.FeatureExtractor`](generated/torchaudio.pipelines.RNNTBundle.FeatureExtractor.html#torchaudio.pipelines.RNNTBundle.FeatureExtractor
  "torchaudio.pipelines.RNNTBundle.FeatureExtractor") | RNN-T流水线中特征提取部分的接口 |'
- en: '| [`RNNTBundle.TokenProcessor`](generated/torchaudio.pipelines.RNNTBundle.TokenProcessor.html#torchaudio.pipelines.RNNTBundle.TokenProcessor
    "torchaudio.pipelines.RNNTBundle.TokenProcessor") | Interface of the token processor
    part of RNN-T pipeline |'
+ id: totrans-16
  prefs: []
  type: TYPE_TB
+ zh: '| [`RNNTBundle.TokenProcessor`](generated/torchaudio.pipelines.RNNTBundle.TokenProcessor.html#torchaudio.pipelines.RNNTBundle.TokenProcessor
  "torchaudio.pipelines.RNNTBundle.TokenProcessor") | RNN-T流水线中标记处理器部分的接口 |'
- en: Tutorials using `RNNTBundle`
+ id: totrans-17
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用`RNNTBundle`的教程
- en: '![Online ASR with Emformer RNN-T](../Images/200081d049505bef5c1ce8e3c321134d.png)'
+ id: totrans-18
  prefs: []
  type: TYPE_IMG
+ zh: '![在线 ASR 与 Emformer RNN-T](../Images/200081d049505bef5c1ce8e3c321134d.png)'
- en: '[Online ASR with Emformer RNN-T](tutorials/online_asr_tutorial.html#sphx-glr-tutorials-online-asr-tutorial-py)'
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: '[在线 ASR 与 Emformer
RNN-T](tutorials/online_asr_tutorial.html#sphx-glr-tutorials-online-asr-tutorial-py)'
- en: Online ASR with Emformer RNN-T![Device ASR with Emformer RNN-T](../Images/62ca7f96e6d3a3011aa85c2a9228f03f.png)
+ id: totrans-20
  prefs: []
  type: TYPE_NORMAL
+ zh: 在线 ASR 与 Emformer RNN-T![设备 ASR 与 Emformer RNN-T](../Images/62ca7f96e6d3a3011aa85c2a9228f03f.png)
- en: '[Device ASR with Emformer RNN-T](tutorials/device_asr.html#sphx-glr-tutorials-device-asr-py)'
+ id: totrans-21
  prefs: []
  type: TYPE_NORMAL
+ zh: '[设备 ASR 与 Emformer RNN-T](tutorials/device_asr.html#sphx-glr-tutorials-device-asr-py)'
- en: Device ASR with Emformer RNN-T
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 设备 ASR 与 Emformer RNN-T
- en: Pretrained Models[](#pretrained-models "Permalink to this heading")
+ id: totrans-23
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 预训练模型[](#pretrained-models "跳转到此标题")
- en: '| [`EMFORMER_RNNT_BASE_LIBRISPEECH`](generated/torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH.html#torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
    "torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH") | ASR pipeline based on
    Emformer-RNNT, pretrained on *LibriSpeech* dataset [[Panayotov *et al.*, 2015](references.html#id13
@@ -113,34 +172,56 @@
    on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\.
    doi:10.1109/ICASSP.2015.7178964.")], capable of performing both streaming and
    non-streaming inference. |'
+ id: totrans-24
  prefs: []
  type: TYPE_TB
+ zh: '| [`EMFORMER_RNNT_BASE_LIBRISPEECH`](generated/torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH.html#torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
  "torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH") | 基于 Emformer-RNNT 的 ASR
  流水线,在 *LibriSpeech* 数据集上进行预训练[[Panayotov *et al.*, 2015](references.html#id13
  "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech:
  an asr corpus based on public domain audio books. In 2015 IEEE International Conference
  on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\.
  doi:10.1109/ICASSP.2015.7178964.")], 能够执行流式和非流式推理。 |'
- en: wav2vec 2.0 / HuBERT / WavLM - SSL[](#wav2vec-2-0-hubert-wavlm-ssl "Permalink
    to this heading")
+ id: totrans-25
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: wav2vec 2.0 / HuBERT / WavLM - SSL[](#wav2vec-2-0-hubert-wavlm-ssl "跳转到此标题")
- en: Interface[](#id2 "Permalink to this heading")
+ id: totrans-26
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 接口[](#id2 "跳转到此标题")
- en: '`Wav2Vec2Bundle` instantiates models that generate acoustic features that
    can be used for downstream inference and fine-tuning.'
+ id: totrans-27
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Wav2Vec2Bundle` 实例化生成声学特征的模型,可用于下游推理和微调。'
- en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2bundle.png](../Images/7a92fa41c1718aa05693226b9462514d.png)'
+ id: totrans-28
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2bundle.png](../Images/7a92fa41c1718aa05693226b9462514d.png)'
- en: '| [`Wav2Vec2Bundle`](generated/torchaudio.pipelines.Wav2Vec2Bundle.html#torchaudio.pipelines.Wav2Vec2Bundle
    "torchaudio.pipelines.Wav2Vec2Bundle") | Data class that bundles associated information
    to use pretrained [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model
    "torchaudio.models.Wav2Vec2Model").
|' + id: totrans-29 prefs: [] type: TYPE_TB + zh: '| [`Wav2Vec2Bundle`](generated/torchaudio.pipelines.Wav2Vec2Bundle.html#torchaudio.pipelines.Wav2Vec2Bundle + "torchaudio.pipelines.Wav2Vec2Bundle") | 数据类,捆绑相关信息以使用预训练的 [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model + "torchaudio.models.Wav2Vec2Model")。 |' - en: Pretrained Models[](#id3 "Permalink to this heading") + id: totrans-30 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 预训练模型[](#id3 "跳转到此标题") - en: '| [`WAV2VEC2_BASE`](generated/torchaudio.pipelines.WAV2VEC2_BASE.html#torchaudio.pipelines.WAV2VEC2_BASE "torchaudio.pipelines.WAV2VEC2_BASE") | Wav2vec 2.0 model ("base" architecture), pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [[Panayotov @@ -150,8 +231,17 @@ (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), not fine-tuned. |' + id: totrans-31 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_BASE`](generated/torchaudio.pipelines.WAV2VEC2_BASE.html#torchaudio.pipelines.WAV2VEC2_BASE + "torchaudio.pipelines.WAV2VEC2_BASE") | Wav2vec 2.0 模型(“基础”架构),在 *LibriSpeech* + 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 "Vassil + Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr + corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合),未进行微调。 |' - en: '| [`WAV2VEC2_LARGE`](generated/torchaudio.pipelines.WAV2VEC2_LARGE.html#torchaudio.pipelines.WAV2VEC2_LARGE "torchaudio.pipelines.WAV2VEC2_LARGE") | Wav2vec 2.0 model ("large" architecture), pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [[Panayotov @@ -161,8 +251,17 @@ (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), not fine-tuned. |' + id: totrans-32 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_LARGE`](generated/torchaudio.pipelines.WAV2VEC2_LARGE.html#torchaudio.pipelines.WAV2VEC2_LARGE + "torchaudio.pipelines.WAV2VEC2_LARGE") | Wav2vec 2.0 模型(“大”架构),在 *LibriSpeech* + 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 "Vassil + Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an asr + corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合),未进行微调。 |' - en: '| [`WAV2VEC2_LARGE_LV60K`](generated/torchaudio.pipelines.WAV2VEC2_LARGE_LV60K.html#torchaudio.pipelines.WAV2VEC2_LARGE_LV60K "torchaudio.pipelines.WAV2VEC2_LARGE_LV60K") | Wav2vec 2.0 model ("large-lv60k" architecture), pre-trained on 60,000 hours of unlabeled audio from *Libri-Light* @@ -173,8 +272,12 @@ - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")], not fine-tuned. 
|' + id: totrans-33 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_LARGE_LV60K`](generated/torchaudio.pipelines.WAV2VEC2_LARGE_LV60K.html#torchaudio.pipelines.WAV2VEC2_LARGE_LV60K + "torchaudio.pipelines.WAV2VEC2_LARGE_LV60K") | Wav2vec 2.0 模型(“large-lv60k”架构),在 + *Libri-Light* 数据集的 60,000 小时未标记音频上进行预训练,未进行微调。 |' - en: '| [`WAV2VEC2_XLSR53`](generated/torchaudio.pipelines.WAV2VEC2_XLSR53.html#torchaudio.pipelines.WAV2VEC2_XLSR53 "torchaudio.pipelines.WAV2VEC2_XLSR53") | Wav2vec 2.0 model ("base" architecture), pre-trained on 56,000 hours of unlabeled audio from multiple datasets ( *Multilingual @@ -189,8 +292,12 @@ Kate Knill, Anton Ragni, and Shakti Prasad Rath. Speech recognition and keyword spotting for low-resource languages: babel project research at cued. In SLTU. 2014.")]), not fine-tuned. |' + id: totrans-34 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_XLSR53`](generated/torchaudio.pipelines.WAV2VEC2_XLSR53.html#torchaudio.pipelines.WAV2VEC2_XLSR53 + "torchaudio.pipelines.WAV2VEC2_XLSR53") | Wav2vec 2.0 模型(“基础”架构),在多个数据集的 56,000 + 小时未标记音频上进行预训练(*多语言 LibriSpeech*,*CommonVoice* 和 *BABEL*),未进行微调。 |' - en: '| [`WAV2VEC2_XLSR_300M`](generated/torchaudio.pipelines.WAV2VEC2_XLSR_300M.html#torchaudio.pipelines.WAV2VEC2_XLSR_300M "torchaudio.pipelines.WAV2VEC2_XLSR_300M") | XLS-R model with 300 million parameters, pre-trained on 436,000 hours of unlabeled audio from multiple datasets ( *Multilingual @@ -213,8 +320,13 @@ corpus for representation learning, semi-supervised learning and interpretation. CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")]) in 128 languages, not fine-tuned. |' + id: totrans-35 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_XLSR_300M`](generated/torchaudio.pipelines.WAV2VEC2_XLSR_300M.html#torchaudio.pipelines.WAV2VEC2_XLSR_300M + "torchaudio.pipelines.WAV2VEC2_XLSR_300M") | XLS-R 模型,具有 3 亿个参数,在多个数据集的 436,000 + 小时未标记音频上进行预训练(*多语言 LibriSpeech*,*CommonVoice*,*VoxLingua107*,*BABEL* 和 *VoxPopuli*)涵盖 + 128 种语言,未进行微调。 |' - en: '| [`WAV2VEC2_XLSR_1B`](generated/torchaudio.pipelines.WAV2VEC2_XLSR_1B.html#torchaudio.pipelines.WAV2VEC2_XLSR_1B "torchaudio.pipelines.WAV2VEC2_XLSR_1B") | XLS-R model with 1 billion parameters, pre-trained on 436,000 hours of unlabeled audio from multiple datasets ( *Multilingual @@ -237,8 +349,13 @@ corpus for representation learning, semi-supervised learning and interpretation. CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")]) in 128 languages, not fine-tuned. |' + id: totrans-36 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_XLSR_1B`](generated/torchaudio.pipelines.WAV2VEC2_XLSR_1B.html#torchaudio.pipelines.WAV2VEC2_XLSR_1B + "torchaudio.pipelines.WAV2VEC2_XLSR_1B") | XLS-R 模型,具有 10 亿个参数,在多个数据集的 436,000 + 小时未标记音频上进行了预训练(*多语言 LibriSpeech*,*CommonVoice*,*VoxLingua107*,*BABEL* 和 *VoxPopuli*)共 + 128 种语言,未进行微调。|' - en: '| [`WAV2VEC2_XLSR_2B`](generated/torchaudio.pipelines.WAV2VEC2_XLSR_2B.html#torchaudio.pipelines.WAV2VEC2_XLSR_2B "torchaudio.pipelines.WAV2VEC2_XLSR_2B") | XLS-R model with 2 billion parameters, pre-trained on 436,000 hours of unlabeled audio from multiple datasets ( *Multilingual @@ -261,8 +378,13 @@ corpus for representation learning, semi-supervised learning and interpretation. CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")]) in 128 languages, not fine-tuned. 
|' + id: totrans-37 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_XLSR_2B`](generated/torchaudio.pipelines.WAV2VEC2_XLSR_2B.html#torchaudio.pipelines.WAV2VEC2_XLSR_2B + "torchaudio.pipelines.WAV2VEC2_XLSR_2B") | XLS-R 模型,具有 20 亿个参数,在多个数据集的 436,000 + 小时未标记音频上进行了预训练(*多语言 LibriSpeech*,*CommonVoice*,*VoxLingua107*,*BABEL* 和 *VoxPopuli*)共 + 128 种语言,未进行微调。|' - en: '| [`HUBERT_BASE`](generated/torchaudio.pipelines.HUBERT_BASE.html#torchaudio.pipelines.HUBERT_BASE "torchaudio.pipelines.HUBERT_BASE") | HuBERT model ("base" architecture), pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [[Panayotov *et al.*, @@ -272,8 +394,15 @@ volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), not fine-tuned. |' + id: totrans-38 prefs: [] type: TYPE_TB + zh: '| [`HUBERT_BASE`](generated/torchaudio.pipelines.HUBERT_BASE.html#torchaudio.pipelines.HUBERT_BASE + "torchaudio.pipelines.HUBERT_BASE") | HuBERT模型(“基础”架构),在*LibriSpeech*数据集的960小时未标记音频上进行预训练[[Panayotov等人,2015年](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210. 2015. + doi:10.1109/ICASSP.2015.7178964.")](包括“train-clean-100”,“train-clean-360”和“train-other-500”),未进行微调。|' - en: '| [`HUBERT_LARGE`](generated/torchaudio.pipelines.HUBERT_LARGE.html#torchaudio.pipelines.HUBERT_LARGE "torchaudio.pipelines.HUBERT_LARGE") | HuBERT model ("large" architecture), pre-trained on 60,000 hours of unlabeled audio from *Libri-Light* dataset [[Kahn *et al.*, @@ -283,8 +412,16 @@ asr with limited or no supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")], not fine-tuned. |' + id: totrans-39 prefs: [] type: TYPE_TB + zh: '| [`HUBERT_LARGE`](generated/torchaudio.pipelines.HUBERT_LARGE.html#torchaudio.pipelines.HUBERT_LARGE + "torchaudio.pipelines.HUBERT_LARGE") | HuBERT模型(“大”架构),在*Libri-Light*数据集的60,000小时未标记音频上进行预训练[[Kahn等人,2020年](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673. 2020. \url https://github.com/facebookresearch/libri-light.")],未进行微调。|' - en: '| [`HUBERT_XLARGE`](generated/torchaudio.pipelines.HUBERT_XLARGE.html#torchaudio.pipelines.HUBERT_XLARGE "torchaudio.pipelines.HUBERT_XLARGE") | HuBERT model ("extra large" architecture), pre-trained on 60,000 hours of unlabeled audio from *Libri-Light* dataset [[Kahn @@ -295,8 +432,16 @@ International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")], not fine-tuned. |' + id: totrans-40 prefs: [] type: TYPE_TB + zh: '| [`HUBERT_XLARGE`](generated/torchaudio.pipelines.HUBERT_XLARGE.html#torchaudio.pipelines.HUBERT_XLARGE + "torchaudio.pipelines.HUBERT_XLARGE") | HuBERT模型(“超大”架构),在*Libri-Light*数据集的60,000小时未标记音频上进行预训练[[Kahn等人,2020年](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. 
Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673. 2020. \url https://github.com/facebookresearch/libri-light.")],未进行微调。|' - en: '| [`WAVLM_BASE`](generated/torchaudio.pipelines.WAVLM_BASE.html#torchaudio.pipelines.WAVLM_BASE "torchaudio.pipelines.WAVLM_BASE") | WavLM Base model ("base" architecture), pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [[Panayotov *et al.*, @@ -305,8 +450,15 @@ IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")], not fine-tuned. |' + id: totrans-41 prefs: [] type: TYPE_TB + zh: '| [`WAVLM_BASE`](generated/torchaudio.pipelines.WAVLM_BASE.html#torchaudio.pipelines.WAVLM_BASE + "torchaudio.pipelines.WAVLM_BASE") | WavLM基础模型(“基础”架构),在*LibriSpeech*数据集的960小时未标记音频上进行预训练[[Panayotov等人,2015年](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210. 2015. + doi:10.1109/ICASSP.2015.7178964.")],未进行微调。|' - en: '| [`WAVLM_BASE_PLUS`](generated/torchaudio.pipelines.WAVLM_BASE_PLUS.html#torchaudio.pipelines.WAVLM_BASE_PLUS "torchaudio.pipelines.WAVLM_BASE_PLUS") | WavLM Base+ model ("base" architecture), pre-trained on 60,000 hours of Libri-Light dataset [[Kahn *et al.*, 2020](references.html#id12 @@ -327,8 +479,28 @@ corpus for representation learning, semi-supervised learning and interpretation. CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")], not fine-tuned. |' + id: totrans-42 prefs: [] type: TYPE_TB + zh: '| [`WAVLM_BASE_PLUS`](generated/torchaudio.pipelines.WAVLM_BASE_PLUS.html#torchaudio.pipelines.WAVLM_BASE_PLUS + "torchaudio.pipelines.WAVLM_BASE_PLUS") | WavLM 基础+ 模型("base" 架构),在 60,000 小时的 + Libri-Light 数据集上进行了预训练[[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, M. + Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, + R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, + and E. Dupoux. Libri-light: a benchmark for asr with limited or no supervision. + In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal + Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")],10,000 + 小时的 GigaSpeech[[Chen *et al.*, 2021](references.html#id56 "Guoguo Chen, Shuzhou + Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, + Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang + Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, and + Zhiyong Yan. Gigaspeech: an evolving, multi-domain asr corpus with 10,000 hours + of transcribed audio. In Proc. Interspeech 2021\. 2021.")],以及 24,000 小时的 *VoxPopuli*[[Wang + *et al.*, 2021](references.html#id5 "Changhan Wang, Morgane Rivière, Ann Lee, + Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Miguel Pino, + and Emmanuel Dupoux. Voxpopuli: A large-scale multilingual speech corpus for representation + learning, semi-supervised learning and interpretation. CoRR, 2021\. 
URL: https://arxiv.org/abs/2101.00390,
+ arXiv:2101.00390.")],未进行微调。|'
- en: '| [`WAVLM_LARGE`](generated/torchaudio.pipelines.WAVLM_LARGE.html#torchaudio.pipelines.WAVLM_LARGE
  "torchaudio.pipelines.WAVLM_LARGE") | WavLM Large model ("large" architecture),
  pre-trained on 60,000 hours of Libri-Light dataset [[Kahn *et al.*, 2020](references.html#id12
  @@ -349,58 +521,109 @@ corpus for representation learning, semi-supervised learning and interpretation.
  CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")], not
  fine-tuned. |'
+ id: totrans-43
  prefs: []
  type: TYPE_TB
-- en: wav2vec 2.0 / HuBERT - Fine-tuned ASR[](#wav2vec-2-0-hubert-fine-tuned-asr
-  "Permalink to this heading")
+ zh: '| [`WAVLM_LARGE`](generated/torchaudio.pipelines.WAVLM_LARGE.html#torchaudio.pipelines.WAVLM_LARGE
+ "torchaudio.pipelines.WAVLM_LARGE") | WavLM 大型模型("large" 架构),在 60,000 小时的 Libri-Light
+ 数据集上进行了预训练[[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, M. Rivière, W.
+ Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, R. Collobert,
+ C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, and E. Dupoux.
+ Libri-light: a benchmark for asr with limited or no supervision. In ICASSP 2020
+ - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing
+ (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")],10,000
+ 小时的 GigaSpeech[[Chen *et al.*, 2021](references.html#id56 "Guoguo Chen, Shuzhou
+ Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey,
+ Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang
+ Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, and
+ Zhiyong Yan. Gigaspeech: an evolving, multi-domain asr corpus with 10,000 hours
+ of transcribed audio. In Proc. Interspeech 2021\. 2021.")],以及 24,000 小时的 *VoxPopuli*[[Wang
+ *et al.*, 2021](references.html#id5 "Changhan Wang, Morgane Rivière, Ann Lee,
+ Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Miguel Pino,
+ and Emmanuel Dupoux. Voxpopuli: A large-scale multilingual speech corpus for representation
+ learning, semi-supervised learning and interpretation. CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390,
+ arXiv:2101.00390.")],未进行微调。|'
+- en: wav2vec 2.0 / HuBERT - Fine-tuned ASR[](#wav2vec-2-0-hubert-fine-tuned-asr "Permalink
+ to this heading")
+ id: totrans-44
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: wav2vec 2.0 / HuBERT - 微调 ASR[](#wav2vec-2-0-hubert-fine-tuned-asr "Permalink
+ to this heading")
- en: Interface[](#id35 "Permalink to this heading")
+ id: totrans-45
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 接口[](#id35 "Permalink to this heading")
- en: '`Wav2Vec2ASRBundle` instantiates models that generate a probability distribution
  over pre-defined labels, which can be used for ASR.'
+ id: totrans-46
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Wav2Vec2ASRBundle` 实例化了生成预定义标签上的概率分布的模型,可用于 ASR。'
- en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2asrbundle.png](../Images/5f9b45dac675bb2cb840209162a85158.png)'
+ id: totrans-47
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2asrbundle.png](../Images/5f9b45dac675bb2cb840209162a85158.png)'
- en: '| [`Wav2Vec2ASRBundle`](generated/torchaudio.pipelines.Wav2Vec2ASRBundle.html#torchaudio.pipelines.Wav2Vec2ASRBundle
  "torchaudio.pipelines.Wav2Vec2ASRBundle") | Data class that bundles associated
  information to use pretrained [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model
  "torchaudio.models.Wav2Vec2Model"). |'
+ id: totrans-48
  prefs: []
  type: TYPE_TB
+ zh: '| [`Wav2Vec2ASRBundle`](generated/torchaudio.pipelines.Wav2Vec2ASRBundle.html#torchaudio.pipelines.Wav2Vec2ASRBundle
+ "torchaudio.pipelines.Wav2Vec2ASRBundle") | 数据类,捆绑了与预训练的 [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model
+ "torchaudio.models.Wav2Vec2Model") 相关的信息。 |'
- en: Tutorials using `Wav2Vec2ASRBundle`
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用 `Wav2Vec2ASRBundle` 的教程
- en: '![Speech Recognition with Wav2Vec2](../Images/a6aefab61852740b8a11d3cfd1ac6866.png)'
+ id: totrans-50
  prefs: []
  type: TYPE_IMG
+ zh: '![使用 Wav2Vec2 进行语音识别](../Images/a6aefab61852740b8a11d3cfd1ac6866.png)'
- en: '[Speech Recognition with Wav2Vec2](tutorials/speech_recognition_pipeline_tutorial.html#sphx-glr-tutorials-speech-recognition-pipeline-tutorial-py)'
+ id: totrans-51
  prefs: []
  type: TYPE_NORMAL
+ zh: '[使用 Wav2Vec2 进行语音识别](tutorials/speech_recognition_pipeline_tutorial.html#sphx-glr-tutorials-speech-recognition-pipeline-tutorial-py)'
- en: Speech Recognition with Wav2Vec2![ASR Inference with CTC Decoder](../Images/260e63239576cae8ee00cfcba8e4889e.png)
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用 Wav2Vec2 进行语音识别![CTC 解码器进行 ASR 推断](../Images/260e63239576cae8ee00cfcba8e4889e.png)
- en: '[ASR Inference with CTC Decoder](tutorials/asr_inference_with_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-ctc-decoder-tutorial-py)'
+ id: totrans-53
  prefs: []
  type: TYPE_NORMAL
+ zh: '[CTC 解码器进行 ASR 推断](tutorials/asr_inference_with_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-ctc-decoder-tutorial-py)'
- en: ASR Inference with CTC Decoder![Forced Alignment with Wav2Vec2](../Images/6658c9fe256ea584e84432cc92cd4db9.png)
+ id: totrans-54
  prefs: []
  type: TYPE_NORMAL
+ zh: CTC 解码器进行 ASR 推断![使用 Wav2Vec2 进行强制对齐](../Images/6658c9fe256ea584e84432cc92cd4db9.png)
- en: '[Forced Alignment with Wav2Vec2](tutorials/forced_alignment_tutorial.html#sphx-glr-tutorials-forced-alignment-tutorial-py)'
+ id: totrans-55
  prefs: []
  type: TYPE_NORMAL
+ zh: '[使用 Wav2Vec2 进行强制对齐](tutorials/forced_alignment_tutorial.html#sphx-glr-tutorials-forced-alignment-tutorial-py)'
- en: Forced Alignment with Wav2Vec2
+ id: totrans-56
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用 Wav2Vec2 进行强制对齐
- en: Pretrained Models[](#id36 "Permalink to this heading")
+ id: totrans-57
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 预训练模型[](#id36 "跳转到此标题的永久链接")
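+- en: 'The bundles in the table below all expose the same interface. As a minimal
+   sketch (the file name "speech.wav", the mono input, and the simple greedy CTC
+   decoding are illustrative assumptions, not requirements of the API), a bundle
+   can be used as follows:'
+ prefs: []
+ type: TYPE_NORMAL
+- en: |
+   import torch
+   import torchaudio
+
+   bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
+   model = bundle.get_model()
+
+   # "speech.wav" is a placeholder; resample to the rate the bundle expects.
+   waveform, sample_rate = torchaudio.load("speech.wav")
+   if sample_rate != bundle.sample_rate:
+       waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)
+
+   with torch.inference_mode():
+       emission, _ = model(waveform)  # (batch, frames, num_labels)
+
+   # Greedy CTC decoding: argmax per frame, collapse repeats, drop the blank "-".
+   indices = torch.unique_consecutive(torch.argmax(emission[0], dim=-1))
+   labels = bundle.get_labels()  # "|" marks word boundaries
+   transcript = "".join(labels[i] for i in indices.tolist() if labels[i] != "-").replace("|", " ")
+ prefs: []
+ type: TYPE_PRE
- en: '| [`WAV2VEC2_ASR_BASE_10M`](generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M
  "torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M") | Wav2vec 2.0 model ("base" architecture
  with an extra linear module), pre-trained on 960 hours of unlabeled audio from @@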
-417,8 +640,23 @@ In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")] ("train-10min" subset). |' + id: totrans-58 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_BASE_10M`](generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M + "torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M") | Wav2vec 2.0 模型(带有额外线性模块的“基础”架构),在 + *LibriSpeech* 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")](由 "train-clean-100"、"train-clean-360" 和 "train-other-500" + 组成),并在 *Libri-Light* 数据集的 10 分钟转录音频上进行了 ASR 微调[[Kahn *et al.*, 2020](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")]("train-10min" + 子集)。 |' - en: '| [`WAV2VEC2_ASR_BASE_100H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_100H "torchaudio.pipelines.WAV2VEC2_ASR_BASE_100H") | Wav2vec 2.0 model ("base" architecture with an extra linear module), pre-trained on 960 hours of unlabeled audio from @@ -429,8 +667,17 @@ doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and fine-tuned for ASR on 100 hours of transcribed audio from "train-clean-100" subset. |' + id: totrans-59 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_BASE_100H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_100H + "torchaudio.pipelines.WAV2VEC2_ASR_BASE_100H") | Wav2vec 2.0 模型(带有额外线性模块的“基础”架构),在 + *LibriSpeech* 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")](由 "train-clean-100"、"train-clean-360" 和 "train-other-500" + 组成),并在 "train-clean-100" 子集的 100 小时转录音频上进行了 ASR 微调。 |' - en: '| [`WAV2VEC2_ASR_BASE_960H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H "torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H") | Wav2vec 2.0 model ("base" architecture with an extra linear module), pre-trained on 960 hours of unlabeled audio from @@ -441,8 +688,17 @@ doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and fine-tuned for ASR on the same audio with the corresponding transcripts. 
|' + id: totrans-60 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_BASE_960H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H + "torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H") | Wav2vec 2.0 模型("base" 架构,带有额外的线性模块),在 + *LibriSpeech* 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合),并在相同音频上与相应的转录进行了 ASR 微调。 |' - en: '| [`WAV2VEC2_ASR_LARGE_10M`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_10M "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_10M") | Wav2vec 2.0 model ("large" architecture with an extra linear module), pre-trained on 960 hours of unlabeled audio from @@ -459,8 +715,23 @@ In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")] ("train-10min" subset). |' + id: totrans-61 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_LARGE_10M`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_10M + "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_10M") | Wav2vec 2.0 模型("large" 架构,带有额外的线性模块),在 + *LibriSpeech* 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合),并在 *Libri-Light* 数据集的 10 分钟转录音频上进行了 ASR 微调[[Kahn *et al.*, 2020](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")]("train-10min" + 子集)。 |' - en: '| [`WAV2VEC2_ASR_LARGE_100H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_100H "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_100H") | Wav2vec 2.0 model ("large" architecture with an extra linear module), pre-trained on 960 hours of unlabeled audio from @@ -471,8 +742,17 @@ doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and fine-tuned for ASR on 100 hours of transcribed audio from the same dataset ("train-clean-100" subset). |' + id: totrans-62 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_LARGE_100H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_100H + "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_100H") | Wav2vec 2.0 模型("large" 架构,带有额外的线性模块),在 + *LibriSpeech* 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 
Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合),并在相同数据集的 100 小时转录音频上进行了 ASR 微调("train-clean-100" 子集)。 |' - en: '| [`WAV2VEC2_ASR_LARGE_960H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_960H "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_960H") | Wav2vec 2.0 model ("large" architecture with an extra linear module), pre-trained on 960 hours of unlabeled audio from @@ -483,8 +763,17 @@ doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and fine-tuned for ASR on the same audio with the corresponding transcripts. |' + id: totrans-63 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_LARGE_960H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_960H + "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_960H") | Wav2vec 2.0 模型("large" 架构,带有额外的线性模块),在 + *LibriSpeech* 数据集的 960 小时未标记音频上进行预训练[[Panayotov *et al.*, 2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合),并在相同音频上与相应的转录进行了 ASR 微调。 |' - en: '| [`WAV2VEC2_ASR_LARGE_LV60K_10M`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M") | Wav2vec 2.0 model ("large-lv60k" architecture with an extra linear module), pre-trained on 60,000 hours of unlabeled @@ -496,8 +785,18 @@ Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")], and fine-tuned for ASR on 10 minutes of transcribed audio from the same dataset ("train-10min" subset). |' + id: totrans-64 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_LARGE_LV60K_10M`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M + "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M") | Wav2vec 2.0 模型("large-lv60k" + 架构,带有额外的线性模块),在 *Libri-Light* 数据集的 60,000 小时未标记音频上进行预训练[[Kahn 等人,2020](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")],并在相同数据集的经过转录的音频上进行了 + ASR 的微调("train-10min" 子集)。 |' - en: '| [`WAV2VEC2_ASR_LARGE_LV60K_100H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H") | Wav2vec 2.0 model ("large-lv60k" architecture with an extra linear module), pre-trained on 60,000 hours of unlabeled @@ -513,8 +812,22 @@ domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 
2015\. doi:10.1109/ICASSP.2015.7178964.")] ("train-clean-100" subset). |' + id: totrans-65 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_LARGE_LV60K_100H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H + "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H") | Wav2vec 2.0 模型("large-lv60k" + 架构,带有额外的线性模块),在 *Libri-Light* 数据集的 60,000 小时未标记音频上进行预训练[[Kahn 等人,2020](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")],并在 + *LibriSpeech* 数据集的经过转录的音频上进行了 ASR 的微调,微调时长为 100 小时[[Panayotov 等人,2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100" 子集)。 |' - en: '| [`WAV2VEC2_ASR_LARGE_LV60K_960H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H") | Wav2vec 2.0 model ("large-lv60k" architecture with an extra linear module), pre-trained on 60,000 hours of unlabeled @@ -531,8 +844,23 @@ Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"). |' + id: totrans-66 prefs: [] type: TYPE_TB + zh: '| [`WAV2VEC2_ASR_LARGE_LV60K_960H`](generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H + "torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H") | Wav2vec 2.0 模型("large-lv60k" + 架构,带有额外的线性模块),在 *Libri-Light* 数据集的 60,000 小时未标记音频上进行预训练[[Kahn 等人,2020](references.html#id12 + "J. Kahn, M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, + V. Liptchinsky, R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, + A. Mohamed, and E. Dupoux. Libri-light: a benchmark for asr with limited or no + supervision. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, + Speech and Signal Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")] + 数据集,并在 *LibriSpeech* 数据集的经过转录的音频上进行了 ASR 的微调,微调时长为 960 小时[[Panayotov 等人,2015](references.html#id13 + "Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: + an asr corpus based on public domain audio books. In 2015 IEEE International Conference + on Acoustics, Speech and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. + doi:10.1109/ICASSP.2015.7178964.")]("train-clean-100"、"train-clean-360" 和 "train-other-500" + 的组合)。 |' - en: '| [`VOXPOPULI_ASR_BASE_10K_DE`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_DE.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_DE "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_DE") | wav2vec 2.0 model ("base" architecture), pre-trained on 10k hours of unlabeled audio from *VoxPopuli* dataset @@ -543,8 +871,16 @@ 2021\. 
URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")] ("10k" subset, consisting of 23 languages), and fine-tuned for ASR on 282 hours of transcribed audio from "de" subset. |' + id: totrans-67 prefs: [] type: TYPE_TB + zh: '| [`VOXPOPULI_ASR_BASE_10K_DE`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_DE.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_DE + "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_DE") | wav2vec 2.0 模型(“基础”架构),在 *VoxPopuli* + 数据集的 10k 小时未标记音频上进行预训练[[Wang 等人,2021](references.html#id5 "Changhan Wang, Morgane + Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, + Juan Miguel Pino, 和 Emmanuel Dupoux. Voxpopuli: 用于表示学习、半监督学习和解释的大规模多语言语音语料库。CoRR,2021。URL: + https://arxiv.org/abs/2101.00390, arXiv:2101.00390。")](由 23 种语言组成的“10k”子集),并在来自“de”子集的 + 282 小时转录音频上进行了 ASR 微调。|' - en: '| [`VOXPOPULI_ASR_BASE_10K_EN`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN") | wav2vec 2.0 model ("base" architecture), pre-trained on 10k hours of unlabeled audio from *VoxPopuli* dataset @@ -555,8 +891,16 @@ 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")] ("10k" subset, consisting of 23 languages), and fine-tuned for ASR on 543 hours of transcribed audio from "en" subset. |' + id: totrans-68 prefs: [] type: TYPE_TB + zh: '| [`VOXPOPULI_ASR_BASE_10K_EN`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN + "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_EN") | wav2vec 2.0 模型(“基础”架构),在 *VoxPopuli* + 数据集的 10k 小时未标记音频上进行预训练[[Wang 等人,2021](references.html#id5 "Changhan Wang, Morgane + Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, + Juan Miguel Pino, 和 Emmanuel Dupoux. Voxpopuli: 用于表示学习、半监督学习和解释的大规模多语言语音语料库。CoRR,2021。URL: + https://arxiv.org/abs/2101.00390, arXiv:2101.00390。")](由 23 种语言组成的“10k”子集),并在来自“en”子集的 + 543 小时转录音频上进行了 ASR 微调。|' - en: '| [`VOXPOPULI_ASR_BASE_10K_ES`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES") | wav2vec 2.0 model ("base" architecture), pre-trained on 10k hours of unlabeled audio from *VoxPopuli* dataset @@ -567,8 +911,16 @@ 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")] ("10k" subset, consisting of 23 languages), and fine-tuned for ASR on 166 hours of transcribed audio from "es" subset. |' + id: totrans-69 prefs: [] type: TYPE_TB + zh: '| [`VOXPOPULI_ASR_BASE_10K_ES`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES + "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_ES") | wav2vec 2.0 模型(“基础”架构),在 *VoxPopuli* + 数据集的 10k 小时未标记音频上进行预训练[[Wang 等人,2021](references.html#id5 "Changhan Wang, Morgane + Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, + Juan Miguel Pino, 和 Emmanuel Dupoux. Voxpopuli: 用于表示学习、半监督学习和解释的大规模多语言语音语料库。CoRR,2021。URL: + https://arxiv.org/abs/2101.00390, arXiv:2101.00390。")](由 23 种语言组成的“10k”子集),并在来自“es”子集的 + 166 小时转录音频上进行了 ASR 微调。|' - en: '| [`VOXPOPULI_ASR_BASE_10K_FR`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_FR.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_FR "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_FR") | wav2vec 2.0 model ("base" architecture), pre-trained on 10k hours of unlabeled audio from *VoxPopuli* dataset @@ -579,8 +931,16 @@ 2021\. 
URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")] ("10k" subset, consisting of 23 languages), and fine-tuned for ASR on 211 hours of transcribed audio from "fr" subset. |' + id: totrans-70 prefs: [] type: TYPE_TB + zh: '| [`VOXPOPULI_ASR_BASE_10K_FR`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_FR.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_FR + "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_FR") | wav2vec 2.0 模型(“基础”架构),在 *VoxPopuli* + 数据集的 10k 小时未标记音频上进行预训练[[Wang 等人,2021](references.html#id5 "Changhan Wang, Morgane + Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, + Juan Miguel Pino, 和 Emmanuel Dupoux. Voxpopuli: 用于表示学习、半监督学习和解释的大规模多语言语音语料库。CoRR,2021。URL: + https://arxiv.org/abs/2101.00390, arXiv:2101.00390。")](由 23 种语言组成的“10k”子集),并在来自“fr”子集的 + 211 小时转录音频上进行了 ASR 微调。|' - en: '| [`VOXPOPULI_ASR_BASE_10K_IT`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_IT.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_IT "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_IT") | wav2vec 2.0 model ("base" architecture), pre-trained on 10k hours of unlabeled audio from *VoxPopuli* dataset @@ -591,8 +951,17 @@ 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")] ("10k" subset, consisting of 23 languages), and fine-tuned for ASR on 91 hours of transcribed audio from "it" subset. |' + id: totrans-71 prefs: [] type: TYPE_TB + zh: '| [`VOXPOPULI_ASR_BASE_10K_IT`](generated/torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_IT.html#torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_IT + "torchaudio.pipelines.VOXPOPULI_ASR_BASE_10K_IT") | wav2vec 2.0 模型(“base” 架构),在 + *VoxPopuli* 数据集的 10,000 小时未标记音频上进行预训练[[Wang *et al.*, 2021](references.html#id5 + "Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel + Haziza, Mary Williamson, Juan Miguel Pino, and Emmanuel Dupoux. Voxpopuli: A large-scale + multilingual speech corpus for representation learning, semi-supervised learning + and interpretation. CoRR, 2021\. URL: https://arxiv.org/abs/2101.00390, arXiv:2101.00390.")](由 + 23 种语言组成的“10k”子集),并在来自“it”子集的 91 小时转录音频上进行了 ASR 微调。 |' - en: '| [`HUBERT_ASR_LARGE`](generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE "torchaudio.pipelines.HUBERT_ASR_LARGE") | HuBERT model ("large" architecture), pre-trained on 60,000 hours of unlabeled audio from *Libri-Light* dataset [[Kahn @@ -609,8 +978,19 @@ and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"). |' + id: totrans-72 prefs: [] type: TYPE_TB + zh: '| [`HUBERT_ASR_LARGE`](generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE + "torchaudio.pipelines.HUBERT_ASR_LARGE") | HuBERT 模型(“large” 架构),在 *Libri-Light* + 数据集的 60,000 小时未标记音频上进行预训练[[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, + M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, + R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, + and E. Dupoux. Libri-light: a benchmark for asr with limited or no supervision. + In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal + Processing (ICASSP), 7669-7673\. 2020\. 
\url https://github.com/facebookresearch/libri-light.")], + 并在来自 *LibriSpeech* 数据集的 960 小时转录音频上进行了 ASR 微调(由 "train-clean-100", "train-clean-360", + 和 "train-other-500" 组成)。 |' - en: '| [`HUBERT_ASR_XLARGE`](generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE "torchaudio.pipelines.HUBERT_ASR_XLARGE") | HuBERT model ("extra large" architecture), pre-trained on 60,000 hours of unlabeled audio from *Libri-Light* dataset [[Kahn @@ -627,67 +1007,115 @@ and Signal Processing (ICASSP), volume, 5206-5210\. 2015\. doi:10.1109/ICASSP.2015.7178964.")] (the combination of "train-clean-100", "train-clean-360", and "train-other-500"). |' + id: totrans-73 prefs: [] type: TYPE_TB + zh: '| [`HUBERT_ASR_XLARGE`](generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE + "torchaudio.pipelines.HUBERT_ASR_XLARGE") | HuBERT 模型(“extra large” 架构),在 *Libri-Light* + 数据集的 60,000 小时未标记音频上进行预训练[[Kahn *et al.*, 2020](references.html#id12 "J. Kahn, + M. Rivière, W. Zheng, E. Kharitonov, Q. Xu, P. E. Mazaré, J. Karadayi, V. Liptchinsky, + R. Collobert, C. Fuegen, T. Likhomanenko, G. Synnaeve, A. Joulin, A. Mohamed, + and E. Dupoux. Libri-light: a benchmark for asr with limited or no supervision. + In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal + Processing (ICASSP), 7669-7673\. 2020\. \url https://github.com/facebookresearch/libri-light.")], + 并在来自 *LibriSpeech* 数据集的 960 小时转录音频上进行了 ASR 微调(由 "train-clean-100", "train-clean-360", + 和 "train-other-500" 组成)。 |' - en: wav2vec 2.0 / HuBERT - Forced Alignment[](#wav2vec-2-0-hubert-forced-alignment "Permalink to this heading") + id: totrans-74 prefs: - PREF_H2 type: TYPE_NORMAL + zh: wav2vec 2.0 / HuBERT - 强制对齐[](#wav2vec-2-0-hubert-forced-alignment "Permalink + to this heading") - en: Interface[](#id59 "Permalink to this heading") + id: totrans-75 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 界面[](#id59 "Permalink to this heading") - en: '`Wav2Vec2FABundle` bundles pre-trained model and its associated dictionary. Additionally, it supports appending `star` token dimension.' + id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: '`Wav2Vec2FABundle` 包含预训练模型及其相关字典。此外,它支持附加 `star` 标记维度。' - en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2fabundle.png](../Images/81159a1c90b6bf1cc96789ecb75c13f0.png)' + id: totrans-77 prefs: [] type: TYPE_IMG + zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2fabundle.png](../Images/81159a1c90b6bf1cc96789ecb75c13f0.png)' - en: '| [`Wav2Vec2FABundle`](generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle "torchaudio.pipelines.Wav2Vec2FABundle") | Data class that bundles associated information to use pretrained [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model "torchaudio.models.Wav2Vec2Model") for forced alignment. 
|'
+ id: totrans-78
  prefs: []
  type: TYPE_TB
+ zh: '| [`Wav2Vec2FABundle`](generated/torchaudio.pipelines.Wav2Vec2FABundle.html#torchaudio.pipelines.Wav2Vec2FABundle
+ "torchaudio.pipelines.Wav2Vec2FABundle") | 数据类,捆绑了与预训练的[`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model
+ "torchaudio.models.Wav2Vec2Model")用于强制对齐的相关信息。 |'
- en: '| [`Wav2Vec2FABundle.Tokenizer`](generated/torchaudio.pipelines.Wav2Vec2FABundle.Tokenizer.html#torchaudio.pipelines.Wav2Vec2FABundle.Tokenizer
  "torchaudio.pipelines.Wav2Vec2FABundle.Tokenizer") | Interface of the tokenizer |'
+ id: totrans-79
  prefs: []
  type: TYPE_TB
+ zh: '| [`Wav2Vec2FABundle.Tokenizer`](generated/torchaudio.pipelines.Wav2Vec2FABundle.Tokenizer.html#torchaudio.pipelines.Wav2Vec2FABundle.Tokenizer
+ "torchaudio.pipelines.Wav2Vec2FABundle.Tokenizer") | 分词器的接口 |'
- en: '| [`Wav2Vec2FABundle.Aligner`](generated/torchaudio.pipelines.Wav2Vec2FABundle.Aligner.html#torchaudio.pipelines.Wav2Vec2FABundle.Aligner
  "torchaudio.pipelines.Wav2Vec2FABundle.Aligner") | Interface of the aligner |'
+ id: totrans-80
  prefs: []
  type: TYPE_TB
+ zh: '| [`Wav2Vec2FABundle.Aligner`](generated/torchaudio.pipelines.Wav2Vec2FABundle.Aligner.html#torchaudio.pipelines.Wav2Vec2FABundle.Aligner
+ "torchaudio.pipelines.Wav2Vec2FABundle.Aligner") | 对齐器的接口 |'
- en: Tutorials using `Wav2Vec2FABundle`
+ id: totrans-81
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用`Wav2Vec2FABundle`的教程
- en: '![CTC forced alignment API tutorial](../Images/644afa8c7cc662a8465d389ef96d587c.png)'
+ id: totrans-82
  prefs: []
  type: TYPE_IMG
+ zh: '![CTC强制对齐API教程](../Images/644afa8c7cc662a8465d389ef96d587c.png)'
- en: '[CTC forced alignment API tutorial](tutorials/ctc_forced_alignment_api_tutorial.html#sphx-glr-tutorials-ctc-forced-alignment-api-tutorial-py)'
+ id: totrans-83
  prefs: []
  type: TYPE_NORMAL
+ zh: '[CTC强制对齐API教程](tutorials/ctc_forced_alignment_api_tutorial.html#sphx-glr-tutorials-ctc-forced-alignment-api-tutorial-py)'
- en: CTC forced alignment API tutorial![Forced alignment for multilingual data](../Images/ca023cbba331b61f65d37937f8a25beb.png)
+ id: totrans-84
  prefs: []
  type: TYPE_NORMAL
+ zh: CTC强制对齐API教程![多语言数据的强制对齐](../Images/ca023cbba331b61f65d37937f8a25beb.png)
- en: '[Forced alignment for multilingual data](tutorials/forced_alignment_for_multilingual_data_tutorial.html#sphx-glr-tutorials-forced-alignment-for-multilingual-data-tutorial-py)'
+ id: totrans-85
  prefs: []
  type: TYPE_NORMAL
+ zh: '[多语言数据的强制对齐](tutorials/forced_alignment_for_multilingual_data_tutorial.html#sphx-glr-tutorials-forced-alignment-for-multilingual-data-tutorial-py)'
- en: Forced alignment for multilingual data![Forced Alignment with Wav2Vec2](../Images/6658c9fe256ea584e84432cc92cd4db9.png)
+ id: totrans-86
  prefs: []
  type: TYPE_NORMAL
+ zh: 多语言数据的强制对齐![使用Wav2Vec2进行强制对齐](../Images/6658c9fe256ea584e84432cc92cd4db9.png)
- en: '[Forced Alignment with Wav2Vec2](tutorials/forced_alignment_tutorial.html#sphx-glr-tutorials-forced-alignment-tutorial-py)'
+ id: totrans-87
  prefs: []
  type: TYPE_NORMAL
+ zh: '[使用Wav2Vec2进行强制对齐](tutorials/forced_alignment_tutorial.html#sphx-glr-tutorials-forced-alignment-tutorial-py)'
- en: Forced Alignment with Wav2Vec2
+ id: totrans-88
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用Wav2Vec2进行强制对齐
- en: Pretrained Models[](#pertrained-models "Permalink to this heading")
+ id: totrans-89
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 预训练模型[](#pertrained-models "跳转到此标题的永久链接")
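+- en: 'A minimal forced-alignment sketch with the bundle listed below (the transcript,
+   the file name "speech.wav", and the mono 16 kHz input are illustrative assumptions):'
+ prefs: []
+ type: TYPE_NORMAL
+- en: |
+   import torch
+   import torchaudio
+
+   bundle = torchaudio.pipelines.MMS_FA
+   model = bundle.get_model()
+   tokenizer = bundle.get_tokenizer()
+   aligner = bundle.get_aligner()
+
+   waveform, sr = torchaudio.load("speech.wav")  # placeholder input
+   waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
+
+   words = ["hello", "world"]  # romanized, lower-case transcript (assumed)
+   with torch.inference_mode():
+       emission, _ = model(waveform)
+       token_spans = aligner(emission[0], tokenizer(words))  # frame spans per word
+ prefs: []
+ type: TYPE_PRE
- en: '| [`MMS_FA`](generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA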
"torchaudio.pipelines.MMS_FA") | Trained on 31K hours of data in 1,130 languages from *Scaling Speech Technology to 1,000+ Languages* [[Pratap *et al.*, 2023](references.html#id71 @@ -695,65 +1123,106 @@ Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, and Michael Auli. Scaling speech technology to 1,000+ languages. 2023\. arXiv:2305.13516.")]. |' + id: totrans-90 prefs: [] type: TYPE_TB + zh: '| [`MMS_FA`](generated/torchaudio.pipelines.MMS_FA.html#torchaudio.pipelines.MMS_FA + "torchaudio.pipelines.MMS_FA") | 在来自*将语音技术扩展到1000多种语言*的1,130种语言的31,000小时数据上训练[[Pratap等人,2023](references.html#id71 + "Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani + Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, + Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, and Michael Auli. Scaling + speech technology to 1,000+ languages. 2023\. arXiv:2305.13516.")] |' - en: '## Tacotron2 Text-To-Speech[](#tacotron2-text-to-speech "Permalink to this heading")' + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: '## Tacotron2文本到语音[](#tacotron2-text-to-speech "跳转到此标题的永久链接")' - en: '`Tacotron2TTSBundle` defines text-to-speech pipelines and consists of three steps: tokenization, spectrogram generation and vocoder. The spectrogram generation is based on [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2 "torchaudio.models.Tacotron2") model.' + id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: '`Tacotron2TTSBundle`定义了文本到语音流水线,包括三个步骤:分词、频谱图生成和声码器。频谱图生成基于[`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2 + "torchaudio.models.Tacotron2")模型。' - en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-tacotron2bundle.png](../Images/97c575d1ba15c954a23c68df0d5b0471.png)' + id: totrans-93 prefs: [] type: TYPE_IMG + zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-tacotron2bundle.png](../Images/97c575d1ba15c954a23c68df0d5b0471.png)' - en: '`TextProcessor` can be rule-based tokenization in the case of characters, or it can be a neural-netowrk-based G2P model that generates sequence of phonemes from input text.' + id: totrans-94 prefs: [] type: TYPE_NORMAL + zh: '`TextProcessor`可以是基于规则的字符分词,也可以是一个神经网络的G2P模型,从输入文本生成音素序列。' - en: Similarly `Vocoder` can be an algorithm without learning parameters, like Griffin-Lim, or a neural-network-based model like Waveglow. + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: 同样,`Vocoder`可以是一个没有学习参数的算法,比如Griffin-Lim,也可以是一个基于神经网络的模型,比如Waveglow。 - en: Interface[](#id61 "Permalink to this heading") + id: totrans-96 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 接口[](#id61 "跳转到此标题的永久链接") - en: '| [`Tacotron2TTSBundle`](generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle "torchaudio.pipelines.Tacotron2TTSBundle") | Data class that bundles associated information to use pretrained Tacotron2 and vocoder. 
|' + id: totrans-97 prefs: [] type: TYPE_TB + zh: '| [`Tacotron2TTSBundle`](generated/torchaudio.pipelines.Tacotron2TTSBundle.html#torchaudio.pipelines.Tacotron2TTSBundle + "torchaudio.pipelines.Tacotron2TTSBundle") | 数据类,捆绑了与预训练的Tacotron2和声码器相关信息。 |' - en: '| [`Tacotron2TTSBundle.TextProcessor`](generated/torchaudio.pipelines.Tacotron2TTSBundle.TextProcessor.html#torchaudio.pipelines.Tacotron2TTSBundle.TextProcessor "torchaudio.pipelines.Tacotron2TTSBundle.TextProcessor") | Interface of the text processing part of Tacotron2TTS pipeline |' + id: totrans-98 prefs: [] type: TYPE_TB + zh: '| [`Tacotron2TTSBundle.TextProcessor`](generated/torchaudio.pipelines.Tacotron2TTSBundle.TextProcessor.html#torchaudio.pipelines.Tacotron2TTSBundle.TextProcessor + "torchaudio.pipelines.Tacotron2TTSBundle.TextProcessor") | Tacotron2TTS流水线文本处理部分的接口 + |' - en: '| [`Tacotron2TTSBundle.Vocoder`](generated/torchaudio.pipelines.Tacotron2TTSBundle.Vocoder.html#torchaudio.pipelines.Tacotron2TTSBundle.Vocoder "torchaudio.pipelines.Tacotron2TTSBundle.Vocoder") | Interface of the vocoder part of Tacotron2TTS pipeline |' + id: totrans-99 prefs: [] type: TYPE_TB + zh: '| [`Tacotron2TTSBundle.Vocoder`](generated/torchaudio.pipelines.Tacotron2TTSBundle.Vocoder.html#torchaudio.pipelines.Tacotron2TTSBundle.Vocoder + "torchaudio.pipelines.Tacotron2TTSBundle.Vocoder") | Tacotron2TTS流水线的声码器部分的接口 + |' - en: Tutorials using `Tacotron2TTSBundle` + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 使用`Tacotron2TTSBundle`的教程 - en: '![Text-to-Speech with Tacotron2](../Images/5a248f30c367f9fb17d182966714fd7d.png)' + id: totrans-101 prefs: [] type: TYPE_IMG + zh: '![使用Tacotron2进行文本到语音转换](../Images/5a248f30c367f9fb17d182966714fd7d.png)' - en: '[Text-to-Speech with Tacotron2](tutorials/tacotron2_pipeline_tutorial.html#sphx-glr-tutorials-tacotron2-pipeline-tutorial-py)' + id: totrans-102 prefs: [] type: TYPE_NORMAL + zh: '[使用Tacotron2进行文本到语音转换](tutorials/tacotron2_pipeline_tutorial.html#sphx-glr-tutorials-tacotron2-pipeline-tutorial-py)' - en: Text-to-Speech with Tacotron2 + id: totrans-103 prefs: [] type: TYPE_NORMAL + zh: 使用Tacotron2进行文本到语音转换 - en: Pretrained Models[](#id62 "Permalink to this heading") + id: totrans-104 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 预训练模型[](#id62 "跳转到此标题") - en: '| [`TACOTRON2_WAVERNN_PHONE_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH "torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH") | Phoneme-based TTS pipeline with [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2 @@ -764,8 +1233,13 @@ [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")] for 10,000 epochs. 
|'
+ id: totrans-105
  prefs: []
  type: TYPE_TB
+ zh: '| [`TACOTRON2_WAVERNN_PHONE_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
+ "torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH") | 基于音素的TTS流水线,使用在*LJSpeech*上训练的[`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
+ "torchaudio.models.Tacotron2"),训练了1,500轮,并使用在*LJSpeech*的8位深度波形上训练了10,000轮的[`WaveRNN`](generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN
+ "torchaudio.models.WaveRNN")声码器。 |'
- en: '| [`TACOTRON2_WAVERNN_CHAR_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
  "torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH") | Character-based TTS
  pipeline with [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
  @@ -776,8 +1250,13 @@ [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The
  lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")] for
  10,000 epochs. |'
+ id: totrans-106
  prefs: []
  type: TYPE_TB
+ zh: '| [`TACOTRON2_WAVERNN_CHAR_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
+ "torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH") | 基于字符的TTS流水线,使用在*LJSpeech*上训练的[`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
+ "torchaudio.models.Tacotron2"),训练了1,500轮,并使用在*LJSpeech*的8位深度波形上训练了10,000轮的[`WaveRNN`](generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN
+ "torchaudio.models.WaveRNN")声码器。 |'
- en: '| [`TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH
  "torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH") | Phoneme-based TTS
  pipeline with [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
  @@ -785,8 +1264,13 @@ "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/,
  2017.")] for 1,500 epochs and [`GriffinLim`](generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim
  "torchaudio.transforms.GriffinLim") as vocoder. |'
+ id: totrans-107
  prefs: []
  type: TYPE_TB
+ zh: '| [`TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH
+ "torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH") | 基于音素的TTS流水线,使用在*LJSpeech*上训练的[`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
+ "torchaudio.models.Tacotron2"),训练了1,500轮,并使用[`GriffinLim`](generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim
+ "torchaudio.transforms.GriffinLim")作为声码器。 |'
- en: '| [`TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH
  "torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH") | Character-based TTS
  pipeline with [`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
  @@ -794,44 +1278,70 @@ "Keith Ito and Linda Johnson. The lj speech dataset. 
\url https://keithito.com/LJ-Speech-Dataset/,
  2017.")] for 1,500 epochs, and [`GriffinLim`](generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim
  "torchaudio.transforms.GriffinLim") as vocoder. |'
+ id: totrans-108
  prefs: []
  type: TYPE_TB
+ zh: '| [`TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH`](generated/torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH.html#torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH
+ "torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH") | 基于字符的TTS流水线,使用在*LJSpeech*上训练的[`Tacotron2`](generated/torchaudio.models.Tacotron2.html#torchaudio.models.Tacotron2
+ "torchaudio.models.Tacotron2"),训练了1,500轮,并使用[`GriffinLim`](generated/torchaudio.transforms.GriffinLim.html#torchaudio.transforms.GriffinLim
+ "torchaudio.transforms.GriffinLim")作为声码器。 |'
- en: Source Separation[](#source-separation "Permalink to this heading")
+ id: totrans-109
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 声源分离[](#source-separation "跳转到此标题")
- en: Interface[](#id69 "Permalink to this heading")
+ id: totrans-110
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 界面[](#id69 "跳转到此标题")
- en: '`SourceSeparationBundle` instantiates source separation models, which take
  single-channel audio and generate multi-channel audio.'
+ id: totrans-111
  prefs: []
  type: TYPE_NORMAL
+ zh: '`SourceSeparationBundle`实例化声源分离模型,该模型接收单声道音频并生成多声道音频。'
- en: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-sourceseparationbundle.png](../Images/69b4503224dac9c3e845bd309a996829.png)'
+ id: totrans-112
  prefs: []
  type: TYPE_IMG
+ zh: '![https://download.pytorch.org/torchaudio/doc-assets/pipelines-sourceseparationbundle.png](../Images/69b4503224dac9c3e845bd309a996829.png)'
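+- en: 'A minimal sketch with the Hybrid Demucs music pipeline (the stereo mixture
+   file name is an illustrative assumption; long tracks are usually processed in
+   chunks, which is omitted here):'
+ prefs: []
+ type: TYPE_NORMAL
+- en: |
+   import torch
+   import torchaudio
+
+   bundle = torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS
+   model = bundle.get_model()
+
+   mixture, sr = torchaudio.load("mixture.wav")  # placeholder stereo mixture
+   mixture = torchaudio.functional.resample(mixture, sr, bundle.sample_rate)
+
+   with torch.inference_mode():
+       sources = model(mixture.unsqueeze(0))  # (batch, num_sources, channels, time)
+   for name, track in zip(model.sources, sources[0]):
+       torchaudio.save(f"{name}.wav", track, bundle.sample_rate)
+ prefs: []
+ type: TYPE_PRE
- en: '| [`SourceSeparationBundle`](generated/torchaudio.pipelines.SourceSeparationBundle.html#torchaudio.pipelines.SourceSeparationBundle
  "torchaudio.pipelines.SourceSeparationBundle") | Dataclass that bundles components
  for performing source separation. 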
|' + id: totrans-113 prefs: [] type: TYPE_TB + zh: '| [`SourceSeparationBundle`](generated/torchaudio.pipelines.SourceSeparationBundle.html#torchaudio.pipelines.SourceSeparationBundle + "torchaudio.pipelines.SourceSeparationBundle") | 用于执行源分离的组件的数据类。 |' - en: Tutorials using `SourceSeparationBundle` + id: totrans-114 prefs: [] type: TYPE_NORMAL + zh: 使用`SourceSeparationBundle`的教程 - en: '![Music Source Separation with Hybrid Demucs](../Images/f822c0c06abbbf25ee5b2b2573665977.png)' + id: totrans-115 prefs: [] type: TYPE_IMG + zh: '![使用混合Demucs进行音乐源分离](../Images/f822c0c06abbbf25ee5b2b2573665977.png)' - en: '[Music Source Separation with Hybrid Demucs](tutorials/hybrid_demucs_tutorial.html#sphx-glr-tutorials-hybrid-demucs-tutorial-py)' + id: totrans-116 prefs: [] type: TYPE_NORMAL + zh: '[使用混合Demucs进行音乐源分离](tutorials/hybrid_demucs_tutorial.html#sphx-glr-tutorials-hybrid-demucs-tutorial-py)' - en: Music Source Separation with Hybrid Demucs + id: totrans-117 prefs: [] type: TYPE_NORMAL + zh: 使用混合Demucs进行音乐源分离 - en: Pretrained Models[](#id70 "Permalink to this heading") + id: totrans-118 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 预训练模型[](#id70 "跳转到此标题") - en: '| [`CONVTASNET_BASE_LIBRI2MIX`](generated/torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX.html#torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX "torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX") | Pre-trained Source Separation pipeline with *ConvTasNet* [[Luo and Mesgarani, 2019](references.html#id22 "Yi @@ -842,8 +1352,15 @@ *et al.*, 2020](references.html#id37 "Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. Librimix: an open-source dataset for generalizable speech separation. 2020\. arXiv:2005.11262.")]. |' + id: totrans-119 prefs: [] type: TYPE_TB + zh: '| [`CONVTASNET_BASE_LIBRI2MIX`](generated/torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX.html#torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX + "torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX") | 使用*ConvTasNet*预训练的源分离流水线[[Luo和Mesgarani,2019](references.html#id22 + "Yi Luo和Nima Mesgarani。Conv-tasnet: 超越理想的时频幅度掩蔽进行语音分离。IEEE/ACM音频、语音和语言处理交易,27(8):1256–1266,2019年8月。URL: + http://dx.doi.org/10.1109/TASLP.2019.2915167, doi:10.1109/taslp.2019.2915167。")],在*Libri2Mix数据集*上进行训练[[Cosentino等,2020](references.html#id37 + "Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge和Emmanuel + Vincent。Librimix: 用于通用语音分离的开源数据集。2020年。arXiv:2005.11262。")]. |' - en: '| [`HDEMUCS_HIGH_MUSDB_PLUS`](generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS") | Pre-trained music source separation pipeline with *Hybrid Demucs* [[Défossez, 2021](references.html#id50 "Alexandre @@ -855,8 +1372,12 @@ URL: https://doi.org/10.5281/zenodo.3338373, doi:10.5281/zenodo.3338373.")] and an additional 150 extra songs from an internal database that was specifically produced for Meta. |' + id: totrans-120 prefs: [] type: TYPE_TB + zh: '| [`HDEMUCS_HIGH_MUSDB_PLUS`](generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS + "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS") | 使用*Hybrid Demucs*预训练的音乐源分离流水线[[Défossez, + 2021](references.html#id50 "Alexandre Défossez. 
混合频谱图和波形源分离。在ISMIR 2021音乐源分离研讨会论文集中。2021年。")],在MUSDB-HQ的训练集和测试集以及专门为Meta制作的内部数据库中的额外150首歌曲上进行训练。|'
- en: '| [`HDEMUCS_HIGH_MUSDB`](generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB
  "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB") | Pre-trained music source separation
  pipeline with *Hybrid Demucs* [[Défossez, 2021](references.html#id50 "Alexandre
  @@ -866,32 +1387,52 @@ Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner.
  MUSDB18-HQ - an uncompressed version of musdb18\. December 2019\. URL: https://doi.org/10.5281/zenodo.3338373,
  doi:10.5281/zenodo.3338373.")]. |'
+ id: totrans-121
  prefs: []
  type: TYPE_TB
+ zh: '| [`HDEMUCS_HIGH_MUSDB`](generated/torchaudio.pipelines.HDEMUCS_HIGH_MUSDB.html#torchaudio.pipelines.HDEMUCS_HIGH_MUSDB
+ "torchaudio.pipelines.HDEMUCS_HIGH_MUSDB") | 使用*Hybrid Demucs*预训练的音乐源分离流水线[[Défossez,
+ 2021](references.html#id50 "Alexandre Défossez. 混合频谱图和波形源分离。在ISMIR 2021音乐源分离研讨会论文集中。2021年。")],在MUSDB-HQ的训练集上进行训练[[Rafii等,2019](references.html#id47
+ "Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis和Rachel
+ Bittner。MUSDB18-HQ - musdb18的未压缩版本。2019年12月。URL: https://doi.org/10.5281/zenodo.3338373,
+ doi:10.5281/zenodo.3338373。")]. |'
- en: Squim Objective[](#squim-objective "Permalink to this heading")
+ id: totrans-122
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: Squim目标[](#squim-objective "跳转到此标题")
- en: Interface[](#id77 "Permalink to this heading")
+ id: totrans-123
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 界面[](#id77 "跳转到此标题")
- en: '[`SquimObjectiveBundle`](generated/torchaudio.pipelines.SquimObjectiveBundle.html#torchaudio.pipelines.SquimObjectiveBundle
  "torchaudio.pipelines.SquimObjectiveBundle") defines a speech quality and intelligibility
  measurement (SQUIM) pipeline that can predict **objective** metric scores given
  the input waveform.'
+ id: totrans-124
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`SquimObjectiveBundle`](generated/torchaudio.pipelines.SquimObjectiveBundle.html#torchaudio.pipelines.SquimObjectiveBundle
+ "torchaudio.pipelines.SquimObjectiveBundle")定义了语音质量和可懂度测量(SQUIM)流水线,可以根据输入波形预测**客观**度量分数。'
- en: '| [`SquimObjectiveBundle`](generated/torchaudio.pipelines.SquimObjectiveBundle.html#torchaudio.pipelines.SquimObjectiveBundle
  "torchaudio.pipelines.SquimObjectiveBundle") | Data class that bundles associated
  information to use pretrained [`SquimObjective`](generated/torchaudio.models.SquimObjective.html#torchaudio.models.SquimObjective
  "torchaudio.models.SquimObjective") model. |'
+ id: totrans-125
  prefs: []
  type: TYPE_TB
+ zh: '| [`SquimObjectiveBundle`](generated/torchaudio.pipelines.SquimObjectiveBundle.html#torchaudio.pipelines.SquimObjectiveBundle
+ "torchaudio.pipelines.SquimObjectiveBundle") | 封装了与预训练[`SquimObjective`](generated/torchaudio.models.SquimObjective.html#torchaudio.models.SquimObjective
+ "torchaudio.models.SquimObjective")模型使用相关信息的数据类。 |'
- en: Pretrained Models[](#id78 "Permalink to this heading")
+ id: totrans-126
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 预训练模型[](#id78 "跳转到此标题")
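+- en: 'A minimal sketch (the degraded-speech file name is an illustrative assumption;
+   mono 16 kHz input is assumed):'
+ prefs: []
+ type: TYPE_NORMAL
+- en: |
+   import torch
+   import torchaudio
+
+   bundle = torchaudio.pipelines.SQUIM_OBJECTIVE
+   model = bundle.get_model()
+
+   waveform, sr = torchaudio.load("degraded_speech.wav")  # placeholder input
+   waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
+
+   with torch.inference_mode():
+       stoi, pesq, si_sdr = model(waveform)  # reference-free objective scores
+ prefs: []
+ type: TYPE_PRE
- en: '| [`SQUIM_OBJECTIVE`](generated/torchaudio.pipelines.SQUIM_OBJECTIVE.html#torchaudio.pipelines.SQUIM_OBJECTIVE
  "torchaudio.pipelines.SQUIM_OBJECTIVE") | SquimObjective pipeline trained using
  approach described in [[Kumar *et al.*, 2023](references.html#id69 "Anurag Kumar,
  @@ -903,32 +1444,56 @@ Robert Aichner, Ashkan Aazami, Sebastian Braun, and others. The interspeech 2020
  deep noise suppression challenge: datasets, subjective testing framework, and
  challenge results. 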
arXiv preprint arXiv:2005.13981, 2020.")]. |'
+ id: totrans-127
  prefs: []
  type: TYPE_TB
+ zh: '| [`SQUIM_OBJECTIVE`](generated/torchaudio.pipelines.SQUIM_OBJECTIVE.html#torchaudio.pipelines.SQUIM_OBJECTIVE
+ "torchaudio.pipelines.SQUIM_OBJECTIVE") | 使用[[Kumar等人,2023年](references.html#id69
+ "Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson,
+ and Buye Xu. Torchaudio-squim: reference-less speech quality and intelligibility
+ measures in torchaudio. arXiv preprint arXiv:2304.01448, 2023.")]中描述的方法训练的SquimObjective管道,基于*DNS
+ 2020数据集*[[Reddy等人,2020年](references.html#id65 "Chandan KA Reddy, Vishak Gopal,
+ Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych,
+ Robert Aichner, Ashkan Aazami, Sebastian Braun, and others. The interspeech 2020
+ deep noise suppression challenge: datasets, subjective testing framework, and
+ challenge results. arXiv preprint arXiv:2005.13981, 2020.")]。 |'
- en: Squim Subjective[](#squim-subjective "Permalink to this heading")
+ id: totrans-128
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: Squim Subjective[](#squim-subjective "跳转到此标题的永久链接")
- en: Interface[](#id81 "Permalink to this heading")
+ id: totrans-129
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 接口[](#id81 "跳转到此标题的永久链接")
- en: '[`SquimSubjectiveBundle`](generated/torchaudio.pipelines.SquimSubjectiveBundle.html#torchaudio.pipelines.SquimSubjectiveBundle
  "torchaudio.pipelines.SquimSubjectiveBundle") defines a speech quality and intelligibility
  measurement (SQUIM) pipeline that can predict **subjective** metric scores given
  the input waveform.'
+ id: totrans-130
  prefs: []
  type: TYPE_NORMAL
+ zh: '[`SquimSubjectiveBundle`](generated/torchaudio.pipelines.SquimSubjectiveBundle.html#torchaudio.pipelines.SquimSubjectiveBundle
+ "torchaudio.pipelines.SquimSubjectiveBundle")定义了可以根据输入波形预测**主观**度量分数的语音质量和可懂度测量(SQUIM)管道。'
- en: '| [`SquimSubjectiveBundle`](generated/torchaudio.pipelines.SquimSubjectiveBundle.html#torchaudio.pipelines.SquimSubjectiveBundle
  "torchaudio.pipelines.SquimSubjectiveBundle") | Data class that bundles associated
  information to use pretrained [`SquimSubjective`](generated/torchaudio.models.SquimSubjective.html#torchaudio.models.SquimSubjective
  "torchaudio.models.SquimSubjective") model. |'
+ id: totrans-131
  prefs: []
  type: TYPE_TB
+ zh: '| [`SquimSubjectiveBundle`](generated/torchaudio.pipelines.SquimSubjectiveBundle.html#torchaudio.pipelines.SquimSubjectiveBundle
+ "torchaudio.pipelines.SquimSubjectiveBundle") | 数据类,捆绑了相关信息以使用预训练的[`SquimSubjective`](generated/torchaudio.models.SquimSubjective.html#torchaudio.models.SquimSubjective
+ "torchaudio.models.SquimSubjective")模型。 |'
- en: Pretrained Models[](#id82 "Permalink to this heading")
+ id: totrans-132
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 预训练模型[](#id82 "跳转到此标题的永久链接")
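+- en: 'A minimal sketch (both file names are illustrative assumptions; the second
+   waveform is a non-matching clean reference, as the model requires):'
+ prefs: []
+ type: TYPE_NORMAL
+- en: |
+   import torch
+   import torchaudio
+
+   bundle = torchaudio.pipelines.SQUIM_SUBJECTIVE
+   model = bundle.get_model()
+
+   test_wav, sr1 = torchaudio.load("degraded_speech.wav")  # placeholder input
+   ref_wav, sr2 = torchaudio.load("clean_reference.wav")   # any clean speech
+   test_wav = torchaudio.functional.resample(test_wav, sr1, bundle.sample_rate)
+   ref_wav = torchaudio.functional.resample(ref_wav, sr2, bundle.sample_rate)
+
+   with torch.inference_mode():
+       mos = model(test_wav, ref_wav)  # estimated Mean Opinion Score
+ prefs: []
+ type: TYPE_PRE
- en: '| [`SQUIM_SUBJECTIVE`](generated/torchaudio.pipelines.SQUIM_SUBJECTIVE.html#torchaudio.pipelines.SQUIM_SUBJECTIVE
  "torchaudio.pipelines.SQUIM_SUBJECTIVE") | SquimSubjective pipeline trained as
  described in [[Manocha and Kumar, 2022](references.html#id66 "Pranay Manocha and
  @@ -944,5 +1509,19 @@ in real-world environments into professional production quality speech?—a dataset,
  insights, and challenges. IEEE Signal Processing Letters, 22(8):1006–1010,
  2014.")] datasets. 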
|'
+ id: totrans-133
  prefs: []
  type: TYPE_TB
+ zh: '| [`SQUIM_SUBJECTIVE`](generated/torchaudio.pipelines.SQUIM_SUBJECTIVE.html#torchaudio.pipelines.SQUIM_SUBJECTIVE
+ "torchaudio.pipelines.SQUIM_SUBJECTIVE") | 如[[Manocha和Kumar,2022年](references.html#id66
+ "Pranay Manocha and Anurag Kumar. Speech quality assessment through mos using
+ non-matching references. arXiv preprint arXiv:2206.12285, 2022.")]和[[Kumar等人,2023年](references.html#id69
+ "Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson,
+ and Buye Xu. Torchaudio-squim: reference-less speech quality and intelligibility
+ measures in torchaudio. arXiv preprint arXiv:2304.01448, 2023.")]中描述的方法训练的SquimSubjective管道,基于*BVCC*[[Cooper和Yamagishi,2021年](references.html#id67
+ "Erica Cooper and Junichi Yamagishi. How do voices from past speech synthesis
+ challenges compare today? arXiv preprint arXiv:2105.02373, 2021.")]和*DAPS*[[Mysore,2014年](references.html#id68
+ "Gautham J Mysore. Can we automatically transform speech recorded on common consumer
+ devices in real-world environments into professional production quality speech?—a
+ dataset, insights, and challenges. IEEE Signal Processing Letters, 22(8):1006–1010,
+ 2014.")]数据集。 |'
diff --git a/totrans/aud22_56.yaml b/totrans/aud22_56.yaml
index 7ee368e933397056eb86f123ac89e4d40f561a4a..a94f2914525383e52f198384dfc2c86799327911 100644
--- a/totrans/aud22_56.yaml
+++ b/totrans/aud22_56.yaml
@@ -1,32 +1,51 @@
- en: torchaudio.sox_effects
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.sox_effects
- en: 原文:[https://pytorch.org/audio/stable/sox_effects.html](https://pytorch.org/audio/stable/sox_effects.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/sox_effects.html](https://pytorch.org/audio/stable/sox_effects.html)
- en: '## Applying effects[](#applying-effects "Permalink to this heading")'
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: '## 应用效果[](#applying-effects "跳转到本标题")'
- en: Apply a SoX effects chain to a torch.Tensor, or to a file, loading the result
  as a torch.Tensor.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 在torch.Tensor上或文件上应用SoX效果链,并加载为torch.Tensor。
- en: '| [`apply_effects_tensor`](generated/torchaudio.sox_effects.apply_effects_tensor.html#torchaudio.sox_effects.apply_effects_tensor
    "torchaudio.sox_effects.apply_effects_tensor") | Apply sox effects to given Tensor
  |'
+ id: totrans-4
  prefs: []
  type: TYPE_TB
+ zh: '| [`apply_effects_tensor`](generated/torchaudio.sox_effects.apply_effects_tensor.html#torchaudio.sox_effects.apply_effects_tensor
+   "torchaudio.sox_effects.apply_effects_tensor") | 对给定的Tensor应用SoX效果 |'
- en: '| [`apply_effects_file`](generated/torchaudio.sox_effects.apply_effects_file.html#torchaudio.sox_effects.apply_effects_file
    "torchaudio.sox_effects.apply_effects_file") | Apply sox effects to the audio
    file and load the resulting data as Tensor |'
+ id: totrans-5
  prefs: []
  type: TYPE_TB
+ zh: '| [`apply_effects_file`](generated/torchaudio.sox_effects.apply_effects_file.html#torchaudio.sox_effects.apply_effects_file
+   "torchaudio.sox_effects.apply_effects_file") | 对音频文件应用SoX效果,并将结果数据加载为Tensor |'
- en: Utilities[](#utilities "Permalink to this heading")
+ id: totrans-6
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 实用工具[](#utilities "跳转到本标题")
- en: '| [`effect_names`](generated/torchaudio.sox_effects.effect_names.html#torchaudio.sox_effects.effect_names
    "torchaudio.sox_effects.effect_names") | Gets list of valid sox effect names |'
+ id: totrans-7
  prefs: []
  type: TYPE_TB
+ zh: '| [`effect_names`](generated/torchaudio.sox_effects.effect_names.html#torchaudio.sox_effects.effect_names
+   "torchaudio.sox_effects.effect_names") | 获取有效的SoX效果名称列表 |'
diff --git a/totrans/aud22_57.yaml b/totrans/aud22_57.yaml
index 92e42d42d4688c23a1be25fde8a41e8b4cc4399c..f013eea77bcda33ec6c372b0b87dd0d746a789a8 100644
--- a/totrans/aud22_57.yaml
+++ b/totrans/aud22_57.yaml
@@ -1,27 +1,42 @@
- en: torchaudio.compliance.kaldi
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.compliance.kaldi
- en: 原文:[https://pytorch.org/audio/stable/compliance.kaldi.html](https://pytorch.org/audio/stable/compliance.kaldi.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/compliance.kaldi.html](https://pytorch.org/audio/stable/compliance.kaldi.html)
- en: The useful processing operations of [kaldi](https://github.com/kaldi-asr/kaldi)
    can be performed with torchaudio. Various functions with identical parameters
    are given so that torchaudio can produce similar outputs.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 可以使用torchaudio执行[kaldi](https://github.com/kaldi-asr/kaldi)的有用处理操作。提供了具有相同参数的各种函数,以便torchaudio可以产生类似的输出。
- en: '| [`spectrogram`](generated/torchaudio.compliance.kaldi.spectrogram.html#torchaudio.compliance.kaldi.spectrogram
    "torchaudio.compliance.kaldi.spectrogram") | Create a spectrogram from a raw audio
    signal. |'
+ id: totrans-3
  prefs: []
  type: TYPE_TB
+ zh: '| [`spectrogram`](generated/torchaudio.compliance.kaldi.spectrogram.html#torchaudio.compliance.kaldi.spectrogram
+   "torchaudio.compliance.kaldi.spectrogram") | 从原始音频信号创建频谱图。|'
- en: '| [`fbank`](generated/torchaudio.compliance.kaldi.fbank.html#torchaudio.compliance.kaldi.fbank
    "torchaudio.compliance.kaldi.fbank") | Create a fbank from a raw audio signal.
  |'
+ id: totrans-4
  prefs: []
  type: TYPE_TB
+ zh: '| [`fbank`](generated/torchaudio.compliance.kaldi.fbank.html#torchaudio.compliance.kaldi.fbank
+   "torchaudio.compliance.kaldi.fbank") | 从原始音频信号创建fbank。|'
- en: '| [`mfcc`](generated/torchaudio.compliance.kaldi.mfcc.html#torchaudio.compliance.kaldi.mfcc
    "torchaudio.compliance.kaldi.mfcc") | Create a mfcc from a raw audio signal.
  |'
+ id: totrans-5
  prefs: []
  type: TYPE_TB
+ zh: '| [`mfcc`](generated/torchaudio.compliance.kaldi.mfcc.html#torchaudio.compliance.kaldi.mfcc
+   "torchaudio.compliance.kaldi.mfcc") | 从原始音频信号创建mfcc。|'
diff --git a/totrans/aud22_58.yaml b/totrans/aud22_58.yaml
index 144955bbe8ec8574b72d0f5898334f2615b970f0..977dc4492280235d91eca26cb1ad51971a2eb125 100644
--- a/totrans/aud22_58.yaml
+++ b/totrans/aud22_58.yaml
@@ -1,206 +1,332 @@
- en: torchaudio.kaldi_io
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.kaldi_io
- en: 原文:[https://pytorch.org/audio/stable/kaldi_io.html](https://pytorch.org/audio/stable/kaldi_io.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/kaldi_io.html](https://pytorch.org/audio/stable/kaldi_io.html)
- en: To use this module, the dependency [kaldi_io](https://github.com/vesis84/kaldi-io-for-python)
    needs to be installed. This is a light wrapper around `kaldi_io` that returns
    [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor "(in
    PyTorch v2.1)").
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 要使用此模块,需要安装依赖[kaldi_io](https://github.com/vesis84/kaldi-io-for-python)。这是围绕`kaldi_io`的轻量级包装,返回[`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor
+   "(在PyTorch v2.1中)")。
- en: Vectors[](#vectors "Permalink to this heading")
+ id: totrans-3
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 向量[](#vectors "跳转到此标题")
- en: read_vec_int_ark[](#read-vec-int-ark "Permalink to this heading")
+ id: totrans-4
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: read_vec_int_ark[](#read-vec-int-ark "跳转到此标题")
- en: '[PRE0]'
+ id: totrans-5
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: Create generator of (key,vector) tuples, which reads from the ark file/stream.
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 创建生成器,生成从ark文件/流中读取的(key,vector)元组。
- en: 'Parameters:'
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**file_or_fd** (*str/FileDescriptor*) – ark, gzipped ark, pipe or opened file
    descriptor'
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: '**file_or_fd** (*str/FileDescriptor*) – ark、gzipped ark、管道或已打开的文件描述符'
- en: 'Returns:'
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回:
- en: The string is the key and the tensor is the vector read from file
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 字符串是键,张量是从文件中读取的向量
- en: 'Return type:'
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回类型:
- en: Iterable[Tuple[[str](https://docs.python.org/3/library/stdtypes.html#str "(in
    Python v3.12)"), Tensor]]
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 可迭代的元组[[str](https://docs.python.org/3/library/stdtypes.html#str "(在Python v3.12中)"),
+   Tensor]
- en: Example
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 示例
- en: '[PRE1]'
+ id: totrans-14
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: read_vec_flt_scp[](#read-vec-flt-scp "Permalink to this heading")
+ id: totrans-15
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: read_vec_flt_scp[](#read-vec-flt-scp "跳转到此标题")
- en: '[PRE2]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: Create generator of (key,vector) tuples, read according to Kaldi scp.
+ id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 创建生成器,根据Kaldi scp读取(key,vector)元组。 - en: 'Parameters:' + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**file_or_fd** (*str/FileDescriptor*) – scp, gzipped scp, pipe or opened file descriptor' + id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: '**file_or_fd** (*str/FileDescriptor*) – scp、gzipped scp、管道或已打开的文件描述符' - en: 'Returns:' + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 返回: - en: The string is the key and the tensor is the vector read from file + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 字符串是键,张量是从文件中读取的向量 - en: 'Return type:' + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 返回类型: - en: Iterable[Tuple[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.12)"), Tensor]] + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 可迭代的元组[[str](https://docs.python.org/3/library/stdtypes.html#str "(在Python v3.12中)"), + Tensor] - en: Example + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: 示例 - en: '[PRE3]' + id: totrans-25 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: read_vec_flt_ark[](#read-vec-flt-ark "Permalink to this heading") + id: totrans-26 prefs: - PREF_H3 type: TYPE_NORMAL + zh: read_vec_flt_ark[](#read-vec-flt-ark "跳转到此标题") - en: '[PRE4]' + id: totrans-27 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: Create generator of (key,vector) tuples, which reads from the ark file/stream. + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: 创建生成器,生成从ark文件/流中读取的(key,vector)元组。 - en: 'Parameters:' + id: totrans-29 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**file_or_fd** (*str/FileDescriptor*) – ark, gzipped ark, pipe or opened file descriptor' + id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: '**file_or_fd** (*str/FileDescriptor*) – ark、gzipped ark、管道或已打开的文件描述符' - en: 'Returns:' + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: 返回: - en: The string is the key and the tensor is the vector read from file + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: 字符串是键,张量是从文件中读取的向量 - en: 'Return type:' + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 返回类型: - en: Iterable[Tuple[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.12)"), Tensor]] + id: totrans-34 prefs: [] type: TYPE_NORMAL + zh: 可迭代的元组[[str](https://docs.python.org/3/library/stdtypes.html#str "(在Python v3.12中)"), + Tensor] - en: Example + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 示例 - en: '[PRE5]' + id: totrans-36 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: Matrices[](#matrices "Permalink to this heading") + id: totrans-37 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 矩阵[](#matrices "跳转到此标题") - en: read_mat_scp[](#read-mat-scp "Permalink to this heading") + id: totrans-38 prefs: - PREF_H3 type: TYPE_NORMAL + zh: read_mat_scp[](#read-mat-scp "跳转到此标题") - en: '[PRE6]' + id: totrans-39 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: Create generator of (key,matrix) tuples, read according to Kaldi scp. 
+ id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: 创建生成器,根据Kaldi scp读取(key,matrix)元组。 - en: 'Parameters:' + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**file_or_fd** (*str/FileDescriptor*) – scp, gzipped scp, pipe or opened file descriptor' + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: '**file_or_fd** (*str/FileDescriptor*) – scp、gzipped scp、管道或已打开的文件描述符' - en: 'Returns:' + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 返回: - en: The string is the key and the tensor is the matrix read from file + id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: 字符串是键,张量是从文件中读取的矩阵 - en: 'Return type:' + id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 返回类型: - en: Iterable[Tuple[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.12)"), Tensor]] + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: 可迭代的元组[[str](https://docs.python.org/3/library/stdtypes.html#str "(在Python v3.12中)"), + Tensor] - en: Example + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 示例 - en: '[PRE7]' + id: totrans-48 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: read_mat_ark[](#read-mat-ark "Permalink to this heading") + id: totrans-49 prefs: - PREF_H3 type: TYPE_NORMAL + zh: read_mat_ark[](#read-mat-ark "跳转到此标题") - en: '[PRE8]' + id: totrans-50 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: Create generator of (key,matrix) tuples, which reads from the ark file/stream. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 创建生成器,生成从ark文件/流中读取的(key,matrix)元组。 - en: 'Parameters:' + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**file_or_fd** (*str/FileDescriptor*) – ark, gzipped ark, pipe or opened file descriptor' + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: '**file_or_fd** (*str/FileDescriptor*) – ark、gzipped ark、管道或已打开的文件描述符' - en: 'Returns:' + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 返回: - en: The string is the key and the tensor is the matrix read from file + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: 字符串是键,张量是从文件中读取的矩阵 - en: 'Return type:' + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 返回类型: - en: Iterable[Tuple[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.12)"), Tensor]] + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: 可迭代的元组[[str](https://docs.python.org/3/library/stdtypes.html#str "(在Python v3.12中)"), + Tensor] - en: Example + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 示例 - en: '[PRE9]' + id: totrans-59 prefs: [] type: TYPE_PRE + zh: '[PRE9]' diff --git a/totrans/aud22_59.yaml b/totrans/aud22_59.yaml index 290f7828ca8e478725a3a902337beac999e9979e..935732529b02ecf67421493463eda767e474280a 100644 --- a/totrans/aud22_59.yaml +++ b/totrans/aud22_59.yaml @@ -1,23 +1,36 @@ - en: torchaudio.utils + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.utils - en: 原文:[https://pytorch.org/audio/stable/utils.html](https://pytorch.org/audio/stable/utils.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/utils.html](https://pytorch.org/audio/stable/utils.html) - en: '`torchaudio.utils` module contains utility functions to configure the global state of third party libraries.' 
+ id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '`torchaudio.utils`模块包含用于配置第三方库全局状态的实用函数。' - en: '| [`sox_utils`](generated/torchaudio.utils.sox_utils.html#module-torchaudio.utils.sox_utils "torchaudio.utils.sox_utils") | Module to change the configuration of libsox, which is used by I/O functions like `sox_io_backend` and [`sox_effects`](sox_effects.html#module-torchaudio.sox_effects "torchaudio.sox_effects"). |' + id: totrans-3 prefs: [] type: TYPE_TB + zh: '| [`sox_utils`](generated/torchaudio.utils.sox_utils.html#module-torchaudio.utils.sox_utils + "torchaudio.utils.sox_utils") | 用于更改由I/O函数(如`sox_io_backend`和[`sox_effects`](sox_effects.html#module-torchaudio.sox_effects + "torchaudio.sox_effects"))使用的libsox配置的模块。 |' - en: '| [`ffmpeg_utils`](generated/torchaudio.utils.ffmpeg_utils.html#module-torchaudio.utils.ffmpeg_utils "torchaudio.utils.ffmpeg_utils") | Module to change the configuration of FFmpeg libraries (such as libavformat). |' + id: totrans-4 prefs: [] type: TYPE_TB + zh: '| [`ffmpeg_utils`](generated/torchaudio.utils.ffmpeg_utils.html#module-torchaudio.utils.ffmpeg_utils + "torchaudio.utils.ffmpeg_utils") | 用于更改FFmpeg库(如libavformat)配置的模块。 |' diff --git a/totrans/aud22_60.yaml b/totrans/aud22_60.yaml index a8c74cec112c94b4b1aa8630c46695a209239656..f7428a0bcdc8f95a56080d6a63f4792a55e33475 100644 --- a/totrans/aud22_60.yaml +++ b/totrans/aud22_60.yaml @@ -1,24 +1,36 @@ - en: torio + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torio - en: 原文:[https://pytorch.org/audio/stable/torio.html](https://pytorch.org/audio/stable/torio.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/torio.html](https://pytorch.org/audio/stable/torio.html) - en: '`torio` is an alternative top-level module for I/O features. It is the extraction of the core implementation of I/O feature of `torchaudio`.' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '`torio`是I/O功能的替代顶级模块。它是`torchaudio`的I/O功能的核心实现提取。' - en: If you want to use the multimedia processing features, but do not want to depend on the entire `torchaudio` package, you can use `torio`. + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 如果您想使用多媒体处理功能,但不想依赖整个`torchaudio`包,您可以使用`torio`。 - en: Note + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 注意 - en: Currently, `torio` is distributed alongside `torchaudio`, and there is no stand-alone procedure to install `torio` only. Please refer to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/) for the installation of `torchaudio`. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 目前,`torio`与`torchaudio`一起分发,没有独立的安装程序来安装`torio`。请参考[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)来安装`torchaudio`。 diff --git a/totrans/aud22_61.yaml b/totrans/aud22_61.yaml index b71cc95193a5b9499c9fd95f39edae39db4577e3..3be96e3a709b5e056fb9c42b4bc55084e9bb4925 100644 --- a/totrans/aud22_61.yaml +++ b/totrans/aud22_61.yaml @@ -1,84 +1,138 @@ - en: torio.io + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torio.io - en: 原文:[https://pytorch.org/audio/stable/torio.io.html](https://pytorch.org/audio/stable/torio.io.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/torio.io.html](https://pytorch.org/audio/stable/torio.io.html) - en: '| [`StreamingMediaDecoder`](generated/torio.io.StreamingMediaDecoder.html#torio.io.StreamingMediaDecoder "torio.io.StreamingMediaDecoder") | Fetch and decode audio/video streams chunk by chunk. 
|' + id: totrans-2 prefs: [] type: TYPE_TB + zh: '| [`StreamingMediaDecoder`](generated/torio.io.StreamingMediaDecoder.html#torio.io.StreamingMediaDecoder + "torio.io.StreamingMediaDecoder") | 逐块获取和解码音频/视频流。 |' - en: '| [`StreamingMediaEncoder`](generated/torio.io.StreamingMediaEncoder.html#torio.io.StreamingMediaEncoder "torio.io.StreamingMediaEncoder") | Encode and write audio/video streams chunk by chunk |' + id: totrans-3 prefs: [] type: TYPE_TB + zh: '| [`StreamingMediaEncoder`](generated/torio.io.StreamingMediaEncoder.html#torio.io.StreamingMediaEncoder + "torio.io.StreamingMediaEncoder") | 逐块编码和写入音频/视频流 |' - en: Tutorials using `torio.io` + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 使用`torio.io`的教程 - en: '![StreamWriter Advanced Usage](../Images/6220c14661a5916b79dc7176329a2f31.png)' + id: totrans-5 prefs: [] type: TYPE_IMG + zh: '![StreamWriter高级用法](../Images/6220c14661a5916b79dc7176329a2f31.png)' - en: '[StreamWriter Advanced Usage](tutorials/streamwriter_advanced.html#sphx-glr-tutorials-streamwriter-advanced-py)' + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: '[StreamWriter高级用法](tutorials/streamwriter_advanced.html#sphx-glr-tutorials-streamwriter-advanced-py)' - en: StreamWriter Advanced Usage![StreamReader Advanced Usages](../Images/0bfcb9f0a40e70876201bb889c96b850.png) + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: StreamWriter高级用法![StreamReader高级用法](../Images/0bfcb9f0a40e70876201bb889c96b850.png) - en: '[StreamReader Advanced Usages](tutorials/streamreader_advanced_tutorial.html#sphx-glr-tutorials-streamreader-advanced-tutorial-py)' + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: '[StreamReader高级用法](tutorials/streamreader_advanced_tutorial.html#sphx-glr-tutorials-streamreader-advanced-tutorial-py)' - en: StreamReader Advanced Usages![StreamReader Basic Usages](../Images/2e9d3658df8a114b4fbaf83899e67e81.png) + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: StreamReader高级用法![StreamReader基本用法](../Images/2e9d3658df8a114b4fbaf83899e67e81.png) - en: '[StreamReader Basic Usages](tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py)' + id: totrans-10 prefs: [] type: TYPE_NORMAL + zh: '[StreamReader基本用法](tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py)' - en: StreamReader Basic Usages![AudioEffector Usages](../Images/1a4ea86e92f465a76624e1054dea18f7.png) + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: StreamReader基本用法![AudioEffector用法](../Images/1a4ea86e92f465a76624e1054dea18f7.png) - en: '[AudioEffector Usages](tutorials/effector_tutorial.html#sphx-glr-tutorials-effector-tutorial-py)' + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: '[AudioEffector用法](tutorials/effector_tutorial.html#sphx-glr-tutorials-effector-tutorial-py)' - en: AudioEffector Usages![Online ASR with Emformer RNN-T](../Images/200081d049505bef5c1ce8e3c321134d.png) + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: AudioEffector用法![使用Emformer RNN-T进行在线ASR](../Images/200081d049505bef5c1ce8e3c321134d.png) - en: '[Online ASR with Emformer RNN-T](tutorials/online_asr_tutorial.html#sphx-glr-tutorials-online-asr-tutorial-py)' + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: '[使用Emformer RNN-T进行在线ASR](tutorials/online_asr_tutorial.html#sphx-glr-tutorials-online-asr-tutorial-py)' - en: Online ASR with Emformer RNN-T![Device ASR with Emformer RNN-T](../Images/62ca7f96e6d3a3011aa85c2a9228f03f.png) + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 使用Emformer RNN-T进行在线ASR![使用Emformer 
RNN-T进行设备ASR](../Images/62ca7f96e6d3a3011aa85c2a9228f03f.png) - en: '[Device ASR with Emformer RNN-T](tutorials/device_asr.html#sphx-glr-tutorials-device-asr-py)' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: '[使用Emformer RNN-T进行设备ASR](tutorials/device_asr.html#sphx-glr-tutorials-device-asr-py)' - en: Device ASR with Emformer RNN-T![Accelerated video encoding with NVENC](../Images/31ca70defe6a312ea9543e4c326ada9d.png) + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 使用Emformer RNN-T进行设备ASR![NVENC加速视频编码](../Images/31ca70defe6a312ea9543e4c326ada9d.png) - en: '[Accelerated video encoding with NVENC](tutorials/nvenc_tutorial.html#sphx-glr-tutorials-nvenc-tutorial-py)' + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: '[NVENC加速视频编码](tutorials/nvenc_tutorial.html#sphx-glr-tutorials-nvenc-tutorial-py)' - en: Accelerated video encoding with NVENC![StreamWriter Basic Usage](../Images/9f6289e977fd79f4e28b4217ecde6c14.png) + id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: NVENC加速视频编码![StreamWriter基本用法](../Images/9f6289e977fd79f4e28b4217ecde6c14.png) - en: '[StreamWriter Basic Usage](tutorials/streamwriter_basic_tutorial.html#sphx-glr-tutorials-streamwriter-basic-tutorial-py)' + id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: '[StreamWriter基本用法](tutorials/streamwriter_basic_tutorial.html#sphx-glr-tutorials-streamwriter-basic-tutorial-py)' - en: StreamWriter Basic Usage![Device AV-ASR with Emformer RNN-T](../Images/cfabfa62624e7ca52c6aa860b13fed89.png) + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: StreamWriter基本用法![使用Emformer RNN-T进行设备AV-ASR](../Images/cfabfa62624e7ca52c6aa860b13fed89.png) - en: '[Device AV-ASR with Emformer RNN-T](tutorials/device_avsr.html#sphx-glr-tutorials-device-avsr-py)' + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: '[使用Emformer RNN-T进行设备AV-ASR](tutorials/device_avsr.html#sphx-glr-tutorials-device-avsr-py)' - en: Device AV-ASR with Emformer RNN-T![Accelerated video decoding with NVDEC](../Images/4fbb2b4bcf6bdf294aad9b160cfaa3cf.png) + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 使用Emformer RNN-T进行设备AV-ASR![NVDEC加速视频解码](../Images/4fbb2b4bcf6bdf294aad9b160cfaa3cf.png) - en: '[Accelerated video decoding with NVDEC](tutorials/nvdec_tutorial.html#sphx-glr-tutorials-nvdec-tutorial-py)' + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: '[NVDEC加速视频解码](tutorials/nvdec_tutorial.html#sphx-glr-tutorials-nvdec-tutorial-py)' - en: Accelerated video decoding with NVDEC + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: NVDEC加速视频解码 diff --git a/totrans/aud22_62.yaml b/totrans/aud22_62.yaml index 6d507f4e3bf62b1eeb8c58e43c0249e3f9ed4d99..4ea0ff988cf12a44c28f3a4039691ccc59de8e70 100644 --- a/totrans/aud22_62.yaml +++ b/totrans/aud22_62.yaml @@ -1,17 +1,26 @@ - en: torio.utils + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torio.utils - en: 原文:[https://pytorch.org/audio/stable/torio.utils.html](https://pytorch.org/audio/stable/torio.utils.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/audio/stable/torio.utils.html](https://pytorch.org/audio/stable/torio.utils.html) - en: '`torio.utils` module contains utility functions to query and configure the global state of third party libraries.' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '`torio.utils` 模块包含用于查询和配置第三方库全局状态的实用函数。' - en: '| [`ffmpeg_utils`](generated/torio.utils.ffmpeg_utils.html#module-torio.utils.ffmpeg_utils "torio.utils.ffmpeg_utils") | Module to change the configuration of FFmpeg libraries (such as libavformat). 
  |'
+ id: totrans-3
  prefs: []
  type: TYPE_TB
+ zh: '| [`ffmpeg_utils`](generated/torio.utils.ffmpeg_utils.html#module-torio.utils.ffmpeg_utils
+   "torio.utils.ffmpeg_utils") | 用于更改 FFmpeg 库(如 libavformat)配置的模块。 |'
diff --git a/totrans/aud22_63.yaml b/totrans/aud22_63.yaml
index ded9c36d4236757d872756f1d37165cf50ce7c4d..325dea02a19c7918f990e7285dadecdc4b9973ed 100644
--- a/totrans/aud22_63.yaml
+++ b/totrans/aud22_63.yaml
@@ -1,4 +1,6 @@
- en: Python Prototype API Reference
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: Python 原型 API 参考
diff --git a/totrans/aud22_64.yaml b/totrans/aud22_64.yaml
index 6aa9daaae844550d6bdfd08bac5023ffe2c34479..c476ba5092e7b412f1524674394b683604a8edc0 100644
--- a/totrans/aud22_64.yaml
+++ b/totrans/aud22_64.yaml
@@ -1,552 +1,737 @@
- en: torchaudio.prototype
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.prototype
- en: 原文:[https://pytorch.org/audio/stable/prototype.html](https://pytorch.org/audio/stable/prototype.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/prototype.html](https://pytorch.org/audio/stable/prototype.html)
- en: '`torchaudio.prototype` provides prototype features; they are at an early stage
    for feedback and testing. Their interfaces might be changed without prior notice.'
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: '`torchaudio.prototype`提供原型功能;它们处于早期阶段,用于反馈和测试。它们的接口可能会在没有事先通知的情况下更改。'
- en: Most modules of prototypes are excluded from release. Please refer to [here](https://pytorch.org/audio)
    for more information on prototype features.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 原型的大多数模块都不包含在发布中。请参考[这里](https://pytorch.org/audio)获取有关原型功能的更多信息。
- en: The modules under `torchaudio.prototype` must be imported explicitly, e.g.
+ id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: '`torchaudio.prototype`模块必须显式导入,例如' - en: '[PRE0]' + id: totrans-5 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: '[torchaudio.prototype.datasets](prototype.datasets.html)' + id: totrans-6 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.datasets](prototype.datasets.html)' - en: '[Musan](generated/torchaudio.prototype.datasets.Musan.html)' + id: totrans-7 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Musan](generated/torchaudio.prototype.datasets.Musan.html)' - en: '[__getitem__](generated/torchaudio.prototype.datasets.Musan.html#getitem)' + id: totrans-8 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[__getitem__](generated/torchaudio.prototype.datasets.Musan.html#getitem)' - en: '[get_metadata](generated/torchaudio.prototype.datasets.Musan.html#get-metadata)' + id: totrans-9 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[get_metadata](generated/torchaudio.prototype.datasets.Musan.html#get-metadata)' - en: '[torchaudio.prototype.functional](prototype.functional.html)' + id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional](prototype.functional.html)' - en: '[Utility](prototype.functional.html#utility)' + id: totrans-11 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Utility](prototype.functional.html#utility)' - en: '[torchaudio.prototype.functional.barkscale_fbanks](generated/torchaudio.prototype.functional.barkscale_fbanks.html)' + id: totrans-12 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.barkscale_fbanks](generated/torchaudio.prototype.functional.barkscale_fbanks.html)' - en: '[torchaudio.prototype.functional.chroma_filterbank](generated/torchaudio.prototype.functional.chroma_filterbank.html)' + id: totrans-13 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.chroma_filterbank](generated/torchaudio.prototype.functional.chroma_filterbank.html)' - en: '[DSP](prototype.functional.html#dsp)' + id: totrans-14 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[DSP](prototype.functional.html#dsp)' - en: '[torchaudio.prototype.functional.adsr_envelope](generated/torchaudio.prototype.functional.adsr_envelope.html)' + id: totrans-15 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.adsr_envelope](generated/torchaudio.prototype.functional.adsr_envelope.html)' - en: '[torchaudio.prototype.functional.filter_waveform](generated/torchaudio.prototype.functional.filter_waveform.html)' + id: totrans-16 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.filter_waveform](generated/torchaudio.prototype.functional.filter_waveform.html)' - en: '[torchaudio.prototype.functional.extend_pitch](generated/torchaudio.prototype.functional.extend_pitch.html)' + id: totrans-17 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.extend_pitch](generated/torchaudio.prototype.functional.extend_pitch.html)' - en: '[torchaudio.prototype.functional.oscillator_bank](generated/torchaudio.prototype.functional.oscillator_bank.html)' + id: totrans-18 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.oscillator_bank](generated/torchaudio.prototype.functional.oscillator_bank.html)' - en: 
'[torchaudio.prototype.functional.sinc_impulse_response](generated/torchaudio.prototype.functional.sinc_impulse_response.html)' + id: totrans-19 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.sinc_impulse_response](generated/torchaudio.prototype.functional.sinc_impulse_response.html)' - en: '[torchaudio.prototype.functional.frequency_impulse_response](generated/torchaudio.prototype.functional.frequency_impulse_response.html)' + id: totrans-20 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.frequency_impulse_response](generated/torchaudio.prototype.functional.frequency_impulse_response.html)' - en: '[Room Impulse Response Simulation](prototype.functional.html#room-impulse-response-simulation)' + id: totrans-21 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Room Impulse Response Simulation](prototype.functional.html#room-impulse-response-simulation)' - en: '[torchaudio.prototype.functional.ray_tracing](generated/torchaudio.prototype.functional.ray_tracing.html)' + id: totrans-22 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.ray_tracing](generated/torchaudio.prototype.functional.ray_tracing.html)' - en: '[torchaudio.prototype.functional.simulate_rir_ism](generated/torchaudio.prototype.functional.simulate_rir_ism.html)' + id: totrans-23 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.functional.simulate_rir_ism](generated/torchaudio.prototype.functional.simulate_rir_ism.html)' - en: '[torchaudio.prototype.models](prototype.models.html)' + id: totrans-24 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models](prototype.models.html)' - en: '[ConformerWav2Vec2PretrainModel](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html)' + id: totrans-25 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[ConformerWav2Vec2PretrainModel](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html)' - en: '[Methods](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#methods)' + id: totrans-26 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Methods](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#methods)' - en: '[forward](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#forward)' + id: totrans-27 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[forward](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#forward)' - en: '[Factory Functions](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#factory-functions)' + id: totrans-28 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Factory Functions](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#factory-functions)' - en: '[torchaudio.prototype.models.conformer_wav2vec2_pretrain_model](generated/torchaudio.prototype.models.conformer_wav2vec2_pretrain_model.html)' + id: totrans-29 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.conformer_wav2vec2_pretrain_model](generated/torchaudio.prototype.models.conformer_wav2vec2_pretrain_model.html)' - en: '[torchaudio.prototype.models.conformer_wav2vec2_pretrain_base](generated/torchaudio.prototype.models.conformer_wav2vec2_pretrain_base.html)' + id: totrans-30 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: 
'[torchaudio.prototype.models.conformer_wav2vec2_pretrain_base](generated/torchaudio.prototype.models.conformer_wav2vec2_pretrain_base.html)' - en: '[torchaudio.prototype.models.conformer_wav2vec2_pretrain_large](generated/torchaudio.prototype.models.conformer_wav2vec2_pretrain_large.html)' + id: totrans-31 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.conformer_wav2vec2_pretrain_large](generated/torchaudio.prototype.models.conformer_wav2vec2_pretrain_large.html)' - en: '[ConvEmformer](generated/torchaudio.prototype.models.ConvEmformer.html)' + id: totrans-32 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[ConvEmformer](generated/torchaudio.prototype.models.ConvEmformer.html)' - en: '[Methods](generated/torchaudio.prototype.models.ConvEmformer.html#methods)' + id: totrans-33 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Methods](generated/torchaudio.prototype.models.ConvEmformer.html#methods)' - en: '[forward](generated/torchaudio.prototype.models.ConvEmformer.html#forward)' + id: totrans-34 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[forward](generated/torchaudio.prototype.models.ConvEmformer.html#forward)' - en: '[infer](generated/torchaudio.prototype.models.ConvEmformer.html#infer)' + id: totrans-35 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[infer](generated/torchaudio.prototype.models.ConvEmformer.html#infer)' - en: '[HiFiGANVocoder](generated/torchaudio.prototype.models.HiFiGANVocoder.html)' + id: totrans-36 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[HiFiGANVocoder](generated/torchaudio.prototype.models.HiFiGANVocoder.html)' - en: '[Methods](generated/torchaudio.prototype.models.HiFiGANVocoder.html#methods)' + id: totrans-37 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Methods](generated/torchaudio.prototype.models.HiFiGANVocoder.html#methods)' - en: '[forward](generated/torchaudio.prototype.models.HiFiGANVocoder.html#forward)' + id: totrans-38 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[forward](generated/torchaudio.prototype.models.HiFiGANVocoder.html#forward)' - en: '[Factory Functions](generated/torchaudio.prototype.models.HiFiGANVocoder.html#factory-functions)' + id: totrans-39 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Factory Functions](generated/torchaudio.prototype.models.HiFiGANVocoder.html#factory-functions)' - en: '[torchaudio.prototype.models.hifigan_vocoder](generated/torchaudio.prototype.models.hifigan_vocoder.html)' + id: totrans-40 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.hifigan_vocoder](generated/torchaudio.prototype.models.hifigan_vocoder.html)' - en: '[torchaudio.prototype.models.hifigan_vocoder_v1](generated/torchaudio.prototype.models.hifigan_vocoder_v1.html)' + id: totrans-41 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.hifigan_vocoder_v1](generated/torchaudio.prototype.models.hifigan_vocoder_v1.html)' - en: '[torchaudio.prototype.models.hifigan_vocoder_v2](generated/torchaudio.prototype.models.hifigan_vocoder_v2.html)' + id: totrans-42 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.hifigan_vocoder_v2](generated/torchaudio.prototype.models.hifigan_vocoder_v2.html)' - en: '[torchaudio.prototype.models.hifigan_vocoder_v3](generated/torchaudio.prototype.models.hifigan_vocoder_v3.html)' + id: totrans-43 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: 
'[torchaudio.prototype.models.hifigan_vocoder_v3](generated/torchaudio.prototype.models.hifigan_vocoder_v3.html)' - en: '[Prototype Factory Functions of Beta Models](prototype.models.html#prototype-factory-functions-of-beta-models)' + id: totrans-44 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Prototype Factory Functions of Beta Models](prototype.models.html#prototype-factory-functions-of-beta-models)' - en: '[Wav2Vec2Model](generated/torchaudio.models.Wav2Vec2Model.html)' + id: totrans-45 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Wav2Vec2Model](generated/torchaudio.models.Wav2Vec2Model.html)' - en: '[Methods](generated/torchaudio.models.Wav2Vec2Model.html#methods)' + id: totrans-46 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Methods](generated/torchaudio.models.Wav2Vec2Model.html#methods)' - en: '[forward](generated/torchaudio.models.Wav2Vec2Model.html#forward)' + id: totrans-47 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[forward](generated/torchaudio.models.Wav2Vec2Model.html#forward)' - en: '[extract_features](generated/torchaudio.models.Wav2Vec2Model.html#extract-features)' + id: totrans-48 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[extract_features](generated/torchaudio.models.Wav2Vec2Model.html#extract-features)' - en: '[Factory Functions](generated/torchaudio.models.Wav2Vec2Model.html#factory-functions)' + id: totrans-49 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Factory Functions](generated/torchaudio.models.Wav2Vec2Model.html#factory-functions)' - en: '[torchaudio.models.wav2vec2_model](generated/torchaudio.models.wav2vec2_model.html)' + id: totrans-50 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_model](generated/torchaudio.models.wav2vec2_model.html)' - en: '[torchaudio.models.wav2vec2_base](generated/torchaudio.models.wav2vec2_base.html)' + id: totrans-51 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_base](generated/torchaudio.models.wav2vec2_base.html)' - en: '[torchaudio.models.wav2vec2_large](generated/torchaudio.models.wav2vec2_large.html)' + id: totrans-52 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_large](generated/torchaudio.models.wav2vec2_large.html)' - en: '[torchaudio.models.wav2vec2_large_lv60k](generated/torchaudio.models.wav2vec2_large_lv60k.html)' + id: totrans-53 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_large_lv60k](generated/torchaudio.models.wav2vec2_large_lv60k.html)' - en: '[torchaudio.models.wav2vec2_xlsr_300m](generated/torchaudio.models.wav2vec2_xlsr_300m.html)' + id: totrans-54 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_xlsr_300m](generated/torchaudio.models.wav2vec2_xlsr_300m.html)' - en: '[torchaudio.models.wav2vec2_xlsr_1b](generated/torchaudio.models.wav2vec2_xlsr_1b.html)' + id: totrans-55 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_xlsr_1b](generated/torchaudio.models.wav2vec2_xlsr_1b.html)' - en: '[torchaudio.models.wav2vec2_xlsr_2b](generated/torchaudio.models.wav2vec2_xlsr_2b.html)' + id: totrans-56 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2_xlsr_2b](generated/torchaudio.models.wav2vec2_xlsr_2b.html)' 
- en: '[torchaudio.models.hubert_base](generated/torchaudio.models.hubert_base.html)' + id: totrans-57 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.hubert_base](generated/torchaudio.models.hubert_base.html)' - en: '[torchaudio.models.hubert_large](generated/torchaudio.models.hubert_large.html)' + id: totrans-58 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.hubert_large](generated/torchaudio.models.hubert_large.html)' - en: '[torchaudio.models.hubert_xlarge](generated/torchaudio.models.hubert_xlarge.html)' + id: totrans-59 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.hubert_xlarge](generated/torchaudio.models.hubert_xlarge.html)' - en: '[torchaudio.models.wavlm_model](generated/torchaudio.models.wavlm_model.html)' + id: totrans-60 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wavlm_model](generated/torchaudio.models.wavlm_model.html)' - en: '[torchaudio.models.wavlm_base](generated/torchaudio.models.wavlm_base.html)' + id: totrans-61 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wavlm_base](generated/torchaudio.models.wavlm_base.html)' - en: '[torchaudio.models.wavlm_large](generated/torchaudio.models.wavlm_large.html)' + id: totrans-62 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wavlm_large](generated/torchaudio.models.wavlm_large.html)' - en: '[Prototype Factory Functions](generated/torchaudio.models.Wav2Vec2Model.html#prototype-factory-functions)' + id: totrans-63 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Prototype Factory Functions](generated/torchaudio.models.Wav2Vec2Model.html#prototype-factory-functions)' - en: '[torchaudio.prototype.models.emformer_hubert_model](generated/torchaudio.prototype.models.emformer_hubert_model.html)' + id: totrans-64 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.emformer_hubert_model](generated/torchaudio.prototype.models.emformer_hubert_model.html)' - en: '[torchaudio.prototype.models.emformer_hubert_base](generated/torchaudio.prototype.models.emformer_hubert_base.html)' + id: totrans-65 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.emformer_hubert_base](generated/torchaudio.prototype.models.emformer_hubert_base.html)' - en: '[torchaudio.prototype.models.conformer_wav2vec2_model](generated/torchaudio.prototype.models.conformer_wav2vec2_model.html)' + id: totrans-66 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.conformer_wav2vec2_model](generated/torchaudio.prototype.models.conformer_wav2vec2_model.html)' - en: '[torchaudio.prototype.models.conformer_wav2vec2_base](generated/torchaudio.prototype.models.conformer_wav2vec2_base.html)' + id: totrans-67 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.conformer_wav2vec2_base](generated/torchaudio.prototype.models.conformer_wav2vec2_base.html)' - en: '[Utility Functions](generated/torchaudio.models.Wav2Vec2Model.html#utility-functions)' + id: totrans-68 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Utility Functions](generated/torchaudio.models.Wav2Vec2Model.html#utility-functions)' - en: 
'[torchaudio.models.wav2vec2.utils.import_fairseq_model](generated/torchaudio.models.wav2vec2.utils.import_fairseq_model.html)' + id: totrans-69 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2.utils.import_fairseq_model](generated/torchaudio.models.wav2vec2.utils.import_fairseq_model.html)' - en: '[torchaudio.models.wav2vec2.utils.import_huggingface_model](generated/torchaudio.models.wav2vec2.utils.import_huggingface_model.html)' + id: totrans-70 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.wav2vec2.utils.import_huggingface_model](generated/torchaudio.models.wav2vec2.utils.import_huggingface_model.html)' - en: '[RNNT](generated/torchaudio.models.RNNT.html)' + id: totrans-71 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[RNNT](generated/torchaudio.models.RNNT.html)' - en: '[Methods](generated/torchaudio.models.RNNT.html#methods)' + id: totrans-72 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Methods](generated/torchaudio.models.RNNT.html#methods)' - en: '[forward](generated/torchaudio.models.RNNT.html#forward)' + id: totrans-73 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[forward](generated/torchaudio.models.RNNT.html#forward)' - en: '[transcribe_streaming](generated/torchaudio.models.RNNT.html#transcribe-streaming)' + id: totrans-74 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[transcribe_streaming](generated/torchaudio.models.RNNT.html#transcribe-streaming)' - en: '[transcribe](generated/torchaudio.models.RNNT.html#transcribe)' + id: totrans-75 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[transcribe](generated/torchaudio.models.RNNT.html#transcribe)' - en: '[predict](generated/torchaudio.models.RNNT.html#predict)' + id: totrans-76 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[predict](generated/torchaudio.models.RNNT.html#predict)' - en: '[join](generated/torchaudio.models.RNNT.html#join)' + id: totrans-77 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[join](generated/torchaudio.models.RNNT.html#join)' - en: '[Factory Functions](generated/torchaudio.models.RNNT.html#factory-functions)' + id: totrans-78 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Factory Functions](generated/torchaudio.models.RNNT.html#factory-functions)' - en: '[torchaudio.models.emformer_rnnt_model](generated/torchaudio.models.emformer_rnnt_model.html)' + id: totrans-79 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.emformer_rnnt_model](generated/torchaudio.models.emformer_rnnt_model.html)' - en: '[torchaudio.models.emformer_rnnt_base](generated/torchaudio.models.emformer_rnnt_base.html)' + id: totrans-80 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.models.emformer_rnnt_base](generated/torchaudio.models.emformer_rnnt_base.html)' - en: '[Prototype Factory Functions](generated/torchaudio.models.RNNT.html#prototype-factory-functions)' + id: totrans-81 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Prototype Factory Functions](generated/torchaudio.models.RNNT.html#prototype-factory-functions)' - en: '[torchaudio.prototype.models.conformer_rnnt_model](generated/torchaudio.prototype.models.conformer_rnnt_model.html)' + id: totrans-82 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: 
'[torchaudio.prototype.models.conformer_rnnt_model](generated/torchaudio.prototype.models.conformer_rnnt_model.html)' - en: '[torchaudio.prototype.models.conformer_rnnt_base](generated/torchaudio.prototype.models.conformer_rnnt_base.html)' + id: totrans-83 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.models.conformer_rnnt_base](generated/torchaudio.prototype.models.conformer_rnnt_base.html)' - en: '[torchaudio.prototype.pipelines](prototype.pipelines.html)' + id: totrans-84 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.pipelines](prototype.pipelines.html)' - en: '[RNN-T Streaming/Non-Streaming ASR](prototype.pipelines.html#rnn-t-streaming-non-streaming-asr)' + id: totrans-85 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[RNN-T Streaming/Non-Streaming ASR](prototype.pipelines.html#rnn-t-streaming-non-streaming-asr)' - en: '[Pretrained Models](prototype.pipelines.html#pretrained-models)' + id: totrans-86 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[Pretrained Models](prototype.pipelines.html#pretrained-models)' - en: '[EMFORMER_RNNT_BASE_MUSTC](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC.html)' + id: totrans-87 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[EMFORMER_RNNT_BASE_MUSTC](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC.html)' - en: '[EMFORMER_RNNT_BASE_TEDLIUM3](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3.html)' + id: totrans-88 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[EMFORMER_RNNT_BASE_TEDLIUM3](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3.html)' - en: '[HiFiGAN Vocoder](prototype.pipelines.html#hifigan-vocoder)' + id: totrans-89 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[HiFiGAN 语音合成器](prototype.pipelines.html#hifigan-vocoder)' - en: '[Interface](prototype.pipelines.html#interface)' + id: totrans-90 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[接口](prototype.pipelines.html#interface)' - en: '[HiFiGANVocoderBundle](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html)' + id: totrans-91 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[HiFiGANVocoderBundle](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html)' - en: '[Properties](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#properties)' + id: totrans-92 prefs: - PREF_IND - PREF_IND @@ -554,7 +739,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[属性](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#properties)' - en: '[sample_rate](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#sample-rate)' + id: totrans-93 prefs: - PREF_IND - PREF_IND @@ -563,7 +750,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[采样率](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#sample-rate)' - en: '[Methods](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#methods)' + id: totrans-94 prefs: - PREF_IND - PREF_IND @@ -571,7 +760,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[方法](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#methods)' - en: '[get_mel_transform](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#get-mel-transform)' + id: totrans-95 prefs: - PREF_IND - PREF_IND @@ -580,7 +771,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: 
'[获取梅尔变换](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#get-mel-transform)' - en: '[get_vocoder](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#get-vocoder)' + id: totrans-96 prefs: - PREF_IND - PREF_IND @@ -589,38 +782,50 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[获取声码器](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#get-vocoder)' - en: '[Pretrained Models](prototype.pipelines.html#id1)' + id: totrans-97 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[预训练模型](prototype.pipelines.html#id1)' - en: '[HIFIGAN_VOCODER_V3_LJSPEECH](generated/torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH.html)' + id: totrans-98 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[HIFIGAN_VOCODER_V3_LJSPEECH](generated/torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH.html)' - en: '[VGGish](prototype.pipelines.html#vggish)' + id: totrans-99 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[VGGish](prototype.pipelines.html#vggish)' - en: '[Interface](prototype.pipelines.html#id3)' + id: totrans-100 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[接口](prototype.pipelines.html#id3)' - en: '[VGGishBundle](generated/torchaudio.prototype.pipelines.VGGishBundle.html)' + id: totrans-101 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[VGGishBundle](generated/torchaudio.prototype.pipelines.VGGishBundle.html)' - en: '[Properties](generated/torchaudio.prototype.pipelines.VGGishBundle.html#properties)' + id: totrans-102 prefs: - PREF_IND - PREF_IND @@ -628,7 +833,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[属性](generated/torchaudio.prototype.pipelines.VGGishBundle.html#properties)' - en: '[sample_rate](generated/torchaudio.prototype.pipelines.VGGishBundle.html#sample-rate)' + id: totrans-103 prefs: - PREF_IND - PREF_IND @@ -637,7 +844,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[采样率](generated/torchaudio.prototype.pipelines.VGGishBundle.html#sample-rate)' - en: '[Methods](generated/torchaudio.prototype.pipelines.VGGishBundle.html#methods)' + id: totrans-104 prefs: - PREF_IND - PREF_IND @@ -645,7 +854,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[方法](generated/torchaudio.prototype.pipelines.VGGishBundle.html#methods)' - en: '[get_input_processor](generated/torchaudio.prototype.pipelines.VGGishBundle.html#get-input-processor)' + id: totrans-105 prefs: - PREF_IND - PREF_IND @@ -654,7 +865,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[获取输入处理器](generated/torchaudio.prototype.pipelines.VGGishBundle.html#get-input-processor)' - en: '[get_model](generated/torchaudio.prototype.pipelines.VGGishBundle.html#get-model)' + id: totrans-106 prefs: - PREF_IND - PREF_IND @@ -663,14 +876,18 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[get_model](generated/torchaudio.prototype.pipelines.VGGishBundle.html#get-model)' - en: '[VGGishBundle.VGGish](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html)' + id: totrans-107 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[VGGishBundle.VGGish](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html)' - en: '[Methods](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#methods)' + id: totrans-108 prefs: - PREF_IND - PREF_IND @@ -678,7 +895,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[方法](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#methods)' - en: 
'[forward](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#forward)' + id: totrans-109 prefs: - PREF_IND - PREF_IND @@ -687,14 +906,18 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[前向传播](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#forward)' - en: '[VGGishBundle.VGGishInputProcessor](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html)' + id: totrans-110 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[VGGishBundle.VGGishInputProcessor](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html)' - en: '[Methods](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#methods)' + id: totrans-111 prefs: - PREF_IND - PREF_IND @@ -702,7 +925,9 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[方法](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#methods)' - en: '[__call__](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#call)' + id: totrans-112 prefs: - PREF_IND - PREF_IND @@ -711,45 +936,62 @@ - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[__call__](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#call)' - en: '[Pretrained Models](prototype.pipelines.html#id6)' + id: totrans-113 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[预训练模型](prototype.pipelines.html#id6)' - en: '[VGGISH](generated/torchaudio.prototype.pipelines.VGGISH.html)' + id: totrans-114 prefs: - PREF_IND - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[VGGISH](generated/torchaudio.prototype.pipelines.VGGISH.html)' - en: '[torchaudio.prototype.transforms](prototype.transforms.html)' + id: totrans-115 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[torchaudio.prototype.transforms](prototype.transforms.html)' - en: '[BarkScale](generated/torchaudio.prototype.transforms.BarkScale.html)' + id: totrans-116 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[BarkScale](generated/torchaudio.prototype.transforms.BarkScale.html)' - en: '[BarkSpectrogram](generated/torchaudio.prototype.transforms.BarkSpectrogram.html)' + id: totrans-117 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[BarkSpectrogram](generated/torchaudio.prototype.transforms.BarkSpectrogram.html)' - en: '[ChromaScale](generated/torchaudio.prototype.transforms.ChromaScale.html)' + id: totrans-118 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[ChromaScale](generated/torchaudio.prototype.transforms.ChromaScale.html)' - en: '[ChromaSpectrogram](generated/torchaudio.prototype.transforms.ChromaSpectrogram.html)' + id: totrans-119 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[ChromaSpectrogram](generated/torchaudio.prototype.transforms.ChromaSpectrogram.html)' - en: '[InverseBarkScale](generated/torchaudio.prototype.transforms.InverseBarkScale.html)' + id: totrans-120 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[InverseBarkScale](generated/torchaudio.prototype.transforms.InverseBarkScale.html)' diff --git a/totrans/aud22_65.yaml b/totrans/aud22_65.yaml index c1e495cb66fb88ed241812784bd46582893194f7..2c3334214e8001b2c2c6431656fe263c05d7c637 100644 --- a/totrans/aud22_65.yaml +++ b/totrans/aud22_65.yaml @@ -1,14 +1,23 @@ - en: torchaudio.prototype.datasets + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchaudio.prototype.datasets - en: 原文:[https://pytorch.org/audio/stable/prototype.datasets.html](https://pytorch.org/audio/stable/prototype.datasets.html) + id: totrans-1 prefs: - PREF_BQ 
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/prototype.datasets.html](https://pytorch.org/audio/stable/prototype.datasets.html)
- en: '| [`Musan`](generated/torchaudio.prototype.datasets.Musan.html#torchaudio.prototype.datasets.Musan
    "torchaudio.prototype.datasets.Musan") | *MUSAN* [[Snyder *et al.*, 2015](references.html#id59
    "David Snyder, Guoguo Chen, and Daniel Povey. MUSAN: A Music, Speech, and Noise
    Corpus. 2015\. arXiv:1510.08484v1\. arXiv:1510.08484.")] dataset. |'
+ id: totrans-2
  prefs: []
  type: TYPE_TB
+ zh: '| [`Musan`](generated/torchaudio.prototype.datasets.Musan.html#torchaudio.prototype.datasets.Musan
+   "torchaudio.prototype.datasets.Musan") | *MUSAN* [[Snyder *et al.*, 2015](references.html#id59
+   "David Snyder, Guoguo Chen, and Daniel Povey. MUSAN: A Music, Speech, and Noise
+   Corpus. 2015\. arXiv:1510.08484v1\. arXiv:1510.08484.")] dataset. |'
diff --git a/totrans/aud22_66.yaml b/totrans/aud22_66.yaml
index 564d72b858a9888ed466aba79b10159a8ff8b055..26a3dbf69624e069a8a4974e9a09e1c66be132d2 100644
--- a/totrans/aud22_66.yaml
+++ b/totrans/aud22_66.yaml
@@ -1,72 +1,117 @@
- en: torchaudio.prototype.functional
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.prototype.functional
- en: 原文:[https://pytorch.org/audio/stable/prototype.functional.html](https://pytorch.org/audio/stable/prototype.functional.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/prototype.functional.html](https://pytorch.org/audio/stable/prototype.functional.html)
- en: '## Utility[](#utility "Permalink to this heading")'
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: '## 实用工具[](#utility "跳转到此标题的永久链接")'
- en: '| [`barkscale_fbanks`](generated/torchaudio.prototype.functional.barkscale_fbanks.html#torchaudio.prototype.functional.barkscale_fbanks
    "torchaudio.prototype.functional.barkscale_fbanks") | Create a frequency bin conversion
    matrix. |'
+ id: totrans-3
  prefs: []
  type: TYPE_TB
+ zh: '| [`barkscale_fbanks`](generated/torchaudio.prototype.functional.barkscale_fbanks.html#torchaudio.prototype.functional.barkscale_fbanks
+   "torchaudio.prototype.functional.barkscale_fbanks") | 创建一个频率频带转换矩阵。 |'
- en: '| [`chroma_filterbank`](generated/torchaudio.prototype.functional.chroma_filterbank.html#torchaudio.prototype.functional.chroma_filterbank
    "torchaudio.prototype.functional.chroma_filterbank") | Create a frequency-to-chroma
    conversion matrix. |'
+ id: totrans-4
  prefs: []
  type: TYPE_TB
+ zh: '| [`chroma_filterbank`](generated/torchaudio.prototype.functional.chroma_filterbank.html#torchaudio.prototype.functional.chroma_filterbank
+   "torchaudio.prototype.functional.chroma_filterbank") | 创建一个频率到色度转换矩阵。 |'
- en: DSP[](#dsp "Permalink to this heading")
+ id: totrans-5
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: DSP[](#dsp "跳转到此标题的永久链接")
- en: '| [`adsr_envelope`](generated/torchaudio.prototype.functional.adsr_envelope.html#torchaudio.prototype.functional.adsr_envelope
    "torchaudio.prototype.functional.adsr_envelope") | Generate ADSR Envelope |'
+ id: totrans-6
  prefs: []
  type: TYPE_TB
+ zh: '| [`adsr_envelope`](generated/torchaudio.prototype.functional.adsr_envelope.html#torchaudio.prototype.functional.adsr_envelope
+   "torchaudio.prototype.functional.adsr_envelope") | 生成ADSR包络 |'
- en: '| [`filter_waveform`](generated/torchaudio.prototype.functional.filter_waveform.html#torchaudio.prototype.functional.filter_waveform
    "torchaudio.prototype.functional.filter_waveform") | Applies filters along time
    axis of the given waveform.
|' + id: totrans-7 prefs: [] type: TYPE_TB + zh: '| [`filter_waveform`](generated/torchaudio.prototype.functional.filter_waveform.html#torchaudio.prototype.functional.filter_waveform + "torchaudio.prototype.functional.filter_waveform") | 在给定波形的时间轴上应用滤波器。 |' - en: '| [`extend_pitch`](generated/torchaudio.prototype.functional.extend_pitch.html#torchaudio.prototype.functional.extend_pitch "torchaudio.prototype.functional.extend_pitch") | Extend the given time series values with multipliers of them. |' + id: totrans-8 prefs: [] type: TYPE_TB + zh: '| [`extend_pitch`](generated/torchaudio.prototype.functional.extend_pitch.html#torchaudio.prototype.functional.extend_pitch + "torchaudio.prototype.functional.extend_pitch") | 用它们的倍数扩展给定的时间序列值。 |' - en: '| [`oscillator_bank`](generated/torchaudio.prototype.functional.oscillator_bank.html#torchaudio.prototype.functional.oscillator_bank "torchaudio.prototype.functional.oscillator_bank") | Synthesize waveform from the given instantaneous frequencies and amplitudes. |' + id: totrans-9 prefs: [] type: TYPE_TB + zh: '| [`oscillator_bank`](generated/torchaudio.prototype.functional.oscillator_bank.html#torchaudio.prototype.functional.oscillator_bank + "torchaudio.prototype.functional.oscillator_bank") | 从给定的瞬时频率和振幅合成波形。 |' - en: '| [`sinc_impulse_response`](generated/torchaudio.prototype.functional.sinc_impulse_response.html#torchaudio.prototype.functional.sinc_impulse_response "torchaudio.prototype.functional.sinc_impulse_response") | Create windowed-sinc impulse response for given cutoff frequencies. |' + id: totrans-10 prefs: [] type: TYPE_TB + zh: '| [`sinc_impulse_response`](generated/torchaudio.prototype.functional.sinc_impulse_response.html#torchaudio.prototype.functional.sinc_impulse_response + "torchaudio.prototype.functional.sinc_impulse_response") | 为给定的截止频率创建窗口化sinc脉冲响应。 + |' - en: '| [`frequency_impulse_response`](generated/torchaudio.prototype.functional.frequency_impulse_response.html#torchaudio.prototype.functional.frequency_impulse_response "torchaudio.prototype.functional.frequency_impulse_response") | Create filter from desired frequency response |' + id: totrans-11 prefs: [] type: TYPE_TB + zh: '| [`frequency_impulse_response`](generated/torchaudio.prototype.functional.frequency_impulse_response.html#torchaudio.prototype.functional.frequency_impulse_response + "torchaudio.prototype.functional.frequency_impulse_response") | 从所需的频率响应创建滤波器 + |' - en: Room Impulse Response Simulation[](#room-impulse-response-simulation "Permalink to this heading") + id: totrans-12 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 房间脉冲响应模拟[](#room-impulse-response-simulation "跳转到此标题的永久链接") - en: '| [`ray_tracing`](generated/torchaudio.prototype.functional.ray_tracing.html#torchaudio.prototype.functional.ray_tracing "torchaudio.prototype.functional.ray_tracing") | Compute energy histogram via ray tracing. |' + id: totrans-13 prefs: [] type: TYPE_TB + zh: '| [`ray_tracing`](generated/torchaudio.prototype.functional.ray_tracing.html#torchaudio.prototype.functional.ray_tracing + "torchaudio.prototype.functional.ray_tracing") | 通过光线追踪计算能量直方图。 |' - en: '| [`simulate_rir_ism`](generated/torchaudio.prototype.functional.simulate_rir_ism.html#torchaudio.prototype.functional.simulate_rir_ism "torchaudio.prototype.functional.simulate_rir_ism") | Compute Room Impulse Response (RIR) based on the *image source method* [[Allen and Berkley, 1979](references.html#id63 "Jont B Allen and David A Berkley. Image method for efficiently simulating small-room acoustics. 
The Journal of the Acoustical Society
    of America, 65(4):943–950, 1979.")]. |'
+ id: totrans-14
  prefs: []
  type: TYPE_TB
+ zh: '| [`simulate_rir_ism`](generated/torchaudio.prototype.functional.simulate_rir_ism.html#torchaudio.prototype.functional.simulate_rir_ism
+   "torchaudio.prototype.functional.simulate_rir_ism") | 基于*图像源方法*[[Allen and Berkley,
+   1979](references.html#id63 "Jont B Allen and David A Berkley. Image method for
+   efficiently simulating small-room acoustics. The Journal of the Acoustical Society
+   of America, 65(4):943–950, 1979.")]计算房间脉冲响应(RIR)。 |'
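As a rough illustration of how the DSP helpers above compose, here is a minimal sketch that synthesizes an envelope-shaped tone. It assumes the prototype API is available in your torchaudio build; these are prototype functions, so names and signatures may change between releases.

```python
import torch
from torchaudio.prototype.functional import adsr_envelope, oscillator_bank

# Synthesize one second of a 440 Hz tone shaped by an ADSR envelope.
# Shapes follow the documented convention: (..., time, num_oscillators).
sample_rate = 16000
num_frames = sample_rate  # 1 second
freq = torch.full((num_frames, 1), 440.0)      # instantaneous frequency per frame
amp = adsr_envelope(num_frames).unsqueeze(-1)  # (time, 1) amplitude envelope
waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)  # (time,)
```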
diff --git a/totrans/aud22_67.yaml b/totrans/aud22_67.yaml
index 841bb4dbbf3df3b742f5ccfb2c812834ad758d3a..ce578fc954d88ddadcbca4345a0081ef01dda94e 100644
--- a/totrans/aud22_67.yaml
+++ b/totrans/aud22_67.yaml
@@ -1,35 +1,54 @@
- en: torchaudio.prototype.models
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.prototype.models
- en: 原文:[https://pytorch.org/audio/stable/prototype.models.html](https://pytorch.org/audio/stable/prototype.models.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/prototype.models.html](https://pytorch.org/audio/stable/prototype.models.html)
- en: The `torchaudio.prototype.models` subpackage contains definitions of models
    for addressing common audio tasks.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: '`torchaudio.prototype.models`子包包含用于处理常见音频任务的模型定义。'
- en: Note
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: For models with pre-trained parameters, please refer to [`torchaudio.prototype.pipelines`](prototype.pipelines.html#module-torchaudio.prototype.pipelines
    "torchaudio.prototype.pipelines") module.
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 对于具有预训练参数的模型,请参考[`torchaudio.prototype.pipelines`](prototype.pipelines.html#module-torchaudio.prototype.pipelines
+   "torchaudio.prototype.pipelines")模块。
- en: Model definitions are responsible for constructing computation graphs and executing
    them.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 模型定义负责构建计算图并执行它们。
- en: Some models have complex structure and variations. For such models, factory
    functions are provided.
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 一些模型具有复杂的结构和变体。对于这样的模型,提供了工厂函数。
- en: '| [`ConformerWav2Vec2PretrainModel`](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#torchaudio.prototype.models.ConformerWav2Vec2PretrainModel
    "torchaudio.prototype.models.ConformerWav2Vec2PretrainModel") | Conformer Wav2Vec2
    pre-train model for training from scratch. |'
+ id: totrans-7
  prefs: []
  type: TYPE_TB
+ zh: '| [`ConformerWav2Vec2PretrainModel`](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#torchaudio.prototype.models.ConformerWav2Vec2PretrainModel
+   "torchaudio.prototype.models.ConformerWav2Vec2PretrainModel") | Conformer Wav2Vec2预训练模型,用于从头开始训练。
+   |'
- en: '| [`ConvEmformer`](generated/torchaudio.prototype.models.ConvEmformer.html#torchaudio.prototype.models.ConvEmformer
    "torchaudio.prototype.models.ConvEmformer") | Implements the convolution-augmented
    streaming transformer architecture introduced in *Streaming Transformer Transducer
@@ -40,8 +59,18 @@
    In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), volume, 8277-8281\. 2022\. doi:10.1109/ICASSP43922.2022.9747706.")].
    |'
+ id: totrans-8
  prefs: []
  type: TYPE_TB
+ zh: '| [`ConvEmformer`](generated/torchaudio.prototype.models.ConvEmformer.html#torchaudio.prototype.models.ConvEmformer
+   "torchaudio.prototype.models.ConvEmformer") | 实现了*Streaming Transformer Transducer
+   based Speech Recognition Using Non-Causal Convolution*中引入的卷积增强流式Transformer架构[[Shi等人,2022](references.html#id31
+   "Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang,
+   Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, and Mike Seltzer.
+   Streaming transformer transducer based speech recognition using non-causal convolution.
+   In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal
+   Processing (ICASSP), volume, 8277-8281\. 2022\. doi:10.1109/ICASSP43922.2022.9747706.")].
+   |'
- en: '| [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder
    "torchaudio.prototype.models.HiFiGANVocoder") | Generator part of *HiFi GAN*
    [[Kong *et al.*, 2020](references.html#id57 "Jungil Kong, Jaehyeon Kim, and Jaekyoung
@@ -50,26 +79,49 @@
    Lin, editors, Advances in Neural Information Processing Systems, volume 33, 17022–17033\.
    Curran Associates, Inc., 2020\. URL: https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf.")].
    |'
+ id: totrans-9
  prefs: []
  type: TYPE_TB
+ zh: '| [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder
+   "torchaudio.prototype.models.HiFiGANVocoder") | *HiFi GAN*的生成器部分[[Kong等人,2020](references.html#id57
+   "Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. Hifi-gan: generative adversarial
+   networks for efficient and high fidelity speech synthesis. In H. Larochelle, M.
+   Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information
+   Processing Systems, volume 33, 17022–17033\. Curran Associates, Inc., 2020\. URL:
+   https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf.")].
+   |'
- en: Prototype Factory Functions of Beta Models[](#prototype-factory-functions-of-beta-models
    "Permalink to this heading")
+ id: totrans-10
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: Beta模型的原型工厂函数[](#prototype-factory-functions-of-beta-models "Permalink to this
+   heading")
- en: Some model definitions are in beta, but there are new factory functions that
    are still in prototype. Please check the “Prototype Factory Functions” section
    in each model.
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 一些模型定义已处于Beta阶段,但有些新的工厂函数仍处于原型阶段。请查看每个模型文档中的“Prototype Factory Functions”部分。
- en: '| [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model
    "torchaudio.models.Wav2Vec2Model") | Acoustic model used in *wav2vec 2.0* [[Baevski
    *et al.*, 2020](references.html#id15 "Alexei Baevski, Henry Zhou, Abdelrahman
    Mohamed, and Michael Auli. Wav2vec 2.0: a framework for self-supervised learning
    of speech representations. 2020\. arXiv:2006.11477.")]. |'
+ id: totrans-12
  prefs: []
  type: TYPE_TB
+ zh: '| [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model
+   "torchaudio.models.Wav2Vec2Model") | *wav2vec 2.0*中使用的声学模型[[Baevski等人,2020](references.html#id15
+   "Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. Wav2vec 2.0:
+   a framework for self-supervised learning of speech representations. 2020\. arXiv:2006.11477.")].
+   |'
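As a hedged sketch of the factory-function pattern just described, the beta `torchaudio.models.wav2vec2_base()` factory can illustrate the idea (the prototype Conformer variants follow the same pattern; exact argument names may differ by version, and `29` is an arbitrary example vocabulary size):

```python
import torch
from torchaudio.models import wav2vec2_base

# Build an untrained base wav2vec 2.0 model; aux_num_out attaches a linear
# head on top of the encoder (e.g., for CTC over a 29-symbol vocabulary).
model = wav2vec2_base(aux_num_out=29)
waveform = torch.randn(1, 16000)   # (batch, time), one second at 16 kHz
logits, lengths = model(waveform)  # logits: (batch, frames, aux_num_out)
```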
- en: '| [`RNNT`](generated/torchaudio.models.RNNT.html#torchaudio.models.RNNT "torchaudio.models.RNNT")
    | Recurrent neural network transducer (RNN-T) model. |'
+ id: totrans-13
  prefs: []
  type: TYPE_TB
+ zh: '| [`RNNT`](generated/torchaudio.models.RNNT.html#torchaudio.models.RNNT "torchaudio.models.RNNT")
+   | 循环神经网络转导器(RNN-T)模型。 |'
diff --git a/totrans/aud22_68.yaml b/totrans/aud22_68.yaml
index a60d6db47b48b51bc297e11026d950ff86193165..fb3c88cd560e0696aae95d9c8e7f95099c62486b 100644
--- a/totrans/aud22_68.yaml
+++ b/totrans/aud22_68.yaml
@@ -1,74 +1,116 @@
- en: torchaudio.prototype.pipelines
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.prototype.pipelines
- en: 原文:[https://pytorch.org/audio/stable/prototype.pipelines.html](https://pytorch.org/audio/stable/prototype.pipelines.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/prototype.pipelines.html](https://pytorch.org/audio/stable/prototype.pipelines.html)
- en: The pipelines subpackage contains APIs to models with pretrained weights and
    relevant utilities.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: pipelines子包包含带有预训练权重的模型的API以及相关实用程序。
- en: RNN-T Streaming/Non-Streaming ASR[](#rnn-t-streaming-non-streaming-asr "Permalink
    to this heading")
+ id: totrans-3
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: RNN-T流式/非流式ASR[](#rnn-t-streaming-non-streaming-asr "跳转到此标题")
- en: Pretrained Models[](#pretrained-models "Permalink to this heading")
+ id: totrans-4
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 预训练模型[](#pretrained-models "跳转到此标题")
- en: '| [`EMFORMER_RNNT_BASE_MUSTC`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC
    "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC") | Pre-trained Emformer-RNNT-based
    ASR pipeline capable of performing both streaming and non-streaming inference.
    |'
+ id: totrans-5
  prefs: []
  type: TYPE_TB
+ zh: '| [`EMFORMER_RNNT_BASE_MUSTC`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC
+   "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC") | 基于Emformer-RNNT的预训练ASR管道,能够执行流式和非流式推断。
+   |'
- en: '| [`EMFORMER_RNNT_BASE_TEDLIUM3`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3
    "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3") | Pre-trained Emformer-RNNT-based
    ASR pipeline capable of performing both streaming and non-streaming inference.
    |'
+ id: totrans-6
  prefs: []
  type: TYPE_TB
+ zh: '| [`EMFORMER_RNNT_BASE_TEDLIUM3`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3
+   "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3") | 基于Emformer-RNNT的预训练ASR管道,能够执行流式和非流式推断。
+   |'
- en: HiFiGAN Vocoder[](#hifigan-vocoder "Permalink to this heading")
+ id: totrans-7
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: HiFiGAN Vocoder[](#hifigan-vocoder "跳转到此标题")
- en: Interface[](#interface "Permalink to this heading")
+ id: totrans-8
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 接口[](#interface "跳转到此标题")
- en: '[`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle
    "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") defines HiFiGAN Vocoder
    pipeline capable of transforming mel spectrograms into waveforms.'
+ id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: '[`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle + "torchaudio.prototype.pipelines.HiFiGANVocoderBundle")定义了能够将mel频谱图转换为波形的HiFiGAN + Vocoder管道。' - en: '| [`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") | Data class that bundles associated information to use pretrained [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder "torchaudio.prototype.models.HiFiGANVocoder"). |' + id: totrans-10 prefs: [] type: TYPE_TB + zh: '| [`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle + "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") | 数据类,捆绑了与预训练[`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder + "torchaudio.prototype.models.HiFiGANVocoder")相关的信息。 |' - en: Pretrained Models[](#id1 "Permalink to this heading") + id: totrans-11 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 预训练模型[](#id1 "跳转到此标题") - en: '| [`HIFIGAN_VOCODER_V3_LJSPEECH`](generated/torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH.html#torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH "torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH") | HiFiGAN Vocoder pipeline, trained on *The LJ Speech Dataset* [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")]. |' + id: totrans-12 prefs: [] type: TYPE_TB + zh: '| [`HIFIGAN_VOCODER_V3_LJSPEECH`](generated/torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH.html#torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH + "torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH") | HiFiGAN Vocoder管道,训练于*LJ + Speech数据集*[[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. + The lj speech dataset. \url https://keithito.com/LJ-Speech-Dataset/, 2017.")]。 + |' - en: VGGish[](#vggish "Permalink to this heading") + id: totrans-13 prefs: - PREF_H2 type: TYPE_NORMAL + zh: VGGish[](#vggish "跳转到此标题") - en: Interface[](#id3 "Permalink to this heading") + id: totrans-14 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 接口[](#id3 "跳转到此标题") - en: '| [`VGGishBundle`](generated/torchaudio.prototype.pipelines.VGGishBundle.html#torchaudio.prototype.pipelines.VGGishBundle "torchaudio.prototype.pipelines.VGGishBundle") | VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, @@ -78,8 +120,17 @@ 2017\. URL: https://arxiv.org/abs/1609.09430.")] inference pipeline ported from [torchvggish](https://github.com/harritaylor/torchvggish) and [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset). |' + id: totrans-15 prefs: [] type: TYPE_TB + zh: '| [`VGGishBundle`](generated/torchaudio.prototype.pipelines.VGGishBundle.html#torchaudio.prototype.pipelines.VGGishBundle + "torchaudio.prototype.pipelines.VGGishBundle") | VGGish[[Hershey等人,2017](references.html#id70 + "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, + Channing Moore, Manoj Plakal, Devin Platt, Rif A. 
Saurous, Bryan Seybold, Malcolm + Slaney, Ron Weiss, and Kevin Wilson. Cnn architectures for large-scale audio classification. + In International Conference on Acoustics, Speech and Signal Processing (ICASSP). + 2017\. URL: https://arxiv.org/abs/1609.09430.")]推断管道,从[torchvggish](https://github.com/harritaylor/torchvggish)和[tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset)移植而来。 + |' - en: '| [`VGGishBundle.VGGish`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#torchaudio.prototype.pipelines.VGGishBundle.VGGish "torchaudio.prototype.pipelines.VGGishBundle.VGGish") | Implementation of VGGish model [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, @@ -88,17 +139,31 @@ Wilson. Cnn architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017\. URL: https://arxiv.org/abs/1609.09430.")]. |' + id: totrans-16 prefs: [] type: TYPE_TB + zh: '| [`VGGishBundle.VGGish`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#torchaudio.prototype.pipelines.VGGishBundle.VGGish + "torchaudio.prototype.pipelines.VGGishBundle.VGGish") | VGGish模型的实现[[Hershey等人,2017](references.html#id70 + "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, + Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm + Slaney, Ron Weiss, and Kevin Wilson. Cnn architectures for large-scale audio classification. + In International Conference on Acoustics, Speech and Signal Processing (ICASSP). + 2017\. URL: https://arxiv.org/abs/1609.09430.")]。 |' - en: '| [`VGGishBundle.VGGishInputProcessor`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor "torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor") | Converts raw waveforms to batches of examples to use as inputs to VGGish. |' + id: totrans-17 prefs: [] type: TYPE_TB + zh: '| [`VGGishBundle.VGGishInputProcessor`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor + "torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor") | 将原始波形转换为用作VGGish输入的示例批次。 + |' - en: Pretrained Models[](#id6 "Permalink to this heading") + id: totrans-18 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 预训练模型 - en: '| [`VGGISH`](generated/torchaudio.prototype.pipelines.VGGISH.html#torchaudio.prototype.pipelines.VGGISH "torchaudio.prototype.pipelines.VGGISH") | Pre-trained VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, @@ -109,5 +174,15 @@ inference pipeline ported from [torchvggish](https://github.com/harritaylor/torchvggish) and [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset). |' + id: totrans-19 prefs: [] type: TYPE_TB + zh: '| [`VGGISH`](generated/torchaudio.prototype.pipelines.VGGISH.html#torchaudio.prototype.pipelines.VGGISH + "torchaudio.prototype.pipelines.VGGISH") | 从 [torchvggish](https://github.com/harritaylor/torchvggish) + 和 [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset) + 移植的预训练VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish + Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj + Plakal, Devin Platt, Rif A. 
Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss,
+   and Kevin Wilson. Cnn architectures for large-scale audio classification. In International
+   Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017\. URL: https://arxiv.org/abs/1609.09430.")]
+   推理流程。 |'
diff --git a/totrans/aud22_69.yaml b/totrans/aud22_69.yaml
index 8ea562aeeda6eeb46a0d39a16df6dbdcd7885d9d..307c8f833687554ce7583523e9afa4c5bf9c6c63 100644
--- a/totrans/aud22_69.yaml
+++ b/totrans/aud22_69.yaml
@@ -1,33 +1,53 @@
- en: torchaudio.prototype.transforms
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torchaudio.prototype.transforms
- en: 原文:[https://pytorch.org/audio/stable/prototype.transforms.html](https://pytorch.org/audio/stable/prototype.transforms.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/prototype.transforms.html](https://pytorch.org/audio/stable/prototype.transforms.html)
- en: '| [`BarkScale`](generated/torchaudio.prototype.transforms.BarkScale.html#torchaudio.prototype.transforms.BarkScale
    "torchaudio.prototype.transforms.BarkScale") | Turn a normal STFT into a bark
    frequency STFT with triangular filter banks. |'
+ id: totrans-2
  prefs: []
  type: TYPE_TB
+ zh: '| [`BarkScale`](generated/torchaudio.prototype.transforms.BarkScale.html#torchaudio.prototype.transforms.BarkScale
+   "torchaudio.prototype.transforms.BarkScale") | 将普通的STFT转换为具有三角滤波器组的Bark频率STFT。 |'
- en: '| [`BarkSpectrogram`](generated/torchaudio.prototype.transforms.BarkSpectrogram.html#torchaudio.prototype.transforms.BarkSpectrogram
    "torchaudio.prototype.transforms.BarkSpectrogram") | Create BarkSpectrogram for
    a raw audio signal. |'
+ id: totrans-3
  prefs: []
  type: TYPE_TB
+ zh: '| [`BarkSpectrogram`](generated/torchaudio.prototype.transforms.BarkSpectrogram.html#torchaudio.prototype.transforms.BarkSpectrogram
+   "torchaudio.prototype.transforms.BarkSpectrogram") | 为原始音频信号创建BarkSpectrogram。
+   |'
- en: '| [`ChromaScale`](generated/torchaudio.prototype.transforms.ChromaScale.html#torchaudio.prototype.transforms.ChromaScale
    "torchaudio.prototype.transforms.ChromaScale") | Converts spectrogram to chromagram.
    |'
+ id: totrans-4
  prefs: []
  type: TYPE_TB
+ zh: '| [`ChromaScale`](generated/torchaudio.prototype.transforms.ChromaScale.html#torchaudio.prototype.transforms.ChromaScale
+   "torchaudio.prototype.transforms.ChromaScale") | 将频谱图转换为色谱图。 |'
- en: '| [`ChromaSpectrogram`](generated/torchaudio.prototype.transforms.ChromaSpectrogram.html#torchaudio.prototype.transforms.ChromaSpectrogram
    "torchaudio.prototype.transforms.ChromaSpectrogram") | Generates chromagram for
    audio signal. |'
+ id: totrans-5
  prefs: []
  type: TYPE_TB
+ zh: '| [`ChromaSpectrogram`](generated/torchaudio.prototype.transforms.ChromaSpectrogram.html#torchaudio.prototype.transforms.ChromaSpectrogram
+   "torchaudio.prototype.transforms.ChromaSpectrogram") | 为音频信号生成色谱图。 |'
- en: '| [`InverseBarkScale`](generated/torchaudio.prototype.transforms.InverseBarkScale.html#torchaudio.prototype.transforms.InverseBarkScale
    "torchaudio.prototype.transforms.InverseBarkScale") | Estimate an STFT in normal
    frequency domain from bark frequency domain. |'
+ id: totrans-6
  prefs: []
  type: TYPE_TB
+ zh: '| [`InverseBarkScale`](generated/torchaudio.prototype.transforms.InverseBarkScale.html#torchaudio.prototype.transforms.InverseBarkScale
+   "torchaudio.prototype.transforms.InverseBarkScale") | 从Bark频率域估计普通频率域中的STFT。 |'
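As a minimal sketch of the Bark-scale transforms listed above, the following assumes `BarkSpectrogram` mirrors `torchaudio.transforms.MelSpectrogram`-style arguments, with `n_barks` as the Bark-band count; since the API is prototype, argument names may differ in your version:

```python
import torch
from torchaudio.prototype.transforms import BarkSpectrogram

sample_rate = 16000
transform = BarkSpectrogram(sample_rate=sample_rate, n_fft=1024, n_barks=64)
waveform = torch.randn(1, sample_rate)  # (channel, time)
bark_spec = transform(waveform)         # (channel, n_barks, frames)
```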
diff --git a/totrans/aud22_70.yaml b/totrans/aud22_70.yaml
index 7e5eaa479c4c2f80cc6488377d0710ae36549428..f72102290a768c2721f448737967cd58d81ac851 100644
--- a/totrans/aud22_70.yaml
+++ b/totrans/aud22_70.yaml
@@ -1,4 +1,6 @@
- en: C++ Prototype API Reference
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: C++ 原型 API 参考
diff --git a/totrans/aud22_71.yaml b/totrans/aud22_71.yaml
index 9c5ea5fa15eafc13a33f2da335f0aa589a45028d..14300fb571f247920686f5fd52e583f8728ef54b 100644
--- a/totrans/aud22_71.yaml
+++ b/totrans/aud22_71.yaml
@@ -1,261 +1,355 @@
- en: libtorio
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: libtorio
- en: 原文:[https://pytorch.org/audio/stable/libtorio.html](https://pytorch.org/audio/stable/libtorio.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/libtorio.html](https://pytorch.org/audio/stable/libtorio.html)
- en: Warning
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 警告
- en: TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility
    is not guaranteed.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchAudio的C++ API是一个原型功能。不保证API/ABI向后兼容性。
- en: '[torio::io::StreamingMediaDecoder](libtorio.stream_reader.html)'
+ id: totrans-4
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[torio::io::StreamingMediaDecoder](libtorio.stream_reader.html)'
- en: '[Constructors](libtorio.stream_reader.html#constructors)'
+ id: totrans-5
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[构造函数](libtorio.stream_reader.html#constructors)'
- en: '[StreamingMediaDecoder](libtorio.stream_reader.html#streamingmediadecoder)'
+ id: totrans-6
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[StreamingMediaDecoder](libtorio.stream_reader.html#streamingmediadecoder)'
- en: '[StreamingMediaDecoderCustomIO](libtorio.stream_reader.html#streamingmediadecodercustomio)'
+ id: totrans-7
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[StreamingMediaDecoderCustomIO](libtorio.stream_reader.html#streamingmediadecodercustomio)'
- en: '[Query Methods](libtorio.stream_reader.html#query-methods)'
+ id: totrans-8
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[查询方法](libtorio.stream_reader.html#query-methods)'
- en: '[find_best_audio_stream](libtorio.stream_reader.html#find-best-audio-stream)'
+ id: totrans-9
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[查找最佳音频流](libtorio.stream_reader.html#find-best-audio-stream)'
- en: '[find_best_video_stream](libtorio.stream_reader.html#find-best-video-stream)'
+ id: totrans-10
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[查找最佳视频流](libtorio.stream_reader.html#find-best-video-stream)'
- en: '[get_metadata](libtorio.stream_reader.html#get-metadata)'
+ id: totrans-11
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[获取元数据](libtorio.stream_reader.html#get-metadata)'
- en: '[num_src_streams](libtorio.stream_reader.html#num-src-streams)'
+ id: totrans-12
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[源流数量](libtorio.stream_reader.html#num-src-streams)'
- en: '[get_src_stream_info](libtorio.stream_reader.html#get-src-stream-info)'
+ id: totrans-13
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[获取源流信息](libtorio.stream_reader.html#get-src-stream-info)'
- en:
'[num_out_streams](libtorio.stream_reader.html#num-out-streams)' + id: totrans-14 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[输出流数量](libtorio.stream_reader.html#num-out-streams)' - en: '[get_out_stream_info](libtorio.stream_reader.html#get-out-stream-info)' + id: totrans-15 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[获取输出流信息](libtorio.stream_reader.html#get-out-stream-info)' - en: '[is_buffer_ready](libtorio.stream_reader.html#is-buffer-ready)' + id: totrans-16 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[缓冲区是否准备就绪](libtorio.stream_reader.html#is-buffer-ready)' - en: '[Configure Methods](libtorio.stream_reader.html#configure-methods)' + id: totrans-17 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[配置方法](libtorio.stream_reader.html#configure-methods)' - en: '[add_audio_stream](libtorio.stream_reader.html#add-audio-stream)' + id: totrans-18 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[添加音频流](libtorio.stream_reader.html#add-audio-stream)' - en: '[add_video_stream](libtorio.stream_reader.html#add-video-stream)' + id: totrans-19 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[添加视频流](libtorio.stream_reader.html#add-video-stream)' - en: '[remove_stream](libtorio.stream_reader.html#remove-stream)' + id: totrans-20 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[移除流](libtorio.stream_reader.html#remove-stream)' - en: '[Stream Methods](libtorio.stream_reader.html#stream-methods)' + id: totrans-21 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[流方法](libtorio.stream_reader.html#stream-methods)' - en: '[seek](libtorio.stream_reader.html#seek)' + id: totrans-22 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[寻找](libtorio.stream_reader.html#seek)' - en: '[process_packet](libtorio.stream_reader.html#process-packet)' + id: totrans-23 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[处理数据包](libtorio.stream_reader.html#process-packet)' - en: '[process_packet_block](libtorio.stream_reader.html#process-packet-block)' + id: totrans-24 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[处理数据块](libtorio.stream_reader.html#process-packet-block)' - en: '[process_all_packets](libtorio.stream_reader.html#process-all-packets)' + id: totrans-25 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[处理所有数据包](libtorio.stream_reader.html#process-all-packets)' - en: '[fill_buffer](libtorio.stream_reader.html#fill-buffer)' + id: totrans-26 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[填充缓冲区](libtorio.stream_reader.html#fill-buffer)' - en: '[Retrieval Methods](libtorio.stream_reader.html#retrieval-methods)' + id: totrans-27 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[检索方法](libtorio.stream_reader.html#retrieval-methods)' - en: '[pop_chunks](libtorio.stream_reader.html#pop-chunks)' + id: totrans-28 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[弹出块](libtorio.stream_reader.html#pop-chunks)' - en: '[Support Structures](libtorio.stream_reader.html#support-structures)' + id: totrans-29 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[支持结构](libtorio.stream_reader.html#support-structures)' - en: '[Chunk](libtorio.stream_reader.html#chunk)' + id: totrans-30 prefs: - PREF_IND - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '[块](libtorio.stream_reader.html#chunk)' - en: '[SrcStreaminfo](libtorio.stream_reader.html#srcstreaminfo)' + id: totrans-31 prefs: - PREF_IND - PREF_IND - PREF_UL type: 
TYPE_NORMAL
+ zh: '[源流信息](libtorio.stream_reader.html#srcstreaminfo)'
- en: '[OutputStreaminfo](libtorio.stream_reader.html#outputstreaminfo)'
+ id: totrans-32
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[输出流信息](libtorio.stream_reader.html#outputstreaminfo)'
- en: '[torio::io::StreamingMediaEncoder](libtorio.stream_writer.html)'
+ id: totrans-33
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[torio::io::StreamingMediaEncoder](libtorio.stream_writer.html)'
- en: '[Constructors](libtorio.stream_writer.html#constructors)'
+ id: totrans-34
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[构造函数](libtorio.stream_writer.html#constructors)'
- en: '[StreamingMediaEncoder](libtorio.stream_writer.html#streamingmediaencoder)'
+ id: totrans-35
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[StreamingMediaEncoder](libtorio.stream_writer.html#streamingmediaencoder)'
- en: '[StreamingMediaEncoderCustomIO](libtorio.stream_writer.html#streamingmediaencodercustomio)'
+ id: totrans-36
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[StreamingMediaEncoderCustomIO](libtorio.stream_writer.html#streamingmediaencodercustomio)'
- en: '[Config methods](libtorio.stream_writer.html#config-methods)'
+ id: totrans-37
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[配置方法](libtorio.stream_writer.html#config-methods)'
- en: '[add_audio_stream](libtorio.stream_writer.html#add-audio-stream)'
+ id: totrans-38
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[添加音频流](libtorio.stream_writer.html#add-audio-stream)'
- en: '[add_video_stream](libtorio.stream_writer.html#add-video-stream)'
+ id: totrans-39
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[添加视频流](libtorio.stream_writer.html#add-video-stream)'
- en: '[set_metadata](libtorio.stream_writer.html#set-metadata)'
+ id: totrans-40
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[设置元数据](libtorio.stream_writer.html#set-metadata)'
- en: '[Write methods](libtorio.stream_writer.html#write-methods)'
+ id: totrans-41
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[写方法](libtorio.stream_writer.html#write-methods)'
- en: '[open](libtorio.stream_writer.html#open)'
+ id: totrans-42
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[打开](libtorio.stream_writer.html#open)'
- en: '[close](libtorio.stream_writer.html#close)'
+ id: totrans-43
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[关闭](libtorio.stream_writer.html#close)'
- en: '[write_audio_chunk](libtorio.stream_writer.html#write-audio-chunk)'
+ id: totrans-44
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[写音频块](libtorio.stream_writer.html#write-audio-chunk)'
- en: '[write_video_chunk](libtorio.stream_writer.html#write-video-chunk)'
+ id: totrans-45
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[写视频块](libtorio.stream_writer.html#write-video-chunk)'
- en: '[flush](libtorio.stream_writer.html#flush)'
+ id: totrans-46
  prefs:
  - PREF_IND
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[刷新](libtorio.stream_writer.html#flush)'
diff --git a/totrans/aud22_72.yaml b/totrans/aud22_72.yaml
index f5db35c6b4e91bcf80a0ed8f7fef8d74ff530437..4e4b7e2fb020e02d0992e56bafd25fb949e74868 100644
--- a/totrans/aud22_72.yaml
+++ b/totrans/aud22_72.yaml
@@ -1,433 +1,676 @@
- en: torio::io::StreamingMediaDecoder
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torio::io::StreamingMediaDecoder
- en: 原文:[https://pytorch.org/audio/stable/libtorio.stream_reader.html](https://pytorch.org/audio/stable/libtorio.stream_reader.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/libtorio.stream_reader.html](https://pytorch.org/audio/stable/libtorio.stream_reader.html)
- en: Warning
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 警告
- en: TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility
    is not guaranteed.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchAudio 的 C++ API 是一个原型功能。API/ABI 的向后兼容性不能保证。
- en: Note
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: The top-level namespace has been changed from `torchaudio` to `torio`. `StreamReader`
    has been renamed to `StreamingMediaDecoder`.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 顶层命名空间已从 `torchaudio` 更改为 `torio`。`StreamReader` 已重命名为 `StreamingMediaDecoder`。
- en: '`StreamingMediaDecoder` is the implementation used by the Python equivalent
    and provides a similar interface. When working with custom I/O, such as in-memory
    data, the `StreamingMediaDecoderCustomIO` class can be used.'
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: '`StreamingMediaDecoder` 是 Python 等效实现所使用的实现,提供类似的接口。在使用自定义 I/O(例如内存数据)时,可以使用 `StreamingMediaDecoderCustomIO`
+   类。'
- en: Both classes have the same methods defined, so their usages are the same.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 这两个类定义了相同的方法,因此它们的用法相同。
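For orientation, a minimal chunk-by-chunk decoding sketch using the Python equivalent, `torchaudio.io.StreamReader` (the `"input.mp4"` path is a placeholder):

```python
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")  # hypothetical source file
# Configure one output audio stream: 4096-frame chunks, resampled to 16 kHz.
reader.add_basic_audio_stream(frames_per_chunk=4096, sample_rate=16000)
for (chunk,) in reader.stream():
    pass  # chunk: Tensor of shape (frames, channels)
```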
- en: Constructors[](#constructors "Permalink to this heading")
+ id: totrans-8
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 构造函数[](#constructors "跳转到此标题")
- en: StreamingMediaDecoder[](#streamingmediadecoder "Permalink to this heading")
+ id: totrans-9
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: StreamingMediaDecoder
- en: class StreamingMediaDecoder[](#_CPPv4N5torio2io21StreamingMediaDecoderE "Permalink
    to this definition")
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 类 `StreamingMediaDecoder`
- en: Fetch and decode audio/video streams chunk by chunk.
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 逐块获取和解码音频/视频流。
- en: Subclassed by [torio::io::StreamingMediaDecoderCustomIO](#classtorio_1_1io_1_1StreamingMediaDecoderCustomIO)
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 被 [torio::io::StreamingMediaDecoderCustomIO](#classtorio_1_1io_1_1StreamingMediaDecoderCustomIO)
+   继承
- en: explicit torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")::StreamingMediaDecoder(const std::string
    &src, const c10::optional<std::string> &format = c10::nullopt, const c10::optional<OptionDict>
    &option = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaDecoder21StreamingMediaDecoderERKNSt6stringERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEE
    "Permalink to this definition")
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: explicit torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
+   "torio::io::StreamingMediaDecoder")::StreamingMediaDecoder(const std::string &src,
+   const c10::optional<std::string> &format = c10::nullopt, const c10::optional<OptionDict> &option = c10::nullopt)
- en: Construct media processor from source URI.
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 从源 URI 构造媒体处理器。
- en: 'Parameters:'
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**src** – URL of source media, in the format FFmpeg can understand.'
+ id: totrans-16
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**src** – 源媒体的 URL,格式为 FFmpeg 可理解的格式。'
- en: '**format** – Specifies format (such as mp4) or device (such as lavfi and avfoundation)'
+ id: totrans-17
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**format** – 指定格式(如 mp4)或设备(如 lavfi 和 avfoundation)。'
- en: '**option** – Custom option passed when initializing format context (opening
    source).'
+ id: totrans-18
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**option** – 在初始化格式上下文(打开源)时传递的自定义选项。'
- en: StreamingMediaDecoderCustomIO[](#streamingmediadecodercustomio "Permalink to
    this heading")
+ id: totrans-19
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: StreamingMediaDecoderCustomIO
- en: 'class StreamingMediaDecoderCustomIO : private detail::CustomInput, public
    torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")[](#_CPPv4N5torio2io29StreamingMediaDecoderCustomIOE
    "Permalink to this definition")'
+ id: totrans-20
  prefs: []
  type: TYPE_NORMAL
+ zh: 类 `StreamingMediaDecoderCustomIO`:以 private 方式继承 detail::CustomInput,以 public
+   方式继承 torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
+   "torio::io::StreamingMediaDecoder")
- en: A subclass of [StreamingMediaDecoder](#classtorio_1_1io_1_1StreamingMediaDecoder)
    which works with custom read function. Can be used for decoding media from memory
    or custom object.
+ id: totrans-21
  prefs: []
  type: TYPE_NORMAL
+ zh: '[StreamingMediaDecoder](#classtorio_1_1io_1_1StreamingMediaDecoder) 的子类,与自定义读取函数一起工作。可用于从内存或自定义对象解码媒体。'
- en: torio::io::[StreamingMediaDecoderCustomIO](#_CPPv4N5torio2io29StreamingMediaDecoderCustomIOE
    "torio::io::StreamingMediaDecoderCustomIO")::StreamingMediaDecoderCustomIO(void
    *opaque, const c10::optional<std::string> &format, int buffer_size, int (*read_packet)(void
    *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset,
    int whence) = nullptr, const c10::optional<OptionDict> &option = c10::nullopt)[](#_CPPv4N5torio2io29StreamingMediaDecoderCustomIO29StreamingMediaDecoderCustomIOEPvRKN3c108optionalINSt6stringEEEiPFiPvP7uint8_tiEPF7int64_tPv7int64_tiERKN3c108optionalI10OptionDictEE
    "Permalink to this definition")
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: torio::io::[StreamingMediaDecoderCustomIO](#_CPPv4N5torio2io29StreamingMediaDecoderCustomIOE
+   "torio::io::StreamingMediaDecoderCustomIO")::StreamingMediaDecoderCustomIO(void *opaque,
+   const c10::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size),
+   int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const c10::optional<OptionDict> &option = c10::nullopt)
- en: Construct [StreamingMediaDecoder](#classtorio_1_1io_1_1StreamingMediaDecoder)
    with custom read and seek functions.
+ id: totrans-23
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用自定义读取和定位(seek)函数构造 [StreamingMediaDecoder](#classtorio_1_1io_1_1StreamingMediaDecoder)。
- en: 'Parameters:'
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**opaque** – Custom data used by `read_packet` and `seek` functions.'
+ id: totrans-25
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**opaque** – `read_packet` 和 `seek` 函数使用的自定义数据。'
- en: '**format** – Specify input format.'
+ id: totrans-26
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**format** – 指定输入格式。'
- en: '**buffer_size** – The size of the intermediate buffer, which FFmpeg uses to
    pass data to function read_packet.'
+ id: totrans-27
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**buffer_size** – 中间缓冲区的大小,FFmpeg 用于将数据传递给 read_packet 函数。'
- en: '**read_packet** – Custom read function that is called from FFmpeg to read
    data from the source.'
+ id: totrans-28
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**read_packet** – 自定义读取函数,由 FFmpeg 调用以从源读取数据。'
- en: '**seek** – Optional seek function that is used to seek in the source.'
+ id: totrans-29
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**seek** – 可选的定位函数,用于在源中定位。'
- en: '**option** – Custom option passed when initializing format context.'
+ id: totrans-30
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**option** – 在初始化格式上下文时传递的自定义选项。'
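On the Python side, the same custom-I/O idea is exposed by letting `StreamReader` take a file-like object, whose `read()`/`seek()` play the role of `read_packet`/`seek` above. A sketch of decoding from an in-memory buffer (the file name and the `"wav"` format hint are assumptions about the payload):

```python
import io
from torchaudio.io import StreamReader

with open("input.wav", "rb") as f:  # hypothetical file
    payload = io.BytesIO(f.read())

reader = StreamReader(payload, format="wav")
reader.add_basic_audio_stream(frames_per_chunk=8000)
for (chunk,) in reader.stream():
    pass  # chunk: Tensor of shape (frames, channels)
```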
- en: Query Methods[](#query-methods "Permalink to this heading")
+ id: totrans-31
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 查询方法
- en: find_best_audio_stream[](#find-best-audio-stream "Permalink to this heading")
+ id: totrans-32
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: find_best_audio_stream
- en: int64_t torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")::find_best_audio_stream() const[](#_CPPv4NK5torio2io21StreamingMediaDecoder22find_best_audio_streamEv
    "Permalink to this definition")
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
+ zh: StreamingMediaDecoder类中的find_best_audio_stream()方法
- en: Find a suitable audio stream using heuristics from ffmpeg.
+ id: totrans-34
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用ffmpeg的启发式方法找到合适的音频流。
- en: If successful, the index of the best stream (>=0) is returned. Otherwise a
    negative value is returned.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果成功,返回最佳流的索引(大于等于0)。否则返回负值。
- en: find_best_video_stream[](#find-best-video-stream "Permalink to this heading")
+ id: totrans-36
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: find_best_video_stream
- en: int64_t torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")::find_best_video_stream() const[](#_CPPv4NK5torio2io21StreamingMediaDecoder22find_best_video_streamEv
    "Permalink to this definition")
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: StreamingMediaDecoder类中的find_best_video_stream()方法
- en: Find a suitable video stream using heuristics from ffmpeg.
+ id: totrans-38
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用ffmpeg的启发式方法找到合适的视频流。
- en: If successful, the index of the best stream (>=0) is returned. Otherwise a
    negative value is returned.
+ id: totrans-39
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果成功,返回最佳流的索引(大于等于0)。否则返回负值。
- en: get_metadata[](#get-metadata "Permalink to this heading")
+ id: totrans-40
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: get_metadata
- en: OptionDict torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")::get_metadata() const[](#_CPPv4NK5torio2io21StreamingMediaDecoder12get_metadataEv
    "Permalink to this definition")
+ id: totrans-41
  prefs: []
  type: TYPE_NORMAL
+ zh: StreamingMediaDecoder类中的get_metadata()方法
- en: Fetch metadata of the source media.
+ id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 获取源媒体的元数据。 - en: num_src_streams[](#num-src-streams "Permalink to this heading") + id: totrans-43 prefs: - PREF_H3 type: TYPE_NORMAL + zh: num_src_streams - en: int64_t torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE "torio::io::StreamingMediaDecoder")::num_src_streams() const[](#_CPPv4NK5torio2io21StreamingMediaDecoder15num_src_streamsEv "Permalink to this definition") + id: totrans-44 prefs: [] type: TYPE_NORMAL + zh: StreamingMediaDecoder类中的num_src_streams()方法 - en: Fetch the number of source streams found in the input media. + id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 获取输入媒体中找到的源流数量。 - en: The source streams include not only audio/video streams but also subtitle and others. + id: totrans-46 prefs: [] type: TYPE_NORMAL + zh: 源流不仅包括音频/视频流,还包括字幕等。 - en: get_src_stream_info[](#get-src-stream-info "Permalink to this heading") + id: totrans-47 prefs: - PREF_H3 type: TYPE_NORMAL + zh: get_src_stream_info - en: '[SrcStreamInfo](#_CPPv4N5torio2io13SrcStreamInfoE "torio::io::SrcStreamInfo") torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE "torio::io::StreamingMediaDecoder")::get_src_stream_info(int i) const[](#_CPPv4NK5torio2io21StreamingMediaDecoder19get_src_stream_infoEi "Permalink to this definition")' + id: totrans-48 prefs: [] type: TYPE_NORMAL + zh: StreamingMediaDecoder类中的get_src_stream_info()方法 - en: Fetch information about the specified source stream. + id: totrans-49 prefs: [] type: TYPE_NORMAL + zh: 获取指定源流的信息。 - en: The valid value range is `[0, [num_src_streams()](#classtorio_1_1io_1_1StreamingMediaDecoder_1a6b3e5fd480cc50ee5ec9b389641c4512))`. + id: totrans-50 prefs: [] type: TYPE_NORMAL + zh: 有效值范围为`[0, num_src_streams()]`。 - en: num_out_streams[](#num-out-streams "Permalink to this heading") + id: totrans-51 prefs: - PREF_H3 type: TYPE_NORMAL + zh: num_out_streams - en: int64_t torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE "torio::io::StreamingMediaDecoder")::num_out_streams() const[](#_CPPv4NK5torio2io21StreamingMediaDecoder15num_out_streamsEv "Permalink to this definition") + id: totrans-52 prefs: [] type: TYPE_NORMAL + zh: StreamingMediaDecoder类中的num_out_streams()方法 - en: Fetch the number of output streams defined by client code. + id: totrans-53 prefs: [] type: TYPE_NORMAL + zh: 获取客户端代码定义的输出流数量。 - en: get_out_stream_info[](#get-out-stream-info "Permalink to this heading") + id: totrans-54 prefs: - PREF_H3 type: TYPE_NORMAL + zh: get_out_stream_info - en: '[OutputStreamInfo](#_CPPv4N5torio2io16OutputStreamInfoE "torio::io::OutputStreamInfo") torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE "torio::io::StreamingMediaDecoder")::get_out_stream_info(int i) const[](#_CPPv4NK5torio2io21StreamingMediaDecoder19get_out_stream_infoEi "Permalink to this definition")' + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: StreamingMediaDecoder类中的get_out_stream_info()方法 - en: Fetch information about the specified output stream. + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 获取指定输出流的信息。 - en: The valid value range is `[0, [num_out_streams()](#classtorio_1_1io_1_1StreamingMediaDecoder_1a2675b80361ce5ac9da29bb63105f1135))`. 
+ id: totrans-57
  prefs: []
  type: TYPE_NORMAL
+ zh: 有效值范围为`[0, num_out_streams())`。
- en: is_buffer_ready[](#is-buffer-ready "Permalink to this heading")
+ id: totrans-58
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: is_buffer_ready
- en: bool torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")::is_buffer_ready() const[](#_CPPv4NK5torio2io21StreamingMediaDecoder15is_buffer_readyEv
    "Permalink to this definition")
+ id: totrans-59
  prefs: []
  type: TYPE_NORMAL
+ zh: StreamingMediaDecoder类中的is_buffer_ready()方法
- en: Check if all the buffers of the output streams have enough decoded frames.
+ id: totrans-60
  prefs: []
  type: TYPE_NORMAL
+ zh: 检查输出流的所有缓冲区是否有足够的解码帧。
- en: Configure Methods[](#configure-methods "Permalink to this heading")
+ id: totrans-61
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 配置方法
- en: add_audio_stream[](#add-audio-stream "Permalink to this heading")
+ id: totrans-62
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: add_audio_stream
- en: void torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
    "torio::io::StreamingMediaDecoder")::add_audio_stream(int64_t i, int64_t frames_per_chunk,
    int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt,
    const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict>
    &decoder_option = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaDecoder16add_audio_streamE7int64_t7int64_t7int64_tRKN3c108optionalINSt6stringEEERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEE
    "Permalink to this definition")
+ id: totrans-63
  prefs: []
  type: TYPE_NORMAL
+ zh: void torio::io::[StreamingMediaDecoder](#_CPPv4N5torio2io21StreamingMediaDecoderE
+   "torio::io::StreamingMediaDecoder")::add_audio_stream(int64_t i, int64_t frames_per_chunk,
+   int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt,
+   const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaDecoder16add_audio_streamE7int64_t7int64_t7int64_tRKN3c108optionalINSt6stringEEERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEE
+   "此定义的永久链接")
- en: Define an output audio stream.
+ id: totrans-64
  prefs: []
  type: TYPE_NORMAL
+ zh: 定义一个输出音频流。
- en: 'Parameters:'
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**i** – The index of the source stream.'
+ id: totrans-66
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**i** – 源流的索引。'
- en: '**frames_per_chunk** – Number of frames returned as one chunk.'
+ id: totrans-67
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**frames_per_chunk** – 作为一个块返回的帧数。'
- en: If a source stream is exhausted before `frames_per_chunk` frames are buffered,
    the chunk is returned as-is.
    Thus the number of frames in the chunk may be smaller than `frames_per_chunk`.
+ id: totrans-68
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 如果源流在缓冲`frames_per_chunk`帧之前耗尽,则该块将原样返回。因此,块中的帧数可能小于`frames_per_chunk`。
- en: '{'
+ id: totrans-69
  prefs: []
  type: TYPE_NORMAL
+ zh: '{'
- en: '"title": "foo",'
+ id: totrans-70
  prefs: []
  type: TYPE_NORMAL
+ zh: '"title": "foo",'
- en: '"artist": "bar",'
+ id: totrans-71
  prefs: []
  type: TYPE_NORMAL
+ zh: '"artist": "bar",'
- en: '"date": "2017"'
+ id: totrans-72
  prefs: []
  type: TYPE_NORMAL
+ zh: '"date": "2017"'
- en: '}'
+ id: totrans-73
  prefs: []
  type: TYPE_NORMAL
+ zh: '}'
- en: '```'
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
+ zh: '```'
- en: AUDIO-SPECIFIC MEMBERS
+ id: totrans-75
  prefs: []
  type: TYPE_NORMAL
+ zh: 音频特定成员
- en: double sample_rate = 0[](#_CPPv4N5torio2io13SrcStreamInfo11sample_rateE "Permalink
    to this definition")
+ id: totrans-76
  prefs: []
  type: TYPE_NORMAL
+ zh: double sample_rate = 0[](#_CPPv4N5torio2io13SrcStreamInfo11sample_rateE "此定义的永久链接")
- en: Sample rate.
+ id: totrans-77
  prefs: []
  type: TYPE_NORMAL
+ zh: 采样率。
- en: int num_channels = 0[](#_CPPv4N5torio2io13SrcStreamInfo12num_channelsE "Permalink
    to this definition")
+ id: totrans-78
  prefs: []
  type: TYPE_NORMAL
+ zh: int num_channels = 0[](#_CPPv4N5torio2io13SrcStreamInfo12num_channelsE "此定义的永久链接")
- en: The number of channels.
+ id: totrans-79
  prefs: []
  type: TYPE_NORMAL
+ zh: 通道数。
- en: VIDEO-SPECIFIC MEMBERS
+ id: totrans-80
  prefs: []
  type: TYPE_NORMAL
+ zh: 视频特定成员
- en: int width = 0[](#_CPPv4N5torio2io13SrcStreamInfo5widthE "Permalink to this
    definition")
+ id: totrans-81
  prefs: []
  type: TYPE_NORMAL
+ zh: int width = 0[](#_CPPv4N5torio2io13SrcStreamInfo5widthE "此定义的永久链接")
- en: Width.
+ id: totrans-82
  prefs: []
  type: TYPE_NORMAL
+ zh: 宽度。
- en: int height = 0[](#_CPPv4N5torio2io13SrcStreamInfo6heightE "Permalink to this
    definition")
+ id: totrans-83
  prefs: []
  type: TYPE_NORMAL
+ zh: int height = 0[](#_CPPv4N5torio2io13SrcStreamInfo6heightE "此定义的永久链接")
- en: Height.
+ id: totrans-84
  prefs: []
  type: TYPE_NORMAL
+ zh: 高度。
- en: double frame_rate = 0[](#_CPPv4N5torio2io13SrcStreamInfo10frame_rateE "Permalink
    to this definition")
+ id: totrans-85
  prefs: []
  type: TYPE_NORMAL
+ zh: double frame_rate = 0[](#_CPPv4N5torio2io13SrcStreamInfo10frame_rateE "此定义的永久链接")
- en: Frame rate.
+ id: totrans-86
  prefs: []
  type: TYPE_NORMAL
+ zh: 帧率。
- en: OutputStreaminfo[](#outputstreaminfo "Permalink to this heading")
+ id: totrans-87
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: OutputStreaminfo[](#outputstreaminfo "此标题的永久链接")
- en: struct OutputStreamInfo[](#_CPPv4N5torio2io16OutputStreamInfoE "Permalink to
    this definition")
+ id: totrans-88
  prefs: []
  type: TYPE_NORMAL
+ zh: 结构 OutputStreamInfo[](#_CPPv4N5torio2io16OutputStreamInfoE "此定义的永久链接")
- en: Information about output stream configured by user code.
+ id: totrans-89
  prefs: []
  type: TYPE_NORMAL
+ zh: 用户代码配置的输出流信息。
- en: AUDIO-SPECIFIC MEMBERS
+ id: totrans-90
  prefs: []
  type: TYPE_NORMAL
+ zh: 音频特定成员
- en: double sample_rate = -1[](#_CPPv4N5torio2io16OutputStreamInfo11sample_rateE
    "Permalink to this definition")
+ id: totrans-91
  prefs: []
  type: TYPE_NORMAL
+ zh: double sample_rate = -1[](#_CPPv4N5torio2io16OutputStreamInfo11sample_rateE "此定义的永久链接")
- en: Sample rate.
+ id: totrans-92
  prefs: []
  type: TYPE_NORMAL
+ zh: 采样率。
- en: int num_channels = -1[](#_CPPv4N5torio2io16OutputStreamInfo12num_channelsE
    "Permalink to this definition")
+ id: totrans-93
  prefs: []
  type: TYPE_NORMAL
+ zh: int num_channels = -1[](#_CPPv4N5torio2io16OutputStreamInfo12num_channelsE "此定义的永久链接")
- en: The number of channels.
+ id: totrans-94
  prefs: []
  type: TYPE_NORMAL
+ zh: 通道数。
- en: VIDEO-SPECIFIC MEMBERS
+ id: totrans-95
  prefs: []
  type: TYPE_NORMAL
+ zh: 视频特定成员
- en: int width = -1[](#_CPPv4N5torio2io16OutputStreamInfo5widthE "Permalink to this
    definition")
+ id: totrans-96
  prefs: []
  type: TYPE_NORMAL
+ zh: int width = -1[](#_CPPv4N5torio2io16OutputStreamInfo5widthE "此定义的永久链接")
- en: Width.
+ id: totrans-97
  prefs: []
  type: TYPE_NORMAL
+ zh: 宽度。
- en: int height = -1[](#_CPPv4N5torio2io16OutputStreamInfo6heightE "Permalink to
    this definition")
+ id: totrans-98
  prefs: []
  type: TYPE_NORMAL
+ zh: int height = -1[](#_CPPv4N5torio2io16OutputStreamInfo6heightE "此定义的永久链接")
- en: Height.
+ id: totrans-99
  prefs: []
  type: TYPE_NORMAL
+ zh: 高度。
- en: AVRational frame_rate = {0, 1}[](#_CPPv4N5torio2io16OutputStreamInfo10frame_rateE
    "Permalink to this definition")
+ id: totrans-100
  prefs: []
  type: TYPE_NORMAL
+ zh: AVRational frame_rate = {0, 1}[](#_CPPv4N5torio2io16OutputStreamInfo10frame_rateE
+   "此定义的永久链接")
- en: Frame rate.
+ id: totrans-101
  prefs: []
  type: TYPE_NORMAL
+ zh: 帧率。
- en: Public Members
+ id: totrans-102
  prefs: []
  type: TYPE_NORMAL
+ zh: 公共成员
- en: int source_index[](#_CPPv4N5torio2io16OutputStreamInfo12source_indexE "Permalink
    to this definition")
+ id: totrans-103
  prefs: []
  type: TYPE_NORMAL
+ zh: int source_index[](#_CPPv4N5torio2io16OutputStreamInfo12source_indexE "此定义的永久链接")
- en: The index of the input source stream.
+ id: totrans-104
  prefs: []
  type: TYPE_NORMAL
+ zh: 输入源流的索引。
- en: AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN[](#_CPPv4N5torio2io16OutputStreamInfo10media_typeE
    "Permalink to this definition")
+ id: totrans-105
  prefs: []
  type: TYPE_NORMAL
+ zh: AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN[](#_CPPv4N5torio2io16OutputStreamInfo10media_typeE
+   "此定义的永久链接")
- en: The stream media type.
+ id: totrans-106
  prefs: []
  type: TYPE_NORMAL
+ zh: 流媒体类型。
- en: Please refer to [the FFmpeg documentation](https://ffmpeg.org/doxygen/4.1/group__lavu__misc.html#ga9a84bba4713dfced21a1a56163be1f48)
    for the available values.
+ id: totrans-107
  prefs: []
  type: TYPE_NORMAL
+ zh: 请参阅[FFmpeg文档](https://ffmpeg.org/doxygen/4.1/group__lavu__misc.html#ga9a84bba4713dfced21a1a56163be1f48)以获取可用值。
- en: '*Todo:*'
+ id: totrans-108
  prefs: []
  type: TYPE_NORMAL
+ zh: '*待办事项:*'
- en: Introduce own enum and get rid of FFmpeg dependency
+ id: totrans-109
  prefs: []
  type: TYPE_NORMAL
+ zh: 引入自己的枚举并摆脱FFmpeg依赖
- en: int format = -1[](#_CPPv4N5torio2io16OutputStreamInfo6formatE "Permalink to
    this definition")
+ id: totrans-110
  prefs: []
  type: TYPE_NORMAL
+ zh: int format = -1[](#_CPPv4N5torio2io16OutputStreamInfo6formatE "此定义的永久链接")
- en: Media format. AVSampleFormat for audio or AVPixelFormat for video.
+ id: totrans-111
  prefs: []
  type: TYPE_NORMAL
+ zh: 媒体格式。音频的AVSampleFormat或视频的AVPixelFormat。
- en: std::string filter_description = {}[](#_CPPv4N5torio2io16OutputStreamInfo18filter_descriptionE
    "Permalink to this definition")
+ id: totrans-112
  prefs: []
  type: TYPE_NORMAL
+ zh: std::string filter_description = {}[](#_CPPv4N5torio2io16OutputStreamInfo18filter_descriptionE
+   "此定义的永久链接")
- en: Filter graph definition, such as `"aresample=16000,aformat=sample_fmts=fltp"`.
diff --git a/totrans/aud22_73.yaml b/totrans/aud22_73.yaml
index 7ca34e3e9120d8513d596f1bd4648e05d135e3ad..8475318b1fc602a43bfae455858d7b13efadd4b4 100644
--- a/totrans/aud22_73.yaml
+++ b/totrans/aud22_73.yaml
@@ -1,131 +1,204 @@
- en: torio::io::StreamingMediaEncoder
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: torio::io::StreamingMediaEncoder
- en: 原文:[https://pytorch.org/audio/stable/libtorio.stream_writer.html](https://pytorch.org/audio/stable/libtorio.stream_writer.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/audio/stable/libtorio.stream_writer.html](https://pytorch.org/audio/stable/libtorio.stream_writer.html)
- en: Warning
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 警告
- en: TorchAudio’s C++ API is a prototype feature. API/ABI backward compatibility
    is not guaranteed.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchAudio的C++ API是原型功能。不保证API/ABI向后兼容性。
- en: Note
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: The top-level namespace has been changed from `torchaudio` to `torio`. `StreamWriter`
    has been renamed to `StreamingMediaEncoder`.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 顶层命名空间已从`torchaudio`更改为`torio`。`StreamWriter`已更名为`StreamingMediaEncoder`。
- en: '`StreamingMediaEncoder` is the implementation used by the Python equivalent
    and provides a similar interface. When working with custom I/O, such as in-memory
    data, the `StreamingMediaEncoderCustomIO` class can be used.'
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: '`StreamingMediaEncoder`是Python等效实现使用的实现,并提供类似的接口。在处理自定义I/O(例如内存数据)时,可以使用`StreamingMediaEncoderCustomIO`类。'
- en: Both classes have the same methods defined, so their usages are the same.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 这两个类定义了相同的方法,因此它们的用法相同。
- en: Constructors[](#constructors "Permalink to this heading")
+ id: totrans-8
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 构造函数[](#constructors "Permalink to this heading")
- en: StreamingMediaEncoder[](#streamingmediaencoder "Permalink to this heading")
+ id: totrans-9
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: StreamingMediaEncoder[](#streamingmediaencoder "Permalink to this heading")
- en: class StreamingMediaEncoder[](#_CPPv4N5torio2io21StreamingMediaEncoderE "Permalink
    to this definition")
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 类StreamingMediaEncoder[](#_CPPv4N5torio2io21StreamingMediaEncoderE "Permalink
    to this definition")
- en: Encode and write audio/video streams chunk by chunk
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 逐块编码和写入音频/视频流
- en: Subclassed by [torio::io::StreamingMediaEncoderCustomIO](#classtorio_1_1io_1_1StreamingMediaEncoderCustomIO)
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 由[torio::io::StreamingMediaEncoderCustomIO](#classtorio_1_1io_1_1StreamingMediaEncoderCustomIO)继承
- en: explicit torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::StreamingMediaEncoder(const std::string &dst,
    const c10::optional &format = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder21StreamingMediaEncoderERKNSt6stringERKN3c108optionalINSt6stringEEE
    "Permalink to this definition")
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 显式torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::StreamingMediaEncoder(const std::string &dst,
    const c10::optional &format = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder21StreamingMediaEncoderERKNSt6stringERKN3c108optionalINSt6stringEEE
    "Permalink to this definition")
- en: Construct [StreamingMediaEncoder](#classtorio_1_1io_1_1StreamingMediaEncoder)
    from destination URI
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 从目的地URI构造[StreamingMediaEncoder](#classtorio_1_1io_1_1StreamingMediaEncoder)
- en: 'Parameters:'
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**dst** – Destination where encoded data are written.'
+ id: totrans-16
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**dst** - 编码数据写入的目的地。'
- en: '**format** – Specify output format. If not provided, it is guessed from `dst`.'
+ id: totrans-17
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**format** - 指定输出格式。如果未提供,则从`dst`中猜测。'
- en: StreamingMediaEncoderCustomIO[](#streamingmediaencodercustomio "Permalink to
    this heading")
+ id: totrans-18
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: StreamingMediaEncoderCustomIO[](#streamingmediaencodercustomio "Permalink to
    this heading")
- en: 'class StreamingMediaEncoderCustomIO : private detail::CustomOutput, public
    torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE "torio::io::StreamingMediaEncoder")[](#_CPPv4N5torio2io29StreamingMediaEncoderCustomIOE
    "Permalink to this definition")'
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 类StreamingMediaEncoderCustomIO:private detail::CustomOutput, public torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")[](#_CPPv4N5torio2io29StreamingMediaEncoderCustomIOE
    "Permalink to this definition")
- en: A subclass of [StreamingMediaEncoder](#classtorio_1_1io_1_1StreamingMediaEncoder)
    which works with a custom write function. Can be used for encoding media into memory
    or a custom object.
+ id: totrans-20
  prefs: []
  type: TYPE_NORMAL
+ zh: 一个[StreamingMediaEncoder](#classtorio_1_1io_1_1StreamingMediaEncoder)的子类,可以与自定义写入函数一起使用。可用于将媒体编码到内存或自定义对象中。
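- en: 'On the Python side, the same use case is covered by passing a file-like object
    as the destination of `torchaudio.io.StreamWriter`. A minimal sketch, assuming
    torchaudio >= 2.0 with FFmpeg; `format` must be given explicitly because it cannot
    be guessed from a buffer:'
  prefs: []
  type: TYPE_NORMAL
  zh: 在Python端,同样的用例可以通过把file-like对象作为`torchaudio.io.StreamWriter`的目的地来实现。下面是一个最小示意,假定带FFmpeg的torchaudio
    >= 2.0;由于无法从缓冲区猜测,必须显式给出`format`:
- en: |-
    ```py
    # Sketch: encode audio into an in-memory buffer instead of a file.
    import io

    import torch
    from torchaudio.io import StreamWriter

    buffer = io.BytesIO()
    writer = StreamWriter(dst=buffer, format="wav")
    writer.add_audio_stream(sample_rate=8000, num_channels=1)
    with writer.open():
        writer.write_audio_chunk(0, torch.zeros(8000, 1))
    print(buffer.getbuffer().nbytes)  # the encoded bytes now live in memory
    ```
  prefs: []
  type: TYPE_PRE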
- en: torio::io::[StreamingMediaEncoderCustomIO](#_CPPv4N5torio2io29StreamingMediaEncoderCustomIOE
    "torio::io::StreamingMediaEncoderCustomIO")::StreamingMediaEncoderCustomIO(void *opaque,
    const c10::optional &format, int buffer_size, int (*write_packet)(void *opaque, uint8_t *buf, int buf_size),
    int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr)[](#_CPPv4N5torio2io29StreamingMediaEncoderCustomIO29StreamingMediaEncoderCustomIOEPvRKN3c108optionalINSt6stringEEEiPFiPvP7uint8_tiEPF7int64_tPv7int64_tiE
    "Permalink to this definition")
+ id: totrans-21
  prefs: []
  type: TYPE_NORMAL
+ zh: torio::io::[StreamingMediaEncoderCustomIO](#_CPPv4N5torio2io29StreamingMediaEncoderCustomIOE
    "torio::io::StreamingMediaEncoderCustomIO")::StreamingMediaEncoderCustomIO(void *opaque,
    const c10::optional &format, int buffer_size, int (*write_packet)(void *opaque, uint8_t *buf, int buf_size),
    int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr)[](#_CPPv4N5torio2io29StreamingMediaEncoderCustomIO29StreamingMediaEncoderCustomIOEPvRKN3c108optionalINSt6stringEEEiPFiPvP7uint8_tiEPF7int64_tPv7int64_tiE
    "Permalink to this definition")
- en: Construct [StreamingMediaEncoderCustomIO](#classtorio_1_1io_1_1StreamingMediaEncoderCustomIO)
    with custom write and seek functions.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用自定义写入和寻址函数构造[StreamingMediaEncoderCustomIO](#classtorio_1_1io_1_1StreamingMediaEncoderCustomIO)。
- en: 'Parameters:'
+ id: totrans-23
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**opaque** – Custom data used by `write_packet` and `seek` functions.'
+ id: totrans-24
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**opaque** - `write_packet`和`seek`函数使用的自定义数据。'
- en: '**format** – Specify output format.'
+ id: totrans-25
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**format** - 指定输出格式。'
- en: '**buffer_size** – The size of the intermediate buffer, which FFmpeg uses to
    pass data to the write_packet function.'
+ id: totrans-26
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**buffer_size** - 中间缓冲区的大小,FFmpeg用于将数据传递给write_packet函数。'
- en: '**write_packet** – Custom write function that is called from FFmpeg to actually
    write data to the custom destination.'
+ id: totrans-27
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**write_packet** - 自定义写入函数,由FFmpeg调用以实际将数据写入自定义目的地。'
- en: '**seek** – Optional seek function that is used to seek the destination.'
+ id: totrans-28 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**seek** - 可选的寻址函数,用于寻址目的地。' - en: Config methods[](#config-methods "Permalink to this heading") + id: totrans-29 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 配置方法[](#config-methods "Permalink to this heading") - en: add_audio_stream[](#add-audio-stream "Permalink to this heading") + id: totrans-30 prefs: - PREF_H3 type: TYPE_NORMAL + zh: add_audio_stream[](#add-audio-stream "Permalink to this heading") - en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE "torio::io::StreamingMediaEncoder")::add_audio_stream(int sample_rate, int num_channels, const std::string &format, const c10::optional &encoder = c10::nullopt, @@ -133,106 +206,161 @@ const c10::optional &encoder_sample_rate = c10::nullopt, const c10::optional &encoder_num_channels = c10::nullopt, const c10::optional &codec_config = c10::nullopt, const c10::optional &filter_desc = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder16add_audio_streamEiiRKNSt6stringERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEERKN3c108optionalINSt6stringEEERKN3c108optionalIiEERKN3c108optionalIiEERKN3c108optionalI11CodecConfigEERKN3c108optionalINSt6stringEEE "Permalink to this definition") - prefs: [] - type: TYPE_NORMAL + id: totrans-31 + prefs: [] + type: TYPE_NORMAL + zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE + "torio::io::StreamingMediaEncoder")::add_audio_stream(int sample_rate, int num_channels, + const std::string &format, const c10::optional &encoder = c10::nullopt, + const c10::optional &encoder_option = c10::nullopt, const c10::optional + &encoder_format = c10::nullopt, const c10::optional &encoder_sample_rate + = c10::nullopt, const c10::optional &encoder_num_channels = c10::nullopt, + const c10::optional &codec_config = c10::nullopt, const c10::optional + &filter_desc = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder16add_audio_streamEiiRKNSt6stringERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEERKN3c108optionalINSt6stringEEERKN3c108optionalIiEERKN3c108optionalIiEERKN3c108optionalI11CodecConfigEERKN3c108optionalINSt6stringEEE + "跳转到此定义") - en: Add an output audio stream. + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: 添加一个输出音频流。 - en: 'Parameters:' + id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**sample_rate** – The sample rate.' + id: totrans-34 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**sample_rate** - 采样率。' - en: '**num_channels** – The number of channels.' + id: totrans-35 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**num_channels** - 通道数。' - en: '**format** – Input sample format, which determines the dtype of the input tensor.' + id: totrans-36 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**format** - 输入样本格式,确定输入张量的dtype。' - en: '`"u8"`: The input tensor must be `torch.uint8` type.' + id: totrans-37 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"u8":输入张量必须是`torch.uint8`类型。' - en: '`"s16"`: The input tensor must be `torch.int16` type.' + id: totrans-38 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"s16":输入张量必须是`torch.int16`类型。' - en: '`"s32"`: The input tensor must be `torch.int32` type.' + id: totrans-39 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"s32":输入张量必须是`torch.int32`类型。' - en: '`"s64"`: The input tensor must be `torch.int64` type.' + id: totrans-40 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"s64":输入张量必须是`torch.int64`类型。' - en: '`"flt"`: The input tensor must be `torch.float32` type.' 
+ id: totrans-41
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '"flt":输入张量必须是`torch.float32`类型。'
- en: '`"dbl"`: The input tensor must be `torch.float64` type.'
+ id: totrans-42
  prefs:
  - PREF_IND
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '"dbl":输入张量必须是`torch.float64`类型。'
- en: 'Default: `"flt"`.'
+ id: totrans-43
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 默认值:"flt"。
- en: '**encoder** – The name of the encoder to be used.'
+ id: totrans-44
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**encoder** - 要使用的编码器的名称。'
- en: When provided, use the specified encoder instead of the default one.
+ id: totrans-45
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 在提供时,使用指定的编码器而不是默认的编码器。
- en: To list the available encoders, you can use the `ffmpeg -encoders` command.
+ id: totrans-46
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 要列出可用的编码器,可以使用`ffmpeg -encoders`命令。
- en: '**encoder_option** – Options passed to the encoder. To list the options of
    an encoder, you can use `ffmpeg -h encoder=`.'
+ id: totrans-47
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**encoder_option** - 传递给编码器的选项。要列出某个编码器的选项,可以使用`ffmpeg -h encoder=`。'
- en: '**encoder_format** – Format used to encode media. When the encoder supports
    multiple formats, passing this argument will override the format used for encoding.
    To list the supported formats for the encoder, you can use the `ffmpeg -h encoder=`
    command.'
+ id: totrans-48
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**encoder_format** - 用于编码媒体的格式。当编码器支持多种格式时,传递此参数将覆盖用于编码的格式。要列出编码器支持的格式,可以使用`ffmpeg
    -h encoder=`命令。'
- en: '**encoder_sample_rate** – If provided, perform resampling before encoding.'
+ id: totrans-49
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**encoder_sample_rate** - 如果提供,在编码前执行重采样。'
- en: '**encoder_num_channels** – If provided, change the channel configuration before
    encoding.'
+ id: totrans-50
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**encoder_num_channels** - 如果提供,在编码前改变通道配置。'
- en: '**codec_config** – Codec configuration.'
+ id: totrans-51 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**codec_config** - 编解码器配置。' - en: '**filter_desc** – Additional processing to apply before encoding the input data' + id: totrans-52 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**filter_desc** - 在编码输入数据之前应用的附加处理' - en: add_video_stream[](#add-video-stream "Permalink to this heading") + id: totrans-53 prefs: - PREF_H3 type: TYPE_NORMAL + zh: add_video_stream[](#add-video-stream "跳转到此标题") - en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE "torio::io::StreamingMediaEncoder")::add_video_stream(double frame_rate, int width, int height, const std::string &format, const c10::optional &encoder = c10::nullopt, @@ -241,247 +369,389 @@ const c10::optional &encoder_height = c10::nullopt, const c10::optional &hw_accel = c10::nullopt, const c10::optional &codec_config = c10::nullopt, const c10::optional &filter_desc = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder16add_video_streamEdiiRKNSt6stringERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEERKN3c108optionalINSt6stringEEERKN3c108optionalIdEERKN3c108optionalIiEERKN3c108optionalIiEERKN3c108optionalINSt6stringEEERKN3c108optionalI11CodecConfigEERKN3c108optionalINSt6stringEEE "Permalink to this definition") - prefs: [] - type: TYPE_NORMAL + id: totrans-54 + prefs: [] + type: TYPE_NORMAL + zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE + "torio::io::StreamingMediaEncoder")::add_video_stream(double frame_rate, int width, + int height, const std::string &format, const c10::optional &encoder + = c10::nullopt, const c10::optional &encoder_option = c10::nullopt, + const c10::optional &encoder_format = c10::nullopt, const c10::optional + &encoder_frame_rate = c10::nullopt, const c10::optional &encoder_width = + c10::nullopt, const c10::optional &encoder_height = c10::nullopt, const c10::optional + &hw_accel = c10::nullopt, const c10::optional &codec_config = c10::nullopt, + const c10::optional &filter_desc = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder16add_video_streamEdiiRKNSt6stringERKN3c108optionalINSt6stringEEERKN3c108optionalI10OptionDictEERKN3c108optionalINSt6stringEEERKN3c108optionalIdEERKN3c108optionalIiEERKN3c108optionalIiEERKN3c108optionalINSt6stringEEERKN3c108optionalI11CodecConfigEERKN3c108optionalINSt6stringEEE + "跳转到此定义") - en: Add an output video stream. + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: 添加一个输出视频流。 - en: 'Parameters:' + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**frame_rate** – Frame rate' + id: totrans-57 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**frame_rate** - 帧率' - en: '**width** – Width' + id: totrans-58 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**width** - 宽度' - en: '**height** – Height' + id: totrans-59 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**height** - 高度' - en: '**format** – Input pixel format, which determines the color channel order of the input tensor.' + id: totrans-60 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**format** - 输入像素格式,确定输入张量的颜色通道顺序。' - en: '`"gray8"`: One channel, grayscale.' + id: totrans-61 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"gray8":一个通道,灰度。' - en: '`"rgb24"`: Three channels in the order of RGB.' + id: totrans-62 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"rgb24":RGB顺序的三个通道。' - en: '`"bgr24"`: Three channels in the order of BGR.' + id: totrans-63 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"bgr24":BGR顺序的三个通道。' - en: '`"yuv444p"`: Three channels in the order of YUV.' 
+ id: totrans-64 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL + zh: '"yuv444p":YUV顺序的三个通道。' - en: In either case, the input tensor has to be `torch.uint8` type and the shape must be (frame, channel, height, width). + id: totrans-65 prefs: - PREF_IND type: TYPE_NORMAL + zh: 在任何情况下,输入张量必须是`torch.uint8`类型,形状必须是(frame,channel,height,width)。 - en: '**encoder** – See `[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`.' + id: totrans-66 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**encoder** – 参见`[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`。' - en: '**encoder_option** – See `[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`.' + id: totrans-67 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**encoder_option** – 参见`[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`。' - en: '**encoder_format** – See `[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`.' + id: totrans-68 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**encoder_format** – 参见`[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`。' - en: '**encoder_frame_rate** – If provided, change frame rate before encoding.' + id: totrans-69 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**encoder_frame_rate** – 如果提供,编码前更改帧率。' - en: '**encoder_width** – If provided, resize image before encoding.' + id: totrans-70 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**encoder_width** – 如果提供,编码前调整图像大小。' - en: '**encoder_height** – If provided, resize image before encoding.' + id: totrans-71 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**encoder_height** – 如果提供,编码前调整图像大小。' - en: '**hw_accel** – Enable hardware acceleration.' + id: totrans-72 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**hw_accel** – 启用硬件加速。' - en: '**codec_config** – Codec configuration.' + id: totrans-73 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**codec_config** – 编解码器配置。' - en: When video is encoded on CUDA hardware, for example `encoder="h264_nvenc"`, passing CUDA device indicator to `hw_accel` (i.e. `hw_accel="cuda:0"`) will make [StreamingMediaEncoder](#classtorio_1_1io_1_1StreamingMediaEncoder) expect video chunk to be a CUDA Tensor. Passing CPU Tensor will result in an error. + id: totrans-74 prefs: - PREF_IND type: TYPE_NORMAL + zh: 当视频在CUDA硬件上编码时,例如`encoder="h264_nvenc"`,将CUDA设备指示器传递给`hw_accel`(即`hw_accel="cuda:0"`)将使[StreamingMediaEncoder](#classtorio_1_1io_1_1StreamingMediaEncoder)期望视频块是CUDA张量。传递CPU张量将导致错误。 - en: If `None`, the video chunk Tensor has to be a CPU Tensor. 
+ id: totrans-75 prefs: - PREF_IND type: TYPE_NORMAL + zh: 如果为`None`,视频块张量必须是CPU张量。 - en: '**filter_desc** – Additional processing to apply before encoding the input data' + id: totrans-76 prefs: - PREF_UL type: TYPE_NORMAL + zh: '**filter_desc** – 在编码输入数据之前应用的附加处理' - en: set_metadata[](#set-metadata "Permalink to this heading") + id: totrans-77 prefs: - PREF_H3 type: TYPE_NORMAL + zh: set_metadata[](#set-metadata "Permalink to this heading") - en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE "torio::io::StreamingMediaEncoder")::set_metadata(const OptionDict &metadata)[](#_CPPv4N5torio2io21StreamingMediaEncoder12set_metadataERK10OptionDict "Permalink to this definition") + id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE + "torio::io::StreamingMediaEncoder")::set_metadata(const OptionDict &metadata)[](#_CPPv4N5torio2io21StreamingMediaEncoder12set_metadataERK10OptionDict + "Permalink to this definition") - en: Set file-level metadata + id: totrans-79 prefs: [] type: TYPE_NORMAL + zh: 设置文件级元数据 - en: 'Parameters:' + id: totrans-80 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**metadata** – metadata.' + id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: '**metadata** – 元数据。' - en: Write methods[](#write-methods "Permalink to this heading") + id: totrans-82 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 写入方法[](#write-methods "Permalink to this heading") - en: open[](#open "Permalink to this heading") + id: totrans-83 prefs: - PREF_H3 type: TYPE_NORMAL + zh: open[](#open "Permalink to this heading") - en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE "torio::io::StreamingMediaEncoder")::open(const c10::optional &opt = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder4openERKN3c108optionalI10OptionDictEE "Permalink to this definition") + id: totrans-84 prefs: [] type: TYPE_NORMAL + zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE + "torio::io::StreamingMediaEncoder")::open(const c10::optional &opt + = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder4openERKN3c108optionalI10OptionDictEE + "Permalink to this definition") - en: Open the output file / device and write the header. + id: totrans-85 prefs: [] type: TYPE_NORMAL + zh: 打开输出文件/设备并写入头部。 - en: 'Parameters:' + id: totrans-86 prefs: [] type: TYPE_NORMAL + zh: 参数: - en: '**opt** – Private options for protocol, device and muxer.' + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: '**opt** – 协议、设备和复用器的私有选项。' - en: close[](#close "Permalink to this heading") + id: totrans-88 prefs: - PREF_H3 type: TYPE_NORMAL + zh: close[](#close "Permalink to this heading") - en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE "torio::io::StreamingMediaEncoder")::close()[](#_CPPv4N5torio2io21StreamingMediaEncoder5closeEv "Permalink to this definition") + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE + "torio::io::StreamingMediaEncoder")::close()[](#_CPPv4N5torio2io21StreamingMediaEncoder5closeEv + "Permalink to this definition") - en: Close the output file / device and finalize metadata. 
+ id: totrans-90
  prefs: []
  type: TYPE_NORMAL
+ zh: 关闭输出文件/设备并完成元数据。
- en: write_audio_chunk[](#write-audio-chunk "Permalink to this heading")
+ id: totrans-91
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: write_audio_chunk[](#write-audio-chunk "Permalink to this heading")
- en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::write_audio_chunk(int i, const torch::Tensor
    &frames, const c10::optional &pts = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder17write_audio_chunkEiRKN5torch6TensorERKN3c108optionalIdEE
    "Permalink to this definition")
+ id: totrans-92
  prefs: []
  type: TYPE_NORMAL
+ zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::write_audio_chunk(int i, const torch::Tensor
    &frames, const c10::optional &pts = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder17write_audio_chunkEiRKN5torch6TensorERKN3c108optionalIdEE
    "Permalink to this definition")
- en: Write audio data
+ id: totrans-93
  prefs: []
  type: TYPE_NORMAL
+ zh: 写入音频数据
- en: 'Parameters:'
+ id: totrans-94
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**i** – Stream index.'
+ id: totrans-95
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**i** – 流索引。'
- en: '**frames** – Waveform tensor. Shape: `(frame, channel)`. The `dtype` must
    match what was passed to the `[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`
    method.'
+ id: totrans-96
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**frames** – 波形张量。形状:`(frame, channel)`。`dtype`必须与传递给`[add_audio_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1af7f8bbbe1d7b6363969eb099c48e5d04)`方法相匹配。'
- en: '**pts** –'
+ id: totrans-97
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**pts** –'
- en: Presentation timestamp. If provided, it overwrites the PTS of the first frame
    with the provided one. Otherwise, PTS are incremented per an inverse of the sample
    rate. Only values that exceed the internally processed PTS are effective.
+ id: totrans-98
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 呈现时间戳。如果提供,则用提供的值覆盖第一帧的PTS。否则,PTS按采样率的倒数递增。只有超过内部已处理PTS的值才会生效。
- en: '**NOTE**: The provided value is converted to an integer value expressed in
    the basis of the sample rate. Therefore, it is truncated to the nearest value
    of `n / sample_rate`.'
+ id: totrans-99
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: '**注意**:提供的值会转换为以采样率为基础表示的整数值。因此,它被截断为最接近的`n / sample_rate`值。'
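- en: 'A sketch of the `pts` behavior above, via the Python-side `StreamWriter` (assuming
    torchaudio >= 2.1, which exposes the `pts` argument; whether the resulting gap
    is preserved depends on the container and encoder):'
  prefs: []
  type: TYPE_NORMAL
  zh: 下面通过Python端`StreamWriter`示意上述`pts`行为(假定torchaudio >= 2.1,提供`pts`参数;间隙是否被保留取决于容器和编码器):
- en: |-
    ```py
    # Sketch: the first chunk starts at 0.0 s; the second is forced to 1.0 s.
    # The given value is truncated to the nearest n / sample_rate, as noted above.
    import torch
    from torchaudio.io import StreamWriter

    sample_rate = 8000
    writer = StreamWriter(dst="gapped.mp4")
    writer.add_audio_stream(sample_rate=sample_rate, num_channels=1)
    with writer.open():
        writer.write_audio_chunk(0, torch.zeros(sample_rate // 2, 1))           # 0.0 - 0.5 s
        writer.write_audio_chunk(0, torch.zeros(sample_rate // 2, 1), pts=1.0)  # resumes at 1.0 s
    ```
  prefs: []
  type: TYPE_PRE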
- en: write_video_chunk[](#write-video-chunk "Permalink to this heading")
+ id: totrans-100
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: write_video_chunk[](#write-video-chunk "Permalink to this heading")
- en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::write_video_chunk(int i, const torch::Tensor
    &frames, const c10::optional &pts = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder17write_video_chunkEiRKN5torch6TensorERKN3c108optionalIdEE
    "Permalink to this definition")
+ id: totrans-101
  prefs: []
  type: TYPE_NORMAL
+ zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::write_video_chunk(int i, const torch::Tensor
    &frames, const c10::optional &pts = c10::nullopt)[](#_CPPv4N5torio2io21StreamingMediaEncoder17write_video_chunkEiRKN5torch6TensorERKN3c108optionalIdEE
    "Permalink to this definition")
- en: Write video data
+ id: totrans-102
  prefs: []
  type: TYPE_NORMAL
+ zh: 写入视频数据
- en: 'Parameters:'
+ id: totrans-103
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**i** – Stream index.'
+ id: totrans-104
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**i** – 流索引。'
- en: '**frames** – Video/image tensor. Shape: `(time, channel, height, width)`. The
    `dtype` must be `torch.uint8`. The shape `(height, width and the number of channels)`
    must match what was configured when calling `[add_video_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1a5337088220f338d2aa5fddfd3d256579)`.'
+ id: totrans-105
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**frames** – 视频/图像张量。形状:`(time, channel, height, width)`。`dtype`必须是`torch.uint8`。形状(高度、宽度和通道数)必须与调用`[add_video_stream()](#classtorio_1_1io_1_1StreamingMediaEncoder_1a5337088220f338d2aa5fddfd3d256579)`时配置的相匹配。'
- en: '**pts** –'
+ id: totrans-106
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**pts** –'
- en: Presentation timestamp. If provided, it overwrites the PTS of the first frame
    with the provided one. Otherwise, PTS are incremented per an inverse of the frame
    rate. Only values that exceed the internally processed PTS are effective.
+ id: totrans-107
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: 呈现时间戳。如果提供,则用提供的值覆盖第一帧的PTS。否则,PTS按帧率的倒数递增。只有超过内部已处理PTS的值才会生效。
- en: '**NOTE**: The provided value is converted to an integer value expressed in
    the basis of the frame rate. Therefore, it is truncated to the nearest value of
    `n / frame_rate`.'
+ id: totrans-108
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
+ zh: '**注意**:提供的值会转换为以帧率为基础表示的整数值。因此,它被截断为最接近的`n / frame_rate`值。'
- en: flush[](#flush "Permalink to this heading")
+ id: totrans-109
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: flush[](#flush "此标题的永久链接")
- en: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::flush()[](#_CPPv4N5torio2io21StreamingMediaEncoder5flushEv
    "Permalink to this definition")
+ id: totrans-110
  prefs: []
  type: TYPE_NORMAL
+ zh: void torio::io::[StreamingMediaEncoder](#_CPPv4N5torio2io21StreamingMediaEncoderE
    "torio::io::StreamingMediaEncoder")::flush()[](#_CPPv4N5torio2io21StreamingMediaEncoder5flushEv
    "此定义的永久链接")
- en: Flush the frames from encoders and write the frames to the destination.
+ id: totrans-111
  prefs: []
  type: TYPE_NORMAL
+ zh: 刷新编码器中的帧并将帧写入目标。
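- en: 'Putting the configuration and write methods together, an end-to-end sketch
    via the Python-side `StreamWriter`, which this C++ class backs (assuming torchaudio
    >= 2.0 with FFmpeg; default encoders are used and "example.mp4" is a placeholder):'
  prefs: []
  type: TYPE_NORMAL
  zh: 把配置方法和写入方法结合起来,下面是通过Python端`StreamWriter`(由本C++类支撑)的端到端示意(假定带FFmpeg的torchaudio
    >= 2.0;使用默认编码器,"example.mp4"仅为占位路径):
- en: |-
    ```py
    # Sketch: one audio stream and one video stream, written chunk by chunk.
    import torch
    from torchaudio.io import StreamWriter

    writer = StreamWriter(dst="example.mp4")
    writer.add_audio_stream(sample_rate=16000, num_channels=2)   # stream index 0
    writer.add_video_stream(frame_rate=30, width=320, height=240,
                            format="rgb24")                      # stream index 1
    with writer.open():  # opens the destination and writes the header
        audio = torch.zeros(16000, 2)                            # 1 s, (frame, channel)
        video = torch.zeros(30, 3, 240, 320, dtype=torch.uint8)  # 1 s, (time, channel, height, width)
        writer.write_audio_chunk(0, audio)
        writer.write_video_chunk(1, video)
    # leaving the context flushes the encoders and finalizes the metadata
    ```
  prefs: []
  type: TYPE_PRE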
diff --git a/totrans/aud22_74.yaml b/totrans/aud22_74.yaml
index 5fcc1c33682477d25adbc55165db4b5dc2d1a41c..7095894be9834c9bd3154e9084a960ac7ca953dc 100644
--- a/totrans/aud22_74.yaml
+++ b/totrans/aud22_74.yaml
@@ -1,4 +1,6 @@
- en: PyTorch Libraries
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: PyTorch库
diff --git a/totrans/aud22_75.yaml b/totrans/aud22_75.yaml
index 31e81dd3826d52f36d7cee5789ddbac45739964d..0fa2f26bc459afbbe4a3501d777517a1d96ab701 100644
--- a/totrans/aud22_75.yaml
+++ b/totrans/aud22_75.yaml
@@ -1,203 +1,317 @@
- en: TorchServe
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: TorchServe
- en: 原文:[https://pytorch.org/serve](https://pytorch.org/serve)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/serve](https://pytorch.org/serve)
- en: TorchServe is a performant, flexible and easy to use tool for serving PyTorch
    models in production.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchServe是一个高性能、灵活且易于使用的工具,用于在生产环境中提供PyTorch模型服务。
- en: What’s going on in TorchServe?
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: TorchServe中发生了什么?
- en: '[High performance Llama 2 deployments with AWS Inferentia2 using TorchServe](https://pytorch.org/blog/high-performance-llama/)'
+ id: totrans-4
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[使用TorchServe在AWS Inferentia2上部署高性能Llama 2](https://pytorch.org/blog/high-performance-llama/)'
- en: '[Naver Case Study: Transition From High-Cost GPUs to Intel CPUs and oneAPI powered
    Software with performance](https://pytorch.org/blog/ml-model-server-resource-saving/)'
+ id: totrans-5
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[Naver案例研究:从高成本GPU过渡到Intel CPU和性能强大的oneAPI软件](https://pytorch.org/blog/ml-model-server-resource-saving/)'
- en: '[Run multiple generative AI models on GPU using Amazon SageMaker multi-model
    endpoints with TorchServe and save up to 75% in inference costs](https://aws.amazon.com/blogs/machine-learning/run-multiple-generative-ai-models-on-gpu-using-amazon-sagemaker-multi-model-endpoints-with-torchserve-and-save-up-to-75-in-inference-costs/)'
+ id: totrans-6
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[在GPU上使用Amazon SageMaker多模型端点运行多个生成式AI模型,并节省高达75%的推理成本](https://aws.amazon.com/blogs/machine-learning/run-multiple-generative-ai-models-on-gpu-using-amazon-sagemaker-multi-model-endpoints-with-torchserve-and-save-up-to-75-in-inference-costs/)'
- en: '[Deploying your Generative AI model in only four steps with Vertex AI and PyTorch](https://cloud.google.com/blog/products/ai-machine-learning/get-your-genai-model-going-in-four-easy-steps)'
+ id: totrans-7
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[使用Vertex AI和PyTorch在四个步骤中部署您的生成式AI模型](https://cloud.google.com/blog/products/ai-machine-learning/get-your-genai-model-going-in-four-easy-steps)'
- en: '[PyTorch Model Serving on Google Cloud TPUv5](https://cloud.google.com/tpu/docs/v5e-inference#pytorch-model-inference-and-serving)'
+ id: totrans-8
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[在Google Cloud TPUv5上提供PyTorch模型](https://cloud.google.com/tpu/docs/v5e-inference#pytorch-model-inference-and-serving)'
- en: '[Monitoring using Datadog](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)'
+ id: totrans-9
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[使用Datadog进行监控](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)'
- en: '[Torchserve Performance
Tuning, Animated Drawings Case-Study](https://pytorch.org/blog/torchserve-performance-tuning/)' + id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[TorchServe性能调优,动画绘图案例研究](https://pytorch.org/blog/torchserve-performance-tuning/)' - en: '[Walmart Search: Serving Models at a Scale on TorchServe](https://medium.com/walmartglobaltech/search-model-serving-using-pytorch-and-torchserve-6caf9d1c5f4d)' + id: totrans-11 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Walmart Search:在TorchServe上规模化提供模型](https://medium.com/walmartglobaltech/search-model-serving-using-pytorch-and-torchserve-6caf9d1c5f4d)' - en: '[Scaling inference on CPU with TorchServe](https://www.youtube.com/watch?v=066_Jd6cwZg)' + id: totrans-12 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用TorchServe在CPU上扩展推理](https://www.youtube.com/watch?v=066_Jd6cwZg)' - en: '[TorchServe C++ backend](https://www.youtube.com/watch?v=OSmGGDpaesc)' + id: totrans-13 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[TorchServe C++后端](https://www.youtube.com/watch?v=OSmGGDpaesc)' - en: '[Grokking Intel CPU PyTorch performance from first principles: a TorchServe case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex.html)' + id: totrans-14 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Grokking Intel CPU PyTorch performance from first principles: a TorchServe + case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex.html)' - en: '[Grokking Intel CPU PyTorch performance from first principles( Part 2): a TorchServe case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2.html)' + id: totrans-15 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Grokking Intel CPU PyTorch performance from first principles( Part 2): a TorchServe + case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2.html)' - en: '[Case Study: Amazon Ads Uses PyTorch and AWS Inferentia to Scale Models for Ads Processing](https://pytorch.org/blog/amazon-ads-case-study/)' + id: totrans-16 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[案例研究:亚马逊广告使用PyTorch和AWS Inferentia扩展广告处理模型](https://pytorch.org/blog/amazon-ads-case-study/)' - en: '[Optimize your inference jobs using dynamic batch inference with TorchServe on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/optimize-your-inference-jobs-using-dynamic-batch-inference-with-torchserve-on-amazon-sagemaker/)' + id: totrans-17 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[在Amazon SageMaker上使用TorchServe进行动态批量推理优化](https://aws.amazon.com/blogs/machine-learning/optimize-your-inference-jobs-using-dynamic-batch-inference-with-torchserve-on-amazon-sagemaker/)' - en: '[Using AI to bring children’s drawings to life](https://ai.facebook.com/blog/using-ai-to-bring-childrens-drawings-to-life/)' + id: totrans-18 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[利用AI让儿童的画作栩栩如生](https://ai.facebook.com/blog/using-ai-to-bring-childrens-drawings-to-life/)' - en: '[Model Serving in PyTorch](https://www.youtube.com/watch?v=2A17ZtycsPw)' + id: totrans-19 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[在PyTorch中提供模型服务](https://www.youtube.com/watch?v=2A17ZtycsPw)' - en: '[Evolution of Cresta’s machine learning architecture: Migration to AWS and PyTorch](https://aws.amazon.com/blogs/machine-learning/evolution-of-crestas-machine-learning-architecture-migration-to-aws-and-pytorch/)' + id: totrans-20 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Cresta机器学习架构的演变:迁移到AWS和PyTorch](https://aws.amazon.com/blogs/machine-learning/evolution-of-crestas-machine-learning-architecture-migration-to-aws-and-pytorch/)' - 
en: '[Explain Like I’m 5: TorchServe](https://www.youtube.com/watch?v=NEdZbkfHQCk)'
+ id: totrans-21
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[像我五岁一样解释TorchServe](https://www.youtube.com/watch?v=NEdZbkfHQCk)'
- en: '[How to Serve PyTorch Models with TorchServe](https://www.youtube.com/watch?v=XlO7iQMV3Ik)'
+ id: totrans-22
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[如何使用TorchServe为PyTorch模型提供服务](https://www.youtube.com/watch?v=XlO7iQMV3Ik)'
- en: '[How to deploy PyTorch models on Vertex AI](https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai)'
+ id: totrans-23
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[如何在Vertex AI上部署PyTorch模型](https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai)'
- en: '[Quantitative Comparison of Serving Platforms](https://biano-ai.github.io/research/2021/08/16/quantitative-comparison-of-serving-platforms-for-neural-networks.html)'
+ id: totrans-24
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[服务平台的定量比较](https://biano-ai.github.io/research/2021/08/16/quantitative-comparison-of-serving-platforms-for-neural-networks.html)'
- en: All
+ id: totrans-25
  prefs: []
  type: TYPE_NORMAL
+ zh: 全部
- en: '* * *'
+ id: totrans-26
  prefs: []
  type: TYPE_NORMAL
+ zh: '* * *'
- en: '[#### TorchServe Quick Start'
+ id: totrans-27
  prefs: []
  type: TYPE_NORMAL
+ zh: '[#### TorchServe快速入门'
- en: 'Topics: Quick Start'
+ id: totrans-28
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:快速入门
- en: Learn how to install TorchServe and serve models.
+ id: totrans-29
  prefs: []
  type: TYPE_NORMAL
+ zh: 学习如何安装TorchServe并提供模型服务。
- en: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](getting_started.html)
    [#### Running TorchServe'
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](getting_started.html)
    [#### 运行TorchServe'
- en: 'Topics: Running TorchServe'
+ id: totrans-31
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:运行TorchServe
- en: In-depth explanation of how to run TorchServe
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 深入解释如何运行TorchServe
- en: '![](../Images/661e92286b91a04a664aa0dd434223f4.png)](server.html) [#### Why
    TorchServe'
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/661e92286b91a04a664aa0dd434223f4.png)](server.html) [#### 为什么选择TorchServe'
- en: 'Topics: Examples'
+ id: totrans-34
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:示例
- en: Various TorchServe use cases
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 各种TorchServe使用案例
- en: '![](../Images/0507eb3112fdbfd24e3e2ba13aa3e3fa.png)](use_cases.html) [####
    Performance'
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/0507eb3112fdbfd24e3e2ba13aa3e3fa.png)](use_cases.html) [####
    性能'
- en: 'Topics: Performance,Troubleshooting'
+ id: totrans-37
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:性能、故障排除
- en: Guides and best practices on how to improve performance when working with TorchServe
+ id: totrans-38
  prefs: []
  type: TYPE_NORMAL
+ zh: 关于如何在使用TorchServe时提高性能的指南和最佳实践
- en: '![](../Images/a115bf3860d7637d64025cdabc4de95b.png)](performance_guide.html)
    [#### Metrics'
+ id: totrans-39
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/a115bf3860d7637d64025cdabc4de95b.png)](performance_guide.html)
    [#### 指标'
- en: 'Topics: Metrics,Performance,Troubleshooting'
+ id: totrans-40
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:指标,性能,故障排除
- en: Collecting and viewing TorchServe metrics
+ id: totrans-41
  prefs: []
  type: TYPE_NORMAL
+ zh: 收集和查看TorchServe指标
- en: '![](../Images/eab661f8c4941205ffdc566aced9bccf.png)](metrics.html) [#### Large
    Model
Inference'
+ id: totrans-42
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/eab661f8c4941205ffdc566aced9bccf.png)](metrics.html) [#### 大型模型推理'
- en: 'Topics: Large-Models,Performance'
+ id: totrans-43
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:大型模型,性能
- en: Serving Large Models with TorchServe
+ id: totrans-44
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用TorchServe为大型模型提供服务
- en: '![](../Images/f6afe69d86ffcf863cd832ed3698732f.png)](large_model_inference.html)
    [#### Troubleshooting'
+ id: totrans-45
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/f6afe69d86ffcf863cd832ed3698732f.png)](large_model_inference.html)
    [#### 故障排除'
- en: 'Topics: Troubleshooting,Performance'
+ id: totrans-46
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:故障排除,性能
- en: Various updates on TorchServe and use cases.
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 各种关于TorchServe和使用案例的更新。
- en: '![](../Images/d23903f23b5705cc9f1d9bdca6ce6bbb.png)](Troubleshooting.html)
    [#### TorchServe Security Policy'
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/d23903f23b5705cc9f1d9bdca6ce6bbb.png)](Troubleshooting.html)
    [#### TorchServe安全策略'
- en: 'Topics: Security'
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:安全
- en: Security Policy
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 安全策略
- en: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](security.html) [#### FAQs'
+ id: totrans-51
  prefs: []
  type: TYPE_NORMAL
+ zh: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](security.html) [#### 常见问题解答'
- en: 'Topics: FAQs'
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: 主题:常见问题解答
- en: Various frequently asked questions.
+ id: totrans-53
  prefs: []
  type: TYPE_NORMAL
+ zh: 各种常见问题。
- en: '![](../Images/7ccfac0b40fe2fac42582244489f0da4.png)](FAQs.html)'
+ id: totrans-54
  prefs: []
  type: TYPE_IMG
+ zh: '![](../Images/7ccfac0b40fe2fac42582244489f0da4.png)](FAQs.html)'
diff --git a/totrans/data07_00.yaml b/totrans/data07_00.yaml
index f7276f65773ec86de02983f0e82dfa44909e300b..dd27e380228c9ec632d2f313d9510d10f247684e 100644
--- a/totrans/data07_00.yaml
+++ b/totrans/data07_00.yaml
@@ -1,7 +1,11 @@
- en: TorchData 0.7 Doc
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: TorchData 0.7 文档
- en: 来源:[https://pytorch.org/data/beta/index.html](https://pytorch.org/data/beta/index.html)
+ id: totrans-1
  prefs: []
  type: TYPE_NORMAL
+ zh: 来源:[https://pytorch.org/data/beta/index.html](https://pytorch.org/data/beta/index.html)
diff --git a/totrans/data07_01.yaml b/totrans/data07_01.yaml
index 18fe5d88e0611a14a53b73f0c262b5b023f3c0a3..9c546f6853eb50c10c7859f860d98cff7542eb10 100644
--- a/totrans/data07_01.yaml
+++ b/totrans/data07_01.yaml
@@ -1,4 +1,6 @@
- en: 'API Reference:'
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 'API 参考文档:'
diff --git a/totrans/data07_02.yaml b/totrans/data07_02.yaml
index 832e3235d38a165bc84de9410aed9e5d76ccdc16..cf111e4270455987a3d5a74a3c32eccf95686f81 100644
--- a/totrans/data07_02.yaml
+++ b/totrans/data07_02.yaml
@@ -1,67 +1,94 @@
- en: Iterable-style DataPipes
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 可迭代式DataPipes
- en: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.iter.html](https://pytorch.org/data/beta/torchdata.datapipes.iter.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.iter.html](https://pytorch.org/data/beta/torchdata.datapipes.iter.html)
- en: An iterable-style dataset is an instance of a subclass of IterableDataset that
    implements the `__iter__()` protocol, and represents an iterable over data samples.
    This type of dataset is particularly suitable for cases where random reads are
    expensive or even improbable, and where the batch size depends on the fetched
    data.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 可迭代式数据集是IterableDataset子类的实例,实现了`__iter__()`协议,并表示数据样本的可迭代。这种类型的数据集特别适用于随机读取昂贵甚至不太可能的情况,批量大小取决于获取的数据。
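- en: 'A minimal sketch of this iterable style, assuming torchdata 0.7: wrap any iterable,
    then chain transforms lazily through their functional forms.'
  prefs: []
  type: TYPE_NORMAL
  zh: 可迭代风格的最小示意(假定torchdata 0.7):包装任意可迭代对象,然后通过函数形式惰性地链接变换。
- en: |-
    ```py
    # Sketch: nothing is computed until the pipe is actually iterated.
    from torchdata.datapipes.iter import IterableWrapper

    dp = IterableWrapper(range(6)).map(lambda x: x * 10).batch(2)
    print(list(dp))  # [[0, 10], [20, 30], [40, 50]]
    ```
  prefs: []
  type: TYPE_PRE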
- en: For example, such a dataset, when called `iter(iterdatapipe)`, could return a
    stream of data reading from a database, a remote server, or even logs generated
    in real time.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 例如,这样一个数据集,当调用`iter(iterdatapipe)`时,可以返回从数据库、远程服务器或实时生成的日志中读取的数据流。
- en: This is an updated version of `IterableDataset` in `torch`.
+ id: totrans-4
  prefs: []
  type: TYPE_NORMAL
+ zh: 这是`torch`中`IterableDataset`的更新版本。
- en: '[PRE0]'
+ id: totrans-5
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: Iterable-style DataPipe.
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 可迭代式DataPipe。
- en: All DataPipes that represent an iterable of data samples should subclass this.
    This style of DataPipes is particularly useful when data come from a stream, or
    when the number of samples is too large to fit them all in memory. `IterDataPipe`
    is lazily initialized and its elements are computed only when `next()` is called
    on the iterator of an `IterDataPipe`.
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 所有表示数据样本可迭代的DataPipes都应该是这样的子类。当数据来自流时,或者样本数量太大无法全部放入内存时,这种DataPipes风格特别有用。`IterDataPipe`是惰性初始化的,只有在对`IterDataPipe`的迭代器调用`next()`时才计算其元素。
- en: All subclasses should overwrite `__iter__()`, which would return an iterator
    of samples in this DataPipe. Calling `__iter__` of an `IterDataPipe` automatically
    invokes its method `reset()`, which by default performs no operation. When writing
    a custom `IterDataPipe`, users should override `reset()` if necessary. The common
    usages include resetting buffers, pointers, and various state variables within
    the custom `IterDataPipe`.
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 所有子类应该重写`__iter__()`,它将返回此DataPipe中样本的迭代器。调用`IterDataPipe`的`__iter__`会自动调用其方法`reset()`,默认情况下不执行任何操作。当编写自定义`IterDataPipe`时,用户应该根据需要重写`reset()`。常见用法包括重置自定义`IterDataPipe`中的缓冲区、指针和各种状态变量。
- en: Note
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: Only one iterator can be valid for each `IterDataPipe` at a time, and the creation
    of a second iterator will invalidate the first one. This constraint is necessary
    because some `IterDataPipe` have internal buffers, whose states can become invalid
    if there are multiple iterators. The code example below presents details on how
    this constraint looks in practice. If you have any feedback related to this constraint,
    please see [GitHub IterDataPipe Single Iterator Issue](https://github.com/pytorch/data/issues/45).
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 每次只能有一个迭代器对`IterDataPipe`有效,创建第二个迭代器将使第一个迭代器无效。这个约束是必要的,因为一些`IterDataPipe`具有内部缓冲区,如果有多个迭代器,其状态可能会变得无效。下面的代码示例详细介绍了这个约束在实践中的样子。如果您对这个约束有任何反馈,请参阅[GitHub
    IterDataPipe Single Iterator Issue](https://github.com/pytorch/data/issues/45)。
- en: These DataPipes can be invoked in two ways, using the class constructor or applying
    their functional form onto an existing `IterDataPipe` (recommended, available to
    most but not all DataPipes). You can chain multiple IterDataPipe together to form
    a pipeline that will perform multiple operations in succession.
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes可以以两种方式调用,使用类构造函数或将它们的函数形式应用于现有的`IterDataPipe`(推荐,大多数但不是所有DataPipes都可用)。您可以将多个IterDataPipe链接在一起,形成一个连续执行多个操作的管道。
- en: Note
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: When a subclass is used with `DataLoader`, each item in the DataPipe will be
    yielded from the `DataLoader` iterator. When `num_workers > 0`, each worker process
    will have a different copy of the DataPipe object, so it is often desired to configure
    each copy independently to avoid having duplicate data returned from the workers.
    `get_worker_info()`, when called in a worker process, returns information about
    the worker. It can be used in either the dataset’s `__iter__()` method or the
    `DataLoader`’s `worker_init_fn` option to modify each copy’s behavior.
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 当子类与`DataLoader`一起使用时,DataPipe中的每个项目将从`DataLoader`迭代器中产生。当`num_workers > 0`时,每个工作进程将拥有DataPipe对象的不同副本,因此通常希望配置每个副本独立以避免从工作进程返回重复数据。`get_worker_info()`在工作进程中调用时,返回有关工作进程的信息。它可以在数据集的`__iter__()`方法或`DataLoader`的`worker_init_fn`选项中使用,以修改每个副本的行为。
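- en: 'As a sketch of the multi-worker point above (assuming torchdata 0.7): without
    any sharding, every worker yields every element; chaining `shuffle()` and `sharding_filter()`
    is the usual way to configure the copies so that each element is produced exactly
    once.'
  prefs: []
  type: TYPE_NORMAL
  zh: 针对上面多工作进程的说明(假定torchdata 0.7):若不做分片,每个工作进程都会产出全部元素;链接`shuffle()`和`sharding_filter()`是常用的配置方式,使每个元素恰好被产出一次。
- en: |-
    ```py
    # Sketch: deduplicate elements across DataLoader workers.
    from torch.utils.data import DataLoader
    from torchdata.datapipes.iter import IterableWrapper

    dp = IterableWrapper(range(8)).shuffle().sharding_filter()
    loader = DataLoader(dp, batch_size=2, num_workers=2)
    for batch in loader:
        print(batch)  # each of 0..7 appears exactly once across the workers
    ```
  prefs: []
  type: TYPE_PRE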
@@ -69,627 +96,1031 @@
- en: Examples
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 示例
- en: 'General Usage:'
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 通用用法:
- en: '[PRE1]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: 'Single Iterator Constraint Example:'
+ id: totrans-17
  prefs: []
  type: TYPE_NORMAL
+ zh: 单迭代器约束示例:
- en: '[PRE2]'
+ id: totrans-18
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: 'We have different types of Iterable DataPipes:'
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们有不同类型的Iterable DataPipes:
- en: Archive - open and decompress archive files of different formats.
+ id: totrans-20
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 存档 - 打开和解压不同格式的存档文件。
- en: Augmenting - augment your samples (e.g. adding index, or cycle through indefinitely).
+ id: totrans-21
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 增强 - 增强您的样本(例如添加索引,或无限循环)。
- en: Combinatorial - perform combinatorial operations (e.g. sampling, shuffling).
+ id: totrans-22
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 组合 - 执行组合操作(例如采样、洗牌)。
- en: Combining/Splitting - interact with multiple DataPipes by combining them or
    splitting one to many.
+ id: totrans-23
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 组合/拆分 - 通过组合多个DataPipes或将一个DataPipe拆分为多个来进行交互。
- en: Grouping - group samples within a DataPipe
+ id: totrans-24
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 分组 - 在DataPipe中对样本进行分组
- en: IO - interacting with the file systems or remote server (e.g. downloading,
    opening, saving files, and listing the files in directories).
+ id: totrans-25
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: IO - 与文件系统或远程服务器交互(例如下载、打开、保存文件,并列出目录中的文件)。
- en: Mapping - apply the given function to each element in the DataPipe.
+ id: totrans-26
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 映射 - 将给定函数应用于DataPipe中的每个元素。
- en: Others - perform a miscellaneous set of operations.
+ id: totrans-27
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 其他 - 执行各种操作。
- en: Selecting - select specific samples within a DataPipe.
+ id: totrans-28
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 选择 - 在DataPipe中选择特定样本。
- en: Text - parse, read, and transform text files and data
+ id: totrans-29
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 文本 - 解析、读取和转换文本文件和数据
- en: Archive DataPipes[](#archive-datapipes "Permalink to this heading")
+ id: totrans-30
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 存档DataPipes[](#archive-datapipes "Permalink to this heading")
- en: These DataPipes help opening and decompressing archive files of different formats.
+ id: totrans-31
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes帮助打开和解压不同格式的存档文件。
- en: '| [`Bz2FileLoader`](generated/torchdata.datapipes.iter.Bz2FileLoader.html#torchdata.datapipes.iter.Bz2FileLoader
    "torchdata.datapipes.iter.Bz2FileLoader") | Decompresses bz2 binary streams from
    an Iterable DataPipe which contains tuples of path name and bz2 binary streams,
    and yields a tuple of path name and extracted binary stream (functional name:
    `load_from_bz2`).
|' + id: totrans-32 prefs: [] type: TYPE_TB + zh: '| [`Bz2FileLoader`](generated/torchdata.datapipes.iter.Bz2FileLoader.html#torchdata.datapipes.iter.Bz2FileLoader + "torchdata.datapipes.iter.Bz2FileLoader") | 从包含路径名和bz2二进制流元组的可迭代DataPipe中解压缩bz2二进制流,并产生一个路径名和提取的二进制流元组(函数名:`load_from_bz2`)。 + |' - en: '| [`Decompressor`](generated/torchdata.datapipes.iter.Decompressor.html#torchdata.datapipes.iter.Decompressor "torchdata.datapipes.iter.Decompressor") | Takes tuples of path and compressed stream of data, and returns tuples of path and decompressed stream of data (functional name: `decompress`). |' + id: totrans-33 prefs: [] type: TYPE_TB + zh: '| [`Decompressor`](generated/torchdata.datapipes.iter.Decompressor.html#torchdata.datapipes.iter.Decompressor + "torchdata.datapipes.iter.Decompressor") | 接受路径和压缩数据流的元组,并返回路径和解压缩数据流的元组(函数名:`decompress`)。 + |' - en: '| [`RarArchiveLoader`](generated/torchdata.datapipes.iter.RarArchiveLoader.html#torchdata.datapipes.iter.RarArchiveLoader "torchdata.datapipes.iter.RarArchiveLoader") | Decompresses rar binary streams from input Iterable Datapipes which contains tuples of path name and rar binary stream, and yields a tuple of path name and extracted binary stream (functional name: `load_from_rar`). |' + id: totrans-34 prefs: [] type: TYPE_TB + zh: '| [`RarArchiveLoader`](generated/torchdata.datapipes.iter.RarArchiveLoader.html#torchdata.datapipes.iter.RarArchiveLoader + "torchdata.datapipes.iter.RarArchiveLoader") | 从包含路径名和rar二进制流元组的输入可迭代DataPipe中解压缩rar二进制流,并产生一个路径名和提取的二进制流元组(函数名:`load_from_rar`)。 + |' - en: '| [`TarArchiveLoader`](generated/torchdata.datapipes.iter.TarArchiveLoader.html#torchdata.datapipes.iter.TarArchiveLoader "torchdata.datapipes.iter.TarArchiveLoader") | Opens/decompresses tar binary streams from an Iterable DataPipe which contains tuples of path name and tar binary stream, and yields a tuple of path name and extracted binary stream (functional name: `load_from_tar`). |' + id: totrans-35 prefs: [] type: TYPE_TB + zh: '| [`TarArchiveLoader`](generated/torchdata.datapipes.iter.TarArchiveLoader.html#torchdata.datapipes.iter.TarArchiveLoader + "torchdata.datapipes.iter.TarArchiveLoader") | 从包含路径名和tar二进制流元组的可迭代DataPipe中打开/解压缩tar二进制流,并产生一个路径名和提取的二进制流元组(函数名:`load_from_tar`)。 + |' - en: '| [`TFRecordLoader`](generated/torchdata.datapipes.iter.TFRecordLoader.html#torchdata.datapipes.iter.TFRecordLoader "torchdata.datapipes.iter.TFRecordLoader") | Opens/decompresses tfrecord binary streams from an Iterable DataPipe which contains tuples of path name and tfrecord binary stream, and yields the stored records (functional name: `load_from_tfrecord`). |' + id: totrans-36 prefs: [] type: TYPE_TB + zh: '| [`TFRecordLoader`](generated/torchdata.datapipes.iter.TFRecordLoader.html#torchdata.datapipes.iter.TFRecordLoader + "torchdata.datapipes.iter.TFRecordLoader") | 从包含路径名和tfrecord二进制流元组的可迭代DataPipe中打开/解压缩tfrecord二进制流,并产生存储的记录(函数名:`load_from_tfrecord`)。 + |' - en: '| [`WebDataset`](generated/torchdata.datapipes.iter.WebDataset.html#torchdata.datapipes.iter.WebDataset "torchdata.datapipes.iter.WebDataset") | Iterable DataPipe that accepts stream of (path, data) tuples, usually, representing the pathnames and files of a tar archive (functional name: `webdataset`). 
|'
+ id: totrans-37
  prefs: []
  type: TYPE_TB
+ zh: '| [`WebDataset`](generated/torchdata.datapipes.iter.WebDataset.html#torchdata.datapipes.iter.WebDataset
    "torchdata.datapipes.iter.WebDataset") | 接受(路径,数据)元组流的可迭代DataPipe,通常表示tar存档的路径名和文件(函数名:`webdataset`)。
    |'
- en: '| [`XzFileLoader`](generated/torchdata.datapipes.iter.XzFileLoader.html#torchdata.datapipes.iter.XzFileLoader
    "torchdata.datapipes.iter.XzFileLoader") | Decompresses xz (lzma) binary streams
    from an Iterable DataPipe which contains tuples of path name and xz binary streams,
    and yields a tuple of path name and extracted binary stream (functional name:
    `load_from_xz`). |'
+ id: totrans-38
  prefs: []
  type: TYPE_TB
+ zh: '| [`XzFileLoader`](generated/torchdata.datapipes.iter.XzFileLoader.html#torchdata.datapipes.iter.XzFileLoader
    "torchdata.datapipes.iter.XzFileLoader") | 从包含路径名和xz二进制流元组的可迭代DataPipe中解压缩xz(lzma)二进制流,并产生一个路径名和提取的二进制流元组(函数名:`load_from_xz`)。
    |'
- en: '| [`ZipArchiveLoader`](generated/torchdata.datapipes.iter.ZipArchiveLoader.html#torchdata.datapipes.iter.ZipArchiveLoader
    "torchdata.datapipes.iter.ZipArchiveLoader") | Opens/decompresses zip binary streams
    from an Iterable DataPipe which contains a tuple of path name and zip binary stream,
    and yields a tuple of path name and extracted binary stream (functional name:
    `load_from_zip`). |'
+ id: totrans-39
  prefs: []
  type: TYPE_TB
+ zh: '| [`ZipArchiveLoader`](generated/torchdata.datapipes.iter.ZipArchiveLoader.html#torchdata.datapipes.iter.ZipArchiveLoader
    "torchdata.datapipes.iter.ZipArchiveLoader") | 从包含路径名和zip二进制流元组的可迭代DataPipe中打开/解压缩zip二进制流,并产生一个路径名和提取的二进制流元组(函数名:`load_from_zip`)。
    |'
- en: Augmenting DataPipes[](#augmenting-datapipes "Permalink to this heading")
+ id: totrans-40
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 增强DataPipes[](#augmenting-datapipes "Permalink to this heading")
- en: These DataPipes help to augment your samples.
+ id: totrans-41
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes有助于增强您的样本。
- en: '| [`Cycler`](generated/torchdata.datapipes.iter.Cycler.html#torchdata.datapipes.iter.Cycler
    "torchdata.datapipes.iter.Cycler") | Cycles the specified input in perpetuity by
    default, or for the specified number of times (functional name: `cycle`). |'
+ id: totrans-42
  prefs: []
  type: TYPE_TB
+ zh: '| [`Cycler`](generated/torchdata.datapipes.iter.Cycler.html#torchdata.datapipes.iter.Cycler
    "torchdata.datapipes.iter.Cycler") | 默认情况下永久循环指定的输入,或者循环指定次数(函数名:`cycle`)。 |'
- en: '| [`Enumerator`](generated/torchdata.datapipes.iter.Enumerator.html#torchdata.datapipes.iter.Enumerator
    "torchdata.datapipes.iter.Enumerator") | Adds an index to an existing DataPipe
    through enumeration, with the index starting from 0 by default (functional name:
    `enumerate`). |'
+ id: totrans-43
  prefs: []
  type: TYPE_TB
+ zh: '| [`Enumerator`](generated/torchdata.datapipes.iter.Enumerator.html#torchdata.datapipes.iter.Enumerator
    "torchdata.datapipes.iter.Enumerator") | 通过枚举向现有DataPipe添加索引,默认情况下索引从0开始(函数名:`enumerate`)。
    |'
- en: '| [`IndexAdder`](generated/torchdata.datapipes.iter.IndexAdder.html#torchdata.datapipes.iter.IndexAdder
    "torchdata.datapipes.iter.IndexAdder") | Adds an index to an existing Iterable
    DataPipe (functional name: `add_index`).
|'
+ id: totrans-44
  prefs: []
  type: TYPE_TB
+ zh: '| [`IndexAdder`](generated/torchdata.datapipes.iter.IndexAdder.html#torchdata.datapipes.iter.IndexAdder
    "torchdata.datapipes.iter.IndexAdder") | 向现有可迭代DataPipe添加索引(函数名:`add_index`)。
    |'
- en: '| [`Repeater`](generated/torchdata.datapipes.iter.Repeater.html#torchdata.datapipes.iter.Repeater
    "torchdata.datapipes.iter.Repeater") | Repeatedly yield each element of the source
    DataPipe for the specified number of times before moving onto the next element
    (functional name: `repeat`). |'
+ id: totrans-45
  prefs: []
  type: TYPE_TB
+ zh: '| [`Repeater`](generated/torchdata.datapipes.iter.Repeater.html#torchdata.datapipes.iter.Repeater
    "torchdata.datapipes.iter.Repeater") | 在移动到下一个元素之前,重复为源DataPipe的每个元素指定次数的输出(功能名称:`repeat`)。
    |'
- en: Combinatorial DataPipes[](#combinatorial-datapipes "Permalink to this heading")
+ id: totrans-46
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 组合式DataPipes[](#combinatorial-datapipes "跳转到此标题的永久链接")
- en: These DataPipes help to perform combinatorial operations.
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes有助于执行组合操作。
- en: '| [`InBatchShuffler`](generated/torchdata.datapipes.iter.InBatchShuffler.html#torchdata.datapipes.iter.InBatchShuffler
    "torchdata.datapipes.iter.InBatchShuffler") | Shuffles each mini-batch from the
    prior DataPipe (functional name: `in_batch_shuffle`). |'
+ id: totrans-48
  prefs: []
  type: TYPE_TB
+ zh: '| [`InBatchShuffler`](generated/torchdata.datapipes.iter.InBatchShuffler.html#torchdata.datapipes.iter.InBatchShuffler
    "torchdata.datapipes.iter.InBatchShuffler") | 对来自先前DataPipe的每个小批次进行洗牌(功能名称:`in_batch_shuffle`)。
    |'
- en: '| [`Sampler`](generated/torchdata.datapipes.iter.Sampler.html#torchdata.datapipes.iter.Sampler
    "torchdata.datapipes.iter.Sampler") | Generates sample elements using the provided
    `Sampler` (defaults to `SequentialSampler`). |'
+ id: totrans-49
  prefs: []
  type: TYPE_TB
+ zh: '| [`Sampler`](generated/torchdata.datapipes.iter.Sampler.html#torchdata.datapipes.iter.Sampler
    "torchdata.datapipes.iter.Sampler") | 使用提供的`Sampler`生成样本元素(默认为`SequentialSampler`)。
    |'
- en: '| [`Shuffler`](generated/torchdata.datapipes.iter.Shuffler.html#torchdata.datapipes.iter.Shuffler
    "torchdata.datapipes.iter.Shuffler") | Shuffles the input DataPipe with a buffer
    (functional name: `shuffle`). |'
+ id: totrans-50
  prefs: []
  type: TYPE_TB
+ zh: '| [`Shuffler`](generated/torchdata.datapipes.iter.Shuffler.html#torchdata.datapipes.iter.Shuffler
    "torchdata.datapipes.iter.Shuffler") | 使用缓冲区对输入DataPipe进行洗牌(功能名称:`shuffle`)。 |'
- en: Combining/Splitting DataPipes[](#combining-splitting-datapipes "Permalink to
    this heading")
+ id: totrans-51
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 组合/拆分DataPipes[](#combining-splitting-datapipes "跳转到此标题的永久链接")
- en: These tend to involve multiple DataPipes, combining them or splitting one to
    many.
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些通常涉及多个DataPipes,将它们组合在一起或将一个拆分为多个。
- en: '| [`Concater`](generated/torchdata.datapipes.iter.Concater.html#torchdata.datapipes.iter.Concater
    "torchdata.datapipes.iter.Concater") | Concatenates multiple Iterable DataPipes
    (functional name: `concat`).
    |'
+ id: totrans-53
  prefs: []
  type: TYPE_TB
+ zh: '| [`Concater`](generated/torchdata.datapipes.iter.Concater.html#torchdata.datapipes.iter.Concater
    "torchdata.datapipes.iter.Concater") | 连接多个Iterable DataPipes(功能名称:`concat`)。
    |'
- en: '| [`Demultiplexer`](generated/torchdata.datapipes.iter.Demultiplexer.html#torchdata.datapipes.iter.Demultiplexer
    "torchdata.datapipes.iter.Demultiplexer") | Splits the input DataPipe into multiple
    child DataPipes, using the given classification function (functional name: `demux`).
    |'
+ id: totrans-54
  prefs: []
  type: TYPE_TB
+ zh: '| [`Demultiplexer`](generated/torchdata.datapipes.iter.Demultiplexer.html#torchdata.datapipes.iter.Demultiplexer
    "torchdata.datapipes.iter.Demultiplexer") | 使用给定的分类函数将输入DataPipe拆分为多个子DataPipes(功能名称:`demux`)。
    |'
- en: '| [`Forker`](generated/torchdata.datapipes.iter.Forker.html#torchdata.datapipes.iter.Forker
    "torchdata.datapipes.iter.Forker") | Creates multiple instances of the same Iterable
    DataPipe (functional name: `fork`). |'
+ id: totrans-55
  prefs: []
  type: TYPE_TB
+ zh: '| [`Forker`](generated/torchdata.datapipes.iter.Forker.html#torchdata.datapipes.iter.Forker
    "torchdata.datapipes.iter.Forker") | 创建相同Iterable DataPipe的多个实例(功能名称:`fork`)。
    |'
- en: '| [`IterKeyZipper`](generated/torchdata.datapipes.iter.IterKeyZipper.html#torchdata.datapipes.iter.IterKeyZipper
    "torchdata.datapipes.iter.IterKeyZipper") | Zips two IterDataPipes together based
    on the matching key (functional name: `zip_with_iter`). |'
+ id: totrans-56
  prefs: []
  type: TYPE_TB
+ zh: '| [`IterKeyZipper`](generated/torchdata.datapipes.iter.IterKeyZipper.html#torchdata.datapipes.iter.IterKeyZipper
    "torchdata.datapipes.iter.IterKeyZipper") | 根据匹配的键将两个IterDataPipes一起压缩(功能名称:`zip_with_iter`)。
    |'
- en: '| [`MapKeyZipper`](generated/torchdata.datapipes.iter.MapKeyZipper.html#torchdata.datapipes.iter.MapKeyZipper
    "torchdata.datapipes.iter.MapKeyZipper") | Joins the items from the source IterDataPipe
    with items from a MapDataPipe (functional name: `zip_with_map`). |'
+ id: totrans-57
  prefs: []
  type: TYPE_TB
+ zh: '| [`MapKeyZipper`](generated/torchdata.datapipes.iter.MapKeyZipper.html#torchdata.datapipes.iter.MapKeyZipper
    "torchdata.datapipes.iter.MapKeyZipper") | 将源IterDataPipe的项目与MapDataPipe的项目结合(功能名称:`zip_with_map`)。
    |'
- en: '| [`Multiplexer`](generated/torchdata.datapipes.iter.Multiplexer.html#torchdata.datapipes.iter.Multiplexer
    "torchdata.datapipes.iter.Multiplexer") | Yields one element at a time from each
    of the input Iterable DataPipes (functional name: `mux`). |'
+ id: totrans-58
  prefs: []
  type: TYPE_TB
+ zh: '| [`Multiplexer`](generated/torchdata.datapipes.iter.Multiplexer.html#torchdata.datapipes.iter.Multiplexer
    "torchdata.datapipes.iter.Multiplexer") | 从输入的每个Iterable DataPipe中一次产生一个元素(功能名称:`mux`)。
    |'
- en: '| [`MultiplexerLongest`](generated/torchdata.datapipes.iter.MultiplexerLongest.html#torchdata.datapipes.iter.MultiplexerLongest
    "torchdata.datapipes.iter.MultiplexerLongest") | Yields one element at a time
    from each of the input Iterable DataPipes (functional name: `mux_longest`). 
    |'
+ id: totrans-59
  prefs: []
  type: TYPE_TB
+ zh: '| [`MultiplexerLongest`](generated/torchdata.datapipes.iter.MultiplexerLongest.html#torchdata.datapipes.iter.MultiplexerLongest
    "torchdata.datapipes.iter.MultiplexerLongest") | 从输入的每个Iterable DataPipe中一次产生一个元素(功能名称:`mux_longest`)。
    |'
- en: '| [`RoundRobinDemultiplexer`](generated/torchdata.datapipes.iter.RoundRobinDemultiplexer.html#torchdata.datapipes.iter.RoundRobinDemultiplexer
    "torchdata.datapipes.iter.RoundRobinDemultiplexer") | Splits the input DataPipe
    into multiple child DataPipes in the round-robin order (functional name: `round_robin_demux`).
    |'
+ id: totrans-60
  prefs: []
  type: TYPE_TB
+ zh: '| [`RoundRobinDemultiplexer`](generated/torchdata.datapipes.iter.RoundRobinDemultiplexer.html#torchdata.datapipes.iter.RoundRobinDemultiplexer
    "torchdata.datapipes.iter.RoundRobinDemultiplexer") | 按照轮询顺序将输入DataPipe拆分为多个子DataPipes(功能名称:`round_robin_demux`)。
    |'
- en: '| [`SampleMultiplexer`](generated/torchdata.datapipes.iter.SampleMultiplexer.html#torchdata.datapipes.iter.SampleMultiplexer
    "torchdata.datapipes.iter.SampleMultiplexer") | Takes a Dict of (IterDataPipe,
    Weight), and yields items by sampling from these DataPipes with respect to their
    weights. |'
+ id: totrans-61
  prefs: []
  type: TYPE_TB
+ zh: '| [`SampleMultiplexer`](generated/torchdata.datapipes.iter.SampleMultiplexer.html#torchdata.datapipes.iter.SampleMultiplexer
    "torchdata.datapipes.iter.SampleMultiplexer") | 接受一个(IterDataPipe, Weight)字典,并根据权重从这些DataPipes中进行采样生成项目。
    |'
- en: '| [`UnZipper`](generated/torchdata.datapipes.iter.UnZipper.html#torchdata.datapipes.iter.UnZipper
    "torchdata.datapipes.iter.UnZipper") | Takes in a DataPipe of Sequences, unpacks
    each Sequence, and returns the elements in separate DataPipes based on their
    position in the Sequence (functional name: `unzip`). |'
+ id: totrans-62
  prefs: []
  type: TYPE_TB
+ zh: '| [`UnZipper`](generated/torchdata.datapipes.iter.UnZipper.html#torchdata.datapipes.iter.UnZipper
    "torchdata.datapipes.iter.UnZipper") | 接受一个序列的DataPipe,解压每个序列,并根据序列中的位置将元素分别返回到不同的DataPipes中(功能名称:`unzip`)。
    |'
- en: '| [`Zipper`](generated/torchdata.datapipes.iter.Zipper.html#torchdata.datapipes.iter.Zipper
    "torchdata.datapipes.iter.Zipper") | Aggregates elements into a tuple from each
    of the input DataPipes (functional name: `zip`). |'
+ id: totrans-63
  prefs: []
  type: TYPE_TB
+ zh: '| [`Zipper`](generated/torchdata.datapipes.iter.Zipper.html#torchdata.datapipes.iter.Zipper
    "torchdata.datapipes.iter.Zipper") | 从每个输入DataPipe中聚合元素为元组(功能名称:`zip`)。 |'
- en: '| [`ZipperLongest`](generated/torchdata.datapipes.iter.ZipperLongest.html#torchdata.datapipes.iter.ZipperLongest
    "torchdata.datapipes.iter.ZipperLongest") | Aggregates elements into a tuple
    from each of the input DataPipes (functional name: `zip_longest`). |'
+ id: totrans-64
  prefs: []
  type: TYPE_TB
+ zh: '| [`ZipperLongest`](generated/torchdata.datapipes.iter.ZipperLongest.html#torchdata.datapipes.iter.ZipperLongest
    "torchdata.datapipes.iter.ZipperLongest") | 从每个输入DataPipe中聚合元素为元组(功能名称:`zip_longest`)。
    |'
- en: Grouping DataPipes[](#grouping-datapipes "Permalink to this heading")
+ id: totrans-65
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: Grouping DataPipes[](#grouping-datapipes "Permalink to this heading")
- en: These DataPipes let you group samples within a DataPipe. 
+ id: totrans-66
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes让您在DataPipe中对样本进行分组。
- en: '| [`Batcher`](generated/torchdata.datapipes.iter.Batcher.html#torchdata.datapipes.iter.Batcher
    "torchdata.datapipes.iter.Batcher") | Creates mini-batches of data (functional
    name: `batch`). |'
+ id: totrans-67
  prefs: []
  type: TYPE_TB
+ zh: '| [`Batcher`](generated/torchdata.datapipes.iter.Batcher.html#torchdata.datapipes.iter.Batcher
    "torchdata.datapipes.iter.Batcher") | 创建数据的小批次(功能名称:`batch`)。 |'
- en: '| [`BucketBatcher`](generated/torchdata.datapipes.iter.BucketBatcher.html#torchdata.datapipes.iter.BucketBatcher
    "torchdata.datapipes.iter.BucketBatcher") | Creates mini-batches of data from
    sorted bucket (functional name: `bucketbatch`). |'
+ id: totrans-68
  prefs: []
  type: TYPE_TB
+ zh: '| [`BucketBatcher`](generated/torchdata.datapipes.iter.BucketBatcher.html#torchdata.datapipes.iter.BucketBatcher
    "torchdata.datapipes.iter.BucketBatcher") | 从排序的桶中创建数据的小批次(功能名称:`bucketbatch`)。
    |'
- en: '| [`Collator`](generated/torchdata.datapipes.iter.Collator.html#torchdata.datapipes.iter.Collator
    "torchdata.datapipes.iter.Collator") | Collates samples from DataPipe to Tensor(s)
    by a custom collate function (functional name: `collate`). |'
+ id: totrans-69
  prefs: []
  type: TYPE_TB
+ zh: '| [`Collator`](generated/torchdata.datapipes.iter.Collator.html#torchdata.datapipes.iter.Collator
    "torchdata.datapipes.iter.Collator") | 通过自定义整理函数将DataPipe中的样本整理为张量(功能名称:`collate`)。
    |'
- en: '| [`Grouper`](generated/torchdata.datapipes.iter.Grouper.html#torchdata.datapipes.iter.Grouper
    "torchdata.datapipes.iter.Grouper") | Groups data from input IterDataPipe by
    keys which are generated from `group_key_fn`, and yields a `DataChunk` with batch
    size up to `group_size` if defined (functional name: `groupby`). |'
+ id: totrans-70
  prefs: []
  type: TYPE_TB
+ zh: '| [`Grouper`](generated/torchdata.datapipes.iter.Grouper.html#torchdata.datapipes.iter.Grouper
    "torchdata.datapipes.iter.Grouper") | 通过从`group_key_fn`生成的键对来自输入IterDataPipe的数据进行分组,并在定义了`group_size`的情况下生成具有最大批量大小的`DataChunk`(功能名称:`groupby`)。
    |'
- en: '| [`MaxTokenBucketizer`](generated/torchdata.datapipes.iter.MaxTokenBucketizer.html#torchdata.datapipes.iter.MaxTokenBucketizer
    "torchdata.datapipes.iter.MaxTokenBucketizer") | Creates mini-batches of data
    from a min-heap with limited size, and the total length of samples returned by
    `len_fn` within each batch will be limited by `max_token_count` (functional name:
    `max_token_bucketize`). |'
+ id: totrans-71
  prefs: []
  type: TYPE_TB
+ zh: '| [`MaxTokenBucketizer`](generated/torchdata.datapipes.iter.MaxTokenBucketizer.html#torchdata.datapipes.iter.MaxTokenBucketizer
    "torchdata.datapipes.iter.MaxTokenBucketizer") | 从具有限制大小的最小堆中创建数据的小批次,并且每个批次中由`len_fn`返回的样本的总长度将受到`max_token_count`的限制(功能名称:`max_token_bucketize`)。
    |'
- en: '| [`UnBatcher`](generated/torchdata.datapipes.iter.UnBatcher.html#torchdata.datapipes.iter.UnBatcher
    "torchdata.datapipes.iter.UnBatcher") | Undoes batching of data (functional name:
    `unbatch`). |'
+ id: totrans-72
  prefs: []
  type: TYPE_TB
+ zh: '| [`UnBatcher`](generated/torchdata.datapipes.iter.UnBatcher.html#torchdata.datapipes.iter.UnBatcher
    "torchdata.datapipes.iter.UnBatcher") | 撤消数据的批处理(功能名称:`unbatch`)。 |'
- en: IO DataPipes[](#io-datapipes "Permalink to this heading")
+ id: totrans-73
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: IO DataPipes[](#io-datapipes "Permalink to this heading")
- en: These DataPipes help you interact with file systems or remote servers (e.g. 
    downloading, opening, saving files, and listing the files in directories).
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes有助于与文件系统或远程服务器进行交互(例如下载、打开、保存文件以及列出目录中的文件)。
- en: '| [`AISFileLister`](generated/torchdata.datapipes.iter.AISFileLister.html#torchdata.datapipes.iter.AISFileLister
    "torchdata.datapipes.iter.AISFileLister") | Iterable DataPipe that lists files
    from the AIStore backends with the given URL prefixes (functional name: `list_files_by_ais`).
    |'
+ id: totrans-75
  prefs: []
  type: TYPE_TB
+ zh: '| [`AISFileLister`](generated/torchdata.datapipes.iter.AISFileLister.html#torchdata.datapipes.iter.AISFileLister
    "torchdata.datapipes.iter.AISFileLister") | 可迭代的DataPipe,列出具有给定URL前缀的AIStore后端的文件(功能名称:`list_files_by_ais`)。
    |'
- en: '| [`AISFileLoader`](generated/torchdata.datapipes.iter.AISFileLoader.html#torchdata.datapipes.iter.AISFileLoader
    "torchdata.datapipes.iter.AISFileLoader") | Iterable DataPipe that loads files
    from AIStore with the given URLs (functional name: `load_files_by_ais`). |'
+ id: totrans-76
  prefs: []
  type: TYPE_TB
+ zh: '| [`AISFileLoader`](generated/torchdata.datapipes.iter.AISFileLoader.html#torchdata.datapipes.iter.AISFileLoader
    "torchdata.datapipes.iter.AISFileLoader") | 可迭代的DataPipe,从具有给定URL的AIStore中加载文件(功能名称:`load_files_by_ais`)。
    |'
- en: '| [`FSSpecFileLister`](generated/torchdata.datapipes.iter.FSSpecFileLister.html#torchdata.datapipes.iter.FSSpecFileLister
    "torchdata.datapipes.iter.FSSpecFileLister") | Lists the contents of the directory
    at the provided `root` pathname or URL, and yields the full pathname or URL for
    each file within the directory (functional name: `list_files_by_fsspec`). |'
+ id: totrans-77
  prefs: []
  type: TYPE_TB
+ zh: '| [`FSSpecFileLister`](generated/torchdata.datapipes.iter.FSSpecFileLister.html#torchdata.datapipes.iter.FSSpecFileLister
    "torchdata.datapipes.iter.FSSpecFileLister") | 列出提供的`root`路径名或URL的目录内容,并为目录中的每个文件生成完整的路径名或URL(功能名称:`list_files_by_fsspec`)。
    |'
- en: '| [`FSSpecFileOpener`](generated/torchdata.datapipes.iter.FSSpecFileOpener.html#torchdata.datapipes.iter.FSSpecFileOpener
    "torchdata.datapipes.iter.FSSpecFileOpener") | Opens files from input datapipe
    which contains fsspec paths and yields a tuple of pathname and opened file stream
    (functional name: `open_files_by_fsspec`). |'
+ id: totrans-78
  prefs: []
  type: TYPE_TB
+ zh: '| [`FSSpecFileOpener`](generated/torchdata.datapipes.iter.FSSpecFileOpener.html#torchdata.datapipes.iter.FSSpecFileOpener
    "torchdata.datapipes.iter.FSSpecFileOpener") | 从包含fsspec路径的输入datapipe中打开文件,并生成路径名和打开的文件流的元组(功能名称:`open_files_by_fsspec`)。
    |'
- en: '| [`FSSpecSaver`](generated/torchdata.datapipes.iter.FSSpecSaver.html#torchdata.datapipes.iter.FSSpecSaver
    "torchdata.datapipes.iter.FSSpecSaver") | Takes in a DataPipe of tuples of metadata
    and data, saves the data to the target path (generated by the filepath_fn and
    metadata), and yields the resulting fsspec path (functional name: `save_by_fsspec`).
    |'
+ id: totrans-79
  prefs: []
  type: TYPE_TB
+ zh: '| [`FSSpecSaver`](generated/torchdata.datapipes.iter.FSSpecSaver.html#torchdata.datapipes.iter.FSSpecSaver
    "torchdata.datapipes.iter.FSSpecSaver") | 接收元数据和数据元组的DataPipe,将数据保存到目标路径(由filepath_fn和元数据生成),并产生结果的fsspec路径(函数名:`save_by_fsspec`)。
    |'
- en: '| [`FileLister`](generated/torchdata.datapipes.iter.FileLister.html#torchdata.datapipes.iter.FileLister
    "torchdata.datapipes.iter.FileLister") | Given path(s) to the root directory,
    yields file pathname(s) (path + filename) of files within the root directory. 
    |'
+ id: totrans-80
  prefs: []
  type: TYPE_TB
+ zh: '| [`FileLister`](generated/torchdata.datapipes.iter.FileLister.html#torchdata.datapipes.iter.FileLister
    "torchdata.datapipes.iter.FileLister") | 给定根目录的路径,产生根目录中文件的路径名(路径+文件名)。 |'
- en: '| [`FileOpener`](generated/torchdata.datapipes.iter.FileOpener.html#torchdata.datapipes.iter.FileOpener
    "torchdata.datapipes.iter.FileOpener") | Given pathnames, opens files and yields
    pathname and file stream in a tuple (functional name: `open_files`). |'
+ id: totrans-81
  prefs: []
  type: TYPE_TB
+ zh: '| [`FileOpener`](generated/torchdata.datapipes.iter.FileOpener.html#torchdata.datapipes.iter.FileOpener
    "torchdata.datapipes.iter.FileOpener") | 给定路径名,打开文件并以元组形式产生路径名和文件流(函数名:`open_files`)。
    |'
- en: '| [`GDriveReader`](generated/torchdata.datapipes.iter.GDriveReader.html#torchdata.datapipes.iter.GDriveReader
    "torchdata.datapipes.iter.GDriveReader") | Takes URLs pointing at GDrive files,
    and yields tuples of file name and IO stream (functional name: `read_from_gdrive`).
    |'
+ id: totrans-82
  prefs: []
  type: TYPE_TB
+ zh: '| [`GDriveReader`](generated/torchdata.datapipes.iter.GDriveReader.html#torchdata.datapipes.iter.GDriveReader
    "torchdata.datapipes.iter.GDriveReader") | 接收指向GDrive文件的URL,并产生文件名和IO流的元组(函数名:`read_from_gdrive`)。
    |'
- en: '| [`HttpReader`](generated/torchdata.datapipes.iter.HttpReader.html#torchdata.datapipes.iter.HttpReader
    "torchdata.datapipes.iter.HttpReader") | Takes file URLs (HTTP URLs pointing
    to files), and yields tuples of file URL and IO stream (functional name: `read_from_http`).
    |'
+ id: totrans-83
  prefs: []
  type: TYPE_TB
+ zh: '| [`HttpReader`](generated/torchdata.datapipes.iter.HttpReader.html#torchdata.datapipes.iter.HttpReader
    "torchdata.datapipes.iter.HttpReader") | 接收文件URL(指向文件的HTTP URL),并产生文件URL和IO流的元组(函数名:`read_from_http`)。
    |'
- en: '| [`HuggingFaceHubReader`](generated/torchdata.datapipes.iter.HuggingFaceHubReader.html#torchdata.datapipes.iter.HuggingFaceHubReader
    "torchdata.datapipes.iter.HuggingFaceHubReader") | Takes in dataset names and
    returns an Iterable HuggingFace dataset. |'
+ id: totrans-84
  prefs: []
  type: TYPE_TB
+ zh: '| [`HuggingFaceHubReader`](generated/torchdata.datapipes.iter.HuggingFaceHubReader.html#torchdata.datapipes.iter.HuggingFaceHubReader
    "torchdata.datapipes.iter.HuggingFaceHubReader") | 接收数据集名称并返回一个可迭代的HuggingFace数据集。
    |'
- en: '| [`IoPathFileLister`](generated/torchdata.datapipes.iter.IoPathFileLister.html#torchdata.datapipes.iter.IoPathFileLister
    "torchdata.datapipes.iter.IoPathFileLister") | Lists the contents of the directory
    at the provided `root` pathname or URL, and yields the full pathname or URL for
    each file within the directory (functional name: `list_files_by_iopath`). |'
+ id: totrans-85
  prefs: []
  type: TYPE_TB
+ zh: '| [`IoPathFileLister`](generated/torchdata.datapipes.iter.IoPathFileLister.html#torchdata.datapipes.iter.IoPathFileLister
    "torchdata.datapipes.iter.IoPathFileLister") | 列出提供的`root`路径名或URL的目录内容,并为目录中的每个文件产生完整的路径名或URL(函数名:`list_files_by_iopath`)。
    |'
- en: '| [`IoPathFileOpener`](generated/torchdata.datapipes.iter.IoPathFileOpener.html#torchdata.datapipes.iter.IoPathFileOpener
    "torchdata.datapipes.iter.IoPathFileOpener") | Opens files from input datapipe
    which contains pathnames or URLs, and yields a tuple of pathname and opened file
    stream (functional name: `open_files_by_iopath`). 
    |'
+ id: totrans-86
  prefs: []
  type: TYPE_TB
+ zh: '| [`IoPathFileOpener`](generated/torchdata.datapipes.iter.IoPathFileOpener.html#torchdata.datapipes.iter.IoPathFileOpener
    "torchdata.datapipes.iter.IoPathFileOpener") | 从包含路径名或URL的输入datapipe中打开文件,并产生路径名和已打开文件流的元组(函数名:`open_files_by_iopath`)。
    |'
- en: '| [`IoPathSaver`](generated/torchdata.datapipes.iter.IoPathSaver.html#torchdata.datapipes.iter.IoPathSaver
    "torchdata.datapipes.iter.IoPathSaver") | Takes in a DataPipe of tuples of metadata
    and data, saves the data to the target path which is generated by the `filepath_fn`
    and metadata, and yields the resulting path in iopath format (functional name:
    `save_by_iopath`). |'
+ id: totrans-87
  prefs: []
  type: TYPE_TB
+ zh: '| [`IoPathSaver`](generated/torchdata.datapipes.iter.IoPathSaver.html#torchdata.datapipes.iter.IoPathSaver
    "torchdata.datapipes.iter.IoPathSaver") | 接收元数据和数据元组的DataPipe,将数据保存到由`filepath_fn`和元数据生成的目标路径,并以iopath格式(函数名:`save_by_iopath`)产生结果路径。
    |'
- en: '| [`OnlineReader`](generated/torchdata.datapipes.iter.OnlineReader.html#torchdata.datapipes.iter.OnlineReader
    "torchdata.datapipes.iter.OnlineReader") | Takes file URLs (can be HTTP URLs
    pointing to files or URLs to GDrive files), and yields tuples of file URL and
    IO stream (functional name: `read_from_remote`). |'
+ id: totrans-88
  prefs: []
  type: TYPE_TB
+ zh: '| [`OnlineReader`](generated/torchdata.datapipes.iter.OnlineReader.html#torchdata.datapipes.iter.OnlineReader
    "torchdata.datapipes.iter.OnlineReader") | 接收文件URL(可以是指向文件的HTTP URL或指向GDrive文件的URL),并产生文件URL和IO流的元组(函数名:`read_from_remote`)。
    |'
- en: '| [`ParquetDataFrameLoader`](generated/torchdata.datapipes.iter.ParquetDataFrameLoader.html#torchdata.datapipes.iter.ParquetDataFrameLoader
    "torchdata.datapipes.iter.ParquetDataFrameLoader") | Takes in paths to Parquet
    files and returns a TorchArrow DataFrame for each row group within a Parquet
    file (functional name: `load_parquet_as_df`). |'
+ id: totrans-89
  prefs: []
  type: TYPE_TB
+ zh: '| [`ParquetDataFrameLoader`](generated/torchdata.datapipes.iter.ParquetDataFrameLoader.html#torchdata.datapipes.iter.ParquetDataFrameLoader
    "torchdata.datapipes.iter.ParquetDataFrameLoader") | 接收Parquet文件的路径,并为Parquet文件中的每个行组返回一个TorchArrow
    DataFrame(函数名:`load_parquet_as_df`)。 |'
- en: '| [`S3FileLister`](generated/torchdata.datapipes.iter.S3FileLister.html#torchdata.datapipes.iter.S3FileLister
    "torchdata.datapipes.iter.S3FileLister") | Iterable DataPipe that lists Amazon
    S3 file URLs with the given prefixes (functional name: `list_files_by_s3`). |'
+ id: totrans-90
  prefs: []
  type: TYPE_TB
+ zh: '| [`S3FileLister`](generated/torchdata.datapipes.iter.S3FileLister.html#torchdata.datapipes.iter.S3FileLister
    "torchdata.datapipes.iter.S3FileLister") | 可迭代的DataPipe,列出具有给定前缀的Amazon S3文件URL(函数名:`list_files_by_s3`)。
    |'
- en: '| [`S3FileLoader`](generated/torchdata.datapipes.iter.S3FileLoader.html#torchdata.datapipes.iter.S3FileLoader
    "torchdata.datapipes.iter.S3FileLoader") | Iterable DataPipe that loads Amazon
    S3 files from the given S3 URLs (functional name: `load_files_by_s3`). 
    |'
+ id: totrans-91
  prefs: []
  type: TYPE_TB
+ zh: '| [`S3FileLoader`](generated/torchdata.datapipes.iter.S3FileLoader.html#torchdata.datapipes.iter.S3FileLoader
    "torchdata.datapipes.iter.S3FileLoader") | 可迭代的DataPipe,从给定的S3 URL加载Amazon S3文件(函数名:`load_files_by_s3`)。
    |'
- en: '| [`Saver`](generated/torchdata.datapipes.iter.Saver.html#torchdata.datapipes.iter.Saver
    "torchdata.datapipes.iter.Saver") | Takes in a DataPipe of tuples of metadata
    and data, saves the data to the target path generated by the `filepath_fn` and
    metadata, and yields file path on local file system (functional name: `save_to_disk`).
    |'
+ id: totrans-92
  prefs: []
  type: TYPE_TB
+ zh: '| [`Saver`](generated/torchdata.datapipes.iter.Saver.html#torchdata.datapipes.iter.Saver
    "torchdata.datapipes.iter.Saver") | 接收元数据和数据元组的DataPipe,将数据保存到由`filepath_fn`和元数据生成的目标路径,并在本地文件系统上生成文件路径(函数名称:`save_to_disk`)。
    |'
- en: Mapping DataPipes[](#mapping-datapipes "Permalink to this heading")
+ id: totrans-93
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: Mapping DataPipes[](#mapping-datapipes "跳转到此标题")
- en: These DataPipes apply a given function to each element in the DataPipe.
+ id: totrans-94
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes将给定的函数应用于DataPipe中的每个元素。
- en: '| [`BatchAsyncMapper`](generated/torchdata.datapipes.iter.BatchAsyncMapper.html#torchdata.datapipes.iter.BatchAsyncMapper
    "torchdata.datapipes.iter.BatchAsyncMapper") | Combines elements from the source
    DataPipe to batches and applies a coroutine function over each element within
    the batch concurrently, then flattens the outputs to a single, unnested IterDataPipe
    (functional name: `async_map_batches`). |'
+ id: totrans-95
  prefs: []
  type: TYPE_TB
+ zh: '| [`BatchAsyncMapper`](generated/torchdata.datapipes.iter.BatchAsyncMapper.html#torchdata.datapipes.iter.BatchAsyncMapper
    "torchdata.datapipes.iter.BatchAsyncMapper") | 将源DataPipe中的元素组合成批次,并对每个批次中的每个元素并发地应用协程函数,然后将输出展平为单个、非嵌套的IterDataPipe(函数名称:`async_map_batches`)。
    |'
- en: '| [`BatchMapper`](generated/torchdata.datapipes.iter.BatchMapper.html#torchdata.datapipes.iter.BatchMapper
    "torchdata.datapipes.iter.BatchMapper") | Combines elements from the source DataPipe
    to batches and applies a function over each batch, then flattens the outputs
    to a single, unnested IterDataPipe (functional name: `map_batches`). |'
+ id: totrans-96
  prefs: []
  type: TYPE_TB
+ zh: '| [`BatchMapper`](generated/torchdata.datapipes.iter.BatchMapper.html#torchdata.datapipes.iter.BatchMapper
    "torchdata.datapipes.iter.BatchMapper") | 将源DataPipe中的元素组合成批次,并对每个批次应用函数,然后将输出展平为单个、非嵌套的IterDataPipe(函数名称:`map_batches`)。
    |'
- en: '| [`FlatMapper`](generated/torchdata.datapipes.iter.FlatMapper.html#torchdata.datapipes.iter.FlatMapper
    "torchdata.datapipes.iter.FlatMapper") | Applies a function over each item from
    the source DataPipe, then flattens the outputs to a single, unnested IterDataPipe
    (functional name: `flatmap`). |'
+ id: totrans-97
  prefs: []
  type: TYPE_TB
+ zh: '| [`FlatMapper`](generated/torchdata.datapipes.iter.FlatMapper.html#torchdata.datapipes.iter.FlatMapper
    "torchdata.datapipes.iter.FlatMapper") | 对源DataPipe中的每个项目应用函数,然后将输出展平为单个、非嵌套的IterDataPipe(函数名称:`flatmap`)。
    |'
- en: '| [`Mapper`](generated/torchdata.datapipes.iter.Mapper.html#torchdata.datapipes.iter.Mapper
    "torchdata.datapipes.iter.Mapper") | Applies a function over each item from the
    source DataPipe (functional name: `map`). 
    |'
+ id: totrans-98
  prefs: []
  type: TYPE_TB
+ zh: '| [`Mapper`](generated/torchdata.datapipes.iter.Mapper.html#torchdata.datapipes.iter.Mapper
    "torchdata.datapipes.iter.Mapper") | 对源DataPipe中的每个项目应用函数(函数名称:`map`)。 |'
- en: '| [`ShuffledFlatMapper`](generated/torchdata.datapipes.iter.ShuffledFlatMapper.html#torchdata.datapipes.iter.ShuffledFlatMapper
    "torchdata.datapipes.iter.ShuffledFlatMapper") | Applies a function over each
    item from the source DataPipe, then collects the iterables returned in a buffer,
    then, at every iteration, chooses at random one of the iterables in the buffer
    and yields one item from this iterable (functional name: `shuffled_flatmap`).
    |'
+ id: totrans-99
  prefs: []
  type: TYPE_TB
+ zh: '| [`ShuffledFlatMapper`](generated/torchdata.datapipes.iter.ShuffledFlatMapper.html#torchdata.datapipes.iter.ShuffledFlatMapper
    "torchdata.datapipes.iter.ShuffledFlatMapper") | 对源DataPipe中的每个项目应用函数,然后将返回的可迭代对象收集到缓冲区中,然后,在每次迭代时,随机选择缓冲区中的一个可迭代对象,并从该可迭代对象中产生一个项目(函数名称:`shuffled_flatmap`)。
    |'
- en: '| [`ThreadPoolMapper`](generated/torchdata.datapipes.iter.ThreadPoolMapper.html#torchdata.datapipes.iter.ThreadPoolMapper
    "torchdata.datapipes.iter.ThreadPoolMapper") | Applies a function over each item
    from the source DataPipe concurrently using `ThreadPoolExecutor` (functional
    name: `threadpool_map`). |'
+ id: totrans-100
  prefs: []
  type: TYPE_TB
+ zh: '| [`ThreadPoolMapper`](generated/torchdata.datapipes.iter.ThreadPoolMapper.html#torchdata.datapipes.iter.ThreadPoolMapper
    "torchdata.datapipes.iter.ThreadPoolMapper") | 使用`ThreadPoolExecutor`并发地对源DataPipe中的每个项目应用函数(函数名称:`threadpool_map`)。
    |'
- en: Other DataPipes[](#other-datapipes "Permalink to this heading")
+ id: totrans-101
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 其他DataPipes[](#other-datapipes "跳转到此标题")
- en: A miscellaneous set of DataPipes with different functionalities.
+ id: totrans-102
  prefs: []
  type: TYPE_NORMAL
+ zh: 一组具有不同功能的杂项DataPipes。
- en: '| [`DataFrameMaker`](generated/torchdata.datapipes.iter.DataFrameMaker.html#torchdata.datapipes.iter.DataFrameMaker
    "torchdata.datapipes.iter.DataFrameMaker") | Takes rows of data, batches a number
    of them together and creates TorchArrow DataFrames (functional name: `dataframe`).
    |'
+ id: totrans-103
  prefs: []
  type: TYPE_TB
+ zh: '| [`DataFrameMaker`](generated/torchdata.datapipes.iter.DataFrameMaker.html#torchdata.datapipes.iter.DataFrameMaker
    "torchdata.datapipes.iter.DataFrameMaker") | 获取数据行,将其中一些数据批量处理并创建TorchArrow数据框(函数名称:`dataframe`)。
    |'
- en: '| [`EndOnDiskCacheHolder`](generated/torchdata.datapipes.iter.EndOnDiskCacheHolder.html#torchdata.datapipes.iter.EndOnDiskCacheHolder
    "torchdata.datapipes.iter.EndOnDiskCacheHolder") | Indicates when the result
    of prior DataPipe will be saved to local files specified by `filepath_fn` (functional
    name: `end_caching`). |'
+ id: totrans-104
  prefs: []
  type: TYPE_TB
+ zh: '| [`EndOnDiskCacheHolder`](generated/torchdata.datapipes.iter.EndOnDiskCacheHolder.html#torchdata.datapipes.iter.EndOnDiskCacheHolder
    "torchdata.datapipes.iter.EndOnDiskCacheHolder") | 指示先前DataPipe的结果将保存在由`filepath_fn`指定的本地文件中(函数名称:`end_caching`)。
    |'
- en: '| [`FullSync`](generated/torchdata.datapipes.iter.FullSync.html#torchdata.datapipes.iter.FullSync
    "torchdata.datapipes.iter.FullSync") | Synchronizes data across distributed processes
    to prevent hanging during training, which is caused by uneven sharded data (functional
    name: `fullsync`). 
    |'
+ id: totrans-105
  prefs: []
  type: TYPE_TB
+ zh: '| [`FullSync`](generated/torchdata.datapipes.iter.FullSync.html#torchdata.datapipes.iter.FullSync
    "torchdata.datapipes.iter.FullSync") | 同步分布式进程中的数据,以防止训练过程中出现挂起,这是由不均匀的分片数据引起的(函数名称:`fullsync`)。
    |'
- en: '| [`HashChecker`](generated/torchdata.datapipes.iter.HashChecker.html#torchdata.datapipes.iter.HashChecker
    "torchdata.datapipes.iter.HashChecker") | Computes and checks the hash of each
    file, from an input DataPipe of tuples of file name and data/stream (functional
    name: `check_hash`). |'
+ id: totrans-106
  prefs: []
  type: TYPE_TB
+ zh: '| [`HashChecker`](generated/torchdata.datapipes.iter.HashChecker.html#torchdata.datapipes.iter.HashChecker
    "torchdata.datapipes.iter.HashChecker") | 从文件名和数据/流元组的输入DataPipe中,计算并检查每个文件的哈希值(函数名称:`check_hash`)。
    |'
- en: '| [`InMemoryCacheHolder`](generated/torchdata.datapipes.iter.InMemoryCacheHolder.html#torchdata.datapipes.iter.InMemoryCacheHolder
    "torchdata.datapipes.iter.InMemoryCacheHolder") | Stores elements from the source
    DataPipe in memory, up to a size limit if specified (functional name: `in_memory_cache`).
    |'
+ id: totrans-107
  prefs: []
  type: TYPE_TB
+ zh: '| [`InMemoryCacheHolder`](generated/torchdata.datapipes.iter.InMemoryCacheHolder.html#torchdata.datapipes.iter.InMemoryCacheHolder
    "torchdata.datapipes.iter.InMemoryCacheHolder") | 将来自源DataPipe的元素存储在内存中,如果指定了大小限制,则不超过该限制(功能名称:`in_memory_cache`)。
    |'
- en: '| [`IterableWrapper`](generated/torchdata.datapipes.iter.IterableWrapper.html#torchdata.datapipes.iter.IterableWrapper
    "torchdata.datapipes.iter.IterableWrapper") | Wraps an iterable object to create
    an IterDataPipe. |'
+ id: totrans-108
  prefs: []
  type: TYPE_TB
+ zh: '| [`IterableWrapper`](generated/torchdata.datapipes.iter.IterableWrapper.html#torchdata.datapipes.iter.IterableWrapper
    "torchdata.datapipes.iter.IterableWrapper") | 包装可迭代对象以创建IterDataPipe。 |'
- en: '| [`LengthSetter`](generated/torchdata.datapipes.iter.LengthSetter.html#torchdata.datapipes.iter.LengthSetter
    "torchdata.datapipes.iter.LengthSetter") | Set the length attribute of the DataPipe,
    which is returned by `__len__` (functional name: `set_length`). |'
+ id: totrans-109
  prefs: []
  type: TYPE_TB
+ zh: '| [`LengthSetter`](generated/torchdata.datapipes.iter.LengthSetter.html#torchdata.datapipes.iter.LengthSetter
    "torchdata.datapipes.iter.LengthSetter") | 设置DataPipe的长度属性,该属性由`__len__`返回(功能名称:`set_length`)。
    |'
- en: '| [`MapToIterConverter`](generated/torchdata.datapipes.iter.MapToIterConverter.html#torchdata.datapipes.iter.MapToIterConverter
    "torchdata.datapipes.iter.MapToIterConverter") | Convert a `MapDataPipe` to an
    `IterDataPipe` (functional name: `to_iter_datapipe`). |'
+ id: totrans-110
  prefs: []
  type: TYPE_TB
+ zh: '| [`MapToIterConverter`](generated/torchdata.datapipes.iter.MapToIterConverter.html#torchdata.datapipes.iter.MapToIterConverter
    "torchdata.datapipes.iter.MapToIterConverter") | 将`MapDataPipe`转换为`IterDataPipe`(功能名称:`to_iter_datapipe`)。
    |'
- en: '| [`OnDiskCacheHolder`](generated/torchdata.datapipes.iter.OnDiskCacheHolder.html#torchdata.datapipes.iter.OnDiskCacheHolder
    "torchdata.datapipes.iter.OnDiskCacheHolder") | Caches the outputs of multiple
    DataPipe operations to local files, which are typically performance bottlenecks,
    such as download, decompress, etc. (functional name: `on_disk_cache`). 
|' + id: totrans-111 prefs: [] type: TYPE_TB + zh: '| [`OnDiskCacheHolder`](generated/torchdata.datapipes.iter.OnDiskCacheHolder.html#torchdata.datapipes.iter.OnDiskCacheHolder + "torchdata.datapipes.iter.OnDiskCacheHolder") | 将多个DataPipe操作的输出缓存到本地文件中,这些操作通常是性能瓶颈,如下载、解压等(功能名称:`on_disk_cache`)。 + |' - en: '| [`PinMemory`](generated/torchdata.datapipes.iter.PinMemory.html#torchdata.datapipes.iter.PinMemory "torchdata.datapipes.iter.PinMemory") | Prefetches one element from the source DataPipe and moves it to pinned memory (functional name: `pin_memory`). |' + id: totrans-112 prefs: [] type: TYPE_TB + zh: '| [`PinMemory`](generated/torchdata.datapipes.iter.PinMemory.html#torchdata.datapipes.iter.PinMemory + "torchdata.datapipes.iter.PinMemory") | 预取源DataPipe中的一个元素并将其移动到固定内存中(功能名称:`pin_memory`)。 + |' - en: '| [`Prefetcher`](generated/torchdata.datapipes.iter.Prefetcher.html#torchdata.datapipes.iter.Prefetcher "torchdata.datapipes.iter.Prefetcher") | Prefetches elements from the source DataPipe and puts them into a buffer (functional name: `prefetch`). |' + id: totrans-113 prefs: [] type: TYPE_TB + zh: '| [`Prefetcher`](generated/torchdata.datapipes.iter.Prefetcher.html#torchdata.datapipes.iter.Prefetcher + "torchdata.datapipes.iter.Prefetcher") | 预取来自源DataPipe的元素并将它们放入缓冲区(功能名称:`prefetch`)。 + |' - en: '| [`RandomSplitter`](generated/torchdata.datapipes.iter.RandomSplitter.html#torchdata.datapipes.iter.RandomSplitter "torchdata.datapipes.iter.RandomSplitter") | Randomly split samples from a source DataPipe into groups (functional name: `random_split`). |' + id: totrans-114 prefs: [] type: TYPE_TB + zh: '| [`RandomSplitter`](generated/torchdata.datapipes.iter.RandomSplitter.html#torchdata.datapipes.iter.RandomSplitter + "torchdata.datapipes.iter.RandomSplitter") | 将源DataPipe中的样本随机分成组(功能名称:`random_split`)。 + |' - en: '| [`ShardExpander`](generated/torchdata.datapipes.iter.ShardExpander.html#torchdata.datapipes.iter.ShardExpander "torchdata.datapipes.iter.ShardExpander") | Expands incoming shard strings into shards. |' + id: totrans-115 prefs: [] type: TYPE_TB + zh: '| [`ShardExpander`](generated/torchdata.datapipes.iter.ShardExpander.html#torchdata.datapipes.iter.ShardExpander + "torchdata.datapipes.iter.ShardExpander") | 将传入的分片字符串扩展为分片。 |' - en: '| [`ShardingFilter`](generated/torchdata.datapipes.iter.ShardingFilter.html#torchdata.datapipes.iter.ShardingFilter "torchdata.datapipes.iter.ShardingFilter") | Wrapper that allows DataPipe to be sharded (functional name: `sharding_filter`). |' + id: totrans-116 prefs: [] type: TYPE_TB + zh: '| [`ShardingFilter`](generated/torchdata.datapipes.iter.ShardingFilter.html#torchdata.datapipes.iter.ShardingFilter + "torchdata.datapipes.iter.ShardingFilter") | 允许DataPipe被分片的包装器(功能名称:`sharding_filter`)。 + |' - en: '| [`ShardingRoundRobinDispatcher`](generated/torchdata.datapipes.iter.ShardingRoundRobinDispatcher.html#torchdata.datapipes.iter.ShardingRoundRobinDispatcher "torchdata.datapipes.iter.ShardingRoundRobinDispatcher") | Wrapper that indicates the prior section of `DataPipe` graph is non-replicable and will be iterated in a separate, single dispatching process to distribute data to worker processes in a round-robin manner when multiprocessing is being used. 
    |'
+ id: totrans-117
  prefs: []
  type: TYPE_TB
+ zh: '| [`ShardingRoundRobinDispatcher`](generated/torchdata.datapipes.iter.ShardingRoundRobinDispatcher.html#torchdata.datapipes.iter.ShardingRoundRobinDispatcher
    "torchdata.datapipes.iter.ShardingRoundRobinDispatcher") | 包装器,指示`DataPipe`图的前一部分是不可复制的,并且在使用多处理时将以循环方式将数据分发到工作进程中(功能名称:`sharding_round_robin_dispatcher`)。
    |'
- en: Selecting DataPipes[](#selecting-datapipes "Permalink to this heading")
+ id: totrans-118
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 选择DataPipes[](#selecting-datapipes "Permalink to this heading")
- en: These DataPipes help you select specific samples within a DataPipe.
+ id: totrans-119
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes帮助您在DataPipe中选择特定的样本。
- en: '| [`Filter`](generated/torchdata.datapipes.iter.Filter.html#torchdata.datapipes.iter.Filter
    "torchdata.datapipes.iter.Filter") | Filters out elements from the source datapipe
    according to input `filter_fn` (functional name: `filter`). |'
+ id: totrans-120
  prefs: []
  type: TYPE_TB
+ zh: '| [`Filter`](generated/torchdata.datapipes.iter.Filter.html#torchdata.datapipes.iter.Filter
    "torchdata.datapipes.iter.Filter") | 根据输入的`filter_fn`从源datapipe中过滤出元素(功能名称:`filter`)。
    |'
- en: '| [`Header`](generated/torchdata.datapipes.iter.Header.html#torchdata.datapipes.iter.Header
    "torchdata.datapipes.iter.Header") | Yields elements from the source DataPipe
    from the start, up to the specified limit (functional name: `header`). |'
+ id: totrans-121
  prefs: []
  type: TYPE_TB
+ zh: '| [`Header`](generated/torchdata.datapipes.iter.Header.html#torchdata.datapipes.iter.Header
    "torchdata.datapipes.iter.Header") | 从源DataPipe中产生元素,直到达到指定的限制为止(功能名称:`header`)。
    |'
- en: '| [`Dropper`](generated/torchdata.datapipes.iter.Dropper.html#torchdata.datapipes.iter.Dropper
    "torchdata.datapipes.iter.Dropper") | Drop columns/elements in input DataPipe
    via its indices (functional name: `drop`). |'
+ id: totrans-122
  prefs: []
  type: TYPE_TB
+ zh: '| [`Dropper`](generated/torchdata.datapipes.iter.Dropper.html#torchdata.datapipes.iter.Dropper
    "torchdata.datapipes.iter.Dropper") | 通过其索引在输入DataPipe中删除列/元素(功能名称:`drop`)。 |'
- en: '| [`Slicer`](generated/torchdata.datapipes.iter.Slicer.html#torchdata.datapipes.iter.Slicer
    "torchdata.datapipes.iter.Slicer") | Returns a slice of elements in input DataPipe
    via start/stop/step or indices (functional name: `slice`). |'
+ id: totrans-123
  prefs: []
  type: TYPE_TB
+ zh: '| [`Slicer`](generated/torchdata.datapipes.iter.Slicer.html#torchdata.datapipes.iter.Slicer
    "torchdata.datapipes.iter.Slicer") | 通过起始/停止/步长或索引返回输入DataPipe中元素的切片(功能名称:`slice`)。
    |'
- en: '| [`Flattener`](generated/torchdata.datapipes.iter.Flattener.html#torchdata.datapipes.iter.Flattener
    "torchdata.datapipes.iter.Flattener") | Returns a flattened copy of the input
    DataPipe at the per sample/element level based on provided indices (functional
    name: `flatten`). |'
+ id: totrans-124
  prefs: []
  type: TYPE_TB
+ zh: '| [`Flattener`](generated/torchdata.datapipes.iter.Flattener.html#torchdata.datapipes.iter.Flattener
    "torchdata.datapipes.iter.Flattener") | 根据提供的索引,在每个样本/元素级别返回输入DataPipe的扁平副本(功能名称:`flatten`)。
    |'
- en: Text DataPipes[](#text-datapipes "Permalink to this heading")
+ id: totrans-125
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 文本DataPipes[](#text-datapipes "Permalink to this heading")
- en: These DataPipes help you parse, read, and transform text files and data. 
+ id: totrans-126
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes帮助您解析、读取和转换文本文件和数据。
- en: '| [`CSVDictParser`](generated/torchdata.datapipes.iter.CSVDictParser.html#torchdata.datapipes.iter.CSVDictParser
    "torchdata.datapipes.iter.CSVDictParser") | Accepts a DataPipe consisting of
    tuples of file name and CSV data stream, reads and returns the contents within
    the CSV files one row at a time (functional name: `parse_csv_as_dict`). |'
+ id: totrans-127
  prefs: []
  type: TYPE_TB
+ zh: '| [`CSVDictParser`](generated/torchdata.datapipes.iter.CSVDictParser.html#torchdata.datapipes.iter.CSVDictParser
    "torchdata.datapipes.iter.CSVDictParser") | 接受由文件名和CSV数据流元组组成的DataPipe,逐行读取并返回CSV文件中的内容(功能名称:`parse_csv_as_dict`)。
    |'
- en: '| [`CSVParser`](generated/torchdata.datapipes.iter.CSVParser.html#torchdata.datapipes.iter.CSVParser
    "torchdata.datapipes.iter.CSVParser") | Accepts a DataPipe consisting of tuples
    of file name and CSV data stream, reads and returns the contents within the CSV
    files one row at a time (functional name: `parse_csv`). |'
+ id: totrans-128
  prefs: []
  type: TYPE_TB
+ zh: '| [`CSVParser`](generated/torchdata.datapipes.iter.CSVParser.html#torchdata.datapipes.iter.CSVParser
    "torchdata.datapipes.iter.CSVParser") | 接受由文件名和CSV数据流元组组成的DataPipe,逐行读取并返回CSV文件中的内容(功能名称:`parse_csv`)。
    |'
- en: '| [`JsonParser`](generated/torchdata.datapipes.iter.JsonParser.html#torchdata.datapipes.iter.JsonParser
    "torchdata.datapipes.iter.JsonParser") | Reads from JSON data streams and yields
    a tuple of file name and JSON data (functional name: `parse_json_files`). |'
+ id: totrans-129
  prefs: []
  type: TYPE_TB
+ zh: '| [`JsonParser`](generated/torchdata.datapipes.iter.JsonParser.html#torchdata.datapipes.iter.JsonParser
    "torchdata.datapipes.iter.JsonParser") | 从JSON数据流中读取并产生一个由文件名和JSON数据组成的元组(功能名称:`parse_json_files`)。
    |'
- en: '| [`LineReader`](generated/torchdata.datapipes.iter.LineReader.html#torchdata.datapipes.iter.LineReader
    "torchdata.datapipes.iter.LineReader") | Accepts a DataPipe consisting of tuples
    of file name and string data stream, and for each line in the stream, yields
    a tuple of file name and the line (functional name: `readlines`). |'
+ id: totrans-130
  prefs: []
  type: TYPE_TB
+ zh: '| [`LineReader`](generated/torchdata.datapipes.iter.LineReader.html#torchdata.datapipes.iter.LineReader
    "torchdata.datapipes.iter.LineReader") | 接受由文件名和字符串数据流元组组成的DataPipe,对流中的每一行,产生一个由文件名和该行组成的元组(功能名称:`readlines`)。
    |'
- en: '| [`ParagraphAggregator`](generated/torchdata.datapipes.iter.ParagraphAggregator.html#torchdata.datapipes.iter.ParagraphAggregator
    "torchdata.datapipes.iter.ParagraphAggregator") | Aggregates lines of text from
    the same file into a single paragraph (functional name: `lines_to_paragraphs`).
    |'
+ id: totrans-131
  prefs: []
  type: TYPE_TB
+ zh: '| [`ParagraphAggregator`](generated/torchdata.datapipes.iter.ParagraphAggregator.html#torchdata.datapipes.iter.ParagraphAggregator
    "torchdata.datapipes.iter.ParagraphAggregator") | 将同一文件中的文本行聚合成一个段落(功能名称:`lines_to_paragraphs`)。
    |'
- en: '| [`RoutedDecoder`](generated/torchdata.datapipes.iter.RoutedDecoder.html#torchdata.datapipes.iter.RoutedDecoder
    "torchdata.datapipes.iter.RoutedDecoder") | Decodes binary streams from input
    DataPipe, yields pathname and decoded data in a tuple (functional name: `routed_decode`). 
    |'
+ id: totrans-132
  prefs: []
  type: TYPE_TB
+ zh: '| [`RoutedDecoder`](generated/torchdata.datapipes.iter.RoutedDecoder.html#torchdata.datapipes.iter.RoutedDecoder
    "torchdata.datapipes.iter.RoutedDecoder") | 从输入DataPipe解码二进制流,以元组形式产生路径名和解码数据(功能名称:`routed_decode`)。
    |'
- en: '| [`Rows2Columnar`](generated/torchdata.datapipes.iter.Rows2Columnar.html#torchdata.datapipes.iter.Rows2Columnar
    "torchdata.datapipes.iter.Rows2Columnar") | Accepts an input DataPipe with batches
    of data, processes one batch at a time, and yields a Dict for each batch,
    with `column_names` as keys and lists of corresponding values from each row as
    values (functional name: `rows2columnar`). |'
+ id: totrans-133
  prefs: []
  type: TYPE_TB
+ zh: '| [`Rows2Columnar`](generated/torchdata.datapipes.iter.Rows2Columnar.html#torchdata.datapipes.iter.Rows2Columnar
    "torchdata.datapipes.iter.Rows2Columnar") | 接受一个带有数据批次的输入DataPipe,逐批处理并为每批产生一个字典,其中`column_names`作为键,每行对应值的列表作为值(功能名称:`rows2columnar`)。
    |'
- en: '| [`StreamReader`](generated/torchdata.datapipes.iter.StreamReader.html#torchdata.datapipes.iter.StreamReader
    "torchdata.datapipes.iter.StreamReader") | Given IO streams and their label names,
    yields bytes with label name in a tuple (functional name: `read_from_stream`).
    |'
+ id: totrans-134
  prefs: []
  type: TYPE_TB
+ zh: '| [`StreamReader`](generated/torchdata.datapipes.iter.StreamReader.html#torchdata.datapipes.iter.StreamReader
    "torchdata.datapipes.iter.StreamReader") | 给定IO流及其标签名称,以元组形式产生带有标签名称的字节(功能名称:`read_from_stream`)。
    |'
diff --git a/totrans/data07_03.yaml b/totrans/data07_03.yaml
index 8e9ccbc12267c549dec3b28320ea0b18c015dabb..9a4ca0924ec016533fbc223b3a78ea5bdb9a05e2 100644
--- a/totrans/data07_03.yaml
+++ b/totrans/data07_03.yaml
@@ -1,52 +1,76 @@
- en: Map-style DataPipes
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 映射样式DataPipes
- en: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.map.html](https://pytorch.org/data/beta/torchdata.datapipes.map.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.map.html](https://pytorch.org/data/beta/torchdata.datapipes.map.html)
- en: A Map-style DataPipe is one that implements the `__getitem__()` and `__len__()`
    protocols, and represents a map from (possibly non-integral) indices/keys to
    data samples. This is a close equivalent of `Dataset` from the PyTorch core library.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 映射样式DataPipe是实现`__getitem__()`和`__len__()`协议的DataPipe,表示从(可能是非整数)索引/键到数据样本的映射。这与PyTorch核心库中的`Dataset`是相似的。
- en: For example, when accessed with `mapdatapipe[idx]`, it could read the `idx`-th
    image and its corresponding label from a folder on the disk.
+ id: totrans-3
  prefs: []
  type: TYPE_NORMAL
+ zh: 例如,当使用`mapdatapipe[idx]`访问时,可以从磁盘上的文件夹中读取第`idx`个图像及其对应的标签。
- en: '[PRE0]'
+ id: totrans-4
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: Map-style DataPipe.
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: 映射样式DataPipe。
- en: All datasets that represent a map from keys to data samples should subclass
    this. Subclasses should overwrite `__getitem__()`, supporting fetching a data
    sample for a given, unique key. Subclasses can also optionally overwrite `__len__()`,
    which many `Sampler` implementations and the default options of `DataLoader`
    expect to return the size of the dataset. 
+ id: totrans-6
  prefs: []
  type: TYPE_NORMAL
+ zh: 所有表示从键到数据样本的映射的数据集都应该是这个类的子类。子类应该重写`__getitem__()`,支持为给定的唯一键获取数据样本。子类也可以选择性地重写`__len__()`,许多`Sampler`实现和`DataLoader`的默认选项期望该方法返回数据集的大小。
- en: These DataPipes can be invoked in two ways, using the class constructor or applying
    their functional form onto an existing MapDataPipe (recommended, available to
    most but not all DataPipes).
+ id: totrans-7
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些DataPipes可以通过两种方式调用,一种是使用类构造函数,另一种是将它们的函数形式应用于现有的MapDataPipe(推荐,适用于大多数但不是所有DataPipes)。
- en: Note
+ id: totrans-8
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: '`DataLoader` by default constructs an index sampler that yields integral indices.
    To make it work with a map-style DataPipe with non-integral indices/keys, a custom
    sampler must be provided.'
+ id: totrans-9
  prefs: []
  type: TYPE_NORMAL
+ zh: '`DataLoader`默认构建一个索引采样器,产生整数索引。要使其与具有非整数索引/键的映射样式DataPipe一起工作,必须提供自定义采样器。'
- en: Example
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 示例
- en: '[PRE1]'
+ id: totrans-11
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: By design, there are fewer `MapDataPipe` than `IterDataPipe`, to avoid duplicate
    implementations of the same functionalities. We encourage users
    to use the built-in `IterDataPipe` for various functionalities, and convert it
@@ -54,65 +78,103 @@
    "torchdata.datapipes.map.IterToMapConverter") or `.to_map_datapipe()`. If you
    have any questions about usage or best practices while using `MapDataPipe`, feel
    free to ask on the PyTorch forum under the [‘data’ category](https://discuss.pytorch.org/c/data/37).
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 按设计,`MapDataPipe`比`IterDataPipe`少,以避免重复实现相同的功能。我们鼓励用户使用内置的`IterDataPipe`进行各种功能,并根据需要使用[`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter
    "torchdata.datapipes.map.IterToMapConverter")或`.to_map_datapipe()`将其转换为`MapDataPipe`。如果您在使用`MapDataPipe`时有任何问题或最佳实践,请随时在PyTorch论坛的[‘data’类别](https://discuss.pytorch.org/c/data/37)下提问。
- en: We are open to adding additional `MapDataPipe` where the operations can be lazily
    executed and `__len__` can be known in advance. Feel free to make suggestions
    with a description of your use case in [this Github issue](https://github.com/pytorch/pytorch/issues/57031).
    Feedback about our design choice is also welcomed in that Github issue.
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们愿意添加额外的`MapDataPipe`,其中操作可以延迟执行,并且`__len__`可以提前知道。请在[此Github问题](https://github.com/pytorch/pytorch/issues/57031)中提出您的用例描述的建议。关于我们的设计选择的反馈也欢迎在该Github问题中提出。
- en: 'Here is the list of available Map-style DataPipes:'
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下是可用的映射样式DataPipes列表:
- en: List of MapDataPipes[](#list-of-mapdatapipes "Permalink to this heading")
+ id: totrans-15
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: MapDataPipes列表[](#list-of-mapdatapipes "跳转到此标题")
- en: '| [`Batcher`](generated/torchdata.datapipes.map.Batcher.html#torchdata.datapipes.map.Batcher
    "torchdata.datapipes.map.Batcher") | Create mini-batches of data (functional
    name: `batch`). |'
+ id: totrans-16
  prefs: []
  type: TYPE_TB
+ zh: '| [`Batcher`](generated/torchdata.datapipes.map.Batcher.html#torchdata.datapipes.map.Batcher
    "torchdata.datapipes.map.Batcher") | 创建数据的小批次(函数名称:`batch`)。|'
- en: '| [`Concater`](generated/torchdata.datapipes.map.Concater.html#torchdata.datapipes.map.Concater
    "torchdata.datapipes.map.Concater") | Concatenate multiple Map DataPipes (functional
    name: `concat`). 
    |'
+ id: totrans-17
  prefs: []
  type: TYPE_TB
+ zh: '| [`Concater`](generated/torchdata.datapipes.map.Concater.html#torchdata.datapipes.map.Concater
    "torchdata.datapipes.map.Concater") | 连接多个Map DataPipes(函数名称:`concat`)。|'
- en: '| [`InMemoryCacheHolder`](generated/torchdata.datapipes.map.InMemoryCacheHolder.html#torchdata.datapipes.map.InMemoryCacheHolder
    "torchdata.datapipes.map.InMemoryCacheHolder") | Stores elements from the source
    DataPipe in memory (functional name: `in_memory_cache`). |'
+ id: totrans-18
  prefs: []
  type: TYPE_TB
+ zh: '| [`InMemoryCacheHolder`](generated/torchdata.datapipes.map.InMemoryCacheHolder.html#torchdata.datapipes.map.InMemoryCacheHolder
    "torchdata.datapipes.map.InMemoryCacheHolder") | 将源DataPipe中的元素存储在内存中(函数名称:`in_memory_cache`)。|'
- en: '| [`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter
    "torchdata.datapipes.map.IterToMapConverter") | Lazily load data from `IterDataPipe`
    to construct a `MapDataPipe` with the key-value pair generated by `key_value_fn`
    (functional name: `to_map_datapipe`). |'
+ id: totrans-19
  prefs: []
  type: TYPE_TB
+ zh: '| [`IterToMapConverter`](generated/torchdata.datapipes.map.IterToMapConverter.html#torchdata.datapipes.map.IterToMapConverter
    "torchdata.datapipes.map.IterToMapConverter") | 从`IterDataPipe`中延迟加载数据,以`key_value_fn`生成的键值对构建`MapDataPipe`(函数名称:`to_map_datapipe`)。|'
- en: '| [`Mapper`](generated/torchdata.datapipes.map.Mapper.html#torchdata.datapipes.map.Mapper
    "torchdata.datapipes.map.Mapper") | Apply the input function over each item from
    the source DataPipe (functional name: `map`). |'
+ id: totrans-20
  prefs: []
  type: TYPE_TB
+ zh: '| [`Mapper`](generated/torchdata.datapipes.map.Mapper.html#torchdata.datapipes.map.Mapper
    "torchdata.datapipes.map.Mapper") | 对源DataPipe中的每个项目应用输入函数(函数名称:`map`)。|'
- en: '| [`SequenceWrapper`](generated/torchdata.datapipes.map.SequenceWrapper.html#torchdata.datapipes.map.SequenceWrapper
    "torchdata.datapipes.map.SequenceWrapper") | Wraps a sequence object into a MapDataPipe.
    |'
+ id: totrans-21
  prefs: []
  type: TYPE_TB
+ zh: '| [`SequenceWrapper`](generated/torchdata.datapipes.map.SequenceWrapper.html#torchdata.datapipes.map.SequenceWrapper
    "torchdata.datapipes.map.SequenceWrapper") | 将序列对象包装成MapDataPipe。|'
- en: '| [`Shuffler`](generated/torchdata.datapipes.map.Shuffler.html#torchdata.datapipes.map.Shuffler
    "torchdata.datapipes.map.Shuffler") | Shuffle the input MapDataPipe via its indices
    (functional name: `shuffle`). |'
+ id: totrans-22
  prefs: []
  type: TYPE_TB
+ zh: '| [`Shuffler`](generated/torchdata.datapipes.map.Shuffler.html#torchdata.datapipes.map.Shuffler
    "torchdata.datapipes.map.Shuffler") | 通过其索引对输入的 MapDataPipe 进行洗牌(函数名称:`shuffle`)。
    |'
- en: '| [`UnZipper`](generated/torchdata.datapipes.map.UnZipper.html#torchdata.datapipes.map.UnZipper
    "torchdata.datapipes.map.UnZipper") | Takes in a DataPipe of Sequences, unpacks
    each Sequence, and returns the elements in separate DataPipes based on their
    position in the Sequence (functional name: `unzip`). 
    |'
+ id: totrans-23
  prefs: []
  type: TYPE_TB
+ zh: '| [`UnZipper`](generated/torchdata.datapipes.map.UnZipper.html#torchdata.datapipes.map.UnZipper
    "torchdata.datapipes.map.UnZipper") | 接收一个序列的 DataPipe,解压每个序列,并根据它们在序列中的位置将元素分别返回到不同的
    DataPipes 中(函数名称:`unzip`)。 |'
- en: '| [`Zipper`](generated/torchdata.datapipes.map.Zipper.html#torchdata.datapipes.map.Zipper
    "torchdata.datapipes.map.Zipper") | Aggregates elements into a tuple from each
    of the input DataPipes (functional name: `zip`). |'
+ id: totrans-24
  prefs: []
  type: TYPE_TB
+ zh: '| [`Zipper`](generated/torchdata.datapipes.map.Zipper.html#torchdata.datapipes.map.Zipper
    "torchdata.datapipes.map.Zipper") | 从每个输入的 DataPipe 中聚合元素到一个元组中(函数名称:`zip`)。 |'
diff --git a/totrans/data07_04.yaml b/totrans/data07_04.yaml
index 3056b8098536e5c76aa466294d843639e3bc7a5a..19956704588765878312e6a33f6294917fec3b8e 100644
--- a/totrans/data07_04.yaml
+++ b/totrans/data07_04.yaml
@@ -1,64 +1,100 @@
- en: Utility Functions
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 实用函数
- en: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.utils.html](https://pytorch.org/data/beta/torchdata.datapipes.utils.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
-- en: DataPipe Graph Visualization[](#datapipe-graph-visualization "Permalink to
-  this heading")
+ zh: 原文:[https://pytorch.org/data/beta/torchdata.datapipes.utils.html](https://pytorch.org/data/beta/torchdata.datapipes.utils.html)
+- en: DataPipe Graph Visualization[](#datapipe-graph-visualization "Permalink to this
+  heading")
+ id: totrans-2
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: DataPipe 图形可视化[](#datapipe-graph-visualization "跳转到此标题")
- en: '| [`to_graph`](generated/torchdata.datapipes.utils.to_graph.html#torchdata.datapipes.utils.to_graph
    "torchdata.datapipes.utils.to_graph") | Visualizes a DataPipe by returning a
    [`graphviz.Digraph`](https://graphviz.readthedocs.io/en/stable/api.html#graphviz.Digraph
    "(in graphviz)"), which is a graph of the data pipeline. |'
+ id: totrans-3
  prefs: []
  type: TYPE_TB
+ zh: '| [`to_graph`](generated/torchdata.datapipes.utils.to_graph.html#torchdata.datapipes.utils.to_graph
    "torchdata.datapipes.utils.to_graph") | 通过返回 [`graphviz.Digraph`](https://graphviz.readthedocs.io/en/stable/api.html#graphviz.Digraph
    "(在 graphviz 中)") 来可视化 DataPipe,这是数据管道的图形。 |'
- en: Common Utility Functions[](#common-utility-functions "Permalink to this heading")
+ id: totrans-4
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 常见实用函数[](#common-utility-functions "跳转到此标题")
- en: '| [`janitor`](generated/torchdata.datapipes.utils.janitor.html#torchdata.datapipes.utils.janitor
    "torchdata.datapipes.utils.janitor") | Invokes various obj cleanup procedures,
    such as closing streams. |'
+ id: totrans-5
  prefs: []
  type: TYPE_TB
+ zh: '| [`janitor`](generated/torchdata.datapipes.utils.janitor.html#torchdata.datapipes.utils.janitor
    "torchdata.datapipes.utils.janitor") | 调用各种对象清理程序,例如关闭流。 |'
- en: '| [`pin_memory_fn`](generated/torchdata.datapipes.utils.pin_memory_fn.html#torchdata.datapipes.utils.pin_memory_fn
    "torchdata.datapipes.utils.pin_memory_fn") | Utility function to move data to
    pinned memory. 
    |'
+ id: totrans-6
  prefs: []
  type: TYPE_TB
+ zh: '| [`pin_memory_fn`](generated/torchdata.datapipes.utils.pin_memory_fn.html#torchdata.datapipes.utils.pin_memory_fn
    "torchdata.datapipes.utils.pin_memory_fn") | 将数据移动到固定内存的实用函数。 |'
- en: File Object and Stream Utility[](#file-object-and-stream-utility "Permalink
    to this heading")
+ id: totrans-7
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 文件对象和流实用程序[](#file-object-and-stream-utility "跳转到此标题")
- en: '| [`StreamWrapper`](generated/torchdata.datapipes.utils.StreamWrapper.html#torchdata.datapipes.utils.StreamWrapper
    "torchdata.datapipes.utils.StreamWrapper") | StreamWrapper is introduced to wrap
    file handlers generated by DataPipe operations like FileOpener. |'
+ id: totrans-8
  prefs: []
  type: TYPE_TB
+ zh: '| [`StreamWrapper`](generated/torchdata.datapipes.utils.StreamWrapper.html#torchdata.datapipes.utils.StreamWrapper
    "torchdata.datapipes.utils.StreamWrapper") | StreamWrapper 用于包装由 DataPipe 操作(如
    FileOpener)生成的文件处理程序。 |'
- en: DataLoader[](#dataloader "Permalink to this heading")
+ id: totrans-9
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 数据加载器[](#dataloader "跳转到此标题")
- en: For documentation related to DataLoader, please refer to the `torch.utils.data`
    [documentation](https://pytorch.org/docs/stable/data.html). Or, more specifically,
    the [DataLoader API section](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader).
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 有关 DataLoader 的文档,请参考 `torch.utils.data` 的 [文档](https://pytorch.org/docs/stable/data.html)。或者更具体地,[DataLoader
    API 部分](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)。
- en: DataLoader v2 is currently in development. For more information, please refer
    to [DataLoader2](dataloader2.html).
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: DataLoader v2 目前正在开发中。更多信息请参考 [DataLoader2](dataloader2.html)。
- en: Sampler[](#sampler "Permalink to this heading")
+ id: totrans-12
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 采样器[](#sampler "跳转到此标题")
- en: For documentation related to Sampler, please refer to the `torch.utils.data`
    [documentation on Data Loading order](https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler).
    The Sampler API section is [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler).
+ id: totrans-13
  prefs: []
  type: TYPE_NORMAL
+ zh: 有关采样器的文档,请参考 `torch.utils.data` 的 [数据加载顺序文档](https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler)。采样器
    API 部分在[这里](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler)。
diff --git a/totrans/data07_05.yaml b/totrans/data07_05.yaml
index b8985ef3997d15bf82a8a030c8298caf18df3180..3bf2e978520ff2ba8c2550e8d100ecc96cbb657f 100644
--- a/totrans/data07_05.yaml
+++ b/totrans/data07_05.yaml
@@ -1,261 +1,411 @@
- en: DataLoader2
+ id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: DataLoader2
- en: 原文:[https://pytorch.org/data/beta/dataloader2.html](https://pytorch.org/data/beta/dataloader2.html)
+ id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/data/beta/dataloader2.html](https://pytorch.org/data/beta/dataloader2.html)
- en: A new, light-weight [`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2")
    is introduced to decouple the overloaded data-manipulation functionalities from
    `torch.utils.data.DataLoader` to `DataPipe` operations. 
    Besides, certain features can only be achieved with [`DataLoader2`](#torchdata.dataloader2.DataLoader2
    "torchdata.dataloader2.DataLoader2"), such as snapshotting and switching backend
    services to perform high-performance operations.
+ id: totrans-2
  prefs: []
  type: TYPE_NORMAL
+ zh: 引入了一个新的轻量级 [`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2"),以将过载的数据操作功能与
    `torch.utils.data.DataLoader` 分离,转移到 `DataPipe` 操作。此外,某些功能只能通过 [`DataLoader2`](#torchdata.dataloader2.DataLoader2
    "torchdata.dataloader2.DataLoader2") 实现,如快照和切换后端服务以执行高性能操作。
- en: DataLoader2[](#id1 "Permalink to this heading")
+ id: totrans-3
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: DataLoader2[](#id1 "跳转到此标题")
- en: '[PRE0]'
+ id: totrans-4
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '`DataLoader2` is used to optimize and execute the given `DataPipe` graph based
    on `ReadingService` and `Adapter` functions, with support for'
+ id: totrans-5
  prefs: []
  type: TYPE_NORMAL
+ zh: '`DataLoader2` 用于优化和执行给定的 `DataPipe` 图,基于 `ReadingService` 和 `Adapter` 函数,支持'
- en: Dynamic sharding for multiprocess and distributed data loading
+ id: totrans-6
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 动态分片用于多进程和分布式数据加载
- en: Multiple backend `ReadingServices`
+ id: totrans-7
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 多个后端 `ReadingServices`
- en: '`DataPipe` graph in-place modification like shuffle control, memory pinning,
    etc.'
+ id: totrans-8
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`DataPipe` 图的原地修改,如洗牌控制、内存固定等。'
- en: Snapshotting the state of the data-preprocessing pipeline (WIP)
+ id: totrans-9
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 快照数据预处理流水线的状态(WIP)
- en: 'Parameters:'
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**datapipe** (`IterDataPipe` or `MapDataPipe`) – `DataPipe` from which to
    load the data. A deepcopy of this datapipe will be made during initialization,
    allowing the input to be re-used in a different `DataLoader2` without sharing
    states. Input `None` can only be used if `load_state_dict` is called right after
    the creation of the DataLoader.'
+ id: totrans-11
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**datapipe** (`IterDataPipe` 或 `MapDataPipe`) – 用于加载数据的 `DataPipe`。在初始化期间将对此
    datapipe 进行深拷贝,允许在不共享状态的情况下在不同的 `DataLoader2` 中重复使用输入。只有在创建 DataLoader 后立即调用 `load_state_dict`
    才能使用输入 `None`。'
- en: '**datapipe_adapter_fn** (`Iterable[Adapter]` or `Adapter`, optional) – `Adapter`
    function(s) that will be applied to the DataPipe (default: `None`).'
+ id: totrans-12
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**datapipe_adapter_fn** (`Iterable[Adapter]` 或 `Adapter`,可选) – 将应用于 DataPipe
    的 `Adapter` 函数(默认值:`None`)。'
- en: '**reading_service** ([*ReadingServiceInterface*](reading_service.html#torchdata.dataloader2.ReadingServiceInterface
    "torchdata.dataloader2.ReadingServiceInterface")*,* *optional*) – defines how
    `DataLoader2` should execute operations over the `DataPipe`, e.g. multiprocessing/distributed
    (default: `None`). A deepcopy of this will be created during initialization,
    allowing the ReadingService to be re-used in a different `DataLoader2` without
    sharing states.' 
+ id: totrans-13
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**reading_service** ([*ReadingServiceInterface*](reading_service.html#torchdata.dataloader2.ReadingServiceInterface
+   "torchdata.dataloader2.ReadingServiceInterface")*,* *可选*) – 定义 `DataLoader2` 应如何在
+   `DataPipe` 上执行操作,例如多进程/分布式(默认值:`None`)。在初始化期间将对此进行深拷贝,允许在不共享状态的情况下在不同的 `DataLoader2`
+   中重复使用 ReadingService。'
- en: Note
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意
- en: When a `MapDataPipe` is passed into `DataLoader2`, in order to iterate through
    the data, `DataLoader2` will attempt to create an iterator via `iter(datapipe)`.
    If the object has non-zero-indexed indices, this may fail. Consider using `.shuffle()`
    (which converts `MapDataPipe` to `IterDataPipe`) or `datapipe.to_iter_datapipe(custom_indices)`.
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
+ zh: 当将 `MapDataPipe` 传递给 `DataLoader2` 时,为了遍历数据,`DataLoader2` 将尝试通过 `iter(datapipe)`
+   创建迭代器。如果对象具有非零索引的索引,这可能会失败。考虑使用 `.shuffle()`(将 `MapDataPipe` 转换为 `IterDataPipe`)或
+   `datapipe.to_iter_datapipe(custom_indices)`。
- en: '[PRE1]'
+ id: totrans-16
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: Return a singleton iterator from the `DataPipe` graph adapted by `ReadingService`.
    `DataPipe` will be restored if the serialized state is provided to construct
    `DataLoader2`. `initialize_iteration` and `finalize_iterator` will be invoked
    at the beginning and end of the iteration, respectively.
+ id: totrans-17
  prefs: []
  type: TYPE_NORMAL
+ zh: 从由 `ReadingService` 调整的 `DataPipe` 图返回一个单例迭代器。如果提供了序列化状态以构建 `DataLoader2`,则将恢复
+   `DataPipe`。并且,将在迭代开始和结束时分别调用 `initialize_iteration` 和 `finalize_iterator`。
- en: '[PRE2]'
+ id: totrans-18
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: Create new `DataLoader2` with `DataPipe` graph and `ReadingService` restored
    from the serialized state.
+ id: totrans-19
  prefs: []
  type: TYPE_NORMAL
+ zh: 创建新的 `DataLoader2`,其中包含从序列化状态恢复的 `DataPipe` 图和 `ReadingService`。
- en: '[PRE3]'
+ id: totrans-20
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: For the existing `DataLoader2`, load serialized state to restore `DataPipe`
    graph and reset the internal state of `ReadingService`.
+ id: totrans-21
  prefs: []
  type: TYPE_NORMAL
+ zh: 对于现有的 `DataLoader2`,加载序列化状态以恢复 `DataPipe` 图并重置 `ReadingService` 的内部状态。
- en: '[PRE4]'
+ id: totrans-22
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: Set random seed for DataLoader2 to control determinism.
+ id: totrans-23
  prefs: []
  type: TYPE_NORMAL
+ zh: 设置 DataLoader2 的随机种子以控制确定性。
- en: 'Parameters:'
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**seed** – Random uint64 seed'
+ id: totrans-25
  prefs: []
  type: TYPE_NORMAL
+ zh: '**seed** – 随机的 uint64 种子'
- en: '[PRE5]'
+ id: totrans-26
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: Shuts down `ReadingService` and cleans up the iterator.
+ id: totrans-27
  prefs: []
  type: TYPE_NORMAL
+ zh: 关闭 `ReadingService` 并清理迭代器。
- en: '[PRE6]'
+ id: totrans-28
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: 'Return a dictionary to represent the state of data-processing pipeline with
    keys:'
+ id: totrans-29
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回一个表示数据处理流水线状态的字典,其中包含键:
- en: '`serialized_datapipe`: Serialized `DataPipe` before `ReadingService` adaptation.'
+ id: totrans-30
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`serialized_datapipe`:`ReadingService` 适配之前序列化的 `DataPipe`。'
- en: '`reading_service_state`: The state of `ReadingService` and adapted `DataPipe`.'
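To make the two state keys above concrete, here is a minimal sketch of a snapshot round-trip. Since snapshotting is explicitly marked WIP, whether a given `ReadingService` actually supports `state_dict`/`from_state` is an assumption here, and the toy pipeline is invented for illustration.

```python
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

if __name__ == "__main__":
    # Toy graph; DataLoader2 holds a deepcopy of it internally.
    dp = IterableWrapper(range(100)).shuffle().sharding_filter()
    dl = DataLoader2(dp, reading_service=MultiProcessingReadingService(num_workers=2))

    it = iter(dl)
    _ = [next(it) for _ in range(10)]  # consume part of an epoch

    # Dictionary with "serialized_datapipe" and "reading_service_state" keys.
    # Assumes the ReadingService in use is checkpoint-capable (WIP feature).
    state = dl.state_dict()
    dl.shutdown()

    # Rebuild a fresh DataLoader2 from the serialized state.
    restored = DataLoader2.from_state(state, MultiProcessingReadingService(num_workers=2))
```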
+ id: totrans-31
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`reading_service_state`:`ReadingService` 的状态和适配的 `DataPipe`。'
- en: 'Note: [`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2")
    doesn’t support `torch.utils.data.Dataset` or `torch.utils.data.IterableDataset`.
    Please wrap each of them with the corresponding `DataPipe` below:'
+ id: totrans-32
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意:[`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2")
+   不支持 `torch.utils.data.Dataset` 或 `torch.utils.data.IterableDataset`。请使用下面对应的 `DataPipe`
+   包装每一个:
- en: '[`torchdata.datapipes.map.SequenceWrapper`](generated/torchdata.datapipes.map.SequenceWrapper.html#torchdata.datapipes.map.SequenceWrapper
    "torchdata.datapipes.map.SequenceWrapper"): `torch.utils.data.Dataset`'
+ id: totrans-33
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[`torchdata.datapipes.map.SequenceWrapper`](generated/torchdata.datapipes.map.SequenceWrapper.html#torchdata.datapipes.map.SequenceWrapper
+   "torchdata.datapipes.map.SequenceWrapper"):`torch.utils.data.Dataset`'
- en: '[`torchdata.datapipes.iter.IterableWrapper`](generated/torchdata.datapipes.iter.IterableWrapper.html#torchdata.datapipes.iter.IterableWrapper
    "torchdata.datapipes.iter.IterableWrapper"): `torch.utils.data.IterableDataset`'
+ id: totrans-34
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '[`torchdata.datapipes.iter.IterableWrapper`](generated/torchdata.datapipes.iter.IterableWrapper.html#torchdata.datapipes.iter.IterableWrapper
+   "torchdata.datapipes.iter.IterableWrapper"):`torch.utils.data.IterableDataset`'
- en: ReadingService[](#readingservice "Permalink to this heading")
+ id: totrans-35
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: ReadingService[](#readingservice "跳转到此标题")
- en: '`ReadingService` specifies the execution backend for the data-processing graph.
    There are several types of `ReadingServices` provided in TorchData:'
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService`指定数据处理图的执行后端。TorchData提供了多种类型的`ReadingServices`:'
- en: '| [`DistributedReadingService`](generated/torchdata.dataloader2.DistributedReadingService.html#torchdata.dataloader2.DistributedReadingService
    "torchdata.dataloader2.DistributedReadingService") | `DistributedReadingService`
    handles distributed sharding on the graph of `DataPipe` and guarantees randomness
    by sharing the same seed across the distributed processes. |'
+ id: totrans-37
  prefs: []
  type: TYPE_TB
+ zh: '| [`DistributedReadingService`](generated/torchdata.dataloader2.DistributedReadingService.html#torchdata.dataloader2.DistributedReadingService
+   "torchdata.dataloader2.DistributedReadingService") | `DistributedReadingService`处理`DataPipe`图上的分布式分片,并通过在分布式进程之间共享相同的种子来保证随机性。
+   |'
- en: '| [`InProcessReadingService`](generated/torchdata.dataloader2.InProcessReadingService.html#torchdata.dataloader2.InProcessReadingService
    "torchdata.dataloader2.InProcessReadingService") | Default ReadingService to
    serve the `DataPipe` graph in the main process, and apply graph settings like
    determinism control to the graph.
|'
+ id: totrans-38
  prefs: []
  type: TYPE_TB
+ zh: '| [`InProcessReadingService`](generated/torchdata.dataloader2.InProcessReadingService.html#torchdata.dataloader2.InProcessReadingService
+   "torchdata.dataloader2.InProcessReadingService") | 默认的ReadingService,用于在主进程中为`DataPipe`图提供服务,并应用图设置,如确定性控制。
+   |'
- en: '| [`MultiProcessingReadingService`](generated/torchdata.dataloader2.MultiProcessingReadingService.html#torchdata.dataloader2.MultiProcessingReadingService
    "torchdata.dataloader2.MultiProcessingReadingService") | Spawns multiple worker
    processes to load data from the `DataPipe` graph. |'
+ id: totrans-39
  prefs: []
  type: TYPE_TB
+ zh: '| [`MultiProcessingReadingService`](generated/torchdata.dataloader2.MultiProcessingReadingService.html#torchdata.dataloader2.MultiProcessingReadingService
+   "torchdata.dataloader2.MultiProcessingReadingService") | 生成多个工作进程来从`DataPipe`图中加载数据。
+   |'
- en: '| [`SequentialReadingService`](generated/torchdata.dataloader2.SequentialReadingService.html#torchdata.dataloader2.SequentialReadingService
    "torchdata.dataloader2.SequentialReadingService") | |'
+ id: totrans-40
  prefs: []
  type: TYPE_TB
+ zh: '| [`SequentialReadingService`](generated/torchdata.dataloader2.SequentialReadingService.html#torchdata.dataloader2.SequentialReadingService
+   "torchdata.dataloader2.SequentialReadingService") | |'
- en: Each `ReadingService` takes the `DataPipe` graph and rewrites it to achieve
    features such as dynamic sharding, shared random seeds, and snapshotting for
    multi-/distributed processes. For more detail about those features, please refer
    to [the documentation](reading_service.html).
+ id: totrans-41
  prefs: []
  type: TYPE_NORMAL
+ zh: 每个`ReadingService`都会接收`DataPipe`图并重写它,以实现一些功能,如动态分片、共享随机种子和多/分布式进程的快照。有关这些功能的更多详细信息,请参阅[文档](reading_service.html)。
- en: Adapter[](#adapter "Permalink to this heading")
+ id: totrans-42
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 适配器[](#adapter "跳转到此标题")
- en: '`Adapter` is used to configure, modify and extend the `DataPipe` graph in
    [`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2").
    It allows in-place modification of, or replacement of, the pre-assembled `DataPipe`
    graph provided by PyTorch domains. For example, `Shuffle(False)` can be provided
    to [`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2"),
    which would disable any `shuffle` operations in the `DataPipes` graph.'
+ id: totrans-43
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Adapter`用于配置、修改和扩展[`DataLoader2`](#torchdata.dataloader2.DataLoader2 "torchdata.dataloader2.DataLoader2")中的`DataPipe`图。它允许就地修改或替换由PyTorch领域提供的预组装的`DataPipe`图。例如,可以向[`DataLoader2`](#torchdata.dataloader2.DataLoader2
+   "torchdata.dataloader2.DataLoader2")提供`Shuffle(False)`,这将禁用`DataPipes`图中的任何`shuffle`操作。'
- en: '[PRE7]'
+ id: totrans-44
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: Adapter base class that follows the Python Callable protocol.
+ id: totrans-45
  prefs: []
  type: TYPE_NORMAL
+ zh: 遵循Python Callable协议的适配器基类。
- en: '[PRE8]'
+ id: totrans-46
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: Callable function that either runs in-place modification of the `DataPipe`
    graph, or returns a new `DataPipe` graph.
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 可调用函数,可以就地修改`DataPipe`图,也可以返回一个新的`DataPipe`图。
- en: 'Parameters:'
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**datapipe** – `DataPipe` that needs to be adapted.'
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: '**datapipe** - 需要适配的`DataPipe`。'
- en: 'Returns:'
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回:
- en: Adapted `DataPipe` or new `DataPipe`.
+ id: totrans-51
  prefs: []
  type: TYPE_NORMAL
+ zh: 适配的`DataPipe`或新的`DataPipe`。
- en: 'Here is the list of [`Adapter`](#torchdata.dataloader2.adapter.Adapter "torchdata.dataloader2.adapter.Adapter")
    classes provided by TorchData in `torchdata.dataloader2.adapter`:'
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下是TorchData在`torchdata.dataloader2.adapter`中提供的[`Adapter`](#torchdata.dataloader2.adapter.Adapter
+   "torchdata.dataloader2.adapter.Adapter")列表:
- en: '| [`Shuffle`](generated/torchdata.dataloader2.adapter.Shuffle.html#torchdata.dataloader2.adapter.Shuffle
    "torchdata.dataloader2.adapter.Shuffle") | Shuffle DataPipes adapter allows control
    over all existing Shuffler (`shuffle`) DataPipes in the graph. |'
+ id: totrans-53
  prefs: []
  type: TYPE_TB
+ zh: '| [`Shuffle`](generated/torchdata.dataloader2.adapter.Shuffle.html#torchdata.dataloader2.adapter.Shuffle
+   "torchdata.dataloader2.adapter.Shuffle") | Shuffle DataPipes适配器允许控制图中所有现有的Shuffler(`shuffle`)DataPipes。
+   |'
- en: '| [`CacheTimeout`](generated/torchdata.dataloader2.adapter.CacheTimeout.html#torchdata.dataloader2.adapter.CacheTimeout
    "torchdata.dataloader2.adapter.CacheTimeout") | CacheTimeout DataPipes adapter
    allows control over timeouts of all existing EndOnDiskCacheHolder (`end_caching`)
    in the graph. |'
+ id: totrans-54
  prefs: []
  type: TYPE_TB
+ zh: '| [`CacheTimeout`](generated/torchdata.dataloader2.adapter.CacheTimeout.html#torchdata.dataloader2.adapter.CacheTimeout
+   "torchdata.dataloader2.adapter.CacheTimeout") | CacheTimeout DataPipes适配器允许控制图中所有现有的EndOnDiskCacheHolder(`end_caching`)的超时时间。
+   |'
- en: 'And, we will provide more `Adapters` to cover data-processing options:'
+ id: totrans-55
  prefs: []
  type: TYPE_NORMAL
+ zh: 此外,我们将提供更多的`Adapters`来覆盖数据处理选项:
- en: '`PinMemory`: Attach a `DataPipe` at the end of the data-processing graph that
    converts output data to `torch.Tensor` in pinned memory.'
+ id: totrans-56
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`PinMemory`:在数据处理图的末尾附加一个`DataPipe`,将输出数据转换为固定内存中的`torch.Tensor`。'
- en: '`FullSync`: Attach a `DataPipe` to make sure the data-processing graph is
    synchronized between distributed processes to prevent hanging.'
+ id: totrans-57
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`FullSync`:附加一个`DataPipe`,以确保分布式进程之间的数据处理图同步,以防止挂起。'
- en: '`ShardingPolicy`: Modify sharding policy if `sharding_filter` is present in
    the `DataPipe` graph.'
+ id: totrans-58
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`ShardingPolicy`:如果`DataPipe`图中存在`sharding_filter`,则修改分片策略。'
- en: '`PrefetchPolicy`, `InvalidateCache`, etc.'
+ id: totrans-59
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '`PrefetchPolicy`,`InvalidateCache`等。'
- en: If you have feature requests about the `Adapters` you’d like to see provided,
    please open a GitHub issue. For specific needs, `DataLoader2` also accepts any
    custom `Adapter` as long as it inherits from the `Adapter` class.
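Before moving on, a short sketch of the Adapter mechanics described above may help; the pipeline is invented, and only names already introduced in this section (`Shuffle`, `DataLoader2`, `MultiProcessingReadingService`) are used:

```python
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService
from torchdata.dataloader2.adapter import Shuffle

if __name__ == "__main__":
    # A pre-assembled graph that contains a Shuffler (`shuffle`).
    dp = IterableWrapper(range(10)).shuffle().sharding_filter()

    # Shuffle(False) rewrites the graph so every Shuffler is disabled,
    # e.g. for a validation pass, without rebuilding the pipeline by hand.
    dl = DataLoader2(
        datapipe=dp,
        datapipe_adapter_fn=Shuffle(False),
        reading_service=MultiProcessingReadingService(num_workers=2),
    )
    print(list(dl))  # all of 0..9, unshuffled despite .shuffle() in the graph
    dl.shutdown()
```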
+ id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 如果您对希望提供的`Adapters`有功能请求,请提交一个GitHub问题。对于特定需求,`DataLoader2`还接受任何自定义`Adapter`,只要它继承自`Adapter`类。 diff --git a/totrans/data07_06.yaml b/totrans/data07_06.yaml index 387a478495bc5fe529f00cb4d3421d88a7805a7a..30727cb8b8c5f6888f1453a49cb3521aa08af6e7 100644 --- a/totrans/data07_06.yaml +++ b/totrans/data07_06.yaml @@ -1,410 +1,599 @@ - en: ReadingService + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: ReadingService - en: 原文:[https://pytorch.org/data/beta/reading_service.html](https://pytorch.org/data/beta/reading_service.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: '[https://pytorch.org/data/beta/reading_service.html](https://pytorch.org/data/beta/reading_service.html)' - en: '`ReadingService` handles in-place modification of `DataPipe` graph based on different use cases.' + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: '`ReadingService`处理基于不同用例的`DataPipe`图的原地修改。' - en: Features[](#features "Permalink to this heading") + id: totrans-3 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 特性[](#features "跳转到此标题") - en: Dynamic Sharding[](#dynamic-sharding "Permalink to this heading") + id: totrans-4 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 动态分片[](#dynamic-sharding "跳转到此标题") - en: Dynamic sharding is achieved by `MultiProcessingReadingService` and `DistributedReadingService` to shard the pipeline based on the information of corresponding multiprocessing and distributed workers. And, TorchData offers two types of `DataPipe` letting users define the sharding place within the pipeline. + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 动态分片是通过`MultiProcessingReadingService`和`DistributedReadingService`实现的,根据相应的多进程和分布式工作者的信息对管道进行分片。TorchData提供了两种类型的`DataPipe`,让用户在管道内定义分片位置。 - en: '`sharding_filter` ([`ShardingFilter`](generated/torchdata.datapipes.iter.ShardingFilter.html#torchdata.datapipes.iter.ShardingFilter "torchdata.datapipes.iter.ShardingFilter")): When the pipeline is replicable, each distributed/multiprocessing worker loads data from its own replica of the `DataPipe` graph, while skipping samples that do not belong to the corresponding worker at the point where `sharding_filter` is placed.' + id: totrans-6 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`sharding_filter`([`ShardingFilter`](generated/torchdata.datapipes.iter.ShardingFilter.html#torchdata.datapipes.iter.ShardingFilter + "torchdata.datapipes.iter.ShardingFilter")): 当管道可复制时,每个分布/多进程工作者从其自己的`DataPipe`图的副本加载数据,同时跳过不属于相应工作者的样本,即在放置`sharding_filter`的点。' - en: '`sharding_round_robin_dispatch` ([`ShardingRoundRobinDispatcher`](generated/torchdata.datapipes.iter.ShardingRoundRobinDispatcher.html#torchdata.datapipes.iter.ShardingRoundRobinDispatcher "torchdata.datapipes.iter.ShardingRoundRobinDispatcher")): When there is any `sharding_round_robin_dispatch` `DataPipe` in the pipeline, that branch (i.e. all DataPipes prior to `sharding_round_robin_dispatch`) will be treated as a non-replicable branch (in the context of multiprocessing). A single dispatching process will be created to load data from the non-replicable branch and distribute data to the subsequent worker processes.' 
+ id: totrans-7 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`sharding_round_robin_dispatch`([`ShardingRoundRobinDispatcher`](generated/torchdata.datapipes.iter.ShardingRoundRobinDispatcher.html#torchdata.datapipes.iter.ShardingRoundRobinDispatcher + "torchdata.datapipes.iter.ShardingRoundRobinDispatcher")): 当管道中存在任何`sharding_round_robin_dispatch` + `DataPipe`时,该分支(即所有在`sharding_round_robin_dispatch`之前的DataPipes)将被视为不可复制的分支(在多进程的上下文中)。将创建一个单一的调度过程,从不可复制的分支加载数据并将数据分发给后续的工作进程。' - en: The following is an example of having two types of sharding strategies in the pipeline. + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: 以下是在管道中使用两种分片策略的示例。 - en: '![digraph Example {' + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: '![有向图示例{' - en: subgraph cluster_replicable { + id: totrans-10 prefs: - PREF_IND type: TYPE_NORMAL + zh: 子图cluster_replicable { - en: label="Replicable" + id: totrans-11 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 标签="可复制" - en: a -> b -> c -> d -> l; + id: totrans-12 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: a -> b -> c -> d -> l; - en: color=blue; + id: totrans-13 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 颜色=蓝色; - en: '}' + id: totrans-14 prefs: - PREF_IND type: TYPE_NORMAL + zh: '}' - en: subgraph cluster_non_replicable { + id: totrans-15 prefs: - PREF_IND type: TYPE_NORMAL + zh: 子图cluster_non_replicable { - en: style=filled; + id: totrans-16 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 样式=填充; - en: color=lightgrey; + id: totrans-17 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 颜色=浅灰色; - en: node [style=filled,color=white]; + id: totrans-18 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 节点[样式=填充,颜色=白色]; - en: label="Non-Replicable" + id: totrans-19 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 标签="不可复制" - en: e -> f -> g -> k; + id: totrans-20 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: e -> f -> g -> k; - en: h -> i -> j -> k; + id: totrans-21 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: h -> i -> j -> k; - en: '}' + id: totrans-22 prefs: - PREF_IND type: TYPE_NORMAL + zh: '}' - en: k -> l -> fullsync -> end; + id: totrans-23 prefs: - PREF_IND type: TYPE_NORMAL + zh: k -> l -> fullsync -> 结束; - en: a [label="DP1"]; + id: totrans-24 prefs: - PREF_IND type: TYPE_NORMAL + zh: a [标签="DP1"]; - en: b [label="shuffle"]; + id: totrans-25 prefs: - PREF_IND type: TYPE_NORMAL + zh: b [标签="洗牌"]; - en: c [label="sharding_filter", color=blue]; + id: totrans-26 prefs: - PREF_IND type: TYPE_NORMAL + zh: c [标签="分片过滤器", 颜色=蓝色]; - en: d [label="DP4"]; + id: totrans-27 prefs: - PREF_IND type: TYPE_NORMAL + zh: d [标签="DP4"]; - en: e [label="DP2"]; + id: totrans-28 prefs: - PREF_IND type: TYPE_NORMAL + zh: e [标签="DP2"]; - en: f [label="shuffle"]; + id: totrans-29 prefs: - PREF_IND type: TYPE_NORMAL + zh: f [标签="洗牌"]; - en: g [label="sharding_round_robin_dispatch", style="filled,rounded", color=red, fillcolor=white]; + id: totrans-30 prefs: - PREF_IND type: TYPE_NORMAL + zh: g [标签="分片轮询调度", 样式="填充,圆角", 颜色=红色, 填充颜色=白色]; - en: h [label="DP3"]; + id: totrans-31 prefs: - PREF_IND type: TYPE_NORMAL + zh: h [标签="DP3"]; - en: i [label="shuffle"]; + id: totrans-32 prefs: - PREF_IND type: TYPE_NORMAL + zh: i [标签="洗牌"]; - en: j [label="sharding_round_robin_dispatch", style="filled,rounded", color=red, fillcolor=white]; + id: totrans-33 prefs: - PREF_IND type: TYPE_NORMAL + zh: j [标签="分片轮询调度", 样式="填充,圆角", 颜色=红色, 填充颜色=白色]; - en: k [label="DP5 (Lowest common ancestor)"]; + id: totrans-34 prefs: - PREF_IND type: TYPE_NORMAL + zh: k 
[标签="DP5(最低公共祖先)"]; - en: l [label="DP6"]; + id: totrans-35 prefs: - PREF_IND type: TYPE_NORMAL + zh: l [标签="DP6"]; - en: fullsync; + id: totrans-36 prefs: - PREF_IND type: TYPE_NORMAL + zh: fullsync; - en: end [shape=box]; + id: totrans-37 prefs: - PREF_IND type: TYPE_NORMAL + zh: 结束[形状=方框]; - en: '}](../Images/ded90db7e9b275c0ce72673dbdb87c9c.png)' + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: '}](../Images/ded90db7e9b275c0ce72673dbdb87c9c.png)' - en: 'When multiprocessing takes place, the graph becomes:' + id: totrans-39 prefs: [] type: TYPE_NORMAL + zh: '当多进程发生时,图变为:' - en: '![digraph Example {' + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: '![有向图示例{' - en: subgraph cluster_worker_0 { + id: totrans-41 prefs: - PREF_IND type: TYPE_NORMAL + zh: 子图cluster_worker_0 { - en: label="Worker 0" + id: totrans-42 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 标签="工作者0" - en: a0 -> b0 -> c0 -> d0 -> l0; + id: totrans-43 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: a0 -> b0 -> c0 -> d0 -> l0; - en: m0 -> l0; + id: totrans-44 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: m0 -> l0; - en: color=blue; + id: totrans-45 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 颜色=蓝色; - en: '}' + id: totrans-46 prefs: - PREF_IND type: TYPE_NORMAL + zh: '}' - en: subgraph cluster_worker_1 { + id: totrans-47 prefs: - PREF_IND type: TYPE_NORMAL + zh: 子图cluster_worker_1 { - en: label="Worker 1" + id: totrans-48 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 标签="工作者1" - en: a1 -> b1 -> c1 -> d1 -> l1; + id: totrans-49 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: a1 -> b1 -> c1 -> d1 -> l1; - en: m1 -> l1; + id: totrans-50 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: m1 -> l1; - en: color=blue; + id: totrans-51 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 颜色=蓝色; - en: '}' + id: totrans-52 prefs: - PREF_IND type: TYPE_NORMAL + zh: '}' - en: subgraph cluster_non_replicable { + id: totrans-53 prefs: - PREF_IND type: TYPE_NORMAL + zh: 子图cluster_non_replicable { - en: style=filled; + id: totrans-54 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 样式=填充; - en: color=lightgrey; + id: totrans-55 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 颜色=浅灰色; - en: node [style=filled,color=white]; + id: totrans-56 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 节点[样式=填充,颜色=白色]; - en: label="Non-Replicable" + id: totrans-57 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: 标签="不可复制" - en: e -> f -> g -> k; + id: totrans-58 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: e -> f -> g -> k; - en: h -> i -> j -> k; + id: totrans-59 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: h -> i -> j -> k; - en: k -> round_robin_demux; + id: totrans-60 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL + zh: k -> 轮询解复用; - en: '}' + id: totrans-61 prefs: - PREF_IND type: TYPE_NORMAL + zh: '}' - en: round_robin_demux -> m0; + id: totrans-62 prefs: - PREF_IND type: TYPE_NORMAL + zh: 轮询解复用 -> m0; - en: round_robin_demux -> m1; + id: totrans-63 prefs: - PREF_IND type: TYPE_NORMAL + zh: 轮询解复用 -> m1; - en: l0 -> n; + id: totrans-64 prefs: - PREF_IND type: TYPE_NORMAL + zh: l0 -> n; - en: l1 -> n; + id: totrans-65 prefs: - PREF_IND type: TYPE_NORMAL + zh: l1 -> n; - en: n -> fullsync -> end; + id: totrans-66 prefs: - PREF_IND type: TYPE_NORMAL + zh: n -> fullsync -> 结束; - en: a0 [label="DP1"]; + id: totrans-67 prefs: - PREF_IND type: TYPE_NORMAL + zh: a0 [标签="DP1"]; - en: b0 [label="shuffle"]; + id: totrans-68 prefs: - PREF_IND type: TYPE_NORMAL + zh: b0 [标签="洗牌"]; - en: 
c0 [label="sharding_filter", color=blue]; + id: totrans-69 prefs: - PREF_IND type: TYPE_NORMAL + zh: c0 [标签="分片过滤器", 颜色=蓝色]; - en: d0 [label="DP4"]; + id: totrans-70 prefs: - PREF_IND type: TYPE_NORMAL + zh: d0 [标签="DP4"]; - en: a1 [label="DP1"]; + id: totrans-71 prefs: - PREF_IND type: TYPE_NORMAL + zh: a1 [标签="DP1"]; - en: b1 [label="shuffle"]; + id: totrans-72 prefs: - PREF_IND type: TYPE_NORMAL + zh: b1 [标签="洗牌"]; - en: c1 [label="sharding_filter", color=blue]; + id: totrans-73 prefs: - PREF_IND type: TYPE_NORMAL + zh: c1 [标签="分片过滤器", 颜色=蓝色]; - en: d1 [label="DP4"]; + id: totrans-74 prefs: - PREF_IND type: TYPE_NORMAL + zh: d1 [标签="DP4"]; - en: e [label="DP2"]; + id: totrans-75 prefs: - PREF_IND type: TYPE_NORMAL + zh: e [标签="DP2"]; - en: f [label="shuffle"]; + id: totrans-76 prefs: - PREF_IND type: TYPE_NORMAL + zh: f [标签="洗牌"]; - en: g [label="sharding_round_robin_dispatch", style="filled,rounded", color=red, fillcolor=white]; + id: totrans-77 prefs: - PREF_IND type: TYPE_NORMAL + zh: g [标签="分片轮询调度", 样式="填充,圆角", 颜色=红色, 填充颜色=白色]; - en: h [label="DP3"]; + id: totrans-78 prefs: - PREF_IND type: TYPE_NORMAL + zh: h [标签="DP3"]; - en: i [label="shuffle"]; + id: totrans-79 prefs: - PREF_IND type: TYPE_NORMAL + zh: i [标签="洗牌"]; - en: j [label="sharding_round_robin_dispatch", style="filled,rounded", color=red, fillcolor=white]; + id: totrans-80 prefs: - PREF_IND type: TYPE_NORMAL + zh: j [标签="分片轮询调度", 样式="填充,圆角", 颜色=红色, 填充颜色=白色]; - en: k [label="DP5 (Lowest common ancestor)"]; + id: totrans-81 prefs: - PREF_IND type: TYPE_NORMAL + zh: k [标签="DP5(最低公共祖先)"]; - en: fullsync; + id: totrans-82 prefs: - PREF_IND type: TYPE_NORMAL + zh: fullsync; - en: l0 [label="DP6"]; + id: totrans-83 prefs: - PREF_IND type: TYPE_NORMAL + zh: l0 [标签="DP6"]; - en: l1 [label="DP6"]; + id: totrans-84 prefs: - PREF_IND type: TYPE_NORMAL + zh: l1 [标签="DP6"]; - en: m0 [label="Client"] + id: totrans-85 prefs: - PREF_IND type: TYPE_NORMAL + zh: m0 [标签="客户端"] - en: m1 [label="Client"] + id: totrans-86 prefs: - PREF_IND type: TYPE_NORMAL + zh: m1 [标签="客户端"] - en: n [label="Client"] + id: totrans-87 prefs: - PREF_IND type: TYPE_NORMAL + zh: n [标签="客户端"] - en: end [shape=box]; + id: totrans-88 prefs: - PREF_IND type: TYPE_NORMAL + zh: 结束[形状=方框]; - en: '}](../Images/43cb85d64c97047f6451f776a8417bfc.png)' + id: totrans-89 prefs: [] type: TYPE_NORMAL + zh: '}](../Images/43cb85d64c97047f6451f776a8417bfc.png)' - en: '`Client` in the graph is a `DataPipe` that sends a request and receives a response from multiprocessing queues.' + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: 图中的`客户端`是一个`DataPipe`,它向多进程队列发送请求并接收响应。 - en: Determinism[](#determinism "Permalink to this heading") + id: totrans-91 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 确定性[](#determinism "跳转到此标题") - en: In `DataLoader2`, a `SeedGenerator` becomes a single source of randomness and each `ReadingService` would access it via `initialize_iteration()` and generate corresponding random seeds for random `DataPipe` operations. 
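As a rough sketch of how the pieces in this section combine (the pipeline and seed value are invented): a `shuffle` placed before `sharding_filter`, run under `MultiProcessingReadingService`, with one seed handed to `DataLoader2` so every worker derives its random state from the same source.

```python
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

def double(x):
    # A named function (not a lambda) so it pickles cleanly across workers.
    return x * 2

if __name__ == "__main__":
    # Shuffle *before* sharding_filter so shards are drawn from a global
    # shuffle; the filter then keeps only each worker's own samples.
    dp = IterableWrapper(range(1000)).shuffle().sharding_filter().map(double)

    dl = DataLoader2(dp, reading_service=MultiProcessingReadingService(num_workers=4))
    dl.seed(42)  # single source of randomness for all random DataPipe operations

    for item in dl:
        pass  # re-running with the same seed should reproduce the same order
    dl.shutdown()
```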
+ id: totrans-92 prefs: [] type: TYPE_NORMAL + zh: 在`DataLoader2`中,`SeedGenerator`成为随机性的单一来源,每个`ReadingService`都可以通过`initialize_iteration()`访问它,并为随机的`DataPipe`操作生成相应的随机种子。 - en: In order to make sure that the Dataset shards are mutually exclusive and collectively exhaustive on multiprocessing processes and distributed nodes, `MultiProcessingReadingService` and `DistributedReadingService` would help [`DataLoader2`](dataloader2.html#torchdata.dataloader2.DataLoader2 @@ -413,12 +602,17 @@ For the remaining `DataPipe` operations after sharding, unique random states are generated based on the distributed rank and worker process id by each `ReadingService`, in order to perform different random transformations. + id: totrans-93 prefs: [] type: TYPE_NORMAL + zh: 为了确保数据集分片在多进程和分布式节点上是互斥且完全穷尽的,`MultiProcessingReadingService`和`DistributedReadingService`将帮助[`DataLoader2`](dataloader2.html#torchdata.dataloader2.DataLoader2 + "torchdata.dataloader2.DataLoader2")在`sharding_filter`或`sharding_round_robin_dispatch`之前同步任何随机`DataPipe`操作的随机状态。在分片之后的剩余`DataPipe`操作中,每个`ReadingService`基于分布式排名和工作进程ID生成唯一的随机状态,以执行不同的随机变换。 - en: Graph Mode[](#graph-mode "Permalink to this heading") + id: totrans-94 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 图模式[](#graph-mode "跳转到此标题") - en: This also allows easier transition of data-preprocessing pipeline from research to production. After the `DataPipe` graph is created and validated with the `ReadingServices`, a different `ReadingService` that configures and connects to the production service/infrastructure @@ -427,179 +621,282 @@ could potentially search the graph, and find `DataPipe` operations that can be delegated to the production service/infrastructure, then modify the graph correspondingly to achieve higher-performant execution. + id: totrans-95 prefs: [] type: TYPE_NORMAL + zh: 这也使得从研究到生产的数据预处理流水线更容易过渡。在`DataPipe`图与`ReadingServices`创建和验证后,可以提供一个不同的`ReadingService`,配置并连接到生产服务/基础设施,比如`AIStore`,作为[`DataLoader2`](dataloader2.html#torchdata.dataloader2.DataLoader2 + "torchdata.dataloader2.DataLoader2")的替换。`ReadingService`可能会搜索图,并找到可以委托给生产服务/基础设施的`DataPipe`操作,然后相应地修改图以实现更高性能的执行。 - en: Extend ReadingService[](#extend-readingservice "Permalink to this heading") + id: totrans-96 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 扩展`ReadingService`[](#extend-readingservice "跳转到此标题") - en: The followings are interfaces for custom `ReadingService`. + id: totrans-97 prefs: [] type: TYPE_NORMAL + zh: 以下是自定义`ReadingService`的接口。 - en: '[PRE0]' + id: totrans-98 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: Interface for `ReadingService`. Please extend custom `ReadingService` based on this interface class. + id: totrans-99 prefs: [] type: TYPE_NORMAL + zh: '`ReadingService`的接口。请根据这个接口类扩展自定义的`ReadingService`。' - en: ReadingService must be picklable prior to `initialize` being called. This is because a copy of it will be created by `DataLoader2` to avoid the situation where the same ReadingService object is used by multiple `DataLoader2`, and its internal state will be modifiable by each of them. + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 在调用`initialize`之前,`ReadingService`必须是可picklable的。这是因为`DataLoader2`会创建它的副本,以避免同一个`ReadingService`对象被多个`DataLoader2`使用,并且它的内部状态将被每个对象修改。 - en: As a result of this constraint, certain initialization steps may need to take place within the `initialize` method rather than `__init__` of the ReadingService class. 
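A minimal custom `ReadingService` respecting this picklability constraint might look like the sketch below (the class and its tagging behavior are invented): `__init__` stores only picklable configuration, while the graph rewrite happens in `initialize`.

```python
from torchdata.datapipes.iter import IterDataPipe
from torchdata.dataloader2 import ReadingServiceInterface

class TaggingReadingService(ReadingServiceInterface):
    """Toy in-process ReadingService that tags every sample."""

    def __init__(self, tag: str = "demo") -> None:
        # Only picklable configuration here: DataLoader2 deepcopies the
        # ReadingService before initialize() is ever called.
        self.tag = tag

    def _stamp(self, item):
        return (self.tag, item)

    def initialize(self, datapipe: IterDataPipe) -> IterDataPipe:
        # Called once when the first iterator is created; expensive or
        # non-picklable setup and graph rewrites belong here, not __init__.
        return datapipe.map(self._stamp)

    def finalize(self) -> None:
        # Tear down anything created in initialize().
        pass
```

A `DataLoader2(dp, reading_service=TaggingReadingService())` would then yield `("demo", x)` pairs.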
+ id: totrans-101
  prefs: []
  type: TYPE_NORMAL
+ zh: 由于这个限制,某些初始化步骤可能需要在`initialize`方法中进行,而不是在`ReadingService`类的`__init__`中进行。
- en: '[PRE1]'
+ id: totrans-102
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE1]'
- en: '`ReadingService` cleans up internal states and fully shuts down the service.
    Called in `DataLoader2`’s `shutdown` and `__del__`.'
+ id: totrans-103
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService`清理内部状态并完全关闭服务。在`DataLoader2`的`shutdown`和`__del__`中调用。'
- en: '[PRE2]'
+ id: totrans-104
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: '`ReadingService` ends service after an epoch is finished. Called when the
    iterator of `DataLoader2` is depleted.'
+ id: totrans-105
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService`在一个epoch结束后终止服务。当`DataLoader2`的迭代器耗尽时调用。'
- en: '[PRE3]'
+ id: totrans-106
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: '`ReadingService` takes a `DataPipe` graph and adapts it into a new `DataPipe`
    graph based on the custom need. Called once when the `DataLoader2` iterator is
    created for the first time. Prior to calling this method, the `ReadingService`
    object must be picklable.'
+ id: totrans-107
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService`接受一个`DataPipe`图,根据自定义需求将其调整为一个新的`DataPipe`图。在首次创建`DataLoader2`迭代器时调用一次。在调用此方法之前,`ReadingService`对象必须是可picklable的。'
- en: 'Parameters:'
+ id: totrans-108
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**datapipe** – Original `DataPipe` graph.'
+ id: totrans-109
  prefs: []
  type: TYPE_NORMAL
+ zh: '**datapipe** - 原始的`DataPipe`图。'
- en: 'Returns:'
+ id: totrans-110
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回值:
- en: An adapted or a new `DataPipe` graph.
+ id: totrans-111
  prefs: []
  type: TYPE_NORMAL
+ zh: 一个调整或新的`DataPipe`图。
- en: '[PRE4]'
+ id: totrans-112
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: '`ReadingService` spins up service for an epoch. Called each time the `DataLoader2`
    iterator is obtained.'
+ id: totrans-113
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService`为一个epoch启动服务。在每次获取`DataLoader2`迭代器时调用。'
- en: 'Parameters:'
+ id: totrans-114
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**seed_generator** – SeedGenerator object created and managed by DataLoader2.
    As the single source of randomness, it will govern the determinism for all of
    the random operations within the graph of DataPipes.'
+ id: totrans-115
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**seed_generator** - 由`DataLoader2`创建和管理的SeedGenerator对象。作为随机性的单一来源,它将控制所有DataPipes图中的随机操作的确定性。'
- en: '**iter_reset_fn** – Optional reset function from the prior `ReadingService`
    when `SequentialReadingService` chains multiple `ReadingServices`'
+ id: totrans-116
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**iter_reset_fn** - 当`SequentialReadingService`链多个`ReadingServices`时,来自先前`ReadingService`的可选重置函数'
- en: 'Returns:'
+ id: totrans-117
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回值:
- en: A new `iter_reset_fn` to be used by the subsequent `ReadingService`
+ id: totrans-118
  prefs: []
  type: TYPE_NORMAL
+ zh: 一个新的`iter_reset_fn`供后续`ReadingService`使用
- en: Example
+ id: totrans-119
  prefs: []
  type: TYPE_NORMAL
+ zh: 示例
- en: MultiProcessingReadingService starts setting worker seeds per process and prefetching
    items from the graph.
+ id: totrans-120
  prefs: []
  type: TYPE_NORMAL
+ zh: MultiProcessingReadingService开始为每个进程设置工作器种子,并从图中预取项目。
- en: 'The checkpoint/snapshotting feature is a work in progress.
    Here is the preliminary interface (small changes are likely):'
+ id: totrans-121
  prefs: []
  type: TYPE_NORMAL
+ zh: 检查点/快照功能正在进行中。这是初步接口(可能会有小的更改):
- en: '[PRE5]'
+ id: totrans-122
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: Extend `ReadingServiceInterface` with two additional methods to save/restore
    the state of the data-processing graph.
+ id: totrans-123
  prefs: []
  type: TYPE_NORMAL
+ zh: 通过两个额外的方法扩展`ReadingServiceInterface`以保存/恢复数据处理图的状态。
- en: '[PRE6]'
+ id: totrans-124
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: '`ReadingService` serializes the internal states. Called in `DataLoader2.state_dict`.'
+ id: totrans-125
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService`序列化内部状态。在`DataLoader2.state_dict`中调用。'
- en: '[PRE7]'
+ id: totrans-126
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: '`ReadingService` adapts the `DataPipe` graph based on the serialized state.
    Called once when the `DataLoader2` iterator is created for the first time. Counterpart
    of `initialize`, which adapts the `DataPipe` graph from scratch.'
+ id: totrans-127
  prefs: []
  type: TYPE_NORMAL
+ zh: '`ReadingService` 根据序列化状态调整 `DataPipe` 图。在首次创建 `DataLoader2` 迭代器时调用一次。与 `initialize`
+   相对应,从头开始调整 `DataPipe` 图。'
- en: 'Parameters:'
+ id: totrans-128
  prefs: []
  type: TYPE_NORMAL
+ zh: 参数:
- en: '**datapipe** – original `DataPipe` graph before being adapted by `ReadingService`'
+ id: totrans-129
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**datapipe** – 在被 `ReadingService` 调整之前的原始 `DataPipe` 图。'
- en: '**serialized_state** – The serialized internal state used to restore the state
    of the adapted `DataPipe` graph.'
+ id: totrans-130
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: '**serialized_state** – 用于恢复适应的 `DataPipe` 图状态的内部状态的序列化状态。'
- en: 'Returns:'
+ id: totrans-131
  prefs: []
  type: TYPE_NORMAL
+ zh: 返回:
- en: Adapted `DataPipe` generated from the serialized state.
+ id: totrans-132
  prefs: []
  type: TYPE_NORMAL
+ zh: 根据序列化状态生成的适应的 `DataPipe`。
- en: Graph Functions[](#graph-functions "Permalink to this heading")
+ id: totrans-133
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 图函数[](#graph-functions "Permalink to this heading")
- en: 'And, graph utility functions are provided in `torchdata.dataloader2.graph`
    to help users do `DataPipe` graph rewrites for a custom `ReadingService`:'
+ id: totrans-134
  prefs: []
  type: TYPE_NORMAL
+ zh: 另外,`torchdata.dataloader2.graph` 中提供了图实用函数,帮助用户为自定义 `ReadingService` 进行 `DataPipe`
+   图重写:
- en: '| [`traverse_dps`](generated/torchdata.dataloader2.graph.traverse_dps.html#torchdata.dataloader2.graph.traverse_dps
    "torchdata.dataloader2.graph.traverse_dps") | Traverse the DataPipes and their
    attributes to extract the DataPipe graph. |'
+ id: totrans-135
  prefs: []
  type: TYPE_TB
+ zh: '| [`traverse_dps`](generated/torchdata.dataloader2.graph.traverse_dps.html#torchdata.dataloader2.graph.traverse_dps
+   "torchdata.dataloader2.graph.traverse_dps") | 遍历 DataPipes 及其属性以提取 DataPipe 图。
+   |'
- en: '| [`find_dps`](generated/torchdata.dataloader2.graph.find_dps.html#torchdata.dataloader2.graph.find_dps
    "torchdata.dataloader2.graph.find_dps") | Given the graph of DataPipe generated
    by the `traverse_dps` function, return DataPipe instances with the provided DataPipe
    type.
|' + id: totrans-136 prefs: [] type: TYPE_TB + zh: '| [`find_dps`](generated/torchdata.dataloader2.graph.find_dps.html#torchdata.dataloader2.graph.find_dps + "torchdata.dataloader2.graph.find_dps") | 给定由 `traverse_dps` 函数生成的 DataPipe 图,返回具有提供的 + DataPipe 类型的 DataPipe 实例。 |' - en: '| [`list_dps`](generated/torchdata.dataloader2.graph.list_dps.html#torchdata.dataloader2.graph.list_dps "torchdata.dataloader2.graph.list_dps") | Given the graph of DataPipe generated by `traverse_dps` function, return a list of all DataPipe instances without duplication. |' + id: totrans-137 prefs: [] type: TYPE_TB + zh: '| [`list_dps`](generated/torchdata.dataloader2.graph.list_dps.html#torchdata.dataloader2.graph.list_dps + "torchdata.dataloader2.graph.list_dps") | 给定由 `traverse_dps` 函数生成的 DataPipe 图,返回所有 + DataPipe 实例的列表,不重复。 |' - en: '| [`remove_dp`](generated/torchdata.dataloader2.graph.remove_dp.html#torchdata.dataloader2.graph.remove_dp "torchdata.dataloader2.graph.remove_dp") | Given the graph of DataPipe generated by `traverse_dps` function and the DataPipe to be removed, return the new graph of DataPipe. |' + id: totrans-138 prefs: [] type: TYPE_TB + zh: '| [`remove_dp`](generated/torchdata.dataloader2.graph.remove_dp.html#torchdata.dataloader2.graph.remove_dp + "torchdata.dataloader2.graph.remove_dp") | 给定由 `traverse_dps` 函数生成的 DataPipe 图以及要移除的 + DataPipe,返回新的 DataPipe 图。 |' - en: '| [`replace_dp`](generated/torchdata.dataloader2.graph.replace_dp.html#torchdata.dataloader2.graph.replace_dp "torchdata.dataloader2.graph.replace_dp") | Given the graph of DataPipe generated by `traverse_dps` function and the DataPipe to be replaced and the new DataPipe, return the new graph of DataPipe. |' + id: totrans-139 prefs: [] type: TYPE_TB + zh: '| [`replace_dp`](generated/torchdata.dataloader2.graph.replace_dp.html#torchdata.dataloader2.graph.replace_dp + "torchdata.dataloader2.graph.replace_dp") | 给定由 `traverse_dps` 函数生成的 DataPipe + 图以及要替换的 DataPipe 和新的 DataPipe,返回新的 DataPipe 图。 |' diff --git a/totrans/data07_07.yaml b/totrans/data07_07.yaml index 0292b6c2aedf4e943f0ff030ef334673d65e97b6..af234406bc3ac683162d7be6d9412af201405859 100644 --- a/totrans/data07_07.yaml +++ b/totrans/data07_07.yaml @@ -1,4 +1,6 @@ - en: 'Tutorial and Examples:' + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 教程和示例: diff --git a/totrans/data07_08.yaml b/totrans/data07_08.yaml index 7c7f7c5a50d19dd6337475af94b7aa2884c5322d..ff07c196920047126f571bdfe2b076b76d4654cd 100644 --- a/totrans/data07_08.yaml +++ b/totrans/data07_08.yaml @@ -1,151 +1,227 @@ - en: DataPipe Tutorial + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: DataPipe教程 - en: 原文:[https://pytorch.org/data/beta/dp_tutorial.html](https://pytorch.org/data/beta/dp_tutorial.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 原文:[https://pytorch.org/data/beta/dp_tutorial.html](https://pytorch.org/data/beta/dp_tutorial.html) - en: Using DataPipes[](#using-datapipes "Permalink to this heading") + id: totrans-2 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 使用DataPipes[](#using-datapipes "跳转到此标题") - en: 'Suppose that we want to load data from CSV files with the following steps:' + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 假设我们想要从CSV文件中加载数据,以下是步骤: - en: List all CSV files in a directory + id: totrans-4 prefs: - PREF_UL type: TYPE_NORMAL + zh: 列出目录中的所有CSV文件 - en: Load CSV files + id: totrans-5 prefs: - PREF_UL type: TYPE_NORMAL + zh: 加载CSV文件 - en: Parse CSV file and yield rows + id: totrans-6 prefs: - PREF_UL type: TYPE_NORMAL + zh: 解析CSV文件并产生行 - en: Split our 
dataset into training and validation sets + id: totrans-7 prefs: - PREF_UL type: TYPE_NORMAL + zh: 将数据集分割为训练集和验证集 - en: There are a few [built-in DataPipes](torchdata.datapipes.iter.html) that can help us with the above operations. + id: totrans-8 prefs: [] type: TYPE_NORMAL + zh: 有一些[内置的DataPipes](torchdata.datapipes.iter.html)可以帮助我们进行上述操作。 - en: '`FileLister` - [lists out files in a directory](generated/torchdata.datapipes.iter.FileLister.html)' + id: totrans-9 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`FileLister` - [列出目录中的文件](generated/torchdata.datapipes.iter.FileLister.html)' - en: '`Filter` - [filters the elements in DataPipe based on a given function](generated/torchdata.datapipes.iter.Filter.html)' + id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`Filter` - [根据给定函数过滤DataPipe中的元素](generated/torchdata.datapipes.iter.Filter.html)' - en: '`FileOpener` - [consumes file paths and returns opened file streams](generated/torchdata.datapipes.iter.FileOpener.html)' + id: totrans-11 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`FileOpener` - [消耗文件路径并返回打开的文件流](generated/torchdata.datapipes.iter.FileOpener.html)' - en: '`CSVParser` - [consumes file streams, parses the CSV contents, and returns one parsed line at a time](generated/torchdata.datapipes.iter.CSVParser.html)' + id: totrans-12 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`CSVParser` - [消耗文件流,解析CSV内容,并逐行返回解析后的内容](generated/torchdata.datapipes.iter.CSVParser.html)' - en: '`RandomSplitter` - [randomly split samples from a source DataPipe into groups](generated/torchdata.datapipes.iter.RandomSplitter.html)' + id: totrans-13 prefs: - PREF_UL type: TYPE_NORMAL + zh: '`RandomSplitter` - [从源DataPipe中随机分割样本为组](generated/torchdata.datapipes.iter.RandomSplitter.html)' - en: 'As an example, the source code for `CSVParser` looks something like this:' + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 例如,`CSVParser`的源代码看起来像这样: - en: '[PRE0]' + id: totrans-15 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: 'As mentioned in a different section, DataPipes can be invoked using their functional forms (recommended) or their class constructors. A pipeline can be assembled as the following:' + id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 如在不同部分中提到的,DataPipes可以使用它们的函数形式(推荐)或它们的类构造函数来调用。可以组装一个管道如下: - en: '[PRE1]' + id: totrans-17 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: You can find the full list of built-in [IterDataPipes here](torchdata.datapipes.iter.html) and [MapDataPipes here](torchdata.datapipes.map.html). + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 您可以在这里找到所有内置的[IterDataPipes](torchdata.datapipes.iter.html)和[MapDataPipes](torchdata.datapipes.map.html)。 - en: Working with DataLoader[](#working-with-dataloader "Permalink to this heading") + id: totrans-19 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 使用DataLoader[](#working-with-dataloader "跳转到此标题") - en: In this section, we will demonstrate how you can use `DataPipe` with `DataLoader`. For the most part, you should be able to use it just by passing `dataset=datapipe` as an input argument into the `DataLoader`. For detailed documentation related to `DataLoader`, please visit [this PyTorch Core page](https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading). 
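Before walking through the generated-CSV example below, here is the shape of the whole pattern in one place. This is a minimal sketch, not the tutorial's own code: the directory and file mask are invented, and it assumes CSV files already exist under `root`.

```python
from torch.utils.data import DataLoader
from torchdata.datapipes.iter import FileLister, FileOpener

def build_datapipe(root: str = "."):
    dp = FileLister(root=root, masks="*.csv")
    dp = FileOpener(dp, mode="rt")
    dp = dp.parse_csv(delimiter=",")       # yields one parsed row at a time
    return dp.shuffle().sharding_filter()  # see the sharding note further down

if __name__ == "__main__":
    dl = DataLoader(dataset=build_datapipe(), batch_size=5, num_workers=2)
    for batch in dl:
        print(batch)
```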
+ id: totrans-20
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本节中,我们将演示如何使用`DataPipe`与`DataLoader`。大部分情况下,您只需将`dataset=datapipe`作为输入参数传递给`DataLoader`即可使用。有关与`DataLoader`相关的详细文档,请访问[此PyTorch
+   Core页面](https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading)。
- en: Please refer to [this page](dlv2_tutorial.html) about using `DataPipe` with
    `DataLoader2`.
+ id: totrans-21
  prefs: []
  type: TYPE_NORMAL
+ zh: 请参考[此页面](dlv2_tutorial.html)关于如何使用`DataPipe`与`DataLoader2`。
- en: For this example, we will first have a helper function that generates some
    CSV files with random label and data.
+ id: totrans-22
  prefs: []
  type: TYPE_NORMAL
+ zh: 对于这个例子,我们首先会有一个帮助函数,生成一些带有随机标签和数据的CSV文件。
- en: '[PRE2]'
+ id: totrans-23
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE2]'
- en: Next, we will build our DataPipes to read and parse through the generated CSV
    files. Note that we prefer to pass defined functions to DataPipes rather than
    lambda functions, because the former are serializable with pickle.
+ id: totrans-24
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们将构建我们的DataPipes来读取和解析生成的CSV文件。请注意,我们更喜欢将定义的函数传递给DataPipes,而不是lambda函数,因为前者可以与pickle序列化。
- en: '[PRE3]'
+ id: totrans-25
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: Lastly, we will put everything together in `'__main__'` and pass the DataPipe
    into the DataLoader. Note that if you choose to use `Batcher` while setting `batch_size
    > 1` for DataLoader, your samples will be batched more than once. You should
    choose one or the other.
+ id: totrans-26
  prefs: []
  type: TYPE_NORMAL
+ zh: 最后,我们将把所有内容放在`'__main__'`中,并将DataPipe传递给DataLoader。请注意,如果您选择在DataLoader中设置`batch_size
+   > 1`时使用`Batcher`,则您的样本将被分批多次。您应该选择其中一个。
- en: '[PRE4]'
+ id: totrans-27
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: The following statements will be printed to show the shapes of a single batch
    of labels and features.
+ id: totrans-28
  prefs: []
  type: TYPE_NORMAL
+ zh: 以下语句将被打印出来,显示单个批次的标签和特征的形状。
- en: '[PRE5]'
+ id: totrans-29
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: The reason why `n_sample = 12` is that `ShardingFilter` (`datapipe.sharding_filter()`)
    was not used, such that each worker will independently return all samples. In
    this case, there are 10 rows per file and 3 files, with a batch size of 5, which
    gives us 6 batches per worker. With 2 workers, we get 12 total batches from the
    `DataLoader`.
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: '`n_sample = 12`的原因是因为没有使用`ShardingFilter`(`datapipe.sharding_filter()`),因此每个工作进程将独立返回所有样本。在这种情况下,每个文件有10行,共3个文件,批量大小为5,这给我们每个工作进程6个批次。有2个工作进程,我们从`DataLoader`中得到12个总批次。'
- en: In order for DataPipe sharding to work with `DataLoader`, we need to add the
    following.
+ id: totrans-31
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了使DataPipe分片与`DataLoader`一起工作,我们需要添加以下内容。
- en: '[PRE6]'
+ id: totrans-32
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE6]'
- en: 'When we re-run, we will get:'
+ id: totrans-33
  prefs: []
  type: TYPE_NORMAL
+ zh: 当我们重新运行时,我们将得到:
- en: '[PRE7]'
+ id: totrans-34
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE7]'
- en: 'Note:'
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意:
- en: Place `ShardingFilter` (`datapipe.sharding_filter`) as early as possible in
    the pipeline, especially before expensive operations such as decoding, in order
    to avoid repeating these expensive operations across worker/distributed processes.
+ id: totrans-36
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 尽量在管道中尽早放置`ShardingFilter`(`datapipe.sharding_filter`),特别是在解码等昂贵操作之前,以避免在工作进程/分布式进程中重复执行这些昂贵操作。
- en: For the data source that needs to be sharded, it is crucial to add `Shuffler`
    before `ShardingFilter` to ensure data are globally shuffled before being split
    into shards. Otherwise, each worker process would always process the same shard
@@ -153,302 +229,448 @@
    the same shard, which leads to low accuracy during training. However, it doesn’t
    apply to the data source that has already been sharded for each multi-/distributed
    process, since `ShardingFilter` is no longer required to be present in the
    pipeline.
+ id: totrans-37
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 对于需要分片的数据源,关键是在`ShardingFilter`之前添加`Shuffler`,以确保数据在分成片之前进行全局洗牌。否则,每个工作进程将始终处理相同的数据片段进行所有时期的训练。这意味着每个批次只包含来自同一数据片段的数据,这会导致训练时准确性较低。然而,对于已经为每个多/分布式进程分片的数据源,不再需要在管道中出现`ShardingFilter`。
- en: There may be cases where placing `Shuffler` earlier in the pipeline leads to
    worse performance, because some operations (e.g. decompression) are faster with
    sequential reading. In those cases, we recommend decompressing the files prior
    to shuffling (potentially prior to any data loading).
+ id: totrans-38
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 在某些情况下,将`Shuffler`放在管道中较早的位置可能会导致性能变差,因为某些操作(例如解压缩)在顺序读取时速度更快。在这种情况下,我们建议在洗牌之前解压缩文件(可能在任何数据加载之前)。
- en: You can find more DataPipe implementation examples for various research domains
    [on this page](examples.html).
+ id: totrans-39
  prefs: []
  type: TYPE_NORMAL
+ zh: 您可以在[此页面](examples.html)找到各种研究领域的更多DataPipe实现示例。
- en: Implementing a Custom DataPipe[](#implementing-a-custom-datapipe "Permalink
    to this heading")
+ id: totrans-40
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 实现自定义DataPipe[](#implementing-a-custom-datapipe "跳转到此标题")
- en: Currently, we already have a large number of built-in DataPipes and we expect
    them to cover most necessary data processing operations. If none of them supports
    your need, you can create your own custom DataPipe.
+ id: totrans-41
  prefs: []
  type: TYPE_NORMAL
+ zh: 目前,我们已经拥有大量内置的DataPipes,并且我们希望它们能够涵盖大多数必要的数据处理操作。如果没有一个支持您的需求,您可以创建自己的自定义DataPipe。
- en: As a guiding example, let us implement an `IterDataPipe` that applies a callable
    to the input iterator. For `MapDataPipe`, take a look at the [map](https://github.com/pytorch/pytorch/tree/master/torch/utils/data/datapipes/map)
    folder for examples, and follow the steps below for the `__getitem__` method
    instead of the `__iter__` method.
+ id: totrans-42
  prefs: []
  type: TYPE_NORMAL
+ zh: 作为一个指导示例,让我们实现一个将可调用函数应用于输入迭代器的`IterDataPipe`。对于`MapDataPipe`,请查看[map](https://github.com/pytorch/pytorch/tree/master/torch/utils/data/datapipes/map)文件夹中的示例,并按照下面的步骤为`__getitem__`方法而不是`__iter__`方法。
- en: Naming[](#naming "Permalink to this heading")
+ id: totrans-43
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 命名[](#naming "跳转到此标题")
- en: The naming convention for `DataPipe` is “Operation”-er, followed by `IterDataPipe`
    or `MapDataPipe`, as each DataPipe is essentially a container to apply an operation
    to data yielded from a source `DataPipe`. For succinctness, we alias to just
    “Operation-er” in **init** files. For our `IterDataPipe` example, we’ll name
    the module `MapperIterDataPipe` and alias it as `iter.Mapper` under `torchdata.datapipes`.
+ id: totrans-44
  prefs: []
  type: TYPE_NORMAL
+ zh: '`DataPipe`的命名约定是“操作”-er,后跟`IterDataPipe`或`MapDataPipe`,因为每个DataPipe本质上是一个容器,用于将操作应用于从源`DataPipe`中产生的数据。为了简洁起见,在**init**文件中我们将其别名为“Operation-er”。对于我们的`IterDataPipe`示例,我们将模块命名为`MapperIterDataPipe`,并在`torchdata.datapipes`下将其别名为`iter.Mapper`。'
- en: For the functional method name, the naming convention is `datapipe.<functional_name>`.
    For instance, the functional method name of `Mapper` is `map`, such that it can
    be invoked by `datapipe.map(...)`.
+ id: totrans-45
  prefs: []
  type: TYPE_NORMAL
+ zh: 对于功能方法的命名约定是`datapipe.<functional_name>`。例如,`Mapper`的功能方法名称是`map`,因此可以通过`datapipe.map(...)`来调用它。
- en: Constructor[](#constructor "Permalink to this heading")
+ id: totrans-46
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 构造函数[](#constructor "跳转到此标题")
- en: 'DataSets are now generally constructed as stacks of `DataPipes`, so each `DataPipe`
    typically takes a source `DataPipe` as its first argument. Here is a simplified
    version of Mapper as an example:'
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 数据集现在通常构建为`DataPipes`堆叠,因此每个`DataPipe`通常将源`DataPipe`作为其第一个参数。以下是Mapper的简化版本示例:
- en: '[PRE8]'
+ id: totrans-48
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: 'Note:'
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: 注意:
- en: Avoid loading data from the source DataPipe in the `__init__` function, in
    order to support lazy data loading and save memory.
+ id: totrans-50
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 避免在`__init__`函数中从源DataPipe加载数据,以支持延迟数据加载并节省内存。
- en: If an `IterDataPipe` instance holds data in memory, please beware of in-place
    modification of the data. When a second iterator is created from the instance,
    the data may have already changed. Please take the `IterableWrapper` [class](https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes/iter/utils.py)
    as a reference for `deepcopy`ing data for each iterator.
+ id: totrans-51
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 如果`IterDataPipe`实例在内存中保存数据,请注意数据的原地修改。当从实例创建第二个迭代器时,数据可能已经发生了变化。请参考`IterableWrapper`[类](https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes/iter/utils.py)来为每个迭代器`deepcopy`数据。
- en: Avoid variable names that are taken by the functional names of existing DataPipes.
    For instance, `.filter` is the functional name that can be used to invoke `FilterIterDataPipe`.
    Having a variable named `filter` inside another `IterDataPipe` can lead to confusion.
+ id: totrans-52
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
+ zh: 避免使用现有DataPipes的功能名称作为变量名。例如,`.filter`是可以用来调用`FilterIterDataPipe`的功能名称。在另一个`IterDataPipe`中有一个名为`filter`的变量可能会导致混淆。
- en: Iterator[](#iterator "Permalink to this heading")
+ id: totrans-53
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 迭代器[](#iterator "跳转到此标题")
- en: For `IterDataPipes`, an `__iter__` function is needed to consume data from
    the source `IterDataPipe` then apply the operation over the data before `yield`.
+ id: totrans-54
  prefs: []
  type: TYPE_NORMAL
+ zh: 对于`IterDataPipes`,需要一个`__iter__`函数来从源`IterDataPipe`中消耗数据,然后在`yield`之前对数据应用操作。
- en: '[PRE9]'
+ id: totrans-55
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE9]'
- en: Length[](#length "Permalink to this heading")
+ id: totrans-56
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 长度[](#length "跳转到此标题")
- en: In many cases, as in our `MapperIterDataPipe` example, the `__len__` method
    of a DataPipe returns the length of the source DataPipe.
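Pulling the Naming, Constructor, Iterator, and Length conventions together, a complete version of the running example might look like this sketch. It is registered under the invented functional name `map_example`, since `map` itself is already taken by the built-in `Mapper`.

```python
from torchdata.datapipes import functional_datapipe
from torchdata.datapipes.iter import IterDataPipe, IterableWrapper

@functional_datapipe("map_example")  # hypothetical name; "map" is taken
class MapperExampleIterDataPipe(IterDataPipe):
    def __init__(self, source_dp: IterDataPipe, fn) -> None:
        super().__init__()
        self.source_dp = source_dp  # lazy: no data is read here
        self.fn = fn

    def __iter__(self):
        for item in self.source_dp:
            yield self.fn(item)

    def __len__(self) -> int:
        # Delegates to the source; len() raises TypeError if it has none.
        return len(self.source_dp)

dp = IterableWrapper(range(3)).map_example(str)
print(list(dp), len(dp))  # ['0', '1', '2'] 3
```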
+ id: totrans-57
  prefs: []
  type: TYPE_NORMAL
+ zh: 在许多情况下,就像我们的`MapperIterDataPipe`示例一样,DataPipe的`__len__`方法返回源DataPipe的长度。
- en: '[PRE10]'
+ id: totrans-58
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE10]'
- en: However, note that `__len__` is optional for `IterDataPipe` and often inadvisable.
    For `CSVParserIterDataPipe` in the Using DataPipes section above, `__len__` is
    not implemented because the number of rows in each file is unknown before loading
    it. In some special cases, `__len__` can be made to either return an integer
    or raise an Error depending on the input. In those cases, the Error must be a
    `TypeError` to support Python’s built-in functions like `list(dp)`.
+ id: totrans-59
  prefs: []
  type: TYPE_NORMAL
+ zh: 但请注意,对于`IterDataPipe`,`__len__`是可选的,通常不建议使用。在上面的使用DataPipes部分中,对于`CSVParserIterDataPipe`,`__len__`未实现,因为在加载之前无法确定每个文件中的行数。在某些特殊情况下,`__len__`可以被设置为返回整数或根据输入引发错误。在这些情况下,错误必须是`TypeError`,以支持Python的内置函数如`list(dp)`。
- en: Registering DataPipes with the functional API[](#registering-datapipes-with-the-functional-api
    "Permalink to this heading")
+ id: totrans-60
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 使用功能API注册DataPipes[](#registering-datapipes-with-the-functional-api "跳转到此标题的永久链接")
- en: Each DataPipe can be registered to support functional invocation using the
    decorator `functional_datapipe`.
+ id: totrans-61
  prefs: []
  type: TYPE_NORMAL
+ zh: 每个DataPipe都可以注册以支持使用装饰器`functional_datapipe`进行功能调用。
- en: '[PRE11]'
+ id: totrans-62
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE11]'
- en: 'The stack of DataPipes can then be constructed using their functional forms
    (recommended) or class constructors:'
+ id: totrans-63
  prefs: []
  type: TYPE_NORMAL
+ zh: 然后,可以使用它们的功能形式(推荐)或类构造函数构建DataPipes堆栈:
- en: '[PRE12]'
+ id: totrans-64
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: In the above example, `datapipes1` and `datapipes2` represent the exact same
    stack of `IterDataPipe`s. We recommend using the functional form of DataPipes.
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
+ zh: 在上面的示例中,`datapipes1`和`datapipes2`代表完全相同的`IterDataPipe`堆栈。我们建议使用DataPipes的功能形式。
- en: Working with Cloud Storage Providers[](#working-with-cloud-storage-providers
    "Permalink to this heading")
+ id: totrans-66
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 与云存储提供商合作[](#working-with-cloud-storage-providers "跳转到此标题的永久链接")
- en: In this section, we show examples accessing AWS S3, Google Cloud Storage, and
    Azure Cloud Storage with built-in `fsspec` DataPipes. Although only these providers
    are discussed here, with additional libraries, `fsspec` DataPipes should allow
    you to connect with other storage systems as well ([list of known implementations](https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations)).
+ id: totrans-67
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本节中,我们展示了使用内置`fsspec` DataPipes访问AWS S3、Google Cloud Storage和Azure Cloud Storage的示例。尽管这里只讨论了这些提供商,但使用其他库,`fsspec`
+   DataPipes也应该允许您连接到其他存储系统([已知实现列表](https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations))。
- en: Let us know on GitHub if you have a request for support for other cloud storage
    providers, or you have code examples to share with the community.
+ id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 如果您对其他云存储提供商的支持有任何请求,或者有代码示例要与社区分享,请在GitHub上告诉我们。 - en: Accessing AWS S3 with `fsspec` DataPipes[](#accessing-aws-s3-with-fsspec-datapipes "Permalink to this heading") + id: totrans-69 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 使用`fsspec` DataPipes访问AWS S3[](#accessing-aws-s3-with-fsspec-datapipes "跳转到此标题的永久链接") - en: This requires the installation of the libraries `fsspec` ([documentation](https://filesystem-spec.readthedocs.io/en/latest/)) and `s3fs` ([s3fs GitHub repo](https://github.com/fsspec/s3fs)). + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: 这需要安装库`fsspec`([文档](https://filesystem-spec.readthedocs.io/en/latest/))和`s3fs`([s3fs + GitHub 仓库](https://github.com/fsspec/s3fs))。 - en: You can list out the files within a S3 bucket directory by passing a path that starts with `"s3://BUCKET_NAME"` to [FSSpecFileLister](generated/torchdata.datapipes.iter.FSSpecFileLister.html) (`.list_files_by_fsspec(...)`). + id: totrans-71 prefs: [] type: TYPE_NORMAL + zh: 您可以通过将以`s3://BUCKET_NAME`开头的路径传递给[FSSpecFileLister](generated/torchdata.datapipes.iter.FSSpecFileLister.html)(`.list_files_by_fsspec(...)`)来列出S3存储桶目录中的文件。 - en: '[PRE13]' + id: totrans-72 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: You can also open files using [FSSpecFileOpener](generated/torchdata.datapipes.iter.FSSpecFileOpener.html) (`.open_files_by_fsspec(...)`) and stream them (if supported by the file format). + id: totrans-73 prefs: [] type: TYPE_NORMAL + zh: 您还可以使用[FSSpecFileOpener](generated/torchdata.datapipes.iter.FSSpecFileOpener.html)(`.open_files_by_fsspec(...)`)打开文件并流式传输(如果文件格式支持)。 - en: 'Note that you can also provide additional parameters via the argument `kwargs_for_open`. This can be useful for purposes such as accessing specific bucket version, which you can do so by passing in `{version_id: ''SOMEVERSIONID''}` (more [details about S3 bucket version awareness](https://s3fs.readthedocs.io/en/latest/#bucket-version-awareness) by `s3fs`). The supported arguments vary by the (cloud) file system that you are accessing.' + id: totrans-74 prefs: [] type: TYPE_NORMAL + zh: '请注意,您还可以通过参数`kwargs_for_open`提供额外的参数。这对于访问特定存储桶版本等目的可能很有用,您可以通过传入`{version_id: + ''SOMEVERSIONID''}`来实现(更多关于S3存储桶版本感知的详细信息,请参阅`s3fs`的[文档](https://s3fs.readthedocs.io/en/latest/#bucket-version-awareness))。支持的参数取决于您正在访问的(云)文件系统。' - en: In the example below, we are streaming the archive by using [TarArchiveLoader](generated/torchdata.datapipes.iter.TarArchiveLoader.html#) (`.load_from_tar(mode="r|")`), in contrast with the usual `mode="r:"`. This allows us to begin processing data inside the archive without downloading the whole archive into memory first. + id: totrans-75 prefs: [] type: TYPE_NORMAL + zh: 在下面的示例中,我们通过使用[TarArchiveLoader](generated/torchdata.datapipes.iter.TarArchiveLoader.html#)(`.load_from_tar(mode="r|")`)来流式传输存档,与通常的`mode="r:"`相反。这使我们能够在将整个存档下载到内存之前开始处理存档中的数据。 - en: '[PRE14]' + id: totrans-76 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: Finally, [FSSpecFileSaver](generated/torchdata.datapipes.iter.FSSpecSaver.html) is also available for writing data to cloud. 
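Condensing the S3 steps above into one runnable sketch (bucket name and layout are invented; `fsspec`, `s3fs`, and configured AWS credentials are assumed):

```python
from torchdata.datapipes.iter import IterableWrapper

def read_member(member):
    path_in_tar, stream = member
    return path_in_tar, stream.read()

def local_name(path_in_tar: str) -> str:
    # Assumes a local "extracted/" directory already exists.
    return "extracted/" + path_in_tar.split("/")[-1]

# List objects under an (invented) prefix, stream each tar archive,
# and write every archive member back out via FSSpecSaver.
dp = IterableWrapper(["s3://my-bucket/archives/"]).list_files_by_fsspec()
dp = dp.open_files_by_fsspec(mode="rb")
dp = dp.load_from_tar(mode="r|")  # "r|" streams instead of seeking
dp = dp.map(read_member).save_by_fsspec(filepath_fn=local_name, mode="wb")

for written_path in dp:
    print("wrote", written_path)
```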
- en: Accessing Google Cloud Storage (GCS) with `fsspec` DataPipes[](#accessing-google-cloud-storage-gcs-with-fsspec-datapipes
  "Permalink to this heading")
+ id: totrans-78
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: 使用`fsspec` DataPipes访问Google Cloud Storage(GCS)[](#accessing-google-cloud-storage-gcs-with-fsspec-datapipes
  "跳转到此标题的永久链接")
- en: This requires the installation of the libraries `fsspec` ([documentation](https://filesystem-spec.readthedocs.io/en/latest/))
  and `gcsfs` ([gcsfs GitHub repo](https://github.com/fsspec/gcsfs)).
+ id: totrans-79
 prefs: []
 type: TYPE_NORMAL
+ zh: 这需要安装库`fsspec`([文档](https://filesystem-spec.readthedocs.io/en/latest/))和`gcsfs`([gcsfs
  GitHub 仓库](https://github.com/fsspec/gcsfs))。
- en: You can list out the files within a GCS bucket directory by specifying a path
  that starts with `"gcs://BUCKET_NAME"`. The bucket name in the example below is
  `uspto-pair`.
+ id: totrans-80
 prefs: []
 type: TYPE_NORMAL
+ zh: 您可以通过指定以`"gcs://BUCKET_NAME"`开头的路径来列出GCS存储桶目录中的文件。下面示例中的存储桶名称是`uspto-pair`。
- en: '[PRE15]'
+ id: totrans-81
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE15]'
- en: Here is an example of loading a zip file `05900035.zip` from a bucket named
  `uspto-pair` inside the directory `applications`.
+ id: totrans-82
 prefs: []
 type: TYPE_NORMAL
+ zh: 以下是从名为`uspto-pair`的存储桶中的`applications`目录加载`05900035.zip`文件的示例。
- en: '[PRE16]'
+ id: totrans-83
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE16]'
- en: Accessing Azure Blob storage with `fsspec` DataPipes[](#accessing-azure-blob-storage-with-fsspec-datapipes
  "Permalink to this heading")
+ id: totrans-84
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: 使用`fsspec` DataPipes 访问 Azure Blob 存储[](#accessing-azure-blob-storage-with-fsspec-datapipes
  "跳转到此标题")
- en: 'This requires the installation of the libraries `fsspec` ([documentation](https://filesystem-spec.readthedocs.io/en/latest/))
  and `adlfs` ([adlfs GitHub repo](https://github.com/fsspec/adlfs)). You can access
  data in Azure Data Lake Storage Gen2 by providing URIs starting with `abfs://`.
  For example, [FSSpecFileLister](generated/torchdata.datapipes.iter.FSSpecFileLister.html)
  (`.list_files_by_fsspec(...)`) can be used to list files in a directory in a container:'
+ id: totrans-85
 prefs: []
 type: TYPE_NORMAL
+ zh: 这需要安装库`fsspec`([文档](https://filesystem-spec.readthedocs.io/en/latest/))和`adlfs`([adlfs
  GitHub 仓库](https://github.com/fsspec/adlfs))。您可以通过提供以`abfs://`开头的 URI 来访问 Azure
  Data Lake Storage Gen2。例如,[FSSpecFileLister](generated/torchdata.datapipes.iter.FSSpecFileLister.html)(`.list_files_by_fsspec(...)`)可用于列出容器中目录中的文件:
- en: '[PRE17]'
+ id: totrans-86
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE17]'
- en: You can also open files using [FSSpecFileOpener](generated/torchdata.datapipes.iter.FSSpecFileOpener.html)
  (`.open_files_by_fsspec(...)`) and stream them (if supported by the file format).
+ id: totrans-87
 prefs: []
 type: TYPE_NORMAL
+ zh: 您还可以使用[FSSpecFileOpener](generated/torchdata.datapipes.iter.FSSpecFileOpener.html)(`.open_files_by_fsspec(...)`)打开文件并流式传输(如果文件格式支持)。
- en: Here is an example of loading a CSV file `ecdc_cases.csv` from a public container
  inside the directory `curated/covid-19/ecdc_cases/latest`, belonging to account
  `pandemicdatalake`.
+ id: totrans-88
 prefs: []
 type: TYPE_NORMAL
+ zh: 这里是一个从属于账户`pandemicdatalake`的公共容器内的目录`curated/covid-19/ecdc_cases/latest`中加载
  CSV 文件`ecdc_cases.csv`的示例。
- en: '[PRE18]'
+ id: totrans-89
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE18]'
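Since [PRE18] is a placeholder here, a hedged sketch of what that snippet plausibly looks like, assuming the public `pandemicdatalake` account needs no credentials beyond an `account_name` storage option:

```python
from torchdata.datapipes.iter import IterableWrapper

# Requires `fsspec` and `adlfs`; the account is public, so no key is passed.
dp = (
    IterableWrapper(["abfs://public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv"])
    .open_files_by_fsspec(account_name="pandemicdatalake")
    .parse_csv()
)
print(next(iter(dp)))  # first row of the CSV
```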
- en: If necessary, you can also access data in Azure Data Lake Storage Gen1 by using
  URIs starting with `adl://` and `abfs://`, as described in the [README of the adlfs
  repo](https://github.com/fsspec/adlfs/blob/main/README.md).
+ id: totrans-90
 prefs: []
 type: TYPE_NORMAL
+ zh: 如有必要,您还可以通过使用以`adl://`和`abfs://`开头的 URI 来访问 Azure Data Lake Storage Gen1,如[adlfs
  仓库的 README](https://github.com/fsspec/adlfs/blob/main/README.md)中所述。
- en: Accessing Azure ML Datastores with `fsspec` DataPipes[](#accessing-azure-ml-datastores-with-fsspec-datapipes
  "Permalink to this heading")
+ id: totrans-91
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: 使用`fsspec` DataPipes 访问 Azure ML 数据存储[](#accessing-azure-ml-datastores-with-fsspec-datapipes
  "跳转到此标题")
- en: 'An Azure ML datastore is a *reference* to an existing storage account on Azure.
  The key benefits of creating and using an Azure ML datastore are:'
+ id: totrans-92
 prefs: []
 type: TYPE_NORMAL
+ zh: Azure ML 数据存储是对 Azure 上现有存储账户的*引用*。创建和使用 Azure ML 数据存储的主要优势是:
- en: A common and easy-to-use API to interact with different storage types in Azure
  (Blob/Files).
+ id: totrans-93
 prefs:
 - PREF_UL
 type: TYPE_NORMAL
+ zh: 一个通用且易于使用的 API,用于与 Azure 中的不同存储类型(Blob/Files)进行交互。
- en: Easier to discover useful datastores when working as a team.
+ id: totrans-94
 prefs:
 - PREF_UL
 type: TYPE_NORMAL
+ zh: 团队合作时更容易发现有用的数据存储。
- en: Authentication is automatically handled - both *credential-based* access (service
  principal/SAS/key) and *identity-based* access (Azure Active Directory/managed
  identity) are supported. When using credential-based authentication, you do not
  need to expose secrets in your code.
+ id: totrans-95
 prefs:
 - PREF_UL
 type: TYPE_NORMAL
+ zh: 身份验证会自动处理 - 支持基于凭据的访问(服务主体/SAS/密钥)和基于身份的访问(Azure Active Directory/托管标识)。使用基于凭据的身份验证时,您无需在代码中暴露密钥。
- en: This requires the installation of the library `azureml-fsspec` ([documentation](https://learn.microsoft.com/python/api/azureml-fsspec/?view=azure-ml-py)).
+ id: totrans-96
 prefs: []
 type: TYPE_NORMAL
+ zh: 这需要安装库`azureml-fsspec`([文档](https://learn.microsoft.com/python/api/azureml-fsspec/?view=azure-ml-py))。
- en: 'You can access data in an Azure ML datastore by providing URIs starting with
  `azureml://`. For example, [FSSpecFileLister](generated/torchdata.datapipes.iter.FSSpecFileLister.html)
  (`.list_files_by_fsspec(...)`) can be used to list files in a directory in a container:'
+ id: totrans-97
 prefs: []
 type: TYPE_NORMAL
+ zh: 通过提供以`azureml://`开头的 URI,您可以访问 Azure ML 数据存储。例如,[FSSpecFileLister](generated/torchdata.datapipes.iter.FSSpecFileLister.html)(`.list_files_by_fsspec(...)`)可用于列出容器中目录中的文件:
- en: '[PRE19]'
+ id: totrans-98
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE19]'
- en: You can also open files using [FSSpecFileOpener](generated/torchdata.datapipes.iter.FSSpecFileOpener.html)
  (`.open_files_by_fsspec(...)`) and stream them (if supported by the file format).
+ id: totrans-99
 prefs: []
 type: TYPE_NORMAL
+ zh: 您还可以使用[FSSpecFileOpener](generated/torchdata.datapipes.iter.FSSpecFileOpener.html)(`.open_files_by_fsspec(...)`)打开文件并流式传输(如果文件格式支持)。
- en: Here is an example of loading a tar file from the default Azure ML datastore
  `workspaceblobstore` where the path is `/cifar-10-python.tar.gz` (top-level folder).
+ id: totrans-100
 prefs: []
 type: TYPE_NORMAL
+ zh: 这里是一个从默认的 Azure ML 数据存储`workspaceblobstore`中加载 tar 文件的示例,路径为`/cifar-10-python.tar.gz`(顶层文件夹)。
- en: '[PRE20]'
+ id: totrans-101
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE20]'
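With [PRE20] elided here, a sketch of the tar-loading step under the documented long-form `azureml://` URI pattern; every `{...}` segment is a placeholder you would fill in for your own workspace:

```python
from torchdata.datapipes.iter import IterableWrapper

# Requires `azureml-fsspec`; all {...} segments below are placeholders.
uri = (
    "azureml://subscriptions/{subscription_id}"
    "/resourcegroups/{resource_group}"
    "/workspaces/{workspace}"
    "/datastores/workspaceblobstore"
    "/paths/cifar-10-python.tar.gz"
)
dp = IterableWrapper([uri]).open_files_by_fsspec(mode="rb").load_from_tar(mode="r|")
for file_name, stream in dp:
    print(file_name)  # stream archive members without a full download first
```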
- en: Here is an example of loading a CSV file - the famous Titanic dataset ([download](https://raw.githubusercontent.com/Azure/azureml-examples/main/cli/assets/data/sample-data/titanic.csv))
  - from the Azure ML datastore `workspaceblobstore` where the path is `/titanic.csv`
  (top-level folder).
+ id: totrans-102
 prefs: []
 type: TYPE_NORMAL
+ zh: 这里是一个加载 CSV 文件的示例 - 著名的泰坦尼克号数据集([下载](https://raw.githubusercontent.com/Azure/azureml-examples/main/cli/assets/data/sample-data/titanic.csv))-
  从 Azure ML 数据存储`workspaceblobstore`中加载,路径为`/titanic.csv`(顶层文件夹)。
- en: '[PRE21]'
+ id: totrans-103
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE21]'
diff --git a/totrans/data07_09.yaml b/totrans/data07_09.yaml
index 3e96db9248223b0819f0483924856eabbbe343fe..6b063d570a27dadb7dc93812aac2b2c8ee44a676 100644
--- a/totrans/data07_09.yaml
+++ b/totrans/data07_09.yaml
@@ -1,78 +1,115 @@
- en: DataLoader2 Tutorial
+ id: totrans-0
 prefs:
 - PREF_H1
 type: TYPE_NORMAL
+ zh: DataLoader2教程
- en: 原文:[https://pytorch.org/data/beta/dlv2_tutorial.html](https://pytorch.org/data/beta/dlv2_tutorial.html)
+ id: totrans-1
 prefs:
 - PREF_BQ
 type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/data/beta/dlv2_tutorial.html](https://pytorch.org/data/beta/dlv2_tutorial.html)
- en: This is the tutorial for users to create a `DataPipe` graph and load data via
  `DataLoader2` with different backend systems (`ReadingService`). A usage example
  can be found in [this colab notebook](https://colab.research.google.com/drive/1eSvp-eUDYPj0Sd0X_Mv9s9VkE8RNDg1u).
+ id: totrans-2
 prefs: []
 type: TYPE_NORMAL
+ zh: 这是用户创建`DataPipe`图并通过不同后端系统(`ReadingService`)加载数据的教程。可以在[此colab笔记本](https://colab.research.google.com/drive/1eSvp-eUDYPj0Sd0X_Mv9s9VkE8RNDg1u)中找到一个使用示例。
- en: DataPipe[](#datapipe "Permalink to this heading")
+ id: totrans-3
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: DataPipe
- en: 'Please refer to [DataPipe Tutorial](dp_tutorial.html) for more details. Here
  are the most important caveats: make sure the data pipeline has a different order
  per epoch and that data shards are mutually exclusive and collectively exhaustive:'
+ id: totrans-4
 prefs: []
 type: TYPE_NORMAL
+ zh: 有关更多详细信息,请参阅[DataPipe教程](dp_tutorial.html)。以下是最重要的注意事项:确保数据管道每个时期具有不同的顺序,并且数据分片是互斥且完全穷尽的:
- en: Place `sharding_filter` or `sharding_round_robin_dispatch` as early as possible
  in the pipeline to avoid repeating expensive operations in worker/distributed processes.
+ id: totrans-5
 prefs:
 - PREF_UL
 type: TYPE_NORMAL
+ zh: 尽早在管道中放置`sharding_filter`或`sharding_round_robin_dispatch`,以避免在工作/分布式进程中重复昂贵的操作。
- en: Add a `shuffle` DataPipe before sharding to achieve inter-shard shuffling. `ReadingService`
  will handle synchronization of those `shuffle` operations to ensure the order of
  data is the same before sharding so that all shards are mutually exclusive and
  collectively exhaustive.
+ id: totrans-6
 prefs:
 - PREF_UL
 type: TYPE_NORMAL
+ zh: 在分片之前添加一个`shuffle` DataPipe以实现分片间的洗牌。`ReadingService`将处理这些`shuffle`操作的同步,以确保在分片之前数据的顺序相同,以使所有分片互斥且完全穷尽。
- en: 'Here is an example of a `DataPipe` graph:'
+ id: totrans-7
 prefs: []
 type: TYPE_NORMAL
+ zh: 以下是一个`DataPipe`图的示例:
- en: '[PRE0]'
+ id: totrans-8
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE0]'
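Because [PRE0] is a placeholder in this extract, here is a minimal sketch of such a graph under the caveats above; the CSV file names and the row transform are hypothetical:

```python
from torchdata.datapipes.iter import IterableWrapper

# Hypothetical CSV shards; shuffle comes *before* sharding_filter, per the caveats above.
datapipe = IterableWrapper(["./train1.csv", "./train2.csv"])
datapipe = datapipe.open_files(encoding="utf-8").parse_csv()
datapipe = datapipe.shuffle().sharding_filter()
datapipe = datapipe.map(lambda row: row).batch(8)  # replace the identity map with real preprocessing
```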
- en: Multiprocessing[](#multiprocessing "Permalink to this heading")
+ id: totrans-9
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 多进程
- en: '`MultiProcessingReadingService` handles multiprocessing sharding at the point
  of `sharding_filter` and synchronizes the seeds across worker processes.'
+ id: totrans-10
 prefs: []
 type: TYPE_NORMAL
+ zh: '`MultiProcessingReadingService` 在`sharding_filter`点处理多进程分片,并在工作进程之间同步种子。'
- en: '[PRE1]'
+ id: totrans-11
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE1]'
- en: Distributed[](#distributed "Permalink to this heading")
+ id: totrans-12
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 分布式
- en: '`DistributedReadingService` handles distributed sharding at the point of `sharding_filter`
  and synchronizes the seeds across distributed processes. In order to balance the
  data shards across distributed nodes, a `fullsync` `DataPipe` will be attached
  to the `DataPipe` graph to align the number of batches across distributed ranks.
  This prevents the hanging issue caused by uneven shards in distributed training.'
+ id: totrans-13
 prefs: []
 type: TYPE_NORMAL
+ zh: '`DistributedReadingService` 在`sharding_filter`点处理分布式分片,并在分布式进程之间同步种子。为了平衡分布式节点之间的数据分片,将在`DataPipe`图中附加一个`fullsync`
  `DataPipe`,以使分布式排名之间的批次数量保持一致。这将防止分布式训练中由不均匀分片引起的挂起问题。'
- en: '[PRE2]'
+ id: totrans-14
 prefs: []
 type: TYPE_PRE
-- en: Multiprocessing + Distributed[](#multiprocessing-distributed "Permalink to
- this heading")
+ zh: '[PRE2]'
+- en: Multiprocessing + Distributed[](#multiprocessing-distributed "Permalink to this
+ heading")
+ id: totrans-15
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 多进程+分布式
- en: '`SequentialReadingService` can be used to combine both `ReadingServices` together
  to achieve multiprocessing and distributed training at the same time.'
+ id: totrans-16
 prefs: []
 type: TYPE_NORMAL
+ zh: '`SequentialReadingService`可用于将两个`ReadingServices`组合在一起,以同时实现多进程和分布式训练。'
- en: '[PRE3]'
+ id: totrans-17
 prefs: []
 type: TYPE_PRE
+ zh: '[PRE3]'
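With [PRE1] through [PRE3] elided in this extract, a sketch of combining both services might look like the following; `datapipe` is the graph built above and the worker count is an arbitrary choice:

```python
from torchdata.dataloader2 import (
    DataLoader2,
    DistributedReadingService,
    MultiProcessingReadingService,
    SequentialReadingService,
)

mp_rs = MultiProcessingReadingService(num_workers=4)  # arbitrary worker count
dist_rs = DistributedReadingService()
rs = SequentialReadingService(dist_rs, mp_rs)  # distributed sharding, then per-node workers
dl = DataLoader2(datapipe, reading_service=rs)
for epoch in range(2):
    for batch in dl:
        ...  # training step
dl.shutdown()
```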
diff --git a/totrans/data07_10.yaml b/totrans/data07_10.yaml
index 7fc445c0fcd4f573e596d0b90f6053bcf9d0ea1b..83dde78b42dc259dfa0b39566978fac993acf5fb 100644
--- a/totrans/data07_10.yaml
+++ b/totrans/data07_10.yaml
@@ -1,113 +1,159 @@
- en: Examples
+ id: totrans-0
 prefs:
 - PREF_H1
 type: TYPE_NORMAL
+ zh: 示例
- en: 原文:[https://pytorch.org/data/beta/examples.html](https://pytorch.org/data/beta/examples.html)
+ id: totrans-1
 prefs:
 - PREF_BQ
 type: TYPE_NORMAL
+ zh: 原文:https://pytorch.org/data/beta/examples.html
- en: In this section, you will find the data loading implementations (using DataPipes)
  of various popular datasets across different research domains. Some of the examples
  are implemented by the PyTorch team, and the implementation code is maintained
  within PyTorch libraries. Others are created by members of the PyTorch community.
+ id: totrans-2
 prefs: []
 type: TYPE_NORMAL
+ zh: 在本节中,您将找到不同研究领域中各种流行数据集的数据加载实现(使用DataPipes)。一些示例是由PyTorch团队实现的,实现代码在PyTorch库中维护。其他是由PyTorch社区成员创建的。
- en: Audio[](#audio "Permalink to this heading")
+ id: totrans-3
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 音频[](#audio "此标题的永久链接")
- en: LibriSpeech[](#librispeech "Permalink to this heading")
+ id: totrans-4
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: LibriSpeech[](#librispeech "此标题的永久链接")
- en: '[LibriSpeech dataset](https://www.openslr.org/12/) is a corpus of approximately
  1000 hours of 16kHz read English speech. Here is the [DataPipe implementation of
  LibriSpeech](https://github.com/pytorch/data/blob/main/examples/audio/librispeech.py)
  to load the data.'
+ id: totrans-5
 prefs: []
 type: TYPE_NORMAL
+ zh: LibriSpeech数据集是大约1000小时的16kHz英语朗读语音语料库。这里是加载数据的LibriSpeech的DataPipe实现。
- en: Text[](#text "Permalink to this heading")
+ id: totrans-6
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 文本[](#text "此标题的永久链接")
- en: Amazon Review Polarity[](#amazon-review-polarity "Permalink to this heading")
+ id: totrans-7
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: 亚马逊评论极性[](#amazon-review-polarity "此标题的永久链接")
- en: The Amazon reviews dataset contains reviews from Amazon. Its purpose is to train
  text/sentiment classification models. In our DataPipe [implementation of the dataset](https://github.com/pytorch/data/blob/main/examples/text/amazonreviewpolarity.py),
  we described every step with detailed comments to help you understand what each
  DataPipe is doing. We recommend having a look at this example.
+ id: totrans-8
 prefs: []
 type: TYPE_NORMAL
+ zh: 亚马逊评论数据集包含来自亚马逊的评论。其目的是训练文本/情感分类模型。在我们的DataPipe数据集实现中,我们用详细的注释描述了每个步骤,以帮助您了解每个DataPipe正在做什么。我们建议您查看这个例子。
- en: IMDB[](#imdb "Permalink to this heading")
+ id: totrans-9
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: IMDB[](#imdb "此标题的永久链接")
- en: This is a [large movie review dataset](http://ai.stanford.edu/~amaas/data/sentiment/)
  for binary sentiment classification containing 25,000 highly polar movie reviews
  for training and 25,000 for testing. Here is the [DataPipe implementation to load
  the data](https://github.com/pytorch/data/blob/main/examples/text/imdb.py).
+ id: totrans-10
 prefs: []
 type: TYPE_NORMAL
+ zh: 这是一个用于二元情感分类的大型电影评论数据集,包含25000条高度极性的电影评论用于训练和25000条用于测试。这里是加载数据的DataPipe实现。
- en: SQuAD[](#squad "Permalink to this heading")
+ id: totrans-11
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: SQuAD[](#squad "此标题的永久链接")
- en: '[SQuAD (Stanford Question Answering Dataset)](https://rajpurkar.github.io/SQuAD-explorer/)
  is a dataset for reading comprehension. It consists of a list of questions posed
  by crowdworkers on a set of Wikipedia articles. Here are the DataPipe implementations
  for [version 1.1](https://github.com/pytorch/data/blob/main/examples/text/squad1.py)
  and [version 2.0](https://github.com/pytorch/data/blob/main/examples/text/squad2.py).'
+ id: totrans-12
 prefs: []
 type: TYPE_NORMAL
+ zh: SQuAD(斯坦福问答数据集)是一个用于阅读理解的数据集。它由一组维基百科文章上的众包工作者提出的问题列表组成。这里是版本1.1的DataPipe实现和版本2.0的DataPipe实现。
- en: Additional Datasets in TorchText[](#additional-datasets-in-torchtext "Permalink
  to this heading")
+ id: totrans-13
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: TorchText中的其他数据集[](#additional-datasets-in-torchtext "此标题的永久链接")
- en: In a separate PyTorch domain library [TorchText](https://github.com/pytorch/text),
  you will find some of the most popular datasets in the NLP field implemented as
  loadable datasets using DataPipes. You can find all of those [NLP datasets here](https://github.com/pytorch/text/tree/main/torchtext/datasets).
+ id: totrans-14
 prefs: []
 type: TYPE_NORMAL
+ zh: 在一个独立的PyTorch领域库TorchText中,您将找到一些最受欢迎的NLP领域数据集,这些数据集被实现为可使用DataPipes加载的数据集。您可以在这里找到所有这些NLP数据集。
- en: Vision[](#vision "Permalink to this heading")
+ id: totrans-15
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 视觉[](#vision "此标题的永久链接")
- en: Caltech 101[](#caltech-101 "Permalink to this heading")
+ id: totrans-16
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: Caltech 101[](#caltech-101 "此标题的永久链接")
- en: The [Caltech 101 dataset](https://data.caltech.edu/records/20086) contains pictures
  of objects belonging to 101 categories. Here is the [DataPipe implementation of
  Caltech 101](https://github.com/pytorch/data/blob/main/examples/vision/caltech101.py).
+ id: totrans-17
 prefs: []
 type: TYPE_NORMAL
+ zh: Caltech 101数据集包含属于101个类别的对象的图片。这里是Caltech 101的DataPipe实现。
- en: Caltech 256[](#caltech-256 "Permalink to this heading")
+ id: totrans-18
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: Caltech 256[](#caltech-256 "此标题的永久链接")
- en: The [Caltech 256 dataset](https://data.caltech.edu/records/20087) contains 30607
  images from 256 categories. Here is the [DataPipe implementation of Caltech 256](https://github.com/pytorch/data/blob/main/examples/vision/caltech256.py).
+ id: totrans-19
 prefs: []
 type: TYPE_NORMAL
+ zh: Caltech 256数据集包含来自256个类别的30607张图片。这里是Caltech 256的DataPipe实现。
- en: CamVid - Semantic Segmentation (community example)[](#camvid-semantic-segmentation-community-example
  "Permalink to this heading")
+ id: totrans-20
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: CamVid - 语义分割(社区示例)[](#camvid-semantic-segmentation-community-example "此标题的永久链接")
- en: The [Cambridge-driving Labeled Video Database (CamVid)](http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/)
  is a collection of videos with object class semantic labels, complete with metadata.
  The database provides ground truth labels that associate each pixel with one of
  32 semantic classes. Here is a [DataPipe implementation of CamVid](https://github.com/tcapelle/torchdata/blob/main/01_Camvid_segmentation_with_datapipes.ipynb)
  created by our community.
+ id: totrans-21
 prefs: []
 type: TYPE_NORMAL
+ zh: 剑桥驾驶标记视频数据库(CamVid)是一个带有对象类语义标签的视频集合,附带元数据。该数据库提供了将每个像素与32个语义类别之一关联的地面实况标签。这里是我们社区创建的CamVid的DataPipe实现。
- en: laion2B-en-joined[](#laion2b-en-joined "Permalink to this heading")
+ id: totrans-22
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: laion2B-en-joined[](#laion2b-en-joined "此标题的永久链接")
- en: The [laion2B-en-joined dataset](https://huggingface.co/datasets/laion/laion2B-en-joined)
  is a subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) containing
  English captions, URLs pointing to images, and other metadata. It contains around
@@ -115,112 +161,155 @@
  to valid images. Here is a [DataPipe implementation of laion2B-en-joined](https://github.com/pytorch/data/blob/main/examples/vision/laion5b.py)
  that filters out unsafe images and images with watermarks and loads the images
  from the URLs.
+ id: totrans-23
 prefs: []
 type: TYPE_NORMAL
+ zh: '[laion2B-en-joined数据集](https://huggingface.co/datasets/laion/laion2B-en-joined)是[LAION-5B数据集](https://laion.ai/blog/laion-5b/)的一个子集,包含英文标题、指向图像的URL以及其他元数据。它包含大约23.2亿条目。目前(2023年2月)大约86%的URL仍指向有效图像。这里有一个[laion2B-en-joined的DataPipe实现](https://github.com/pytorch/data/blob/main/examples/vision/laion5b.py),它会过滤掉不安全的图像和带有水印的图像,并从URL加载图像。'
- en: Additional Datasets in TorchVision[](#additional-datasets-in-torchvision "Permalink
  to this heading")
+ id: totrans-24
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: TorchVision中的其他数据集
- en: In a separate PyTorch domain library [TorchVision](https://github.com/pytorch/vision),
  you will find some of the most popular datasets in the computer vision field implemented
  as loadable datasets using DataPipes. You can find all of those [vision datasets
  here](https://github.com/pytorch/vision/tree/main/torchvision/prototype/datasets/_builtin).
+ id: totrans-25
 prefs: []
 type: TYPE_NORMAL
+ zh: 在单独的PyTorch领域库[TorchVision](https://github.com/pytorch/vision)中,您将找到一些最受欢迎的计算机视觉领域数据集,这些数据集被实现为可加载的数据集,使用DataPipes。您可以在[这里找到所有这些视觉数据集](https://github.com/pytorch/vision/tree/main/torchvision/prototype/datasets/_builtin)。
- en: Note that these implementations are currently in the prototype phase, but they
  should be fully supported in the coming months. Nonetheless, they demonstrate the
  different ways DataPipes can be used for data loading.
+ id: totrans-26
 prefs: []
 type: TYPE_NORMAL
+ zh: 请注意,这些实现目前处于原型阶段,但它们应该在未来几个月内得到充分支持。尽管如此,它们展示了DataPipes可以用于数据加载的不同方式。
- en: Recommender System[](#recommender-system "Permalink to this heading")
+ id: totrans-27
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 推荐系统
- en: Criteo 1TB Click Logs[](#criteo-1tb-click-logs "Permalink to this heading")
+ id: totrans-28
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: Criteo 1TB点击日志
- en: The [Criteo dataset](https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset)
  contains feature values and click feedback for millions of display advertisements.
  It aims to benchmark algorithms for click-through rate (CTR) prediction. You can
  find a prototype-stage implementation of the [dataset with DataPipes in TorchRec](https://github.com/pytorch/torchrec/blob/main/torchrec/datasets/criteo.py).
+ id: totrans-29
 prefs: []
 type: TYPE_NORMAL
+ zh: '[Criteo数据集](https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset)包含数百万个展示广告的特征值和点击反馈。它旨在为点击率(CTR)预测的算法提供基准。您可以在[TorchRec中使用DataPipes实现数据集的原型阶段](https://github.com/pytorch/torchrec/blob/main/torchrec/datasets/criteo.py)。'
- en: Graphs, Meshes and Point Clouds[](#graphs-meshes-and-point-clouds "Permalink
  to this heading")
+ id: totrans-30
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 图、网格和点云
- en: TigerGraph (community example)[](#tigergraph-community-example "Permalink to
  this heading")
+ id: totrans-31
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: TigerGraph(社区示例)
- en: TigerGraph is a scalable graph data platform for AI and ML. You can find an
  [implementation](https://github.com/TigerGraph-DevLabs/torchdata_tutorial/blob/main/torchdata_example.ipynb)
  of graph feature engineering and machine learning with DataPipes in TorchData and
  data stored in a TigerGraph database, which includes computing PageRank scores
  in-database, pulling graph data and features with multiple DataPipes, and training
  a neural network using graph features in PyTorch.
+ id: totrans-32
 prefs: []
 type: TYPE_NORMAL
+ zh: TigerGraph是一个可扩展的用于AI和ML的图数据平台。您可以在[TorchData中使用DataPipes实现图特征工程和机器学习](https://github.com/TigerGraph-DevLabs/torchdata_tutorial/blob/main/torchdata_example.ipynb),数据存储在TigerGraph数据库中,其中包括在数据库中计算PageRank分数,使用多个DataPipes提取图数据和特征,以及使用PyTorch中的图特征训练神经网络。
- en: MoleculeNet (community example)[](#moleculenet-community-example "Permalink
  to this heading")
+ id: totrans-33
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: MoleculeNet(社区示例)
- en: '[MoleculeNet](https://moleculenet.org/) is a benchmark specially designed for
  testing machine learning methods on molecular properties. You can find an implementation
  of the [HIV dataset with DataPipes in PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/datapipe.py),
  which includes converting SMILES strings into molecular graph representations.'
+ id: totrans-34
 prefs: []
 type: TYPE_NORMAL
+ zh: MoleculeNet是专门设计用于测试分子属性机器学习方法的基准。您可以在[PyTorch Geometric中使用DataPipes实现HIV数据集](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/datapipe.py),其中包括将SMILES字符串转换为分子图表示。
- en: Princeton ModelNet (community example)[](#princeton-modelnet-community-example
  "Permalink to this heading")
+ id: totrans-35
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: 普林斯顿ModelNet(社区示例)
- en: The Princeton ModelNet project provides a comprehensive and clean collection
  of 3D CAD models across various object types. You can find an implementation of
  the [ModelNet10 dataset with DataPipes in PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/datapipe.py),
  which includes reading in meshes via [meshio](https://github.com/nschloe/meshio),
  and sampling of points from object surfaces and dynamic graph generation via [PyG’s
  functional transformations](https://pytorch-geometric.readthedocs.io/en/latest/modules/transforms.html).
+ id: totrans-36
 prefs: []
 type: TYPE_NORMAL
+ zh: 普林斯顿ModelNet项目提供了各种对象类型的全面且干净的3D CAD模型集合。您可以在[PyTorch Geometric中使用DataPipes实现ModelNet10数据集](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/datapipe.py),其中包括通过[meshio](https://github.com/nschloe/meshio)读取网格,从对象表面采样点以及通过[PyG的功能转换](https://pytorch-geometric.readthedocs.io/en/latest/modules/transforms.html)生成动态图。
- en: Timeseries[](#timeseries "Permalink to this heading")
+ id: totrans-37
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 时间序列
- en: Custom DataPipe for Timeseries rolling window (community example)[](#custom-datapipe-for-timeseries-rolling-window-community-example
  "Permalink to this heading")
+ id: totrans-38
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: 用于时间序列滚动窗口的自定义DataPipe(社区示例)
- en: This community example implements a rolling-window custom DataPipe for timeseries
  forecasting tasks. Here is the [DataPipe implementation of a rolling window](https://github.com/tcapelle/torchdata/blob/main/02_Custom_timeseries_datapipe.ipynb).
+ id: totrans-39
 prefs: []
 type: TYPE_NORMAL
+ zh: 为时间序列预测任务实现滚动窗口自定义DataPipe。这里是滚动窗口的DataPipe实现。
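As a rough, self-contained sketch of the idea behind that notebook (the `rolling` functional name and the windowing logic here are illustrative, not the notebook's exact code):

```python
from torchdata.datapipes import functional_datapipe
from torchdata.datapipes.iter import IterableWrapper, IterDataPipe


@functional_datapipe("rolling")  # hypothetical name for this sketch
class RollingWindowIterDataPipe(IterDataPipe):
    def __init__(self, source_datapipe, window_size, step=1):
        self.source_datapipe = source_datapipe
        self.window_size = window_size
        self.step = step

    def __iter__(self):
        buffer = []
        for x in self.source_datapipe:
            buffer.append(x)
            if len(buffer) == self.window_size:
                yield list(buffer)          # emit a full window
                buffer = buffer[self.step:]  # slide forward by `step`


dp = IterableWrapper(range(6)).rolling(window_size=3, step=1)
print(list(dp))  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```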
- en: Using AIStore[](#using-aistore "Permalink to this heading")
+ id: totrans-40
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: 使用AIStore
- en: Caltech 256 and Microsoft COCO (community example)[](#caltech-256-and-microsoft-coco-community-example
  "Permalink to this heading")
+ id: totrans-41
 prefs:
 - PREF_H3
 type: TYPE_NORMAL
+ zh: Caltech 256和Microsoft COCO(社区示例)
- en: Listing and loading data from AIS buckets (buckets that are not 3rd-party backend-based)
  and remote cloud buckets (3rd-party backend-based cloud buckets) using [AISFileLister](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLister.html#aisfilelister)
  and [AISFileLoader](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#torchdata.datapipes.iter.AISFileLoader).
+ id: totrans-42
 prefs: []
 type: TYPE_NORMAL
+ zh: 从AIS存储桶(非第三方后端存储桶)和远程云存储桶(第三方后端云存储桶)中列出和加载数据,使用AISFileLister和AISFileLoader。
- en: Here is an [example that uses the AISIO DataPipe](https://github.com/pytorch/data/blob/main/examples/aistore/aisio_usage_example.ipynb)
  for the [Caltech-256 Object Category Dataset](https://data.caltech.edu/records/20087),
  which contains 256 object categories and a total of 30607 images stored on an AIS
  bucket, and the [Microsoft COCO Dataset](https://cocodataset.org/#home), which
  has 330K images with over 200K labels of more than 1.5 million object instances
  across 80 object categories stored on Google Cloud.
+ id: totrans-43
 prefs: []
 type: TYPE_NORMAL
+ zh: 这是一个示例,使用AISIO DataPipe处理Caltech-256对象类别数据集和Microsoft COCO数据集。Caltech-256数据集包含256个对象类别和30607张图像,存储在AIS存储桶中;而Microsoft
  COCO数据集包含330K张图像,涵盖80个对象类别的超过200K个标签和超过150万个对象实例,存储在Google Cloud上。
diff --git a/totrans/data07_12.yaml b/totrans/data07_12.yaml
index 31e81dd3826d52f36d7cee5789ddbac45739964d..48b5f3fe99c072f301d94d2e4e3eabbdd2a3b372 100644
--- a/totrans/data07_12.yaml
+++ b/totrans/data07_12.yaml
@@ -1,203 +1,318 @@
- en: TorchServe
+ id: totrans-0
 prefs:
 - PREF_H1
 type: TYPE_NORMAL
+ zh: TorchServe
- en: 原文:[https://pytorch.org/serve](https://pytorch.org/serve)
+ id: totrans-1
 prefs:
 - PREF_BQ
 type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/serve](https://pytorch.org/serve)
- en: TorchServe is a performant, flexible and easy-to-use tool for serving PyTorch
  models in production.
+ id: totrans-2
 prefs: []
 type: TYPE_NORMAL
+ zh: TorchServe是一个性能优越、灵活且易于使用的工具,用于在生产中提供PyTorch模型的服务。
- en: What’s going on in TorchServe?
+ id: totrans-3
 prefs: []
 type: TYPE_NORMAL
+ zh: TorchServe中发生了什么?
- en: '[High performance Llama 2 deployments with AWS Inferentia2 using TorchServe](https://pytorch.org/blog/high-performance-llama/)' + id: totrans-4 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用AWS Inferentia2和TorchServe进行高性能Llama 2部署](https://pytorch.org/blog/high-performance-llama/)' - en: '[Naver Case Study: Transition From High-Cost GPUs to Intel CPUs and oneAPI powered Software with performance](https://pytorch.org/blog/ml-model-server-resource-saving/)' + id: totrans-5 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Naver案例研究:从高成本GPU过渡到Intel CPU和使用性能的oneAPI软件](https://pytorch.org/blog/ml-model-server-resource-saving/)' - en: '[Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs](https://aws.amazon.com/blogs/machine-learning/run-multiple-generative-ai-models-on-gpu-using-amazon-sagemaker-multi-model-endpoints-with-torchserve-and-save-up-to-75-in-inference-costs/)' + id: totrans-6 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用Amazon SageMaker多模型端点在GPU上运行多个生成AI模型,并节省高达75%的推理成本](https://aws.amazon.com/blogs/machine-learning/run-multiple-generative-ai-models-on-gpu-using-amazon-sagemaker-multi-model-endpoints-with-torchserve-and-save-up-to-75-in-inference-costs/)' - en: '[Deploying your Generative AI model in only four steps with Vertex AI and PyTorch](https://cloud.google.com/blog/products/ai-machine-learning/get-your-genai-model-going-in-four-easy-steps)' + id: totrans-7 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用Vertex AI和PyTorch在四个步骤中部署您的生成AI模型](https://cloud.google.com/blog/products/ai-machine-learning/get-your-genai-model-going-in-four-easy-steps)' - en: '[PyTorch Model Serving on Google Cloud TPUv5](https://cloud.google.com/tpu/docs/v5e-inference#pytorch-model-inference-and-serving)' + id: totrans-8 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[在Google Cloud TPUv5上提供PyTorch模型](https://cloud.google.com/tpu/docs/v5e-inference#pytorch-model-inference-and-serving)' - en: '[Monitoring using Datadog](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)' + id: totrans-9 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用Datadog进行监控](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)' - en: '[Torchserve Performance Tuning, Animated Drawings Case-Study](https://pytorch.org/blog/torchserve-performance-tuning/)' + id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[TorchServe性能调优,动画绘图案例研究](https://pytorch.org/blog/torchserve-performance-tuning/)' - en: '[Walmart Search: Serving Models at a Scale on TorchServe](https://medium.com/walmartglobaltech/search-model-serving-using-pytorch-and-torchserve-6caf9d1c5f4d)' + id: totrans-11 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Walmart搜索:在TorchServe上规模化提供模型](https://medium.com/walmartglobaltech/search-model-serving-using-pytorch-and-torchserve-6caf9d1c5f4d)' - en: '[Scaling inference on CPU with TorchServe](https://www.youtube.com/watch?v=066_Jd6cwZg)' + id: totrans-12 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用TorchServe在CPU上扩展推理](https://www.youtube.com/watch?v=066_Jd6cwZg)' - en: '[TorchServe C++ backend](https://www.youtube.com/watch?v=OSmGGDpaesc)' + id: totrans-13 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[TorchServe C++后端](https://www.youtube.com/watch?v=OSmGGDpaesc)' - en: '[Grokking Intel CPU PyTorch performance from first principles: a TorchServe case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex.html)' + 
id: totrans-14 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[从第一原则开始理解Intel CPU PyTorch性能:一个TorchServe案例研究](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex.html)' - en: '[Grokking Intel CPU PyTorch performance from first principles( Part 2): a TorchServe case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2.html)' + id: totrans-15 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Grokking Intel CPU PyTorch性能的第一原则(第2部分):一个TorchServe案例研究](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2.html)' - en: '[Case Study: Amazon Ads Uses PyTorch and AWS Inferentia to Scale Models for Ads Processing](https://pytorch.org/blog/amazon-ads-case-study/)' + id: totrans-16 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[案例研究:亚马逊广告使用PyTorch和AWS Inferentia扩展广告处理模型](https://pytorch.org/blog/amazon-ads-case-study/)' - en: '[Optimize your inference jobs using dynamic batch inference with TorchServe on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/optimize-your-inference-jobs-using-dynamic-batch-inference-with-torchserve-on-amazon-sagemaker/)' + id: totrans-17 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用TorchServe在Amazon SageMaker上进行动态批量推理优化](https://aws.amazon.com/blogs/machine-learning/optimize-your-inference-jobs-using-dynamic-batch-inference-with-torchserve-on-amazon-sagemaker/)' - en: '[Using AI to bring children’s drawings to life](https://ai.facebook.com/blog/using-ai-to-bring-childrens-drawings-to-life/)' + id: totrans-18 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[使用AI让儿童的绘画栩栩如生](https://ai.facebook.com/blog/using-ai-to-bring-childrens-drawings-to-life/)' - en: '[Model Serving in PyTorch](https://www.youtube.com/watch?v=2A17ZtycsPw)' + id: totrans-19 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[PyTorch中的模型服务](https://www.youtube.com/watch?v=2A17ZtycsPw)' - en: '[Evolution of Cresta’s machine learning architecture: Migration to AWS and PyTorch](https://aws.amazon.com/blogs/machine-learning/evolution-of-crestas-machine-learning-architecture-migration-to-aws-and-pytorch/)' + id: totrans-20 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[Cresta机器学习架构的演变:迁移到AWS和PyTorch](https://aws.amazon.com/blogs/machine-learning/evolution-of-crestas-machine-learning-architecture-migration-to-aws-and-pytorch/)' - en: '[Explain Like I’m 5: TorchServe](https://www.youtube.com/watch?v=NEdZbkfHQCk)' + id: totrans-21 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[像我5岁一样解释:TorchServe](https://www.youtube.com/watch?v=NEdZbkfHQCk)' - en: '[How to Serve PyTorch Models with TorchServe](https://www.youtube.com/watch?v=XlO7iQMV3Ik)' + id: totrans-22 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[如何使用TorchServe为PyTorch模型提供服务](https://www.youtube.com/watch?v=XlO7iQMV3Ik)' - en: '[How to deploy PyTorch models on Vertex AI](https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai)' + id: totrans-23 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[如何在Vertex AI上部署PyTorch模型](https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai)' - en: '[Quantitative Comparison of Serving Platforms](https://biano-ai.github.io/research/2021/08/16/quantitative-comparison-of-serving-platforms-for-neural-networks.html)' + id: totrans-24 prefs: - PREF_UL type: TYPE_NORMAL + zh: '[服务平台的定量比较](https://biano-ai.github.io/research/2021/08/16/quantitative-comparison-of-serving-platforms-for-neural-networks.html)' - en: All + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: 全部 - en: '* * *' + id: totrans-26 
 prefs: []
 type: TYPE_NORMAL
+ zh: '* * *'
- en: '[#### TorchServe Quick Start'
+ id: totrans-27
 prefs: []
 type: TYPE_NORMAL
+ zh: '[#### TorchServe快速入门'
- en: 'Topics: Quick Start'
+ id: totrans-28
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:快速入门
- en: Learn how to install TorchServe and serve models.
+ id: totrans-29
 prefs: []
 type: TYPE_NORMAL
+ zh: 学习如何安装TorchServe并提供模型服务。
- en: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](getting_started.html)
  [#### Running TorchServe'
+ id: totrans-30
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](getting_started.html)
  [#### 运行TorchServe'
- en: 'Topics: Running TorchServe'
+ id: totrans-31
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:运行TorchServe
- en: In-depth explanation of how to run TorchServe
+ id: totrans-32
 prefs: []
 type: TYPE_NORMAL
+ zh: 深入解释如何运行TorchServe
- en: '![](../Images/661e92286b91a04a664aa0dd434223f4.png)](server.html) [#### Why
  TorchServe'
+ id: totrans-33
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/661e92286b91a04a664aa0dd434223f4.png)](server.html) [#### 为什么选择TorchServe'
- en: 'Topics: Examples'
+ id: totrans-34
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:示例
- en: Various TorchServe use cases
+ id: totrans-35
 prefs: []
 type: TYPE_NORMAL
+ zh: 各种TorchServe用例
- en: '![](../Images/0507eb3112fdbfd24e3e2ba13aa3e3fa.png)](use_cases.html) [####
  Performance'
+ id: totrans-36
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/0507eb3112fdbfd24e3e2ba13aa3e3fa.png)](use_cases.html) [####
  性能'
- en: 'Topics: Performance,Troubleshooting'
+ id: totrans-37
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:性能,故障排除
- en: Guides and best practices on how to improve performance when working with TorchServe
+ id: totrans-38
 prefs: []
 type: TYPE_NORMAL
+ zh: 指南和最佳实践,以提高在使用TorchServe时的性能
- en: '![](../Images/a115bf3860d7637d64025cdabc4de95b.png)](performance_guide.html)
  [#### Metrics'
+ id: totrans-39
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/a115bf3860d7637d64025cdabc4de95b.png)](performance_guide.html)
  [#### 指标'
- en: 'Topics: Metrics,Performance,Troubleshooting'
+ id: totrans-40
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:指标,性能,故障排除
- en: Collecting and viewing TorchServe metrics
+ id: totrans-41
 prefs: []
 type: TYPE_NORMAL
+ zh: 收集和查看TorchServe指标
- en: '![](../Images/eab661f8c4941205ffdc566aced9bccf.png)](metrics.html) [#### Large
  Model Inference'
+ id: totrans-42
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/eab661f8c4941205ffdc566aced9bccf.png)](metrics.html) [#### 大型模型推理'
- en: 'Topics: Large-Models,Performance'
+ id: totrans-43
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:大型模型,性能
- en: Serving Large Models with TorchServe
+ id: totrans-44
 prefs: []
 type: TYPE_NORMAL
+ zh: 使用TorchServe为大型模型提供服务
- en: '![](../Images/f6afe69d86ffcf863cd832ed3698732f.png)](large_model_inference.html)
  [#### Troubleshooting'
+ id: totrans-45
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/f6afe69d86ffcf863cd832ed3698732f.png)](large_model_inference.html)
  [#### 故障排除'
- en: 'Topics: Troubleshooting,Performance'
+ id: totrans-46
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:故障排除,性能
- en: Various updates on TorchServe and use cases.
+ id: totrans-47
 prefs: []
 type: TYPE_NORMAL
+ zh: TorchServe的各种更新和用例。
- en: '![](../Images/d23903f23b5705cc9f1d9bdca6ce6bbb.png)](Troubleshooting.html)
  [#### TorchServe Security Policy'
+ id: totrans-48
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/d23903f23b5705cc9f1d9bdca6ce6bbb.png)](Troubleshooting.html)
  [#### TorchServe安全策略'
- en: 'Topics: Security'
+ id: totrans-49
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:安全
- en: Security Policy
+ id: totrans-50
 prefs: []
 type: TYPE_NORMAL
+ zh: 安全策略
- en: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](security.html) [#### FAQs'
+ id: totrans-51
 prefs: []
 type: TYPE_NORMAL
+ zh: '![](../Images/2e44a4dab4c1bd5cde13eaa681343e78.png)](security.html) [#### 常见问题'
- en: 'Topics: FAQS'
+ id: totrans-52
 prefs: []
 type: TYPE_NORMAL
+ zh: 主题:常见问题
- en: Various frequently asked questions.
+ id: totrans-53
 prefs: []
 type: TYPE_NORMAL
+ zh: 各种常见问题。
- en: '![](../Images/7ccfac0b40fe2fac42582244489f0da4.png)](FAQs.html)'
+ id: totrans-54
 prefs: []
 type: TYPE_IMG
+ zh: '![](../Images/7ccfac0b40fe2fac42582244489f0da4.png)](FAQs.html)'
diff --git a/totrans/rec_00.yaml b/totrans/rec_00.yaml
index ef03bca71a99049dcbe1d6a03a6247ec25381220..ca1746e48edd4cccd18fb2f5decbb1665e81f07e 100644
--- a/totrans/rec_00.yaml
+++ b/totrans/rec_00.yaml
@@ -1,7 +1,11 @@
- en: TorchRec Doc
+ id: totrans-0
 prefs:
 - PREF_H1
 type: TYPE_NORMAL
+ zh: TorchRec 文档
- en: 来源:[https://pytorch.org/torchrec/](https://pytorch.org/torchrec/)
+ id: totrans-1
 prefs: []
 type: TYPE_NORMAL
+ zh: 来源:[https://pytorch.org/torchrec/](https://pytorch.org/torchrec/)
diff --git a/totrans/rec_01.yaml b/totrans/rec_01.yaml
index 83331aa70fe3e64c6818de38514ce1c4ac3abbea..e4882ab41296adfb5b1078eaf84ed632bf4a4cc1 100644
--- a/totrans/rec_01.yaml
+++ b/totrans/rec_01.yaml
@@ -1,4 +1,6 @@
- en: 'Contents:'
+ id: totrans-0
 prefs:
 - PREF_H1
 type: TYPE_NORMAL
+ zh: '目录:'
diff --git a/totrans/rec_02.yaml b/totrans/rec_02.yaml
index 6ebb32f643e4d65d24298e2795b93a28b5c47be1..d1ff2d760d7c22f5db2fa859383f3d2f5ccf3fc3 100644
--- a/totrans/rec_02.yaml
+++ b/totrans/rec_02.yaml
@@ -1,25 +1,37 @@
- en: torchrec.datasets
+ id: totrans-0
 prefs:
 - PREF_H1
 type: TYPE_NORMAL
+ zh: torchrec.datasets
- en: 原文:[https://pytorch.org/torchrec/torchrec.datasets.html](https://pytorch.org/torchrec/torchrec.datasets.html)
+ id: totrans-1
 prefs:
 - PREF_BQ
 type: TYPE_NORMAL
+ zh: 原文:[https://pytorch.org/torchrec/torchrec.datasets.html](https://pytorch.org/torchrec/torchrec.datasets.html)
- en: torchrec.datasets.criteo[](#torchrec-datasets-criteo "Permalink to this heading")
+ id: totrans-2
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: torchrec.datasets.criteo[](#torchrec-datasets-criteo "此标题的永久链接")
- en: torchrec.datasets.movielens[](#torchrec-datasets-movielens "Permalink to this
  heading")
+ id: totrans-3
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: torchrec.datasets.movielens[](#torchrec-datasets-movielens "此标题的永久链接")
- en: torchrec.datasets.random[](#torchrec-datasets-random "Permalink to this heading")
+ id: totrans-4
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: torchrec.datasets.random[](#torchrec-datasets-random "此标题的永久链接")
- en: torchrec.datasets.utils[](#torchrec-datasets-utils "Permalink to this heading")
+ id: totrans-5
 prefs:
 - PREF_H2
 type: TYPE_NORMAL
+ zh: torchrec.datasets.utils[](#torchrec-datasets-utils "此标题的永久链接")
diff --git a/totrans/rec_03.yaml b/totrans/rec_03.yaml
index cef7dddab5250adf7597895659996894183fdb60..6e49a518ff0d58c85cab85e1aacfbef32956eda3 100644
--- a/totrans/rec_03.yaml
+++ b/totrans/rec_03.yaml
@@ -1,18 +1,28 @@
- 
en: torchrec.datasets.scripts + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: torchrec.datasets.scripts - en: 原文:[https://pytorch.org/torchrec/torchrec.datasets.scripts.html](https://pytorch.org/torchrec/torchrec.datasets.scripts.html) + id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL + zh: '[https://pytorch.org/torchrec/torchrec.datasets.scripts.html](https://pytorch.org/torchrec/torchrec.datasets.scripts.html)' - en: torchrec.datasets.scripts.contiguous_preproc_criteo[](#torchrec-datasets-scripts-contiguous-preproc-criteo "Permalink to this heading") + id: totrans-2 prefs: - PREF_H2 type: TYPE_NORMAL + zh: torchrec.datasets.scripts.contiguous_preproc_criteo[](#torchrec-datasets-scripts-contiguous-preproc-criteo + "跳转到此标题") - en: torchrec.datasets.scripts.npy_preproc_criteo[](#torchrec-datasets-scripts-npy-preproc-criteo "Permalink to this heading") + id: totrans-3 prefs: - PREF_H2 type: TYPE_NORMAL + zh: torchrec.datasets.scripts.npy_preproc_criteo[](#torchrec-datasets-scripts-npy-preproc-criteo + "跳转到此标题")