- en: Speech Recognition with Wav2Vec2
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 使用Wav2Vec2进行语音识别
- en: 原文:[https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html](https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html](https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html)
- en: Note
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-speech-recognition-pipeline-tutorial-py)
to download the full example code
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: 点击[这里](#sphx-glr-download-tutorials-speech-recognition-pipeline-tutorial-py)下载完整示例代码
- en: '**Author**: [Moto Hira](mailto:moto%40meta.com)'
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: '**作者**:[Moto Hira](mailto:moto%40meta.com)'
- en: This tutorial shows how to perform speech recognition using pre-trained
models from wav2vec 2.0 [[paper](https://arxiv.org/abs/2006.11477)].
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 本教程展示了如何使用来自wav2vec 2.0的预训练模型执行语音识别[[论文](https://arxiv.org/abs/2006.11477)]。
- en: Overview[](#overview "Permalink to this heading")
id: totrans-6
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 概述[](#overview "跳转到此标题")
- en: The process of speech recognition looks like the following.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 语音识别的过程如下所示。
- en: Extract the acoustic features from audio waveform
id: totrans-8
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 从音频波形中提取声学特征
- en: Estimate the class of the acoustic features frame-by-frame
id: totrans-9
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 逐帧估计声学特征的类别
- en: Generate hypothesis from the sequence of the class probabilities
id: totrans-10
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 从类概率序列生成假设
- en: Torchaudio provides easy access to the pre-trained weights and associated information,
such as the expected sample rate and class labels. They are bundled together and
available under [`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines
"torchaudio.pipelines") module.
id: totrans-11
prefs: []
type: TYPE_NORMAL
zh: Torchaudio提供了对预训练权重和相关信息的简单访问,例如预期的采样率和类标签。它们被捆绑在一起,并在[`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines
"torchaudio.pipelines")模块下提供。
- en: Preparation[](#preparation "Permalink to this heading")
id: totrans-12
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 准备[](#preparation "跳转到此标题")
- en: '[PRE0]'
id: totrans-13
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: '[PRE1]'
id: totrans-14
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
- en: '[PRE2]'
id: totrans-15
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: '[PRE3]'
id: totrans-16
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: Creating a pipeline[](#creating-a-pipeline "Permalink to this heading")
id: totrans-17
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 创建管道[](#creating-a-pipeline "跳转到此标题")
- en: First, we will create a Wav2Vec2 model that performs the feature extraction
and the classification.
id: totrans-18
prefs: []
type: TYPE_NORMAL
zh: 首先,我们将创建一个执行特征提取和分类的Wav2Vec2模型。
- en: 'There are two types of Wav2Vec2 pre-trained weights available in torchaudio:
ones fine-tuned for the ASR task, and ones that are not fine-tuned.'
id: totrans-19
prefs: []
type: TYPE_NORMAL
zh: torchaudio中有两种类型的Wav2Vec2预训练权重。一种是为ASR任务微调的,另一种是未经微调的。
- en: Wav2Vec2 (and HuBERT) models are trained in a self-supervised manner. They
are first trained on audio alone for representation learning, and then fine-tuned
for a specific task with additional labels.
id: totrans-20
prefs: []
type: TYPE_NORMAL
zh: Wav2Vec2(和HuBERT)模型以自监督方式进行训练。它们首先仅使用音频进行表示学习的训练,然后再使用附加标签进行特定任务的微调。
- en: The pre-trained weights without fine-tuning can be fine-tuned for other downstream
tasks as well, but this tutorial does not cover that.
id: totrans-21
prefs: []
type: TYPE_NORMAL
zh: 未经微调的预训练权重也可以用于其他下游任务的微调,但本教程不涵盖此内容。
- en: We will use [`torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
"torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H") here.
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 我们将在这里使用[`torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
"torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H")。
- en: There are multiple pre-trained models available in [`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines
"torchaudio.pipelines"). Please check the documentation for the detail of how
they are trained.
id: totrans-23
prefs: []
type: TYPE_NORMAL
zh: '[`torchaudio.pipelines`](../pipelines.html#module-torchaudio.pipelines "torchaudio.pipelines")中有多个预训练模型可用。请查看文档以了解它们的训练方式的详细信息。'
- en: The bundle object provides the interface to instantiate the model and other
information. The sampling rate and the class labels are found as follows.
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: bundle对象提供了实例化模型和其他信息的接口。采样率和类标签如下所示。
- en: '[PRE4]'
id: totrans-25
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: '[PRE5]'
id: totrans-26
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
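For reference, a minimal sketch of querying the bundle could look like the following (assuming a recent torchaudio):

```python
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H

print("Sample rate:", bundle.sample_rate)  # expected input rate, 16000 Hz
print("Labels:", bundle.get_labels())      # character set, including the CTC blank "-"
```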
- en: The model can be constructed as follows. This process will automatically fetch
the pre-trained weights and load them into the model.
id: totrans-27
prefs: []
type: TYPE_NORMAL
zh: 模型可以按以下方式构建。此过程将自动获取预训练权重并将其加载到模型中。
- en: '[PRE6]'
id: totrans-28
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: '[PRE7]'
id: totrans-29
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
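A sketch of the construction step, reusing `bundle` from the previous sketch and a standard device check:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# get_model() downloads and caches the pre-trained weights on first use
model = bundle.get_model().to(device)
print(model.__class__)  # torchaudio.models.Wav2Vec2Model
```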
- en: Loading data[](#loading-data "Permalink to this heading")
id: totrans-30
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 加载数据[](#loading-data "跳转到此标题")
- en: We will use the speech data from [VOiCES dataset](https://iqtlabs.github.io/voices/),
which is licensed under Creative Commons BY 4.0.
id: totrans-31
prefs: []
type: TYPE_NORMAL
zh: 我们将使用[VOiCES数据集](https://iqtlabs.github.io/voices/)中的语音数据,该数据集在Creative Commons
BY 4.0下许可。
- en: '[PRE8]'
id: totrans-32
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
- en: null
id: totrans-33
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-34
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: To load data, we use [`torchaudio.load()`](../generated/torchaudio.load.html#torchaudio.load
"torchaudio.load").
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: 为了加载数据,我们使用[`torchaudio.load()`](../generated/torchaudio.load.html#torchaudio.load
"torchaudio.load")。
- en: If the sampling rate is different from what the pipeline expects, then we can
use [`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
"torchaudio.functional.resample") for resampling.
id: totrans-36
prefs: []
type: TYPE_NORMAL
zh: 如果采样率与管道期望的不同,则可以使用[`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
"torchaudio.functional.resample")进行重采样。
- en: Note
id: totrans-37
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: '[`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
"torchaudio.functional.resample") works on CUDA tensors as well.'
id: totrans-38
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[`torchaudio.functional.resample()`](../generated/torchaudio.functional.resample.html#torchaudio.functional.resample
"torchaudio.functional.resample")也适用于CUDA张量。'
- en: When performing resampling multiple times on the same set of sample rates, using
[`torchaudio.transforms.Resample`](../generated/torchaudio.transforms.Resample.html#torchaudio.transforms.Resample
"torchaudio.transforms.Resample") might improve the performance.
id: totrans-39
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 在同一组采样率上多次执行重采样时,使用[`torchaudio.transforms.Resample`](../generated/torchaudio.transforms.Resample.html#torchaudio.transforms.Resample
"torchaudio.transforms.Resample")可能会提高性能。
- en: '[PRE9]'
id: totrans-40
prefs: []
type: TYPE_PRE
zh: '[PRE9]'
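A sketch of loading and resampling, assuming a hypothetical local file `speech.wav` and the `bundle` and `device` names from the sketches above:

```python
import torchaudio

SPEECH_FILE = "speech.wav"  # hypothetical path to the downloaded VOiCES sample

waveform, sample_rate = torchaudio.load(SPEECH_FILE)
waveform = waveform.to(device)

# Match the sample rate that the pipeline expects
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)
```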
- en: Extracting acoustic features[](#extracting-acoustic-features "Permalink to this
heading")
id: totrans-41
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 提取声学特征[](#extracting-acoustic-features "跳转到此标题")
- en: The next step is to extract acoustic features from the audio.
id: totrans-42
prefs: []
type: TYPE_NORMAL
zh: 下一步是从音频中提取声学特征。
- en: Note
id: totrans-43
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Wav2Vec2 models fine-tuned for ASR task can perform feature extraction and classification
with one step, but for the sake of the tutorial, we also show how to perform feature
extraction here.
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: 为ASR任务微调的Wav2Vec2模型可以一步完成特征提取和分类,但为了教程的目的,我们还展示了如何在此处执行特征提取。
- en: '[PRE10]'
id: totrans-45
prefs: []
type: TYPE_PRE
zh: '[PRE10]'
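A sketch of the feature-extraction step, assuming `model` and `waveform` from the sketches above:

```python
import torch

with torch.inference_mode():
    features, _ = model.extract_features(waveform)

# `features` is a list with one tensor per transformer layer,
# each of shape (batch, time_frames, feature_dim)
print(len(features), features[0].shape)
```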
- en: The returned features are a list of tensors. Each tensor is the output of a
transformer layer.
id: totrans-46
prefs: []
type: TYPE_NORMAL
zh: 返回的特征是一个张量列表。每个张量是一个变换器层的输出。
- en: '[PRE11]'
id: totrans-47
prefs: []
type: TYPE_PRE
zh: '[PRE11]'
- en: '![Feature from transformer layer 1, Feature from transformer layer 2, Feature
from transformer layer 3, Feature from transformer layer 4, Feature from transformer
layer 5, Feature from transformer layer 6, Feature from transformer layer 7, Feature
from transformer layer 8, Feature from transformer layer 9, Feature from transformer
layer 10, Feature from transformer layer 11, Feature from transformer layer 12](../Images/9f2d3410922166561ebdadfd4981e797.png)'
id: totrans-48
prefs: []
type: TYPE_IMG
zh: '![来自变换器层1的特征,来自变换器层2的特征,来自变换器层3的特征,来自变换器层4的特征,来自变换器层5的特征,来自变换器层6的特征,来自变换器层7的特征,来自变换器层8的特征,来自变换器层9的特征,来自变换器层10的特征,来自变换器层11的特征,来自变换器层12的特征](../Images/9f2d3410922166561ebdadfd4981e797.png)'
- en: Feature classification[](#feature-classification "Permalink to this heading")
id: totrans-49
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 特征分类[](#feature-classification "跳转到此标题")
- en: Once the acoustic features are extracted, the next step is to classify them
into a set of categories.
id: totrans-50
prefs: []
type: TYPE_NORMAL
zh: 一旦提取了声学特征,下一步就是将它们分类到一组类别中。
- en: The Wav2Vec2 model provides a method to perform feature extraction and classification
in one step.
id: totrans-51
prefs: []
type: TYPE_NORMAL
zh: Wav2Vec2模型提供了一种在一步中执行特征提取和分类的方法。
- en: '[PRE12]'
id: totrans-52
prefs: []
type: TYPE_PRE
zh: '[PRE12]'
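A sketch of the one-step path, under the same assumptions as above:

```python
import torch

with torch.inference_mode():
    emission, _ = model(waveform)

# `emission` holds per-frame logits of shape (batch, time_frames, num_labels)
print(emission.shape)
```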
- en: The output is in the form of logits, not probabilities.
id: totrans-53
prefs: []
type: TYPE_NORMAL
zh: 输出以logits的形式呈现,而不是概率的形式。
- en: Let’s visualize this.
id: totrans-54
prefs: []
type: TYPE_NORMAL
zh: 让我们可视化这个过程。
- en: '[PRE13]'
id: totrans-55
prefs: []
type: TYPE_PRE
zh: '[PRE13]'
- en: '![Classification result](../Images/ce8601d728900194dc8cb21fbd524cf7.png)'
id: totrans-56
prefs: []
type: TYPE_IMG
zh: '![分类结果](../Images/ce8601d728900194dc8cb21fbd524cf7.png)'
- en: '[PRE14]'
id: totrans-57
prefs: []
type: TYPE_PRE
zh: '[PRE14]'
- en: We can see that there are strong indications for certain labels across the
timeline.
id: totrans-58
prefs: []
type: TYPE_NORMAL
zh: 我们可以看到在时间线上有对某些标签的强烈指示。
- en: Generating transcripts[](#generating-transcripts "Permalink to this heading")
id: totrans-59
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 生成转录[](#generating-transcripts "跳转到此标题")
- en: From the sequence of label probabilities, now we want to generate transcripts.
The process to generate hypotheses is often called “decoding”.
id: totrans-60
prefs: []
type: TYPE_NORMAL
zh: 从标签概率序列中,现在我们想生成转录。生成假设的过程通常称为“解码”。
- en: Decoding is more elaborate than simple classification, because decoding at a
certain time step can be affected by surrounding observations.
id: totrans-61
prefs: []
type: TYPE_NORMAL
zh: 解码比简单分类更复杂,因为在某个时间步骤的解码可能会受到周围观察的影响。
- en: For example, take words like `night` and `knight`. Even if their prior probability
distributions are different (in typical conversations, `night` would occur way more
often than `knight`), to accurately generate transcripts with `knight`, such as
`a knight with a sword`, the decoding process has to postpone the final decision
until it sees enough context.
id: totrans-62
prefs: []
type: TYPE_NORMAL
zh: 例如,拿一个词像`night`和`knight`。即使它们的先验概率分布不同(在典型对话中,`night`会比`knight`发生得更频繁),为了准确生成带有`knight`的转录,比如`a
knight with a sword`,解码过程必须推迟最终决定,直到看到足够的上下文。
- en: There are many decoding techniques proposed, and they require external resources,
such as word dictionary and language models.
id: totrans-63
prefs: []
type: TYPE_NORMAL
zh: 有许多提出的解码技术,它们需要外部资源,如单词词典和语言模型。
- en: In this tutorial, for the sake of simplicity, we will perform greedy decoding,
which does not depend on such external components, and simply picks the best
hypothesis at each time step. Therefore, the context information is not used,
and only one transcript can be generated.
id: totrans-64
prefs: []
type: TYPE_NORMAL
zh: 在本教程中,为了简单起见,我们将执行贪婪解码,它不依赖于外部组件,并且只在每个时间步骤选择最佳假设。因此,上下文信息未被使用,只能生成一个转录。
- en: We start by defining the greedy decoding algorithm.
id: totrans-65
prefs: []
type: TYPE_NORMAL
zh: 我们首先定义贪婪解码算法。
- en: '[PRE15]'
id: totrans-66
prefs: []
type: TYPE_PRE
zh: '[PRE15]'
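A sketch of a greedy CTC decoder consistent with the description above (argmax per frame, merge repeats, drop blanks); the tutorial's own definition may differ in details:

```python
import torch

class GreedyCTCDecoder(torch.nn.Module):
    def __init__(self, labels, blank=0):
        super().__init__()
        self.labels = labels
        self.blank = blank

    def forward(self, emission: torch.Tensor) -> str:
        """emission: (time_frames, num_labels) logits; returns the transcript."""
        indices = torch.argmax(emission, dim=-1)      # best label per frame
        indices = torch.unique_consecutive(indices)   # merge repeated symbols
        indices = [i for i in indices if i != self.blank]  # drop CTC blanks
        return "".join([self.labels[i] for i in indices])
```

Usage is then two lines: `decoder = GreedyCTCDecoder(labels=bundle.get_labels())` followed by `transcript = decoder(emission[0])`.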
- en: Now create the decoder object and decode the transcript.
id: totrans-67
prefs: []
type: TYPE_NORMAL
zh: 现在创建解码器对象并解码转录。
- en: '[PRE16]'
id: totrans-68
prefs: []
type: TYPE_PRE
zh: '[PRE16]'
- en: Let’s check the result and listen again to the audio.
id: totrans-69
prefs: []
type: TYPE_NORMAL
zh: 让我们检查结果并再次听音频。
- en: '[PRE17]'
id: totrans-70
prefs: []
type: TYPE_PRE
zh: '[PRE17]'
- en: '[PRE18]'
id: totrans-71
prefs: []
type: TYPE_PRE
zh: '[PRE18]'
- en: null
id: totrans-72
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-73
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: The ASR model is fine-tuned using a loss function called Connectionist Temporal
Classification (CTC). The detail of CTC loss is explained [here](https://distill.pub/2017/ctc/).
In CTC a blank token (ϵ) is a special token which represents a repetition of the
previous symbol. In decoding, these are simply ignored.
id: totrans-74
prefs: []
type: TYPE_NORMAL
zh: ASR模型使用一种称为连接主义时间分类(CTC)的损失函数进行微调。CTC损失的详细信息在[这里](https://distill.pub/2017/ctc/)有解释。在CTC中,空白标记(ϵ)是一个特殊标记,表示前一个符号的重复。在解码中,这些标记被简单地忽略。
- en: Conclusion[](#conclusion "Permalink to this heading")
id: totrans-75
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 结论[](#conclusion "跳转到此标题")
- en: In this tutorial, we looked at how to use [`Wav2Vec2ASRBundle`](../generated/torchaudio.pipelines.Wav2Vec2ASRBundle.html#torchaudio.pipelines.Wav2Vec2ASRBundle
"torchaudio.pipelines.Wav2Vec2ASRBundle") to perform acoustic feature extraction
and speech recognition. Constructing a model and getting the emission is as short
as two lines.
id: totrans-76
prefs: []
type: TYPE_NORMAL
zh: 在本教程中,我们看了如何使用[`Wav2Vec2ASRBundle`](../generated/torchaudio.pipelines.Wav2Vec2ASRBundle.html#torchaudio.pipelines.Wav2Vec2ASRBundle)执行声学特征提取和语音识别。构建模型并获取发射只需两行代码。
- en: '[PRE19]'
id: totrans-77
prefs: []
type: TYPE_PRE
zh: '[PRE19]'
- en: '**Total running time of the script:** ( 0 minutes 6.833 seconds)'
id: totrans-78
prefs: []
type: TYPE_NORMAL
zh: '**脚本的总运行时间:**(0分钟6.833秒)'
- en: '[`Download Python source code: speech_recognition_pipeline_tutorial.py`](../_downloads/a0b5016bbf34fce4ac5549f4075dd10f/speech_recognition_pipeline_tutorial.py)'
id: totrans-79
prefs: []
type: TYPE_NORMAL
zh: '[`下载Python源代码:speech_recognition_pipeline_tutorial.py`](../_downloads/a0b5016bbf34fce4ac5549f4075dd10f/speech_recognition_pipeline_tutorial.py)'
- en: '[`Download Jupyter notebook: speech_recognition_pipeline_tutorial.ipynb`](../_downloads/ca83af2ea8d7db05fb63211d515b7fde/speech_recognition_pipeline_tutorial.ipynb)'
id: totrans-80
prefs: []
type: TYPE_NORMAL
zh: '[`下载Jupyter笔记本:speech_recognition_pipeline_tutorial.ipynb`](../_downloads/ca83af2ea8d7db05fb63211d515b7fde/speech_recognition_pipeline_tutorial.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
id: totrans-81
prefs: []
type: TYPE_NORMAL
zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
- en: ASR Inference with CUDA CTC Decoder
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 使用CUDA CTC解码器进行ASR推理
- en: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html)
- en: Note
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)
to download the full example code
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: 点击[这里](#sphx-glr-download-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)下载完整示例代码
- en: '**Author**: [Yuekai Zhang](mailto:yuekaiz%40nvidia.com)'
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: '**作者**:[Yuekai Zhang](mailto:yuekaiz%40nvidia.com)'
- en: This tutorial shows how to perform speech recognition inference using a CUDA-based
CTC beam search decoder. We demonstrate this on a pretrained [Zipformer](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc)
model from [Next-gen Kaldi](https://nadirapovey.com/next-gen-kaldi-what-is-it)
project.
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 本教程展示了如何使用基于CUDA的CTC波束搜索解码器执行语音识别推理。我们在来自[Next-gen Kaldi](https://nadirapovey.com/next-gen-kaldi-what-is-it)项目的预训练[Zipformer](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc)模型上演示了这一点。
- en: Overview[](#overview "Permalink to this heading")
id: totrans-6
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 概述[](#overview "跳转到此标题")
- en: Beam search decoding works by iteratively expanding text hypotheses (beams)
with next possible characters, and maintaining only the hypotheses with the highest
scores at each time step.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 波束搜索解码通过迭代地扩展文本假设(波束)与下一个可能的字符,并在每个时间步仅保留得分最高的假设来工作。
- en: The underlying implementation uses CUDA to accelerate the whole decoding process.
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: 底层实现使用CUDA来加速整个解码过程。
- en: A mathematical formula for the decoder can be
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: 解码器的数学公式可以
- en: found in the [paper](https://arxiv.org/pdf/1408.2873.pdf), and a more detailed
algorithm can be found in this [blog](https://distill.pub/2017/ctc/).
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 在[论文](https://arxiv.org/pdf/1408.2873.pdf)中找到,并且更详细的算法可以在这个[博客](https://distill.pub/2017/ctc/)中找到。
- en: Running ASR inference using a CUDA CTC Beam Search decoder requires the following
components
id: totrans-11
prefs: []
type: TYPE_NORMAL
zh: 使用CUDA CTC波束搜索解码器运行ASR推理需要以下组件
- en: 'Acoustic Model: model predicting modeling units (BPE in this tutorial) from
acoustic features'
id: totrans-12
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 声学模型:从声学特征预测建模单元(本教程中为BPE)的模型
- en: 'BPE Model: the byte-pair encoding (BPE) tokenizer file'
id: totrans-13
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: BPE模型:字节对编码(BPE)分词器文件
- en: Acoustic Model and Set Up[](#acoustic-model-and-set-up "Permalink to this heading")
id: totrans-14
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 声学模型和设置[](#acoustic-model-and-set-up "跳转到此标题")
- en: First we import the necessary utilities and fetch the data that we are working
with
id: totrans-15
prefs: []
type: TYPE_NORMAL
zh: 首先,我们导入必要的工具并获取我们要处理的数据
- en: '[PRE0]'
id: totrans-16
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: '[PRE1]'
id: totrans-17
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
- en: '[PRE2]'
id: totrans-18
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: We use the pretrained [Zipformer](https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-ctc-2022-12-01)
model that is trained on the [LibriSpeech dataset](http://www.openslr.org/12).
The model is jointly trained with CTC and Transducer loss functions. In this tutorial,
we only use CTC head of the model.
id: totrans-19
prefs: []
type: TYPE_NORMAL
zh: 我们使用预训练的[Zipformer](https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-ctc-2022-12-01)模型,该模型在[LibriSpeech数据集](http://www.openslr.org/12)上进行了训练。该模型同时使用CTC和Transducer损失函数进行训练。在本教程中,我们仅使用模型的CTC头部。
- en: '[PRE3]'
id: totrans-20
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: '[PRE4]'
id: totrans-21
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: We will load a sample from the LibriSpeech test-other dataset.
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 我们将从LibriSpeech test-other数据集中加载一个样本。
- en: '[PRE5]'
id: totrans-23
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
- en: '[PRE6]'
id: totrans-24
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: null
id: totrans-25
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-26
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: The transcript corresponding to this audio file is
id: totrans-27
prefs: []
type: TYPE_NORMAL
zh: 与此音频文件对应的抄本是
- en: '[PRE7]'
id: totrans-28
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
- en: Files and Data for Decoder[](#files-and-data-for-decoder "Permalink to this
heading")
id: totrans-29
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 解码器的文件和数据[](#files-and-data-for-decoder "跳转到此标题")
- en: Next, we load in our tokens from the BPE model, which is the tokenizer for
decoding.
prefs: []
type: TYPE_NORMAL
zh: 接下来,我们从BPE模型中加载我们的标记,这是用于解码的分词器。
- en: Tokens[](#tokens "Permalink to this heading")
id: totrans-31
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 标记[](#tokens "跳转到此标题")
- en: The tokens are the possible symbols that the acoustic model can predict, including
the blank symbol in CTC. In this tutorial, it includes 500 BPE tokens. It can
either be passed in as a file, where each line consists of the tokens corresponding
to the same index, or as a list of tokens, each mapping to a unique index.
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: 标记是声学模型可以预测的可能符号,包括CTC中的空白符号。在本教程中,它包括500个BPE标记。它可以作为文件传入,其中每行包含与相同索引对应的标记,或作为标记列表传入,每个标记映射到一个唯一的索引。
- en: '[PRE8]'
id: totrans-33
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
- en: '[PRE9]'
id: totrans-34
prefs: []
type: TYPE_PRE
zh: '[PRE9]'
- en: '[PRE10]'
id: totrans-35
prefs: []
type: TYPE_PRE
zh: '[PRE10]'
- en: Construct CUDA Decoder[](#construct-cuda-decoder "Permalink to this heading")
id: totrans-36
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 构建CUDA解码器[](#construct-cuda-decoder "跳转到此标题")
- en: In this tutorial, we will construct a CUDA beam search decoder. The decoder
can be constructed using the factory function [`cuda_ctc_decoder()`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
"torchaudio.models.decoder.cuda_ctc_decoder").
id: totrans-37
prefs: []
type: TYPE_NORMAL
zh: 在本教程中,我们将构建一个CUDA波束搜索解码器。可以使用工厂函数[`cuda_ctc_decoder()`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
"torchaudio.models.decoder.cuda_ctc_decoder")来构建解码器。
- en: '[PRE11]'
id: totrans-38
prefs: []
type: TYPE_PRE
zh: '[PRE11]'
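A sketch of the construction call, assuming `tokens` was loaded from the BPE model earlier (either a path to the token file or a list of token strings):

```python
from torchaudio.models.decoder import cuda_ctc_decoder

cuda_decoder = cuda_ctc_decoder(
    tokens,                      # BPE tokens; the blank symbol is expected at index 0
    nbest=10,                    # number of hypotheses to return
    beam_size=10,                # beam width per decoding step
    blank_skip_threshold=0.95,   # prune frames dominated by the blank symbol
)
```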
- en: Run Inference[](#run-inference "Permalink to this heading")
id: totrans-39
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 运行推理[](#run-inference "跳转到此标题")
- en: Now that we have the data, acoustic model, and decoder, we can perform inference.
The output of the beam search decoder is of type [`CUCTCHypothesis`](../generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCHypothesis
"torchaudio.models.decoder.CUCTCHypothesis"), consisting of the predicted token
IDs, words (symbols corresponding to the token IDs), and hypothesis scores. Recall
the transcript corresponding to the waveform is
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: 现在我们有了数据、声学模型和解码器,我们可以执行推理。波束搜索解码器的输出类型为[`CUCTCHypothesis`](../generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCHypothesis
"torchaudio.models.decoder.CUCTCHypothesis"),包括预测的标记ID、单词(与标记ID对应的符号)和假设分数。回想一下与波形对应的抄本是
- en: '[PRE12]'
id: totrans-41
prefs: []
type: TYPE_PRE
zh: '[PRE12]'
- en: '[PRE13]'
id: totrans-42
prefs: []
type: TYPE_PRE
zh: '[PRE13]'
- en: '[PRE14]'
id: totrans-43
prefs: []
type: TYPE_PRE
zh: '[PRE14]'
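A hedged sketch of the decode call: the decoder consumes the CTC head's log-probabilities on a CUDA device, shape (batch, frames, num_tokens), plus int32 frame counts, and returns one list of hypotheses per utterance:

```python
import torch

# `log_prob` and `encoder_out_lens` are assumed to come from the acoustic model
results = cuda_decoder(log_prob, encoder_out_lens.to(torch.int32))

best = results[0][0]  # top hypothesis for the first utterance
# One plausible way to stitch BPE pieces back into text:
text = "".join(best.words).replace("▁", " ").strip()
print(best.score, text)
```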
- en: The CUDA CTC decoder gives the following result.
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: CUDA CTC解码器给出以下结果。
- en: '[PRE15]'
id: totrans-45
prefs: []
type: TYPE_PRE
zh: '[PRE15]'
- en: '[PRE16]'
id: totrans-46
prefs: []
type: TYPE_PRE
zh: '[PRE16]'
- en: Beam Search Decoder Parameters[](#beam-search-decoder-parameters "Permalink
to this heading")
id: totrans-47
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 波束搜索解码器参数[](#beam-search-decoder-parameters "跳转到此标题")
- en: In this section, we go a little bit more in depth about some different parameters
and tradeoffs. For the full list of customizable parameters, please refer to the
[`documentation`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
"torchaudio.models.decoder.cuda_ctc_decoder").
id: totrans-48
prefs: []
type: TYPE_NORMAL
zh: 在本节中,我们将更深入地讨论一些不同参数和权衡。有关可自定义参数的完整列表,请参考[`文档`](../generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
"torchaudio.models.decoder.cuda_ctc_decoder")。
- en: Helper Function[](#helper-function "Permalink to this heading")
id: totrans-49
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 辅助函数[](#helper-function "跳转到此标题")
- en: '[PRE17]'
id: totrans-50
prefs: []
type: TYPE_PRE
zh: '[PRE17]'
- en: nbest[](#nbest "Permalink to this heading")
id: totrans-51
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: nbest[](#nbest "跳转到此标题")
- en: This parameter indicates the number of best hypotheses to return. For instance,
by setting `nbest=10` when constructing the beam search decoder earlier, we can
now access the hypotheses with the top 10 scores.
id: totrans-52
prefs: []
type: TYPE_NORMAL
zh: 此参数表示要返回的最佳假设数量。例如,在之前构建波束搜索解码器时设置 `nbest=10`,现在我们可以访问得分前10名的假设。
- en: '[PRE18]'
id: totrans-53
prefs: []
type: TYPE_PRE
zh: '[PRE18]'
- en: '[PRE19]'
id: totrans-54
prefs: []
type: TYPE_PRE
zh: '[PRE19]'
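A sketch of walking the n-best list, under the same assumptions as above:

```python
for i, hyp in enumerate(results[0]):  # best hypothesis first
    text = "".join(hyp.words).replace("▁", " ").strip()
    print(f"{i}: score={hyp.score:.3f} {text}")
```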
- en: beam size[](#beam-size "Permalink to this heading")
id: totrans-55
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 波束大小[](#beam-size "跳转到此标题")
- en: The `beam_size` parameter determines the maximum number of best hypotheses to
hold after each decoding step. Using larger beam sizes allows exploring a larger
range of possible hypotheses, which can produce hypotheses with higher scores, but
it does not provide additional gains beyond a certain point. We recommend setting
`beam_size=10` for the CUDA beam search decoder.
id: totrans-56
prefs: []
type: TYPE_NORMAL
zh: '`beam_size`参数确定每个解码步骤后保留的最佳假设数量上限。使用更大的波束大小可以探索更广泛的可能假设范围,这可以产生得分更高的假设,但在一定程度上不会提供额外的收益。我们建议为cuda波束搜索解码器设置`beam_size=10`。'
- en: In the example below, we see improvement in decoding quality as we increase
beam size from 1 to 3, but notice how using a beam size of 3 provides the same
output as beam size 10.
id: totrans-57
prefs: []
type: TYPE_NORMAL
zh: 在下面的示例中,我们可以看到随着波束大小从1增加到3,解码质量有所提高,但请注意,使用波束大小为3时提供与波束大小为10相同的输出。
- en: '[PRE20]'
id: totrans-58
prefs: []
type: TYPE_PRE
zh: '[PRE20]'
- en: '[PRE21]'
id: totrans-59
prefs: []
type: TYPE_PRE
zh: '[PRE21]'
- en: blank skip threshold[](#blank-skip-threshold "Permalink to this heading")
id: totrans-60
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: blank skip threshold[](#blank-skip-threshold "跳转到此标题")
- en: The `blank_skip_threshold` parameter is used to prune frames which have a large
blank probability. Pruning these frames with a good `blank_skip_threshold` can speed
up the decoding process significantly with no drop in accuracy. Per the CTC rule,
we keep at least one blank frame between two non-blank frames to avoid mistakenly
merging two consecutive identical symbols. We recommend setting `blank_skip_threshold=0.95`
for the CUDA beam search decoder.
id: totrans-61
prefs: []
type: TYPE_NORMAL
zh: '`blank_skip_threshold`参数用于修剪具有较大空白概率的帧。使用良好的`blank_skip_threshold`修剪这些帧可以大大加快解码过程,而不会降低准确性。根据CTC规则,我们应至少在两个非空白帧之间保留一个空白帧,以避免错误地合并两个连续相同的符号。我们建议为cuda波束搜索解码器设置`blank_skip_threshold=0.95`。'
- en: '[PRE22]'
id: totrans-62
prefs: []
type: TYPE_PRE
zh: '[PRE22]'
- en: '[PRE23]'
id: totrans-63
prefs: []
type: TYPE_PRE
zh: '[PRE23]'
- en: Benchmark with flashlight CPU decoder[](#benchmark-with-flashlight-cpu-decoder
"Permalink to this heading")
id: totrans-64
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 使用Flashlight CPU解码器进行基准测试[](#benchmark-with-flashlight-cpu-decoder "跳转到此标题")
- en: We benchmark the throughput and accuracy between CUDA decoder and CPU decoder
using librispeech test_other set. To reproduce below benchmark results, you may
refer [here](https://github.com/pytorch/audio/tree/main/examples/asr/librispeech_cuda_ctc_decoder).
id: totrans-65
prefs: []
type: TYPE_NORMAL
zh: 我们使用librispeech test_other数据集对CUDA解码器和CPU解码器之间的吞吐量和准确性进行基准测试。要重现下面的基准测试结果,您可以参考[这里](https://github.com/pytorch/audio/tree/main/examples/asr/librispeech_cuda_ctc_decoder)。
- en: '| Decoder | Setting | WER (%) | N-Best Oracle WER (%) | Decoding Time (seconds)
|'
id: totrans-66
prefs: []
type: TYPE_TB
zh: '| 解码器 | 设置 | WER (%) | N-Best Oracle WER (%) | 解码时间(秒) |'
- en: '| --- | --- | --- | --- | --- |'
id: totrans-67
prefs: []
type: TYPE_TB
zh: '| --- | --- | --- | --- | --- |'
- en: '| CUDA decoder | blank_skip_threshold 0.95 | 5.81 | 4.11 | 2.57 |'
id: totrans-68
prefs: []
type: TYPE_TB
zh: '| CUDA解码器 | blank_skip_threshold 0.95 | 5.81 | 4.11 | 2.57 |'
- en: '| CUDA decoder | blank_skip_threshold 1.0 (no frame-skip) | 5.81 | 4.09 | 6.24
|'
id: totrans-69
prefs: []
type: TYPE_TB
zh: '| CUDA解码器 | blank_skip_threshold 1.0 (无帧跳过) | 5.81 | 4.09 | 6.24 |'
- en: '| CPU decoder | beam_size_token 10 | 5.86 | 4.30 | 28.61 |'
id: totrans-70
prefs: []
type: TYPE_TB
zh: '| CPU解码器 | beam_size_token 10 | 5.86 | 4.30 | 28.61 |'
- en: '| CPU decoder | beam_size_token 500 | 5.86 | 4.30 | 791.80 |'
id: totrans-71
prefs: []
type: TYPE_TB
zh: '| CPU解码器 | beam_size_token 500 | 5.86 | 4.30 | 791.80 |'
- en: From the above table, the CUDA decoder gives a slight improvement in WER and
a significant increase in throughput.
id: totrans-72
prefs: []
type: TYPE_NORMAL
zh: 从上表中可以看出,CUDA解码器在WER方面略有改善,并且吞吐量显著增加。
- en: '**Total running time of the script:** ( 0 minutes 8.752 seconds)'
id: totrans-73
prefs: []
type: TYPE_NORMAL
zh: '**脚本的总运行时间:** ( 0 分钟 8.752 秒)'
- en: '[`Download Python source code: asr_inference_with_cuda_ctc_decoder_tutorial.py`](../_downloads/3956cf493d21711e687e9610c91f9cd1/asr_inference_with_cuda_ctc_decoder_tutorial.py)'
id: totrans-74
prefs: []
type: TYPE_NORMAL
zh: '[`下载Python源代码: asr_inference_with_cuda_ctc_decoder_tutorial.py`](../_downloads/3956cf493d21711e687e9610c91f9cd1/asr_inference_with_cuda_ctc_decoder_tutorial.py)'
- en: '[`Download Jupyter notebook: asr_inference_with_cuda_ctc_decoder_tutorial.ipynb`](../_downloads/96982138e59c541534342222a3f5c69e/asr_inference_with_cuda_ctc_decoder_tutorial.ipynb)'
id: totrans-75
prefs: []
type: TYPE_NORMAL
zh: '[`下载Jupyter笔记本: asr_inference_with_cuda_ctc_decoder_tutorial.ipynb`](../_downloads/96982138e59c541534342222a3f5c69e/asr_inference_with_cuda_ctc_decoder_tutorial.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
id: totrans-76
prefs: []
type: TYPE_NORMAL
zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
- en: Online ASR with Emformer RNN-T
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 使用Emformer RNN-T进行在线ASR
- en: 原文:[https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html](https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html](https://pytorch.org/audio/stable/tutorials/online_asr_tutorial.html)
- en: Note
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-online-asr-tutorial-py) to download
the full example code
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: 点击[这里](#sphx-glr-download-tutorials-online-asr-tutorial-py)下载完整示例代码
- en: '**Author**: [Jeff Hwang](mailto:jeffhwang%40meta.com), [Moto Hira](mailto:moto%40meta.com)'
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: '**作者**:[Jeff Hwang](mailto:jeffhwang%40meta.com), [Moto Hira](mailto:moto%40meta.com)'
- en: This tutorial shows how to use Emformer RNN-T and streaming API to perform online
speech recognition.
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 本教程展示了如何使用Emformer RNN-T和流式API执行在线语音识别。
- en: Note
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: This tutorial requires FFmpeg libraries and SentencePiece.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 本教程需要使用FFmpeg库和SentencePiece。
- en: Please refer to [Optional Dependencies](../installation.html#optional-dependencies)
for the detail.
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: 有关详细信息,请参阅[可选依赖项](../installation.html#optional-dependencies)。
- en: 1\. Overview[](#overview "Permalink to this heading")
id: totrans-9
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 1\. 概述[](#overview "跳转到此标题的永久链接")
- en: Performing online speech recognition is composed of the following steps
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 在线语音识别的执行由以下步骤组成
- en: 'Build the inference pipeline. Emformer RNN-T is composed of three components:
feature extractor, decoder and token processor.'
id: totrans-11
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 构建推理管道。Emformer RNN-T由三个组件组成:特征提取器、解码器和标记处理器。
- en: Format the waveform into chunks of expected sizes.
id: totrans-12
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 将波形格式化为预期大小的块。
- en: Pass data through the pipeline.
id: totrans-13
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 通过管道传递数据。
- en: 2\. Preparation[](#preparation "Permalink to this heading")
id: totrans-14
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 2\. 准备[](#preparation "跳转到此标题的永久链接")
- en: '[PRE0]'
id: totrans-15
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: '[PRE1]'
id: totrans-16
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
- en: 3\. Construct the pipeline[](#construct-the-pipeline "Permalink to this heading")
id: totrans-17
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 3\. 构建管道[](#construct-the-pipeline "跳转到此标题的永久链接")
- en: Pre-trained model weights and related pipeline components are bundled as [`torchaudio.pipelines.RNNTBundle`](../generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle
"torchaudio.pipelines.RNNTBundle").
id: totrans-18
prefs: []
type: TYPE_NORMAL
zh: 预训练模型权重和相关管道组件被捆绑为[`torchaudio.pipelines.RNNTBundle`](../generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle)。
- en: We use [`torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH`](../generated/torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH.html#torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
"torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH"), which is an Emformer RNN-T
model trained on the LibriSpeech dataset.
id: totrans-19
prefs: []
type: TYPE_NORMAL
zh: 我们使用[`torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH`](../generated/torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH.html#torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH),这是在LibriSpeech数据集上训练的Emformer
RNN-T模型。
- en: '[PRE2]'
id: totrans-20
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: '[PRE3]'
id: totrans-21
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: Streaming inference works on input data with overlap. Emformer RNN-T model treats
the newest portion of the input data as the “right context” — a preview of future
context. In each inference call, the model expects the main segment to start from
this right context from the previous inference call. The following figure illustrates
this.
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 流式推理适用于具有重叠的输入数据。Emformer RNN-T模型将输入数据的最新部分视为“右上下文” —— 未来上下文的预览。在每次推理调用中,模型期望主段从上一次推理调用的右上下文开始。以下图示说明了这一点。
- en: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_context.png](../Images/0e1c9a1ab0a1725ac44a8f5ae79784d9.png)'
id: totrans-23
prefs: []
type: TYPE_IMG
zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_context.png](../Images/0e1c9a1ab0a1725ac44a8f5ae79784d9.png)'
- en: The size of the main segment and right context, along with the expected sample
rate, can be retrieved from the bundle.
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: 主段和右上下文的大小,以及预期的采样率可以从bundle中检索。
- en: '[PRE4]'
id: totrans-25
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: '[PRE5]'
id: totrans-26
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
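A sketch of pulling the components and geometry out of the bundle (the real values are printed by the elided code above):

```python
import torchaudio

bundle = torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH

feature_extractor = bundle.get_streaming_feature_extractor()
decoder = bundle.get_decoder()                 # RNNTBeamSearch
token_processor = bundle.get_token_processor()

sample_rate = bundle.sample_rate
# Segment and right-context sizes are in feature frames; multiply by
# hop_length to get the corresponding number of audio samples.
segment_length = bundle.segment_length * bundle.hop_length
context_length = bundle.right_context_length * bundle.hop_length
```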
- en: 4\. Configure the audio stream[](#configure-the-audio-stream "Permalink to this
heading")
id: totrans-27
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 4\. 配置音频流[](#configure-the-audio-stream "跳转到此标题的永久链接")
- en: Next, we configure the input audio stream using [`torchaudio.io.StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
"torchaudio.io.StreamReader").
id: totrans-28
prefs: []
type: TYPE_NORMAL
zh: 接下来,我们使用[`torchaudio.io.StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader)配置输入音频流。
- en: For the detail of this API, please refer to the [StreamReader Basic Usage](./streamreader_basic_tutorial.html).
id: totrans-29
prefs: []
type: TYPE_NORMAL
zh: 有关此API的详细信息,请参阅[StreamReader基本用法](./streamreader_basic_tutorial.html)。
- en: The following audio file was originally published by LibriVox project, and it
is in the public domain.
id: totrans-30
prefs: []
type: TYPE_NORMAL
zh: 以下音频文件最初由LibriVox项目发布,属于公共领域。
- en: '[https://librivox.org/great-pirate-stories-by-joseph-lewis-french/](https://librivox.org/great-pirate-stories-by-joseph-lewis-french/)'
id: totrans-31
prefs: []
type: TYPE_NORMAL
zh: '[https://librivox.org/great-pirate-stories-by-joseph-lewis-french/](https://librivox.org/great-pirate-stories-by-joseph-lewis-french/)'
- en: It was re-uploaded for the sake of the tutorial.
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: 出于教程目的,它被重新上传。
- en: '[PRE6]'
id: totrans-33
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: '[PRE7]'
id: totrans-34
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
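A sketch of the stream configuration, assuming the bundle-derived sizes above and a hypothetical source URL:

```python
from torchaudio.io import StreamReader

src = "https://example.com/great_pirate_stories.mp3"  # hypothetical re-upload URL

streamer = StreamReader(src)
streamer.add_basic_audio_stream(
    frames_per_chunk=segment_length,   # samples per main segment
    sample_rate=bundle.sample_rate,    # resample on the fly to the expected rate
)
```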
- en: As previously explained, Emformer RNN-T model expects input data with overlaps;
however, Streamer iterates the source media without overlap, so we make a helper
structure that caches a part of input data from Streamer as right context and
then appends it to the next input data from Streamer.
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: 如前所述,Emformer RNN-T模型期望具有重叠的输入数据;然而,Streamer在没有重叠的情况下迭代源媒体,因此我们制作了一个辅助结构,从Streamer缓存一部分输入数据作为右上下文,然后将其附加到来自Streamer的下一个输入数据。
- en: The following figure illustrates this.
id: totrans-36
prefs: []
type: TYPE_NORMAL
zh: 以下图示说明了这一点。
- en: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_streamer_context.png](../Images/a57362a983bfc8977c146b9cec1fbdc5.png)'
id: totrans-37
prefs: []
type: TYPE_IMG
zh: '![https://download.pytorch.org/torchaudio/tutorial-assets/emformer_rnnt_streamer_context.png](../Images/a57362a983bfc8977c146b9cec1fbdc5.png)'
- en: '[PRE8]'
id: totrans-38
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
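A sketch of such a cacher, consistent with the figure above (assuming 1-D sample chunks and zero-initialized context):

```python
import torch

class ContextCacher:
    def __init__(self, segment_length: int, context_length: int):
        self.segment_length = segment_length
        self.context_length = context_length
        self.context = torch.zeros([context_length])

    def __call__(self, chunk: torch.Tensor) -> torch.Tensor:
        # Pad the last, possibly short, chunk up to the segment size
        if chunk.size(0) < self.segment_length:
            chunk = torch.nn.functional.pad(chunk, (0, self.segment_length - chunk.size(0)))
        # Prepend the cached tail of the previous chunk, then cache the new tail
        chunk_with_context = torch.cat((self.context, chunk))
        self.context = chunk[-self.context_length:]
        return chunk_with_context
```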
- en: 5\. Run stream inference[](#run-stream-inference "Permalink to this heading")
id: totrans-39
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 5\. 运行流推理[](#run-stream-inference "跳转到此标题的永久链接")
- en: Finally, we run the recognition.
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: 最后,我们运行识别。
- en: First, we initialize the stream iterator, context cacher, and state and hypothesis
that are used by decoder to carry over the decoding state between inference calls.
id: totrans-41
prefs: []
type: TYPE_NORMAL
zh: 首先,我们初始化流迭代器、上下文缓存器以及解码器使用的状态和假设,用于在推理调用之间传递解码状态。
- en: '[PRE9]'
id: totrans-42
prefs: []
type: TYPE_PRE
zh: '[PRE9]'
- en: Next, we run the inference.
id: totrans-43
prefs: []
type: TYPE_NORMAL
zh: 接下来,我们运行推理。
- en: For the sake of better display, we create a helper function which processes
the source stream for a given number of iterations, and call it repeatedly.
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: 为了更好地显示,我们创建了一个辅助函数,该函数处理源流指定的次数,并重复调用它。
- en: '[PRE10]'
id: totrans-45
prefs: []
type: TYPE_PRE
zh: '[PRE10]'
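A condensed sketch of the loop that the helper wraps, assuming the `streamer`, `ContextCacher`, and pipeline components from the sketches above:

```python
state, hypothesis = None, None
cacher = ContextCacher(segment_length, context_length)

for (chunk,) in streamer.stream():
    segment = cacher(chunk[:, 0])                     # mono samples, 1-D
    features, length = feature_extractor(segment)
    hypos, state = decoder.infer(
        features, length, 10, state=state, hypothesis=hypothesis
    )
    hypothesis = hypos[0]
    print(token_processor(hypothesis[0]), end="", flush=True)
```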
- en: '[PRE11]'
id: totrans-46
prefs: []
type: TYPE_PRE
zh: '[PRE11]'
- en: '![MelSpectrogram Feature](../Images/6f88cad1fa15680732704d2ab1568895.png)'
id: totrans-47
prefs: []
type: TYPE_IMG
zh: '![MelSpectrogram特征](../Images/6f88cad1fa15680732704d2ab1568895.png)'
- en: '[PRE12]'
id: totrans-48
prefs: []
type: TYPE_PRE
zh: '[PRE12]'
- en: null
id: totrans-49
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-50
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE13]'
id: totrans-51
prefs: []
type: TYPE_PRE
zh: '[PRE13]'
- en: '![MelSpectrogram Feature](../Images/63ea9ff950b6828668774e9e16e2da72.png)'
id: totrans-52
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/63ea9ff950b6828668774e9e16e2da72.png)'
- en: '[PRE14]'
id: totrans-53
prefs: []
type: TYPE_PRE
zh: '[PRE14]'
- en: null
id: totrans-54
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-55
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE15]'
id: totrans-56
prefs: []
type: TYPE_PRE
zh: '[PRE15]'
- en: '![MelSpectrogram Feature](../Images/9fd0eaf340cc4769da822a728893c8d0.png)'
id: totrans-57
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/9fd0eaf340cc4769da822a728893c8d0.png)'
- en: '[PRE16]'
id: totrans-58
prefs: []
type: TYPE_PRE
zh: '[PRE16]'
- en: null
id: totrans-59
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-60
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE17]'
id: totrans-61
prefs: []
type: TYPE_PRE
zh: '[PRE17]'
- en: '![MelSpectrogram Feature](../Images/27361e962edf9ff4e1dc7a554b09d885.png)'
id: totrans-62
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/27361e962edf9ff4e1dc7a554b09d885.png)'
- en: '[PRE18]'
id: totrans-63
prefs: []
type: TYPE_PRE
zh: '[PRE18]'
- en: null
id: totrans-64
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-65
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE19]'
id: totrans-66
prefs: []
type: TYPE_PRE
zh: '[PRE19]'
- en: '![MelSpectrogram Feature](../Images/78b4f08b9d73ca155002dca9b67d5139.png)'
id: totrans-67
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/78b4f08b9d73ca155002dca9b67d5139.png)'
- en: '[PRE20]'
id: totrans-68
prefs: []
type: TYPE_PRE
zh: '[PRE20]'
- en: null
id: totrans-69
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-70
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE21]'
id: totrans-71
prefs: []
type: TYPE_PRE
zh: '[PRE21]'
- en: '![MelSpectrogram Feature](../Images/8e43113644bb019dfc4bb4603e5bc696.png)'
id: totrans-72
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/8e43113644bb019dfc4bb4603e5bc696.png)'
- en: '[PRE22]'
id: totrans-73
prefs: []
type: TYPE_PRE
zh: '[PRE22]'
- en: null
id: totrans-74
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-75
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE23]'
id: totrans-76
prefs: []
type: TYPE_PRE
zh: '[PRE23]'
- en: '![MelSpectrogram Feature](../Images/74f496d6db06d496150b2e6b919a7fea.png)'
id: totrans-77
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/74f496d6db06d496150b2e6b919a7fea.png)'
- en: '[PRE24]'
id: totrans-78
prefs: []
type: TYPE_PRE
zh: '[PRE24]'
- en: null
id: totrans-79
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-80
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE25]'
id: totrans-81
prefs: []
type: TYPE_PRE
zh: '[PRE25]'
- en: '![MelSpectrogram Feature](../Images/1d8004d0bd1aaa132e299f5e7b3f4d65.png)'
id: totrans-82
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/1d8004d0bd1aaa132e299f5e7b3f4d65.png)'
- en: '[PRE26]'
id: totrans-83
prefs: []
type: TYPE_PRE
zh: '[PRE26]'
- en: null
id: totrans-84
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-85
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE27]'
id: totrans-86
prefs: []
type: TYPE_PRE
zh: '[PRE27]'
- en: '![MelSpectrogram Feature](../Images/078602e6329acdc28d9f151361d84fa4.png)'
id: totrans-87
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/078602e6329acdc28d9f151361d84fa4.png)'
- en: '[PRE28]'
id: totrans-88
prefs: []
type: TYPE_PRE
zh: '[PRE28]'
- en: null
id: totrans-89
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-90
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE29]'
id: totrans-91
prefs: []
type: TYPE_PRE
zh: '[PRE29]'
- en: '![MelSpectrogram Feature](../Images/09c62d29a7ebfdca810fb7715b4d6deb.png)'
id: totrans-92
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/09c62d29a7ebfdca810fb7715b4d6deb.png)'
- en: '[PRE30]'
id: totrans-93
prefs: []
type: TYPE_PRE
zh: '[PRE30]'
- en: null
id: totrans-94
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-95
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE31]'
id: totrans-96
prefs: []
type: TYPE_PRE
zh: '[PRE31]'
- en: '![MelSpectrogram Feature](../Images/bd6f77d39b92dab706c4579cee78d49b.png)'
id: totrans-97
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/bd6f77d39b92dab706c4579cee78d49b.png)'
- en: '[PRE32]'
id: totrans-98
prefs: []
type: TYPE_PRE
zh: '[PRE32]'
- en: null
id: totrans-99
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-100
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE33]'
id: totrans-101
prefs: []
type: TYPE_PRE
zh: '[PRE33]'
- en: '![MelSpectrogram Feature](../Images/1d08a0f2dfb8662795d4a456d55369b9.png)'
id: totrans-102
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/1d08a0f2dfb8662795d4a456d55369b9.png)'
- en: '[PRE34]'
id: totrans-103
prefs: []
type: TYPE_PRE
zh: '[PRE34]'
- en: null
id: totrans-104
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-105
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: '[PRE35]'
id: totrans-106
prefs: []
type: TYPE_PRE
zh: '[PRE35]'
- en: '![MelSpectrogram Feature](../Images/b5ffe860eeae95b44bae565c68a36a14.png)'
id: totrans-107
prefs: []
type: TYPE_IMG
zh: '![Mel频谱特征](../Images/b5ffe860eeae95b44bae565c68a36a14.png)'
- en: '[PRE36]'
id: totrans-108
prefs: []
type: TYPE_PRE
zh: '[PRE36]'
- en: null
id: totrans-109
prefs: []
type: TYPE_NORMAL
- en: Your browser does not support the audio element.
id: totrans-110
prefs: []
type: TYPE_NORMAL
zh: 您的浏览器不支持音频元素。
- en: 'Tag: [`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")'
id: totrans-111
prefs: []
type: TYPE_NORMAL
zh: 标签:[`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")
- en: '**Total running time of the script:** ( 1 minutes 34.955 seconds)'
id: totrans-112
prefs: []
type: TYPE_NORMAL
zh: '**脚本的总运行时间:**(1分钟34.955秒)'
- en: '[`Download Python source code: online_asr_tutorial.py`](../_downloads/f9f593098569966df0b815e29c13dd20/online_asr_tutorial.py)'
id: totrans-113
prefs: []
type: TYPE_NORMAL
zh: '[`下载Python源代码:online_asr_tutorial.py`](../_downloads/f9f593098569966df0b815e29c13dd20/online_asr_tutorial.py)'
- en: '[`Download Jupyter notebook: online_asr_tutorial.ipynb`](../_downloads/bd34dff0656a1aa627d444a8d1a5957f/online_asr_tutorial.ipynb)'
id: totrans-114
prefs: []
type: TYPE_NORMAL
zh: '[`下载Jupyter笔记本:online_asr_tutorial.ipynb`](../_downloads/bd34dff0656a1aa627d444a8d1a5957f/online_asr_tutorial.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
id: totrans-115
prefs: []
type: TYPE_NORMAL
zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
- en: Device ASR with Emformer RNN-T
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 使用Emformer RNN-T的设备ASR
- en: 原文:[https://pytorch.org/audio/stable/tutorials/device_asr.html](https://pytorch.org/audio/stable/tutorials/device_asr.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/tutorials/device_asr.html](https://pytorch.org/audio/stable/tutorials/device_asr.html)
- en: Note
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-device-asr-py) to download the full
example code
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: 点击[这里](#sphx-glr-download-tutorials-device-asr-py)下载完整示例代码
- en: '**Author**: [Moto Hira](mailto:moto%40meta.com), [Jeff Hwang](mailto:jeffhwang%40meta.com).'
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: '**作者**:[Moto Hira](mailto:moto%40meta.com), [Jeff Hwang](mailto:jeffhwang%40meta.com)。'
- en: This tutorial shows how to use Emformer RNN-T and streaming API to perform speech
recognition on a streaming device input, i.e. microphone on laptop.
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 本教程展示了如何使用Emformer RNN-T和流式API在流式设备输入上执行语音识别,即笔记本电脑上的麦克风。
- en: Note
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: This tutorial requires FFmpeg libraries. Please refer to [FFmpeg dependency](../installation.html#ffmpeg-dependency)
for the detail.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 本教程需要FFmpeg库。请参考[FFmpeg依赖](../installation.html#ffmpeg-dependency)获取详细信息。
- en: Note
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: This tutorial was tested on MacBook Pro and Dynabook with Windows 10.
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: 本教程在MacBook Pro和安装了Windows 10的Dynabook上进行了测试。
- en: This tutorial does NOT work on Google Colab because the server running this
tutorial does not have a microphone that you can talk to.
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 本教程在Google Colab上不起作用,因为运行本教程的服务器没有可以与之交谈的麦克风。
- en: 1\. Overview[](#overview "Permalink to this heading")
id: totrans-11
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 1\. 概述[](#overview "跳转到此标题")
- en: We use streaming API to fetch audio from audio device (microphone) chunk by
chunk, then run inference using Emformer RNN-T.
id: totrans-12
prefs: []
type: TYPE_NORMAL
zh: 我们使用流式API逐块从音频设备(麦克风)获取音频,然后使用Emformer RNN-T进行推理。
- en: For the basic usage of the streaming API and Emformer RNN-T please refer to
[StreamReader Basic Usage](./streamreader_basic_tutorial.html) and [Online ASR
with Emformer RNN-T](./online_asr_tutorial.html).
id: totrans-13
prefs: []
type: TYPE_NORMAL
zh: 有关流式API和Emformer RNN-T的基本用法,请参考[StreamReader基本用法](./streamreader_basic_tutorial.html)和[使用Emformer
RNN-T进行在线ASR](./online_asr_tutorial.html)。
- en: 2\. Checking the supported devices[](#checking-the-supported-devices "Permalink
to this heading")
id: totrans-14
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 2\. 检查支持的设备[](#checking-the-supported-devices "跳转到此标题")
- en: Firstly, we need to check the devices that Streaming API can access, and figure
out the arguments (`src` and `format`) we need to pass to [`StreamReader()`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
"torchaudio.io.StreamReader") class.
id: totrans-15
prefs: []
type: TYPE_NORMAL
zh: 首先,我们需要检查流式API可以访问的设备,并找出我们需要传递给[`StreamReader()`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
"torchaudio.io.StreamReader")类的参数(`src`和`format`)。
- en: We use `ffmpeg` command for this. `ffmpeg` abstracts away the difference of
underlying hardware implementations, but the expected value for `format` varies
across OS and each `format` defines different syntax for `src`.
id: totrans-16
prefs: []
type: TYPE_NORMAL
zh: 我们使用`ffmpeg`命令来实现。`ffmpeg`抽象了底层硬件实现的差异,但`format`的预期值在不同操作系统上有所不同,每个`format`定义了不同的`src`语法。
- en: The details of supported `format` values and `src` syntax can be found in [https://ffmpeg.org/ffmpeg-devices.html](https://ffmpeg.org/ffmpeg-devices.html).
id: totrans-17
prefs: []
type: TYPE_NORMAL
zh: 有关支持的`format`值和`src`语法的详细信息,请参考[https://ffmpeg.org/ffmpeg-devices.html](https://ffmpeg.org/ffmpeg-devices.html)。
- en: For macOS, the following command will list the available devices.
id: totrans-18
prefs: []
type: TYPE_NORMAL
zh: 对于macOS,以下命令将列出可用设备。
- en: '[PRE0]'
id: totrans-19
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: We will use the following values for Streaming API.
id: totrans-20
prefs: []
type: TYPE_NORMAL
zh: 我们将为流式API使用以下值。
- en: '[PRE1]'
id: totrans-21
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
- en: For Windows, `dshow` device should work.
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 对于Windows,`dshow`设备应该可以工作。
- en: '[PRE2]'
id: totrans-23
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: In the above case, the following value can be used to stream from microphone.
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: 在上述情况下,可以使用以下值从麦克风进行流式传输。
- en: '[PRE3]'
id: totrans-25
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: 3\. Data acquisition[](#data-acquisition "Permalink to this heading")
id: totrans-26
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 3\. 数据采集[](#data-acquisition "跳转到此标题")
- en: Streaming audio from microphone input requires properly timing data acquisition.
Failing to do so may introduce discontinuities in the data stream.
id: totrans-27
prefs: []
type: TYPE_NORMAL
zh: 从麦克风输入流式音频需要正确计时数据采集。如果未能这样做,可能会导致数据流中出现不连续性。
- en: For this reason, we will run the data acquisition in a subprocess.
id: totrans-28
prefs: []
type: TYPE_NORMAL
zh: 因此,我们将在子进程中运行数据采集。
- en: Firstly, we create a helper function that encapsulates the whole process executed
in the subprocess.
id: totrans-29
prefs: []
type: TYPE_NORMAL
zh: 首先,我们创建一个封装在子进程中执行的整个过程的辅助函数。
- en: This function initializes the streaming API, acquires data then puts it in a
queue, which the main process is watching.
id: totrans-30
prefs: []
type: TYPE_NORMAL
zh: 此函数初始化流式API,获取数据然后将其放入队列,主进程正在监视该队列。
- en: '[PRE4]'
id: totrans-31
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
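A sketch of such a helper, assuming `format`/`src` values discovered in section 2:

```python
def stream(q, format, src, segment_length, sample_rate):
    """Producer run in a subprocess: read fixed-size chunks from the
    microphone and hand them to the main process through a queue."""
    from torchaudio.io import StreamReader

    streamer = StreamReader(src, format=format)
    streamer.add_basic_audio_stream(
        frames_per_chunk=segment_length,
        sample_rate=sample_rate,
    )
    # timeout/backoff keep the C++ layer retrying when the device is not ready
    for (chunk,) in streamer.stream(timeout=-1, backoff=1.0):
        q.put(chunk)
```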
- en: The notable difference from non-device streaming is that we provide `timeout`
and `backoff` parameters to the `stream` method.
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: 与非设备流式的显着区别在于,我们为`stream`方法提供了`timeout`和`backoff`参数。
- en: When acquiring data, if the rate of acquisition requests is higher than that
at which the hardware can prepare the data, then the underlying implementation
reports a special error code, and expects client code to retry.
id: totrans-33
prefs: []
type: TYPE_NORMAL
zh: 在获取数据时,如果获取请求的速率高于硬件准备数据的速率,则底层实现会报告特殊的错误代码,并期望客户端代码重试。
- en: Precise timing is the key for smooth streaming. Reporting this error from the
low-level implementation all the way back to the Python layer before retrying adds
undesired overhead. For this reason, the retry behavior is implemented in the C++
layer, and the `timeout` and `backoff` parameters allow client code to control the
behavior.
id: totrans-34
prefs: []
type: TYPE_NORMAL
zh: 精确的时序是流畅流媒体的关键。从低级实现报告此错误一直返回到Python层,在重试之前会增加不必要的开销。因此,重试行为是在C++层实现的,`timeout`和`backoff`参数允许客户端代码控制行为。
- en: For the detail of `timeout` and `backoff` parameters, please refer to the documentation
of `stream()` method.
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: 有关`timeout`和`backoff`参数的详细信息,请参考`stream()`方法的文档。
- en: Note
id: totrans-36
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: The proper value of `backoff` depends on the system configuration. One way to
see if the `backoff` value is appropriate is to save the series of acquired chunks
as continuous audio and listen to it. If the `backoff` value is too large, then the
data stream is discontinuous. The resulting audio sounds sped up. If the `backoff`
value is too small or zero, the audio stream is fine, but the data acquisition process
enters a busy-waiting state, which increases the CPU consumption.
id: totrans-37
prefs: []
type: TYPE_NORMAL
zh: '`backoff`的适当值取决于系统配置。检查`backoff`值是否合适的一种方法是将获取的一系列块保存为连续音频并进行听取。如果`backoff`值太大,则数据流是不连续的。生成的音频听起来加快了。如果`backoff`值太小或为零,则音频流正常,但数据采集过程进入忙等待状态,这会增加CPU消耗。'
- en: 4\. Building inference pipeline[](#building-inference-pipeline "Permalink to
this heading")
id: totrans-38
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 4\. 构建推理流程[](#building-inference-pipeline "跳转到此标题")
- en: The next step is to create components required for inference.
id: totrans-39
prefs: []
type: TYPE_NORMAL
zh: 接下来的步骤是创建推理所需的组件。
- en: This is the same process as [Online ASR with Emformer RNN-T](./online_asr_tutorial.html).
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: 这与[使用Emformer RNN-T进行在线ASR](./online_asr_tutorial.html)是相同的流程。
- en: '[PRE5]'
id: totrans-41
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
- en: '[PRE6]'
id: totrans-42
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: 5\. The main process[](#the-main-process "Permalink to this heading")
id: totrans-43
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 5\. 主进程[](#the-main-process "跳转到此标题")
- en: 'The execution flow of the main process is as follows:'
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: 主进程的执行流程如下:
- en: Initialize the inference pipeline.
id: totrans-45
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 初始化推理流程。
- en: Launch data acquisition subprocess.
id: totrans-46
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 启动数据获取子进程。
- en: Run inference.
id: totrans-47
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 运行推理。
- en: Clean up.
id: totrans-48
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 清理。
- en: Note
id: totrans-49
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: As the data acquisition subprocess will be launched with the “spawn” method,
all the code in the global scope is executed in the subprocess as well.
id: totrans-50
prefs: []
type: TYPE_NORMAL
zh: 由于数据获取子进程将使用“spawn”方法启动,全局范围的所有代码也将在子进程中执行。
- en: We want to instantiate the pipeline only in the main process, so we put the
instantiation in a function and invoke it within the `__name__ == "__main__"` guard.
id: totrans-51
prefs: []
type: TYPE_NORMAL
zh: 我们希望只在主进程中实例化流程,因此我们将它们放在一个函数中,并在`__name__ == "__main__"`保护内调用它。
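Putting the four steps together, the skeleton of the main process could look like the sketch below. The `build_pipeline` helper and the `infer` method are hypothetical placeholders, and `stream` is the acquisition function sketched earlier; the tutorial's actual code is in the following block.

```python
# Hypothetical skeleton of the main process (illustrative only).
import torch.multiprocessing as mp


def main(device="avfoundation", src=":1", segment_length=2048, sample_rate=16000):
    # 1. Initialize the inference pipeline (main process only).
    pipeline = build_pipeline()  # hypothetical helper, e.g. the bundle setup above

    # 2. Launch the data acquisition subprocess with the "spawn" method.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=stream, args=(q, device, src, segment_length, sample_rate))
    p.start()

    # 3. Run inference on chunks pulled from the queue.
    try:
        while True:
            chunk = q.get()
            pipeline.infer(chunk)  # hypothetical method emitting transcripts
    finally:
        # 4. Clean up.
        p.terminate()
        p.join()


if __name__ == "__main__":
    main()
```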
- en: '[PRE7]'
id: totrans-52
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
- en: '[PRE8]'
id: totrans-53
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
- en: 'Tag: [`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")'
id: totrans-54
prefs: []
type: TYPE_NORMAL
zh: 标签:[`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")
- en: '**Total running time of the script:** ( 0 minutes 0.000 seconds)'
id: totrans-55
prefs: []
type: TYPE_NORMAL
zh: '**脚本的总运行时间:**(0分钟0.000秒)'
- en: '[`Download Python source code: device_asr.py`](../_downloads/8009eae2a3a1a322f175ecc138597775/device_asr.py)'
id: totrans-56
prefs: []
type: TYPE_NORMAL
zh: '[`下载Python源代码:device_asr.py`](../_downloads/8009eae2a3a1a322f175ecc138597775/device_asr.py)'
- en: '[`Download Jupyter notebook: device_asr.ipynb`](../_downloads/c8265c298ed19ff44b504d5c3aa72563/device_asr.ipynb)'
id: totrans-57
prefs: []
type: TYPE_NORMAL
zh: '[`下载Jupyter笔记本:device_asr.ipynb`](../_downloads/c8265c298ed19ff44b504d5c3aa72563/device_asr.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
id: totrans-58
prefs: []
type: TYPE_NORMAL
zh: '[Sphinx-Gallery生成的画廊](https://sphinx-gallery.github.io)'
- en: Device AV-ASR with Emformer RNN-T
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 使用Emformer RNN-T的设备AV-ASR
- en: 原文:[https://pytorch.org/audio/stable/tutorials/device_avsr.html](https://pytorch.org/audio/stable/tutorials/device_avsr.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/tutorials/device_avsr.html](https://pytorch.org/audio/stable/tutorials/device_avsr.html)
- en: Note
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: Click [here](#sphx-glr-download-tutorials-device-avsr-py) to download the full
example code
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: 点击[这里](#sphx-glr-download-tutorials-device-avsr-py)下载完整示例代码
- en: '**Author**: [Pingchuan Ma](mailto:pingchuanma%40meta.com), [Moto Hira](mailto:moto%40meta.com).'
id: totrans-4
prefs: []
type: TYPE_NORMAL
zh: '**作者**:[Pingchuan Ma](mailto:pingchuanma%40meta.com), [Moto Hira](mailto:moto%40meta.com)。'
- en: This tutorial shows how to run on-device audio-visual speech recognition (AV-ASR,
or AVSR) with TorchAudio on a streaming device input, i.e., the microphone on a laptop.
AV-ASR is the task of transcribing text from audio and visual streams, which has
recently attracted a lot of research attention due to its robustness against noise.
id: totrans-5
prefs: []
type: TYPE_NORMAL
zh: 本教程展示了如何在流设备输入上(即笔记本电脑上的麦克风)使用TorchAudio运行设备上的音频-视觉语音识别(AV-ASR或AVSR)。AV-ASR是从音频和视觉流中转录文本的任务,最近因其对噪声的稳健性而引起了许多研究的关注。
- en: Note
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: This tutorial requires ffmpeg, sentencepiece, mediapipe, opencv-python and scikit-image
libraries.
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 此教程需要ffmpeg、sentencepiece、mediapipe、opencv-python和scikit-image库。
- en: There are multiple ways to install the FFmpeg libraries. If you are using the
Anaconda Python distribution, `conda install -c conda-forge 'ffmpeg<7'` will install
compatible FFmpeg libraries.
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: 有多种安装ffmpeg库的方法。如果您使用Anaconda Python发行版,`conda install -c conda-forge 'ffmpeg<7'`将安装兼容的FFmpeg库。
- en: You can run `pip install sentencepiece mediapipe opencv-python scikit-image`
to install the other libraries mentioned.
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: 您可以运行`pip install sentencepiece mediapipe opencv-python scikit-image`来安装其他提到的库。
- en: Note
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: To run this tutorial, please make sure you are in the tutorial folder.
id: totrans-11
prefs: []
type: TYPE_NORMAL
zh: 要运行此教程,请确保您在教程文件夹中。
- en: Note
id: totrans-12
prefs: []
type: TYPE_NORMAL
zh: 注意
- en: We tested this tutorial with torchaudio version 2.0.2 on a MacBook Pro (M1 Pro).
id: totrans-13
prefs: []
type: TYPE_NORMAL
zh: 我们在Macbook Pro(M1 Pro)上测试了torchaudio版本2.0.2上的教程。
- en: '[PRE0]'
id: totrans-14
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: Overview[](#overview "Permalink to this heading")
id: totrans-15
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 概述[](#overview "跳转到此标题")
- en: The real-time AV-ASR system is presented as follows, which consists of three
components, a data collection module, a pre-processing module and an end-to-end
model. The data collection module is hardware, such as a microphone and camera.
Its role is to collect information from the real world. Once the information is
collected, the pre-processing module locates and crops out the face. Next, we feed
the raw audio stream and the pre-processed video stream into our end-to-end model
for inference.
id: totrans-16
prefs: []
type: TYPE_NORMAL
zh: 实时AV-ASR系统如下所示,由三个组件组成,即数据收集模块、预处理模块和端到端模型。数据收集模块是硬件,如麦克风和摄像头。它的作用是从现实世界收集信息。一旦信息被收集,预处理模块会定位和裁剪出脸部。接下来,我们将原始音频流和预处理的视频流馈送到我们的端到端模型进行推断。
- en: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/overview.png](../Images/757b2c4226d175a3a1b0d10e928d909c.png)'
id: totrans-17
prefs: []
type: TYPE_IMG
zh: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/overview.png](../Images/757b2c4226d175a3a1b0d10e928d909c.png)'
- en: 1\. Data acquisition[](#data-acquisition "Permalink to this heading")
id: totrans-18
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 1\. 数据采集[](#data-acquisition "跳转到此标题")
- en: Firstly, we define the function to collect videos from the microphone and camera.
To be specific, we use the [`StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
"torchaudio.io.StreamReader") class for the purpose of data collection, which
supports capturing audio/video from the microphone and camera. For the detailed
usage of this class, please refer to the [tutorial](./streamreader_basic_tutorial.html).
id: totrans-19
prefs: []
type: TYPE_NORMAL
zh: 首先,我们定义了从麦克风和摄像头收集视频的函数。具体来说,我们使用[`StreamReader`](../generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
"torchaudio.io.StreamReader")类来进行数据收集,该类支持从麦克风和摄像头捕获音频/视频。有关此类的详细用法,请参考[教程](./streamreader_basic_tutorial.html)。
- en: '[PRE1]'
id: totrans-20
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
- en: 2\. Pre-processing[](#pre-processing "Permalink to this heading")
id: totrans-21
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 2\. 预处理[](#pre-processing "跳转到此标题")
- en: Before feeding the raw stream into our model, each video sequence has to undergo
a specific pre-processing procedure. This involves three critical steps. The first
step is to perform face detection. Following that, each individual frame is aligned
to a reference frame, commonly known as the mean face, in order to normalize
rotation and size differences across frames. The final step of the pre-processing
module is to crop the face region from the aligned face image.
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 在将原始流馈送到我们的模型之前,每个视频序列都必须经过特定的预处理过程。这涉及三个关键步骤。第一步是进行人脸检测。随后,将每个单独的帧对齐到一个参考帧,通常称为平均脸,以规范化帧之间的旋转和大小差异。预处理模块中的最后一步是从对齐的人脸图像中裁剪出脸部区域。
- en: '| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/original.gif](../Images/b9142268a9c0666c9697c22b10755a18.png)
| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/detected.gif](../Images/b44fd7d78a200f7ef203259295e21a8a.png)
| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/transformed.gif](../Images/7029d284337ec7c2222d6b4344ac49d0.png)
| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/cropped.gif](../Images/5aa4bb57e0b31b6d34ac3b4766e5503f.png)
|'
id: totrans-23
prefs: []
type: TYPE_TB
zh: '| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/original.gif](../Images/b9142268a9c0666c9697c22b10755a18.png)
| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/detected.gif](../Images/b44fd7d78a200f7ef203259295e21a8a.png)
| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/transformed.gif](../Images/7029d284337ec7c2222d6b4344ac49d0.png)
| ![https://download.pytorch.org/torchaudio/doc-assets/avsr/cropped.gif](../Images/5aa4bb57e0b31b6d34ac3b4766e5503f.png)
|'
- en: '|'
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: '|'
- en: Original
id: totrans-25
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 原始
- en: '|'
id: totrans-26
prefs: []
type: TYPE_NORMAL
zh: '|'
- en: Detected
id: totrans-27
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 检测
- en: '|'
id: totrans-28
prefs: []
type: TYPE_NORMAL
zh: '|'
- en: Transformed
id: totrans-29
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 转换
- en: '|'
id: totrans-30
prefs: []
type: TYPE_NORMAL
zh: '|'
- en: Cropped
id: totrans-31
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 裁剪
- en: '|'
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: '|'
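For illustration only, the alignment and cropping steps can be sketched with scikit-image as below; the landmark detector, the `mean_face` reference landmarks, and the crop size are assumptions, and the tutorial's actual implementation is in the following code block.

```python
# Sketch of steps 2-3: align detected landmarks to a reference ("mean face")
# and crop the face region. Step 1 (detection) is assumed to provide
# `landmarks`; the tutorial uses mediapipe for that part.
from skimage import transform


def align_and_crop(frame, landmarks, mean_face, size=96):
    # Estimate a similarity transform that maps the detected landmarks
    # onto the reference landmarks (normalizes rotation and scale).
    tform = transform.SimilarityTransform()
    tform.estimate(landmarks, mean_face)
    warped = transform.warp(frame, tform.inverse, output_shape=frame.shape[:2])
    # Center-crop the face region from the aligned image.
    h, w = warped.shape[:2]
    y0, x0 = (h - size) // 2, (w - size) // 2
    return warped[y0:y0 + size, x0:x0 + size]
```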
- en: '[PRE2]'
id: totrans-33
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: 3\. Building inference pipeline[](#building-inference-pipeline "Permalink to
this heading")
id: totrans-34
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 3\. 构建推断管道[](#building-inference-pipeline "跳转到此标题")
- en: The next step is to create the components required for the pipeline.
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: 下一步是创建管道所需的组件。
- en: We use convolution-based front-ends to extract features from both the raw
audio and video streams. These features are then passed through a two-layer MLP
for fusion. For our transducer model, we leverage the TorchAudio library, which
incorporates an encoder (Emformer), a predictor, and a joint network. The architecture
of the proposed AV-ASR model is illustrated as follows.
id: totrans-36
prefs: []
type: TYPE_NORMAL
zh: 我们使用基于卷积的前端从原始音频和视频流中提取特征。然后,这些特征通过两层MLP进行融合。对于我们的转录器模型,我们利用了TorchAudio库,该库包含一个编码器(Emformer)、一个预测器和一个联合网络。所提出的AV-ASR模型的架构如下所示。
- en: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/architecture.png](../Images/ed7f525d50ee520d70b7e9c6f6b7fd66.png)'
id: totrans-37
prefs: []
type: TYPE_IMG
zh: '![https://download.pytorch.org/torchaudio/doc-assets/avsr/architecture.png](../Images/ed7f525d50ee520d70b7e9c6f6b7fd66.png)'
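To make the fusion step concrete, here is a hedged sketch of a two-layer MLP fusing per-modality features; the dimensions and layer choices are illustrative assumptions, and the actual model construction is in the code block that follows.

```python
# Sketch: fuse audio and video features with a two-layer MLP.
import torch
from torch import nn


class FusionMLP(nn.Module):
    def __init__(self, audio_dim=512, video_dim=512, hidden_dim=1024, out_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(audio_dim + video_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, audio_feats, video_feats):
        # Both inputs are (batch, time, dim); concatenate along features.
        return self.mlp(torch.cat([audio_feats, video_feats], dim=-1))
```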
- en: '[PRE3]'
id: totrans-38
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: 4\. The main process[](#the-main-process "Permalink to this heading")
id: totrans-39
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 4\. 主进程[](#the-main-process "跳转到此标题")
- en: 'The execution flow of the main process is as follows:'
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: 主进程的执行流程如下:
- en: Initialize the inference pipeline.
id: totrans-41
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 初始化推断流程。
- en: Launch data acquisition subprocess.
id: totrans-42
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 启动数据采集子进程。
- en: Run inference.
id: totrans-43
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 运行推断。
- en: Clean up.
id: totrans-44
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 清理。
- en: '[PRE4]'
id: totrans-45
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: '[PRE5]'
id: totrans-46
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
- en: 'Tag: [`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")'
id: totrans-47
prefs: []
type: TYPE_NORMAL
zh: 标签:[`torchaudio.io`](../io.html#module-torchaudio.io "torchaudio.io")
- en: '**Total running time of the script:** ( 0 minutes 0.000 seconds)'
id: totrans-48
prefs: []
type: TYPE_NORMAL
zh: '**脚本的总运行时间:**(0分钟0.000秒)'
- en: '[`Download Python source code: device_avsr.py`](../_downloads/e10abb57121274b0bbaca74dbbd1fbc4/device_avsr.py)'
id: totrans-49
prefs: []
type: TYPE_NORMAL
zh: '[`下载Python源代码:device_avsr.py`](../_downloads/e10abb57121274b0bbaca74dbbd1fbc4/device_avsr.py)'
- en: '[`Download Jupyter notebook: device_avsr.ipynb`](../_downloads/eb72a6f2273304a15352dfcf3b824b42/device_avsr.ipynb)'
id: totrans-50
prefs: []
type: TYPE_NORMAL
zh: '[`下载Jupyter笔记本:device_avsr.ipynb`](../_downloads/eb72a6f2273304a15352dfcf3b824b42/device_avsr.ipynb)'
- en: '[Gallery generated by Sphinx-Gallery](https://sphinx-gallery.github.io)'
id: totrans-51
prefs: []
type: TYPE_NORMAL
zh: '[Sphinx-Gallery生成的图库](https://sphinx-gallery.github.io)'
- en: Training Recipes
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 训练食谱
- en: Python API Reference
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: Python API 参考文档
- en: torchaudio.models.decoder
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: torchaudio.models.decoder
- en: 原文:[https://pytorch.org/audio/stable/models.decoder.html](https://pytorch.org/audio/stable/models.decoder.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/models.decoder.html](https://pytorch.org/audio/stable/models.decoder.html)
- en: '## CTC Decoder[](#ctc-decoder "Permalink to this heading")'
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: '## CTC解码器[](#ctc-decoder "跳转到此标题")'
- en: '| [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder
"torchaudio.models.decoder.CTCDecoder") | CTC beam search decoder from *Flashlight*
[[Kahn *et al.*, 2022](references.html#id35 "Jacob Kahn, Vineel Pratap, Tatiana
Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard
Grave, Gilad Avidov, and others. Flashlight: enabling innovation in tools for
machine learning. arXiv preprint arXiv:2201.12465, 2022.")]. |'
id: totrans-3
prefs: []
type: TYPE_TB
zh: '| [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder
"torchaudio.models.decoder.CTCDecoder") | 来自 *Flashlight* 的CTC波束搜索解码器 [[Kahn *et
al.*, 2022](references.html#id35 "Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko,
Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad
Avidov, and others. Flashlight: enabling innovation in tools for machine learning.
arXiv preprint arXiv:2201.12465, 2022.")]。 |'
- en: '| [`ctc_decoder`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder
"torchaudio.models.decoder.ctc_decoder") | Builds an instance of [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder
"torchaudio.models.decoder.CTCDecoder"). |'
id: totrans-4
prefs: []
type: TYPE_TB
zh: '| [`ctc_decoder`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder
"torchaudio.models.decoder.ctc_decoder") | 构建 [`CTCDecoder`](generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder
"torchaudio.models.decoder.CTCDecoder") 的实例。 |'
- en: '| [`download_pretrained_files`](generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files
"torchaudio.models.decoder.download_pretrained_files") | Retrieves pretrained
data files used for [`ctc_decoder()`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder
"torchaudio.models.decoder.ctc_decoder"). |'
id: totrans-5
prefs: []
type: TYPE_TB
zh: '| [`download_pretrained_files`](generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files
"torchaudio.models.decoder.download_pretrained_files") | 获取用于 [`ctc_decoder()`](generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder
"torchaudio.models.decoder.ctc_decoder") 的预训练数据文件。 |'
- en: Tutorials using CTC Decoder
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: 使用CTC解码器的教程
- en: '![ASR Inference with CTC Decoder](../Images/260e63239576cae8ee00cfcba8e4889e.png)'
id: totrans-7
prefs: []
type: TYPE_IMG
zh: '![使用CTC解码器的ASR推理](../Images/260e63239576cae8ee00cfcba8e4889e.png)'
- en: '[ASR Inference with CTC Decoder](tutorials/asr_inference_with_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-ctc-decoder-tutorial-py)'
id: totrans-8
prefs: []
type: TYPE_NORMAL
zh: '[使用CTC解码器的ASR推理](tutorials/asr_inference_with_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-ctc-decoder-tutorial-py)'
- en: ASR Inference with CTC Decoder
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: 使用CTC解码器的ASR推理
- en: CUDA CTC Decoder[](#cuda-ctc-decoder "Permalink to this heading")
id: totrans-10
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: CUDA CTC解码器[](#cuda-ctc-decoder "跳转到此标题")
- en: '| [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
"torchaudio.models.decoder.CUCTCDecoder") | CUDA CTC beam search decoder. |'
id: totrans-11
prefs: []
type: TYPE_TB
zh: '| [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
"torchaudio.models.decoder.CUCTCDecoder") | CUDA CTC波束搜索解码器。 |'
- en: '| [`cuda_ctc_decoder`](generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
"torchaudio.models.decoder.cuda_ctc_decoder") | Builds an instance of [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
"torchaudio.models.decoder.CUCTCDecoder"). |'
id: totrans-12
prefs: []
type: TYPE_TB
zh: '| [`cuda_ctc_decoder`](generated/torchaudio.models.decoder.cuda_ctc_decoder.html#torchaudio.models.decoder.cuda_ctc_decoder
"torchaudio.models.decoder.cuda_ctc_decoder") | 构建 [`CUCTCDecoder`](generated/torchaudio.models.decoder.CUCTCDecoder.html#torchaudio.models.decoder.CUCTCDecoder
"torchaudio.models.decoder.CUCTCDecoder") 的实例。 |'
- en: Tutorials using CUDA CTC Decoder
id: totrans-13
prefs: []
type: TYPE_NORMAL
zh: 使用CUDA CTC解码器的教程
- en: '![ASR Inference with CUDA CTC Decoder](../Images/9d0a043104707d980656cfaf03fdd1a1.png)'
id: totrans-14
prefs: []
type: TYPE_IMG
zh: '![使用CUDA CTC解码器的ASR推理](../Images/9d0a043104707d980656cfaf03fdd1a1.png)'
- en: '[ASR Inference with CUDA CTC Decoder](tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)'
id: totrans-15
prefs: []
type: TYPE_NORMAL
zh: '[使用CUDA CTC解码器的ASR推理](tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html#sphx-glr-tutorials-asr-inference-with-cuda-ctc-decoder-tutorial-py)'
- en: ASR Inference with CUDA CTC Decoder
id: totrans-16
prefs: []
type: TYPE_NORMAL
zh: 使用CUDA CTC解码器的ASR推理
- en: torio.utils
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: torio.utils
- en: 原文:[https://pytorch.org/audio/stable/torio.utils.html](https://pytorch.org/audio/stable/torio.utils.html)
id: totrans-1
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 原文:[https://pytorch.org/audio/stable/torio.utils.html](https://pytorch.org/audio/stable/torio.utils.html)
- en: '`torio.utils` module contains utility functions to query and configure the
global state of third party libraries.'
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: '`torio.utils` 模块包含用于查询和配置第三方库全局状态的实用函数。'
- en: '| [`ffmpeg_utils`](generated/torio.utils.ffmpeg_utils.html#module-torio.utils.ffmpeg_utils
"torio.utils.ffmpeg_utils") | Module to change the configuration of FFmpeg libraries
(such as libavformat). |'
id: totrans-3
prefs: []
type: TYPE_TB
zh: '| [`ffmpeg_utils`](generated/torio.utils.ffmpeg_utils.html#module-torio.utils.ffmpeg_utils
"torio.utils.ffmpeg_utils") | 用于更改 FFmpeg 库(如 libavformat)配置的模块。 '
- en: Python Prototype API Reference
id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: Python 原型 API 参考