- en: ASR Inference with CTC Decoder id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL zh: 使用CTC解码器进行ASR推断 - en: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html) id: totrans-1 prefs: - PREF_BQ type: TYPE_NORMAL zh: 原文:[https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html](https://pytorch.org/audio/stable/tutorials/asr_inference_with_ctc_decoder_tutorial.html) - en: '**Author**: [Caroline Chen](mailto:carolinechen%40meta.com)' id: totrans-4 prefs: [] type: TYPE_NORMAL zh: '**作者**:[Caroline Chen](mailto:carolinechen%40meta.com)' - en: This tutorial shows how to perform speech recognition inference using a CTC beam search decoder with lexicon constraint and KenLM language model support. We demonstrate this on a pretrained wav2vec 2.0 model trained using CTC loss. id: totrans-5 prefs: [] type: TYPE_NORMAL zh: 本教程展示了如何使用带有词典约束和KenLM语言模型支持的CTC波束搜索解码器执行语音识别推断。我们在使用CTC损失训练的预训练wav2vec 2.0模型上演示了这一点。 - en: Overview[](#overview "Permalink to this heading") id: totrans-6 prefs: - PREF_H2 type: TYPE_NORMAL zh: 概述[](#overview "跳转到此标题的永久链接") - en: Beam search decoding works by iteratively expanding text hypotheses (beams) with next possible characters, and maintaining only the hypotheses with the highest scores at each time step. A language model can be incorporated into the scoring computation, and adding a lexicon constraint restricts the next possible tokens for the hypotheses so that only words from the lexicon can be generated. 
id: totrans-7 prefs: [] type: TYPE_NORMAL zh: 波束搜索解码通过迭代扩展文本假设(波束)并使用下一个可能的字符,每个时间步仅保留具有最高分数的假设来工作。语言模型可以并入到得分计算中,添加词典约束会限制假设的下一个可能令牌,以便只能生成词典中的单词。 - en: The underlying implementation is ported from [Flashlight](https://arxiv.org/pdf/2201.12465.pdf)’s beam search decoder. A mathematical formula for the decoder optimization can be found in the [Wav2Letter paper](https://arxiv.org/pdf/1609.03193.pdf), and a more detailed algorithm can be found in this [blog](https://towardsdatascience.com/boosting-your-sequence-generation-performance-with-beam-search-language-model-decoding-74ee64de435a). id: totrans-8 prefs: [] type: TYPE_NORMAL zh: 底层实现是从[Flashlight](https://arxiv.org/pdf/2201.12465.pdf)的波束搜索解码器移植过来的。解码器优化的数学公式可以在[Wav2Letter论文](https://arxiv.org/pdf/1609.03193.pdf)中找到,更详细的算法可以在这篇[博客](https://towardsdatascience.com/boosting-your-sequence-generation-performance-with-beam-search-language-model-decoding-74ee64de435a)中找到。 - en: Running ASR inference using a CTC Beam Search decoder with a language model and lexicon constraint requires the following components id: totrans-9 prefs: [] type: TYPE_NORMAL zh: 使用带有语言模型和词典约束的CTC波束搜索解码器进行ASR推断需要以下组件 - en: 'Acoustic Model: model predicting phonetics from audio waveforms' id: totrans-10 prefs: - PREF_UL type: TYPE_NORMAL zh: 声学模型:从音频波形预测语音学的模型 - en: 'Tokens: the possible predicted tokens from the acoustic model' id: totrans-11 prefs: - PREF_UL type: TYPE_NORMAL zh: 令牌:声学模型可能预测的令牌 - en: 'Lexicon: mapping between possible words and their corresponding tokens sequence' id: totrans-12 prefs: - PREF_UL type: TYPE_NORMAL zh: 词典:可能单词与其对应的令牌序列之间的映射 - en: 'Language Model (LM): n-gram language model trained with the [KenLM library](https://kheafield.com/code/kenlm/), or custom language model that inherits [`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM "torchaudio.models.decoder.CTCDecoderLM")' id: totrans-13 prefs: - PREF_UL type: TYPE_NORMAL zh: 
语言模型(LM):使用[KenLM库](https://kheafield.com/code/kenlm/)训练的n-gram语言模型,或者继承了[`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM "torchaudio.models.decoder.CTCDecoderLM")的自定义语言模型 - en: Acoustic Model and Set Up[](#acoustic-model-and-set-up "Permalink to this heading") id: totrans-14 prefs: - PREF_H2 type: TYPE_NORMAL zh: 声学模型和设置[](#acoustic-model-and-set-up "跳转到此标题的永久链接") - en: First we import the necessary utilities and fetch the data that we are working with id: totrans-15 prefs: [] type: TYPE_NORMAL zh: 首先,我们导入必要的工具并获取我们正在处理的数据 - en: '[PRE0]' id: totrans-16 prefs: [] type: TYPE_PRE zh: '[PRE0]' - en: '[PRE1]' id: totrans-17 prefs: [] type: TYPE_PRE zh: '[PRE1]' - en: '[PRE2]' id: totrans-18 prefs: [] type: TYPE_PRE zh: '[PRE2]' - en: We use the pretrained [Wav2Vec 2.0](https://arxiv.org/abs/2006.11477) Base model that is finetuned on 10 min of the [LibriSpeech dataset](http://www.openslr.org/12), which can be loaded in using [`torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M "torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M"). For more detail on running Wav2Vec 2.0 speech recognition pipelines in torchaudio, please refer to [this tutorial](./speech_recognition_pipeline_tutorial.html). 
id: totrans-19 prefs: [] type: TYPE_NORMAL zh: 我们使用预训练的[Wav2Vec 2.0](https://arxiv.org/abs/2006.11477)基础模型,该模型在10分钟的[LibriSpeech数据集](http://www.openslr.org/12)上进行了微调,可以使用[`torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M`](../generated/torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M "torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M")加载。有关在torchaudio中运行Wav2Vec 2.0语音识别流水线的更多详细信息,请参考[此教程](./speech_recognition_pipeline_tutorial.html)。 - en: '[PRE3]' id: totrans-20 prefs: [] type: TYPE_PRE zh: '[PRE3]' - en: '[PRE4]' id: totrans-21 prefs: [] type: TYPE_PRE zh: '[PRE4]' - en: We will load a sample from the LibriSpeech test-other dataset. id: totrans-22 prefs: [] type: TYPE_NORMAL zh: 我们将从LibriSpeech测试集中加载一个样本。 - en: '[PRE5]' id: totrans-23 prefs: [] type: TYPE_PRE zh: '[PRE5]' - en: The transcript corresponding to this audio file is id: totrans-26 prefs: [] type: TYPE_NORMAL zh: 与此音频文件对应的转录本是 - en: '[PRE6]' id: totrans-27 prefs: [] type: TYPE_PRE zh: '[PRE6]' - en: '[PRE7]' id: totrans-28 prefs: [] type: TYPE_PRE zh: '[PRE7]' - en: Files and Data for Decoder[](#files-and-data-for-decoder "Permalink to this heading") id: totrans-29 prefs: - PREF_H2 type: TYPE_NORMAL zh: 解码器的文件和数据[](#files-and-data-for-decoder "跳转到此标题的永久链接") - en: Next, we load in our token, lexicon, and language model data, which are used by the decoder to predict words from the acoustic model output. Pretrained files for the LibriSpeech dataset can be downloaded through torchaudio, or the user can provide their own files. 
id: totrans-30 prefs: [] type: TYPE_NORMAL zh: 接下来,我们加载我们的令牌、词典和语言模型数据,这些数据由解码器用于从声学模型输出中预测单词。LibriSpeech数据集的预训练文件可以通过torchaudio下载,或者用户可以提供自己的文件。 - en: Tokens[](#tokens "Permalink to this heading") id: totrans-31 prefs: - PREF_H3 type: TYPE_NORMAL zh: 令牌[](#tokens "跳转到此标题的永久链接") - en: The tokens are the possible symbols that the acoustic model can predict, including the blank and silent symbols. It can either be passed in as a file, where each line consists of the tokens corresponding to the same index, or as a list of tokens, each mapping to a unique index. id: totrans-32 prefs: [] type: TYPE_NORMAL zh: 令牌是声学模型可以预测的可能符号,包括空白和静音符号。它可以作为一个文件传递,其中每一行都包含与相同索引对应的令牌,或者作为令牌列表传递,每个令牌映射到一个唯一的索引。 - en: '[PRE8]' id: totrans-33 prefs: [] type: TYPE_PRE zh: '[PRE8]' - en: '[PRE9]' id: totrans-34 prefs: [] type: TYPE_PRE zh: '[PRE9]' - en: '[PRE10]' id: totrans-35 prefs: [] type: TYPE_PRE zh: '[PRE10]' - en: Lexicon[](#lexicon "Permalink to this heading") id: totrans-36 prefs: - PREF_H3 type: TYPE_NORMAL zh: 词典[](#lexicon "跳转到此标题的永久链接") - en: The lexicon is a mapping from words to their corresponding tokens sequence, and is used to restrict the search space of the decoder to only words from the lexicon. The expected format of the lexicon file is a line per word, with a word followed by its space-split tokens. id: totrans-37 prefs: [] type: TYPE_NORMAL zh: 词典是从单词到其对应标记序列的映射,并用于将解码器的搜索空间限制为仅来自词典的单词。词典文件的预期格式是每行一个单词,后跟其空格分隔的标记。 - en: '[PRE11]' id: totrans-38 prefs: [] type: TYPE_PRE zh: '[PRE11]' - en: Language Model[](#language-model "Permalink to this heading") id: totrans-39 prefs: - PREF_H3 type: TYPE_NORMAL zh: 语言模型 - en: A language model can be used in decoding to improve the results, by factoring in a language model score that represents the likelihood of the sequence into the beam search computation. Below, we outline the different forms of language models that are supported for decoding. 
id: totrans-40 prefs: [] type: TYPE_NORMAL zh: 在解码中可以使用语言模型来改善结果,通过将代表序列可能性的语言模型分数纳入到波束搜索计算中。下面,我们概述了支持解码的不同形式的语言模型。 - en: No Language Model[](#no-language-model "Permalink to this heading") id: totrans-41 prefs: - PREF_H4 type: TYPE_NORMAL zh: 无语言模型 - en: To create a decoder instance without a language model, set lm=None when initializing the decoder. id: totrans-42 prefs: [] type: TYPE_NORMAL zh: 要创建一个没有语言模型的解码器实例,请在初始化解码器时设置lm=None。 - en: KenLM[](#kenlm "Permalink to this heading") id: totrans-43 prefs: - PREF_H4 type: TYPE_NORMAL zh: KenLM - en: This is an n-gram language model trained with the [KenLM library](https://kheafield.com/code/kenlm/). Both the `.arpa` or the binarized `.bin` LM can be used, but the binary format is recommended for faster loading. id: totrans-44 prefs: [] type: TYPE_NORMAL zh: 这是一个使用KenLM库训练的n-gram语言模型。可以使用`.arpa`或二进制化的`.bin`语言模型,但建议使用二进制格式以加快加载速度。 - en: The language model used in this tutorial is a 4-gram KenLM trained using [LibriSpeech](http://www.openslr.org/11). id: totrans-45 prefs: [] type: TYPE_NORMAL zh: 本教程中使用的语言模型是使用[LibriSpeech](http://www.openslr.org/11)训练的4-gram KenLM。 - en: Custom Language Model[](#custom-language-model "Permalink to this heading") id: totrans-46 prefs: - PREF_H4 type: TYPE_NORMAL zh: 自定义语言模型 - en: Users can define their own custom language model in Python, whether it be a statistical or neural network language model, using [`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM "torchaudio.models.decoder.CTCDecoderLM") and [`CTCDecoderLMState`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLMState "torchaudio.models.decoder.CTCDecoderLMState"). 
id: totrans-47 prefs: [] type: TYPE_NORMAL zh: 用户可以在Python中定义自己的自定义语言模型,无论是统计还是神经网络语言模型,使用[`CTCDecoderLM`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLM)和[`CTCDecoderLMState`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoderLMState)。 - en: For instance, the following code creates a basic wrapper around a PyTorch `torch.nn.Module` language model. id: totrans-48 prefs: [] type: TYPE_NORMAL zh: 例如,以下代码创建了一个围绕PyTorch `torch.nn.Module`语言模型的基本包装器。 - en: '[PRE12]' id: totrans-49 prefs: [] type: TYPE_PRE zh: '[PRE12]' - en: Downloading Pretrained Files[](#downloading-pretrained-files "Permalink to this heading") id: totrans-50 prefs: - PREF_H4 type: TYPE_NORMAL zh: 下载预训练文件 - en: Pretrained files for the LibriSpeech dataset can be downloaded using [`download_pretrained_files()`](../generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files "torchaudio.models.decoder.download_pretrained_files"). id: totrans-51 prefs: [] type: TYPE_NORMAL zh: 可以使用[`download_pretrained_files()`](../generated/torchaudio.models.decoder.download_pretrained_files.html#torchaudio.models.decoder.download_pretrained_files)下载LibriSpeech数据集的预训练文件。 - en: 'Note: this cell may take a couple of minutes to run, as the language model can be large' id: totrans-52 prefs: [] type: TYPE_NORMAL zh: 注意:此单元格可能需要几分钟才能运行,因为语言模型可能很大 - en: '[PRE13]' id: totrans-53 prefs: [] type: TYPE_PRE zh: '[PRE13]' - en: '[PRE14]' id: totrans-54 prefs: [] type: TYPE_PRE zh: '[PRE14]' - en: Construct Decoders[](#construct-decoders "Permalink to this heading") id: totrans-55 prefs: - PREF_H2 type: TYPE_NORMAL zh: 构建解码器 - en: In this tutorial, we construct both a beam search decoder and a greedy decoder for comparison. 
id: totrans-56 prefs: [] type: TYPE_NORMAL zh: 在本教程中,我们构建了波束搜索解码器和贪婪解码器进行比较。 - en: Beam Search Decoder[](#beam-search-decoder "Permalink to this heading") id: totrans-57 prefs: - PREF_H3 type: TYPE_NORMAL zh: 波束搜索解码器 - en: The decoder can be constructed using the factory function [`ctc_decoder()`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder "torchaudio.models.decoder.ctc_decoder"). In addition to the previously mentioned components, it also takes in various beam search decoding parameters and token/word parameters. id: totrans-58 prefs: [] type: TYPE_NORMAL zh: 可以使用工厂函数[`ctc_decoder()`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder)构建解码器。除了先前提到的组件外,它还接受各种波束搜索解码参数和标记/单词参数。 - en: This decoder can also be run without a language model by passing in None into the lm parameter. id: totrans-59 prefs: [] type: TYPE_NORMAL zh: 这个解码器也可以在没有语言模型的情况下运行,通过将None传递给lm参数。 - en: '[PRE15]' id: totrans-60 prefs: [] type: TYPE_PRE zh: '[PRE15]' - en: Greedy Decoder[](#greedy-decoder "Permalink to this heading") id: totrans-61 prefs: - PREF_H3 type: TYPE_NORMAL zh: 贪婪解码器 - en: '[PRE16]' id: totrans-62 prefs: [] type: TYPE_PRE zh: '[PRE16]' - en: Run Inference[](#run-inference "Permalink to this heading") id: totrans-63 prefs: - PREF_H2 type: TYPE_NORMAL zh: 运行推理 - en: Now that we have the data, acoustic model, and decoder, we can perform inference. The output of the beam search decoder is of type [`CTCHypothesis`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis "torchaudio.models.decoder.CTCHypothesis"), consisting of the predicted token IDs, corresponding words (if a lexicon is provided), hypothesis score, and timesteps corresponding to the token IDs. 
Recall the transcript corresponding to the waveform is id: totrans-64 prefs: [] type: TYPE_NORMAL zh: 现在我们有了数据、声学模型和解码器,我们可以执行推理。波束搜索解码器的输出类型为[`CTCHypothesis`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis),包括预测的标记ID、对应的单词(如果提供了词典)、假设分数和与标记ID对应的时间步。回想一下与波形对应的转录是 - en: '[PRE17]' id: totrans-65 prefs: [] type: TYPE_PRE zh: '[PRE17]' - en: '[PRE18]' id: totrans-66 prefs: [] type: TYPE_PRE zh: '[PRE18]' - en: The greedy decoder gives the following result. id: totrans-67 prefs: [] type: TYPE_NORMAL zh: 贪婪解码器给出以下结果。 - en: '[PRE19]' id: totrans-68 prefs: [] type: TYPE_PRE zh: '[PRE19]' - en: '[PRE20]' id: totrans-69 prefs: [] type: TYPE_PRE zh: '[PRE20]' - en: 'Using the beam search decoder:' id: totrans-70 prefs: [] type: TYPE_NORMAL zh: 使用波束搜索解码器: - en: '[PRE21]' id: totrans-71 prefs: [] type: TYPE_PRE zh: '[PRE21]' - en: '[PRE22]' id: totrans-72 prefs: [] type: TYPE_PRE zh: '[PRE22]' - en: Note id: totrans-73 prefs: [] type: TYPE_NORMAL zh: 注意 - en: The [`words`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis.words "torchaudio.models.decoder.CTCHypothesis.words") field of the output hypotheses will be empty if no lexicon is provided to the decoder. To retrieve a transcript with lexicon-free decoding, you can perform the following to retrieve the token indices, convert them to original tokens, then join them together. 
id: totrans-74 prefs: [] type: TYPE_NORMAL zh: 如果解码器没有提供词典,输出假设的[`words`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCHypothesis.words "torchaudio.models.decoder.CTCHypothesis.words")字段将为空。要获取无词典解码的转录,可以执行以下操作:检索标记索引,将其转换为原始标记,然后将它们连接在一起。 - en: '[PRE23]' id: totrans-75 prefs: [] type: TYPE_PRE zh: '[PRE23]' - en: We see that the lexicon-constrained beam search decoder produces a more accurate transcript consisting of real words, while the greedy decoder can predict misspelled words such as “affrayd” and “shoktd”. id: totrans-76 prefs: [] type: TYPE_NORMAL zh: 我们看到,受词典约束的波束搜索解码器产生了更准确的转录,其中只包含真实单词,而贪婪解码器可能会预测拼写错误的单词,如“affrayd”和“shoktd”。 - en: Incremental decoding[](#incremental-decoding "Permalink to this heading") id: totrans-77 prefs: - PREF_H3 type: TYPE_NORMAL zh: 增量解码 - en: If the input speech is long, one can decode the emission in an incremental manner. id: totrans-78 prefs: [] type: TYPE_NORMAL zh: 如果输入语音很长,可以以增量方式解码发射。 - en: You need to first initialize the internal state of the decoder with [`decode_begin()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_begin "torchaudio.models.decoder.CTCDecoder.decode_begin"). id: totrans-79 prefs: [] type: TYPE_NORMAL zh: 您需要首先使用[`decode_begin()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_begin "torchaudio.models.decoder.CTCDecoder.decode_begin")初始化解码器的内部状态。 - en: '[PRE24]' id: totrans-80 prefs: [] type: TYPE_PRE zh: '[PRE24]' - en: Then, you can pass emissions to [`decode_step()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_step "torchaudio.models.decoder.CTCDecoder.decode_step"). Here we use the same emission but pass it to the decoder one frame at a time. 
id: totrans-81 prefs: [] type: TYPE_NORMAL zh: 然后,您可以将发射传递给[`decode_step()`](../generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder.decode_step "torchaudio.models.decoder.CTCDecoder.decode_step")。在这里,我们使用相同的发射,但是一次将其传递给解码器一个帧。 - en: '[PRE25]' id: totrans-82 prefs: [] type: TYPE_PRE zh: '[PRE25]' - en: Finally, finalize the internal state of the decoder, and retrieve the result. id: totrans-83 prefs: [] type: TYPE_NORMAL zh: 最后,完成解码器的内部状态,并检索结果。 - en: '[PRE26]' id: totrans-84 prefs: [] type: TYPE_PRE zh: '[PRE26]' - en: The result of incremental decoding is identical to that of batch decoding. id: totrans-85 prefs: [] type: TYPE_NORMAL zh: 增量解码的结果与批量解码相同。 - en: '[PRE27]' id: totrans-86 prefs: [] type: TYPE_PRE zh: '[PRE27]' - en: '[PRE28]' id: totrans-87 prefs: [] type: TYPE_PRE zh: '[PRE28]' - en: Timestep Alignments[](#timestep-alignments "Permalink to this heading") id: totrans-88 prefs: - PREF_H2 type: TYPE_NORMAL zh: 时间步对齐 - en: Recall that each resulting hypothesis includes the timesteps corresponding to its token IDs. id: totrans-89 prefs: [] type: TYPE_NORMAL zh: 回想一下,生成的假设中的一个组成部分是与标记ID对应的时间步。 - en: '[PRE29]' id: totrans-90 prefs: [] type: TYPE_PRE zh: '[PRE29]' - en: '[PRE30]' id: totrans-91 prefs: [] type: TYPE_PRE zh: '[PRE30]' - en: Below, we visualize the token timestep alignments relative to the original waveform. 
id: totrans-92 prefs: [] type: TYPE_NORMAL zh: 下面,我们将标记时间步对齐可视化相对于原始波形。 - en: '[PRE31]' id: totrans-93 prefs: [] type: TYPE_PRE zh: '[PRE31]' - en: '![asr inference with ctc decoder tutorial](../Images/e2abf68b7cace07964d5580316ac4575.png)' id: totrans-94 prefs: [] type: TYPE_IMG zh: '![带有ctc解码器教程的asr推理](../Images/e2abf68b7cace07964d5580316ac4575.png)' - en: Beam Search Decoder Parameters[](#beam-search-decoder-parameters "Permalink to this heading") id: totrans-95 prefs: - PREF_H2 type: TYPE_NORMAL zh: 波束搜索解码器参数 - en: In this section, we go a little bit more in depth about some different parameters and tradeoffs. For the full list of customizable parameters, please refer to the [`documentation`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder "torchaudio.models.decoder.ctc_decoder"). id: totrans-96 prefs: [] type: TYPE_NORMAL zh: 在本节中,我们将更深入地讨论一些不同的参数和权衡。有关可定制参数的完整列表,请参考[`文档`](../generated/torchaudio.models.decoder.ctc_decoder.html#torchaudio.models.decoder.ctc_decoder "torchaudio.models.decoder.ctc_decoder")。 - en: Helper Function[](#helper-function "Permalink to this heading") id: totrans-97 prefs: - PREF_H3 type: TYPE_NORMAL zh: 辅助函数 - en: '[PRE32]' id: totrans-98 prefs: [] type: TYPE_PRE zh: '[PRE32]' - en: nbest[](#nbest "Permalink to this heading") id: totrans-99 prefs: - PREF_H3 type: TYPE_NORMAL zh: nbest - en: This parameter indicates the number of best hypotheses to return, which is a property that is not possible with the greedy decoder. For instance, by setting `nbest=3` when constructing the beam search decoder earlier, we can now access the hypotheses with the top 3 scores. 
id: totrans-100 prefs: [] type: TYPE_NORMAL zh: 此参数指示要返回的最佳假设数,这是贪婪解码器无法实现的属性。例如,在构建波束搜索解码器时设置`nbest=3`,现在我们可以访问得分最高的三个假设。 - en: '[PRE33]' id: totrans-101 prefs: [] type: TYPE_PRE zh: '[PRE33]' - en: '[PRE34]' id: totrans-102 prefs: [] type: TYPE_PRE zh: '[PRE34]' - en: beam size[](#beam-size "Permalink to this heading") id: totrans-103 prefs: - PREF_H3 type: TYPE_NORMAL zh: 波束大小 - en: The `beam_size` parameter determines the maximum number of best hypotheses to hold after each decoding step. Using larger beam sizes allows for exploring a larger range of possible hypotheses, which can produce hypotheses with higher scores, but it is computationally more expensive and does not provide additional gains beyond a certain point. id: totrans-104 prefs: [] type: TYPE_NORMAL zh: '`beam_size`参数确定每个解码步骤后保留的最佳假设数的最大值。使用更大的波束大小可以探索更广泛的可能假设范围,这可能会产生得分更高的假设,但在计算上更昂贵,并且在某一点之后不提供额外的收益。' - en: In the example below, we see improvement in decoding quality as we increase the beam size from 1 to 5 to 50, but notice how a beam size of 500 produces the same output as a beam size of 50 while increasing the computation time. id: totrans-105 prefs: [] type: TYPE_NORMAL zh: 在下面的示例中,我们看到随着将波束大小从1增加到5再到50,解码质量有所提高,但请注意,使用波束大小为500时提供与波束大小为50相同的输出,同时增加了计算时间。 - en: '[PRE35]' id: totrans-106 prefs: [] type: TYPE_PRE zh: '[PRE35]' - en: '[PRE36]' id: totrans-107 prefs: [] type: TYPE_PRE zh: '[PRE36]' - en: beam size token[](#beam-size-token "Permalink to this heading") id: totrans-108 prefs: - PREF_H3 type: TYPE_NORMAL zh: 波束大小标记 - en: The `beam_size_token` parameter corresponds to the number of tokens to consider for expanding each hypothesis at the decoding step. Exploring a larger number of next possible tokens increases the range of potential hypotheses at the cost of computation. 
id: totrans-109 prefs: [] type: TYPE_NORMAL zh: '`beam_size_token`参数对应于在解码步骤中考虑扩展每个假设的标记数。探索更多可能的下一个标记数量会增加潜在假设的范围,但会增加计算成本。' - en: '[PRE37]' id: totrans-110 prefs: [] type: TYPE_PRE zh: '[PRE37]' - en: '[PRE38]' id: totrans-111 prefs: [] type: TYPE_PRE zh: '[PRE38]' - en: beam threshold[](#beam-threshold "Permalink to this heading") id: totrans-112 prefs: - PREF_H3 type: TYPE_NORMAL zh: 波束阈值 - en: The `beam_threshold` parameter is used to prune the stored hypotheses set at each decoding step, removing hypotheses whose scores are more than `beam_threshold` below that of the highest scoring hypothesis. There is a balance between choosing smaller thresholds to prune more hypotheses and reduce the search space, and choosing a large enough threshold such that plausible hypotheses are not pruned. id: totrans-113 prefs: [] type: TYPE_NORMAL zh: '`beam_threshold`参数用于在每个解码步骤中修剪存储的假设集,删除分数比最高分假设低`beam_threshold`以上的假设。需要在两者之间取得平衡:选择较小的阈值以修剪更多假设并缩小搜索空间,以及选择足够大的阈值以确保不会修剪合理的假设。' - en: '[PRE39]' id: totrans-114 prefs: [] type: TYPE_PRE zh: '[PRE39]' - en: '[PRE40]' id: totrans-115 prefs: [] type: TYPE_PRE zh: '[PRE40]' - en: language model weight[](#language-model-weight "Permalink to this heading") id: totrans-116 prefs: - PREF_H3 type: TYPE_NORMAL zh: 语言模型权重[](#language-model-weight "跳转到此标题的永久链接") - en: The `lm_weight` parameter is the weight assigned to the language model score, which is accumulated with the acoustic model score to determine the overall score. Larger weights encourage the model to predict next words based on the language model, while smaller weights give more weight to the acoustic model score instead. 
id: totrans-117 prefs: [] type: TYPE_NORMAL zh: '`lm_weight`参数是要分配给语言模型分数的权重,该分数将与声学模型分数累积以确定总体分数。较大的权重鼓励模型基于语言模型预测下一个单词,而较小的权重则更多地将权重放在声学模型分数上。' - en: '[PRE41]' id: totrans-118 prefs: [] type: TYPE_PRE zh: '[PRE41]' - en: '[PRE42]' id: totrans-119 prefs: [] type: TYPE_PRE zh: '[PRE42]' - en: additional parameters[](#additional-parameters "Permalink to this heading") id: totrans-120 prefs: - PREF_H3 type: TYPE_NORMAL zh: 其他参数[](#additional-parameters "跳转到此标题的永久链接") - en: Additional parameters that can be optimized include the following id: totrans-121 prefs: [] type: TYPE_NORMAL zh: 可以优化的其他参数包括以下内容 - en: '`word_score`: score to add when word finishes' id: totrans-122 prefs: - PREF_UL type: TYPE_NORMAL zh: '`word_score`: 单词结束时要添加的分数' - en: '`unk_score`: unknown word appearance score to add' id: totrans-123 prefs: - PREF_UL type: TYPE_NORMAL zh: '`unk_score`: 添加未知单词出现分数' - en: '`sil_score`: silence appearance score to add' id: totrans-124 prefs: - PREF_UL type: TYPE_NORMAL zh: '`sil_score`: 添加静音出现分数' - en: '`log_add`: whether to use log add for lexicon Trie smearing' id: totrans-125 prefs: - PREF_UL type: TYPE_NORMAL zh: '`log_add`: 是否对词典Trie扩散使用对数相加'
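The scoring and pruning parameters described in the sections above (`lm_weight`, `word_score`, `beam_threshold`, `beam_size`) can be illustrated with a small, self-contained sketch. This is a simplified illustration written for this tutorial, not the torchaudio or Flashlight implementation: the `Hypothesis` class and the `total_score` and `prune` helpers are hypothetical names, and the additive score combination is an assumption that mirrors the descriptions above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Hypothetical container for one beam (not a torchaudio class)."""
    text: str
    am_score: float  # accumulated acoustic model log-probability
    lm_score: float  # accumulated language model log-probability
    num_words: int   # number of completed words


def total_score(hyp: Hypothesis, lm_weight: float, word_score: float) -> float:
    # Assumed combination: the acoustic score plus the weighted LM score,
    # plus a per-word bonus added each time a word finishes.
    return hyp.am_score + lm_weight * hyp.lm_score + word_score * hyp.num_words


def prune(hyps: List[Hypothesis], beam_size: int, beam_threshold: float,
          lm_weight: float, word_score: float) -> List[Hypothesis]:
    """Keep at most beam_size hypotheses within beam_threshold of the best."""
    if not hyps:
        return []
    scored = sorted(
        ((total_score(h, lm_weight, word_score), h) for h in hyps),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best = scored[0][0]
    # Drop hypotheses scoring more than beam_threshold below the best one.
    kept = [h for score, h in scored if best - score <= beam_threshold]
    return kept[:beam_size]


hyps = [
    Hypothesis("a", am_score=-1.0, lm_score=-2.0, num_words=1),
    Hypothesis("b", am_score=-0.5, lm_score=-6.0, num_words=1),
    Hypothesis("c", am_score=-30.0, lm_score=-1.0, num_words=1),
]
# With lm_weight=0 the ranking follows the acoustic score alone.
acoustic_only = prune(hyps, beam_size=5, beam_threshold=25.0,
                      lm_weight=0.0, word_score=0.0)
# With lm_weight=1 the language model score changes which beam ranks first.
with_lm = prune(hyps, beam_size=5, beam_threshold=25.0,
                lm_weight=1.0, word_score=0.0)
print([h.text for h in acoustic_only], [h.text for h in with_lm])
```

In this toy setup the third hypothesis falls outside the threshold in both runs, while the order of the first two depends on `lm_weight`, which is the tradeoff the parameter descriptions above refer to.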