# torchaudio.prototype.pipelines

> Source: [https://pytorch.org/audio/stable/prototype.pipelines.html](https://pytorch.org/audio/stable/prototype.pipelines.html)

The pipelines subpackage contains APIs to models with pretrained weights and relevant utilities.

## RNN-T Streaming/Non-Streaming ASR[](#rnn-t-streaming-non-streaming-asr "Permalink to this heading")

### Pretrained Models[](#pretrained-models "Permalink to this heading")

| [`EMFORMER_RNNT_BASE_MUSTC`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC") | Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. |
| [`EMFORMER_RNNT_BASE_TEDLIUM3`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3 "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3") | Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. |

## HiFiGAN Vocoder[](#hifigan-vocoder "Permalink to this heading")

### Interface[](#interface "Permalink to this heading")

[`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") defines a HiFiGAN vocoder pipeline capable of transforming mel spectrograms into waveforms.

| [`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") | Data class that bundles associated information to use a pretrained [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder "torchaudio.prototype.models.HiFiGANVocoder"). |

### Pretrained Models[](#id1 "Permalink to this heading")

| [`HIFIGAN_VOCODER_V3_LJSPEECH`](generated/torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH.html#torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH "torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH") | HiFiGAN vocoder pipeline, trained on *The LJ Speech Dataset* [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The LJ speech dataset. URL: https://keithito.com/LJ-Speech-Dataset/, 2017.")]. |

## VGGish[](#vggish "Permalink to this heading")

### Interface[](#id3 "Permalink to this heading")

| [`VGGishBundle`](generated/torchaudio.prototype.pipelines.VGGishBundle.html#torchaudio.prototype.pipelines.VGGishBundle "torchaudio.prototype.pipelines.VGGishBundle") | VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. URL: https://arxiv.org/abs/1609.09430.")] inference pipeline ported from [torchvggish](https://github.com/harritaylor/torchvggish) and [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset). |
| [`VGGishBundle.VGGish`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#torchaudio.prototype.pipelines.VGGishBundle.VGGish "torchaudio.prototype.pipelines.VGGishBundle.VGGish") | Implementation of the VGGish model [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. URL: https://arxiv.org/abs/1609.09430.")]. |
| [`VGGishBundle.VGGishInputProcessor`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor "torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor") | Converts raw waveforms to batches of examples to use as inputs to VGGish. |

### Pretrained Models[](#id6 "Permalink to this heading")

| [`VGGISH`](generated/torchaudio.prototype.pipelines.VGGISH.html#torchaudio.prototype.pipelines.VGGISH "torchaudio.prototype.pipelines.VGGISH") | Pre-trained VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. URL: https://arxiv.org/abs/1609.09430.")] inference pipeline ported from [torchvggish](https://github.com/harritaylor/torchvggish) and [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset). |