# torchaudio.prototype.pipelines

> Source: [https://pytorch.org/audio/stable/prototype.pipelines.html](https://pytorch.org/audio/stable/prototype.pipelines.html)

The pipelines subpackage contains APIs to models with pretrained weights and relevant utilities.

## RNN-T Streaming/Non-Streaming ASR[](#rnn-t-streaming-non-streaming-asr "Permalink to this heading")

### Pretrained Models[](#pretrained-models "Permalink to this heading")

| [`EMFORMER_RNNT_BASE_MUSTC`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_MUSTC") | Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. |
| [`EMFORMER_RNNT_BASE_TEDLIUM3`](generated/torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3 "torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3") | Pre-trained Emformer-RNNT-based ASR pipeline capable of performing both streaming and non-streaming inference. |

## HiFiGAN Vocoder[](#hifigan-vocoder "Permalink to this heading")

### Interface[](#interface "Permalink to this heading")

[`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") defines a HiFiGAN vocoder pipeline capable of transforming mel spectrograms into waveforms.

| [`HiFiGANVocoderBundle`](generated/torchaudio.prototype.pipelines.HiFiGANVocoderBundle.html#torchaudio.prototype.pipelines.HiFiGANVocoderBundle "torchaudio.prototype.pipelines.HiFiGANVocoderBundle") | Data class that bundles associated information to use a pretrained [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder "torchaudio.prototype.models.HiFiGANVocoder"). |

### Pretrained Models[](#id1 "Permalink to this heading")

| [`HIFIGAN_VOCODER_V3_LJSPEECH`](generated/torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH.html#torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH "torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH") | HiFiGAN vocoder pipeline, trained on *The LJ Speech Dataset* [[Ito and Johnson, 2017](references.html#id7 "Keith Ito and Linda Johnson. The LJ speech dataset. URL: https://keithito.com/LJ-Speech-Dataset/, 2017.")]. |

## VGGish[](#vggish "Permalink to this heading")

### Interface[](#id3 "Permalink to this heading")

| [`VGGishBundle`](generated/torchaudio.prototype.pipelines.VGGishBundle.html#torchaudio.prototype.pipelines.VGGishBundle "torchaudio.prototype.pipelines.VGGishBundle") | VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. URL: https://arxiv.org/abs/1609.09430.")] inference pipeline ported from [torchvggish](https://github.com/harritaylor/torchvggish) and [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset). |
| [`VGGishBundle.VGGish`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGish.html#torchaudio.prototype.pipelines.VGGishBundle.VGGish "torchaudio.prototype.pipelines.VGGishBundle.VGGish") | Implementation of the VGGish model [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. URL: https://arxiv.org/abs/1609.09430.")]. |
| [`VGGishBundle.VGGishInputProcessor`](generated/torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor.html#torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor "torchaudio.prototype.pipelines.VGGishBundle.VGGishInputProcessor") | Converts raw waveforms to batches of examples to use as inputs to VGGish. |

### Pretrained Models[](#id6 "Permalink to this heading")

| [`VGGISH`](generated/torchaudio.prototype.pipelines.VGGISH.html#torchaudio.prototype.pipelines.VGGISH "torchaudio.prototype.pipelines.VGGISH") | Pre-trained VGGish [[Hershey *et al.*, 2017](references.html#id70 "Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017. URL: https://arxiv.org/abs/1609.09430.")] inference pipeline ported from [torchvggish](https://github.com/harritaylor/torchvggish) and [tensorflow-models](https://github.com/tensorflow/models/tree/master/research/audioset). |