- en: torchaudio.prototype.models
  id: totrans-0
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: torchaudio.prototype.models
- en: 原文:[https://pytorch.org/audio/stable/prototype.models.html](https://pytorch.org/audio/stable/prototype.models.html)
  id: totrans-1
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
  zh: 原文:[https://pytorch.org/audio/stable/prototype.models.html](https://pytorch.org/audio/stable/prototype.models.html)
- en: The `torchaudio.prototype.models` subpackage contains definitions of models for addressing common audio tasks.
  id: totrans-2
  prefs: []
  type: TYPE_NORMAL
  zh: '`torchaudio.prototype.models`子包含有用于处理常见音频任务的模型定义。'
- en: Note
  id: totrans-3
  prefs: []
  type: TYPE_NORMAL
  zh: 注意
- en: For models with pre-trained parameters, please refer to the [`torchaudio.prototype.pipelines`](prototype.pipelines.html#module-torchaudio.prototype.pipelines "torchaudio.prototype.pipelines") module.
  id: totrans-4
  prefs: []
  type: TYPE_NORMAL
  zh: 对于具有预训练参数的模型,请参考[`torchaudio.prototype.pipelines`](prototype.pipelines.html#module-torchaudio.prototype.pipelines "torchaudio.prototype.pipelines")模块。
- en: Model definitions are responsible for constructing computation graphs and executing them.
  id: totrans-5
  prefs: []
  type: TYPE_NORMAL
  zh: 模型定义负责构建计算图并执行它们。
- en: Some models have complex structures and variations. For such models, factory functions are provided.
  id: totrans-6
  prefs: []
  type: TYPE_NORMAL
  zh: 一些模型具有复杂的结构和变体。对于这样的模型,提供了工厂函数。
- en: '| [`ConformerWav2Vec2PretrainModel`](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#torchaudio.prototype.models.ConformerWav2Vec2PretrainModel "torchaudio.prototype.models.ConformerWav2Vec2PretrainModel") | Conformer Wav2Vec2 pre-train model for training from scratch. |'
  id: totrans-7
  prefs: []
  type: TYPE_TB
  zh: '| [`ConformerWav2Vec2PretrainModel`](generated/torchaudio.prototype.models.ConformerWav2Vec2PretrainModel.html#torchaudio.prototype.models.ConformerWav2Vec2PretrainModel "torchaudio.prototype.models.ConformerWav2Vec2PretrainModel") | Conformer Wav2Vec2预训练模型,用于从头开始训练。 |'
- en: '| [`ConvEmformer`](generated/torchaudio.prototype.models.ConvEmformer.html#torchaudio.prototype.models.ConvEmformer "torchaudio.prototype.models.ConvEmformer") | Implements the convolution-augmented streaming transformer architecture introduced in *Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution* [[Shi *et al.*, 2022](references.html#id31 "Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, and Mike Seltzer. Streaming transformer transducer based speech recognition using non-causal convolution. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 8277-8281\. 2022\. doi:10.1109/ICASSP43922.2022.9747706.")]. |'
  id: totrans-8
  prefs: []
  type: TYPE_TB
  zh: '| [`ConvEmformer`](generated/torchaudio.prototype.models.ConvEmformer.html#torchaudio.prototype.models.ConvEmformer "torchaudio.prototype.models.ConvEmformer") | 实现了*Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution*中引入的卷积增强流式变压器架构[[Shi等人,2022](references.html#id31 "Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, and Mike Seltzer. Streaming transformer transducer based speech recognition using non-causal convolution. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume, 8277-8281\. 2022\. doi:10.1109/ICASSP43922.2022.9747706.")]. |'
- en: '| [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder "torchaudio.prototype.models.HiFiGANVocoder") | Generator part of *HiFi GAN* [[Kong *et al.*, 2020](references.html#id57 "Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. Hifi-gan: generative adversarial networks for efficient and high fidelity speech synthesis. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, 17022–17033\. Curran Associates, Inc., 2020\. URL: https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf.")]. |'
  id: totrans-9
  prefs: []
  type: TYPE_TB
  zh: '| [`HiFiGANVocoder`](generated/torchaudio.prototype.models.HiFiGANVocoder.html#torchaudio.prototype.models.HiFiGANVocoder "torchaudio.prototype.models.HiFiGANVocoder") | *HiFi GAN*的生成器部分[[Kong等人,2020](references.html#id57 "Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. Hifi-gan: generative adversarial networks for efficient and high fidelity speech synthesis. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, 17022–17033\. Curran Associates, Inc., 2020\. URL: https://proceedings.neurips.cc/paper/2020/file/c5d736809766d46260d816d8dbc9eb44-Paper.pdf.")]. |'
- en: Prototype Factory Functions of Beta Models[](#prototype-factory-functions-of-beta-models "Permalink to this heading")
  id: totrans-10
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
  zh: Beta模型的原型工厂函数[](#prototype-factory-functions-of-beta-models "Permalink to this heading")
- en: Some model definitions are in beta, but they have new factory functions that are still in prototype. Please check the “Prototype Factory Functions” section of each model.
  id: totrans-11
  prefs: []
  type: TYPE_NORMAL
  zh: 一些模型定义处于测试阶段,但仍有新的工厂函数处于原型阶段。请查看每个模型中的“Prototype Factory Functions”部分。
- en: '| [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model "torchaudio.models.Wav2Vec2Model") | Acoustic model used in *wav2vec 2.0* [[Baevski *et al.*, 2020](references.html#id15 "Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. Wav2vec 2.0: a framework for self-supervised learning of speech representations. 2020\. arXiv:2006.11477.")]. |'
  id: totrans-12
  prefs: []
  type: TYPE_TB
  zh: '| [`Wav2Vec2Model`](generated/torchaudio.models.Wav2Vec2Model.html#torchaudio.models.Wav2Vec2Model "torchaudio.models.Wav2Vec2Model") | *wav2vec 2.0*中使用的声学模型[[Baevski等人,2020](references.html#id15 "Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. Wav2vec 2.0: a framework for self-supervised learning of speech representations. 2020\. arXiv:2006.11477.")]. |'
- en: '| [`RNNT`](generated/torchaudio.models.RNNT.html#torchaudio.models.RNNT "torchaudio.models.RNNT") | Recurrent neural network transducer (RNN-T) model. |'
  id: totrans-13
  prefs: []
  type: TYPE_TB
  zh: '| [`RNNT`](generated/torchaudio.models.RNNT.html#torchaudio.models.RNNT "torchaudio.models.RNNT") | 循环神经网络转录器(RNN-T)模型。 |'