PaddlePaddle
DeepSpeech
Commit 5e714ecb (unverified)
Authored Sep 19, 2022 by 小湉湉
Committed Sep 19, 2022 by GitHub
[doc]update api docs (#2406)
* update apt docs, test=doc
Parent: e6cbcca3
Showing 97 changed files with 1348 additions and 375 deletions
docs/source/api/paddlespeech.audio.rst  +3 -0
docs/source/api/paddlespeech.audio.streamdata.autodecode.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.cache.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.compat.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.extradatasets.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.filters.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.gopen.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.handlers.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.mix.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.paddle_utils.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.pipeline.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.rst  +28 -0
docs/source/api/paddlespeech.audio.streamdata.shardlists.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.tariterators.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.utils.rst  +7 -0
docs/source/api/paddlespeech.audio.streamdata.writer.rst  +7 -0
docs/source/api/paddlespeech.audio.text.rst  +16 -0
docs/source/api/paddlespeech.audio.text.text_featurizer.rst  +7 -0
docs/source/api/paddlespeech.audio.text.utility.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.add_deltas.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.channel_selector.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.cmvn.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.functional.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.perturb.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.rst  +24 -0
docs/source/api/paddlespeech.audio.transform.spec_augment.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.spectrogram.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.transform_interface.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.transformation.rst  +7 -0
docs/source/api/paddlespeech.audio.transform.wpe.rst  +7 -0
docs/source/api/paddlespeech.audio.utils.check_kwargs.rst  +7 -0
docs/source/api/paddlespeech.audio.utils.dynamic_import.rst  +7 -0
docs/source/api/paddlespeech.audio.utils.rst  +3 -0
docs/source/api/paddlespeech.audio.utils.tensor_utils.rst  +7 -0
docs/source/api/paddlespeech.kws.exps.mdtc.collate.rst  +7 -0
docs/source/api/paddlespeech.kws.exps.mdtc.compute_det.rst  +7 -0
docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst  +7 -0
docs/source/api/paddlespeech.kws.exps.mdtc.rst  +19 -0
docs/source/api/paddlespeech.kws.exps.mdtc.score.rst  +7 -0
docs/source/api/paddlespeech.kws.exps.mdtc.train.rst  +7 -0
docs/source/api/paddlespeech.kws.exps.rst  +15 -0
docs/source/api/paddlespeech.kws.rst  +1 -0
docs/source/api/paddlespeech.resource.model_alias.rst  +7 -0
docs/source/api/paddlespeech.resource.pretrained_models.rst  +7 -0
docs/source/api/paddlespeech.resource.resource.rst  +7 -0
docs/source/api/paddlespeech.resource.rst  +17 -0
docs/source/api/paddlespeech.rst  +2 -0
docs/source/api/paddlespeech.s2t.rst  +0 -1
docs/source/api/paddlespeech.server.utils.rst  +0 -1
docs/source/api/paddlespeech.t2s.datasets.rst  +1 -0
docs/source/api/paddlespeech.t2s.datasets.sampler.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.align.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.normalize.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.preprocess.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.rst  +21 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.synthesize.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.synthesize_e2e.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.train.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.ernie_sat.utils.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.fastspeech2.rst  +1 -0
docs/source/api/paddlespeech.t2s.exps.fastspeech2.vc2_infer.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.rst  +3 -0
docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.vits.normalize.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.vits.preprocess.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.vits.rst  +20 -0
docs/source/api/paddlespeech.t2s.exps.vits.synthesize.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.vits.synthesize_e2e.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.vits.train.rst  +7 -0
docs/source/api/paddlespeech.t2s.exps.vits.voice_cloning.rst  +7 -0
docs/source/api/paddlespeech.t2s.frontend.g2pw.dataset.rst  +7 -0
docs/source/api/paddlespeech.t2s.frontend.g2pw.onnx_api.rst  +7 -0
docs/source/api/paddlespeech.t2s.frontend.g2pw.rst  +17 -0
docs/source/api/paddlespeech.t2s.frontend.g2pw.utils.rst  +7 -0
docs/source/api/paddlespeech.t2s.frontend.mix_frontend.rst  +7 -0
docs/source/api/paddlespeech.t2s.frontend.rst  +2 -0
docs/source/api/paddlespeech.t2s.models.ernie_sat.ernie_sat.rst  +7 -0
docs/source/api/paddlespeech.t2s.models.ernie_sat.ernie_sat_updater.rst  +7 -0
docs/source/api/paddlespeech.t2s.models.ernie_sat.rst  +2 -1
docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst  +7 -0
docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst  +16 -0
docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst  +7 -0
docs/source/api/paddlespeech.utils.dynamic_import.rst  +7 -0
docs/source/api/paddlespeech.utils.env.rst  +7 -0
docs/source/api/paddlespeech.utils.rst  +16 -0
docs/source/index.rst  +2 -0
paddlespeech/t2s/models/ernie_sat/ernie_sat.py  +71 -37
paddlespeech/t2s/models/vits/duration_predictor.py  +26 -13
paddlespeech/t2s/models/vits/flow.py  +74 -37
paddlespeech/t2s/models/vits/generator.py  +182 -119
paddlespeech/t2s/models/vits/posterior_encoder.py  +36 -18
paddlespeech/t2s/models/vits/residual_coupling.py  +66 -33
paddlespeech/t2s/models/vits/text_encoder.py  +46 -23
paddlespeech/t2s/models/vits/vits.py  +102 -51
paddlespeech/t2s/models/vits/wavenet/residual_block.py  +16 -8
paddlespeech/t2s/models/vits/wavenet/wavenet.py  +46 -26
paddlespeech/t2s/models/wavernn/wavernn.py  +13 -7
docs/source/api/paddlespeech.audio.rst
@@ -20,4 +20,7 @@ Subpackages
 paddlespeech.audio.io
 paddlespeech.audio.metric
 paddlespeech.audio.sox_effects
+paddlespeech.audio.streamdata
+paddlespeech.audio.text
+paddlespeech.audio.transform
 paddlespeech.audio.utils
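The hunk header above can be read mechanically. As a small sketch (the helper name `parse_hunk_header` is hypothetical and not part of this commit), the following Python parses a unified-diff hunk header into its old and new ranges and shows that the net growth of this hunk matches the `+3 -0` reported for the file. It assumes the standard unified-diff convention that an omitted length means 1.

```python
import re

# Matches headers such as "@@ -20,4 +20,7 @@ Subpackages".
# The ",<length>" part is optional in unified diffs and defaults to 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_len, new_start, new_len) for a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError("not a unified-diff hunk header: %r" % line)
    old_start, old_len, new_start, new_len = match.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


if __name__ == "__main__":
    old_start, old_len, new_start, new_len = parse_hunk_header(
        "@@ -20,4 +20,7 @@ Subpackages"
    )
    # Net change in line count for this hunk (new length minus old length).
    print(new_len - old_len)
```

Here the old side spans 4 lines and the new side 7, so the hunk adds three lines net, which corresponds to the three new subpackage entries.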
docs/source/api/paddlespeech.audio.streamdata.autodecode.rst
0 → 100644
paddlespeech.audio.streamdata.autodecode module
===============================================
.. automodule:: paddlespeech.audio.streamdata.autodecode
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.cache.rst
0 → 100644
paddlespeech.audio.streamdata.cache module
==========================================
.. automodule:: paddlespeech.audio.streamdata.cache
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.compat.rst
0 → 100644
paddlespeech.audio.streamdata.compat module
===========================================
.. automodule:: paddlespeech.audio.streamdata.compat
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.extradatasets.rst
0 → 100644
paddlespeech.audio.streamdata.extradatasets module
==================================================
.. automodule:: paddlespeech.audio.streamdata.extradatasets
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.filters.rst
0 → 100644
paddlespeech.audio.streamdata.filters module
============================================
.. automodule:: paddlespeech.audio.streamdata.filters
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.gopen.rst
0 → 100644
paddlespeech.audio.streamdata.gopen module
==========================================
.. automodule:: paddlespeech.audio.streamdata.gopen
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.handlers.rst
0 → 100644
paddlespeech.audio.streamdata.handlers module
=============================================
.. automodule:: paddlespeech.audio.streamdata.handlers
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.mix.rst
0 → 100644
paddlespeech.audio.streamdata.mix module
========================================
.. automodule:: paddlespeech.audio.streamdata.mix
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.paddle_utils.rst
0 → 100644
paddlespeech.audio.streamdata.paddle\_utils module
==================================================
.. automodule:: paddlespeech.audio.streamdata.paddle_utils
:members:
:undoc-members:
:show-inheritance:
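The new module `.rst` files in this commit all share one shape: an escaped title, an `=` underline of the same length, and an `automodule` directive with three options. As a rough illustration only (the commit does not say how the stubs were produced, though the layout resembles what `sphinx-apidoc` typically emits), the Python sketch below renders such a stub. The helper name `module_stub` is hypothetical. The three-space option indent and the blank line before the directive are assumptions: this page view drops whitespace, and the blank line is inferred from the `+7` line count.

```python
def module_stub(module_name: str, kind: str = "module") -> str:
    """Render a reST stub: title, underline, and an automodule directive."""
    # Underscores are backslash-escaped in the title, as in the stubs above.
    title = module_name.replace("_", "\\_") + " " + kind
    lines = [
        title,
        "=" * len(title),           # underline matches the escaped title length
        "",                         # assumed blank line (dropped by the page view)
        ".. automodule:: " + module_name,
        "   :members:",             # assumed three-space indent for options
        "   :undoc-members:",
        "   :show-inheritance:",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(module_stub("paddlespeech.audio.streamdata.paddle_utils"), end="")
```

The result has seven lines, consistent with the `+7 -0` shown for each new module file in the changed-file list.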
docs/source/api/paddlespeech.audio.streamdata.pipeline.rst
0 → 100644
paddlespeech.audio.streamdata.pipeline module
=============================================
.. automodule:: paddlespeech.audio.streamdata.pipeline
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.rst
0 → 100644
paddlespeech.audio.streamdata package
=====================================
.. automodule:: paddlespeech.audio.streamdata
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
paddlespeech.audio.streamdata.autodecode
paddlespeech.audio.streamdata.cache
paddlespeech.audio.streamdata.compat
paddlespeech.audio.streamdata.extradatasets
paddlespeech.audio.streamdata.filters
paddlespeech.audio.streamdata.gopen
paddlespeech.audio.streamdata.handlers
paddlespeech.audio.streamdata.mix
paddlespeech.audio.streamdata.paddle_utils
paddlespeech.audio.streamdata.pipeline
paddlespeech.audio.streamdata.shardlists
paddlespeech.audio.streamdata.tariterators
paddlespeech.audio.streamdata.utils
paddlespeech.audio.streamdata.writer
docs/source/api/paddlespeech.audio.streamdata.shardlists.rst
0 → 100644
paddlespeech.audio.streamdata.shardlists module
===============================================
.. automodule:: paddlespeech.audio.streamdata.shardlists
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.tariterators.rst
0 → 100644
paddlespeech.audio.streamdata.tariterators module
=================================================
.. automodule:: paddlespeech.audio.streamdata.tariterators
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.utils.rst
0 → 100644
paddlespeech.audio.streamdata.utils module
==========================================
.. automodule:: paddlespeech.audio.streamdata.utils
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.streamdata.writer.rst
0 → 100644
paddlespeech.audio.streamdata.writer module
===========================================
.. automodule:: paddlespeech.audio.streamdata.writer
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.text.rst
0 → 100644
paddlespeech.audio.text package
===============================
.. automodule:: paddlespeech.audio.text
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
paddlespeech.audio.text.text_featurizer
paddlespeech.audio.text.utility
docs/source/api/paddlespeech.audio.text.text_featurizer.rst
0 → 100644
paddlespeech.audio.text.text\_featurizer module
===============================================
.. automodule:: paddlespeech.audio.text.text_featurizer
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.text.utility.rst
0 → 100644
paddlespeech.audio.text.utility module
======================================
.. automodule:: paddlespeech.audio.text.utility
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.add_deltas.rst
0 → 100644
paddlespeech.audio.transform.add\_deltas module
===============================================
.. automodule:: paddlespeech.audio.transform.add_deltas
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.channel_selector.rst
0 → 100644
paddlespeech.audio.transform.channel\_selector module
=====================================================
.. automodule:: paddlespeech.audio.transform.channel_selector
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.cmvn.rst
0 → 100644
paddlespeech.audio.transform.cmvn module
========================================
.. automodule:: paddlespeech.audio.transform.cmvn
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.functional.rst
0 → 100644
paddlespeech.audio.transform.functional module
==============================================
.. automodule:: paddlespeech.audio.transform.functional
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.perturb.rst
0 → 100644
paddlespeech.audio.transform.perturb module
===========================================
.. automodule:: paddlespeech.audio.transform.perturb
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.rst
0 → 100644
paddlespeech.audio.transform package
====================================
.. automodule:: paddlespeech.audio.transform
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
paddlespeech.audio.transform.add_deltas
paddlespeech.audio.transform.channel_selector
paddlespeech.audio.transform.cmvn
paddlespeech.audio.transform.functional
paddlespeech.audio.transform.perturb
paddlespeech.audio.transform.spec_augment
paddlespeech.audio.transform.spectrogram
paddlespeech.audio.transform.transform_interface
paddlespeech.audio.transform.transformation
paddlespeech.audio.transform.wpe
docs/source/api/paddlespeech.audio.transform.spec_augment.rst
0 → 100644
paddlespeech.audio.transform.spec\_augment module
=================================================
.. automodule:: paddlespeech.audio.transform.spec_augment
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.spectrogram.rst
0 → 100644
paddlespeech.audio.transform.spectrogram module
===============================================
.. automodule:: paddlespeech.audio.transform.spectrogram
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.transform_interface.rst
0 → 100644
paddlespeech.audio.transform.transform\_interface module
========================================================
.. automodule:: paddlespeech.audio.transform.transform_interface
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.transformation.rst
0 → 100644
paddlespeech.audio.transform.transformation module
==================================================
.. automodule:: paddlespeech.audio.transform.transformation
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.transform.wpe.rst
0 → 100644
paddlespeech.audio.transform.wpe module
=======================================
.. automodule:: paddlespeech.audio.transform.wpe
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.utils.check_kwargs.rst
0 → 100644
paddlespeech.audio.utils.check\_kwargs module
=============================================
.. automodule:: paddlespeech.audio.utils.check_kwargs
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.utils.dynamic_import.rst
0 → 100644
paddlespeech.audio.utils.dynamic\_import module
===============================================
.. automodule:: paddlespeech.audio.utils.dynamic_import
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.audio.utils.rst
@@ -12,8 +12,11 @@ Submodules
 .. toctree::
 :maxdepth: 4

+paddlespeech.audio.utils.check_kwargs
 paddlespeech.audio.utils.download
+paddlespeech.audio.utils.dynamic_import
 paddlespeech.audio.utils.error
 paddlespeech.audio.utils.log
 paddlespeech.audio.utils.numeric
+paddlespeech.audio.utils.tensor_utils
 paddlespeech.audio.utils.time
docs/source/api/paddlespeech.audio.utils.tensor_utils.rst
0 → 100644
paddlespeech.audio.utils.tensor\_utils module
=============================================
.. automodule:: paddlespeech.audio.utils.tensor_utils
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.kws.exps.mdtc.collate.rst
0 → 100644
paddlespeech.kws.exps.mdtc.collate module
=========================================
.. automodule:: paddlespeech.kws.exps.mdtc.collate
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.kws.exps.mdtc.compute_det.rst
0 → 100644
paddlespeech.kws.exps.mdtc.compute\_det module
==============================================
.. automodule:: paddlespeech.kws.exps.mdtc.compute_det
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.kws.exps.mdtc.plot_det_curve.rst
0 → 100644
paddlespeech.kws.exps.mdtc.plot\_det\_curve module
==================================================
.. automodule:: paddlespeech.kws.exps.mdtc.plot_det_curve
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.kws.exps.mdtc.rst
0 → 100644
paddlespeech.kws.exps.mdtc package
==================================
.. automodule:: paddlespeech.kws.exps.mdtc
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
paddlespeech.kws.exps.mdtc.collate
paddlespeech.kws.exps.mdtc.compute_det
paddlespeech.kws.exps.mdtc.plot_det_curve
paddlespeech.kws.exps.mdtc.score
paddlespeech.kws.exps.mdtc.train
docs/source/api/paddlespeech.kws.exps.mdtc.score.rst
0 → 100644
paddlespeech.kws.exps.mdtc.score module
=======================================
.. automodule:: paddlespeech.kws.exps.mdtc.score
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.kws.exps.mdtc.train.rst
0 → 100644
paddlespeech.kws.exps.mdtc.train module
=======================================
.. automodule:: paddlespeech.kws.exps.mdtc.train
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.kws.exps.rst
0 → 100644
paddlespeech.kws.exps package
=============================
.. automodule:: paddlespeech.kws.exps
:members:
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 4
paddlespeech.kws.exps.mdtc
docs/source/api/paddlespeech.kws.rst
@@ -12,4 +12,5 @@ Subpackages
 .. toctree::
 :maxdepth: 4

+paddlespeech.kws.exps
 paddlespeech.kws.models
docs/source/api/paddlespeech.resource.model_alias.rst
0 → 100644
paddlespeech.resource.model\_alias module
=========================================
.. automodule:: paddlespeech.resource.model_alias
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.resource.pretrained_models.rst
0 → 100644
paddlespeech.resource.pretrained\_models module
===============================================
.. automodule:: paddlespeech.resource.pretrained_models
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.resource.resource.rst
0 → 100644
paddlespeech.resource.resource module
=====================================
.. automodule:: paddlespeech.resource.resource
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.resource.rst
0 → 100644
paddlespeech.resource package
=============================
.. automodule:: paddlespeech.resource
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
paddlespeech.resource.model_alias
paddlespeech.resource.pretrained_models
paddlespeech.resource.resource
docs/source/api/paddlespeech.rst
@@ -16,8 +16,10 @@ Subpackages
 paddlespeech.cli
 paddlespeech.cls
 paddlespeech.kws
+paddlespeech.resource
 paddlespeech.s2t
 paddlespeech.server
 paddlespeech.t2s
 paddlespeech.text
+paddlespeech.utils
 paddlespeech.vector
docs/source/api/paddlespeech.s2t.rst
@@ -19,5 +19,4 @@ Subpackages
 paddlespeech.s2t.models
 paddlespeech.s2t.modules
 paddlespeech.s2t.training
-paddlespeech.s2t.transform
 paddlespeech.s2t.utils
docs/source/api/paddlespeech.server.utils.rst
@@ -18,7 +18,6 @@ Submodules
 paddlespeech.server.utils.config
 paddlespeech.server.utils.errors
 paddlespeech.server.utils.exception
-paddlespeech.server.utils.log
 paddlespeech.server.utils.onnx_infer
 paddlespeech.server.utils.paddle_predictor
 paddlespeech.server.utils.util
docs/source/api/paddlespeech.t2s.datasets.rst
@@ -19,4 +19,5 @@ Submodules
 paddlespeech.t2s.datasets.get_feats
 paddlespeech.t2s.datasets.ljspeech
 paddlespeech.t2s.datasets.preprocess_utils
+paddlespeech.t2s.datasets.sampler
 paddlespeech.t2s.datasets.vocoder_batch_fn
docs/source/api/paddlespeech.t2s.datasets.sampler.rst
0 → 100644
paddlespeech.t2s.datasets.sampler module
========================================
.. automodule:: paddlespeech.t2s.datasets.sampler
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.align.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.align module
=============================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.align
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.normalize.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.normalize module
=================================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.normalize
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.preprocess.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.preprocess module
==================================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.preprocess
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat package
========================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
paddlespeech.t2s.exps.ernie_sat.align
paddlespeech.t2s.exps.ernie_sat.normalize
paddlespeech.t2s.exps.ernie_sat.preprocess
paddlespeech.t2s.exps.ernie_sat.synthesize
paddlespeech.t2s.exps.ernie_sat.synthesize_e2e
paddlespeech.t2s.exps.ernie_sat.train
paddlespeech.t2s.exps.ernie_sat.utils
docs/source/api/paddlespeech.t2s.exps.ernie_sat.synthesize.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.synthesize module
==================================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.synthesize
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.synthesize_e2e.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.synthesize\_e2e module
=======================================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.synthesize_e2e
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.train.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.train module
=============================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.train
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.ernie_sat.utils.rst
0 → 100644
paddlespeech.t2s.exps.ernie\_sat.utils module
=============================================
.. automodule:: paddlespeech.t2s.exps.ernie_sat.utils
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.fastspeech2.rst
@@ -16,3 +16,4 @@ Submodules
 paddlespeech.t2s.exps.fastspeech2.normalize
 paddlespeech.t2s.exps.fastspeech2.preprocess
 paddlespeech.t2s.exps.fastspeech2.train
+paddlespeech.t2s.exps.fastspeech2.vc2_infer
docs/source/api/paddlespeech.t2s.exps.fastspeech2.vc2_infer.rst
0 → 100644
paddlespeech.t2s.exps.fastspeech2.vc2\_infer module
===================================================
.. automodule:: paddlespeech.t2s.exps.fastspeech2.vc2_infer
:members:
:undoc-members:
:show-inheritance:
docs/source/api/paddlespeech.t2s.exps.rst
@@ -12,11 +12,13 @@ Subpackages
 .. toctree::
 :maxdepth: 4

+paddlespeech.t2s.exps.ernie_sat
 paddlespeech.t2s.exps.fastspeech2
 paddlespeech.t2s.exps.gan_vocoder
 paddlespeech.t2s.exps.speedyspeech
 paddlespeech.t2s.exps.tacotron2
 paddlespeech.t2s.exps.transformer_tts
+paddlespeech.t2s.exps.vits
 paddlespeech.t2s.exps.waveflow
 paddlespeech.t2s.exps.wavernn

@@ -31,6 +33,7 @@ Submodules
 paddlespeech.t2s.exps.ort_predict
 paddlespeech.t2s.exps.ort_predict_e2e
 paddlespeech.t2s.exps.ort_predict_streaming
+paddlespeech.t2s.exps.stream_play_tts
 paddlespeech.t2s.exps.syn_utils
 paddlespeech.t2s.exps.synthesize
 paddlespeech.t2s.exps.synthesize_e2e
docs/source/api/paddlespeech.t2s.exps.stream_play_tts.rst
0 → 100644
paddlespeech.t2s.exps.stream\_play\_tts module
==============================================

.. automodule:: paddlespeech.t2s.exps.stream_play_tts
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.exps.vits.normalize.rst
0 → 100644
paddlespeech.t2s.exps.vits.normalize module
===========================================

.. automodule:: paddlespeech.t2s.exps.vits.normalize
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.exps.vits.preprocess.rst
0 → 100644
paddlespeech.t2s.exps.vits.preprocess module
============================================

.. automodule:: paddlespeech.t2s.exps.vits.preprocess
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.exps.vits.rst
0 → 100644
paddlespeech.t2s.exps.vits package
==================================

.. automodule:: paddlespeech.t2s.exps.vits
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

.. toctree::
   :maxdepth: 4

   paddlespeech.t2s.exps.vits.normalize
   paddlespeech.t2s.exps.vits.preprocess
   paddlespeech.t2s.exps.vits.synthesize
   paddlespeech.t2s.exps.vits.synthesize_e2e
   paddlespeech.t2s.exps.vits.train
   paddlespeech.t2s.exps.vits.voice_cloning
docs/source/api/paddlespeech.t2s.exps.vits.synthesize.rst
0 → 100644
paddlespeech.t2s.exps.vits.synthesize module
============================================

.. automodule:: paddlespeech.t2s.exps.vits.synthesize
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.exps.vits.synthesize_e2e.rst
0 → 100644
paddlespeech.t2s.exps.vits.synthesize\_e2e module
=================================================

.. automodule:: paddlespeech.t2s.exps.vits.synthesize_e2e
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.exps.vits.train.rst
0 → 100644
paddlespeech.t2s.exps.vits.train module
=======================================

.. automodule:: paddlespeech.t2s.exps.vits.train
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.exps.vits.voice_cloning.rst
0 → 100644
paddlespeech.t2s.exps.vits.voice\_cloning module
================================================

.. automodule:: paddlespeech.t2s.exps.vits.voice_cloning
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.frontend.g2pw.dataset.rst
0 → 100644
paddlespeech.t2s.frontend.g2pw.dataset module
=============================================

.. automodule:: paddlespeech.t2s.frontend.g2pw.dataset
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.frontend.g2pw.onnx_api.rst
0 → 100644
paddlespeech.t2s.frontend.g2pw.onnx\_api module
===============================================

.. automodule:: paddlespeech.t2s.frontend.g2pw.onnx_api
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.frontend.g2pw.rst
0 → 100644
paddlespeech.t2s.frontend.g2pw package
======================================

.. automodule:: paddlespeech.t2s.frontend.g2pw
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

.. toctree::
   :maxdepth: 4

   paddlespeech.t2s.frontend.g2pw.dataset
   paddlespeech.t2s.frontend.g2pw.onnx_api
   paddlespeech.t2s.frontend.g2pw.utils
docs/source/api/paddlespeech.t2s.frontend.g2pw.utils.rst
0 → 100644
paddlespeech.t2s.frontend.g2pw.utils module
===========================================

.. automodule:: paddlespeech.t2s.frontend.g2pw.utils
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.frontend.mix_frontend.rst
0 → 100644
paddlespeech.t2s.frontend.mix\_frontend module
==============================================

.. automodule:: paddlespeech.t2s.frontend.mix_frontend
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.frontend.rst
@@ -12,6 +12,7 @@ Subpackages
 .. toctree::
    :maxdepth: 4

+   paddlespeech.t2s.frontend.g2pw
    paddlespeech.t2s.frontend.normalizer
    paddlespeech.t2s.frontend.zh_normalization
@@ -23,6 +24,7 @@ Submodules
    paddlespeech.t2s.frontend.arpabet
    paddlespeech.t2s.frontend.generate_lexicon
+   paddlespeech.t2s.frontend.mix_frontend
    paddlespeech.t2s.frontend.phonectic
    paddlespeech.t2s.frontend.punctuation
    paddlespeech.t2s.frontend.tone_sandhi
docs/source/api/paddlespeech.t2s.models.ernie_sat.ernie_sat.rst
0 → 100644
paddlespeech.t2s.models.ernie\_sat.ernie\_sat module
====================================================

.. automodule:: paddlespeech.t2s.models.ernie_sat.ernie_sat
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.models.ernie_sat.ernie_sat_updater.rst
0 → 100644
paddlespeech.t2s.models.ernie\_sat.ernie\_sat\_updater module
=============================================================

.. automodule:: paddlespeech.t2s.models.ernie_sat.ernie_sat_updater
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.models.ernie_sat.rst
@@ -12,4 +12,5 @@ Submodules
 .. toctree::
    :maxdepth: 4

-   paddlespeech.t2s.models.ernie_sat.mlm
+   paddlespeech.t2s.models.ernie_sat.ernie_sat
+   paddlespeech.t2s.models.ernie_sat.ernie_sat_updater
docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.core.rst
0 → 100644
paddlespeech.t2s.models.vits.monotonic\_align.core module
=========================================================

.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.core
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.rst
0 → 100644
paddlespeech.t2s.models.vits.monotonic\_align package
=====================================================

.. automodule:: paddlespeech.t2s.models.vits.monotonic_align
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

.. toctree::
   :maxdepth: 4

   paddlespeech.t2s.models.vits.monotonic_align.core
   paddlespeech.t2s.models.vits.monotonic_align.setup
docs/source/api/paddlespeech.t2s.models.vits.monotonic_align.setup.rst
0 → 100644
paddlespeech.t2s.models.vits.monotonic\_align.setup module
==========================================================

.. automodule:: paddlespeech.t2s.models.vits.monotonic_align.setup
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.utils.dynamic_import.rst
0 → 100644
paddlespeech.utils.dynamic\_import module
=========================================

.. automodule:: paddlespeech.utils.dynamic_import
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.utils.env.rst
0 → 100644
paddlespeech.utils.env module
=============================

.. automodule:: paddlespeech.utils.env
   :members:
   :undoc-members:
   :show-inheritance:
docs/source/api/paddlespeech.utils.rst
0 → 100644
paddlespeech.utils package
==========================

.. automodule:: paddlespeech.utils
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

.. toctree::
   :maxdepth: 4

   paddlespeech.utils.dynamic_import
   paddlespeech.utils.env
docs/source/index.rst
@@ -74,8 +74,10 @@ Contents
    paddlespeech.cli <api/paddlespeech.cli>
    paddlespeech.cls <api/paddlespeech.cls>
    paddlespeech.kws <api/paddlespeech.kws>
+   paddlespeech.resource <api/paddlespeech.resource>
    paddlespeech.s2t <api/paddlespeech.s2t>
    paddlespeech.server <api/paddlespeech.server>
    paddlespeech.t2s <api/paddlespeech.t2s>
    paddlespeech.text <api/paddlespeech.text>
+   paddlespeech.utils <api/paddlespeech.utils>
    paddlespeech.vector <api/paddlespeech.vector>
paddlespeech/t2s/models/ernie_sat/ernie_sat.py
@@ -71,31 +71,53 @@ class MLMEncoder(nn.Layer):
     """Conformer encoder module.
     Args:
-        idim (int): Input dimension.
-        attention_dim (int): Dimension of attention.
-        attention_heads (int): The number of heads of multi head attention.
-        linear_units (int): The number of units of position-wise feed forward.
-        num_blocks (int): The number of decoder blocks.
-        dropout_rate (float): Dropout rate.
-        positional_dropout_rate (float): Dropout rate after adding positional encoding.
-        attention_dropout_rate (float): Dropout rate in attention.
-        input_layer (Union[str, paddle.nn.Layer]): Input layer type.
-        normalize_before (bool): Whether to use layer_norm before the first block.
-        concat_after (bool): Whether to concat attention layer's input and output.
+        idim (int):
+            Input dimension.
+        attention_dim (int):
+            Dimension of attention.
+        attention_heads (int):
+            The number of heads of multi head attention.
+        linear_units (int):
+            The number of units of position-wise feed forward.
+        num_blocks (int):
+            The number of decoder blocks.
+        dropout_rate (float):
+            Dropout rate.
+        positional_dropout_rate (float):
+            Dropout rate after adding positional encoding.
+        attention_dropout_rate (float):
+            Dropout rate in attention.
+        input_layer (Union[str, paddle.nn.Layer]):
+            Input layer type.
+        normalize_before (bool):
+            Whether to use layer_norm before the first block.
+        concat_after (bool):
+            Whether to concat attention layer's input and output.
             if True, additional linear will be applied.
             i.e. x -> x + linear(concat(x, att(x)))
             if False, no additional linear will be applied. i.e. x -> x + att(x)
-        positionwise_layer_type (str): "linear", "conv1d", or "conv1d-linear".
-        positionwise_conv_kernel_size (int): Kernel size of positionwise conv1d layer.
-        macaron_style (bool): Whether to use macaron style for positionwise layer.
-        pos_enc_layer_type (str): Encoder positional encoding layer type.
-        selfattention_layer_type (str): Encoder attention layer type.
-        activation_type (str): Encoder activation function type.
-        use_cnn_module (bool): Whether to use convolution module.
-        zero_triu (bool): Whether to zero the upper triangular part of attention matrix.
-        cnn_module_kernel (int): Kernel size of convolution module.
-        padding_idx (int): Padding idx for input_layer=embed.
-        stochastic_depth_rate (float): Maximum probability to skip the encoder layer.
+        positionwise_layer_type (str):
+            "linear", "conv1d", or "conv1d-linear".
+        positionwise_conv_kernel_size (int):
+            Kernel size of positionwise conv1d layer.
+        macaron_style (bool):
+            Whether to use macaron style for positionwise layer.
+        pos_enc_layer_type (str):
+            Encoder positional encoding layer type.
+        selfattention_layer_type (str):
+            Encoder attention layer type.
+        activation_type (str):
+            Encoder activation function type.
+        use_cnn_module (bool):
+            Whether to use convolution module.
+        zero_triu (bool):
+            Whether to zero the upper triangular part of attention matrix.
+        cnn_module_kernel (int):
+            Kernel size of convolution module.
+        padding_idx (int):
+            Padding idx for input_layer=embed.
+        stochastic_depth_rate (float):
+            Maximum probability to skip the encoder layer.
     """
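The `concat_after` entry in the `MLMEncoder` docstring describes two residual forms: `x + att(x)` and `x + linear(concat(x, att(x)))`. The sketch below illustrates the difference on plain Python lists. The names `residual_attention`, `toy_att` and `toy_linear` are hypothetical stand-ins for illustration, not the layers used in `ernie_sat.py`.

```python
from typing import Callable, List, Optional

Vector = List[float]


def residual_attention(x: Vector,
                       att: Callable[[Vector], Vector],
                       linear: Optional[Callable[[Vector], Vector]] = None,
                       concat_after: bool = False) -> Vector:
    """Apply one of the two residual forms named in the docstring."""
    attended = att(x)
    if concat_after:
        if linear is None:
            raise ValueError("concat_after=True needs a linear projection")
        # concat(x, att(x)) doubles the feature size; linear maps it back.
        projected = linear(x + attended)  # list "+" is concatenation
        return [xi + pi for xi, pi in zip(x, projected)]
    # x -> x + att(x)
    return [xi + ai for xi, ai in zip(x, attended)]


def toy_att(v: Vector) -> Vector:
    # Stand-in for self-attention: scale every feature by 0.5.
    return [0.5 * e for e in v]


def toy_linear(v: Vector) -> Vector:
    # Stand-in for a learned projection from 2*d back to d features.
    d = len(v) // 2
    return [(v[i] + v[i + d]) / 2 for i in range(d)]


x = [1.0, 2.0]
plain = residual_attention(x, toy_att)
concat = residual_attention(x, toy_att, toy_linear, concat_after=True)
print(plain, concat)
```

In both cases the output keeps the input dimension, which is why the extra linear layer is required when the attention output is concatenated rather than added.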
@@ -320,12 +342,16 @@ class MLMDecoder(MLMEncoder):
         """Encode input sequence.
         Args:
-            xs (paddle.Tensor): Input tensor (#batch, time, idim).
-            masks (paddle.Tensor): Mask tensor (#batch, time).
+            xs (paddle.Tensor):
+                Input tensor (#batch, time, idim).
+            masks (paddle.Tensor):
+                Mask tensor (#batch, time).
         Returns:
-            paddle.Tensor: Output tensor (#batch, time, attention_dim).
-            paddle.Tensor: Mask tensor (#batch, time).
+            paddle.Tensor:
+                Output tensor (#batch, time, attention_dim).
+            paddle.Tensor:
+                Mask tensor (#batch, time).
         """
         xs = self.embed(xs)
@@ -392,19 +418,27 @@ class MLM(nn.Layer):
             use_teacher_forcing: bool=True, ) -> List[paddle.Tensor]:
         '''
         Args:
-            speech (paddle.Tensor): input speech (1, Tmax, D).
-            text (paddle.Tensor): input text (1, Tmax2).
-            masked_pos (paddle.Tensor): masked position of input speech (1, Tmax)
-            speech_mask (paddle.Tensor): mask of speech (1, 1, Tmax).
-            text_mask (paddle.Tensor): mask of text (1, 1, Tmax2).
-            speech_seg_pos (paddle.Tensor): n-th phone of each mel, 0<=n<=Tmax2 (1, Tmax).
-            text_seg_pos (paddle.Tensor): n-th phone of each phone, 0<=n<=Tmax2 (1, Tmax2).
-            span_bdy (List[int]): masked mel boundary of input speech (2,)
-            use_teacher_forcing (bool): whether to use teacher forcing
+            speech (paddle.Tensor):
+                input speech (1, Tmax, D).
+            text (paddle.Tensor):
+                input text (1, Tmax2).
+            masked_pos (paddle.Tensor):
+                masked position of input speech (1, Tmax)
+            speech_mask (paddle.Tensor):
+                mask of speech (1, 1, Tmax).
+            text_mask (paddle.Tensor):
+                mask of text (1, 1, Tmax2).
+            speech_seg_pos (paddle.Tensor):
+                n-th phone of each mel, 0<=n<=Tmax2 (1, Tmax).
+            text_seg_pos (paddle.Tensor):
+                n-th phone of each phone, 0<=n<=Tmax2 (1, Tmax2).
+            span_bdy (List[int]):
+                masked mel boundary of input speech (2,)
+            use_teacher_forcing (bool):
+                whether to use teacher forcing
         Returns:
             List[Tensor]:
-                eg:
-                [Tensor(shape=[1, 181, 80]), Tensor(shape=[80, 80]), Tensor(shape=[1, 67, 80])]
+                eg: [Tensor(shape=[1, 181, 80]), Tensor(shape=[80, 80]), Tensor(shape=[1, 67, 80])]
         '''
         z_cache = None
paddlespeech/t2s/models/vits/duration_predictor.py
@@ -48,12 +48,18 @@ class StochasticDurationPredictor(nn.Layer):
             global_channels: int=-1, ):
         """Initialize StochasticDurationPredictor module.
         Args:
-            channels (int): Number of channels.
-            kernel_size (int): Kernel size.
-            dropout_rate (float): Dropout rate.
-            flows (int): Number of flows.
-            dds_conv_layers (int): Number of conv layers in DDS conv.
-            global_channels (int): Number of global conditioning channels.
+            channels (int):
+                Number of channels.
+            kernel_size (int):
+                Kernel size.
+            dropout_rate (float):
+                Dropout rate.
+            flows (int):
+                Number of flows.
+            dds_conv_layers (int):
+                Number of conv layers in DDS conv.
+            global_channels (int):
+                Number of global conditioning channels.
         """
         super().__init__()
@@ -108,14 +114,21 @@ class StochasticDurationPredictor(nn.Layer):
...
@@ -108,14 +114,21 @@ class StochasticDurationPredictor(nn.Layer):
noise_scale
:
float
=
1.0
,
)
->
paddle
.
Tensor
:
noise_scale
:
float
=
1.0
,
)
->
paddle
.
Tensor
:
"""Calculate forward propagation.
"""Calculate forward propagation.
Args:
Args:
x (Tensor): Input tensor (B, channels, T_text).
x (Tensor):
x_mask (Tensor): Mask tensor (B, 1, T_text).
Input tensor (B, channels, T_text).
w (Optional[Tensor]): Duration tensor (B, 1, T_text).
x_mask (Tensor):
g (Optional[Tensor]): Global conditioning tensor (B, channels, 1)
Mask tensor (B, 1, T_text).
inverse (bool): Whether to inverse the flow.
w (Optional[Tensor]):
noise_scale (float): Noise scale value.
Duration tensor (B, 1, T_text).
g (Optional[Tensor]):
Global conditioning tensor (B, channels, 1)
inverse (bool):
Whether to inverse the flow.
noise_scale (float):
Noise scale value.
Returns:
Returns:
Tensor: If not inverse, negative log-likelihood (NLL) tensor (B,).
Tensor:
If not inverse, negative log-likelihood (NLL) tensor (B,).
If inverse, log-duration tensor (B, 1, T_text).
If inverse, log-duration tensor (B, 1, T_text).
"""
"""
# stop gradient
# stop gradient
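The docstring above says the predictor returns a log-duration tensor when `inverse` is true. A caller still has to turn log-durations into integer frame counts. The sketch below assumes the common VITS-style convention `ceil(exp(logw) * mask * alpha)`; that convention, the helper name `log_durations_to_frames` and the `alpha` speed factor are assumptions for illustration and do not appear in this diff.

```python
import math
from typing import List


def log_durations_to_frames(logw: List[float],
                            mask: List[float],
                            alpha: float = 1.0) -> List[int]:
    """Convert per-token log-durations into frame counts (assumed convention)."""
    # exp() undoes the log, the mask zeroes padded tokens, and ceil() rounds
    # every real token up to a whole number of frames.
    return [int(math.ceil(math.exp(lw) * m * alpha)) for lw, m in zip(logw, mask)]


logw = [0.0, math.log(2.5), 1.0]  # the last token is padding
mask = [1.0, 1.0, 0.0]
frames = log_durations_to_frames(logw, mask)
print(frames, sum(frames))
```

Rounding up rather than to nearest guarantees that every unmasked token receives at least one frame.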
paddlespeech/t2s/models/vits/flow.py
@@ -34,11 +34,15 @@ class FlipFlow(nn.Layer):
             ) -> Union[paddle.Tensor, Tuple[paddle.Tensor, paddle.Tensor]]:
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, channels, T).
-            inverse (bool): Whether to inverse the flow.
+            x (Tensor):
+                Input tensor (B, channels, T).
+            inverse (bool):
+                Whether to inverse the flow.
         Returns:
-            Tensor: Flipped tensor (B, channels, T).
-            Tensor: Log-determinant tensor for NLL (B,) if not inverse.
+            Tensor:
+                Flipped tensor (B, channels, T).
+            Tensor:
+                Log-determinant tensor for NLL (B,) if not inverse.
         """
         x = paddle.flip(x, [1])
         if not inverse:
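The `FlipFlow` hunk above shows `paddle.flip(x, [1])`, which reverses the channel axis of a `(B, channels, T)` tensor. The sketch below reproduces that on nested Python lists. A flip is a permutation, so its log-determinant is zero; returning a zero per batch item in the forward direction is an assumption about the part of the method this diff does not show.

```python
from typing import List, Tuple, Union

Tensor3 = List[List[List[float]]]  # (B, channels, T) as nested lists


def flip_flow(x: Tensor3,
              inverse: bool = False) -> Union[Tensor3, Tuple[Tensor3, List[float]]]:
    """Reverse the channel order of every batch item."""
    flipped = [list(reversed(sample)) for sample in x]
    if inverse:
        return flipped
    # A permutation has |det| = 1, so the log-determinant is 0 for each item.
    return flipped, [0.0] * len(x)


x = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]  # B=1, channels=3, T=2
y, logdet = flip_flow(x)
restored = flip_flow(y, inverse=True)
print(y, logdet, restored)
```

Flipping twice restores the input, so the same operation serves as its own inverse.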
@@ -60,13 +64,19 @@ class LogFlow(nn.Layer):
             ) -> Union[paddle.Tensor, Tuple[paddle.Tensor, paddle.Tensor]]:
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, channels, T).
-            x_mask (Tensor): Mask tensor (B, 1, T).
-            inverse (bool): Whether to inverse the flow.
-            eps (float): Epsilon for log.
+            x (Tensor):
+                Input tensor (B, channels, T).
+            x_mask (Tensor):
+                Mask tensor (B, 1, T).
+            inverse (bool):
+                Whether to inverse the flow.
+            eps (float):
+                Epsilon for log.
         Returns:
-            Tensor: Output tensor (B, channels, T).
-            Tensor: Log-determinant tensor for NLL (B,) if not inverse.
+            Tensor:
+                Output tensor (B, channels, T).
+            Tensor:
+                Log-determinant tensor for NLL (B,) if not inverse.
         """
         if not inverse:
             y = paddle.log(paddle.clip(x, min=eps)) * x_mask
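The forward line shown for `LogFlow` is `y = log(clip(x, min=eps)) * x_mask`. Since `dy/dx = 1/x`, the change-of-variables log-determinant is the sum of `-y` over channels and time. The sketch below works on a single `(channels, T)` item in plain Python. The log-determinant and the inverse branch are derived from that formula and are assumptions, because the diff does not show those lines.

```python
import math
from typing import List, Tuple, Union

Matrix = List[List[float]]  # one batch item, (channels, T)


def log_flow(x: Matrix,
             x_mask: List[float],
             inverse: bool = False,
             eps: float = 1e-5) -> Union[Matrix, Tuple[Matrix, float]]:
    """Element-wise log flow on one item, with a time mask of length T."""
    if inverse:
        # Assumed inverse: exponentiate and re-apply the mask.
        return [[math.exp(v) * m for v, m in zip(row, x_mask)] for row in x]
    # Clip before the log so zero or negative inputs cannot produce -inf or NaN.
    y = [[math.log(max(v, eps)) * m for v, m in zip(row, x_mask)] for row in x]
    logdet = sum(-v for row in y for v in row)
    return y, logdet


x = [[1.0, math.e, 5.0]]
x_mask = [1.0, 1.0, 0.0]  # the last frame is padding
y, logdet = log_flow(x, x_mask)
restored = log_flow(y, x_mask, inverse=True)
print(y, logdet, restored)
```

Masked frames contribute zero to both the output and the log-determinant, so padding does not affect the likelihood.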
@@ -83,7 +93,8 @@ class ElementwiseAffineFlow(nn.Layer):
     def __init__(self, channels: int):
         """Initialize ElementwiseAffineFlow module.
         Args:
-            channels (int): Number of channels.
+            channels (int):
+                Number of channels.
         """
         super().__init__()
         self.channels = channels
@@ -107,12 +118,17 @@ class ElementwiseAffineFlow(nn.Layer):
             ) -> Union[paddle.Tensor, Tuple[paddle.Tensor, paddle.Tensor]]:
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, channels, T).
-            x_mask (Tensor): Mask tensor (B, 1, T).
-            inverse (bool): Whether to inverse the flow.
+            x (Tensor):
+                Input tensor (B, channels, T).
+            x_mask (Tensor):
+                Mask tensor (B, 1, T).
+            inverse (bool):
+                Whether to inverse the flow.
         Returns:
-            Tensor: Output tensor (B, channels, T).
-            Tensor: Log-determinant tensor for NLL (B,) if not inverse.
+            Tensor:
+                Output tensor (B, channels, T).
+            Tensor:
+                Log-determinant tensor for NLL (B,) if not inverse.
         """
         if not inverse:
             y = self.m + paddle.exp(self.logs) * x
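The forward line shown for `ElementwiseAffineFlow` is `y = m + exp(logs) * x`, with one shift `m` and one log-scale `logs` per channel. The sketch below applies it to a single `(channels, T)` item in plain Python. The masking, the log-determinant (sum of `logs` over unmasked frames) and the inverse `(y - m) * exp(-logs)` follow from that formula but are assumptions, since the diff does not show them.

```python
import math
from typing import List, Tuple, Union

Matrix = List[List[float]]  # one batch item, (channels, T)


def affine_flow(x: Matrix,
                x_mask: List[float],
                m: List[float],
                logs: List[float],
                inverse: bool = False) -> Union[Matrix, Tuple[Matrix, float]]:
    """Per-channel affine flow on one item, with a time mask of length T."""
    if inverse:
        return [[(v - m[c]) * math.exp(-logs[c]) * k for v, k in zip(row, x_mask)]
                for c, row in enumerate(x)]
    y = [[(m[c] + math.exp(logs[c]) * v) * k for v, k in zip(row, x_mask)]
         for c, row in enumerate(x)]
    # Each unmasked element is scaled by exp(logs[c]), so it adds logs[c].
    logdet = sum(logs[c] * k for c in range(len(x)) for k in x_mask)
    return y, logdet


x = [[1.0, 2.0], [3.0, 4.0]]
x_mask = [1.0, 1.0]
m = [0.5, -1.0]
logs = [0.0, math.log(2.0)]
y, logdet = affine_flow(x, x_mask, m, logs)
restored = affine_flow(y, x_mask, m, logs, inverse=True)
print(y, logdet, restored)
```

Parameterising the scale as `exp(logs)` keeps it strictly positive, so the transform stays invertible for any parameter values.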
@@ -157,11 +173,16 @@ class DilatedDepthSeparableConv(nn.Layer):
             eps: float=1e-5, ):
         """Initialize DilatedDepthSeparableConv module.
         Args:
-            channels (int): Number of channels.
-            kernel_size (int): Kernel size.
-            layers (int): Number of layers.
-            dropout_rate (float): Dropout rate.
-            eps (float): Epsilon for layer norm.
+            channels (int):
+                Number of channels.
+            kernel_size (int):
+                Kernel size.
+            layers (int):
+                Number of layers.
+            dropout_rate (float):
+                Dropout rate.
+            eps (float):
+                Epsilon for layer norm.
         """
         super().__init__()
@@ -198,11 +219,15 @@ class DilatedDepthSeparableConv(nn.Layer):
             g: Optional[paddle.Tensor]=None) -> paddle.Tensor:
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, in_channels, T).
-            x_mask (Tensor): Mask tensor (B, 1, T).
-            g (Optional[Tensor]): Global conditioning tensor (B, global_channels, 1).
+            x (Tensor):
+                Input tensor (B, in_channels, T).
+            x_mask (Tensor):
+                Mask tensor (B, 1, T).
+            g (Optional[Tensor]):
+                Global conditioning tensor (B, global_channels, 1).
         Returns:
-            Tensor: Output tensor (B, channels, T).
+            Tensor:
+                Output tensor (B, channels, T).
         """
         if g is not None:
             x = x + g
@@ -225,12 +250,18 @@ class ConvFlow(nn.Layer):
             tail_bound: float=5.0, ):
         """Initialize ConvFlow module.
         Args:
-            in_channels (int): Number of input channels.
-            hidden_channels (int): Number of hidden channels.
-            kernel_size (int): Kernel size.
-            layers (int): Number of layers.
-            bins (int): Number of bins.
-            tail_bound (float): Tail bound value.
+            in_channels (int):
+                Number of input channels.
+            hidden_channels (int):
+                Number of hidden channels.
+            kernel_size (int):
+                Kernel size.
+            layers (int):
+                Number of layers.
+            bins (int):
+                Number of bins.
+            tail_bound (float):
+                Tail bound value.
         """
         super().__init__()
         self.half_channels = in_channels // 2
@@ -275,13 +306,19 @@ class ConvFlow(nn.Layer):
             ) -> Union[paddle.Tensor, Tuple[paddle.Tensor, paddle.Tensor]]:
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, channels, T).
-            x_mask (Tensor): Mask tensor (B, 1, T).
-            g (Optional[Tensor]): Global conditioning tensor (B, channels, 1).
-            inverse (bool): Whether to inverse the flow.
+            x (Tensor):
+                Input tensor (B, channels, T).
+            x_mask (Tensor):
+                Mask tensor (B, 1, T).
+            g (Optional[Tensor]):
+                Global conditioning tensor (B, channels, 1).
+            inverse (bool):
+                Whether to inverse the flow.
         Returns:
-            Tensor: Output tensor (B, channels, T).
-            Tensor: Log-determinant tensor for NLL (B,) if not inverse.
+            Tensor:
+                Output tensor (B, channels, T).
+            Tensor:
+                Log-determinant tensor for NLL (B,) if not inverse.
         """
         xa, xb = x.split(2, 1)
         h = self.input_conv(xa)
paddlespeech/t2s/models/vits/generator.py
@@ -97,81 +97,104 @@ class VITSGenerator(nn.Layer):
             stochastic_duration_predictor_dds_conv_layers: int=3, ):
         """Initialize VITS generator module.
         Args:
-            vocabs (int): Input vocabulary size.
-            aux_channels (int): Number of acoustic feature channels.
-            hidden_channels (int): Number of hidden channels.
-            spks (Optional[int]): Number of speakers. If set to > 1, assume that the
+            vocabs (int):
+                Input vocabulary size.
+            aux_channels (int):
+                Number of acoustic feature channels.
+            hidden_channels (int):
+                Number of hidden channels.
+            spks (Optional[int]):
+                Number of speakers. If set to > 1, assume that the
                 sids will be provided as the input and use sid embedding layer.
-            langs (Optional[int]): Number of languages. If set to > 1, assume that the
+            langs (Optional[int]):
+                Number of languages. If set to > 1, assume that the
                 lids will be provided as the input and use sid embedding layer.
-            spk_embed_dim (Optional[int]): Speaker embedding dimension. If set to > 0,
+            spk_embed_dim (Optional[int]):
+                Speaker embedding dimension. If set to > 0,
                 assume that spembs will be provided as the input.
-            global_channels (int): Number of global conditioning channels.
-            segment_size (int): Segment size for decoder.
-            text_encoder_attention_heads (int): Number of heads in conformer block
-                of text encoder.
-            text_encoder_ffn_expand (int): Expansion ratio of FFN in conformer block
-                of text encoder.
-            text_encoder_blocks (int): Number of conformer blocks in text encoder.
-            text_encoder_positionwise_layer_type (str): Position-wise layer type in
-                conformer block of text encoder.
-            text_encoder_positionwise_conv_kernel_size (int): Position-wise convolution
-                kernel size in conformer block of text encoder. Only used when the
-                above layer type is conv1d or conv1d-linear.
-            text_encoder_positional_encoding_layer_type (str): Positional encoding layer
-                type in conformer block of text encoder.
-            text_encoder_self_attention_layer_type (str): Self-attention layer type in
-                conformer block of text encoder.
-            text_encoder_activation_type (str): Activation function type in conformer
-                block of text encoder.
-            text_encoder_normalize_before (bool): Whether to apply layer norm before
-                self-attention in conformer block of text encoder.
-            text_encoder_dropout_rate (float): Dropout rate in conformer block of
-                text encoder.
-            text_encoder_positional_dropout_rate (float): Dropout rate for positional
-                encoding in conformer block of text encoder.
-            text_encoder_attention_dropout_rate (float): Dropout rate for attention in
-                conformer block of text encoder.
-            text_encoder_conformer_kernel_size (int): Conformer conv kernel size. It
-                will be used when only use_conformer_conv_in_text_encoder = True.
-            use_macaron_style_in_text_encoder (bool): Whether to use macaron style FFN
-                in conformer block of text encoder.
-            use_conformer_conv_in_text_encoder (bool): Whether to use convolution in
-                conformer block of text encoder.
-            decoder_kernel_size (int): Decoder kernel size.
-            decoder_channels (int): Number of decoder initial channels.
-            decoder_upsample_scales (List[int]): List of upsampling scales in decoder.
-            decoder_upsample_kernel_sizes (List[int]): List of kernel size for
-                upsampling layers in decoder.
-            decoder_resblock_kernel_sizes (List[int]): List of kernel size for resblocks
-                in decoder.
-            decoder_resblock_dilations (List[List[int]]): List of list of dilations for
-                resblocks in decoder.
-            use_weight_norm_in_decoder (bool): Whether to apply weight normalization in
-                decoder.
-            posterior_encoder_kernel_size (int): Posterior encoder kernel size.
-            posterior_encoder_layers (int): Number of layers of posterior encoder.
-            posterior_encoder_stacks (int): Number of stacks of posterior encoder.
-            posterior_encoder_base_dilation (int): Base dilation of posterior encoder.
-            posterior_encoder_dropout_rate (float): Dropout rate for posterior encoder.
-            use_weight_norm_in_posterior_encoder (bool): Whether to apply weight
-                normalization in posterior encoder.
-            flow_flows (int): Number of flows in flow.
-            flow_kernel_size (int): Kernel size in flow.
-            flow_base_dilation (int): Base dilation in flow.
-            flow_layers (int): Number of layers in flow.
-            flow_dropout_rate (float): Dropout rate in flow
-            use_weight_norm_in_flow (bool): Whether to apply weight normalization in
-                flow.
-            use_only_mean_in_flow (bool): Whether to use only mean in flow.
-            stochastic_duration_predictor_kernel_size (int): Kernel size in stochastic
-                duration predictor.
-            stochastic_duration_predictor_dropout_rate (float): Dropout rate in
-                stochastic duration predictor.
-            stochastic_duration_predictor_flows (int): Number of flows in stochastic
-                duration predictor.
-            stochastic_duration_predictor_dds_conv_layers (int): Number of DDS conv
-                layers in stochastic duration predictor.
+            global_channels (int):
+                Number of global conditioning channels.
+            segment_size (int):
+                Segment size for decoder.
+            text_encoder_attention_heads (int):
+                Number of heads in conformer block of text encoder.
+            text_encoder_ffn_expand (int):
+                Expansion ratio of FFN in conformer block of text encoder.
+            text_encoder_blocks (int):
+                Number of conformer blocks in text encoder.
+            text_encoder_positionwise_layer_type (str):
+                Position-wise layer type in conformer block of text encoder.
+            text_encoder_positionwise_conv_kernel_size (int):
+                Position-wise convolution kernel size in conformer block of text encoder.
+                Only used when the above layer type is conv1d or conv1d-linear.
+            text_encoder_positional_encoding_layer_type (str):
+                Positional encoding layer type in conformer block of text encoder.
+            text_encoder_self_attention_layer_type (str):
+                Self-attention layer type in conformer block of text encoder.
+            text_encoder_activation_type (str):
+                Activation function type in conformer block of text encoder.
+            text_encoder_normalize_before (bool):
+                Whether to apply layer norm before self-attention in conformer block of text encoder.
+            text_encoder_dropout_rate (float):
+                Dropout rate in conformer block of text encoder.
+            text_encoder_positional_dropout_rate (float):
+                Dropout rate for positional encoding in conformer block of text encoder.
+            text_encoder_attention_dropout_rate (float):
+                Dropout rate for attention in conformer block of text encoder.
+            text_encoder_conformer_kernel_size (int):
+                Conformer conv kernel size. It will be used when only use_conformer_conv_in_text_encoder = True.
+            use_macaron_style_in_text_encoder (bool):
+                Whether to use macaron style FFN in conformer block of text encoder.
+            use_conformer_conv_in_text_encoder (bool):
+                Whether to use convolution in conformer block of text encoder.
+            decoder_kernel_size (int):
+                Decoder kernel size.
+            decoder_channels (int):
+                Number of decoder initial channels.
+            decoder_upsample_scales (List[int]):
+                List of upsampling scales in decoder.
+            decoder_upsample_kernel_sizes (List[int]):
+                List of kernel size for upsampling layers in decoder.
+            decoder_resblock_kernel_sizes (List[int]):
+                List of kernel size for resblocks in decoder.
+            decoder_resblock_dilations (List[List[int]]):
+                List of list of dilations for resblocks in decoder.
+            use_weight_norm_in_decoder (bool):
+                Whether to apply weight normalization in decoder.
+            posterior_encoder_kernel_size (int):
+                Posterior encoder kernel size.
+            posterior_encoder_layers (int):
+                Number of layers of posterior encoder.
+            posterior_encoder_stacks (int):
+                Number of stacks of posterior encoder.
+            posterior_encoder_base_dilation (int):
+                Base dilation of posterior encoder.
+            posterior_encoder_dropout_rate (float):
+                Dropout rate for posterior encoder.
+            use_weight_norm_in_posterior_encoder (bool):
+                Whether to apply weight normalization in posterior encoder.
+            flow_flows (int):
+                Number of flows in flow.
+            flow_kernel_size (int):
+                Kernel size in flow.
+            flow_base_dilation (int):
+                Base dilation in flow.
+            flow_layers (int):
+                Number of layers in flow.
+            flow_dropout_rate (float):
+                Dropout rate in flow
+            use_weight_norm_in_flow (bool):
+                Whether to apply weight normalization in flow.
+            use_only_mean_in_flow (bool):
+                Whether to use only mean in flow.
+            stochastic_duration_predictor_kernel_size (int):
+                Kernel size in stochastic duration predictor.
+            stochastic_duration_predictor_dropout_rate (float):
+                Dropout rate in stochastic duration predictor.
+            stochastic_duration_predictor_flows (int):
+                Number of flows in stochastic duration predictor.
+            stochastic_duration_predictor_dds_conv_layers (int):
+                Number of DDS conv layers in stochastic duration predictor.
         """
         super().__init__()
         self.segment_size = segment_size
...
@@ -272,27 +295,40 @@ class VITSGenerator(nn.Layer):
                 paddle.Tensor,
                 paddle.Tensor, ], ]:
         """Calculate forward propagation.
         Args:
-            text (Tensor): Text index tensor (B, T_text).
-            text_lengths (Tensor): Text length tensor (B,).
-            feats (Tensor): Feature tensor (B, aux_channels, T_feats).
-            feats_lengths (Tensor): Feature length tensor (B,).
-            sids (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
-            spembs (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
-            lids (Optional[Tensor]): Language index tensor (B,) or (B, 1).
+            text (Tensor):
+                Text index tensor (B, T_text).
+            text_lengths (Tensor):
+                Text length tensor (B,).
+            feats (Tensor):
+                Feature tensor (B, aux_channels, T_feats).
+            feats_lengths (Tensor):
+                Feature length tensor (B,).
+            sids (Optional[Tensor]):
+                Speaker index tensor (B,) or (B, 1).
+            spembs (Optional[Tensor]):
+                Speaker embedding tensor (B, spk_embed_dim).
+            lids (Optional[Tensor]):
+                Language index tensor (B,) or (B, 1).
         Returns:
-            Tensor: Waveform tensor (B, 1, segment_size * upsample_factor).
-            Tensor: Duration negative log-likelihood (NLL) tensor (B,).
-            Tensor: Monotonic attention weight tensor (B, 1, T_feats, T_text).
-            Tensor: Segments start index tensor (B,).
-            Tensor: Text mask tensor (B, 1, T_text).
-            Tensor: Feature mask tensor (B, 1, T_feats).
+            Tensor:
+                Waveform tensor (B, 1, segment_size * upsample_factor).
+            Tensor:
+                Duration negative log-likelihood (NLL) tensor (B,).
+            Tensor:
+                Monotonic attention weight tensor (B, 1, T_feats, T_text).
+            Tensor:
+                Segments start index tensor (B,).
+            Tensor:
+                Text mask tensor (B, 1, T_text).
+            Tensor:
+                Feature mask tensor (B, 1, T_feats).
             tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
                 - Tensor: Posterior encoder hidden representation (B, H, T_feats).
                 - Tensor: Flow hidden representation (B, H, T_feats).
                 - Tensor: Expanded text encoder projected mean (B, H, T_feats).
                 - Tensor: Expanded text encoder projected scale (B, H, T_feats).
                 - Tensor: Posterior encoder projected mean (B, H, T_feats).
                 - Tensor: Posterior encoder projected scale (B, H, T_feats).
         """
         # forward text encoder
         x, m_p, logs_p, x_mask = self.text_encoder(text, text_lengths)
...
@@ -402,24 +438,40 @@ class VITSGenerator(nn.Layer):
     ) -> Tuple[paddle.Tensor, paddle.Tensor, paddle.Tensor]:
         """Run inference.
         Args:
-            text (Tensor): Input text index tensor (B, T_text,).
-            text_lengths (Tensor): Text length tensor (B,).
-            feats (Tensor): Feature tensor (B, aux_channels, T_feats,).
-            feats_lengths (Tensor): Feature length tensor (B,).
-            sids (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
-            spembs (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
-            lids (Optional[Tensor]): Language index tensor (B,) or (B, 1).
-            dur (Optional[Tensor]): Ground-truth duration (B, T_text,). If provided,
+            text (Tensor):
+                Input text index tensor (B, T_text,).
+            text_lengths (Tensor):
+                Text length tensor (B,).
+            feats (Tensor):
+                Feature tensor (B, aux_channels, T_feats,).
+            feats_lengths (Tensor):
+                Feature length tensor (B,).
+            sids (Optional[Tensor]):
+                Speaker index tensor (B,) or (B, 1).
+            spembs (Optional[Tensor]):
+                Speaker embedding tensor (B, spk_embed_dim).
+            lids (Optional[Tensor]):
+                Language index tensor (B,) or (B, 1).
+            dur (Optional[Tensor]):
+                Ground-truth duration (B, T_text,). If provided,
                 skip the prediction of durations (i.e., teacher forcing).
-            noise_scale (float): Noise scale parameter for flow.
-            noise_scale_dur (float): Noise scale parameter for duration predictor.
-            alpha (float): Alpha parameter to control the speed of generated speech.
-            max_len (Optional[int]): Maximum length of acoustic feature sequence.
-            use_teacher_forcing (bool): Whether to use teacher forcing.
+            noise_scale (float):
+                Noise scale parameter for flow.
+            noise_scale_dur (float):
+                Noise scale parameter for duration predictor.
+            alpha (float):
+                Alpha parameter to control the speed of generated speech.
+            max_len (Optional[int]):
+                Maximum length of acoustic feature sequence.
+            use_teacher_forcing (bool):
+                Whether to use teacher forcing.
         Returns:
-            Tensor: Generated waveform tensor (B, T_wav).
-            Tensor: Monotonic attention weight tensor (B, T_feats, T_text).
-            Tensor: Duration tensor (B, T_text).
+            Tensor:
+                Generated waveform tensor (B, T_wav).
+            Tensor:
+                Monotonic attention weight tensor (B, T_feats, T_text).
+            Tensor:
+                Duration tensor (B, T_text).
         """
         # encoder
         x, m_p, logs_p, x_mask = self.text_encoder(text, text_lengths)
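The `alpha` argument documented above controls speaking speed. The diff does not show how it is applied, so the following is only a sketch under the assumption (common in VITS implementations) that predicted per-token durations are multiplied by `alpha` and rounded up to whole frames. The function name `scale_durations` and the sample values are hypothetical.

```python
import math


def scale_durations(durations, alpha):
    """Scale predicted per-token durations (in frames) and round up.

    Assumption: alpha multiplies the predicted durations, so alpha > 1
    yields more frames per token, i.e. slower speech.
    """
    return [math.ceil(d * alpha) for d in durations]


predicted = [1.2, 2.5, 0.8]          # hypothetical duration predictor output
normal = scale_durations(predicted, alpha=1.0)
slow = scale_durations(predicted, alpha=2.0)
print(normal, sum(normal))
print(slow, sum(slow))
```

Under this assumption the total number of feature frames grows with `alpha`, which is why `max_len` is available as a cap.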
...
@@ -533,15 +585,23 @@ class VITSGenerator(nn.Layer):
             lids: Optional[paddle.Tensor]=None, ) -> paddle.Tensor:
         """Run voice conversion.
         Args:
-            feats (Tensor): Feature tensor (B, aux_channels, T_feats,).
-            feats_lengths (Tensor): Feature length tensor (B,).
-            sids_src (Optional[Tensor]): Speaker index tensor of source feature (B,) or (B, 1).
-            sids_tgt (Optional[Tensor]): Speaker index tensor of target feature (B,) or (B, 1).
-            spembs_src (Optional[Tensor]): Speaker embedding tensor of source feature (B, spk_embed_dim).
-            spembs_tgt (Optional[Tensor]): Speaker embedding tensor of target feature (B, spk_embed_dim).
-            lids (Optional[Tensor]): Language index tensor (B,) or (B, 1).
+            feats (Tensor):
+                Feature tensor (B, aux_channels, T_feats,).
+            feats_lengths (Tensor):
+                Feature length tensor (B,).
+            sids_src (Optional[Tensor]):
+                Speaker index tensor of source feature (B,) or (B, 1).
+            sids_tgt (Optional[Tensor]):
+                Speaker index tensor of target feature (B,) or (B, 1).
+            spembs_src (Optional[Tensor]):
+                Speaker embedding tensor of source feature (B, spk_embed_dim).
+            spembs_tgt (Optional[Tensor]):
+                Speaker embedding tensor of target feature (B, spk_embed_dim).
+            lids (Optional[Tensor]):
+                Language index tensor (B,) or (B, 1).
         Returns:
-            Tensor: Generated waveform tensor (B, T_wav).
+            Tensor:
+                Generated waveform tensor (B, T_wav).
         """
         # encoder
         g_src = None
...
@@ -602,10 +662,13 @@ class VITSGenerator(nn.Layer):
             mask: paddle.Tensor) -> paddle.Tensor:
         """Generate path a.k.a. monotonic attention.
         Args:
-            dur (Tensor): Duration tensor (B, 1, T_text).
-            mask (Tensor): Attention mask tensor (B, 1, T_feats, T_text).
+            dur (Tensor):
+                Duration tensor (B, 1, T_text).
+            mask (Tensor):
+                Attention mask tensor (B, 1, T_feats, T_text).
         Returns:
-            Tensor: Path tensor (B, 1, T_feats, T_text).
+            Tensor:
+                Path tensor (B, 1, T_feats, T_text).
         """
         b, _, t_y, t_x = paddle.shape(mask)
         cum_dur = paddle.cumsum(dur, -1)
...
...
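The path-generation docstring above maps per-token durations to a hard monotonic alignment, and the visible code starts from a cumulative sum of `dur`. A minimal unbatched sketch of that idea in plain Python follows. It is not the PaddleSpeech implementation (which is batched and masked); the name `generate_path_sketch` and the `t_feats` parameter are illustrative assumptions.

```python
from itertools import accumulate


def generate_path_sketch(dur, t_feats):
    """Build a (T_feats x T_text) 0/1 alignment from integer durations.

    Frame `f` is assigned to token `k` when
    cum_dur[k] - dur[k] <= f < cum_dur[k].
    """
    cum_dur = list(accumulate(dur))      # end frame (exclusive) of each token
    path = []
    for frame in range(t_feats):
        row = []
        for tok, end in enumerate(cum_dur):
            start = end - dur[tok]       # first frame owned by this token
            row.append(1 if start <= frame < end else 0)
        path.append(row)
    return path


path = generate_path_sketch([2, 1, 3], t_feats=6)
for row in path:
    print(row)
```

Each frame row contains exactly one `1`, and the column index of that `1` never decreases from one frame to the next, which is what makes the alignment monotonic.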
paddlespeech/t2s/models/vits/posterior_encoder.py
View file @ 5e714ecb
...
@@ -52,17 +52,28 @@ class PosteriorEncoder(nn.Layer):
         """Initialize PosteriorEncoder module.
         Args:
-            in_channels (int): Number of input channels.
-            out_channels (int): Number of output channels.
-            hidden_channels (int): Number of hidden channels.
-            kernel_size (int): Kernel size in WaveNet.
-            layers (int): Number of layers of WaveNet.
-            stacks (int): Number of repeat stacking of WaveNet.
-            base_dilation (int): Base dilation factor.
-            global_channels (int): Number of global conditioning channels.
-            dropout_rate (float): Dropout rate.
-            bias (bool): Whether to use bias parameters in conv.
-            use_weight_norm (bool): Whether to apply weight norm.
+            in_channels (int):
+                Number of input channels.
+            out_channels (int):
+                Number of output channels.
+            hidden_channels (int):
+                Number of hidden channels.
+            kernel_size (int):
+                Kernel size in WaveNet.
+            layers (int):
+                Number of layers of WaveNet.
+            stacks (int):
+                Number of repeat stacking of WaveNet.
+            base_dilation (int):
+                Base dilation factor.
+            global_channels (int):
+                Number of global conditioning channels.
+            dropout_rate (float):
+                Dropout rate.
+            bias (bool):
+                Whether to use bias parameters in conv.
+            use_weight_norm (bool):
+                Whether to apply weight norm.
         """
         super().__init__()
...
@@ -99,15 +110,22 @@ class PosteriorEncoder(nn.Layer):
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, in_channels, T_feats).
-            x_lengths (Tensor): Length tensor (B,).
-            g (Optional[Tensor]): Global conditioning tensor (B, global_channels, 1).
+            x (Tensor):
+                Input tensor (B, in_channels, T_feats).
+            x_lengths (Tensor):
+                Length tensor (B,).
+            g (Optional[Tensor]):
+                Global conditioning tensor (B, global_channels, 1).
         Returns:
-            Tensor: Encoded hidden representation tensor (B, out_channels, T_feats).
-            Tensor: Projected mean tensor (B, out_channels, T_feats).
-            Tensor: Projected scale tensor (B, out_channels, T_feats).
-            Tensor: Mask tensor for input tensor (B, 1, T_feats).
+            Tensor:
+                Encoded hidden representation tensor (B, out_channels, T_feats).
+            Tensor:
+                Projected mean tensor (B, out_channels, T_feats).
+            Tensor:
+                Projected scale tensor (B, out_channels, T_feats).
+            Tensor:
+                Mask tensor for input tensor (B, 1, T_feats).
         """
         x_mask = make_non_pad_mask(x_lengths).unsqueeze(1)
...
...
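The forward pass above turns a length tensor `(B,)` into a mask of shape `(B, 1, T_feats)`. The sketch below mimics that shape transformation with nested lists. It is an illustration only: `non_pad_mask_sketch` and `unsqueeze_axis1` are hypothetical stand-ins, not the library's `make_non_pad_mask` or Paddle's `unsqueeze`.

```python
def non_pad_mask_sketch(lengths):
    """Return a (B, T_max) 0/1 mask with 1 on valid (non-padded) frames."""
    max_len = max(lengths)
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


def unsqueeze_axis1(mask):
    """Insert a singleton axis: (B, T) -> (B, 1, T)."""
    return [[row] for row in mask]


x_mask = unsqueeze_axis1(non_pad_mask_sketch([3, 1]))
print(x_mask)
```

The singleton middle axis lets the mask broadcast over the channel dimension of a `(B, channels, T_feats)` tensor.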
paddlespeech/t2s/models/vits/residual_coupling.py
View file @ 5e714ecb
...
@@ -55,18 +55,30 @@ class ResidualAffineCouplingBlock(nn.Layer):
         """Initialize ResidualAffineCouplingBlock module.
         Args:
-            in_channels (int): Number of input channels.
-            hidden_channels (int): Number of hidden channels.
-            flows (int): Number of flows.
-            kernel_size (int): Kernel size for WaveNet.
-            base_dilation (int): Base dilation factor for WaveNet.
-            layers (int): Number of layers of WaveNet.
-            stacks (int): Number of stacks of WaveNet.
-            global_channels (int): Number of global channels.
-            dropout_rate (float): Dropout rate.
-            use_weight_norm (bool): Whether to use weight normalization in WaveNet.
-            bias (bool): Whether to use bias parameters in WaveNet.
-            use_only_mean (bool): Whether to estimate only mean.
+            in_channels (int):
+                Number of input channels.
+            hidden_channels (int):
+                Number of hidden channels.
+            flows (int):
+                Number of flows.
+            kernel_size (int):
+                Kernel size for WaveNet.
+            base_dilation (int):
+                Base dilation factor for WaveNet.
+            layers (int):
+                Number of layers of WaveNet.
+            stacks (int):
+                Number of stacks of WaveNet.
+            global_channels (int):
+                Number of global channels.
+            dropout_rate (float):
+                Dropout rate.
+            use_weight_norm (bool):
+                Whether to use weight normalization in WaveNet.
+            bias (bool):
+                Whether to use bias parameters in WaveNet.
+            use_only_mean (bool):
+                Whether to estimate only mean.
         """
         super().__init__()
...
@@ -97,10 +109,14 @@ class ResidualAffineCouplingBlock(nn.Layer):
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, in_channels, T).
-            x_mask (Tensor): Length tensor (B, 1, T).
-            g (Optional[Tensor]): Global conditioning tensor (B, global_channels, 1).
-            inverse (bool): Whether to inverse the flow.
+            x (Tensor):
+                Input tensor (B, in_channels, T).
+            x_mask (Tensor):
+                Length tensor (B, 1, T).
+            g (Optional[Tensor]):
+                Global conditioning tensor (B, global_channels, 1).
+            inverse (bool):
+                Whether to inverse the flow.
         Returns:
             Tensor: Output tensor (B, in_channels, T).
...
@@ -134,17 +150,28 @@ class ResidualAffineCouplingLayer(nn.Layer):
         """Initialize ResidualAffineCouplingLayer module.
         Args:
-            in_channels (int): Number of input channels.
-            hidden_channels (int): Number of hidden channels.
-            kernel_size (int): Kernel size for WaveNet.
-            base_dilation (int): Base dilation factor for WaveNet.
-            layers (int): Number of layers of WaveNet.
-            stacks (int): Number of stacks of WaveNet.
-            global_channels (int): Number of global channels.
-            dropout_rate (float): Dropout rate.
-            use_weight_norm (bool): Whether to use weight normalization in WaveNet.
-            bias (bool): Whether to use bias parameters in WaveNet.
-            use_only_mean (bool): Whether to estimate only mean.
+            in_channels (int):
+                Number of input channels.
+            hidden_channels (int):
+                Number of hidden channels.
+            kernel_size (int):
+                Kernel size for WaveNet.
+            base_dilation (int):
+                Base dilation factor for WaveNet.
+            layers (int):
+                Number of layers of WaveNet.
+            stacks (int):
+                Number of stacks of WaveNet.
+            global_channels (int):
+                Number of global channels.
+            dropout_rate (float):
+                Dropout rate.
+            use_weight_norm (bool):
+                Whether to use weight normalization in WaveNet.
+            bias (bool):
+                Whether to use bias parameters in WaveNet.
+            use_only_mean (bool):
+                Whether to estimate only mean.
         """
         assert in_channels % 2 == 0, "in_channels should be divisible by 2"
...
@@ -211,14 +238,20 @@ class ResidualAffineCouplingLayer(nn.Layer):
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input tensor (B, in_channels, T).
-            x_lengths (Tensor): Length tensor (B,).
-            g (Optional[Tensor]): Global conditioning tensor (B, global_channels, 1).
-            inverse (bool): Whether to inverse the flow.
+            x (Tensor):
+                Input tensor (B, in_channels, T).
+            x_lengths (Tensor):
+                Length tensor (B,).
+            g (Optional[Tensor]):
+                Global conditioning tensor (B, global_channels, 1).
+            inverse (bool):
+                Whether to inverse the flow.
         Returns:
-            Tensor: Output tensor (B, in_channels, T).
-            Tensor: Log-determinant tensor for NLL (B,) if not inverse.
+            Tensor:
+                Output tensor (B, in_channels, T).
+            Tensor:
+                Log-determinant tensor for NLL (B,) if not inverse.
         """
         xa, xb = paddle.split(x, 2, axis=1)
...
...
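The coupling layer above splits its input channels in two (hence the `in_channels % 2 == 0` assertion), returns a log-determinant in the forward direction, and can be run with `inverse=True`. The sketch below shows a generic affine coupling step on a single time step in plain Python. It is an illustration of the technique, not the PaddleSpeech code: `toy_conditioner` is a hypothetical stand-in for the WaveNet that predicts the mean and log-scale.

```python
import math


def toy_conditioner(xa):
    """Hypothetical stand-in for the network mapping xa -> (mean, log-scale)."""
    m = [0.5 * v for v in xa]
    logs = [0.1 * v for v in xa]
    return m, logs


def affine_coupling(x, inverse=False):
    """One affine coupling step over the channel axis of one time step."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]          # first half passes through unchanged
    m, logs = toy_conditioner(xa)
    if not inverse:
        yb = [mi + xi * math.exp(si) for xi, mi, si in zip(xb, m, logs)]
        logdet = sum(logs)               # log|det J| of the affine map
        return xa + yb, logdet
    # inverse direction: undo the shift and scale, no log-determinant needed
    xb_rec = [(yi - mi) * math.exp(-si) for yi, mi, si in zip(xb, m, logs)]
    return xa + xb_rec


x = [1.0, -2.0, 0.3, 0.7]
y, logdet = affine_coupling(x)
x_rec = affine_coupling(y, inverse=True)
print(y, logdet, x_rec)
```

Because the first half is left untouched, the inverse can recompute the same mean and log-scale from it, which is what makes the transform exactly invertible.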
paddlespeech/t2s/models/vits/text_encoder.py
View file @ 5e714ecb
...
@@ -62,23 +62,40 @@ class TextEncoder(nn.Layer):
         """Initialize TextEncoder module.
         Args:
-            vocabs (int): Vocabulary size.
-            attention_dim (int): Attention dimension.
-            attention_heads (int): Number of attention heads.
-            linear_units (int): Number of linear units of positionwise layers.
-            blocks (int): Number of encoder blocks.
-            positionwise_layer_type (str): Positionwise layer type.
-            positionwise_conv_kernel_size (int): Positionwise layer's kernel size.
-            positional_encoding_layer_type (str): Positional encoding layer type.
-            self_attention_layer_type (str): Self-attention layer type.
-            activation_type (str): Activation function type.
-            normalize_before (bool): Whether to apply LayerNorm before attention.
-            use_macaron_style (bool): Whether to use macaron style components.
-            use_conformer_conv (bool): Whether to use conformer conv layers.
-            conformer_kernel_size (int): Conformer's conv kernel size.
-            dropout_rate (float): Dropout rate.
-            positional_dropout_rate (float): Dropout rate for positional encoding.
-            attention_dropout_rate (float): Dropout rate for attention.
+            vocabs (int):
+                Vocabulary size.
+            attention_dim (int):
+                Attention dimension.
+            attention_heads (int):
+                Number of attention heads.
+            linear_units (int):
+                Number of linear units of positionwise layers.
+            blocks (int):
+                Number of encoder blocks.
+            positionwise_layer_type (str):
+                Positionwise layer type.
+            positionwise_conv_kernel_size (int):
+                Positionwise layer's kernel size.
+            positional_encoding_layer_type (str):
+                Positional encoding layer type.
+            self_attention_layer_type (str):
+                Self-attention layer type.
+            activation_type (str):
+                Activation function type.
+            normalize_before (bool):
+                Whether to apply LayerNorm before attention.
+            use_macaron_style (bool):
+                Whether to use macaron style components.
+            use_conformer_conv (bool):
+                Whether to use conformer conv layers.
+            conformer_kernel_size (int):
+                Conformer's conv kernel size.
+            dropout_rate (float):
+                Dropout rate.
+            positional_dropout_rate (float):
+                Dropout rate for positional encoding.
+            attention_dropout_rate (float):
+                Dropout rate for attention.
         """
         super().__init__()
...
@@ -121,14 +138,20 @@ class TextEncoder(nn.Layer):
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input index tensor (B, T_text).
-            x_lengths (Tensor): Length tensor (B,).
+            x (Tensor):
+                Input index tensor (B, T_text).
+            x_lengths (Tensor):
+                Length tensor (B,).
         Returns:
-            Tensor: Encoded hidden representation (B, attention_dim, T_text).
-            Tensor: Projected mean tensor (B, attention_dim, T_text).
-            Tensor: Projected scale tensor (B, attention_dim, T_text).
-            Tensor: Mask tensor for input tensor (B, 1, T_text).
+            Tensor:
+                Encoded hidden representation (B, attention_dim, T_text).
+            Tensor:
+                Projected mean tensor (B, attention_dim, T_text).
+            Tensor:
+                Projected scale tensor (B, attention_dim, T_text).
+            Tensor:
+                Mask tensor for input tensor (B, 1, T_text).
         """
         x = self.emb(x) * math.sqrt(self.attention_dim)
...
...
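The first line of the text encoder's forward pass looks up token embeddings and multiplies them by `sqrt(attention_dim)`. A small plain-Python sketch of that step is given below. The embedding table values and the name `embed_and_scale` are made up for illustration; the real layer is a learned `nn.Embedding`.

```python
import math


def embed_and_scale(token_ids, table, attention_dim):
    """Look up each token's embedding row and scale it by sqrt(attention_dim)."""
    scale = math.sqrt(attention_dim)
    return [[v * scale for v in table[i]] for i in token_ids]


# Hypothetical table with vocabs=2 and attention_dim=4.
table = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
out = embed_and_scale([1, 0], table, attention_dim=4)
print(out)
```

Scaling by the square root of the model dimension is the usual Transformer convention for keeping embedding magnitudes comparable to the positional encodings added afterwards.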
paddlespeech/t2s/models/vits/vits.py
View file @ 5e714ecb
...
@@ -156,17 +156,25 @@ class VITS(nn.Layer):
             init_type: str="xavier_uniform", ):
         """Initialize VITS module.
         Args:
-            idim (int): Input vocabulary size.
-            odim (int): Acoustic feature dimension. The actual output channels will
+            idim (int):
+                Input vocabulary size.
+            odim (int):
+                Acoustic feature dimension. The actual output channels will
                 be 1 since VITS is the end-to-end text-to-wave model but for the
                 compatibility odim is used to indicate the acoustic feature dimension.
-            sampling_rate (int): Sampling rate, not used for the training but it will
+            sampling_rate (int):
+                Sampling rate, not used for the training but it will
                 be referred in saving waveform during the inference.
-            generator_type (str): Generator type.
-            generator_params (Dict[str, Any]): Parameter dict for generator.
-            discriminator_type (str): Discriminator type.
-            discriminator_params (Dict[str, Any]): Parameter dict for discriminator.
-            cache_generator_outputs (bool): Whether to cache generator outputs.
+            generator_type (str):
+                Generator type.
+            generator_params (Dict[str, Any]):
+                Parameter dict for generator.
+            discriminator_type (str):
+                Discriminator type.
+            discriminator_params (Dict[str, Any]):
+                Parameter dict for discriminator.
+            cache_generator_outputs (bool):
+                Whether to cache generator outputs.
         """
         assert check_argument_types()
         super().__init__()
...
@@ -218,14 +226,22 @@ class VITS(nn.Layer):
             forward_generator: bool=True, ) -> Dict[str, Any]:
         """Perform generator forward.
         Args:
-            text (Tensor): Text index tensor (B, T_text).
-            text_lengths (Tensor): Text length tensor (B,).
-            feats (Tensor): Feature tensor (B, T_feats, aux_channels).
-            feats_lengths (Tensor): Feature length tensor (B,).
-            sids (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
-            spembs (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
-            lids (Optional[Tensor]): Language index tensor (B,) or (B, 1).
-            forward_generator (bool): Whether to forward generator.
+            text (Tensor):
+                Text index tensor (B, T_text).
+            text_lengths (Tensor):
+                Text length tensor (B,).
+            feats (Tensor):
+                Feature tensor (B, T_feats, aux_channels).
+            feats_lengths (Tensor):
+                Feature length tensor (B,).
+            sids (Optional[Tensor]):
+                Speaker index tensor (B,) or (B, 1).
+            spembs (Optional[Tensor]):
+                Speaker embedding tensor (B, spk_embed_dim).
+            lids (Optional[Tensor]):
+                Language index tensor (B,) or (B, 1).
+            forward_generator (bool):
+                Whether to forward generator.
         Returns:
         """
...
@@ -259,13 +275,20 @@ class VITS(nn.Layer):
             lids: Optional[paddle.Tensor]=None, ) -> Dict[str, Any]:
         """Perform generator forward.
         Args:
-            text (Tensor): Text index tensor (B, T_text).
-            text_lengths (Tensor): Text length tensor (B,).
-            feats (Tensor): Feature tensor (B, T_feats, aux_channels).
-            feats_lengths (Tensor): Feature length tensor (B,).
-            sids (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
-            spembs (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
-            lids (Optional[Tensor]): Language index tensor (B,) or (B, 1).
+            text (Tensor):
+                Text index tensor (B, T_text).
+            text_lengths (Tensor):
+                Text length tensor (B,).
+            feats (Tensor):
+                Feature tensor (B, T_feats, aux_channels).
+            feats_lengths (Tensor):
+                Feature length tensor (B,).
+            sids (Optional[Tensor]):
+                Speaker index tensor (B,) or (B, 1).
+            spembs (Optional[Tensor]):
+                Speaker embedding tensor (B, spk_embed_dim).
+            lids (Optional[Tensor]):
+                Language index tensor (B,) or (B, 1).
         Returns:
         """
...
@@ -304,13 +327,20 @@ class VITS(nn.Layer):
             lids: Optional[paddle.Tensor]=None, ) -> Dict[str, Any]:
         """Perform discriminator forward.
         Args:
-            text (Tensor): Text index tensor (B, T_text).
-            text_lengths (Tensor): Text length tensor (B,).
-            feats (Tensor): Feature tensor (B, T_feats, aux_channels).
-            feats_lengths (Tensor): Feature length tensor (B,).
-            sids (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
-            spembs (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
-            lids (Optional[Tensor]): Language index tensor (B,) or (B, 1).
+            text (Tensor):
+                Text index tensor (B, T_text).
+            text_lengths (Tensor):
+                Text length tensor (B,).
+            feats (Tensor):
+                Feature tensor (B, T_feats, aux_channels).
+            feats_lengths (Tensor):
+                Feature length tensor (B,).
+            sids (Optional[Tensor]):
+                Speaker index tensor (B,) or (B, 1).
+            spembs (Optional[Tensor]):
+                Speaker embedding tensor (B, spk_embed_dim).
+            lids (Optional[Tensor]):
+                Language index tensor (B,) or (B, 1).
         Returns:
         """
...
@@ -353,22 +383,36 @@ class VITS(nn.Layer):
             use_teacher_forcing: bool=False, ) -> Dict[str, paddle.Tensor]:
         """Run inference.
         Args:
-            text (Tensor): Input text index tensor (T_text,).
-            feats (Tensor): Feature tensor (T_feats, aux_channels).
-            sids (Tensor): Speaker index tensor (1,).
-            spembs (Optional[Tensor]): Speaker embedding tensor (spk_embed_dim,).
-            lids (Tensor): Language index tensor (1,).
-            durations (Tensor): Ground-truth duration tensor (T_text,).
-            noise_scale (float): Noise scale value for flow.
-            noise_scale_dur (float): Noise scale value for duration predictor.
-            alpha (float): Alpha parameter to control the speed of generated speech.
-            max_len (Optional[int]): Maximum length.
-            use_teacher_forcing (bool): Whether to use teacher forcing.
+            text (Tensor):
+                Input text index tensor (T_text,).
+            feats (Tensor):
+                Feature tensor (T_feats, aux_channels).
+            sids (Tensor):
+                Speaker index tensor (1,).
+            spembs (Optional[Tensor]):
+                Speaker embedding tensor (spk_embed_dim,).
+            lids (Tensor):
+                Language index tensor (1,).
+            durations (Tensor):
+                Ground-truth duration tensor (T_text,).
+            noise_scale (float):
+                Noise scale value for flow.
+            noise_scale_dur (float):
+                Noise scale value for duration predictor.
+            alpha (float):
+                Alpha parameter to control the speed of generated speech.
+            max_len (Optional[int]):
+                Maximum length.
+            use_teacher_forcing (bool):
+                Whether to use teacher forcing.
         Returns:
             Dict[str, Tensor]:
-                * wav (Tensor): Generated waveform tensor (T_wav,).
-                * att_w (Tensor): Monotonic attention weight tensor (T_feats, T_text).
-                * duration (Tensor): Predicted duration tensor (T_text,).
+                * wav (Tensor):
+                    Generated waveform tensor (T_wav,).
+                * att_w (Tensor):
+                    Monotonic attention weight tensor (T_feats, T_text).
+                * duration (Tensor):
+                    Predicted duration tensor (T_text,).
         """
         # setup
         text = text[None]
...
@@ -417,15 +461,22 @@ class VITS(nn.Layer):
             lids: Optional[paddle.Tensor]=None, ) -> paddle.Tensor:
         """Run voice conversion.
         Args:
-            feats (Tensor): Feature tensor (T_feats, aux_channels).
-            sids_src (Optional[Tensor]): Speaker index tensor of source feature (1,).
-            sids_tgt (Optional[Tensor]): Speaker index tensor of target feature (1,).
-            spembs_src (Optional[Tensor]): Speaker embedding tensor of source feature (spk_embed_dim,).
-            spembs_tgt (Optional[Tensor]): Speaker embedding tensor of target feature (spk_embed_dim,).
-            lids (Optional[Tensor]): Language index tensor (1,).
+            feats (Tensor):
+                Feature tensor (T_feats, aux_channels).
+            sids_src (Optional[Tensor]):
+                Speaker index tensor of source feature (1,).
+            sids_tgt (Optional[Tensor]):
+                Speaker index tensor of target feature (1,).
+            spembs_src (Optional[Tensor]):
+                Speaker embedding tensor of source feature (spk_embed_dim,).
+            spembs_tgt (Optional[Tensor]):
+                Speaker embedding tensor of target feature (spk_embed_dim,).
+            lids (Optional[Tensor]):
+                Language index tensor (1,).
         Returns:
             Dict[str, Tensor]:
-                * wav (Tensor): Generated waveform tensor (T_wav,).
+                * wav (Tensor):
+                    Generated waveform tensor (T_wav,).
         """
         assert feats is not None
         feats = feats[None].transpose([0, 2, 1])
...
...
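The `VITS` wrapper documents `feats` as `(T_feats, aux_channels)` for a single utterance, while the generator expects `(B, aux_channels, T_feats)`. The visible line `feats[None].transpose([0, 2, 1])` bridges the two layouts. The nested-list sketch below reproduces that reshaping so the axis order is easy to check by eye; `add_batch_and_transpose` and `shape` are hypothetical helpers, not Paddle functions.

```python
def add_batch_and_transpose(feats):
    """(T_feats, aux_channels) -> (1, aux_channels, T_feats) as nested lists."""
    channels_first = [list(col) for col in zip(*feats)]  # swap the two axes
    return [channels_first]                              # add the batch axis


def shape(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


feats = [[1, 2], [3, 4], [5, 6]]      # 3 frames, 2 channels
batched = add_batch_and_transpose(feats)
print(shape(feats), "->", shape(batched))
print(batched)
```

The same `[None]` idiom appears for `text`, which only needs the leading batch axis because it has no channel dimension.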
paddlespeech/t2s/models/vits/wavenet/residual_block.py
View file @ 5e714ecb
...
@@ -39,14 +39,22 @@ class ResidualBlock(nn.Layer):
         """Initialize ResidualBlock module.
         Args:
-            kernel_size (int): Kernel size of dilation convolution layer.
-            residual_channels (int): Number of channels for residual connection.
-            skip_channels (int): Number of channels for skip connection.
-            aux_channels (int): Number of local conditioning channels.
-            dropout (float): Dropout probability.
-            dilation (int): Dilation factor.
-            bias (bool): Whether to add bias parameter in convolution layers.
-            scale_residual (bool): Whether to scale the residual outputs.
+            kernel_size (int):
+                Kernel size of dilation convolution layer.
+            residual_channels (int):
+                Number of channels for residual connection.
+            skip_channels (int):
+                Number of channels for skip connection.
+            aux_channels (int):
+                Number of local conditioning channels.
+            dropout (float):
+                Dropout probability.
+            dilation (int):
+                Dilation factor.
+            bias (bool):
+                Whether to add bias parameter in convolution layers.
+            scale_residual (bool):
+                Whether to scale the residual outputs.
         """
         super().__init__()
...
...
paddlespeech/t2s/models/vits/wavenet/wavenet.py
View file @ 5e714ecb
...
@@ -47,25 +47,42 @@ class WaveNet(nn.Layer):
         """Initialize WaveNet module.
         Args:
-            in_channels (int): Number of input channels.
-            out_channels (int): Number of output channels.
-            kernel_size (int): Kernel size of dilated convolution.
-            layers (int): Number of residual block layers.
-            stacks (int): Number of stacks i.e., dilation cycles.
-            base_dilation (int): Base dilation factor.
-            residual_channels (int): Number of channels in residual conv.
-            gate_channels (int): Number of channels in gated conv.
-            skip_channels (int): Number of channels in skip conv.
-            aux_channels (int): Number of channels for local conditioning feature.
-            global_channels (int): Number of channels for global conditioning feature.
-            dropout_rate (float): Dropout rate. 0.0 means no dropout applied.
-            bias (bool): Whether to use bias parameter in conv layer.
-            use_weight_norm (bool): Whether to use weight norm. If set to true, it will
-                be applied to all of the conv layers.
-            use_first_conv (bool): Whether to use the first conv layers.
-            use_last_conv (bool): Whether to use the last conv layers.
-            scale_residual (bool): Whether to scale the residual outputs.
-            scale_skip_connect (bool): Whether to scale the skip connection outputs.
+            in_channels (int):
+                Number of input channels.
+            out_channels (int):
+                Number of output channels.
+            kernel_size (int):
+                Kernel size of dilated convolution.
+            layers (int):
+                Number of residual block layers.
+            stacks (int):
+                Number of stacks i.e., dilation cycles.
+            base_dilation (int):
+                Base dilation factor.
+            residual_channels (int):
+                Number of channels in residual conv.
+            gate_channels (int):
+                Number of channels in gated conv.
+            skip_channels (int):
+                Number of channels in skip conv.
+            aux_channels (int):
+                Number of channels for local conditioning feature.
+            global_channels (int):
+                Number of channels for global conditioning feature.
+            dropout_rate (float):
+                Dropout rate. 0.0 means no dropout applied.
+            bias (bool):
+                Whether to use bias parameter in conv layer.
+            use_weight_norm (bool):
+                Whether to use weight norm. If set to true, it will be applied to all of the conv layers.
+            use_first_conv (bool):
+                Whether to use the first conv layers.
+            use_last_conv (bool):
+                Whether to use the last conv layers.
+            scale_residual (bool):
+                Whether to scale the residual outputs.
+            scale_skip_connect (bool):
+                Whether to scale the skip connection outputs.
         """
         super().__init__()
...
@@ -128,15 +145,18 @@ class WaveNet(nn.Layer):
         """Calculate forward propagation.
         Args:
-            x (Tensor): Input noise signal (B, 1, T) if use_first_conv else
-                (B, residual_channels, T).
-            x_mask (Optional[Tensor]): Mask tensor (B, 1, T).
-            c (Optional[Tensor]): Local conditioning features (B, aux_channels, T).
-            g (Optional[Tensor]): Global conditioning features (B, global_channels, 1).
+            x (Tensor):
+                Input noise signal (B, 1, T) if use_first_conv else (B, residual_channels, T).
+            x_mask (Optional[Tensor]):
+                Mask tensor (B, 1, T).
+            c (Optional[Tensor]):
+                Local conditioning features (B, aux_channels, T).
+            g (Optional[Tensor]):
+                Global conditioning features (B, global_channels, 1).
         Returns:
-            Tensor: Output tensor (B, out_channels, T) if use_last_conv else
-                (B, residual_channels, T).
+            Tensor:
+                Output tensor (B, out_channels, T) if use_last_conv else
+                (B, residual_channels, T).
         """
         # encode to hidden representation
...
...
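To make the shape rules in the `forward` docstring above concrete, here is a minimal sketch in plain Python. The helper name `wavenet_io_shapes` and its arguments are hypothetical and not part of PaddleSpeech; it only encodes the two conditional rules stated in the docstring and does not run the model.

```python
from typing import Tuple

Shape = Tuple[int, int, int]


def wavenet_io_shapes(
        batch: int,
        frames: int,
        out_channels: int,
        residual_channels: int,
        use_first_conv: bool,
        use_last_conv: bool) -> Tuple[Shape, Shape]:
    """Return (input_shape, output_shape) as described by the docstring."""
    # Input is (B, 1, T) when the first conv is used, since that conv
    # projects the single noise channel up to residual_channels.
    in_channels = 1 if use_first_conv else residual_channels
    # Output is (B, out_channels, T) only when the last conv is used.
    result_channels = out_channels if use_last_conv else residual_channels
    return (batch, in_channels, frames), (batch, result_channels, frames)


with_convs = wavenet_io_shapes(2, 100, 1, 64, True, True)
without_convs = wavenet_io_shapes(2, 100, 1, 64, False, False)
```

With both convs disabled, input and output share the `(B, residual_channels, T)` shape, which is what allows the block to be embedded inside a larger network.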
paddlespeech/t2s/models/wavernn/wavernn.py
@@ -69,9 +69,11 @@ class MelResNet(nn.Layer):
 def forward(self, x):
     '''
     Args:
-        x (Tensor): Input tensor (B, in_dims, T).
+        x (Tensor):
+            Input tensor (B, in_dims, T).
     Returns:
-        Tensor: Output tensor (B, res_out_dims, T).
+        Tensor:
+            Output tensor (B, res_out_dims, T).
     '''
     x = self.conv_in(x)
@@ -119,10 +121,13 @@ class UpsampleNetwork(nn.Layer):
 def forward(self, m):
     '''
     Args:
-        c (Tensor): Input tensor (B, C_aux, T).
+        c (Tensor):
+            Input tensor (B, C_aux, T).
     Returns:
-        Tensor: Output tensor (B, (T - 2 * pad) * prob(upsample_scales), C_aux).
-        Tensor: Output tensor (B, (T - 2 * pad) * prob(upsample_scales), res_out_dims).
+        Tensor:
+            Output tensor (B, (T - 2 * pad) * prob(upsample_scales), C_aux).
+        Tensor:
+            Output tensor (B, (T - 2 * pad) * prob(upsample_scales), res_out_dims).
     '''
     # aux: [B, C_aux, T]
     # -> [B, res_out_dims, T - 2 * aux_context_window]
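The time dimension in the `Returns` entry above can be worked through numerically. The sketch below assumes `prob(upsample_scales)` means the product of the scales (the WaveRNN docstring in this same file writes `prod`), and that `pad` is the context window trimmed from each side of the input. `upsampled_length` is a hypothetical helper, not a PaddleSpeech function, and the example scale values are made up.

```python
import math
from typing import Sequence


def upsampled_length(num_frames: int, pad: int,
                     upsample_scales: Sequence[int]) -> int:
    """Compute (T - 2 * pad) * prod(upsample_scales)."""
    # Frames that survive after trimming `pad` frames from both ends.
    valid_frames = num_frames - 2 * pad
    if valid_frames <= 0:
        raise ValueError("input is shorter than the trimmed context")
    # Each upsampling stage multiplies the length by its scale factor.
    return valid_frames * math.prod(upsample_scales)


# 20 input frames, 2 frames of context per side, scales multiplying to 300:
# (20 - 4) * 300 = 4800 output steps.
example_length = upsampled_length(20, 2, [4, 5, 3, 5])
```

`math.prod` requires Python 3.8 or newer.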
@@ -302,7 +307,8 @@ class WaveRNN(nn.Layer):
         number of samples for crossfading between batches
     mu_law(bool)
 Returns:
-    wav sequence: Output (T' * prod(upsample_scales), out_channels, C_out).
+    wav sequence:
+        Output (T' * prod(upsample_scales), out_channels, C_out).
 """
 self.eval()
@@ -423,7 +429,7 @@ class WaveRNN(nn.Layer):
     x(Tensor):
         mel, [1, n_frames, 80]
     pad(int):
     side(str, optional): (Default value = 'both')
 Returns:
     Tensor