@@ -248,7 +248,8 @@ Audio samples generated from ground-truth spectrograms with a vocoder.
</tr>
</table>
</div>
<br>
<br>
TTS
-------------------
...
...
@@ -633,10 +634,264 @@ Audio samples generated by a TTS system. Text is first transformed into spectrog
</td>
</tr>
</table>
</div>
<br>
<br>
Multi-Speaker TTS
-------------------
PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generated by FastSpeech2 + ParallelWaveGAN, we use AISHELL-3 Multi-Speaker TTS dataset.
In our FastSpeech2, we can control ``duration``, ``pitch`` and ``energy``, we provide the audio demos of duration control here. ``duration`` means the duration of phonemes, when we reduce duration, the speed of audios will increase, and when we incerase ``duration``, the speed of audios will reduce.
The ``duration`` of different phonemes in a sentence can have different scale ratios (when you want to slow down one word and keep the other words' speed in a sentence). Here we use a fixed scale ratio for different phonemes to control the ``speed`` of audios.
The duration control in FastSpeech2 can control the speed of audios will keep the pitch. (in some speech tool, increase the speed will increase the pitch, and vice versa.)