README.md 8.9 KB
Newer Older
1
([简体中文](./README_cn.md)|English)
小湉湉's avatar
小湉湉 已提交
2
# TTS (Text To Speech)
K
KP 已提交
3 4 5 6

## Introduction
Text-to-speech (TTS) is a natural language modeling process that requires changing units of text into units of speech for audio presentation. 

小湉湉's avatar
小湉湉 已提交
7
This demo is an implementation to generate audio from the given text. It can be done by a single command or a few lines in python using `PaddleSpeech`. 
K
KP 已提交
8 9 10

## Usage
### 1. Installation
小湉湉's avatar
小湉湉 已提交
11
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
J
Jackwaterveg 已提交
12 13 14

You can choose one way from easy, meduim and hard to install paddlespeech.

K
KP 已提交
15
### 2. Prepare Input
小湉湉's avatar
小湉湉 已提交
16
The input of this demo should be a text of the specific language that can be passed via argument.
K
KP 已提交
17
### 3. Usage
小湉湉's avatar
小湉湉 已提交
18
- Command Line (Recommended)
小湉湉's avatar
小湉湉 已提交
19
    The default acoustic model is `Fastspeech2`, and the default vocoder is `HiFiGAN`, the default inference method is dygraph inference. 
小湉湉's avatar
小湉湉 已提交
20
    - Chinese
小湉湉's avatar
小湉湉 已提交
21 22 23
        ```bash
        paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
        ```
H
Hui Zhang 已提交
24 25 26 27
    - Batch Process
        ```bash
        echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
        ```
小湉湉's avatar
小湉湉 已提交
28
    - Chinese, use `SpeedySpeech` as the acoustic model
小湉湉's avatar
小湉湉 已提交
29 30 31
        ```bash
        paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
        ```
小湉湉's avatar
小湉湉 已提交
32
    - Chinese, multi-speaker
小湉湉's avatar
小湉湉 已提交
33 34 35 36 37 38
    
        You can change `spk_id` here.
        ```bash
        paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0
        ```
    
小湉湉's avatar
小湉湉 已提交
39
     - English
小湉湉's avatar
小湉湉 已提交
40 41 42
        ```bash
        paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "hello world"
        ```
小湉湉's avatar
小湉湉 已提交
43
    - English, multi-speaker
小湉湉's avatar
小湉湉 已提交
44 45 46
    
        You can change `spk_id` here.
        ```bash
小湉湉's avatar
小湉湉 已提交
47
        paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "hello, boys" --lang en --spk_id 0
48 49 50 51 52 53 54 55 56 57 58 59 60
        ```
    - Chinese English Mixed, multi-speaker
        You can change `spk_id` here.
        ```bash
        # The `am` must be `fastspeech2_mix`!
        # The `lang` must be `mix`!
        # The voc must be chinese datasets' voc now!
        # spk 174 is csmcc, spk 175 is ljspeech
        paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --spk_id 174 --output mix_spk174.wav
        paddlespeech tts --am fastspeech2_mix --voc hifigan_aishell3 --lang mix --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --spk_id 174 --output mix_spk174_aishell3.wav
        paddlespeech tts --am fastspeech2_mix --voc pwgan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175_pwgan.wav
        paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175.wav
        ```
小湉湉's avatar
小湉湉 已提交
61 62 63 64 65 66 67 68 69 70 71 72 73 74
     - Use ONNXRuntime infer:
        ```bash
        paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output default.wav --use_onnx True
        paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output ss.wav --use_onnx True
        paddlespeech tts --voc mb_melgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output mb.wav --use_onnx True
        paddlespeech tts --voc pwgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output pwgan.wav --use_onnx True
        paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_pwgan.wav --use_onnx True
        paddlespeech tts --am fastspeech2_aishell3 --voc hifigan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_hifigan.wav --use_onnx True
        paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_pwgan.wav --use_onnx True
        paddlespeech tts --am fastspeech2_ljspeech --voc hifigan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_hifigan.wav --use_onnx True
        paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_pwgan.wav --use_onnx True
        paddlespeech tts --am fastspeech2_vctk --voc hifigan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_hifigan.wav --use_onnx True
         ```

小湉湉's avatar
小湉湉 已提交
75
  Usage:
小湉湉's avatar
小湉湉 已提交
76 77
  
  ```bash
K
KP 已提交
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
  paddlespeech tts --help
  ```
  Arguments:
  - `input`(required): Input text to generate..
  - `am`: Acoustic model type of tts task. Default: `fastspeech2_csmsc`.
  - `am_config`: Config of acoustic model. Use deault config when it is None. Default: `None`.
  - `am_ckpt`: Acoustic model checkpoint. Use pretrained model when it is None. Default: `None`.
  - `am_stat`: Mean and standard deviation used to normalize spectrogram when training acoustic model. Default: `None`.
  - `phones_dict`: Phone vocabulary file. Default: `None`.
  - `tones_dict`: Tone vocabulary file. Default: `None`.
  - `speaker_dict`: speaker id map file. Default: `None`.
  - `spk_id`: Speaker id for multi speaker acoustic model. Default: `0`.
  - `voc`: Vocoder type of tts task. Default: `pwgan_csmsc`.
  - `voc_config`: Config of vocoder. Use deault config when it is None. Default: `None`.
  - `voc_ckpt`: Vocoder checkpoint. Use pretrained model when it is None. Default: `None`.
  - `voc_stat`: Mean and standard deviation used to normalize spectrogram when training vocoder. Default: `None`.
  - `lang`: Language of tts task. Default: `zh`.
  - `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
  - `output`: Output wave filepath. Default: `output.wav`.
小湉湉's avatar
小湉湉 已提交
97 98
  - `use_onnx`: whether to usen ONNXRuntime inference.
  - `fs`: sample rate for ONNX models when use specified model files.
K
KP 已提交
99 100 101 102 103 104 105

  Output:
  ```bash
  [2021-12-09 20:49:58,955] [    INFO] [log.py] [L57] - Wave file has been generated: output.wav
  ```

- Python API
小湉湉's avatar
小湉湉 已提交
106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
    - Dygraph infer:
        ```python
        import paddle
        from paddlespeech.cli.tts import TTSExecutor
        tts_executor = TTSExecutor()
        wav_file = tts_executor(
            text='今天的天气不错啊',
            output='output.wav',
            am='fastspeech2_csmsc',
            am_config=None,
            am_ckpt=None,
            am_stat=None,
            spk_id=0,
            phones_dict=None,
            tones_dict=None,
            speaker_dict=None,
            voc='pwgan_csmsc',
            voc_config=None,
            voc_ckpt=None,
            voc_stat=None,
            lang='zh',
            device=paddle.get_device())
        print('Wave file has been generated: {}'.format(wav_file))
        ```
    - ONNXRuntime infer:
        ```python
        from paddlespeech.cli.tts import TTSExecutor
        tts_executor = TTSExecutor()
        wav_file = tts_executor(
            text='对数据集进行预处理',
            output='output.wav',
            am='fastspeech2_csmsc',
            voc='hifigan_csmsc',
            lang='zh',
            use_onnx=True,
            cpu_threads=2)
        ```
 
K
KP 已提交
144 145 146 147 148
  Output:
  ```bash
  Wave file has been generated: output.wav
  ```

小湉湉's avatar
小湉湉 已提交
149
### 4. Pretrained Models
小湉湉's avatar
小湉湉 已提交
150
Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:
K
KP 已提交
151 152

- Acoustic model
153
  | Model | Language |
K
KP 已提交
154
  | :--- | :---: |
155 156 157 158 159 160 161 162 163
  |      speedyspeech_csmsc      |    zh    |
  |      fastspeech2_csmsc       |    zh    |
  |     fastspeech2_ljspeech     |    en    |
  |     fastspeech2_aishell3     |    zh    |
  |       fastspeech2_vctk       |    en    |
  | fastspeech2_cnndecoder_csmsc |    zh    |
  |       fastspeech2_mix        |   mix    |
  |       tacotron2_csmsc        |    zh    |
  |      tacotron2_ljspeech      |    en    |
K
KP 已提交
164 165

- Vocoder
166
  | Model | Language |
K
KP 已提交
167
  | :--- | :---: |
168 169 170 171 172 173 174 175 176 177 178
  |         pwgan_csmsc          |    zh    |
  |        pwgan_ljspeech        |    en    |
  |        pwgan_aishell3        |    zh    |
  |          pwgan_vctk          |    en    |
  |       mb_melgan_csmsc        |    zh    |
  |      style_melgan_csmsc      |    zh    |
  |        hifigan_csmsc         |    zh    |
  |       hifigan_ljspeech       |    en    |
  |       hifigan_aishell3       |    zh    |
  |         hifigan_vctk         |    en    |
  |        wavernn_csmsc         |    zh    |