<p align="center">
  <img src="./docs/images/PaddleSpeech_logo.png" />
</p>
<div align="center">  

  <h3>
  <a href="#quick-start"> Quick Start </a>
  | <a href="#tutorials"> Tutorials </a>
  | <a href="#model-list"> Model List </a>
</div>

------------------------------------------------------------------------------------

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.7+-orange.svg)
![support os](https://img.shields.io/badge/os-linux-yellow.svg)

<!---
from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readmes-readable.md
1.What is this repo or project? (You can reuse the repo description you used earlier because this section doesn’t have to be long.)
2.How does it work?
3.Who will use this repo or project?
4.What is the goal of this project?
-->

**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical speech tasks, with state-of-the-art and influential models.

##### Speech-to-Text

<div align = "center">
<table style="width:100%">
  <thead>
    <tr>
      <th> Input Audio  </th>
      <th width="550"> Recognition Result  </th>
    </tr>
  </thead>
  <tbody>
   <tr>
      <td align = "center">
      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
      </td>
      <td >I knocked at the door on the ancient side of the building.</td>
    </tr>
    <tr>
      <td align = "center">
      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
      </td>
      <td>我认为跑步最重要的就是给我带来了身体健康。</td>
    </tr>
  </tbody>
</table>

</div>

##### Text-to-Speech
<div align = "center">
<table style="width:100%">
  <thead>
    <tr>
      <th><img width="200" height="1"> Input Text <img width="200" height="1"> </th>
      <th>Synthetic Audio</th>
    </tr>
  </thead>
  <tbody>
   <tr>
      <td >Life was like a box of chocolates, you never know what you're gonna get.</td>
      <td align = "center">
      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/transformer_tts_ljspeech_ckpt_0.4_waveflow_ljspeech_ckpt_0.3/001.wav" rel="nofollow">
            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
      </td>
    </tr>
    <tr>
      <td >早上好,今天是2020/10/29,最低温度是-3°C。</td>
      <td align = "center">
      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
      </td>
    </tr>
  </tbody>
</table>

</div>

For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html).

With its easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference & testing modules, and the deployment process. More specifically, this toolkit features:
- **Fast and Light-weight**: we provide high-speed and ultra-lightweight models that are convenient for industrial deployment.
- **Rule-based Chinese frontend**: our frontend covers Text Normalization and Grapheme-to-Phoneme conversion (G2P, including polyphone and tone sandhi), and applies self-defined linguistic rules to fit the Chinese context.
- **Varieties of Functions that Vitalize both Industry and Academia**:
  - *Implementation of critical audio tasks*: this toolkit provides audio functions such as Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, Voice Cloning, etc.
  - *Integration of mainstream models and datasets*: the toolkit implements modules covering the whole pipeline of speech tasks, and uses mainstream datasets such as LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See the [model list](#model-list) for more details.
  - *Cascaded models application*: as an extension of the traditional audio tasks, we combine the above workflows with other fields such as Natural Language Processing (NLP), e.g. Punctuation Restoration.
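
To give a flavor of what a rule-based text-normalization step does, here is a toy sketch (an illustration only — this is not PaddleSpeech's actual frontend code) of a single digit-verbalization rule for Chinese:

```python
import re

# Toy rule: read each digit out individually in Chinese, as is common for
# dates and ID numbers. NOT PaddleSpeech's actual frontend implementation.
DIGITS = "零一二三四五六七八九"

def verbalize_digits(text: str) -> str:
    # Replace every ASCII digit with its Chinese reading.
    return re.sub(r"\d", lambda m: DIGITS[int(m.group(0))], text)

print(verbalize_digits("今天是2021年"))  # -> 今天是二零二一年
```

A real frontend chains many such rules (dates, temperatures, currency, …) before handing the normalized text to G2P.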

## Installation

The base environment assumed on this page is:
- Ubuntu 16.04
- python>=3.7
- paddlepaddle>=2.2.0

If you want to set up PaddleSpeech in another environment, please see the [installation](./docs/source/install.md) document for all the alternatives.

## Quick Start

Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio or text file.

**Audio Classification**     
```shell
paddlespeech cls --input ./test_audio.wav
```
**Automatic Speech Recognition**
```shell
paddlespeech asr --lang zh --sr 16000 --input ./input.wav
```
**Speech Translation** (English to Chinese)
```shell
paddlespeech st --input ./test_audio.wav
```
**Text-to-Speech** 
```shell
paddlespeech tts --lang zh --input ./test_text.txt 
```
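
The one-off commands above can also be scripted. Below is a minimal sketch — `build_cmd` is a hypothetical helper, not part of the PaddleSpeech API — that assembles the same CLI invocations for a batch of files:

```python
from typing import List, Optional

def build_cmd(task: str, path: str, lang: Optional[str] = None) -> List[str]:
    # Assemble a `paddlespeech` CLI invocation mirroring the examples above.
    cmd = ["paddlespeech", task]
    if lang is not None:
        cmd += ["--lang", lang]
    cmd += ["--input", path]
    return cmd

# Build a command per recording; hand each list to subprocess.run(...)
# to actually execute it (requires paddlespeech to be installed).
for wav in ["./a.wav", "./b.wav"]:
    print(build_cmd("asr", wav, lang="zh"))
```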

If you want to try more functions like training and tuning, please see [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).

## Model List

PaddleSpeech supports a series of the most popular models, summarized in [released models](./docs/source/released_model.md) together with the available pretrained models.

The Speech-to-Text module contains an *Acoustic Model* and a *Language Model*, with the following details:

<!---
The current hyperlinks redirect to [Previous Parakeet](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples).
-->

<table style="width:100%">
  <thead>
    <tr>
      <th>Speech-to-Text Module Type</th>
      <th>Dataset</th>
      <th>Model Type</th>
      <th>Link</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="3">Acoustic Model</td>
      <td rowspan="2" >Aishell</td>
      <td >DeepSpeech2 RNN + Conv based Models</td>
      <td>
      <a href = "./examples/aishell/asr0">deepspeech2-aishell</a>
      </td>
    </tr>
    <tr>
      <td>Transformer based Attention Models </td>
      <td>
      <a href = "./examples/aishell/asr1">u2.transformer.conformer-aishell</a>
      </td>
    </tr>
      <tr>
      <td> Librispeech</td>
      <td>Transformer based Attention Models </td>
      <td>
      <a href = "./examples/librispeech/asr0">deepspeech2-librispeech</a> / <a href = "./examples/librispeech/asr1">transformer.conformer.u2-librispeech</a>  / <a href = "./examples/librispeech/asr2">transformer.conformer.u2-kaldi-librispeech</a>
      </td>
    </tr>
  <tr>
  <td>Alignment</td>
  <td>THCHS30</td>
  <td>MFA</td>
  <td>
  <a href = "./examples/thchs30/align0">mfa-thchs30</a>
  </td>
  </tr>
   <tr>
      <td rowspan="2">Language Model</td>
      <td colspan = "2">Ngram Language Model</td>
      <td>
      <a href = "./examples/other/ngram_lm">kenlm</a>
      </td>
    </tr>
    <tr>
      <td>TIMIT</td>
      <td>Unified Streaming & Non-streaming Two-pass</td>
      <td>
    <a href = "./examples/timit/asr1"> u2-timit</a>
      </td>
    </tr>
  </tbody>
</table>
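
As a toy illustration of what the n-gram language model in the table does (the actual recipe uses KenLM, not this code), a bigram model simply counts adjacent-token pairs and turns them into conditional probabilities:

```python
from collections import defaultdict

# Train a toy bigram LM: count how often token b follows token a.
def train_bigram(corpus):
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    return counts

# P(b | a) = count(a, b) / count(a, *); no smoothing in this sketch.
def prob(counts, a, b):
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0

counts = train_bigram(["the cat sat", "the cat ran"])
print(prob(counts, "the", "cat"))  # -> 1.0
print(prob(counts, "cat", "sat"))  # -> 0.5
```

Production LMs add smoothing and back-off, which is exactly what KenLM provides efficiently at scale.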

PaddleSpeech Text-to-Speech mainly consists of three modules: *Text Frontend*, *Acoustic Model* and *Vocoder*. Acoustic Model and Vocoder models are listed as follows:

<table>
  <thead>
    <tr>
      <th> Text-to-Speech Module Type <img width="110" height="1"> </th>
      <th>  Model Type  </th>
      <th> <img width="50" height="1"> Dataset  <img width="50" height="1"> </th>
      <th> <img width="101" height="1"> Link <img width="105" height="1"> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
    <td> Text Frontend</td>
    <td colspan="2"> &emsp; </td>
    <td>
    <a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
    </td>
    </tr>
    <tr>
      <td rowspan="4">Acoustic Model</td>
      <td >Tacotron2</td>
      <td rowspan="2" >LJSpeech</td>
      <td>
      <a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a>
      </td>
    </tr>
    <tr>
      <td>Transformer TTS</td>
      <td>
      <a href = "./examples/ljspeech/tts1">transformer-ljspeech</a>
      </td>
    </tr>
    <tr>
      <td>SpeedySpeech</td>
      <td>CSMSC</td>
      <td >
      <a href = "./examples/csmsc/tts2">speedyspeech-csmsc</a>
      </td>
    </tr>
    <tr>
      <td>FastSpeech2</td>
      <td>AISHELL-3 / VCTK / LJSpeech / CSMSC</td>
      <td>
      <a href = "./examples/aishell3/tts3">fastspeech2-aishell3</a> / <a href = "./examples/vctk/tts3">fastspeech2-vctk</a> / <a href = "./examples/ljspeech/tts3">fastspeech2-ljspeech</a> / <a href = "./examples/csmsc/tts3">fastspeech2-csmsc</a>
      </td>
    </tr>
   <tr>
      <td rowspan="3">Vocoder</td>
      <td >WaveFlow</td>
      <td >LJSpeech</td>
      <td>
      <a href = "./examples/ljspeech/voc0">waveflow-ljspeech</a>
      </td>
    </tr>
    <tr>
      <td >Parallel WaveGAN</td>
      <td >LJSpeech / VCTK / CSMSC</td>
      <td>
      <a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a>
      </td>
    </tr>
    <tr>
      <td >Multi Band MelGAN</td>
      <td >CSMSC</td>
      <td>
      <a href = "./examples/csmsc/voc3">Multi Band MelGAN-csmsc</a> 
      </td>
    </tr>                                                                                                                                           
    <tr>
      <td rowspan="3">Voice Cloning</td>
      <td>GE2E</td>
      <td >AISHELL-3, etc.</td>
      <td>
      <a href = "./examples/other/ge2e">ge2e</a>
      </td>
    </tr>
    <tr>
      <td>GE2E + Tacotron2</td>
      <td>AISHELL-3</td>
      <td>
      <a href = "./examples/aishell3/vc0">ge2e-tacotron2-aishell3</a>
      </td>
    </tr>
    <tr>
      <td>GE2E + FastSpeech2</td>
      <td>AISHELL-3</td>
      <td>
      <a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a>
      </td>
    </tr>
  </tbody>
</table>
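
The three-stage pipeline described above (text frontend → acoustic model → vocoder) can be sketched schematically. The functions below are stand-in toys, not PaddleSpeech APIs — they only show how the stages compose:

```python
# Toy stand-ins for the three TTS stages; the real models map
# text -> phonemes, phonemes -> mel spectrogram, mel spectrogram -> waveform.
def text_frontend(text):
    return list(text)                        # "phonemes": one symbol per char

def acoustic_model(phonemes):
    return [[ord(p) % 8] for p in phonemes]  # one tiny "mel frame" per phoneme

def vocoder(mel_frames):
    return [frame[0] / 8.0 for frame in mel_frames]  # "waveform" samples

wav = vocoder(acoustic_model(text_frontend("hi")))
print(len(wav))  # -> 2
```

In the real system, each stage is a trained network from the table above (e.g. FastSpeech2 as the acoustic model, Parallel WaveGAN as the vocoder), but the data flow is exactly this composition.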

## Tutorials

Normally, [Speech SoTA](https://paperswithcode.com/area/speech) and [Audio SoTA](https://paperswithcode.com/area/audio) give you an overview of the hot academic topics in this area. To focus on the tasks in PaddleSpeech, you will find the following guidelines helpful for grasping the core ideas.

- [Overview](./docs/source/introduction.md)
- Quick Start
  - [Dependencies](./docs/source/dependencies.md) and [Installation](./docs/source/install.md)
  - [Quick Start of Speech-to-Text](./docs/source/asr/quick_start.md)
  - [Quick Start of Text-to-Speech](./docs/source/tts/quick_start.md)
- Speech-to-Text
  - [Models Introduction](./docs/source/asr/models_introduction.md)
  - [Data Preparation](./docs/source/asr/data_preparation.md)
  - [Data Augmentation Pipeline](./docs/source/asr/augmentation.md)
  - [Features](./docs/source/asr/feature_list.md)
  - [Ngram LM](./docs/source/asr/ngram_lm.md)
- Text-to-Speech
  - [Introduction](./docs/source/tts/models_introduction.md)
  - [Advanced Usage](./docs/source/tts/advanced_usage.md)
  - [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md)
  - [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html) and [PaddleSpeech VS. Espnet](https://paddlespeech.readthedocs.io/en/latest/tts/demo_2.html)
- [Released Models](./docs/source/released_model.md)

The TTS module was originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet) and has now been merged with DeepSpeech. If you are interested in academic research around this function, please see the [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://paddlespeech.readthedocs.io/en/latest/tts/models_introduction.html) is a good guide to the pipeline components.

## FAQ and Contributing

You are warmly welcome to submit questions in [discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) and bug reports in [issues](https://github.com/PaddlePaddle/PaddleSpeech/issues)! We also highly appreciate your contributions to this project!

## Citation

To cite PaddleSpeech for research, please use the following format.
```tex
@misc{ppspeech2021,
  title={PaddleSpeech, a toolkit for audio processing based on PaddlePaddle},
  author={PaddlePaddle Authors},
  howpublished={\url{https://github.com/PaddlePaddle/PaddleSpeech}},
  year={2021}
}
```

## License and Acknowledgement

PaddleSpeech is provided under the [Apache-2.0 License](./LICENSE).

PaddleSpeech relies on many open-source repositories. See [references](./docs/source/reference.md) for more information.