diff --git a/README.md b/README.md
index da413001a51a74be9de4c70f6c1b8f23732a3597..a1d6777e123b65c7242e13259de1440733bfa682 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 4.What is the goal of this project? -->
 
-**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech, with the state-of-art and influential models.
+**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
 
 ##### Speech-to-Text
 
@@ -86,77 +86,76 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html).
 
+##### Speech Translation
+
+| Input Audio | Translations Result |
+| :---: | :--- |
+| (audio sample) | 我 在 这栋 建筑 的 古老 门上 敲门。 |
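+
+The demo above can also be reproduced programmatically. Below is a minimal, hypothetical sketch in Python; it assumes the `STExecutor` class that backs the `st` command of the [PaddleSpeech Command Line](./paddlespeech/cli/README.md) (the import path, call signature, and file name are assumptions for illustration, not a documented API):
+
+```python
+from paddlespeech.cli import STExecutor  # assumed export path of the CLI's executor
+
+st = STExecutor()
+# Translate English speech into Chinese text; the file name is a placeholder
+# for any 16 kHz English recording.
+print(st(audio_file="input_16k.wav"))
+```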
+
 
 Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
-- **Fast and Light-weight**: we provide high-speed and ultra-lightweight models that are convenient for industrial deployment.
+- **Ease of Use**: low barriers to install, and a [CLI](#quick-start) is available to quick-start your journey.
+- **Align to the State-of-the-Art**: we provide high-speed and ultra-lightweight models as well as cutting-edge technology.
 - **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
 - **Varieties of Functions that Vitalize both Industrial and Academia**:
-  - *Implementation of critical audio tasks*: this toolkit contains audio functions like Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, Voice Cloning, etc.
+  - *Implementation of critical audio tasks*: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
   - *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
-  - *Cascaded models application*: as an extension of the application of traditional audio tasks, we combine the workflows of aforementioned tasks with other fields like Natural language processing (NLP), like Punctuation Restoration.
+  - *Cascaded models application*: as an extension of traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
 
 ## Installation
 
-The base environment in this page is
-- Ubuntu 16.04
-- python>=3.7
-- paddlepaddle>=2.2.0
-
-If you want to set up PaddleSpeech in other environment, please see the [installation](./docs/source/install.md) documents for all the alternatives.
+We strongly recommend that users install PaddleSpeech on *Linux* with *python>=3.7* and *paddlepaddle>=2.2.0*, where `paddlespeech` can be easily installed with `pip`:
+```shell
+pip install paddlespeech
+```
+If you want to set up PaddleSpeech in another environment, please see the [installation](./docs/source/install.md) documents for all the alternatives.
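+
+To verify the installation, a quick sanity check can be run. This is a minimal sketch: `paddle.__version__` is the standard PaddlePaddle version attribute, and both imports should succeed after `pip install paddlespeech`:
+```python
+import paddle
+import paddlespeech  # should import without errors once installed
+
+print(paddle.__version__)  # expect 2.2.0 or later
+```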
 
 ## Quick Start
 
-Developers can have a try of our model with only a few lines of code.
-
-A tiny DeepSpeech2 **Speech-to-Text** model training on toy set of LibriSpeech:
+Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio or text.
+
+**Audio Classification**
+```shell
+paddlespeech cls --input input.wav
+```
+**Automatic Speech Recognition**
 ```shell
-cd examples/tiny/asr0/
-# source the environment
-source path.sh
-source ../../../utils/parse_options.sh
-# prepare data
-bash ./local/data.sh
-# train model, all `ckpt` under `exp` dir, if you use paddlepaddle-gpu, you can set CUDA_VISIBLE_DEVICES before the train script
-./local/train.sh conf/deepspeech2.yaml deepspeech2 offline
-# avg n best model to get the test model, in this case, n = 1
-avg.sh best exp/deepspeech2/checkpoints 1
-# evaluate the test model
-./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1 offline
+paddlespeech asr --lang zh --input input_16k.wav
 ```
+**Speech Translation** (English to Chinese)
 
-For **Text-to-Speech**, try pretrained FastSpeech2 + Parallel WaveGAN on CSMSC:
+(not supported on Windows yet)
+```shell
+paddlespeech st --input input_16k.wav
+```
+**Text-to-Speech**
 ```shell
-cd examples/csmsc/tts3
-# download the pretrained models and unaip them
-wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip
-unzip pwg_baker_ckpt_0.4.zip
-wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip
-unzip fastspeech2_nosil_baker_ckpt_0.4.zip
-# source the environment
-source path.sh
-# run end-to-end synthesize
-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
-python3 ${BIN_DIR}/synthesize_e2e.py \
-    --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
-    --fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
-    --fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
-    --pwg-config=pwg_baker_ckpt_0.4/pwg_default.yaml \
-    --pwg-checkpoint=pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
-    --pwg-stat=pwg_baker_ckpt_0.4/pwg_stats.npy \
-    --text=${BIN_DIR}/../sentences.txt \
-    --output-dir=exp/default/test_e2e \
-    --inference-dir=exp/default/inference \
-    --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
+paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
 ```
+(The input text means "Hello, welcome to the Baidu PaddlePaddle deep learning framework!")
 
-If you want to try more functions like training and tuning, please see [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
+If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
 
 ## Model List
 
-PaddleSpeech supports a series of most popular models, summarized in [released models](./docs/source/released_model.md) with available pretrained models.
+PaddleSpeech supports a series of the most popular models. They are summarized in [released models](./docs/source/released_model.md), with available pretrained models attached.
 
-Speech-to-Text module contains *Acoustic Model* and *Language Model*, with the following details:
+**Speech-to-Text** contains *Acoustic Model* and *Language Model*, with the following details: