diff --git a/README.md b/README.md
index 7060a6556d3810ae06b2fe512f2c303521f0f24b..8a83ac6199a4cf1f1427abcb09244f8b7b2a3676 100644
--- a/README.md
+++ b/README.md
@@ -9,20 +9,20 @@ English | [简体中文](README_ch.md)

-

+

 Quick Start
 | Tutorials
-| Models List 
+| Models List

- 
+
 ------------------------------------------------------------------------------------
 ![License](https://img.shields.io/badge/license-Apache%202-red.svg)
 ![python version](https://img.shields.io/badge/python-3.7+-orange.svg)
 ![support os](https://img.shields.io/badge/os-linux-yellow.svg)
@@ -31,7 +31,7 @@ how they can use it
 Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
 - **Fast and Light-weight**: we provide high-speed and ultra-lightweight models that are convenient for industrial deployment.
 - **Rule-based Chinese frontend**: our frontend contains Text Normalization (TN) and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
-- **Varieties of Functions that Vitalize both Industrial and Academia**: 
+- **Varieties of Functions that Vitalize both Industrial and Academia**:
   - *Implementation of critical audio tasks*: this toolkit contains audio functions like Speech Translation (ST), Automatic Speech Recognition (ASR), Text-To-Speech Synthesis (TTS), Voice Cloning(VC), Punctuation Restoration, etc.
   - *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model lists](#models-list) for more details.
   - *Cross-domain application*: as an extension of the application of traditional audio tasks, we combine the aforementioned tasks with other fields like NLP.
@@ -70,7 +70,7 @@ If you want to set up PaddleSpeech in other environment, please see the [ASR ins
 ## Quick Start
 > Note: the current links to `English ASR` and `English TTS` are not valid.
-Just a quick test of our functions: [English ASR](link/hubdetail?name=deepspeech2_aishell&en_category=AutomaticSpeechRecognition) and [English TTS](link/hubdetail?name=fastspeech2_baker&en_category=TextToSpeech) by typing message or upload your own audio file. 
+Just a quick test of our functions: [English ASR](link/hubdetail?name=deepspeech2_aishell&en_category=AutomaticSpeechRecognition) and [English TTS](link/hubdetail?name=fastspeech2_baker&en_category=TextToSpeech) by typing message or upload your own audio file.
 
 Developers can have a try of our model with only a few lines of code.
 
@@ -87,7 +87,7 @@ bash local/test.sh conf/deepspeech2.yaml ckptfile offline
 ```
 
 For *TTS*, try FastSpeech2 on LJSpeech:
-- Download LJSpeech-1.1 from the [ljspeech official website](https://keithito.com/LJ-Speech-Dataset/), our prepared durations for fastspeech2 [ljspeech_alignment](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz). 
+- Download LJSpeech-1.1 from the [ljspeech official website](https://keithito.com/LJ-Speech-Dataset/), our prepared durations for fastspeech2 [ljspeech_alignment](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz).
 - The pretrained models are seperated into two parts: [fastspeech2_nosil_ljspeech_ckpt](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_ljspeech_ckpt_0.5.zip) and [pwg_ljspeech_ckpt](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_ljspeech_ckpt_0.5.zip). Please download then unzip to `./model/fastspeech2` and `./model/pwg` respectively.
 - Assume your path to the dataset is `~/datasets/LJSpeech-1.1` and `./ljspeech_alignment` accordingly, preprocess your data and then use our pretrained model to synthesize:
 ```shell
@@ -106,7 +106,7 @@ PaddleSpeech supports a series of most popular models, summarized in [released m
 
 ASR module contains *Acoustic Model* and *Language Model*, with the following details:
 > Note: The `Link` should be code path rather than download links.
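The model.py hunk that follows makes `setup_model` pick feature and vocabulary sizes from the train loader only when training, falling back to the test loader otherwise (a test-only run has no train loader to read from). A minimal sketch of that selection logic, assuming stand-in loader objects; the real trainer reads these attributes from each loader's `collate_fn`:

```python
from types import SimpleNamespace

def setup_model_sizes(config, train, train_loader=None, test_loader=None):
    """Read sizes from the train loader when training, else from the test loader."""
    loader = train_loader if train else test_loader
    config.feat_size = loader.collate_fn.feature_size
    config.dict_size = loader.collate_fn.vocab_size
    return config

# Hypothetical stand-ins mimicking the collate_fn attributes the trainer uses.
train_loader = SimpleNamespace(collate_fn=SimpleNamespace(feature_size=161, vocab_size=4300))
test_loader = SimpleNamespace(collate_fn=SimpleNamespace(feature_size=161, vocab_size=4300))

cfg = setup_model_sizes(SimpleNamespace(), train=True, train_loader=train_loader)
print(cfg.feat_size, cfg.dict_size)  # 161 4300
```

Before this change, a test-only run would dereference `self.train_loader` and fail; branching on the trainer's mode keeps one `setup_model` path for both cases.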
diff --git a/deepspeech/exps/deepspeech2/model.py b/deepspeech/exps/deepspeech2/model.py
index 6424cfdf389ea2d62f80a6577e4f4270ccebca46..710630a7864a6296a0f0ed4f19ede9f17df136c9 100644
--- a/deepspeech/exps/deepspeech2/model.py
+++ b/deepspeech/exps/deepspeech2/model.py
@@ -153,8 +153,12 @@ class DeepSpeech2Trainer(Trainer):
     def setup_model(self):
         config = self.config.clone()
         with UpdateConfig(config):
-            config.model.feat_size = self.train_loader.collate_fn.feature_size
-            config.model.dict_size = self.train_loader.collate_fn.vocab_size
+            if self.train:
+                config.model.feat_size = self.train_loader.collate_fn.feature_size
+                config.model.dict_size = self.train_loader.collate_fn.vocab_size
+            else:
+                config.model.feat_size = self.test_loader.collate_fn.feature_size
+                config.model.dict_size = self.test_loader.collate_fn.vocab_size
 
         if self.args.model_type == 'offline':
             model = DeepSpeech2Model.from_config(config.model)
@@ -189,7 +193,6 @@ class DeepSpeech2Trainer(Trainer):
         self.lr_scheduler = lr_scheduler
         logger.info("Setup optimizer/lr_scheduler!")
 
-
     def setup_dataloader(self):
         config = self.config.clone()
         config.defrost()
diff --git a/examples/aishell/s0/local/test.sh b/examples/aishell/s0/local/test.sh
index 2ae0740b3e8d44ab03e45f4c1b5dbb945657705e..64d7250304137e7d658d3bb48d916a346229d876 100755
--- a/examples/aishell/s0/local/test.sh
+++ b/examples/aishell/s0/local/test.sh
@@ -13,7 +13,7 @@ ckpt_prefix=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_ch.sh
+bash local/download_lm_ch.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/aishell/s0/local/test_export.sh b/examples/aishell/s0/local/test_export.sh
index a9a6b122df8055f872f9f0a68717b57241d99359..71469753db5b2615585851f7f3c37a4119ff5056 100755
--- a/examples/aishell/s0/local/test_export.sh
+++ b/examples/aishell/s0/local/test_export.sh
@@ -13,7 +13,7 @@ jit_model_export_path=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_ch.sh
+bash local/download_lm_ch.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/librispeech/s0/local/test.sh b/examples/librispeech/s0/local/test.sh
index 4d00f30b852da5a370f5d4934f3caadd2b833c00..25dd04374acb02256fc6efc5a6b4d572569efb3a 100755
--- a/examples/librispeech/s0/local/test.sh
+++ b/examples/librispeech/s0/local/test.sh
@@ -13,7 +13,7 @@ ckpt_prefix=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_en.sh
+bash local/download_lm_en.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/other/1xt2x/aishell/local/test.sh b/examples/other/1xt2x/aishell/local/test.sh
index 2ae0740b3e8d44ab03e45f4c1b5dbb945657705e..d539ac4943039fe6c33eb1373985aa98617a587f 100755
--- a/examples/other/1xt2x/aishell/local/test.sh
+++ b/examples/other/1xt2x/aishell/local/test.sh
@@ -13,7 +13,7 @@ ckpt_prefix=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_ch.sh
+bash local/download_lm_ch.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/other/1xt2x/baidu_en8k/local/test.sh b/examples/other/1xt2x/baidu_en8k/local/test.sh
index 4d00f30b852da5a370f5d4934f3caadd2b833c00..25dd04374acb02256fc6efc5a6b4d572569efb3a 100755
--- a/examples/other/1xt2x/baidu_en8k/local/test.sh
+++ b/examples/other/1xt2x/baidu_en8k/local/test.sh
@@ -13,7 +13,7 @@ ckpt_prefix=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_en.sh
+bash local/download_lm_en.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/other/1xt2x/librispeech/local/test.sh b/examples/other/1xt2x/librispeech/local/test.sh
index 4d00f30b852da5a370f5d4934f3caadd2b833c00..25dd04374acb02256fc6efc5a6b4d572569efb3a 100755
--- a/examples/other/1xt2x/librispeech/local/test.sh
+++ b/examples/other/1xt2x/librispeech/local/test.sh
@@ -13,7 +13,7 @@ ckpt_prefix=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_en.sh
+bash local/download_lm_en.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/tiny/s0/conf/deepspeech2.yaml b/examples/tiny/s0/conf/deepspeech2.yaml
index 621b372cbb932a732c63b109ec4ed57c47791b8d..58899a1568e3fd61ba23aaf1cb83347428a7f40d 100644
--- a/examples/tiny/s0/conf/deepspeech2.yaml
+++ b/examples/tiny/s0/conf/deepspeech2.yaml
@@ -45,7 +45,7 @@ model:
   ctc_grad_norm_type: null
 
 training:
-  n_epoch: 10
+  n_epoch: 5
   accum_grad: 1
   lr: 1e-5
   lr_decay: 0.8
diff --git a/examples/tiny/s0/conf/deepspeech2_online.yaml b/examples/tiny/s0/conf/deepspeech2_online.yaml
index 5a8294adb780b32503bb46cfbc80c43b1700b1eb..334b1d31ce21ab95c3099c76caf9cdd36c61cd92 100644
--- a/examples/tiny/s0/conf/deepspeech2_online.yaml
+++ b/examples/tiny/s0/conf/deepspeech2_online.yaml
@@ -47,7 +47,7 @@ model:
   ctc_grad_norm_type: null
 
 training:
-  n_epoch: 10
+  n_epoch: 5
   accum_grad: 1
   lr: 1e-5
   lr_decay: 1.0
diff --git a/examples/tiny/s0/local/test.sh b/examples/tiny/s0/local/test.sh
index 4d00f30b852da5a370f5d4934f3caadd2b833c00..25dd04374acb02256fc6efc5a6b4d572569efb3a 100755
--- a/examples/tiny/s0/local/test.sh
+++ b/examples/tiny/s0/local/test.sh
@@ -13,7 +13,7 @@ ckpt_prefix=$2
 model_type=$3
 
 # download language model
-bash local/download_lm_en.sh
+bash local/download_lm_en.sh > /dev/null 2>&1
 if [ $? -ne 0 ]; then
     exit 1
 fi
diff --git a/examples/tiny/s1/conf/chunk_confermer.yaml b/examples/tiny/s1/conf/chunk_confermer.yaml
index b14b4b21218012f56a9df73b7dc31da8c271ee6e..c518666977faef8c0862be3e7c7f4d5b5244a5fc 100644
--- a/examples/tiny/s1/conf/chunk_confermer.yaml
+++ b/examples/tiny/s1/conf/chunk_confermer.yaml
@@ -83,7 +83,7 @@ model:
 
 training:
-  n_epoch: 20
+  n_epoch: 5
   accum_grad: 1
   global_grad_clip: 5.0
   optim: adam
diff --git a/examples/tiny/s1/conf/chunk_transformer.yaml b/examples/tiny/s1/conf/chunk_transformer.yaml
index 38edbf35816a6bd73af9eeea8051c4b580ebb5b1..29c30b262048b46bf08d132aebbb24bd7186bf71 100644
--- a/examples/tiny/s1/conf/chunk_transformer.yaml
+++ b/examples/tiny/s1/conf/chunk_transformer.yaml
@@ -76,7 +76,7 @@ model:
 
 training:
-  n_epoch: 20
+  n_epoch: 5
   accum_grad: 1
   global_grad_clip: 5.0
   optim: adam
diff --git a/examples/tiny/s1/conf/conformer.yaml b/examples/tiny/s1/conf/conformer.yaml
index 0b06b2b72feb890d886aade48d3449785fa4b375..8487da771930e6f615ac9fe0e718bab310f66970 100644
--- a/examples/tiny/s1/conf/conformer.yaml
+++ b/examples/tiny/s1/conf/conformer.yaml
@@ -79,7 +79,7 @@ model:
 
 training:
-  n_epoch: 20
+  n_epoch: 5
   accum_grad: 4
   global_grad_clip: 5.0
   optim: adam
diff --git a/examples/tiny/s1/conf/transformer.yaml b/examples/tiny/s1/conf/transformer.yaml
index 1c6f9e022a44e108c5f6d1d6d81cd743a8448863..cc9b5c5158adf2ca74ccf715e6edaf61cb320953 100644
--- a/examples/tiny/s1/conf/transformer.yaml
+++ b/examples/tiny/s1/conf/transformer.yaml
@@ -73,7 +73,7 @@ model:
 
 training:
-  n_epoch: 21
+  n_epoch: 5
   accum_grad: 1
   global_grad_clip: 5.0
   optim: adam
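The recurring `test.sh` edit above silences the language-model download while still honouring its exit status. A minimal sketch of that pattern, where `fetch_lm` is a hypothetical stand-in for `local/download_lm_ch.sh` / `download_lm_en.sh`:

```shell
#!/bin/bash
# Hypothetical stand-in for a noisy download step.
fetch_lm() {
    echo "downloading language model..."   # progress chatter we want to hide
    return 0
}

# Discard stdout and stderr, but keep the exit status for the check below.
fetch_lm > /dev/null 2>&1
if [ $? -ne 0 ]; then
    echo "language model download failed" >&2
    exit 1
fi
echo "ok"
```

Note the leading slash matters: `> dev/null` (without it) writes to a relative file named `dev/null` instead of discarding output, which is why the two hunks using that form need `/dev/null`.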