diff --git a/README.md b/README.md
index 521351c37c0666f9c59d4a44263dfac57c3f06cf..7a7cf48daa80ec6494a859e8b0d89d4eda0c247c 100644
--- a/README.md
+++ b/README.md
@@ -105,41 +105,52 @@ Let's take a tiny sampled subset of [LibriSpeech dataset](http://www.openslr.org
 ```
 Notice that this is only a toy example with a tiny sampled subset of LibriSpeech. If you would like to try with the complete dataset (would take several days for training), please go to `examples/librispeech` instead.
+- Source the environment
+
+  ```bash
+  source path.sh
+  ```
+  This sets `MAIN_ROOT` to the project directory.
+- Main entry point
+
+  ```bash
+  bash run.sh
+  ```
 - Prepare the data

   ```bash
-  sh run_data.sh
+  sh local/run_data.sh
   ```

-  `run_data.sh` will download dataset, generate manifests, collect normalizer's statistics and build vocabulary. Once the data preparation is done, you will find the data (only part of LibriSpeech) downloaded in `./dataset/librispeech` and the corresponding manifest files generated in `./data/tiny` as well as a mean stddev file and a vocabulary file. It has to be run for the very first time you run this dataset and is reusable for all further experiments.
+  `run_data.sh` will download the dataset, generate manifests, collect the normalizer's statistics and build the vocabulary. Once the data preparation is done, you will find the data (only part of LibriSpeech) downloaded in `${MAIN_ROOT}/dataset/librispeech` and the corresponding manifest files generated in `${PWD}/data`, as well as a mean stddev file and a vocabulary file. It has to be run the very first time you use this dataset and is reusable for all further experiments.
 - Train your own ASR model

   ```bash
-  sh run_train.sh
+  sh local/run_train.sh
   ```

-  `run_train.sh` will start a training job, with training logs printed to stdout and model checkpoint of every pass/epoch saved to `./checkpoints/tiny`. These checkpoints could be used for training resuming, inference, evaluation and deployment.
+  `run_train.sh` will start a training job, with training logs printed to stdout and a model checkpoint for every pass/epoch saved to `${PWD}/checkpoints`. These checkpoints can be used for resuming training, inference, evaluation and deployment.
 - Case inference with an existing model

   ```bash
-  sh run_infer.sh
+  sh local/run_infer.sh
   ```

   `run_infer.sh` will show us some speech-to-text decoding results for several (default: 10) samples with the trained model. The performance might not be good now as the current model is only trained with a toy subset of LibriSpeech. To see the results with a better model, you can download a well-trained (trained for several days, with the complete LibriSpeech) model and do the inference:

   ```bash
-  sh run_infer_golden.sh
+  sh local/run_infer_golden.sh
   ```
 - Evaluate an existing model

   ```bash
-  sh run_test.sh
+  sh local/run_test.sh
   ```

   `run_test.sh` will evaluate the model with Word Error Rate (or Character Error Rate) measurement. Similarly, you can also download a well-trained model and test its performance:

   ```bash
-  sh run_test_golden.sh
+  sh local/run_test_golden.sh
   ```

 More detailed information are provided in the following sections. Wish you a happy journey with the *DeepSpeech2 on PaddlePaddle* ASR engine!
@@ -158,7 +169,7 @@ More detailed information are provided in the following sections. Wish you a hap

 To use your custom data, you only need to generate such manifest files to summarize the dataset. Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their meta data including the transcription labels.

-For how to generate such manifest files, please refer to `data/librispeech/librispeech.py`, which will download data and generate manifest files for LibriSpeech dataset.
+For how to generate such manifest files, please refer to `PATH/TO/LIBRISPEECH/local/librispeech.py`, which will download data and generate manifest files for the LibriSpeech dataset.

 ### Compute Mean & Stddev for Normalizer

@@ -168,11 +179,11 @@ To perform z-score normalization (zero-mean, unit stddev) upon audio features, w
 python3 tools/compute_mean_std.py \
 --num_samples 2000 \
 --specgram_type linear \
---manifest_path data/librispeech/manifest.train \
---output_path data/librispeech/mean_std.npz
+--manifest_path PATH/TO/LIBRISPEECH/data/manifest.train \
+--output_path PATH/TO/LIBRISPEECH/data/mean_std.npz
 ```

-It will compute the mean and standard deviatio of power spectrum feature with 2000 random sampled audio clips listed in `data/librispeech/manifest.train` and save the results to `data/librispeech/mean_std.npz` for further usage.
+It will compute the mean and standard deviation of the power spectrum feature with 2000 randomly sampled audio clips listed in `PATH/TO/LIBRISPEECH/data/manifest.train` and save the results to `PATH/TO/LIBRISPEECH/data/mean_std.npz` for further usage.

 ### Build Vocabulary

@@ -182,11 +193,11 @@ A vocabulary of possible characters is required to convert the transcription int
 ```bash
 python3 tools/build_vocab.py \
 --count_threshold 0 \
---vocab_path data/librispeech/eng_vocab.txt \
---manifest_paths data/librispeech/manifest.train
+--vocab_path PATH/TO/LIBRISPEECH/data/eng_vocab.txt \
+--manifest_paths PATH/TO/LIBRISPEECH/data/manifest.train
 ```

-It will write a vocabuary file `data/librispeeech/eng_vocab.txt` with all transcription text in `data/librispeech/manifest.train`, without vocabulary truncation (`--count_threshold 0`).
+It will write a vocabulary file `PATH/TO/LIBRISPEECH/data/eng_vocab.txt` with all transcription text in `PATH/TO/LIBRISPEECH/data/manifest.train`, without vocabulary truncation (`--count_threshold 0`).
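To make the count-threshold semantics concrete, here is a minimal sketch of character-vocabulary building. The `build_vocab` helper below is hypothetical and illustrative only; the project's real implementation is `tools/build_vocab.py`, which additionally reads transcripts from manifest files.

```python
# Illustrative sketch only; not the project's tools/build_vocab.py.
from collections import Counter

def build_vocab(transcripts, count_threshold=0):
    """Collect characters from transcripts, keeping those whose count
    exceeds count_threshold; threshold 0 keeps every character seen."""
    counter = Counter()
    for text in transcripts:
        counter.update(text)
    return sorted(ch for ch, n in counter.items() if n > count_threshold)

# With threshold 0, no truncation: every observed character survives.
print(build_vocab(["hello world", "we will we will"]))
```

A positive `count_threshold` drops rare characters, trading coverage for a smaller output layer.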

 ### More Help

@@ -226,7 +237,7 @@ For more help on arguments:
 ```bash
 python3 train.py --help
 ```
-or refer to `example/librispeech/run_train.sh`.
+or refer to `example/librispeech/local/run_train.sh`.

 ## Data Augmentation Pipeline

@@ -322,7 +333,7 @@ For more help on arguments:
 ```
 python3 infer.py --help
 ```
-or refer to `example/librispeech/run_infer.sh`.
+or refer to `example/librispeech/local/run_infer.sh`.

 ### Evaluate a Model

@@ -347,7 +358,7 @@ For more help on arguments:
 ```bash
 python3 test.py --help
 ```
-or refer to `example/librispeech/run_test.sh`.
+or refer to `example/librispeech/local/run_test.sh`.

 ## Hyper-parameters Tuning

@@ -387,11 +398,11 @@ After tuning, you can reset $\alpha$ and $\beta$ in the inference and evaluation
 ```bash
 python3 tune.py --help
 ```
-or refer to `example/librispeech/run_tune.sh`.
+or refer to `example/librispeech/local/run_tune.sh`.

 ## Training for Mandarin Language

-The key steps of training for Mandarin language are same to that of English language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, testing and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character-based and please run ```tools/tune.py``` to find an optimal setting.
+The key steps of training for the Mandarin language are the same as those for English, and we have also provided an example for Mandarin training with Aishell in ```examples/aishell/local```.
+As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, testing and inference, respectively. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Note that, unlike the English LM, the Mandarin LM is character-based; please run ```tools/tune.py``` to find an optimal setting.

 ## Trying Live Demo with Your Own Voice