提交 c246d315 编写于 作者: H Hui Zhang

update en readme

上级 7357e521
......@@ -105,41 +105,52 @@ Let's take a tiny sampled subset of [LibriSpeech dataset](http://www.openslr.org
```
Notice that this is only a toy example with a tiny sampled subset of LibriSpeech. If you would like to try with the complete dataset (would take several days for training), please go to `examples/librispeech` instead.
- Source env
```bash
source path.sh
```
Set `MAIN_ROOT` as project dir.
- Main entrypoint
```bash
bash run.sh
```
- Prepare the data
```bash
sh run_data.sh
sh local/run_data.sh
```
`run_data.sh` will download dataset, generate manifests, collect normalizer's statistics and build vocabulary. Once the data preparation is done, you will find the data (only part of LibriSpeech) downloaded in `./dataset/librispeech` and the corresponding manifest files generated in `./data/tiny` as well as a mean stddev file and a vocabulary file. It has to be run for the very first time you run this dataset and is reusable for all further experiments.
`run_data.sh` will download dataset, generate manifests, collect normalizer's statistics and build vocabulary. Once the data preparation is done, you will find the data (only part of LibriSpeech) downloaded in `${MAIN_ROOT}/dataset/librispeech` and the corresponding manifest files generated in `${PWD}/data` as well as a mean stddev file and a vocabulary file. It has to be run for the very first time you run this dataset and is reusable for all further experiments.
- Train your own ASR model
```bash
sh run_train.sh
sh local/run_train.sh
```
`run_train.sh` will start a training job, with training logs printed to stdout and model checkpoint of every pass/epoch saved to `./checkpoints/tiny`. These checkpoints could be used for training resuming, inference, evaluation and deployment.
`run_train.sh` will start a training job, with training logs printed to stdout and model checkpoint of every pass/epoch saved to `${PWD}/checkpoints`. These checkpoints could be used for training resuming, inference, evaluation and deployment.
- Case inference with an existing model
```bash
sh run_infer.sh
sh local/run_infer.sh
```
`run_infer.sh` will show us some speech-to-text decoding results for several (default: 10) samples with the trained model. The performance might not be good now as the current model is only trained with a toy subset of LibriSpeech. To see the results with a better model, you can download a well-trained (trained for several days, with the complete LibriSpeech) model and do the inference:
```bash
sh run_infer_golden.sh
sh local/run_infer_golden.sh
```
- Evaluate an existing model
```bash
sh run_test.sh
sh local/run_test.sh
```
`run_test.sh` will evaluate the model with Word Error Rate (or Character Error Rate) measurement. Similarly, you can also download a well-trained model and test its performance:
```bash
sh run_test_golden.sh
sh local/run_test_golden.sh
```
More detailed information are provided in the following sections. Wish you a happy journey with the *DeepSpeech2 on PaddlePaddle* ASR engine!
......@@ -158,7 +169,7 @@ More detailed information are provided in the following sections. Wish you a hap
To use your custom data, you only need to generate such manifest files to summarize the dataset. Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their meta data including the transcription labels.
For how to generate such manifest files, please refer to `data/librispeech/librispeech.py`, which will download data and generate manifest files for LibriSpeech dataset.
For how to generate such manifest files, please refer to `PATH/TO/LIBRISPEECH/local/librispeech.py`, which will download data and generate manifest files for LibriSpeech dataset.
### Compute Mean & Stddev for Normalizer
......@@ -168,11 +179,11 @@ To perform z-score normalization (zero-mean, unit stddev) upon audio features, w
python3 tools/compute_mean_std.py \
--num_samples 2000 \
--specgram_type linear \
--manifest_path data/librispeech/manifest.train \
--output_path data/librispeech/mean_std.npz
--manifest_path PATH/TO/LIBRISPEECH/data/manifest.train \
--output_path PATH/TO/LIBRISPEECH/data/mean_std.npz
```
It will compute the mean and standard deviatio of power spectrum feature with 2000 random sampled audio clips listed in `data/librispeech/manifest.train` and save the results to `data/librispeech/mean_std.npz` for further usage.
It will compute the mean and standard deviatio of power spectrum feature with 2000 random sampled audio clips listed in `PATH/TO/LIBRISPEECH/data/manifest.train` and save the results to `PATH/TO/LIBRISPEECH/data/mean_std.npz` for further usage.
### Build Vocabulary
......@@ -182,11 +193,11 @@ A vocabulary of possible characters is required to convert the transcription int
```bash
python3 tools/build_vocab.py \
--count_threshold 0 \
--vocab_path data/librispeech/eng_vocab.txt \
--manifest_paths data/librispeech/manifest.train
--vocab_path PATH/TO/LIBRISPEECH/data/eng_vocab.txt \
--manifest_paths PATH/TO/LIBRISPEECH/data/manifest.train
```
It will write a vocabuary file `data/librispeeech/eng_vocab.txt` with all transcription text in `data/librispeech/manifest.train`, without vocabulary truncation (`--count_threshold 0`).
It will write a vocabuary file `PATH/TO/LIBRISPEEECH/data/eng_vocab.txt` with all transcription text in `PATH/TO/LIBRISPEECH/data/manifest.train`, without vocabulary truncation (`--count_threshold 0`).
### More Help
......@@ -226,7 +237,7 @@ For more help on arguments:
```bash
python3 train.py --help
```
or refer to `example/librispeech/run_train.sh`.
or refer to `example/librispeech/local/run_train.sh`.
## Data Augmentation Pipeline
......@@ -322,7 +333,7 @@ For more help on arguments:
```
python3 infer.py --help
```
or refer to `example/librispeech/run_infer.sh`.
or refer to `example/librispeech/local/run_infer.sh`.
### Evaluate a Model
......@@ -347,7 +358,7 @@ For more help on arguments:
```bash
python3 test.py --help
```
or refer to `example/librispeech/run_test.sh`.
or refer to `example/librispeech/local/run_test.sh`.
## Hyper-parameters Tuning
......@@ -387,11 +398,11 @@ After tuning, you can reset $\alpha$ and $\beta$ in the inference and evaluation
```bash
python3 tune.py --help
```
or refer to `example/librispeech/run_tune.sh`.
or refer to `example/librispeech/local/run_tune.sh`.
## Training for Mandarin Language
The key steps of training for Mandarin language are same to that of English language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, testing and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character-based and please run ```tools/tune.py``` to find an optimal setting.
The key steps of training for Mandarin language are same to that of English language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell/local```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, testing and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character-based and please run ```tools/tune.py``` to find an optimal setting.
## Trying Live Demo with Your Own Voice
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册