# Deep Speech 2 on PaddlePaddle

## Installation

Please replace `$PADDLE_INSTALL_DIR` with your own PaddlePaddle installation directory.

```
sh setup.sh
export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/lib:$LD_LIBRARY_PATH
```

For some machines, `libsndfile1` also needs to be installed. Details to be added.

## Usage

### Preparing Data

```
cd datasets
sh run_all.sh
cd ..
```

`sh run_all.sh` prepares all ASR datasets (currently, only LibriSpeech is available). After it finishes, several summary manifest files in JSON format will have been generated.

A manifest file summarizes a speech dataset: each line is a JSON object containing the metadata (i.e. audio filepath, transcript text, and audio duration) of one audio file in the dataset. The manifest file serves as an interface that tells our system where to find the speech samples and what to read from them.
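
As a sketch of how such a manifest could be consumed downstream, the snippet below reads one line-per-sample JSON file. The field names (`audio_filepath`, `text`, `duration`), the duration filter, and the helper name are illustrative assumptions; inspect the generated manifest files for the exact schema.

```python
import json

def read_manifest(manifest_path, max_duration=float("inf")):
    """Load a line-per-sample JSON manifest, keeping entries not longer
    than max_duration seconds.

    The field names used here (audio_filepath, text, duration) are
    illustrative assumptions; check the generated manifests for the
    actual keys.
    """
    samples = []
    with open(manifest_path, "r", encoding="utf8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["duration"] <= max_duration:
                samples.append(entry)
    return samples
```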


More help for arguments:

```
python datasets/librispeech/librispeech.py --help
```

### Preparing for Training

```
python compute_mean_std.py
```

`python compute_mean_std.py` computes the mean and standard deviation of the audio features and saves them to a file, named `./mean_std.npz` by default. This file will be used in both training and inference. The default audio feature is the power spectrum; the MFCC feature is currently also supported. To train and infer based on MFCC features, you can regenerate this file by

```
python compute_mean_std.py --specgram_type mfcc
```

and set `specgram_type` to `mfcc` in each subsequent step, including training, inference, etc.
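
Downstream code can then load this file to normalize features. The sketch below assumes the `.npz` archive stores arrays under the keys `mean` and `std`; check `np.load('./mean_std.npz').files` for the actual names.

```python
import numpy as np

def normalize_features(features, mean_std_path="./mean_std.npz"):
    """Apply the precomputed per-dimension normalization to a feature matrix.

    Assumes the .npz archive stores arrays under the keys "mean" and
    "std" (an assumption about the file layout, not a documented API).
    """
    stats = np.load(mean_std_path)
    return (features - stats["mean"]) / stats["std"]
```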

More help for arguments:

```
python compute_mean_std.py --help
```

### Training

For GPU Training:

```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py
```

For CPU Training:

```
python train.py --use_gpu False
```

More help for arguments:

```
python train.py --help
```

### Preparing Language Model

The following steps (inference, parameters tuning, and evaluating) require a language model during decoding.
A compressed language model is provided and can be obtained by

```
cd ./lm
sh run.sh
cd ..
```

### Inference

For GPU inference:

```
CUDA_VISIBLE_DEVICES=0 python infer.py
```

For CPU inference:

```
python infer.py --use_gpu=False
```

More help for arguments:

```
python infer.py --help
```

### Evaluating

```
CUDA_VISIBLE_DEVICES=0 python evaluate.py
```

More help for arguments:

```
python evaluate.py --help
```

### Parameters Tuning

Usually, the parameters $\alpha$ and $\beta$ for the CTC [prefix beam search](https://arxiv.org/abs/1408.2873) decoder need to be tuned after retraining the acoustic model.
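
For background, a common way such decoders combine scores during beam search is sketched below. This is an illustrative assumption about the roles of $\alpha$ and $\beta$, not necessarily the exact formula implemented in `tune.py`.

```python
def decoding_score(ctc_log_prob, lm_log_prob, num_words, alpha, beta):
    """Combined log-score of a candidate transcript during beam search.

    An illustrative scoring scheme (an assumption, not the project's
    documented formula): alpha weights the language-model score and
    beta adds a per-word bonus that counteracts the decoder's bias
    toward short transcripts.
    """
    return ctc_log_prob + alpha * lm_log_prob + beta * num_words
```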

For GPU tuning:

```
CUDA_VISIBLE_DEVICES=0 python tune.py
```

For CPU tuning:

```
python tune.py --use_gpu=False
```

More help for arguments:

```
python tune.py --help
```

Then reset the parameters with the tuning results before running inference or evaluation.