*DeepSpeech on PaddlePaddle* is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine built on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial applications and academic research on speech recognition through an easy-to-use, efficient, and scalable implementation that covers training, inference and testing modules, and demo deployment.
For more information, please see the documents below:
* [Install](docs/install.md)
* [Getting Started](docs/geting_stared.md)
* [Data Preparation](docs/data_preparation.md)
* [Data Augmentation](docs/augmentation.md)
* [Ngram LM](docs/ngram_lm.md)
* [Server Demo](docs/server.md)
* [Benchmark](docs/benchmark.md)
* [Released Model](docs/released_model.md)
* [FAQ](docs/faq.md)
## Models
* [Baidu's Deep Speech2](http://proceedings.mlr.press/v48/amodei16.pdf)
## Setup
* python3.7
* paddlepaddle 2.0.0
- Run the setup script for the remaining dependencies
...
...
- Activate the virtual environment: `source tools/venv/bin/activate`
Please see [Getting Started](docs/geting_started.md) and the [tiny example](examples/tiny/README.md).
## Questions and Help
You are welcome to submit questions and bug reports in [GitHub Issues](https://github.com/PaddlePaddle/DeepSpeech/issues). You are also welcome to contribute to this project.
We compare the training time with 1, 2, 4, and 8 Tesla V100 GPUs, using a subset of LibriSpeech samples whose audio durations are between 6.0 and 7.0 seconds. The results show **near-linear** acceleration as the number of GPUs increases. In the following figure, the training time (in seconds) is printed on the blue bars.
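One way to quantify "near-linear" is to convert the measured training times into speedup and parallel efficiency. The sketch below is illustrative only: the function name `scaling_report` is hypothetical, and the timings are placeholders rather than the measured benchmark values.

```python
def scaling_report(times_by_gpus):
    """Map each GPU count to (speedup, efficiency) relative to the smallest count.

    times_by_gpus: dict of {number_of_gpus: training_time_in_seconds}.
    """
    base_gpus = min(times_by_gpus)
    base_time = times_by_gpus[base_gpus]
    report = {}
    for n_gpus in sorted(times_by_gpus):
        speedup = base_time / times_by_gpus[n_gpus]
        # Efficiency of 1.0 means perfectly linear scaling.
        efficiency = speedup / (n_gpus / base_gpus)
        report[n_gpus] = (speedup, efficiency)
    return report


# Placeholder timings in seconds (NOT the benchmark results).
placeholder_times = {1: 800.0, 2: 420.0, 4: 220.0, 8: 120.0}

for n_gpus, (speedup, efficiency) in scaling_report(placeholder_times).items():
    print(f"{n_gpus} GPU(s): speedup={speedup:.2f}x, efficiency={efficiency:.0%}")
```

An efficiency that stays close to 100% as the GPU count grows corresponds to near-linear acceleration.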
The grid search prints the WER (word error rate) or CER (character error rate) at each point in the hyper-parameter space and optionally draws the error surface. A proper hyper-parameter range should include the global minimum of the WER/CER error surface, as illustrated in the following figure.
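As a rough illustration of the idea, the sketch below computes WER with a standard edit distance and runs a small grid search over two hyper-parameters. It is not the project's tuning script: the parameter names `alpha` and `beta`, the helper names, and the toy error function are all assumptions made for the example. In practice the error function would decode a development set and average the WER or CER.

```python
import itertools


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def grid_search(error_fn, alphas, betas):
    """Print the error at every grid point and return (alpha, beta, error) of the best."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        err = error_fn(alpha, beta)
        print(f"alpha={alpha:.2f} beta={beta:.2f} error={err:.4f}")
        if best is None or err < best[2]:
            best = (alpha, beta, err)
    return best


def touches_grid_edge(best, alphas, betas):
    """True if the best point lies on the border of the searched range."""
    alpha, beta, _ = best
    return alpha in (min(alphas), max(alphas)) or beta in (min(betas), max(betas))


def toy_error(alpha, beta):
    """Hypothetical bowl-shaped stand-in for 'decode the dev set and score it'."""
    return (alpha - 2.0) ** 2 + (beta - 0.5) ** 2


alphas = [1.0, 2.0, 3.0]
betas = [0.0, 0.5, 1.0]
best = grid_search(toy_error, alphas, betas)
if touches_grid_edge(best, alphas, betas):
    print("Best point is on the grid border; consider widening the range.")
```

If the best point falls on the border of the grid, the searched range may not contain the global minimum, which is a signal to widen the range and search again.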