From f4375ef125cf496d87bb92e9991da12039488077 Mon Sep 17 00:00:00 2001
From: Xinghai Sun
Date: Mon, 7 Aug 2017 19:41:59 +0800
Subject: [PATCH] Update README.md with code reviews for DS2.

---
 README.md | 36 ++++++++++++++----------------------
 model.py  |  6 +++---
 2 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
index 39dba0cd..96fbb7d0 100755
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@
 - **Python = 2.7** only supported;
 - **cuDNN >= 6.0** is required to utilize NVIDIA GPU platform in the installation of PaddlePaddle, and the **CUDA toolkit** with proper version suitable for cuDNN. The cuDNN library below 6.0 is found to yield a fatal error in batch normalization when handling utterances with long duration in inference.
 
-### Setup for Training & Evaluation
+### Setup
 
 ```
 sh setup.sh
@@ -16,19 +16,6 @@ export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/li
 
 Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.
 
-### Setup for Demo
-
-Please do the following extra installation before run `demo_client.py` to try the realtime ASR demo. However there is no need to install them for the computer running the demo's server-end (`demo_server.py`). For details of running the ASR demo, please refer to the [section](#playing-with-the-asr-demo).
-
-For example, on MAC OS X:
-
-```
-brew install portaudio
-pip install pyaudio
-pip install pynput
-```
-
-
 ## Usage
 
 ### Preparing Data
@@ -159,20 +146,25 @@ Then reset parameters with the tuning result before inference or evaluating.
 
 ### Playing with the ASR Demo
 
-A real-time ASR demo (`demo_server.py` and `demo_client.py`) are prepared for users to try out the ASR model with their own voice. After a model and language model is prepared, we can first start the demo server:
+A real-time ASR demo is built for users to try out the ASR model with their own voice. Please complete the following installation on the machine where you'd like to run the demo's client (it is not needed on the machine running the demo's server).
+
+For example, on Mac OS X:
+
+```
+brew install portaudio
+pip install pyaudio
+pip install pynput
+```
+After a model and a language model are prepared, we can first start the demo's server:
 ```
 CUDA_VISIBLE_DEVICES=0 python demo_server.py
 ```
 
-And then in another console, start the client:
+And then, in another console, start the demo's client:
 ```
 python demo_client.py
 ```
 
-On the client console, press and hold "white-space" key and start talking, then release the "white-space" key when you finish your speech. The decoding results (infered transcription) will be displayed.
-
-If you would like to start the server and the client in two machines. Please use `--host_ip` and `--host_port` to indicate the actual IP address and port, for both `demo_server.py` and `demo_client.py`.
-
-Notice that `demo_client.py` should be started in your local computer with microphone hardware, while `demo_server.py` can be started in any remote server as well as the same local computer. IP address and port should be properly set for server-client communication.
+On the client console, press and hold the "white-space" key on the keyboard to start talking, and release it when you finish your speech. The decoding results (the inferred transcription) will be displayed.
 
-For running `demo_client.py`, please first finish the [extra installation steps](#setup-for-demo).
+The server and the client can also be started on two separate machines, e.g. `demo_client.py` is usually started on a machine with microphone hardware, while `demo_server.py` is usually started on a remote server with powerful GPUs. Please first make sure that the two machines have network access to each other, and then use `--host_ip` and `--host_port` to indicate the server machine's actual IP address (instead of the default `localhost`) and TCP port, for both `demo_server.py` and `demo_client.py`.
diff --git a/model.py b/model.py
index c2e440b3..2eb7c359 100644
--- a/model.py
+++ b/model.py
@@ -143,9 +143,9 @@ class DeepSpeech2Model(object):
         """Model inference. Infer the transcription for a batch of speech
         utterances.
 
-        :param infer_data: List of utterances to infer, with each utterance a
-                           tuple of audio features and transcription text (empty
-                           string).
+        :param infer_data: List of utterances to infer, with each utterance
+                           consisting of a tuple of audio features and
+                           transcription text (empty string).
         :type infer_data: list
         :param decode_method: Decoding method name, 'best_path' or
                               'beam search'.
--
GitLab
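
As an illustration of the two-machine setup described in the updated README above, here is a minimal sketch. Only the `--host_ip` and `--host_port` flag names come from the README text; the IP address and port values are placeholders, not defaults taken from the scripts, and should be adjusted to your own environment.

```
# On the server machine (remote host with GPUs); placeholder IP and port, adjust to your setup.
CUDA_VISIBLE_DEVICES=0 python demo_server.py --host_ip 192.168.1.10 --host_port 8086

# On the client machine (the one with a microphone), pointing at the server's IP address and port.
python demo_client.py --host_ip 192.168.1.10 --host_port 8086
```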