diff --git a/README.md b/README.md
index 39dba0cd12b6cd297041133c850d3895b6c65e08..96fbb7d09aa310003d83a036d301deac54f3004d 100755
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@
 - **Python = 2.7** only supported;
 - **cuDNN >= 6.0** is required to utilize NVIDIA GPU platform in the installation of PaddlePaddle, and the **CUDA toolkit** with proper version suitable for cuDNN. The cuDNN library below 6.0 is found to yield a fatal error in batch normalization when handling utterances with long duration in inference.
 
-### Setup for Training & Evaluation
+### Setup
 
 ```
 sh setup.sh
@@ -16,19 +16,6 @@ export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/li
 
 Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.
 
-### Setup for Demo
-
-Please do the following extra installation before run `demo_client.py` to try the realtime ASR demo. However there is no need to install them for the computer running the demo's server-end (`demo_server.py`). For details of running the ASR demo, please refer to the [section](#playing-with-the-asr-demo).
-
-For example, on MAC OS X:
-
-```
-brew install portaudio
-pip install pyaudio
-pip install pynput
-```
-
-
 ## Usage
 
 ### Preparing Data
@@ -159,20 +146,25 @@ Then reset parameters with the tuning result before inference or evaluating.
 ### Playing with the ASR Demo
 
-A real-time ASR demo (`demo_server.py` and `demo_client.py`) are prepared for users to try out the ASR model with their own voice. After a model and language model is prepared, we can first start the demo server:
+A real-time ASR demo is built for users to try out the ASR model with their own voice. Please complete the following installation on the machine on which you'd like to run the demo's client (there is no need for this on the machine running the demo's server).
+
+For example, on Mac OS X:
+
+```
+brew install portaudio
+pip install pyaudio
+pip install pynput
+```
+After a model and a language model are prepared, we can first start the demo's server:
 ```
 CUDA_VISIBLE_DEVICES=0 python demo_server.py
 ```
 
-And then in another console, start the client:
+And then, in another console, start the demo's client:
 
 ```
 python demo_client.py
 ```
 
-On the client console, press and hold "white-space" key and start talking, then release the "white-space" key when you finish your speech. The decoding results (infered transcription) will be displayed.
-
-If you would like to start the server and the client in two machines. Please use `--host_ip` and `--host_port` to indicate the actual IP address and port, for both `demo_server.py` and `demo_client.py`.
-
-Notice that `demo_client.py` should be started in your local computer with microphone hardware, while `demo_server.py` can be started in any remote server as well as the same local computer. IP address and port should be properly set for server-client communication.
+On the client console, press and hold the "white-space" key to start talking; release it when you finish your speech. The decoding results (the inferred transcription) will be displayed.
 
-For running `demo_client.py`, please first finish the [extra installation steps](#setup-for-demo).
+It is also possible to run the server and the client on two separate machines, e.g. `demo_client.py` is usually started on a machine with microphone hardware, while `demo_server.py` is usually started on a remote server with powerful GPUs. Please first make sure that the two machines have network access to each other, and then use `--host_ip` and `--host_port` to indicate the server machine's actual IP address (instead of the default `localhost`) and TCP port, in both `demo_server.py` and `demo_client.py`.
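As a quick sketch of the two-machine setup the new README text describes: the `--host_ip` and `--host_port` flags come from the README above, but the IP address and port used here are assumed example values, not project defaults.

```shell
# Hypothetical server address; replace with the server machine's real IP and port.
HOST_IP=192.168.1.10
HOST_PORT=8086

# On the server machine (with GPUs), as described above:
#   CUDA_VISIBLE_DEVICES=0 python demo_server.py --host_ip "$HOST_IP" --host_port "$HOST_PORT"
# On the client machine (with a microphone):
#   python demo_client.py --host_ip "$HOST_IP" --host_port "$HOST_PORT"
echo "demo server expected at $HOST_IP:$HOST_PORT"
```

Both sides must receive the same address so that the client's audio stream reaches the server over TCP.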
diff --git a/model.py b/model.py
index c2e440b3ae4a8455d41de7b293e95123b7038931..2eb7c3594974239dff68f771e478423414688411 100644
--- a/model.py
+++ b/model.py
@@ -143,9 +143,9 @@ class DeepSpeech2Model(object):
         """Model inference. Infer the transcription for a batch of speech
         utterances.
 
-        :param infer_data: List of utterances to infer, with each utterance a
-                           tuple of audio features and transcription text (empty
-                           string).
+        :param infer_data: List of utterances to infer, with each utterance
+                           consisting of a tuple of audio features and
+                           transcription text (empty string).
         :type infer_data: list
         :param decode_method: Decoding method name, 'best_path' or
                               'beam search'.