Inference using English model returns gibberish
Created by: jrao1
The command we run:
CUDA_VISIBLE_DEVICES=0 python infer.py --trainer_count 1 \
--use_gru True \
--mean_std_path ./models/baidu_en8k/mean_std.npz \
--vocab_path ./models/baidu_en8k/vocab.txt \
--infer_manifest manifest.json \
--lang_model_path models/lm/common_crawl_00.prune01111.trie.klm \
--model_path ./models/baidu_en8k/params.tar.gz
Manifest:
{"audio_filepath": "./241757.wav", "duration": 5, "text": "If students today had more free time, they might show more interest in politics."}
The audio file is 241757.zip, downloaded from http://www.manythings.org/audio/sentences/126.html
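One thing worth ruling out first (an assumption on my part, not confirmed by the logs): the released models are typically trained on 16 kHz, 16-bit mono PCM, and clips from third-party sites often come at a different sample rate or channel count. A quick stdlib sketch to inspect the extracted clip; `describe_wav` and the path are illustrative names, not part of the repo:

```python
import wave

def describe_wav(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        duration = w.getnframes() / float(rate)
    return rate, channels, width, duration

# Example usage (substitute the extracted clip's path):
# print(describe_wav("./241757.wav"))
```

If this reports anything other than 16000 Hz / 1 channel / 2-byte samples, the features fed to the network would not match what it was trained on, which can easily produce gibberish output.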
Result:
----------- Configuration Arguments -----------
alpha: 2.5
beam_size: 500
beta: 0.3
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_beam_search
error_rate_type: wer
infer_manifest: manifest.json
lang_model_path: models/lm/common_crawl_00.prune01111.trie.klm
mean_std_path: ./models/baidu_en8k/mean_std.npz
model_path: ./models/baidu_en8k/params.tar.gz
num_conv_layers: 2
num_proc_bsearch: 8
num_rnn_layers: 3
num_samples: 10
rnn_layer_size: 2048
share_rnn_weights: True
specgram_type: linear
trainer_count: 1
use_gpu: True
use_gru: 1
vocab_path: ./models/baidu_en8k/vocab.txt
------------------------------------------------
I0323 12:10:15.707188 820 Util.cpp:166] commandline: --use_gpu=True --rnn_use_batch=True --trainer_count=1
[INFO 2018-03-23 12:10:17,678 layers.py:2606] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-23 12:10:17,680 layers.py:3133] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-23 12:10:17,680 layers.py:7224] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-03-23 12:10:17,681 layers.py:2606] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-23 12:10:17,682 layers.py:3133] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-23 12:10:17,683 layers.py:7224] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-03-23 12:10:20,938 model.py:243] begin to initialize the external scorer for decoding
[INFO 2018-03-23 12:10:29,526 model.py:253] language model: is_character_based = 0, max_order = 5, dict_size = 400000
[INFO 2018-03-23 12:10:29,711 model.py:254] end initializing scorer
[INFO 2018-03-23 12:10:29,711 infer.py:103] start inference ...
Target Transcription: If students today had more free time, they might show more interest in politics.
Output Transcription: eugenespringfield emergencies emergencies eyebrowraising
Current error rate [wer] = 1.000000
[INFO 2018-03-23 12:10:30,421 infer.py:124] finish inference
Any ideas?
(It works well on sentences from LibriSpeech, so the model itself appears to be working.)
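In case it helps narrow things down: if the clip turns out not to be 16 kHz mono (again, an assumption about the expected format), one test would be to convert it and rerun inference. `sox` or `ffmpeg` are the usual tools; purely for illustration, a minimal stdlib sketch of the same idea using linear interpolation:

```python
import struct
import wave

def resample_linear(samples, src_rate, dst_rate):
    """Resample a sequence of samples to dst_rate via linear interpolation."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out

def resample_wav(src_path, dst_path, dst_rate=16000):
    """Read a 16-bit PCM WAV, keep the first channel, and write it out at dst_rate."""
    with wave.open(src_path, "rb") as w:
        assert w.getsampwidth() == 2, "expects 16-bit PCM"
        channels = w.getnchannels()
        src_rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)[::channels]
    out = resample_linear(samples, src_rate, dst_rate)
    with wave.open(dst_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(dst_rate)
        w.writeframes(struct.pack("<%dh" % len(out), *out))
```

After converting, the manifest's `duration` field should still match the clip, since resampling preserves wall-clock length.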