Inference outputs meaningless Chinese words
Created by: xuzhaoqing
I have a dataset with no transcripts available, so I need to infer them. I followed run_infer_golden.sh, but got results like this:
Output Transcription: 捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞
Output Transcription: 捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞
Output Transcription: 捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞呀捞
Does anyone have an idea why the results are so bad? Thanks a lot!
My configuration is as follows:
CUDA_VISIBLE_DEVICES=0 \
python -u infer.py \
--num_samples=10 \
--trainer_count=1 \
--beam_size=300 \
--num_proc_bsearch=2 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=1024 \
--alpha=2.4 \
--beta=5.0 \
--cutoff_prob=0.99 \
--cutoff_top_n=40 \
--use_gru=True \
--use_gpu=True \
--share_rnn_weights=False \
--infer_manifest='manifest.ex' \
--mean_std_path='mean_std.npz' \
--vocab_path='vocab.txt' \
--model_path='params.tar.gz' \
--lang_model_path='zhidao_giga.klm' \
--decoding_method='ctc_beam_search' \
--error_rate_type='cer' \
--specgram_type='linear'
I used the larger language model here, and I tried both a mean_std file that we generated ourselves and the one produced from the AISHELL dataset by running run_data.sh (someone said that might work).
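To check whether the two mean_std files are even comparable, something like the sketch below could be used. It assumes only that both files are ordinary NumPy .npz archives; the helper names and the file names in the usage comment are hypothetical and not part of the toolkit. It lists every array stored in each file and reports the ones whose shapes differ.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, min, max)} for every array in an .npz file."""
    with np.load(path) as archive:
        return {
            name: (
                archive[name].shape,
                float(archive[name].min()),
                float(archive[name].max()),
            )
            for name in archive.files
        }


def compare_npz(path_a, path_b):
    """Return sorted names of arrays whose shapes differ (or that exist in only one file)."""
    a, b = summarize_npz(path_a), summarize_npz(path_b)
    return sorted(
        name
        for name in set(a) | set(b)
        if a.get(name, (None,))[0] != b.get(name, (None,))[0]
    )


# Hypothetical usage (file names are placeholders):
#   print(summarize_npz("mean_std_own.npz"))
#   print(compare_npz("mean_std_own.npz", "mean_std_aishell.npz"))
```

An empty list from compare_npz only means the array names and shapes agree; it does not say that either file matches the features the model was trained on.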