model init multiple times, slow inference
Created by: aayushkubb
Hi Team,
I was trying to run DeepSpeech on the LibriSpeech dataset and noticed during inference that the model is initialized on every batch/loop. This makes inference very slow. Is there a way to avoid initializing the model in every batch?
Complete trace:

```
----------- Configuration Arguments -----------
alpha: 2.5
batch_size: 16
beam_size: 128
beta: 0.3
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_beam_search
error_rate_type: wer
lang_model_path: pretrained_ds2/common_crawl_00.prune01111.trie.klm
mean_std_path: data/tiny/mean_std.npz
model_path: pretrained_ds2/libris
num_conv_layers: 2
num_proc_bsearch: 8
num_rnn_layers: 3
rnn_layer_size: 2048
share_rnn_weights: 1
specgram_type: linear
test_manifest: data/tiny/manifest.sample
use_gpu: 1
use_gru: 0
vocab_path: pretrained_ds2/libris/vocab.txt

2020-01-23 17:04:50,840-INFO: begin to initialize the external scorer for decoding
2020-01-23 17:07:10,592-INFO: language model: is_character_based = 0, max_order = 5, dict_size = 400000
2020-01-23 17:07:10,596-INFO: end initializing scorer
2020-01-23 17:07:10,597-INFO: start evaluation ...
+++++++++++++++++++++++++++++++++
Processing Batch: 0
+++++++++++++++++++++++++++++++++
W0123 17:07:12.100998 28455 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.0, Runtime API Version: 10.0
W0123 17:07:12.143882 28455 device_context.cc:244] device: 0, cuDNN Version: 7.6.
finish initing model from pretrained params from pretrained_ds2/libris

Target Transcription: he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce
Output Transcription: he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flower fattened sauce
Error rate [wer] (16/?) = 0.050676
+++++++++++++++++++++++++++++++++
Processing Batch: 1
+++++++++++++++++++++++++++++++++
finish initing model from pretrained params from pretrained_ds2/libris

Target Transcription: then you can ask him questions on the catechism dedalus
Output Transcription: then you can ask him questions on the catechism daedalus
Error rate [wer] (32/?) = 0.031674
+++++++++++++++++++++++++++++++++
Processing Batch: 2
+++++++++++++++++++++++++++++++++
finish initing model from pretrained params from pretrained_ds2/libris

Target Transcription: he is called as you know the apostle of the indies
Output Transcription: he is called as you know the apostle of the indies
Error rate [wer] (48/?) = 0.027174
+++++++++++++++++++++++++++++++++
Processing Batch: 3
+++++++++++++++++++++++++++++++++
finish initing model from pretrained params from pretrained_ds2/libris

Target Transcription: brother mac ardle brother keogh
Output Transcription: brother mccardle brother key of
Error rate [wer] (64/?) = 0.040898
+++++++++++++++++++++++++++++++++
Processing Batch: 4
+++++++++++++++++++++++++++++++++
finish initing model from pretrained params from pretrained_ds2/libris

Target Transcription: you will find me continually speaking of four men titian holbein turner and tintoret in almost the same terms
Output Transcription: you will find me continually speaking of four men titan hobin turner and tenkara in almost the same terms
Error rate [wer] (80/?) = 0.056689
```
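The repeated `finish initing model from pretrained params` line suggests the parameter load is happening inside the per-batch loop. The general fix is to hoist the load out of the loop (or guard it with a flag) so the weights are restored once and reused for every batch. A minimal sketch of that pattern — the `InferenceRunner` class, `param_path`, and the fake forward pass are hypothetical stand-ins, not the actual DeepSpeech2 API:

```python
class InferenceRunner:
    """Sketch: load pretrained parameters once, then reuse them per batch."""

    def __init__(self, param_path):
        self.param_path = param_path
        self.load_count = 0  # tracks how many times params were loaded

    def _ensure_initialized(self):
        # Restore weights only on the first call; later batches skip this.
        if self.load_count == 0:
            # Real code would load the checkpoint from self.param_path here.
            self.load_count += 1

    def infer_batch(self, batch):
        self._ensure_initialized()
        # Stand-in for the real forward pass + decoding.
        return [utt.lower() for utt in batch]


runner = InferenceRunner("pretrained_ds2/libris")
for batch in (["Hello World"], ["Second Batch"]):
    results = runner.infer_batch(batch)
print(runner.load_count)  # prints 1: the model was loaded exactly once
```

In the actual repo the equivalent change would be to move whatever call emits the `finish initing model ...` log so it runs once before the evaluation loop rather than inside it.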