Something strange with AISHELL dataset downloading
Created by: Willoiron
Hi, I ran into a problem while trying to run the AISHELL baseline. The run crashed, apparently because it ran out of memory, and when I started it again the program showed the following error:
[INFO 2018-01-23 08:08:02,503 model.py:243] begin to initialize the external scorer for decoding
[INFO 2018-01-23 08:08:02,666 model.py:253] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2018-01-23 08:08:02,667 model.py:254] end initializing scorer
[INFO 2018-01-23 08:08:02,667 test.py:98] start evaluation ...
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 134, in order_handle_worker
    result = mapper(sample)
  File "/DeepSpeech/data_utils/data.py", line 280, in <lambda>
    lambda instance: self.process_utterance(instance["audio_filepath"], instance["text"]),
  File "/DeepSpeech/data_utils/data.py", line 115, in process_utterance
    speech_segment = SpeechSegment.from_file(filename, transcript)
  File "/DeepSpeech/data_utils/speech.py", line 50, in from_file
    audio = AudioSegment.from_file(filepath)
  File "/DeepSpeech/data_utils/audio.py", line 71, in from_file
    samples, sample_rate = soundfile.read(file, dtype='float32')
  File "/usr/local/lib/python2.7/dist-packages/soundfile.py", line 373, in read
    subtype, endian, format, closefd) as f:
  File "/usr/local/lib/python2.7/dist-packages/soundfile.py", line 740, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/usr/local/lib/python2.7/dist-packages/soundfile.py", line 1265, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/usr/local/lib/python2.7/dist-packages/soundfile.py", line 1455, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening u'/root/.cache/paddle/dataset/speech/Aishell/data_aishell/wav/train/S0002/BAC009S0002W0213.wav': System error.
and so on ...
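The traceback only says that one wav file under the cache directory could not be opened. A minimal sketch for narrowing this down, assuming the files are plain RIFF/WAVE files: walk the wav directory and list every file that is missing, empty, or lacks a RIFF/WAVE header. The function names here are hypothetical and not part of DeepSpeech, the sketch is written for Python 3 (the run above uses Python 2.7), and a header check is only a rough test, so a file that passes it may still fail in soundfile.

```python
import os


def is_probably_valid_wav(path):
    """Return True if the file exists and starts with a RIFF/WAVE header."""
    try:
        # A RIFF/WAVE file needs at least the 12-byte RIFF header.
        if os.path.getsize(path) < 12:
            return False
        with open(path, "rb") as f:
            header = f.read(12)
    except OSError:
        # Missing or unreadable files count as bad.
        return False
    return header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def find_bad_wavs(root):
    """Return a sorted list of .wav files under root that fail the header check."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                full = os.path.join(dirpath, name)
                if not is_probably_valid_wav(full):
                    bad.append(full)
    return sorted(bad)


if __name__ == "__main__":
    # Directory taken from the error message above.
    wav_root = "/root/.cache/paddle/dataset/speech/Aishell/data_aishell/wav"
    for path in find_bad_wavs(wav_root):
        print(path)
```

If only a handful of files are reported, the damage is probably limited to the extracted files rather than the whole download.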
To keep running the program, I have to re-download the AISHELL dataset. I have run into this situation many times. The download path I use is:
Saving to: '/root/.cache/paddle/dataset/speech/Aishell/data_aishell.tgz'
data_aishell.tgz 100%[===================>] 14.51G 2.16MB/s in 2h 30m
Each download takes a lot of time. Meanwhile, the libri and tiny examples do not show a similar problem. What causes this? I am very confused.
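Since each download takes about 2.5 hours, it may be worth checking whether the cached archive itself is still intact before fetching it again. The following is only a sketch under assumptions: the helper names are hypothetical, the expected MD5 value is not given in this post and would have to come from wherever the download script defines it, and the code is written for Python 3. It hashes the archive in chunks and walks the tar members to detect a truncated or corrupt file.

```python
import hashlib
import tarfile
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_readable(path):
    """Return True if every member header of the tar archive can be read."""
    try:
        with tarfile.open(path, "r:*") as tar:
            for _member in tar:
                pass  # iterating forces a pass through the compressed stream
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    # Archive path taken from the wget output above.
    archive = "/root/.cache/paddle/dataset/speech/Aishell/data_aishell.tgz"
    print("md5:", md5_of_file(archive))
    print("readable:", archive_is_readable(archive))
```

If the digest matches the expected value and the archive reads cleanly, extracting it again should be enough, and the 14.51G download can be skipped.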