Connection reset by peer during training: "managers.py", line 759, result = conn.recv() IOError: [Errno 104] Connection reset by peer
Created by: bolt163
With batch_size set to 24 there is no OOM. Pass 0 trains normally, but Pass 1 fails with Connection reset by peer. What is going on?
...................................................................................................
Pass: 1, Batch: 2100, TrainCost: 8.552184
...................................................................................................
Pass: 1, Batch: 2200, TrainCost: 7.938597
...................................................................................................
Pass: 1, Batch: 2300, TrainCost: 8.337096
...................................................................................................
Pass: 1, Batch: 2400, TrainCost: 8.301428
...................................................................................................
Pass: 1, Batch: 2500, TrainCost: 7.078673
...................................................................................................
Pass: 1, Batch: 2600, TrainCost: 8.098217
...................................................................................................
Pass: 1, Batch: 2700, TrainCost: 8.403698
...................................................................................................
Pass: 1, Batch: 2800, TrainCost: 7.654552
...................................................................................................
Pass: 1, Batch: 2900, TrainCost: 7.540778
...................................................................................................
Pass: 1, Batch: 3000, TrainCost: 7.034502
...................................................................................................
Pass: 1, Batch: 3100, TrainCost: 8.303614
...................................................................................................
Pass: 1, Batch: 3200, TrainCost: 7.362420
...................................................................................................
Pass: 1, Batch: 3300, TrainCost: 8.457884
...................................................................................................
Pass: 1, Batch: 3400, TrainCost: 8.275194
...................................................................................................
Pass: 1, Batch: 3500, TrainCost: 7.887627
...................................................................................................
Pass: 1, Batch: 3600, TrainCost: 9.091355
...................................................................................................
Pass: 1, Batch: 3700, TrainCost: 7.678354
...................................................................................................
Pass: 1, Batch: 3800, TrainCost: 8.109676
...................................................................................................
Pass: 1, Batch: 3900, TrainCost: 8.876282
...................................................................................................
Pass: 1, Batch: 4000, TrainCost: 8.288002
...................................................................................................
Pass: 1, Batch: 4100, TrainCost: 7.990575
...................................................................................................
Pass: 1, Batch: 4200, TrainCost: 7.985103
...................................................................................................
Pass: 1, Batch: 4300, TrainCost: 7.639118
...................................................................................................
Pass: 1, Batch: 4400, TrainCost: 8.015420
...................................................................................................
Pass: 1, Batch: 4500, TrainCost: 7.916594
...................................................................................................
Pass: 1, Batch: 4600, TrainCost: 8.564386
...................................................................................................
Pass: 1, Batch: 4700, TrainCost: 8.309537
...................................................................................................
Pass: 1, Batch: 4800, TrainCost: 8.294658
...Exception in thread Thread-3:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 153, in flush_worker
    sample = in_queue.get()
  File "", line 2, in get
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
.....................Process Process-46:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-43:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-51:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self.run()
    self._target(*self._args, **self._kwargs)
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "", line 2, in getitem
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    while order_id != out_order[0]:
    while order_id != out_order[0]:
  File "", line 2, in getitem
  File "", line 2, in getitem
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-53:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "", line 2, in getitem
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-42:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "", line 2, in getitem
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-47:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "", line 2, in getitem
Process Process-49:
Traceback (most recent call last):
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/data/offline/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/data1/bolt163/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker