Error when running the demo training program with nvidia-docker
Created by: wuqi930907
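I started a container with nvidia-docker (sudo nvidia-docker run -it -v $(pwd)/DeepSpeech:/DeepSpeech paddlepaddle/deep_speech:latest-gpu /bin/bash) and ran the training script DeepSpeech/aishell/run_train.sh on 8 GPUs. It failed with the errors below. What is going on? I have not changed the code at all.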
Pass: 9, Batch: 1500, TrainCost: 1.245488
...................................................................................................
Pass: 9, Batch: 1600, TrainCost: 1.278968
...................................................................................................
Pass: 9, Batch: 1700, TrainCost: 1.278138
...................................................................................................
Pass: 9, Batch: 1800, TrainCost: 1.370887
...........................................................................
------- Time: 2829 sec, Pass: 9, ValidationCost: 1.88582897921
...................................................................................................Exception in thread Thread-21:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 153, in flush_worker
    sample = in_queue.get()
  File "<string>", line 2, in get
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Pass: 10, Batch: 100, TrainCost: 0.892535
.....................................Process Process-376:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-373:
Process Process-372:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
    self.run()
IOError: [Errno 104] Connection reset by peer
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
Process Process-367:
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
Process Process-371:
Traceback (most recent call last):
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    kind, result = conn.recv()
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
EOFError
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
Process Process-374:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
Process Process-368:
Traceback (most recent call last):
    while order_id != out_order[0]:
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 138, in order_handle_worker
    kind, result = conn.recv()
    out_order[0] += 1
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
Process Process-365:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    kind, result = conn.recv()
EOFError
Process Process-375:
IOError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-364:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-369:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-370:
Process Process-377:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self.run()
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-362:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 121, in order_read_worker
    in_queue.put((order_id, sample))
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
Process Process-363:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
Process Process-366:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
Process Process-378:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
    self._target(*self._args, **self._kwargs)
  File "/DeepSpeech/data_utils/utility.py", line 135, in order_handle_worker
    while order_id != out_order[0]:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
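
A note for reading the tracebacks above: every one of them ends in managers.py, in _callmethod, which means the failing call was made on a multiprocessing manager proxy (in_queue, out_order). The sketch below is not the DeepSpeech code and is not a confirmed diagnosis of this run. It only illustrates, in Python 3, that a proxy call raises EOFError or a socket-level error once the manager's server process is gone. The function name call_proxy_after_manager_exit is hypothetical, and the "fork" start method assumes Linux.

```python
import multiprocessing


def call_proxy_after_manager_exit():
    """Return the name of the exception a proxy call raises once its manager is gone.

    Hypothetical helper for illustration only; it is not part of DeepSpeech.
    """
    # Assumption: Linux, where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    manager = ctx.Manager()          # starts the manager's server process
    queue = manager.Queue()          # proxy object; the real queue lives in the server
    queue.put("sample")              # works while the server process is alive

    manager.shutdown()               # stop the server process on purpose

    try:
        queue.get()                  # proxy call with no server left to answer it
    except (EOFError, OSError) as exc:
        # Depending on timing and platform this may be EOFError,
        # ConnectionResetError, BrokenPipeError, or a similar OSError.
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(call_proxy_after_manager_exit())
```

If the same pattern applies here, the worker tracebacks would be a symptom of the manager process exiting first, so the log lines before the first exception are likely the more useful place to look for the root cause.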