Created by: chenwhql
PR types
Bug fixes
PR changes
APIs
Describe
Move DataLoader._worker_loop to the top level of the module.
Otherwise, starting a multi-process DataLoader inside a process launched by
paddle.distributed.spawn fails, because _worker_loop cannot be pickled when
the worker processes are created.
Error example:
I0910 12:49:50.989151 8382 nccl_context.cc:127] init nccl context nranks: 2 local rank: 0 gpu id: 0
I0910 12:49:50.989272 8383 nccl_context.cc:127] init nccl context nranks: 2 local rank: 1 gpu id: 1
W0910 12:49:51.561908 8382 device_context.cc:320] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0910 12:49:51.562680 8383 device_context.cc:320] Please NOTE: device: 1, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0910 12:49:51.568655 8382 device_context.cc:328] device: 0, cuDNN Version: 7.5.
W0910 12:49:51.569162 8383 device_context.cc:328] device: 1, cuDNN Version: 7.5.
Exception ignored in: <bound method _DataLoaderIterMultiProcess.__del__ of <paddle.fluid.dataloader.dataloader_iter._DataLoaderIterMultiProcess object at 0x7f71cbeb5898>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 719, in __del__
self._try_shutdown_all()
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 456, in _try_shutdown_all
if not self._shutdown:
AttributeError: '_DataLoaderIterMultiProcess' object has no attribute '_shutdown'
Exception ignored in: <bound method _DataLoaderIterMultiProcess.__del__ of <paddle.fluid.dataloader.dataloader_iter._DataLoaderIterMultiProcess object at 0x7fcf918958d0>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 719, in __del__
self._try_shutdown_all()
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 456, in _try_shutdown_all
if not self._shutdown:
AttributeError: '_DataLoaderIterMultiProcess' object has no attribute '_shutdown'
Traceback (most recent call last):
File "spawn_dataloader.py", line 72, in <module>
dist.spawn(train, nprocs=2)
File "/usr/local/lib/python3.5/dist-packages/paddle/distributed/spawn.py", line 409, in spawn
while not context.join():
File "/usr/local/lib/python3.5/dist-packages/paddle/distributed/spawn.py", line 210, in join
self._throw_exception(error_index)
File "/usr/local/lib/python3.5/dist-packages/paddle/distributed/spawn.py", line 228, in _throw_exception
raise Exception(msg)
Exception:
----------------------------------------------
Process 0 terminated with the following error:
----------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/paddle/distributed/spawn.py", line 159, in _func_wrapper
result = func(*args)
File "/work/scripts/spawn/spawn_dataloader.py", line 58, in train
for batch_id, (image, label) in enumerate(loader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 406, in __call__
return self.__iter__()
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 403, in __iter__
return _DataLoaderIterMultiProcess(self)
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 381, in __init__
self._init_workers()
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 413, in _init_workers
worker.start()
File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.5/multiprocessing/context.py", line 274, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_spawn_posix.py", line 33, in __init__
super().__init__(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_spawn_posix.py", line 48, in _launch
reduction.dump(process_obj, fp)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects
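
The final TypeError in the log above can be reproduced without Paddle. The following is a minimal sketch, not the actual DataLoader code: LoaderIter, its _lock attribute, and the top-level worker_loop are hypothetical stand-ins, assuming the real iterator holds a thread lock somewhere in its state. It shows why a bound method is a poor target for a spawned process while a top-level function is fine.

```python
import pickle
import threading


class LoaderIter:
    """Hypothetical stand-in for an iterator that owns a thread lock."""

    def __init__(self):
        # Thread locks cannot be pickled.
        self._lock = threading.Lock()

    def _worker_loop(self, dataset):
        # Bound method: pickling it also pickles `self`, lock included.
        return [x * 2 for x in dataset]


def worker_loop(dataset):
    # Top-level function: pickled by reference to its module-level name,
    # so no instance state is carried along.
    return [x * 2 for x in dataset]


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


it = LoaderIter()
bound_ok = can_pickle(it._worker_loop)  # drags `it` and its lock into the pickle
top_ok = can_pickle(worker_loop)        # only the function's name is pickled

print("bound method picklable:", bound_ok)
print("top-level function picklable:", top_ok)
```

With the spawn start method, multiprocessing pickles the Process object, including its target, before launching the child, which is where the bound method fails. A top-level target avoids this as long as its arguments are themselves picklable.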