Training in multiprocessing.Process fails to run the startup program when using GPU
Created by: xueeinstein
I built a complicated reinforcement learning model in Paddle that trains two different models in parallel. One of them needs to sync the latest weights from the other to generate target values for training. I implemented the second model in a multiprocessing.Process.
However, a freshly forked process (no other Paddle program had been created when the new process was forked) fails to run the startup program when using the GPU, but succeeds when using the CPU.
To make debugging easier, I reproduced the bug in the following unit test cases:
import multiprocessing

import numpy as np
import paddle.fluid as fluid

# machine_info, DynamicsModel and DynamicsModelLearner are imported elsewhere (not shown).


class TestDynamicsModel(object):
    def test_learner(self):
        # Run the target function directly in the current process.
        self.learner_process()

    def test_learner_process(self):
        # Run the same target function in a child process.
        process = multiprocessing.Process(target=self.learner_process)
        process.start()
        process.join()
        assert process.exitcode == 0

    def learner_process(self):
        bs = 4
        if machine_info.is_gpu_available():
            place = fluid.CUDAPlace(0)
        else:
            place = fluid.CPUPlace()
        # place = fluid.CPUPlace()
        exe = fluid.Executor(place)
        main, startup = fluid.Program(), fluid.Program()
        with fluid.program_guard(main, startup):
            env = DynamicsModel("env", self.env_config, self.learner_config)
            learner = DynamicsModelLearner(env)
            obs = fluid.layers.data("obs", self.env_config["obs_dims"],
                                    dtype="float32")
            next_obs = fluid.layers.data(
                "next_obs", self.env_config["obs_dims"], dtype="float32")
            actions = fluid.layers.data(
                "actions", [self.env_config["action_dim"]], dtype="float32")
            rewards = fluid.layers.data("rewards", [], dtype="float32")
            dones = fluid.layers.data("dones", [], dtype="bool")
            losses = learner.learn(obs, next_obs, actions, rewards, dones)
        # This is the call that fails in the child process on GPU.
        exe.run(startup)
        obs_shape = [bs, *self.env_config["obs_dims"]]
        act_shape = [bs, self.env_config["action_dim"]]
        feed_dict = {
            "obs": np.random.random(obs_shape).astype(np.float32),
            "next_obs": np.random.random(obs_shape).astype(np.float32),
            "actions": np.random.random(act_shape).astype(np.float32),
            "rewards": np.random.random([bs]).astype(np.float32),
            "dones": np.zeros([bs]).astype(bool)
        }
        inspects = exe.run(program=main, feed=feed_dict, fetch_list=losses)
        assert len(inspects) == 4
Results:
The test case test_learner passes on both CPU and GPU, which means the process target function itself works.
The test case test_learner_process passes only on CPU. On GPU, it raises an error while initializing the learning rate variable. See the attached out.txt for details.