Help needed: error when running the MADDPG example!!
Created by: gudufengzhongyipilang
-
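Background: I am running the MADDPG example on a server with two 2070ti GPUs. The CUDA version is 10.2, nccl2 is installed, and all the environment dependencies are installed.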
-
Running train.py directly under conda, the error occurs at line 118 of simple_agent.py, inside the predict() function of class MAAgent():
def predict(self, obs):
    obs = np.expand_dims(obs, axis=0)
    obs = obs.astype('float32')
    act = self.fluid_executor.run(
        self.pred_program, feed={'obs': obs},
        fetch_list=[self.pred_act])[0]
The error message is as follows: `Exception has occurred: EnforceNotMet
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
2 paddle::framework::ParallelExecutor::FeedAndSplitTensorIntoLocalScopes(std::unordered_map<std::string, paddle::framework::LoDTensor, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, paddle::framework::LoDTensor> > > const&)
Error Message Summary:
PreconditionNotMetError: The number(1) of samples[obs] of current batch is less than the count(2) of devices(GPU), currently, it is not allowed. at (/paddle/paddle/fluid/framework/parallel_executor.cc:923)
File "/home/fsj/programs/RL/PARL/examples/MADDPG/simple_agent.py", line 118, in predict
fetch_list=[self.pred_act])[0]
File "/home/fsj/programs/RL/PARL/examples/MADDPG/train.py", line 33, in <listcomp>
action_n = [agent.predict(obs) for agent, obs in zip(agents, obs_n)]
File "/home/fsj/programs/RL/PARL/examples/MADDPG/train.py", line 33, in run_episode
action_n = [agent.predict(obs) for agent, obs in zip(agents, obs_n)]
File "/home/fsj/programs/RL/PARL/examples/MADDPG/train.py", line 134, in train_agent
ep_reward, ep_agent_rewards, steps = run_episode(env, agents)
File "/home/fsj/programs/RL/PARL/examples/MADDPG/train.py", line 226, in <module>
train_agent()`
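
Reading the error summary literally: predict() wraps one observation into a batch of 1 sample, while 2 GPUs are visible, and the executor refuses to split 1 sample across 2 devices. The sketch below only illustrates that precondition. The helpers `visible_gpu_count` and `batch_fits_devices` are hypothetical names made up for this illustration and are not part of PARL or Paddle. The default of 2 physical GPUs is taken from the setup described above.

```python
import os


def visible_gpu_count(env=None, physical_gpus=2):
    """Hypothetical helper: how many GPUs the process would see.

    If CUDA_VISIBLE_DEVICES is unset, all physical GPUs are assumed visible.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return physical_gpus
    return len([d for d in value.split(",") if d.strip()])


def batch_fits_devices(batch_size, device_count):
    """Mirror the precondition in the error: batch size >= device count."""
    return batch_size >= device_count


# predict() calls np.expand_dims(obs, axis=0), so the batch holds 1 sample.
batch_size = 1

both_gpus = visible_gpu_count(env={})                            # 2 devices
single_gpu = visible_gpu_count(env={"CUDA_VISIBLE_DEVICES": "0"})  # 1 device

print(batch_fits_devices(batch_size, both_gpus))   # False: 1 sample, 2 GPUs
print(batch_fits_devices(batch_size, single_gpu))  # True: 1 sample, 1 GPU
```

If this reading is right, one possible workaround (untested against this example) is to expose only one GPU to the process, for instance by launching with `CUDA_VISIBLE_DEVICES=0 python train.py`.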