Cannot get psKey when training in fault-tolerant mode
Created by: Yancey1989
I started the master, pserver, and trainer in a Docker container, but the trainer cannot get the pserver address from etcd. The error log is below:
['/work/data/uci_housing_train-*-of-*']
ERRO[0000] Get task failed, sleep 3 seconds and continue, no more available task
I0719 08:12:24.602708 824 Util.cpp:166] commandline:
I0719 08:12:24.607319 824 GradientMachine.cpp:85] Initing parameters..
I0719 08:12:24.607365 824 GradientMachine.cpp:92] Init parameters done.
INFO[0000] Connected to etcd: localhost:2379
I0719 08:12:24.962303 824 NewRemoteParameterUpdater.cpp:68] paddle_begin_init_params start
I0719 08:12:24.962774 824 NewRemoteParameterUpdater.cpp:71] old param config: name: "___fc_layer_0__.w0"
size: 13
initial_mean: 0
initial_std: 0.27735009811261457
dims: 13
dims: 1
initial_strategy: 0
initial_smart: true
para_id: 0
INFO[0000] Get psKey= /ps/0 error, context canceled
ERRO[0003] Get task failed, sleep 3 seconds and continue, no more available task
ERRO[0006] Get task failed, sleep 3 seconds and continue, no more available task
ERRO[0009] Get task failed, sleep 3 seconds and continue, no more available task
INFO[0010] Get psKey= /ps/0 error, context canceled
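
The log above mixes two different failures from the Go client: the repeated "Get task failed" errors and the "Get psKey= /ps/0 error" messages from the etcd lookup. As a rough triage aid, the sketch below counts each kind separately so it is easier to see which one recurs and how often. The function name `summarize` and the counter labels are made up for illustration, and the regular expression only assumes the `LEVEL[seconds] message` shape visible in the log lines above.

```python
import re
from collections import Counter

# Matches the Go client's log lines, e.g. "ERRO[0003] Get task failed, ..."
# (format inferred from the log above, not from any Paddle documentation).
GO_LOG = re.compile(r"^(INFO|ERRO)\[(\d+)\]\s+(.*)$")


def summarize(log_text):
    """Count psKey lookup failures and task-fetch failures in a trainer log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = GO_LOG.match(line.strip())
        if not match:
            continue  # skip the C++ (glog-style) lines and the parameter config dump
        message = match.group(3)
        if message.startswith("Get psKey=") and "error" in message:
            counts["pskey_lookup_failed"] += 1  # hypothetical label
        elif message.startswith("Get task failed"):
            counts["get_task_failed"] += 1  # hypothetical label
    return dict(counts)


# A few lines copied from the log in this report.
SAMPLE = """\
ERRO[0000] Get task failed, sleep 3 seconds and continue, no more available task
INFO[0000] Connected to etcd: localhost:2379
INFO[0000] Get psKey= /ps/0 error, context canceled
ERRO[0003] Get task failed, sleep 3 seconds and continue, no more available task
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Keeping the two counts apart matters because the task errors are reported every 3 seconds while the psKey error recurs on a longer interval, so they may well have different causes.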