Cluster training fails with `Check failed: numPorts > 0 (0 vs. 0)` after enabling sparse update
Created by: SearchVera
Without sparse update, training runs through to completion. To enable sparse update I changed two things:
- Set `sparse_update=True` in the network configuration:
  ```python
  import paddle.v2 as paddle

  hd1 = paddle.layer.fc(
      input=feature,
      size=hidden_layer_size,
      act=paddle.activation.Tanh(),
      layer_attr=paddle.attr.Extra(drop_rate=0.5),
      # Before (dense update):
      # param_attr=paddle.attr.Param(initial_std=1.0 / hidden_layer_size))
      # After: sparse_update=True added to the parameter attribute
      param_attr=paddle.attr.Param(initial_std=1.0 / hidden_layer_size, sparse_update=True))
  ```
- Set `use_remote_sparse` to 1 in the submit command:
  ```shell
  paddle cluster_train \
    --config ${model_config_file} \
    --use_remote_sparse 1 \
  ```
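For context on what `sparse_update=True` changes: for a fully connected layer y = xW, the weight gradient is the outer product of the input x and the upstream gradient dy, so every row of W whose input feature is zero receives a zero gradient. The sketch below assumes `feature` is a sparse input (the issue does not say what it is) and uses hypothetical pure-Python helpers, not PaddlePaddle code, to show which rows a sparse update would actually need to send.

```python
# Hypothetical illustration, not PaddlePaddle code.
# For y = x @ W, dW = outer(x, dy), so row i of dW is zero whenever x[i] == 0.

def fc_weight_grad(x, dy):
    """Dense gradient of W (shape len(x) x len(dy)) for y = x @ W."""
    return [[xi * g for g in dy] for xi in x]

def touched_rows(x):
    """Indices of W's rows that can receive a nonzero gradient: the nonzero inputs."""
    return [i for i, xi in enumerate(x) if xi != 0]

def sparse_rows_grad(x, dy):
    """Only the gradient rows a sparse update would need to transmit."""
    return {i: [x[i] * g for g in dy] for i in touched_rows(x)}

x = [0.0, 1.0, 0.0, 0.0, 2.0]   # sparse input, e.g. a bag-of-words feature (assumed)
dy = [0.5, -1.0]                # upstream gradient for a 2-unit layer
print(touched_rows(x))          # [1, 4]
print(sparse_rows_grad(x, dy))  # {1: [0.5, -1.0], 4: [1.0, -2.0]}
```

With a high-dimensional sparse input, only a handful of the rows are nonzero, which is why a sparse update can be much cheaper than shipping the full dense gradient.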
The optimizer is AdaGrad:
```python
adagrad_optimizer = paddle.optimizer.AdaGrad(
    learning_rate=1e-3,
    regularization=paddle.optimizer.L2Regularization(rate=1e-3))
```
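To illustrate how this optimizer setting combines with a sparse update, here is a minimal sketch of the textbook AdaGrad rule applied only to the rows that received a gradient, with the L2 term folded into the gradient of those rows. It is an assumption-labeled illustration: it is not necessarily how Paddle's parameter server implements sparse AdaGrad or applies L2 to untouched rows, and the helper name and `eps` value are hypothetical. The `lr` and `l2` defaults mirror the `learning_rate=1e-3` and `rate=1e-3` used in the configuration.

```python
import math

def adagrad_sparse_step(W, accum, row_grads, lr=1e-3, l2=1e-3, eps=1e-6):
    """Apply AdaGrad only to rows present in row_grads (dict: row index -> grad list).

    Hypothetical sketch: L2 is added to the gradient of touched rows only
    (g + l2 * w); rows absent from row_grads are left completely unchanged.
    eps is an assumed numerical stabilizer.
    """
    for i, grad in row_grads.items():
        for j, g in enumerate(grad):
            g = g + l2 * W[i][j]                 # L2-regularized gradient
            accum[i][j] += g * g                 # per-coordinate sum of squared grads
            W[i][j] -= lr * g / (math.sqrt(accum[i][j]) + eps)
    return W, accum

W = [[1.0, 1.0] for _ in range(3)]
accum = [[0.0, 0.0] for _ in range(3)]
adagrad_sparse_step(W, accum, {1: [0.5, -1.0]})
print(W[0], W[2])  # [1.0, 1.0] [1.0, 1.0]  (untouched rows unchanged)
print(W[1])        # row 1 moved by roughly lr in the direction opposite its gradient
```

On the first step the accumulator equals the squared gradient, so each touched coordinate moves by about `lr` regardless of the gradient's magnitude; that is a property of AdaGrad itself.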