Paddle on Kubernetes tutorial questions
Created by: helinwang
- trainer_id is referenced in many places, but what does trainer_id actually do in Paddle? It might be worthwhile to briefly explain.
- The descriptions of the Paddle environment variables are hard to understand:
CONF_PADDLE_PORTS_NUM_SPARSE represents the sparse updated port number, --ports_num_for_sparse parameter.
CONF_PADDLE_GRADIENT_NUM represents the training node number, --num_gradient_servers parameter.
Maybe link --ports_num_for_sparse and --num_gradient_servers to the Paddle reference documentation?
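To make the env-variable-to-flag relationship concrete, here is a minimal sketch of how a container entrypoint might translate the CONF_ variables into Paddle command-line flags. The first two mappings follow the descriptions quoted above. The CONF_PADDLE_PORT to --port mapping, the helper name build_paddle_args, and the assumption that trainer_id must be a distinct integer in [0, num_gradient_servers) are illustrative assumptions, not the tutorial's actual script.

```python
from typing import Dict, List, Mapping

# Hypothetical mapping from CONF_ environment variables to paddle flags.
# The first two entries follow the tutorial's descriptions;
# CONF_PADDLE_PORT -> --port is an ASSUMPTION for illustration.
ENV_TO_FLAG: Dict[str, str] = {
    "CONF_PADDLE_PORTS_NUM_SPARSE": "ports_num_for_sparse",
    "CONF_PADDLE_GRADIENT_NUM": "num_gradient_servers",
    "CONF_PADDLE_PORT": "port",
}


def build_paddle_args(env: Mapping[str, str], trainer_id: int) -> List[str]:
    """Build flag strings for one trainer from its environment (sketch only)."""
    args = []
    for env_name, flag in ENV_TO_FLAG.items():
        if env_name in env:
            args.append(f"--{flag}={env[env_name]}")
    # ASSUMPTION: every trainer needs a distinct id in [0, num_gradient_servers).
    num_trainers = int(env.get("CONF_PADDLE_GRADIENT_NUM", "1"))
    if not 0 <= trainer_id < num_trainers:
        raise ValueError(f"trainer_id {trainer_id} out of range [0, {num_trainers})")
    args.append(f"--trainer_id={trainer_id}")
    return args


if __name__ == "__main__":
    # Example values; a real entrypoint might pass os.environ instead.
    example_env = {
        "CONF_PADDLE_PORTS_NUM_SPARSE": "1",
        "CONF_PADDLE_GRADIENT_NUM": "3",
        "CONF_PADDLE_PORT": "7164",
    }
    print(build_paddle_args(example_env, trainer_id=0))
```

Keeping the mapping in one table makes it easy to document each variable next to the flag it feeds, which is the kind of cross-reference the tutorial could link to.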
- 30001 is a magic number in the Kubernetes job YAML:
    - name: jobport
      hostPort: 30001
      containerPort: 30001
Where does it come from? Does Paddle have 30001 hardcoded somewhere?
- 7164 is the port number for the parameter server, but 7164 is not exposed by the container spec. It only appears as an environment variable:
    - name: CONF_PADDLE_PORT
      value: "7164"
Where does 7164 come from (e.g., is it hardcoded in Paddle?), and how is 7164 exposed from inside the container? I have not seen a ports entry for 7164 in the container spec.
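One way to make the last point checkable is to compare the ports the parameter server would listen on with the ports declared in the pod spec. The sketch below ASSUMES the parameter server listens on --ports_num consecutive ports starting at --port for dense updates, followed by --ports_num_for_sparse more ports for sparse updates. That layout, the helper names, the container name "trainer", and the plain-dict pod spec are assumptions to verify against the Paddle reference, not something the tutorial states.

```python
from typing import Any, Dict, List, Set


def pserver_ports(port: int, ports_num: int, ports_num_for_sparse: int) -> List[int]:
    """Ports a parameter server would use, ASSUMING consecutive allocation:
    [port, port + ports_num) for dense, then ports_num_for_sparse more for sparse."""
    return list(range(port, port + ports_num + ports_num_for_sparse))


def declared_container_ports(pod_spec: Dict[str, Any]) -> Set[int]:
    """Collect every containerPort declared in a pod spec given as a plain dict."""
    ports = set()
    for container in pod_spec.get("containers", []):
        for entry in container.get("ports", []):
            ports.add(int(entry["containerPort"]))
    return ports


def undeclared_ports(pod_spec: Dict[str, Any], needed: List[int]) -> List[int]:
    """Return the needed ports that the pod spec does not list."""
    declared = declared_container_ports(pod_spec)
    return [p for p in needed if p not in declared]


if __name__ == "__main__":
    # Mirrors the YAML fragments above: 30001 is declared, 7164 is only an env var.
    # The container name "trainer" is hypothetical.
    spec = {
        "containers": [{
            "name": "trainer",
            "env": [{"name": "CONF_PADDLE_PORT", "value": "7164"}],
            "ports": [{"name": "jobport", "hostPort": 30001, "containerPort": 30001}],
        }]
    }
    needed = pserver_ports(7164, ports_num=1, ports_num_for_sparse=1)
    print(undeclared_ports(spec, needed))  # [7164, 7165]
```

Note that the Kubernetes API reference describes the container ports list as informational: omitting a port does not prevent a process listening on 0.0.0.0 from being reachable. A check like this flags a documentation gap rather than necessarily a broken deployment.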