To use AWS, we need to sign up for an AWS account on Amazon's website.
An AWS account allows us to log in to the AWS Console web interface to
create IAM users and user groups. Usually, we create a user group with
the privileges required to run PaddlePaddle, then create users for
those who are going to run PaddlePaddle and add these users to the
group. IAM users can identify themselves using passwords and tokens:
passwords allow users to log in to the AWS Console, and tokens
make it easy for users to submit and inspect jobs from the command
line.
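
The group-and-users setup described above can also be scripted instead of clicked through in the Console. The following is a minimal sketch, not the project's own tooling: it only builds the AWS CLI invocations and prints them for review. The group name `paddle-users` and the user name `alice` are hypothetical, and it assumes that "tokens" refers to IAM access keys. Attaching the privileges needed by PaddlePaddle to the group is left out because the required policies are not listed in this paragraph.

```python
import shlex


def iam_setup_commands(group, users):
    """Return AWS CLI argument lists that create one group and its users.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = [["aws", "iam", "create-group", "--group-name", group]]
    for user in users:
        # Create the IAM user.
        commands.append(["aws", "iam", "create-user", "--user-name", user])
        # Put the user into the group so it inherits the group's privileges.
        commands.append(
            ["aws", "iam", "add-user-to-group",
             "--group-name", group, "--user-name", user]
        )
        # Assumption: the command-line "token" is an IAM access key.
        commands.append(["aws", "iam", "create-access-key", "--user-name", user])
    return commands


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    for argv in iam_setup_commands("paddle-users", ["alice"]):
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try without AWS credentials; each printed line can be pasted into a shell once it has been checked.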
...
@@ -360,7 +360,7 @@ In one time of distributed training, user will confirm the PaddlePaddle node num
#### Create PaddlePaddle Node
After the Kubernetes master receives the request, it parses the YAML file and creates several pods (as many as the number of PaddlePaddle nodes), then schedules these pods onto the cluster's nodes. Each pod represents one PaddlePaddle node. Once a pod is successfully scheduled onto a physical or virtual machine, Kubernetes starts the container in the pod, and this container uses the environment variables from the YAML file to start the `paddle pserver` and `paddle trainer` processes.
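
To make the relation between the node number, the pods, and the environment variables concrete, here is a rough sketch of a manifest generator. It assumes the request is a Kubernetes Job whose `parallelism` equals the PaddlePaddle node number. The image name and the environment variable names are hypothetical placeholders, not the values used by this tutorial. The output is JSON, which `kubectl create -f` accepts alongside YAML.

```python
import json


def paddle_job_manifest(node_count, image, env):
    """Build a Job manifest that asks Kubernetes for `node_count` pods.

    Every pod runs the same container and receives the same environment
    variables, which its start-up script can read.
    """
    env_entries = [
        {"name": key, "value": str(value)} for key, value in sorted(env.items())
    ]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "paddle-cluster-job"},  # hypothetical name
        "spec": {
            # One pod per PaddlePaddle node.
            "parallelism": node_count,
            "completions": node_count,
            "template": {
                "spec": {
                    "containers": [
                        {"name": "paddle-node", "image": image, "env": env_entries}
                    ],
                    "restartPolicy": "Never",
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and variable names, for illustration only.
    manifest = paddle_job_manifest(
        3, "example/paddle-node:latest", {"EXAMPLE_NODE_COUNT": 3}
    )
    print(json.dumps(manifest, indent=2))
```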
#### Start up Training
...
@@ -661,6 +661,6 @@ Sometimes we might need to create or manage the cluster on AWS manually with lim
### Some Presumptions
* Instances run on CoreOS, using the official AMI.
* Kubernetes nodes use instance storage; no EBS volumes are mounted. Etcd runs on an additional node.
* For networking, we currently use Flannel; we plan to switch to Calico later.
* When you create a service with Type=LoadBalancer, Kubernetes will create an ELB and a security group for the ELB.
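
As an illustration of the last point, the sketch below builds a minimal Service manifest of type `LoadBalancer`. The service name, selector labels, and port numbers are hypothetical. Creating such a service on an AWS-backed cluster is what would trigger the ELB and security group creation mentioned above.

```python
import json


def load_balancer_service(name, selector, port, target_port):
    """Build a Service manifest that asks Kubernetes for an external load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # On AWS, this service type is what leads to an ELB being created.
            "type": "LoadBalancer",
            # Pods carrying these labels receive the traffic.
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    service = load_balancer_service("example-service", {"app": "example"}, 80, 8080)
    print(json.dumps(service, indent=2))
```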