Commit db0df8f8, authored by helinwang, committed by GitHub

Merge pull request #1235 from putcn/k8s_aws

added region check step, route53 config.
# Distributed PaddlePaddle Training on AWS with Kubernetes
This tutorial shows, step by step, how to run distributed PaddlePaddle training on an AWS cluster with Kubernetes. Let's start with the core concepts.
...@@ -43,6 +44,12 @@ We rank each pod by sorting them by their ips. The rank of each pod could be the
## PaddlePaddle on AWS with Kubernetes
### Choose AWS Service Region
This tutorial requires several AWS services to work in the same region. Before creating anything in AWS, please check the following link:
https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
Choose a region in which all of the following services are available: EC2, EFS, CloudFormation, KMS, VPC, S3.
In this tutorial, we use "Oregon (us-west-2)" as an example.
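The region check can be done by hand, but a small helper makes the comparison explicit. The sketch below is only an illustration: the function name `missing_services`, the lowercase service labels, and the `AVAILABLE` list (assumed to be copied manually from the regional services page for the chosen region) are all hypothetical and not part of the original tutorial.

```shell
# Print every required service that is absent from the available list.
# $1: space-separated required services
# $2: space-separated services offered in the candidate region
missing_services() {
  for svc in $1; do
    case " $2 " in
      *" $svc "*) ;;                  # present, nothing to report
      *) printf '%s\n' "$svc" ;;      # absent, report it
    esac
  done
}

# Services this tutorial relies on (labels are illustrative).
REQUIRED="ec2 efs cloudformation kms vpc s3"
# Hypothetical list, copied by hand from the AWS regional services page.
AVAILABLE="ec2 efs cloudformation kms vpc s3 route53"

missing_services "$REQUIRED" "$AVAILABLE"
```

If the function prints nothing, every required service is in the list and the region is a reasonable candidate.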
### Create AWS Account and IAM Account
Under each AWS account, we can create multiple [IAM](http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) users. This allows us to grant some privileges to each IAM user and to create/operate AWS clusters as an IAM user.
...@@ -73,7 +80,8 @@ Please be aware that this tutorial needs the following privileges for the user i
#### kube-aws
[kube-aws](https://github.com/coreos/kube-aws) is a CLI tool to automate cluster deployment to AWS.
##### Verify kube-aws integrity
Note: if you are using a non-official release (e.g., an RC release) of kube-aws, you can skip this step.
Import the CoreOS Application Signing Public Key:
```
...@@ -98,7 +106,7 @@ PLATFORM=darwin-amd64
gpg2 --verify kube-aws-${PLATFORM}.tar.gz.sig kube-aws-${PLATFORM}.tar.gz
```
##### Install kube-aws
Extract the binary:
```
...@@ -241,7 +249,7 @@ Paste into following inline policies:
]
}
```
`Version`: Its value has to be exactly "2012-10-17".
`AWS_ACCOUNT_ID`: You can get it with the following command:
```
...@@ -249,6 +257,7 @@ aws sts get-caller-identity --output text --query Account
```
`MY_CLUSTER_NAME`: Pick a cluster name that you like; you will use it later as well.
Please note that the stack name must satisfy the regular expression pattern `[a-zA-Z][-a-zA-Z0-9]*`: it must start with a letter and may contain only letters, digits, and hyphens. In particular, a "_" in the stack name will make kube-aws throw an error in later steps.
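To catch an invalid name before kube-aws does, the pattern can be checked locally. This is a minimal sketch, assuming the intended pattern is `[a-zA-Z][-a-zA-Z0-9]*` anchored to the whole name; the function name `is_valid_stack_name` and the sample names are hypothetical.

```shell
# Succeed (exit status 0) only if the whole name matches
# [a-zA-Z][-a-zA-Z0-9]*, i.e. a leading letter followed by
# letters, digits, or hyphens.
is_valid_stack_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-zA-Z][-a-zA-Z0-9]*$'
}

# Try a few illustrative candidates.
for name in paddle-cluster paddle_cluster 1paddle; do
  if is_valid_stack_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```

The check is purely local and does not contact AWS, so it can run before any resources are created.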
#### External DNS name
...@@ -256,7 +265,7 @@ When the cluster is created, the controller will expose the TLS-secured API on a
The DNS name should have a CNAME that points to the cluster DNS name, or an A record that points to the cluster IP address.
We will need the DNS name later in this tutorial. If you don't already own one, you can choose any DNS name (e.g., `paddle`) and modify `/etc/hosts` to associate the cluster IP with that DNS name on your local machine, then add a name service (Route53) in AWS to associate the IP with `paddle` for the cluster. We will find the cluster IP in later steps.
#### S3 bucket
...@@ -364,6 +373,26 @@ paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-2.elb.amazonaws.com. 59 IN A 5
In the above output, both IPs, `54.241.164.52` and `54.67.102.112`, will work.
*If you own a DNS name*, set the A record to any of the above IPs. Then you can skip to the step "Access the cluster".
*If you do not own a DNS name*:
##### Update local DNS association
Edit `/etc/hosts` to associate one of the above IPs with the DNS name.
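The edit can also be scripted. The following is a sketch only: the helper `add_hosts_entry` is hypothetical, and it writes to a scratch file here so that it can be tried safely; pointing it at `/etc/hosts` requires root privileges.

```shell
# Append "<ip> <name>" to a hosts file unless the name is already mapped.
# $1: hosts file, $2: IP address, $3: DNS name
add_hosts_entry() {
  hosts_file=$1
  ip=$2
  name=$3
  if grep -Eq "[[:space:]]$name([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
    echo "$name is already present in $hosts_file" >&2
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Demo against a scratch file; use /etc/hosts (as root) for the real change.
SCRATCH_HOSTS=$(mktemp)
add_hosts_entry "$SCRATCH_HOSTS" 54.241.164.52 paddle
cat "$SCRATCH_HOSTS"
rm -f "$SCRATCH_HOSTS"
```

Skipping names that are already mapped keeps the file free of duplicate entries when the step is run more than once.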
##### Add Route53 private name service in VPC
- Open [Route53 Console](https://console.aws.amazon.com/route53/home)
- Create a hosted zone with the following config
  - Domain name: "paddle"
  - Type: "Private hosted zone for Amazon VPC"
  - VPC ID: `<Your VPC ID>`
- Add an A record
  - Click on the zone "paddle" just created
  - Click the button "Create record set"
  - Name: leave blank
  - Type: "A"
  - Value: `<kube-controller ec2 private ip>`
- Verify the name service
  - Connect to any instance created by kube-aws via ssh
  - Run the command `host paddle` and check that the IP returned is the private IP of kube-controller
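The same A record can, in principle, be created from the command line instead of the console. The sketch below only builds the change batch document; the helper name `a_record_change_batch`, the TTL of 300 seconds, and the private IP `10.0.0.50` are assumptions for illustration, and the `aws route53` invocations are left as comments because they need a real VPC ID and hosted zone ID.

```shell
# Print a Route53 change batch that creates an A record.
# $1: record name, $2: IP address the record should point to
a_record_change_batch() {
  cat <<EOF
{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "$1",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{"Value": "$2"}]
    }
  }]
}
EOF
}

# 10.0.0.50 stands in for the kube-controller private IP (hypothetical).
a_record_change_batch paddle 10.0.0.50

# Possible follow-up, not executed here (fill in the placeholders):
#   aws route53 create-hosted-zone --name paddle \
#     --vpc VPCRegion=us-west-2,VPCId=<Your VPC ID> \
#     --caller-reference "paddle-$(date +%s)"
#   a_record_change_batch paddle <kube-controller ec2 private ip> > change-batch.json
#   aws route53 change-resource-record-sets --hosted-zone-id <zone id> \
#     --change-batch file://change-batch.json
```

Whichever route is used, the `host paddle` check from an instance inside the VPC remains the way to confirm that the record resolves.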
#### Access the cluster
......