From 07787f72ba69dd0bbca4ee01f84c59fb34dc02c9 Mon Sep 17 00:00:00 2001 From: Helin Wang Date: Mon, 16 Jan 2017 09:19:05 -0800 Subject: [PATCH] clarify and fix problems in paddle on aws k8s (create cluster part) --- doc/howto/usage/k8s/k8s_aws_en.md | 138 ++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 45 deletions(-) diff --git a/doc/howto/usage/k8s/k8s_aws_en.md b/doc/howto/usage/k8s/k8s_aws_en.md index c776ba9eb9..bd9eee7296 100644 --- a/doc/howto/usage/k8s/k8s_aws_en.md +++ b/doc/howto/usage/k8s/k8s_aws_en.md @@ -2,18 +2,18 @@ ## Create AWS Account and IAM Account -AWS account allow us to manage AWS from Web Console. Amazon AMI enable us to manage AWS from command line interface. +AWS account allow us to manage AWS from Web Console. Amazon IAM enable us to manage AWS from command line interface. -We need to create an AMI user with sufficient privilege to create kubernetes cluster on AWS. +We need to create an IAM user with sufficient privilege to create kubernetes cluster on AWS. To sign up an AWS account, please follow [this guide](http://docs.aws.amazon.com/lambda/latest/dg/setting-up.html). -To create users and user groups under an AWS account, please +To create IAM users and user groups under an AWS account, please follow [this guide](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html). -Please be aware that this tutorial needs the following privileges for the user in AMI: +Please be aware that this tutorial needs the following privileges for the user in IAM: - AmazonEC2FullAccess - AmazonS3FullAccess @@ -27,14 +27,6 @@ Please be aware that this tutorial needs the following privileges for the user i - AWSKeyManagementServicePowerUser -By the time we write this tutorial, we noticed that Chinese AWS users -might suffer from authentication problems when running this tutorial. -Our solution is that we create a VM instance with the default Amazon -AMI and in the same zone as our cluster runs, so we can SSH to this VM -instance as a tunneling server and control our cluster and jobs from -it. - - ## PaddlePaddle on AWS Here we will show you step by step on how to run PaddlePaddle training on AWS cluster. @@ -59,7 +51,7 @@ gpg2 --fingerprint FC8A365E ``` The correct key fingerprint is `18AD 5014 C99E F7E3 BA5F 6CE9 50BD D3E0 FC8A 365E` -Go to the [releases](https://github.com/coreos/kube-aws/releases) and download the latest release tarball and detached signature (.sig) for your architecture. +Go to the [releases](https://github.com/coreos/kube-aws/releases) and download release tarball (this tutorial is using v0.9.1) and detached signature (.sig) for your architecture. Validate the tarball's GPG signature: @@ -88,14 +80,22 @@ mv ${PLATFORM}/kube-aws /usr/local/bin [kubectl](https://kubernetes.io/docs/user-guide/kubectl-overview/) is a command line interface for running commands against Kubernetes clusters. -Go to the [releases](https://github.com/kubernetes/kubernetes/releases) and download the latest release tarball. - -Extract the tarball and then concate the kubernetes binaries directory into PATH: +Download `kubectl` from the Kubernetes release artifact site with the `curl` tool. ``` -export PATH=/platforms/linux/amd64:$PATH # The exact path depend on your platform +# OS X +curl -O https://storage.googleapis.com/kubernetes-release/release/"$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)"/bin/darwin/amd64/kubectl + +# Linux +curl -O https://storage.googleapis.com/kubernetes-release/release/"$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)"/bin/linux/amd64/kubectl ``` +Make the kubectl binary executable and move it to your PATH (e.g. `/usr/local/bin`): + +``` +chmod +x ./kubectl +sudo mv ./kubectl /usr/local/bin/kubectl +``` ### Configure AWS Credentials @@ -109,17 +109,18 @@ aws configure ``` -Fill in the required fields (You can get your AWS aceess key id and AWS secrete access key by following [this](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) instruction): +Fill in the required fields: ``` AWS Access Key ID: YOUR_ACCESS_KEY_ID AWS Secrete Access Key: YOUR_SECRETE_ACCESS_KEY -Default region name: us-west-2 +Default region name: us-west-1 Default output format: json - ``` +`YOUR_ACCESS_KEY_ID`, and `YOUR_SECRETE_ACCESS_KEY` is the IAM key and secret from [Create AWS Account and IAM Account](#create-aws-account-and-iam-account) + Verify that your credentials work by describing any instances you may already have running on your account: ``` @@ -134,7 +135,9 @@ The keypair that will authenticate SSH access to your EC2 instances. The public Follow [EC2 Keypair docs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) to create a EC2 key pair -After creating a key pair, you will use the name you gave the keys to configure the cluster. Key pairs are only available to EC2 instances in the same region. +After creating a key pair, you will use the key pair name to configure the cluster. + +Key pairs are only available to EC2 instances in the same region. We are using us-west-1 in our tutorial, so make sure to creat key pairs in that region (N. California). #### KMS key @@ -143,12 +146,12 @@ Amazon KMS keys are used to encrypt and decrypt cluster TLS assets. If you alrea You can create a KMS key in the AWS console, or with the aws command line tool: ``` -$ aws kms --region=us-west-1 create-key --description="kube-aws assets" +aws kms --region=us-west-1 create-key --description="kube-aws assets" { "KeyMetadata": { "CreationDate": 1458235139.724, "KeyState": "Enabled", - "Arn": "arn:aws:kms:us-west-1:xxxxxxxxx:key/xxxxxxxxxxxxxxxxxxx", + "Arn": "arn:aws:kms:us-west-1:aaaaaaaaaaaaa:key/xxxxxxxxxxxxxxxxxxx", "AWSAccountId": "xxxxxxxxxxxxx", "Enabled": true, "KeyUsage": "ENCRYPT_DECRYPT", @@ -158,11 +161,11 @@ $ aws kms --region=us-west-1 create-key --description="kube-aws assets" } ``` -You will use the `KeyMetadata.Arn` string to identify your KMS key in the init step. +We will need to use the value of `Arn` later. And then you need to add several inline policies in your user permission. -Go to AMI user page, click on `Add inline policy` button, and then select `Custom Policy` +Go to IAM user page, click on `Add inline policy` button, and then select `Custom Policy` paste into following inline policies: @@ -178,7 +181,7 @@ paste into following inline policies: "kms:Encrypt" ], "Resource": [ - "arn:aws:kms:*:xxxxxxxxx:key/*" + "arn:aws:kms:*:AWS_ACCOUNT_ID:key/*" ] }, { @@ -194,29 +197,37 @@ paste into following inline policies: "cloudformation:DescribeStackEvents" ], "Resource": [ - "arn:aws:cloudformation:us-west-1:xxxxxxxxx:stack/YOUR_CLUSTER_NAME/*" + "arn:aws:cloudformation:us-west-1:AWS_ACCOUNT_ID:stack/MY_CLUSTER_NAME/*" ] } ] } ``` +`AWS_ACCOUNT_ID`: You can get it from following command line: + +``` +aws sts get-caller-identity --output text --query Account +``` + +`MY_CLUSTER_NAME`: Pick a MY_CLUSTER_NAME that you like, you will use it later as well. #### External DNS name -When the cluster is created, the controller will expose the TLS-secured API on a public IP address. You will need to create an A record for the external DNS hostname you want to point to this IP address. You can find the API external IP address after the cluster is created by invoking kube-aws status. +When the cluster is created, the controller will expose the TLS-secured API on a DNS name. + +The A record of that DNS name needs to be point to the cluster ip address. + +We will need to use DNS name later in tutorial. If you don't already own one, you can choose any DNS name (e.g., `paddle`) and modify `/etc/hosts` to associate cluster ip with that DNS name. #### S3 bucket You need to create an S3 bucket before startup the Kubernetes cluster. -command (need to have a global unique name): +There are some bug in aws cli in creating S3 bucket, so let's use [web console](https://console.aws.amazon.com/s3/home?region=us-west-1). -``` -paddle aws s3api --region=us-west-1 create-bucket --bucket bucket-name -``` +Click on `Create Bucket`, fill in a unique BUCKET_NAME, and make sure region is us-west-1 (Northern California). -If you get an error message, try a different bucket name. The bucket name needs to be globally unique. #### Initialize an asset directory @@ -230,33 +241,44 @@ $ cd my-cluster Initialize the cluster CloudFormation stack with the KMS Arn, key pair name, and DNS name from the previous step: ``` -$ kube-aws init \ ---cluster-name=my-cluster-name \ ---external-dns-name=my-cluster-endpoint \ +kube-aws init \ +--cluster-name=MY_CLUSTER_NAME \ +--external-dns-name=MY_EXTERNAL_DNS_NAME \ --region=us-west-1 \ ---availability-zone=us-west-1c \ ---key-name=key-pair-name \ +--availability-zone=us-west-1a \ +--key-name=KEY_PAIR_NAME \ --kms-key-arn="arn:aws:kms:us-west-1:xxxxxxxxxx:key/xxxxxxxxxxxxxxxxxxx" ``` -Here `us-west-1c` is used for parameter `--availability-zone`, but supported availability zone varies among AWS accounts. +`MY_CLUSTER_NAME`: the one you picked in [KMS key](#kms-key) + +`MY_EXTERNAL_DNS_NAME`: see [External DNS name](#external-dns-name) -Please check if `us-west-1c` is supported by `aws ec2 --region us-west-1 describe-availability-zones`, if not switch to other supported availability zone. (e.g., `us-west-1a`, or `us-west-1b`) +`KEY_PAIR_NAME`: see [EC2 key pair](#ec2-key-pair) + +`--kms-key-arn`: the "Arn" in [KMS key](#kms-key) + +Here `us-west-1a` is used for parameter `--availability-zone`, but supported availability zone varies among AWS accounts. + +Please check if `us-west-1a` is supported by `aws ec2 --region us-west-1 describe-availability-zones`, if not switch to other supported availability zone. (e.g., `us-west-1a`, or `us-west-1b`) + +Note: please don't use `us-west-1c`. Subnets can currently only be created in the following availability zones: us-west-1b, us-west-1a. There will now be a cluster.yaml file in the asset directory. This is the main configuration file for your cluster. + #### Render contents of the asset directory In the simplest case, you can have kube-aws generate both your TLS identities and certificate authority for you. ``` -$ kube-aws render credentials --generate-ca +kube-aws render credentials --generate-ca ``` The next command generates the default set of cluster assets in your asset directory. ``` -sh $ kube-aws render stack +kube-aws render stack ``` Here's what the directory structure looks like: @@ -292,15 +314,41 @@ These assets (templates and credentials) are used to create, update and interact #### Create the instances defined in the CloudFormation template -Now for the exciting part, creating your cluster (choose any ``): +Now let's create your cluster (choose any PREFIX for the command below): ``` -$ kube-aws up --s3-uri s3:/// +kube-aws up --s3-uri s3://BUCKET_NAME/PREFIX ``` +`BUCKET_NAME`: the bucket name that you used in [S3 bucket](#s3-bucket) + + #### Configure DNS -You can invoke `kube-aws status` to get the cluster API endpoint after cluster creation, if necessary. This command can take a while. And use command `dig` to check the load balancer hostname to get the ip address, use this ip to setup an A record for your external dns name. +You can invoke `kube-aws status` to get the cluster API endpoint after cluster creation. + +``` +$ kube-aws status +Cluster Name: paddle-cluster +Controller DNS Name: paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-1.elb.amazonaws.com +``` + +Use command `dig` to check the load balancer hostname to get the ip address. + +``` +$ dig paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-1.elb.amazonaws.com + +;; QUESTION SECTION: +;paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-1.elb.amazonaws.com. IN A + +;; ANSWER SECTION: +paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-1.elb.amazonaws.com. 59 IN A 54.241.164.52 +paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-1.elb.amazonaws.com. 59 IN A 54.67.102.112 +``` + +In the above output, both ip `54.241.164.52`, `54.67.102.112` will work. + +If you own a DNS name, set the A record to any of the above ip. Otherwise you can edit `/etc/hosts` to associate ip with the DNS name. #### Access the cluster -- GitLab