From 1e7c69fda42e0d5f8212ba815e9dba0bdf11d3c1 Mon Sep 17 00:00:00 2001
From: Xi Chen
Date: Mon, 16 Apr 2018 16:48:20 -0700
Subject: [PATCH] doc update

---
 tools/aws_benchmarking/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/aws_benchmarking/README.md b/tools/aws_benchmarking/README.md
index dfa2a5f478..837fcbb851 100644
--- a/tools/aws_benchmarking/README.md
+++ b/tools/aws_benchmarking/README.md
@@ -54,6 +54,7 @@ Training nodes will run your `ENTRYPOINT` script with the following environment
  - `TRAINERS`: trainer count
  - `SERVER_ENDPOINT`: current server end point if the node role is a pserver
  - `TRAINER_INDEX`: an integer to identify the index of current trainer if the node role is a trainer.
+ - `PADDLE_INIT_TRAINER_ID`: same as above
 
 Now we have a working distributed training script which takes advantage of node environment variables and docker file to generate the training image. Run the following command:
 
@@ -81,8 +82,7 @@ putcn/paddle_aws_client \
 --action create \
 --key_name \
 --security_group_id \
---pserver_image_id \
---trainer_image_id \
+--docker_image myreponame/paddle_benchmark \
 --pserver_count 2 \
 --trainer_count 2
 ```
@@ -146,7 +146,7 @@ When the training is finished, pservers and trainers will be terminated. All the
 
 Master exposes 4 major services:
 
  - GET `/status`: return master log
- - GET `/list_logs`: return list of log file names
+ - GET `/logs`: return list of log file names
  - GET `/log/`: return a particular log by log file name
  - POST `/cleanup`: teardown the whole setup
-- 
GitLab