From 2d11588a46d48d0b9630bf2362685dd204fd1ed1 Mon Sep 17 00:00:00 2001
From: "A. Unique TensorFlower"
Date: Tue, 5 Apr 2016 08:17:51 -0800
Subject: [PATCH] Edits distributed tf tutorial and adds the tutorial to the nav.

Change: 119056020
---
 tensorflow/g3doc/how_tos/distributed/index.md | 94 +++++++++++++------
 tensorflow/g3doc/how_tos/leftnav_files | 1 +
 2 files changed, 66 insertions(+), 29 deletions(-)

diff --git a/tensorflow/g3doc/how_tos/distributed/index.md b/tensorflow/g3doc/how_tos/distributed/index.md
index 37c6b081ae4..0907ef97792 100644
--- a/tensorflow/g3doc/how_tos/distributed/index.md
+++ b/tensorflow/g3doc/how_tos/distributed/index.md
@@ -5,13 +5,20 @@ distribute a computation graph across that cluster. We assume that you are
 familiar with the [basic concepts](../../get_started/basic_usage.md) of writing
 TensorFlow programs.
 
-## Quick start
+## Install
 
-The gRPC server is included as part of the nightly PIP packages, which you can
-download from [the continuous integration
-site](http://ci.tensorflow.org/view/Nightly/). Alternatively, you can build an
-up-to-date PIP package by following [these installation instructions]
-(https://www.tensorflow.org/versions/master/get_started/os_setup.html#create-the-pip-package-and-install).
+To use distributed TensorFlow, install a TensorFlow package that includes the
+gRPC server.
+
+1. Download a nightly PIP package from [the continuous integration
+site](http://ci.tensorflow.org/view/Nightly/).
+1. Execute `pip uninstall tensorflow` if you have a previous installation.
+1. Execute `pip install `.
+
+Alternatively, you can build an up-to-date PIP package from source by
+following [these installation instructions]
+(https://www.tensorflow.org/versions/master/get_started/os_setup.html#create-
+the-pip-package-and-install).
 
 Once you have successfully built the distributed TensorFlow components, you can
 test your installation by starting a local server as follows:
@@ -27,54 +34,83 @@ $ python
 'Hello, distributed TensorFlow!'
 ```
 
-## Cluster definition
-
 The `tf.GrpcServer.new_local_server()` method creates a single-process cluster.
-To create a more realistic distributed cluster, you create a `tf.GrpcServer` by
-passing in a `tf.ServerDef` that defines the membership of a TensorFlow cluster,
-and then run multiple processes that each have the same cluster definition.
-A `tf.ServerDef` comprises a cluster definition (`tf.ClusterDef`), which is the
-same for all servers in a cluster; and a job name and task index that are unique
-to a particular cluster.
+## Create a cluster
+
+To create a cluster with multiple processes or machines:
+
+1. **Create a cluster specification dictionary**. All servers in the cluster share the
+specification.
+
+1. **For each process or machine** in the cluster, run a TensorFlow program to:
+
+    1. **Create a `ClusterSpec`**, passing the dictionary to the constructor.
 
-For constructing a `tf.ClusterDef`, the `tf.make_cluster_def()` function enables you to specify the jobs and tasks as a Python dictionary, mapping job names to lists of network addresses. For example:
+    1. **Create a `tf.ServerDef`** that identifies itself with one of the
+    tasks in the `ClusterSpec`.
+
+    1. **Create a `tf.GrpcServer`**, passing the `tf.ServerDef` to the
+    constructor.
+
+
+### Create the cluster specification dictionary and `ClusterSpec` instances.
+
+ The cluster specification dictionary maps job names to lists
+ of network addresses. Pass this dictionary to the `tf.ClusterSpec` constructor.
+ For example:
 
- + - + + - +
tf.ClusterDef constructionAvailable tasks
tf.ClusterSpec constructionAvailable tasks
tf.make_cluster_def({"local": ["localhost:2222", "localhost:2223"]})/job:local/task:0
/job:local/task:1
+tf.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
+
/job:local/task:0
/job:local/task:1
tf.make_cluster_def({
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"],
    "ps": ["ps0:2222", "ps1:2222"]})
/job:worker/task:0
/job:worker/task:1
/job:worker/task:2
/job:ps/task:0
/job:ps/task:1
+tf.ClusterSpec({
+    "trainer": [
+        "trainer0.example.com:2222", 
+        "trainer1.example.com:2222",
+        "trainer2.example.com:2222"
+    ],
+    "params": [
+        "params0.example.com:2222",
+        "params1.example.com:2222"
+    ]})
+
/job:trainer/task:0
/job:trainer/task:1
/job:trainer/task:2
/job:params/task:0
/job:params/task:1
 
-The `server_def.job_name` and `server_def.task_index` fields select one of the
-defined tasks from the `tf.ClusterDef`. For example, running the following code
-in two different processes:
+### Create `ServerDef` and `GrpcServer` instances
+
+A `ServerDef` stores a job name and task index that uniquely identify one of
+the tasks defined in the `tf.ClusterSpec`. The `GrpcServer` constructor uses
+this information to start a server.
+
+For example, to define and instantiate servers running on `localhost:2222` and
+`localhost:2223`, run the following snippets in different processes:
 
 ```python
 # In task 0:
 server_def = tf.ServerDef(
-    cluster=tf.make_cluster_def({
-        "local": ["localhost:2222", "localhost:2223"]}),
+    cluster=tf.ClusterSpec({
+        "local": ["localhost:2222", "localhost:2223"]}).as_cluster_def(),
     job_name="local", task_index=0)
 server = tf.GrpcServer(server_def)
 ```
 
 ```python
 # In task 1:
 server_def = tf.ServerDef(
-    cluster=tf.make_cluster_def({
-        "local": ["localhost:2222", "localhost:2223"]}),
+    cluster=tf.ClusterSpec({
+        "local": ["localhost:2222", "localhost:2223"]}).as_cluster_def(),
     job_name="local", task_index=1)
 server = tf.GrpcServer(server_def)
 ```
 
-…will define and instantiate servers running on `localhost:2222` and
-`localhost:2223`.
-
-**N.B.** Manually specifying these cluster specifications can be tedious,
+**Note:** Manually specifying these cluster specifications can be tedious,
 especially for large clusters. We are working on tools for launching tasks
 programmatically, e.g. using a cluster manager like
 [Kubernetes](http://kubernetes.io). If there are particular cluster managers for
diff --git a/tensorflow/g3doc/how_tos/leftnav_files b/tensorflow/g3doc/how_tos/leftnav_files
index f2b0a9fe9d2..9371098b0be 100644
--- a/tensorflow/g3doc/how_tos/leftnav_files
+++ b/tensorflow/g3doc/how_tos/leftnav_files
@@ -4,6 +4,7 @@ summaries_and_tensorboard/index.md
 graph_viz/index.md
 reading_data/index.md
 threading_and_queues/index.md
+distributed/index.md
 adding_an_op/index.md
 new_data_formats/index.md
 using_gpu/index.md
--
GitLab
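
The cluster specification dictionary in this patch and the `/job:<name>/task:<index>` names listed next to it follow a simple rule: a task's index is its address's position in that job's list. The sketch below illustrates that mapping, and the lookup a `ServerDef` job name and task index imply, in plain Python without TensorFlow. `task_names` and `task_address` are hypothetical helpers written for illustration, not part of the TensorFlow API, and listing jobs in the dictionary's insertion order is an assumption of this sketch.

```python
# Hypothetical helpers, not part of TensorFlow: they only illustrate how a
# cluster specification dictionary (job name -> list of "host:port"
# strings) maps to task names and to the address of each task.


def task_names(cluster):
    """Return "/job:<name>/task:<index>" for every task in `cluster`.

    Jobs are visited in the dictionary's insertion order (an assumption of
    this sketch); within a job, the task index is the address's position
    in that job's list.
    """
    names = []
    for job_name, addresses in cluster.items():
        for task_index in range(len(addresses)):
            names.append("/job:%s/task:%d" % (job_name, task_index))
    return names


def task_address(cluster, job_name, task_index):
    """Return the "host:port" assigned to task `task_index` of `job_name`."""
    if job_name not in cluster:
        raise ValueError("unknown job: %r" % (job_name,))
    addresses = cluster[job_name]
    if not 0 <= task_index < len(addresses):
        raise ValueError("task index %d out of range for job %r"
                         % (task_index, job_name))
    return addresses[task_index]


if __name__ == "__main__":
    cluster = {
        "trainer": [
            "trainer0.example.com:2222",
            "trainer1.example.com:2222",
            "trainer2.example.com:2222",
        ],
        "params": [
            "params0.example.com:2222",
            "params1.example.com:2222",
        ],
    }
    # Print every task name defined by the two-job cluster.
    for name in task_names(cluster):
        print(name)
    # Look up the address for job "local", task 1 of the two-process example.
    print(task_address({"local": ["localhost:2222", "localhost:2223"]},
                       "local", 1))
```

With the two-process `local` cluster from the `ServerDef` example, `task_address(..., "local", 1)` returns `"localhost:2223"`, the address the process started with `task_index=1` is expected to serve on.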