Once you have successfully built the distributed TensorFlow components, you can
test your installation by starting a local server as follows:
...
...
'Hello, distributed TensorFlow!'
```
## Cluster definition
The `tf.GrpcServer.new_local_server()` method creates a single-process cluster.
To create a more realistic distributed cluster, you create a `tf.GrpcServer` by
passing in a `tf.ServerDef` that defines the membership of a TensorFlow cluster,
and then run multiple processes that each have the same cluster definition.
A `tf.ServerDef` comprises a cluster definition (`tf.ClusterDef`), which is the
same for all servers in a cluster, and a job name and task index that identify
one particular server within that cluster.
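The relationship between these pieces can be sketched in plain Python (no
TensorFlow required). The classes below are illustrative stand-ins, not the
actual protocol buffer schemas:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterDef:
    """Stand-in for tf.ClusterDef: job names mapped to ordered task addresses."""
    jobs: tuple  # e.g. (("worker", ("host0:2222", "host1:2222")),)


@dataclass(frozen=True)
class ServerDef:
    """Stand-in for tf.ServerDef."""
    cluster: ClusterDef  # identical for every server in the cluster
    job_name: str        # which job this server belongs to
    task_index: int      # which task within that job this server runs


shared = ClusterDef(jobs=(("worker", ("host0:2222", "host1:2222")),))

# Two processes share the same cluster definition but differ in identity.
server0 = ServerDef(cluster=shared, job_name="worker", task_index=0)
server1 = ServerDef(cluster=shared, job_name="worker", task_index=1)
assert server0.cluster == server1.cluster
assert server0.task_index != server1.task_index
```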
## Create a cluster
To create a cluster with multiple processes or machines:
1. **Create a cluster specification dictionary**. All servers in the cluster
   share this specification.

1. **For each process or machine** in the cluster, run a TensorFlow program to:

   1. **Create a `ClusterSpec`**, passing the dictionary to the constructor.
      Alternatively, the `tf.make_cluster_def()` function constructs a
      `tf.ClusterDef` directly, specifying the jobs and tasks as a Python
      dictionary that maps job names to lists of network addresses.

   1. **Create a `tf.ServerDef`** that identifies the server's own task among
      the tasks in the `ClusterSpec`.

   1. **Create a `tf.GrpcServer`**, passing the `tf.ServerDef` to the
      constructor.
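The per-process flow above can be sketched in plain Python. The cluster layout
and the dictionary-based `ServerDef` stand-in below are hypothetical; in a real
program each resulting definition would be passed to the `tf.GrpcServer`
constructor:

```python
# Hypothetical cluster specification dictionary shared by all servers:
# job names map to lists of network addresses (host:port).
cluster_spec = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}


def make_server_def(cluster, job_name, task_index):
    """Plain-dict stand-in for building a tf.ServerDef from the shared
    cluster definition plus one process's job name and task index."""
    if task_index >= len(cluster[job_name]):
        raise ValueError("job %r has no task %d" % (job_name, task_index))
    return {"cluster": cluster, "job_name": job_name, "task_index": task_index}


# One definition per process in the cluster: one ps task, two worker tasks.
server_defs = [
    make_server_def(cluster_spec, job, idx)
    for job, addrs in sorted(cluster_spec.items())
    for idx in range(len(addrs))
]
```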
### Create the cluster specification dictionary and `ClusterSpec` instances

The cluster specification dictionary maps job names to lists
of network addresses. Pass this dictionary to the `tf.ClusterSpec` constructor.
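For example, a dictionary describing a single-task `ps` job and a two-task
`worker` job (the hostnames are illustrative) could look like this:

```python
cluster_spec_dict = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}

# In a real program this dictionary would then be passed to the
# tf.ClusterSpec constructor:
#   cluster = tf.ClusterSpec(cluster_spec_dict)
# Here we only check the shape of the mapping.
assert all(isinstance(addrs, list) for addrs in cluster_spec_dict.values())
```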