Commit 418daaac authored by A. Unique TensorFlower, committed by TensorFlower Gardener

Update generated Python Op docs.

Change: 136614898
Parent c8d4bba1
@@ -2649,34 +2649,3 @@ Wraps monitors into a SessionRunHook.
- - -
### `class tf.contrib.learn.monitors.SummaryWriterCache` {#SummaryWriterCache}
Cache for summary writers.
This class caches summary writers, one per directory.
- - -
#### `tf.contrib.learn.monitors.SummaryWriterCache.clear()` {#SummaryWriterCache.clear}
Clear cached summary writers. Currently only used for unit tests.
- - -
#### `tf.contrib.learn.monitors.SummaryWriterCache.get(logdir)` {#SummaryWriterCache.get}
Returns the SummaryWriter for the specified directory.
##### Args:
* <b>`logdir`</b>: str, name of the directory.
##### Returns:
A `SummaryWriter`.
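For example (a minimal sketch; the log directory is illustrative), repeated lookups for the same directory return the same cached writer:

```python
import tensorflow as tf

# Both calls return the same cached SummaryWriter instance for the directory.
writer_a = tf.train.SummaryWriterCache.get('/tmp/train_logs')
writer_b = tf.train.SummaryWriterCache.get('/tmp/train_logs')
assert writer_a is writer_b  # one writer per directory

# Clearing the cache (normally only needed in tests) drops the instances.
tf.train.SummaryWriterCache.clear()
```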
#### `tf.contrib.learn.monitors.SummaryWriterCache.clear()` {#SummaryWriterCache.clear}
#### `tf.train.SummaryWriterCache.clear()` {#SummaryWriterCache.clear}
Clear cached summary writers. Currently only used for unit tests.
### `tf.train.basic_train_loop(supervisor, train_step_fn, args=None, kwargs=None, master='')` {#basic_train_loop}
Basic loop to train a model.
Calls `train_step_fn` in a loop to train a model. The function is called as:
```python
train_step_fn(session, *args, **kwargs)
```
It is passed a `tf.Session` in addition to `args` and `kwargs`. The function
typically runs one training step in the session.
##### Args:
* <b>`supervisor`</b>: `tf.Supervisor` to run the training services.
* <b>`train_step_fn`</b>: Callable to execute one training step. Called
  repeatedly as `train_step_fn(session, *args, **kwargs)`.
* <b>`args`</b>: Optional positional arguments passed to `train_step_fn`.
* <b>`kwargs`</b>: Optional keyword arguments passed to `train_step_fn`.
* <b>`master`</b>: Master to use to create the training session. Defaults to
`""` which causes the session to be created in the local process.
Class to synchronize, aggregate gradients and pass them to the optimizer.
In a typical asynchronous training environment, it's common to have some
stale gradients. For example, with N-replica asynchronous training,
gradients will be applied to the variables N times independently. Depending
on each replica's training speed, some gradients might be calculated from
copies of the variable from several steps back (N-1 steps on average). This
optimizer avoids stale gradients by collecting gradients from all replicas,
summing them, then applying them to the variables in one shot, after
which replicas can fetch the new variables and continue.
The following queues are created:
<empty line>
* N `gradient` queues, one per variable to train. Gradients are pushed to
these queues and the chief worker will dequeue_many and then sum them
before applying to variables.
* 1 `token` queue where the optimizer pushes the new global_step value after
all gradients have been applied.
The following variables are created:
* N `local_step`, one per replica. Compared against global step to check for
staleness of the gradients.
This adds nodes to the graph to collect gradients and pause the trainers until
variables are updated.
For the PS:
<empty line>
1. A queue is created for each variable, and each replica now pushes the
gradients into the queue instead of directly applying them to the
variables.
2. For each gradient_queue, pop and sum the gradients once enough
replicas (replicas_to_aggregate) have pushed gradients to the queue.
3. Apply the aggregated gradients to the variables.
4. Only after all variables have been updated, increment the global step.
5. Only after step 4, clear all the gradients in the queues as they are
stale now (could happen when replicas are restarted and push to the queues
multiple times, or from the backup replicas).
6. Only after step 5, pushes `global_step` in the `token_queue`, once for
each worker replica. Each worker can then fetch it into its local_step
variable and start the next batch.
For the replicas:
<empty line>
1. Start a step: fetch variables and compute gradients.
2. Once the gradients have been computed, push them into `gradient_queue` only
if local_step equals global_step, otherwise the gradients are just dropped.
This avoids stale gradients.
3. After pushing all the gradients, dequeue an updated value of global_step
from the token queue and record that step to its local_step variable. Note
that this is effectively a barrier.
4. Start the next batch.
### Usage
```python
# Create any optimizer to update the variables, say a simple SGD:
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Wrap the optimizer with sync_replicas_optimizer with 50 replicas: at each
# step the optimizer collects 50 gradients before applying them to variables.
opt = tf.train.SyncReplicasOptimizer(opt, replicas_to_aggregate=50,
                                     replica_id=task_id,
                                     total_num_replicas=50)
# Note that if you want to have 2 backup replicas, you can change
# total_num_replicas=52 and make sure this number matches how many physical
# replicas you started in your job.

# Some models have startup_delays to help stabilize the model but when using
# sync_replicas training, set it to 0.

# Now you can call `minimize()` or `compute_gradients()` and
# `apply_gradients()` normally:
train_op = opt.minimize(total_loss, global_step=self.global_step)

# You can now call get_init_tokens_op() and get_chief_queue_runner().
# Note that get_init_tokens_op() must be called before creating the session
# because it modifies the graph.
init_token_op = opt.get_init_tokens_op()
chief_queue_runner = opt.get_chief_queue_runner()
```
In the training program, every worker will run the train_op as if not
synchronized. But one worker (usually the chief) will need to execute the
chief_queue_runner and get_init_tokens_op generated from this optimizer.
```python
# After the session is created by the Supervisor and before the main while
# loop:
if is_chief and FLAGS.sync_replicas:
  sv.start_queue_runners(sess, [chief_queue_runner])
  # Insert initial tokens to the queue.
  sess.run(init_token_op)
```
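A sketch of the per-worker side, reusing the names from the snippets above (`sv`, `train_op`, and `master` are assumed): every replica just runs `train_op` in a loop; the token dequeue built into it acts as the barrier.

```python
# Each worker, chief or not, runs the same training loop; apply_gradients()
# built the token-queue barrier into train_op.
with sv.managed_session(master) as sess:
  while not sv.should_stop():
    sess.run(train_op)
```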
- - -
#### `tf.train.SyncReplicasOptimizer.__init__(opt, replicas_to_aggregate, variable_averages=None, variables_to_average=None, replica_id=None, total_num_replicas=0, use_locking=False, name='sync_replicas')` {#SyncReplicasOptimizer.__init__}
Construct a sync_replicas optimizer.
##### Args:
* <b>`opt`</b>: The actual optimizer that will be used to compute and apply the
gradients. Must be one of the Optimizer classes.
* <b>`replicas_to_aggregate`</b>: Number of replicas to aggregate for each
  variable update.
* <b>`variable_averages`</b>: Optional `ExponentialMovingAverage` object, used to
maintain moving averages for the variables passed in
`variables_to_average`.
* <b>`variables_to_average`</b>: A list of variables that need to be averaged.
  Only needed if variable_averages is passed in.
* <b>`replica_id`</b>: This is the task/worker/replica ID. Needed as index to access
local_steps to check staleness. Must be in the interval:
[0, total_num_replicas)
* <b>`total_num_replicas`</b>: Total number of tasks/workers/replicas, could be
different from replicas_to_aggregate.
If total_num_replicas > replicas_to_aggregate: it is backup_replicas +
replicas_to_aggregate.
If total_num_replicas < replicas_to_aggregate: Replicas compute
multiple batches per update to variables.
* <b>`use_locking`</b>: If True use locks for update operation.
* <b>`name`</b>: string. Optional name of the returned operation.
- - -
#### `tf.train.SyncReplicasOptimizer.compute_gradients(*args, **kwargs)` {#SyncReplicasOptimizer.compute_gradients}
Compute gradients of "loss" for the variables in "var_list".
This simply wraps the compute_gradients() from the real optimizer. The
gradients will be aggregated in apply_gradients() so that users can modify
the gradients, for example clip them by a per-replica global norm, if needed.
Taking the global norm over already-aggregated gradients can be a bad idea,
as one replica's huge gradients can swamp the gradients from the other
replicas.
##### Args:
* <b>`*args`</b>: Arguments for compute_gradients().
* <b>`**kwargs`</b>: Keyword arguments for compute_gradients().
##### Returns:
A list of (gradient, variable) pairs.
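A sketch of such per-replica clipping between the two calls (`total_loss`, `global_step`, and the clip value are assumptions; `None` gradients from variables not reached by the loss are skipped):

```python
grads_and_vars = opt.compute_gradients(total_loss)
# Clip on this replica, before the gradients are aggregated across replicas.
grads = [g for g, v in grads_and_vars if g is not None]
variables = [v for g, v in grads_and_vars if g is not None]
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(list(zip(clipped, variables)),
                               global_step=global_step)
```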
- - -
#### `tf.train.SyncReplicasOptimizer.apply_gradients(grads_and_vars, global_step=None, name=None)` {#SyncReplicasOptimizer.apply_gradients}
Apply gradients to variables.
This contains most of the synchronization implementation and also wraps the
apply_gradients() from the real optimizer.
##### Args:
* <b>`grads_and_vars`</b>: List of (gradient, variable) pairs as returned by
compute_gradients().
* <b>`global_step`</b>: Optional Variable to increment by one after the
variables have been updated.
* <b>`name`</b>: Optional name for the returned operation. Defaults to the
  name passed to the Optimizer constructor.
##### Returns:
* <b>`train_op`</b>: The op to dequeue a token so the replicas can exit this batch
and start the next one. This is executed by each replica.
##### Raises:
* <b>`ValueError`</b>: If `grads_and_vars` is empty.
* <b>`ValueError`</b>: If `global_step` is not provided, since the gradient
  staleness cannot be checked without it.
- - -
#### `tf.train.SyncReplicasOptimizer.get_chief_queue_runner()` {#SyncReplicasOptimizer.get_chief_queue_runner}
Returns the QueueRunner for the chief to execute.
This includes the operations to synchronize replicas: aggregate gradients,
apply to variables, increment global step, insert tokens to token queue.
Note that this can only be called after calling apply_gradients(), which
actually generates this queue runner.
##### Returns:
A `QueueRunner` for chief to execute.
##### Raises:
* <b>`ValueError`</b>: If this is called before apply_gradients().
- - -
#### `tf.train.SyncReplicasOptimizer.get_init_tokens_op(num_tokens=-1)` {#SyncReplicasOptimizer.get_init_tokens_op}
Returns the op to fill the sync_token_queue with the tokens.
This is supposed to be executed at the beginning of the chief/sync thread,
so that even if `total_num_replicas` is less than `replicas_to_aggregate`,
the model can still proceed because the replicas can compute multiple steps
per variable update. Make sure:
`num_tokens >= replicas_to_aggregate - total_num_replicas`.
##### Args:
* <b>`num_tokens`</b>: Number of tokens to add to the queue.
##### Returns:
An op for the chief/sync replica to fill the token queue.
##### Raises:
* <b>`ValueError`</b>: If this is called before apply_gradients().
* <b>`ValueError`</b>: If `num_tokens` is smaller than `replicas_to_aggregate -
  total_num_replicas`.
#### Other Methods
- - -
#### `tf.train.SyncReplicasOptimizer.get_clean_up_op()` {#SyncReplicasOptimizer.get_clean_up_op}
Returns the clean up op for the chief to execute before exit.
This includes the operation to abort the device with the token queue so all
other replicas can also restart. This avoids a potential hang when the chief
restarts.
Note that this can only be called after calling apply_gradients().
##### Returns:
A clean_up_op for the chief to execute before exit.
##### Raises:
* <b>`ValueError`</b>: If this is called before apply_gradients().
- - -
#### `tf.train.SyncReplicasOptimizer.get_slot(*args, **kwargs)` {#SyncReplicasOptimizer.get_slot}
Return a slot named "name" created for "var" by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
##### Args:
* <b>`*args`</b>: Arguments for get_slot().
* <b>`**kwargs`</b>: Keyword arguments for get_slot().
##### Returns:
The `Variable` for the slot if it was created, `None` otherwise.
- - -
#### `tf.train.SyncReplicasOptimizer.get_slot_names(*args, **kwargs)` {#SyncReplicasOptimizer.get_slot_names}
Return a list of the names of slots created by the `Optimizer`.
This simply wraps the get_slot_names() from the actual optimizer.
##### Args:
* <b>`*args`</b>: Arguments for get_slot_names().
* <b>`**kwargs`</b>: Keyword arguments for get_slot_names().
##### Returns:
A list of strings.
### `tf.train.get_checkpoint_mtimes(checkpoint_prefixes)` {#get_checkpoint_mtimes}
Returns the mtimes (modification timestamps) of the checkpoints.
Globs for the checkpoints pointed to by `checkpoint_prefixes`. If the files
exist, collect their mtimes. Both V2 and V1 checkpoints are considered, with
V2 taking priority.
This is the recommended way to get the mtimes, since it takes into account
the naming difference between V1 and V2 formats.
##### Args:
* <b>`checkpoint_prefixes`</b>: a list of checkpoint paths, typically the results of
`Saver.save()` or those of `tf.train.latest_checkpoint()`, regardless of
sharded/non-sharded or V1/V2.
##### Returns:
A list of mtimes (in microseconds) of the found checkpoints.
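A sketch, assuming an illustrative checkpoint directory, of collecting mtimes for all checkpoints tracked by the checkpoint state file:

```python
# Read the checkpoint state file and query mtimes for all tracked prefixes.
ckpt = tf.train.get_checkpoint_state('/tmp/train_logs')
if ckpt:
  mtimes = tf.train.get_checkpoint_mtimes(ckpt.all_model_checkpoint_paths)
```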
### `tf.train.NewCheckpointReader(filepattern)` {#NewCheckpointReader}
#### `tf.contrib.learn.monitors.SummaryWriterCache.get(logdir)` {#SummaryWriterCache.get}
#### `tf.train.SummaryWriterCache.get(logdir)` {#SummaryWriterCache.get}
Returns the SummaryWriter for the specified directory.
......
### `tf.train.checkpoint_exists(checkpoint_prefix)` {#checkpoint_exists}
Checks whether a V1 or V2 checkpoint exists with the specified prefix.
This is the recommended way to check if a checkpoint exists, since it takes
into account the naming difference between V1 and V2 formats.
##### Args:
* <b>`checkpoint_prefix`</b>: the prefix of a V1 or V2 checkpoint, with V2 taking
priority. Typically the result of `Saver.save()` or that of
`tf.train.latest_checkpoint()`, regardless of sharded/non-sharded or
V1/V2.
##### Returns:
A bool, true iff a checkpoint referred to by `checkpoint_prefix` exists.
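A sketch of guarding a restore with this check; the directory is illustrative and `saver`/`sess` are assumed to exist:

```python
prefix = tf.train.latest_checkpoint('/tmp/train_logs')
# latest_checkpoint may return None if no checkpoint state file is present.
if prefix and tf.train.checkpoint_exists(prefix):
  saver.restore(sess, prefix)
```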
Optimizer that implements the proximal gradient descent algorithm.
See this [paper](http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf).
- - -
#### `tf.train.ProximalGradientDescentOptimizer.__init__(learning_rate, l1_regularization_strength=0.0, l2_regularization_strength=0.0, use_locking=False, name='ProximalGradientDescent')` {#ProximalGradientDescentOptimizer.__init__}
Construct a new proximal gradient descent optimizer.
##### Args:
* <b>`learning_rate`</b>: A Tensor or a floating point value. The learning
rate to use.
* <b>`l1_regularization_strength`</b>: A float value, must be greater than or
equal to zero.
* <b>`l2_regularization_strength`</b>: A float value, must be greater than or
equal to zero.
* <b>`use_locking`</b>: If True use locks for update operations.
* <b>`name`</b>: Optional name prefix for the operations created when applying
  gradients. Defaults to "ProximalGradientDescent".
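A minimal sketch with a toy loss (all values illustrative):

```python
import tensorflow as tf

w = tf.Variable([1.0, -2.0, 3.0])
loss = tf.reduce_sum(tf.square(w))
# The proximal step applies the L1 penalty after the gradient step, which
# can drive small weights exactly to zero (sparsity).
opt = tf.train.ProximalGradientDescentOptimizer(
    learning_rate=0.1, l1_regularization_strength=0.01)
train_op = opt.minimize(loss)
```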
@@ -3,14 +3,14 @@ Cache for summary writers.
This class caches summary writers, one per directory.
- - -
#### `tf.contrib.learn.monitors.SummaryWriterCache.clear()` {#SummaryWriterCache.clear}
#### `tf.train.SummaryWriterCache.clear()` {#SummaryWriterCache.clear}
Clear cached summary writers. Currently only used for unit tests.
- - -
#### `tf.contrib.learn.monitors.SummaryWriterCache.get(logdir)` {#SummaryWriterCache.get}
#### `tf.train.SummaryWriterCache.get(logdir)` {#SummaryWriterCache.get}
Returns the SummaryWriter for the specified directory.
......
Optimizer that implements the Proximal Adagrad algorithm.
See this [paper](http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf).
- - -
#### `tf.train.ProximalAdagradOptimizer.__init__(learning_rate, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, use_locking=False, name='ProximalAdagrad')` {#ProximalAdagradOptimizer.__init__}
Construct a new ProximalAdagrad optimizer.
##### Args:
* <b>`learning_rate`</b>: A `Tensor` or a floating point value. The learning rate.
* <b>`initial_accumulator_value`</b>: A floating point value.
Starting value for the accumulators, must be positive.
* <b>`l1_regularization_strength`</b>: A float value, must be greater than or
equal to zero.
* <b>`l2_regularization_strength`</b>: A float value, must be greater than or
equal to zero.
* <b>`use_locking`</b>: If `True` use locks for update operations.
* <b>`name`</b>: Optional name prefix for the operations created when applying
  gradients. Defaults to "ProximalAdagrad".
##### Raises:
* <b>`ValueError`</b>: If the `initial_accumulator_value` is invalid.
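A minimal sketch mirroring the proximal gradient descent example above (`loss` as defined there; values illustrative):

```python
opt = tf.train.ProximalAdagradOptimizer(
    learning_rate=0.1,
    initial_accumulator_value=0.1,
    l1_regularization_strength=0.001,
    l2_regularization_strength=0.001)
train_op = opt.minimize(loss)  # `loss` as in the sketch above
```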
Class to synchronize, aggregate gradients and pass them to the optimizer.
In a typical asynchronous training environment, it's common to have some
stale gradients. For example, with N-replica asynchronous training,
gradients will be applied to the variables N times independently. Depending
on each replica's training speed, some gradients might be calculated from
copies of the variable from several steps back (N-1 steps on average). This
optimizer avoids stale gradients by collecting gradients from all replicas,
averaging them, then applying them to the variables in one shot, after
which replicas can fetch the new variables and continue.
The following accumulators/queue are created:
<empty line>
* N `gradient accumulators`, one per variable to train. Gradients are pushed
to them and the chief worker will wait until enough gradients are collected
and then average them before applying to variables. The accumulator will
drop all stale gradients (more details in the accumulator op).
* 1 `token` queue where the optimizer pushes the new global_step value after
all variables are updated.
The following local variable is created:
* `sync_rep_local_step`, one per replica. Compared against the global_step in
each accumulator to check for staleness of the gradients.
The optimizer adds nodes to the graph to collect gradients and pause the
trainers until variables are updated.
For the Parameter Server job:
<empty line>
1. An accumulator is created for each variable, and each replica pushes the
gradients into the accumulators instead of directly applying them to the
variables.
2. Each accumulator averages once enough gradients (replicas_to_aggregate)
have been accumulated.
3. Apply the averaged gradients to the variables.
4. Only after all variables have been updated, increment the global step.
5. Only after step 4, pushes `global_step` in the `token_queue`, once for
each worker replica. The workers can now fetch the global step, use it to
update their local_step variables, and start the next batch.
For the replicas:
<empty line>
1. Start a step: fetch variables and compute gradients.
2. Once the gradients have been computed, push them into the gradient
accumulators. Each accumulator checks for staleness and drops any stale
gradients.
3. After pushing all the gradients, dequeue an updated value of global_step
from the token queue and record that step to its local_step variable. Note
that this is effectively a barrier.
4. Start the next batch.
### Usage
```python
# Create any optimizer to update the variables, say a simple SGD:
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Wrap the optimizer with sync_replicas_optimizer with 50 replicas: at each
# step the optimizer collects 50 gradients before applying them to variables.
# Note that if you want to have 2 backup replicas, you can change
# total_num_replicas=52 and make sure this number matches how many physical
# replicas you started in your job.
opt = tf.train.SyncReplicasOptimizerV2(opt, replicas_to_aggregate=50,
                                       total_num_replicas=50)

# Some models have startup_delays to help stabilize the model but when using
# sync_replicas training, set it to 0.

# Now you can call `minimize()` or `compute_gradients()` and
# `apply_gradients()` normally:
train_op = opt.minimize(total_loss, global_step=self.global_step)

# You can now call get_init_tokens_op() and get_chief_queue_runner().
# Note that get_init_tokens_op() must be called before creating the session
# because it modifies the graph by adding new nodes.
init_token_op = opt.get_init_tokens_op()
chief_queue_runner = opt.get_chief_queue_runner()
```
In the training program, every worker will run the train_op as if not
synchronized. But one worker (usually the chief) will need to execute the
chief_queue_runner and get_init_tokens_op from this optimizer.
```python
# When you create the supervisor, you need to add the local_init_op and
# ready_for_local_init_op to make sure the local_step is initialized to the
# global_step. Here is an example:
if is_chief:
  local_init_op = opt.chief_init_op
else:
  local_init_op = opt.local_step_init_op

ready_for_local_init_op = opt.ready_for_local_init_op
sv = tf.train.Supervisor(graph=g,
                         is_chief=is_chief,
                         # This initializes the local step.
                         local_init_op=local_init_op,
                         # This makes sure the global step is initialized
                         # before it is used.
                         ready_for_local_init_op=ready_for_local_init_op,
                         saver=model.saver)

# After the session is created by the Supervisor and before the main while
# loop:
if is_chief and FLAGS.sync_replicas:
  sv.start_queue_runners(sess, [chief_queue_runner])
  # Insert initial tokens to the queue.
  sess.run(init_token_op)
```
- - -
#### `tf.train.SyncReplicasOptimizerV2.__init__(opt, replicas_to_aggregate, total_num_replicas=None, variable_averages=None, variables_to_average=None, use_locking=False, name='sync_replicas')` {#SyncReplicasOptimizerV2.__init__}
Construct a sync_replicas optimizer.
##### Args:
* <b>`opt`</b>: The actual optimizer that will be used to compute and apply the
gradients. Must be one of the Optimizer classes.
* <b>`replicas_to_aggregate`</b>: Number of replicas to aggregate for each
  variable update.
* <b>`total_num_replicas`</b>: Total number of tasks/workers/replicas, could be
different from replicas_to_aggregate.
If total_num_replicas > replicas_to_aggregate: it is backup_replicas +
replicas_to_aggregate.
If total_num_replicas < replicas_to_aggregate: Replicas compute
multiple batches per update to variables.
* <b>`variable_averages`</b>: Optional `ExponentialMovingAverage` object, used to
maintain moving averages for the variables passed in
`variables_to_average`.
* <b>`variables_to_average`</b>: A list of variables that need to be averaged.
  Only needed if variable_averages is passed in.
* <b>`use_locking`</b>: If True use locks for update operation.
* <b>`name`</b>: string. Optional name of the returned operation.
- - -
#### `tf.train.SyncReplicasOptimizerV2.compute_gradients(*args, **kwargs)` {#SyncReplicasOptimizerV2.compute_gradients}
Compute gradients of "loss" for the variables in "var_list".
This simply wraps the compute_gradients() from the real optimizer. The
gradients will be aggregated in apply_gradients() so that users can modify
the gradients, for example clip them by a per-replica global norm, if needed.
Taking the global norm over already-aggregated gradients can be a bad idea,
as one replica's huge gradients can swamp the gradients from the other
replicas.
##### Args:
* <b>`*args`</b>: Arguments for compute_gradients().
* <b>`**kwargs`</b>: Keyword arguments for compute_gradients().
##### Returns:
A list of (gradient, variable) pairs.
- - -
#### `tf.train.SyncReplicasOptimizerV2.apply_gradients(grads_and_vars, global_step=None, name=None)` {#SyncReplicasOptimizerV2.apply_gradients}
Apply gradients to variables.
This contains most of the synchronization implementation and also wraps the
apply_gradients() from the real optimizer.
##### Args:
* <b>`grads_and_vars`</b>: List of (gradient, variable) pairs as returned by
compute_gradients().
* <b>`global_step`</b>: Optional Variable to increment by one after the
variables have been updated.
* <b>`name`</b>: Optional name for the returned operation. Defaults to the
  name passed to the Optimizer constructor.
##### Returns:
* <b>`train_op`</b>: The op to dequeue a token so the replicas can exit this batch
and start the next one. This is executed by each replica.
##### Raises:
* <b>`ValueError`</b>: If `grads_and_vars` is empty.
* <b>`ValueError`</b>: If `global_step` is not provided, since the gradient
  staleness cannot be checked without it.
- - -
#### `tf.train.SyncReplicasOptimizerV2.get_chief_queue_runner()` {#SyncReplicasOptimizerV2.get_chief_queue_runner}
Returns the QueueRunner for the chief to execute.
This includes the operations to synchronize replicas: aggregate gradients,
apply to variables, increment global step, insert tokens to token queue.
Note that this can only be called after calling apply_gradients(), which
actually generates this queue runner.
##### Returns:
A `QueueRunner` for chief to execute.
##### Raises:
* <b>`ValueError`</b>: If this is called before apply_gradients().
- - -
#### `tf.train.SyncReplicasOptimizerV2.get_init_tokens_op(num_tokens=-1)` {#SyncReplicasOptimizerV2.get_init_tokens_op}
Returns the op to fill the sync_token_queue with the tokens.
This is supposed to be executed at the beginning of the chief/sync thread,
so that even if `total_num_replicas` is less than `replicas_to_aggregate`,
the model can still proceed because the replicas can compute multiple steps
per variable update. Make sure:
`num_tokens >= replicas_to_aggregate - total_num_replicas`.
##### Args:
* <b>`num_tokens`</b>: Number of tokens to add to the queue.
##### Returns:
An op for the chief/sync replica to fill the token queue.
##### Raises:
* <b>`ValueError`</b>: If this is called before apply_gradients().
* <b>`ValueError`</b>: If `num_tokens` is smaller than `replicas_to_aggregate -
  total_num_replicas`.
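A worked example of the constraint (values assumed): with `replicas_to_aggregate=50` and `total_num_replicas=40`, pre-fill at least `50 - 40 = 10` tokens so the 40 workers can collectively contribute 50 gradients per update:

```python
# opt is the SyncReplicasOptimizerV2 from the usage snippet; apply_gradients()
# (or minimize()) must already have been called.
init_tokens_op = opt.get_init_tokens_op(num_tokens=10)
```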
#### Other Methods
- - -
#### `tf.train.SyncReplicasOptimizerV2.get_slot(*args, **kwargs)` {#SyncReplicasOptimizerV2.get_slot}
Return a slot named "name" created for "var" by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
##### Args:
* <b>`*args`</b>: Arguments for get_slot().
* <b>`**kwargs`</b>: Keyword arguments for get_slot().
##### Returns:
The `Variable` for the slot if it was created, `None` otherwise.
- - -
#### `tf.train.SyncReplicasOptimizerV2.get_slot_names(*args, **kwargs)` {#SyncReplicasOptimizerV2.get_slot_names}
Return a list of the names of slots created by the `Optimizer`.
This simply wraps the get_slot_names() from the actual optimizer.
##### Args:
* <b>`*args`</b>: Arguments for get_slot_names().
* <b>`**kwargs`</b>: Keyword arguments for get_slot_names().
##### Returns:
A list of strings.
@@ -561,6 +561,8 @@
* [`AggregationMethod`](../../api_docs/python/train.md#AggregationMethod)
* [`assert_global_step`](../../api_docs/python/train.md#assert_global_step)
* [`audio_summary`](../../api_docs/python/train.md#audio_summary)
* [`basic_train_loop`](../../api_docs/python/train.md#basic_train_loop)
* [`checkpoint_exists`](../../api_docs/python/train.md#checkpoint_exists)
* [`CheckpointSaverHook`](../../api_docs/python/train.md#CheckpointSaverHook)
* [`ChiefSessionCreator`](../../api_docs/python/train.md#ChiefSessionCreator)
* [`clip_by_average_norm`](../../api_docs/python/train.md#clip_by_average_norm)
@@ -574,6 +576,7 @@
* [`ExponentialMovingAverage`](../../api_docs/python/train.md#ExponentialMovingAverage)
* [`FtrlOptimizer`](../../api_docs/python/train.md#FtrlOptimizer)
* [`generate_checkpoint_state_proto`](../../api_docs/python/train.md#generate_checkpoint_state_proto)
* [`get_checkpoint_mtimes`](../../api_docs/python/train.md#get_checkpoint_mtimes)
* [`get_global_step`](../../api_docs/python/train.md#get_global_step)
* [`global_norm`](../../api_docs/python/train.md#global_norm)
* [`global_step`](../../api_docs/python/train.md#global_step)
@@ -590,7 +593,10 @@
* [`MonitoredTrainingSession`](../../api_docs/python/train.md#MonitoredTrainingSession)
* [`NanLossDuringTrainingError`](../../api_docs/python/train.md#NanLossDuringTrainingError)
* [`NanTensorHook`](../../api_docs/python/train.md#NanTensorHook)
* [`NewCheckpointReader`](../../api_docs/python/train.md#NewCheckpointReader)
* [`Optimizer`](../../api_docs/python/train.md#Optimizer)
* [`ProximalAdagradOptimizer`](../../api_docs/python/train.md#ProximalAdagradOptimizer)
* [`ProximalGradientDescentOptimizer`](../../api_docs/python/train.md#ProximalGradientDescentOptimizer)
* [`QueueRunner`](../../api_docs/python/train.md#QueueRunner)
* [`replica_device_setter`](../../api_docs/python/train.md#replica_device_setter)
* [`RMSPropOptimizer`](../../api_docs/python/train.md#RMSPropOptimizer)
@@ -610,7 +616,10 @@
* [`summary_iterator`](../../api_docs/python/train.md#summary_iterator)
* [`SummarySaverHook`](../../api_docs/python/train.md#SummarySaverHook)
* [`SummaryWriter`](../../api_docs/python/train.md#SummaryWriter)
* [`SummaryWriterCache`](../../api_docs/python/train.md#SummaryWriterCache)
* [`Supervisor`](../../api_docs/python/train.md#Supervisor)
* [`SyncReplicasOptimizer`](../../api_docs/python/train.md#SyncReplicasOptimizer)
* [`SyncReplicasOptimizerV2`](../../api_docs/python/train.md#SyncReplicasOptimizerV2)
* [`WorkerSessionCreator`](../../api_docs/python/train.md#WorkerSessionCreator)
* [`write_graph`](../../api_docs/python/train.md#write_graph)
* [`zero_fraction`](../../api_docs/python/train.md#zero_fraction)
@@ -942,7 +951,6 @@
* [`StepCounter`](../../api_docs/python/contrib.learn.monitors.md#StepCounter)
* [`StopAtStep`](../../api_docs/python/contrib.learn.monitors.md#StopAtStep)
* [`SummarySaver`](../../api_docs/python/contrib.learn.monitors.md#SummarySaver)
* [`SummaryWriterCache`](../../api_docs/python/contrib.learn.monitors.md#SummaryWriterCache)
* [`ValidationMonitor`](../../api_docs/python/contrib.learn.monitors.md#ValidationMonitor)
* **[Losses (contrib)](../../api_docs/python/contrib.losses.md)**:
......
@@ -3,9 +3,9 @@
# Summary Operations
[TOC]
This module contains ops for generating summaries.
## Generation of summaries.
## Summary Ops
### Summary Ops
- - -
### `tf.summary.tensor_summary(name, tensor, summary_description=None, collections=None)` {#tensor_summary}
......