Commit 001a2a61 authored by pkulzc, committed by Sergio Guadarrama

Internal changes for object detection. (#3656)

* Force cast of num_classes to integer

PiperOrigin-RevId: 188335318

* Updating config util to allow overwriting of cosine decay learning rates.

PiperOrigin-RevId: 188338852

* Make box_list_ops.py and box_list_ops_test.py work with C API enabled.

The C API has improved shape inference over the original Python
code, which causes some previously working tf.cond constructs to fail. Switching to smart_cond fixes this.

Another effect of the improved shape inference is that one of the
failures tested gets caught earlier, so I modified the test to reflect
this.

PiperOrigin-RevId: 188409792
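
As an illustration, here is a minimal sketch of the idea behind smart_cond, assuming the helper simply short-circuits statically known predicates; this is not the library's implementation:

import tensorflow as tf

def smart_cond_sketch(pred, true_fn, false_fn):
  # If the predicate is a plain Python bool (e.g. derived from static shapes),
  # pick the branch at graph-construction time, so the untaken branch -- whose
  # shapes may be rejected by the C API's stricter inference -- is never built.
  if isinstance(pred, bool):
    return true_fn() if pred else false_fn()
  # Otherwise fall back to a regular tf.cond on the tensor predicate.
  return tf.cond(pred, true_fn, false_fn)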

* Fix parallel event file writing issue.

Without this change, the event files might get corrupted when multiple evaluations are run in parallel.

PiperOrigin-RevId: 188502560

* Deprecating the boolean flag from_detection_checkpoint.

Replacing it with a string field, fine_tune_checkpoint_type, in train_config to provide extensibility. fine_tune_checkpoint_type can currently take the value `detection`, `classification`, or other values when restore_map is overridden.

PiperOrigin-RevId: 188518685
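
A hypothetical train_config fragment (pbtxt held in a Python string) illustrating the new field; the field names follow the commit description above and are assumptions about the exact proto layout:

# Hypothetical pbtxt fragment; `fine_tune_checkpoint` and
# `fine_tune_checkpoint_type` are assumed field names in train_config.
TRAIN_CONFIG_SNIPPET = """
  fine_tune_checkpoint: "path/to/model.ckpt"
  fine_tune_checkpoint_type: "detection"  # or "classification"
"""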

* Automated g4 rollback of changelist 188502560

PiperOrigin-RevId: 188519969

* Introducing eval metrics specs for Coco Mask metrics. This allows metrics to be computed in tensorflow using the tf.learn Estimator.

PiperOrigin-RevId: 188528485

* Minor fix to make object_detection/metrics/coco_evaluation.py python3 compatible.

PiperOrigin-RevId: 188550683

* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s.

PiperOrigin-RevId: 188560474

* Allow tensor input for new_height and new_width for resize_image.

PiperOrigin-RevId: 188561908
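
A brief usage sketch mirroring the new unit test later in this diff; it assumes preprocessor.resize_image returns the resized image, resized masks, and the true image shape when masks are passed:

import tensorflow as tf
from object_detection.core import preprocessor

in_image = tf.random_uniform([60, 40, 3])
in_masks = tf.random_uniform([15, 60, 40])
# new_height/new_width may now be scalar int32 tensors instead of Python ints.
height = tf.constant(50, dtype=tf.int32)
width = tf.constant(100, dtype=tf.int32)
out_image, out_masks, _ = preprocessor.resize_image(
    in_image, in_masks, new_height=height, new_width=width)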

* Fix typo in fine_tune_checkpoint_type name in trainer.

PiperOrigin-RevId: 188799033

* Adding mobilenet feature extractor to object detection.

PiperOrigin-RevId: 188916897

* Allow label maps to optionally contain an explicit background class with id zero.

PiperOrigin-RevId: 188951089
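
A hypothetical label map (standard StringIntLabelMap pbtxt, written here as a Python string) showing what such an explicit background entry looks like; the class names are illustrative:

# Hypothetical label map text; with this change an id-0 background item is
# optionally allowed alongside the regular 1-indexed classes.
LABEL_MAP_WITH_BACKGROUND = """
item {
  id: 0
  name: 'background'
}
item {
  id: 1
  name: 'dog'
}
item {
  id: 2
  name: 'cat'
}
"""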

* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale.

PiperOrigin-RevId: 189026868

* Fall back on the from_detection_checkpoint option if fine_tune_checkpoint_type isn't set.

PiperOrigin-RevId: 189052833

* Add proper names for learning rate schedules so we don't see cryptic names on tensorboard.

PiperOrigin-RevId: 189069837

* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1.

PiperOrigin-RevId: 189117178

* Adding regularization to total loss returned from DetectionModel.loss().

PiperOrigin-RevId: 189189123

* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on tensorboard.

Log localization and classification losses in evaluation.

PiperOrigin-RevId: 189189940

* Remove negative test from box list ops test.

PiperOrigin-RevId: 189229327

* Add an option to warmup learning rate in manual stepping schedule.

PiperOrigin-RevId: 189361039
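
A minimal numpy sketch, not the library code, of what warmup means for a stepped schedule: before the first boundary the rate is linearly interpolated from the initial rate up to the first scheduled rate.

import numpy as np

def stepped_rate_with_warmup(step, boundaries, rates):
  # `rates` has one more entry than `boundaries`; rates[0] is the initial rate.
  if step < boundaries[0]:
    # Linear warmup from rates[0] at step 0 to rates[1] at boundaries[0].
    return rates[0] + (rates[1] - rates[0]) * step / float(boundaries[0])
  # After warmup, behave like an ordinary piecewise-constant schedule.
  return rates[np.searchsorted(boundaries, step, side='right')]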

* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor.

PiperOrigin-RevId: 189388556

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Remove step 0 schedule since we do a hard check for it after cl/189361039

PiperOrigin-RevId: 189396697

* PiperOrigin-RevId: 189040463

* PiperOrigin-RevId: 189059229

* PiperOrigin-RevId: 189214402

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Make slim python3 compatible.

* Minor fixes.

* Add TargetAssignment summaries in a separate family.

PiperOrigin-RevId: 189407487

* 1. Setting the `family` keyword arg prepends the same prefix to summary names twice. Adding the family suffix directly to the name avoids this problem.
2. Make sure the eval losses have the same names.

PiperOrigin-RevId: 189434618

* Minor fixes to make object detection tf 1.4 compatible.

PiperOrigin-RevId: 189437519

* Call the base of the mobilenet_v1 feature extractor under the right arg scope and set batch norm is_training based on the value passed to the constructor.

PiperOrigin-RevId: 189460890

* Automated g4 rollback of changelist 188409792

PiperOrigin-RevId: 189463882

* Update object detection syncing.

PiperOrigin-RevId: 189601955

* Add an option to warmup learning rate, hold it constant for a certain number of steps and cosine decay it.

PiperOrigin-RevId: 189606169
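
A numpy sketch of the shape of such a schedule, under the assumption that the hold period sits between the linear warmup and the cosine decay; the real cosine_decay_with_warmup implementation may differ in details:

import numpy as np

def cosine_with_warmup_and_hold(step, base_lr, total_steps,
                                warmup_lr, warmup_steps, hold_steps):
  if step < warmup_steps:
    # Linear warmup from warmup_lr to base_lr.
    return warmup_lr + (base_lr - warmup_lr) * step / float(warmup_steps)
  if step < warmup_steps + hold_steps:
    # Hold the base rate constant for a fixed number of steps.
    return base_lr
  # Cosine decay from base_lr down to zero over the remaining steps.
  progress = (step - warmup_steps - hold_steps) / float(
      max(1, total_steps - warmup_steps - hold_steps))
  return 0.5 * base_lr * (1.0 + np.cos(np.pi * min(progress, 1.0)))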

* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points).

PiperOrigin-RevId: 189619301

* Fixed a bug which caused masks to be mostly zeros (caused by detection_boxes being in absolute coordinates when scale_to_absolute=True).

PiperOrigin-RevId: 189641294

* Open sourcing MobileNetV2 + SSDLite.

PiperOrigin-RevId: 189654520

* Remove unused files.
Parent 2913cb24
......@@ -30,8 +30,8 @@ from object_detection.protos import input_reader_pb2
from object_detection.utils import dataset_util
def _get_padding_shapes(dataset, max_num_boxes, num_classes,
spatial_image_shape):
def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
"""Returns shapes to pad dataset tensors to before batching.
Args:
......@@ -41,13 +41,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
num_classes: Number of classes in the dataset needed to compute shapes for
padding.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the imaage.
containing expected spatial shape of the image.
Returns:
A dictionary keyed by fields.InputDataFields containing padding shapes for
tensors in the dataset.
Raises:
ValueError: If groundtruth classes is neither rank 1 nor rank 2.
"""
height, width = spatial_image_shape
if not spatial_image_shape or spatial_image_shape == [-1, -1]:
height, width = None, None
else:
height, width = spatial_image_shape # pylint: disable=unpacking-non-sequence
padding_shapes = {
fields.InputDataFields.image: [height, width, 3],
fields.InputDataFields.source_id: [],
......@@ -55,9 +63,6 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
fields.InputDataFields.key: [],
fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
fields.InputDataFields.groundtruth_classes: [
max_num_boxes, num_classes
],
fields.InputDataFields.groundtruth_instance_masks: [max_num_boxes, height,
width],
fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
......@@ -69,6 +74,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
fields.InputDataFields.true_image_shape: [3]
}
# Determine whether groundtruth_classes are integers or one-hot encodings, and
# apply batching appropriately.
classes_shape = dataset.output_shapes[
fields.InputDataFields.groundtruth_classes]
if len(classes_shape) == 1: # Class integers.
padding_shapes[fields.InputDataFields.groundtruth_classes] = [max_num_boxes]
elif len(classes_shape) == 2: # One-hot or k-hot encoding.
padding_shapes[fields.InputDataFields.groundtruth_classes] = [
max_num_boxes, num_classes]
else:
raise ValueError('Groundtruth classes must be a rank 1 tensor (classes) or '
'rank 2 tensor (one-hot encodings)')
if fields.InputDataFields.original_image in dataset.output_shapes:
padding_shapes[fields.InputDataFields.original_image] = [None, None, 3]
if fields.InputDataFields.groundtruth_keypoints in dataset.output_shapes:
tensor_shape = dataset.output_shapes[fields.InputDataFields.
groundtruth_keypoints]
......@@ -87,28 +107,25 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
def build(input_reader_config, transform_input_data_fn=None,
batch_size=1, max_num_boxes=None, num_classes=None,
batch_size=None, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
"""Builds a tf.data.Dataset.
Builds a tf.data.Dataset by applying the `transform_input_data_fn` on all
records. Optionally, if `batch_size` > 1 and `max_num_boxes`, `num_classes`
and `spatial_image_shape` are not None, returns a padded batched
tf.data.Dataset.
records. Applies a padded batch to the resulting dataset.
Args:
input_reader_config: A input_reader_pb2.InputReader object.
transform_input_data_fn: Function to apply to all records, or None if
no extra decoding is required.
batch_size: Batch size. If not None, returns a padded batch dataset.
max_num_boxes: Max number of groundtruth boxes needed to computes shapes for
padding. This is only used if batch_size is greater than 1.
batch_size: Batch size. If None, batching is not performed.
max_num_boxes: Max number of groundtruth boxes needed to compute shapes for
padding. If None, will use a dynamic shape.
num_classes: Number of classes in the dataset needed to compute shapes for
padding. This is only used if batch_size is greater than 1.
spatial_image_shape: a list of two integers of the form [height, width]
padding. If None, will use a dynamic shape.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the image after applying
transform_input_data_fn. This is needed to compute shapes for padding and
only used if batch_size is greater than 1.
transform_input_data_fn. If None, will use dynamic shapes.
Returns:
A tf.data.Dataset based on the input_reader_config.
......@@ -116,8 +133,6 @@ def build(input_reader_config, transform_input_data_fn=None,
Raises:
ValueError: On invalid input reader proto.
ValueError: If no input paths are specified.
ValueError: If batch_size > 1 and any of (max_num_boxes, num_classes,
spatial_image_shape) is None.
"""
if not isinstance(input_reader_config, input_reader_pb2.InputReader):
raise ValueError('input_reader_config not of type '
......@@ -147,14 +162,7 @@ def build(input_reader_config, transform_input_data_fn=None,
functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000),
process_fn, config.input_path[:], input_reader_config)
if batch_size > 1:
if num_classes is None:
raise ValueError('`num_classes` must be set when batch_size > 1.')
if max_num_boxes is None:
raise ValueError('`max_num_boxes` must be set when batch_size > 1.')
if spatial_image_shape is None:
raise ValueError('`spatial_image_shape` must be set when batch_size > '
'1 .')
if batch_size:
padding_shapes = _get_padding_shapes(dataset, max_num_boxes, num_classes,
spatial_image_shape)
dataset = dataset.apply(
......
......@@ -91,7 +91,7 @@ class DatasetBuilderTest(tf.test.TestCase):
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(input_reader_proto)).get_next()
dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
......@@ -100,15 +100,15 @@ class DatasetBuilderTest(tf.test.TestCase):
self.assertTrue(
fields.InputDataFields.groundtruth_instance_masks not in output_dict)
self.assertEquals((4, 5, 3),
self.assertEquals((1, 4, 5, 3),
output_dict[fields.InputDataFields.image].shape)
self.assertEquals([2],
output_dict[fields.InputDataFields.groundtruth_classes])
self.assertAllEqual([[2]],
output_dict[fields.InputDataFields.groundtruth_classes])
self.assertEquals(
(1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
(1, 1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
self.assertAllEqual(
[0.0, 0.0, 1.0, 1.0],
output_dict[fields.InputDataFields.groundtruth_boxes][0])
output_dict[fields.InputDataFields.groundtruth_boxes][0][0])
def test_build_tf_record_input_reader_and_load_instance_masks(self):
tf_record_path = self.create_tf_record()
......@@ -124,14 +124,14 @@ class DatasetBuilderTest(tf.test.TestCase):
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(input_reader_proto)).get_next()
dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
sv.start_queue_runners(sess)
output_dict = sess.run(tensor_dict)
self.assertAllEqual(
(1, 4, 5),
(1, 1, 4, 5),
output_dict[fields.InputDataFields.groundtruth_instance_masks].shape)
def test_build_tf_record_input_reader_with_batch_size_two(self):
......
......@@ -36,6 +36,7 @@ from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
# A map of names to SSD feature extractors.
......@@ -43,6 +44,7 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_inception_v2': SSDInceptionV2FeatureExtractor,
'ssd_inception_v3': SSDInceptionV3FeatureExtractor,
'ssd_mobilenet_v1': SSDMobileNetV1FeatureExtractor,
'ssd_mobilenet_v2': SSDMobileNetV2FeatureExtractor,
'ssd_resnet50_v1_fpn': ssd_resnet_v1_fpn.SSDResnet50V1FpnFeatureExtractor,
'ssd_resnet101_v1_fpn': ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
'ssd_resnet152_v1_fpn': ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
......
......@@ -31,6 +31,7 @@ from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
FRCNN_RESNET_FEAT_MAPS = {
......@@ -368,6 +369,81 @@ class ModelBuilderTest(tf.test.TestCase):
self.assertTrue(model._feature_extractor._batch_norm_trainable)
self.assertTrue(model._normalize_loc_loss_by_codesize)
def test_create_ssd_mobilenet_v2_model_from_config(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_mobilenet_v2'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
batch_norm_trainable: true
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FeatureExtractor)
self.assertTrue(model._feature_extractor._batch_norm_trainable)
self.assertTrue(model._normalize_loc_loss_by_codesize)
def test_create_embedded_ssd_mobilenet_v1_model_from_config(self):
model_text_proto = """
ssd {
......
......@@ -85,7 +85,8 @@ def _create_learning_rate(learning_rate_config):
learning_rate_type = learning_rate_config.WhichOneof('learning_rate')
if learning_rate_type == 'constant_learning_rate':
config = learning_rate_config.constant_learning_rate
learning_rate = tf.constant(config.learning_rate, dtype=tf.float32)
learning_rate = tf.constant(config.learning_rate, dtype=tf.float32,
name='learning_rate')
if learning_rate_type == 'exponential_decay_learning_rate':
config = learning_rate_config.exponential_decay_learning_rate
......@@ -94,7 +95,7 @@ def _create_learning_rate(learning_rate_config):
tf.train.get_or_create_global_step(),
config.decay_steps,
config.decay_factor,
staircase=config.staircase)
staircase=config.staircase, name='learning_rate')
if learning_rate_type == 'manual_step_learning_rate':
config = learning_rate_config.manual_step_learning_rate
......@@ -105,7 +106,7 @@ def _create_learning_rate(learning_rate_config):
learning_rate_sequence += [x.learning_rate for x in config.schedule]
learning_rate = learning_schedules.manual_stepping(
tf.train.get_or_create_global_step(), learning_rate_step_boundaries,
learning_rate_sequence)
learning_rate_sequence, config.warmup)
if learning_rate_type == 'cosine_decay_learning_rate':
config = learning_rate_config.cosine_decay_learning_rate
......@@ -114,7 +115,8 @@ def _create_learning_rate(learning_rate_config):
config.learning_rate_base,
config.total_steps,
config.warmup_learning_rate,
config.warmup_steps)
config.warmup_steps,
config.hold_base_rate_steps)
if learning_rate is None:
raise ValueError('Learning_rate %s not supported.' % learning_rate_type)
......
......@@ -35,6 +35,7 @@ class LearningRateBuilderTest(tf.test.TestCase):
text_format.Merge(learning_rate_text_proto, learning_rate_proto)
learning_rate = optimizer_builder._create_learning_rate(
learning_rate_proto)
self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
with self.test_session():
learning_rate_out = learning_rate.eval()
self.assertAlmostEqual(learning_rate_out, 0.004)
......@@ -52,19 +53,22 @@ class LearningRateBuilderTest(tf.test.TestCase):
text_format.Merge(learning_rate_text_proto, learning_rate_proto)
learning_rate = optimizer_builder._create_learning_rate(
learning_rate_proto)
self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
self.assertTrue(isinstance(learning_rate, tf.Tensor))
def testBuildManualStepLearningRate(self):
learning_rate_text_proto = """
manual_step_learning_rate {
initial_learning_rate: 0.002
schedule {
step: 0
step: 100
learning_rate: 0.006
}
schedule {
step: 90000
learning_rate: 0.00006
}
warmup: true
}
"""
learning_rate_proto = optimizer_pb2.LearningRate()
......@@ -80,6 +84,7 @@ class LearningRateBuilderTest(tf.test.TestCase):
total_steps: 20000
warmup_learning_rate: 0.0001
warmup_steps: 1000
hold_base_rate_steps: 20000
}
"""
learning_rate_proto = optimizer_pb2.LearningRate()
......
......@@ -727,21 +727,6 @@ class ConcatenateTest(tf.test.TestCase):
class NonMaxSuppressionTest(tf.test.TestCase):
def test_with_invalid_scores_field(self):
corners = tf.constant([[0, 0, 1, 1],
[0, 0.1, 1, 1.1],
[0, -0.1, 1, 0.9],
[0, 10, 1, 11],
[0, 10.1, 1, 11.1],
[0, 100, 1, 101]], tf.float32)
boxes = box_list.BoxList(corners)
boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5]))
iou_thresh = .5
max_output_size = 3
with self.assertRaisesWithPredicateMatch(ValueError,
'Dimensions must be equal'):
box_list_ops.non_max_suppression(boxes, iou_thresh, max_output_size)
def test_select_from_three_clusters(self):
corners = tf.constant([[0, 0, 1, 1],
[0, 0.1, 1, 1.1],
......
......@@ -275,7 +275,7 @@ class DetectionModel(object):
fields.BoxListFields.keypoints] = groundtruth_keypoints_list
@abstractmethod
def restore_map(self, from_detection_checkpoint=True):
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
Returns a map of variable names to load from a checkpoint to variables in
......@@ -287,9 +287,10 @@ class DetectionModel(object):
the num_classes parameter.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
......
......@@ -122,7 +122,7 @@ def multiclass_non_max_suppression(boxes,
if boundaries is not None:
per_class_boundaries_list = tf.unstack(boundaries, axis=1)
boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
else [0] * num_classes)
else [0] * num_classes.value)
for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
per_class_boxes = per_class_boxes_list[boxes_idx]
boxlist_and_class_scores = box_list.BoxList(per_class_boxes)
......
......@@ -233,7 +233,7 @@ def _rgb_to_grayscale(images, name=None):
rgb_weights = [0.2989, 0.5870, 0.1140]
rank_1 = tf.expand_dims(tf.rank(images) - 1, 0)
gray_float = tf.reduce_sum(
flt_image * rgb_weights, rank_1, keepdims=True)
flt_image * rgb_weights, rank_1, keep_dims=True)
gray_float.set_shape(images.get_shape()[:-1].concatenate([1]))
return tf.image.convert_image_dtype(gray_float, orig_dtype, name=name)
......@@ -1821,8 +1821,10 @@ def random_pad_to_aspect_ratio(image,
max_width = tf.maximum(
max_padded_size_ratio[1] * image_width, target_width)
min_scale = tf.maximum(min_height / target_height, min_width / target_width)
max_scale = tf.minimum(max_height / target_height, max_width / target_width)
min_scale = tf.minimum(
max_scale,
tf.maximum(min_height / target_height, min_width / target_width))
generator_func = functools.partial(tf.random_uniform, [],
min_scale, max_scale, seed=seed)
......@@ -1831,8 +1833,8 @@ def random_pad_to_aspect_ratio(image,
preprocessor_cache.PreprocessorCache.PAD_TO_ASPECT_RATIO,
preprocess_vars_cache)
target_height = scale * target_height
target_width = scale * target_width
target_height = tf.round(scale * target_height)
target_width = tf.round(scale * target_width)
new_image = tf.image.pad_to_bounding_box(
image, 0, 0, tf.to_int32(target_height), tf.to_int32(target_width))
......@@ -2261,14 +2263,14 @@ def resize_image(image,
'ResizeImage',
values=[image, new_height, new_width, method, align_corners]):
new_image = tf.image.resize_images(
image, [new_height, new_width],
image, tf.stack([new_height, new_width]),
method=method,
align_corners=align_corners)
image_shape = shape_utils.combined_static_and_dynamic_shape(image)
result = [new_image]
if masks is not None:
num_instances = tf.shape(masks)[0]
new_size = tf.constant([new_height, new_width], dtype=tf.int32)
new_size = tf.stack([new_height, new_width])
def resize_masks_branch():
new_masks = tf.expand_dims(masks, 3)
new_masks = tf.image.resize_nearest_neighbor(
......
......@@ -1736,6 +1736,41 @@ class PreprocessorTest(tf.test.TestCase):
test_masks=True,
test_keypoints=True)
def testRunRandomPadToAspectRatioWithMinMaxPaddedSizeRatios(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
labels = self.createTestLabels()
tensor_dict = {
fields.InputDataFields.image: image,
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels
}
preprocessor_arg_map = preprocessor.get_default_func_arg_map()
preprocessing_options = [(preprocessor.random_pad_to_aspect_ratio,
{'min_padded_size_ratio': (4.0, 4.0),
'max_padded_size_ratio': (4.0, 4.0)})]
distorted_tensor_dict = preprocessor.preprocess(
tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
distorted_image = distorted_tensor_dict[fields.InputDataFields.image]
distorted_boxes = distorted_tensor_dict[
fields.InputDataFields.groundtruth_boxes]
distorted_labels = distorted_tensor_dict[
fields.InputDataFields.groundtruth_classes]
with self.test_session() as sess:
distorted_image_, distorted_boxes_, distorted_labels_ = sess.run([
distorted_image, distorted_boxes, distorted_labels])
expected_boxes = np.array(
[[0.0, 0.125, 0.1875, 0.5], [0.0625, 0.25, 0.1875, 0.5]],
dtype=np.float32)
self.assertAllEqual(distorted_image_.shape, [1, 800, 800, 3])
self.assertAllEqual(distorted_labels_, [1, 2])
self.assertAllClose(distorted_boxes_.flatten(),
expected_boxes.flatten())
def testRunRandomPadToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -2118,6 +2153,33 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(out_image_shape, expected_image_shape)
self.assertAllEqual(out_masks_shape, expected_mask_shape)
def testResizeImageWithMasksTensorInputHeightAndWidth(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
in_masks_shape_list = [[15, 60, 40], [10, 15, 30]]
height = tf.constant(50, dtype=tf.int32)
width = tf.constant(100, dtype=tf.int32)
expected_image_shape_list = [[50, 100, 3], [50, 100, 3]]
expected_masks_shape_list = [[15, 50, 100], [10, 50, 100]]
for (in_image_shape, expected_image_shape, in_masks_shape,
expected_mask_shape) in zip(in_image_shape_list,
expected_image_shape_list,
in_masks_shape_list,
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks, _ = preprocessor.resize_image(
in_image, in_masks, new_height=height, new_width=width)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
with self.test_session() as sess:
out_image_shape, out_masks_shape = sess.run(
[out_image_shape, out_masks_shape])
self.assertAllEqual(out_image_shape, expected_image_shape)
self.assertAllEqual(out_masks_shape, expected_mask_shape)
def testResizeImageWithNoInstanceMask(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
......
......@@ -31,6 +31,44 @@ from object_detection.utils import label_map_util
slim_example_decoder = tf.contrib.slim.tfexample_decoder
# TODO(lzc): keep LookupTensor and BackupHandler in sync with
# tf.contrib.slim.tfexample_decoder version.
class LookupTensor(slim_example_decoder.Tensor):
"""An ItemHandler that returns a parsed Tensor, the result of a lookup."""
def __init__(self,
tensor_key,
table,
shape_keys=None,
shape=None,
default_value=''):
"""Initializes the LookupTensor handler.
Simply calls a vocabulary (most often, a label mapping) lookup.
Args:
tensor_key: the name of the `TFExample` feature to read the tensor from.
table: A tf.lookup table.
shape_keys: Optional name or list of names of the TF-Example feature in
which the tensor shape is stored. If a list, then each corresponds to
one dimension of the shape.
shape: Optional output shape of the `Tensor`. If provided, the `Tensor` is
reshaped accordingly.
default_value: The value used when the `tensor_key` is not found in a
particular `TFExample`.
Raises:
ValueError: if both `shape_keys` and `shape` are specified.
"""
self._table = table
super(LookupTensor, self).__init__(tensor_key, shape_keys, shape,
default_value)
def tensors_to_item(self, keys_to_tensors):
unmapped_tensor = super(LookupTensor, self).tensors_to_item(keys_to_tensors)
return self._table.lookup(unmapped_tensor)
class BackupHandler(slim_example_decoder.ItemHandler):
"""An ItemHandler that tries two ItemHandlers in order."""
......@@ -207,8 +245,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
# switch back to slim_example_decoder.BackupHandler once tf 1.5 becomes
# more popular.
label_handler = BackupHandler(
slim_example_decoder.LookupTensor(
'image/object/class/text', table, default_value=''),
LookupTensor('image/object/class/text', table, default_value=''),
slim_example_decoder.Tensor('image/object/class/label'))
else:
label_handler = slim_example_decoder.Tensor('image/object/class/label')
......
......@@ -108,8 +108,8 @@ class TfExampleDecoderTest(tf.test.TestCase):
}
backup_handler = tf_example_decoder.BackupHandler(
handler=slim_example_decoder.Tensor('image/object/class/label'),
backup=slim_example_decoder.LookupTensor('image/object/class/text',
table))
backup=tf_example_decoder.LookupTensor('image/object/class/text',
table))
items_to_handlers = {
'labels': backup_handler,
}
......@@ -128,6 +128,37 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllClose([2, 0, 1], obtained_class_ids_each_example[1])
self.assertAllClose([42, 10, 901], obtained_class_ids_each_example[2])
def testDecodeExampleWithBranchedLookup(self):
example = example_pb2.Example(features=feature_pb2.Features(feature={
'image/object/class/text': self._BytesFeatureFromList(
np.array(['cat', 'dog', 'guinea pig'])),
}))
serialized_example = example.SerializeToString()
# 'dog' -> 0, 'guinea pig' -> 1, 'cat' -> 2
table = lookup_ops.index_table_from_tensor(
constant_op.constant(['dog', 'guinea pig', 'cat']))
with self.test_session() as sess:
sess.run(lookup_ops.tables_initializer())
serialized_example = array_ops.reshape(serialized_example, shape=[])
keys_to_features = {
'image/object/class/text': parsing_ops.VarLenFeature(dtypes.string),
}
items_to_handlers = {
'labels':
tf_example_decoder.LookupTensor('image/object/class/text', table),
}
decoder = slim_example_decoder.TFExampleDecoder(keys_to_features,
items_to_handlers)
obtained_class_ids = decoder.decode(serialized_example)[0].eval()
self.assertAllClose([2, 0, 1], obtained_class_ids)
def testDecodeJpegImage(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......
......@@ -58,10 +58,10 @@ tf.app.flags.DEFINE_string('output_path', '', 'Path to which TFRecord files'
'will be located at: <output_path>_train.tfrecord.'
'And the TFRecord with the validation set will be'
'located at: <output_path>_val.tfrecord')
tf.app.flags.DEFINE_list('classes_to_use', ['car', 'pedestrian', 'dontcare'],
'Which classes of bounding boxes to use. Adding the'
'dontcare class will remove all bboxs in the dontcare'
'regions.')
tf.app.flags.DEFINE_string('classes_to_use', 'car,pedestrian,dontcare',
'Comma separated list of class names that will be'
'used. Adding the dontcare class will remove all'
'bboxs in the dontcare regions.')
tf.app.flags.DEFINE_string('label_map_path', 'data/kitti_label_map.pbtxt',
'Path to label map proto.')
tf.app.flags.DEFINE_integer('validation_set_size', '500', 'Number of images to'
......@@ -302,7 +302,7 @@ def main(_):
convert_kitti_to_tfrecords(
data_dir=FLAGS.data_dir,
output_path=FLAGS.output_path,
classes_to_use=FLAGS.classes_to_use,
classes_to_use=FLAGS.classes_to_use.split(','),
label_map_path=FLAGS.label_map_path,
validation_set_size=FLAGS.validation_set_size)
......
......@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common functions for repeatedly evaluating a checkpoint."""
"""Common utility functions for evaluation."""
import collections
import logging
import os
import time
......@@ -24,6 +25,7 @@ from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import keypoint_ops
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import label_map_util
from object_detection.utils import ops
from object_detection.utils import visualization_utils as vis_utils
......@@ -201,8 +203,9 @@ def _run_checkpoint_once(tensor_dict,
num_batches=1,
master='',
save_graph=False,
save_graph_dir=''):
"""Evaluates metrics defined in evaluators.
save_graph_dir='',
losses_dict=None):
"""Evaluates metrics defined in evaluators and returns summaries.
This function loads the latest checkpoint in checkpoint_dirs and evaluates
all metrics defined in evaluators. The metrics are processed in batch by the
......@@ -240,6 +243,7 @@ def _run_checkpoint_once(tensor_dict,
save_graph: whether or not the Tensorflow graph is stored as a pbtxt file.
save_graph_dir: where to store the Tensorflow graph on disk. If save_graph
is True this must be non-empty.
losses_dict: optional dictionary of scalar detection losses.
Returns:
global_step: the count of global steps.
......@@ -269,6 +273,7 @@ def _run_checkpoint_once(tensor_dict,
tf.train.write_graph(sess.graph_def, save_graph_dir, 'eval.pbtxt')
counters = {'skipped': 0, 'success': 0}
aggregate_result_losses_dict = collections.defaultdict(list)
with tf.contrib.slim.queues.QueueRunners(sess):
try:
for batch in range(int(num_batches)):
......@@ -276,16 +281,22 @@ def _run_checkpoint_once(tensor_dict,
logging.info('Running eval ops batch %d/%d', batch + 1, num_batches)
if not batch_processor:
try:
result_dict = sess.run(tensor_dict)
if not losses_dict:
losses_dict = {}
result_dict, result_losses_dict = sess.run([tensor_dict,
losses_dict])
counters['success'] += 1
except tf.errors.InvalidArgumentError:
logging.info('Skipping image')
counters['skipped'] += 1
result_dict = {}
else:
result_dict = batch_processor(tensor_dict, sess, batch, counters)
result_dict, result_losses_dict = batch_processor(
tensor_dict, sess, batch, counters, losses_dict=losses_dict)
if not result_dict:
continue
for key, value in iter(result_losses_dict.items()):
aggregate_result_losses_dict[key].append(value)
for evaluator in evaluators:
# TODO(b/65130867): Use image_id tensor once we fix the input data
# decoders to return correct image_id.
......@@ -310,6 +321,9 @@ def _run_checkpoint_once(tensor_dict,
raise ValueError('Metric names between evaluators must not collide.')
all_evaluator_metrics.update(metrics)
global_step = tf.train.global_step(sess, tf.train.get_global_step())
for key, value in iter(aggregate_result_losses_dict.items()):
all_evaluator_metrics['Losses/' + key] = np.mean(value)
sess.close()
return (global_step, all_evaluator_metrics)
......@@ -327,7 +341,8 @@ def repeated_checkpoint_run(tensor_dict,
max_number_of_evaluations=None,
master='',
save_graph=False,
save_graph_dir=''):
save_graph_dir='',
losses_dict=None):
"""Periodically evaluates desired tensors using checkpoint_dirs or restore_fn.
This function repeatedly loads a checkpoint and evaluates a desired
......@@ -367,6 +382,7 @@ def repeated_checkpoint_run(tensor_dict,
save_graph: whether or not the Tensorflow graph is saved as a pbtxt file.
save_graph_dir: where to save on disk the Tensorflow graph. If store_graph
is True this must be non-empty.
losses_dict: optional dictionary of scalar detection losses.
Returns:
metrics: A dictionary containing metric names and values in the latest
......@@ -404,7 +420,8 @@ def repeated_checkpoint_run(tensor_dict,
variables_to_restore,
restore_fn, num_batches,
master, save_graph,
save_graph_dir)
save_graph_dir,
losses_dict=losses_dict)
write_metrics(metrics, global_step, summary_dir)
number_of_evaluations += 1
......@@ -432,7 +449,7 @@ def result_dict_for_single_example(image,
have label 1.
Args:
image: A single 4D image tensor of shape [1, H, W, C].
image: A single 4D uint8 image tensor of shape [1, H, W, C].
key: A single string tensor identifying the image.
detections: A dictionary of detections, returned from
DetectionModel.postprocess().
......@@ -479,7 +496,7 @@ def result_dict_for_single_example(image,
"""
label_id_offset = 1 # Applying label id offset (b/63711816)
input_data_fields = fields.InputDataFields()
input_data_fields = fields.InputDataFields
output_dict = {
input_data_fields.original_image: image,
input_data_fields.key: key,
......@@ -488,10 +505,6 @@ def result_dict_for_single_example(image,
detection_fields = fields.DetectionResultFields
detection_boxes = detections[detection_fields.detection_boxes][0]
image_shape = tf.shape(image)
if scale_to_absolute:
absolute_detection_boxlist = box_list_ops.to_absolute_coordinates(
box_list.BoxList(detection_boxes), image_shape[1], image_shape[2])
detection_boxes = absolute_detection_boxlist.get()
detection_scores = detections[detection_fields.detection_scores][0]
if class_agnostic:
......@@ -508,7 +521,14 @@ def result_dict_for_single_example(image,
detection_classes, begin=[0], size=[num_detections])
detection_scores = tf.slice(
detection_scores, begin=[0], size=[num_detections])
output_dict[detection_fields.detection_boxes] = detection_boxes
if scale_to_absolute:
absolute_detection_boxlist = box_list_ops.to_absolute_coordinates(
box_list.BoxList(detection_boxes), image_shape[1], image_shape[2])
output_dict[detection_fields.detection_boxes] = (
absolute_detection_boxlist.get())
else:
output_dict[detection_fields.detection_boxes] = detection_boxes
output_dict[detection_fields.detection_classes] = detection_classes
output_dict[detection_fields.detection_scores] = detection_scores
......@@ -550,3 +570,69 @@ def result_dict_for_single_example(image,
output_dict[input_data_fields.groundtruth_classes] = groundtruth_classes
return output_dict
def get_eval_metric_ops_for_evaluators(evaluation_metrics,
categories,
eval_dict,
include_metrics_per_category=False):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Args:
evaluation_metrics: List of evaluation metric names. Current options are
'coco_detection_metrics' and 'coco_mask_metrics'.
categories: A list of dicts, each of which has the following keys -
'id': (required) an integer id uniquely identifying this category.
'name': (required) string representing category name e.g., 'cat', 'dog'.
eval_dict: An evaluation dictionary, returned from
result_dict_for_single_example().
include_metrics_per_category: If True, include metrics for each category.
Returns:
A dictionary of metric names to tuple of value_op and update_op that can be
used as eval metric ops in tf.EstimatorSpec.
Raises:
ValueError: If any of the metrics in `evaluation_metric` is not
'coco_detection_metrics' or 'coco_mask_metrics'.
"""
evaluation_metrics = list(set(evaluation_metrics))
input_data_fields = fields.InputDataFields
detection_fields = fields.DetectionResultFields
eval_metric_ops = {}
for metric in evaluation_metrics:
if metric == 'coco_detection_metrics':
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
categories, include_metrics_per_category=include_metrics_per_category)
eval_metric_ops.update(
coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[
input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes]))
elif metric == 'coco_mask_metrics':
coco_mask_evaluator = coco_evaluation.CocoMaskEvaluator(
categories, include_metrics_per_category=include_metrics_per_category)
eval_metric_ops.update(
coco_mask_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[
input_data_fields.groundtruth_classes],
groundtruth_instance_masks=eval_dict[
input_data_fields.groundtruth_instance_masks],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes],
detection_masks=eval_dict[detection_fields.detection_masks]))
else:
raise ValueError('The only evaluation metrics supported are '
'"coco_detection_metrics" and "coco_mask_metrics". '
'Found {} in the evaluation metrics'.format(metric))
return eval_metric_ops
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for eval_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection import eval_util
from object_detection.core import standard_fields as fields
class EvalUtilTest(tf.test.TestCase):
def _get_categories_list(self):
return [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'dog'},
{'id': 2, 'name': 'cat'}]
def _make_evaluation_dict(self):
input_data_fields = fields.InputDataFields
detection_fields = fields.DetectionResultFields
image = tf.zeros(shape=[1, 20, 20, 3], dtype=tf.uint8)
key = tf.constant('image1')
detection_boxes = tf.constant([[[0., 0., 1., 1.]]])
detection_scores = tf.constant([[0.8]])
detection_classes = tf.constant([[0]])
detection_masks = tf.ones(shape=[1, 1, 20, 20], dtype=tf.float32)
num_detections = tf.constant([1])
groundtruth_boxes = tf.constant([[0., 0., 1., 1.]])
groundtruth_classes = tf.constant([1])
groundtruth_instance_masks = tf.ones(shape=[1, 20, 20], dtype=tf.uint8)
detections = {
detection_fields.detection_boxes: detection_boxes,
detection_fields.detection_scores: detection_scores,
detection_fields.detection_classes: detection_classes,
detection_fields.detection_masks: detection_masks,
detection_fields.num_detections: num_detections
}
groundtruth = {
input_data_fields.groundtruth_boxes: groundtruth_boxes,
input_data_fields.groundtruth_classes: groundtruth_classes,
input_data_fields.groundtruth_instance_masks: groundtruth_instance_masks
}
return eval_util.result_dict_for_single_example(image, key, detections,
groundtruth)
def test_get_eval_metric_ops_for_coco_detections(self):
evaluation_metrics = ['coco_detection_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
_, update_op = metric_ops['DetectionBoxes_Precision/mAP']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op)
metrics = sess.run(metrics)
print(metrics)
self.assertAlmostEqual(1.0, metrics['DetectionBoxes_Precision/mAP'])
self.assertNotIn('DetectionMasks_Precision/mAP', metrics)
def test_get_eval_metric_ops_for_coco_detections_and_masks(self):
evaluation_metrics = ['coco_detection_metrics',
'coco_mask_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
_, update_op_boxes = metric_ops['DetectionBoxes_Precision/mAP']
_, update_op_masks = metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op_boxes)
sess.run(update_op_masks)
metrics = sess.run(metrics)
self.assertAlmostEqual(1.0, metrics['DetectionBoxes_Precision/mAP'])
self.assertAlmostEqual(1.0, metrics['DetectionMasks_Precision/mAP'])
def test_get_eval_metric_ops_raises_error_with_unsupported_metric(self):
evaluation_metrics = ['unsupported_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
with self.assertRaises(ValueError):
eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
if __name__ == '__main__':
tf.test.main()
......@@ -50,10 +50,10 @@ EVAL_METRICS_CLASS_DICT = {
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
def _extract_prediction_tensors(model,
create_input_dict_fn,
ignore_groundtruth=False):
"""Restores the model in a tensorflow session.
def _extract_predictions_and_losses(model,
create_input_dict_fn,
ignore_groundtruth=False):
"""Constructs tensorflow detection graph and returns output tensors.
Args:
model: model to perform predictions with.
......@@ -61,7 +61,11 @@ def _extract_prediction_tensors(model,
ignore_groundtruth: whether groundtruth should be ignored.
Returns:
tensor_dict: A tensor dictionary with evaluations.
prediction_groundtruth_dict: A dictionary with postprocessed tensors (keyed
by standard_fields.DetectionResultsFields) and optional groundtruth
tensors (keyed by standard_fields.InputDataFields).
losses_dict: A dictionary containing detection losses. This is empty when
ignore_groundtruth is true.
"""
input_dict = create_input_dict_fn()
prefetch_queue = prefetcher.prefetch(input_dict, capacity=500)
......@@ -73,6 +77,7 @@ def _extract_prediction_tensors(model,
detections = model.postprocess(prediction_dict, true_image_shapes)
groundtruth = None
losses_dict = {}
if not ignore_groundtruth:
groundtruth = {
fields.InputDataFields.groundtruth_boxes:
......@@ -92,8 +97,14 @@ def _extract_prediction_tensors(model,
if fields.DetectionResultFields.detection_masks in detections:
groundtruth[fields.InputDataFields.groundtruth_instance_masks] = (
input_dict[fields.InputDataFields.groundtruth_instance_masks])
return eval_util.result_dict_for_single_example(
label_id_offset = 1
model.provide_groundtruth(
[input_dict[fields.InputDataFields.groundtruth_boxes]],
[tf.one_hot(input_dict[fields.InputDataFields.groundtruth_classes]
- label_id_offset, depth=model.num_classes)])
losses_dict.update(model.loss(prediction_dict, true_image_shapes))
result_dict = eval_util.result_dict_for_single_example(
original_image,
input_dict[fields.InputDataFields.source_id],
detections,
......@@ -101,6 +112,7 @@ def _extract_prediction_tensors(model,
class_agnostic=(
fields.DetectionResultFields.detection_classes not in detections),
scale_to_absolute=True)
return result_dict, losses_dict
def get_evaluators(eval_config, categories):
......@@ -157,13 +169,14 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
logging.fatal('If ignore_groundtruth=True then an export_path is '
'required. Aborting!!!')
tensor_dict = _extract_prediction_tensors(
tensor_dict, losses_dict = _extract_predictions_and_losses(
model=model,
create_input_dict_fn=create_input_dict_fn,
ignore_groundtruth=eval_config.ignore_groundtruth)
def _process_batch(tensor_dict, sess, batch_index, counters):
"""Evaluates tensors in tensor_dict, visualizing the first K examples.
def _process_batch(tensor_dict, sess, batch_index, counters,
losses_dict=None):
"""Evaluates tensors in tensor_dict, losses_dict and visualizes examples.
This function calls sess.run on tensor_dict, evaluating the original_image
tensor only on the first K examples and visualizing detections overlaid
......@@ -177,12 +190,17 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
be updated to keep track of number of successful and failed runs,
respectively. If these fields are not updated, then the success/skipped
counter values shown at the end of evaluation will be incorrect.
losses_dict: Optional dictonary of scalar loss tensors.
Returns:
result_dict: a dictionary of numpy arrays
result_losses_dict: a dictionary of scalar losses. This is empty if input
losses_dict is None.
"""
try:
result_dict = sess.run(tensor_dict)
if not losses_dict:
losses_dict = {}
result_dict, result_losses_dict = sess.run([tensor_dict, losses_dict])
counters['success'] += 1
except tf.errors.InvalidArgumentError:
logging.info('Skipping image')
......@@ -207,7 +225,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
skip_labels=eval_config.skip_labels,
keep_image_id_for_visualization_export=eval_config.
keep_image_id_for_visualization_export)
return result_dict
return result_dict, result_losses_dict
variables_to_restore = tf.global_variables()
global_step = tf.train.get_or_create_global_step()
......@@ -242,6 +260,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
if eval_config.max_evals else None),
master=eval_config.eval_master,
save_graph=eval_config.save_graph,
save_graph_dir=(eval_dir if eval_config.save_graph else ''))
save_graph_dir=(eval_dir if eval_config.save_graph else ''),
losses_dict=losses_dict)
return metrics
......@@ -62,7 +62,7 @@ class FakeModel(model.DetectionModel):
np.arange(64).reshape([2, 2, 4, 4]), tf.float32)
return postprocessed_tensors
def restore_map(self, checkpoint_path, from_detection_checkpoint):
def restore_map(self, checkpoint_path, fine_tune_checkpoint_type):
pass
def loss(self, prediction_dict, true_image_shapes):
......
......@@ -6,10 +6,14 @@ introduced in tensorflow 1.5.0 so runing with earlier versions may cause this
issue. It now has been replaced by
object_detection.data_decoders.tf_example_decoder.BackupHandler. Whoever sees
this issue should be able to resolve it by syncing your fork to HEAD.
Same for LookupTensor.
## Q: AttributeError: 'module' object has no attribute 'LookupTensor'
A: Similar to BackupHandler, syncing your fork to HEAD should make it work.
## Q: Why can't I get the inference time as reported in model zoo?
A: The inference time reported in model zoo is mean time of testing hundreds of
images with a internal machine. As mentioned in
images with an internal machine. As mentioned in
[Tensorflow detection model zoo](detection_model_zoo.md), this speed depends
highly on one's specific hardware configuration and should be treated more as
relative timing.
......@@ -40,6 +40,11 @@ HASH_KEY = 'hash'
HASH_BINS = 1 << 31
SERVING_FED_EXAMPLE_KEY = 'serialized_example'
# A map of names to methods that help build the input pipeline.
INPUT_BUILDER_UTIL_MAP = {
'dataset_build': dataset_builder.build,
}
def transform_input_data(tensor_dict,
model_preprocess_fn,
......@@ -229,7 +234,7 @@ def create_train_input_fn(train_config, train_input_config,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn)
dataset = dataset_builder.build(
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size,
......@@ -341,8 +346,13 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=True)
dataset = dataset_builder.build(eval_input_config,
transform_input_data_fn=transform_data_fn)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=1,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
hash_from_source_id = tf.string_to_hash_bucket_fast(
......@@ -374,16 +384,6 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
labels[fields.InputDataFields.groundtruth_instance_masks] = input_dict[
fields.InputDataFields.groundtruth_instance_masks]
# Add a batch dimension to the tensors.
features = {
key: tf.expand_dims(features[key], axis=0)
for key, feature in features.items()
}
labels = {
key: tf.expand_dims(labels[key], axis=0)
for key, label in labels.items()
}
return features, labels
return _eval_input_fn
......@@ -426,9 +426,13 @@ def create_predict_input_fn(model_config):
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
return tf.estimator.export.ServingInputReceiver(
features={fields.InputDataFields.image: images},
features={
fields.InputDataFields.image: images,
fields.InputDataFields.true_image_shape: true_image_shape},
receiver_tensors={SERVING_FED_EXAMPLE_KEY: example})
return _predict_input_fn
......@@ -64,24 +64,24 @@ class InputsTest(tf.test.TestCase):
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([None, None, 3],
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([],
self.assertAllEqual([1],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[None, 4],
[1, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[None, model_config.faster_rcnn.num_classes],
[1, 50, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[None],
[1, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
......
......@@ -159,6 +159,7 @@ class FasterRCNNFeatureExtractor(object):
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping activation tensor names to tensors.
"""
with tf.variable_scope(scope, values=[preprocessed_inputs]):
return self._extract_proposal_features(preprocessed_inputs, scope)
......@@ -906,7 +907,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape: A 1-D tensor representing the input image shape.
"""
image_shape = tf.shape(preprocessed_inputs)
rpn_features_to_crop = self._feature_extractor.extract_proposal_features(
rpn_features_to_crop, _ = self._feature_extractor.extract_proposal_features(
preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
feature_map_shape = tf.shape(rpn_features_to_crop)
......@@ -1649,14 +1650,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.reduce_sum(localization_losses, axis=1) / normalizer)
objectness_loss = tf.reduce_mean(
tf.reduce_sum(objectness_losses, axis=1) / normalizer)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['first_stage_localization_loss'] = (
self._first_stage_loc_loss_weight * localization_loss)
with tf.name_scope('objectness_loss'):
loss_dict['first_stage_objectness_loss'] = (
self._first_stage_obj_loss_weight * objectness_loss)
localization_loss = tf.multiply(self._first_stage_loc_loss_weight,
localization_loss,
name='localization_loss')
objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
objectness_loss, name='objectness_loss')
loss_dict = {localization_loss.op.name: localization_loss,
objectness_loss.op.name: objectness_loss}
return loss_dict
def _loss_box_classifier(self,
......@@ -1782,15 +1783,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
) = self._unpad_proposals_and_apply_hard_mining(
proposal_boxlists, second_stage_loc_losses,
second_stage_cls_losses, num_proposals)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['second_stage_localization_loss'] = (
self._second_stage_loc_loss_weight * second_stage_loc_loss)
localization_loss = tf.multiply(self._second_stage_loc_loss_weight,
second_stage_loc_loss,
name='localization_loss')
with tf.name_scope('classification_loss'):
loss_dict['second_stage_classification_loss'] = (
self._second_stage_cls_loss_weight * second_stage_cls_loss)
classification_loss = tf.multiply(self._second_stage_cls_loss_weight,
second_stage_cls_loss,
name='classification_loss')
loss_dict = {localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss}
second_stage_mask_loss = None
if prediction_masks is not None:
if groundtruth_masks_list is None:
......@@ -1857,9 +1859,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.boolean_mask(second_stage_mask_losses, paddings_indicator))
if second_stage_mask_loss is not None:
with tf.name_scope('mask_loss'):
loss_dict['second_stage_mask_loss'] = (
self._second_stage_mask_loss_weight * second_stage_mask_loss)
mask_loss = tf.multiply(self._second_stage_mask_loss_weight,
second_stage_mask_loss, name='mask_loss')
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _padded_batched_proposals_indicator(self,
......@@ -1927,26 +1929,32 @@ class FasterRCNNMetaArch(model.DetectionModel):
decoded_boxlist_list=[proposal_boxlist])
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training. Default
True.
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the feature extractor scopes are included. Default False.
`fine_tune_checkpoint_type` is `detection`). If False, only variables
within the feature extractor scopes are included. Default False.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if not from_detection_checkpoint:
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self.first_stage_feature_extractor_scope,
self.second_stage_feature_extractor_scope)
......
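The loss naming change above replaces hand-written dictionary keys: each weighted loss tensor is created with tf.multiply inside the surrounding name scopes, and its op.name becomes the key. A minimal sketch of the resulting names (the 'Loss'/'RPNLoss' scopes mirror the keys the updated tests below assert on):

import tensorflow as tf

with tf.Graph().as_default():
  raw_localization_loss = tf.constant(0.5)
  with tf.name_scope('Loss'):
    with tf.name_scope('RPNLoss'):
      localization_loss = tf.multiply(
          2.0, raw_localization_loss, name='localization_loss')
  # Keying the dict by op.name keeps the loss key and the scoped tensor name
  # in agreement.
  loss_dict = {localization_loss.op.name: localization_loss}
  print(list(loss_dict.keys()))  # ['Loss/RPNLoss/localization_loss']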
......@@ -47,8 +47,9 @@ class FakeFasterRCNNFeatureExtractor(
def _extract_proposal_features(self, preprocessed_inputs, scope):
with tf.variable_scope('mock_model'):
return 0 * slim.conv2d(preprocessed_inputs,
num_outputs=3, kernel_size=1, scope='layer1')
proposal_features = 0 * slim.conv2d(
preprocessed_inputs, num_outputs=3, kernel_size=1, scope='layer1')
return proposal_features, {}
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
with tf.variable_scope('mock_model'):
......@@ -792,10 +793,12 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loss_dict = model.loss(prediction_dict, true_image_shapes)
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertTrue('second_stage_localization_loss' not in loss_dict_out)
self.assertTrue('second_stage_classification_loss' not in loss_dict_out)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertTrue('Loss/BoxClassifierLoss/localization_loss'
not in loss_dict_out)
self.assertTrue('Loss/BoxClassifierLoss/classification_loss'
not in loss_dict_out)
# TODO(rathodv): Split test into two - with and without masks.
def test_loss_full(self):
......@@ -890,11 +893,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals(self):
model = self._build_model(
......@@ -978,11 +983,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_multiple_label_groundtruth(self):
model = self._build_model(
......@@ -1074,11 +1081,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self):
model = self._build_model(
......@@ -1173,12 +1182,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'],
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_loss_with_hard_mining(self):
model = self._build_model(is_training=True,
......@@ -1263,9 +1273,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_restore_map_for_classification_ckpt(self):
# Define mock tensorflow classification graph and save variables.
......@@ -1296,7 +1307,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_classification) as sess:
......@@ -1338,7 +1349,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
prediction_dict2 = model2.predict(preprocessed_inputs2, true_image_shapes)
model2.postprocess(prediction_dict2, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model2.restore_map(from_detection_checkpoint=True)
var_map = model2.restore_map(fine_tune_checkpoint_type='detection')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_detection2) as sess:
......@@ -1366,7 +1377,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
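For callers, the migration from the deprecated boolean to the string field is mechanical. A usage sketch (model, sess and checkpoint_path stand in for objects created elsewhere):

# Previously restore_map(from_detection_checkpoint=False):
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
# Previously restore_map(from_detection_checkpoint=True):
var_map = model.restore_map(fine_tune_checkpoint_type='detection')
# load_all_detection_checkpoint_vars keeps its old meaning and only matters in
# the 'detection' case; any other string raises
# ValueError('Not supported fine_tune_checkpoint_type: ...').
saver = tf.train.Saver(var_map)
saver.restore(sess, checkpoint_path)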
......@@ -503,7 +503,7 @@ class SSDMetaArch(model.DetectionModel):
self.groundtruth_lists(fields.BoxListFields.classes),
keypoints, weights)
if self._add_summaries:
self._summarize_input(
self._summarize_target_assignment(
self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
......@@ -538,19 +538,20 @@ class SSDMetaArch(model.DetectionModel):
normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
1.0)
with tf.name_scope('localization_loss'):
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = ((self._localization_loss_weight / (
localization_loss_normalizer)) * localization_loss)
with tf.name_scope('classification_loss'):
classification_loss = ((self._classification_loss_weight / normalizer) *
classification_loss)
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = tf.multiply((self._localization_loss_weight /
localization_loss_normalizer),
localization_loss,
name='localization_loss')
classification_loss = tf.multiply((self._classification_loss_weight /
normalizer), classification_loss,
name='classification_loss')
loss_dict = {
'localization_loss': localization_loss,
'classification_loss': classification_loss
localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss
}
return loss_dict
......@@ -615,7 +616,7 @@ class SSDMetaArch(model.DetectionModel):
self._target_assigner, self.anchors, groundtruth_boxlists,
groundtruth_classes_with_background_list, groundtruth_weights_list)
def _summarize_input(self, groundtruth_boxes_list, match_list):
def _summarize_target_assignment(self, groundtruth_boxes_list, match_list):
"""Creates tensorflow summaries for the input boxes and anchors.
This function creates four summaries corresponding to the average
......@@ -639,14 +640,18 @@ class SSDMetaArch(model.DetectionModel):
[match.num_unmatched_columns() for match in match_list])
ignored_anchors_per_image = tf.stack(
[match.num_ignored_columns() for match in match_list])
tf.summary.scalar('Input/AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)))
tf.summary.scalar('Input/AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)))
tf.summary.scalar('Input/AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)))
tf.summary.scalar('Input/AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)))
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)),
family='TargetAssignment')
def _apply_hard_mining(self, location_losses, cls_losses, prediction_dict,
match_list):
......@@ -731,16 +736,17 @@ class SSDMetaArch(model.DetectionModel):
return decoded_boxes, decoded_keypoints
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the appropriate scopes are included. Default False.
......@@ -748,15 +754,22 @@ class SSDMetaArch(model.DetectionModel):
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if from_detection_checkpoint and load_all_detection_checkpoint_vars:
if (fine_tune_checkpoint_type == 'detection' and
load_all_detection_checkpoint_vars):
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
if not from_detection_checkpoint:
if fine_tune_checkpoint_type == 'classification':
var_name = (
re.split('^' + self._extract_features_scope + '/',
var_name)[-1])
......
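The target assignment summaries above now pass family='TargetAssignment' to tf.summary.scalar instead of baking an 'Input/' prefix into the tag, so the four scalars are grouped under one family on tensorboard. A minimal sketch of the call:

import tensorflow as tf

num_boxes_per_image = tf.constant([2.0, 5.0, 3.0])
# The scalar is grouped under the 'TargetAssignment' family on tensorboard.
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
                  tf.reduce_mean(num_boxes_per_image),
                  family='TargetAssignment')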
......@@ -72,6 +72,13 @@ class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
return 4
def _get_value_for_matching_key(dictionary, suffix):
for key in dictionary.keys():
if key.endswith(suffix):
return dictionary[key]
raise ValueError('key not found {}'.format(suffix))
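# Illustrative use of the helper above (keys hypothetical): loss dictionary
# keys are now full op names whose scope prefix depends on the graph, so the
# tests below match them by suffix instead of hard-coding the exact key, e.g.
#   loss_dict = {'SomeScope/Loss/localization_loss': tf.constant(0.0)}
#   loc_loss = _get_value_for_matching_key(loss_dict, 'Loss/localization_loss')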
class SsdMetaArchTest(test_case.TestCase):
def _create_model(self, apply_hard_mining=True,
......@@ -270,7 +277,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -305,7 +314,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'],)
return (_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),)
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -335,7 +344,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -366,7 +377,7 @@ class SsdMetaArchTest(test_case.TestCase):
sess.run(init_op)
saved_model_path = saver.save(sess, save_path)
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -402,7 +413,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertNotIn('another_variable', var_map)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -423,7 +434,7 @@ class SsdMetaArchTest(test_case.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
......@@ -100,6 +100,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_is_crowd=groundtruth_is_crowd))
self._annotation_id += groundtruth_dict[standard_fields.InputDataFields.
groundtruth_boxes].shape[0]
# Boolean to indicate whether a detection has been added for this image.
self._image_ids[image_id] = False
def add_single_detected_image_info(self,
......@@ -120,9 +121,6 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
[num_boxes] containing detection scores for the boxes.
DetectionResultFields.detection_classes: integer numpy array of shape
[num_boxes] containing 1-indexed detection classes for the boxes.
DetectionResultFields.detection_masks: optional uint8 numpy array of
shape [num_boxes, image_height, image_width] containing instance
masks for the boxes.
Raises:
ValueError: If groundtruth for the image_id is not available.
......@@ -200,7 +198,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
all_metrics_per_category=self._all_metrics_per_category)
box_metrics.update(box_per_category_ap)
box_metrics = {'DetectionBoxes_'+ key: value
for key, value in box_metrics.iteritems()}
for key, value in iter(box_metrics.items())}
return box_metrics
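# Note on the iteration change above: dict.iteritems() does not exist in
# Python 3, while dict.items() exists in both versions (a list in Python 2, a
# view in Python 3), so building the prefixed dictionary from items() keeps
# this comprehension working under either interpreter.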
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
......@@ -282,6 +280,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
......@@ -292,7 +291,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
def _check_mask_type_and_value(array_name, masks):
"""Checks whether mask dtype is uint8 anf the values are either 0 or 1."""
"""Checks whether mask dtype is uint8 and the values are either 0 or 1."""
if masks.dtype != np.uint8:
raise ValueError('{} must be of type np.uint8. Found {}.'.format(
array_name, masks.dtype))
......@@ -334,6 +333,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
If the image has already been added, a warning is logged, and groundtruth is
ignored.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary containing -
......@@ -379,6 +381,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
detections_dict):
"""Adds detections for a single image to be used for evaluation.
If a detection has already been added for this image id, a warning is
logged, and the detection is skipped.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary containing -
......@@ -435,25 +440,25 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
A dictionary holding -
1. summary_metrics:
'Precision/mAP': mean average precision over classes averaged over IOU
thresholds ranging from .5 to .95 with .05 increments
'Precision/mAP@.50IOU': mean average precision at 50% IOU
'Precision/mAP@.75IOU': mean average precision at 75% IOU
'Precision/mAP (small)': mean average precision for small objects
(area < 32^2 pixels)
'Precision/mAP (medium)': mean average precision for medium sized
objects (32^2 pixels < area < 96^2 pixels)
'Precision/mAP (large)': mean average precision for large objects
(96^2 pixels < area < 10000^2 pixels)
'Recall/AR@1': average recall with 1 detection
'Recall/AR@10': average recall with 10 detections
'Recall/AR@100': average recall with 100 detections
'Recall/AR@100 (small)': average recall for small objects with 100
detections
'Recall/AR@100 (medium)': average recall for medium objects with 100
detections
'Recall/AR@100 (large)': average recall for large objects with 100
detections
'DetectionMasks_Precision/mAP': mean average precision over classes
averaged over IOU thresholds ranging from .5 to .95 with .05 increments.
'DetectionMasks_Precision/mAP@.50IOU': mean average precision at 50% IOU.
'DetectionMasks_Precision/mAP@.75IOU': mean average precision at 75% IOU.
'DetectionMasks_Precision/mAP (small)': mean average precision for small
objects (area < 32^2 pixels).
'DetectionMasks_Precision/mAP (medium)': mean average precision for medium
sized objects (32^2 pixels < area < 96^2 pixels).
'DetectionMasks_Precision/mAP (large)': mean average precision for large
objects (96^2 pixels < area < 10000^2 pixels).
'DetectionMasks_Recall/AR@1': average recall with 1 detection.
'DetectionMasks_Recall/AR@10': average recall with 10 detections.
'DetectionMasks_Recall/AR@100': average recall with 100 detections.
'DetectionMasks_Recall/AR@100 (small)': average recall for small objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (medium)': average recall for medium objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (large)': average recall for large objects
with 100 detections.
2. per_category_ap: if include_metrics_per_category is True, category
specific results with keys of the form:
......@@ -482,3 +487,101 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
mask_metrics = {'DetectionMasks_'+ key: value
for key, value in mask_metrics.iteritems()}
return mask_metrics
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores, detection_classes,
detection_masks):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
Args:
image_id: Unique string/integer identifier for the image.
groundtruth_boxes: float32 tensor of shape [num_boxes, 4] containing
`num_boxes` groundtruth boxes of the format
[ymin, xmin, ymax, xmax] in absolute image coordinates.
groundtruth_classes: int32 tensor of shape [num_boxes] containing
1-indexed groundtruth classes for the boxes.
groundtruth_instance_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing groundtruth masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
detection_scores: float32 tensor of shape [num_boxes] containing
detection scores for the boxes.
detection_classes: int32 tensor of shape [num_boxes] containing
1-indexed detection classes for the boxes.
detection_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing instance masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.EstimatorSpec. Note that all update ops
must be run together and similarly all value ops must be run together to
guarantee correct behaviour.
"""
def update_op(
image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks):
self.add_single_ground_truth_image_info(
image_id,
{'groundtruth_boxes': groundtruth_boxes,
'groundtruth_classes': groundtruth_classes,
'groundtruth_instance_masks': groundtruth_instance_masks})
self.add_single_detected_image_info(
image_id,
{'detection_scores': detection_scores,
'detection_classes': detection_classes,
'detection_masks': detection_masks})
update_op = tf.py_func(update_op, [image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks], [])
metric_names = ['DetectionMasks_Precision/mAP',
'DetectionMasks_Precision/mAP@.50IOU',
'DetectionMasks_Precision/mAP@.75IOU',
'DetectionMasks_Precision/mAP (large)',
'DetectionMasks_Precision/mAP (medium)',
'DetectionMasks_Precision/mAP (small)',
'DetectionMasks_Recall/AR@1',
'DetectionMasks_Recall/AR@10',
'DetectionMasks_Recall/AR@100',
'DetectionMasks_Recall/AR@100 (large)',
'DetectionMasks_Recall/AR@100 (medium)',
'DetectionMasks_Recall/AR@100 (small)']
if self._include_metrics_per_category:
for category_dict in self._categories:
metric_names.append('DetectionMasks_PerformanceByCategory/mAP/' +
category_dict['name'])
def first_value_func():
self._metrics = self.evaluate()
self.clear()
return np.float32(self._metrics[metric_names[0]])
def value_func_factory(metric_name):
def value_func():
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
for metric_name in metric_names[1:]:
eval_metric_ops[metric_name] = (tf.py_func(
value_func_factory(metric_name), [], np.float32), update_op)
return eval_metric_ops
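A wiring sketch for the new mask metric ops (coco_mask_evaluator, total_loss and the input tensors are placeholders for objects built elsewhere in an Estimator model_fn): every update op runs once per eval batch, and every value op runs once at the end, after which the accumulated groundtruth and detections are cleared.

eval_metric_ops = coco_mask_evaluator.get_estimator_eval_metric_ops(
    image_id, groundtruth_boxes, groundtruth_classes,
    groundtruth_instance_masks, detection_scores, detection_classes,
    detection_masks)
return tf.estimator.EstimatorSpec(
    mode=tf.estimator.ModeKeys.EVAL,
    loss=total_loss,
    eval_metric_ops=eval_metric_ops)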
......@@ -403,5 +403,101 @@ class CocoMaskEvaluationTest(tf.test.TestCase):
self.assertFalse(coco_evaluator._detection_masks_list)
class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
category_list = [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'cat'},
{'id': 2, 'name': 'dog'}]
coco_evaluator = coco_evaluation.CocoMaskEvaluator(category_list)
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
groundtruth_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
detection_scores = tf.placeholder(tf.float32, shape=(None))
detection_classes = tf.placeholder(tf.float32, shape=(None))
detection_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_masks,
detection_scores,
detection_classes,
detection_masks)
_, update_op = eval_metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
sess.run(update_op,
feed_dict={
image_id: 'image1',
groundtruth_boxes: np.array([[100., 100., 200., 200.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image2',
groundtruth_boxes: np.array([[50., 50., 100., 100.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 50, 50],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 50, 50], dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image3',
groundtruth_boxes: np.array([[25., 25., 50., 50.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
metrics = {}
for key, (value_op, _) in eval_metric_ops.iteritems():
metrics[key] = value_op
metrics = sess.run(metrics)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.50IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._image_ids_with_detections)
self.assertFalse(coco_evaluator._image_id_to_mask_shape_map)
self.assertFalse(coco_evaluator._detection_masks_list)
if __name__ == '__main__':
tf.test.main()
......@@ -39,7 +39,6 @@ from object_detection import model_hparams
from object_detection.builders import model_builder
from object_detection.builders import optimizer_builder
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import shape_utils
......@@ -121,8 +120,8 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
2. [batch_size, height, width, channels]
3. [batch_size, num_boxes, d1, d2, ... dn]
When unpad_tensors is set to true, unstacked tensors of form 3 above are
sliced along the `num_boxes` dimension using the value in tensor
When unpad_groundtruth_tensors is set to true, unstacked tensors of form 3
above are sliced along the `num_boxes` dimension using the value in tensor
field.InputDataFields.num_groundtruth_boxes.
Note that this function has a static list of input data fields and has to be
......@@ -198,6 +197,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
"""
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
eval_config = configs['eval_config']
def model_fn(features, labels, mode, params=None):
"""Constructs the object detection model.
......@@ -250,9 +250,25 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
prediction_dict, features[fields.InputDataFields.true_image_shape])
if mode == tf.estimator.ModeKeys.TRAIN:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, sets finetune_checkpoint_type based on
# from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
# based on train_config.from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
asg_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
......@@ -273,6 +289,15 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
losses_dict = detection_model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
if train_config.add_regularization_loss:
regularization_losses = tf.get_collection(
tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses,
name='regularization_loss')
losses.append(regularization_loss)
if not use_tpu:
tf.summary.scalar('regularization_loss', regularization_loss)
total_loss = tf.add_n(losses, name='total_loss')
if mode == tf.estimator.ModeKeys.TRAIN:
......@@ -321,8 +346,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
tf.expand_dims(features[fields.InputDataFields.original_image][0], 0),
eval_images[0:1],
features[inputs.HASH_KEY][0],
detections,
groundtruth,
......@@ -334,7 +363,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
if not use_tpu:
if not use_tpu and use_original_images:
detection_and_groundtruth = (
vis_utils.draw_side_by_side_evaluation_image(
eval_dict, category_index, max_boxes_to_draw=20,
......@@ -343,17 +372,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
detection_and_groundtruth)
# Eval metrics on a single image.
detection_fields = fields.DetectionResultFields()
input_data_fields = fields.InputDataFields()
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
category_index.values())
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes])
eval_metrics = eval_config.metrics_set
if not eval_metrics:
eval_metrics = ['coco_detection_metrics']
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_metrics, category_index.values(), eval_dict,
include_metrics_per_category=False)
if use_tpu:
return tf.contrib.tpu.TPUEstimatorSpec(
......@@ -376,7 +400,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
return model_fn
def _build_experiment_fn(train_steps, eval_steps):
def build_experiment_fn(train_steps, eval_steps):
"""Returns a function that creates an `Experiment`."""
def build_experiment(run_config, hparams):
......@@ -509,8 +533,8 @@ def main(unused_argv):
tf.flags.mark_flag_as_required('pipeline_config_path')
config = tf.contrib.learn.RunConfig(model_dir=FLAGS.model_dir)
learn_runner.run(
experiment_fn=_build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
experiment_fn=build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
run_config=config,
hparams=model_hparams.create_hparams())
......
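The add_regularization_loss branch above sums whatever is in the graph's REGULARIZATION_LOSSES collection. A minimal sketch of where those tensors come from (layer sizes and the weight decay value are illustrative):

import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.zeros([1, 8, 8, 3])
# Layers built with a weights_regularizer add their penalty tensors to
# tf.GraphKeys.REGULARIZATION_LOSSES automatically.
net = slim.conv2d(inputs, 4, [3, 3],
                  weights_regularizer=slim.l2_regularizer(1e-4))
regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
regularization_loss = tf.add_n(regularization_losses,
                               name='regularization_loss')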
......@@ -49,6 +49,6 @@ def BuildExperiment():
hparams_overrides='load_pretrained=false')
# pylint: disable=protected-access
experiment_fn = model._build_experiment_fn(10, 10)
experiment_fn = model.build_experiment_fn(10, 10)
# pylint: enable=protected-access
return experiment_fn(run_config, hparams)
......@@ -105,12 +105,10 @@ class FasterRCNNInceptionResnetV2FeatureExtractor(
is_training=self._train_batch_norm):
with tf.variable_scope('InceptionResnetV2',
reuse=self._reuse_weights) as scope:
rpn_feature_map, _ = (
inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True))
return rpn_feature_map
return inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True)
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
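Because extract_proposal_features now returns a (feature map, endpoints) tuple for every Faster R-CNN feature extractor, callers unpack two values and, when only the feature map is needed, discard the second, as the updated tests below do. A usage sketch (feature_extractor and preprocessed_inputs are built elsewhere; the scope name is illustrative):

rpn_feature_map, end_points = feature_extractor.extract_proposal_features(
    preprocessed_inputs, scope='TestScope')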
......@@ -35,7 +35,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -50,7 +50,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -65,7 +65,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -109,6 +109,9 @@ class FasterRCNNInceptionV2FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
......@@ -134,7 +137,7 @@ class FasterRCNNInceptionV2FeatureExtractor(
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Mixed_4e']
return activations['Mixed_4e'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -36,7 +36,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -51,7 +51,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -66,7 +66,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -84,7 +84,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mobilenet v1 Faster R-CNN implementation."""
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from nets import mobilenet_v1
slim = tf.contrib.slim
def _batch_norm_arg_scope(list_ops,
use_batch_norm=True,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001,
batch_norm_scale=False,
train_batch_norm=False):
"""Slim arg scope for Mobilenet V1 batch norm."""
if use_batch_norm:
batch_norm_params = {
'is_training': train_batch_norm,
'scale': batch_norm_scale,
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon
}
normalizer_fn = slim.batch_norm
else:
normalizer_fn = None
batch_norm_params = None
return slim.arg_scope(list_ops,
normalizer_fn=normalizer_fn,
normalizer_params=batch_norm_params)
class FasterRCNNMobilenetV1FeatureExtractor(
faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):
"""Faster R-CNN Mobilenet V1 feature extractor implementation."""
def __init__(self,
is_training,
first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0,
depth_multiplier=1.0,
min_depth=16):
"""Constructor.
Args:
is_training: See base class.
first_stage_features_stride: See base class.
batch_norm_trainable: See base class.
reuse_weights: See base class.
weight_decay: See base class.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
Raises:
ValueError: If `first_stage_features_stride` is not 8 or 16.
"""
if first_stage_features_stride != 8 and first_stage_features_stride != 16:
raise ValueError('`first_stage_features_stride` must be 8 or 16.')
self._depth_multiplier = depth_multiplier
self._min_depth = min_depth
super(FasterRCNNMobilenetV1FeatureExtractor, self).__init__(
is_training, first_stage_features_stride, batch_norm_trainable,
reuse_weights, weight_decay)
def preprocess(self, resized_inputs):
"""Faster R-CNN Mobilenet V1 preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def _extract_proposal_features(self, preprocessed_inputs, scope):
"""Extracts first stage RPN features.
Args:
preprocessed_inputs: A [batch, height, width, channels] float32 tensor
representing a batch of images.
scope: A scope name.
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
ValueError: If the created network is missing the required activation.
"""
preprocessed_inputs.get_shape().assert_has_rank(4)
shape_assert = tf.Assert(
tf.logical_and(tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33),
tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)),
['image size must at least be 33 in both height and width.'])
with tf.control_dependencies([shape_assert]):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
_, activations = mobilenet_v1.mobilenet_v1_base(
preprocessed_inputs,
final_endpoint='Conv2d_13_pointwise',
min_depth=self._min_depth,
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Conv2d_13_pointwise'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
Args:
proposal_feature_maps: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, crop_height, crop_width, depth]
representing the feature map cropped to each proposal.
scope: A scope name (unused).
Returns:
proposal_classifier_features: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, height, width, depth]
representing box classifier features for each proposal.
"""
net = proposal_feature_maps
depth = lambda d: max(int(d * 1.0), 16)
with tf.variable_scope('MobilenetV1', reuse=self._reuse_weights):
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
with slim.arg_scope(
[slim.conv2d, slim.separable_conv2d], padding='SAME'):
net = slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=2,
scope='Conv2d_12_pointwise')
return slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=1,
scope='Conv2d_13_pointwise')
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for faster_rcnn_mobilenet_v1_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import faster_rcnn_mobilenet_v1_feature_extractor as faster_rcnn_mobilenet_v1
class FasterRcnnMobilenetV1FeatureExtractorTest(tf.test.TestCase):
def _build_feature_extractor(self, first_stage_features_stride):
return faster_rcnn_mobilenet_v1.FasterRCNNMobilenetV1FeatureExtractor(
is_training=False,
first_stage_features_stride=first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0)
def test_extract_proposal_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_stride_eight(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_half_size_input(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [1, 4, 4, 1024])
def test_extract_proposal_features_dies_on_invalid_stride(self):
with self.assertRaises(ValueError):
self._build_feature_extractor(first_stage_features_stride=99)
def test_extract_proposal_features_dies_on_very_small_images(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
with self.assertRaises(tf.errors.InvalidArgumentError):
sess.run(
features_shape,
feed_dict={preprocessed_inputs: np.random.rand(4, 32, 32, 3)})
def test_extract_proposal_features_dies_with_incorrect_rank_inputs(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[224, 224, 3], maxval=255, dtype=tf.float32)
with self.assertRaises(ValueError):
feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
def test_extract_box_classifier_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
proposal_feature_maps = tf.random_uniform(
[3, 14, 14, 576], maxval=255, dtype=tf.float32)
proposal_classifier_features = (
feature_extractor.extract_box_classifier_features(
proposal_feature_maps, scope='TestScope'))
features_shape = tf.shape(proposal_classifier_features)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [3, 7, 7, 1024])
if __name__ == '__main__':
tf.test.main()
......@@ -171,6 +171,8 @@ class FasterRCNNNASFeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
end_points: A dictionary mapping feature extractor tensor names to tensors
Raises:
ValueError: If the created network is missing the required activation.
"""
......@@ -202,7 +204,7 @@ class FasterRCNNNASFeatureExtractor(
rpn_feature_map_shape = [batch] + shape_without_batch
rpn_feature_map.set_shape(rpn_feature_map_shape)
return rpn_feature_map
return rpn_feature_map, end_points
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......@@ -231,9 +233,11 @@ class FasterRCNNNASFeatureExtractor(
# Note that what follows is largely a copy of build_nasnet_large() within
# nasnet.py. We are copying to minimize code pollution in slim.
# pylint: disable=protected-access
hparams = nasnet._large_imagenet_config(is_training=self._is_training)
# pylint: enable=protected-access
# TODO(shlens,skornblith): Determine the appropriate drop path schedule.
# For now the schedule is the default (1.0->0.7 over 250,000 train steps).
hparams = nasnet.large_imagenet_config()
if not self._is_training:
hparams.set_hparam('drop_path_keep_prob', 1.0)
# Calculate the total number of cells in the network
# -- Add 2 for the reduction cells.
......
......@@ -35,7 +35,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -50,7 +50,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -65,7 +65,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -95,6 +95,9 @@ class FasterRCNNResnetV1FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
......@@ -130,7 +133,7 @@ class FasterRCNNResnetV1FeatureExtractor(
scope=var_scope)
handle = scope + '/%s/block3' % self._architecture
return activations[handle]
return activations[handle], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -47,7 +47,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16, architecture=architecture)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -62,7 +62,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -77,7 +77,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -95,7 +95,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -100,11 +100,13 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
'use_depthwise': self._use_depthwise,
}
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v1.mobilenet_v1_arg_scope(
is_training=(self._batch_norm_trainable and self._is_training))):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
_, image_features = mobilenet_v1.mobilenet_v1_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
final_endpoint='Conv2d_13_pointwise',
......@@ -112,6 +114,9 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSDFeatureExtractor for MobilenetV2 features."""
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.utils import ops
from object_detection.utils import shape_utils
from nets.mobilenet import mobilenet
from nets.mobilenet import mobilenet_v2
slim = tf.contrib.slim
class SSDMobileNetV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
"""SSD Feature Extractor using MobilenetV2 features."""
def __init__(self,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams,
batch_norm_trainable=True,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False):
"""MobileNetV2 Feature Extractor for SSD Models.
Mobilenet v2 (experimental), designed by sandler@. More details can be found
in //knowledge/cerebra/brain/compression/mobilenet/mobilenet_experimental.py
Args:
is_training: whether the network is in training mode.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops.
batch_norm_trainable: Whether to update batch norm parameters during
training or not. When training with a small batch size
(e.g. 1), it is desirable to disable batch norm update and use
pretrained batch norm params.
reuse_weights: Whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
"""
super(SSDMobileNetV2FeatureExtractor, self).__init__(
is_training, depth_multiplier, min_depth, pad_to_multiple,
conv_hyperparams, batch_norm_trainable, reuse_weights,
use_explicit_padding, use_depthwise)
def preprocess(self, resized_inputs):
"""SSD preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def extract_features(self, preprocessed_inputs):
"""Extract features from preprocessed inputs.
Args:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
feature_maps: a list of tensors where the ith tensor has shape
[batch, height_i, width_i, depth_i]
"""
preprocessed_inputs = shape_utils.check_min_image_dim(
33, preprocessed_inputs)
feature_map_layout = {
'from_layer': ['layer_15/expansion_output', 'layer_19', '', '', '', ''],
'layer_depth': [-1, -1, 512, 256, 256, 128],
'use_depthwise': self._use_depthwise,
'use_explicit_padding': self._use_explicit_padding,
}
with tf.variable_scope('MobilenetV2', reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v2.training_scope(
is_training=(self._is_training and self._batch_norm_trainable),
bn_decay=0.9997)), \
slim.arg_scope(
[mobilenet.depth_multiplier], min_depth=self._min_depth):
# TODO(b/68150321): Enable fused batch norm once quantization
# supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
_, image_features = mobilenet_v2.mobilenet_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
final_endpoint='layer_19',
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams):
# TODO(b/68150321): Enable fused batch norm once quantization
# supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
min_depth=self._min_depth,
insert_1x1_conv=True,
image_features=image_features)
return feature_maps.values()
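The `preprocess` method above is a plain affine rescale to [-1, 1]; a two-line check with assumed pixel values (not taken from the test file below):

```python
import numpy as np

pixels = np.array([0.0, 127.5, 255.0])
print((2.0 / 255.0) * pixels - 1.0)  # [-1.  0.  1.]
```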
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v2_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v2_feature_extractor
slim = tf.contrib.slim
class SsdMobilenetV2FeatureExtractorTest(
ssd_feature_extractor_test.SsdFeatureExtractorTestBase):
def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
use_explicit_padding=False):
"""Constructs a new feature extractor.
Args:
depth_multiplier: float depth multiplier for feature extractor
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
use_explicit_padding: use 'VALID' padding for convolutions, but prepad
inputs so that the output dimensions are the same as if 'SAME' padding
were used.
Returns:
an ssd_meta_arch.SSDFeatureExtractor object.
"""
min_depth = 32
with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm) as sc:
conv_hyperparams = sc
return ssd_mobilenet_v2_feature_extractor.SSDMobileNetV2FeatureExtractor(
False,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams,
use_explicit_padding=use_explicit_padding)
def test_extract_features_returns_correct_shapes_128(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
pad_to_multiple = 1
expected_feature_map_shape = [(2, 8, 8, 576), (2, 4, 4, 1280),
(2, 2, 2, 512), (2, 1, 1, 256),
(2, 1, 1, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_with_dynamic_inputs(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
pad_to_multiple = 1
expected_feature_map_shape = [(2, 8, 8, 576), (2, 4, 4, 1280),
(2, 2, 2, 512), (2, 1, 1, 256),
(2, 1, 1, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_299(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
pad_to_multiple = 1
expected_feature_map_shape = [(2, 19, 19, 576), (2, 10, 10, 1280),
(2, 5, 5, 512), (2, 3, 3, 256),
(2, 2, 2, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_enforcing_min_depth(self):
image_height = 299
image_width = 299
depth_multiplier = 0.5**12
pad_to_multiple = 1
expected_feature_map_shape = [(2, 19, 19, 192), (2, 10, 10, 32),
(2, 5, 5, 32), (2, 3, 3, 32),
(2, 2, 2, 32), (2, 1, 1, 32)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_with_pad_to_multiple(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
pad_to_multiple = 32
expected_feature_map_shape = [(2, 20, 20, 576), (2, 10, 10, 1280),
(2, 5, 5, 512), (2, 3, 3, 256),
(2, 2, 2, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_raises_error_with_invalid_image_size(self):
image_height = 32
image_width = 32
depth_multiplier = 1.0
pad_to_multiple = 1
self.check_extract_features_raises_error_with_invalid_image_size(
image_height, image_width, depth_multiplier, pad_to_multiple)
def test_preprocess_returns_correct_value_range(self):
image_height = 128
image_width = 128
depth_multiplier = 1
pad_to_multiple = 1
test_image = np.random.rand(4, image_height, image_width, 3)
feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple)
preprocessed_image = feature_extractor.preprocess(test_image)
self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
def test_variables_only_created_in_scope(self):
depth_multiplier = 1
pad_to_multiple = 1
scope_name = 'MobilenetV2'
self.check_feature_extractor_variables_under_scope(
depth_multiplier, pad_to_multiple, scope_name)
def test_nofused_batchnorm(self):
image_height = 40
image_width = 40
depth_multiplier = 1
pad_to_multiple = 1
image_placeholder = tf.placeholder(tf.float32,
[1, image_height, image_width, 3])
feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple)
preprocessed_image = feature_extractor.preprocess(image_placeholder)
_ = feature_extractor.extract_features(preprocessed_image)
self.assertFalse(any(op.type == 'FusedBatchNorm'
for op in tf.get_default_graph().get_operations()))
if __name__ == '__main__':
tf.test.main()
......@@ -71,6 +71,10 @@ message ManualStepLearningRate {
optional float learning_rate = 2 [default = 0.002];
}
repeated LearningRateSchedule schedule = 2;
// Whether to linearly interpolate learning rates for steps in
// [0, schedule[0].step].
optional bool warmup = 3 [default = false];
}
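A minimal sketch, with assumed values (initial rate 0.001, first boundary at step 100 with rate 0.01), of what the new `warmup` flag means numerically; the actual interpolation is implemented by `learning_schedules.manual_stepping` later in this commit.

```python
# Hypothetical numbers only: with warmup enabled, rates for steps in
# [0, schedule[0].step] are linearly interpolated between the initial
# learning rate and the first scheduled learning rate.
initial_rate = 0.001   # assumed initial_learning_rate
first_step = 100       # assumed schedule[0].step
first_rate = 0.01      # assumed schedule[0].learning_rate

def warmup_rate(step):
    if step < first_step:
        return initial_rate + (first_rate - initial_rate) * float(step) / first_step
    return first_rate

print([round(warmup_rate(s), 4) for s in (0, 50, 100)])  # [0.001, 0.0055, 0.01]
```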
// Configuration message for a cosine decaying learning rate as defined in
......@@ -80,4 +84,5 @@ message CosineDecayLearningRate {
optional uint32 total_steps = 2 [default = 4000000];
optional float warmup_learning_rate = 3 [default = 0.0002];
optional uint32 warmup_steps = 4 [default = 10000];
optional uint32 hold_base_rate_steps = 5 [default = 0];
}
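The new `hold_base_rate_steps` field keeps the rate at `learning_rate_base` for a fixed number of steps after warmup before the cosine decay begins. A rough NumPy sketch of the resulting shape, using assumed values (base 1.0, warmup from 0.1 over 10 steps, 20 hold steps, 100 total steps); the TensorFlow implementation is in `learning_schedules.cosine_decay_with_warmup` further down in this commit.

```python
import numpy as np

base, warmup_lr, warmup_steps, hold_steps, total_steps = 1.0, 0.1, 10, 20, 100

def lr(step):
    if step < warmup_steps:                      # linear warmup
        return warmup_lr + (base - warmup_lr) * step / float(warmup_steps)
    if step < warmup_steps + hold_steps:         # hold the base rate
        return base
    if step > total_steps:                       # schedule exhausted
        return 0.0
    progress = (step - warmup_steps - hold_steps) / float(
        total_steps - warmup_steps - hold_steps)
    return 0.5 * base * (1 + np.cos(np.pi * progress))  # cosine decay

print([round(lr(s), 3) for s in (0, 5, 10, 29, 30, 65, 100)])
# -> [0.1, 0.55, 1.0, 1.0, 1.0, 0.5, 0.0]
```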
......@@ -72,7 +72,8 @@ message SsdFeatureExtractor {
// Minimum number of the channels in the feature extractor.
optional int32 min_depth = 3 [default=16];
// Hyperparameters for the feature extractor.
// Hyperparameters that affect the layers of feature extractor added on top
// of the base feature extractor.
optional Hyperparams conv_hyperparams = 4;
// The nearest multiple to zero-pad the input height and width dimensions to.
......
......@@ -29,11 +29,17 @@ message TrainConfig {
// extractor variables trained outside of object detection.
optional string fine_tune_checkpoint = 7 [default=""];
// Type of checkpoint to restore variables from, e.g. 'classification' or
// 'detection'. Provides extensibility to from_detection_checkpoint.
// Typically used to load feature extractor variables from trained models.
optional string fine_tune_checkpoint_type = 22 [default=""];
// [Deprecated]: use fine_tune_checkpoint_type instead.
// Specifies if the finetune checkpoint is from an object detection model.
// If from an object detection model, the model being trained should have
// the same parameters with the exception of the num_classes parameter.
// If false, it assumes the checkpoint was an object classification model.
optional bool from_detection_checkpoint = 8 [default=false];
optional bool from_detection_checkpoint = 8 [default=false, deprecated=true];
// Whether to load all checkpoint vars that match model variable names and
// sizes. This option is only available if `from_detection_checkpoint` is
......@@ -83,7 +89,7 @@ message TrainConfig {
// Set this to at least the maximum amount of boxes in the input data.
// Otherwise, it may cause "Data loss: Attempted to pad to a smaller size
// than the input element" errors.
optional int32 max_number_of_boxes = 20 [default=50];
optional int32 max_number_of_boxes = 20 [default=100];
// Whether to remove padding along `num_boxes` dimension of the groundtruth
// tensors.
......
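For illustration only (not part of the diff): a train_config would now set the string field instead of the deprecated boolean. The field names come from the proto change above; the checkpoint path is a placeholder.

```python
from google.protobuf import text_format
from object_detection.protos import train_pb2

config_text = """
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  fine_tune_checkpoint_type: "detection"  # replaces: from_detection_checkpoint: true
"""
train_config = text_format.Merge(config_text, train_pb2.TrainConfig())
print(train_config.fine_tune_checkpoint_type)  # detection
```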
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.00006
schedule {
step: 0
learning_rate: .00006
}
schedule {
step: 6000000
learning_rate: .000006
......
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -89,10 +89,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
......@@ -91,10 +91,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -93,10 +93,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 0
learning_rate: .0001
}
schedule {
step: 500000
learning_rate: .00001
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 0
learning_rate: .0001
}
schedule {
step: 500000
learning_rate: .00001
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......@@ -118,6 +114,7 @@ train_config: {
random_horizontal_flip {
}
}
max_number_of_boxes: 50
}
train_input_reader: {
......
......@@ -110,10 +110,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -109,10 +109,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
......@@ -110,10 +110,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -103,10 +103,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0007
schedule {
step: 0
learning_rate: 0.0007
}
schedule {
step: 15000
learning_rate: 0.00007
......
......@@ -110,10 +110,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -85,10 +85,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -85,10 +85,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -162,6 +162,7 @@ train_config: {
ssd_random_crop {
}
}
max_number_of_boxes: 50
}
train_input_reader: {
......
# SSD with Mobilenet v2 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
ssd {
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 3
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v2'
min_depth: 16
depth_multiplier: 1.0
use_depthwise: true
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
batch_norm_trainable: true
}
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 3
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 24
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "detection"
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient to train the COCO dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
\ No newline at end of file
......@@ -254,7 +254,7 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
training_optimizer, optimizer_summary_vars = optimizer_builder.build(
train_config.optimizer)
for var in optimizer_summary_vars:
tf.summary.scalar(var.op.name, var)
tf.summary.scalar(var.op.name, var, family='LearningRate')
sync_optimizer = None
if train_config.sync_replicas:
......@@ -267,8 +267,16 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
# Create ops required to initialize the model from a given checkpoint.
init_fn = None
if train_config.fine_tune_checkpoint:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, fine_tune_checkpoint_type is set based on
# from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
var_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (variables_helper.
......@@ -320,11 +328,13 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
# Add summaries.
for model_var in slim.get_model_variables():
global_summaries.add(tf.summary.histogram(model_var.op.name, model_var))
global_summaries.add(tf.summary.histogram('ModelVars/' +
model_var.op.name, model_var))
for loss_tensor in tf.losses.get_losses():
global_summaries.add(tf.summary.scalar(loss_tensor.op.name, loss_tensor))
global_summaries.add(tf.summary.scalar('Losses/' + loss_tensor.op.name,
loss_tensor))
global_summaries.add(
tf.summary.scalar('TotalLoss', tf.losses.get_total_loss()))
tf.summary.scalar('Losses/TotalLoss', tf.losses.get_total_loss()))
# Add the summaries from the first clone. These contain the summaries
# created by model_fn and either optimize_clones() or _gather_clone_loss().
......
......@@ -157,13 +157,14 @@ class FakeDetectionModel(model.DetectionModel):
}
return loss_dict
def restore_map(self, from_detection_checkpoint=True):
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
Returns:
A dict mapping variable names to variables.
......
......@@ -285,6 +285,9 @@ def merge_external_params_with_configs(configs, hparams=None, **kwargs):
def _update_initial_learning_rate(configs, learning_rate):
"""Updates `configs` to reflect the new initial learning rate.
This function updates the initial learning rate. For learning rate schedules,
all other defined learning rates in the pipeline config are scaled to maintain
their same ratio with the initial learning rate.
The configs dictionary is updated in place, and hence not returned.
Args:
......@@ -322,6 +325,13 @@ def _update_initial_learning_rate(configs, learning_rate):
manual_lr.initial_learning_rate = learning_rate
for schedule in manual_lr.schedule:
schedule.learning_rate *= learning_rate_scaling
elif learning_rate_type == "cosine_decay_learning_rate":
cosine_lr = optimizer_config.learning_rate.cosine_decay_learning_rate
learning_rate_base = cosine_lr.learning_rate_base
warmup_learning_rate = cosine_lr.warmup_learning_rate
warmup_scale_factor = warmup_learning_rate / learning_rate_base
cosine_lr.learning_rate_base = learning_rate
cosine_lr.warmup_learning_rate = warmup_scale_factor * learning_rate
else:
raise TypeError("Learning rate %s is not supported." % learning_rate_type)
......
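Worked through with the same numbers the config_util test below uses (cosine base 0.7, warmup 0.07, new initial rate 0.15), the scaling described in the `_update_initial_learning_rate` docstring keeps the warmup rate at its original ratio to the base rate. Plain-Python arithmetic only, mirroring the `cosine_decay_learning_rate` branch above:

```python
original_base, original_warmup = 0.7, 0.07
new_base = 0.15                                  # overriding initial learning rate

warmup_scale = original_warmup / original_base   # 0.1
new_warmup = warmup_scale * new_base             # 0.015

print(new_base, round(new_warmup, 6))            # 0.15 0.015
```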
......@@ -59,6 +59,14 @@ def _update_optimizer_with_manual_step_learning_rate(
schedule.learning_rate = initial_learning_rate * learning_rate_scaling**i
def _update_optimizer_with_cosine_decay_learning_rate(
optimizer, learning_rate, warmup_learning_rate):
"""Adds a new cosine decay learning rate."""
cosine_lr = optimizer.learning_rate.cosine_decay_learning_rate
cosine_lr.learning_rate_base = learning_rate
cosine_lr.warmup_learning_rate = warmup_learning_rate
class ConfigUtilTest(tf.test.TestCase):
def test_get_configs_from_pipeline_file(self):
......@@ -154,6 +162,7 @@ class ConfigUtilTest(tf.test.TestCase):
"""Asserts successful updating of all learning rate schemes."""
original_learning_rate = 0.7
learning_rate_scaling = 0.1
warmup_learning_rate = 0.07
hparams = tf.contrib.training.HParams(learning_rate=0.15)
pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
......@@ -201,6 +210,24 @@ class ConfigUtilTest(tf.test.TestCase):
self.assertAlmostEqual(hparams.learning_rate * learning_rate_scaling**i,
schedule.learning_rate)
# Cosine decay learning rate.
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
optimizer = getattr(pipeline_config.train_config.optimizer, optimizer_name)
_update_optimizer_with_cosine_decay_learning_rate(optimizer,
original_learning_rate,
warmup_learning_rate)
_write_config(pipeline_config, pipeline_config_path)
configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
configs = config_util.merge_external_params_with_configs(configs, hparams)
optimizer = getattr(configs["train_config"].optimizer, optimizer_name)
cosine_lr = optimizer.learning_rate.cosine_decay_learning_rate
self.assertAlmostEqual(hparams.learning_rate, cosine_lr.learning_rate_base)
warmup_scale_factor = warmup_learning_rate / original_learning_rate
self.assertAlmostEqual(hparams.learning_rate * warmup_scale_factor,
cosine_lr.warmup_learning_rate)
def testRMSPropWithNewLearingRate(self):
"""Tests new learning rates for RMSProp Optimizer."""
self._assertOptimizerWithNewLearningRate("rms_prop_optimizer")
......
......@@ -32,8 +32,10 @@ def _validate_label_map(label_map):
ValueError: if label map is invalid.
"""
for item in label_map.item:
if item.id < 1:
raise ValueError('Label map ids should be >= 1.')
if item.id < 0:
raise ValueError('Label map ids should be >= 0.')
if item.id == 0 and item.name != 'background':
raise ValueError('Label map id 0 is reserved for the background label')
def create_category_index(categories):
......
......@@ -95,6 +95,30 @@ class LabelMapUtilTest(tf.test.TestCase):
with self.assertRaises(ValueError):
label_map_util.load_labelmap(label_map_path)
def test_load_label_map_with_background(self):
label_map_string = """
item {
id:0
name:'background'
}
item {
id:2
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
label_map_dict = label_map_util.get_label_map_dict(label_map_path)
self.assertEqual(label_map_dict['background'], 0)
self.assertEqual(label_map_dict['dog'], 1)
self.assertEqual(label_map_dict['cat'], 2)
def test_keep_categories_with_unique_id(self):
label_map_proto = string_int_label_map_pb2.StringIntLabelMap()
label_map_string = """
......
......@@ -56,14 +56,15 @@ def exponential_decay_with_burnin(global_step,
return tf.where(
tf.less(tf.cast(global_step, tf.int32), tf.constant(burnin_steps)),
tf.constant(burnin_learning_rate),
post_burnin_learning_rate)
post_burnin_learning_rate, name='learning_rate')
def cosine_decay_with_warmup(global_step,
learning_rate_base,
total_steps,
warmup_learning_rate=0.0,
warmup_steps=0):
warmup_steps=0,
hold_base_rate_steps=0):
"""Cosine decay schedule with warm up period.
Cosine annealing learning rate as described in:
......@@ -79,6 +80,8 @@ def cosine_decay_with_warmup(global_step,
total_steps: total number of training steps.
warmup_learning_rate: initial learning rate for warm up.
warmup_steps: number of warmup steps.
hold_base_rate_steps: Optional number of steps to hold base learning rate
before decaying.
Returns:
a (scalar) float tensor representing learning rate.
......@@ -93,21 +96,24 @@ def cosine_decay_with_warmup(global_step,
if total_steps < warmup_steps:
raise ValueError('total_steps must be larger or equal to '
'warmup_steps.')
learning_rate = 0.5 * learning_rate_base * (
1 + tf.cos(np.pi * (tf.cast(global_step, tf.float32) - warmup_steps
) / float(total_steps - warmup_steps)))
learning_rate = 0.5 * learning_rate_base * (1 + tf.cos(
np.pi *
(tf.cast(global_step, tf.float32) - warmup_steps - hold_base_rate_steps
) / float(total_steps - warmup_steps - hold_base_rate_steps)))
if hold_base_rate_steps > 0:
learning_rate = tf.where(global_step > warmup_steps + hold_base_rate_steps,
learning_rate, learning_rate_base)
if warmup_steps > 0:
slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
pre_cosine_learning_rate = slope * tf.cast(
global_step, tf.float32) + warmup_learning_rate
learning_rate = tf.where(
tf.less(tf.cast(global_step, tf.int32), warmup_steps),
pre_cosine_learning_rate,
learning_rate)
return learning_rate
warmup_rate = slope * tf.cast(global_step,
tf.float32) + warmup_learning_rate
learning_rate = tf.where(global_step < warmup_steps, warmup_rate,
learning_rate)
return tf.where(global_step > total_steps, 0.0, learning_rate,
name='learning_rate')
def manual_stepping(global_step, boundaries, rates):
def manual_stepping(global_step, boundaries, rates, warmup=False):
"""Manually stepped learning rate schedule.
This function provides fine grained control over learning rates. One must
......@@ -124,6 +130,8 @@ def manual_stepping(global_step, boundaries, rates):
rates: a list of (float) learning rates corresponding to intervals between
the boundaries. The length of this list must be exactly
len(boundaries) + 1.
warmup: Whether to linearly interpolate learning rate for steps in
[0, boundaries[0]].
Returns:
a (scalar) float tensor representing learning rate
......@@ -131,6 +139,7 @@ def manual_stepping(global_step, boundaries, rates):
ValueError: if one of the following checks fails:
1. boundaries is a strictly increasing list of positive integers
2. len(rates) == len(boundaries) + 1
3. boundaries[0] != 0
"""
if any([b < 0 for b in boundaries]) or any(
[not isinstance(b, int) for b in boundaries]):
......@@ -142,16 +151,21 @@ def manual_stepping(global_step, boundaries, rates):
if len(rates) != len(boundaries) + 1:
raise ValueError('Number of provided learning rates must exceed '
'number of boundary points by exactly 1.')
if not boundaries: return tf.constant(rates[0])
step_boundaries = tf.constant(boundaries, tf.int32)
if boundaries and boundaries[0] == 0:
raise ValueError('First step cannot be zero.')
if warmup and boundaries:
slope = (rates[1] - rates[0]) * 1.0 / boundaries[0]
warmup_steps = range(boundaries[0])
warmup_rates = [rates[0] + slope * step for step in warmup_steps]
boundaries = warmup_steps + boundaries
rates = warmup_rates + rates[1:]
else:
boundaries = [0] + boundaries
num_boundaries = len(boundaries)
learning_rates = tf.constant(rates, tf.float32)
index = tf.reduce_min(
tf.where(
# Casting global step to tf.int32 is dangerous, but necessary to be
# compatible with TPU.
tf.greater(step_boundaries, tf.cast(global_step, tf.int32)),
tf.constant(range(num_boundaries), dtype=tf.int32),
tf.constant([num_boundaries] * num_boundaries, dtype=tf.int32)))
return tf.reduce_sum(learning_rates * tf.one_hot(index, len(rates),
dtype=tf.float32))
rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
range(num_boundaries),
[0] * num_boundaries))
return tf.reduce_sum(rates * tf.one_hot(rate_index, depth=num_boundaries),
name='learning_rate')
......@@ -33,6 +33,7 @@ class LearningSchedulesTest(test_case.TestCase):
learning_rate = learning_schedules.exponential_decay_with_burnin(
global_step, learning_rate_base, learning_rate_decay_steps,
learning_rate_decay_factor, burnin_learning_rate, burnin_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
output_rates = [
......@@ -51,6 +52,7 @@ class LearningSchedulesTest(test_case.TestCase):
learning_rate = learning_schedules.cosine_decay_with_warmup(
global_step, learning_rate_base, total_steps,
warmup_learning_rate, warmup_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
exp_rates = [0.1, 0.5, 0.9, 1.0, 0]
input_global_steps = [0, 4, 8, 9, 100]
......@@ -60,12 +62,53 @@ class LearningSchedulesTest(test_case.TestCase):
]
self.assertAllClose(output_rates, exp_rates)
def testCosineDecayAfterTotalSteps(self):
def graph_fn(global_step):
learning_rate_base = 1.0
total_steps = 100
warmup_learning_rate = 0.1
warmup_steps = 9
learning_rate = learning_schedules.cosine_decay_with_warmup(
global_step, learning_rate_base, total_steps,
warmup_learning_rate, warmup_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
exp_rates = [0]
input_global_steps = [101]
output_rates = [
self.execute(graph_fn, [np.array(step).astype(np.int64)])
for step in input_global_steps
]
self.assertAllClose(output_rates, exp_rates)
def testCosineDecayWithHoldBaseLearningRateSteps(self):
def graph_fn(global_step):
learning_rate_base = 1.0
total_steps = 120
warmup_learning_rate = 0.1
warmup_steps = 9
hold_base_rate_steps = 20
learning_rate = learning_schedules.cosine_decay_with_warmup(
global_step, learning_rate_base, total_steps,
warmup_learning_rate, warmup_steps, hold_base_rate_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
exp_rates = [0.1, 0.5, 0.9, 1.0, 1.0, 1.0, 0.999702, 0.874255, 0.577365,
0.0]
input_global_steps = [0, 4, 8, 9, 10, 29, 30, 50, 70, 120]
output_rates = [
self.execute(graph_fn, [np.array(step).astype(np.int64)])
for step in input_global_steps
]
self.assertAllClose(output_rates, exp_rates)
def testManualStepping(self):
def graph_fn(global_step):
boundaries = [2, 3, 7]
rates = [1.0, 2.0, 3.0, 4.0]
learning_rate = learning_schedules.manual_stepping(
global_step, boundaries, rates)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
output_rates = [
......@@ -75,6 +118,22 @@ class LearningSchedulesTest(test_case.TestCase):
exp_rates = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0]
self.assertAllClose(output_rates, exp_rates)
def testManualSteppingWithWarmup(self):
def graph_fn(global_step):
boundaries = [4, 6, 8]
rates = [0.02, 0.10, 0.01, 0.001]
learning_rate = learning_schedules.manual_stepping(
global_step, boundaries, rates, warmup=True)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
output_rates = [
self.execute(graph_fn, [np.array(i).astype(np.int64)])
for i in range(9)
]
exp_rates = [0.02, 0.04, 0.06, 0.08, 0.10, 0.10, 0.01, 0.01, 0.001]
self.assertAllClose(output_rates, exp_rates)
def testManualSteppingWithZeroBoundaries(self):
def graph_fn(global_step):
boundaries = []
......
......@@ -657,7 +657,7 @@ def position_sensitive_crop_regions(image,
position_sensitive_features = tf.add_n(image_crops) / len(image_crops)
# Then average over spatial positions within the bins.
position_sensitive_features = tf.reduce_mean(
position_sensitive_features, [1, 2], keepdims=True)
position_sensitive_features, [1, 2], keep_dims=True)
else:
# Reorder height/width to depth channel.
block_size = bin_crop_size[0]
......
......@@ -840,7 +840,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# All channels are equal so position-sensitive crop and resize should
# work as the usual crop and resize for just one channel.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
tiled_image,
......@@ -866,7 +866,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
......@@ -1054,7 +1054,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
ps_crop_and_pool = tf.reduce_mean(
ps_crop, reduction_indices=(1, 2), keepdims=True)
ps_crop, reduction_indices=(1, 2), keep_dims=True)
with self.test_session() as sess:
output = sess.run(ps_crop_and_pool)
......
......@@ -42,7 +42,7 @@ def filter_variables(variables, filter_regex_list, invert=False):
a list of filtered variables.
"""
kept_vars = []
variables_to_ignore_patterns = list(filter(None, filter_regex_list))
variables_to_ignore_patterns = filter(None, filter_regex_list)
for var in variables:
add = True
for pattern in variables_to_ignore_patterns:
......
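Context for the `list(filter(...))` versus bare `filter(...)` lines above: under Python 3, `filter` returns a one-shot iterator, so without the `list()` wrapper the patterns would be exhausted after the first variable in the loop; under Python 2 both forms behave identically. A stdlib-only illustration with made-up variable names:

```python
patterns = filter(None, ['^FeatureExtractor/', '', 'BoxPredictor/'])
for var in ['FeatureExtractor/conv1/weights', 'BoxPredictor/conv/weights']:
    # On Python 3 the second iteration sees an empty iterator; list(filter(...))
    # keeps the patterns reusable across variables.
    print(var, list(patterns))
```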
......@@ -434,7 +434,6 @@ py_library(
srcs = glob(["nets/mobilenet/*.py"]),
srcs_version = "PY2AND3",
deps = [
"//third_party/py/contextlib2",
# "//tensorflow",
],
)
......
......@@ -93,6 +93,7 @@ import sys
import threading
import numpy as np
from six.moves import xrange
import tensorflow as tf
tf.app.flags.DEFINE_string('train_directory', '/tmp/',
......
......@@ -52,6 +52,8 @@ import os
import os.path
import sys
from six.moves import xrange
if __name__ == '__main__':
if len(sys.argv) < 3:
......
......@@ -86,6 +86,8 @@ import os.path
import sys
import xml.etree.ElementTree as ET
from six.moves import xrange
class BoundingBox(object):
pass
......
......@@ -230,9 +230,10 @@ def _gather_clone_loss(clone, num_clones, regularization_losses):
sum_loss = tf.add_n(all_losses)
# Add the summaries out of the clone device block.
if clone_loss is not None:
tf.summary.scalar(clone.scope + '/clone_loss', clone_loss)
tf.summary.scalar(clone.scope + '/clone_loss', clone_loss, family='Losses')
if regularization_loss is not None:
tf.summary.scalar('regularization_loss', regularization_loss)
tf.summary.scalar('regularization_loss', regularization_loss,
family='Losses')
return sum_loss
......
......@@ -18,7 +18,7 @@ from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import xrange
import tensorflow as tf
layers = tf.contrib.layers
......
......@@ -19,6 +19,8 @@ from __future__ import print_function
from math import log
from six.moves import xrange
import tensorflow as tf
slim = tf.contrib.slim
......
......@@ -18,6 +18,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import xrange
import tensorflow as tf
from nets import dcgan
......
......@@ -13,8 +13,8 @@
# limitations under the License.
# ==============================================================================
"""Convolution blocks for mobilenet."""
import contextlib
import functools
import contextlib2
import tensorflow as tf
......@@ -75,6 +75,19 @@ def _split_divisible(num, num_ways, divisible_by=8):
return result
@contextlib.contextmanager
def _v1_compatible_scope_naming(scope):
if scope is None: # Create uniqified separable blocks.
with tf.variable_scope(None, default_name='separable') as s, \
tf.name_scope(s.original_name_scope):
yield ''
else:
# We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts.
# which provide numbered scopes.
scope += '_'
yield scope
@slim.add_arg_scope
def split_separable_conv2d(input_tensor,
num_outputs,
......@@ -110,15 +123,7 @@ def split_separable_conv2d(input_tensor,
output tensor
"""
with contextlib2.ExitStack() as stack:
if scope is None: # Create uniqified separable blocks.
s = stack.enter_context(tf.variable_scope(None, default_name='separable'))
stack.enter_context(tf.name_scope(s.original_name_scope))
scope = ''
else:
# We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts.
scope += '_'
with _v1_compatible_scope_naming(scope) as scope:
dw_scope = scope + 'depthwise'
endpoints = endpoints if endpoints is not None else {}
kernel_size = [3, 3]
......
......@@ -22,8 +22,6 @@ import contextlib
import copy
import os
import contextlib2
import tensorflow as tf
......@@ -76,17 +74,23 @@ def _set_arg_scope_defaults(defaults):
"""Sets arg scope defaults for all items present in defaults.
Args:
defaults: dictionary mapping function to default_dict
defaults: dictionary/list of pairs, containing a mapping from
function to a dictionary of default args.
Yields:
context manager
context manager where all defaults are set.
"""
with contextlib2.ExitStack() as stack:
_ = [
stack.enter_context(slim.arg_scope(func, **default_arg))
for func, default_arg in defaults.items()
]
if hasattr(defaults, 'items'):
items = defaults.items()
else:
items = defaults
if not items:
yield
else:
func, default_arg = items[0]
with slim.arg_scope(func, **default_arg):
with _set_arg_scope_defaults(items[1:]):
yield
@slim.add_arg_scope
......
......@@ -350,7 +350,7 @@ class MobilenetV1Test(tf.test.TestCase):
mobilenet_v1.mobilenet_v1_base(inputs)
total_params, _ = slim.model_analyzer.analyze_vars(
slim.get_model_variables())
self.assertAlmostEqual(3217920L, total_params)
self.assertAlmostEqual(3217920, total_params)
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
batch_size = 5
......
......@@ -20,6 +20,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import tensorflow as tf
from nets.nasnet import nasnet_utils
......@@ -35,13 +36,12 @@ slim = tf.contrib.slim
# cosine (single period) learning rate decay
# auxiliary head loss weighting: 0.4
# clip global norm of all gradients by 5
def _cifar_config(is_training=True, use_aux_head=True):
drop_path_keep_prob = 1.0 if not is_training else 0.6
def cifar_config():
return tf.contrib.training.HParams(
stem_multiplier=3.0,
drop_path_keep_prob=drop_path_keep_prob,
drop_path_keep_prob=0.6,
num_cells=18,
use_aux_head=int(use_aux_head),
use_aux_head=1,
num_conv_filters=32,
dense_dropout_keep_prob=1.0,
filter_scaling_rate=2.0,
......@@ -65,16 +65,15 @@ def _cifar_config(is_training=True, use_aux_head=True):
# auxiliary head loss weighting: 0.4
# label smoothing: 0.1
# clip global norm of all gradients by 10
def _large_imagenet_config(is_training=True, use_aux_head=True):
drop_path_keep_prob = 1.0 if not is_training else 0.7
def large_imagenet_config():
return tf.contrib.training.HParams(
stem_multiplier=3.0,
dense_dropout_keep_prob=0.5,
num_cells=18,
filter_scaling_rate=2.0,
num_conv_filters=168,
drop_path_keep_prob=drop_path_keep_prob,
use_aux_head=int(use_aux_head),
drop_path_keep_prob=0.7,
use_aux_head=1,
num_reduction_layers=2,
data_format='NHWC',
skip_reduction_layer_input=1,
......@@ -92,7 +91,7 @@ def _large_imagenet_config(is_training=True, use_aux_head=True):
# auxiliary head weighting: 0.4
# label smoothing: 0.1
# clip global norm of all gradients by 10
def _mobile_imagenet_config(use_aux_head=True):
def mobile_imagenet_config():
return tf.contrib.training.HParams(
stem_multiplier=1.0,
dense_dropout_keep_prob=0.5,
......@@ -100,7 +99,7 @@ def _mobile_imagenet_config(use_aux_head=True):
filter_scaling_rate=2.0,
drop_path_keep_prob=1.0,
num_conv_filters=44,
use_aux_head=int(use_aux_head),
use_aux_head=1,
num_reduction_layers=2,
data_format='NHWC',
skip_reduction_layer_input=0,
......@@ -108,6 +107,12 @@ def _mobile_imagenet_config(use_aux_head=True):
)
def _update_hparams(hparams, is_training):
"""Update hparams for given is_training option."""
if not is_training:
hparams.set_hparam('drop_path_keep_prob', 1.0)
def nasnet_cifar_arg_scope(weight_decay=5e-4,
batch_norm_decay=0.9,
batch_norm_epsilon=1e-5):
......@@ -279,10 +284,12 @@ def _cifar_stem(inputs, hparams):
return net, [None, net]
def build_nasnet_cifar(
images, num_classes, is_training=True, use_aux_head=True):
def build_nasnet_cifar(images, num_classes,
is_training=True,
config=None):
"""Build NASNet model for the Cifar Dataset."""
hparams = _cifar_config(is_training=is_training, use_aux_head=use_aux_head)
hparams = cifar_config() if config is None else copy.deepcopy(config)
_update_hparams(hparams, is_training)
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
tf.logging.info('A GPU is available on the machine, consider using NCHW '
......@@ -326,9 +333,11 @@ build_nasnet_cifar.default_image_size = 32
def build_nasnet_mobile(images, num_classes,
is_training=True,
final_endpoint=None,
use_aux_head=True):
config=None):
"""Build NASNet Mobile model for the ImageNet Dataset."""
hparams = _mobile_imagenet_config(use_aux_head=use_aux_head)
hparams = (mobile_imagenet_config() if config is None
else copy.deepcopy(config))
_update_hparams(hparams, is_training)
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
tf.logging.info('A GPU is available on the machine, consider using NCHW '
......@@ -375,10 +384,11 @@ build_nasnet_mobile.default_image_size = 224
def build_nasnet_large(images, num_classes,
is_training=True,
final_endpoint=None,
use_aux_head=True):
config=None):
"""Build NASNet Large model for the ImageNet Dataset."""
hparams = _large_imagenet_config(is_training=is_training,
use_aux_head=use_aux_head)
hparams = (large_imagenet_config() if config is None
else copy.deepcopy(config))
_update_hparams(hparams, is_training)
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
tf.logging.info('A GPU is available on the machine, consider using NCHW '
......
......@@ -166,9 +166,11 @@ class NASNetTest(tf.test.TestCase):
tf.reset_default_graph()
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.cifar_config()
config.set_hparam('use_aux_head', int(use_aux_head))
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
_, end_points = nasnet.build_nasnet_cifar(inputs, num_classes,
use_aux_head=use_aux_head)
config=config)
self.assertEqual('AuxLogits' in end_points, use_aux_head)
def testAllEndPointsShapesMobileModel(self):
......@@ -215,9 +217,11 @@ class NASNetTest(tf.test.TestCase):
tf.reset_default_graph()
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.mobile_imagenet_config()
config.set_hparam('use_aux_head', int(use_aux_head))
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
_, end_points = nasnet.build_nasnet_mobile(inputs, num_classes,
use_aux_head=use_aux_head)
config=config)
self.assertEqual('AuxLogits' in end_points, use_aux_head)
def testAllEndPointsShapesLargeModel(self):
......@@ -270,9 +274,11 @@ class NASNetTest(tf.test.TestCase):
tf.reset_default_graph()
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.large_imagenet_config()
config.set_hparam('use_aux_head', int(use_aux_head))
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
_, end_points = nasnet.build_nasnet_large(inputs, num_classes,
use_aux_head=use_aux_head)
config=config)
self.assertEqual('AuxLogits' in end_points, use_aux_head)
def testVariablesSetDeviceMobileModel(self):
......@@ -323,6 +329,48 @@ class NASNetTest(tf.test.TestCase):
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testOverrideHParamsCifarModel(self):
batch_size = 5
height, width = 32, 32
num_classes = 10
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.cifar_config()
config.set_hparam('data_format', 'NCHW')
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
_, end_points = nasnet.build_nasnet_cifar(
inputs, num_classes, config=config)
self.assertListEqual(
end_points['Stem'].shape.as_list(), [batch_size, 96, 32, 32])
def testOverrideHParamsMobileModel(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.mobile_imagenet_config()
config.set_hparam('data_format', 'NCHW')
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
_, end_points = nasnet.build_nasnet_mobile(
inputs, num_classes, config=config)
self.assertListEqual(
end_points['Stem'].shape.as_list(), [batch_size, 88, 28, 28])
def testOverrideHParamsLargeModel(self):
batch_size = 5
height, width = 331, 331
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.large_imagenet_config()
config.set_hparam('data_format', 'NCHW')
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
_, end_points = nasnet.build_nasnet_large(
inputs, num_classes, config=config)
self.assertListEqual(
end_points['Stem'].shape.as_list(), [batch_size, 336, 42, 42])
if __name__ == '__main__':
tf.test.main()