Commit 001a2a61 authored by pkulzc, committed by Sergio Guadarrama

Internal changes for object detection. (#3656)

* Force cast of num_classes to integer

PiperOrigin-RevId: 188335318

* Updating config util to allow overwriting of cosine decay learning rates.

PiperOrigin-RevId: 188338852

* Make box_list_ops.py and box_list_ops_test.py work with C API enabled.

The C API has improved shape inference over the original Python
code, which causes some previously working tf.cond constructs to fail. Switching to smart_cond fixes this.

Another effect of the improved shape inference is that one of the
failures tested gets caught earlier, so I modified the test to reflect
this.

PiperOrigin-RevId: 188409792
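
As an illustration, here is a minimal sketch of the idea behind smart_cond, assuming the helper simply short-circuits statically known predicates; this is not the library's implementation:

import tensorflow as tf

def smart_cond_sketch(pred, true_fn, false_fn):
  # If the predicate is a plain Python bool (e.g. derived from static shapes),
  # pick the branch at graph-construction time, so the untaken branch -- whose
  # shapes may be rejected by the C API's stricter inference -- is never built.
  if isinstance(pred, bool):
    return true_fn() if pred else false_fn()
  # Otherwise fall back to a regular tf.cond on the tensor predicate.
  return tf.cond(pred, true_fn, false_fn)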

* Fix parallel event file writing issue.

Without this change, the event files might get corrupted when multiple evaluations are run in parallel.

PiperOrigin-RevId: 188502560

* Deprecating the boolean flag from_detection_checkpoint.

Replacing it with a string field, fine_tune_checkpoint_type, in train_config to provide extensibility. fine_tune_checkpoint_type can currently take the value `detection`, `classification`, or other values when restore_map is overridden.

PiperOrigin-RevId: 188518685
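
A hypothetical train_config fragment (pbtxt held in a Python string) illustrating the new field; the field names follow the commit description above and are assumptions about the exact proto layout:

# Hypothetical pbtxt fragment; `fine_tune_checkpoint` and
# `fine_tune_checkpoint_type` are assumed field names in train_config.
TRAIN_CONFIG_SNIPPET = """
  fine_tune_checkpoint: "path/to/model.ckpt"
  fine_tune_checkpoint_type: "detection"  # or "classification"
"""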

* Automated g4 rollback of changelist 188502560

PiperOrigin-RevId: 188519969

* Introducing eval metrics specs for Coco Mask metrics. This allows metrics to be computed in tensorflow using the tf.learn Estimator.

PiperOrigin-RevId: 188528485

* Minor fix to make object_detection/metrics/coco_evaluation.py python3 compatible.

PiperOrigin-RevId: 188550683

* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s.

PiperOrigin-RevId: 188560474

* Allow tensor input for new_height and new_width for resize_image.

PiperOrigin-RevId: 188561908
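
A brief usage sketch mirroring the new unit test later in this diff; it assumes preprocessor.resize_image returns the resized image, resized masks, and the true image shape when masks are passed:

import tensorflow as tf
from object_detection.core import preprocessor

in_image = tf.random_uniform([60, 40, 3])
in_masks = tf.random_uniform([15, 60, 40])
# new_height/new_width may now be scalar int32 tensors instead of Python ints.
height = tf.constant(50, dtype=tf.int32)
width = tf.constant(100, dtype=tf.int32)
out_image, out_masks, _ = preprocessor.resize_image(
    in_image, in_masks, new_height=height, new_width=width)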

* Fix typo in fine_tune_checkpoint_type name in trainer.

PiperOrigin-RevId: 188799033

* Adding mobilenet feature extractor to object detection.

PiperOrigin-RevId: 188916897

* Allow label maps to optionally contain an explicit background class with id zero.

PiperOrigin-RevId: 188951089
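
A hypothetical label map (standard StringIntLabelMap pbtxt, written here as a Python string) showing what such an explicit background entry looks like; the class names are illustrative:

# Hypothetical label map text; with this change an id-0 background item is
# optionally allowed alongside the regular 1-indexed classes.
LABEL_MAP_WITH_BACKGROUND = """
item {
  id: 0
  name: 'background'
}
item {
  id: 1
  name: 'dog'
}
item {
  id: 2
  name: 'cat'
}
"""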

* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale.

PiperOrigin-RevId: 189026868

* Fall back on the from_detection_checkpoint option if fine_tune_checkpoint_type isn't set.

PiperOrigin-RevId: 189052833

* Add proper names for learning rate schedules so we don't see cryptic names on tensorboard.

PiperOrigin-RevId: 189069837

* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1.

PiperOrigin-RevId: 189117178

* Adding regularization to total loss returned from DetectionModel.loss().

PiperOrigin-RevId: 189189123

* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on tensorboard.

Log localization and classification losses in evaluation.

PiperOrigin-RevId: 189189940

* Remove negative test from box list ops test.

PiperOrigin-RevId: 189229327

* Add an option to warmup learning rate in manual stepping schedule.

PiperOrigin-RevId: 189361039
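
A minimal numpy sketch, not the library code, of what warmup means for a stepped schedule: before the first boundary the rate is linearly interpolated from the initial rate up to the first scheduled rate.

import numpy as np

def stepped_rate_with_warmup(step, boundaries, rates):
  # `rates` has one more entry than `boundaries`; rates[0] is the initial rate.
  if step < boundaries[0]:
    # Linear warmup from rates[0] at step 0 to rates[1] at boundaries[0].
    return rates[0] + (rates[1] - rates[0]) * step / float(boundaries[0])
  # After warmup, behave like an ordinary piecewise-constant schedule.
  return rates[np.searchsorted(boundaries, step, side='right')]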

* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor.

PiperOrigin-RevId: 189388556

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Remove step 0 schedule since we do a hard check for it after cl/189361039

PiperOrigin-RevId: 189396697

* PiperOrigin-RevId: 189040463

* PiperOrigin-RevId: 189059229

* PiperOrigin-RevId: 189214402

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Make slim python3 compatible.

* Minor fixes.

* Add TargetAssignment summaries in a separate family.

PiperOrigin-RevId: 189407487

* 1. Setting the `family` keyword arg prepends the same prefix to summary names twice. Adding the family suffix directly to the name avoids this problem.
2. Make sure the eval losses have the same names.

PiperOrigin-RevId: 189434618

* Minor fixes to make object detection tf 1.4 compatible.

PiperOrigin-RevId: 189437519

* Call the base of the mobilenet_v1 feature extractor under the right arg scope and set batch norm is_training based on the value passed to the constructor.

PiperOrigin-RevId: 189460890

* Automated g4 rollback of changelist 188409792

PiperOrigin-RevId: 189463882

* Update object detection syncing.

PiperOrigin-RevId: 189601955

* Add an option to warmup learning rate, hold it constant for a certain number of steps and cosine decay it.

PiperOrigin-RevId: 189606169
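
A numpy sketch of the shape of such a schedule, under the assumption that the hold period sits between the linear warmup and the cosine decay; the real cosine_decay_with_warmup implementation may differ in details:

import numpy as np

def cosine_with_warmup_and_hold(step, base_lr, total_steps,
                                warmup_lr, warmup_steps, hold_steps):
  if step < warmup_steps:
    # Linear warmup from warmup_lr to base_lr.
    return warmup_lr + (base_lr - warmup_lr) * step / float(warmup_steps)
  if step < warmup_steps + hold_steps:
    # Hold the base rate constant for a fixed number of steps.
    return base_lr
  # Cosine decay from base_lr down to zero over the remaining steps.
  progress = (step - warmup_steps - hold_steps) / float(
      max(1, total_steps - warmup_steps - hold_steps))
  return 0.5 * base_lr * (1.0 + np.cos(np.pi * min(progress, 1.0)))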

* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points).

PiperOrigin-RevId: 189619301

* Fixed a bug which caused masks to be mostly zeros (caused by detection_boxes being in absolute coordinates when scale_to_absolute=True).

PiperOrigin-RevId: 189641294

* Open sourcing MobileNetV2 + SSDLite.

PiperOrigin-RevId: 189654520

* Remove unused files.
Parent 2913cb24
......@@ -30,8 +30,8 @@ from object_detection.protos import input_reader_pb2
from object_detection.utils import dataset_util
def _get_padding_shapes(dataset, max_num_boxes, num_classes,
spatial_image_shape):
def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
"""Returns shapes to pad dataset tensors to before batching.
Args:
......@@ -41,13 +41,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
num_classes: Number of classes in the dataset needed to compute shapes for
padding.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the imaage.
containing expected spatial shape of the image.
Returns:
A dictionary keyed by fields.InputDataFields containing padding shapes for
tensors in the dataset.
Raises:
ValueError: If groundtruth classes is neither rank 1 nor rank 2.
"""
height, width = spatial_image_shape
if not spatial_image_shape or spatial_image_shape == [-1, -1]:
height, width = None, None
else:
height, width = spatial_image_shape # pylint: disable=unpacking-non-sequence
padding_shapes = {
fields.InputDataFields.image: [height, width, 3],
fields.InputDataFields.source_id: [],
......@@ -55,9 +63,6 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
fields.InputDataFields.key: [],
fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
fields.InputDataFields.groundtruth_classes: [
max_num_boxes, num_classes
],
fields.InputDataFields.groundtruth_instance_masks: [max_num_boxes, height,
width],
fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
......@@ -69,6 +74,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
fields.InputDataFields.true_image_shape: [3]
}
# Determine whether groundtruth_classes are integers or one-hot encodings, and
# apply batching appropriately.
classes_shape = dataset.output_shapes[
fields.InputDataFields.groundtruth_classes]
if len(classes_shape) == 1: # Class integers.
padding_shapes[fields.InputDataFields.groundtruth_classes] = [max_num_boxes]
elif len(classes_shape) == 2: # One-hot or k-hot encoding.
padding_shapes[fields.InputDataFields.groundtruth_classes] = [
max_num_boxes, num_classes]
else:
raise ValueError('Groundtruth classes must be a rank 1 tensor (classes) or '
'rank 2 tensor (one-hot encodings)')
if fields.InputDataFields.original_image in dataset.output_shapes:
padding_shapes[fields.InputDataFields.original_image] = [None, None, 3]
if fields.InputDataFields.groundtruth_keypoints in dataset.output_shapes:
tensor_shape = dataset.output_shapes[fields.InputDataFields.
groundtruth_keypoints]
......@@ -87,28 +107,25 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
def build(input_reader_config, transform_input_data_fn=None,
batch_size=1, max_num_boxes=None, num_classes=None,
batch_size=None, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
"""Builds a tf.data.Dataset.
Builds a tf.data.Dataset by applying the `transform_input_data_fn` on all
records. Optionally, if `batch_size` > 1 and `max_num_boxes`, `num_classes`
and `spatial_image_shape` are not None, returns a padded batched
tf.data.Dataset.
records. Applies a padded batch to the resulting dataset.
Args:
input_reader_config: A input_reader_pb2.InputReader object.
transform_input_data_fn: Function to apply to all records, or None if
no extra decoding is required.
batch_size: Batch size. If not None, returns a padded batch dataset.
max_num_boxes: Max number of groundtruth boxes needed to computes shapes for
padding. This is only used if batch_size is greater than 1.
batch_size: Batch size. If None, batching is not performed.
max_num_boxes: Max number of groundtruth boxes needed to compute shapes for
padding. If None, will use a dynamic shape.
num_classes: Number of classes in the dataset needed to compute shapes for
padding. This is only used if batch_size is greater than 1.
spatial_image_shape: a list of two integers of the form [height, width]
padding. If None, will use a dynamic shape.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the image after applying
transform_input_data_fn. This is needed to compute shapes for padding and
only used if batch_size is greater than 1.
transform_input_data_fn. If None, will use dynamic shapes.
Returns:
A tf.data.Dataset based on the input_reader_config.
......@@ -116,8 +133,6 @@ def build(input_reader_config, transform_input_data_fn=None,
Raises:
ValueError: On invalid input reader proto.
ValueError: If no input paths are specified.
ValueError: If batch_size > 1 and any of (max_num_boxes, num_classes,
spatial_image_shape) is None.
"""
if not isinstance(input_reader_config, input_reader_pb2.InputReader):
raise ValueError('input_reader_config not of type '
......@@ -147,14 +162,7 @@ def build(input_reader_config, transform_input_data_fn=None,
functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000),
process_fn, config.input_path[:], input_reader_config)
if batch_size > 1:
if num_classes is None:
raise ValueError('`num_classes` must be set when batch_size > 1.')
if max_num_boxes is None:
raise ValueError('`max_num_boxes` must be set when batch_size > 1.')
if spatial_image_shape is None:
raise ValueError('`spatial_image_shape` must be set when batch_size > '
'1 .')
if batch_size:
padding_shapes = _get_padding_shapes(dataset, max_num_boxes, num_classes,
spatial_image_shape)
dataset = dataset.apply(
......
......@@ -91,7 +91,7 @@ class DatasetBuilderTest(tf.test.TestCase):
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(input_reader_proto)).get_next()
dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
......@@ -100,15 +100,15 @@ class DatasetBuilderTest(tf.test.TestCase):
self.assertTrue(
fields.InputDataFields.groundtruth_instance_masks not in output_dict)
self.assertEquals((4, 5, 3),
self.assertEquals((1, 4, 5, 3),
output_dict[fields.InputDataFields.image].shape)
self.assertEquals([2],
output_dict[fields.InputDataFields.groundtruth_classes])
self.assertAllEqual([[2]],
output_dict[fields.InputDataFields.groundtruth_classes])
self.assertEquals(
(1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
(1, 1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
self.assertAllEqual(
[0.0, 0.0, 1.0, 1.0],
output_dict[fields.InputDataFields.groundtruth_boxes][0])
output_dict[fields.InputDataFields.groundtruth_boxes][0][0])
def test_build_tf_record_input_reader_and_load_instance_masks(self):
tf_record_path = self.create_tf_record()
......@@ -124,14 +124,14 @@ class DatasetBuilderTest(tf.test.TestCase):
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(input_reader_proto)).get_next()
dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
sv.start_queue_runners(sess)
output_dict = sess.run(tensor_dict)
self.assertAllEqual(
(1, 4, 5),
(1, 1, 4, 5),
output_dict[fields.InputDataFields.groundtruth_instance_masks].shape)
def test_build_tf_record_input_reader_with_batch_size_two(self):
......
......@@ -36,6 +36,7 @@ from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
# A map of names to SSD feature extractors.
......@@ -43,6 +44,7 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_inception_v2': SSDInceptionV2FeatureExtractor,
'ssd_inception_v3': SSDInceptionV3FeatureExtractor,
'ssd_mobilenet_v1': SSDMobileNetV1FeatureExtractor,
'ssd_mobilenet_v2': SSDMobileNetV2FeatureExtractor,
'ssd_resnet50_v1_fpn': ssd_resnet_v1_fpn.SSDResnet50V1FpnFeatureExtractor,
'ssd_resnet101_v1_fpn': ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
'ssd_resnet152_v1_fpn': ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
......
......@@ -31,6 +31,7 @@ from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
FRCNN_RESNET_FEAT_MAPS = {
......@@ -368,6 +369,81 @@ class ModelBuilderTest(tf.test.TestCase):
self.assertTrue(model._feature_extractor._batch_norm_trainable)
self.assertTrue(model._normalize_loc_loss_by_codesize)
def test_create_ssd_mobilenet_v2_model_from_config(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_mobilenet_v2'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
batch_norm_trainable: true
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FeatureExtractor)
self.assertTrue(model._feature_extractor._batch_norm_trainable)
self.assertTrue(model._normalize_loc_loss_by_codesize)
def test_create_embedded_ssd_mobilenet_v1_model_from_config(self):
model_text_proto = """
ssd {
......
......@@ -85,7 +85,8 @@ def _create_learning_rate(learning_rate_config):
learning_rate_type = learning_rate_config.WhichOneof('learning_rate')
if learning_rate_type == 'constant_learning_rate':
config = learning_rate_config.constant_learning_rate
learning_rate = tf.constant(config.learning_rate, dtype=tf.float32)
learning_rate = tf.constant(config.learning_rate, dtype=tf.float32,
name='learning_rate')
if learning_rate_type == 'exponential_decay_learning_rate':
config = learning_rate_config.exponential_decay_learning_rate
......@@ -94,7 +95,7 @@ def _create_learning_rate(learning_rate_config):
tf.train.get_or_create_global_step(),
config.decay_steps,
config.decay_factor,
staircase=config.staircase)
staircase=config.staircase, name='learning_rate')
if learning_rate_type == 'manual_step_learning_rate':
config = learning_rate_config.manual_step_learning_rate
......@@ -105,7 +106,7 @@ def _create_learning_rate(learning_rate_config):
learning_rate_sequence += [x.learning_rate for x in config.schedule]
learning_rate = learning_schedules.manual_stepping(
tf.train.get_or_create_global_step(), learning_rate_step_boundaries,
learning_rate_sequence)
learning_rate_sequence, config.warmup)
if learning_rate_type == 'cosine_decay_learning_rate':
config = learning_rate_config.cosine_decay_learning_rate
......@@ -114,7 +115,8 @@ def _create_learning_rate(learning_rate_config):
config.learning_rate_base,
config.total_steps,
config.warmup_learning_rate,
config.warmup_steps)
config.warmup_steps,
config.hold_base_rate_steps)
if learning_rate is None:
raise ValueError('Learning_rate %s not supported.' % learning_rate_type)
......
......@@ -35,6 +35,7 @@ class LearningRateBuilderTest(tf.test.TestCase):
text_format.Merge(learning_rate_text_proto, learning_rate_proto)
learning_rate = optimizer_builder._create_learning_rate(
learning_rate_proto)
self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
with self.test_session():
learning_rate_out = learning_rate.eval()
self.assertAlmostEqual(learning_rate_out, 0.004)
......@@ -52,19 +53,22 @@ class LearningRateBuilderTest(tf.test.TestCase):
text_format.Merge(learning_rate_text_proto, learning_rate_proto)
learning_rate = optimizer_builder._create_learning_rate(
learning_rate_proto)
self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
self.assertTrue(isinstance(learning_rate, tf.Tensor))
def testBuildManualStepLearningRate(self):
learning_rate_text_proto = """
manual_step_learning_rate {
initial_learning_rate: 0.002
schedule {
step: 0
step: 100
learning_rate: 0.006
}
schedule {
step: 90000
learning_rate: 0.00006
}
warmup: true
}
"""
learning_rate_proto = optimizer_pb2.LearningRate()
......@@ -80,6 +84,7 @@ class LearningRateBuilderTest(tf.test.TestCase):
total_steps: 20000
warmup_learning_rate: 0.0001
warmup_steps: 1000
hold_base_rate_steps: 20000
}
"""
learning_rate_proto = optimizer_pb2.LearningRate()
......
......@@ -727,21 +727,6 @@ class ConcatenateTest(tf.test.TestCase):
class NonMaxSuppressionTest(tf.test.TestCase):
def test_with_invalid_scores_field(self):
corners = tf.constant([[0, 0, 1, 1],
[0, 0.1, 1, 1.1],
[0, -0.1, 1, 0.9],
[0, 10, 1, 11],
[0, 10.1, 1, 11.1],
[0, 100, 1, 101]], tf.float32)
boxes = box_list.BoxList(corners)
boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5]))
iou_thresh = .5
max_output_size = 3
with self.assertRaisesWithPredicateMatch(ValueError,
'Dimensions must be equal'):
box_list_ops.non_max_suppression(boxes, iou_thresh, max_output_size)
def test_select_from_three_clusters(self):
corners = tf.constant([[0, 0, 1, 1],
[0, 0.1, 1, 1.1],
......
......@@ -275,7 +275,7 @@ class DetectionModel(object):
fields.BoxListFields.keypoints] = groundtruth_keypoints_list
@abstractmethod
def restore_map(self, from_detection_checkpoint=True):
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
Returns a map of variable names to load from a checkpoint to variables in
......@@ -287,9 +287,10 @@ class DetectionModel(object):
the num_classes parameter.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
......
......@@ -122,7 +122,7 @@ def multiclass_non_max_suppression(boxes,
if boundaries is not None:
per_class_boundaries_list = tf.unstack(boundaries, axis=1)
boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
else [0] * num_classes)
else [0] * num_classes.value)
for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
per_class_boxes = per_class_boxes_list[boxes_idx]
boxlist_and_class_scores = box_list.BoxList(per_class_boxes)
......
......@@ -233,7 +233,7 @@ def _rgb_to_grayscale(images, name=None):
rgb_weights = [0.2989, 0.5870, 0.1140]
rank_1 = tf.expand_dims(tf.rank(images) - 1, 0)
gray_float = tf.reduce_sum(
flt_image * rgb_weights, rank_1, keepdims=True)
flt_image * rgb_weights, rank_1, keep_dims=True)
gray_float.set_shape(images.get_shape()[:-1].concatenate([1]))
return tf.image.convert_image_dtype(gray_float, orig_dtype, name=name)
......@@ -1821,8 +1821,10 @@ def random_pad_to_aspect_ratio(image,
max_width = tf.maximum(
max_padded_size_ratio[1] * image_width, target_width)
min_scale = tf.maximum(min_height / target_height, min_width / target_width)
max_scale = tf.minimum(max_height / target_height, max_width / target_width)
min_scale = tf.minimum(
max_scale,
tf.maximum(min_height / target_height, min_width / target_width))
generator_func = functools.partial(tf.random_uniform, [],
min_scale, max_scale, seed=seed)
......@@ -1831,8 +1833,8 @@ def random_pad_to_aspect_ratio(image,
preprocessor_cache.PreprocessorCache.PAD_TO_ASPECT_RATIO,
preprocess_vars_cache)
target_height = scale * target_height
target_width = scale * target_width
target_height = tf.round(scale * target_height)
target_width = tf.round(scale * target_width)
new_image = tf.image.pad_to_bounding_box(
image, 0, 0, tf.to_int32(target_height), tf.to_int32(target_width))
......@@ -2261,14 +2263,14 @@ def resize_image(image,
'ResizeImage',
values=[image, new_height, new_width, method, align_corners]):
new_image = tf.image.resize_images(
image, [new_height, new_width],
image, tf.stack([new_height, new_width]),
method=method,
align_corners=align_corners)
image_shape = shape_utils.combined_static_and_dynamic_shape(image)
result = [new_image]
if masks is not None:
num_instances = tf.shape(masks)[0]
new_size = tf.constant([new_height, new_width], dtype=tf.int32)
new_size = tf.stack([new_height, new_width])
def resize_masks_branch():
new_masks = tf.expand_dims(masks, 3)
new_masks = tf.image.resize_nearest_neighbor(
......
......@@ -1736,6 +1736,41 @@ class PreprocessorTest(tf.test.TestCase):
test_masks=True,
test_keypoints=True)
def testRunRandomPadToAspectRatioWithMinMaxPaddedSizeRatios(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
labels = self.createTestLabels()
tensor_dict = {
fields.InputDataFields.image: image,
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels
}
preprocessor_arg_map = preprocessor.get_default_func_arg_map()
preprocessing_options = [(preprocessor.random_pad_to_aspect_ratio,
{'min_padded_size_ratio': (4.0, 4.0),
'max_padded_size_ratio': (4.0, 4.0)})]
distorted_tensor_dict = preprocessor.preprocess(
tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
distorted_image = distorted_tensor_dict[fields.InputDataFields.image]
distorted_boxes = distorted_tensor_dict[
fields.InputDataFields.groundtruth_boxes]
distorted_labels = distorted_tensor_dict[
fields.InputDataFields.groundtruth_classes]
with self.test_session() as sess:
distorted_image_, distorted_boxes_, distorted_labels_ = sess.run([
distorted_image, distorted_boxes, distorted_labels])
expected_boxes = np.array(
[[0.0, 0.125, 0.1875, 0.5], [0.0625, 0.25, 0.1875, 0.5]],
dtype=np.float32)
self.assertAllEqual(distorted_image_.shape, [1, 800, 800, 3])
self.assertAllEqual(distorted_labels_, [1, 2])
self.assertAllClose(distorted_boxes_.flatten(),
expected_boxes.flatten())
def testRunRandomPadToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -2118,6 +2153,33 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(out_image_shape, expected_image_shape)
self.assertAllEqual(out_masks_shape, expected_mask_shape)
def testResizeImageWithMasksTensorInputHeightAndWidth(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
in_masks_shape_list = [[15, 60, 40], [10, 15, 30]]
height = tf.constant(50, dtype=tf.int32)
width = tf.constant(100, dtype=tf.int32)
expected_image_shape_list = [[50, 100, 3], [50, 100, 3]]
expected_masks_shape_list = [[15, 50, 100], [10, 50, 100]]
for (in_image_shape, expected_image_shape, in_masks_shape,
expected_mask_shape) in zip(in_image_shape_list,
expected_image_shape_list,
in_masks_shape_list,
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks, _ = preprocessor.resize_image(
in_image, in_masks, new_height=height, new_width=width)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
with self.test_session() as sess:
out_image_shape, out_masks_shape = sess.run(
[out_image_shape, out_masks_shape])
self.assertAllEqual(out_image_shape, expected_image_shape)
self.assertAllEqual(out_masks_shape, expected_mask_shape)
def testResizeImageWithNoInstanceMask(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
......
......@@ -31,6 +31,44 @@ from object_detection.utils import label_map_util
slim_example_decoder = tf.contrib.slim.tfexample_decoder
# TODO(lzc): keep LookupTensor and BackupHandler in sync with
# tf.contrib.slim.tfexample_decoder version.
class LookupTensor(slim_example_decoder.Tensor):
"""An ItemHandler that returns a parsed Tensor, the result of a lookup."""
def __init__(self,
tensor_key,
table,
shape_keys=None,
shape=None,
default_value=''):
"""Initializes the LookupTensor handler.
Simply calls a vocabulary (most often, a label mapping) lookup.
Args:
tensor_key: the name of the `TFExample` feature to read the tensor from.
table: A tf.lookup table.
shape_keys: Optional name or list of names of the TF-Example feature in
which the tensor shape is stored. If a list, then each corresponds to
one dimension of the shape.
shape: Optional output shape of the `Tensor`. If provided, the `Tensor` is
reshaped accordingly.
default_value: The value used when the `tensor_key` is not found in a
particular `TFExample`.
Raises:
ValueError: if both `shape_keys` and `shape` are specified.
"""
self._table = table
super(LookupTensor, self).__init__(tensor_key, shape_keys, shape,
default_value)
def tensors_to_item(self, keys_to_tensors):
unmapped_tensor = super(LookupTensor, self).tensors_to_item(keys_to_tensors)
return self._table.lookup(unmapped_tensor)
class BackupHandler(slim_example_decoder.ItemHandler):
"""An ItemHandler that tries two ItemHandlers in order."""
......@@ -207,8 +245,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
# switch back to slim_example_decoder.BackupHandler once tf 1.5 becomes
# more popular.
label_handler = BackupHandler(
slim_example_decoder.LookupTensor(
'image/object/class/text', table, default_value=''),
LookupTensor('image/object/class/text', table, default_value=''),
slim_example_decoder.Tensor('image/object/class/label'))
else:
label_handler = slim_example_decoder.Tensor('image/object/class/label')
......
......@@ -108,8 +108,8 @@ class TfExampleDecoderTest(tf.test.TestCase):
}
backup_handler = tf_example_decoder.BackupHandler(
handler=slim_example_decoder.Tensor('image/object/class/label'),
backup=slim_example_decoder.LookupTensor('image/object/class/text',
table))
backup=tf_example_decoder.LookupTensor('image/object/class/text',
table))
items_to_handlers = {
'labels': backup_handler,
}
......@@ -128,6 +128,37 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllClose([2, 0, 1], obtained_class_ids_each_example[1])
self.assertAllClose([42, 10, 901], obtained_class_ids_each_example[2])
def testDecodeExampleWithBranchedLookup(self):
example = example_pb2.Example(features=feature_pb2.Features(feature={
'image/object/class/text': self._BytesFeatureFromList(
np.array(['cat', 'dog', 'guinea pig'])),
}))
serialized_example = example.SerializeToString()
# 'dog' -> 0, 'guinea pig' -> 1, 'cat' -> 2
table = lookup_ops.index_table_from_tensor(
constant_op.constant(['dog', 'guinea pig', 'cat']))
with self.test_session() as sess:
sess.run(lookup_ops.tables_initializer())
serialized_example = array_ops.reshape(serialized_example, shape=[])
keys_to_features = {
'image/object/class/text': parsing_ops.VarLenFeature(dtypes.string),
}
items_to_handlers = {
'labels':
tf_example_decoder.LookupTensor('image/object/class/text', table),
}
decoder = slim_example_decoder.TFExampleDecoder(keys_to_features,
items_to_handlers)
obtained_class_ids = decoder.decode(serialized_example)[0].eval()
self.assertAllClose([2, 0, 1], obtained_class_ids)
def testDecodeJpegImage(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......
......@@ -58,10 +58,10 @@ tf.app.flags.DEFINE_string('output_path', '', 'Path to which TFRecord files'
'will be located at: <output_path>_train.tfrecord.'
'And the TFRecord with the validation set will be'
'located at: <output_path>_val.tfrecord')
tf.app.flags.DEFINE_list('classes_to_use', ['car', 'pedestrian', 'dontcare'],
'Which classes of bounding boxes to use. Adding the'
'dontcare class will remove all bboxs in the dontcare'
'regions.')
tf.app.flags.DEFINE_string('classes_to_use', 'car,pedestrian,dontcare',
'Comma separated list of class names that will be'
'used. Adding the dontcare class will remove all'
'bboxs in the dontcare regions.')
tf.app.flags.DEFINE_string('label_map_path', 'data/kitti_label_map.pbtxt',
'Path to label map proto.')
tf.app.flags.DEFINE_integer('validation_set_size', '500', 'Number of images to'
......@@ -302,7 +302,7 @@ def main(_):
convert_kitti_to_tfrecords(
data_dir=FLAGS.data_dir,
output_path=FLAGS.output_path,
classes_to_use=FLAGS.classes_to_use,
classes_to_use=FLAGS.classes_to_use.split(','),
label_map_path=FLAGS.label_map_path,
validation_set_size=FLAGS.validation_set_size)
......
......@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common functions for repeatedly evaluating a checkpoint."""
"""Common utility functions for evaluation."""
import collections
import logging
import os
import time
......@@ -24,6 +25,7 @@ from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import keypoint_ops
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import label_map_util
from object_detection.utils import ops
from object_detection.utils import visualization_utils as vis_utils
......@@ -201,8 +203,9 @@ def _run_checkpoint_once(tensor_dict,
num_batches=1,
master='',
save_graph=False,
save_graph_dir=''):
"""Evaluates metrics defined in evaluators.
save_graph_dir='',
losses_dict=None):
"""Evaluates metrics defined in evaluators and returns summaries.
This function loads the latest checkpoint in checkpoint_dirs and evaluates
all metrics defined in evaluators. The metrics are processed in batch by the
......@@ -240,6 +243,7 @@ def _run_checkpoint_once(tensor_dict,
save_graph: whether or not the Tensorflow graph is stored as a pbtxt file.
save_graph_dir: where to store the Tensorflow graph on disk. If save_graph
is True this must be non-empty.
losses_dict: optional dictionary of scalar detection losses.
Returns:
global_step: the count of global steps.
......@@ -269,6 +273,7 @@ def _run_checkpoint_once(tensor_dict,
tf.train.write_graph(sess.graph_def, save_graph_dir, 'eval.pbtxt')
counters = {'skipped': 0, 'success': 0}
aggregate_result_losses_dict = collections.defaultdict(list)
with tf.contrib.slim.queues.QueueRunners(sess):
try:
for batch in range(int(num_batches)):
......@@ -276,16 +281,22 @@ def _run_checkpoint_once(tensor_dict,
logging.info('Running eval ops batch %d/%d', batch + 1, num_batches)
if not batch_processor:
try:
result_dict = sess.run(tensor_dict)
if not losses_dict:
losses_dict = {}
result_dict, result_losses_dict = sess.run([tensor_dict,
losses_dict])
counters['success'] += 1
except tf.errors.InvalidArgumentError:
logging.info('Skipping image')
counters['skipped'] += 1
result_dict = {}
else:
result_dict = batch_processor(tensor_dict, sess, batch, counters)
result_dict, result_losses_dict = batch_processor(
tensor_dict, sess, batch, counters, losses_dict=losses_dict)
if not result_dict:
continue
for key, value in iter(result_losses_dict.items()):
aggregate_result_losses_dict[key].append(value)
for evaluator in evaluators:
# TODO(b/65130867): Use image_id tensor once we fix the input data
# decoders to return correct image_id.
......@@ -310,6 +321,9 @@ def _run_checkpoint_once(tensor_dict,
raise ValueError('Metric names between evaluators must not collide.')
all_evaluator_metrics.update(metrics)
global_step = tf.train.global_step(sess, tf.train.get_global_step())
for key, value in iter(aggregate_result_losses_dict.items()):
all_evaluator_metrics['Losses/' + key] = np.mean(value)
sess.close()
return (global_step, all_evaluator_metrics)
......@@ -327,7 +341,8 @@ def repeated_checkpoint_run(tensor_dict,
max_number_of_evaluations=None,
master='',
save_graph=False,
save_graph_dir=''):
save_graph_dir='',
losses_dict=None):
"""Periodically evaluates desired tensors using checkpoint_dirs or restore_fn.
This function repeatedly loads a checkpoint and evaluates a desired
......@@ -367,6 +382,7 @@ def repeated_checkpoint_run(tensor_dict,
save_graph: whether or not the Tensorflow graph is saved as a pbtxt file.
save_graph_dir: where to save on disk the Tensorflow graph. If store_graph
is True this must be non-empty.
losses_dict: optional dictionary of scalar detection losses.
Returns:
metrics: A dictionary containing metric names and values in the latest
......@@ -404,7 +420,8 @@ def repeated_checkpoint_run(tensor_dict,
variables_to_restore,
restore_fn, num_batches,
master, save_graph,
save_graph_dir)
save_graph_dir,
losses_dict=losses_dict)
write_metrics(metrics, global_step, summary_dir)
number_of_evaluations += 1
......@@ -432,7 +449,7 @@ def result_dict_for_single_example(image,
have label 1.
Args:
image: A single 4D image tensor of shape [1, H, W, C].
image: A single 4D uint8 image tensor of shape [1, H, W, C].
key: A single string tensor identifying the image.
detections: A dictionary of detections, returned from
DetectionModel.postprocess().
......@@ -479,7 +496,7 @@ def result_dict_for_single_example(image,
"""
label_id_offset = 1 # Applying label id offset (b/63711816)
input_data_fields = fields.InputDataFields()
input_data_fields = fields.InputDataFields
output_dict = {
input_data_fields.original_image: image,
input_data_fields.key: key,
......@@ -488,10 +505,6 @@ def result_dict_for_single_example(image,
detection_fields = fields.DetectionResultFields
detection_boxes = detections[detection_fields.detection_boxes][0]
image_shape = tf.shape(image)
if scale_to_absolute:
absolute_detection_boxlist = box_list_ops.to_absolute_coordinates(
box_list.BoxList(detection_boxes), image_shape[1], image_shape[2])
detection_boxes = absolute_detection_boxlist.get()
detection_scores = detections[detection_fields.detection_scores][0]
if class_agnostic:
......@@ -508,7 +521,14 @@ def result_dict_for_single_example(image,
detection_classes, begin=[0], size=[num_detections])
detection_scores = tf.slice(
detection_scores, begin=[0], size=[num_detections])
output_dict[detection_fields.detection_boxes] = detection_boxes
if scale_to_absolute:
absolute_detection_boxlist = box_list_ops.to_absolute_coordinates(
box_list.BoxList(detection_boxes), image_shape[1], image_shape[2])
output_dict[detection_fields.detection_boxes] = (
absolute_detection_boxlist.get())
else:
output_dict[detection_fields.detection_boxes] = detection_boxes
output_dict[detection_fields.detection_classes] = detection_classes
output_dict[detection_fields.detection_scores] = detection_scores
......@@ -550,3 +570,69 @@ def result_dict_for_single_example(image,
output_dict[input_data_fields.groundtruth_classes] = groundtruth_classes
return output_dict
def get_eval_metric_ops_for_evaluators(evaluation_metrics,
categories,
eval_dict,
include_metrics_per_category=False):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Args:
evaluation_metrics: List of evaluation metric names. Current options are
'coco_detection_metrics' and 'coco_mask_metrics'.
categories: A list of dicts, each of which has the following keys -
'id': (required) an integer id uniquely identifying this category.
'name': (required) string representing category name e.g., 'cat', 'dog'.
eval_dict: An evaluation dictionary, returned from
result_dict_for_single_example().
include_metrics_per_category: If True, include metrics for each category.
Returns:
A dictionary of metric names to tuple of value_op and update_op that can be
used as eval metric ops in tf.EstimatorSpec.
Raises:
ValueError: If any of the metrics in `evaluation_metric` is not
'coco_detection_metrics' or 'coco_mask_metrics'.
"""
evaluation_metrics = list(set(evaluation_metrics))
input_data_fields = fields.InputDataFields
detection_fields = fields.DetectionResultFields
eval_metric_ops = {}
for metric in evaluation_metrics:
if metric == 'coco_detection_metrics':
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
categories, include_metrics_per_category=include_metrics_per_category)
eval_metric_ops.update(
coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[
input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes]))
elif metric == 'coco_mask_metrics':
coco_mask_evaluator = coco_evaluation.CocoMaskEvaluator(
categories, include_metrics_per_category=include_metrics_per_category)
eval_metric_ops.update(
coco_mask_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[
input_data_fields.groundtruth_classes],
groundtruth_instance_masks=eval_dict[
input_data_fields.groundtruth_instance_masks],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes],
detection_masks=eval_dict[detection_fields.detection_masks]))
else:
raise ValueError('The only evaluation metrics supported are '
'"coco_detection_metrics" and "coco_mask_metrics". '
'Found {} in the evaluation metrics'.format(metric))
return eval_metric_ops
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for eval_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection import eval_util
from object_detection.core import standard_fields as fields
class EvalUtilTest(tf.test.TestCase):
def _get_categories_list(self):
return [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'dog'},
{'id': 2, 'name': 'cat'}]
def _make_evaluation_dict(self):
input_data_fields = fields.InputDataFields
detection_fields = fields.DetectionResultFields
image = tf.zeros(shape=[1, 20, 20, 3], dtype=tf.uint8)
key = tf.constant('image1')
detection_boxes = tf.constant([[[0., 0., 1., 1.]]])
detection_scores = tf.constant([[0.8]])
detection_classes = tf.constant([[0]])
detection_masks = tf.ones(shape=[1, 1, 20, 20], dtype=tf.float32)
num_detections = tf.constant([1])
groundtruth_boxes = tf.constant([[0., 0., 1., 1.]])
groundtruth_classes = tf.constant([1])
groundtruth_instance_masks = tf.ones(shape=[1, 20, 20], dtype=tf.uint8)
detections = {
detection_fields.detection_boxes: detection_boxes,
detection_fields.detection_scores: detection_scores,
detection_fields.detection_classes: detection_classes,
detection_fields.detection_masks: detection_masks,
detection_fields.num_detections: num_detections
}
groundtruth = {
input_data_fields.groundtruth_boxes: groundtruth_boxes,
input_data_fields.groundtruth_classes: groundtruth_classes,
input_data_fields.groundtruth_instance_masks: groundtruth_instance_masks
}
return eval_util.result_dict_for_single_example(image, key, detections,
groundtruth)
def test_get_eval_metric_ops_for_coco_detections(self):
evaluation_metrics = ['coco_detection_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
_, update_op = metric_ops['DetectionBoxes_Precision/mAP']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op)
metrics = sess.run(metrics)
print(metrics)
self.assertAlmostEqual(1.0, metrics['DetectionBoxes_Precision/mAP'])
self.assertNotIn('DetectionMasks_Precision/mAP', metrics)
def test_get_eval_metric_ops_for_coco_detections_and_masks(self):
evaluation_metrics = ['coco_detection_metrics',
'coco_mask_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
_, update_op_boxes = metric_ops['DetectionBoxes_Precision/mAP']
_, update_op_masks = metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op_boxes)
sess.run(update_op_masks)
metrics = sess.run(metrics)
self.assertAlmostEqual(1.0, metrics['DetectionBoxes_Precision/mAP'])
self.assertAlmostEqual(1.0, metrics['DetectionMasks_Precision/mAP'])
def test_get_eval_metric_ops_raises_error_with_unsupported_metric(self):
evaluation_metrics = ['unsupported_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
with self.assertRaises(ValueError):
eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
if __name__ == '__main__':
tf.test.main()
......@@ -50,10 +50,10 @@ EVAL_METRICS_CLASS_DICT = {
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
def _extract_prediction_tensors(model,
create_input_dict_fn,
ignore_groundtruth=False):
"""Restores the model in a tensorflow session.
def _extract_predictions_and_losses(model,
create_input_dict_fn,
ignore_groundtruth=False):
"""Constructs tensorflow detection graph and returns output tensors.
Args:
model: model to perform predictions with.
......@@ -61,7 +61,11 @@ def _extract_prediction_tensors(model,
ignore_groundtruth: whether groundtruth should be ignored.
Returns:
tensor_dict: A tensor dictionary with evaluations.
prediction_groundtruth_dict: A dictionary with postprocessed tensors (keyed
by standard_fields.DetectionResultsFields) and optional groundtruth
tensors (keyed by standard_fields.InputDataFields).
losses_dict: A dictionary containing detection losses. This is empty when
ignore_groundtruth is true.
"""
input_dict = create_input_dict_fn()
prefetch_queue = prefetcher.prefetch(input_dict, capacity=500)
......@@ -73,6 +77,7 @@ def _extract_prediction_tensors(model,
detections = model.postprocess(prediction_dict, true_image_shapes)
groundtruth = None
losses_dict = {}
if not ignore_groundtruth:
groundtruth = {
fields.InputDataFields.groundtruth_boxes:
......@@ -92,8 +97,14 @@ def _extract_prediction_tensors(model,
if fields.DetectionResultFields.detection_masks in detections:
groundtruth[fields.InputDataFields.groundtruth_instance_masks] = (
input_dict[fields.InputDataFields.groundtruth_instance_masks])
return eval_util.result_dict_for_single_example(
label_id_offset = 1
model.provide_groundtruth(
[input_dict[fields.InputDataFields.groundtruth_boxes]],
[tf.one_hot(input_dict[fields.InputDataFields.groundtruth_classes]
- label_id_offset, depth=model.num_classes)])
losses_dict.update(model.loss(prediction_dict, true_image_shapes))
result_dict = eval_util.result_dict_for_single_example(
original_image,
input_dict[fields.InputDataFields.source_id],
detections,
......@@ -101,6 +112,7 @@ def _extract_prediction_tensors(model,
class_agnostic=(
fields.DetectionResultFields.detection_classes not in detections),
scale_to_absolute=True)
return result_dict, losses_dict
def get_evaluators(eval_config, categories):
......@@ -157,13 +169,14 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
logging.fatal('If ignore_groundtruth=True then an export_path is '
'required. Aborting!!!')
tensor_dict = _extract_prediction_tensors(
tensor_dict, losses_dict = _extract_predictions_and_losses(
model=model,
create_input_dict_fn=create_input_dict_fn,
ignore_groundtruth=eval_config.ignore_groundtruth)
def _process_batch(tensor_dict, sess, batch_index, counters):
"""Evaluates tensors in tensor_dict, visualizing the first K examples.
def _process_batch(tensor_dict, sess, batch_index, counters,
losses_dict=None):
"""Evaluates tensors in tensor_dict, losses_dict and visualizes examples.
This function calls sess.run on tensor_dict, evaluating the original_image
tensor only on the first K examples and visualizing detections overlaid
......@@ -177,12 +190,17 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
be updated to keep track of number of successful and failed runs,
respectively. If these fields are not updated, then the success/skipped
counter values shown at the end of evaluation will be incorrect.
losses_dict: Optional dictonary of scalar loss tensors.
Returns:
result_dict: a dictionary of numpy arrays
result_losses_dict: a dictionary of scalar losses. This is empty if input
losses_dict is None.
"""
try:
result_dict = sess.run(tensor_dict)
if not losses_dict:
losses_dict = {}
result_dict, result_losses_dict = sess.run([tensor_dict, losses_dict])
counters['success'] += 1
except tf.errors.InvalidArgumentError:
logging.info('Skipping image')
......@@ -207,7 +225,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
skip_labels=eval_config.skip_labels,
keep_image_id_for_visualization_export=eval_config.
keep_image_id_for_visualization_export)
return result_dict
return result_dict, result_losses_dict
variables_to_restore = tf.global_variables()
global_step = tf.train.get_or_create_global_step()
......@@ -242,6 +260,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
if eval_config.max_evals else None),
master=eval_config.eval_master,
save_graph=eval_config.save_graph,
save_graph_dir=(eval_dir if eval_config.save_graph else ''))
save_graph_dir=(eval_dir if eval_config.save_graph else ''),
losses_dict=losses_dict)
return metrics
......@@ -62,7 +62,7 @@ class FakeModel(model.DetectionModel):
np.arange(64).reshape([2, 2, 4, 4]), tf.float32)
return postprocessed_tensors
def restore_map(self, checkpoint_path, from_detection_checkpoint):
def restore_map(self, checkpoint_path, fine_tune_checkpoint_type):
pass
def loss(self, prediction_dict, true_image_shapes):
......
......@@ -6,10 +6,14 @@ introduced in tensorflow 1.5.0 so runing with earlier versions may cause this
issue. It now has been replaced by
object_detection.data_decoders.tf_example_decoder.BackupHandler. Whoever sees
this issue should be able to resolve it by syncing your fork to HEAD.
Same for LookupTensor.
## Q: AttributeError: 'module' object has no attribute 'LookupTensor'
A: Similar to BackupHandler, syncing your fork to HEAD should make it work.
## Q: Why can't I get the inference time as reported in model zoo?
A: The inference time reported in model zoo is mean time of testing hundreds of
images with a internal machine. As mentioned in
images with an internal machine. As mentioned in
[Tensorflow detection model zoo](detection_model_zoo.md), this speed depends
highly on one's specific hardware configuration and should be treated more as
relative timing.
......@@ -40,6 +40,11 @@ HASH_KEY = 'hash'
HASH_BINS = 1 << 31
SERVING_FED_EXAMPLE_KEY = 'serialized_example'
# A map of names to methods that help build the input pipeline.
INPUT_BUILDER_UTIL_MAP = {
'dataset_build': dataset_builder.build,
}
def transform_input_data(tensor_dict,
model_preprocess_fn,
......@@ -229,7 +234,7 @@ def create_train_input_fn(train_config, train_input_config,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn)
dataset = dataset_builder.build(
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size,
......@@ -341,8 +346,13 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=True)
dataset = dataset_builder.build(eval_input_config,
transform_input_data_fn=transform_data_fn)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=1,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
hash_from_source_id = tf.string_to_hash_bucket_fast(
......@@ -374,16 +384,6 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
labels[fields.InputDataFields.groundtruth_instance_masks] = input_dict[
fields.InputDataFields.groundtruth_instance_masks]
# Add a batch dimension to the tensors.
features = {
key: tf.expand_dims(features[key], axis=0)
for key, feature in features.items()
}
labels = {
key: tf.expand_dims(labels[key], axis=0)
for key, label in labels.items()
}
return features, labels
return _eval_input_fn
......@@ -426,9 +426,13 @@ def create_predict_input_fn(model_config):
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
return tf.estimator.export.ServingInputReceiver(
features={fields.InputDataFields.image: images},
features={
fields.InputDataFields.image: images,
fields.InputDataFields.true_image_shape: true_image_shape},
receiver_tensors={SERVING_FED_EXAMPLE_KEY: example})
return _predict_input_fn
......@@ -64,24 +64,24 @@ class InputsTest(tf.test.TestCase):
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([None, None, 3],
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([],
self.assertAllEqual([1],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[None, 4],
[1, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[None, model_config.faster_rcnn.num_classes],
[1, 50, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[None],
[1, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
......
......@@ -159,6 +159,7 @@ class FasterRCNNFeatureExtractor(object):
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping activation tensor names to tensors.
"""
with tf.variable_scope(scope, values=[preprocessed_inputs]):
return self._extract_proposal_features(preprocessed_inputs, scope)
......@@ -906,7 +907,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape: A 1-D tensor representing the input image shape.
"""
image_shape = tf.shape(preprocessed_inputs)
rpn_features_to_crop = self._feature_extractor.extract_proposal_features(
rpn_features_to_crop, _ = self._feature_extractor.extract_proposal_features(
preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
feature_map_shape = tf.shape(rpn_features_to_crop)
......@@ -1649,14 +1650,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.reduce_sum(localization_losses, axis=1) / normalizer)
objectness_loss = tf.reduce_mean(
tf.reduce_sum(objectness_losses, axis=1) / normalizer)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['first_stage_localization_loss'] = (
self._first_stage_loc_loss_weight * localization_loss)
with tf.name_scope('objectness_loss'):
loss_dict['first_stage_objectness_loss'] = (
self._first_stage_obj_loss_weight * objectness_loss)
localization_loss = tf.multiply(self._first_stage_loc_loss_weight,
localization_loss,
name='localization_loss')
objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
objectness_loss, name='objectness_loss')
loss_dict = {localization_loss.op.name: localization_loss,
objectness_loss.op.name: objectness_loss}
return loss_dict
def _loss_box_classifier(self,
......@@ -1782,15 +1783,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
) = self._unpad_proposals_and_apply_hard_mining(
proposal_boxlists, second_stage_loc_losses,
second_stage_cls_losses, num_proposals)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['second_stage_localization_loss'] = (
self._second_stage_loc_loss_weight * second_stage_loc_loss)
localization_loss = tf.multiply(self._second_stage_loc_loss_weight,
second_stage_loc_loss,
name='localization_loss')
with tf.name_scope('classification_loss'):
loss_dict['second_stage_classification_loss'] = (
self._second_stage_cls_loss_weight * second_stage_cls_loss)
classification_loss = tf.multiply(self._second_stage_cls_loss_weight,
second_stage_cls_loss,
name='classification_loss')
loss_dict = {localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss}
second_stage_mask_loss = None
if prediction_masks is not None:
if groundtruth_masks_list is None:
......@@ -1857,9 +1859,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.boolean_mask(second_stage_mask_losses, paddings_indicator))
if second_stage_mask_loss is not None:
with tf.name_scope('mask_loss'):
loss_dict['second_stage_mask_loss'] = (
self._second_stage_mask_loss_weight * second_stage_mask_loss)
mask_loss = tf.multiply(self._second_stage_mask_loss_weight,
second_stage_mask_loss, name='mask_loss')
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _padded_batched_proposals_indicator(self,
......@@ -1927,26 +1929,32 @@ class FasterRCNNMetaArch(model.DetectionModel):
decoded_boxlist_list=[proposal_boxlist])
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training. Default
True.
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the feature extractor scopes are included. Default False.
`fine_tune_checkpoint_type` is `detection`). If False, only variables
within the feature extractor scopes are included. Default False.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if not from_detection_checkpoint:
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self.first_stage_feature_extractor_scope,
self.second_stage_feature_extractor_scope)
......
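The loss naming change above replaces hand-written dictionary keys: each weighted loss tensor is created with tf.multiply inside the surrounding name scopes, and its op.name becomes the key. A minimal sketch of the resulting names (the 'Loss'/'RPNLoss' scopes mirror the keys the updated tests below assert on):

import tensorflow as tf

with tf.Graph().as_default():
  raw_localization_loss = tf.constant(0.5)
  with tf.name_scope('Loss'):
    with tf.name_scope('RPNLoss'):
      localization_loss = tf.multiply(
          2.0, raw_localization_loss, name='localization_loss')
  # Keying the dict by op.name keeps the loss key and the scoped tensor name
  # in agreement.
  loss_dict = {localization_loss.op.name: localization_loss}
  print(list(loss_dict.keys()))  # ['Loss/RPNLoss/localization_loss']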
......@@ -47,8 +47,9 @@ class FakeFasterRCNNFeatureExtractor(
def _extract_proposal_features(self, preprocessed_inputs, scope):
with tf.variable_scope('mock_model'):
return 0 * slim.conv2d(preprocessed_inputs,
num_outputs=3, kernel_size=1, scope='layer1')
proposal_features = 0 * slim.conv2d(
preprocessed_inputs, num_outputs=3, kernel_size=1, scope='layer1')
return proposal_features, {}
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
with tf.variable_scope('mock_model'):
......@@ -792,10 +793,12 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loss_dict = model.loss(prediction_dict, true_image_shapes)
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertTrue('second_stage_localization_loss' not in loss_dict_out)
self.assertTrue('second_stage_classification_loss' not in loss_dict_out)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertTrue('Loss/BoxClassifierLoss/localization_loss'
not in loss_dict_out)
self.assertTrue('Loss/BoxClassifierLoss/classification_loss'
not in loss_dict_out)
# TODO(rathodv): Split test into two - with and without masks.
def test_loss_full(self):
......@@ -890,11 +893,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals(self):
model = self._build_model(
......@@ -978,11 +983,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_multiple_label_groundtruth(self):
model = self._build_model(
......@@ -1074,11 +1081,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self):
model = self._build_model(
......@@ -1173,12 +1182,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'],
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_loss_with_hard_mining(self):
model = self._build_model(is_training=True,
......@@ -1263,9 +1273,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_restore_map_for_classification_ckpt(self):
# Define mock tensorflow classification graph and save variables.
......@@ -1296,7 +1307,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_classification) as sess:
......@@ -1338,7 +1349,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
prediction_dict2 = model2.predict(preprocessed_inputs2, true_image_shapes)
model2.postprocess(prediction_dict2, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model2.restore_map(from_detection_checkpoint=True)
var_map = model2.restore_map(fine_tune_checkpoint_type='detection')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_detection2) as sess:
......@@ -1366,7 +1377,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
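For callers, the migration from the deprecated boolean to the string field is mechanical. A usage sketch (model, sess and checkpoint_path stand in for objects created elsewhere):

# Previously restore_map(from_detection_checkpoint=False):
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
# Previously restore_map(from_detection_checkpoint=True):
var_map = model.restore_map(fine_tune_checkpoint_type='detection')
# load_all_detection_checkpoint_vars keeps its old meaning and only matters in
# the 'detection' case; any other string raises
# ValueError('Not supported fine_tune_checkpoint_type: ...').
saver = tf.train.Saver(var_map)
saver.restore(sess, checkpoint_path)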
......@@ -503,7 +503,7 @@ class SSDMetaArch(model.DetectionModel):
self.groundtruth_lists(fields.BoxListFields.classes),
keypoints, weights)
if self._add_summaries:
self._summarize_input(
self._summarize_target_assignment(
self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
......@@ -538,19 +538,20 @@ class SSDMetaArch(model.DetectionModel):
normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
1.0)
with tf.name_scope('localization_loss'):
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = ((self._localization_loss_weight / (
localization_loss_normalizer)) * localization_loss)
with tf.name_scope('classification_loss'):
classification_loss = ((self._classification_loss_weight / normalizer) *
classification_loss)
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = tf.multiply((self._localization_loss_weight /
localization_loss_normalizer),
localization_loss,
name='localization_loss')
classification_loss = tf.multiply((self._classification_loss_weight /
normalizer), classification_loss,
name='classification_loss')
loss_dict = {
'localization_loss': localization_loss,
'classification_loss': classification_loss
localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss
}
return loss_dict
......@@ -615,7 +616,7 @@ class SSDMetaArch(model.DetectionModel):
self._target_assigner, self.anchors, groundtruth_boxlists,
groundtruth_classes_with_background_list, groundtruth_weights_list)
def _summarize_input(self, groundtruth_boxes_list, match_list):
def _summarize_target_assignment(self, groundtruth_boxes_list, match_list):
"""Creates tensorflow summaries for the input boxes and anchors.
This function creates four summaries corresponding to the average
......@@ -639,14 +640,18 @@ class SSDMetaArch(model.DetectionModel):
[match.num_unmatched_columns() for match in match_list])
ignored_anchors_per_image = tf.stack(
[match.num_ignored_columns() for match in match_list])
tf.summary.scalar('Input/AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)))
tf.summary.scalar('Input/AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)))
tf.summary.scalar('Input/AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)))
tf.summary.scalar('Input/AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)))
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)),
family='TargetAssignment')
def _apply_hard_mining(self, location_losses, cls_losses, prediction_dict,
match_list):
......@@ -731,16 +736,17 @@ class SSDMetaArch(model.DetectionModel):
return decoded_boxes, decoded_keypoints
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the appropriate scopes are included. Default False.
......@@ -748,15 +754,22 @@ class SSDMetaArch(model.DetectionModel):
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if from_detection_checkpoint and load_all_detection_checkpoint_vars:
if (fine_tune_checkpoint_type == 'detection' and
load_all_detection_checkpoint_vars):
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
if not from_detection_checkpoint:
if fine_tune_checkpoint_type == 'classification':
var_name = (
re.split('^' + self._extract_features_scope + '/',
var_name)[-1])
......
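The target assignment summaries above now pass family='TargetAssignment' to tf.summary.scalar instead of baking an 'Input/' prefix into the tag, so the four scalars are grouped under one family on tensorboard. A minimal sketch of the call:

import tensorflow as tf

num_boxes_per_image = tf.constant([2.0, 5.0, 3.0])
# The scalar is grouped under the 'TargetAssignment' family on tensorboard.
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
                  tf.reduce_mean(num_boxes_per_image),
                  family='TargetAssignment')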
......@@ -72,6 +72,13 @@ class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
return 4
def _get_value_for_matching_key(dictionary, suffix):
for key in dictionary.keys():
if key.endswith(suffix):
return dictionary[key]
raise ValueError('key not found {}'.format(suffix))
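# Illustrative use of the helper above (keys hypothetical): loss dictionary
# keys are now full op names whose scope prefix depends on the graph, so the
# tests below match them by suffix instead of hard-coding the exact key, e.g.
#   loss_dict = {'SomeScope/Loss/localization_loss': tf.constant(0.0)}
#   loc_loss = _get_value_for_matching_key(loss_dict, 'Loss/localization_loss')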
class SsdMetaArchTest(test_case.TestCase):
def _create_model(self, apply_hard_mining=True,
......@@ -270,7 +277,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -305,7 +314,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'],)
return (_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),)
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -335,7 +344,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -366,7 +377,7 @@ class SsdMetaArchTest(test_case.TestCase):
sess.run(init_op)
saved_model_path = saver.save(sess, save_path)
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -402,7 +413,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertNotIn('another_variable', var_map)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -423,7 +434,7 @@ class SsdMetaArchTest(test_case.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
......@@ -100,6 +100,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_is_crowd=groundtruth_is_crowd))
self._annotation_id += groundtruth_dict[standard_fields.InputDataFields.
groundtruth_boxes].shape[0]
# Boolean to indicate whether a detection has been added for this image.
self._image_ids[image_id] = False
def add_single_detected_image_info(self,
......@@ -120,9 +121,6 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
[num_boxes] containing detection scores for the boxes.
DetectionResultFields.detection_classes: integer numpy array of shape
[num_boxes] containing 1-indexed detection classes for the boxes.
DetectionResultFields.detection_masks: optional uint8 numpy array of
shape [num_boxes, image_height, image_width] containing instance
masks for the boxes.
Raises:
ValueError: If groundtruth for the image_id is not available.
......@@ -200,7 +198,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
all_metrics_per_category=self._all_metrics_per_category)
box_metrics.update(box_per_category_ap)
box_metrics = {'DetectionBoxes_'+ key: value
for key, value in box_metrics.iteritems()}
for key, value in iter(box_metrics.items())}
return box_metrics
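# Note on the iteration change above: dict.iteritems() does not exist in
# Python 3, while dict.items() exists in both versions (a list in Python 2, a
# view in Python 3), so building the prefixed dictionary from items() keeps
# this comprehension working under either interpreter.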
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
......@@ -282,6 +280,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
......@@ -292,7 +291,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
def _check_mask_type_and_value(array_name, masks):
"""Checks whether mask dtype is uint8 anf the values are either 0 or 1."""
"""Checks whether mask dtype is uint8 and the values are either 0 or 1."""
if masks.dtype != np.uint8:
raise ValueError('{} must be of type np.uint8. Found {}.'.format(
array_name, masks.dtype))
......@@ -334,6 +333,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
If the image has already been added, a warning is logged, and groundtruth is
ignored.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary containing -
......@@ -379,6 +381,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
detections_dict):
"""Adds detections for a single image to be used for evaluation.
If a detection has already been added for this image id, a warning is
logged, and the detection is skipped.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary containing -
......@@ -435,25 +440,25 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
A dictionary holding -
1. summary_metrics:
'Precision/mAP': mean average precision over classes averaged over IOU
thresholds ranging from .5 to .95 with .05 increments
'Precision/mAP@.50IOU': mean average precision at 50% IOU
'Precision/mAP@.75IOU': mean average precision at 75% IOU
'Precision/mAP (small)': mean average precision for small objects
(area < 32^2 pixels)
'Precision/mAP (medium)': mean average precision for medium sized
objects (32^2 pixels < area < 96^2 pixels)
'Precision/mAP (large)': mean average precision for large objects
(96^2 pixels < area < 10000^2 pixels)
'Recall/AR@1': average recall with 1 detection
'Recall/AR@10': average recall with 10 detections
'Recall/AR@100': average recall with 100 detections
'Recall/AR@100 (small)': average recall for small objects with 100
detections
'Recall/AR@100 (medium)': average recall for medium objects with 100
detections
'Recall/AR@100 (large)': average recall for large objects with 100
detections
'DetectionMasks_Precision/mAP': mean average precision over classes
averaged over IOU thresholds ranging from .5 to .95 with .05 increments.
'DetectionMasks_Precision/mAP@.50IOU': mean average precision at 50% IOU.
'DetectionMasks_Precision/mAP@.75IOU': mean average precision at 75% IOU.
'DetectionMasks_Precision/mAP (small)': mean average precision for small
objects (area < 32^2 pixels).
'DetectionMasks_Precision/mAP (medium)': mean average precision for medium
sized objects (32^2 pixels < area < 96^2 pixels).
'DetectionMasks_Precision/mAP (large)': mean average precision for large
objects (96^2 pixels < area < 10000^2 pixels).
'DetectionMasks_Recall/AR@1': average recall with 1 detection.
'DetectionMasks_Recall/AR@10': average recall with 10 detections.
'DetectionMasks_Recall/AR@100': average recall with 100 detections.
'DetectionMasks_Recall/AR@100 (small)': average recall for small objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (medium)': average recall for medium objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (large)': average recall for large objects
with 100 detections.
2. per_category_ap: if include_metrics_per_category is True, category
specific results with keys of the form:
......@@ -482,3 +487,101 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
mask_metrics = {'DetectionMasks_'+ key: value
for key, value in mask_metrics.iteritems()}
return mask_metrics
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores, detection_classes,
detection_masks):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
Args:
image_id: Unique string/integer identifier for the image.
groundtruth_boxes: float32 tensor of shape [num_boxes, 4] containing
`num_boxes` groundtruth boxes of the format
[ymin, xmin, ymax, xmax] in absolute image coordinates.
groundtruth_classes: int32 tensor of shape [num_boxes] containing
1-indexed groundtruth classes for the boxes.
groundtruth_instance_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing groundtruth masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
detection_scores: float32 tensor of shape [num_boxes] containing
detection scores for the boxes.
detection_classes: int32 tensor of shape [num_boxes] containing
1-indexed detection classes for the boxes.
detection_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing instance masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.EstimatorSpec. Note that all update ops
must be run together and similarly all value ops must be run together to
guarantee correct behaviour.
"""
def update_op(
image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks):
self.add_single_ground_truth_image_info(
image_id,
{'groundtruth_boxes': groundtruth_boxes,
'groundtruth_classes': groundtruth_classes,
'groundtruth_instance_masks': groundtruth_instance_masks})
self.add_single_detected_image_info(
image_id,
{'detection_scores': detection_scores,
'detection_classes': detection_classes,
'detection_masks': detection_masks})
update_op = tf.py_func(update_op, [image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks], [])
metric_names = ['DetectionMasks_Precision/mAP',
'DetectionMasks_Precision/mAP@.50IOU',
'DetectionMasks_Precision/mAP@.75IOU',
'DetectionMasks_Precision/mAP (large)',
'DetectionMasks_Precision/mAP (medium)',
'DetectionMasks_Precision/mAP (small)',
'DetectionMasks_Recall/AR@1',
'DetectionMasks_Recall/AR@10',
'DetectionMasks_Recall/AR@100',
'DetectionMasks_Recall/AR@100 (large)',
'DetectionMasks_Recall/AR@100 (medium)',
'DetectionMasks_Recall/AR@100 (small)']
if self._include_metrics_per_category:
for category_dict in self._categories:
metric_names.append('DetectionMasks_PerformanceByCategory/mAP/' +
category_dict['name'])
def first_value_func():
self._metrics = self.evaluate()
self.clear()
return np.float32(self._metrics[metric_names[0]])
def value_func_factory(metric_name):
def value_func():
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
for metric_name in metric_names[1:]:
eval_metric_ops[metric_name] = (tf.py_func(
value_func_factory(metric_name), [], np.float32), update_op)
return eval_metric_ops
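A wiring sketch for the new mask metric ops (coco_mask_evaluator, total_loss and the input tensors are placeholders for objects built elsewhere in an Estimator model_fn): every update op runs once per eval batch, and every value op runs once at the end, after which the accumulated groundtruth and detections are cleared.

eval_metric_ops = coco_mask_evaluator.get_estimator_eval_metric_ops(
    image_id, groundtruth_boxes, groundtruth_classes,
    groundtruth_instance_masks, detection_scores, detection_classes,
    detection_masks)
return tf.estimator.EstimatorSpec(
    mode=tf.estimator.ModeKeys.EVAL,
    loss=total_loss,
    eval_metric_ops=eval_metric_ops)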
......@@ -403,5 +403,101 @@ class CocoMaskEvaluationTest(tf.test.TestCase):
self.assertFalse(coco_evaluator._detection_masks_list)
class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
category_list = [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'cat'},
{'id': 2, 'name': 'dog'}]
coco_evaluator = coco_evaluation.CocoMaskEvaluator(category_list)
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
groundtruth_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
detection_scores = tf.placeholder(tf.float32, shape=(None))
detection_classes = tf.placeholder(tf.float32, shape=(None))
detection_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_masks,
detection_scores,
detection_classes,
detection_masks)
_, update_op = eval_metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
sess.run(update_op,
feed_dict={
image_id: 'image1',
groundtruth_boxes: np.array([[100., 100., 200., 200.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image2',
groundtruth_boxes: np.array([[50., 50., 100., 100.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 50, 50],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 50, 50], dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image3',
groundtruth_boxes: np.array([[25., 25., 50., 50.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
metrics = {}
for key, (value_op, _) in eval_metric_ops.iteritems():
metrics[key] = value_op
metrics = sess.run(metrics)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.50IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._image_ids_with_detections)
self.assertFalse(coco_evaluator._image_id_to_mask_shape_map)
self.assertFalse(coco_evaluator._detection_masks_list)
if __name__ == '__main__':
tf.test.main()
......@@ -39,7 +39,6 @@ from object_detection import model_hparams
from object_detection.builders import model_builder
from object_detection.builders import optimizer_builder
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import shape_utils
......@@ -121,8 +120,8 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
2. [batch_size, height, width, channels]
3. [batch_size, num_boxes, d1, d2, ... dn]
When unpad_tensors is set to true, unstacked tensors of form 3 above are
sliced along the `num_boxes` dimension using the value in tensor
When unpad_groundtruth_tensors is set to true, unstacked tensors of form 3
above are sliced along the `num_boxes` dimension using the value in tensor
field.InputDataFields.num_groundtruth_boxes.
Note that this function has a static list of input data fields and has to be
......@@ -198,6 +197,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
"""
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
eval_config = configs['eval_config']
def model_fn(features, labels, mode, params=None):
"""Constructs the object detection model.
......@@ -250,9 +250,25 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
prediction_dict, features[fields.InputDataFields.true_image_shape])
if mode == tf.estimator.ModeKeys.TRAIN:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, sets finetune_checkpoint_type based on
# from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
# based on train_config.from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
asg_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
......@@ -273,6 +289,15 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
losses_dict = detection_model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
if train_config.add_regularization_loss:
regularization_losses = tf.get_collection(
tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses,
name='regularization_loss')
losses.append(regularization_loss)
if not use_tpu:
tf.summary.scalar('regularization_loss', regularization_loss)
total_loss = tf.add_n(losses, name='total_loss')
if mode == tf.estimator.ModeKeys.TRAIN:
......@@ -321,8 +346,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
tf.expand_dims(features[fields.InputDataFields.original_image][0], 0),
eval_images[0:1],
features[inputs.HASH_KEY][0],
detections,
groundtruth,
......@@ -334,7 +363,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
if not use_tpu:
if not use_tpu and use_original_images:
detection_and_groundtruth = (
vis_utils.draw_side_by_side_evaluation_image(
eval_dict, category_index, max_boxes_to_draw=20,
......@@ -343,17 +372,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
detection_and_groundtruth)
# Eval metrics on a single image.
detection_fields = fields.DetectionResultFields()
input_data_fields = fields.InputDataFields()
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
category_index.values())
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes])
eval_metrics = eval_config.metrics_set
if not eval_metrics:
eval_metrics = ['coco_detection_metrics']
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_metrics, category_index.values(), eval_dict,
include_metrics_per_category=False)
if use_tpu:
return tf.contrib.tpu.TPUEstimatorSpec(
......@@ -376,7 +400,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
return model_fn
def _build_experiment_fn(train_steps, eval_steps):
def build_experiment_fn(train_steps, eval_steps):
"""Returns a function that creates an `Experiment`."""
def build_experiment(run_config, hparams):
......@@ -509,8 +533,8 @@ def main(unused_argv):
tf.flags.mark_flag_as_required('pipeline_config_path')
config = tf.contrib.learn.RunConfig(model_dir=FLAGS.model_dir)
learn_runner.run(
experiment_fn=_build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
experiment_fn=build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
run_config=config,
hparams=model_hparams.create_hparams())
......
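The add_regularization_loss branch above sums whatever is in the graph's REGULARIZATION_LOSSES collection. A minimal sketch of where those tensors come from (layer sizes and the weight decay value are illustrative):

import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.zeros([1, 8, 8, 3])
# Layers built with a weights_regularizer add their penalty tensors to
# tf.GraphKeys.REGULARIZATION_LOSSES automatically.
net = slim.conv2d(inputs, 4, [3, 3],
                  weights_regularizer=slim.l2_regularizer(1e-4))
regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
regularization_loss = tf.add_n(regularization_losses,
                               name='regularization_loss')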
......@@ -49,6 +49,6 @@ def BuildExperiment():
hparams_overrides='load_pretrained=false')
# pylint: disable=protected-access
experiment_fn = model._build_experiment_fn(10, 10)
experiment_fn = model.build_experiment_fn(10, 10)
# pylint: enable=protected-access
return experiment_fn(run_config, hparams)
......@@ -105,12 +105,10 @@ class FasterRCNNInceptionResnetV2FeatureExtractor(
is_training=self._train_batch_norm):
with tf.variable_scope('InceptionResnetV2',
reuse=self._reuse_weights) as scope:
rpn_feature_map, _ = (
inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True))
return rpn_feature_map
return inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True)
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
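Because extract_proposal_features now returns a (feature map, endpoints) tuple for every Faster R-CNN feature extractor, callers unpack two values and, when only the feature map is needed, discard the second, as the updated tests below do. A usage sketch (feature_extractor and preprocessed_inputs are built elsewhere; the scope name is illustrative):

rpn_feature_map, end_points = feature_extractor.extract_proposal_features(
    preprocessed_inputs, scope='TestScope')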
......@@ -35,7 +35,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -50,7 +50,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -65,7 +65,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -109,6 +109,9 @@ class FasterRCNNInceptionV2FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
......@@ -134,7 +137,7 @@ class FasterRCNNInceptionV2FeatureExtractor(
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Mixed_4e']
return activations['Mixed_4e'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -36,7 +36,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -51,7 +51,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -66,7 +66,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -84,7 +84,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mobilenet v1 Faster R-CNN implementation."""
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from nets import mobilenet_v1
slim = tf.contrib.slim
def _batch_norm_arg_scope(list_ops,
use_batch_norm=True,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001,
batch_norm_scale=False,
train_batch_norm=False):
"""Slim arg scope for Mobilenet V1 batch norm."""
if use_batch_norm:
batch_norm_params = {
'is_training': train_batch_norm,
'scale': batch_norm_scale,
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon
}
normalizer_fn = slim.batch_norm
else:
normalizer_fn = None
batch_norm_params = None
return slim.arg_scope(list_ops,
normalizer_fn=normalizer_fn,
normalizer_params=batch_norm_params)
class FasterRCNNMobilenetV1FeatureExtractor(
faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):
"""Faster R-CNN Mobilenet V1 feature extractor implementation."""
def __init__(self,
is_training,
first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0,
depth_multiplier=1.0,
min_depth=16):
"""Constructor.
Args:
is_training: See base class.
first_stage_features_stride: See base class.
batch_norm_trainable: See base class.
reuse_weights: See base class.
weight_decay: See base class.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
Raises:
ValueError: If `first_stage_features_stride` is not 8 or 16.
"""
if first_stage_features_stride != 8 and first_stage_features_stride != 16:
raise ValueError('`first_stage_features_stride` must be 8 or 16.')
self._depth_multiplier = depth_multiplier
self._min_depth = min_depth
super(FasterRCNNMobilenetV1FeatureExtractor, self).__init__(
is_training, first_stage_features_stride, batch_norm_trainable,
reuse_weights, weight_decay)
def preprocess(self, resized_inputs):
"""Faster R-CNN Mobilenet V1 preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def _extract_proposal_features(self, preprocessed_inputs, scope):
"""Extracts first stage RPN features.
Args:
preprocessed_inputs: A [batch, height, width, channels] float32 tensor
representing a batch of images.
scope: A scope name.
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
ValueError: If the created network is missing the required activation.
"""
preprocessed_inputs.get_shape().assert_has_rank(4)
shape_assert = tf.Assert(
tf.logical_and(tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33),
tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)),
['image size must at least be 33 in both height and width.'])
with tf.control_dependencies([shape_assert]):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
_, activations = mobilenet_v1.mobilenet_v1_base(
preprocessed_inputs,
final_endpoint='Conv2d_13_pointwise',
min_depth=self._min_depth,
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Conv2d_13_pointwise'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
Args:
proposal_feature_maps: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, crop_height, crop_width, depth]
representing the feature map cropped to each proposal.
scope: A scope name (unused).
Returns:
proposal_classifier_features: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, height, width, depth]
representing box classifier features for each proposal.
"""
net = proposal_feature_maps
depth = lambda d: max(int(d * 1.0), 16)
with tf.variable_scope('MobilenetV1', reuse=self._reuse_weights):
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
with slim.arg_scope(
[slim.conv2d, slim.separable_conv2d], padding='SAME'):
net = slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=2,
scope='Conv2d_12_pointwise')
return slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=1,
scope='Conv2d_13_pointwise')
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for faster_rcnn_mobilenet_v1_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import faster_rcnn_mobilenet_v1_feature_extractor as faster_rcnn_mobilenet_v1
class FasterRcnnMobilenetV1FeatureExtractorTest(tf.test.TestCase):
def _build_feature_extractor(self, first_stage_features_stride):
return faster_rcnn_mobilenet_v1.FasterRCNNMobilenetV1FeatureExtractor(
is_training=False,
first_stage_features_stride=first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0)
def test_extract_proposal_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_stride_eight(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_half_size_input(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [1, 4, 4, 1024])
def test_extract_proposal_features_dies_on_invalid_stride(self):
with self.assertRaises(ValueError):
self._build_feature_extractor(first_stage_features_stride=99)
def test_extract_proposal_features_dies_on_very_small_images(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
with self.assertRaises(tf.errors.InvalidArgumentError):
sess.run(
features_shape,
feed_dict={preprocessed_inputs: np.random.rand(4, 32, 32, 3)})
def test_extract_proposal_features_dies_with_incorrect_rank_inputs(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[224, 224, 3], maxval=255, dtype=tf.float32)
with self.assertRaises(ValueError):
feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
def test_extract_box_classifier_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
proposal_feature_maps = tf.random_uniform(
[3, 14, 14, 576], maxval=255, dtype=tf.float32)
proposal_classifier_features = (
feature_extractor.extract_box_classifier_features(
proposal_feature_maps, scope='TestScope'))
features_shape = tf.shape(proposal_classifier_features)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [3, 7, 7, 1024])
if __name__ == '__main__':
tf.test.main()
......@@ -171,6 +171,8 @@ class FasterRCNNNASFeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
end_points: A dictionary mapping feature extractor tensor names to tensors
Raises:
ValueError: If the created network is missing the required activation.
"""
......@@ -202,7 +204,7 @@ class FasterRCNNNASFeatureExtractor(
rpn_feature_map_shape = [batch] + shape_without_batch
rpn_feature_map.set_shape(rpn_feature_map_shape)
return rpn_feature_map
return rpn_feature_map, end_points
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......@@ -231,9 +233,11 @@ class FasterRCNNNASFeatureExtractor(
# Note that what follows is largely a copy of build_nasnet_large() within
# nasnet.py. We are copying to minimize code pollution in slim.
# pylint: disable=protected-access
hparams = nasnet._large_imagenet_config(is_training=self._is_training)
# pylint: enable=protected-access
# TODO(shlens,skornblith): Determine the appropriate drop path schedule.
# For now the schedule is the default (1.0->0.7 over 250,000 train steps).
hparams = nasnet.large_imagenet_config()
if not self._is_training:
hparams.set_hparam('drop_path_keep_prob', 1.0)
# Calculate the total number of cells in the network
# -- Add 2 for the reduction cells.
......
......@@ -35,7 +35,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -50,7 +50,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -65,7 +65,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -95,6 +95,9 @@ class FasterRCNNResnetV1FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
......@@ -130,7 +133,7 @@ class FasterRCNNResnetV1FeatureExtractor(
scope=var_scope)
handle = scope + '/%s/block3' % self._architecture
return activations[handle]
return activations[handle], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -47,7 +47,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16, architecture=architecture)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -62,7 +62,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -77,7 +77,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -95,7 +95,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -100,11 +100,13 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
'use_depthwise': self._use_depthwise,
}
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v1.mobilenet_v1_arg_scope(
is_training=(self._batch_norm_trainable and self._is_training))):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
_, image_features = mobilenet_v1.mobilenet_v1_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
final_endpoint='Conv2d_13_pointwise',
......@@ -112,6 +114,9 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSDFeatureExtractor for MobilenetV2 features."""
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.utils import ops
from object_detection.utils import shape_utils
from nets.mobilenet import mobilenet
from nets.mobilenet import mobilenet_v2
slim = tf.contrib.slim
class SSDMobileNetV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
"""SSD Feature Extractor using MobilenetV2 features."""
def __init__(self,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams,
batch_norm_trainable=True,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False):
"""MobileNetV2 Feature Extractor for SSD Models.
Mobilenet v2 (experimental), designed by sandler@. More details can be found
in //knowledge/cerebra/brain/compression/mobilenet/mobilenet_experimental.py
Args:
is_training: whether the network is in training mode.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops.
batch_norm_trainable: Whether to update batch norm parameters during
training or not. When training with a small batch size
(e.g. 1), it is desirable to disable batch norm update and use
pretrained batch norm params.
reuse_weights: Whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
"""
super(SSDMobileNetV2FeatureExtractor, self).__init__(
is_training, depth_multiplier, min_depth, pad_to_multiple,
conv_hyperparams, batch_norm_trainable, reuse_weights,
use_explicit_padding, use_depthwise)
def preprocess(self, resized_inputs):
"""SSD preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def extract_features(self, preprocessed_inputs):
"""Extract features from preprocessed inputs.
Args:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
feature_maps: a list of tensors where the ith tensor has shape
[batch, height_i, width_i, depth_i]
"""
preprocessed_inputs = shape_utils.check_min_image_dim(
33, preprocessed_inputs)
feature_map_layout = {
'from_layer': ['layer_15/expansion_output', 'layer_19', '', '', '', ''],
'layer_depth': [-1, -1, 512, 256, 256, 128],
'use_depthwise': self._use_depthwise,
'use_explicit_padding': self._use_explicit_padding,
}
with tf.variable_scope('MobilenetV2', reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v2.training_scope(
is_training=(self._is_training and self._batch_norm_trainable),
bn_decay=0.9997)), \
slim.arg_scope(
[mobilenet.depth_multiplier], min_depth=self._min_depth):
# TODO(b/68150321): Enable fused batch norm once quantization
# supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
_, image_features = mobilenet_v2.mobilenet_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
final_endpoint='layer_19',
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams):
# TODO(b/68150321): Enable fused batch norm once quantization
# supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
min_depth=self._min_depth,
insert_1x1_conv=True,
image_features=image_features)
return feature_maps.values()
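The `preprocess` method above is a plain affine rescale to [-1, 1]; a two-line check with assumed pixel values (not taken from the test file below):

```python
import numpy as np

pixels = np.array([0.0, 127.5, 255.0])
print((2.0 / 255.0) * pixels - 1.0)  # [-1.  0.  1.]
```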
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v2_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v2_feature_extractor
slim = tf.contrib.slim
class SsdMobilenetV2FeatureExtractorTest(
ssd_feature_extractor_test.SsdFeatureExtractorTestBase):
def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
use_explicit_padding=False):
"""Constructs a new feature extractor.
Args:
depth_multiplier: float depth multiplier for feature extractor
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
use_explicit_padding: use 'VALID' padding for convolutions, but prepad
inputs so that the output dimensions are the same as if 'SAME' padding
were used.
Returns:
an ssd_meta_arch.SSDFeatureExtractor object.
"""
min_depth = 32
with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm) as sc:
conv_hyperparams = sc
return ssd_mobilenet_v2_feature_extractor.SSDMobileNetV2FeatureExtractor(
False,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams,
use_explicit_padding=use_explicit_padding)
def test_extract_features_returns_correct_shapes_128(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
pad_to_multiple = 1
expected_feature_map_shape = [(2, 8, 8, 576), (2, 4, 4, 1280),
(2, 2, 2, 512), (2, 1, 1, 256),
(2, 1, 1, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_with_dynamic_inputs(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
pad_to_multiple = 1
expected_feature_map_shape = [(2, 8, 8, 576), (2, 4, 4, 1280),
(2, 2, 2, 512), (2, 1, 1, 256),
(2, 1, 1, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_299(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
pad_to_multiple = 1
expected_feature_map_shape = [(2, 19, 19, 576), (2, 10, 10, 1280),
(2, 5, 5, 512), (2, 3, 3, 256),
(2, 2, 2, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_enforcing_min_depth(self):
image_height = 299
image_width = 299
depth_multiplier = 0.5**12
pad_to_multiple = 1
expected_feature_map_shape = [(2, 19, 19, 192), (2, 10, 10, 32),
(2, 5, 5, 32), (2, 3, 3, 32),
(2, 2, 2, 32), (2, 1, 1, 32)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_returns_correct_shapes_with_pad_to_multiple(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
pad_to_multiple = 32
expected_feature_map_shape = [(2, 20, 20, 576), (2, 10, 10, 1280),
(2, 5, 5, 512), (2, 3, 3, 256),
(2, 2, 2, 256), (2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_extract_features_raises_error_with_invalid_image_size(self):
image_height = 32
image_width = 32
depth_multiplier = 1.0
pad_to_multiple = 1
self.check_extract_features_raises_error_with_invalid_image_size(
image_height, image_width, depth_multiplier, pad_to_multiple)
def test_preprocess_returns_correct_value_range(self):
image_height = 128
image_width = 128
depth_multiplier = 1
pad_to_multiple = 1
test_image = np.random.rand(4, image_height, image_width, 3)
feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple)
preprocessed_image = feature_extractor.preprocess(test_image)
self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
def test_variables_only_created_in_scope(self):
depth_multiplier = 1
pad_to_multiple = 1
scope_name = 'MobilenetV2'
self.check_feature_extractor_variables_under_scope(
depth_multiplier, pad_to_multiple, scope_name)
def test_nofused_batchnorm(self):
image_height = 40
image_width = 40
depth_multiplier = 1
pad_to_multiple = 1
image_placeholder = tf.placeholder(tf.float32,
[1, image_height, image_width, 3])
feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple)
preprocessed_image = feature_extractor.preprocess(image_placeholder)
_ = feature_extractor.extract_features(preprocessed_image)
self.assertFalse(any(op.type == 'FusedBatchNorm'
for op in tf.get_default_graph().get_operations()))
if __name__ == '__main__':
tf.test.main()
......@@ -71,6 +71,10 @@ message ManualStepLearningRate {
optional float learning_rate = 2 [default = 0.002];
}
repeated LearningRateSchedule schedule = 2;
// Whether to linearly interpolate learning rates for steps in
// [0, schedule[0].step].
optional bool warmup = 3 [default = false];
}
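A minimal sketch, with assumed values (initial rate 0.001, first boundary at step 100 with rate 0.01), of what the new `warmup` flag means numerically; the actual interpolation is implemented by `learning_schedules.manual_stepping` later in this commit.

```python
# Hypothetical numbers only: with warmup enabled, rates for steps in
# [0, schedule[0].step] are linearly interpolated between the initial
# learning rate and the first scheduled learning rate.
initial_rate = 0.001   # assumed initial_learning_rate
first_step = 100       # assumed schedule[0].step
first_rate = 0.01      # assumed schedule[0].learning_rate

def warmup_rate(step):
    if step < first_step:
        return initial_rate + (first_rate - initial_rate) * float(step) / first_step
    return first_rate

print([round(warmup_rate(s), 4) for s in (0, 50, 100)])  # [0.001, 0.0055, 0.01]
```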
// Configuration message for a cosine decaying learning rate as defined in
......@@ -80,4 +84,5 @@ message CosineDecayLearningRate {
optional uint32 total_steps = 2 [default = 4000000];
optional float warmup_learning_rate = 3 [default = 0.0002];
optional uint32 warmup_steps = 4 [default = 10000];
optional uint32 hold_base_rate_steps = 5 [default = 0];
}
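The new `hold_base_rate_steps` field keeps the rate at `learning_rate_base` for a fixed number of steps after warmup before the cosine decay begins. A rough NumPy sketch of the resulting shape, using assumed values (base 1.0, warmup from 0.1 over 10 steps, 20 hold steps, 100 total steps); the TensorFlow implementation is in `learning_schedules.cosine_decay_with_warmup` further down in this commit.

```python
import numpy as np

base, warmup_lr, warmup_steps, hold_steps, total_steps = 1.0, 0.1, 10, 20, 100

def lr(step):
    if step < warmup_steps:                      # linear warmup
        return warmup_lr + (base - warmup_lr) * step / float(warmup_steps)
    if step < warmup_steps + hold_steps:         # hold the base rate
        return base
    if step > total_steps:                       # schedule exhausted
        return 0.0
    progress = (step - warmup_steps - hold_steps) / float(
        total_steps - warmup_steps - hold_steps)
    return 0.5 * base * (1 + np.cos(np.pi * progress))  # cosine decay

print([round(lr(s), 3) for s in (0, 5, 10, 29, 30, 65, 100)])
# -> [0.1, 0.55, 1.0, 1.0, 1.0, 0.5, 0.0]
```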
......@@ -72,7 +72,8 @@ message SsdFeatureExtractor {
// Minimum number of the channels in the feature extractor.
optional int32 min_depth = 3 [default=16];
// Hyperparameters for the feature extractor.
// Hyperparameters that affect the layers of feature extractor added on top
// of the base feature extractor.
optional Hyperparams conv_hyperparams = 4;
// The nearest multiple to zero-pad the input height and width dimensions to.
......
......@@ -29,11 +29,17 @@ message TrainConfig {
// extractor variables trained outside of object detection.
optional string fine_tune_checkpoint = 7 [default=""];
// Type of checkpoint to restore variables from, e.g. 'classification' or
// 'detection'. Provides extensibility to from_detection_checkpoint.
// Typically used to load feature extractor variables from trained models.
optional string fine_tune_checkpoint_type = 22 [default=""];
// [Deprecated]: use fine_tune_checkpoint_type instead.
// Specifies if the finetune checkpoint is from an object detection model.
// If from an object detection model, the model being trained should have
// the same parameters with the exception of the num_classes parameter.
// If false, it assumes the checkpoint was an object classification model.
optional bool from_detection_checkpoint = 8 [default=false];
optional bool from_detection_checkpoint = 8 [default=false, deprecated=true];
// Whether to load all checkpoint vars that match model variable names and
// sizes. This option is only available if `from_detection_checkpoint` is
......@@ -83,7 +89,7 @@ message TrainConfig {
// Set this to at least the maximum amount of boxes in the input data.
// Otherwise, it may cause "Data loss: Attempted to pad to a smaller size
// than the input element" errors.
optional int32 max_number_of_boxes = 20 [default=50];
optional int32 max_number_of_boxes = 20 [default=100];
// Whether to remove padding along `num_boxes` dimension of the groundtruth
// tensors.
......
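For illustration only (not part of the diff): a train_config would now set the string field instead of the deprecated boolean. The field names come from the proto change above; the checkpoint path is a placeholder.

```python
from google.protobuf import text_format
from object_detection.protos import train_pb2

config_text = """
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  fine_tune_checkpoint_type: "detection"  # replaces: from_detection_checkpoint: true
"""
train_config = text_format.Merge(config_text, train_pb2.TrainConfig())
print(train_config.fine_tune_checkpoint_type)  # detection
```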
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.00006
schedule {
step: 0
learning_rate: .00006
}
schedule {
step: 6000000
learning_rate: .000006
......
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -89,10 +89,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
......@@ -91,10 +91,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -93,10 +93,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 0
learning_rate: .0001
}
schedule {
step: 500000
learning_rate: .00001
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 0
learning_rate: .0001
}
schedule {
step: 500000
learning_rate: .00001
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......@@ -118,6 +114,7 @@ train_config: {
random_horizontal_flip {
}
}
max_number_of_boxes: 50
}
train_input_reader: {
......
......@@ -110,10 +110,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -109,10 +109,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
......@@ -110,10 +110,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -103,10 +103,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0007
schedule {
step: 0
learning_rate: 0.0007
}
schedule {
step: 15000
learning_rate: 0.00007
......
......@@ -110,10 +110,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -85,10 +85,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -85,10 +85,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
......@@ -162,6 +162,7 @@ train_config: {
ssd_random_crop {
}
}
max_number_of_boxes: 50
}
train_input_reader: {
......
# SSD with Mobilenet v2 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
ssd {
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 3
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v2'
min_depth: 16
depth_multiplier: 1.0
use_depthwise: true
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
batch_norm_trainable: true
}
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 3
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 24
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "detection"
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient to train the COCO dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
\ No newline at end of file
......@@ -254,7 +254,7 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
training_optimizer, optimizer_summary_vars = optimizer_builder.build(
train_config.optimizer)
for var in optimizer_summary_vars:
tf.summary.scalar(var.op.name, var)
tf.summary.scalar(var.op.name, var, family='LearningRate')
sync_optimizer = None
if train_config.sync_replicas:
......@@ -267,8 +267,16 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
# Create ops required to initialize the model from a given checkpoint.
init_fn = None
if train_config.fine_tune_checkpoint:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, fine_tune_checkpoint_type is set based on
# from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
var_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (variables_helper.
......@@ -320,11 +328,13 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
# Add summaries.
for model_var in slim.get_model_variables():
global_summaries.add(tf.summary.histogram(model_var.op.name, model_var))
global_summaries.add(tf.summary.histogram('ModelVars/' +
model_var.op.name, model_var))
for loss_tensor in tf.losses.get_losses():
global_summaries.add(tf.summary.scalar(loss_tensor.op.name, loss_tensor))
global_summaries.add(tf.summary.scalar('Losses/' + loss_tensor.op.name,
loss_tensor))
global_summaries.add(
tf.summary.scalar('TotalLoss', tf.losses.get_total_loss()))
tf.summary.scalar('Losses/TotalLoss', tf.losses.get_total_loss()))
# Add the summaries from the first clone. These contain the summaries
# created by model_fn and either optimize_clones() or _gather_clone_loss().
......
......@@ -157,13 +157,14 @@ class FakeDetectionModel(model.DetectionModel):
}
return loss_dict
def restore_map(self, from_detection_checkpoint=True):
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
Returns:
A dict mapping variable names to variables.
......
......@@ -285,6 +285,9 @@ def merge_external_params_with_configs(configs, hparams=None, **kwargs):
def _update_initial_learning_rate(configs, learning_rate):
"""Updates `configs` to reflect the new initial learning rate.
This function updates the initial learning rate. For learning rate schedules,
all other defined learning rates in the pipeline config are scaled to maintain
their same ratio with the initial learning rate.
The configs dictionary is updated in place, and hence not returned.
Args:
......@@ -322,6 +325,13 @@ def _update_initial_learning_rate(configs, learning_rate):
manual_lr.initial_learning_rate = learning_rate
for schedule in manual_lr.schedule:
schedule.learning_rate *= learning_rate_scaling
elif learning_rate_type == "cosine_decay_learning_rate":
cosine_lr = optimizer_config.learning_rate.cosine_decay_learning_rate
learning_rate_base = cosine_lr.learning_rate_base
warmup_learning_rate = cosine_lr.warmup_learning_rate
warmup_scale_factor = warmup_learning_rate / learning_rate_base
cosine_lr.learning_rate_base = learning_rate
cosine_lr.warmup_learning_rate = warmup_scale_factor * learning_rate
else:
raise TypeError("Learning rate %s is not supported." % learning_rate_type)
......
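Worked through with the same numbers the config_util test below uses (cosine base 0.7, warmup 0.07, new initial rate 0.15), the scaling described in the `_update_initial_learning_rate` docstring keeps the warmup rate at its original ratio to the base rate. Plain-Python arithmetic only, mirroring the `cosine_decay_learning_rate` branch above:

```python
original_base, original_warmup = 0.7, 0.07
new_base = 0.15                                  # overriding initial learning rate

warmup_scale = original_warmup / original_base   # 0.1
new_warmup = warmup_scale * new_base             # 0.015

print(new_base, round(new_warmup, 6))            # 0.15 0.015
```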
......@@ -59,6 +59,14 @@ def _update_optimizer_with_manual_step_learning_rate(
schedule.learning_rate = initial_learning_rate * learning_rate_scaling**i
def _update_optimizer_with_cosine_decay_learning_rate(
optimizer, learning_rate, warmup_learning_rate):
"""Adds a new cosine decay learning rate."""
cosine_lr = optimizer.learning_rate.cosine_decay_learning_rate
cosine_lr.learning_rate_base = learning_rate
cosine_lr.warmup_learning_rate = warmup_learning_rate
class ConfigUtilTest(tf.test.TestCase):
def test_get_configs_from_pipeline_file(self):
......@@ -154,6 +162,7 @@ class ConfigUtilTest(tf.test.TestCase):
"""Asserts successful updating of all learning rate schemes."""
original_learning_rate = 0.7
learning_rate_scaling = 0.1
warmup_learning_rate = 0.07
hparams = tf.contrib.training.HParams(learning_rate=0.15)
pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
......@@ -201,6 +210,24 @@ class ConfigUtilTest(tf.test.TestCase):
self.assertAlmostEqual(hparams.learning_rate * learning_rate_scaling**i,
schedule.learning_rate)
# Cosine decay learning rate.
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
optimizer = getattr(pipeline_config.train_config.optimizer, optimizer_name)
_update_optimizer_with_cosine_decay_learning_rate(optimizer,
original_learning_rate,
warmup_learning_rate)
_write_config(pipeline_config, pipeline_config_path)
configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
configs = config_util.merge_external_params_with_configs(configs, hparams)
optimizer = getattr(configs["train_config"].optimizer, optimizer_name)
cosine_lr = optimizer.learning_rate.cosine_decay_learning_rate
self.assertAlmostEqual(hparams.learning_rate, cosine_lr.learning_rate_base)
warmup_scale_factor = warmup_learning_rate / original_learning_rate
self.assertAlmostEqual(hparams.learning_rate * warmup_scale_factor,
cosine_lr.warmup_learning_rate)
def testRMSPropWithNewLearingRate(self):
"""Tests new learning rates for RMSProp Optimizer."""
self._assertOptimizerWithNewLearningRate("rms_prop_optimizer")
......
......@@ -32,8 +32,10 @@ def _validate_label_map(label_map):
ValueError: if label map is invalid.
"""
for item in label_map.item:
if item.id < 1:
raise ValueError('Label map ids should be >= 1.')
if item.id < 0:
raise ValueError('Label map ids should be >= 0.')
if item.id == 0 and item.name != 'background':
raise ValueError('Label map id 0 is reserved for the background label')
def create_category_index(categories):
......
......@@ -95,6 +95,30 @@ class LabelMapUtilTest(tf.test.TestCase):
with self.assertRaises(ValueError):
label_map_util.load_labelmap(label_map_path)
def test_load_label_map_with_background(self):
label_map_string = """
item {
id:0
name:'background'
}
item {
id:2
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
label_map_dict = label_map_util.get_label_map_dict(label_map_path)
self.assertEqual(label_map_dict['background'], 0)
self.assertEqual(label_map_dict['dog'], 1)
self.assertEqual(label_map_dict['cat'], 2)
def test_keep_categories_with_unique_id(self):
label_map_proto = string_int_label_map_pb2.StringIntLabelMap()
label_map_string = """
......
......@@ -56,14 +56,15 @@ def exponential_decay_with_burnin(global_step,
return tf.where(
tf.less(tf.cast(global_step, tf.int32), tf.constant(burnin_steps)),
tf.constant(burnin_learning_rate),
post_burnin_learning_rate)
post_burnin_learning_rate, name='learning_rate')
def cosine_decay_with_warmup(global_step,
learning_rate_base,
total_steps,
warmup_learning_rate=0.0,
warmup_steps=0):
warmup_steps=0,
hold_base_rate_steps=0):
"""Cosine decay schedule with warm up period.
Cosine annealing learning rate as described in:
......@@ -79,6 +80,8 @@ def cosine_decay_with_warmup(global_step,
total_steps: total number of training steps.
warmup_learning_rate: initial learning rate for warm up.
warmup_steps: number of warmup steps.
hold_base_rate_steps: Optional number of steps to hold base learning rate
before decaying.
Returns:
a (scalar) float tensor representing learning rate.
......@@ -93,21 +96,24 @@ def cosine_decay_with_warmup(global_step,
if total_steps < warmup_steps:
raise ValueError('total_steps must be larger or equal to '
'warmup_steps.')
learning_rate = 0.5 * learning_rate_base * (
1 + tf.cos(np.pi * (tf.cast(global_step, tf.float32) - warmup_steps
) / float(total_steps - warmup_steps)))
learning_rate = 0.5 * learning_rate_base * (1 + tf.cos(
np.pi *
(tf.cast(global_step, tf.float32) - warmup_steps - hold_base_rate_steps
) / float(total_steps - warmup_steps - hold_base_rate_steps)))
if hold_base_rate_steps > 0:
learning_rate = tf.where(global_step > warmup_steps + hold_base_rate_steps,
learning_rate, learning_rate_base)
if warmup_steps > 0:
slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
pre_cosine_learning_rate = slope * tf.cast(
global_step, tf.float32) + warmup_learning_rate
learning_rate = tf.where(
tf.less(tf.cast(global_step, tf.int32), warmup_steps),
pre_cosine_learning_rate,
learning_rate)
return learning_rate
warmup_rate = slope * tf.cast(global_step,
tf.float32) + warmup_learning_rate
learning_rate = tf.where(global_step < warmup_steps, warmup_rate,
learning_rate)
return tf.where(global_step > total_steps, 0.0, learning_rate,
name='learning_rate')
def manual_stepping(global_step, boundaries, rates):
def manual_stepping(global_step, boundaries, rates, warmup=False):
"""Manually stepped learning rate schedule.
This function provides fine grained control over learning rates. One must
......@@ -124,6 +130,8 @@ def manual_stepping(global_step, boundaries, rates):
rates: a list of (float) learning rates corresponding to intervals between
the boundaries. The length of this list must be exactly
len(boundaries) + 1.
warmup: Whether to linearly interpolate learning rate for steps in
[0, boundaries[0]].
Returns:
a (scalar) float tensor representing learning rate
......@@ -131,6 +139,7 @@ def manual_stepping(global_step, boundaries, rates):
ValueError: if one of the following checks fails:
1. boundaries is a strictly increasing list of positive integers
2. len(rates) == len(boundaries) + 1
3. boundaries[0] != 0
"""
if any([b < 0 for b in boundaries]) or any(
[not isinstance(b, int) for b in boundaries]):
......@@ -142,16 +151,21 @@ def manual_stepping(global_step, boundaries, rates):
if len(rates) != len(boundaries) + 1:
raise ValueError('Number of provided learning rates must exceed '
'number of boundary points by exactly 1.')
if not boundaries: return tf.constant(rates[0])
step_boundaries = tf.constant(boundaries, tf.int32)
if boundaries and boundaries[0] == 0:
raise ValueError('First step cannot be zero.')
if warmup and boundaries:
slope = (rates[1] - rates[0]) * 1.0 / boundaries[0]
warmup_steps = range(boundaries[0])
warmup_rates = [rates[0] + slope * step for step in warmup_steps]
boundaries = warmup_steps + boundaries
rates = warmup_rates + rates[1:]
else:
boundaries = [0] + boundaries
num_boundaries = len(boundaries)
learning_rates = tf.constant(rates, tf.float32)
index = tf.reduce_min(
tf.where(
# Casting global step to tf.int32 is dangerous, but necessary to be
# compatible with TPU.
tf.greater(step_boundaries, tf.cast(global_step, tf.int32)),
tf.constant(range(num_boundaries), dtype=tf.int32),
tf.constant([num_boundaries] * num_boundaries, dtype=tf.int32)))
return tf.reduce_sum(learning_rates * tf.one_hot(index, len(rates),
dtype=tf.float32))
rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
range(num_boundaries),
[0] * num_boundaries))
return tf.reduce_sum(rates * tf.one_hot(rate_index, depth=num_boundaries),
name='learning_rate')
......@@ -33,6 +33,7 @@ class LearningSchedulesTest(test_case.TestCase):
learning_rate = learning_schedules.exponential_decay_with_burnin(
global_step, learning_rate_base, learning_rate_decay_steps,
learning_rate_decay_factor, burnin_learning_rate, burnin_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
output_rates = [
......@@ -51,6 +52,7 @@ class LearningSchedulesTest(test_case.TestCase):
learning_rate = learning_schedules.cosine_decay_with_warmup(
global_step, learning_rate_base, total_steps,
warmup_learning_rate, warmup_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
exp_rates = [0.1, 0.5, 0.9, 1.0, 0]
input_global_steps = [0, 4, 8, 9, 100]
......@@ -60,12 +62,53 @@ class LearningSchedulesTest(test_case.TestCase):
]
self.assertAllClose(output_rates, exp_rates)
def testCosineDecayAfterTotalSteps(self):
def graph_fn(global_step):
learning_rate_base = 1.0
total_steps = 100
warmup_learning_rate = 0.1
warmup_steps = 9
learning_rate = learning_schedules.cosine_decay_with_warmup(
global_step, learning_rate_base, total_steps,
warmup_learning_rate, warmup_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
exp_rates = [0]
input_global_steps = [101]
output_rates = [
self.execute(graph_fn, [np.array(step).astype(np.int64)])
for step in input_global_steps
]
self.assertAllClose(output_rates, exp_rates)
def testCosineDecayWithHoldBaseLearningRateSteps(self):
def graph_fn(global_step):
learning_rate_base = 1.0
total_steps = 120
warmup_learning_rate = 0.1
warmup_steps = 9
hold_base_rate_steps = 20
learning_rate = learning_schedules.cosine_decay_with_warmup(
global_step, learning_rate_base, total_steps,
warmup_learning_rate, warmup_steps, hold_base_rate_steps)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
exp_rates = [0.1, 0.5, 0.9, 1.0, 1.0, 1.0, 0.999702, 0.874255, 0.577365,
0.0]
input_global_steps = [0, 4, 8, 9, 10, 29, 30, 50, 70, 120]
output_rates = [
self.execute(graph_fn, [np.array(step).astype(np.int64)])
for step in input_global_steps
]
self.assertAllClose(output_rates, exp_rates)
def testManualStepping(self):
def graph_fn(global_step):
boundaries = [2, 3, 7]
rates = [1.0, 2.0, 3.0, 4.0]
learning_rate = learning_schedules.manual_stepping(
global_step, boundaries, rates)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
output_rates = [
......@@ -75,6 +118,22 @@ class LearningSchedulesTest(test_case.TestCase):
exp_rates = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0]
self.assertAllClose(output_rates, exp_rates)
def testManualSteppingWithWarmup(self):
def graph_fn(global_step):
boundaries = [4, 6, 8]
rates = [0.02, 0.10, 0.01, 0.001]
learning_rate = learning_schedules.manual_stepping(
global_step, boundaries, rates, warmup=True)
assert learning_rate.op.name.endswith('learning_rate')
return (learning_rate,)
output_rates = [
self.execute(graph_fn, [np.array(i).astype(np.int64)])
for i in range(9)
]
exp_rates = [0.02, 0.04, 0.06, 0.08, 0.10, 0.10, 0.01, 0.01, 0.001]
self.assertAllClose(output_rates, exp_rates)
def testManualSteppingWithZeroBoundaries(self):
def graph_fn(global_step):
boundaries = []
......
......@@ -657,7 +657,7 @@ def position_sensitive_crop_regions(image,
position_sensitive_features = tf.add_n(image_crops) / len(image_crops)
# Then average over spatial positions within the bins.
position_sensitive_features = tf.reduce_mean(
position_sensitive_features, [1, 2], keepdims=True)
position_sensitive_features, [1, 2], keep_dims=True)
else:
# Reorder height/width to depth channel.
block_size = bin_crop_size[0]
......
......@@ -840,7 +840,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# All channels are equal so position-sensitive crop and resize should
# work as the usual crop and resize for just one channel.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
tiled_image,
......@@ -866,7 +866,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
......@@ -1054,7 +1054,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
ps_crop_and_pool = tf.reduce_mean(
ps_crop, reduction_indices=(1, 2), keepdims=True)
ps_crop, reduction_indices=(1, 2), keep_dims=True)
with self.test_session() as sess:
output = sess.run(ps_crop_and_pool)
......
......@@ -42,7 +42,7 @@ def filter_variables(variables, filter_regex_list, invert=False):
a list of filtered variables.
"""
kept_vars = []
variables_to_ignore_patterns = list(filter(None, filter_regex_list))
variables_to_ignore_patterns = filter(None, filter_regex_list)
for var in variables:
add = True
for pattern in variables_to_ignore_patterns:
......
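Context for the `list(filter(...))` versus bare `filter(...)` lines above: under Python 3, `filter` returns a one-shot iterator, so without the `list()` wrapper the patterns would be exhausted after the first variable in the loop; under Python 2 both forms behave identically. A stdlib-only illustration with made-up variable names:

```python
patterns = filter(None, ['^FeatureExtractor/', '', 'BoxPredictor/'])
for var in ['FeatureExtractor/conv1/weights', 'BoxPredictor/conv/weights']:
    # On Python 3 the second iteration sees an empty iterator; list(filter(...))
    # keeps the patterns reusable across variables.
    print(var, list(patterns))
```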
......@@ -434,7 +434,6 @@ py_library(
srcs = glob(["nets/mobilenet/*.py"]),
srcs_version = "PY2AND3",
deps = [
"//third_party/py/contextlib2",
# "//tensorflow",
],
)
......
......@@ -93,6 +93,7 @@ import sys
import threading
import numpy as np
from six.moves import xrange
import tensorflow as tf
tf.app.flags.DEFINE_string('train_directory', '/tmp/',
......
......@@ -52,6 +52,8 @@ import os
import os.path
import sys
from six.moves import xrange
if __name__ == '__main__':
if len(sys.argv) < 3:
......
......@@ -86,6 +86,8 @@ import os.path
import sys
import xml.etree.ElementTree as ET
from six.moves import xrange
class BoundingBox(object):
pass
......
......@@ -230,9 +230,10 @@ def _gather_clone_loss(clone, num_clones, regularization_losses):
sum_loss = tf.add_n(all_losses)
# Add the summaries out of the clone device block.
if clone_loss is not None:
tf.summary.scalar(clone.scope + '/clone_loss', clone_loss)
tf.summary.scalar(clone.scope + '/clone_loss', clone_loss, family='Losses')
if regularization_loss is not None:
tf.summary.scalar('regularization_loss', regularization_loss)
tf.summary.scalar('regularization_loss', regularization_loss,
family='Losses')
return sum_loss
......
......@@ -18,7 +18,7 @@ from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import xrange
import tensorflow as tf
layers = tf.contrib.layers
......
......@@ -19,6 +19,8 @@ from __future__ import print_function
from math import log
from six.moves import xrange
import tensorflow as tf
slim = tf.contrib.slim
......
......@@ -18,6 +18,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import xrange
import tensorflow as tf
from nets import dcgan
......
......@@ -13,8 +13,8 @@
# limitations under the License.
# ==============================================================================
"""Convolution blocks for mobilenet."""
import contextlib
import functools
import contextlib2
import tensorflow as tf
......@@ -75,6 +75,19 @@ def _split_divisible(num, num_ways, divisible_by=8):
return result
@contextlib.contextmanager
def _v1_compatible_scope_naming(scope):
if scope is None: # Create uniqified separable blocks.
with tf.variable_scope(None, default_name='separable') as s, \
tf.name_scope(s.original_name_scope):
yield ''
else:
# We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts.
# which provide numbered scopes.
scope += '_'
yield scope
@slim.add_arg_scope
def split_separable_conv2d(input_tensor,
num_outputs,
......@@ -110,15 +123,7 @@ def split_separable_conv2d(input_tensor,
output tensor
"""
with contextlib2.ExitStack() as stack:
if scope is None: # Create uniqified separable blocks.
s = stack.enter_context(tf.variable_scope(None, default_name='separable'))
stack.enter_context(tf.name_scope(s.original_name_scope))
scope = ''
else:
# We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts.
scope += '_'
with _v1_compatible_scope_naming(scope) as scope:
dw_scope = scope + 'depthwise'
endpoints = endpoints if endpoints is not None else {}
kernel_size = [3, 3]
......
......@@ -22,8 +22,6 @@ import contextlib
import copy
import os
import contextlib2
import tensorflow as tf
......@@ -76,17 +74,23 @@ def _set_arg_scope_defaults(defaults):
"""Sets arg scope defaults for all items present in defaults.
Args:
defaults: dictionary mapping function to default_dict
defaults: dictionary/list of pairs, containing a mapping from
function to a dictionary of default args.
Yields:
context manager
context manager where all defaults are set.
"""
with contextlib2.ExitStack() as stack:
_ = [
stack.enter_context(slim.arg_scope(func, **default_arg))
for func, default_arg in defaults.items()
]
if hasattr(defaults, 'items'):
items = defaults.items()
else:
items = defaults
if not items:
yield
else:
func, default_arg = items[0]
with slim.arg_scope(func, **default_arg):
with _set_arg_scope_defaults(items[1:]):
yield
@slim.add_arg_scope
......
......@@ -350,7 +350,7 @@ class MobilenetV1Test(tf.test.TestCase):
mobilenet_v1.mobilenet_v1_base(inputs)
total_params, _ = slim.model_analyzer.analyze_vars(
slim.get_model_variables())
self.assertAlmostEqual(3217920L, total_params)
self.assertAlmostEqual(3217920, total_params)
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
batch_size = 5
......
......@@ -20,6 +20,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import tensorflow as tf
from nets.nasnet import nasnet_utils
......@@ -35,13 +36,12 @@ slim = tf.contrib.slim
# cosine (single period) learning rate decay
# auxiliary head loss weighting: 0.4
# clip global norm of all gradients by 5
def _cifar_config(is_training=True, use_aux_head=True):
drop_path_keep_prob = 1.0 if not is_training else 0.6
def cifar_config():
return tf.contrib.training.HParams(
stem_multiplier=3.0,
drop_path_keep_prob=drop_path_keep_prob,
drop_path_keep_prob=0.6,
num_cells=18,
use_aux_head=int(use_aux_head),
use_aux_head=1,
num_conv_filters=32,
dense_dropout_keep_prob=1.0,
filter_scaling_rate=2.0,
......@@ -65,16 +65,15 @@ def _cifar_config(is_training=True, use_aux_head=True):
# auxiliary head loss weighting: 0.4
# label smoothing: 0.1
# clip global norm of all gradients by 10
def _large_imagenet_config(is_training=True, use_aux_head=True):
drop_path_keep_prob = 1.0 if not is_training else 0.7
def large_imagenet_config():
return tf.contrib.training.HParams(
stem_multiplier=3.0,
dense_dropout_keep_prob=0.5,
num_cells=18,
filter_scaling_rate=2.0,
num_conv_filters=168,
drop_path_keep_prob=drop_path_keep_prob,
use_aux_head=int(use_aux_head),
drop_path_keep_prob=0.7,
use_aux_head=1,
num_reduction_layers=2,
data_format='NHWC',
skip_reduction_layer_input=1,
......@@ -92,7 +91,7 @@ def _large_imagenet_config(is_training=True, use_aux_head=True):
# auxiliary head weighting: 0.4
# label smoothing: 0.1
# clip global norm of all gradients by 10
def _mobile_imagenet_config(use_aux_head=True):
def mobile_imagenet_config():
return tf.contrib.training.HParams(
stem_multiplier=1.0,
dense_dropout_keep_prob=0.5,
......@@ -100,7 +99,7 @@ def _mobile_imagenet_config(use_aux_head=True):
filter_scaling_rate=2.0,
drop_path_keep_prob=1.0,
num_conv_filters=44,
use_aux_head=int(use_aux_head),
use_aux_head=1,
num_reduction_layers=2,
data_format='NHWC',
skip_reduction_layer_input=0,
......@@ -108,6 +107,12 @@ def _mobile_imagenet_config(use_aux_head=True):
)
def _update_hparams(hparams, is_training):
"""Update hparams for given is_training option."""
if not is_training:
hparams.set_hparam('drop_path_keep_prob', 1.0)
def nasnet_cifar_arg_scope(weight_decay=5e-4,
batch_norm_decay=0.9,
batch_norm_epsilon=1e-5):
......@@ -279,10 +284,12 @@ def _cifar_stem(inputs, hparams):
return net, [None, net]
def build_nasnet_cifar(
images, num_classes, is_training=True, use_aux_head=True):
def build_nasnet_cifar(images, num_classes,
is_training=True,
config=None):
"""Build NASNet model for the Cifar Dataset."""
hparams = _cifar_config(is_training=is_training, use_aux_head=use_aux_head)
hparams = cifar_config() if config is None else copy.deepcopy(config)
_update_hparams(hparams, is_training)
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
tf.logging.info('A GPU is available on the machine, consider using NCHW '
......@@ -326,9 +333,11 @@ build_nasnet_cifar.default_image_size = 32
def build_nasnet_mobile(images, num_classes,
is_training=True,
final_endpoint=None,
use_aux_head=True):
config=None):
"""Build NASNet Mobile model for the ImageNet Dataset."""
hparams = _mobile_imagenet_config(use_aux_head=use_aux_head)
hparams = (mobile_imagenet_config() if config is None
else copy.deepcopy(config))
_update_hparams(hparams, is_training)
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
tf.logging.info('A GPU is available on the machine, consider using NCHW '
......@@ -375,10 +384,11 @@ build_nasnet_mobile.default_image_size = 224
def build_nasnet_large(images, num_classes,
is_training=True,
final_endpoint=None,
use_aux_head=True):
config=None):
"""Build NASNet Large model for the ImageNet Dataset."""
hparams = _large_imagenet_config(is_training=is_training,
use_aux_head=use_aux_head)
hparams = (large_imagenet_config() if config is None
else copy.deepcopy(config))
_update_hparams(hparams, is_training)
if tf.test.is_gpu_available() and hparams.data_format == 'NHWC':
tf.logging.info('A GPU is available on the machine, consider using NCHW '
......
......@@ -166,9 +166,11 @@ class NASNetTest(tf.test.TestCase):
tf.reset_default_graph()
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.cifar_config()
config.set_hparam('use_aux_head', int(use_aux_head))
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
_, end_points = nasnet.build_nasnet_cifar(inputs, num_classes,
use_aux_head=use_aux_head)
config=config)
self.assertEqual('AuxLogits' in end_points, use_aux_head)
def testAllEndPointsShapesMobileModel(self):
......@@ -215,9 +217,11 @@ class NASNetTest(tf.test.TestCase):
tf.reset_default_graph()
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.mobile_imagenet_config()
config.set_hparam('use_aux_head', int(use_aux_head))
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
_, end_points = nasnet.build_nasnet_mobile(inputs, num_classes,
use_aux_head=use_aux_head)
config=config)
self.assertEqual('AuxLogits' in end_points, use_aux_head)
def testAllEndPointsShapesLargeModel(self):
......@@ -270,9 +274,11 @@ class NASNetTest(tf.test.TestCase):
tf.reset_default_graph()
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.large_imagenet_config()
config.set_hparam('use_aux_head', int(use_aux_head))
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
_, end_points = nasnet.build_nasnet_large(inputs, num_classes,
use_aux_head=use_aux_head)
config=config)
self.assertEqual('AuxLogits' in end_points, use_aux_head)
def testVariablesSetDeviceMobileModel(self):
......@@ -323,6 +329,48 @@ class NASNetTest(tf.test.TestCase):
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testOverrideHParamsCifarModel(self):
batch_size = 5
height, width = 32, 32
num_classes = 10
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.cifar_config()
config.set_hparam('data_format', 'NCHW')
with slim.arg_scope(nasnet.nasnet_cifar_arg_scope()):
_, end_points = nasnet.build_nasnet_cifar(
inputs, num_classes, config=config)
self.assertListEqual(
end_points['Stem'].shape.as_list(), [batch_size, 96, 32, 32])
def testOverrideHParamsMobileModel(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.mobile_imagenet_config()
config.set_hparam('data_format', 'NCHW')
with slim.arg_scope(nasnet.nasnet_mobile_arg_scope()):
_, end_points = nasnet.build_nasnet_mobile(
inputs, num_classes, config=config)
self.assertListEqual(
end_points['Stem'].shape.as_list(), [batch_size, 88, 28, 28])
def testOverrideHParamsLargeModel(self):
batch_size = 5
height, width = 331, 331
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
tf.train.create_global_step()
config = nasnet.large_imagenet_config()
config.set_hparam('data_format', 'NCHW')
with slim.arg_scope(nasnet.nasnet_large_arg_scope()):
_, end_points = nasnet.build_nasnet_large(
inputs, num_classes, config=config)
self.assertListEqual(
end_points['Stem'].shape.as_list(), [batch_size, 336, 42, 42])
if __name__ == '__main__':
tf.test.main()