提交 a1337e01 编写于 作者: Z Zhichao Lu 提交者: pkulzc

Merged commit includes the following changes:

223075771  by lzc:

    Bring in external fixes.

--
222919755  by ronnyvotel:

    Bug fix in faster r-cnn model builder. Was previously using `inplace_batchnorm_update` for `reuse_weights`.

--
222885680  by Zhichao Lu:

    Use the result_dict_for_batched_example in models_lib
    Also fixes the visualization size on when eval is on GPU

--
222883648  by Zhichao Lu:

    Fix _unmatched_class_label for the _add_background_class == False case in ssd_meta_arch.py.

--
222836663  by Zhichao Lu:

    Adding support for visualizing grayscale images. Without this change, the images are black-red instead of grayscale.

--
222501978  by Zhichao Lu:

    Fix a bug that caused convert_to_grayscale flag not to be respected.

--
222432846  by richardmunoz:

    Fix mapping of groundtruth_confidences from shape [num_boxes] to [num_boxes, num_classes] when the input contains the groundtruth_confidences field.

--
221725755  by richardmunoz:

    Internal change.

--
221458536  by Zhichao Lu:

    Fix saver defer build bug in object detection train codepath.

--
221391590  by Zhichao Lu:

    Add support for group normalization in the object detection API. Just adding MobileNet-v1 SSD currently. This may serve as a road map for other models that wish to support group normalization as an option.

--
221367993  by Zhichao Lu:

    Bug fixes (1) Make RandomPadImage work, (2) Fix keep_checkpoint_every_n_hours.

--
221266403  by rathodv:

    Use detection boxes as proposals to compute correct mask loss in eval jobs.

--
220845934  by lzc:

    Internal change.

--
220778850  by Zhichao Lu:

    Incorporating existing metrics into Estimator framework.
    Should restore:
    -oid_challenge_detection_metrics
    -pascal_voc_detection_metrics
    -weighted_pascal_voc_detection_metrics
    -pascal_voc_instance_segmentation_metrics
    -weighted_pascal_voc_instance_segmentation_metrics
    -oid_V2_detection_metrics

--
220370391  by alirezafathi:

    Adding precision and recall to the metrics.

--
220321268  by Zhichao Lu:

    Allow the option of setting max_examples_to_draw to zero.

--
220193337  by Zhichao Lu:

    This CL fixes a bug where the Keras convolutional box predictor was applying heads in the non-deterministic dict order. The consequence of this bug was that variables were created in non-deterministic orders. This in turn led different workers in a multi-gpu training setup to have slightly different graphs which had variables assigned to mismatched parameter servers. As a result, roughly half of all workers were unable to initialize and did no work, and training time was slowed down approximately 2x.

--
220136508  by huizhongc:

    Add weight equalization loss to SSD meta arch.

--
220125875  by pengchong:

    Rename label_scores to label_weights

--
219730108  by Zhichao Lu:

    Add description of detection_keypoints in postprocessed_tensors to docstring.

--
219577519  by pengchong:

    Support parsing the class confidences and training using them.

--
219547611  by lzc:

    Stop using static shapes in GPU eval jobs.

--
219536476  by Zhichao Lu:

    Migrate TensorFlow Lite out of tensorflow/contrib

    This change moves //tensorflow/contrib/lite to //tensorflow/lite in preparation
    for TensorFlow 2.0's deprecation of contrib/. If you refer to TF Lite build
    targets or headers, you will need to update them manually. If you use TF Lite
    from the TensorFlow python package, "tf.contrib.lite" now points to "tf.lite".
    Please update your imports as soon as possible.

    For more details, see https://groups.google.com/a/tensorflow.org/forum/#!topic/tflite/iIIXOTOFvwQ

    @angersson and @aselle are conducting this migration. Please contact them if
    you have any further questions.

--
219190083  by Zhichao Lu:

    Add a second expected_loss_weights function using an alternative expectation calculation compared to previous. Integrate this op into ssd_meta_arch and losses builder. Affects files that use losses_builder.build to handle the returning of an additional element.

--
218924451  by pengchong:

    Add a new way to assign training targets using groundtruth confidences.

--
218760524  by chowdhery:

    Modify export script to add option for regular NMS in TFLite post-processing op.

--

PiperOrigin-RevId: 223075771
上级 2c680af3
......@@ -16,7 +16,6 @@
"""Function to build box predictor from configuration."""
import collections
from absl import logging
import tensorflow as tf
from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import convolutional_keras_box_predictor
......@@ -26,7 +25,6 @@ from object_detection.predictors.heads import box_head
from object_detection.predictors.heads import class_head
from object_detection.predictors.heads import keras_box_head
from object_detection.predictors.heads import keras_class_head
from object_detection.predictors.heads import keras_mask_head
from object_detection.predictors.heads import mask_head
from object_detection.protos import box_predictor_pb2
......@@ -44,8 +42,7 @@ def build_convolutional_box_predictor(is_training,
apply_sigmoid_to_scores=False,
add_background_class=True,
class_prediction_bias_init=0.0,
use_depthwise=False,
mask_head_config=None):
use_depthwise=False,):
"""Builds the ConvolutionalBoxPredictor from the arguments.
Args:
......@@ -80,8 +77,6 @@ def build_convolutional_box_predictor(is_training,
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
mask_head_config: An optional MaskHead object containing configs for mask
head construction.
Returns:
A ConvolutionalBoxPredictor class.
......@@ -101,21 +96,6 @@ def build_convolutional_box_predictor(is_training,
class_prediction_bias_init=class_prediction_bias_init,
use_depthwise=use_depthwise)
other_heads = {}
if mask_head_config is not None:
if not mask_head_config.masks_are_class_agnostic:
logging.warning('Note that class specific mask prediction for SSD '
'models is memory consuming.')
other_heads[convolutional_box_predictor.MASK_PREDICTIONS] = (
mask_head.ConvolutionalMaskHead(
is_training=is_training,
num_classes=num_classes,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
kernel_size=kernel_size,
use_depthwise=use_depthwise,
mask_height=mask_head_config.mask_height,
mask_width=mask_head_config.mask_width,
masks_are_class_agnostic=mask_head_config.masks_are_class_agnostic))
return convolutional_box_predictor.ConvolutionalBoxPredictor(
is_training=is_training,
num_classes=num_classes,
......@@ -144,7 +124,6 @@ def build_convolutional_keras_box_predictor(is_training,
add_background_class=True,
class_prediction_bias_init=0.0,
use_depthwise=False,
mask_head_config=None,
name='BoxPredictor'):
"""Builds the Keras ConvolutionalBoxPredictor from the arguments.
......@@ -189,8 +168,6 @@ def build_convolutional_keras_box_predictor(is_training,
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
mask_head_config: An optional MaskHead object containing configs for mask
head construction.
name: A string name scope to assign to the box predictor. If `None`, Keras
will auto-generate one from the class name.
......@@ -199,11 +176,7 @@ def build_convolutional_keras_box_predictor(is_training,
"""
box_prediction_heads = []
class_prediction_heads = []
mask_prediction_heads = []
other_heads = {}
if mask_head_config is not None:
other_heads[convolutional_box_predictor.MASK_PREDICTIONS] = \
mask_prediction_heads
for stack_index, num_predictions_per_location in enumerate(
num_predictions_per_location_list):
......@@ -231,26 +204,6 @@ def build_convolutional_keras_box_predictor(is_training,
class_prediction_bias_init=class_prediction_bias_init,
use_depthwise=use_depthwise,
name='ConvolutionalClassHead_%d' % stack_index))
if mask_head_config is not None:
if not mask_head_config.masks_are_class_agnostic:
logging.warning('Note that class specific mask prediction for SSD '
'models is memory consuming.')
mask_prediction_heads.append(
keras_mask_head.ConvolutionalMaskHead(
is_training=is_training,
num_classes=num_classes,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
kernel_size=kernel_size,
conv_hyperparams=conv_hyperparams,
freeze_batchnorm=freeze_batchnorm,
num_predictions_per_location=num_predictions_per_location,
use_depthwise=use_depthwise,
mask_height=mask_head_config.mask_height,
mask_width=mask_head_config.mask_width,
masks_are_class_agnostic=mask_head_config.
masks_are_class_agnostic,
name='ConvolutionalMaskHead_%d' % stack_index))
return convolutional_keras_box_predictor.ConvolutionalBoxPredictor(
is_training=is_training,
......@@ -282,7 +235,6 @@ def build_weight_shared_convolutional_box_predictor(
share_prediction_tower=False,
apply_batch_norm=True,
use_depthwise=False,
mask_head_config=None,
score_converter_fn=tf.identity,
box_encodings_clip_range=None):
"""Builds and returns a WeightSharedConvolutionalBoxPredictor class.
......@@ -310,8 +262,6 @@ def build_weight_shared_convolutional_box_predictor(
apply_batch_norm: Whether to apply batch normalization to conv layers in
this predictor.
use_depthwise: Whether to use depthwise separable conv2d instead of conv2d.
mask_head_config: An optional MaskHead object containing configs for mask
head construction.
score_converter_fn: Callable score converter to perform elementwise op on
class scores.
box_encodings_clip_range: Min and max values for clipping the box_encodings.
......@@ -335,19 +285,6 @@ def build_weight_shared_convolutional_box_predictor(
use_depthwise=use_depthwise,
score_converter_fn=score_converter_fn))
other_heads = {}
if mask_head_config is not None:
if not mask_head_config.masks_are_class_agnostic:
logging.warning('Note that class specific mask prediction for SSD '
'models is memory consuming.')
other_heads[convolutional_box_predictor.MASK_PREDICTIONS] = (
mask_head.WeightSharedConvolutionalMaskHead(
num_classes=num_classes,
kernel_size=kernel_size,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
mask_height=mask_head_config.mask_height,
mask_width=mask_head_config.mask_width,
masks_are_class_agnostic=mask_head_config.masks_are_class_agnostic))
return convolutional_box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=is_training,
num_classes=num_classes,
......@@ -520,9 +457,6 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes,
config_box_predictor = box_predictor_config.convolutional_box_predictor
conv_hyperparams_fn = argscope_fn(config_box_predictor.conv_hyperparams,
is_training)
mask_head_config = (
config_box_predictor.mask_head
if config_box_predictor.HasField('mask_head') else None)
return build_convolutional_box_predictor(
is_training=is_training,
num_classes=num_classes,
......@@ -539,8 +473,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes,
apply_sigmoid_to_scores=config_box_predictor.apply_sigmoid_to_scores,
class_prediction_bias_init=(
config_box_predictor.class_prediction_bias_init),
use_depthwise=config_box_predictor.use_depthwise,
mask_head_config=mask_head_config)
use_depthwise=config_box_predictor.use_depthwise)
if box_predictor_oneof == 'weight_shared_convolutional_box_predictor':
config_box_predictor = (
......@@ -549,9 +482,6 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes,
is_training)
apply_batch_norm = config_box_predictor.conv_hyperparams.HasField(
'batch_norm')
mask_head_config = (
config_box_predictor.mask_head
if config_box_predictor.HasField('mask_head') else None)
# During training phase, logits are used to compute the loss. Only apply
# sigmoid at inference to make the inference graph TPU friendly.
score_converter_fn = build_score_converter(
......@@ -581,7 +511,6 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes,
share_prediction_tower=config_box_predictor.share_prediction_tower,
apply_batch_norm=apply_batch_norm,
use_depthwise=config_box_predictor.use_depthwise,
mask_head_config=mask_head_config,
score_converter_fn=score_converter_fn,
box_encodings_clip_range=box_encodings_clip_range)
......@@ -680,10 +609,6 @@ def build_keras(conv_hyperparams_fn, freeze_batchnorm, inplace_batchnorm_update,
config_box_predictor = box_predictor_config.convolutional_box_predictor
conv_hyperparams = conv_hyperparams_fn(
config_box_predictor.conv_hyperparams)
mask_head_config = (
config_box_predictor.mask_head
if config_box_predictor.HasField('mask_head') else None)
return build_convolutional_keras_box_predictor(
is_training=is_training,
num_classes=num_classes,
......@@ -702,8 +627,7 @@ def build_keras(conv_hyperparams_fn, freeze_batchnorm, inplace_batchnorm_update,
max_depth=config_box_predictor.max_depth,
class_prediction_bias_init=(
config_box_predictor.class_prediction_bias_init),
use_depthwise=config_box_predictor.use_depthwise,
mask_head_config=mask_head_config)
use_depthwise=config_box_predictor.use_depthwise)
raise ValueError(
'Unknown box predictor for Keras: {}'.format(box_predictor_oneof))
......@@ -21,9 +21,7 @@ import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import mask_rcnn_box_predictor
from object_detection.predictors.heads import mask_head
from object_detection.protos import box_predictor_pb2
from object_detection.protos import hyperparams_pb2
......@@ -161,73 +159,6 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
self.assertTrue(box_predictor._is_training)
self.assertFalse(class_head._use_depthwise)
def test_construct_default_conv_box_predictor_with_default_mask_head(self):
box_predictor_text_proto = """
convolutional_box_predictor {
mask_head {
}
conv_hyperparams {
regularizer {
l1_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}"""
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(box_predictor_text_proto, box_predictor_proto)
box_predictor = box_predictor_builder.build(
argscope_fn=hyperparams_builder.build,
box_predictor_config=box_predictor_proto,
is_training=True,
num_classes=90)
self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
box_predictor._other_heads)
mask_prediction_head = (
box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
)
self.assertEqual(mask_prediction_head._mask_height, 15)
self.assertEqual(mask_prediction_head._mask_width, 15)
self.assertTrue(mask_prediction_head._masks_are_class_agnostic)
def test_construct_default_conv_box_predictor_with_custom_mask_head(self):
box_predictor_text_proto = """
convolutional_box_predictor {
mask_head {
mask_height: 7
mask_width: 7
masks_are_class_agnostic: false
}
conv_hyperparams {
regularizer {
l1_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}"""
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(box_predictor_text_proto, box_predictor_proto)
box_predictor = box_predictor_builder.build(
argscope_fn=hyperparams_builder.build,
box_predictor_config=box_predictor_proto,
is_training=True,
num_classes=90)
self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
box_predictor._other_heads)
mask_prediction_head = (
box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
)
self.assertEqual(mask_prediction_head._mask_height, 7)
self.assertEqual(mask_prediction_head._mask_width, 7)
self.assertFalse(mask_prediction_head._masks_are_class_agnostic)
class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
......@@ -421,79 +352,6 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
self.assertTrue(box_predictor._is_training)
self.assertEqual(box_predictor._apply_batch_norm, True)
def test_construct_weight_shared_predictor_with_default_mask_head(self):
box_predictor_text_proto = """
weight_shared_convolutional_box_predictor {
mask_head {
}
conv_hyperparams {
regularizer {
l1_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}"""
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(box_predictor_text_proto, box_predictor_proto)
box_predictor = box_predictor_builder.build(
argscope_fn=hyperparams_builder.build,
box_predictor_config=box_predictor_proto,
is_training=True,
num_classes=90)
self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
box_predictor._other_heads)
weight_shared_convolutional_mask_head = (
box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
)
self.assertIsInstance(weight_shared_convolutional_mask_head,
mask_head.WeightSharedConvolutionalMaskHead)
self.assertEqual(weight_shared_convolutional_mask_head._mask_height, 15)
self.assertEqual(weight_shared_convolutional_mask_head._mask_width, 15)
self.assertTrue(
weight_shared_convolutional_mask_head._masks_are_class_agnostic)
def test_construct_weight_shared_predictor_with_custom_mask_head(self):
box_predictor_text_proto = """
weight_shared_convolutional_box_predictor {
mask_head {
mask_height: 7
mask_width: 7
masks_are_class_agnostic: false
}
conv_hyperparams {
regularizer {
l1_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}"""
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(box_predictor_text_proto, box_predictor_proto)
box_predictor = box_predictor_builder.build(
argscope_fn=hyperparams_builder.build,
box_predictor_config=box_predictor_proto,
is_training=True,
num_classes=90)
self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
box_predictor._other_heads)
weight_shared_convolutional_mask_head = (
box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
)
self.assertIsInstance(weight_shared_convolutional_mask_head,
mask_head.WeightSharedConvolutionalMaskHead)
self.assertEqual(weight_shared_convolutional_mask_head._mask_height, 7)
self.assertEqual(weight_shared_convolutional_mask_head._mask_width, 7)
self.assertFalse(
weight_shared_convolutional_mask_head._masks_are_class_agnostic)
class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
......
......@@ -182,8 +182,9 @@ def build(hyperparams_config, is_training):
initializer, weights regularizer, activation function, batch norm function
and batch norm parameters based on the configuration.
Note that if the batch_norm parameteres are not specified in the config
(i.e. left to default) then batch norm is excluded from the arg_scope.
Note that if no normalization parameters are specified in the config,
(i.e. left to default) then both batch norm and group norm are excluded
from the arg_scope.
The batch norm parameters are set for updates based on `is_training` argument
and conv_hyperparams_config.batch_norm.train parameter. During training, they
......@@ -208,13 +209,14 @@ def build(hyperparams_config, is_training):
raise ValueError('hyperparams_config not of type '
'hyperparams_pb.Hyperparams.')
batch_norm = None
normalizer_fn = None
batch_norm_params = None
if hyperparams_config.HasField('batch_norm'):
batch_norm = slim.batch_norm
normalizer_fn = slim.batch_norm
batch_norm_params = _build_batch_norm_params(
hyperparams_config.batch_norm, is_training)
if hyperparams_config.HasField('group_norm'):
normalizer_fn = tf.contrib.layers.group_norm
affected_ops = [slim.conv2d, slim.separable_conv2d, slim.conv2d_transpose]
if hyperparams_config.HasField('op') and (
hyperparams_config.op == hyperparams_pb2.Hyperparams.FC):
......@@ -230,7 +232,7 @@ def build(hyperparams_config, is_training):
weights_initializer=_build_initializer(
hyperparams_config.initializer),
activation_fn=_build_activation_fn(hyperparams_config.activation),
normalizer_fn=batch_norm) as sc:
normalizer_fn=normalizer_fn) as sc:
return sc
return scope_fn
......
......@@ -15,9 +15,11 @@
"""A function to build localization and classification losses from config."""
import functools
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import losses
from object_detection.protos import losses_pb2
from object_detection.utils import ops
def build(loss_config):
......@@ -66,8 +68,28 @@ def build(loss_config):
random_example_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=loss_config.random_example_sampler.
positive_sample_fraction)
if loss_config.expected_loss_weights == loss_config.NONE:
expected_loss_weights_fn = None
elif loss_config.expected_loss_weights == loss_config.EXPECTED_SAMPLING:
expected_loss_weights_fn = functools.partial(
ops.expected_classification_loss_by_expected_sampling,
min_num_negative_samples=loss_config.min_num_negative_samples,
desired_negative_sampling_ratio=loss_config
.desired_negative_sampling_ratio)
elif (loss_config.expected_loss_weights == loss_config
.REWEIGHTING_UNMATCHED_ANCHORS):
expected_loss_weights_fn = functools.partial(
ops.expected_classification_loss_by_reweighting_unmatched_anchors,
min_num_negative_samples=loss_config.min_num_negative_samples,
desired_negative_sampling_ratio=loss_config
.desired_negative_sampling_ratio)
else:
raise ValueError('Not a valid value for expected_classification_loss.')
return (classification_loss, localization_loss, classification_weight,
localization_weight, hard_example_miner, random_example_sampler)
localization_weight, hard_example_miner, random_example_sampler,
expected_loss_weights_fn)
def build_hard_example_miner(config,
......
......@@ -21,6 +21,7 @@ from google.protobuf import text_format
from object_detection.builders import losses_builder
from object_detection.core import losses
from object_detection.protos import losses_pb2
from object_detection.utils import ops
class LocalizationLossBuilderTest(tf.test.TestCase):
......@@ -38,7 +39,7 @@ class LocalizationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, localization_loss, _, _, _, _ = losses_builder.build(losses_proto)
_, localization_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(localization_loss,
losses.WeightedL2LocalizationLoss))
......@@ -55,7 +56,7 @@ class LocalizationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, localization_loss, _, _, _, _ = losses_builder.build(losses_proto)
_, localization_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(localization_loss,
losses.WeightedSmoothL1LocalizationLoss))
self.assertAlmostEqual(localization_loss._delta, 1.0)
......@@ -74,7 +75,7 @@ class LocalizationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, localization_loss, _, _, _, _ = losses_builder.build(losses_proto)
_, localization_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(localization_loss,
losses.WeightedSmoothL1LocalizationLoss))
self.assertAlmostEqual(localization_loss._delta, 0.1)
......@@ -92,7 +93,7 @@ class LocalizationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, localization_loss, _, _, _, _ = losses_builder.build(losses_proto)
_, localization_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(localization_loss,
losses.WeightedIOULocalizationLoss))
......@@ -109,7 +110,7 @@ class LocalizationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, localization_loss, _, _, _, _ = losses_builder.build(losses_proto)
_, localization_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(localization_loss,
losses.WeightedSmoothL1LocalizationLoss))
predictions = tf.constant([[[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]])
......@@ -146,7 +147,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.WeightedSigmoidClassificationLoss))
......@@ -163,7 +164,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.SigmoidFocalClassificationLoss))
self.assertAlmostEqual(classification_loss._alpha, None)
......@@ -184,7 +185,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.SigmoidFocalClassificationLoss))
self.assertAlmostEqual(classification_loss._alpha, 0.25)
......@@ -203,7 +204,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.WeightedSoftmaxClassificationLoss))
......@@ -220,7 +221,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(
isinstance(classification_loss,
losses.WeightedSoftmaxClassificationAgainstLogitsLoss))
......@@ -239,7 +240,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.WeightedSoftmaxClassificationLoss))
......@@ -257,7 +258,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.BootstrappedSigmoidClassificationLoss))
......@@ -275,7 +276,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss, _, _, _, _, _ = losses_builder.build(losses_proto)
classification_loss, _, _, _, _, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(classification_loss,
losses.WeightedSigmoidClassificationLoss))
predictions = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]])
......@@ -312,7 +313,7 @@ class HardExampleMinerBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, _, _, _, hard_example_miner, _ = losses_builder.build(losses_proto)
_, _, _, _, hard_example_miner, _, _ = losses_builder.build(losses_proto)
self.assertEqual(hard_example_miner, None)
def test_build_hard_example_miner_for_classification_loss(self):
......@@ -331,7 +332,7 @@ class HardExampleMinerBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, _, _, _, hard_example_miner, _ = losses_builder.build(losses_proto)
_, _, _, _, hard_example_miner, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner))
self.assertEqual(hard_example_miner._loss_type, 'cls')
......@@ -351,7 +352,7 @@ class HardExampleMinerBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, _, _, _, hard_example_miner, _ = losses_builder.build(losses_proto)
_, _, _, _, hard_example_miner, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner))
self.assertEqual(hard_example_miner._loss_type, 'loc')
......@@ -375,7 +376,7 @@ class HardExampleMinerBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
_, _, _, _, hard_example_miner, _ = losses_builder.build(losses_proto)
_, _, _, _, hard_example_miner, _, _ = losses_builder.build(losses_proto)
self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner))
self.assertEqual(hard_example_miner._num_hard_examples, 32)
self.assertAlmostEqual(hard_example_miner._iou_threshold, 0.5)
......@@ -402,9 +403,9 @@ class LossBuilderTest(tf.test.TestCase):
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
(classification_loss, localization_loss,
classification_weight, localization_weight,
hard_example_miner, _) = losses_builder.build(losses_proto)
(classification_loss, localization_loss, classification_weight,
localization_weight, hard_example_miner, _,
_) = losses_builder.build(losses_proto)
self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner))
self.assertTrue(isinstance(classification_loss,
losses.WeightedSoftmaxClassificationLoss))
......@@ -413,6 +414,65 @@ class LossBuilderTest(tf.test.TestCase):
self.assertAlmostEqual(classification_weight, 0.8)
self.assertAlmostEqual(localization_weight, 0.2)
def test_build_expected_sampling(self):
losses_text_proto = """
localization_loss {
weighted_l2 {
}
}
classification_loss {
weighted_softmax {
}
}
hard_example_miner {
}
classification_weight: 0.8
localization_weight: 0.2
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
(classification_loss, localization_loss, classification_weight,
localization_weight, hard_example_miner, _,
_) = losses_builder.build(losses_proto)
self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner))
self.assertTrue(
isinstance(classification_loss,
losses.WeightedSoftmaxClassificationLoss))
self.assertTrue(
isinstance(localization_loss, losses.WeightedL2LocalizationLoss))
self.assertAlmostEqual(classification_weight, 0.8)
self.assertAlmostEqual(localization_weight, 0.2)
def test_build_reweighting_unmatched_anchors(self):
losses_text_proto = """
localization_loss {
weighted_l2 {
}
}
classification_loss {
weighted_softmax {
}
}
hard_example_miner {
}
classification_weight: 0.8
localization_weight: 0.2
"""
losses_proto = losses_pb2.Loss()
text_format.Merge(losses_text_proto, losses_proto)
(classification_loss, localization_loss, classification_weight,
localization_weight, hard_example_miner, _,
_) = losses_builder.build(losses_proto)
self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner))
self.assertTrue(
isinstance(classification_loss,
losses.WeightedSoftmaxClassificationLoss))
self.assertTrue(
isinstance(localization_loss, losses.WeightedL2LocalizationLoss))
self.assertAlmostEqual(classification_weight, 0.8)
self.assertAlmostEqual(localization_weight, 0.2)
def test_raise_error_when_both_focal_loss_and_hard_example_miner(self):
losses_text_proto = """
localization_loss {
......
......@@ -50,6 +50,7 @@ from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMo
from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
from object_detection.models.ssd_pnasnet_feature_extractor import SSDPNASNetFeatureExtractor
from object_detection.predictors import rfcn_box_predictor
from object_detection.predictors.heads import mask_head
from object_detection.protos import model_pb2
from object_detection.utils import ops
......@@ -261,28 +262,23 @@ def _build_ssd_model(ssd_config, is_training, add_summaries):
non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(
ssd_config.post_processing)
(classification_loss, localization_loss, classification_weight,
localization_weight, hard_example_miner,
random_example_sampler) = losses_builder.build(ssd_config.loss)
localization_weight, hard_example_miner, random_example_sampler,
expected_loss_weights_fn) = losses_builder.build(ssd_config.loss)
normalize_loss_by_num_matches = ssd_config.normalize_loss_by_num_matches
normalize_loc_loss_by_codesize = ssd_config.normalize_loc_loss_by_codesize
weight_regression_loss_by_score = (ssd_config.weight_regression_loss_by_score)
equalization_loss_config = ops.EqualizationLossConfig(
weight=ssd_config.loss.equalization_loss.weight,
exclude_prefixes=ssd_config.loss.equalization_loss.exclude_prefixes)
target_assigner_instance = target_assigner.TargetAssigner(
region_similarity_calculator,
matcher,
box_coder,
negative_class_weight=negative_class_weight,
weight_regression_loss_by_score=weight_regression_loss_by_score)
expected_classification_loss_under_sampling = None
if ssd_config.use_expected_classification_loss_under_sampling:
expected_classification_loss_under_sampling = functools.partial(
ops.expected_classification_loss_under_sampling,
min_num_negative_samples=ssd_config.min_num_negative_samples,
desired_negative_sampling_ratio=ssd_config.
desired_negative_sampling_ratio)
negative_class_weight=negative_class_weight)
ssd_meta_arch_fn = ssd_meta_arch.SSDMetaArch
kwargs = {}
return ssd_meta_arch_fn(
is_training=is_training,
......@@ -306,9 +302,13 @@ def _build_ssd_model(ssd_config, is_training, add_summaries):
freeze_batchnorm=ssd_config.freeze_batchnorm,
inplace_batchnorm_update=ssd_config.inplace_batchnorm_update,
add_background_class=ssd_config.add_background_class,
explicit_background_class=ssd_config.explicit_background_class,
random_example_sampler=random_example_sampler,
expected_classification_loss_under_sampling=
expected_classification_loss_under_sampling)
expected_loss_weights_fn=expected_loss_weights_fn,
use_confidences_as_targets=ssd_config.use_confidences_as_targets,
implicit_example_weight=ssd_config.implicit_example_weight,
equalization_loss_config=equalization_loss_config,
**kwargs)
def _build_faster_rcnn_feature_extractor(
......@@ -374,7 +374,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
feature_extractor = _build_faster_rcnn_feature_extractor(
frcnn_config.feature_extractor, is_training,
frcnn_config.inplace_batchnorm_update)
inplace_batchnorm_update=frcnn_config.inplace_batchnorm_update)
number_of_stages = frcnn_config.number_of_stages
first_stage_anchor_generator = anchor_generator_builder.build(
......@@ -391,7 +391,8 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
frcnn_config.first_stage_box_predictor_kernel_size)
first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
use_static_shapes = frcnn_config.use_static_shapes
use_static_shapes = frcnn_config.use_static_shapes and (
frcnn_config.use_static_shapes_for_eval or is_training)
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.first_stage_positive_balance_fraction,
is_static=(frcnn_config.use_static_balanced_label_sampler and
......
......@@ -150,9 +150,6 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
}
}
}
use_expected_classification_loss_under_sampling: true
min_num_negative_samples: 10
desired_negative_sampling_ratio: 2
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
......@@ -160,12 +157,8 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDInceptionV2FeatureExtractor)
self.assertIsNotNone(model._expected_classification_loss_under_sampling)
self.assertEqual(
model._expected_classification_loss_under_sampling.keywords, {
'min_num_negative_samples': 10,
'desired_negative_sampling_ratio': 2
})
self.assertIsNone(model._expected_loss_weights_fn)
def test_create_ssd_inception_v3_model_from_config(self):
......@@ -708,7 +701,6 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
}
}
}
weight_regression_loss_by_score: true
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
......@@ -719,7 +711,6 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsInstance(model._box_predictor,
convolutional_box_predictor.ConvolutionalBoxPredictor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
def test_create_ssd_mobilenet_v2_keras_model_from_config(self):
model_text_proto = """
......@@ -785,7 +776,6 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
}
}
}
weight_regression_loss_by_score: true
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
......@@ -797,7 +787,6 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
model._box_predictor,
convolutional_keras_box_predictor.ConvolutionalBoxPredictor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
def test_create_ssd_mobilenet_v2_fpn_model_from_config(self):
model_text_proto = """
......@@ -1037,7 +1026,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
def test_create_faster_rcnn_resnet_v1_models_from_config(self):
model_text_proto = """
faster_rcnn {
inplace_batchnorm_update: true
inplace_batchnorm_update: false
num_classes: 3
image_resizer {
keep_aspect_ratio_resizer {
......
......@@ -189,11 +189,12 @@ def build(preprocessor_step_config):
if config.HasField('max_image_height'):
max_image_size = (config.max_image_height, config.max_image_width)
pad_color = config.pad_color
if pad_color and len(pad_color) != 3:
raise ValueError('pad_color should have 3 elements (RGB) if set!')
if not pad_color:
pad_color = None
pad_color = config.pad_color or None
if pad_color:
if len(pad_color) == 3:
pad_color = tf.to_float([x for x in config.pad_color])
else:
raise ValueError('pad_color should have 3 elements (RGB) if set!')
return (preprocessor.random_pad_image,
{
'min_image_size': min_image_size,
......
......@@ -241,6 +241,7 @@ class DetectionModel(object):
groundtruth_masks_list=None,
groundtruth_keypoints_list=None,
groundtruth_weights_list=None,
groundtruth_confidences_list=None,
groundtruth_is_crowd_list=None,
is_annotated_list=None):
"""Provide groundtruth tensors.
......@@ -265,6 +266,9 @@ class DetectionModel(object):
missing keypoints should be encoded as NaN.
groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
groundtruth_confidences_list: A list of 2-D tf.float32 tensors of shape
[num_boxes, num_classes] containing class confidences for groundtruth
boxes.
groundtruth_is_crowd_list: A list of 1-D tf.bool tensors of shape
[num_boxes] containing is_crowd annotations
is_annotated_list: A list of scalar tf.bool tensors indicating whether
......@@ -276,6 +280,9 @@ class DetectionModel(object):
if groundtruth_weights_list:
self._groundtruth_lists[fields.BoxListFields.
weights] = groundtruth_weights_list
if groundtruth_confidences_list:
self._groundtruth_lists[fields.BoxListFields.
confidences] = groundtruth_confidences_list
if groundtruth_masks_list:
self._groundtruth_lists[
fields.BoxListFields.masks] = groundtruth_masks_list
......
......@@ -1734,7 +1734,7 @@ class PreprocessorTest(tf.test.TestCase):
}
preprocessor_arg_map = preprocessor.get_default_func_arg_map(
include_label_scores=True,
include_label_weights=True,
include_instance_masks=True)
preprocessing_options = [
......
......@@ -44,6 +44,8 @@ class InputDataFields(object):
groundtruth_image_confidences: image-level class confidences.
groundtruth_boxes: coordinates of the ground truth boxes in the image.
groundtruth_classes: box-level class labels.
groundtruth_confidences: box-level class confidences. The shape should be
the same as the shape of groundtruth_classes.
groundtruth_label_types: box-level label types (e.g. explicit negative).
groundtruth_is_crowd: [DEPRECATED, use groundtruth_group_of instead]
is the groundtruth a single object or a crowd.
......@@ -59,7 +61,7 @@ class InputDataFields(object):
groundtruth_instance_classes: instance mask-level class labels.
groundtruth_keypoints: ground truth keypoints.
groundtruth_keypoint_visibilities: ground truth keypoint visibilities.
groundtruth_label_scores: groundtruth label scores.
groundtruth_label_weights: groundtruth label weights.
groundtruth_weights: groundtruth weight factor for bounding boxes.
num_groundtruth_boxes: number of groundtruth boxes.
is_annotated: whether an image has been labeled or not.
......@@ -91,7 +93,7 @@ class InputDataFields(object):
groundtruth_instance_classes = 'groundtruth_instance_classes'
groundtruth_keypoints = 'groundtruth_keypoints'
groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'
groundtruth_label_scores = 'groundtruth_label_scores'
groundtruth_label_weights = 'groundtruth_label_weights'
groundtruth_weights = 'groundtruth_weights'
num_groundtruth_boxes = 'num_groundtruth_boxes'
is_annotated = 'is_annotated'
......@@ -144,6 +146,7 @@ class BoxListFields(object):
classes = 'classes'
scores = 'scores'
weights = 'weights'
confidences = 'confidences'
objectness = 'objectness'
masks = 'masks'
boundaries = 'boundaries'
......
......@@ -52,8 +52,7 @@ class TargetAssigner(object):
similarity_calc,
matcher,
box_coder,
negative_class_weight=1.0,
weight_regression_loss_by_score=False):
negative_class_weight=1.0):
"""Construct Object Detection Target Assigner.
Args:
......@@ -64,8 +63,6 @@ class TargetAssigner(object):
groundtruth boxes with respect to anchors.
negative_class_weight: classification weight to be associated to negative
anchors (default: 1.0). The weight must be in [0., 1.].
weight_regression_loss_by_score: Whether to weight the regression loss by
ground truth box score.
Raises:
ValueError: if similarity_calc is not a RegionSimilarityCalculator or
......@@ -81,7 +78,6 @@ class TargetAssigner(object):
self._matcher = matcher
self._box_coder = box_coder
self._negative_class_weight = negative_class_weight
self._weight_regression_loss_by_score = weight_regression_loss_by_score
@property
def box_coder(self):
......@@ -170,11 +166,6 @@ class TargetAssigner(object):
num_gt_boxes = groundtruth_boxes.num_boxes()
groundtruth_weights = tf.ones([num_gt_boxes], dtype=tf.float32)
# set scores on the gt boxes
scores = 1 - groundtruth_labels[:, 0]
groundtruth_boxes.add_field(fields.BoxListFields.scores, scores)
with tf.control_dependencies(
[unmatched_shape_assert, labels_and_box_shapes_assert]):
match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,
......@@ -187,12 +178,7 @@ class TargetAssigner(object):
cls_targets = self._create_classification_targets(groundtruth_labels,
unmatched_class_label,
match)
if self._weight_regression_loss_by_score:
reg_weights = self._create_regression_weights(
match, groundtruth_weights * scores)
else:
reg_weights = self._create_regression_weights(match,
groundtruth_weights)
reg_weights = self._create_regression_weights(match, groundtruth_weights)
cls_weights = self._create_classification_weights(match,
groundtruth_weights)
......@@ -503,3 +489,146 @@ def batch_assign_targets(target_assigner,
batch_reg_weights = tf.stack(reg_weights_list)
return (batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, match_list)
def batch_assign_confidences(target_assigner,
anchors_batch,
gt_box_batch,
gt_class_confidences_batch,
gt_weights_batch=None,
unmatched_class_label=None,
include_background_class=True,
implicit_class_weight=1.0):
"""Batched assignment of classification and regression targets.
This differences between batch_assign_confidences and batch_assign_targets:
- 'batch_assign_targets' supports scalar (agnostic), vector (multiclass) and
tensor (high-dimensional) targets. 'batch_assign_confidences' only support
scalar (agnostic) and vector (multiclass) targets.
- 'batch_assign_targets' assumes the input class tensor using the binary
one/K-hot encoding. 'batch_assign_confidences' takes the class confidence
scores as the input, where 1 means positive classes, 0 means implicit
negative classes, and -1 means explicit negative classes.
- 'batch_assign_confidences' assigns the targets in the similar way as
'batch_assign_targets' except that it gives different weights for implicit
and explicit classes. This allows user to control the negative gradients
pushed differently for implicit and explicit examples during the training.
Args:
target_assigner: a target assigner.
anchors_batch: BoxList representing N box anchors or list of BoxList objects
with length batch_size representing anchor sets.
gt_box_batch: a list of BoxList objects with length batch_size
representing groundtruth boxes for each image in the batch
gt_class_confidences_batch: a list of tensors with length batch_size, where
each tensor has shape [num_gt_boxes_i, classification_target_size] and
num_gt_boxes_i is the number of boxes in the ith boxlist of
gt_box_batch. Note that in this tensor, 1 means explicit positive class,
-1 means explicit negative class, and 0 means implicit negative class.
gt_weights_batch: A list of 1-D tf.float32 tensors of shape
[num_gt_boxes_i] containing weights for groundtruth boxes.
unmatched_class_label: a float32 tensor with shape [d_1, d_2, ..., d_k]
which is consistent with the classification target for each
anchor (and can be empty for scalar targets). This shape must thus be
compatible with the groundtruth labels that are passed to the "assign"
function (which have shape [num_gt_boxes, d_1, d_2, ..., d_k]).
include_background_class: whether or not gt_class_confidences_batch includes
the background class.
implicit_class_weight: the weight assigned to implicit examples.
Returns:
batch_cls_targets: a tensor with shape [batch_size, num_anchors,
num_classes],
batch_cls_weights: a tensor with shape [batch_size, num_anchors,
num_classes],
batch_reg_targets: a tensor with shape [batch_size, num_anchors,
box_code_dimension]
batch_reg_weights: a tensor with shape [batch_size, num_anchors],
match_list: a list of matcher.Match objects encoding the match between
anchors and groundtruth boxes for each image of the batch,
with rows of the Match objects corresponding to groundtruth boxes
and columns corresponding to anchors.
Raises:
ValueError: if input list lengths are inconsistent, i.e.,
batch_size == len(gt_box_batch) == len(gt_class_targets_batch)
and batch_size == len(anchors_batch) unless anchors_batch is a single
BoxList, or if any element in gt_class_confidences_batch has rank > 2.
"""
if not isinstance(anchors_batch, list):
anchors_batch = len(gt_box_batch) * [anchors_batch]
if not all(
isinstance(anchors, box_list.BoxList) for anchors in anchors_batch):
raise ValueError('anchors_batch must be a BoxList or list of BoxLists.')
if not (len(anchors_batch)
== len(gt_box_batch)
== len(gt_class_confidences_batch)):
raise ValueError('batch size incompatible with lengths of anchors_batch, '
'gt_box_batch and gt_class_confidences_batch.')
cls_targets_list = []
cls_weights_list = []
reg_targets_list = []
reg_weights_list = []
match_list = []
if gt_weights_batch is None:
gt_weights_batch = [None] * len(gt_class_confidences_batch)
for anchors, gt_boxes, gt_class_confidences, gt_weights in zip(
anchors_batch, gt_box_batch, gt_class_confidences_batch,
gt_weights_batch):
if (gt_class_confidences is not None and
len(gt_class_confidences.get_shape().as_list()) > 2):
raise ValueError('The shape of the class target is not supported. ',
gt_class_confidences.get_shape())
cls_targets, _, reg_targets, _, match = target_assigner.assign(
anchors, gt_boxes, gt_class_confidences, unmatched_class_label,
groundtruth_weights=gt_weights)
if include_background_class:
cls_targets_without_background = tf.slice(
cls_targets, [0, 1], [-1, -1])
else:
cls_targets_without_background = cls_targets
positive_mask = tf.greater(cls_targets_without_background, 0.0)
negative_mask = tf.less(cls_targets_without_background, 0.0)
explicit_example_mask = tf.logical_or(positive_mask, negative_mask)
positive_anchors = tf.reduce_any(positive_mask, axis=-1)
regression_weights = tf.to_float(positive_anchors)
regression_targets = (
reg_targets * tf.expand_dims(regression_weights, axis=-1))
regression_weights_expanded = tf.expand_dims(regression_weights, axis=-1)
cls_targets_without_background = (
cls_targets_without_background * (1 - tf.to_float(negative_mask)))
cls_weights_without_background = (
(1 - implicit_class_weight) * tf.to_float(explicit_example_mask)
+ implicit_class_weight)
if include_background_class:
cls_weights_background = (
(1 - implicit_class_weight) * regression_weights_expanded
+ implicit_class_weight)
classification_weights = tf.concat(
[cls_weights_background, cls_weights_without_background], axis=-1)
cls_targets_background = 1 - regression_weights_expanded
classification_targets = tf.concat(
[cls_targets_background, cls_targets_without_background], axis=-1)
else:
classification_targets = cls_targets_without_background
classification_weights = cls_weights_without_background
cls_targets_list.append(classification_targets)
cls_weights_list.append(classification_weights)
reg_targets_list.append(regression_targets)
reg_weights_list.append(regression_weights)
match_list.append(match)
batch_cls_targets = tf.stack(cls_targets_list)
batch_cls_weights = tf.stack(cls_weights_list)
batch_reg_targets = tf.stack(reg_targets_list)
batch_reg_weights = tf.stack(reg_weights_list)
return (batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, match_list)
......@@ -325,54 +325,6 @@ class TargetAssignerTest(test_case.TestCase):
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_assign_multiclass_with_weight_regression_loss_by_score(self):
def graph_fn(anchor_means, groundtruth_box_corners, groundtruth_labels):
similarity_calc = region_similarity_calculator.IouSimilarity()
matcher = argmax_matcher.ArgMaxMatcher(
matched_threshold=0.5, unmatched_threshold=0.5)
box_coder = mean_stddev_box_coder.MeanStddevBoxCoder(stddev=0.1)
unmatched_class_label = tf.constant([1, 0, 0, 0, 0, 0, 0], tf.float32)
target_assigner = targetassigner.TargetAssigner(
similarity_calc,
matcher,
box_coder,
weight_regression_loss_by_score=True)
anchors_boxlist = box_list.BoxList(anchor_means)
groundtruth_boxlist = box_list.BoxList(groundtruth_box_corners)
result = target_assigner.assign(
anchors_boxlist,
groundtruth_boxlist,
groundtruth_labels,
unmatched_class_label=unmatched_class_label)
(_, cls_weights, _, reg_weights, _) = result
return (cls_weights, reg_weights)
anchor_means = np.array(
[[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 0.8], [0, 0.5, .5, 1.0],
[.75, 0, 1.0, .25]],
dtype=np.float32)
groundtruth_box_corners = np.array(
[[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 0.9, 0.9], [.75, 0, .95, .27]],
dtype=np.float32)
groundtruth_labels = np.array(
[[.9, .1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0],
[.5, 0, 0, .5, 0, 0, 0]],
dtype=np.float32)
exp_cls_weights = [
[1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1]] # background class gets weight of 1.
exp_reg_weights = [.1, 1, 0., .5] # background class gets weight of 0.
(cls_weights_out, reg_weights_out) = self.execute(
graph_fn, [anchor_means, groundtruth_box_corners, groundtruth_labels])
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_assign_multidimensional_class_targets(self):
def graph_fn(anchor_means, groundtruth_box_corners, groundtruth_labels):
......@@ -869,6 +821,321 @@ class BatchTargetAssignerTest(test_case.TestCase):
self.assertAllClose(reg_weights_out, exp_reg_weights)
class BatchTargetAssignConfidencesTest(test_case.TestCase):
def _get_target_assigner(self):
similarity_calc = region_similarity_calculator.IouSimilarity()
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5,
unmatched_threshold=0.5)
box_coder = mean_stddev_box_coder.MeanStddevBoxCoder(stddev=0.1)
return targetassigner.TargetAssigner(similarity_calc, matcher, box_coder)
def test_batch_assign_empty_groundtruth(self):
def graph_fn(anchor_means, groundtruth_box_corners, gt_class_confidences):
groundtruth_boxlist = box_list.BoxList(groundtruth_box_corners)
gt_box_batch = [groundtruth_boxlist]
gt_class_confidences_batch = [gt_class_confidences]
anchors_boxlist = box_list.BoxList(anchor_means)
num_classes = 3
implicit_class_weight = 0.5
unmatched_class_label = tf.constant([1] + num_classes * [0], tf.float32)
multiclass_target_assigner = self._get_target_assigner()
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_confidences(
multiclass_target_assigner,
anchors_boxlist,
gt_box_batch,
gt_class_confidences_batch,
unmatched_class_label=unmatched_class_label,
include_background_class=True,
implicit_class_weight=implicit_class_weight)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_box_corners = np.zeros((0, 4), dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1]], dtype=np.float32)
num_classes = 3
pad = 1
gt_class_confidences = np.zeros((0, num_classes + pad), dtype=np.float32)
exp_cls_targets = [[[1, 0, 0, 0],
[1, 0, 0, 0]]]
exp_cls_weights = [[[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5]]]
exp_reg_targets = [[[0, 0, 0, 0],
[0, 0, 0, 0]]]
exp_reg_weights = [[0, 0]]
(cls_targets_out,
cls_weights_out, reg_targets_out, reg_weights_out) = self.execute(
graph_fn,
[anchor_means, groundtruth_box_corners, gt_class_confidences])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_confidences_agnostic(self):
def graph_fn(anchor_means, groundtruth_boxlist1, groundtruth_boxlist2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_confidences_batch = [None, None]
anchors_boxlist = box_list.BoxList(anchor_means)
agnostic_target_assigner = self._get_target_assigner()
implicit_class_weight = 0.5
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_confidences(
agnostic_target_assigner,
anchors_boxlist,
gt_box_batch,
gt_class_confidences_batch,
include_background_class=False,
implicit_class_weight=implicit_class_weight)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2]], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842]],
dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
exp_cls_targets = [[[1], [0], [0], [0]],
[[0], [1], [1], [0]]]
exp_cls_weights = [[[1], [0.5], [0.5], [0.5]],
[[0.5], [1], [1], [0.5]]]
exp_reg_targets = [[[0, 0, -0.5, -0.5],
[0, 0, 0, 0],
[0, 0, 0, 0,],
[0, 0, 0, 0,],],
[[0, 0, 0, 0,],
[0, 0.01231521, 0, 0],
[0.15789001, -0.01500003, 0.57889998, -1.15799987],
[0, 0, 0, 0]]]
exp_reg_weights = [[1, 0, 0, 0],
[0, 1, 1, 0]]
(cls_targets_out,
cls_weights_out, reg_targets_out, reg_weights_out) = self.execute(
graph_fn, [anchor_means, groundtruth_boxlist1, groundtruth_boxlist2])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_confidences_multiclass(self):
def graph_fn(anchor_means, groundtruth_boxlist1, groundtruth_boxlist2,
class_targets1, class_targets2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_confidences_batch = [class_targets1, class_targets2]
anchors_boxlist = box_list.BoxList(anchor_means)
multiclass_target_assigner = self._get_target_assigner()
num_classes = 3
implicit_class_weight = 0.5
unmatched_class_label = tf.constant([1] + num_classes * [0], tf.float32)
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_confidences(
multiclass_target_assigner,
anchors_boxlist,
gt_box_batch,
gt_class_confidences_batch,
unmatched_class_label=unmatched_class_label,
include_background_class=True,
implicit_class_weight=implicit_class_weight)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2]], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842]],
dtype=np.float32)
class_targets1 = np.array([[0, 1, 0, 0]], dtype=np.float32)
class_targets2 = np.array([[0, 0, 0, 1],
[0, 0, -1, 0]], dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
exp_cls_targets = [[[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]],
[[1, 0, 0, 0],
[0, 0, 0, 1],
[1, 0, 0, 0],
[1, 0, 0, 0]]]
exp_cls_weights = [[[1, 1, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5]],
[[0.5, 0.5, 0.5, 0.5],
[1, 0.5, 0.5, 1],
[0.5, 0.5, 1, 0.5],
[0.5, 0.5, 0.5, 0.5]]]
exp_reg_targets = [[[0, 0, -0.5, -0.5],
[0, 0, 0, 0],
[0, 0, 0, 0,],
[0, 0, 0, 0,],],
[[0, 0, 0, 0,],
[0, 0.01231521, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]]
exp_reg_weights = [[1, 0, 0, 0],
[0, 1, 0, 0]]
(cls_targets_out, cls_weights_out, reg_targets_out,
reg_weights_out) = self.execute(graph_fn, [
anchor_means, groundtruth_boxlist1, groundtruth_boxlist2,
class_targets1, class_targets2
])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_confidences_multiclass_with_padded_groundtruth(self):
def graph_fn(anchor_means, groundtruth_boxlist1, groundtruth_boxlist2,
class_targets1, class_targets2, groundtruth_weights1,
groundtruth_weights2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_confidences_batch = [class_targets1, class_targets2]
gt_weights = [groundtruth_weights1, groundtruth_weights2]
anchors_boxlist = box_list.BoxList(anchor_means)
multiclass_target_assigner = self._get_target_assigner()
num_classes = 3
unmatched_class_label = tf.constant([1] + num_classes * [0], tf.float32)
implicit_class_weight = 0.5
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_confidences(
multiclass_target_assigner,
anchors_boxlist,
gt_box_batch,
gt_class_confidences_batch,
gt_weights,
unmatched_class_label=unmatched_class_label,
include_background_class=True,
implicit_class_weight=implicit_class_weight)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2],
[0., 0., 0., 0.]], dtype=np.float32)
groundtruth_weights1 = np.array([1, 0], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842],
[0, 0, 0, 0]],
dtype=np.float32)
groundtruth_weights2 = np.array([1, 1, 0], dtype=np.float32)
class_targets1 = np.array([[0, 1, 0, 0], [0, 0, 0, 0]], dtype=np.float32)
class_targets2 = np.array([[0, 0, 0, 1],
[0, 0, -1, 0],
[0, 0, 0, 0]], dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
exp_cls_targets = [[[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]],
[[1, 0, 0, 0],
[0, 0, 0, 1],
[1, 0, 0, 0],
[1, 0, 0, 0]]]
exp_cls_weights = [[[1, 1, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5]],
[[0.5, 0.5, 0.5, 0.5],
[1, 0.5, 0.5, 1],
[0.5, 0.5, 1, 0.5],
[0.5, 0.5, 0.5, 0.5]]]
exp_reg_targets = [[[0, 0, -0.5, -0.5],
[0, 0, 0, 0],
[0, 0, 0, 0,],
[0, 0, 0, 0,],],
[[0, 0, 0, 0,],
[0, 0.01231521, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]]
exp_reg_weights = [[1, 0, 0, 0],
[0, 1, 0, 0]]
(cls_targets_out, cls_weights_out, reg_targets_out,
reg_weights_out) = self.execute(graph_fn, [
anchor_means, groundtruth_boxlist1, groundtruth_boxlist2,
class_targets1, class_targets2, groundtruth_weights1,
groundtruth_weights2
])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_confidences_multidimensional(self):
def graph_fn(anchor_means, groundtruth_boxlist1, groundtruth_boxlist2,
class_targets1, class_targets2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_confidences_batch = [class_targets1, class_targets2]
anchors_boxlist = box_list.BoxList(anchor_means)
multiclass_target_assigner = self._get_target_assigner()
target_dimensions = (2, 3)
unmatched_class_label = tf.constant(np.zeros(target_dimensions),
tf.float32)
implicit_class_weight = 0.5
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_confidences(
multiclass_target_assigner,
anchors_boxlist,
gt_box_batch,
gt_class_confidences_batch,
unmatched_class_label=unmatched_class_label,
include_background_class=True,
implicit_class_weight=implicit_class_weight)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2]], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842]],
dtype=np.float32)
class_targets1 = np.array([[0, 1, 0, 0]], dtype=np.float32)
class_targets2 = np.array([[0, 0, 0, 1],
[0, 0, 1, 0]], dtype=np.float32)
class_targets1 = np.array([[[0, 1, 1],
[1, 1, 0]]], dtype=np.float32)
class_targets2 = np.array([[[0, 1, 1],
[1, 1, 0]],
[[0, 0, 1],
[0, 0, 1]]], dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
with self.assertRaises(ValueError):
_, _, _, _ = self.execute(graph_fn, [
anchor_means, groundtruth_boxlist1, groundtruth_boxlist2,
class_targets1, class_targets2
])
class CreateTargetAssignerTest(tf.test.TestCase):
def test_create_target_assigner(self):
......
item {
name: "background"
id: 0
display_name: "background"
}
item {
name: "/m/01g317"
id: 1
display_name: "person"
}
item {
name: "/m/0199g"
id: 2
display_name: "bicycle"
}
item {
name: "/m/0k4j"
id: 3
display_name: "car"
}
item {
name: "/m/04_sv"
id: 4
display_name: "motorcycle"
}
item {
name: "/m/05czz6l"
id: 5
display_name: "airplane"
}
item {
name: "/m/01bjv"
id: 6
display_name: "bus"
}
item {
name: "/m/07jdr"
id: 7
display_name: "train"
}
item {
name: "/m/07r04"
id: 8
display_name: "truck"
}
item {
name: "/m/019jd"
id: 9
display_name: "boat"
}
item {
name: "/m/015qff"
id: 10
display_name: "traffic light"
}
item {
name: "/m/01pns0"
id: 11
display_name: "fire hydrant"
}
item {
name: "12"
id: 12
display_name: "12"
}
item {
name: "/m/02pv19"
id: 13
display_name: "stop sign"
}
item {
name: "/m/015qbp"
id: 14
display_name: "parking meter"
}
item {
name: "/m/0cvnqh"
id: 15
display_name: "bench"
}
item {
name: "/m/015p6"
id: 16
display_name: "bird"
}
item {
name: "/m/01yrx"
id: 17
display_name: "cat"
}
item {
name: "/m/0bt9lr"
id: 18
display_name: "dog"
}
item {
name: "/m/03k3r"
id: 19
display_name: "horse"
}
item {
name: "/m/07bgp"
id: 20
display_name: "sheep"
}
item {
name: "/m/01xq0k1"
id: 21
display_name: "cow"
}
item {
name: "/m/0bwd_0j"
id: 22
display_name: "elephant"
}
item {
name: "/m/01dws"
id: 23
display_name: "bear"
}
item {
name: "/m/0898b"
id: 24
display_name: "zebra"
}
item {
name: "/m/03bk1"
id: 25
display_name: "giraffe"
}
item {
name: "26"
id: 26
display_name: "26"
}
item {
name: "/m/01940j"
id: 27
display_name: "backpack"
}
item {
name: "/m/0hnnb"
id: 28
display_name: "umbrella"
}
item {
name: "29"
id: 29
display_name: "29"
}
item {
name: "30"
id: 30
display_name: "30"
}
item {
name: "/m/080hkjn"
id: 31
display_name: "handbag"
}
item {
name: "/m/01rkbr"
id: 32
display_name: "tie"
}
item {
name: "/m/01s55n"
id: 33
display_name: "suitcase"
}
item {
name: "/m/02wmf"
id: 34
display_name: "frisbee"
}
item {
name: "/m/071p9"
id: 35
display_name: "skis"
}
item {
name: "/m/06__v"
id: 36
display_name: "snowboard"
}
item {
name: "/m/018xm"
id: 37
display_name: "sports ball"
}
item {
name: "/m/02zt3"
id: 38
display_name: "kite"
}
item {
name: "/m/03g8mr"
id: 39
display_name: "baseball bat"
}
item {
name: "/m/03grzl"
id: 40
display_name: "baseball glove"
}
item {
name: "/m/06_fw"
id: 41
display_name: "skateboard"
}
item {
name: "/m/019w40"
id: 42
display_name: "surfboard"
}
item {
name: "/m/0dv9c"
id: 43
display_name: "tennis racket"
}
item {
name: "/m/04dr76w"
id: 44
display_name: "bottle"
}
item {
name: "45"
id: 45
display_name: "45"
}
item {
name: "/m/09tvcd"
id: 46
display_name: "wine glass"
}
item {
name: "/m/08gqpm"
id: 47
display_name: "cup"
}
item {
name: "/m/0dt3t"
id: 48
display_name: "fork"
}
item {
name: "/m/04ctx"
id: 49
display_name: "knife"
}
item {
name: "/m/0cmx8"
id: 50
display_name: "spoon"
}
item {
name: "/m/04kkgm"
id: 51
display_name: "bowl"
}
item {
name: "/m/09qck"
id: 52
display_name: "banana"
}
item {
name: "/m/014j1m"
id: 53
display_name: "apple"
}
item {
name: "/m/0l515"
id: 54
display_name: "sandwich"
}
item {
name: "/m/0cyhj_"
id: 55
display_name: "orange"
}
item {
name: "/m/0hkxq"
id: 56
display_name: "broccoli"
}
item {
name: "/m/0fj52s"
id: 57
display_name: "carrot"
}
item {
name: "/m/01b9xk"
id: 58
display_name: "hot dog"
}
item {
name: "/m/0663v"
id: 59
display_name: "pizza"
}
item {
name: "/m/0jy4k"
id: 60
display_name: "donut"
}
item {
name: "/m/0fszt"
id: 61
display_name: "cake"
}
item {
name: "/m/01mzpv"
id: 62
display_name: "chair"
}
item {
name: "/m/02crq1"
id: 63
display_name: "couch"
}
item {
name: "/m/03fp41"
id: 64
display_name: "potted plant"
}
item {
name: "/m/03ssj5"
id: 65
display_name: "bed"
}
item {
name: "66"
id: 66
display_name: "66"
}
item {
name: "/m/04bcr3"
id: 67
display_name: "dining table"
}
item {
name: "68"
id: 68
display_name: "68"
}
item {
name: "69"
id: 69
display_name: "69"
}
item {
name: "/m/09g1w"
id: 70
display_name: "toilet"
}
item {
name: "71"
id: 71
display_name: "71"
}
item {
name: "/m/07c52"
id: 72
display_name: "tv"
}
item {
name: "/m/01c648"
id: 73
display_name: "laptop"
}
item {
name: "/m/020lf"
id: 74
display_name: "mouse"
}
item {
name: "/m/0qjjc"
id: 75
display_name: "remote"
}
item {
name: "/m/01m2v"
id: 76
display_name: "keyboard"
}
item {
name: "/m/050k8"
id: 77
display_name: "cell phone"
}
item {
name: "/m/0fx9l"
id: 78
display_name: "microwave"
}
item {
name: "/m/029bxz"
id: 79
display_name: "oven"
}
item {
name: "/m/01k6s3"
id: 80
display_name: "toaster"
}
item {
name: "/m/0130jx"
id: 81
display_name: "sink"
}
item {
name: "/m/040b_t"
id: 82
display_name: "refrigerator"
}
item {
name: "83"
id: 83
display_name: "83"
}
item {
name: "/m/0bt_c3"
id: 84
display_name: "book"
}
item {
name: "/m/01x3z"
id: 85
display_name: "clock"
}
item {
name: "/m/02s195"
id: 86
display_name: "vase"
}
item {
name: "/m/01lsmm"
id: 87
display_name: "scissors"
}
item {
name: "/m/0kmg4"
id: 88
display_name: "teddy bear"
}
item {
name: "/m/03wvsk"
id: 89
display_name: "hair drier"
}
item {
name: "/m/012xff"
id: 90
display_name: "toothbrush"
}
......@@ -14,7 +14,7 @@ A couple words of warning:
the container. When running through the tutorial,
**do not close the container**.
2. To be able to deploy the [Android app](
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/examples/android/app)
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/examples/android/app)
(which you will build at the end of the tutorial),
you will need to kill any instances of `adb` running on the host machine. You
can accomplish this by closing all instances of Android Studio, and then
......
......@@ -26,6 +26,7 @@ from object_detection.core import keypoint_ops
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import label_map_util
from object_detection.utils import object_detection_evaluation
from object_detection.utils import ops
from object_detection.utils import shape_utils
from object_detection.utils import visualization_utils as vis_utils
......@@ -40,6 +41,18 @@ EVAL_METRICS_CLASS_DICT = {
coco_evaluation.CocoDetectionEvaluator,
'coco_mask_metrics':
coco_evaluation.CocoMaskEvaluator,
'oid_challenge_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'pascal_voc_detection_metrics':
object_detection_evaluation.PascalDetectionEvaluator,
'weighted_pascal_voc_detection_metrics':
object_detection_evaluation.WeightedPascalDetectionEvaluator,
'pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.PascalInstanceSegmentationEvaluator,
'weighted_pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
'oid_V2_detection_metrics':
object_detection_evaluation.OpenImagesDetectionEvaluator,
}
EVAL_DEFAULT_METRIC = 'coco_detection_metrics'
......@@ -588,8 +601,7 @@ def result_dict_for_single_example(image,
exclude_keys = [
fields.InputDataFields.original_image,
fields.DetectionResultFields.num_detections,
fields.InputDataFields.num_groundtruth_boxes,
fields.InputDataFields.original_image_spatial_shape
fields.InputDataFields.num_groundtruth_boxes
]
output_dict = {
......@@ -611,6 +623,7 @@ def result_dict_for_batched_example(images,
class_agnostic=False,
scale_to_absolute=False,
original_image_spatial_shapes=None,
true_image_shapes=None,
max_gt_boxes=None):
"""Merges all detection and groundtruth information for a single example.
......@@ -646,6 +659,8 @@ def result_dict_for_batched_example(images,
coordinates. Default False.
original_image_spatial_shapes: A 2D int32 tensor of shape [batch_size, 2]
used to resize the image. When set to None, the image size is retained.
true_image_shapes: A 2D int32 tensor of shape [batch_size, 3]
containing the size of the unpadded original_image.
max_gt_boxes: [batch_size] tensor representing the maximum number of
groundtruth boxes to pad.
......@@ -654,6 +669,8 @@ def result_dict_for_batched_example(images,
'original_image': A [batch_size, H, W, C] uint8 image tensor.
'original_image_spatial_shape': A [batch_size, 2] tensor containing the
original image sizes.
'true_image_shape': A [batch_size, 3] tensor containing the size of
the unpadded original_image.
'key': A [batch_size] string tensor with image identifier.
'detection_boxes': [batch_size, max_detections, 4] float32 tensor of boxes,
in normalized or absolute coordinates, depending on the value of
......@@ -681,8 +698,10 @@ def result_dict_for_batched_example(images,
of groundtruth boxes per image.
Raises:
ValueError: if original_image_spatial_shape is not 1D int32 tensor of shape
[2].
ValueError: if original_image_spatial_shape is not 2D int32 tensor of shape
[2].
ValueError: if true_image_shapes is not 2D int32 tensor of shape
[3].
"""
label_id_offset = 1 # Applying label id offset (b/63711816)
......@@ -698,11 +717,25 @@ def result_dict_for_batched_example(images,
'`original_image_spatial_shape` should be a 2D tensor of shape '
'[batch_size, 2].')
if true_image_shapes is None:
true_image_shapes = tf.tile(
tf.expand_dims(tf.shape(images)[1:4], axis=0),
multiples=[tf.shape(images)[0], 1])
else:
if (len(true_image_shapes.shape) != 2
and true_image_shapes.shape[1] != 3):
raise ValueError('`true_image_shapes` should be a 2D tensor of '
'shape [batch_size, 3].')
output_dict = {
input_data_fields.original_image: images,
input_data_fields.key: keys,
input_data_fields.original_image:
images,
input_data_fields.key:
keys,
input_data_fields.original_image_spatial_shape: (
original_image_spatial_shapes)
original_image_spatial_shapes),
input_data_fields.true_image_shape:
true_image_shapes
}
detection_fields = fields.DetectionResultFields
......
......@@ -47,7 +47,7 @@ class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
if batch_size == 1:
key = tf.constant('image1')
else:
key = tf.constant([str(range(batch_size))])
key = tf.constant([str(i) for i in range(batch_size)])
detection_boxes = tf.tile(tf.constant([[[0., 0., 1., 1.]]]),
multiples=[batch_size, 1, 1])
detection_scores = tf.tile(tf.constant([[0.8]]), multiples=[batch_size, 1])
......
......@@ -107,8 +107,14 @@ flags.DEFINE_integer('max_detections', 10,
'Maximum number of detections (boxes) to show.')
flags.DEFINE_integer('max_classes_per_detection', 1,
'Number of classes to display per detection box.')
flags.DEFINE_integer(
'detections_per_class', 100,
'Number of anchors used per class in Regular Non-Max-Suppression.')
flags.DEFINE_bool('add_postprocessing_op', True,
'Add TFLite custom op for postprocessing to the graph.')
flags.DEFINE_bool(
'use_regular_nms', False,
'Flag to set postprocessing op to use Regular NMS instead of Fast NMS.')
flags.DEFINE_string(
'config_override', '', 'pipeline_pb2.TrainEvalPipelineConfig '
'text proto to override pipeline_config_path.')
......@@ -130,7 +136,7 @@ def main(argv):
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config, FLAGS.trained_checkpoint_prefix, FLAGS.output_directory,
FLAGS.add_postprocessing_op, FLAGS.max_detections,
FLAGS.max_classes_per_detection)
FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
if __name__ == '__main__':
......
......@@ -59,9 +59,15 @@ def get_const_center_size_encoded_anchors(anchors):
return encoded_anchors
def append_postprocessing_op(frozen_graph_def, max_detections,
max_classes_per_detection, nms_score_threshold,
nms_iou_threshold, num_classes, scale_values):
def append_postprocessing_op(frozen_graph_def,
max_detections,
max_classes_per_detection,
nms_score_threshold,
nms_iou_threshold,
num_classes,
scale_values,
detections_per_class=100,
use_regular_nms=False):
"""Appends postprocessing custom op.
Args:
......@@ -77,6 +83,10 @@ def append_postprocessing_op(frozen_graph_def, max_detections,
scale_values: scale values is a dict with following key-value pairs
{y_scale: 10, x_scale: 10, h_scale: 5, w_scale: 5} that are used in decode
centersize boxes
detections_per_class: In regular NonMaxSuppression, number of anchors used
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead
of Fast NMS.
Returns:
transformed_graph_def: Frozen GraphDef with postprocessing custom op
......@@ -121,6 +131,10 @@ def append_postprocessing_op(frozen_graph_def, max_detections,
attr_value_pb2.AttrValue(f=scale_values['h_scale'].pop()))
new_output.attr['w_scale'].CopyFrom(
attr_value_pb2.AttrValue(f=scale_values['w_scale'].pop()))
new_output.attr['detections_per_class'].CopyFrom(
attr_value_pb2.AttrValue(i=detections_per_class))
new_output.attr['use_regular_nms'].CopyFrom(
attr_value_pb2.AttrValue(b=use_regular_nms))
new_output.input.extend(
['raw_outputs/box_encodings', 'raw_outputs/class_predictions', 'anchors'])
......@@ -133,9 +147,14 @@ def append_postprocessing_op(frozen_graph_def, max_detections,
return transformed_graph_def
def export_tflite_graph(pipeline_config, trained_checkpoint_prefix, output_dir,
add_postprocessing_op, max_detections,
max_classes_per_detection):
def export_tflite_graph(pipeline_config,
trained_checkpoint_prefix,
output_dir,
add_postprocessing_op,
max_detections,
max_classes_per_detection,
detections_per_class=100,
use_regular_nms=False):
"""Exports a tflite compatible graph and anchors for ssd detection model.
Anchors are written to a tensor and tflite compatible graph
......@@ -151,7 +170,10 @@ def export_tflite_graph(pipeline_config, trained_checkpoint_prefix, output_dir,
TFLite_Detection_PostProcess custom op
max_detections: Maximum number of detections (boxes) to show
max_classes_per_detection: Number of classes to display per detection
detections_per_class: In regular NonMaxSuppression, number of anchors used
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead
of Fast NMS.
Raises:
ValueError: if the pipeline config contains models other than ssd or uses an
......@@ -276,7 +298,8 @@ def export_tflite_graph(pipeline_config, trained_checkpoint_prefix, output_dir,
if add_postprocessing_op:
transformed_graph_def = append_postprocessing_op(
frozen_graph_def, max_detections, max_classes_per_detection,
nms_score_threshold, nms_iou_threshold, num_classes, scale_values)
nms_score_threshold, nms_iou_threshold, num_classes, scale_values,
detections_per_class, use_regular_nms)
else:
# Return frozen without adding post-processing custom op
transformed_graph_def = frozen_graph_def
......
......@@ -195,6 +195,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
'detection_classes': [batch, max_detections]
'detection_masks': [batch, max_detections, mask_height, mask_width]
(optional).
'detection_keypoints': [batch, max_detections, num_keypoints, 2]
(optional).
'num_detections': [batch]
output_collection_name: Name of collection to add output tensors to.
......
......@@ -83,7 +83,7 @@ tf_record_input_reader: { input_path: '${INPUT_TFRECORDS_WITH_DETECTIONS}' }
" > ${OUTPUT_CONFIG_DIR}/input_config.pbtxt
echo "
metrics_set: 'oid_challenge_object_detection_metrics'
metrics_set: 'oid_challenge_detection_metrics'
" > ${OUTPUT_CONFIG_DIR}/eval_config.pbtxt
OUTPUT_METRICS_DIR=/path/to/metrics_csv
......
......@@ -108,11 +108,13 @@ Model name
## Open Images-trained models
Model name | Speed (ms) | Open Images mAP@0.5[^2] | Outputs
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_inception_resnet_v2_atrous_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28.tar.gz) | 727 | 37 | Boxes
[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz) | 347 | | Boxes
[facessd_mobilenet_v2_quantized_open_image_v4](http://download.tensorflow.org/models/object_detection/facessd_mobilenet_v2_quantized_320x320_open_image_v4.tar.gz) [^3] | 20 | 73 (faces) | Boxes
Model name | Speed (ms) | Open Images mAP@0.5[^2] | Outputs
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------: | :---------------------: | :-----:
[faster_rcnn_inception_resnet_v2_atrous_oidv2](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28.tar.gz) | 727 | 37 | Boxes
[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oidv2](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz) | 347 | | Boxes
[faster_rcnn_inception_resnet_v2_atrous_oidv4](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oidv4_2018_10_30.tar.gz) | 455 | | Boxes
[ssd_mobilenetv2_oidv4](http://download.tensorflow.org/models/object_detection/ssd_mobilenetv2_oidv4_2018_10_30.tar.gz) | 24 | | Boxes
[facessd_mobilenet_v2_quantized_open_image_v4](http://download.tensorflow.org/models/object_detection/facessd_mobilenet_v2_quantized_320x320_open_image_v4.tar.gz) [^3] | 20 | 73 (faces) | Boxes
## iNaturalist Species-trained models
......
......@@ -65,7 +65,7 @@ intersection over union based on the object masks instead of object boxes.
## Open Images V2 detection metric
`EvalConfig.metrics_set='open_images_V2_detection_metrics'`
`EvalConfig.metrics_set='oid_V2_detection_metrics'`
This metric is defined originally for evaluating detector performance on [Open
Images V2 dataset](https://github.com/openimages/dataset) and is fairly similar
......@@ -132,14 +132,20 @@ convention, the evaluation software treats all classes independently, ignoring
the hierarchy. To achieve high performance values, object detectors should
output bounding-boxes labelled in the same manner.
The old metric name is DEPRECATED.
`EvalConfig.metrics_set='open_images_V2_detection_metrics'`
## OID Challenge Object Detection Metric 2018
`EvalConfig.metrics_set='oid_challenge_object_detection_metrics'`
`EvalConfig.metrics_set='oid_challenge_detection_metrics'`
The metric for the OID Challenge Object Detection Metric 2018, Object Detection
track. The description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html).
The old metric name is DEPRECATED.
`EvalConfig.metrics_set='oid_challenge_object_detection_metrics'`
## OID Challenge Visual Relationship Detection Metric 2018
The metric for the OID Challenge Visual Relationship Detection Metric 2018, Visual
......
......@@ -216,7 +216,7 @@ tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord@${NUM_SHARDS
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
echo "
metrics_set: 'open_images_V2_detection_metrics'
metrics_set: 'oid_V2_detection_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt
```
......
......@@ -49,14 +49,14 @@ will output the frozen graph that we can input to TensorFlow Lite directly and
is the one we’ll be using.
Next we’ll use TensorFlow Lite to get the optimized model by using
[TOCO](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/toco),
[TOCO](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/toco),
the TensorFlow Lite Optimizing Converter. This will convert the resulting frozen
graph (tflite_graph.pb) to the TensorFlow Lite flatbuffer format (detect.tflite)
via the following command. For a quantized model, run this from the tensorflow/
directory:
```shell
bazel run --config=opt tensorflow/contrib/lite/toco:toco -- \
bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
......@@ -75,14 +75,14 @@ are named 'TFLite_Detection_PostProcess', 'TFLite_Detection_PostProcess:1',
'TFLite_Detection_PostProcess:2', and 'TFLite_Detection_PostProcess:3' and
represent four arrays: detection_boxes, detection_classes, detection_scores, and
num_detections. The documentation for other flags used in this command is
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/cmdline_reference.md).
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/toco/g3doc/cmdline_reference.md).
If things ran successfully, you should now see a third file in the /tmp/tflite
directory called detect.tflite. This file contains the graph and all model
parameters and can be run via the TensorFlow Lite interpreter on the Android
device. For a floating point model, run this from the tensorflow/ directory:
```shell
bazel run -c opt tensorflow/lite/toco:toco -- \
bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
......@@ -105,7 +105,7 @@ Studio](https://developer.android.com/studio/index.html). To build the
TensorFlow Lite Android demo, build tools require API >= 23 (but it will run on
devices with API >= 21). Additional details are available on the [TensorFlow
Lite Android App
page](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/java/demo/README.md).
page](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/java/demo/README.md).
Next we need to point the app to our new detect.tflite file and give it the
names of our new labels. Specifically, we will copy our TensorFlow Lite
......@@ -113,24 +113,24 @@ flatbuffer to the app assets directory with the following command:
```shell
cp /tmp/tflite/detect.tflite \
//tensorflow/contrib/lite/examples/android/app/src/main/assets
//tensorflow/lite/examples/android/app/src/main/assets
```
You will also need to copy your new labelmap labels_list.txt to the assets
directory.
We will now edit the BUILD file to point to this new model. First, open the
BUILD file tensorflow/contrib/lite/examples/android/BUILD. Then find the assets
BUILD file tensorflow/lite/examples/android/BUILD. Then find the assets
section, and replace the line “@tflite_mobilenet_ssd_quant//:detect.tflite”
(which by default points to a COCO pretrained model) with the path to your new
TFLite model
“//tensorflow/contrib/lite/examples/android/app/src/main/assets:detect.tflite”.
“//tensorflow/lite/examples/android/app/src/main/assets:detect.tflite”.
Finally, change the last line in assets section to use the new label map as
well.
We will also need to tell our app to use the new label map. In order to do this,
open up the
tensorflow/contrib/lite/examples/android/app/src/main/java/org/tensorflow/demo/DetectorActivity.java
tensorflow/lite/examples/android/app/src/main/java/org/tensorflow/demo/DetectorActivity.java
file in a text editor and find the definition of TF_OD_API_LABELS_FILE. Update
this path to point to your new label map file:
"file:///android_asset/labels_list.txt". Note that if your model is quantized,
......@@ -150,7 +150,7 @@ from the tensorflow directory:
```shell
bazel build -c opt --config=android_arm{,64} --cxxopt='--std=c++11'
"//tensorflow/contrib/lite/examples/android:tflite_demo"
"//tensorflow/lite/examples/android:tflite_demo"
```
Now install the demo on a
......@@ -159,5 +159,5 @@ Android phone via [Android Debug
Bridge](https://developer.android.com/studio/command-line/adb) (adb):
```shell
adb install bazel-bin/tensorflow/contrib/lite/examples/android/tflite_demo.apk
adb install bazel-bin/tensorflow/lite/examples/android/tflite_demo.apk
```
......@@ -139,12 +139,10 @@ def transform_input_data(tensor_dict,
if fields.InputDataFields.groundtruth_confidences in tensor_dict:
groundtruth_confidences = tensor_dict[
fields.InputDataFields.groundtruth_confidences]
# Map the confidences to the one-hot encoding of classes
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
tf.sparse_to_dense(
zero_indexed_groundtruth_classes,
[num_classes],
groundtruth_confidences,
validate_indices=False))
tf.reshape(groundtruth_confidences, [-1, 1]) *
tensor_dict[fields.InputDataFields.groundtruth_classes])
else:
groundtruth_confidences = tf.ones_like(
zero_indexed_groundtruth_classes, dtype=tf.float32)
......@@ -200,10 +198,14 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
if fields.InputDataFields.image_additional_channels in tensor_dict:
num_additional_channels = tensor_dict[
fields.InputDataFields.image_additional_channels].shape[2].value
num_image_channels = 3
if fields.InputDataFields.image in tensor_dict:
num_image_channels = tensor_dict[fields.InputDataFields
.image].shape[2].value
padding_shapes = {
# Additional channels are merged before batching.
fields.InputDataFields.image: [
height, width, 3 + num_additional_channels
height, width, num_image_channels + num_additional_channels
],
fields.InputDataFields.original_image_spatial_shape: [2],
fields.InputDataFields.image_additional_channels: [
......@@ -215,8 +217,6 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
fields.InputDataFields.groundtruth_classes: [max_num_boxes, num_classes],
fields.InputDataFields.groundtruth_confidences: [
max_num_boxes, num_classes],
fields.InputDataFields.groundtruth_instance_masks: [
max_num_boxes, height, width
],
......@@ -224,9 +224,12 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
fields.InputDataFields.groundtruth_group_of: [max_num_boxes],
fields.InputDataFields.groundtruth_area: [max_num_boxes],
fields.InputDataFields.groundtruth_weights: [max_num_boxes],
fields.InputDataFields.groundtruth_confidences: [
max_num_boxes, num_classes
],
fields.InputDataFields.num_groundtruth_boxes: [],
fields.InputDataFields.groundtruth_label_types: [max_num_boxes],
fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
fields.InputDataFields.groundtruth_label_weights: [max_num_boxes],
fields.InputDataFields.true_image_shape: [3],
fields.InputDataFields.multiclass_scores: [
max_num_boxes, num_classes + 1 if num_classes is not None else None
......@@ -237,7 +240,7 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
if fields.InputDataFields.original_image in tensor_dict:
padding_shapes[fields.InputDataFields.original_image] = [
height, width, 3 + num_additional_channels
height, width, num_image_channels + num_additional_channels
]
if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
tensor_shape = (
......@@ -287,9 +290,15 @@ def augment_input_data(tensor_dict, data_augmentation_options):
in tensor_dict)
include_keypoints = (fields.InputDataFields.groundtruth_keypoints
in tensor_dict)
include_label_weights = (fields.InputDataFields.groundtruth_weights
in tensor_dict)
include_label_confidences = (fields.InputDataFields.groundtruth_confidences
in tensor_dict)
tensor_dict = preprocessor.preprocess(
tensor_dict, data_augmentation_options,
func_arg_map=preprocessor.get_default_func_arg_map(
include_label_weights=include_label_weights,
include_label_confidences=include_label_confidences,
include_instance_masks=include_instance_masks,
include_keypoints=include_keypoints))
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
......@@ -303,7 +312,7 @@ def _get_labels_dict(input_dict):
fields.InputDataFields.num_groundtruth_boxes,
fields.InputDataFields.groundtruth_boxes,
fields.InputDataFields.groundtruth_classes,
fields.InputDataFields.groundtruth_weights
fields.InputDataFields.groundtruth_weights,
]
labels_dict = {}
for key in required_label_keys:
......
......@@ -93,17 +93,17 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[1, 100, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_confidences].shape.as_list())
self.assertEqual(
tf.float32,
labels[fields.InputDataFields.groundtruth_confidences].dtype)
self.assertAllEqual(
[1, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
@parameterized.parameters(
{'eval_batch_size': 1},
......@@ -141,11 +141,11 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[eval_batch_size, 100, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_confidences].shape.as_list())
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(
tf.float32,
labels[fields.InputDataFields.groundtruth_confidences].dtype)
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
......@@ -194,17 +194,12 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[batch_size, 100, model_config.ssd.num_classes],
[batch_size, 100],
labels[
fields.InputDataFields.groundtruth_confidences].shape.as_list())
fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(
tf.float32,
labels[fields.InputDataFields.groundtruth_confidences].dtype)
self.assertAllEqual(
[batch_size, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
labels[fields.InputDataFields.groundtruth_weights].dtype)
@parameterized.parameters(
{'eval_batch_size': 1},
......@@ -242,12 +237,12 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[eval_batch_size, 100, model_config.ssd.num_classes],
[eval_batch_size, 100],
labels[
fields.InputDataFields.groundtruth_confidences].shape.as_list())
fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(
tf.float32,
labels[fields.InputDataFields.groundtruth_confidences].dtype)
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
......@@ -447,7 +442,7 @@ class DataAugmentationFnTest(test_case.TestCase):
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1.0], np.float32)),
fields.InputDataFields.groundtruth_confidences:
fields.InputDataFields.groundtruth_weights:
tf.constant(np.array([0.8], np.float32)),
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
......@@ -468,7 +463,7 @@ class DataAugmentationFnTest(test_case.TestCase):
)
self.assertAllClose(
augmented_tensor_dict_out[
fields.InputDataFields.groundtruth_confidences],
fields.InputDataFields.groundtruth_weights],
[0.8]
)
......@@ -634,6 +629,34 @@ class DataTransformationFnTest(test_case.TestCase):
transformed_inputs[fields.InputDataFields.num_groundtruth_boxes],
1)
def test_returns_correct_groundtruth_confidences_when_input_present(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[0, 0, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32)),
fields.InputDataFields.groundtruth_confidences:
tf.constant(np.array([1.0, -1.0], np.float32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[0, 0, 1], [1, 0, 0]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_confidences],
[[0, 0, 1], [-1, 0, 0]])
def test_returns_resized_masks(self):
tensor_dict = {
fields.InputDataFields.image:
......@@ -879,6 +902,41 @@ class PadInputDataToStaticShapesFnTest(test_case.TestCase):
padded_tensor_dict[fields.InputDataFields.image_additional_channels]
.shape.as_list(), [5, 6, 2])
def test_gray_images(self):
input_tensor_dict = {
fields.InputDataFields.image:
tf.placeholder(tf.float32, [None, None, 1]),
}
padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
tensor_dict=input_tensor_dict,
max_num_boxes=3,
num_classes=3,
spatial_image_shape=[5, 6])
self.assertAllEqual(
padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
[5, 6, 1])
def test_gray_images_and_additional_channels(self):
input_tensor_dict = {
fields.InputDataFields.image:
tf.placeholder(tf.float32, [None, None, 1]),
fields.InputDataFields.image_additional_channels:
tf.placeholder(tf.float32, [None, None, 2]),
}
padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
tensor_dict=input_tensor_dict,
max_num_boxes=3,
num_classes=3,
spatial_image_shape=[5, 6])
self.assertAllEqual(
padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
[5, 6, 3])
self.assertAllEqual(
padded_tensor_dict[fields.InputDataFields.image_additional_channels]
.shape.as_list(), [5, 6, 2])
def test_keypoints(self):
input_tensor_dict = {
fields.InputDataFields.groundtruth_keypoints:
......
......@@ -39,12 +39,18 @@ EVAL_METRICS_CLASS_DICT = {
object_detection_evaluation.PascalInstanceSegmentationEvaluator,
'weighted_pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
'oid_V2_detection_metrics':
object_detection_evaluation.OpenImagesDetectionEvaluator,
# DEPRECATED: please use oid_V2_detection_metrics instead
'open_images_V2_detection_metrics':
object_detection_evaluation.OpenImagesDetectionEvaluator,
'coco_detection_metrics':
coco_evaluation.CocoDetectionEvaluator,
'coco_mask_metrics':
coco_evaluation.CocoMaskEvaluator,
'oid_challenge_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
# DEPRECATED: please use oid_challenge_detection_metrics instead
'oid_challenge_object_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
}
......@@ -146,6 +152,16 @@ def get_evaluators(eval_config, categories):
for eval_metric_fn_key in eval_metric_fn_keys:
if eval_metric_fn_key not in EVAL_METRICS_CLASS_DICT:
raise ValueError('Metric not found: {}'.format(eval_metric_fn_key))
if eval_metric_fn_key == 'oid_challenge_object_detection_metrics':
logging.warning(
'oid_challenge_object_detection_metrics is deprecated; '
'use oid_challenge_detection_metrics instead'
)
if eval_metric_fn_key == 'oid_V2_detection_metrics':
logging.warning(
'open_images_V2_detection_metrics is deprecated; '
'use oid_V2_detection_metrics instead'
)
evaluators_list.append(
EVAL_METRICS_CLASS_DICT[eval_metric_fn_key](categories=categories))
return evaluators_list
......
......@@ -75,6 +75,7 @@ def create_input_queue(batch_size_per_clone, create_tensor_dict_fn,
tensor_dict = preprocessor.preprocess(
tensor_dict, data_augmentation_options,
func_arg_map=preprocessor.get_default_func_arg_map(
include_label_weights=True,
include_multiclass_scores=include_multiclass_scores,
include_instance_masks=include_instance_masks,
include_keypoints=include_keypoints))
......
......@@ -1568,7 +1568,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
A dictionary containing:
`detection_boxes`: [batch, max_detection, 4]
`detection_boxes`: [batch, max_detection, 4] in normalized co-ordinates.
`detection_scores`: [batch, max_detections]
`detection_classes`: [batch, max_detections]
`num_detections`: [batch]
......@@ -1701,14 +1701,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
prediction_dict['refined_box_encodings'],
prediction_dict['class_predictions_with_background'],
prediction_dict['proposal_boxes'],
prediction_dict['num_proposals'],
groundtruth_boxlists,
prediction_dict['num_proposals'], groundtruth_boxlists,
groundtruth_classes_with_background_list,
groundtruth_weights_list,
prediction_dict['image_shape'],
prediction_dict.get('mask_predictions'),
groundtruth_masks_list,
))
groundtruth_weights_list, prediction_dict['image_shape'],
prediction_dict.get('mask_predictions'), groundtruth_masks_list,
prediction_dict.get(
fields.DetectionResultFields.detection_boxes),
prediction_dict.get(
fields.DetectionResultFields.num_detections)))
return loss_dict
def _loss_rpn(self, rpn_box_encodings,
......@@ -1811,7 +1811,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
groundtruth_weights_list,
image_shape,
prediction_masks=None,
groundtruth_masks_list=None):
groundtruth_masks_list=None,
detection_boxes=None,
num_detections=None):
"""Computes scalar box classifier loss tensors.
Uses self._detector_target_assigner to obtain regression and classification
......@@ -1854,6 +1856,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
groundtruth_masks_list: an optional list of 3-D tensors of shape
[num_boxes, image_height, image_width] containing the instance masks for
each of the boxes.
detection_boxes: 3-D float tensor of shape [batch,
max_total_detections, 4] containing post-processed detection boxes in
normalized co-ordinates.
num_detections: 1-D int32 tensor of shape [batch] containing number of
valid detections in `detection_boxes`.
Returns:
a dictionary mapping loss keys ('second_stage_localization_loss',
......@@ -1867,7 +1874,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
"""
with tf.name_scope('BoxClassifierLoss'):
paddings_indicator = self._padded_batched_proposals_indicator(
num_proposals, self.max_num_proposals)
num_proposals, proposal_boxes.shape[1])
proposal_boxlists = [
box_list.BoxList(proposal_boxes_single_image)
for proposal_boxes_single_image in tf.unstack(proposal_boxes)]
......@@ -1958,6 +1965,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
raise ValueError('Groundtruth instance masks not provided. '
'Please configure input reader.')
if not self._is_training:
(proposal_boxes, proposal_boxlists, paddings_indicator,
one_hot_flat_cls_targets_with_background
) = self._get_mask_proposal_boxes_and_classes(
detection_boxes, num_detections, image_shape,
groundtruth_boxlists, groundtruth_classes_with_background_list,
groundtruth_weights_list)
unmatched_mask_label = tf.zeros(image_shape[1:3], dtype=tf.float32)
(batch_mask_targets, _, _, batch_mask_target_weights,
_) = target_assigner.batch_assign_targets(
......@@ -2031,6 +2045,64 @@ class FasterRCNNMetaArch(model.DetectionModel):
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _get_mask_proposal_boxes_and_classes(
self, detection_boxes, num_detections, image_shape, groundtruth_boxlists,
groundtruth_classes_with_background_list, groundtruth_weights_list):
"""Returns proposal boxes and class targets to compute evaluation mask loss.
During evaluation, detection boxes are used to extract features for mask
prediction. Therefore, to compute mask loss during evaluation detection
boxes must be used to compute correct class and mask targets. This function
returns boxes and classes in the correct format for computing mask targets
during evaluation.
Args:
detection_boxes: A 3-D float tensor of shape [batch, max_detection_boxes,
4] containing detection boxes in normalized co-ordinates.
num_detections: A 1-D float tensor of shape [batch] containing number of
valid boxes in `detection_boxes`.
image_shape: A 1-D tensor of shape [4] containing image tensor shape.
groundtruth_boxlists: A list of groundtruth boxlists.
groundtruth_classes_with_background_list: A list of groundtruth classes.
groundtruth_weights_list: A list of groundtruth weights.
Return:
mask_proposal_boxes: detection boxes to use for mask proposals in absolute
co-ordinates.
mask_proposal_boxlists: `mask_proposal_boxes` in a list of BoxLists in
absolute co-ordinates.
mask_proposal_paddings_indicator: a tensor indicating valid boxes.
mask_proposal_one_hot_flat_cls_targets_with_background: Class targets
computed using detection boxes.
"""
batch, max_num_detections, _ = detection_boxes.shape.as_list()
proposal_boxes = tf.reshape(box_list_ops.to_absolute_coordinates(
box_list.BoxList(tf.reshape(detection_boxes, [-1, 4])), image_shape[1],
image_shape[2]).get(), [batch, max_num_detections, 4])
proposal_boxlists = [
box_list.BoxList(detection_boxes_single_image)
for detection_boxes_single_image in tf.unstack(proposal_boxes)
]
paddings_indicator = self._padded_batched_proposals_indicator(
tf.to_int32(num_detections), detection_boxes.shape[1])
(batch_cls_targets_with_background, _, _, _,
_) = target_assigner.batch_assign_targets(
target_assigner=self._detector_target_assigner,
anchors_batch=proposal_boxlists,
gt_box_batch=groundtruth_boxlists,
gt_class_targets_batch=groundtruth_classes_with_background_list,
unmatched_class_label=tf.constant(
[1] + self._num_classes * [0], dtype=tf.float32),
gt_weights_batch=groundtruth_weights_list)
flat_cls_targets_with_background = tf.reshape(
batch_cls_targets_with_background, [-1, self._num_classes + 1])
one_hot_flat_cls_targets_with_background = tf.argmax(
flat_cls_targets_with_background, axis=1)
one_hot_flat_cls_targets_with_background = tf.one_hot(
one_hot_flat_cls_targets_with_background,
flat_cls_targets_with_background.get_shape()[1])
return (proposal_boxes, proposal_boxlists, paddings_indicator,
one_hot_flat_cls_targets_with_background)
def _get_refined_encodings_for_postitive_class(
self, refined_box_encodings, flat_cls_targets_with_background,
batch_size):
......@@ -2185,4 +2257,3 @@ class FasterRCNNMetaArch(model.DetectionModel):
A list of update operators.
"""
return tf.get_collection(tf.GraphKeys.UPDATE_OPS)
......@@ -281,8 +281,12 @@ class SSDMetaArch(model.DetectionModel):
freeze_batchnorm=False,
inplace_batchnorm_update=False,
add_background_class=True,
explicit_background_class=False,
random_example_sampler=None,
expected_classification_loss_under_sampling=None):
expected_loss_weights_fn=None,
use_confidences_as_targets=False,
implicit_example_weight=0.5,
equalization_loss_config=None):
"""SSDMetaArch Constructor.
TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
......@@ -335,17 +339,29 @@ class SSDMetaArch(model.DetectionModel):
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
add_background_class: Whether to add an implicit background class to
one-hot encodings of groundtruth labels. Set to false if using
groundtruth labels with an explicit background class or using multiclass
scores instead of truth in the case of distillation.
one-hot encodings of groundtruth labels. Set to false if training a
single class model or using groundtruth labels with an explicit
background class.
explicit_background_class: Set to true if using groundtruth labels with an
explicit background class, as in multiclass scores.
random_example_sampler: a BalancedPositiveNegativeSampler object that can
perform random example sampling when computing loss. If None, random
sampling process is skipped. Note that random example sampler and hard
example miner can both be applied to the model. In that case, random
sampler will take effect first and hard example miner can only process
the random sampled examples.
expected_classification_loss_under_sampling: If not None, use
to calcualte classification loss by background/foreground weighting.
expected_loss_weights_fn: If not None, use to calculate
loss by background/foreground weighting. Should take batch_cls_targets
as inputs and return foreground_weights, background_weights. See
expected_classification_loss_by_expected_sampling and
expected_classification_loss_by_reweighting_unmatched_anchors in
third_party/tensorflow_models/object_detection/utils/ops.py as examples.
use_confidences_as_targets: Whether to use groundtruth_condifences field
to assign the targets.
implicit_example_weight: a float number that specifies the weight used
for the implicit negative examples.
equalization_loss_config: a namedtuple that specifies configs for
computing equalization loss.
"""
super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)
self._is_training = is_training
......@@ -358,6 +374,11 @@ class SSDMetaArch(model.DetectionModel):
self._box_coder = box_coder
self._feature_extractor = feature_extractor
self._add_background_class = add_background_class
self._explicit_background_class = explicit_background_class
if add_background_class and explicit_background_class:
raise ValueError("Cannot have both 'add_background_class' and"
" 'explicit_background_class' true.")
# Needed for fine-tuning from classification checkpoints whose
# variables do not have the feature extractor scope.
......@@ -370,15 +391,18 @@ class SSDMetaArch(model.DetectionModel):
# Slim feature extractors get an explicit naming scope
self._extract_features_scope = 'FeatureExtractor'
if self._add_background_class and encode_background_as_zeros:
self._unmatched_class_label = tf.constant((self.num_classes + 1) * [0],
tf.float32)
elif self._add_background_class:
self._unmatched_class_label = tf.constant([1] + self.num_classes * [0],
tf.float32)
if encode_background_as_zeros:
background_class = [0]
else:
background_class = [1]
if self._add_background_class:
num_foreground_classes = self.num_classes
else:
self._unmatched_class_label = tf.constant(self.num_classes * [0],
tf.float32)
num_foreground_classes = self.num_classes - 1
self._unmatched_class_label = tf.constant(
background_class + num_foreground_classes * [0], tf.float32)
self._target_assigner = target_assigner_instance
......@@ -399,8 +423,11 @@ class SSDMetaArch(model.DetectionModel):
self._anchors = None
self._add_summaries = add_summaries
self._batched_prediction_tensor_names = []
self._expected_classification_loss_under_sampling = (
expected_classification_loss_under_sampling)
self._expected_loss_weights_fn = expected_loss_weights_fn
self._use_confidences_as_targets = use_confidences_as_targets
self._implicit_example_weight = implicit_example_weight
self._equalization_loss_config = equalization_loss_config
@property
def anchors(self):
......@@ -647,7 +674,7 @@ class SSDMetaArch(model.DetectionModel):
detection_scores = self._score_conversion_fn(class_predictions)
detection_scores = tf.identity(detection_scores, 'raw_box_scores')
if self._add_background_class:
if self._add_background_class or self._explicit_background_class:
detection_scores = tf.slice(detection_scores, [0, 0, 1], [-1, -1, -1])
additional_fields = None
......@@ -720,11 +747,14 @@ class SSDMetaArch(model.DetectionModel):
weights = None
if self.groundtruth_has_field(fields.BoxListFields.weights):
weights = self.groundtruth_lists(fields.BoxListFields.weights)
confidences = None
if self.groundtruth_has_field(fields.BoxListFields.confidences):
confidences = self.groundtruth_lists(fields.BoxListFields.confidences)
(batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, match_list) = self._assign_targets(
self.groundtruth_lists(fields.BoxListFields.boxes),
self.groundtruth_lists(fields.BoxListFields.classes),
keypoints, weights)
keypoints, weights, confidences)
if self._add_summaries:
self._summarize_target_assignment(
self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
......@@ -762,7 +792,7 @@ class SSDMetaArch(model.DetectionModel):
weights=batch_cls_weights,
losses_mask=losses_mask)
if self._expected_classification_loss_under_sampling:
if self._expected_loss_weights_fn:
# Need to compute losses for assigned targets against the
# unmatched_class_label as well as their assigned targets.
# simplest thing (but wasteful) is just to calculate all losses
......@@ -787,8 +817,16 @@ class SSDMetaArch(model.DetectionModel):
batch_cls_targets = tf.concat(
[1 - batch_cls_targets, batch_cls_targets], axis=-1)
cls_losses = self._expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses)
location_losses = tf.tile(location_losses, [1, num_classes])
foreground_weights, background_weights = (
self._expected_loss_weights_fn(batch_cls_targets))
cls_losses = (
foreground_weights * cls_losses +
background_weights * unmatched_cls_losses)
location_losses *= foreground_weights
classification_loss = tf.reduce_sum(cls_losses)
localization_loss = tf.reduce_sum(location_losses)
......@@ -824,6 +862,8 @@ class SSDMetaArch(model.DetectionModel):
str(localization_loss.op.name): localization_loss,
str(classification_loss.op.name): classification_loss
}
return loss_dict
def _minibatch_subsample_fn(self, inputs):
......@@ -864,9 +904,12 @@ class SSDMetaArch(model.DetectionModel):
visualization_utils.add_cdf_image_summary(negative_anchor_cls_loss,
'NegativeAnchorLossCDF')
def _assign_targets(self, groundtruth_boxes_list, groundtruth_classes_list,
def _assign_targets(self,
groundtruth_boxes_list,
groundtruth_classes_list,
groundtruth_keypoints_list=None,
groundtruth_weights_list=None):
groundtruth_weights_list=None,
groundtruth_confidences_list=None):
"""Assign groundtruth targets.
Adds a background class to each one-hot encoding of groundtruth classes
......@@ -885,6 +928,9 @@ class SSDMetaArch(model.DetectionModel):
[num_boxes, num_keypoints, 2]
groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
groundtruth_confidences_list: A list of 2-D tf.float32 tensors of shape
[num_boxes, num_classes] containing class confidences for
groundtruth boxes.
Returns:
batch_cls_targets: a tensor with shape [batch_size, num_anchors,
......@@ -901,11 +947,18 @@ class SSDMetaArch(model.DetectionModel):
groundtruth_boxlists = [
box_list.BoxList(boxes) for boxes in groundtruth_boxes_list
]
train_using_confidences = (self._is_training and
self._use_confidences_as_targets)
if self._add_background_class:
groundtruth_classes_with_background_list = [
tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT')
for one_hot_encoding in groundtruth_classes_list
]
if train_using_confidences:
groundtruth_confidences_with_background_list = [
tf.pad(groundtruth_confidences, [[0, 0], [1, 0]], mode='CONSTANT')
for groundtruth_confidences in groundtruth_confidences_list
]
else:
groundtruth_classes_with_background_list = groundtruth_classes_list
......@@ -913,10 +966,24 @@ class SSDMetaArch(model.DetectionModel):
for boxlist, keypoints in zip(
groundtruth_boxlists, groundtruth_keypoints_list):
boxlist.add_field(fields.BoxListFields.keypoints, keypoints)
return target_assigner.batch_assign_targets(
self._target_assigner, self.anchors, groundtruth_boxlists,
groundtruth_classes_with_background_list, self._unmatched_class_label,
groundtruth_weights_list)
if train_using_confidences:
return target_assigner.batch_assign_confidences(
self._target_assigner,
self.anchors,
groundtruth_boxlists,
groundtruth_confidences_with_background_list,
groundtruth_weights_list,
self._unmatched_class_label,
self._add_background_class,
self._implicit_example_weight)
else:
return target_assigner.batch_assign_targets(
self._target_assigner,
self.anchors,
groundtruth_boxlists,
groundtruth_classes_with_background_list,
self._unmatched_class_label,
groundtruth_weights_list)
def _summarize_target_assignment(self, groundtruth_boxes_list, match_list):
"""Creates tensorflow summaries for the input boxes and anchors.
......
......@@ -22,6 +22,7 @@ import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.meta_architectures import ssd_meta_arch_test_lib
from object_detection.protos import model_pb2
from object_detection.utils import test_utils
slim = tf.contrib.slim
......@@ -35,28 +36,26 @@ keras = tf.keras.layers
class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
parameterized.TestCase):
def _create_model(self,
apply_hard_mining=True,
normalize_loc_loss_by_codesize=False,
add_background_class=True,
random_example_sampling=False,
weight_regression_loss_by_score=False,
use_expected_classification_loss_under_sampling=False,
min_num_negative_samples=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
def _create_model(
self,
apply_hard_mining=True,
normalize_loc_loss_by_codesize=False,
add_background_class=True,
random_example_sampling=False,
expected_loss_weights=model_pb2.DetectionModel().ssd.loss.NONE,
min_num_negative_samples=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
add_background_class=add_background_class,
random_example_sampling=random_example_sampling,
weight_regression_loss_by_score=weight_regression_loss_by_score,
use_expected_classification_loss_under_sampling=
use_expected_classification_loss_under_sampling,
expected_loss_weights=expected_loss_weights,
min_num_negative_samples=min_num_negative_samples,
desired_negative_sampling_ratio=desired_negative_sampling_ratio,
use_keras=use_keras,
......@@ -358,91 +357,6 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
def test_loss_with_expected_classification_loss(self, use_keras):
with tf.Graph().as_default():
_, num_classes, num_anchors, _ = self._create_model(use_keras=use_keras)
def graph_fn(preprocessed_tensor, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_classes1, groundtruth_classes2):
groundtruth_boxes_list = [groundtruth_boxes1, groundtruth_boxes2]
groundtruth_classes_list = [groundtruth_classes1, groundtruth_classes2]
model, _, _, _ = self._create_model(
apply_hard_mining=False,
add_background_class=True,
use_expected_classification_loss_under_sampling=True,
min_num_negative_samples=1,
desired_negative_sampling_ratio=desired_negative_sampling_ratio)
model.provide_groundtruth(groundtruth_boxes_list,
groundtruth_classes_list)
prediction_dict = model.predict(
preprocessed_tensor, true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['Loss/localization_loss'],
loss_dict['Loss/classification_loss'])
batch_size = 2
desired_negative_sampling_ratio = 4
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
groundtruth_boxes1 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_boxes2 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_classes1 = np.array([[1]], dtype=np.float32)
groundtruth_classes2 = np.array([[1]], dtype=np.float32)
expected_localization_loss = 0.0
expected_classification_loss = (
batch_size * (num_anchors + num_classes * num_anchors) * np.log(2.0))
(localization_loss, classification_loss) = self.execute(
graph_fn, [
preprocessed_input, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_classes1, groundtruth_classes2
])
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
def test_loss_results_are_correct_with_weight_regression_loss_by_score(
self, use_keras):
with tf.Graph().as_default():
_, num_classes, num_anchors, _ = self._create_model(
use_keras=use_keras,
add_background_class=False,
weight_regression_loss_by_score=True)
def graph_fn(preprocessed_tensor, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_classes1, groundtruth_classes2):
groundtruth_boxes_list = [groundtruth_boxes1, groundtruth_boxes2]
groundtruth_classes_list = [groundtruth_classes1, groundtruth_classes2]
model, _, _, _ = self._create_model(
use_keras=use_keras,
apply_hard_mining=False,
add_background_class=False,
weight_regression_loss_by_score=True)
model.provide_groundtruth(groundtruth_boxes_list,
groundtruth_classes_list)
prediction_dict = model.predict(
preprocessed_tensor, true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['Loss/localization_loss'],
loss_dict['Loss/classification_loss'])
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
groundtruth_boxes1 = np.array([[0, 0, 1, 1]], dtype=np.float32)
groundtruth_boxes2 = np.array([[0, 0, 1, 1]], dtype=np.float32)
groundtruth_classes1 = np.array([[1]], dtype=np.float32)
groundtruth_classes2 = np.array([[0]], dtype=np.float32)
expected_localization_loss = 0.25
expected_classification_loss = (
batch_size * num_anchors * num_classes * np.log(2.0))
(localization_loss, classification_loss) = self.execute(
graph_fn, [
preprocessed_input, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_classes1, groundtruth_classes2
])
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
def test_loss_results_are_correct_with_losses_mask(self, use_keras):
......
......@@ -25,6 +25,7 @@ from object_detection.core import post_processing
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import target_assigner
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.protos import model_pb2
from object_detection.utils import ops
from object_detection.utils import test_case
from object_detection.utils import test_utils
......@@ -111,31 +112,29 @@ class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
class SSDMetaArchTestBase(test_case.TestCase):
"""Base class to test SSD based meta architectures."""
def _create_model(self,
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=True,
normalize_loc_loss_by_codesize=False,
add_background_class=True,
random_example_sampling=False,
weight_regression_loss_by_score=False,
use_expected_classification_loss_under_sampling=False,
min_num_negative_samples=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
def _create_model(
self,
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=True,
normalize_loc_loss_by_codesize=False,
add_background_class=True,
random_example_sampling=False,
expected_loss_weights=model_pb2.DetectionModel().ssd.loss.NONE,
min_num_negative_samples=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
if use_keras:
mock_box_predictor = test_utils.MockKerasBoxPredictor(
is_training, num_classes, add_background_class=add_background_class,
predict_mask=predict_mask)
is_training, num_classes, add_background_class=add_background_class)
else:
mock_box_predictor = test_utils.MockBoxPredictor(
is_training, num_classes, add_background_class=add_background_class,
predict_mask=predict_mask)
is_training, num_classes, add_background_class=add_background_class)
mock_box_coder = test_utils.MockBoxCoder()
if use_keras:
fake_feature_extractor = FakeSSDKerasFeatureExtractor()
......@@ -177,17 +176,22 @@ class SSDMetaArchTestBase(test_case.TestCase):
region_similarity_calculator,
mock_matcher,
mock_box_coder,
negative_class_weight=negative_class_weight,
weight_regression_loss_by_score=weight_regression_loss_by_score)
negative_class_weight=negative_class_weight)
expected_classification_loss_under_sampling = None
if use_expected_classification_loss_under_sampling:
expected_classification_loss_under_sampling = functools.partial(
ops.expected_classification_loss_under_sampling,
min_num_negative_samples=min_num_negative_samples,
desired_negative_sampling_ratio=desired_negative_sampling_ratio)
model_config = model_pb2.DetectionModel()
if expected_loss_weights == model_config.ssd.loss.NONE:
expected_loss_weights_fn = None
else:
raise ValueError('Not a valid value for expected_loss_weights.')
code_size = 4
kwargs = {}
if predict_mask:
kwargs.update({
'mask_prediction_fn': test_utils.MockMaskHead(num_classes=1).predict,
})
model = model_fn(
is_training=is_training,
anchor_generator=mock_anchor_generator,
......@@ -211,8 +215,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
inplace_batchnorm_update=False,
add_background_class=add_background_class,
random_example_sampler=random_example_sampler,
expected_classification_loss_under_sampling=
expected_classification_loss_under_sampling)
expected_loss_weights_fn=expected_loss_weights_fn,
**kwargs)
return model, num_classes, mock_anchor_generator.num_anchors(), code_size
def _get_value_for_matching_key(self, dictionary, suffix):
......
......@@ -54,49 +54,59 @@ MODEL_BUILD_UTIL_MAP = {
}
def _prepare_groundtruth_for_eval(detection_model, class_agnostic):
def _prepare_groundtruth_for_eval(detection_model, class_agnostic,
max_number_of_boxes):
"""Extracts groundtruth data from detection_model and prepares it for eval.
Args:
detection_model: A `DetectionModel` object.
class_agnostic: Whether the detections are class_agnostic.
max_number_of_boxes: Max number of groundtruth boxes.
Returns:
A tuple of:
groundtruth: Dictionary with the following fields:
'groundtruth_boxes': [num_boxes, 4] float32 tensor of boxes, in
normalized coordinates.
'groundtruth_classes': [num_boxes] int64 tensor of 1-indexed classes.
'groundtruth_masks': 3D float32 tensor of instance masks (if provided in
'groundtruth_boxes': [batch_size, num_boxes, 4] float32 tensor of boxes,
in normalized coordinates.
'groundtruth_classes': [batch_size, num_boxes] int64 tensor of 1-indexed
classes.
'groundtruth_masks': 4D float32 tensor of instance masks (if provided in
groundtruth)
'groundtruth_is_crowd': [num_boxes] bool tensor indicating is_crowd
annotations (if provided in groundtruth).
'groundtruth_is_crowd': [batch_size, num_boxes] bool tensor indicating
is_crowd annotations (if provided in groundtruth).
'num_groundtruth_boxes': [batch_size] tensor containing the maximum number
of groundtruth boxes per image..
class_agnostic: Boolean indicating whether detections are class agnostic.
"""
input_data_fields = fields.InputDataFields()
groundtruth_boxes = detection_model.groundtruth_lists(
fields.BoxListFields.boxes)[0]
groundtruth_boxes = tf.stack(
detection_model.groundtruth_lists(fields.BoxListFields.boxes))
groundtruth_boxes_shape = tf.shape(groundtruth_boxes)
# For class-agnostic models, groundtruth one-hot encodings collapse to all
# ones.
if class_agnostic:
groundtruth_boxes_shape = tf.shape(groundtruth_boxes)
groundtruth_classes_one_hot = tf.ones([groundtruth_boxes_shape[0], 1])
groundtruth_classes_one_hot = tf.ones(
[groundtruth_boxes_shape[0], groundtruth_boxes_shape[1], 1])
else:
groundtruth_classes_one_hot = detection_model.groundtruth_lists(
fields.BoxListFields.classes)[0]
groundtruth_classes_one_hot = tf.stack(
detection_model.groundtruth_lists(fields.BoxListFields.classes))
label_id_offset = 1 # Applying label id offset (b/63711816)
groundtruth_classes = (
tf.argmax(groundtruth_classes_one_hot, axis=1) + label_id_offset)
tf.argmax(groundtruth_classes_one_hot, axis=2) + label_id_offset)
groundtruth = {
input_data_fields.groundtruth_boxes: groundtruth_boxes,
input_data_fields.groundtruth_classes: groundtruth_classes
}
if detection_model.groundtruth_has_field(fields.BoxListFields.masks):
groundtruth[input_data_fields.groundtruth_instance_masks] = (
detection_model.groundtruth_lists(fields.BoxListFields.masks)[0])
groundtruth[input_data_fields.groundtruth_instance_masks] = tf.stack(
detection_model.groundtruth_lists(fields.BoxListFields.masks))
if detection_model.groundtruth_has_field(fields.BoxListFields.is_crowd):
groundtruth[input_data_fields.groundtruth_is_crowd] = (
detection_model.groundtruth_lists(fields.BoxListFields.is_crowd)[0])
groundtruth[input_data_fields.groundtruth_is_crowd] = tf.stack(
detection_model.groundtruth_lists(fields.BoxListFields.is_crowd))
groundtruth[input_data_fields.num_groundtruth_boxes] = (
tf.tile([max_number_of_boxes], multiples=[groundtruth_boxes_shape[0]]))
return groundtruth
......@@ -226,7 +236,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
boxes_shape = (
labels[fields.InputDataFields.groundtruth_boxes].get_shape()
.as_list())
unpad_groundtruth_tensors = True if boxes_shape[1] is not None else False
unpad_groundtruth_tensors = boxes_shape[1] is not None and not use_tpu
labels = unstack_batch(
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
......@@ -243,12 +253,17 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
gt_weights_list = None
if fields.InputDataFields.groundtruth_weights in labels:
gt_weights_list = labels[fields.InputDataFields.groundtruth_weights]
gt_confidences_list = None
if fields.InputDataFields.groundtruth_confidences in labels:
gt_confidences_list = labels[
fields.InputDataFields.groundtruth_confidences]
gt_is_crowd_list = None
if fields.InputDataFields.groundtruth_is_crowd in labels:
gt_is_crowd_list = labels[fields.InputDataFields.groundtruth_is_crowd]
detection_model.provide_groundtruth(
groundtruth_boxes_list=gt_boxes_list,
groundtruth_classes_list=gt_classes_list,
groundtruth_confidences_list=gt_confidences_list,
groundtruth_masks_list=gt_masks_list,
groundtruth_keypoints_list=gt_keypoints_list,
groundtruth_weights_list=gt_weights_list,
......@@ -378,24 +393,30 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
if mode == tf.estimator.ModeKeys.EVAL:
class_agnostic = (
fields.DetectionResultFields.detection_classes not in detections)
groundtruth = _prepare_groundtruth_for_eval(detection_model,
class_agnostic)
groundtruth = _prepare_groundtruth_for_eval(
detection_model, class_agnostic,
eval_input_config.max_number_of_boxes)
use_original_images = fields.InputDataFields.original_image in features
if use_original_images:
eval_images = tf.cast(tf.image.resize_bilinear(
features[fields.InputDataFields.original_image][0:1],
features[fields.InputDataFields.original_image_spatial_shape][0]),
tf.uint8)
eval_images = features[fields.InputDataFields.original_image]
true_image_shapes = tf.slice(
features[fields.InputDataFields.true_image_shape], [0, 0], [-1, 3])
original_image_spatial_shapes = features[fields.InputDataFields
.original_image_spatial_shape]
else:
eval_images = features[fields.InputDataFields.image]
true_image_shapes = None
original_image_spatial_shapes = None
eval_dict = eval_util.result_dict_for_single_example(
eval_images[0:1],
features[inputs.HASH_KEY][0],
eval_dict = eval_util.result_dict_for_batched_example(
eval_images,
features[inputs.HASH_KEY],
detections,
groundtruth,
class_agnostic=class_agnostic,
scale_to_absolute=True)
scale_to_absolute=True,
original_image_spatial_shapes=original_image_spatial_shapes,
true_image_shapes=true_image_shapes)
if class_agnostic:
category_index = label_map_util.create_class_agnostic_category_index()
......@@ -445,6 +466,15 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
eval_metrics=eval_metric_ops,
export_outputs=export_outputs)
else:
if scaffold is None:
keep_checkpoint_every_n_hours = (
train_config.keep_checkpoint_every_n_hours)
saver = tf.train.Saver(
sharded=True,
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
save_relative_paths=True)
tf.add_to_collection(tf.GraphKeys.SAVERS, saver)
scaffold = tf.train.Scaffold(saver=saver)
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=detections,
......
......@@ -301,8 +301,6 @@ class WeightSharedConvolutionalBoxPredictor(box_predictor.BoxPredictor):
num_predictions_per_location):
if head_name == CLASS_PREDICTIONS_WITH_BACKGROUND:
tower_name_scope = 'ClassPredictionTower'
elif head_name == MASK_PREDICTIONS:
tower_name_scope = 'MaskPredictionTower'
else:
raise ValueError('Unknown head')
if self._share_prediction_tower:
......
......@@ -119,6 +119,10 @@ class ConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
if other_heads:
self._prediction_heads.update(other_heads)
# We generate a consistent ordering for the prediction head names,
# So that all workers build the model in the exact same order
self._sorted_head_names = sorted(self._prediction_heads.keys())
self._conv_hyperparams = conv_hyperparams
self._min_depth = min_depth
self._max_depth = max_depth
......@@ -187,7 +191,7 @@ class ConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
for layer in self._shared_nets[index]:
net = layer(net)
for head_name in self._prediction_heads:
for head_name in self._sorted_head_names:
head_obj = self._prediction_heads[head_name][index]
prediction = head_obj(net)
predictions[head_name].append(prediction)
......
......@@ -188,6 +188,8 @@ class ConvolutionalKerasBoxPredictorTest(test_case.TestCase):
'BoxPredictor/ConvolutionalClassHead_0/ClassPredictor/bias',
'BoxPredictor/ConvolutionalClassHead_0/ClassPredictor/kernel'])
self.assertEqual(expected_variable_set, actual_variable_set)
self.assertEqual(conv_box_predictor._sorted_head_names,
['box_encodings', 'class_predictions_with_background'])
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
......
......@@ -15,18 +15,6 @@ message BoxPredictor {
}
}
// Configuration proto for MaskHead in predictors.
// Next id: 4
message MaskHead {
// The height and the width of the predicted mask. Only used when
// predict_instance_masks is true.
optional int32 mask_height = 1 [default = 15];
optional int32 mask_width = 2 [default = 15];
// Whether to predict class agnostic masks. Only used when
// predict_instance_masks is true.
optional bool masks_are_class_agnostic = 3 [default = true];
}
// Configuration proto for Convolutional box predictor.
// Next id: 13
......@@ -69,9 +57,6 @@ message ConvolutionalBoxPredictor {
// Whether to use depthwise separable convolution for box predictor layers.
optional bool use_depthwise = 11 [default = false];
// Configs for a mask prediction head.
optional MaskHead mask_head = 12;
}
// Configuration proto for weight shared convolutional box predictor.
......@@ -113,9 +98,6 @@ message WeightSharedConvolutionalBoxPredictor {
// Whether to use depthwise separable convolution for box predictor layers.
optional bool use_depthwise = 14 [default = false];
// Configs for a mask prediction head.
optional MaskHead mask_head = 15;
// Enum to specify how to convert the detection scores at inference time.
enum ScoreConverter {
// Input scores equals output scores.
......
......@@ -35,9 +35,16 @@ message Hyperparams {
}
optional Activation activation = 4 [default = RELU];
// BatchNorm hyperparameters. If this parameter is NOT set then BatchNorm is
// not applied!
optional BatchNorm batch_norm = 5;
oneof normalizer_oneof {
// Note that if nothing below is selected, then no normalization is applied
// BatchNorm hyperparameters.
BatchNorm batch_norm = 5;
// GroupNorm hyperparameters. This is only supported on a subset of models.
// Note that the current implementation of group norm instantiated in
// tf.contrib.group.layers.group_norm() only supports fixed_size_resizer
// for image preprocessing.
GroupNorm group_norm = 7;
}
// Whether depthwise convolutions should be regularized. If this parameter is
// NOT set then the conv hyperparams will default to the parent scope.
......@@ -113,3 +120,8 @@ message BatchNorm {
// forward pass but they are never updated.
optional bool train = 5 [default = true];
}
// Configuration proto for group normalization to apply after convolution op.
// https://arxiv.org/abs/1803.08494
message GroupNorm {
}
......@@ -23,6 +23,44 @@ message Loss {
// If not left to default, applies random example sampling.
optional RandomExampleSampler random_example_sampler = 6;
// Equalization loss.
message EqualizationLoss {
// Weight equalization loss strength.
optional float weight = 1 [default=0.0];
// When computing equalization loss, ops that start with
// equalization_exclude_prefixes will be ignored. Only used when
// equalization_weight > 0.
repeated string exclude_prefixes = 2;
}
optional EqualizationLoss equalization_loss = 7;
enum ExpectedLossWeights {
NONE = 0;
// Use expected_classification_loss_by_expected_sampling
// from third_party/tensorflow_models/object_detection/utils/ops.py
EXPECTED_SAMPLING = 1;
// Use expected_classification_loss_by_reweighting_unmatched_anchors
// from third_party/tensorflow_models/object_detection/utils/ops.py
REWEIGHTING_UNMATCHED_ANCHORS = 2;
}
// Method to compute expected loss weights with respect to balanced
// positive/negative sampling scheme. If NONE, use explicit sampling.
// TODO(birdbrain): Move under ExpectedLossWeights.
optional ExpectedLossWeights expected_loss_weights = 18 [default = NONE];
// Minimum number of effective negative samples.
// Only applies if expected_loss_weights is not NONE.
// TODO(birdbrain): Move under ExpectedLossWeights.
optional float min_num_negative_samples = 19 [default=0];
// Desired number of effective negative samples per positive sample.
// Only applies if expected_loss_weights is not NONE.
// TODO(birdbrain): Move under ExpectedLossWeights.
optional float desired_negative_sampling_ratio = 20 [default=3];
}
// Configuration for bounding box localization loss function.
......
......@@ -166,13 +166,13 @@ message RandomCropImage {
message RandomPadImage {
// Minimum dimensions for padded image. If unset, will use original image
// dimension as a lower bound.
optional float min_image_height = 1;
optional float min_image_width = 2;
optional int32 min_image_height = 1;
optional int32 min_image_width = 2;
// Maximum dimensions for padded image. If unset, will use double the original
// image dimension as a lower bound.
optional float max_image_height = 3;
optional float max_image_width = 4;
optional int32 max_image_height = 3;
optional int32 max_image_width = 4;
// Color of the padding. If unset, will pad using average color of the input
// image.
......
......@@ -12,7 +12,7 @@ import "object_detection/protos/post_processing.proto";
import "object_detection/protos/region_similarity_calculator.proto";
// Configuration for Single Shot Detection (SSD) models.
// Next id: 22
// Next id: 26
message Ssd {
// Number of classes to predict.
......@@ -35,7 +35,7 @@ message Ssd {
// Whether background targets are to be encoded as an all
// zeros vector or a one-hot vector (where background is the 0th class).
optional bool encode_background_as_zeros = 12 [default=false];
optional bool encode_background_as_zeros = 12 [default = false];
// classification weight to be associated to negative
// anchors (default: 1.0). The weight must be in [0., 1.].
......@@ -52,11 +52,11 @@ message Ssd {
// Whether to normalize the loss by number of groundtruth boxes that match to
// the anchors.
optional bool normalize_loss_by_num_matches = 10 [default=true];
optional bool normalize_loss_by_num_matches = 10 [default = true];
// Whether to normalize the localization loss by the code size of the box
// encodings. This is applied along with other normalization factors.
optional bool normalize_loc_loss_by_codesize = 14 [default=false];
optional bool normalize_loc_loss_by_codesize = 14 [default = false];
// Loss configuration for training.
optional Loss loss = 11;
......@@ -82,29 +82,66 @@ message Ssd {
// to update the batch norm moving average parameters.
optional bool inplace_batchnorm_update = 15 [default = false];
// Whether to weight the regression loss by the score of the ground truth box
// the anchor matches to.
optional bool weight_regression_loss_by_score = 17 [default=false];
// Whether to add an implicit background class to one-hot encodings of
// groundtruth labels. Set to false if training a single
// class model or using an explicit background class.
optional bool add_background_class = 21 [default = true];
// Whether to compute expected loss with respect to balanced positive/negative
// sampling scheme. If false, use explicit sampling.
optional bool use_expected_classification_loss_under_sampling = 18 [default=false];
// Whether to use an explicit background class. Set to true if using
// groundtruth labels with an explicit background class, as in multiclass
// scores.
optional bool explicit_background_class = 24 [default = false];
// Minimum number of effective negative samples.
// Only applies if use_expected_classification_loss_under_sampling is true.
optional float min_num_negative_samples = 19 [default=0];
optional bool use_confidences_as_targets = 22 [default = false];
// Desired number of effective negative samples per positive sample.
// Only applies if use_expected_classification_loss_under_sampling is true.
optional float desired_negative_sampling_ratio = 20 [default=3];
optional float implicit_example_weight = 23 [default = 1.0];
// Whether to add an implicit background class to one-hot encodings of
// groundtruth labels. Set to false if using groundtruth labels with an
// explicit background class, using multiclass scores, or if training a single
// class model.
optional bool add_background_class = 21 [default = true];
}
// Configuration proto for MaskHead.
// Next id: 11
message MaskHead {
// The height and the width of the predicted mask. Only used when
// predict_instance_masks is true.
optional int32 mask_height = 1 [default = 15];
optional int32 mask_width = 2 [default = 15];
// Whether to predict class agnostic masks. Only used when
// predict_instance_masks is true.
optional bool masks_are_class_agnostic = 3 [default = true];
// The depth for the first conv2d_transpose op applied to the
// image_features in the mask prediction branch. If set to 0, the value
// will be set automatically based on the number of channels in the image
// features and the number of classes.
optional int32 mask_prediction_conv_depth = 4 [default = 256];
// The number of convolutions applied to image_features in the mask prediction
// branch.
optional int32 mask_prediction_num_conv_layers = 5 [default = 2];
// Whether to apply convolutions on mask features before upsampling using
// nearest neighbor resizing.
// By default, mask features are resized to [`mask_height`, `mask_width`]
// before applying convolutions and predicting masks.
optional bool convolve_then_upsample_masks = 6 [default = false];
// Mask loss weight.
optional float mask_loss_weight = 7 [default=5.0];
// Number of boxes to be generated at training time for computing mask loss.
optional int32 mask_loss_sample_size = 8 [default=16];
// Hyperparameters for convolution ops used in the box predictor.
optional Hyperparams conv_hyperparams = 9;
// Output size (width and height are set to be the same) of the initial
// bilinear interpolation based cropping during ROI pooling. Only used when
// we have second stage prediction head enabled (e.g. mask head).
optional int32 initial_crop_size = 10 [default = 15];
}
// Configs for mask head.
optional MaskHead mask_head_config = 25;
}
message SsdFeatureExtractor {
reserved 6;
......@@ -113,10 +150,10 @@ message SsdFeatureExtractor {
optional string type = 1;
// The factor to alter the depth of the channels in the feature extractor.
optional float depth_multiplier = 2 [default=1.0];
optional float depth_multiplier = 2 [default = 1.0];
// Minimum number of the channels in the feature extractor.
optional int32 min_depth = 3 [default=16];
optional int32 min_depth = 3 [default = 16];
// Hyperparameters that affect the layers of feature extractor added on top
// of the base feature extractor.
......@@ -128,7 +165,8 @@ message SsdFeatureExtractor {
// layers while base feature extractor uses its own default hyperparams. If
// this value is set to true, the base feature extractor's hyperparams will be
// overridden with the `conv_hyperparams`.
optional bool override_base_feature_extractor_hyperparams = 9 [default = false];
optional bool override_base_feature_extractor_hyperparams = 9
[default = false];
// The nearest multiple to zero-pad the input height and width dimensions to.
// For example, if pad_to_multiple = 2, input dimensions are zero-padded
......@@ -138,11 +176,11 @@ message SsdFeatureExtractor {
// Whether to use explicit padding when extracting SSD multiresolution
// features. This will also apply to the base feature extractor if a MobileNet
// architecture is used.
optional bool use_explicit_padding = 7 [default=false];
optional bool use_explicit_padding = 7 [default = false];
// Whether to use depthwise separable convolutions for to extract additional
// feature maps added by SSD.
optional bool use_depthwise = 8 [default=false];
optional bool use_depthwise = 8 [default = false];
// Feature Pyramid Networks config.
optional FeaturePyramidNetworks fpn = 10;
......@@ -173,4 +211,3 @@ message FeaturePyramidNetworks {
// channel depth for additional coarse feature layers.
optional int32 additional_layer_depth = 3 [default = 256];
}
......@@ -20,7 +20,7 @@ message TrainConfig {
optional bool sync_replicas = 3 [default=false];
// How frequently to keep checkpoints.
optional uint32 keep_checkpoint_every_n_hours = 4 [default=1000];
optional float keep_checkpoint_every_n_hours = 4 [default=10000.0];
// Optimizer used to train the DetectionModel.
optional Optimizer optimizer = 5;
......
......@@ -33,6 +33,7 @@ import collections
import logging
import unicodedata
import numpy as np
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.utils import label_map_util
......@@ -126,6 +127,7 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
categories,
matching_iou_threshold=0.5,
evaluate_corlocs=False,
evaluate_precision_recall=False,
metric_prefix=None,
use_weighted_mean_ap=False,
evaluate_masks=False,
......@@ -140,6 +142,8 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
boxes to detection boxes.
evaluate_corlocs: (optional) boolean which determines if corloc scores
are to be returned or not.
evaluate_precision_recall: (optional) boolean which determines if
precision and recall values are to be returned or not.
metric_prefix: (optional) string prefix for metric name; if None, no
prefix is used.
use_weighted_mean_ap: (optional) boolean which determines if the mean
......@@ -174,7 +178,50 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
group_of_weight=self._group_of_weight)
self._image_ids = set([])
self._evaluate_corlocs = evaluate_corlocs
self._evaluate_precision_recall = evaluate_precision_recall
self._metric_prefix = (metric_prefix + '_') if metric_prefix else ''
self._expected_keys = set([
standard_fields.InputDataFields.key,
standard_fields.InputDataFields.groundtruth_boxes,
standard_fields.InputDataFields.groundtruth_classes,
standard_fields.InputDataFields.groundtruth_difficult,
standard_fields.InputDataFields.groundtruth_instance_masks,
standard_fields.DetectionResultFields.detection_boxes,
standard_fields.DetectionResultFields.detection_scores,
standard_fields.DetectionResultFields.detection_classes,
standard_fields.DetectionResultFields.detection_masks
])
self._build_metric_names()
def _build_metric_names(self):
"""Builds a list with metric names."""
self._metric_names = [
self._metric_prefix + 'Precision/mAP@{}IOU'.format(
self._matching_iou_threshold)
]
if self._evaluate_corlocs:
self._metric_names.append(
self._metric_prefix +
'Precision/meanCorLoc@{}IOU'.format(self._matching_iou_threshold))
category_index = label_map_util.create_category_index(self._categories)
for idx in range(self._num_classes):
if idx + self._label_id_offset in category_index:
category_name = category_index[idx + self._label_id_offset]['name']
try:
category_name = unicode(category_name, 'utf-8')
except TypeError:
pass
category_name = unicodedata.normalize('NFKD', category_name).encode(
'ascii', 'ignore')
self._metric_names.append(
self._metric_prefix + 'PerformanceByCategory/AP@{}IOU/{}'.format(
self._matching_iou_threshold, category_name))
if self._evaluate_corlocs:
self._metric_names.append(
self._metric_prefix + 'PerformanceByCategory/CorLoc@{}IOU/{}'
.format(self._matching_iou_threshold, category_name))
def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
......@@ -283,22 +330,19 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
A dictionary of metrics with the following fields -
1. summary_metrics:
'Precision/mAP@<matching_iou_threshold>IOU': mean average precision at
the specified IOU threshold.
'<prefix if not empty>_Precision/mAP@<matching_iou_threshold>IOU': mean
average precision at the specified IOU threshold.
2. per_category_ap: category specific results with keys of the form
'PerformanceByCategory/mAP@<matching_iou_threshold>IOU/category'.
'<prefix if not empty>_PerformanceByCategory/
mAP@<matching_iou_threshold>IOU/category'.
"""
(per_class_ap, mean_ap, _, _, per_class_corloc, mean_corloc) = (
self._evaluation.evaluate())
pascal_metrics = {
self._metric_prefix +
'Precision/mAP@{}IOU'.format(self._matching_iou_threshold):
mean_ap
}
(per_class_ap, mean_ap, per_class_precision, per_class_recall,
per_class_corloc, mean_corloc) = (
self._evaluation.evaluate())
pascal_metrics = {self._metric_names[0]: mean_ap}
if self._evaluate_corlocs:
pascal_metrics[self._metric_prefix + 'Precision/meanCorLoc@{}IOU'.format(
self._matching_iou_threshold)] = mean_corloc
pascal_metrics[self._metric_names[1]] = mean_corloc
category_index = label_map_util.create_category_index(self._categories)
for idx in range(per_class_ap.size):
if idx + self._label_id_offset in category_index:
......@@ -314,6 +358,19 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_ap[idx]
# Optionally add precision and recall values
if self._evaluate_precision_recall:
display_name = (
self._metric_prefix +
'PerformanceByCategory/Precision@{}IOU/{}'.format(
self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_precision[idx]
display_name = (
self._metric_prefix +
'PerformanceByCategory/Recall@{}IOU/{}'.format(
self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_recall[idx]
# Optionally add CorLoc metrics.classes
if self._evaluate_corlocs:
display_name = (
......@@ -332,6 +389,74 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
label_id_offset=self._label_id_offset)
self._image_ids.clear()
def get_estimator_eval_metric_ops(self, eval_dict):
"""Returns dict of metrics to use with `tf.estimator.EstimatorSpec`.
Note that this must only be implemented if performing evaluation with a
`tf.estimator.Estimator`.
Args:
eval_dict: A dictionary that holds tensors for evaluating an object
detection model, returned from
eval_util.result_dict_for_single_example(). It must contain
standard_fields.InputDataFields.key.
Returns:
A dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in `tf.estimator.EstimatorSpec`.
"""
# remove unexpected fields
eval_dict_filtered = dict()
for key, value in eval_dict.items():
if key in self._expected_keys:
eval_dict_filtered[key] = value
eval_dict_keys = eval_dict_filtered.keys()
def update_op(image_id, *eval_dict_batched_as_list):
"""Update operation that adds batch of images to ObjectDetectionEvaluator.
Args:
image_id: image id (single id or an array)
*eval_dict_batched_as_list: the values of the dictionary of tensors.
"""
if np.isscalar(image_id):
single_example_dict = dict(
zip(eval_dict_keys, eval_dict_batched_as_list))
self.add_single_ground_truth_image_info(image_id, single_example_dict)
self.add_single_detected_image_info(image_id, single_example_dict)
else:
for unzipped_tuple in zip(*eval_dict_batched_as_list):
single_example_dict = dict(zip(eval_dict_keys, unzipped_tuple))
image_id = single_example_dict[standard_fields.InputDataFields.key]
self.add_single_ground_truth_image_info(image_id, single_example_dict)
self.add_single_detected_image_info(image_id, single_example_dict)
args = [eval_dict_filtered[standard_fields.InputDataFields.key]]
args.extend(eval_dict_filtered.values())
update_op = tf.py_func(update_op, args, [])
def first_value_func():
self._metrics = self.evaluate()
self.clear()
return np.float32(self._metrics[self._metric_names[0]])
def value_func_factory(metric_name):
def value_func():
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {self._metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
for metric_name in self._metric_names[1:]:
eval_metric_ops[metric_name] = (tf.py_func(
value_func_factory(metric_name), [], np.float32), update_op)
return eval_metric_ops
class PascalDetectionEvaluator(ObjectDetectionEvaluator):
"""A class to evaluate detections using PASCAL metrics."""
......@@ -442,6 +567,15 @@ class OpenImagesDetectionEvaluator(ObjectDetectionEvaluator):
evaluate_corlocs,
metric_prefix=metric_prefix,
group_of_weight=group_of_weight)
self._expected_keys = set([
standard_fields.InputDataFields.key,
standard_fields.InputDataFields.groundtruth_boxes,
standard_fields.InputDataFields.groundtruth_classes,
standard_fields.InputDataFields.groundtruth_group_of,
standard_fields.DetectionResultFields.detection_boxes,
standard_fields.DetectionResultFields.detection_scores,
standard_fields.DetectionResultFields.detection_classes,
])
def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
......@@ -535,6 +669,16 @@ class OpenImagesDetectionChallengeEvaluator(OpenImagesDetectionEvaluator):
group_of_weight=group_of_weight)
self._evaluatable_labels = {}
self._expected_keys = set([
standard_fields.InputDataFields.key,
standard_fields.InputDataFields.groundtruth_boxes,
standard_fields.InputDataFields.groundtruth_classes,
standard_fields.InputDataFields.groundtruth_group_of,
standard_fields.InputDataFields.groundtruth_image_classes,
standard_fields.DetectionResultFields.detection_boxes,
standard_fields.DetectionResultFields.detection_scores,
standard_fields.DetectionResultFields.detection_classes,
])
def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
......@@ -890,15 +1034,14 @@ class ObjectDetectionEvaluation(object):
if self.use_weighted_mean_ap:
all_scores = np.append(all_scores, scores)
all_tp_fp_labels = np.append(all_tp_fp_labels, tp_fp_labels)
logging.info('Scores and tpfp per class label: %d', class_index)
logging.info(tp_fp_labels)
logging.info(scores)
precision, recall = metrics.compute_precision_recall(
scores, tp_fp_labels, self.num_gt_instances_per_class[class_index])
self.precisions_per_class[class_index] = precision
self.recalls_per_class[class_index] = recall
average_precision = metrics.compute_average_precision(precision, recall)
self.average_precision_per_class[class_index] = average_precision
logging.info('average_precision: %f', average_precision)
self.corloc_per_class = metrics.compute_cor_loc(
self.num_gt_imgs_per_class,
......
......@@ -15,9 +15,10 @@
"""Tests for object_detection.utils.object_detection_evaluation."""
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from object_detection import eval_util
from object_detection.core import standard_fields
from object_detection.utils import object_detection_evaluation
......@@ -683,5 +684,141 @@ class ObjectDetectionEvaluationTest(tf.test.TestCase):
self.assertAlmostEqual(expected_mean_corloc, mean_corloc)
class ObjectDetectionEvaluatorTest(tf.test.TestCase, parameterized.TestCase):
def setUp(self):
self.categories = [{
'id': 1,
'name': 'person'
}, {
'id': 2,
'name': 'dog'
}, {
'id': 3,
'name': 'cat'
}]
self.od_eval = object_detection_evaluation.ObjectDetectionEvaluator(
categories=self.categories)
def _make_evaluation_dict(self,
resized_groundtruth_masks=False,
batch_size=1,
max_gt_boxes=None,
scale_to_absolute=False):
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
image = tf.zeros(shape=[batch_size, 20, 20, 3], dtype=tf.uint8)
if batch_size == 1:
key = tf.constant('image1')
else:
key = tf.constant([str(i) for i in range(batch_size)])
detection_boxes = tf.concat([
tf.tile(
tf.constant([[[0., 0., 1., 1.]]]), multiples=[batch_size - 1, 1, 1
]),
tf.constant([[[0., 0., 0.5, 0.5]]])
],
axis=0)
detection_scores = tf.concat([
tf.tile(tf.constant([[0.5]]), multiples=[batch_size - 1, 1]),
tf.constant([[0.8]])
],
axis=0)
detection_classes = tf.tile(tf.constant([[0]]), multiples=[batch_size, 1])
detection_masks = tf.tile(
tf.ones(shape=[1, 2, 20, 20], dtype=tf.float32),
multiples=[batch_size, 1, 1, 1])
groundtruth_boxes = tf.constant([[0., 0., 1., 1.]])
groundtruth_classes = tf.constant([1])
groundtruth_instance_masks = tf.ones(shape=[1, 20, 20], dtype=tf.uint8)
num_detections = tf.ones([batch_size])
if resized_groundtruth_masks:
groundtruth_instance_masks = tf.ones(shape=[1, 10, 10], dtype=tf.uint8)
if batch_size > 1:
groundtruth_boxes = tf.tile(
tf.expand_dims(groundtruth_boxes, 0), multiples=[batch_size, 1, 1])
groundtruth_classes = tf.tile(
tf.expand_dims(groundtruth_classes, 0), multiples=[batch_size, 1])
groundtruth_instance_masks = tf.tile(
tf.expand_dims(groundtruth_instance_masks, 0),
multiples=[batch_size, 1, 1, 1])
detections = {
detection_fields.detection_boxes: detection_boxes,
detection_fields.detection_scores: detection_scores,
detection_fields.detection_classes: detection_classes,
detection_fields.detection_masks: detection_masks,
detection_fields.num_detections: num_detections
}
groundtruth = {
input_data_fields.groundtruth_boxes:
groundtruth_boxes,
input_data_fields.groundtruth_classes:
groundtruth_classes,
input_data_fields.groundtruth_instance_masks:
groundtruth_instance_masks,
}
if batch_size > 1:
return eval_util.result_dict_for_batched_example(
image,
key,
detections,
groundtruth,
scale_to_absolute=scale_to_absolute,
max_gt_boxes=max_gt_boxes)
else:
return eval_util.result_dict_for_single_example(
image,
key,
detections,
groundtruth,
scale_to_absolute=scale_to_absolute)
@parameterized.parameters({
'batch_size': 1,
'expected_map': 0,
'max_gt_boxes': None,
'scale_to_absolute': True
}, {
'batch_size': 8,
'expected_map': 0.765625,
'max_gt_boxes': [1],
'scale_to_absolute': True
}, {
'batch_size': 1,
'expected_map': 0,
'max_gt_boxes': None,
'scale_to_absolute': False
}, {
'batch_size': 8,
'expected_map': 0.765625,
'max_gt_boxes': [1],
'scale_to_absolute': False
})
def test_get_estimator_eval_metric_ops(self,
batch_size=1,
expected_map=1,
max_gt_boxes=None,
scale_to_absolute=False):
eval_dict = self._make_evaluation_dict(
batch_size=batch_size,
max_gt_boxes=max_gt_boxes,
scale_to_absolute=scale_to_absolute)
tf.logging.info('eval_dict: {}'.format(eval_dict))
metric_ops = self.od_eval.get_estimator_eval_metric_ops(eval_dict)
_, update_op = metric_ops['Precision/mAP@0.5IOU']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op)
metrics = sess.run(metrics)
self.assertAlmostEqual(expected_map, metrics['Precision/mAP@0.5IOU'])
if __name__ == '__main__':
tf.test.main()
......@@ -14,6 +14,7 @@
# ==============================================================================
"""A module for helper tensorflow ops."""
import collections
import math
import numpy as np
import six
......@@ -1087,81 +1088,10 @@ def native_crop_and_resize(image, boxes, crop_size, scope=None):
return tf.reshape(cropped_regions, final_shape)
def expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses,
desired_negative_sampling_ratio, min_num_negative_samples):
"""Computes classification loss by background/foreground weighting.
The weighting is such that the effective background/foreground weight ratio
is the desired_negative_sampling_ratio. if p_i is the foreground probability
of anchor a_i, L(a_i) is the anchors loss, N is the number of anchors, M
is the sum of foreground probabilities across anchors, and K is the desired
ratio between the number of negative and positive samples, then the total loss
L is calculated as:
beta = K*M/(N-M)
L = sum_{i=1}^N [p_i * L_p(a_i) + beta * (1 - p_i) * L_n(a_i)]
where L_p(a_i) is the loss against target assuming the anchor was matched,
otherwise zero, and L_n(a_i) is the loss against the background target
assuming the anchor was unmatched, otherwise zero.
Args:
batch_cls_targets: A tensor with shape [batch_size, num_anchors, num_classes
+ 1], where 0'th index is the background class, containing the class
distrubution for the target assigned to a given anchor.
cls_losses: Float tensor of shape [batch_size, num_anchors] representing
anchorwise classification losses.
unmatched_cls_losses: loss for each anchor against the unmatched class
target.
desired_negative_sampling_ratio: The desired background/foreground weight
ratio.
min_num_negative_samples: Minimum number of effective negative samples.
Used only when there are no positive examples.
Returns:
The classification loss.
"""
num_anchors = tf.cast(tf.shape(batch_cls_targets)[1], tf.float32)
# find the p_i
foreground_probabilities = 1 - batch_cls_targets[:, :, 0]
foreground_sum = tf.reduce_sum(foreground_probabilities, axis=-1)
# for each anchor, expected_j is the expected number of positive anchors
# given that this anchor was sampled as negative.
tiled_foreground_sum = tf.tile(
tf.reshape(foreground_sum, [-1, 1]),
[1, tf.cast(num_anchors, tf.int32)])
expected_j = tiled_foreground_sum - foreground_probabilities
k = desired_negative_sampling_ratio
# compute beta
expected_negatives = tf.to_float(num_anchors) - expected_j
desired_negatives = k * expected_j
desired_negatives = tf.where(
tf.greater(desired_negatives, expected_negatives), expected_negatives,
desired_negatives)
# probability that an anchor is sampled for the loss computation given that it
# is negative.
beta = desired_negatives / expected_negatives
# where the foreground sum is zero, use a minimum negative weight.
min_negative_weight = 1.0 * min_num_negative_samples / num_anchors
beta = tf.where(
tf.equal(tiled_foreground_sum, 0),
min_negative_weight * tf.ones_like(beta), beta)
foreground_weights = foreground_probabilities
background_weights = (1 - foreground_weights) * beta
weighted_foreground_losses = foreground_weights * cls_losses
weighted_background_losses = background_weights * unmatched_cls_losses
EqualizationLossConfig = collections.namedtuple('EqualizationLossConfig',
['weight', 'exclude_prefixes'])
cls_losses = tf.reduce_sum(
weighted_foreground_losses, axis=-1) + tf.reduce_sum(
weighted_background_losses, axis=-1)
return cls_losses
......@@ -21,6 +21,8 @@ from object_detection.core import standard_fields as fields
from object_detection.utils import ops
from object_detection.utils import test_case
slim = tf.contrib.slim
class NormalizedToImageCoordinatesTest(tf.test.TestCase):
......@@ -1466,189 +1468,9 @@ class OpsTestCropAndResize(test_case.TestCase):
self.assertAllClose(crop_output, expected_output)
class OpsTestExpectedClassificationLoss(test_case.TestCase):
def testExpectedClassificationLossUnderSamplingWithHardLabels(self):
def graph_fn(batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples)
batch_cls_targets = np.array(
[[[1., 0, 0], [0, 1., 0]], [[1., 0, 0], [0, 1., 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
unmatched_cls_losses = np.array([[10, 20], [30, 40]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
min_num_negative_samples = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn, [
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples
])
# expected_foreground_sum = [1,1]
# expected_expected_j = [[1, 0], [1, 0]]
# expected_expected_negatives = [[1, 2], [1, 2]]
# expected_desired_negatives = [[2, 0], [2, 0]]
# expected_beta = [[1, 0], [1, 0]]
# expected_foreground_weights = [[0, 1], [0, 1]]
# expected_background_weights = [[1, 0], [1, 0]]
# expected_weighted_foreground_losses = [[0, 2], [0, 4]]
# expected_weighted_background_losses = [[10, 0], [30, 0]]
# expected_classification_loss_under_sampling = [6, 40]
expected_classification_loss_under_sampling = [2 + 10, 4 + 30]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithHardLabelsMoreNegatives(
self):
def graph_fn(batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples)
batch_cls_targets = np.array(
[[[1., 0, 0], [0, 1., 0], [1., 0, 0], [1., 0, 0], [1., 0, 0]]],
dtype=np.float32)
cls_losses = np.array([[1, 2, 3, 4, 5]], dtype=np.float32)
unmatched_cls_losses = np.array([[10, 20, 30, 40, 50]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
min_num_negative_samples = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn, [
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples
])
# expected_foreground_sum = [1]
# expected_expected_j = [[1, 0, 1, 1, 1]]
# expected_expected_negatives = [[4, 5, 4, 4, 4]]
# expected_desired_negatives = [[2, 0, 2, 2, 2]]
# expected_beta = [[.5, 0, .5, .5, .5]]
# expected_foreground_weights = [[0, 1, 0, 0, 0]]
# expected_background_weights = [[.5, 0, .5, .5, .5]]
# expected_weighted_foreground_losses = [[0, 2, 0, 0, 0]]
# expected_weighted_background_losses = [[10*.5, 0, 30*.5, 40*.5, 50*.5]]
# expected_classification_loss_under_sampling = [5+2+15+20+25]
expected_classification_loss_under_sampling = [5 + 2 + 15 + 20 + 25]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllNegative(self):
def graph_fn(batch_cls_targets, cls_losses, unmatched_cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples)
batch_cls_targets = np.array(
[[[1, 0, 0], [1, 0, 0]], [[1, 0, 0], [1, 0, 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
unmatched_cls_losses = np.array([[10, 20], [30, 40]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
min_num_negative_samples = np.array([1], dtype=np.float32)
classification_loss = self.execute(
graph_fn, [batch_cls_targets, cls_losses, unmatched_cls_losses])
# expected_foreground_sum = [0,0]
# expected_expected_j = [[0, 0], [0, 0]]
# expected_expected_negatives = [[2, 2], [2, 2]]
# expected_desired_negatives = [[0, 0], [0, 0]]
# expected_beta = [[0, 0],[0, 0]]
# expected_foreground_weights = [[0, 0], [0, 0]]
# expected_background_weights = [[.5, .5], [.5, .5]]
# expected_weighted_foreground_losses = [[0, 0], [0, 0]]
# expected_weighted_background_losses = [[5, 10], [15, 20]]
# expected_classification_loss_under_sampling = [15, 35]
expected_classification_loss_under_sampling = [
10 * .5 + 20 * .5, 30 * .5 + 40 * .5
]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllPositive(self):
def graph_fn(batch_cls_targets, cls_losses, unmatched_cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples)
batch_cls_targets = np.array(
[[[0, 1., 0], [0, 1., 0]], [[0, 1, 0], [0, 0, 1]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
unmatched_cls_losses = np.array([[10, 20], [30, 40]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
min_num_negative_samples = np.array([1], dtype=np.float32)
classification_loss = self.execute(
graph_fn, [batch_cls_targets, cls_losses, unmatched_cls_losses])
# expected_foreground_sum = [2,2]
# expected_expected_j = [[1, 1], [1, 1]]
# expected_expected_negatives = [[1, 1], [1, 1]]
# expected_desired_negatives = [[1, 1], [1, 1]]
# expected_beta = [[1, 1],[1, 1]]
# expected_foreground_weights = [[1, 1], [1, 1]]
# expected_background_weights = [[0, 0], [0, 0]]
# expected_weighted_foreground_losses = [[1, 2], [3, 4]]
# expected_weighted_background_losses = [[0, 0], [0, 0]]
# expected_classification_loss_under_sampling = [15, 35]
expected_classification_loss_under_sampling = [1 + 2, 3 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithSoftLabels(self):
def graph_fn(batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples)
batch_cls_targets = np.array([[[.75, .25, 0], [0.25, .75, 0], [.75, .25, 0],
[0.25, .75, 0], [1., 0, 0]]],
dtype=np.float32)
cls_losses = np.array([[1, 2, 3, 4, 5]], dtype=np.float32)
unmatched_cls_losses = np.array([[10, 20, 30, 40, 50]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
min_num_negative_samples = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn, [
batch_cls_targets, cls_losses, unmatched_cls_losses,
negative_to_positive_ratio, min_num_negative_samples
])
# expected_foreground_sum = [2]
# expected_expected_j = [[1.75, 1.25, 1.75, 1.25, 2]]
# expected_expected_negatives = [[3.25, 3.75, 3.25, 3.75, 3]]
# expected_desired_negatives = [[3.25, 2.5, 3.25, 2.5, 3]]
# expected_beta = [[1, 2/3, 1, 2/3, 1]]
# expected_foreground_weights = [[0.25, .75, .25, .75, 0]]
# expected_background_weights = [[[.75, 1/6., .75, 1/6., 1]]]
# expected_weighted_foreground_losses = [[.25*1, .75*2, .25*3, .75*4, 0*5]]
# expected_weighted_background_losses = [[
# .75*10, 1/6.*20, .75*30, 1/6.*40, 1*50]]
# expected_classification_loss_under_sampling = sum([
# .25*1, .75*2, .25*3, .75*4, 0, .75*10, 1/6.*20, .75*30,
# 1/6.*40, 1*50])
expected_classification_loss_under_sampling = [
sum([
.25 * 1, .75 * 2, .25 * 3, .75 * 4, 0, .75 * 10, 1 / 6. * 20,
.75 * 30, 1 / 6. * 40, 1 * 50
])
]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
if __name__ == '__main__':
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册