Commit 2f43cff2, authored by Xianzhi Du, committed by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 394772382
Parent bff5aad0
......@@ -52,6 +52,7 @@ In the near future, we will add:
| [Mask R-CNN](vision/beta/MODEL_GARDEN.md) | [Mask R-CNN](https://arxiv.org/abs/1703.06870) |
| [ShapeMask](vision/detection) | [ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors](https://arxiv.org/abs/1904.03239) |
| [SpineNet](vision/beta/MODEL_GARDEN.md) | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://arxiv.org/abs/1912.05027) |
| [Cascade RCNN-RS and RetinaNet-RS](vision/beta/MODEL_GARDEN.md) | [Simple Training Strategies and Model Scaling for Object Detection](https://arxiv.org/abs/2107.00057)|
### Natural Language Processing
......
......@@ -54,9 +54,12 @@ depth, label smoothing and dropout.
### Common Settings and Notes
* We provide models based on two detection frameworks, [RetinaNet](https://arxiv.org/abs/1708.02002)
or [Mask R-CNN](https://arxiv.org/abs/1703.06870), and two backbones, [ResNet-FPN](https://arxiv.org/abs/1612.03144)
or [SpineNet](https://arxiv.org/abs/1912.05027).
* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and
  [SpineNet](https://arxiv.org/abs/1912.05027) backbones based on detection frameworks:
    * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057)
    * [Mask R-CNN](https://arxiv.org/abs/1703.06870)
    * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057)
* Models are all trained on COCO train2017 and evaluated on COCO val2017.
* Training details:
* Models finetuned from ImageNet pretrained checkpoints adopt the 12 or 36
......@@ -99,13 +102,22 @@ depth, label smoothing and dropout.
### Instance Segmentation Baselines
#### Mask R-CNN (ImageNet pretrained)
#### Mask R-CNN (Trained from scratch)
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download |
| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:|
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | config |
| ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) |
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) |
| SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) |
| SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) |
#### Cascade RCNN-RS (Trained from scratch)
| Backbone | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download |
| ------------ |:----------:| ------:|-----------:|-------:|--------:|---------:|
| SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml) |
| SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml) |
## Semantic Segmentation
......@@ -131,7 +143,7 @@ depth, label smoothing and dropout.
### Common Settings and Notes
* We provide models for video classification with two backbones:
[SlowOnly](https://arxiv.org/abs/1812.03982) and 3D-ResNet (R3D) used in
[Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
* Training and evaluation details:
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Mask R-CNN configuration definition."""
"""R-CNN(-RS) configuration definition."""
import dataclasses
import os
......@@ -432,7 +432,7 @@ def maskrcnn_spinenet_coco() -> cfg.ExperimentConfig:
@exp_factory.register_config_factory('cascadercnn_spinenet_coco')
def cascadercnn_spinenet_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Cascade R-CNN with SpineNet backbone."""
"""COCO object detection with Cascade RCNN-RS with SpineNet backbone."""
steps_per_epoch = 463
coco_val_samples = 5000
train_batch_size = 256
......
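The constants in the hunk above (`steps_per_epoch = 463`, `train_batch_size = 256`) are consistent with one pass over COCO train2017 at a global batch size of 256. The following is a minimal sketch of that arithmetic, not the code from the config file. It assumes the commonly cited train2017 size of 118,287 images, which the diff itself does not state; the helper names are hypothetical.

```python
import math

# Assumption (not stated in the diff): COCO train2017 has 118,287 images.
COCO_TRAIN2017_IMAGES = 118_287


def steps_per_epoch(num_examples: int, global_batch_size: int) -> int:
    """Steps needed for one pass over the data, rounding the last batch up."""
    return math.ceil(num_examples / global_batch_size)


def total_train_steps(epochs: int, num_examples: int,
                      global_batch_size: int) -> int:
    """Total steps for a schedule expressed in epochs."""
    return epochs * steps_per_epoch(num_examples, global_batch_size)


if __name__ == "__main__":
    # 118287 / 256 = 462.06, so one epoch takes 463 steps.
    print(steps_per_epoch(COCO_TRAIN2017_IMAGES, 256))
    # A 500-epoch schedule, as listed for Cascade RCNN-RS in the tables.
    print(total_train_steps(500, COCO_TRAIN2017_IMAGES, 256))
```

Under these assumptions, a 500-epoch schedule works out to 231,500 steps.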
......@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask R-CNN model."""
"""R-CNN(-RS) models."""
from typing import Any, List, Mapping, Optional, Tuple, Union
......@@ -24,7 +24,7 @@ from official.vision.beta.ops import box_ops
@tf.keras.utils.register_keras_serializable(package='Vision')
class MaskRCNNModel(tf.keras.Model):
"""The Mask R-CNN model."""
"""The Mask R-CNN(-RS) and Cascade RCNN-RS models."""
def __init__(self,
backbone: tf.keras.Model,
......@@ -48,7 +48,7 @@ class MaskRCNNModel(tf.keras.Model):
aspect_ratios: Optional[List[float]] = None,
anchor_size: Optional[float] = None,
**kwargs):
"""Initializes the Mask R-CNN model.
"""Initializes the R-CNN(-RS) model.
Args:
backbone: `tf.keras.Model`, the backbone network.
......@@ -65,19 +65,18 @@ class MaskRCNNModel(tf.keras.Model):
mask_roi_aligner: the ROI aligner for mask prediction.
class_agnostic_bbox_pred: if True, perform class agnostic bounding box
prediction. Needs to be `True` for Cascade RCNN models.
cascade_class_ensemble: if True, ensemble classification scores over
all detection heads.
cascade_class_ensemble: if True, ensemble classification scores over all
detection heads.
min_level: Minimum level in output feature maps.
max_level: Maximum level in output feature maps.
num_scales: A number representing intermediate scales added
on each level. For instance, num_scales=2 adds one additional
intermediate anchor scale, giving [2^0, 2^0.5] on each level.
aspect_ratios: A list representing the aspect ratio
anchors added on each level. The number indicates the ratio of width to
height. For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors
on each scale level.
anchor_size: A number representing the scale of size of the base
anchor to the feature stride 2^level.
num_scales: A number representing intermediate scales added on each level.
For instance, num_scales=2 adds one additional intermediate anchor
scale, giving [2^0, 2^0.5] on each level.
aspect_ratios: A list representing the aspect ratio anchors added on each
level. The number indicates the ratio of width to height. For instance,
aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level.
anchor_size: A number representing the scale of size of the base anchor to
the feature stride 2^level.
**kwargs: keyword arguments to be passed.
"""
super(MaskRCNNModel, self).__init__(**kwargs)
......
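The `num_scales`, `aspect_ratios`, and `anchor_size` arguments documented above jointly determine the anchor shapes at each pyramid level. The sketch below illustrates one common reading of that docstring; it is not the Model Garden's anchor generator. It assumes the base anchor side equals `anchor_size * 2^level`, that intermediate scales are `2^(i / num_scales)`, and that an aspect ratio `r` (width over height) is applied area-preservingly as `sqrt(r)` and `1 / sqrt(r)`. The function name `anchor_shapes` is hypothetical.

```python
import math
from typing import List, Tuple


def anchor_shapes(level: int,
                  num_scales: int,
                  aspect_ratios: List[float],
                  anchor_size: float) -> List[Tuple[float, float]]:
    """Returns (width, height) for every anchor placed at one feature level.

    Assumed convention: the base side is anchor_size times the feature
    stride 2^level, and each aspect ratio keeps the anchor area fixed.
    """
    stride = 2 ** level
    base_side = anchor_size * stride
    shapes = []
    for i in range(num_scales):
        # num_scales=2 yields the scales [2^0, 2^0.5].
        side = base_side * 2 ** (i / num_scales)
        for ratio in aspect_ratios:
            width = side * math.sqrt(ratio)
            height = side / math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


if __name__ == "__main__":
    for w, h in anchor_shapes(level=3, num_scales=2,
                              aspect_ratios=[1.0, 2.0, 0.5], anchor_size=4.0):
        print(f"{w:.1f} x {h:.1f}")
```

With `num_scales=2` and three aspect ratios, this produces six anchor shapes per location on each level, matching the counts implied by the docstring's examples.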