Commit a86917df, authored by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 357758634
Parent 2fb3c898
......@@ -6,7 +6,7 @@ TF Vision model garden provides a large collection of baselines and checkpoints
## Image Classification
### ImageNet Baselines
#### Models trained with vanilla settings:
#### ResNet models trained with vanilla settings:
* Models are trained from scratch with batch size 4096 and 1.6 initial learning rate.
* Linear warmup is applied for the first 5 epochs.
* Models are trained with L2 weight regularization and ReLU activation.
......@@ -18,17 +18,27 @@ TF Vision model garden provides a large collection of baselines and checkpoints
| ResNet-101 | 224x224 | 200 | 78.3 | 94.2 | config |
| ResNet-152 | 224x224 | 200 | 78.7 | 94.3 | config |
#### Models trained with training features including:
* Label smoothing 0.1.
* Swish activation.
| model | resolution | epochs | Top-1 | Top-5 | download |
| ------------ |:-------------:| ---------:|--------:|---------:|---------:|
| ResNet-50 | 224x224 | 200 | 78.1 | 93.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-101 | 224x224 | 200 | 79.1 | 94.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) |
| ResNet-152 | 224x224 | 200 | 79.4 | 94.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) |
| ResNet-200 | 224x224 | 200 | 79.9 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet200_tpu.yaml) |
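
As a rough illustration of the "Label smoothing 0.1" setting listed above, the sketch below builds a smoothed one-hot target in plain Python. It assumes the common formulation in which the smoothing mass is spread uniformly over all classes; the helper name `smooth_labels` is hypothetical and not part of the Model Garden code.

```python
from typing import List


def smooth_labels(label: int, num_classes: int, smoothing: float = 0.1) -> List[float]:
    """Return a smoothed one-hot target for a single integer label.

    Assumed formulation: target = one_hot * (1 - smoothing) + smoothing / num_classes.
    """
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if i == label else off_value for i in range(num_classes)]


# Small example with 5 classes; the configs in this commit use 1001 classes.
target = smooth_labels(label=2, num_classes=5, smoothing=0.1)
```

The true class keeps most of the probability mass and every other class receives a small, equal share, so the target still sums to 1.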
#### ResNet-RS models trained with settings including:
* ResNet-RS architectural changes and Swish activation.
* Regularization methods including Random Augment, 4e-5 weight decay, stochastic depth, label smoothing and dropout.
* New training methods including a 350-epoch schedule, a cosine learning rate
schedule and an exponential moving average (EMA) of the weights.
* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification)
model | resolution | params (M) | Top-1 | Top-5 | download
--------- | :--------: | -----: | ----: | ----: | -------:
ResNet-RS-50 | 160x160 | 35.7 | 79.1 | 94.5 |
ResNet-RS-101 | 160x160 | 63.7 | 80.2 | 94.9 |
ResNet-RS-101 | 192x192 | 63.7 | 81.3 | 95.6 |
ResNet-RS-152 | 192x192 | 86.8 | 81.9 | 95.8 |
ResNet-RS-152 | 224x224 | 86.8 | 82.5 | 96.1 |
ResNet-RS-152 | 256x256 | 86.8 | 83.1 | 96.3 |
ResNet-RS-200 | 256x256 | 93.4 | 83.5 | 96.6 |
ResNet-RS-270 | 256x256 | 130.1 | 83.6 | 96.6 |
ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 |
ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 |
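
The 350-epoch schedule can be related to the step counts that appear in the experiment configs (`train_steps: 109200`, `warmup_steps: 1560`) with a little arithmetic. The sketch below assumes the ImageNet-2012 training split has 1,281,167 examples; that constant is an assumption here and is not stated in this commit.

```python
# Assumed size of the ImageNet-2012 training split (not stated in this commit).
IMAGENET_TRAIN_EXAMPLES = 1281167
GLOBAL_BATCH_SIZE = 4096  # global_batch_size in the configs

# Integer division mirrors how steps per epoch are usually derived.
steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // GLOBAL_BATCH_SIZE

# 350 epochs for ResNet-RS versus 200 epochs for the earlier baselines.
resnet_rs_train_steps = 350 * steps_per_epoch
baseline_train_steps = 200 * steps_per_epoch

# Warmup length expressed in epochs.
warmup_epochs = 1560 / steps_per_epoch

print(steps_per_epoch, resnet_rs_train_steps, baseline_train_steps, warmup_epochs)
```

Under that assumption, the step interval of 312 used for validation, summaries and checkpoints corresponds to one epoch, and the 1560 warmup steps correspond to 5 epochs.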
## Object Detection and Instance Segmentation
......
# ResNet-300 ImageNet classification. 82.6% top-1 and 96.3% top-5 accuracy.
# ResNet-RS-101 ImageNet classification. 80.2% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [380, 380, 3]
input_size: [160, 160, 3]
backbone:
type: 'resnet'
resnet:
model_id: 300
stem_type: 'v1'
model_id: 101
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stochastic_depth_drop_rate: 0.2
stem_type: 'v1'
stochastic_depth_drop_rate: 0.0
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.0001
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
......@@ -24,6 +28,8 @@ task:
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
......@@ -31,13 +37,15 @@ task:
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 62400
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
......@@ -46,7 +54,7 @@ trainer:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 62400
decay_steps: 109200
warmup:
type: 'linear'
linear:
......
# ResNet-RS-101 ImageNet classification. 81.3% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [192, 192, 3]
backbone:
type: 'resnet'
resnet:
model_id: 101
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.0
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
# ResNet-RS-152 ImageNet classification. 81.9% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [192, 192, 3]
backbone:
type: 'resnet'
resnet:
model_id: 152
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.0
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
# ResNet-200 ImageNet classification. 79.9% top-1 and 94.8% top-5 accuracy.
# ResNet-RS-152 ImageNet classification. 82.5% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
......@@ -9,11 +9,18 @@ task:
backbone:
type: 'resnet'
resnet:
model_id: 200
model_id: 152
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.0
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.0001
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
......@@ -21,6 +28,8 @@ task:
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
......@@ -28,13 +37,15 @@ task:
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 62400
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
......@@ -43,7 +54,7 @@ trainer:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 62400
decay_steps: 109200
warmup:
type: 'linear'
linear:
......
# ResNet-RS-152 ImageNet classification. 83.1% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [256, 256, 3]
backbone:
type: 'resnet'
resnet:
model_id: 152
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.0
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
# ResNet-RS-200 ImageNet classification. 83.5% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [256, 256, 3]
backbone:
type: 'resnet'
resnet:
model_id: 200
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.1
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
# ResNet-RS-270 ImageNet classification. 83.6% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [256, 256, 3]
backbone:
type: 'resnet'
resnet:
model_id: 270
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.1
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
# ResNet-RS-350 ImageNet classification. 83.7% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [256, 256, 3]
backbone:
type: 'resnet'
resnet:
model_id: 350
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.1
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
# ResNet-RS-350 ImageNet classification. 84.2% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [320, 320, 3]
backbone:
type: 'resnet'
resnet:
model_id: 350
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.1
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
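
The `ema` block with `average_decay: 0.9999` keeps an exponential moving average of the model weights. A minimal sketch of the basic update rule is shown below in plain Python; the function name `ema_update` is hypothetical, and the actual optimizer in the library may add details (for example a step-dependent decay) that are not reproduced here.

```python
from typing import List


def ema_update(shadow: List[float], weights: List[float], decay: float = 0.9999) -> List[float]:
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Toy example: the averaged copy moves only slightly toward the current weights.
shadow = [0.0, 1.0]
weights = [1.0, 1.0]
shadow = ema_update(shadow, weights)
```

With a decay this close to 1, the averaged weights change slowly, which is why they are typically used for evaluation rather than for the gradient updates themselves.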
# ResNet-350 ImageNet classification. 84.2% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
......@@ -9,14 +8,16 @@ task:
backbone:
type: 'resnet'
resnet:
model_id: 350
depth_multiplier: 1.25
stem_type: 'v1'
model_id: 420
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stochastic_depth_drop_rate: 0.2
stem_type: 'v1'
stochastic_depth_drop_rate: 0.1
norm_activation:
activation: 'swish'
dropout_rate: 0.5
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
......@@ -27,6 +28,7 @@ task:
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 15
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
......@@ -41,6 +43,8 @@ trainer:
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
......@@ -53,4 +57,4 @@ trainer:
warmup:
type: 'linear'
linear:
warmup_steps: 5000
warmup_steps: 1560
# ResNet-RS-50 ImageNet classification. 79.1% top-1 accuracy.
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [160, 160, 3]
backbone:
type: 'resnet'
resnet:
model_id: 50
replace_stem_max_pool: true
resnetd_shortcut: true
se_ratio: 0.25
stem_type: 'v1'
stochastic_depth_drop_rate: 0.0
norm_activation:
activation: 'swish'
norm_momentum: 0.0
use_sync_bn: false
losses:
l2_weight_decay: 0.00004
one_hot: true
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 4096
dtype: 'bfloat16'
aug_policy: 'randaug'
randaug_magnitude: 10
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: false
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: false
trainer:
train_steps: 109200
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
ema:
average_decay: 0.9999
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'cosine'
cosine:
initial_learning_rate: 1.6
decay_steps: 109200
warmup:
type: 'linear'
linear:
warmup_steps: 1560
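
To make the `learning_rate` and `warmup` settings in these configs more concrete, here is a minimal sketch of a cosine decay schedule with linear warmup, using the values `initial_learning_rate: 1.6`, `decay_steps: 109200` and `warmup_steps: 1560`. It is one common formulation written in plain Python, not the library's implementation, and the function name `learning_rate_at` is hypothetical.

```python
import math


def cosine_lr(step: int, initial_lr: float, decay_steps: int) -> float:
    """Cosine decay from initial_lr at step 0 down to 0 at decay_steps."""
    progress = min(step, decay_steps) / decay_steps
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * progress))


def learning_rate_at(step: int,
                     initial_lr: float = 1.6,
                     decay_steps: int = 109200,
                     warmup_steps: int = 1560) -> float:
    """Linear warmup from 0 to the cosine value at warmup_steps, then cosine decay."""
    if step < warmup_steps:
        # Assumption: warmup ramps toward the value the cosine schedule
        # would have at the end of the warmup period.
        target = cosine_lr(warmup_steps, initial_lr, decay_steps)
        return target * step / warmup_steps
    return cosine_lr(step, initial_lr, decay_steps)


# Sample a few points of the schedule.
for s in (0, 780, 1560, 54600, 109200):
    print(s, round(learning_rate_at(s), 4))
```

The warmup ramps the rate up over the first 1560 steps, after which the rate follows the cosine curve down to zero at the final training step.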
......@@ -35,6 +35,7 @@ class DataConfig(cfg.DataConfig):
shuffle_buffer_size: int = 10000
cycle_length: int = 10
aug_policy: Optional[str] = None # None, 'autoaug', or 'randaug'
randaug_magnitude: Optional[int] = 10
file_type: str = 'tfrecord'
......@@ -184,13 +185,17 @@ def image_classification_imagenet_resnetrs() -> cfg.ExperimentConfig:
stochastic_depth_drop_rate=0.0)),
dropout_rate=0.25,
norm_activation=common.NormActivation(
norm_momentum=0.0, norm_epsilon=1e-5, use_sync_bn=False)),
norm_momentum=0.0,
norm_epsilon=1e-5,
use_sync_bn=False,
activation='swish')),
losses=Losses(l2_weight_decay=4e-5, label_smoothing=0.1),
train_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size,
aug_policy='randaug'),
aug_policy='randaug',
randaug_magnitude=10),
validation_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
is_training=False,
......@@ -199,7 +204,7 @@ def image_classification_imagenet_resnetrs() -> cfg.ExperimentConfig:
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
train_steps=360 * steps_per_epoch,
train_steps=350 * steps_per_epoch,
validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
......@@ -215,8 +220,8 @@ def image_classification_imagenet_resnetrs() -> cfg.ExperimentConfig:
'learning_rate': {
'type': 'cosine',
'cosine': {
'initial_learning_rate': 0.1,
'decay_steps': 360 * steps_per_epoch
'initial_learning_rate': 1.6,
'decay_steps': 350 * steps_per_epoch
}
},
'warmup': {
......
......@@ -49,6 +49,7 @@ class Parser(parser.Parser):
num_classes: float,
aug_rand_hflip: bool = True,
aug_policy: Optional[str] = None,
randaug_magnitude: Optional[int] = 10,
dtype: str = 'float32'):
"""Initializes parameters for parsing annotations in the dataset.
......@@ -59,6 +60,7 @@ class Parser(parser.Parser):
aug_rand_hflip: `bool`, if True, augment training with random
horizontal flip.
aug_policy: `str`, augmentation policies. None, 'autoaug', or 'randaug'.
randaug_magnitude: `int`, magnitude of the randaugment policy.
dtype: `str`, cast output image in dtype. It can be 'float32', 'float16',
or 'bfloat16'.
"""
......@@ -77,7 +79,8 @@ class Parser(parser.Parser):
if aug_policy == 'autoaug':
self._augmenter = augment.AutoAugment()
elif aug_policy == 'randaug':
self._augmenter = augment.RandAugment(num_layers=2, magnitude=20)
self._augmenter = augment.RandAugment(
num_layers=2, magnitude=randaug_magnitude)
else:
raise ValueError(
'Augmentation policy {} not supported.'.format(aug_policy))
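
The effect of this hunk is that the RandAugment magnitude, previously hard-coded to 20, now comes from the `randaug_magnitude` argument. The sketch below reproduces that selection logic in a self-contained form. The `AutoAugment` and `RandAugment` classes here are hypothetical stand-ins for the `augment` module (they only record their arguments), and `build_augmenter` is an illustrative helper, not a function from the repository.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class AutoAugment:
    """Hypothetical stand-in for augment.AutoAugment."""


@dataclass
class RandAugment:
    """Hypothetical stand-in for augment.RandAugment; stores its settings only."""
    num_layers: int = 2
    magnitude: float = 10


def build_augmenter(
    aug_policy: Optional[str],
    randaug_magnitude: Optional[int] = 10,
) -> Optional[Union[AutoAugment, RandAugment]]:
    """Pick an augmenter from the policy name, passing the magnitude through."""
    if aug_policy is None:
        return None  # no augmentation policy configured
    if aug_policy == 'autoaug':
        return AutoAugment()
    if aug_policy == 'randaug':
        return RandAugment(num_layers=2, magnitude=randaug_magnitude)
    raise ValueError(
        'Augmentation policy {} not supported.'.format(aug_policy))


# The larger ResNet-RS configs in this commit set randaug_magnitude to 15.
augmenter = build_augmenter('randaug', randaug_magnitude=15)
```

Exposing the magnitude as a config field lets each experiment YAML choose its own augmentation strength, as the ResNet-RS-50 config (magnitude 10) and the larger configs (magnitude 15) do.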
......
......@@ -93,6 +93,7 @@ class ImageClassificationTask(base_task.Task):
output_size=input_size[:2],
num_classes=num_classes,
aug_policy=params.aug_policy,
randaug_magnitude=params.randaug_magnitude,
dtype=params.dtype)
reader = input_reader.InputReader(
......