diff --git a/official/vision/image_classification/README.md b/official/vision/image_classification/README.md
index 8e2edbf91888ec916231f66fea53f4887352a6c5..c34b48a4f847613132a7b36fa4f1ccd110882143 100644
--- a/official/vision/image_classification/README.md
+++ b/official/vision/image_classification/README.md
@@ -1,185 +1,3 @@
-# Image Classification
-
-**Warning:** the features in the `image_classification/` folder have been fully
-integrated into vision/beta. Please use the [new code base](../beta/README.md).
-
-This folder contains TF 2.0 model examples for image classification:
-
-* [MNIST](#mnist)
-* [Classifier Trainer](#classifier-trainer), a framework that uses the Keras
-compile/fit methods for image classification models, including:
-  * ResNet
-  * EfficientNet[^1]
-
-[^1]: Currently a work in progress. We cannot match "AutoAugment (AA)" in [the original version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet).
-For more information about other types of models, please refer to this
-[README file](../../README.md).
-
-## Before you begin
-Please make sure that you have the latest version of TensorFlow
-installed and
-[add the models folder to your Python path](/official/#running-the-models).
-
-### ImageNet preparation
-
-#### Using TFDS
-`classifier_trainer.py` supports ImageNet with
-[TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets/overview).
-
-Please see the following [example snippet](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/download_and_prepare.py)
-for more information on how to use TFDS to download and prepare datasets, and
-specifically the [TFDS ImageNet readme](https://github.com/tensorflow/datasets/blob/master/docs/catalog/imagenet2012.md)
-for manual download instructions.
-
-#### Legacy TFRecords
-Download the ImageNet dataset and convert it to TFRecord format.
-The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py)
-and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
-provide a few options.
-
-Note that the legacy ResNet runners, e.g.
-[resnet/resnet_ctl_imagenet_main.py](resnet/resnet_ctl_imagenet_main.py),
-require TFRecords, whereas `classifier_trainer.py` can use either, by setting
-the builder to 'records' or 'tfds' in the configurations.
-
-### Running on Cloud TPUs
-
-Note: These models will **not** work with TPUs on Colab.
-
-You can train image classification models on Cloud TPUs using
-[tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf.distribute.TPUStrategy?version=nightly).
-If you are not familiar with Cloud TPUs, it is strongly recommended that you go
-through the
-[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
-create a TPU and GCE VM.
-
-### Running on multiple GPU hosts
-
-You can also train these models on multiple hosts, each with GPUs, using
-[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).
-
-The easiest way to run multi-host benchmarks is to set the
-[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
-appropriately at each host: e.g., to run using `MultiWorkerMirroredStrategy` on
-2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and
-host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker",
-"index": i}`.
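-
-For example, on a hypothetical two-host cluster (the addresses and port below
-are placeholders), each host could export its own `TF_CONFIG` before launching
-the trainer:
-
-```bash
-# On host 0:
-export TF_CONFIG='{"cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]}, "task": {"type": "worker", "index": 0}}'
-# On host 1, the cluster list is identical; only the task index changes:
-export TF_CONFIG='{"cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]}, "task": {"type": "worker", "index": 1}}'
-```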
-
-`MultiWorkerMirroredStrategy` will automatically use all the
-available GPUs at each host.
-
-## MNIST
-
-To download the data and run the MNIST sample model locally for the first time,
-run the following command:
-
-```bash
-python3 mnist_main.py \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --train_epochs=10 \
-  --distribution_strategy=one_device \
-  --num_gpus=$NUM_GPUS \
-  --download
-```
-
-To train the model on a Cloud TPU, run the following command:
-
-```bash
-python3 mnist_main.py \
-  --tpu=$TPU_NAME \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --train_epochs=10 \
-  --distribution_strategy=tpu \
-  --download
-```
-
-Note: the `--download` flag is only required the first time you run the model.
-
-
-## Classifier Trainer
-The classifier trainer is a unified framework for running image classification
-models using Keras's compile/fit methods. Experiments should be provided in the
-form of YAML files; some examples are included within the configs/examples
-folder. Please see [configs/examples](./configs/examples) for more example
-configurations.
-
-The provided configuration files use a per-replica batch size, which is scaled
-by the number of devices. For instance, if `batch_size` = 64, then for 1 GPU
-the global batch size would be 64 * 1 = 64. For 8 GPUs, the global batch size
-would be 64 * 8 = 512. Similarly, for a v3-8 TPU, the global batch size would
-be 64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048.
-
-### ResNet50
-
-#### On GPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=resnet \
-  --dataset=imagenet \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
-  --params_override='runtime.num_gpus=$NUM_GPUS'
-```
-
-To train on multiple hosts, each with GPUs attached, using
-[MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy),
-please update the `runtime` section in gpu.yaml
-(or override using `--params_override`) with:
-
-```YAML
-# gpu.yaml
-runtime:
-  distribution_strategy: 'multi_worker_mirrored'
-  worker_hosts: '$HOST1:port,$HOST2:port'
-  num_gpus: $NUM_GPUS
-  task_index: 0
-```
-
-Set `task_index: 0` on the first host, `task_index: 1` on the second, and so
-on. `$HOST1` and `$HOST2` are the IP addresses of the hosts, and `port` can be
-any free port on the hosts. Only the first host will write TensorBoard
-summaries and save checkpoints.
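-
-As a sketch (assuming the YAML above and the same flags as the single-host GPU
-example), the second host would then be launched with only its task index
-overridden:
-
-```bash
-# On $HOST2 (task_index 1); the first host keeps the YAML's task_index: 0.
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=resnet \
-  --dataset=imagenet \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
-  --params_override='runtime.task_index=1'
-```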
-
-#### On TPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=resnet \
-  --dataset=imagenet \
-  --tpu=$TPU_NAME \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/resnet/imagenet/tpu.yaml
-```
-
-### EfficientNet
-**Note: EfficientNet development is a work in progress.**
-#### On GPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=efficientnet \
-  --dataset=imagenet \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml \
-  --params_override='runtime.num_gpus=$NUM_GPUS'
-```
-
-
-#### On TPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=efficientnet \
-  --dataset=imagenet \
-  --tpu=$TPU_NAME \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
-```
-
-Note that the number of GPU devices can be overridden on the command line using
-`--params_override`. The TPU does not need this override, as the device is
-fixed by providing the TPU address or name with the `--tpu` flag.
-
+This repository is deprecated and has been replaced by the implementations in
+vision/beta/. All the content has been moved to
+[official/legacy/image_classification](https://github.com/tensorflow/models/tree/master/official/legacy/image_classification).
diff --git a/official/vision/image_classification/__init__.py b/official/vision/image_classification/__init__.py
index e419af524b5f349fe04abfa820c3cb51b777d422..f8cba89ac32e6b894aeace22f6190d88f2f724df 100644
--- a/official/vision/image_classification/__init__.py
+++ b/official/vision/image_classification/__init__.py
@@ -12,3 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+"""Deprecating the vision/image_classification folder."""
+raise ImportError(
+    'This module has been moved to official/legacy/image_classification')
diff --git a/official/vision/image_classification/augment.py b/official/vision/image_classification/augment.py
deleted file mode 100644
index f322d31dac6ecc1e282566134720d42261a9b7fc..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/augment.py
+++ /dev/null
@@ -1,985 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""AutoAugment and RandAugment policies for enhanced image preprocessing.
-
-AutoAugment Reference: https://arxiv.org/abs/1805.09501
-RandAugment Reference: https://arxiv.org/abs/1909.13719
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import math
-from typing import Any, Dict, List, Optional, Text, Tuple
-
-from keras.layers.preprocessing import image_preprocessing as image_ops
-import tensorflow as tf
-
-
-# This signifies the max integer that the controller RNN could predict for the
-# augmentation scheme.
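-# Levels in [0, _MAX_LEVEL] are rescaled into each op's own argument range by
-# the *_level_to_arg helpers defined below.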
-_MAX_LEVEL = 10.
-
-
-def to_4d(image: tf.Tensor) -> tf.Tensor:
-  """Converts an input Tensor to 4 dimensions.
-
-  4D image => [N, H, W, C] or [N, C, H, W]
-  3D image => [1, H, W, C] or [1, C, H, W]
-  2D image => [1, H, W, 1]
-
-  Args:
-    image: The 2/3/4D input tensor.
-
-  Returns:
-    A 4D image tensor.
-
-  Raises:
-    `TypeError` if `image` is not a 2/3/4D tensor.
-
-  """
-  shape = tf.shape(image)
-  original_rank = tf.rank(image)
-  left_pad = tf.cast(tf.less_equal(original_rank, 3), dtype=tf.int32)
-  right_pad = tf.cast(tf.equal(original_rank, 2), dtype=tf.int32)
-  new_shape = tf.concat(
-      [
-          tf.ones(shape=left_pad, dtype=tf.int32),
-          shape,
-          tf.ones(shape=right_pad, dtype=tf.int32),
-      ],
-      axis=0,
-  )
-  return tf.reshape(image, new_shape)
-
-
-def from_4d(image: tf.Tensor, ndims: tf.Tensor) -> tf.Tensor:
-  """Converts a 4D image back to `ndims` rank."""
-  shape = tf.shape(image)
-  begin = tf.cast(tf.less_equal(ndims, 3), dtype=tf.int32)
-  end = 4 - tf.cast(tf.equal(ndims, 2), dtype=tf.int32)
-  new_shape = shape[begin:end]
-  return tf.reshape(image, new_shape)
-
-
-def _convert_translation_to_transform(translations: tf.Tensor) -> tf.Tensor:
-  """Converts translations to a projective transform.
-
-  The translation matrix looks like this:
-    [[1 0 -dx]
-     [0 1 -dy]
-     [0 0 1]]
-
-  Args:
-    translations: The 2-element list representing [dx, dy], or a matrix of
-      2-element lists representing [dx, dy] to translate for each image. The
-      shape must be static.
-
-  Returns:
-    The transformation matrix of shape (num_images, 8).
-
-  Raises:
-    `TypeError` if
-      - the shape of `translations` is not known or
-      - the shape of `translations` is not rank 1 or 2.
-
-  """
-  translations = tf.convert_to_tensor(translations, dtype=tf.float32)
-  if translations.get_shape().ndims is None:
-    raise TypeError('translations rank must be statically known')
-  elif len(translations.get_shape()) == 1:
-    translations = translations[None]
-  elif len(translations.get_shape()) != 2:
-    raise TypeError('translations should have rank 1 or 2.')
-  num_translations = tf.shape(translations)[0]
-
-  return tf.concat(
-      values=[
-          tf.ones((num_translations, 1), tf.dtypes.float32),
-          tf.zeros((num_translations, 1), tf.dtypes.float32),
-          -translations[:, 0, None],
-          tf.zeros((num_translations, 1), tf.dtypes.float32),
-          tf.ones((num_translations, 1), tf.dtypes.float32),
-          -translations[:, 1, None],
-          tf.zeros((num_translations, 2), tf.dtypes.float32),
-      ],
-      axis=1,
-  )
-
-
-def _convert_angles_to_transform(angles: tf.Tensor, image_width: tf.Tensor,
-                                 image_height: tf.Tensor) -> tf.Tensor:
-  """Converts an angle or angles to a projective transform.
-
-  Args:
-    angles: A scalar angle to rotate all images by, or a vector of angles to
-      rotate a batch of images by.
-    image_width: The width of the image(s) to be transformed.
-    image_height: The height of the image(s) to be transformed.
-
-  Returns:
-    A tensor of shape (num_images, 8).
-
-  Raises:
-    `TypeError` if `angles` is not rank 0 or 1.
- - """ - angles = tf.convert_to_tensor(angles, dtype=tf.float32) - if len(angles.get_shape()) == 0: # pylint:disable=g-explicit-length-test - angles = angles[None] - elif len(angles.get_shape()) != 1: - raise TypeError('Angles should have a rank 0 or 1.') - x_offset = ((image_width - 1) - - (tf.math.cos(angles) * (image_width - 1) - tf.math.sin(angles) * - (image_height - 1))) / 2.0 - y_offset = ((image_height - 1) - - (tf.math.sin(angles) * (image_width - 1) + tf.math.cos(angles) * - (image_height - 1))) / 2.0 - num_angles = tf.shape(angles)[0] - return tf.concat( - values=[ - tf.math.cos(angles)[:, None], - -tf.math.sin(angles)[:, None], - x_offset[:, None], - tf.math.sin(angles)[:, None], - tf.math.cos(angles)[:, None], - y_offset[:, None], - tf.zeros((num_angles, 2), tf.dtypes.float32), - ], - axis=1, - ) - - -def transform(image: tf.Tensor, transforms) -> tf.Tensor: - """Prepares input data for `image_ops.transform`.""" - original_ndims = tf.rank(image) - transforms = tf.convert_to_tensor(transforms, dtype=tf.float32) - if transforms.shape.rank == 1: - transforms = transforms[None] - image = to_4d(image) - image = image_ops.transform( - images=image, transforms=transforms, interpolation='nearest') - return from_4d(image, original_ndims) - - -def translate(image: tf.Tensor, translations) -> tf.Tensor: - """Translates image(s) by provided vectors. - - Args: - image: An image Tensor of type uint8. - translations: A vector or matrix representing [dx dy]. - - Returns: - The translated version of the image. - - """ - transforms = _convert_translation_to_transform(translations) - return transform(image, transforms=transforms) - - -def rotate(image: tf.Tensor, degrees: float) -> tf.Tensor: - """Rotates the image by degrees either clockwise or counterclockwise. - - Args: - image: An image Tensor of type uint8. - degrees: Float, a scalar angle in degrees to rotate all images by. If - degrees is positive the image will be rotated clockwise otherwise it will - be rotated counterclockwise. - - Returns: - The rotated version of image. - - """ - # Convert from degrees to radians. - degrees_to_radians = math.pi / 180.0 - radians = tf.cast(degrees * degrees_to_radians, tf.float32) - - original_ndims = tf.rank(image) - image = to_4d(image) - - image_height = tf.cast(tf.shape(image)[1], tf.float32) - image_width = tf.cast(tf.shape(image)[2], tf.float32) - transforms = _convert_angles_to_transform( - angles=radians, image_width=image_width, image_height=image_height) - # In practice, we should randomize the rotation degrees by flipping - # it negatively half the time, but that's done on 'degrees' outside - # of the function. - image = transform(image, transforms=transforms) - return from_4d(image, original_ndims) - - -def blend(image1: tf.Tensor, image2: tf.Tensor, factor: float) -> tf.Tensor: - """Blend image1 and image2 using 'factor'. - - Factor can be above 0.0. A value of 0.0 means only image1 is used. - A value of 1.0 means only image2 is used. A value between 0.0 and - 1.0 means we linearly interpolate the pixel values between the two - images. A value greater than 1.0 "extrapolates" the difference - between the two pixel values, and we clip the results to values - between 0 and 255. - - Args: - image1: An image Tensor of type uint8. - image2: An image Tensor of type uint8. - factor: A floating point value above 0.0. - - Returns: - A blended image Tensor of type uint8. 
- """ - if factor == 0.0: - return tf.convert_to_tensor(image1) - if factor == 1.0: - return tf.convert_to_tensor(image2) - - image1 = tf.cast(image1, tf.float32) - image2 = tf.cast(image2, tf.float32) - - difference = image2 - image1 - scaled = factor * difference - - # Do addition in float. - temp = tf.cast(image1, tf.float32) + scaled - - # Interpolate - if factor > 0.0 and factor < 1.0: - # Interpolation means we always stay within 0 and 255. - return tf.cast(temp, tf.uint8) - - # Extrapolate: - # - # We need to clip and then cast. - return tf.cast(tf.clip_by_value(temp, 0.0, 255.0), tf.uint8) - - -def cutout(image: tf.Tensor, pad_size: int, replace: int = 0) -> tf.Tensor: - """Apply cutout (https://arxiv.org/abs/1708.04552) to image. - - This operation applies a (2*pad_size x 2*pad_size) mask of zeros to - a random location within `img`. The pixel values filled in will be of the - value `replace`. The located where the mask will be applied is randomly - chosen uniformly over the whole image. - - Args: - image: An image Tensor of type uint8. - pad_size: Specifies how big the zero mask that will be generated is that is - applied to the image. The mask will be of size (2*pad_size x 2*pad_size). - replace: What pixel value to fill in the image in the area that has the - cutout mask applied to it. - - Returns: - An image Tensor that is of type uint8. - """ - image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - - # Sample the center location in the image where the zero mask will be applied. - cutout_center_height = tf.random.uniform( - shape=[], minval=0, maxval=image_height, dtype=tf.int32) - - cutout_center_width = tf.random.uniform( - shape=[], minval=0, maxval=image_width, dtype=tf.int32) - - lower_pad = tf.maximum(0, cutout_center_height - pad_size) - upper_pad = tf.maximum(0, image_height - cutout_center_height - pad_size) - left_pad = tf.maximum(0, cutout_center_width - pad_size) - right_pad = tf.maximum(0, image_width - cutout_center_width - pad_size) - - cutout_shape = [ - image_height - (lower_pad + upper_pad), - image_width - (left_pad + right_pad) - ] - padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]] - mask = tf.pad( - tf.zeros(cutout_shape, dtype=image.dtype), - padding_dims, - constant_values=1) - mask = tf.expand_dims(mask, -1) - mask = tf.tile(mask, [1, 1, 3]) - image = tf.where( - tf.equal(mask, 0), - tf.ones_like(image, dtype=image.dtype) * replace, image) - return image - - -def solarize(image: tf.Tensor, threshold: int = 128) -> tf.Tensor: - # For each pixel in the image, select the pixel - # if the value is less than the threshold. - # Otherwise, subtract 255 from the pixel. - return tf.where(image < threshold, image, 255 - image) - - -def solarize_add(image: tf.Tensor, - addition: int = 0, - threshold: int = 128) -> tf.Tensor: - # For each pixel in the image less than threshold - # we add 'addition' amount to it and then clip the - # pixel value to be between 0 and 255. The value - # of 'addition' is between -128 and 128. 
- added_image = tf.cast(image, tf.int64) + addition - added_image = tf.cast(tf.clip_by_value(added_image, 0, 255), tf.uint8) - return tf.where(image < threshold, added_image, image) - - -def color(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Color.""" - degenerate = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image)) - return blend(degenerate, image, factor) - - -def contrast(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Contrast.""" - degenerate = tf.image.rgb_to_grayscale(image) - # Cast before calling tf.histogram. - degenerate = tf.cast(degenerate, tf.int32) - - # Compute the grayscale histogram, then compute the mean pixel value, - # and create a constant image size of that value. Use that as the - # blending degenerate target of the original image. - hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256) - mean = tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0 - degenerate = tf.ones_like(degenerate, dtype=tf.float32) * mean - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.image.grayscale_to_rgb(tf.cast(degenerate, tf.uint8)) - return blend(degenerate, image, factor) - - -def brightness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Brightness.""" - degenerate = tf.zeros_like(image) - return blend(degenerate, image, factor) - - -def posterize(image: tf.Tensor, bits: int) -> tf.Tensor: - """Equivalent of PIL Posterize.""" - shift = 8 - bits - return tf.bitwise.left_shift(tf.bitwise.right_shift(image, shift), shift) - - -def wrapped_rotate(image: tf.Tensor, degrees: float, replace: int) -> tf.Tensor: - """Applies rotation with wrap/unwrap.""" - image = rotate(wrap(image), degrees=degrees) - return unwrap(image, replace) - - -def translate_x(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in X dimension.""" - image = translate(wrap(image), [-pixels, 0]) - return unwrap(image, replace) - - -def translate_y(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in Y dimension.""" - image = translate(wrap(image), [0, -pixels]) - return unwrap(image, replace) - - -def shear_x(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in X dimension.""" - # Shear parallel to x axis is a projective transform - # with a matrix form of: - # [1 level - # 0 1]. - image = transform( - image=wrap(image), transforms=[1., level, 0., 0., 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def shear_y(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in Y dimension.""" - # Shear parallel to y axis is a projective transform - # with a matrix form of: - # [1 0 - # level 1]. - image = transform( - image=wrap(image), transforms=[1., 0., 0., level, 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def autocontrast(image: tf.Tensor) -> tf.Tensor: - """Implements Autocontrast function from PIL using TF ops. - - Args: - image: A 3D uint8 tensor. - - Returns: - The image after it has had autocontrast applied to it and will be of type - uint8. - """ - - def scale_channel(image: tf.Tensor) -> tf.Tensor: - """Scale the 2D image using the autocontrast rule.""" - # A possibly cheaper version can be done using cumsum/unique_with_counts - # over the histogram values, rather than iterating over the entire image. - # to compute mins and maxes. 
- lo = tf.cast(tf.reduce_min(image), tf.float32) - hi = tf.cast(tf.reduce_max(image), tf.float32) - - # Scale the image, making the lowest value 0 and the highest value 255. - def scale_values(im): - scale = 255.0 / (hi - lo) - offset = -lo * scale - im = tf.cast(im, tf.float32) * scale + offset - im = tf.clip_by_value(im, 0.0, 255.0) - return tf.cast(im, tf.uint8) - - result = tf.cond(hi > lo, lambda: scale_values(image), lambda: image) - return result - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. - s1 = scale_channel(image[:, :, 0]) - s2 = scale_channel(image[:, :, 1]) - s3 = scale_channel(image[:, :, 2]) - image = tf.stack([s1, s2, s3], 2) - return image - - -def sharpness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Implements Sharpness function from PIL using TF ops.""" - orig_image = image - image = tf.cast(image, tf.float32) - # Make image 4D for conv operation. - image = tf.expand_dims(image, 0) - # SMOOTH PIL Kernel. - kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], - dtype=tf.float32, - shape=[3, 3, 1, 1]) / 13. - # Tile across channel dimension. - kernel = tf.tile(kernel, [1, 1, 3, 1]) - strides = [1, 1, 1, 1] - degenerate = tf.nn.depthwise_conv2d( - image, kernel, strides, padding='VALID', dilations=[1, 1]) - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.squeeze(tf.cast(degenerate, tf.uint8), [0]) - - # For the borders of the resulting image, fill in the values of the - # original image. - mask = tf.ones_like(degenerate) - padded_mask = tf.pad(mask, [[1, 1], [1, 1], [0, 0]]) - padded_degenerate = tf.pad(degenerate, [[1, 1], [1, 1], [0, 0]]) - result = tf.where(tf.equal(padded_mask, 1), padded_degenerate, orig_image) - - # Blend the final result. - return blend(result, orig_image, factor) - - -def equalize(image: tf.Tensor) -> tf.Tensor: - """Implements Equalize function from PIL using TF ops.""" - - def scale_channel(im, c): - """Scale the data in the channel to implement equalize.""" - im = tf.cast(im[:, :, c], tf.int32) - # Compute the histogram of the image channel. - histo = tf.histogram_fixed_width(im, [0, 255], nbins=256) - - # For the purposes of computing the step, filter out the nonzeros. - nonzero = tf.where(tf.not_equal(histo, 0)) - nonzero_histo = tf.reshape(tf.gather(histo, nonzero), [-1]) - step = (tf.reduce_sum(nonzero_histo) - nonzero_histo[-1]) // 255 - - def build_lut(histo, step): - # Compute the cumulative sum, shifting by step // 2 - # and then normalization by step. - lut = (tf.cumsum(histo) + (step // 2)) // step - # Shift lut, prepending with 0. - lut = tf.concat([[0], lut[:-1]], 0) - # Clip the counts to be in range. This is done - # in the C code for image.point. - return tf.clip_by_value(lut, 0, 255) - - # If step is zero, return the original image. Otherwise, build - # lut from the full histogram and step and then index from it. - result = tf.cond( - tf.equal(step, 0), lambda: im, - lambda: tf.gather(build_lut(histo, step), im)) - - return tf.cast(result, tf.uint8) - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. 
- s1 = scale_channel(image, 0) - s2 = scale_channel(image, 1) - s3 = scale_channel(image, 2) - image = tf.stack([s1, s2, s3], 2) - return image - - -def invert(image: tf.Tensor) -> tf.Tensor: - """Inverts the image pixels.""" - image = tf.convert_to_tensor(image) - return 255 - image - - -def wrap(image: tf.Tensor) -> tf.Tensor: - """Returns 'image' with an extra channel set to all 1s.""" - shape = tf.shape(image) - extended_channel = tf.ones([shape[0], shape[1], 1], image.dtype) - extended = tf.concat([image, extended_channel], axis=2) - return extended - - -def unwrap(image: tf.Tensor, replace: int) -> tf.Tensor: - """Unwraps an image produced by wrap. - - Where there is a 0 in the last channel for every spatial position, - the rest of the three channels in that spatial dimension are grayed - (set to 128). Operations like translate and shear on a wrapped - Tensor will leave 0s in empty locations. Some transformations look - at the intensity of values to do preprocessing, and we want these - empty pixels to assume the 'average' value, rather than pure black. - - - Args: - image: A 3D Image Tensor with 4 channels. - replace: A one or three value 1D tensor to fill empty pixels. - - Returns: - image: A 3D image Tensor with 3 channels. - """ - image_shape = tf.shape(image) - # Flatten the spatial dimensions. - flattened_image = tf.reshape(image, [-1, image_shape[2]]) - - # Find all pixels where the last channel is zero. - alpha_channel = tf.expand_dims(flattened_image[:, 3], axis=-1) - - replace = tf.concat([replace, tf.ones([1], image.dtype)], 0) - - # Where they are zero, fill them in with 'replace'. - flattened_image = tf.where( - tf.equal(alpha_channel, 0), - tf.ones_like(flattened_image, dtype=image.dtype) * replace, - flattened_image) - - image = tf.reshape(flattened_image, image_shape) - image = tf.slice(image, [0, 0, 0], [image_shape[0], image_shape[1], 3]) - return image - - -def _randomly_negate_tensor(tensor): - """With 50% prob turn the tensor negative.""" - should_flip = tf.cast(tf.floor(tf.random.uniform([]) + 0.5), tf.bool) - final_tensor = tf.cond(should_flip, lambda: tensor, lambda: -tensor) - return final_tensor - - -def _rotate_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 30. - level = _randomly_negate_tensor(level) - return (level,) - - -def _shrink_level_to_arg(level: float): - """Converts level to ratio by which we shrink the image content.""" - if level == 0: - return (1.0,) # if level is zero, do not shrink the image - # Maximum shrinking ratio is 2.9. - level = 2. / (_MAX_LEVEL / level) + 0.9 - return (level,) - - -def _enhance_level_to_arg(level: float): - return ((level / _MAX_LEVEL) * 1.8 + 0.1,) - - -def _shear_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 0.3 - # Flip level to negative with 50% chance. - level = _randomly_negate_tensor(level) - return (level,) - - -def _translate_level_to_arg(level: float, translate_const: float): - level = (level / _MAX_LEVEL) * float(translate_const) - # Flip level to negative with 50% chance. - level = _randomly_negate_tensor(level) - return (level,) - - -def _mult_to_arg(level: float, multiplier: float = 1.): - return (int((level / _MAX_LEVEL) * multiplier),) - - -def _apply_func_with_prob(func: Any, image: tf.Tensor, args: Any, prob: float): - """Apply `func` to image w/ `args` as input with probability `prob`.""" - assert isinstance(args, tuple) - - # Apply the function with probability `prob`. 
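-  # floor(U[0, 1) + prob) equals 1 with probability `prob`, so `func` is
-  # applied exactly that fraction of the time.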
- should_apply_op = tf.cast( - tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) - augmented_image = tf.cond(should_apply_op, lambda: func(image, *args), - lambda: image) - return augmented_image - - -def select_and_apply_random_policy(policies: Any, image: tf.Tensor): - """Select a random policy from `policies` and apply it to `image`.""" - policy_to_select = tf.random.uniform([], maxval=len(policies), dtype=tf.int32) - # Note that using tf.case instead of tf.conds would result in significantly - # larger graphs and would even break export for some larger policies. - for (i, policy) in enumerate(policies): - image = tf.cond( - tf.equal(i, policy_to_select), - lambda selected_policy=policy: selected_policy(image), - lambda: image) - return image - - -NAME_TO_FUNC = { - 'AutoContrast': autocontrast, - 'Equalize': equalize, - 'Invert': invert, - 'Rotate': wrapped_rotate, - 'Posterize': posterize, - 'Solarize': solarize, - 'SolarizeAdd': solarize_add, - 'Color': color, - 'Contrast': contrast, - 'Brightness': brightness, - 'Sharpness': sharpness, - 'ShearX': shear_x, - 'ShearY': shear_y, - 'TranslateX': translate_x, - 'TranslateY': translate_y, - 'Cutout': cutout, -} - -# Functions that have a 'replace' parameter -REPLACE_FUNCS = frozenset({ - 'Rotate', - 'TranslateX', - 'ShearX', - 'ShearY', - 'TranslateY', - 'Cutout', -}) - - -def level_to_arg(cutout_const: float, translate_const: float): - """Creates a dict mapping image operation names to their arguments.""" - - no_arg = lambda level: () - posterize_arg = lambda level: _mult_to_arg(level, 4) - solarize_arg = lambda level: _mult_to_arg(level, 256) - solarize_add_arg = lambda level: _mult_to_arg(level, 110) - cutout_arg = lambda level: _mult_to_arg(level, cutout_const) - translate_arg = lambda level: _translate_level_to_arg(level, translate_const) - - args = { - 'AutoContrast': no_arg, - 'Equalize': no_arg, - 'Invert': no_arg, - 'Rotate': _rotate_level_to_arg, - 'Posterize': posterize_arg, - 'Solarize': solarize_arg, - 'SolarizeAdd': solarize_add_arg, - 'Color': _enhance_level_to_arg, - 'Contrast': _enhance_level_to_arg, - 'Brightness': _enhance_level_to_arg, - 'Sharpness': _enhance_level_to_arg, - 'ShearX': _shear_level_to_arg, - 'ShearY': _shear_level_to_arg, - 'Cutout': cutout_arg, - 'TranslateX': translate_arg, - 'TranslateY': translate_arg, - } - return args - - -def _parse_policy_info(name: Text, prob: float, level: float, - replace_value: List[int], cutout_const: float, - translate_const: float) -> Tuple[Any, float, Any]: - """Return the function that corresponds to `name` and update `level` param.""" - func = NAME_TO_FUNC[name] - args = level_to_arg(cutout_const, translate_const)[name](level) - - if name in REPLACE_FUNCS: - # Add in replace arg if it is required for the function that is called. - args = tuple(list(args) + [replace_value]) - - return func, prob, args - - -class ImageAugment(object): - """Image augmentation class for applying image distortions.""" - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Given an image tensor, returns a distorted image with the same shape. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - The augmented version of `image`. - """ - raise NotImplementedError() - - -class AutoAugment(ImageAugment): - """Applies the AutoAugment policy to images. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. 
- """ - - def __init__(self, - augmentation_name: Text = 'v0', - policies: Optional[Dict[Text, Any]] = None, - cutout_const: float = 100, - translate_const: float = 250): - """Applies the AutoAugment policy to images. - - Args: - augmentation_name: The name of the AutoAugment policy to use. The - available options are `v0` and `test`. `v0` is the policy used for all - of the results in the paper and was found to achieve the best results on - the COCO dataset. `v1`, `v2` and `v3` are additional good policies found - on the COCO dataset that have slight variation in what operations were - used during the search procedure along with how many operations are - applied in parallel to a single image (2 vs 3). - policies: list of lists of tuples in the form `(func, prob, level)`, - `func` is a string name of the augmentation function, `prob` is the - probability of applying the `func` operation, `level` is the input - argument for `func`. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. - """ - super(AutoAugment, self).__init__() - - if policies is None: - self.available_policies = { - 'v0': self.policy_v0(), - 'test': self.policy_test(), - 'simple': self.policy_simple(), - } - - if augmentation_name not in self.available_policies: - raise ValueError( - 'Invalid augmentation_name: {}'.format(augmentation_name)) - - self.augmentation_name = augmentation_name - self.policies = self.available_policies[augmentation_name] - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies the AutoAugment policy to `image`. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - A version of image that now has data augmentation applied to it based on - the `policies` pass into the function. - """ - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - - # func is the string name of the augmentation function, prob is the - # probability of applying the operation and level is the parameter - # associated with the tf op. - - # tf_policies are functions that take in an image and return an augmented - # image. - tf_policies = [] - for policy in self.policies: - tf_policy = [] - # Link string name to the correct python function and make sure the - # correct argument is passed into that function. - for policy_info in policy: - policy_info = list(policy_info) + [ - replace_value, self.cutout_const, self.translate_const - ] - tf_policy.append(_parse_policy_info(*policy_info)) - # Now build the tf policy that will apply the augmentation procedue - # on image. - def make_final_policy(tf_policy_): - - def final_policy(image_): - for func, prob, args in tf_policy_: - image_ = _apply_func_with_prob(func, image_, args, prob) - return image_ - - return final_policy - - tf_policies.append(make_final_policy(tf_policy)) - - image = select_and_apply_random_policy(tf_policies, image) - image = tf.cast(image, dtype=input_image_type) - return image - - @staticmethod - def policy_v0(): - """Autoaugment policy that was used in AutoAugment Paper. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. 
- - Returns: - the policy. - """ - - # TODO(dankondratyuk): tensorflow_addons defines custom ops, which - # for some reason are not included when building/linking - # This results in the error, "Op type not registered - # 'Addons>ImageProjectiveTransformV2' in binary" when running on borg TPUs - policy = [ - [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)], - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Color', 0.4, 1), ('Rotate', 0.6, 8)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Invert', 0.4, 9), ('Rotate', 0.6, 0)], - [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)], - [('Rotate', 1.0, 7), ('TranslateY', 0.8, 9)], - [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)], - [('ShearY', 0.8, 0), ('Color', 0.6, 4)], - [('Color', 1.0, 0), ('Rotate', 0.6, 2)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - [('Color', 0.8, 6), ('Rotate', 0.4, 5)], - ] - return policy - - @staticmethod - def policy_simple(): - """Same as `policy_v0`, except with custom ops removed.""" - - policy = [ - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - ] - return policy - - @staticmethod - def policy_test(): - """Autoaugment test policy for debugging.""" - policy = [ - [('TranslateX', 1.0, 4), ('Equalize', 1.0, 10)], - ] - return policy - - -class RandAugment(ImageAugment): - """Applies the RandAugment policy to images. - - RandAugment is from the paper https://arxiv.org/abs/1909.13719, - """ - - def __init__(self, - num_layers: int = 2, - magnitude: float = 10., - cutout_const: float = 40., - translate_const: float = 100.): - """Applies the RandAugment policy to images. - - Args: - num_layers: Integer, the number of augmentation transformations to apply - sequentially to an image. Represented as (N) in the paper. Usually best - values will be in the range [1, 3]. - magnitude: Integer, shared magnitude across all augmentation operations. - Represented as (M) in the paper. Usually best values are in the range - [5, 10]. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. 
- """ - super(RandAugment, self).__init__() - - self.num_layers = num_layers - self.magnitude = float(magnitude) - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - self.available_ops = [ - 'AutoContrast', 'Equalize', 'Invert', 'Rotate', 'Posterize', 'Solarize', - 'Color', 'Contrast', 'Brightness', 'Sharpness', 'ShearX', 'ShearY', - 'TranslateX', 'TranslateY', 'Cutout', 'SolarizeAdd' - ] - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies the RandAugment policy to `image`. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - The augmented version of `image`. - """ - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - min_prob, max_prob = 0.2, 0.8 - - for _ in range(self.num_layers): - op_to_select = tf.random.uniform([], - maxval=len(self.available_ops) + 1, - dtype=tf.int32) - - branch_fns = [] - for (i, op_name) in enumerate(self.available_ops): - prob = tf.random.uniform([], - minval=min_prob, - maxval=max_prob, - dtype=tf.float32) - func, _, args = _parse_policy_info(op_name, prob, self.magnitude, - replace_value, self.cutout_const, - self.translate_const) - branch_fns.append(( - i, - # pylint:disable=g-long-lambda - lambda selected_func=func, selected_args=args: selected_func( - image, *selected_args))) - # pylint:enable=g-long-lambda - - image = tf.switch_case( - branch_index=op_to_select, - branch_fns=branch_fns, - default=lambda: tf.identity(image)) - - image = tf.cast(image, dtype=input_image_type) - return image diff --git a/official/vision/image_classification/augment_test.py b/official/vision/image_classification/augment_test.py deleted file mode 100644 index 6279352204c46ae24d1971c48160ff7c6b0acc79..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/augment_test.py +++ /dev/null @@ -1,129 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for autoaugment.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl.testing import parameterized - -import tensorflow as tf - -from official.vision.image_classification import augment - - -def get_dtype_test_cases(): - return [ - ('uint8', tf.uint8), - ('int32', tf.int32), - ('float16', tf.float16), - ('float32', tf.float32), - ] - - -@parameterized.named_parameters(get_dtype_test_cases()) -class TransformsTest(parameterized.TestCase, tf.test.TestCase): - """Basic tests for fundamental transformations.""" - - def test_to_from_4d(self, dtype): - for shape in [(10, 10), (10, 10, 10), (10, 10, 10, 10)]: - original_ndims = len(shape) - image = tf.zeros(shape, dtype=dtype) - image_4d = augment.to_4d(image) - self.assertEqual(4, tf.rank(image_4d)) - self.assertAllEqual(image, augment.from_4d(image_4d, original_ndims)) - - def test_transform(self, dtype): - image = tf.constant([[1, 2], [3, 4]], dtype=dtype) - self.assertAllEqual( - augment.transform(image, transforms=[1] * 8), [[4, 4], [4, 4]]) - - def test_translate(self, dtype): - image = tf.constant( - [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=dtype) - translations = [-1, -1] - translated = augment.translate(image=image, translations=translations) - expected = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]] - self.assertAllEqual(translated, expected) - - def test_translate_shapes(self, dtype): - translation = [0, 0] - for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.translate(image, translation)) - - def test_translate_invalid_translation(self, dtype): - image = tf.zeros((1, 1), dtype=dtype) - invalid_translation = [[[1, 1]]] - with self.assertRaisesRegex(TypeError, 'rank 1 or 2'): - _ = augment.translate(image, invalid_translation) - - def test_rotate(self, dtype): - image = tf.reshape(tf.cast(tf.range(9), dtype), (3, 3)) - rotation = 90. - transformed = augment.rotate(image=image, degrees=rotation) - expected = [[2, 5, 8], [1, 4, 7], [0, 3, 6]] - self.assertAllEqual(transformed, expected) - - def test_rotate_shapes(self, dtype): - degrees = 0. 
- for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.rotate(image, degrees)) - - -class AutoaugmentTest(tf.test.TestCase): - - def test_autoaugment(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.AutoAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_randaug(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.RandAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_all_policy_ops(self): - """Smoke test to be sure all augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((224, 224, 3), dtype=tf.uint8) - - for op_name in augment.NAME_TO_FUNC: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - image = func(image, *args) - - self.assertEqual((224, 224, 3), image.shape) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/callbacks.py b/official/vision/image_classification/callbacks.py deleted file mode 100644 index a4934ed88f7db280d1ffd9ad57346f68a5395d5e..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/callbacks.py +++ /dev/null @@ -1,256 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Common modules for callbacks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -from typing import Any, List, MutableMapping, Optional, Text - -from absl import logging -import tensorflow as tf - -from official.modeling import optimization -from official.utils.misc import keras_utils - - -def get_callbacks( - model_checkpoint: bool = True, - include_tensorboard: bool = True, - time_history: bool = True, - track_lr: bool = True, - write_model_weights: bool = True, - apply_moving_average: bool = False, - initial_step: int = 0, - batch_size: int = 0, - log_steps: int = 0, - model_dir: Optional[str] = None, - backup_and_restore: bool = False) -> List[tf.keras.callbacks.Callback]: - """Get all callbacks.""" - model_dir = model_dir or '' - callbacks = [] - if model_checkpoint: - ckpt_full_path = os.path.join(model_dir, 'model.ckpt-{epoch:04d}') - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True, verbose=1)) - if backup_and_restore: - backup_dir = os.path.join(model_dir, 'tmp') - callbacks.append( - tf.keras.callbacks.experimental.BackupAndRestore(backup_dir)) - if include_tensorboard: - callbacks.append( - CustomTensorBoard( - log_dir=model_dir, - track_lr=track_lr, - initial_step=initial_step, - write_images=write_model_weights, - profile_batch=0)) - if time_history: - callbacks.append( - keras_utils.TimeHistory( - batch_size, - log_steps, - logdir=model_dir if include_tensorboard else None)) - if apply_moving_average: - # Save moving average model to a different file so that - # we can resume training from a checkpoint - ckpt_full_path = os.path.join(model_dir, 'average', - 'model.ckpt-{epoch:04d}') - callbacks.append( - AverageModelCheckpoint( - update_weights=False, - filepath=ckpt_full_path, - save_weights_only=True, - verbose=1)) - callbacks.append(MovingAverageCallback()) - return callbacks - - -def get_scalar_from_tensor(t: tf.Tensor) -> int: - """Utility function to convert a Tensor to a scalar.""" - t = tf.keras.backend.get_value(t) - if callable(t): - return t() - else: - return t - - -class CustomTensorBoard(tf.keras.callbacks.TensorBoard): - """A customized TensorBoard callback that tracks additional datapoints. - - Metrics tracked: - - Global learning rate - - Attributes: - log_dir: the path of the directory where to save the log files to be parsed - by TensorBoard. - track_lr: `bool`, whether or not to track the global learning rate. - initial_step: the initial step, used for preemption recovery. - **kwargs: Additional arguments for backwards compatibility. Possible key is - `period`. 
- """ - - # TODO(b/146499062): track params, flops, log lr, l2 loss, - # classification loss - - def __init__(self, - log_dir: str, - track_lr: bool = False, - initial_step: int = 0, - **kwargs): - super(CustomTensorBoard, self).__init__(log_dir=log_dir, **kwargs) - self.step = initial_step - self._track_lr = track_lr - - def on_batch_begin(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - self.step += 1 - if logs is None: - logs = {} - logs.update(self._calculate_metrics()) - super(CustomTensorBoard, self).on_batch_begin(epoch, logs) - - def on_epoch_begin(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - if logs is None: - logs = {} - metrics = self._calculate_metrics() - logs.update(metrics) - for k, v in metrics.items(): - logging.info('Current %s: %f', k, v) - super(CustomTensorBoard, self).on_epoch_begin(epoch, logs) - - def on_epoch_end(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - if logs is None: - logs = {} - metrics = self._calculate_metrics() - logs.update(metrics) - super(CustomTensorBoard, self).on_epoch_end(epoch, logs) - - def _calculate_metrics(self) -> MutableMapping[str, Any]: - logs = {} - # TODO(b/149030439): disable LR reporting. - # if self._track_lr: - # logs['learning_rate'] = self._calculate_lr() - return logs - - def _calculate_lr(self) -> int: - """Calculates the learning rate given the current step.""" - return get_scalar_from_tensor( - self._get_base_optimizer()._decayed_lr(var_dtype=tf.float32)) # pylint:disable=protected-access - - def _get_base_optimizer(self) -> tf.keras.optimizers.Optimizer: - """Get the base optimizer used by the current model.""" - - optimizer = self.model.optimizer - - # The optimizer might be wrapped by another class, so unwrap it - while hasattr(optimizer, '_optimizer'): - optimizer = optimizer._optimizer # pylint:disable=protected-access - - return optimizer - - -class MovingAverageCallback(tf.keras.callbacks.Callback): - """A Callback to be used with a `ExponentialMovingAverage` optimizer. - - Applies moving average weights to the model during validation time to test - and predict on the averaged weights rather than the current model weights. - Once training is complete, the model weights will be overwritten with the - averaged weights (by default). - - Attributes: - overwrite_weights_on_train_end: Whether to overwrite the current model - weights with the averaged weights from the moving average optimizer. - **kwargs: Any additional callback arguments. - """ - - def __init__(self, overwrite_weights_on_train_end: bool = False, **kwargs): - super(MovingAverageCallback, self).__init__(**kwargs) - self.overwrite_weights_on_train_end = overwrite_weights_on_train_end - - def set_model(self, model: tf.keras.Model): - super(MovingAverageCallback, self).set_model(model) - assert isinstance(self.model.optimizer, - optimization.ExponentialMovingAverage) - self.model.optimizer.shadow_copy(self.model) - - def on_test_begin(self, logs: Optional[MutableMapping[Text, Any]] = None): - self.model.optimizer.swap_weights() - - def on_test_end(self, logs: Optional[MutableMapping[Text, Any]] = None): - self.model.optimizer.swap_weights() - - def on_train_end(self, logs: Optional[MutableMapping[Text, Any]] = None): - if self.overwrite_weights_on_train_end: - self.model.optimizer.assign_average_vars(self.model.variables) - - -class AverageModelCheckpoint(tf.keras.callbacks.ModelCheckpoint): - """Saves and, optionally, assigns the averaged weights. 
-
-  Taken from tfa.callbacks.AverageModelCheckpoint.
-
-  Attributes:
-    update_weights: If True, assign the moving average weights to the model,
-      and save them. If False, keep the old non-averaged weights, but the
-      saved model uses the average weights. See
-      `tf.keras.callbacks.ModelCheckpoint` for the other args.
-  """
-
-  def __init__(self,
-               update_weights: bool,
-               filepath: str,
-               monitor: str = 'val_loss',
-               verbose: int = 0,
-               save_best_only: bool = False,
-               save_weights_only: bool = False,
-               mode: str = 'auto',
-               save_freq: str = 'epoch',
-               **kwargs):
-    self.update_weights = update_weights
-    super().__init__(filepath, monitor, verbose, save_best_only,
-                     save_weights_only, mode, save_freq, **kwargs)
-
-  def set_model(self, model):
-    if not isinstance(model.optimizer, optimization.ExponentialMovingAverage):
-      raise TypeError('AverageModelCheckpoint is only used when training '
-                      'with MovingAverage')
-    return super().set_model(model)
-
-  def _save_model(self, epoch, logs):
-    assert isinstance(self.model.optimizer,
-                      optimization.ExponentialMovingAverage)
-
-    if self.update_weights:
-      self.model.optimizer.assign_average_vars(self.model.variables)
-      return super()._save_model(epoch, logs)  # pytype: disable=attribute-error  # typed-keras
-    else:
-      # Note: `model.get_weights()` gives us the weights (non-ref)
-      # whereas `model.variables` returns references to the variables.
-      non_avg_weights = self.model.get_weights()
-      self.model.optimizer.assign_average_vars(self.model.variables)
-      # result is currently None, since `super._save_model` doesn't
-      # return anything, but this may change in the future.
-      result = super()._save_model(epoch, logs)  # pytype: disable=attribute-error  # typed-keras
-      self.model.set_weights(non_avg_weights)
-      return result
diff --git a/official/vision/image_classification/classifier_trainer.py b/official/vision/image_classification/classifier_trainer.py
deleted file mode 100644
index ab6fbaea960e7d894d69e213e95c313d7fe9893c..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/classifier_trainer.py
+++ /dev/null
@@ -1,456 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -# Lint as: python3 -"""Runs an Image Classification model.""" - -import os -import pprint -from typing import Any, Tuple, Text, Optional, Mapping - -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import hyperparams -from official.modeling import performance -from official.utils import hyperparams_flags -from official.utils.misc import keras_utils -from official.vision.image_classification import callbacks as custom_callbacks -from official.vision.image_classification import dataset_factory -from official.vision.image_classification import optimizer_factory -from official.vision.image_classification.configs import base_configs -from official.vision.image_classification.configs import configs -from official.vision.image_classification.efficientnet import efficientnet_model -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import resnet_model - - -def get_models() -> Mapping[str, tf.keras.Model]: - """Returns the mapping from model type name to Keras model.""" - return { - 'efficientnet': efficientnet_model.EfficientNet.from_name, - 'resnet': resnet_model.resnet50, - } - - -def get_dtype_map() -> Mapping[str, tf.dtypes.DType]: - """Returns the mapping from dtype string representations to TF dtypes.""" - return { - 'float32': tf.float32, - 'bfloat16': tf.bfloat16, - 'float16': tf.float16, - 'fp32': tf.float32, - 'bf16': tf.bfloat16, - } - - -def _get_metrics(one_hot: bool) -> Mapping[Text, Any]: - """Get a dict of available metrics to track.""" - if one_hot: - return { - # (name, metric_fn) - 'acc': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'accuracy': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'top_1': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'top_5': - tf.keras.metrics.TopKCategoricalAccuracy( - k=5, name='top_5_accuracy'), - } - else: - return { - # (name, metric_fn) - 'acc': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'accuracy': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'top_1': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'top_5': - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=5, name='top_5_accuracy'), - } - - -def get_image_size_from_model( - params: base_configs.ExperimentConfig) -> Optional[int]: - """If the given model has a preferred image size, return it.""" - if params.model_name == 'efficientnet': - efficientnet_name = params.model.model_params.model_name - if efficientnet_name in efficientnet_model.MODEL_CONFIGS: - return efficientnet_model.MODEL_CONFIGS[efficientnet_name].resolution - return None - - -def _get_dataset_builders(params: base_configs.ExperimentConfig, - strategy: tf.distribute.Strategy, - one_hot: bool) -> Tuple[Any, Any]: - """Create and return train and validation dataset builders.""" - if one_hot: - logging.warning('label_smoothing > 0, so datasets will be one hot encoded.') - else: - logging.warning('label_smoothing not applied, so datasets will not be one ' - 'hot encoded.') - - num_devices = strategy.num_replicas_in_sync if strategy else 1 - - image_size = get_image_size_from_model(params) - - dataset_configs = [params.train_dataset, params.validation_dataset] - builders = [] - - for config in dataset_configs: - if config is not None and config.has_data: - builder = dataset_factory.DatasetBuilder( - config, - image_size=image_size or config.image_size, - 
num_devices=num_devices, - one_hot=one_hot) - else: - builder = None - builders.append(builder) - - return builders - - -def get_loss_scale(params: base_configs.ExperimentConfig, - fp16_default: float = 128.) -> float: - """Returns the loss scale for initializations.""" - loss_scale = params.runtime.loss_scale - if loss_scale == 'dynamic': - return loss_scale - elif loss_scale is not None: - return float(loss_scale) - elif (params.train_dataset.dtype == 'float32' or - params.train_dataset.dtype == 'bfloat16'): - return 1. - else: - assert params.train_dataset.dtype == 'float16' - return fp16_default - - -def _get_params_from_flags(flags_obj: flags.FlagValues): - """Get ParamsDict from flags.""" - model = flags_obj.model_type.lower() - dataset = flags_obj.dataset.lower() - params = configs.get_config(model=model, dataset=dataset) - - flags_overrides = { - 'model_dir': flags_obj.model_dir, - 'mode': flags_obj.mode, - 'model': { - 'name': model, - }, - 'runtime': { - 'run_eagerly': flags_obj.run_eagerly, - 'tpu': flags_obj.tpu, - }, - 'train_dataset': { - 'data_dir': flags_obj.data_dir, - }, - 'validation_dataset': { - 'data_dir': flags_obj.data_dir, - }, - 'train': { - 'time_history': { - 'log_steps': flags_obj.log_steps, - }, - }, - } - - overriding_configs = (flags_obj.config_file, flags_obj.params_override, - flags_overrides) - - pp = pprint.PrettyPrinter() - - logging.info('Base params: %s', pp.pformat(params.as_dict())) - - for param in overriding_configs: - logging.info('Overriding params: %s', param) - params = hyperparams.override_params_dict(params, param, is_strict=True) - - params.validate() - params.lock() - - logging.info('Final model parameters: %s', pp.pformat(params.as_dict())) - return params - - -def resume_from_checkpoint(model: tf.keras.Model, model_dir: str, - train_steps: int) -> int: - """Resumes from the latest checkpoint, if possible. - - Loads the model weights and optimizer settings from a checkpoint. - This function should be used in case of preemption recovery. - - Args: - model: The model whose weights should be restored. - model_dir: The directory where model weights were saved. - train_steps: The number of steps to train. - - Returns: - The epoch of the latest checkpoint, or 0 if not restoring. 
- - """ - logging.info('Load from checkpoint is enabled.') - latest_checkpoint = tf.train.latest_checkpoint(model_dir) - logging.info('latest_checkpoint: %s', latest_checkpoint) - if not latest_checkpoint: - logging.info('No checkpoint detected.') - return 0 - - logging.info('Checkpoint file %s found and restoring from ' - 'checkpoint', latest_checkpoint) - model.load_weights(latest_checkpoint) - initial_epoch = model.optimizer.iterations // train_steps - logging.info('Completed loading from checkpoint.') - logging.info('Resuming from epoch %d', initial_epoch) - return int(initial_epoch) - - -def initialize(params: base_configs.ExperimentConfig, - dataset_builder: dataset_factory.DatasetBuilder): - """Initializes backend related initializations.""" - keras_utils.set_session_config(enable_xla=params.runtime.enable_xla) - performance.set_mixed_precision_policy(dataset_builder.dtype) - if tf.config.list_physical_devices('GPU'): - data_format = 'channels_first' - else: - data_format = 'channels_last' - tf.keras.backend.set_image_data_format(data_format) - if params.runtime.run_eagerly: - # Enable eager execution to allow step-by-step debugging - tf.config.experimental_run_functions_eagerly(True) - if tf.config.list_physical_devices('GPU'): - if params.runtime.gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=params.runtime.per_gpu_thread_count, - gpu_thread_mode=params.runtime.gpu_thread_mode, - num_gpus=params.runtime.num_gpus, - datasets_num_private_threads=params.runtime - .dataset_num_private_threads) # pylint:disable=line-too-long - if params.runtime.batchnorm_spatial_persistent: - os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1' - - -def define_classifier_flags(): - """Defines common flags for image classification.""" - hyperparams_flags.initialize_common_flags() - flags.DEFINE_string( - 'data_dir', default=None, help='The location of the input data.') - flags.DEFINE_string( - 'mode', - default=None, - help='Mode to run: `train`, `eval`, `train_and_eval` or `export`.') - flags.DEFINE_bool( - 'run_eagerly', - default=None, - help='Use eager execution and disable autograph for debugging.') - flags.DEFINE_string( - 'model_type', - default=None, - help='The type of the model, e.g. EfficientNet, etc.') - flags.DEFINE_string( - 'dataset', - default=None, - help='The name of the dataset, e.g. 
ImageNet, etc.') - flags.DEFINE_integer( - 'log_steps', - default=100, - help='The interval of steps between logging of batch level stats.') - - -def serialize_config(params: base_configs.ExperimentConfig, model_dir: str): - """Serializes and saves the experiment config.""" - params_save_path = os.path.join(model_dir, 'params.yaml') - logging.info('Saving experiment configuration to %s', params_save_path) - tf.io.gfile.makedirs(model_dir) - hyperparams.save_params_dict_to_yaml(params, params_save_path) - - -def train_and_eval( - params: base_configs.ExperimentConfig, - strategy_override: tf.distribute.Strategy) -> Mapping[str, Any]: - """Runs the train and eval path using compile/fit.""" - logging.info('Running train and eval.') - - distribute_utils.configure_cluster(params.runtime.worker_hosts, - params.runtime.task_index) - - # Note: for TPUs, strategy and scope should be created before the dataset - strategy = strategy_override or distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - - strategy_scope = distribute_utils.get_strategy_scope(strategy) - - logging.info('Detected %d devices.', - strategy.num_replicas_in_sync if strategy else 1) - - label_smoothing = params.model.loss.label_smoothing - one_hot = label_smoothing and label_smoothing > 0 - - builders = _get_dataset_builders(params, strategy, one_hot) - datasets = [ - builder.build(strategy) if builder else None for builder in builders - ] - - # Unpack datasets and builders based on train/val/test splits - train_builder, validation_builder = builders # pylint: disable=unbalanced-tuple-unpacking - train_dataset, validation_dataset = datasets - - train_epochs = params.train.epochs - train_steps = params.train.steps or train_builder.num_steps - validation_steps = params.evaluation.steps or validation_builder.num_steps - - initialize(params, train_builder) - - logging.info('Global batch size: %d', train_builder.global_batch_size) - - with strategy_scope: - model_params = params.model.model_params.as_dict() - model = get_models()[params.model.name](**model_params) - learning_rate = optimizer_factory.build_learning_rate( - params=params.model.learning_rate, - batch_size=train_builder.global_batch_size, - train_epochs=train_epochs, - train_steps=train_steps) - optimizer = optimizer_factory.build_optimizer( - optimizer_name=params.model.optimizer.name, - base_learning_rate=learning_rate, - params=params.model.optimizer.as_dict(), - model=model) - optimizer = performance.configure_optimizer( - optimizer, - use_float16=train_builder.dtype == 'float16', - loss_scale=get_loss_scale(params)) - - metrics_map = _get_metrics(one_hot) - metrics = [metrics_map[metric] for metric in params.train.metrics] - steps_per_loop = train_steps if params.train.set_epoch_loop else 1 - - if one_hot: - loss_obj = tf.keras.losses.CategoricalCrossentropy( - label_smoothing=params.model.loss.label_smoothing) - else: - loss_obj = tf.keras.losses.SparseCategoricalCrossentropy() - model.compile( - optimizer=optimizer, - loss=loss_obj, - metrics=metrics, - steps_per_execution=steps_per_loop) - - initial_epoch = 0 - if params.train.resume_checkpoint: - initial_epoch = resume_from_checkpoint( - model=model, model_dir=params.model_dir, train_steps=train_steps) - - callbacks = custom_callbacks.get_callbacks( - model_checkpoint=params.train.callbacks.enable_checkpoint_and_export, - 
include_tensorboard=params.train.callbacks.enable_tensorboard, - time_history=params.train.callbacks.enable_time_history, - track_lr=params.train.tensorboard.track_lr, - write_model_weights=params.train.tensorboard.write_model_weights, - initial_step=initial_epoch * train_steps, - batch_size=train_builder.global_batch_size, - log_steps=params.train.time_history.log_steps, - model_dir=params.model_dir, - backup_and_restore=params.train.callbacks.enable_backup_and_restore) - - serialize_config(params=params, model_dir=params.model_dir) - - if params.evaluation.skip_eval: - validation_kwargs = {} - else: - validation_kwargs = { - 'validation_data': validation_dataset, - 'validation_steps': validation_steps, - 'validation_freq': params.evaluation.epochs_between_evals, - } - - history = model.fit( - train_dataset, - epochs=train_epochs, - steps_per_epoch=train_steps, - initial_epoch=initial_epoch, - callbacks=callbacks, - verbose=2, - **validation_kwargs) - - validation_output = None - if not params.evaluation.skip_eval: - validation_output = model.evaluate( - validation_dataset, steps=validation_steps, verbose=2) - - # TODO(dankondratyuk): eval and save final test accuracy - stats = common.build_stats(history, validation_output, callbacks) - return stats - - -def export(params: base_configs.ExperimentConfig): - """Runs the model export functionality.""" - logging.info('Exporting model.') - model_params = params.model.model_params.as_dict() - model = get_models()[params.model.name](**model_params) - checkpoint = params.export.checkpoint - if checkpoint is None: - logging.info('No export checkpoint was provided. Using the latest ' - 'checkpoint from model_dir.') - checkpoint = tf.train.latest_checkpoint(params.model_dir) - - model.load_weights(checkpoint) - model.save(params.export.destination) - - -def run(flags_obj: flags.FlagValues, - strategy_override: tf.distribute.Strategy = None) -> Mapping[str, Any]: - """Runs Image Classification model using native Keras APIs. - - Args: - flags_obj: An object containing parsed flag values. - strategy_override: A `tf.distribute.Strategy` object to use for model. - - Returns: - Dictionary of training/eval stats - """ - params = _get_params_from_flags(flags_obj) - if params.mode == 'train_and_eval': - return train_and_eval(params, strategy_override) - elif params.mode == 'export_only': - export(params) - else: - raise ValueError('{} is not a valid mode.'.format(params.mode)) - - -def main(_): - stats = run(flags.FLAGS) - if stats: - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - define_classifier_flags() - flags.mark_flag_as_required('data_dir') - flags.mark_flag_as_required('mode') - flags.mark_flag_as_required('model_type') - flags.mark_flag_as_required('dataset') - - app.run(main) diff --git a/official/vision/image_classification/classifier_trainer_test.py b/official/vision/image_classification/classifier_trainer_test.py deleted file mode 100644 index 06227c154427db3057269f9e9250a179a52264c9..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/classifier_trainer_test.py +++ /dev/null @@ -1,240 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Unit tests for the classifier trainer models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools -import json - -import os -import sys - -from typing import Any, Callable, Iterable, Mapping, MutableMapping, Optional, Tuple - -from absl import flags -from absl.testing import flagsaver -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.utils.flags import core as flags_core -from official.vision.image_classification import classifier_trainer - - -classifier_trainer.define_classifier_flags() - - -def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: - """Returns the combinations of end-to-end tests to run.""" - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - strategy_combinations.mirrored_strategy_with_two_gpus, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset=[ - 'imagenet', - ], - ) - - -def get_params_override(params_override: Mapping[str, Any]) -> str: - """Converts params_override dict to string command.""" - return '--params_override=' + json.dumps(params_override) - - -def basic_params_override(dtype: str = 'float32') -> MutableMapping[str, Any]: - """Returns a basic parameter configuration for testing.""" - return { - 'train_dataset': { - 'builder': 'synthetic', - 'use_per_replica_batch_size': True, - 'batch_size': 1, - 'image_size': 224, - 'dtype': dtype, - }, - 'validation_dataset': { - 'builder': 'synthetic', - 'batch_size': 1, - 'use_per_replica_batch_size': True, - 'image_size': 224, - 'dtype': dtype, - }, - 'train': { - 'steps': 1, - 'epochs': 1, - 'callbacks': { - 'enable_checkpoint_and_export': True, - 'enable_tensorboard': False, - }, - }, - 'evaluation': { - 'steps': 1, - }, - } - - -@flagsaver.flagsaver -def run_end_to_end(main: Callable[[Any], None], - extra_flags: Optional[Iterable[str]] = None, - model_dir: Optional[str] = None): - """Runs the classifier trainer end-to-end.""" - extra_flags = [] if extra_flags is None else extra_flags - args = [sys.argv[0], '--model_dir', model_dir] + extra_flags - flags_core.parse_flags(argv=args) - main(flags.FLAGS) - - -class ClassifierTest(tf.test.TestCase, parameterized.TestCase): - """Unit tests for Keras models.""" - _tempdir = None - - @classmethod - def setUpClass(cls): # pylint: disable=invalid-name - super(ClassifierTest, cls).setUpClass() - - def tearDown(self): - super(ClassifierTest, self).tearDown() - tf.io.gfile.rmtree(self.get_temp_dir()) - - @combinations.generate(distribution_strategy_combinations()) - def test_end_to_end_train_and_eval(self, distribution, model, dataset): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." 
instead - model_dir = self.create_tempdir().full_path - base_flags = [ - '--data_dir=not_used', - '--model_type=' + model, - '--dataset=' + dataset, - ] - train_and_eval_flags = base_flags + [ - get_params_override(basic_params_override()), - '--mode=train_and_eval', - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - run_end_to_end( - main=run, extra_flags=train_and_eval_flags, model_dir=model_dir) - - @combinations.generate( - combinations.combine( - distribution=[ - strategy_combinations.one_device_strategy_gpu, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset='imagenet', - dtype='float16', - )) - def test_gpu_train(self, distribution, model, dataset, dtype): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." instead - model_dir = self.create_tempdir().full_path - base_flags = [ - '--data_dir=not_used', - '--model_type=' + model, - '--dataset=' + dataset, - ] - train_and_eval_flags = base_flags + [ - get_params_override(basic_params_override(dtype)), - '--mode=train_and_eval', - ] - - export_params = basic_params_override() - export_path = os.path.join(model_dir, 'export') - export_params['export'] = {} - export_params['export']['destination'] = export_path - export_flags = base_flags + [ - '--mode=export_only', - get_params_override(export_params) - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - run_end_to_end( - main=run, extra_flags=train_and_eval_flags, model_dir=model_dir) - run_end_to_end(main=run, extra_flags=export_flags, model_dir=model_dir) - self.assertTrue(os.path.exists(export_path)) - - @combinations.generate( - combinations.combine( - distribution=[ - strategy_combinations.cloud_tpu_strategy, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset='imagenet', - dtype='bfloat16', - )) - def test_tpu_train(self, distribution, model, dataset, dtype): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." 
instead - model_dir = self.create_tempdir().full_path - base_flags = [ - '--data_dir=not_used', - '--model_type=' + model, - '--dataset=' + dataset, - ] - train_and_eval_flags = base_flags + [ - get_params_override(basic_params_override(dtype)), - '--mode=train_and_eval', - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - run_end_to_end( - main=run, extra_flags=train_and_eval_flags, model_dir=model_dir) - - @combinations.generate(distribution_strategy_combinations()) - def test_end_to_end_invalid_mode(self, distribution, model, dataset): - """Test the Keras EfficientNet model with `strategy`.""" - model_dir = self.create_tempdir().full_path - extra_flags = [ - '--data_dir=not_used', - '--mode=invalid_mode', - '--model_type=' + model, - '--dataset=' + dataset, - get_params_override(basic_params_override()), - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - with self.assertRaises(ValueError): - run_end_to_end(main=run, extra_flags=extra_flags, model_dir=model_dir) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/classifier_trainer_util_test.py b/official/vision/image_classification/classifier_trainer_util_test.py deleted file mode 100644 index d3624c286fdc716e4a09df56fbb8157fa35602aa..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/classifier_trainer_util_test.py +++ /dev/null @@ -1,166 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
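
These end-to-end tests all funnel configuration through the single `--params_override` JSON flag rather than per-field flags. Here is a minimal, self-contained sketch of that convention; the override dict contents are hypothetical, chosen only to mirror `basic_params_override` above:

```python
import json


def get_params_override(params_override):
  """Serializes an override dict into a --params_override flag value."""
  return '--params_override=' + json.dumps(params_override)


# Hypothetical overrides for a smoke test on synthetic data.
flag = get_params_override({
    'train_dataset': {'builder': 'synthetic', 'batch_size': 1},
    'train': {'steps': 1, 'epochs': 1},
})
print(flag)
# --params_override={"train_dataset": {"builder": "synthetic", ...}, ...}
```
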
- -# Lint as: python3 -"""Unit tests for the classifier trainer models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import copy -import os - -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.image_classification import classifier_trainer -from official.vision.image_classification import dataset_factory -from official.vision.image_classification import test_utils -from official.vision.image_classification.configs import base_configs - - -def get_trivial_model(num_classes: int) -> tf.keras.Model: - """Creates and compiles trivial model for ImageNet dataset.""" - model = test_utils.trivial_model(num_classes=num_classes) - lr = 0.01 - optimizer = tf.keras.optimizers.SGD(learning_rate=lr) - loss_obj = tf.keras.losses.SparseCategoricalCrossentropy() - model.compile(optimizer=optimizer, loss=loss_obj, run_eagerly=True) - return model - - -def get_trivial_data() -> tf.data.Dataset: - """Gets trivial data in the ImageNet size.""" - - def generate_data(_) -> tf.data.Dataset: - image = tf.zeros(shape=(224, 224, 3), dtype=tf.float32) - label = tf.zeros([1], dtype=tf.int32) - return image, label - - dataset = tf.data.Dataset.range(1) - dataset = dataset.repeat() - dataset = dataset.map( - generate_data, num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.prefetch(buffer_size=1).batch(1) - return dataset - - -class UtilTests(parameterized.TestCase, tf.test.TestCase): - """Tests for individual utility functions within classifier_trainer.py.""" - - @parameterized.named_parameters( - ('efficientnet-b0', 'efficientnet', 'efficientnet-b0', 224), - ('efficientnet-b1', 'efficientnet', 'efficientnet-b1', 240), - ('efficientnet-b2', 'efficientnet', 'efficientnet-b2', 260), - ('efficientnet-b3', 'efficientnet', 'efficientnet-b3', 300), - ('efficientnet-b4', 'efficientnet', 'efficientnet-b4', 380), - ('efficientnet-b5', 'efficientnet', 'efficientnet-b5', 456), - ('efficientnet-b6', 'efficientnet', 'efficientnet-b6', 528), - ('efficientnet-b7', 'efficientnet', 'efficientnet-b7', 600), - ('resnet', 'resnet', '', None), - ) - def test_get_model_size(self, model, model_name, expected): - config = base_configs.ExperimentConfig( - model_name=model, - model=base_configs.ModelConfig( - model_params={ - 'model_name': model_name, - },)) - size = classifier_trainer.get_image_size_from_model(config) - self.assertEqual(size, expected) - - @parameterized.named_parameters( - ('dynamic', 'dynamic', None, 'dynamic'), - ('scalar', 128., None, 128.), - ('float32', None, 'float32', 1), - ('float16', None, 'float16', 128), - ) - def test_get_loss_scale(self, loss_scale, dtype, expected): - config = base_configs.ExperimentConfig( - runtime=base_configs.RuntimeConfig(loss_scale=loss_scale), - train_dataset=dataset_factory.DatasetConfig(dtype=dtype)) - ls = classifier_trainer.get_loss_scale(config, fp16_default=128) - self.assertEqual(ls, expected) - - @parameterized.named_parameters(('float16', 'float16'), - ('bfloat16', 'bfloat16')) - def test_initialize(self, dtype): - config = base_configs.ExperimentConfig( - runtime=base_configs.RuntimeConfig( - run_eagerly=False, - enable_xla=False, - per_gpu_thread_count=1, - gpu_thread_mode='gpu_private', - num_gpus=1, - dataset_num_private_threads=1, - ), - train_dataset=dataset_factory.DatasetConfig(dtype=dtype), - model=base_configs.ModelConfig(), - ) - - class EmptyClass: - pass - - fake_ds_builder = EmptyClass() - fake_ds_builder.dtype = dtype - 
fake_ds_builder.config = EmptyClass() - classifier_trainer.initialize(config, fake_ds_builder) - - def test_resume_from_checkpoint(self): - """Tests functionality for resuming from checkpoint.""" - # Set the keras policy - tf.keras.mixed_precision.set_global_policy('mixed_bfloat16') - - # Get the model, datasets, and compile it. - model = get_trivial_model(10) - - # Create the checkpoint - model_dir = self.create_tempdir().full_path - train_epochs = 1 - train_steps = 10 - ds = get_trivial_data() - callbacks = [ - tf.keras.callbacks.ModelCheckpoint( - os.path.join(model_dir, 'model.ckpt-{epoch:04d}'), - save_weights_only=True) - ] - model.fit( - ds, - callbacks=callbacks, - epochs=train_epochs, - steps_per_epoch=train_steps) - - # Test load from checkpoint - clean_model = get_trivial_model(10) - weights_before_load = copy.deepcopy(clean_model.get_weights()) - initial_epoch = classifier_trainer.resume_from_checkpoint( - model=clean_model, model_dir=model_dir, train_steps=train_steps) - self.assertEqual(initial_epoch, 1) - self.assertNotAllClose(weights_before_load, clean_model.get_weights()) - - tf.io.gfile.rmtree(model_dir) - - def test_serialize_config(self): - """Tests functionality for serializing data.""" - config = base_configs.ExperimentConfig() - model_dir = self.create_tempdir().full_path - classifier_trainer.serialize_config(params=config, model_dir=model_dir) - saved_params_path = os.path.join(model_dir, 'params.yaml') - self.assertTrue(os.path.exists(saved_params_path)) - tf.io.gfile.rmtree(model_dir) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/configs/__init__.py b/official/vision/image_classification/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/image_classification/configs/base_configs.py b/official/vision/image_classification/configs/base_configs.py deleted file mode 100644 index 760b3dce03fc017c912eb499e30ff1418b5ec090..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/base_configs.py +++ /dev/null @@ -1,257 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
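
The resume test above hinges on one piece of arithmetic in `resume_from_checkpoint`: the restored optimizer's iteration count, divided by the steps per epoch, recovers the epoch to resume from. A toy sketch with the same numbers the test uses (one epoch of ten steps):

```python
# Mirrors test_resume_from_checkpoint: 1 epoch x 10 steps were trained.
train_steps = 10            # steps per epoch
optimizer_iterations = 10   # total steps recorded in the restored optimizer
initial_epoch = optimizer_iterations // train_steps
assert initial_epoch == 1   # fit() resumes at epoch 1
```
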
-
-# Lint as: python3
-"""Definitions for high level configuration groups."""
-
-import dataclasses
-from typing import Any, List, Mapping, Optional
-from official.core import config_definitions
-from official.modeling import hyperparams
-
-RuntimeConfig = config_definitions.RuntimeConfig
-
-
-@dataclasses.dataclass
-class TensorBoardConfig(hyperparams.Config):
-  """Configuration for TensorBoard.
-
-  Attributes:
-    track_lr: Whether or not to track the learning rate in TensorBoard.
-      Defaults to True.
-    write_model_weights: Whether or not to write the model weights as images in
-      TensorBoard. Defaults to False.
-  """
-  track_lr: bool = True
-  write_model_weights: bool = False
-
-
-@dataclasses.dataclass
-class CallbacksConfig(hyperparams.Config):
-  """Configuration for Callbacks.
-
-  Attributes:
-    enable_checkpoint_and_export: Whether or not to enable checkpoints as a
-      Callback. Defaults to True.
-    enable_backup_and_restore: Whether or not to add the BackupAndRestore
-      callback. Defaults to False.
-    enable_tensorboard: Whether or not to enable TensorBoard as a Callback.
-      Defaults to True.
-    enable_time_history: Whether or not to enable TimeHistory Callbacks.
-      Defaults to True.
-  """
-  enable_checkpoint_and_export: bool = True
-  enable_backup_and_restore: bool = False
-  enable_tensorboard: bool = True
-  enable_time_history: bool = True
-
-
-@dataclasses.dataclass
-class ExportConfig(hyperparams.Config):
-  """Configuration for exports.
-
-  Attributes:
-    checkpoint: the path to the checkpoint to export.
-    destination: the path to where the checkpoint should be exported.
-  """
-  checkpoint: str = None
-  destination: str = None
-
-
-@dataclasses.dataclass
-class MetricsConfig(hyperparams.Config):
-  """Configuration for Metrics.
-
-  Attributes:
-    accuracy: Whether or not to track accuracy as a Callback. Defaults to None.
-    top_5: Whether or not to track top_5_accuracy as a Callback. Defaults to
-      None.
-  """
-  accuracy: bool = None
-  top_5: bool = None
-
-
-@dataclasses.dataclass
-class TimeHistoryConfig(hyperparams.Config):
-  """Configuration for the TimeHistory callback.
-
-  Attributes:
-    log_steps: Interval of steps between logging of batch level stats.
-  """
-  log_steps: int = None
-
-
-@dataclasses.dataclass
-class TrainConfig(hyperparams.Config):
-  """Configuration for training.
-
-  Attributes:
-    resume_checkpoint: Whether or not to enable checkpoint loading. Defaults
-      to None.
-    epochs: The number of training epochs to run. Defaults to None.
-    steps: The number of steps to run per epoch. If None, then this will be
-      inferred based on the number of images and batch size. Defaults to None.
-    callbacks: An instance of CallbacksConfig.
-    metrics: An instance of MetricsConfig.
-    tensorboard: An instance of TensorBoardConfig.
-    time_history: An instance of TimeHistoryConfig.
-    set_epoch_loop: Whether or not to set `steps_per_execution` to
-      equal the number of training steps in `model.compile`. This reduces the
-      number of callbacks run per epoch which significantly improves end-to-end
-      TPU training time.
-  """
-  resume_checkpoint: bool = None
-  epochs: int = None
-  steps: int = None
-  callbacks: CallbacksConfig = CallbacksConfig()
-  metrics: MetricsConfig = None
-  tensorboard: TensorBoardConfig = TensorBoardConfig()
-  time_history: TimeHistoryConfig = TimeHistoryConfig()
-  set_epoch_loop: bool = False
-
-
-@dataclasses.dataclass
-class EvalConfig(hyperparams.Config):
-  """Configuration for evaluation.
-
-  Attributes:
-    epochs_between_evals: The number of train epochs to run between evaluations.
-      Defaults to None.
-    steps: The number of eval steps to run during evaluation. If None, this
-      will be inferred based on the number of images and batch size. Defaults
-      to None.
-    skip_eval: Whether or not to skip evaluation.
-  """
-  epochs_between_evals: int = None
-  steps: int = None
-  skip_eval: bool = False
-
-
-@dataclasses.dataclass
-class LossConfig(hyperparams.Config):
-  """Configuration for Loss.
-
-  Attributes:
-    name: The name of the loss. Defaults to None.
-    label_smoothing: The amount of label smoothing to apply to the loss. This
-      only applies to 'categorical_cross_entropy'.
-  """
-  name: str = None
-  label_smoothing: float = None
-
-
-@dataclasses.dataclass
-class OptimizerConfig(hyperparams.Config):
-  """Configuration for Optimizers.
-
-  Attributes:
-    name: The name of the optimizer. Defaults to None.
-    decay: Decay or rho, discounting factor for gradient. Defaults to None.
-    epsilon: Small value used to avoid 0 denominator. Defaults to None.
-    momentum: Plain momentum constant. Defaults to None.
-    nesterov: Whether or not to apply Nesterov momentum. Defaults to None.
-    moving_average_decay: The amount of decay to apply. If 0 or None, then
-      exponential moving average is not used. Defaults to None.
-    lookahead: Whether or not to apply the lookahead optimizer. Defaults to
-      None.
-    beta_1: The exponential decay rate for the 1st moment estimates. Used in
-      the Adam optimizers. Defaults to None.
-    beta_2: The exponential decay rate for the 2nd moment estimates. Used in
-      the Adam optimizers. Defaults to None.
-  """
-  name: str = None
-  decay: float = None
-  epsilon: float = None
-  momentum: float = None
-  nesterov: bool = None
-  moving_average_decay: Optional[float] = None
-  lookahead: Optional[bool] = None
-  beta_1: float = None
-  beta_2: float = None
-
-
-@dataclasses.dataclass
-class LearningRateConfig(hyperparams.Config):
-  """Configuration for learning rates.
-
-  Attributes:
-    name: The name of the learning rate. Defaults to None.
-    initial_lr: The initial learning rate. Defaults to None.
-    decay_epochs: The number of decay epochs. Defaults to None.
-    decay_rate: The rate of decay. Defaults to None.
-    warmup_epochs: The number of warmup epochs. Defaults to None.
-    examples_per_epoch: The number of examples in a single epoch. Defaults to
-      None.
-    boundaries: boundaries used in piecewise constant decay with warmup.
-    multipliers: multipliers used in piecewise constant decay with warmup.
-    scale_by_batch_size: Scale the learning rate by a fraction of the batch
-      size. Set to 0 for no scaling (default).
-    staircase: Apply exponential decay at discrete values instead of continuous.
-  """
-  name: str = None
-  initial_lr: float = None
-  decay_epochs: float = None
-  decay_rate: float = None
-  warmup_epochs: int = None
-  examples_per_epoch: int = None
-  boundaries: List[int] = None
-  multipliers: List[float] = None
-  scale_by_batch_size: float = 0.
-  staircase: bool = None
-
-
-@dataclasses.dataclass
-class ModelConfig(hyperparams.Config):
-  """Configuration for Models.
-
-  Attributes:
-    name: The name of the model. Defaults to None.
-    model_params: The parameters used to create the model. Defaults to None.
-    num_classes: The number of classes in the model. Defaults to None.
-    loss: A `LossConfig` instance. Defaults to None.
-    optimizer: An `OptimizerConfig` instance. Defaults to None.
- """ - name: str = None - model_params: hyperparams.Config = None - num_classes: int = None - loss: LossConfig = None - optimizer: OptimizerConfig = None - - -@dataclasses.dataclass -class ExperimentConfig(hyperparams.Config): - """Base configuration for an image classification experiment. - - Attributes: - model_dir: The directory to use when running an experiment. - mode: e.g. 'train_and_eval', 'export' - runtime: A `RuntimeConfig` instance. - train: A `TrainConfig` instance. - evaluation: An `EvalConfig` instance. - model: A `ModelConfig` instance. - export: An `ExportConfig` instance. - """ - model_dir: str = None - model_name: str = None - mode: str = None - runtime: RuntimeConfig = None - train_dataset: Any = None - validation_dataset: Any = None - train: TrainConfig = None - evaluation: EvalConfig = None - model: ModelConfig = None - export: ExportConfig = None diff --git a/official/vision/image_classification/configs/configs.py b/official/vision/image_classification/configs/configs.py deleted file mode 100644 index 127af58c476f7ae849ca43e5765379b77897aea8..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/configs.py +++ /dev/null @@ -1,113 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configuration utils for image classification experiments.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import dataclasses - -from official.vision.image_classification import dataset_factory -from official.vision.image_classification.configs import base_configs -from official.vision.image_classification.efficientnet import efficientnet_config -from official.vision.image_classification.resnet import resnet_config - - -@dataclasses.dataclass -class EfficientNetImageNetConfig(base_configs.ExperimentConfig): - """Base configuration to train efficientnet-b0 on ImageNet. - - Attributes: - export: An `ExportConfig` instance - runtime: A `RuntimeConfig` instance. - dataset: A `DatasetConfig` instance. - train: A `TrainConfig` instance. - evaluation: An `EvalConfig` instance. - model: A `ModelConfig` instance. 
- """ - export: base_configs.ExportConfig = base_configs.ExportConfig() - runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig() - train_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='train') - validation_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='validation') - train: base_configs.TrainConfig = base_configs.TrainConfig( - resume_checkpoint=True, - epochs=500, - steps=None, - callbacks=base_configs.CallbacksConfig( - enable_checkpoint_and_export=True, enable_tensorboard=True), - metrics=['accuracy', 'top_5'], - time_history=base_configs.TimeHistoryConfig(log_steps=100), - tensorboard=base_configs.TensorBoardConfig( - track_lr=True, write_model_weights=False), - set_epoch_loop=False) - evaluation: base_configs.EvalConfig = base_configs.EvalConfig( - epochs_between_evals=1, steps=None) - model: base_configs.ModelConfig = \ - efficientnet_config.EfficientNetModelConfig() - - -@dataclasses.dataclass -class ResNetImagenetConfig(base_configs.ExperimentConfig): - """Base configuration to train resnet-50 on ImageNet.""" - export: base_configs.ExportConfig = base_configs.ExportConfig() - runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig() - train_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='train', - one_hot=False, - mean_subtract=True, - standardize=True) - validation_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='validation', - one_hot=False, - mean_subtract=True, - standardize=True) - train: base_configs.TrainConfig = base_configs.TrainConfig( - resume_checkpoint=True, - epochs=90, - steps=None, - callbacks=base_configs.CallbacksConfig( - enable_checkpoint_and_export=True, enable_tensorboard=True), - metrics=['accuracy', 'top_5'], - time_history=base_configs.TimeHistoryConfig(log_steps=100), - tensorboard=base_configs.TensorBoardConfig( - track_lr=True, write_model_weights=False), - set_epoch_loop=False) - evaluation: base_configs.EvalConfig = base_configs.EvalConfig( - epochs_between_evals=1, steps=None) - model: base_configs.ModelConfig = resnet_config.ResNetModelConfig() - - -def get_config(model: str, dataset: str) -> base_configs.ExperimentConfig: - """Given model and dataset names, return the ExperimentConfig.""" - dataset_model_config_map = { - 'imagenet': { - 'efficientnet': EfficientNetImageNetConfig(), - 'resnet': ResNetImagenetConfig(), - } - } - try: - return dataset_model_config_map[dataset][model] - except KeyError: - if dataset not in dataset_model_config_map: - raise KeyError('Invalid dataset received. Received: {}. Supported ' - 'datasets include: {}'.format( - dataset, ', '.join(dataset_model_config_map.keys()))) - raise KeyError('Invalid model received. Received: {}. Supported models for' - '{} include: {}'.format( - model, dataset, - ', '.join(dataset_model_config_map[dataset].keys()))) diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml deleted file mode 100644 index 6f40ffb1e3020a231832a120d9938bf77e9cc74b..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml +++ /dev/null @@ -1,52 +0,0 @@ -# Training configuration for EfficientNet-b0 trained on ImageNet on GPUs. -# Takes ~32 minutes per epoch for 8 V100s. -# Reaches ~76.1% within 350 epochs. 
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'mirrored' - num_gpus: 1 -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' - augmenter: - name: 'autoaugment' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' -model: - model_params: - model_name: 'efficientnet-b0' - overrides: - num_classes: 1000 - batch_norm: 'default' - dtype: 'float32' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml deleted file mode 100644 index c5be7e9ba32fc7e8f3999df8e7446405dd2d4173..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml +++ /dev/null @@ -1,52 +0,0 @@ -# Training configuration for EfficientNet-b0 trained on ImageNet on TPUs. -# Takes ~2 minutes, 50 seconds per epoch for v3-32. -# Reaches ~76.1% within 350 epochs. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'tpu' -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' - augmenter: - name: 'autoaugment' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' -model: - model_params: - model_name: 'efficientnet-b0' - overrides: - num_classes: 1000 - batch_norm: 'tpu' - dtype: 'bfloat16' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 - set_epoch_loop: True -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml deleted file mode 100644 index 2f3dce01a46c64c4d92e97091628daeadaceb21d..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml +++ /dev/null @@ -1,47 +0,0 @@ -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. 
-runtime: - distribution_strategy: 'mirrored' - num_gpus: 1 -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' -model: - model_params: - model_name: 'efficientnet-b1' - overrides: - num_classes: 1000 - batch_norm: 'default' - dtype: 'float32' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml deleted file mode 100644 index 0bb6a9fe6f0b417f92686178d4bc79a44c5a4aa7..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml +++ /dev/null @@ -1,51 +0,0 @@ -# Training configuration for EfficientNet-b1 trained on ImageNet on TPUs. -# Takes ~3 minutes, 15 seconds per epoch for v3-32. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'tpu' -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' - augmenter: - name: 'autoaugment' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' -model: - model_params: - model_name: 'efficientnet-b1' - overrides: - num_classes: 1000 - batch_norm: 'tpu' - dtype: 'bfloat16' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 - set_epoch_loop: True -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml b/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml deleted file mode 100644 index 2037d6b5d1c39b9ff898eaf49ec7a68e3987356b..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml +++ /dev/null @@ -1,49 +0,0 @@ -# Training configuration for ResNet trained on ImageNet on GPUs. -# Reaches > 76.1% within 90 epochs. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. 
-runtime: - distribution_strategy: 'mirrored' - num_gpus: 1 - batchnorm_spatial_persistent: True -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'train' - image_size: 224 - num_classes: 1000 - num_examples: 1281167 - batch_size: 256 - use_per_replica_batch_size: True - dtype: 'float16' - mean_subtract: True - standardize: True -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'validation' - image_size: 224 - num_classes: 1000 - num_examples: 50000 - batch_size: 256 - use_per_replica_batch_size: True - dtype: 'float16' - mean_subtract: True - standardize: True -model: - name: 'resnet' - model_params: - rescale_inputs: False - optimizer: - name: 'momentum' - momentum: 0.9 - decay: 0.9 - epsilon: 0.001 - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 90 -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml b/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml deleted file mode 100644 index 0a3030333bb42ce59e67cfbe12a12be877ab19d0..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml +++ /dev/null @@ -1,55 +0,0 @@ -# Training configuration for ResNet trained on ImageNet on TPUs. -# Takes ~4 minutes, 30 seconds seconds per epoch for a v3-32. -# Reaches > 76.1% within 90 epochs. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'tpu' -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'train' - one_hot: False - image_size: 224 - num_classes: 1000 - num_examples: 1281167 - batch_size: 128 - use_per_replica_batch_size: True - mean_subtract: False - standardize: False - dtype: 'bfloat16' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'validation' - one_hot: False - image_size: 224 - num_classes: 1000 - num_examples: 50000 - batch_size: 128 - use_per_replica_batch_size: True - mean_subtract: False - standardize: False - dtype: 'bfloat16' -model: - name: 'resnet' - model_params: - rescale_inputs: True - optimizer: - name: 'momentum' - momentum: 0.9 - decay: 0.9 - epsilon: 0.001 - moving_average_decay: 0. - lookahead: False - loss: - label_smoothing: 0.1 -train: - callbacks: - enable_checkpoint_and_export: True - resume_checkpoint: True - epochs: 90 - set_epoch_loop: True -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/dataset_factory.py b/official/vision/image_classification/dataset_factory.py deleted file mode 100644 index a0458ecccf9a74eb57480f8d127c0eb736591ff5..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/dataset_factory.py +++ /dev/null @@ -1,537 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
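
The ImageNet YAMLs above pin `num_examples` and a per-replica `batch_size`, and the dataset builder below derives the number of steps per epoch as `num_examples // global_batch_size`. A quick sketch of that bookkeeping with the `tpu.yaml` numbers; the 32-replica slice is an assumption for illustration, not part of the config:

```python
num_examples = 1281167        # ImageNet train split, from the YAML
per_replica_batch_size = 128  # batch_size in tpu.yaml
num_devices = 32              # hypothetical v3-32 slice
global_batch_size = per_replica_batch_size * num_devices  # 4096
steps_per_epoch = num_examples // global_batch_size       # 312
print(global_batch_size, steps_per_epoch)
```
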
- -# Lint as: python3 -"""Dataset utilities for vision tasks using TFDS and tf.data.Dataset.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -from typing import Any, List, Optional, Tuple, Mapping, Union - -from absl import logging -from dataclasses import dataclass -import tensorflow as tf -import tensorflow_datasets as tfds - -from official.modeling.hyperparams import base_config -from official.vision.image_classification import augment -from official.vision.image_classification import preprocessing - -AUGMENTERS = { - 'autoaugment': augment.AutoAugment, - 'randaugment': augment.RandAugment, -} - - -@dataclass -class AugmentConfig(base_config.Config): - """Configuration for image augmenters. - - Attributes: - name: The name of the image augmentation to use. Possible options are None - (default), 'autoaugment', or 'randaugment'. - params: Any paramaters used to initialize the augmenter. - """ - name: Optional[str] = None - params: Optional[Mapping[str, Any]] = None - - def build(self) -> augment.ImageAugment: - """Build the augmenter using this config.""" - params = self.params or {} - augmenter = AUGMENTERS.get(self.name, None) - return augmenter(**params) if augmenter is not None else None - - -@dataclass -class DatasetConfig(base_config.Config): - """The base configuration for building datasets. - - Attributes: - name: The name of the Dataset. Usually should correspond to a TFDS dataset. - data_dir: The path where the dataset files are stored, if available. - filenames: Optional list of strings representing the TFRecord names. - builder: The builder type used to load the dataset. Value should be one of - 'tfds' (load using TFDS), 'records' (load from TFRecords), or 'synthetic' - (generate dummy synthetic data without reading from files). - split: The split of the dataset. Usually 'train', 'validation', or 'test'. - image_size: The size of the image in the dataset. This assumes that `width` - == `height`. Set to 'infer' to infer the image size from TFDS info. This - requires `name` to be a registered dataset in TFDS. - num_classes: The number of classes given by the dataset. Set to 'infer' to - infer the image size from TFDS info. This requires `name` to be a - registered dataset in TFDS. - num_channels: The number of channels given by the dataset. Set to 'infer' to - infer the image size from TFDS info. This requires `name` to be a - registered dataset in TFDS. - num_examples: The number of examples given by the dataset. Set to 'infer' to - infer the image size from TFDS info. This requires `name` to be a - registered dataset in TFDS. - batch_size: The base batch size for the dataset. - use_per_replica_batch_size: Whether to scale the batch size based on - available resources. If set to `True`, the dataset builder will return - batch_size multiplied by `num_devices`, the number of device replicas - (e.g., the number of GPUs or TPU cores). This setting should be `True` if - the strategy argument is passed to `build()` and `num_devices > 1`. - num_devices: The number of replica devices to use. This should be set by - `strategy.num_replicas_in_sync` when using a distribution strategy. - dtype: The desired dtype of the dataset. This will be set during - preprocessing. - one_hot: Whether to apply one hot encoding. Set to `True` to be able to use - label smoothing. - augmenter: The augmenter config to use. No augmentation is used by default. - download: Whether to download data using TFDS. 
-    shuffle_buffer_size: The buffer size used for shuffling training data.
-    file_shuffle_buffer_size: The buffer size used for shuffling raw training
-      files.
-    skip_decoding: Whether to skip image decoding when loading from TFDS.
-    cache: Whether to cache dataset examples. Can be used to avoid re-reading
-      from disk on the second epoch. Requires significant memory overhead.
-    tf_data_service: The URI of a tf.data service to offload preprocessing onto
-      during training. The URI should be in the format "protocol://address",
-      e.g. "grpc://tf-data-service:5050".
-    mean_subtract: Whether or not to apply mean subtraction to the dataset.
-    standardize: Whether or not to apply standardization to the dataset.
-  """
-  name: Optional[str] = None
-  data_dir: Optional[str] = None
-  filenames: Optional[List[str]] = None
-  builder: str = 'tfds'
-  split: str = 'train'
-  image_size: Union[int, str] = 'infer'
-  num_classes: Union[int, str] = 'infer'
-  num_channels: Union[int, str] = 'infer'
-  num_examples: Union[int, str] = 'infer'
-  batch_size: int = 128
-  use_per_replica_batch_size: bool = True
-  num_devices: int = 1
-  dtype: str = 'float32'
-  one_hot: bool = True
-  augmenter: AugmentConfig = AugmentConfig()
-  download: bool = False
-  shuffle_buffer_size: int = 10000
-  file_shuffle_buffer_size: int = 1024
-  skip_decoding: bool = True
-  cache: bool = False
-  tf_data_service: Optional[str] = None
-  mean_subtract: bool = False
-  standardize: bool = False
-
-  @property
-  def has_data(self):
-    """Whether this dataset has any data associated with it."""
-    return self.name or self.data_dir or self.filenames
-
-
-@dataclass
-class ImageNetConfig(DatasetConfig):
-  """The base ImageNet dataset config."""
-  name: str = 'imagenet2012'
-  # Note: for large datasets like ImageNet, using records is faster than tfds
-  builder: str = 'records'
-  image_size: int = 224
-  num_channels: int = 3
-  num_examples: int = 1281167
-  num_classes: int = 1000
-  batch_size: int = 128
-
-
-@dataclass
-class Cifar10Config(DatasetConfig):
-  """The base CIFAR-10 dataset config."""
-  name: str = 'cifar10'
-  image_size: int = 224
-  batch_size: int = 128
-  download: bool = True
-  cache: bool = True
-
-
-class DatasetBuilder:
-  """An object for building datasets.
-
-  Allows building various pipelines fetching examples, preprocessing, etc.
-  Maintains additional state information calculated from the dataset, i.e.,
-  training set split, batch size, and number of steps (batches).
-  """
-
-  def __init__(self, config: DatasetConfig, **overrides: Any):
-    """Initialize the builder from the config."""
-    self.config = config.replace(**overrides)
-    self.builder_info = None
-
-    if self.config.augmenter is not None:
-      logging.info('Using augmentation: %s', self.config.augmenter.name)
-      self.augmenter = self.config.augmenter.build()
-    else:
-      self.augmenter = None
-
-  @property
-  def is_training(self) -> bool:
-    """Whether this is the training set."""
-    return self.config.split == 'train'
-
-  @property
-  def batch_size(self) -> int:
-    """The batch size, multiplied by the number of replicas (if configured)."""
-    if self.config.use_per_replica_batch_size:
-      return self.config.batch_size * self.config.num_devices
-    else:
-      return self.config.batch_size
-
-  @property
-  def global_batch_size(self):
-    """The global batch size across all replicas."""
-    return self.batch_size
-
-  @property
-  def local_batch_size(self):
-    """The base unscaled batch size."""
-    if self.config.use_per_replica_batch_size:
-      return self.config.batch_size
-    else:
-      return self.config.batch_size // self.config.num_devices
-
-  @property
-  def num_steps(self) -> int:
-    """The number of steps (batches) to exhaust this dataset."""
-    # Always divide by the global batch size to get the correct # of steps
-    return self.num_examples // self.global_batch_size
-
-  @property
-  def dtype(self) -> tf.dtypes.DType:
-    """Converts the config's dtype string to a tf dtype.
-
-    Returns:
-      The `tf.dtypes.DType` corresponding to the config's dtype string.
-
-    Raises:
-      ValueError if the config's dtype is not supported.
-
-    """
-    dtype_map = {
-        'float32': tf.float32,
-        'bfloat16': tf.bfloat16,
-        'float16': tf.float16,
-        'fp32': tf.float32,
-        'bf16': tf.bfloat16,
-    }
-    try:
-      return dtype_map[self.config.dtype]
-    except KeyError:
-      raise ValueError('Invalid DType provided. Supported types: {}'.format(
-          dtype_map.keys()))
-
-  @property
-  def image_size(self) -> int:
-    """The size of each image (can be inferred from the dataset)."""
-
-    if self.config.image_size == 'infer':
-      return self.info.features['image'].shape[0]
-    else:
-      return int(self.config.image_size)
-
-  @property
-  def num_channels(self) -> int:
-    """The number of image channels (can be inferred from the dataset)."""
-    if self.config.num_channels == 'infer':
-      return self.info.features['image'].shape[-1]
-    else:
-      return int(self.config.num_channels)
-
-  @property
-  def num_examples(self) -> int:
-    """The number of examples (can be inferred from the dataset)."""
-    if self.config.num_examples == 'infer':
-      return self.info.splits[self.config.split].num_examples
-    else:
-      return int(self.config.num_examples)
-
-  @property
-  def num_classes(self) -> int:
-    """The number of classes (can be inferred from the dataset)."""
-    if self.config.num_classes == 'infer':
-      return self.info.features['label'].num_classes
-    else:
-      return int(self.config.num_classes)
-
-  @property
-  def info(self) -> tfds.core.DatasetInfo:
-    """The TFDS dataset info, if available."""
-    try:
-      if self.builder_info is None:
-        self.builder_info = tfds.builder(self.config.name).info
-    except ConnectionError as e:
-      logging.error('Failed to use TFDS to load info. Please set dataset info '
-                    '(image_size, num_channels, num_examples, num_classes) in '
-                    'the dataset config.')
-      raise e
-    return self.builder_info
-
-  def build(
-      self,
-      strategy: Optional[tf.distribute.Strategy] = None) -> tf.data.Dataset:
-    """Construct a dataset end-to-end and return it using an optional strategy.
-
-    Args:
-      strategy: a strategy that, if passed, will distribute the dataset
-        according to that strategy. If passed and `num_devices > 1`,
-        `use_per_replica_batch_size` must be set to `True`.
-
-    Returns:
-      A TensorFlow dataset outputting batched images and labels.
-    """
-    if strategy:
-      if strategy.num_replicas_in_sync != self.config.num_devices:
-        logging.warning(
-            'Passed a strategy with %d devices, but expected %d devices.',
-            strategy.num_replicas_in_sync, self.config.num_devices)
-      dataset = strategy.distribute_datasets_from_function(self._build)
-    else:
-      dataset = self._build()
-
-    return dataset
-
-  def _build(
-      self,
-      input_context: Optional[tf.distribute.InputContext] = None
-  ) -> tf.data.Dataset:
-    """Construct a dataset end-to-end and return it.
-
-    Args:
-      input_context: An optional context provided by `tf.distribute` for
-        cross-replica training.
-
-    Returns:
-      A TensorFlow dataset outputting batched images and labels.
-    """
-    builders = {
-        'tfds': self.load_tfds,
-        'records': self.load_records,
-        'synthetic': self.load_synthetic,
-    }
-
-    builder = builders.get(self.config.builder, None)
-
-    if builder is None:
-      raise ValueError('Unknown builder type {}'.format(self.config.builder))
-
-    self.input_context = input_context
-    dataset = builder()
-    dataset = self.pipeline(dataset)
-
-    return dataset
-
-  def load_tfds(self) -> tf.data.Dataset:
-    """Return a dataset loading files from TFDS."""
-
-    logging.info('Using TFDS to load data.')
-
-    builder = tfds.builder(self.config.name, data_dir=self.config.data_dir)
-
-    if self.config.download:
-      builder.download_and_prepare()
-
-    decoders = {}
-
-    if self.config.skip_decoding:
-      decoders['image'] = tfds.decode.SkipDecoding()
-
-    read_config = tfds.ReadConfig(
-        interleave_cycle_length=10,
-        interleave_block_length=1,
-        input_context=self.input_context)
-
-    dataset = builder.as_dataset(
-        split=self.config.split,
-        as_supervised=True,
-        shuffle_files=True,
-        decoders=decoders,
-        read_config=read_config)
-
-    return dataset
-
-  def load_records(self) -> tf.data.Dataset:
-    """Return a dataset loading TFRecord files."""
-    logging.info('Using TFRecords to load data.')
-    if self.config.filenames is None:
-      if self.config.data_dir is None:
-        raise ValueError('Dataset must specify a path for the data files.')
-
-      file_pattern = os.path.join(self.config.data_dir,
-                                  '{}*'.format(self.config.split))
-      dataset = tf.data.Dataset.list_files(file_pattern, shuffle=False)
-    else:
-      dataset = tf.data.Dataset.from_tensor_slices(self.config.filenames)
-
-    return dataset
-
-  def load_synthetic(self) -> tf.data.Dataset:
-    """Return a dataset generating dummy synthetic data."""
-    logging.info('Generating a synthetic dataset.')
-
-    def generate_data(_):
-      image = tf.zeros([self.image_size, self.image_size, self.num_channels],
-                       dtype=self.dtype)
-      label = tf.zeros([1], dtype=tf.int32)
-      return image, label
-
-    dataset = tf.data.Dataset.range(1)
-    dataset = dataset.repeat()
-    dataset = dataset.map(
-        generate_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
-    return dataset
-
-  def pipeline(self, dataset: tf.data.Dataset) -> tf.data.Dataset:
-    """Build a pipeline fetching, shuffling, and preprocessing the dataset.
-
-    Args:
-      dataset: A `tf.data.Dataset` that loads raw files.
-
-    Returns:
-      A TensorFlow dataset outputting batched images and labels.
-    """
-    if (self.config.builder != 'tfds' and self.input_context and
-        self.input_context.num_input_pipelines > 1):
-      dataset = dataset.shard(self.input_context.num_input_pipelines,
-                              self.input_context.input_pipeline_id)
-      logging.info(
-          'Sharding the dataset: input_pipeline_id=%d '
-          'num_input_pipelines=%d', self.input_context.input_pipeline_id,
-          self.input_context.num_input_pipelines)
-
-    if self.is_training and self.config.builder == 'records':
-      # Shuffle the input files.
-      dataset = dataset.shuffle(
-          buffer_size=self.config.file_shuffle_buffer_size)
-
-    if self.is_training and not self.config.cache:
-      dataset = dataset.repeat()
-
-    if self.config.builder == 'records':
-      # Read the data from disk in parallel
-      dataset = dataset.interleave(
-          tf.data.TFRecordDataset,
-          cycle_length=10,
-          block_length=1,
-          num_parallel_calls=tf.data.experimental.AUTOTUNE)
-
-    if self.config.cache:
-      dataset = dataset.cache()
-
-    if self.is_training:
-      dataset = dataset.shuffle(self.config.shuffle_buffer_size)
-      dataset = dataset.repeat()
-
-    # Parse, pre-process, and batch the data in parallel
-    if self.config.builder == 'records':
-      preprocess = self.parse_record
-    else:
-      preprocess = self.preprocess
-    dataset = dataset.map(
-        preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
-
-    if self.input_context and self.config.num_devices > 1:
-      if not self.config.use_per_replica_batch_size:
-        raise ValueError(
-            'The builder does not support a global batch size with more than '
-            'one replica. Got {} replicas. Please set a '
-            '`per_replica_batch_size` and enable '
-            '`use_per_replica_batch_size=True`.'.format(
-                self.config.num_devices))
-
-      # The batch size of the dataset will be multiplied by the number of
-      # replicas automatically when strategy.distribute_datasets_from_function
-      # is called, so we use local batch size here.
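-      # (For example, with a per-replica batch size of 64 and 8 replicas,
-      # each replica batches 64 examples and the effective global batch
-      # size becomes 512.)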
-      dataset = dataset.batch(
-          self.local_batch_size, drop_remainder=self.is_training)
-    else:
-      dataset = dataset.batch(
-          self.global_batch_size, drop_remainder=self.is_training)
-
-    # Prefetch overlaps in-feed with training
-    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
-
-    if self.config.tf_data_service:
-      if not hasattr(tf.data.experimental, 'service'):
-        raise ValueError('The tf_data_service flag requires TensorFlow version '
-                         '>= 2.3.0, but the version is {}'.format(
-                             tf.__version__))
-      dataset = dataset.apply(
-          tf.data.experimental.service.distribute(
-              processing_mode='parallel_epochs',
-              service=self.config.tf_data_service,
-              job_name='resnet_train'))
-      dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
-
-    return dataset
-
-  def parse_record(self, record: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
-    """Parse an ImageNet record from a serialized string Tensor."""
-    keys_to_features = {
-        'image/encoded': tf.io.FixedLenFeature((), tf.string, ''),
-        'image/format': tf.io.FixedLenFeature((), tf.string, 'jpeg'),
-        'image/class/label': tf.io.FixedLenFeature([], tf.int64, -1),
-        'image/class/text': tf.io.FixedLenFeature([], tf.string, ''),
-        'image/object/bbox/xmin': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/bbox/ymin': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/bbox/xmax': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/bbox/ymax': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/class/label': tf.io.VarLenFeature(dtype=tf.int64),
-    }
-
-    parsed = tf.io.parse_single_example(record, keys_to_features)
-
-    label = tf.reshape(parsed['image/class/label'], shape=[1])
-
-    # Subtract one so that labels are in [0, 1000)
-    label -= 1
-
-    image_bytes = tf.reshape(parsed['image/encoded'], shape=[])
-    image, label = self.preprocess(image_bytes, label)
-
-    return image, label
-
-  def preprocess(self, image: tf.Tensor,
-                 label: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
-    """Apply image preprocessing and augmentation to the image and label."""
-    if self.is_training:
-      image = preprocessing.preprocess_for_train(
-          image,
-          image_size=self.image_size,
-          mean_subtract=self.config.mean_subtract,
-          standardize=self.config.standardize,
-          dtype=self.dtype,
-          augmenter=self.augmenter)
-    else:
-      image = preprocessing.preprocess_for_eval(
-          image,
-          image_size=self.image_size,
-          num_channels=self.num_channels,
-          mean_subtract=self.config.mean_subtract,
-          standardize=self.config.standardize,
-          dtype=self.dtype)
-
-    label = tf.cast(label, tf.int32)
-    if self.config.one_hot:
-      label = tf.one_hot(label, self.num_classes)
-      label = tf.reshape(label, [self.num_classes])
-
-    return image, label
-
-  @classmethod
-  def from_params(cls, *args, **kwargs):
-    """Construct a dataset builder from a default config and any overrides."""
-    config = DatasetConfig.from_args(*args, **kwargs)
-    return cls(config)
diff --git a/official/vision/image_classification/efficientnet/__init__.py b/official/vision/image_classification/efficientnet/__init__.py
deleted file mode 100644
index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/efficientnet/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
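For orientation, a minimal usage sketch of the `DatasetBuilder` removed above; the import path and data directory are illustrative assumptions, not taken from this diff:

```python
# A hedged sketch; assumes this module is importable as
# official.vision.image_classification.dataset_factory and that
# '/data/imagenet' (hypothetical) holds TFRecord shards named '<split>*'.
from official.vision.image_classification import dataset_factory

config = dataset_factory.ImageNetConfig(data_dir='/data/imagenet')
builder = dataset_factory.DatasetBuilder(config, num_devices=8)

# batch_size=128 is per replica (use_per_replica_batch_size=True by
# default), so the global batch is 128 * 8 = 1024 and one epoch of the
# 1,281,167 training examples is 1251 steps.
assert builder.global_batch_size == 1024
assert builder.num_steps == 1251

dataset = builder.build()  # or builder.build(strategy) to distribute it
```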
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/image_classification/efficientnet/common_modules.py b/official/vision/image_classification/efficientnet/common_modules.py deleted file mode 100644 index 9c3d11c8676773be4f7fc27187d0852fdd58aaf4..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/common_modules.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Common modeling utilities.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import tensorflow as tf -import tensorflow.compat.v1 as tf1 -from typing import Text, Optional - -from tensorflow.python.tpu import tpu_function - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class TpuBatchNormalization(tf.keras.layers.BatchNormalization): - """Cross replica batch normalization.""" - - def __init__(self, fused: Optional[bool] = False, **kwargs): - if fused in (True, None): - raise ValueError('TpuBatchNormalization does not support fused=True.') - super(TpuBatchNormalization, self).__init__(fused=fused, **kwargs) - - def _cross_replica_average(self, t: tf.Tensor, num_shards_per_group: int): - """Calculates the average value of input tensor across TPU replicas.""" - num_shards = tpu_function.get_tpu_context().number_of_shards - group_assignment = None - if num_shards_per_group > 1: - if num_shards % num_shards_per_group != 0: - raise ValueError( - 'num_shards: %d mod shards_per_group: %d, should be 0' % - (num_shards, num_shards_per_group)) - num_groups = num_shards // num_shards_per_group - group_assignment = [[ - x for x in range(num_shards) if x // num_shards_per_group == y - ] for y in range(num_groups)] - return tf1.tpu.cross_replica_sum(t, group_assignment) / tf.cast( - num_shards_per_group, t.dtype) - - def _moments(self, inputs: tf.Tensor, reduction_axes: int, keep_dims: int): - """Compute the mean and variance: it overrides the original _moments.""" - shard_mean, shard_variance = super(TpuBatchNormalization, self)._moments( - inputs, reduction_axes, keep_dims=keep_dims) - - num_shards = tpu_function.get_tpu_context().number_of_shards or 1 - if num_shards <= 8: # Skip cross_replica for 2x2 or smaller slices. - num_shards_per_group = 1 - else: - num_shards_per_group = max(8, num_shards // 8) - if num_shards_per_group > 1: - # Compute variance using: Var[X]= E[X^2] - E[X]^2. 
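-      # Each shard supplies E[X] (shard_mean) and E[X^2] (recovered below
-      # from shard_variance); averaging both across the group and
-      # recombining yields the cross-replica variance.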
-      shard_square_of_mean = tf.math.square(shard_mean)
-      shard_mean_of_square = shard_variance + shard_square_of_mean
-      group_mean = self._cross_replica_average(shard_mean, num_shards_per_group)
-      group_mean_of_square = self._cross_replica_average(
-          shard_mean_of_square, num_shards_per_group)
-      group_variance = group_mean_of_square - tf.math.square(group_mean)
-      return (group_mean, group_variance)
-    else:
-      return (shard_mean, shard_variance)
-
-
-def get_batch_norm(batch_norm_type: Text) -> tf.keras.layers.BatchNormalization:
-  """A helper that returns the requested batch normalization class.
-
-  Args:
-    batch_norm_type: The type of batch normalization layer implementation. `tpu`
-      will use `TpuBatchNormalization`.
-
-  Returns:
-    The `tf.keras.layers.BatchNormalization` class (or its
-    `TpuBatchNormalization` subclass) for the caller to instantiate.
-  """
-  if batch_norm_type == 'tpu':
-    return TpuBatchNormalization
-
-  return tf.keras.layers.BatchNormalization  # pytype: disable=bad-return-type  # typed-keras
-
-
-def count_params(model, trainable_only=True):
-  """Returns the count of all model parameters, or just trainable ones."""
-  if not trainable_only:
-    return model.count_params()
-  else:
-    return int(
-        np.sum([
-            tf.keras.backend.count_params(p) for p in model.trainable_weights
-        ]))
-
-
-def load_weights(model: tf.keras.Model,
-                 model_weights_path: Text,
-                 weights_format: Text = 'saved_model'):
-  """Load model weights from the given file path.
-
-  Args:
-    model: the model to load weights into
-    model_weights_path: the path of the model weights
-    weights_format: the model weights format. One of 'saved_model', 'h5', or
-      'checkpoint'.
-  """
-  if weights_format == 'saved_model':
-    loaded_model = tf.keras.models.load_model(model_weights_path)
-    model.set_weights(loaded_model.get_weights())
-  else:
-    model.load_weights(model_weights_path)
diff --git a/official/vision/image_classification/efficientnet/efficientnet_config.py b/official/vision/image_classification/efficientnet/efficientnet_config.py
deleted file mode 100644
index 47cfd740221d3581db585e90bc6df0711c289019..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/efficientnet/efficientnet_config.py
+++ /dev/null
@@ -1,78 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Configuration definitions for EfficientNet losses, learning rates, and optimizers."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from typing import Any, Mapping
-
-import dataclasses
-
-from official.modeling.hyperparams import base_config
-from official.vision.image_classification.configs import base_configs
-
-
-@dataclasses.dataclass
-class EfficientNetModelConfig(base_configs.ModelConfig):
-  """Configuration for the EfficientNet model.
-
-  This configuration will default to settings used for training efficientnet-b0
-  on a v3-8 TPU on ImageNet.
-
-  Attributes:
-    name: The name of the model. Defaults to 'EfficientNet'.
- num_classes: The number of classes in the model. - model_params: A dictionary that represents the parameters of the - EfficientNet model. These will be passed in to the "from_name" function. - loss: The configuration for loss. Defaults to a categorical cross entropy - implementation. - optimizer: The configuration for optimizations. Defaults to an RMSProp - configuration. - learning_rate: The configuration for learning rate. Defaults to an - exponential configuration. - """ - name: str = 'EfficientNet' - num_classes: int = 1000 - model_params: base_config.Config = dataclasses.field( - default_factory=lambda: { - 'model_name': 'efficientnet-b0', - 'model_weights_path': '', - 'weights_format': 'saved_model', - 'overrides': { - 'batch_norm': 'default', - 'rescale_input': True, - 'num_classes': 1000, - 'activation': 'swish', - 'dtype': 'float32', - } - }) - loss: base_configs.LossConfig = base_configs.LossConfig( - name='categorical_crossentropy', label_smoothing=0.1) - optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( - name='rmsprop', - decay=0.9, - epsilon=0.001, - momentum=0.9, - moving_average_decay=None) - learning_rate: base_configs.LearningRateConfig = base_configs.LearningRateConfig( # pylint: disable=line-too-long - name='exponential', - initial_lr=0.008, - decay_epochs=2.4, - decay_rate=0.97, - warmup_epochs=5, - scale_by_batch_size=1. / 128., - staircase=True) diff --git a/official/vision/image_classification/efficientnet/efficientnet_model.py b/official/vision/image_classification/efficientnet/efficientnet_model.py deleted file mode 100644 index ad385715cd866209a0d3958a6742cbde73f16091..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/efficientnet_model.py +++ /dev/null @@ -1,499 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Contains definitions for EfficientNet model. - -[1] Mingxing Tan, Quoc V. Le - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 
- ICML'19, https://arxiv.org/abs/1905.11946 -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math -import os -from typing import Any, Dict, Optional, Text, Tuple - -from absl import logging -from dataclasses import dataclass -import tensorflow as tf - -from official.modeling import tf_utils -from official.modeling.hyperparams import base_config -from official.vision.image_classification import preprocessing -from official.vision.image_classification.efficientnet import common_modules - - -@dataclass -class BlockConfig(base_config.Config): - """Config for a single MB Conv Block.""" - input_filters: int = 0 - output_filters: int = 0 - kernel_size: int = 3 - num_repeat: int = 1 - expand_ratio: int = 1 - strides: Tuple[int, int] = (1, 1) - se_ratio: Optional[float] = None - id_skip: bool = True - fused_conv: bool = False - conv_type: str = 'depthwise' - - -@dataclass -class ModelConfig(base_config.Config): - """Default Config for Efficientnet-B0.""" - width_coefficient: float = 1.0 - depth_coefficient: float = 1.0 - resolution: int = 224 - dropout_rate: float = 0.2 - blocks: Tuple[BlockConfig, ...] = ( - # (input_filters, output_filters, kernel_size, num_repeat, - # expand_ratio, strides, se_ratio) - # pylint: disable=bad-whitespace - BlockConfig.from_args(32, 16, 3, 1, 1, (1, 1), 0.25), - BlockConfig.from_args(16, 24, 3, 2, 6, (2, 2), 0.25), - BlockConfig.from_args(24, 40, 5, 2, 6, (2, 2), 0.25), - BlockConfig.from_args(40, 80, 3, 3, 6, (2, 2), 0.25), - BlockConfig.from_args(80, 112, 5, 3, 6, (1, 1), 0.25), - BlockConfig.from_args(112, 192, 5, 4, 6, (2, 2), 0.25), - BlockConfig.from_args(192, 320, 3, 1, 6, (1, 1), 0.25), - # pylint: enable=bad-whitespace - ) - stem_base_filters: int = 32 - top_base_filters: int = 1280 - activation: str = 'simple_swish' - batch_norm: str = 'default' - bn_momentum: float = 0.99 - bn_epsilon: float = 1e-3 - # While the original implementation used a weight decay of 1e-5, - # tf.nn.l2_loss divides it by 2, so we halve this to compensate in Keras - weight_decay: float = 5e-6 - drop_connect_rate: float = 0.2 - depth_divisor: int = 8 - min_depth: Optional[int] = None - use_se: bool = True - input_channels: int = 3 - num_classes: int = 1000 - model_name: str = 'efficientnet' - rescale_input: bool = True - data_format: str = 'channels_last' - dtype: str = 'float32' - - -MODEL_CONFIGS = { - # (width, depth, resolution, dropout) - 'efficientnet-b0': ModelConfig.from_args(1.0, 1.0, 224, 0.2), - 'efficientnet-b1': ModelConfig.from_args(1.0, 1.1, 240, 0.2), - 'efficientnet-b2': ModelConfig.from_args(1.1, 1.2, 260, 0.3), - 'efficientnet-b3': ModelConfig.from_args(1.2, 1.4, 300, 0.3), - 'efficientnet-b4': ModelConfig.from_args(1.4, 1.8, 380, 0.4), - 'efficientnet-b5': ModelConfig.from_args(1.6, 2.2, 456, 0.4), - 'efficientnet-b6': ModelConfig.from_args(1.8, 2.6, 528, 0.5), - 'efficientnet-b7': ModelConfig.from_args(2.0, 3.1, 600, 0.5), - 'efficientnet-b8': ModelConfig.from_args(2.2, 3.6, 672, 0.5), - 'efficientnet-l2': ModelConfig.from_args(4.3, 5.3, 800, 0.5), -} - -CONV_KERNEL_INITIALIZER = { - 'class_name': 'VarianceScaling', - 'config': { - 'scale': 2.0, - 'mode': 'fan_out', - # Note: this is a truncated normal distribution - 'distribution': 'normal' - } -} - -DENSE_KERNEL_INITIALIZER = { - 'class_name': 'VarianceScaling', - 'config': { - 'scale': 1 / 3.0, - 'mode': 'fan_out', - 'distribution': 'uniform' - } -} - - -def round_filters(filters: int, config: ModelConfig) -> int: - """Round 
number of filters based on width coefficient.""" - width_coefficient = config.width_coefficient - min_depth = config.min_depth - divisor = config.depth_divisor - orig_filters = filters - - if not width_coefficient: - return filters - - filters *= width_coefficient - min_depth = min_depth or divisor - new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) - # Make sure that round down does not go down by more than 10%. - if new_filters < 0.9 * filters: - new_filters += divisor - logging.info('round_filter input=%s output=%s', orig_filters, new_filters) - return int(new_filters) - - -def round_repeats(repeats: int, depth_coefficient: float) -> int: - """Round number of repeats based on depth coefficient.""" - return int(math.ceil(depth_coefficient * repeats)) - - -def conv2d_block(inputs: tf.Tensor, - conv_filters: Optional[int], - config: ModelConfig, - kernel_size: Any = (1, 1), - strides: Any = (1, 1), - use_batch_norm: bool = True, - use_bias: bool = False, - activation: Optional[Any] = None, - depthwise: bool = False, - name: Optional[Text] = None): - """A conv2d followed by batch norm and an activation.""" - batch_norm = common_modules.get_batch_norm(config.batch_norm) - bn_momentum = config.bn_momentum - bn_epsilon = config.bn_epsilon - data_format = tf.keras.backend.image_data_format() - weight_decay = config.weight_decay - - name = name or '' - - # Collect args based on what kind of conv2d block is desired - init_kwargs = { - 'kernel_size': kernel_size, - 'strides': strides, - 'use_bias': use_bias, - 'padding': 'same', - 'name': name + '_conv2d', - 'kernel_regularizer': tf.keras.regularizers.l2(weight_decay), - 'bias_regularizer': tf.keras.regularizers.l2(weight_decay), - } - - if depthwise: - conv2d = tf.keras.layers.DepthwiseConv2D - init_kwargs.update({'depthwise_initializer': CONV_KERNEL_INITIALIZER}) - else: - conv2d = tf.keras.layers.Conv2D - init_kwargs.update({ - 'filters': conv_filters, - 'kernel_initializer': CONV_KERNEL_INITIALIZER - }) - - x = conv2d(**init_kwargs)(inputs) - - if use_batch_norm: - bn_axis = 1 if data_format == 'channels_first' else -1 - x = batch_norm( - axis=bn_axis, - momentum=bn_momentum, - epsilon=bn_epsilon, - name=name + '_bn')( - x) - - if activation is not None: - x = tf.keras.layers.Activation(activation, name=name + '_activation')(x) - return x - - -def mb_conv_block(inputs: tf.Tensor, - block: BlockConfig, - config: ModelConfig, - prefix: Optional[Text] = None): - """Mobile Inverted Residual Bottleneck. - - Args: - inputs: the Keras input to the block - block: BlockConfig, arguments to create a Block - config: ModelConfig, a set of model parameters - prefix: prefix for naming all layers - - Returns: - the output of the block - """ - use_se = config.use_se - activation = tf_utils.get_activation(config.activation) - drop_connect_rate = config.drop_connect_rate - data_format = tf.keras.backend.image_data_format() - use_depthwise = block.conv_type != 'no_depthwise' - prefix = prefix or '' - - filters = block.input_filters * block.expand_ratio - - x = inputs - - if block.fused_conv: - # If we use fused mbconv, skip expansion and use regular conv. 
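-    # (A fused MBConv folds the 1x1 expansion and the depthwise convolution
-    # into one regular convolution, as in the EfficientNet-EdgeTPU variants.)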
- x = conv2d_block( - x, - filters, - config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - name=prefix + 'fused') - else: - if block.expand_ratio != 1: - # Expansion phase - kernel_size = (1, 1) if use_depthwise else (3, 3) - x = conv2d_block( - x, - filters, - config, - kernel_size=kernel_size, - activation=activation, - name=prefix + 'expand') - - # Depthwise Convolution - if use_depthwise: - x = conv2d_block( - x, - conv_filters=None, - config=config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - depthwise=True, - name=prefix + 'depthwise') - - # Squeeze and Excitation phase - if use_se: - assert block.se_ratio is not None - assert 0 < block.se_ratio <= 1 - num_reduced_filters = max(1, int(block.input_filters * block.se_ratio)) - - if data_format == 'channels_first': - se_shape = (filters, 1, 1) - else: - se_shape = (1, 1, filters) - - se = tf.keras.layers.GlobalAveragePooling2D(name=prefix + 'se_squeeze')(x) - se = tf.keras.layers.Reshape(se_shape, name=prefix + 'se_reshape')(se) - - se = conv2d_block( - se, - num_reduced_filters, - config, - use_bias=True, - use_batch_norm=False, - activation=activation, - name=prefix + 'se_reduce') - se = conv2d_block( - se, - filters, - config, - use_bias=True, - use_batch_norm=False, - activation='sigmoid', - name=prefix + 'se_expand') - x = tf.keras.layers.multiply([x, se], name=prefix + 'se_excite') - - # Output phase - x = conv2d_block( - x, block.output_filters, config, activation=None, name=prefix + 'project') - - # Add identity so that quantization-aware training can insert quantization - # ops correctly. - x = tf.keras.layers.Activation( - tf_utils.get_activation('identity'), name=prefix + 'id')( - x) - - if (block.id_skip and all(s == 1 for s in block.strides) and - block.input_filters == block.output_filters): - if drop_connect_rate and drop_connect_rate > 0: - # Apply dropconnect - # The only difference between dropout and dropconnect in TF is scaling by - # drop_connect_rate during training. See: - # https://github.com/keras-team/keras/pull/9898#issuecomment-380577612 - x = tf.keras.layers.Dropout( - drop_connect_rate, noise_shape=(None, 1, 1, 1), name=prefix + 'drop')( - x) - - x = tf.keras.layers.add([x, inputs], name=prefix + 'add') - - return x - - -def efficientnet(image_input: tf.keras.layers.Input, config: ModelConfig): # pytype: disable=invalid-annotation # typed-keras - """Creates an EfficientNet graph given the model parameters. - - This function is wrapped by the `EfficientNet` class to make a tf.keras.Model. - - Args: - image_input: the input batch of images - config: the model config - - Returns: - the output of efficientnet - """ - depth_coefficient = config.depth_coefficient - blocks = config.blocks - stem_base_filters = config.stem_base_filters - top_base_filters = config.top_base_filters - activation = tf_utils.get_activation(config.activation) - dropout_rate = config.dropout_rate - drop_connect_rate = config.drop_connect_rate - num_classes = config.num_classes - input_channels = config.input_channels - rescale_input = config.rescale_input - data_format = tf.keras.backend.image_data_format() - dtype = config.dtype - weight_decay = config.weight_decay - - x = image_input - if data_format == 'channels_first': - # Happens on GPU/TPU if available. 
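-    # NHWC inputs are permuted to NCHW here, since convolutions generally
-    # run faster with a leading channel axis on these devices.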
- x = tf.keras.layers.Permute((3, 1, 2))(x) - if rescale_input: - x = preprocessing.normalize_images( - x, num_channels=input_channels, dtype=dtype, data_format=data_format) - - # Build stem - x = conv2d_block( - x, - round_filters(stem_base_filters, config), - config, - kernel_size=[3, 3], - strides=[2, 2], - activation=activation, - name='stem') - - # Build blocks - num_blocks_total = sum( - round_repeats(block.num_repeat, depth_coefficient) for block in blocks) - block_num = 0 - - for stack_idx, block in enumerate(blocks): - assert block.num_repeat > 0 - # Update block input and output filters based on depth multiplier - block = block.replace( - input_filters=round_filters(block.input_filters, config), - output_filters=round_filters(block.output_filters, config), - num_repeat=round_repeats(block.num_repeat, depth_coefficient)) - - # The first block needs to take care of stride and filter size increase - drop_rate = drop_connect_rate * float(block_num) / num_blocks_total - config = config.replace(drop_connect_rate=drop_rate) - block_prefix = 'stack_{}/block_0/'.format(stack_idx) - x = mb_conv_block(x, block, config, block_prefix) - block_num += 1 - if block.num_repeat > 1: - block = block.replace(input_filters=block.output_filters, strides=[1, 1]) - - for block_idx in range(block.num_repeat - 1): - drop_rate = drop_connect_rate * float(block_num) / num_blocks_total - config = config.replace(drop_connect_rate=drop_rate) - block_prefix = 'stack_{}/block_{}/'.format(stack_idx, block_idx + 1) - x = mb_conv_block(x, block, config, prefix=block_prefix) - block_num += 1 - - # Build top - x = conv2d_block( - x, - round_filters(top_base_filters, config), - config, - activation=activation, - name='top') - - # Build classifier - x = tf.keras.layers.GlobalAveragePooling2D(name='top_pool')(x) - if dropout_rate and dropout_rate > 0: - x = tf.keras.layers.Dropout(dropout_rate, name='top_dropout')(x) - x = tf.keras.layers.Dense( - num_classes, - kernel_initializer=DENSE_KERNEL_INITIALIZER, - kernel_regularizer=tf.keras.regularizers.l2(weight_decay), - bias_regularizer=tf.keras.regularizers.l2(weight_decay), - name='logits')( - x) - x = tf.keras.layers.Activation('softmax', name='probs')(x) - - return x - - -class EfficientNet(tf.keras.Model): - """Wrapper class for an EfficientNet Keras model. - - Contains helper methods to build, manage, and save metadata about the model. - """ - - def __init__(self, - config: Optional[ModelConfig] = None, - overrides: Optional[Dict[Text, Any]] = None): - """Create an EfficientNet model. 
- - Args: - config: (optional) the main model parameters to create the model - overrides: (optional) a dict containing keys that can override config - """ - overrides = overrides or {} - config = config or ModelConfig() - - self.config = config.replace(**overrides) - - input_channels = self.config.input_channels - model_name = self.config.model_name - input_shape = (None, None, input_channels) # Should handle any size image - image_input = tf.keras.layers.Input(shape=input_shape) - - output = efficientnet(image_input, self.config) - - # Cast to float32 in case we have a different model dtype - output = tf.cast(output, tf.float32) - - logging.info('Building model %s with params %s', model_name, self.config) - - super(EfficientNet, self).__init__( - inputs=image_input, outputs=output, name=model_name) - - @classmethod - def from_name(cls, - model_name: Text, - model_weights_path: Optional[Text] = None, - weights_format: Text = 'saved_model', - overrides: Optional[Dict[Text, Any]] = None): - """Construct an EfficientNet model from a predefined model name. - - E.g., `EfficientNet.from_name('efficientnet-b0')`. - - Args: - model_name: the predefined model name - model_weights_path: the path to the weights (h5 file or saved model dir) - weights_format: the model weights format. One of 'saved_model', 'h5', or - 'checkpoint'. - overrides: (optional) a dict containing keys that can override config - - Returns: - A constructed EfficientNet instance. - """ - model_configs = dict(MODEL_CONFIGS) - overrides = dict(overrides) if overrides else {} - - # One can define their own custom models if necessary - model_configs.update(overrides.pop('model_config', {})) - - if model_name not in model_configs: - raise ValueError('Unknown model name {}'.format(model_name)) - - config = model_configs[model_name] - - model = cls(config=config, overrides=overrides) - - if model_weights_path: - common_modules.load_weights( - model, model_weights_path, weights_format=weights_format) - - return model diff --git a/official/vision/image_classification/efficientnet/tfhub_export.py b/official/vision/image_classification/efficientnet/tfhub_export.py deleted file mode 100644 index d3518a1304c8c761cfaabdcc96dead70dd9b0097..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/tfhub_export.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
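A hedged sketch of the `from_name` entry point defined above (randomly initialized weights; the import path is assumed from the file's location):

```python
# Build an EfficientNet-B0 with a 10-way head via the `overrides`
# mechanism of from_name(); no pretrained weights are loaded.
from official.vision.image_classification.efficientnet import efficientnet_model

model = efficientnet_model.EfficientNet.from_name(
    'efficientnet-b0',
    overrides={'num_classes': 10, 'rescale_input': False})
print(model.count_params())
```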
-
-"""A script to export TF-Hub SavedModel."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import app
-from absl import flags
-
-import tensorflow as tf
-
-from official.vision.image_classification.efficientnet import efficientnet_model
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string("model_name", None, "EfficientNet model name.")
-flags.DEFINE_string("model_path", None, "File path to TF model checkpoint.")
-flags.DEFINE_string("export_path", None,
-                    "TF-Hub SavedModel destination path to export.")
-
-
-def export_tfhub(model_path, hub_destination, model_name):
-  """Restores a tf.keras.Model and saves for TF-Hub."""
-  model_configs = dict(efficientnet_model.MODEL_CONFIGS)
-  config = model_configs[model_name]
-
-  image_input = tf.keras.layers.Input(
-      shape=(None, None, 3), name="image_input", dtype=tf.float32)
-  x = image_input * 255.0
-  outputs = efficientnet_model.efficientnet(x, config)
-  hub_model = tf.keras.Model(image_input, outputs)
-  ckpt = tf.train.Checkpoint(model=hub_model)
-  ckpt.restore(model_path).assert_existing_objects_matched()
-  hub_model.save(
-      os.path.join(hub_destination, "classification"), include_optimizer=False)
-
-  feature_vector_output = hub_model.get_layer(name="top_pool").get_output_at(0)
-  hub_model2 = tf.keras.Model(image_input, feature_vector_output)
-  hub_model2.save(
-      os.path.join(hub_destination, "feature-vector"), include_optimizer=False)
-
-
-def main(argv):
-  if len(argv) > 1:
-    raise app.UsageError("Too many command-line arguments.")
-
-  export_tfhub(FLAGS.model_path, FLAGS.export_path, FLAGS.model_name)
-
-
-if __name__ == "__main__":
-  app.run(main)
diff --git a/official/vision/image_classification/learning_rate.py b/official/vision/image_classification/learning_rate.py
deleted file mode 100644
index 72f7e95187521eeebefa1e698ca5382f10642e88..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/learning_rate.py
+++ /dev/null
@@ -1,117 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Learning rate utilities for vision tasks."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from typing import Any, Mapping, Optional
-
-import numpy as np
-import tensorflow as tf
-
-BASE_LEARNING_RATE = 0.1
-
-
-class WarmupDecaySchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
-  """A wrapper for LearningRateSchedule that includes warmup steps."""
-
-  def __init__(self,
-               lr_schedule: tf.keras.optimizers.schedules.LearningRateSchedule,
-               warmup_steps: int,
-               warmup_lr: Optional[float] = None):
-    """Add warmup decay to a learning rate schedule.
-
-    Args:
-      lr_schedule: base learning rate scheduler
-      warmup_steps: number of warmup steps
-      warmup_lr: an optional field for the final warmup learning rate.
This
-        should be provided if the base `lr_schedule` does not contain this
-        field.
-    """
-    super(WarmupDecaySchedule, self).__init__()
-    self._lr_schedule = lr_schedule
-    self._warmup_steps = warmup_steps
-    self._warmup_lr = warmup_lr
-
-  def __call__(self, step: int):
-    lr = self._lr_schedule(step)
-    if self._warmup_steps:
-      if self._warmup_lr is not None:
-        initial_learning_rate = tf.convert_to_tensor(
-            self._warmup_lr, name="initial_learning_rate")
-      else:
-        initial_learning_rate = tf.convert_to_tensor(
-            self._lr_schedule.initial_learning_rate,
-            name="initial_learning_rate")
-      dtype = initial_learning_rate.dtype
-      global_step_recomp = tf.cast(step, dtype)
-      warmup_steps = tf.cast(self._warmup_steps, dtype)
-      warmup_lr = initial_learning_rate * global_step_recomp / warmup_steps
-      lr = tf.cond(global_step_recomp < warmup_steps, lambda: warmup_lr,
-                   lambda: lr)
-    return lr
-
-  def get_config(self) -> Mapping[str, Any]:
-    config = self._lr_schedule.get_config()
-    config.update({
-        "warmup_steps": self._warmup_steps,
-        "warmup_lr": self._warmup_lr,
-    })
-    return config
-
-
-class CosineDecayWithWarmup(tf.keras.optimizers.schedules.LearningRateSchedule):
-  """Class to generate learning rate tensor."""
-
-  def __init__(self, batch_size: int, total_steps: int, warmup_steps: int):
-    """Creates the cosine learning rate tensor with linear warmup.
-
-    Args:
-      batch_size: The training batch size used in the experiment.
-      total_steps: Total training steps.
-      warmup_steps: Steps for the warm up period.
-    """
-    super(CosineDecayWithWarmup, self).__init__()
-    base_lr_batch_size = 256
-    self._total_steps = total_steps
-    self._init_learning_rate = BASE_LEARNING_RATE * batch_size / base_lr_batch_size
-    self._warmup_steps = warmup_steps
-
-  def __call__(self, global_step: int):
-    global_step = tf.cast(global_step, dtype=tf.float32)
-    warmup_steps = self._warmup_steps
-    init_lr = self._init_learning_rate
-    total_steps = self._total_steps
-
-    linear_warmup = global_step / warmup_steps * init_lr
-
-    cosine_learning_rate = init_lr * (tf.cos(np.pi *
-                                             (global_step - warmup_steps) /
-                                             (total_steps - warmup_steps)) +
-                                      1.0) / 2.0
-
-    learning_rate = tf.where(global_step < warmup_steps, linear_warmup,
-                             cosine_learning_rate)
-    return learning_rate
-
-  def get_config(self):
-    return {
-        "total_steps": self._total_steps,
-        "warmup_steps": self._warmup_steps,
-        "init_learning_rate": self._init_learning_rate,
-    }
diff --git a/official/vision/image_classification/learning_rate_test.py b/official/vision/image_classification/learning_rate_test.py
deleted file mode 100644
index 6c33ed24b8e46b8ecb58005a1f528e62a66f0005..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/learning_rate_test.py
+++ /dev/null
@@ -1,60 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
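The expected values in the test below can be reproduced with plain NumPy from the `CosineDecayWithWarmup` formulas above:

```python
# Pure-NumPy check of CosineDecayWithWarmup: batch_size=256 gives
# init_lr = BASE_LEARNING_RATE * 256 / 256 = 0.1.
import numpy as np

init_lr, total_steps, warmup_steps = 0.1, 3, 1
for step in range(total_steps + 1):
  if step < warmup_steps:
    lr = step / warmup_steps * init_lr  # linear warmup
  else:
    lr = init_lr * (np.cos(np.pi * (step - warmup_steps) /
                           (total_steps - warmup_steps)) + 1.0) / 2.0
  print(step, round(float(lr), 4))  # prints 0.0, 0.1, 0.05, 0.0
```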
- -"""Tests for learning_rate.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.image_classification import learning_rate - - -class LearningRateTests(tf.test.TestCase): - - def test_warmup_decay(self): - """Basic computational test for warmup decay.""" - initial_lr = 0.01 - decay_steps = 100 - decay_rate = 0.01 - warmup_steps = 10 - - base_lr = tf.keras.optimizers.schedules.ExponentialDecay( - initial_learning_rate=initial_lr, - decay_steps=decay_steps, - decay_rate=decay_rate) - lr = learning_rate.WarmupDecaySchedule( - lr_schedule=base_lr, warmup_steps=warmup_steps) - - for step in range(warmup_steps - 1): - config = lr.get_config() - self.assertEqual(config['warmup_steps'], warmup_steps) - self.assertAllClose( - self.evaluate(lr(step)), step / warmup_steps * initial_lr) - - def test_cosine_decay_with_warmup(self): - """Basic computational test for cosine decay with warmup.""" - expected_lrs = [0.0, 0.1, 0.05, 0.0] - - lr = learning_rate.CosineDecayWithWarmup( - batch_size=256, total_steps=3, warmup_steps=1) - - for step in [0, 1, 2, 3]: - self.assertAllClose(lr(step), expected_lrs[step]) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/mnist_main.py b/official/vision/image_classification/mnist_main.py deleted file mode 100644 index 3eba80b06a9215cb5dc4d3b13facb2f2a4f3058c..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/mnist_main.py +++ /dev/null @@ -1,176 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
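For completeness, a small sketch of wrapping a stock Keras schedule in `WarmupDecaySchedule`, mirroring the test above (import path assumed):

```python
import tensorflow as tf
from official.vision.image_classification import learning_rate

base = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=100, decay_rate=0.01)
lr = learning_rate.WarmupDecaySchedule(lr_schedule=base, warmup_steps=10)
# During warmup the rate ramps linearly toward the schedule's initial lr:
print(float(lr(5)))  # 5 / 10 * 0.01 = 0.005
```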
- -"""Runs a simple model on the MNIST dataset.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -import tensorflow_datasets as tfds -from official.common import distribute_utils -from official.utils.flags import core as flags_core -from official.utils.misc import model_helpers -from official.vision.image_classification.resnet import common - -FLAGS = flags.FLAGS - - -def build_model(): - """Constructs the ML model used to predict handwritten digits.""" - - image = tf.keras.layers.Input(shape=(28, 28, 1)) - - y = tf.keras.layers.Conv2D(filters=32, - kernel_size=5, - padding='same', - activation='relu')(image) - y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), - strides=(2, 2), - padding='same')(y) - y = tf.keras.layers.Conv2D(filters=32, - kernel_size=5, - padding='same', - activation='relu')(y) - y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), - strides=(2, 2), - padding='same')(y) - y = tf.keras.layers.Flatten()(y) - y = tf.keras.layers.Dense(1024, activation='relu')(y) - y = tf.keras.layers.Dropout(0.4)(y) - - probs = tf.keras.layers.Dense(10, activation='softmax')(y) - - model = tf.keras.models.Model(image, probs, name='mnist') - - return model - - -@tfds.decode.make_decoder(output_dtype=tf.float32) -def decode_image(example, feature): - """Convert image to float32 and normalize from [0, 255] to [0.0, 1.0].""" - return tf.cast(feature.decode_example(example), dtype=tf.float32) / 255 - - -def run(flags_obj, datasets_override=None, strategy_override=None): - """Run MNIST model training and eval loop using native Keras APIs. - - Args: - flags_obj: An object containing parsed flag values. - datasets_override: A pair of `tf.data.Dataset` objects to train the model, - representing the train and test sets. - strategy_override: A `tf.distribute.Strategy` object to use for model. - - Returns: - Dictionary of training and eval stats. - """ - # Start TF profiler server. 
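-  # Profiles can then be captured on demand (e.g. from TensorBoard's
-  # profiler capture dialog) by pointing it at localhost:<profiler_port>.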
- tf.profiler.experimental.server.start(flags_obj.profiler_port) - - strategy = strategy_override or distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=flags_obj.num_gpus, - tpu_address=flags_obj.tpu) - - strategy_scope = distribute_utils.get_strategy_scope(strategy) - - mnist = tfds.builder('mnist', data_dir=flags_obj.data_dir) - if flags_obj.download: - mnist.download_and_prepare() - - mnist_train, mnist_test = datasets_override or mnist.as_dataset( - split=['train', 'test'], - decoders={'image': decode_image()}, # pylint: disable=no-value-for-parameter - as_supervised=True) - train_input_dataset = mnist_train.cache().repeat().shuffle( - buffer_size=50000).batch(flags_obj.batch_size) - eval_input_dataset = mnist_test.cache().repeat().batch(flags_obj.batch_size) - - with strategy_scope: - lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay( - 0.05, decay_steps=100000, decay_rate=0.96) - optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule) - - model = build_model() - model.compile( - optimizer=optimizer, - loss='sparse_categorical_crossentropy', - metrics=['sparse_categorical_accuracy']) - - num_train_examples = mnist.info.splits['train'].num_examples - train_steps = num_train_examples // flags_obj.batch_size - train_epochs = flags_obj.train_epochs - - ckpt_full_path = os.path.join(flags_obj.model_dir, 'model.ckpt-{epoch:04d}') - callbacks = [ - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True), - tf.keras.callbacks.TensorBoard(log_dir=flags_obj.model_dir), - ] - - num_eval_examples = mnist.info.splits['test'].num_examples - num_eval_steps = num_eval_examples // flags_obj.batch_size - - history = model.fit( - train_input_dataset, - epochs=train_epochs, - steps_per_epoch=train_steps, - callbacks=callbacks, - validation_steps=num_eval_steps, - validation_data=eval_input_dataset, - validation_freq=flags_obj.epochs_between_evals) - - export_path = os.path.join(flags_obj.model_dir, 'saved_model') - model.save(export_path, include_optimizer=False) - - eval_output = model.evaluate( - eval_input_dataset, steps=num_eval_steps, verbose=2) - - stats = common.build_stats(history, eval_output, callbacks) - return stats - - -def define_mnist_flags(): - """Define command line flags for MNIST model.""" - flags_core.define_base( - clean=True, - num_gpu=True, - train_epochs=True, - epochs_between_evals=True, - distribution_strategy=True) - flags_core.define_device() - flags_core.define_distribution() - flags.DEFINE_bool('download', True, - 'Whether to download data to `--data_dir`.') - flags.DEFINE_integer('profiler_port', 9012, - 'Port to start profiler server on.') - FLAGS.set_default('batch_size', 1024) - - -def main(_): - model_helpers.apply_clean(FLAGS) - stats = run(flags.FLAGS) - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - define_mnist_flags() - app.run(main) diff --git a/official/vision/image_classification/mnist_test.py b/official/vision/image_classification/mnist_test.py deleted file mode 100644 index c94396a444294b37259ba849bd8ea2f6f76997d0..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/mnist_test.py +++ /dev/null @@ -1,89 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test the Keras MNIST model on GPU.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.utils.testing import integration -from official.vision.image_classification import mnist_main - - -mnist_main.define_mnist_flags() - - -def eager_strategy_combinations(): - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - ],) - - -class KerasMnistTest(tf.test.TestCase, parameterized.TestCase): - """Unit tests for sample Keras MNIST model.""" - _tempdir = None - - @classmethod - def setUpClass(cls): # pylint: disable=invalid-name - super(KerasMnistTest, cls).setUpClass() - - def tearDown(self): - super(KerasMnistTest, self).tearDown() - tf.io.gfile.rmtree(self.get_temp_dir()) - - @combinations.generate(eager_strategy_combinations()) - def test_end_to_end(self, distribution): - """Test Keras MNIST model with `strategy`.""" - - extra_flags = [ - "-train_epochs", - "1", - # Let TFDS find the metadata folder automatically - "--data_dir=" - ] - - dummy_data = ( - tf.ones(shape=(10, 28, 28, 1), dtype=tf.int32), - tf.range(10), - ) - datasets = ( - tf.data.Dataset.from_tensor_slices(dummy_data), - tf.data.Dataset.from_tensor_slices(dummy_data), - ) - - run = functools.partial( - mnist_main.run, - datasets_override=datasets, - strategy_override=distribution) - - integration.run_synthetic( - main=run, - synth=False, - tmp_root=self.create_tempdir().full_path, - extra_flags=extra_flags) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/vision/image_classification/optimizer_factory.py b/official/vision/image_classification/optimizer_factory.py deleted file mode 100644 index 48a4512ee96438cec1367d6493f63a230b01eeb1..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/optimizer_factory.py +++ /dev/null @@ -1,181 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
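A quick smoke test of the MNIST model exercised above (a sketch; the import path is assumed):

```python
import tensorflow as tf
from official.vision.image_classification import mnist_main

model = mnist_main.build_model()
probs = model(tf.random.uniform([2, 28, 28, 1]))
print(probs.shape)  # (2, 10): softmax probabilities over the ten digits
```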
- -"""Optimizer factory for vision tasks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from typing import Any, Dict, Optional, Text - -from absl import logging -import tensorflow as tf -import tensorflow_addons as tfa - -from official.modeling import optimization -from official.vision.image_classification import learning_rate -from official.vision.image_classification.configs import base_configs - -# pylint: disable=protected-access - - -def build_optimizer( - optimizer_name: Text, - base_learning_rate: tf.keras.optimizers.schedules.LearningRateSchedule, - params: Dict[Text, Any], - model: Optional[tf.keras.Model] = None): - """Build the optimizer based on name. - - Args: - optimizer_name: String representation of the optimizer name. Examples: sgd, - momentum, rmsprop. - base_learning_rate: `tf.keras.optimizers.schedules.LearningRateSchedule` - base learning rate. - params: String -> Any dictionary representing the optimizer params. This - should contain optimizer specific parameters such as `base_learning_rate`, - `decay`, etc. - model: The `tf.keras.Model`. This is used for the shadow copy if using - `ExponentialMovingAverage`. - - Returns: - A tf.keras.Optimizer. - - Raises: - ValueError if the provided optimizer_name is not supported. - - """ - optimizer_name = optimizer_name.lower() - logging.info('Building %s optimizer with params %s', optimizer_name, params) - - if optimizer_name == 'sgd': - logging.info('Using SGD optimizer') - nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( - learning_rate=base_learning_rate, nesterov=nesterov) - elif optimizer_name == 'momentum': - logging.info('Using momentum optimizer') - nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( - learning_rate=base_learning_rate, - momentum=params['momentum'], - nesterov=nesterov) - elif optimizer_name == 'rmsprop': - logging.info('Using RMSProp') - rho = params.get('decay', None) or params.get('rho', 0.9) - momentum = params.get('momentum', 0.9) - epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.RMSprop( - learning_rate=base_learning_rate, - rho=rho, - momentum=momentum, - epsilon=epsilon) - elif optimizer_name == 'adam': - logging.info('Using Adam') - beta_1 = params.get('beta_1', 0.9) - beta_2 = params.get('beta_2', 0.999) - epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.Adam( - learning_rate=base_learning_rate, - beta_1=beta_1, - beta_2=beta_2, - epsilon=epsilon) - elif optimizer_name == 'adamw': - logging.info('Using AdamW') - weight_decay = params.get('weight_decay', 0.01) - beta_1 = params.get('beta_1', 0.9) - beta_2 = params.get('beta_2', 0.999) - epsilon = params.get('epsilon', 1e-07) - optimizer = tfa.optimizers.AdamW( - weight_decay=weight_decay, - learning_rate=base_learning_rate, - beta_1=beta_1, - beta_2=beta_2, - epsilon=epsilon) - else: - raise ValueError('Unknown optimizer %s' % optimizer_name) - - if params.get('lookahead', None): - logging.info('Using lookahead optimizer.') - optimizer = tfa.optimizers.Lookahead(optimizer) - - # Moving average should be applied last, as it's applied at test time - moving_average_decay = params.get('moving_average_decay', 0.) 
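-  # (A decay of 0. leaves the ExponentialMovingAverage wrapper below
-  # disabled.)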
- if moving_average_decay is not None and moving_average_decay > 0.: - if model is None: - raise ValueError( - '`model` must be provided if using `ExponentialMovingAverage`.') - logging.info('Including moving average decay.') - optimizer = optimization.ExponentialMovingAverage( - optimizer=optimizer, average_decay=moving_average_decay) - optimizer.shadow_copy(model) - return optimizer - - -def build_learning_rate(params: base_configs.LearningRateConfig, - batch_size: Optional[int] = None, - train_epochs: Optional[int] = None, - train_steps: Optional[int] = None): - """Build the learning rate given the provided configuration.""" - decay_type = params.name - base_lr = params.initial_lr - decay_rate = params.decay_rate - if params.decay_epochs is not None: - decay_steps = params.decay_epochs * train_steps - else: - decay_steps = 0 - if params.warmup_epochs is not None: - warmup_steps = params.warmup_epochs * train_steps - else: - warmup_steps = 0 - - lr_multiplier = params.scale_by_batch_size - - if lr_multiplier and lr_multiplier > 0: - # Scale the learning rate based on the batch size and a multiplier - base_lr *= lr_multiplier * batch_size - logging.info( - 'Scaling the learning rate based on the batch size ' - 'multiplier. New base_lr: %f', base_lr) - - if decay_type == 'exponential': - logging.info( - 'Using exponential learning rate with: ' - 'initial_learning_rate: %f, decay_steps: %d, ' - 'decay_rate: %f', base_lr, decay_steps, decay_rate) - lr = tf.keras.optimizers.schedules.ExponentialDecay( - initial_learning_rate=base_lr, - decay_steps=decay_steps, - decay_rate=decay_rate, - staircase=params.staircase) - elif decay_type == 'stepwise': - steps_per_epoch = params.examples_per_epoch // batch_size - boundaries = [boundary * steps_per_epoch for boundary in params.boundaries] - multipliers = [batch_size * multiplier for multiplier in params.multipliers] - logging.info( - 'Using stepwise learning rate. Parameters: ' - 'boundaries: %s, values: %s', boundaries, multipliers) - lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay( - boundaries=boundaries, values=multipliers) - elif decay_type == 'cosine_with_warmup': - lr = learning_rate.CosineDecayWithWarmup( - batch_size=batch_size, - total_steps=train_epochs * train_steps, - warmup_steps=warmup_steps) - if warmup_steps > 0: - if decay_type not in ['cosine_with_warmup']: - logging.info('Applying %d warmup steps to the learning rate', - warmup_steps) - lr = learning_rate.WarmupDecaySchedule( - lr, warmup_steps, warmup_lr=base_lr) - return lr diff --git a/official/vision/image_classification/optimizer_factory_test.py b/official/vision/image_classification/optimizer_factory_test.py deleted file mode 100644 index 41d71a328d6fc0d27709978ae75994f8985a166d..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/optimizer_factory_test.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
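Putting the two factories above together, a hedged end-to-end sketch (import paths assumed; the numbers are illustrative):

```python
from official.vision.image_classification import optimizer_factory
from official.vision.image_classification.configs import base_configs

lr_params = base_configs.LearningRateConfig(
    name='exponential', initial_lr=0.008, decay_epochs=2.4, decay_rate=0.97,
    warmup_epochs=5, scale_by_batch_size=1. / 128., staircase=True)
# 5004 steps/epoch ~= 1,281,167 examples / global batch size 256.
lr = optimizer_factory.build_learning_rate(
    params=lr_params, batch_size=256, train_epochs=90, train_steps=5004)
optimizer = optimizer_factory.build_optimizer(
    optimizer_name='rmsprop', base_learning_rate=lr,
    params={'decay': 0.9, 'epsilon': 0.001, 'momentum': 0.9})
```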
- -"""Tests for optimizer_factory.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl.testing import parameterized - -import tensorflow as tf -from official.vision.image_classification import optimizer_factory -from official.vision.image_classification.configs import base_configs - - -class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): - - def build_toy_model(self) -> tf.keras.Model: - """Creates a toy `tf.Keras.Model`.""" - model = tf.keras.Sequential() - model.add(tf.keras.layers.Dense(1, input_shape=(1,))) - return model - - @parameterized.named_parameters( - ('sgd', 'sgd', 0., False), ('momentum', 'momentum', 0., False), - ('rmsprop', 'rmsprop', 0., False), ('adam', 'adam', 0., False), - ('adamw', 'adamw', 0., False), - ('momentum_lookahead', 'momentum', 0., True), - ('sgd_ema', 'sgd', 0.999, False), - ('momentum_ema', 'momentum', 0.999, False), - ('rmsprop_ema', 'rmsprop', 0.999, False)) - def test_optimizer(self, optimizer_name, moving_average_decay, lookahead): - """Smoke test to be sure no syntax errors.""" - model = self.build_toy_model() - params = { - 'learning_rate': 0.001, - 'rho': 0.09, - 'momentum': 0., - 'epsilon': 1e-07, - 'moving_average_decay': moving_average_decay, - 'lookahead': lookahead, - } - optimizer = optimizer_factory.build_optimizer( - optimizer_name=optimizer_name, - base_learning_rate=params['learning_rate'], - params=params, - model=model) - self.assertTrue(issubclass(type(optimizer), tf.keras.optimizers.Optimizer)) - - def test_unknown_optimizer(self): - with self.assertRaises(ValueError): - optimizer_factory.build_optimizer( - optimizer_name='this_optimizer_does_not_exist', - base_learning_rate=None, - params=None) - - def test_learning_rate_without_decay_or_warmups(self): - params = base_configs.LearningRateConfig( - name='exponential', - initial_lr=0.01, - decay_rate=0.01, - decay_epochs=None, - warmup_epochs=None, - scale_by_batch_size=0.01, - examples_per_epoch=1, - boundaries=[0], - multipliers=[0, 1]) - batch_size = 1 - train_steps = 1 - - lr = optimizer_factory.build_learning_rate( - params=params, batch_size=batch_size, train_steps=train_steps) - self.assertTrue( - issubclass( - type(lr), tf.keras.optimizers.schedules.LearningRateSchedule)) - - @parameterized.named_parameters(('exponential', 'exponential'), - ('cosine_with_warmup', 'cosine_with_warmup')) - def test_learning_rate_with_decay_and_warmup(self, lr_decay_type): - """Basic smoke test for syntax.""" - params = base_configs.LearningRateConfig( - name=lr_decay_type, - initial_lr=0.01, - decay_rate=0.01, - decay_epochs=1, - warmup_epochs=1, - scale_by_batch_size=0.01, - examples_per_epoch=1, - boundaries=[0], - multipliers=[0, 1]) - batch_size = 1 - train_epochs = 1 - train_steps = 1 - - lr = optimizer_factory.build_learning_rate( - params=params, - batch_size=batch_size, - train_epochs=train_epochs, - train_steps=train_steps) - self.assertTrue( - issubclass( - type(lr), tf.keras.optimizers.schedules.LearningRateSchedule)) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/preprocessing.py b/official/vision/image_classification/preprocessing.py deleted file mode 100644 index bd7e2e1d19faab1a4257f81bc59a5845d75b1823..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/preprocessing.py +++ /dev/null @@ -1,390 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Preprocessing functions for images."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-from typing import List, Optional, Text, Tuple
-
-from official.vision.image_classification import augment
-
-
-# Calculated from the ImageNet training set
-MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255)
-STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255)
-
-IMAGE_SIZE = 224
-CROP_PADDING = 32
-
-
-def mean_image_subtraction(
-    image_bytes: tf.Tensor,
-    means: Tuple[float, ...],
-    num_channels: int = 3,
-    dtype: tf.dtypes.DType = tf.float32,
-) -> tf.Tensor:
-  """Subtracts the given means from each image channel.
-
-  For example:
-    means = [123.68, 116.779, 103.939]
-    image_bytes = mean_image_subtraction(image_bytes, means)
-
-  Note that the rank of `image` must be known.
-
-  Args:
-    image_bytes: a tensor of size [height, width, C].
-    means: a C-vector of values to subtract from each channel.
-    num_channels: number of color channels in the image that will be distorted.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    the centered image.
-
-  Raises:
-    ValueError: If the rank of `image` is unknown, if `image` has a rank other
-      than three or if the number of channels in `image` doesn't match the
-      number of values in `means`.
-  """
-  if image_bytes.get_shape().ndims != 3:
-    raise ValueError('Input must be of size [height, width, C>0]')
-
-  if len(means) != num_channels:
-    raise ValueError('len(means) must match the number of channels')
-
-  # We have a 1-D tensor of means; convert to 3-D.
-  # Note(b/130245863): we explicitly call `broadcast` instead of simply
-  # expanding dimensions for better performance.
-  means = tf.broadcast_to(means, tf.shape(image_bytes))
-  if dtype is not None:
-    means = tf.cast(means, dtype=dtype)
-
-  return image_bytes - means
-
-
-def standardize_image(
-    image_bytes: tf.Tensor,
-    stddev: Tuple[float, ...],
-    num_channels: int = 3,
-    dtype: tf.dtypes.DType = tf.float32,
-) -> tf.Tensor:
-  """Divides each image channel by the given stddev value.
-
-  For example:
-    stddev = [58.395, 57.12, 57.375]
-    image_bytes = standardize_image(image_bytes, stddev)
-
-  Note that the rank of `image` must be known.
-
-  Args:
-    image_bytes: a tensor of size [height, width, C].
-    stddev: a C-vector of values by which each channel is divided.
-    num_channels: number of color channels in the image that will be distorted.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    the standardized image.
-
-  Raises:
-    ValueError: If the rank of `image` is unknown, if `image` has a rank other
-      than three or if the number of channels in `image` doesn't match the
-      number of values in `stddev`.
- """ - if image_bytes.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(stddev) != num_channels: - raise ValueError('len(stddev) must match the number of channels') - - # We have a 1-D tensor of stddev; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. - stddev = tf.broadcast_to(stddev, tf.shape(image_bytes)) - if dtype is not None: - stddev = tf.cast(stddev, dtype=dtype) - - return image_bytes / stddev - - -def normalize_images(features: tf.Tensor, - mean_rgb: Tuple[float, ...] = MEAN_RGB, - stddev_rgb: Tuple[float, ...] = STDDEV_RGB, - num_channels: int = 3, - dtype: tf.dtypes.DType = tf.float32, - data_format: Text = 'channels_last') -> tf.Tensor: - """Normalizes the input image channels with the given mean and stddev. - - Args: - features: `Tensor` representing decoded images in float format. - mean_rgb: the mean of the channels to subtract. - stddev_rgb: the stddev of the channels to divide. - num_channels: the number of channels in the input image tensor. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - data_format: the format of the input image tensor - ['channels_first', 'channels_last']. - - Returns: - A normalized image `Tensor`. - """ - # TODO(allencwang) - figure out how to use mean_image_subtraction and - # standardize_image on batches of images and replace the following. - if data_format == 'channels_first': - stats_shape = [num_channels, 1, 1] - else: - stats_shape = [1, 1, num_channels] - - if dtype is not None: - features = tf.image.convert_image_dtype(features, dtype=dtype) - - if mean_rgb is not None: - mean_rgb = tf.constant(mean_rgb, - shape=stats_shape, - dtype=features.dtype) - mean_rgb = tf.broadcast_to(mean_rgb, tf.shape(features)) - features = features - mean_rgb - - if stddev_rgb is not None: - stddev_rgb = tf.constant(stddev_rgb, - shape=stats_shape, - dtype=features.dtype) - stddev_rgb = tf.broadcast_to(stddev_rgb, tf.shape(features)) - features = features / stddev_rgb - - return features - - -def decode_and_center_crop(image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - crop_padding: int = CROP_PADDING) -> tf.Tensor: - """Crops to center of image with padding then scales image_size. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - image_size: image height/width dimension. - crop_padding: the padding size to use when centering the crop. - - Returns: - A decoded and cropped image `Tensor`. 
- """ - decoded = image_bytes.dtype != tf.string - shape = (tf.shape(image_bytes) if decoded - else tf.image.extract_jpeg_shape(image_bytes)) - image_height = shape[0] - image_width = shape[1] - - padded_center_crop_size = tf.cast( - ((image_size / (image_size + crop_padding)) * - tf.cast(tf.minimum(image_height, image_width), tf.float32)), - tf.int32) - - offset_height = ((image_height - padded_center_crop_size) + 1) // 2 - offset_width = ((image_width - padded_center_crop_size) + 1) // 2 - crop_window = tf.stack([offset_height, offset_width, - padded_center_crop_size, padded_center_crop_size]) - if decoded: - image = tf.image.crop_to_bounding_box( - image_bytes, - offset_height=offset_height, - offset_width=offset_width, - target_height=padded_center_crop_size, - target_width=padded_center_crop_size) - else: - image = tf.image.decode_and_crop_jpeg(image_bytes, crop_window, channels=3) - - image = resize_image(image_bytes=image, - height=image_size, - width=image_size) - - return image - - -def decode_crop_and_flip(image_bytes: tf.Tensor) -> tf.Tensor: - """Crops an image to a random part of the image, then randomly flips. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - - Returns: - A decoded and cropped image `Tensor`. - - """ - decoded = image_bytes.dtype != tf.string - bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) - shape = (tf.shape(image_bytes) if decoded - else tf.image.extract_jpeg_shape(image_bytes)) - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - shape, - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=[0.75, 1.33], - area_range=[0.05, 1.0], - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Reassemble the bounding box in the format the crop op requires. - offset_height, offset_width, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - crop_window = tf.stack([offset_height, offset_width, - target_height, target_width]) - if decoded: - cropped = tf.image.crop_to_bounding_box( - image_bytes, - offset_height=offset_height, - offset_width=offset_width, - target_height=target_height, - target_width=target_width) - else: - cropped = tf.image.decode_and_crop_jpeg(image_bytes, - crop_window, - channels=3) - - # Flip to add a little more random distortion in. - cropped = tf.image.random_flip_left_right(cropped) - return cropped - - -def resize_image(image_bytes: tf.Tensor, - height: int = IMAGE_SIZE, - width: int = IMAGE_SIZE) -> tf.Tensor: - """Resizes an image to a given height and width. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - height: image height dimension. - width: image width dimension. - - Returns: - A tensor containing the resized image. - - """ - return tf.compat.v1.image.resize( - image_bytes, [height, width], method=tf.image.ResizeMethod.BILINEAR, - align_corners=False) - - -def preprocess_for_eval( - image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - num_channels: int = 3, - mean_subtract: bool = False, - standardize: bool = False, - dtype: tf.dtypes.DType = tf.float32 -) -> tf.Tensor: - """Preprocesses the given image for evaluation. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - image_size: image height/width dimension. - num_channels: number of image input channels. - mean_subtract: whether or not to apply mean subtraction. 
standardize: whether or not to apply standardization.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    A preprocessed and normalized image `Tensor`.
-  """
-  images = decode_and_center_crop(image_bytes, image_size)
-  images = tf.reshape(images, [image_size, image_size, num_channels])
-
-  if mean_subtract:
-    images = mean_image_subtraction(image_bytes=images, means=MEAN_RGB)
-  if standardize:
-    images = standardize_image(image_bytes=images, stddev=STDDEV_RGB)
-  if dtype is not None:
-    images = tf.image.convert_image_dtype(images, dtype=dtype)
-
-  return images
-
-
-def load_eval_image(filename: Text, image_size: int = IMAGE_SIZE) -> tf.Tensor:
-  """Reads an image from the filesystem and applies image preprocessing.
-
-  Args:
-    filename: a filename path of an image.
-    image_size: image height/width dimension.
-
-  Returns:
-    A preprocessed and normalized image `Tensor`.
-  """
-  image_bytes = tf.io.read_file(filename)
-  image = preprocess_for_eval(image_bytes, image_size)
-
-  return image
-
-
-def build_eval_dataset(filenames: List[Text],
-                       labels: Optional[List[int]] = None,
-                       image_size: int = IMAGE_SIZE,
-                       batch_size: int = 1) -> tf.data.Dataset:
-  """Builds a tf.data.Dataset from a list of filenames and labels.
-
-  Args:
-    filenames: a list of filename paths of images.
-    labels: a list of labels corresponding to each image.
-    image_size: image height/width dimension.
-    batch_size: the batch size used by the dataset.
-
-  Returns:
-    A `tf.data.Dataset` of batched (image, label) pairs, where each image has
-    been preprocessed and normalized.
-  """
-  if labels is None:
-    labels = [0] * len(filenames)
-
-  filenames = tf.constant(filenames)
-  labels = tf.constant(labels)
-  dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
-
-  dataset = dataset.map(
-      lambda filename, label: (load_eval_image(filename, image_size), label))
-  dataset = dataset.batch(batch_size)
-
-  return dataset
-
-
-def preprocess_for_train(image_bytes: tf.Tensor,
-                         image_size: int = IMAGE_SIZE,
-                         augmenter: Optional[augment.ImageAugment] = None,
-                         mean_subtract: bool = False,
-                         standardize: bool = False,
-                         dtype: tf.dtypes.DType = tf.float32) -> tf.Tensor:
-  """Preprocesses the given image for training.
-
-  Args:
-    image_bytes: `Tensor` representing an image binary of
-      arbitrary size of dtype tf.uint8.
-    image_size: image height/width dimension.
-    augmenter: the image augmenter to apply.
-    mean_subtract: whether or not to apply mean subtraction.
-    standardize: whether or not to apply standardization.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    A preprocessed and normalized image `Tensor`.
-  """
-  images = decode_crop_and_flip(image_bytes=image_bytes)
-  images = resize_image(images, height=image_size, width=image_size)
-  if augmenter is not None:
-    images = augmenter.distort(images)
-  if mean_subtract:
-    images = mean_image_subtraction(image_bytes=images, means=MEAN_RGB)
-  if standardize:
-    images = standardize_image(image_bytes=images, stddev=STDDEV_RGB)
-  if dtype is not None:
-    images = tf.image.convert_image_dtype(images, dtype)
-
-  return images
diff --git a/official/vision/image_classification/resnet/README.md b/official/vision/image_classification/resnet/README.md
deleted file mode 100644
index 5064523fbdcd4222c2159bdc1c09b7156800bf54..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/README.md
+++ /dev/null
@@ -1,125 +0,0 @@
-This folder contains a
-[custom training loop (CTL)](#resnet-custom-training-loop) implementation for
-ResNet50.
-
-## Before you begin
-Please refer to the [README](../README.md) in the parent directory for
-information on setup and preparing the data.
-
-## ResNet (custom training loop)
-
-Similar to the [estimator implementation](../../../r1/resnet), the Keras
-implementation has code for the ImageNet dataset. The ImageNet
-version uses a ResNet50 model implemented in
-[`resnet_model.py`](./resnet_model.py).
-
-
-### Pretrained Models
-
-* [ResNet50 Checkpoints](https://storage.googleapis.com/cloud-tpu-checkpoints/resnet/resnet50.tar.gz)
-
-* ResNet50 TFHub: [feature vector](https://tfhub.dev/tensorflow/resnet_50/feature_vector/1)
-and [classification](https://tfhub.dev/tensorflow/resnet_50/classification/1)
-
-Again, if you did not download the data to the default directory, specify the
-location with the `--data_dir` flag:
-
-```bash
-python3 resnet_ctl_imagenet_main.py --data_dir=/path/to/imagenet
-```
-
-There are more flag options you can specify. Here are some examples:
-
-- `--use_synthetic_data`: when set to true, synthetic data, rather than real
-data, is used;
-- `--batch_size`: the batch size used for the model;
-- `--model_dir`: the directory to save the model checkpoint;
-- `--train_epochs`: the number of epochs to run for training the model;
-- `--train_steps`: the number of steps to run for training the model. Only
-values smaller than the number of batches in an epoch are supported;
-- `--skip_eval`: when set to true, both evaluation and validation during
-training are skipped.
-
-For example, this is a typical command line to run with ImageNet data with
-batch size 128 per GPU:
-
-```bash
-python3 resnet_ctl_imagenet_main.py \
-    --model_dir=/tmp/model_dir/something \
-    --num_gpus=2 \
-    --batch_size=128 \
-    --train_epochs=90 \
-    --train_steps=10 \
-    --use_synthetic_data=false
-```
-
-See [`common.py`](common.py) for the full list of options.
-
-### Using multiple GPUs
-
-You can train these models on multiple GPUs using the `tf.distribute.Strategy`
-API. You can read more about it in this
-[guide](https://www.tensorflow.org/guide/distribute_strategy).
-
-In this example, we have made it easy to use with just a command line flag:
-`--num_gpus`. By default this flag is 1 if TensorFlow is compiled with CUDA,
-and 0 otherwise.
-
-- `--num_gpus=0`: Uses `tf.distribute.OneDeviceStrategy` with CPU as the device.
-- `--num_gpus=1`: Uses `tf.distribute.OneDeviceStrategy` with GPU as the device.
-- `--num_gpus=2+`: Uses `tf.distribute.MirroredStrategy` to run synchronous
-distributed training across the GPUs.
-
-If you wish to run without `tf.distribute.Strategy`, you can do so by setting
-`--distribution_strategy=off`.
-
-### Running on multiple GPU hosts
-
-You can also train these models on multiple hosts, each with GPUs, using
-`tf.distribute.Strategy`.
-
-The easiest way to run multi-host benchmarks is to set the
-[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
-appropriately at each host. e.g., to run using `MultiWorkerMirroredStrategy` on
-2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and
-host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker",
-"index": i}`. `MultiWorkerMirroredStrategy` will automatically use all the
-available GPUs at each host.
-
-### Running on Cloud TPUs
-
-Note: This model will **not** work with TPUs on Colab.
-
-You can train the ResNet CTL model on Cloud TPUs using
-`tf.distribute.TPUStrategy`.
If you are not familiar with Cloud TPUs, it is
-strongly recommended that you go through the
-[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
-create a TPU and GCE VM.
-
-To run the ResNet model on a TPU, you must set `--distribution_strategy=tpu`
-and `--tpu=$TPU_NAME`, where `$TPU_NAME` is the name of your TPU in the Cloud
-Console. From a GCE VM, you can run the following command to train ResNet for
-one epoch on a v2-8 or v3-8 TPU by setting `TRAIN_EPOCHS` to 1:
-
-```bash
-python3 resnet_ctl_imagenet_main.py \
-    --tpu=$TPU_NAME \
-    --model_dir=$MODEL_DIR \
-    --data_dir=$DATA_DIR \
-    --batch_size=1024 \
-    --steps_per_loop=500 \
-    --train_epochs=$TRAIN_EPOCHS \
-    --use_synthetic_data=false \
-    --dtype=fp32 \
-    --enable_eager=true \
-    --enable_tensorboard=true \
-    --distribution_strategy=tpu \
-    --log_steps=50 \
-    --single_l2_loss_op=true \
-    --use_tf_function=true
-```
-
-To train ResNet to convergence, run it for 90 epochs by setting
-`TRAIN_EPOCHS` to 90.
-
-Note: `$MODEL_DIR` and `$DATA_DIR` must be GCS paths.
diff --git a/official/vision/image_classification/resnet/__init__.py b/official/vision/image_classification/resnet/__init__.py
deleted file mode 100644
index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
diff --git a/official/vision/image_classification/resnet/common.py b/official/vision/image_classification/resnet/common.py
deleted file mode 100644
index a034ba7dd0be5b2b2536727137497c84519001a5..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/common.py
+++ /dev/null
@@ -1,418 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Common util functions and classes used by both Keras CIFAR and ImageNet."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import tensorflow as tf
-
-import tensorflow_model_optimization as tfmot
-from official.utils.flags import core as flags_core
-from official.utils.misc import keras_utils
-
-FLAGS = flags.FLAGS
-BASE_LEARNING_RATE = 0.1  # This matches Jing's version.
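-# BASE_LEARNING_RATE assumes a global batch size of 256; the
-# PiecewiseConstantDecayWithWarmup schedule below rescales it linearly, e.g. a
-# global batch size of 1024 yields 0.1 * 1024 / 256 = 0.4 after warmup.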
-TRAIN_TOP_1 = 'training_accuracy_top_1' -LR_SCHEDULE = [ # (multiplier, epoch to start) tuples - (1.0, 5), (0.1, 30), (0.01, 60), (0.001, 80) -] - - -class PiecewiseConstantDecayWithWarmup( - tf.keras.optimizers.schedules.LearningRateSchedule): - """Piecewise constant decay with warmup schedule.""" - - def __init__(self, - batch_size, - epoch_size, - warmup_epochs, - boundaries, - multipliers, - compute_lr_on_cpu=True, - name=None): - super(PiecewiseConstantDecayWithWarmup, self).__init__() - if len(boundaries) != len(multipliers) - 1: - raise ValueError('The length of boundaries must be 1 less than the ' - 'length of multipliers') - - base_lr_batch_size = 256 - steps_per_epoch = epoch_size // batch_size - - self.rescaled_lr = BASE_LEARNING_RATE * batch_size / base_lr_batch_size - self.step_boundaries = [float(steps_per_epoch) * x for x in boundaries] - self.lr_values = [self.rescaled_lr * m for m in multipliers] - self.warmup_steps = warmup_epochs * steps_per_epoch - self.compute_lr_on_cpu = compute_lr_on_cpu - self.name = name - - self.learning_rate_ops_cache = {} - - def __call__(self, step): - if tf.executing_eagerly(): - return self._get_learning_rate(step) - - # In an eager function or graph, the current implementation of optimizer - # repeatedly call and thus create ops for the learning rate schedule. To - # avoid this, we cache the ops if not executing eagerly. - graph = tf.compat.v1.get_default_graph() - if graph not in self.learning_rate_ops_cache: - if self.compute_lr_on_cpu: - with tf.device('/device:CPU:0'): - self.learning_rate_ops_cache[graph] = self._get_learning_rate(step) - else: - self.learning_rate_ops_cache[graph] = self._get_learning_rate(step) - return self.learning_rate_ops_cache[graph] - - def _get_learning_rate(self, step): - """Compute learning rate at given step.""" - with tf.name_scope('PiecewiseConstantDecayWithWarmup'): - - def warmup_lr(step): - return self.rescaled_lr * ( - tf.cast(step, tf.float32) / tf.cast(self.warmup_steps, tf.float32)) - - def piecewise_lr(step): - return tf.compat.v1.train.piecewise_constant(step, self.step_boundaries, - self.lr_values) - - return tf.cond(step < self.warmup_steps, lambda: warmup_lr(step), - lambda: piecewise_lr(step)) - - def get_config(self): - return { - 'rescaled_lr': self.rescaled_lr, - 'step_boundaries': self.step_boundaries, - 'lr_values': self.lr_values, - 'warmup_steps': self.warmup_steps, - 'compute_lr_on_cpu': self.compute_lr_on_cpu, - 'name': self.name - } - - -def get_optimizer(learning_rate=0.1): - """Returns optimizer to use.""" - # The learning_rate is overwritten at the beginning of each step by callback. 
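-  # SGD with momentum 0.9 is the standard recipe for ResNet-style training;
-  # the `learning_rate` argument only seeds the initial value.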
- return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9) - - -def get_callbacks(pruning_method=None, - enable_checkpoint_and_export=False, - model_dir=None): - """Returns common callbacks.""" - time_callback = keras_utils.TimeHistory( - FLAGS.batch_size, - FLAGS.log_steps, - logdir=FLAGS.model_dir if FLAGS.enable_tensorboard else None) - callbacks = [time_callback] - - if FLAGS.enable_tensorboard: - tensorboard_callback = tf.keras.callbacks.TensorBoard( - log_dir=FLAGS.model_dir, profile_batch=FLAGS.profile_steps) - callbacks.append(tensorboard_callback) - - is_pruning_enabled = pruning_method is not None - if is_pruning_enabled: - callbacks.append(tfmot.sparsity.keras.UpdatePruningStep()) - if model_dir is not None: - callbacks.append( - tfmot.sparsity.keras.PruningSummaries( - log_dir=model_dir, profile_batch=0)) - - if enable_checkpoint_and_export: - if model_dir is not None: - ckpt_full_path = os.path.join(model_dir, 'model.ckpt-{epoch:04d}') - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True)) - return callbacks - - -def build_stats(history, eval_output, callbacks): - """Normalizes and returns dictionary of stats. - - Args: - history: Results of the training step. Supports both categorical_accuracy - and sparse_categorical_accuracy. - eval_output: Output of the eval step. Assumes first value is eval_loss and - second value is accuracy_top_1. - callbacks: a list of callbacks which might include a time history callback - used during keras.fit. - - Returns: - Dictionary of normalized results. - """ - stats = {} - if eval_output: - stats['accuracy_top_1'] = float(eval_output[1]) - stats['eval_loss'] = float(eval_output[0]) - if history and history.history: - train_hist = history.history - # Gets final loss from training. - stats['loss'] = float(train_hist['loss'][-1]) - # Gets top_1 training accuracy. 
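-    # The history key depends on how the model was compiled: Keras records
-    # 'categorical_accuracy' for one-hot labels, 'sparse_categorical_accuracy'
-    # for integer labels, and 'accuracy' when the metric was requested under
-    # that alias.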
- if 'categorical_accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['categorical_accuracy'][-1]) - elif 'sparse_categorical_accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['sparse_categorical_accuracy'][-1]) - elif 'accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['accuracy'][-1]) - - if not callbacks: - return stats - - # Look for the time history callback which was used during keras.fit - for callback in callbacks: - if isinstance(callback, keras_utils.TimeHistory): - timestamp_log = callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = callback.train_finish_time - if callback.epoch_runtime_log: - stats['avg_exp_per_second'] = callback.average_examples_per_second - - return stats - - -def define_keras_flags(model=False, - optimizer=False, - pretrained_filepath=False): - """Define flags for Keras models.""" - flags_core.define_base( - clean=True, - num_gpu=True, - run_eagerly=True, - train_epochs=True, - epochs_between_evals=True, - distribution_strategy=True) - flags_core.define_performance( - num_parallel_calls=False, - synthetic_data=True, - dtype=True, - all_reduce_alg=True, - num_packs=True, - tf_gpu_thread_mode=True, - datasets_num_private_threads=True, - loss_scale=True, - fp16_implementation=True, - tf_data_experimental_slack=True, - enable_xla=True, - training_dataset_cache=True) - flags_core.define_image() - flags_core.define_benchmark() - flags_core.define_distribution() - flags.adopt_module_key_flags(flags_core) - - flags.DEFINE_boolean(name='enable_eager', default=False, help='Enable eager?') - flags.DEFINE_boolean(name='skip_eval', default=False, help='Skip evaluation?') - # TODO(b/135607288): Remove this flag once we understand the root cause of - # slowdown when setting the learning phase in Keras backend. - flags.DEFINE_boolean( - name='set_learning_phase_to_train', - default=True, - help='If skip eval, also set Keras learning phase to 1 (training).') - flags.DEFINE_boolean( - name='explicit_gpu_placement', - default=False, - help='If not using distribution strategy, explicitly set device scope ' - 'for the Keras training loop.') - flags.DEFINE_boolean( - name='use_trivial_model', - default=False, - help='Whether to use a trivial Keras model.') - flags.DEFINE_boolean( - name='report_accuracy_metrics', - default=True, - help='Report metrics during training and evaluation.') - flags.DEFINE_boolean( - name='use_tensor_lr', - default=True, - help='Use learning rate tensor instead of a callback.') - flags.DEFINE_boolean( - name='enable_tensorboard', - default=False, - help='Whether to enable Tensorboard callback.') - flags.DEFINE_string( - name='profile_steps', - default=None, - help='Save profiling data to model dir at given range of global steps. The ' - 'value must be a comma separated pair of positive integers, specifying ' - 'the first and last step to profile. For example, "--profile_steps=2,4" ' - 'triggers the profiler to process 3 steps, starting from the 2nd step. ' - 'Note that profiler has a non-trivial performance overhead, and the ' - 'output file can be gigantic if profiling many steps.') - flags.DEFINE_integer( - name='train_steps', - default=None, - help='The number of steps to run for training. If it is larger than ' - '# batches per epoch, then use # batches per epoch. This flag will be ' - 'ignored if train_epochs is set to be larger than 1. 
')
-  flags.DEFINE_boolean(
-      name='batchnorm_spatial_persistent',
-      default=True,
-      help='Enable the spatial persistent mode for CuDNN batch norm kernel.')
-  flags.DEFINE_boolean(
-      name='enable_get_next_as_optional',
-      default=False,
-      help='Enable get_next_as_optional behavior in DistributedIterator.')
-  flags.DEFINE_boolean(
-      name='enable_checkpoint_and_export',
-      default=False,
-      help='Whether to enable a checkpoint callback and export the SavedModel.')
-  flags.DEFINE_string(name='tpu', default='', help='TPU address to connect to.')
-  flags.DEFINE_integer(
-      name='steps_per_loop',
-      default=None,
-      help='Number of steps per training loop. Only the training step happens '
-      'inside the loop. Callbacks will not be called inside. Will be capped at '
-      'steps per epoch.')
-  flags.DEFINE_boolean(
-      name='use_tf_while_loop',
-      default=True,
-      help='Whether to build a tf.while_loop inside the training loop on the '
-      'host. Setting it to True is critical to have peak performance on '
-      'TPU.')
-
-  if model:
-    flags.DEFINE_string('model', 'resnet50_v1.5',
-                        'Name of model preset. (mobilenet, resnet50_v1.5)')
-  if optimizer:
-    flags.DEFINE_string(
-        'optimizer', 'resnet50_default', 'Name of optimizer preset. '
-        '(mobilenet_default, resnet50_default)')
-    # TODO(kimjaehong): Replace with general hyper-params, not only for
-    # mobilenet.
-    flags.DEFINE_float(
-        'initial_learning_rate_per_sample', 0.00007,
-        'Initial value of learning rate per sample for '
-        'mobilenet_default.')
-    flags.DEFINE_float('lr_decay_factor', 0.94,
-                       'Learning rate decay factor for mobilenet_default.')
-    flags.DEFINE_float('num_epochs_per_decay', 2.5,
-                       'Number of epochs per decay for mobilenet_default.')
-  if pretrained_filepath:
-    flags.DEFINE_string('pretrained_filepath', '', 'Pretrained file path.')
-
-
-def get_synth_data(height, width, num_channels, num_classes, dtype):
-  """Creates a set of synthetic random data.
-
-  Args:
-    height: Integer height that will be used to create a fake image tensor.
-    width: Integer width that will be used to create a fake image tensor.
-    num_channels: Integer depth that will be used to create a fake image tensor.
-    num_classes: Number of classes that should be represented in the fake labels
-      tensor.
-    dtype: Data type for features/images.
-
-  Returns:
-    A tuple of tensors representing the inputs and labels.
-
-  """
-  # Synthetic input should be within [0, 255].
-  inputs = tf.random.truncated_normal([height, width, num_channels],
-                                      dtype=dtype,
-                                      mean=127,
-                                      stddev=60,
-                                      name='synthetic_inputs')
-  labels = tf.random.uniform([1],
-                             minval=0,
-                             maxval=num_classes - 1,
-                             dtype=tf.int32,
-                             name='synthetic_labels')
-  return inputs, labels
-
-
-def define_pruning_flags():
-  """Define flags for pruning methods."""
-  flags.DEFINE_string(
-      'pruning_method', None, 'Pruning method.'
- 'None (no pruning) or polynomial_decay.') - flags.DEFINE_float('pruning_initial_sparsity', 0.0, - 'Initial sparsity for pruning.') - flags.DEFINE_float('pruning_final_sparsity', 0.5, - 'Final sparsity for pruning.') - flags.DEFINE_integer('pruning_begin_step', 0, 'Begin step for pruning.') - flags.DEFINE_integer('pruning_end_step', 100000, 'End step for pruning.') - flags.DEFINE_integer('pruning_frequency', 100, 'Frequency for pruning.') - - -def define_clustering_flags(): - """Define flags for clustering methods.""" - flags.DEFINE_string('clustering_method', None, - 'None (no clustering) or selective_clustering ' - '(cluster last three Conv2D layers of the model).') - - -def get_synth_input_fn(height, - width, - num_channels, - num_classes, - dtype=tf.float32, - drop_remainder=True): - """Returns an input function that returns a dataset with random data. - - This input_fn returns a data set that iterates over a set of random data and - bypasses all preprocessing, e.g. jpeg decode and copy. The host to device - copy is still included. This used to find the upper throughput bound when - tuning the full input pipeline. - - Args: - height: Integer height that will be used to create a fake image tensor. - width: Integer width that will be used to create a fake image tensor. - num_channels: Integer depth that will be used to create a fake image tensor. - num_classes: Number of classes that should be represented in the fake labels - tensor - dtype: Data type for features/images. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - - Returns: - An input_fn that can be used in place of a real one to return a dataset - that can be used for iteration. - """ - - # pylint: disable=unused-argument - def input_fn(is_training, data_dir, batch_size, *args, **kwargs): - """Returns dataset filled with random data.""" - inputs, labels = get_synth_data( - height=height, - width=width, - num_channels=num_channels, - num_classes=num_classes, - dtype=dtype) - # Cast to float32 for Keras model. - labels = tf.cast(labels, dtype=tf.float32) - data = tf.data.Dataset.from_tensors((inputs, labels)).repeat() - - # `drop_remainder` will make dataset produce outputs with known shapes. - data = data.batch(batch_size, drop_remainder=drop_remainder) - data = data.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - return data - - return input_fn - - -def set_cudnn_batchnorm_mode(): - """Set CuDNN batchnorm mode for better performance. - - Note: Spatial Persistent mode may lead to accuracy losses for certain - models. - """ - if FLAGS.batchnorm_spatial_persistent: - os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1' - else: - os.environ.pop('TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT', None) diff --git a/official/vision/image_classification/resnet/imagenet_preprocessing.py b/official/vision/image_classification/resnet/imagenet_preprocessing.py deleted file mode 100644 index 86ba3ed98084987ea5d63edf8fd5f515d58fba93..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/imagenet_preprocessing.py +++ /dev/null @@ -1,574 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Provides utilities to preprocess images. - -Training images are sampled using the provided bounding boxes, and subsequently -cropped to the sampled bounding box. Images are additionally flipped randomly, -then resized to the target output size (without aspect-ratio preservation). - -Images used during evaluation are resized (with aspect-ratio preservation) and -centrally cropped. - -All images undergo mean color subtraction. - -Note that these steps are colloquially referred to as "ResNet preprocessing," -and they differ from "VGG preprocessing," which does not use bounding boxes -and instead does an aspect-preserving resize followed by random crop during -training. (These both differ from "Inception preprocessing," which introduces -color distortion steps.) - -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -from absl import logging -import tensorflow as tf - -DEFAULT_IMAGE_SIZE = 224 -NUM_CHANNELS = 3 -NUM_CLASSES = 1001 - -NUM_IMAGES = { - 'train': 1281167, - 'validation': 50000, -} - -_NUM_TRAIN_FILES = 1024 -_SHUFFLE_BUFFER = 10000 - -_R_MEAN = 123.68 -_G_MEAN = 116.78 -_B_MEAN = 103.94 -CHANNEL_MEANS = [_R_MEAN, _G_MEAN, _B_MEAN] - -# The lower bound for the smallest side of the image for aspect-preserving -# resizing. For example, if an image is 500 x 1000, it will be resized to -# _RESIZE_MIN x (_RESIZE_MIN * 2). -_RESIZE_MIN = 256 - - -def process_record_dataset(dataset, - is_training, - batch_size, - shuffle_buffer, - parse_record_fn, - dtype=tf.float32, - datasets_num_private_threads=None, - drop_remainder=False, - tf_data_experimental_slack=False): - """Given a Dataset with raw records, return an iterator over the records. - - Args: - dataset: A Dataset representing raw records - is_training: A boolean denoting whether the input is for training. - batch_size: The number of samples per batch. - shuffle_buffer: The buffer size to use when shuffling records. A larger - value results in better randomness, but smaller values reduce startup time - and use less memory. - parse_record_fn: A function that takes a raw record and returns the - corresponding (image, label) pair. - dtype: Data type to use for images/features. - datasets_num_private_threads: Number of threads for a private threadpool - created for all datasets computation. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - tf_data_experimental_slack: Whether to enable tf.data's `experimental_slack` - option. - - Returns: - Dataset of (image, label) pairs ready for iteration. - """ - # Defines a specific size thread pool for tf.data operations. - if datasets_num_private_threads: - options = tf.data.Options() - options.experimental_threading.private_threadpool_size = ( - datasets_num_private_threads) - dataset = dataset.with_options(options) - logging.info('datasets_num_private_threads: %s', - datasets_num_private_threads) - - if is_training: - # Shuffles records before repeating to respect epoch boundaries. 
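-    # Shuffling before `repeat` guarantees that every record is seen exactly
-    # once per epoch; shuffling after `repeat` would blend records across
-    # epoch boundaries.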
- dataset = dataset.shuffle(buffer_size=shuffle_buffer) - # Repeats the dataset for the number of epochs to train. - dataset = dataset.repeat() - - # Parses the raw records into images and labels. - dataset = dataset.map( - lambda value: parse_record_fn(value, is_training, dtype), - num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.batch(batch_size, drop_remainder=drop_remainder) - - # Operations between the final prefetch and the get_next call to the iterator - # will happen synchronously during run time. We prefetch here again to - # background all of the above processing work and keep it out of the - # critical training path. Setting buffer_size to tf.data.experimental.AUTOTUNE - # allows DistributionStrategies to adjust how many batches to fetch based - # on how many devices are present. - dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - - options = tf.data.Options() - options.experimental_slack = tf_data_experimental_slack - dataset = dataset.with_options(options) - - return dataset - - -def get_filenames(is_training, data_dir): - """Return filenames for dataset.""" - if is_training: - return [ - os.path.join(data_dir, 'train-%05d-of-01024' % i) - for i in range(_NUM_TRAIN_FILES) - ] - else: - return [ - os.path.join(data_dir, 'validation-%05d-of-00128' % i) - for i in range(128) - ] - - -def parse_example_proto(example_serialized): - """Parses an Example proto containing a training example of an image. - - The output of the build_image_data.py image preprocessing script is a dataset - containing serialized Example protocol buffers. Each Example proto contains - the following fields (values are included as examples): - - image/height: 462 - image/width: 581 - image/colorspace: 'RGB' - image/channels: 3 - image/class/label: 615 - image/class/synset: 'n03623198' - image/class/text: 'knee pad' - image/object/bbox/xmin: 0.1 - image/object/bbox/xmax: 0.9 - image/object/bbox/ymin: 0.2 - image/object/bbox/ymax: 0.6 - image/object/bbox/label: 615 - image/format: 'JPEG' - image/filename: 'ILSVRC2012_val_00041207.JPEG' - image/encoded: - - Args: - example_serialized: scalar Tensor tf.string containing a serialized Example - protocol buffer. - - Returns: - image_buffer: Tensor tf.string containing the contents of a JPEG file. - label: Tensor tf.int32 containing the label. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as - [ymin, xmin, ymax, xmax]. - """ - # Dense features in Example proto. - feature_map = { - 'image/encoded': - tf.io.FixedLenFeature([], dtype=tf.string, default_value=''), - 'image/class/label': - tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1), - 'image/class/text': - tf.io.FixedLenFeature([], dtype=tf.string, default_value=''), - } - sparse_float32 = tf.io.VarLenFeature(dtype=tf.float32) - # Sparse features in Example proto. 
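-  # Each image may contain any number of bounding boxes, so the four
-  # coordinate lists are parsed as variable-length features rather than
-  # fixed-length ones.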
- feature_map.update({ - k: sparse_float32 for k in [ - 'image/object/bbox/xmin', 'image/object/bbox/ymin', - 'image/object/bbox/xmax', 'image/object/bbox/ymax' - ] - }) - - features = tf.io.parse_single_example( - serialized=example_serialized, features=feature_map) - label = tf.cast(features['image/class/label'], dtype=tf.int32) - - xmin = tf.expand_dims(features['image/object/bbox/xmin'].values, 0) - ymin = tf.expand_dims(features['image/object/bbox/ymin'].values, 0) - xmax = tf.expand_dims(features['image/object/bbox/xmax'].values, 0) - ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0) - - # Note that we impose an ordering of (y, x) just to make life difficult. - bbox = tf.concat([ymin, xmin, ymax, xmax], 0) - - # Force the variable number of bounding boxes into the shape - # [1, num_boxes, coords]. - bbox = tf.expand_dims(bbox, 0) - bbox = tf.transpose(a=bbox, perm=[0, 2, 1]) - - return features['image/encoded'], label, bbox - - -def parse_record(raw_record, is_training, dtype): - """Parses a record containing a training example of an image. - - The input record is parsed into a label and image, and the image is passed - through preprocessing steps (cropping, flipping, and so on). - - Args: - raw_record: scalar Tensor tf.string containing a serialized Example protocol - buffer. - is_training: A boolean denoting whether the input is for training. - dtype: data type to use for images/features. - - Returns: - Tuple with processed image tensor in a channel-last format and - one-hot-encoded label tensor. - """ - image_buffer, label, bbox = parse_example_proto(raw_record) - - image = preprocess_image( - image_buffer=image_buffer, - bbox=bbox, - output_height=DEFAULT_IMAGE_SIZE, - output_width=DEFAULT_IMAGE_SIZE, - num_channels=NUM_CHANNELS, - is_training=is_training) - image = tf.cast(image, dtype) - - # Subtract one so that labels are in [0, 1000), and cast to float32 for - # Keras model. - label = tf.cast( - tf.cast(tf.reshape(label, shape=[1]), dtype=tf.int32) - 1, - dtype=tf.float32) - return image, label - - -def get_parse_record_fn(use_keras_image_data_format=False): - """Get a function for parsing the records, accounting for image format. - - This is useful by handling different types of Keras models. For instance, - the current resnet_model.resnet50 input format is always channel-last, - whereas the keras_applications mobilenet input format depends on - tf.keras.backend.image_data_format(). We should set - use_keras_image_data_format=False for the former and True for the latter. - - Args: - use_keras_image_data_format: A boolean denoting whether data format is keras - backend image data format. If False, the image format is channel-last. If - True, the image format matches tf.keras.backend.image_data_format(). - - Returns: - Function to use for parsing the records. - """ - - def parse_record_fn(raw_record, is_training, dtype): - image, label = parse_record(raw_record, is_training, dtype) - if use_keras_image_data_format: - if tf.keras.backend.image_data_format() == 'channels_first': - image = tf.transpose(image, perm=[2, 0, 1]) - return image, label - - return parse_record_fn - - -def input_fn(is_training, - data_dir, - batch_size, - dtype=tf.float32, - datasets_num_private_threads=None, - parse_record_fn=parse_record, - input_context=None, - drop_remainder=False, - tf_data_experimental_slack=False, - training_dataset_cache=False, - filenames=None): - """Input function which provides batches for train or eval. 
- - Args: - is_training: A boolean denoting whether the input is for training. - data_dir: The directory containing the input data. - batch_size: The number of samples per batch. - dtype: Data type to use for images/features - datasets_num_private_threads: Number of private threads for tf.data. - parse_record_fn: Function to use for parsing the records. - input_context: A `tf.distribute.InputContext` object passed in by - `tf.distribute.Strategy`. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - tf_data_experimental_slack: Whether to enable tf.data's `experimental_slack` - option. - training_dataset_cache: Whether to cache the training dataset on workers. - Typically used to improve training performance when training data is in - remote storage and can fit into worker memory. - filenames: Optional field for providing the file names of the TFRecords. - - Returns: - A dataset that can be used for iteration. - """ - if filenames is None: - filenames = get_filenames(is_training, data_dir) - dataset = tf.data.Dataset.from_tensor_slices(filenames) - - if input_context: - logging.info( - 'Sharding the dataset: input_pipeline_id=%d num_input_pipelines=%d', - input_context.input_pipeline_id, input_context.num_input_pipelines) - dataset = dataset.shard(input_context.num_input_pipelines, - input_context.input_pipeline_id) - - if is_training: - # Shuffle the input files - dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES) - - # Convert to individual records. - # cycle_length = 10 means that up to 10 files will be read and deserialized in - # parallel. You may want to increase this number if you have a large number of - # CPU cores. - dataset = dataset.interleave( - tf.data.TFRecordDataset, - cycle_length=10, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - - if is_training and training_dataset_cache: - # Improve training performance when training data is in remote storage and - # can fit into worker memory. - dataset = dataset.cache() - - return process_record_dataset( - dataset=dataset, - is_training=is_training, - batch_size=batch_size, - shuffle_buffer=_SHUFFLE_BUFFER, - parse_record_fn=parse_record_fn, - dtype=dtype, - datasets_num_private_threads=datasets_num_private_threads, - drop_remainder=drop_remainder, - tf_data_experimental_slack=tf_data_experimental_slack, - ) - - -def _decode_crop_and_flip(image_buffer, bbox, num_channels): - """Crops the given image to a random part of the image, and randomly flips. - - We use the fused decode_and_crop op, which performs better than the two ops - used separately in series, but note that this requires that the image be - passed in as an un-decoded string Tensor. - - Args: - image_buffer: scalar string Tensor representing the raw JPEG image buffer. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as [ymin, - xmin, ymax, xmax]. - num_channels: Integer depth of the image buffer for decoding. - - Returns: - 3-D tensor with cropped image. - - """ - # A large fraction of image datasets contain a human-annotated bounding box - # delineating the region of the image containing the object of interest. We - # choose to create a new bounding box for the object which is a randomly - # distorted version of the human-annotated bounding box that obeys an - # allowed range of aspect ratios, sizes and overlap with the human-annotated - # bounding box. 
If no box is supplied, then we assume the bounding box is - # the entire image. - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - tf.image.extract_jpeg_shape(image_buffer), - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=[0.75, 1.33], - area_range=[0.05, 1.0], - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Reassemble the bounding box in the format the crop op requires. - offset_y, offset_x, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - crop_window = tf.stack([offset_y, offset_x, target_height, target_width]) - - # Use the fused decode and crop op here, which is faster than each in series. - cropped = tf.image.decode_and_crop_jpeg( - image_buffer, crop_window, channels=num_channels) - - # Flip to add a little more random distortion in. - cropped = tf.image.random_flip_left_right(cropped) - return cropped - - -def _central_crop(image, crop_height, crop_width): - """Performs central crops of the given image list. - - Args: - image: a 3-D image tensor - crop_height: the height of the image following the crop. - crop_width: the width of the image following the crop. - - Returns: - 3-D tensor with cropped image. - """ - shape = tf.shape(input=image) - height, width = shape[0], shape[1] - - amount_to_be_cropped_h = (height - crop_height) - crop_top = amount_to_be_cropped_h // 2 - amount_to_be_cropped_w = (width - crop_width) - crop_left = amount_to_be_cropped_w // 2 - return tf.slice(image, [crop_top, crop_left, 0], - [crop_height, crop_width, -1]) - - -def _mean_image_subtraction(image, means, num_channels): - """Subtracts the given means from each image channel. - - For example: - means = [123.68, 116.779, 103.939] - image = _mean_image_subtraction(image, means) - - Note that the rank of `image` must be known. - - Args: - image: a tensor of size [height, width, C]. - means: a C-vector of values to subtract from each channel. - num_channels: number of color channels in the image that will be distorted. - - Returns: - the centered image. - - Raises: - ValueError: If the rank of `image` is unknown, if `image` has a rank other - than three or if the number of channels in `image` doesn't match the - number of values in `means`. - """ - if image.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(means) != num_channels: - raise ValueError('len(means) must match the number of channels') - - # We have a 1-D tensor of means; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. - means = tf.broadcast_to(means, tf.shape(image)) - - return image - means - - -def _smallest_size_at_least(height, width, resize_min): - """Computes new shape with the smallest side equal to `smallest_side`. - - Computes new shape with the smallest side equal to `smallest_side` while - preserving the original aspect ratio. - - Args: - height: an int32 scalar tensor indicating the current height. - width: an int32 scalar tensor indicating the current width. - resize_min: A python integer or scalar `Tensor` indicating the size of the - smallest side after resize. - - Returns: - new_height: an int32 scalar tensor indicating the new height. - new_width: an int32 scalar tensor indicating the new width. - """ - resize_min = tf.cast(resize_min, tf.float32) - - # Convert to floats to make subsequent calculations go smoothly. 
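-  # Worked example: a 500 x 1000 image with resize_min=256 gives
-  # scale_ratio = 256 / 500 = 0.512, so the new size is 256 x 512 and the
-  # aspect ratio is preserved.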
- height, width = tf.cast(height, tf.float32), tf.cast(width, tf.float32) - - smaller_dim = tf.minimum(height, width) - scale_ratio = resize_min / smaller_dim - - # Convert back to ints to make heights and widths that TF ops will accept. - new_height = tf.cast(height * scale_ratio, tf.int32) - new_width = tf.cast(width * scale_ratio, tf.int32) - - return new_height, new_width - - -def _aspect_preserving_resize(image, resize_min): - """Resize images preserving the original aspect ratio. - - Args: - image: A 3-D image `Tensor`. - resize_min: A python integer or scalar `Tensor` indicating the size of the - smallest side after resize. - - Returns: - resized_image: A 3-D tensor containing the resized image. - """ - shape = tf.shape(input=image) - height, width = shape[0], shape[1] - - new_height, new_width = _smallest_size_at_least(height, width, resize_min) - - return _resize_image(image, new_height, new_width) - - -def _resize_image(image, height, width): - """Simple wrapper around tf.resize_images. - - This is primarily to make sure we use the same `ResizeMethod` and other - details each time. - - Args: - image: A 3-D image `Tensor`. - height: The target height for the resized image. - width: The target width for the resized image. - - Returns: - resized_image: A 3-D tensor containing the resized image. The first two - dimensions have the shape [height, width]. - """ - return tf.compat.v1.image.resize( - image, [height, width], - method=tf.image.ResizeMethod.BILINEAR, - align_corners=False) - - -def preprocess_image(image_buffer, - bbox, - output_height, - output_width, - num_channels, - is_training=False): - """Preprocesses the given image. - - Preprocessing includes decoding, cropping, and resizing for both training - and eval images. Training preprocessing, however, introduces some random - distortion of the image to improve accuracy. - - Args: - image_buffer: scalar string Tensor representing the raw JPEG image buffer. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as [ymin, - xmin, ymax, xmax]. - output_height: The height of the image after preprocessing. - output_width: The width of the image after preprocessing. - num_channels: Integer depth of the image buffer for decoding. - is_training: `True` if we're preprocessing the image for training and - `False` otherwise. - - Returns: - A preprocessed image. - """ - if is_training: - # For training, we want to randomize some of the distortions. - image = _decode_crop_and_flip(image_buffer, bbox, num_channels) - image = _resize_image(image, output_height, output_width) - else: - # For validation, we want to decode, resize, then just crop the middle. - image = tf.image.decode_jpeg(image_buffer, channels=num_channels) - image = _aspect_preserving_resize(image, _RESIZE_MIN) - image = _central_crop(image, output_height, output_width) - - image.set_shape([output_height, output_width, num_channels]) - - return _mean_image_subtraction(image, CHANNEL_MEANS, num_channels) diff --git a/official/vision/image_classification/resnet/resnet_config.py b/official/vision/image_classification/resnet/resnet_config.py deleted file mode 100644 index e39db3955f9fe9c312ea307c8ac3196d45447cf3..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_config.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configuration definitions for ResNet losses, learning rates, and optimizers.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import dataclasses - -from official.modeling.hyperparams import base_config -from official.vision.image_classification.configs import base_configs - - -@dataclasses.dataclass -class ResNetModelConfig(base_configs.ModelConfig): - """Configuration for the ResNet model.""" - name: str = 'ResNet' - num_classes: int = 1000 - model_params: base_config.Config = dataclasses.field( - default_factory=lambda: { - 'num_classes': 1000, - 'batch_size': None, - 'use_l2_regularizer': True, - 'rescale_inputs': False, - }) - loss: base_configs.LossConfig = base_configs.LossConfig( - name='sparse_categorical_crossentropy') - optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( - name='momentum', - decay=0.9, - epsilon=0.001, - momentum=0.9, - moving_average_decay=None) - learning_rate: base_configs.LearningRateConfig = ( - base_configs.LearningRateConfig( - name='stepwise', - initial_lr=0.1, - examples_per_epoch=1281167, - boundaries=[30, 60, 80], - warmup_epochs=5, - scale_by_batch_size=1. / 256., - multipliers=[0.1 / 256, 0.01 / 256, 0.001 / 256, 0.0001 / 256])) diff --git a/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py b/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py deleted file mode 100644 index a66461df17a3fe5fc0d75969e99920310a694e71..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py +++ /dev/null @@ -1,195 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Runs a ResNet model on the ImageNet dataset using custom training loops.""" - -import math -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import orbit -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import performance -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils -from official.utils.misc import model_helpers -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_runnable - -flags.DEFINE_boolean(name='use_tf_function', default=True, - help='Wrap the train and test step inside a ' - 'tf.function.') -flags.DEFINE_boolean(name='single_l2_loss_op', default=False, - help='Calculate L2_loss on concatenated weights, ' - 'instead of using Keras per-layer L2 loss.') - - -def build_stats(runnable, time_callback): - """Normalizes and returns dictionary of stats. - - Args: - runnable: The module containing all the training and evaluation metrics. - time_callback: Time tracking callback instance. - - Returns: - Dictionary of normalized results. - """ - stats = {} - - if not runnable.flags_obj.skip_eval: - stats['eval_loss'] = runnable.test_loss.result().numpy() - stats['eval_acc'] = runnable.test_accuracy.result().numpy() - - stats['train_loss'] = runnable.train_loss.result().numpy() - stats['train_acc'] = runnable.train_accuracy.result().numpy() - - if time_callback: - timestamp_log = time_callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = time_callback.train_finish_time - if time_callback.epoch_runtime_log: - stats['avg_exp_per_second'] = time_callback.average_examples_per_second - - return stats - - -def get_num_train_iterations(flags_obj): - """Returns the number of training steps, train and test epochs.""" - train_steps = ( - imagenet_preprocessing.NUM_IMAGES['train'] // flags_obj.batch_size) - train_epochs = flags_obj.train_epochs - - if flags_obj.train_steps: - train_steps = min(flags_obj.train_steps, train_steps) - train_epochs = 1 - - eval_steps = math.ceil(1.0 * imagenet_preprocessing.NUM_IMAGES['validation'] / - flags_obj.batch_size) - - return train_steps, train_epochs, eval_steps - - -def run(flags_obj): - """Run ResNet ImageNet training and eval loop using custom training loops. - - Args: - flags_obj: An object containing parsed flag values. - - Raises: - ValueError: If fp16 is passed as it is not currently supported. - - Returns: - Dictionary of training and eval stats. 
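-    Typical keys, as produced by `build_stats` above: 'train_loss',
-    'train_acc', and, unless `--skip_eval` is set, 'eval_loss' and 'eval_acc'.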
- """ - keras_utils.set_session_config() - performance.set_mixed_precision_policy(flags_core.get_tf_dtype(flags_obj)) - - if tf.config.list_physical_devices('GPU'): - if flags_obj.tf_gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=flags_obj.per_gpu_thread_count, - gpu_thread_mode=flags_obj.tf_gpu_thread_mode, - num_gpus=flags_obj.num_gpus, - datasets_num_private_threads=flags_obj.datasets_num_private_threads) - common.set_cudnn_batchnorm_mode() - - data_format = flags_obj.data_format - if data_format is None: - data_format = ('channels_first' if tf.config.list_physical_devices('GPU') - else 'channels_last') - tf.keras.backend.set_image_data_format(data_format) - - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=flags_obj.num_gpus, - all_reduce_alg=flags_obj.all_reduce_alg, - num_packs=flags_obj.num_packs, - tpu_address=flags_obj.tpu) - - per_epoch_steps, train_epochs, eval_steps = get_num_train_iterations( - flags_obj) - if flags_obj.steps_per_loop is None: - steps_per_loop = per_epoch_steps - elif flags_obj.steps_per_loop > per_epoch_steps: - steps_per_loop = per_epoch_steps - logging.warn('Setting steps_per_loop to %d to respect epoch boundary.', - steps_per_loop) - else: - steps_per_loop = flags_obj.steps_per_loop - - logging.info( - 'Training %d epochs, each epoch has %d steps, ' - 'total steps: %d; Eval %d steps', train_epochs, per_epoch_steps, - train_epochs * per_epoch_steps, eval_steps) - - time_callback = keras_utils.TimeHistory( - flags_obj.batch_size, - flags_obj.log_steps, - logdir=flags_obj.model_dir if flags_obj.enable_tensorboard else None) - with distribute_utils.get_strategy_scope(strategy): - runnable = resnet_runnable.ResnetRunnable(flags_obj, time_callback, - per_epoch_steps) - - eval_interval = flags_obj.epochs_between_evals * per_epoch_steps - checkpoint_interval = ( - steps_per_loop * 5 if flags_obj.enable_checkpoint_and_export else None) - summary_interval = steps_per_loop if flags_obj.enable_tensorboard else None - - checkpoint_manager = tf.train.CheckpointManager( - runnable.checkpoint, - directory=flags_obj.model_dir, - max_to_keep=10, - step_counter=runnable.global_step, - checkpoint_interval=checkpoint_interval) - - resnet_controller = orbit.Controller( - strategy=strategy, - trainer=runnable, - evaluator=runnable if not flags_obj.skip_eval else None, - global_step=runnable.global_step, - steps_per_loop=steps_per_loop, - checkpoint_manager=checkpoint_manager, - summary_interval=summary_interval, - summary_dir=flags_obj.model_dir, - eval_summary_dir=os.path.join(flags_obj.model_dir, 'eval')) - - time_callback.on_train_begin() - if not flags_obj.skip_eval: - resnet_controller.train_and_evaluate( - train_steps=per_epoch_steps * train_epochs, - eval_steps=eval_steps, - eval_interval=eval_interval) - else: - resnet_controller.train(steps=per_epoch_steps * train_epochs) - time_callback.on_train_end() - - stats = build_stats(runnable, time_callback) - return stats - - -def main(_): - model_helpers.apply_clean(flags.FLAGS) - stats = run(flags.FLAGS) - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - common.define_keras_flags() - app.run(main) diff --git a/official/vision/image_classification/resnet/resnet_model.py b/official/vision/image_classification/resnet/resnet_model.py deleted file mode 100644 index 597b85739e965a157aff995d14891f76698678d4..0000000000000000000000000000000000000000 --- 
a/official/vision/image_classification/resnet/resnet_model.py
+++ /dev/null
@@ -1,325 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""ResNet50 model for Keras.
-
-Adapted from tf.keras.applications.resnet50.ResNet50().
-This is ResNet model version 1.5.
-
-Related papers/blogs:
-- https://arxiv.org/abs/1512.03385
-- https://arxiv.org/pdf/1603.05027v2.pdf
-- http://torch.ch/blog/2016/02/04/resnets.html
-
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-from official.vision.image_classification.resnet import imagenet_preprocessing
-
-layers = tf.keras.layers
-
-
-def _gen_l2_regularizer(use_l2_regularizer=True, l2_weight_decay=1e-4):
-  return tf.keras.regularizers.L2(
-      l2_weight_decay) if use_l2_regularizer else None
-
-
-def identity_block(input_tensor,
-                   kernel_size,
-                   filters,
-                   stage,
-                   block,
-                   use_l2_regularizer=True,
-                   batch_norm_decay=0.9,
-                   batch_norm_epsilon=1e-5):
-  """The identity block is the block that has no conv layer at shortcut.
-
-  Args:
-    input_tensor: input tensor
-    kernel_size: default 3, the kernel size of the middle conv layer in the
-      main path
-    filters: list of integers, the filters of the 3 conv layers in the main
-      path
-    stage: integer, current stage label, used for generating layer names
-    block: 'a','b'..., current block label, used for generating layer names
-    use_l2_regularizer: whether to use an L2 regularizer on the Conv layers.
-    batch_norm_decay: Momentum of the batch norm layers.
-    batch_norm_epsilon: Epsilon of the batch norm layers.
-
-  Returns:
-    Output tensor for the block.
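-
-  Example (illustrative; the remaining kwargs keep their defaults):
-    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')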
-  """
-  filters1, filters2, filters3 = filters
-  if tf.keras.backend.image_data_format() == 'channels_last':
-    bn_axis = 3
-  else:
-    bn_axis = 1
-  conv_name_base = 'res' + str(stage) + block + '_branch'
-  bn_name_base = 'bn' + str(stage) + block + '_branch'
-
-  x = layers.Conv2D(
-      filters1, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2a')(
-          input_tensor)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2a')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters2,
-      kernel_size,
-      padding='same',
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2b')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2b')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters3, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2c')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2c')(
-          x)
-
-  x = layers.add([x, input_tensor])
-  x = layers.Activation('relu')(x)
-  return x
-
-
-def conv_block(input_tensor,
-               kernel_size,
-               filters,
-               stage,
-               block,
-               strides=(2, 2),
-               use_l2_regularizer=True,
-               batch_norm_decay=0.9,
-               batch_norm_epsilon=1e-5):
-  """A block that has a conv layer at shortcut.
-
-  Note that from stage 3, the second conv layer in the main path has
-  strides=(2, 2), and the shortcut must use strides=(2, 2) as well.
-
-  Args:
-    input_tensor: input tensor
-    kernel_size: default 3, the kernel size of the middle conv layer in the
-      main path
-    filters: list of integers, the filters of the 3 conv layers in the main
-      path
-    stage: integer, current stage label, used for generating layer names
-    block: 'a','b'..., current block label, used for generating layer names
-    strides: Strides for the second conv layer in the block.
-    use_l2_regularizer: whether to use an L2 regularizer on the Conv layers.
-    batch_norm_decay: Momentum of the batch norm layers.
-    batch_norm_epsilon: Epsilon of the batch norm layers.
-
-  Returns:
-    Output tensor for the block.
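-
-  Example (illustrative; the remaining kwargs keep their defaults):
-    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))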
-  """
-  filters1, filters2, filters3 = filters
-  if tf.keras.backend.image_data_format() == 'channels_last':
-    bn_axis = 3
-  else:
-    bn_axis = 1
-  conv_name_base = 'res' + str(stage) + block + '_branch'
-  bn_name_base = 'bn' + str(stage) + block + '_branch'
-
-  x = layers.Conv2D(
-      filters1, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2a')(
-          input_tensor)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2a')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters2,
-      kernel_size,
-      strides=strides,
-      padding='same',
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2b')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2b')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters3, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2c')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2c')(
-          x)
-
-  shortcut = layers.Conv2D(
-      filters3, (1, 1),
-      strides=strides,
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '1')(
-          input_tensor)
-  shortcut = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '1')(
-          shortcut)
-
-  x = layers.add([x, shortcut])
-  x = layers.Activation('relu')(x)
-  return x
-
-
-def resnet50(num_classes,
-             batch_size=None,
-             use_l2_regularizer=True,
-             rescale_inputs=False,
-             batch_norm_decay=0.9,
-             batch_norm_epsilon=1e-5):
-  """Instantiates the ResNet50 architecture.
-
-  Args:
-    num_classes: `int` number of classes for image classification.
-    batch_size: Size of the batches for each step.
-    use_l2_regularizer: whether to use an L2 regularizer on the Conv/Dense
-      layers.
-    rescale_inputs: whether inputs are in the range [0, 1] and should be
-      rescaled to the range the trained model expects.
-    batch_norm_decay: Momentum of the batch norm layers.
-    batch_norm_epsilon: Epsilon of the batch norm layers.
-
-  Returns:
-    A Keras model instance.
-  """
-  input_shape = (224, 224, 3)
-  img_input = layers.Input(shape=input_shape, batch_size=batch_size)
-  if rescale_inputs:
-    # Hub image modules expect inputs in the range [0, 1]. This rescales these
-    # inputs to the range expected by the trained model.
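-    # Concretely: multiply the [0, 1] inputs by 255, then subtract the
-    # per-channel means in `imagenet_preprocessing.CHANNEL_MEANS`.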
-    x = layers.Lambda(
-        lambda x: x * 255.0 - tf.keras.backend.constant(  # pylint: disable=g-long-lambda
-            imagenet_preprocessing.CHANNEL_MEANS,
-            shape=[1, 1, 3],
-            dtype=x.dtype),
-        name='rescale')(
-            img_input)
-  else:
-    x = img_input
-
-  if tf.keras.backend.image_data_format() == 'channels_first':
-    x = layers.Permute((3, 1, 2))(x)
-    bn_axis = 1
-  else:  # channels_last
-    bn_axis = 3
-
-  block_config = dict(
-      use_l2_regularizer=use_l2_regularizer,
-      batch_norm_decay=batch_norm_decay,
-      batch_norm_epsilon=batch_norm_epsilon)
-  x = layers.ZeroPadding2D(padding=(3, 3), name='conv1_pad')(x)
-  x = layers.Conv2D(
-      64, (7, 7),
-      strides=(2, 2),
-      padding='valid',
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name='conv1')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name='bn_conv1')(
-          x)
-  x = layers.Activation('relu')(x)
-  x = layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
-
-  x = conv_block(
-      x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), **block_config)
-  x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', **block_config)
-  x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', **block_config)
-
-  x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', **block_config)
-  x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', **block_config)
-  x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', **block_config)
-  x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', **block_config)
-
-  x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f', **block_config)
-
-  x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', **block_config)
-  x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', **block_config)
-  x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', **block_config)
-
-  # Named 'reduce_mean' so that tools such as tfhub_export.py (below) can look
-  # up this pooling layer by name.
-  x = layers.GlobalAveragePooling2D(name='reduce_mean')(x)
-  x = layers.Dense(
-      num_classes,
-      kernel_initializer=tf.initializers.random_normal(stddev=0.01),
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      bias_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name='fc1000')(
-          x)
-
-  # A softmax that is followed by the model loss cannot be done in float16
-  # due to numeric issues, so we pass dtype='float32'.
-  x = layers.Activation('softmax', dtype='float32')(x)
-
-  # Create model.
-  return tf.keras.Model(img_input, x, name='resnet50')
diff --git a/official/vision/image_classification/resnet/resnet_runnable.py b/official/vision/image_classification/resnet/resnet_runnable.py
deleted file mode 100644
index fe3059f77dfb73b3ac685aeea69102f8a4bb5ad4..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/resnet_runnable.py
+++ /dev/null
@@ -1,210 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Runs a ResNet model on the ImageNet dataset using custom training loops.""" - -import orbit -import tensorflow as tf -from official.modeling import grad_utils -from official.modeling import performance -from official.utils.flags import core as flags_core -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_model - - -class ResnetRunnable(orbit.StandardTrainer, orbit.StandardEvaluator): - """Implements the training and evaluation APIs for Resnet model.""" - - def __init__(self, flags_obj, time_callback, epoch_steps): - self.strategy = tf.distribute.get_strategy() - self.flags_obj = flags_obj - self.dtype = flags_core.get_tf_dtype(flags_obj) - self.time_callback = time_callback - - # Input pipeline related - batch_size = flags_obj.batch_size - if batch_size % self.strategy.num_replicas_in_sync != 0: - raise ValueError( - 'Batch size must be divisible by number of replicas : {}'.format( - self.strategy.num_replicas_in_sync)) - - # As auto rebatching is not supported in - # `distribute_datasets_from_function()` API, which is - # required when cloning dataset to multiple workers in eager mode, - # we use per-replica batch size. - self.batch_size = int(batch_size / self.strategy.num_replicas_in_sync) - - if self.flags_obj.use_synthetic_data: - self.input_fn = common.get_synth_input_fn( - height=imagenet_preprocessing.DEFAULT_IMAGE_SIZE, - width=imagenet_preprocessing.DEFAULT_IMAGE_SIZE, - num_channels=imagenet_preprocessing.NUM_CHANNELS, - num_classes=imagenet_preprocessing.NUM_CLASSES, - dtype=self.dtype, - drop_remainder=True) - else: - self.input_fn = imagenet_preprocessing.input_fn - - self.model = resnet_model.resnet50( - num_classes=imagenet_preprocessing.NUM_CLASSES, - use_l2_regularizer=not flags_obj.single_l2_loss_op) - - lr_schedule = common.PiecewiseConstantDecayWithWarmup( - batch_size=flags_obj.batch_size, - epoch_size=imagenet_preprocessing.NUM_IMAGES['train'], - warmup_epochs=common.LR_SCHEDULE[0][1], - boundaries=list(p[1] for p in common.LR_SCHEDULE[1:]), - multipliers=list(p[0] for p in common.LR_SCHEDULE), - compute_lr_on_cpu=True) - self.optimizer = common.get_optimizer(lr_schedule) - # Make sure iterations variable is created inside scope. - self.global_step = self.optimizer.iterations - - self.optimizer = performance.configure_optimizer( - self.optimizer, - use_float16=self.dtype == tf.float16, - loss_scale=flags_core.get_loss_scale(flags_obj, default_for_fp16=128)) - - self.train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32) - self.train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( - 'train_accuracy', dtype=tf.float32) - self.test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32) - self.test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( - 'test_accuracy', dtype=tf.float32) - - self.checkpoint = tf.train.Checkpoint( - model=self.model, optimizer=self.optimizer) - - # Handling epochs. 
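-    # `EpochHelper` derives epoch boundaries from the global step so that
-    # `_epoch_begin`/`_epoch_end` below can fire the time callback once per
-    # epoch.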
- self.epoch_steps = epoch_steps - self.epoch_helper = orbit.utils.EpochHelper(epoch_steps, self.global_step) - train_dataset = orbit.utils.make_distributed_dataset( - self.strategy, - self.input_fn, - is_training=True, - data_dir=self.flags_obj.data_dir, - batch_size=self.batch_size, - parse_record_fn=imagenet_preprocessing.parse_record, - datasets_num_private_threads=self.flags_obj - .datasets_num_private_threads, - dtype=self.dtype, - drop_remainder=True) - orbit.StandardTrainer.__init__( - self, - train_dataset, - options=orbit.StandardTrainerOptions( - use_tf_while_loop=flags_obj.use_tf_while_loop, - use_tf_function=flags_obj.use_tf_function)) - if not flags_obj.skip_eval: - eval_dataset = orbit.utils.make_distributed_dataset( - self.strategy, - self.input_fn, - is_training=False, - data_dir=self.flags_obj.data_dir, - batch_size=self.batch_size, - parse_record_fn=imagenet_preprocessing.parse_record, - dtype=self.dtype) - orbit.StandardEvaluator.__init__( - self, - eval_dataset, - options=orbit.StandardEvaluatorOptions( - use_tf_function=flags_obj.use_tf_function)) - - def train_loop_begin(self): - """See base class.""" - # Reset all metrics - self.train_loss.reset_states() - self.train_accuracy.reset_states() - - self._epoch_begin() - self.time_callback.on_batch_begin(self.epoch_helper.batch_index) - - def train_step(self, iterator): - """See base class.""" - - def step_fn(inputs): - """Function to run on the device.""" - images, labels = inputs - with tf.GradientTape() as tape: - logits = self.model(images, training=True) - - prediction_loss = tf.keras.losses.sparse_categorical_crossentropy( - labels, logits) - loss = tf.reduce_sum(prediction_loss) * (1.0 / - self.flags_obj.batch_size) - num_replicas = self.strategy.num_replicas_in_sync - l2_weight_decay = 1e-4 - if self.flags_obj.single_l2_loss_op: - l2_loss = l2_weight_decay * 2 * tf.add_n([ - tf.nn.l2_loss(v) - for v in self.model.trainable_variables - if 'bn' not in v.name - ]) - - loss += (l2_loss / num_replicas) - else: - loss += (tf.reduce_sum(self.model.losses) / num_replicas) - - grad_utils.minimize_using_explicit_allreduce( - tape, self.optimizer, loss, self.model.trainable_variables) - self.train_loss.update_state(loss) - self.train_accuracy.update_state(labels, logits) - if self.flags_obj.enable_xla: - step_fn = tf.function(step_fn, jit_compile=True) - self.strategy.run(step_fn, args=(next(iterator),)) - - def train_loop_end(self): - """See base class.""" - metrics = { - 'train_loss': self.train_loss.result(), - 'train_accuracy': self.train_accuracy.result(), - } - self.time_callback.on_batch_end(self.epoch_helper.batch_index - 1) - self._epoch_end() - return metrics - - def eval_begin(self): - """See base class.""" - self.test_loss.reset_states() - self.test_accuracy.reset_states() - - def eval_step(self, iterator): - """See base class.""" - - def step_fn(inputs): - """Function to run on the device.""" - images, labels = inputs - logits = self.model(images, training=False) - loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits) - loss = tf.reduce_sum(loss) * (1.0 / self.flags_obj.batch_size) - self.test_loss.update_state(loss) - self.test_accuracy.update_state(labels, logits) - - self.strategy.run(step_fn, args=(next(iterator),)) - - def eval_end(self): - """See base class.""" - return { - 'test_loss': self.test_loss.result(), - 'test_accuracy': self.test_accuracy.result() - } - - def _epoch_begin(self): - if self.epoch_helper.epoch_begin(): - 
self.time_callback.on_epoch_begin(self.epoch_helper.current_epoch) - - def _epoch_end(self): - if self.epoch_helper.epoch_end(): - self.time_callback.on_epoch_end(self.epoch_helper.current_epoch) diff --git a/official/vision/image_classification/resnet/tfhub_export.py b/official/vision/image_classification/resnet/tfhub_export.py deleted file mode 100644 index 2b19f70bc7ae0c019d4d969cdedb28fdc5898b79..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/tfhub_export.py +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""A script to export TF-Hub SavedModel.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -# Import libraries -from absl import app -from absl import flags - -import tensorflow as tf - -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_model - -FLAGS = flags.FLAGS - -flags.DEFINE_string("model_path", None, - "File path to TF model checkpoint or H5 file.") -flags.DEFINE_string("export_path", None, - "TF-Hub SavedModel destination path to export.") - - -def export_tfhub(model_path, hub_destination): - """Restores a tf.keras.Model and saves for TF-Hub.""" - model = resnet_model.resnet50( - num_classes=imagenet_preprocessing.NUM_CLASSES, rescale_inputs=True) - model.load_weights(model_path) - model.save( - os.path.join(hub_destination, "classification"), include_optimizer=False) - - # Extracts a sub-model to use pooling feature vector as model output. - image_input = model.get_layer(index=0).get_output_at(0) - feature_vector_output = model.get_layer(name="reduce_mean").get_output_at(0) - hub_model = tf.keras.Model(image_input, feature_vector_output) - - # Exports a SavedModel. - hub_model.save( - os.path.join(hub_destination, "feature-vector"), include_optimizer=False) - - -def main(argv): - if len(argv) > 1: - raise app.UsageError("Too many command-line arguments.") - - export_tfhub(FLAGS.model_path, FLAGS.export_path) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/vision/image_classification/test_utils.py b/official/vision/image_classification/test_utils.py deleted file mode 100644 index 8d7180c9d4e10c3241c4d6dd31d2cd013439df7a..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/test_utils.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test utilities for image classification tasks.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - - -def trivial_model(num_classes): - """Trivial model for ImageNet dataset.""" - - input_shape = (224, 224, 3) - img_input = tf.keras.layers.Input(shape=input_shape) - - x = tf.keras.layers.Lambda( - lambda x: tf.keras.backend.reshape(x, [-1, 224 * 224 * 3]), - name='reshape')(img_input) - x = tf.keras.layers.Dense(1, name='fc1')(x) - x = tf.keras.layers.Dense(num_classes, name='fc1000')(x) - x = tf.keras.layers.Activation('softmax', dtype='float32')(x) - - return tf.keras.models.Model(img_input, x, name='trivial')
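-
-
-# Example usage (illustrative only; `trivial_model` is meant for smoke tests):
-#
-#   model = trivial_model(num_classes=1000)
-#   model.compile(
-#       optimizer='sgd',
-#       loss='sparse_categorical_crossentropy',
-#       metrics=['accuracy'])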