diff --git a/official/vision/image_classification/README.md b/official/vision/image_classification/README.md
index 8e2edbf91888ec916231f66fea53f4887352a6c5..c34b48a4f847613132a7b36fa4f1ccd110882143 100644
--- a/official/vision/image_classification/README.md
+++ b/official/vision/image_classification/README.md
@@ -1,185 +1,3 @@
-# Image Classification
-
-**Warning:** the features in the `image_classification/` folder have been fully
-integrated into vision/beta. Please use the [new code base](../beta/README.md).
-
-This folder contains TF 2.0 model examples for image classification:
-
-* [MNIST](#mnist)
-* [Classifier Trainer](#classifier-trainer), a framework that uses the Keras
-compile/fit methods for image classification models, including:
-  * ResNet
-  * EfficientNet[^1]
-
-[^1]: Currently a work in progress. We cannot match "AutoAugment (AA)" in [the original version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet).
-For more information about other types of models, please refer to this
-[README file](../../README.md).
-
-## Before you begin
-Please make sure that you have the latest version of TensorFlow
-installed and
-[add the models folder to your Python path](/official/#running-the-models).
-
-### ImageNet preparation
-
-#### Using TFDS
-`classifier_trainer.py` supports ImageNet with
-[TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets/overview).
-
-Please see the following [example snippet](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/download_and_prepare.py)
-for more information on how to use TFDS to download and prepare datasets, and
-specifically the [TFDS ImageNet readme](https://github.com/tensorflow/datasets/blob/master/docs/catalog/imagenet2012.md)
-for manual download instructions.
-
-#### Legacy TFRecords
-Download the ImageNet dataset and convert it to TFRecord format.
-The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py)
-and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
-provide a few options.
-
-Note that the legacy ResNet runners, e.g.
-[resnet/resnet_ctl_imagenet_main.py](resnet/resnet_ctl_imagenet_main.py),
-require TFRecords, whereas `classifier_trainer.py` can use either, by setting
-the builder to 'records' or 'tfds' in the configurations.
-
-### Running on Cloud TPUs
-
-Note: These models will **not** work with TPUs on Colab.
-
-You can train image classification models on Cloud TPUs using
-[tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf.distribute.TPUStrategy?version=nightly).
-If you are not familiar with Cloud TPUs, it is strongly recommended that you go
-through the
-[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
-create a TPU and GCE VM.
-
-### Running on multiple GPU hosts
-
-You can also train these models on multiple hosts, each with GPUs, using
-[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).
-
-The easiest way to run multi-host benchmarks is to set the
-[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
-appropriately at each host: e.g., to run using `MultiWorkerMirroredStrategy` on
-2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and
-host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker",
-"index": i}`.
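-
-For example, on a hypothetical two-host cluster (the addresses and port below
-are placeholders), each host could export its own `TF_CONFIG` before launching
-the trainer:
-
-```bash
-# On host 0:
-export TF_CONFIG='{"cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]}, "task": {"type": "worker", "index": 0}}'
-# On host 1, the cluster list is identical; only the task index changes:
-export TF_CONFIG='{"cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]}, "task": {"type": "worker", "index": 1}}'
-```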
-
-`MultiWorkerMirroredStrategy` will automatically use all the
-available GPUs at each host.
-
-## MNIST
-
-To download the data and run the MNIST sample model locally for the first time,
-run the following command:
-
-```bash
-python3 mnist_main.py \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --train_epochs=10 \
-  --distribution_strategy=one_device \
-  --num_gpus=$NUM_GPUS \
-  --download
-```
-
-To train the model on a Cloud TPU, run the following command:
-
-```bash
-python3 mnist_main.py \
-  --tpu=$TPU_NAME \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --train_epochs=10 \
-  --distribution_strategy=tpu \
-  --download
-```
-
-Note: the `--download` flag is only required the first time you run the model.
-
-
-## Classifier Trainer
-The classifier trainer is a unified framework for running image classification
-models using Keras's compile/fit methods. Experiments should be provided in the
-form of YAML files; some examples are included within the configs/examples
-folder. Please see [configs/examples](./configs/examples) for more example
-configurations.
-
-The provided configuration files use a per-replica batch size, which is scaled
-by the number of devices. For instance, if `batch_size` = 64, then for 1 GPU
-the global batch size would be 64 * 1 = 64. For 8 GPUs, the global batch size
-would be 64 * 8 = 512. Similarly, for a v3-8 TPU, the global batch size would
-be 64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048.
-
-### ResNet50
-
-#### On GPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=resnet \
-  --dataset=imagenet \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
-  --params_override='runtime.num_gpus=$NUM_GPUS'
-```
-
-To train on multiple hosts, each with GPUs attached, using
-[MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy),
-please update the `runtime` section in gpu.yaml
-(or override using `--params_override`) with:
-
-```YAML
-# gpu.yaml
-runtime:
-  distribution_strategy: 'multi_worker_mirrored'
-  worker_hosts: '$HOST1:port,$HOST2:port'
-  num_gpus: $NUM_GPUS
-  task_index: 0
-```
-
-Set `task_index: 0` on the first host, `task_index: 1` on the second, and so
-on. `$HOST1` and `$HOST2` are the IP addresses of the hosts, and `port` can be
-any free port on the hosts. Only the first host will write TensorBoard
-summaries and save checkpoints.
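-
-As a sketch (assuming the YAML above and the same flags as the single-host GPU
-example), the second host would then be launched with only its task index
-overridden:
-
-```bash
-# On $HOST2 (task_index 1); the first host keeps the YAML's task_index: 0.
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=resnet \
-  --dataset=imagenet \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
-  --params_override='runtime.task_index=1'
-```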
-
-#### On TPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=resnet \
-  --dataset=imagenet \
-  --tpu=$TPU_NAME \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/resnet/imagenet/tpu.yaml
-```
-
-### EfficientNet
-**Note: EfficientNet development is a work in progress.**
-#### On GPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=efficientnet \
-  --dataset=imagenet \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml \
-  --params_override='runtime.num_gpus=$NUM_GPUS'
-```
-
-
-#### On TPU:
-```bash
-python3 classifier_trainer.py \
-  --mode=train_and_eval \
-  --model_type=efficientnet \
-  --dataset=imagenet \
-  --tpu=$TPU_NAME \
-  --model_dir=$MODEL_DIR \
-  --data_dir=$DATA_DIR \
-  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
-```
-
-Note that the number of GPU devices can be overridden on the command line using
-`--params_override`. The TPU does not need this override, as the device is
-fixed by providing the TPU address or name with the `--tpu` flag.
-
+This repository is deprecated and has been replaced by the implementations in
+vision/beta/. All the content has been moved to
+[official/legacy/image_classification](https://github.com/tensorflow/models/tree/master/official/legacy/image_classification).
diff --git a/official/vision/image_classification/__init__.py b/official/vision/image_classification/__init__.py
index e419af524b5f349fe04abfa820c3cb51b777d422..f8cba89ac32e6b894aeace22f6190d88f2f724df 100644
--- a/official/vision/image_classification/__init__.py
+++ b/official/vision/image_classification/__init__.py
@@ -12,3 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+"""Deprecating the vision/image_classification folder."""
+raise ImportError(
+    'This module has been moved to official/legacy/image_classification')
diff --git a/official/vision/image_classification/augment.py b/official/vision/image_classification/augment.py
deleted file mode 100644
index f322d31dac6ecc1e282566134720d42261a9b7fc..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/augment.py
+++ /dev/null
@@ -1,985 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""AutoAugment and RandAugment policies for enhanced image preprocessing.
-
-AutoAugment Reference: https://arxiv.org/abs/1805.09501
-RandAugment Reference: https://arxiv.org/abs/1909.13719
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import math
-from typing import Any, Dict, List, Optional, Text, Tuple
-
-from keras.layers.preprocessing import image_preprocessing as image_ops
-import tensorflow as tf
-
-
-# This signifies the max integer that the controller RNN could predict for the
-# augmentation scheme.
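-# Levels in [0, _MAX_LEVEL] are rescaled into each op's own argument range by
-# the *_level_to_arg helpers defined below.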
-_MAX_LEVEL = 10.
-
-
-def to_4d(image: tf.Tensor) -> tf.Tensor:
-  """Converts an input Tensor to 4 dimensions.
-
-  4D image => [N, H, W, C] or [N, C, H, W]
-  3D image => [1, H, W, C] or [1, C, H, W]
-  2D image => [1, H, W, 1]
-
-  Args:
-    image: The 2/3/4D input tensor.
-
-  Returns:
-    A 4D image tensor.
-
-  Raises:
-    `TypeError` if `image` is not a 2/3/4D tensor.
-
-  """
-  shape = tf.shape(image)
-  original_rank = tf.rank(image)
-  left_pad = tf.cast(tf.less_equal(original_rank, 3), dtype=tf.int32)
-  right_pad = tf.cast(tf.equal(original_rank, 2), dtype=tf.int32)
-  new_shape = tf.concat(
-      [
-          tf.ones(shape=left_pad, dtype=tf.int32),
-          shape,
-          tf.ones(shape=right_pad, dtype=tf.int32),
-      ],
-      axis=0,
-  )
-  return tf.reshape(image, new_shape)
-
-
-def from_4d(image: tf.Tensor, ndims: tf.Tensor) -> tf.Tensor:
-  """Converts a 4D image back to `ndims` rank."""
-  shape = tf.shape(image)
-  begin = tf.cast(tf.less_equal(ndims, 3), dtype=tf.int32)
-  end = 4 - tf.cast(tf.equal(ndims, 2), dtype=tf.int32)
-  new_shape = shape[begin:end]
-  return tf.reshape(image, new_shape)
-
-
-def _convert_translation_to_transform(translations: tf.Tensor) -> tf.Tensor:
-  """Converts translations to a projective transform.
-
-  The translation matrix looks like this:
-    [[1 0 -dx]
-     [0 1 -dy]
-     [0 0 1]]
-
-  Args:
-    translations: The 2-element list representing [dx, dy], or a matrix of
-      2-element lists representing [dx, dy] to translate for each image. The
-      shape must be static.
-
-  Returns:
-    The transformation matrix of shape (num_images, 8).
-
-  Raises:
-    `TypeError` if
-      - the shape of `translations` is not known or
-      - the shape of `translations` is not rank 1 or 2.
-
-  """
-  translations = tf.convert_to_tensor(translations, dtype=tf.float32)
-  if translations.get_shape().ndims is None:
-    raise TypeError('translations rank must be statically known')
-  elif len(translations.get_shape()) == 1:
-    translations = translations[None]
-  elif len(translations.get_shape()) != 2:
-    raise TypeError('translations should have rank 1 or 2.')
-  num_translations = tf.shape(translations)[0]
-
-  return tf.concat(
-      values=[
-          tf.ones((num_translations, 1), tf.dtypes.float32),
-          tf.zeros((num_translations, 1), tf.dtypes.float32),
-          -translations[:, 0, None],
-          tf.zeros((num_translations, 1), tf.dtypes.float32),
-          tf.ones((num_translations, 1), tf.dtypes.float32),
-          -translations[:, 1, None],
-          tf.zeros((num_translations, 2), tf.dtypes.float32),
-      ],
-      axis=1,
-  )
-
-
-def _convert_angles_to_transform(angles: tf.Tensor, image_width: tf.Tensor,
-                                 image_height: tf.Tensor) -> tf.Tensor:
-  """Converts an angle or angles to a projective transform.
-
-  Args:
-    angles: A scalar angle to rotate all images by, or a vector of angles to
-      rotate a batch of images by.
-    image_width: The width of the image(s) to be transformed.
-    image_height: The height of the image(s) to be transformed.
-
-  Returns:
-    A tensor of shape (num_images, 8).
-
-  Raises:
-    `TypeError` if `angles` is not rank 0 or 1.
- - """ - angles = tf.convert_to_tensor(angles, dtype=tf.float32) - if len(angles.get_shape()) == 0: # pylint:disable=g-explicit-length-test - angles = angles[None] - elif len(angles.get_shape()) != 1: - raise TypeError('Angles should have a rank 0 or 1.') - x_offset = ((image_width - 1) - - (tf.math.cos(angles) * (image_width - 1) - tf.math.sin(angles) * - (image_height - 1))) / 2.0 - y_offset = ((image_height - 1) - - (tf.math.sin(angles) * (image_width - 1) + tf.math.cos(angles) * - (image_height - 1))) / 2.0 - num_angles = tf.shape(angles)[0] - return tf.concat( - values=[ - tf.math.cos(angles)[:, None], - -tf.math.sin(angles)[:, None], - x_offset[:, None], - tf.math.sin(angles)[:, None], - tf.math.cos(angles)[:, None], - y_offset[:, None], - tf.zeros((num_angles, 2), tf.dtypes.float32), - ], - axis=1, - ) - - -def transform(image: tf.Tensor, transforms) -> tf.Tensor: - """Prepares input data for `image_ops.transform`.""" - original_ndims = tf.rank(image) - transforms = tf.convert_to_tensor(transforms, dtype=tf.float32) - if transforms.shape.rank == 1: - transforms = transforms[None] - image = to_4d(image) - image = image_ops.transform( - images=image, transforms=transforms, interpolation='nearest') - return from_4d(image, original_ndims) - - -def translate(image: tf.Tensor, translations) -> tf.Tensor: - """Translates image(s) by provided vectors. - - Args: - image: An image Tensor of type uint8. - translations: A vector or matrix representing [dx dy]. - - Returns: - The translated version of the image. - - """ - transforms = _convert_translation_to_transform(translations) - return transform(image, transforms=transforms) - - -def rotate(image: tf.Tensor, degrees: float) -> tf.Tensor: - """Rotates the image by degrees either clockwise or counterclockwise. - - Args: - image: An image Tensor of type uint8. - degrees: Float, a scalar angle in degrees to rotate all images by. If - degrees is positive the image will be rotated clockwise otherwise it will - be rotated counterclockwise. - - Returns: - The rotated version of image. - - """ - # Convert from degrees to radians. - degrees_to_radians = math.pi / 180.0 - radians = tf.cast(degrees * degrees_to_radians, tf.float32) - - original_ndims = tf.rank(image) - image = to_4d(image) - - image_height = tf.cast(tf.shape(image)[1], tf.float32) - image_width = tf.cast(tf.shape(image)[2], tf.float32) - transforms = _convert_angles_to_transform( - angles=radians, image_width=image_width, image_height=image_height) - # In practice, we should randomize the rotation degrees by flipping - # it negatively half the time, but that's done on 'degrees' outside - # of the function. - image = transform(image, transforms=transforms) - return from_4d(image, original_ndims) - - -def blend(image1: tf.Tensor, image2: tf.Tensor, factor: float) -> tf.Tensor: - """Blend image1 and image2 using 'factor'. - - Factor can be above 0.0. A value of 0.0 means only image1 is used. - A value of 1.0 means only image2 is used. A value between 0.0 and - 1.0 means we linearly interpolate the pixel values between the two - images. A value greater than 1.0 "extrapolates" the difference - between the two pixel values, and we clip the results to values - between 0 and 255. - - Args: - image1: An image Tensor of type uint8. - image2: An image Tensor of type uint8. - factor: A floating point value above 0.0. - - Returns: - A blended image Tensor of type uint8. 
- """ - if factor == 0.0: - return tf.convert_to_tensor(image1) - if factor == 1.0: - return tf.convert_to_tensor(image2) - - image1 = tf.cast(image1, tf.float32) - image2 = tf.cast(image2, tf.float32) - - difference = image2 - image1 - scaled = factor * difference - - # Do addition in float. - temp = tf.cast(image1, tf.float32) + scaled - - # Interpolate - if factor > 0.0 and factor < 1.0: - # Interpolation means we always stay within 0 and 255. - return tf.cast(temp, tf.uint8) - - # Extrapolate: - # - # We need to clip and then cast. - return tf.cast(tf.clip_by_value(temp, 0.0, 255.0), tf.uint8) - - -def cutout(image: tf.Tensor, pad_size: int, replace: int = 0) -> tf.Tensor: - """Apply cutout (https://arxiv.org/abs/1708.04552) to image. - - This operation applies a (2*pad_size x 2*pad_size) mask of zeros to - a random location within `img`. The pixel values filled in will be of the - value `replace`. The located where the mask will be applied is randomly - chosen uniformly over the whole image. - - Args: - image: An image Tensor of type uint8. - pad_size: Specifies how big the zero mask that will be generated is that is - applied to the image. The mask will be of size (2*pad_size x 2*pad_size). - replace: What pixel value to fill in the image in the area that has the - cutout mask applied to it. - - Returns: - An image Tensor that is of type uint8. - """ - image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - - # Sample the center location in the image where the zero mask will be applied. - cutout_center_height = tf.random.uniform( - shape=[], minval=0, maxval=image_height, dtype=tf.int32) - - cutout_center_width = tf.random.uniform( - shape=[], minval=0, maxval=image_width, dtype=tf.int32) - - lower_pad = tf.maximum(0, cutout_center_height - pad_size) - upper_pad = tf.maximum(0, image_height - cutout_center_height - pad_size) - left_pad = tf.maximum(0, cutout_center_width - pad_size) - right_pad = tf.maximum(0, image_width - cutout_center_width - pad_size) - - cutout_shape = [ - image_height - (lower_pad + upper_pad), - image_width - (left_pad + right_pad) - ] - padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]] - mask = tf.pad( - tf.zeros(cutout_shape, dtype=image.dtype), - padding_dims, - constant_values=1) - mask = tf.expand_dims(mask, -1) - mask = tf.tile(mask, [1, 1, 3]) - image = tf.where( - tf.equal(mask, 0), - tf.ones_like(image, dtype=image.dtype) * replace, image) - return image - - -def solarize(image: tf.Tensor, threshold: int = 128) -> tf.Tensor: - # For each pixel in the image, select the pixel - # if the value is less than the threshold. - # Otherwise, subtract 255 from the pixel. - return tf.where(image < threshold, image, 255 - image) - - -def solarize_add(image: tf.Tensor, - addition: int = 0, - threshold: int = 128) -> tf.Tensor: - # For each pixel in the image less than threshold - # we add 'addition' amount to it and then clip the - # pixel value to be between 0 and 255. The value - # of 'addition' is between -128 and 128. 
- added_image = tf.cast(image, tf.int64) + addition - added_image = tf.cast(tf.clip_by_value(added_image, 0, 255), tf.uint8) - return tf.where(image < threshold, added_image, image) - - -def color(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Color.""" - degenerate = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image)) - return blend(degenerate, image, factor) - - -def contrast(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Contrast.""" - degenerate = tf.image.rgb_to_grayscale(image) - # Cast before calling tf.histogram. - degenerate = tf.cast(degenerate, tf.int32) - - # Compute the grayscale histogram, then compute the mean pixel value, - # and create a constant image size of that value. Use that as the - # blending degenerate target of the original image. - hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256) - mean = tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0 - degenerate = tf.ones_like(degenerate, dtype=tf.float32) * mean - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.image.grayscale_to_rgb(tf.cast(degenerate, tf.uint8)) - return blend(degenerate, image, factor) - - -def brightness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Brightness.""" - degenerate = tf.zeros_like(image) - return blend(degenerate, image, factor) - - -def posterize(image: tf.Tensor, bits: int) -> tf.Tensor: - """Equivalent of PIL Posterize.""" - shift = 8 - bits - return tf.bitwise.left_shift(tf.bitwise.right_shift(image, shift), shift) - - -def wrapped_rotate(image: tf.Tensor, degrees: float, replace: int) -> tf.Tensor: - """Applies rotation with wrap/unwrap.""" - image = rotate(wrap(image), degrees=degrees) - return unwrap(image, replace) - - -def translate_x(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in X dimension.""" - image = translate(wrap(image), [-pixels, 0]) - return unwrap(image, replace) - - -def translate_y(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in Y dimension.""" - image = translate(wrap(image), [0, -pixels]) - return unwrap(image, replace) - - -def shear_x(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in X dimension.""" - # Shear parallel to x axis is a projective transform - # with a matrix form of: - # [1 level - # 0 1]. - image = transform( - image=wrap(image), transforms=[1., level, 0., 0., 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def shear_y(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in Y dimension.""" - # Shear parallel to y axis is a projective transform - # with a matrix form of: - # [1 0 - # level 1]. - image = transform( - image=wrap(image), transforms=[1., 0., 0., level, 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def autocontrast(image: tf.Tensor) -> tf.Tensor: - """Implements Autocontrast function from PIL using TF ops. - - Args: - image: A 3D uint8 tensor. - - Returns: - The image after it has had autocontrast applied to it and will be of type - uint8. - """ - - def scale_channel(image: tf.Tensor) -> tf.Tensor: - """Scale the 2D image using the autocontrast rule.""" - # A possibly cheaper version can be done using cumsum/unique_with_counts - # over the histogram values, rather than iterating over the entire image. - # to compute mins and maxes. 
- lo = tf.cast(tf.reduce_min(image), tf.float32) - hi = tf.cast(tf.reduce_max(image), tf.float32) - - # Scale the image, making the lowest value 0 and the highest value 255. - def scale_values(im): - scale = 255.0 / (hi - lo) - offset = -lo * scale - im = tf.cast(im, tf.float32) * scale + offset - im = tf.clip_by_value(im, 0.0, 255.0) - return tf.cast(im, tf.uint8) - - result = tf.cond(hi > lo, lambda: scale_values(image), lambda: image) - return result - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. - s1 = scale_channel(image[:, :, 0]) - s2 = scale_channel(image[:, :, 1]) - s3 = scale_channel(image[:, :, 2]) - image = tf.stack([s1, s2, s3], 2) - return image - - -def sharpness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Implements Sharpness function from PIL using TF ops.""" - orig_image = image - image = tf.cast(image, tf.float32) - # Make image 4D for conv operation. - image = tf.expand_dims(image, 0) - # SMOOTH PIL Kernel. - kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], - dtype=tf.float32, - shape=[3, 3, 1, 1]) / 13. - # Tile across channel dimension. - kernel = tf.tile(kernel, [1, 1, 3, 1]) - strides = [1, 1, 1, 1] - degenerate = tf.nn.depthwise_conv2d( - image, kernel, strides, padding='VALID', dilations=[1, 1]) - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.squeeze(tf.cast(degenerate, tf.uint8), [0]) - - # For the borders of the resulting image, fill in the values of the - # original image. - mask = tf.ones_like(degenerate) - padded_mask = tf.pad(mask, [[1, 1], [1, 1], [0, 0]]) - padded_degenerate = tf.pad(degenerate, [[1, 1], [1, 1], [0, 0]]) - result = tf.where(tf.equal(padded_mask, 1), padded_degenerate, orig_image) - - # Blend the final result. - return blend(result, orig_image, factor) - - -def equalize(image: tf.Tensor) -> tf.Tensor: - """Implements Equalize function from PIL using TF ops.""" - - def scale_channel(im, c): - """Scale the data in the channel to implement equalize.""" - im = tf.cast(im[:, :, c], tf.int32) - # Compute the histogram of the image channel. - histo = tf.histogram_fixed_width(im, [0, 255], nbins=256) - - # For the purposes of computing the step, filter out the nonzeros. - nonzero = tf.where(tf.not_equal(histo, 0)) - nonzero_histo = tf.reshape(tf.gather(histo, nonzero), [-1]) - step = (tf.reduce_sum(nonzero_histo) - nonzero_histo[-1]) // 255 - - def build_lut(histo, step): - # Compute the cumulative sum, shifting by step // 2 - # and then normalization by step. - lut = (tf.cumsum(histo) + (step // 2)) // step - # Shift lut, prepending with 0. - lut = tf.concat([[0], lut[:-1]], 0) - # Clip the counts to be in range. This is done - # in the C code for image.point. - return tf.clip_by_value(lut, 0, 255) - - # If step is zero, return the original image. Otherwise, build - # lut from the full histogram and step and then index from it. - result = tf.cond( - tf.equal(step, 0), lambda: im, - lambda: tf.gather(build_lut(histo, step), im)) - - return tf.cast(result, tf.uint8) - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. 
- s1 = scale_channel(image, 0) - s2 = scale_channel(image, 1) - s3 = scale_channel(image, 2) - image = tf.stack([s1, s2, s3], 2) - return image - - -def invert(image: tf.Tensor) -> tf.Tensor: - """Inverts the image pixels.""" - image = tf.convert_to_tensor(image) - return 255 - image - - -def wrap(image: tf.Tensor) -> tf.Tensor: - """Returns 'image' with an extra channel set to all 1s.""" - shape = tf.shape(image) - extended_channel = tf.ones([shape[0], shape[1], 1], image.dtype) - extended = tf.concat([image, extended_channel], axis=2) - return extended - - -def unwrap(image: tf.Tensor, replace: int) -> tf.Tensor: - """Unwraps an image produced by wrap. - - Where there is a 0 in the last channel for every spatial position, - the rest of the three channels in that spatial dimension are grayed - (set to 128). Operations like translate and shear on a wrapped - Tensor will leave 0s in empty locations. Some transformations look - at the intensity of values to do preprocessing, and we want these - empty pixels to assume the 'average' value, rather than pure black. - - - Args: - image: A 3D Image Tensor with 4 channels. - replace: A one or three value 1D tensor to fill empty pixels. - - Returns: - image: A 3D image Tensor with 3 channels. - """ - image_shape = tf.shape(image) - # Flatten the spatial dimensions. - flattened_image = tf.reshape(image, [-1, image_shape[2]]) - - # Find all pixels where the last channel is zero. - alpha_channel = tf.expand_dims(flattened_image[:, 3], axis=-1) - - replace = tf.concat([replace, tf.ones([1], image.dtype)], 0) - - # Where they are zero, fill them in with 'replace'. - flattened_image = tf.where( - tf.equal(alpha_channel, 0), - tf.ones_like(flattened_image, dtype=image.dtype) * replace, - flattened_image) - - image = tf.reshape(flattened_image, image_shape) - image = tf.slice(image, [0, 0, 0], [image_shape[0], image_shape[1], 3]) - return image - - -def _randomly_negate_tensor(tensor): - """With 50% prob turn the tensor negative.""" - should_flip = tf.cast(tf.floor(tf.random.uniform([]) + 0.5), tf.bool) - final_tensor = tf.cond(should_flip, lambda: tensor, lambda: -tensor) - return final_tensor - - -def _rotate_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 30. - level = _randomly_negate_tensor(level) - return (level,) - - -def _shrink_level_to_arg(level: float): - """Converts level to ratio by which we shrink the image content.""" - if level == 0: - return (1.0,) # if level is zero, do not shrink the image - # Maximum shrinking ratio is 2.9. - level = 2. / (_MAX_LEVEL / level) + 0.9 - return (level,) - - -def _enhance_level_to_arg(level: float): - return ((level / _MAX_LEVEL) * 1.8 + 0.1,) - - -def _shear_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 0.3 - # Flip level to negative with 50% chance. - level = _randomly_negate_tensor(level) - return (level,) - - -def _translate_level_to_arg(level: float, translate_const: float): - level = (level / _MAX_LEVEL) * float(translate_const) - # Flip level to negative with 50% chance. - level = _randomly_negate_tensor(level) - return (level,) - - -def _mult_to_arg(level: float, multiplier: float = 1.): - return (int((level / _MAX_LEVEL) * multiplier),) - - -def _apply_func_with_prob(func: Any, image: tf.Tensor, args: Any, prob: float): - """Apply `func` to image w/ `args` as input with probability `prob`.""" - assert isinstance(args, tuple) - - # Apply the function with probability `prob`. 
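-  # floor(U[0, 1) + prob) equals 1 with probability `prob`, so `func` is
-  # applied exactly that fraction of the time.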
- should_apply_op = tf.cast( - tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) - augmented_image = tf.cond(should_apply_op, lambda: func(image, *args), - lambda: image) - return augmented_image - - -def select_and_apply_random_policy(policies: Any, image: tf.Tensor): - """Select a random policy from `policies` and apply it to `image`.""" - policy_to_select = tf.random.uniform([], maxval=len(policies), dtype=tf.int32) - # Note that using tf.case instead of tf.conds would result in significantly - # larger graphs and would even break export for some larger policies. - for (i, policy) in enumerate(policies): - image = tf.cond( - tf.equal(i, policy_to_select), - lambda selected_policy=policy: selected_policy(image), - lambda: image) - return image - - -NAME_TO_FUNC = { - 'AutoContrast': autocontrast, - 'Equalize': equalize, - 'Invert': invert, - 'Rotate': wrapped_rotate, - 'Posterize': posterize, - 'Solarize': solarize, - 'SolarizeAdd': solarize_add, - 'Color': color, - 'Contrast': contrast, - 'Brightness': brightness, - 'Sharpness': sharpness, - 'ShearX': shear_x, - 'ShearY': shear_y, - 'TranslateX': translate_x, - 'TranslateY': translate_y, - 'Cutout': cutout, -} - -# Functions that have a 'replace' parameter -REPLACE_FUNCS = frozenset({ - 'Rotate', - 'TranslateX', - 'ShearX', - 'ShearY', - 'TranslateY', - 'Cutout', -}) - - -def level_to_arg(cutout_const: float, translate_const: float): - """Creates a dict mapping image operation names to their arguments.""" - - no_arg = lambda level: () - posterize_arg = lambda level: _mult_to_arg(level, 4) - solarize_arg = lambda level: _mult_to_arg(level, 256) - solarize_add_arg = lambda level: _mult_to_arg(level, 110) - cutout_arg = lambda level: _mult_to_arg(level, cutout_const) - translate_arg = lambda level: _translate_level_to_arg(level, translate_const) - - args = { - 'AutoContrast': no_arg, - 'Equalize': no_arg, - 'Invert': no_arg, - 'Rotate': _rotate_level_to_arg, - 'Posterize': posterize_arg, - 'Solarize': solarize_arg, - 'SolarizeAdd': solarize_add_arg, - 'Color': _enhance_level_to_arg, - 'Contrast': _enhance_level_to_arg, - 'Brightness': _enhance_level_to_arg, - 'Sharpness': _enhance_level_to_arg, - 'ShearX': _shear_level_to_arg, - 'ShearY': _shear_level_to_arg, - 'Cutout': cutout_arg, - 'TranslateX': translate_arg, - 'TranslateY': translate_arg, - } - return args - - -def _parse_policy_info(name: Text, prob: float, level: float, - replace_value: List[int], cutout_const: float, - translate_const: float) -> Tuple[Any, float, Any]: - """Return the function that corresponds to `name` and update `level` param.""" - func = NAME_TO_FUNC[name] - args = level_to_arg(cutout_const, translate_const)[name](level) - - if name in REPLACE_FUNCS: - # Add in replace arg if it is required for the function that is called. - args = tuple(list(args) + [replace_value]) - - return func, prob, args - - -class ImageAugment(object): - """Image augmentation class for applying image distortions.""" - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Given an image tensor, returns a distorted image with the same shape. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - The augmented version of `image`. - """ - raise NotImplementedError() - - -class AutoAugment(ImageAugment): - """Applies the AutoAugment policy to images. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. 
- """ - - def __init__(self, - augmentation_name: Text = 'v0', - policies: Optional[Dict[Text, Any]] = None, - cutout_const: float = 100, - translate_const: float = 250): - """Applies the AutoAugment policy to images. - - Args: - augmentation_name: The name of the AutoAugment policy to use. The - available options are `v0` and `test`. `v0` is the policy used for all - of the results in the paper and was found to achieve the best results on - the COCO dataset. `v1`, `v2` and `v3` are additional good policies found - on the COCO dataset that have slight variation in what operations were - used during the search procedure along with how many operations are - applied in parallel to a single image (2 vs 3). - policies: list of lists of tuples in the form `(func, prob, level)`, - `func` is a string name of the augmentation function, `prob` is the - probability of applying the `func` operation, `level` is the input - argument for `func`. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. - """ - super(AutoAugment, self).__init__() - - if policies is None: - self.available_policies = { - 'v0': self.policy_v0(), - 'test': self.policy_test(), - 'simple': self.policy_simple(), - } - - if augmentation_name not in self.available_policies: - raise ValueError( - 'Invalid augmentation_name: {}'.format(augmentation_name)) - - self.augmentation_name = augmentation_name - self.policies = self.available_policies[augmentation_name] - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies the AutoAugment policy to `image`. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - A version of image that now has data augmentation applied to it based on - the `policies` pass into the function. - """ - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - - # func is the string name of the augmentation function, prob is the - # probability of applying the operation and level is the parameter - # associated with the tf op. - - # tf_policies are functions that take in an image and return an augmented - # image. - tf_policies = [] - for policy in self.policies: - tf_policy = [] - # Link string name to the correct python function and make sure the - # correct argument is passed into that function. - for policy_info in policy: - policy_info = list(policy_info) + [ - replace_value, self.cutout_const, self.translate_const - ] - tf_policy.append(_parse_policy_info(*policy_info)) - # Now build the tf policy that will apply the augmentation procedue - # on image. - def make_final_policy(tf_policy_): - - def final_policy(image_): - for func, prob, args in tf_policy_: - image_ = _apply_func_with_prob(func, image_, args, prob) - return image_ - - return final_policy - - tf_policies.append(make_final_policy(tf_policy)) - - image = select_and_apply_random_policy(tf_policies, image) - image = tf.cast(image, dtype=input_image_type) - return image - - @staticmethod - def policy_v0(): - """Autoaugment policy that was used in AutoAugment Paper. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. 
- - Returns: - the policy. - """ - - # TODO(dankondratyuk): tensorflow_addons defines custom ops, which - # for some reason are not included when building/linking - # This results in the error, "Op type not registered - # 'Addons>ImageProjectiveTransformV2' in binary" when running on borg TPUs - policy = [ - [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)], - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Color', 0.4, 1), ('Rotate', 0.6, 8)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Invert', 0.4, 9), ('Rotate', 0.6, 0)], - [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)], - [('Rotate', 1.0, 7), ('TranslateY', 0.8, 9)], - [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)], - [('ShearY', 0.8, 0), ('Color', 0.6, 4)], - [('Color', 1.0, 0), ('Rotate', 0.6, 2)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - [('Color', 0.8, 6), ('Rotate', 0.4, 5)], - ] - return policy - - @staticmethod - def policy_simple(): - """Same as `policy_v0`, except with custom ops removed.""" - - policy = [ - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - ] - return policy - - @staticmethod - def policy_test(): - """Autoaugment test policy for debugging.""" - policy = [ - [('TranslateX', 1.0, 4), ('Equalize', 1.0, 10)], - ] - return policy - - -class RandAugment(ImageAugment): - """Applies the RandAugment policy to images. - - RandAugment is from the paper https://arxiv.org/abs/1909.13719, - """ - - def __init__(self, - num_layers: int = 2, - magnitude: float = 10., - cutout_const: float = 40., - translate_const: float = 100.): - """Applies the RandAugment policy to images. - - Args: - num_layers: Integer, the number of augmentation transformations to apply - sequentially to an image. Represented as (N) in the paper. Usually best - values will be in the range [1, 3]. - magnitude: Integer, shared magnitude across all augmentation operations. - Represented as (M) in the paper. Usually best values are in the range - [5, 10]. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. 
- """ - super(RandAugment, self).__init__() - - self.num_layers = num_layers - self.magnitude = float(magnitude) - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - self.available_ops = [ - 'AutoContrast', 'Equalize', 'Invert', 'Rotate', 'Posterize', 'Solarize', - 'Color', 'Contrast', 'Brightness', 'Sharpness', 'ShearX', 'ShearY', - 'TranslateX', 'TranslateY', 'Cutout', 'SolarizeAdd' - ] - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies the RandAugment policy to `image`. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - The augmented version of `image`. - """ - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - min_prob, max_prob = 0.2, 0.8 - - for _ in range(self.num_layers): - op_to_select = tf.random.uniform([], - maxval=len(self.available_ops) + 1, - dtype=tf.int32) - - branch_fns = [] - for (i, op_name) in enumerate(self.available_ops): - prob = tf.random.uniform([], - minval=min_prob, - maxval=max_prob, - dtype=tf.float32) - func, _, args = _parse_policy_info(op_name, prob, self.magnitude, - replace_value, self.cutout_const, - self.translate_const) - branch_fns.append(( - i, - # pylint:disable=g-long-lambda - lambda selected_func=func, selected_args=args: selected_func( - image, *selected_args))) - # pylint:enable=g-long-lambda - - image = tf.switch_case( - branch_index=op_to_select, - branch_fns=branch_fns, - default=lambda: tf.identity(image)) - - image = tf.cast(image, dtype=input_image_type) - return image diff --git a/official/vision/image_classification/augment_test.py b/official/vision/image_classification/augment_test.py deleted file mode 100644 index 6279352204c46ae24d1971c48160ff7c6b0acc79..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/augment_test.py +++ /dev/null @@ -1,129 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for autoaugment.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl.testing import parameterized - -import tensorflow as tf - -from official.vision.image_classification import augment - - -def get_dtype_test_cases(): - return [ - ('uint8', tf.uint8), - ('int32', tf.int32), - ('float16', tf.float16), - ('float32', tf.float32), - ] - - -@parameterized.named_parameters(get_dtype_test_cases()) -class TransformsTest(parameterized.TestCase, tf.test.TestCase): - """Basic tests for fundamental transformations.""" - - def test_to_from_4d(self, dtype): - for shape in [(10, 10), (10, 10, 10), (10, 10, 10, 10)]: - original_ndims = len(shape) - image = tf.zeros(shape, dtype=dtype) - image_4d = augment.to_4d(image) - self.assertEqual(4, tf.rank(image_4d)) - self.assertAllEqual(image, augment.from_4d(image_4d, original_ndims)) - - def test_transform(self, dtype): - image = tf.constant([[1, 2], [3, 4]], dtype=dtype) - self.assertAllEqual( - augment.transform(image, transforms=[1] * 8), [[4, 4], [4, 4]]) - - def test_translate(self, dtype): - image = tf.constant( - [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=dtype) - translations = [-1, -1] - translated = augment.translate(image=image, translations=translations) - expected = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]] - self.assertAllEqual(translated, expected) - - def test_translate_shapes(self, dtype): - translation = [0, 0] - for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.translate(image, translation)) - - def test_translate_invalid_translation(self, dtype): - image = tf.zeros((1, 1), dtype=dtype) - invalid_translation = [[[1, 1]]] - with self.assertRaisesRegex(TypeError, 'rank 1 or 2'): - _ = augment.translate(image, invalid_translation) - - def test_rotate(self, dtype): - image = tf.reshape(tf.cast(tf.range(9), dtype), (3, 3)) - rotation = 90. - transformed = augment.rotate(image=image, degrees=rotation) - expected = [[2, 5, 8], [1, 4, 7], [0, 3, 6]] - self.assertAllEqual(transformed, expected) - - def test_rotate_shapes(self, dtype): - degrees = 0. 
- for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.rotate(image, degrees)) - - -class AutoaugmentTest(tf.test.TestCase): - - def test_autoaugment(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.AutoAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_randaug(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.RandAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_all_policy_ops(self): - """Smoke test to be sure all augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((224, 224, 3), dtype=tf.uint8) - - for op_name in augment.NAME_TO_FUNC: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - image = func(image, *args) - - self.assertEqual((224, 224, 3), image.shape) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/callbacks.py b/official/vision/image_classification/callbacks.py deleted file mode 100644 index a4934ed88f7db280d1ffd9ad57346f68a5395d5e..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/callbacks.py +++ /dev/null @@ -1,256 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Common modules for callbacks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -from typing import Any, List, MutableMapping, Optional, Text - -from absl import logging -import tensorflow as tf - -from official.modeling import optimization -from official.utils.misc import keras_utils - - -def get_callbacks( - model_checkpoint: bool = True, - include_tensorboard: bool = True, - time_history: bool = True, - track_lr: bool = True, - write_model_weights: bool = True, - apply_moving_average: bool = False, - initial_step: int = 0, - batch_size: int = 0, - log_steps: int = 0, - model_dir: Optional[str] = None, - backup_and_restore: bool = False) -> List[tf.keras.callbacks.Callback]: - """Get all callbacks.""" - model_dir = model_dir or '' - callbacks = [] - if model_checkpoint: - ckpt_full_path = os.path.join(model_dir, 'model.ckpt-{epoch:04d}') - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True, verbose=1)) - if backup_and_restore: - backup_dir = os.path.join(model_dir, 'tmp') - callbacks.append( - tf.keras.callbacks.experimental.BackupAndRestore(backup_dir)) - if include_tensorboard: - callbacks.append( - CustomTensorBoard( - log_dir=model_dir, - track_lr=track_lr, - initial_step=initial_step, - write_images=write_model_weights, - profile_batch=0)) - if time_history: - callbacks.append( - keras_utils.TimeHistory( - batch_size, - log_steps, - logdir=model_dir if include_tensorboard else None)) - if apply_moving_average: - # Save moving average model to a different file so that - # we can resume training from a checkpoint - ckpt_full_path = os.path.join(model_dir, 'average', - 'model.ckpt-{epoch:04d}') - callbacks.append( - AverageModelCheckpoint( - update_weights=False, - filepath=ckpt_full_path, - save_weights_only=True, - verbose=1)) - callbacks.append(MovingAverageCallback()) - return callbacks - - -def get_scalar_from_tensor(t: tf.Tensor) -> int: - """Utility function to convert a Tensor to a scalar.""" - t = tf.keras.backend.get_value(t) - if callable(t): - return t() - else: - return t - - -class CustomTensorBoard(tf.keras.callbacks.TensorBoard): - """A customized TensorBoard callback that tracks additional datapoints. - - Metrics tracked: - - Global learning rate - - Attributes: - log_dir: the path of the directory where to save the log files to be parsed - by TensorBoard. - track_lr: `bool`, whether or not to track the global learning rate. - initial_step: the initial step, used for preemption recovery. - **kwargs: Additional arguments for backwards compatibility. Possible key is - `period`. 
- """ - - # TODO(b/146499062): track params, flops, log lr, l2 loss, - # classification loss - - def __init__(self, - log_dir: str, - track_lr: bool = False, - initial_step: int = 0, - **kwargs): - super(CustomTensorBoard, self).__init__(log_dir=log_dir, **kwargs) - self.step = initial_step - self._track_lr = track_lr - - def on_batch_begin(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - self.step += 1 - if logs is None: - logs = {} - logs.update(self._calculate_metrics()) - super(CustomTensorBoard, self).on_batch_begin(epoch, logs) - - def on_epoch_begin(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - if logs is None: - logs = {} - metrics = self._calculate_metrics() - logs.update(metrics) - for k, v in metrics.items(): - logging.info('Current %s: %f', k, v) - super(CustomTensorBoard, self).on_epoch_begin(epoch, logs) - - def on_epoch_end(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - if logs is None: - logs = {} - metrics = self._calculate_metrics() - logs.update(metrics) - super(CustomTensorBoard, self).on_epoch_end(epoch, logs) - - def _calculate_metrics(self) -> MutableMapping[str, Any]: - logs = {} - # TODO(b/149030439): disable LR reporting. - # if self._track_lr: - # logs['learning_rate'] = self._calculate_lr() - return logs - - def _calculate_lr(self) -> int: - """Calculates the learning rate given the current step.""" - return get_scalar_from_tensor( - self._get_base_optimizer()._decayed_lr(var_dtype=tf.float32)) # pylint:disable=protected-access - - def _get_base_optimizer(self) -> tf.keras.optimizers.Optimizer: - """Get the base optimizer used by the current model.""" - - optimizer = self.model.optimizer - - # The optimizer might be wrapped by another class, so unwrap it - while hasattr(optimizer, '_optimizer'): - optimizer = optimizer._optimizer # pylint:disable=protected-access - - return optimizer - - -class MovingAverageCallback(tf.keras.callbacks.Callback): - """A Callback to be used with a `ExponentialMovingAverage` optimizer. - - Applies moving average weights to the model during validation time to test - and predict on the averaged weights rather than the current model weights. - Once training is complete, the model weights will be overwritten with the - averaged weights (by default). - - Attributes: - overwrite_weights_on_train_end: Whether to overwrite the current model - weights with the averaged weights from the moving average optimizer. - **kwargs: Any additional callback arguments. - """ - - def __init__(self, overwrite_weights_on_train_end: bool = False, **kwargs): - super(MovingAverageCallback, self).__init__(**kwargs) - self.overwrite_weights_on_train_end = overwrite_weights_on_train_end - - def set_model(self, model: tf.keras.Model): - super(MovingAverageCallback, self).set_model(model) - assert isinstance(self.model.optimizer, - optimization.ExponentialMovingAverage) - self.model.optimizer.shadow_copy(self.model) - - def on_test_begin(self, logs: Optional[MutableMapping[Text, Any]] = None): - self.model.optimizer.swap_weights() - - def on_test_end(self, logs: Optional[MutableMapping[Text, Any]] = None): - self.model.optimizer.swap_weights() - - def on_train_end(self, logs: Optional[MutableMapping[Text, Any]] = None): - if self.overwrite_weights_on_train_end: - self.model.optimizer.assign_average_vars(self.model.variables) - - -class AverageModelCheckpoint(tf.keras.callbacks.ModelCheckpoint): - """Saves and, optionally, assigns the averaged weights. 
-
-  Taken from tfa.callbacks.AverageModelCheckpoint.
-
-  Attributes:
-    update_weights: If True, assign the moving average weights to the model,
-      and save them. If False, keep the old non-averaged weights, but the
-      saved model uses the average weights. See
-      `tf.keras.callbacks.ModelCheckpoint` for the other args.
-  """
-
-  def __init__(self,
-               update_weights: bool,
-               filepath: str,
-               monitor: str = 'val_loss',
-               verbose: int = 0,
-               save_best_only: bool = False,
-               save_weights_only: bool = False,
-               mode: str = 'auto',
-               save_freq: str = 'epoch',
-               **kwargs):
-    self.update_weights = update_weights
-    super().__init__(filepath, monitor, verbose, save_best_only,
-                     save_weights_only, mode, save_freq, **kwargs)
-
-  def set_model(self, model):
-    if not isinstance(model.optimizer, optimization.ExponentialMovingAverage):
-      raise TypeError('AverageModelCheckpoint is only used when training '
-                      'with MovingAverage')
-    return super().set_model(model)
-
-  def _save_model(self, epoch, logs):
-    assert isinstance(self.model.optimizer,
-                      optimization.ExponentialMovingAverage)
-
-    if self.update_weights:
-      self.model.optimizer.assign_average_vars(self.model.variables)
-      return super()._save_model(epoch, logs)  # pytype: disable=attribute-error  # typed-keras
-    else:
-      # Note: `model.get_weights()` gives us the weights (non-ref)
-      # whereas `model.variables` returns references to the variables.
-      non_avg_weights = self.model.get_weights()
-      self.model.optimizer.assign_average_vars(self.model.variables)
-      # result is currently None, since `super._save_model` doesn't
-      # return anything, but this may change in the future.
-      result = super()._save_model(epoch, logs)  # pytype: disable=attribute-error  # typed-keras
-      self.model.set_weights(non_avg_weights)
-      return result
diff --git a/official/vision/image_classification/classifier_trainer.py b/official/vision/image_classification/classifier_trainer.py
deleted file mode 100644
index ab6fbaea960e7d894d69e213e95c313d7fe9893c..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/classifier_trainer.py
+++ /dev/null
@@ -1,456 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -# Lint as: python3 -"""Runs an Image Classification model.""" - -import os -import pprint -from typing import Any, Tuple, Text, Optional, Mapping - -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import hyperparams -from official.modeling import performance -from official.utils import hyperparams_flags -from official.utils.misc import keras_utils -from official.vision.image_classification import callbacks as custom_callbacks -from official.vision.image_classification import dataset_factory -from official.vision.image_classification import optimizer_factory -from official.vision.image_classification.configs import base_configs -from official.vision.image_classification.configs import configs -from official.vision.image_classification.efficientnet import efficientnet_model -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import resnet_model - - -def get_models() -> Mapping[str, tf.keras.Model]: - """Returns the mapping from model type name to Keras model.""" - return { - 'efficientnet': efficientnet_model.EfficientNet.from_name, - 'resnet': resnet_model.resnet50, - } - - -def get_dtype_map() -> Mapping[str, tf.dtypes.DType]: - """Returns the mapping from dtype string representations to TF dtypes.""" - return { - 'float32': tf.float32, - 'bfloat16': tf.bfloat16, - 'float16': tf.float16, - 'fp32': tf.float32, - 'bf16': tf.bfloat16, - } - - -def _get_metrics(one_hot: bool) -> Mapping[Text, Any]: - """Get a dict of available metrics to track.""" - if one_hot: - return { - # (name, metric_fn) - 'acc': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'accuracy': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'top_1': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'top_5': - tf.keras.metrics.TopKCategoricalAccuracy( - k=5, name='top_5_accuracy'), - } - else: - return { - # (name, metric_fn) - 'acc': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'accuracy': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'top_1': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'top_5': - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=5, name='top_5_accuracy'), - } - - -def get_image_size_from_model( - params: base_configs.ExperimentConfig) -> Optional[int]: - """If the given model has a preferred image size, return it.""" - if params.model_name == 'efficientnet': - efficientnet_name = params.model.model_params.model_name - if efficientnet_name in efficientnet_model.MODEL_CONFIGS: - return efficientnet_model.MODEL_CONFIGS[efficientnet_name].resolution - return None - - -def _get_dataset_builders(params: base_configs.ExperimentConfig, - strategy: tf.distribute.Strategy, - one_hot: bool) -> Tuple[Any, Any]: - """Create and return train and validation dataset builders.""" - if one_hot: - logging.warning('label_smoothing > 0, so datasets will be one hot encoded.') - else: - logging.warning('label_smoothing not applied, so datasets will not be one ' - 'hot encoded.') - - num_devices = strategy.num_replicas_in_sync if strategy else 1 - - image_size = get_image_size_from_model(params) - - dataset_configs = [params.train_dataset, params.validation_dataset] - builders = [] - - for config in dataset_configs: - if config is not None and config.has_data: - builder = dataset_factory.DatasetBuilder( - config, - image_size=image_size or config.image_size, - 
num_devices=num_devices, - one_hot=one_hot) - else: - builder = None - builders.append(builder) - - return builders - - -def get_loss_scale(params: base_configs.ExperimentConfig, - fp16_default: float = 128.) -> float: - """Returns the loss scale for initializations.""" - loss_scale = params.runtime.loss_scale - if loss_scale == 'dynamic': - return loss_scale - elif loss_scale is not None: - return float(loss_scale) - elif (params.train_dataset.dtype == 'float32' or - params.train_dataset.dtype == 'bfloat16'): - return 1. - else: - assert params.train_dataset.dtype == 'float16' - return fp16_default - - -def _get_params_from_flags(flags_obj: flags.FlagValues): - """Get ParamsDict from flags.""" - model = flags_obj.model_type.lower() - dataset = flags_obj.dataset.lower() - params = configs.get_config(model=model, dataset=dataset) - - flags_overrides = { - 'model_dir': flags_obj.model_dir, - 'mode': flags_obj.mode, - 'model': { - 'name': model, - }, - 'runtime': { - 'run_eagerly': flags_obj.run_eagerly, - 'tpu': flags_obj.tpu, - }, - 'train_dataset': { - 'data_dir': flags_obj.data_dir, - }, - 'validation_dataset': { - 'data_dir': flags_obj.data_dir, - }, - 'train': { - 'time_history': { - 'log_steps': flags_obj.log_steps, - }, - }, - } - - overriding_configs = (flags_obj.config_file, flags_obj.params_override, - flags_overrides) - - pp = pprint.PrettyPrinter() - - logging.info('Base params: %s', pp.pformat(params.as_dict())) - - for param in overriding_configs: - logging.info('Overriding params: %s', param) - params = hyperparams.override_params_dict(params, param, is_strict=True) - - params.validate() - params.lock() - - logging.info('Final model parameters: %s', pp.pformat(params.as_dict())) - return params - - -def resume_from_checkpoint(model: tf.keras.Model, model_dir: str, - train_steps: int) -> int: - """Resumes from the latest checkpoint, if possible. - - Loads the model weights and optimizer settings from a checkpoint. - This function should be used in case of preemption recovery. - - Args: - model: The model whose weights should be restored. - model_dir: The directory where model weights were saved. - train_steps: The number of steps to train. - - Returns: - The epoch of the latest checkpoint, or 0 if not restoring. 
- - """ - logging.info('Load from checkpoint is enabled.') - latest_checkpoint = tf.train.latest_checkpoint(model_dir) - logging.info('latest_checkpoint: %s', latest_checkpoint) - if not latest_checkpoint: - logging.info('No checkpoint detected.') - return 0 - - logging.info('Checkpoint file %s found and restoring from ' - 'checkpoint', latest_checkpoint) - model.load_weights(latest_checkpoint) - initial_epoch = model.optimizer.iterations // train_steps - logging.info('Completed loading from checkpoint.') - logging.info('Resuming from epoch %d', initial_epoch) - return int(initial_epoch) - - -def initialize(params: base_configs.ExperimentConfig, - dataset_builder: dataset_factory.DatasetBuilder): - """Initializes backend related initializations.""" - keras_utils.set_session_config(enable_xla=params.runtime.enable_xla) - performance.set_mixed_precision_policy(dataset_builder.dtype) - if tf.config.list_physical_devices('GPU'): - data_format = 'channels_first' - else: - data_format = 'channels_last' - tf.keras.backend.set_image_data_format(data_format) - if params.runtime.run_eagerly: - # Enable eager execution to allow step-by-step debugging - tf.config.experimental_run_functions_eagerly(True) - if tf.config.list_physical_devices('GPU'): - if params.runtime.gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=params.runtime.per_gpu_thread_count, - gpu_thread_mode=params.runtime.gpu_thread_mode, - num_gpus=params.runtime.num_gpus, - datasets_num_private_threads=params.runtime - .dataset_num_private_threads) # pylint:disable=line-too-long - if params.runtime.batchnorm_spatial_persistent: - os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1' - - -def define_classifier_flags(): - """Defines common flags for image classification.""" - hyperparams_flags.initialize_common_flags() - flags.DEFINE_string( - 'data_dir', default=None, help='The location of the input data.') - flags.DEFINE_string( - 'mode', - default=None, - help='Mode to run: `train`, `eval`, `train_and_eval` or `export`.') - flags.DEFINE_bool( - 'run_eagerly', - default=None, - help='Use eager execution and disable autograph for debugging.') - flags.DEFINE_string( - 'model_type', - default=None, - help='The type of the model, e.g. EfficientNet, etc.') - flags.DEFINE_string( - 'dataset', - default=None, - help='The name of the dataset, e.g. 
ImageNet, etc.') - flags.DEFINE_integer( - 'log_steps', - default=100, - help='The interval of steps between logging of batch level stats.') - - -def serialize_config(params: base_configs.ExperimentConfig, model_dir: str): - """Serializes and saves the experiment config.""" - params_save_path = os.path.join(model_dir, 'params.yaml') - logging.info('Saving experiment configuration to %s', params_save_path) - tf.io.gfile.makedirs(model_dir) - hyperparams.save_params_dict_to_yaml(params, params_save_path) - - -def train_and_eval( - params: base_configs.ExperimentConfig, - strategy_override: tf.distribute.Strategy) -> Mapping[str, Any]: - """Runs the train and eval path using compile/fit.""" - logging.info('Running train and eval.') - - distribute_utils.configure_cluster(params.runtime.worker_hosts, - params.runtime.task_index) - - # Note: for TPUs, strategy and scope should be created before the dataset - strategy = strategy_override or distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - - strategy_scope = distribute_utils.get_strategy_scope(strategy) - - logging.info('Detected %d devices.', - strategy.num_replicas_in_sync if strategy else 1) - - label_smoothing = params.model.loss.label_smoothing - one_hot = label_smoothing and label_smoothing > 0 - - builders = _get_dataset_builders(params, strategy, one_hot) - datasets = [ - builder.build(strategy) if builder else None for builder in builders - ] - - # Unpack datasets and builders based on train/val/test splits - train_builder, validation_builder = builders # pylint: disable=unbalanced-tuple-unpacking - train_dataset, validation_dataset = datasets - - train_epochs = params.train.epochs - train_steps = params.train.steps or train_builder.num_steps - validation_steps = params.evaluation.steps or validation_builder.num_steps - - initialize(params, train_builder) - - logging.info('Global batch size: %d', train_builder.global_batch_size) - - with strategy_scope: - model_params = params.model.model_params.as_dict() - model = get_models()[params.model.name](**model_params) - learning_rate = optimizer_factory.build_learning_rate( - params=params.model.learning_rate, - batch_size=train_builder.global_batch_size, - train_epochs=train_epochs, - train_steps=train_steps) - optimizer = optimizer_factory.build_optimizer( - optimizer_name=params.model.optimizer.name, - base_learning_rate=learning_rate, - params=params.model.optimizer.as_dict(), - model=model) - optimizer = performance.configure_optimizer( - optimizer, - use_float16=train_builder.dtype == 'float16', - loss_scale=get_loss_scale(params)) - - metrics_map = _get_metrics(one_hot) - metrics = [metrics_map[metric] for metric in params.train.metrics] - steps_per_loop = train_steps if params.train.set_epoch_loop else 1 - - if one_hot: - loss_obj = tf.keras.losses.CategoricalCrossentropy( - label_smoothing=params.model.loss.label_smoothing) - else: - loss_obj = tf.keras.losses.SparseCategoricalCrossentropy() - model.compile( - optimizer=optimizer, - loss=loss_obj, - metrics=metrics, - steps_per_execution=steps_per_loop) - - initial_epoch = 0 - if params.train.resume_checkpoint: - initial_epoch = resume_from_checkpoint( - model=model, model_dir=params.model_dir, train_steps=train_steps) - - callbacks = custom_callbacks.get_callbacks( - model_checkpoint=params.train.callbacks.enable_checkpoint_and_export, - 
include_tensorboard=params.train.callbacks.enable_tensorboard, - time_history=params.train.callbacks.enable_time_history, - track_lr=params.train.tensorboard.track_lr, - write_model_weights=params.train.tensorboard.write_model_weights, - initial_step=initial_epoch * train_steps, - batch_size=train_builder.global_batch_size, - log_steps=params.train.time_history.log_steps, - model_dir=params.model_dir, - backup_and_restore=params.train.callbacks.enable_backup_and_restore) - - serialize_config(params=params, model_dir=params.model_dir) - - if params.evaluation.skip_eval: - validation_kwargs = {} - else: - validation_kwargs = { - 'validation_data': validation_dataset, - 'validation_steps': validation_steps, - 'validation_freq': params.evaluation.epochs_between_evals, - } - - history = model.fit( - train_dataset, - epochs=train_epochs, - steps_per_epoch=train_steps, - initial_epoch=initial_epoch, - callbacks=callbacks, - verbose=2, - **validation_kwargs) - - validation_output = None - if not params.evaluation.skip_eval: - validation_output = model.evaluate( - validation_dataset, steps=validation_steps, verbose=2) - - # TODO(dankondratyuk): eval and save final test accuracy - stats = common.build_stats(history, validation_output, callbacks) - return stats - - -def export(params: base_configs.ExperimentConfig): - """Runs the model export functionality.""" - logging.info('Exporting model.') - model_params = params.model.model_params.as_dict() - model = get_models()[params.model.name](**model_params) - checkpoint = params.export.checkpoint - if checkpoint is None: - logging.info('No export checkpoint was provided. Using the latest ' - 'checkpoint from model_dir.') - checkpoint = tf.train.latest_checkpoint(params.model_dir) - - model.load_weights(checkpoint) - model.save(params.export.destination) - - -def run(flags_obj: flags.FlagValues, - strategy_override: tf.distribute.Strategy = None) -> Mapping[str, Any]: - """Runs Image Classification model using native Keras APIs. - - Args: - flags_obj: An object containing parsed flag values. - strategy_override: A `tf.distribute.Strategy` object to use for model. - - Returns: - Dictionary of training/eval stats - """ - params = _get_params_from_flags(flags_obj) - if params.mode == 'train_and_eval': - return train_and_eval(params, strategy_override) - elif params.mode == 'export_only': - export(params) - else: - raise ValueError('{} is not a valid mode.'.format(params.mode)) - - -def main(_): - stats = run(flags.FLAGS) - if stats: - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - define_classifier_flags() - flags.mark_flag_as_required('data_dir') - flags.mark_flag_as_required('mode') - flags.mark_flag_as_required('model_type') - flags.mark_flag_as_required('dataset') - - app.run(main) diff --git a/official/vision/image_classification/classifier_trainer_test.py b/official/vision/image_classification/classifier_trainer_test.py deleted file mode 100644 index 06227c154427db3057269f9e9250a179a52264c9..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/classifier_trainer_test.py +++ /dev/null @@ -1,240 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Unit tests for the classifier trainer models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools -import json - -import os -import sys - -from typing import Any, Callable, Iterable, Mapping, MutableMapping, Optional, Tuple - -from absl import flags -from absl.testing import flagsaver -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.utils.flags import core as flags_core -from official.vision.image_classification import classifier_trainer - - -classifier_trainer.define_classifier_flags() - - -def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: - """Returns the combinations of end-to-end tests to run.""" - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - strategy_combinations.mirrored_strategy_with_two_gpus, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset=[ - 'imagenet', - ], - ) - - -def get_params_override(params_override: Mapping[str, Any]) -> str: - """Converts params_override dict to string command.""" - return '--params_override=' + json.dumps(params_override) - - -def basic_params_override(dtype: str = 'float32') -> MutableMapping[str, Any]: - """Returns a basic parameter configuration for testing.""" - return { - 'train_dataset': { - 'builder': 'synthetic', - 'use_per_replica_batch_size': True, - 'batch_size': 1, - 'image_size': 224, - 'dtype': dtype, - }, - 'validation_dataset': { - 'builder': 'synthetic', - 'batch_size': 1, - 'use_per_replica_batch_size': True, - 'image_size': 224, - 'dtype': dtype, - }, - 'train': { - 'steps': 1, - 'epochs': 1, - 'callbacks': { - 'enable_checkpoint_and_export': True, - 'enable_tensorboard': False, - }, - }, - 'evaluation': { - 'steps': 1, - }, - } - - -@flagsaver.flagsaver -def run_end_to_end(main: Callable[[Any], None], - extra_flags: Optional[Iterable[str]] = None, - model_dir: Optional[str] = None): - """Runs the classifier trainer end-to-end.""" - extra_flags = [] if extra_flags is None else extra_flags - args = [sys.argv[0], '--model_dir', model_dir] + extra_flags - flags_core.parse_flags(argv=args) - main(flags.FLAGS) - - -class ClassifierTest(tf.test.TestCase, parameterized.TestCase): - """Unit tests for Keras models.""" - _tempdir = None - - @classmethod - def setUpClass(cls): # pylint: disable=invalid-name - super(ClassifierTest, cls).setUpClass() - - def tearDown(self): - super(ClassifierTest, self).tearDown() - tf.io.gfile.rmtree(self.get_temp_dir()) - - @combinations.generate(distribution_strategy_combinations()) - def test_end_to_end_train_and_eval(self, distribution, model, dataset): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." 
instead - model_dir = self.create_tempdir().full_path - base_flags = [ - '--data_dir=not_used', - '--model_type=' + model, - '--dataset=' + dataset, - ] - train_and_eval_flags = base_flags + [ - get_params_override(basic_params_override()), - '--mode=train_and_eval', - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - run_end_to_end( - main=run, extra_flags=train_and_eval_flags, model_dir=model_dir) - - @combinations.generate( - combinations.combine( - distribution=[ - strategy_combinations.one_device_strategy_gpu, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset='imagenet', - dtype='float16', - )) - def test_gpu_train(self, distribution, model, dataset, dtype): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." instead - model_dir = self.create_tempdir().full_path - base_flags = [ - '--data_dir=not_used', - '--model_type=' + model, - '--dataset=' + dataset, - ] - train_and_eval_flags = base_flags + [ - get_params_override(basic_params_override(dtype)), - '--mode=train_and_eval', - ] - - export_params = basic_params_override() - export_path = os.path.join(model_dir, 'export') - export_params['export'] = {} - export_params['export']['destination'] = export_path - export_flags = base_flags + [ - '--mode=export_only', - get_params_override(export_params) - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - run_end_to_end( - main=run, extra_flags=train_and_eval_flags, model_dir=model_dir) - run_end_to_end(main=run, extra_flags=export_flags, model_dir=model_dir) - self.assertTrue(os.path.exists(export_path)) - - @combinations.generate( - combinations.combine( - distribution=[ - strategy_combinations.cloud_tpu_strategy, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset='imagenet', - dtype='bfloat16', - )) - def test_tpu_train(self, distribution, model, dataset, dtype): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." 
instead - model_dir = self.create_tempdir().full_path - base_flags = [ - '--data_dir=not_used', - '--model_type=' + model, - '--dataset=' + dataset, - ] - train_and_eval_flags = base_flags + [ - get_params_override(basic_params_override(dtype)), - '--mode=train_and_eval', - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - run_end_to_end( - main=run, extra_flags=train_and_eval_flags, model_dir=model_dir) - - @combinations.generate(distribution_strategy_combinations()) - def test_end_to_end_invalid_mode(self, distribution, model, dataset): - """Test the Keras EfficientNet model with `strategy`.""" - model_dir = self.create_tempdir().full_path - extra_flags = [ - '--data_dir=not_used', - '--mode=invalid_mode', - '--model_type=' + model, - '--dataset=' + dataset, - get_params_override(basic_params_override()), - ] - - run = functools.partial( - classifier_trainer.run, strategy_override=distribution) - with self.assertRaises(ValueError): - run_end_to_end(main=run, extra_flags=extra_flags, model_dir=model_dir) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/classifier_trainer_util_test.py b/official/vision/image_classification/classifier_trainer_util_test.py deleted file mode 100644 index d3624c286fdc716e4a09df56fbb8157fa35602aa..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/classifier_trainer_util_test.py +++ /dev/null @@ -1,166 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
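
These end-to-end tests all funnel configuration through the single `--params_override` JSON flag rather than per-field flags. Here is a minimal, self-contained sketch of that convention; the override dict contents are hypothetical, chosen only to mirror `basic_params_override` above:

```python
import json


def get_params_override(params_override):
  """Serializes an override dict into a --params_override flag value."""
  return '--params_override=' + json.dumps(params_override)


# Hypothetical overrides for a smoke test on synthetic data.
flag = get_params_override({
    'train_dataset': {'builder': 'synthetic', 'batch_size': 1},
    'train': {'steps': 1, 'epochs': 1},
})
print(flag)
# --params_override={"train_dataset": {"builder": "synthetic", ...}, ...}
```
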
- -# Lint as: python3 -"""Unit tests for the classifier trainer models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import copy -import os - -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.image_classification import classifier_trainer -from official.vision.image_classification import dataset_factory -from official.vision.image_classification import test_utils -from official.vision.image_classification.configs import base_configs - - -def get_trivial_model(num_classes: int) -> tf.keras.Model: - """Creates and compiles trivial model for ImageNet dataset.""" - model = test_utils.trivial_model(num_classes=num_classes) - lr = 0.01 - optimizer = tf.keras.optimizers.SGD(learning_rate=lr) - loss_obj = tf.keras.losses.SparseCategoricalCrossentropy() - model.compile(optimizer=optimizer, loss=loss_obj, run_eagerly=True) - return model - - -def get_trivial_data() -> tf.data.Dataset: - """Gets trivial data in the ImageNet size.""" - - def generate_data(_) -> tf.data.Dataset: - image = tf.zeros(shape=(224, 224, 3), dtype=tf.float32) - label = tf.zeros([1], dtype=tf.int32) - return image, label - - dataset = tf.data.Dataset.range(1) - dataset = dataset.repeat() - dataset = dataset.map( - generate_data, num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.prefetch(buffer_size=1).batch(1) - return dataset - - -class UtilTests(parameterized.TestCase, tf.test.TestCase): - """Tests for individual utility functions within classifier_trainer.py.""" - - @parameterized.named_parameters( - ('efficientnet-b0', 'efficientnet', 'efficientnet-b0', 224), - ('efficientnet-b1', 'efficientnet', 'efficientnet-b1', 240), - ('efficientnet-b2', 'efficientnet', 'efficientnet-b2', 260), - ('efficientnet-b3', 'efficientnet', 'efficientnet-b3', 300), - ('efficientnet-b4', 'efficientnet', 'efficientnet-b4', 380), - ('efficientnet-b5', 'efficientnet', 'efficientnet-b5', 456), - ('efficientnet-b6', 'efficientnet', 'efficientnet-b6', 528), - ('efficientnet-b7', 'efficientnet', 'efficientnet-b7', 600), - ('resnet', 'resnet', '', None), - ) - def test_get_model_size(self, model, model_name, expected): - config = base_configs.ExperimentConfig( - model_name=model, - model=base_configs.ModelConfig( - model_params={ - 'model_name': model_name, - },)) - size = classifier_trainer.get_image_size_from_model(config) - self.assertEqual(size, expected) - - @parameterized.named_parameters( - ('dynamic', 'dynamic', None, 'dynamic'), - ('scalar', 128., None, 128.), - ('float32', None, 'float32', 1), - ('float16', None, 'float16', 128), - ) - def test_get_loss_scale(self, loss_scale, dtype, expected): - config = base_configs.ExperimentConfig( - runtime=base_configs.RuntimeConfig(loss_scale=loss_scale), - train_dataset=dataset_factory.DatasetConfig(dtype=dtype)) - ls = classifier_trainer.get_loss_scale(config, fp16_default=128) - self.assertEqual(ls, expected) - - @parameterized.named_parameters(('float16', 'float16'), - ('bfloat16', 'bfloat16')) - def test_initialize(self, dtype): - config = base_configs.ExperimentConfig( - runtime=base_configs.RuntimeConfig( - run_eagerly=False, - enable_xla=False, - per_gpu_thread_count=1, - gpu_thread_mode='gpu_private', - num_gpus=1, - dataset_num_private_threads=1, - ), - train_dataset=dataset_factory.DatasetConfig(dtype=dtype), - model=base_configs.ModelConfig(), - ) - - class EmptyClass: - pass - - fake_ds_builder = EmptyClass() - fake_ds_builder.dtype = dtype - 
fake_ds_builder.config = EmptyClass() - classifier_trainer.initialize(config, fake_ds_builder) - - def test_resume_from_checkpoint(self): - """Tests functionality for resuming from checkpoint.""" - # Set the keras policy - tf.keras.mixed_precision.set_global_policy('mixed_bfloat16') - - # Get the model, datasets, and compile it. - model = get_trivial_model(10) - - # Create the checkpoint - model_dir = self.create_tempdir().full_path - train_epochs = 1 - train_steps = 10 - ds = get_trivial_data() - callbacks = [ - tf.keras.callbacks.ModelCheckpoint( - os.path.join(model_dir, 'model.ckpt-{epoch:04d}'), - save_weights_only=True) - ] - model.fit( - ds, - callbacks=callbacks, - epochs=train_epochs, - steps_per_epoch=train_steps) - - # Test load from checkpoint - clean_model = get_trivial_model(10) - weights_before_load = copy.deepcopy(clean_model.get_weights()) - initial_epoch = classifier_trainer.resume_from_checkpoint( - model=clean_model, model_dir=model_dir, train_steps=train_steps) - self.assertEqual(initial_epoch, 1) - self.assertNotAllClose(weights_before_load, clean_model.get_weights()) - - tf.io.gfile.rmtree(model_dir) - - def test_serialize_config(self): - """Tests functionality for serializing data.""" - config = base_configs.ExperimentConfig() - model_dir = self.create_tempdir().full_path - classifier_trainer.serialize_config(params=config, model_dir=model_dir) - saved_params_path = os.path.join(model_dir, 'params.yaml') - self.assertTrue(os.path.exists(saved_params_path)) - tf.io.gfile.rmtree(model_dir) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/configs/__init__.py b/official/vision/image_classification/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/image_classification/configs/base_configs.py b/official/vision/image_classification/configs/base_configs.py deleted file mode 100644 index 760b3dce03fc017c912eb499e30ff1418b5ec090..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/base_configs.py +++ /dev/null @@ -1,257 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
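
The resume test above hinges on one piece of arithmetic in `resume_from_checkpoint`: the restored optimizer's iteration count, divided by the steps per epoch, recovers the epoch to resume from. A toy sketch with the same numbers the test uses (one epoch of ten steps):

```python
# Mirrors test_resume_from_checkpoint: 1 epoch x 10 steps were trained.
train_steps = 10            # steps per epoch
optimizer_iterations = 10   # total steps recorded in the restored optimizer
initial_epoch = optimizer_iterations // train_steps
assert initial_epoch == 1   # fit() resumes at epoch 1
```
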
-
-# Lint as: python3
-"""Definitions for high level configuration groups."""
-
-import dataclasses
-from typing import Any, List, Mapping, Optional
-from official.core import config_definitions
-from official.modeling import hyperparams
-
-RuntimeConfig = config_definitions.RuntimeConfig
-
-
-@dataclasses.dataclass
-class TensorBoardConfig(hyperparams.Config):
-  """Configuration for TensorBoard.
-
-  Attributes:
-    track_lr: Whether or not to track the learning rate in TensorBoard.
-      Defaults to True.
-    write_model_weights: Whether or not to write the model weights as images in
-      TensorBoard. Defaults to False.
-  """
-  track_lr: bool = True
-  write_model_weights: bool = False
-
-
-@dataclasses.dataclass
-class CallbacksConfig(hyperparams.Config):
-  """Configuration for Callbacks.
-
-  Attributes:
-    enable_checkpoint_and_export: Whether or not to enable checkpoints as a
-      Callback. Defaults to True.
-    enable_backup_and_restore: Whether or not to add the BackupAndRestore
-      callback. Defaults to False.
-    enable_tensorboard: Whether or not to enable TensorBoard as a Callback.
-      Defaults to True.
-    enable_time_history: Whether or not to enable TimeHistory Callbacks.
-      Defaults to True.
-  """
-  enable_checkpoint_and_export: bool = True
-  enable_backup_and_restore: bool = False
-  enable_tensorboard: bool = True
-  enable_time_history: bool = True
-
-
-@dataclasses.dataclass
-class ExportConfig(hyperparams.Config):
-  """Configuration for exports.
-
-  Attributes:
-    checkpoint: the path to the checkpoint to export.
-    destination: the path to where the checkpoint should be exported.
-  """
-  checkpoint: str = None
-  destination: str = None
-
-
-@dataclasses.dataclass
-class MetricsConfig(hyperparams.Config):
-  """Configuration for Metrics.
-
-  Attributes:
-    accuracy: Whether or not to track accuracy as a Callback. Defaults to None.
-    top_5: Whether or not to track top_5_accuracy as a Callback. Defaults to
-      None.
-  """
-  accuracy: bool = None
-  top_5: bool = None
-
-
-@dataclasses.dataclass
-class TimeHistoryConfig(hyperparams.Config):
-  """Configuration for the TimeHistory callback.
-
-  Attributes:
-    log_steps: Interval of steps between logging of batch level stats.
-  """
-  log_steps: int = None
-
-
-@dataclasses.dataclass
-class TrainConfig(hyperparams.Config):
-  """Configuration for training.
-
-  Attributes:
-    resume_checkpoint: Whether or not to enable checkpoint loading. Defaults
-      to None.
-    epochs: The number of training epochs to run. Defaults to None.
-    steps: The number of steps to run per epoch. If None, then this will be
-      inferred based on the number of images and batch size. Defaults to None.
-    callbacks: An instance of CallbacksConfig.
-    metrics: An instance of MetricsConfig.
-    tensorboard: An instance of TensorBoardConfig.
-    time_history: An instance of TimeHistoryConfig.
-    set_epoch_loop: Whether or not to set `steps_per_execution` to
-      equal the number of training steps in `model.compile`. This reduces the
-      number of callbacks run per epoch which significantly improves end-to-end
-      TPU training time.
-  """
-  resume_checkpoint: bool = None
-  epochs: int = None
-  steps: int = None
-  callbacks: CallbacksConfig = CallbacksConfig()
-  metrics: MetricsConfig = None
-  tensorboard: TensorBoardConfig = TensorBoardConfig()
-  time_history: TimeHistoryConfig = TimeHistoryConfig()
-  set_epoch_loop: bool = False
-
-
-@dataclasses.dataclass
-class EvalConfig(hyperparams.Config):
-  """Configuration for evaluation.
-
-  Attributes:
-    epochs_between_evals: The number of train epochs to run between evaluations.
-      Defaults to None.
-    steps: The number of eval steps to run during evaluation. If None, this
-      will be inferred based on the number of images and batch size. Defaults
-      to None.
-    skip_eval: Whether or not to skip evaluation.
-  """
-  epochs_between_evals: int = None
-  steps: int = None
-  skip_eval: bool = False
-
-
-@dataclasses.dataclass
-class LossConfig(hyperparams.Config):
-  """Configuration for Loss.
-
-  Attributes:
-    name: The name of the loss. Defaults to None.
-    label_smoothing: The amount of label smoothing to apply to the loss. This
-      only applies to 'categorical_cross_entropy'.
-  """
-  name: str = None
-  label_smoothing: float = None
-
-
-@dataclasses.dataclass
-class OptimizerConfig(hyperparams.Config):
-  """Configuration for Optimizers.
-
-  Attributes:
-    name: The name of the optimizer. Defaults to None.
-    decay: Decay or rho, discounting factor for gradient. Defaults to None.
-    epsilon: Small value used to avoid 0 denominator. Defaults to None.
-    momentum: Plain momentum constant. Defaults to None.
-    nesterov: Whether or not to apply Nesterov momentum. Defaults to None.
-    moving_average_decay: The amount of decay to apply. If 0 or None, then
-      exponential moving average is not used. Defaults to None.
-    lookahead: Whether or not to apply the lookahead optimizer. Defaults to
-      None.
-    beta_1: The exponential decay rate for the 1st moment estimates. Used in
-      the Adam optimizers. Defaults to None.
-    beta_2: The exponential decay rate for the 2nd moment estimates. Used in
-      the Adam optimizers. Defaults to None.
-  """
-  name: str = None
-  decay: float = None
-  epsilon: float = None
-  momentum: float = None
-  nesterov: bool = None
-  moving_average_decay: Optional[float] = None
-  lookahead: Optional[bool] = None
-  beta_1: float = None
-  beta_2: float = None
-
-
-@dataclasses.dataclass
-class LearningRateConfig(hyperparams.Config):
-  """Configuration for learning rates.
-
-  Attributes:
-    name: The name of the learning rate. Defaults to None.
-    initial_lr: The initial learning rate. Defaults to None.
-    decay_epochs: The number of decay epochs. Defaults to None.
-    decay_rate: The rate of decay. Defaults to None.
-    warmup_epochs: The number of warmup epochs. Defaults to None.
-    examples_per_epoch: The number of examples in a single epoch. Defaults to
-      None.
-    boundaries: boundaries used in piecewise constant decay with warmup.
-    multipliers: multipliers used in piecewise constant decay with warmup.
-    scale_by_batch_size: Scale the learning rate by a fraction of the batch
-      size. Set to 0 for no scaling (default).
-    staircase: Apply exponential decay at discrete values instead of continuous.
-  """
-  name: str = None
-  initial_lr: float = None
-  decay_epochs: float = None
-  decay_rate: float = None
-  warmup_epochs: int = None
-  examples_per_epoch: int = None
-  boundaries: List[int] = None
-  multipliers: List[float] = None
-  scale_by_batch_size: float = 0.
-  staircase: bool = None
-
-
-@dataclasses.dataclass
-class ModelConfig(hyperparams.Config):
-  """Configuration for Models.
-
-  Attributes:
-    name: The name of the model. Defaults to None.
-    model_params: The parameters used to create the model. Defaults to None.
-    num_classes: The number of classes in the model. Defaults to None.
-    loss: A `LossConfig` instance. Defaults to None.
-    optimizer: An `OptimizerConfig` instance. Defaults to None.
- """ - name: str = None - model_params: hyperparams.Config = None - num_classes: int = None - loss: LossConfig = None - optimizer: OptimizerConfig = None - - -@dataclasses.dataclass -class ExperimentConfig(hyperparams.Config): - """Base configuration for an image classification experiment. - - Attributes: - model_dir: The directory to use when running an experiment. - mode: e.g. 'train_and_eval', 'export' - runtime: A `RuntimeConfig` instance. - train: A `TrainConfig` instance. - evaluation: An `EvalConfig` instance. - model: A `ModelConfig` instance. - export: An `ExportConfig` instance. - """ - model_dir: str = None - model_name: str = None - mode: str = None - runtime: RuntimeConfig = None - train_dataset: Any = None - validation_dataset: Any = None - train: TrainConfig = None - evaluation: EvalConfig = None - model: ModelConfig = None - export: ExportConfig = None diff --git a/official/vision/image_classification/configs/configs.py b/official/vision/image_classification/configs/configs.py deleted file mode 100644 index 127af58c476f7ae849ca43e5765379b77897aea8..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/configs.py +++ /dev/null @@ -1,113 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configuration utils for image classification experiments.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import dataclasses - -from official.vision.image_classification import dataset_factory -from official.vision.image_classification.configs import base_configs -from official.vision.image_classification.efficientnet import efficientnet_config -from official.vision.image_classification.resnet import resnet_config - - -@dataclasses.dataclass -class EfficientNetImageNetConfig(base_configs.ExperimentConfig): - """Base configuration to train efficientnet-b0 on ImageNet. - - Attributes: - export: An `ExportConfig` instance - runtime: A `RuntimeConfig` instance. - dataset: A `DatasetConfig` instance. - train: A `TrainConfig` instance. - evaluation: An `EvalConfig` instance. - model: A `ModelConfig` instance. 
- """ - export: base_configs.ExportConfig = base_configs.ExportConfig() - runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig() - train_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='train') - validation_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='validation') - train: base_configs.TrainConfig = base_configs.TrainConfig( - resume_checkpoint=True, - epochs=500, - steps=None, - callbacks=base_configs.CallbacksConfig( - enable_checkpoint_and_export=True, enable_tensorboard=True), - metrics=['accuracy', 'top_5'], - time_history=base_configs.TimeHistoryConfig(log_steps=100), - tensorboard=base_configs.TensorBoardConfig( - track_lr=True, write_model_weights=False), - set_epoch_loop=False) - evaluation: base_configs.EvalConfig = base_configs.EvalConfig( - epochs_between_evals=1, steps=None) - model: base_configs.ModelConfig = \ - efficientnet_config.EfficientNetModelConfig() - - -@dataclasses.dataclass -class ResNetImagenetConfig(base_configs.ExperimentConfig): - """Base configuration to train resnet-50 on ImageNet.""" - export: base_configs.ExportConfig = base_configs.ExportConfig() - runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig() - train_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='train', - one_hot=False, - mean_subtract=True, - standardize=True) - validation_dataset: dataset_factory.DatasetConfig = \ - dataset_factory.ImageNetConfig(split='validation', - one_hot=False, - mean_subtract=True, - standardize=True) - train: base_configs.TrainConfig = base_configs.TrainConfig( - resume_checkpoint=True, - epochs=90, - steps=None, - callbacks=base_configs.CallbacksConfig( - enable_checkpoint_and_export=True, enable_tensorboard=True), - metrics=['accuracy', 'top_5'], - time_history=base_configs.TimeHistoryConfig(log_steps=100), - tensorboard=base_configs.TensorBoardConfig( - track_lr=True, write_model_weights=False), - set_epoch_loop=False) - evaluation: base_configs.EvalConfig = base_configs.EvalConfig( - epochs_between_evals=1, steps=None) - model: base_configs.ModelConfig = resnet_config.ResNetModelConfig() - - -def get_config(model: str, dataset: str) -> base_configs.ExperimentConfig: - """Given model and dataset names, return the ExperimentConfig.""" - dataset_model_config_map = { - 'imagenet': { - 'efficientnet': EfficientNetImageNetConfig(), - 'resnet': ResNetImagenetConfig(), - } - } - try: - return dataset_model_config_map[dataset][model] - except KeyError: - if dataset not in dataset_model_config_map: - raise KeyError('Invalid dataset received. Received: {}. Supported ' - 'datasets include: {}'.format( - dataset, ', '.join(dataset_model_config_map.keys()))) - raise KeyError('Invalid model received. Received: {}. Supported models for' - '{} include: {}'.format( - model, dataset, - ', '.join(dataset_model_config_map[dataset].keys()))) diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml deleted file mode 100644 index 6f40ffb1e3020a231832a120d9938bf77e9cc74b..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml +++ /dev/null @@ -1,52 +0,0 @@ -# Training configuration for EfficientNet-b0 trained on ImageNet on GPUs. -# Takes ~32 minutes per epoch for 8 V100s. -# Reaches ~76.1% within 350 epochs. 
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'mirrored' - num_gpus: 1 -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' - augmenter: - name: 'autoaugment' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' -model: - model_params: - model_name: 'efficientnet-b0' - overrides: - num_classes: 1000 - batch_norm: 'default' - dtype: 'float32' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml deleted file mode 100644 index c5be7e9ba32fc7e8f3999df8e7446405dd2d4173..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml +++ /dev/null @@ -1,52 +0,0 @@ -# Training configuration for EfficientNet-b0 trained on ImageNet on TPUs. -# Takes ~2 minutes, 50 seconds per epoch for v3-32. -# Reaches ~76.1% within 350 epochs. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'tpu' -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' - augmenter: - name: 'autoaugment' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' -model: - model_params: - model_name: 'efficientnet-b0' - overrides: - num_classes: 1000 - batch_norm: 'tpu' - dtype: 'bfloat16' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 - set_epoch_loop: True -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml deleted file mode 100644 index 2f3dce01a46c64c4d92e97091628daeadaceb21d..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml +++ /dev/null @@ -1,47 +0,0 @@ -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. 
-runtime: - distribution_strategy: 'mirrored' - num_gpus: 1 -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 32 - use_per_replica_batch_size: True - dtype: 'float32' -model: - model_params: - model_name: 'efficientnet-b1' - overrides: - num_classes: 1000 - batch_norm: 'default' - dtype: 'float32' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml deleted file mode 100644 index 0bb6a9fe6f0b417f92686178d4bc79a44c5a4aa7..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml +++ /dev/null @@ -1,51 +0,0 @@ -# Training configuration for EfficientNet-b1 trained on ImageNet on TPUs. -# Takes ~3 minutes, 15 seconds per epoch for v3-32. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'tpu' -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'train' - num_classes: 1000 - num_examples: 1281167 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' - augmenter: - name: 'autoaugment' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'records' - split: 'validation' - num_classes: 1000 - num_examples: 50000 - batch_size: 128 - use_per_replica_batch_size: True - dtype: 'bfloat16' -model: - model_params: - model_name: 'efficientnet-b1' - overrides: - num_classes: 1000 - batch_norm: 'tpu' - dtype: 'bfloat16' - activation: 'swish' - optimizer: - name: 'rmsprop' - momentum: 0.9 - decay: 0.9 - moving_average_decay: 0.0 - lookahead: false - learning_rate: - name: 'exponential' - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 500 - set_epoch_loop: True -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml b/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml deleted file mode 100644 index 2037d6b5d1c39b9ff898eaf49ec7a68e3987356b..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml +++ /dev/null @@ -1,49 +0,0 @@ -# Training configuration for ResNet trained on ImageNet on GPUs. -# Reaches > 76.1% within 90 epochs. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. 
-runtime: - distribution_strategy: 'mirrored' - num_gpus: 1 - batchnorm_spatial_persistent: True -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'train' - image_size: 224 - num_classes: 1000 - num_examples: 1281167 - batch_size: 256 - use_per_replica_batch_size: True - dtype: 'float16' - mean_subtract: True - standardize: True -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'validation' - image_size: 224 - num_classes: 1000 - num_examples: 50000 - batch_size: 256 - use_per_replica_batch_size: True - dtype: 'float16' - mean_subtract: True - standardize: True -model: - name: 'resnet' - model_params: - rescale_inputs: False - optimizer: - name: 'momentum' - momentum: 0.9 - decay: 0.9 - epsilon: 0.001 - loss: - label_smoothing: 0.1 -train: - resume_checkpoint: True - epochs: 90 -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml b/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml deleted file mode 100644 index 0a3030333bb42ce59e67cfbe12a12be877ab19d0..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml +++ /dev/null @@ -1,55 +0,0 @@ -# Training configuration for ResNet trained on ImageNet on TPUs. -# Takes ~4 minutes, 30 seconds seconds per epoch for a v3-32. -# Reaches > 76.1% within 90 epochs. -# Note: This configuration uses a scaled per-replica batch size based on the number of devices. -runtime: - distribution_strategy: 'tpu' -train_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'train' - one_hot: False - image_size: 224 - num_classes: 1000 - num_examples: 1281167 - batch_size: 128 - use_per_replica_batch_size: True - mean_subtract: False - standardize: False - dtype: 'bfloat16' -validation_dataset: - name: 'imagenet2012' - data_dir: null - builder: 'tfds' - split: 'validation' - one_hot: False - image_size: 224 - num_classes: 1000 - num_examples: 50000 - batch_size: 128 - use_per_replica_batch_size: True - mean_subtract: False - standardize: False - dtype: 'bfloat16' -model: - name: 'resnet' - model_params: - rescale_inputs: True - optimizer: - name: 'momentum' - momentum: 0.9 - decay: 0.9 - epsilon: 0.001 - moving_average_decay: 0. - lookahead: False - loss: - label_smoothing: 0.1 -train: - callbacks: - enable_checkpoint_and_export: True - resume_checkpoint: True - epochs: 90 - set_epoch_loop: True -evaluation: - epochs_between_evals: 1 diff --git a/official/vision/image_classification/dataset_factory.py b/official/vision/image_classification/dataset_factory.py deleted file mode 100644 index a0458ecccf9a74eb57480f8d127c0eb736591ff5..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/dataset_factory.py +++ /dev/null @@ -1,537 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
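
The ImageNet YAMLs above pin `num_examples` and a per-replica `batch_size`, and the dataset builder below derives the number of steps per epoch as `num_examples // global_batch_size`. A quick sketch of that bookkeeping with the `tpu.yaml` numbers; the 32-replica slice is an assumption for illustration, not part of the config:

```python
num_examples = 1281167        # ImageNet train split, from the YAML
per_replica_batch_size = 128  # batch_size in tpu.yaml
num_devices = 32              # hypothetical v3-32 slice
global_batch_size = per_replica_batch_size * num_devices  # 4096
steps_per_epoch = num_examples // global_batch_size       # 312
print(global_batch_size, steps_per_epoch)
```
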
- -# Lint as: python3 -"""Dataset utilities for vision tasks using TFDS and tf.data.Dataset.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -from typing import Any, List, Optional, Tuple, Mapping, Union - -from absl import logging -from dataclasses import dataclass -import tensorflow as tf -import tensorflow_datasets as tfds - -from official.modeling.hyperparams import base_config -from official.vision.image_classification import augment -from official.vision.image_classification import preprocessing - -AUGMENTERS = { - 'autoaugment': augment.AutoAugment, - 'randaugment': augment.RandAugment, -} - - -@dataclass -class AugmentConfig(base_config.Config): - """Configuration for image augmenters. - - Attributes: - name: The name of the image augmentation to use. Possible options are None - (default), 'autoaugment', or 'randaugment'. - params: Any paramaters used to initialize the augmenter. - """ - name: Optional[str] = None - params: Optional[Mapping[str, Any]] = None - - def build(self) -> augment.ImageAugment: - """Build the augmenter using this config.""" - params = self.params or {} - augmenter = AUGMENTERS.get(self.name, None) - return augmenter(**params) if augmenter is not None else None - - -@dataclass -class DatasetConfig(base_config.Config): - """The base configuration for building datasets. - - Attributes: - name: The name of the Dataset. Usually should correspond to a TFDS dataset. - data_dir: The path where the dataset files are stored, if available. - filenames: Optional list of strings representing the TFRecord names. - builder: The builder type used to load the dataset. Value should be one of - 'tfds' (load using TFDS), 'records' (load from TFRecords), or 'synthetic' - (generate dummy synthetic data without reading from files). - split: The split of the dataset. Usually 'train', 'validation', or 'test'. - image_size: The size of the image in the dataset. This assumes that `width` - == `height`. Set to 'infer' to infer the image size from TFDS info. This - requires `name` to be a registered dataset in TFDS. - num_classes: The number of classes given by the dataset. Set to 'infer' to - infer the image size from TFDS info. This requires `name` to be a - registered dataset in TFDS. - num_channels: The number of channels given by the dataset. Set to 'infer' to - infer the image size from TFDS info. This requires `name` to be a - registered dataset in TFDS. - num_examples: The number of examples given by the dataset. Set to 'infer' to - infer the image size from TFDS info. This requires `name` to be a - registered dataset in TFDS. - batch_size: The base batch size for the dataset. - use_per_replica_batch_size: Whether to scale the batch size based on - available resources. If set to `True`, the dataset builder will return - batch_size multiplied by `num_devices`, the number of device replicas - (e.g., the number of GPUs or TPU cores). This setting should be `True` if - the strategy argument is passed to `build()` and `num_devices > 1`. - num_devices: The number of replica devices to use. This should be set by - `strategy.num_replicas_in_sync` when using a distribution strategy. - dtype: The desired dtype of the dataset. This will be set during - preprocessing. - one_hot: Whether to apply one hot encoding. Set to `True` to be able to use - label smoothing. - augmenter: The augmenter config to use. No augmentation is used by default. - download: Whether to download data using TFDS. 
-    shuffle_buffer_size: The buffer size used for shuffling training data.
-    file_shuffle_buffer_size: The buffer size used for shuffling raw training
-      files.
-    skip_decoding: Whether to skip image decoding when loading from TFDS.
-    cache: Whether to cache dataset examples. Can be used to avoid re-reading
-      from disk on the second epoch. Requires significant memory overhead.
-    tf_data_service: The URI of a tf.data service to offload preprocessing onto
-      during training. The URI should be in the format "protocol://address",
-      e.g. "grpc://tf-data-service:5050".
-    mean_subtract: Whether or not to apply mean subtraction to the dataset.
-    standardize: Whether or not to apply standardization to the dataset.
-  """
-  name: Optional[str] = None
-  data_dir: Optional[str] = None
-  filenames: Optional[List[str]] = None
-  builder: str = 'tfds'
-  split: str = 'train'
-  image_size: Union[int, str] = 'infer'
-  num_classes: Union[int, str] = 'infer'
-  num_channels: Union[int, str] = 'infer'
-  num_examples: Union[int, str] = 'infer'
-  batch_size: int = 128
-  use_per_replica_batch_size: bool = True
-  num_devices: int = 1
-  dtype: str = 'float32'
-  one_hot: bool = True
-  augmenter: AugmentConfig = AugmentConfig()
-  download: bool = False
-  shuffle_buffer_size: int = 10000
-  file_shuffle_buffer_size: int = 1024
-  skip_decoding: bool = True
-  cache: bool = False
-  tf_data_service: Optional[str] = None
-  mean_subtract: bool = False
-  standardize: bool = False
-
-  @property
-  def has_data(self):
-    """Whether this dataset has any data associated with it."""
-    return self.name or self.data_dir or self.filenames
-
-
-@dataclass
-class ImageNetConfig(DatasetConfig):
-  """The base ImageNet dataset config."""
-  name: str = 'imagenet2012'
-  # Note: for large datasets like ImageNet, using records is faster than tfds
-  builder: str = 'records'
-  image_size: int = 224
-  num_channels: int = 3
-  num_examples: int = 1281167
-  num_classes: int = 1000
-  batch_size: int = 128
-
-
-@dataclass
-class Cifar10Config(DatasetConfig):
-  """The base CIFAR-10 dataset config."""
-  name: str = 'cifar10'
-  image_size: int = 224
-  batch_size: int = 128
-  download: bool = True
-  cache: bool = True
-
-
-class DatasetBuilder:
-  """An object for building datasets.
-
-  Allows building various pipelines fetching examples, preprocessing, etc.
-  Maintains additional state information calculated from the dataset, i.e.,
-  training set split, batch size, and number of steps (batches).
-  """
-
-  def __init__(self, config: DatasetConfig, **overrides: Any):
-    """Initialize the builder from the config."""
-    self.config = config.replace(**overrides)
-    self.builder_info = None
-
-    if self.config.augmenter is not None:
-      logging.info('Using augmentation: %s', self.config.augmenter.name)
-      self.augmenter = self.config.augmenter.build()
-    else:
-      self.augmenter = None
-
-  @property
-  def is_training(self) -> bool:
-    """Whether this is the training set."""
-    return self.config.split == 'train'
-
-  @property
-  def batch_size(self) -> int:
-    """The batch size, multiplied by the number of replicas (if configured)."""
-    if self.config.use_per_replica_batch_size:
-      return self.config.batch_size * self.config.num_devices
-    else:
-      return self.config.batch_size
-
-  @property
-  def global_batch_size(self):
-    """The global batch size across all replicas."""
-    return self.batch_size
-
-  @property
-  def local_batch_size(self):
-    """The base unscaled batch size."""
-    if self.config.use_per_replica_batch_size:
-      return self.config.batch_size
-    else:
-      return self.config.batch_size // self.config.num_devices
-
-  @property
-  def num_steps(self) -> int:
-    """The number of steps (batches) to exhaust this dataset."""
-    # Always divide by the global batch size to get the correct # of steps
-    return self.num_examples // self.global_batch_size
-
-  @property
-  def dtype(self) -> tf.dtypes.DType:
-    """Converts the config's dtype string to a tf dtype.
-
-    Returns:
-      The `tf.dtypes.DType` corresponding to the config's dtype string.
-
-    Raises:
-      ValueError if the config's dtype is not supported.
-
-    """
-    dtype_map = {
-        'float32': tf.float32,
-        'bfloat16': tf.bfloat16,
-        'float16': tf.float16,
-        'fp32': tf.float32,
-        'bf16': tf.bfloat16,
-    }
-    try:
-      return dtype_map[self.config.dtype]
-    except KeyError:
-      raise ValueError('Invalid DType provided. Supported types: {}'.format(
-          dtype_map.keys()))
-
-  @property
-  def image_size(self) -> int:
-    """The size of each image (can be inferred from the dataset)."""
-
-    if self.config.image_size == 'infer':
-      return self.info.features['image'].shape[0]
-    else:
-      return int(self.config.image_size)
-
-  @property
-  def num_channels(self) -> int:
-    """The number of image channels (can be inferred from the dataset)."""
-    if self.config.num_channels == 'infer':
-      return self.info.features['image'].shape[-1]
-    else:
-      return int(self.config.num_channels)
-
-  @property
-  def num_examples(self) -> int:
-    """The number of examples (can be inferred from the dataset)."""
-    if self.config.num_examples == 'infer':
-      return self.info.splits[self.config.split].num_examples
-    else:
-      return int(self.config.num_examples)
-
-  @property
-  def num_classes(self) -> int:
-    """The number of classes (can be inferred from the dataset)."""
-    if self.config.num_classes == 'infer':
-      return self.info.features['label'].num_classes
-    else:
-      return int(self.config.num_classes)
-
-  @property
-  def info(self) -> tfds.core.DatasetInfo:
-    """The TFDS dataset info, if available."""
-    try:
-      if self.builder_info is None:
-        self.builder_info = tfds.builder(self.config.name).info
-    except ConnectionError as e:
-      logging.error('Failed to use TFDS to load info. Please set dataset info '
-                    '(image_size, num_channels, num_examples, num_classes) in '
-                    'the dataset config.')
-      raise e
-    return self.builder_info
-
-  def build(
-      self,
-      strategy: Optional[tf.distribute.Strategy] = None) -> tf.data.Dataset:
-    """Construct a dataset end-to-end and return it using an optional strategy.
-
-    Args:
-      strategy: a strategy that, if passed, will distribute the dataset
-        according to that strategy. If passed and `num_devices > 1`,
-        `use_per_replica_batch_size` must be set to `True`.
-
-    Returns:
-      A TensorFlow dataset outputting batched images and labels.
-    """
-    if strategy:
-      if strategy.num_replicas_in_sync != self.config.num_devices:
-        logging.warning(
-            'Passed a strategy with %d devices, but expected %d devices.',
-            strategy.num_replicas_in_sync, self.config.num_devices)
-      dataset = strategy.distribute_datasets_from_function(self._build)
-    else:
-      dataset = self._build()
-
-    return dataset
-
-  def _build(
-      self,
-      input_context: Optional[tf.distribute.InputContext] = None
-  ) -> tf.data.Dataset:
-    """Construct a dataset end-to-end and return it.
-
-    Args:
-      input_context: An optional context provided by `tf.distribute` for
-        cross-replica training.
-
-    Returns:
-      A TensorFlow dataset outputting batched images and labels.
-    """
-    builders = {
-        'tfds': self.load_tfds,
-        'records': self.load_records,
-        'synthetic': self.load_synthetic,
-    }
-
-    builder = builders.get(self.config.builder, None)
-
-    if builder is None:
-      raise ValueError('Unknown builder type {}'.format(self.config.builder))
-
-    self.input_context = input_context
-    dataset = builder()
-    dataset = self.pipeline(dataset)
-
-    return dataset
-
-  def load_tfds(self) -> tf.data.Dataset:
-    """Return a dataset loading files from TFDS."""
-
-    logging.info('Using TFDS to load data.')
-
-    builder = tfds.builder(self.config.name, data_dir=self.config.data_dir)
-
-    if self.config.download:
-      builder.download_and_prepare()
-
-    decoders = {}
-
-    if self.config.skip_decoding:
-      decoders['image'] = tfds.decode.SkipDecoding()
-
-    read_config = tfds.ReadConfig(
-        interleave_cycle_length=10,
-        interleave_block_length=1,
-        input_context=self.input_context)
-
-    dataset = builder.as_dataset(
-        split=self.config.split,
-        as_supervised=True,
-        shuffle_files=True,
-        decoders=decoders,
-        read_config=read_config)
-
-    return dataset
-
-  def load_records(self) -> tf.data.Dataset:
-    """Return a dataset loading TFRecord files."""
-    logging.info('Using TFRecords to load data.')
-    if self.config.filenames is None:
-      if self.config.data_dir is None:
-        raise ValueError('Dataset must specify a path for the data files.')
-
-      file_pattern = os.path.join(self.config.data_dir,
-                                  '{}*'.format(self.config.split))
-      dataset = tf.data.Dataset.list_files(file_pattern, shuffle=False)
-    else:
-      dataset = tf.data.Dataset.from_tensor_slices(self.config.filenames)
-
-    return dataset
-
-  def load_synthetic(self) -> tf.data.Dataset:
-    """Return a dataset generating dummy synthetic data."""
-    logging.info('Generating a synthetic dataset.')
-
-    def generate_data(_):
-      image = tf.zeros([self.image_size, self.image_size, self.num_channels],
-                       dtype=self.dtype)
-      label = tf.zeros([1], dtype=tf.int32)
-      return image, label
-
-    dataset = tf.data.Dataset.range(1)
-    dataset = dataset.repeat()
-    dataset = dataset.map(
-        generate_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
-    return dataset
-
-  def pipeline(self, dataset: tf.data.Dataset) -> tf.data.Dataset:
-    """Build a pipeline fetching, shuffling, and preprocessing the dataset.
-
-    Args:
-      dataset: A `tf.data.Dataset` that loads raw files.
-
-    Returns:
-      A TensorFlow dataset outputting batched images and labels.
-    """
-    if (self.config.builder != 'tfds' and self.input_context and
-        self.input_context.num_input_pipelines > 1):
-      dataset = dataset.shard(self.input_context.num_input_pipelines,
-                              self.input_context.input_pipeline_id)
-      logging.info(
-          'Sharding the dataset: input_pipeline_id=%d '
-          'num_input_pipelines=%d', self.input_context.input_pipeline_id,
-          self.input_context.num_input_pipelines)
-
-    if self.is_training and self.config.builder == 'records':
-      # Shuffle the input files.
-      dataset = dataset.shuffle(
-          buffer_size=self.config.file_shuffle_buffer_size)
-
-    if self.is_training and not self.config.cache:
-      dataset = dataset.repeat()
-
-    if self.config.builder == 'records':
-      # Read the data from disk in parallel
-      dataset = dataset.interleave(
-          tf.data.TFRecordDataset,
-          cycle_length=10,
-          block_length=1,
-          num_parallel_calls=tf.data.experimental.AUTOTUNE)
-
-    if self.config.cache:
-      dataset = dataset.cache()
-
-    if self.is_training:
-      dataset = dataset.shuffle(self.config.shuffle_buffer_size)
-      dataset = dataset.repeat()
-
-    # Parse, pre-process, and batch the data in parallel
-    if self.config.builder == 'records':
-      preprocess = self.parse_record
-    else:
-      preprocess = self.preprocess
-    dataset = dataset.map(
-        preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
-
-    if self.input_context and self.config.num_devices > 1:
-      if not self.config.use_per_replica_batch_size:
-        raise ValueError(
-            'The builder does not support a global batch size with more than '
-            'one replica. Got {} replicas. Please set a '
-            '`per_replica_batch_size` and enable '
-            '`use_per_replica_batch_size=True`.'.format(
-                self.config.num_devices))
-
-      # The batch size of the dataset will be multiplied by the number of
-      # replicas automatically when strategy.distribute_datasets_from_function
-      # is called, so we use local batch size here.
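-      # (For example, with a per-replica batch size of 64 and 8 replicas,
-      # each replica batches 64 examples and the effective global batch
-      # size becomes 512.)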
-      dataset = dataset.batch(
-          self.local_batch_size, drop_remainder=self.is_training)
-    else:
-      dataset = dataset.batch(
-          self.global_batch_size, drop_remainder=self.is_training)
-
-    # Prefetch overlaps in-feed with training
-    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
-
-    if self.config.tf_data_service:
-      if not hasattr(tf.data.experimental, 'service'):
-        raise ValueError('The tf_data_service flag requires TensorFlow version '
-                         '>= 2.3.0, but the version is {}'.format(
-                             tf.__version__))
-      dataset = dataset.apply(
-          tf.data.experimental.service.distribute(
-              processing_mode='parallel_epochs',
-              service=self.config.tf_data_service,
-              job_name='resnet_train'))
-      dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
-
-    return dataset
-
-  def parse_record(self, record: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
-    """Parse an ImageNet record from a serialized string Tensor."""
-    keys_to_features = {
-        'image/encoded': tf.io.FixedLenFeature((), tf.string, ''),
-        'image/format': tf.io.FixedLenFeature((), tf.string, 'jpeg'),
-        'image/class/label': tf.io.FixedLenFeature([], tf.int64, -1),
-        'image/class/text': tf.io.FixedLenFeature([], tf.string, ''),
-        'image/object/bbox/xmin': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/bbox/ymin': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/bbox/xmax': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/bbox/ymax': tf.io.VarLenFeature(dtype=tf.float32),
-        'image/object/class/label': tf.io.VarLenFeature(dtype=tf.int64),
-    }
-
-    parsed = tf.io.parse_single_example(record, keys_to_features)
-
-    label = tf.reshape(parsed['image/class/label'], shape=[1])
-
-    # Subtract one so that labels are in [0, 1000)
-    label -= 1
-
-    image_bytes = tf.reshape(parsed['image/encoded'], shape=[])
-    image, label = self.preprocess(image_bytes, label)
-
-    return image, label
-
-  def preprocess(self, image: tf.Tensor,
-                 label: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
-    """Apply image preprocessing and augmentation to the image and label."""
-    if self.is_training:
-      image = preprocessing.preprocess_for_train(
-          image,
-          image_size=self.image_size,
-          mean_subtract=self.config.mean_subtract,
-          standardize=self.config.standardize,
-          dtype=self.dtype,
-          augmenter=self.augmenter)
-    else:
-      image = preprocessing.preprocess_for_eval(
-          image,
-          image_size=self.image_size,
-          num_channels=self.num_channels,
-          mean_subtract=self.config.mean_subtract,
-          standardize=self.config.standardize,
-          dtype=self.dtype)
-
-    label = tf.cast(label, tf.int32)
-    if self.config.one_hot:
-      label = tf.one_hot(label, self.num_classes)
-      label = tf.reshape(label, [self.num_classes])
-
-    return image, label
-
-  @classmethod
-  def from_params(cls, *args, **kwargs):
-    """Construct a dataset builder from a default config and any overrides."""
-    config = DatasetConfig.from_args(*args, **kwargs)
-    return cls(config)
diff --git a/official/vision/image_classification/efficientnet/__init__.py b/official/vision/image_classification/efficientnet/__init__.py
deleted file mode 100644
index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/efficientnet/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
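For orientation, a minimal usage sketch of the `DatasetBuilder` removed above; the import path and data directory are illustrative assumptions, not taken from this diff:

```python
# A hedged sketch; assumes this module is importable as
# official.vision.image_classification.dataset_factory and that
# '/data/imagenet' (hypothetical) holds TFRecord shards named '<split>*'.
from official.vision.image_classification import dataset_factory

config = dataset_factory.ImageNetConfig(data_dir='/data/imagenet')
builder = dataset_factory.DatasetBuilder(config, num_devices=8)

# batch_size=128 is per replica (use_per_replica_batch_size=True by
# default), so the global batch is 128 * 8 = 1024 and one epoch of the
# 1,281,167 training examples is 1251 steps.
assert builder.global_batch_size == 1024
assert builder.num_steps == 1251

dataset = builder.build()  # or builder.build(strategy) to distribute it
```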
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/image_classification/efficientnet/common_modules.py b/official/vision/image_classification/efficientnet/common_modules.py deleted file mode 100644 index 9c3d11c8676773be4f7fc27187d0852fdd58aaf4..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/common_modules.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Common modeling utilities.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import tensorflow as tf -import tensorflow.compat.v1 as tf1 -from typing import Text, Optional - -from tensorflow.python.tpu import tpu_function - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class TpuBatchNormalization(tf.keras.layers.BatchNormalization): - """Cross replica batch normalization.""" - - def __init__(self, fused: Optional[bool] = False, **kwargs): - if fused in (True, None): - raise ValueError('TpuBatchNormalization does not support fused=True.') - super(TpuBatchNormalization, self).__init__(fused=fused, **kwargs) - - def _cross_replica_average(self, t: tf.Tensor, num_shards_per_group: int): - """Calculates the average value of input tensor across TPU replicas.""" - num_shards = tpu_function.get_tpu_context().number_of_shards - group_assignment = None - if num_shards_per_group > 1: - if num_shards % num_shards_per_group != 0: - raise ValueError( - 'num_shards: %d mod shards_per_group: %d, should be 0' % - (num_shards, num_shards_per_group)) - num_groups = num_shards // num_shards_per_group - group_assignment = [[ - x for x in range(num_shards) if x // num_shards_per_group == y - ] for y in range(num_groups)] - return tf1.tpu.cross_replica_sum(t, group_assignment) / tf.cast( - num_shards_per_group, t.dtype) - - def _moments(self, inputs: tf.Tensor, reduction_axes: int, keep_dims: int): - """Compute the mean and variance: it overrides the original _moments.""" - shard_mean, shard_variance = super(TpuBatchNormalization, self)._moments( - inputs, reduction_axes, keep_dims=keep_dims) - - num_shards = tpu_function.get_tpu_context().number_of_shards or 1 - if num_shards <= 8: # Skip cross_replica for 2x2 or smaller slices. - num_shards_per_group = 1 - else: - num_shards_per_group = max(8, num_shards // 8) - if num_shards_per_group > 1: - # Compute variance using: Var[X]= E[X^2] - E[X]^2. 
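-      # Each shard supplies E[X] (shard_mean) and E[X^2] (recovered below
-      # from shard_variance); averaging both across the group and
-      # recombining yields the cross-replica variance.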
-      shard_square_of_mean = tf.math.square(shard_mean)
-      shard_mean_of_square = shard_variance + shard_square_of_mean
-      group_mean = self._cross_replica_average(shard_mean, num_shards_per_group)
-      group_mean_of_square = self._cross_replica_average(
-          shard_mean_of_square, num_shards_per_group)
-      group_variance = group_mean_of_square - tf.math.square(group_mean)
-      return (group_mean, group_variance)
-    else:
-      return (shard_mean, shard_variance)
-
-
-def get_batch_norm(batch_norm_type: Text) -> tf.keras.layers.BatchNormalization:
-  """A helper that returns the requested batch normalization class.
-
-  Args:
-    batch_norm_type: The type of batch normalization layer implementation. `tpu`
-      will use `TpuBatchNormalization`.
-
-  Returns:
-    The `tf.keras.layers.BatchNormalization` class (or its
-    `TpuBatchNormalization` subclass) for the caller to instantiate.
-  """
-  if batch_norm_type == 'tpu':
-    return TpuBatchNormalization
-
-  return tf.keras.layers.BatchNormalization  # pytype: disable=bad-return-type  # typed-keras
-
-
-def count_params(model, trainable_only=True):
-  """Returns the count of all model parameters, or just trainable ones."""
-  if not trainable_only:
-    return model.count_params()
-  else:
-    return int(
-        np.sum([
-            tf.keras.backend.count_params(p) for p in model.trainable_weights
-        ]))
-
-
-def load_weights(model: tf.keras.Model,
-                 model_weights_path: Text,
-                 weights_format: Text = 'saved_model'):
-  """Load model weights from the given file path.
-
-  Args:
-    model: the model to load weights into
-    model_weights_path: the path of the model weights
-    weights_format: the model weights format. One of 'saved_model', 'h5', or
-      'checkpoint'.
-  """
-  if weights_format == 'saved_model':
-    loaded_model = tf.keras.models.load_model(model_weights_path)
-    model.set_weights(loaded_model.get_weights())
-  else:
-    model.load_weights(model_weights_path)
diff --git a/official/vision/image_classification/efficientnet/efficientnet_config.py b/official/vision/image_classification/efficientnet/efficientnet_config.py
deleted file mode 100644
index 47cfd740221d3581db585e90bc6df0711c289019..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/efficientnet/efficientnet_config.py
+++ /dev/null
@@ -1,78 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Configuration definitions for EfficientNet losses, learning rates, and optimizers."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from typing import Any, Mapping
-
-import dataclasses
-
-from official.modeling.hyperparams import base_config
-from official.vision.image_classification.configs import base_configs
-
-
-@dataclasses.dataclass
-class EfficientNetModelConfig(base_configs.ModelConfig):
-  """Configuration for the EfficientNet model.
-
-  This configuration will default to settings used for training efficientnet-b0
-  on a v3-8 TPU on ImageNet.
-
-  Attributes:
-    name: The name of the model. Defaults to 'EfficientNet'.
- num_classes: The number of classes in the model. - model_params: A dictionary that represents the parameters of the - EfficientNet model. These will be passed in to the "from_name" function. - loss: The configuration for loss. Defaults to a categorical cross entropy - implementation. - optimizer: The configuration for optimizations. Defaults to an RMSProp - configuration. - learning_rate: The configuration for learning rate. Defaults to an - exponential configuration. - """ - name: str = 'EfficientNet' - num_classes: int = 1000 - model_params: base_config.Config = dataclasses.field( - default_factory=lambda: { - 'model_name': 'efficientnet-b0', - 'model_weights_path': '', - 'weights_format': 'saved_model', - 'overrides': { - 'batch_norm': 'default', - 'rescale_input': True, - 'num_classes': 1000, - 'activation': 'swish', - 'dtype': 'float32', - } - }) - loss: base_configs.LossConfig = base_configs.LossConfig( - name='categorical_crossentropy', label_smoothing=0.1) - optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( - name='rmsprop', - decay=0.9, - epsilon=0.001, - momentum=0.9, - moving_average_decay=None) - learning_rate: base_configs.LearningRateConfig = base_configs.LearningRateConfig( # pylint: disable=line-too-long - name='exponential', - initial_lr=0.008, - decay_epochs=2.4, - decay_rate=0.97, - warmup_epochs=5, - scale_by_batch_size=1. / 128., - staircase=True) diff --git a/official/vision/image_classification/efficientnet/efficientnet_model.py b/official/vision/image_classification/efficientnet/efficientnet_model.py deleted file mode 100644 index ad385715cd866209a0d3958a6742cbde73f16091..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/efficientnet_model.py +++ /dev/null @@ -1,499 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Contains definitions for EfficientNet model. - -[1] Mingxing Tan, Quoc V. Le - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 
- ICML'19, https://arxiv.org/abs/1905.11946 -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math -import os -from typing import Any, Dict, Optional, Text, Tuple - -from absl import logging -from dataclasses import dataclass -import tensorflow as tf - -from official.modeling import tf_utils -from official.modeling.hyperparams import base_config -from official.vision.image_classification import preprocessing -from official.vision.image_classification.efficientnet import common_modules - - -@dataclass -class BlockConfig(base_config.Config): - """Config for a single MB Conv Block.""" - input_filters: int = 0 - output_filters: int = 0 - kernel_size: int = 3 - num_repeat: int = 1 - expand_ratio: int = 1 - strides: Tuple[int, int] = (1, 1) - se_ratio: Optional[float] = None - id_skip: bool = True - fused_conv: bool = False - conv_type: str = 'depthwise' - - -@dataclass -class ModelConfig(base_config.Config): - """Default Config for Efficientnet-B0.""" - width_coefficient: float = 1.0 - depth_coefficient: float = 1.0 - resolution: int = 224 - dropout_rate: float = 0.2 - blocks: Tuple[BlockConfig, ...] = ( - # (input_filters, output_filters, kernel_size, num_repeat, - # expand_ratio, strides, se_ratio) - # pylint: disable=bad-whitespace - BlockConfig.from_args(32, 16, 3, 1, 1, (1, 1), 0.25), - BlockConfig.from_args(16, 24, 3, 2, 6, (2, 2), 0.25), - BlockConfig.from_args(24, 40, 5, 2, 6, (2, 2), 0.25), - BlockConfig.from_args(40, 80, 3, 3, 6, (2, 2), 0.25), - BlockConfig.from_args(80, 112, 5, 3, 6, (1, 1), 0.25), - BlockConfig.from_args(112, 192, 5, 4, 6, (2, 2), 0.25), - BlockConfig.from_args(192, 320, 3, 1, 6, (1, 1), 0.25), - # pylint: enable=bad-whitespace - ) - stem_base_filters: int = 32 - top_base_filters: int = 1280 - activation: str = 'simple_swish' - batch_norm: str = 'default' - bn_momentum: float = 0.99 - bn_epsilon: float = 1e-3 - # While the original implementation used a weight decay of 1e-5, - # tf.nn.l2_loss divides it by 2, so we halve this to compensate in Keras - weight_decay: float = 5e-6 - drop_connect_rate: float = 0.2 - depth_divisor: int = 8 - min_depth: Optional[int] = None - use_se: bool = True - input_channels: int = 3 - num_classes: int = 1000 - model_name: str = 'efficientnet' - rescale_input: bool = True - data_format: str = 'channels_last' - dtype: str = 'float32' - - -MODEL_CONFIGS = { - # (width, depth, resolution, dropout) - 'efficientnet-b0': ModelConfig.from_args(1.0, 1.0, 224, 0.2), - 'efficientnet-b1': ModelConfig.from_args(1.0, 1.1, 240, 0.2), - 'efficientnet-b2': ModelConfig.from_args(1.1, 1.2, 260, 0.3), - 'efficientnet-b3': ModelConfig.from_args(1.2, 1.4, 300, 0.3), - 'efficientnet-b4': ModelConfig.from_args(1.4, 1.8, 380, 0.4), - 'efficientnet-b5': ModelConfig.from_args(1.6, 2.2, 456, 0.4), - 'efficientnet-b6': ModelConfig.from_args(1.8, 2.6, 528, 0.5), - 'efficientnet-b7': ModelConfig.from_args(2.0, 3.1, 600, 0.5), - 'efficientnet-b8': ModelConfig.from_args(2.2, 3.6, 672, 0.5), - 'efficientnet-l2': ModelConfig.from_args(4.3, 5.3, 800, 0.5), -} - -CONV_KERNEL_INITIALIZER = { - 'class_name': 'VarianceScaling', - 'config': { - 'scale': 2.0, - 'mode': 'fan_out', - # Note: this is a truncated normal distribution - 'distribution': 'normal' - } -} - -DENSE_KERNEL_INITIALIZER = { - 'class_name': 'VarianceScaling', - 'config': { - 'scale': 1 / 3.0, - 'mode': 'fan_out', - 'distribution': 'uniform' - } -} - - -def round_filters(filters: int, config: ModelConfig) -> int: - """Round 
number of filters based on width coefficient.""" - width_coefficient = config.width_coefficient - min_depth = config.min_depth - divisor = config.depth_divisor - orig_filters = filters - - if not width_coefficient: - return filters - - filters *= width_coefficient - min_depth = min_depth or divisor - new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) - # Make sure that round down does not go down by more than 10%. - if new_filters < 0.9 * filters: - new_filters += divisor - logging.info('round_filter input=%s output=%s', orig_filters, new_filters) - return int(new_filters) - - -def round_repeats(repeats: int, depth_coefficient: float) -> int: - """Round number of repeats based on depth coefficient.""" - return int(math.ceil(depth_coefficient * repeats)) - - -def conv2d_block(inputs: tf.Tensor, - conv_filters: Optional[int], - config: ModelConfig, - kernel_size: Any = (1, 1), - strides: Any = (1, 1), - use_batch_norm: bool = True, - use_bias: bool = False, - activation: Optional[Any] = None, - depthwise: bool = False, - name: Optional[Text] = None): - """A conv2d followed by batch norm and an activation.""" - batch_norm = common_modules.get_batch_norm(config.batch_norm) - bn_momentum = config.bn_momentum - bn_epsilon = config.bn_epsilon - data_format = tf.keras.backend.image_data_format() - weight_decay = config.weight_decay - - name = name or '' - - # Collect args based on what kind of conv2d block is desired - init_kwargs = { - 'kernel_size': kernel_size, - 'strides': strides, - 'use_bias': use_bias, - 'padding': 'same', - 'name': name + '_conv2d', - 'kernel_regularizer': tf.keras.regularizers.l2(weight_decay), - 'bias_regularizer': tf.keras.regularizers.l2(weight_decay), - } - - if depthwise: - conv2d = tf.keras.layers.DepthwiseConv2D - init_kwargs.update({'depthwise_initializer': CONV_KERNEL_INITIALIZER}) - else: - conv2d = tf.keras.layers.Conv2D - init_kwargs.update({ - 'filters': conv_filters, - 'kernel_initializer': CONV_KERNEL_INITIALIZER - }) - - x = conv2d(**init_kwargs)(inputs) - - if use_batch_norm: - bn_axis = 1 if data_format == 'channels_first' else -1 - x = batch_norm( - axis=bn_axis, - momentum=bn_momentum, - epsilon=bn_epsilon, - name=name + '_bn')( - x) - - if activation is not None: - x = tf.keras.layers.Activation(activation, name=name + '_activation')(x) - return x - - -def mb_conv_block(inputs: tf.Tensor, - block: BlockConfig, - config: ModelConfig, - prefix: Optional[Text] = None): - """Mobile Inverted Residual Bottleneck. - - Args: - inputs: the Keras input to the block - block: BlockConfig, arguments to create a Block - config: ModelConfig, a set of model parameters - prefix: prefix for naming all layers - - Returns: - the output of the block - """ - use_se = config.use_se - activation = tf_utils.get_activation(config.activation) - drop_connect_rate = config.drop_connect_rate - data_format = tf.keras.backend.image_data_format() - use_depthwise = block.conv_type != 'no_depthwise' - prefix = prefix or '' - - filters = block.input_filters * block.expand_ratio - - x = inputs - - if block.fused_conv: - # If we use fused mbconv, skip expansion and use regular conv. 
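-    # (A fused MBConv folds the 1x1 expansion and the depthwise convolution
-    # into one regular convolution, as in the EfficientNet-EdgeTPU variants.)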
- x = conv2d_block( - x, - filters, - config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - name=prefix + 'fused') - else: - if block.expand_ratio != 1: - # Expansion phase - kernel_size = (1, 1) if use_depthwise else (3, 3) - x = conv2d_block( - x, - filters, - config, - kernel_size=kernel_size, - activation=activation, - name=prefix + 'expand') - - # Depthwise Convolution - if use_depthwise: - x = conv2d_block( - x, - conv_filters=None, - config=config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - depthwise=True, - name=prefix + 'depthwise') - - # Squeeze and Excitation phase - if use_se: - assert block.se_ratio is not None - assert 0 < block.se_ratio <= 1 - num_reduced_filters = max(1, int(block.input_filters * block.se_ratio)) - - if data_format == 'channels_first': - se_shape = (filters, 1, 1) - else: - se_shape = (1, 1, filters) - - se = tf.keras.layers.GlobalAveragePooling2D(name=prefix + 'se_squeeze')(x) - se = tf.keras.layers.Reshape(se_shape, name=prefix + 'se_reshape')(se) - - se = conv2d_block( - se, - num_reduced_filters, - config, - use_bias=True, - use_batch_norm=False, - activation=activation, - name=prefix + 'se_reduce') - se = conv2d_block( - se, - filters, - config, - use_bias=True, - use_batch_norm=False, - activation='sigmoid', - name=prefix + 'se_expand') - x = tf.keras.layers.multiply([x, se], name=prefix + 'se_excite') - - # Output phase - x = conv2d_block( - x, block.output_filters, config, activation=None, name=prefix + 'project') - - # Add identity so that quantization-aware training can insert quantization - # ops correctly. - x = tf.keras.layers.Activation( - tf_utils.get_activation('identity'), name=prefix + 'id')( - x) - - if (block.id_skip and all(s == 1 for s in block.strides) and - block.input_filters == block.output_filters): - if drop_connect_rate and drop_connect_rate > 0: - # Apply dropconnect - # The only difference between dropout and dropconnect in TF is scaling by - # drop_connect_rate during training. See: - # https://github.com/keras-team/keras/pull/9898#issuecomment-380577612 - x = tf.keras.layers.Dropout( - drop_connect_rate, noise_shape=(None, 1, 1, 1), name=prefix + 'drop')( - x) - - x = tf.keras.layers.add([x, inputs], name=prefix + 'add') - - return x - - -def efficientnet(image_input: tf.keras.layers.Input, config: ModelConfig): # pytype: disable=invalid-annotation # typed-keras - """Creates an EfficientNet graph given the model parameters. - - This function is wrapped by the `EfficientNet` class to make a tf.keras.Model. - - Args: - image_input: the input batch of images - config: the model config - - Returns: - the output of efficientnet - """ - depth_coefficient = config.depth_coefficient - blocks = config.blocks - stem_base_filters = config.stem_base_filters - top_base_filters = config.top_base_filters - activation = tf_utils.get_activation(config.activation) - dropout_rate = config.dropout_rate - drop_connect_rate = config.drop_connect_rate - num_classes = config.num_classes - input_channels = config.input_channels - rescale_input = config.rescale_input - data_format = tf.keras.backend.image_data_format() - dtype = config.dtype - weight_decay = config.weight_decay - - x = image_input - if data_format == 'channels_first': - # Happens on GPU/TPU if available. 
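-    # NHWC inputs are permuted to NCHW here, since convolutions generally
-    # run faster with a leading channel axis on these devices.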
- x = tf.keras.layers.Permute((3, 1, 2))(x) - if rescale_input: - x = preprocessing.normalize_images( - x, num_channels=input_channels, dtype=dtype, data_format=data_format) - - # Build stem - x = conv2d_block( - x, - round_filters(stem_base_filters, config), - config, - kernel_size=[3, 3], - strides=[2, 2], - activation=activation, - name='stem') - - # Build blocks - num_blocks_total = sum( - round_repeats(block.num_repeat, depth_coefficient) for block in blocks) - block_num = 0 - - for stack_idx, block in enumerate(blocks): - assert block.num_repeat > 0 - # Update block input and output filters based on depth multiplier - block = block.replace( - input_filters=round_filters(block.input_filters, config), - output_filters=round_filters(block.output_filters, config), - num_repeat=round_repeats(block.num_repeat, depth_coefficient)) - - # The first block needs to take care of stride and filter size increase - drop_rate = drop_connect_rate * float(block_num) / num_blocks_total - config = config.replace(drop_connect_rate=drop_rate) - block_prefix = 'stack_{}/block_0/'.format(stack_idx) - x = mb_conv_block(x, block, config, block_prefix) - block_num += 1 - if block.num_repeat > 1: - block = block.replace(input_filters=block.output_filters, strides=[1, 1]) - - for block_idx in range(block.num_repeat - 1): - drop_rate = drop_connect_rate * float(block_num) / num_blocks_total - config = config.replace(drop_connect_rate=drop_rate) - block_prefix = 'stack_{}/block_{}/'.format(stack_idx, block_idx + 1) - x = mb_conv_block(x, block, config, prefix=block_prefix) - block_num += 1 - - # Build top - x = conv2d_block( - x, - round_filters(top_base_filters, config), - config, - activation=activation, - name='top') - - # Build classifier - x = tf.keras.layers.GlobalAveragePooling2D(name='top_pool')(x) - if dropout_rate and dropout_rate > 0: - x = tf.keras.layers.Dropout(dropout_rate, name='top_dropout')(x) - x = tf.keras.layers.Dense( - num_classes, - kernel_initializer=DENSE_KERNEL_INITIALIZER, - kernel_regularizer=tf.keras.regularizers.l2(weight_decay), - bias_regularizer=tf.keras.regularizers.l2(weight_decay), - name='logits')( - x) - x = tf.keras.layers.Activation('softmax', name='probs')(x) - - return x - - -class EfficientNet(tf.keras.Model): - """Wrapper class for an EfficientNet Keras model. - - Contains helper methods to build, manage, and save metadata about the model. - """ - - def __init__(self, - config: Optional[ModelConfig] = None, - overrides: Optional[Dict[Text, Any]] = None): - """Create an EfficientNet model. 
- - Args: - config: (optional) the main model parameters to create the model - overrides: (optional) a dict containing keys that can override config - """ - overrides = overrides or {} - config = config or ModelConfig() - - self.config = config.replace(**overrides) - - input_channels = self.config.input_channels - model_name = self.config.model_name - input_shape = (None, None, input_channels) # Should handle any size image - image_input = tf.keras.layers.Input(shape=input_shape) - - output = efficientnet(image_input, self.config) - - # Cast to float32 in case we have a different model dtype - output = tf.cast(output, tf.float32) - - logging.info('Building model %s with params %s', model_name, self.config) - - super(EfficientNet, self).__init__( - inputs=image_input, outputs=output, name=model_name) - - @classmethod - def from_name(cls, - model_name: Text, - model_weights_path: Optional[Text] = None, - weights_format: Text = 'saved_model', - overrides: Optional[Dict[Text, Any]] = None): - """Construct an EfficientNet model from a predefined model name. - - E.g., `EfficientNet.from_name('efficientnet-b0')`. - - Args: - model_name: the predefined model name - model_weights_path: the path to the weights (h5 file or saved model dir) - weights_format: the model weights format. One of 'saved_model', 'h5', or - 'checkpoint'. - overrides: (optional) a dict containing keys that can override config - - Returns: - A constructed EfficientNet instance. - """ - model_configs = dict(MODEL_CONFIGS) - overrides = dict(overrides) if overrides else {} - - # One can define their own custom models if necessary - model_configs.update(overrides.pop('model_config', {})) - - if model_name not in model_configs: - raise ValueError('Unknown model name {}'.format(model_name)) - - config = model_configs[model_name] - - model = cls(config=config, overrides=overrides) - - if model_weights_path: - common_modules.load_weights( - model, model_weights_path, weights_format=weights_format) - - return model diff --git a/official/vision/image_classification/efficientnet/tfhub_export.py b/official/vision/image_classification/efficientnet/tfhub_export.py deleted file mode 100644 index d3518a1304c8c761cfaabdcc96dead70dd9b0097..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/tfhub_export.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
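A hedged sketch of the `from_name` entry point defined above (randomly initialized weights; the import path is assumed from the file's location):

```python
# Build an EfficientNet-B0 with a 10-way head via the `overrides`
# mechanism of from_name(); no pretrained weights are loaded.
from official.vision.image_classification.efficientnet import efficientnet_model

model = efficientnet_model.EfficientNet.from_name(
    'efficientnet-b0',
    overrides={'num_classes': 10, 'rescale_input': False})
print(model.count_params())
```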
-
-"""A script to export TF-Hub SavedModel."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import app
-from absl import flags
-
-import tensorflow as tf
-
-from official.vision.image_classification.efficientnet import efficientnet_model
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string("model_name", None, "EfficientNet model name.")
-flags.DEFINE_string("model_path", None, "File path to TF model checkpoint.")
-flags.DEFINE_string("export_path", None,
-                    "TF-Hub SavedModel destination path to export.")
-
-
-def export_tfhub(model_path, hub_destination, model_name):
-  """Restores a tf.keras.Model and saves for TF-Hub."""
-  model_configs = dict(efficientnet_model.MODEL_CONFIGS)
-  config = model_configs[model_name]
-
-  image_input = tf.keras.layers.Input(
-      shape=(None, None, 3), name="image_input", dtype=tf.float32)
-  x = image_input * 255.0
-  outputs = efficientnet_model.efficientnet(x, config)
-  hub_model = tf.keras.Model(image_input, outputs)
-  ckpt = tf.train.Checkpoint(model=hub_model)
-  ckpt.restore(model_path).assert_existing_objects_matched()
-  hub_model.save(
-      os.path.join(hub_destination, "classification"), include_optimizer=False)
-
-  feature_vector_output = hub_model.get_layer(name="top_pool").get_output_at(0)
-  hub_model2 = tf.keras.Model(image_input, feature_vector_output)
-  hub_model2.save(
-      os.path.join(hub_destination, "feature-vector"), include_optimizer=False)
-
-
-def main(argv):
-  if len(argv) > 1:
-    raise app.UsageError("Too many command-line arguments.")
-
-  export_tfhub(FLAGS.model_path, FLAGS.export_path, FLAGS.model_name)
-
-
-if __name__ == "__main__":
-  app.run(main)
diff --git a/official/vision/image_classification/learning_rate.py b/official/vision/image_classification/learning_rate.py
deleted file mode 100644
index 72f7e95187521eeebefa1e698ca5382f10642e88..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/learning_rate.py
+++ /dev/null
@@ -1,117 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Learning rate utilities for vision tasks."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from typing import Any, Mapping, Optional
-
-import numpy as np
-import tensorflow as tf
-
-BASE_LEARNING_RATE = 0.1
-
-
-class WarmupDecaySchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
-  """A wrapper for LearningRateSchedule that includes warmup steps."""
-
-  def __init__(self,
-               lr_schedule: tf.keras.optimizers.schedules.LearningRateSchedule,
-               warmup_steps: int,
-               warmup_lr: Optional[float] = None):
-    """Add warmup decay to a learning rate schedule.
-
-    Args:
-      lr_schedule: base learning rate scheduler
-      warmup_steps: number of warmup steps
-      warmup_lr: an optional field for the final warmup learning rate.
This
-        should be provided if the base `lr_schedule` does not contain this
-        field.
-    """
-    super(WarmupDecaySchedule, self).__init__()
-    self._lr_schedule = lr_schedule
-    self._warmup_steps = warmup_steps
-    self._warmup_lr = warmup_lr
-
-  def __call__(self, step: int):
-    lr = self._lr_schedule(step)
-    if self._warmup_steps:
-      if self._warmup_lr is not None:
-        initial_learning_rate = tf.convert_to_tensor(
-            self._warmup_lr, name="initial_learning_rate")
-      else:
-        initial_learning_rate = tf.convert_to_tensor(
-            self._lr_schedule.initial_learning_rate,
-            name="initial_learning_rate")
-      dtype = initial_learning_rate.dtype
-      global_step_recomp = tf.cast(step, dtype)
-      warmup_steps = tf.cast(self._warmup_steps, dtype)
-      warmup_lr = initial_learning_rate * global_step_recomp / warmup_steps
-      lr = tf.cond(global_step_recomp < warmup_steps, lambda: warmup_lr,
-                   lambda: lr)
-    return lr
-
-  def get_config(self) -> Mapping[str, Any]:
-    config = self._lr_schedule.get_config()
-    config.update({
-        "warmup_steps": self._warmup_steps,
-        "warmup_lr": self._warmup_lr,
-    })
-    return config
-
-
-class CosineDecayWithWarmup(tf.keras.optimizers.schedules.LearningRateSchedule):
-  """Class to generate learning rate tensor."""
-
-  def __init__(self, batch_size: int, total_steps: int, warmup_steps: int):
-    """Creates the cosine learning rate tensor with linear warmup.
-
-    Args:
-      batch_size: The training batch size used in the experiment.
-      total_steps: Total training steps.
-      warmup_steps: Steps for the warm up period.
-    """
-    super(CosineDecayWithWarmup, self).__init__()
-    base_lr_batch_size = 256
-    self._total_steps = total_steps
-    self._init_learning_rate = BASE_LEARNING_RATE * batch_size / base_lr_batch_size
-    self._warmup_steps = warmup_steps
-
-  def __call__(self, global_step: int):
-    global_step = tf.cast(global_step, dtype=tf.float32)
-    warmup_steps = self._warmup_steps
-    init_lr = self._init_learning_rate
-    total_steps = self._total_steps
-
-    linear_warmup = global_step / warmup_steps * init_lr
-
-    cosine_learning_rate = init_lr * (tf.cos(np.pi *
-                                             (global_step - warmup_steps) /
-                                             (total_steps - warmup_steps)) +
-                                      1.0) / 2.0
-
-    learning_rate = tf.where(global_step < warmup_steps, linear_warmup,
-                             cosine_learning_rate)
-    return learning_rate
-
-  def get_config(self):
-    return {
-        "total_steps": self._total_steps,
-        "warmup_steps": self._warmup_steps,
-        "init_learning_rate": self._init_learning_rate,
-    }
diff --git a/official/vision/image_classification/learning_rate_test.py b/official/vision/image_classification/learning_rate_test.py
deleted file mode 100644
index 6c33ed24b8e46b8ecb58005a1f528e62a66f0005..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/learning_rate_test.py
+++ /dev/null
@@ -1,60 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
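The expected values in the test below can be reproduced with plain NumPy from the `CosineDecayWithWarmup` formulas above:

```python
# Pure-NumPy check of CosineDecayWithWarmup: batch_size=256 gives
# init_lr = BASE_LEARNING_RATE * 256 / 256 = 0.1.
import numpy as np

init_lr, total_steps, warmup_steps = 0.1, 3, 1
for step in range(total_steps + 1):
  if step < warmup_steps:
    lr = step / warmup_steps * init_lr  # linear warmup
  else:
    lr = init_lr * (np.cos(np.pi * (step - warmup_steps) /
                           (total_steps - warmup_steps)) + 1.0) / 2.0
  print(step, round(float(lr), 4))  # prints 0.0, 0.1, 0.05, 0.0
```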
- -"""Tests for learning_rate.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.image_classification import learning_rate - - -class LearningRateTests(tf.test.TestCase): - - def test_warmup_decay(self): - """Basic computational test for warmup decay.""" - initial_lr = 0.01 - decay_steps = 100 - decay_rate = 0.01 - warmup_steps = 10 - - base_lr = tf.keras.optimizers.schedules.ExponentialDecay( - initial_learning_rate=initial_lr, - decay_steps=decay_steps, - decay_rate=decay_rate) - lr = learning_rate.WarmupDecaySchedule( - lr_schedule=base_lr, warmup_steps=warmup_steps) - - for step in range(warmup_steps - 1): - config = lr.get_config() - self.assertEqual(config['warmup_steps'], warmup_steps) - self.assertAllClose( - self.evaluate(lr(step)), step / warmup_steps * initial_lr) - - def test_cosine_decay_with_warmup(self): - """Basic computational test for cosine decay with warmup.""" - expected_lrs = [0.0, 0.1, 0.05, 0.0] - - lr = learning_rate.CosineDecayWithWarmup( - batch_size=256, total_steps=3, warmup_steps=1) - - for step in [0, 1, 2, 3]: - self.assertAllClose(lr(step), expected_lrs[step]) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/mnist_main.py b/official/vision/image_classification/mnist_main.py deleted file mode 100644 index 3eba80b06a9215cb5dc4d3b13facb2f2a4f3058c..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/mnist_main.py +++ /dev/null @@ -1,176 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
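For completeness, a small sketch of wrapping a stock Keras schedule in `WarmupDecaySchedule`, mirroring the test above (import path assumed):

```python
import tensorflow as tf
from official.vision.image_classification import learning_rate

base = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=100, decay_rate=0.01)
lr = learning_rate.WarmupDecaySchedule(lr_schedule=base, warmup_steps=10)
# During warmup the rate ramps linearly toward the schedule's initial lr:
print(float(lr(5)))  # 5 / 10 * 0.01 = 0.005
```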
- -"""Runs a simple model on the MNIST dataset.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -import tensorflow_datasets as tfds -from official.common import distribute_utils -from official.utils.flags import core as flags_core -from official.utils.misc import model_helpers -from official.vision.image_classification.resnet import common - -FLAGS = flags.FLAGS - - -def build_model(): - """Constructs the ML model used to predict handwritten digits.""" - - image = tf.keras.layers.Input(shape=(28, 28, 1)) - - y = tf.keras.layers.Conv2D(filters=32, - kernel_size=5, - padding='same', - activation='relu')(image) - y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), - strides=(2, 2), - padding='same')(y) - y = tf.keras.layers.Conv2D(filters=32, - kernel_size=5, - padding='same', - activation='relu')(y) - y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), - strides=(2, 2), - padding='same')(y) - y = tf.keras.layers.Flatten()(y) - y = tf.keras.layers.Dense(1024, activation='relu')(y) - y = tf.keras.layers.Dropout(0.4)(y) - - probs = tf.keras.layers.Dense(10, activation='softmax')(y) - - model = tf.keras.models.Model(image, probs, name='mnist') - - return model - - -@tfds.decode.make_decoder(output_dtype=tf.float32) -def decode_image(example, feature): - """Convert image to float32 and normalize from [0, 255] to [0.0, 1.0].""" - return tf.cast(feature.decode_example(example), dtype=tf.float32) / 255 - - -def run(flags_obj, datasets_override=None, strategy_override=None): - """Run MNIST model training and eval loop using native Keras APIs. - - Args: - flags_obj: An object containing parsed flag values. - datasets_override: A pair of `tf.data.Dataset` objects to train the model, - representing the train and test sets. - strategy_override: A `tf.distribute.Strategy` object to use for model. - - Returns: - Dictionary of training and eval stats. - """ - # Start TF profiler server. 
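-  # Profiles can then be captured on demand (e.g. from TensorBoard's
-  # profiler capture dialog) by pointing it at localhost:<profiler_port>.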
- tf.profiler.experimental.server.start(flags_obj.profiler_port) - - strategy = strategy_override or distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=flags_obj.num_gpus, - tpu_address=flags_obj.tpu) - - strategy_scope = distribute_utils.get_strategy_scope(strategy) - - mnist = tfds.builder('mnist', data_dir=flags_obj.data_dir) - if flags_obj.download: - mnist.download_and_prepare() - - mnist_train, mnist_test = datasets_override or mnist.as_dataset( - split=['train', 'test'], - decoders={'image': decode_image()}, # pylint: disable=no-value-for-parameter - as_supervised=True) - train_input_dataset = mnist_train.cache().repeat().shuffle( - buffer_size=50000).batch(flags_obj.batch_size) - eval_input_dataset = mnist_test.cache().repeat().batch(flags_obj.batch_size) - - with strategy_scope: - lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay( - 0.05, decay_steps=100000, decay_rate=0.96) - optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule) - - model = build_model() - model.compile( - optimizer=optimizer, - loss='sparse_categorical_crossentropy', - metrics=['sparse_categorical_accuracy']) - - num_train_examples = mnist.info.splits['train'].num_examples - train_steps = num_train_examples // flags_obj.batch_size - train_epochs = flags_obj.train_epochs - - ckpt_full_path = os.path.join(flags_obj.model_dir, 'model.ckpt-{epoch:04d}') - callbacks = [ - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True), - tf.keras.callbacks.TensorBoard(log_dir=flags_obj.model_dir), - ] - - num_eval_examples = mnist.info.splits['test'].num_examples - num_eval_steps = num_eval_examples // flags_obj.batch_size - - history = model.fit( - train_input_dataset, - epochs=train_epochs, - steps_per_epoch=train_steps, - callbacks=callbacks, - validation_steps=num_eval_steps, - validation_data=eval_input_dataset, - validation_freq=flags_obj.epochs_between_evals) - - export_path = os.path.join(flags_obj.model_dir, 'saved_model') - model.save(export_path, include_optimizer=False) - - eval_output = model.evaluate( - eval_input_dataset, steps=num_eval_steps, verbose=2) - - stats = common.build_stats(history, eval_output, callbacks) - return stats - - -def define_mnist_flags(): - """Define command line flags for MNIST model.""" - flags_core.define_base( - clean=True, - num_gpu=True, - train_epochs=True, - epochs_between_evals=True, - distribution_strategy=True) - flags_core.define_device() - flags_core.define_distribution() - flags.DEFINE_bool('download', True, - 'Whether to download data to `--data_dir`.') - flags.DEFINE_integer('profiler_port', 9012, - 'Port to start profiler server on.') - FLAGS.set_default('batch_size', 1024) - - -def main(_): - model_helpers.apply_clean(FLAGS) - stats = run(flags.FLAGS) - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - define_mnist_flags() - app.run(main) diff --git a/official/vision/image_classification/mnist_test.py b/official/vision/image_classification/mnist_test.py deleted file mode 100644 index c94396a444294b37259ba849bd8ea2f6f76997d0..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/mnist_test.py +++ /dev/null @@ -1,89 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test the Keras MNIST model on GPU.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.utils.testing import integration -from official.vision.image_classification import mnist_main - - -mnist_main.define_mnist_flags() - - -def eager_strategy_combinations(): - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - ],) - - -class KerasMnistTest(tf.test.TestCase, parameterized.TestCase): - """Unit tests for sample Keras MNIST model.""" - _tempdir = None - - @classmethod - def setUpClass(cls): # pylint: disable=invalid-name - super(KerasMnistTest, cls).setUpClass() - - def tearDown(self): - super(KerasMnistTest, self).tearDown() - tf.io.gfile.rmtree(self.get_temp_dir()) - - @combinations.generate(eager_strategy_combinations()) - def test_end_to_end(self, distribution): - """Test Keras MNIST model with `strategy`.""" - - extra_flags = [ - "-train_epochs", - "1", - # Let TFDS find the metadata folder automatically - "--data_dir=" - ] - - dummy_data = ( - tf.ones(shape=(10, 28, 28, 1), dtype=tf.int32), - tf.range(10), - ) - datasets = ( - tf.data.Dataset.from_tensor_slices(dummy_data), - tf.data.Dataset.from_tensor_slices(dummy_data), - ) - - run = functools.partial( - mnist_main.run, - datasets_override=datasets, - strategy_override=distribution) - - integration.run_synthetic( - main=run, - synth=False, - tmp_root=self.create_tempdir().full_path, - extra_flags=extra_flags) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/vision/image_classification/optimizer_factory.py b/official/vision/image_classification/optimizer_factory.py deleted file mode 100644 index 48a4512ee96438cec1367d6493f63a230b01eeb1..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/optimizer_factory.py +++ /dev/null @@ -1,181 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
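A quick smoke test of the MNIST model exercised above (a sketch; the import path is assumed):

```python
import tensorflow as tf
from official.vision.image_classification import mnist_main

model = mnist_main.build_model()
probs = model(tf.random.uniform([2, 28, 28, 1]))
print(probs.shape)  # (2, 10): softmax probabilities over the ten digits
```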
- -"""Optimizer factory for vision tasks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from typing import Any, Dict, Optional, Text - -from absl import logging -import tensorflow as tf -import tensorflow_addons as tfa - -from official.modeling import optimization -from official.vision.image_classification import learning_rate -from official.vision.image_classification.configs import base_configs - -# pylint: disable=protected-access - - -def build_optimizer( - optimizer_name: Text, - base_learning_rate: tf.keras.optimizers.schedules.LearningRateSchedule, - params: Dict[Text, Any], - model: Optional[tf.keras.Model] = None): - """Build the optimizer based on name. - - Args: - optimizer_name: String representation of the optimizer name. Examples: sgd, - momentum, rmsprop. - base_learning_rate: `tf.keras.optimizers.schedules.LearningRateSchedule` - base learning rate. - params: String -> Any dictionary representing the optimizer params. This - should contain optimizer specific parameters such as `base_learning_rate`, - `decay`, etc. - model: The `tf.keras.Model`. This is used for the shadow copy if using - `ExponentialMovingAverage`. - - Returns: - A tf.keras.Optimizer. - - Raises: - ValueError if the provided optimizer_name is not supported. - - """ - optimizer_name = optimizer_name.lower() - logging.info('Building %s optimizer with params %s', optimizer_name, params) - - if optimizer_name == 'sgd': - logging.info('Using SGD optimizer') - nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( - learning_rate=base_learning_rate, nesterov=nesterov) - elif optimizer_name == 'momentum': - logging.info('Using momentum optimizer') - nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( - learning_rate=base_learning_rate, - momentum=params['momentum'], - nesterov=nesterov) - elif optimizer_name == 'rmsprop': - logging.info('Using RMSProp') - rho = params.get('decay', None) or params.get('rho', 0.9) - momentum = params.get('momentum', 0.9) - epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.RMSprop( - learning_rate=base_learning_rate, - rho=rho, - momentum=momentum, - epsilon=epsilon) - elif optimizer_name == 'adam': - logging.info('Using Adam') - beta_1 = params.get('beta_1', 0.9) - beta_2 = params.get('beta_2', 0.999) - epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.Adam( - learning_rate=base_learning_rate, - beta_1=beta_1, - beta_2=beta_2, - epsilon=epsilon) - elif optimizer_name == 'adamw': - logging.info('Using AdamW') - weight_decay = params.get('weight_decay', 0.01) - beta_1 = params.get('beta_1', 0.9) - beta_2 = params.get('beta_2', 0.999) - epsilon = params.get('epsilon', 1e-07) - optimizer = tfa.optimizers.AdamW( - weight_decay=weight_decay, - learning_rate=base_learning_rate, - beta_1=beta_1, - beta_2=beta_2, - epsilon=epsilon) - else: - raise ValueError('Unknown optimizer %s' % optimizer_name) - - if params.get('lookahead', None): - logging.info('Using lookahead optimizer.') - optimizer = tfa.optimizers.Lookahead(optimizer) - - # Moving average should be applied last, as it's applied at test time - moving_average_decay = params.get('moving_average_decay', 0.) 
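-  # (A decay of 0. leaves the ExponentialMovingAverage wrapper below
-  # disabled.)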
- if moving_average_decay is not None and moving_average_decay > 0.: - if model is None: - raise ValueError( - '`model` must be provided if using `ExponentialMovingAverage`.') - logging.info('Including moving average decay.') - optimizer = optimization.ExponentialMovingAverage( - optimizer=optimizer, average_decay=moving_average_decay) - optimizer.shadow_copy(model) - return optimizer - - -def build_learning_rate(params: base_configs.LearningRateConfig, - batch_size: Optional[int] = None, - train_epochs: Optional[int] = None, - train_steps: Optional[int] = None): - """Build the learning rate given the provided configuration.""" - decay_type = params.name - base_lr = params.initial_lr - decay_rate = params.decay_rate - if params.decay_epochs is not None: - decay_steps = params.decay_epochs * train_steps - else: - decay_steps = 0 - if params.warmup_epochs is not None: - warmup_steps = params.warmup_epochs * train_steps - else: - warmup_steps = 0 - - lr_multiplier = params.scale_by_batch_size - - if lr_multiplier and lr_multiplier > 0: - # Scale the learning rate based on the batch size and a multiplier - base_lr *= lr_multiplier * batch_size - logging.info( - 'Scaling the learning rate based on the batch size ' - 'multiplier. New base_lr: %f', base_lr) - - if decay_type == 'exponential': - logging.info( - 'Using exponential learning rate with: ' - 'initial_learning_rate: %f, decay_steps: %d, ' - 'decay_rate: %f', base_lr, decay_steps, decay_rate) - lr = tf.keras.optimizers.schedules.ExponentialDecay( - initial_learning_rate=base_lr, - decay_steps=decay_steps, - decay_rate=decay_rate, - staircase=params.staircase) - elif decay_type == 'stepwise': - steps_per_epoch = params.examples_per_epoch // batch_size - boundaries = [boundary * steps_per_epoch for boundary in params.boundaries] - multipliers = [batch_size * multiplier for multiplier in params.multipliers] - logging.info( - 'Using stepwise learning rate. Parameters: ' - 'boundaries: %s, values: %s', boundaries, multipliers) - lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay( - boundaries=boundaries, values=multipliers) - elif decay_type == 'cosine_with_warmup': - lr = learning_rate.CosineDecayWithWarmup( - batch_size=batch_size, - total_steps=train_epochs * train_steps, - warmup_steps=warmup_steps) - if warmup_steps > 0: - if decay_type not in ['cosine_with_warmup']: - logging.info('Applying %d warmup steps to the learning rate', - warmup_steps) - lr = learning_rate.WarmupDecaySchedule( - lr, warmup_steps, warmup_lr=base_lr) - return lr diff --git a/official/vision/image_classification/optimizer_factory_test.py b/official/vision/image_classification/optimizer_factory_test.py deleted file mode 100644 index 41d71a328d6fc0d27709978ae75994f8985a166d..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/optimizer_factory_test.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
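Putting the two factories above together, a hedged end-to-end sketch (import paths assumed; the numbers are illustrative):

```python
from official.vision.image_classification import optimizer_factory
from official.vision.image_classification.configs import base_configs

lr_params = base_configs.LearningRateConfig(
    name='exponential', initial_lr=0.008, decay_epochs=2.4, decay_rate=0.97,
    warmup_epochs=5, scale_by_batch_size=1. / 128., staircase=True)
# 5004 steps/epoch ~= 1,281,167 examples / global batch size 256.
lr = optimizer_factory.build_learning_rate(
    params=lr_params, batch_size=256, train_epochs=90, train_steps=5004)
optimizer = optimizer_factory.build_optimizer(
    optimizer_name='rmsprop', base_learning_rate=lr,
    params={'decay': 0.9, 'epsilon': 0.001, 'momentum': 0.9})
```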
- -"""Tests for optimizer_factory.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl.testing import parameterized - -import tensorflow as tf -from official.vision.image_classification import optimizer_factory -from official.vision.image_classification.configs import base_configs - - -class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): - - def build_toy_model(self) -> tf.keras.Model: - """Creates a toy `tf.Keras.Model`.""" - model = tf.keras.Sequential() - model.add(tf.keras.layers.Dense(1, input_shape=(1,))) - return model - - @parameterized.named_parameters( - ('sgd', 'sgd', 0., False), ('momentum', 'momentum', 0., False), - ('rmsprop', 'rmsprop', 0., False), ('adam', 'adam', 0., False), - ('adamw', 'adamw', 0., False), - ('momentum_lookahead', 'momentum', 0., True), - ('sgd_ema', 'sgd', 0.999, False), - ('momentum_ema', 'momentum', 0.999, False), - ('rmsprop_ema', 'rmsprop', 0.999, False)) - def test_optimizer(self, optimizer_name, moving_average_decay, lookahead): - """Smoke test to be sure no syntax errors.""" - model = self.build_toy_model() - params = { - 'learning_rate': 0.001, - 'rho': 0.09, - 'momentum': 0., - 'epsilon': 1e-07, - 'moving_average_decay': moving_average_decay, - 'lookahead': lookahead, - } - optimizer = optimizer_factory.build_optimizer( - optimizer_name=optimizer_name, - base_learning_rate=params['learning_rate'], - params=params, - model=model) - self.assertTrue(issubclass(type(optimizer), tf.keras.optimizers.Optimizer)) - - def test_unknown_optimizer(self): - with self.assertRaises(ValueError): - optimizer_factory.build_optimizer( - optimizer_name='this_optimizer_does_not_exist', - base_learning_rate=None, - params=None) - - def test_learning_rate_without_decay_or_warmups(self): - params = base_configs.LearningRateConfig( - name='exponential', - initial_lr=0.01, - decay_rate=0.01, - decay_epochs=None, - warmup_epochs=None, - scale_by_batch_size=0.01, - examples_per_epoch=1, - boundaries=[0], - multipliers=[0, 1]) - batch_size = 1 - train_steps = 1 - - lr = optimizer_factory.build_learning_rate( - params=params, batch_size=batch_size, train_steps=train_steps) - self.assertTrue( - issubclass( - type(lr), tf.keras.optimizers.schedules.LearningRateSchedule)) - - @parameterized.named_parameters(('exponential', 'exponential'), - ('cosine_with_warmup', 'cosine_with_warmup')) - def test_learning_rate_with_decay_and_warmup(self, lr_decay_type): - """Basic smoke test for syntax.""" - params = base_configs.LearningRateConfig( - name=lr_decay_type, - initial_lr=0.01, - decay_rate=0.01, - decay_epochs=1, - warmup_epochs=1, - scale_by_batch_size=0.01, - examples_per_epoch=1, - boundaries=[0], - multipliers=[0, 1]) - batch_size = 1 - train_epochs = 1 - train_steps = 1 - - lr = optimizer_factory.build_learning_rate( - params=params, - batch_size=batch_size, - train_epochs=train_epochs, - train_steps=train_steps) - self.assertTrue( - issubclass( - type(lr), tf.keras.optimizers.schedules.LearningRateSchedule)) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/preprocessing.py b/official/vision/image_classification/preprocessing.py deleted file mode 100644 index bd7e2e1d19faab1a4257f81bc59a5845d75b1823..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/preprocessing.py +++ /dev/null @@ -1,390 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Preprocessing functions for images."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-from typing import List, Optional, Text, Tuple
-
-from official.vision.image_classification import augment
-
-
-# Calculated from the ImageNet training set
-MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255)
-STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255)
-
-IMAGE_SIZE = 224
-CROP_PADDING = 32
-
-
-def mean_image_subtraction(
-    image_bytes: tf.Tensor,
-    means: Tuple[float, ...],
-    num_channels: int = 3,
-    dtype: tf.dtypes.DType = tf.float32,
-) -> tf.Tensor:
-  """Subtracts the given means from each image channel.
-
-  For example:
-    means = [123.68, 116.779, 103.939]
-    image_bytes = mean_image_subtraction(image_bytes, means)
-
-  Note that the rank of `image` must be known.
-
-  Args:
-    image_bytes: a tensor of size [height, width, C].
-    means: a C-vector of values to subtract from each channel.
-    num_channels: number of color channels in the image that will be distorted.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    the centered image.
-
-  Raises:
-    ValueError: If the rank of `image` is unknown, if `image` has a rank other
-      than three or if the number of channels in `image` doesn't match the
-      number of values in `means`.
-  """
-  if image_bytes.get_shape().ndims != 3:
-    raise ValueError('Input must be of size [height, width, C>0]')
-
-  if len(means) != num_channels:
-    raise ValueError('len(means) must match the number of channels')
-
-  # We have a 1-D tensor of means; convert to 3-D.
-  # Note(b/130245863): we explicitly call `broadcast` instead of simply
-  # expanding dimensions for better performance.
-  means = tf.broadcast_to(means, tf.shape(image_bytes))
-  if dtype is not None:
-    means = tf.cast(means, dtype=dtype)
-
-  return image_bytes - means
-
-
-def standardize_image(
-    image_bytes: tf.Tensor,
-    stddev: Tuple[float, ...],
-    num_channels: int = 3,
-    dtype: tf.dtypes.DType = tf.float32,
-) -> tf.Tensor:
-  """Divides each image channel by the given stddev value.
-
-  For example:
-    stddev = [58.395, 57.12, 57.375]
-    image_bytes = standardize_image(image_bytes, stddev)
-
-  Note that the rank of `image` must be known.
-
-  Args:
-    image_bytes: a tensor of size [height, width, C].
-    stddev: a C-vector of values by which each channel is divided.
-    num_channels: number of color channels in the image that will be distorted.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    the standardized image.
-
-  Raises:
-    ValueError: If the rank of `image` is unknown, if `image` has a rank other
-      than three or if the number of channels in `image` doesn't match the
-      number of values in `stddev`.
- """ - if image_bytes.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(stddev) != num_channels: - raise ValueError('len(stddev) must match the number of channels') - - # We have a 1-D tensor of stddev; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. - stddev = tf.broadcast_to(stddev, tf.shape(image_bytes)) - if dtype is not None: - stddev = tf.cast(stddev, dtype=dtype) - - return image_bytes / stddev - - -def normalize_images(features: tf.Tensor, - mean_rgb: Tuple[float, ...] = MEAN_RGB, - stddev_rgb: Tuple[float, ...] = STDDEV_RGB, - num_channels: int = 3, - dtype: tf.dtypes.DType = tf.float32, - data_format: Text = 'channels_last') -> tf.Tensor: - """Normalizes the input image channels with the given mean and stddev. - - Args: - features: `Tensor` representing decoded images in float format. - mean_rgb: the mean of the channels to subtract. - stddev_rgb: the stddev of the channels to divide. - num_channels: the number of channels in the input image tensor. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - data_format: the format of the input image tensor - ['channels_first', 'channels_last']. - - Returns: - A normalized image `Tensor`. - """ - # TODO(allencwang) - figure out how to use mean_image_subtraction and - # standardize_image on batches of images and replace the following. - if data_format == 'channels_first': - stats_shape = [num_channels, 1, 1] - else: - stats_shape = [1, 1, num_channels] - - if dtype is not None: - features = tf.image.convert_image_dtype(features, dtype=dtype) - - if mean_rgb is not None: - mean_rgb = tf.constant(mean_rgb, - shape=stats_shape, - dtype=features.dtype) - mean_rgb = tf.broadcast_to(mean_rgb, tf.shape(features)) - features = features - mean_rgb - - if stddev_rgb is not None: - stddev_rgb = tf.constant(stddev_rgb, - shape=stats_shape, - dtype=features.dtype) - stddev_rgb = tf.broadcast_to(stddev_rgb, tf.shape(features)) - features = features / stddev_rgb - - return features - - -def decode_and_center_crop(image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - crop_padding: int = CROP_PADDING) -> tf.Tensor: - """Crops to center of image with padding then scales image_size. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - image_size: image height/width dimension. - crop_padding: the padding size to use when centering the crop. - - Returns: - A decoded and cropped image `Tensor`. 
- """ - decoded = image_bytes.dtype != tf.string - shape = (tf.shape(image_bytes) if decoded - else tf.image.extract_jpeg_shape(image_bytes)) - image_height = shape[0] - image_width = shape[1] - - padded_center_crop_size = tf.cast( - ((image_size / (image_size + crop_padding)) * - tf.cast(tf.minimum(image_height, image_width), tf.float32)), - tf.int32) - - offset_height = ((image_height - padded_center_crop_size) + 1) // 2 - offset_width = ((image_width - padded_center_crop_size) + 1) // 2 - crop_window = tf.stack([offset_height, offset_width, - padded_center_crop_size, padded_center_crop_size]) - if decoded: - image = tf.image.crop_to_bounding_box( - image_bytes, - offset_height=offset_height, - offset_width=offset_width, - target_height=padded_center_crop_size, - target_width=padded_center_crop_size) - else: - image = tf.image.decode_and_crop_jpeg(image_bytes, crop_window, channels=3) - - image = resize_image(image_bytes=image, - height=image_size, - width=image_size) - - return image - - -def decode_crop_and_flip(image_bytes: tf.Tensor) -> tf.Tensor: - """Crops an image to a random part of the image, then randomly flips. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - - Returns: - A decoded and cropped image `Tensor`. - - """ - decoded = image_bytes.dtype != tf.string - bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) - shape = (tf.shape(image_bytes) if decoded - else tf.image.extract_jpeg_shape(image_bytes)) - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - shape, - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=[0.75, 1.33], - area_range=[0.05, 1.0], - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Reassemble the bounding box in the format the crop op requires. - offset_height, offset_width, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - crop_window = tf.stack([offset_height, offset_width, - target_height, target_width]) - if decoded: - cropped = tf.image.crop_to_bounding_box( - image_bytes, - offset_height=offset_height, - offset_width=offset_width, - target_height=target_height, - target_width=target_width) - else: - cropped = tf.image.decode_and_crop_jpeg(image_bytes, - crop_window, - channels=3) - - # Flip to add a little more random distortion in. - cropped = tf.image.random_flip_left_right(cropped) - return cropped - - -def resize_image(image_bytes: tf.Tensor, - height: int = IMAGE_SIZE, - width: int = IMAGE_SIZE) -> tf.Tensor: - """Resizes an image to a given height and width. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - height: image height dimension. - width: image width dimension. - - Returns: - A tensor containing the resized image. - - """ - return tf.compat.v1.image.resize( - image_bytes, [height, width], method=tf.image.ResizeMethod.BILINEAR, - align_corners=False) - - -def preprocess_for_eval( - image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - num_channels: int = 3, - mean_subtract: bool = False, - standardize: bool = False, - dtype: tf.dtypes.DType = tf.float32 -) -> tf.Tensor: - """Preprocesses the given image for evaluation. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - image_size: image height/width dimension. - num_channels: number of image input channels. - mean_subtract: whether or not to apply mean subtraction. 
standardize: whether or not to apply standardization.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    A preprocessed and normalized image `Tensor`.
-  """
-  images = decode_and_center_crop(image_bytes, image_size)
-  images = tf.reshape(images, [image_size, image_size, num_channels])
-
-  if mean_subtract:
-    images = mean_image_subtraction(image_bytes=images, means=MEAN_RGB)
-  if standardize:
-    images = standardize_image(image_bytes=images, stddev=STDDEV_RGB)
-  if dtype is not None:
-    images = tf.image.convert_image_dtype(images, dtype=dtype)
-
-  return images
-
-
-def load_eval_image(filename: Text, image_size: int = IMAGE_SIZE) -> tf.Tensor:
-  """Reads an image from the filesystem and applies image preprocessing.
-
-  Args:
-    filename: a filename path of an image.
-    image_size: image height/width dimension.
-
-  Returns:
-    A preprocessed and normalized image `Tensor`.
-  """
-  image_bytes = tf.io.read_file(filename)
-  image = preprocess_for_eval(image_bytes, image_size)
-
-  return image
-
-
-def build_eval_dataset(filenames: List[Text],
-                       labels: Optional[List[int]] = None,
-                       image_size: int = IMAGE_SIZE,
-                       batch_size: int = 1) -> tf.data.Dataset:
-  """Builds a tf.data.Dataset from a list of filenames and labels.
-
-  Args:
-    filenames: a list of filename paths of images.
-    labels: a list of labels corresponding to each image.
-    image_size: image height/width dimension.
-    batch_size: the batch size used by the dataset.
-
-  Returns:
-    A `tf.data.Dataset` of batched (image, label) pairs, where each image has
-    been preprocessed and normalized.
-  """
-  if labels is None:
-    labels = [0] * len(filenames)
-
-  filenames = tf.constant(filenames)
-  labels = tf.constant(labels)
-  dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
-
-  dataset = dataset.map(
-      lambda filename, label: (load_eval_image(filename, image_size), label))
-  dataset = dataset.batch(batch_size)
-
-  return dataset
-
-
-def preprocess_for_train(image_bytes: tf.Tensor,
-                         image_size: int = IMAGE_SIZE,
-                         augmenter: Optional[augment.ImageAugment] = None,
-                         mean_subtract: bool = False,
-                         standardize: bool = False,
-                         dtype: tf.dtypes.DType = tf.float32) -> tf.Tensor:
-  """Preprocesses the given image for training.
-
-  Args:
-    image_bytes: `Tensor` representing an image binary of
-      arbitrary size of dtype tf.uint8.
-    image_size: image height/width dimension.
-    augmenter: the image augmenter to apply.
-    mean_subtract: whether or not to apply mean subtraction.
-    standardize: whether or not to apply standardization.
-    dtype: the dtype to convert the images to. Set to `None` to skip conversion.
-
-  Returns:
-    A preprocessed and normalized image `Tensor`.
-  """
-  images = decode_crop_and_flip(image_bytes=image_bytes)
-  images = resize_image(images, height=image_size, width=image_size)
-  if augmenter is not None:
-    images = augmenter.distort(images)
-  if mean_subtract:
-    images = mean_image_subtraction(image_bytes=images, means=MEAN_RGB)
-  if standardize:
-    images = standardize_image(image_bytes=images, stddev=STDDEV_RGB)
-  if dtype is not None:
-    images = tf.image.convert_image_dtype(images, dtype)
-
-  return images
diff --git a/official/vision/image_classification/resnet/README.md b/official/vision/image_classification/resnet/README.md
deleted file mode 100644
index 5064523fbdcd4222c2159bdc1c09b7156800bf54..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/README.md
+++ /dev/null
@@ -1,125 +0,0 @@
-This folder contains a
-[custom training loop (CTL)](#resnet-custom-training-loop) implementation for
-ResNet50.
-
-## Before you begin
-Please refer to the [README](../README.md) in the parent directory for
-information on setup and preparing the data.
-
-## ResNet (custom training loop)
-
-Similar to the [estimator implementation](../../../r1/resnet), the Keras
-implementation has code for the ImageNet dataset. The ImageNet
-version uses a ResNet50 model implemented in
-[`resnet_model.py`](./resnet_model.py).
-
-
-### Pretrained Models
-
-* [ResNet50 Checkpoints](https://storage.googleapis.com/cloud-tpu-checkpoints/resnet/resnet50.tar.gz)
-
-* ResNet50 TFHub: [feature vector](https://tfhub.dev/tensorflow/resnet_50/feature_vector/1)
-and [classification](https://tfhub.dev/tensorflow/resnet_50/classification/1)
-
-Again, if you did not download the data to the default directory, specify the
-location with the `--data_dir` flag:
-
-```bash
-python3 resnet_ctl_imagenet_main.py --data_dir=/path/to/imagenet
-```
-
-There are more flag options you can specify. Here are some examples:
-
-- `--use_synthetic_data`: when set to true, synthetic data, rather than real
-data, is used;
-- `--batch_size`: the batch size used for the model;
-- `--model_dir`: the directory to save the model checkpoint;
-- `--train_epochs`: the number of epochs to run for training the model;
-- `--train_steps`: the number of steps to run for training the model. Only
-values smaller than the number of batches in an epoch are supported;
-- `--skip_eval`: when set to true, both evaluation and validation during
-training are skipped.
-
-For example, this is a typical command line to run with ImageNet data with
-batch size 128 per GPU:
-
-```bash
-python3 resnet_ctl_imagenet_main.py \
-    --model_dir=/tmp/model_dir/something \
-    --num_gpus=2 \
-    --batch_size=128 \
-    --train_epochs=90 \
-    --train_steps=10 \
-    --use_synthetic_data=false
-```
-
-See [`common.py`](common.py) for the full list of options.
-
-### Using multiple GPUs
-
-You can train these models on multiple GPUs using the `tf.distribute.Strategy`
-API. You can read more about it in this
-[guide](https://www.tensorflow.org/guide/distribute_strategy).
-
-In this example, we have made it easy to use with just a command line flag:
-`--num_gpus`. By default this flag is 1 if TensorFlow is compiled with CUDA,
-and 0 otherwise.
-
-- `--num_gpus=0`: Uses `tf.distribute.OneDeviceStrategy` with CPU as the device.
-- `--num_gpus=1`: Uses `tf.distribute.OneDeviceStrategy` with GPU as the device.
-- `--num_gpus=2+`: Uses `tf.distribute.MirroredStrategy` to run synchronous
-distributed training across the GPUs.
-
-If you wish to run without `tf.distribute.Strategy`, you can do so by setting
-`--distribution_strategy=off`.
-
-### Running on multiple GPU hosts
-
-You can also train these models on multiple hosts, each with GPUs, using
-`tf.distribute.Strategy`.
-
-The easiest way to run multi-host benchmarks is to set the
-[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
-appropriately at each host. e.g., to run using `MultiWorkerMirroredStrategy` on
-2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and
-host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker",
-"index": i}`. `MultiWorkerMirroredStrategy` will automatically use all the
-available GPUs at each host.
-
-### Running on Cloud TPUs
-
-Note: This model will **not** work with TPUs on Colab.
-
-You can train the ResNet CTL model on Cloud TPUs using
-`tf.distribute.TPUStrategy`.
If you are not familiar with Cloud TPUs, it is
-strongly recommended that you go through the
-[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
-create a TPU and GCE VM.
-
-To run the ResNet model on a TPU, you must set `--distribution_strategy=tpu`
-and `--tpu=$TPU_NAME`, where `$TPU_NAME` is the name of your TPU in the Cloud
-Console. From a GCE VM, you can run the following command to train ResNet for
-one epoch on a v2-8 or v3-8 TPU by setting `TRAIN_EPOCHS` to 1:
-
-```bash
-python3 resnet_ctl_imagenet_main.py \
-    --tpu=$TPU_NAME \
-    --model_dir=$MODEL_DIR \
-    --data_dir=$DATA_DIR \
-    --batch_size=1024 \
-    --steps_per_loop=500 \
-    --train_epochs=$TRAIN_EPOCHS \
-    --use_synthetic_data=false \
-    --dtype=fp32 \
-    --enable_eager=true \
-    --enable_tensorboard=true \
-    --distribution_strategy=tpu \
-    --log_steps=50 \
-    --single_l2_loss_op=true \
-    --use_tf_function=true
-```
-
-To train ResNet to convergence, run it for 90 epochs by setting
-`TRAIN_EPOCHS` to 90.
-
-Note: `$MODEL_DIR` and `$DATA_DIR` must be GCS paths.
diff --git a/official/vision/image_classification/resnet/__init__.py b/official/vision/image_classification/resnet/__init__.py
deleted file mode 100644
index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
diff --git a/official/vision/image_classification/resnet/common.py b/official/vision/image_classification/resnet/common.py
deleted file mode 100644
index a034ba7dd0be5b2b2536727137497c84519001a5..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/common.py
+++ /dev/null
@@ -1,418 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Common util functions and classes used by both Keras CIFAR and ImageNet."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import tensorflow as tf
-
-import tensorflow_model_optimization as tfmot
-from official.utils.flags import core as flags_core
-from official.utils.misc import keras_utils
-
-FLAGS = flags.FLAGS
-BASE_LEARNING_RATE = 0.1  # This matches Jing's version.
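-# BASE_LEARNING_RATE assumes a global batch size of 256; the
-# PiecewiseConstantDecayWithWarmup schedule below rescales it linearly, e.g. a
-# global batch size of 1024 yields 0.1 * 1024 / 256 = 0.4 after warmup.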
-TRAIN_TOP_1 = 'training_accuracy_top_1' -LR_SCHEDULE = [ # (multiplier, epoch to start) tuples - (1.0, 5), (0.1, 30), (0.01, 60), (0.001, 80) -] - - -class PiecewiseConstantDecayWithWarmup( - tf.keras.optimizers.schedules.LearningRateSchedule): - """Piecewise constant decay with warmup schedule.""" - - def __init__(self, - batch_size, - epoch_size, - warmup_epochs, - boundaries, - multipliers, - compute_lr_on_cpu=True, - name=None): - super(PiecewiseConstantDecayWithWarmup, self).__init__() - if len(boundaries) != len(multipliers) - 1: - raise ValueError('The length of boundaries must be 1 less than the ' - 'length of multipliers') - - base_lr_batch_size = 256 - steps_per_epoch = epoch_size // batch_size - - self.rescaled_lr = BASE_LEARNING_RATE * batch_size / base_lr_batch_size - self.step_boundaries = [float(steps_per_epoch) * x for x in boundaries] - self.lr_values = [self.rescaled_lr * m for m in multipliers] - self.warmup_steps = warmup_epochs * steps_per_epoch - self.compute_lr_on_cpu = compute_lr_on_cpu - self.name = name - - self.learning_rate_ops_cache = {} - - def __call__(self, step): - if tf.executing_eagerly(): - return self._get_learning_rate(step) - - # In an eager function or graph, the current implementation of optimizer - # repeatedly call and thus create ops for the learning rate schedule. To - # avoid this, we cache the ops if not executing eagerly. - graph = tf.compat.v1.get_default_graph() - if graph not in self.learning_rate_ops_cache: - if self.compute_lr_on_cpu: - with tf.device('/device:CPU:0'): - self.learning_rate_ops_cache[graph] = self._get_learning_rate(step) - else: - self.learning_rate_ops_cache[graph] = self._get_learning_rate(step) - return self.learning_rate_ops_cache[graph] - - def _get_learning_rate(self, step): - """Compute learning rate at given step.""" - with tf.name_scope('PiecewiseConstantDecayWithWarmup'): - - def warmup_lr(step): - return self.rescaled_lr * ( - tf.cast(step, tf.float32) / tf.cast(self.warmup_steps, tf.float32)) - - def piecewise_lr(step): - return tf.compat.v1.train.piecewise_constant(step, self.step_boundaries, - self.lr_values) - - return tf.cond(step < self.warmup_steps, lambda: warmup_lr(step), - lambda: piecewise_lr(step)) - - def get_config(self): - return { - 'rescaled_lr': self.rescaled_lr, - 'step_boundaries': self.step_boundaries, - 'lr_values': self.lr_values, - 'warmup_steps': self.warmup_steps, - 'compute_lr_on_cpu': self.compute_lr_on_cpu, - 'name': self.name - } - - -def get_optimizer(learning_rate=0.1): - """Returns optimizer to use.""" - # The learning_rate is overwritten at the beginning of each step by callback. 
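-  # SGD with momentum 0.9 is the standard recipe for ResNet-style training;
-  # the `learning_rate` argument only seeds the initial value.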
- return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9) - - -def get_callbacks(pruning_method=None, - enable_checkpoint_and_export=False, - model_dir=None): - """Returns common callbacks.""" - time_callback = keras_utils.TimeHistory( - FLAGS.batch_size, - FLAGS.log_steps, - logdir=FLAGS.model_dir if FLAGS.enable_tensorboard else None) - callbacks = [time_callback] - - if FLAGS.enable_tensorboard: - tensorboard_callback = tf.keras.callbacks.TensorBoard( - log_dir=FLAGS.model_dir, profile_batch=FLAGS.profile_steps) - callbacks.append(tensorboard_callback) - - is_pruning_enabled = pruning_method is not None - if is_pruning_enabled: - callbacks.append(tfmot.sparsity.keras.UpdatePruningStep()) - if model_dir is not None: - callbacks.append( - tfmot.sparsity.keras.PruningSummaries( - log_dir=model_dir, profile_batch=0)) - - if enable_checkpoint_and_export: - if model_dir is not None: - ckpt_full_path = os.path.join(model_dir, 'model.ckpt-{epoch:04d}') - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True)) - return callbacks - - -def build_stats(history, eval_output, callbacks): - """Normalizes and returns dictionary of stats. - - Args: - history: Results of the training step. Supports both categorical_accuracy - and sparse_categorical_accuracy. - eval_output: Output of the eval step. Assumes first value is eval_loss and - second value is accuracy_top_1. - callbacks: a list of callbacks which might include a time history callback - used during keras.fit. - - Returns: - Dictionary of normalized results. - """ - stats = {} - if eval_output: - stats['accuracy_top_1'] = float(eval_output[1]) - stats['eval_loss'] = float(eval_output[0]) - if history and history.history: - train_hist = history.history - # Gets final loss from training. - stats['loss'] = float(train_hist['loss'][-1]) - # Gets top_1 training accuracy. 
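-    # The history key depends on how the model was compiled: Keras records
-    # 'categorical_accuracy' for one-hot labels, 'sparse_categorical_accuracy'
-    # for integer labels, and 'accuracy' when the metric was requested under
-    # that alias.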
- if 'categorical_accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['categorical_accuracy'][-1]) - elif 'sparse_categorical_accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['sparse_categorical_accuracy'][-1]) - elif 'accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['accuracy'][-1]) - - if not callbacks: - return stats - - # Look for the time history callback which was used during keras.fit - for callback in callbacks: - if isinstance(callback, keras_utils.TimeHistory): - timestamp_log = callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = callback.train_finish_time - if callback.epoch_runtime_log: - stats['avg_exp_per_second'] = callback.average_examples_per_second - - return stats - - -def define_keras_flags(model=False, - optimizer=False, - pretrained_filepath=False): - """Define flags for Keras models.""" - flags_core.define_base( - clean=True, - num_gpu=True, - run_eagerly=True, - train_epochs=True, - epochs_between_evals=True, - distribution_strategy=True) - flags_core.define_performance( - num_parallel_calls=False, - synthetic_data=True, - dtype=True, - all_reduce_alg=True, - num_packs=True, - tf_gpu_thread_mode=True, - datasets_num_private_threads=True, - loss_scale=True, - fp16_implementation=True, - tf_data_experimental_slack=True, - enable_xla=True, - training_dataset_cache=True) - flags_core.define_image() - flags_core.define_benchmark() - flags_core.define_distribution() - flags.adopt_module_key_flags(flags_core) - - flags.DEFINE_boolean(name='enable_eager', default=False, help='Enable eager?') - flags.DEFINE_boolean(name='skip_eval', default=False, help='Skip evaluation?') - # TODO(b/135607288): Remove this flag once we understand the root cause of - # slowdown when setting the learning phase in Keras backend. - flags.DEFINE_boolean( - name='set_learning_phase_to_train', - default=True, - help='If skip eval, also set Keras learning phase to 1 (training).') - flags.DEFINE_boolean( - name='explicit_gpu_placement', - default=False, - help='If not using distribution strategy, explicitly set device scope ' - 'for the Keras training loop.') - flags.DEFINE_boolean( - name='use_trivial_model', - default=False, - help='Whether to use a trivial Keras model.') - flags.DEFINE_boolean( - name='report_accuracy_metrics', - default=True, - help='Report metrics during training and evaluation.') - flags.DEFINE_boolean( - name='use_tensor_lr', - default=True, - help='Use learning rate tensor instead of a callback.') - flags.DEFINE_boolean( - name='enable_tensorboard', - default=False, - help='Whether to enable Tensorboard callback.') - flags.DEFINE_string( - name='profile_steps', - default=None, - help='Save profiling data to model dir at given range of global steps. The ' - 'value must be a comma separated pair of positive integers, specifying ' - 'the first and last step to profile. For example, "--profile_steps=2,4" ' - 'triggers the profiler to process 3 steps, starting from the 2nd step. ' - 'Note that profiler has a non-trivial performance overhead, and the ' - 'output file can be gigantic if profiling many steps.') - flags.DEFINE_integer( - name='train_steps', - default=None, - help='The number of steps to run for training. If it is larger than ' - '# batches per epoch, then use # batches per epoch. This flag will be ' - 'ignored if train_epochs is set to be larger than 1. 
')
-  flags.DEFINE_boolean(
-      name='batchnorm_spatial_persistent',
-      default=True,
-      help='Enable the spatial persistent mode for CuDNN batch norm kernel.')
-  flags.DEFINE_boolean(
-      name='enable_get_next_as_optional',
-      default=False,
-      help='Enable get_next_as_optional behavior in DistributedIterator.')
-  flags.DEFINE_boolean(
-      name='enable_checkpoint_and_export',
-      default=False,
-      help='Whether to enable a checkpoint callback and export the SavedModel.')
-  flags.DEFINE_string(name='tpu', default='', help='TPU address to connect to.')
-  flags.DEFINE_integer(
-      name='steps_per_loop',
-      default=None,
-      help='Number of steps per training loop. Only the training step happens '
-      'inside the loop. Callbacks will not be called inside. Will be capped at '
-      'steps per epoch.')
-  flags.DEFINE_boolean(
-      name='use_tf_while_loop',
-      default=True,
-      help='Whether to build a tf.while_loop inside the training loop on the '
-      'host. Setting it to True is critical to have peak performance on '
-      'TPU.')
-
-  if model:
-    flags.DEFINE_string('model', 'resnet50_v1.5',
-                        'Name of model preset. (mobilenet, resnet50_v1.5)')
-  if optimizer:
-    flags.DEFINE_string(
-        'optimizer', 'resnet50_default', 'Name of optimizer preset. '
-        '(mobilenet_default, resnet50_default)')
-    # TODO(kimjaehong): Replace with general hyper-params, not only for
-    # mobilenet.
-    flags.DEFINE_float(
-        'initial_learning_rate_per_sample', 0.00007,
-        'Initial value of learning rate per sample for '
-        'mobilenet_default.')
-    flags.DEFINE_float('lr_decay_factor', 0.94,
-                       'Learning rate decay factor for mobilenet_default.')
-    flags.DEFINE_float('num_epochs_per_decay', 2.5,
-                       'Number of epochs per decay for mobilenet_default.')
-  if pretrained_filepath:
-    flags.DEFINE_string('pretrained_filepath', '', 'Pretrained file path.')
-
-
-def get_synth_data(height, width, num_channels, num_classes, dtype):
-  """Creates a set of synthetic random data.
-
-  Args:
-    height: Integer height that will be used to create a fake image tensor.
-    width: Integer width that will be used to create a fake image tensor.
-    num_channels: Integer depth that will be used to create a fake image tensor.
-    num_classes: Number of classes that should be represented in the fake labels
-      tensor.
-    dtype: Data type for features/images.
-
-  Returns:
-    A tuple of tensors representing the inputs and labels.
-
-  """
-  # Synthetic input should be within [0, 255].
-  inputs = tf.random.truncated_normal([height, width, num_channels],
-                                      dtype=dtype,
-                                      mean=127,
-                                      stddev=60,
-                                      name='synthetic_inputs')
-  labels = tf.random.uniform([1],
-                             minval=0,
-                             maxval=num_classes - 1,
-                             dtype=tf.int32,
-                             name='synthetic_labels')
-  return inputs, labels
-
-
-def define_pruning_flags():
-  """Define flags for pruning methods."""
-  flags.DEFINE_string(
-      'pruning_method', None, 'Pruning method.'
- 'None (no pruning) or polynomial_decay.') - flags.DEFINE_float('pruning_initial_sparsity', 0.0, - 'Initial sparsity for pruning.') - flags.DEFINE_float('pruning_final_sparsity', 0.5, - 'Final sparsity for pruning.') - flags.DEFINE_integer('pruning_begin_step', 0, 'Begin step for pruning.') - flags.DEFINE_integer('pruning_end_step', 100000, 'End step for pruning.') - flags.DEFINE_integer('pruning_frequency', 100, 'Frequency for pruning.') - - -def define_clustering_flags(): - """Define flags for clustering methods.""" - flags.DEFINE_string('clustering_method', None, - 'None (no clustering) or selective_clustering ' - '(cluster last three Conv2D layers of the model).') - - -def get_synth_input_fn(height, - width, - num_channels, - num_classes, - dtype=tf.float32, - drop_remainder=True): - """Returns an input function that returns a dataset with random data. - - This input_fn returns a data set that iterates over a set of random data and - bypasses all preprocessing, e.g. jpeg decode and copy. The host to device - copy is still included. This used to find the upper throughput bound when - tuning the full input pipeline. - - Args: - height: Integer height that will be used to create a fake image tensor. - width: Integer width that will be used to create a fake image tensor. - num_channels: Integer depth that will be used to create a fake image tensor. - num_classes: Number of classes that should be represented in the fake labels - tensor - dtype: Data type for features/images. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - - Returns: - An input_fn that can be used in place of a real one to return a dataset - that can be used for iteration. - """ - - # pylint: disable=unused-argument - def input_fn(is_training, data_dir, batch_size, *args, **kwargs): - """Returns dataset filled with random data.""" - inputs, labels = get_synth_data( - height=height, - width=width, - num_channels=num_channels, - num_classes=num_classes, - dtype=dtype) - # Cast to float32 for Keras model. - labels = tf.cast(labels, dtype=tf.float32) - data = tf.data.Dataset.from_tensors((inputs, labels)).repeat() - - # `drop_remainder` will make dataset produce outputs with known shapes. - data = data.batch(batch_size, drop_remainder=drop_remainder) - data = data.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - return data - - return input_fn - - -def set_cudnn_batchnorm_mode(): - """Set CuDNN batchnorm mode for better performance. - - Note: Spatial Persistent mode may lead to accuracy losses for certain - models. - """ - if FLAGS.batchnorm_spatial_persistent: - os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1' - else: - os.environ.pop('TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT', None) diff --git a/official/vision/image_classification/resnet/imagenet_preprocessing.py b/official/vision/image_classification/resnet/imagenet_preprocessing.py deleted file mode 100644 index 86ba3ed98084987ea5d63edf8fd5f515d58fba93..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/imagenet_preprocessing.py +++ /dev/null @@ -1,574 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Provides utilities to preprocess images. - -Training images are sampled using the provided bounding boxes, and subsequently -cropped to the sampled bounding box. Images are additionally flipped randomly, -then resized to the target output size (without aspect-ratio preservation). - -Images used during evaluation are resized (with aspect-ratio preservation) and -centrally cropped. - -All images undergo mean color subtraction. - -Note that these steps are colloquially referred to as "ResNet preprocessing," -and they differ from "VGG preprocessing," which does not use bounding boxes -and instead does an aspect-preserving resize followed by random crop during -training. (These both differ from "Inception preprocessing," which introduces -color distortion steps.) - -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -from absl import logging -import tensorflow as tf - -DEFAULT_IMAGE_SIZE = 224 -NUM_CHANNELS = 3 -NUM_CLASSES = 1001 - -NUM_IMAGES = { - 'train': 1281167, - 'validation': 50000, -} - -_NUM_TRAIN_FILES = 1024 -_SHUFFLE_BUFFER = 10000 - -_R_MEAN = 123.68 -_G_MEAN = 116.78 -_B_MEAN = 103.94 -CHANNEL_MEANS = [_R_MEAN, _G_MEAN, _B_MEAN] - -# The lower bound for the smallest side of the image for aspect-preserving -# resizing. For example, if an image is 500 x 1000, it will be resized to -# _RESIZE_MIN x (_RESIZE_MIN * 2). -_RESIZE_MIN = 256 - - -def process_record_dataset(dataset, - is_training, - batch_size, - shuffle_buffer, - parse_record_fn, - dtype=tf.float32, - datasets_num_private_threads=None, - drop_remainder=False, - tf_data_experimental_slack=False): - """Given a Dataset with raw records, return an iterator over the records. - - Args: - dataset: A Dataset representing raw records - is_training: A boolean denoting whether the input is for training. - batch_size: The number of samples per batch. - shuffle_buffer: The buffer size to use when shuffling records. A larger - value results in better randomness, but smaller values reduce startup time - and use less memory. - parse_record_fn: A function that takes a raw record and returns the - corresponding (image, label) pair. - dtype: Data type to use for images/features. - datasets_num_private_threads: Number of threads for a private threadpool - created for all datasets computation. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - tf_data_experimental_slack: Whether to enable tf.data's `experimental_slack` - option. - - Returns: - Dataset of (image, label) pairs ready for iteration. - """ - # Defines a specific size thread pool for tf.data operations. - if datasets_num_private_threads: - options = tf.data.Options() - options.experimental_threading.private_threadpool_size = ( - datasets_num_private_threads) - dataset = dataset.with_options(options) - logging.info('datasets_num_private_threads: %s', - datasets_num_private_threads) - - if is_training: - # Shuffles records before repeating to respect epoch boundaries. 
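-    # Shuffling before `repeat` guarantees that every record is seen exactly
-    # once per epoch; shuffling after `repeat` would blend records across
-    # epoch boundaries.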
- dataset = dataset.shuffle(buffer_size=shuffle_buffer) - # Repeats the dataset for the number of epochs to train. - dataset = dataset.repeat() - - # Parses the raw records into images and labels. - dataset = dataset.map( - lambda value: parse_record_fn(value, is_training, dtype), - num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.batch(batch_size, drop_remainder=drop_remainder) - - # Operations between the final prefetch and the get_next call to the iterator - # will happen synchronously during run time. We prefetch here again to - # background all of the above processing work and keep it out of the - # critical training path. Setting buffer_size to tf.data.experimental.AUTOTUNE - # allows DistributionStrategies to adjust how many batches to fetch based - # on how many devices are present. - dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - - options = tf.data.Options() - options.experimental_slack = tf_data_experimental_slack - dataset = dataset.with_options(options) - - return dataset - - -def get_filenames(is_training, data_dir): - """Return filenames for dataset.""" - if is_training: - return [ - os.path.join(data_dir, 'train-%05d-of-01024' % i) - for i in range(_NUM_TRAIN_FILES) - ] - else: - return [ - os.path.join(data_dir, 'validation-%05d-of-00128' % i) - for i in range(128) - ] - - -def parse_example_proto(example_serialized): - """Parses an Example proto containing a training example of an image. - - The output of the build_image_data.py image preprocessing script is a dataset - containing serialized Example protocol buffers. Each Example proto contains - the following fields (values are included as examples): - - image/height: 462 - image/width: 581 - image/colorspace: 'RGB' - image/channels: 3 - image/class/label: 615 - image/class/synset: 'n03623198' - image/class/text: 'knee pad' - image/object/bbox/xmin: 0.1 - image/object/bbox/xmax: 0.9 - image/object/bbox/ymin: 0.2 - image/object/bbox/ymax: 0.6 - image/object/bbox/label: 615 - image/format: 'JPEG' - image/filename: 'ILSVRC2012_val_00041207.JPEG' - image/encoded: - - Args: - example_serialized: scalar Tensor tf.string containing a serialized Example - protocol buffer. - - Returns: - image_buffer: Tensor tf.string containing the contents of a JPEG file. - label: Tensor tf.int32 containing the label. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as - [ymin, xmin, ymax, xmax]. - """ - # Dense features in Example proto. - feature_map = { - 'image/encoded': - tf.io.FixedLenFeature([], dtype=tf.string, default_value=''), - 'image/class/label': - tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1), - 'image/class/text': - tf.io.FixedLenFeature([], dtype=tf.string, default_value=''), - } - sparse_float32 = tf.io.VarLenFeature(dtype=tf.float32) - # Sparse features in Example proto. 
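-  # Each image may contain any number of bounding boxes, so the four
-  # coordinate lists are parsed as variable-length features rather than
-  # fixed-length ones.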
- feature_map.update({ - k: sparse_float32 for k in [ - 'image/object/bbox/xmin', 'image/object/bbox/ymin', - 'image/object/bbox/xmax', 'image/object/bbox/ymax' - ] - }) - - features = tf.io.parse_single_example( - serialized=example_serialized, features=feature_map) - label = tf.cast(features['image/class/label'], dtype=tf.int32) - - xmin = tf.expand_dims(features['image/object/bbox/xmin'].values, 0) - ymin = tf.expand_dims(features['image/object/bbox/ymin'].values, 0) - xmax = tf.expand_dims(features['image/object/bbox/xmax'].values, 0) - ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0) - - # Note that we impose an ordering of (y, x) just to make life difficult. - bbox = tf.concat([ymin, xmin, ymax, xmax], 0) - - # Force the variable number of bounding boxes into the shape - # [1, num_boxes, coords]. - bbox = tf.expand_dims(bbox, 0) - bbox = tf.transpose(a=bbox, perm=[0, 2, 1]) - - return features['image/encoded'], label, bbox - - -def parse_record(raw_record, is_training, dtype): - """Parses a record containing a training example of an image. - - The input record is parsed into a label and image, and the image is passed - through preprocessing steps (cropping, flipping, and so on). - - Args: - raw_record: scalar Tensor tf.string containing a serialized Example protocol - buffer. - is_training: A boolean denoting whether the input is for training. - dtype: data type to use for images/features. - - Returns: - Tuple with processed image tensor in a channel-last format and - one-hot-encoded label tensor. - """ - image_buffer, label, bbox = parse_example_proto(raw_record) - - image = preprocess_image( - image_buffer=image_buffer, - bbox=bbox, - output_height=DEFAULT_IMAGE_SIZE, - output_width=DEFAULT_IMAGE_SIZE, - num_channels=NUM_CHANNELS, - is_training=is_training) - image = tf.cast(image, dtype) - - # Subtract one so that labels are in [0, 1000), and cast to float32 for - # Keras model. - label = tf.cast( - tf.cast(tf.reshape(label, shape=[1]), dtype=tf.int32) - 1, - dtype=tf.float32) - return image, label - - -def get_parse_record_fn(use_keras_image_data_format=False): - """Get a function for parsing the records, accounting for image format. - - This is useful by handling different types of Keras models. For instance, - the current resnet_model.resnet50 input format is always channel-last, - whereas the keras_applications mobilenet input format depends on - tf.keras.backend.image_data_format(). We should set - use_keras_image_data_format=False for the former and True for the latter. - - Args: - use_keras_image_data_format: A boolean denoting whether data format is keras - backend image data format. If False, the image format is channel-last. If - True, the image format matches tf.keras.backend.image_data_format(). - - Returns: - Function to use for parsing the records. - """ - - def parse_record_fn(raw_record, is_training, dtype): - image, label = parse_record(raw_record, is_training, dtype) - if use_keras_image_data_format: - if tf.keras.backend.image_data_format() == 'channels_first': - image = tf.transpose(image, perm=[2, 0, 1]) - return image, label - - return parse_record_fn - - -def input_fn(is_training, - data_dir, - batch_size, - dtype=tf.float32, - datasets_num_private_threads=None, - parse_record_fn=parse_record, - input_context=None, - drop_remainder=False, - tf_data_experimental_slack=False, - training_dataset_cache=False, - filenames=None): - """Input function which provides batches for train or eval. 
- - Args: - is_training: A boolean denoting whether the input is for training. - data_dir: The directory containing the input data. - batch_size: The number of samples per batch. - dtype: Data type to use for images/features - datasets_num_private_threads: Number of private threads for tf.data. - parse_record_fn: Function to use for parsing the records. - input_context: A `tf.distribute.InputContext` object passed in by - `tf.distribute.Strategy`. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - tf_data_experimental_slack: Whether to enable tf.data's `experimental_slack` - option. - training_dataset_cache: Whether to cache the training dataset on workers. - Typically used to improve training performance when training data is in - remote storage and can fit into worker memory. - filenames: Optional field for providing the file names of the TFRecords. - - Returns: - A dataset that can be used for iteration. - """ - if filenames is None: - filenames = get_filenames(is_training, data_dir) - dataset = tf.data.Dataset.from_tensor_slices(filenames) - - if input_context: - logging.info( - 'Sharding the dataset: input_pipeline_id=%d num_input_pipelines=%d', - input_context.input_pipeline_id, input_context.num_input_pipelines) - dataset = dataset.shard(input_context.num_input_pipelines, - input_context.input_pipeline_id) - - if is_training: - # Shuffle the input files - dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES) - - # Convert to individual records. - # cycle_length = 10 means that up to 10 files will be read and deserialized in - # parallel. You may want to increase this number if you have a large number of - # CPU cores. - dataset = dataset.interleave( - tf.data.TFRecordDataset, - cycle_length=10, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - - if is_training and training_dataset_cache: - # Improve training performance when training data is in remote storage and - # can fit into worker memory. - dataset = dataset.cache() - - return process_record_dataset( - dataset=dataset, - is_training=is_training, - batch_size=batch_size, - shuffle_buffer=_SHUFFLE_BUFFER, - parse_record_fn=parse_record_fn, - dtype=dtype, - datasets_num_private_threads=datasets_num_private_threads, - drop_remainder=drop_remainder, - tf_data_experimental_slack=tf_data_experimental_slack, - ) - - -def _decode_crop_and_flip(image_buffer, bbox, num_channels): - """Crops the given image to a random part of the image, and randomly flips. - - We use the fused decode_and_crop op, which performs better than the two ops - used separately in series, but note that this requires that the image be - passed in as an un-decoded string Tensor. - - Args: - image_buffer: scalar string Tensor representing the raw JPEG image buffer. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as [ymin, - xmin, ymax, xmax]. - num_channels: Integer depth of the image buffer for decoding. - - Returns: - 3-D tensor with cropped image. - - """ - # A large fraction of image datasets contain a human-annotated bounding box - # delineating the region of the image containing the object of interest. We - # choose to create a new bounding box for the object which is a randomly - # distorted version of the human-annotated bounding box that obeys an - # allowed range of aspect ratios, sizes and overlap with the human-annotated - # bounding box. 
If no box is supplied, then we assume the bounding box is - # the entire image. - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - tf.image.extract_jpeg_shape(image_buffer), - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=[0.75, 1.33], - area_range=[0.05, 1.0], - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Reassemble the bounding box in the format the crop op requires. - offset_y, offset_x, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - crop_window = tf.stack([offset_y, offset_x, target_height, target_width]) - - # Use the fused decode and crop op here, which is faster than each in series. - cropped = tf.image.decode_and_crop_jpeg( - image_buffer, crop_window, channels=num_channels) - - # Flip to add a little more random distortion in. - cropped = tf.image.random_flip_left_right(cropped) - return cropped - - -def _central_crop(image, crop_height, crop_width): - """Performs central crops of the given image list. - - Args: - image: a 3-D image tensor - crop_height: the height of the image following the crop. - crop_width: the width of the image following the crop. - - Returns: - 3-D tensor with cropped image. - """ - shape = tf.shape(input=image) - height, width = shape[0], shape[1] - - amount_to_be_cropped_h = (height - crop_height) - crop_top = amount_to_be_cropped_h // 2 - amount_to_be_cropped_w = (width - crop_width) - crop_left = amount_to_be_cropped_w // 2 - return tf.slice(image, [crop_top, crop_left, 0], - [crop_height, crop_width, -1]) - - -def _mean_image_subtraction(image, means, num_channels): - """Subtracts the given means from each image channel. - - For example: - means = [123.68, 116.779, 103.939] - image = _mean_image_subtraction(image, means) - - Note that the rank of `image` must be known. - - Args: - image: a tensor of size [height, width, C]. - means: a C-vector of values to subtract from each channel. - num_channels: number of color channels in the image that will be distorted. - - Returns: - the centered image. - - Raises: - ValueError: If the rank of `image` is unknown, if `image` has a rank other - than three or if the number of channels in `image` doesn't match the - number of values in `means`. - """ - if image.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(means) != num_channels: - raise ValueError('len(means) must match the number of channels') - - # We have a 1-D tensor of means; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. - means = tf.broadcast_to(means, tf.shape(image)) - - return image - means - - -def _smallest_size_at_least(height, width, resize_min): - """Computes new shape with the smallest side equal to `smallest_side`. - - Computes new shape with the smallest side equal to `smallest_side` while - preserving the original aspect ratio. - - Args: - height: an int32 scalar tensor indicating the current height. - width: an int32 scalar tensor indicating the current width. - resize_min: A python integer or scalar `Tensor` indicating the size of the - smallest side after resize. - - Returns: - new_height: an int32 scalar tensor indicating the new height. - new_width: an int32 scalar tensor indicating the new width. - """ - resize_min = tf.cast(resize_min, tf.float32) - - # Convert to floats to make subsequent calculations go smoothly. 
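-  # Worked example: a 500 x 1000 image with resize_min=256 gives
-  # scale_ratio = 256 / 500 = 0.512, so the new size is 256 x 512 and the
-  # aspect ratio is preserved.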
- height, width = tf.cast(height, tf.float32), tf.cast(width, tf.float32) - - smaller_dim = tf.minimum(height, width) - scale_ratio = resize_min / smaller_dim - - # Convert back to ints to make heights and widths that TF ops will accept. - new_height = tf.cast(height * scale_ratio, tf.int32) - new_width = tf.cast(width * scale_ratio, tf.int32) - - return new_height, new_width - - -def _aspect_preserving_resize(image, resize_min): - """Resize images preserving the original aspect ratio. - - Args: - image: A 3-D image `Tensor`. - resize_min: A python integer or scalar `Tensor` indicating the size of the - smallest side after resize. - - Returns: - resized_image: A 3-D tensor containing the resized image. - """ - shape = tf.shape(input=image) - height, width = shape[0], shape[1] - - new_height, new_width = _smallest_size_at_least(height, width, resize_min) - - return _resize_image(image, new_height, new_width) - - -def _resize_image(image, height, width): - """Simple wrapper around tf.resize_images. - - This is primarily to make sure we use the same `ResizeMethod` and other - details each time. - - Args: - image: A 3-D image `Tensor`. - height: The target height for the resized image. - width: The target width for the resized image. - - Returns: - resized_image: A 3-D tensor containing the resized image. The first two - dimensions have the shape [height, width]. - """ - return tf.compat.v1.image.resize( - image, [height, width], - method=tf.image.ResizeMethod.BILINEAR, - align_corners=False) - - -def preprocess_image(image_buffer, - bbox, - output_height, - output_width, - num_channels, - is_training=False): - """Preprocesses the given image. - - Preprocessing includes decoding, cropping, and resizing for both training - and eval images. Training preprocessing, however, introduces some random - distortion of the image to improve accuracy. - - Args: - image_buffer: scalar string Tensor representing the raw JPEG image buffer. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as [ymin, - xmin, ymax, xmax]. - output_height: The height of the image after preprocessing. - output_width: The width of the image after preprocessing. - num_channels: Integer depth of the image buffer for decoding. - is_training: `True` if we're preprocessing the image for training and - `False` otherwise. - - Returns: - A preprocessed image. - """ - if is_training: - # For training, we want to randomize some of the distortions. - image = _decode_crop_and_flip(image_buffer, bbox, num_channels) - image = _resize_image(image, output_height, output_width) - else: - # For validation, we want to decode, resize, then just crop the middle. - image = tf.image.decode_jpeg(image_buffer, channels=num_channels) - image = _aspect_preserving_resize(image, _RESIZE_MIN) - image = _central_crop(image, output_height, output_width) - - image.set_shape([output_height, output_width, num_channels]) - - return _mean_image_subtraction(image, CHANNEL_MEANS, num_channels) diff --git a/official/vision/image_classification/resnet/resnet_config.py b/official/vision/image_classification/resnet/resnet_config.py deleted file mode 100644 index e39db3955f9fe9c312ea307c8ac3196d45447cf3..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_config.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configuration definitions for ResNet losses, learning rates, and optimizers.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import dataclasses - -from official.modeling.hyperparams import base_config -from official.vision.image_classification.configs import base_configs - - -@dataclasses.dataclass -class ResNetModelConfig(base_configs.ModelConfig): - """Configuration for the ResNet model.""" - name: str = 'ResNet' - num_classes: int = 1000 - model_params: base_config.Config = dataclasses.field( - default_factory=lambda: { - 'num_classes': 1000, - 'batch_size': None, - 'use_l2_regularizer': True, - 'rescale_inputs': False, - }) - loss: base_configs.LossConfig = base_configs.LossConfig( - name='sparse_categorical_crossentropy') - optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( - name='momentum', - decay=0.9, - epsilon=0.001, - momentum=0.9, - moving_average_decay=None) - learning_rate: base_configs.LearningRateConfig = ( - base_configs.LearningRateConfig( - name='stepwise', - initial_lr=0.1, - examples_per_epoch=1281167, - boundaries=[30, 60, 80], - warmup_epochs=5, - scale_by_batch_size=1. / 256., - multipliers=[0.1 / 256, 0.01 / 256, 0.001 / 256, 0.0001 / 256])) diff --git a/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py b/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py deleted file mode 100644 index a66461df17a3fe5fc0d75969e99920310a694e71..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py +++ /dev/null @@ -1,195 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Runs a ResNet model on the ImageNet dataset using custom training loops.""" - -import math -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import orbit -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import performance -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils -from official.utils.misc import model_helpers -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_runnable - -flags.DEFINE_boolean(name='use_tf_function', default=True, - help='Wrap the train and test step inside a ' - 'tf.function.') -flags.DEFINE_boolean(name='single_l2_loss_op', default=False, - help='Calculate L2_loss on concatenated weights, ' - 'instead of using Keras per-layer L2 loss.') - - -def build_stats(runnable, time_callback): - """Normalizes and returns dictionary of stats. - - Args: - runnable: The module containing all the training and evaluation metrics. - time_callback: Time tracking callback instance. - - Returns: - Dictionary of normalized results. - """ - stats = {} - - if not runnable.flags_obj.skip_eval: - stats['eval_loss'] = runnable.test_loss.result().numpy() - stats['eval_acc'] = runnable.test_accuracy.result().numpy() - - stats['train_loss'] = runnable.train_loss.result().numpy() - stats['train_acc'] = runnable.train_accuracy.result().numpy() - - if time_callback: - timestamp_log = time_callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = time_callback.train_finish_time - if time_callback.epoch_runtime_log: - stats['avg_exp_per_second'] = time_callback.average_examples_per_second - - return stats - - -def get_num_train_iterations(flags_obj): - """Returns the number of training steps, train and test epochs.""" - train_steps = ( - imagenet_preprocessing.NUM_IMAGES['train'] // flags_obj.batch_size) - train_epochs = flags_obj.train_epochs - - if flags_obj.train_steps: - train_steps = min(flags_obj.train_steps, train_steps) - train_epochs = 1 - - eval_steps = math.ceil(1.0 * imagenet_preprocessing.NUM_IMAGES['validation'] / - flags_obj.batch_size) - - return train_steps, train_epochs, eval_steps - - -def run(flags_obj): - """Run ResNet ImageNet training and eval loop using custom training loops. - - Args: - flags_obj: An object containing parsed flag values. - - Raises: - ValueError: If fp16 is passed as it is not currently supported. - - Returns: - Dictionary of training and eval stats. 
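-    Typical keys, as produced by `build_stats` above: 'train_loss',
-    'train_acc', and, unless `--skip_eval` is set, 'eval_loss' and 'eval_acc'.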
- """ - keras_utils.set_session_config() - performance.set_mixed_precision_policy(flags_core.get_tf_dtype(flags_obj)) - - if tf.config.list_physical_devices('GPU'): - if flags_obj.tf_gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=flags_obj.per_gpu_thread_count, - gpu_thread_mode=flags_obj.tf_gpu_thread_mode, - num_gpus=flags_obj.num_gpus, - datasets_num_private_threads=flags_obj.datasets_num_private_threads) - common.set_cudnn_batchnorm_mode() - - data_format = flags_obj.data_format - if data_format is None: - data_format = ('channels_first' if tf.config.list_physical_devices('GPU') - else 'channels_last') - tf.keras.backend.set_image_data_format(data_format) - - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=flags_obj.num_gpus, - all_reduce_alg=flags_obj.all_reduce_alg, - num_packs=flags_obj.num_packs, - tpu_address=flags_obj.tpu) - - per_epoch_steps, train_epochs, eval_steps = get_num_train_iterations( - flags_obj) - if flags_obj.steps_per_loop is None: - steps_per_loop = per_epoch_steps - elif flags_obj.steps_per_loop > per_epoch_steps: - steps_per_loop = per_epoch_steps - logging.warn('Setting steps_per_loop to %d to respect epoch boundary.', - steps_per_loop) - else: - steps_per_loop = flags_obj.steps_per_loop - - logging.info( - 'Training %d epochs, each epoch has %d steps, ' - 'total steps: %d; Eval %d steps', train_epochs, per_epoch_steps, - train_epochs * per_epoch_steps, eval_steps) - - time_callback = keras_utils.TimeHistory( - flags_obj.batch_size, - flags_obj.log_steps, - logdir=flags_obj.model_dir if flags_obj.enable_tensorboard else None) - with distribute_utils.get_strategy_scope(strategy): - runnable = resnet_runnable.ResnetRunnable(flags_obj, time_callback, - per_epoch_steps) - - eval_interval = flags_obj.epochs_between_evals * per_epoch_steps - checkpoint_interval = ( - steps_per_loop * 5 if flags_obj.enable_checkpoint_and_export else None) - summary_interval = steps_per_loop if flags_obj.enable_tensorboard else None - - checkpoint_manager = tf.train.CheckpointManager( - runnable.checkpoint, - directory=flags_obj.model_dir, - max_to_keep=10, - step_counter=runnable.global_step, - checkpoint_interval=checkpoint_interval) - - resnet_controller = orbit.Controller( - strategy=strategy, - trainer=runnable, - evaluator=runnable if not flags_obj.skip_eval else None, - global_step=runnable.global_step, - steps_per_loop=steps_per_loop, - checkpoint_manager=checkpoint_manager, - summary_interval=summary_interval, - summary_dir=flags_obj.model_dir, - eval_summary_dir=os.path.join(flags_obj.model_dir, 'eval')) - - time_callback.on_train_begin() - if not flags_obj.skip_eval: - resnet_controller.train_and_evaluate( - train_steps=per_epoch_steps * train_epochs, - eval_steps=eval_steps, - eval_interval=eval_interval) - else: - resnet_controller.train(steps=per_epoch_steps * train_epochs) - time_callback.on_train_end() - - stats = build_stats(runnable, time_callback) - return stats - - -def main(_): - model_helpers.apply_clean(flags.FLAGS) - stats = run(flags.FLAGS) - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - common.define_keras_flags() - app.run(main) diff --git a/official/vision/image_classification/resnet/resnet_model.py b/official/vision/image_classification/resnet/resnet_model.py deleted file mode 100644 index 597b85739e965a157aff995d14891f76698678d4..0000000000000000000000000000000000000000 --- 
a/official/vision/image_classification/resnet/resnet_model.py
+++ /dev/null
@@ -1,325 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""ResNet50 model for Keras.
-
-Adapted from tf.keras.applications.resnet50.ResNet50().
-This is ResNet model version 1.5.
-
-Related papers/blogs:
-- https://arxiv.org/abs/1512.03385
-- https://arxiv.org/pdf/1603.05027v2.pdf
-- http://torch.ch/blog/2016/02/04/resnets.html
-
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-from official.vision.image_classification.resnet import imagenet_preprocessing
-
-layers = tf.keras.layers
-
-
-def _gen_l2_regularizer(use_l2_regularizer=True, l2_weight_decay=1e-4):
-  return tf.keras.regularizers.L2(
-      l2_weight_decay) if use_l2_regularizer else None
-
-
-def identity_block(input_tensor,
-                   kernel_size,
-                   filters,
-                   stage,
-                   block,
-                   use_l2_regularizer=True,
-                   batch_norm_decay=0.9,
-                   batch_norm_epsilon=1e-5):
-  """The identity block is the block that has no conv layer at shortcut.
-
-  Args:
-    input_tensor: input tensor
-    kernel_size: default 3, the kernel size of the middle conv layer in the
-      main path
-    filters: list of integers, the filters of the 3 conv layers in the main
-      path
-    stage: integer, current stage label, used for generating layer names
-    block: 'a','b'..., current block label, used for generating layer names
-    use_l2_regularizer: whether to use an L2 regularizer on the Conv layers.
-    batch_norm_decay: Momentum of the batch norm layers.
-    batch_norm_epsilon: Epsilon of the batch norm layers.
-
-  Returns:
-    Output tensor for the block.
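-
-  Example (illustrative; the remaining kwargs keep their defaults):
-    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')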
-  """
-  filters1, filters2, filters3 = filters
-  if tf.keras.backend.image_data_format() == 'channels_last':
-    bn_axis = 3
-  else:
-    bn_axis = 1
-  conv_name_base = 'res' + str(stage) + block + '_branch'
-  bn_name_base = 'bn' + str(stage) + block + '_branch'
-
-  x = layers.Conv2D(
-      filters1, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2a')(
-          input_tensor)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2a')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters2,
-      kernel_size,
-      padding='same',
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2b')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2b')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters3, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2c')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2c')(
-          x)
-
-  x = layers.add([x, input_tensor])
-  x = layers.Activation('relu')(x)
-  return x
-
-
-def conv_block(input_tensor,
-               kernel_size,
-               filters,
-               stage,
-               block,
-               strides=(2, 2),
-               use_l2_regularizer=True,
-               batch_norm_decay=0.9,
-               batch_norm_epsilon=1e-5):
-  """A block that has a conv layer at shortcut.
-
-  Note that from stage 3, the second conv layer in the main path has
-  strides=(2, 2), and the shortcut must use strides=(2, 2) as well.
-
-  Args:
-    input_tensor: input tensor
-    kernel_size: default 3, the kernel size of the middle conv layer in the
-      main path
-    filters: list of integers, the filters of the 3 conv layers in the main
-      path
-    stage: integer, current stage label, used for generating layer names
-    block: 'a','b'..., current block label, used for generating layer names
-    strides: Strides for the second conv layer in the block.
-    use_l2_regularizer: whether to use an L2 regularizer on the Conv layers.
-    batch_norm_decay: Momentum of the batch norm layers.
-    batch_norm_epsilon: Epsilon of the batch norm layers.
-
-  Returns:
-    Output tensor for the block.
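-
-  Example (illustrative; the remaining kwargs keep their defaults):
-    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))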
-  """
-  filters1, filters2, filters3 = filters
-  if tf.keras.backend.image_data_format() == 'channels_last':
-    bn_axis = 3
-  else:
-    bn_axis = 1
-  conv_name_base = 'res' + str(stage) + block + '_branch'
-  bn_name_base = 'bn' + str(stage) + block + '_branch'
-
-  x = layers.Conv2D(
-      filters1, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2a')(
-          input_tensor)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2a')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters2,
-      kernel_size,
-      strides=strides,
-      padding='same',
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2b')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2b')(
-          x)
-  x = layers.Activation('relu')(x)
-
-  x = layers.Conv2D(
-      filters3, (1, 1),
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '2c')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '2c')(
-          x)
-
-  shortcut = layers.Conv2D(
-      filters3, (1, 1),
-      strides=strides,
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name=conv_name_base + '1')(
-          input_tensor)
-  shortcut = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name=bn_name_base + '1')(
-          shortcut)
-
-  x = layers.add([x, shortcut])
-  x = layers.Activation('relu')(x)
-  return x
-
-
-def resnet50(num_classes,
-             batch_size=None,
-             use_l2_regularizer=True,
-             rescale_inputs=False,
-             batch_norm_decay=0.9,
-             batch_norm_epsilon=1e-5):
-  """Instantiates the ResNet50 architecture.
-
-  Args:
-    num_classes: `int` number of classes for image classification.
-    batch_size: Size of the batches for each step.
-    use_l2_regularizer: whether to use an L2 regularizer on the Conv/Dense
-      layers.
-    rescale_inputs: whether inputs are in the range [0, 1] and should be
-      rescaled to the range the trained model expects.
-    batch_norm_decay: Momentum of the batch norm layers.
-    batch_norm_epsilon: Epsilon of the batch norm layers.
-
-  Returns:
-    A Keras model instance.
-  """
-  input_shape = (224, 224, 3)
-  img_input = layers.Input(shape=input_shape, batch_size=batch_size)
-  if rescale_inputs:
-    # Hub image modules expect inputs in the range [0, 1]. This rescales these
-    # inputs to the range expected by the trained model.
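-    # Concretely: multiply the [0, 1] inputs by 255, then subtract the
-    # per-channel means in `imagenet_preprocessing.CHANNEL_MEANS`.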
-    x = layers.Lambda(
-        lambda x: x * 255.0 - tf.keras.backend.constant(  # pylint: disable=g-long-lambda
-            imagenet_preprocessing.CHANNEL_MEANS,
-            shape=[1, 1, 3],
-            dtype=x.dtype),
-        name='rescale')(
-            img_input)
-  else:
-    x = img_input
-
-  if tf.keras.backend.image_data_format() == 'channels_first':
-    x = layers.Permute((3, 1, 2))(x)
-    bn_axis = 1
-  else:  # channels_last
-    bn_axis = 3
-
-  block_config = dict(
-      use_l2_regularizer=use_l2_regularizer,
-      batch_norm_decay=batch_norm_decay,
-      batch_norm_epsilon=batch_norm_epsilon)
-  x = layers.ZeroPadding2D(padding=(3, 3), name='conv1_pad')(x)
-  x = layers.Conv2D(
-      64, (7, 7),
-      strides=(2, 2),
-      padding='valid',
-      use_bias=False,
-      kernel_initializer='he_normal',
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name='conv1')(
-          x)
-  x = layers.BatchNormalization(
-      axis=bn_axis,
-      momentum=batch_norm_decay,
-      epsilon=batch_norm_epsilon,
-      name='bn_conv1')(
-          x)
-  x = layers.Activation('relu')(x)
-  x = layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
-
-  x = conv_block(
-      x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), **block_config)
-  x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', **block_config)
-  x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', **block_config)
-
-  x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', **block_config)
-  x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', **block_config)
-  x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', **block_config)
-  x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', **block_config)
-
-  x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e', **block_config)
-  x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f', **block_config)
-
-  x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', **block_config)
-  x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', **block_config)
-  x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', **block_config)
-
-  # Named 'reduce_mean' so that tools such as tfhub_export.py (below) can look
-  # up this pooling layer by name.
-  x = layers.GlobalAveragePooling2D(name='reduce_mean')(x)
-  x = layers.Dense(
-      num_classes,
-      kernel_initializer=tf.initializers.random_normal(stddev=0.01),
-      kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      bias_regularizer=_gen_l2_regularizer(use_l2_regularizer),
-      name='fc1000')(
-          x)
-
-  # A softmax that is followed by the model loss cannot be done in float16
-  # due to numeric issues, so we pass dtype='float32'.
-  x = layers.Activation('softmax', dtype='float32')(x)
-
-  # Create model.
-  return tf.keras.Model(img_input, x, name='resnet50')
diff --git a/official/vision/image_classification/resnet/resnet_runnable.py b/official/vision/image_classification/resnet/resnet_runnable.py
deleted file mode 100644
index fe3059f77dfb73b3ac685aeea69102f8a4bb5ad4..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/resnet/resnet_runnable.py
+++ /dev/null
@@ -1,210 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Runs a ResNet model on the ImageNet dataset using custom training loops.""" - -import orbit -import tensorflow as tf -from official.modeling import grad_utils -from official.modeling import performance -from official.utils.flags import core as flags_core -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_model - - -class ResnetRunnable(orbit.StandardTrainer, orbit.StandardEvaluator): - """Implements the training and evaluation APIs for Resnet model.""" - - def __init__(self, flags_obj, time_callback, epoch_steps): - self.strategy = tf.distribute.get_strategy() - self.flags_obj = flags_obj - self.dtype = flags_core.get_tf_dtype(flags_obj) - self.time_callback = time_callback - - # Input pipeline related - batch_size = flags_obj.batch_size - if batch_size % self.strategy.num_replicas_in_sync != 0: - raise ValueError( - 'Batch size must be divisible by number of replicas : {}'.format( - self.strategy.num_replicas_in_sync)) - - # As auto rebatching is not supported in - # `distribute_datasets_from_function()` API, which is - # required when cloning dataset to multiple workers in eager mode, - # we use per-replica batch size. - self.batch_size = int(batch_size / self.strategy.num_replicas_in_sync) - - if self.flags_obj.use_synthetic_data: - self.input_fn = common.get_synth_input_fn( - height=imagenet_preprocessing.DEFAULT_IMAGE_SIZE, - width=imagenet_preprocessing.DEFAULT_IMAGE_SIZE, - num_channels=imagenet_preprocessing.NUM_CHANNELS, - num_classes=imagenet_preprocessing.NUM_CLASSES, - dtype=self.dtype, - drop_remainder=True) - else: - self.input_fn = imagenet_preprocessing.input_fn - - self.model = resnet_model.resnet50( - num_classes=imagenet_preprocessing.NUM_CLASSES, - use_l2_regularizer=not flags_obj.single_l2_loss_op) - - lr_schedule = common.PiecewiseConstantDecayWithWarmup( - batch_size=flags_obj.batch_size, - epoch_size=imagenet_preprocessing.NUM_IMAGES['train'], - warmup_epochs=common.LR_SCHEDULE[0][1], - boundaries=list(p[1] for p in common.LR_SCHEDULE[1:]), - multipliers=list(p[0] for p in common.LR_SCHEDULE), - compute_lr_on_cpu=True) - self.optimizer = common.get_optimizer(lr_schedule) - # Make sure iterations variable is created inside scope. - self.global_step = self.optimizer.iterations - - self.optimizer = performance.configure_optimizer( - self.optimizer, - use_float16=self.dtype == tf.float16, - loss_scale=flags_core.get_loss_scale(flags_obj, default_for_fp16=128)) - - self.train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32) - self.train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( - 'train_accuracy', dtype=tf.float32) - self.test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32) - self.test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( - 'test_accuracy', dtype=tf.float32) - - self.checkpoint = tf.train.Checkpoint( - model=self.model, optimizer=self.optimizer) - - # Handling epochs. 
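-    # `EpochHelper` derives epoch boundaries from the global step so that
-    # `_epoch_begin`/`_epoch_end` below can fire the time callback once per
-    # epoch.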
- self.epoch_steps = epoch_steps - self.epoch_helper = orbit.utils.EpochHelper(epoch_steps, self.global_step) - train_dataset = orbit.utils.make_distributed_dataset( - self.strategy, - self.input_fn, - is_training=True, - data_dir=self.flags_obj.data_dir, - batch_size=self.batch_size, - parse_record_fn=imagenet_preprocessing.parse_record, - datasets_num_private_threads=self.flags_obj - .datasets_num_private_threads, - dtype=self.dtype, - drop_remainder=True) - orbit.StandardTrainer.__init__( - self, - train_dataset, - options=orbit.StandardTrainerOptions( - use_tf_while_loop=flags_obj.use_tf_while_loop, - use_tf_function=flags_obj.use_tf_function)) - if not flags_obj.skip_eval: - eval_dataset = orbit.utils.make_distributed_dataset( - self.strategy, - self.input_fn, - is_training=False, - data_dir=self.flags_obj.data_dir, - batch_size=self.batch_size, - parse_record_fn=imagenet_preprocessing.parse_record, - dtype=self.dtype) - orbit.StandardEvaluator.__init__( - self, - eval_dataset, - options=orbit.StandardEvaluatorOptions( - use_tf_function=flags_obj.use_tf_function)) - - def train_loop_begin(self): - """See base class.""" - # Reset all metrics - self.train_loss.reset_states() - self.train_accuracy.reset_states() - - self._epoch_begin() - self.time_callback.on_batch_begin(self.epoch_helper.batch_index) - - def train_step(self, iterator): - """See base class.""" - - def step_fn(inputs): - """Function to run on the device.""" - images, labels = inputs - with tf.GradientTape() as tape: - logits = self.model(images, training=True) - - prediction_loss = tf.keras.losses.sparse_categorical_crossentropy( - labels, logits) - loss = tf.reduce_sum(prediction_loss) * (1.0 / - self.flags_obj.batch_size) - num_replicas = self.strategy.num_replicas_in_sync - l2_weight_decay = 1e-4 - if self.flags_obj.single_l2_loss_op: - l2_loss = l2_weight_decay * 2 * tf.add_n([ - tf.nn.l2_loss(v) - for v in self.model.trainable_variables - if 'bn' not in v.name - ]) - - loss += (l2_loss / num_replicas) - else: - loss += (tf.reduce_sum(self.model.losses) / num_replicas) - - grad_utils.minimize_using_explicit_allreduce( - tape, self.optimizer, loss, self.model.trainable_variables) - self.train_loss.update_state(loss) - self.train_accuracy.update_state(labels, logits) - if self.flags_obj.enable_xla: - step_fn = tf.function(step_fn, jit_compile=True) - self.strategy.run(step_fn, args=(next(iterator),)) - - def train_loop_end(self): - """See base class.""" - metrics = { - 'train_loss': self.train_loss.result(), - 'train_accuracy': self.train_accuracy.result(), - } - self.time_callback.on_batch_end(self.epoch_helper.batch_index - 1) - self._epoch_end() - return metrics - - def eval_begin(self): - """See base class.""" - self.test_loss.reset_states() - self.test_accuracy.reset_states() - - def eval_step(self, iterator): - """See base class.""" - - def step_fn(inputs): - """Function to run on the device.""" - images, labels = inputs - logits = self.model(images, training=False) - loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits) - loss = tf.reduce_sum(loss) * (1.0 / self.flags_obj.batch_size) - self.test_loss.update_state(loss) - self.test_accuracy.update_state(labels, logits) - - self.strategy.run(step_fn, args=(next(iterator),)) - - def eval_end(self): - """See base class.""" - return { - 'test_loss': self.test_loss.result(), - 'test_accuracy': self.test_accuracy.result() - } - - def _epoch_begin(self): - if self.epoch_helper.epoch_begin(): - 
self.time_callback.on_epoch_begin(self.epoch_helper.current_epoch) - - def _epoch_end(self): - if self.epoch_helper.epoch_end(): - self.time_callback.on_epoch_end(self.epoch_helper.current_epoch) diff --git a/official/vision/image_classification/resnet/tfhub_export.py b/official/vision/image_classification/resnet/tfhub_export.py deleted file mode 100644 index 2b19f70bc7ae0c019d4d969cdedb28fdc5898b79..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/tfhub_export.py +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""A script to export TF-Hub SavedModel.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -# Import libraries -from absl import app -from absl import flags - -import tensorflow as tf - -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_model - -FLAGS = flags.FLAGS - -flags.DEFINE_string("model_path", None, - "File path to TF model checkpoint or H5 file.") -flags.DEFINE_string("export_path", None, - "TF-Hub SavedModel destination path to export.") - - -def export_tfhub(model_path, hub_destination): - """Restores a tf.keras.Model and saves for TF-Hub.""" - model = resnet_model.resnet50( - num_classes=imagenet_preprocessing.NUM_CLASSES, rescale_inputs=True) - model.load_weights(model_path) - model.save( - os.path.join(hub_destination, "classification"), include_optimizer=False) - - # Extracts a sub-model to use pooling feature vector as model output. - image_input = model.get_layer(index=0).get_output_at(0) - feature_vector_output = model.get_layer(name="reduce_mean").get_output_at(0) - hub_model = tf.keras.Model(image_input, feature_vector_output) - - # Exports a SavedModel. - hub_model.save( - os.path.join(hub_destination, "feature-vector"), include_optimizer=False) - - -def main(argv): - if len(argv) > 1: - raise app.UsageError("Too many command-line arguments.") - - export_tfhub(FLAGS.model_path, FLAGS.export_path) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/vision/image_classification/test_utils.py b/official/vision/image_classification/test_utils.py deleted file mode 100644 index 8d7180c9d4e10c3241c4d6dd31d2cd013439df7a..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/test_utils.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test utilities for image classification tasks.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - - -def trivial_model(num_classes): - """Trivial model for ImageNet dataset.""" - - input_shape = (224, 224, 3) - img_input = tf.keras.layers.Input(shape=input_shape) - - x = tf.keras.layers.Lambda( - lambda x: tf.keras.backend.reshape(x, [-1, 224 * 224 * 3]), - name='reshape')(img_input) - x = tf.keras.layers.Dense(1, name='fc1')(x) - x = tf.keras.layers.Dense(num_classes, name='fc1000')(x) - x = tf.keras.layers.Activation('softmax', dtype='float32')(x) - - return tf.keras.models.Model(img_input, x, name='trivial')
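-
-
-# Example usage (illustrative only; `trivial_model` is meant for smoke tests):
-#
-#   model = trivial_model(num_classes=1000)
-#   model.compile(
-#       optimizer='sgd',
-#       loss='sparse_categorical_crossentropy',
-#       metrics=['accuracy'])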