From 78a41239c457b0ebf50d7b10ad3a448cd27d2195 Mon Sep 17 00:00:00 2001
From: Xianzhi Du
Date: Sun, 23 Jan 2022 21:57:08 -0800
Subject: [PATCH] Internal change

PiperOrigin-RevId: 423725861
---
 official/README.md                   |  1 +
 official/vision/beta/MODEL_GARDEN.md | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/official/README.md b/official/README.md
index 34d338e2c..001d48ff8 100644
--- a/official/README.md
+++ b/official/README.md
@@ -43,6 +43,7 @@ In the near future, we will add:
 | [ResNet](vision/beta/MODEL_GARDEN.md) | [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) |
 | [ResNet-RS](vision/beta/MODEL_GARDEN.md) | [Revisiting ResNets: Improved Training and Scaling Strategies](https://arxiv.org/abs/2103.07579) |
 | [EfficientNet](vision/image_classification) | [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) |
+| [Vision Transformer](vision/beta/MODEL_GARDEN.md) | [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) |
 
 #### Object Detection and Segmentation
 
diff --git a/official/vision/beta/MODEL_GARDEN.md b/official/vision/beta/MODEL_GARDEN.md
index 3f16482d5..1546fbedb 100644
--- a/official/vision/beta/MODEL_GARDEN.md
+++ b/official/vision/beta/MODEL_GARDEN.md
@@ -55,6 +55,20 @@ depth, label smoothing and dropout.
 | ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
 | ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |
+
+#### Vision Transformer (ViT)
+
+We support [ViT](https://arxiv.org/abs/2010.11929) and [DEIT](https://arxiv.org/abs/2012.12877) implementations in a TF
+Vision
+[project](https://github.com/tensorflow/models/tree/master/official/projects/vit). ViT models trained under the DEIT settings:
+
+model | resolution | Top-1 | Top-5 |
+--------- | :--------: | ----: | ----: |
+ViT-s16 | 224x224 | 79.4 | 94.7 |
+ViT-b16 | 224x224 | 81.8 | 95.8 |
+ViT-l16 | 224x224 | 82.2 | 95.8 |
+
+
 
 ## Object Detection and Instance Segmentation
 
 ### Common Settings and Notes
-- 
GitLab