diff --git a/official/README.md b/official/README.md
index 34d338e2c1531d526298be4313b15cfe1fd721bc..001d48ff87305734a627dc61de496ca6fca2b5a7 100644
--- a/official/README.md
+++ b/official/README.md
@@ -43,6 +43,7 @@ In the near future, we will add:
 | [ResNet](vision/beta/MODEL_GARDEN.md) | [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) |
 | [ResNet-RS](vision/beta/MODEL_GARDEN.md) | [Revisiting ResNets: Improved Training and Scaling Strategies](https://arxiv.org/abs/2103.07579) |
 | [EfficientNet](vision/image_classification) | [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) |
+| [Vision Transformer](vision/beta/MODEL_GARDEN.md) | [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) |
 
 #### Object Detection and Segmentation
 
diff --git a/official/vision/beta/MODEL_GARDEN.md b/official/vision/beta/MODEL_GARDEN.md
index 3f16482d55c465c22b27fb7e8be92da270a47d22..1546fbedb1cb601b9bda0f614cc80b9eedeec5f8 100644
--- a/official/vision/beta/MODEL_GARDEN.md
+++ b/official/vision/beta/MODEL_GARDEN.md
@@ -55,6 +55,20 @@ depth, label smoothing and dropout.
 | ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
 | ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |
 
+
+#### Vision Transformer (ViT)
+
+We support [ViT](https://arxiv.org/abs/2010.11929) and [DeiT](https://arxiv.org/abs/2012.12877) implementations in a TF
+Vision
+[project](https://github.com/tensorflow/models/tree/master/official/projects/vit). Results for ViT models trained with the DeiT settings:
+
+| Model | Resolution | Top-1 | Top-5 |
+| --------- | :--------: | ----: | ----: |
+| ViT-s16 | 224x224 | 79.4 | 94.7 |
+| ViT-b16 | 224x224 | 81.8 | 95.8 |
+| ViT-l16 | 224x224 | 82.2 | 95.8 |
+
+
 ## Object Detection and Instance Segmentation
 
 ### Common Settings and Notes