Commit 470986b6 authored by Dan Kondratyuk, committed by A. Unique TensorFlower

Add video classification with MoViNets to official readme.

PiperOrigin-RevId: 436276225
Parent f65fd490
@@ -20,6 +20,7 @@ In the near future, we will add:
* State-of-the-art language understanding models.
* State-of-the-art image classification models.
* State-of-the-art object detection and instance segmentation models.
* State-of-the-art video classification models.
## Table of Contents
@@ -27,6 +28,7 @@ In the near future, we will add:
* [Computer Vision](#computer-vision)
+ [Image Classification](#image-classification)
+ [Object Detection and Segmentation](#object-detection-and-segmentation)
+ [Video Classification](#video-classification)
* [Natural Language Processing](#natural-language-processing)
* [Recommendation](#recommendation)
- [How to get started with the official models](#how-to-get-started-with-the-official-models)
@@ -55,6 +57,12 @@ In the near future, we will add:
| [SpineNet](vision/beta/MODEL_GARDEN.md) | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://arxiv.org/abs/1912.05027) |
| [Cascade RCNN-RS and RetinaNet-RS](vision/beta/MODEL_GARDEN.md) | [Simple Training Strategies and Model Scaling for Object Detection](https://arxiv.org/abs/2107.00057)|
#### Video Classification
| Model | Reference (Paper) |
|-------|-------------------|
| [Mobile Video Networks (MoViNets)](projects/movinet) | [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511) |
### Natural Language Processing
| Model | Reference (Paper) |
@@ -176,8 +176,7 @@ devices. See the [TF Lite Example](#tf-lite-example) to export and run your own
models. We also provide [quantized TF Lite binaries via TF Hub](https://tfhub.dev/s?deployment-format=lite&q=movinet).
For reference, MoViNet-A0-Stream runs with a similar latency to
[MobileNetV3-Large](https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/)
with +5% accuracy on Kinetics 600.
| Model Name | Input Shape | Pixel 4 Latency\* | x86 Latency\* | TF Lite Binary |
@@ -171,8 +171,10 @@ evaluated on [COCO](https://cocodataset.org/) val2017.
[Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
* ResNet-3D-RS (R3D-RS) in
[Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf).
* Mobile Video Networks (MoViNets) in
[MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511).
* Training and evaluation details (SlowFast and ResNet):
* All models are trained from scratch with vision modality (RGB) for 200
epochs.
* We use a batch size of 1024 and cosine learning rate decay with linear warmup
@@ -192,6 +194,12 @@ evaluated on [COCO](https://cocodataset.org/) val2017.
| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | -
| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | -
| R3D-RS-200 | 48 x 2 | 81.0 | - | -
| MoViNet-A0-Base | 50 x 5 | 69.40 | 89.18 | -
| MoViNet-A1-Base | 50 x 5 | 74.57 | 92.03 | -
| MoViNet-A2-Base | 50 x 5 | 75.91 | 92.63 | -
| MoViNet-A3-Base | 120 x 2 | 79.34 | 94.52 | -
| MoViNet-A4-Base | 80 x 3 | 80.64 | 94.93 | -
| MoViNet-A5-Base | 120 x 2 | 81.39 | 95.06 | -
### Kinetics-600 Action Recognition Baselines
@@ -201,3 +209,9 @@ evaluated on [COCO](https://cocodataset.org/) val2017.
| R3D-50 | 32 x 2 | 79.5 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) |
| R3D-RS-200 | 32 x 2 | 83.1 | - | -
| R3D-RS-200 | 48 x 2 | 83.8 | - | -
| MoViNet-A0-Base | 50 x 5 | 72.05 | 90.92 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml) |
| MoViNet-A1-Base | 50 x 5 | 76.69 | 93.40 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml) |
| MoViNet-A2-Base | 50 x 5 | 78.62 | 94.17 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml) |
| MoViNet-A3-Base | 120 x 2 | 81.79 | 95.67 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml) |
| MoViNet-A4-Base | 80 x 3 | 83.48 | 96.16 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml) |
| MoViNet-A5-Base | 120 x 2 | 84.27 | 96.39 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml) |
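The "Input Shape" column reads frames × temporal stride. A quick way to sanity-check these configurations is to compute the span of video each clip covers; the sketch below assumes a 25 fps source (an assumption — the actual per-model preprocessing may differ), under which the MoViNet configs all cover roughly the full ~10-second Kinetics clip while the 32 x 2 ResNet configs see only a short window.

```python
# Temporal coverage implied by the "frames x stride" input-shape notation.
# FPS = 25 is an assumed source frame rate, not taken from the configs.
FPS = 25.0

def clip_coverage_seconds(frames: int, stride: int, fps: float = FPS) -> float:
    """Seconds of video spanned by `frames` sampled every `stride` frames."""
    return frames * stride / fps

# (frames, stride) pairs copied from the tables above.
configs = {
    "MoViNet-A0-Base": (50, 5),
    "MoViNet-A3-Base": (120, 2),
    "MoViNet-A4-Base": (80, 3),
    "R3D-RS-200": (32, 2),
}

for name, (frames, stride) in configs.items():
    print(f"{name}: {clip_coverage_seconds(frames, stride):.2f} s")
```

Under the 25 fps assumption, 50 x 5 and 120 x 2 both work out to about 10 seconds of coverage, versus 2.56 seconds for 32 x 2.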