Commit 470986b6 authored by Dan Kondratyuk, committed by A. Unique TensorFlower

Add video classification with MoViNets to official readme.

PiperOrigin-RevId: 436276225
Parent f65fd490
@@ -20,6 +20,7 @@ In the near future, we will add:
* State-of-the-art language understanding models.
* State-of-the-art image classification models.
* State-of-the-art object detection and instance segmentation models.
* State-of-the-art video classification models.
## Table of Contents
@@ -27,6 +28,7 @@ In the near future, we will add:
* [Computer Vision](#computer-vision)
+ [Image Classification](#image-classification)
+ [Object Detection and Segmentation](#object-detection-and-segmentation)
+ [Video Classification](#video-classification)
* [Natural Language Processing](#natural-language-processing)
* [Recommendation](#recommendation)
- [How to get started with the official models](#how-to-get-started-with-the-official-models)
@@ -55,6 +57,12 @@ In the near future, we will add:
| [SpineNet](vision/beta/MODEL_GARDEN.md) | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://arxiv.org/abs/1912.05027) |
| [Cascade RCNN-RS and RetinaNet-RS](vision/beta/MODEL_GARDEN.md) | [Simple Training Strategies and Model Scaling for Object Detection](https://arxiv.org/abs/2107.00057)|
#### Video Classification
| Model | Reference (Paper) |
|-------|-------------------|
| [Mobile Video Networks (MoViNets)](projects/movinet) | [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511) |
### Natural Language Processing
| Model | Reference (Paper) |
@@ -176,8 +176,7 @@ devices. See the [TF Lite Example](#tf-lite-example) to export and run your own
models. We also provide [quantized TF Lite binaries via TF Hub](https://tfhub.dev/s?deployment-format=lite&q=movinet).
For reference, MoViNet-A0-Stream runs with a similar latency to
[MobileNetV3-Large](https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/)
with +5% accuracy on Kinetics 600.
| Model Name | Input Shape | Pixel 4 Latency\* | x86 Latency\* | TF Lite Binary |
@@ -171,8 +171,10 @@ evaluated on [COCO](https://cocodataset.org/) val2017.
[Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
* ResNet-3D-RS (R3D-RS) in
[Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf).
* Mobile Video Networks (MoViNets) in
[MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511).
* Training and evaluation details (SlowFast and ResNet):
* All models are trained from scratch with vision modality (RGB) for 200
epochs.
* We use a batch size of 1024 and cosine learning rate decay with linear warmup
@@ -192,6 +194,12 @@ evaluated on [COCO](https://cocodataset.org/) val2017.
| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | -
| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | -
| R3D-RS-200 | 48 x 2 | 81.0 | - | -
| MoViNet-A0-Base | 50 x 5 | 69.40 | 89.18 | -
| MoViNet-A1-Base | 50 x 5 | 74.57 | 92.03 | -
| MoViNet-A2-Base | 50 x 5 | 75.91 | 92.63 | -
| MoViNet-A3-Base | 120 x 2 | 79.34 | 94.52 | -
| MoViNet-A4-Base | 80 x 3 | 80.64 | 94.93 | -
| MoViNet-A5-Base | 120 x 2 | 81.39 | 95.06 | -
### Kinetics-600 Action Recognition Baselines
@@ -201,3 +209,9 @@ evaluated on [COCO](https://cocodataset.org/) val2017.
| R3D-50 | 32 x 2 | 79.5 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) |
| R3D-RS-200 | 32 x 2 | 83.1 | - | -
| R3D-RS-200 | 48 x 2 | 83.8 | - | -
| MoViNet-A0-Base | 50 x 5 | 72.05 | 90.92 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml) |
| MoViNet-A1-Base | 50 x 5 | 76.69 | 93.40 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml) |
| MoViNet-A2-Base | 50 x 5 | 78.62 | 94.17 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml) |
| MoViNet-A3-Base | 120 x 2 | 81.79 | 95.67 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml) |
| MoViNet-A4-Base | 80 x 3 | 83.48 | 96.16 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml) |
| MoViNet-A5-Base | 120 x 2 | 84.27 | 96.39 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml) |
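The "Input Shape" column reads frames × temporal stride. A quick way to sanity-check these configurations is to compute the span of video each clip covers; the sketch below assumes a 25 fps source (an assumption — the actual per-model preprocessing may differ), under which the MoViNet configs all cover roughly the full ~10-second Kinetics clip while the 32 x 2 ResNet configs see only a short window.

```python
# Temporal coverage implied by the "frames x stride" input-shape notation.
# FPS = 25 is an assumed source frame rate, not taken from the configs.
FPS = 25.0

def clip_coverage_seconds(frames: int, stride: int, fps: float = FPS) -> float:
    """Seconds of video spanned by `frames` sampled every `stride` frames."""
    return frames * stride / fps

# (frames, stride) pairs copied from the tables above.
configs = {
    "MoViNet-A0-Base": (50, 5),
    "MoViNet-A3-Base": (120, 2),
    "MoViNet-A4-Base": (80, 3),
    "R3D-RS-200": (32, 2),
}

for name, (frames, stride) in configs.items():
    print(f"{name}: {clip_coverage_seconds(frames, stride):.2f} s")
```

Under the 25 fps assumption, 50 x 5 and 120 x 2 both work out to about 10 seconds of coverage, versus 2.56 seconds for 32 x 2.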