From a049f23b23ade2eaf68d0849bcff88db98c233f4 Mon Sep 17 00:00:00 2001
From: cuicheng01
Date: Wed, 7 Jul 2021 11:50:31 +0000
Subject: [PATCH] Add LeViT_en.md and Twins_en.md

---
 docs/en/models/LeViT_en.md | 17 +++++++++++++++++
 docs/en/models/Twins.md | 17 +++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 docs/en/models/LeViT_en.md
 create mode 100644 docs/en/models/Twins.md

diff --git a/docs/en/models/LeViT_en.md b/docs/en/models/LeViT_en.md
new file mode 100644
index 00000000..7fd953ac
--- /dev/null
+++ b/docs/en/models/LeViT_en.md
@@ -0,0 +1,17 @@
+# LeViT series
+
+## Overview
+LeViT is a hybrid neural network designed for fast-inference image classification. Its design takes the performance of the model on different hardware platforms into account, so it better reflects real deployment scenarios. Through extensive experiments, the authors found an effective way to combine convolutional networks with the Transformer architecture, and proposed an attention-based method to integrate positional information into the Transformer. [Paper](https://arxiv.org/abs/2104.01136).
+
+## Accuracy, FLOPs and Parameters
+
+| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(M) | Params<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 |
+| LeViT-128 | 0.7810 | 0.9371 | 0.786 | 0.940 | 406 | 9.2 |
+| LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 |
+| LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 |
+| LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 |
+
+
+**Note**: The differences in accuracy from the Reference are due to differences in data preprocessing and to the distillation head not being used as output.
diff --git a/docs/en/models/Twins.md b/docs/en/models/Twins.md
new file mode 100644
index 00000000..69e70544
--- /dev/null
+++ b/docs/en/models/Twins.md
@@ -0,0 +1,17 @@
+# Twins
+
+## Overview
+The Twins network includes Twins-PCPVT and Twins-SVT, which focus on a careful design of the spatial attention mechanism, resulting in a simple but effective solution. Since the architecture only involves matrix multiplication, for which current deep learning frameworks are highly optimized, it is both efficient and easy to implement. Moreover, the architecture achieves excellent performance on a variety of downstream vision tasks such as image classification, object detection, and semantic segmentation. [Paper](https://arxiv.org/abs/2104.13840).
+
+## Accuracy, FLOPs and Parameters
+
+| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| pcpvt_small | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1 |
+| pcpvt_base | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8 |
+| pcpvt_large | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9 |
+| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8 | 24 |
+| alt_gvt_base | 0.8294 | 0.9621 | 0.832 | - | 8.3 | 56 |
+| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2 |
+
+**Note**: The difference in accuracy from the Reference is due to differences in data preprocessing.
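The Top1/Top5 columns in both tables are standard top-k accuracies on the validation set: a prediction counts as correct at top-k if the true label is among the k highest-scoring classes. As an illustrative sketch (not part of the patch; `topk_accuracy` is a hypothetical helper, not a PaddleClas API), this is how those numbers are typically computed from model outputs:

```python
def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for scores, label in zip(logits, labels):
        # indices of the k largest scores, highest first
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

# Toy example: 3 samples, 4 classes
logits = [[0.1, 0.6, 0.2, 0.1],
          [0.5, 0.1, 0.3, 0.1],
          [0.2, 0.2, 0.5, 0.1]]
labels = [1, 2, 2]
print(topk_accuracy(logits, labels, k=1))  # 2/3 of samples correct at top-1
print(topk_accuracy(logits, labels, k=2))  # all 3 within the top-2 scores
```

Because top-k accuracy depends on the exact evaluation pipeline, differences in data preprocessing (as the notes above mention) shift these numbers slightly relative to the Reference columns.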