LeViT is a hybrid neural network designed for fast-inference image classification. Its design takes into account how the model performs on different hardware platforms, so it better reflects the conditions of common real-world deployments. Through extensive experiments, the authors identified an effective way to combine convolutional networks with the Transformer architecture, and proposed an attention-based method for integrating positional information into the Transformer. [Paper](https://arxiv.org/abs/2104.01136).
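The core of LeViT's positional encoding is a learned per-head bias added to the attention map, indexed by the relative offset between query and key positions, instead of adding positional embeddings to the tokens. Below is a minimal sketch of this idea in PyTorch; the class and parameter names are hypothetical and the details are simplified relative to the paper's implementation.

```python
import itertools
import torch
import torch.nn as nn

class AttentionWithBias(nn.Module):
    """Sketch of attention with a learned per-head relative-position bias,
    in the spirit of LeViT's attention-based positional encoding."""
    def __init__(self, dim, num_heads, resolution):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

        # Enumerate all spatial positions of the feature map.
        points = list(itertools.product(range(resolution), range(resolution)))
        n = len(points)
        # Map every pair of positions to an index based on their relative offset.
        offsets, idxs = {}, []
        for p1 in points:
            for p2 in points:
                offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
                if offset not in offsets:
                    offsets[offset] = len(offsets)
                idxs.append(offsets[offset])
        # One learnable bias per head per distinct offset: this replaces
        # explicit positional embeddings on the input tokens.
        self.attention_biases = nn.Parameter(torch.zeros(num_heads, len(offsets)))
        self.register_buffer("bias_idxs", torch.LongTensor(idxs).view(n, n))

    def forward(self, x):  # x: (B, N, C) with N == resolution ** 2
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        attn = attn + self.attention_biases[:, self.bias_idxs]  # add positional bias
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Because the bias depends only on relative offsets, the positional information lives entirely inside the attention computation and adds no extra tokens or embeddings to the feature maps.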
The Twins family includes Twins-PCPVT and Twins-SVT, which focus on a careful design of the spatial attention mechanism, resulting in a simple yet more effective solution. Since the architecture involves only matrix multiplications, which current deep learning frameworks optimize heavily, it is both efficient and easy to implement. Moreover, it achieves excellent performance on a variety of downstream vision tasks such as image classification, object detection, and semantic segmentation. [Paper](https://arxiv.org/abs/2104.13840).
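Twins-SVT's spatially separable self-attention pairs locally-grouped attention within non-overlapping windows with global sub-sampled attention whose keys and values come from a spatially reduced feature map, and both steps reduce to plain matrix multiplications. The sketch below illustrates this with simplified, hypothetical names; the actual model alternates the two steps in separate blocks and uses a strided convolution rather than average pooling for the sub-sampling.

```python
import torch
import torch.nn as nn

def attention(q, k, v, num_heads):
    """Plain multi-head attention expressed purely as matrix multiplications."""
    B, Nq, C = q.shape
    d = C // num_heads
    q = q.reshape(B, Nq, num_heads, d).transpose(1, 2)
    k = k.reshape(B, -1, num_heads, d).transpose(1, 2)
    v = v.reshape(B, -1, num_heads, d).transpose(1, 2)
    attn = (q @ k.transpose(-2, -1)) * d ** -0.5
    return (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, Nq, C)

class SpatiallySeparableAttention(nn.Module):
    """Simplified sketch of Twins-SVT's two-step spatial attention:
    window-local attention followed by global attention over a pooled map."""
    def __init__(self, dim, num_heads=4, window=7, sr_ratio=8):
        super().__init__()
        self.num_heads, self.window = num_heads, window
        self.qkv_local = nn.Linear(dim, dim * 3)
        self.q_global = nn.Linear(dim, dim)
        self.kv_global = nn.Linear(dim, dim * 2)
        self.pool = nn.AvgPool2d(sr_ratio, sr_ratio)  # sub-sampling for the global step

    def forward(self, x, H, W):  # x: (B, H*W, C); assumes H, W divisible by window
        B, N, C = x.shape
        w = self.window
        # Locally-grouped self-attention: attend within each w x w window.
        xw = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        xw = xw.reshape(-1, w * w, C)                 # (B * num_windows, w*w, C)
        q, k, v = self.qkv_local(xw).chunk(3, dim=-1)
        local = attention(q, k, v, self.num_heads)
        local = local.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        local = local.reshape(B, N, C)
        # Global sub-sampled attention: keys/values taken from a pooled feature map.
        pooled = self.pool(local.transpose(1, 2).view(B, C, H, W))
        pooled = pooled.flatten(2).transpose(1, 2)    # (B, (H/sr)*(W/sr), C)
        k, v = self.kv_global(pooled).chunk(2, dim=-1)
        return attention(self.q_global(local), k, v, self.num_heads)
```

The local step keeps the cost of attention proportional to the window size, while the global step restores cross-window communication at a fraction of the cost of full self-attention, since the key/value sequence is sub-sampled.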