# CSWinTransformer --- ## Catalogue * [1. Overview](#1) * [2. Accuracy, FLOPs and Parameters](#2) ## 1. Overview CSWinTransformer is a new visual Transformer network that can be used as a general backbone network in the field of computer vision. CSWinTransformer proposes to do self-attention through a cross-shaped window, which not only has a very high computational efficiency, but also can obtain a global receptive field through two-layer calculation. CSWinTransformer also proposed a new encoding method: LePE, which further improved the accuracy of the model. [Paper](https://arxiv.org/abs/2107.00652) ## 2. Accuracy, FLOPs and Parameters | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| | CSWinTransformer_tiny_224 | 0.8281 | 0.9628 | 0.828 | - | 4.1 | 22 | | CSWinTransformer_small_224 | 0.8358 | 0.9658 | 0.836 | - | 6.4 | 35 | | CSWinTransformer_base_224 | 0.8420 | 0.9692 | 0.842 | - | 14.3 | 77 | | CSWinTransformer_large_224 | 0.8643 | 0.9799 | 0.865 | - | 32.2 | 173.3 | | CSWinTransformer_base_384 | 0.8550 | 0.9749 | 0.855 | - | 42.2 | 77 | | CSWinTransformer_large_384 | 0.8748 | 0.9833 | 0.875 | - | 94.7 | 173.3 |