@@ -500,7 +502,40 @@ The accuracy and speed indicators of TNT series models are shown in the followin
<aname="23"></a>
## 23. Other models
## 23. CSWinTransformer series <sup>[[40](#ref40)]</sup>
The accuracy and speed indicators of CSWinTransformer series models are shown in the following table. For more introduction, please refer to: [CSWinTransformer series model documents](../models/CSWinTransformer_en.md)。
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
The accuracy and speed indicators of PVTV2 series models are shown in the following table. For more introduction, please refer to: [PVTV2 series model documents](../models/PVTV2_en.md)。
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
The accuracy and speed indicators of AlexNet <sup>[[18](#ref18)]</sup>, SqueezeNet series <sup>[[19](#ref19)]</sup>, VGG series <sup>[[20](#ref20)]</sup>, DarkNet53 <sup>[[21](#ref21)]</sup> and other models are shown in the following table. For more information, please refer to: [Other model documents](../models/Others_en.md).
...
...
@@ -597,3 +632,8 @@ TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE.
<aname="ref39">[39]</a>Duo Lim Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen. Involution: Inverting the Inherence of Convolution for Visual Recognition.
<aname="ref40">[40]</a>Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows.
CSWinTransformer is a new visual Transformer network that can be used as a general backbone network in the field of computer vision. CSWinTransformer proposes to do self-attention through a cross-shaped window, which not only has a very high computational efficiency, but also can obtain a global receptive field through two-layer calculation. CSWinTransformer also proposed a new encoding method: LePE, which further improved the accuracy of the model. [Paper](https://arxiv.org/abs/2107.00652)
<aname="ref39">[39]</a>Duo Lim Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen. Involution: Inverting the Inherence of Convolution for Visual Recognition.
<aname="ref40">[40]</a>Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows.