提交 6b360783 编写于 作者: jm_12138's avatar jm_12138

add the readme of models

上级 c40ce17a
# DLA series
## Overview
DLA (Deep Layer Aggregation). Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Although skip connections have been incorporated to combine layers, these connections have been "shallow" themselves, and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)
## Accuracy, FLOPS and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:-----------------:|:----------:|:---------:|:---------:|:---------:|
| DLA34 | 15.8 | 3.1 | 76.03 | 92.98 |
| DLA46_c | 1.3 | 0.5 | 63.21 | 85.30 |
| DLA46x_c | 1.1 | 0.5 | 64.36 | 86.01 |
| DLA60 | 22.0 | 4.2 | 76.10 | 92.92 |
| DLA60x | 17.4 | 3.5 | 77.53 | 93.78 |
| DLA60x_c | 1.3 | 0.6 | 66.45 | 87.54 |
| DLA102 | 33.3 | 7.2 | 78.93 | 94.52 |
| DLA102x | 26.4 | 5.9 | 78.10 | 94.00 |
| DLA102x2 | 41.4 | 9.3 | 78.85 | 94.45 |
| DLA169 | 53.5 | 11.6 | 78.09 | 94.09 |
# HarDNet series
## Overview
HarDNet(Harmonic DenseNet)is a brand new neural network proposed by National Tsing Hua University in 2019, which to achieve high efficiency in terms of both low MACs and memory traffic. The new network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. We use tools including Nvidia profiler and ARM Scale-Sim to measure the memory traffic and verify that the inference latency is indeed proportional to the memory traffic consumption and the proposed network consumes low memory traffic. [Paper](https://arxiv.org/abs/1909.00948).
## Accuracy, FLOPS and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| HarDNet68 | 17.6 | 4.3 | 75.46 | 92.65 |
| HarDNet85 | 36.7 | 9.1 | 77.44 | 93.55 |
| HarDNet39_ds | 3.5 | 0.4 | 71.33 | 89.98 |
| HarDNet68_ds | 4.2 | 0.8 | 73.62 | 91.52 |
\ No newline at end of file
# RedNet series
## Overview
In the backbone of ResNet and in all bottleneck positions of backbone, the convolution is replaced by Involution, but all convolutions are reserved for channel mapping and fusion. These carefully redesigned entities combine to form a new efficient backbone network, called Rednet. [paper](https://arxiv.org/abs/2103.06255).
## Accuracy, FLOPS and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| RedNet26 | 9.2 | 1.7 | 75.95 | 93.19 |
| RedNet38 | 12.4 | 2.2 | 77.47 | 93.56 |
| RedNet50 | 15.5 | 2.7 | 78.33 | 94.17 |
| RedNet101 | 25.7 | 4.7 | 78.94 | 94.36 |
| RedNet152 | 34.0 | 6.8 | 79.17 | 94.40 |
\ No newline at end of file
# TNT series
## Overview
TNT(Transformer-iN-Transformer) series models were proposed by Huawei-Noah in 2021 for modeling both patch-level and pixel-level representation. In each TNT block, an outer transformer block is utilized to process patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level feature is projected to the space of patch embedding by a linear transformation layer and then added into the patch. By stacking the TNT blocks, we build the TNT model for image recognition. Experiments on ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the proposed TNT architecture. For example, our TNT achieves 81.3% top-1 accuracy on ImageNet which is 1.5% higher than that of DeiT with similar computational cost. [Paper](https://arxiv.org/abs/2103.00112).
## Accuracy, FLOPS and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| TNT_small | 23.8 | 5.2 | 81.12 | 95.56 |
\ No newline at end of file
# DLA系列
## 概述
DLA (Deep Layer Aggregation)。 视觉识别需要丰富的表示形式,其范围从低到高,范围从小到大,分辨率从精细到粗糙。即使卷积网络中的要素深度很深,仅靠隔离层还是不够的:将这些表示法进行复合和聚合可改善对内容和位置的推断。尽管已合并了残差连接以组合各层,但是这些连接本身是“浅”的,并且只能通过简单的一步操作来融合。作者通过更深层的聚合来增强标准体系结构,以更好地融合各层的信息。Deep Layer Aggregation 结构迭代地和分层地合并了特征层次结构,以使网络具有更高的准确性和更少的参数。跨体系结构和任务的实验表明,与现有的分支和合并方案相比,Deep Layer Aggregation 可提高识别和分辨率。[论文地址](https://arxiv.org/abs/1707.06484)
## 精度、FLOPS和参数量
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:-----------------:|:----------:|:---------:|:---------:|:---------:|
| DLA34 | 15.8 | 3.1 | 76.03 | 92.98 |
| DLA46_c | 1.3 | 0.5 | 63.21 | 85.30 |
| DLA46x_c | 1.1 | 0.5 | 64.36 | 86.01 |
| DLA60 | 22.0 | 4.2 | 76.10 | 92.92 |
| DLA60x | 17.4 | 3.5 | 77.53 | 93.78 |
| DLA60x_c | 1.3 | 0.6 | 66.45 | 87.54 |
| DLA102 | 33.3 | 7.2 | 78.93 | 94.52 |
| DLA102x | 26.4 | 5.9 | 78.10 | 94.00 |
| DLA102x2 | 41.4 | 9.3 | 78.85 | 94.45 |
| DLA169 | 53.5 | 11.6 | 78.09 | 94.09 |
\ No newline at end of file
# HarDNet系列
## 概述
HarDNet(Harmonic DenseNet)是 2019 年由国立清华大学提出的一种全新的神经网络,在低 MAC 和内存流量的条件下实现了高效率。与 FC-DenseNet-103,DenseNet-264,ResNet-50,ResNet-152 和SSD-VGG 相比,新网络的推理时间减少了 35%,36%,30%,32% 和 45%。我们使用了包括Nvidia Profiler 和 ARM Scale-Sim 在内的工具来测量内存流量,并验证推理延迟确实与内存流量消耗成正比,并且所提议的网络消耗的内存流量很低。[论文地址](https://arxiv.org/abs/1909.00948)
## 精度、FLOPS和参数量
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| HarDNet68 | 17.6 | 4.3 | 75.46 | 92.65 |
| HarDNet85 | 36.7 | 9.1 | 77.44 | 93.55 |
| HarDNet39_ds | 3.5 | 0.4 | 71.33 | 89.98 |
| HarDNet68_ds | 4.2 | 0.8 | 73.62 | 91.52 |
# RedNet系列
## 概述
在 ResNet 的 Backbone 和 Backbone 的所有 Bottleneck 位置上使用 Involution 替换掉了卷积,但保留了所有的卷积用于通道映射和融合。这些精心重新设计的实体联合起来,形成了一种新的高效 Backbone 网络,称为 RedNet。[论文地址](https://arxiv.org/abs/2103.06255)
## 精度、FLOPS和参数量
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| RedNet26 | 9.2 | 1.7 | 75.95 | 93.19 |
| RedNet38 | 12.4 | 2.2 | 77.47 | 93.56 |
| RedNet50 | 15.5 | 2.7 | 78.33 | 94.17 |
| RedNet101 | 25.7 | 4.7 | 78.94 | 94.36 |
| RedNet152 | 34.0 | 6.8 | 79.17 | 94.40 |
\ No newline at end of file
# TNT系列
## 概述
TNT(Transformer-iN-Transformer)系列模型由华为诺亚于2021年提出,用于对 patch 级别和 pixel 级别的表示进行建模。在每个 TNT 块中,outer transformer block 用于处理 patch 嵌入,inner transformer block 从 pixel 嵌入中提取局部特征。通过线性变换层将 pixel 级特征投影到 patch 嵌入空间,然后加入到 patch 中。通过对 TNT 块的叠加,建立了用于图像识别的 TNT 模型。在ImageNet 基准测试和下游任务上的实验证明了该 TNT 体系结构的优越性和有效性。例如,在计算量相当的情况下 TNT 能在 ImageNet 上达到 81.3% 的 top-1 精度,比 DeiT 高 1.5%。[论文地址](https://arxiv.org/abs/2103.00112)
## 精度、FLOPS和参数量
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| TNT_small | 23.8 | 5.2 | 81.12 | 95.56 |
\ No newline at end of file
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册