diff --git a/configs/gcnet/README_cn.md b/configs/gcnet/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..795e62ef61b5427f31dd9f0baf81c3e383711d24
--- /dev/null
+++ b/configs/gcnet/README_cn.md
@@ -0,0 +1,69 @@
+# GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
+
+## Introduction
+
+Non-local networks use self-attention to capture long-range dependencies. However, the GCNet authors observed through visualization that, within the same image, the attention maps of different query positions are almost identical, which means the Non-local computation carries substantial redundancy. SENet, on the other hand, recalibrates channel weights with global context at very low cost, but does not fully exploit the global context information. The paper combines the strengths of both and proposes the GC block, which fuses global context effectively while keeping the computational overhead small.
+
+Based on the observation that attention maps differ very little across positions, the paper designs a simplified non-local (SNL) block, shown below, which shares a single global attention map across all positions.
+
+
+ +
+
+
+The output of the SNL block is computed as
+
+
+ +
+
+To further reduce the computation, $W_v$ is moved outside the attention pooling, giving
+
+
+ +
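For reference, the two equations above can be transcribed into LaTeX (our reading of the paper's notation, with $N_p$ the number of positions in the feature map):

```latex
% SNL output, with W_v inside the attention pooling
z_i = x_i + \sum_{j=1}^{N_p} \frac{\exp(W_k x_j)}{\sum_{m=1}^{N_p} \exp(W_k x_m)}\, W_v x_j

% factored form, with W_v moved outside the pooling
z_i = x_i + W_v \sum_{j=1}^{N_p} \frac{\exp(W_k x_j)}{\sum_{m=1}^{N_p} \exp(W_k x_m)}\, x_j
```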
+
+The corresponding structure is shown below. By sharing one attention map across all positions, the attention computation is reduced to 1/(H·W) of the original.
+
+
+ +
+
+The SNL block can be abstracted into three parts: context modeling, feature transform, and feature aggregation. The feature transform carries most of the parameters, so the paper borrows the bottleneck design of SE, yielding the GC block shown below. Two 1×1 convolutions with channel reduction cut the computation; since this two-layer bottleneck is harder to optimize, a layer normalization is inserted between them to ease optimization.
+
+
+ +
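The forward pass of the block can be sketched in a few lines of NumPy (a minimal illustration, not the PaddleDetection implementation; `wk`, `wv1`, `wv2` are hypothetical stand-ins for the 1×1 convolution weights, with the reduction ratio implied by their shapes):

```python
import numpy as np

def gc_block(x, wk, wv1, wv2, eps=1e-5):
    """Sketch of a GC block forward pass on a (C, H, W) feature map.

    wk: (C,) context-modeling weights, wv1: (C/r, C) and wv2: (C, C/r)
    form the bottleneck transform. All weights are illustrative.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                   # (C, HW)
    logits = wk @ flat                           # (HW,) logits shared by all positions
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                           # global attention map (softmax over positions)
    context = flat @ attn                        # (C,) attention-pooled global context
    t = wv1 @ context                            # bottleneck down-projection, (C/r,)
    t = (t - t.mean()) / np.sqrt(t.var() + eps)  # layer normalization
    t = np.maximum(t, 0.0)                       # ReLU
    delta = wv2 @ t                              # up-projection back to (C,)
    return x + delta[:, None, None]              # broadcast-add to every position
```

Note how the pooled `context` is a single `(C,)` vector: that is the shared attention map at work, in contrast to the per-position maps of the original Non-local block.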
+
+The GC block can be conveniently inserted into a backbone network to strengthen its global context modeling, improving performance on both detection and segmentation tasks.
+
+
+## Model Zoo
+
+| Backbone | Type | Context setting | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
+| :---------------------- | :-------------: | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
+| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, add) | 2 | 2x | 15.31 | 41.4 | 36.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml) |
+| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, mul) | 2 | 2x | 15.35 | 40.7 | 36.1 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml) |
+
+
+## Citation
+
+```
+@article{DBLP:journals/corr/abs-1904-11492,
+  author    = {Yue Cao and
+               Jiarui Xu and
+               Stephen Lin and
+               Fangyun Wei and
+               Han Hu},
+  title     = {GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond},
+  journal   = {CoRR},
+  volume    = {abs/1904.11492},
+  year      = {2019},
+  url       = {http://arxiv.org/abs/1904.11492},
+  archivePrefix = {arXiv},
+  eprint    = {1904.11492},
+  timestamp = {Tue, 09 Jul 2019 16:48:55 +0200},
+  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1904-11492},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
diff --git a/configs/iou_loss/README_cn.md b/configs/iou_loss/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..0710944c53281f2c1c7c9374eb246364112abf1e
--- /dev/null
+++ b/configs/iou_loss/README_cn.md
@@ -0,0 +1,126 @@
+# Improvements of IoU Loss
+
+## Introduction
+
+### GIoU loss
+
+IoU is a standard detection metric and is insensitive to object scale. Earlier detectors regressed boxes with a smooth L1 loss, but that loss neither corresponds directly to the final IoU metric nor is scale-invariant, which motivated using IoU itself as the regression loss. IoU loss, however, provides no gradient when the boxes do not overlap (IoU = 0), and it also ignores how the non-overlapping boxes are positioned relative to each other. The paper improves on this; GIoU is computed as follows.
+
+
+
+ +
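The GIoU computation above can be sketched in plain Python for axis-aligned `(x1, y1, x2, y2)` boxes (an illustrative sketch, not the open-sourced implementation):

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # C: smallest enclosing box of A and B
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (c_area - union) / c_area

def giou_loss(box_a, box_b):
    """GIoU loss is 1 - GIoU; it grows as disjoint boxes move apart."""
    return 1.0 - giou(box_a, box_b)
```

For disjoint boxes GIoU is negative and approaches -1 as they separate, so the loss keeps providing a useful signal where plain IoU loss saturates.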
+
+
+The final GIoU loss is 1 − GIoU. Concretely, IoU measures the overlap between the predicted box A and the ground truth B, and C is the smallest enclosing convex shape containing both. Even when A and B do not overlap (IoU = 0), GIoU still changes with their relative distance, so the model parameters can keep being optimized. With the sizes of A and B held fixed, the farther apart they are, the smaller GIoU becomes and the larger the GIoU loss.
+
+The pipeline for computing the box loss with GIoU loss is shown below.
+
+
+ +
+
+
+PaddleDetection also open-sources a GIoU loss implementation for Faster R-CNN. Replacing the traditional smooth L1 loss with GIoU loss in a Faster R-CNN ResNet50-vd-FPN 1x experiment improves COCO val mAP from 38.3% to 39.4%, with no extra inference cost.
+
+
+### DIoU/CIoU loss
+
+GIoU loss fixes the case where the predicted box A and the ground truth B do not overlap and IoU loss gives the model no optimization direction, but two situations remain problematic:
+1. When one of A and B contains the other, GIoU loss degenerates to IoU loss, and convergence is slow.
+2. When A and B intersect and share the same x1 and x2, or the same y1 and y2, GIoU loss again degenerates to IoU loss and converges slowly.
+
+To address this, the paper proposes DIoU loss and CIoU loss, which speed up convergence and handle the cases where GIoU fails to converge.
+To accelerate convergence, the improved loss introduces the notion of distance; specifically, the box loss is defined in the form:
+
+
+ +
+
+
+where the second term is a penalty. To penalize the distance between the predicted box and the ground truth, the penalty term can be defined as
+
+
+ +
+
+
+Here the numerator is the Euclidean distance between the center points of the predicted box and the ground-truth box, and c in the denominator is the diagonal length of the smallest box enclosing both. The DIoU loss can therefore be written as
+
+
+ +
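A minimal sketch of the DIoU loss under the definitions above (illustrative only; `box_a` and `box_b` are `(x1, y1, x2, y2)` tuples):

```python
def diou_loss(box_a, box_b):
    """DIoU loss for boxes (x1, y1, x2, y2): 1 - IoU + d^2 / c^2,
    where d is the center distance and c the enclosing-box diagonal."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # squared distance between box centers (numerator of the penalty)
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # squared diagonal of the smallest enclosing box (denominator)
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    return 1.0 - iou + d2 / c2
```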
+
+
+Compared with GIoU loss, DIoU loss considers not only the overlap but also the distance between the boxes, which speeds up convergence. However, DIoU loss accounts only for the overlap and the center distance, ignoring the difference in aspect ratio between the predicted box and the ground truth. The paper therefore proposes CIoU loss, whose penalty adds an aspect-ratio constraint. Specifically, the penalty is defined as
+
+
+ +
+
+
+where v measures the aspect-ratio consistency and α is a trade-off coefficient, defined respectively as
+
+
+ +
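Putting the penalty together, a sketch of the CIoU loss (again illustrative; the paper's special treatment of `v`'s gradient during backpropagation is omitted here):

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss: the DIoU penalty plus an aspect-ratio term alpha * v."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    # v measures aspect-ratio consistency; alpha is the trade-off coefficient
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1))
                              - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0 else 0.0
    return 1.0 - iou + d2 / c2 + alpha * v
```

When the two boxes share the same aspect ratio, `v` vanishes and CIoU reduces to DIoU.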
+
+
+With CIoU loss, box regression converges faster and more accurately when the predicted box overlaps with, or even contains, the target box.
+In the NMS stage, the suppression criterion is normally plain IoU; the paper instead uses a DIoU-corrected criterion, updating the detection scores as follows.
+
+
+ +
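The DIoU-based NMS update can be sketched as a greedy loop that suppresses by `IoU - d²/c²` instead of plain IoU (a hard-suppression sketch under that reading; the paper's score-update form also admits soft variants):

```python
def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS sketch that suppresses by DIoU instead of IoU.
    boxes: list of (x1, y1, x2, y2); returns indices of kept boxes."""
    def diou(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
           + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
        c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
           + (max(ay2, by2) - min(ay1, by1)) ** 2
        return inter / union - d2 / c2

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        # drop boxes whose DIoU with the current best box exceeds the threshold
        order = [i for i in order if diou(boxes[m], boxes[i]) <= threshold]
    return keep
```

Because the center-distance term is subtracted, two boxes with high IoU but distant centers are less likely to suppress each other, which helps in crowded scenes.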
+
+
+This yields a further improvement in model accuracy.
+
+
+## Model Zoo
+
+| Backbone | Type | Loss type | Loss weight | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
+| :---------------------- | :------------- | :---: | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :---: |
+| ResNet50-vd-FPN | Faster | GIoU | 10 | 2 | 1x | 22.94 | 39.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_giou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/iou_loss/faster_rcnn_r50_vd_fpn_giou_loss_1x.yml) |
+| ResNet50-vd-FPN | Faster | DIoU | 12 | 2 | 1x | 22.94 | 39.2 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_diou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/iou_loss/faster_rcnn_r50_vd_fpn_diou_loss_1x.yml) |
+| ResNet50-vd-FPN | Faster | CIoU | 12 | 2 | 1x | 22.95 | 39.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_ciou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/iou_loss/faster_rcnn_r50_vd_fpn_ciou_loss_1x.yml) |
+
+
+
+## Citation
+
+```
+@article{DBLP:journals/corr/abs-1902-09630,
+  author    = {Seyed Hamid Rezatofighi and
+               Nathan Tsoi and
+               JunYoung Gwak and
+               Amir Sadeghian and
+               Ian D. Reid and
+               Silvio Savarese},
+  title     = {Generalized Intersection over Union: {A} Metric and {A} Loss for Bounding
+               Box Regression},
+  journal   = {CoRR},
+  volume    = {abs/1902.09630},
+  year      = {2019},
+  url       = {http://arxiv.org/abs/1902.09630},
+  archivePrefix = {arXiv},
+  eprint    = {1902.09630},
+  timestamp = {Tue, 21 May 2019 18:03:36 +0200},
+  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1902-09630},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
+
+- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression: [https://arxiv.org/abs/1911.08287](https://arxiv.org/abs/1911.08287)
+
+```
+@article{Zheng2019DistanceIoULF,
+  title={Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
+  author={Zhaohui Zheng and Ping Wang and Wei Liu and Jinze Li and Rongguang Ye and Dongwei Ren},
+  journal={ArXiv},
+  year={2019},
+  volume={abs/1911.08287}
+}
+```
diff --git a/configs/libra_rcnn/README_cn.md b/configs/libra_rcnn/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..134c1c390eaec3869b4cce7e46e4973ca6e8440c
--- /dev/null
+++ b/configs/libra_rcnn/README_cn.md
@@ -0,0 +1,75 @@
+# Libra R-CNN: Towards Balanced Learning for Object Detection
+
+## Introduction
+
+Training a detector usually involves three steps: candidate region generation and selection, feature extraction, and the joint training and convergence of the multi-task classification and box-regression heads.
+
+The paper argues that three imbalances limit detector performance: at the sample level, the feature level, and the objective level, and proposes one remedy for each. The three remedies are as follows.
+
+### IoU-balanced Sampling
+
+After Faster R-CNN generates its candidate boxes, positives and negatives are sampled at random, which causes a problem: about 70% of the sampled negatives have an IoU with the ground truth between 0 and 0.05, as the distribution below shows. Online hard example mining (OHEM) alleviates this, but the sample counts across IoU intervals remain quite uneven and the procedure is complex. The authors instead propose a balanced negative sampling strategy: split the IoU range into K bins and sample the same number of negatives from each bin (taking all samples in a bin that falls short of the quota). This keeps the sampled negatives roughly balanced across IoU intervals. The idea is simple, and it works somewhat better than OHEM.
+
+
+
+ +
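The sampling rule can be sketched as follows (a simplified illustration; the names `candidates`, `k`, and `max_iou` are hypothetical, with K bins covering the negative-IoU range):

```python
import random

def iou_balanced_neg_sampling(candidates, num_samples, k=3, max_iou=0.5, seed=0):
    """Sketch of IoU-balanced negative sampling.

    Split [0, max_iou) into k equal bins and draw the same quota of
    negatives from each bin, taking every sample in a bin that has fewer
    than the quota. `candidates` is a list of (index, iou_with_gt) pairs
    for negative proposals.
    """
    rng = random.Random(seed)
    quota = num_samples // k
    picked = []
    for b in range(k):
        lo, hi = b * max_iou / k, (b + 1) * max_iou / k
        bin_items = [idx for idx, iou in candidates if lo <= iou < hi]
        if len(bin_items) <= quota:
            picked.extend(bin_items)          # underfull bin: take them all
        else:
            picked.extend(rng.sample(bin_items, quota))
    return picked
```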
+
+
+### Balanced Feature Pyramid (BFP)
+
+The standard FPN fuses backbone features through lateral connections. The paper instead proposes the structure below, consisting of four steps: rescaling, integrating, refining, and strengthening. Feature maps from different levels are first rescaled to a common resolution and then averaged; a Non-local block further refines the integrated features; finally, the refined map is rescaled back and added as a residual to each level's feature map to produce the outputs. Compared with a standard FPN, this balanced feature pyramid brings roughly a 0.8% accuracy gain on COCO.
+
+
+ +
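The rescaling, integrating, and strengthening steps can be sketched in NumPy (assuming square feature maps and nearest-neighbor rescaling; the Non-local refinement step is omitted for brevity, so this is only a partial sketch of BFP):

```python
import numpy as np

def resize_nearest(x, size):
    """Nearest-neighbor resize of a (C, H, W) array to (C, size, size)."""
    c, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows[:, None], cols[None, :]]

def balanced_feature_pyramid(feats, mid=2):
    """feats: list of (C, H_l, W_l) maps, highest resolution first.

    Rescale every level to the resolution of feats[mid], average them
    (integrate), then add the averaged map back to each level as a
    residual (strengthen).
    """
    size = feats[mid].shape[1]
    integrated = np.mean([resize_nearest(f, size) for f in feats], axis=0)
    return [f + resize_nearest(integrated, f.shape[1]) for f in feats]
```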
+
+
+### Balanced L1 Loss
+
+Detection jointly optimizes a classification loss and a box regression loss. A high classification score can yield a fairly accurate-looking model even when regression is poor, so it is worth increasing the weight of the regression loss. Call boxes with bbox loss <= 1 inliers (easy samples) and boxes with bbox loss > 1 outliers (hard samples). Simply re-weighting the regression loss of all boxes makes the model more sensitive to outliers. Moreover, the smooth L1 box loss has the drawback that its gradient is very small for inliers while its magnitude is 1 for outliers. The gradient of smooth L1 loss is defined as follows.
+
+
+ +
+
+
+The paper therefore increases the gradient of inliers, balancing the loss-gradient contributions of inliers and outliers. The gradient of the resulting balanced (Libra) L1 loss is as follows.
+
+
+ +
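The loss whose gradient is shown above can be written out directly (a sketch using α = 0.5 and γ = 1.5, the settings reported in the paper; `b` follows from the continuity constraint α·ln(b+1) = γ, and `C` keeps the loss continuous at |x| = 1):

```python
import math

def balanced_l1_loss(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss from Libra R-CNN for one regression residual x."""
    b = math.exp(gamma / alpha) - 1.0        # so that alpha*ln(b+1) == gamma
    ax = abs(x)
    if ax < 1.0:                             # inlier branch
        return (alpha / b) * (b * ax + 1.0) * math.log(b * ax + 1.0) - alpha * ax
    # constant C keeps the loss continuous at |x| = 1
    c = (alpha / b) * (b + 1.0) * math.log(b + 1.0) - alpha - gamma
    return gamma * ax + c                    # outlier branch, linear like L1

def balanced_l1_grad(x, alpha=0.5, gamma=1.5):
    """Gradient magnitude: alpha*ln(b|x|+1) for inliers, gamma for outliers."""
    b = math.exp(gamma / alpha) - 1.0
    ax = abs(x)
    return alpha * math.log(b * ax + 1.0) if ax < 1.0 else gamma
```

At |x| = 0.5 the gradient is about 1.18 versus 0.5 for smooth L1, illustrating the boosted inlier gradients the paper aims for.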
+
+
+The gradients under different hyperparameters are visualized below.
+
+
+
+ +
+
+
+Libra loss has the same gradient as smooth L1 loss for outliers, but a larger gradient for inliers. This raises the box regression loss in the regime where it was weak, balancing the loss between easy and hard boxes and increasing the influence of box regression quality on detector performance.
+
+Combining the three components, the paper reports an absolute gain of 1.1%~2.5% on COCO two-stage detection, a very clear improvement.
+
+
+## Model Zoo
+
+
+| Backbone | Type | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
+| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
+| ResNet50-vd-BFP | Faster | 2 | 1x | 18.247 | 40.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/libra_rcnn_r50_vd_fpn_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/libra_rcnn/libra_rcnn_r50_vd_fpn_1x.yml) |
+| ResNet101-vd-BFP | Faster | 2 | 1x | 14.865 | 42.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/libra_rcnn_r101_vd_fpn_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/libra_rcnn/libra_rcnn_r101_vd_fpn_1x.yml) |
+
+## Citation
+
+```
+@inproceedings{pang2019libra,
+  title={Libra R-CNN: Towards Balanced Learning for Object Detection},
+  author={Pang, Jiangmiao and Chen, Kai and Shi, Jianping and Feng, Huajun and Ouyang, Wanli and Lin, Dahua},
+  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
+  year={2019}
+}
+```
diff --git a/docs/images/models/gcnet_gcblock_module.png b/docs/images/models/gcnet_gcblock_module.png
new file mode 100644
index 0000000000000000000000000000000000000000..cadc2167c60e1c07e9bc13bd02573f34ef0e4795
Binary files /dev/null and b/docs/images/models/gcnet_gcblock_module.png differ
diff --git a/docs/images/models/gcnet_snl_module.png b/docs/images/models/gcnet_snl_module.png
new file mode 100644
index 0000000000000000000000000000000000000000..087edb753f4f744140b804c84b746d513a07798f
Binary files /dev/null and b/docs/images/models/gcnet_snl_module.png differ
diff --git a/docs/images/models/gcnet_snl_module_simple.png 
b/docs/images/models/gcnet_snl_module_simple.png new file mode 100644 index 0000000000000000000000000000000000000000..3abbecef1d4e02f6cee2b0f90a1a4b6f04eb5cf8 Binary files /dev/null and b/docs/images/models/gcnet_snl_module_simple.png differ diff --git a/docs/images/models/gcnet_snl_out.png b/docs/images/models/gcnet_snl_out.png new file mode 100644 index 0000000000000000000000000000000000000000..eb4946da0b2ab46ddb3ab9f5d3cb9a9edbae3009 Binary files /dev/null and b/docs/images/models/gcnet_snl_out.png differ diff --git a/docs/images/models/gcnet_snl_out_simple.png b/docs/images/models/gcnet_snl_out_simple.png new file mode 100644 index 0000000000000000000000000000000000000000..81da924902f556da3693eee834e46723a3717eda Binary files /dev/null and b/docs/images/models/gcnet_snl_out_simple.png differ diff --git a/docs/images/models/iou_loss_diou_bbox_loss.png b/docs/images/models/iou_loss_diou_bbox_loss.png new file mode 100644 index 0000000000000000000000000000000000000000..6caf48de2afc836f586a7819ef551b5ab4777d59 Binary files /dev/null and b/docs/images/models/iou_loss_diou_bbox_loss.png differ diff --git a/docs/images/models/iou_loss_diou_ciou_nms.png b/docs/images/models/iou_loss_diou_ciou_nms.png new file mode 100644 index 0000000000000000000000000000000000000000..05c30e261d55aa68cee2e24217d38fbf57446237 Binary files /dev/null and b/docs/images/models/iou_loss_diou_ciou_nms.png differ diff --git a/docs/images/models/iou_loss_diou_diou_final.png b/docs/images/models/iou_loss_diou_diou_final.png new file mode 100644 index 0000000000000000000000000000000000000000..42f52c8d4e4e284fe6af19f2201d4eebd6af80a4 Binary files /dev/null and b/docs/images/models/iou_loss_diou_diou_final.png differ diff --git a/docs/images/models/iou_loss_diou_rciou_penalty.png b/docs/images/models/iou_loss_diou_rciou_penalty.png new file mode 100644 index 0000000000000000000000000000000000000000..25a219c2e6be619316a05d05496206b6cc478a43 Binary files /dev/null and 
b/docs/images/models/iou_loss_diou_rciou_penalty.png differ diff --git a/docs/images/models/iou_loss_diou_rdiou_penalty.png b/docs/images/models/iou_loss_diou_rdiou_penalty.png new file mode 100644 index 0000000000000000000000000000000000000000..6bd9149def1e4865f02496d55a23a6f459e5ef8e Binary files /dev/null and b/docs/images/models/iou_loss_diou_rdiou_penalty.png differ diff --git a/docs/images/models/iou_loss_diou_v_and_alpha.png b/docs/images/models/iou_loss_diou_v_and_alpha.png new file mode 100644 index 0000000000000000000000000000000000000000..792c3bed9e87aad3feec8e606057d968ae5adfc1 Binary files /dev/null and b/docs/images/models/iou_loss_diou_v_and_alpha.png differ diff --git a/docs/images/models/iou_loss_giou_calc.png b/docs/images/models/iou_loss_giou_calc.png new file mode 100644 index 0000000000000000000000000000000000000000..dd698e055094e6e6b3dad13cf2ea265a87f0cefd Binary files /dev/null and b/docs/images/models/iou_loss_giou_calc.png differ diff --git a/docs/images/models/iou_loss_giou_pipeline.png b/docs/images/models/iou_loss_giou_pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..218e974b8776d8cd2634e63b34e715f24de27604 Binary files /dev/null and b/docs/images/models/iou_loss_giou_pipeline.png differ diff --git a/docs/images/models/libra_rcnn_iou_distribution.png b/docs/images/models/libra_rcnn_iou_distribution.png new file mode 100644 index 0000000000000000000000000000000000000000..a161eea35ba949b33ef9bb025e8d58c5c57eb537 Binary files /dev/null and b/docs/images/models/libra_rcnn_iou_distribution.png differ diff --git a/docs/images/models/libra_rcnn_libraloss_equ.png b/docs/images/models/libra_rcnn_libraloss_equ.png new file mode 100644 index 0000000000000000000000000000000000000000..947ec73a4c26b26b2310b0182a70b00542d80935 Binary files /dev/null and b/docs/images/models/libra_rcnn_libraloss_equ.png differ diff --git a/docs/images/models/libra_rcnn_loss_grad.png b/docs/images/models/libra_rcnn_loss_grad.png new file 
mode 100644 index 0000000000000000000000000000000000000000..b0fcf4ff591b908f644092b31ae416ff7f6513dd Binary files /dev/null and b/docs/images/models/libra_rcnn_loss_grad.png differ diff --git a/docs/images/models/libra_rcnn_pipeline.png b/docs/images/models/libra_rcnn_pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..ba612ac8c291c319b1ceefb52cda61d34b8369dd Binary files /dev/null and b/docs/images/models/libra_rcnn_pipeline.png differ diff --git a/docs/images/models/libra_rcnn_smooth_l1_equ.png b/docs/images/models/libra_rcnn_smooth_l1_equ.png new file mode 100644 index 0000000000000000000000000000000000000000..3c2a174f826377fbdd636d11c871a8b04be45e72 Binary files /dev/null and b/docs/images/models/libra_rcnn_smooth_l1_equ.png differ