Commit 9609c04a, authored by still-wait

Merge branch 'master' of https://github.com/paddlepaddle/PaddleDetection into add_solov2_new

@@ -53,13 +53,14 @@ PP-YOLO improved performance and speed of YOLOv3 with following methods:
 **Notes:**
-- PP-YOLO is trained on COCO train2017 datast and evaluated on val2017 & test-dev2017 dataset,Box AP<sup>test</sup> is evaluation results of `mAP(IoU=0.5:0.95)`.
+- PP-YOLO is trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets; Box AP<sup>test</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`.
 - PP-YOLO used 8 GPUs for training with a mini-batch size of 24 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](../../docs/FAQ.md).
 - PP-YOLO inference speed is tested on a single Tesla V100 with batch size 1, CUDA 10.2, CUDNN 7.5.1, and TensorRT 5.1.2.2 in TensorRT mode.
 - PP-YOLO FP32 inference speed testing uses the inference model exported by `tools/export_model.py` and is benchmarked by running `deploy/python/infer.py` with `--run_benchmark`. All testing results exclude the time cost of data reading and post-processing (NMS), which is the same testing method as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet).
 - TensorRT FP16 inference speed testing additionally excludes the time cost of bounding-box decoding (`yolo_box`) compared with the FP32 testing above, which means that data reading, bounding-box decoding, and post-processing (NMS) are all excluded (the test method is again the same as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet)).
 - YOLOv4(AlexeyAB) performance and inference speed are copied from the single Tesla V100 testing results in the [YOLOv4 GitHub repo](https://github.com/AlexeyAB/darknet); the Tesla V100 TensorRT FP16 inference speed was tested with the tkDNN configuration and TensorRT 5.1.2.2 on a single Tesla V100, based on the [AlexeyAB/darknet repo](https://github.com/AlexeyAB/darknet).
 - The download and configuration entries for YOLOv4(AlexeyAB) are the YOLOv4 model reproduced in PaddleDetection, whose evaluation performance matches YOLOv4(AlexeyAB). Finetune training is currently supported in PaddleDetection; reproducing the result by training from backbone pretrained weights is in progress. See [PaddleDetection YOLOv4](../yolov4/README.md) for details.
+- PP-YOLO is trained with `batch_size=24` on each GPU, which needs 32G of GPU memory. A configuration YAML with `batch_size=12`, which can be trained on GPUs with 16G of memory, is provided as `ppyolo_2x_bs12.yml`. Training with `batch_size=12` reached `mAP(IoU=0.5:0.95) = 45.1%` on the COCO val2017 dataset; download the weights from the [ppyolo_2x_bs12 model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x_bs12.pdparams).
 ### PP-YOLO for mobile
......
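The notes in this hunk say the learning rate and iteration count should track the GPU count and per-GPU batch size. A minimal sketch of one common convention, the linear scaling rule, follows. The rule itself, the helper name `scale_schedule`, and the assumption that the `batch_size=12` configuration also runs on 8 GPUs are illustrative assumptions; the linked FAQ remains the authoritative guidance. The reference values (`base_lr: 0.005`, `max_iters: 500000`, `batch_size: 12`) come from the `ppyolo_2x_bs12.yml` file added in this commit.

```python
def scale_schedule(num_gpus, batch_size_per_gpu,
                   ref_gpus=8, ref_batch_size=12,
                   ref_lr=0.005, ref_iters=500000):
    """Scale lr and iteration count with the total batch size.

    Assumes the linear scaling rule: lr is proportional to the total
    batch size, and the iteration count is inversely proportional, so
    the number of images seen stays roughly constant.
    """
    ratio = (num_gpus * batch_size_per_gpu) / float(ref_gpus * ref_batch_size)
    return ref_lr * ratio, int(round(ref_iters / ratio))


# Example: halving the GPU count while keeping batch_size=12 per GPU.
lr, iters = scale_schedule(num_gpus=4, batch_size_per_gpu=12)
print(lr, iters)
```

Under the same assumption, the decay milestones and warmup steps in the schedule would be rescaled by the same ratio as the iteration count.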
@@ -61,6 +61,7 @@ PP-YOLO optimizes and improves the accuracy and speed of the YOLOv3 model in the following ways:
 - The YOLOv4(AlexeyAB) accuracy and V100 FP32 inference speed figures are taken from the single-V100 accuracy and speed results published in the [YOLOv4 GitHub repo](https://github.com/AlexeyAB/darknet); the V100 TensorRT FP16 inference speed was measured with the tkDNN configuration from the [AlexeyAB/darknet](https://github.com/AlexeyAB/darknet) repo on a single V100 with TensorRT 5.1.2.2.
 - PP-YOLO inference speed is tested on a single V100 with batch size 1, using CUDA 10.2 and CUDNN 7.5.1; TensorRT inference speed is tested with TensorRT 5.1.2.2.
 - The `model download` and `config file` entries in the YOLOv4(AlexeyAB) row are the YOLOv4 model reproduced in PaddleDetection. Evaluation accuracy is already aligned and finetuning is supported; training accuracy alignment is in progress. See [PaddleDetection YOLOv4 model](../yolov4/README.md).
+- PP-YOLO is trained with `batch_size=24` per GPU, which requires GPUs with 32G of memory. We also provide the config file `ppyolo_2x_bs12.yml` with `batch_size=12`, which can be trained on GPUs with 16G of memory; training with this config reaches `mAP(IoU=0.5:0.95) = 45.1%` on the COCO val2017 dataset. Weights can be downloaded from the [ppyolo_2x_bs12 model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x_bs12.pdparams).
 ### PP-YOLO mobile models
......
architecture: YOLOv3
use_gpu: true
max_iters: 500000
log_smooth_window: 100
log_iter: 100
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
weights: output/ppyolo/model_final
num_classes: 80
use_fine_grained_loss: true
use_ema: true
ema_decay: 0.9998
YOLOv3:
  backbone: ResNet
  yolo_head: YOLOv3Head
  use_fine_grained_loss: true
ResNet:
  norm_type: sync_bn
  freeze_at: 0
  freeze_norm: false
  norm_decay: 0.
  depth: 50
  feature_maps: [3, 4, 5]
  variant: d
  dcn_v2_stages: [5]
YOLOv3Head:
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  norm_decay: 0.
  coord_conv: true
  iou_aware: true
  iou_aware_factor: 0.4
  scale_x_y: 1.05
  spp: true
  yolo_loss: YOLOv3Loss
  nms: MatrixNMS
  drop_block: true
YOLOv3Loss:
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false
  use_fine_grained_loss: true
  iou_loss: IouLoss
  iou_aware_loss: IouAwareLoss
IouLoss:
  loss_weight: 2.5
  max_height: 608
  max_width: 608
IouAwareLoss:
  loss_weight: 1.0
  max_height: 608
  max_width: 608
MatrixNMS:
  background_label: -1
  keep_top_k: 100
  normalized: false
  score_threshold: 0.01
  post_threshold: 0.01
LearningRate:
  base_lr: 0.005
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 400000
    - 450000
  - !LinearWarmup
    start_factor: 0.
    steps: 4000
OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2
_READER_: 'ppyolo_reader.yml'
TrainReader:
  batch_size: 12
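The `LearningRate` section above combines a piecewise decay with a linear warmup. A rough sketch of the resulting learning rate per iteration follows, assuming the warmup interpolates linearly from `start_factor * base_lr` to `base_lr` over `steps` iterations and then hands over to the piecewise schedule. The function name `lr_at` is hypothetical, and the exact boundary behavior in PaddleDetection may differ.

```python
def lr_at(it, base_lr=0.005, milestones=(400000, 450000), gamma=0.1,
          warmup_steps=4000, start_factor=0.0):
    """Approximate learning rate at iteration `it` for the config above."""
    if it < warmup_steps:
        # Linear warmup: blend from start_factor * base_lr up to base_lr.
        alpha = it / float(warmup_steps)
        return base_lr * (start_factor * (1.0 - alpha) + alpha)
    # Piecewise decay: multiply by gamma once per milestone already reached.
    passed = sum(1 for m in milestones if it >= m)
    return base_lr * gamma ** passed


for it in (0, 2000, 4000, 400000, 450000):
    print(it, lr_at(it))
```

With these values the rate climbs to 0.005 over the first 4000 iterations, then drops by a factor of 10 at iteration 400000 and again at 450000.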
@@ -17,6 +17,7 @@ TrainReader:
       beta: 1.5
     - !ColorDistort {}
     - !RandomExpand
+      ratio: 2.0
       fill_value: [123.675, 116.28, 103.53]
     - !RandomCrop {}
     - !RandomFlipImage
......
@@ -2576,9 +2576,7 @@ class DebugVisibleImage(BaseOperator):
                     x1 = round(keypoint[2 * j]).astype(np.int32)
                     y1 = round(keypoint[2 * j + 1]).astype(np.int32)
                     draw.ellipse(
-                        (x1, y1, x1 + 5, y1i + 5),
-                        fill='green',
-                        outline='green')
+                        (x1, y1, x1 + 5, y1 + 5), fill='green', outline='green')
         save_path = os.path.join(self.output_dir, out_file_name)
         image.save(save_path, quality=95)
         return sample
......
@@ -23,6 +23,9 @@ try:
 except Exception:
     from collections import Sequence
+import logging
+logger = logging.getLogger(__name__)
 __all__ = ['YOLOv3Loss']
@@ -41,16 +44,18 @@ class YOLOv3Loss(object):
     __inject__ = ['iou_loss', 'iou_aware_loss']
     __shared__ = ['use_fine_grained_loss', 'train_batch_size']
-    def __init__(self,
-                 train_batch_size=8,
-                 ignore_thresh=0.7,
-                 label_smooth=True,
-                 use_fine_grained_loss=False,
-                 iou_loss=None,
-                 iou_aware_loss=None,
-                 downsample=[32, 16, 8],
-                 scale_x_y=1.,
-                 match_score=False):
+    def __init__(
+            self,
+            train_batch_size=8,
+            batch_size=-1,  # stub for backward compatibility
+            ignore_thresh=0.7,
+            label_smooth=True,
+            use_fine_grained_loss=False,
+            iou_loss=None,
+            iou_aware_loss=None,
+            downsample=[32, 16, 8],
+            scale_x_y=1.,
+            match_score=False):
         self._train_batch_size = train_batch_size
         self._ignore_thresh = ignore_thresh
         self._label_smooth = label_smooth
@@ -61,6 +66,11 @@ class YOLOv3Loss(object):
         self.scale_x_y = scale_x_y
         self.match_score = match_score
+        if batch_size != -1:
+            logger.warn(
+                "config YOLOv3Loss.batch_size is deprecated, "
+                "training batch size should be set by TrainReader.batch_size")
     def __call__(self, outputs, gt_box, gt_label, gt_score, targets, anchors,
                  anchor_masks, mask_anchors, num_classes, prefix_name):
         if self._use_fine_grained_loss:
......
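The change to `YOLOv3Loss` keeps accepting the old `batch_size` argument but warns when a config still sets it. A self-contained sketch of that sentinel-default pattern follows. `LossSketch` is a hypothetical stand-in, not the real class, and the sketch calls `logger.warning`, since `Logger.warn` is a deprecated alias in the standard library.

```python
import logging

logger = logging.getLogger("yolo_loss_sketch")


class LossSketch(object):
    """Hypothetical stand-in showing a deprecated constructor argument."""

    def __init__(self, train_batch_size=8, batch_size=-1):
        self._train_batch_size = train_batch_size
        # -1 is the sentinel for "not set"; any other value means an old
        # config still passes the deprecated key, so warn but keep running.
        if batch_size != -1:
            logger.warning(
                "config LossSketch.batch_size is deprecated, "
                "training batch size should be set by TrainReader.batch_size")


LossSketch(batch_size=16)  # emits the deprecation warning
LossSketch()               # silent
```

Accepting and ignoring the old key lets existing config files keep loading, instead of failing on an unexpected keyword argument.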
@@ -255,8 +255,9 @@ def main():
         train_stats.update(stats)
         logs = train_stats.log()
         if it % cfg.log_iter == 0 and (not FLAGS.dist or trainer_id == 0):
-            strs = 'iter: {}, lr: {:.6f}, {}, time: {:.3f}, eta: {}'.format(
-                it, np.mean(outs[-1]), logs, time_cost, eta)
+            ips = float(cfg['TrainReader']['batch_size']) / time_cost
+            strs = 'iter: {}, lr: {:.6f}, {}, batch_cost: {:.5f} s, eta: {}, ips: {:.5f} images/sec'.format(
+                it, np.mean(outs[-1]), logs, time_cost, eta, ips)
             logger.info(strs)
         # NOTE : profiler tools, used for benchmark
......
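The new log line reports throughput as the reader batch size divided by the time per batch. A small standalone sketch of that computation and format string follows. The helper name `format_log` and the sample values are made up for illustration.

```python
def format_log(it, lr, logs, time_cost, eta, batch_size):
    """Build a log line in the new format and return it with the ips value."""
    # Images per second: images in one batch divided by seconds per batch.
    ips = float(batch_size) / time_cost
    line = ('iter: {}, lr: {:.6f}, {}, batch_cost: {:.5f} s, eta: {}, '
            'ips: {:.5f} images/sec').format(it, lr, logs, time_cost, eta, ips)
    return ips, line


ips, line = format_log(it=100, lr=0.005, logs='loss: 12.3',
                       time_cost=0.5, eta='1:00:00', batch_size=12)
print(line)
```

If `TrainReader.batch_size` is the per-GPU batch size, as the README notes suggest, this figure would describe a single training process, not the whole multi-GPU job.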