PaddlePaddle / PaddleDetection
Opened Aug 18, 2020 by saxon_zh @saxon_zh (Guest)

YOLOv3 + ResNet50: negative loss when resuming training from a checkpoint

Created by: XiminLin

Both my checkpoint and this training run were trained in the AI Studio GPU environment with paddlepaddle-gpu==1.8.0.post97 and Python 3.

Apart from changing the transforms in reader.yml, everything else was left unchanged.

This is my model configuration:

YOLOv3:
  backbone: ResNet
  yolo_head: YOLOv3Head

ResNet:
  norm_type: bn
  freeze_norm: false
  norm_decay: 0.
  variant: d
  depth: 50
  dcn_v2_stages: [5]
  feature_maps: [3, 4, 5]

YOLOv3Head:
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  anchors: [[19, 29], [28, 20], [25, 40], [31, 47], [36, 37], [41, 26], [47, 66], [48, 33], [67, 53]]
  norm_decay: 0.
  yolo_loss: YOLOv3Loss
  nms:
    background_label: -1
    keep_top_k: 100
    nms_threshold: 0.45
    nms_top_k: 1000
    normalized: false
    score_threshold: 0.01

YOLOv3Loss:
  # batch_size here is only used for fine grained loss, not used
  # for training batch_size setting, training batch_size setting
  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
  # size here should be set as same value as TrainReader.batch_size
  batch_size: 8
  ignore_thresh: 0.7  # default is 0.7
  label_smooth: true

My checkpoint came from training the model without data augmentation, using this reader YAML config:

sample_transforms:
  - !DecodeImage
    to_rgb: True
    with_mixup: False
  - !RandomInterpImage
    target_size: 608
  - !NormalizeBox {}
  - !PadBox
    num_max_boxes: 50
  - !BboxXYXY2XYWH {}
  - !NormalizeImage
    mean: [0.8937, 0.9031, 0.8988]
    std: [0.19, 0.1995, 0.2022]
    is_scale: true
    is_channel_first: false
batch_transforms:
  - !Permute
    to_bgr: false
    channel_first: true
  # Gt2YoloTarget is only used when use_fine_grained_loss set as true,
  # this operator will be deleted automatically if use_fine_grained_loss
  # is set as false
  - !Gt2YoloTarget
    anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
    downsample_ratios: [32, 16, 8]
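The anchor_masks in Gt2YoloTarget match the YOLOv3Head ones, but its anchors list differs from the YOLOv3Head anchors in the model config. The hypothetical helpers below (values copied by hand from this issue, not loaded from PaddleDetection) make that comparison explicit. Per the comment in the config, Gt2YoloTarget is only used when use_fine_grained_loss is true; the issue does not show that setting, so whether the difference matters here is an open question rather than a conclusion.

```python
from typing import List, Sequence

# Values transcribed by hand from the configs in this issue.
HEAD_ANCHORS = [[19, 29], [28, 20], [25, 40], [31, 47], [36, 37],
                [41, 26], [47, 66], [48, 33], [67, 53]]
TARGET_ANCHORS = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                  [59, 119], [116, 90], [156, 198], [373, 326]]
HEAD_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
TARGET_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def anchor_mismatches(head: Sequence[Sequence[int]],
                      target: Sequence[Sequence[int]]) -> List[int]:
    """Indices where the two anchor lists hold different (w, h) pairs."""
    if len(head) != len(target):
        raise ValueError("anchor lists have different lengths")
    return [i for i, (a, b) in enumerate(zip(head, target)) if list(a) != list(b)]


def masks_cover(masks: Sequence[Sequence[int]], num_anchors: int) -> bool:
    """True if every anchor index is used exactly once across all masks."""
    return sorted(i for mask in masks for i in mask) == list(range(num_anchors))


if __name__ == "__main__":
    print("masks identical:", HEAD_MASKS == TARGET_MASKS)
    print("masks cover all anchors:", masks_cover(HEAD_MASKS, len(HEAD_ANCHORS)))
    print("differing anchor indices:", anchor_mismatches(HEAD_ANCHORS, TARGET_ANCHORS))
```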

I then resumed training with python train.py -r checkpoint_path, this time adding augmentation, as follows:

sample_transforms:
  - !DecodeImage
    to_rgb: True
    with_mixup: True
    # with_mixup: False
  - !MixupImage
    alpha: 1.5
    beta: 1.5
  - !ColorDistort {}
  - !RandomExpand
    ratio: 3
    fill_value: [231.438 , 236.2575, 235.416]
  - !RandomCrop {}
  - !RandomInterpImage
    target_size: 608
  - !RandomFlipImage
    is_normalized: false
  - !NormalizeBox {}
  - !PadBox
    num_max_boxes: 50
  - !BboxXYXY2XYWH {}
  - !NormalizeImage
    mean: [0.8937, 0.9031, 0.8988]
    std: [0.19, 0.1995, 0.2022]
    is_scale: true
    is_channel_first: false
batch_transforms:
  - !RandomShape
    sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
    random_inter: True
  - !Permute
    to_bgr: false
    channel_first: true
  # Gt2YoloTarget is only used when use_fine_grained_loss set as true,
  # this operator will be deleted automatically if use_fine_grained_loss
  # is set as false
  - !Gt2YoloTarget
    anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
    downsample_ratios: [32, 16, 8]
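To pin down exactly what changed between the two reader configs, this small sketch diffs the op names in each pipeline. The lists are transcribed by hand from the YAML above; the script does not load PaddleDetection or parse the real files. Besides the added ops, DecodeImage's with_mixup flag changed from False to True, which a name-only diff does not capture.

```python
from typing import Dict, List

# Op names transcribed by hand from the two reader configs in this issue.
BEFORE: Dict[str, List[str]] = {
    "sample_transforms": ["DecodeImage", "RandomInterpImage", "NormalizeBox",
                          "PadBox", "BboxXYXY2XYWH", "NormalizeImage"],
    "batch_transforms": ["Permute", "Gt2YoloTarget"],
}
AFTER: Dict[str, List[str]] = {
    "sample_transforms": ["DecodeImage", "MixupImage", "ColorDistort",
                          "RandomExpand", "RandomCrop", "RandomInterpImage",
                          "RandomFlipImage", "NormalizeBox", "PadBox",
                          "BboxXYXY2XYWH", "NormalizeImage"],
    "batch_transforms": ["RandomShape", "Permute", "Gt2YoloTarget"],
}


def added_ops(before: List[str], after: List[str]) -> List[str]:
    """Ops present in `after` but not in `before`, kept in pipeline order."""
    return [op for op in after if op not in before]


def removed_ops(before: List[str], after: List[str]) -> List[str]:
    """Ops present in `before` but dropped from `after`."""
    return [op for op in before if op not in after]


if __name__ == "__main__":
    for stage in ("sample_transforms", "batch_transforms"):
        print(stage,
              "added:", added_ops(BEFORE[stage], AFTER[stage]),
              "removed:", removed_ops(BEFORE[stage], AFTER[stage]))
```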

The loss went negative, and the training results got worse:

2020-08-18 10:02:53,083-INFO: iter: 61100, lr: 0.000165, 'loss': '49.983803', time: 1.080, eta: 11:40:19
2020-08-18 10:03:32,268-INFO: iter: 61200, lr: 0.000164, 'loss': '-3.282897', time: 0.390, eta: 4:12:26
2020-08-18 10:04:29,751-INFO: iter: 61300, lr: 0.000163, 'loss': '-483.048737', time: 0.575, eta: 6:10:44
2020-08-18 10:05:29,555-INFO: iter: 61400, lr: 0.000162, 'loss': '-1974.388916', time: 0.597, eta: 6:24:04
2020-08-18 10:06:31,471-INFO: iter: 61500, lr: 0.000162, 'loss': '-4106.348633', time: 0.614, eta: 6:33:59
2020-08-18 10:07:28,676-INFO: iter: 61600, lr: 0.000161, 'loss': '-2150.716797', time: 0.574, eta: 6:07:15
2020-08-18 10:08:28,551-INFO: iter: 61700, lr: 0.000160, 'loss': '-24875.148438', time: 0.601, eta: 6:23:36
2020-08-18 10:09:24,851-INFO: iter: 61800, lr: 0.000159, 'loss': '-175619.359375', time: 0.567, eta: 6:01:05
2020-08-18 10:10:23,562-INFO: iter: 61900, lr: 0.000159, 'loss': '-316771.875000', time: 0.587, eta: 6:12:42
2020-08-18 10:11:22,151-INFO: iter: 62000, lr: 0.000158, 'loss': '-657982.000000', time: 0.584, eta: 6:09:47
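A minimal sketch for extracting the iteration, learning rate and loss from log lines in the format shown above, for example to find where the sign first flips. The regex is written against these specific lines and is an assumption about the log format in general. Because it scans with finditer, it also works when lines are run together, and it skips an iteration that was printed twice.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches fields in lines such as:
# 2020-08-18 10:03:32,268-INFO: iter: 61200, lr: 0.000164, 'loss': '-3.282897', ...
LINE_RE = re.compile(
    r"iter: (?P<iter>\d+), lr: (?P<lr>[0-9.eE+-]+), 'loss': '(?P<loss>[^']+)'"
)

Record = Tuple[int, float, float]  # (iteration, learning rate, loss)


def parse_log(text: str) -> List[Record]:
    """Return one (iter, lr, loss) record per iteration, in log order."""
    seen = set()
    records: List[Record] = []
    for m in LINE_RE.finditer(text):
        it = int(m.group("iter"))
        if it in seen:  # the same line may be emitted by two log handlers
            continue
        seen.add(it)
        records.append((it, float(m.group("lr")), float(m.group("loss"))))
    return records


def first_negative(records: Iterable[Record]) -> Optional[int]:
    """Iteration of the first negative loss, or None if there is none."""
    for it, _lr, loss in records:
        if loss < 0:
            return it
    return None


if __name__ == "__main__":
    sample = (
        "2020-08-18 10:02:53,083-INFO: iter: 61100, lr: 0.000165, "
        "'loss': '49.983803', time: 1.080, eta: 11:40:19\n"
        "2020-08-18 10:03:32,268-INFO: iter: 61200, lr: 0.000164, "
        "'loss': '-3.282897', time: 0.390, eta: 4:12:26\n"
    )
    print(first_negative(parse_log(sample)))  # 61200
```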

Is this a loss overflow? What should I do about it?
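On the overflow question, a sketch assuming the loss is computed in float32 (an assumption; the issue does not state the dtype): IEEE 754 overflow produces an infinity (and inf - inf produces NaN), not a large finite value, and the largest magnitude in the log above (about 6.6e5) is far below the float32 limit of about 3.4e38. The second part is a general illustration, not a diagnosis of this issue: sigmoid binary cross-entropy, the kind of loss YOLOv3 uses for its class predictions, can never be negative when its target lies in [0, 1], but it falls without bound as the logit grows if a target ever lies outside that range. The target of 2.0 below is a made-up value.

```python
import math
import struct

# Largest finite IEEE 754 single-precision value (bit pattern 0x7F7FFFFF).
FLOAT32_MAX = struct.unpack("<f", b"\xff\xff\x7f\x7f")[0]


def to_float32(x: float) -> float:
    """Round x to float32; values beyond the float32 range become +/-inf."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses out-of-range values; IEEE 754 arithmetic would
        # produce an infinity with the same sign instead.
        return math.copysign(math.inf, x)


def overflowed(value: float) -> bool:
    """True if the value is not finite once stored as float32."""
    return not math.isfinite(to_float32(value))


def sigmoid_bce(logit: float, target: float) -> float:
    """Numerically stable sigmoid cross-entropy for one logit/target pair.

    Equals -t*log(sigmoid(x)) - (1-t)*log(1-sigmoid(x)), which is >= 0
    whenever 0 <= t <= 1.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    logged = [49.983803, -3.282897, -483.048737, -1974.388916, -4106.348633,
              -2150.716797, -24875.148438, -175619.359375, -316771.875,
              -657982.0]
    print(any(overflowed(v) for v in logged))  # False
    print(overflowed(1e39))                    # True
    for x in (1.0, 10.0, 100.0):
        # Target inside [0, 1] stays non-negative; target 2.0 keeps falling.
        print(x, sigmoid_bce(x, 1.0), sigmoid_bce(x, 2.0))
```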

Thanks

Reference: paddlepaddle/PaddleDetection#1229