diff --git a/dygraph/configs/ppyolo/README.md b/dygraph/configs/ppyolo/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..17b1e0da31ceb0fc7862fbcc27492cb84d3dda53
--- /dev/null
+++ b/dygraph/configs/ppyolo/README.md
@@ -0,0 +1,153 @@
+English | [简体中文](README_cn.md)
+
+# PP-YOLO
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Model Zoo](#Model_Zoo)
+- [Getting Started](#Getting_Started)
+- [Future Work](#Future_Work)
+- [Appendix](#Appendix)
+
+## Introduction
+
+[PP-YOLO](https://arxiv.org/abs/2007.12099) is an optimized model based on YOLOv3 in PaddleDetection, whose performance (mAP on COCO) and inference speed both surpass [YOLOv4](https://arxiv.org/abs/2004.10934). PaddlePaddle 2.0.0rc1 (available on pip now) or a [Daily Version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release) is required to run PP-YOLO.
+
+PP-YOLO reaches an mAP (IoU=0.5:0.95) of 45.9% on the COCO test-dev2017 dataset; FP32 inference speed on a single V100 is 72.9 FPS, and FP16 inference speed with TensorRT on a single V100 is 155.6 FPS.
+
+<div align="center">
+ +
+
+PP-YOLO improved the performance and speed of YOLOv3 with the following methods:
+
+- Better backbone: ResNet50vd-DCN
+- Larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU
+- [Drop Block](https://arxiv.org/abs/1810.12890)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
+- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
+- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
+- [CoordConv](https://arxiv.org/abs/1807.03247)
+- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
+- Better ImageNet pretrain weights
+
+## Model Zoo
+
+### PP-YOLO
+
+| Model | GPU number | images/GPU | backbone | input shape | Box AP<sup>val</sup> | Box AP<sup>test</sup> | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config |
+|:------------------------:|:----------:|:----------:|:----------:| :----------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :-----: |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+
+**Notes:**
+
+- PP-YOLO is trained on the COCO train2017 dataset and evaluated on the val2017 & test-dev2017 datasets. Box AP<sup>test</sup> is the `mAP(IoU=0.5:0.95)` evaluation result.
+- PP-YOLO is trained with 8 GPUs and a mini-batch size of 24 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](../../../docs/FAQ.md). 
+- PP-YOLO inference speed is tested on a single Tesla V100 with batch size 1, CUDA 10.2, CUDNN 7.5.1, and TensorRT 5.1.2.2 in TensorRT mode.
+- PP-YOLO FP32 inference speed testing uses the inference model exported by `tools/export_model.py`, benchmarked by running `deploy/python/infer.py` with `--run_benchmark`. All testing results exclude the time cost of data reading and post-processing (NMS), the same testing method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet).
+- TensorRT FP16 inference speed testing additionally excludes the time cost of bounding-box decoding (`yolo_box`) compared with the FP32 testing above, which means that data reading, bounding-box decoding and post-processing (NMS) are all excluded (the same testing method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)).
+
+## Getting Started
+
+### 1. Training
+
+Train PP-YOLO on 8 GPUs with the following command (all commands below are assumed to be run under the PaddleDetection `dygraph` directory):
+
+```bash
+python -m paddle.distributed.launch --log_dir=./ppyolo_dygraph/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml &>ppyolo_dygraph.log 2>&1 &
+```
+
+### 2. 
Evaluation
+
+Evaluate PP-YOLO on the COCO val2017 dataset on a single GPU with the following commands:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+For evaluation on the COCO test-dev2017 dataset, `configs/ppyolo/ppyolo_test.yml` should be used. Please download the test-dev2017 dataset from [COCO dataset download](https://cocodataset.org/#download), decompress it to the paths configured by `EvalReader.dataset` in `configs/ppyolo/ppyolo_test.yml`, and run evaluation with the following commands:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+Evaluation results are saved in `bbox.json`; compress it into a `zip` package and upload it to the [COCO dataset evaluation](https://competitions.codalab.org/competitions/20794#participate) site for evaluation.
+
+**NOTE:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev2017 dataset; it cannot be used for training or for evaluation on the COCO val2017 dataset.
+
+### 3. Inference
+
+Run inference on images on a single GPU with the following commands; use `--infer_img` to run inference on a single image and `--infer_dir` to run inference on all images in a directory. 
+
+```bash
+# inference single image
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=../demo/000000014439_640x640.jpg
+
+# inference all images in the directory
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=../demo
+```
+
+### 4. Inference deployment and benchmark
+
+For inference deployment or benchmarking, export the model with `tools/export_model.py` and run inference with the Paddle inference library using the following commands:
+
+```bash
+# export model, saved in output/ppyolo by default
+python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# inference with Paddle Inference library
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True
+```
+
+Benchmark testing for PP-YOLO uses the model without data reading and post-processing (NMS). Export the model with `--exclude_nms` to prune the NMS part for benchmark testing with the following commands:
+
+```bash
+# export model, --exclude_nms to prune NMS part, model will be saved in output/ppyolo by default
+python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --exclude_nms
+
+# FP32 benchmark
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True
+
+# TensorRT FP16 benchmark 
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True --run_mode=trt_fp16
+```
+
+## Future Work
+
+1. More PP-YOLO-tiny models
+2. PP-YOLO models with more backbones
+
+## Appendix
+
+Optimization methods and ablation experiments of PP-YOLO compared with YOLOv3 are shown below.
+
+| NO. | Model | Box AP<sup>val</sup> | Box AP<sup>test</sup> | Params(M) | FLOPs(G) | V100 FP32 FPS |
+| :--: | :--------------------------- | :------------------: |:--------------------: | :-------: | :------: | :-----------: |
+| A | YOLOv3-DarkNet53 | 38.9 | - | 59.13 | 65.52 | 58.2 |
+| B | YOLOv3-ResNet50vd-DCN | 39.1 | - | 43.89 | 44.71 | 79.2 |
+| C | B + LB + EMA + DropBlock | 41.4 | - | 43.89 | 44.71 | 79.2 |
+| D | C + IoU Loss | 41.9 | - | 43.89 | 44.71 | 79.2 |
+| E | D + IoU Aware | 42.5 | - | 43.90 | 44.71 | 74.9 |
+| F | E + Grid Sensitive | 42.8 | - | 43.90 | 44.71 | 74.8 |
+| G | F + Matrix NMS | 43.5 | - | 43.90 | 44.71 | 74.8 |
+| H | G + CoordConv | 44.0 | - | 43.93 | 44.76 | 74.1 |
+| I | H + SPP | 44.3 | 45.2 | 44.93 | 45.12 | 72.9 |
+| J | I + Better ImageNet Pretrain | 44.8 | 45.2 | 44.93 | 45.12 | 72.9 |
+| K | J + 2x Scheduler | 45.3 | 45.9 | 44.93 | 45.12 | 72.9 |
+
+**Notes:**
+
+- Performance and inference speed are measured with input shape 608.
+- All models are trained on the COCO train2017 dataset and evaluated on the val2017 & test-dev2017 datasets. `Box AP` is the `mAP(IoU=0.5:0.95)` evaluation result.
+- Inference speed is tested on a single Tesla V100 with batch size 1, following the testing method and environment configuration of the benchmark above.
+- [YOLOv3-DarkNet53](../yolov3/yolov3_darknet53_270e_coco.yml) with mAP 38.9 is the optimized YOLOv3 model in PaddleDetection; see the [Model Zoo](../../../docs/MODEL_ZOO.md) for details. 
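Among the tricks in row C of the ablation table, Exponential Moving Average keeps a decayed running average of the model weights and uses the averaged weights when saving checkpoints for evaluation. A minimal sketch of the idea in plain Python over a dict of scalar parameters (the `SimpleEMA` class below is hypothetical, for illustration only; it is not the `ppdet.optimizer.ModelEMA` API):

```python
class SimpleEMA:
    """Keep shadow copies of parameters, updated as a decayed running average."""

    def __init__(self, decay=0.9998):
        self.decay = decay
        self.step = 0
        self.shadow = {}

    def update(self, params):
        # Use a smaller effective decay early in training so the shadow
        # weights can catch up with the model (a common warm-up heuristic).
        self.step += 1
        decay = min(self.decay, (1.0 + self.step) / (10.0 + self.step))
        for name, value in params.items():
            old = self.shadow.get(name, value)
            self.shadow[name] = decay * old + (1.0 - decay) * value

    def apply(self):
        # Return the averaged weights, e.g. for evaluation or checkpointing.
        return dict(self.shadow)


ema = SimpleEMA(decay=0.9998)
for w in [1.0, 0.5, 0.25]:  # stand-in for parameter values at successive steps
    ema.update({"w": w})
print(ema.apply()["w"])
```

The `Checkpointer` callback changed later in this diff follows the same pattern: it calls `self.ema.update(...)` on every training step and saves `self.ema.apply()` instead of the raw weights when writing `model_final`.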
diff --git a/dygraph/configs/ppyolo/README_cn.md b/dygraph/configs/ppyolo/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..c1bd09d13460c8ecc89efdb9ed1a08b550747054 --- /dev/null +++ b/dygraph/configs/ppyolo/README_cn.md @@ -0,0 +1,154 @@ +简体中文 | [English](README.md) + +# PP-YOLO 模型 + +## 内容 +- [简介](#简介) +- [模型库与基线](#模型库与基线) +- [使用说明](#使用说明) +- [未来工作](#未来工作) +- [附录](#附录) + +## 简介 + +[PP-YOLO](https://arxiv.org/abs/2007.12099)是PaddleDetection优化和改进的YOLOv3的模型,其精度(COCO数据集mAP)和推理速度均优于[YOLOv4](https://arxiv.org/abs/2004.10934)模型,要求使用PaddlePaddle 2.0.0rc1(可使用pip安装) 或适当的[develop版本](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release)。 + +PP-YOLO在[COCO](http://cocodataset.org) test-dev2017数据集上精度达到45.9%,在单卡V100上FP32推理速度为72.9 FPS, V100上开启TensorRT下FP16推理速度为155.6 FPS。 + +
+ +
+
+PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度:
+
+- 更优的骨干网络: ResNet50vd-DCN
+- 更大的训练batch size: 8 GPUs,每GPU batch_size=24,对应调整学习率和迭代轮数
+- [Drop Block](https://arxiv.org/abs/1810.12890)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
+- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
+- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
+- [CoordConv](https://arxiv.org/abs/1807.03247)
+- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
+- 更优的预训练模型
+
+## 模型库
+
+### PP-YOLO模型
+
+| 模型 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP<sup>val</sup> | Box AP<sup>test</sup> | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 |
+|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+
+**注意:**
+
+- PP-YOLO模型使用COCO数据集中train2017作为训练集,使用val2017和test-dev2017作为测试集,Box AP<sup>test</sup>为`mAP(IoU=0.5:0.95)`评估结果。
+- PP-YOLO模型训练过程中使用8 GPUs,每GPU batch size为24进行训练,如训练GPU数和batch size不使用上述配置,须参考[FAQ](../../../docs/FAQ.md)调整学习率和迭代次数。
+- PP-YOLO模型推理速度测试采用单卡V100,batch size=1进行测试,使用CUDA 10.2, CUDNN 7.5.1,TensorRT推理速度测试使用TensorRT 5.1.2.2。
+- PP-YOLO模型FP32的推理速度测试数据为使用`tools/export_model.py`脚本导出模型后,使用`deploy/python/infer.py`脚本中的`--run_benchmark`参数使用Paddle预测库进行推理速度benchmark测试结果, 
且测试的均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致)。
+- TensorRT FP16的速度测试相比于FP32去除了`yolo_box`(bbox解码)部分耗时,即不包含数据预处理,bbox解码和NMS(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致)。
+
+## 使用说明
+
+### 1. 训练
+
+使用8GPU通过如下命令一键式启动训练(以下命令均默认在PaddleDetection根目录运行), 通过`--eval`参数开启训练中交替评估。
+
+```bash
+python -m paddle.distributed.launch --log_dir=./ppyolo_dygraph/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml &>ppyolo_dygraph.log 2>&1 &
+```
+
+### 2. 评估
+
+使用单GPU通过如下命令一键式评估模型在COCO val2017数据集效果
+
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+我们提供了`configs/ppyolo/ppyolo_test.yml`用于评估COCO test-dev2017数据集的效果,评估COCO test-dev2017数据集的效果须先从[COCO数据集下载页](https://cocodataset.org/#download)下载test-dev2017数据集,解压到`configs/ppyolo/ppyolo_test.yml`中`EvalReader.dataset`中配置的路径,并使用如下命令进行评估
+
+```bash
+# 使用PaddleDetection发布的权重
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=output/ppyolo_r50vd_dcn_1x_coco/model_final
+```
+
+评估结果保存于`bbox.json`中,将其压缩为zip包后通过[COCO数据集评估页](https://competitions.codalab.org/competitions/20794#participate)提交评估。
+
+**注意:** `configs/ppyolo/ppyolo_test.yml`仅用于评估COCO test-dev数据集,不用于训练和评估COCO val2017数据集。
+
+### 3. 
推理
+
+使用单GPU通过如下命令一键式推理图像,通过`--infer_img`指定图像路径,或通过`--infer_dir`指定目录并推理目录下所有图像
+
+```bash
+# 推理单张图像
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=../demo/000000014439_640x640.jpg
+
+# 推理目录下所有图像
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=../demo
+```
+
+### 4. 推理部署与benchmark
+
+PP-YOLO模型部署及推理benchmark需要通过`tools/export_model.py`导出模型后使用Paddle预测库进行部署和推理,可通过如下命令一键式启动。
+
+```bash
+# 导出模型,默认存储于output/ppyolo目录
+python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams
+
+# 预测库推理
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True
+```
+
+PP-YOLO模型benchmark测试为不包含数据预处理和网络输出后处理(NMS)的网络结构部分数据,导出模型时须指定`--exclude_nms`来裁剪掉模型中后处理的NMS部分,通过如下命令进行模型导出和benchmark测试。
+
+```bash
+# 导出模型,通过--exclude_nms参数裁剪掉模型中的NMS部分,默认存储于output/ppyolo目录
+python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --exclude_nms
+
+# FP32 benchmark测试
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True
+
+# TensorRT FP16 benchmark测试
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True --run_mode=trt_fp16
+```
+
+## 未来工作
+
+1. 发布PP-YOLO-tiny模型
+2. 
发布更多骨干网络的PP-YOLO模型
+
+## 附录
+
+PP-YOLO模型相对于YOLOv3模型优化项消融实验数据如下表所示。
+
+| 序号 | 模型 | Box AP<sup>val</sup> | Box AP<sup>test</sup> | 参数量(M) | FLOPs(G) | V100 FP32 FPS |
+| :--: | :--------------------------- | :------------------: | :-------------------: | :-------: | :------: | :-----------: |
+| A | YOLOv3-DarkNet53 | 38.9 | - | 59.13 | 65.52 | 58.2 |
+| B | YOLOv3-ResNet50vd-DCN | 39.1 | - | 43.89 | 44.71 | 79.2 |
+| C | B + LB + EMA + DropBlock | 41.4 | - | 43.89 | 44.71 | 79.2 |
+| D | C + IoU Loss | 41.9 | - | 43.89 | 44.71 | 79.2 |
+| E | D + IoU Aware | 42.5 | - | 43.90 | 44.71 | 74.9 |
+| F | E + Grid Sensitive | 42.8 | - | 43.90 | 44.71 | 74.8 |
+| G | F + Matrix NMS | 43.5 | - | 43.90 | 44.71 | 74.8 |
+| H | G + CoordConv | 44.0 | - | 43.93 | 44.76 | 74.1 |
+| I | H + SPP | 44.3 | 45.2 | 44.93 | 45.12 | 72.9 |
+| J | I + Better ImageNet Pretrain | 44.8 | 45.2 | 44.93 | 45.12 | 72.9 |
+| K | J + 2x Scheduler | 45.3 | 45.9 | 44.93 | 45.12 | 72.9 |
+
+**注意:**
+
+- 精度与推理速度数据均为使用输入图像尺寸为608的测试结果
+- Box AP为在COCO train2017数据集训练,val2017和test-dev2017数据集上评估`mAP(IoU=0.5:0.95)`数据
+- 推理速度为单卡V100上,batch size=1, 使用上述benchmark测试方法的测试结果,测试环境配置为CUDA 10.2,CUDNN 7.5.1
+- [YOLOv3-DarkNet53](../yolov3/yolov3_darknet53_270e_coco.yml)精度38.9为PaddleDetection优化后的YOLOv3模型,可参见[模型库](../../../docs/MODEL_ZOO.md)
diff --git a/dygraph/configs/ppyolo/_base_/optimizer_1x.yml b/dygraph/configs/ppyolo/_base_/optimizer_1x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..fe51b296c72e4c663bf4c611d80a1173ff69f6a9
--- /dev/null
+++ b/dygraph/configs/ppyolo/_base_/optimizer_1x.yml
@@ -0,0 +1,22 @@
+epoch: 405
+
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones:
+    - 243
+    - 324
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
+
+OptimizerBuilder:
+  clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/dygraph/configs/ppyolo/_base_/optimizer_2x.yml b/dygraph/configs/ppyolo/_base_/optimizer_2x.yml new file mode 100644 index 0000000000000000000000000000000000000000..c601a18601c7a0d8a79049cb0d1b9a87f41900f4 --- /dev/null +++ b/dygraph/configs/ppyolo/_base_/optimizer_2x.yml @@ -0,0 +1,22 @@ +epoch: 811 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 649 + - 730 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/dygraph/configs/ppyolo/_base_/ppyolo_r50vd_dcn.yml b/dygraph/configs/ppyolo/_base_/ppyolo_r50vd_dcn.yml new file mode 100644 index 0000000000000000000000000000000000000000..fe64a308288d61242fc9b97bb675404de3b84ebb --- /dev/null +++ b/dygraph/configs/ppyolo/_base_/ppyolo_r50vd_dcn.yml @@ -0,0 +1,70 @@ +architecture: YOLOv3 +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar +weights: output/ppyolo_r50vd_dcn/model_final +load_static_weights: true +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: ResNet + neck: PPYOLOFPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOFPN: + feat_channels: [2048, 1280, 640] + coord_conv: true + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + normalized: false + background_label: -1 diff --git a/dygraph/configs/ppyolo/_base_/ppyolo_reader.yml b/dygraph/configs/ppyolo/_base_/ppyolo_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..3b1e8c09df61c2db15c093a4923a4338f522831a --- /dev/null +++ b/dygraph/configs/ppyolo/_base_/ppyolo_reader.yml @@ -0,0 +1,43 @@ +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 50 + sample_transforms: + - DecodeOp: {} + - MixupOp: {alpha: 1.5, beta: 1.5} + - RandomDistortOp: {} + - RandomExpandOp: {fill_value: [123.675, 116.28, 103.53]} + - RandomCropOp: {} + - RandomFlipOp: {} + batch_transforms: + - BatchRandomResizeOp: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBoxOp: {} + - PadBoxOp: {num_max_boxes: 50} + - BboxXYXY2XYWHOp: {} + - NormalizeImageOp: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - PermuteOp: {} + - Gt2YoloTargetOp: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], 
downsample_ratios: [32, 16, 8]} + batch_size: 24 + shuffle: true + drop_last: true + mixup_epoch: 25000 + + +EvalReader: + sample_transforms: + - DecodeOp: {} + - ResizeOp: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImageOp: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - PermuteOp: {} + batch_size: 8 + drop_empty: false + +TestReader: + inputs_def: + image_shape: [3, 608, 608] + sample_transforms: + - DecodeOp: {} + - ResizeOp: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImageOp: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - PermuteOp: {} + batch_size: 1 diff --git a/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml b/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4b1e2a797c21ba7ace257dbf46caf8085db8faec --- /dev/null +++ b/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 16 diff --git a/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_minicoco.yml b/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_minicoco.yml new file mode 100644 index 0000000000000000000000000000000000000000..18945a9bd6c34dd6e0277ee7c808a728006b92e4 --- /dev/null +++ b/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_minicoco.yml @@ -0,0 +1,35 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/ppyolo_r50vd_dcn.yml', + './_base_/optimizer_1x.yml', + './_base_/ppyolo_reader.yml', +] + +snapshot_epoch: 8 +use_ema: false + +TrainReader: + batch_size: 12 + +TrainDataset: + !COCODataSet + image_dir: train2017 + # refer to https://github.com/giddyyupp/coco-minitrain + anno_path: annotations/instances_minitrain2017.json + dataset_dir: dataset/coco + data_fields: ['image', 'gt_bbox', 'gt_class', 
'is_crowd']
+
+epoch: 192
+
+LearningRate:
+  base_lr: 0.005
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones:
+    - 153
+    - 173
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
diff --git a/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml b/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..87646baf728746f90c8706381a45d18896808680
--- /dev/null
+++ b/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml
@@ -0,0 +1,9 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  './_base_/ppyolo_r50vd_dcn.yml',
+  './_base_/optimizer_2x.yml',
+  './_base_/ppyolo_reader.yml',
+]
+
+snapshot_epoch: 16
diff --git a/dygraph/configs/ppyolo/ppyolo_test.yml b/dygraph/configs/ppyolo/ppyolo_test.yml
new file mode 100644
index 0000000000000000000000000000000000000000..928f1c96ee694c89f22d7b1180e21a6dac662c6b
--- /dev/null
+++ b/dygraph/configs/ppyolo/ppyolo_test.yml
@@ -0,0 +1,15 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  './_base_/ppyolo_r50vd_dcn.yml',
+  './_base_/optimizer_1x.yml',
+  './_base_/ppyolo_reader.yml',
+]
+
+snapshot_epoch: 16
+
+EvalDataset:
+  !COCODataSet
+    image_dir: test2017
+    anno_path: annotations/image_info_test-dev2017.json
+    dataset_dir: dataset/coco
diff --git a/dygraph/ppdet/engine/callbacks.py b/dygraph/ppdet/engine/callbacks.py
index 59c2a5d7ec1320badd38815dd890ea737865ed83..16d85b4b5fefcbd7aa904d0f3fb9424c1d2df67a 100644
--- a/dygraph/ppdet/engine/callbacks.py
+++ b/dygraph/ppdet/engine/callbacks.py
@@ -23,6 +23,7 @@ import paddle
 from paddle.distributed import ParallelEnv
 
 from ppdet.utils.checkpoint import save_model
+from ppdet.optimizer import ModelEMA
 from ppdet.utils.logger import setup_logger
 logger = setup_logger(__name__)
 
@@ -135,6 +136,15 @@ class LogPrinter(Callback):
 class Checkpointer(Callback):
     def __init__(self, model):
         super(Checkpointer, self).__init__(model)
+        cfg = self.model.cfg
+        self.use_ema = ('use_ema' in cfg and 
cfg['use_ema']) + if self.use_ema: + self.ema = ModelEMA( + cfg['ema_decay'], self.model.model, use_thres_step=True) + + def on_step_end(self, status): + if self.use_ema: + self.ema.update(self.model.model) def on_epoch_end(self, status): assert self.model.mode == 'train', \ @@ -147,5 +157,10 @@ class Checkpointer(Callback): self.model.cfg.filename) save_name = str( epoch_id) if epoch_id != end_epoch - 1 else "model_final" - save_model(self.model.model, self.model.optimizer, save_dir, - save_name, epoch_id + 1) + if self.use_ema: + state_dict = self.ema.apply() + save_model(state_dict, self.model.optimizer, save_dir, + save_name, epoch_id + 1) + else: + save_model(self.model.model, self.model.optimizer, save_dir, + save_name, epoch_id + 1) diff --git a/dygraph/ppdet/modeling/architectures/yolo.py b/dygraph/ppdet/modeling/architectures/yolo.py index cbf09d47acb645d728521a96cd5490b42532590a..19ec048a23020712584c91e8bd395430920e2aee 100644 --- a/dygraph/ppdet/modeling/architectures/yolo.py +++ b/dygraph/ppdet/modeling/architectures/yolo.py @@ -44,8 +44,9 @@ class YOLOv3(BaseArch): return loss def get_pred(self): + yolo_head_outs = self.yolo_head.get_outputs(self.yolo_head_outs) bbox, bbox_num = self.post_process( - self.yolo_head_outs, self.yolo_head.mask_anchors, + yolo_head_outs, self.yolo_head.mask_anchors, self.inputs['im_shape'], self.inputs['scale_factor']) outs = { "bbox": bbox, diff --git a/dygraph/ppdet/modeling/backbones/resnet.py b/dygraph/ppdet/modeling/backbones/resnet.py index 126669857ecae7bb41d5c7401807598741aace0d..ad466cde3745256fcbe56defe688167de32af665 100755 --- a/dygraph/ppdet/modeling/backbones/resnet.py +++ b/dygraph/ppdet/modeling/backbones/resnet.py @@ -74,7 +74,7 @@ class ConvNormLayer(nn.Layer): padding=(filter_size - 1) // 2, groups=groups, weight_attr=ParamAttr( - learning_rate=lr, name=name + '_weights'), + learning_rate=lr, name=name + "_weights"), bias_attr=False, name=name) diff --git a/dygraph/ppdet/modeling/heads/yolo_head.py 
b/dygraph/ppdet/modeling/heads/yolo_head.py index d14dc5a3fd63ddca9f156cd8b11de2d1b59eb778..c88f26759807c43f74132dd534a5796d9d3c2772 100644 --- a/dygraph/ppdet/modeling/heads/yolo_head.py +++ b/dygraph/ppdet/modeling/heads/yolo_head.py @@ -7,6 +7,13 @@ from ppdet.core.workspace import register from ..backbones.darknet import ConvBNLayer +def _de_sigmoid(x, eps=1e-7): + x = paddle.clip(x, eps, 1. / eps) + x = paddle.clip(1. / x - 1., eps, 1. / eps) + x = -paddle.log(x) + return x + + @register class YOLOv3Head(nn.Layer): __shared__ = ['num_classes'] @@ -17,17 +24,25 @@ class YOLOv3Head(nn.Layer): [59, 119], [116, 90], [156, 198], [373, 326]], anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]], num_classes=80, - loss='YOLOv3Loss'): + loss='YOLOv3Loss', + iou_aware=False, + iou_aware_factor=0.4): super(YOLOv3Head, self).__init__() self.num_classes = num_classes self.loss = loss + self.iou_aware = iou_aware + self.iou_aware_factor = iou_aware_factor + self.parse_anchor(anchors, anchor_masks) self.num_outputs = len(self.anchors) self.yolo_outputs = [] for i in range(len(self.anchors)): - num_filters = self.num_outputs * (self.num_classes + 5) + if self.iou_aware: + num_filters = self.num_outputs * (self.num_classes + 6) + else: + num_filters = self.num_outputs * (self.num_classes + 5) name = 'yolo_output.{}'.format(i) yolo_output = self.add_sublayer( name, @@ -62,3 +77,28 @@ class YOLOv3Head(nn.Layer): def get_loss(self, inputs, targets): return self.loss(inputs, targets, self.anchors) + + def get_outputs(self, outputs): + if self.iou_aware: + y = [] + for i, out in enumerate(outputs): + na = len(self.anchors[i]) + ioup, x = out[:, 0:na, :, :], out[:, na:, :, :] + b, c, h, w = x.shape + no = c // na + x = x.reshape((b, na, no, h, w)) + ioup = ioup.reshape((b, na, 1, h, w)) + obj = x[:, :, 4:5, :, :] + ioup = F.sigmoid(ioup) + obj = F.sigmoid(obj) + obj_t = (obj**(1 - self.iou_aware_factor)) * ( + ioup**self.iou_aware_factor) + obj_t = _de_sigmoid(obj_t) + loc_t = x[:, :, 
:4, :, :] + cls_t = x[:, :, 5:, :, :] + y_t = paddle.concat([loc_t, obj_t, cls_t], axis=2) + y_t = y_t.reshape((b, -1, h, w)) + y.append(y_t) + return y + else: + return outputs diff --git a/dygraph/ppdet/modeling/layers.py b/dygraph/ppdet/modeling/layers.py index 4d44fad7667069d91e02033b86959bca5cdeca37..e128a58496b1ac413ca4771dac942c27be5a0326 100644 --- a/dygraph/ppdet/modeling/layers.py +++ b/dygraph/ppdet/modeling/layers.py @@ -612,7 +612,6 @@ class MultiClassNMS(object): @register @serializable class MatrixNMS(object): - __op__ = ops.matrix_nms __append_doc__ = True def __init__(self, @@ -634,6 +633,19 @@ class MatrixNMS(object): self.gaussian_sigma = gaussian_sigma self.background_label = background_label + def __call__(self, bbox, score): + return ops.matrix_nms( + bboxes=bbox, + scores=score, + score_threshold=self.score_threshold, + post_threshold=self.post_threshold, + nms_top_k=self.nms_top_k, + keep_top_k=self.keep_top_k, + use_gaussian=self.use_gaussian, + gaussian_sigma=self.gaussian_sigma, + background_label=self.background_label, + normalized=self.normalized) + @register @serializable diff --git a/dygraph/ppdet/modeling/losses/iou_aware_loss.py b/dygraph/ppdet/modeling/losses/iou_aware_loss.py index 0e61883f9af496c6336ad682011f45dc974b0b36..2cc6f2a2c4077558c93ec55baa11f3d12a2e8476 100644 --- a/dygraph/ppdet/modeling/losses/iou_aware_loss.py +++ b/dygraph/ppdet/modeling/losses/iou_aware_loss.py @@ -16,6 +16,7 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function +import paddle import paddle.nn.functional as F from ppdet.core.workspace import register, serializable from .iou_loss import IouLoss @@ -33,27 +34,15 @@ class IouAwareLoss(IouLoss): max_width (int): max width of input to support random shape input """ - def __init__( - self, - loss_weight=1.0, - giou=False, - diou=False, - ciou=False, ): + def __init__(self, loss_weight=1.0, giou=False, diou=False, ciou=False): super(IouAwareLoss, 
self).__init__( loss_weight=loss_weight, giou=giou, diou=diou, ciou=ciou) - def __call__(self, ioup, pbox, gbox, anchor, downsample, scale=1.): - b = pbox.shape[0] - ioup = ioup.reshape((b, -1)) - pbox = decode_yolo(pbox, anchor, downsample) - gbox = decode_yolo(gbox, anchor, downsample) - pbox = xywh2xyxy(pbox).reshape((b, -1, 4)) - gbox = xywh2xyxy(gbox).reshape((b, -1, 4)) + def __call__(self, ioup, pbox, gbox): iou = bbox_iou( pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou) iou.stop_gradient = True - - loss_iou_aware = F.binary_cross_entropy_with_logits( - ioup, iou, reduction='none') + ioup = F.sigmoid(ioup) + loss_iou_aware = (-iou * paddle.log(ioup)).sum(-2, keepdim=True) loss_iou_aware = loss_iou_aware * self.loss_weight return loss_iou_aware diff --git a/dygraph/ppdet/modeling/losses/iou_loss.py b/dygraph/ppdet/modeling/losses/iou_loss.py index 97c59bf278c9d269d557eb434b87b0c65f632371..42d51ad86161eddf5f6c6a7c2c9f63419199f89b 100644 --- a/dygraph/ppdet/modeling/losses/iou_loss.py +++ b/dygraph/ppdet/modeling/losses/iou_loss.py @@ -50,12 +50,7 @@ class IouLoss(object): self.ciou = ciou self.loss_square = loss_square - def __call__(self, pbox, gbox, anchor, downsample, scale=1.): - b = pbox.shape[0] - pbox = decode_yolo(pbox, anchor, downsample) - gbox = decode_yolo(gbox, anchor, downsample) - pbox = xywh2xyxy(pbox).reshape((b, -1, 4)) - gbox = xywh2xyxy(gbox).reshape((b, -1, 4)) + def __call__(self, pbox, gbox): iou = bbox_iou( pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou) if self.loss_square: diff --git a/dygraph/ppdet/modeling/losses/yolo_loss.py b/dygraph/ppdet/modeling/losses/yolo_loss.py index 51bc490cb9a08220f6eade39444d52e31a6e755b..ad679a079f31cd4089dbac054027fb88535b6dc3 100644 --- a/dygraph/ppdet/modeling/losses/yolo_loss.py +++ b/dygraph/ppdet/modeling/losses/yolo_loss.py @@ -26,6 +26,12 @@ from ..utils import decode_yolo, xywh2xyxy, iou_similarity __all__ = ['YOLOv3Loss'] +def bbox_transform(pbox, anchor, downsample): + 
pbox = decode_yolo(pbox, anchor, downsample) + pbox = xywh2xyxy(pbox) + return pbox + + @register class YOLOv3Loss(nn.Layer): @@ -50,11 +56,16 @@ class YOLOv3Loss(nn.Layer): self.iou_aware_loss = iou_aware_loss def obj_loss(self, pbox, gbox, pobj, tobj, anchor, downsample): - b, h, w, na = pbox.shape[:4] + # pbox pbox = decode_yolo(pbox, anchor, downsample) - pbox = pbox.reshape((b, -1, 4)) pbox = xywh2xyxy(pbox) - gbox = xywh2xyxy(gbox) + pbox = paddle.concat(pbox, axis=-1) + b = pbox.shape[0] + pbox = pbox.reshape((b, -1, 4)) + # gbox + gxy = gbox[:, :, 0:2] - gbox[:, :, 2:4] * 0.5 + gwh = gbox[:, :, 0:2] + gbox[:, :, 2:4] * 0.5 + gbox = paddle.concat([gxy, gwh], axis=-1) iou = iou_similarity(pbox, gbox) iou.stop_gradient = True @@ -86,57 +97,69 @@ class YOLOv3Loss(nn.Layer): pcls, tcls, reduction='none') return loss_cls - def yolov3_loss(self, x, t, gt_box, anchor, downsample, scale=1., + def yolov3_loss(self, p, t, gt_box, anchor, downsample, scale=1., eps=1e-10): na = len(anchor) - b, c, h, w = x.shape - no = c // na - x = x.reshape((b, na, no, h, w)).transpose((0, 3, 4, 1, 2)) - - xy, wh, obj = x[:, :, :, :, 0:2], x[:, :, :, :, 2:4], x[:, :, :, :, 4:5] + b, c, h, w = p.shape if self.iou_aware_loss: - ioup, pcls = x[:, :, :, :, 5:6], x[:, :, :, :, 6:] - else: - pcls = x[:, :, :, :, 5:] - - t = t.transpose((0, 3, 4, 1, 2)) - txy, twh, tscale = t[:, :, :, :, 0:2], t[:, :, :, :, 2:4], t[:, :, :, :, - 4:5] + ioup, p = p[:, 0:na, :, :], p[:, na:, :, :] + ioup = ioup.unsqueeze(-1) + p = p.reshape((b, na, -1, h, w)).transpose((0, 1, 3, 4, 2)) + x, y = p[:, :, :, :, 0:1], p[:, :, :, :, 1:2] + w, h = p[:, :, :, :, 2:3], p[:, :, :, :, 3:4] + obj, pcls = p[:, :, :, :, 4:5], p[:, :, :, :, 5:] + + t = t.transpose((0, 1, 3, 4, 2)) + tx, ty = t[:, :, :, :, 0:1], t[:, :, :, :, 1:2] + tw, th = t[:, :, :, :, 2:3], t[:, :, :, :, 3:4] + tscale = t[:, :, :, :, 4:5] tobj, tcls = t[:, :, :, :, 5:6], t[:, :, :, :, 6:] tscale_obj = tscale * tobj loss = dict() + + x = scale * 
F.sigmoid(x) - 0.5 * (scale - 1.) + y = scale * F.sigmoid(y) - 0.5 * (scale - 1.) + if abs(scale - 1.) < eps: - loss_xy = tscale_obj * F.binary_cross_entropy_with_logits( - xy, txy, reduction='none') + loss_x = F.binary_cross_entropy(x, tx, reduction='none') + loss_y = F.binary_cross_entropy(y, ty, reduction='none') + loss_xy = tscale_obj * (loss_x + loss_y) else: - xy = scale * F.sigmoid(xy) - 0.5 * (scale - 1.) - loss_xy = tscale_obj * paddle.abs(xy - txy) + loss_x = paddle.abs(x - tx) + loss_y = paddle.abs(y - ty) + loss_xy = tscale_obj * (loss_x + loss_y) loss_xy = loss_xy.sum([1, 2, 3, 4]).mean() - loss_wh = tscale_obj * paddle.abs(wh - twh) + + loss_w = paddle.abs(w - tw) + loss_h = paddle.abs(h - th) + loss_wh = tscale_obj * (loss_w + loss_h) loss_wh = loss_wh.sum([1, 2, 3, 4]).mean() - loss['loss_loc'] = loss_xy + loss_wh + loss['loss_xy'] = loss_xy + loss['loss_wh'] = loss_wh - x[:, :, :, :, 0:2] = scale * F.sigmoid(x[:, :, :, :, 0:2]) - 0.5 * ( - scale - 1.) - box, tbox = x[:, :, :, :, 0:4], t[:, :, :, :, 0:4] if self.iou_loss is not None: - # box and tbox will not change though they are modified in self.iou_loss function, so no need to clone - loss_iou = self.iou_loss(box, tbox, anchor, downsample, scale) - loss_iou = loss_iou * tscale_obj.reshape((b, -1)) - loss_iou = loss_iou.sum(-1).mean() + # warn: do not modify x, y, w, h in place + box, tbox = [x, y, w, h], [tx, ty, tw, th] + pbox = bbox_transform(box, anchor, downsample) + gbox = bbox_transform(tbox, anchor, downsample) + loss_iou = self.iou_loss(pbox, gbox) + loss_iou = loss_iou * tscale_obj + loss_iou = loss_iou.sum([1, 2, 3, 4]).mean() loss['loss_iou'] = loss_iou if self.iou_aware_loss is not None: - # box and tbox will not change though they are modified in self.iou_aware_loss function, so no need to clone - loss_iou_aware = self.iou_aware_loss(ioup, box, tbox, anchor, - downsample, scale) - loss_iou_aware = loss_iou_aware * tobj.reshape((b, -1)) - loss_iou_aware = 
loss_iou_aware.sum(-1).mean() + box, tbox = [x, y, w, h], [tx, ty, tw, th] + pbox = bbox_transform(box, anchor, downsample) + gbox = bbox_transform(tbox, anchor, downsample) + loss_iou_aware = self.iou_aware_loss(ioup, pbox, gbox) + loss_iou_aware = loss_iou_aware * tobj + loss_iou_aware = loss_iou_aware.sum([1, 2, 3, 4]).mean() loss['loss_iou_aware'] = loss_iou_aware + box = [x, y, w, h] loss_obj = self.obj_loss(box, gt_box, obj, tobj, anchor, downsample) loss_obj = loss_obj.sum(-1).mean() loss['loss_obj'] = loss_obj @@ -152,7 +175,8 @@ class YOLOv3Loss(nn.Layer): yolo_losses = dict() for x, t, anchor, downsample in zip(inputs, gt_targets, anchors, self.downsample): - yolo_loss = self.yolov3_loss(x, t, gt_box, anchor, downsample) + yolo_loss = self.yolov3_loss(x, t, gt_box, anchor, downsample, + self.scale_x_y) for k, v in yolo_loss.items(): if k in yolo_losses: yolo_losses[k] += v @@ -164,4 +188,4 @@ class YOLOv3Loss(nn.Layer): loss += v yolo_losses['loss'] = loss - return yolo_losses + return yolo_losses \ No newline at end of file diff --git a/dygraph/ppdet/modeling/necks/yolo_fpn.py b/dygraph/ppdet/modeling/necks/yolo_fpn.py index c7bfe9f4854266bf2579d72dc77f25545882b1f9..0def906ba9c0b63610d0192a9d0a7261c952a193 100644 --- a/dygraph/ppdet/modeling/necks/yolo_fpn.py +++ b/dygraph/ppdet/modeling/necks/yolo_fpn.py @@ -18,6 +18,7 @@ import paddle.nn.functional as F from paddle import ParamAttr from ppdet.core.workspace import register, serializable from ..backbones.darknet import ConvBNLayer +import numpy as np class YoloDetBlock(nn.Layer): @@ -62,6 +63,101 @@ class YoloDetBlock(nn.Layer): return route, tip +class SPP(nn.Layer): + def __init__(self, ch_in, ch_out, k, pool_size, norm_type, name): + super(SPP, self).__init__() + self.pool = [] + for size in pool_size: + pool = self.add_sublayer( + '{}.pool1'.format(name), + nn.MaxPool2D( + kernel_size=size, + stride=1, + padding=size // 2, + ceil_mode=False)) + self.pool.append(pool) + self.conv = ConvBNLayer( + 
ch_in, ch_out, k, padding=k // 2, norm_type=norm_type, name=name) + + def forward(self, x): + outs = [x] + for pool in self.pool: + outs.append(pool(x)) + y = paddle.concat(outs, axis=1) + y = self.conv(y) + return y + + +class DropBlock(nn.Layer): + def __init__(self, block_size, keep_prob, name): + super(DropBlock, self).__init__() + self.block_size = block_size + self.keep_prob = keep_prob + self.name = name + + def forward(self, x): + if not self.training or self.keep_prob == 1: + return x + else: + gamma = (1. - self.keep_prob) / (self.block_size**2) + for s in x.shape[2:]: + gamma *= s / (s - self.block_size + 1) + + matrix = paddle.cast(paddle.rand(x.shape, x.dtype) < gamma, x.dtype) + mask_inv = F.max_pool2d( + matrix, self.block_size, stride=1, padding=self.block_size // 2) + mask = 1. - mask_inv + y = x * mask * (mask.numel() / mask.sum()) + return y + + +class CoordConv(nn.Layer): + def __init__(self, ch_in, ch_out, filter_size, padding, norm_type, name): + super(CoordConv, self).__init__() + self.conv = ConvBNLayer( + ch_in + 2, + ch_out, + filter_size=filter_size, + padding=padding, + norm_type=norm_type, + name=name) + + def forward(self, x): + b = x.shape[0] + h = x.shape[2] + w = x.shape[3] + + gx = paddle.arange(w, dtype='float32') / (w - 1.) * 2.0 - 1. + gx = gx.reshape([1, 1, 1, w]).expand([b, 1, h, w]) + gx.stop_gradient = True + + gy = paddle.arange(h, dtype='float32') / (h - 1.) * 2.0 - 1. 
+ gy = gy.reshape([1, 1, h, 1]).expand([b, 1, h, w]) + gy.stop_gradient = True + + y = paddle.concat([x, gx, gy], axis=1) + y = self.conv(y) + return y + + +class PPYOLODetBlock(nn.Layer): + def __init__(self, cfg, name): + super(PPYOLODetBlock, self).__init__() + self.conv_module = nn.Sequential() + for idx, (conv_name, layer, args, kwargs) in enumerate(cfg[:-1]): + kwargs.update(name='{}.{}'.format(name, conv_name)) + self.conv_module.add_sublayer(conv_name, layer(*args, **kwargs)) + + conv_name, layer, args, kwargs = cfg[-1] + kwargs.update(name='{}.{}'.format(name, conv_name)) + self.tip = layer(*args, **kwargs) + + def forward(self, inputs): + route = self.conv_module(inputs) + tip = self.tip(route) + return route, tip + + @register @serializable class YOLOv3FPN(nn.Layer): @@ -114,3 +210,101 @@ class YOLOv3FPN(nn.Layer): route = F.interpolate(route, scale_factor=2.) return yolo_feats + + +@register +@serializable +class PPYOLOFPN(nn.Layer): + __shared__ = ['norm_type'] + + def __init__(self, + feat_channels=[2048, 1280, 640], + norm_type='bn', + **kwargs): + super(PPYOLOFPN, self).__init__() + assert len(feat_channels) > 0, "feat_channels length should > 0" + self.feat_channels = feat_channels + self.num_blocks = len(feat_channels) + # parse kwargs + self.coord_conv = kwargs.get('coord_conv', False) + self.drop_block = kwargs.get('drop_block', False) + if self.drop_block: + self.block_size = kwargs.get('block_size', 3) + self.keep_prob = kwargs.get('keep_prob', 0.9) + + self.spp = kwargs.get('spp', False) + if self.coord_conv: + ConvLayer = CoordConv + else: + ConvLayer = ConvBNLayer + + if self.drop_block: + dropblock_cfg = [[ + 'dropblock', DropBlock, [self.block_size, self.keep_prob], + dict() + ]] + else: + dropblock_cfg = [] + + self.yolo_blocks = [] + self.routes = [] + for i, ch_in in enumerate(self.feat_channels): + channel = 64 * (2**self.num_blocks) // (2**i) + base_cfg = [ + # name of layer, Layer, args + ['conv0', ConvLayer, [ch_in, channel, 1]], + 
['conv1', ConvBNLayer, [channel, channel * 2, 3]], + ['conv2', ConvLayer, [channel * 2, channel, 1]], + ['conv3', ConvBNLayer, [channel, channel * 2, 3]], + ['route', ConvLayer, [channel * 2, channel, 1]], + ['tip', ConvLayer, [channel, channel * 2, 3]] + ] + for conf in base_cfg: + filter_size = conf[-1][-1] + conf.append(dict(padding=filter_size // 2, norm_type=norm_type)) + if i == 0: + if self.spp: + pool_size = [5, 9, 13] + spp_cfg = [[ + 'spp', SPP, + [channel * (len(pool_size) + 1), channel, 1], dict( + pool_size=pool_size, norm_type=norm_type) + ]] + else: + spp_cfg = [] + cfg = base_cfg[0:3] + spp_cfg + base_cfg[ + 3:4] + dropblock_cfg + base_cfg[4:6] + else: + cfg = base_cfg[0:2] + dropblock_cfg + base_cfg[2:6] + name = 'yolo_block.{}'.format(i) + yolo_block = self.add_sublayer(name, PPYOLODetBlock(cfg, name)) + self.yolo_blocks.append(yolo_block) + if i < self.num_blocks - 1: + name = 'yolo_transition.{}'.format(i) + route = self.add_sublayer( + name, + ConvBNLayer( + ch_in=channel, + ch_out=channel // 2, + filter_size=1, + stride=1, + padding=0, + norm_type=norm_type, + name=name)) + self.routes.append(route) + + def forward(self, blocks): + assert len(blocks) == self.num_blocks + blocks = blocks[::-1] + yolo_feats = [] + for i, block in enumerate(blocks): + if i > 0: + block = paddle.concat([route, block], axis=1) + route, tip = self.yolo_blocks[i](block) + yolo_feats.append(tip) + + if i < self.num_blocks - 1: + route = self.routes[i](route) + route = F.interpolate(route, scale_factor=2.) 
+ + return yolo_feats \ No newline at end of file diff --git a/dygraph/ppdet/modeling/ops.py b/dygraph/ppdet/modeling/ops.py index 70760af55d5a7785a1ac0564f3a83eb46ff5f373..e497375ae96b7b0622e97eed13b593f97fa81673 100644 --- a/dygraph/ppdet/modeling/ops.py +++ b/dygraph/ppdet/modeling/ops.py @@ -1209,13 +1209,11 @@ def matrix_nms(bboxes, use_gaussian, 'keep_top_k', keep_top_k, 'normalized', normalized) out, index, rois_num = core.ops.matrix_nms(bboxes, scores, *attrs) - if return_index: - if return_rois_num: - return out, index, rois_num - return out, index - if return_rois_num: - return out, rois_num - return out + if not return_index: + index = None + if not return_rois_num: + rois_num = None + return out, rois_num, index else: helper = LayerHelper('matrix_nms', **locals()) output = helper.create_variable_for_type_inference(dtype=bboxes.dtype) @@ -1242,13 +1240,11 @@ def matrix_nms(bboxes, outputs=outputs) output.stop_gradient = True - if return_index: - if return_rois_num: - return output, index, rois_num - return output, index - if return_rois_num: - return output, rois_num - return output + if not return_index: + index = None + if not return_rois_num: + rois_num = None + return output, rois_num, index def bipartite_match(dist_matrix, diff --git a/dygraph/ppdet/modeling/utils/bbox_util.py b/dygraph/ppdet/modeling/utils/bbox_util.py index 9672ed484840b60f68bd8fc1b9b55a72b88a13ef..440b162f85f8a89ffe590ab714b304e906234f0c 100644 --- a/dygraph/ppdet/modeling/utils/bbox_util.py +++ b/dygraph/ppdet/modeling/utils/bbox_util.py @@ -22,10 +22,12 @@ import math def xywh2xyxy(box): - out = paddle.zeros_like(box) - out[:, :, 0:2] = box[:, :, 0:2] - box[:, :, 2:4] / 2 - out[:, :, 2:4] = box[:, :, 0:2] + box[:, :, 2:4] / 2 - return out + x, y, w, h = box + x1 = x - w * 0.5 + y1 = y - h * 0.5 + x2 = x + w * 0.5 + y2 = y + h * 0.5 + return [x1, y1, x2, y2] def make_grid(h, w, dtype): @@ -37,27 +39,27 @@ def decode_yolo(box, anchor, downsample_ratio): """decode yolo box Args: - 
box (Tensor): pred with the shape [b, h, w, na, 4] + box (list): [x, y, w, h], all have the shape [b, na, h, w, 1] anchor (list): anchor with the shape [na, 2] downsample_ratio (int): downsample ratio, default 32 scale (float): scale, default 1. - + Return: - box (Tensor): decoded box, with the shape [b, h, w, na, 4] + box (list): decoded box, [x, y, w, h], all have the shape [b, na, h, w, 1] """ - h, w, na = box.shape[1:4] - grid = make_grid(h, w, box.dtype).reshape((1, h, w, 1, 2)) - box[:, :, :, :, 0:2] = box[:, :, :, :, :2] + grid - box[:, :, :, :, 0] = box[:, :, :, :, 0] / w - box[:, :, :, :, 1] = box[:, :, :, :, 1] / h + x, y, w, h = box + na, grid_h, grid_w = x.shape[1:4] + grid = make_grid(grid_h, grid_w, x.dtype).reshape((1, 1, grid_h, grid_w, 2)) + x1 = (x + grid[:, :, :, :, 0:1]) / grid_w + y1 = (y + grid[:, :, :, :, 1:2]) / grid_h anchor = paddle.to_tensor(anchor) - anchor = paddle.cast(anchor, box.dtype) - anchor = anchor.reshape((1, 1, 1, na, 2)) - box[:, :, :, :, 2:4] = paddle.exp(box[:, :, :, :, 2:4]) * anchor - box[:, :, :, :, 2] = box[:, :, :, :, 2] / (downsample_ratio * w) - box[:, :, :, :, 3] = box[:, :, :, :, 3] / (downsample_ratio * h) - return box + anchor = paddle.cast(anchor, x.dtype) + anchor = anchor.reshape((1, na, 1, 1, 2)) + w1 = paddle.exp(w) * anchor[:, :, :, :, 0:1] / (downsample_ratio * grid_w) + h1 = paddle.exp(h) * anchor[:, :, :, :, 1:2] / (downsample_ratio * grid_h) + + return [x1, y1, w1, h1] def iou_similarity(box1, box2, eps=1e-9): @@ -66,7 +68,7 @@ def iou_similarity(box1, box2, eps=1e-9): Args: box1 (Tensor): box with the shape [N, M1, 4] box2 (Tensor): box with the shape [N, M2, 4] - + Return: iou (Tensor): iou between box1 and box2 with the shape [N, M1, M2] """ @@ -87,48 +89,56 @@ def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9): """calculate the iou of box1 and box2 Args: - box1 (Tensor): box1 with the shape (N, M, 4) - box2 (Tensor): box1 with the shape (N, M, 4) + box1 (list): [x, y, w, h], all 
have the shape [b, na, h, w, 1] + box2 (list): [x, y, w, h], all have the shape [b, na, h, w, 1] giou (bool): whether use giou or not, default False diou (bool): whether use diou or not, default False ciou (bool): whether use ciou or not, default False eps (float): epsilon to avoid divide by zero Return: - iou (Tensor): iou of box1 and box1, with the shape (N, M) + iou (Tensor): iou of box1 and box2, with the shape [b, na, h, w, 1] """ - px1y1, px2y2 = box1[:, :, 0:2], box1[:, :, 2:4] - gx1y1, gx2y2 = box2[:, :, 0:2], box2[:, :, 2:4] - x1y1 = paddle.maximum(px1y1, gx1y1) - x2y2 = paddle.minimum(px2y2, gx2y2) + px1, py1, px2, py2 = box1 + gx1, gy1, gx2, gy2 = box2 + x1 = paddle.maximum(px1, gx1) + y1 = paddle.maximum(py1, gy1) + x2 = paddle.minimum(px2, gx2) + y2 = paddle.minimum(py2, gy2) + + overlap = (x2 - x1) * (y2 - y1) + overlap = overlap.clip(0) + + area1 = (px2 - px1) * (py2 - py1) + area1 = area1.clip(0) + + area2 = (gx2 - gx1) * (gy2 - gy1) + area2 = area2.clip(0) - overlap = (x2y2 - x1y1).clip(0).prod(-1) - area1 = (px2y2 - px1y1).clip(0).prod(-1) - area2 = (gx2y2 - gx1y1).clip(0).prod(-1) union = area1 + area2 - overlap + eps iou = overlap / union + if giou or ciou or diou: # convex w, h - cwh = paddle.maximum(px2y2, gx2y2) - paddle.minimum(px1y1, gx1y1) - if ciou or diou: + cw = paddle.maximum(px2, gx2) - paddle.minimum(px1, gx1) + ch = paddle.maximum(py2, gy2) - paddle.minimum(py1, gy1) + if giou: + c_area = cw * ch + eps + return iou - (c_area - union) / c_area + else: # convex diagonal squared - c2 = (cwh**2).sum(2) + eps + c2 = cw**2 + ch**2 + eps # center distance - rho2 = ((px1y1 + px2y2 - gx1y1 - gx2y2)**2).sum(2) / 4 + rho2 = ((px1 + px2 - gx1 - gx2)**2 + (py1 + py2 - gy1 - gy2)**2) / 4 if diou: return iou - rho2 / c2 - elif ciou: - wh1 = px2y2 - px1y1 - wh2 = gx2y2 - gx1y1 - w1, h1 = wh1[:, :, 0], wh1[:, :, 1] + eps - w2, h2 = wh2[:, :, 0], wh2[:, :, 1] + eps - v = (4 / math.pi**2) * paddle.pow( - paddle.atan(w1 / h1) - paddle.atan(w2 / h2), 2) 
+ else: + w1, h1 = px2 - px1, py2 - py1 + eps + w2, h2 = gx2 - gx1, gy2 - gy1 + eps + delta = paddle.atan(w1 / h1) - paddle.atan(w2 / h2) + v = (4 / math.pi**2) * paddle.pow(delta, 2) alpha = v / (1 + eps - iou + v) alpha.stop_gradient = True return iou - (rho2 / c2 + v * alpha) - else: - c_area = cwh.prod(2) + eps - return iou - (c_area - union) / c_area else: return iou diff --git a/dygraph/ppdet/optimizer.py b/dygraph/ppdet/optimizer.py index 3c1a17be36b3555c76cfb5840d1602c7f53ef62a..e2e6123b385a506d0d1dcc25b68d0d9205fc775e 100644 --- a/dygraph/ppdet/optimizer.py +++ b/dygraph/ppdet/optimizer.py @@ -17,6 +17,7 @@ from __future__ import division from __future__ import print_function import math +import copy import paddle import paddle.nn as nn @@ -202,7 +203,7 @@ class OptimizerBuilder(): def __call__(self, learning_rate, params=None): if self.clip_grad_by_norm is not None: - grad_clip = nn.GradientClipByGlobalNorm( + grad_clip = nn.ClipGradByGlobalNorm( clip_norm=self.clip_grad_by_norm) else: grad_clip = None @@ -223,3 +224,38 @@ class OptimizerBuilder(): weight_decay=regularization, grad_clip=grad_clip, **optim_args) + + +class ModelEMA(object): + def __init__(self, decay, model, use_thres_step=False): + self.step = 0 + self.decay = decay + self.state_dict = dict() + for k, v in model.state_dict().items(): + self.state_dict[k] = paddle.zeros_like(v) + self.use_thres_step = use_thres_step + + def update(self, model): + if self.use_thres_step: + decay = min(self.decay, (1 + self.step) / (10 + self.step)) + else: + decay = self.decay + self._decay = decay + model_dict = model.state_dict() + for k, v in self.state_dict.items(): + if '_mean' not in k and '_variance' not in k: + v = decay * v + (1 - decay) * model_dict[k] + v.stop_gradient = True + self.state_dict[k] = v + else: + self.state_dict[k] = model_dict[k] + self.step += 1 + + def apply(self): + state_dict = dict() + for k, v in self.state_dict.items(): + if '_mean' not in k and '_variance' not in k: + v = v 
/ (1 - self._decay**self.step) + v.stop_gradient = True + state_dict[k] = v + return state_dict diff --git a/dygraph/ppdet/utils/checkpoint.py b/dygraph/ppdet/utils/checkpoint.py index f96804f6bd58a655dde59c3b48e8c70213ff7e40..fca3d47d7166c29bba99b13d845dac3185ce911f 100644 --- a/dygraph/ppdet/utils/checkpoint.py +++ b/dygraph/ppdet/utils/checkpoint.py @@ -23,6 +23,7 @@ import time import re import numpy as np import paddle +import paddle.nn as nn from .download import get_weights_path from .logger import setup_logger @@ -169,7 +170,12 @@ def save_model(model, optimizer, save_dir, save_name, last_epoch): if not os.path.exists(save_dir): os.makedirs(save_dir) save_path = os.path.join(save_dir, save_name) - paddle.save(model.state_dict(), save_path + ".pdparams") + if isinstance(model, nn.Layer): + paddle.save(model.state_dict(), save_path + ".pdparams") + else: + assert isinstance(model, + dict), 'model is not an instance of nn.Layer or dict' + paddle.save(model, save_path + ".pdparams") state_dict = optimizer.state_dict() state_dict['last_epoch'] = last_epoch paddle.save(state_dict, save_path + ".pdopt")
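The `x = scale * F.sigmoid(x) - 0.5 * (scale - 1.)` decoding added to `yolov3_loss` is the Grid Sensitive trick from YOLOv4: a plain sigmoid can only approach the cell borders 0 and 1 asymptotically, so centers lying on grid lines are hard to regress. Stretching the sigmoid by `scale` and re-centering lets the prediction cross the borders. A minimal scalar sketch of that formula (plain Python, not part of the patch; `scale_x_y` is set by the config):

```python
import math

def grid_sensitive(logit, scale=1.05):
    # Grid Sensitive decoding of a center offset:
    # scale * sigmoid(logit) - 0.5 * (scale - 1)
    # scale == 1.0 reduces to the ordinary sigmoid; scale > 1 stretches
    # the output range to [-(scale-1)/2, 1 + (scale-1)/2] so predicted
    # centers can actually reach the cell borders 0 and 1.
    s = 1.0 / (1.0 + math.exp(-logit))
    return scale * s - 0.5 * (scale - 1.0)
```

Note the transform is symmetric around 0.5: a zero logit still maps to the cell center regardless of `scale`.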
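The rewritten `bbox_iou` computes IoU per coordinate channel and optionally adds the GIoU/DIoU/CIoU penalties. The same math on a single pair of corner-format boxes, as a framework-free sketch (plain IoU and the CIoU branch only; the real code operates on Paddle tensors of shape `[b, na, h, w, 1]`):

```python
import math

def bbox_iou_xyxy(b1, b2, ciou=False, eps=1e-9):
    # b1, b2: (x1, y1, x2, y2) corner boxes as floats.
    px1, py1, px2, py2 = b1
    gx1, gy1, gx2, gy2 = b2
    # intersection rectangle, clipped to zero when boxes are disjoint
    x1, y1 = max(px1, gx1), max(py1, gy1)
    x2, y2 = min(px2, gx2), min(py2, gy2)
    overlap = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    area1 = max(px2 - px1, 0.0) * max(py2 - py1, 0.0)
    area2 = max(gx2 - gx1, 0.0) * max(gy2 - gy1, 0.0)
    union = area1 + area2 - overlap + eps
    iou = overlap / union
    if not ciou:
        return iou
    # DIoU terms: squared diagonal of the enclosing box and squared
    # center distance
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    # CIoU aspect-ratio consistency term
    w1, h1 = px2 - px1, py2 - py1 + eps
    w2, h2 = gx2 - gx1, gy2 - gy1 + eps
    v = (4 / math.pi ** 2) * (math.atan(w1 / h1) - math.atan(w2 / h2)) ** 2
    alpha = v / (1 + eps - iou + v)
    return iou - (rho2 / c2 + v * alpha)
```

For disjoint boxes the CIoU value is negative, so `1 - ciou` still yields a useful gradient where plain IoU loss saturates at 1.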
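The `DropBlock` layer added to `yolo_fpn.py` (Ghiasi et al., 2018) drops contiguous squares of activations rather than independent units: seed positions are sampled with probability `gamma`, dilated to `block_size x block_size` squares by a stride-1 max-pool, and the survivors are rescaled to preserve the expected activation sum. A NumPy sketch of the mask construction for one 2-D feature map (assumes an odd `block_size`, like the layer's `same`-padded pooling):

```python
import numpy as np

def drop_block_mask(h, w, block_size=3, keep_prob=0.9, rng=None):
    # Returns (mask, scale): mask is 1 where activations survive,
    # scale is the factor applied to surviving activations.
    rng = rng or np.random.default_rng(0)
    # seed probability, corrected so the expected dropped fraction
    # matches 1 - keep_prob despite border effects
    gamma = (1.0 - keep_prob) / block_size ** 2
    for s in (h, w):
        gamma *= s / (s - block_size + 1)
    seeds = (rng.random((h, w)) < gamma).astype(np.float32)
    # dilate each seed to a block_size x block_size square
    # (a max-pool with stride 1 and 'same' padding)
    pad = block_size // 2
    padded = np.pad(seeds, pad)
    dilated = np.zeros_like(seeds)
    for i in range(h):
        for j in range(w):
            dilated[i, j] = padded[i:i + block_size, j:j + block_size].max()
    mask = 1.0 - dilated
    scale = mask.size / max(mask.sum(), 1.0)
    return mask, scale
```

With `keep_prob=1` the mask degenerates to all-ones, matching the layer's early-exit path at inference time.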
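The `ModelEMA` class added to `ppdet/optimizer.py` keeps a zero-initialized shadow copy of the weights, so early averages are biased toward zero; `apply` removes that bias with the `1 / (1 - decay**step)` correction, and `use_thres_step` instead warms the decay up from a small value. The same logic reduced to a single float (a sketch, not the patch's tensor/state-dict handling; `ScalarEMA` is a hypothetical name):

```python
class ScalarEMA:
    def __init__(self, decay, use_thres_step=False):
        self.step = 0
        self.decay = decay
        self.shadow = 0.0  # zero-initialized, like paddle.zeros_like
        self.use_thres_step = use_thres_step

    def update(self, value):
        if self.use_thres_step:
            # warm up: decay ramps from 0.1 toward self.decay
            decay = min(self.decay, (1 + self.step) / (10 + self.step))
        else:
            decay = self.decay
        self._decay = decay
        self.shadow = decay * self.shadow + (1 - decay) * value
        self.step += 1

    def apply(self):
        # bias correction for the zero initialization
        return self.shadow / (1 - self._decay ** self.step)
```

Feeding a constant value makes the correction visible: the raw shadow stays well below the input for many steps, while `apply()` recovers the constant exactly.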