From 1b16f4753f96d7196ecf81d019b55443095ce060 Mon Sep 17 00:00:00 2001
From: wangguanzhong
Date: Mon, 18 May 2020 15:28:42 +0800
Subject: [PATCH] Cherry pick transfer learning doc (#725)

* update doc in transfer learning

* update doc
---
 docs/advanced_tutorials/TRANSFER_LEARNING.md  | 34 +++++++++++++++++--
 .../TRANSFER_LEARNING_cn.md                   | 30 ++++++++++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/docs/advanced_tutorials/TRANSFER_LEARNING.md b/docs/advanced_tutorials/TRANSFER_LEARNING.md
index 975b85ccb..f88873b49 100644
--- a/docs/advanced_tutorials/TRANSFER_LEARNING.md
+++ b/docs/advanced_tutorials/TRANSFER_LEARNING.md
@@ -6,11 +6,39 @@ Transfer learning aims at learning new knowledge from existing knowledge. For ex
 
 In transfer learning, if a different dataset with a different number of classes is used, loading the parameters related to the number of classes causes a dimension mismatch. In addition, if a more complicated model is used, the open-source model structure must be modified and its parameters loaded selectively. Thus, PaddleDetection should designate parameter fields and skip loading the parameters that match those fields.
 
-## Transfer Learning in PaddleDetection
+### Use custom dataset
 
-In transfer learning, it's needed to load pretrained model selectively. Two ways are provided in PaddleDetection.
+Transfer learning requires a custom dataset. Annotations in COCO and VOC formats are currently supported. A script that converts labelme or cityscape annotations to COCO format is provided in ```ppdet/data/tools/x2coco.py```. For more details, please refer to [READER](READER.md). After preparing the data, update the data parameters in the configuration file.
 
-#### Load pretrain weights directly (**recommended**)
+
+1. For a COCO-format dataset, take [yolov3\_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml#L66) as an example and modify the COCODataSet in yolov3\_reader:
+
+```yml
+  dataset:
+    !COCODataSet
+    dataset_dir: custom_data/coco # directory of the custom dataset
+    image_dir: train2017 # custom training image directory, located in dataset_dir
+    anno_path: annotations/instances_train2017.json # custom annotation path, located in dataset_dir
+    with_background: false
+```
+
+2. For a VOC-format dataset, take [yolov3\_darknet\_voc.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet_voc.yml#L67) as an example and modify the VOCDataSet in the configuration:
+
+```yml
+  dataset:
+    !VOCDataSet
+    dataset_dir: custom_data/voc # directory of the custom dataset
+    anno_path: trainval.txt # custom annotation path, located in dataset_dir
+    use_default_label: true
+    with_background: false
+```
+
+
+### Load pretrained model
+
+In transfer learning, the pretrained model must be loaded selectively. Two methods are provided.
+
+#### Load pretrained weights directly (**recommended**)
 
 Parameters whose shapes differ between the model and pretrain\_weights are ignored automatically. For example:
 
diff --git a/docs/advanced_tutorials/TRANSFER_LEARNING_cn.md b/docs/advanced_tutorials/TRANSFER_LEARNING_cn.md
index 9664f5d35..5cd5d7fe7 100644
--- a/docs/advanced_tutorials/TRANSFER_LEARNING_cn.md
+++ b/docs/advanced_tutorials/TRANSFER_LEARNING_cn.md
@@ -1,3 +1,5 @@
+[English](TRANSFER_LEARNING.md) | 简体中文
+
 # Transfer Learning Tutorial
 
 Transfer learning uses existing knowledge to learn new knowledge. For example, a detection model can be initialized from an ImageNet classification pretrained model, or a detection model for the PascalVOC dataset can be initialized from a detection model trained on the COCO dataset.
@@ -5,6 +7,34 @@
 
 ### Select data
 
+Transfer learning requires your own dataset. The COCO and VOC annotation formats are currently supported, and ```ppdet/data/tools/x2coco.py``` provides a script that converts labelme and cityscape annotations to COCO format. For usage details, refer to [Custom data source](READER.md). After preparing the data, set the data paths in the configuration file by modifying the corresponding path parameters in the reader.
+
+1. For a COCO dataset, modify the parameters in COCODataSet. Taking [yolov3\_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml#L66) as an example, modify the configuration in yolov3\_reader:
+
+```yml
+  dataset:
+    !COCODataSet
+    dataset_dir: custom_data/coco # custom dataset directory
+    image_dir: train2017 # custom training set directory, located in dataset_dir
+    anno_path: annotations/instances_train2017.json # custom annotation path, located in dataset_dir
+    with_background: false
+```
+
+2. For a VOC dataset, modify the parameters in VOCDataSet, taking [yolov3\_darknet\_voc.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet_voc.yml#L67) as an example:
+
+```yml
+  dataset:
+    !VOCDataSet
+    dataset_dir: custom_data/voc # custom dataset directory
+    anno_path: trainval.txt # custom annotation path, located in dataset_dir
+    use_default_label: true
+    with_background: false
+
+```
+
+
+### Load pretrained model
+
 In transfer learning, a different dataset is used, so the number of classes differs from that of COCO/VOC. As a result, when an open-source model (such as a COCO pretrained model) is loaded, the weights related to the number of classes (for example, the fc layer of the classification module) have mismatched dimensions. In addition, if a more complex model is needed, the existing open-source model structure must be adjusted and the corresponding weights loaded selectively. Therefore, weights that cannot be matched must be skipped when the model is loaded.
 
-- 
GitLab
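
The rule stated under "Load pretrained weights directly", that parameters whose shapes differ between the model and the pretrained weights are ignored, can be illustrated with a minimal sketch. This is not PaddleDetection's implementation: `filter_by_shape` and the parameter names are hypothetical, and shapes are plain tuples rather than real tensors. The example assumes a YOLOv3-style head with 3 anchors per scale, so the output channel count is 3 * (5 + num_classes).

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def filter_by_shape(
    model_shapes: Dict[str, Shape],
    pretrained_shapes: Dict[str, Shape],
) -> Tuple[Dict[str, Shape], Dict[str, Shape]]:
    """Split pretrained parameters into loadable and ignored ones.

    A pretrained parameter is loadable only if the model has a parameter
    with the same name and exactly the same shape. Hypothetical helper,
    not part of PaddleDetection.
    """
    loadable: Dict[str, Shape] = {}
    ignored: Dict[str, Shape] = {}
    for name, shape in pretrained_shapes.items():
        if model_shapes.get(name) == shape:
            loadable[name] = shape
        else:
            # Missing from the model, or the shape differs (e.g. class count changed).
            ignored[name] = shape
    return loadable, ignored


# Hypothetical parameter names. The pretrained head was trained on COCO
# (80 classes -> 3 * (5 + 80) = 255 channels); the new model targets a
# 20-class dataset (3 * (5 + 20) = 75 channels).
model = {
    "backbone.conv1.weight": (32, 3, 3, 3),
    "head.output.weight": (75, 1024, 1, 1),
}
pretrained = {
    "backbone.conv1.weight": (32, 3, 3, 3),
    "head.output.weight": (255, 1024, 1, 1),
}

loadable, ignored = filter_by_shape(model, pretrained)
print("loaded:", sorted(loadable))
print("ignored:", sorted(ignored))
```

In this sketch the backbone weights carry over unchanged, while the class-dependent head is left at its fresh initialization and learned from the custom dataset.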