# PaddleHub AutoDL Finetuner: Image Classification Task


Using PaddleHub AutoDL Finetuner requires two files in a specified format: hparam.yaml, a YAML file describing the hyperparameters to optimize, and train.py, the Python script to fine-tune.

This example fine-tunes an image classification task to show how to run hyperparameter optimization with PaddleHub AutoDL Finetuner.

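Below is hparam.yaml, the YAML file for the hyperparameters to optimize. It lists the name, type, and range of each hyperparameter to search. Currently only the float and int search types are supported.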
```
param_list:
- name : learning_rate
  init_value : 0.001
  type : float
  lower_than : 0.05
  greater_than : 0.00005
- name : batch_size
  init_value : 12
  type : int
  lower_than : 20
  greater_than : 10
```
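
A malformed hparam.yaml is easier to catch before a search starts. The following is a minimal sketch of such a check, not part of PaddleHub: `check_param_list` and `PARAM_TYPES` are hypothetical names, the config is written as the dict a YAML parser would produce, and treating `greater_than` and `lower_than` as inclusive bounds is an assumption.

```python
# Hypothetical helper, not a PaddleHub API: sanity-check a parsed hparam.yaml.
PARAM_TYPES = {"float": float, "int": int}
REQUIRED_KEYS = ("name", "init_value", "type", "lower_than", "greater_than")


def check_param_list(config):
    """Return a list of problems found in a parsed hparam.yaml dict."""
    problems = []
    for param in config.get("param_list", []):
        name = param.get("name", "<unnamed>")
        missing = [key for key in REQUIRED_KEYS if key not in param]
        if missing:
            problems.append("%s: missing keys %s" % (name, missing))
            continue
        if param["type"] not in PARAM_TYPES:
            problems.append("%s: type must be float or int" % name)
            continue
        low, high, init = (param["greater_than"], param["lower_than"],
                           param["init_value"])
        if param["type"] == "int" and not all(
                isinstance(v, int) for v in (low, high, init)):
            problems.append("%s: int parameters need integer values" % name)
        # Assumption: both bounds are treated as inclusive here.
        if not low <= init <= high:
            problems.append("%s: init_value is outside the range" % name)
    return problems


# Same values as the hparam.yaml above.
config = {
    "param_list": [
        {"name": "learning_rate", "init_value": 0.001, "type": "float",
         "lower_than": 0.05, "greater_than": 0.00005},
        {"name": "batch_size", "init_value": 12, "type": "int",
         "lower_than": 20, "greater_than": 10},
    ]
}

print(check_param_list(config))  # → []
```

An empty list means every entry has the five expected keys, a supported type, and an initial value inside its range.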

Below is `train.py` for the image classification task.

```python
# coding:utf-8
import argparse
import os
import ast
import shutil

import paddle.fluid as fluid
import paddlehub as hub
from paddlehub.common.logger import logger

parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--epochs",             type=int,               default=1,                         help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu",            type=ast.literal_eval,  default=True,                      help="Whether use GPU for fine-tuning.")
parser.add_argument("--checkpoint_dir",     type=str,               default=None,                      help="Path to save log data.")

# The names of the hyperparameters to be searched must match those in hparam.yaml
parser.add_argument("--batch_size",         type=int,               default=16,                        help="Total examples' number in batch for training.")
parser.add_argument("--learning_rate",      type=float,             default=1e-4,                      help="learning_rate.")

# saved_params_dir and model_path are needed by auto finetune
parser.add_argument("--saved_params_dir",   type=str,               default="",                        help="Directory for saving model")
parser.add_argument("--model_path",         type=str,               default="",                        help="load model path")


def is_path_valid(path):
    if path == "":
        return False
    path = os.path.abspath(path)
    dirname = os.path.dirname(path)
    if not os.path.exists(dirname):
        # makedirs also creates missing intermediate directories
        os.makedirs(dirname)
    return True

def finetune(args):
    # Load Paddlehub resnet50 pretrained model
    module = hub.Module(name="resnet_v2_50_imagenet")
    input_dict, output_dict, program = module.context(trainable=True)

    # Download dataset and use ImageClassificationReader to read dataset
    dataset = hub.dataset.Flowers()
    data_reader = hub.reader.ImageClassificationReader(
        image_width=module.get_expected_image_width(),
        image_height=module.get_expected_image_height(),
        images_mean=module.get_pretrained_images_mean(),
        images_std=module.get_pretrained_images_std(),
        dataset=dataset)

    feature_map = output_dict["feature_map"]

    img = input_dict["image"]
    feed_list = [img.name]

    # Select finetune strategy, setup config and finetune
    strategy = hub.DefaultFinetuneStrategy(
        learning_rate=args.learning_rate)

    config = hub.RunConfig(
        use_cuda=args.use_gpu,
        num_epoch=args.epochs,
        batch_size=args.batch_size,
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)

    # Construct transfer learning network
    task = hub.ImageClassifierTask(
        data_reader=data_reader,
        feed_list=feed_list,
        feature=feature_map,
        num_classes=dataset.num_labels,
        config=config)

    # Load model from the defined model path or not
    if args.model_path != "":
        with task.phase_guard(phase="train"):
            task.init_if_necessary()
            task.load_parameters(args.model_path)
            logger.info("PaddleHub has loaded model from %s" % args.model_path)


    task.finetune()
    run_states = task.eval()
    eval_avg_score, eval_avg_loss, eval_run_speed = task._calculate_metrics(run_states)

    # Move ckpt/best_model to the defined saved parameters directory
    best_model_dir = os.path.join(config.checkpoint_dir, "best_model")
    if is_path_valid(args.saved_params_dir) and os.path.exists(best_model_dir):
        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)

    # acc on dev will be used by auto finetune
    hub.report_final_result(eval_avg_score["acc"])


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
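
The comment in `train.py` says the searched hyperparameter names must match hparam.yaml. The sketch below illustrates why, using only `argparse`. It assumes the tuner passes each candidate value as a `--name=value` command-line argument; that is an assumption about the calling convention, not something confirmed here. The `searched` dict and the misspelled `--lr` flag are hypothetical illustration values.

```python
import argparse

# Names and initial values taken from hparam.yaml.
searched = {"learning_rate": 0.001, "batch_size": 12}

# The same two arguments that train.py declares for the searched hyperparameters.
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--learning_rate", type=float, default=1e-4)

# Assumption: each candidate value arrives as "--name=value".
argv = ["--%s=%s" % (name, value) for name, value in searched.items()]
args = parser.parse_args(argv)
print(args.learning_rate, args.batch_size)  # → 0.001 12

# A name that does not match any declared argument is not picked up:
# the script would silently keep its default learning rate.
known, unknown = parser.parse_known_args(["--lr=0.001"])
print(known.learning_rate, unknown)  # → 0.0001 ['--lr=0.001']
```

With plain `parse_args`, as used in `train.py`, the mismatched name would instead make the script exit with an "unrecognized arguments" error.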