Commit 820a38b6, authored by Guanghua Yu, committed by ceci3

add blazeface sa_nas demo && fix some config (#156)

* add blazeface nas demo

* fix format

* fix configs

* fix reduce_rate
Parent aaf77a05
@@ -217,7 +217,7 @@ EvalReader:
 ...
 dataset:
 dataset_dir: dataset/fddb
-annotation: FDDB-folds/fddb_annotFile.txt
+anno_path: FDDB-folds/fddb_annotFile.txt
 ...
``` ```
 Evaluate and generate results files:
......
@@ -234,7 +234,7 @@ EvalReader:
 ...
 dataset:
 dataset_dir: dataset/fddb
-annotation: FDDB-folds/fddb_annotFile.txt
+anno_path: FDDB-folds/fddb_annotFile.txt
 ...
``` ```
 Evaluate and generate results files:
......
@@ -81,6 +81,7 @@ TrainReader:
 std: [127.502231, 127.502231, 127.502231]
 batch_size: 8
 use_process: true
+worker_num: 8
 shuffle: true
 EvalReader:
......
@@ -83,6 +83,7 @@ TrainReader:
 std: [127.502231, 127.502231, 127.502231]
 batch_size: 8
 use_process: true
+worker_num: 8
 shuffle: true
 EvalReader:
......
@@ -43,6 +43,7 @@ OptimizerBuilder:
 TrainReader:
 batch_size: 8
 use_process: True
+worker_num: 8
 shuffle: true
 inputs_def:
 image_shape: [3, 640, 640]
......
@@ -43,6 +43,7 @@ OptimizerBuilder:
 TrainReader:
 batch_size: 8
 use_process: True
+worker_num: 8
 shuffle: true
 inputs_def:
 image_shape: [3, 640, 640]
......
@@ -94,7 +94,7 @@ TrainReader:
 shuffle: true
 worker_num: 8
 bufsize: 32
-use_process: 8
+use_process: true
 EvalReader:
 inputs_def:
......
@@ -97,7 +97,7 @@ TrainReader:
 shuffle: true
 worker_num: 8
 bufsize: 32
-use_process: 8
+use_process: true
 EvalReader:
 inputs_def:
......
@@ -98,7 +98,7 @@ TrainReader:
 shuffle: true
 worker_num: 8
 bufsize: 32
-use_process: 8
+use_process: true
 EvalReader:
 inputs_def:
......
@@ -14,7 +14,7 @@ cd "$DIR"
 # Download the data.
 echo "Downloading..."
 # external link to the Faces in the Wild data set and annotations file
-#wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
+wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
 wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
 wget http://vis-www.cs.umass.edu/fddb/evaluation.tgz
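......
>Please install Paddle 1.6 or later before running this example.
# Neural Architecture Search (NAS) Example for Detection Models
## Overview
We use the BlazeFace face detection model as the example for neural architecture search. The example relies on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) to run the search experiment. For technical details, see the [neural architecture search strategy](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/nas_demo.md) tutorial.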
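## Defining the search space
The BlazeFace search experiment uses SANAS and searches over the channel counts and the convolution kernel size of the network.
The search space is therefore defined as follows:
- Initial channel module `blaze_filter_num1`: the range of channel counts for the first BlazeFace module; a small range chosen by hand.
- Single blaze module `blaze_filter_num2`: the range of channel counts for the single blaze modules; a medium range chosen by hand.
- Transition blaze module `mid_filter_num`: the range of channel counts for the transition from the single blaze modules to the double blaze modules.
- Double blaze module `double_filter_num`: the range of channel counts for the double blaze modules; a large range chosen by hand.
- Kernel size `use_5x5kernel`: whether the convolution kernels in BlazeFace are 3x3 or 5x5.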
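Given these ranges, the search-space tokens have 9 positions, bounded by ([0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 9, 12, 12, 6, 6, 6, 6, 2]). Each upper bound is the length of the corresponding candidate list, so it is exclusive.
The 9 tokens mean:
- tokens[0]: initial channel count = blaze_filter_num1[tokens[0]]
- tokens[1]: single blaze module channel count = blaze_filter_num2[tokens[1]]
- tokens[2]-tokens[3]: double blaze module starting channel count = double_filter_num[tokens[2/3]]
- tokens[4]-tokens[7]: transition blaze module channel count = mid_filter_num[tokens[4/5/6/7]]
- tokens[8]: use 5x5 kernels = True if use_5x5kernel[tokens[8]] else False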
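The bounds listed above can be checked with a short sketch. The candidate lists are copied from `BlazeFaceNasSpace` in this commit; the helper `is_valid_tokens` is hypothetical and is not part of PaddleSlim.

```python
# Candidate lists copied from BlazeFaceNasSpace in this commit.
blaze_filter_num1 = [4, 8, 12, 16, 20, 24, 32]
blaze_filter_num2 = [8, 12, 16, 20, 24, 32, 40, 48, 64]
mid_filter_num = [8, 12, 16, 20, 24, 32]
double_filter_num = [8, 12, 16, 24, 32, 40, 48, 64, 72, 80, 88, 96]
use_5x5kernel = [0, 1]

# Exclusive upper bound for each token position, in token order.
range_table = [
    len(blaze_filter_num1), len(blaze_filter_num2),
    len(double_filter_num), len(double_filter_num),
    len(mid_filter_num), len(mid_filter_num),
    len(mid_filter_num), len(mid_filter_num),
    len(use_5x5kernel),
]


def is_valid_tokens(tokens):
    """Hypothetical helper: True if every token indexes inside its list."""
    return (len(tokens) == len(range_table) and
            all(0 <= t < hi for t, hi in zip(tokens, range_table)))


# Number of distinct token vectors the 9 positions can encode.
search_space_size = 1
for hi in range_table:
    search_space_size *= hi

print(range_table)
print(search_space_size)
```

With these lists the product of the bounds is 23,514,624 candidate token vectors, before any FLOPs constraint is applied.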
We manually define three single blaze modules and four double blaze modules, using the following rules:
```
blaze_filters = [[self.blaze_filter_num1[tokens[0]], self.blaze_filter_num1[tokens[0]]],
[self.blaze_filter_num1[tokens[0]], self.blaze_filter_num2[tokens[1]], 2],
[self.blaze_filter_num2[tokens[1]], self.blaze_filter_num2[tokens[1]]]]
double_blaze_filters = [
[self.blaze_filter_num2[tokens[1]], self.mid_filter_num[tokens[4]], self.double_filter_num[tokens[2]], 2],
[self.double_filter_num[tokens[2]], self.mid_filter_num[tokens[5]], self.double_filter_num[tokens[2]]],
[self.double_filter_num[tokens[2]], self.mid_filter_num[tokens[6]], self.double_filter_num[tokens[3]], 2],
[self.double_filter_num[tokens[3]], self.mid_filter_num[tokens[7]], self.double_filter_num[tokens[3]]]]
```
For the `blaze_filters` and `double_blaze_filters` fields, see their definition in [blazenet.py](../../ppdet/modeling/backbones/blazenet.py).
The initial tokens are [2, 1, 3, 8, 2, 1, 2, 1, 1].
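As a worked example, the sketch below decodes the initial tokens with the mapping described in this section. It uses plain lists instead of NumPy arrays, and the function name `decode` is hypothetical; the candidate lists are the ones defined in `BlazeFaceNasSpace`.

```python
blaze_filter_num1 = [4, 8, 12, 16, 20, 24, 32]
blaze_filter_num2 = [8, 12, 16, 20, 24, 32, 40, 48, 64]
mid_filter_num = [8, 12, 16, 20, 24, 32]
double_filter_num = [8, 12, 16, 24, 32, 40, 48, 64, 72, 80, 88, 96]
use_5x5kernel = [0, 1]


def decode(tokens):
    """Hypothetical helper mirroring the token-to-filter mapping above."""
    n1 = blaze_filter_num1[tokens[0]]
    n2 = blaze_filter_num2[tokens[1]]
    d1 = double_filter_num[tokens[2]]
    d2 = double_filter_num[tokens[3]]
    mid = [mid_filter_num[t] for t in tokens[4:8]]
    blaze_filters = [[n1, n1], [n1, n2, 2], [n2, n2]]
    double_blaze_filters = [
        [n2, mid[0], d1, 2],
        [d1, mid[1], d1],
        [d1, mid[2], d2, 2],
        [d2, mid[3], d2],
    ]
    return blaze_filters, double_blaze_filters, bool(use_5x5kernel[tokens[8]])


blaze, double_blaze, is_5x5 = decode([2, 1, 3, 8, 2, 1, 2, 1, 1])
print(blaze)         # [[12, 12], [12, 12, 2], [12, 12]]
print(double_blaze)  # [[12, 16, 24, 2], [24, 12, 24], [24, 16, 72, 2], [72, 12, 72]]
print(is_5x5)        # True
```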
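## Starting the search
First install PaddleSlim; see the [installation guide](https://paddlepaddle.github.io/PaddleSlim/#_2).
Then go to the `slim/nas` directory and edit the `blazeface.yml` config. For the meaning of the search-related fields in the config, see the [NAS API documentation](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/api/nas_api.md).
Then start the search experiment: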
```
cd slim/nas
python -u train_nas.py -c blazeface.yml
```
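For intuition about the `init_temperature` and `reduce_rate` fields in `blazeface.yml`, the sketch below shows a textbook simulated annealing acceptance rule with an exponentially decaying temperature. This is an assumption about the general technique, not a copy of PaddleSlim's controller; the names `temperature` and `accept` are hypothetical.

```python
import math
import random


def temperature(step, init_temperature=10.24, reduce_rate=0.85):
    """Assumed exponential cooling schedule: T_k = T_0 * r**k."""
    return init_temperature * reduce_rate ** step


def accept(new_reward, current_reward, temp, rng=random):
    """Accept improvements; accept worse rewards with probability exp(delta / T)."""
    if new_reward > current_reward:
        return True
    return rng.random() <= math.exp((new_reward - current_reward) / temp)


for step in (0, 10, 50):
    print(step, temperature(step))
```

Under this assumed schedule, a smaller `reduce_rate` cools faster, so the search stops accepting worse architectures earlier.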
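**Note:**
To speed up the search, the `CropImageWithDataAchorSampling` preprocessing step has been removed from `blazeface.yml`.
When training finishes, you get the best tokens and the corresponding `BlazeFace-NAS` network structure: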
```
------------->>> BlazeFace-NAS structure start: <<<----------------
BlazeNet:
blaze_filters: XXX
double_blaze_filters: XXX
use_5x5kernel: XXX
with_extra_blocks: XXX
lite_edition: XXX
-------------->>> BlazeFace-NAS structure end! <<<-----------------
```
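## Training, evaluation, and inference
- (1) Edit the config file:
Update the `BlazeNet` module in `blazeface.yml` according to the `BlazeFace-NAS` structure found by the search.
- (2) Train, evaluate, and run inference:
To launch the full training and evaluation experiment, see PaddleDetection's [training, evaluation, and inference workflow](../../docs/GETTING_STARTED_cn.md).
## Results
See the BlazeFace-NAS results in the [face detection model zoo](../../configs/face_detection/README.md#模型库与基线).
## FAQ
- Runtime error: `socket.error: [Errno 98] Address already in use`
Solution: the port is already in use; change `server_port` in blazeface.yml.
- Runtime error: `not enough space for reason[failed to malloc 601 pages...`
Solution: the reader's shared-memory queue is out of space; increase `memsize` in blazeface.yml.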
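Before changing `server_port`, it can help to check whether a port is actually free. The following is a minimal sketch using only the Python standard library; `port_is_free` is a hypothetical helper, not part of PaddleSlim or PaddleDetection, and 8999 is the port from the `blazeface.yml` in this commit.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Hypothetical helper: True if a TCP socket can bind to (host, port) now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Raised, for example, with EADDRINUSE when another process holds the port.
        return False
    finally:
        sock.close()


print(port_is_free(8999))
```

The check is only a snapshot: another process can still take the port between the check and the start of the search.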
architecture: BlazeFace
max_iters: 5000
use_gpu: true
log_smooth_window: 20
log_iter: 20
metric: WIDERFACE
save_dir: nas_checkpoint
# 1(label_class) + 1(background)
num_classes: 2
# nas config
reduce_rate: 0.85
init_temperature: 10.24
is_server: true
max_flops: 531558400
search_steps: 300
server_ip: ""
server_port: 8999
search_space: BlazeFaceNasSpace
LearningRate:
  base_lr: 0.001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [240000, 300000]
OptimizerBuilder:
  optimizer:
    momentum: 0.0
    type: RMSPropOptimizer
  regularizer:
    factor: 0.0005
    type: L2
TrainReader:
  inputs_def:
    image_shape: [3, 640, 640]
    fields: ['image', 'gt_bbox', 'gt_class']
  dataset:
    !WIDERFaceDataSet
    dataset_dir: dataset/wider_face
    anno_path: wider_face_split/wider_face_train_bbx_gt.txt
    image_dir: WIDER_train/images
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !NormalizeBox {}
  - !RandomDistort
    brightness_lower: 0.875
    brightness_upper: 1.125
    is_order: true
  - !ExpandImage
    max_ratio: 4
    prob: 0.5
  - !RandomInterpImage
    target_size: 640
  - !RandomFlipImage
    is_normalized: true
  - !Permute {}
  - !NormalizeImage
    is_scale: false
    mean: [104, 117, 123]
    std: [127.502231, 127.502231, 127.502231]
  batch_size: 8
  use_process: True
  worker_num: 8
  shuffle: true
  memsize: 6G
EvalReader:
  inputs_def:
    fields: ['image', 'im_id', 'im_shape', 'gt_bbox']
  dataset:
    !WIDERFaceDataSet
    dataset_dir: dataset/wider_face
    anno_path: wider_face_split/wider_face_val_bbx_gt.txt
    image_dir: WIDER_val/images
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !NormalizeBox {}
  - !Permute {}
  - !NormalizeImage
    is_scale: false
    mean: [104, 117, 123]
    std: [127.502231, 127.502231, 127.502231]
  batch_size: 1
from .blazefacespace_nas import BlazeFaceNasSpace
__all__ = ['BlazeFaceNasSpace']
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddleslim.nas.search_space.search_space_base import SearchSpaceBase
from paddleslim.nas.search_space.search_space_registry import SEARCHSPACE
from ppdet.modeling.backbones.blazenet import BlazeNet
from ppdet.modeling.architectures.blazeface import BlazeFace
@SEARCHSPACE.register
class BlazeFaceNasSpace(SearchSpaceBase):
    def __init__(self, input_size, output_size, block_num, block_mask):
        super(BlazeFaceNasSpace, self).__init__(input_size, output_size,
                                                block_num, block_mask)
        self.blaze_filter_num1 = np.array([4, 8, 12, 16, 20, 24, 32])
        self.blaze_filter_num2 = np.array([8, 12, 16, 20, 24, 32, 40, 48, 64])
        self.mid_filter_num = np.array([8, 12, 16, 20, 24, 32])
        self.double_filter_num = np.array(
            [8, 12, 16, 24, 32, 40, 48, 64, 72, 80, 88, 96])
        self.use_5x5kernel = np.array([0, 1])

    def init_tokens(self):
        return [2, 1, 3, 8, 2, 1, 2, 1, 1]

    def range_table(self):
        return [
            len(self.blaze_filter_num1), len(self.blaze_filter_num2),
            len(self.double_filter_num), len(self.double_filter_num),
            len(self.mid_filter_num), len(self.mid_filter_num),
            len(self.mid_filter_num), len(self.mid_filter_num),
            len(self.use_5x5kernel)
        ]

    def get_nas_cnf(self, tokens=None):
        if tokens is None:
            tokens = self.init_tokens()
        blaze_filters = [[
            self.blaze_filter_num1[tokens[0]], self.blaze_filter_num1[tokens[0]]
        ], [
            self.blaze_filter_num1[tokens[0]],
            self.blaze_filter_num2[tokens[1]], 2
        ], [
            self.blaze_filter_num2[tokens[1]], self.blaze_filter_num2[tokens[1]]
        ]]
        double_blaze_filters = [[
            self.blaze_filter_num2[tokens[1]], self.mid_filter_num[tokens[4]],
            self.double_filter_num[tokens[2]], 2
        ], [
            self.double_filter_num[tokens[2]], self.mid_filter_num[tokens[5]],
            self.double_filter_num[tokens[2]]
        ], [
            self.double_filter_num[tokens[2]], self.mid_filter_num[tokens[6]],
            self.double_filter_num[tokens[3]], 2
        ], [
            self.double_filter_num[tokens[3]], self.mid_filter_num[tokens[7]],
            self.double_filter_num[tokens[3]]
        ]]
        is_5x5kernel = True if self.use_5x5kernel[tokens[8]] else False
        return blaze_filters, double_blaze_filters, is_5x5kernel

    def token2arch(self, tokens=None):
        blaze_filters, double_blaze_filters, is_5x5kernel = self.get_nas_cnf(
            tokens)
        self.print_nas_structure(tokens)

        def net_arch(input, mode, cfg):
            self.output_decoder = cfg.BlazeFace['output_decoder']
            self.min_sizes = cfg.BlazeFace['min_sizes']
            self.use_density_prior_box = cfg.BlazeFace['use_density_prior_box']
            my_backbone = BlazeNet(
                blaze_filters=blaze_filters,
                double_blaze_filters=double_blaze_filters,
                use_5x5kernel=is_5x5kernel)
            my_blazeface = BlazeFace(
                my_backbone,
                output_decoder=self.output_decoder,
                min_sizes=self.min_sizes,
                use_density_prior_box=self.use_density_prior_box)
            return my_blazeface.build(input, mode=mode)

        return net_arch

    def print_nas_structure(self, tokens=None):
        blaze_filters, double_filters, is_5x5kernel = self.get_nas_cnf(tokens)
        print('---------->>> BlazeFace-NAS structure start: <<<------------')
        print('BlazeNet:')
        print(' blaze_filters: {}'.format(blaze_filters))
        print(' double_blaze_filters: {}'.format(double_filters))
        print(' use_5x5kernel: {}'.format(is_5x5kernel))
        print(' with_extra_blocks: true')
        print(' lite_edition: false')
        print('---------->>> BlazeFace-NAS structure end! <<<------------')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import numpy as np
import datetime
from collections import deque
def set_paddle_flags(**kwargs):
    for key, value in kwargs.items():
        if os.environ.get(key, None) is None:
            os.environ[key] = str(value)


# NOTE(paddle-dev): All of these flags should be set before
# `import paddle`. Otherwise, it would not take any effect.
set_paddle_flags(
    FLAGS_eager_delete_tensor_gb=0,  # enable GC to save memory
)
from paddle import fluid
import sys
sys.path.append("../../")
from ppdet.experimental import mixed_precision_context
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.data.reader import create_reader
from ppdet.utils import dist_utils
from ppdet.utils.eval_utils import parse_fetches, eval_run
from ppdet.utils.stats import TrainingStats
from ppdet.utils.cli import ArgsParser
from ppdet.utils.check import check_gpu, check_version
import ppdet.utils.checkpoint as checkpoint
from paddleslim.analysis import flops
from paddleslim.nas import SANAS
import search_space
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def get_bboxes_scores(result):
    bboxes = result['bbox'][0]
    gt_bbox = result['gt_bbox'][0]
    bbox_lengths = result['bbox'][1][0]
    gt_lengths = result['gt_bbox'][1][0]
    bbox_list = []
    gt_box_list = []
    for i in range(len(bbox_lengths)):
        num = bbox_lengths[i]
        for j in range(num):
            dt = bboxes[j]
            clsid, score, xmin, ymin, xmax, ymax = dt.tolist()
            im_shape = result['im_shape'][0][i].tolist()
            im_height, im_width = int(im_shape[0]), int(im_shape[1])
            xmin *= im_width
            ymin *= im_height
            xmax *= im_width
            ymax *= im_height
            bbox_list.append([xmin, ymin, xmax, ymax, score])
    faces_num_gt = 0
    for i in range(len(gt_lengths)):
        num = gt_lengths[i]
        for j in range(num):
            gt = gt_bbox[j]
            xmin, ymin, xmax, ymax = gt.tolist()
            im_shape = result['im_shape'][0][i].tolist()
            im_height, im_width = int(im_shape[0]), int(im_shape[1])
            xmin *= im_width
            ymin *= im_height
            xmax *= im_width
            ymax *= im_height
            gt_box_list.append([xmin, ymin, xmax, ymax])
            faces_num_gt += 1
    return gt_box_list, bbox_list, faces_num_gt
def calculate_ap_py(results):
    def cal_iou(rect1, rect2):
        lt_x = max(rect1[0], rect2[0])
        lt_y = max(rect1[1], rect2[1])
        rb_x = min(rect1[2], rect2[2])
        rb_y = min(rect1[3], rect2[3])
        if (rb_x > lt_x) and (rb_y > lt_y):
            intersection = (rb_x - lt_x) * (rb_y - lt_y)
        else:
            return 0
        area1 = (rect1[2] - rect1[0]) * (rect1[3] - rect1[1])
        area2 = (rect2[2] - rect2[0]) * (rect2[3] - rect2[1])
        intersection = min(intersection, area1, area2)
        union = area1 + area2 - intersection
        return float(intersection) / union

    def is_same_face(face_gt, face_pred):
        iou = cal_iou(face_gt, face_pred)
        return iou >= 0.5

    def eval_single_image(faces_gt, faces_pred):
        pred_is_true = [False] * len(faces_pred)
        gt_been_pred = [False] * len(faces_gt)
        for i in range(len(faces_pred)):
            isface = False
            for j in range(len(faces_gt)):
                if gt_been_pred[j] == 0:
                    isface = is_same_face(faces_gt[j], faces_pred[i])
                    if isface == 1:
                        gt_been_pred[j] = True
                        break
            pred_is_true[i] = isface
        return pred_is_true

    score_res_pair = {}
    faces_num_gt = 0
    for t in results:
        gt_box_list, bbox_list, face_num_gt = get_bboxes_scores(t)
        faces_num_gt += face_num_gt
        pred_is_true = eval_single_image(gt_box_list, bbox_list)
        for i in range(0, len(pred_is_true)):
            now_score = bbox_list[i][-1]
            if now_score in score_res_pair:
                score_res_pair[now_score].append(int(pred_is_true[i]))
            else:
                score_res_pair[now_score] = [int(pred_is_true[i])]
    keys = score_res_pair.keys()
    keys = sorted(keys, reverse=True)
    tp_num = 0
    predict_num = 0
    precision_list = []
    recall_list = []
    for i in range(len(keys)):
        k = keys[i]
        v = score_res_pair[k]
        predict_num += len(v)
        tp_num += sum(v)
        recall = float(tp_num) / faces_num_gt
        precision_list.append(float(tp_num) / predict_num)
        recall_list.append(recall)
    ap = precision_list[0] * recall_list[0]
    for i in range(1, len(precision_list)):
        ap += precision_list[i] * (recall_list[i] - recall_list[i - 1])
    return ap
def main():
    env = os.environ
    FLAGS.dist = 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env
    if FLAGS.dist:
        trainer_id = int(env['PADDLE_TRAINER_ID'])
        import random
        local_seed = (99 + trainer_id)
        random.seed(local_seed)
        np.random.seed(local_seed)
    cfg = load_config(FLAGS.config)
    if 'architecture' in cfg:
        main_arch = cfg.architecture
    else:
        raise ValueError("'architecture' not specified in config file.")
    merge_config(FLAGS.opt)
    if 'log_iter' not in cfg:
        cfg.log_iter = 20
    # check if set use_gpu=True in paddlepaddle cpu version
    check_gpu(cfg.use_gpu)
    # check if paddlepaddle version is satisfied
    check_version()
    if cfg.use_gpu:
        devices_num = fluid.core.get_cuda_device_count()
    else:
        devices_num = int(os.environ.get('CPU_NUM', 1))
    if 'FLAGS_selected_gpus' in env:
        device_id = int(env['FLAGS_selected_gpus'])
    else:
        device_id = 0
    place = fluid.CUDAPlace(device_id) if cfg.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    lr_builder = create('LearningRate')
    optim_builder = create('OptimizerBuilder')
    # add NAS
    config = ([(cfg.search_space)])
    server_address = (cfg.server_ip, cfg.server_port)
    load_checkpoint = FLAGS.resume_checkpoint if FLAGS.resume_checkpoint else None
    sa_nas = SANAS(
        config,
        server_addr=server_address,
        init_temperature=cfg.init_temperature,
        reduce_rate=cfg.reduce_rate,
        search_steps=cfg.search_steps,
        save_checkpoint=cfg.save_dir,
        load_checkpoint=load_checkpoint,
        is_server=cfg.is_server)
    start_iter = 0
    train_reader = create_reader(cfg.TrainReader, (cfg.max_iters - start_iter) *
                                 devices_num, cfg)
    eval_reader = create_reader(cfg.EvalReader)
    for step in range(cfg.search_steps):
        logger.info('----->>> search step: {} <<<------'.format(step))
        archs = sa_nas.next_archs()[0]
        # build program
        startup_prog = fluid.Program()
        train_prog = fluid.Program()
        with fluid.program_guard(train_prog, startup_prog):
            with fluid.unique_name.guard():
                model = create(main_arch)
                if FLAGS.fp16:
                    assert (getattr(model.backbone, 'norm_type', None)
                            != 'affine_channel'), \
                        '--fp16 currently does not support affine channel, ' \
                        ' please modify backbone settings to use batch norm'
                with mixed_precision_context(FLAGS.loss_scale,
                                             FLAGS.fp16) as ctx:
                    inputs_def = cfg['TrainReader']['inputs_def']
                    feed_vars, train_loader = model.build_inputs(**inputs_def)
                    train_fetches = archs(feed_vars, 'train', cfg)
                    loss = train_fetches['loss']
                    if FLAGS.fp16:
                        loss *= ctx.get_loss_scale_var()
                    lr = lr_builder()
                    optimizer = optim_builder(lr)
                    optimizer.minimize(loss)
                    if FLAGS.fp16:
                        loss /= ctx.get_loss_scale_var()
        current_flops = flops(train_prog)
        logger.info('current steps: {}, flops {}'.format(step, current_flops))
        if current_flops > cfg.max_flops:
            continue
        # parse train fetches
        train_keys, train_values, _ = parse_fetches(train_fetches)
        train_values.append(lr)
        if FLAGS.eval:
            eval_prog = fluid.Program()
            with fluid.program_guard(eval_prog, startup_prog):
                with fluid.unique_name.guard():
                    model = create(main_arch)
                    inputs_def = cfg['EvalReader']['inputs_def']
                    feed_vars, eval_loader = model.build_inputs(**inputs_def)
                    fetches = archs(feed_vars, 'eval', cfg)
            eval_prog = eval_prog.clone(True)
            eval_loader.set_sample_list_generator(eval_reader, place)
            extra_keys = ['im_id', 'im_shape', 'gt_bbox']
            eval_keys, eval_values, eval_cls = parse_fetches(fetches, eval_prog,
                                                             extra_keys)
        # compile program for multi-devices
        build_strategy = fluid.BuildStrategy()
        build_strategy.fuse_all_optimizer_ops = False
        build_strategy.fuse_elewise_add_act_ops = True
        exec_strategy = fluid.ExecutionStrategy()
        # iteration number when CompiledProgram tries to drop local execution scopes.
        # Set it to be 1 to save memory usages, so that unused variables in
        # local execution scopes can be deleted after each iteration.
        exec_strategy.num_iteration_per_drop_scope = 1
        if FLAGS.dist:
            dist_utils.prepare_for_multi_process(exe, build_strategy,
                                                 startup_prog, train_prog)
            exec_strategy.num_threads = 1
        exe.run(startup_prog)
        compiled_train_prog = fluid.CompiledProgram(
            train_prog).with_data_parallel(
                loss_name=loss.name,
                build_strategy=build_strategy,
                exec_strategy=exec_strategy)
        if FLAGS.eval:
            compiled_eval_prog = fluid.compiler.CompiledProgram(eval_prog)
        train_loader.set_sample_list_generator(train_reader, place)
        train_stats = TrainingStats(cfg.log_smooth_window, train_keys)
        train_loader.start()
        end_time = time.time()
        cfg_name = os.path.basename(FLAGS.config).split('.')[0]
        save_dir = os.path.join(cfg.save_dir, cfg_name)
        time_stat = deque(maxlen=cfg.log_smooth_window)
        ap = 0
        for it in range(start_iter, cfg.max_iters):
            start_time = end_time
            end_time = time.time()
            time_stat.append(end_time - start_time)
            time_cost = np.mean(time_stat)
            eta_sec = (cfg.max_iters - it) * time_cost
            eta = str(datetime.timedelta(seconds=int(eta_sec)))
            outs = exe.run(compiled_train_prog, fetch_list=train_values)
            stats = {
                k: np.array(v).mean()
                for k, v in zip(train_keys, outs[:-1])
            }
            train_stats.update(stats)
            logs = train_stats.log()
            if it % cfg.log_iter == 0 and (not FLAGS.dist or trainer_id == 0):
                strs = 'iter: {}, lr: {:.6f}, {}, time: {:.3f}, eta: {}'.format(
                    it, np.mean(outs[-1]), logs, time_cost, eta)
                logger.info(strs)
            if (it > 0 and it == cfg.max_iters - 1) and (not FLAGS.dist or
                                                         trainer_id == 0):
                save_name = str(
                    it) if it != cfg.max_iters - 1 else "model_final"
                checkpoint.save(exe, train_prog,
                                os.path.join(save_dir, save_name))
                if FLAGS.eval:
                    # evaluation
                    results = eval_run(exe, compiled_eval_prog, eval_loader,
                                       eval_keys, eval_values, eval_cls)
                    ap = calculate_ap_py(results)
                train_loader.reset()
                eval_loader.reset()
                logger.info('rewards: ap is {}'.format(ap))
                sa_nas.reward(float(ap))
        current_best_tokens = sa_nas.current_info()['best_tokens']
    logger.info("All steps end, the best BlazeFace-NAS structure is: ")
    sa_nas.tokens2arch(current_best_tokens)
if __name__ == '__main__':
    parser = ArgsParser()
    parser.add_argument(
        "-r",
        "--resume_checkpoint",
        default=None,
        type=str,
        help="Checkpoint path for resuming training.")
    parser.add_argument(
        "--fp16",
        action='store_true',
        default=False,
        help="Enable mixed precision training.")
    parser.add_argument(
        "--loss_scale",
        default=8.,
        type=float,
        help="Mixed precision training loss scale.")
    parser.add_argument(
        "--eval",
        action='store_true',
        default=True,
        help="Whether to perform evaluation in train")
    FLAGS = parser.parse_args()
    main()
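The reward passed to `sa_nas.reward` in the script above is the AP returned by `calculate_ap_py`. The toy sketch below reproduces only the accumulation step on hand-made data, assuming each detection has already been marked as a true or false positive; the function name `average_precision` and the sample detections are hypothetical.

```python
def average_precision(scored_hits, num_gt):
    """scored_hits: list of (score, is_true_positive); num_gt: ground-truth count."""
    # Group hits by score, as calculate_ap_py does with score_res_pair.
    by_score = {}
    for score, hit in scored_hits:
        by_score.setdefault(score, []).append(int(hit))

    tp = 0
    predicted = 0
    precisions, recalls = [], []
    # Walk the scores from high to low, tracking running precision and recall.
    for score in sorted(by_score, reverse=True):
        hits = by_score[score]
        predicted += len(hits)
        tp += sum(hits)
        precisions.append(float(tp) / predicted)
        recalls.append(float(tp) / num_gt)

    # Sum precision times the recall increment at each score threshold.
    ap = precisions[0] * recalls[0]
    for i in range(1, len(precisions)):
        ap += precisions[i] * (recalls[i] - recalls[i - 1])
    return ap


detections = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(detections, num_gt=2)
print(ap)
```

For the three sample detections the running (precision, recall) pairs are (1, 0.5), (0.5, 0.5) and (2/3, 1), which gives an AP of 0.5 + 0 + 1/3.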
@@ -249,7 +249,9 @@ def main():
 annotation_file = dataset.get_anno()
 dataset_dir = dataset.dataset_dir
-image_dir = dataset.image_dir
+image_dir = os.path.join(
+    dataset_dir,
+    dataset.image_dir) if FLAGS.eval_mode == 'widerface' else dataset_dir
 pred_dir = FLAGS.output_eval if FLAGS.output_eval else 'output/pred'
 face_eval_run(
......