Commit 9e306e6e, authored by ceci3, committed by qingqing01

Add latency demo for blazeface (#182)

* Add latency table
* Add latency demo
* Update doc
Parent 1e395f8a
@@ -5,7 +5,9 @@
 ## Overview
 We use the BlazeFace face detection model as a neural architecture search (NAS) example. The example uses [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
-to run the NAS experiment. For technical details, see [NAS strategy](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/nas_demo.md)
+to run the NAS experiment. For technical details, see [NAS strategy](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/nas_demo.md)<br>
+When searching with PaddleSlim, the search constraint can be either a floating-point operation count (FLOPs) limit or a hardware latency limit; the latency limit requires a latency table. This example provides a hardware latency table for the BlazeFace search space, named latency_855.txt (latencies measured with PaddleLite on a Snapdragon 855), which can be used directly for latency-constrained BlazeFace search experiments.<br>
+For the meaning of each field in the hardware latency table, see [Hardware latency table description](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/table_latency.md)
 ## Define the search space
@@ -15,9 +17,9 @@
 - Single blaze block `blaze_filter_num2`: defines the range of channel counts in BlazeFace's single blaze blocks; a moderate channel range was chosen by hand;
 - Transition blaze block `mid_filter_num`: defines the transition range from the single blaze blocks to the double blaze blocks in BlazeFace;
 - Double blaze block `double_filter_num`: defines the range of channel counts in BlazeFace's double blaze blocks; a larger channel range was chosen by hand;
-- Kernel size `use_5x5kernel`: defines whether the convolution kernels in BlazeFace are 3x3 or 5x5.
+- Kernel size `use_5x5kernel`: defines whether the convolution kernels in BlazeFace are 3x3 or 5x5. Because the provided latency table only records latencies for 3x3 convolutions, the kernel size must be fixed to 3x3 when running a latency-constrained search.
-Given the ranges defined above, the search space tokens have 9 positions and vary within ([0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 9, 12, 12, 6, 6, 6, 6, 2]).
+Given the ranges defined above, the search space tokens have 9 positions and vary within ([0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 9, 12, 12, 6, 6, 6, 6, 2]). For a latency-constrained search, the tokens vary within ([0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 9, 12, 12, 6, 6, 6, 6, 1]).
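The two token ranges above can be illustrated with a minimal Python sketch. It assumes the upper bounds are exclusive (each token satisfies `0 <= t < upper[i]`); the text does not say this outright, but it is consistent with the last position collapsing to a single choice when its bound is 1. The helper names `tokens_in_range` and `count_candidates` and the sample token vector are hypothetical and are not part of PaddleSlim or this commit.

```python
# Upper bounds quoted in the text. Treating them as exclusive is an assumption.
FLOPS_RANGE = [7, 9, 12, 12, 6, 6, 6, 6, 2]
LATENCY_RANGE = [7, 9, 12, 12, 6, 6, 6, 6, 1]


def tokens_in_range(tokens, upper):
    """Return True if tokens has one entry per position and 0 <= t < upper[i]."""
    if len(tokens) != len(upper):
        return False
    return all(0 <= t < u for t, u in zip(tokens, upper))


def count_candidates(upper):
    """Number of distinct token vectors under exclusive upper bounds."""
    total = 1
    for u in upper:
        total *= u
    return total


if __name__ == "__main__":
    # Hypothetical token vector whose last (kernel size) position is 1.
    tokens = [3, 4, 5, 6, 2, 2, 2, 2, 1]
    print(tokens_in_range(tokens, FLOPS_RANGE))
    print(tokens_in_range(tokens, LATENCY_RANGE))
    # Fixing the kernel size removes one binary choice from the space.
    print(count_candidates(FLOPS_RANGE) // count_candidates(LATENCY_RANGE))
```

Under this assumption, a token vector whose last position is nonzero is valid for the FLOPs search but falls outside the latency search range, which matches the requirement that the kernel size stay fixed.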
 The 9 tokens represent:
@@ -46,7 +48,14 @@ For the blaze_filters and double_blaze_filters fields, see [blazenet.py](../../ppdet/mod
 ## Start the search
 First install PaddleSlim; see the [installation guide](https://paddlepaddle.github.io/PaddleSlim/#_2)
-Then go to the `slim/nas` directory and edit the blazeface.yml config. For the meaning of the search config fields, see the [NAS API documentation](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/api/nas_api.md)
+Then go to the `slim/nas` directory and edit the blazeface.yml config. For the meaning of the search config fields, see the [NAS API documentation](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/api/nas_api.md)<br>
+The `Constraint` field in blazeface.yml describes the search constraint for the current search experiment: <br>
+- `ctype`: the kind of constraint; set it to flops or latency for a FLOPs limit or a hardware latency limit, respectively.
+- `max_constraint`: the maximum allowed value of the constraint.
+- `min_constraint`: the minimum allowed value of the constraint.
+- `table_file`: path to the hardware latency table file; this field is only used in latency-constrained search experiments.
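As a rough illustration of how these four fields relate, the following sketch checks a `Constraint`-style mapping before a search is started. It is not part of this commit or of PaddleSlim: the function `validate_constraint` is hypothetical, and the rule that `table_file` must be set for latency search is inferred from the field description above.

```python
def validate_constraint(cfg):
    """Check a Constraint-like dict; return a list of problems (empty if OK)."""
    problems = []
    ctype = cfg.get("ctype")
    if ctype not in ("flops", "latency"):
        problems.append(
            "ctype must be 'flops' or 'latency', got {!r}".format(ctype))
    lo = cfg.get("min_constraint")
    hi = cfg.get("max_constraint")
    # Either bound may be omitted; only compare them when both are set.
    if lo is not None and hi is not None and lo > hi:
        problems.append("min_constraint exceeds max_constraint")
    # The latency table is only needed for latency-constrained search.
    if ctype == "latency" and not cfg.get("table_file"):
        problems.append("table_file is required for latency search")
    return problems


if __name__ == "__main__":
    latency_cfg = {
        "ctype": "latency",
        "max_constraint": 57489,
        "min_constraint": 18000,
        "table_file": "latency_855.txt",
    }
    print(validate_constraint(latency_cfg))
    print(validate_constraint({"ctype": "latency", "max_constraint": 57489}))
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured field at once.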
 Then start the search experiment:
``` ```
 cd slim/nas
......
@@ -8,16 +8,24 @@ save_dir: nas_checkpoint
 # 1(label_class) + 1(background)
 num_classes: 2
-# nas config
+# nas config start
 reduce_rate: 0.85
 init_temperature: 10.24
 is_server: true
-max_flops: 531558400
 search_steps: 300
 server_ip: ""
 server_port: 8999
 search_space: BlazeFaceNasSpace
+Constraint:
+  # choice: flops, latency
+  ctype: latency
+  max_constraint: 57489
+  min_constraint: 18000
+  # only need in latency search
+  table_file: latency_855.txt
+# nas config end
 LearningRate:
   base_lr: 0.001
   schedulers:
......
This diff is collapsed.
@@ -40,7 +40,7 @@ import sys
 sys.path.append("../../")
 from ppdet.experimental import mixed_precision_context
-from ppdet.core.workspace import load_config, merge_config, create
+from ppdet.core.workspace import load_config, merge_config, create, register
 from ppdet.data.reader import create_reader
 from ppdet.utils import dist_utils
@@ -49,7 +49,7 @@ from ppdet.utils.stats import TrainingStats
 from ppdet.utils.cli import ArgsParser
 from ppdet.utils.check import check_gpu, check_version
 import ppdet.utils.checkpoint as checkpoint
-from paddleslim.analysis import flops
+from paddleslim.analysis import flops, TableLatencyEvaluator
 from paddleslim.nas import SANAS
 import search_space
@@ -59,6 +59,40 @@ logging.basicConfig(level=logging.INFO, format=FORMAT)
 logger = logging.getLogger(__name__)
+@register
+class Constraint(object):
+    """
+    Constraint for nas
+    """
+    def __init__(self,
+                 ctype,
+                 max_constraint=None,
+                 min_constraint=None,
+                 table_file=None):
+        super(Constraint, self).__init__()
+        self.ctype = ctype
+        self.max_constraint = max_constraint
+        self.min_constraint = min_constraint
+        self.table_file = table_file
+    def compute_constraint(self, program):
+        if self.ctype == 'flops':
+            model_status = flops(program)
+        elif self.ctype == 'latency':
+            assert os.path.exists(
+                self.table_file
+            ), "latency constraint must have latency table, please check whether table file exist!"
+            model_latency = TableLatencyEvaluator(self.table_file)
+            model_status = model_latency.latency(program, only_conv=True)
+        else:
+            raise NotImplementedError(
+                "{} constraint is NOT support!!! Now PaddleSlim support flops constraint and latency constraint".
+                format(self.ctype))
+        return model_status
 def get_bboxes_scores(result):
     bboxes = result['bbox'][0]
     gt_bbox = result['gt_bbox'][0]
@@ -223,6 +257,7 @@ def main():
 devices_num, cfg)
 eval_reader = create_reader(cfg.EvalReader)
+constraint = create('Constraint')
 for step in range(cfg.search_steps):
     logger.info('----->>> search step: {} <<<------'.format(step))
     archs = sa_nas.next_archs()[0]
@@ -252,9 +287,15 @@ def main():
 optimizer.minimize(loss)
 if FLAGS.fp16:
     loss /= ctx.get_loss_scale_var()
-current_flops = flops(train_prog)
-logger.info('current steps: {}, flops {}'.format(step, current_flops))
-if current_flops > cfg.max_flops:
+current_constraint = constraint.compute_constraint(train_prog)
+logger.info('current steps: {}, constraint {}'.format(
+    step, current_constraint))
+if (constraint.max_constraint != None and
+        current_constraint > constraint.max_constraint) or (
+            constraint.min_constraint != None and
+            current_constraint < constraint.min_constraint):
     continue
 # parse train fetches
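The bound check added in this hunk skips a candidate architecture when its measured constraint value falls outside the configured range. A minimal standalone sketch of the same logic follows; it uses `is not None` rather than `!= None`, which is the more idiomatic comparison in Python. The function name `out_of_bounds` and the sample values are hypothetical; the bounds 18000 and 57489 are the ones from the blazeface.yml hunk.

```python
def out_of_bounds(value, min_constraint=None, max_constraint=None):
    """Return True when value violates a bound that is set.

    A bound of None means that side is unbounded, mirroring the
    optional max_constraint / min_constraint fields.
    """
    if max_constraint is not None and value > max_constraint:
        return True
    if min_constraint is not None and value < min_constraint:
        return True
    return False


if __name__ == "__main__":
    # Hypothetical constraint values for five candidate architectures.
    measured = [17000, 18000, 40000, 57489, 60000]
    kept = [
        v for v in measured
        if not out_of_bounds(v, min_constraint=18000, max_constraint=57489)
    ]
    print(kept)
```

Both bounds are inclusive in this sketch, as in the hunk: a candidate is skipped only when it is strictly above the maximum or strictly below the minimum.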
......