Commit 01297143 authored by yangyaming

Polish README and simplify network configuration codes.

Parent 129461a7
@@ -6,7 +6,7 @@ SSD implements "end-to-end" detection with a single convolutional neural network
1. The final fc6 and fc7 fully connected layers are converted into convolutional layers, whose parameters are obtained by subsampling the original fc6/fc7 parameters.
2. The parameters of the pool5 layer are changed from 2x2-s2 (2x2 kernel, stride 2) to 3x3-s1-p1 (3x3 kernel, stride 1, padding 1).
3. A priorbox layer is attached after conv4\_3, conv7, conv8\_2, conv9\_2, conv10\_2, and pool11; its main purpose is to generate a series of rectangular candidate boxes from the input feature map. For a more detailed introduction to SSD, see the paper \[[1](#引用)\].

The figure below shows the overall structure of the model (300x300):
@@ -17,12 +17,12 @@ SSD implements "end-to-end" detection with a single convolutional neural network
Each rectangular box in the figure represents a convolutional layer; the last two boxes represent, respectively, the aggregation of the outputs of all convolutional layers and the post-processing stage. Concretely, at prediction time the network outputs a set of candidate rectangles, each carrying two kinds of information: a location and per-class scores. The second-to-last box in the figure denotes this aggregation of detection results; since the candidates are numerous and overlap heavily, post-processing is needed to keep a small number of high-quality boxes, and this post-processing mainly means non-maximum suppression (NMS).
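A minimal sketch of the greedy NMS procedure mentioned above, assuming boxes are given as (xmin, ymin, xmax, ymax, score) tuples in a plain Python list; it is for illustration only and is not the exact implementation used in this example:
```
def nms(boxes, iou_threshold=0.45):
    """Greedy non-maximum suppression; keeps high-score, low-overlap boxes."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for cand in boxes:
        suppressed = False
        for k in kept:
            # intersection-over-union between cand and an already kept box
            ix = max(0.0, min(cand[2], k[2]) - max(cand[0], k[0]))
            iy = max(0.0, min(cand[3], k[3]) - max(cand[1], k[1]))
            inter = ix * iy
            union = ((cand[2] - cand[0]) * (cand[3] - cand[1]) +
                     (k[2] - k[0]) * (k[3] - k[1]) - inter)
            if union > 0 and inter / union > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append(cand)
    return kept
```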
As the SSD network structure shows, candidate boxes are generated on multiple feature maps. Different feature maps have different receptive fields, so the image is scanned at several scales, yielding a richer set of candidates than other detection methods and thus higher detection accuracy. At the same time, SSD's extension of VGG16 computes the locations and class scores of the candidates at small cost, and the whole process is carried out by a single convolutional neural network, so it is also fast.
## Example overview

This example contains the following files:

<center>

Table 1. Example files

File | Purpose
---- | -----
@@ -63,7 +63,7 @@ def prepare_filelist(devkit_dir, years, output_dir):
        ftest.write(item[0] + ' ' + item[1] + '\n')
```
The function first processes the data of each year, then randomly shuffles the list of training image paths, and finally saves the training and test file lists. By default, ```prepare_voc_data.py``` and ```VOCdevkit``` are assumed to be in the same directory, and the generated file lists are written to that directory as well. Note that ```trainval.txt``` contains the training data of both VOC2007 and VOC2012, while ```test.txt``` contains only the VOC2007 test data. The first few lines of ```trainval.txt``` are given here as a sample:
```
VOCdevkit/VOC2007/JPEGImages/000005.jpg VOCdevkit/VOC2007/Annotations/000005.xml
```
@@ -99,6 +99,15 @@ train(train_file_list='./data/trainval.txt',
3. Call ```train``` to start training, where ```train_file_list``` specifies the training data list, ```dev_file_list``` the evaluation data list, and ```init_model_path``` the location of the pre-trained model.
4. During training some log messages are printed: every 10 batches, the current pass number and the current batch's cost and mAP (mean Average Precision) are printed, and after every pass the model is saved once, by default under the ```checkpoints``` directory (note: this directory must be created beforehand).
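Logging and checkpointing of this kind are typically wired up through a PaddlePaddle v2 event handler. The sketch below shows the general pattern only; the handler body and the checkpoint filename are assumptions for illustration, not the exact code of ```train.py```:
```
import gzip

def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 10 == 0:
            # print pass number and current batch cost every 10 batches
            print "Pass %d, Batch %d, Cost %f" % (
                event.pass_id, event.batch_id, event.cost)
    if isinstance(event, paddle.event.EndPass):
        # save one checkpoint per pass; 'checkpoints' must exist beforehand;
        # `trainer` is the paddle.trainer.SGD instance created in train()
        with gzip.open('checkpoints/params_pass_%d.tar.gz' % event.pass_id,
                       'w') as f:
            trainer.save_parameter_to_tar(f)
```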
Below is the mAP curve of SSD300x300 on the VOC dataset (train: 07+12, test: 07); after 140 passes the mAP reaches 71.52%.
<p align="center">
<img src="images/SSD300x300_map.png" hspace='10'/> <br/>
Figure 2. mAP convergence curve of SSD300x300
</p>
### Model evaluation

Run ```python eval.py``` to evaluate the model; the key logic of ```eval.py``` is as follows:
@@ -134,7 +143,28 @@ infer(
    threshold=0.3)
```
Here ```eval_file_list``` specifies the list of image paths; ```save_path``` the path where prediction results are saved; ```data_args``` is as above; ```batch_size``` sets how many samples are predicted at a time; ```model_path``` gives the location of the model; and ```threshold``` is the confidence threshold: only detections whose score is greater than or equal to it are written out. Some sample output lines of ```infer.res``` are given below:
```
VOCdevkit/VOC2007/JPEGImages/006936.jpg 12 0.997844 131.255611777 162.271582842 396.475315094 334.0
VOCdevkit/VOC2007/JPEGImages/006936.jpg 14 0.998557 229.160234332 49.5991278887 314.098775387 312.913876176
VOCdevkit/VOC2007/JPEGImages/006936.jpg 14 0.372522 187.543615699 133.727034628 345.647156239 327.448492289
...
```
Each line contains four tab-separated fields: the path of the image, the class label of the detected box, the confidence score, and the four coordinate values (separated by spaces).
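Since the format is line-oriented and tab-separated, post-processing it is straightforward. A small illustrative sketch that groups the detections by image (the helper name is made up, not part of the example code):
```
from collections import defaultdict

def load_infer_res(path):
    """Parse infer.res into {image path: [(label, score, (x1, y1, x2, y2))]}."""
    dets = defaultdict(list)
    with open(path) as f:
        for line in f:
            img_path, label, score, coords = line.strip().split('\t')
            xmin, ymin, xmax, ymax = map(float, coords.split(' '))
            dets[img_path].append((int(label), float(score),
                                   (xmin, ymin, xmax, ymax)))
    return dets
```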
The example also provides a visualization script: simply run ```python visual.py```, specifying the path of the detection results and the output directory. By default the visualized images are saved under ```./visual_res```. Below are some images processed with the trained model and visualized:
<p align="center">
<img src="images/vis_1.jpg" height=150 width=200 hspace='10'/>
<img src="images/vis_2.jpg" height=150 width=200 hspace='10'/>
<img src="images/vis_3.jpg" height=150 width=100 hspace='10'/>
<img src="images/vis_4.jpg" height=150 width=200 hspace='10'/> <br />
Figure 3. Visualization examples of SSD300x300 detections
</p>
## Using your own dataset

To train PaddlePaddle SSD on your own data, two key preparations are needed. First, the data must be adapted into an input format the network accepts; a recommended structure is given here, taking ```train.txt``` as an example
......
@@ -7,22 +7,14 @@ devkit_dir = './VOCdevkit'
years = ['2007', '2012']


def get_dir(devkit_dir, year, type):
    return osp.join(devkit_dir, 'VOC' + year, type)


def walk_dir(devkit_dir, year):
    filelist_dir = get_dir(devkit_dir, year, 'ImageSets/Main')
    annotation_dir = get_dir(devkit_dir, year, 'Annotations')
    img_dir = get_dir(devkit_dir, year, 'JPEGImages')
    trainval_list = []
    test_list = []
    added = set()
......
@@ -31,15 +31,8 @@ class Settings(object):
        self._resize_height = resize_h
        self._resize_width = resize_w
        self._img_mean = np.array(mean_value)[:, np.newaxis, np.newaxis].astype(
            'float32')

    @property
    def data_dir(self):
@@ -130,12 +123,12 @@ def _reader_creator(settings, file_list, mode, shuffle):
                    image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.0,
                                       1.0))
                """ random crop """
                sampled_bbox = image_util.generate_batch_samples(
                    batch_sampler, bbox_labels, img_width, img_height)
                if len(sampled_bbox) > 0:
                    idx = int(random.uniform(0, len(sampled_bbox)))
                    img, sample_labels = image_util.crop_image(
                        img, bbox_labels, sampled_bbox[idx], img_width,
                        img_height)
......
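The one-line mean image in the ```Settings``` constructor above relies on NumPy broadcasting: a length-3 per-channel mean reshaped to (3, 1, 1) subtracts cleanly from any (3, H, W) image, which is exactly what the deleted loop built by hand. A small sketch of the equivalence (sizes and values are illustrative):
```
import numpy as np

mean_value = [104, 117, 124]
img = np.random.rand(3, 300, 300).astype('float32')

# new form: shape (3, 1, 1), broadcast over height and width
img_mean = np.array(mean_value)[:, np.newaxis, np.newaxis].astype('float32')

# old form: tile each channel mean over the full H*W plane
explicit = np.zeros(3 * 300 * 300, dtype=np.float32)
for idx, value in enumerate(mean_value):
    explicit[idx * 300 * 300:(idx + 1) * 300 * 300] = value
explicit = explicit.reshape(3, 300, 300)

assert np.allclose(img - img_mean, img - explicit)
```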
@@ -40,20 +40,13 @@ class bbox():
        self.ymax = ymax


def bbox_area(src_bbox):
    width = src_bbox.xmax - src_bbox.xmin
    height = src_bbox.ymax - src_bbox.ymin
    return width * height


def generate_sample(sampler):
    scale = random.uniform(sampler.min_scale, sampler.max_scale)
    min_aspect_ratio = max(sampler.min_aspect_ratio, (scale**2.0))
    max_aspect_ratio = min(sampler.max_aspect_ratio, 1 / (scale**2.0))
@@ -70,7 +63,7 @@ def generate_sample(sampler):
    return sampled_bbox


def jaccard_overlap(sample_bbox, object_bbox):
    if sample_bbox.xmin >= object_bbox.xmax or \
        sample_bbox.xmax <= object_bbox.xmin or \
        sample_bbox.ymin >= object_bbox.ymax or \
@@ -82,20 +75,20 @@ def jaccard_overlap(sample_bbox, object_bbox):
    intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax)
    intersect_size = (intersect_xmax - intersect_xmin) * (
        intersect_ymax - intersect_ymin)
    sample_bbox_size = bbox_area(sample_bbox)
    object_bbox_size = bbox_area(object_bbox)
    overlap = intersect_size / (
        sample_bbox_size + object_bbox_size - intersect_size)
    return overlap


def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
    if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0:
        return True
    for i in range(len(bbox_labels)):
        object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
                           bbox_labels[i][3], bbox_labels[i][4])
        overlap = jaccard_overlap(sample_bbox, object_bbox)
        if sampler.min_jaccard_overlap != 0 and \
            overlap < sampler.min_jaccard_overlap:
            continue
@@ -106,7 +99,8 @@ def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
    return False


def generate_batch_samples(batch_sampler, bbox_labels, image_width,
                           image_height):
    sampled_bbox = []
    index = []
    c = 0
@@ -115,8 +109,8 @@ def generate_batch_samples(batch_sampler, bbox_labels, image_width,
        for i in range(sampler.max_trial):
            if found >= sampler.max_sample:
                break
            sample_bbox = generate_sample(sampler)
            if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
                sampled_bbox.append(sample_bbox)
                found = found + 1
                index.append(c)
@@ -124,7 +118,7 @@ def generate_batch_samples(batch_sampler, bbox_labels, image_width,
    return sampled_bbox


def clip_bbox(src_bbox):
    src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0)
    src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0)
    src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0)
@@ -132,7 +126,7 @@ def clip_bbox(src_bbox):
    return src_bbox


def meet_emit_constraint(src_bbox, sample_bbox):
    center_x = (src_bbox.xmax + src_bbox.xmin) / 2
    center_y = (src_bbox.ymax + src_bbox.ymin) / 2
    if center_x >= sample_bbox.xmin and \
@@ -143,14 +137,14 @@ def meet_emit_constraint(src_bbox, sample_bbox):
    return False


def transform_labels(bbox_labels, sample_bbox):
    proj_bbox = bbox(0, 0, 0, 0)
    sample_labels = []
    for i in range(len(bbox_labels)):
        sample_label = []
        object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
                           bbox_labels[i][3], bbox_labels[i][4])
        if not meet_emit_constraint(object_bbox, sample_bbox):
            continue
        sample_width = sample_bbox.xmax - sample_bbox.xmin
        sample_height = sample_bbox.ymax - sample_bbox.ymin
@@ -158,8 +152,8 @@ def transform_labels(bbox_labels, sample_bbox):
        proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
        proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
        proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
        proj_bbox = clip_bbox(proj_bbox)
        if bbox_area(proj_bbox) > 0:
            sample_label.append(bbox_labels[i][0])
            sample_label.append(float(proj_bbox.xmin))
            sample_label.append(float(proj_bbox.ymin))
@@ -170,12 +164,12 @@ def transform_labels(bbox_labels, sample_bbox):
    return sample_labels


def crop_image(img, bbox_labels, sample_bbox, image_width, image_height):
    sample_bbox = clip_bbox(sample_bbox)
    xmin = int(sample_bbox.xmin * image_width)
    xmax = int(sample_bbox.xmax * image_width)
    ymin = int(sample_bbox.ymin * image_height)
    ymax = int(sample_bbox.ymax * image_height)
    sample_img = img[ymin:ymax, xmin:xmax]
    sample_labels = transform_labels(bbox_labels, sample_bbox)
    return sample_img, sample_labels
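A quick sanity check of the renamed helpers, assuming the bbox constructor takes (xmin, ymin, xmax, ymax) as its uses above suggest; the numbers are hand-computed for illustration:
```
a = bbox(0.0, 0.0, 1.0, 1.0)    # area 1.0
b = bbox(0.5, 0.5, 1.25, 1.25)  # area 0.75 * 0.75 = 0.5625
# intersection is 0.5 * 0.5 = 0.25, union is 1.0 + 0.5625 - 0.25 = 1.3125
print bbox_area(a), bbox_area(b)  # 1.0 0.5625
print jaccard_overlap(a, b)       # 0.25 / 1.3125 ~= 0.1905
```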
......
@@ -21,11 +21,26 @@ def _infer(inferer, infer_data, threshold):
    return ret


def save_batch_res(ret_res, img_w, img_h, fname_list, fout):
    for det_res in ret_res:
        img_idx = int(det_res[0])
        label = int(det_res[1])
        conf_score = det_res[2]
        xmin = det_res[3] * img_w[img_idx]
        ymin = det_res[4] * img_h[img_idx]
        xmax = det_res[5] * img_w[img_idx]
        ymax = det_res[6] * img_h[img_idx]
        fout.write(fname_list[img_idx] + '\t' + str(label) + '\t' + str(
            conf_score) + '\t' + str(xmin) + ' ' + str(ymin) + ' ' + str(xmax) +
                   ' ' + str(ymax))
        fout.write('\n')


def infer(eval_file_list, save_path, data_args, batch_size, model_path,
          threshold):
    detect_out = vgg_ssd_net.net_conf(mode='infer')

    assert os.path.isfile(model_path), 'Invalid model.'
    parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_path))

    inferer = paddle.inference.Inference(
@@ -46,24 +61,12 @@ def infer(eval_file_list, save_path, data_args, batch_size, model_path,
        for img in reader():
            test_data.append([img])
            fname_list.append(all_fname_list[idx])
            w, h = Image.open(os.path.join('./data', fname_list[-1])).size
            img_w.append(w)
            img_h.append(h)
            if len(test_data) == batch_size:
                ret_res = _infer(inferer, test_data, threshold)
                save_batch_res(ret_res, img_w, img_h, fname_list, fout)
                test_data = []
                fname_list = []
                img_w = []
@@ -73,17 +76,7 @@ def infer(eval_file_list, save_path, data_args, batch_size, model_path,
        if len(test_data) > 0:
            ret_res = _infer(inferer, test_data, threshold)
            save_batch_res(ret_res, img_w, img_h, fname_list, fout)


if __name__ == "__main__":
......
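The extracted save_batch_res helper can be smoke-tested with a fabricated detection row and an in-memory buffer; all values below are made up for illustration:
```
import StringIO

buf = StringIO.StringIO()
# one fake row in _infer's output layout: [img_idx, label, score, 4 coords]
ret_res = [[0, 11, 0.9, 0.1, 0.2, 0.5, 0.6]]
save_batch_res(ret_res, [500], [375], ['000005.jpg'], buf)
# coordinates are rescaled by image width/height:
# 000005.jpg    11      0.9     50.0 75.0 250.0 225.0
print buf.getvalue()
```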
@@ -8,17 +8,6 @@ from config.pascal_voc_conf import cfg


def train(train_file_list, dev_file_list, data_args, init_model_path):
    optimizer = paddle.optimizer.Momentum(
        momentum=cfg.TRAIN.MOMENTUM,
        learning_rate=cfg.TRAIN.LEARNING_RATE,
@@ -28,6 +17,13 @@ def train(train_file_list, dev_file_list, data_args, init_model_path):
        learning_rate_decay_b=cfg.TRAIN.LEARNING_RATE_DECAY_B,
        learning_rate_schedule=cfg.TRAIN.LEARNING_RATE_SCHEDULE)

    cost, detect_out = vgg_ssd_net.net_conf('train')

    parameters = paddle.parameters.create(cost)
    if not (init_model_path is None):
        assert os.path.isfile(init_model_path), 'Invalid model.'
        parameters.init_from_tar(gzip.open(init_model_path))

    trainer = paddle.trainer.SGD(
        cost=cost,
        parameters=parameters,
@@ -37,8 +33,7 @@ def train(train_file_list, dev_file_list, data_args, init_model_path):
    feeding = {'image': 0, 'bbox': 1}

    train_reader = paddle.batch(
        data_provider.train(data_args, train_file_list),
        batch_size=cfg.TRAIN.BATCH_SIZE)  # generate a batch image each time

    dev_reader = paddle.batch(
......
@@ -9,67 +9,56 @@ def net_conf(mode):
    """
    default_l2regularization = cfg.TRAIN.L2REGULARIZATION

    default_bias_attr = paddle.attr.ParamAttr(l2_rate=0.0, learning_rate=2.0)
    default_static_bias_attr = paddle.attr.ParamAttr(is_static=True)

    def get_param_attr(local_lr, regularization):
        is_static = False
        if local_lr == 0.0:
            is_static = True
        return paddle.attr.ParamAttr(
            learning_rate=local_lr, l2_rate=regularization, is_static=is_static)

    def conv_group(stack_num, name_list, input, filter_size_list, num_channels,
                   num_filters_list, stride_list, padding_list,
                   common_bias_attr, common_param_attr, common_act):
        conv = input
        in_channels = num_channels
        for i in xrange(stack_num):
            conv = paddle.layer.img_conv(
                name=name_list[i],
                input=conv,
                filter_size=filter_size_list[i],
                num_channels=in_channels,
                num_filters=num_filters_list[i],
                stride=stride_list[i],
                padding=padding_list[i],
                bias_attr=common_bias_attr,
                param_attr=common_param_attr,
                act=common_act)
            in_channels = num_filters_list[i]
        return conv

    def vgg_block(idx_str, input, num_channels, num_filters, pool_size,
                  pool_stride, pool_pad):
        layer_name = "conv%s_" % idx_str
        stack_num = 3
        name_list = [layer_name + str(i + 1) for i in xrange(3)]

        conv = conv_group(stack_num, name_list, input, [3] * stack_num,
                          num_channels, [num_filters] * stack_num,
                          [1] * stack_num, [1] * stack_num, default_bias_attr,
                          get_param_attr(1, default_l2regularization),
                          paddle.activation.Relu())
        pool = paddle.layer.img_pool(
            input=conv,
            pool_size=pool_size,
            num_channels=num_filters,
            pool_type=paddle.pooling.CudnnMax(),
            stride=pool_stride,
            padding=pool_pad)
        return conv, pool

    def mbox_block(layer_idx, input, num_channels, filter_size, loc_filters,
                   conf_filters):
@@ -83,8 +72,7 @@ def net_conf(mode):
            stride=1,
            padding=1,
            bias_attr=default_bias_attr,
            param_attr=get_param_attr(1, default_l2regularization),
            act=paddle.activation.Identity())

        mbox_conf_name = layer_idx + "_mbox_conf"
@@ -97,8 +85,7 @@ def net_conf(mode):
            stride=1,
            padding=1,
            bias_attr=default_bias_attr,
            param_attr=get_param_attr(1, default_l2regularization),
            act=paddle.activation.Identity())
        return mbox_loc, mbox_conf
@@ -106,30 +93,14 @@ def net_conf(mode):
    def ssd_block(layer_idx, input, img_shape, num_channels, num_filters1,
                  num_filters2, aspect_ratio, variance, min_size, max_size):
        layer_name = "conv" + layer_idx + "_"
        stack_num = 2
        conv1_name = layer_name + "1"
        conv2_name = layer_name + "2"
        conv2 = conv_group(stack_num, [conv1_name, conv2_name], input, [1, 3],
                           num_channels, [num_filters1, num_filters2], [1, 2],
                           [0, 1], default_bias_attr,
                           get_param_attr(1, default_l2regularization),
                           paddle.activation.Relu())

        loc_filters = (len(aspect_ratio) * 2 + 1 + len(max_size)) * 4
        conf_filters = (
@@ -153,28 +124,12 @@ def net_conf(mode):
        height=cfg.IMG_HEIGHT,
        width=cfg.IMG_WIDTH)

    stack_num = 2
    conv1_2 = conv_group(stack_num, ['conv1_1', 'conv1_2'], img,
                         [3] * stack_num, 3, [64] * stack_num, [1] * stack_num,
                         [1] * stack_num, default_static_bias_attr,
                         get_param_attr(0, 0), paddle.activation.Relu())
    pool1 = paddle.layer.img_pool(
        name="pool1",
        input=conv1_2,
@@ -183,28 +138,12 @@ def net_conf(mode):
        num_channels=64,
        stride=2)

    stack_num = 2
    conv2_2 = conv_group(stack_num, ['conv2_1', 'conv2_2'], pool1, [3] *
                         stack_num, 64, [128] * stack_num, [1] * stack_num,
                         [1] * stack_num, default_static_bias_attr,
                         get_param_attr(0, 0), paddle.activation.Relu())
    pool2 = paddle.layer.img_pool(
        name="pool2",
        input=conv2_2,
@@ -226,39 +165,18 @@ def net_conf(mode):
        name="conv4_3_norm",
        input=conv4_3,
        param_attr=paddle.attr.ParamAttr(
            initial_mean=20, initial_std=0, is_static=False, learning_rate=1))

    conv4_3_norm_mbox_loc, conv4_3_norm_mbox_conf = \
        mbox_block("conv4_3_norm", conv4_3_norm, 512, 3, 12, 63)

    conv5_3, pool5 = vgg_block("5", pool4, 512, 512, 3, 1, 1)

    stack_num = 2
    fc7 = conv_group(stack_num, ['fc6', 'fc7'], pool5, [3, 1], 512, [1024] *
                     stack_num, [1] * stack_num, [1, 0], default_bias_attr,
                     get_param_attr(1, default_l2regularization),
                     paddle.activation.Relu())

    fc7_mbox_loc, fc7_mbox_conf = mbox_block("fc7", fc7, 1024, 3, 24, 126)
    fc7_mbox_priorbox = paddle.layer.priorbox(
        input=fc7,
......
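As a sanity check on the mbox_block filter counts above, assume the VOC setting of 21 classes (20 object classes plus background); the priors-per-position counts are derived from the calls shown, not read from the config:
```
num_classes = 21
for name, loc_filters, conf_filters in [('conv4_3_norm', 12, 63),
                                        ('fc7', 24, 126)]:
    priors = loc_filters / 4  # four coordinates per prior box
    assert conf_filters == priors * num_classes
    print name, 'has', priors, 'prior boxes per position'
```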