Update script for ImageNet to OFRecord conversion

857388b0 · Flowingsun007 · 17a9c2cc
3 changed files
--- a/Classification/cnns/README.md
+++ b/Classification/cnns/README.md
@@ -363,7 +363,7 @@ OneFlow and NVIDIA use the same initialization scheme, it is just that in the two frameworks

 #### Convert ImageNet to OFRecord

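-OneFlow provides a script that converts the raw ImageNet-2012 dataset files to the OFRecord format. If you have already prepared the ImageNet-2012 dataset (training and validation sets) and the training/validation sets are laid out as follows:
+OneFlow provides a script that converts the raw ImageNet-2012 dataset files to the OFRecord format. If you have already prepared the ImageNet-2012 dataset (training and validation sets) and the training/validation sets are laid out as follows: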

 ```shell
 │   ├── train
@@ -376,7 +376,10 @@ OneFlow and NVIDIA use the same initialization scheme, it is just that in the two frameworks
                                 ...
 ```

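-Then simply run the following scripts to convert the training and validation sets to OFRecord:
+Then you only need to download [imagenet_2012_bounding_boxes.csv](https://oneflow-public.oss-cn-beijing.aliyuncs.com/online_document/dataset/imagenet/imagenet_2012_bounding_boxes.zip) (distributed as a .zip archive)
+
+and run the following scripts to convert the training/validation sets to OFRecord: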
+
 ##### Convert the training set

 ```shell
@@ -437,22 +440,28 @@ python3 imagenet_ofrecord.py  \



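-If you have not downloaded the ImageNet dataset yet, download and prepare the following files yourself:
+If you have not downloaded the ImageNet dataset yet, prepare the following files: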

 - ILSVRC2012_img_train.tar
-
 - ILSVRC2012_img_val.tar
+- ILSVRC2012_bbox_train_v2.tar.gz (optional)
+
+Download the training and validation images yourself; the bbox annotations can be downloaded here: [ILSVRC2012_bbox_train_v2.tar.gz](https://oneflow-public.oss-cn-beijing.aliyuncs.com/online_document/dataset/imagenet/ILSVRC2012_bbox_train_v2.tar.gz)
+
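+The three steps below preprocess the dataset for you. Afterwards, you can convert it to OFRecord with the conversion scripts described above.
+
-The two steps below preprocess the dataset for you. Afterwards, you can convert it to OFRecord with the conversion scripts described above. Below we assume that you have downloaded the raw dataset and stored it under the data/imagenet directory:
+
+Below we assume that you have downloaded the raw dataset and the bbox annotation file and stored them under the data/imagenet directory: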

 ```shell
 ├── data
 │   └── imagenet
 │       ├── ILSVRC2012_img_train.tar
 │       ├── ILSVRC2012_img_val.tar
-├── imagenet_utils
+│       ├── ILSVRC2012_bbox_train_v2.tar.gz
+├── tools
 │   ├── extract_trainval.sh
-│   ├── imagenet_2012_bounding_boxes.csv
 │   ├── imagenet_2012_validation_synset_labels.txt
 │   ├── imagenet_lsvrc_2015_synsets.txt
 │   ├── imagenet_metadata.txt
@@ -460,7 +469,25 @@ python3 imagenet_ofrecord.py  \
 │   └── preprocess_imagenet_validation_data.py
 ```

-**Step 1: extract imagenet**
+**Step 1: process_bounding_boxes**
+
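+This step extracts the bounding boxes from the annotated XML files into a single .csv file so that later code can use them directly. The full conversion takes about 5 minutes.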
+
+Alternatively, you can skip this step and use our pre-converted file directly: [imagenet_2012_bounding_boxes.csv](https://oneflow-public.oss-cn-beijing.aliyuncs.com/online_document/dataset/imagenet/imagenet_2012_bounding_boxes.zip) (a .zip archive)
+
+1. Extract ILSVRC2012_bbox_train_v2.tar.gz
+
+```shell
+cd data/imagenet && mkdir bounding_boxes && tar -zxvf ILSVRC2012_bbox_train_v2.tar.gz -C bounding_boxes
+```
+
+2. Extract the bboxes into a .csv file
+
+```shell
+cd ../.. && python3 tools/process_bounding_boxes.py data/imagenet/bounding_boxes tools/imagenet_lsvrc_2015_synsets.txt | sort > imagenet_2012_bounding_boxes.csv
+```
+
+**Step 2: extract imagenet**

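 This step extracts ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar and creates the train and validation folders. The train folder contains 1000 label folders, one per class (e.g. n01443537), and each extracted training image is placed in the folder of its class; the validation folder contains the extracted original images.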

@@ -474,6 +501,8 @@ sh extract_trainval.sh ../data/imagenet # the argument specifies where the raw imagenet data is stored
 ├── imagenet
 │   ├── ILSVRC2012_img_train.tar
 │   ├── ILSVRC2012_img_val.tar
+│   ├── ILSVRC2012_bbox_train_v2.tar.gz
+│   ├── bounding_boxes
 │   ├── train
 │   │   ├── n01440764
 │   │   │   ├── n01440764_10026.JPEG
@@ -489,7 +518,7 @@ sh extract_trainval.sh ../data/imagenet # the argument specifies where the raw imagenet data is stored
 											...
 ```

-**Step 2: validation data processing**
+**Step 3: validation data processing**

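 After the previous step, the training images are already organized into 1000 class label folders, while the validation images are still all piled into the single validation folder. In this step, we use preprocess_imagenet_validation_data.py to sort them into per-class label folders as well.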
 ```shell
@@ -503,6 +532,8 @@ python3 preprocess_imagenet_validation_data.py  ../data/imagenet/validation
 ├── imagenet
 │   ├── ILSVRC2012_img_train.tar
 │   ├── ILSVRC2012_img_val.tar
+│   ├── ILSVRC2012_bbox_train_v2.tar.gz
+│   ├── bounding_boxes
 │   ├── train
 │   │   ├── n01440764
 │   │   └── n01443537
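
The reorganization that preprocess_imagenet_validation_data.py performs on the validation folder can be sketched as below. This is a minimal illustration, not that script's actual code. It assumes that imagenet_2012_validation_synset_labels.txt lists one synset ID per line, in the order of the images ILSVRC2012_val_00000001.JPEG, ILSVRC2012_val_00000002.JPEG, and so on; the helper name `organize_validation` is hypothetical.

```python
import os
import shutil


def organize_validation(val_dir, labels_file):
    """Move each validation image into a sub-folder named after its synset.

    Hypothetical helper. Assumes line i (1-based) of labels_file holds the
    synset ID of "ILSVRC2012_val_%08d.JPEG" % i. Returns the number of
    images moved; images that are missing or already moved are skipped.
    """
    with open(labels_file) as f:
        synsets = [line.strip() for line in f if line.strip()]
    moved = 0
    for i, synset in enumerate(synsets, start=1):
        filename = "ILSVRC2012_val_%08d.JPEG" % i
        src = os.path.join(val_dir, filename)
        if not os.path.exists(src):
            continue  # nothing to move for this index
        dst_dir = os.path.join(val_dir, synset)
        os.makedirs(dst_dir, exist_ok=True)  # one folder per class label
        shutil.move(src, os.path.join(dst_dir, filename))
        moved += 1
    return moved
```

Skipping files that are already gone makes a second run a no-op, which is convenient if the step is interrupted and restarted.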

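Code that consumes imagenet_2012_bounding_boxes.csv typically needs the boxes grouped by image, because one image can carry several boxes. The sketch below relies only on the `<JPEG file name>, <xmin>, <ymin>, <xmax>, <ymax>` line format documented in process_bounding_boxes.py; the function name `load_bounding_boxes` is hypothetical, and this is not necessarily how imagenet_ofrecord.py reads the file.

```python
import csv
from collections import defaultdict


def load_bounding_boxes(csv_path):
    """Group relative [xmin, ymin, xmax, ymax] boxes by JPEG file name.

    Hypothetical helper; each CSV line is assumed to look like
    n00007846_64193.JPEG,0.0060,0.2620,0.7545,0.9940
    """
    boxes = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 5:
                continue  # ignore blank or malformed lines
            xmin, ymin, xmax, ymax = (float(v) for v in row[1:])
            boxes[row[0]].append([xmin, ymin, xmax, ymax])
    return dict(boxes)
```

For example, `load_bounding_boxes("imagenet_2012_bounding_boxes.csv").get("n00007846_64193.JPEG", [])` returns every box recorded for that image, or an empty list if it has none.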
--- a/Classification/cnns/tools/imagenet_2012_bounding_boxes.csv
+++ b/Classification/cnns/tools/imagenet_2012_bounding_boxes.csv
--- a/Classification/cnns/tools/process_bounding_boxes.py
+++ b/Classification/cnns/tools/process_bounding_boxes.py
+#!/usr/bin/python
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Process the ImageNet Challenge bounding boxes for TensorFlow model training.
+This script is called as
+process_bounding_boxes.py <dir> [synsets-file]
+Where <dir> is a directory containing the downloaded and unpacked bounding box
+data. If [synsets-file] is supplied, then only the bounding boxes whose
+synsets are contained within this file are returned. Note that the
+[synsets-file] file contains synset ids, one per line.
+The script dumps out a CSV text file in which each line contains an entry.
+  n00007846_64193.JPEG,0.0060,0.2620,0.7545,0.9940
+The entry can be read as:
+  <JPEG file name>, <xmin>, <ymin>, <xmax>, <ymax>
+The bounding box for <JPEG file name> contains two points (xmin, ymin) and
+(xmax, ymax) specifying the top-left and bottom-right corners of a
+bounding box in *relative* image coordinates (y increases downward).
+The user supplies a directory where the XML files reside. The directory
+structure in the directory <dir> is assumed to look like this:
+<dir>/nXXXXXXXX/nXXXXXXXX_YYYY.xml
+Each XML file contains a bounding box annotation. The script:
+ (1) Parses the XML file and extracts the filename, label and bounding box info.
+ (2) The bounding box is specified in the XML files as integer (xmin, ymin) and
+    (xmax, ymax) *relative* to image size displayed to the human annotator. The
+    size of the image displayed to the human annotator is stored in the XML file
+    as integer (height, width).
+    Note that the displayed size will differ from the actual size of the image
+    downloaded from image-net.org. To make the bounding box annotation usable,
+    we convert bounding box to floating point numbers relative to displayed
+    height and width of the image.
+    Note that each XML file might contain N bounding box annotations.
+    Note that the points are all clamped at a range of [0.0, 1.0] because some
+    human annotations extend outside the range of the supplied image.
+    See details here: http://image-net.org/download-bboxes
+ (3) By default, the script outputs all valid bounding boxes. If a
+    [synsets-file] is supplied, only the subset of bounding boxes associated
+    with those synsets are outputted. Importantly, one can supply a list of
+    synsets in the ImageNet Challenge and output the list of bounding boxes
+    associated with the training images of the ILSVRC.
+    We use these bounding boxes to inform the random distortion of images
+    supplied to the network.
+If you run this script successfully, you will see the following output
+to stderr:
+> Finished processing 544546 XML files.
+> Skipped 0 XML files not in ImageNet Challenge.
+> Skipped 0 bounding boxes not in ImageNet Challenge.
+> Wrote 615299 bounding boxes from 544546 annotated images.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import glob
+import os.path
+import sys
+import xml.etree.ElementTree as ET
+try:
+  xrange  # Python 2 built-in; avoids a dependency on the six package.
+except NameError:
+  xrange = range  # pylint: disable=redefined-builtin,invalid-name
+
+
+class BoundingBox(object):
+  pass
+
+
+def GetItem(name, root, index=0):
+  count = 0
+  for item in root.iter(name):
+    if count == index:
+      return item.text
+    count += 1
+  # Failed to find "index" occurrence of item.
+  return -1
+
+
+def GetInt(name, root, index=0):
+  return int(GetItem(name, root, index))
+
+
+def FindNumberBoundingBoxes(root):
+  index = 0
+  while True:
+    if GetInt('xmin', root, index) == -1:
+      break
+    index += 1
+  return index
+
+
+def ProcessXMLAnnotation(xml_file):
+  """Process a single XML file containing a bounding box."""
+  # pylint: disable=broad-except
+  try:
+    tree = ET.parse(xml_file)
+  except Exception:
+    print('Failed to parse: ' + xml_file, file=sys.stderr)
+    return None
+  # pylint: enable=broad-except
+  root = tree.getroot()
+
+  num_boxes = FindNumberBoundingBoxes(root)
+  boxes = []
+
+  for index in xrange(num_boxes):
+    box = BoundingBox()
+    # Grab the 'index' annotation.
+    box.xmin = GetInt('xmin', root, index)
+    box.ymin = GetInt('ymin', root, index)
+    box.xmax = GetInt('xmax', root, index)
+    box.ymax = GetInt('ymax', root, index)
+
+    box.width = GetInt('width', root)
+    box.height = GetInt('height', root)
+    box.filename = GetItem('filename', root) + '.JPEG'
+    box.label = GetItem('name', root)
+
+    xmin = float(box.xmin) / float(box.width)
+    xmax = float(box.xmax) / float(box.width)
+    ymin = float(box.ymin) / float(box.height)
+    ymax = float(box.ymax) / float(box.height)
+
+    # Some images contain bounding box annotations that
+    # extend outside of the supplied image. See, e.g.
+    # n03127925/n03127925_147.xml
+    # Additionally, for some bounding boxes, the min > max
+    # or the box is entirely outside of the image.
+    min_x = min(xmin, xmax)
+    max_x = max(xmin, xmax)
+    box.xmin_scaled = min(max(min_x, 0.0), 1.0)
+    box.xmax_scaled = min(max(max_x, 0.0), 1.0)
+
+    min_y = min(ymin, ymax)
+    max_y = max(ymin, ymax)
+    box.ymin_scaled = min(max(min_y, 0.0), 1.0)
+    box.ymax_scaled = min(max(max_y, 0.0), 1.0)
+
+    boxes.append(box)
+
+  return boxes
+
+if __name__ == '__main__':
+  if len(sys.argv) < 2 or len(sys.argv) > 3:
+    print('Invalid usage\n'
+          'usage: process_bounding_boxes.py <dir> [synsets-file]',
+          file=sys.stderr)
+    sys.exit(-1)
+
+  xml_files = glob.glob(sys.argv[1] + '/*/*.xml')
+  print('Identified %d XML files in %s' % (len(xml_files), sys.argv[1]),
+        file=sys.stderr)
+
+  if len(sys.argv) == 3:
+    with open(sys.argv[2]) as f:
+      labels = set(l.strip() for l in f)
+    print('Identified %d synset IDs in %s' % (len(labels), sys.argv[2]),
+          file=sys.stderr)
+  else:
+    labels = None
+
+  skipped_boxes = 0
+  skipped_files = 0
+  saved_boxes = 0
+  saved_files = 0
+  for file_index, one_file in enumerate(xml_files):
+    # Example: <...>/n06470073/n00141669_6790.xml
+    label = os.path.basename(os.path.dirname(one_file))
+
+    # Determine if the annotation is from an ImageNet Challenge label.
+    if labels is not None and label not in labels:
+      skipped_files += 1
+      continue
+
+    bboxes = ProcessXMLAnnotation(one_file)
+    assert bboxes is not None, 'Failed to parse XML file ' + one_file
+
+    found_box = False
+    for bbox in bboxes:
+      if labels is not None:
+        if bbox.label != label:
+          # Note: There is a slight bug in the bounding box annotation data.
+          # Many of the dog labels have the human label 'Scottish_deerhound'
+          # instead of the synset ID 'n02092002' in the bbox.label field. As a
+          # simple hack to overcome this issue, we only exclude bbox labels
+          # *which are synset ID's* that do not match original synset label for
+          # the XML file.
+          if bbox.label in labels:
+            skipped_boxes += 1
+            continue
+
+      # Guard against improperly specified boxes.
+      if (bbox.xmin_scaled >= bbox.xmax_scaled or
+          bbox.ymin_scaled >= bbox.ymax_scaled):
+        skipped_boxes += 1
+        continue
+
+      # Note bbox.filename occasionally contains '%s' in the name. This is
+      # data set noise that is fixed by just using the basename of the XML file.
+      image_filename = os.path.splitext(os.path.basename(one_file))[0]
+      print('%s.JPEG,%.4f,%.4f,%.4f,%.4f' %
+            (image_filename,
+             bbox.xmin_scaled, bbox.ymin_scaled,
+             bbox.xmax_scaled, bbox.ymax_scaled))
+
+      saved_boxes += 1
+      found_box = True
+    if found_box:
+      saved_files += 1
+    else:
+      skipped_files += 1
+
+    if not file_index % 5000:
+      print('--> processed %d of %d XML files.' %
+            (file_index + 1, len(xml_files)),
+            file=sys.stderr)
+      print('--> skipped %d boxes and %d XML files.' %
+            (skipped_boxes, skipped_files), file=sys.stderr)
+
+  print('Finished processing %d XML files.' % len(xml_files), file=sys.stderr)
+  print('Skipped %d XML files not in ImageNet Challenge.' % skipped_files,
+        file=sys.stderr)
+  print('Skipped %d bounding boxes not in ImageNet Challenge.' % skipped_boxes,
+        file=sys.stderr)
+  print('Wrote %d bounding boxes from %d annotated images.' %
+        (saved_boxes, saved_files),
+        file=sys.stderr)
+  print('Finished.', file=sys.stderr)
\ No newline at end of file