Unverified commit 6c38ad07, authored by Maria Khrustaleva, committed by GitHub

Manifest (#2763)

* Added support for manifest file
* Added data migration
* Updated tests
* Update CHANGELOG
* Update manifest documentation
* Fix case with 3d data
Co-authored-by: Nikita Manovich <nikita.manovich@intel.com>
Parent e41c3012
......@@ -5,6 +5,7 @@ branch = true
source =
cvat/apps/
utils/cli/
utils/dataset_manifest
omit =
cvat/settings/*
......
......@@ -28,6 +28,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- CVAT-3D: Implemented initial cuboid placement in 3D View and select cuboid in Top, Side and Front views
(<https://github.com/openvinotoolkit/cvat/pull/2891>)
- [Market-1501](https://www.aitribune.com/dataset/2018051063) format support (<https://github.com/openvinotoolkit/cvat/pull/2869>)
- Ability to upload a manifest for a dataset with images (<https://github.com/openvinotoolkit/cvat/pull/2763>)
- Annotations filters UI using react-awesome-query-builder (<https://github.com/openvinotoolkit/cvat/issues/1418>)
### Changed
......@@ -42,6 +43,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Image visualizations settings on canvas for faster access (<https://github.com/openvinotoolkit/cvat/pull/2872>)
- Better scale management of left panel when screen is too small (<https://github.com/openvinotoolkit/cvat/pull/2880>)
- Improved error messages for annotation import (<https://github.com/openvinotoolkit/cvat/pull/2935>)
- Using manifest support instead of video meta information and dummy chunks (<https://github.com/openvinotoolkit/cvat/pull/2763>)
### Deprecated
......
......@@ -2,40 +2,25 @@
## Description
Data on the fly processing is a way of working with data, the main idea of which is as follows: when creating a task,
the minimum necessary meta information is collected. This meta information later makes it possible to create the
necessary chunks when a request is received from a client.

Generated chunks are stored in a cache of limited size with a policy of evicting less popular items.

When a request is received from a client, the required chunk is looked up in the cache. If the chunk does not exist
yet, it is created using the prepared meta information and then put into the cache.
This method of working with data allows you to:
- reduce the task creation time.
- store data in a cache of limited size with a policy of evicting less popular items.
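As an illustration, here is a minimal sketch of such a cache built on the `diskcache` library the server already uses; the directory, size limit, and key scheme below are assumptions, not CVAT's actual configuration:

```python
from diskcache import Cache

# a bounded on-disk cache that evicts less popular items first
cache = Cache(
    '/tmp/chunk_cache',                       # hypothetical cache directory
    size_limit=2 * 1024 ** 3,                 # cap the cache at 2 GiB
    eviction_policy='least-frequently-used',  # drop rarely requested chunks first
)

def get_chunk(key, create_chunk):
    """Return the chunk for `key`, creating and caching it on a miss."""
    chunk = cache.get(key)
    if chunk is None:
        chunk = create_chunk()   # build the chunk from the prepared meta information
        cache.set(key, chunk)
    return chunk
```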
## Prepare meta information
Unfortunately, this method will not work for all videos with a valid manifest file. If there are not enough keyframes
in the video for smooth video decoding, the task will be created in another way. Namely, all chunks will be prepared
during task creation, which may take some time.
Different meta information is collected for different types of uploaded data.
#### Uploading a manifest with data
### Video
For video, this is a valid mapping of key frame numbers to their timestamps. This information is saved to `meta_info.txt`.
Unfortunately, this method will not work for all videos with valid meta information.
If there are not enough keyframes in the video for smooth video decoding, the task will be created in the old way.
#### Uploading meta information along with data
When creating a task, you can upload a file with meta information along with the video,
which will further reduce the time for creating a task.
You can see how to prepare meta information [here](/utils/prepare_meta_information/README.md).
It is worth noting that the generated file also contains information about the number of frames in the video at the end.
### Images
A mapping of each chunk number to the paths of the images that make up the chunk
is saved at task creation time in files named `dummy_{chunk_number}.txt`.
When creating a task, you can upload a `manifest.jsonl` file along with the video or dataset with images.
You can see how to prepare it [here](/utils/dataset_manifest/README.md).
......@@ -15,7 +15,6 @@
- [How to create a task with multiple jobs](#how-to-create-a-task-with-multiple-jobs)
- [How to transfer CVAT to another machine](#how-to-transfer-cvat-to-another-machine)
## How to update CVAT
Before upgrading, please follow the [backup guide](backup_guide.md) and backup all CVAT volumes.
......@@ -151,4 +150,5 @@ Set the segment size when you create a new task, this option is available in the
[Advanced configuration](user_guide.md#advanced-configuration) section.
## How to transfer CVAT to another machine
Follow the [backup/restore guide](backup_guide.md#how-to-backup-all-cvat-data).
......@@ -153,8 +153,8 @@ Go to the [Django administration panel](http://localhost:8080/admin). There you
**Select files**. Press tab `My computer` to choose some files for annotation from your PC.
If you select tab `Connected file share` you can choose files for annotation from your network.
If you select `Remote source`, you'll see a field where you can enter a list of URLs (one URL per line).
If you upload video data and select the `Use cache` option, you can attach a file with meta information along with the video file.
You can find how to prepare it [here](/utils/prepare_meta_information/README.md).
If you upload a video or a dataset with images and select the `Use cache` option, you can attach a `manifest.jsonl` file.
You can find how to prepare it [here](/utils/dataset_manifest/README.md).
![](static/documentation/images/image127.jpg)
......@@ -1157,8 +1157,6 @@ Intelligent scissors is an CV method of creating a polygon by placing points wit
The distance between the adjacent points is limited by the threshold of action,
displayed as a red square which is tied to the cursor.
- First, select the label and then click on the `intelligent scissors` button.
![](static/documentation/images/image199.jpg)
......
# Copyright (C) 2020 Intel Corporation
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
......@@ -9,9 +9,9 @@ from diskcache import Cache
from django.conf import settings
from cvat.apps.engine.media_extractors import (Mpeg4ChunkWriter,
Mpeg4CompressedChunkWriter, ZipChunkWriter, ZipCompressedChunkWriter)
Mpeg4CompressedChunkWriter, ZipChunkWriter, ZipCompressedChunkWriter,
ImageDatasetManifestReader, VideoDatasetManifestReader)
from cvat.apps.engine.models import DataChoice, StorageChoice
from cvat.apps.engine.prepare import PrepareInfo
from cvat.apps.engine.models import DimensionType
class CacheInteraction:
......@@ -51,17 +51,24 @@ class CacheInteraction:
StorageChoice.LOCAL: db_data.get_upload_dirname(),
StorageChoice.SHARE: settings.SHARE_ROOT
}[db_data.storage]
if os.path.exists(db_data.get_meta_path()):
if hasattr(db_data, 'video'):
source_path = os.path.join(upload_dir, db_data.video.path)
meta = PrepareInfo(source_path=source_path, meta_path=db_data.get_meta_path())
for frame in meta.decode_needed_frames(chunk_number, db_data):
images.append(frame)
writer.save_as_chunk([(image, source_path, None) for image in images], buff)
reader = VideoDatasetManifestReader(manifest_path=db_data.get_manifest_path(),
source_path=source_path, chunk_number=chunk_number,
chunk_size=db_data.chunk_size, start=db_data.start_frame,
stop=db_data.stop_frame, step=db_data.get_frame_step())
for frame in reader:
images.append((frame, source_path, None))
else:
with open(db_data.get_dummy_chunk_path(chunk_number), 'r') as dummy_file:
images = [os.path.join(upload_dir, line.strip()) for line in dummy_file]
writer.save_as_chunk([(image, image, None) for image in images], buff)
reader = ImageDatasetManifestReader(manifest_path=db_data.get_manifest_path(),
chunk_number=chunk_number, chunk_size=db_data.chunk_size,
start=db_data.start_frame, stop=db_data.stop_frame,
step=db_data.get_frame_step())
for item in reader:
source_path = os.path.join(upload_dir, f"{item['name']}{item['extension']}")
images.append((source_path, source_path, None))
writer.save_as_chunk(images, buff)
buff.seek(0)
return buff, mime_type
......
......@@ -11,6 +11,7 @@ import itertools
import struct
import re
from abc import ABC, abstractmethod
from contextlib import closing
import av
import numpy as np
......@@ -25,6 +26,7 @@ from cvat.apps.engine.models import DimensionType
ImageFile.LOAD_TRUNCATED_IMAGES = True
from cvat.apps.engine.mime_types import mimetypes
from utils.dataset_manifest import VideoManifestManager, ImageManifestManager
def get_mime(name):
for type_name, type_def in MEDIA_TYPES.items():
......@@ -127,6 +129,10 @@ class ImageListReader(IMediaReader):
img = Image.open(self._source_path[i])
return img.width, img.height
@property
def absolute_source_paths(self):
return [self.get_path(idx) for idx, _ in enumerate(self._source_path)]
class DirectoryReader(ImageListReader):
def __init__(self, source_path, step=1, start=0, stop=None):
image_paths = []
......@@ -317,6 +323,103 @@ class VideoReader(IMediaReader):
image = (next(iter(self)))[0]
return image.width, image.height
class FragmentMediaReader:
def __init__(self, chunk_number, chunk_size, start, stop, step=1):
self._start = start
self._stop = stop + 1 # up to the last inclusive
self._step = step
self._chunk_number = chunk_number
self._chunk_size = chunk_size
self._start_chunk_frame_number = \
self._start + self._chunk_number * self._chunk_size * self._step
self._end_chunk_frame_number = min(self._start_chunk_frame_number \
+ (self._chunk_size - 1) * self._step + 1, self._stop)
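# Example with illustrative numbers: for start=0, stop=20, step=2 and chunk_size=4,
# chunk number 2 starts at absolute frame 0 + 2 * 4 * 2 = 16 and covers frames [16, 18, 20].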
self._frame_range = self._get_frame_range()
@property
def frame_range(self):
return self._frame_range
def _get_frame_range(self):
frame_range = []
for idx in range(self._start, self._stop, self._step):
if idx < self._start_chunk_frame_number:
continue
elif idx < self._end_chunk_frame_number and \
not ((idx - self._start_chunk_frame_number) % self._step):
frame_range.append(idx)
elif (idx - self._start_chunk_frame_number) % self._step:
continue
else:
break
return frame_range
class ImageDatasetManifestReader(FragmentMediaReader):
def __init__(self, manifest_path, **kwargs):
super().__init__(**kwargs)
self._manifest = ImageManifestManager(manifest_path)
self._manifest.init_index()
def __iter__(self):
for idx in self._frame_range:
yield self._manifest[idx]
class VideoDatasetManifestReader(FragmentMediaReader):
def __init__(self, manifest_path, **kwargs):
self.source_path = kwargs.pop('source_path')
super().__init__(**kwargs)
self._manifest = VideoManifestManager(manifest_path)
self._manifest.init_index()
def _get_nearest_left_key_frame(self):
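# The manifest lists only key frames: binary-search it for the last key frame
# whose frame number does not exceed the first frame of the requested chunk,
# so decoding can start from the nearest preceding seek point.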
if self._start_chunk_frame_number >= \
self._manifest[len(self._manifest) - 1].get('number'):
left_border = len(self._manifest) - 1
else:
left_border = 0
delta = len(self._manifest)
while delta:
step = delta // 2
cur_position = left_border + step
if self._manifest[cur_position].get('number') < self._start_chunk_frame_number:
cur_position += 1
left_border = cur_position
delta -= step + 1
else:
delta = step
if self._manifest[cur_position].get('number') > self._start_chunk_frame_number:
left_border -= 1
frame_number = self._manifest[left_border].get('number')
timestamp = self._manifest[left_border].get('pts')
return frame_number, timestamp
def __iter__(self):
start_decode_frame_number, start_decode_timestamp = self._get_nearest_left_key_frame()
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = next(stream for stream in container.streams if stream.type == 'video')
video_stream.thread_type = 'AUTO'
container.seek(offset=start_decode_timestamp, stream=video_stream)
frame_number = start_decode_frame_number - 1
for packet in container.demux(video_stream):
for frame in packet.decode():
frame_number += 1
if frame_number in self._frame_range:
if video_stream.metadata.get('rotate'):
frame = av.VideoFrame().from_ndarray(
rotate_image(
frame.to_ndarray(format='bgr24'),
360 - int(container.streams.video[0].metadata.get('rotate'))
),
format ='bgr24'
)
yield frame
elif frame_number < self._frame_range[-1]:
continue
else:
return
class IChunkWriter(ABC):
def __init__(self, quality, dimension=DimensionType.DIM_2D):
self._image_quality = quality
......
# Generated by Django 3.1.1 on 2021-02-20 08:36
import glob
import os
from re import search
from django.conf import settings
from django.db import migrations
from cvat.apps.engine.models import (DimensionType, StorageChoice,
StorageMethodChoice)
from utils.dataset_manifest import ImageManifestManager, VideoManifestManager
def migrate_data(apps, schema_editor):
Data = apps.get_model("engine", "Data")
query_set = Data.objects.filter(storage_method=StorageMethodChoice.CACHE)
for db_data in query_set:
try:
upload_dir = '{}/{}/raw'.format(settings.MEDIA_DATA_ROOT, db_data.id)
if os.path.exists(os.path.join(upload_dir, 'meta_info.txt')):
os.remove(os.path.join(upload_dir, 'meta_info.txt'))
else:
for path in glob.glob(f'{upload_dir}/dummy_*.txt'):
os.remove(path)
# necessary for the case of a long data migration: a manifest may already have been created by a previous run
if os.path.exists(os.path.join(upload_dir, 'manifest.jsonl')):
continue
data_dir = upload_dir if db_data.storage == StorageChoice.LOCAL else settings.SHARE_ROOT
if hasattr(db_data, 'video'):
media_file = os.path.join(data_dir, db_data.video.path)
manifest = VideoManifestManager(manifest_path=upload_dir)
meta_info = manifest.prepare_meta(media_file=media_file)
manifest.create(meta_info)
manifest.init_index()
else:
manifest = ImageManifestManager(manifest_path=upload_dir)
sources = []
if db_data.storage == StorageChoice.LOCAL:
for (root, _, files) in os.walk(data_dir):
sources.extend([os.path.join(root, f) for f in files])
sources.sort()
# when a share is used, we cannot explicitly restore the entire data structure
else:
sources = [os.path.join(data_dir, db_image.path) for db_image in db_data.images.all().order_by('frame')]
if any(task.dimension == DimensionType.DIM_3D for task in db_data.tasks.all()):
content = []
for source in sources:
name, ext = os.path.splitext(os.path.relpath(source, upload_dir))
content.append({
'name': name,
'extension': ext
})
else:
meta_info = manifest.prepare_meta(sources=sources, data_dir=data_dir)
content = meta_info.content
if db_data.storage == StorageChoice.SHARE:
def _get_frame_step(str_):
# the frame filter looks like 'step=N'
match = search(r"step\s*=\s*([1-9]\d*)", str_)
return int(match.group(1)) if match else 1
step = _get_frame_step(db_data.frame_filter)
start = db_data.start_frame
stop = db_data.stop_frame + 1
images_range = range(start, stop, step)
result_content = []
for i in range(stop):
item = content.pop(0) if i in images_range else dict()
result_content.append(item)
content = result_content
manifest.create(content)
manifest.init_index()
except Exception as ex:
print(str(ex))
class Migration(migrations.Migration):
dependencies = [
('engine', '0037_task_subset'),
]
operations = [
migrations.RunPython(migrate_data)
]
......@@ -138,11 +138,10 @@ class Data(models.Model):
def get_preview_path(self):
return os.path.join(self.get_data_dirname(), 'preview.jpeg')
def get_meta_path(self):
return os.path.join(self.get_upload_dirname(), 'meta_info.txt')
def get_dummy_chunk_path(self, chunk_number):
return os.path.join(self.get_upload_dirname(), 'dummy_{}.txt'.format(chunk_number))
def get_manifest_path(self):
return os.path.join(self.get_upload_dirname(), 'manifest.jsonl')
def get_index_path(self):
return os.path.join(self.get_upload_dirname(), 'index.json')
class Video(models.Model):
data = models.OneToOneField(Data, on_delete=models.CASCADE, related_name="video", null=True)
......
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT
import av
from collections import OrderedDict
import hashlib
import os
from cvat.apps.engine.utils import rotate_image
class WorkWithVideo:
def __init__(self, **kwargs):
if not kwargs.get('source_path'):
raise Exception('No source path')
self.source_path = kwargs.get('source_path')
@staticmethod
def _open_video_container(source_path, mode, options=None):
return av.open(source_path, mode=mode, options=options)
@staticmethod
def _close_video_container(container):
container.close()
@staticmethod
def _get_video_stream(container):
video_stream = next(stream for stream in container.streams if stream.type == 'video')
video_stream.thread_type = 'AUTO'
return video_stream
@staticmethod
def _get_frame_size(container):
video_stream = WorkWithVideo._get_video_stream(container)
for packet in container.demux(video_stream):
for frame in packet.decode():
if video_stream.metadata.get('rotate'):
frame = av.VideoFrame().from_ndarray(
rotate_image(
frame.to_ndarray(format='bgr24'),
360 - int(container.streams.video[0].metadata.get('rotate')),
),
format ='bgr24',
)
return frame.width, frame.height
class AnalyzeVideo(WorkWithVideo):
def check_type_first_frame(self):
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
for packet in container.demux(video_stream):
for frame in packet.decode():
self._close_video_container(container)
assert frame.pict_type.name == 'I', 'First frame is not key frame'
return
def check_video_timestamps_sequences(self):
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
frame_pts = -1
frame_dts = -1
for packet in container.demux(video_stream):
for frame in packet.decode():
if None not in [frame.pts, frame_pts] and frame.pts <= frame_pts:
self._close_video_container(container)
raise Exception('Invalid pts sequences')
if None not in [frame.dts, frame_dts] and frame.dts <= frame_dts:
self._close_video_container(container)
raise Exception('Invalid dts sequences')
frame_pts, frame_dts = frame.pts, frame.dts
self._close_video_container(container)
def md5_hash(frame):
return hashlib.md5(frame.to_image().tobytes()).hexdigest()
class PrepareInfo(WorkWithVideo):
def __init__(self, **kwargs):
super().__init__(**kwargs)
if not kwargs.get('meta_path'):
raise Exception('No meta path')
self.meta_path = kwargs.get('meta_path')
self.key_frames = {}
self.frames = 0
container = self._open_video_container(self.source_path, 'r')
self.width, self.height = self._get_frame_size(container)
self._close_video_container(container)
def get_task_size(self):
return self.frames
@property
def frame_sizes(self):
return (self.width, self.height)
def check_key_frame(self, container, video_stream, key_frame):
for packet in container.demux(video_stream):
for frame in packet.decode():
if md5_hash(frame) != key_frame[1]['md5'] or frame.pts != key_frame[1]['pts']:
self.key_frames.pop(key_frame[0])
return
def check_seek_key_frames(self):
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
key_frames_copy = self.key_frames.copy()
for key_frame in key_frames_copy.items():
container.seek(offset=key_frame[1]['pts'], stream=video_stream)
self.check_key_frame(container, video_stream, key_frame)
def check_frames_ratio(self, chunk_size):
return (len(self.key_frames) and (self.frames // len(self.key_frames)) <= 2 * chunk_size)
def save_key_frames(self):
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
frame_number = 0
for packet in container.demux(video_stream):
for frame in packet.decode():
if frame.key_frame:
self.key_frames[frame_number] = {
'pts': frame.pts,
'md5': md5_hash(frame),
}
frame_number += 1
self.frames = frame_number
self._close_video_container(container)
def save_meta_info(self):
with open(self.meta_path, 'w') as meta_file:
for index, frame in self.key_frames.items():
meta_file.write('{} {}\n'.format(index, frame['pts']))
def get_nearest_left_key_frame(self, start_chunk_frame_number):
start_decode_frame_number = 0
start_decode_timestamp = 0
with open(self.meta_path, 'r') as file:
for line in file:
frame_number, timestamp = line.strip().split(' ')
if int(frame_number) <= start_chunk_frame_number:
start_decode_frame_number = frame_number
start_decode_timestamp = timestamp
else:
break
return int(start_decode_frame_number), int(start_decode_timestamp)
def decode_needed_frames(self, chunk_number, db_data):
step = db_data.get_frame_step()
start_chunk_frame_number = db_data.start_frame + chunk_number * db_data.chunk_size * step
end_chunk_frame_number = min(start_chunk_frame_number + (db_data.chunk_size - 1) * step + 1, db_data.stop_frame + 1)
start_decode_frame_number, start_decode_timestamp = self.get_nearest_left_key_frame(start_chunk_frame_number)
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
container.seek(offset=start_decode_timestamp, stream=video_stream)
frame_number = start_decode_frame_number - 1
for packet in container.demux(video_stream):
for frame in packet.decode():
frame_number += 1
if frame_number < start_chunk_frame_number:
continue
elif frame_number < end_chunk_frame_number and not ((frame_number - start_chunk_frame_number) % step):
if video_stream.metadata.get('rotate'):
frame = av.VideoFrame().from_ndarray(
rotate_image(
frame.to_ndarray(format='bgr24'),
360 - int(container.streams.video[0].metadata.get('rotate'))
),
format ='bgr24'
)
yield frame
elif (frame_number - start_chunk_frame_number) % step:
continue
else:
self._close_video_container(container)
return
self._close_video_container(container)
class UploadedMeta(PrepareInfo):
def __init__(self, **kwargs):
super().__init__(**kwargs)
uploaded_meta = kwargs.get('uploaded_meta')
assert uploaded_meta is not None, 'No uploaded meta path'
with open(uploaded_meta, 'r') as meta_file:
lines = meta_file.read().strip().split('\n')
self.frames = int(lines.pop())
key_frames = {int(line.split()[0]): int(line.split()[1]) for line in lines}
self.key_frames = OrderedDict(sorted(key_frames.items(), key=lambda x: x[0]))
@property
def frame_sizes(self):
container = self._open_video_container(self.source_path, 'r')
video_stream = self._get_video_stream(container)
container.seek(offset=next(iter(self.key_frames.values())), stream=video_stream)
for packet in container.demux(video_stream):
for frame in packet.decode():
if video_stream.metadata.get('rotate'):
frame = av.VideoFrame().from_ndarray(
rotate_image(
frame.to_ndarray(format='bgr24'),
360 - int(container.streams.video[0].metadata.get('rotate'))
),
format ='bgr24'
)
self._close_video_container(container)
return (frame.width, frame.height)
def save_meta_info(self):
with open(self.meta_path, 'w') as meta_file:
for index, pts in self.key_frames.items():
meta_file.write('{} {}\n'.format(index, pts))
def check_key_frame(self, container, video_stream, key_frame):
for packet in container.demux(video_stream):
for frame in packet.decode():
assert frame.pts == key_frame[1], "Uploaded meta information does not match the video"
return
def check_seek_key_frames(self):
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
for key_frame in self.key_frames.items():
container.seek(offset=key_frame[1], stream=video_stream)
self.check_key_frame(container, video_stream, key_frame)
self._close_video_container(container)
def check_frames_numbers(self):
container = self._open_video_container(self.source_path, mode='r')
video_stream = self._get_video_stream(container)
# not all videos contain information about the number of frames
if video_stream.frames:
self._close_video_container(container)
assert video_stream.frames == self.frames, "Uploaded meta information does not match the video"
return
self._close_video_container(container)
def prepare_meta(media_file, upload_dir=None, meta_dir=None, chunk_size=None):
paths = {
'source_path': os.path.join(upload_dir, media_file) if upload_dir else media_file,
'meta_path': os.path.join(meta_dir, 'meta_info.txt') if meta_dir else os.path.join(upload_dir, 'meta_info.txt'),
}
analyzer = AnalyzeVideo(source_path=paths.get('source_path'))
analyzer.check_type_first_frame()
analyzer.check_video_timestamps_sequences()
meta_info = PrepareInfo(source_path=paths.get('source_path'),
meta_path=paths.get('meta_path'))
meta_info.save_key_frames()
meta_info.check_seek_key_frames()
meta_info.save_meta_info()
smooth_decoding = meta_info.check_frames_ratio(chunk_size) if chunk_size else None
return (meta_info, smooth_decoding)
def prepare_meta_for_upload(func, *args):
meta_info, smooth_decoding = func(*args)
with open(meta_info.meta_path, 'a') as meta_file:
meta_file.write(str(meta_info.get_task_size()))
return smooth_decoding
# Copyright (C) 2018-2020 Intel Corporation
# Copyright (C) 2018-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import itertools
import os
import sys
from re import findall
import rq
import shutil
from traceback import print_exception
......@@ -17,8 +16,9 @@ import requests
from cvat.apps.engine.media_extractors import get_mime, MEDIA_TYPES, Mpeg4ChunkWriter, ZipChunkWriter, Mpeg4CompressedChunkWriter, ZipCompressedChunkWriter, ValidateDimension
from cvat.apps.engine.models import DataChoice, StorageMethodChoice, StorageChoice, RelatedFile
from cvat.apps.engine.utils import av_scan_paths
from cvat.apps.engine.prepare import prepare_meta
from cvat.apps.engine.models import DimensionType
from utils.dataset_manifest import ImageManifestManager, VideoManifestManager
from utils.dataset_manifest.core import VideoManifestValidator
import django_rq
from django.conf import settings
......@@ -107,7 +107,7 @@ def _save_task_to_db(db_task):
db_task.data.save()
db_task.save()
def _count_files(data, meta_info_file=None):
def _count_files(data, manifest_file=None):
share_root = settings.SHARE_ROOT
server_files = []
......@@ -134,8 +134,8 @@ def _count_files(data, meta_info_file=None):
mime = get_mime(full_path)
if mime in counter:
counter[mime].append(rel_path)
elif findall('meta_info.txt$', rel_path):
meta_info_file.append(rel_path)
elif 'manifest.jsonl' == os.path.basename(rel_path):
manifest_file.append(rel_path)
else:
slogger.glob.warn("Skip '{}' file (its mime type doesn't "
"correspond to a video or an image file)".format(full_path))
......@@ -154,7 +154,7 @@ def _count_files(data, meta_info_file=None):
return counter
def _validate_data(counter, meta_info_file=None):
def _validate_data(counter, manifest_file=None):
unique_entries = 0
multiple_entries = 0
for media_type, media_config in MEDIA_TYPES.items():
......@@ -164,8 +164,8 @@ def _validate_data(counter, meta_info_file=None):
else:
multiple_entries += len(counter[media_type])
if meta_info_file and media_type != 'video':
raise Exception('File with meta information can only be uploaded with video file')
if manifest_file and media_type not in ('video', 'image'):
raise Exception('A manifest file can only be uploaded together with video or images')
if unique_entries == 1 and multiple_entries > 0 or unique_entries > 1:
unique_types = ', '.join([k for k, v in MEDIA_TYPES.items() if v['unique']])
......@@ -221,10 +221,10 @@ def _create_thread(tid, data):
if data['remote_files']:
data['remote_files'] = _download_data(data['remote_files'], upload_dir)
meta_info_file = []
media = _count_files(data, meta_info_file)
media, task_mode = _validate_data(media, meta_info_file)
if meta_info_file:
manifest_file = []
media = _count_files(data, manifest_file)
media, task_mode = _validate_data(media, manifest_file)
if manifest_file:
assert settings.USE_CACHE and db_data.storage_method == StorageMethodChoice.CACHE, \
"File with meta information can be uploaded if 'Use cache' option is also selected"
......@@ -248,8 +248,10 @@ def _create_thread(tid, data):
if extractor is not None:
raise Exception('Combined data types are not supported')
source_paths=[os.path.join(upload_dir, f) for f in media_files]
if media_type in ('archive', 'zip') and db_data.storage == StorageChoice.SHARE:
if media_type in {'archive', 'zip'} and db_data.storage == StorageChoice.SHARE:
source_paths.append(db_data.get_upload_dirname())
upload_dir = db_data.get_upload_dirname()
db_data.storage = StorageChoice.LOCAL
extractor = MEDIA_TYPES[media_type]['extractor'](
source_path=source_paths,
step=db_data.get_frame_step(),
......@@ -322,68 +324,108 @@ def _create_thread(tid, data):
video_path = ""
video_size = (0, 0)
def _update_status(msg):
job.meta['status'] = msg
job.save_meta()
if settings.USE_CACHE and db_data.storage_method == StorageMethodChoice.CACHE:
for media_type, media_files in media.items():
if not media_files:
continue
# relocate the manifest file (e.g. it was uploaded as 'subdir/manifest.jsonl')
if manifest_file and not os.path.exists(db_data.get_manifest_path()):
shutil.copyfile(os.path.join(upload_dir, manifest_file[0]),
db_data.get_manifest_path())
if upload_dir != settings.SHARE_ROOT:
os.remove(os.path.join(upload_dir, manifest_file[0]))
if task_mode == MEDIA_TYPES['video']['mode']:
try:
if meta_info_file:
manifest_is_prepared = False
if manifest_file:
try:
from cvat.apps.engine.prepare import UploadedMeta
meta_info = UploadedMeta(source_path=os.path.join(upload_dir, media_files[0]),
meta_path=db_data.get_meta_path(),
uploaded_meta=os.path.join(upload_dir, meta_info_file[0]))
meta_info.check_seek_key_frames()
meta_info.check_frames_numbers()
meta_info.save_meta_info()
assert len(meta_info.key_frames) > 0, 'No key frames.'
manifest = VideoManifestValidator(source_path=os.path.join(upload_dir, media_files[0]),
manifest_path=db_data.get_manifest_path())
manifest.init_index()
manifest.validate_seek_key_frames()
manifest.validate_frame_numbers()
assert len(manifest) > 0, 'No key frames.'
all_frames = manifest['properties']['length']
video_size = manifest['properties']['resolution']
manifest_is_prepared = True
except Exception as ex:
base_msg = str(ex) if isinstance(ex, AssertionError) else \
'Invalid meta information was uploaded.'
job.meta['status'] = '{} Start prepare valid meta information.'.format(base_msg)
job.save_meta()
meta_info, smooth_decoding = prepare_meta(
media_file=media_files[0],
upload_dir=upload_dir,
meta_dir=os.path.dirname(db_data.get_meta_path()),
chunk_size=db_data.chunk_size
)
assert smooth_decoding == True, 'Too few keyframes for smooth video decoding.'
else:
meta_info, smooth_decoding = prepare_meta(
if os.path.exists(db_data.get_index_path()):
os.remove(db_data.get_index_path())
if isinstance(ex, AssertionError):
base_msg = str(ex)
else:
base_msg = 'Invalid manifest file was uploaded.'
slogger.glob.warning(str(ex))
_update_status('{} Starting to prepare a valid manifest file.'.format(base_msg))
if not manifest_is_prepared:
_update_status('Starting to prepare a manifest file')
manifest = VideoManifestManager(db_data.get_manifest_path())
meta_info = manifest.prepare_meta(
media_file=media_files[0],
upload_dir=upload_dir,
meta_dir=os.path.dirname(db_data.get_meta_path()),
chunk_size=db_data.chunk_size
)
assert smooth_decoding == True, 'Too few keyframes for smooth video decoding.'
manifest.create(meta_info)
manifest.init_index()
_update_status('A manifest has been created')
all_frames = meta_info.get_task_size()
video_size = meta_info.frame_sizes
all_frames = meta_info.get_size()
video_size = meta_info.frame_sizes
manifest_is_prepared = True
db_data.size = len(range(db_data.start_frame, min(data['stop_frame'] + 1 if data['stop_frame'] else all_frames, all_frames), db_data.get_frame_step()))
db_data.size = len(range(db_data.start_frame, min(data['stop_frame'] + 1 \
if data['stop_frame'] else all_frames, all_frames), db_data.get_frame_step()))
video_path = os.path.join(upload_dir, media_files[0])
except Exception as ex:
db_data.storage_method = StorageMethodChoice.FILE_SYSTEM
if os.path.exists(db_data.get_meta_path()):
os.remove(db_data.get_meta_path())
base_msg = str(ex) if isinstance(ex, AssertionError) else "Uploaded video does not support a quick way of task creation."
job.meta['status'] = "{} The task will be created using the old method".format(base_msg)
job.save_meta()
else:#images,archive
if os.path.exists(db_data.get_manifest_path()):
os.remove(db_data.get_manifest_path())
if os.path.exists(db_data.get_index_path()):
os.remove(db_data.get_index_path())
base_msg = str(ex) if isinstance(ex, AssertionError) \
else "Uploaded video does not support a quick way of task creation."
_update_status("{} The task will be created using the old method".format(base_msg))
else: # images, archive, pdf
db_data.size = len(extractor)
manifest = ImageManifestManager(db_data.get_manifest_path())
if not manifest_file:
if db_task.dimension == DimensionType.DIM_2D:
meta_info = manifest.prepare_meta(
sources=extractor.absolute_source_paths,
data_dir=upload_dir
)
content = meta_info.content
else:
content = []
for source in extractor.absolute_source_paths:
name, ext = os.path.splitext(os.path.relpath(source, upload_dir))
content.append({
'name': name,
'extension': ext
})
manifest.create(content)
manifest.init_index()
counter = itertools.count()
for chunk_number, chunk_frames in itertools.groupby(extractor.frame_range, lambda x: next(counter) // db_data.chunk_size):
for _, chunk_frames in itertools.groupby(extractor.frame_range, lambda x: next(counter) // db_data.chunk_size):
chunk_paths = [(extractor.get_path(i), i) for i in chunk_frames]
img_sizes = []
with open(db_data.get_dummy_chunk_path(chunk_number), 'w') as dummy_chunk:
for path, frame_id in chunk_paths:
dummy_chunk.write(os.path.relpath(path, upload_dir) + '\n')
img_sizes.append(extractor.get_image_size(frame_id))
for _, frame_id in chunk_paths:
properties = manifest[frame_id]
if db_task.dimension == DimensionType.DIM_2D:
resolution = (properties['width'], properties['height'])
else:
resolution = extractor.get_image_size(frame_id)
img_sizes.append(resolution)
db_images.extend([
models.Image(data=db_data,
......@@ -453,6 +495,10 @@ def _create_thread(tid, data):
if db_data.stop_frame == 0:
db_data.stop_frame = db_data.start_frame + (db_data.size - 1) * db_data.get_frame_step()
else:
# validate stop_frame
db_data.stop_frame = min(db_data.stop_frame, \
db_data.start_frame + (db_data.size - 1) * db_data.get_frame_step())
preview = extractor.get_preview()
preview.save(db_data.get_preview_path())
......
......@@ -30,9 +30,9 @@ from rest_framework.test import APIClient, APITestCase
from cvat.apps.engine.models import (AttributeSpec, AttributeType, Data, Job, Project,
Segment, StatusChoice, Task, Label, StorageMethodChoice, StorageChoice)
from cvat.apps.engine.prepare import prepare_meta, prepare_meta_for_upload
from cvat.apps.engine.media_extractors import ValidateDimension
from cvat.apps.engine.models import DimensionType
from utils.dataset_manifest import ImageManifestManager, VideoManifestManager
def create_db_users(cls):
(group_admin, _) = Group.objects.get_or_create(name="admin")
......@@ -1971,6 +1971,26 @@ def generate_pdf_file(filename, page_count=1):
file_buf.seek(0)
return image_sizes, file_buf
def generate_manifest_file(data_type, manifest_path, sources):
kwargs = {
'images': {
'sources': sources,
'is_sorted': False,
},
'video': {
'media_file': sources[0],
'upload_dir': os.path.dirname(sources[0]),
'force': True
}
}
if data_type == 'video':
manifest = VideoManifestManager(manifest_path)
else:
manifest = ImageManifestManager(manifest_path)
prepared_meta = manifest.prepare_meta(**kwargs[data_type])
manifest.create(prepared_meta)
class TaskDataAPITestCase(APITestCase):
_image_sizes = {}
......@@ -2093,6 +2113,12 @@ class TaskDataAPITestCase(APITestCase):
shutil.rmtree(root_path)
cls._image_sizes[filename] = image_sizes
generate_manifest_file(data_type='video', manifest_path=os.path.join(settings.SHARE_ROOT, 'videos', 'manifest.jsonl'),
sources=[os.path.join(settings.SHARE_ROOT, 'videos', 'test_video_1.mp4')])
generate_manifest_file(data_type='images', manifest_path=os.path.join(settings.SHARE_ROOT, 'manifest.jsonl'),
sources=[os.path.join(settings.SHARE_ROOT, f'test_{i}.jpg') for i in range(1,4)])
@classmethod
def tearDownClass(cls):
super().tearDownClass()
......@@ -2114,7 +2140,10 @@ class TaskDataAPITestCase(APITestCase):
path = os.path.join(settings.SHARE_ROOT, "videos", "test_video_1.mp4")
os.remove(path)
path = os.path.join(settings.SHARE_ROOT, "videos", "meta_info.txt")
path = os.path.join(settings.SHARE_ROOT, "videos", "manifest.jsonl")
os.remove(path)
path = os.path.join(settings.SHARE_ROOT, "manifest.jsonl")
os.remove(path)
def _run_api_v1_tasks_id_data_post(self, tid, user, data):
......@@ -2257,7 +2286,7 @@ class TaskDataAPITestCase(APITestCase):
self.assertEqual(len(images), min(task["data_chunk_size"], len(image_sizes)))
if task["data_original_chunk_type"] == self.ChunkType.IMAGESET:
server_files = [img for key, img in data.items() if key.startswith("server_files")]
server_files = [img for key, img in data.items() if key.startswith("server_files") and not img.endswith("manifest.jsonl")]
client_files = [img for key, img in data.items() if key.startswith("client_files")]
if server_files:
......@@ -2446,7 +2475,7 @@ class TaskDataAPITestCase(APITestCase):
image_sizes = self._image_sizes[task_data["server_files[0]"]]
self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET, image_sizes,
expected_uploaded_data_location=StorageChoice.SHARE)
expected_uploaded_data_location=StorageChoice.LOCAL)
task_spec.update([('name', 'my archive task #12')])
task_data.update([('copy_data', True)])
......@@ -2546,7 +2575,7 @@ class TaskDataAPITestCase(APITestCase):
image_sizes = self._image_sizes[task_data["server_files[0]"]]
self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET,
self.ChunkType.IMAGESET, image_sizes, StorageMethodChoice.CACHE, StorageChoice.SHARE)
self.ChunkType.IMAGESET, image_sizes, StorageMethodChoice.CACHE, StorageChoice.LOCAL)
task_spec.update([('name', 'my cached zip archive task #19')])
task_data.update([('copy_data', True)])
......@@ -2595,11 +2624,6 @@ class TaskDataAPITestCase(APITestCase):
self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data,
self.ChunkType.IMAGESET, self.ChunkType.IMAGESET, image_sizes)
prepare_meta_for_upload(
prepare_meta,
os.path.join(settings.SHARE_ROOT, "videos", "test_video_1.mp4"),
os.path.join(settings.SHARE_ROOT, "videos")
)
task_spec = {
"name": "my video with meta info task without copying #22",
"overlap": 0,
......@@ -2611,7 +2635,7 @@ class TaskDataAPITestCase(APITestCase):
}
task_data = {
"server_files[0]": os.path.join("videos", "test_video_1.mp4"),
"server_files[1]": os.path.join("videos", "meta_info.txt"),
"server_files[1]": os.path.join("videos", "manifest.jsonl"),
"image_quality": 70,
"use_cache": True
}
......@@ -2723,6 +2747,38 @@ class TaskDataAPITestCase(APITestCase):
self.ChunkType.IMAGESET,
image_sizes, dimension=DimensionType.DIM_3D)
task_spec = {
"name": "my images+manifest without copying #26",
"overlap": 0,
"segment_size": 0,
"labels": [
{"name": "car"},
{"name": "person"},
]
}
task_data = {
"server_files[0]": "test_1.jpg",
"server_files[1]": "test_2.jpg",
"server_files[2]": "test_3.jpg",
"server_files[3]": "manifest.jsonl",
"image_quality": 70,
"use_cache": True
}
image_sizes = [
self._image_sizes[task_data["server_files[0]"]],
self._image_sizes[task_data["server_files[1]"]],
self._image_sizes[task_data["server_files[2]"]],
]
self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET,
image_sizes, StorageMethodChoice.CACHE, StorageChoice.SHARE)
task_spec.update([('name', 'my images+manifest #27')])
task_data.update([('copy_data', True)])
self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET,
image_sizes, StorageMethodChoice.CACHE, StorageChoice.LOCAL)
def test_api_v1_tasks_id_data_admin(self):
self._test_api_v1_tasks_id_data(self.admin)
......
# Copyright (C) 2020 Intel Corporation
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import ast
import cv2 as cv
from collections import namedtuple
import hashlib
import importlib
import sys
import traceback
import subprocess
import os
from av import VideoFrame
from django.core.exceptions import ValidationError
......@@ -51,6 +53,7 @@ class InterpreterError(Exception):
def execute_python_code(source_code, global_vars=None, local_vars=None):
try:
# pylint: disable=exec-used
exec(source_code, global_vars, local_vars)
except SyntaxError as err:
error_class = err.__class__.__name__
......@@ -72,7 +75,7 @@ def av_scan_paths(*paths):
if 'yes' == os.environ.get('CLAM_AV'):
command = ['clamscan', '--no-summary', '-i', '-o']
command.extend(paths)
res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE) # nosec
if res.returncode:
raise ValidationError(res.stdout)
......@@ -88,3 +91,8 @@ def rotate_image(image, angle):
matrix[1, 2] += bound_h/2 - image_center[1]
matrix = cv.warpAffine(image, matrix, (bound_w, bound_h))
return matrix
def md5_hash(frame):
if isinstance(frame, VideoFrame):
frame = frame.to_image()
return hashlib.md5(frame.tobytes()).hexdigest() # nosec
\ No newline at end of file
## Simple command line tool to prepare a dataset manifest file
### Steps before use
When used separately from the Computer Vision Annotation Tool (CVAT), the required dependencies must be installed.
#### Ubuntu 20.04
Install dependencies:
```bash
# General
sudo apt-get update && sudo apt-get --no-install-recommends install -y \
python3-dev python3-pip python3-venv pkg-config
```
```bash
# Library components
sudo apt-get install --no-install-recommends -y \
libavformat-dev libavcodec-dev libavdevice-dev \
libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
```
Create an environment and install the necessary python modules:
```bash
python3 -m venv .env
. .env/bin/activate
pip install -U pip
pip install -r requirements.txt
```
### Usage
```bash
usage: python create.py [-h] [--force] [--output-dir .] source
positional arguments:
source Source paths
optional arguments:
-h, --help show this help message and exit
--force Use this flag to prepare the manifest file for video data if by default the video does not meet the requirements
and a manifest file is not prepared
--output-dir OUTPUT_DIR
Directory where the manifest file will be saved
```
### Alternative way to use with openvino/cvat_server
```bash
docker run -it --entrypoint python3 -v /path/to/host/data/:/path/inside/container/:rw openvino/cvat_server
utils/dataset_manifest/create.py --output-dir /path/to/manifest/directory/ /path/to/data/
```
### Examples
Create a dataset manifest in the current directory for a video that contains enough key frames:
```bash
python create.py ~/Documents/video.mp4
```
Create a dataset manifest for a video that does not contain enough key frames:
```bash
python create.py --force --output-dir ~/Documents ~/Documents/video.mp4
```
Create a dataset manifest with images:
```bash
python create.py --output-dir ~/Documents ~/Documents/images/
```
Create a dataset manifest from a pattern (`*`, `?`, `[]` may be used):
```bash
python create.py --output-dir ~/Documents "/home/${USER}/Documents/**/image*.jpeg"
```
Create a dataset manifest with `openvino/cvat_server`:
```bash
docker run -it --entrypoint python3 -v ~/Documents/data/:${HOME}/manifest/:rw openvino/cvat_server
utils/dataset_manifest/create.py --output-dir ~/manifest/ ~/manifest/images/
```
### Examples of generated `manifest.jsonl` files
A manifest file contains some intuitive information and some specific fields, such as:
`pts` - time at which the frame should be shown to the user
`checksum` - `md5` hash sum for the specific image/frame
#### For a video
```json
{"version":"1.0"}
{"type":"video"}
{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
{"number":135,"pts":486000,"checksum":"9da9b4d42c1206d71bf17a7070a05847"}
{"number":270,"pts":972000,"checksum":"a1c3a61814f9b58b00a795fa18bb6d3e"}
{"number":405,"pts":1458000,"checksum":"18c0803b3cc1aa62ac75b112439d2b62"}
{"number":540,"pts":1944000,"checksum":"4551ecea0f80e95a6c32c32e70cac59e"}
{"number":675,"pts":2430000,"checksum":"0e72faf67e5218c70b506445ac91cdd7"}
```
#### For a dataset with images
```json
{"version":"1.0"}
{"type":"images"}
{"name":"image1","extension":".jpg","width":720,"height":405,"checksum":"548918ec4b56132a5cff1d4acabe9947"}
{"name":"image2","extension":".jpg","width":183,"height":275,"checksum":"4b4eefd03cc6a45c1c068b98477fb639"}
{"name":"image3","extension":".jpg","width":301,"height":167,"checksum":"0e454a6f4a13d56c82890c98be063663"}
```
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
from .core import VideoManifestManager, ImageManifestManager
\ No newline at end of file
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import av
import json
import os
from abc import ABC, abstractmethod
from collections import OrderedDict
from contextlib import closing
from PIL import Image
from .utils import md5_hash, rotate_image
class VideoStreamReader:
def __init__(self, source_path):
self.source_path = source_path
self._key_frames = OrderedDict()
self.frames = 0
with closing(av.open(self.source_path, mode='r')) as container:
self.width, self.height = self._get_frame_size(container)
@staticmethod
def _get_video_stream(container):
video_stream = next(stream for stream in container.streams if stream.type == 'video')
video_stream.thread_type = 'AUTO'
return video_stream
@staticmethod
def _get_frame_size(container):
video_stream = VideoStreamReader._get_video_stream(container)
for packet in container.demux(video_stream):
for frame in packet.decode():
if video_stream.metadata.get('rotate'):
frame = av.VideoFrame().from_ndarray(
rotate_image(
frame.to_ndarray(format='bgr24'),
360 - int(container.streams.video[0].metadata.get('rotate')),
),
format ='bgr24',
)
return frame.width, frame.height
def check_type_first_frame(self):
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
for packet in container.demux(video_stream):
for frame in packet.decode():
if not frame.pict_type.name == 'I':
raise Exception('First frame is not key frame')
return
def check_video_timestamps_sequences(self):
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
frame_pts = -1
frame_dts = -1
for packet in container.demux(video_stream):
for frame in packet.decode():
if None not in {frame.pts, frame_pts} and frame.pts <= frame_pts:
raise Exception('Invalid pts sequences')
if None not in {frame.dts, frame_dts} and frame.dts <= frame_dts:
raise Exception('Invalid dts sequences')
frame_pts, frame_dts = frame.pts, frame.dts
def rough_estimate_frames_ratio(self, upper_bound):
analyzed_frames_number, key_frames_number = 0, 0
_processing_end = False
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
for packet in container.demux(video_stream):
for frame in packet.decode():
if frame.key_frame:
key_frames_number += 1
analyzed_frames_number += 1
if upper_bound == analyzed_frames_number:
_processing_end = True
break
if _processing_end:
break
# in our case there are no videos with a non-key first frame, so at least one key frame is guaranteed
return analyzed_frames_number // key_frames_number
def validate_frames_ratio(self, chunk_size):
upper_bound = 3 * chunk_size
ratio = self.rough_estimate_frames_ratio(upper_bound + 1)
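# e.g. with the default chunk_size of 36, upper_bound is 108: at most 109 frames
# are analyzed, and a single key frame among them fails the check (ratio 109 >= 108)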
assert ratio < upper_bound, 'Too few keyframes'
def get_size(self):
return self.frames
@property
def frame_sizes(self):
return (self.width, self.height)
def validate_key_frame(self, container, video_stream, key_frame):
for packet in container.demux(video_stream):
for frame in packet.decode():
if md5_hash(frame) != key_frame[1]['md5'] or frame.pts != key_frame[1]['pts']:
self._key_frames.pop(key_frame[0])
return
def validate_seek_key_frames(self):
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
key_frames_copy = self._key_frames.copy()
for key_frame in key_frames_copy.items():
container.seek(offset=key_frame[1]['pts'], stream=video_stream)
self.validate_key_frame(container, video_stream, key_frame)
def save_key_frames(self):
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
frame_number = 0
for packet in container.demux(video_stream):
for frame in packet.decode():
if frame.key_frame:
self._key_frames[frame_number] = {
'pts': frame.pts,
'md5': md5_hash(frame),
}
frame_number += 1
self.frames = frame_number
@property
def key_frames(self):
return self._key_frames
def __len__(self):
return len(self._key_frames)
#TODO: need to change it in future
def __iter__(self):
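# yields (frame number, pts, md5 checksum) records; VideoManifestManager.create
# writes one JSON line per record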
for idx, key_frame in self._key_frames.items():
yield (idx, key_frame['pts'], key_frame['md5'])
class DatasetImagesReader:
def __init__(self, sources, is_sorted=True, use_image_hash=False, *args, **kwargs):
self._sources = sources if is_sorted else sorted(sources)
self._content = []
self._data_dir = kwargs.get('data_dir', None)
self._use_image_hash = use_image_hash
def __iter__(self):
for image in self._sources:
img = Image.open(image, mode='r')
img_name = os.path.relpath(image, self._data_dir) if self._data_dir \
else os.path.basename(image)
name, extension = os.path.splitext(img_name)
image_properties = {
'name': name,
'extension': extension,
'width': img.width,
'height': img.height,
}
if self._use_image_hash:
image_properties['checksum'] = md5_hash(img)
yield image_properties
def create(self):
for item in self:
self._content.append(item)
@property
def content(self):
return self._content
class _Manifest:
FILE_NAME = 'manifest.jsonl'
VERSION = '1.0'
def __init__(self, path, is_created=False):
assert path, 'A path to the manifest file was not specified'
self._path = os.path.join(path, self.FILE_NAME) if os.path.isdir(path) else path
self._is_created = is_created
@property
def path(self):
return self._path
@property
def is_created(self):
return self._is_created
@is_created.setter
def is_created(self, value):
assert isinstance(value, bool)
self._is_created = value
# Needed for faster iteration over the manifest file; the index is generated when
# working inside CVAT and is not generated when a manifest is created manually
class _Index:
FILE_NAME = 'index.json'
def __init__(self, path):
assert path and os.path.isdir(path), 'No index directory path'
self._path = os.path.join(path, self.FILE_NAME)
self._index = {}
@property
def path(self):
return self._path
def dump(self):
with open(self._path, 'w') as index_file:
json.dump(self._index, index_file, separators=(',', ':'))
def load(self):
with open(self._path, 'r') as index_file:
self._index = json.load(index_file,
object_hook=lambda d: {int(k): v for k, v in d.items()})
def create(self, manifest, skip):
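# build a mapping from record number to the byte offset of its line in the
# manifest, so that any record can later be read with a single seek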
assert os.path.exists(manifest), 'The manifest file does not exist, the index cannot be created'
with open(manifest, 'r+') as manifest_file:
while skip:
manifest_file.readline()
skip -= 1
image_number = 0
position = manifest_file.tell()
line = manifest_file.readline()
while line:
if line.strip():
self._index[image_number] = position
image_number += 1
position = manifest_file.tell()
line = manifest_file.readline()
def partial_update(self, manifest, number):
assert os.path.exists(manifest), 'The manifest file does not exist, the index cannot be updated'
with open(manifest, 'r+') as manifest_file:
manifest_file.seek(self._index[number])
line = manifest_file.readline()
while line:
if line.strip():
self._index[number] = manifest_file.tell()
number += 1
line = manifest_file.readline()
def __getitem__(self, number):
assert 0 <= number < len(self), \
'An invalid index number: {}\nMax: {}'.format(number, len(self))
return self._index[number]
def __len__(self):
return len(self._index)
class _ManifestManager(ABC):
BASE_INFORMATION = {
'version' : 1,
'type': 2,
}
def __init__(self, path, *args, **kwargs):
self._manifest = _Manifest(path)
def _parse_line(self, line):
""" Getting a random line from the manifest file """
with open(self._manifest.path, 'r') as manifest_file:
if isinstance(line, str):
assert line in self.BASE_INFORMATION.keys(), \
'An attempt to get non-existent information from the manifest'
for _ in range(self.BASE_INFORMATION[line]):
fline = manifest_file.readline()
return json.loads(fline)[line]
else:
assert self._index, 'No prepared index'
offset = self._index[line]
manifest_file.seek(offset)
properties = manifest_file.readline()
return json.loads(properties)
def init_index(self):
self._index = _Index(os.path.dirname(self._manifest.path))
if os.path.exists(self._index.path):
self._index.load()
else:
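# skip the header lines when indexing: 'version' and 'type', plus 'properties' for a video manifest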
self._index.create(self._manifest.path, 3 if self._manifest.TYPE == 'video' else 2)
self._index.dump()
@abstractmethod
def create(self, content, **kwargs):
pass
@abstractmethod
def partial_update(self, number, properties):
pass
def __iter__(self):
with open(self._manifest.path, 'r') as manifest_file:
manifest_file.seek(self._index[0])
image_number = 0
line = manifest_file.readline()
while line:
    if line.strip():
        yield (image_number, json.loads(line))
        image_number += 1
    line = manifest_file.readline()
@property
def manifest(self):
return self._manifest
def __len__(self):
if hasattr(self, '_index'):
return len(self._index)
else:
return None
def __getitem__(self, item):
return self._parse_line(item)
@property
def index(self):
return self._index
class VideoManifestManager(_ManifestManager):
def __init__(self, manifest_path, *args, **kwargs):
super().__init__(manifest_path)
setattr(self._manifest, 'TYPE', 'video')
self.BASE_INFORMATION['properties'] = 3
def create(self, content, **kwargs):
""" Creating and saving a manifest file """
with open(self._manifest.path, 'w') as manifest_file:
base_info = {
'version': self._manifest.VERSION,
'type': self._manifest.TYPE,
'properties': {
'name': os.path.basename(content.source_path),
'resolution': content.frame_sizes,
'length': content.get_size(),
},
}
for key, value in base_info.items():
json_item = json.dumps({key: value}, separators=(',', ':'))
manifest_file.write(f'{json_item}\n')
for item in content:
json_item = json.dumps({
'number': item[0],
'pts': item[1],
'checksum': item[2]
}, separators=(',', ':'))
manifest_file.write(f"{json_item}\n")
self._manifest.is_created = True
def partial_update(self, number, properties):
pass
@staticmethod
def prepare_meta(media_file, upload_dir=None, chunk_size=36, force=False):
source_path = os.path.join(upload_dir, media_file) if upload_dir else media_file
meta_info = VideoStreamReader(source_path=source_path)
meta_info.check_type_first_frame()
try:
meta_info.validate_frames_ratio(chunk_size)
except AssertionError:
if not force:
raise
meta_info.check_video_timestamps_sequences()
meta_info.save_key_frames()
meta_info.validate_seek_key_frames()
return meta_info
#TODO: add generic manifest structure file validation
class ManifestValidator:
def validate_base_info(self):
with open(self._manifest.path, 'r') as manifest_file:
assert self._manifest.VERSION == json.loads(manifest_file.readline())['version']
assert self._manifest.TYPE == json.loads(manifest_file.readline())['type']
class VideoManifestValidator(VideoManifestManager):
def __init__(self, **kwargs):
self.source_path = kwargs.pop('source_path')
super().__init__(**kwargs)
def validate_key_frame(self, container, video_stream, key_frame):
for packet in container.demux(video_stream):
for frame in packet.decode():
assert frame.pts == key_frame['pts'], "The uploaded manifest does not match the video"
return
def validate_seek_key_frames(self):
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
last_key_frame = None
for _, key_frame in self:
# check that the key frame sequence is sorted
if last_key_frame and last_key_frame['number'] >= key_frame['number']:
raise AssertionError('Invalid saved key frames sequence in manifest file')
container.seek(offset=key_frame['pts'], stream=video_stream)
self.validate_key_frame(container, video_stream, key_frame)
last_key_frame = key_frame
def validate_frame_numbers(self):
with closing(av.open(self.source_path, mode='r')) as container:
video_stream = self._get_video_stream(container)
# not all videos contain information about the number of frames
frames = video_stream.frames
if frames:
assert frames == self['properties']['length'], "The uploaded manifest does not match the video"
return
class ImageManifestManager(_ManifestManager):
def __init__(self, manifest_path):
super().__init__(manifest_path)
setattr(self._manifest, 'TYPE', 'images')
def create(self, content, **kwargs):
""" Creating and saving a manifest file"""
with open(self._manifest.path, 'w') as manifest_file:
base_info = {
'version': self._manifest.VERSION,
'type': self._manifest.TYPE,
}
for key, value in base_info.items():
json_item = json.dumps({key: value}, separators=(',', ':'))
manifest_file.write(f'{json_item}\n')
for item in content:
json_item = json.dumps({
key: value for key, value in item.items()
}, separators=(',', ':'))
manifest_file.write(f"{json_item}\n")
self._manifest.is_created = True
def partial_update(self, number, properties):
pass
@staticmethod
def prepare_meta(sources, **kwargs):
meta_info = DatasetImagesReader(sources=sources, **kwargs)
meta_info.create()
return meta_info
\ No newline at end of file
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import argparse
import mimetypes
import os
import sys
from glob import glob
def _define_data_type(media):
media_type, _ = mimetypes.guess_type(media)
if media_type:
return media_type.split('/')[0]
def _is_video(media_file):
return _define_data_type(media_file) == 'video'
def _is_image(media_file):
return _define_data_type(media_file) == 'image'
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument('--force', action='store_true',
help='Use this flag to prepare the manifest file for video data '
'if by default the video does not meet the requirements and a manifest file is not prepared')
parser.add_argument('--output-dir', type=str, help='Directory where the manifest file will be saved',
default=os.getcwd())
parser.add_argument('source', type=str, help='Source paths')
return parser.parse_args()
def main():
args = get_args()
manifest_directory = os.path.abspath(args.output_dir)
os.makedirs(manifest_directory, exist_ok=True)
source = os.path.abspath(args.source)
sources = []
if not os.path.isfile(source): # directory/pattern with images
data_dir = None
if os.path.isdir(source):
data_dir = source
for root, _, files in os.walk(source):
sources.extend([os.path.join(root, f) for f in files if _is_image(f)])
else:
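# the source is a glob pattern: everything before the first component containing a wildcard is treated as the common data root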
items = source.lstrip('/').split('/')
position = 0
try:
for item in items:
if set(item) & {'*', '?', '[', ']'}:
break
position += 1
else:
raise Exception('Wrong positional argument')
assert position != 0, 'Wrong pattern: there must be a common root'
data_dir = source.split(items[position])[0]
except Exception as ex:
sys.exit(str(ex))
sources = list(filter(_is_image, glob(source, recursive=True)))
try:
assert len(sources), 'No images were found'
manifest = ImageManifestManager(manifest_path=manifest_directory)
meta_info = manifest.prepare_meta(sources=sources, is_sorted=False,
use_image_hash=True, data_dir=data_dir)
manifest.create(meta_info)
except Exception as ex:
sys.exit(str(ex))
else: # video
try:
assert _is_video(source), 'You can specify a video path or a directory/pattern with images'
manifest = VideoManifestManager(manifest_path=manifest_directory)
try:
meta_info = manifest.prepare_meta(media_file=source, force=args.force)
except AssertionError as ex:
if str(ex) == 'Too few keyframes':
msg = 'NOTE: prepared manifest file contains too few key frames for smooth decoding.\n' \
'Use --force flag if you still want to prepare a manifest file.'
print(msg)
sys.exit(2)
else:
raise
manifest.create(meta_info)
except Exception as ex:
sys.exit(str(ex))
print('The manifest file has been prepared')
if __name__ == "__main__":
base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(base_dir)
from dataset_manifest.core import VideoManifestManager, ImageManifestManager
main()
\ No newline at end of file
av==8.0.2 --no-binary=av
opencv-python-headless==4.4.0.42
Pillow==7.2.0
\ No newline at end of file
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import hashlib
import cv2 as cv
from av import VideoFrame
def rotate_image(image, angle):
height, width = image.shape[:2]
image_center = (width/2, height/2)
matrix = cv.getRotationMatrix2D(image_center, angle, 1.)
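# enlarge the output canvas to the bounding box of the rotated image so that nothing is cropped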
abs_cos = abs(matrix[0,0])
abs_sin = abs(matrix[0,1])
bound_w = int(height * abs_sin + width * abs_cos)
bound_h = int(height * abs_cos + width * abs_sin)
matrix[0, 2] += bound_w/2 - image_center[0]
matrix[1, 2] += bound_h/2 - image_center[1]
matrix = cv.warpAffine(image, matrix, (bound_w, bound_h))
return matrix
def md5_hash(frame):
if isinstance(frame, VideoFrame):
frame = frame.to_image()
return hashlib.md5(frame.tobytes()).hexdigest() # nosec
\ No newline at end of file
# Simple command line tool to prepare meta information for video data
**Usage**
```bash
usage: prepare.py [-h] [-chunk_size CHUNK_SIZE] video_file meta_directory
positional arguments:
video_file Path to video file
meta_directory Directory where the file with meta information will be saved
optional arguments:
-h, --help show this help message and exit
-chunk_size CHUNK_SIZE
Chunk size that will be specified when creating the task with specified video and generated meta information
```
**NOTE**: For smooth video decoding, the `chunk size` must be greater than or equal to the ratio of the number of frames
to the number of key frames.
You can estimate an appropriate `chunk size` by preparing the file with meta information and looking at it.
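For example, a video with 900 frames and 30 key frames has a ratio of 30, so smooth decoding needs a `chunk size` of at least 30 (the numbers are illustrative).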
**NOTE**: If the ratio of the number of frames to the number of key frames is small compared to the `chunk size`,
then when creating a task with prepared meta information you should expect the waiting time for some chunks
to be longer than for others (at the first iteration, when the chunk is not yet in the cache).
**Examples**
```bash
python prepare.py ~/Documents/some_video.mp4 ~/Documents
```
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT
import argparse
import sys
import os
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument('video_file',
type=str,
help='Path to video file')
parser.add_argument('meta_directory',
type=str,
help='Directory where the file with meta information will be saved')
parser.add_argument('-chunk_size',
type=int,
help='Chunk size that will be specified when creating the task with specified video and generated meta information')
return parser.parse_args()
def main():
args = get_args()
try:
smooth_decoding = prepare_meta_for_upload(prepare_meta, args.video_file, None, args.meta_directory, args.chunk_size)
print('Meta information for video has been prepared')
if smooth_decoding is not None and not smooth_decoding:
print('NOTE: prepared meta information contains too few key frames for smooth decoding.')
except Exception:
print('Impossible to prepare meta information')
if __name__ == "__main__":
base_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(base_dir)
from cvat.apps.engine.prepare import prepare_meta, prepare_meta_for_upload
main()
\ No newline at end of file