Design of data.fetcher in Paddle Data Package
Created by: reyoung
This issue is part of the work for issue #1392 (closed) and serves as the design document for the Fetcher.
The Cached Fetcher is a Python package that downloads a dataset from an HTTP server and opens any file inside the dataset as a Python file-like object.
The basic interface of the Fetcher is:
class Fetcher(object):
    def __init__(self, url, md5=None, local_cached_dir=None):
        """
        :param url: the HTTP address of the dataset package.
        :param md5: the md5sum of the package. If None, the md5sum is not
                    checked.
        :param local_cached_dir: the local directory in which caches are
                    saved. If None, a directory under
                    `$HOME/.cache/cached_downloader/` is picked based on
                    the url or filename.
        """
        pass

    def open(self, path=None):
        """
        Open a path inside the downloaded file.

        If the downloaded file is a tar package, this opens the `path` file
        inside the package. If the downloaded file is a plain file like
        `XXX.json`, or a gzipped file like `XXX.json.gz`, leave `path` as
        None to open the file itself.

        The basic usage is:

        ..  code-block:: python

            downloader = Fetcher(url="http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz")
            with downloader.open() as train_images_file:
                ...
        """
        pass
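The behavior described in the docstrings above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: it assumes Python 3 and the standard library, it leaves out the HTTP download itself, and the helper names `default_cache_dir`, `md5_matches`, and `open_cached` are hypothetical. The cache directory naming scheme (file name plus a short hash of the url) is also an assumption, since the design only says the directory is picked based on the url or filename.

```python
import gzip
import hashlib
import io
import os
import tarfile
from urllib.parse import urlparse


def default_cache_dir(url):
    """Pick a per-url directory under $HOME/.cache/cached_downloader/.

    Hypothetical naming scheme: file name plus a short hash of the url,
    so two urls that share a file name do not collide.
    """
    filename = os.path.basename(urlparse(url).path) or "download"
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()[:8]
    return os.path.join(os.path.expanduser("~"), ".cache",
                        "cached_downloader", "%s-%s" % (filename, digest))


def md5_matches(filename, expected_md5, chunk_size=1 << 20):
    """Return True if expected_md5 is None or equals the file's md5sum."""
    if expected_md5 is None:
        return True  # md5=None means "do not check"
    md5 = hashlib.md5()
    with open(filename, "rb") as f:
        # Read in chunks so large packages are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected_md5.lower()


def open_cached(filename, path=None):
    """Open `path` inside a tar package, or the file itself otherwise."""
    if tarfile.is_tarfile(filename):
        if path is None:
            raise ValueError("path is required for a tar package")
        with tarfile.open(filename) as tar:
            member = tar.extractfile(path)  # KeyError if path is missing
            if member is None:
                raise ValueError("%r is not a regular file" % path)
            # Simplification: copy the member into memory so the tar can
            # be closed; a real implementation may prefer to stream it.
            return io.BytesIO(member.read())
    if filename.endswith(".gz"):
        return gzip.open(filename, "rb")  # transparently decompressed
    return open(filename, "rb")
```

In this sketch the tar check comes before the `.gz` check so that a `XXX.tar.gz` package is treated as a tar package rather than as a single gzipped file. All three branches return objects that work in a `with` statement, matching the usage shown in the `open` docstring.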