Commit e741a807 authored by Pablo Hoffman

Added new Feed exports extension with documentation and storage tests. Closes #197.

Also deprecated File export pipeline (to be removed in Scrapy 0.11).

Still need to add tests for FeedExport main extension code.
Parent d17695ee
......@@ -12,10 +12,32 @@ else
exit 1
fi
# disable custom settings for running tests in a neutral environment
export SCRAPY_SETTINGS_DISABLED=1
# use vsftpd (if available) for testing ftp feed storage
if type vsftpd >/dev/null 2>&1; then
vsftpd_conf=$(mktemp /tmp/vsftpd-XXXX)
cat >$vsftpd_conf <<!
listen=YES
listen_port=2121
run_as_launching_user=YES
anonymous_enable=YES
write_enable=YES
anon_upload_enable=YES
anon_mkdir_write_enable=YES
anon_other_write_enable=YES
anon_umask=000
vsftpd_log_file=/dev/null
!
ftproot=$(mktemp -d /tmp/feedtest-XXXX)
chmod 755 $ftproot
export FEEDTEST_FTP_URI="ftp://anonymous:test@localhost:2121$ftproot/path/to/file.txt"
export FEEDTEST_FTP_PATH="$ftproot/path/to/file.txt"
vsftpd $vsftpd_conf &
vsftpd_pid=$!
fi
find -name '*.py[co]' -delete
if [ $# -eq 0 ]; then
$trial scrapy
......@@ -23,3 +45,7 @@ else
$trial "$@"
fi
# cleanup vsftpd stuff
[ -n "$vsftpd_pid" ] && kill $vsftpd_pid
[ -n "$ftproot" ] && rm -rf $ftproot $vsftpd_conf
......@@ -57,6 +57,7 @@ Scraping basics
topics/loaders
topics/shell
topics/item-pipeline
topics/feed-exports
:doc:`topics/items`
Define the data you want to scrape.
......@@ -76,6 +77,9 @@ Scraping basics
:doc:`topics/item-pipeline`
Post-process and store your scraped data.
:doc:`topics/feed-exports`
Output your scraped data using different formats and storages.
Built-in services
=================
......
......@@ -174,8 +174,9 @@ scraping easy and efficient, such as:
* Built-in support for :ref:`selecting and extracting <topics-selectors>` data
from HTML and XML sources
* Built-in support for :ref:`exporting data <file-export-pipeline>` in multiple
formats, including XML, CSV and JSON
* Built-in support for :ref:`generating feed exports <topics-feed-exports>` in
multiple formats (JSON, CSV, XML) and storing them in multiple backends (FTP,
S3, filesystem)
* A media pipeline for :ref:`automatically downloading images <topics-images>`
(or any other media) associated with the scraped items
......
......@@ -17,11 +17,10 @@ output formats, such as XML, CSV or JSON.
Using Item Exporters
====================
If you are in a hurry, and just want to use an Item Exporter as an :doc:`Item
Pipeline <item-pipeline>` see the :ref:`File Export Pipeline
<file-export-pipeline>`. Otherwise, if you want to know how Item Exporters
work or need more custom functionality (not covered by the :ref:`File Export
Pipeline <file-export-pipeline>`), continue reading below.
If you are in a hurry, and just want to use an Item Exporter to output scraped
data see the :ref:`topics-feed-exports`. Otherwise, if you want to know how
Item Exporters work or need more custom functionality (not covered by the
default exports), continue reading below.
In order to use an Item Exporter, you must instantiate it with its required
args. Each Item Exporter requires different arguments, so check each exporter
......@@ -333,7 +332,7 @@ PprintItemExporter
Longer lines (when present) are pretty-formatted.
JsonItemExporter
---------------------
----------------
.. class:: JsonItemExporter(file, \**kwargs)
......
.. _topics-feed-exports:
============
Feed exports
============
One of the most frequently required features when implementing scrapers is
being able to store the scraped data properly and, quite often, that means
generating an "export file" with the scraped data (commonly called an "export
feed") to be consumed by other systems.
Scrapy provides this functionality out of the box with the Feed Exports, which
allows you to generate a feed with the scraped items, using multiple
serialization formats and storage backends.
.. _topics-feed-format:
Serialization formats
=====================
For serializing the scraped data, the feed exports use the :ref:`Item exporters
<topics-exporters>` and these formats are supported out of the box:
* :ref:`topics-feed-format-json`
* :ref:`topics-feed-format-jsonlines`
* :ref:`topics-feed-format-csv`
* :ref:`topics-feed-format-xml`
But you can also extend the supported formats through the
:setting:`FEED_EXPORTERS` setting.
.. _topics-feed-format-json:
JSON
----
* :setting:`FEED_FORMAT`: ``json``
* Exporter used: :class:`~scrapy.contrib.exporter.JsonItemExporter`
* See :ref:`this warning <json-with-large-data>` if you're using JSON with large feeds
.. _topics-feed-format-jsonlines:
JSON lines
----------
* :setting:`FEED_FORMAT`: ``jsonlines``
* Exporter used: :class:`~scrapy.contrib.exporter.JsonLinesItemExporter`
.. _topics-feed-format-csv:
CSV
---
* :setting:`FEED_FORMAT`: ``csv``
* Exporter used: :class:`~scrapy.contrib.exporter.CsvItemExporter`
.. _topics-feed-format-xml:
XML
---
* :setting:`FEED_FORMAT`: ``xml``
* Exporter used: :class:`~scrapy.contrib.exporter.XmlItemExporter`
.. _topics-feed-storage:
Storages
========
When using the feed exports you define where to store the feed using a URI_
(through the :setting:`FEED_URI` setting). The feed exports support multiple
storage backend types which are defined by the URI scheme.
The storage backends supported out of the box are:
* :ref:`topics-feed-storage-fs`
* :ref:`topics-feed-storage-ftp`
* :ref:`topics-feed-storage-s3` (requires boto_)
* :ref:`topics-feed-storage-stdout`
Some storage backends may be unavailable if the required external libraries are
not available. For example, the S3 backend is only available if the boto_
library is installed.
.. _topics-feed-uri-params:
Storage URI parameters
======================
The storage URI can also contain parameters that get replaced when the feed is
being created. These parameters are:
* ``%(time)s`` - gets replaced by a timestamp when the feed is being created
* ``%(name)s`` - gets replaced by the spider name
Any other named parameter gets replaced by the spider attribute of the same
name. For example, ``%(site_id)s`` would get replaced by the ``spider.site_id``
attribute at the moment the feed is being created.
Here are some examples to illustrate:
* Store in FTP using one directory per spider:
* ``ftp://user:password@ftp.example.com/scraping/feeds/%(name)s/%(time)s.json``
* Store in S3 using one directory per spider:
* ``s3://mybucket/scraping/feeds/%(name)s/%(time)s.json``
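Putting this together, a project's settings might look like the sketch below
(the values are placeholders, and ``site_id`` is assumed to be an attribute
defined on the spider)::

    # settings.py (hypothetical values)
    FEED_URI = 'ftp://user:password@ftp.example.com/feeds/%(site_id)s/%(name)s-%(time)s.json'
    FEED_FORMAT = 'json'

    # myproject/spiders/example.py -- provides the value for %(site_id)s
    from scrapy.spider import BaseSpider

    class ExampleSpider(BaseSpider):
        name = 'example'
        site_id = 42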
.. _topics-feed-storage-backends:
Storage backends
================
.. _topics-feed-storage-fs:
Local filesystem
----------------
The feeds are stored in the local filesystem.
* URI scheme: ``file``
* Example URI: ``file:///tmp/export.csv``
* Required external libraries: none
Note that for the local filesystem storage (only) you can omit the scheme if
you specify an absolute path like ``/tmp/export.csv``. This only works on Unix
systems though.
.. _topics-feed-storage-ftp:
FTP
---
The feeds are stored on an FTP server.
* URI scheme: ``ftp``
* Example URI: ``ftp://user:pass@ftp.example.com/path/to/export.csv``
* Required external libraries: none
.. _topics-feed-storage-s3:
S3
--
The feeds are stored on `Amazon S3`_.
* URI scheme: ``s3``
* Example URIs:
* ``s3://mybucket/path/to/export.csv``
* ``s3://aws_key:aws_secret@mybucket/path/to/export.csv``
* Required external libraries: `boto`_
The AWS credentials can be passed as user/password in the URI, or they can be
passed through the following settings:
* :setting:`AWS_ACCESS_KEY_ID`
* :setting:`AWS_SECRET_ACCESS_KEY`
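For instance, a project storing its feeds on S3 might use settings along these
lines (a sketch; the bucket name and credentials are placeholders)::

    FEED_URI = 's3://mybucket/scraping/feeds/%(name)s/%(time)s.json'
    AWS_ACCESS_KEY_ID = 'myaccesskey'
    AWS_SECRET_ACCESS_KEY = 'mysecretkey'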
.. _topics-feed-storage-stdout:
Standard output
---------------
The feeds are written to the standard output of the Scrapy process.
* URI scheme: ``stdout``
* Example URI: ``stdout:``
* Required external libraries: none
Settings
========
These are the settings used for configuring the feed exports:
* :setting:`FEED_URI` (mandatory)
* :setting:`FEED_FORMAT`
* :setting:`FEED_STORAGES`
* :setting:`FEED_EXPORTERS`
* :setting:`FEED_STORE_EMPTY`
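As a minimal sketch, exporting to a local CSV file only requires setting a URI
and a format (the path below is just an example)::

    FEED_URI = 'file:///tmp/export.csv'
    FEED_FORMAT = 'csv'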
.. currentmodule:: scrapy.contrib.feedexport
.. setting:: FEED_URI
FEED_URI
--------
Default: ``None``
The URI of the export feed. See :ref:`topics-feed-storage-backends` for
supported URI schemes.
This setting is required for enabling the feed exports.
.. setting:: FEED_FORMAT
FEED_FORMAT
-----------
Default: ``jsonlines``

The serialization format to be used for the feed. See
:ref:`topics-feed-format` for possible values.
.. setting:: FEED_STORE_EMPTY
FEED_STORE_EMPTY
----------------
Default: ``False``
Whether to export empty feeds (i.e. feeds with no items).
.. setting:: FEED_STORAGES
FEED_STORAGES
-------------
Default: ``{}``
A dict containing additional feed storage backends supported by your project.
The keys are URI schemes and the values are paths to storage classes.
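For example, a project shipping its own storage backend could register it
under a new scheme like this (a sketch; ``myproject.feedstorage.SFTPFeedStorage``
is a hypothetical class)::

    FEED_STORAGES = {
        'sftp': 'myproject.feedstorage.SFTPFeedStorage',
    }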
.. setting:: FEED_STORAGES_BASE
FEED_STORAGES_BASE
------------------
Default::
{
'': 'scrapy.contrib.feedexport.FileFeedStorage',
'file': 'scrapy.contrib.feedexport.FileFeedStorage',
'stdout': 'scrapy.contrib.feedexport.StdoutFeedStorage',
's3': 'scrapy.contrib.feedexport.S3FeedStorage',
'ftp': 'scrapy.contrib.feedexport.FTPFeedStorage',
}
A dict containing the built-in feed storage backends supported by Scrapy.
.. setting:: FEED_EXPORTERS
FEED_EXPORTERS
--------------
Default: ``{}``
A dict containing additional exporters supported by your project. The keys are
serialization formats and the values are paths to :ref:`Item exporter
<topics-exporters>` classes.
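For example, a project could enable a pickle feed format by mapping it to the
built-in :class:`~scrapy.contrib.exporter.PickleItemExporter` (a minimal
sketch; the format name is just the key you choose)::

    FEED_EXPORTERS = {
        'pickle': 'scrapy.contrib.exporter.PickleItemExporter',
    }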
.. setting:: FEED_EXPORTERS_BASE
FEED_EXPORTERS_BASE
-------------------
Default::
FEED_EXPORTERS_BASE = {
'json': 'scrapy.contrib.exporter.JsonItemExporter',
'jsonlines': 'scrapy.contrib.exporter.JsonLinesItemExporter',
'csv': 'scrapy.contrib.exporter.CsvItemExporter',
'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}
A dict containing the built-in feed exporters supported by Scrapy.
.. _URI: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
.. _Amazon S3: http://aws.amazon.com/s3/
.. _boto: http://code.google.com/p/boto/
......@@ -116,130 +116,3 @@ spider returns multiples items with the same id::
else:
self.duplicates[spider].add(item['id'])
return item
Built-in Item Pipelines reference
=================================
.. module:: scrapy.contrib.pipeline
:synopsis: Item Pipeline manager and built-in pipelines
Here is a list of item pipelines bundled with Scrapy.
.. _file-export-pipeline:
File Export Pipeline
--------------------
.. module:: scrapy.contrib.pipeline.fileexport
.. class:: FileExportPipeline
This pipeline exports all scraped items into a file, using different formats.
It is a simple but convenient wrapper to use :doc:`Item Exporters <exporters>` as
:ref:`Item Pipelines <topics-item-pipeline>`. If you need more custom/advanced
functionality, you can write your own pipeline or subclass the :doc:`Item
Exporters <exporters>`.
It supports the following settings:
* :setting:`EXPORT_FORMAT` (mandatory)
* :setting:`EXPORT_FILE` (mandatory)
* :setting:`EXPORT_FIELDS`
* :setting:`EXPORT_EMPTY`
* :setting:`EXPORT_ENCODING`
If any mandatory setting is not set, this pipeline will be automatically
disabled.
File Export Pipeline examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here are some usage examples of the File Export Pipeline.
To export all scraped items into an XML file::
EXPORT_FORMAT = 'xml'
EXPORT_FILE = 'scraped_items.xml'
To export all scraped items into a CSV file (with all fields in headers line)::
EXPORT_FORMAT = 'csv'
EXPORT_FILE = 'scraped_items.csv'
To export all scraped items into a CSV file (with specific fields in headers line)::
EXPORT_FORMAT = 'csv_headers'
EXPORT_FILE = 'scraped_items_with_headers.csv'
EXPORT_FIELDS = ['name', 'price', 'description']
File Export Pipeline settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: scrapy.contrib.exporter
.. setting:: EXPORT_FORMAT
EXPORT_FORMAT
^^^^^^^^^^^^^
The format to use for exporting. Here is a list of all available formats. Click
on the respective Item Exporter to get more info.
* ``xml``: uses a :class:`XmlItemExporter`
* ``csv``: uses a :class:`CsvItemExporter`
* ``csv_headers``: uses a :class:`CsvItemExporter` with the column headers on
the first line. This format requires you to specify the fields to export
using the :setting:`EXPORT_FIELDS` setting.
* ``json``: uses a :class:`JsonItemExporter`
* ``jsonlines``: uses a :class:`JsonLinesItemExporter`
* ``pickle``: uses a :class:`PickleItemExporter`
* ``pprint``: uses a :class:`PprintItemExporter`
This setting is mandatory in order to use the File Export Pipeline.
.. setting:: EXPORT_FILE
EXPORT_FILE
^^^^^^^^^^^
The name of the file where the items will be exported. This setting is
mandatory in order to use the File Export Pipeline.
.. setting:: EXPORT_FIELDS
EXPORT_FIELDS
^^^^^^^^^^^^^
Default: ``None``
The names of the item fields that will be exported. This will be used for the
:attr:`~BaseItemExporter.fields_to_export` Item Exporter attribute. If
``None``, all fields will be exported.
.. setting:: EXPORT_EMPTY
EXPORT_EMPTY
^^^^^^^^^^^^
Default: ``False``
Whether to export empty (non-populated) fields. This will be used for the
:attr:`~BaseItemExporter.export_empty_fields` Item Exporter attribute.
.. setting:: EXPORT_ENCODING
EXPORT_ENCODING
^^^^^^^^^^^^^^^
Default: ``'utf-8'``
The encoding to use for exporting. This will be used for the
:attr:`~BaseItemExporter.encoding` Item Exporter attribute.
......@@ -195,12 +195,32 @@ to any particular component. In that case the module of that component will be
shown, typically an extension, middleware or pipeline. It also means that the
component must be enabled in order for the setting to have any effect.
.. setting:: AWS_ACCESS_KEY_ID
AWS_ACCESS_KEY_ID
-----------------
Default: ``None``
The AWS access key used by code that requires access to `Amazon Web services`_,
such as the :ref:`S3 feed storage backend <topics-feed-storage-s3>`.
.. setting:: AWS_SECRET_ACCESS_KEY
AWS_SECRET_ACCESS_KEY
---------------------
Default: ``None``
The AWS secret key used by code that requires access to `Amazon Web services`_,
such as the :ref:`S3 feed storage backend <topics-feed-storage-s3>`.
.. setting:: BOT_NAME
BOT_NAME
--------
Default: ``scrapybot``
Default: ``'scrapybot'``
The name of the bot implemented by this Scrapy project (also known as the
project name). This will be used to construct the User-Agent by default, and
......@@ -1017,3 +1037,4 @@ Default: ``"%s/%s" % (BOT_NAME, BOT_VERSION)``
The default User-Agent to use when crawling, unless overridden.
.. _Amazon web services: http://aws.amazon.com/
......@@ -114,6 +114,27 @@ EXTENSIONS_BASE = {
'scrapy.contrib.memusage.MemoryUsage': 0,
'scrapy.contrib.memdebug.MemoryDebugger': 0,
'scrapy.contrib.closespider.CloseSpider': 0,
'scrapy.contrib.feedexport.FeedExporter': 0,
}
FEED_URI = None
FEED_URI_PARAMS = None # a function to extend uri arguments
FEED_FORMAT = 'jsonlines'
FEED_STORE_EMPTY = False
FEED_STORAGES = {}
FEED_STORAGES_BASE = {
'': 'scrapy.contrib.feedexport.FileFeedStorage',
'file': 'scrapy.contrib.feedexport.FileFeedStorage',
'stdout': 'scrapy.contrib.feedexport.StdoutFeedStorage',
's3': 'scrapy.contrib.feedexport.S3FeedStorage',
'ftp': 'scrapy.contrib.feedexport.FTPFeedStorage',
}
FEED_EXPORTERS = {}
FEED_EXPORTERS_BASE = {
'json': 'scrapy.contrib.exporter.JsonItemExporter',
'jsonlines': 'scrapy.contrib.exporter.JsonLinesItemExporter',
'csv': 'scrapy.contrib.exporter.CsvItemExporter',
'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}
GROUPSETTINGS_ENABLED = False
......
"""
Feed Exports extension
See documentation in docs/topics/feed-exports.rst
"""
import sys, os, posixpath
from tempfile import TemporaryFile
from datetime import datetime
from urlparse import urlparse
from ftplib import FTP
from shutil import copyfileobj
from twisted.internet import defer, threads
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher
from scrapy.utils.ftp import ftp_makedirs_cwd
from scrapy.exceptions import NotConfigured
from scrapy.utils.misc import load_object
from scrapy.conf import settings
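# The storage classes below deliver the exported feed to its final location.
# BlockingFeedStorage runs the blocking upload (_store_in_thread) in a thread
# pool via Twisted's deferToThread, so the reactor is not blocked while storing.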
class BlockingFeedStorage(object):
def store(self, file, spider):
return threads.deferToThread(self._store_in_thread, file, spider)
def _store_in_thread(self, file, spider):
raise NotImplementedError
class StdoutFeedStorage(object):
def __init__(self, uri):
pass
def store(self, file, spider):
copyfileobj(file, sys.stdout)
class FileFeedStorage(BlockingFeedStorage):
def __init__(self, uri):
u = urlparse(uri)
self.path = u.path
def _store_in_thread(self, file, spider):
dirname = os.path.dirname(self.path)
if dirname and not os.path.exists(dirname):
os.makedirs(dirname)
f = open(self.path, 'wb')
copyfileobj(file, f)
f.close()
class S3FeedStorage(BlockingFeedStorage):
def __init__(self, uri):
try:
import boto
except ImportError:
raise NotConfigured
self.connect_s3 = boto.connect_s3
u = urlparse(uri)
self.bucketname = u.hostname
self.access_key = u.username or settings['AWS_ACCESS_KEY_ID']
self.secret_key = u.password or settings['AWS_SECRET_ACCESS_KEY']
self.keyname = u.path
def _store_in_thread(self, file, spider):
conn = self.connect_s3(self.access_key, self.secret_key)
bucket = conn.get_bucket(self.bucketname, validate=False)
key = bucket.new_key(self.keyname)
key.set_contents_from_file(file)
key.close()
class FTPFeedStorage(BlockingFeedStorage):
def __init__(self, uri):
u = urlparse(uri)
self.host = u.hostname
self.port = int(u.port or '21')
self.username = u.username
self.password = u.password
self.path = u.path
def _store_in_thread(self, file, spider):
ftp = FTP()
ftp.connect(self.host, self.port)
ftp.login(self.username, self.password)
dirname, filename = posixpath.split(self.path)
ftp_makedirs_cwd(ftp, dirname)
ftp.storbinary('STOR %s' % filename, file)
ftp.quit()
class SpiderSlot(object):
def __init__(self, file, exp):
self.file = file
self.exporter = exp
self.itemcount = 0
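# FeedExporter is the extension itself: it opens a temporary file per spider,
# feeds each passed item to the configured Item Exporter, and hands the file
# to the selected storage backend when the spider closes.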
class FeedExporter(object):
def __init__(self):
self.urifmt = settings['FEED_URI']
if not self.urifmt:
raise NotConfigured
self.format = settings['FEED_FORMAT'].lower()
self.storages = self._load_components('FEED_STORAGES')
self.exporters = self._load_components('FEED_EXPORTERS')
if not self._storage_supported(self.urifmt):
raise NotConfigured
if not self._exporter_supported(self.format):
raise NotConfigured
self.store_empty = settings.getbool('FEED_STORE_EMPTY')
uripar = settings['FEED_URI_PARAMS']
self._uripar = load_object(uripar) if uripar else lambda x, y: None
self.slots = {}
dispatcher.connect(self.open_spider, signals.spider_opened)
dispatcher.connect(self.close_spider, signals.spider_closed)
dispatcher.connect(self.item_passed, signals.item_passed)
def open_spider(self, spider):
file = TemporaryFile(prefix='feed-')
exp = self._get_exporter(file)
exp.start_exporting()
self.slots[spider] = SpiderSlot(file, exp)
def close_spider(self, spider):
slot = self.slots.pop(spider)
if not slot.itemcount and not self.store_empty:
return
slot.exporter.finish_exporting()
nbytes = slot.file.tell()
slot.file.seek(0)
uri = self.urifmt % self._get_uri_params(spider)
storage = self._get_storage(uri)
if not storage:
return
logfmt = "%%s %s feed (%d items, %d bytes) in: %s" % (self.format, \
slot.itemcount, nbytes, uri)
d = defer.maybeDeferred(storage.store, slot.file, spider)
d.addCallback(lambda _: log.msg(logfmt % "Stored", spider=spider))
d.addErrback(log.err, logfmt % "Error storing", spider=spider)
d.addBoth(lambda _: slot.file.close())
return d
def item_passed(self, item, spider):
slot = self.slots[spider]
slot.exporter.export_item(item)
slot.itemcount += 1
return item
def _load_components(self, setting_prefix):
conf = dict(settings['%s_BASE' % setting_prefix])
conf.update(settings[setting_prefix])
d = {}
for k, v in conf.items():
try:
d[k] = load_object(v)
except NotConfigured:
pass
return d
def _exporter_supported(self, format):
if format in self.exporters:
return True
log.msg("Unknown feed format: %s" % format, log.ERROR)
def _storage_supported(self, uri):
scheme = urlparse(uri).scheme
if scheme in self.storages:
try:
self._get_storage(uri)
return True
except NotConfigured:
log.msg("Disabled feed storage scheme: %s" % scheme, log.ERROR)
else:
log.msg("Unknown feed storage scheme: %s" % scheme, log.ERROR)
def _get_exporter(self, *a, **kw):
return self.exporters[self.format](*a, **kw)
def _get_storage(self, uri):
return self.storages[urlparse(uri).scheme](uri)
def _get_uri_params(self, spider):
params = {}
for k in dir(spider):
params[k] = getattr(spider, k)
ts = datetime.utcnow().replace(microsecond=0).isoformat().replace(':', '-')
params['time'] = ts
self._uripar(params, spider)
return params
"""
File Export Pipeline
See documentation in docs/topics/item-pipeline.rst
"""
import warnings
warnings.warn("File export pipeline is deprecated and will be removed in Scrapy 0.11, use Feed exports instead", \
DeprecationWarning, stacklevel=2)
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
......
import os, urlparse
from twisted.trial import unittest
from twisted.internet import defer
from cStringIO import StringIO
from scrapy.spider import BaseSpider
from scrapy.contrib.feedexport import FileFeedStorage, FTPFeedStorage, S3FeedStorage
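# The FTP and S3 tests below are skipped unless FEEDTEST_FTP_URI/FEEDTEST_FTP_PATH
# or FEEDTEST_S3_URI are set in the environment (runtests.sh sets the FTP ones
# when vsftpd is available).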
class FeedStorageTest(unittest.TestCase):
@defer.inlineCallbacks
def _assert_stores(self, storage, path):
yield storage.store(StringIO("content"), BaseSpider("default"))
self.failUnless(os.path.exists(path))
self.failUnlessEqual(open(path).read(), "content")
# again, to check files are overwritten properly
yield storage.store(StringIO("new content"), BaseSpider("default"))
self.failUnlessEqual(open(path).read(), "new content")
class FileFeedStorageTest(FeedStorageTest):
def test_store_file_uri(self):
path = os.path.abspath(self.mktemp())
uri = "file://%s" % path
return self._assert_stores(FileFeedStorage(uri), path)
def test_store_file_uri_makedirs(self):
path = os.path.abspath(self.mktemp())
path = os.path.join(path, 'more', 'paths', 'file.txt')
uri = "file://%s" % path
return self._assert_stores(FileFeedStorage(uri), path)
def test_store_direct_path(self):
path = os.path.abspath(self.mktemp())
return self._assert_stores(FileFeedStorage(path), path)
def test_store_direct_path_relative(self):
path = self.mktemp()
return self._assert_stores(FileFeedStorage(path), path)
class FTPFeedStorageTest(FeedStorageTest):
def test_store(self):
uri = os.environ.get('FEEDTEST_FTP_URI')
path = os.environ.get('FEEDTEST_FTP_PATH')
if not (uri and path):
raise unittest.SkipTest("No FTP server available for testing")
return self._assert_stores(FTPFeedStorage(uri), path)
class S3FeedStorageTest(unittest.TestCase):
@defer.inlineCallbacks
def test_store(self):
uri = os.environ.get('FEEDTEST_S3_URI')
if not uri:
raise unittest.SkipTest("No S3 bucket available for testing")
try:
from boto import connect_s3
except ImportError:
raise unittest.SkipTest("Missing library: boto")
storage = S3FeedStorage(uri)
yield storage.store(StringIO("content"), BaseSpider("default"))
u = urlparse.urlparse(uri)
key = connect_s3().get_bucket(u.hostname, validate=False).get_key(u.path)
self.failUnlessEqual(key.get_contents_as_string(), "content")
from ftplib import error_perm
from posixpath import dirname
def ftp_makedirs_cwd(ftp, path, first_call=True):
"""Set the current directory of the FTP connection given in the `ftp`
argument (as an ftplib.FTP object), creating all parent directories if they
don't exist. The ftplib.FTP object must be already connected and logged in.
"""
try:
ftp.cwd(path)
except error_perm:
ftp_makedirs_cwd(ftp, dirname(path), False)
ftp.mkd(path)
if first_call:
ftp.cwd(path)