Commit e741a807 authored by Pablo Hoffman

Added new Feed exports extension with documentation and storage tests. Closes #197.

Also deprecated File export pipeline (to be removed in Scrapy 0.11).

Still need to add tests for FeedExport main extension code.
Parent d17695ee
......@@ -12,10 +12,32 @@ else
exit 1
fi
# disable custom settings for running tests in a neutral environment
export SCRAPY_SETTINGS_DISABLED=1
# use vsftpd (if available) for testing ftp feed storage
if type vsftpd >/dev/null 2>&1; then
vsftpd_conf=$(mktemp /tmp/vsftpd-XXXX)
cat >$vsftpd_conf <<!
listen=YES
listen_port=2121
run_as_launching_user=YES
anonymous_enable=YES
write_enable=YES
anon_upload_enable=YES
anon_mkdir_write_enable=YES
anon_other_write_enable=YES
anon_umask=000
vsftpd_log_file=/dev/null
!
ftproot=$(mktemp -d /tmp/feedtest-XXXX)
chmod 755 $ftproot
export FEEDTEST_FTP_URI="ftp://anonymous:test@localhost:2121$ftproot/path/to/file.txt"
export FEEDTEST_FTP_PATH="$ftproot/path/to/file.txt"
vsftpd $vsftpd_conf &
vsftpd_pid=$!
fi
find -name '*.py[co]' -delete
if [ $# -eq 0 ]; then
$trial scrapy
......@@ -23,3 +45,7 @@ else
$trial "$@"
fi
# cleanup vsftpd stuff
[ -n "$vsftpd_pid" ] && kill $vsftpd_pid
[ -n "$ftproot" ] && rm -rf $ftproot $vsftpd_conf
......@@ -57,6 +57,7 @@ Scraping basics
topics/loaders
topics/shell
topics/item-pipeline
topics/feed-exports
:doc:`topics/items`
Define the data you want to scrape.
......@@ -76,6 +77,9 @@ Scraping basics
:doc:`topics/item-pipeline`
Post-process and store your scraped data.
:doc:`topics/feed-exports`
Output your scraped data using different formats and storages.
Built-in services
=================
......
......@@ -174,8 +174,9 @@ scraping easy and efficient, such as:
* Built-in support for :ref:`selecting and extracting <topics-selectors>` data
from HTML and XML sources
* Built-in support for :ref:`exporting data <file-export-pipeline>` in multiple
formats, including XML, CSV and JSON
* Built-in support for :ref:`generating feed exports <topics-feed-exports>` in
multiple formats (JSON, CSV, XML) and storing them in multiple backends (FTP,
S3, filesystem)
* A media pipeline for :ref:`automatically downloading images <topics-images>`
(or any other media) associated with the scraped items
......
......@@ -17,11 +17,10 @@ output formats, such as XML, CSV or JSON.
Using Item Exporters
====================
If you are in a hurry, and just want to use an Item Exporter as an :doc:`Item
Pipeline <item-pipeline>` see the :ref:`File Export Pipeline
<file-export-pipeline>`. Otherwise, if you want to know how Item Exporters
work or need more custom functionality (not covered by the :ref:`File Export
Pipeline <file-export-pipeline>`), continue reading below.
If you are in a hurry, and just want to use an Item Exporter to output scraped
data see the :ref:`topics-feed-exports`. Otherwise, if you want to know how
Item Exporters work or need more custom functionality (not covered by the
default exports), continue reading below.
In order to use an Item Exporter, you must instantiate it with its required
args. Each Item Exporter requires different arguments, so check each exporter
......@@ -333,7 +332,7 @@ PprintItemExporter
Longer lines (when present) are pretty-formatted.
JsonItemExporter
---------------------
----------------
.. class:: JsonItemExporter(file, \**kwargs)
......
.. _topics-feed-exports:
============
Feed exports
============
One of the most frequently required features when implementing scrapers is
being able to store the scraped data properly and, quite often, that means
generating an "export file" with the scraped data (commonly called an "export
feed") to be consumed by other systems.
Scrapy provides this functionality out of the box with the Feed Exports, which
allows you to generate a feed with the scraped items, using multiple
serialization formats and storage backends.
.. _topics-feed-format:
Serialization formats
=====================
For serializing the scraped data, the feed exports use the :ref:`Item exporters
<topics-exporters>` and these formats are supported out of the box:
* :ref:`topics-feed-format-json`
* :ref:`topics-feed-format-jsonlines`
* :ref:`topics-feed-format-csv`
* :ref:`topics-feed-format-xml`
But you can also extend the supported formats through the
:setting:`FEED_EXPORTERS` setting.
.. _topics-feed-format-json:
JSON
----
* :setting:`FEED_FORMAT`: ``json``
* Exporter used: :class:`~scrapy.contrib.exporter.JsonItemExporter`
* See :ref:`this warning <json-with-large-data>` if you're using JSON with large feeds
.. _topics-feed-format-jsonlines:
JSON lines
----------
* :setting:`FEED_FORMAT`: ``jsonlines``
* Exporter used: :class:`~scrapy.contrib.exporter.JsonLinesItemExporter`
.. _topics-feed-format-csv:
CSV
---
* :setting:`FEED_FORMAT`: ``csv``
* Exporter used: :class:`~scrapy.contrib.exporter.CsvItemExporter`
.. _topics-feed-format-xml:
XML
---
* :setting:`FEED_FORMAT`: ``xml``
* Exporter used: :class:`~scrapy.contrib.exporter.XmlItemExporter`
.. _topics-feed-storage:
Storages
========
When using the feed exports you define where to store the feed using a URI_
(through the :setting:`FEED_URI` setting). The feed exports support multiple
storage backend types which are defined by the URI scheme.
The storage backends supported out of the box are:
* :ref:`topics-feed-storage-fs`
* :ref:`topics-feed-storage-ftp`
* :ref:`topics-feed-storage-s3` (requires boto_)
* :ref:`topics-feed-storage-stdout`
Some storage backends may be unavailable if the required external libraries are
not available. For example, the S3 backend is only available if the boto_
library is installed.
.. _topics-feed-uri-params:
Storage URI parameters
======================
The storage URI can also contain parameters that get replaced when the feed is
being created. These parameters are:
* ``%(time)s`` - gets replaced by a timestamp when the feed is being created
* ``%(name)s`` - gets replaced by the spider name
Any other named parameter gets replaced by the spider attribute of the same
name. For example, ``%(site_id)s`` would get replaced by the ``spider.site_id``
attribute at the moment the feed is being created.
Here are some examples to illustrate:
* Store in FTP using one directory per spider:
* ``ftp://user:password@ftp.example.com/scraping/feeds/%(name)s/%(time)s.json``
* Store in S3 using one directory per spider:
* ``s3://mybucket/scraping/feeds/%(name)s/%(time)s.json``
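Putting this together, a project's settings might look like the sketch below
(the values are placeholders, and ``site_id`` is assumed to be an attribute
defined on the spider)::

    # settings.py (hypothetical values)
    FEED_URI = 'ftp://user:password@ftp.example.com/feeds/%(site_id)s/%(name)s-%(time)s.json'
    FEED_FORMAT = 'json'

    # myproject/spiders/example.py -- provides the value for %(site_id)s
    from scrapy.spider import BaseSpider

    class ExampleSpider(BaseSpider):
        name = 'example'
        site_id = 42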
.. _topics-feed-storage-backends:
Storage backends
================
.. _topics-feed-storage-fs:
Local filesystem
----------------
The feeds are stored in the local filesystem.
* URI scheme: ``file``
* Example URI: ``file:///tmp/export.csv``
* Required external libraries: none
Note that for the local filesystem storage (only) you can omit the scheme if
you specify an absolute path like ``/tmp/export.csv``. This only works on Unix
systems though.
.. _topics-feed-storage-ftp:
FTP
---
The feeds are stored on an FTP server.
* URI scheme: ``ftp``
* Example URI: ``ftp://user:pass@ftp.example.com/path/to/export.csv``
* Required external libraries: none
.. _topics-feed-storage-s3:
S3
--
The feeds are stored on `Amazon S3`_.
* URI scheme: ``s3``
* Example URIs:
* ``s3://mybucket/path/to/export.csv``
* ``s3://aws_key:aws_secret@mybucket/path/to/export.csv``
* Required external libraries: `boto`_
The AWS credentials can be passed as user/password in the URI, or they can be
passed through the following settings:
* :setting:`AWS_ACCESS_KEY_ID`
* :setting:`AWS_SECRET_ACCESS_KEY`
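For instance, a project storing its feeds on S3 might use settings along these
lines (a sketch; the bucket name and credentials are placeholders)::

    FEED_URI = 's3://mybucket/scraping/feeds/%(name)s/%(time)s.json'
    AWS_ACCESS_KEY_ID = 'myaccesskey'
    AWS_SECRET_ACCESS_KEY = 'mysecretkey'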
.. _topics-feed-storage-stdout:
Standard output
---------------
The feeds are written to the standard output of the Scrapy process.
* URI scheme: ``stdout``
* Example URI: ``stdout:``
* Required external libraries: none
Settings
========
These are the settings used for configuring the feed exports:
* :setting:`FEED_URI` (mandatory)
* :setting:`FEED_FORMAT`
* :setting:`FEED_STORAGES`
* :setting:`FEED_EXPORTERS`
* :setting:`FEED_STORE_EMPTY`
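As a minimal sketch, exporting to a local CSV file only requires setting a URI
and a format (the path below is just an example)::

    FEED_URI = 'file:///tmp/export.csv'
    FEED_FORMAT = 'csv'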
.. currentmodule:: scrapy.contrib.feedexport
.. setting:: FEED_URI
FEED_URI
--------
Default: ``None``
The URI of the export feed. See :ref:`topics-feed-storage-backends` for
supported URI schemes.
This setting is required for enabling the feed exports.
.. setting:: FEED_FORMAT
FEED_FORMAT
-----------
Default: ``jsonlines``

The serialization format to be used for the feed. See
:ref:`topics-feed-format` for possible values.
.. setting:: FEED_STORE_EMPTY
FEED_STORE_EMPTY
----------------
Default: ``False``
Whether to export empty feeds (i.e. feeds with no items).
.. setting:: FEED_STORAGES
FEED_STORAGES
-------------
Default: ``{}``
A dict containing additional feed storage backends supported by your project.
The keys are URI schemes and the values are paths to storage classes.
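For example, a project shipping its own storage backend could register it
under a new scheme like this (a sketch; ``myproject.feedstorage.SFTPFeedStorage``
is a hypothetical class)::

    FEED_STORAGES = {
        'sftp': 'myproject.feedstorage.SFTPFeedStorage',
    }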
.. setting:: FEED_STORAGES_BASE
FEED_STORAGES_BASE
------------------
Default::
{
'': 'scrapy.contrib.feedexport.FileFeedStorage',
'file': 'scrapy.contrib.feedexport.FileFeedStorage',
'stdout': 'scrapy.contrib.feedexport.StdoutFeedStorage',
's3': 'scrapy.contrib.feedexport.S3FeedStorage',
'ftp': 'scrapy.contrib.feedexport.FTPFeedStorage',
}
A dict containing the built-in feed storage backends supported by Scrapy.
.. setting:: FEED_EXPORTERS
FEED_EXPORTERS
--------------
Default: ``{}``
A dict containing additional exporters supported by your project. The keys are
serialization formats and the values are paths to :ref:`Item exporter
<topics-exporters>` classes.
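For example, a project could enable a pickle feed format by mapping it to the
built-in :class:`~scrapy.contrib.exporter.PickleItemExporter` (a minimal
sketch; the format name is just the key you choose)::

    FEED_EXPORTERS = {
        'pickle': 'scrapy.contrib.exporter.PickleItemExporter',
    }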
.. setting:: FEED_EXPORTERS_BASE
FEED_EXPORTERS_BASE
-------------------
Default::
FEED_EXPORTERS_BASE = {
'json': 'scrapy.contrib.exporter.JsonItemExporter',
'jsonlines': 'scrapy.contrib.exporter.JsonLinesItemExporter',
'csv': 'scrapy.contrib.exporter.CsvItemExporter',
'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}
A dict containing the built-in feed exporters supported by Scrapy.
.. _URI: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
.. _Amazon S3: http://aws.amazon.com/s3/
.. _boto: http://code.google.com/p/boto/
......@@ -116,130 +116,3 @@ spider returns multiples items with the same id::
else:
self.duplicates[spider].add(item['id'])
return item
Built-in Item Pipelines reference
=================================
.. module:: scrapy.contrib.pipeline
:synopsis: Item Pipeline manager and built-in pipelines
Here is a list of item pipelines bundled with Scrapy.
.. _file-export-pipeline:
File Export Pipeline
--------------------
.. module:: scrapy.contrib.pipeline.fileexport
.. class:: FileExportPipeline
This pipeline exports all scraped items into a file, using different formats.
It is a simple but convenient wrapper to use :doc:`Item Exporters <exporters>` as
:ref:`Item Pipelines <topics-item-pipeline>`. If you need more custom/advanced
functionality, you can write your own pipeline or subclass the :doc:`Item
Exporters <exporters>`.
It supports the following settings:
* :setting:`EXPORT_FORMAT` (mandatory)
* :setting:`EXPORT_FILE` (mandatory)
* :setting:`EXPORT_FIELDS`
* :setting:`EXPORT_EMPTY`
* :setting:`EXPORT_ENCODING`
If any mandatory setting is not set, this pipeline will be automatically
disabled.
File Export Pipeline examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here are some usage examples of the File Export Pipeline.
To export all scraped items into an XML file::
EXPORT_FORMAT = 'xml'
EXPORT_FILE = 'scraped_items.xml'
To export all scraped items into a CSV file (with all fields in headers line)::
EXPORT_FORMAT = 'csv'
EXPORT_FILE = 'scraped_items.csv'
To export all scraped items into a CSV file (with specific fields in headers line)::
EXPORT_FORMAT = 'csv_headers'
EXPORT_FILE = 'scraped_items_with_headers.csv'
EXPORT_FIELDS = ['name', 'price', 'description']
File Export Pipeline settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: scrapy.contrib.exporter
.. setting:: EXPORT_FORMAT
EXPORT_FORMAT
^^^^^^^^^^^^^
The format to use for exporting. Here is a list of all available formats. Click
on the respective Item Exporter to get more info.
* ``xml``: uses a :class:`XmlItemExporter`
* ``csv``: uses a :class:`CsvItemExporter`
* ``csv_headers``: uses a :class:`CsvItemExporter` with the column headers on
the first line. This format requires you to specify the fields to export
using the :setting:`EXPORT_FIELDS` setting.
* ``json``: uses a :class:`JsonItemExporter`
* ``jsonlines``: uses a :class:`JsonLinesItemExporter`
* ``pickle``: uses a :class:`PickleItemExporter`
* ``pprint``: uses a :class:`PprintItemExporter`
This setting is mandatory in order to use the File Export Pipeline.
.. setting:: EXPORT_FILE
EXPORT_FILE
^^^^^^^^^^^
The name of the file where the items will be exported. This setting is
mandatory in order to use the File Export Pipeline.
.. setting:: EXPORT_FIELDS
EXPORT_FIELDS
^^^^^^^^^^^^^
Default: ``None``
The names of the item fields that will be exported. This will be used for the
:attr:`~BaseItemExporter.fields_to_export` Item Exporter attribute. If
``None``, all fields will be exported.
.. setting:: EXPORT_EMPTY
EXPORT_EMPTY
^^^^^^^^^^^^
Default: ``False``
Whether to export empty (non-populated) fields. This will be used for the
:attr:`~BaseItemExporter.export_empty_fields` Item Exporter attribute.
.. setting:: EXPORT_ENCODING
EXPORT_ENCODING
^^^^^^^^^^^^^^^
Default: ``'utf-8'``
The encoding to use for exporting. This will be used for the
:attr:`~BaseItemExporter.encoding` Item Exporter attribute.
......@@ -195,12 +195,32 @@ to any particular component. In that case the module of that component will be
shown, typically an extension, middleware or pipeline. It also means that the
component must be enabled in order for the setting to have any effect.
.. setting:: AWS_ACCESS_KEY_ID
AWS_ACCESS_KEY_ID
-----------------
Default: ``None``
The AWS access key used by code that requires access to `Amazon Web services`_,
such as the :ref:`S3 feed storage backend <topics-feed-storage-s3>`.
.. setting:: AWS_SECRET_ACCESS_KEY
AWS_SECRET_ACCESS_KEY
---------------------
Default: ``None``
The AWS secret key used by code that requires access to `Amazon Web services`_,
such as the :ref:`S3 feed storage backend <topics-feed-storage-s3>`.
.. setting:: BOT_NAME
BOT_NAME
--------
Default: ``scrapybot``
Default: ``'scrapybot'``
The name of the bot implemented by this Scrapy project (also known as the
project name). This will be used to construct the User-Agent by default, and
......@@ -1017,3 +1037,4 @@ Default: ``"%s/%s" % (BOT_NAME, BOT_VERSION)``
The default User-Agent to use when crawling, unless overridden.
.. _Amazon web services: http://aws.amazon.com/
......@@ -114,6 +114,27 @@ EXTENSIONS_BASE = {
'scrapy.contrib.memusage.MemoryUsage': 0,
'scrapy.contrib.memdebug.MemoryDebugger': 0,
'scrapy.contrib.closespider.CloseSpider': 0,
'scrapy.contrib.feedexport.FeedExporter': 0,
}
FEED_URI = None
FEED_URI_PARAMS = None # a function to extend uri arguments
FEED_FORMAT = 'jsonlines'
FEED_STORE_EMPTY = False
FEED_STORAGES = {}
FEED_STORAGES_BASE = {
'': 'scrapy.contrib.feedexport.FileFeedStorage',
'file': 'scrapy.contrib.feedexport.FileFeedStorage',
'stdout': 'scrapy.contrib.feedexport.StdoutFeedStorage',
's3': 'scrapy.contrib.feedexport.S3FeedStorage',
'ftp': 'scrapy.contrib.feedexport.FTPFeedStorage',
}
FEED_EXPORTERS = {}
FEED_EXPORTERS_BASE = {
'json': 'scrapy.contrib.exporter.JsonItemExporter',
'jsonlines': 'scrapy.contrib.exporter.JsonLinesItemExporter',
'csv': 'scrapy.contrib.exporter.CsvItemExporter',
'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}
GROUPSETTINGS_ENABLED = False
......
"""
Feed Exports extension
See documentation in docs/topics/feed-exports.rst
"""
import sys, os, posixpath
from tempfile import TemporaryFile
from datetime import datetime
from urlparse import urlparse
from ftplib import FTP
from shutil import copyfileobj
from twisted.internet import defer, threads
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher
from scrapy.utils.ftp import ftp_makedirs_cwd
from scrapy.exceptions import NotConfigured
from scrapy.utils.misc import load_object
from scrapy.conf import settings
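# The storage classes below deliver the exported feed to its final location.
# BlockingFeedStorage runs the blocking upload (_store_in_thread) in a thread
# pool via Twisted's deferToThread, so the reactor is not blocked while storing.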
class BlockingFeedStorage(object):
def store(self, file, spider):
return threads.deferToThread(self._store_in_thread, file, spider)
def _store_in_thread(self, file, spider):
raise NotImplementedError
class StdoutFeedStorage(object):
def __init__(self, uri):
pass
def store(self, file, spider):
copyfileobj(file, sys.stdout)
class FileFeedStorage(BlockingFeedStorage):
def __init__(self, uri):
u = urlparse(uri)
self.path = u.path
def _store_in_thread(self, file, spider):
dirname = os.path.dirname(self.path)
if dirname and not os.path.exists(dirname):
os.makedirs(dirname)
f = open(self.path, 'wb')
copyfileobj(file, f)
f.close()
class S3FeedStorage(BlockingFeedStorage):
def __init__(self, uri):
try:
import boto
except ImportError:
raise NotConfigured
self.connect_s3 = boto.connect_s3
u = urlparse(uri)
self.bucketname = u.hostname
self.access_key = u.username or settings['AWS_ACCESS_KEY_ID']
self.secret_key = u.password or settings['AWS_SECRET_ACCESS_KEY']
self.keyname = u.path
def _store_in_thread(self, file, spider):
conn = self.connect_s3(self.access_key, self.secret_key)
bucket = conn.get_bucket(self.bucketname, validate=False)
key = bucket.new_key(self.keyname)
key.set_contents_from_file(file)
key.close()
class FTPFeedStorage(BlockingFeedStorage):
def __init__(self, uri):
u = urlparse(uri)
self.host = u.hostname
self.port = int(u.port or '21')
self.username = u.username
self.password = u.password
self.path = u.path
def _store_in_thread(self, file, spider):
ftp = FTP()
ftp.connect(self.host, self.port)
ftp.login(self.username, self.password)
dirname, filename = posixpath.split(self.path)
ftp_makedirs_cwd(ftp, dirname)
ftp.storbinary('STOR %s' % filename, file)
ftp.quit()
class SpiderSlot(object):
def __init__(self, file, exp):
self.file = file
self.exporter = exp
self.itemcount = 0
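# FeedExporter is the extension itself: it opens a temporary file per spider,
# feeds each passed item to the configured Item Exporter, and hands the file
# to the selected storage backend when the spider closes.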
class FeedExporter(object):
def __init__(self):
self.urifmt = settings['FEED_URI']
if not self.urifmt:
raise NotConfigured
self.format = settings['FEED_FORMAT'].lower()
self.storages = self._load_components('FEED_STORAGES')
self.exporters = self._load_components('FEED_EXPORTERS')
if not self._storage_supported(self.urifmt):
raise NotConfigured
if not self._exporter_supported(self.format):
raise NotConfigured
self.store_empty = settings.getbool('FEED_STORE_EMPTY')
uripar = settings['FEED_URI_PARAMS']
self._uripar = load_object(uripar) if uripar else lambda x, y: None
self.slots = {}
dispatcher.connect(self.open_spider, signals.spider_opened)
dispatcher.connect(self.close_spider, signals.spider_closed)
dispatcher.connect(self.item_passed, signals.item_passed)
def open_spider(self, spider):
file = TemporaryFile(prefix='feed-')
exp = self._get_exporter(file)
exp.start_exporting()
self.slots[spider] = SpiderSlot(file, exp)
def close_spider(self, spider):
slot = self.slots.pop(spider)
if not slot.itemcount and not self.store_empty:
return
slot.exporter.finish_exporting()
nbytes = slot.file.tell()
slot.file.seek(0)
uri = self.urifmt % self._get_uri_params(spider)
storage = self._get_storage(uri)
if not storage:
return
logfmt = "%%s %s feed (%d items, %d bytes) in: %s" % (self.format, \
slot.itemcount, nbytes, uri)
d = defer.maybeDeferred(storage.store, slot.file, spider)
d.addCallback(lambda _: log.msg(logfmt % "Stored", spider=spider))
d.addErrback(log.err, logfmt % "Error storing", spider=spider)
d.addBoth(lambda _: slot.file.close())
return d
def item_passed(self, item, spider):
slot = self.slots[spider]
slot.exporter.export_item(item)
slot.itemcount += 1
return item
def _load_components(self, setting_prefix):
conf = dict(settings['%s_BASE' % setting_prefix])
conf.update(settings[setting_prefix])
d = {}
for k, v in conf.items():
try:
d[k] = load_object(v)
except NotConfigured:
pass
return d
def _exporter_supported(self, format):
if format in self.exporters:
return True
log.msg("Unknown feed format: %s" % format, log.ERROR)
def _storage_supported(self, uri):
scheme = urlparse(uri).scheme
if scheme in self.storages:
try:
self._get_storage(uri)
return True
except NotConfigured:
log.msg("Disabled feed storage scheme: %s" % scheme, log.ERROR)
else:
log.msg("Unknown feed storage scheme: %s" % scheme, log.ERROR)
def _get_exporter(self, *a, **kw):
return self.exporters[self.format](*a, **kw)
def _get_storage(self, uri):
return self.storages[urlparse(uri).scheme](uri)
def _get_uri_params(self, spider):
params = {}
for k in dir(spider):
params[k] = getattr(spider, k)
ts = datetime.utcnow().replace(microsecond=0).isoformat().replace(':', '-')
params['time'] = ts
self._uripar(params, spider)
return params
"""
File Export Pipeline
See documentation in docs/topics/item-pipeline.rst
"""
import warnings
warnings.warn("File export pipeline is deprecated and will be removed in Scrapy 0.11, use Feed exports instead", \
DeprecationWarning, stacklevel=2)
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
......
import os, urlparse
from twisted.trial import unittest
from twisted.internet import defer
from cStringIO import StringIO
from scrapy.spider import BaseSpider
from scrapy.contrib.feedexport import FileFeedStorage, FTPFeedStorage, S3FeedStorage
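# The FTP and S3 tests below are skipped unless FEEDTEST_FTP_URI/FEEDTEST_FTP_PATH
# or FEEDTEST_S3_URI are set in the environment (runtests.sh sets the FTP ones
# when vsftpd is available).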
class FeedStorageTest(unittest.TestCase):
@defer.inlineCallbacks
def _assert_stores(self, storage, path):
yield storage.store(StringIO("content"), BaseSpider("default"))
self.failUnless(os.path.exists(path))
self.failUnlessEqual(open(path).read(), "content")
# again, to check files are overwritten properly
yield storage.store(StringIO("new content"), BaseSpider("default"))
self.failUnlessEqual(open(path).read(), "new content")
class FileFeedStorageTest(FeedStorageTest):
def test_store_file_uri(self):
path = os.path.abspath(self.mktemp())
uri = "file://%s" % path
return self._assert_stores(FileFeedStorage(uri), path)
def test_store_file_uri_makedirs(self):
path = os.path.abspath(self.mktemp())
path = os.path.join(path, 'more', 'paths', 'file.txt')
uri = "file://%s" % path
return self._assert_stores(FileFeedStorage(uri), path)
def test_store_direct_path(self):
path = os.path.abspath(self.mktemp())
return self._assert_stores(FileFeedStorage(path), path)
def test_store_direct_path_relative(self):
path = self.mktemp()
return self._assert_stores(FileFeedStorage(path), path)
class FTPFeedStorageTest(FeedStorageTest):
def test_store(self):
uri = os.environ.get('FEEDTEST_FTP_URI')
path = os.environ.get('FEEDTEST_FTP_PATH')
if not (uri and path):
raise unittest.SkipTest("No FTP server available for testing")
return self._assert_stores(FTPFeedStorage(uri), path)
class S3FeedStorageTest(unittest.TestCase):
@defer.inlineCallbacks
def test_store(self):
uri = os.environ.get('FEEDTEST_S3_URI')
if not uri:
raise unittest.SkipTest("No S3 bucket available for testing")
try:
from boto import connect_s3
except ImportError:
raise unittest.SkipTest("Missing library: boto")
storage = S3FeedStorage(uri)
yield storage.store(StringIO("content"), BaseSpider("default"))
u = urlparse.urlparse(uri)
key = connect_s3().get_bucket(u.hostname, validate=False).get_key(u.path)
self.failUnlessEqual(key.get_contents_as_string(), "content")
from ftplib import error_perm
from posixpath import dirname
def ftp_makedirs_cwd(ftp, path, first_call=True):
"""Set the current directory of the FTP connection given in the `ftp`
argument (as an ftplib.FTP object), creating all parent directories if they
don't exist. The ftplib.FTP object must be already connected and logged in.
"""
try:
ftp.cwd(path)
except error_perm:
ftp_makedirs_cwd(ftp, dirname(path), False)
ftp.mkd(path)
if first_call:
ftp.cwd(path)