Commit babfc6e7 authored by Pablo Hoffman

Updated documentation after singleton removal changes.

Also removed some unused code and made some minor additional
refactoring.
Parent 1e12c92b
......@@ -77,8 +77,7 @@ As said before, we can add other fields to the item::
p['age'] = '22'
p['sex'] = 'M'
.. note:: fields added to the item won't be taken into account when doing a
:meth:`~DjangoItem.save`
.. note:: fields added to the item won't be taken into account when doing a :meth:`~DjangoItem.save`
And we can override the fields of the model with our own::
......
......@@ -176,6 +176,7 @@ Extending Scrapy
topics/downloader-middleware
topics/spider-middleware
topics/extensions
topics/api
:doc:`topics/architecture`
Understand the Scrapy architecture.
......@@ -187,9 +188,10 @@ Extending Scrapy
Customize the input and output of your spiders.
:doc:`topics/extensions`
Add any custom functionality using :doc:`signals <topics/signals>` and the
Scrapy API
Extend Scrapy with your custom functionality
:doc:`topics/api`
Use it from extensions and middlewares to extend Scrapy functionality
Reference
=========
......
......@@ -6,6 +6,9 @@ Release notes
Scrapy changes:
- dropped Signals singleton. Signals should now be accessed through the Crawler.signals attribute. See the signals documentation for more info.
- dropped Stats Collector singleton. Stats can now be accessed through the Crawler.stats attribute. See the stats collection documentation for more info.
- documented :ref:`topics-api`
- `lxml` is now the default selectors backend instead of `libxml2`
- ported FormRequest.from_response() to use `lxml`_ instead of `ClientForm`_
- removed modules: ``scrapy.xlib.BeautifulSoup`` and ``scrapy.xlib.ClientForm``
......@@ -21,6 +24,7 @@ Scrapy changes:
- removed per-spider settings (to be replaced by instantiating multiple crawler objects)
- ``USER_AGENT`` spider attribute will no longer work, use ``user_agent`` attribute instead
- ``DOWNLOAD_TIMEOUT`` spider attribute will no longer work, use ``download_timeout`` attribute instead
- removed ``ENCODING_ALIASES`` setting, as encoding auto-detection has been moved to the `w3lib`_ library
Scrapyd changes:
......
.. _topics-api:
========
Core API
========
.. versionadded:: 0.15
This section documents the Scrapy core API, and it's intended for developers of
extensions and middlewares.
.. _topics-api-crawler:
Crawler API
===========
The main entry point to the Scrapy API is the :class:`~scrapy.crawler.Crawler`
object, passed to extensions through the ``from_crawler`` class method. This
object provides access to all Scrapy core components, and it's the only way for
extensions to access them and hook their functionality into Scrapy.
.. module:: scrapy.crawler
:synopsis: The Scrapy crawler
The Extension Manager is responsible for loading and keeping track of installed
extensions and it's configured through the :setting:`EXTENSIONS` setting which
contains a dictionary of all available extensions and their orders, similar to
how you :ref:`configure the downloader middlewares
<topics-downloader-middleware-setting>`.
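For illustration, such a dictionary might look like this (the ``myproject``
path is a hypothetical example, not part of Scrapy)::

    EXTENSIONS = {
        'scrapy.contrib.corestats.CoreStats': 500,
        'myproject.extensions.SpiderOpenCloseLogging': 600,
    }

The numbers play the same ordering role as in the downloader middleware
setting referenced above.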
.. class:: Crawler(settings)
The Crawler object must be instantiated with a
:class:`scrapy.settings.Settings` object.
.. attribute:: settings
The settings manager of this crawler.
This is used by extensions & middlewares to access the Scrapy settings
of this crawler.
For an introduction on Scrapy settings see :ref:`topics-settings`.
For the API see :class:`~scrapy.settings.Settings` class.
.. attribute:: signals
The signals manager of this crawler.
This is used by extensions & middlewares to hook themselves into Scrapy
functionality.
For an introduction on signals see :ref:`topics-signals`.
For the API see :class:`~scrapy.signalmanager.SignalManager` class.
.. attribute:: stats
The stats collector of this crawler.
This is used by extensions & middlewares to record stats of their
behaviour, or access stats collected by other extensions.
For an introduction on stats collection see :ref:`topics-stats`.
For the API see :class:`~scrapy.statscol.StatsCollector` class.
.. attribute:: extensions
The extension manager that keeps track of enabled extensions.
Most extensions won't need to access this attribute.
For an introduction on extensions and a list of available extensions on
Scrapy see :ref:`topics-extensions`.
.. attribute:: spiders
The spider manager which takes care of loading and instantiating
spiders.
Most extensions won't need to access this attribute.
.. attribute:: engine
The execution engine, which coordinates the core crawling logic
between the scheduler, downloader and spiders.
Some extensions may want to access the Scrapy engine, to inspect or
modify the downloader and scheduler behaviour, although this is an
advanced use and this API is not yet stable.
.. method:: configure()
Configure the crawler.
This loads extensions, middlewares and spiders, leaving the crawler
ready to be started. It also configures the execution engine.
.. method:: start()
Start the crawler. This calls :meth:`configure` if it hasn't been called yet.
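For orientation, a minimal sketch of this lifecycle (it assumes a populated
:class:`~scrapy.settings.Settings` object and leaves out scheduling a spider
and running the Twisted reactor, which a real run also needs)::

    from scrapy.crawler import Crawler
    from scrapy.settings import Settings

    settings = Settings()          # normally built from your project settings
    crawler = Crawler(settings)
    crawler.configure()            # loads extensions, middlewares and spiders
    crawler.start()                # would call configure() itself if skipped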
Settings API
============
.. module:: scrapy.settings
:synopsis: Settings manager
.. class:: Settings()
This object provides access to Scrapy settings.
.. attribute:: overrides
Global overrides are the ones that take the most precedence, and are usually
populated by command-line options.
Overrides should be populated *before* configuring the Crawler object
(through the :meth:`~scrapy.crawler.Crawler.configure` method),
otherwise they won't have any effect. You don't typically need to worry
about overrides unless you are implementing your own Scrapy command.
.. method:: get(name, default=None)
Get a setting value without affecting its original type.
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getbool(name, default=False)
Get a setting value as a boolean. For example, both ``1`` and ``'1'``, and
``True`` return ``True``, while ``0``, ``'0'``, ``False`` and ``None``
return ``False``.
For example, settings populated through environment variables set to ``'0'``
will return ``False`` when using this method.
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getint(name, default=0)
Get a setting value as an int
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getfloat(name, default=0.0)
Get a setting value as a float
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getlist(name, default=None)
Get a setting value as a list. If the setting's original type is a list, it
will be returned verbatim. If it's a string it will be split by ",".
For example, settings populated through environment variables set to
``'one,two'`` will return a list ``['one', 'two']`` when using this method.
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
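As a short illustration of the getters above, assuming a ``crawler`` object is
at hand (the ``MYEXT_*`` setting names are placeholders)::

    settings = crawler.settings

    settings.get('MYEXT_USER', 'anonymous')       # raw value, original type
    settings.getbool('MYEXT_ENABLED', False)      # handles '0'/'1', True/False, None
    settings.getint('MYEXT_ITEMCOUNT', 1000)
    settings.getfloat('MYEXT_RATIO', 0.5)
    settings.getlist('MYEXT_HOSTS', [])           # 'a,b' becomes ['a', 'b']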
.. _topics-api-signals:
Signals API
===========
.. module:: scrapy.signalmanager
:synopsis: The signal manager
.. class:: SignalManager
.. method:: connect(receiver, signal)
Connect a receiver function to a signal.
The signal can be any object, although Scrapy comes with some
predefined signals that are documented in the :ref:`topics-signals`
section.
:param receiver: the function to be connected
:type receiver: callable
:param signal: the signal to connect to
:type signal: object
.. method:: send_catch_log(signal, \*\*kwargs)
Send a signal, catch exceptions and log them.
The keyword arguments are passed to the signal handlers (connected
through the :meth:`connect` method).
.. method:: send_catch_log_deferred(signal, \*\*kwargs)
Like :meth:`send_catch_log` but supports returning `deferreds`_ from
signal handlers.
Returns a `deferred`_ that gets fired once all signal handler
deferreds have fired.
The keyword arguments are passed to the signal handlers (connected
through the :meth:`connect` method).
.. method:: disconnect(receiver, signal)
Disconnect a receiver function from a signal. This has the opposite
effect of the :meth:`connect` method, and the arguments are the same.
.. method:: disconnect_all(signal)
Disconnect all receivers from the given signal.
:param signal: the signal to disconnect from
:type signal: object
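As a minimal sketch of this API in use (``SignalLogger`` is a made-up
extension; ``signals.item_scraped`` is one of the predefined Scrapy signals)::

    from scrapy import signals

    class SignalLogger(object):

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls()
            # hook the handler into the crawler's signal manager
            crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
            return ext

        def item_scraped(self, item, spider):
            spider.log("item scraped by spider %s" % spider.name)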
.. _topics-api-stats:
Stats Collector API
===================
There are several Stats Collectors available under the
:mod:`scrapy.statscol` module and they all implement the Stats
Collector API defined by the :class:`~scrapy.statscol.StatsCollector`
class (which they all inherit from).
.. module:: scrapy.statscol
:synopsis: Stats Collectors
.. class:: StatsCollector
.. method:: get_value(key, default=None, spider=None)
Return the value for the given stats key or default if it doesn't exist.
If spider is ``None`` the global stats table is consulted, otherwise the
spider specific one is. If the spider is not yet opened a ``KeyError``
exception is raised.
.. method:: get_stats(spider=None)
Get all stats from the given spider (if spider is given) or all global
stats otherwise, as a dict. If spider is not opened ``KeyError`` is
raised.
.. method:: set_value(key, value, spider=None)
Set the given value for the given stats key on the global stats (if
spider is not given) or the spider-specific stats (if spider is given),
which must be opened or a ``KeyError`` will be raised.
.. method:: set_stats(stats, spider=None)
Set the given stats (as a dict) for the given spider. If the spider is
not opened a ``KeyError`` will be raised.
.. method:: inc_value(key, count=1, start=0, spider=None)
Increment the value of the given stats key, by the given count,
assuming the start value given (when it's not set). If spider is not
given the global stats table is used, otherwise the spider-specific
stats table is used, which must be opened or a ``KeyError`` will be
raised.
.. method:: max_value(key, value, spider=None)
Set the given value for the given key only if current value for the
same key is lower than value. If there is no current value for the
given key, the value is always set. If spider is not given, the global
stats table is used, otherwise the spider-specific stats table is used,
which must be opened or a KeyError will be raised.
.. method:: min_value(key, value, spider=None)
Set the given value for the given key only if current value for the
same key is greater than value. If there is no current value for the
given key, the value is always set. If spider is not given, the global
stats table is used, otherwise the spider-specific stats table is used,
which must be opened or a KeyError will be raised.
.. method:: clear_stats(spider=None)
Clear all global stats (if spider is not given) or all spider-specific
stats if spider is given, in which case it must be opened or a
``KeyError`` will be raised.
.. method:: iter_spider_stats()
Return an iterator over ``(spider, spider_stats)`` for each open spider
currently tracked by the stats collector, where ``spider_stats`` is the
dict containing all spider-specific stats.
Global stats are not included in the iterator. If you want to get
those, use :meth:`get_stats` method.
The following methods are not part of the stats collection API but are instead
used when implementing custom stats collectors:
.. method:: open_spider(spider)
Open the given spider for stats collection.
.. method:: close_spider(spider)
Close the given spider. After this is called, no more specific stats
for this spider can be accessed.
.. method:: engine_stopped()
Called after the engine is stopped, to dump or persist global stats.
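For example, a hedged sketch of a custom collector built on these hooks,
subclassing the in-memory collector shipped with Scrapy (the log messages are
illustrative; it would be enabled through the :setting:`STATS_CLASS` setting)::

    from scrapy import log
    from scrapy.statscol import MemoryStatsCollector

    class LoggingStatsCollector(MemoryStatsCollector):

        def close_spider(self, spider):
            # dump the spider-specific stats while the spider is still open
            log.msg("stats for %s: %r" % (spider.name, self.get_stats(spider)))
            super(LoggingStatsCollector, self).close_spider(spider)

        def engine_stopped(self):
            log.msg("global stats: %r" % self.get_stats())
            super(LoggingStatsCollector, self).engine_stopped()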
.. _deferreds: http://twistedmatrix.com/documents/current/core/howto/defer.html
.. _deferred: http://twistedmatrix.com/documents/current/core/howto/defer.html
......@@ -80,7 +80,8 @@ functionality by plugging custom code. For more information see
Data flow
=========
The data flow in Scrapy is controlled by the Engine, and goes like this:
The data flow in Scrapy is controlled by the execution engine, and goes like
this:
1. The Engine opens a domain, locates the Spider that handles that domain, and
asks the spider for the first URLs to crawl.
......
......@@ -12,7 +12,7 @@ library, Scrapy provides its own facility for sending e-mails which is very
easy to use and it's implemented using `Twisted non-blocking IO`_, to avoid
interfering with the non-blocking IO of the crawler. It also provides a
simple API for sending attachments and it's very easy to configure, with a few
:ref:`settings <topics-email-settings`.
:ref:`settings <topics-email-settings>`.
.. _smtplib: http://docs.python.org/library/smtplib.html
.. _Twisted non-blocking IO: http://twistedmatrix.com/projects/core/documentation/howto/async.html
......
......@@ -39,17 +39,21 @@ the end of the exporting process
Here you can see an :doc:`Item Pipeline <item-pipeline>` which uses an Item
Exporter to export scraped items to different files, one per spider::
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.contrib.exporter import XmlItemExporter
class XmlExportPipeline(object):
def __init__(self):
dispatcher.connect(self.spider_opened, signals.spider_opened)
dispatcher.connect(self.spider_closed, signals.spider_closed)
self.files = {}
@classmethod
def from_crawler(cls, crawler):
pipeline = cls()
crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
return pipeline
def spider_opened(self, spider):
file = open('%s_products.xml' % spider.name, 'w+b')
self.files[spider] = file
......
......@@ -62,113 +62,81 @@ Not all available extensions will be enabled. Some of them usually depend on a
particular setting. For example, the HTTP Cache extension is available by default
but disabled unless the :setting:`HTTPCACHE_ENABLED` setting is set.
Accessing enabled extensions
============================
Even though it's not usually needed, you can access extension objects through
the :ref:`topics-extensions-ref-manager` which is populated when extensions are
loaded. For example, to access the ``WebService`` extension::
from scrapy.project import extensions
webservice_extension = extensions.enabled['WebService']
.. see also::
:ref:`topics-extensions-ref-manager`, for the complete Extension Manager
reference.
Writing your own extension
==========================
Writing your own extension is easy. Each extension is a single Python class
which doesn't need to implement any particular method.
All extension initialization code must be performed in the class constructor
(``__init__`` method). If that method raises the
:exc:`~scrapy.exceptions.NotConfigured` exception, the extension will be
disabled. Otherwise, the extension will be enabled.
Let's take a look at the following example extension which just logs a message
every time a domain/spider is opened and closed::
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
class SpiderOpenCloseLogging(object):
def __init__(self):
dispatcher.connect(self.spider_opened, signal=signals.spider_opened)
dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
The main entry point for a Scrapy extension (this also includes middlewares and
pipelines) is the ``from_crawler`` class method which receives a
``Crawler`` instance, the main object controlling the Scrapy crawler.
Through that object you can access settings, signals, stats, and also control
the crawler behaviour, if your extension needs to do so.
def spider_opened(self, spider):
log.msg("opened spider %s" % spider.name)
def spider_closed(self, spider):
log.msg("closed spider %s" % spider.name)
.. _topics-extensions-ref-manager:
Extension Manager
=================
.. module:: scrapy.extension
:synopsis: The extension manager
Typically, extensions connect to :ref:`signals <topics-signals>` and perform
tasks triggered by them.
The Extension Manager is responsible for loading and keeping track of installed
extensions and it's configured through the :setting:`EXTENSIONS` setting which
contains a dictionary of all available extensions and their order similar to
how you :ref:`configure the downloader middlewares
<topics-downloader-middleware-setting>`.
Finally, if the ``from_crawler`` method raises the
:exc:`~scrapy.exceptions.NotConfigured` exception, the extension will be
disabled. Otherwise, the extension will be enabled.
.. class:: ExtensionManager
Sample extension
----------------
The Extension Manager is a singleton object, which is instantiated at module
loading time and can be accessed like this::
Here we will implement a simple extension to illustrate the concepts described
in the previous section. This extension will log a message every time:
from scrapy.project import extensions
* a spider is opened
* a spider is closed
* a specific number of items are scraped
.. attribute:: loaded
The extension will be enabled through the ``MYEXT_ENABLED`` setting and the
number of items will be specified through the ``MYEXT_ITEMCOUNT`` setting.
A boolean which is True if extensions are already loaded or False if
they're not.
Here is the code of such an extension::
.. attribute:: enabled
from scrapy import signals
from scrapy.exceptions import NotConfigured
A dict with the enabled extensions. The keys are the extension class names,
and the values are the extension objects. Example::
class SpiderOpenCloseLogging(object):
>>> from scrapy.project import extensions
>>> extensions.load()
>>> print extensions.enabled
{'CoreStats': <scrapy.contrib.corestats.CoreStats object at 0x9e272ac>,
'WebService': <scrapy.management.telnet.TelnetConsole instance at 0xa05670c>,
...
def __init__(self, item_count):
self.item_count = item_count
self.items_scraped = 0
.. attribute:: disabled
@classmethod
def from_crawler(cls, crawler):
# first check if the extension should be enabled and raise
# NotConfigured otherwise
if not crawler.settings.getbool('MYEXT_ENABLED'):
raise NotConfigured
A dict with the disabled extensions. The keys are the extension class names,
and the values are the extension class paths (because objects are never
instantiated for disabled extensions). Example::
# get the number of items from settings
item_count = crawler.settings.getint('MYEXT_ITEMCOUNT', 1000)
>>> from scrapy.project import extensions
>>> extensions.load()
>>> print extensions.disabled
{'MemoryDebugger': 'scrapy.contrib.memdebug.MemoryDebugger',
'MyExtension': 'myproject.extensions.MyExtension',
...
# instantiate the extension object
ext = cls(item_count)
.. method:: load()
# connect the extension object to signals
crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
Load the available extensions configured in the :setting:`EXTENSIONS`
setting. On a standard run, this method is usually called by the Execution
Manager, but you may need to call it explicitly if you're dealing with
code outside Scrapy.
# return the extension object
return ext
.. method:: reload()
def spider_opened(self, spider):
spider.log("opened spider %s" % spider.name)
Reload the available extensions. See :meth:`load`.
def spider_closed(self, spider):
spider.log("closed spider %s" % spider.name)
def item_scraped(self, item, spider):
self.items_scraped += 1
if self.items_scraped == self.item_count:
spider.log("scraped %d items, resetting counter" % self.items_scraped)
self.items_scraped = 0
.. _topics-extensions-ref:
......
......@@ -7,8 +7,8 @@ Using Firebug for scraping
.. note:: Google Directory, the example website used in this guide is no longer
available as it `has been shut down by Google`_. The concepts in this guide
are still valid though. If you want to update this guide to use a new
(working) site, your contribution will be more than welcome!. See
:ref:`topics-contributing` for information on how to do so.
(working) site, your contribution will be more than welcome! See :ref:`topics-contributing`
for information on how to do so.
Introduction
============
......
......@@ -104,47 +104,37 @@ format::
item pipelines. If you really want to store all scraped items into a JSON
file you should use the :ref:`Feed exports <topics-feed-exports>`.
Activating an Item Pipeline component
=====================================
To activate an Item Pipeline component you must add its class to the
:setting:`ITEM_PIPELINES` list, like in the following example::
ITEM_PIPELINES = [
'myproject.pipeline.PricePipeline',
'myproject.pipeline.JsonWriterPipeline',
]
Item pipeline example with resources per spider
===============================================
Duplicates filter
-----------------
Sometimes you need to keep resources about the items processed grouped per
spider, and delete those resource when a spider finishes.
A filter that looks for duplicate items, and drops those items that were
already processed. Let say that our items have an unique id, but our spider
returns multiples items with the same id::
An example is a filter that looks for duplicate items, and drops those items
that were already processed. Let's say that our items have a unique id, but our
spider returns multiple items with the same id::
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.exceptions import DropItem
class DuplicatesPipeline(object):
def __init__(self):
self.duplicates = {}
dispatcher.connect(self.spider_opened, signals.spider_opened)
dispatcher.connect(self.spider_closed, signals.spider_closed)
def spider_opened(self, spider):
self.duplicates[spider] = set()
def spider_closed(self, spider):
del self.duplicates[spider]
def __init__(self):
self.ids_seen = set()
def process_item(self, item, spider):
if item['id'] in self.duplicates[spider]:
if item['id'] in self.ids_seen:
raise DropItem("Duplicate item found: %s" % item)
else:
self.duplicates[spider].add(item['id'])
self.ids_seen.add(item['id'])
return item
Activating an Item Pipeline component
=====================================
To activate an Item Pipeline component you must add its class to the
:setting:`ITEM_PIPELINES` list, like in the following example::
ITEM_PIPELINES = [
'myproject.pipeline.PricePipeline',
'myproject.pipeline.JsonWriterPipeline',
]
......@@ -51,11 +51,13 @@ Request objects
(for single valued headers) or lists (for multi-valued headers).
:type headers: dict
:param cookies: the request cookies. These can be sent in two forms::
:param cookies: the request cookies. These can be sent in two forms.
1. Using a dict::
request_with_cookies = Request(url="http://www.example.com",
cookies={'currency': 'USD', 'country': 'UY'})
::
2. Using a list of dicts::
request_with_cookies = Request(url="http://www.example.com",
cookies=[{'name': 'currency',
......
......@@ -4,9 +4,6 @@
Settings
========
.. module:: scrapy.conf
:synopsis: Settings manager
The Scrapy settings allow you to customize the behaviour of all Scrapy
components, including the core, extensions, pipelines and spiders themselves.
......@@ -49,14 +46,11 @@ These mechanisms are described in more detail below.
-------------------
Global overrides are the ones that take the most precedence, and are usually
populated by command-line options.
Example::
>>> from scrapy.conf import settings
>>> settings.overrides['LOG_ENABLED'] = True
populated by command-line options. You can also override one (or more) settings
from the command line using the ``-s`` (or ``--set``) command line option.
You can also override one (or more) settings from command line using the
``-s`` (or ``--set``) command line option.
For more information see the :attr:`~scrapy.settings.Settings.overrides`
Settings attribute.
.. highlight:: sh
......@@ -90,82 +84,22 @@ How to access settings
.. highlight:: python
Here's an example of the simplest way to access settings from Python code::
Settings can be accessed through the :attr:`scrapy.crawler.Crawler.settings`
attribute of the Crawler that is passed to ``from_crawler`` method in
extensions and middlewares::
class MyExtension(object):
>>> from scrapy.conf import settings
>>> print settings['LOG_ENABLED']
True
@classmethod
def from_crawler(cls, crawler):
settings = crawler.settings
if settings['LOG_ENABLED']:
print "log is enabled!"
In other words, settings can be accessed like a dict, but it's usually preferred
to extract the setting in the format you need it to avoid type errors. In order
to do that you'll have to use one of the following methods:
.. class:: Settings()
There is a (singleton) Settings object automatically instantiated when the
:mod:`scrapy.conf` module is loaded, and it's usually accessed like this::
>>> from scrapy.conf import settings
.. method:: get(name, default=None)
Get a setting value without affecting its original type.
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getbool(name, default=False)
Get a setting value as a boolean. For example, both ``1`` and ``'1'``, and
``True`` return ``True``, while ``0``, ``'0'``, ``False`` and ``None``
return ``False````
For example, settings populated through environment variables set to ``'0'``
will return ``False`` when using this method.
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getint(name, default=0)
Get a setting value as an int
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getfloat(name, default=0.0)
Get a setting value as a float
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
.. method:: getlist(name, default=None)
Get a setting value as a list. If the setting original type is a list it
will be returned verbatim. If it's a string it will be split by ",".
For example, settings populated through environment variables set to
``'one,two'`` will return a list ['one', 'two'] when using this method.
:param name: the setting name
:type name: string
:param default: the value to return if no setting is found
:type default: any
to do that you'll have to use one of the methods provided by the
:class:`~scrapy.settings.Settings` API.
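For example, the extension above could use
:meth:`~scrapy.settings.Settings.getbool` instead of plain dict access (a
sketch of the same hypothetical extension)::

    class MyExtension(object):

        @classmethod
        def from_crawler(cls, crawler):
            if crawler.settings.getbool('LOG_ENABLED'):
                print "log is enabled!"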
Rationale for setting names
===========================
......@@ -477,78 +411,17 @@ The class used to detect and filter duplicate requests.
The default (``RFPDupeFilter``) filters based on request fingerprint using
the ``scrapy.utils.request.request_fingerprint`` function.
.. setting:: EDITOR
EDITOR
------
Default: `depends on the environment`
The editor to use for editing spiders with the :command:`edit` command. It
defaults to the ``EDITOR`` environment variable, if set. Otherwise, it defaults
to ``vi`` (on Unix systems) or the IDLE editor (on Windows).
.. setting:: ENCODING_ALIASES
ENCODING_ALIASES
----------------
Default: ``{}``
A mapping of custom encoding aliases for your project, where the keys are the
aliases (and must be lower case) and the values are the encodings they map to.
This setting extends the :setting:`ENCODING_ALIASES_BASE` setting which
contains some default mappings.
.. setting:: ENCODING_ALIASES_BASE
ENCODING_ALIASES_BASE
---------------------
Default::
{
# gb2312 is superseded by gb18030
'gb2312': 'gb18030',
'chinese': 'gb18030',
'csiso58gb231280': 'gb18030',
'euc-cn': 'gb18030',
'euccn': 'gb18030',
'eucgb2312-cn': 'gb18030',
'gb2312-1980': 'gb18030',
'gb2312-80': 'gb18030',
'iso-ir-58': 'gb18030',
# gbk is superseded by gb18030
'gbk': 'gb18030',
'936': 'gb18030',
'cp936': 'gb18030',
'ms936': 'gb18030',
# latin_1 is a subset of cp1252
'latin_1': 'cp1252',
'iso-8859-1': 'cp1252',
'iso8859-1': 'cp1252',
'8859': 'cp1252',
'cp819': 'cp1252',
'latin': 'cp1252',
'latin1': 'cp1252',
'l1': 'cp1252',
# others
'zh-cn': 'gb18030',
'win-1251': 'cp1251',
'macintosh' : 'mac_roman',
'x-sjis': 'shift_jis',
}
The default encoding aliases defined in Scrapy. Don't override this setting in
your project, override :setting:`ENCODING_ALIASES` instead.
The reason why `ISO-8859-1`_ (and all its aliases) are mapped to `CP1252`_ is
due to a well known browser hack. For more information see: `Character
encodings in HTML`_.
.. _ISO-8859-1: http://en.wikipedia.org/wiki/ISO/IEC_8859-1
.. _CP1252: http://en.wikipedia.org/wiki/Windows-1252
.. _Character encodings in HTML: http://en.wikipedia.org/wiki/Character_encodings_in_HTML
.. setting:: EXTENSIONS
EXTENSIONS
......@@ -880,8 +753,8 @@ STATS_CLASS
Default: ``'scrapy.statscol.MemoryStatsCollector'``
The class to use for collecting stats (must implement the Stats Collector API,
or subclass the StatsCollector class).
The class to use for collecting stats, which must implement the
:ref:`topics-api-stats`.
.. setting:: STATS_DUMP
......@@ -896,15 +769,6 @@ closed, while the global stats are dumped when the Scrapy process finishes.
For more info see: :ref:`topics-stats`.
.. setting:: STATS_ENABLED
STATS_ENABLED
-------------
Default: ``True``
Enable stats collection.
.. setting:: STATSMAILER_RCPTS
STATSMAILER_RCPTS
......
......@@ -13,11 +13,8 @@ Even though signals provide several arguments, the handlers that catch them
don't need to accept all of them - the signal dispatching mechanism will only
deliver the arguments that the handler receives.
Finally, for more detailed information about signals internals see the
documentation of `pydispatcher`_ (the which the signal dispatching mechanism is
based on).
.. _pydispatcher: http://pydispatcher.sourceforge.net/
You can connect to signals (or send your own) through the
:ref:`topics-api-signals`.
Deferred signal handlers
========================
......
......@@ -4,16 +4,11 @@
Stats Collection
================
Overview
========
Scrapy provides a convenient service for collecting stats in the form of
Scrapy provides a convenient facility for collecting stats in the form of
key/values, both globally and per spider. It's called the Stats Collector, and
it's a singleton which can be imported and used quickly, as illustrated by the
examples in the :ref:`topics-stats-usecases` section below.
The stats collection is enabled by default but can be disabled through the
:setting:`STATS_ENABLED` setting.
can be accessed through the :attr:`~scrapy.crawler.Crawler.stats` attribute of
the :ref:`topics-api-crawler`, as illustrated by the examples in the
:ref:`topics-stats-usecases` section below.
However, the Stats Collector is always available, so you can always import it
in your module and use its API (to increment or set new stat keys), regardless
......@@ -36,9 +31,12 @@ the spider is closed.
Common Stats Collector uses
===========================
Import the stats collector::
Access the stats collector through the :attr:`~scrapy.crawler.Crawler.stats`
attribute::
from scrapy.stats import stats
@classmethod
def from_crawler(cls, crawler):
stats = crawler.stats
Set global stat value::
......@@ -66,8 +64,7 @@ Get all global stats (ie. not particular to any spider)::
>>> stats.get_stats()
{'hostname': 'localhost', 'spiders_crawled': 8}
Set spider specific stat value (spider stats must be opened first, but this
task is handled automatically by the Scrapy engine)::
Set spider specific stat value::
stats.set_value('start_time', datetime.now(), spider=some_spider)
......@@ -95,100 +92,6 @@ Get all stats from a given spider::
>>> stats.get_stats(spider=some_spider)
{'pages_crawled': 1238, 'start_time': datetime.datetime(2009, 7, 14, 21, 47, 28, 977139)}
.. _topics-stats-ref:
Stats Collector API
===================
There are several Stats Collectors available under the
:mod:`scrapy.statscol` module and they all implement the Stats
Collector API defined by the :class:`~scrapy.statscol.StatsCollector`
class (which they all inherit from).
.. module:: scrapy.statscol
:synopsis: Basic Stats Collectors
.. class:: StatsCollector
.. method:: get_value(key, default=None, spider=None)
Return the value for the given stats key or default if it doesn't exist.
If spider is ``None`` the global stats table is consulted, otherwise the
spider specific one is. If the spider is not yet opened a ``KeyError``
exception is raised.
.. method:: get_stats(spider=None)
Get all stats from the given spider (if spider is given) or all global
stats otherwise, as a dict. If spider is not opened ``KeyError`` is
raised.
.. method:: set_value(key, value, spider=None)
Set the given value for the given stats key on the global stats (if
spider is not given) or the spider-specific stats (if spider is given),
which must be opened or a ``KeyError`` will be raised.
.. method:: set_stats(stats, spider=None)
Set the given stats (as a dict) for the given spider. If the spider is
not opened a ``KeyError`` will be raised.
.. method:: inc_value(key, count=1, start=0, spider=None)
Increment the value of the given stats key, by the given count,
assuming the start value given (when it's not set). If spider is not
given the global stats table is used, otherwise the spider-specific
stats table is used, which must be opened or a ``KeyError`` will be
raised.
.. method:: max_value(key, value, spider=None)
Set the given value for the given key only if current value for the
same key is lower than value. If there is no current value for the
given key, the value is always set. If spider is not given, the global
stats table is used, otherwise the spider-specific stats table is used,
which must be opened or a KeyError will be raised.
.. method:: min_value(key, value, spider=None)
Set the given value for the given key only if current value for the
same key is greater than value. If there is no current value for the
given key, the value is always set. If spider is not given, the global
stats table is used, otherwise the spider-specific stats table is used,
which must be opened or a KeyError will be raised.
.. method:: clear_stats(spider=None)
Clear all global stats (if spider is not given) or all spider-specific
stats if spider is given, in which case it must be opened or a
``KeyError`` will be raised.
.. method:: iter_spider_stats()
Return a iterator over ``(spider, spider_stats)`` for each open spider
currently tracked by the stats collector, where ``spider_stats`` is the
dict containing all spider-specific stats.
Global stats are not included in the iterator. If you want to get
those, use :meth:`get_stats` method.
.. method:: open_spider(spider)
Open the given spider for stats collection. This method must be called
prior to working with any stats specific to that spider, but this task
is handled automatically by the Scrapy engine.
.. method:: close_spider(spider)
Close the given spider. After this is called, no more specific stats
for this spider can be accessed. This method is called automatically on
the :signal:`spider_closed` signal.
.. method:: engine_stopped()
Called after the engine is stopped, to dump or persist global stats.
Available Stats Collectors
==========================
......@@ -197,9 +100,8 @@ available in Scrapy which extend the basic Stats Collector. You can select
which Stats Collector to use through the :setting:`STATS_CLASS` setting. The
default Stats Collector used is the :class:`MemoryStatsCollector`.
When stats are disabled (through the :setting:`STATS_ENABLED` setting) the
:setting:`STATS_CLASS` setting is ignored and the :class:`DummyStatsCollector`
is used.
.. module:: scrapy.statscol
:synopsis: Stats Collectors
MemoryStatsCollector
--------------------
......@@ -223,9 +125,12 @@ DummyStatsCollector
.. class:: DummyStatsCollector
A Stats collector which does nothing but is very efficient. This is the
Stats Collector used when stats are disabled (through the
:setting:`STATS_ENABLED` setting).
A Stats collector which does nothing but is very efficient (because it does
nothing). This stats collector can be set via the :setting:`STATS_CLASS`
setting, to disable stats collection in order to improve performance. However,
the performance penalty of stats collection is usually marginal compared to
other Scrapy workloads like parsing pages.
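For example, it could be enabled from your project settings by pointing
:setting:`STATS_CLASS` at it::

    STATS_CLASS = 'scrapy.statscol.DummyStatsCollector'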
Stats signals
=============
......
......@@ -43,21 +43,21 @@ convenience:
+----------------+-------------------------------------------------------------------+
| Shortcut | Description |
+================+===================================================================+
| ``crawler`` | the Scrapy Crawler object (``scrapy.crawler``) |
| ``crawler`` | the Scrapy Crawler (:class:`scrapy.crawler.Crawler` object) |
+----------------+-------------------------------------------------------------------+
| ``engine`` | the Scrapy Engine object (``scrapy.core.engine``) |
| ``engine`` | Crawler.engine attribute |
+----------------+-------------------------------------------------------------------+
| ``spider`` | the spider object (only if there is a single spider opened) |
| ``spider`` | the active spider |
+----------------+-------------------------------------------------------------------+
| ``slot`` | the engine slot (only if there is a single spider opened) |
| ``slot`` | the engine slot |
+----------------+-------------------------------------------------------------------+
| ``extensions`` | the Extension Manager (``scrapy.project.crawler.extensions``) |
| ``extensions`` | the Extension Manager (Crawler.extensions attribute) |
+----------------+-------------------------------------------------------------------+
| ``stats`` | the Stats Collector (``scrapy.stats.stats``) |
| ``stats`` | the Stats Collector (Crawler.stats attribute) |
+----------------+-------------------------------------------------------------------+
| ``settings`` | the Scrapy settings object (``scrapy.conf.settings``) |
| ``settings`` | the Scrapy settings object (Crawler.settings attribute) |
+----------------+-------------------------------------------------------------------+
| ``est`` | print a report of the current engine status |
| ``est`` | print a report of the engine status |
+----------------+-------------------------------------------------------------------+
| ``prefs`` | for memory debugging (see :ref:`topics-leaks`) |
+----------------+-------------------------------------------------------------------+
......
......@@ -4,11 +4,8 @@ Depth Spider Middleware
See documentation in docs/topics/spider-middleware.rst
"""
import warnings
from scrapy import log
from scrapy.http import Request
from scrapy.exceptions import ScrapyDeprecationWarning
class DepthMiddleware(object):
......@@ -22,7 +19,6 @@ class DepthMiddleware(object):
def from_crawler(cls, crawler):
settings = crawler.settings
maxdepth = settings.getint('DEPTH_LIMIT')
usestats = settings.getbool('DEPTH_STATS')
verbose = settings.getbool('DEPTH_STATS_VERBOSE')
prio = settings.getint('DEPTH_PRIORITY')
return cls(maxdepth, crawler.stats, verbose, prio)
......
......@@ -216,7 +216,6 @@ SPIDER_MIDDLEWARES_BASE = {
SPIDER_MODULES = []
STATS_CLASS = 'scrapy.statscol.MemoryStatsCollector'
STATS_ENABLED = True
STATS_DUMP = True
STATSMAILER_RCPTS = []
......
......@@ -90,13 +90,13 @@ class CrawlerRun(object):
for name, signal in vars(signals).items():
if not name.startswith('_'):
dispatcher.connect(self.record_signal, signal)
dispatcher.connect(self.item_scraped, signals.item_scraped)
dispatcher.connect(self.request_received, signals.request_received)
dispatcher.connect(self.response_downloaded, signals.response_downloaded)
self.crawler = get_crawler()
self.crawler.install()
self.crawler.configure()
self.crawler.signals.connect(self.item_scraped, signals.item_scraped)
self.crawler.signals.connect(self.request_received, signals.request_received)
self.crawler.signals.connect(self.response_downloaded, signals.response_downloaded)
self.crawler.crawl(self.spider)
self.crawler.start()
......