Merge pull request #1995 from scrapy/docs-errback

DOC Add info and example on errbacks

d3ced85e · Elias Dorneles · c2c8036a · b3367c7a · d3ced85e

Showing 1 changed file with 65 additions and 0 deletions

docs/topics/request-response.rst +65 -0

--- a/docs/topics/request-response.rst
+++ b/docs/topics/request-response.rst
@@ -117,6 +117,8 @@ Request objects
       raised while processing the request. This includes pages that failed
       with 404 HTTP errors and such. It receives a `Twisted Failure`_ instance
       as first parameter.
+       For more information,
+       see :ref:`topics-request-response-ref-errbacks` below.
    :type errback: callable

    .. attribute:: Request.url
@@ -212,6 +214,69 @@ different fields from different pages::
        item['other_url'] = response.url
        return item

+
+.. _topics-request-response-ref-errbacks:
+
+Using errbacks to catch exceptions in request processing
+--------------------------------------------------------
+
+The errback of a request is a function that will be called when an exception
+is raised while processing it.
+
+It receives a `Twisted Failure`_ instance as first parameter and can be
+used to track connection establishment timeouts, DNS errors, etc.
+
+Here's an example spider logging all errors and catching some specific
+errors if needed::
+
+    import scrapy
+
+    from scrapy.spidermiddlewares.httperror import HttpError
+    from twisted.internet.error import DNSLookupError
+    from twisted.internet.error import TimeoutError, TCPTimedOutError
+
+    class ErrbackSpider(scrapy.Spider):
+        name = "errback_example"
+        start_urls = [
+            "http://www.httpbin.org/",              # HTTP 200 expected
+            "http://www.httpbin.org/status/404",    # Not found error
+            "http://www.httpbin.org/status/500",    # server issue
+            "http://www.httpbin.org:12345/",        # non-responding host, timeout expected
+            "http://www.httphttpbinbin.org/",       # DNS error expected
+        ]
+
+        def start_requests(self):
+            for u in self.start_urls:
+                yield scrapy.Request(u, callback=self.parse_httpbin,
+                                        errback=self.errback_httpbin,
+                                        dont_filter=True)
+
+        def parse_httpbin(self, response):
+            self.logger.info('Got successful response from {}'.format(response.url))
+            # do something useful here...
+
+        def errback_httpbin(self, failure):
+            # log all failures
+            self.logger.error(repr(failure))
+
+            # in case you want to do something special for some errors,
+            # you may need the failure's type:
+
+            if failure.check(HttpError):
+                # these exceptions come from HttpError spider middleware
+                # you can get the non-200 response
+                response = failure.value.response
+                self.logger.error('HttpError on %s', response.url)
+
+            elif failure.check(DNSLookupError):
+                # this is the original request
+                request = failure.request
+                self.logger.error('DNSLookupError on %s', request.url)
+
+            elif failure.check(TimeoutError, TCPTimedOutError):
+                request = failure.request
+                self.logger.error('TimeoutError on %s', request.url)
+
 .. _topics-request-meta:

 Request.meta special keys
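
Note on the errback example in the diff above: the branches in `errback_httpbin` rely on `failure.check(...)` returning the matching exception class, or `None` when none of the given classes match. The sketch below mimics that dispatch without Scrapy or Twisted installed, so the branching logic can be exercised offline. `StubFailure`, `LookupFailed`, `TimedOut`, `ConnectTimedOut` and `classify` are hypothetical names made up for illustration; `StubFailure` is only a stand-in that assumes the documented behaviour of Twisted's `Failure.check`, not the real class.

```python
class StubFailure:
    """Hypothetical test double; NOT twisted.python.failure.Failure."""

    def __init__(self, value, request=None):
        self.value = value        # the wrapped exception instance
        self.type = type(value)   # its class, used by check()
        self.request = request    # Scrapy attaches the original request here

    def check(self, *error_types):
        # Assumed to mirror Failure.check: return the first matching
        # class, or None when nothing matches.
        for error_type in error_types:
            if issubclass(self.type, error_type):
                return error_type
        return None


# Hypothetical exception classes standing in for DNSLookupError,
# TimeoutError and TCPTimedOutError from twisted.internet.error.
class LookupFailed(Exception):
    pass


class TimedOut(Exception):
    pass


class ConnectTimedOut(Exception):
    pass


def classify(failure):
    """Same if/elif shape as errback_httpbin, returning a label."""
    if failure.check(LookupFailed):
        return "dns"
    elif failure.check(TimedOut, ConnectTimedOut):
        return "timeout"
    return "other"


print(classify(StubFailure(LookupFailed("no such host"))))
print(classify(StubFailure(ConnectTimedOut("connect timed out"))))
print(classify(StubFailure(ValueError("unexpected"))))
```

In a real spider the same branches receive a genuine Twisted Failure, as in the diff; the stub is only useful for unit-testing the branching logic.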