提交 d3ced85e 编写于 作者: E Elias Dorneles

Merge pull request #1995 from scrapy/docs-errback

DOC Add info and example on errbacks
......@@ -117,6 +117,8 @@ Request objects
raised while processing the request. This includes pages that failed
with 404 HTTP errors and such. It receives a `Twisted Failure`_ instance
as first parameter.
For more information,
see :ref:`topics-request-response-ref-errbacks` below.
:type errback: callable
.. attribute:: Request.url
......@@ -212,6 +214,69 @@ different fields from different pages::
item['other_url'] = response.url
return item
.. _topics-request-response-ref-errbacks:
Using errbacks to catch exceptions in request processing
--------------------------------------------------------
The errback of a request is a function that will be called when an exception
is raise while processing it.
It receives a `Twisted Failure`_ instance as first parameter and can be
used to track connection establishment timeouts, DNS errors etc.
Here's an example spider logging all errors and catching some specific
errors if needed::
import scrapy
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError
from twisted.internet.error import TimeoutError, TCPTimedOutError
class ErrbackSpider(scrapy.Spider):
name = "errback_example"
start_urls = [
"http://www.httpbin.org/", # HTTP 200 expected
"http://www.httpbin.org/status/404", # Not found error
"http://www.httpbin.org/status/500", # server issue
"http://www.httpbin.org:12345/", # non-responding host, timeout expected
"http://www.httphttpbinbin.org/", # DNS error expected
]
def start_requests(self):
for u in self.start_urls:
yield scrapy.Request(u, callback=self.parse_httpbin,
errback=self.errback_httpbin,
dont_filter=True)
def parse_httpbin(self, response):
self.logger.info('Got successful response from {}'.format(response.url))
# do something useful here...
def errback_httpbin(self, failure):
# log all failures
self.logger.error(repr(failure))
# in case you want to do something special for some errors,
# you may need the failure's type:
if failure.check(HttpError):
# these exceptions come from HttpError spider middleware
# you can get the non-200 response
response = failure.value.response
self.logger.error('HttpError on %s', response.url)
elif failure.check(DNSLookupError):
# this is the original request
request = failure.request
self.logger.error('DNSLookupError on %s', request.url)
elif failure.check(TimeoutError, TCPTimedOutError):
request = failure.request
self.logger.error('TimeoutError on %s', request.url)
.. _topics-request-meta:
Request.meta special keys
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册