Commit d3ced85e
Authored May 18, 2016 by Elias Dorneles

    Merge pull request #1995 from scrapy/docs-errback

    DOC Add info and example on errbacks

Parents: c2c8036a, b3367c7a

Showing 1 changed file with 65 additions and 0 deletions

docs/topics/request-response.rst  +65  -0
...
...
@@ -117,6 +117,8 @@ Request objects

       raised while processing the request. This includes pages that failed
       with 404 HTTP errors and such. It receives a `Twisted Failure`_ instance
       as first parameter. For more information, see
       :ref:`topics-request-response-ref-errbacks` below.
    :type errback: callable

.. attribute:: Request.url
...
...
@@ -212,6 +214,69 @@ different fields from different pages::

            item['other_url'] = response.url
            return item
.. _topics-request-response-ref-errbacks:

Using errbacks to catch exceptions in request processing
--------------------------------------------------------

The errback of a request is a function that will be called when an exception
is raised while processing it.

It receives a `Twisted Failure`_ instance as its first parameter and can be
used to track connection establishment timeouts, DNS errors, etc.

Here's an example spider logging all errors and catching some specific
errors if needed::
    import scrapy

    from scrapy.spidermiddlewares.httperror import HttpError
    from twisted.internet.error import DNSLookupError
    from twisted.internet.error import TimeoutError, TCPTimedOutError

    class ErrbackSpider(scrapy.Spider):
        name = "errback_example"
        start_urls = [
            "http://www.httpbin.org/",              # HTTP 200 expected
            "http://www.httpbin.org/status/404",    # Not found error
            "http://www.httpbin.org/status/500",    # server issue
            "http://www.httpbin.org:12345/",        # non-responding host, timeout expected
            "http://www.httphttpbinbin.org/",       # DNS error expected
        ]

        def start_requests(self):
            for u in self.start_urls:
                yield scrapy.Request(u, callback=self.parse_httpbin,
                                        errback=self.errback_httpbin,
                                        dont_filter=True)

        def parse_httpbin(self, response):
            self.logger.info('Got successful response from {}'.format(response.url))
            # do something useful here...

        def errback_httpbin(self, failure):
            # log all failures
            self.logger.error(repr(failure))

            # in case you want to do something special for some errors,
            # you may need the failure's type:

            if failure.check(HttpError):
                # these exceptions come from HttpError spider middleware
                # you can get the non-200 response
                response = failure.value.response
                self.logger.error('HttpError on %s', response.url)

            elif failure.check(DNSLookupError):
                # this is the original request
                request = failure.request
                self.logger.error('DNSLookupError on %s', request.url)

            elif failure.check(TimeoutError, TCPTimedOutError):
                request = failure.request
                self.logger.error('TimeoutError on %s', request.url)
.. _topics-request-meta:
Request.meta special keys
...
...
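The errback in the new docs branches on `failure.check(...)`, which returns the first matching exception class (or ``None``) so that one errback can dispatch on the failure's type. As a standalone sketch of that dispatch pattern, not part of this commit, here is a minimal pure-Python stand-in; ``FakeFailure``, ``HttpError``, ``DNSLookupError``, and ``classify`` are all hypothetical names modelling only the ``check()`` behaviour the example relies on, without requiring Twisted:

```python
class FakeFailure:
    """Hypothetical stand-in for twisted.python.failure.Failure.

    Models only the check() behaviour used by the errback example:
    check(*types) returns the first matching exception class, or None.
    """

    def __init__(self, exc):
        self.value = exc        # the wrapped exception instance
        self.type = type(exc)   # its class, used for matching

    def check(self, *exc_types):
        # Return the first class the wrapped exception matches, or None --
        # mirroring how the errback branches on failure.check(HttpError), etc.
        for t in exc_types:
            if issubclass(self.type, t):
                return t
        return None


# Hypothetical error classes standing in for the real ones in the example.
class HttpError(Exception):
    pass


class DNSLookupError(Exception):
    pass


def classify(failure):
    """Dispatch on failure type, the same shape as errback_httpbin."""
    if failure.check(HttpError):
        return "http"
    elif failure.check(DNSLookupError):
        return "dns"
    return "other"


print(classify(FakeFailure(DNSLookupError())))  # prints "dns"
```

Because ``check`` returns the matching class rather than a bare boolean, the branch that matched can also inspect ``failure.value`` (the wrapped exception) for details, as the real errback does with ``failure.value.response``.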