Commit 6ab99018 authored by Adrián Chaves

Document get_retry_requests

Parent ec836dcc
@@ -892,6 +892,11 @@ settings (see the settings documentation for more info):
If :attr:`Request.meta <scrapy.http.Request.meta>` has the ``dont_retry`` key
set to ``True``, the request will be ignored by this middleware.

To retry requests from a spider callback, you can use the
:func:`get_retry_request` function:

.. autofunction:: get_retry_request

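As a standalone illustration of that contract, the sketch below stubs a simplified ``get_retry_request`` (the stub's defaults and dict-based requests are assumptions for the example; the real function lives in ``scrapy.downloadermiddlewares.retry`` and works with ``Request`` objects):

```python
# Stub mimicking the documented contract of get_retry_request: return a
# retry copy of the request, or None once retries are exhausted.
# The cap of 2 and the dict-based "request" are assumptions for this sketch.
def get_retry_request(request, spider=None, reason='unspecified',
                      max_retry_times=2, priority_adjust=-1):
    retry_times = request['meta'].get('retry_times', 0) + 1
    if retry_times > max_retry_times:
        return None  # retries exhausted, as documented
    return {
        **request,
        'meta': {**request['meta'], 'retry_times': retry_times},
        'priority': request['priority'] + priority_adjust,
    }

request = {'url': 'https://example.com', 'meta': {}, 'priority': 0}
first = get_retry_request(request, reason='empty')
second = get_retry_request(first, reason='empty')
third = get_retry_request(second, reason='empty')
print(first['meta']['retry_times'], second['meta']['retry_times'], third)
# -> 1 2 None
```

Each retry carries an incremented ``retry_times`` meta value, and the third attempt returns ``None`` because the cap of 2 has been reached, which is when a real callback would stop yielding.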
RetryMiddleware Settings
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -932,6 +937,18 @@ In some cases you may want to add 400 to :setting:`RETRY_HTTP_CODES` because
it is a common code used to indicate server overload. It is not included by
default because HTTP specs say so.
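A possible opt-in in ``settings.py`` (a sketch; the base list of codes shown here is an assumption and may differ between Scrapy versions, so extend your version's actual default rather than copying this list verbatim):

```python
# settings.py -- sketch: also retry HTTP 400 responses.
# The leading codes are an assumed default list; verify against your
# Scrapy version before copying.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 400]
```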
.. setting:: RETRY_PRIORITY_ADJUST

RETRY_PRIORITY_ADJUST
---------------------

Default: ``-1``

Adjust retry request priority relative to original request:

- a positive priority adjust means higher priority.
- **a negative priority adjust (default) means lower priority.**
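The adjustment is simply added to the priority of the request being retried; a minimal sketch of the arithmetic (``adjusted_priority`` is a hypothetical helper for illustration, not Scrapy API):

```python
# Hypothetical helper illustrating how RETRY_PRIORITY_ADJUST is applied:
# the setting's value is added to the original request's priority.
def adjusted_priority(original_priority, retry_priority_adjust=-1):
    # With the default of -1, every retry is scheduled at a lower
    # priority than the request it retries.
    return original_priority + retry_priority_adjust

print(adjusted_priority(0))                            # -> -1
print(adjusted_priority(0, retry_priority_adjust=2))   # -> 2
```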
.. _topics-dlmw-robots:
......
@@ -1188,20 +1188,6 @@ Adjust redirect request priority relative to original request:

- **a positive priority adjust (default) means higher priority.**
- a negative priority adjust means lower priority.
.. setting:: RETRY_PRIORITY_ADJUST

RETRY_PRIORITY_ADJUST
---------------------

Default: ``-1``

Scope: ``scrapy.downloadermiddlewares.retry.RetryMiddleware``

Adjust retry request priority relative to original request:

- a positive priority adjust means higher priority.
- **a negative priority adjust (default) means lower priority.**
.. setting:: ROBOTSTXT_OBEY

ROBOTSTXT_OBEY
......
@@ -39,16 +39,51 @@ def get_retry_request(
max_retry_times=None,
priority_adjust=None,
):
"""
Returns a new :class:`~scrapy.Request` object to retry the specified
request, or ``None`` if retries of the specified request have been
exhausted.

For example, in a :class:`~scrapy.Spider` callback, you could use it as
follows::

    def parse(self, response):
        if not response.text:
            new_request = get_retry_request(
                response.request,
                spider=self,
                reason='empty',
            )
            if new_request:
                yield new_request
            return

*spider* is the :class:`~scrapy.Spider` instance which is asking for the
retry request. It is used to access the :ref:`settings <topics-settings>`
and :ref:`stats <topics-stats>`, and to provide extra logging context (see
:func:`logging.debug`).

*reason* is a string or an :class:`Exception` object that indicates the
reason why the request needs to be retried. It is used to name retry stats.

*max_retry_times* is a number that determines the maximum number of times
that *request* can be retried. If not specified or ``None``, the number is
read from the :reqmeta:`max_retry_times` meta key of the request. If the
:reqmeta:`max_retry_times` meta key is not defined or ``None``, the number
is read from the :setting:`RETRY_TIMES` setting.

*priority_adjust* is a number that determines how the priority of the new
request changes in relation to *request*. If not specified, the number is
read from the :setting:`RETRY_PRIORITY_ADJUST` setting.
"""
settings = spider.crawler.settings
stats = spider.crawler.stats
retry_times = request.meta.get('retry_times', 0) + 1
-    request_max_retry_times = request.meta.get(
-        'max_retry_times',
-        max_retry_times,
-    )
-    if request_max_retry_times is None:
-        request_max_retry_times = settings.getint('RETRY_TIMES')
-    if retry_times <= request_max_retry_times:
+    if max_retry_times is None:
+        max_retry_times = request.meta.get('max_retry_times')
+        if max_retry_times is None:
+            max_retry_times = settings.getint('RETRY_TIMES')
+    if retry_times <= max_retry_times:
logger.debug(
"Retrying %(request)s (failed %(retry_times)d times): %(reason)s",
{'request': request, 'retry_times': retry_times, 'reason': reason},
......