Commit 6ab99018
Authored on Feb 22, 2021 by Adrián Chaves
Parent: ec836dcc

Document get_retry_requests

Showing 3 changed files with 59 additions and 21 deletions (+59 -21)
docs/topics/downloader-middleware.rst    +17  -0
docs/topics/settings.rst                 +0   -14
scrapy/downloadermiddlewares/retry.py    +42  -7
docs/topics/downloader-middleware.rst @ 6ab99018
@@ -892,6 +892,11 @@ settings (see the settings documentation for more info):

 If :attr:`Request.meta <scrapy.http.Request.meta>` has ``dont_retry`` key
 set to True, the request will be ignored by this middleware.

+To retry requests from a spider callback, you can use the
+:func:`get_retry_request` function:
+
+.. autofunction:: get_retry_request
+
 RetryMiddleware Settings
 ~~~~~~~~~~~~~~~~~~~~~~~~

@@ -932,6 +937,18 @@
 In some cases you may want to add 400 to :setting:`RETRY_HTTP_CODES` because
 it is a common code used to indicate server overload. It is not included by
 default because HTTP specs say so.

+.. setting:: RETRY_PRIORITY_ADJUST
+
+RETRY_PRIORITY_ADJUST
+---------------------
+
+Default: ``-1``
+
+Adjust retry request priority relative to original request:
+
+- a positive priority adjust means higher priority.
+- **a negative priority adjust (default) means lower priority.**
+
 .. _topics-dlmw-robots:
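The ``RETRY_PRIORITY_ADJUST`` section added above says the default of ``-1`` schedules retries below the original request's priority. A minimal sketch of that arithmetic (a hypothetical helper for illustration, not part of Scrapy's API):

```python
# Hypothetical illustration of the RETRY_PRIORITY_ADJUST behaviour
# documented above; this helper is not part of Scrapy itself.

def adjust_retry_priority(original_priority: int, priority_adjust: int = -1) -> int:
    """Return the scheduling priority for a retry of a request.

    With the documented default of -1, the retry runs at a lower
    priority than the original request; a positive adjust raises it.
    """
    return original_priority + priority_adjust
```

With the default adjust, a request at priority 0 is retried at priority -1, so fresh requests are preferred over retries.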
docs/topics/settings.rst @ 6ab99018
@@ -1188,20 +1188,6 @@
 Adjust redirect request priority relative to original request:

 - **a positive priority adjust (default) means higher priority.**
 - a negative priority adjust means lower priority.

-.. setting:: RETRY_PRIORITY_ADJUST
-
-RETRY_PRIORITY_ADJUST
----------------------
-
-Default: ``-1``
-
-Scope: ``scrapy.downloadermiddlewares.retry.RetryMiddleware``
-
-Adjust retry request priority relative to original request:
-
-- a positive priority adjust means higher priority.
-- **a negative priority adjust (default) means lower priority.**
-
 .. setting:: ROBOTSTXT_OBEY

 ROBOTSTXT_OBEY
...
scrapy/downloadermiddlewares/retry.py
浏览文件 @
6ab99018
...
...
@@ -39,16 +39,51 @@ def get_retry_request(
     max_retry_times=None,
     priority_adjust=None,
 ):
+    """
+    Returns a new :class:`~scrapy.Request` object to retry the specified
+    request, or ``None`` if retries of the specified request have been
+    exhausted.
+
+    For example, in a :class:`~scrapy.Spider` callback, you could use it as
+    follows::
+
+        def parse(self, response):
+            if not response.text:
+                new_request = get_retry_request(
+                    response.request,
+                    spider=self,
+                    reason='empty',
+                )
+                if new_request:
+                    yield new_request
+                return
+
+    *spider* is the :class:`~scrapy.Spider` instance which is asking for the
+    retry request. It is used to access the :ref:`settings <topics-settings>`
+    and :ref:`stats <topics-stats>`, and to provide extra logging context (see
+    :func:`logging.debug`).
+
+    *reason* is a string or an :class:`Exception` object that indicates the
+    reason why the request needs to be retried. It is used to name retry stats.
+
+    *max_retry_times* is a number that determines the maximum number of times
+    that *request* can be retried. If not specified or ``None``, the number is
+    read from the :reqmeta:`max_retry_times` meta key of the request. If the
+    :reqmeta:`max_retry_times` meta key is not defined or ``None``, the number
+    is read from the :setting:`RETRY_TIMES` setting.
+
+    *priority_adjust* is a number that determines how the priority of the new
+    request changes in relation to *request*. If not specified, the number is
+    read from the :setting:`RETRY_PRIORITY_ADJUST` setting.
+    """
     settings = spider.crawler.settings
     stats = spider.crawler.stats
     retry_times = request.meta.get('retry_times', 0) + 1
-    request_max_retry_times = request.meta.get(
-        'max_retry_times',
-        max_retry_times,
-    )
-    if request_max_retry_times is None:
-        request_max_retry_times = settings.getint('RETRY_TIMES')
-    if retry_times <= request_max_retry_times:
+    if max_retry_times is None:
+        max_retry_times = request.meta.get('max_retry_times')
+        if max_retry_times is None:
+            max_retry_times = settings.getint('RETRY_TIMES')
+    if retry_times <= max_retry_times:
         logger.debug(
             "Retrying %(request)s (failed %(retry_times)d times): %(reason)s",
             {'request': request, 'retry_times': retry_times, 'reason': reason},
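The docstring added in this hunk documents a three-step lookup order for the retry limit: the ``max_retry_times`` argument, then the request's ``max_retry_times`` meta key, then the ``RETRY_TIMES`` setting. That precedence can be sketched in isolation (a hypothetical standalone helper mirroring the new logic, not part of Scrapy):

```python
# Hypothetical standalone sketch of the lookup order documented above
# for the maximum number of retries; not part of Scrapy itself.

def resolve_max_retry_times(max_retry_times, request_meta, settings):
    """Resolve the retry limit: explicit argument first, then the
    request's 'max_retry_times' meta key, then the RETRY_TIMES setting."""
    if max_retry_times is None:
        max_retry_times = request_meta.get('max_retry_times')
        if max_retry_times is None:
            # 2 mirrors Scrapy's documented default for RETRY_TIMES.
            max_retry_times = int(settings.get('RETRY_TIMES', 2))
    return max_retry_times
```

The nested ``is None`` checks match the shape of the added code in the hunk, so an explicit argument always wins and per-request meta overrides the project-wide setting.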