looyolo / scrapy

Commit bc0f481a
Authored Sep 21, 2014 by Mikhail Korobov

DOC bring back notes about multiple spiders per process because it is now documented how to do that

Parent: a122fdbf
Showing 2 changed files with 26 additions and 6 deletions (+26, -6)

docs/topics/leaks.rst      +24  -6
docs/topics/practices.rst   +2  -0
docs/topics/leaks.rst
...
@@ -32,13 +32,16 @@ and that effectively bounds the lifetime of those referenced objects to the
lifetime of the Request. This is, by far, the most common cause of memory leaks
in Scrapy projects, and a quite difficult one to debug for newcomers.
+In big projects, the spiders are typically written by different people and some
+of those spiders could be "leaking" and thus affecting the rest of the other
+(well-written) spiders when they get to run concurrently, which, in turn,
+affects the whole crawling process.
The leak could also come from a custom middleware, pipeline or extension that
you have written, if you are not releasing the (previously allocated) resources
-properly.
-It's hard to avoid the reasons that cause these leaks
-without restricting the power of the framework, so we have decided not to
-restrict the functionally but provide useful tools for debugging these leaks.
+properly. For example, allocating resources on :signal:`spider_opened`
+but not releasing them on :signal:`spider_closed` may cause problems if
+you're running :ref:`multiple spiders per process <run-multiple-spiders>`.
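To illustrate the pattern the added note describes, here is a minimal sketch of a hypothetical extension that allocates a per-spider resource on :signal:`spider_opened` and releases it on :signal:`spider_closed`; the class and its cache are invented for illustration, while the ``from_crawler``/``signals.connect`` wiring is the standard Scrapy extension API::

    from scrapy import signals

    class PerSpiderCacheExtension(object):
        """Hypothetical extension keeping one cache per running spider."""

        def __init__(self):
            self.caches = {}  # spider -> allocated resource

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls()
            crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
            crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
            return ext

        def spider_opened(self, spider):
            # allocate the resource when the spider starts
            self.caches[spider] = {}

        def spider_closed(self, spider):
            # release it when the spider finishes; skipping this keeps the
            # reference alive for the whole process, which reads as a leak
            # when several spiders share that process
            self.caches.pop(spider, None)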
.. _topics-leaks-trackrefs:
...
@@ -64,7 +67,10 @@ alias to the :func:`~scrapy.utils.trackref.print_live_refs` function::
FormRequest 878 oldest: 7s ago
As you can see, that report also shows the "age" of the oldest object in each
-class.
+class. If you're running multiple spiders per process chances are you can
+figure out which spider is leaking by looking at the oldest request or response.
You can get the oldest object of each class using the
:func:`~scrapy.utils.trackref.get_oldest` function (from the telnet console).
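To make that concrete, a session in the telnet console might look roughly like this; the spider name, counts, ages and URL are invented for illustration, while ``prefs()`` and :func:`~scrapy.utils.trackref.get_oldest` are the helpers the paragraph refers to::

    $ telnet localhost 6023            # default telnet console port
    >>> prefs()
    Live References
    SomenastySpider                 1   oldest: 907s ago
    HtmlResponse                   12   oldest: 1s ago
    Request                      1204   oldest: 903s ago
    >>> from scrapy.utils.trackref import get_oldest
    >>> get_oldest('Request').url      # the ~900s-old request points at the leaking spider
    'http://www.somenastyspider.com/product.php?pid=123'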
Which objects are tracked?
--------------------------
...
@@ -130,6 +136,18 @@ can use the :func:`scrapy.utils.trackref.iter_all` function::
'http://www.somenastyspider.com/product.php?pid=584',
...
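The literal block above is truncated by the diff; a call producing output of that shape would look roughly like this (the class name and URLs are the illustrative ones used throughout this page)::

    >>> from scrapy.utils.trackref import iter_all
    >>> [r.url for r in iter_all('Request')]
    ['http://www.somenastyspider.com/product.php?pid=123',
     'http://www.somenastyspider.com/product.php?pid=584',
     ...]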
+Too many spiders?
+-----------------
+
+If your project has too many spiders executed in parallel,
+the output of :func:`prefs()` can be difficult to read.
+For this reason, that function has an ``ignore`` argument which can be used to
+ignore a particular class (and all its subclasses). For
+example, this won't show any live references to spiders::
+
+    >>> from scrapy.spider import Spider
+    >>> prefs(ignore=Spider)
.. module:: scrapy.utils.trackref
:synopsis: Track references of live objects
...
docs/topics/practices.rst
...
@@ -69,6 +69,8 @@ the spider class as first argument in the :meth:`CrawlerRunner.crawl
.. seealso:: `Twisted Reactor Overview`_.
+.. _run-multiple-spiders:
Running multiple spiders in the same process
============================================
...
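That new ``run-multiple-spiders`` label is what the leaks.rst additions above link to. As a minimal sketch of what the target section documents, two spiders can share one process via :class:`~scrapy.crawler.CrawlerRunner`; the project module and spider classes below are placeholders, and ``CrawlerRunner.join`` follows the API as documented in later Scrapy releases::

    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    from myproject.spiders import Spider1, Spider2   # placeholder spiders

    runner = CrawlerRunner(get_project_settings())
    runner.crawl(Spider1)            # schedule both crawls in the same process
    runner.crawl(Spider2)
    d = runner.join()                # fires when both crawls have finished
    d.addBoth(lambda _: reactor.stop())
    reactor.run()                    # blocks until then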