looyolo/scrapy, commit 6fcf9dce
Authored Sep 25, 2014 by Mikhail Korobov
DOC document from_crawler method for item pipelines; add an example.
Parent: 7be3479c
Showing 1 changed file with 65 additions and 6 deletions.
docs/topics/item-pipeline.rst (+65, -6)
...
@@ -26,7 +26,7 @@ Writing your own item pipeline
 Writing your own item pipeline is easy. Each item pipeline component is a
 single Python class that must implement the following method:
 
-.. method:: process_item(item, spider)
+.. method:: process_item(self, item, spider)
 
    This method is called for every item pipeline component and must either return
    a :class:`~scrapy.item.Item` (or any descendant class) object or raise a
...
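For context (not part of this commit's diff), a minimal sketch of a pipeline that satisfies the process_item contract described above; the WordCountPipeline name and the text/word_count fields are hypothetical, and DropItem comes from scrapy.exceptions as shown later in this diff:

    from scrapy.exceptions import DropItem

    class WordCountPipeline(object):

        def process_item(self, item, spider):
            # Called for every item by every enabled pipeline component:
            # either return the (possibly modified) item or drop it.
            if not item.get('text'):
                raise DropItem("Missing text in %s" % item)
            item['word_count'] = len(item['text'].split())
            return item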
@@ -41,20 +41,31 @@ single Python class that must implement the following method:
 
 Additionally, they may also implement the following methods:
 
-.. method:: open_spider(spider)
+.. method:: open_spider(self, spider)
 
    This method is called when the spider is opened.
 
    :param spider: the spider which was opened
    :type spider: :class:`~scrapy.spider.Spider` object
 
-.. method:: close_spider(spider)
+.. method:: close_spider(self, spider)
 
    This method is called when the spider is closed.
 
    :param spider: the spider which was closed
    :type spider: :class:`~scrapy.spider.Spider` object
 
+.. method:: from_crawler(cls, crawler)
+
+   If present, this classmethod is called to create a pipeline instance
+   from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
+   of the pipeline. Crawler object provides access to all Scrapy core
+   components like settings and signals; it is a way for pipeline to
+   access them and hook its functionality into Scrapy.
+
+   :param crawler: crawler that uses this pipeline
+   :type crawler: :class:`~scrapy.crawler.Crawler` object
 
 Item pipeline example
 =====================
...
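The MongoPipeline added further down in this diff illustrates the settings side of from_crawler; as a hedged sketch of the signals side mentioned in that description (the ItemCounterPipeline name is hypothetical), a pipeline could also use the crawler to connect to Scrapy signals:

    from scrapy import signals

    class ItemCounterPipeline(object):

        def __init__(self):
            self.count = 0

        @classmethod
        def from_crawler(cls, crawler):
            # The crawler exposes settings and signals; here only signals
            # are used, hooking spider_closed to report a final count.
            pipeline = cls()
            crawler.signals.connect(pipeline.spider_closed,
                                    signal=signals.spider_closed)
            return pipeline

        def spider_closed(self, spider):
            spider.log("ItemCounterPipeline saw %d items" % self.count)

        def process_item(self, item, spider):
            self.count += 1
            return item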
@@ -62,9 +73,10 @@ Item pipeline example
 
 Price validation and dropping items with no prices
 --------------------------------------------------
 
-Let's take a look at the following hypothetical pipeline that adjusts the ``price``
-attribute for those items that do not include VAT (``price_excludes_vat``
-attribute), and drops those items which don't contain a price::
+Let's take a look at the following hypothetical pipeline that adjusts the
+``price`` attribute for those items that do not include VAT
+(``price_excludes_vat`` attribute), and drops those items which don't
+contain a price::
 
     from scrapy.exceptions import DropItem
...
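The body of that price pipeline is elided in this view; a rough sketch of a pipeline along the lines the paragraph describes (the PricePipeline name and the 1.15 VAT factor are assumptions, not necessarily the exact code in the docs):

    from scrapy.exceptions import DropItem

    class PricePipeline(object):

        vat_factor = 1.15  # assumed VAT multiplier, for illustration only

        def process_item(self, item, spider):
            if item.get('price'):
                # Adjust prices that were scraped without VAT included.
                if item.get('price_excludes_vat'):
                    item['price'] = item['price'] * self.vat_factor
                return item
            else:
                raise DropItem("Missing price in %s" % item)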
@@ -104,6 +116,53 @@ format::
 
    item pipelines. If you really want to store all scraped items into a JSON
    file you should use the :ref:`Feed exports <topics-feed-exports>`.
 
+Write items to MongoDB
+----------------------
+
+In this example we'll write items to MongoDB_ using pymongo_.
+MongoDB address and database name are specified in Scrapy settings;
+MongoDB collection is named after item class.
+
+The main point of this example is to show how to use :meth:`from_crawler`
+method and how to clean up the resources properly.
+
+.. note::
+
+    Previous example (JsonWriterPipeline) doesn't clean up resources properly.
+    Fixing it is left as an exercise for the reader.
+
+::
+
+    import pymongo
+
+    class MongoPipeline(object):
+
+        def __init__(self, mongo_uri, mongo_db):
+            self.mongo_uri = mongo_uri
+            self.mongo_db = mongo_db
+
+        @classmethod
+        def from_crawler(cls, crawler):
+            return cls(
+                mongo_uri=crawler.settings.get('MONGO_URI'),
+                mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
+            )
+
+        def open_spider(self, spider):
+            self.client = pymongo.MongoClient(self.mongo_uri)
+            self.db = self.client[self.mongo_db]
+
+        def close_spider(self, spider):
+            self.client.close()
+
+        def process_item(self, item, spider):
+            collection_name = item.__class__.__name__
+            self.db[collection_name].insert(dict(item))
+            return item
+
+.. _MongoDB: http://www.mongodb.org/
+.. _pymongo: http://api.mongodb.org/python/current/
+
 Duplicates filter
 -----------------
...
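As a usage note for the MongoPipeline example above: the pipeline is driven entirely by settings, so enabling it would look roughly like this in a project's settings.py (the myproject.pipelines.MongoPipeline path and the priority value 300 are assumptions):

    # settings.py (sketch)
    ITEM_PIPELINES = {
        'myproject.pipelines.MongoPipeline': 300,
    }

    MONGO_URI = 'mongodb://localhost:27017'
    MONGO_DATABASE = 'scraping'

If MONGO_DATABASE is omitted, the pipeline falls back to the 'items' database, per the default in its from_crawler method.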