提交 ad14cf05 编写于 作者: D Daniel Graña

Merge pull request #597 from chekunkov/dupefilter_request_fingerprint_test_and_docs

Dupefilter request_fingerprint test and docs
......@@ -408,7 +408,11 @@ Default: ``'scrapy.dupefilter.RFPDupeFilter'``
The class used to detect and filter duplicate requests.
The default (``RFPDupeFilter``) filters based on request fingerprint using
the ``scrapy.utils.request.request_fingerprint`` function.
the ``scrapy.utils.request.request_fingerprint`` function. In order to change
the way duplicates are checked you could subclass ``RFPDupeFilter`` and
override its ``request_fingerprint`` method. This method should accept
scrapy :class:`~scrapy.http.Request` object and return its fingerprint
(a string).
.. setting:: DUPEFILTER_DEBUG
......
import hashlib
import unittest
from scrapy.http import Request
from scrapy.dupefilter import RFPDupeFilter
from scrapy.http import Request
class RFPDupeFilterTest(unittest.TestCase):
def test_filter(self):
filter = RFPDupeFilter()
filter.open()
dupefilter = RFPDupeFilter()
dupefilter.open()
r1 = Request('http://scrapytest.org/1')
r2 = Request('http://scrapytest.org/2')
r3 = Request('http://scrapytest.org/2')
assert not filter.request_seen(r1)
assert filter.request_seen(r1)
assert not dupefilter.request_seen(r1)
assert dupefilter.request_seen(r1)
assert not dupefilter.request_seen(r2)
assert dupefilter.request_seen(r3)
dupefilter.close('finished')
def test_request_fingerprint(self):
"""Test if customization of request_fingerprint method will change
output of request_seen.
"""
r1 = Request('http://scrapytest.org/index.html')
r2 = Request('http://scrapytest.org/INDEX.html')
dupefilter = RFPDupeFilter()
dupefilter.open()
assert not dupefilter.request_seen(r1)
assert not dupefilter.request_seen(r2)
dupefilter.close('finished')
class CaseInsensitiveRFPDupeFilter(RFPDupeFilter):
def request_fingerprint(self, request):
fp = hashlib.sha1()
fp.update(request.url.lower())
return fp.hexdigest()
case_insensitive_dupefilter = CaseInsensitiveRFPDupeFilter()
case_insensitive_dupefilter.open()
assert not filter.request_seen(r2)
assert filter.request_seen(r3)
assert not case_insensitive_dupefilter.request_seen(r1)
assert case_insensitive_dupefilter.request_seen(r2)
filter.close('finished')
case_insensitive_dupefilter.close('finished')
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册