=======  =========================================
SEP      06
Title    Extractors
Author   Ismael Carnales and a bunch of rabid mice
Created  2009-07-28
Status   Obsolete (discarded)
=======  =========================================

==========================================
SEP-006: Rename of Selectors to Extractors
==========================================

This SEP proposes more meaningful names for XPathSelectors (or "Selectors") and
their ``x`` method.

Motivation
==========

When you use Selectors in Scrapy, your final goal is to "extract" the data that
you've selected, as the `XPath Selectors documentation
<https://doc.scrapy.org/en/latest/topics/selectors.html>`_ says (emphasis mine):

   When you’re scraping web pages, the most common task you need to perform is
   to **extract** data from the HTML source.

..

   Scrapy comes with its own mechanism for **extracting** data. They’re called
   ``XPath`` selectors (or just “selectors”, for short) because they “select”
   certain parts of the HTML document specified by ``XPath`` expressions.
E
Edwin O Marshall 已提交
30

..

   To actually **extract** the textual data you must call the selector
   ``extract()`` method, as follows

..

   Selectors also have a ``re()`` method for **extracting** data using regular
   expressions.

..

   For example, suppose you want to **extract** all <p> elements inside <div>
   elements. First you would get all <div> elements

Rationale
=========

Since there is no ``Extractor`` object in Scrapy, and what you ultimately want
to do with ``Selectors`` is extract data, we propose renaming ``Selectors`` to
``Extractors``. (Saying that in Scrapy you use selectors for extracting is
really weird :) )

Additional changes
==================

The name of the method that performs selection (the ``x`` method) is neither
descriptive nor mnemonic enough, and it clearly clashes with the ``extract``
method (x sounds like short for "extract" in English). We therefore propose
renaming it to ``select``, ``sel`` (if shortness is required), or ``xpath``,
after `lxml's <http://codespeak.net/lxml/xpathxslt.html>`_ ``xpath`` method.
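
As a rough illustration of such a rename, the sketch below shows one way an
``xpath`` method could coexist with a deprecated ``x`` alias during a
transition. The ``Extractor`` class and its ``from_text`` helper are
hypothetical, and the sketch is built on the standard library's
``xml.etree.ElementTree`` (which supports only a subset of XPath). It is not
Scrapy's actual implementation.

.. code-block:: python

    import warnings
    import xml.etree.ElementTree as ET


    class Extractor:
        """Hypothetical stand-in for a renamed selector (not Scrapy's API)."""

        def __init__(self, element):
            self._element = element

        @classmethod
        def from_text(cls, text):
            # Assumption: the input is well-formed XML/XHTML.
            return cls(ET.fromstring(text))

        def xpath(self, path):
            # Proposed descriptive name; ElementTree handles a limited
            # XPath subset only.
            return [Extractor(e) for e in self._element.findall(path)]

        def x(self, path):
            # Old short name kept as a deprecated alias of xpath().
            warnings.warn("x() is deprecated, use xpath()",
                          DeprecationWarning, stacklevel=2)
            return self.xpath(path)

        def extract(self):
            # Serialize the selected element back to markup.
            return ET.tostring(self._element, encoding="unicode")


    doc = Extractor.from_text(
        "<html><body><div><p>one</p><p>two</p></div></body></html>"
    )
    paragraphs = [p.extract() for p in doc.xpath(".//div/p")]

With the alias in place, existing callers of ``x`` would keep working while
new code reads as "select with ``xpath``, then ``extract``".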

Bonus (ItemBuilder)
===================

After this renaming, we also propose renaming ``ItemBuilder`` to
``ItemExtractor``, because the ``ItemBuilder``/``Extractor`` will act as a
bridge between a set of ``Extractors`` and an ``Item``, and because it will
literally "extract" an item from a web page or set of pages.

References
==========

 1. XPath Selectors (https://doc.scrapy.org/topics/selectors.html)
 2. XPath and XSLT with lxml (http://codespeak.net/lxml/xpathxslt.html)