提交 358ec6e8 编写于 作者: E Edwin O Marshall

converted sep 7 for #629

上级 690081b4
======= =============================
SEP 7
Title ItemLoader processors library
Author Ismael Carnales
Created 2009-08-10
Status Draft
======= =============================
======================================
SEP-007: ItemLoader processors library
======================================
This SEP proposes a library of ``ItemLoader`` processor to ship with Scrapy.
date.py
=======
``to_date``
-----------
Converts a date string to a YYYY-MM-DD one suitable for ``DateField``
**Decision**: Obsolete. ``DateField`` doesn't exists anymore.
extraction.py
=============
``extract``
-----------
This adaptor tries to extract data from the given locations. Any
``XPathSelector`` in it will be extracted, and any other data will be added
as-is to the result.
**Decision**: Obsolete. Functionality included in ``XpathLoader``.
``ExtractImageLinks``
This adaptor may receive either XPathSelectors pointing to the desired
locations for finding image urls, or just a list of XPath expressions (which
will be turned into selectors anyway).
**Decision**: XXX
markup.py
=========
``remove_tags``
---------------
Factory that returns an adaptor for removing each tag in the ``tags`` parameter
found in the given value. If no ``tags`` are specified, all of them are
removed.
**Decision**: XXX
``remove_root``
---------------
This adaptor removes the root tag of the given string/unicode, if it's found.
**Decision**: XXX
``replace_escape``
------------------
Factory that returns an adaptor for removing/replacing each escape character in
the ``wich_ones`` parameter found in the given value.
**Decision**: XXX
``unquote``
-----------
This factory returns an adaptor that receives a string or unicode, removes all
of the CDATAs and entities (except the ones in CDATAs, and the ones you specify
in the ``keep`` parameter) and then, returns a new string or unicode.
**Decision**: XXX
misc.py
=======
``to_unicode``
--------------
Receives a string and converts it to unicode using the given encoding (if
specified, else utf-8 is used) and returns a new unicode object. E.g:
::
>> to_unicode('it costs 20\xe2\x82\xac, or 30\xc2\xa3')
[u'it costs 20\u20ac, or 30\xa3']
**Decision**: XXX
``clean_spaces``
----------------
Converts multispaces into single spaces for the given string. E.g:
::
>> clean_spaces(u'Hello sir')
u'Hello sir'
**Decision**: XXX
``drop_empty``
--------------
Removes any index that evaluates to None from the provided iterable. E.g:
::
>> drop_empty([0, 'this', None, 'is', False, 'an example'])
['this', 'is', 'an example']
**Decision**: Obsolete. Functionality included in reducers.
``delist``
----------
This factory returns and adaptor that joins an iterable with the specified
delimiter.
**Decision**: Obsolete. Functionality included in reducers.
``Regex``
----------
This adaptor must receive either a list of strings or an XPathSelector and
return a new list with the matches of the given strings with the given regular
expression (which is passed by a keyword argument, and is mandatory for this
adaptor).
**Decision**: XXX
= SEP-007: !ItemLoader processors library =
[[PageOutline(2-5, Contents)]]
||'''SEP'''||7||
||'''Title'''||!ItemLoader processors library||
||'''Author'''||Ismael Carnales||
||'''Created'''||2009-08-10||
||'''Status'''||Draft||
== Introduction ==
This SEP proposes a library of !ItemLoader processor to ship with Scrapy.
== date.py ==
=== `to_date` ===
Converts a date string to a YYYY-MM-DD one suitable for !DateField
'''Decision''': Obsolete. !DateField doesn't exists anymore.
== extraction.py ==
=== `extract` ===
This adaptor tries to extract data from the given locations. Any XPathSelector in it will be extracted, and any other data will be added as-is to the result.
'''Decision''': Obsolete. Functionality included in !XpathLoader.
=== `ExtractImageLinks` ===
This adaptor may receive either XPathSelectors pointing to the desired locations for finding image urls, or just a list of XPath expressions (which will be turned into selectors anyway).
'''Decision''': XXX
== markup.py ==
=== `remove_tags` ===
Factory that returns an adaptor for removing each tag in the `tags` parameter found in the given value. If no `tags` are specified, all of them are removed.
'''Decision''': XXX
=== `remove_root` ===
This adaptor removes the root tag of the given string/unicode, if it's found.
'''Decision''': XXX
=== `replace_escape` ===
Factory that returns an adaptor for removing/replacing each escape character in the `wich_ones` parameter found in the given value.
'''Decision''': XXX
=== `unquote` ===
This factory returns an adaptor that receives a string or unicode, removes all of the CDATAs and entities (except the ones in CDATAs, and the ones you specify in the `keep` parameter) and then, returns a new string or unicode.
'''Decision''': XXX
== misc.py ==
=== `to_unicode` ===
Receives a string and converts it to unicode using the given encoding (if specified, else utf-8 is used) and returns a new unicode object. E.g:
{{{
>> to_unicode('it costs 20\xe2\x82\xac, or 30\xc2\xa3')
[u'it costs 20\u20ac, or 30\xa3']
}}}
'''Decision''': XXX
=== `clean_spaces` ===
Converts multispaces into single spaces for the given string. E.g:
{{{
>> clean_spaces(u'Hello sir')
u'Hello sir'
}}}
'''Decision''': XXX
=== `drop_empty` ===
Removes any index that evaluates to None from the provided iterable. E.g:
{{{
>> drop_empty([0, 'this', None, 'is', False, 'an example'])
['this', 'is', 'an example']
}}}
'''Decision''': Obsolete. Functionality included in reducers.
=== `delist` ===
This factory returns and adaptor that joins an iterable with the specified delimiter.
'''Decision''': Obsolete. Functionality included in reducers.
=== `Regex` ===
This adaptor must receive either a list of strings or an XPathSelector and return a new list with the matches of the given strings with the given regular expression (which is passed by a keyword argument, and is mandatory for this adaptor).
'''Decision''': XXX
\ No newline at end of file
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册