Commit 6cbd92fa, authored by Paul Tremberth

Add note on how to choose parser with BeautifulSoup

Parent: 7978237e
......@@ -26,15 +26,17 @@ comparing `jinja2`_ to `Django`_.
.. _jinja2: http://jinja.pocoo.org/
.. _Django: https://www.djangoproject.com/
Can I use Scrapy with BeautifulSoup?
------------------------------------
Yes, you can.
As mentioned :ref:`above <faq-scrapy-bs-cmp>`, `BeautifulSoup`_ can be used
for parsing HTML responses in Scrapy callbacks.
You just have to feed the response's decoded body (``response.text``) into a
``BeautifulSoup`` object and extract whatever data you need from it.

Here's an example spider using the BeautifulSoup API, with ``lxml`` as the HTML parser
(``lxml`` is also the parser behind Scrapy/parsel selectors, so the raw HTML parsing
step is comparably fast)::

    from bs4 import BeautifulSoup
......@@ -49,12 +51,19 @@ Here's an example spider using ``lxml`` parser with BeautifulSoup API::
        )

        def parse(self, response):
            # use lxml to get decent HTML parsing speed
            soup = BeautifulSoup(response.text, 'lxml')
            yield {
                "url": response.url,
                "title": soup.h1.string
            }

.. note::

    ``BeautifulSoup`` supports several HTML/XML parsers.
    See `BeautifulSoup's official documentation`_ for which ones are available.

.. _BeautifulSoup's official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use

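
To keep the parser choice explicit without hard-requiring ``lxml``, a project could
pick the first installed backend and fall back to Python's built-in ``html.parser``.
The following is only a minimal sketch under that assumption: ``pick_parser`` and
``PARSER_MODULES`` are hypothetical helpers, not part of Scrapy or BeautifulSoup, and
the mapping assumes the ``lxml`` and ``html5lib`` backends are importable under those
module names::

    import importlib.util

    # Hypothetical mapping: BeautifulSoup parser name -> module that must be
    # importable for that parser to work (None = bundled with the stdlib).
    PARSER_MODULES = {
        "lxml": "lxml",
        "html5lib": "html5lib",
        "html.parser": None,
    }


    def pick_parser(preferred=("lxml", "html5lib", "html.parser")):
        """Return the first parser name in ``preferred`` whose backend is installed."""
        for name in preferred:
            if name not in PARSER_MODULES:
                raise ValueError(f"unknown parser: {name!r}")
            module = PARSER_MODULES[name]
            # find_spec() checks that a top-level module is importable
            # without actually importing it
            if module is None or importlib.util.find_spec(module) is not None:
                return name
        raise RuntimeError("none of the preferred parsers is installed")

In a callback this would be used as ``BeautifulSoup(response.text, pick_parser())``.
Passing a parser name explicitly also avoids the warning BeautifulSoup emits when no
parser is specified. Keep in mind that different parsers can build different trees
from the same malformed HTML, so pinning a single parser gives more reproducible
results than falling back between them.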
.. _faq-python-versions:
......