Add note on how to choose parser with BeautifulSoup

6cbd92fa · Paul Tremberth · 7978237e · 6cbd92fa

Showing 1 changed file with 13 additions and 4 deletions

docs/faq.rst +13 -4

--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -26,15 +26,17 @@ comparing `jinja2`_ to `Django`_.
 .. _jinja2: http://jinja.pocoo.org/
 .. _Django: https://www.djangoproject.com/

-How can I use Scrapy with BeautifulSoup?
----------------------------------------
+Can I use Scrapy with BeautifulSoup?
+------------------------------------

-As mentioned :ref:`above <faq-scrapy-bs-cmp>`, BeautifulSoup can be used
+Yes, you can.
+As mentioned :ref:`above <faq-scrapy-bs-cmp>`, `BeautifulSoup`_ can be used
 for parsing HTML responses in Scrapy callbacks.
 You just have to feed the response's body into a ``BeautifulSoup`` object
 and extract whatever data you need from it.

-Here's an example spider using ``lxml`` parser with BeautifulSoup API::
+Here's an example spider using BeautifulSoup API, with ``lxml`` as the HTML parser
+(so you get the same parsing speed as with scrapy/parsel selectors)::


    from bs4 import BeautifulSoup
@@ -49,12 +51,19 @@ Here's an example spider using ``lxml`` parser with BeautifulSoup API::
        )

        def parse(self, response):
+            # use lxml to get decent HTML parsing speed
            soup = BeautifulSoup(response.text, 'lxml')
            yield {
                "url": response.url,
                "title": soup.h1.string
            }

+.. note::
+
+    ``BeautifulSoup`` supports several HTML/XML parsers.
+    See `BeautifulSoup's official documentation`_ on which ones are available.
+
+.. _BeautifulSoup's official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use

 .. _faq-python-versions:
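
Illustration (not part of the commit): the note added in this diff says BeautifulSoup supports several parsers. A minimal sketch of choosing one follows, assuming ``bs4`` is installed and ``lxml`` may or may not be. The helper name ``make_soup`` and the sample markup are hypothetical and exist only for this example; ``html.parser`` is the parser from Python's standard library.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup, standing in for response.text in a Scrapy callback.
HTML = "<html><body><h1>Example title</h1><p>Some text</p></body></html>"


def make_soup(markup, preferred="lxml", fallback="html.parser"):
    """Parse markup with the preferred parser, or with the fallback
    when BeautifulSoup cannot find the preferred one."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # Raised by BeautifulSoup when no tree builder matches the requested name.
        return BeautifulSoup(markup, fallback)


soup = make_soup(HTML)
title = soup.h1.string
print(title)
```

Inside a spider, the same helper could be called as ``make_soup(response.text)`` in ``parse``. Falling back silently trades speed for availability, and different parsers can build different trees from malformed HTML, so pinning a single parser explicitly, as the committed example does with ``'lxml'``, may be preferable.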