Merge pull request #2048 from redapple/bs4-faq

[MRG] Add FAQ entry on using BeautifulSoup in spider callbacks

80c296e0 · Mikhail Korobov · GitHub · edca2832 · 1ff9a482 · 80c296e0

Showing 1 changed file with 40 additions and 0 deletions

docs/faq.rst +40 -0

--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -3,6 +3,8 @@
 Frequently Asked Questions
 ==========================

+.. _faq-scrapy-bs-cmp:
+
 How does Scrapy compare to BeautifulSoup or lxml?
 -------------------------------------------------

@@ -24,6 +26,44 @@ comparing `jinja2`_ to `Django`_.
 .. _jinja2: http://jinja.pocoo.org/
 .. _Django: https://www.djangoproject.com/

+Can I use Scrapy with BeautifulSoup?
+------------------------------------
+
+Yes, you can.
+As mentioned :ref:`above <faq-scrapy-bs-cmp>`, `BeautifulSoup`_ can be used
+for parsing HTML responses in Scrapy callbacks.
+You just have to feed the response's body into a ``BeautifulSoup`` object
+and extract whatever data you need from it.
+
+Here's an example spider using the BeautifulSoup API, with ``lxml`` as the HTML parser::
+
+
+    from bs4 import BeautifulSoup
+    import scrapy
+
+
+    class ExampleSpider(scrapy.Spider):
+        name = "example"
+        allowed_domains = ["example.com"]
+        start_urls = (
+            'http://www.example.com/',
+        )
+
+        def parse(self, response):
+            # use lxml to get decent HTML parsing speed
+            soup = BeautifulSoup(response.text, 'lxml')
+            yield {
+                "url": response.url,
+                "title": soup.h1.string
+            }
+
+.. note::
+
+    ``BeautifulSoup`` supports several HTML/XML parsers.
+    See `BeautifulSoup's official documentation`_ on which ones are available.
+
+.. _BeautifulSoup's official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use
+
 .. _faq-python-versions:

 What Python versions does Scrapy support?