Add FAQ entry on using BeautifulSoup in spider callbacks

7978237e · Paul Tremberth · b7925e42 · 7978237e

Showing 32 additions and 0 deletions

docs/faq.rst docs/faq.rst +32 -0

--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -3,6 +3,8 @@
 Frequently Asked Questions
 ==========================

+.. _faq-scrapy-bs-cmp:
+
 How does Scrapy compare to BeautifulSoup or lxml?
 -------------------------------------------------

@@ -24,6 +26,36 @@ comparing `jinja2`_ to `Django`_.
 .. _jinja2: http://jinja.pocoo.org/
 .. _Django: https://www.djangoproject.com/

+How can I use Scrapy with BeautifulSoup?
+----------------------------------------
+
+As mentioned :ref:`above <faq-scrapy-bs-cmp>`, BeautifulSoup can be used
+for parsing HTML responses in Scrapy callbacks.
+You just have to feed the response's decoded body (``response.text``) into a
+``BeautifulSoup`` object and extract whatever data you need from it.
+
+Here's an example spider using the BeautifulSoup API, with ``lxml`` as the HTML parser::
+
+
+    from bs4 import BeautifulSoup
+    import scrapy
+
+
+    class ExampleSpider(scrapy.Spider):
+        name = "example"
+        allowed_domains = ["example.com"]
+        start_urls = (
+            'http://www.example.com/',
+        )
+
+        def parse(self, response):
+            soup = BeautifulSoup(response.text, 'lxml')
+            yield {
+                "url": response.url,
+                "title": soup.h1.string
+            }
+
+
 .. _faq-python-versions:

 What Python versions does Scrapy support?