Commit 7978237e · Author: Paul Tremberth

Add FAQ entry on using BeautifulSoup in spider callbacks

Parent b7925e42
......@@ -3,6 +3,8 @@
Frequently Asked Questions
==========================
.. _faq-scrapy-bs-cmp:
How does Scrapy compare to BeautifulSoup or lxml?
-------------------------------------------------
......@@ -24,6 +26,36 @@ comparing `jinja2`_ to `Django`_.
.. _jinja2: http://jinja.pocoo.org/
.. _Django: https://www.djangoproject.com/
How can I use Scrapy with BeautifulSoup?
----------------------------------------
As mentioned :ref:`above <faq-scrapy-bs-cmp>`, BeautifulSoup can be used
for parsing HTML responses in Scrapy callbacks.
You just have to feed the response's body into a ``BeautifulSoup`` object
and extract whatever data you need from it.
Here's an example spider that uses the BeautifulSoup API with ``lxml`` as the HTML parser::

    from bs4 import BeautifulSoup
    import scrapy


    class ExampleSpider(scrapy.Spider):
        name = "example"
        allowed_domains = ["example.com"]
        start_urls = (
            'http://www.example.com/',
        )

        def parse(self, response):
            # Parse the decoded response body with BeautifulSoup,
            # using lxml as the underlying parser.
            soup = BeautifulSoup(response.text, 'lxml')
            yield {
                "url": response.url,
                "title": soup.h1.string
            }

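The extraction step in the callback can also be tried outside a spider. The sketch below is only an illustration: ``extract_title`` and ``SAMPLE_HTML`` are hypothetical names that are not part of Scrapy or BeautifulSoup, and it uses the standard-library ``html.parser`` backend so that ``lxml`` is not required. It also guards against two cases that ``soup.h1.string`` does not handle: a page with no ``<h1>`` at all, and an ``<h1>`` that contains nested tags (where ``.string`` returns ``None``).

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the first <h1> in ``html``, or None if there is none.

    Hypothetical helper for illustration; not part of Scrapy or BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    if h1 is None:
        # No <h1> in the document: avoid an AttributeError on None.
        return None
    # get_text() collects text from nested tags too; each piece is
    # stripped and the pieces are joined with a single space.
    return h1.get_text(" ", strip=True)


# Made-up sample markup with a nested tag inside the heading.
SAMPLE_HTML = "<html><body><h1> Example <b>Domain</b></h1></body></html>"

if __name__ == "__main__":
    print(extract_title(SAMPLE_HTML))
```

Inside a spider callback, a helper like this could be called as ``extract_title(response.text)`` in place of ``soup.h1.string``, assuming the same title field is wanted.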
.. _faq-python-versions:
What Python versions does Scrapy support?
......