Commit 80c296e0, authored by Mikhail Korobov, committed by GitHub

Merge pull request #2048 from redapple/bs4-faq

[MRG] Add FAQ entry on using BeautifulSoup in spider callbacks
Frequently Asked Questions
==========================

.. _faq-scrapy-bs-cmp:

How does Scrapy compare to BeautifulSoup or lxml?
-------------------------------------------------

......

.. _jinja2: http://jinja.pocoo.org/
.. _Django: https://www.djangoproject.com/

Can I use Scrapy with BeautifulSoup?
------------------------------------

Yes, you can.
As mentioned :ref:`above <faq-scrapy-bs-cmp>`, `BeautifulSoup`_ can be used
for parsing HTML responses in Scrapy callbacks.
You just have to feed the response's decoded body (``response.text``) into a
``BeautifulSoup`` object and extract whatever data you need from it.

Here's an example spider using the BeautifulSoup API, with ``lxml`` as the HTML parser::

    from bs4 import BeautifulSoup
    import scrapy


    class ExampleSpider(scrapy.Spider):
        name = "example"
        allowed_domains = ["example.com"]
        start_urls = (
            'http://www.example.com/',
        )

        def parse(self, response):
            # use lxml to get decent HTML parsing speed
            soup = BeautifulSoup(response.text, 'lxml')
            h1 = soup.h1  # None when the page has no <h1> element
            yield {
                "url": response.url,
                "title": h1.string if h1 is not None else None,
            }

.. note::

    ``BeautifulSoup`` supports several HTML/XML parsers.
    See `BeautifulSoup's official documentation`_ for the parsers that are available.

.. _BeautifulSoup's official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use
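
If a project cannot rely on ``lxml`` being installed, one possible approach is to try parsers in order of preference and fall back to Python's built-in ``html.parser``. This is a sketch, not part of Scrapy or BeautifulSoup: ``make_soup`` and its ``preferred`` parameter are hypothetical names. It relies on BeautifulSoup raising ``bs4.FeatureNotFound`` when a requested parser is not available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html.parser")):
    """Hypothetical helper: build a soup with the first available parser."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser's library is not installed; try the next one
            continue
    raise FeatureNotFound("none of these parsers is available: " + ", ".join(preferred))


# In a spider callback this would be make_soup(response.text)
soup = make_soup("<html><body><h1>Example Domain</h1></body></html>")
title = soup.h1.string
```

Different parsers can build different trees from malformed markup, so pinning a single parser (as the spider above does with ``lxml``) keeps results consistent; a fallback like this trades some of that consistency for fewer dependencies.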

.. _faq-python-versions:

What Python versions does Scrapy support?
......