Commit b2148e6a authored by: CSDN-Ada助手

add questions

Parent e903cb17
{
"source": "autoscraper_desc.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# Introduction to autoscraper
autoscraper is a smart, automatic, fast, and lightweight Python-based web scraper. Which of the following statements about it is <span style="color:red">incorrect</span>?
## Answer
```bash
It is currently the fastest-parsing web scraper
```
## Options
### A
```bash
It also provides a method for exact extraction
```
### B
```bash
It can automatically extract text similar to an example you provide
```
### C
```bash
It saves you the trouble of hand-writing page extraction rules
```
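The options above hinge on the two extraction modes autoscraper exposes. Here is a minimal sketch of how they fit together, assuming the catalog URL and the example text are placeholders you would replace with a real page and a snippet that actually appears on it:

```python
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

# Placeholder page and example text -- replace with a real URL and a snippet
# that actually appears on that page.
url = 'https://example.com/catalog'
wanted_list = ['Example product title']

scraper = AutoScraper()
# build() learns extraction rules from the example text and returns what it matched.
result = scraper.build(url, wanted_list)

# Once built, the learned rules can be reused on other pages:
similar = scraper.get_result_similar('https://example.com/catalog?page=2')  # similar items
exact = scraper.get_result_exact('https://example.com/catalog?page=2')      # exact positions only
print(result, similar, exact)
```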
{
"export": ["autoscraper_desc.json", "hello_autoscraper.json"],
"keywords": [],
"children": [],
"keywords_must": [
"autoscraper"
],
"keywords_forbid": []
}
{
"source": "hello_autoscraper.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# autoscraper example
Use autoscraper to extract similar topic posts from a Stack Overflow search page. The code is as follows:
```python
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

def get_similar_result(url, wanted_list):
    scraper = AutoScraper()
    # TODO(You): the correct extraction code
    return result

url = 'https://stackoverflow.com/search?q=autoscraper&s=7b5866da-920e-4926-8c33-09fb7d32886b'
wanted_list = ["AutoScraper module not found in Python Autoscraper library"]
print(get_similar_result(url, wanted_list))
```
Which of the following options fills in the missing code <span style="color:red">correctly</span>?
## Answer
```python
result = scraper.build(url, wanted_list)
```
## Options
### A
```python
result = scraper.get_result_similar(url, wanted_list)
```
### B
```python
result = scraper.get(url, wanted_list)
```
### C
```python
result = scraper.get_result_exact(url, wanted_list)
```
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

def get_similar_result(url, wanted_list):
    scraper = AutoScraper()
    result = scraper.build(url, wanted_list)
    return result

url = 'https://stackoverflow.com/search?q=autoscraper&s=7b5866da-920e-4926-8c33-09fb7d32886b'
wanted_list = ["AutoScraper module not found in Python Autoscraper library"]
print(get_similar_result(url, wanted_list))
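As a follow-up sketch (assuming the build above succeeded and using a placeholder file name), the rules autoscraper learns can be saved to disk and reloaded, so later pages can be scraped without passing wanted_list again:

```python
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

url = 'https://stackoverflow.com/search?q=autoscraper&s=7b5866da-920e-4926-8c33-09fb7d32886b'
wanted_list = ["AutoScraper module not found in Python Autoscraper library"]

scraper = AutoScraper()
scraper.build(url, wanted_list)
scraper.save('so-search-rules')  # placeholder file name

# Later, or in another process, reload the rules and reuse them on a new page.
restored = AutoScraper()
restored.load('so-search-rules')
print(restored.get_result_similar('https://stackoverflow.com/search?q=selectolax'))
```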
{
"export": ["selectolax_desc.json", "hello_selectolax.json"],
"keywords": [],
"children": [],
"keywords_must": [
"selectolax"
],
"keywords_forbid": []
}
{
"source": "hello_selectolax.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# selectolax example
Use selectolax to extract the content of the page's p tags. The code is as follows:
```python
# -*- coding: UTF-8 -*-
from selectolax.parser import HTMLParser

def get_p(html):
    p_list = []
    for node in HTMLParser(html).css("p"):
        # TODO(You): the correct extraction code
    return p_list

html = '''
<html>
    <head>
        <title>This is a simple test page</title>
    </head>
    <body>
        <p class="item-0">The content of the body element is displayed in the browser.</p>
        <p class="item-1">The content of the title element is displayed in the browser's title bar.</p>
    </body>
</html>
'''
print(get_p(html))
```
Which of the following options fills in the missing code <span style="color:red">correctly</span>?
## Answer
```python
p_list.append(node.text())
```
## Options
### A
```python
p_list.append(node.text)
```
### B
```python
p_list.append(node)
```
### C
```python
p_list.append(node.get_text())
```
# -*- coding: UTF-8 -*-
from selectolax.parser import HTMLParser

def get_p(html):
    p_list = []
    for node in HTMLParser(html).css("p"):
        p_list.append(node.text())
    return p_list

html = '''
<html>
    <head>
        <title>This is a simple test page</title>
    </head>
    <body>
        <p class="item-0">The content of the body element is displayed in the browser.</p>
        <p class="item-1">The content of the title element is displayed in the browser's title bar.</p>
    </body>
</html>
'''
print(get_p(html))
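Beyond node.text(), selectolax nodes expose a few other accessors that are often needed right after this kind of loop. A short sketch (the HTML snippet is just an illustration):

```python
# -*- coding: UTF-8 -*-
from selectolax.parser import HTMLParser

snippet = '<p class="item-0">The content of the body element is displayed in the browser.</p>'
tree = HTMLParser(snippet)

first_p = tree.css_first("p")          # first matching node, or None if nothing matches
print(first_p.text())                  # text content of the node
print(first_p.attributes)              # attribute dict, e.g. {'class': 'item-0'}
print(tree.css_first("span") is None)  # True: css_first returns None when there is no match
```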
{
"source": "selectolax_desc.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# Introduction to selectolax
selectolax is used to parse web pages efficiently. Which of the following statements about it is <span style="color:red">incorrect</span>?
## Answer
```bash
selectolax provides functionality for downloading web pages
```
## Options
### A
```bash
selectolax parses faster than lxml
```
### B
```bash
When scraping large amounts of data, selectolax is a good choice for parsing the pages
```
### C
```bash
It uses the Modest and Lexbor engines
```
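A minimal sketch of the point behind the answer: selectolax only parses, so the page has to be fetched with something else, and it ships two backends, the Modest-based HTMLParser and the Lexbor-based LexborHTMLParser (the latter assumes a reasonably recent selectolax release):

```python
# -*- coding: UTF-8 -*-
from urllib.request import urlopen

from selectolax.parser import HTMLParser        # Modest engine
from selectolax.lexbor import LexborHTMLParser  # Lexbor engine

# selectolax has no download API of its own, so fetch the page first.
html = urlopen('https://example.com').read().decode('utf-8')

# Both backends offer the same css()/text() style of interface.
print(HTMLParser(html).css_first('h1').text())
print(LexborHTMLParser(html).css_first('h1').text())
```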
{
"export": ["requests_html_desc.json", "hello_requests_html.json"],
"keywords": [],
"children": [],
"keywords_must": [
"requests-html"
],
"keywords_forbid": []
}
{
"source": "hello_requests_html.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# requests-html example
Use requests-html to extract all the links on the page https://www.baidu.com/. The code is as follows:
```python
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

def get_url(url):
    session = HTMLSession()
    r = session.get(url)
    # TODO(You): the correct extraction code
    return urls

print(get_url("https://www.baidu.com/"))
```
Which of the following options fills in the missing code <span style="color:red">correctly</span>?
## Answer
```python
urls = r.html.links
```
## Options
### A
```python
urls = r.html.find("url")
```
### B
```python
urls = r.html.find("url")[0]
```
### C
```python
urls = r.html.urls
```
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

def get_url(url):
    session = HTMLSession()
    r = session.get(url)
    return r.html.links

print(get_url("https://www.baidu.com/"))
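One detail worth adding as a hedged aside: r.html.links returns hrefs exactly as written in the page, many of them relative, while r.html.absolute_links resolves them against the page URL.

```python
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.baidu.com/")

print(r.html.links)           # hrefs as written in the page (may be relative)
print(r.html.absolute_links)  # the same links resolved to absolute URLs
```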
{
"source": "requests_html_desc.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# Introduction to requests-html
requests-html makes it easy for crawler developers to write scraping code. Which of the following statements about it is <span style="color:red">incorrect</span>?
## Answer
```bash
It supports CAPTCHA recognition
```
## Options
### A
```bash
requests-html can not only download web pages but also parse them
```
### B
```bash
It supports CSS and XPath selectors
```
### C
```bash
It supports persistent cookies and proxies
```
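A small sketch of the features the options refer to, with the proxy address shown only as a commented-out placeholder: HTMLSession keeps cookies across requests the way a requests.Session does, proxies go through the usual requests keyword, and the parsed page accepts both CSS and XPath selectors.

```python
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

session = HTMLSession()  # cookies set by responses persist on this session
r = session.get("https://www.baidu.com/")
# To route through a proxy, pass the usual requests keyword, e.g.:
#   session.get(url, proxies={"http": "http://127.0.0.1:8080"})

title_by_css = r.html.find("title", first=True)       # CSS selector
title_by_xpath = r.html.xpath("//title", first=True)  # XPath selector
print(title_by_css.text, title_by_xpath.text)
```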