Commit b2148e6a authored by: CSDN-Ada助手

add questions

Parent e903cb17
{
"source": "autoscraper_desc.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# Introduction to autoscraper
autoscraper is a smart, automatic, fast, and lightweight Python-based web scraper. Which of the following statements about it is <span style="color:red">incorrect</span>?
## Answer
```bash
It is currently the fastest-parsing web scraper
```
## Options
### A
```bash
It also provides a method for exact extraction
```
### B
```bash
It can automatically extract text similar to an example you provide
```
### C
```bash
It saves you the trouble of hand-writing page extraction rules
```
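The options above hinge on the two extraction modes autoscraper exposes. Here is a minimal sketch of how they fit together, assuming the catalog URL and the example text are placeholders you would replace with a real page and a snippet that actually appears on it:

```python
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

# Placeholder page and example text -- replace with a real URL and a snippet
# that actually appears on that page.
url = 'https://example.com/catalog'
wanted_list = ['Example product title']

scraper = AutoScraper()
# build() learns extraction rules from the example text and returns what it matched.
result = scraper.build(url, wanted_list)

# Once built, the learned rules can be reused on other pages:
similar = scraper.get_result_similar('https://example.com/catalog?page=2')  # similar items
exact = scraper.get_result_exact('https://example.com/catalog?page=2')      # exact positions only
print(result, similar, exact)
```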
{
"export": ["autoscraper_desc.json", "hello_autoscraper.json"],
"keywords": [],
"children": [],
"keywords_must": [
"autoscraper"
],
"keywords_forbid": []
}
{
"source": "hello_autoscraper.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# autoscraper example
Use autoscraper to extract similar topic posts from a Stack Overflow search page. The code is as follows:
```python
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

def get_similar_result(url, wanted_list):
    scraper = AutoScraper()
    # TODO(You): the correct extraction code
    return result

url = 'https://stackoverflow.com/search?q=autoscraper&s=7b5866da-920e-4926-8c33-09fb7d32886b'
wanted_list = ["AutoScraper module not found in Python Autoscraper library"]
print(get_similar_result(url, wanted_list))
```
Which of the following options fills in the missing code <span style="color:red">correctly</span>?
## Answer
```python
result = scraper.build(url, wanted_list)
```
## Options
### A
```python
result = scraper.get_result_similar(url, wanted_list)
```
### B
```python
result = scraper.get(url, wanted_list)
```
### C
```python
result = scraper.get_result_exact(url, wanted_list)
```
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

def get_similar_result(url, wanted_list):
    scraper = AutoScraper()
    result = scraper.build(url, wanted_list)
    return result

url = 'https://stackoverflow.com/search?q=autoscraper&s=7b5866da-920e-4926-8c33-09fb7d32886b'
wanted_list = ["AutoScraper module not found in Python Autoscraper library"]
print(get_similar_result(url, wanted_list))
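As a follow-up sketch (assuming the build above succeeded and using a placeholder file name), the rules autoscraper learns can be saved to disk and reloaded, so later pages can be scraped without passing wanted_list again:

```python
# -*- coding: UTF-8 -*-
from autoscraper import AutoScraper

url = 'https://stackoverflow.com/search?q=autoscraper&s=7b5866da-920e-4926-8c33-09fb7d32886b'
wanted_list = ["AutoScraper module not found in Python Autoscraper library"]

scraper = AutoScraper()
scraper.build(url, wanted_list)
scraper.save('so-search-rules')  # placeholder file name

# Later, or in another process, reload the rules and reuse them on a new page.
restored = AutoScraper()
restored.load('so-search-rules')
print(restored.get_result_similar('https://stackoverflow.com/search?q=selectolax'))
```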
{
"export": ["selectolax_desc.json", "hello_selectolax.json"],
"keywords": [],
"children": [],
"keywords_must": [
"selectolax"
],
"keywords_forbid": []
}
{
"source": "hello_selectolax.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# selectolax example
Use selectolax to extract the content of the page's p tags. The code is as follows:
```python
# -*- coding: UTF-8 -*-
from selectolax.parser import HTMLParser

def get_p(html):
    p_list = []
    for node in HTMLParser(html).css("p"):
        # TODO(You): the correct extraction code
    return p_list

html = '''
<html>
    <head>
        <title>This is a simple test page</title>
    </head>
    <body>
        <p class="item-0">The content of the body element is displayed in the browser.</p>
        <p class="item-1">The content of the title element is displayed in the browser's title bar.</p>
    </body>
</html>
'''
print(get_p(html))
```
Which of the following options fills in the missing code <span style="color:red">correctly</span>?
## Answer
```python
p_list.append(node.text())
```
## Options
### A
```python
p_list.append(node.text)
```
### B
```python
p_list.append(node)
```
### C
```python
p_list.append(node.get_text())
```
# -*- coding: UTF-8 -*-
from selectolax.parser import HTMLParser

def get_p(html):
    p_list = []
    for node in HTMLParser(html).css("p"):
        p_list.append(node.text())
    return p_list

html = '''
<html>
    <head>
        <title>This is a simple test page</title>
    </head>
    <body>
        <p class="item-0">The content of the body element is displayed in the browser.</p>
        <p class="item-1">The content of the title element is displayed in the browser's title bar.</p>
    </body>
</html>
'''
print(get_p(html))
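Beyond node.text(), selectolax nodes expose a few other accessors that are often needed right after this kind of loop. A short sketch (the HTML snippet is just an illustration):

```python
# -*- coding: UTF-8 -*-
from selectolax.parser import HTMLParser

snippet = '<p class="item-0">The content of the body element is displayed in the browser.</p>'
tree = HTMLParser(snippet)

first_p = tree.css_first("p")          # first matching node, or None if nothing matches
print(first_p.text())                  # text content of the node
print(first_p.attributes)              # attribute dict, e.g. {'class': 'item-0'}
print(tree.css_first("span") is None)  # True: css_first returns None when there is no match
```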
{
"source": "selectolax_desc.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# Introduction to selectolax
selectolax is used to parse web pages efficiently. Which of the following statements about it is <span style="color:red">incorrect</span>?
## Answer
```bash
selectolax provides functionality for downloading web pages
```
## Options
### A
```bash
selectolax parses faster than lxml
```
### B
```bash
When scraping large amounts of data, selectolax is a good choice for parsing the pages
```
### C
```bash
It uses the Modest and Lexbor engines
```
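A minimal sketch of the point behind the answer: selectolax only parses, so the page has to be fetched with something else, and it ships two backends, the Modest-based HTMLParser and the Lexbor-based LexborHTMLParser (the latter assumes a reasonably recent selectolax release):

```python
# -*- coding: UTF-8 -*-
from urllib.request import urlopen

from selectolax.parser import HTMLParser        # Modest engine
from selectolax.lexbor import LexborHTMLParser  # Lexbor engine

# selectolax has no download API of its own, so fetch the page first.
html = urlopen('https://example.com').read().decode('utf-8')

# Both backends offer the same css()/text() style of interface.
print(HTMLParser(html).css_first('h1').text())
print(LexborHTMLParser(html).css_first('h1').text())
```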
{
"export": ["requests_html_desc.json", "hello_requests_html.json"],
"keywords": [],
"children": [],
"keywords_must": [
"requests-html"
],
"keywords_forbid": []
}
{
"source": "hello_requests_html.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# requests-html example
Use requests-html to extract all the links on the page https://www.baidu.com/. The code is as follows:
```python
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

def get_url(url):
    session = HTMLSession()
    r = session.get(url)
    # TODO(You): the correct extraction code
    return urls

print(get_url("https://www.baidu.com/"))
```
Which of the following options fills in the missing code <span style="color:red">correctly</span>?
## Answer
```python
urls = r.html.links
```
## Options
### A
```python
urls = r.html.find("url")
```
### B
```python
urls = r.html.find("url")[0]
```
### C
```python
urls = r.html.urls
```
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

def get_url(url):
    session = HTMLSession()
    r = session.get(url)
    return r.html.links

print(get_url("https://www.baidu.com/"))
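One detail worth adding as a hedged aside: r.html.links returns hrefs exactly as written in the page, many of them relative, while r.html.absolute_links resolves them against the page URL.

```python
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.baidu.com/")

print(r.html.links)           # hrefs as written in the page (may be relative)
print(r.html.absolute_links)  # the same links resolved to absolute URLs
```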
{
"source": "requests_html_desc.md",
"depends": [],
"type": "code_options",
"author": "zxm2015",
"notebook_enable": true
}
# Introduction to requests-html
requests-html makes it easy for crawler developers to write scraping code. Which of the following statements about it is <span style="color:red">incorrect</span>?
## Answer
```bash
It supports CAPTCHA recognition
```
## Options
### A
```bash
requests-html can not only download web pages but also parse them
```
### B
```bash
It supports CSS and XPath selectors
```
### C
```bash
It supports persistent cookies and proxies
```
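A small sketch of the features the options refer to, with the proxy address shown only as a commented-out placeholder: HTMLSession keeps cookies across requests the way a requests.Session does, proxies go through the usual requests keyword, and the parsed page accepts both CSS and XPath selectors.

```python
# -*- coding: UTF-8 -*-
from requests_html import HTMLSession

session = HTMLSession()  # cookies set by responses persist on this session
r = session.get("https://www.baidu.com/")
# To route through a proxy, pass the usual requests keyword, e.g.:
#   session.get(url, proxies={"http": "http://127.0.0.1:8080"})

title_by_css = r.html.find("title", first=True)       # CSS selector
title_by_xpath = r.html.xpath("//title", first=True)  # XPath selector
print(title_by_css.text, title_by_xpath.text)
```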