add crawler questions

88b1e097 · CSDN-Ada助手 · e6cf3526
12 changed files
--- a/data/2.python中阶/3.网络爬虫/11.模拟登录/config.json
+++ b/data/2.python中阶/3.网络爬虫/11.模拟登录/config.json
 {
-  "export": [],
+  "export": ["simulate_login.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.json
+++ b/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.json
+{
+    "author": "zxm2015",
+    "source": "simulate_login.md",
+    "depends": [],
+    "type": "code_options"
+  }
\ No newline at end of file
--- a/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.md
+++ b/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.md
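+# Simulated login
+Some websites require a login before their other content can be viewed, so a crawler must be able to log in and obtain a cookie/session before it can keep collecting data. Which of the following statements is <span style="color:red">incorrect</span>?
+## Answer
+```
+The cookie obtained after a successful login is generally valid forever
+```
+## Options
+### A
+```
+Simulating a login requires first registering an account on the site, or registering several accounts to maintain a cookie pool
+```
+### B
+```
+To fetch the login page, the login URL can be obtained from the login button
+```
+### C
+```
+Once a successful login returns a cookie, other requests that carry the cookie can fetch the requested page resources
+```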
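Note on the question above: the point that a login cookie is carried on later requests and does not last forever can be sketched with the standard library alone. This is a minimal illustration, not part of the commit. The header string, the cookie name `sessionid`, and the helpers `parse_login_cookie`, `cookie_header`, and `is_expired` are all hypothetical, and the sketch assumes the server announces the lifetime through `Max-Age`.

```python
from http.cookies import SimpleCookie


def parse_login_cookie(set_cookie_header: str) -> dict:
    """Extract name, value and lifetime of the first cookie in a Set-Cookie header."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    name, morsel = next(iter(jar.items()))
    max_age = morsel["max-age"]  # empty string when the attribute is absent
    return {
        "name": name,
        "value": morsel.value,
        "max_age": int(max_age) if max_age else None,
    }


def cookie_header(cookie: dict) -> dict:
    """Build the header that later requests must carry to stay logged in."""
    return {"Cookie": f"{cookie['name']}={cookie['value']}"}


def is_expired(cookie: dict, obtained_at: float, now: float) -> bool:
    """A cookie with a Max-Age stops being valid once that many seconds have passed."""
    return cookie["max_age"] is not None and now - obtained_at >= cookie["max_age"]


# Hypothetical Set-Cookie value returned by a login response.
example = parse_login_cookie("sessionid=abc123; Max-Age=3600; Path=/; HttpOnly")
```

A crawler that keeps a cookie pool would run a check like `is_expired` before reusing an entry and log in again when it returns `True`.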
--- a/data/2.python中阶/3.网络爬虫/6.Selenium/config.json
+++ b/data/2.python中阶/3.网络爬虫/6.Selenium/config.json
 {
-  "export": [],
+  "export": ["selenium.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.json
+++ b/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.json
+{
+    "author": "zxm2015",
+    "source": "selenium.md",
+    "depends": [],
+    "type": "code_options"
+  }
\ No newline at end of file
--- a/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.md
+++ b/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.md
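+# selenium
+Selenium is a suite of web automation testing tools, and crawlers can use it to collect dynamically loaded page resources. Which of the following statements about it is <span style="color:red">incorrect</span>?
+## Answer
+```
+Like requests, selenium can be used to collect data, and the two run at the same speed
+```
+## Options
+### A
+```
+Content that only appears after the page executes JS can be collected with the help of selenium
+```
+### B
+```
+selenium essentially drives a browser to send requests, simulating browser behavior
+```
+### C
+```
+After a request, it is often necessary to wait a while for resources to finish loading and rendering
+```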
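Note on the question above: waiting for rendered content usually means polling a condition until it holds or a timeout expires, which is also the idea behind Selenium's explicit waits. The sketch below shows that polling pattern without a browser, so it is not Selenium's API. `wait_until` and `FakePage` are hypothetical names, and `FakePage` only stands in for a page whose content appears after a few checks.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a page whose content shows up only after some polls."""

    def __init__(self, ready_after: int):
        self.calls = 0
        self.ready_after = ready_after

    def find(self):
        # Returns None while the page is still "rendering".
        self.calls += 1
        if self.calls >= self.ready_after:
            return "<div id='content'>loaded</div>"
        return None
```

This waiting cost is one reason a browser-driven crawl is slower than issuing plain HTTP requests.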
--- a/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/config.json
+++ b/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/config.json
 {
-  "export": [],
+  "export": ["pyspider.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.json
+++ b/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.json
+{
+    "author": "zxm2015",
+    "source": "pyspider.md",
+    "depends": [],
+    "type": "code_options"
+  }
\ No newline at end of file
--- a/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.md
+++ b/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.md
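+# pyspider
+Pyspider and Scrapy can both be used to crawl data. Which of the following statements about them is <span style="color:red">incorrect</span>?
+## Answer
+```
+Scrapy provides a web interface that can be used for debugging and deployment
+```
+## Options
+### A
+```
+Pyspider provides a web interface that supports visual debugging
+```
+### B
+```
+For a beginner who wants to get started quickly by crawling a news site, Pyspider is recommended
+```
+### C
+```
+Scrapy is more extensible and is mainly used for complex crawling scenarios
+```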
--- a/data/2.python中阶/3.网络爬虫/9.验证码处理/config.json
+++ b/data/2.python中阶/3.网络爬虫/9.验证码处理/config.json
 {
-  "export": [],
+  "export": ["verification_code.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.json
+++ b/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.json
+{
+    "author": "zxm2015",
+    "source": "verification_code.md",
+    "depends": [],
+    "type": "code_options"
+  }
\ No newline at end of file
--- a/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.md
+++ b/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.md
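+# Crawler CAPTCHAs
+A CAPTCHA is a way to tell humans and machines apart. Which of the following statements about CAPTCHAs is <span style="color:red">incorrect</span>?
+## Answer
+```
+CAPTCHA recognition is an old topic, and a 100% recognition rate has already been achieved
+```
+## Options
+### A
+```
+There are many kinds of CAPTCHA, including mixed Chinese-English text, click-to-select, slider, and so on
+```
+### B
+```
+CAPTCHA recognition makes use of OCR (Optical Character Recognition) technology
+```
+### C
+```
+For difficult CAPTCHAs, you can integrate a CAPTCHA-solving platform or a recognition service provided by a third party
+```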