Merge branch 'dev' into 'master'

Merge Dev. See merge request !3

fc23aaab · 幻灰龙 · e6cf3526 · 3df38ab2 · fc23aaab · fc23aaab
15 changed files
--- a/data/2.python中阶/3.网络爬虫/10.动态渲染页面爬取/config.json
+++ b/data/2.python中阶/3.网络爬虫/10.动态渲染页面爬取/config.json
 {
-  "export": [],
+  "export": ["dynamic_page.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/10.动态渲染页面爬取/dynamic_page.json
+++ b/data/2.python中阶/3.网络爬虫/10.动态渲染页面爬取/dynamic_page.json
+{
+    "author": "zxm2015",
+    "source": "dynamic_page.md",
+    "depends": [],
+    "type": "code_options"
+}
\ No newline at end of file
--- a/data/2.python中阶/3.网络爬虫/10.动态渲染页面爬取/dynamic_page.md
+++ b/data/2.python中阶/3.网络爬虫/10.动态渲染页面爬取/dynamic_page.md
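+# Crawling dynamic pages
+Suppose you want to crawl a page at `url` that loads more content as you scroll down. Which of the following options can retrieve that page's content?
+## Answer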
+```python
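+import time
+
+from bs4 import BeautifulSoup
+from selenium import webdriver
+
+driver = webdriver.Chrome()
+driver.get(url)  # `url` is the scrolling page from the question
+time.sleep(1)  # give the first render a moment to finish
+page_size = 10
+for i in range(page_size):
+    # Scroll to the current bottom so the page requests its next batch
+    js = "window.scrollTo(0, document.documentElement.scrollHeight);"
+    driver.execute_script(js)
+    time.sleep(2)  # wait for the newly requested content to load
+# Parse the DOM as rendered by the browser, not the raw HTTP response
+page = BeautifulSoup(driver.page_source, 'lxml')
+print(page.text)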
+```
+## Options
+### A
+```
+None of the above is correct
+```
+### B
+```python
+import requests
+from bs4 import BeautifulSoup
+response = requests.get(url=url)
+page = BeautifulSoup(response.text, 'lxml')
+print(page.text)
+```
+### C
+```python
+import urllib.request
+from bs4 import BeautifulSoup
+response = urllib.request.urlopen(url)
+buff = response.read()
+html = buff.decode("utf8")
+page = BeautifulSoup(html, 'lxml')
+print(page.text)
+```
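The Selenium approach scrolls a fixed number of times. A minimal sketch of a variant that stops as soon as the page height stops growing is given here, assuming a Selenium-style `driver` object that exposes `execute_script`. The names `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical and do not appear in the quiz files.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed. `driver` only needs an
    execute_script(script) method, as Selenium WebDriver objects have.
    """
    height_js = "return document.documentElement.scrollHeight"
    scroll_js = "window.scrollTo(0, document.documentElement.scrollHeight);"

    last_height = driver.execute_script(height_js)
    rounds = 0
    for _ in range(max_rounds):
        driver.execute_script(scroll_js)
        rounds += 1
        time.sleep(pause)  # let lazily loaded content arrive
        new_height = driver.execute_script(height_js)
        if new_height == last_height:
            break  # nothing new was appended, assume the end was reached
        last_height = new_height
    return rounds
```

With a real browser this would be called after `driver.get(url)` and before reading `driver.page_source`. The height check is a heuristic: a slow page may need a longer `pause` before its height changes.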
--- a/data/2.python中阶/3.网络爬虫/11.模拟登录/config.json
+++ b/data/2.python中阶/3.网络爬虫/11.模拟登录/config.json
 {
-  "export": [],
+  "export": ["simulate_login.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.json
+++ b/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.json
+{
+    "author": "zxm2015",
+    "source": "simulate_login.md",
+    "depends": [],
+    "type": "code_options"
+}
--- a/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.md
+++ b/data/2.python中阶/3.网络爬虫/11.模拟登录/simulate_login.md
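+# Simulated login
+Some websites require you to log in before their other content can be viewed, so a crawler must be able to log in and obtain a cookie/session before it can continue collecting data. Which of the following statements is <span style="color:red">incorrect</span>?
+## Answer
+```
+A cookie obtained after a successful login generally remains valid forever
+```
+## Options
+### A
+```
+Simulated login requires first registering an account on the website, or registering several accounts to maintain a pool of cookies
+```
+### B
+```
+To find the login page, you can get the login URL from the login button
+```
+### C
+```
+After a successful login you obtain a cookie; other requests that carry this cookie can retrieve the requested page resources
+```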
+```
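Login cookies normally carry a limited lifetime, which is why a cookie pool has to be refreshed. A minimal sketch of reading that lifetime from a `Set-Cookie` header with the standard library follows. The helper name `cookie_lifetimes` and the sample header values are hypothetical, and only the `Max-Age` attribute is inspected (not `Expires`).

```python
from http.cookies import SimpleCookie


def cookie_lifetimes(set_cookie_header):
    """Map each cookie name to its Max-Age in seconds.

    A value of None means the header set no Max-Age for that cookie,
    which browsers treat as a session cookie.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    lifetimes = {}
    for name, morsel in jar.items():
        max_age = morsel["max-age"]  # empty string when the attribute is absent
        lifetimes[name] = int(max_age) if max_age else None
    return lifetimes


# Hypothetical header, as a server might send after a successful login
print(cookie_lifetimes("sessionid=abc123; Max-Age=3600; Path=/; HttpOnly"))
```

A crawler could use a check like this to decide when an account in the pool needs to log in again.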
--- a/data/2.python中阶/3.网络爬虫/6.Selenium/config.json
+++ b/data/2.python中阶/3.网络爬虫/6.Selenium/config.json
 {
-  "export": [],
+  "export": ["selenium.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.json
+++ b/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.json
+{
+    "author": "zxm2015",
+    "source": "selenium.md",
+    "depends": [],
+    "type": "code_options"
+}
--- a/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.md
+++ b/data/2.python中阶/3.网络爬虫/6.Selenium/selenium.md
+# selenium
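+Selenium is a suite of web automation testing tools, and crawlers can use it to collect a page's dynamic resources. Which of the following statements about it is <span style="color:red">incorrect</span>?
+## Answer
+```
+Like requests, selenium can be used to collect data, and the two run at the same speed
+```
+## Options
+### A
+```
+Content that only appears after the page executes JavaScript can be collected with the help of selenium
+```
+### B
+```
+In essence, selenium drives a browser to send requests, simulating the browser's behavior
+```
+### C
+```
+After a request you often need to wait a while for resources to finish loading and rendering
+```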
+```
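Waiting for rendering is usually done by polling a condition instead of sleeping for a fixed time. Selenium ships `WebDriverWait` for this purpose; the sketch below is a library-independent illustration of the same idea, and the names `wait_until`, `timeout`, and `interval` are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # back off before checking again
```

In a crawler, `condition` might be a function that looks for an expected element in the rendered page and returns it once present.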
--- a/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/config.json
+++ b/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/config.json
 {
-  "export": [],
+  "export": ["pyspider.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.json
+++ b/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.json
+{
+    "author": "zxm2015",
+    "source": "pyspider.md",
+    "depends": [],
+    "type": "code_options"
+}
--- a/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.md
+++ b/data/2.python中阶/3.网络爬虫/8.pyspider框架的使用/pyspider.md
+# pyspider
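+Both Pyspider and Scrapy can be used to crawl data. Which of the following statements about them is <span style="color:red">incorrect</span>?
+## Answer
+```
+Scrapy provides a web interface that can be used for debugging and deployment
+```
+## Options
+### A
+```
+Pyspider provides a web interface that supports visual debugging
+```
+### B
+```
+For a beginner who wants to get started quickly by crawling a news website, Pyspider is recommended
+```
+### C
+```
+Scrapy is more extensible and is mainly used for complex crawling scenarios
+```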
+```
--- a/data/2.python中阶/3.网络爬虫/9.验证码处理/config.json
+++ b/data/2.python中阶/3.网络爬虫/9.验证码处理/config.json
 {
-  "export": [],
+  "export": ["verification_code.json"],
  "keywords": [],
  "children": [
    {

--- a/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.json
+++ b/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.json
+{
+    "author": "zxm2015",
+    "source": "verification_code.md",
+    "depends": [],
+    "type": "code_options"
+}
--- a/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.md
+++ b/data/2.python中阶/3.网络爬虫/9.验证码处理/verification_code.md
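+# CAPTCHAs in crawling
+A CAPTCHA is a way of telling humans and machines apart. Which of the following statements about CAPTCHAs is <span style="color:red">incorrect</span>?
+## Answer
+```
+CAPTCHA recognition is a long-standing topic, and a 100% recognition rate has already been achieved
+```
+## Options
+### A
+```
+CAPTCHAs come in many varieties, including mixed Chinese-English text, click-to-select, sliders, and so on
+```
+### B
+```
+CAPTCHA recognition relies on OCR (Optical Character Recognition) technology
+```
+### C
+```
+For difficult CAPTCHAs, you can integrate a CAPTCHA-solving platform or a recognition service offered by a third-party platform
+```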
+```