梦想橡皮擦 / 爬虫训练场
Commit 0203d062
Authored Jan 05, 2023 by 梦想橡皮擦

Case: IP-limit anti-scraping

Parent: f4be1693

Showing 11 changed files with 230 additions and 18 deletions (+230 -18)
README.md  +5 -0
app/__init__.py  +17 -0
app/__pycache__/__init__.cpython-36.pyc  +0 -0
app/__pycache__/routes.cpython-36.pyc  +0 -0
app/school/__pycache__/index.cpython-36.pyc  +0 -0
app/school/index.py  +25 -2
app/templates/csdn/blogstar.html  +16 -7
app/templates/csdn/newstar.html  +16 -8
app/templates/index.html  +23 -1
app/templates/school/ajax_list3.html  +112 -0
app/templates/timeline.html  +16 -0

README.md
@@ -23,12 +23,17 @@
15. [How I used a special Cookie to hold back other people's crawlers](https://blog.csdn.net/hihell/article/details/128474849)
16. [You're brave: using asynchronous loading for this little data?](https://blog.csdn.net/hihell/article/details/128474866?spm=1001.2014.3001.5501)
17. [My boss told me to control page rendering speed by hand and said it would stop crawlers. I believed him.](https://blog.csdn.net/hihell/article/details/128474887?spm=1001.2014.3001.5501)
18. [Reason for resigning: making the BOSS learn the term "scroll loading"](https://dream.blog.csdn.net/article/details/128474916)
19. [Add simple encryption to a site's response data and block 80% of crawlers: do you believe it?](https://dream.blog.csdn.net/article/details/128474924)
20. [One token per second pushed to the front end, scaring every crawler engineer in the room](https://dream.blog.csdn.net/article/details/128474930)
21. [A technique every anti-scraping engineer uses: IP-limit anti-scraping - 爬虫训练场](https://dream.blog.csdn.net/article/details/128550653)
## Supplementary quick-tip blog posts
1. [[Quick tip] 爬虫训练场 project: Python Flask template updates need a service restart every time](https://blog.csdn.net/hihell/article/details/128399376)
2. [[Quick tip] Python Flask deployment: the 爬虫训练场 project in production](https://blog.csdn.net/hihell/article/details/128422613)
3. [[Quick tip] Adding Baidu Analytics to a Python web project, 爬虫训练场](https://blog.csdn.net/hihell/article/details/128448271)
4. [[Quick tip] Adding a Bootstrap5 timeline to the 爬虫训练场 project](https://dream.blog.csdn.net/article/details/128543088)
## Site data preparation blog posts
...

app/__init__.py
@@ -3,10 +3,27 @@ from flask_sqlalchemy import SQLAlchemy
 from .config import BaseConfig  # import the configuration
+# Flask rate limiter
+from flask_limiter import Limiter
+from flask_limiter.util import get_remote_address, get_ipaddr
 app = Flask(__name__)
 app.config.from_object(BaseConfig)  # apply the configuration
+def get_real_ip():
+    if request.headers.getlist("X-Forwarded-For"):
+        return request.headers.getlist("X-Forwarded-For")[0]
+    return request.remote_addr
+limiter = Limiter(app, key_func=get_real_ip)
+# limiter = Limiter(app, key_func=get_ipaddr)
 db = SQLAlchemy()
 db.init_app(app)  # initialize the database
...
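
The `get_real_ip` key function in this hunk decides which requests share one rate-limit bucket. As a minimal sketch of the same idea outside Flask, the hypothetical helper `client_ip` below takes a plain mapping of headers and a fallback address (stand-ins for `request.headers` and `request.remote_addr`; neither name comes from the commit). Unlike the committed version, it also splits a comma-separated `X-Forwarded-For` chain and keeps only the left-most entry. Keep in mind that this header can be set by the client, so it is only a trustworthy key when a reverse proxy you control overwrites it.

```python
from typing import Mapping, Optional


def client_ip(headers: Mapping[str, str], remote_addr: Optional[str]) -> Optional[str]:
    """Pick a rate-limit key: left-most X-Forwarded-For entry, else the socket peer."""
    # Assumption: `headers` is a plain dict, so the lookup is case-sensitive here.
    forwarded = headers.get("X-Forwarded-For", "")
    # A proxy chain looks like "client, proxy1, proxy2"; the left-most entry is
    # the address reported by the first proxy.
    first = forwarded.split(",")[0].strip()
    return first or remote_addr


print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}, "10.0.0.2"))
print(client_ip({}, "198.51.100.4"))
```

The first call prints the forwarded client address and the second falls back to the peer address, because no forwarding header is present.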

app/__pycache__/__init__.cpython-36.pyc
This file type cannot be previewed

app/__pycache__/routes.cpython-36.pyc
This file type cannot be previewed

app/school/__pycache__/index.cpython-36.pyc
This file type cannot be previewed

app/school/index.py
@@ -7,6 +7,10 @@ from flask import Blueprint, jsonify, request
 from flask import render_template
 from ..model import School  # import from the parent module
+# import the limiter object from app
+from app import limiter
 s = Blueprint('school', __name__, url_prefix='/ss')
...
@@ -132,8 +136,6 @@ def encry_api():
 """
 Generate a Cookie every 10 seconds
 """
...
@@ -165,3 +167,24 @@ def token_list_school():
     pagination = pagination_object(page)
     return jsonify(pagination)
+"""
+Limit access by IP
+"""
+@s.route('ajax_list3')
+def ajax_list3():
+    page = 1  # start with the first page of data
+    pagination = pagination_object(page)
+    return render_template('school/ajax_list3.html', pagination=pagination)
+@s.route('api3')
+@limiter.limit("3/second")
+def school_api3():
+    page = int(request.args.get("page", 1))
+    pagination = pagination_object(page)
+    return jsonify(pagination)
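
To make the effect of `@limiter.limit("3/second")` concrete, here is a minimal standard-library sketch of a fixed-window counter keyed by client IP. It is not Flask-Limiter's implementation; the class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are all hypothetical and exist only for illustration. When the real extension refuses a request, the client receives HTTP 429 (Too Many Requests).

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` hits per key in each `window`-second window."""

    def __init__(self, limit: int, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> (window index, hits counted in that window)
        self._hits: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        index = int(self.clock() // self.window)
        seen_index, count = self._hits.get(key, (index, 0))
        if seen_index != index:
            count = 0  # a new window started, so reset the counter
        if count >= self.limit:
            return False  # a web app would answer with HTTP 429 here
        self._hits[key] = (index, count + 1)
        return True


# Drive the limiter with a fake clock so the demo is deterministic.
now = [0.0]
limiter = FixedWindowLimiter(limit=3, window=1.0, clock=lambda: now[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])  # fourth hit in the same second is refused
now[0] = 1.2
print(limiter.allow("203.0.113.7"))  # the next window accepts requests again
```

Each key gets its own counter, which is why rotating source IPs or slowing down are the two ways around a limit of this kind.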

app/templates/csdn/blogstar.html
 {% extends "base.html" %}
 {% block content %}
+<style>
+  @media (max-width: 540px) {
+    table {
+      font-size: 10px !important;
+    }
+  }
+</style>
 <div class="container">
-  <table class="table table-hover table-bordered">
+  <table class="table table-hover table-bordered table-responsive">
     <caption class="caption-top text-center">
       <div class="alert alert-warning">
         <p class="m-0"><strong>CSDN 2022 Blog Star overall ranking</strong> 👉 Green background marks the top 200 by total score (promotion zone) 👈</p>
-        <p class="text-success p-0"><small>Data synced at: 2023-01-03 12:00</small></p>
-        <p class="m-0"><small>Since you're here, why not give 橡皮擦 a 5-point rating?</small> | <a target="_blank"
+        <p class="text-success p-0"><small>Data synced at: 2023-01-04 21:00</small></p>
+        <p class="m-0"><small>Since you're here, why not give 橡皮擦 a 5-point rating?</small>
+          <br>
+          <a target="_blank"
             href="https://bbs.csdn.net/topics/611387187"><small>https://bbs.csdn.net/topics/611387187</small></a>
         </p>
...
@@ -18,16 +27,16 @@
         <a class="btn btn-primary" href="/csdn/newstar">New stars only</a>
       </div>
       <br>
-      <div class="btn-group btn-group-sm mt-3 d-flex">
+      <div class="btn-group btn-group-sm mt-3 d-flex" style="font-size:12px;">
         <a class="btn btn-success" href="/csdn/blogstar?page=1">Other</a>
         <a class="btn btn-success" href="/csdn/blogstar?page=2">Front end</a>
         <a class="btn btn-success" href="/csdn/blogstar?page=3">Back end</a>
         <a class="btn btn-success" href="/csdn/blogstar?page=4">Big data</a>
         <a class="btn btn-success" href="/csdn/blogstar?page=5">Cloud native</a>
-        <a class="btn btn-success" href="/csdn/blogstar?page=6">Frontier technology</a>
+        <a class="btn btn-success" href="/csdn/blogstar?page=6">Frontier</a>
         <a class="btn btn-success" href="/csdn/blogstar?page=7">Artificial intelligence</a>
-        <a class="btn btn-success" href="/csdn/blogstar?page=8">Operations and security</a>
-        <a class="btn btn-success" href="/csdn/blogstar?page=9">Mobile development</a>
+        <a class="btn btn-success" href="/csdn/blogstar?page=8">Operations</a>
+        <a class="btn btn-success" href="/csdn/blogstar?page=9">Mobile</a>
         <a class="btn btn-success" href="/csdn/blogstar?page=10">IoT</a>
       </div>
     </caption>
...

app/templates/csdn/newstar.html
 {% extends "base.html" %}
 {% block content %}
+<style>
+  @media (max-width: 540px) {
+    table {
+      font-size: 10px !important;
+    }
+  }
+</style>
 <div class="container">
-  <div class="table-responsive-sm">
+  <div class="table-responsive">
     <table class="table table-hover table-bordered">
       <caption class="caption-top text-center">
         <div class="alert alert-warning">
           <p class="m-0"><strong>CSDN 2022 Blog New Star overall ranking</strong> 👉 Green background marks the top 100 by total score (promotion zone) 👈</p>
-          <p class="text-success p-0"><small>Data synced at: 2023-12-30 9:00</small></p>
-          <p class="m-0"><small>Since you're here, why not give 橡皮擦 a 5-point rating?</small> |
-            <a target="_blank" href="https://bbs.csdn.net/topics/611387187"><small>https://bbs.csdn.net/topics/611387187</small></a>
+          <p class="text-success p-0"><small>Data synced at: 2023-01-04 21:00</small></p>
+          <p class="m-0"><small>Since you're here, why not give 橡皮擦 a 5-point rating?</small>
+            <br>
+            <a target="_blank" href="https://bbs.csdn.net/topics/611387187"><small>https://bbs.csdn.net/topics/611387187</small></a>
           </p>
         </div>
...
@@ -25,7 +35,7 @@
           <th>Nickname</th>
           <th>Track</th>
           <th>Registration date</th>
-          <th>Current score</th>
+          <th>Total score</th>
         </tr>
       </thead>
       <tbody>
...
@@ -52,9 +62,7 @@
           {% endif %}
         </td>
-        <td>{{u.regtime}}</td>
+        <td>{{u.regtime}}</td>
         <td>{{u.totalScore}}</td>
       </tr>
       {%endfor%}
...

app/templates/index.html
@@ -234,7 +234,29 @@
           </p>
         </div>
         <div class="card-footer text-end">
-          <a href="https://dream.blog.csdn.net/article/details/128474924" target="_blank" class="card-link text-muted small">Case tutorial</a>
+          <a href="https://dream.blog.csdn.net/article/details/128474930" target="_blank" class="card-link text-muted small">Case tutorial</a>
+          <a href="#" class="btn btn-success btn-sm card-link disabled" alt="Not yet available">Study blog</a>
+        </div>
+      </div>
+    </div>
+    <div class="col mt-2">
+      <div class="card border-info rounded-5 shadow-sm" style="min-height:306px;min-width:300px;">
+        <div class="card-header text-center">
+          <h4 class="card-title">IP-limited crawling</h4>
+          <div class="bg-danger text-white rounded p-1" style="transform: rotate(20deg); position:absolute;right:0;top:0.5rem;">Latest update</div>
+        </div>
+        <div class="card-body">
+          <p class="card-text">This case limits each IP to 3 API requests per second. To practice on it, use a proxy IP pool or space out your requests.</p>
+          <p class="card-text text-left">Difficulty: ⭐⭐</p>
+          <p class="card-text">Case: <a href="/ss/ajax_list3" class="card-link text-success">School list</a></p>
+        </div>
+        <div class="card-footer text-end">
+          <a href="https://dream.blog.csdn.net/article/details/128474930" target="_blank" class="card-link text-muted small">Case tutorial</a>
           <a href="#" class="btn btn-success btn-sm card-link disabled" alt="Not yet available">Study blog</a>
         </div>
       </div>
...
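
The new card states the rule of this case: at most 3 API requests per second per IP, so a learner either rotates proxies or spaces requests out. The second option can be sketched with a small client-side throttle. Everything here is hypothetical illustration (the `Throttle` class, its `rate` parameter, and the injectable `clock` and `sleep`); choosing a rate slightly below the server's limit, such as 2 per second, leaves a margin for timing jitter.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls at least `1 / rate` seconds apart."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_gap = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self._next_ok: Optional[float] = None  # earliest time the next call may go out

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self._next_ok is not None and now < self._next_ok:
            delay = self._next_ok - now
            self.sleep(delay)
            now = self._next_ok
        self._next_ok = now + self.min_gap
        return delay


# Deterministic demo: a fake clock that only advances when we "sleep".
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


throttle = Throttle(rate=2, clock=lambda: fake_now[0], sleep=fake_sleep)
print([throttle.wait() for _ in range(3)])
```

In a real crawler, `throttle.wait()` would be called right before each request to the rate-limited endpoint, with the default `time.monotonic` and `time.sleep`.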

app/templates/school/ajax_list3.html
new file, mode 100644
+{% extends "base.html" %}
+{% block script %}
+<script type="text/javascript" src="https://ajax.aspnetcdn.com/ajax/jQuery/jquery-3.6.0.min.js"></script>
+<script type="text/javascript">
+function get_data(page){
+    $.ajax({
+        type: "get",
+        url: "/ss/api3",
+        data: {
+            page: page
+        },
+        success: function(response) {
+            // the ajax request succeeded
+            render_data(response);
+            // update the pagination data
+            $('.prev').attr('page', response["prev_page"]);
+            $('.next').attr('page', response["next_page"]);
+            console.log("AJAX request succeeded!");
+        },
+        error: function(error) {
+            console.log("AJAX request failed: " + error);
+        }
+    });
+}
+function render_data(response){
+    data_list = response["data_list"];
+    if(data_list.length > 0){
+        $('#school_list').empty();
+        $.each(data_list, function(index, item){
+            var row = $('<div>', {'class': 'row mt-3', 'data-custom-attribute': 'value'});
+            var col = $('<div>', {'class': 'col'});
+            var d_flex = $('<div>', {'class': 'd-flex'});
+            d_flex.append('<div class="flex-shrink-0"><a href="#"><img class="rounded-pill img-thumbnail" width="64" height="64" src="' + item.pic + '" alt=""></a></div>');
+            // build the markup for the feature badges
+            var badge = "";
+            $.each(item.feature.split(','), function(i, f){
+                badge += '<span class="badge rounded-pill bg-primary">' + f + '</span>';
+            });
+            d_flex.append('<div class="flex-grow-1 ms-3"><h5 class="float-start pe-3">' + item.name + '</h5><p class="ms-3">' + badge + '</p><p><em>Province / city: <span class="text-black-50">' + item.province + ' -- ' + item.city + '</span></em></p></div>')
+            col.append(d_flex);
+            row.append(col);
+            $('#school_list').append(row);
+        })
+    }
+}
+$(function(){
+    $('.page-item').on('click', function(){
+        page = $(this).attr('page');
+        // fetch the data
+        get_data(page);
+    })
+})
+</script>
+{% endblock script %}
+{% block content %}
+<div class="container" id="school_list">
+  {% for school in pagination.data_list %}
+  <div class="row mt-3">
+    <div class="col">
+      <div class="d-flex">
+        <div class="flex-shrink-0">
+          <a href="#">
+            <img class="rounded-pill img-thumbnail" width="64" height="64" src="{{school.pic}}" alt="">
+          </a>
+        </div>
+        <div class="flex-grow-1 ms-3">
+          <h5 class="float-start pe-3">{{school.name}}</h5>
+          <p class="ms-3">
+            {% for fea in school.feature.split(',') %}
+            <span class="badge rounded-pill bg-primary">{{fea}}</span>
+            {% endfor %}
+          </p>
+          <p><em>Province / city: <span class="text-black-50">{{school.province}} -- {{school.city}}</span></em></p>
+        </div>
+      </div>
+    </div>
+  </div>
+  {% endfor %}
+</div>
+<div class="container">
+  <div class="row">
+    <div class="col">
+      <span class="text-dark float-end align-middle" style="line-height: 40px;">{{pagination.total}} records in total</span>
+      <ul class="pagination float-end">
+        <li class="page-item prev" page="{{pagination.prev_page}}">
+          <a class="page-link" href="#">Previous</a>
+        </li>
+        <li class="page-item next" page="{{ pagination.next_page }}"><a class="page-link" href="#">Next</a>
+        </li>
+      </ul>
+    </div>
+  </div>
+</div>
+{% endblock %}
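
The page script in this template walks the list by asking `/ss/api3` for a page and then reading `data_list` and `next_page` from the JSON reply. A crawler practising on this case can follow the same fields while backing off when the limiter refuses a request. The sketch below is only an illustration under stated assumptions: `fetch` is a hypothetical callable standing in for the HTTP request, a refused request is assumed to come back as HTTP 429, and a missing or non-advancing `next_page` is assumed to mark the last page (the commit does not show what the API returns there).

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# fetch(page) -> (HTTP status, decoded JSON body); a stand-in for a real HTTP call.
Fetch = Callable[[int], Tuple[int, Dict[str, Any]]]


def crawl(fetch: Fetch, start_page: int = 1, max_pages: int = 50,
          max_retries: int = 5, backoff: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Follow `next_page` values, backing off whenever the API answers 429."""
    rows: List[Any] = []
    page = start_page
    for _ in range(max_pages):
        for attempt in range(max_retries):
            status, body = fetch(page)
            if status != 429:
                break
            sleep(backoff * (2 ** attempt))  # rate limited: wait longer each time
        else:
            raise RuntimeError(f"page {page} still rate limited after {max_retries} tries")
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status} for page {page}")
        rows.extend(body["data_list"])
        next_page = body.get("next_page")
        # Assumption: a missing or non-advancing next_page marks the last page.
        if not next_page or int(next_page) <= page:
            break
        page = int(next_page)
    return rows


def make_fake_fetch() -> Fetch:
    """Fake two-page API whose second request is refused once."""
    pages = {1: ["A", "B"], 2: ["C"]}
    calls = {"n": 0}

    def fetch(page: int) -> Tuple[int, Dict[str, Any]]:
        calls["n"] += 1
        if calls["n"] == 2:
            return 429, {}
        nxt = page + 1 if page + 1 in pages else None
        return 200, {"data_list": pages[page], "next_page": nxt}

    return fetch


print(crawl(make_fake_fetch(), sleep=lambda seconds: None))
```

The `max_pages` cap keeps the loop bounded even if the stop condition assumed here turns out not to match the real API.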

app/templates/timeline.html
@@ -16,6 +16,22 @@
 <span class="timeline-label">
   <span class="label bg-success text-white p-1">Updating</span>
 </span>
+<div class="timeline-item">
+  <div class="timeline-point timeline-point-success">
+    <i class="fa fa-times"></i>
+  </div>
+  <div class="timeline-event">
+    <div class="timeline-heading">
+      <h4>爬虫训练场 V0.0.16 released</h4>
+    </div>
+    <div class="timeline-body">
+      <p>Update: anti-scraping case --- IP request-count limit!</p>
+    </div>
+    <div class="timeline-footer">
+      <p class="text-right">2023-01-05 20:35</p>
+    </div>
+  </div>
+</div>
 <div class="timeline-item">
   <div class="timeline-point timeline-point-success">
     <i class="fa fa-times"></i>
...