Commit 8a877e45, authored by 梦想橡皮擦

Slow-loading crawler

Parent 9c3e990e
@@ -20,6 +20,9 @@
 12. [爬虫训练场项目,jinja2 模板继承,项目继续迭代](https://dream.blog.csdn.net/article/details/128452409)
 13. [爬虫训练场集成文件采集案例,来学习一下怎么实现的](https://blog.csdn.net/hihell/article/details/128465888)
 14. [UserAgent 反爬是如何实现的,来看看这篇博客 &](https://blog.csdn.net/hihell/article/details/128473575)
+15. [我是怎么用一个特殊 Cookie ,限制住别人的爬虫的](https://blog.csdn.net/hihell/article/details/128474849)
+16. [你很勇哦,这么点数据就敢用异步加载?](https://blog.csdn.net/hihell/article/details/128474866?spm=1001.2014.3001.5501)
+17. [老板让我手动控制网页渲染速度,说这能反爬虫?我信了。](https://blog.csdn.net/hihell/article/details/128474887?spm=1001.2014.3001.5501)
 ## Supplementary blog posts on smaller topics
......
@@ -17,8 +17,9 @@ from .school.index import *
 from .file.index import *
 from .antispider.index import *
 from .csdn.index import *
+from .slow.index import *
 app.register_blueprint(s)
 app.register_blueprint(f)
 app.register_blueprint(antispider)
-app.register_blueprint(cs)
\ No newline at end of file
+app.register_blueprint(cs)
+app.register_blueprint(slow)
\ No newline at end of file
@@ -10,13 +10,46 @@ cs = Blueprint('csdn', __name__, url_prefix='/csdn')
 @cs.route('/blogstar')
 def user_list():
-    user = Csdn.query.order_by(-Csdn.totalScore).all()
+    user = []
+    page = request.args.get("page", 0)
+    try:
+        page = int(page)
+    except Exception as e:
+        return "what are you 弄啥嘞? <a href='/'>返回首页</a>", 403
+    if page == 0:
+        user = Csdn.query.order_by(-Csdn.totalScore).all()
+    elif page == 1:
+        user = Csdn.query.filter(Csdn.cateName == "IT其他").order_by(-Csdn.totalScore).all()
+    elif page == 2:
+        user = Csdn.query.filter(Csdn.cateName == "前端").order_by(-Csdn.totalScore).all()
+    elif page == 3:
+        user = Csdn.query.filter(Csdn.cateName == "后端").order_by(-Csdn.totalScore).all()
+    elif page == 4:
+        user = Csdn.query.filter(Csdn.cateName == "大数据与算法").order_by(-Csdn.totalScore).all()
+    elif page == 5:
+        user = Csdn.query.filter(Csdn.cateName == "云原生与云平台").order_by(-Csdn.totalScore).all()
+    elif page == 6:
+        user = Csdn.query.filter(Csdn.cateName == "前沿技术").order_by(-Csdn.totalScore).all()
+    elif page == 7:
+        user = Csdn.query.filter(Csdn.cateName == "人工智能").order_by(-Csdn.totalScore).all()
+    elif page == 8:
+        user = Csdn.query.filter(Csdn.cateName == "运维与安全").order_by(-Csdn.totalScore).all()
+    elif page == 9:
+        user = Csdn.query.filter(Csdn.cateName == "移动开发").order_by(-Csdn.totalScore).all()
+    elif page == 10:
+        user = Csdn.query.filter(Csdn.cateName == "物联网与嵌入式").order_by(-Csdn.totalScore).all()
     for u in user:
         if u.regtime is not None:
             u.star = "博客新星" if u.regtime.startswith("2022") else "博客之星"
         else:
             u.star = "---"
-    return render_template('csdn/blogstar.html', user=user)
+    bg_green = 200 if page == 0 else 10
+    return render_template('csdn/blogstar.html', user=user,bg_green=bg_green)
 @cs.route('/newstar')
......
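The `elif` chain in `user_list` maps each `page` value to exactly one `cateName`. One possible way to express the same mapping is a lookup table. The sketch below is only an illustration: `CATEGORIES` and `category_for_page` are hypothetical names that do not appear in the commit, and the SQLAlchemy query is left out so the snippet runs on its own.

```python
# Hypothetical helper, not part of the commit: maps the `page` query
# parameter to a category name. The category strings are copied from the
# route above; the dict name and function name are assumptions.
CATEGORIES = {
    1: "IT其他",
    2: "前端",
    3: "后端",
    4: "大数据与算法",
    5: "云原生与云平台",
    6: "前沿技术",
    7: "人工智能",
    8: "运维与安全",
    9: "移动开发",
    10: "物联网与嵌入式",
}


def category_for_page(raw_page):
    """Return (page, category) for a raw `page` query value.

    `category` is None when the page has no entry in CATEGORIES, so the
    caller still has to treat page 0 (all users) separately from an
    unknown page (empty list), as the original route does.
    Raises ValueError for non-numeric strings, which a route could turn
    into the 403 response used above.
    """
    page = int(raw_page)
    return page, CATEGORIES.get(page)
```

A route could then call `category_for_page(request.args.get("page", 0))` once and run a single filtered query when the returned category is not `None`.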
import math
from flask import Blueprint, jsonify, request
from flask import render_template
import time
slow = Blueprint('slow', __name__, url_prefix='/slow')
movies = [{
"name": "无间道",
"release_time": "2002年12月12日",
"company": "寰亚电影发行公司",
"movie_type": "剧情、犯罪、警匪"
}, {
"name": "青蛇",
"release_time": "1993年11月4日",
"company": "香港思远影业公司",
"movie_type": "奇幻"
}, {
"name": "喜剧之王",
"release_time": "1999年02月13日",
"company": "星辉海外有限公司",
"movie_type": "剧情、喜剧、爱情"
}, {
"name": "重庆森林",
"release_time": "1994年07月14日",
"company": "泽东电影有限公司",
"movie_type": "剧情、悬疑、爱情"
}, {
"name": "英雄本色",
"release_time": "1986年8月2日",
"company": "新艺城影业有限公司",
"movie_type": "剧情、动作、犯罪、惊悚"
}, {
"name": "倩女幽魂",
"release_time": "1987年7月18日",
"company": "新艺城影业有限公司",
"movie_type": "爱情、奇幻、武侠、古装"
}, {
"name": "花样年华",
"release_time": "2000年9月29日",
"company": "泽东电影公司",
"movie_type": "剧情、文艺、爱情"
}, {
"name": "大话西游系列",
"release_time": "1995年2月4日",
"company": "彩星电影公司",
"movie_type": "喜剧、爱情、动作、奇幻、冒险"
}, {
"name": "东成西就",
"release_time": "1993年2月5日",
"company": "泽东电影公司",
"movie_type": "喜剧"
}]
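@slow.route('/list')
def list_slow():
    return render_template('slow/index.html')


@slow.route('/detail')
def detail():
    # Reject non-numeric or out-of-range ids instead of failing with a 500.
    try:
        movie_id = int(request.args.get("movie_id", 1))
    except ValueError:
        return jsonify({"error": "invalid movie_id"}), 400
    if not 1 <= movie_id <= len(movies):
        return jsonify({"error": "movie_id out of range"}), 404
    movie = movies[movie_id - 1]
    time.sleep(5)  # Delay the response by 5 seconds
    return jsonify(movie)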
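Because `/slow/detail` sleeps for 5 seconds before answering, a crawler needs a request timeout longer than that delay, and fetching the nine movies one after another would take roughly 45 seconds. The following is a minimal client-side sketch using only the standard library. `BASE_URL`, `fetch_detail`, and `fetch_all` are hypothetical names, and the address assumes the site runs locally on Flask's default port and that the server handles requests concurrently.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the training site is served locally on Flask's default port.
BASE_URL = "http://127.0.0.1:5000"


def fetch_detail(movie_id, timeout=10):
    """Fetch one movie from /slow/detail.

    The timeout is set above the 5-second server-side delay so the
    request is not abandoned before the response arrives.
    """
    url = f"{BASE_URL}/slow/detail?{urlencode({'movie_id': movie_id})}"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all(movie_ids, fetch=fetch_detail, workers=9):
    """Request all ids in parallel so the delays overlap instead of adding up.

    `fetch` is injectable so the function can be exercised without a
    running server.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as movie_ids.
        return list(pool.map(fetch, movie_ids))
```

Calling `fetch_all(range(1, 10))` against a running instance would return the nine movie dictionaries in id order. Threads are used here because the work is almost entirely waiting on the network.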
@@ -38,6 +38,15 @@
 class="list-group-item list-group-item-action">13. 爬虫训练场集成文件采集案例,来学习一下怎么实现的</a>
 <a href="https://blog.csdn.net/hihell/article/details/128473575?spm=1001.2014.3001.5501" target="_blank"
 class="list-group-item list-group-item-action">14. UserAgent 反爬是如何实现的,来看看这篇博客 &</a>
+<a href="https://blog.csdn.net/hihell/article/details/128474849?spm=1001.2014.3001.5501" target="_blank"
+class="list-group-item list-group-item-action">15. 我是怎么用一个特殊 Cookie ,限制住别人的爬虫的</a>
+<a href="https://blog.csdn.net/hihell/article/details/128474866?spm=1001.2014.3001.5501" target="_blank"
+class="list-group-item list-group-item-action">16. 你很勇哦,这么点数据就敢用异步加载?</a>
+<a href="https://blog.csdn.net/hihell/article/details/128474887?spm=1001.2014.3001.5501" target="_blank"
+class="list-group-item list-group-item-action">17. 老板让我手动控制网页渲染速度,说这能反爬虫?我信了。</a>
 </div>
 </div>
 </div>
......
@@ -6,7 +6,7 @@
 <div class="alert alert-warning">
 <p class="m-0">
 <strong>CSDN 2022 博客之星总排名</strong> 👉 绿色背景是总分前 200(晋级区)👈</p>
-<p class="text-success p-0"><small>数据同步时间:2022-12-30 9:00</small></p>
+<p class="text-success p-0"><small>数据同步时间:2023-01-02 18:00</small></p>
 <p class="m-0"><small>来都来了,不去给橡皮擦打个5分么?</small> | <a target="_blank"
 href="https://bbs.csdn.net/topics/611387187"><small>https://bbs.csdn.net/topics/611387187</small></a>
 </p>
@@ -17,6 +17,19 @@
 <!-- <a class="btn btn-primary" href="/csdn/oldstar">博客之星排名</a>-->
 <a class="btn btn-primary" href="/csdn/newstar">仅看新星</a>
 </div>
+<br>
+<div class="btn-group btn-group-sm mt-3 d-flex">
+<a class="btn btn-success" href="/csdn/blogstar?page=1">其它</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=2">前端</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=3">后端</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=4">大数据</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=5">云原生</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=6">前沿技术</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=7">人工智能</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=8">运维与安全</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=9">移动开发</a>
+<a class="btn btn-success" href="/csdn/blogstar?page=10">物联网</a>
+</div>
 </caption>
 <thead>
 <tr>
@@ -24,12 +37,12 @@
 <th>昵称</th>
 <th>赛道</th>
 <th>状态</th>
-<th>目前得</th>
+<th></th>
 </tr>
 </thead>
 <tbody>
 {% for u in user %}
-{% if loop.index<=200 %}
+{% if loop.index<=bg_green %}
 <tr class="bg-success text-white">
 {% else %}
 <tr>
......
@@ -53,9 +53,7 @@
 <div class="card border-info rounded-5 shadow-sm" style="min-height:268px;min-width:300px;">
 <div class="card-header text-center">
 <h4 class="card-title">二进制文件采集</h4>
-<div class="bg-danger text-white rounded p-1"
-style="transform: rotate(20deg); position:absolute;right:0;top:0.5rem;">最新更新
-</div>
 </div>
 <div class="card-body">
 <p class="card-text">本案例用于大家学习文件和视频文件内容采集,重点掌握 M3U8 格式视频下载,掌握二进制内容保存。</p>
@@ -95,6 +93,31 @@
 </div>
 </div>
+<div class="row align-items-stretch">
+<div class="col mt-2">
+<div class="card border-info rounded-5 shadow-sm" style="min-height:268px;min-width:300px;">
+<div class="card-header text-center">
+<h4 class="card-title">慢速爬虫</h4>
+<div class="bg-danger text-white rounded p-1"
+style="transform: rotate(20deg); position:absolute;right:0;top:0.5rem;">最新更新
+</div>
+</div>
+<div class="card-body">
+<p class="card-text">本案例通过控制请求响应速度,来实现慢速爬虫,编写采集程序,需要控制请求和响应时间。</p>
+<p class="card-text text-left">难度:⭐</p>
+<p class="card-text">
+案例:
+<a href="/slow/list" class="card-link text-success">香港电影</a>
+</p>
+</div>
+<div class="card-footer text-end">
+<a href="#" class="btn btn-primary card-link ">学习博客</a>
+</div>
+</div>
+</div>
+</div>
 </div>
 <div class="container pt-5">
 <h3 class="text-danger">PC端反爬</h3>
......
{% extends "base.html" %}
{% block script %}
<script type="text/javascript" src="https://ajax.aspnetcdn.com/ajax/jQuery/jquery-3.6.0.min.js"></script>
<script type="text/javascript">
$(function(){
function showModal(movie_id) {
var data = {
movie_id: movie_id
}
$.get('/slow/detail',data,function(res){
$('.spinner-border').addClass('d-none');
$('.movie_title').text(res.name);
$('.movie_body').html('<p>上映时间: '+res.release_time+'</p><p>类型: '+res.movie_type+'</p><p>出品公司: '+res.company+'</p>')
})
$('#movieModal').modal('show');
}
$('.show_movie').on('click',function(){
var movie_id = $(this).attr('movieid');
$('.movie_title').text('加载中……');
$('.movie_body').text('加载中……');
$('.spinner-border').removeClass('d-none');
showModal(movie_id);
});
});
</script>
{% endblock script %}
{% block content %}
<div class="container mt-4">
<div class="row">
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/1.jpg')}}"
class="img-fluid" alt="Movie 1">
<div class="card-body">
<h5 class="card-title">无间道</h5>
<p class="card-text">
作为香港警探电影系列的巅峰之作,无间道由刘德华、梁朝伟、黄秋生等实力影帝主演,它作为刘德华十大经典电影之一,主要讲述了黑社会的卧底故事,情节反转再反转,非常值得重复观看。</p>
<button type="button" movieid="1" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/2.jpg')}}"
class="img-fluid" alt="Movie 2">
<div class="card-body">
<h5 class="card-title">青蛇</h5>
<p class="card-text">青蛇是由徐克执导的奇幻爱情片,该片由两大女神张曼玉和王祖贤领衔主演,青蛇对白娘子和许仙传统的爱情故事进行改变,是当年风格比较前卫的奇幻片。</p>
<button type="button" movieid="2" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/3.webp')}}" class="card-img-top" alt="Movie 3">
<div class="card-body">
<h5 class="card-title">喜剧之王</h5>
<p class="card-text">喜剧之王是由周星驰执导并主演的喜剧电影,该影片延续了星爷无厘头喜剧风格,前半部分笑点很多,影片结尾给人一种温暖的感觉,它是一部成功的喜剧电影。</p>
<button type="button" movieid="3" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/4.webp')}}"
class="img-fluid" alt="Movie 4">
<div class="card-body">
<h5 class="card-title">重庆森林</h5>
<p class="card-text">重庆森林讲述了破案警察和神秘女杀手之间的爱情故事,由林青霞、梁朝伟主演,该片的色彩丰富,格调优美,在当年获得金像奖最佳影片等奖项。</p>
<button type="button" movieid="4" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/5.webp')}}"
class="img-fluid" alt="Movie 5">
<div class="card-body">
<h5 class="card-title">英雄本色</h5>
<p class="card-text">1986年上映的英雄本色,是当年非常成功的商业片,该片是当年香港票房冠军,主要讲述了三个主人公闯荡江湖并产生的兄弟情义。</p>
<button type="button" movieid="5" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/6.webp')}}" class="card-img-top" alt="Movie 6">
<div class="card-body">
<h5 class="card-title">倩女幽魂</h5>
<p class="card-text">1987年上映的倩女幽魂,是由张国荣和王祖贤领衔主演的奇幻爱情片,该片采用聊斋中的传统故事,再加上大胆的情节改变,获得了众多粉丝的喜爱。</p>
<button type="button" movieid="6" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/7.webp')}}"
class="img-fluid" alt="Movie 7">
<div class="card-body">
<h5 class="card-title">花样年华</h5>
<p class="card-text">花样年华是2000年上映的香港电影,该影片带有浓烈的王家卫风格,影片中的色彩运用到极致,再加上中国旗袍的古典美,该影片获得众多奖项。</p>
<button type="button" movieid="7" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/8.webp')}}"
class="img-fluid" alt="Movie 8">
<div class="card-body">
<h5 class="card-title">大话西游系列</h5>
<p class="card-text">
这是最早的穿越电影之一,主要讲述了至尊宝和紫霞仙子之间的情缘,该片在上映之时,并没有获得很多的关注,直到两年后才在高校学生之间蹿红,肯定是在香港十大经典电影推荐之中。</p>
<button type="button" movieid="8" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
<div class="col-4">
<div class="card">
<img src="{{url_for('static',filename='images/movie/9.jpg')}}" class="card-img-top" alt="Movie 9">
<div class="card-body">
<h5 class="card-title">东成西就</h5>
<p class="card-text">1993年上映的东成西就,是由刘镇伟执导的喜剧片,该片巨星云集,主演有张国荣、张学友、梁朝伟、梁家辉、张曼玉、林青霞等。</p>
<button type="button" movieid="9" class="show_movie btn btn-primary">
详细
</button>
</div>
</div>
</div>
</div>
</div>
<!-- 模态 -->
<div class="modal fade" id="movieModal">
<div class="modal-dialog modal-dialog-centered">
<div class="modal-content">
<!-- 模态标题 -->
<div class="modal-header">
<h4 class="modal-title movie_title">加载中……</h4>
<button type="button" class="btn-close" data-bs-dismiss="modal"></button>
</div>
<!-- 模态主体 -->
<div class="modal-body text-center">
<div class="spinner-border text-primary d-none"></div>
<p class="movie_body">加载中……</p>
</div>
<!-- 模态页脚 -->
<div class="modal-footer">
<button type="button" class="btn btn-danger" data-bs-dismiss="modal">关闭</button>
</div>
</div>
</div>
</div>
{% endblock %}
\ No newline at end of file
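In the template above, each "详细" button carries its movie id in a custom `movieid` attribute, and the details are only loaded through `/slow/detail` after a click. A crawler therefore has to read those ids from the list page before it can request the details. Below is a small standard-library sketch of that step; `MovieIdParser` and `extract_movie_ids` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class MovieIdParser(HTMLParser):
    """Collect the `movieid` attribute of every `show_movie` button."""

    def __init__(self):
        super().__init__()
        self.movie_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Only buttons with the show_movie class trigger a detail request.
        if tag == "button" and "show_movie" in classes and attrs.get("movieid"):
            self.movie_ids.append(int(attrs["movieid"]))


def extract_movie_ids(html):
    """Return the movie ids found in the list page HTML, in page order."""
    parser = MovieIdParser()
    parser.feed(html)
    return parser.movie_ids
```

The returned ids can then be passed as the `movie_id` query parameter of `/slow/detail`.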