Gaoding Design (稿定设计) asset scraping

a14ac74f · 梦想橡皮擦 · 1548bd3d

Showing 93 additions and 2 deletions

NO29/imgs/图片存放地址.txt +0 -0

NO29/index.py +86 -0

README.md +7 -2

--- a/NO29/imgs/图片存放地址.txt
+++ b/NO29/imgs/图片存放地址.txt
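The empty `NO29/imgs/图片存放地址.txt` ("image storage location") placeholder appears to exist only so that the `imgs/` folder is committed: the consumer in `index.py` writes each preview to `./imgs/{title}.png`, relative to the current working directory, using the raw template title as the file name. A title containing `/` is treated as a subdirectory, and characters such as `?` or `:` are rejected on Windows, so `open()` can fail for those items. Below is a minimal sketch of a hypothetical `safe_filename` helper, plus a hypothetical `save_image` wrapper that also creates the folder; the set of replaced characters is an assumption based on Windows file-name rules, not something the original project defines.

```python
import os
import re

# Assumed rule set: characters Windows forbids in file names, plus ASCII control
# characters. index.py uses API titles verbatim as file names, so these can break open().
_ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, max_len=100):
    """Hypothetical helper: turn an arbitrary template title into a usable file-name stem."""
    name = _ILLEGAL_CHARS.sub("_", title)[:max_len]
    # Windows also rejects names that end in a dot or a space
    name = name.strip().rstrip(". ")
    return name or "untitled"


def save_image(content, title, folder="imgs", ext=".png"):
    """Hypothetical wrapper: write image bytes to folder/<safe title><ext>, creating folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(title) + ext)
    with open(path, "wb") as f:
        f.write(content)
    return path
```

With a helper like this, the consumer could call `save_image(res.content, title)` instead of opening `./imgs/{title}.png` directly, and the placeholder file would no longer be needed to make the folder exist.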
--- a/NO29/index.py
+++ b/NO29/index.py
+import requests
+from queue import Queue
+import random
+import threading
+import time
+def get_headers():
+    user_agent_list = [
+        "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
+        "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)",
+        "Baiduspider-image+(+http://www.baidu.com/search/spider.htm)",
+        "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 YisouSpider/5.0 Safari/537.36",
+        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
+        "Mozilla/5.0 (compatible; Googlebot-Image/1.0; +http://www.google.com/bot.html)",
+        "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
+        "Sogou News Spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
+        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0);",
+        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
+        "Sosospider+(+http://help.soso.com/webspider.htm)",
+        "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"
+    ]
+    UserAgent = random.choice(user_agent_list)
+    headers = {'User-Agent': UserAgent, 'referer': 'https://sucai.gaoding.com/'}
+    return headers
+# Producer thread: fetches search-result pages and queues (title, image URL) pairs
+class Producer(threading.Thread):
+    def __init__(self, t_name, queue):
+        threading.Thread.__init__(self, name=t_name)
+        self.data = queue
+    # Crawls 100 pages; reduce the range (e.g. to 3 pages) for a quick test run
+    def run(self):
+        try:
+            for i in range(1, 101):
+                print("Thread: %s, page: %d, writing data to the queue" % (self.name, i))
+                url = 'https://api-sucai.gaoding.com/api/search-api/sucai/templates/search?q=&sort=&colors=&styles=&filter_id=1617130&page_size=100&page_num={}'.format(
+                    i)
+                res = requests.get(url=url, headers=get_headers(), timeout=5)
+                if res:
+                    data = res.json()
+                    for item in data:
+                        title = item["title"]
+                        img_url = item["preview"]["url"]
+                        self.data.put((title, img_url))
+        finally:
+            # Sentinel: tells the consumer that no more items are coming
+            self.data.put(None)
+        print("%s: %s finished writing!" % (time.ctime(), self.name))
+# Consumer thread: downloads each queued preview image into ./imgs/
+class Consumer(threading.Thread):
+    def __init__(self, t_name, queue):
+        threading.Thread.__init__(self, name=t_name)
+        self.data = queue
+    def run(self):
+        while True:
+            val = self.data.get()
+            # None is the producer's end-of-data sentinel; exit so join() can return
+            if val is None:
+                break
+            print("Thread: %s, reading data: %s" % (self.name, val))
+            title, url = val
+            try:
+                res = requests.get(url=url, headers=get_headers(), timeout=5)
+                if res:
+                    with open(f"./imgs/{title}.png", "wb") as f:
+                        f.write(res.content)
+                    print(f"{val}", "written")
+            except Exception as e:
+                # Skip this image (network error, illegal file name, ...) but report it
+                print(f"{val} failed: {e}")
+# Main function
+def main():
+    queue = Queue()
+    producer = Producer('Producer', queue)
+    consumer = Consumer('Consumer', queue)
+    producer.start()
+    consumer.start()
+    producer.join()
+    consumer.join()
+    print('All threads finished')
+if __name__ == '__main__':
+    main()
\ No newline at end of file
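`index.py` follows the classic producer/consumer pattern built from the `threading` and `queue` modules: one thread pages through the search API and queues `(title, img_url)` pairs, another drains the queue and downloads the images. Below is a minimal, network-free sketch of the same pattern generalized to several consumers. `run_pipeline`, `fetch_page` and `handle_item` are hypothetical names: the two callables stand in for the search-API request and the image download. The detail that matters is shutdown: a consumer blocked in `Queue.get()` never wakes up on its own, so the producer puts one sentinel per consumer (inside `finally`, so it happens even if a request raises), and each consumer exits when it receives one.

```python
import queue
import threading

# Private end-of-data marker (an assumption of this sketch; None works too if items are never None)
_SENTINEL = object()


def run_pipeline(pages, fetch_page, handle_item, n_consumers=2):
    """Run one producer and n_consumers consumers over a shared queue.Queue.

    fetch_page(page) -> iterable of items  (hypothetical stand-in for the search-API call)
    handle_item(item) -> None              (hypothetical stand-in for one image download)
    """
    q = queue.Queue(maxsize=100)  # bounded, so the producer cannot race far ahead

    def producer():
        try:
            for page in pages:
                for item in fetch_page(page):
                    q.put(item)
        finally:
            # One sentinel per consumer, even if fetch_page raised, so no consumer hangs
            for _ in range(n_consumers):
                q.put(_SENTINEL)

    def consumer():
        while True:
            item = q.get()
            if item is _SENTINEL:
                break
            try:
                handle_item(item)
            except Exception as exc:  # one bad item must not stop the whole consumer
                print(f"failed on {item!r}: {exc}")

    threads = [threading.Thread(target=producer, name="producer")]
    threads += [
        threading.Thread(target=consumer, name=f"consumer-{i}") for i in range(n_consumers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With `n_consumers` greater than 1 the downloads overlap, which is where threads help for I/O-bound work like this, since the GIL is released while waiting on the network. The bounded `maxsize` is a design choice that keeps memory flat when the producer outpaces the downloads.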
--- a/README.md
+++ b/README.md
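@@ -50,8 +50,13 @@ Python Crawler 120 Examples officially begins
 ### Multithreading: threading + queue modules
 26. [Scraping data on cosmetic doctors nationwide (public data from 花容网 huaroo), Crawler 120 Examples, No. 26](https://dream.blog.csdn.net/article/details/119914401)
 27. [One site not enough to learn from? Then use Python to add another scraping target: crawlers for the 一派 topic square and a finance forum's topic square](https://dream.blog.csdn.net/article/details/119914560)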
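-28. [Domain brokerage data scraping, to be published]
+28. [Python crawler: scraping the 中介网 internet website rankings, sample size: 58341](https://dream.blog.csdn.net/article/details/119941727)
-29. [稿定设计 (Gaoding Design) data scraping, to be published]
+29. [Save the “design bro's” hair with Python by handing him 10,000 reference images: scraping 【稿定设计】 graphic-design template assets](https://dream.blog.csdn.net/article/details/120010272)
+### Learning the requests-html library
+30. [Scraping overseas website ranking data]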