looyolo / scrapy
Commit 8ed743f2

Authored Sep 02, 2009 by Daniel Grana

remove obsolete s3 images pipeline

Parent: 45bc12e2

Showing 1 changed file with 0 additions and 98 deletions.

scrapy/contrib/pipeline/s3images.py  +0 −98  (deleted, mode 100644 → 0)
import rfc822

from scrapy.http import Request
from scrapy.core.engine import scrapyengine
from scrapy.core.exceptions import NotConfigured
from scrapy.contrib.pipeline.images import BaseImagesPipeline
from scrapy.conf import settings


class S3ImagesPipeline(BaseImagesPipeline):
    """Images pipeline with Amazon S3 support as the image store backend

    This pipeline tries to minimize the PUT requests made to Amazon by doing
    a HEAD request per full image. If the HEAD request returns a successful
    response, the Last-Modified header is compared to the current timestamp,
    and if the difference in days is greater than the IMAGE_EXPIRES setting,
    the image is downloaded, reprocessed and uploaded to S3 again, including
    its thumbnails.

    It is recommended to add a spider with domain_name 's3.amazonaws.com' to
    overcome the per-spider request limit. The following is the minimal code
    for this spider:

    from scrapy.spider import BaseSpider
    class S3AmazonAWSSpider(BaseSpider):
        domain_name = "s3.amazonaws.com"
        max_concurrent_requests = 100
        start_urls = ('http://s3.amazonaws.com/',)
    SPIDER = S3AmazonAWSSpider()

    Uploading images to S3 usually requires requests to be signed; the
    recommended way is to enable the scrapy.contrib.aws.AWSMiddleware
    downloader middleware and configure the AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY settings.

    More info about Amazon S3 at http://docs.amazonwebservices.com/AmazonS3/2006-03-01/
    """

    # amazon s3 bucket name to put images
    bucket_name = settings.get('S3_BUCKET')
    # prefix to prepend to image keys
    key_prefix = settings.get('S3_PREFIX', '')
    # Optional spider to use for image uploading
    AmazonS3Spider = None

    def __init__(self):
        if not settings['S3_IMAGES']:
            raise NotConfigured
        super(S3ImagesPipeline, self).__init__()

    def s3_request(self, key, method, body=None, headers=None):
        url = 'http://%s.s3.amazonaws.com/%s%s' % (self.bucket_name, self.key_prefix, key)
        return Request(url, method=method, body=body, headers=headers)

    def stat_key(self, key, info):
        def _onsuccess(response):
            if response.status == 200:
                checksum = response.headers['Etag'].strip('"')
                last_modified = response.headers['Last-Modified']
                modified_tuple = rfc822.parsedate_tz(last_modified)
                modified_stamp = int(rfc822.mktime_tz(modified_tuple))
                return {'checksum': checksum, 'last_modified': modified_stamp}

        req = self.s3_request(key, method='HEAD')
        dfd = self.s3_download(req, info)
        dfd.addCallback(_onsuccess)
        return dfd

    def store_image(self, key, image, buf, info):
        """Upload image to S3 storage"""
        width, height = image.size
        headers = {
            'Content-Type': 'image/jpeg',
            'X-Amz-Acl': 'public-read',
            'X-Amz-Meta-Width': str(width),
            'X-Amz-Meta-Height': str(height),
            'Cache-Control': 'max-age=172800',
        }
        buf.seek(0)
        req = self.s3_request(key, method='PUT', body=buf.read(), headers=headers)
        self.s3_download(req, info)

    def s3_download(self, request, info):
        """This method is used for HEAD and PUT requests sent to Amazon S3

        It tries to use a specific spider domain for uploads, or defaults
        to the current domain spider.
        """
        if self.AmazonS3Spider:
            # need to use schedule to auto-open domain
            return scrapyengine.schedule(request, self.AmazonS3Spider)
        return self.download(request, info)
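The docstring describes the expiry logic (compare Last-Modified against the current time, re-upload if older than IMAGE_EXPIRES days), but the comparison itself is not in this file; stat_key only parses the header into a timestamp. A minimal sketch of that freshness check, using the email.utils equivalents of the long-removed rfc822 module (the IMAGE_EXPIRES value and the is_expired helper are illustrative, not part of the original code):

```python
import time
from email.utils import parsedate_tz, mktime_tz

IMAGE_EXPIRES = 90  # days; hypothetical value of the setting


def is_expired(last_modified_header, now=None):
    """Return True if the stored image is older than IMAGE_EXPIRES days."""
    now = time.time() if now is None else now
    # Parse the RFC 822 date from the HEAD response into a UTC timestamp,
    # mirroring the rfc822.parsedate_tz/mktime_tz calls in stat_key above.
    modified_stamp = mktime_tz(parsedate_tz(last_modified_header))
    age_days = (now - modified_stamp) / 86400.0
    return age_days > IMAGE_EXPIRES
```

When the check returns True, the pipeline would re-download and re-upload the image and its thumbnails; otherwise the existing S3 copy is kept, saving a PUT request.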