Commit edcde7a2
Authored May 18, 2017 by Mikhail Korobov; committed by Paul Tremberth on May 18, 2017

DOC tweak release notes: promote response.follow, mention logging/stats changes

Parent: a3d3cd4c
Showing 1 changed file with 36 additions and 19 deletions (+36 −19).
docs/news.rst @ edcde7a2
@@ -11,28 +11,41 @@ but quite a few handy improvements nonetheless.
 Scrapy now supports anonymous FTP sessions with customizable user and
 password via the new :setting:`FTP_USER` and :setting:`FTP_PASSWORD` settings.
-**And if you're using Twisted version 17.1.0 or above, FTP is now available
-with Python 3.**
+And if you're using Twisted version 17.1.0 or above, FTP is now available
+with Python 3.

-Link extractors now work similarly to what a regular modern browser would
-do. Especially, leading and trailing whitespace are removed from attributes
-(think ``href=" http://example.com"``) when building ``Link`` objects.
-This whitespace-stripping also happens for ``action`` attributes with
-``FormRequest``.
+There's a new :meth:`response.follow <scrapy.http.TextResponse.follow>` method
+for creating requests; **it is now a recommended way to create Requests
+in Scrapy spiders**. This method makes it easier to write correct
+spiders; ``response.follow`` has several advantages over creating
+``scrapy.Request`` objects directly:

-**Please also note that link extractors do not canonicalize URLs by default
-anymore.** This was puzzling users every now and then, and it's not what
-browsers do in fact, so we removed that extra transformation on extracted
-links.
+* it handles relative URLs;
+* it works properly with non-ascii URLs on non-UTF8 pages;
+* in addition to absolute and relative URLs it supports Selectors;
+  for ``<a>`` elements it can also extract their href values.

-There's a new ``response.follow()`` shortcut for creating requests directly
-from a response instance and a relative URL.
+For example, instead of this::

-For example, instead of::
+    for href in response.css('li.page a::attr(href)').extract():
+        url = response.urljoin(href)
+        yield scrapy.Request(url, self.parse, encoding=response.encoding)

-    scrapy.Request(response.urljoin(somehrefvalue))
+One can now write this::

-you can now use the simpler::
+    for a in response.css('li.page a'):
+        yield response.follow(a, self.parse)

-    response.follow(somehrefvalue)
+Link extractors are also improved. They work similarly to what a regular
+modern browser would do: leading and trailing whitespace are removed
+from attributes (think ``href=" http://example.com"``) when building
+``Link`` objects. This whitespace-stripping also happens for ``action``
+attributes with ``FormRequest``.
+
+**Please also note that link extractors do not canonicalize URLs by default
+anymore.** This was puzzling users every now and then, and it's not what
+browsers do in fact, so we removed that extra transformation on extracted
+links.

 For those of you wanting more control on the ``Referer:`` header that Scrapy
 sends when following links, you can set your own ``Referrer Policy``.
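
For context on the FTP change above, a minimal sketch of how the two new settings might be used; the host, file path and spider name are illustrative and not part of the commit:

    # settings.py -- a sketch: the conventional anonymous-FTP credentials,
    # used by ftp:// requests when the URL itself carries none
    FTP_USER = 'anonymous'
    FTP_PASSWORD = 'guest'

    # spider sketch (host and path below are placeholders)
    import scrapy

    class FtpFileSpider(scrapy.Spider):
        name = 'ftp_file'
        start_urls = ['ftp://ftp.example.com/pub/data.csv']

        def parse(self, response):
            # for an FTP download, response.body is the fetched file
            yield {'size': len(response.body)}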
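
And to put the promoted ``response.follow`` snippet in context, a complete minimal spider; the start URL and CSS selectors are illustrative:

    import scrapy

    class PagesSpider(scrapy.Spider):
        name = 'pages'
        start_urls = ['http://quotes.toscrape.com/']

        def parse(self, response):
            for text in response.css('div.quote span.text::text').extract():
                yield {'text': text}
            # response.follow accepts the <a> Selector itself: it extracts
            # the href, resolves it against response.url, and respects the
            # page encoding, so no response.urljoin() call is needed
            for a in response.css('li.next a'):
                yield response.follow(a, self.parse)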
@@ -44,6 +57,10 @@ And this policy is fully customizable with W3C standard values
 (or with something really custom of your own if you wish).
 See :setting:`REFERRER_POLICY` for details.

+To make Scrapy spiders easier to debug, Scrapy logs more stats by default
+in 1.4: memory usage stats, detailed retry stats, detailed HTTP error code
+stats. A similar change is that HTTP cache path is also visible in logs now.
+
 Last but not least, Scrapy now has the option to make JSON and XML items
 more human-readable, with newlines between items and even custom indenting
 offset, using the new :setting:`FEED_EXPORT_INDENT` setting.
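
A one-line sketch of the referrer-policy setting mentioned above; ``same-origin`` is one of the standard W3C values it accepts:

    # settings.py -- only send the Referer header for same-origin requests
    REFERRER_POLICY = 'same-origin'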
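
Likewise, a sketch of the new feed-export option; the indent width here is an arbitrary choice:

    # settings.py -- pretty-print JSON/XML feed exports, indenting
    # nested structure by 2 spaces per level
    FEED_EXPORT_INDENT = 2

The same setting can also be passed per run, e.g. ``scrapy crawl myspider -o items.json -s FEED_EXPORT_INDENT=2`` (spider and file names are placeholders).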
@@ -60,7 +77,7 @@ New Features
 - Enable memusage extension by default (:issue:`2187`) ;
   **this is technically backwards-incompatible** so please check if you have
   any non-default ``MEMUSAGE_***`` settings set.
-- New :meth:`Response.follow <scrapy.http.Response.follow>` shortcut
+- New :ref:`response.follow <response-follow-example>` shortcut
   for creating requests (:issue:`1940`)
 - Added ``flags`` argument and attribute to :class:`Request <scrapy.http.Request>`
   objects (:issue:`2047`)
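
Two quick sketches for the items above; the flag value is illustrative:

    import scrapy

    # New ``flags`` argument: an arbitrary list kept on the request object,
    # handy for tagging requests for logging or debugging
    req = scrapy.Request('http://example.com', flags=['from-sitemap'])
    assert req.flags == ['from-sitemap']

    # settings.py -- the memusage extension is now on by default;
    # opt out explicitly if that is unwanted
    MEMUSAGE_ENABLED = False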