Commit 6f4c964a (looyolo/scrapy)
Authored: June 24, 2020, by Adrián Chaves
Committed: June 24, 2020, by GitHub
Cover Scrapy 2.2.0 in the release notes (#4630)
Parent: 536643ef
Showing 6 changed files with 214 additions and 6 deletions.
docs/contributing.rst             +4   -1
docs/news.rst                     +195 -0
docs/topics/media-pipeline.rst    +8   -1
docs/topics/request-response.rst  +2   -2
scrapy/http/response/text.py      +2   -0
scrapy/utils/misc.py              +3   -2
docs/contributing.rst
...
...
@@ -155,6 +155,9 @@ Finally, try to keep aesthetic changes (:pep:`8` compliance, unused imports
removal, etc) in separate commits from functional changes. This will make pull
requests easier to review and more likely to get merged.
.. _coding-style:
Coding style
============
...
...
@@ -163,7 +166,7 @@ Scrapy:
* Unless otherwise specified, follow :pep:`8`.
-* It's OK to use lines longer than 80 chars if it improves the code
+* It's OK to use lines longer than 79 chars if it improves the code
   readability.
* Don't put your name in the code you contribute; git provides enough
...
...
docs/news.rst
...
...
@@ -3,6 +3,201 @@
Release notes
=============
.. _release-2.2.0:
Scrapy 2.2.0 (2020-06-24)
-------------------------
Highlights:
* Python 3.5.2+ is required now
* :ref:`dataclass objects <dataclass-items>` and
:ref:`attrs objects <attrs-items>` are now valid :ref:`item types
<item-types>`
* New :meth:`TextResponse.json <scrapy.http.TextResponse.json>` method
* New :signal:`bytes_received` signal that allows canceling response download
* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` fixes
Backward-incompatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Support for Python 3.5.0 and 3.5.1 has been dropped; Scrapy now refuses to
run with a Python version lower than 3.5.2, which introduced
:class:`typing.Type` (:issue:`4615`)
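
The minimum-version rule above amounts to a tuple comparison against ``sys.version_info``. The following is a minimal sketch of such a guard, not Scrapy's actual start-up code; the names ``MINIMUM_PYTHON`` and ``is_supported`` are hypothetical.

```python
import sys

# Minimum version stated in the release notes above.
MINIMUM_PYTHON = (3, 5, 2)


def is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the interpreter version meets the minimum.

    Hypothetical helper: it only compares (major, minor, micro) tuples.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= minimum
```

Under this sketch, ``is_supported((3, 5, 1))`` is false and ``is_supported((3, 5, 2))`` is true, which matches the dropped and retained versions listed above.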
Deprecations
~~~~~~~~~~~~
* :meth:`TextResponse.body_as_unicode
<scrapy.http.TextResponse.body_as_unicode>` is now deprecated, use
:attr:`TextResponse.text <scrapy.http.TextResponse.text>` instead
(:issue:`4546`, :issue:`4555`, :issue:`4579`)
* :class:`scrapy.item.BaseItem` is now deprecated, use
:class:`scrapy.item.Item` instead (:issue:`4534`)
New features
~~~~~~~~~~~~
* :ref:`dataclass objects <dataclass-items>` and
:ref:`attrs objects <attrs-items>` are now valid :ref:`item types
<item-types>`, and a new itemadapter_ library makes it easy to
write code that :ref:`supports any item type <supporting-item-types>`
(:issue:`2749`, :issue:`2807`, :issue:`3761`, :issue:`3881`, :issue:`4642`)
* A new :meth:`TextResponse.json <scrapy.http.TextResponse.json>` method
allows deserializing JSON responses (:issue:`2444`, :issue:`4460`,
:issue:`4574`)
* A new :signal:`bytes_received` signal allows monitoring response download
progress and :ref:`stopping downloads <topics-stop-response-download>`
(:issue:`4205`, :issue:`4559`)
* The dictionaries in the result list of a :ref:`media pipeline
<topics-media-pipeline>` now include a new key, ``status``, which indicates
if the file was downloaded or, if the file was not downloaded, why it was
not downloaded; see :meth:`FilesPipeline.get_media_requests
<scrapy.pipelines.files.FilesPipeline.get_media_requests>` for more
information (:issue:`2893`, :issue:`4486`)
* When using :ref:`Google Cloud Storage <media-pipeline-gcs>` for
a :ref:`media pipeline <topics-media-pipeline>`, a warning is now logged if
the configured credentials do not grant the required permissions
(:issue:`4346`, :issue:`4508`)
* :ref:`Link extractors <topics-link-extractors>` are now serializable,
as long as you do not use :ref:`lambdas <lambda>` for parameters; for
example, you can now pass link extractors in :attr:`Request.cb_kwargs
<scrapy.http.Request.cb_kwargs>` or
:attr:`Request.meta <scrapy.http.Request.meta>` when :ref:`persisting
scheduled requests <topics-jobs>` (:issue:`4554`)
* Upgraded the :ref:`pickle protocol <pickle-protocols>` that Scrapy uses
from protocol 2 to protocol 4, improving serialization capabilities and
performance (:issue:`4135`, :issue:`4541`)
* :func:`scrapy.utils.misc.create_instance` now raises a :exc:`TypeError`
exception if the resulting instance is ``None`` (:issue:`4528`,
:issue:`4532`)
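
The pickle protocol upgrade and the note about lambdas in link extractor parameters can both be illustrated with the standard library alone. This is a sketch under assumptions: ``payload`` is a made-up dictionary standing in for serializable request data, and ``roundtrip`` and ``is_picklable`` are hypothetical helpers, not Scrapy functions.

```python
import pickle


def roundtrip(obj, protocol=4):
    """Serialize and deserialize obj with the given pickle protocol."""
    return pickle.loads(pickle.dumps(obj, protocol=protocol))


def is_picklable(obj, protocol=4):
    """Return True if obj can be pickled with the given protocol."""
    try:
        pickle.dumps(obj, protocol=protocol)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Made-up data standing in for values stored in cb_kwargs or meta.
payload = {"allow": (r"/page/\d+",), "deny": (), "tags": ("a", "area")}

# A pickle stream for protocol 2 or higher starts with the PROTO opcode
# (0x80) followed by the protocol number.
header = pickle.dumps(payload, protocol=4)[:2]
```

Plain data round-trips unchanged, while ``is_picklable(lambda url: url)`` is false because pickle stores functions by name, which is why lambdas are excluded in the link extractor note above.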
.. _itemadapter: https://github.com/scrapy/itemadapter
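
To show what "supporting any item type" involves, here is a standard-library sketch that accepts both dict items and dataclass items. It is only a rough stand-in for the idea behind itemadapter and does not use or reproduce that library's API; ``ProductItem`` and ``item_to_dict`` are hypothetical names.

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class ProductItem:
    """Hypothetical dataclass item."""
    name: str
    price: float = 0.0


def item_to_dict(item):
    """Return a plain dict for a dict item or a dataclass item.

    Hypothetical helper; attrs objects and scrapy.item.Item are not
    handled here.
    """
    if isinstance(item, dict):
        return dict(item)
    if is_dataclass(item) and not isinstance(item, type):
        return asdict(item)
    raise TypeError("Unsupported item type: %r" % type(item))
```

Code written against a helper like this does not need to know which item type a spider yields, which is the property the release note attributes to itemadapter.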
Bug fixes
~~~~~~~~~
* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
discards cookies defined in :attr:`Request.headers
<scrapy.http.Request.headers>` (:issue:`1992`, :issue:`2400`)
* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
re-encodes cookies defined as :class:`bytes` in the ``cookies`` parameter
of the ``__init__`` method of :class:`~scrapy.http.Request`
(:issue:`2400`, :issue:`3575`)
* When :setting:`FEEDS` defines multiple URIs, :setting:`FEED_STORE_EMPTY` is
``False`` and the crawl yields no items, Scrapy no longer stops feed
exports after the first URI (:issue:`4621`, :issue:`4626`)
* :class:`~scrapy.spiders.Spider` callbacks defined using :doc:`coroutine
syntax <topics/coroutines>` no longer need to return an iterable, and may
instead return a :class:`~scrapy.http.Request` object, an
:ref:`item <topics-items>`, or ``None`` (:issue:`4609`)
* The :command:`startproject` command now ensures that the generated project
folders and files have the right permissions (:issue:`4604`)
* Fixed a :exc:`KeyError` exception that was sometimes raised from
:class:`scrapy.utils.datatypes.LocalWeakReferencedCache` (:issue:`4597`,
:issue:`4599`)
* When :setting:`FEEDS` defines multiple URIs, log messages about items being
stored now contain information from the corresponding feed, instead of
always containing information about only one of the feeds (:issue:`4619`,
:issue:`4629`)
Documentation
~~~~~~~~~~~~~
* Added a new section about :ref:`accessing cb_kwargs from errbacks
<errback-cb_kwargs>` (:issue:`4598`, :issue:`4634`)
* Covered chompjs_ in :ref:`topics-parsing-javascript` (:issue:`4556`,
:issue:`4562`)
* Removed from :doc:`topics/coroutines` the warning about the API being
experimental (:issue:`4511`, :issue:`4513`)
* Removed references to unsupported versions of :doc:`Twisted
<twisted:index>` (:issue:`4533`)
* Updated the description of the :ref:`screenshot pipeline example
<ScreenshotPipeline>`, which now uses :doc:`coroutine syntax
<topics/coroutines>` instead of returning a
:class:`~twisted.internet.defer.Deferred` (:issue:`4514`, :issue:`4593`)
* Removed a misleading import line from the
:func:`scrapy.utils.log.configure_logging` code example (:issue:`4510`,
:issue:`4587`)
* The display-on-hover behavior of internal documentation references now also
covers links to :ref:`commands <topics-commands>`, :attr:`Request.meta
<scrapy.http.Request.meta>` keys, :ref:`settings <topics-settings>` and
:ref:`signals <topics-signals>` (:issue:`4495`, :issue:`4563`)
* It is again possible to download the documentation for offline reading
(:issue:`4578`, :issue:`4585`)
* Removed backslashes preceding ``*args`` and ``**kwargs`` in some function
and method signatures (:issue:`4592`, :issue:`4596`)
.. _chompjs: https://github.com/Nykakin/chompjs
Quality assurance
~~~~~~~~~~~~~~~~~
* Adjusted the code base further to our :ref:`style guidelines
<coding-style>` (:issue:`4237`, :issue:`4525`, :issue:`4538`,
:issue:`4539`, :issue:`4540`, :issue:`4542`, :issue:`4543`, :issue:`4544`,
:issue:`4545`, :issue:`4557`, :issue:`4558`, :issue:`4566`, :issue:`4568`,
:issue:`4572`)
* Removed remnants of Python 2 support (:issue:`4550`, :issue:`4553`,
:issue:`4568`)
* Improved code sharing between the :command:`crawl` and :command:`runspider`
commands (:issue:`4548`, :issue:`4552`)
* Replaced ``chain(*iterable)`` with ``chain.from_iterable(iterable)``
(:issue:`4635`)
* You may now run the :mod:`asyncio` tests with Tox on any Python version
(:issue:`4521`)
* Updated test requirements to reflect an incompatibility with pytest 5.4 and
5.4.1 (:issue:`4588`)
* Improved :class:`~scrapy.spiderloader.SpiderLoader` test coverage for
scenarios involving duplicate spider names (:issue:`4549`, :issue:`4560`)
* Configured Travis CI to also run the tests with Python 3.5.2
(:issue:`4518`, :issue:`4615`)
* Added a `Pylint <https://www.pylint.org/>`_ job to Travis CI
(:issue:`3727`)
* Added a `Mypy <http://mypy-lang.org/>`_ job to Travis CI (:issue:`4637`)
* Made use of set literals in tests (:issue:`4573`)
* Cleaned up the Travis CI configuration (:issue:`4517`, :issue:`4519`,
:issue:`4522`, :issue:`4537`)
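
The ``chain(*iterable)`` to ``chain.from_iterable(iterable)`` replacement listed above gives the same result for finite input, but ``from_iterable`` consumes the outer iterable lazily instead of unpacking it into arguments first. A small sketch with made-up data:

```python
from itertools import chain, count, islice

nested = [[1, 2], [3], [], [4, 5]]

# Both forms flatten one level of nesting.
flat_star = list(chain(*nested))
flat_lazy = list(chain.from_iterable(nested))


def endless_rows():
    """Yield [0, 0], [1, 1], [2, 2], ... without ever stopping."""
    for n in count():
        yield [n, n]


# from_iterable works on an endless outer iterable because it pulls one
# inner list at a time; chain(*endless_rows()) would never return.
first_five = list(islice(chain.from_iterable(endless_rows()), 5))
```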
.. _release-2.1.0:
Scrapy 2.1.0 (2020-04-24)
...
...
docs/topics/media-pipeline.rst
...
...
@@ -201,6 +201,9 @@ For self-hosting you also might feel the need not to use SSL and not to verify S
.. _s3.scality: https://s3.scality.com/
.. _canned ACLs: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl
.. _media-pipeline-gcs:
Google Cloud Storage
---------------------
...
...
@@ -475,7 +478,11 @@ See here the methods that you can override in your custom Files Pipeline:
* ``checksum`` - a `MD5 hash`_ of the image contents
-* ``status`` - the file status indication. It can be one of the following:
+* ``status`` - the file status indication.
+
+  .. versionadded:: 2.2
+
+  It can be one of the following:
* ``downloaded`` - file was downloaded.
* ``uptodate`` - file was not downloaded, as it was downloaded recently,
...
...
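
As an illustration of how the new ``status`` key might be consumed, the sketch below tallies statuses from a results list shaped like the one a media pipeline passes to ``item_completed``: a list of ``(success, info)`` 2-tuples. The sample data is made up, ``summarize_results`` is a hypothetical helper, and a plain ``Exception`` stands in for the failure object that accompanies an unsuccessful download.

```python
from collections import Counter


def summarize_results(results):
    """Count file statuses and failures in a media pipeline results list."""
    statuses = Counter()
    failures = 0
    for success, info in results:
        if success:
            # Results produced before 2.2 have no "status" key.
            statuses[info.get("status", "unknown")] += 1
        else:
            failures += 1
    return dict(statuses), failures


# Made-up sample data; only statuses visible in the hunk above are used.
sample_results = [
    (True, {"url": "https://example.com/a.pdf", "path": "full/a.pdf",
            "checksum": "0123abcd", "status": "downloaded"}),
    (True, {"url": "https://example.com/b.pdf", "path": "full/b.pdf",
            "checksum": "4567ef01", "status": "uptodate"}),
    (False, Exception("stand-in for a download failure")),
]

summary = summarize_results(sample_results)
```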
docs/topics/request-response.rst
...
...
@@ -191,7 +191,7 @@ Request objects
In case of a failure to process the request, this dict can be accessed as
``failure.request.cb_kwargs`` in the request's errback. For more information,
-see :ref:`topics-request-response-ref-accessing-callback-arguments-in-errback`.
+see :ref:`errback-cb_kwargs`.
.. method:: Request.copy()
...
...
@@ -316,7 +316,7 @@ errors if needed::
request = failure.request
self.logger.error('TimeoutError on %s', request.url)
-.. _topics-request-response-ref-accessing-callback-arguments-in-errback:
+.. _errback-cb_kwargs:
Accessing additional data in errback functions
----------------------------------------------
...
...
scrapy/http/response/text.py
...
...
@@ -74,6 +74,8 @@ class TextResponse(Response):
     def json(self):
         """
+        .. versionadded:: 2.2
+
         Deserialize a JSON document to a Python object.
         """
         if self._cached_decoded_json is _NONE:
...
...
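
The ``_cached_decoded_json is _NONE`` check in the hunk above suggests a sentinel-based cache. A likely reason for a dedicated sentinel rather than ``None`` is that the JSON document ``null`` legitimately decodes to ``None``. The class below is a hypothetical stand-in that sketches the pattern with the standard library; it is not Scrapy's ``TextResponse``, and ``decode_count`` exists only to make the caching observable.

```python
import json

# Unique sentinel meaning "nothing cached yet"; None cannot be used for
# this because "null" is a valid JSON document that decodes to None.
_NONE = object()


class CachedJsonBody:
    """Hypothetical stand-in that caches its decoded JSON body."""

    def __init__(self, text):
        self.text = text
        self._cached_decoded_json = _NONE
        self.decode_count = 0  # illustration only

    def json(self):
        """Deserialize the body once and reuse the result afterwards."""
        if self._cached_decoded_json is _NONE:
            self._cached_decoded_json = json.loads(self.text)
            self.decode_count += 1
        return self._cached_decoded_json
```

Calling ``json()`` twice on ``CachedJsonBody("null")`` returns ``None`` both times while decoding the text only once.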
scrapy/utils/misc.py
...
...
@@ -138,8 +138,9 @@ def create_instance(objcls, settings, crawler, *args, **kwargs):
     Raises ``ValueError`` if both ``settings`` and ``crawler`` are ``None``.

-    Raises ``TypeError`` if the resulting instance is ``None`` (e.g. if an
-    extension has not been implemented correctly).
+    .. versionchanged:: 2.2
+       Raises ``TypeError`` if the resulting instance is ``None`` (e.g. if an
+       extension has not been implemented correctly).
     """
     if settings is None:
         if crawler is None:
...
...
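
The two error conditions in the docstring above can be sketched as follows. This is a simplified illustration, not the function from ``scrapy/utils/misc.py``: only the ``ValueError`` and ``TypeError`` rules come from the docstring, while the ``from_crawler``/``from_settings`` dispatch and the names ``create_instance_sketch``, ``FromSettings`` and ``BrokenExtension`` are assumptions made for the example.

```python
def create_instance_sketch(objcls, settings, crawler, *args, **kwargs):
    """Build objcls via from_crawler, from_settings or its constructor."""
    if settings is None:
        if crawler is None:
            raise ValueError("Specify at least one of settings and crawler.")
        settings = crawler.settings
    if crawler is not None and hasattr(objcls, "from_crawler"):
        instance = objcls.from_crawler(crawler, *args, **kwargs)
        method_name = "from_crawler"
    elif hasattr(objcls, "from_settings"):
        instance = objcls.from_settings(settings, *args, **kwargs)
        method_name = "from_settings"
    else:
        instance = objcls(*args, **kwargs)
        method_name = "__new__"
    if instance is None:
        # A factory method that forgets to return the object ends up here.
        raise TypeError(
            "%s.%s returned None" % (objcls.__qualname__, method_name)
        )
    return instance


class FromSettings:
    """Hypothetical component with a working factory method."""

    def __init__(self, value):
        self.value = value

    @classmethod
    def from_settings(cls, settings):
        return cls(settings["VALUE"])


class BrokenExtension:
    """Hypothetical component whose factory forgets to return."""

    @classmethod
    def from_settings(cls, settings):
        cls()  # bug: missing "return"
```

With this sketch, ``BrokenExtension`` fails immediately with a ``TypeError`` that names the faulty method, rather than leaving a ``None`` component to be discovered later.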