diff --git a/docs/news.rst b/docs/news.rst
index d9fe897ad5126c7aba3b97327e6b77ed15316bb4..0ea412e753a4f8928907511f59681f071ddef7cd 100644
--- a/docs/news.rst
+++ b/docs/news.rst
@@ -3,6 +3,190 @@ Release notes
 =============
 
+.. _release-2.5.0:
+
+Scrapy 2.5.0 (2021-04-06)
+-------------------------
+
+Highlights:
+
+- Official Python 3.9 support
+
+- Experimental :ref:`HTTP/2 support <http2>`
+
+- New :func:`~scrapy.downloadermiddlewares.retry.get_retry_request` function
+  to retry requests from spider callbacks
+
+- New :class:`~scrapy.signals.headers_received` signal that allows stopping
+  downloads early
+
+- New :class:`Response.protocol <scrapy.http.Response.protocol>` attribute
+
+Deprecation removals
+~~~~~~~~~~~~~~~~~~~~
+
+- Removed all code that :ref:`was deprecated in 1.7.0 <1.7-deprecations>` and
+  had not :ref:`already been removed in 2.4.0 <2.4-deprecation-removals>`.
+  (:issue:`4901`)
+
+- Removed support for the ``SCRAPY_PICKLED_SETTINGS_TO_OVERRIDE`` environment
+  variable, :ref:`deprecated in 1.8.0 <1.8-deprecations>`. (:issue:`4912`)
+
+
+Deprecations
+~~~~~~~~~~~~
+
+- The :mod:`scrapy.utils.py36` module is now deprecated in favor of
+  :mod:`scrapy.utils.asyncgen`. (:issue:`4900`)
+
+
+New features
+~~~~~~~~~~~~
+
+- Experimental :ref:`HTTP/2 support <http2>` through a new download handler
+  that can be assigned to the ``https`` protocol in the
+  :setting:`DOWNLOAD_HANDLERS` setting (see the configuration sketch after
+  this section).
+  (:issue:`1854`, :issue:`4769`, :issue:`5058`, :issue:`5059`, :issue:`5066`)
+
+- The new :func:`scrapy.downloadermiddlewares.retry.get_retry_request`
+  function may be used from spider callbacks or middlewares to handle the
+  retrying of a request beyond the scenarios that
+  :class:`~scrapy.downloadermiddlewares.retry.RetryMiddleware` supports (see
+  the usage sketch after this section).
+  (:issue:`3590`, :issue:`3685`, :issue:`4902`)
+
+- The new :class:`~scrapy.signals.headers_received` signal gives early access
+  to response headers and allows :ref:`stopping downloads
+  <topics-stop-response-download>` (see the usage sketch after this section).
+  (:issue:`1772`, :issue:`4897`)
+
+- The new :attr:`Response.protocol <scrapy.http.Response.protocol>`
+  attribute gives access to the string that identifies the protocol used to
+  download a response (see the usage sketch after this section).
+  (:issue:`4878`)
+
+- :ref:`Stats <topics-stats>` now include the following entries that indicate
+  the number of successes and failures in storing
+  :ref:`feeds <topics-feed-exports>`::
+
+    feedexport/success_count/<storage class name>
+    feedexport/failed_count/<storage class name>
+
+  Where ``<storage class name>`` is the feed storage backend class name, such
+  as :class:`~scrapy.extensions.feedexport.FileFeedStorage` or
+  :class:`~scrapy.extensions.feedexport.FTPFeedStorage`.
+
+  (:issue:`3947`, :issue:`4850`)
+
+- The :class:`~scrapy.spidermiddlewares.urllength.UrlLengthMiddleware` spider
+  middleware now logs ignored URLs with ``INFO`` :ref:`logging level
+  <levels>` instead of ``DEBUG``, and it now includes the following entry
+  into :ref:`stats <topics-stats>` to keep track of the number of ignored
+  URLs::
+
+    urllength/request_ignored_count
+
+  (:issue:`5036`)
+
+- The
+  :class:`~scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`
+  downloader middleware now logs the number of decompressed responses and the
+  total count of resulting bytes::
+
+    httpcompression/response_bytes
+    httpcompression/response_count
+
+  (:issue:`4797`, :issue:`4799`)
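
As a companion to the HTTP/2 bullet above, here is a minimal configuration
sketch. It assumes the handler path documented for this release,
``scrapy.core.downloader.handlers.http2.H2DownloadHandler``, and goes in
``settings.py``::

    # Opt in to the experimental HTTP/2 download handler for the https
    # scheme; HTTP/1.1 remains the default for every other protocol.
    DOWNLOAD_HANDLERS = {
        'https': 'scrapy.core.downloader.handlers.http2.H2DownloadHandler',
    }

Keep in mind the limitations listed in the ``settings.rst`` hunk further
below, such as the missing :signal:`bytes_received` and
:signal:`headers_received` support.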
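
Next, a minimal usage sketch for the new
:func:`~scrapy.downloadermiddlewares.retry.get_retry_request` function in a
spider callback; the spider name and the empty-response condition are
illustrative assumptions, not part of the API::

    import scrapy
    from scrapy.downloadermiddlewares.retry import get_retry_request

    class ExampleSpider(scrapy.Spider):  # hypothetical spider
        name = 'example'

        def parse(self, response):
            if not response.text:
                # Returns a retry copy of the request, or None once the
                # maximum number of retries has been reached.
                return get_retry_request(
                    response.request,
                    spider=self,
                    reason='empty',
                )
            # ... regular parsing would go here ...

Returning ``None`` from the callback once retries are exhausted simply drops
the request, which is usually the desired behavior.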
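
Similarly, a sketch of stopping a download early from the new
:signal:`headers_received` signal; the spider, URL, and stop condition are
hypothetical. Raising :exc:`~scrapy.exceptions.StopDownload` with
``fail=False`` hands the headers-only response to the callback instead of the
errback::

    import scrapy
    from scrapy import signals
    from scrapy.exceptions import StopDownload

    class HeadersOnlySpider(scrapy.Spider):  # hypothetical spider
        name = 'headers_only'
        start_urls = ['https://example.com']  # placeholder URL

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            crawler.signals.connect(spider.on_headers_received,
                                    signal=signals.headers_received)
            return spider

        def on_headers_received(self, headers, request, spider):
            # Inspect the headers and abort the body download early.
            raise StopDownload(fail=False)

        def parse(self, response):
            self.logger.info('Headers: %s', response.headers)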
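
Finally, a one-line sketch of reading the new attribute inside a spider
callback; with the default download handler the value would be a version
string such as ``"HTTP/1.1"``::

    def parse(self, response):
        self.logger.info('%s downloaded over %s',
                         response.url, response.protocol)
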
+
+
+Bug fixes
+~~~~~~~~~
+
+- Fixed installation on PyPy installing PyDispatcher in addition to
+  PyPyDispatcher, which could prevent Scrapy from working depending on which
+  package got imported. (:issue:`4710`, :issue:`4814`)
+
+- When inspecting a callback to check if it is a generator that also returns
+  a value, an exception is no longer raised if the callback has a docstring
+  with lower indentation than the following code.
+  (:issue:`4477`, :issue:`4935`)
+
+- The ``Content-Length`` header is no longer omitted from responses when
+  using the default, HTTP/1.1 download handler (see
+  :setting:`DOWNLOAD_HANDLERS`).
+  (:issue:`5009`, :issue:`5034`, :issue:`5045`, :issue:`5057`, :issue:`5062`)
+
+- Setting the :reqmeta:`handle_httpstatus_all` request meta key to ``False``
+  now has the same effect as not setting it at all, instead of having the
+  same effect as setting it to ``True`` (see the sketch after this list).
+  (:issue:`3851`, :issue:`4694`)
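
To make the last bug fix concrete, a sketch with a hypothetical spider and
placeholder URLs; both requests below now behave identically::

    import scrapy

    class StatusSpider(scrapy.Spider):  # hypothetical spider
        name = 'status'

        def start_requests(self):
            # With the meta key unset or set to False, HttpErrorMiddleware
            # filters non-2xx responses as usual. Before this fix, False
            # behaved like True and let every response reach the callback.
            yield scrapy.Request('https://example.com/a')
            yield scrapy.Request('https://example.com/b',
                                 meta={'handle_httpstatus_all': False})

        def parse(self, response):
            pass
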
+
+
+Documentation
+~~~~~~~~~~~~~
+
+- Added instructions to :ref:`install Scrapy in Windows using pip
+  <intro-install-windows>`.
+  (:issue:`4715`, :issue:`4736`)
+
+- Logging documentation now includes :ref:`additional ways to filter logs
+  <topics-logging-advanced-customization>`.
+  (:issue:`4216`, :issue:`4257`, :issue:`4965`)
+
+- Covered how to deal with long lists of allowed domains in the :ref:`FAQ
+  <faq>`. (:issue:`2263`, :issue:`3667`)
+
+- Covered scrapy-bench_ in :ref:`benchmarking`.
+  (:issue:`4996`, :issue:`5016`)
+
+- Clarified that one :ref:`extension <topics-extensions>` instance is
+  created per crawler.
+  (:issue:`5014`)
+
+- Fixed some errors in examples.
+  (:issue:`4829`, :issue:`4830`, :issue:`4907`, :issue:`4909`,
+  :issue:`5008`)
+
+- Fixed some external links, typos, and so on.
+  (:issue:`4892`, :issue:`4899`, :issue:`4936`, :issue:`4942`, :issue:`5005`,
+  :issue:`5063`)
+
+- The :ref:`list of Request.meta keys <topics-request-meta>` is now sorted
+  alphabetically.
+  (:issue:`5061`, :issue:`5065`)
+
+- Updated references to Scrapinghub, which is now called Zyte.
+  (:issue:`4973`, :issue:`5072`)
+
+- Added a mention of contributors in the README. (:issue:`4956`)
+
+- Reduced the top margin of lists. (:issue:`4974`)
+
+
+Quality Assurance
+~~~~~~~~~~~~~~~~~
+
+- Made Python 3.9 support official (:issue:`4757`, :issue:`4759`)
+
+- Extended typing hints (:issue:`4895`)
+
+- Fixed deprecated uses of the Twisted API.
+  (:issue:`4940`, :issue:`4950`, :issue:`5073`)
+
+- Made our tests run with the new pip resolver.
+  (:issue:`4710`, :issue:`4814`)
+
+- Added tests to cover :ref:`coroutine support <coroutine-support>`.
+  (:issue:`4987`)
+
+- Migrated from Travis CI to GitHub Actions. (:issue:`4924`)
+
+- Fixed CI issues.
+  (:issue:`4986`, :issue:`5020`, :issue:`5022`, :issue:`5027`, :issue:`5052`,
+  :issue:`5053`)
+
+- Implemented code refactorings, style fixes and cleanups.
+  (:issue:`4911`, :issue:`4982`, :issue:`5001`, :issue:`5002`, :issue:`5076`)
+
+
 .. _release-2.4.1:
 
 Scrapy 2.4.1 (2020-11-17)
 -------------------------
@@ -97,6 +281,8 @@ Backward-incompatible changes
 (:issue:`4717`, :issue:`4823`)
 
+.. _2.4-deprecation-removals:
+
 Deprecation removals
 ~~~~~~~~~~~~~~~~~~~~
@@ -1433,6 +1619,8 @@ Deprecation removals
 * ``scrapy.xlib`` has been removed (:issue:`4015`)
 
+.. _1.8-deprecations:
+
 Deprecations
 ~~~~~~~~~~~~
@@ -1789,6 +1977,8 @@ The following deprecated settings have also been removed (:issue:`3578`):
 * ``SPIDER_MANAGER_CLASS`` (use :setting:`SPIDER_LOADER_CLASS`)
 
+.. _1.7-deprecations:
+
 Deprecations
 ~~~~~~~~~~~~
@@ -4184,7 +4374,7 @@ API changes
 - ``url`` and ``body`` attributes of Request objects are now read-only (#230)
 - ``Request.copy()`` and ``Request.replace()`` now also copies their ``callback`` and ``errback`` attributes (#231)
 - Removed ``UrlFilterMiddleware`` from ``scrapy.contrib`` (already disabled by default)
-- Offsite middelware doesn't filter out any request coming from a spider that doesn't have a allowed_domains attribute (#225)
+- Offsite middleware doesn't filter out any request coming from a spider that doesn't have an allowed_domains attribute (#225)
 - Removed Spider Manager ``load()`` method. Now spiders are loaded in the ``__init__`` method itself.
 - Changes to Scrapy Manager (now called "Crawler"):
   - ``scrapy.core.manager.ScrapyManager`` class renamed to ``scrapy.crawler.Crawler``
@@ -4331,6 +4521,7 @@ First release of Scrapy.
 .. _resource: https://docs.python.org/2/library/resource.html
 .. _robots.txt: https://www.robotstxt.org/
 .. _scrapely: https://github.com/scrapy/scrapely
+.. _scrapy-bench: https://github.com/scrapy/scrapy-bench
 .. _service_identity: https://service-identity.readthedocs.io/en/stable/
 .. _six: https://six.readthedocs.io/
 .. _tox: https://pypi.org/project/tox/
diff --git a/docs/topics/request-response.rst b/docs/topics/request-response.rst
index c0283df015ce8e578d81972cdc6554fac55495ea..500781c055360bddc164b0d225f4a1102972a768 100644
--- a/docs/topics/request-response.rst
+++ b/docs/topics/request-response.rst
@@ -703,7 +703,7 @@ Response objects
     .. versionadded:: 2.1.0
        The ``ip_address`` parameter.
 
-    .. versionadded:: VERSION
+    .. versionadded:: 2.5.0
        The ``protocol`` parameter.
 
 .. attribute:: Response.url
@@ -809,7 +809,7 @@ Response objects
 
 .. attribute:: Response.protocol
 
-    .. versionadded:: VERSION
+    .. versionadded:: 2.5.0
 
     The protocol that was used to download the response.
     For instance: "HTTP/1.0", "HTTP/1.1"
diff --git a/docs/topics/settings.rst b/docs/topics/settings.rst
index 9dcee64eb51181133ffcc42af1eca9ec70211cbe..f5dca824f5a7f805465aaebe3744bd90d857591f 100644
--- a/docs/topics/settings.rst
+++ b/docs/topics/settings.rst
@@ -677,6 +677,8 @@ handler (without replacement), place this in your ``settings.py``::
         'ftp': None,
     }
 
+.. _http2:
+
 The default HTTPS handler uses HTTP/1.1. To use HTTP/2 update
 :setting:`DOWNLOAD_HANDLERS` as follows::
@@ -703,7 +705,8 @@ The default HTTPS handler uses HTTP/1.1. To use HTTP/2 update
 
     - No support for `server pushes`_, which are ignored.
 
-    - No support for the :signal:`bytes_received` signal.
+    - No support for the :signal:`bytes_received` and
+      :signal:`headers_received` signals.
 
 .. _frame size: https://tools.ietf.org/html/rfc7540#section-4.2
 .. _http2 faq: https://http2.github.io/faq/#does-http2-require-encryption
diff --git a/docs/topics/signals.rst b/docs/topics/signals.rst
index 98cfa606c6be3c763852dc039849863ea8bd814f..3d838fb634bfddf87ac689e859ff92e64f8765a0 100644
--- a/docs/topics/signals.rst
+++ b/docs/topics/signals.rst
@@ -403,7 +403,7 @@ bytes_received
 headers_received
 ~~~~~~~~~~~~~~~~
 
-.. versionadded:: VERSION
+.. versionadded:: 2.5.0
 
 .. signal:: headers_received
 .. function:: headers_received(headers, request, spider)
diff --git a/tests/test_webclient.py b/tests/test_webclient.py
index f935a86892b17bfdbea769c80caf96090cd8ac52..6e4cb9b6e9f5e29c59cc3fbb98827507b99a4d9d 100644
--- a/tests/test_webclient.py
+++ b/tests/test_webclient.py
@@ -418,7 +418,7 @@ class WebClientCustomCiphersSSLTestCase(WebClientSSLTestCase):
 
     def testPayloadDisabledCipher(self):
         if sys.implementation.name == "pypy" and parse_version(cryptography.__version__) <= parse_version("2.3.1"):
-            self.skipTest("This does work in PyPy with cryptography<=2.3.1")
+            self.skipTest("This test expects a failure, but the code does work in PyPy with cryptography<=2.3.1")
         s = "0123456789" * 10
         settings = Settings({'DOWNLOADER_CLIENT_TLS_CIPHERS': 'ECDHE-RSA-AES256-GCM-SHA384'})
         client_context_factory = create_instance(ScrapyClientContextFactory, settings=settings, crawler=None)