未验证 提交 acb3b443 编写于 作者: A Andrey Rahmatullin 提交者: GitHub

Merge pull request #4724 from Gallaecio/feed-uri-params

Document FEED_URI_PARAMS
......@@ -321,13 +321,14 @@ The following is a list of the accepted keys and the setting that is used
as a fallback value if that key is not provided for a specific feed definition.
* ``format``: the serialization format to be used for the feed.
See :ref:`topics-feed-format` for possible values.
See :ref:`topics-feed-format` for possible values.
Mandatory, no fallback setting
* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
* ``encoding``: falls back to :setting:`FEED_EXPORT_ENCODING`
* ``fields``: falls back to :setting:`FEED_EXPORT_FIELDS`
* ``indent``: falls back to :setting:`FEED_EXPORT_INDENT`
* ``store_empty``: falls back to :setting:`FEED_STORE_EMPTY`
* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
* ``uri_params``: falls back to :setting:`FEED_URI_PARAMS`
.. setting:: FEED_EXPORT_ENCODING
......@@ -500,7 +501,7 @@ generated:
* ``%(batch_time)s`` - gets replaced by a timestamp when the feed is being created
(e.g. ``2020-03-28T14-45-08.237134``)
* ``%(batch_id)d`` - gets replaced by the sequence number of the batch.
* ``%(batch_id)d`` - gets replaced by the 1-based sequence number of the batch.
Use :ref:`printf-style string formatting <python:old-string-formatting>` to
alter the number format. For example, to make the batch ID a 5-digit
......@@ -517,16 +518,74 @@ And your :command:`crawl` command line is::
The command line above can generate a directory tree like::
->projectname
-->dirname
--->1-filename2020-03-28T14-45-08.237134.json
--->2-filename2020-03-28T14-45-09.148903.json
--->3-filename2020-03-28T14-45-10.046092.json
->projectname
-->dirname
--->1-filename2020-03-28T14-45-08.237134.json
--->2-filename2020-03-28T14-45-09.148903.json
--->3-filename2020-03-28T14-45-10.046092.json
Where the first and second files contain exactly 100 items. The last one contains
100 items or fewer.
.. setting:: FEED_URI_PARAMS
FEED_URI_PARAMS
---------------
Default: ``None``
A string with the import path of a function to set the parameters to apply with
:ref:`printf-style string formatting <python:old-string-formatting>` to the
feed URI.
The function signature should be as follows:
.. function:: uri_params(params, spider)
Return a :class:`dict` of key-value pairs to apply to the feed URI using
:ref:`printf-style string formatting <python:old-string-formatting>`.
:param params: default key-value pairs
Specifically:
- ``batch_id``: ID of the file batch. See
:setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
If :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` is ``0``, ``batch_id``
is always ``1``.
- ``batch_time``: UTC date and time, in ISO format with ``:``
replaced with ``-``.
See :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
- ``time``: ``batch_time``, with microseconds set to ``0``.
:type params: dict
:param spider: source spider of the feed items
:type spider: scrapy.spiders.Spider
For example, to include the :attr:`name <scrapy.spiders.Spider.name>` of the
source spider in the feed URI:
#. Define the following function somewhere in your project::
# myproject/utils.py
def uri_params(params, spider):
return {**params, 'spider_name': spider.name}
#. Point :setting:`FEED_URI_PARAMS` to that function in your settings::
# myproject/settings.py
FEED_URI_PARAMS = 'myproject.utils.uri_params'
#. Use ``%(spider_name)s`` in your feed URI::
scrapy crawl <spider_name> -o "%(spider_name)s.jl"
.. _URIs: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
.. _Amazon S3: https://aws.amazon.com/s3/
.. _botocore: https://github.com/boto/botocore
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册