Unverified commit acb3b443, authored by Andrey Rahmatullin, committed by GitHub

Merge pull request #4724 from Gallaecio/feed-uri-params

Document FEED_URI_PARAMS
@@ -321,13 +321,14 @@ The following is a list of the accepted keys and the setting that is used
as a fallback value if that key is not provided for a specific feed definition.

* ``format``: the serialization format to be used for the feed.
  See :ref:`topics-feed-format` for possible values.
  Mandatory, no fallback setting.
* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
* ``encoding``: falls back to :setting:`FEED_EXPORT_ENCODING`
* ``fields``: falls back to :setting:`FEED_EXPORT_FIELDS`
* ``indent``: falls back to :setting:`FEED_EXPORT_INDENT`
* ``store_empty``: falls back to :setting:`FEED_STORE_EMPTY`
* ``uri_params``: falls back to :setting:`FEED_URI_PARAMS`
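The fallback rule in the list above can be sketched as a small merge. Values given in a feed definition win, and missing keys are filled from the corresponding settings. This is an illustrative sketch, not Scrapy's implementation. ``resolve_feed_options`` is a hypothetical helper, and a plain ``dict`` with illustrative values stands in for the Scrapy settings object.

```python
# Hypothetical mapping of per-feed keys to the settings they fall back to,
# taken from the list above ("format" is mandatory and has no fallback).
FALLBACK_SETTINGS = {
    "batch_item_count": "FEED_EXPORT_BATCH_ITEM_COUNT",
    "encoding": "FEED_EXPORT_ENCODING",
    "fields": "FEED_EXPORT_FIELDS",
    "indent": "FEED_EXPORT_INDENT",
    "store_empty": "FEED_STORE_EMPTY",
    "uri_params": "FEED_URI_PARAMS",
}


def resolve_feed_options(feed_options, settings):
    """Return a copy of feed_options with missing keys filled from settings."""
    if "format" not in feed_options:
        raise ValueError("feed definition is missing the mandatory 'format' key")
    resolved = dict(feed_options)  # do not mutate the caller's feed definition
    for key, setting_name in FALLBACK_SETTINGS.items():
        if key not in resolved:
            resolved[key] = settings.get(setting_name)
    return resolved


# Illustrative settings values (a plain dict stands in for Scrapy's settings).
settings = {
    "FEED_EXPORT_ENCODING": "utf-8",
    "FEED_URI_PARAMS": "myproject.utils.uri_params",
}
feed = {"format": "jsonlines", "batch_item_count": 100}
options = resolve_feed_options(feed, settings)
print(options["batch_item_count"])  # 100 (per-feed value wins)
print(options["encoding"])          # utf-8 (fallback from settings)
print(options["uri_params"])        # myproject.utils.uri_params
```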

.. setting:: FEED_EXPORT_ENCODING

@@ -500,7 +501,7 @@ generated:
* ``%(batch_time)s`` - gets replaced by a timestamp when the feed is being created
  (e.g. ``2020-03-28T14-45-08.237134``)
* ``%(batch_id)d`` - gets replaced by the 1-based sequence number of the batch.
  Use :ref:`printf-style string formatting <python:old-string-formatting>` to
  alter the number format. For example, to make the batch ID a 5-digit

@@ -517,16 +518,74 @@ And your :command:`crawl` command line is::

The command line above can generate a directory tree like::

    ->projectname
    -->dirname
    --->1-filename2020-03-28T14-45-08.237134.json
    --->2-filename2020-03-28T14-45-09.148903.json
    --->3-filename2020-03-28T14-45-10.046092.json

The first and second files contain exactly 100 items; the last one contains
100 items or fewer.
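The batching and naming behaviour described above can be simulated in a few lines. This sketch makes two assumptions:

* The feed URI template is ``dirname/%(batch_id)d-filename%(batch_time)s.json``. This is inferred from the file names in the tree; the actual command line is outside this excerpt.
* The batch size is 100.

``split_into_batches`` and ``batch_uri`` are hypothetical helpers, not Scrapy functions.

```python
from datetime import datetime

# Assumption: the feed URI template, inferred from the file names in the
# directory tree above (the actual crawl command line is not shown here).
URI_TEMPLATE = "dirname/%(batch_id)d-filename%(batch_time)s.json"


def split_into_batches(items, batch_item_count):
    """Split items into lists of at most batch_item_count items.

    A batch_item_count of 0 disables batching: everything goes into one batch.
    """
    if batch_item_count == 0:
        return [list(items)]
    return [items[i:i + batch_item_count]
            for i in range(0, len(items), batch_item_count)]


def batch_uri(batch_id, created_at):
    """Render the URI template for one batch (batch_id is 1-based)."""
    batch_time = created_at.isoformat().replace(":", "-")
    return URI_TEMPLATE % {"batch_id": batch_id, "batch_time": batch_time}


items = list(range(250))  # 250 dummy items
batches = split_into_batches(items, 100)
print([len(batch) for batch in batches])  # [100, 100, 50]
print(batch_uri(1, datetime(2020, 3, 28, 14, 45, 8, 237134)))
# dirname/1-filename2020-03-28T14-45-08.237134.json
print("%(batch_id)05d" % {"batch_id": 3})  # 00003
```

The last line shows the zero-padded form of the batch ID: the printf-style conversion ``%(batch_id)05d`` formats batch 3 as ``00003``.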

.. setting:: FEED_URI_PARAMS

FEED_URI_PARAMS
---------------

Default: ``None``

A string with the import path of a function that returns the parameters to
apply to the feed URI using
:ref:`printf-style string formatting <python:old-string-formatting>`.
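Because this setting holds an import path rather than the function itself, the path must be resolved to a callable at run time. Scrapy does this internally. The sketch below only illustrates the general idea using the standard library. ``load_function`` is a hypothetical name, and ``json.dumps`` stands in for a project function such as ``myproject.utils.uri_params``.

```python
import importlib
import json


def load_function(import_path):
    """Resolve a dotted path such as 'package.module.func' to the named object."""
    module_path, _, attr_name = import_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted import path, got {import_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# json.dumps stands in for a project function such as myproject.utils.uri_params.
dumps = load_function("json.dumps")
print(dumps is json.dumps)  # True
```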

The function signature should be as follows:

.. function:: uri_params(params, spider)

   Return a :class:`dict` of key-value pairs to apply to the feed URI using
   :ref:`printf-style string formatting <python:old-string-formatting>`.

   :param params: default key-value pairs

        Specifically:

        - ``batch_id``: ID of the file batch. See
          :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.

          If :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` is ``0``, ``batch_id``
          is always ``1``.

        - ``batch_time``: UTC date and time, in ISO format with ``:``
          replaced with ``-``.

          See :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.

        - ``time``: ``batch_time``, with microseconds set to ``0``.

   :type params: dict

   :param spider: source spider of the feed items
   :type spider: scrapy.spiders.Spider
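The three default parameters described above can be reproduced from their descriptions. The sketch below approximates that documented behaviour; it is not Scrapy's source. ``default_uri_params`` is a hypothetical helper. The caller supplies the batch number and a naive ``datetime`` assumed to be in UTC, so the output is deterministic.

```python
from datetime import datetime


def default_uri_params(batch_id, utc_now):
    """Build the documented default URI parameters for one feed batch.

    utc_now is assumed to be a naive datetime already expressed in UTC.
    """
    return {
        # 1-based batch number; stays 1 when batching is disabled.
        "batch_id": batch_id,
        # ISO format timestamp with ':' replaced by '-', keeping microseconds.
        "batch_time": utc_now.isoformat().replace(":", "-"),
        # Same timestamp with microseconds set to 0 (isoformat() then omits them).
        "time": utc_now.replace(microsecond=0).isoformat().replace(":", "-"),
    }


params = default_uri_params(1, datetime(2020, 3, 28, 14, 45, 8, 237134))
print(params["batch_time"])  # 2020-03-28T14-45-08.237134
print(params["time"])        # 2020-03-28T14-45-08
```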

For example, to include the :attr:`name <scrapy.spiders.Spider.name>` of the
source spider in the feed URI:

#. Define the following function somewhere in your project::

      # myproject/utils.py
      def uri_params(params, spider):
          return {**params, 'spider_name': spider.name}

#. Point :setting:`FEED_URI_PARAMS` to that function in your settings::

      # myproject/settings.py
      FEED_URI_PARAMS = 'myproject.utils.uri_params'

#. Use ``%(spider_name)s`` in your feed URI::

      scrapy crawl <spider_name> -o "%(spider_name)s.jl"
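To check what the ``uri_params`` function from step 1 produces without running a crawl, it can be called directly with a stand-in spider. The ``SimpleNamespace`` spider and the sample parameter values below are hypothetical. In a real crawl, Scrapy passes the actual spider and the actual default parameters.

```python
from types import SimpleNamespace


def uri_params(params, spider):
    # Same function as in step 1 (myproject/utils.py).
    return {**params, "spider_name": spider.name}


# Hypothetical stand-in for a scrapy.spiders.Spider; only ``name`` is used.
spider = SimpleNamespace(name="quotes")
default_params = {"batch_id": 1, "time": "2020-03-28T14-45-08"}

params = uri_params(default_params, spider)
feed_uri = "%(spider_name)s.jl" % params  # printf-style formatting with a dict
print(feed_uri)  # quotes.jl
```

Because the function builds a new dictionary with ``{**params, ...}``, it leaves its input untouched. The default parameters (for example ``%(time)s``) also remain usable in the URI alongside ``spider_name``.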

.. _URIs: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
.. _Amazon S3: https://aws.amazon.com/s3/
.. _botocore: https://github.com/boto/botocore
......