looyolo / scrapy
Commit acb3b443 (unverified)
Authored on Aug 14, 2020 by Andrey Rahmatullin
Committed by GitHub on Aug 14, 2020
Merge pull request #4724 from Gallaecio/feed-uri-params
Document FEED_URI_PARAMS
Parents: 61653418, 65e0abae
Showing 1 changed file with 67 additions and 8 deletions

docs/topics/feed-exports.rst (+67, -8)
@@ -321,13 +321,14 @@ The following is a list of the accepted keys and the setting that is used
 as a fallback value if that key is not provided for a specific feed definition.

 * ``format``: the serialization format to be used for the feed.
   See :ref:`topics-feed-format` for possible values.
   Mandatory, no fallback setting
+* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
 * ``encoding``: falls back to :setting:`FEED_EXPORT_ENCODING`
 * ``fields``: falls back to :setting:`FEED_EXPORT_FIELDS`
 * ``indent``: falls back to :setting:`FEED_EXPORT_INDENT`
 * ``store_empty``: falls back to :setting:`FEED_STORE_EMPTY`
-* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
+* ``uri_params``: falls back to :setting:`FEED_URI_PARAMS`

 .. setting:: FEED_EXPORT_ENCODING
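
Reviewer note: the hunk above adds ``uri_params`` to the per-feed keys. As a rough illustration, a feed definition that sets it might look like the sketch below. The project layout, the ``exports/`` path, and the ``myproject.utils.uri_params`` import path are assumptions made for this example, and ``%(spider_name)s`` only resolves if the referenced function adds that key.

```python
# myproject/settings.py (hypothetical project layout)
FEEDS = {
    # The dictionary key is the feed URI; %(...)s placeholders are filled
    # in from the URI parameters.
    "exports/%(spider_name)s.jl": {
        "format": "jsonlines",  # mandatory, no fallback setting
        "encoding": "utf8",  # otherwise falls back to FEED_EXPORT_ENCODING
        # Import path of the parameter function for this feed only;
        # otherwise falls back to FEED_URI_PARAMS.
        "uri_params": "myproject.utils.uri_params",
    },
}
```

Keys left out of a feed definition fall back to the global settings listed in the hunk.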
@@ -500,7 +501,7 @@ generated:
 * ``%(batch_time)s`` - gets replaced by a timestamp when the feed is being created
   (e.g. ``2020-03-28T14-45-08.237134``)
-* ``%(batch_id)d`` - gets replaced by the sequence number of the batch.
+* ``%(batch_id)d`` - gets replaced by the 1-based sequence number of the batch.

   Use :ref:`printf-style string formatting <python:old-string-formatting>` to
   alter the number format. For example, to make the batch ID a 5-digit
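
Reviewer note: the placeholders in this hunk are ordinary printf-style format specifiers, so their effect can be checked in plain Python without running a crawl. The sketch below uses a hand-built ``params`` dict as a stand-in for the values the exporter would supply; the ``dirname/...`` URI template is an assumption chosen for the example.

```python
from datetime import datetime

# Hypothetical stand-in for the parameters applied to the feed URI.
params = {
    "batch_id": 1,  # 1-based sequence number of the batch
    # ISO timestamp with ":" replaced by "-", as in the documented example
    "batch_time": datetime(2020, 3, 28, 14, 45, 8, 237134)
    .isoformat()
    .replace(":", "-"),
}

# "%(batch_id)d" prints the number as-is; "%(batch_id)05d" zero-pads it
# to 5 digits using standard printf-style width flags.
plain = "dirname/%(batch_id)d-filename%(batch_time)s.json" % params
padded = "dirname/%(batch_id)05d-filename%(batch_time)s.json" % params

print(plain)
print(padded)
```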
@@ -517,16 +518,74 @@ And your :command:`crawl` command line is::
 The command line above can generate a directory tree like::

     ->projectname
     -->dirname
     --->1-filename2020-03-28T14-45-08.237134.json
     --->2-filename2020-03-28T14-45-09.148903.json
     --->3-filename2020-03-28T14-45-10.046092.json

 Where the first and second files contain exactly 100 items. The last one contains
 100 items or fewer.

+
+.. setting:: FEED_URI_PARAMS
+
+FEED_URI_PARAMS
+---------------
+
+Default: ``None``
+
+A string with the import path of a function to set the parameters to apply with
+:ref:`printf-style string formatting <python:old-string-formatting>` to the
+feed URI.
+
+The function signature should be as follows:
+
+.. function:: uri_params(params, spider)
+
+   Return a :class:`dict` of key-value pairs to apply to the feed URI using
+   :ref:`printf-style string formatting <python:old-string-formatting>`.
+
+   :param params: default key-value pairs
+
+        Specifically:
+
+        - ``batch_id``: ID of the file batch. See
+          :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
+
+          If :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` is ``0``, ``batch_id``
+          is always ``1``.
+
+        - ``batch_time``: UTC date and time, in ISO format with ``:``
+          replaced with ``-``.
+
+          See :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
+
+        - ``time``: ``batch_time``, with microseconds set to ``0``.
+
+   :type params: dict
+
+   :param spider: source spider of the feed items
+   :type spider: scrapy.spiders.Spider
+
+For example, to include the :attr:`name <scrapy.spiders.Spider.name>` of the
+source spider in the feed URI:
+
+#. Define the following function somewhere in your project::
+
+       # myproject/utils.py
+       def uri_params(params, spider):
+           return {**params, 'spider_name': spider.name}
+
+#. Point :setting:`FEED_URI_PARAMS` to that function in your settings::
+
+       # myproject/settings.py
+       FEED_URI_PARAMS = 'myproject.utils.uri_params'
+
+#. Use ``%(spider_name)s`` in your feed URI::
+
+       scrapy crawl <spider_name> -o "%(spider_name)s.jl"
+

 .. _URIs: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
 .. _Amazon S3: https://aws.amazon.com/s3/
 .. _botocore: https://github.com/boto/botocore
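
Reviewer note: the documented ``uri_params(params, spider)`` function can be exercised outside a crawl. The sketch below is not Scrapy's implementation. ``build_default_params`` is a hypothetical helper that mimics the three default keys described in the new section, and a ``SimpleNamespace`` stands in for a real spider object; only ``uri_params`` itself is taken from the diff.

```python
from datetime import datetime
from types import SimpleNamespace


def build_default_params(now, batch_id=1):
    """Hypothetical helper: mimic the documented default key-value pairs."""
    return {
        "batch_id": batch_id,
        # ISO format with ":" replaced by "-"
        "batch_time": now.isoformat().replace(":", "-"),
        # Same as batch_time, with microseconds set to 0
        "time": now.replace(microsecond=0).isoformat().replace(":", "-"),
    }


def uri_params(params, spider):
    # Function from the documented example: keep the defaults and add
    # the spider name under a new key.
    return {**params, "spider_name": spider.name}


# Stand-in for a spider; only the .name attribute is needed here.
spider = SimpleNamespace(name="quotes")
defaults = build_default_params(datetime(2020, 3, 28, 14, 45, 8, 237134))
params = uri_params(defaults, spider)

# Apply the parameters to a feed URI template with printf-style formatting.
uri = "%(spider_name)s-%(time)s.jl" % params
print(uri)
```

Returning ``{**params, ...}`` rather than a fresh dict keeps the default keys available, so templates that use ``%(time)s`` or ``%(batch_id)d`` still resolve.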