Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
looyolo
scrapy
提交
acb3b443
S
scrapy
项目概览
looyolo
/
scrapy
与 Fork 源项目一致
从无法访问的项目Fork
通知
2
Star
0
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
S
scrapy
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
前往新版Gitcode,体验更适合开发者的 AI 搜索 >>
未验证
提交
acb3b443
编写于
8月 14, 2020
作者:
A
Andrey Rahmatullin
提交者:
GitHub
8月 14, 2020
浏览文件
操作
浏览文件
下载
差异文件
Merge pull request #4724 from Gallaecio/feed-uri-params
Document FEED_URI_PARAMS
上级
61653418
65e0abae
变更
1
隐藏空白更改
内联
并排
Showing
1 changed file
with
67 addition
and
8 deletion
+67
-8
docs/topics/feed-exports.rst
docs/topics/feed-exports.rst
+67
-8
未找到文件。
docs/topics/feed-exports.rst
浏览文件 @
acb3b443
...
...
@@ -321,13 +321,14 @@ The following is a list of the accepted keys and the setting that is used
as a fallback value if that key is not provided for a specific feed definition.
* ``format``: the serialization format to be used for the feed.
See :ref:`topics-feed-format` for possible values.
See :ref:`topics-feed-format` for possible values.
Mandatory, no fallback setting
* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
* ``encoding``: falls back to :setting:`FEED_EXPORT_ENCODING`
* ``fields``: falls back to :setting:`FEED_EXPORT_FIELDS`
* ``indent``: falls back to :setting:`FEED_EXPORT_INDENT`
* ``store_empty``: falls back to :setting:`FEED_STORE_EMPTY`
* ``
batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT
`
* ``
uri_params``: falls back to :setting:`FEED_URI_PARAMS
`
.. setting:: FEED_EXPORT_ENCODING
...
...
@@ -500,7 +501,7 @@ generated:
* ``%(batch_time)s`` - gets replaced by a timestamp when the feed is being created
(e.g. ``2020-03-28T14-45-08.237134``)
* ``%(batch_id)d`` - gets replaced by the sequence number of the batch.
* ``%(batch_id)d`` - gets replaced by the
1-based
sequence number of the batch.
Use :ref:`printf-style string formatting <python:old-string-formatting>` to
alter the number format. For example, to make the batch ID a 5-digit
...
...
@@ -517,16 +518,74 @@ And your :command:`crawl` command line is::
The command line above can generate a directory tree like::
->projectname
-->dirname
--->1-filename2020-03-28T14-45-08.237134.json
--->2-filename2020-03-28T14-45-09.148903.json
--->3-filename2020-03-28T14-45-10.046092.json
->projectname
-->dirname
--->1-filename2020-03-28T14-45-08.237134.json
--->2-filename2020-03-28T14-45-09.148903.json
--->3-filename2020-03-28T14-45-10.046092.json
Where the first and second files contain exactly 100 items. The last one contains
100 items or fewer.
.. setting:: FEED_URI_PARAMS
FEED_URI_PARAMS
---------------
Default: ``None``
A string with the import path of a function to set the parameters to apply with
:ref:`printf-style string formatting <python:old-string-formatting>` to the
feed URI.
The function signature should be as follows:
.. function:: uri_params(params, spider)
Return a :class:`dict` of key-value pairs to apply to the feed URI using
:ref:`printf-style string formatting <python:old-string-formatting>`.
:param params: default key-value pairs
Specifically:
- ``batch_id``: ID of the file batch. See
:setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
If :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` is ``0``, ``batch_id``
is always ``1``.
- ``batch_time``: UTC date and time, in ISO format with ``:``
replaced with ``-``.
See :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
- ``time``: ``batch_time``, with microseconds set to ``0``.
:type params: dict
:param spider: source spider of the feed items
:type spider: scrapy.spiders.Spider
For example, to include the :attr:`name <scrapy.spiders.Spider.name>` of the
source spider in the feed URI:
#. Define the following function somewhere in your project::
# myproject/utils.py
def uri_params(params, spider):
return {**params, 'spider_name': spider.name}
#. Point :setting:`FEED_URI_PARAMS` to that function in your settings::
# myproject/settings.py
FEED_URI_PARAMS = 'myproject.utils.uri_params'
#. Use ``%(spider_name)s`` in your feed URI::
scrapy crawl <spider_name> -o "%(spider_name)s.jl"
.. _URIs: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
.. _Amazon S3: https://aws.amazon.com/s3/
.. _botocore: https://github.com/boto/botocore
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录