looyolo/scrapy
Commit aae6aed4
Authored Jan 22, 2013 by Chris Tilden
Parent: 27583922

    fixes spelling errors in documentation

23 changed files with 39 additions and 39 deletions (+39 -39)
Changed files:

    docs/topics/api.rst                      +1 -1
    docs/topics/commands.rst                 +1 -1
    docs/topics/debug.rst                    +1 -1
    docs/topics/djangoitem.rst               +2 -2
    docs/topics/downloader-middleware.rst    +2 -2
    docs/topics/email.rst                    +1 -1
    docs/topics/exporters.rst                +1 -1
    docs/topics/feed-exports.rst             +1 -1
    docs/topics/images.rst                   +1 -1
    docs/topics/item-pipeline.rst            +1 -1
    docs/topics/items.rst                    +2 -2
    docs/topics/jobs.rst                     +1 -1
    docs/topics/link-extractors.rst          +1 -1
    docs/topics/loaders.rst                  +3 -3
    docs/topics/logging.rst                  +2 -2
    docs/topics/request-response.rst         +3 -3
    docs/topics/scrapyd.rst                  +3 -3
    docs/topics/settings.rst                 +2 -2
    docs/topics/spider-middleware.rst        +2 -2
    docs/topics/spiders.rst                  +3 -3
    docs/topics/stats.rst                    +2 -2
    docs/topics/telnetconsole.rst            +1 -1
    docs/topics/webservice.rst               +2 -2
docs/topics/api.rst

@@ -100,7 +100,7 @@ how you :ref:`configure the downloader middlewares
     .. method:: start()

-        Start the crawler. This cals :meth:`configure` if it hasn't been called yet.
+        Start the crawler. This calls :meth:`configure` if it hasn't been called yet.

 Settings API
 ============
docs/topics/commands.rst

@@ -126,7 +126,7 @@ And you can see all available commands with::
 There are two kinds of commands, those that only work from inside a Scrapy
 project (Project-specific commands) and those that also work without an active
 Scrapy project (Global commands), though they may behave slightly different
-when running from inside a project (as they would use the project overriden
+when running from inside a project (as they would use the project overridden
 settings).

 Global commands:
docs/topics/debug.rst

@@ -87,7 +87,7 @@ Scrapy Shell
 While the :command:`parse` command is very useful for checking behaviour of a
 spider, it is of little help to check what happens inside a callback, besides
-showing the reponse received and the output. How to debug the situation when
+showing the response received and the output. How to debug the situation when
 ``parse_details`` sometimes receives no item?

 Fortunately, the :command:`shell` is your bread and butter in this case (see
docs/topics/djangoitem.rst

@@ -16,7 +16,7 @@ Using DjangoItem
 ================

 :class:`DjangoItem` works much like ModelForms in Django, you create a subclass
-and define its ``django_model`` atribute to ve a valid Django model. With this
+and define its ``django_model`` attribute to be a valid Django model. With this
 you will get an item with a field for each Django model field.

 In addition, you can define fields that aren't present in the model and even

@@ -85,7 +85,7 @@ And we can override the fields of the model with your own::
         django_model = Person
         name = Field(default='No Name')

-This is usefull to provide properties to the field, like a default or any other
+This is useful to provide properties to the field, like a default or any other
 property that your project uses.

 DjangoItem caveats
docs/topics/downloader-middleware.rst

@@ -108,7 +108,7 @@ single Python class that defines one or more of the following methods:
        :param request: the request that originated the response
        :type request: is a :class:`~scrapy.http.Request` object

-       :param reponse: the response being processed
+       :param response: the response being processed
        :type response: :class:`~scrapy.http.Response` object

        :param spider: the spider for which this response is intended

@@ -563,7 +563,7 @@ HttpProxyMiddleware
 ``proxy`` meta value to :class:`~scrapy.http.Request` objects.

 Like the Python standard library modules `urllib`_ and `urllib2`_, it obeys
-the following enviroment variables:
+the following environment variables:

 * ``http_proxy``
 * ``https_proxy``
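The environment-variable convention mentioned in the hunk above can be sketched without Scrapy installed; this is an illustrative stand-in for the `urllib`-style discovery, not the real middleware code:

```python
import os

def proxies_from_env(environ=os.environ):
    """Collect proxy settings from environment variables such as
    http_proxy / https_proxy / no_proxy (simplified sketch)."""
    proxies = {}
    for scheme in ("http", "https", "no"):
        value = environ.get(scheme + "_proxy")
        if value:
            proxies[scheme] = value
    return proxies
```

Passing an explicit dict instead of `os.environ` makes the lookup easy to test in isolation.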
docs/topics/email.rst

@@ -56,7 +56,7 @@ uses `Twisted non-blocking IO`_, like the rest of the framework.
           performed.
        :type smtphost: str

-       :param smtppass: the SMTP pass for authetnication.
+       :param smtppass: the SMTP pass for authentication.
        :type smtppass: str

        :param smtpport: the SMTP port to connect to
docs/topics/exporters.rst

@@ -244,7 +244,7 @@ XmlItemExporter
        </item>
     </items>

-Unless overriden in the :meth:`serialize_field` method, multi-valued fields are
+Unless overridden in the :meth:`serialize_field` method, multi-valued fields are
 exported by serializing each value inside a ``<value>`` element. This is for
 convenience, as multi-valued fields are very common.
docs/topics/feed-exports.rst

@@ -113,7 +113,7 @@ being created. These parameters are:
 * ``%(time)s`` - gets replaced by a timestamp when the feed is being created
 * ``%(name)s`` - gets replaced by the spider name

-Any other named parmeter gets replaced by the spider attribute of the same
+Any other named parameter gets replaced by the spider attribute of the same
 name. For example, ``%(site_id)s`` would get replaced by the ``spider.site_id``
 attribute the moment the feed is being created.
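The ``%(...)s`` substitution described in this hunk can be sketched in plain Python; the helper and the ``site_id`` attribute are illustrative assumptions, not Scrapy's actual implementation:

```python
class _FeedParams(dict):
    """Known parameters (name, time) plus fallback to spider attributes
    for any other %(...)s name, as the feed-export docs describe."""
    def __init__(self, spider, **known):
        super().__init__(**known)
        self._spider = spider

    def __missing__(self, key):
        # unknown named parameter -> spider attribute of the same name
        return getattr(self._spider, key)

def build_feed_uri(template, spider, time_str):
    return template % _FeedParams(spider, name=spider.name, time=time_str)
```

`dict.__missing__` is what lets ``%``-formatting fall through to the spider attribute lookup.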
docs/topics/images.rst

@@ -273,7 +273,7 @@ Here are the methods that you should override in your custom Images Pipeline:
 .. method:: item_completed(results, items, info)

    The :meth:`ImagesPipeline.item_completed` method called when all image
-   requests for a single item have completed (either finshed downloading, or
+   requests for a single item have completed (either finished downloading, or
    failed for some reason).

    The :meth:`~item_completed` method must return the
docs/topics/item-pipeline.rst

@@ -62,7 +62,7 @@ Item pipeline example
 Price validation and dropping items with no prices
 --------------------------------------------------

-Let's take a look at the following hypothetic pipeline that adjusts the ``price``
+Let's take a look at the following hypothetical pipeline that adjusts the ``price``
 attribute for those items that do not include VAT (``price_excludes_vat``
 attribute), and drops those items which don't contain a price::
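The hypothetical pipeline this hunk refers to can be sketched without Scrapy installed; ``DropItem`` is a stand-in for ``scrapy.exceptions.DropItem`` and the VAT rate is an assumed example value:

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem."""

VAT_FACTOR = 1.15  # assumed VAT multiplier, for illustration only

class PricePipeline(object):
    def process_item(self, item, spider):
        if item.get("price"):
            if item.get("price_excludes_vat"):
                # add VAT to prices that were stored without it
                item["price"] = item["price"] * VAT_FACTOR
            return item
        # no price at all: drop the item from the pipeline
        raise DropItem("Missing price in %s" % item)
```

Items are modeled as plain dicts here; in Scrapy the same ``process_item`` contract applies to ``Item`` objects.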
docs/topics/items.rst

@@ -61,7 +61,7 @@ certain field keys to configure that behaviour. You must refer to their
 documentation to see which metadata keys are used by each component.

 It's important to note that the :class:`Field` objects used to declare the item
-do not stay assigned as class attributes. Instead, they can be accesed through
+do not stay assigned as class attributes. Instead, they can be accessed through
 the :attr:`Item.fields` attribute.

 And that's all you need to know about declaring items.

@@ -137,7 +137,7 @@ Setting field values
     ...
     KeyError: 'Product does not support field: lala'

-Accesing all populated values
+Accessing all populated values
 -----------------------------

 To access all populated values, just use the typical `dict API`_::
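The behaviour described in this hunk (declared ``Field`` objects moving from class attributes into ``Item.fields``, populated values exposed via the dict API) can be sketched with a toy metaclass; this is an illustrative model, not Scrapy's actual ``Item`` implementation:

```python
class Field(dict):
    """Stand-in for scrapy.item.Field: just a dict of field metadata."""

class ItemMeta(type):
    def __new__(mcs, name, bases, attrs):
        # pull declared Fields out of the class body...
        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        attrs = {k: v for k, v in attrs.items() if not isinstance(v, Field)}
        cls = super().__new__(mcs, name, bases, attrs)
        # ...and expose them through the `fields` attribute instead
        cls.fields = fields
        return cls

class Item(dict, metaclass=ItemMeta):
    pass

class Product(Item):
    name = Field()
    price = Field(serializer=str)
```

Because ``Item`` subclasses ``dict``, the "typical dict API" gives all populated values directly.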
docs/topics/jobs.rst

@@ -49,7 +49,7 @@ loading that attribute from the job directory, when the spider starts and
 stops.

 Here's an example of a callback that uses the spider state (other spider code
-is ommited for brevity)::
+is omitted for brevity)::

     def parse_item(self, response):
         # parse item here
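The spider-state callback this hunk refers to can be modeled with a plain dict; in this sketch, persisting ``self.state`` to the job directory between runs is assumed to happen elsewhere (Scrapy does it, this toy does not):

```python
class SpiderWithState(object):
    """Toy model of a spider whose `state` dict survives between runs."""
    def __init__(self, state=None):
        # Scrapy would load this from the job directory (assumed here)
        self.state = state if state is not None else {}

    def parse_item(self, response):
        # parse item here (omitted for brevity, like the original example)
        self.state["items_count"] = self.state.get("items_count", 0) + 1
```

Passing the previous run's ``state`` into the constructor mimics resuming a paused job.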
docs/topics/link-extractors.rst

@@ -85,7 +85,7 @@ SgmlLinkExtractor
        Defaults to ``('a', 'area')``.
     :type tags: str or list

-    :param attrs: list of attrbitues which should be considered when looking
+    :param attrs: list of attributes which should be considered when looking
        for links to extract (only for those tags specified in the ``tags``
        parameter). Defaults to ``('href',)``
     :type attrs: boolean
docs/topics/loaders.rst

@@ -61,7 +61,7 @@ In other words, data is being collected by extracting it from two XPath
 locations, using the :meth:`~XPathItemLoader.add_xpath` method. This is the
 data that will be assigned to the ``name`` field later.

-Afterwards, similar calls are used for ``price`` and ``stock`` fields, and
+Afterwords, similar calls are used for ``price`` and ``stock`` fields, and
 finally the ``last_update`` field is populated directly with a literal value
 (``today``) using a different method: :meth:`~ItemLoader.add_value`.

@@ -253,7 +253,7 @@ ItemLoader objects
        :attr:`default_item_class`.

        The item and the remaining keyword arguments are assigned to the Loader
-       context (accesible through the :attr:`context` attribute).
+       context (accessible through the :attr:`context` attribute).

    .. method:: get_value(value, \*processors, \**kwargs)

@@ -280,7 +280,7 @@ ItemLoader objects
        The value is first passed through :meth:`get_value` by giving the
        ``processors`` and ``kwargs``, and then passed through the
        :ref:`field input processor <topics-loaders-processors>` and its result
-       appened to the data collected for that field. If the field already
+       appended to the data collected for that field. If the field already
        contains collected data, the new data is added.

        The given ``field_name`` can be ``None``, in which case values for
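The collect-then-append behaviour described in the last hunk (new values are added to the data already collected for a field, never overwriting it) can be sketched with a minimal loader; this is an illustrative toy, not Scrapy's ``ItemLoader``:

```python
class MiniItemLoader(object):
    """Minimal sketch: each add_value appends to the field's collected data."""
    def __init__(self):
        self._values = {}

    def add_value(self, field_name, value):
        # if the field already contains collected data, the new data is added
        self._values.setdefault(field_name, []).append(value)

    def get_collected_values(self, field_name):
        return self._values.get(field_name, [])
```

``get_collected_values`` mirrors the name Scrapy's loader uses for reading back collected data.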
docs/topics/logging.rst

@@ -63,13 +63,13 @@ scrapy.log module
        will be sent to standard error.
     :type logfile: str

-    :param loglevel: the minimum logging level to log. Available svalues are:
+    :param loglevel: the minimum logging level to log. Available values are:
        :data:`CRITICAL`, :data:`ERROR`, :data:`WARNING`, :data:`INFO` and
        :data:`DEBUG`.

     :param logstdout: if ``True``, all standard output (and error) of your
        application will be logged instead. For example if you "print 'hello'"
-       it will appear in the Scrapy log. If ommited, the :setting:`LOG_STDOUT`
+       it will appear in the Scrapy log. If omitted, the :setting:`LOG_STDOUT`
        setting will be used.
     :type logstdout: boolean
docs/topics/request-response.rst

@@ -152,7 +152,7 @@ Request objects
        recognized by Scrapy.

        This dict is `shallow copied`_ when the request is cloned using the
-       ``copy()`` or ``replace()`` methods, and can also be accesed, in your
+       ``copy()`` or ``replace()`` methods, and can also be accessed, in your
        spider, from the ``response.meta`` attribute.

     .. _shallow copied: http://docs.python.org/library/copy.html

@@ -270,7 +270,7 @@ fields with form data from :class:`Response` objects.
        sometimes it can cause problems which could be hard to debug. For
        example, when working with forms that are filled and/or submitted using
        javascript, the default :meth:`from_response` behaviour may not be the
-       most appropiate. To disable this behaviour you can set the
+       most appropriate. To disable this behaviour you can set the
        ``dont_click`` argument to ``True``. Also, if you want to change the
        control clicked (instead of disabling it) you can also use the
        ``clickdata`` argument.

@@ -294,7 +294,7 @@ fields with form data from :class:`Response` objects.
        overridden by the one passed in this parameter.
     :type formdata: dict

-    :param dont_click: If True, the form data will be sumbitted without
+    :param dont_click: If True, the form data will be submitted without
        clicking in any element.
     :type dont_click: boolean
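The "shallow copied" note in the first hunk has a practical consequence worth seeing: cloning a request copies the ``meta`` dict itself, but nested objects inside it stay shared. A small standard-library sketch (the meta keys here are illustrative):

```python
import copy

# example request.meta; "proxy" is a key the docs mention, the rest are assumed
meta = {"proxy": "http://localhost:8050",
        "download_timeout": 180,
        "nested": {"retries": 0}}

meta_copy = copy.copy(meta)  # shallow copy, as copy()/replace() cloning does
```

Mutating ``meta_copy["nested"]`` would therefore also be visible through the original ``meta``.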
docs/topics/scrapyd.rst

@@ -66,7 +66,7 @@ Or, if you want to start Scrapyd from inside a Scrapy project you can use the
 Installing Scrapyd
 ==================

-How to deploy Scrapyd on your servers depends on the platform your're using.
+How to deploy Scrapyd on your servers depends on the platform you're using.
 Scrapy comes with Ubuntu packages for Scrapyd ready for deploying it as a
 system service, to ease the installation and administration, but you can create
 packages for other distribution or operating systems (including Windows). If

@@ -303,7 +303,7 @@ Now, if you type ``scrapy deploy -l`` you'll see::
 See available projects
 ----------------------

-To see all available projets in a specific target use::
+To see all available projects in a specific target use::

     scrapy deploy -L scrapyd

@@ -459,7 +459,7 @@ Example request::

     $ curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg

-Example reponse::
+Example response::

     {"status": "ok", "spiders": 3}
docs/topics/settings.rst

@@ -96,7 +96,7 @@ extensions and middlewares::

     if settings['LOG_ENABLED']:
         print "log is enabled!"

-In other words, settings can be accesed like a dict, but it's usually preferred
+In other words, settings can be accessed like a dict, but it's usually preferred
 to extract the setting in the format you need it to avoid type errors. In order
 to do that you'll have to use one of the methods provided the
 :class:`~scrapy.settings.Settings` API.

@@ -648,7 +648,7 @@ REDIRECT_MAX_TIMES

 Default: ``20``

-Defines the maximun times a request can be redirected. After this maximun the
+Defines the maximum times a request can be redirected. After this maximum the
 request's response is returned as is. We used Firefox default value for the
 same task.
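The first hunk's point, dict-style access works but typed getters avoid type errors, can be sketched in a few lines; the getter names are loosely modeled on the Settings API the docs reference, and the coercion rules here are simplified assumptions:

```python
class MiniSettings(dict):
    """Dict-like settings with typed getters (illustrative sketch only)."""
    def getbool(self, name, default=False):
        value = self.get(name, default)
        if isinstance(value, str):
            # settings often arrive as strings (env vars, command line)
            return value.strip().lower() in ("1", "true", "yes")
        return bool(value)

    def getint(self, name, default=0):
        return int(self.get(name, default))
```

With string-valued settings, ``settings["LOG_ENABLED"]`` would be truthy even for ``"False"``, which is exactly the type error the typed getters guard against.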
docs/topics/spider-middleware.rst

@@ -77,7 +77,7 @@ single Python class that defines one or more of the following methods:
        direction for :meth:`process_spider_output` to process it, or
        :meth:`process_spider_exception` if it raised an exception.

-       :param reponse: the response being processed
+       :param response: the response being processed
        :type response: :class:`~scrapy.http.Response` object

        :param spider: the spider for which this response is intended

@@ -258,7 +258,7 @@ OffsiteMiddleware
    these messages for each new domain filtered. So, for example, if another
    request for ``www.othersite.com`` is filtered, no log message will be
    printed. But if a request for ``someothersite.com`` is filtered, a message
-   will be printed (but only for the first request filtred).
+   will be printed (but only for the first request filtered).

    If the spider doesn't define an
    :attr:`~scrapy.spider.BaseSpider.allowed_domains` attribute, or the
docs/topics/spiders.rst

@@ -30,7 +30,7 @@ For spiders, the scraping cycle goes through something like this:
    response handled by the specified callback.

 3. In callback functions, you parse the page contents, typically using
-   :ref:`topics-selectors` (but you can also use BeautifuSoup, lxml or whatever
+   :ref:`topics-selectors` (but you can also use BeautifulSoup, lxml or whatever
    mechanism you prefer) and generate items with the parsed data.

 4. Finally, the items returned from the spider will be typically persisted to a

@@ -183,7 +183,7 @@ BaseSpider
        :class:`~scrapy.item.Item` objects.

        :param response: the response to parse
-       :type reponse: :class:~scrapy.http.Response`
+       :type response: :class:~scrapy.http.Response`

    .. method:: log(message, [level, component])

@@ -434,7 +434,7 @@ These spiders are pretty easy to use, let's have a look at one example::

        name = 'example.com'
        allowed_domains = ['example.com']
        start_urls = ['http://www.example.com/feed.xml']
-       iterator = 'iternodes' # This is actually unnecesary, since it's the default value
+       iterator = 'iternodes' # This is actually unnecessary, since it's the default value
        itertag = 'item'

    def parse_node(self, response, node):
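The XMLFeedSpider example in the last hunk, iterate over every ``itertag`` node and hand each one to a callback, can be approximated with the standard-library XML parser; the feed content and extracted field are invented for illustration, and this does not use Scrapy's own iterators:

```python
import xml.etree.ElementTree as ET

FEED = ("<rss><item><title>First</title></item>"
        "<item><title>Second</title></item></rss>")

def parse_feed(feed_text, itertag="item"):
    """Yield one result per matching node, like a parse_node callback."""
    for node in ET.fromstring(feed_text).iter(itertag):
        yield {"title": node.findtext("title")}
```

Each yielded dict plays the role of the item a ``parse_node`` callback would return.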
docs/topics/stats.rst

@@ -6,7 +6,7 @@ Stats Collection

 Scrapy provides a convenient facility for collecting stats in the form of
 key/values, where values are often counters. The facility is called the Stats
-Collector, and can be accesed through the :attr:`~scrapy.crawler.Crawler.stats`
+Collector, and can be accessed through the :attr:`~scrapy.crawler.Crawler.stats`
 attribute of the :ref:`topics-api-crawler`, as illustrated by the examples in
 the :ref:`topics-stats-usecases` section below.

@@ -100,7 +100,7 @@ DummyStatsCollector

 .. class:: DummyStatsCollector

-   A Stats collector which does nothing but is very efficient (beacuse it does
+   A Stats collector which does nothing but is very efficient (because it does
    nothing). This stats collector can be set via the :setting:`STATS_CLASS`
    setting, to disable stats collect in order to improve performance. However,
    the performance penalty of stats collection is usually marginal compared to
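The key/value counter facility and the do-nothing ``DummyStatsCollector`` described above can be sketched in plain Python; the method names follow the Stats Collector API, but this is a simplified stand-in, not Scrapy's implementation:

```python
class StatsCollector(object):
    """Minimal key/value stats collector; values are often counters."""
    def __init__(self):
        self._stats = {}

    def set_value(self, key, value):
        self._stats[key] = value

    def get_value(self, key, default=None):
        return self._stats.get(key, default)

    def inc_value(self, key, count=1, start=0):
        self._stats[key] = self._stats.get(key, start) + count

class DummyStatsCollector(StatsCollector):
    """Does nothing, hence very efficient (because it does nothing)."""
    def set_value(self, key, value):
        pass

    def inc_value(self, key, count=1, start=0):
        pass
```

Swapping the dummy in for the real collector disables stats collection while keeping the same call sites working.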
docs/topics/telnetconsole.rst

@@ -155,7 +155,7 @@ TELNETCONSOLE_PORT

 Default: ``[6023, 6073]``

-The port range to use for the etlnet console. If set to ``None`` or ``0``, a
+The port range to use for the telnet console. If set to ``None`` or ``0``, a
 dynamically assigned port is used.
docs/topics/webservice.rst

@@ -28,7 +28,7 @@ The web service contains several resources, defined in the
 functionality. See :ref:`topics-webservice-resources-ref` for a list of
 resources available by default.

-Althought you can implement your own resources using any protocol, there are
+Although you can implement your own resources using any protocol, there are
 two kinds of resources bundled with Scrapy:

 * Simple JSON resources - which are read-only and just output JSON data

@@ -188,7 +188,7 @@ To write a web service resource you should subclass the :class:`JsonResource` or
 .. attribute:: ws_name

    The name by which the Scrapy web service will known this resource, and
-   also the path wehere this resource will listen. For example, assuming
+   also the path where this resource will listen. For example, assuming
    Scrapy web service is listening on http://localhost:6080/ and the
    ``ws_name`` is ``'resource1'`` the URL for that resource will be: