Commit 94ead94b authored by Pablo Hoffman

Improved documentation of Scrapy command-line tool

--HG--
rename : docs/topics/cmdline.rst => docs/topics/commands.rst
Parent 34554da2
......@@ -9,3 +9,8 @@ def setup(app):
rolename = "signal",
indextemplate = "pair: %s; signal",
)
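    # register a "command" cross-reference type so the docs can use
    # ".. command::" directives and :command:`...` roles (used in topics/commands.rst)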
app.add_crossref_type(
directivename = "command",
rolename = "command",
indextemplate = "pair: %s; command",
)
......@@ -50,6 +50,7 @@ Scraping basics
.. toctree::
:hidden:
topics/commands
topics/items
topics/spiders
topics/link-extractors
......@@ -59,6 +60,9 @@ Scraping basics
topics/item-pipeline
topics/feed-exports
:doc:`topics/commands`
Learn about the command-line tool used to manage your Scrapy project.
:doc:`topics/items`
Define the data you want to scrape.
......@@ -169,15 +173,14 @@ Reference
.. toctree::
:hidden:
topics/cmdline
topics/request-response
topics/settings
topics/signals
topics/exceptions
topics/exporters
:doc:`topics/cmdline`
Understand the command-line tool used to control your Scrapy project.
:doc:`topics/commands`
Learn about the command-line tool and see all :ref:`available commands <topics-commands-ref>`.
:doc:`topics/request-response`
Understand the classes used to represent HTTP requests and responses.
......
.. _topics-cmdline:
========================
Scrapy command line tool
========================
Scrapy is controlled through the ``scrapy`` command, which we'll refer to as
the "Scrapy tool" from now on to differentiate it from Scrapy commands.
The Scrapy tool provides several commands, for different purposes. Each command
supports its own particular syntax. In other words, each command supports a
different set of arguments and options.
This page doesn't describe each command and its syntax, but instead provides an
introduction to how the ``scrapy`` tool is used. After you learn the basics,
you can get help for each particular command using the ``scrapy`` tool itself.
Using the ``scrapy`` tool
=========================
The first thing you would do with the ``scrapy`` tool is to create your Scrapy
project::
scrapy startproject myproject
That will create a Scrapy project under the ``myproject`` directory.
Next, you go inside the new project directory::
cd myproject
And you're ready to use the ``scrapy`` command to manage and control your
project from there. For example, to create a new spider::
scrapy genspider mydomain mydomain.com
See all available commands
--------------------------
To see all available commands type::
scrapy -h
That will print a summary of all available Scrapy commands.
The first line will print the currently active project, if you're inside a
Scrapy project.
Example (with an active project)::
Scrapy X.X.X - project: myproject
Usage
=====
...
Example (with no active project)::
Scrapy X.X.X - no active project
Usage
=====
...
Get help for a particular command
---------------------------------
To get help about a particular command, including its description, usage, and
available options type::
scrapy <command> -h
Example::
scrapy crawl -h
Using the ``scrapy`` tool outside your project
==============================================
Not all commands must be run from "inside" a Scrapy project. You can, for
example, use the ``fetch`` command to download a page (using Scrapy's built-in
downloader) from outside a project. Other commands that can be used outside a
project are ``startproject`` (obviously) and ``shell``, to launch a
:ref:`Scrapy Shell <topics-shell>`.
Also, keep in mind that some commands may have slightly different behaviours
when run from inside projects. For example, the fetch command will use
spider-defined overrides (such as the ``user_agent`` attribute) if the URL being
fetched is handled by some specific project spider that happens to define a
custom ``user_agent`` attribute. This is a feature, as the ``fetch`` command is
meant to download pages as they would be downloaded by the spider.
.. _topics-commands:
=================
Command line tool
=================
Scrapy is controlled through the ``scrapy`` command-line tool, referred to here
as the "Scrapy tool" to differentiate it from its sub-commands, which we just
call "commands" or "Scrapy commands".
The Scrapy tool provides several commands, for multiple purposes, and each one
accepts a different set of arguments and options.
Using the ``scrapy`` tool
=========================
You can start by running the Scrapy tool with no arguments and it will print
some usage help and the available commands::
Scrapy X.Y - no active project
Usage
=====
To run a command:
scrapy <command> [options] [args]
To get help:
scrapy <command> -h
Available commands
==================
[...]
The first line will print the currently active project, if you're inside a
Scrapy project. In this example, it was run from outside a project. If run from inside
a project it would have printed something like this::
Scrapy X.Y - project: myproject
Usage
=====
[...]
Using the ``scrapy`` tool to create projects
============================================
The first thing you typically do with the ``scrapy`` tool is create your Scrapy
project::
scrapy startproject myproject
That will create a Scrapy project under the ``myproject`` directory.
Next, you go inside the new project directory::
cd myproject
And you're ready to use the ``scrapy`` command to manage and control your
project from there.
Using the ``scrapy`` tool to control projects
=============================================
You use the ``scrapy`` tool from inside your projects to control and manage
them.
For example, to create a new spider::
scrapy genspider mydomain mydomain.com
Some Scrapy commands (like :command:`crawl`) must be run from inside a Scrapy
project. See the :ref:`commands reference <topics-commands-ref>` below for more
information on which commands must be run from inside projects, and which ones don't.
Also keep in mind that some commands may have slightly different behaviours
when run from inside projects. For example, the fetch command will use
spider-overridden behaviours (such as a custom ``user_agent`` attribute) if the
URL being fetched is associated with some specific spider. This is intentional,
as the ``fetch`` command is meant to be used to check how your spiders download
pages.
.. _topics-commands-ref:
Available tool commands
=======================
Here's a list of available built-in commands with a description and some usage
examples. Remember you can always get more info about each command by running::
scrapy <command> -h
And you can check all available commands with::
scrapy -h
.. command:: startproject
startproject
------------
+-------------------+----------------------------------------+
| Syntax: | ``scrapy startproject <project_name>`` |
+-------------------+----------------------------------------+
| Requires project: | *no* |
+-------------------+----------------------------------------+
Creates a new Scrapy project named ``project_name``, under the ``project_name``
directory.
Usage example::
$ scrapy startproject myproject
.. command:: genspider
genspider
---------
+-------------------+--------------------------------------+
| Syntax: | ``scrapy genspider <name> <domain>`` |
+-------------------+--------------------------------------+
| Requires project: | *yes* |
+-------------------+--------------------------------------+
Create a new spider in the current project.
This is just a convenient shortcut command for creating spiders based on
pre-defined templates, but certainly not the only way to create spiders. You
can just create the spider source code files yourself.
Usage example::
$ scrapy genspider example example.com
Created spider 'example' using template 'crawl' in module:
jobsbot.spiders.example
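If you prefer to write the file by hand, a minimal spider could look roughly
like the following sketch (names are illustrative, and the exact import path
depends on your Scrapy version)::

    # myproject/spiders/example.py -- a hypothetical hand-written spider
    from scrapy.spider import BaseSpider   # exposed as scrapy.Spider in newer versions

    class ExampleSpider(BaseSpider):
        name = "example"
        allowed_domains = ["example.com"]
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            # extract data from the response here
            self.log("Visited %s" % response.url)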
.. command:: crawl
crawl
-----
+-------------------+-------------------------------+
| Syntax: | ``scrapy crawl <spider|url>`` |
+-------------------+-------------------------------+
| Requires project: | *yes* |
+-------------------+-------------------------------+
Start crawling a spider. If a URL is passed instead of a spider, it will start
crawling from that URL instead of the spider's start URLs.
Usage examples::
$ scrapy crawl example.com
[ ... example.com spider starts crawling ... ]
$ scrapy crawl myspider
[ ... myspider starts crawling ... ]
$ scrapy crawl http://example.com/some/page.html
[ ... spider that handles example.com starts crawling from that url ... ]
.. command:: start
start
-----
+-------------------+------------------+
| Syntax: | ``scrapy start`` |
+-------------------+------------------+
| Requires project: | *yes* |
+-------------------+------------------+
Start Scrapy in server mode.
Usage example::
$ scrapy start
[ ... scrapy starts and stays idle waiting for spiders to get scheduled ... ]
.. command:: list
list
----
+-------------------+-----------------+
| Syntax: | ``scrapy list`` |
+-------------------+-----------------+
| Requires project: | *yes* |
+-------------------+-----------------+
List all available spiders in the current project. The output is one spider per
line.
Usage example::
$ scrapy list
spider1
spider2
.. command:: fetch
fetch
-----
+-------------------+------------------------+
| Syntax: | ``scrapy fetch <url>`` |
+-------------------+------------------------+
| Requires project: | *no* |
+-------------------+------------------------+
Downloads the given URL using the Scrapy downloader and writes the contents to
standard output.
The interesting thing about this command is that it fetches the page the way
the spider would download it. For example, if the spider has a ``user_agent``
attribute which overrides the User Agent, it will use that one.
So this command can be used to "see" how your spider would fetch a certain page.
If used outside a project, no particular per-spider behaviour is applied and it
will just use the default Scrapy downloader settings.
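For example, a spider that overrides the User Agent for its requests might
declare the attribute roughly like this (a hypothetical sketch; the exact
import path depends on your Scrapy version)::

    from scrapy.spider import BaseSpider

    class ExampleSpider(BaseSpider):
        name = "example.com"
        allowed_domains = ["example.com"]
        # fetch uses this override when the requested URL is handled by this spider
        user_agent = "MyBot/1.0 (+http://www.example.com/bot.html)"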
Usage examples::
$ scrapy fetch --nolog http://www.example.com/some/page.html
[ ... html content here ... ]
$ scrapy fetch --nolog --headers http://www.example.com/
{'Accept-Ranges': ['bytes'],
'Age': ['1263 '],
'Connection': ['close '],
'Content-Length': ['596'],
'Content-Type': ['text/html; charset=UTF-8'],
'Date': ['Wed, 18 Aug 2010 23:59:46 GMT'],
'Etag': ['"573c1-254-48c9c87349680"'],
'Last-Modified': ['Fri, 30 Jul 2010 15:30:18 GMT'],
'Server': ['Apache/2.2.3 (CentOS)']}
.. command:: view
view
----
+-------------------+-----------------------+
| Syntax: | ``scrapy view <url>`` |
+-------------------+-----------------------+
| Requires project: | *no* |
+-------------------+-----------------------+
Opens the given URL in a browser, as your Scrapy spider would "see" it.
Sometimes spiders see pages differently from regular users, so this can be used
to check what the spider "sees" and confirm it's what you expect.
Usage example::
$ scrapy view http://www.example.com/some/page.html
[ ... browser starts ... ]
.. command:: shell
shell
-----
+-------------------+------------------------+
| Syntax: | ``scrapy shell [url]`` |
+-------------------+------------------------+
| Requires project: | *no* |
+-------------------+------------------------+
Starts the Scrapy shell for the given URL (if given) or empty if no URL is
given. See :ref:`topics-shell` for more info.
Usage example::
$ scrapy shell http://www.example.com/some/page.html
[ ... scrapy shell starts ... ]
.. command:: parse
parse
-----
+-------------------+----------------------------------+
| Syntax: | ``scrapy parse <url> [options]`` |
+-------------------+----------------------------------+
| Requires project: | *yes* |
+-------------------+----------------------------------+
Fetches the given URL and parses it with the spider that handles it, using the
method passed with the ``--callback`` option, or ``parse`` if not given.
Supported options:
* ``--callback`` or ``-c``: spider method to use as callback for parsing the
response
* ``--noitems``: don't show scraped items
* ``--nolinks``: don't show extracted links
Usage example::
$ scrapy parse http://www.example.com/ -c parse_item
[ ... scrapy log lines crawling example.com spider ... ]
# Scraped Items - callback: parse ------------------------------------------------------------
MyItem({'name': u"Example item",
'category': u'Furniture',
'length': u'12 cm'}
)
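For reference, the ``MyItem`` class and ``parse_item`` callback shown above
could be declared roughly as follows (a hypothetical sketch; the field names
are taken from the example output, and ``parse_item`` would be a method of the
spider that handles www.example.com)::

    from scrapy.item import Item, Field

    class MyItem(Item):
        name = Field()
        category = Field()
        length = Field()

    # spider callback selected with --callback/-c
    def parse_item(self, response):
        item = MyItem()
        item['name'] = u"Example item"
        item['category'] = u'Furniture'
        item['length'] = u'12 cm'
        return [item]   # the parse command prints the items returned by the callback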
.. command:: settings
settings
--------
+-------------------+-------------------------------+
| Syntax: | ``scrapy settings [options]`` |
+-------------------+-------------------------------+
| Requires project: | *no* |
+-------------------+-------------------------------+
Get the value of a Scrapy setting.
If used inside a project it'll show the project setting value, otherwise it'll
show the default Scrapy value for that setting.
Example usage::
$ scrapy settings --get BOT_NAME
scrapybot
$ scrapy settings --get DOWNLOAD_DELAY
0
.. command:: runspider
runspider
---------
+-------------------+---------------------------------------+
| Syntax: | ``scrapy runspider <spider_file.py>`` |
+-------------------+---------------------------------------+
| Requires project: | *no* |
+-------------------+---------------------------------------+
Run a spider self-contained in a Python file, without having to create a
project.
Example usage::
$ scrapy runspider myspider.py
[ ... spider starts crawling ... ]
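Such a file only needs to define the spider class, with everything the spider
needs, in a single module, for example (a minimal hypothetical sketch; the
exact import path depends on your Scrapy version)::

    # myspider.py -- run with "scrapy runspider myspider.py", no project required
    from scrapy.spider import BaseSpider

    class MySpider(BaseSpider):
        name = "myspider"
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            self.log("Crawled %s" % response.url)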
......@@ -24,7 +24,8 @@ Designating the settings
When you use Scrapy, you have to tell it which settings you're using. You can
do this by using an environment variable, ``SCRAPY_SETTINGS_MODULE``, or the
``--settings`` argument of the :doc:`scrapy command </topics/cmdline>`.
``--settings`` argument of the :doc:`scrapy command-line tool
</topics/commands>`.
The value of ``SCRAPY_SETTINGS_MODULE`` should be in Python path syntax, e.g.
``myproject.settings``. Note that the settings module should be on the
......@@ -89,9 +90,10 @@ It's where most of your custom settings will be populated.
4. Default settings per-command
-------------------------------
Each :doc:`/topics/cmdline` command can have its own default settings, which
override the global default settings. Those custom command settings are
specified in the ``default_settings`` attribute of the command class.
Each :doc:`Scrapy tool </topics/commands>` command can have its own default
settings, which override the global default settings. Those custom command
settings are specified in the ``default_settings`` attribute of the command
class.
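For illustration, a custom command could declare such defaults roughly like
this (a hypothetical sketch; the base class and its import path are assumptions
that may differ between Scrapy versions)::

    from scrapy.command import ScrapyCommand   # assumed import path

    class Command(ScrapyCommand):
        requires_project = True
        # applied only while this command runs, overriding the global defaults
        default_settings = {'LOG_ENABLED': False}

        def run(self, args, opts):
            pass  # command logic goes here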
5. Default global settings
--------------------------
......@@ -223,8 +225,7 @@ project name). This will be used to construct the User-Agent by default, and
also for logging.
It's automatically populated with your project name when you create your
project with the :doc:`scrapy </topics/cmdline>` ``startproject``
command.
project with the :command:`startproject` command.
.. setting:: BOT_VERSION
......@@ -720,7 +721,7 @@ NEWSPIDER_MODULE
Default: ``''``
Module where to create new spiders using the ``genspider`` command.
Module where to create new spiders using the :command:`genspider` command.
Example::
......@@ -996,8 +997,8 @@ TEMPLATES_DIR
Default: ``templates`` dir inside scrapy module
The directory where to look for template when creating new projects with
:doc:`scrapy startproject </topics/cmdline>` command.
The directory where to look for templates when creating new projects with the
:command:`startproject` command.
.. setting:: URLLENGTH_LIMIT
......