From 27ff010a4130d65629bd8adbe7d849392ee90103 Mon Sep 17 00:00:00 2001
From: Edwin O Marshall
Date: Thu, 6 Mar 2014 17:18:51 -0500
Subject: [PATCH] - sep 15 for #629

---
 sep/sep-015.rst | 57 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 sep/sep-015.rst

diff --git a/sep/sep-015.rst b/sep/sep-015.rst
new file mode 100644
index 000000000..8e8f03cf0
--- /dev/null
+++ b/sep/sep-015.rst
@@ -0,0 +1,57 @@
+======= ==============================================
+SEP     15
+Title   ScrapyManager and SpiderManager API refactoring
+Author  Insophia Team
+Created 2010-03-10
+Status  Final
+======= ==============================================
+
+========================================================
+SEP-015: ScrapyManager and SpiderManager API refactoring
+========================================================
+
+This SEP proposes a refactoring of ``ScrapyManager`` and ``SpiderManager``
+APIs.
+
+SpiderManager
+=============
+
+- ``get(spider_name)`` -> ``Spider`` instance
+- ``find_by_request(request)`` -> list of spider names
+- ``list()`` -> list of spider names
+
+- remove ``fromdomain()``, ``fromurl()``
+
+ScrapyManager
+=============
+
+- ``crawl_request(request, spider=None)``
+  - calls ``SpiderManager.find_by_request(request)`` if spider is ``None``
+  - fails if ``len(spiders returned)`` != 1
+- ``crawl_spider(spider)``
+  - calls ``spider.start_requests()``
+- ``crawl_spider_name(spider_name)``
+  - calls ``SpiderManager.get(spider_name)``
+  - calls ``spider.start_requests()``
+- ``crawl_url(url)``
+  - calls ``spider.make_requests_from_url()``
+
+- remove ``crawl()``, ``runonce()``
+
+Instead of using ``runonce()``, commands (such as crawl/parse) would call
+``crawl_*`` and then ``start()``.
+
+Changes to Commands
+===================
+
+- ``if is_url(arg):``
+  - calls ``ScrapyManager.crawl_url(arg)``
+- ``else:``
+  - calls ``ScrapyManager.crawl_spider_name(arg)``
+
+Pending issues
+==============
+
+- should we rename ``ScrapyManager.crawl_*`` to ``schedule_*`` or ``add_*`` ?
+- ``SpiderManager.find_by_request`` or
+  ``SpiderManager.search(request=request)`` ?
--
GitLab
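
The API proposed in this SEP can be sketched in plain Python as follows. This is a minimal illustration and not Scrapy's actual implementation. The stand-in ``Spider`` class, the use of plain URL strings as requests, the ``scheduled`` list, the ``_resolve`` and ``run_command`` helpers, and the body of ``is_url`` are all assumptions made for the example. The SEP also does not say how ``crawl_url`` picks its spider; the sketch assumes it reuses ``find_by_request``.

```python
from urllib.parse import urlparse


class Spider:
    """Hypothetical stand-in spider; requests are plain URL strings."""

    def __init__(self, name, domain):
        self.name = name
        self.domain = domain

    def start_requests(self):
        return ["http://%s/" % self.domain]

    def make_requests_from_url(self, url):
        return [url]


class SpiderManager:
    def __init__(self, spiders):
        self._spiders = {s.name: s for s in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        # Assumption: a spider matches when its domain equals the request host.
        host = urlparse(request).hostname
        return [n for n, s in self._spiders.items() if s.domain == host]

    def list(self):
        return sorted(self._spiders)


class ScrapyManager:
    def __init__(self, spiders):
        self.spiders = spiders
        self.scheduled = []  # hypothetical queue of (spider name, request)
        self.started = False

    def _resolve(self, request):
        # Fail unless exactly one spider claims the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise ValueError("expected 1 spider for %r, found %d"
                             % (request, len(names)))
        return self.spiders.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve(request)
        self.scheduled.append((spider.name, request))

    def crawl_spider(self, spider):
        for request in spider.start_requests():
            self.crawl_request(request, spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        spider = self._resolve(url)
        for request in spider.make_requests_from_url(url):
            self.crawl_request(request, spider)

    def start(self):
        self.started = True  # a real manager would run the engine here


def is_url(arg):
    return urlparse(arg).scheme in ("http", "https")


def run_command(manager, arg):
    # Command dispatch: call one crawl_* method, then start().
    if is_url(arg):
        manager.crawl_url(arg)
    else:
        manager.crawl_spider_name(arg)
    manager.start()
```

Under these assumptions, a command schedules work through exactly one ``crawl_*`` call and then hands control to ``start()``, which is the role ``runonce()`` played before the refactoring.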