diff --git a/sep/sep-015.rst b/sep/sep-015.rst
new file mode 100644
index 0000000000000000000000000000000000000000..8e8f03cf02e9dbc93ad1203fb4f5cfbff6ce1582
--- /dev/null
+++ b/sep/sep-015.rst
@@ -0,0 +1,57 @@
+======= ==============================================
+SEP     15
+Title   ScrapyManager and SpiderManager API refactoring
+Author  Insophia Team
+Created 2010-03-10
+Status  Final
+======= ==============================================
+
+========================================================
+SEP-015: ScrapyManager and SpiderManager API refactoring
+========================================================
+
+This SEP proposes a refactoring of ``ScrapyManager`` and ``SpiderManager``
+APIs.
+
+SpiderManager
+=============
+
+- ``get(spider_name)`` -> ``Spider`` instance
+- ``find_by_request(request)`` -> list of spider names
+- ``list()`` -> list of spider names
+
+- remove ``fromdomain()``, ``fromurl()``
+
+ScrapyManager
+=============
+
+- ``crawl_request(request, spider=None)``
+  - calls ``SpiderManager.find_by_request(request)`` if spider is ``None``
+  - fails if ``len(spiders returned)`` != 1
+- ``crawl_spider(spider)``
+  - calls ``spider.start_requests()``
+- ``crawl_spider_name(spider_name)``
+  - calls ``SpiderManager.get(spider_name)``
+  - calls ``spider.start_requests()``
+- ``crawl_url(url)``
+  - calls ``spider.make_requests_from_url()``
+
+- remove ``crawl()``, ``runonce()``
+
+Instead of using ``runonce()``, commands (such as crawl/parse) would call
+``crawl_*`` and then ``start()``.
+
+Changes to Commands
+===================
+
+- ``if is_url(arg):``
+  - calls ``ScrapyManager.crawl_url(arg)``
+- ``else:``
+  - calls ``ScrapyManager.crawl_spider_name(arg)``
+
+Pending issues
+==============
+
+- should we rename ``ScrapyManager.crawl_*`` to ``schedule_*`` or ``add_*`` ?
+- ``SpiderManager.find_by_request`` or
+  ``SpiderManager.search(request=request)`` ?
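The three ``SpiderManager`` methods proposed in the SEP above can be illustrated with a minimal, self-contained Python sketch. Only the method names ``get``, ``find_by_request`` and ``list`` and their return types come from the SEP; the ``Request`` and ``Spider`` classes are hypothetical stand-ins (not Scrapy's own classes), and matching a request to a spider through a hypothetical ``allowed_domains``/``handles_request`` pair is just one plausible rule, since the SEP does not say how ``find_by_request`` decides.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request: only a URL."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical spider; allowed_domains (an assumption) drives matching."""

    def __init__(self, name, allowed_domains):
        self.name = name
        self.allowed_domains = list(allowed_domains)

    def handles_request(self, request):
        # Match the request host against each domain or any of its subdomains.
        host = urlparse(request.url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)


class SpiderManager:
    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        """Return the Spider instance named spider_name (KeyError if unknown)."""
        return self._spiders[spider_name]

    def find_by_request(self, request):
        """Return the names of every spider that can handle request."""
        return [name for name, spider in self._spiders.items()
                if spider.handles_request(request)]

    def list(self):
        """Return the names of all known spiders, sorted."""
        return sorted(self._spiders)


spiders = SpiderManager([
    Spider("example", ["example.com"]),
    Spider("other", ["other.org"]),
])
print(spiders.list())  # ['example', 'other']
print(spiders.find_by_request(Request("http://www.example.com/page")))  # ['example']
```

Because ``find_by_request`` returns a list rather than a single spider, a caller can tell "no match" apart from "several matches", which is the check ``crawl_request`` relies on.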
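The proposed ``ScrapyManager`` methods can be sketched in the same way. This is an illustration under stated assumptions, not Scrapy's engine: ``Request``, ``Spider``, ``start_urls``, the domain-matching rule and the ``scheduled`` queue are all hypothetical, and ``start()`` here only returns the queued ``(spider name, URL)`` pairs instead of running a crawl. The SEP does not say which spider ``crawl_url`` uses, so the sketch assumes it is found with ``find_by_request``, under the same "exactly one spider" rule as ``crawl_request``.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request: only a URL."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical spider exposing the two methods the SEP relies on."""

    def __init__(self, name, allowed_domains, start_urls=()):
        self.name = name
        self.allowed_domains = list(allowed_domains)
        self.start_urls = list(start_urls)

    def handles_request(self, request):
        host = urlparse(request.url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)

    def make_requests_from_url(self, url):
        return Request(url)

    def start_requests(self):
        return [self.make_requests_from_url(url) for url in self.start_urls]


class SpiderManager:
    def __init__(self, spiders):
        self._spiders = {s.name: s for s in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        return [n for n, s in self._spiders.items() if s.handles_request(request)]


class ScrapyManager:
    def __init__(self, spider_manager):
        self.spiders = spider_manager
        self.scheduled = []  # (spider, request) pairs, held until start()

    def _resolve_spider(self, request):
        # Fail unless exactly one spider claims the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise ValueError(f"expected one spider for {request.url}, got {names}")
        return self.spiders.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve_spider(request)
        self.scheduled.append((spider, request))

    def crawl_spider(self, spider):
        for request in spider.start_requests():
            self.scheduled.append((spider, request))

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        # Assumption: the spider is looked up via find_by_request.
        spider = self._resolve_spider(Request(url))
        self.crawl_request(spider.make_requests_from_url(url), spider)

    def start(self):
        # Stand-in for starting the engine: report what would be crawled.
        return [(spider.name, request.url) for spider, request in self.scheduled]


manager = ScrapyManager(SpiderManager([
    Spider("example", ["example.com"], start_urls=["http://example.com/"]),
    Spider("other", ["other.org"]),
]))
manager.crawl_spider_name("example")
manager.crawl_url("http://other.org/about")
print(manager.start())
# [('example', 'http://example.com/'), ('other', 'http://other.org/about')]
```

Separating scheduling (``crawl_*``) from running (``start()``) is what lets a command queue several targets before starting, the job ``runonce()`` used to do in a single call.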
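The command-side dispatch proposed under "Changes to Commands" might look like the sketch below. ``is_url`` is written as a hypothetical helper (an argument counts as a URL if it has an ``http`` or ``https`` scheme and a host), ``run_crawl_command`` is a hypothetical name for the command body, and ``RecordingManager`` is a stub that only records which ``ScrapyManager`` method a command would call, so the example runs without Scrapy.

```python
from urllib.parse import urlparse


def is_url(arg):
    """Hypothetical helper: an http(s) scheme plus a host counts as a URL."""
    parts = urlparse(arg)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class RecordingManager:
    """Stub ScrapyManager that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


def run_crawl_command(manager, args):
    # Schedule every argument first, then start once (replacing runonce()).
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    manager.start()


manager = RecordingManager()
run_crawl_command(manager, ["http://example.com/", "example"])
print(manager.calls)
# [('crawl_url', 'http://example.com/'), ('crawl_spider_name', 'example'), ('start',)]
```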