noncrawl

noncrawl is a web crawler that saves only links. It crawls the web but makes no attempt to do everything a general-purpose crawler might. Its sole purpose is to recursively check sites for links to other sites, which are in turn checked for links to further sites, and so on. If site Y links to site X, that fact is saved, and if site X has not been checked yet, it is crawled just as site Y was.
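
The idea can be illustrated with a minimal sketch. This is not noncrawl's actual code: the names `LinkParser`, `site_of`, `crawl_links`, the `fetch` callable and the `max_sites` limit are all hypothetical, a "site" is assumed to mean a host name, and only the first page reached on each site is read. The sketch records one (linking site, linked site) pair per cross-site link and queues every site it has not seen before.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to its host name (assumption: one host is one site)."""
    return urlsplit(url).netloc.lower()


def crawl_links(start_url, fetch, max_sites=100):
    """Return a set of (linking_site, linked_site) pairs.

    fetch is a hypothetical callable that maps a URL to its HTML text,
    or returns None when the page cannot be retrieved.
    """
    links = set()
    seen = {site_of(start_url)}   # sites already queued or crawled
    queue = deque([start_url])    # breadth-first frontier

    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue

        parser = LinkParser()
        parser.feed(html)
        source = site_of(url)

        for href in parser.hrefs:
            target_url = urljoin(url, href)   # resolve relative links
            target = site_of(target_url)
            # Skip links without a host (mailto:, javascript:) and
            # links that stay on the same site.
            if not target or target == source:
                continue
            links.add((source, target))       # "source links to target"
            if target not in seen and len(seen) < max_sites:
                seen.add(target)
                queue.append(target_url)      # crawl the new site too

    return links
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; for a quick experiment it can be the `get` method of a dictionary that maps URLs to HTML strings.
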
noncrawl's branches are hosted at Gitorious; see http://gitorious.org/noncrawl. The bug tracker is at Launchpad; see http://launchpad.net/noncrawl.
| Title: | noncrawl |
| Modified: | Tue, 02 Aug 2011 23:08:13 +0200 |
| Created: | Tue, 02 Aug 2011 23:08:13 +0200 |
| Revision: | 0 (local), 20 (global) |
| Summary: | A links-centric webcrawler |
| License: | Creative Commons Attribution-ShareAlike 3.0 Unported (or any later version) (page) |
| License: | GNU General Public License, version 3 (or any later version) (program) |