Sphider is an open-source web spider and search engine. It includes an automated crawler that can follow the links on a site and index them. It is written in PHP and uses MySQL.
Spidering and indexing
- Performs full text indexing.
- Can index both static and dynamic pages.
- Respects the robots.txt protocol and the nofollow and noindex tags.
- Follows server side redirections.
- Allows spidering to be limited by depth (i.e. the maximum number of clicks from the starting page), by (sub)domain or by directory.
- Allows spidering only the URLs matching (or not matching) certain keywords or regular expressions.
- Supports indexing of PDF and DOC files (using external binaries for file conversion).
- Allows resuming paused spidering.
- Possibility to exclude common words from being indexed.
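
The depth limit and the URL filters listed above can be illustrated with a small sketch. This is not Sphider's own code (Sphider is written in PHP); it is a minimal Python illustration under stated assumptions: the site is an in-memory link graph rather than fetched pages, and the names `crawl_order`, `SITE`, `must_match` and `must_not_match` are hypothetical.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_order(link_graph, start_url, max_depth, must_match=None, must_not_match=None):
    """Breadth-first walk over a link graph, limited by depth, host and URL patterns.

    link_graph maps a page URL to the hrefs found on that page (a stand-in
    for fetching and parsing the page). Returns a list of (url, depth) pairs.
    """
    include = re.compile(must_match) if must_match else None
    exclude = re.compile(must_not_match) if must_not_match else None
    start_host = urlparse(start_url).netloc

    def allowed(url):
        # Stay on the starting host (one way to restrict by domain).
        if urlparse(url).netloc != start_host:
            return False
        if include and not include.search(url):
            return False
        if exclude and exclude.search(url):
            return False
        return True

    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # depth = number of clicks from the starting page
        for href in link_graph.get(url, []):
            target = urljoin(url, href)  # resolve relative links
            if target not in seen and allowed(target):
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site used only for illustration.
SITE = {
    "http://example.com/": ["/docs/", "/blog/", "http://other.example.org/"],
    "http://example.com/docs/": ["/docs/install.html", "/docs/print.php?page=1"],
    "http://example.com/docs/install.html": ["/docs/deep.html"],
}

pages = crawl_order(SITE, "http://example.com/", 2, must_not_match=r"print\.php")
```

With a depth limit of 2, `pages` contains the start page, `/docs/`, `/blog/` and `/docs/install.html`. The external host, the URL matching `print\.php`, and `/docs/deep.html` (three clicks away) are all skipped.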
Searching
- Supports AND, OR and phrase searches.
- Supports excluding words (when a '-' is put in front of a word, any page containing that word is omitted from the results).
- Option to add and group sites into categories
- Possibility to limit searching to a given category and its subcategories.
- Possibility of searching in a specified domain only.
- “Did you mean” search suggestion on mistyped queries.
- Context-sensitive auto-completion on search terms (à la Google Suggest).
- Word stemming for English (searching for “run” finds “running”, “runs”, etc.).
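
To make the query features above concrete, here is a minimal sketch of AND, OR, phrase and exclusion matching. It is an illustration only, not Sphider's implementation: it scans a small in-memory dictionary of pages instead of a MySQL index, and the names `parse_query`, `search`, `PAGES` and the `mode` parameter are hypothetical.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases, required words and excluded words."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)  # drop the quoted parts
    required, excluded = [], []
    for token in rest.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])  # '-word' means "must not appear"
        else:
            required.append(token)
    return phrases, required, excluded


def search(pages, query, mode="and"):
    """Return the URLs of pages matching the query, in insertion order.

    mode="and" requires every word and phrase; mode="or" requires at least one.
    Excluded words always remove a page, whatever the mode.
    """
    phrases, required, excluded = parse_query(query)
    combine = all if mode == "and" else any
    hits = []
    for url, text in pages.items():
        lowered = text.lower()
        words = set(re.findall(r"\w+", lowered))
        if any(word in words for word in excluded):
            continue
        checks = [word in words for word in required]
        checks += [phrase in lowered for phrase in phrases]
        if checks and combine(checks):
            hits.append(url)
    return hits


# Hypothetical pages used only for illustration.
PAGES = {
    "/a": "Sphider is a web spider written in PHP",
    "/b": "A spider is an arachnid, not a search engine",
    "/c": "PHP search engine with a web interface",
}
```

With these pages, `search(PAGES, "spider php")` returns `["/a"]`, the same query with `mode="or"` returns all three pages, and `search(PAGES, '"search engine" -arachnid')` returns `["/c"]`. A real engine would consult an inverted index rather than scanning every page's text, and would apply stemming before comparing words.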
Administration
- Includes a sophisticated web-based administration interface.
- Supports indexing via a web interface as well as from the command line – easy to set up cron jobs.
- Comprehensive site and search statistics
- Simple template system – easy to integrate into a site