PHP script for building a search engine with its own spider on your website
June 26th, 2008 by Giovanni Caputo
Sphider is an open-source web spider and search engine. It includes an automated crawler that can follow the links on a site and index the pages it finds. It is written in PHP and uses MySQL.
Features
Spidering and indexing
- Performs full text indexing.
- Can index both static and dynamic pages.
- Finds links in href, frame, area and meta tags, and can also follow links given in javascript as strings via window.location and window.open.
- Respects robots.txt protocol, and nofollow and noindex tags.
- Follows server side redirections.
- Allows spidering to be limited by depth (i.e. the maximum number of clicks from the starting page), by (sub)domain or by directory.
- Allows spidering only the urls matching (or not matching) certain keywords or regular expressions.
- Supports indexing of pdf and doc files (using external binaries for file conversion).
- Allows resuming paused spidering.
- Possibility to exclude common words from being indexed.
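The link discovery and depth limit described above can be illustrated with a small sketch. This is not Sphider's code (Sphider is written in PHP); it is a minimal Python illustration that assumes a hypothetical `fetch` callable returning the HTML of a URL, only looks at `href` and frame `src` attributes, and ignores robots.txt, meta tags and JavaScript links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects link targets from a/area href and frame/iframe src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "area") and attrs.get("href"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.links.append(urljoin(self.base_url, attrs["src"]))


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl limited by depth and restricted to the start host.

    `fetch` is a hypothetical callable returning the HTML of a URL (or None).
    Returns a dict mapping each discovered URL to its depth (clicks from start).
    """
    host = urlparse(start_url).netloc
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        html = fetch(url)
        # Pages at the maximum depth are kept, but their links are not followed.
        if html is None or depth >= max_depth:
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if urlparse(link).netloc == host and link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

In a real crawler, `fetch` would perform an HTTP request and the loop would hand each page to the indexer; here it can be any function, for example the `get` method of a dictionary of test pages.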
Searching
- Supports AND, OR and phrase searches
- Supports excluding words (by putting a '-' in front of a word, any page including the word will be omitted from the results).
- Option to add and group sites into categories
- Possibility to limit searching to a given category and its subcategories.
- Possibility of searching in a specified domain only.
- “Did you mean” search suggestion on mistyped queries.
- Context-sensitive auto-completion on search terms (a la Google Suggest)
- Word stemming for English (searching for “run” finds “running”, “runs” etc)
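To make the query options above concrete, here is a minimal Python sketch of AND, OR, phrase and exclusion searches over a tiny positional index. It is an illustration only, not Sphider's implementation: the function names (`build_index`, `search`) and the `mode` parameter are assumptions made for this example, and stemming, categories and suggestions are left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]} so phrases can be checked."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_docs(index, words):
    """Return the documents containing `words` at consecutive positions."""
    if not words:
        return set()
    hits = set()
    for doc_id, positions in index.get(words[0], {}).items():
        for start in positions:
            if all(start + i in index.get(w, {}).get(doc_id, ())
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


def search(index, query, mode="and"):
    """Evaluate a query made of plain terms, "quoted phrases" and -excluded words."""
    required, excluded = [], set()
    for minus, phrase, word in re.findall(r'(-?)(?:"([^"]*)"|(\S+))', query):
        docs = phrase_docs(index, tokenize(phrase or word))
        if minus:
            excluded |= docs  # pages containing this term are dropped
        else:
            required.append(docs)
    if not required:
        return set()
    combine = set.intersection if mode == "and" else set.union
    return combine(*required) - excluded
```

For example, with an index built from a few short documents, `search(index, 'search -php')` keeps only the pages that contain "search" and do not contain "php", while `search(index, '"search engine"')` matches only pages where the two words appear next to each other in that order.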
Administering
- Includes a sophisticated web based administration interface
- Supports indexing via a web interface as well as from the command line – easy to set up cron jobs.
- Comprehensive site and search statistics
- Simple template system – easy to integrate into a site
This post was published on Thursday, June 26th, 2008 at 11:09
in the categories Siti Web, Tecnologia. Tags: crawler, mysql, php, ricerche, web.