Caputo's blog

Computing, technology, programming, DIY, papercraft and papertoys

PHP script for adding a search engine with its own spider to your website

June 26th, 2008 by Giovanni Caputo

Sphider is an open-source web spider and search engine. It includes an automated crawler that can follow the links on a site and index them. It is written in PHP and uses MySQL.

Features

Spidering and indexing

  • Performs full text indexing.
  • Can index both static and dynamic pages.
  • Finds links in href, frame, area and meta tags, and can also follow links given in JavaScript as strings via window.location and window.open.
  • Respects the robots.txt protocol, and nofollow and noindex tags.
  • Follows server side redirections.
  • Allows spidering to be limited by depth (i.e. maximum number of clicks from the starting page), by (sub)domain or by directory.
  • Allows spidering only the URLs matching (or not matching) certain keywords or regular expressions.
  • Supports indexing of PDF and DOC files (using external binaries for file conversion).
  • Allows resuming paused spidering.
  • Possibility to exclude common words from being indexed.
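
To make the depth and domain limits above more concrete, here is a minimal Python sketch of a breadth-first spider. It is not Sphider's code (Sphider is written in PHP): the names `LinkExtractor`, `crawl`, `fetch` and the in-memory `SITE` are hypothetical, and meta tags, JavaScript links and robots.txt are left out for brevity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects link targets from <a>/<area> href and <frame> src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "area"):
            target = attrs.get("href")
        elif tag == "frame":
            target = attrs.get("src")
        else:
            target = None
        # Skip links explicitly marked rel="nofollow".
        if target and "nofollow" not in (attrs.get("rel") or ""):
            self.links.append(urljoin(self.base_url, target))


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl limited by click depth and to the start host.

    `fetch` is a hypothetical callable returning the HTML of a URL or None.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth == max_depth:
            continue  # index the page, but do not follow its links
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# A tiny in-memory "site" standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="http://other.test/">X</a>',
    "http://example.test/a.html": '<a href="b.html">B</a> <a rel="nofollow" href="/c.html">C</a>',
    "http://example.test/b.html": '<a href="/d.html">D</a>',
    "http://example.test/d.html": "",
}

pages = crawl("http://example.test/", SITE.get, max_depth=2)
```

With `max_depth=2` the sketch visits the start page, `a.html` and `b.html`. It skips the external host and the nofollow link, and never reaches `d.html`, which is three clicks away.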

Searching

  • Supports AND, OR and phrase searches
  • Supports excluding words (putting a '-' in front of a word omits any page containing that word from the results).
  • Option to add and group sites into categories
  • Possibility to limit searching to a given category and its subcategories.
  • Possibility of searching in a specified domain only.
  • “Did you mean” search suggestion on mistyped queries.
  • Context-sensitive auto-completion on search terms (a la Google Suggest)
  • Word stemming for English (searching for “run” finds “running”, “runs” etc)
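
The AND, OR, phrase and exclusion behaviour listed above can be illustrated with a small Python sketch. This is an assumption about how such matching could work, not Sphider's actual implementation: `parse_query`, `matches`, `search` and `DOCS` are hypothetical names. Phrases are matched as plain substrings and are always required, even in OR mode.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases, plain words and '-' excluded words."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)
    include, exclude = [], []
    for token in rest.lower().split():
        if token.startswith("-"):
            if token[1:]:
                exclude.append(token[1:])
        else:
            include.append(token)
    return phrases, include, exclude


def matches(text, query, mode="and"):
    """Return True if `text` satisfies `query` in "and" or "or" mode."""
    phrases, include, exclude = parse_query(query)
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    if any(word in words for word in exclude):
        return False  # an excluded word disqualifies the page
    if any(phrase not in lowered for phrase in phrases):
        return False  # every quoted phrase must appear verbatim
    if not include:
        return True
    hits = [word in words for word in include]
    return all(hits) if mode == "and" else any(hits)


# Hypothetical mini index: document id -> page text.
DOCS = {
    1: "Sphider is a web spider and search engine written in PHP",
    2: "A search engine for Python projects",
    3: "Notes on spiders and other arachnids",
}


def search(query, mode="and"):
    """Return the ids of all documents matching the query."""
    return [doc_id for doc_id, text in DOCS.items() if matches(text, query, mode)]
```

For example, `search("search -python")` keeps document 1 and drops document 2, while `search("php arachnids", mode="or")` returns documents 1 and 3. A real engine would query an inverted index rather than scan every page.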

Administering

  • Includes a sophisticated web-based administration interface
  • Supports indexing via a web interface as well as from the command line – easy to set up cron jobs.
  • Comprehensive site and search statistics
  • Simple template system – easy to integrate into a site

LINK

This post was published on Thursday, June 26th, 2008 at 11:09 in the categories Websites, Technology.