An Intranet Search Engine
This is the search engine powering this site.
HTML pages are marked up with special [startindex] and [endindex] tags. The index generation
program indexes the words found between these tags.
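The tag-based extraction described above could be sketched roughly as follows. This is a minimal illustration in Python, not the site's actual indexer: the function names (indexable_text, strip_html, words_in, build_index), the regular expressions, and the choice to lower-case words and drop HTML markup are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Marker syntax taken literally from the description; the real tags may differ.
START_TAG = "[startindex]"
END_TAG = "[endindex]"

def indexable_text(page):
    """Return the text found between each [startindex]/[endindex] pair."""
    pattern = re.escape(START_TAG) + r"(.*?)" + re.escape(END_TAG)
    return re.findall(pattern, page, flags=re.DOTALL)

def strip_html(text):
    """Replace HTML tags with spaces (a crude assumption, not a full parser)."""
    return re.sub(r"<[^>]+>", " ", text)

def words_in(page):
    """List the lower-cased words inside the indexable regions of a page."""
    words = []
    for region in indexable_text(page):
        words.extend(re.findall(r"[a-z0-9]+", strip_html(region).lower()))
    return words

def build_index(pages):
    """Map each word to the sorted list of page names that contain it."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in words_in(html):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}
```

Text outside the markers, such as navigation menus and footers, never reaches the index, which is presumably the point of marking pages up this way.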
The index is bound directly into the search engine. This design is simple and effective for sites
with reasonably small page sets, but does not scale to sites with thousands of pages.
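One way to picture an index that is bound into the search program is to generate it as source code that the program loads directly. The sketch below assumes a Python search program and a hypothetical INDEX variable; the original engine's language and index layout are not stated, so treat this purely as an illustration of the idea.

```python
import pprint

def render_index_module(index):
    """Return Python source that defines INDEX as a literal dictionary."""
    header = "# Generated file; do not edit by hand.\n"
    return header + "INDEX = " + pprint.pformat(index) + "\n"

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        # Keep only pages that also contain this word.
        results &= set(index.get(word, []))
    return sorted(results)
```

Because the whole index lives in the program, there is no database or index file to deploy, but every page added makes the generated source larger, which is why the approach suits small sites only.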
The program uses a list of common discard words, such as 'and' and 'the', which are not indexed.
It writes any new words it finds to the log in the words.txt list format, so that newly discovered
words can be added to the discard list as easily as possible. Nonetheless, maintaining the discard
word list by hand is another task that does not scale well.
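The discard-word step might look something like the sketch below. The words.txt format is not described here, so the example assumes one word per line; the function names and the notion of a "known" set of previously seen words are likewise hypothetical.

```python
def load_word_list(text):
    """Parse a word list, assumed here to hold one word per line."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

def filter_words(words, discard, known):
    """Split words into those to index and those never seen before."""
    to_index = [w for w in words if w not in discard]
    new_words = sorted({w for w in to_index if w not in known})
    return to_index, new_words

def format_new_words(new_words):
    """Render new words one per line, ready to paste into the discard list."""
    return "".join(word + "\n" for word in new_words)
```

Writing the log in the same format as the list means a maintainer can copy lines across without editing them, although someone still has to decide which words deserve discarding.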