This is the search engine powering this site. HTML pages are marked up with special [startindex] and [endindex] tags, and the index generation program indexes the words found between these tags. The index is bound directly into the search engine. This design is simple and effective for sites with reasonably small page sets, but it does not scale to sites with thousands of pages. The program uses a list of common discard words, such as 'and' and 'the', which are not indexed. It writes any new words to the log in the words.txt list format, to make it as easy as possible to add newly discovered words to the discard list. Nonetheless, manual maintenance of the discard word list is a task that also does not scale well.
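The indexing pass described above can be sketched roughly as follows. This is an illustration only, not the site's actual program: the function names, the use of Python, the in-memory `pages` dictionary, and the one-word-per-line reading of the words.txt format are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Marker syntax taken literally from the description; the real program may differ.
INDEXED_REGION = re.compile(r"\[startindex\](.*?)\[endindex\]",
                            re.IGNORECASE | re.DOTALL)
HTML_TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z]+")


def indexable_words(html):
    """Return the lowercase words found between the index markers of one page."""
    words = []
    for region in INDEXED_REGION.findall(html):
        text = HTML_TAG.sub(" ", region)  # drop ordinary HTML tags
        words.extend(WORD.findall(text.lower()))
    return words


def build_index(pages, discard_words, known_words=frozenset()):
    """Map each indexed word to the set of pages containing it.

    pages: hypothetical dict of page name -> HTML source.
    discard_words: common words that are never indexed.
    known_words: words seen on earlier runs (an assumption of this sketch).
    Returns (index, new_words), where new_words is a sorted list.
    """
    index = defaultdict(set)
    new_words = set()
    for name, html in pages.items():
        for word in indexable_words(html):
            if word in discard_words:
                continue  # discard words are skipped entirely
            index[word].add(name)
            if word not in known_words:
                new_words.add(word)
    return dict(index), sorted(new_words)


def format_new_words(new_words):
    """Format new words for the log, assuming words.txt holds one word per line."""
    return "\n".join(new_words)
```

Writing the new words in the same shape as the discard list means a maintainer can copy lines from the log straight into words.txt. The index here lives in a single in-memory dictionary, which suits a small page set but reflects the same scaling limit noted above.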