How Search Works

We scrape data from sites throughout the federation so that we can consult it later when searching for something of current interest. See How Scrape Works.

See also Search Overview from chat log.

# Interface

Our interests are expressed as search terms from within specific vocabularies collected as attributes of pages.

A web form provides a direct way to enter terms of interest and query the available indices.

A Search plugin provides a wiki-centric way to construct, invoke and interpret queries of available indices.

A web service, with direct access to the indices built during scrapes, performs the queries submitted from both the web form and the Search plugin.

# Queries

A well-formed query specifies an attribute of interest, the terms to be found within that attribute, whether any or all of the terms must match, and whether sites or specific pages within sites are sought.
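The four parts of a query can be pictured as a small record. This is only a sketch: the field names `attribute`, `terms`, `match` and `scope`, the `well_formed?` check, and the sample attribute `"words"` are hypothetical and are not taken from the search service's code.

```ruby
# Allowed values for the two choices every query makes.
MATCH_MODES = %i[any all].freeze     # must any term match, or all of them?
SCOPES      = %i[sites pages].freeze # sites, or specific pages within sites?

# Hypothetical record of a well-formed query.
Query = Struct.new(:attribute, :terms, :match, :scope, keyword_init: true) do
  # True when all four parts are present and each choice has a legal value.
  def well_formed?
    !attribute.to_s.empty? &&
      terms.is_a?(Array) && !terms.empty? &&
      MATCH_MODES.include?(match) &&
      SCOPES.include?(scope)
  end
end

q = Query.new(attribute: "words", terms: %w[federated wiki], match: :all, scope: :pages)
p q.well_formed?  # => true
p Query.new(attribute: "words", terms: [], match: :any, scope: :sites).well_formed?  # => false
```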

Query processing first finds sites whose attribute matches the search terms and then, if requested, the matching pages within each of those sites. A site or page 'has' a term if the term is found in the corresponding index file.
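A minimal sketch of this two-stage processing, assuming the index files have already been read into memory. `SITE_INDEX`, `PAGE_INDEX`, the site names, page slugs and the `"words"` attribute are invented for illustration; the real service reads text files from disk rather than hashes.

```ruby
# Hypothetical in-memory stand-ins for index files:
# site => attribute => text, and site => page slug => attribute => text.
SITE_INDEX = {
  "alpha.example.com" => { "words" => "federated wiki search scrape" },
  "beta.example.com"  => { "words" => "recipes and gardening" }
}
PAGE_INDEX = {
  "alpha.example.com" => {
    "how-search-works" => { "words" => "search index query" },
    "welcome-visitors" => { "words" => "welcome visitors" }
  }
}

# A text 'has' a term when the term appears as a whole word.
def has_term?(text, term)
  text.to_s.match?(/\b#{Regexp.escape(term)}\b/)
end

# Apply the query's any/all rule to one index text.
def satisfies?(text, terms, mode)
  if mode == :all
    terms.all? { |t| has_term?(text, t) }
  else
    terms.any? { |t| has_term?(text, t) }
  end
end

# Stage 1: sites whose attribute matches. Stage 2 (optional): pages within them.
def run_query(attribute, terms, mode, want_pages)
  sites = SITE_INDEX.select { |_site, attrs| satisfies?(attrs[attribute], terms, mode) }.keys
  return sites unless want_pages

  sites.flat_map do |site|
    pages = PAGE_INDEX.fetch(site, {})
    pages.select { |_slug, attrs| satisfies?(attrs[attribute], terms, mode) }
         .keys.map { |slug| "#{site}/#{slug}" }
  end
end

p run_query("words", %w[search], :any, false)  # => ["alpha.example.com"]
p run_query("words", %w[search], :any, true)   # => ["alpha.example.com/how-search-works"]
```

Pages are only examined for sites that passed the first stage, so a site whose own index lacks the terms contributes no pages.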

Query code uses wildcard matches to select the index files to examine for a particular query (source on GitHub).

For sites matching terms:

```
sites/*/#{find}.txt
```

For pages within a site matching terms:

```
sites/#{site}/pages/*/#{find}.txt
```
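To see which files each pattern selects, this sketch builds a throwaway directory tree in the layout implied by the two patterns and globs it with Ruby's `Dir.glob`. The `index_files` helper, the site names, the page slugs and the `words` attribute are made-up examples.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: lay out a few empty index files under a temporary
# root, then return what each wildcard pattern selects.
def index_files(find, site)
  Dir.mktmpdir do |root|
    %w[
      sites/alpha.example.com/words.txt
      sites/beta.example.com/words.txt
      sites/alpha.example.com/pages/how-search-works/words.txt
      sites/alpha.example.com/pages/welcome-visitors/words.txt
    ].each do |rel|
      path = File.join(root, rel)
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, "")
    end

    # Site-level index files: one per site for the given attribute.
    site_files = Dir.glob("sites/*/#{find}.txt", base: root).sort
    # Page-level index files: one per page within the given site.
    page_files = Dir.glob("sites/#{site}/pages/*/#{find}.txt", base: root).sort
    [site_files, page_files]
  end
end

site_files, page_files = index_files("words", "alpha.example.com")
p site_files
# => ["sites/alpha.example.com/words.txt", "sites/beta.example.com/words.txt"]
p page_files
# => ["sites/alpha.example.com/pages/how-search-works/words.txt",
#     "sites/alpha.example.com/pages/welcome-visitors/words.txt"]
```

Because `*` matches within a single path component, the site-level pattern never reaches down into the `pages` subdirectories.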

Once read, the text of each file is checked for each term with a whole-word regular expression match:

```ruby
text.match(/\b#{word}\b/)
```
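The `\b` anchors make this a whole-word test. The sketch below wraps the match in a hypothetical `has_word?` helper to show what that means in practice. It adds `Regexp.escape`, which the original expression does not show, so that a term such as `wiki.org` is matched literally rather than as a pattern.

```ruby
# Hypothetical helper: does `word` appear in `text` as a whole word?
# Regexp.escape is an addition to the original expression so that
# characters such as "." in a term are matched literally.
def has_word?(text, word)
  !text.match(/\b#{Regexp.escape(word)}\b/).nil?
end

text = "how search works on wiki.org"

p has_word?(text, "search")    # => true   whole word present
p has_word?(text, "sea")       # => false  only a fragment of "search"
p has_word?(text, "Search")    # => false  matching is case sensitive
p has_word?(text, "wiki.org")  # => true   "." escaped, matched literally
p has_word?("see wikiXorg", "wiki.org")  # => false; unescaped, "." would match "X"
```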