Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!nbires!hao!gaia!zhahai
From: zhahai@gaia.UUCP (Zhahai Stewart)
Newsgroups: comp.databases
Subject: Indexed Text Database Query
Message-ID: <217@gaia.UUCP>
Date: Wed, 17-Dec-86 15:49:03 EST
Article-I.D.: gaia.217
Posted: Wed Dec 17 15:49:03 1986
Date-Received: Fri, 19-Dec-86 02:05:45 EST
Reply-To: zhahai@gaia.UUCP (Zhahai Stewart)
Organization: Gaia Corp, Boulder, CO
Lines: 20
Keywords: text index invert

Several commercial text indexing products on the market keep a master index
for a set of files (or articles) that, for any keyword, can quickly tell which
files contain that keyword.  Further, one can query
for a boolean combination of keywords (OR, AND, maybe NOT), and (best trick
yet) ask for two keywords to be found within N words of each other.  One must
of course let the text indexer know when a file is deleted, added, or revised.
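
To pin down the behaviour being described, here is a minimal Python sketch of
one straightforward way to get it: an in-memory map from each keyword to the
files containing it, with the word positions kept per file so that "within N
words" can be checked.  The names (TextIndex, add, delete, files, near) are
made up for illustration, and nothing here is claimed to be how the commercial
products work.  It stores a position for every word occurrence, so the index
grows with the size of the text itself.

```python
import re
from collections import defaultdict


class TextIndex:
    """Toy positional index: keyword -> {file name -> [word positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)  # keyword -> {name: [positions]}
        self.words_in = {}                 # name -> set of its keywords

    def add(self, name, text):
        """Index a new file, or re-index a revised one."""
        self.delete(name)
        words = re.findall(r"[a-z0-9]+", text.lower())
        for pos, word in enumerate(words):
            self.postings[word].setdefault(name, []).append(pos)
        self.words_in[name] = set(words)

    def delete(self, name):
        """Forget a file; does nothing if it was never indexed."""
        for word in self.words_in.pop(name, ()):
            del self.postings[word][name]
            if not self.postings[word]:
                del self.postings[word]

    def files(self, word):
        """Set of files containing the keyword."""
        return set(self.postings.get(word.lower(), ()))

    def near(self, a, b, n):
        """Files in which a and b occur within n words of each other."""
        hits = set()
        for name in self.files(a) & self.files(b):
            pa = self.postings[a.lower()][name]
            pb = self.postings[b.lower()][name]
            i = j = 0
            # Both lists are sorted, so walk them together and always
            # advance whichever pointer sits at the smaller position.
            while i < len(pa) and j < len(pb):
                if abs(pa[i] - pb[j]) <= n:
                    hits.add(name)
                    break
                if pa[i] < pb[j]:
                    i += 1
                else:
                    j += 1
        return hits
```

With the results as Python sets, AND is files(x) & files(y), OR is
files(x) | files(y), and NOT is a set difference against the set of all
indexed files.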

My question is how this is done, algorithmically.  The most obvious approaches
are slow and would build rather large indices.  I am looking for either a
description or a reference to some source which treats this subject with enough
detail to support an implementation (assume a decent foundation in data 
structures and basic algorithms).

Thanks for any help. ~z~



-- 
Zhahai Stewart
{hao | nbires}!gaia!zhahai