Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site nsc.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!nsc!chuqui
From: chuqui@nsc.UUCP (Chuq Von Rospach)
Newsgroups: net.news
Subject: Re: keyword-based news
Message-ID: <3210@nsc.UUCP>
Date: Wed, 2-Oct-85 13:08:01 EDT
Article-I.D.: nsc.3210
Posted: Wed Oct  2 13:08:01 1985
Date-Received: Fri, 4-Oct-85 05:04:07 EDT
References: <820@vortex.UUCP>
Reply-To: chuqui@nsc.UUCP (Chuq Von Rospach)
Organization: Ninja Ewok Training Grounds
Lines: 32

In article <820@vortex.UUCP> lauren@vortex.UUCP (Lauren Weinstein) writes:
>For quite a few years, I've been using a very elaborate keyword-based
>system for searching a large newswire story database.  This database
>is in a centralized location so there is no concern about COSTS associated
>with extra matches, unlike the Usenet situation.
>
>One thing I learned long ago thanks to this system--it is almost
>IMPOSSIBLE to avoid major missed matches AND extra matches.  If you
>try to make your keyword choices very specific and negate out topics
>of no interest, you frequently (*VERY* frequently) find that you're missing
>great numbers of stories that you really DID want to see, but where
>a particular keyword you specified wasn't used.  Or you find that *MANY*
>stories you wanted to filter OUT still get through since the keywords
>you wanted to SKIP weren't used.

Lauren has a point, but if this system is like all of the other newswire
searching systems I've seen, it has limited applicability to a keyword-based
news system. The problem is that doing keyword searches on a general
database IS going to bring forward lots of silly matches, because the words
just happen to be used in otherwise unrelated articles. What I'm planning
on doing for NNTN, though, is to have the author attach the appropriate
keywords to the article. Rather than simply grepping the text for the words,
you look only at the keywords the author thinks are important. Even if the
author is completely incompetent at keyword selection, this should
keep the accidental matches down to a minimum. You can't ignore the
problem, but you also have to realize that Lauren's example is to a great
extent an apples-and-oranges comparison to USENET and its problems.
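
As a rough sketch of the difference between grepping the text and matching
on author-attached keywords (in Python; the article records, the "keywords"
field, and both function names are made up for illustration and are not
NNTN code):

```python
import re

# Hypothetical article records. The "keywords" list stands in for whatever
# the author attaches when posting; it is an assumption, not an NNTN format.
ARTICLES = [
    {"id": "a1", "keywords": ["unix", "shell"],
     "body": "Tips for writing portable shell scripts."},
    {"id": "a2", "keywords": ["cooking"],
     "body": "Crack the egg shell carefully before whisking."},
    {"id": "a3", "keywords": ["unix"],
     "body": "Notes on the Bourne shell and its quirks."},
]


def words(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_fulltext(articles, wanted):
    """Grep-style match: any wanted word appearing anywhere in the body."""
    wanted = {w.lower() for w in wanted}
    return [a["id"] for a in articles if wanted & words(a["body"])]


def match_keywords(articles, wanted, unwanted=()):
    """Match only against the keywords the author attached."""
    wanted = {w.lower() for w in wanted}
    unwanted = {w.lower() for w in unwanted}
    hits = []
    for a in articles:
        kws = {k.lower() for k in a["keywords"]}
        # Keep the article if it carries a wanted keyword and no unwanted one.
        if kws & wanted and not kws & unwanted:
            hits.append(a["id"])
    return hits


fulltext_hits = match_fulltext(ARTICLES, ["shell"])
keyword_hits = match_keywords(ARTICLES, ["shell"])
```

Here the text search for "shell" drags in the cooking article (a2), while
the keyword search returns only a1. It also skips a3, whose author never
tagged it "shell", which is the kind of missed match Lauren describes.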
-- 
:From under the bar at Callahan's:   Chuq Von Rospach 
nsc!chuqui@decwrl.ARPA               {decwrl,hplabs,ihnp4,pyramid}!nsc!chuqui

If you can't talk below a bellow, you can't talk...