Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!linus!decvax!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news
Subject: keyword-based news
Message-ID: <820@vortex.UUCP>
Date: Mon, 30-Sep-85 13:31:51 EDT
Article-I.D.: vortex.820
Posted: Mon Sep 30 13:31:51 1985
Date-Received: Wed, 2-Oct-85 21:09:35 EDT
Organization: Vortex Technology, Los Angeles
Lines: 58

For quite a few years, I've been using a very elaborate keyword-based
system for searching a large newswire story database.  This database
is in a centralized location so there is no concern about COSTS associated
with extra matches, unlike the Usenet situation.

One thing I learned long ago thanks to this system--it is almost
IMPOSSIBLE to avoid major missed matches AND extra matches.  If you
try to make your keyword choices very specific and negate out topics
of no interest, you frequently (*VERY* frequently) find that you're missing
great numbers of stories that you really DID want to see, but where
a particular keyword you specified wasn't used.  Or you find that *MANY*
stories you wanted to filter OUT still get through since the keywords
you wanted to SKIP weren't used.  There are so many similar ways to specify
keywords, and there are so many personal choices involved, that getting
the proper match between the person choosing the article keywords and the
person trying to find (or ignore) particular stories is very difficult.

In a keyword-based news system, with users attempting to choose
their own keywords (and probably spelling them wrong part of the time,
or leaving typos in them, let's face it!), getting CORRECT matches without
getting lots of ERRONEOUS matches would be a nightmare.

Let's say I wanted to see all stories that discussed TELEPHONES.
But what if a story about AT&T was only keyworded with "PHONES"
or "COMMUNICATIONS"?  Well, of course you would never see those stories.
The same sort of problems can occur in the reverse direction when
you're trying to avoid certain stories.  It is VERY hard to create
flexible keyword-based systems that avoid these problems.  The
issues involved with parts-of-speech and word usage alone are
very significant.  Even the advanced systems won't match on PHONE
when you want TELEPHONE... there are infinite similar examples.
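
To make the TELEPHONES example concrete, here is a minimal sketch in
Python of plain exact-match keyword filtering.  The article names and
their keyword sets are made up for illustration, and exact_match is a
hypothetical helper, not part of any existing news software.

```python
# Hypothetical articles and keywords, for illustration only.
ARTICLES = {
    "att-rates": {"PHONES", "COMMUNICATIONS"},
    "new-handsets": {"TELEPHONES"},
    "modem-review": {"COMMUNICATIONS", "MODEMS"},
}

def exact_match(wanted, articles):
    """Return the sorted ids of articles carrying at least one wanted keyword."""
    return sorted(aid for aid, kws in articles.items() if kws & wanted)

# A reader who asks only for TELEPHONES misses the AT&T story.
narrow = exact_match({"TELEPHONES"}, ARTICLES)

# Widening to COMMUNICATIONS catches it, but drags in the modem review too.
wide = exact_match({"TELEPHONES", "COMMUNICATIONS"}, ARTICLES)
```

With the narrow request the AT&T story is missed; with the wide one an
unwanted story comes along for the ride.  That is the missed-match and
extra-match problem in miniature.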

Even if you're willing to sit for five minutes trying to figure out all
the "correct" keywords for an article when you submit it, you still frequently
make personal choices that are not going to match another person's
view of that same article.  Two people will tend to keyword any given
article in different ways.  This means that matching is a serious problem.

Before people jump on the keyword bandwagon, I STRONGLY suggest that
some time be spent looking at the numerous problems with existing
keyword-based systems, such as DIALOG.  I've used that service quite
a bit, and it is very, very frustrating to wade through lots of junk
you didn't want, and miss items you did want, due to keyword
"mismatch" problems of various sorts.  For netnews sites trying to cut
back on the phone bills by only sending, for example, technical
items, the volume of erroneously matched stories could be massive.
The odds are that about half the stories that would be sent would
be "incorrect" and that about half of the stories you WANTED to send
wouldn't get sent.
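
The "about half" estimate can be put in numbers.  The sketch below, in
Python, uses made-up counts chosen only to show the arithmetic; they are
not measurements from any real feed, and match_quality is a hypothetical
helper.

```python
def match_quality(sent, wanted):
    """Return (fraction of sent articles that were wanted,
    fraction of wanted articles that were sent)."""
    hits = len(sent & wanted)
    sent_ok = hits / len(sent) if sent else 0.0
    wanted_ok = hits / len(wanted) if wanted else 0.0
    return sent_ok, wanted_ok

# Hypothetical feed: 10 articles wanted, 10 sent, only 5 in common.
wanted = set(range(0, 10))
sent = set(range(5, 15))
sent_ok, wanted_ok = match_quality(sent, wanted)
```

In this made-up case half of what crosses the phone line is junk, and
half of what the site wanted never arrives.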

There is a lot of existing research in keyword systems that the proponents
of keyword-based news seem to be ignoring.  My own opinion is that
in our distributed environment, with volumes of material and costs
going up steadily (and many sites faced with cutting back on both,
one way or another), keyword-based systems might make our current
mess look like a paradise by comparison.

--Lauren--