Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/17/84 chuqui version 1.7 9/23/84; site nsc.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!ihnp4!nsc!chuqui
From: chuqui@nsc.UUCP (Chuq Von Rospach)
Newsgroups: net.news
Subject: Re: netnews and keywords
Message-ID: <2452@nsc.UUCP>
Date: Sat, 9-Mar-85 03:05:37 EST
Article-I.D.: nsc.2452
Posted: Sat Mar  9 03:05:37 1985
Date-Received: Sun, 10-Mar-85 05:37:00 EST
References: <591@vortex.UUCP>
Reply-To: chuqui@nsc.UUCP (Chuq Von Rospach)
Organization: The Village
Lines: 39
Summary: 

In article <591@vortex.UUCP> lauren@vortex.UUCP (Lauren Weinstein) writes:
>While I personally am not in favor of keyword-based netnews, I might
>point out that Chuqui's calculations are based on a somewhat erroneous 
>keyword model.  The "right" way of designing keyword-based systems is not
>necessarily to store keywords for each article item, but rather to
>have a list of keywords and store the item numbers that correspond to
>each keyword.

Agreed-- the experiments I discussed were just one implementation I tried.
I also looked at keyword->article_id lookups, and they have similar
problems in different ways. Some of the tradeoffs are better; some, to me,
aren't. Overall, I still am not sure that there is a good keyword
system for the amount of data we have with the number of keywords we SHOULD
keep to make the system really useful. I'm especially worried about disk
space and processor overhead-- two things a lot of news systems already
have in short supply. Even if we can get disk usage down to a 25% increase
(my results showed me about a 50% increase with my preliminary designs)
you're still talking about 3-5 megabytes of keyword database, and that
would be a significant problem for some sites. Generating and maintaining
that data would also be a significant processor load for many sites (not
all of us have Vaxen). Perhaps these problems can be worked around, and I'm still
looking at the situation, but I don't see any easy answers.
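
To make the two layouts concrete, here is a minimal sketch in Python. It is
not the code from the experiments described above; the article IDs, the
keywords, and every function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: per-article storage (article ID -> keywords).
ARTICLES = {
    "<101@siteA.UUCP>": ["keywords", "netnews", "disk"],
    "<102@siteB.UUCP>": ["netnews", "software"],
    "<103@siteC.UUCP>": ["disk", "vax"],
}


def build_inverted_index(articles):
    """Build the keyword->article_id layout from the per-article layout."""
    index = defaultdict(list)
    for article_id, keywords in articles.items():
        for keyword in set(keywords):  # ignore repeated keywords in one article
            index[keyword].append(article_id)
    return {keyword: sorted(ids) for keyword, ids in index.items()}


def lookup_forward(articles, keyword):
    """Per-article layout: a lookup has to scan every article's keyword list."""
    return sorted(a for a, kws in articles.items() if keyword in kws)


def lookup_inverted(index, keyword):
    """Keyword->article_id layout: a lookup is a single table access."""
    return index.get(keyword, [])


INDEX = build_inverted_index(ARTICLES)
print(lookup_inverted(INDEX, "netnews"))
```

Both layouts hold the same article/keyword pairs. The inverted form avoids
the full scan at lookup time, but it costs extra work whenever an article
arrives or expires, since every keyword list that mentions the article has
to be updated.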

>Still, it is pretty useful, *if* you
>are good at picking the keywords to put into the search
>expressions.  This is something of an art, however, and is not
>easily mastered.  If you do it wrong, you can miss many
>interesting stories.

This is my other worry-- I don't want to see us moving in directions that
make usenet LESS useful. I want to see usenet made better and more
effective. Somehow. I think we all agree with that hope.

chuq
-- 
Chuq Von Rospach, National Semiconductor
{cbosgd,fortune,hplabs,ihnp4,seismo}!nsc!chuqui   nsc!chuqui@decwrl.ARPA

Be seeing you!