Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!linus!decvax!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news
Subject: keyword-based news
Message-ID: <820@vortex.UUCP>
Date: Mon, 30-Sep-85 13:31:51 EDT
Article-I.D.: vortex.820
Posted: Mon Sep 30 13:31:51 1985
Date-Received: Wed, 2-Oct-85 21:09:35 EDT
Organization: Vortex Technology, Los Angeles
Lines: 58

For quite a few years, I've been using a very elaborate keyword-based system for searching a large newswire story database. This database is in a centralized location, so there is no concern about COSTS associated with extra matches, unlike the Usenet situation.

One thing I learned long ago thanks to this system--it is almost IMPOSSIBLE to avoid both major missed matches AND extra matches. If you try to make your keyword choices very specific and negate out topics of no interest, you frequently (*VERY* frequently) find that you're missing great numbers of stories that you really DID want to see, but where a particular keyword you specified wasn't used. Or you find that *MANY* stories you wanted to filter OUT still get through, since the keywords you wanted to SKIP weren't used.

There are so many similar ways to specify keywords, and so many personal choices involved, that getting the proper match between the person choosing the article keywords and the person trying to find (or ignore) particular stories is very difficult.

In a keyword-based news system, with users attempting to choose their own keywords (and probably spelling them wrong part of the time, or leaving typos in them, let's face it!), getting CORRECT matches without getting lots of ERRONEOUS matches would be a nightmare.

Let's say I wanted to see all stories that discussed TELEPHONES. But what if a story about AT&T was only keyworded with "PHONES" or "COMMUNICATIONS"? Well, of course you never see those stories.
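To make the two failure modes concrete, here is a toy exact-match keyword filter. It is purely a hypothetical sketch in Python, not any real news software: the function `wanted`, the keyword lists, and the story titles are all invented for illustration, and it assumes matching is exact apart from upper-casing. A reader asks for TELEPHONE and tries to skip SPORTS, but the posters chose their own words.

```python
# Hypothetical keyword filter (illustration only, not real news software).
# Assumption: keywords match exactly after upper-casing; no synonyms,
# no stemming, no spelling correction.

def wanted(article_keywords, include, exclude):
    """True if the article shares a keyword with `include` and none with `exclude`."""
    kws = {k.upper() for k in article_keywords}
    inc = {k.upper() for k in include}
    exc = {k.upper() for k in exclude}
    return bool(kws & inc) and not (kws & exc)

# A reader who wants telephone stories but no sports stories.
include = ["TELEPHONE"]
exclude = ["SPORTS"]

# Keywords as different (hypothetical) posters might choose them.
articles = {
    "AT&T rate story": ["PHONES", "COMMUNICATIONS"],  # wanted, but never says TELEPHONE
    "Phone switch story": ["TELEPHONE", "SWITCHING"],  # correctly shown
    "Ballpark phone story": ["TELEPHONE", "BASEBALL"],  # unwanted sports, but no SPORTS keyword
    "Typo story": ["TELPHONE"],  # wanted, but misspelled by the poster
}

for title, kws in articles.items():
    print(f"{title}: {'shown' if wanted(kws, include, exclude) else 'hidden'}")
```

Only one of the four stories is handled the way the reader intended: the AT&T and typo stories are missed, and the ballpark story slips past the SPORTS filter.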
The same sort of problems can occur in the reverse direction when you're trying to avoid certain stories.

It is VERY hard to create flexible keyword-based systems that avoid these problems. The issues involved with parts of speech and word usage alone are very significant. Even the advanced systems won't match on PHONE when you want TELEPHONE... and there are endless similar examples.

Even if you're willing to sit for five minutes trying to figure out all the "correct" keywords for an article when you submit it, you still frequently make personal choices that are not going to match another person's view of that same article. Two people will tend to keyword any given article in different ways. This means that matching is a serious problem.

Before people jump on the keyword bandwagon, I STRONGLY suggest that some time be spent looking at the numerous problems with existing keyword-based systems, such as DIALOG. I've used that service quite a bit, and it is very, very frustrating to wade through lots of junk you didn't want, and to miss items you did want, due to keyword "mismatch" problems of various sorts.

For netnews sites trying to cut back on their phone bills by only sending, for example, technical items, the volume of erroneously matched stories could be massive. The odds are that about half the stories that got sent would be "incorrect," and that about half of the stories you WANTED to send wouldn't get sent.

There is a lot of existing research on keyword systems that the proponents of keyword-based news seem to be ignoring. My own opinion is that in our distributed environment, with volumes of material and costs going up steadily (and many sites faced with cutting back on both, one way or another), keyword-based systems might make our current mess look like a paradise by comparison.

--Lauren--