Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!watmath!clyde!cbosgd!ulysses!allegra!bellcore!decvax!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news
Subject: netnews and keywords
Message-ID: <591@vortex.UUCP>
Date: Thu, 7-Mar-85 18:22:11 EST
Article-I.D.: vortex.591
Posted: Thu Mar 7 18:22:11 1985
Date-Received: Sat, 9-Mar-85 08:43:52 EST
Organization: Vortex Technology, Los Angeles
Lines: 40

While I personally am not in favor of keyword-based netnews, I might
point out that Chuqui's calculations are based on a somewhat erroneous
keyword model. The "right" way of designing keyword-based systems is
not necessarily to store keywords for each article item, but rather to
have a list of keywords and store the item numbers that correspond to
each keyword.

For example, Stanford's newswire scanning program ("NS") makes EVERY
word in EVERY newswire item (except for a list of "common" words that
are automatically excluded from the tables) a keyword. You then pick
out individual stories with boolean keyword expressions. A randomly
selected example:

    ((love+sex)*handcuff)-chuqui

This expression would find all stories that mention the words "love"
OR "sex" that also mention the word "handcuff". Also, it will exclude
all stories that fit these criteria but that also include the word
"chuqui". The software automatically tries to handle plurals and
special suffixes.

There are some problems with this keyword technique, admittedly. You
can't currently specify that two words should be next to each other in
a story. And you still tend to get lots of erroneous keyword matches
that aren't what you are looking for due to the strange places that
some words tend to pop up in stories.

Still, it is pretty useful, *if* you are good at picking the keywords
to put into the search expressions. This is something of an art,
however, and is not easily mastered.
If you do it wrong, you can miss many interesting stories.

Of course, this is a pretty big program and the database is still
non-trivial, to say the least. But frankly, I don't think that systems
based on users' selecting their own keywords will be useful in our
environment. The technique above is an alternative, but probably not
practical for smaller machines. So, I currently feel that keyword-based
news is not really the way to go.

--Lauren--
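
The scheme described in the post (a table from each keyword to the
item numbers that contain it, queried with boolean expressions) can be
sketched in a few lines of Python. This is only an illustration, not
the Stanford NS program. The names build_index, search, and
COMMON_WORDS are hypothetical, the stop list is invented, operators are
applied strictly left to right because the post does not state any
precedence, and the plural and suffix handling mentioned in the post is
left out.

```python
import re

# Hypothetical stop list; the post only says "common" words are excluded.
COMMON_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}

def build_index(stories):
    """Map each non-common word to the set of story numbers containing it."""
    index = {}
    for number, text in stories.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in COMMON_WORDS:
                index.setdefault(word, set()).add(number)
    return index

def search(index, expression):
    """Evaluate + (OR), * (AND), - (AND NOT) and parentheses, left to right."""
    tokens = re.findall(r"[a-z]+|[()+*-]", expression.lower())
    pos = 0

    def term():
        # A term is either a single keyword or a parenthesized expression.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if not token.isalpha():
            raise ValueError("expected a keyword, got %r" % token)
        return set(index.get(token, ()))

    def expr():
        # Assumption: no operator precedence; combine terms in reading order.
        nonlocal pos
        result = term()
        while pos < len(tokens) and tokens[pos] in {"+", "*", "-"}:
            op = tokens[pos]
            pos += 1
            right = term()
            if op == "+":
                result = result | right
            elif op == "*":
                result = result & right
            else:
                result = result - right
        return result

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return result

# Made-up stories, numbered the way the index refers to them.
stories = {
    1: "A story of love and a handcuff",
    2: "Sex, a handcuff, and Chuqui",
    3: "Love in the time of netnews",
    4: "The handcuff trade",
}
index = build_index(stories)
print(sorted(search(index, "((love+sex)*handcuff)-chuqui")))  # prints [1]
```

Because each keyword already maps to a set of item numbers, a query
never rescans story text; it only combines sets. That is also why the
tables get large: nearly every distinct word in every item gets an
entry.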