Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site ucbvax.ARPA
Path: utzoo!linus!decvax!decwrl!ucbvax!fair
From: fair@ucbvax.ARPA (Erik E. Fair)
Newsgroups: net.news,net.news.notes
Subject: Information Overload and What We Can Do About It
Message-ID: <10381@ucbvax.ARPA>
Date: Sat, 14-Sep-85 08:58:18 EDT
Article-I.D.: ucbvax.10381
Posted: Sat Sep 14 08:58:18 1985
Date-Received: Sun, 15-Sep-85 05:16:46 EDT
Organization: University of California at Berkeley
Lines: 200
Xref: linus net.news:3100 net.news.notes:2
Summary: information structuring and filtering mechanisms needed

Have you ever wondered why the notesfiles people are so smug about the
superiority of their system over netnews? Or why `rn' has been such a
big hit with the USENET user community? (Of course, if you're using it,
you probably know, but bear with me for the moment anyway.)

The USENET user community as a whole is suffering from information
overload; that is, there are more items coursing the paths of the
network than any single individual can read in a reasonable period of
time.

As the volume of messages in the newsgroups that I choose to read
increases, there are two steps I can take to be more efficient:

1) I can arrange to read netnews at a higher baud rate (instead of
   1200 baud, how about 9600 or 19200?). This will allow me to make
   my article selections faster, and hopefully to handle more
   articles per unit time than I did at 1200 baud.

2) I can prioritize the list of newsgroups that I read and remove
   some newsgroups from the bottom of the list, until the volume is
   manageable again.

However, these traditional mechanisms for limiting time spent reading
netnews are no longer sufficient, because they're not specific enough.
What I need now is a set of automatic structuring and filtering
mechanisms for articles.

Remember my original questions about notesfiles & rn?
The reason that these two user interfaces are popular is that, in
addition to providing the usual amenities (screen oriented interface
&c), they also structure the information presented to the user, and
`rn' provides the first of many possible filtering mechanisms for
removing from view articles that the user is not interested in.

If you were to grep for the Subject lines in any high volume newsgroup,
my observation is that you would find that 80% or more of the articles
are responses, rather than original articles. To the notesfiles user,
the `base note' (the first article) and all the responses appear as one
item in the presentation menu. It is considerably more daunting to hit
`=' in rn, in a newsgroup you haven't read in many weeks, and see the
list of hundreds of individual articles that have accumulated.
Fortunately, `rn' provides you with the facility to `kill' (remove from
the list of unread articles) all of the articles with a specific
subject (including the `Re:' subjects). This brings us to:

        I N F O R M A T I O N   S T R U C T U R E

Right now (with the exception of rn & notes) netnews articles are
presented to the user in the order they arrived on the system. This is
not optimal.

To create structure in the way that netnews articles are presented, we
can start (as rn does) with the Subject line, and follow that along,
presenting articles whose subjects match. This gives us the thread of a
discussion. However, since responses can and frequently do arrive on a
system out of order, we should sort by date of submission (i.e. the
contents of the `Date:' field). This will give us the discussion in the
chronological order in which it occurred.

There is even more information in the header that we can use to order
the articles into a discussion more accurately than with `Subject:' and
`Date:'. I mean the `References:' line.
Presently, the only use that any of the user interfaces make of this
field is for finding the `parent' article of the current article (that
is, the article to which the current article is a response). We can use
this information for following discussions by building the tree that
discussions form:

                    a
                   /|\
                  b c d
                 / \
                e   f

If this information is put into a database that is easily used by the
various user-interfaces, the following things are possible:

1) accurate ordering and presentation of the discussions that take
   place on the network

2) differentiation between the various sub-branches of the tree of
   discussion (one branch goes off discussing foo from foobar, the
   other discussing bar from foobar)

3) change of subjects to reflect actual message content to
   facilitate #2, without affecting #1 (i.e. no more `Re: foo
   (really bar)')

4) delay posting of responses until the user has read the entire
   tree (or at least as much of it as is online at his site).

   We have a problem with users asking a trivial question, to which
   everyone knows the answer (and everyone immediately responds!).
   If the user-interface holds the followup until the user has read
   all the articles in the tree, and asks again whether the
   submitted response is still appropriate, the incidence of this
   problem should drop significantly. This should also cause a drop
   in network traffic.

5) lessen the necessity of including the text of the article to
   which one is responding (the `parent' command of vnews, and ^P
   in rn, also provide some of this functionality).

It is this particular structure that makes the netnews data storage
structure superior to notesfiles. However, we still have the problem of
too much information to read and understand, which leads into:

        F I L T E R I N G   M E C H A N I S M S

As I mentioned, rn provides for removing from your view articles with
subjects you are not interested in.
However, given the proclivity of users to change the subject line for a
less than titanic change of subject (in which you probably still have
no interest), rn's current mechanism for killing discussions misses the
mark. Given the database described above, rn would never miss.

A subject, however, is not the only criterion that you might wish to
filter with. Consider the following information that might be useful to
filter by:

    author          (also known as the `bozo' filter)
    site            (they're all bozos on that bus)
    date            (kill articles that are four days old)
    time            (kill articles composed between 0000 and 0600?)
    transit-time    (kill articles that took more than x days to get
                    here)
    length          (anything too small or too big)
    newsgroups      (in a multiple group posting, skip if `net.flame'
                    is one of the other groups)
    keywords        (suppose that postnews mungs up a set of keywords
                    from the body of the article when it was first
                    posted...)

Consider also that any of these criteria can be used for article
selection (i.e. to *find* articles) as well as for article
de-selection.

Finally, one more mechanism: we use moderators as a filtering
mechanism, in that they select appropriate articles to broadcast to the
network. In our electronic publishing medium, they are the editors.
With the appropriate statistical information gathered by the
user-interfaces on the system, other users on your system can act as
editors for you. Ideally, I should be able to tell the user-interface,
`show me all the articles that John Smith thought were interesting'. In
this way, John Smith becomes my editor. Alternately, `show me
everything that John Smith and Jane A. Nonymous did not look at' should
also be a valid filter.

        W H A T   D O   W E   D O   N O W ?

The structuring of netnews articles should be easy to implement; all of
the necessary hooks are there, we're just not using the information
contained in the header as yet.
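
To give a feel for how little is needed, here is a rough sketch in
Python (chosen only for brevity; it is not the language of any existing
news software). It assumes each article has already been parsed into a
hypothetical record with `id', `refs' (the message IDs from the
`References:' line, oldest first) and `date' (any sortable value). The
names build_tree and discussion are made up for this illustration.

```python
from collections import defaultdict


def build_tree(articles):
    """Arrange articles into discussion trees using their References: IDs.

    Returns (roots, children): the IDs of articles with no parent online,
    and a mapping from each parent ID to its responses in date order.
    """
    known = {art["id"] for art in articles}
    children = defaultdict(list)
    roots = []
    for art in sorted(articles, key=lambda a: a["date"]):
        # Assumption: the parent is the most recent referenced article
        # that is actually online at this site.
        parent = next(
            (ref for ref in reversed(art["refs"])
             if ref in known and ref != art["id"]),
            None,
        )
        if parent is None:
            roots.append(art["id"])
        else:
            children[parent].append(art["id"])
    return roots, children


def discussion(root, children):
    """List every article ID in the (sub)tree below root, root first."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        # Push responses in reverse so the earliest one is visited first.
        stack.extend(reversed(children.get(node, [])))
    return found


# The example tree: b, c and d answer a; e and f answer b.
# The articles are listed in arrival order, which is not date order.
example = [
    {"id": "<f>", "refs": ["<a>", "<b>"], "date": 6},
    {"id": "<b>", "refs": ["<a>"], "date": 2},
    {"id": "<a>", "refs": [], "date": 1},
    {"id": "<d>", "refs": ["<a>"], "date": 4},
    {"id": "<e>", "refs": ["<a>", "<b>"], "date": 5},
    {"id": "<c>", "refs": ["<a>"], "date": 3},
]
roots, children = build_tree(example)
print(roots)                          # ['<a>']
print(discussion("<a>", children))    # ['<a>', '<b>', '<e>', '<f>', '<c>', '<d>']
print(discussion("<b>", children))    # ['<b>', '<e>', '<f>']
```

Killing the sub-branch under `<b>' then amounts to marking everything
that discussion("<b>", children) returns as read, whatever the Subject
lines of those articles have drifted to.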
Clearly this structuring is a database function that should go into
rnews and expire for update & maintenance, rather than into the
user-interfaces. The more mundane filtering mechanisms that I suggested
should also be relatively easy to implement, given `rn' as a base. The
`other local users as editors' idea will take some work.

With the volume of network traffic increasing, there is no doubt in my
mind that we will have a test of fire (site death by network byte?).
However, I think that the mechanisms I have outlined, coupled with
sensible naming of groups (and management of that namespace as a
whole), will `save' the network that we know as USENET. The key is
getting this software implemented, and distributed network wide as soon
as possible, so that the peak of the deluge of information will be that
much sooner, and that much lower, than if we do nothing.

        your comments and observations are solicited,

        Erik E. Fair    ucbvax!fair    fair@ucbarpa.BERKELEY.EDU

        S U G G E S T E D   R E A D I N G S

DRAGONMAIL: A Prototype Conversation-Based Mail System
Douglas E. Comer, Larry L. Peterson, Purdue University
SLC USENIX Conference Proceedings, June 1984, p. 42

The Readers Workbench - A System for Computer Assisted Reading
Evan L. Ivie, Brigham Young University
SLC USENIX Conference Proceedings, June 1984, p. 270

Structuring Computer-Mediated Communication Systems to Avoid
Information Overload
Starr Roxanne Hiltz, Murray Turoff
CACM, July 1985, Vol 28, #7, p. 680

Conversation-Based Mail
DRAFT TR August 26, 1985
Douglas E. Comer, Purdue University
Larry L. Peterson, University of Arizona
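
        A   S K E T C H   O F   T H E   F I L T E R S

A few of the criteria listed under FILTERING MECHANISMS (author, site,
newsgroups, length) can be sketched as a single predicate. This is a
minimal illustration in Python, not a description of how rn or any
other reader works. The record fields (`author', `site', `newsgroups',
`lines'), the function name make_filter, and the sample authors and
sites are all assumptions made up for the example.

```python
def make_filter(authors=(), sites=(), newsgroups=(), min_lines=0, max_lines=None):
    """Build a predicate that is True when an article matches any criterion."""
    authors, sites, newsgroups = set(authors), set(sites), set(newsgroups)

    def matches(art):
        # author (the `bozo' filter) and site
        if art["author"] in authors or art["site"] in sites:
            return True
        # any listed group among the groups the article was posted to
        if newsgroups & set(art["newsgroups"]):
            return True
        # length: too small or too big
        if art["lines"] < min_lines:
            return True
        return max_lines is not None and art["lines"] > max_lines

    return matches


# Hypothetical articles, already parsed from their headers.
sample = [
    {"id": "<1>", "author": "alice", "site": "foovax",
     "newsgroups": ["net.news"], "lines": 200},
    {"id": "<2>", "author": "bozo", "site": "foovax",
     "newsgroups": ["net.news"], "lines": 30},
    {"id": "<3>", "author": "jane", "site": "barvax",
     "newsgroups": ["net.news", "net.flame"], "lines": 40},
    {"id": "<4>", "author": "john", "site": "barvax",
     "newsgroups": ["net.news"], "lines": 2},
]

unwanted = make_filter(authors=["bozo"], newsgroups=["net.flame"], min_lines=5)

# De-selection: drop whatever matches.
kept = [art["id"] for art in sample if not unwanted(art)]
# Selection: the same predicate *finds* articles instead.
found = [art["id"] for art in sample if unwanted(art)]

print(kept)     # ['<1>']
print(found)    # ['<2>', '<3>', '<4>']
```

The same predicate serves for selection and de-selection, which is the
point made above: the criteria are the same, only the sense of the test
is reversed.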