Xref: utzoo news.software.b:1001 comp.mail.uucp:852
Path: utzoo!mnetor!spectrix!clewis
From: clewis@spectrix.UUCP (Chris Lewis)
Newsgroups: news.software.b,comp.mail.uucp
Subject: A Bug in "Supersedes:" and some comments on map handling.
Message-ID: <339@spectrix.UUCP>
Date: 16 Dec 87 20:31:29 GMT
Organization: Spectrix Microsystems Inc., Toronto, Ontario, Canada
Lines: 171
Keywords: supersedes maps comp.mail.maps expire news mail pathalias uuhosts

First, the bug:

The Supersedes header isn't working because someone is munging it. We're getting headers in comp.mail.maps articles that look like:

> Path: spectrix!tmsoft!utgpu!water!watmath!clyde!cbosgd!ucbvax!rutgers!pleasant
> From: uucpmap@rutgers.rutgers.edu (UUCP Mapping Project)
> Newsgroups: comp.mail.maps
> Subject: UUCP map for d.usa.ct.1
> Message-ID: <6187@rutgers.rutgers.edu>
> Date: 13 Dec 87 17:49:04 GMT
> Expires: 27 Jan 88 17:49:02 GMT
> Sender: pleasant@rutgers.rutgers.edu
> Lines: 35
> Approved: pleasant@rutgers.rutgers.edu
> Supersedes: <6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile

Note the contents of the Supersedes header line:

	<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile

Looking at inews.c, c_cancel() in control.c, and hread, I see that nobody strips off the stuff after the ">". Thus, the whole string is considered to be a Message-ID.
Not surprisingly, inews drops this gem into the log file:

	Dec 15 16:13 tmsoft Can't cancel <6175@rutgers.rutgers.edu> can't \
		open /usr/lib/news/artfile: non-existent

Which makes perfect sense once you parse it this way:

	Dec 15 16:13 tmsoft Can't cancel "<6175@rutgers.rutgers.edu> can't \
		open /usr/lib/news/artfile": non-existent

Further, we get this in our history file:

	<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile \
		12/15/87 16:13 cancelled

Which should be parsed thusly:

	"<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile" \
		12/15/87 16:13 cancelled

This blows the history file format and will cause all sorts of problems for the news code, particularly expire.

WHO'S DOING THIS?! The only thing I know for sure is that the munging isn't being done here.

Second, my comments on this mess:

Why on earth are the maps being updated this way? If I've read some of the commentary correctly, the Map Project is going to repost a whole chunk of the map whenever some entries in it are updated (modulo a few days of latency). And the Supersedes header is there to allow a new chunk to "cancel" the previous chunk so that comp.mail.maps doesn't take up so much room. Eyuck. The spool space problem is supposedly solved (BUT not here, see above), but the transport costs will go thru the roof! I seem to recall someone saying updates will occur "hopefully within 48 to 72 hours". I can just see it - we'll get enough updated map chunks to vastly multiply the total comp.mail.maps traffic.

First of all: why couldn't someone have reposted an up-to-date version of uuhosts (or a much simpler map muncher that just unshars comp.mail.maps postings)? Then a site sets a short expiry time on comp.mail.maps and/or the new version of uuhosts deletes the article after unpacking. All the comp.mail.maps trickery regarding "Expires:" and "Supersedes:" would be TOTALLY unnecessary, because once unpacked, what the heck do you need the article around for?
Besides, even without unpacking, having the articles in the spool area isn't terribly useful either (if you don't bother unpacking them somehow, what use are they?).

Secondly, regarding transport load: reposting humongous chunks of map data simply because one entry had a comma misplaced is stupid. The ideal brute-force method would be to have each map posting contain only one site, and have uuhosts unpack it into a file named after the site. Then, when a site changes its map entry, you only have to repost that system's entry. Obviously, this won't work very well - we don't have that many inodes.... rnews overhead would skyrocket.... Some sites' names are too long... System V would start saying "Directory too big - get help"...

Two possibilities:

  - How about having two types of map postings? One would be a "whole chunk" (a la "u.can.on.1") posted once per month to resynchronize everybody. The other would be "patch" input to edit chunks previously unpacked by uuhosts. If you put the "patch" input in a separate newsgroup (a la comp.mail.maps.patches), you wouldn't even break existing map munchers. Sneakier still, just have the "patch" articles contain the invocation of patch. Anybody running uuhosts then just has to copy "patch" into their MAPSH directory. (Though some thought has to be given to security...)

  - Much better: release a utility to use in place of uuhosts that maintains a database of sites and their map entries. Then the keeper of the maps just posts articles containing only new entries. The database munger just has to replace the already existing database entry with the new one. uuhosts already does half of this (the "Index" file). Part of this utility would be a mechanism by which the whole database can be dumped thru pathalias (which we've also done to uuhosts).

In fact I've been thinking about such a utility and am going to try to build one.
Another possible problem has occurred to me: since much of the map updating is decentralized (eg: Canadian maps are done at U of Toronto), won't there be a problem with the area coordinator trying to Supersede an article posted by Rutgers? c_cancel() won't like it.

Another inconvenience (and a kludge): at least until recently, our area coordinator was kindly posting updated copies of u.can.on into comp.mail.maps with distribution "can", quite frequently at one point. The problem is that in B news you have trouble pumping comp.mail.maps articles thru the map muncher when the distribution matches a top-level newsgroup. C-news does not have this problem because you can distinguish between distribution and newsgroup in the sys file. Eg: we used to have this sys entry:

	maps:world,comp.mail.maps:F:.....Batch (for uuhosts)

Without "world", maps doesn't see anything. This didn't catch the "can" distribution postings, so I tried:

	maps:world,can,ont,tor,comp.mail.maps:....

Silly me! uuhosts bitched at me about all of the "not a map postings" - obviously, uuhosts was being given ALL of the local newsgroup articles.

So, what I did was hack the function broadcast in ifuncs.c (ifuncs.c at patch level 13):

	if (!ngmatch(h.nbuf, srec.s_nbuf))
		continue;
#define COMPMAILMAPSHACK	/* START CRL 87:12:2 */
#ifdef COMPMAILMAPSHACK
	if (STRCMP(h.nbuf, "comp.mail.maps") == 0 &&
	    STRCMP(srec.s_name, "maps") == 0)	/* must match sys entry */
		dist = "world";
#endif	/* END CRL 87:12:2 */
	if (*dist == '\0')
		dist = "world";
	if (!ngmatch(dist, srec.s_nbuf) && !ngmatch(srec.s_nbuf, dist))
		continue;

(sorry, we don't have diff -c). This changes an internal copy of the distribution field to "world" if the newsgroup is comp.mail.maps and the system name is "maps", so the article goes to the maps site no matter what the distribution is. This doesn't seem to affect other site or newsgroup routing.
And finally, somehow the last u.can.on posting didn't have the updates I had sent to our area coordinator, even though they had made it into his local postings long before the comp.mail.maps flood from Rutgers. I've sent off another copy. I wonder, though, how many other entries have been lost like this?

--
Chris Lewis, Spectrix Microsystems Inc,
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc}!spectrix!clewis
[Also: lsuc!clewis in a pinch]
Phone: (416)-474-1955