Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!brl-tgr!gwyn
From: gwyn@brl-tgr.ARPA (Doug Gwyn)
Newsgroups: net.unix
Subject: Re: Unix text files
Message-ID: <2837@brl-tgr.ARPA>
Date: Tue, 5-Nov-85 00:29:30 EST
Article-I.D.: brl-tgr.2837
Posted: Tue Nov 5 00:29:30 1985
Date-Received: Thu, 7-Nov-85 04:33:55 EST
References: <23@pixel.UUCP> <2235@brl-tgr.ARPA> <2333@flame.warwick.UUCP>
Organization: Ballistic Research Lab
Lines: 52

> Does anyone out there want to show those of us with weak knees how one
> would use this kind of data structure [used loosely] in a program?
> (In other words, as if the data were within the program not without.)
> Without additional support information, like keeping track of the number
> and lengths of lines.

Most data processing algorithms are (or should be) driven by the
structure of the data that they process; this is normally taught
these days in the "data structures" CS course.  It should be
obvious from the grammar how to structure code that e.g. gets
a line of text, processes it, and writes out the resulting line.
(There is no need to bring in line numbering or "length of line".)

If there is no (or only a fuzzy) definition of "line of text",
then it is not obvious how to get/put one, and some random
choice is made by the programmer.  (Which is what started this
discussion.)

For simplicity, I left out of the grammar one important
constraint, which is a limit of no more than 510 characters
in a line of text (exclusive of newline).  I had already
stretched the notation a bit and didn't want to invent yet
another notation like { char }*510 .  This limit is actually
important in allowing efficient get-line implementations.

> I think it would be a good example to the young of inherent complexity.

There is nothing complex about that grammar.  It is a remarkably
simple one, which was the point.  Note that it was decomposed
into meaningful subunits -- this is important!  Just having a
formal grammar (syntax) is not sufficient for good semantic
processing.  (People often forget this.)

> And I thought we were trying to make life simple!  The main problem here
> is that we are trying to impose structure on unstructured data, which
> is probably not the best approach.

Text files certainly are structured, although it's a rather
flexible structure.  One might argue that dividing text into
lines is artificial, but the concept of a "line of text" is
useful in many text-processing programs (e.g., "grep").

> Sentinels are a wonderful way of implementing lists, but a terrible way
> of implementing strings.  Hint, hint.

Oh, foo.  Both the count+data and NUL-terminated representations
for character strings have good and bad points.  I've used both
and prefer C's approach for most routine programming.

If the point of the correspondent was that FILES-11 variable-length
record format is easier to work with, he deserves a large horse
laugh.  See "Software Tools" for examples of the use of UNIX-like
text file formats in programs.
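
As a rough illustration of the get-a-line / process-it / put-a-line
structure described above, here is a minimal C sketch.  It is not taken
from "Software Tools" or from the grammar posted earlier; the names
get_line, put_line, filter_lines, GL_EOF and GL_BAD are invented for
this example, and upper-casing stands in for whatever processing a real
program would do.  It assumes that a text file is a sequence of lines,
each at most 510 characters and each terminated by a newline, so a
fixed buffer suffices and no line counts or stored lengths are needed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAXLINE 510   /* limit quoted in the post, exclusive of newline */
#define GL_EOF  (-1)  /* hypothetical code: clean end of file */
#define GL_BAD  (-2)  /* hypothetical code: input is not a valid text file */

/* Read one line into buf, which must hold MAXLINE + 1 bytes.
 * The newline is consumed but not stored; buf is NUL-terminated.
 * Returns the line length (0..MAXLINE), GL_EOF at end of file, or
 * GL_BAD if a line is too long or the last line lacks its newline.
 * A read error is not distinguished from end of file here. */
int get_line(FILE *fp, char *buf)
{
    int c, n = 0;

    assert(fp != NULL && buf != NULL);
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (n == MAXLINE)
            return GL_BAD;              /* line exceeds the limit */
        buf[n++] = (char)c;
    }
    if (c == EOF)
        return n == 0 ? GL_EOF : GL_BAD; /* partial line: no newline */
    buf[n] = '\0';
    return n;
}

/* Write n characters from buf followed by a newline.
 * Returns 0 on success, GL_BAD on a write error. */
int put_line(FILE *fp, const char *buf, int n)
{
    if (fwrite(buf, 1, (size_t)n, fp) != (size_t)n || putc('\n', fp) == EOF)
        return GL_BAD;
    return 0;
}

/* Copy in to out one line at a time, upper-casing each line.
 * Returns the number of lines copied, or -1 on malformed input
 * or a write error. */
long filter_lines(FILE *in, FILE *out)
{
    char buf[MAXLINE + 1];
    long count = 0;
    int i, n;

    while ((n = get_line(in, buf)) >= 0) {
        for (i = 0; i < n; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        if (put_line(out, buf, n) != 0)
            return -1;
        count++;
    }
    return n == GL_EOF ? count : -1;
}
```

Calling filter_lines(stdin, stdout) from main would turn this into an
ordinary filter.  The loop mirrors the shape of the data: a file is
zero or more lines, so the program is a loop over lines, and the only
bookkeeping is the fixed-size buffer that the 510-character limit
makes possible.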