Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site warwick.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!seismo!mcvax!ukc!warwick!kay
From: kay@warwick.UUCP (Kay Dekker)
Newsgroups: net.unix
Subject: Unix text files
Message-ID: <2339@flame.warwick.UUCP>
Date: Sat, 26-Oct-85 11:37:17 EST
Article-I.D.: flame.2339
Posted: Sat Oct 26 11:37:17 1985
Date-Received: Tue, 29-Oct-85 00:49:01 EST
References: <23@pixel.UUCP> <2235@brl-tgr.ARPA> <2333@flame.warwick.UUCP> <2308@brl-tgr.ARPA>
Reply-To: kay@warwick.UUCP (Kay Dekker)
Organization: VLSI Group, Warwick University, UK
Lines: 71
Xpath: warwick flame flame ubu

Extensive quoting ensues, as I've moved the discussion to net.unix from
net.bugs, and people may have missed this...

Sometime back, gwyn@brl-tgr.ARPA (Doug Gwyn) wrote:

>> >Many UNIX text-file utilities will discard a (necessarily final)
>> >text line that does not end in a newline. Quite simply, such a
>> >file is not a proper UNIX text file.

and I responded with:

>> Who says? Where's the definition of a 'proper' UNIX text file?

to which he replied:

>The problem is, there are several interpretations of such a file,
>depending on the utility involved. Perhaps there should be a
>well-defined standard interpretation, but there isn't currently.
>
>"A file of text consists simply of a string of characters, with
>lines demarcated by the newline character." -- from "The UNIX
>Time-Sharing System" by Ritchie & Thompson
>
>"text file, ASCII file -- a file, the bytes of which are understood
>to be in ASCII code" -- from "Glossary" in "UNIX Time-Sharing
>System Programmer's Manual", 8th Ed.
>
>"A text stream is an ordered sequence of bytes composed into lines,
>each line consisting of zero or more characters plus a terminating
>new-line character. ... The sequentially last character read in
>from a text stream will, however, always be sequentially the last
>character that was earlier written out to the text stream, if that
>character was a new-line." -- from ANSI X3J11/85-045
>
>My personal choice would be similar to Ritchie & Thompson, where
>newlines delimit (NOT "terminate") text lines, so that the last
>character in a text file would not need to be a newline. However,
>this raises the question of what utilities should do with the
>null line at the end of every text file that DOES end with a
>newline; this will still be utility-dependent (and should be
>documented whenever it is handled differently from other text
>lines in the file).
>
>X3J11/85-045 botched it anyhow, since they intended that ALL UNIX
>files qualify as "text streams" under stdio (vs. "binary streams",
>which have to be handled differently on some non-UNIX OSes).
>
>So, how do we establish a standard interpretation for non-newline-
>terminated UNIX text files?

Doug, I may be being optimistic (and thus *wrong*) but I don't see where
the problem with your suggestion [newlines delimiting text lines] lies:
the rule would be, simply,

"Text consists of an ordered sequence of characters, with lines
delimited by newline characters. Text is normally terminated by a
newline. This newline should be considered to be followed by a
(nonexistent) null line. The null line should not be considered to be
part of the text.

"If the last character of the text is not a newline, then consider the
text to be terminated by a newline - null line pair; however, this
newline - null line pair should not be considered to have been part of
the file."

I *think* that's right...

							Kay.
-- 
"The only good thing that I can find to say about the idea of colonies in space
 is that America could, at last, have a world to herself."
			-- Elisabeth Zyne

			... mcvax!ukc!warwick!flame!kay
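
The rule proposed above can be sketched in code. What follows is only a
minimal illustration in Python, not anything taken from the discussion:
the function name text_lines is hypothetical, and treating a completely
empty file as having zero lines is an assumption, since the rule as
stated does not cover that case.

```python
def text_lines(data):
    """Split text into lines, with newlines acting as delimiters.

    Hypothetical helper illustrating the proposed rule: the null line
    that follows a final newline is not part of the text, and text
    without a final newline is treated as if it had one.
    """
    # Newlines delimit lines, so a plain split gives every field,
    # including the empty "null line" after a trailing newline.
    parts = data.split("\n")

    # Drop the trailing null line. This also covers the empty file
    # (assumption: an empty file contains no lines at all).
    if parts[-1] == "":
        parts.pop()

    return parts
```

Under this sketch "a\nb\n" and "a\nb" both yield the same two lines, so
a missing final newline loses nothing, while "a\n\n" still yields "a"
followed by one genuinely empty line.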