Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!sun!gorodish!guy
From: guy%gorodish@Sun.COM (Guy Harris)
Newsgroups: comp.bugs.4bsd
Subject: Re: read() from tty has fencepost error
Message-ID: <22664@sun.uucp>
Date: Sun, 5-Jul-87 04:51:37 EDT
Article-I.D.: sun.22664
Posted: Sun Jul  5 04:51:37 1987
Date-Received: Sun, 5-Jul-87 09:37:16 EDT
References: <648@haddock.UUCP> <6040@brl-smoke.ARPA> <13048@topaz.rutgers.edu> <6053@brl-smoke.ARPA>
Sender: news@sun.uucp
Lines: 82

> At one time, a special "delimiter" marker was inserted into the stream
> at that point.  Apparently, some UNIXy implementations do it one way
> and some another.

Non-STREAMS tty drivers generally have a "raw" queue and a
"canonical" queue.  Reads in "cooked" mode take place from the
"canonical" queue.

In the AT&T drivers, of various flavors (V7, S3, S5), characters
accumulate in the "raw" queue until a "read" is done.  If the
terminal is in cooked mode when the "read" is done, the "read" blocks
until a line terminator (newline, EOF, or "secondary end-of-line"
character) is received.  At that point, one and only one line is
canonicalized (erase/kill processing is done) and is moved to the
"canonical" queue.  If the "line" is terminated by an EOF rather than
an end-of-line character, the EOF does NOT appear in the canonical
queue.  Thus, the top-level reading code won't see delimiters.
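
A minimal sketch of the "canonicalize at read time" step described above, not the actual driver code. The function name canon_one_line, the flat-array queues, and the fixed '#', '@', and ^D characters are assumptions made for illustration; backslash escapes and the secondary end-of-line character are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed control characters for this sketch; real drivers make them settable. */
#define CH_ERASE '#'
#define CH_KILL  '@'
#define CH_EOF   '\004'

/*
 * Canonicalize exactly one line from raw[0..rawlen) into canon.
 * Returns the number of raw characters consumed, or 0 if no line
 * terminator has arrived yet (a real "read" would block here).
 * *canonlen receives the number of characters placed in canon.
 */
size_t canon_one_line(const char *raw, size_t rawlen,
                      char *canon, size_t canonmax, size_t *canonlen)
{
    size_t i, n = 0, term = rawlen;

    assert(raw != NULL && canon != NULL && canonlen != NULL);
    *canonlen = 0;

    /* Locate the first line terminator (newline or EOF). */
    for (i = 0; i < rawlen; i++) {
        if (raw[i] == '\n' || raw[i] == CH_EOF) {
            term = i;
            break;
        }
    }
    if (term == rawlen)
        return 0;               /* no complete line: nothing is moved */

    /* Erase/kill processing over the characters before the terminator. */
    for (i = 0; i < term; i++) {
        if (raw[i] == CH_ERASE) {
            if (n > 0)
                n--;            /* drop the previous character */
        } else if (raw[i] == CH_KILL) {
            n = 0;              /* drop the whole line so far */
        } else if (n < canonmax) {
            canon[n++] = raw[i];
        }
    }

    /* A newline is data and is kept; an EOF terminator is discarded. */
    if (raw[term] == '\n' && n < canonmax)
        canon[n++] = '\n';

    *canonlen = n;
    return term + 1;
}
```

Given the raw input "helo#lo^Drest", this moves "hello" to the canonical queue and consumes eight raw characters; the ^D never reaches the canonical queue, so the reading code has no delimiter to trip over.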

The 4BSD driver(s) move data from the "raw" queue to the "canonical"
queue as soon as a line terminator is received.  "Canonicalization"
is done on the fly; for example, as soon as an "erase" character is
received, the character it erases is removed from the "raw" queue.
(This makes it easier to implement more correct handling of the
"erase" character - it's easier for the driver to know what character
is being erased, so it can do a better job of erasing it from the
screen - and also makes it easier to handle a "reprint" character
that causes the current queued-up input to be re-echoed.  It also
means that erase, kill, etc. characters do NOT count against the
256-character limit of uncanonicalized characters, but subtract from
that count.)  If the line ended with EOF, the EOF is left in the
canonical queue as a delimiter.  It is stripped out when the "read" is
done; however, if there are five characters in the queue, and the
"read" asks for five bytes, only those five characters are looked at.
If an EOF follows them, it is left in the queue and seen by the next
"read".

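The fencepost in the 4BSD behavior can be reproduced with a toy model of the canonical queue. This is a sketch under stated assumptions, not the 4BSD source: struct canq, read_delim, and the use of ^D as the in-queue delimiter byte are all hypothetical, and a return of -1 stands in for "the real read would block".

```c
#include <assert.h>
#include <stddef.h>

#define CH_EOF '\004'   /* delimiter byte left in the queue (assumed to be ^D) */

struct canq {           /* hypothetical canonical queue: a flat buffer */
    const char *data;
    size_t len;
    size_t pos;         /* index of the next unread character */
};

/*
 * Delimiter-style read: copy up to n characters, stopping early at a
 * newline (returned) or at an EOF delimiter (consumed, not returned).
 * Once n characters have been copied, the next character is NOT examined.
 * Returns -1 if the queue is empty on entry.
 */
int read_delim(struct canq *q, char *buf, int n)
{
    int got = 0;

    assert(q != NULL && buf != NULL && n > 0);
    if (q->pos >= q->len)
        return -1;              /* empty queue: a real read would block */

    while (got < n && q->pos < q->len) {
        char c = q->data[q->pos++];
        if (c == CH_EOF)
            break;              /* strip the delimiter and stop */
        buf[got++] = c;
        if (c == '\n')
            break;              /* at most one line per read */
    }
    return got;
}
```

With "hello^D" in the queue, a five-byte read returns 5 and leaves the ^D behind, so the next read returns 0, which the program takes as end-of-file. A six-byte read of the same queue returns 5 and consumes the ^D, and no spurious zero-length read follows.
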
> I seem to recall that SVR3.0 STREAMS was missing the M_DELIM message type,
> so whenever AT&T finally gets the whole character I/O system converted to
> STREAMS, they couldn't insert a delimiter if they wanted too (according to
> Ron, that would be consistent with current UNIX System V behavior).

This is true.  STREAMS messages somewhat resemble "mbuf" chains;
delimiters are implicit in the structure of these chains (when you
get to the end of one, you're at the end of a message).  A line would
be a single STREAMS message; the EOF would be discarded ASAP, since
it is not needed as a delimiter.  As such, any driver based on the
S5R3 STREAMS code will give the "push", rather than the "delimiter"
behavior (regardless of whether it implements "canonicalize at read
time" or "canonicalize at input time" behavior).
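
The "push" behavior falls out of message boundaries, as the following sketch tries to show. It is not the S5R3 data structure or API: struct msg, struct msgq, and read_msg are hypothetical names, each message is a single flat buffer rather than a chain of blocks, and representing an EOF typed at the start of a line as a zero-length message is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

struct msg {                /* hypothetical message: one line, no EOF byte */
    const char *data;
    size_t len;
    size_t off;             /* bytes of this message already read */
    struct msg *next;
};

struct msgq {
    struct msg *head;       /* oldest unread message, or NULL */
};

/*
 * Read at most n bytes, never crossing a message boundary.
 * Returns the byte count (0 for a zero-length message), or -1 if the
 * queue is empty and a real read would block.
 */
int read_msg(struct msgq *q, char *buf, size_t n)
{
    struct msg *m;
    size_t avail, take, i;

    assert(q != NULL && buf != NULL && n > 0);
    m = q->head;
    if (m == NULL)
        return -1;

    avail = m->len - m->off;
    take = avail < n ? avail : n;
    for (i = 0; i < take; i++)
        buf[i] = m->data[m->off + i];
    m->off += take;

    /* End of message reached: the boundary itself is the delimiter. */
    if (m->off == m->len)
        q->head = m->next;

    return (int)take;
}
```

A five-byte read of a five-byte message dequeues the message outright, so the following read sees the next line (or blocks) instead of returning zero. No delimiter is left over, whichever way canonicalization was done before the message was built.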

The "streams" code described in Dennis Ritchie's paper in the BSTJ (I
have no idea if that implementation is called STREAMS or just
"streams") has a "delimiter" message type.  I don't know what sort of
behavior the various V8 "streams"-based (as opposed to S5R3
STREAMS-based) tty drivers provide; Dennis' paper described two
drivers, one giving the 4.1BSD "old" line discipline behavior (which
may resemble V7 behavior) and one giving the 4.1BSD "new" line
discipline behavior (which probably resembles other 4BSD systems).

I agree with most of the people here; the non-4BSD behavior is
correct.  When I type ^D, it doesn't mean that I'm putting a ^D into
the input queue; it means I'm terminating a record.

> Alas, another difference among UNIX variants.  What does POSIX have to
> say about this?

From the draft of Draft 10 (*sic*) we have here:

7.1.1.11 Special Characters

	...

	EOF	...When received, all the characters waiting to be
		read are immediately passed to the program, without
		waiting for a new-line, and the EOF is discarded.
		Thus, if there are no characters waiting (that is,
		the EOF occurred at the beginning of a line), zero
		characters shall be passed back, representing an
		end-of-file indication.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com