Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!killer!ames!lll-tis!lll-winken!uunet!munnari!otc!metro!ipso!runx!brucee
From: brucee@runx.ips.oz (Bruce Evans)
Newsgroups: comp.os.minix
Subject: Bugs found installing V1.3
Message-ID: <1646@runx.ips.oz>
Date: 8 Jul 88 16:05:17 GMT
Organization: RUNX Un*x Timeshare.  Sydney, Australia.
Lines: 250


I found the following bugs while beating hard on the system recently, doing
Minix development on Minix for the first time and installing the 1.3 diffs.
The system held up very well and my only major dissatisfaction is the
slowness of fs+drivers.

Most of the bugs are still in 1.3. All diffs are relative to 1.3.

lib
===

sbrk() doesn't check underflow
------------------------------
Sbrk() in brk.c still doesn't do the bounds checking right. 1.2 didn't check
anything. 1.3 should check underflow as well as overflow.
Sbrk makes the undocumented assumption that the initial brksize is even.
It also declares unused undefined externs endv and dorgv. 

brk.c.diff:
-----------
24d23
<   extern int endv, dorgv;
28c27,28
<   if (incr > 0 && newsize < oldsize) return( (char *) -1);
---
>   if (incr > 0 && newsize < oldsize || incr < 0 && newsize > oldsize)
>   	return( (char *) -1);
-----------
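
The test in the diff can be tried in isolation. What follows is only a
sketch, not the Minix brk.c source: brk_wraps() is a made-up name, and it
assumes the break is held in an unsigned type so that arithmetic wraps
around instead of trapping.

```c
#include <stddef.h>

/* Hypothetical helper, not the Minix brk.c code: returns 1 if moving the
 * break from oldsize by incr bytes would wrap around the address space,
 * 0 if the new size is plausible. */
int brk_wraps(size_t oldsize, long incr)
{
    size_t newsize = oldsize + (size_t) incr;   /* unsigned arithmetic wraps */

    if (incr > 0 && newsize < oldsize) return 1;    /* overflow off the top */
    if (incr < 0 && newsize > oldsize) return 1;    /* underflow below zero */
    return 0;
}
```

The second test is the one the diff adds: shrinking by more than the
current size makes the unsigned result come out larger than the old size.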

ctime fails on Dec 31 of leap years
-------------------------------------
This bug has already been reported for the date command. My fix is simpler.
In ctime.c, too much is subtracted from the 't' variable, allowing it to go
negative. I checked the fixed version against another ctime implementation.
They agreed for every hour from 1970 to 1999. Minix ctime still fails miserably
before 1970 and more reasonably after 2000. This is for the upgraded ctime
posted just after the 1.3 diffs.

ctime.c.diff
------------
47a48,50
> 		{
> 			if ( t < YEAR + DAY )
> 				break;
48a52
> 		}
------------
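
For anyone without ctime.c at hand, the idea behind the fix is roughly the
following. This is a standalone sketch, not the ctime.c code: year_of() is
a made-up name, DAY and YEAR are redefined here, and the simple leap year
rule only holds for 1970 to 2099. The point is that a whole leap year
(YEAR + DAY) is subtracted only when 't' is at least that large, so 't'
cannot go negative on Dec 31.

```c
#define DAY  (24L * 60L * 60L)      /* seconds in a day */
#define YEAR (365L * DAY)           /* seconds in a non-leap year */

/* Hypothetical helper: splits t (seconds since 1970, t >= 0) into a year
 * and the seconds elapsed within that year, stored in *rest.
 * Assumes every fourth year is a leap year, true for 1970..2099 only. */
int year_of(long t, long *rest)
{
    int year = 1970;

    for (;;) {
        long len = (year % 4 == 0) ? YEAR + DAY : YEAR;
        if (t < len)
            break;              /* t lies inside this year: stop here */
        t -= len;
        year++;
    }
    *rest = t;
    return year;
}
```

With t = 94608000 (00:00 on Dec 31 1972) this gives year 1972 with 365
full days elapsed, rather than running on into 1973.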

fopen.c needs to include <errno.h>
----------------------------------
to get at errno and ENOENT.

commands
========

asld divides by zero
--------------------
ZERO	=	/0
causes this. (Believe it or not, "/" means "0x" for at least one Xenix
assembler.)

echo discards quotes and fails with lots of arguments
-----------------------------------------------------
The buffering method in echo fails on arguments falling across the buffer
boundary. Since the buffer size is 2048 and exec used to provide at most
1024 bytes, no one noticed. With the new exec size of 2048 it should show
up, but I haven't re-compiled enough of 1.3 to check.

fdisk incompatible with DOS 3.3 fdisk
-------------------------------------
Partitions built with fdisk cause DOS 3.3 fdisk to crash, because of an
improper value in the incompletely documented sysind field.

More advice needs to be given on setting up Minix partitions. I have done
it many times, but still have trouble. DOS 3.3 fdisk at least _formats_
the partitions it sets up, so you can't modify an existing partition
without losing its data. I found this out the hard way after losing the
Minix partition table entry when DOS fdisk thought it was linked to the
DOS entry (it was, because although I used Minix fdisk to set up the
Minix partition, I changed the sysind field from 0 to the (linking) value
generated by DOS fdisk to stop DOS fdisk from crashing). Another important
thing is to reboot after changing the partition table. If you think that
the new values are effective immediately, it's easy to mkfs the wrong
partition.

getc() assigned to character variable
-------------------------------------
The following commands assign the value returned by getc() or getchar() to
a character variable which may be tested against EOF later. This fails when
characters are unsigned, and gives a premature EOF when the file contains
(char) EOF anyway. A couple of these test whether the getc() value is <= 0,
but only == EOF and < 0 are correct. A global search produced:
 
	ast.c	readaline()
	df.c	getname()
	fdisk.c	get_a_char()
	mkfs.c	getline()
	prep.c	main(), skipline(), backslash()
	uniq.c	getline()
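
The usual cure is to keep the value in an int until it has been compared
with EOF. A minimal sketch (count_bytes() is a made-up name, not taken
from any of the commands above):

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining on fp. */
long count_bytes(FILE *fp)
{
    int c;          /* int, not char: must hold every byte value AND EOF */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

With 'char c' the loop never ends where characters are unsigned, and stops
early at the first 0xFF byte where they are signed.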

gets() buffer overflow possible
-------------------------------
Several commands, e.g. date and fdisk, use gets() where fgets() would be
better, because fgets() doesn't allow the buffer to overflow.
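
A replacement along these lines would do. It is only a sketch: read_line()
is a made-up name, and unlike gets() it has to strip the newline itself
because fgets() keeps it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf, storing at most
 * size-1 characters plus the terminating NUL, and strips the newline.
 * Returns 0 on success, -1 on EOF or error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';     /* drop the newline, if any */
    return 0;
}
```

A line longer than the buffer is not lost; the rest of it is returned by
the next call, so callers that care should check for that.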

head.s needs to define environ
-------------------------------
Init calls execn() which is from exec.c and something there references
environ. Normally environ is defined in crtso.s but head.s is linked instead
for mm/fs/init.

I think exec.c and the other library files which declare lots of functions
should be split up. This will make the source distribution larger (not much
if it is archived) and the binaries a little smaller.

Another problem with init and the environment is that the 1.2 version used
the 1.2 execn() which doesn't set up the environment. This didn't hurt me in
1.2 (why?) but in 1.3 the sh working on /etc/rc ran out of string space
trying to set up a junk environment.

mkfs allows file systems larger than device
-------------------------------------------
Mkfs.c attempts to test for file systems too large for the device by writing
to the block after the last, reading it back, and comparing. This doesn't
work because the i/o goes into the cache and always succeeds. (Eventually the
cache is flushed and an error is generated. The error handling for devices
with a size is confusing: the error is EOF which is ignored.) Thus errors from
the RAM disk are completely ignored. I fixed this by doing a sync() between
the write() and read().

mv messes up setuid bit and aliases
-----------------------------------
Doesn't preserve setuid bit when it uses cp.
mkdir a; cd a; > a; mv a ../a silently deletes file a.
mkdir a; cd a; > a; mv a .. silently deletes file a (directory a seems OK).

rm(?) deletes "." and ".."
--------------------------
Several times I somehow deleted the "." and ".." entries in a directory, using
a command like rm -r ../anotherdir. The file system remained valid (fsck
should really find this error), and the rest of the files in the directory
were recovered by going up a level and doing mv `ls baddir` gooddir.

sh quoting bugs
---------------
a=`echo b c` reduces to a=b c, so assigns a=b and executes c
Xenix sh assigns the entire output of a graved command (independent of quotes
in the output).
a="b c"; d=$a evaluates as d=b c and fails
a='"b c"'; d=$a evaluates as d=b 'c"'
a='"b c"'; d="$a" does the same
a=a; b=' '; c=c; d=$a$b$c evaluates as d=a c
a='"a"'; b='"b"', then
	c=$a$b works
	c="$a$b" eats the first quote off $a
I couldn't find a general way to make the shell concatenate 2 variables!!!!

tty
===

When 2 processes try to do tty i/o at once, the 2nd fails in bad ways. The
worst is with multiple writes. This can only happen after the 1st writer
is suspended. The 1st writer's state is overwritten, in particular the
process number on line 3917 of the book. Without the process number,
un-suspension is impossible. Killing the stuck process used to kill the
system but that seems to have stopped. To see this bug, start several
processes doing
   main() { while(1) printf("*"), sleep(2); }
and hit ^S^Q.

Fixing this properly would be best done by holding the state in fs. The 2nd
writer (or reader) should not get an error either. Fs can handle this by a
suspension which records the i/o request in the proposed state entry. A FIFO
queue can be used to select outstanding i/o requests when the device becomes
free. I made a related change a few months ago to allow fs to parcel out
humungous write requests in pieces acceptable to the driver. The driver
just returns a bytes_written count < bytes_to_write for automatic suspension.
Very few programs are written to handle incomplete writes like this, and
they shouldn't have to. Perhaps large writes to pipes can be broken up
similarly.
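
For comparison, this is the loop every program would need if short writes
were passed back to the caller instead of being handled in fs. It is a
sketch only: write_all() is a made-up name, and the EINTR retry assumes a
system that defines that error.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all n bytes of buf
 * have been accepted.  Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t done = write(fd, buf, n);

        if (done < 0 && errno == EINTR)
            continue;           /* interrupted before writing: retry */
        if (done <= 0)
            return -1;          /* real error, or no progress at all */
        buf += done;            /* skip what was accepted ... */
        n -= (size_t) done;     /* ... and write the remainder */
    }
    return 0;
}
```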

When a 2nd process tries to read from tty, the error code E_TRY_AGAIN is
returned (line 3793 of the book). I think this error number filters back
to the caller, though it is supposed to be for the kernel's internal use.
Anyway the read fails and few programs can handle the errno. A simple
example is
   set | more
Output from set is nonstandard. More grabs the tty for reading and sh input
fails so sh logs out.

zero taken as EOF
-----------------
This has been fixed in a few commands (e.g. wc), but it's still in "more",
which confuses 0 from the buffer with the value returned by input() for EOF.
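
The cure is the same as for getc(): the input routine needs an
end-of-input value that no byte can take. A sketch with invented names
(struct inbuf, next_char() and IN_EOF are not from more.c):

```c
#define IN_EOF (-1)     /* outside 0..255, so no data byte can match it */

/* Hypothetical buffer type for the sketch. */
struct inbuf {
    const unsigned char *data;  /* buffered input */
    int len;                    /* number of valid bytes in data */
    int pos;                    /* index of the next byte to return */
};

/* Returns the next byte as 0..255, or IN_EOF when the buffer is used up. */
int next_char(struct inbuf *b)
{
    if (b->pos >= b->len)
        return IN_EOF;
    return b->data[b->pos++];   /* a NUL byte comes back as plain 0 */
}
```

The caller then tests for IN_EOF rather than for 0.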

non-bugs
========

elle wishlist
-------------
Editing a linked file shouldn't destroy the link. (The .bak file stays linked.)
Shelling out should preserve the environment.
Backup files shouldn't be created automatically (hate cleaning them up).
Quitting shouldn't require confirmation.

mount wishlist
--------------
If the floppies are writable by everybody, it's too easy to write on a mounted
floppy. Mount should change the permissions to stop this, just as login
(should) change tty permissions to protect logins from being read by others.

writing to stdin, reading from stdout and stderr should be allowed (?)
----------------------------------------------------------------------
Some Unix programs actually need this, e.g. compress and patch. Xenix certainly
allows both reading and writing to file descriptors 0, 1 and 2, provided they
have not been redirected by the shell. I think the reason is to allow stderr
at least to serve as a channel for handling input in response to error output,
after stdin has been redirected.

This can be arranged by changing all the open modes for ttys in init.c to 2.
I have been running this for 3 months with no problems.

zombie externs and conflicting use of extern library names
----------------------------------------------------------
Some commands and library routines declare unused variables as extern, and
the variables are not even defined externally.

It's dangerous to redefine a function in the library. If the new definition
is incompatible, everyone is confused and library routines may call the
wrong function (or non-function - some linkers don't complain even if printf
is declared as an int). If it's supposed to be compatible, time is wasted
and there's more chance of bugs. (When ctime is fixed, we can rip out lots
of buggy date routines.)

My linker is fussy and found these.

df.c:	declares the unused undefined extern super_block via super.h 
ed.c:	defines its own incompatible version of setbuf()
ed.c:	defines its own compatible(?) version of system()
make.c:	defines its own compatible(?) version of getenv()
mknod.c: defines its own compatible(?) version of atoi()
readfs.c: declares numerous unused undefined externs via buf.h and super.h 
shar.c:	defines index as a global int
shar.c:	defines its own incompatible version of putchar()
sort.c:	defines a global function incr() which is global int in termcap.c.
	This is termcap's fault.
termcap.c: declares unused undefined externs ospeed, PC, BC, UP
touch.c: defines its own compatible(?) version of std_err()
uudecode: defines its own compatible(?) version of index()

Bruce Evans
Internet: brucee@runx.ips.oz.au    UUCP: uunet!runx.ips.oz.au!brucee