Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!gatech!bloom-beacon!husc6!mit-eddie!genrad!decvax!dartvax!steve
From: steve@dartvax.UUCP (Steve Campbell)
Newsgroups: comp.bugs.4bsd,comp.unix.wizards
Subject: Cant access disks on second UDA50
Message-ID: <6683@dartvax.UUCP>
Date: Wed, 15-Jul-87 14:56:32 EDT
Article-I.D.: dartvax.6683
Posted: Wed Jul 15 14:56:32 1987
Date-Received: Sat, 18-Jul-87 07:45:03 EDT
Reply-To: steve@dartvax.UUCP (Steve Campbell)
Distribution: world
Organization: Dartmouth College, Hanover, NH
Lines: 121
Keywords: unibus uda50
Xref: mnetor comp.bugs.4bsd:445 comp.unix.wizards:3268

We have a VAX 785 with all FCOs applied running 4.3BSD with all fixes applied.
Its unibus currently has a UDA50 with 4 RA81s on it.  We plan to add 2 more
disks, requiring another UDA50.

Although conventional wisdom says not to put more than 1 UDA50 per
unibus, we are trying to do just that.  We have added a second UDA50 to
the bus and a third-party device called a USI/HRS from a company named
Shitashi which claims to enhance the unibus bandwidth enough to permit
the second uda.  The other devices on the unibus are a DEUNA and 2 DZ11s.

For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
config file we went from this...

controller	uda0	at uba0 csr 0172150		vector udintr
disk		ra0	at uda0 drive 0
disk		ra1	at uda0 drive 1
disk		ra2	at uda0 drive 2
disk		ra3	at uda0 drive 3

...to this...

controller	uda0	at uba0 csr 0172150		vector udintr
disk		ra0	at uda0 drive 0
disk		ra1	at uda0 drive 1
controller	uda1	at uba0 csr 0172550		vector udintr
disk		ra2	at uda1 drive 2
disk		ra3	at uda1 drive 3

As far as we can tell, the hardware is working just fine.  All devices
interrupt at boot time, and all four disks are accessible AS RAW DEVICES.
We can fsck them all in parallel, mount them, and dd from the raw devices.
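
For reference, the parallel raw-device read described above could look
roughly like the sketch below.  read_raw_parallel is a hypothetical helper,
the device names in the example follow the usual 4.3BSD raw-device naming
but are assumptions, and the block size and count are arbitrary.

```shell
# Hypothetical helper: read the start of each named device (or file)
# concurrently with dd, and return non-zero if any read fails.
read_raw_parallel() {
    pids=""
    for dev in "$@"; do
        # bs and count are arbitrary illustration values.
        dd if="$dev" of=/dev/null bs=64k count=16 2>/dev/null &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        # Collect each background dd's exit status.
        wait "$pid" || status=1
    done
    return $status
}

# Example (assumed device names, not taken from the original setup):
# read_raw_parallel /dev/rra0c /dev/rra1c /dev/rra2c /dev/rra3c
```

A zero return only says that every dd completed; it says nothing about
access through the file system, which is where the trouble starts.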

But - you knew there was a "but" - there is a problem.  Even in single-user
mode, if we do a large number of accesses to files on any disk USING
PATHNAMES and then do a sync, the 2 disks on the second uda can no longer
be accessed, and any command that tries - along with its terminal - hangs
completely.  For example, do an ls -lR of a smallish file system on ra1
(1000 files) with output to /dev/null, then a sync, then an ls of anything
on ra2 or ra3: the terminal (console) hangs, and we have to reboot.  A
comparable find(1) will do the trick, too.
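
The sequence above can be sketched as a small shell function.  The name
repro_hang and the mount points in the example are assumptions made for
illustration; only the ls / sync / ls order comes from the description.

```shell
# Hypothetical helper reproducing the sequence described above.
# $1 = directory on a disk on the first controller (e.g. mounted from ra1)
# $2 = directory on a disk on the second controller (e.g. mounted from ra2)
repro_hang() {
    ls -lR "$1" > /dev/null || return 1   # many pathname lookups
    sync                                  # the step that appears to matter
    ls "$2" > /dev/null || return 1       # the step reported to hang
    echo "no hang"
}

# Example (mount points are assumptions): repro_hang /mnt1 /mnt2
```

On a healthy system this prints "no hang"; on the configuration described
here the final ls would be expected never to return.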

The sync is important; without it we can still access the disks, but
after it we're dead.  On the other hand, the sync alone, i.e. without the
preceding ls or find, causes no problem.  Forcing a core dump of the
hung system shows the hung command to be in what ps(1) calls "D" state,
sleeping on runout in the scheduler.  The kernel "u" structure appears
to be empty - as if there were no current process.  Needless to say,
the same operation causes no problem when all four disks are on uda0.

I would suspect hardware if (a) we hadn't swapped everything in sight,
including the 2 UDA50s, and removed the HRS, and (b) things didn't work
perfectly as long as we use the raw devices.

I would appreciate any comments or suggestions from the net.

						Steve Campbell
						steve@Dartmouth.EDU