Path: utzoo!attcan!uunet!zephyr.ens.tek.com!tektronix!psueea!psueea.uucp!kirkenda
From: kirkenda@psueea.uucp (Steve Kirkendall)
Newsgroups: comp.os.minix
Subject: Re: Disk performance under Minix
Summary: tests show read-ahead can speed up read()
Message-ID: <1599@psueea.UUCP>
Date: 15 Aug 89 07:26:43 GMT
References: <21290@louie.udel.EDU> <18613@princeton.Princeton.EDU>
Sender: news@psueea.UUCP
Reply-To: kirkenda@jove.cs.pdx.edu (Steve Kirkendall)
Organization: Dept. of Computer Science, Portland State University; Portland OR
Lines: 292

In article <18613@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes:
>In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>>The 1:2 interleave is optimum for DOS; I suspect that it is too
>>tight for Minix.  Has anyone tried fiddling with the interleave (or
>>the AT disk driver) to improve disk i/o?  Does Bruce's protected mode
>>fixes (with associated improved interrupt code) improve the disk i/o
>>situation by reducing the time the kernel spends working (thereby improving
>>the optimum interleave factor)?
>
>Interleave isn't a factor on my system.  The copy through the buffer
>cache struck me as another possible culprit, but shouldn't the system
>times be higher if so (not that I have confidence in time accounting)?
>
>So where is the slop?  I've looked at at_wini() - it's not a terribly
>complex piece of code.  What could it be doing that would make fixed
>disk performance so lousy?  Ideas, anyone?
>
>Norbert

I ran some speed tests on my ST, equipped with a Supra hard disk.  My tests all
concerned reading, not writing.  The results of the test are presented below,
followed by the program I used to perform the tests.

As you read the following test results, keep in mind that my disk hardware
can transfer data at 189 kb/sec, and that other brands of disks can achieve
up to 1000 kb/sec on the ST.

The test performs N calls to read(), giving a buffer size of (2meg / N).
The speed of the read() call is then computed by dividing the 2meg by the
elapsed realtime of the test.

The test is repeated with several buffer sizes.  If the speed remains fairly
constant regardless of the buffer size, then speed is limited primarily by
hardware or the caching strategy (i.e. either the device is slow, or the
driver is slow, or cache is too complex or has a low hit/miss ratio).  If
the speed increases as the buffer size increases, then the speed is limited
largely by the overhead involved in the system call.
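
The arithmetic behind the Speed column can be sketched as follows.  This is
only an illustration: the function names are made up here, and the rounding
(adding 512 before dividing by 1024) is the same as in the test program at
the end of this article.

```c
/* Sketch of the speed calculation; the names are illustrative only. */
#define TOTAL_BYTES 2097152L	/* the 2meg that every test reads */

/* Number of read() calls needed to cover 2meg with a given buffer size. */
long calls_needed(long bufsize)
{
	return TOTAL_BYTES / bufsize;
}

/* Speed in kb/sec, rounded to the nearest kb, for an elapsed time in
 * whole seconds.  Returns -1 if the run was too short to measure.
 */
long kb_per_sec(long elapsed)
{
	if (elapsed <= 0)
	{
		return -1;
	}
	return (512 + TOTAL_BYTES / elapsed) / 1024;
}
```

For example, a 512-byte buffer means 4096 calls, and an elapsed time of
101 seconds works out to 20 kb/sec.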

-----------------------------------------------------------------------------
TEST 1: reading from a data file

Blk Size  Test Time    Speed
  512        101         20 kb/sec
 1024         91         23 kb/sec
 2048         67         31 kb/sec
 8192         57         36 kb/sec
16384         56         37 kb/sec

Speed increases as the buffer size increases, so the system call overhead
contributes to the low speed.  However, when the block size was increased by
a factor of 32, the speed was increased by a factor of only 1.85 -- so the
system call overhead can only accept a relatively small part of the blame.

The results of this test are directly comparable to results published in
UNIX Review Magazine...
						Compaq 386/20
Blk Size	Sun 3/50	Sun 3/260	  ISC UNIX
  512		  232		   485		    124
 1024		  219		   672		    143
 2048		  232		   642		    142
 8192		  232		   620		    146
-----------------------------------------------------------------------------
TEST 2: reading from /dev/null

This test eliminates the hard disk hardware & driver from the test.  The cache
is also of no consequence, since /dev/null is a character device.  Since it
is a device rather than a regular file, FS doesn't have much to do. Also,
every read() reads 0 bytes, so no memory-to-memory copies are needed.

We are left with the system call overhead and the ramdisk driver.

Blk Size  Test Time    Speed
  512         32         64 kb/sec
 1024         16        128 kb/sec
 2048          8        256 kb/sec
 8192          2       1024 kb/sec
16384          1       2048 kb/sec

This tells us that the kernel can handle about 128 read() calls per second.
Think of these speeds as a theoretical maximum.
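
The 128 calls/sec figure falls out of any row of the table above.  A small
sketch of that calculation (function names are illustrative, not part of the
test program):

```c
/* Sketch: system call rate implied by one row of the /dev/null table. */
#define TOTAL_BYTES 2097152L	/* nominal amount "read" by each test */

/* read() calls completed per second for a given buffer size and time. */
double calls_per_sec(long bufsize, long elapsed)
{
	return (double)(TOTAL_BYTES / bufsize) / (double)elapsed;
}

/* Average cost of one read() call, in milliseconds. */
double ms_per_call(long bufsize, long elapsed)
{
	return 1000.0 * (double)elapsed / (double)(TOTAL_BYTES / bufsize);
}
```

With 512-byte buffers that is 4096 calls in 32 seconds, i.e. 128 calls/sec
or roughly 7.8 ms per call; the other rows give the same rate.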
-----------------------------------------------------------------------------
TEST 3: /dev/bnull

/dev/bnull is simply a block-device version of /dev/null.  By comparing this
test to the /dev/null test, we can get an idea of the overhead required to
maintain the cache.  (Keep in mind, though, that no data blocks are actually
being cached here, since /dev/bnull is 0 blocks long.)

Blk Size  Test Time    Speed
  512         56         37 kb/sec
 1024         29         71 kb/sec
 2048         14        146 kb/sec
 8192          3        683 kb/sec
16384          2       1024 kb/sec

So, with caching, the speed is about 60% of what we got without caching.
-----------------------------------------------------------------------------
TEST 4: /dev/rhd2

This test is similar to the test on /dev/null, except that real hardware is
involved, and data bytes are actually being moved around.

Blk Size  Test Time    Speed
  512         99         21 kb/sec
 1024         53         39 kb/sec
 2048         32         64 kb/sec
 8192         14        146 kb/sec
16384         11        186 kb/sec

This test is really amazing, since my hard disk is only capable of 189 kb/sec.
With a large buffer, hardware is the limiting factor.  With small buffers,
overhead in the kernel, FS, or device driver becomes a severe problem.
-----------------------------------------------------------------------------
TEST 5: /dev/hd2

This test is similar to the test on /dev/rhd2, except that the cache is used
because /dev/hd2 is a block device.

Blk Size  Test Time    Speed
  512         98         21 kb/sec
 1024         87         24 kb/sec
 2048         67         31 kb/sec
 8192         58         35 kb/sec
16384         56         37 kb/sec

Suddenly, the software overhead is killing us.  I suspect that FS divides
each request into 1K chunks, and then reads each chunk separately.  So, we
wind up with a speed that is slightly lower than the 1K uncached speed,
no matter how large our cached read is.
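
To illustrate that suspicion, here is a minimal sketch that counts how many
single-block requests would reach the driver during a sequential read.  It
assumes (this is a guess at FS's behaviour, not taken from the FS source)
that every read() is split into 1K chunks and that a block already fetched
is found in the cache.  The function name is hypothetical.

```c
/* Sketch: count 1K driver requests for a sequential read of `total` bytes
 * issued as read() calls of `bufsize` bytes each.  ASSUMPTION: FS asks the
 * driver for one 1K block at a time, and the most recently fetched block
 * is still in the cache when the next read() touches it.
 */
#define BLOCK_SIZE 1024L

long block_requests(long total, long bufsize)
{
	long	offset;		/* file offset of the current read() */
	long	end;		/* offset just past the current read() */
	long	blk;		/* block number being examined */
	long	last = -1;	/* most recently fetched block */
	long	requests = 0;	/* requests sent to the driver so far */

	for (offset = 0; offset < total; offset += bufsize)
	{
		end = offset + bufsize;
		if (end > total)
		{
			end = total;
		}
		for (blk = offset / BLOCK_SIZE; blk * BLOCK_SIZE < end; blk++)
		{
			if (blk != last)
			{
				requests++;
				last = blk;
			}
		}
	}
	return requests;
}
```

Under those assumptions the driver sees 2048 requests for the 2meg test no
matter which buffer size is used, which would explain why the cached speed
levels off near the 1K raw-device speed.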

It is interesting to compare the results of this test with the results of
the test on /dev/bnull, in which no data was actually cached or copied.

Also, note that these speeds are almost identical to the speeds for a regular
file in a filesystem on the hard disk.  So, there seems to be little overhead
involved in translating a file offset into a block number within a filesystem.
-----------------------------------------------------------------------------
SUMMARY

Basically, the slow speed seems to be a product of the way FS handles blocks.
We could speed up I/O tremendously if we could modify FS so that it lets the
driver read more than one block at a time.

Or, if that is too ambitious, then we could modify the device driver so that
it performs read-ahead.  A simple way to do this would be to always read 4k
when FS requests 1K; the extra 3k would be used to satisfy later requests,
if appropriate.  This would probably double the speed of reading.
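
A minimal sketch of the read-ahead idea follows.  It is a simulation, not
driver code: disk_read() is a made-up stand-in for the real hardware
transfer, all names are hypothetical, and writes and end-of-device handling
are ignored (a real driver would at least have to invalidate the buffer
when a block is written).

```c
#include <string.h>

#define BLOCK_SIZE	1024
#define AHEAD_BLOCKS	4	/* read 4k whenever FS asks for 1K */

static char	ra_buf[AHEAD_BLOCKS * BLOCK_SIZE];
static long	ra_first = -1;	/* first block held in ra_buf, -1 if none */
long		physical_reads;	/* number of simulated disk transfers */

/* Hypothetical stand-in for the hardware: fills each block with a byte
 * derived from its block number, so the data can be checked afterwards.
 */
static void disk_read(long first, int count, char *dst)
{
	int	i;

	for (i = 0; i < count; i++)
	{
		memset(dst + (long)i * BLOCK_SIZE, (int)((first + i) & 0x7f),
			BLOCK_SIZE);
	}
	physical_reads++;
}

/* Satisfy a 1K request from FS, going to the disk only on a miss. */
void read_block(long blk, char *dst)
{
	if (ra_first < 0 || blk < ra_first || blk >= ra_first + AHEAD_BLOCKS)
	{
		disk_read(blk, AHEAD_BLOCKS, ra_buf);
		ra_first = blk;
	}
	memcpy(dst, ra_buf + (blk - ra_first) * BLOCK_SIZE, BLOCK_SIZE);
}

/* Read blocks 0..nblocks-1 in order; return the number of physical
 * reads that were needed, or -1 if wrong data came back.
 */
long sequential_scan(long nblocks)
{
	char	blk[BLOCK_SIZE];
	long	i;

	ra_first = -1;
	physical_reads = 0;
	for (i = 0; i < nblocks; i++)
	{
		read_block(i, blk);
		if (blk[0] != (char)(i & 0x7f))
		{
			return -1;
		}
	}
	return physical_reads;
}
```

In this simulation a sequential pass over the 2048 blocks of the test file
needs only 512 physical reads instead of 2048.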

One word of caution: sequential reading of a large file is exactly the sort
of test that makes a cache look bad.  This test was biased against caches.
-----------------------------------------------------------------------------
*** NEWSFLASH ***

I just added read-ahead to the device driver, and reran the test on /dev/hd2,
with the following results:

Blk Size  Test Time    Speed
  512         68         30 kb/sec	(was 21 kb/sec)
 1024         55         37 kb/sec	(was 24 kb/sec)
 2048         43         48 kb/sec	(was 31 kb/sec)
 8192         43         48 kb/sec	(was 35 kb/sec)
16384         43         48 kb/sec	(was 37 kb/sec)

So we get a 30%-50% improvement with read-ahead in the driver.  That's nice,
but I expected more.  The modified driver always does physical reads of 4k
or more, so I expected a speed just slightly less than what a 4096 byte block
would get you on the raw disk -- about 90 kb/sec.
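
One rough way to estimate the 4096-byte raw speed is to fit a straight line
(time = per-call cost * number of calls + fixed cost) through two rows of
the /dev/rhd2 table.  This linear model is an assumption made for
illustration only, and the function name is hypothetical.

```c
/* Sketch: two-point linear fit of elapsed time against number of calls,
 * then the predicted speed (kb/sec) for a 2meg read done in n calls.
 * (n1, t1) and (n2, t2) are measured call counts and times in seconds.
 */
double predict_kb_per_sec(long n1, double t1, long n2, double t2, long n)
{
	double	per_call;	/* seconds of overhead per read() call */
	double	fixed;		/* seconds that do not depend on call count */
	double	t;		/* predicted elapsed time for n calls */

	per_call = (t1 - t2) / (double)(n1 - n2);
	fixed = t1 - per_call * (double)n1;
	t = per_call * (double)n + fixed;
	return 2048.0 / t;	/* 2meg is 2048 kb */
}
```

Feeding in the 512-byte row (4096 calls, 99 sec) and the 16384-byte row
(128 calls, 11 sec) and asking about 512 calls of 4096 bytes gives an
estimate of a little over 100 kb/sec.  That is somewhat above the figure
quoted above, but either way it is about double the 48 kb/sec that was
actually measured.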

We could probably do better with read-ahead implemented in the cache, since
that way we could reduce the number of messages passed to/from the device
driver, and also eliminate the chore of copying from the driver's buffer
to FS's buffer.

     +------------------------------------------------------------------+
     |                                                                  |
     | Hey, by golly, I sure am learning a lot about operating systems! |
     |                                                                  |
     +------------------------------------------------------------------+


Here is the program I used to perform the tests.  When run with no arguments,
it creates a 2meg file to use for the testing.  If you give an argument, then
it reads from the named file without writing to it.
----- cut here --------- cut here ---------- cut here ---------- cut here -----
/* seqread.c */

/* This program tests the speed at which sequential files are read.
 * There must be enough disk space for a 2 megabyte temp file.
 */

#include <stdio.h>
#include <fcntl.h>

#define TESTFILE	"twomegs"
#define FILESIZE	2097152L
char *testfile = TESTFILE;
char buf[16384];

main(argc, argv)
	int	argc;
	char	**argv;
{
	if (argc > 1)
	{
		testfile = argv[1];
	}
	else
	{
		/* create the test file */
		writefile();
	}

	/* test for various block sizes */
	printf("Blk Size  Test Time    Speed\n");
	readfile(512);
	readfile(1024);
	readfile(2048);
	readfile(8192);
	readfile(16384);

	if (argc <= 1)
	{
		/* delete the test file, but only if we created it */
		unlink(TESTFILE);
	}
}


writefile()
{
	long	offset;
	int	fd;

	/* create the file */
	fd = creat(TESTFILE, 0666);
	if (fd < 0)
	{
		perror(TESTFILE);
		exit(2);
	}

	/* put two megabytes of data in it */
	for (offset = 0L; offset < FILESIZE; offset += 16384)
	{
		if (write(fd, buf, 16384) < 16384)
		{
			perror("while writing");
			unlink(TESTFILE);
			exit(3);
		}
	}

	/* close the file */
	close(fd);
}


readfile(size)
	int	size;	/* size of buffer to use */
{
	long	before;	/* time at start of test */
	long	after;	/* time at end of test */
	int	blks;	/* number of buffers-full of data to read */
	int	fd;	/* used while reading the file */

	/* open the test file */
	fd = open(testfile, O_RDONLY);
	if (fd < 0)
	{
		perror("while reopening");
		exit(4);
	}

	/* read the file */
	for (blks = FILESIZE / size, time(&before); blks > 0; blks--)
	{
		read(fd, buf, size);
	}
	time(&after);

	/* close the file */
	close(fd);

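	/* present statistics (treat a run shorter than 1 sec as 1 sec,
	 * so that we never divide by zero)
	 */
	if (after <= before)
	{
		after = before + 1;
	}
	printf("%5d    %7ld    %7ld kb/sec\n",
		size,
		after - before,
		(512 + FILESIZE / (after - before)) / 1024);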
}
----- cut here --------- cut here ---------- cut here ---------- cut here -----
	-- Steve Kirkendall
	      ...uunet!tektronix!psueea!jove!kirkenda
	or    kirkenda@cs.pdx.edu