Path: utzoo!attcan!uunet!zephyr.ens.tek.com!tektronix!psueea!psueea.uucp!kirkenda
From: kirkenda@psueea.uucp (Steve Kirkendall)
Newsgroups: comp.os.minix
Subject: Re: Disk performance under Minix
Summary: tests show read-ahead can speed up read()
Message-ID: <1599@psueea.UUCP>
Date: 15 Aug 89 07:26:43 GMT
References: <21290@louie.udel.EDU> <18613@princeton.Princeton.EDU>
Sender: news@psueea.UUCP
Reply-To: kirkenda@jove.cs.pdx.edu (Steve Kirkendall)
Organization: Dept. of Computer Science, Portland State University; Portland OR
Lines: 292

In article <18613@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes:
>In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>>The 1:2 interleave is optimum for DOS; I suspect that it is too
>>tight for Minix.  Has anyone tried fiddling with the interleave (or
>>the AT disk driver) to improve disk i/o?  Does Bruce's protected mode
>>fixes (with associated improved interrupt code) improve the disk i/o
>>situation by reducing the time the kernel spends working (thereby improving
>>the optimum interleave factor)?
>
>Interleave isn't a factor on my system.  The copy through the buffer
>cache struck me as another possible culprit, but shouldn't the system
>times be higher if so (not that I have confidence in time accounting)?
>
>So where is the slop?  I've looked at at_wini() - it's not a terribly
>complex piece of code.  What could it be doing that would make fixed
>disk performance so lousy?  Ideas, anyone?
>
>Norbert

I ran some speed tests on my ST, equipped with a Supra hard disk.  My tests
all concerned reading, not writing.  The results of the tests are presented
below, followed by the program I used to perform them.

As you read the following test results, keep in mind that my disk hardware
can transfer data at 189 kb/sec, and that other brands of disks can achieve
up to 1000 kb/sec on the ST.

Each test performs N calls to read(), giving a buffer size of (2meg / N).
The speed of the read() call is then computed by dividing the 2meg by the
elapsed real time of the test.  The test is repeated with several buffer
sizes.

If the speed remains fairly constant regardless of the buffer size, then
speed is limited primarily by hardware or by the caching strategy (i.e.
either the device is slow, or the driver is slow, or the cache is too
complex or has a low hit/miss ratio).  If the speed increases as the buffer
size increases, then the speed is limited largely by the overhead involved
in the system call.

-----------------------------------------------------------------------------
TEST 1: reading from a data file

	Blk Size   Test Time   Speed
	     512      101       20 kb/sec
	    1024       91       23 kb/sec
	    2048       67       31 kb/sec
	    8192       57       36 kb/sec
	   16384       56       37 kb/sec

Speed increases as the buffer size increases, so the system call overhead
contributes to the low speed.  However, when the block size was increased
by a factor of 32, the speed increased by a factor of only 1.85 -- so the
system call overhead can take only a relatively small part of the blame.

The results of this test are directly comparable to results published in
UNIX Review Magazine...

	                                       Compaq 386/20
	Blk Size   Sun 3/50    Sun 3/260         ISC UNIX
	     512      232         485               124
	    1024      219         672               143
	    2048      232         642               142
	    8192      232         620               146

-----------------------------------------------------------------------------
TEST 2: reading from /dev/null

This test eliminates the hard disk hardware & driver from the test.  The
cache is also of no consequence, since /dev/null is a character device.
Since it is a device rather than a regular file, FS doesn't have much to do.
Also, every read() returns 0 bytes, so no memory-to-memory copies are needed.
We are left with the system call overhead and the ramdisk driver.

	Blk Size   Test Time   Speed
	     512       32        64 kb/sec
	    1024       16       128 kb/sec
	    2048        8       256 kb/sec
	    8192        2      1024 kb/sec
	   16384        1      2048 kb/sec

This tells us that the kernel can handle about 128 read() calls per second.
Think of these speeds as a theoretical maximum.

-----------------------------------------------------------------------------
TEST 3: /dev/bnull

/dev/bnull is simply a block-device version of /dev/null.  By comparing this
test to the /dev/null test, we can get an idea of the overhead required to
maintain the cache.  (Keep in mind, though, that no data blocks are actually
being cached here, since /dev/bnull is 0 blocks long.)

	Blk Size   Test Time   Speed
	     512       56        37 kb/sec
	    1024       29        71 kb/sec
	    2048       14       146 kb/sec
	    8192        3       683 kb/sec
	   16384        2      1024 kb/sec

So, with caching, the speed is about 60% of what we got without caching.

-----------------------------------------------------------------------------
TEST 4: /dev/rhd2

This test is similar to the test on /dev/null, except that real hardware is
involved, and data bytes are actually being moved around.

	Blk Size   Test Time   Speed
	     512       99        21 kb/sec
	    1024       53        39 kb/sec
	    2048       32        64 kb/sec
	    8192       14       146 kb/sec
	   16384       11       186 kb/sec

This test is really amazing, since my hard disk is only capable of
189 kb/sec.  With a large buffer, hardware is the limiting factor.  With
small buffers, overhead in the kernel, FS, or device driver becomes a
severe problem.

-----------------------------------------------------------------------------
TEST 5: /dev/hd2

This test is similar to the test on /dev/rhd2, except that the cache is used
because /dev/hd2 is a block device.

	Blk Size   Test Time   Speed
	     512       98        21 kb/sec
	    1024       87        24 kb/sec
	    2048       67        31 kb/sec
	    8192       58        35 kb/sec
	   16384       56        37 kb/sec

Suddenly, the software overhead is killing us.  I suspect that FS divides
each request into 1K chunks, and then reads each chunk separately.  So, we
wind up with a speed that is slightly lower than the 1K uncached speed, no
matter how large our cached read is.

It is interesting to compare the results of this test with the results of
the test on /dev/bnull, in which no data was actually cached or copied.
Also, note that these speeds are almost identical to the speeds for a
regular file in a filesystem on the hard disk.  So, there seems to be little
overhead involved in translating a file offset into a block number within a
filesystem.

-----------------------------------------------------------------------------
SUMMARY

Basically, the slow speed seems to be a product of the way FS handles
blocks.  We could speed up I/O tremendously if we could modify FS so that it
lets the driver read more than one block at a time.  Or, if that is too
ambitious, then we could modify the device driver so that it performs
read-ahead.  A simple way to do this would be to always read 4k when FS
requests 1K; the extra 3k would be used to satisfy later requests, if
appropriate (a rough sketch of this idea appears below).  This would
probably double the speed of reading.

One word of caution: sequential reading of a large file is exactly the sort
of test that makes a cache look bad.  This test was biased against caches.
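To make the read-ahead suggestion concrete, here is a rough sketch of how a
driver-level read-ahead buffer could work.  Everything in it (read_block(),
phys_read(), AHEAD_BLOCKS, and so on) is made up for the sake of
illustration -- it is NOT the actual Minix driver code -- and a real driver
would also have to discard the buffer whenever a write touches one of the
buffered blocks.

	/* Hypothetical sketch of read-ahead inside a block device driver.
	 * phys_read(block, buf, nblocks) is assumed to transfer up to
	 * nblocks consecutive blocks from the disk and return the number
	 * of blocks actually read, or -1 on error.
	 */
	#define BLOCK_SIZE	1024	/* standard Minix block size */
	#define AHEAD_BLOCKS	4	/* read 4K for every 1K request */

	extern int	phys_read();	/* hypothetical low-level transfer */

	static char	ahead_buf[AHEAD_BLOCKS * BLOCK_SIZE];
	static long	ahead_first = -1;  /* first block held in ahead_buf */
	static int	ahead_count = 0;   /* number of valid blocks held */

	/* return one block to FS, reading several blocks ahead on a miss */
	int read_block(block, dest)
		long	block;	/* block number requested by FS */
		char	*dest;	/* where FS wants the data copied */
	{
		int	got, i;
		char	*src;

		if (block < ahead_first || block >= ahead_first + ahead_count)
		{
			/* miss: fetch this block plus the next few in one go */
			got = phys_read(block, ahead_buf, AHEAD_BLOCKS);
			if (got <= 0)
			{
				return got;	/* pass the error back to FS */
			}
			ahead_first = block;
			ahead_count = got;
		}

		/* hit: copy the requested block out of the read-ahead buffer */
		src = &ahead_buf[(int)(block - ahead_first) * BLOCK_SIZE];
		for (i = 0; i < BLOCK_SIZE; i++)
		{
			dest[i] = src[i];
		}
		return BLOCK_SIZE;
	}

Even with something like this in place, every 1K request from FS still costs
a message to the driver and a copy out of the driver's buffer; that overhead
is what the NEWSFLASH below is getting at.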
-----------------------------------------------------------------------------
                             *** NEWSFLASH ***

I just added read-ahead to the device driver, and reran the test on
/dev/hd2, with the following results:

	Blk Size   Test Time   Speed
	     512       68        30 kb/sec   (was 21 kb/sec)
	    1024       55        37 kb/sec   (was 24 kb/sec)
	    2048       43        48 kb/sec   (was 31 kb/sec)
	    8192       43        48 kb/sec   (was 35 kb/sec)
	   16384       43        48 kb/sec   (was 37 kb/sec)

So we get a 30%-50% improvement with read-ahead in the driver.  That's nice,
but I expected more.  The modified driver always does physical reads of 4k
or more, so I expected a speed just slightly less than what a 4096-byte
block would get you on the raw disk -- about 90 kb/sec.

We could probably do better with read-ahead implemented in the cache, since
that way we could reduce the number of messages passed to/from the device
driver, and also eliminate the chore of copying from the driver's buffer to
FS's buffer.

	+------------------------------------------------------------------+
	|                                                                  |
	| Hey, by golly, I sure am learning a lot about operating systems! |
	|                                                                  |
	+------------------------------------------------------------------+

Here is the program I used to perform the tests.  When run with no
arguments, it creates a 2meg file to use for the testing.  If you give an
argument, then it reads from the named file without writing to it.

----- cut here --------- cut here ---------- cut here ---------- cut here -----
/* seqread.c */

/* This program tests the speed at which sequential files are read.
 * There must be enough disk space for a 2 megabyte temp file.
 */
#include <stdio.h>
#include <fcntl.h>

#define TESTFILE "twomegs"
#define FILESIZE 2097152L

char	*testfile = TESTFILE;
char	buf[16384];

main(argc, argv)
	int	argc;
	char	**argv;
{
	if (argc > 1)
	{
		testfile = argv[1];
	}
	else
	{
		/* create the test file */
		writefile();
	}

	/* test for various block sizes */
	printf("Blk Size Test Time Speed\n");
	readfile(512);
	readfile(1024);
	readfile(2048);
	readfile(8192);
	readfile(16384);

	if (argc <= 1)
	{
		/* delete the test file -- but only if we created it */
		unlink(TESTFILE);
	}
}

writefile()
{
	long	offset;
	int	fd;

	/* create the file */
	fd = creat(TESTFILE, 0666);
	if (fd < 0)
	{
		perror(TESTFILE);
		exit(2);
	}

	/* put two megabytes of data in it */
	for (offset = 0L; offset < FILESIZE; offset += 16384)
	{
		if (write(fd, buf, 16384) < 16384)
		{
			perror("while writing");
			unlink(TESTFILE);
			exit(3);
		}
	}

	/* close the file */
	close(fd);
}

readfile(size)
	int	size;		/* size of buffer to use */
{
	long	before;		/* time at start of test */
	long	after;		/* time at end of test */
	int	blks;		/* number of buffers-full of data to read */
	int	fd;		/* used while reading the file */

	/* open the test file */
	fd = open(testfile, O_RDONLY);
	if (fd < 0)
	{
		perror("while reopening");
		exit(4);
	}

	/* read the file */
	for (blks = FILESIZE / size, time(&before); blks > 0; blks--)
	{
		read(fd, buf, size);
	}
	time(&after);

	/* close the file */
	close(fd);

	/* present statistics */
	printf("%5d %7ld %7ld kb/sec\n", size, after - before,
		(512 + FILESIZE / (after - before)) / 1024);
}
----- cut here --------- cut here ---------- cut here ---------- cut here -----
-- 
Steve Kirkendall                      ...uunet!tektronix!psueea!jove!kirkenda
                                            or kirkenda@cs.pdx.edu