Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!decvax!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.database
Subject: Re: UNIX + database
Message-ID: <2693@sun.uucp>
Date: Fri, 23-Aug-85 18:36:27 EDT
Article-I.D.: sun.2693
Posted: Fri Aug 23 18:36:27 1985
Date-Received: Sun, 25-Aug-85 13:57:33 EDT
References: <164@3comvax.UUCP>
Distribution: net
Organization: Sun Microsystems, Inc.
Lines: 89

> 	I'm curious about the 'suitability' of UNIX with respect to
> database implementations.  I've heard criticisms about the file system
> because to find a given sector in a file may require looking at up to
> 3 other sectors associated with the file inode.
> 	Some people say that 'extent-based' file systems are inherently
> faster than unix's.

If the extents are big enough, this may be true.  The UNIX file map for the
V7 file system (used by all standard UNIX releases except 4.2/4.3BSD) has,
in the inode (which is always in core), 10 direct block pointers, one
singly-indirect block pointer, one doubly-indirect block pointer, and one
triply-indirect block pointer.  The 10 direct block pointers point to the
first 10 blocks of the file (usually 512 or 1024 bytes), the singly-indirect
block pointer points to a block which points to the next 128 512-byte blocks
or the next 256 1024-byte blocks, the doubly-indirect block pointer points
to a block which points to the appropriate number of singly-indirect blocks, etc.
At worst, you have to fetch the triply-indirect block, the appropriate
doubly-indirect block it points to, and the appropriate singly-indirect
block it points to, in order to fetch a block of the file.  However, there
is a fair chance that the triply-indirect block is in core if you've just
referenced another block using it; the same applies to the other blocks.
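The arithmetic works out like this (a sketch of the block-map calculation, not the actual V7 bmap() code):

```c
/*
 * Sketch of the V7-style file-map arithmetic: 10 direct pointers,
 * then one singly-, one doubly-, and one triply-indirect pointer.
 * NINDIR is 128 for 512-byte blocks with 4-byte disk addresses.
 */
#define NDIR    10
#define NINDIR  128L

/* How many indirect blocks must be read (worst case, nothing in
 * core) to locate logical block "bn" of a file. */
int indirect_reads(long bn)
{
    if (bn < NDIR)
        return 0;                       /* direct pointer in the inode */
    bn -= NDIR;
    if (bn < NINDIR)
        return 1;                       /* singly-indirect */
    bn -= NINDIR;
    if (bn < NINDIR * NINDIR)
        return 2;                       /* doubly-indirect */
    bn -= NINDIR * NINDIR;
    if (bn < NINDIR * NINDIR * NINDIR)
        return 3;                       /* triply-indirect */
    return -1;                          /* beyond the map */
}
```

So the first triply-indirect fetch doesn't happen until block 10 + 128 + 128*128 = 16522, i.e. past the 8 MB mark with 512-byte blocks.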

The 4.2BSD file system has 12 direct pointers, and the blocks are usually
4096 or 8192 bytes; it has triply-indirect blocks but they've never been
tested, as files can get a lot bigger before they require them.
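You can see why from the arithmetic (assuming 4-byte disk addresses; the 12 direct pointers and 4096-byte blocks are as above): direct plus single plus double indirection already covers more than a 32-bit signed file offset can address, so the triply-indirect code never gets exercised.

```c
/* Sketch: largest 4.2BSD file reachable without the triply-indirect
 * block, assuming 4096-byte blocks and 4-byte disk addresses (so an
 * indirect block holds 1024 pointers). */
#define BSIZE   4096LL
#define NDADDR  12LL                    /* direct pointers in the inode */
#define NPTR    (BSIZE / 4)             /* pointers per indirect block */

long long max_without_triple(void)
{
    return (NDADDR + NPTR + NPTR * NPTR) * BSIZE;
}
```

That comes to about 4.3 gigabytes, double the 2^31 - 1 bytes a 32-bit signed offset can reach.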

The file map for an extent-based file system typically consists of entries
which say "the next N blocks of the file are located in the N blocks on the
disk starting at block M".  Thus, a map entry can map more blocks than a
UNIX map entry, which always maps in units of the file system's block size.
However, you can't directly compute the location of the map entry for a
given block of a file in a scheme like this; you have to search the map for
the entry.  This could, in principle, require more blocks to be read than
the UNIX scheme.  I don't know that it does so in practice.
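For concreteness, the lookup in an extent map is a search rather than an index computation; a sketch (illustrative field names, not any particular system's on-disk format):

```c
/* Sketch of an extent-style file map: each entry says "len blocks of
 * the file, starting at logical block lbn, live on disk starting at
 * physical block pbn". */
struct extent {
    long lbn;   /* first logical (file-relative) block covered */
    long len;   /* number of contiguous blocks in the run */
    long pbn;   /* physical block where the run starts */
};

/* Translate logical block bn to a physical block by binary-searching
 * the extent table (sorted by lbn); -1 if bn is unmapped. */
long ext_bmap(const struct extent *map, int n, long bn)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (bn < map[mid].lbn)
            hi = mid - 1;
        else if (bn >= map[mid].lbn + map[mid].len)
            lo = mid + 1;
        else
            return map[mid].pbn + (bn - map[mid].lbn);
    }
    return -1;
}
```

Note there's no way to jump straight to the right entry from bn, the way the UNIX scheme divides by the block size; if the map doesn't fit in core you may have to page pieces of it in during the search.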

I don't think there's a simple answer to the question "is an extent-style
file map faster than a UNIX-style file map?"  It depends on the pattern of
accesses to blocks of the file.  The worst case would be purely random
access; however, both the UNIX scheme and the extent map scheme perform much
better if the locus of file access doesn't move too fast (the UNIX scheme is
more likely to find the desired indirect block in the buffer cache, and the
extent map scheme is more likely not to have to "turn the window" and pull
in another map block).  I don't know the locality of "typical" block
accesses in a "typical" database system.

Another issue is "how much is a large file scattered across the disk?"  The
V7 file system uses a linked list to record unused blocks; when writing a
file, this makes it harder to allocate block N+1 of a file on a cylinder
near block N.  The 4.2BSD file system (and some other file systems used on
UNIX-based systems), and most (if not all) extent-based file systems, use a
bit map and can more easily allocate a block near its neighbor in the file.
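The bit map makes the "allocate near your neighbor" policy cheap: you can scan outward from the previous block's position. A toy sketch (one bit per block, 1 = free; not the 4.2BSD allocator, which also knows about cylinder groups):

```c
/* Sketch: find the free block closest to block "from" in a free-block
 * bit map.  NBLK blocks total, one bit per block, 1 = free. */
#define NBLK 64

static int bit(const unsigned char *map, int b)
{
    return (map[b / 8] >> (b % 8)) & 1;
}

/* Return the free block nearest "from", or -1 if none is free. */
int alloc_near(const unsigned char *map, int from)
{
    for (int d = 0; d < NBLK; d++) {
        if (from + d < NBLK && bit(map, from + d))
            return from + d;
        if (from - d >= 0 && bit(map, from - d))
            return from - d;
    }
    return -1;
}
```

With the V7 free list you just take whatever block is at the head of the list, which after some file-system aging is effectively a random block on the disk.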

Does anybody have any numbers comparing the behavior of a database system on
the V7 file system, the 4.2BSD file system, and some extent-based file
systems?

> 	I've also heard that databases usually rely on locking primitives
> which aren't present in UNIX.  (So how do UNIX databases perform locking?).

Some of them (such as INGRES) require you to add those primitives to UNIX
(INGRES has a device driver which you install into your system).

> 	I've heard that SystemV UNIX (release 2?) has now defined some locking
> primitives.  Is this true?  Doesn't this imply some important changes to the
> kernel?

S5R2 Version 2 (for the VAX; lord knows what version for other systems) has
a set of locking primitives.  They are influenced by a set of locking
primitives originally developed by John Bass, which permit you to lock a
region of a file defined by a file offset and a byte count.  Most of the
UNIX file/record locking primitives descend from these primitives.  The
/usr/group UNIX-based OS standard includes one such set; the S5R2V2 system
implements these on top of their own similar primitives as a compatibility
layer.
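I can't quote the S5R2V2 calls from memory, but the descendant of these primitives in the /usr/group line is fcntl()-style record locking, where a region is named by an offset and a byte count; a sketch in the modern form:

```c
#include <fcntl.h>
#include <unistd.h>

/* Lock bytes [off, off+len) of fd for writing; 0 on success, -1 on
 * failure.  This is the fcntl() record-locking interface descended
 * from the Bass/usr-group style of primitives: the region is named
 * by an offset and a byte count (l_len == 0 means "to end of file"). */
int lock_region(int fd, off_t off, off_t len)
{
    struct flock fl;
    fl.l_type = F_WRLCK;        /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;     /* l_start is from start of file */
    fl.l_start = off;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}
```

F_SETLK fails immediately if another process holds a conflicting lock; a blocking variant (F_SETLKW) waits instead.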

Yes, it implies important changes to the kernel; so what?  One would expect
it to require some changes...

> 	What about different file access methods (ISAM, hashed, etc)?  Can
> these always be satisfactorily layered on unix's bytestream file system
> as a library?  Isn't this expensive in terms of performance?

Why would it be any more expensive than, say, layering it on VMS's
block-array file system as a library (which is exactly what VMS does)?  Many
OSes out there have a low-level set of file I/O primitives which imply no
file structure, and build text file, or ISAM, or hashed, or whatever file
structure on top of them, just as UNIX does.

	Guy Harris