Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!decvax!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.database
Subject: Re: UNIX + database
Message-ID: <2693@sun.uucp>
Date: Fri, 23-Aug-85 18:36:27 EDT
Article-I.D.: sun.2693
Posted: Fri Aug 23 18:36:27 1985
Date-Received: Sun, 25-Aug-85 13:57:33 EDT
References: <164@3comvax.UUCP>
Distribution: net
Organization: Sun Microsystems, Inc.
Lines: 89

> I'm curious about the 'suitability' of UNIX with respect to
> database implementations.  I've heard criticisms about the file system
> because to find a given sector in a file may require looking at up to
> 3 other sectors associated with the file inode.
> Some people say that 'extent-based' file systems are inherently
> faster than unix's.

If the extents are big enough, this may be true.  The UNIX file map for
the V7 file system (used by all standard UNIX releases except
4.2/4.3BSD) has, in the inode (which is always in core), 10 direct
block pointers, one singly-indirect block pointer, one doubly-indirect
block pointer, and one triply-indirect block pointer.  The 10 direct
block pointers point to the first 10 blocks of the file (blocks are
usually 512 or 1024 bytes); the singly-indirect block pointer points to
a block which points to the next 128 512-byte blocks or the next 256
1024-byte blocks; the doubly-indirect block pointer points to a block
which points to the appropriate number of singly-indirect blocks; etc.
At worst, you have to fetch the triply-indirect block, the appropriate
doubly-indirect block it points to, and the appropriate singly-indirect
block it points to, in order to fetch a block of the file.  However,
there is a fair chance that the triply-indirect block is in core if
you've just referenced another block through it; the same applies to
the other indirect blocks.
The 4.2BSD file system has 12 direct pointers, and the blocks are
usually 4096 or 8192 bytes; it has triply-indirect blocks but they've
never been tested, as files can get a lot bigger before they require
them.

The file map for an extent-based file system typically consists of
entries which say "the next N blocks of the file are located in the N
blocks on the disk starting at block M".  Thus, a map entry can map
more blocks than a UNIX map entry, which always maps in units of the
file system's block size.  However, you can't directly compute the
location of the map entry for a given block of a file in a scheme like
this; you have to search the map for the entry.  This could, in
principle, require more blocks to be read than the UNIX scheme.  I
don't know that it does so in practice.

I don't think there's a simple answer to the question "is an
extent-style file map faster than a UNIX-style file map?"  It depends
on the pattern of accesses to blocks of the file.  The worst case would
be purely random access; however, both the UNIX scheme and the extent
map scheme perform much better if the locus of file access doesn't move
too fast (the UNIX scheme is more likely to find the desired indirect
block in the buffer cache, and the extent map scheme is more likely not
to have to "turn the window" and pull in another map block).  I don't
know the locality of "typical" block accesses in a "typical" database
system.

Another issue is "how much is a large file scattered across the disk?"
The V7 file system uses a linked list to record unused blocks; when
writing a file, this makes it harder to allocate block N+1 of a file on
a cylinder near block N.  The 4.2BSD file system (and some other file
systems used on UNIX-based systems), and most (if not all) extent-based
file systems, use a bit map and can more easily allocate a block near
its neighbor in the file.
Does anybody have any numbers comparing the behavior of a database
system on the V7 file system, the 4.2BSD file system, and some
extent-based file systems?

> I've also heard that databases usually rely on locking primitives
> which aren't present in UNIX.  (So how do UNIX databases perform
> locking?)

Some of them (such as INGRES) require you to add those primitives to
UNIX (INGRES has a device driver which you install into your system).

> I've heard that System V UNIX (release 2?) has now defined some
> locking primitives.  Is this true?  Doesn't this imply some important
> changes to the kernel?

S5R2 Version 2 (for the VAX; lord knows what version for other systems)
has a set of locking primitives.  They are influenced by a set of
locking primitives originally developed by John Bass, which permit you
to lock a region of a file defined by a file offset and a byte count.
Most of the UNIX file/record locking primitives descend from these
primitives.  The /usr/group UNIX-based OS standard includes one such
set; the S5R2V2 system implements these on top of its own similar
primitives as a compatibility layer.  Yes, it implies important changes
to the kernel; so what?  One would expect it to require some changes...

> What about different file access methods (ISAM, hashed, etc)?  Can
> these always be satisfactorily layered on unix's bytestream file
> system as a library?  Isn't this expensive in terms of performance?

Why would it be any more expensive than, say, layering it on VMS's
block-array file system as a library (which is exactly what VMS does)?
Many OSes out there have a low-level set of file I/O primitives which
imply no file structure, and build text file, or ISAM, or hashed, or
whatever file structure on top of them, just as UNIX does.

	Guy Harris