Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 (Fortune) 6/7/84; site dmsd.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!ihnp4!houxm!whuxl!whuxlm!akgua!sdcsvax!sdcrdcf!hplabs!hpda!dmsd!bass
From: bass@dmsd.UUCP (John Bass)
Newsgroups: net.arch
Subject: Re: Magic Cookies and File Systems
Message-ID: <168@dmsd.UUCP>
Date: Thu, 7-Mar-85 13:17:20 EST
Article-I.D.: dmsd.168
Posted: Thu Mar 7 13:17:20 1985
Date-Received: Sun, 10-Mar-85 07:33:38 EST
References: <917@sjuvax.UUCP> <538@rlgvax.UUCP> <190@u1100s.UUCP>
Lines: 95

As the author of locking(2) and lockf(2) I have a few comments:

First, the locking system I designed IS a simple semaphore system in
which each byte of the file address space is a magic cookie to be
locked, unlocked, and tested. It so happens that I included semantics
to deal with ranges of magic cookies .. thus the byte stream nature.

The code started out as a semaphore manager for a LUNDY display memory
system, to allow several processes to maintain the display lists
without races. The soft blocking was added to minimize the number of
system calls and improve performance of the graphics applications.

In 1980 I had a stiff political argument with ONYX management and a
contractor over how to do file locking. They suggested "mail box"
semaphores with a global 64 bit cookie/resource, or a P & V semaphore
driver a la the UCLA implementation of the late 70's. I resurrected the
LUNDY code and proposed it be recoded to fit into the filesystem, and
after nearly being run out of town it ended up as the ONYX locking(2)
call. The soft blocking is generally known as "enforcement mode".

In 1981 I published the work in the USENIX newsletter ;login: and it
has been in the public domain since. After much hassle it went through
the /usr/group standard proposal with three minor changes.

1) The soft blocking was changed to be conditionally enabled based on
the SGID bit of a file, to remove destructive behavior in the hands of
children.
Locking /etc/passwd is generally unhealthy, and kids in the educational
area like such games, although commercial systems are generally free of
such bull.

2) Lock/unlock actions were defined to deal with the previous record
when given negative lengths, saving two additional lseek(2) calls.

3) A test-only call was added to supplement the test&lock call, thus
removing race conditions in some applications.

I did not implement shared locks of any kind, for several reasons. I
have had a number of requests to do so, and flames as to why they are
necessary. But I have never been able to reach an agreement over the
semantics or how to deal with certain race conditions.

There are several DIFFERING shared locking semantics:

1) A shared lock to be used for an extended update process. The intent
is that only one process can hold the update lock, preventing other
update processes from gaining access. Any reader can access the data.

2) A shared lock to be used by reading processes to prevent updates.
Any reader can request a duplicate lock; all writers are blocked.

3) Locks that can be promoted from 1 to 2 and back again.

Type 1 locks are the most common implementation. A payroll record is
locked while the operator updates the name/address on a CRT. With my
implementation this application can be handled with a clean
workaround. The update process locks the record, sets a timestamp for
the record (or subrecord) to be updated, then frees the lock. Other
updating processes note the record busy while the timestamp is active.
The updating process checks that the timestamp is the SAME prior to
updating the record; if it changed, the operator took too long (i.e.
went to lunch) and some other process updated the record. The process
can either have the operator redo the transaction with the new data
displayed, or the two records can be diff edited together. Where money
amounts are involved this is easy.

Type 2 locks are the most needed but cause REAL design problems.
A process traversing an N-way tree data base needs to protect upper
nodes from being split/compacted, which would commit the traversing
process to garbage land. This effectively prevents ANY insert/delete
operations in a B-tree database, since the trauma can extend to the
root node -- thus a traversal must lock out add/delete operations for
the duration. These type locks can be implemented with lockf(2)
semaphores indirectly, or with the aid of a daemon process -- this is
a tough design problem and semaphores of any type will not help. The
solution is to do all database actions in a daemon process with FIFOs
to the user interface -- or to put the entire data base manager into
the kernel where the critical regions can be handled properly. A unix
based transaction filesystem is the best solution for many problems of
this class.

The original 1981 code that was published has a race condition between
the locked call and the inode lock for the enforcement hook in rdwri.
The fix is minor but varies between the varieties of unix because of
changes in the filesystem code. The fix is to move the inode lock into
the locked procedure -- systems implementors can write to me and I
will provide the several line fix.

I will attempt to repost the lockf.c file and some of the documents to
net.source within a week or so ... It has gone through some minor
changes in the last couple of months -- with the help of some 4.2
sites it now runs under 4.2, but is undefined for the network
filesystem.

The 4.2 network filesystem is "stateless" to handle certain error
problems. This design choice, I think, will make complex databases
nearly impossible because of integrity problems after certain
failures. Lockf can be added as RPCs to the host machine in
distributed filesystems.

-- 
John Bass
DMS Design (System Performance and Arch Consultants)
{dual,fortune,idi,hpda}!dmsd!bass        (408) 996-0557
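
For the type 1 case, the timestamp workaround described above might
look roughly like the sketch below, written against the POSIX lockf()
call. Everything named here is an assumption made for illustration and
is not taken from the published lockf.c: the fixed 64-byte record
layout, the helper names (rec_claim, rec_commit, rec_file_create), the
use of a caller-chosen token in place of a real timestamp, and the
"takeover" flag standing in for whatever timeout policy an application
would choose.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DATA_LEN 60

/* Hypothetical record layout: a stamp followed by the payload. */
struct record {
    uint32_t stamp;            /* 0 = idle, else the claimant's token */
    char     data[DATA_LEN];
};

static off_t rec_off(long recno)
{
    return (off_t)recno * (off_t)sizeof(struct record);
}

/* Apply a lockf() command (F_LOCK or F_ULOCK) to one record's bytes. */
static int rec_lockf(int fd, long recno, int cmd)
{
    if (lseek(fd, rec_off(recno), SEEK_SET) == (off_t)-1)
        return -1;
    return lockf(fd, cmd, (off_t)sizeof(struct record));
}

/* Create a file of nrecs zeroed records; returns an fd or -1. */
int rec_file_create(const char *path, long nrecs)
{
    struct record zero;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    memset(&zero, 0, sizeof zero);
    for (long i = 0; i < nrecs; i++) {
        if (pwrite(fd, &zero, sizeof zero, rec_off(i)) != (ssize_t)sizeof zero) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Briefly lock the record, stamp it with our token, and unlock.
 * Returns 0 on success, 1 if another claim is active and takeover
 * is not requested, -1 on I/O error. */
int rec_claim(int fd, long recno, uint32_t token, int takeover)
{
    struct record r;
    int rc = -1;
    if (rec_lockf(fd, recno, F_LOCK) != 0)
        return -1;
    if (pread(fd, &r, sizeof r, rec_off(recno)) == (ssize_t)sizeof r) {
        if (r.stamp != 0 && !takeover) {
            rc = 1;                       /* record is busy */
        } else {
            r.stamp = token;
            if (pwrite(fd, &r, sizeof r, rec_off(recno)) == (ssize_t)sizeof r)
                rc = 0;
        }
    }
    rec_lockf(fd, recno, F_ULOCK);
    return rc;
}

/* Re-lock, verify the stamp is still ours, then write the new data
 * and clear the stamp. Returns 0 on success, 1 if the stamp changed
 * (someone else took the record), -1 on I/O error. */
int rec_commit(int fd, long recno, uint32_t token, const char *data)
{
    struct record r;
    int rc = -1;
    assert(data != NULL);
    if (rec_lockf(fd, recno, F_LOCK) != 0)
        return -1;
    if (pread(fd, &r, sizeof r, rec_off(recno)) == (ssize_t)sizeof r) {
        if (r.stamp != token) {
            rc = 1;                       /* stale: operator took too long */
        } else {
            r.stamp = 0;
            memset(r.data, 0, DATA_LEN);
            strncpy(r.data, data, DATA_LEN - 1);
            if (pwrite(fd, &r, sizeof r, rec_off(recno)) == (ssize_t)sizeof r)
                rc = 0;
        }
    }
    rec_lockf(fd, recno, F_ULOCK);
    return rc;
}
```

The point of the design is that the byte-range lock is held only for
the few system calls needed to read and rewrite the stamp, never while
the operator is typing. A caller whose rec_commit() returns 1 would
redisplay the new data and have the operator redo the transaction.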