Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 (Fortune) 6/7/84; site dmsd.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!ihnp4!houxm!whuxl!whuxlm!akgua!sdcsvax!sdcrdcf!hplabs!hpda!dmsd!bass
From: bass@dmsd.UUCP (John Bass)
Newsgroups: net.arch
Subject: Re: Magic Cookies and File Systems
Message-ID: <168@dmsd.UUCP>
Date: Thu, 7-Mar-85 13:17:20 EST
Article-I.D.: dmsd.168
Posted: Thu Mar  7 13:17:20 1985
Date-Received: Sun, 10-Mar-85 07:33:38 EST
References: <917@sjuvax.UUCP> <538@rlgvax.UUCP> <190@u1100s.UUCP>
Lines: 95

As the author of locking(2) and lockf(2) I have a few comments:

	First, the locking system I designed IS a simple semaphore
system in which each byte of the file address space is a magic cookie
to be locked, unlocked, and tested. It so happens that I included
semantics to deal with ranges of magic cookies; hence the byte-stream
nature.

	The code started out as a semaphore manager for a LUNDY display
memory system, to allow several processes to maintain the display lists
without races. The soft blocking was added to minimize the number of
system calls and improve the performance of the graphics applications.

	In 1980 I had a stiff political argument with ONYX management and
a contractor over how to do file locking. They suggested "mailbox" semaphores
with a global 64-bit cookie/resource, or a P & V semaphore driver a la the
UCLA implementation of the late '70s. I resurrected the LUNDY code and
proposed that it be recoded to fit into the filesystem, and after I was nearly
run out of town it ended up as the ONYX locking(2) call. The soft blocking
is generally known as "enforcement mode".

	In 1981 I published the work in the USENIX newsletter ;login:,
and it has been in the public domain since. After much hassle it went
through the /usr/group standard proposal with three minor changes:

	1) The soft blocking was changed to be conditionally enabled based
	on the SGID bit of a file, to remove destructive behavior in the
	hands of children. Locking /etc/passwd is generally unhealthy, and
	kids in the educational area like such games, although commercial
	systems are generally free of such bull.

	2) Lock/unlock actions were defined to operate on the previous
	record when given negative lengths, saving two additional lseek(2)
	calls.

	3) A test-only call was added to supplement the test&lock call,
	thus removing race conditions in some applications.

	I did not implement shared locks of any kind, for several reasons.
I have had a number of requests to do so, and flames as to why they are
necessary, but I have never been able to reach agreement on the semantics
or on how to deal with certain race conditions.

	There are several DIFFERING shared locking semantics:

	1) A shared lock to be used for an extended update process.
	The intent is that only one process can hold the update lock,
	preventing other update processes from gaining access. Any reader
	can access the data.

	2) A shared lock to be used by reading processes to prevent updates.
	Any reader can request a duplicate lock; all writers are blocked.

	3) Locks that can be promoted from type 1 to type 2 and back again.

Type 1 locks are the most commonly implemented. A payroll record is locked
while the operator updates the name/address on a CRT. With my implementation
this application can be handled with a clean workaround. The update
process locks the record, sets a timestamp for the record (or subrecord)
to be updated, then frees the lock. Other updating processes note that the
record is busy while the timestamp is active. The updating process checks the
timestamp to ensure that it is the SAME before updating the record; if it
changed, the operator took too long (i.e., went to lunch) and some other
process updated the record. The process can either have the operator redo the
transaction with the new data displayed, or the two records can be diff-edited
together. Where money amounts are involved this is easy.

Type 2 locks are the most needed but cause REAL design problems. A process
traversing an N-way tree database needs to protect upper nodes from being
split/compacted and thus committing the traversing process to garbage land.
This effectively prevents ANY insert/delete operations in a B-tree database,
since the trauma can extend to the root node; thus a traversal must
lock out add/delete operations for its duration. These types of locks can be
implemented indirectly with lockf(2) semaphores or with the aid of
a daemon process, but this is a tough design problem and semaphores of
any type will not help. The solution is to do all database actions in
a daemon process with FIFOs to the user interface, or to put the entire
database manager into the kernel, where the critical regions can be
handled properly. A UNIX-based transaction filesystem is the best solution
for many problems of this class.

	The original 1981 code that was published has a race condition between
the locked() call and the inode lock for the enforcement hook in rdwri.
The fix is minor but varies among the varieties of UNIX because of
changes in the filesystem code: move the inode lock into the locked
procedure. System implementors can write to me and I will provide the
several-line fix. I will attempt to repost the lockf.c file and some
of the documents to net.sources within a week or so. It has gone through
some minor changes in the last couple of months; with the help of some
4.2 sites it now runs under 4.2, but its behavior is undefined for the
network filesystem. The 4.2 network filesystem is "stateless" to handle
certain error problems. I think this design choice will make complex
databases nearly impossible because of integrity problems after certain
failures.

	Lockf can be added as RPCs to the host machine in distributed
filesystems.
-- 
John Bass
DMS Design (System Performance and Arch Consultants)
{dual,fortune,idi,hpda}!dmsd!bass     (408) 996-0557