Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!uunet!munnari!kre
From: kre@munnari.oz (Robert Elz)
Newsgroups: comp.unix.wizards
Subject: Re: symbolic links poster child
Message-ID: <1738@munnari.oz>
Date: Mon, 6-Jul-87 15:03:15 EDT
Article-I.D.: munnari.1738
Posted: Mon Jul  6 15:03:15 1987
Date-Received: Tue, 7-Jul-87 06:29:26 EDT
References: <7955@brl-adm.ARPA> <2249@bunker.UUCP> <244@nuchat.UUCP> <7326@mimsy.UUCP>
Organization: Comp Sci, Melbourne Uni, Australia
Lines: 110
Summary: Why doesn't someone implement a loop detection algorithm??

The question is in the Summary header .. the answer is here...

It's not easy!  And this is quite apart from any efficiency
considerations (such as slowing namei from a walk to a crawl).

I'm not going to say it's impossible, but it's not nearly as easy as
has been implied by some of the news items on this subject.  You can't
simply read the literature and translate the mathematics into code.

Not for the first time, it turns out that the theoretical results
are simply impractical (see also Tanenbaum's MINIX book, and the
discussion of deadlock avoidance algorithms).

There are several reasons why it's not easy.  One is that symlinks
simply don't operate like nodes in a graph, another is that programming
inside the kernel imposes a few limitations that theoretical algorithms
rarely allow for.  There's a third, which is the real killer, but
I'll leave that for a bit lower, to keep you reading...

First, the nature of symlinks .. it's not a loop for a symlink to be
encountered in a path 2, 3, 4, ... times, that might be entirely
valid.  Most loop detection algorithms assume that there is just one
path out of a node, so when you encounter it again, you have a loop.
The interesting question is supposed to be how to remember what you
have encountered in minimum space and time.  Consider

	ls -F /foo
	x/ y/ z/

	ln -s /foo /a
	ln -s /a/x/xx /a/y/yy
	ln -s /a/y/yy /a/z/zz
	cp /dev/null /foo/x/xx

Now consider the number of times that the symlink "a" is examined
when referencing /a/z/zz .. three times, with no loop anywhere.
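
To make the count concrete, here is a minimal user-space sketch in C
(not kernel code) that models the tree above as a small table and
resolves a path iteratively, counting how often each link is expanded.
The names toy_tree, toy_lookup and toy_resolve are invented for this
illustration, the model assumes every link target is absolute, and it
makes no claim about how namei is actually written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_kind { TOY_DIR, TOY_FILE, TOY_LINK };

struct toy_node {
    const char *path;     /* absolute name in the toy tree */
    enum toy_kind kind;
    const char *target;   /* absolute link target, NULL if not a link */
    int visits;           /* how many times this link was expanded */
};

/* The tree built by the commands above (hypothetical in-memory model). */
static struct toy_node toy_tree[] = {
    { "/foo",      TOY_DIR,  NULL,      0 },
    { "/foo/x",    TOY_DIR,  NULL,      0 },
    { "/foo/y",    TOY_DIR,  NULL,      0 },
    { "/foo/z",    TOY_DIR,  NULL,      0 },
    { "/foo/x/xx", TOY_FILE, NULL,      0 },
    { "/a",        TOY_LINK, "/foo",    0 },
    { "/foo/y/yy", TOY_LINK, "/a/x/xx", 0 },
    { "/foo/z/zz", TOY_LINK, "/a/y/yy", 0 },
};

static struct toy_node *toy_lookup(const char *path)
{
    for (size_t i = 0; i < sizeof toy_tree / sizeof toy_tree[0]; i++)
        if (strcmp(toy_tree[i].path, path) == 0)
            return &toy_tree[i];
    return NULL;
}

/*
 * Resolve path one component at a time.  On success returns 0, stores
 * the final name in out and the number of link expansions in *hops.
 * Returns -1 if a component does not exist.  No loop check is done.
 */
static int toy_resolve(const char *path, char *out, size_t outlen, int *hops)
{
    char cur[256] = "", rest[256], tmp[256], cand[256];

    assert(path != NULL && out != NULL && hops != NULL && outlen > 0);
    snprintf(rest, sizeof rest, "%s", path);
    *hops = 0;
    for (;;) {
        char *p = rest + strspn(rest, "/");    /* skip leading slashes */
        if (*p == '\0')
            break;                             /* nothing left to look up */
        size_t n = strcspn(p, "/");            /* length of next component */
        snprintf(cand, sizeof cand, "%s/%.*s", cur, (int)n, p);
        struct toy_node *nd = toy_lookup(cand);
        if (nd == NULL)
            return -1;
        if (nd->kind == TOY_LINK) {
            nd->visits++;
            (*hops)++;
            /* Splice the target in front of the unresolved remainder;
             * targets are absolute here, so restart from the root. */
            snprintf(tmp, sizeof tmp, "%s%s", nd->target, p + n);
            strcpy(rest, tmp);
            cur[0] = '\0';
        } else {
            strcpy(cur, cand);
            memmove(rest, p + n, strlen(p + n) + 1);
        }
    }
    snprintf(out, outlen, "%s", cur);
    return 0;
}
```

Under these assumptions, toy_resolve("/a/z/zz", ...) ends at /foo/x/xx
after five link expansions, three of them of /a, so a detector that
treats a second sighting of the same link as a loop would wrongly
reject a perfectly valid path.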

Now there are ways around this, you can look for loops at each "level"
of interpretation, but here the kernel programming restrictions start
making life difficult.

Here I assume that we're talking about unix, any unix, not a redesigned
kernel .. if you want to start from scratch you can implement whatever
you like.

Programming inside the kernel means no recursion.  You can sometimes
get away with one or two recursive calls, but when you write the code
you have to guarantee that that's all there will be.  You can't write
anything that recurses a variable number of times based on user input
data (such as a path containing symbolic links).  The reason for this
is that the kernel has a small fixed size stack, and when it's exceeded
the kernel crashes (or in some implementations, random memory is trashed).
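
As a rough illustration of the shape this forces on the code, the
sketch below follows a chain of links with a loop and a counter rather
than a recursive call, so stack use stays constant no matter what the
user has put in the file system.  It is a user-space toy, not kernel
code: the link table and the names toy_readlink, toy_follow and
TOY_MAXSYMLINKS are all invented here, and the value 8 is only an
assumption taken from the "eight" discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXSYMLINKS 8   /* assumed limit, for illustration only */

struct toy_link {
    const char *name;
    const char *target;
};

/* Hypothetical links: p -> q -> r (a plain file), and l1 <-> l2 (a loop). */
static const struct toy_link toy_links[] = {
    { "p",  "q"  },
    { "q",  "r"  },
    { "l1", "l2" },
    { "l2", "l1" },
};

/* Return the target of name, or NULL if name is not a link. */
static const char *toy_readlink(const char *name)
{
    for (size_t i = 0; i < sizeof toy_links / sizeof toy_links[0]; i++)
        if (strcmp(toy_links[i].name, name) == 0)
            return toy_links[i].target;
    return NULL;
}

/*
 * Follow links iteratively.  Returns 0 and sets *final on success, or
 * ELOOP once more than TOY_MAXSYMLINKS links have been expanded.  A
 * long but finite chain is rejected just like a real loop; that is the
 * price of not remembering anything beyond one integer.
 */
static int toy_follow(const char *name, const char **final)
{
    int nlinks = 0;
    const char *t;

    assert(name != NULL && final != NULL);
    while ((t = toy_readlink(name)) != NULL) {
        if (++nlinks > TOY_MAXSYMLINKS)
            return ELOOP;
        name = t;
    }
    *final = name;
    return 0;
}
```

With this table, toy_follow("p", &f) succeeds with f pointing at "r",
while toy_follow("l1", &f) gives up with ELOOP after the ninth
expansion.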

Now for the third problem ... all of the algorithms rely on remembering
something.  In the context of loop detection in symlinks, the question
is what is it that you want to remember?  You could try remembering the
string of names, but I don't think that's going to get you very far.
You could build pwd into the kernel, and have it remember absolute paths
to each symlink encountered, but I don't think you'll get many customers
for that system.  The obvious thing to remember is inodes, right?

Wrong.. the kernel is not allowed to remember inodes in general.  That
either causes incorrect results, or deadlocks, and neither of those is
to be desired.  Anyone with 4.3 source should look in the directory
cache code and see the torture that was necessary to make that work
correctly.  There it can be done, since it's just a cache, and whenever
things look difficult it's perfectly OK to simply forget it, and go
back to the old way.  In a loop detection algorithm you couldn't afford
to do that, or from time to time someone will tie the kernel up in an
infinite loop, because a loop wasn't noticed.

As I said at the start, I would welcome an implementation that proves
me wrong, but please go and do it .. don't just talk about it!

Finally, as usual, I agree with Chris Torek in article <7326@mimsy.UUCP>:

> Moreover, eight symlinks is really not all that small a number.
> We had five levels of symlinks in one translation on a Sun, and it
> was noticeably slower than other path name translations.  If you
> have enough levels of symlinks that you run into ELOOP errors, you
> probably have too complex a nest of symlinks.  (That said, I do
> think eight is too small; but I also think namei is too slow.)

Given that no-one has yet actually implemented a loop detection
algorithm, and we still have some constant limit on symlinks, we
have a trade off.  System administrators will usually prefer a small
value, the smaller the better, as it lowers the overheads in filesystem
lookups.  Nb: you never *need* more than a single symlink, any symlink
chain can *always* be reduced to a single symlink, albeit sometimes
with considerably more administrative headaches.

Users would typically like a large limit, as then they don't have
to think about what they are doing, and can simply install a symlink
to anything that appears anywhere.

This is something that should be a system config option, so that the
local site can decide for itself what the right limit is (ie: how much
cpu time they're willing to devote to pathname lookups to increase
user convenience).  It shouldn't be a compiled in number.  Nothing
should be a compiled in number, unless that number is fixed by some
external and immutable object, that is really never going to change
(the number of bits in a byte is one such number .. changing that is
going to mean changing the hardware, and that's certainly going to
require recompiling anyway).

Even if someone does implement a loop detection algorithm, I would
still want an absolute limit on the number of symlinks to keep my
machine running at a reasonable rate!

kre