Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!uunet!munnari!kre
From: kre@munnari.oz (Robert Elz)
Newsgroups: comp.unix.wizards
Subject: Re: symbolic links poster child
Message-ID: <1738@munnari.oz>
Date: Mon, 6-Jul-87 15:03:15 EDT
Article-I.D.: munnari.1738
Posted: Mon Jul 6 15:03:15 1987
Date-Received: Tue, 7-Jul-87 06:29:26 EDT
References: <7955@brl-adm.ARPA> <2249@bunker.UUCP> <244@nuchat.UUCP> <7326@mimsy.UUCP>
Organization: Comp Sci, Melbourne Uni, Australia
Lines: 110
Summary: Why doesn't someone implement a loop detection algorithm??

The question is in the Summary header .. the answer is here...

It's not easy! And this is quite apart from any efficiency considerations (such as slowing namei from a walk to a crawl). I'm not going to say it's impossible, but it's not nearly as easy as has been implied by some of the news items on this subject.

You can't simply read the literature and translate the mathematics into code. Not for the first time, it turns out that the theoretical results are simply impractical (see also Tanenbaum's MINIX book, and the discussion of deadlock avoidance algorithms).

There are several reasons why it's not easy. One is that symlinks simply don't operate like nodes in a graph; another is that programming inside the kernel imposes a few limitations that theoretical algorithms rarely allow for. There's a third, which is the real killer, but I'll leave that for a bit lower, to keep you reading...

First, the nature of symlinks .. it's not a loop for a symlink to be encountered in a path 2, 3, 4, ... times; that might be entirely valid. Most loop detection algorithms assume that there is just one path out of a node, so when you encounter it again, you have a loop. The interesting question is supposed to be how to remember what you have encountered in minimum space and time.
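For contrast, here is a representative textbook algorithm of the kind described above, where every node has at most one successor and reaching a node twice really does mean a loop. Floyd's tortoise-and-hare method is chosen here purely as an illustration (the post names no particular algorithm). It remembers only two positions, so it wins on space, but its core assumption, "seen it again, so it's a loop", is exactly the one a symlink walk does not satisfy.

```python
def has_cycle(start, successor):
    """Floyd's tortoise-and-hare cycle detection.

    Assumes each node has at most one successor: successor(node) returns
    the next node, or None when the walk ends. Only two positions are
    remembered, however long the walk is.
    """
    slow = fast = start
    while True:
        # The hare takes two steps; if it runs off the end there is no cycle.
        for _ in range(2):
            fast = successor(fast)
            if fast is None:
                return False
        # The tortoise takes one step; if they meet, the walk must loop.
        slow = successor(slow)
        if slow == fast:
            return True


# Hypothetical graphs: a plain chain, and the same chain closed into a ring.
chain = {"p": "q", "q": "r", "r": None}
ring = {"p": "q", "q": "r", "r": "p"}
print(has_cycle("p", chain.get))  # False
print(has_cycle("p", ring.get))   # True
```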
Consider

    ls -F /foo
    x/  y/  z/
    ln -s /foo /a
    ln -s /a/x/xx /a/y/yy
    ln -s /a/y/yy /a/z/zz
    cp /dev/null /foo/x/xx

Now consider the number of times that the symlink "a" is examined when referencing /a/z/zz ..

Now there are ways around this; you can look for loops at each "level" of interpretation, but here the kernel programming restrictions start making life difficult. Here I assume that we're talking about unix, any unix, not a redesigned kernel .. if you want to start from scratch you can implement whatever you like.

Programming inside the kernel means no recursion. You can sometimes get away with one or two recursive calls, but when you write the code you have to guarantee that that's all there will be. You can't write anything that recurses a variable number of times based on user input data (such as a path containing symbolic links). The reason for this is that the kernel has a small, fixed-size stack, and when it's exceeded the kernel crashes (or in some implementations, random memory is trashed).

Now for the third problem ... all of the algorithms rely on remembering something. In the context of loop detection in symlinks, the question is: what is it that you want to remember? You could try remembering the string of names, but I don't think that's going to get you very far. You could build pwd into the kernel, and have it remember absolute paths to each symlink encountered, but I don't think you'll get many customers for that system.

The obvious thing to remember is inodes, right? Wrong.. the kernel is not allowed to remember inodes in general. That either causes incorrect results, or deadlocks, and neither of those is to be desired. Anyone with 4.3 source should look in the directory cache code and see the torture that was necessary to make that work correctly. There it can be done, since it's just a cache, and whenever things look difficult it's perfectly OK to simply forget it, and go back to the old way.
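A toy model can make the /a/z/zz question concrete. The sketch below walks a hypothetical in-memory filesystem (a Python dict standing in for directories, files and symlinks, not real kernel structures) without recursion: when a symlink is met, its text is spliced in front of whatever remains of the path, and the walk restarts from / for an absolute target or from the link's own directory for a relative one. This splicing approach is an assumption about how a namei-like loop could be written, and "." and ".." are not handled. It counts how often each link is followed, and an optional `reject_repeats` flag applies the naive "seen it before, so it must be a loop" rule.

```python
import errno
import os
from collections import Counter

# Hypothetical model of the example: path -> "dir", "file", or ("link", target).
# "ln -s /a/x/xx /a/y/yy" creates /foo/y/yy, because /a -> /foo.
FS = {
    "/foo": "dir",
    "/foo/x": "dir",
    "/foo/y": "dir",
    "/foo/z": "dir",
    "/foo/x/xx": "file",
    "/a": ("link", "/foo"),
    "/foo/y/yy": ("link", "/a/x/xx"),
    "/foo/z/zz": ("link", "/a/y/yy"),
}


def eloop(path):
    return OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)


def resolve(path, fs, max_links=8, reject_repeats=False):
    """Iteratively resolve an absolute path in the toy filesystem.

    Returns (final_path, counts), where counts maps each symlink to the
    number of times it was followed. Raises ELOOP once more than
    max_links symlinks have been followed in total.
    """
    counts = Counter()
    remaining = [c for c in path.split("/") if c]
    current = "/"
    while remaining:
        name = remaining.pop(0)
        candidate = current.rstrip("/") + "/" + name
        node = fs.get(candidate)
        if node is None:
            raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), candidate)
        if isinstance(node, tuple):  # a symlink
            counts[candidate] += 1
            if reject_repeats and counts[candidate] > 1:
                raise eloop(path)  # naive rule: any repeat "must" be a loop
            if sum(counts.values()) > max_links:
                raise eloop(path)  # fixed cap on the total number of links
            target = node[1]
            # Splice the link text in front of the unread components; a
            # relative target is read from the link's own directory.
            remaining = [c for c in target.split("/") if c] + remaining
            if target.startswith("/"):
                current = "/"
        elif node == "file" and remaining:
            raise NotADirectoryError(errno.ENOTDIR, os.strerror(errno.ENOTDIR), candidate)
        else:
            current = candidate
    return current, counts


final, counts = resolve("/a/z/zz", FS)
print(final, dict(counts))
```

Running it, the lookup reaches /foo/x/xx after following /a three times (five symlinks in all), with no loop anywhere; with `reject_repeats=True` the same lookup fails with ELOOP on the second visit to /a. The counts are keyed by path strings, which works only because this toy table never changes during a walk; in a kernel, as argued above, neither names nor inodes make a safe key, and the directory cache copes only because it is free to forget.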
In a loop detection algorithm you couldn't afford to do that, or from time to time someone will tie the kernel up in an infinite loop, because a loop wasn't noticed.

As I said at the start, I would welcome an implementation that proves me wrong, but please go and do it .. don't just talk about it!

Finally, as usual, I agree with Chris Torek in article <7326@mimsy.UUCP>:

> Moreover, eight symlinks is really not all that small a number.
> We had five levels of symlinks in one translation on a Sun, and it
> was noticeably slower than other path name translations. If you
> have enough levels of symlinks that you run into ELOOP errors, you
> probably have too complex a nest of symlinks. (That said, I do
> think eight is too small; but I also think namei is too slow.)

Given that no-one has yet actually implemented a loop detection algorithm, and we still have some constant limit on symlinks, we have a trade-off. System administrators will usually prefer a small value, the smaller the better, as it lowers the overheads in filesystem lookups. Nb: you never *need* more than a single symlink; any symlink chain can *always* be reduced to a single symlink, albeit sometimes with considerably more administrative headaches. Users would typically like a large limit, as then they don't have to think about what they are doing, and can simply install a symlink to anything that appears anywhere.

This is something that should be a system config option, so that the local site can decide for itself what the right limit is (ie: how much cpu time they're willing to devote to pathname lookups to increase user convenience). It shouldn't be a compiled-in number. Nothing should be a compiled-in number, unless that number is fixed by some external and immutable object, that is really never going to change (the number of bits in a byte is one such number .. changing that is going to mean changing the hardware, and that's certainly going to require recompiling anyway).
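The "Nb" about reducing chains and the plea for a site-chosen limit can both be illustrated with a deliberately simplified model. Assume, hypothetically, that links only ever name whole paths (no symlinks in the middle of a path), so a table from link name to target is enough, and that the site supplies the hop limit at run time, passed here as an ordinary `limit` argument rather than a compiled-in constant. `flatten` (a hypothetical helper) rewrites every link to point straight at its final target, after which any lookup needs a single hop; a genuine cycle still fails with ELOOP instead of spinning forever.

```python
import errno
import os


def final_target(name, links, limit):
    """Follow whole-path symlinks from name, allowing at most `limit` hops."""
    start, hops = name, 0
    while name in links:
        hops += 1
        if hops > limit:
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), start)
        name = links[name]
    return name


def flatten(links, limit):
    """Return a new table where every link points directly at its final target."""
    return {name: final_target(name, links, limit) for name in links}


# Hypothetical chain of three links ending at a real file.
links = {"/l1": "/l2", "/l2": "/l3", "/l3": "/data"}
flat = flatten(links, limit=8)
print(flat)
```

With `limit=2` the original /l1 fails with ELOOP (it needs three hops), while every entry in the flattened table resolves with `limit=1`. The administrative cost shows up later: if /l3 is repointed, the flattened /l1 and /l2 still point at the old target and have to be updated by hand.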
Even if someone does implement a loop detection algorithm, I would still want an absolute limit on the number of symlinks to keep my machine running at a reasonable rate!

kre