Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.unix.wizards
Subject: globbing in the shell (Was Re: more rm insanity)
Message-ID: <12441@think.UUCP>
Date: Wed, 2-Dec-87 13:08:56 EST
Article-I.D.: think.12441
Posted: Wed Dec  2 13:08:56 1987
Date-Received: Sat, 5-Dec-87 15:28:38 EST
References: <1257@boulder.Colorado.EDU> <6840002@hpcllmv.HP.COM> <9555@mimsy.UUCP> <1890@celtics.UUCP> <6774@brl-smoke.ARPA>
Sender: usenet@think.UUCP
Reply-To: barmar@sauron.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge, MA
Lines: 78

In article <6774@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) ) writes:
>In fact that is a key "win" of UNIX over OSes that make applications deal with
>globbing.

Aha, now you've hit one of my favorite complaints about Unix.

I do NOT think it is such a win that wildcard expansion is done by the
shell, at least not when it is done in the haphazard style that Unix
shells use.  It assumes that all commands take filenames as arguments,
and that any argument with wildcard characters is supposed to be a
filename.

A very common counterexample is grep.  Its first argument will often
contain wildcard characters, for example

	grep "foo.*bar" 

I wonder how many new users get screwed when they forget to quote the
first argument, the shell says "No match", and they assume that none
of the files contain the pattern.  (I think the Bourne shell "solved"
this problem by making unmatched tokens expand into themselves, but
the C shell just aborts the line.)  Other commands may want to take
wildcards, although not necessarily to match filenames; for example

	who bar*

should list all the logged-in users whose names begin with "bar"
(equivalent to "who | grep '^bar'").
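
To make that concrete, here is a minimal sketch (in Python, purely for
illustration) of a command applying a wildcard to something other than
filenames.  The function name filter_users and the list of users are
made up; a real who would read the login records itself.

```python
from fnmatch import fnmatchcase

def filter_users(logged_in, pattern):
    """Return the login names that match a shell-style wildcard pattern."""
    # fnmatchcase does the same style of matching the shell does, but
    # here it is applied to user names rather than to directory entries.
    return [name for name in logged_in if fnmatchcase(name, pattern)]

# Hypothetical list of logged-in users, standing in for the real data.
users = ["barmar", "barry", "gwyn", "root"]
print(filter_users(users, "bar*"))
```

The point is only that the command, not the shell, decides what the
pattern is matched against.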

It should be up to the command to decide the appropriate context for
treating arguments as pathnames and performing wildcard expansion.
This way, a command that knows it is dangerous, such as rm, can check
whether it was called with a wildcard and perhaps be more careful.  On
Multics, the delete command does exactly this, querying "Are you sure
you want to 'delete **' in ?" unless -force is specified.
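
A rough sketch of what such a check could look like if the command saw
its arguments unexpanded.  This is not the Multics code; the names
has_wildcard and plan_delete are invented for the example, the prompt
text is only modelled on the one above, and the function merely returns
the list of paths it would delete rather than deleting anything.

```python
import glob

WILDCARDS = set("*?[")

def has_wildcard(arg):
    """True if the argument contains a shell-style wildcard character."""
    return any(ch in WILDCARDS for ch in arg)

def plan_delete(args, force=False, confirm=input):
    """Return the paths to delete, asking first when a wildcard was used."""
    targets = []
    for arg in args:
        if has_wildcard(arg):
            # The command can see that a wildcard was typed, so it can
            # be more careful before expanding it.
            if not force:
                answer = confirm(f"Are you sure you want to delete {arg!r}? ")
                if answer.strip().lower() not in ("y", "yes"):
                    continue
            targets.extend(sorted(glob.glob(arg)))
        else:
            targets.append(arg)
    return targets
```

With shell globbing, rm never sees the wildcard, only the list of names
it expanded to, so it has no way to make this distinction.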

Globbing in the shell also severely limits the syntax of commands; I
will admit that this could be seen as a benefit, because it forces
conformity, but sometimes a minor syntax change can be useful.  For
example, there's no way to write a version of the cp or mv commands
that takes an alternating list of source and destination pathnames,
where the source pathnames are permitted to have wildcards.  You also
can't do something like Multics's

	rename foo.** foo.bar.==

(the == is replaced by whatever the ** matched) without writing a
complicated script that uses grep and sed on the output of ls.
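
For comparison, here is a small sketch of the idea when the command
gets to see both patterns itself.  It is a deliberate simplification,
not the real Multics star and equal conventions: it assumes exactly one
** in the source pattern, treats it as "any string", and substitutes
that string for == in the target.  The function name equal_name is made
up.

```python
import re

def equal_name(source_pattern, target_pattern, name):
    """Return the new name for `name`, or None if it does not match.

    Simplified: exactly one '**' in the source pattern, and every '=='
    in the target pattern is replaced by whatever the '**' matched.
    """
    prefix, star, suffix = source_pattern.partition("**")
    if not star:
        raise ValueError("source pattern must contain '**'")
    regex = re.escape(prefix) + "(.*)" + re.escape(suffix)
    match = re.fullmatch(regex, name)
    if match is None:
        return None
    return target_pattern.replace("==", match.group(1))

for name in ["foo.c", "foo.old.h", "baz.c"]:
    print(name, "->", equal_name("foo.**", "foo.bar.==", name))
```

Under these assumptions foo.c maps to foo.bar.c, foo.old.h maps to
foo.bar.old.h, and baz.c is left alone because it does not match.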

Finally, even when an argument is a pathname, it is sometimes not
allowed to be multiple files.  For example, diff takes pathnames, but
it requires exactly two of them, and ar allows only one archive
pathname to be specified.  On Multics, a command with a syntax like
this can check whether the argument contains wildcards and complain.
On Unix, diff can check that it received exactly two pathnames, but it
can't tell whether that is only because a single wildcard happened to
match exactly two files (maybe this was intentional on the user's part,
but maybe it wasn't), and ar will simply treat the extra arguments as
member files.

So does this mean that globbing MUST be done by the commands
themselves?  Well, yes and no.  This is how it is done on Multics,
although the actual matching is done by a system call for filenames
(for efficiency, since Multics directories are not directly readable
by user-mode code, so it saves lots of data copying) and by a library
subroutine for non-filenames.  Some more modern systems allow commands
to provide information to the command processor that tells it how to do
the automatic parsing; in this case, this data would specify which
arguments are pathnames that allow wildcards, and the command
processor would automatically perform the expansion in the right
cases.
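
A sketch of how that last arrangement might look.  Everything here is
hypothetical: the ARG_SPECS table, its "fixed"/"rest" layout, and the
expand_args function are invented for illustration and do not describe
any particular system.  Unmatched patterns are passed through
unchanged, which is just one possible policy.

```python
import glob

# Hypothetical per-command declarations.  "fixed" says, for each leading
# positional argument, whether it is a pathname that allows wildcards;
# "rest" covers all remaining arguments.
ARG_SPECS = {
    "grep": {"fixed": [False], "rest": True},   # a pattern, then files
    "who":  {"fixed": [],      "rest": False},  # user names, not files
    "rm":   {"fixed": [],      "rest": True},   # every argument is a file
}

def expand_args(command, args, expand=glob.glob):
    """Expand wildcards only in the arguments the command declared as paths."""
    spec = ARG_SPECS[command]
    fixed = spec["fixed"]
    result = []
    for i, arg in enumerate(args):
        is_path = fixed[i] if i < len(fixed) else spec["rest"]
        if is_path:
            matches = sorted(expand(arg))
            # Assumed policy: an argument with no matches is passed through.
            result.extend(matches if matches else [arg])
        else:
            result.append(arg)
    return result
```

With a table like this, an unquoted first argument to grep would reach
grep untouched, while the file arguments after it would still be
expanded by the command processor.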

---
Barry Margolin
Thinking Machines Corp.

barmar@think.com
seismo!think!barmar