Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!mit-eddie!ll-xn!ames!lll-tis!oodis01!uplherc!esunix!bpendlet
From: bpendlet@esunix.UUCP (Bob Pendleton)
Newsgroups: comp.arch
Subject: Re: Software Distribution
Message-ID: <978@esunix.UUCP>
Date: 23 Sep 88 20:46:07 GMT
Organization: Evans & Sutherland, Salt Lake City, Utah
Lines: 167

>In article <970@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>> Distribution with sources is Good.  Distribution without sources is Evil.
>>
>>> ... What I'm thinking of is much more subtle things that the compiler
>>> can't easily discover and put in the intermediate form, e.g. "this program
>>> depends on being able to dereference NULL pointers".  Or, for that matter,
>>> "the details of the arithmetic in this program assume that integers are
>>> at least 36 bits"...
>>
>>... To be truly portable the intermediate form MUST address
>>the issues you mention. Even if the source language doesn't define the
>>semantics of dereferencing NULL pointers, the intermediate form must
>>define the semantics of dereferencing NULL pointers.
>
>Unfortunately, it *can't*, without being machine-specific.

Try this scenario:

There are two kinds of computers in the world, brand X and brand Y.
Brand X computers define the value pointed to by a NULL pointer to be
a NULL value. That is, the load indirect instruction given the value
that C uses for NULL is guaranteed to return NULL.  On the other hand
brand Y computers core dump if you try to load a value from the
address that is equivalent to NULL.

In all other respects X and Y computers are similar enough in word
size, data formats, and so on, that software that doesn't dereference
NULL ports easily from one brand of machine to the other.

Let's assume that on both brands of computers people want code to run
as fast as possible. So, the native code generators for the machines
will generate the shortest possible code sequence for dereferencing a
pointer.  Of course they don't want to do run time checking to see if
a NULL pointer is being dereferenced if they don't have to.

A programmer uses brand X computers. He writes a pointer chasing
program that assumes that *NULL == NULL. He's using a compiler suite
that generates code in UIF (Universal Intermediate Form). Now he
distributes the UIF to people with both brand X and brand Y computers.
They run it through their UIF to machine code translators and run the
code. What happens?
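Before answering, here is a minimal sketch (mine, not part of the
original scenario; the node type and names are made up) of the kind of
pointer chasing I mean. It only handles an empty list correctly
because *NULL reads back as NULL on brand X machines:

    struct node {
        struct node *next;   /* first field, so p->next is a load   */
                             /* from address 0 when p == NULL       */
        int          value;
    };

    /* Return the last node of a list, or NULL for an empty list.
     * On brand X, last(NULL) loads the word at address 0, gets NULL,
     * skips the loop, and returns NULL.  On brand Y that same first
     * load core dumps. */
    struct node *last(struct node *p)
    {
        while (p->next != NULL)
            p = p->next;
        return p;
    }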

Well, that depends on the definition of UIF. If UIF ignores the *NULL
problem then the code will run on brand X computers and bomb on brand
Y computers. But, if UIF allows a compiler to put a flag in the UIF
that says that *NULL == NULL, or if UIF defines *NULL == NULL, then
the code will run on brand Y machines, but with a speed penalty caused
by the run time checks that the code generator had to insert to comply
with the brand X compiler's request that *NULL == NULL.
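For example (my sketch, not taken from any real UIF definition), a
brand Y translator honoring that flag might expand each load through a
possibly NULL pointer into something like the following instead of a
bare load-indirect instruction:

    struct node { struct node *next; int value; };  /* as above */

    /* One dereference of p->next as the brand Y code generator might
     * emit it when the UIF says the program assumes *NULL == NULL:
     * guard the load so a NULL pointer reads back as NULL instead of
     * trapping.  The compare and branch on every dereference are the
     * speed penalty. */
    struct node *next_of(struct node *p)
    {
        return (p == NULL) ? NULL : p->next;
    }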

So, the compiler that runs on brand X machines must, at least, put a
flag in the UIF stating that dereferencing NULL is allowed. The
compiler on brand Y machines should state that dereferencing NULL is
not allowed. That way the code can be made to run on any machine,
though with a performance hit when the original compiler's assumptions
don't match the reality of a specific machine. Obviously the compilers
and code generators for brand X machines are going to be set up to
produce good code for brand X computers and the same is true for brand
Y computers.  But, it is still possible for UIF code generated for one
machine to be translated to be run on the other machine.

So, to restate what I've said so many times (am I getting boring yet?):

UIF must, at the very least, require that machine dependent
assumptions be stated in the UIF form of a program. If the assumptions
made by the original compiler and the target machine are a close match
then the program will run efficiently on the target machine. If the
assumptions don't match then the program will still run, it just won't
run as fast as it might have.

This means that the UIF is not machine specific, but programs that
make machine specific assumptions will pay a penalty when they are run
on machines that don't support their assumptions.
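Purely as an illustration of the kind of record I mean (the field
names are mine, not from any real UIF definition), the assumptions
attached to a UIF module might look something like:

    /* Hypothetical per-module assumption record in a UIF file.  Each
     * field states something the original compiler assumed about its
     * machine, so a translator for a different machine knows when it
     * must insert compensating code, and when it can skip it. */
    struct uif_assumptions {
        int deref_null_yields_null;  /* program assumes *NULL == NULL */
        int min_int_bits;            /* arithmetic assumes integers   */
                                     /* at least this wide, e.g. 36   */
        int char_is_signed;          /* plain char assumed signed     */
    };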

>If the intermediate form allows dereferencing NULL, then the
>intermediate form's pointer-dereference operation is inherently
>expensive on machines which do not permit dereferencing NULL, making
>it impossible to generate good code from the intermediate form.

It would seem that our definitions of "good code" are very different.
My definition requires that the code do what I told it to do. As I've
tried to point out, not everything I say in a program is explicit in
the source code.  Several critical declarations are made by default
based on the computer I'm using, the compiler I'm using, and the
operating system I'm using. A complete set of declarations for a
program includes all these things. For a compiler to generate code
that matches the complete declaration of a program on a machine other
than the one it was designed for may require that code sequences be
generated that slow the program down. That's engineering, folks, but it
isn't impossible. By my definition, it's even good. I would prefer
that programmers not write code that does things like dereferencing
NULL. But, if the language allows it, I want to support it and make it
portable.

>>Yes, that means that C compilers will have to put information into the
>>intermediate form that does not derive from any programmer provided
>>declarations. That indicates a flaw in C, not a problem with the idea
>>of a portable intermediate language. 
>
>This is like saying that the impossibility of reaching the Moon with a
>balloon indicates a flaw in the position of the Moon, not a problem with
>the idea of using balloons for space travel!

This is a very good example of the use of a false analogy to build a
strawman argument.

>All of a sudden, our
>universal intermediate form is useless for most of today's programming
>languages, unless the compilers are far more sophisticated than current
>ones.  (NULL pointers are a C-ism, but deducing the size of integers that
>the program's arithmetic needs is a problem for most languages.)

This is a good example of justifying a false conclusion with a false
premise.

I can't find anything about requiring compilers to deduce number
ranges anywhere in my author_copy file. What I keep saying is that
the compiler must explicitly state its ASSUMPTIONS in the UIF form of
a program. If the compiler can deduce number ranges, then it would be
nice if it passed that information along in the UIF. If the compiler
assumes that NULL can be dereferenced, as it would on a computer with
hardware that allows it, then the compiler must state that fact in the
UIF it generates.

>I assumed that we were talking about *practical* portable intermediate
>forms, ones that could be used with current languages and current compiler
>technology.

An ad hominem attack on my credibility? Incredible! But I'll address
it anyway.

No, I've been talking about old languages like C, COBOL, BASIC, LISP,
FORTRAN, Pascal, and MODULA-2. I've worked on compilers or
interpreters for, or in, all of these languages. These languages
comprise a small subset of the off-the-wall languages I've used and/or
implemented over the last 17 years. So I'm convinced I know a little
something about them.

Anyway, it's very hard to keep up with all the current languages being
developed; there are so many of them. :-)

As for practical, I've already cited examples of commercial products
that aren't far from using a UIF.

One of the problems I think we've had with this entire exchange is
that it has centered around C. C is not yet standardized, and because
it was intended to be a systems programming language, C has always
tolerated machine-dependent variations in the semantics of some of its
operators. I believe the variation has been tolerated because it was
believed to be justified by the resulting increase in speed. I believe
Henry published a paper that showed that using better algorithms is
much better than using nonportable hardware features.

If this discussion had centered around COBOL or BASIC there would have
been little to discuss because the standards for these languages already
require source level declarations that solve most of the problems we have
been discussing. 

In the long run I think that the kind of discipline that could result
from the use of a UIF would be a very good thing.

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.