Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!mit-eddie!ll-xn!ames!lll-tis!oodis01!uplherc!esunix!bpendlet
From: bpendlet@esunix.UUCP (Bob Pendleton)
Newsgroups: comp.arch
Subject: Re: Software Distribution
Message-ID: <978@esunix.UUCP>
Date: 23 Sep 88 20:46:07 GMT
Organization: Evans & Sutherland, Salt Lake City, Utah
Lines: 167

>In article <970@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>> Distribution with sources is Good. Distribution without sources is Evil.
>>
>>> ... What I'm thinking of is much more subtle things that the compiler
>>> can't easily discover and put in the intermediate form, e.g. "this program
>>> depends on being able to dereference NULL pointers". Or, for that matter,
>>> "the details of the arithmetic in this program assume that integers are
>>> at least 36 bits"...
>>
>>... To be truly portable the intermediate form MUST address
>>the issues you mention. Even if the source language doesn't define the
>>semantics of dereferencing NULL pointers, the intermediate form must
>>define the semantics of dereferencing NULL pointers.
>
>Unfortunately, it *can't*, without being machine-specific.

Try this scenario:

There are two kinds of computers in the world, brand X and brand Y.
Brand X computers define the value pointed to by a NULL pointer to be
a NULL value. That is, the load indirect instruction, given the value
that C uses for NULL, is guaranteed to return NULL. On the other hand,
brand Y computers core dump if you try to load a value from the
address that is equivalent to NULL. In all other respects X and Y
computers are similar enough in word size, data formats, and so on,
that software that doesn't dereference NULL ports easily from one
brand of machine to the other.

Let's assume that on both brands of computers people want code to run
as fast as possible. So, the native code generators for the machines
will generate the shortest possible code sequence for dereferencing a
pointer. Of course they don't want to do run time checking to see if a
NULL pointer is being dereferenced if they don't have to.

A programmer uses brand X computers. He writes a pointer chasing
program that assumes that *NULL == NULL. He's using a compiler suite
that generates code in UIF (Universal Intermediate Form). Now he
distributes the UIF to people with both brand X and brand Y computers.
They run it through their UIF to machine code translators and run the
code. What happens?

Well, that depends on the definition of UIF. If UIF ignores the *NULL
problem then the code will run on brand X computers and bomb on brand
Y computers. But, if UIF allows a compiler to put a flag in the UIF
that says that *NULL == NULL, or if UIF defines *NULL == NULL, then
the code will run on brand Y machines, but with a speed penalty caused
by the run time checks that the code generator had to insert to comply
with the brand X compiler's request that *NULL == NULL.

So, the compiler that runs on brand X machines must, at least, put a
flag in the UIF stating that dereferencing NULL is allowed. The
compiler on brand Y machines should state that dereferencing NULL is
not allowed. That way the code can be made to run on any machine,
though with a performance hit when the original compiler's assumptions
don't match the reality of a specific machine.

Obviously the compilers and code generators for brand X machines are
going to be set up to produce good code for brand X computers, and the
same is true for brand Y computers.
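To make the difference concrete, here is roughly what the two
translators end up having to emit for one pointer chasing step. I'm
writing it as C source for readability; a real translator would of
course emit machine code, and the struct and function names are made
up for this example only.

    struct node { struct node *next; };

    /* Brand X translator: the hardware reads NULL back from address
       NULL, so the shortest sequence, a single load indirect, is
       enough.  It relies entirely on the brand X hardware.           */
    struct node *next_on_x(struct node *p)
    {
        return p->next;
    }

    /* Brand Y translator honoring a UIF flag that says *NULL == NULL:
       it has to insert a run time check so the program still sees
       NULL, paying a test and branch on every dereference.           */
    struct node *next_on_y(struct node *p)
    {
        return (p == NULL) ? NULL : p->next;
    }

The brand Y version is the speed penalty I'm talking about, and it
only gets paid when the UIF says the program needs the brand X
behavior.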
But, it is still possible for UIF code generated for one machine to be
translated to run on the other machine.

So, to restate what I've said so many times (am I getting boring
yet?): UIF must, at the very least, require that machine dependent
assumptions be stated in the UIF form of a program. If the assumptions
made by the original compiler and the target machine are a close
match, then the program will run efficiently on the target machine. If
the assumptions don't match, then the program will still run, it just
won't run as fast as it might have. This means that the UIF is not
machine specific, but programs that make machine specific assumptions
will pay a penalty when they are run on machines that don't support
their assumptions.

>If the intermediate form allows dereferencing NULL, then the
>intermediate form's pointer-dereference operation is inherently
>expensive on machines which do not permit dereferencing NULL, making
>it impossible to generate good code from the intermediate form.

It would seem that our definitions of "good code" are very different.
My definition requires that the code do what I said to do. As I've
tried to point out, not everything I say in a program is explicit in
the source code. Several critical declarations are made by default
based on the computer I'm using, the compiler I'm using, and the
operating system I'm using. A complete set of declarations for a
program includes all these things.

For a compiler to generate code that matches the complete declaration
of a program on a machine other than the one it was designed for may
require that code sequences be generated that slow the program down.
That's engineering, folks, but it isn't impossible. By my definition,
it's even good.

I would prefer that programmers not write code that does things like
dereferencing NULL. But, if the language allows it, I want to support
it and make it portable.

>>Yes, that means that C compilers will have to put information into the
>>intermediate form that does not derive from any programmer provided
>>declarations. That indicates a flaw in C, not a problem with the idea
>>of a portable intermediate language.
>
>This is like saying that the impossibility of reaching the Moon with a
>balloon indicates a flaw in the position of the Moon, not a problem with
>the idea of using balloons for space travel!

This is a very good example of the use of a false analogy to build a
strawman argument.

>All of a sudden, our
>universal intermediate form is useless for most of today's programming
>languages, unless the compilers are far more sophisticated than current
>ones. (NULL pointers are a C-ism, but deducing the size of integers that
>the program's arithmetic needs is a problem for most languages.)

This is a good example of justifying a false conclusion with a false
premise. I can't find anything about requiring compilers to deduce
number ranges anywhere in my author_copy file. What I keep saying is
that the compiler must explicitly state its ASSUMPTIONS in the UIF
form of a program. If the compiler can deduce number ranges, then it
would be nice if it passed that information along in the UIF. If the
compiler assumes that NULL can be dereferenced, as it would on a
computer with hardware that allows it, then the compiler must state
that fact in the UIF it generates.
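For concreteness, the kind of statement of assumptions I have in mind
might look something like this. It's only an illustration I'm making
up here, not a proposal for an actual encoding; the names mean nothing
outside this example.

    /* Hypothetical assumption record attached to a UIF program. */
    struct uif_assumptions {
        int deref_null_is_null; /* program assumes *NULL == NULL     */
        int min_int_bits;       /* smallest integer width the
                                   program's arithmetic depends on   */
    };

A brand X C compiler might record { 1, 32 }. A brand Y translator
reading that record knows it has to insert the NULL checks, but its
native 36 bit integers already satisfy the 32 bit requirement, so only
the first mismatch costs anything at run time.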
>I assumed that we were talking about *practical* portable intermediate
>forms, ones that could be used with current languages and current compiler
>technology.

An ad hominem attack on my credibility? Incredible! But I'll address
it anyway.

No, I've been talking about old languages like C, COBOL, BASIC, LISP,
FORTRAN, Pascal, and MODULA-2. I've worked on compilers or
interpreters for, or in, all of these languages. These languages
comprise a small subset of the off the wall languages I've used and/or
implemented over the last 17 years. So I'm convinced I know a little
something about them. Anyway, it's very hard to keep up with all the
current languages being developed; there are so many of them. :-)

As for practical, I've already cited examples of commercial products
that aren't far from using a UIF.

One of the problems I think we've had with this entire exchange is
that it has centered around C. C is not yet standardized, and because
it was intended to be a systems programming language, C has always
tolerated machine dependent variations in the semantics of some of its
operators. I believe the variation has been tolerated because it was
believed to be justified by the resulting increase in speed. I believe
Henry published a paper that showed that using better algorithms is
much better than using nonportable hardware features.

If this discussion had centered around COBOL or BASIC there would have
been little to discuss, because the standards for these languages
already require source level declarations that solve most of the
problems we have been discussing.

In the long run I think that the kind of discipline that could result
from the use of a UIF would be a very good thing.

		Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address: {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:    utah-cs!esunix!bpendlet
I am solely responsible for what I say.