Path: utzoo!attcan!uunet!husc6!purdue!decwrl!tle.dec.com!rmeyers
From: rmeyers@tle.dec.com (Randy Meyers 381-2743 ZKO2-3/N30)
Newsgroups: comp.sys.amiga
Subject: Leo's ANSI C Flame
Message-ID: <8806292138.AA22025@decwrl.dec.com>
Date: 29 Jun 88 21:38:30 GMT
Organization: Digital Equipment Corporation
Lines: 301

Leo recently posted an admitted flame about certain optimizations in Lattice C V4.0 and the general state of ANSI C. I believe that Leo may have been misinformed about some of these subjects.

Leo's posting began by complaining about an optimization in the latest Lattice C compiler. Lattice C V4.0 will compile x = strlen("abcdefg") into an instruction that moves seven to x. Leo complains:

>    You realize, of course, that this kind of optimization falls flat on
>its face if I somehow manage to change the contents of the memory that
>contains "abcdefg".  I could stuff a \0 where the 'd' is, and the program
>would not notice.

You are correct. However, such a program is clearly poorly written. Think, Leo! Would you really want to support such a program? Would you be proud that you had written it? Do you think that it helps program clarity if any constant in the program may have its value changed?

Kernighan and Ritchie never guaranteed that string constants were modifiable. It was an accident of early implementations that string constants could be modified, and a very few programmers came to rely on it (probably again initially by accident).

Note that there is no reason to have modifiable string constants in the language. Any program that takes advantage of modifiable string constants can be rewritten to use:

    static char modifiable[] = "abcdefg";

and take no extra time, no extra space, make it clear to the reader that the value is not a guaranteed constant, and just be better written.

By the way, the ANSI standard does NOT require that string constants be read only. It says that "If a program attempts to modify a string literal..., the results are undefined."
This is standard jargon for saying some implementations may write-lock constants; some may not. Your program isn't portable if it depends on this (mis-)feature. If you care, only buy compilers from people who agree with your opinions.

>Further, the type returned by strlen() is not *guaranteed* to be an int.
>I could have written one that returns a short; where would that leave you?

Under ANSI C (and Lattice, if it follows the ANSI rules on such things), the above strlen optimization is only legal if the user is using the real strlen. The way that the compiler "knows" you are using the real strlen as opposed to some strlen that you wrote is by the declaration of strlen. The rule boils down to "if you got the magic definition of strlen from the proper include file, the compiler is free to know lots of extra information about the function and to perform additional optimizations. If you provide your own definition of strlen, the compiler must use it and play dumb."

The magic that happens when you include the proper include file is this: ANSI permits a standard include file to contain macros that are synonyms for standard functions in addition to the normal extern declarations for the functions. These macros might generate inline code instead of calling a function (many pre-ANSI versions of C use this trick for getc and getchar) or might call a builtin function that the compiler supports as an extension.

For example, string.h might include:

    extern int strlen(const char *);    /* Required by ANSI */
    #define strlen(s) _STRLEN(s)        /* Optional, permitted by ANSI */

What the first line does is declare the strlen function that has always been a part of C. ANSI C requires every implementation of C to have in its library a function called strlen that does what you expect. (The only change here is that ANSI C also specifies the argument type of the function.)

The second line is optional. It defines a macro to be expanded when a normal call to strlen is found.
However, the macro does not do its work by calling the library routine strlen; it uses the compiler extension _STRLEN to do the work. The _STRLEN can either try to determine the result at compile-time, try to generate code in-line to compute the result, or just give up and call the library routine.

Note that this is pretty invisible to the programmer. He only gets the special _STRLEN function if he includes the proper .h file. If the programmer includes the file but does not want to take advantage of the builtin _STRLEN, he can do:

    #ifdef strlen
    #undef strlen
    #endif

and not be bothered by it. Even if he doesn't do the #undef, he can call the library by writing the call as:

    x = (strlen)("abcdefg");

since macros that take arguments do not expand if the next token after the macro name is not an open parenthesis. A programmer can even take the address of the function without worry because of the same rule:

    f = strlen;    /* get a pointer to strlen */

Note that the ANSI standard requires that a programmer be able to avoid this fancy builtin stuff through the methods I stated. In general, though, programmers need not try to get around this stuff. Except for very bizarre programs (like programs that assume that constants aren't, but are compiled with compilers that assume constants are), everything works the same.

Part of your argument is the assumption that a programmer is free to provide his own versions of any standard routine. This assumption is in error. It sometimes works, and it sometimes doesn't. The ANSI standard does not really change traditional practice here.

You can provide your own routines if you do not include the standard header file declaring the function and you make your replacement a static (non-global) routine. This has always been true--ANSI doesn't change it.
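
To make the macro-suppression rules above concrete, here is a minimal sketch in C. It does not rely on any particular compiler's string.h; instead it uses a hypothetical function text_len and a hypothetical macro of the same name standing in for strlen and its optional header macro. The sizeof-based macro body is only an assumption chosen for illustration, and it is only valid for string literals and arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many calls actually reach the real function. */
static int real_calls = 0;

/* Hypothetical stand-in for the library routine. */
size_t text_len(const char *s)
{
    size_t n = 0;
    real_calls++;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical stand-in for the optional header macro.
   For a string literal, sizeof gives the length at compile time. */
#define text_len(s) (sizeof(s) - 1)

/* Runs the three forms of use and returns how many reached the function. */
int run_demo(void)
{
    size_t (*f)(const char *) = text_len;  /* no '(' follows the name: not expanded */
    size_t a, b, c;

    real_calls = 0;
    a = text_len("abcdefg");    /* macro: no function call at all */
    b = (text_len)("abcdefg");  /* parenthesized name: real function */
    c = f("abcdefg");           /* through the pointer: real function */

    assert(a == 7 && b == 7 && c == 7);
    return real_calls;          /* 2: only the last two were real calls */
}
```

All three forms yield 7, but only the parenthesized call and the call through the pointer reach the function, which is why a program can always get the library routine when it wants it.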
What the draft ANSI C standard says about making your own extern function or variable with the same name as a standard one is "If a program defines an external identifier with the same name as a reserved external identifier, even in a semantically equivalent form, the behavior is undefined." Again, this is standards jargon saying that it may work, or it may not. If you care, only give your money to a compiler writer whose prejudices match yours.

This is not a change in traditional C practice, although a lot of misguided people think that this was formally permitted because it does work much of the time.

Ok, let's assume that you want to write your own version of strlen that returns a short (assume sizeof (short) is 2) instead of the unsigned int that the standard strlen returns (assume sizeof (unsigned int) is four). You write some test programs, and they all work fine. Now you write a program that uses your short strlen and calls printf. Guess what: unknown to you, the version of printf that comes with your compiler calls strlen on string arguments in order to determine the size of buffers it needs. Assume that printf now gets horribly wrong answers from your strlen because it picks up two bytes of garbage along with the two bytes of result.

Maybe you luck out. Maybe printf doesn't call strlen. But you can probably break just about EVERY C implementation by randomly changing some of the library functions out from underneath it. (Does printf depend on puts? calloc? write? ferror? stdout? fprintf?) Try it on your favorite C implementation. Call up the developer. Tell him what you find. You'll probably get some reply like, "Gosh, you're right. If you want to rewrite puts, you should also rewrite printf as well. Have you looked into buying the source for the library? It will make your job easier."

The ANSI standard includes that bit about "semantically equivalent" to cover two other facts of life.
First, you may think you have provided a "plug-compatible" version of the routine, but failed in some needed nuance. For example, some implementations of malloc have the property that if you allocate a chunk of memory, free it, and reallocate it, the original data you stuffed into the memory will still be there. I have heard of code that makes use of this "feature." Suppose that your malloc doesn't do this, but your compiler's version of printf requires it.

The other fact of life is that sometimes several C library functions will end up in the same module. Assume that if the linker brings in calloc from the library, the entry point for malloc is dragged in as well. If you wanted to replace malloc with your own routine, but wanted to use the standard calloc, you will get multiple definitions of malloc when you link.

All of the above is a fact of life today WITHOUT the ANSI Standard. The ANSI Standard actually improves the situation somewhat. The ANSI standard does "reserve" the traditional C library names, but it limits the standard functions to depending only on other standard functions or on names that begin with underscore.

When I first got my Amiga and Lattice C V3.10, one of the first programs I tried to build was Wecker's VT100. It compiled and loaded without errors, but it would die horribly just after starting. I eventually tracked down the bug. The Lattice fopen function called another (new to V3.10) Lattice function called dopen. Wecker had a dopen function in his program that did something entirely different. When fopen called dopen and entered the Wecker version, not the Lattice version, the program would die.

This is a problem that has always haunted C: no one said you couldn't have some standard library routine call some non-standard entry point.
The problem doesn't turn up too often because most standard library functions can be written using only calls to other standard functions or to system specific functions with really weird names (_WRITE, SYS$QIO, $#%&*OUT...). But occasionally the problem occurs. Under the ANSI standard, the problem is outlawed. If Lattice C had been standard conforming, the VT100 program would have worked.

So, the ANSI standard doesn't make the situation any worse when it comes to you writing replacements for standard functions, and it makes the situation better when it comes to making sure that standard functions don't tromp all over your functions.

>    You further realize, of course, that no respectable programmer would
>ever write:
>
>    strlen ("abcdefg");
>
>    But would instead use (if he really *had* to):
>
>    sizeof ("abcdefg") - 1;
>
>    If the code is written by d*psh*ts, it is *not* the responsibility
>of the compiler vendor to save their butts.

Leo, write a macro that takes two arguments. The first argument is the name of a struct that has two members, len and ptr. The second argument to the macro is a pointer to a string. The macro does two things: it sets the len member to the length of the string and the ptr member to the address of the string.

Here's my answer:

    #define DESC(d, string) (d.len = strlen(string), d.ptr = string)

I actually had to use a similar macro recently. Look at what happens when I make a call of the form DESC(d, "abcdefg").

The point here is that there is no such thing as an optimization for a d*psh*t case. Experience has shown time and time again that optimizations for what looks like stupid code are valuable. Stupid code comes up because people use macros, because the compiler itself may generate it, or because powerful optimizations may reduce complex code to a simple case.
For example:

    register char *p;
    p = "abcdefg";
    /* 100,000 lines of code that don't modify p */
    DESC(d, p);

A reasonably good compiler will prove that p's value has not been changed since the initial assignment, and will transform the call into DESC(d, "abcdefg"). With Lattice's strlen optimization, this will boil down into two moves, instead of a function call and two moves.

>Bloated code is, by and large, the responsibility of the guy who *wrote*
>it.  And if the programmer in question doesn't realize this, then s/he
>has no business writing code for public consumption.

As shown above, bloated code is sometimes written by nobody--it just sort of exists in the code written by the best of us. If an automatic tool, like an optimizing compiler, can get rid of it painlessly, it is a great idea.

>'volatile' is a Good Thing.  Function prototypes are a Good Thing.

I agree.

>#pragma is of questionable value (largely because no one has adequately
>explained to me what it *does*!).

Simple: pragma is a standard approved way to add extensions to the language without adding new reserved words. For example, Lattice uses it in their standard headers in order to call ROM Kernel routines directly without going through the stubs.

pragma is intrinsically non-standard: the ANSI standard states that it exists, mentions some of the things that it can be used for, and leaves it alone. Every compiler is free to develop pragmas and use any syntax that they want after the word pragma. A programmer who uses a pragma should enclose it in #if--#endif:

    #if LATTICE
    #pragma Delete(R0,R1)    /* Means delete source file to MANX */
    #endif

I made up the above example. Lattice's pragmas don't look that way and MANX, as far as I know, doesn't have pragmas.

>Enforced parenthetical grouping whether or not it's necessary is Stupid.

Expression control is necessary, but I don't like enforced parentheses either. I preferred it when the new unary plus operator controlled expression evaluation.
However, France threatened to veto the ISO standard for C unless they got parentheses.

The enforcement only makes a difference when doing floating point, one's complement math, or checking for integer overflows. Since most C implementations (and C programs) use two's complement integer math with no overflow detection, it isn't a big thing.

>Making string constants read-only is Stupid.

The ANSI standard doesn't.

>Breaking all the string functions and giving them cryptic names is Stupid.

I agree totally. But, I don't think that has happened. The traditional functions with traditional meanings are around. Send me mail with what you think is specifically wrong.

To sum up: There is a lot of misinformation about ANSI C. If someone has told you that all your code will break under ANSI C, either you are a very poor programmer (and your code breaks every time you move it) or you are being misinformed. (The latter is very easy: the ANSI standard is written in formal style using certain conventions that make it hard to decipher. I have come across lots of misinformation about what the standard says.)

----------------------------------------
Randy Meyers, not representing Digital Equipment Corporation
    USENET: {decwrl|decvax|decuac}!tle.dec.com!rmeyers
    ARPA:   rmeyers%tle.dec.com@decwrl.dec.com