Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!floyd!harpo!seismo!hao!hplabs!sri-unix!Laws@SRI-AI.ARPA
From: Laws@SRI-AI.ARPA
Newsgroups: net.unix
Subject: Improving C
Message-ID: <16884@sri-arpa.UUCP>
Date: Thu, 23-Feb-84 18:29:55 EST
Article-I.D.: sri-arpa.16884
Posted: Thu Feb 23 18:29:55 1984
Date-Received: Fri, 2-Mar-84 11:18:57 EST
Lines: 184

From: Ken Laws

The recent messages about strncpy() illustrate the need for string
commands in addition to the character vector commands offered in C and
UNIX. Character manipulation combined with malloc() can be made to do
whatever you want, but the semantics can be confusing. I find it absurd
that there is not even a standard library of dynamic string routines
supplied with UNIX. I have written such a package myself, and I am sure
many others have also. String routines are easy to write in C, which
may be why they are always hacked inline, but why must we all reinvent
such wheels? A separate string package could be made reasonably
efficient and could include extras such as a length field (making it
possible to embed nulls in a string) or a current position pointer
(making a string into a virtual disk file).

The following is a list of other suggestions I have for improving C and
the C environment:

The C language is reasonably clean, but it could be improved. (Maybe
the next version should be named D?) In particular, I would like:

Dynamic strings that are distinct from character vectors. A string
should be represented by its address as is now done for arrays. String
routines should return copied substrings, etc. A concatenation routine
is particularly needed. (We have provided one on our testbed, but
without garbage collection such things are a little dangerous.)

Dynamic matrices that are addressable using multidimensional
subscripts.

Lists. Definition of a list as a char ** works, but it must be
initialized as a (char *)[]. This could be fixed in the compiler.
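
To make the point about lists concrete, here is a minimal sketch of the
workaround: the initializer is attached to an array of pointers, and
the list is then handled through a char **. The names fruit_storage,
fruit, and list_length are made up for this illustration and do not
come from any package mentioned above.

```c
#include <stddef.h>

/* The initializer has to be attached to an array of pointers ... */
static char *fruit_storage[] = { "apple", "pear", "plum", NULL };

/* ... and only then can the list be handled through a char **. */
static char **fruit = fruit_storage;

/* Count the entries in a NULL-terminated list of strings. */
static int list_length(char **list)
{
    int n = 0;

    while (list[n] != NULL)
        n++;
    return n;
}
```

With these definitions, list_length(fruit) yields 3. The extra named
array exists only to carry the initializer.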

Classes, as implemented in the "class" preprocessor from Bell Labs.

Begin(name) and end(name) delimiters as part of the language. Our SRI
testbed macros do not check for matching names, and cannot be used for
top-level brackets because ctags does not expand the macros and gets
confused. The cb program also fails to recognize brackets hidden by
macros.

True nested procedures in addition to the current nested blocks. At
present it is difficult to make certain variables global to a main
subroutine and its "servants", yet not global to everyone. This also
makes it difficult to convert code from other languages that do have
this capability.

Variables declared outside functions should be private (static) by
default. A "global" or "public" keyword should be required to make
them available externally.

A "proc" or similar keyword used in function headers so that they can
be easily distinguished from variable declarations. A "forward" or
"extern" keyword could be required to distinguish headers without
bodies. This would simplify the job of cc, cb, ctags, and other
programs that analyze C source files.

It should be possible to use an enumeration code (e.g., NONE) with
different values in different enumerations. Macro names must
necessarily override enumeration names, so it is probably an error to
have the same codes for both. Some type of package or union
specification is needed for enums.

An nargs() function to return the number of arguments passed to a
routine. Such a function exists in the Berkeley UNIX, but is not
documented. [The Berkeley routine actually returns the number of words
in the argument list, which can differ from the number of arguments.]

Macros that can handle a variable number of arguments. At present it
is impossible to extract some of the arguments for various purposes and
then pass the rest (however many) on to printf. It is also impossible
to replace "return" with a macro because it may or may not have an
argument.
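
For illustration only: the request for macros that peel off some
arguments and pass the rest to printf can be sketched with facilities
from later C standards, namely <stdarg.h>, vsnprintf(), and variadic
macros with __VA_ARGS__ (C99). None of this is available in the
compilers discussed here, and the names tagged_format and TAGGED are
hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "[tag] " followed by a printf-style
 * message into buf.  Returns the length that was (or would have
 * been) written, or a negative value on a formatting error. */
static int tagged_format(char *buf, size_t size, const char *tag,
                         const char *fmt, ...)
{
    va_list args;
    int used, rest;

    used = snprintf(buf, size, "[%s] ", tag);
    if (used < 0 || (size_t)used >= size)
        return used;

    /* Hand the remaining arguments, however many, to vsnprintf. */
    va_start(args, fmt);
    rest = vsnprintf(buf + used, size - (size_t)used, fmt, args);
    va_end(args);

    return rest < 0 ? rest : used + rest;
}

/* C99 variadic macro: picks off buf and tag, forwards the rest.
 * buf must be a true array so that sizeof gives its capacity. */
#define TAGGED(buf, tag, ...) \
    tagged_format((buf), sizeof (buf), (tag), __VA_ARGS__)
```

A call such as TAGGED(line, "warn", "%d of %d", 3, 7) leaves
"[warn] 3 of 7" in line. The "return" case is not covered by this
sketch, since __VA_ARGS__ needs at least the format argument here.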

An OMITTED argument code of some type that can be used to test whether
an argument to a function or macro was omitted by typing successive
commas or providing too few arguments. This might be coupled to a
default mechanism, but the user can easily write his own defaults if
the OMITTED code were implemented.

Some type of entry and exit hooks that can be used for debug tracing,
timing instrumentation, etc. It is currently awkward to intercept
return statements because they accept a variable number of arguments
(one argument or none, but not an empty argument list).

The assignment operator should have been := instead of =. Use of =
instead of == in conditionals is a common source of error.

I particularly object to the statement in the manual that "Expressions
involving a commutative and associative operator (*, +, &, |, ^) may be
rearranged arbitrarily, even in the presence of parentheses; ...".
This is inexcusable in a modern language.

I am also unhappy about the number of machine-dependent results that C
permits. (E.g., overflow and divide check, rounding of negative
numbers, mod (%) on negative numbers, sign extension on chars, sign
fill on right shift, direction of bits accessed by bit fields.)

It should be possible to put spaces before a # command for the
compiler. Also, it should be documented that spaces are legal after
the #.

Use of escaped linefeeds in a macro confuses the compiler: its
diagnostic messages do not count the continuations as lines, but vi
does count them. (This has been fixed in Berkeley 4.2.)

Fclose should be called automatically when a program terminates
abnormally. (It is already called for normal terminations.) It is
very difficult to find some bugs when buffers are not dumped. If the
program runs for a long time, it is convenient to pipe its output into
a log file instead of tying up a terminal. If the log file is not
flushed, however, this is not only unproductive; it is misleading.
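
Until abnormal termination flushes buffers by itself, a program can
protect its own log by flushing after every line. The following is a
minimal sketch of that idea using the standard setvbuf() and fflush()
calls; the helper names open_log and log_line are invented for this
example.

```c
#include <stdio.h>

/* Hypothetical helper: open a log file and ask for line buffering,
 * so that output is normally pushed out at each newline. */
static FILE *open_log(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp != NULL)
        setvbuf(fp, NULL, _IOLBF, BUFSIZ);
    return fp;
}

/* Hypothetical helper: write one line and flush it immediately, so
 * the line is already in the file if the program dies afterwards.
 * Returns 0 on success and a nonzero value on failure. */
static int log_line(FILE *fp, const char *msg)
{
    if (fprintf(fp, "%s\n", msg) < 0)
        return -1;
    return fflush(fp);
}
```

The explicit fflush() costs a system call per line, which is usually
acceptable for a log and is the price of having trustworthy output
after a crash.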

We just found another bug where setting array[4] in something declared
"int array[4]" overwrote a pointer in a distant piece of code. C ought
to offer a run-time subscript checking facility, and certainly should
have caught this compile-time error. (Hardware speed and storage are
becoming less of a consideration every year. Programming ease and
software reliability should be dominant.)

The compiler should warn about statements like "x+1;" since they can
have no side effects or other useful purpose. Most likely the
statement is intended to be "x+=1;".

The expression "(cast) (flag == 0) ? 0 : 1" applies the cast to the
boolean test rather than to the output of the conditional expression.
I would much rather see the syntax "(cast) ifv (flag == 0) thenv 0
elsev 1" where the cast applies to the final value. [I have
implemented the ifv/thenv/elsev macros, but there is no way to put
hidden parentheses around the entire construct unless one adds a
special terminator (e.g., "fi").]

The expression (A,B) returns the value of B. There needs to be a
similar syntax for those cases where the value of A is desired, and A
must be executed first. In particular, suppose that we are writing a
macro noteerr() which is supposed to evaluate its argument and take
some action based on a global return code, then return the result of
the initial evaluation as its value. For example, suppose we want to
pass some functional value, func(), to a subroutine, subr(), and that
we want to wrap the evaluation of func() in an error handler:

    subr(...,noteerr(func()),...);

There is currently no way to do this and have the whole noteerr() macro
return an object of the same type and value as func() when func() may
be of arbitrary type. It could be done if there were an (A,B) syntax
that returned the value of A.
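
When the type of func() is known in advance, the noteerr() effect can
be approximated with the ordinary comma operator and a scratch
variable. The sketch below assumes an int result; last_status,
errors_noted, noteerr_tmp, check_status, and half are all stand-ins
invented for this example. It shows exactly where the arbitrary-type
case breaks down: the scratch variable needs a fixed type.

```c
/* Stand-in for the global return code mentioned in the text. */
static int last_status = 0;

/* Counts how many times an error was noticed. */
static int errors_noted = 0;

/* Scratch variable; its fixed type (int) is the limitation. */
static int noteerr_tmp;

/* The "action based on a global return code". */
static void check_status(void)
{
    if (last_status != 0)
        errors_noted++;
}

/* Evaluate expr first, then inspect the status, then yield the
 * saved value of expr.  The comma operator guarantees this order. */
#define noteerr(expr) \
    (noteerr_tmp = (expr), check_status(), noteerr_tmp)

/* Example function that reports failure through the global code:
 * halving an odd number is flagged as an error. */
static int half(int n)
{
    last_status = (n % 2 != 0);
    return n / 2;
}
```

Here noteerr(half(10)) yields 5 and leaves errors_noted unchanged,
while noteerr(half(7)) yields 3 and increments errors_noted. The
shared scratch variable also makes the macro unsafe for reentrant use.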

The compiler should accept string continuations of the form

    printf("Beginning of string"
           " continuation of string.");

The SAIL/MAINSAIL dynamic string concatenation syntax is even more
flexible, but even this primitive convention would be adequate for
compile-time concatenation.

Every enum should have a validity checking routine, e.g. valid(...).
This would permit one to identify illegal values without converting
everything to ints. Note that valid enums are not necessarily
sequential, so that the test can be complicated. This checking cannot
be done at compile time, so it may be necessary for the user to provide
the checking routines; a pity.

I just got caught again closing a comment with \* instead of */. The
compiler just ate everything up to the next comment. I see no reason
why C can't allow nested comments and also check for proper balance of
comment delimiters.
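
As an illustration of the user-provided enum validity routine mentioned
above, here is a minimal sketch for an enumeration whose legal values
are not sequential. The enum color and the routine valid_color are
hypothetical examples, not part of any existing library.

```c
/* A hypothetical enumeration with gaps between its legal values. */
enum color { RED = 1, GREEN = 2, BLUE = 4, WHITE = 7 };

/* Hand-written validity check: returns 1 if c holds one of the
 * declared enumerators and 0 otherwise.  A range test would not do,
 * because 3, 5, and 6 lie inside the range but are not legal. */
static int valid_color(enum color c)
{
    switch (c) {
    case RED:
    case GREEN:
    case BLUE:
    case WHITE:
        return 1;
    default:
        return 0;
    }
}
```

The routine has to be kept in step with the enum declaration by hand,
which is the maintenance burden a compiler-generated valid() would
remove.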