Xref: utzoo comp.lang.c:11264 comp.arch:5426
Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!mit-eddie!uw-beaver!uw-june!pardo
From: pardo@june.cs.washington.edu (David Keppel)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Self-modifying code
Keywords: self-modifying code, generate-and-execute
Message-ID: <5262@june.cs.washington.edu>
Date: 11 Jul 88 23:16:55 GMT
References: <3353@cognos.UUCP> <619@goofy.megatest.UUCP> <429@uwovax.uwo.ca> <5254@june.cs.washington.edu> <12357@ut-sally.UUCP>
Reply-To: pardo@uw-june.UUCP (David Keppel)
Organization: U of Washington, Computer Science, Seattle
Lines: 58

nather@ut-sally.UUCP (Ed Nather) writes:
>pardo@june.cs.washington.edu (David Keppel) writes:
>>[ Out of context quote ]
>[ Funny retort. ]

Sorry.  I do like your rejoinder, though.  Could be about
self-modifying code, even :-)

> Sorry, no.  I've heard LOTS of arguments against programs that
> generate their own code, and all of them -- so far as I am aware
> -- assume the "proper" answer in generating arguments to show the
> answer is proper.  Be honest: suppose you *had* to do it this way,
> but needed a formal discipline to keep it clean and understandable,
> and *fast.*  Couldn't you find a reasonable way?

Let me rephrase my position.  There is nothing architecturally weird
about programs that generate their own code.  Doing so does not cause
any problems on any machines that I am aware of, although few
OPERATING SYSTEMS support this.

There is a *major* problem with modifying existing code.  Consider a
machine with separate I-cache and D-cache.  If you execute an
instruction at location 0x500, then one at 0x504 that branches back
to the one at 0x500, and the one at 0x500 modifies itself, either:

+ The cache can be built so that it pays attention to the changing
  data (code) in the I-cache.  This requires a "snoopy" cache, which
  is more complicated, less dense, and slower.
  If you are constrained by price or die space, you may be forced to
  use a slower, smaller (worse hit-ratio) cache.

+ You can execute out of the D-cache.  This is effectively the same
  as having a snoopy I-cache.  If you have BOTH an I-cache and a
  D-cache, you're left with coherency problems that are solved by
  building a snoopy I-cache.

+ You can define this operation to be illegal, build a smaller,
  faster, non-snoopy cache, and pay the price when you need to run
  code that could be expressed "naturally" as self-modifying code.

Assume that you have an algorithm that is, say, 1000 instructions and
that each instruction takes 4 cycles to execute (must be a RISC :-).
One particular part of it, 10 instructions, is written as
self-modifying code.  It can also be written as 100 instructions of
non-self-modifying (static) code.  Further assume that the effect of
having a non-snoopy cache is to make the computer run 10% faster
(e.g., the cache is faster AND has a better hit rate).  Then the
running time of the program on the machine with the snoopy I-cache
is:

	T = 990 * 4.0 + 10 * 4.0 = 4000 cycles

On the machine without a snoopy I-cache (10% faster),

	T = 990 * 3.6 + 100 * 3.6 = 3924 cycles

which is marginally FASTER for NOT using self-modifying code, even
though it takes an additional 90 instructions.

	;-D on  ( Cycles to burn reading flames )  Pardo