Xref: utzoo comp.lang.c++:2174 comp.lang.c:14483 comp.lang.forth:712 comp.lang.fortran:1579 comp.lang.misc:2265
Newsgroups: comp.lang.c++,comp.lang.c,comp.lang.forth,comp.lang.fortran,comp.lang.misc
Path: utzoo!henry
From: henry@utzoo.uucp (Henry Spencer)
Subject: Re: Assembly or ....
Message-ID: <1988Dec3.220546.28830@utzoo.uucp>
Organization: U of Toronto Zoology
References: <1388@aucs.UUCP> <729@convex.UUCP> <1961@crete.cs.glasgow.ac.uk> <1988Nov29.181235.23628@utzoo.uucp> <960@vsi.COM>
Date: Sat, 3 Dec 88 22:05:46 GMT

In article <960@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:
>> Alas, if you buy your newer faster CPUs from Motorola or Intel, they can't
>> tell you how many cycles each instruction takes!
>
>Why is this?  When I was hacking on the VAX, nobody could ever
>tell me how long anything took, and empirical measurements were
>pretty tedious.  Is it laziness on the vendor's part or are there
>good reasons for this?

Both.  On simple machines like an 8080, which do one thing at a time
and do not stress their memory systems, it's easy to say how many
cycles a given instruction takes.  Make the instruction set hideously
complex, like the one on the VAX, and timing information gets very
bulky.  (Worse, it becomes heavily model-dependent, because different
models of the CPU implement details differently.)

Boost the clock rate and add a cache, and all of a sudden memory-access
times are effectively non-deterministic: the time taken for an
instruction is a function of whether its memory fetches come from the
cache or not.  Add prefetch, and execution times can be a function of
the preceding instructions, because their memory-access patterns
determine whether the prefetcher can sneak the instruction fetch in
without stalling the execution unit.  Add pipelining, and this gets a
dozen times worse, because now instruction time is a complex function
of both preceding and following instructions and how they fight each
other for machine resources.
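
To make the cache point concrete, here is a minimal sketch of a toy
direct-mapped cache that charges a different cycle count for a hit and
for a miss.  Everything in it is an assumption made up for illustration
(the names toy_cache, toy_load_cycles and toy_run, the cache geometry,
and the 1-cycle and 10-cycle costs); none of it describes a real CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants only; not figures for any real machine. */
enum {
    CACHE_LINES = 8,   /* lines in the toy direct-mapped cache */
    LINE_BYTES  = 16,  /* bytes per cache line */
    HIT_CYCLES  = 1,   /* assumed cost of a load that hits the cache */
    MISS_CYCLES = 10   /* assumed cost of a load that goes to memory */
};

struct toy_cache {
    int           valid[CACHE_LINES];
    unsigned long tag[CACHE_LINES];
};

/* Mark every line empty (a "cold" cache). */
static void toy_cache_reset(struct toy_cache *c)
{
    for (int i = 0; i < CACHE_LINES; i++)
        c->valid[i] = 0;
}

/* Cycles charged for one load from addr; fills the line on a miss. */
static int toy_load_cycles(struct toy_cache *c, unsigned long addr)
{
    unsigned long block = addr / LINE_BYTES;
    unsigned long index = block % CACHE_LINES;
    unsigned long tag   = block / CACHE_LINES;

    if (c->valid[index] && c->tag[index] == tag)
        return HIT_CYCLES;

    c->valid[index] = 1;
    c->tag[index]   = tag;
    return MISS_CYCLES;
}

/* Total cycles for n loads starting at base, stride bytes apart. */
static long toy_run(struct toy_cache *c, unsigned long base,
                    unsigned long stride, size_t n)
{
    long total = 0;

    assert(c != NULL);
    for (size_t i = 0; i < n; i++)
        total += toy_load_cycles(c, base + i * stride);
    return total;
}
```

Under these assumed costs, toy_run(&c, 0, 4, 4) on a freshly reset
cache costs 13 cycles (one miss, three hits), and the identical call
repeated immediately costs 4.  Four loads 128 bytes apart on a cold
cache cost 40, since they all collide on one line.  The "time per
load" is a property of the history, not of the instruction.  The model
ignores prefetch and pipelining, which add a further dependence on
neighboring instructions.
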
(For example, the register-to-register move time on a 68020 is often
zero, because the instruction completely disappears in overlap with
neighboring instructions.)

All of this means that supplying useful timing information for a
cached, pipelined, prefetching CISC is hard.  Supplying anything
halfway accurate is a lot of work, and it may be encrusted with so
many ifs, buts, and maybes that it isn't very useful.  This encourages
manufacturers to be lazy.

There are also more cynical motives that may be involved, such as
making life harder for third-party compiler suppliers, or a deliberate
policy of discouraging model-specific programming (after all, if the
customer isn't happy with the performance, he can always buy a bigger
machine, and this way there's only one version of the software to
maintain).
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu