Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!mit-eddie!uw-beaver!cornell!rochester!pt.cs.cmu.edu!andrew.cmu.edu!zs01+
From: zs01+@andrew.cmu.edu (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: using assembler
Message-ID: <0X1Yh8y00Vs8QEumsN@andrew.cmu.edu>
Date: 15 Aug 88 02:24:08 GMT
References: <6341@bloom-beacon.MIT.EDU> <60859@sun.uucp> <474@m3.mfci.UUCP> <2926@utastro.UUCP> <37014@linus.UUCP> <1086@garth.UUCP>,
	<17326@gatech.edu>
Organization: Carnegie Mellon
Lines: 80
In-Reply-To: <17326@gatech.edu>

> *Excerpts from ext.nn.comp.arch: 27-Jul-88 Re: using assembler Ken Seefried*
> *iii@gatech. (2803)*

> And C is a cakewalk?  Fine, I'll tell you what...let's get together some
> time and program, say, a MIPS machine, you in asm, and me in  C.
> I'll even tie one hand behind my back and program in FORTRAN.  And
> we'll see how far each gets (the MIPS is a RISC machine.  Even trivial
> operations require non-trivial amounts of asm code).

I am not sure which instructions you are referring to as requiring non-trivial
amounts of code to synthesize. Are they things I am likely to use quite often?

Here are some reasons why the MIPS R2000/R3000 should be quite reasonable for
assembly language programming:

Simplicity:

    3 hours with "MIPS R2000 RISC Architecture" by Gerry Kane and I understood
    (or "grokked" if you prefer) the R2000 and R2010 (floating point unit).
    There were none of those "What would I use that instruction for?" type
    questions going through my mind.

Orthogonal register set:

    One register is pretty much as good as another. None of this "Where am I
    going to put the CX register so I can do a shift" crud you run into on the
    earlier Intel beasties.

Abundance of registers:

    You get ten or twelve (depending on whether or not you count v0 and v1)
    unsaved registers to use as temporaries. (Actually, you can add in four to
    that for the argument passing registers a0-a3.)

Arguments passed in registers:

    Many routines will not need to allocate a stack frame at all. This frees
    you from having to deal with the calling convention a lot of the time.

Single cycle instructions:

    You don't have to have an instruction timing table handy to write efficient
    code. Almost every instruction takes one cycle. The only exceptions I know
    of are multiply/divide, loads/stores, and branches. (And of course floating
    point.)

Intelligent assembler:

    The assembler removes the burden of scheduling delay slots from the
    programmer. The assembler can also synthesize addressing modes for the
    programmer.

Of course I don't write entire programs in assembly. (For many reasons, most of
which can be summed up by saying "Assembly language is just the wrong level of
abstraction.") I occasionally find it necessary to write a routine or two in
assembly either because high level languages can't do what I need, or because I
need extreme speed. Examples of where this has come up in practice are dynamic
loading and DES encryption.

We have a dynamic loading system which uses a "link snapping" mechanism. This
means that when you call a routine that hasn't been loaded yet, you wind up in
some trampoline code that loads the routine, fixes the original reference to
the routine to point to the newly loaded code, and finally jumps to the new
routine. Since there is no way to jump to a routine in C, this trampoline code
must be written in assembly.
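
For readers who want the shape of link snapping, here is a minimal sketch in C.
It is not the loader described above: every name in it (routine_slot,
load_routine, real_routine, and so on) is hypothetical, and the "loader" is a
stand-in that just returns the address of a function already in the program.
It also shows the limitation mentioned above: C can only call the target with
a signature it knows in advance, it cannot jump to it with the original
arguments left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; a sketch of the idea, not a real loader. */
typedef int (*routine_fn)(int);

static int load_count = 0;          /* how many times the stand-in loader ran */

/* Stands in for the routine that would really be loaded on demand. */
static int real_routine(int x)
{
    return x * 2;
}

/* Stands in for the dynamic loader: returns the routine's address. */
static routine_fn load_routine(void)
{
    load_count++;
    return real_routine;
}

static int trampoline(int x);

/* Every caller goes through this slot. It starts out aimed at the trampoline. */
static routine_fn routine_slot = trampoline;

/* First call lands here: load the routine, snap the link, forward the call. */
static int trampoline(int x)
{
    routine_fn target = load_routine();
    assert(target != NULL);
    routine_slot = target;          /* later calls bypass the trampoline */
    return target(x);               /* a call, not a jump: signature is fixed */
}

int call_routine(int x)
{
    return routine_slot(x);
}

int loads_so_far(void)
{
    return load_count;
}
```

The first call_routine() goes through the trampoline and runs the loader once.
Every call after that goes straight to real_routine through the patched slot.
A real trampoline has to work for any argument list, which is why it ends up
in assembly.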

In the DES case, assembly can win big because DES is essentially a bunch of bit
manipulations on a small block of data (64 bits if I remember correctly.) In
assembly, the entire block of data can be loaded into the register file and
manipulated. The lack of loads and stores during the manipulation makes the
encryption run much faster. (I have yet to run into a C compiler that is tense
enough to do this. Maybe someday, one will exist.) Most people have decided
that the portability loss of assembly is not worth the speed gain for DES code.
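
To make the "keep the block in registers" idea concrete, here is a rough
sketch in C. It is a toy Feistel cipher, not DES: the round function and the
names (toy_round, toy_encrypt, toy_decrypt, ROUNDS) are made up for
illustration. The block is read into two locals once, worked on for every
round, and written back once. Whether the locals actually stay in registers is
up to the compiler, and the round keys are still fetched from memory here.

```c
#include <assert.h>
#include <stdint.h>

#define ROUNDS 16

/* Toy round function, NOT the DES f-function: rotate, add key, xor. */
static uint32_t toy_round(uint32_t half, uint32_t key)
{
    uint32_t rot = (half << 5) | (half >> 27);
    return (rot + key) ^ (half >> 3);
}

/* Encrypt a 64-bit block held as two 32-bit halves. */
void toy_encrypt(uint32_t block[2], const uint32_t keys[ROUNDS])
{
    assert(block && keys);
    uint32_t left = block[0];       /* load the block once */
    uint32_t right = block[1];
    for (int i = 0; i < ROUNDS; i++) {
        uint32_t tmp = right;
        right = left ^ toy_round(right, keys[i]);
        left = tmp;
    }
    block[0] = left;                /* store the block once */
    block[1] = right;
}

/* Undo toy_encrypt by running the rounds backwards. */
void toy_decrypt(uint32_t block[2], const uint32_t keys[ROUNDS])
{
    assert(block && keys);
    uint32_t left = block[0];
    uint32_t right = block[1];
    for (int i = ROUNDS - 1; i >= 0; i--) {
        uint32_t tmp = left;
        left = right ^ toy_round(left, keys[i]);
        right = tmp;
    }
    block[0] = left;
    block[1] = right;
}
```

Encrypting a block and then decrypting it with the same keys gives back the
original two words. In hand-written assembly the same structure lets you pin
both halves, and possibly part of the key schedule, in named registers for the
whole computation.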

I have never actually programmed on the MIPS machine. I have however written
assembly code for the IBM RT which has some of the same features. (Notably
passing arguments in registers.) I have had a much easier time on the RT than
on either the VAX, the 68000, or the 8086. (Granted the 68020 and the 80386 fix
a few of my complaints with these processor families.)

In short, a processor's machine language ought to be simple, regular, and damn
fast.

Sincerely,
Zalman Stern
Internet: zs01+@andrew.cmu.edu     Usenet: I'm soooo confused...
Information Technology Center, Carnegie Mellon, Pittsburgh, PA 15213-3890