Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!decvax!decwrl!labrea!navajo!rokicki
From: rokicki@navajo.STANFORD.EDU (Tomas Rokicki)
Newsgroups: comp.sys.amiga
Subject: profiler (1/5) readme
Message-ID: <1245@navajo.STANFORD.EDU>
Date: Tue, 23-Dec-86 14:23:36 EST
Article-I.D.: navajo.1245
Posted: Tue Dec 23 14:23:36 1986
Date-Received: Tue, 23-Dec-86 23:52:23 EST
Organization: Stanford University
Lines: 179

---cut here---
                        Profiler for Manx 3.30e, v1.0
                        (C) 1986  Radical Eye Software

Introduction and Overview

    These two programs, p1 and p2, comprise a profiler for programs compiled
with Manx 3.30e or later.  With this tool, those sections of code executed the
most can be identified, so the program can be sped up with a minimum of
effort.

   The goal was to have a profiler with the following characteristics:

      -  Does not slow the program down excessively
      -  Does not require a recompilation of the program
      -  Does not take over system resources the program might need
      -  Does not require excessive disk storage
      -  Keeps invocation counts
      -  Keeps time for self
      -  Keeps time for self plus children

   These goals have been attained, and though the results of the profiler
should be treated with some caution, it does perform a valuable service
currently lacking on the Amiga.

   The first program, p1, takes an executable and a link symbol file
created with the `-t' option of the Manx linker, and creates a new
executable with traps substituted for the link and unlink instructions,
and a data file for the second phase.  The second program, p2, resides in
memory as the modified executable is running, servicing the traps and
collecting usage information.  To collect routine timings, the second
phase also runs a fast real-time clock driven by the vertical beam
position and the vertical blanking interrupt.

Files

   To create these programs, you will need the following files:

   readme
   makefile
   p1.c
   p2.c
   p2a.asm

   Run make with the supplied makefile to build the three source files; you
should get two executables, p1 and p2, which can be moved into your c:
directory.  The sizes of these two programs should be approximately 9K and
12K respectively.

Usage

   These programs do very dangerous things, so they should be treated
carefully.  If the instructions below are not followed, system crashes may
result.

   The first step is to build the executable of the program you want to
profile, along with its symbol file.  To do this, run your normal link
step, but give the `-t' option to the linker as well.  If your program
is named `foo', this should create the files `foo' and `foo.sym'.  If you
are developing a piece of code and expect to profile it occasionally,
you might leave the `-t' option in your makefile, as the resulting
executable will be no different, and the `.sym' file is small.

   Next, you will need to create the modified executable for tracing.  This
new executable will contain trap instructions at each routine's entrance and
exit.  To do this, simply type `p1 foo'.  The program p1 will read `foo' and
`foo.sym', and create `foo.exe', the new executable, and `foo.pdt', a profile
data file for the second phase.  You should not run `foo.exe' by itself.
If your `foo.sym' file is not up to date with your `foo' executable, `p1'
will complain, in which case you should relink `foo' with the `-t' option.

   Finally, you should perform the actual profiling step.  First, invoke
p2 in the background by typing `run p2 foo'.  Wait for p2 to open its window
before proceeding.  Note that you must supply the name `foo', and it must
match the program you expect to profile.  The program p2 will read in
`foo.pdt', and start the real time clock.

   Once p2 has opened its window, you are ready to start profiling.  Simply
run your program as you normally would, only invoke `foo.exe' instead of
`foo'.  The new executable will perform in every respect the same as the
old one, only more slowly because of the tracing.  Run `foo.exe' as many
times as you want; the profiling information will continue to accumulate
as long as `p2' is running.  It is recommended that you profile at least
twenty minutes of the program's execution to allow for sampling inaccuracies.
If you want to start profiling fresh, simply select the `initialize' gadget
on p2's window.

   After you are finished, and `foo.exe' has exited, you may select the
`finish' gadget on p2's window.  This will print a summary of the results
to a file called `foo.mon', and exit.  Do not select this gadget while
`foo.exe' is still running; the system will crash!

   The file `foo.mon' may now be examined for the results of the tracing.

Theory of Operation

   Under Manx C, each C routine has exactly one link instruction, at the
beginning of the code, and one unlink instruction, at its exit point.
(Multiple returns are handled by branches to the single exit.)  These links
and unlinks are replaced by `trap #3's, followed by an integer representing
the number of the procedure.  (These numbers are assigned by p1.)  Thus, as
the program is executing, these traps occur, and call the trap handling
routine set up by p2.

   To locate the procedures, and to assign names, the `foo.sym' file is
read in.  This file contains a list of each routine and its address in the
executable.  A link instruction should exist at the exact beginning of each
routine, and an unlink followed by an rts somewhere within the routine.  These
are scanned for and converted.
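
   The scan-and-convert step described above can be sketched roughly as
follows.  This is not the code from p1.c; it is a minimal illustration in
present-day C, and the names `patch_routine' and `saved_link' are
hypothetical.  It assumes the standard 68000 encodings (link An is 0x4E50+n
followed by a 16-bit displacement, unlk An is 0x4E58+n, rts is 0x4E75,
trap #3 is 0x4E43), that the routine number fits in the 16 bits freed by
each replaced instruction pair, and that the original link words are saved
somewhere so the trap handler can emulate them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_MASK  0xFFF8u   /* ignore the address register field      */
#define OP_LINK  0x4E50u   /* link An,#disp (disp is the next word)  */
#define OP_UNLK  0x4E58u   /* unlk An                                */
#define OP_RTS   0x4E75u   /* rts                                    */
#define OP_TRAP3 0x4E43u   /* trap #3                                */

/* The 68000 is big-endian, so read and write words byte by byte. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFFu);
}

/* Patch the routine occupying code[start..end).  Returns 0 on success,
 * or -1 if no link is found at the start or no unlk/rts pair follows.
 * saved_link receives the original link opcode and displacement
 * (an assumption: the handler would need them to emulate the link). */
int patch_routine(uint8_t *code, size_t start, size_t end,
                  uint16_t number, uint16_t saved_link[2])
{
    size_t i;

    assert(start % 2 == 0 && end >= start);
    if (end - start < 8)                      /* link (4) + unlk, rts (4) */
        return -1;
    if ((get16(code + start) & OP_MASK) != OP_LINK)
        return -1;

    /* Word-by-word search for unlk immediately followed by rts.  A real
     * tool must beware of operand words that happen to look like this. */
    for (i = start + 4; i + 4 <= end; i += 2) {
        if ((get16(code + i) & OP_MASK) == OP_UNLK &&
            get16(code + i + 2) == OP_RTS)
            break;
    }
    if (i + 4 > end)
        return -1;

    saved_link[0] = get16(code + start);
    saved_link[1] = get16(code + start + 2);

    put16(code + start, OP_TRAP3);            /* entry: trap #3, number */
    put16(code + start + 2, number);
    put16(code + i, OP_TRAP3);                /* exit:  trap #3, number */
    put16(code + i + 2, number);
    return 0;
}
```

   A caller would invoke this once per entry in the symbol list, passing the
routine's address as `start' and the next routine's address as `end'.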

   To calculate the run time of each procedure, a fast free-running clock
is also generated by p2.  The vertical beam position counter is used as the
eight least significant bits.  A vertical blanking interrupt routine is set up
to supply the rest of the bits; this gives a 32-bit counter with resolution
of approximately 65 microseconds.
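
   As a rough sketch of how such a counter could be put together (this is
not the code from p2; the names `make_ticks' and `ticks_to_us' are
hypothetical, and a 60 Hz vertical blank is assumed), the low eight bits
come from the beam position and the interrupt count supplies the rest.
Under that assumption one frame spans 256 ticks, so one tick is about
1/60/256 of a second, or roughly 65 microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a vertical-blank interrupt count with the
 * low eight bits of the vertical beam position into one 32-bit tick. */
uint32_t make_ticks(uint32_t vblank_count, uint16_t beam_line)
{
    return (vblank_count << 8) | (uint32_t)(beam_line & 0xFFu);
}

/* Convert ticks to microseconds, assuming frame_hz vertical blanks per
 * second and 256 ticks per frame. */
double ticks_to_us(uint32_t ticks, double frame_hz)
{
    assert(frame_hz > 0.0);
    return (double)ticks * (1000000.0 / frame_hz / 256.0);
}
```

   The elapsed time between two readings is then the unsigned difference of
the two tick values, which remains correct across a wraparound of the
32-bit counter.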

Limitations

   Currently this program only works under Manx C, version 3.30e or later.
It can be modified for other compilers quite easily, however; p1 should be
the only routine which needs changes.  A list of the routine names and
their locations is needed by p1, and (hopefully) a single link and unlink
instruction will reside in each subroutine.

   The executable part of the program must reside in a single hunk; this
is an arbitrary limitation which simply exists in the code at the moment.
(I don't have any programs which I am compiling for scatter loading or
overlays.)  Again, simple changes to p1 should eliminate this problem.

   Recursion is handled, albeit not correctly.  The time for the self
is correct, but the time for self plus children is too large.  For instance,
if A calls itself recursively once, then the second invocation of A will
be counted twice in the self plus children category.  The necessary data
structures are set up to handle this problem, but the code is not there.
It is simple to add.
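
   One way the missing code might look is sketched below.  This is an
illustration only, not the data structures used in p2: the names
`on_enter', `on_exit', `stats' and `stack' are hypothetical, and the code
is written in present-day C.  Each routine carries an activation count,
and its elapsed time is added to the self plus children total only when
the outermost activation returns.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ROUTINES 64
#define MAX_DEPTH    256

struct routine_stats {
    uint32_t calls;        /* invocation count                      */
    uint32_t self_ticks;   /* time in the routine itself            */
    uint32_t total_ticks;  /* time for self plus children           */
    uint32_t active;       /* activations currently on the stack    */
};

struct frame {
    int      routine;      /* routine number                        */
    uint32_t entered;      /* clock value at entry                  */
    uint32_t child_ticks;  /* time spent in routines called from it */
};

static struct routine_stats stats[MAX_ROUTINES];
static struct frame stack[MAX_DEPTH];
static int depth;

/* Called for the trap at a routine's entry. */
void on_enter(int routine, uint32_t now)
{
    assert(routine >= 0 && routine < MAX_ROUTINES && depth < MAX_DEPTH);
    stats[routine].calls++;
    stats[routine].active++;
    stack[depth].routine = routine;
    stack[depth].entered = now;
    stack[depth].child_ticks = 0;
    depth++;
}

/* Called for the trap at a routine's exit. */
void on_exit(uint32_t now)
{
    struct frame *f;
    struct routine_stats *s;
    uint32_t elapsed;

    assert(depth > 0);
    f = &stack[--depth];
    s = &stats[f->routine];
    elapsed = now - f->entered;

    s->self_ticks += elapsed - f->child_ticks;
    if (--s->active == 0)          /* count only the outermost activation */
        s->total_ticks += elapsed;
    if (depth > 0)                 /* charge this call to the caller      */
        stack[depth - 1].child_ticks += elapsed;
}
```

   With A entered at tick 0, entered again at 10, and the two activations
returning at 30 and 50, this charges A 50 ticks in both categories; without
the activation count the inner 20 ticks would be added to the total twice.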

   Non-local jumps (setjmp/longjmp) are not handled correctly.  Avoid using
the profiler on code which has these.  The fix is to trap these specially,
and to do as many `virtual unlinks' as necessary to keep the profiling on
course.  Again, I don't need this.

   The program must exit by calling _exit() eventually.  I flag this as a
special case, and use it to `virtually unlink' all currently active routines.
By `virtual unlinks', I mean pretend the routine has exited and calculate the
statistics for it.  If the program does not exit by calling _exit(), then all
statistics for the current invocation of currently active routines when the
program does exit will be lost.  In addition, the stack maintained by the
profiler will start to creep; stack checking is done, so the program should
not crash, but if the stack hits bottom, the data will be invalid.  The fix
for this is to trap other possible exit routes and treat them the same way I
treat _exit.
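
   The `virtual unlink' idea can be sketched as below.  Again this is only
an illustration with hypothetical names (`push_frame', `virtual_unlink_to',
`total_ticks'), not the code in p2.  It assumes the profiler keeps its own
stack of active routines; an exit path would unwind to depth zero, and a
longjmp fix would unwind to the depth recorded at the matching setjmp.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ROUTINES 64
#define MAX_DEPTH    256

struct frame {
    int      routine;   /* routine number        */
    uint32_t entered;   /* clock value at entry  */
};

static struct frame stack[MAX_DEPTH];
static int depth;
static uint32_t total_ticks[MAX_ROUTINES];

/* Record a routine entry on the profiler's own stack. */
void push_frame(int routine, uint32_t now)
{
    assert(routine >= 0 && routine < MAX_ROUTINES && depth < MAX_DEPTH);
    stack[depth].routine = routine;
    stack[depth].entered = now;
    depth++;
}

/* Pretend that every routine above target_depth returned at time `now',
 * and account for its elapsed time. */
void virtual_unlink_to(int target_depth, uint32_t now)
{
    assert(target_depth >= 0 && target_depth <= depth);
    while (depth > target_depth) {
        depth--;
        total_ticks[stack[depth].routine] += now - stack[depth].entered;
    }
}
```

   Calling `virtual_unlink_to(0, now)' from whatever hook catches an exit
route would also keep the profiler's stack from creeping between runs.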

   Static routines are not traced (they are invisible; their names never
reach the link stage).  Lower level routines and routines without link
and unlink instructions are not traced either.

   The time values are real time.  Thus, any time waiting for user interaction
is charged to the procedure as if it were running real code.  If this is a
problem, either localize user interaction to a single routine (so that its
value can be ignored) or supply interaction very quickly.  Or, if anyone out
there is clever, they can figure out a way to stop or fudge the clock while
another task is running.

   Related to the above, the profiling time is not subtracted.  Thus,
routines which don't do much and return immediately will have slightly
higher time values than they should.  I've tried to keep this factor to a
minimum, but since it will differ on different machines (68010, 68020,
fast memory, etc.), I've decided not to factor it out.

Extensions

   If anyone makes any extensions or modifications to this program, or you
just want to send kudos or flames, I can be reached at 326-5312 (home),
Box 2081, Stanford, CA  94305 (mail), rokicki@sushi.stanford.edu (arpanet),
or ...!decwrl!navajo.stanford.edu!rokicki (usenet).  Please contact me
before distributing extensions; I'll try to coordinate these.  Enjoy!

                                                            -Tom Rokicki
---cut here---