Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!decvax!decwrl!labrea!navajo!rokicki
From: rokicki@navajo.STANFORD.EDU (Tomas Rokicki)
Newsgroups: comp.sys.amiga
Subject: profiler (1/5) readme
Message-ID: <1245@navajo.STANFORD.EDU>
Date: Tue, 23-Dec-86 14:23:36 EST
Article-I.D.: navajo.1245
Posted: Tue Dec 23 14:23:36 1986
Date-Received: Tue, 23-Dec-86 23:52:23 EST
Organization: Stanford University
Lines: 179

---cut here---
Profiler for Manx 3.30e, v1.0
(C) 1986 Radical Eye Software

Introduction and Overview

These two programs, p1 and p2, comprise a profiler for programs
compiled with Manx 3.30e or later.  With this tool, those sections of
code executed the most can be identified, so the program can be sped
up with a minimum of effort.  The goal was to have a profiler with the
following characteristics:

   - Does not slow the program down excessively
   - Does not require a recompilation of the program
   - Does not take over system resources the program might need
   - Does not require excessive disk storage
   - Invocation counts should be kept
   - Time for self should be kept
   - Time for self plus children kept

These goals have been attained, and though the results of the profiler
should be treated with some caution, it does perform a valuable
service currently lacking on the Amiga.

The first program, p1, takes an executable and a link symbol file
created with the `-t' option of the Manx linker, and creates a new
executable with traps substituted for the link and unlink
instructions, and a data file for the second phase.

The second program, p2, resides in memory as the modified executable
is running, servicing the traps and collecting usage information.  To
collect routine timings, the second phase also runs a fast real-time
clock driven by the vertical beam position and the vertical blanking
interrupt.
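
The substitution p1 performs can be pictured with a small sketch.
This is not the code in p1.c; it only shows how the 68000 link,
unlk/rts and trap #3 opcodes might be recognized and overwritten in a
code buffer.  The helper names (word_at, is_link, is_unlk_rts,
find_exit, patch_trap) are hypothetical, and the 16-bit width of the
procedure number is an assumption.  What p1 does with the link
displacement that gets overwritten is not covered in this readme.

```c
#include <stddef.h>
#include <stdint.h>

/* 68000 opcode patterns; the address register is in the low 3 bits. */
#define OP_LINK  0x4E50u  /* link An,#disp16: opcode word + 16-bit disp */
#define OP_UNLK  0x4E58u  /* unlk An */
#define OP_RTS   0x4E75u  /* rts */
#define OP_TRAP3 0x4E43u  /* trap #3 */

/* Read a big-endian 16-bit word, as stored in a 68000 code hunk. */
static uint16_t word_at(const uint8_t *code, size_t off)
{
    return (uint16_t)((code[off] << 8) | code[off + 1]);
}

/* True if a link instruction (any address register) starts at off. */
static int is_link(const uint8_t *code, size_t len, size_t off)
{
    return off + 4 <= len && (word_at(code, off) & 0xFFF8u) == OP_LINK;
}

/* True if an unlk immediately followed by rts starts at off. */
static int is_unlk_rts(const uint8_t *code, size_t len, size_t off)
{
    return off + 4 <= len
        && (word_at(code, off) & 0xFFF8u) == OP_UNLK
        && word_at(code, off + 2) == OP_RTS;
}

/* Scan word by word past the link for the routine's unlk/rts pair.
 * Returns its offset, or -1 if none is found.  A plain word scan can
 * be fooled by operand words that happen to look like opcodes. */
static long find_exit(const uint8_t *code, size_t len, size_t start)
{
    size_t off;
    for (off = start + 4; off + 4 <= len; off += 2)
        if (is_unlk_rts(code, len, off))
            return (long)off;
    return -1;
}

/* Overwrite four bytes at off with trap #3 and a procedure number
 * (assumed here to fit in 16 bits). */
static void patch_trap(uint8_t *code, size_t off, uint16_t proc)
{
    code[off]     = (uint8_t)(OP_TRAP3 >> 8);
    code[off + 1] = (uint8_t)(OP_TRAP3 & 0xFFu);
    code[off + 2] = (uint8_t)(proc >> 8);
    code[off + 3] = (uint8_t)(proc & 0xFFu);
}
```

Given a routine's start offset from the symbol file, one would check
is_link() there, call find_exit() to locate the exit, and then call
patch_trap() on both offsets with the same procedure number.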

Files

To create these programs, you will need the following files:

   readme
   makefile
   p1.c
   p2.c
   p2a.asm

Run the makefile over the three source files, and you should get two
executables, p1 and p2, which can be moved into your c: directory.
The sizes of these two programs should be approximately 9K and 12K
respectively.

Usage

These programs do very dangerous things, so they should be treated
carefully.  If the instructions below are not followed, system crashes
may result.

The first step is to make the executable of the program you want to
profile, and its symbol file.  To do this, simply execute your normal
link step, but give the `-t' option to the linker as well.  If your
program is named `foo', this should create the files `foo' and
`foo.sym'.  If you are developing a piece of code and feel you might
profile it occasionally, you might add the `-t' option to your
makefile, as the resulting executable will be no different, and the
`.sym' file is small.

Next, you will need to create the modified executable for tracing.
This new executable will contain trap instructions at each routine's
entrance and exit.  To do this, simply type `p1 foo'.  The program p1
will read `foo' and `foo.sym', and create `foo.exe', the new
executable, and `foo.pdt', a profile data file for the second phase.
You should not run `foo.exe' by itself.  If your `foo.sym' file is not
up to date with your `foo' executable, `p1' will complain, in which
case you should relink `foo' with the `-t' option.

Finally, you should perform the actual profiling step.  First, invoke
p2 in the background by typing `run p2 foo'.  Wait for p2 to open its
window before proceeding.  Note that you must supply the name `foo',
and it must match the program you expect to profile.  The program p2
will read in `foo.pdt', and start the real time clock.

Once p2 has opened its window, you are ready to start profiling.
Simply run your program as you normally would, only invoke `foo.exe'
instead of `foo'.
The new executable will perform in every respect the same as the old
one, only more slowly because of the tracing.  Run `foo.exe' as many
times as you want; the profiling information will continue to
accumulate as long as `p2' is running.  It is recommended that you
profile at least twenty minutes of the program's execution to allow
for sampling inaccuracies.  If you want to start profiling fresh,
simply select the `initialize' gadget on p2's window.

After you are finished, and `foo.exe' has exited, you may select the
`finish' gadget on p2's window.  This will print a summary of the
results to a file called `foo.mon', and exit.  Do not select this
gadget when `foo.exe' is still running; the system will crash!  The
file `foo.mon' may now be examined for the results of the tracing.

Theory of Operation

Under Manx C, each C routine has exactly one link instruction, at the
beginning of the code, and one unlink instruction, at its exit point.
(Multiple returns are handled by branches to the single exit.)  These
links and unlinks are replaced by `trap #3's, followed by an integer
representing the number of the procedure.  (These numbers are assigned
by p1.)  Thus, as the program is executing, these traps occur, and
call the trap handling routine set up by p2.

To locate the procedures, and to assign names, the `foo.sym' file is
read in.  This file contains a list of each routine and its address in
the executable.  A link instruction should exist at the exact
beginning of each routine, and an unlink followed by an rts sometime
within the routine.  These are scanned for and converted.

To calculate the run time of each procedure, a fast free-running clock
is also generated by p2.  The vertical beam position counter is used
as the eight least significant bits.  A vertical blanking interrupt
routine is set up to supply the rest of the bits; this gives a 32-bit
counter with resolution of approximately 65 microseconds.
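
The way the two sources combine into one clock value can be sketched
as follows.  This is only an illustration, not the code in p2.c or
p2a.asm: the function names are hypothetical, the two inputs stand in
for the interrupt-maintained count and the beam position read from
hardware, and the 65 microsecond figure is the approximate resolution
quoted above.

```c
#include <stdint.h>

/* Approximate length of one clock tick, from the text above. */
#define USEC_PER_TICK 65u

/* Build a 32-bit tick value: the count kept by the vertical blanking
 * interrupt supplies the upper 24 bits, and the low eight bits of the
 * vertical beam position supply the bottom 8 bits. */
static uint32_t make_tick(uint32_t vblank_count, uint8_t beam_low8)
{
    return (vblank_count << 8) | beam_low8;
}

/* Ticks elapsed between two readings.  Unsigned subtraction gives the
 * right answer even if the 32-bit counter wrapped in between. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks to approximate microseconds; 64 bits avoid overflow. */
static uint64_t ticks_to_usec(uint32_t ticks)
{
    return (uint64_t)ticks * USEC_PER_TICK;
}
```

A routine's time is then elapsed_ticks() between the reading taken at
its entry trap and the reading taken at its exit trap.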

Limitations

Currently this program only works under Manx C, version 3.30e or
later.  It can be modified for other compilers quite easily, however;
p1 should be the only routine which needs changes.  A list of the
routine names and their locations is needed by p1, and (hopefully) a
single link and unlink instruction will reside in each subroutine.

The executable part of the program must reside in a single hunk; this
is an arbitrary limitation which simply exists in the code at the
moment.  (I don't have any programs which I am compiling for scatter
loading or overlays.)  Again, simple changes to p1 should eliminate
this problem.

Recursion is handled, albeit not correctly.  The time for self is
correct, but the time for self plus children is too large.  For
instance, if A calls itself recursively once, then the second
invocation of A will be counted twice in the self plus children
category.  The necessary data structures are set up to handle this
problem, but the code is not there.  It is simple to add.

Non-local jumps (setjmp/longjmp) are not handled correctly.  Avoid
using the profiler on code which has these.  The fix is to trap these
specially, and to do as many `virtual unlinks' as necessary to keep
the profiling on course.  Again, I don't need this.

The program must exit by calling _exit() eventually.  I flag this as a
special case, and use it to `virtually unlink' all currently active
routines.  By `virtual unlinks', I mean pretend the routine has exited
and calculate the statistics for it.  If the program does not exit by
calling _exit(), then all statistics for the current invocation of
currently active routines when the program does exit will be lost.  In
addition, the stack maintained by the profiler will start to creep;
stack checking is done, so the program should not crash, but if the
stack hits bottom, the data will be invalid.  The fix for this is to
trap other possible exit routes and treat them the same way I treat
_exit.
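
The bookkeeping behind `virtual unlinks' can be sketched with a small
shadow stack.  This is an illustration under assumed names and sizes,
not the data structures in p2.c: ProcStats, Frame, on_link, on_unlink
and on_exit_trap are all hypothetical, and times are plain tick
counts.  Note that it charges a recursive routine twice in the self
plus children total, which is the limitation described above.

```c
#include <stdint.h>

#define MAX_PROCS 64   /* assumed table size */
#define MAX_DEPTH 256  /* assumed shadow stack size */

typedef struct { uint32_t calls, self, total; } ProcStats;
typedef struct { int proc; uint32_t entered, child; } Frame;

static ProcStats stats[MAX_PROCS];
static Frame stack[MAX_DEPTH];
static int depth;     /* number of frames currently on the stack */
static int overflow;  /* entries seen while the stack was full */

/* Entry trap: count the call and push a frame. */
static void on_link(int proc, uint32_t now)
{
    stats[proc].calls++;
    if (depth >= MAX_DEPTH) {  /* stack check: drop timing, no crash */
        overflow++;
        return;
    }
    stack[depth].proc = proc;
    stack[depth].entered = now;
    stack[depth].child = 0;
    depth++;
}

/* Exit trap: pop the frame and charge its time. */
static void on_unlink(uint32_t now)
{
    Frame *f;
    uint32_t elapsed;

    if (overflow > 0) {  /* matches an entry that was never pushed */
        overflow--;
        return;
    }
    if (depth == 0)
        return;
    f = &stack[--depth];
    elapsed = now - f->entered;
    stats[f->proc].self  += elapsed - f->child;  /* time for self */
    stats[f->proc].total += elapsed;             /* self plus children */
    if (depth > 0)
        stack[depth - 1].child += elapsed;       /* charge the caller */
}

/* _exit() seen: virtually unlink every routine that is still active. */
static void on_exit_trap(uint32_t now)
{
    overflow = 0;
    while (depth > 0)
        on_unlink(now);
}
```

If the program leaves without reaching on_exit_trap(), the frames
still on the stack are never popped, so their time is lost and the
stack is left deeper for the next run.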

Static routines are not traced (they are invisible; their names never
get to the link stage).  Lower level routines and routines without
link and unlink instructions are not traced.

The time values are real time.  Thus, any time waiting for user
interaction is charged to the procedure as if it were running real
code.  If this is a problem, either localize user interaction to a
single routine (so that its value can be ignored) or supply
interaction very quickly.  Or, if anyone out there is clever, they can
figure out a way to stop or fudge the clock while another task is
running.

Related to the above, the profiling time is not subtracted.  Thus,
routines which don't do much and return immediately will have slightly
higher time values than they should.  I've tried to keep this factor
to a minimum, but since it will differ on different machines (68010,
68020, fast memory, etc.), I've decided not to factor it out.

Extensions

If anyone makes any extensions or modifications to this program, or
you just want to send kudos or flames, I can be reached at

   326-5312 (home),
   Box 2081, Stanford, CA 94305 (mail),
   rokicki@sushi.stanford.edu (arpanet), or
   ...!decwrl!navajo.stanford.edu!rokicki (usenet).

Please contact me before distributing extensions; I'll try and
coordinate these.

Enjoy!
                                        -Tom Rokicki
---cut here---