Path: utzoo!attcan!uunet!wuarchive!gem.mps.ohio-state.edu!apple!rutgers!sunybcs!oswego!news
From: dl@g.g.oswego.edu (Doug Lea)
Newsgroups: comp.sw.components
Subject: Re: Garbage Collection & ADTs
Message-ID: 
Date: 27 Sep 89 20:21:03 GMT
References: <900@scaup.cl.cam.ac.uk> <6530@hubcap.clemson.edu>
	<909@scaup.cl.cam.ac.uk> <62342@tut.cis.ohio-state.edu>
	<599@hsi86.hsi.UUCP> <16270@brunix.UUCP>
Sender: news@oswego.Oswego.EDU (Network News)
Reply-To: dl@oswego.edu
Organization: SUNY Oswego
Lines: 285
In-reply-to: twl@brunix's message of 26 Sep 89 14:32:39 GMT


First a warning,

    I do quite a lot of C++ programming

and a perhaps slightly contentious definition,

    I take OO programming to be that which is in part or whole concerned
              -              -
             |  construction  |
    with the |  mutation      | of objects
             |  inspection    |  
             |  destruction   |
              -              -

    (And, of course, OO design to be concerned with the specification,
    definition, interrelationships, etc., of such objects...)


Only half-facetiously, one could make the vicious claim that GC
removes programmer control over 1/4 of the OO programming paradigm: In
C++, at least, the notion of destruction of an object may well require
some structured, active, resource deallocation, and need not just
consist of passive reclamation of memory cells.  The classic example
where such control is useful is a Window object that needs to erase
itself from view upon destruction. In a GC-based system, the
mechanisms for implementing such actions seem less than natural --
either the programmer would need to `logically' destroy the window,
and then let the collector sometime later reclaim space, or perhaps
the collector could be designed to do all kinds of storage reclamation
associated with the object at some indeterminate time.
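
A minimal sketch of the Window case, for concreteness. The `screen'
vector is only a hypothetical stand-in for a real display, and
demo_window() is a name made up for illustration. The point is just
that the erase happens at a known place in the program, when the
destructor runs, rather than whenever a collector gets around to it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a display: just a list of visible titles.
static std::vector<std::string> screen;

class Window
{
public:
    explicit Window(const std::string& t) : title(t)
    {
        screen.push_back(title);        // "draw" on construction
    }
    ~Window()                           // active cleanup, not just memory release
    {
        for (std::size_t i = 0; i < screen.size(); ++i)
            if (screen[i] == title) { screen.erase(screen.begin() + i); break; }
    }
private:
    std::string title;
    Window(const Window&);              // copying not supported
    Window& operator=(const Window&);
};

// Returns the number of windows still visible after a scoped Window dies.
std::size_t demo_window()
{
    {
        Window w("editor");
        assert(screen.size() == 1);     // on screen while w is alive
    }                                   // ~Window runs here, at a known point
    return screen.size();
}
```

Calling demo_window() from a main() leaves nothing on the `screen',
since the Window erased itself when its scope closed.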

My points are just that

    Structured destruction is a good idea. (As is structured construction.)

    Garbage collection can get in the way of this.

    And again, half-facetiously:

        GC is sometimes a byproduct of functional programming methodologies
        that allow programmers to pay attention only to `values', and
        not the objects required in order to arrive at them.

All that said, I think GC is a very useful tool! Really!
It is a serious contender if
    Destruction consists only of releasing storage
and
    The lifetimes of objects are so unstructured/complex that manual
    reclamation is either slower or more error-prone than GC.

This kind of situation comes up a lot.

But often a redesign that forces a more structured lifetime discipline
supporting simpler techniques is at least as attractive. I take that as
Wulf's main point. I think it's a good one.

How about a real example?  For the amusement of C++ fans following
this discussion, I'll append one below.  It's a fun variation of a
classic kind of situation where GC is commonly used. There are Cells
that may be linked together to create computational pipelines. While I
set it up to use GC, other choices are possible. Here are some of the
not-terribly-profound issues that came to my mind while writing it as
a classroom example:

    * The main() doesn't really exercise the collector. One chain
        is created and destroyed during the run. 

        * If this were a one-shot closed design, culminating only in a
            cute but slow prime number sieve program, the best
            solution is probably to do nothing, letting all
            deletion occur at program termination, since there are
            no other useful aspects of destruction.

        * If this were still self-contained, but a variation of
            the current main() were used as a callable function,
            then it would make sense to insert code to mark the
            base of the chain, use a stack-based allocator, and
            release it on destruction via modified constructor/
            destructor code. You'd also want to insert code to
            ensure that at most one chain were active at a time.

    * In fact, the *current* code using Sieve consists only of single 
        chains of Cells. If only such applications were supported,

        * One could delete space via a virtual no-op destructor
            in Cell, and one in Filter like 
                ~Filter() { delete src; }
            to cause a chain of deletions to occur on destruction.

        * Except that the Cells do not know if their sources were
            dynamically allocated or not, so `delete src' can be an
            error if the source is a local. Perhaps, Cells could
            refuse to support `local' creation, and only exist
            dynamically. Alternatively some bookkeeping could be added
            inside each Cell to record whether it was dynamically
            allocated, and constructors and destructors would need to
            use it. The former is probably a better strategy, since
            the implementation is not set up to cleanly support
            by-value argument passing of Cells, etc., anyway, and
            there seems to be every reason not to do so. 

            * Digression: Actually, this is all a bit of a pain in
                C++.  In order to avoid clients having to say `new
                Whatever', and thereafter dealing with pointers, you'd
                need to set up a parallel set of classes, each of
                which served as pointers to the `real' objects of the
                corresponding classes, and then took care of the
                operator `new' stuff on creation, etc.  It's a case
                where the convention taken in Eiffel, Smalltalk, etc.,
                that names are by default just bound to references to
                objects, not the objects themselves, would be more
                convenient. In these kinds of situations, C++
                programmers often settle for requiring clients to use
                pointers or references explicitly.  But even this has
                drawbacks, since, while you can force clients to only
                create objects via `new', the only means for doing this
                leaves the restriction as a mostly run-time, not
                compile-time error.

    * But it is very possible for someone to make two Filters
        point to the same source, or to create new subclasses with
        multiple sources or sinks.

        * The simple GC solution would continue to work in such cases.

        * In order to deal with this and still keep virtual
            destructors, some form of reference counting would be
            required.  Even this won't suffice if someone created the
            kinds of circular paths that defeat reference counters,
            but such possibilities seem extremely unlikely, and
            defensible as a documented restriction, akin to the
            built-in limitation that only, say, 32767 references are
            allowed before an object is considered permanent.  The
            work involved in adapting this code to use reference
            counts is not great, but is greater than using GC.  The
            overhead for reference counting vs GC would probably be
            close, depending on the quality of the implementations. In
            either case, storage management would probably amount to
            a tiny percentage of execution time.

        * Except that it makes no sense in *this* code for anyone
            to hook up two Filters to the same source, so perhaps
            Cells should record whether they've been used as sources
            and refuse to comply if they are re-attached. But this
            is reference-counting, just used in a different way.

        * Except that maybe someone might want to make a subclass of
            Cells (e.g., random number generators) that would be
            useful to multiply tap.

        * In either case, reference-counting might be more flexible,
            since it could be used both for allocation and to guard
            against some kinds of Cells being multiply tapped when
            this would be an error.

    * What about programmers creating subclasses that display
        themselves in action on a screen, or record their
        actions in a log file? In such cases the reference-counting
        version seems necessary, since destruction would require
        real-time screen updates or file manipulation. 
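
The single-chain variant discussed in the list above (virtual
destructors, with each Filter deleting its source) might be sketched as
follows. This is an illustration under assumptions, not a drop-in
replacement for sieve.cc: the static create() functions, the `live'
counter and demo_chain() are hypothetical names added here, and the
private constructors are just one way of keeping Cells off the stack so
that `delete src' is always legal.

```cpp
#include <cassert>

static int live = 0;   // hypothetical instrumentation: number of live Cells

class Cell
{
protected:
    int val;
    virtual void update() {}
    Cell(int initial = 0) : val(initial) { ++live; }
public:
    virtual ~Cell() { --live; }          // virtual, otherwise a no-op
    int next() { update(); return val; }
};

class Counter : public Cell
{
    Counter(int initial) : Cell(initial) {}      // private: heap only
protected:
    void update() { ++val; }
public:
    static Counter* create(int initial = 0) { return new Counter(initial); }
};

class Filter : public Cell
{
protected:
    Cell* src;
    int   inp;
    void  get() { inp = src->next(); }
    Filter(Cell* s) : Cell(0), src(s), inp(0) {}
public:
    ~Filter() { delete src; }            // starts the chain of deletions
};

class ModFilter : public Filter
{
    int divisor;
    ModFilter(Cell* s, int d) : Filter(s), divisor(d) {}
protected:
    void update() { do get(); while (inp % divisor == 0); val = inp; }
public:
    static ModFilter* create(Cell* s, int d) { return new ModFilter(s, d); }
};

class Sieve : public Filter
{
    Sieve() : Filter(Counter::create(1)) {}
protected:
    void update() { get(); src = ModFilter::create(src, val = inp); }
public:
    static Sieve* create() { return new Sieve; }
};

// Builds a chain, pulls five primes, deletes the head; returns live Cells.
int demo_chain()
{
    Sieve* s = Sieve::create();
    int p = 0;
    for (int i = 0; i < 5; ++i) p = s->next();   // 2 3 5 7 11
    assert(p == 11);
    delete s;                            // whole chain goes with it
    return live;
}
```

Deleting the Sieve takes every ModFilter and the Counter with it, so
demo_chain() reports no live Cells. As noted above, this only holds
while each Cell has exactly one holder.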

So, perhaps, reference-counting would be more appropriate, since it
may better serve the goal of a robust, reusable, extensible design.
I'll leave implementation as an exercise to the reader (in other
words, I'm too lazy to write it out right now!)
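
For what it's worth, one possible shape for such a reference-counted
version is sketched here. Everything beyond the Cell/Counter/Filter
outline is an assumption made for illustration: acquire() and release(),
the Doubler filter, the `log_lines' stand-in for a log file, and
demo_refcount() are all hypothetical. Circular chains are not handled,
per the restriction discussed above.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> log_lines;  // hypothetical stand-in for a log file

class Cell
{
    int refs;                          // number of holders of this Cell
    Cell(const Cell&);                 // copying not supported
    Cell& operator=(const Cell&);
protected:
    int val;
    virtual void update() {}
    virtual ~Cell() {}                 // protected: no delete through a Cell*
public:
    Cell(int initial = 0) : refs(0), val(initial) {}
    int next() { update(); return val; }

    static Cell* acquire(Cell* c) { if (c) ++c->refs; return c; }
    static void  release(Cell* c) { if (c && --c->refs == 0) delete c; }
};

class Counter : public Cell
{
protected:
    void update() { ++val; }
    ~Counter() { log_lines.push_back("counter closed"); } // active cleanup
public:
    Counter(int initial = 0) : Cell(initial) {}
};

class Filter : public Cell
{
protected:
    Cell* src;
    int   inp;
    void  get() { inp = src->next(); }
    ~Filter() { Cell::release(src); }  // give up our hold on the source
public:
    Filter(Cell* s) : Cell(0), src(Cell::acquire(s)), inp(0) {}
};

class Doubler : public Filter          // hypothetical: outputs twice its input
{
protected:
    void update() { get(); val = 2 * inp; }
public:
    Doubler(Cell* s) : Filter(s) {}
};

// Taps one Counter from two Filters; returns log entries written at the end.
int demo_refcount()
{
    Cell* shared = new Counter(0);
    Cell* a = Cell::acquire(new Doubler(shared));
    Cell* b = Cell::acquire(new Doubler(shared));  // shared is tapped twice
    int x = a->next();                 // counter steps to 1
    int y = b->next();                 // counter steps to 2
    assert(x == 2 && y == 4);
    Cell::release(a);
    assert(log_lines.empty());         // b still holds the Counter
    Cell::release(b);                  // last holder gone: Counter closes now
    return (int)log_lines.size();
}
```

The Counter's cleanup runs exactly when the last Filter lets go of it,
which is the property the screen-updating and log-writing subclasses
would need. The same count could also be inspected at attach time to
refuse multiple taps on Cells where that would be an error.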

The example has no particular deep moral, except to reflect my feeling
that thinking about the full lifetimes of objects is good for you! It
helps focus attention on various aspects of the semantics of new types
in ways you otherwise might not have thought much about. And further,
that GC-oriented issues and reusability *do* interact, but that GC need
not always be chosen on such grounds.

I suppose it's worth a mention that the issues based on extensibility
via inheritance (i.e., design reuse) don't much come up in Ada.
        
Perhaps others could offer other analyses of this or other examples.

------------------------------- sieve.cc

// Fun with the prime number sieve.
// Based loosely on R. Sethi ``Programming Languages'', sec 6.7


// link to Boehm's comp.sources.unix C GC routines ...

   extern "C" void  gc_init(); 
   extern "C" void* gc_malloc(unsigned int);

class GC      // ... but wrap gc routines as a C++ class
{
public:
               GC()                   { gc_init(); }
         void* alloc(unsigned int sz) { return gc_malloc(sz); }
};



class Cell     // an object with state and a way to update and report it
{
private:
  static   GC  gc;            // one collector for all Cells
protected:
          int  val;           // the value to output next
  virtual void update() {}    // the updater; default as no-op

public:
               Cell(int initial=0) :val(initial) {} 

          int  next()         { update(); return val; } // do one step

                              // set class allocation ops to use gc
         void* operator new(unsigned int sz) { return gc.alloc(sz); }
         void  operator delete(void*) {}
};

GC Cell::gc = GC();          // static class member initialization
                             // must be done outside class decl in C++


class Counter : public Cell  // output consecutive integers
{
protected:
         void update()       { ++val; }

public:
              Counter(int initial=0) :Cell(initial) {}
};


class Filter : public Cell  // transform input from another Cell
{
protected:
         Cell* src;          // input source
          int  inp;          // last input
         void  get()         { inp = src->next(); }

public:
               Filter(Cell* s, int initial=0) :Cell(initial), src(s), inp(0) {}
};


class ModFilter : public Filter // hold last input not divisible by divisor
{
private:
         int  divisor;
         int  divides()      { return inp % divisor == 0; }
public:
              ModFilter(Cell* s, int d) : Filter(s), divisor(d) {}
         void update()       { do get(); while (divides()); val = inp; }

};


class Sieve : public Filter  // Serve as terminal node of a prime number sieve
{
public:
              Sieve()        :Filter(new Counter(1)) {}
         void update()       { get(); src = new ModFilter(src, val = inp); }
};


#include <iostream>   // cout, cerr
#include <cstdlib>    // exit, abs, atoi
using namespace std;

int main(int argc, char** argv) // exercise everything a little
{
  if (argc < 2) 
  { 
    cerr << "error: enter desired number of primes as program argument\n"; 
    exit(1); 
  }

  int n = abs(atoi(argv[1]));
  Sieve s;
  for (int i = 0; i < n; ++i) cout << s.next() << "\n";
}

Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu              or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl